  1. Nov 2023
    1. Author Response

      Reviewer #3 (Public Review):

      Comment 1: I'm having some difficulty understanding the logic of Figure 5 in determining cis processing. It is an inverse of figure 4, and in my view, provides further evidence of trans processing. A better experiment would be to use WT-citrine tagged protein with catalytic dead mcherry and image them together. This would show WT cis processing occurs faster than trans processing as citrine specks should appear earlier than the mCherry ones. Can also do colocalization and FRET-based assays with the pair.

      We thank the reviewer for pointing this out. While our data demonstrate that the same molecule must be catalytically active and competent for processing at the IDL (Figure 5), we agree that the data do not rule out trans-processing as a mechanism for speck formation. We have therefore modified the interpretation of these findings accordingly (pp. 7-8). We agree that some of the quantitative assays the reviewer has suggested would strengthen this logic, and we are making efforts to carry out a kinetic FRET-based assay for our upcoming biochemistry-focused manuscript to better characterize the enzymatic affinity of Casp11 for cis- vs. trans- based autoprocessing, and how either impacts Casp11 speck assembly.

      Comment 2: Do those casp11 specks still contain CARDs?- i.e. is the second cleavage necessary for speck formation? Is CARD necessary at all? Would adding the TEV site at CDL and b/w p20 and p10 rescue? i.e. trans-activate?

      We are grateful to the reviewer for these insightful questions, which we also had considered. We addressed this question in two ways – first by replacing the CARD with a DmrB dimerizable domain that undergoes inducible dimerization of Casp11 in the presence of the dimerizing drug AP20187. Critically, inducible dimerization of DmrB-ΔCARD-Casp11-mCherry significantly enhances Casp11-mCherry speck formation, and this speck formation requires catalytic activity, even in the presence of dimerizer (Figure 6A-C). Moreover, we generated CARD-less Casp11-mCherry constructs containing wild-type p20-p10 and catalytically inactive p20-p10. Intriguingly, the CARD was dispensable for spontaneous Casp11-mCherry speck formation, which again was dependent on catalytic activity (Figure 6-figure supplement 2A-B). While we do not currently have data with a TEV-cleavable CDL construct, our data here demonstrate that the CARD is dispensable for speck formation in an overexpression system, implying that the p20/p10 contains all the information that is necessary and sufficient to mediate spontaneous assembly of Casp11 specks in HEK293T cells. Nonetheless, as forced dimerization enhances speck formation (Figure, we hypothesize that CARD-LPS interactions act to facilitate catalytic activity and push cooperative assembly of the Casp11 speck.

      To address whether both the N-terminal CARD and C-terminal p10 domains are present in Casp11 specks, we performed a dual-fluorophore co-localization assay in which we transiently expressed C-terminal mCherry-tagged Casp11 constructs (Casp11-mCherry) in HEK293T cells that stably express N-terminal Flag-tagged Casp11 (2xFLAG-Casp11). As expected, Casp11-mCherry formed specks spontaneously in this setting (Figure 3-figure supplement 1). Critically, both the N-terminal FLAG and C-terminal mCherry were found together in these specks, indicating the presence of both Casp11 N- and C- termini within the specks. Moreover, the wild-type Casp11-mCherry also recruited catalytically inactive 2xFLAG-Casp11C254A, again supporting the finding that wild-type Casp11 can recruit a catalytic mutant to noncanonical inflammasome complexes.

      Comment 3: What are the equations that fit experimental data points and R2 for? E.g. Figure 1E. What are the parameters being fitted/compared and how are those interpreted? A table of fitted values and proper interpretation should be provided.

      We thank the reviewer for this request to clarify how the curves were fit to the experimental data points. We have modified our ‘Statistical Analysis’ section and all figure legends that contain dose-response curves to reflect the equations used to fit each curve. Additionally, please find a table of raw values in the corresponding source data provided for each dose-response curve (Figure 2 Source Data 5; Figure 4 Source Data 3, 6; Figure 5 Source Data 3, 4; Figure 7 Source Data 2; and Figure 4-figure supplement 1 Source Data 1).
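
      As an illustration of the kind of fit and R2 value the reviewer asks about, here is a minimal sketch. It assumes a four-parameter logistic (Hill) curve, which is a common choice for dose-response data; the response above does not state which equation was actually used, and the function names, parameter names, and values below are hypothetical.

```python
def hill(dose, bottom, top, ec50, slope):
    """Four-parameter logistic (Hill) dose-response curve; dose must be > 0.

    Assumed functional form for illustration only.
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** slope)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical doses and parameters (not taken from the manuscript).
doses = [0.1, 0.3, 1.0, 3.0, 10.0]
params = dict(bottom=0.0, top=100.0, ec50=1.0, slope=1.0)
observed = [hill(d, **params) for d in doses]

# Data generated from the curve itself are reproduced exactly, so R2 is 1.
print(r_squared(observed, observed))
```

      In such a fit, the reported parameters would be bottom, top, ec50, and slope, and R2 summarizes how much of the variance in the data the fitted curve explains.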

    1. Author Response

      Reviewer #1 (Public Review):

      This paper examines different signaling networks and attempts to give general results for when the network will exhibit biphasic behavior, which is the situation when the output of the network is a non-monotonic function of its inputs. The strength of the paper is in the approach it takes. It starts with the simplest network motifs that produce biphasic behavior and then asks what happens when these motifs are parts of larger networks. Their approach is in contrast to the usual way in which this question is tackled, which tends to be within the confines of a specific signaling network, where general results like the ones that the authors are after might be hard to spot.

      We thank the reviewer for the careful reading of the manuscript and for the comments and appreciate the fact the reviewer regards the approach as the strength of the paper.

      The weakness of the paper, in my opinion, is the rather formal description of the results which I am afraid will be of rather limited utility to experimental groups seeking to make use of them. The paper attempts to provide general rules for when to expect biphasic behavior and it was hard to assess to what extent such rules exist as behaviors can change depending on the context of a larger network in which the smaller biphasic one is embedded. The other thing that made assessing the generality of the results difficult is that the input-output functions shown in all the figures are computed for a specific choice of parameters and I was left wondering how different choices of parameters might change the reported behaviors. The lack of specific proposals for how their results should guide future experiments on different signaling networks is another weakness.

      We address these points in a number of ways. Initially, our presentation was intended to highlight unambiguously which systems (especially the substrate modification building blocks) were capable of biphasic response and which were not, and to highlight the dependence on intrinsic kinetic parameters. Based on both referees' comments, we have made a number of changes:

      (a) We highlight the rationale for choosing the suite of biochemical substrate modification systems: enzyme/substrate sharing is a key driver for the origins of biphasic responses and the suite of systems we employ allows us to systematically explore this (see Response to Essential Revisions). These are building blocks of many pathways.

      (b) Biphasic responses emerge from a built-in competing effect. In every instance of substrate modification systems, we now highlight the mechanistic underpinning which gives rise to the competing effect responsible for the biphasic response. This will help experimentalists and modellers alike obtain insights into how such behaviour may arise, and the associated ingredients which facilitate it (which may be relevant in other systems). Similarly, we highlight how altered behaviour at the network level may arise from a biphasic interaction pattern, providing the intuition for it and a guide for further experimental investigation (also see Response to Essential Revisions).

      (c) With regard to parameters (also see Response to Essential Comments), we first emphasize that we completely characterize, at the substrate modification level, whether biphasic responses are possible as a function of intrinsic kinetic constants. This is done for every system studied. In Fig 2, we depict this, along with sample biphasic dose responses. The essential point, however, is that the dependence on intrinsic kinetic parameters is completely characterized. We indicate in which cases biphasic responses are impossible irrespective of intrinsic kinetic parameters, where they can be obtained for every value of the intrinsic kinetic parameters, and where there are partial restrictions in the intrinsic kinetic parameter space for obtaining them. In the revision, we have performed further parametric analysis to assess the impact of species total amounts, providing further insights. We have also shown that in all these systems biphasic responses can be obtained in ranges of kinetic parameters similar to those found experimentally (e.g., Wistel et al 2018) and for reasonable species total amounts in systems and synthetic biology. This is analyzed and depicted in Figure 2-figure supplement 3 and Figure 2-figure supplement 4.

      (d) Also, in response to another comment (about behaviour changing in networks): we first emphasize that we start at the substrate modification level to uncover drivers of biphasic responses at this level. Biphasic responses arise from an inbuilt competing effect, and we demonstrate different ways in which such an effect arises, through sharing of enzymes or substrates. While it is true that the behaviour can change as part of a network, (a) it still remains that there are these inbuilt competing effects which can generate biphasic responses (both substrate and enzyme), and this can manifest at a pathway or network level under suitable conditions; and (b) the fact that behaviour at a network level may be altered is exactly why we consider studies at the network level showing both biphasic patterns in interaction (where the overall behaviour is determined by the motif and the biphasic pattern of interaction) and studies involving interaction of biphasic responses at both the network and substrate modification level (subsection: The network level).

      (e) We have also expanded on a paragraph on testable predictions in the conclusions (p10).

      Taken together, we believe that these results should interest both experimentalists and modellers and have intrinsic value as well.
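
      To make the notion of a competing effect concrete, the following is a minimal sketch of a toy dose-response in which an activating term and an inhibiting term depend on the same input. The function `toy_response`, its constants, and the `is_biphasic` check are illustrative assumptions and are not any of the substrate modification models analysed in the paper.

```python
def toy_response(dose, k_act=1.0, k_inh=10.0):
    """Toy output with a built-in competing effect (hypothetical model)."""
    activation = dose / (k_act + dose)   # increases with dose
    inhibition = k_inh / (k_inh + dose)  # decreases with dose
    return activation * inhibition


def is_biphasic(values, tol=1e-12):
    """True if the curve has an interior peak above both end points."""
    peak = max(range(len(values)), key=values.__getitem__)
    interior = 0 < peak < len(values) - 1
    return (interior
            and values[peak] - values[0] > tol
            and values[peak] - values[-1] > tol)


# Doses spaced logarithmically from 1e-3 to 1e3.
doses = [10 ** (e / 4) for e in range(-12, 13)]

competing = [toy_response(d) for d in doses]
activation_only = [d / (1.0 + d) for d in doses]

print(is_biphasic(competing))        # rises, then falls
print(is_biphasic(activation_only))  # monotonic, no interior peak
```

      The same check can be applied to any sampled input-output curve, which is one simple way to scan a parameter range for biphasic behaviour.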

      While I appreciate that the authors adopted a style of presenting their results such that all the mathematics is buried in the figures, I found that it made reading the paper quite difficult, and contributed to my confusion about which results are general and insensitive to parameter choices and which are not. I believe a narrative that integrated the math with some simple intuition might have been more effective. For example, when the authors say in the text that model M0 is incapable of displaying biphasic response, how general is that result? Later on, when discussing model M2, they provide a criterion for biphasic response in terms of products of rate constants satisfying an inequality, but the meaning of this condition is not described. Such things make it hard to learn from the authors' work.

      This has indeed been incorporated, and we agree that presenting the intuition and mechanistic underpinning for the behaviour aids readability. In addition to the points about parameters, which are now explained at length in the paper, there are a number of paragraphs providing the mechanistic underpinning and intuition for why the behaviour is obtained. Both these are discussed at length in Response to Essential Revisions. Thus, both the mechanistic intuition and the role of parameters are addressed in detail in the revision.

      When M0 is mentioned to be incapable of yielding biphasic responses we mean just that: irrespective of any parameter choice in the model. The meaning of the criterion in Model M2 is now discussed. We take the point about not being able to learn from the work seriously and have made various changes both on the intuition and clarifying the impact of parameters.

      The text is sprinkled with statements like "this reveals the plurality of information processing behaviors..." where the meaning is quite opaque (for this example, there is no description of "information processing" and what it might mean in this context) and therefore it makes it hard to understand what are the lessons learned from these calculations. Another example is found in the description of Erk regulation where the authors speak of "significant robustness" but what is meant by "significant" is also unclear.

      Yes, we agree that these phrases are distracting and not adding much and so we have removed them.

      Overall, I think this is an interesting attempt to provide a general mathematical framework for analyzing biphasic response of signaling networks, but the authors fall short for the reasons described above. I think a lot can be fixed by improving the way the results are presented.

      We have indeed taken these comments on board and aimed to improve the presentation.

      Reviewer #2 (Public Review):

      Biphasic responses are widely observed in biological systems and the determination of general design principles underlying biphasic responses is an important problem. The authors attempt to study this problem using a range of biochemical signaling models ranging from simple enzymatic modification and de-modification of a single substrate to systems with multiple enzymes and substrates. The authors used analytical and computational calculations to determine conditions such as network topology, range of concentrations, and rate parameters that could give rise to biphasic responses. I think the approach and the result of their investigation are interesting and can be potentially useful. However, the conditions for biphasic responses are described in terms of parameter ranges or relationships in particular biochemical models, and these parameters have not been connected to the values of concentrations or rates in real biological systems. This makes it difficult to evaluate how these findings would be applicable in nature or in experiments. It might also help if some general mechanisms in terms of competition/cooperation of time scales/processes are gleaned which potentially can be used to analyze biphasic responses in real biological systems.

      We thank the reviewer for a careful reading of the manuscript and for the various comments and are happy to see the reviewer find the approach interesting. We address these comments in more detail below.

      Reading these comments, we recognized how various analyses and algebraic equations could appear opaque to a reader, both in terms of what they convey and their import. To address this, we made a number of changes.

      1. First and foremost, we provide the mechanistic underpinning and intuition for why a competing effect emerges in the first place. We do this for every substrate modification system we analyze and make further comments in the subsection focussing on the network level as well as ERK. This intuition should help a reader understand where the result is coming from and then see whether it might apply in a quite different system. This is discussed in detail in Response to Essential Revisions.

      2. Secondly, we have discussed many aspects of the parameters in more detail. Our goal, especially in substrate modification systems, was to be able to completely characterize the role of intrinsic kinetic parameters: whether biphasic responses were impossible irrespective of parameters, whether they were possible for every value of intrinsic kinetic parameters, or whether they were possible in a subset of kinetic parameter space. This has been done for every substrate modification system, and has been discussed more explicitly in the revision. Furthermore, when biphasic responses were possible, we aimed to determine the impact of species total amounts which facilitated the response. Here we performed additional analytical and semi-analytical work. Additionally, with the semi-analytical work and parameters chosen in ranges very similar to those found experimentally (e.g., Wistel et al 2018), we are able to show that biphasic responses can indeed be obtained in experimentally feasible ranges. Further aspects of the parameters are discussed in detail in the Response to Essential Revisions. In particular, a number of new paragraphs (p2-3, p6) and plots (Figure 2-figure supplement 3 and Figure 2-figure supplement 4) specifically deal with this.

      Taken together, these address the reviewer's points.

    1. Author Response

      Reviewer #1 (Public Review):

      This interesting manuscript sets out to develop for the mouse a series of important concepts and models that this group has previously developed for models of monkey brains, where they showed that in a large-scale model, anterior → posterior spatial gradients such as spine density (and thus inferred strength of local coupling) lead to a transition from transient stimulus responses to persistent responses, capable of supporting working memory (WM). No such spine density gradient is found in the mouse. Here, the authors propose and use modeling to explore the idea, that the corresponding gradient may be that of density of inhibitory PV cells in different regions of the brain.

      The goal of the study - a large-scale, anatomically-constrained model of WM - is an extremely valuable one, and the authors' efforts in this direction should be supported. That said, some of the main claims in the manuscript were not, at least as currently written, clearly supported by the data, a number of important clarifications need to be made, and some claims of novelty are made in a way that, for a typical reader, may obscure the actual contribution being made.

      The biggest issue is that one of the main claims, that together with cell-type specific long-range targeting, "density of cell classes define working memory representations" (abstract), is not terribly clear. For example, Figs. 2D and 2E show that a brain region's hierarchical location tightly predicts its persistent firing rate (2D), but that PV cell fraction has a far weaker correlation (2E). Is hierarchical location sufficient? If PV cell fraction were constant across model brain regions, would we still get persistent activity modes? It seems likely that the answer may be "yes", but the answer, easily within reach of the authors, is surprisingly not in the current version of the manuscript. Figure 3D, for the thalamocortical model, shows no significant correlation of firing rate with PV density.

      Given the claim about PV density (in the abstract and the first main point of the discussion), this is a big concern. Yet it seems easily addressable: e.g. if indeed the authors found that hierarchy was sufficient and PV density immaterial, the model would be no less interesting. And if the authors demonstrated clearly that a PV density gradient is required, that would make the claim a solid one. If, within the model, such a causal demonstration is present, this reader at least missed it.

      MAJOR CONCERNS:

      (1) The model appears to be a model of a single side of the brain. Perhaps each brain region in the model could be considered an amalgam of that region across both sides of the brain. Yet given results like Li et al. Nature 2016, who show that persistent activity is robust to inhibition of one side, but not both sides of ALM, at the very least discussion of the issue is warranted.

      The model is indeed a one-hemisphere model, and an expansion to a bihemispheric model is considered for future work. We have added the following sentence in the Discussion section:

      “Future versions of the large-scale model may consider different interneuron types to understand their contributions to activity patterns in the cortex (Kim et al., 2017; Meng et al., 2023; Tremblay and Rudy, 2016; Nigro et al., 2022), the role of interhemispheric projections in providing robustness for short-term memory encoding (Ni et al., 2016), and the inclusion of populations with tuning to various stimulus features and/or task parameters that would allow for switching across tasks (Yang et al., 2018).”

      (2) The authors make an interesting attempt to distinguish core WM regions from other regions such as "readout" regions, defined as showing persistent activity yet not having an effect on persistent activity elsewhere in the network.

      However, this definition seemed problematic: for example, consider a network that consists of 20 brain regions, all interconnected to each other, and all equivalent to each other, capable of displaying persistent activity thanks to mutual connectivity. Imagine that inhibition of any one of these regions is not sufficient to significantly perturb persistent activity in the other 19. Then they would all be labeled as "readout". Yet, by construction in this thought experiment, they are all equivalent to each other and are all core areas. Such redundancy may well be present in the brain. How would the authors address this redundancy issue?

      We acknowledge the importance of this thought experiment. Although we initially restricted the definition of core area to how a single area contributes to working memory, we proceeded with concurrent inhibition of multiple readout areas (see Essential Revisions response 6 above).
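
      The reviewer's thought experiment can be sketched with a deliberately simple toy rule: persistent activity survives as long as at least `threshold` of the equivalent regions remain active. This rule, the threshold value, and the function names are assumptions made for illustration and are not the large-scale model itself.

```python
from itertools import combinations


def persists(active_regions, threshold):
    """Toy rule: activity persists if enough regions remain active."""
    return len(active_regions) >= threshold


def single_lesion_core(regions, threshold):
    """Regions whose individual silencing abolishes persistent activity."""
    return {r for r in regions if not persists(regions - {r}, threshold)}


def smallest_disruptive_set_size(regions, threshold):
    """Size of the smallest set of regions whose joint silencing ends activity."""
    for k in range(1, len(regions) + 1):
        for lesion in combinations(sorted(regions), k):
            if not persists(regions - set(lesion), threshold):
                return k
    return None


regions = set(range(20))  # 20 equivalent, interconnected regions
threshold = 15            # hypothetical number of regions needed

# No single region is necessary, so single-region inhibition finds no core area.
print(single_lesion_core(regions, threshold))
# Concurrent inhibition of several regions does abolish persistent activity.
print(smallest_disruptive_set_size(regions, threshold))
```

      Under this toy rule, single-area inhibition labels every region as non-core, while concurrent inhibition exposes the redundancy, which is the motivation for inhibiting multiple areas at once.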

      (3) Also important to discuss would be the fact that every brain region in this model is set up as composed of two populations, and when long-range interactions are strong and the attractors strongly coupled, the entire brain is set up as a 1-bit working memory. How would results and the approach be impacted by considering WM for more flexible situations?

      We have used a model of two populations as the simplest way to integrate large-scale connectivity and inhibitory gradients. Indeed, future work should consider more realistic connectivity and populations with various degrees of tuning to different task parameters (see Reviewer 1 response 1 above).

      (4) Another concern that is important yet easily addressed is the authors' use of the term "novel cell-type specific graph theory measures". Describing in the abstract and elsewhere the fact that what they mean is to take into account the sign of connections, not just their magnitude, would transmit to readers the essence of the contribution in a manner very simple to understand. Most readers would fail to grasp the essential point of the current labeling, which sounds potentially very vague and complex.

      We have reworded the abstract - see also Essential Revisions response 2 above.
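
      The reviewer's point about taking the sign of connections into account can be illustrated with a small sketch. The loop-strength definition below (sum of products of reciprocal weights) and the weight matrix are assumptions chosen for illustration; the measure actually used in the manuscript is not specified in this response.

```python
def loop_strength(weights, node):
    """Sum over partners j of w[node][j] * w[j][node] (assumed definition)."""
    return sum(
        weights[node][j] * weights[j][node]
        for j in range(len(weights))
        if j != node
    )


def unsigned(weights):
    """Discard connection sign, keeping only magnitude."""
    return [[abs(w) for w in row] for row in weights]


# Hypothetical net weights between three areas; negative entries stand for
# projections whose net effect is inhibitory.
W = [
    [0, 2, -1],
    [1, 0, 3],
    [2, -1, 0],
]

print(loop_strength(W, 0))            # excitatory and inhibitory loops cancel
print(loop_strength(unsigned(W), 0))  # magnitude-only measure ignores the sign
```

      In this toy example the two measures disagree for the same area, which is the essence of the distinction between signed and magnitude-only graph measures.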

      (5) Finally, the overall significance of the study, and advances over previous work, were not entirely clear. In the discussion, the authors identify three major findings: (1) WM function is shaped by the PV cell density gradient. But as above, further work is required to make it clear that this claim is supported by the model. (2) if local recurrent excitation is insufficient to generate persistent activity, then long-range recurrent excitation is needed to generate it. I had trouble understanding why a model was needed to reach this conclusion - it seems as if it is simply a question of straightforward logic. The discussion states that in this regard, the work here "offers specific predictions to be tested experimentally", but I had trouble identifying what these specific predictions are. (3) Taking into account sign, not only magnitude, of connections, is important. This last point once again seemed a matter of straightforward logic, making its novelty difficult to assess.

      We thank the reviewer; we have addressed these issues in Essential Revisions response 3 above.

      Reviewer #2 (Public Review):

      This paper uses the mouse mesoscale connectome, combined with data on the number and fraction of PV-type interneurons, to build a large-scale model of working memory activity in response to inputs from various sensory modalities. The key claims of the paper are two-fold. First, previous work has shown that there does not appear to be an increase in the number of excitatory inputs (spines) per pyramidal neuron along the cortical hierarchy (and this increase was previously suggested to underlie working memory activity occurring preferentially in higher areas along the cortical hierarchy). Thus, the claim is that a key alternative mechanism in the mouse is the heterogeneity in the fraction of PV interneurons. Second, the authors claim to develop novel cell type-specific graph theory.

      I liked seeing the authors put all of the mouse connectomic information into a model to see how it behaved and expect that this will be useful to the community at large as a starting point for other researchers wishing to use and build upon such large-scale models. However, I have significant concerns about both primary scientific claims. With regard to the PV fraction, this does not look like a particularly robust result. First, it's a fairly weak result to start, much smaller than the simple effect of the location of an area along the cortical hierarchy (compare Figs. 2D, 2E; 3C, 3D). Second, the result seems to be heavily dependent upon having subdivided the somatosensory cortex into many separate points and focusing the main figures of the paper (and the only ones showing rates as a function of PV cell fraction) solely on simulations in which the sensory input is provided to the visual cortex. With regards to the claim of novel cell type-specific graph theory, there doesn't appear to be anything particularly novel. The authors simply make sure to assign negative rather than positive weights to inhibitory connections in their graph-theoretic analyses.

      Major issues:

      1) Weakness of result on effect of PV cell fraction. Comparing Figures 2D and 2E, or 3C and 3D, there is a very clear effect of cortical hierarchy on firing rate during the delay period in Figures 2D and 3C. However, in Figure 2E relating delay period firing rate to PV cell fraction, the result looks far weaker. (And similarly for Figs. 3C, 3D, with the latter result not even significant). Moreover, the PV cell fraction results are dominated by the zero firing rate brain regions (as opposed to being a nice graded set of rates, both for zeros and non-zeros, as with the cortical hierarchy results of Figures 2D), and these zeros are particularly contributed to by subdividing somatosensory (SS) into many subregions, thus contributing many points at the lower right of the graph.

      Further, it should be noted that Figure 2E is for visual inputs. In the supplementary Figure 2 - supplement 1, the authors do apply sensory inputs to auditory and somatosensory cortex...but then only show the result that the delay period firing rate increases along the cortical hierarchy (as in Figure 2D for the visual input), but strikingly omit the plots of firing rate versus PV cell fraction. This omission suggests that the result is even weaker for inputs to other sensory modalities, and thus difficult to justify as a defining principle.

      We have now made an effort to exhaustively compare the contributions of PV versus hierarchy in defining the firing rate activity patterns in the model - see Essential Revisions response 1 above. Moreover, we included plots of firing rate versus PV cell fraction for other sensory modalities, and the results would still support a common architecture for short-term memory maintenance.

      2) Graph theoretic analyses. The main comparison made is between graph-theoretic quantities when the quantities account for or do not account for, PV cells contributing negative connection strengths. This did not seem particularly novel.

      See Essential Revisions response 2 above.

      3) It was not clear to me how much the cell-type specific loop strength results were a result of having inhibitory cell types, versus were a result of the assumption ('counter-stream inhibitory bias') that there is a different ratio of excitation to inhibition in top-down versus bottom-up connections. It seems like the main results were more a function of this assumed asymmetry in top-down vs. bottom-up than it was a function of just using cell-type per se. That is, if one ignored inhibitory neurons but put in the top-down vs. bottom-up asymmetry, would one get the same basic results? And, likewise, if one didn't assume asymmetry in the excitatory vs. inhibitory connectivity in top-down versus bottom-up connections, but kept the Pyramidal and PV cell fraction data, would the basic result go away?

      We have addressed the issue of cell-type specific loop strength in Essential Revisions response 2 above.

      4) In the Discussion, there is a third 'main finding' claimed: "when local recurrent excitation is not sufficient to sustain persistent activity...distributed working memory must emerge from long-range interactions between parcellated areas". Isn't this essentially true by definition?

      We have addressed this important issue in Essential Revisions response 3 above.

      5) I don't know if it's even "CIB" that's important or just "any asymmetry (excitatory or inhibitory) between top-down vs. bottom-up directions along the hierarchy". This is worth clarifying and thinking more about, as assigning this to inhibition may be over-attributing a more basic need for asymmetry to a particular mechanism.

      We found that this asymmetry is indeed crucial; it may be provided by CIB or, in some regimes, by the presence of a PV gradient alone - see Essential Revisions response 1 above.

      Other questions:

      1) Is it really true that less than 2% of neurons are PV neurons for some areas? Are there higher fractions of other inhibitory interneuron types for these areas, and does this provide a confound for interpreting model results that don't include these other types?

      Maybe related to the above, the authors write in the Results that local excitation in the model is proportional to PV interneuron density. However, in the methods, it looks like there are two terms: a constant inhibition term and a term proportional to density. Maybe this former term was used to account for other cell types. Also, is local excitation in the model likewise proportional to pyramidal interneuron density (and, if not, why not?)?

      The reviewer is correct in pointing out that the ‘constant inhibition term’, which we interpret as a minimal inhibition, accounts for other cell types. We have added the respective explanation in the Methods section. Future versions of the model may include different interneuron types - see Reviewer 1 Response 1 above.

      2) Non-essential areas. The categorization of areas as 'non-essential' as opposed to, e.g. "inputs" is confusing. It seems like the main point is that, since the delay period activity as a whole is bistable, certain areas' contributions may be small enough that, alone, they can't flip the network between its bistable down and up states. However, this does not mean that such areas (such as the purple 'non-essential' area in Figure 5a) are 'non-essential' in the more common sense of the word. Rather, it seems that the purple area is just a 'weaker input' area, and it's confusing to thus label it as 'non-essential' (especially since I'd guess that, whether or not an area flips on/off the bistability may also depend on the assumed strength of the external input signal, i.e. if one made the labeled 'input area' a bit too weak to alone trigger the bistability, then the purple area might become 'essential' to cross the threshold for triggering a bistable-up state).

      This is an important point, and a similar point was also raised by Reviewer 1. For simplicity, we have restricted the definition of the function of an area (e.g., input vs. core vs. non-essential) to how a single area contributes to working memory. The existence of ‘subnetworks’ for any of these functions is indeed plausible, and potentially important, but we have left this for future modeling work (see Essential Revisions response 6 above). The point that distinguishes ‘input’ and ‘non-essential’ areas is simply whether inhibiting said area during the stimulus period affects stimulus-specific persistent activity. Surely some of the areas that we have classified as ‘non-essential’ have important roles, even for the contents of working memory; however, they are not essential to produce the activity pattern we observe here.

      3) Relation between 'core areas' and loop strength. The measure underlying 'prediction accuracy = 0.93' in Figure 6D and the associated results seems incomplete by being unidirectional. It captures the direction: 'given high cell-type specific loop strength, then core area' but it does not capture the other direction: 'given a cell is part of a core area, is its predicted cell-type specific loop strength strong?'. It would be good to report statistics for both directions of association between loop strength and core area.

      Indeed the prediction accuracy refers to the direction loop strength->core area, for which we estimate how well a continuous variable (loop strength) predicts a binary variable (whether core area or not). A prediction in the reverse direction is not well defined, namely to predict a continuous variable from a binary variable, so the reverse association may be only indirectly inferred from Figure 8D.
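
      To make the direction of this prediction concrete, the following minimal sketch scores how well a threshold on a continuous variable predicts a binary label. The data values, the exhaustive threshold search, and the function name are illustrative assumptions; this is not the analysis code behind Figure 6D.

```python
def best_threshold_accuracy(values, labels):
    """Return (accuracy, threshold) of the best rule 'label = value >= threshold'."""
    best_acc, best_thr = 0.0, None
    for thr in sorted(set(values)):
        predictions = [v >= thr for v in values]
        correct = sum(p == bool(l) for p, l in zip(predictions, labels))
        acc = correct / len(values)
        if acc > best_acc:  # keep the first (lowest) threshold on ties
            best_acc, best_thr = acc, thr
    return best_acc, best_thr


# Hypothetical loop strengths and core (1) / non-core (0) labels for eight areas.
loop_strength = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9, 1.1, 1.3]
is_core = [0, 0, 0, 1, 0, 1, 1, 1]

accuracy, threshold = best_threshold_accuracy(loop_strength, is_core)
print(accuracy, threshold)
```

      The reverse question (given the binary label, what is the loop strength?) has no single accuracy of this kind, which is why it can only be read off indirectly, for example by comparing the two groups' distributions.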

      4) More justification would be useful on the assumption that the reticular nucleus provides tonic inhibition across the entire thalamus.

      Relatively little is known about how specific this inhibition may be. We have included references in the Discussion section that speak to this fact. (Crabtree 2018, Hardinger et al., 2023).

      5) Is NMDA/AMPA ratio constant across areas and is this another difference between mice and monkeys? I am aware of early work in the mouse (Myme et al., J. Neurophys., 2003) suggesting no changes at least in comparing two brain regions' layer 2/3, but has more work been performed related to this?

      Recent anatomical in-vitro autoradiography work in the macaque shows that the NMDA/AMPA ratio (in terms of receptor density) varies across the cortical hierarchy (Klatzmann et al., 2022). Functionally, NMDA receptors seem important in PFC L2/3 for persistent activity, while in V1 they contribute relatively little to the stimulus response, which is dominated by AMPA-mediated excitation. This was shown by a recent physiological study in the macaque (Yang et al., 2018). This could indeed point to a species difference, although like-for-like comparisons of equivalent experiments across species are lacking in the literature. We have included this and other related references in our Discussion - see Essential Revision 4 above.

      6) Are bilateral connections between the left and right sides of a given area omitted and could those be important?

      These potentially important connections were omitted from the model for simplicity; please see Reviewer 1 Responses 1 and 3 above.

      Reviewer #3 (Public Review):

      Combining dynamical modelling and recent findings of mouse brain anatomy, Ding et al. developed a cell-type-specific connectome-based dynamical model of the mouse brain underlying working memory. The authors find that there is a gradient across the cortex in terms of whether mnemonic information can be sustained persistently or only transiently, and this gradient is negatively correlated to the local density of parvalbumin (PV) positive inhibitory cells but positively correlated with mesoscale-defined cortical hierarchy. In addition, weighing connectivity strength by PV density at target areas provides a more faithful relationship between input strength and delay firing rate. The authors also investigate a model where cortical persistent activity can only be sustained with thalamus input intact, although this result is rather separate from the rest of the study. The authors then use this model to test the causal contributions of different areas to working memory. Although some of the in silico perturbations are consistent with existing experimental data, others are rather surprising and need to be further discussed. Finally, the authors investigate patterns of attractor states as a result of different local and long-range connections and suggest that distinct attractor states could underlie different task demands.

      The importance of PV density as a predictor for working memory activity patterns in the mouse brain is in contrast to recent computational findings in the primate brain where the number of spines (excitatory synapses per pyramidal cell) is the key predictor. This finding reveals important species differences and provides complementary mechanisms that can shape distributed patterns of working memory representation across cortical regions. The method of biologically-based near-whole-brain dynamical modeling of a cognitive function is compelling, and the main conclusions are mostly well supported by evidence. However, some aspects of the method, result, and discussion need to be clarified and extended.

      1) Based on existing anatomical data, the authors reveal a negative correlation between cortical hierarchy (defined by mesoscale connectivity; this concept needs to be explicitly defined in the Results section, not just in the Methods section) and local PV density (Fig. 1). In the dynamical model, the authors find that working memory activity is positively (and strongly) correlated with cortical hierarchy and negatively (and less strongly) correlated with PV cell density (Fig. 2), and conclude that working memory activity depends on both. But could the negative correlation between activity and PV density simply result from the inherent relationship between hierarchy and PV density across regions? To strengthen this result, the authors should quantify the predictive power of local PV density on working memory activity beyond the predictive power of cortical hierarchy.

      We have systematically compared the relationship between PV and hierarchy in generating delay-patterns of activity - see Essential Revisions response 1 above.
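
      One generic way to ask whether PV density predicts delay activity beyond hierarchy is a partial correlation: regress both delay activity and PV density on hierarchy, then correlate the residuals. The sketch below illustrates that idea only; the numbers are made up, the function names are hypothetical, and it is not claimed to be the analysis reported in Essential Revisions response 1.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def residuals(y, x):
    """Residuals of y after least-squares regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def partial_correlation(y, x, control):
    """Correlation between y and x after removing the linear effect of control."""
    return pearson(residuals(y, control), residuals(x, control))


# Made-up per-area values (assumptions for illustration only).
hierarchy = [0.1, 0.3, 0.5, 0.7, 0.9]
pv_density = [0.9, 0.6, 0.7, 0.2, 0.3]
delay_rate = [2.0 * h - p for h, p in zip(hierarchy, pv_density)]

r_partial = partial_correlation(delay_rate, pv_density, hierarchy)
print(r_partial)
```

      A partial correlation near zero would indicate that PV density adds little once hierarchy is accounted for, whereas a clearly non-zero value would indicate an independent contribution.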

      2) In Fig. 4, the authors find that cell-type-specific graph measures more accurately predict delay-period firing rates. Specifically, the authors weigh connections with a cell-type-projection coefficient, which is smaller when the PV cell fraction is higher in the target area. Considering that local PV cell fraction is already correlated with delay activity patterns, weighing the input with the same feature will naturally result in a better input-output relationship. This result will be strengthened if there is a more independent measure of cell-type-projection coefficient, such as the spine density of PV vs excitatory cells across regions, or even the percentage of inhibitory versus excitatory cells targeted by upstream region (even just for an example set of brain regions).

      We have compared different measures of cell-type projection coefficients and how they predict delay-patterns of activity and whether an area is a core area - see Essential Revisions response 2 above.

      3) The authors aim to identify a core subnetwork that generates persistent activity across the cortex by characterising delay activity as well as the effects of perturbations during the stimulus and delay period. Consistent with existing data, the model identifies frontal areas and medial orbital areas as core areas. Surprisingly, areas such as the gustatory area are also part of the core areas. These more nuanced predictions from the model should be further discussed. Also surprisingly, the secondary motor cortex (MOs), which has been indicated as a core area for short-term memory and motor planning by many existing studies is classified as a readout area. The authors explain this potential discrepancy as a difference in task demand. The task used in this study is a visual delayed response task, and the task(s) used to support the role of MOs in short-term memory is usually a whisker-based delayed response task or an auditory delay response task. In all these tasks, activity in the delay period is likely a mixture of sensory memory, decision, and motor preparation signals. Therefore, task demand is unlikely the reason for this discrepancy. On the other hand, motor effectors (saccade, lick, reach, orient) could be a potential reason why some areas are recruited as part of the core working-memory network in one task and not in another task. The authors should further discuss both of these points.

      We have addressed this important point in Essential Revisions response 5 above.

      4) As a non-expert in the field, it is rather difficult to grasp the relationship between the results in Fig. 7 and the rest of the paper. Are all the attractor states related to working memory? If so, why are the core regions for different attractor states so different? And are the core regions identified in Fig. 5 based on arbitrary parameters that happen to identify certain areas as core (PL)? The authors should at least further clarify the method used and discuss these results in the context of previous results in this study.

      Attractor states that have a stable baseline are, by definition, related to working memory, in that there is a baseline and a memory state associated with the model. Some areas, such as PL, are more likely to be associated with different core subnetworks given their position in the hierarchy. In the current version of the manuscript, we provide a motivation for the different attractor states and how they may relate to cognitive function.

    1. Author Response

      Reviewer #1

      While the article clearly outlines the strengths of the chosen approach, it lacks an equally clear exposition of its limitations and a more thorough comparison to established approaches. Two examples of limitations that should be stated more clearly, in my opinion: models need to be small enough to fit on a single machine (in contrast to e.g. NEURON and NEST which support distributed computation via MPI), and only single-compartment models are supported; both limitations are mentioned in passing in the discussion, but would merit a more upfront mention.

      We agree that our paper could be improved by more clearly stating the limitations of our approach and comparing it to established approaches. We have revised the paper and added two new subsections in the Discussion section to address these specific concerns:

      1. The Limitations subsection (L448 - L491) acknowledges the restrictions of the BrainPy paradigm, which uses Python-based object-oriented programming. It describes three main categories of limitations: (a) approach limitations, (b) functionality limitations, (c) parallelization limitations. These limitations mark areas where BrainPy may require further development to improve its functionality, performance, and compatibility with different modeling approaches.

      2. The Future Work subsection (L493 - L526) outlines development enhancements we aim to pursue in the near term. It emphasizes the need for further development in order to meet the demands of the field. Three key areas requiring attention are highlighted: (a) multi-compartment neuron models, (b) ultra-large-scale brain simulations, (c) bridging with acceleration computing systems.

      In addition to these changes, we have also made a number of other minor changes to the paper to improve its clarity and readability.

      The study does not verify the accuracy of the presented framework. While its basic approach (time-step-based simulation, standard numerical integration algorithms) is sufficiently similar to other software to not expect major discrepancies, an explicit comparison would remove any doubt. Quantitative measures of accuracies are particularly important in the context of benchmarks (see below), since simulations can be made arbitrarily fast by sacrificing performance.

      We agree that an explicit comparison would help alleviate any doubts and provide a more comprehensive understanding of our framework's accuracy. We have revised our manuscript to include a dedicated section on this point in Appendix 11. In this section, we verified that all simulators generated consistent average firing rates for the given benchmark network models (figure 1 and figure 2 in Appendix 11). These verifications were performed under different network sizes (ranging from 4 × 10^3 to 4 × 10^5) and on different computing platforms (CPU, GPU and TPU). We also qualitatively compared the overall network activity patterns produced by each simulator to ensure they exhibited the same dynamics (figure 3 and figure 4 in Appendix 11). While exact spike-to-spike reproducibility was not guaranteed between different simulator implementations, we confirmed that our simulator produced activity consistent with the reference simulators for both firing rates and network-level dynamics. Additionally, BrainPy did not sacrifice simulation accuracy for speed. Despite using single-precision floating point, BrainPy was able to produce consistent firing rates and raster diagrams across all simulations (see figure 3 and figure 4 in Appendix 11).
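
      As an illustration of what a firing-rate consistency check of this kind involves, the sketch below compares the population-averaged firing rate of two spike recordings. The spike times, the 5% tolerance, and the function names are assumptions chosen for the example; they are not the criteria used in Appendix 11.

```python
def mean_firing_rate(spike_times_per_neuron, duration_s):
    """Population-averaged firing rate in Hz."""
    n_neurons = len(spike_times_per_neuron)
    n_spikes = sum(len(spikes) for spikes in spike_times_per_neuron)
    return n_spikes / (n_neurons * duration_s)


def rates_consistent(rate_a, rate_b, rel_tol=0.05):
    """True if the two rates differ by at most rel_tol of the larger one."""
    return abs(rate_a - rate_b) <= rel_tol * max(rate_a, rate_b)


# Hypothetical spike times (seconds) from two simulators over a 1 s run.
# Individual spikes differ, but the average rate is the same.
simulator_a = [[0.1, 0.5, 0.9], [0.2, 0.7], [0.3, 0.4, 0.8]]
simulator_b = [[0.1, 0.6], [0.2, 0.5, 0.9], [0.35, 0.45, 0.85]]

rate_a = mean_firing_rate(simulator_a, duration_s=1.0)
rate_b = mean_firing_rate(simulator_b, duration_s=1.0)
print(rate_a, rate_b, rates_consistent(rate_a, rate_b))
```

      A check like this tolerates spike-to-spike differences between implementations while still flagging a simulator whose overall activity level has drifted.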

      We hope these revisions can ensure that our manuscript provides a clear and robust validation of the accuracy of our simulator.

      Benchmarking against other software is obviously important, but also full of potential pitfalls. The current article does not state clearly whether the results are strictly comparable. In particular: are the benchmarks on the different simulators calculating results to the same accuracy (use of single or double precision, same integration algorithm, etc.)? Does each simulator use the fastest possible execution mode (e.g. number of threads/processes for NEST, C++ standalone mode in Brian2, etc.)? What is exactly measured (compilation time, network generation time, simulation execution time, ...) - these components will scale differently with network size and simulation duration, so summing them up makes the results difficult to interpret. Details are also missing for the comparison between the XLA operator customization in C++ vs. Python: was the C++ variant written by the authors or by someone else? Does the NUMBA→XLA mechanism also support GPUs/TPUs? This comparison also seems to be missing from the GitHub repository provided for reproducing the paper results.

      We have carefully considered these comments and addressed each of these concerns regarding the benchmarks and examples presented in our paper.

      1. Lack of Details in Examples: In the revised version of the paper, we provide additional information and other pertinent details to enhance the clarity and replicability of our results. In particular, in Appendix 9, we provide the mathematical description, the number of neurons, the connection density, and the delay times used in our multi-scale spiking network; in Appendix 10, we provide a detailed description of the reservoir models, evaluation metrics, training algorithms, and their implementations in BrainPy; in Appendix 11, we elaborate on the hardware and software specifications and the experimental details of the benchmark comparisons.

      2. Inadequate Description of Benchmarking Procedures: In the revised paper, in L328-L329 of the main text (section "Efficient performance of BrainPy") and in Appendix 11, we elaborate on the integration methods, simulation time steps, and floating-point precision used in our experiments. We also ensure that these parameters are clearly stated and identical across all simulators involved in the benchmarking process; see "Accuracy evaluations" in Appendix 11 (L1550 - L1580).

      3. Clarification on Measured Time: In the revised paper, we state that all simulations only measured the model execution time, excluding model construction time, synapse creation time, and compilation time, see "Performance measurements" in Appendix 11 (L1539 - L1548).

      4. Consideration of Acceleration Modes: In the revised version, we report the simulation speed of the other brain simulators in different acceleration modes; see Figure 8. For instance, we use the fastest available option, the C++ standalone mode in Brian2, for speed evaluations. Furthermore, we have asked the developers of the comparison simulators to optimize the benchmark models, to ensure a fair and accurate comparison.

      5. Scaling Networks to Maintain Activity: In the revised manuscript, we adopt the suggestion of Reviewer #3 and apply the appropriate scaling techniques to maintain consistent network activity throughout our experiments. These details can be found in Appendix 11 (also see Appendix 11—figure 1 and Appendix 11—figure 2).
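
      The measurement protocol in point 3 (timing only model execution, with construction and compilation excluded) can be sketched as follows. The toy model, the warm-up call that stands in for compilation, and all function names are hypothetical placeholders rather than the benchmark scripts themselves.

```python
import time


def build_model(n_neurons):
    # Stand-in for model construction and synapse creation (not timed).
    return [0.0] * n_neurons


def run(state, n_steps, dt=0.1, tau=10.0, drive=1.0):
    # Stand-in for the simulation loop: forward-Euler leaky integration.
    for _ in range(n_steps):
        state = [v + dt * (drive - v) / tau for v in state]
    return state


def time_execution_only(n_neurons, n_steps):
    state = build_model(n_neurons)   # excluded from the measurement
    run(state, 1)                    # warm-up call, standing in for compilation
    start = time.perf_counter()
    final_state = run(state, n_steps)
    elapsed = time.perf_counter() - start
    return elapsed, final_state


elapsed, final_state = time_execution_only(n_neurons=10, n_steps=100)
print(f"execution time: {elapsed:.6f} s")
```

      With a JIT-compiled or asynchronously dispatched backend, the timed call would additionally need to wait for the computation to finish before the clock is stopped; that step is omitted here because the toy model runs synchronously.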

      Regarding the comparison between XLA operator customization in C++ and Python, we used our own C++ implementation, which is available in Appendix 8, Listing 2. At present, the NUMBA→XLA mechanism does not support GPUs/TPUs; however, we are working on extending this capability to other platforms. We have made this clarification in the revised manuscript as well (see L1278 - L1285).

      While the authors convincingly argue for the merits of their Python-based/object-oriented approach, in my opinion, they do not fully acknowledge the advantages of domain-specific languages (NMODL, NestML, equation syntax of ANNarchy and Brian2, ...). In particular, such languages aim at a strong decoupling of the mathematical model description from its implementation and other parts of the model. In contrast, models described with BrainPy's approach often need to refer to such details, e.g. be aware of differences between dense and sparse connectivity schemes, online, or batch mode, etc. It might also be worth mentioning descriptive approaches to synaptic connectivity as supported by other simulators (connection syntax in Brian2, Connection Set Algebra for NEST).

      We have made revisions to better acknowledge the merits of DSLs while providing a more comprehensive comparison. These revisions are incorporated in Discussion (L452 - L466) and Appendix 1 (L778 - L788).

      Reviewer #2

      While the results presented are impressive, publishing further details of the benchmarks in an appendix would be helpful for evaluating the claims and the overall conclusion would be more convincing if the performance benefits were demonstrated on a wider selection of test cases. Unsatisfyingly, the authors gave up on making a direct comparison to Brian running on GPUs with GeNN which would have been a fairer comparison than CPU-based simulations. The code for the chosen benchmarks is also likely to be highly optimised by the authors for running on BrainPy but less so for the other platforms - a fairer test would be to invite the authors of the other simulators to optimise the same models and re-evaluate the benchmarks.

      We have carefully considered these comments and addressed each of these concerns regarding the benchmarks and examples presented in our paper.

      1. Lack of Details in Examples: In the revised version of the paper, we provide additional information and other pertinent details to enhance the clarity and replicability of our results. In particular, in Appendix 9, we provide the mathematical description, the number of neurons, the connection density, and the delay times used in our multi-scale spiking network; in Appendix 10, we provide a detailed description of the reservoir models, evaluation metrics, training algorithms, and their implementations in BrainPy; in Appendix 11, we elaborate on the hardware and software specifications and the experimental details of the benchmark comparisons.

      2. Inadequate Description of Benchmarking Procedures: In the revised paper, in L328-L329 of the main text (section "Efficient performance of BrainPy") and in Appendix 11, we elaborate on the integration methods, simulation time steps, and floating-point precision used in our experiments. We also ensure that these parameters are clearly stated and identical across all simulators involved in the benchmarking process; see "Accuracy evaluations" in Appendix 11 (L1550 - L1580).

      3. Clarification on Measured Time: In the revised paper, we state that all simulations only measured the model execution time, excluding model construction time, synapse creation time, and compilation time, see "Performance measurements" in Appendix 11 (L1539 - L1548).

      4. Consideration of Acceleration Modes: In the revised version, we report the simulation speed of the other brain simulators in different acceleration modes; see Figure 8. For instance, we use the fastest available option, the C++ standalone mode in Brian2, for speed evaluations. Furthermore, we have asked the developers of the comparison simulators to optimize the benchmark models, to ensure a fair and accurate comparison.

      5. Scaling Networks to Maintain Activity: In the revised manuscript, we adopt the suggestion of Reviewer #3 and apply the appropriate scaling techniques to maintain consistent network activity throughout our experiments. These details can be found in Appendix 11 (also see Appendix 11—figure 1 and Appendix 11—figure 2).

      Regarding the wider selection of test cases, we understand the importance of demonstrating the performance benefits on a broader range of scenarios. Particularly, we have designed two kinds of benchmark models:

      • Sparse connection models. This category includes the COBA-LIF network and the COBA-HH network. The former is a standard E/I balanced network for comparing the simulation speed of brain simulators, while the latter uses the more complex and computationally expensive HH neuron model as its elements. Both models can effectively demonstrate a brain simulator's capability for sparse and event-driven computation.

      • Dense connection models. The local circuits of a cortical column are usually densely connected (Science 366, 1093). In particular, we use the decision-making network proposed by Wang (2002) for evaluations.

      In the revised version, we include extensive experiments on these three test cases under different kinds of computing platforms (including CPU, GPU, and TPU) to strengthen the overall conclusion and provide a more comprehensive evaluation of our approach.

      Regarding the comparison to Brian running on GPUs with GeNN, we apologize for not including it in our initial submission. We have conducted the necessary experiments on all three benchmark models used in our evaluations and include these results in the revised version of the paper (see Figure 8). This addition will enhance the credibility of our findings and allow for a more meaningful comparison between different simulation platforms. Furthermore, we have reached out to the authors of the other simulators and invited them to optimize the same models used in our benchmarks. We believe this collaborative approach will ensure a more equitable evaluation of the simulators and provide a more robust and convincing analysis of our work.

      Furthermore, the manuscript reads like an advertisement for the platform with very little discussion of its limitations, weaknesses, or directions for further improvement. A more frank and balanced perspective would strengthen the manuscript and give the reader greater confidence in the platform.

      We agree that our paper could be improved by more clearly stating the limitations of our approach and comparing it to established approaches. We have revised the paper and added two new subsections in the Discussion section to address these specific concerns:

      1. The Limitations subsection (L448 - L491) acknowledges the restrictions of the BrainPy paradigm, which uses Python-based object-oriented programming. It describes three main categories of limitations: (a) approach limitations, (b) functionality limitations, (c) parallelization limitations. These limitations mark areas where BrainPy may require further development to improve its functionality, performance, and compatibility with different modeling approaches.

      2. The Future Work subsection (L493 - L526) outlines development enhancements we aim to pursue in the near term. It emphasizes the need for further development in order to meet the demands of the field. Three key areas requiring attention are highlighted: (a) multi-compartment neuron models, (b) ultra-large-scale brain simulations, (c) bridging with acceleration computing systems. In addition to these changes, we have also made a number of other minor changes to the paper to improve its clarity and readability.

      Since simulators wax and wane in popularity, it would be reassuring to see a roadmap for development with a proposed release cadence and a sustainable governance policy for the project. This would serve to both clearly indicate the areas of active development where community contributions would be most valuable and also to reassure potential users that the project is unlikely to be abandoned in the near future, ensuring that their time investment in learning to use the framework will not be wasted.

      We appreciate the reviewer raising the question of the project's sustainability. In response to this feedback, we have made the following efforts.

      First, we have added, and will maintain, a "Development roadmap" section on the BrainPy GitHub homepage (https://github.com/brainpy/BrainPy). This will give the community a clear understanding of the project's direction and the areas of active development. Additionally, the "Future work" section in our revised paper outlines a comprehensive roadmap for the next stages of BrainPy development.

      Second, to address the concern about the sustainability of our project and the potential risk of abandonment, we have added an ACKNOWLEDGMENTS.md file to the GitHub repository (https://github.com/brainpy/BrainPy/blob/master/ACKNOWLEDGMENTS.md) that outlines our sustained funding support. This support demonstrates our commitment to the long-term maintenance and development of the project, and may help to dispel users' doubts about the project being abandoned.

      Similarly, a complex set of dependencies, which need to be modified for BrainPy, will likely make the project hard to maintain and so a similar plan to those given for the CI pipeline and documentation generation for automation of these modifications would be a good addition. It is also important to periodically reflect on whether it still makes sense to combine all the disparate tools into one framework as the codebase grows and starts to strain under modifications required to maintain its unification.

      We appreciate the reviewer's valuable suggestions on the BrainPy framework.

      First, BrainPy is a self-contained package designed specifically for brain dynamics programming. It boasts minimal dependencies, relying only on fundamental packages within the Python scientific computing ecosystem. In essence, BrainPy relies on numpy for array-based computations and utilizes jax and jaxlib for JIT compilation. While we currently utilize numba to customize dedicated operators, we can also remove this dependency by rewriting these operators with C++ code. We incorporate the use of brainpylib, a package developed by ourselves, which provides dedicated operators for CPUs and GPUs in the context of brain dynamics modeling. Additionally, BrainPy leverages mature solutions within the field for certain auxiliary functions. For instance, we integrate the use of tqdm to facilitate the display of a progress bar during model execution, and employ matplotlib for visualization purposes, capitalizing on its well-established capabilities in the scientific community.

      Second, we agree that there is a risk of overly complex dependencies and architectural strains. To mitigate this risk, we have taken the following measures:

      • We prioritize good software engineering practices like loose coupling, high cohesion and modularity in the framework design. This will isolate dependencies and changes to specific components. For example, the brainpy.visualize module defines abstract visualization functions whose visualization backend can be changed at any time.

      • We invest in automating aspects of the build, test, and release process to relieve manual maintenance burdens. We make heavy use of GitHub Actions for testing the BrainPy code and building the documentation.

      • We document dependencies clearly and maintain backwards compatibility when possible. For each new API, we will clearly state the BrainPy version from which it is supported, and deprecated APIs will be phased out over multiple release cycles.

      • We continuously monitor code complexity metrics and refactor/simplify the architecture when needed.

      • When new tools have significantly different requirements, we will consider spinning them off into separate projects rather than forcing them into the core framework.

      Finally, a live demonstration would be a very useful addition to the project. For example, a Jupyter notebook hosted on mybinder.org or similar, and a fully configured Docker image, would each enable potential users to quickly experiment with BrainPy without having to install a stack of dependencies and troubleshoot version conflicts with their pre-existing setup. This would greatly lower the barrier to adoption and help to convince a larger base of modellers of the potential merits of BrainPy, which could be major, both in terms of the computational speed-up and ease of development for a wide range of modelling paradigms.

      We appreciate the reviewer's valuable feedback and suggestion. We have hosted a Jupyter notebook and a fully configured Docker image on mybinder.org (https://mybinder.org/v2/gh/brainpy/BrainPy-binder/main). Users can easily experiment with BrainPy without the need to install multiple dependencies or troubleshoot version conflicts.

      Reviewer #3

      One potential issue is that the scope of the neuro-simulator is not very clearly explained and the target audience is not well defined: is BrainPy primarily intended for computational neuroscientists or for neuro-AI practitioners? The simulator offers very detailed neural models (HH, fractional order models), classical point-models (LIF, AdEx), rate-coded models (reservoirs), but also deep learning layers (Conv, MaxPool, BatchNorm, LSTM). Is there an advantage to using BrainPy rather than PyTorch for purely deep networks? Is it possible to build hybrid models combining rate-coded reservoirs or convnets with a network of HH neurons? Without such a hybrid approach, it is unclear why the deep learning layers are needed.

      We appreciate the reviewer's concern regarding the scope of BrainPy and the need for clarification regarding the target audience.

      BrainPy is designed to cater to both computational neuroscientists and neuro-AI practitioners by integrating detailed neural models, classical point models, rate-coded models, and deep learning models. The platform aims to provide a general-purpose programming framework for modeling brain dynamics, allowing users to explore the dynamics of brain or brain-inspired models that combine insights from biology and machine learning.

      In particular, brain dynamics models (provided in the brainpy.dyn module) and deep learning models (provided in the brainpy.dnn module) are closely integrated with each other in BrainPy. First, to build brain dynamics models, users should use the building blocks in the brainpy.dnn module to create synaptic projections.

      Second, to build brain-inspired computing models for machine learning, users can also take advantage of the neuronal and synaptic dynamics provided in the brainpy.dyn module.

      To that end, BrainPy provides building blocks of detailed conductance-based models like Hodgkin-Huxley, as well as common deep learning layers like convolutions.

      Regarding the advantage of using BrainPy over PyTorch for purely deep networks, we acknowledge that existing deep learning libraries like Flax in the JAX ecosystem provide extensive tools and examples for constructing traditional deep neural networks. While BrainPy does implement standard deep learning layers, our primary focus is not to compete directly with those libraries. Instead, we provide these models for the seamless integration of deep learning layers within BrainPy's core modeling abstractions, including variables and dynamical systems. This integration allows researchers to incorporate common deep learning layers into their brain models. Additionally, the inclusion of deep learning layers in BrainPy serves as examples for customization and facilitates the development of tailored layers for neuroscience research. Researchers can modify or extend the implementations to suit their specific needs.

      In summary, BrainPy's scope is general-purpose brain dynamics programming. The target audience includes computational neuroscientists who want to incorporate insights from machine learning, as well as ML researchers interested in integrating brain-like components.

      In terms of plasticity, only external training procedures are implemented (backpropagation, FORCE, surrogate gradients). No local plasticity mechanism (Hebbian learning for rate-coded networks, STDP and its variants for spiking networks) seems to be implemented, apart from STP. Is it a planned feature? Appendix 8 refers to bp.synplast.STDP(), but it is not present in the current code (https://github.com/brainpy/BrainPy/tree/master/brainpy/_src/dyn/synplast). Spiking networks without STDP are not going to be very useful to computational neuroscientists, so this suggests that the simulator targets primarily neuro-AI, i.e. AI researchers interested in using spiking models in a machine learning approach.

      We appreciate the reviewer raising the limitations of BrainPy in terms of local plasticity mechanisms, and we apologize for the delay in implementing STDP models in BrainPy. Currently, we provide a very general implementation of STDP. It is compatible with any synaptic model (such as Exponential, Dual Exponential, AMPA, GABA, and NMDA dynamics) and with common connection patterns (such as dense and sparse connections).

      bp.dyn.STDP_Song2001(pre, post, delay, syn, comm, out)

      It can also easily be combined with short-term plasticity models. The modular design of BrainPy's framework also makes plasticity components straightforward to implement and integrate into existing models.
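
      For readers unfamiliar with pair-based STDP, the rule can be sketched with two exponentially decaying spike traces. The sketch below is a plain-Python illustration only: it does not use BrainPy and does not reproduce the bp.dyn implementation, and the function name, parameter names, and default values are all assumptions chosen for the example.

```python
import math


def run_pair_stdp(pre_spikes, post_spikes, w0=0.5, dt=1.0,
                  tau_pre=20.0, tau_post=20.0,
                  a_plus=0.01, a_minus=0.012, w_min=0.0, w_max=1.0):
    """Trace-based pair STDP for a single synapse (illustrative values only).

    pre_spikes and post_spikes are equal-length sequences of 0/1 flags,
    one entry per time step of length dt (ms).
    """
    decay_pre = math.exp(-dt / tau_pre)
    decay_post = math.exp(-dt / tau_post)
    pre_trace = 0.0
    post_trace = 0.0
    w = w0
    for pre, post in zip(pre_spikes, post_spikes):
        # Traces decay between spikes.
        pre_trace *= decay_pre
        post_trace *= decay_post
        # A postsynaptic spike potentiates in proportion to recent presynaptic
        # activity; a presynaptic spike depresses in proportion to recent
        # postsynaptic activity.
        w += a_plus * pre_trace * post - a_minus * post_trace * pre
        # Update the traces after the weight so that simultaneous spikes
        # do not interact within the same step.
        pre_trace += pre
        post_trace += post
        # Keep the weight inside its allowed range.
        w = min(w_max, max(w_min, w))
    return w
```

      With these assumed defaults, a presynaptic spike followed a few milliseconds later by a postsynaptic spike increases the weight, and the reverse order decreases it.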

      A second weakness of the paper concerns the demos and benchmarks used to demonstrate the versatility and performance of BrainPy, which are not sufficiently described. In Fig. 4, it is for example not explained how the reservoirs are trained (only the readout weights, or also the recurrent ones? Using BPTT only makes sense when the recurrent weights are also trained.), nor how many neurons they have, what the final performance is, etc. The comparison with NEURON, NEST, and Brian2 is hard to trust without detailed explanations. Why are different numbers of neurons used for COBA and COBAHH? How long is the simulation in each setting? Which time is measured: the total time including compilation and network creation, or just the simulation time? Are the same numerical methods used for all simulators? It would also be interesting to discuss why the only result involving TPUs (Fig 8c) shows that it is worse than the V100 GPU. What could be the reason? Are there biologically-realistic networks that would benefit from a TPU? As the support for TPUs is a major selling point of BrainPy, it would be important to investigate its usage further.

      We thank the reviewer for raising this important question about the demos and benchmarks used to demonstrate the versatility and performance of BrainPy. To address these concerns, we have added more details in the revised paper, including:

      • For Fig. 4, Appendix 10 now explains how the reservoirs are trained: only the readout weights are trained, using backpropagation, FORCE learning, and ridge regression, respectively. We also specify the number of neurons in each reservoir (see L1397) and the final performance of the reservoirs on the task (see Figure 4).

      • To enable readers to better interpret the simulator comparisons in Fig. 8, we have also added more detailed explanations of the comparison with NEURON, NEST, and Brian2 in Appendix 11.

      • In the revised paper, we provide a comprehensive analysis of BrainPy's compatibility with different hardware platforms, including TPUs, and identify the specific conditions under which TPUs may offer advantages (see Figure 8 and Appendix 11—figure 7). We have also discussed the potential benefits of TPUs for biologically realistic networks (see L514 - L521). In particular, for biological networks with arbitrary sparsity, TPUs do not show an advantage over GPUs (see Appendix 11—figure 7). TPUs are best at exploiting certain kinds of structured sparsity, for example block sparsity.
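
      To make the term "block sparsity" concrete, the sketch below stores a connectivity matrix as a dictionary of dense tiles and multiplies it with a vector. It is a plain-Python illustration, not BrainPy or TPU code, and the storage layout and function name are assumptions made for the example. The point is that every stored tile is a small dense matrix product, whereas arbitrary sparsity would require per-element indexing.

```python
def block_sparse_matvec(blocks, x, block_size, n_block_rows):
    """Compute y = W @ x for a block-sparse W.

    blocks maps (block_row, block_col) to a dense block_size x block_size
    tile (a list of rows); tiles that are absent are treated as all zeros.
    """
    y = [0.0] * (n_block_rows * block_size)
    for (block_row, block_col), tile in blocks.items():
        # Slice of the input vector that this tile acts on.
        x_tile = x[block_col * block_size:(block_col + 1) * block_size]
        for r, row in enumerate(tile):
            # Dense dot product inside the tile.
            y[block_row * block_size + r] += sum(w * v for w, v in zip(row, x_tile))
    return y
```

      For example, a 4 x 4 matrix with non-zero 2 x 2 tiles only on the diagonal needs two stored tiles instead of sixteen entries.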

    1. Author Response

      Reviewer #1 (Public Review):

      Due to complicated and often unpredictable idiosyncratic differences, comparing fMRI topography between subjects typically would require extra expensive scan time and extra laborious analysis steps, using specific functional localizer scan runs that contrast the fMRI responses of every subject to different stimulus categories. To overcome this challenge, hyperalignment tools have recently been developed (e.g., Guntupalli et al., 2016; Haxby et al., 2011) based on aligning, in a high-dimensional voxel space, subjects' fMRI responses to watching a given movie. In the present study, Jiahui and colleagues propose a significantly improved version of hyperaligning functional brain topography between individuals. This new version, based on fMRI connectivity, works robustly on datasets in which subjects watched different movies and were scanned with different parameters/scanners at different MRI centers.

      Robustness is the major strength of this study. Despite the fact that datasets from different subjects watching different movies at different MRI centers with different scan parameters were used, the results of functional brain topography from between-subject hyperalignment based on fMRI connectivity were comparable to the gold standard of within-subject functional localizations, and significantly better than regular surface anatomical alignments. These results also support the claim that the present approach is a useful improvement over previous hyperalignments based on time-locked fMRI voxel responses, which would require normative samples of subjects watching the same movie.

      We thank the reviewer for the appreciation of our work.

      Given the robustness, this new version of hyperalignment would provide much stronger statistical power for group-level comparisons, with less time and effort spent collecting and analyzing data from the large samples required by current stringent standards, and is likely to be useful to the whole functional neuroimaging research community. That said, more discussion of the limits of the present hyperalignment approach would be helpful to potential eLife readers. For example, to what extent would the present hyperalignment approach be applicable to individuals with atypical functional brain topography, such as brain lesion patients with, e.g., acquired prosopagnosia? Even in typical populations, while bilateral fusiform face areas can be identified in the majority through functional localizer scans, the left fusiform face area sometimes cannot be found. Moreover, many top-down factors are known to modulate functional brain topography. Due to these factors, brain responses and functional connectivity may be different even when the same subject watched the same movie twice (e.g., Cui et al., 2021).

      We thank the reviewer for the suggestion and agree that it would be fascinating if the predictions can be made with high fidelity in neuropsychological populations. Although we are optimistic that our algorithm is able to generalize across diverse populations, to date, no previous literature has provided empirical evidence to illustrate the effectiveness, including optimizations and special applications beyond typical brains. Besides the neuropsychological population, it would also be valuable to study the generalization across a broad age range, for example, from infants to the elderly. The brain changes across age both anatomically and functionally, so it is a challenge to predict functional topographies based on a normative group that only includes young participants. With all these potential applications in mind, future research is needed to illustrate the efficacy, build the pipeline, and construct the representative normative groups to meet the requirements of accurate individualized predictions in diverse populations.

      In typical populations, participants vary greatly in their functional topographies; for instance, some participants have distinguishable patches of activation in their left ventral temporal cortex while others don’t. Our algorithm successfully captured these individual differences in the prediction. The figure below shows, as an example, two individuals who have markedly different face-selective topographies in the left ventral temporal cortex. The left participant has prominent face-selective areas in the left ventral temporal cortex that are similar in size to those on the right side, while the right participant has only a few small, scattered face-selective spots on the left side. No matter what their face-selective areas look like, our algorithm accurately recovered the individualized locations, shapes, and sizes, retaining the individual variability in the functional topographies.

      Functional connectivity profiles based on naturalistic stimuli are very stable across the cortex, even when participants watch different movies. In Figure 4-figure supplement 9, the mean correlations of the fine-scale connectome for most searchlights (r = 15mm) when participants watched The Grand Budapest Hotel and Raiders of the Lost Ark were generally around 0.8. The mean correlations were about 0.9 between the first and second half of the same movie, although the stimulus content differed between the two halves. Thus, the fine-grained functional connectivity profiles remain highly stable and reliable across movie content, which contributes to the robustness of our algorithm's predictions across movies, time, and other parameters (e.g., scanner models, scanning parameters).

      We added a paragraph to the discussion section to address these concerns (pages 18-19):
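
      As a rough illustration of what such a comparison involves, the sketch below builds connectivity profiles for the vertices of one searchlight (the correlation of each vertex's time series with a set of connectivity targets) and then correlates the profiles obtained from two different movies. The function names and the choice of Pearson correlation on flattened profiles are assumptions made for this example; the actual analysis pipeline may differ in its targets, preprocessing, and averaging.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def connectivity_profiles(vertex_ts, target_ts):
    """One row per searchlight vertex: its correlation with every connectivity target."""
    return [[pearson(v, t) for t in target_ts] for v in vertex_ts]


def profile_similarity(profiles_a, profiles_b):
    """Correlate two sets of connectivity profiles after flattening them to vectors."""
    flat_a = [value for row in profiles_a for value in row]
    flat_b = [value for row in profiles_b for value in row]
    return pearson(flat_a, flat_b)
```

      Because the profiles are correlations rather than raw time series, the two movies do not need to be time-locked or even the same length; only the vertices and targets must correspond.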

      “This study successfully illustrated that accurate individualized predictions are both robust and applicable across a variety of conditions, including movie types, languages, scanning parameters, and scanner models. Importantly, the intricate connectivity profiles remain consistent even when participants view entirely different movies, as evidenced by Figure 4-figure supplement 9, reinforcing the prediction's stability in various scenarios. However, all four datasets in this study only included typical participants with anatomically intact brains. An unanswered question is whether individualized topographies of neuropsychological populations with atypical cortical function (e.g., developmental prosopagnosics) or with lesioned brains (e.g., acquired prosopagnosics) could also be accurately predicted using the hyperalignment-based methods. Up to now, as far as we know, no previous literature has investigated this question. Beyond neuropsychological groups, it is also valuable to investigate how well the predictions will be across a wide range of age, from infants to the elderly. Future research is essential to adapt our algorithms to diverse populations.”

      Reviewer #2 (Public Review):

      Guo and her colleagues develop a new approach to map the category-selective functional topographies in individual participants based on their movie-viewing fMRI data and functional localizer data from a normative sample. Connectivity hyperalignment is used to derive the transformation matrices between the participants according to their functional connectomes during movie watching. The transformation matrices are then used to project the localizer data from the normative sample into the new participant and create the idiosyncratic cortical topography for the participant. The authors demonstrate that a target participant's individualized category-selective topography can be accurately estimated using connectivity hyperalignment, regardless of whether different movies are used to calculate the connectome and regardless of other data collection parameters. The new approach allows researchers to estimate a broad range of functional topographies based on naturalistic movies and a normative database, making it possible to integrate datasets from laboratories worldwide to map functional areas for individuals. The topic is of broad interest to the neuroimaging community; the rationale of the study is straightforward and the experiments were well designed; the results are comprehensive. I have some concerns that the authors may want to address, particularly on the details of the pipeline used to map individual category-selective functional topographies.

      We thank the reviewer for the encouragement.

      1) How does the scan length of movie-viewing fMRI affect the accuracy in predicting the idiosyncratic cortical topography? Similarly, how does the number of participants in the normative database affect the prediction of the category-selective topography? This information is important for researchers who are interested in using the approach in their studies.

      To investigate the influence of movie-viewing data length and the number of participants in the normative database on prediction performance, we systematically varied these parameters. Specifically, we altered the number of runs utilized in the analysis for both the normative and target data and experimented with varying the number of participants in the normative dataset using the Budapest and the Sraiders datasets. We have included a new Figure 4-figure supplement 5 to present a summary of these findings.

      The results reveal that both within-dataset and between-dataset prediction performances are positively correlated with the length of movie-viewing fMRI data used for both the normative and target groups. A similar trend was observed with respect to the number of participants included in the normative dataset. It is important to highlight, though, that even when analyzing as little as one run of movie-viewing data (roughly 10-15 minutes), our hyperalignment-based prediction performance was significantly higher than that achieved using traditional surface alignment. This held true even when the normative dataset included as few as five participants.

      In summary, our results show that prediction performance generally improves with longer movie-viewing sessions and larger normative datasets. However, it is noteworthy that even with minimal data—10 minutes of movie-viewing and a small number of participants in the normative dataset—our algorithm still outperforms traditional surface alignment methods significantly.

      We also added sentences in the discussion section (page 15):

      “We investigated the influence of naturalistic movie length and the size of the training group on the prediction accuracy of individualized functional topographies. By incrementally increasing both the number of movie runs in the training and target dataset and the participants in the training group in the Budapest and Sraiders dataset, we observed enhanced prediction accuracy (Figure 4-figure supplement 5). Notably, even with just one movie run in the training or target dataset, or with a mere five participants in the training group, our prediction performance (Pearson r) ranged from about 0.6 to 0.7. This accuracy significantly outperformed results obtained using surface-based alignment.”

      2) The data show that category-selective topography can be accurately estimated using connectivity hyperalignment, regardless of whether different movies are used to calculate the connectome and regardless of other data collection parameters. I'm wondering whether the functional connectome from resting state fMRI can do the same job as the movie-watching fMRI. If it is yes, it will expand the approach to broader data.

      We agree with the reviewer that demonstrating the applicability of resting state data would expand the application scenarios of this approach. Most previous findings on resting state connectivity, including comparisons between the naturalistic and resting state paradigms, focused on macro-scale similarities and differences (e.g., Samara et al., 2023). Very few studies have investigated the fine-scale connectome based on resting state data. The study on connectivity hyperalignment (Guntupalli et al., 2018) demonstrated a shared fine-scale connectivity structure among individuals that co-exists with the common coarse-scale structure and built the algorithm to successfully hyperalign individuals to the shared fine-scale space. Another study from our lab (Feilong et al., 2021) revealed that the fine-scale connectivity profiles in both resting and task states are highly predictive of general intelligence, indicating reliable and biologically relevant fine-scale resting state connectome structures. Thus, it is highly plausible that our approach can be generalized to resting state data, generating significantly better predictions of individualized functional topographies than traditional surface alignment. However, the current datasets do not include resting state data, so we cannot perform this analysis here. We are in the process of collecting new data to explore this hypothesis in future work.

      We added sentences to the discussion section to discuss this idea (page 18):

      “Studies comparing movie-viewing and resting state functional connectivity have shown that both paradigms yield overlapping macroscale cortical organizations (29), though naturalistic viewing introduces unique modality-specific hierarchical gradients. However, there remains a gap in research comparing the fine-scaled connectomes of naturalistic and resting state paradigms. Guntupalli and colleagues (14) revealed a shared fine-scale structure that coexists with the coarse-scale structure, and connectivity hyperalignment successfully improved intersubject correlations across a wide variety of tasks. Feilong et al. (13) noted that the fine-scaled connectivity profiles in both resting and task states are highly predictive of general intelligence. This suggests a reliable and biologically relevant fine-scale resting state connectivity structure among individuals. Therefore, it is plausible that individualized functional topography could be effectively estimated using resting state functional connectivity, expanding the applicability of our approach. Future studies are needed to explore this direction.”

      3) The authors averaged the hyper-aligned functional localizer data from all of the subjects to predict individual category-selective topographies. As there is large spatial variability in the functional areas across subjects, averaging the data from many subjects may blur the boundaries of the functional areas. A better solution might be to average those subjects who show a highly similar connectome to the target subject.

      We appreciate the reviewer’s insightful question about optimizing prediction performance by selecting participants most similar in functional connectivity to the target individuals. This is a promising direction and also a difficult problem. Our approach uses the fine-scale connectome to hyperalign participants, so different groups of participants may be similar to the target participant in different searchlights. In addition, based on results discussed in the response to Q2, the more participants included in the normative dataset, the better the prediction performance. Thus, there is a trade-off between the number of participants included in the normative dataset for the prediction and the overall similarity of those participants to the target participant.

      To quantitatively explore this idea, we used a searchlight in the right ventral temporal cortex, roughly at the location of the posterior fusiform face area (pFFA). We sorted participants by their connectome similarity to each target participant and then examined prediction performance based on either the top nine most similar participants or the bottom nine least similar participants. Our results, presented in Figure 4-figure supplement 8, reveal that hyperalignment consistently outperforms surface alignment regardless of the subset of participants used. Notably, using the nine most similar participants did not significantly alter prediction performance (Tukey Test, z = -0.09, p = 0.996), while using the least similar participants did negatively impact it (Tukey Test, z = 2.492, p = 0.034). Interestingly, the stability of hyperalignment-based predictions remained high even when only a subset of participants was used, contrasting with the variability observed in surface-alignment-based predictions.

      Overall, these findings suggest that while selecting functionally similar participants is a promising avenue for future optimization, the process will require nuanced, searchlight-specific criteria. Each searchlight may necessitate its own set of optimal participants to balance between the performance boost from having more participants and the fidelity gained from participant similarity.
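
      The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the similarity scores are taken as precomputed inputs (how they are derived from the fine-scale connectome is not specified here), the maps are assumed to be already projected into the target participant's space, and the function name is hypothetical.

```python
def average_selected_maps(similarities, projected_maps, k, most_similar=True):
    """Average the projected maps of the k most (or least) similar participants.

    similarities[i] is a precomputed connectome similarity between normative
    participant i and the target; projected_maps[i] is that participant's
    localizer map after projection into the target's space.
    """
    if not 1 <= k <= len(projected_maps):
        raise ValueError("k must be between 1 and the number of participants")
    # Rank participants by similarity, descending for the most similar subset.
    order = sorted(range(len(similarities)),
                   key=lambda i: similarities[i], reverse=most_similar)
    chosen = order[:k]
    n_vertices = len(projected_maps[0])
    # Vertex-wise mean over the chosen participants only.
    averaged = [sum(projected_maps[i][v] for i in chosen) / k
                for v in range(n_vertices)]
    return chosen, averaged
```

      Calling the function once per searchlight, each with its own similarity scores, would correspond to the searchlight-specific selection discussed above; setting k to the full group size recovers the plain group average.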

      We added the following to the discussion in the manuscript (page 16):

      “In our study, we used fine-scale connectomes, noting that some participants are more similar to the target participant in specific searchlights. It is an interesting question whether predictions could be enhanced by exclusively selecting those more similar participants for the target participant. To explore this option, we examined a searchlight in the right ventral temporal cortex that was roughly at the location of the posterior fusiform face area (pFFA) using the top and bottom nine participants similar to each target participant measured by their fine-scale connectome similarities in the Budapest dataset. Generally, using all or part of the participants for the prediction generated similar results (Figure 4-figure supplement 8). Compared to using all the participants, using only the top nine participants who are the most similar to the target participants did not significantly improve the prediction (Tukey Test, z = -0.09, p = 0.996), but using only the bottom nine participants generated significantly lower prediction accuracies (Tukey Test, z = 2.492, p = 0.034). This suggests a trade-off between the number of participants included in the prediction and the similarity of the participants. Future studies are needed to explore the optimal threshold for the number of participants included for each searchlight to refine the algorithm.”

      4) It is good to see that predictions made with hyperalignment were close to and sometimes even exceeded the reliability values measured by Cronbach's alpha. But, please clarify how the Cronbach's alpha is calculated.

      Cronbach’s alpha calculates the correlation score between localizer-based maps across the runs, and it reflects the amount of noise in maps based on individual localizer runs. Traditionally, the reliability was estimated based on split-half correlations. For example, Guntupalli et al. (2016) used correlations of category-selectivity maps between odd and even localizer runs as the measure of reliability. The odd/even split measure underestimated reliability and necessitated recalculation of correlations between maps for only half the data to provide valid comparisons. In contrast, Cronbach’s alpha involves all localizer runs and provides a more accurate statistical estimate of the reliability of the topographies estimated with localizer runs.

      Cronbach’s alpha has been used in many previously published works from our lab (e.g., Feilong et al., 2021; Jiahui et al., 2020, 2023). The code for implementing this metric is publicly accessible on the first author’s Github repository (https://github.com/GUOJiahui/face_DCNN/blob/main/code/cronbach_alpha.py).
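
      For orientation, the textbook form of Cronbach's alpha, with localizer runs treated as items and cortical vertices as observations, is

      alpha = k / (k - 1) * (1 - sum of per-run variances / variance of the summed map),

      where k is the number of runs. The sketch below implements that standard formula in plain Python. It is not the linked repository code, which may differ in details such as masking or preprocessing, and the function and argument names are assumptions.

```python
from statistics import variance


def cronbach_alpha(run_maps):
    """Cronbach's alpha across runs.

    run_maps is a list with one map per localizer run; each map is a list of
    values over the same vertices, in the same order.
    """
    k = len(run_maps)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two runs")
    # Variance of each run's map across vertices, summed over runs.
    item_variance_sum = sum(variance(run_map) for run_map in run_maps)
    # Variance across vertices of the map summed over runs.
    summed_map = [sum(values) for values in zip(*run_maps)]
    return k / (k - 1) * (1 - item_variance_sum / variance(summed_map))
```

      Identical maps across runs give an alpha of 1, and alpha falls as run-to-run noise grows relative to the shared spatial pattern.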

      We added the detailed explanation above to the Material and Methods section (page 24):

      “Cronbach’s alpha calculates the correlation score between localizer-based maps across the runs, and it reflects the amount of noise in maps based on individual localizer runs. Traditionally, the reliability was estimated based on split-half correlations. The common odd/even split measure underestimated reliability and necessitated recalculation of correlations between maps for only half the data to provide valid comparisons. In contrast, Cronbach’s alpha involves all localizer runs and provides a more accurate statistical estimate of the reliability of the topographies estimated with localizer runs.”

      5) Which algorithm was used to perform surface-based anatomical alignment? Can the state-of-the-art Multimodal Surface Matching (MSM) algorithm from HCP achieve better performance?

      We preprocessed our datasets using fMRIPrep, which employs algorithms from FreeSurfer’s recon-all for surface-based anatomical alignment. It is worth noting that different alignment methods can yield varying degrees of performance. For instance, a study by Coalson et al. (2018) compared the localization performance of multiple surface-based alignment methods, including Multimodal Surface Matching (MSM) and FreeSurfer. The study found that MSM outperformed FreeSurfer in terms of peak probabilities and spatial clustering, suggesting better overall localization.

      Additionally, Guntupalli et al. (2018) evaluated intersubject correlations (ISC) of functional connectivity from movie-viewing data using both Connectivity Hyperalignment (CHA) and MSM-All with the Human Connectome Project (HCP) dataset. The study showed that although MSM-All yielded marginally better ISC than traditional surface alignment, CHA’s performance was significantly superior.

      In summary, while using a more advanced alignment algorithm like MSM could marginally improve prediction performance, its advantages may not be substantial when compared to our CHA-based predictions. The combination of MSM and CHA represents an intriguing direction for future research, although it falls outside the scope of our current study.

      6) Is it necessary to project to the time course of the functional localizer from the normative sample into the new participants? Does it work if we just project the contrast maps from the normative samples to the new subjects?

      This is an interesting question, and it is practically useful for researchers to know whether the time series of the localizer runs are required to obtain reasonable predictions, because in some scenarios contrast maps may be the only accessible data. To quantitatively explore this possibility, we applied transformation matrices derived from the movie data to the training participants’ individual pre-calculated contrast maps for all four categories and evaluated the predictions. We found similar prediction performance between the two flavors within and across datasets (Figure 4-figure supplement 7). However, it is worth noting that applying transformation matrices directly to contrast maps did not yield as much improvement in the iterative steps of the advanced CHA as the time-series flavor did, perhaps due to scale changes when multiple iterations were implemented and the difficulty of properly normalizing the t-maps compared to regular time series.

      Overall, although our algorithm is originally designed to be used on the time course of the functional localizer runs, relatively comparable results can be generated even when the contrast maps are directly projected from the normative group to the target participant. However, to derive the best results with our approach, time series are recommended when the situation permits.
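
      One way to see why the two flavors can give similar results is that a purely linear contrast commutes with a linear projection. The sketch below demonstrates this for a simplified contrast defined as a weighted sum over time points; this is an assumption for illustration, since real contrast maps are t-statistics from a GLM, whose variance normalization breaks exact equivalence. The function names and data layout (time points x vertices for the time series, source vertices x target vertices for the transformation) are also assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def contrast_map(timeseries, weights):
    """Linear contrast: weighted sum over time points for each vertex."""
    return matmul([weights], timeseries)[0]


def project_then_contrast(timeseries, transform, weights):
    """Project the time series into the target space, then compute the contrast."""
    return contrast_map(matmul(timeseries, transform), weights)


def contrast_then_project(timeseries, transform, weights):
    """Compute the contrast in the source space, then project the map."""
    return matmul([contrast_map(timeseries, weights)], transform)[0]
```

      For a linear contrast the two functions return the same map, up to floating-point rounding. Any remaining difference between the two flavors in practice would then come from the nonlinear steps, such as t-statistic normalization.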

      We have also added the contents into the Discussion section (page 16):

      “Our original algorithm is designed to apply transformation matrices to the time series of localizer data of training participants before generating contrast maps. To explore whether directly applying these matrices to pre-calculated contrast maps yields comparable results, we conducted an additional analysis across the four categories. Our findings indicate that the prediction outcomes were indeed quite similar between the two approaches for both the within- and across-datasets predictions (Figure 4-figure supplement 7). However, it is worth noting that the improvements observed with enhanced CHA were not as pronounced when applied directly to the contrast maps as opposed to the time series.”

      7) Saygin and her colleagues have demonstrated that structural connectivity fingerprints can predict cortical selectivity for multiple visual categories across cortex (Osher DE et al, 2016, Cerebral Cortex; Saygin et al, 2011, Nat. Neurosci). I think there's a connection between those studies and the current study. If the authors can discuss the connection between them, it may help us understand why CHA works so well.

      We thank the reviewer for raising this point, which gives us the chance to clarify how our approach differs from methods previously reported in the literature. The computational logic underlying our approach is that we derive the transformation matrices between the training and the target participants in a high-dimensional space based on functional connectivity calculated from the movie data. We then apply these transformation matrices to the training participants’ localizer data to make the prediction. Saygin and colleagues, in contrast, directly used diffusion-weighted imaging (DWI) data and predicted participants’ functional responses based on the anatomical-functional correspondence. They evaluated the prediction by calculating the mean absolute error (AE) of the difference between the actual and predicted contrast responses. Although AE varies linearly with the quality of the prediction, it is difficult to measure precisely how well the shape, size, and location of the functional areas are predicted using this mean value. With our algorithm, we were able to predict the general location and size of the areas and recover the individualized shapes, generating more powerful predictions. We also used the searchlight analysis to evaluate the performance across the cortex systematically. In addition, Osher et al. (2016) and Saygin et al. (2012) always had a few participants who failed to show better predictions based on connectivity than with the group-average method. Our algorithm is more stable, as all participants across all four datasets had better prediction performance with our algorithm than with the group average. However, although we did not directly use the anatomical-functional correspondence with DWI, the relationships between individual structural connectivity and cortical visual category selectivity could be one of the biological underpinnings that contribute to this robust and accurate prediction.

      The Connectivity-Based Shared Response Model (cSRM, Nastase et al., 2020) offers an alternative framework for aligning individuals through functional connectivity. While the overarching aims of cSRM and our methodology converge, substantial differences in implementation and application make our approach more suitable for predicting individualized topographies. The most significant difference is that, instead of focusing on within-individual connectivity profiles, cSRM uses inter-subject functional connectivity (ISFC) in the initial step. This design requires that all participants have time-locked time series, making the algorithm unusable for cross-content prediction and incompatible with resting-state data. Our approach, on the other hand, does not require time-locked stimuli, thereby offering a more flexible framework that permits generalization across different types of stimuli and experimental settings and enables data from laboratories across the world to be brought together. Secondly, cSRM predominantly focuses on Region of Interest (ROI) analyses, whereas our model employs searchlight-based analyses designed to comprehensively cover the entire cortical sheet. Whole-brain coverage is needed to generate the topography that reflects the patterns across the cortex. Finally, with the optimized 1-step method, our approach directly hyperaligns the training and target participants together, avoiding the accumulation of errors from the intermediate common space. cSRM, with an implementation similar to the classic connectivity hyperalignment, creates and hyperaligns all participants to a shared information space. In summary, while our approach and cSRM share a similar theoretical foundation, our approach has been specifically optimized to address the challenges and complexities in predicting individualized whole-brain functional topographies. Moreover, our approach demonstrates a remarkable ability to generalize across a variety of contexts and stimuli, offering a significant advantage in dealing with diverse experimental settings and datasets.

      We have added the contents to the discussion section (page 16-17):

      “By leveraging transformation matrices obtained from hyperaligning participants based on movie-viewing data, we successfully mapped these relationships to the training participants’ localizer data, enabling robust predictions. Prior work employing diffusion-weighted imaging (DWI) has underscored the link between anatomical connectivity and category selectivity across diverse visual fields (22, 23) and has established a notable congruence between structural and functional connectivities (24). These findings suggest that the unique anatomical connectivity patterns of individuals may serve as a foundational mechanism, contributing to the stable fine-scale functional connectome that underpins our approach. The connectivity-based Shared Response Model (cSRM) proposed by Nastase and colleagues (25) used connectivity to functionally align individuals similar to the connectivity hyperalignment algorithm. While both approaches share overarching goals, they diverge considerably in implementation and application. First and most important, cSRM used inter-subject functional connectivity (ISFC) rather than within-subject functional connectivity to initially estimate the connectome. As a result, cSRM requires participants to have time-locked fMRI time series. Therefore, unlike our algorithm, the cSRM approach does not support cross-content applications and also is not suitable for use with resting-state data. Second, cSRM is implemented based on a predefined cortical parcellation rather than the overlapping, regularly-spaced cortical searchlights applied in our method, which are not constrained by areal borders. For the application, cSRM has mainly been used to do ROI analysis rather than the estimation of the whole-brain topography that requires broader coverage of the cortex with a searchlight analysis. Third, our method is specifically designed to work in each individual’s space, while cSRM decomposes data across subjects into shared and subject-specific transformations, focusing on a communal connectivity space. In summary, although cSRM presents a promising alternative for similar aims, its current implementation precludes it from fulfilling the range of applications for which our method is optimized.”

      Reviewer #3 (Public Review):

      In this paper, Jiahui and colleagues propose a new method for learning individual-specific functional magnetic resonance imaging (fMRI) patterns from naturalistic stimuli, extending existing hyperalignment methods. They evaluate this method, enhanced connectivity hyperalignment (CHA), across four datasets, each comprising between nine (Raiders) and twenty (Budapest, Sraiders) participants.

      The work promises to address a significant need in existing functional alignment methods: while hyperalignment and related methods have been increasingly used in the field to compare participants scanned with overlapping stimuli (or lack thereof, in the case of resting state data), their use remains largely tied to naturalistic stimuli. In this case, having non-overlapping stimuli is a significant constraint on application, as many researchers may have access to only partially overlapping stimuli or wish to compare stimuli acquired under different protocols and at different sites.

      It is surprising, however, that the authors do not cite a paper that has already successfully demonstrated a functional alignment method that can address exactly this need: a connectivity-based Shared Response Model (cSRM; Nastase et al., 2020, NeuroImage). It would be relevant for the authors to consider the cSRM method in relation to their enhanced CHA method in detail. In particular, both the relative predictive performance and the associated computational costs would be useful for researchers to understand in considering enhanced CHA for their applications.

      We thank the reviewer for raising this point, which gives us the chance to clarify how our approach differs from methods previously reported in the literature. The computational logic underlying our approach is that we derived the transformation matrices between the training and the target participants in the high-dimensional space based on functional connectivity calculated from the movie data. Then, we applied these transformation matrices to the training participant’s localizer data to accomplish the prediction. In contrast, Saygin and colleagues directly used diffusion-weighted imaging (DWI) data and predicted participants’ functional responses based on the anatomical-functional correspondence. They evaluated the prediction by calculating the mean absolute error (AE) of the difference between the actual and predicted contrast responses. Although AE varies linearly with the quality of the prediction, it is difficult to measure how well the shape, size, and location of the functional areas are predicted using this mean value alone. With our algorithm, we were able to predict the general location and size of the areas and recover the individualized shapes, generating more powerful predictions. We also used the searchlight analysis to evaluate the performance across the cortex systematically. In addition, Osher et al. (2016) and Saygin et al. (2012) consistently had a few participants who failed to show better predictions based on connectivity than based on the group-average method. Our algorithm is more stable, as all participants across all four datasets were predicted better by our algorithm than by the group average. However, although we did not directly use the anatomical-functional correspondence with DWI, the relationships between individual structural connectivity and cortical visual category selectivity could be one of the biological underpinnings that contribute to this robust and accurate prediction.

      The Connectivity-Based Shared Response Model (cSRM, Nastase et al., 2020) offers an alternative framework for aligning individuals through functional connectivity. While the overarching aims of cSRM and our methodology converge, the two methods differ substantially in implementation and application, and these differences make our approach the more suitable one for predicting individualized topographies. The most significant difference is that, instead of focusing on within-individual connectivity profiles, cSRM used inter-subject functional connectivity (ISFC) in the initial step. This design requires all participants to have time-locked time series, making the algorithm unusable for cross-content prediction and incompatible with resting-state data. Our approach, on the other hand, does not require time-locked stimuli, thereby offering a more flexible framework that permits generalization across different types of stimuli and experimental settings and makes it possible to bring together data from laboratories around the world. Secondly, cSRM predominantly focuses on Region of Interest (ROI) analyses, whereas our model employs searchlight-based analyses designed to comprehensively cover the entire cortical sheet. Whole-brain coverage is needed to generate the topography that reflects the patterns across the cortex. Finally, with the optimized 1-step method, our approach directly hyperaligns the training and target participants together, avoiding the accumulation of errors from the intermediate common space. cSRM, with an implementation similar to the classic connectivity hyperalignment, creates a shared information space and hyperaligns all participants to it. In summary, while our approach and cSRM share a similar theoretical foundation, our approach has been specifically optimized to address the challenges and complexities in predicting individualized whole-brain functional topographies. Moreover, our approach generalizes across a variety of contexts and stimuli, offering a significant advantage in dealing with diverse experimental settings and datasets.

      We have added the contents to the discussion section (page 16-17):

      “By leveraging transformation matrices obtained from hyperaligning participants based on movie-viewing data, we successfully mapped these relationships to the training participants’ localizer data, enabling robust predictions. Prior work employing diffusion-weighted imaging (DWI) has underscored the link between anatomical connectivity and category selectivity across diverse visual fields (22, 23) and has established a notable congruence between structural and functional connectivities (24). These findings suggest that the unique anatomical connectivity patterns of individuals may serve as a foundational mechanism, contributing to the stable fine-scale functional connectome that underpins our approach. The connectivity-based Shared Response Model (cSRM) proposed by Nastase and colleagues (25) used connectivity to functionally align individuals similar to the connectivity hyperalignment algorithm. While both approaches share overarching goals, they diverge considerably in implementation and application. First and most important, cSRM used inter-subject functional connectivity (ISFC) rather than within-subject functional connectivity to initially estimate the connectome. As a result, cSRM requires participants to have time-locked fMRI time series. Therefore, unlike our algorithm, the cSRM approach does not support cross-content applications and also is not suitable for use with resting-state data. Second, cSRM is implemented based on a predefined cortical parcellation rather than the overlapping, regularly-spaced cortical searchlights applied in our method, which are not constrained by areal borders. For the application, cSRM has mainly been used to do ROI analysis rather than the estimation of the whole-brain topography that requires broader coverage of the cortex with a searchlight analysis. Third, our method is specifically designed to work in each individual’s space, while cSRM decomposes data across subjects into shared and subject-specific transformations, focusing on a communal connectivity space. In summary, although cSRM presents a promising alternative for similar aims, its current implementation precludes it from fulfilling the range of applications for which our method is optimized.”

      With this in mind, I noted several current weaknesses in the paper:

      First, while the enhanced CHA method is a promising update on existing CHA techniques, it is unclear why this particular six-step, iterative approach was adopted. That is: why were six steps chosen over any other number? At present, it is not clear if there is an explicit loss function that the authors are minimizing over their iterations. The relative computational cost of six iterations is also likely significant, particularly compared to previous hyperalignment algorithms. A more detailed theoretical understanding of why six iterations are necessary (or whether other researchers could adopt a variable number according to the characteristics of their data) would significantly improve the transferability of this method.

      In the advanced connectivity hyperalignment implementation, we gradually increased the number of targets. The six steps were not intentionally chosen but were the result of the increase to the maximum number of fine-grained targets, namely single cortical vertices.

      Our datasets were resampled to a cortical mesh with 18,742 vertices across both hemispheres (approximately 3 mm vertex spacing; icoorder 5; 20,484 vertices before removing non-cortical vertices). Step 1 was the classic connectivity hyperalignment implementation based on the anatomically aligned data. Since using dense connectivity targets (e.g., using all 18,742 vertices on the surface) with anatomically aligned data generates poor functional correspondence across participants (Busch et al., 2021), we used 1,284 vertices (icoorder 3, before removing the medial wall) as connectivity targets in step 1. However, after the first iteration of connectivity hyperalignment, it is beneficial to include more targets for calculating connectivity patterns, and repeated iterations lead to a better solution by gradually aligning the information at finer scales. To better align across participants, we iterated the alignment two more times (steps 2 and 3) with the same 1,284 coarse connectivity targets to ensure improved alignment before increasing the number of targets in the later steps. In step 4, we increased the number of targets to 5,124 (icoorder 4, before removing the medial wall), and iterated with this number of targets twice in total (steps 4 and 5) before using all vertices as targets. In the final step (step 6), all vertices were used as connectivity targets.
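
      The target counts in this schedule follow from the geometry of the icosahedral mesh, which has 10 · 4^n + 2 vertices per hemisphere at subdivision order n. The short sketch below reproduces the numbers quoted in this response. The names `SCHEDULE`, `icosahedral_vertices`, and `n_targets` are hypothetical, and the searchlight radii are the ones reported in the Materials and Methods for each step.

```python
N_CORTICAL_VERTICES = 18742  # both hemispheres, non-cortical vertices removed


def icosahedral_vertices(order):
    """Vertices per hemisphere of an icosahedral mesh at a subdivision order."""
    return 10 * 4 ** order + 2


# (step, ico order, searchlight radius in mm); None means single-vertex targets.
SCHEDULE = [
    (1, 3, 13), (2, 3, 13), (3, 3, 13),
    (4, 4, 7), (5, 4, 7),
    (6, 5, None),
]


def n_targets(order, radius_mm):
    """Number of connectivity targets across both hemispheres for one step."""
    if radius_mm is None:
        # Final step: every cortical vertex is its own target.
        return N_CORTICAL_VERTICES
    # Coarser steps: counts are quoted before removing the medial wall.
    return 2 * icosahedral_vertices(order)


plan = [(step, n_targets(order, radius)) for step, order, radius in SCHEDULE]
for step, count in plan:
    print(f"step {step}: {count} connectivity targets")
```

      Researchers who want a different pace could edit the schedule, for example by dropping the repeated steps at a given order, at the cost of departing from the six-step procedure evaluated here.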

      It is true that the multiple iteration steps substantially increased the computational cost compared to the classic connectivity hyperalignment, but the improvement in prediction was steady across all datasets, and performance became comparable to that of response hyperalignment, which requires time-locked stimuli. We did not use an explicit loss function in the algorithm, but followed the natural progression of the number of potential connectivity targets in the implementation. On the other hand, the difference between the performance of the improved and the classic connectivity hyperalignment was relatively small (difference of r < 0.05), which indicates the effectiveness of our classic algorithm. Researchers are free to choose the number of iterations and the pace at which the number of targets increases across steps. If computational resources are limited or if a shorter total computational time is the primary priority, using the classic connectivity hyperalignment may be the best option to balance the trade-offs.

      The Materials and Methods section had the details of the implementation (page 22-23):

      “Using dense connectivity targets (e.g., using all 18742 vertices on the surface) with anatomically-aligned data usually generates poor functional correspondence across participants (33). It is, however, beneficial to include more targets for calculating connectivity patterns after the first iteration of connectivity hyperalignment and repeated iterations to lead to a better solution by gradually aligning the information at finer scales.

      We used six steps to further improve the connectivity hyperalignment method. Step 1 was the initial connectivity hyperalignment step as described above that was based on the raw anatomically aligned movie data. The resultant transformation matrices were applied to those movie runs, and the hyperaligned data were then used in step 2 to calculate new connectivity patterns and calculate new transformation matrices. We repeated this procedure iteratively six times and derived transformation matrices for each step. In steps 1, 2, and 3, 642 × 2 (icoorder 3, before removing the medial wall) connectivity targets were defined with 13 mm searchlights. In steps 4 and 5, 2562 × 2 (icoorder 4, before removing the medial wall) connectivity targets were used with 7 mm searchlights to calculate target mean time series. In the final step 6, all 18742 vertices were included as separate connectivity targets, using each vertex’s time series rather than calculating the mean in a searchlight. Each step of this advanced connectivity hyperalignment algorithm increased the prediction performance (Figure 4-figure supplement 2).”

      But to help the readers understand the logic of the advanced connectivity hyperalignment algorithm used in this study, we expanded the discussion section (page 15):

      “Because using dense connectivity targets (e.g., using all vertices as connectivity targets) with anatomically aligned data often leads to suboptimal alignment across participants (33), we started with coarse connectivity targets and gradually increased the number of connectivity targets to form a denser representation of connectivity profiles. The iterations improved the prediction performance step by step, and at the final step (step 6, all vertices were used as connectivity targets) in this analysis, the enhanced CHA generated comparable performance with RHA (Figure 4-figure supplement 4).”

      Second, the existing evaluations for enhanced CHA appear to be entirely based on image-derived correlations. That is, the authors compare the predicted image from CHA with the ground-truth image using correlation. While this provides promising initial evidence, correlation-based measures are often difficult to interpret given their sensitivity to image characteristics such as smoothness. Including Cronbach's alpha reliability as a baseline does not address this concern, as it is similarly an image-based statistic. It would be useful to see additional predictive experiments using frameworks such as time-segment classification, intersubject decoding, or encoding models.

      We appreciate the reviewer’s concern regarding the stability of local correlations in relation to image characteristics. To address this, we conducted an additional analysis using different searchlight sizes (with radii of 10 mm, 15 mm, and 20 mm) to evaluate the predicted category-selective maps, focusing specifically on the Budapest dataset. The local correlations between the predicted category-selective maps (obtained using enhanced CHA) and participants’ own maps based on classic localizer runs were calculated for each searchlight. We averaged these correlations across participants and plotted the resulting maps, as shown in Figure 4-figure supplement 10. Although using a larger searchlight radius is similar to employing a larger smoothing kernel, the results remained relatively stable across different searchlight sizes, particularly in regions selectively responsive to the specific category. This stability suggests that while the evaluation may be influenced by image-related features, the conclusion would remain consistent under varying parameters.
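
      As an illustration of this kind of evaluation, the following sketch computes a local Pearson correlation between a predicted and an actual map within a neighborhood around each vertex, for the three radii mentioned above. It is a simplified stand-in rather than the analysis code: it assumes Euclidean distance between vertex coordinates instead of distance along the cortical surface, and the function name, coordinates, and maps are all synthetic and hypothetical.

```python
import numpy as np


def searchlight_correlations(coords, predicted, actual, radius_mm):
    """Pearson r between two maps within each vertex's neighborhood.

    coords: (n_vertices, 3) positions in mm. Euclidean distance is used
    here as a simplifying assumption in place of surface distance.
    Vertices with too few neighbors or a constant map are left as NaN.
    """
    n_vertices = coords.shape[0]
    r_map = np.full(n_vertices, np.nan)
    for center in range(n_vertices):
        dist = np.linalg.norm(coords - coords[center], axis=1)
        idx = np.flatnonzero(dist <= radius_mm)
        if idx.size < 3:
            continue
        p, a = predicted[idx], actual[idx]
        if p.std() == 0 or a.std() == 0:
            continue
        r_map[center] = np.corrcoef(p, a)[0, 1]
    return r_map


# Synthetic maps (assumption): a smooth "actual" map plus a noisy prediction.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 60, size=(400, 3))
actual_map = np.sin(coords[:, 0] / 10) + np.cos(coords[:, 1] / 10)
predicted_map = actual_map + 0.3 * rng.standard_normal(400)

mean_r_by_radius = {
    radius: np.nanmean(
        searchlight_correlations(coords, predicted_map, actual_map, radius)
    )
    for radius in (10.0, 15.0, 20.0)
}
```

      Comparing the entries of `mean_r_by_radius` gives a quick sense of how much the summary depends on the neighborhood size, which is the sensitivity the additional analysis was designed to examine.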

      As for the use of enhanced CHA, it serves as an optimized version of the classic CHA, specifically designed for predicting individualized functional topographies. Evaluating prediction performance in our study is based on t-value contrast maps for each participant. Given this, it's unclear how time-segment classification or other decoding/encoding models could be appropriately implemented for performance evaluation. However, prior research from our lab has already established the effectiveness of classic CHA. Specifically, Guntupalli et al. (2018) showed that classic CHA significantly improved intersubject correlations (ISC) of connectivity profiles across the cortex. They also revealed that CHA captured fine-scale variations in connectivity profiles for nearby cortical nodes across participants and led to improved between-subject multivariate pattern classification accuracies (bsMVPC) of movie segments. These findings serve as robust evidence for the effectiveness of classic CHA, laying the groundwork for our enhanced CHA approach.

      We added Figure 4-figure supplement 10 to the supplementary material:

      Addressing these concerns and considering cSRM as a comparison model would significantly strengthen the paper. There are also notable strengths that I would encourage the authors to further pursue. In particular, the authors have access to a unique dataset in which the same Raiders of the Lost Ark stimulus was scanned for participants within the Budapest (SRaiders) dataset as well as non-overlapping participants in the Raiders dataset. Exploring the relative performance for cross-movie prediction within a dataset as compared to a shared movie prediction across datasets is particularly interesting for methods development. I would encourage the authors to explicitly report results in this framework to highlight both this unique testing structure as well as the performance of their enhanced CHA method.

      We appreciate the reviewer's suggestion to examine a shared time-series but non-overlapping participants scenario using the Sraiders and Raiders datasets. However, there are significant differences between the two datasets that preclude such direct comparison. These differences include varying scanning parameters, MRI scanners, localizer types, and data collection procedures. Due to these methodological divergences, the datasets cannot be treated as identical time-series.

      Firstly, the scanning parameters vary considerably. Sraiders was scanned with TR = 1 s (TR/TE = 1000/33 ms, flip angle = 59°, resolution = 2.5 mm³ isotropic voxels, matrix size = 96 × 96, FoV = 240 × 240 mm, multiband acceleration factor = 4, and no in-plane acceleration), and Raiders was scanned with TR = 2.5 s (TE = 35 ms, flip angle = 90°, matrix size = 80 × 80, FoV = 240 × 240 mm, resolution = 0.938 mm × 0.938 mm × 1.0 mm).

      Secondly, participants in the Sraiders dataset were scanned with a 3 T Siemens Magnetom Prisma MRI scanner with a 32-channel head coil, whereas the Raiders dataset, collected more than 10 years ago, used a 3 T Philips Intera Achieva scanner with an eight-channel head coil.

      Thirdly, the stimulus presentations were different. In the Sraiders dataset, the movie Raiders of the Lost Ark was split into eight parts (~15 min each), and the first four parts were watched outside of the scanner prior to scanning (~56 min). The last four parts were watched in the scanner (57 min) with audio. In the Raiders dataset, the audio-visual movie was also split into eight parts (~15 min each), but participants watched all eight parts in the scanner with audio (one part per run).

      Fourthly and critically, the two datasets used different types of localizers. The Sraiders dataset included dynamic localizer runs, and the Raiders dataset only contained a static localizer similar in design to the one in the Forrest dataset.

      Given these four points, it is not suitable to treat the two datasets as identical time-series. The difference in the localizer type is a further issue. The topographies generated from the two types of localizers are dissimilar in many ways. For all categories, the dynamic localizer elicited stronger and broader category-selective activations than the static localizer, and the searchlight analysis showed that the dynamic localizer had higher reliabilities across the cortex, especially in regions that were selectively responsive to the target category. Due to these differences, cross-dataset predictions yielded lower correlations than within-dataset predictions. This is not indicative of methodological failure but reflects diverging topographies activated by different localizers.

      In the manuscript, we have extensively analyzed cross-dataset predictions (Figure 2-figure supplement 1 through Figure 4-figure supplement 4, and Figure 4-figure supplement 6).

      ● Figure 2-figure supplement 1 demonstrates that, despite the limitations of cross-localizer-type evaluation, both R-to-S (Raiders to Sraiders) and S-to-R (Sraiders to Raiders) predictions significantly outperformed surface alignment methods across categories.

      ● Figure 2-figure supplement 2 confirms that the prediction performance remained stable across individual participants, underscoring the robustness of our methodology.

      ● Figure 3-figure supplement 1 & Figure 3-figure supplement 2 display contrast maps generated from both native and alternate localizers, revealing that the maps share similar topographies irrespective of the dataset origin.

      ● Figure 4-figure supplement 1 presents a correlation analysis of local similarities in R-to-S and S-to-R predictions, highlighting particularly strong correlations in the ventral face regions.

      ● Figure 4-figure supplement 2 employs histograms to showcase performance across major cortices and furnishes additional evidence regarding the influence of localizer types on the results.

      ● Figure 4-figure supplement 3 offers a searchlight analysis for other categories, enriching the scope of our investigation.

      ● Figure 4-figure supplement 4 affirms that the advanced CHA is effective in both R-to-S and S-to-R predictions.

      ● Figure 4-figure supplement 6 compares the efficacy of 1-step vs. 2-step prediction methods for R-to-S and S-to-R, showing a clear advantage for the 1-step approach.

      These analyses affirmed that our approach outperforms surface alignment methods. But the inherent limitations in data collection and localizer types preclude a direct exploration of the reviewer’s hypothesis. These complexities necessitate further research to fully validate the proposed scenario.

      Overall, I share the authors' enthusiasm for the potential of cross-movie, cross-dataset prediction, and I believe that methods such as enhanced CHA are likely to significantly improve our ability to make these comparisons in the near future. At present, however, I find that the theoretical and experimental support for enhanced CHA is incomplete. It is therefore difficult to assess how enhanced CHA meets its goals or how successfully other researchers would be able to adopt this method in their own experiments.

      We hope our new analysis and replies addressed the reviewer’s concerns.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors describe an elegant genetic screen for mutants that suppress defects of MCT1 deletions which are deficient in mitochondrial fatty acid synthesis. This screen identified many genes, including that for Sit4. In addition, genes for retrograde signaling factors (Rtg1, Rtg2 and Rtg3), proteins influencing proteasomal degradation (Rpn4, Ubc4) or ribosomal proteins (Rps17A, Rps29A) were found. From this mix of components, the authors selected Sit4 for further analysis. In the first part of the study, they analyzed the effect of Sit4 in the context of MCT1 mutant suppression. This more specific part is very detailed and thorough, and the experiments are well controlled and convincing. The second, more general part of the study focused on the effect of Sit4 on the level of the mitochondrial membrane potential. This part is of high general interest, but less well developed. Nevertheless, this study is very interesting as it shows for the first time that phosphate export from mitochondria is of general relevance for the membrane potential even in wild-type cells (as long as they live from fermentation), that the Sit4 phosphatase is critical for this process and that the modulation of Sit4 activity influences processes relying on the membrane potential, such as the import of proteins into mitochondria. However, some aspects should be further clarified.

      1) It is not clear whether Sit4 is only relevant under fermentative conditions. Does Sit4 also influence the membrane potential in respiring cells? Fig. S2D shows the membrane potential in glucose and raffinose. Both carbon sources lead to fermentative growth. The authors should also test whether Sit4 levels influence the membrane potential when cells are grown under respiratory conditions, such as in ethanol, lactate or glycerol. Even if deletions of Sit4 affect respiration, mutants with altered activity can be easily analyzed.

      sit4Δ cells fail to grow on nonfermentable media, as shown by us (Figure 2—figure supplement 1C) and others (Arndt et al., 1989; Dimmer et al., 2002; Jablonka et al., 2006). In our opinion, the exact reason is unclear, but there is an interesting observation that addition of aspartate can partially restore growth on ethanol (Jablonka et al., 2006). Although this sit4Δ defect has not been thoroughly investigated, an early study speculated that it could be related to the cAMP-PKA pathway and pointed out genetic interactions of SIT4 with multiple genes in that pathway (Sutton et al., 1991). In addition, sit4Δ cells have phenotypes similar to those of cAMP-PKA null mutants, such as glycogen accumulation, caffeine resistance, and failure to grow on nonfermentable media (Sutton et al., 1991). Based on a literature search, we have not found sit4Δ mutants that could grow on nonfermentable media.

      2) The authors should give a name to the pathway shown in Fig. 4D. This would make it easier to follow the text in the results and the discussion. This pathway was proposed and characterized in the 90s by George Clark-Walker and others, but never carefully studied on a mechanistic level. Even if the flux through this pathway cannot be measured in this study, the regulatory role of Sit4 for this process is the most important aspect of this manuscript.

      We now refer to this mechanism as the mitochondrial ATP hydrolysis pathway.

      3) To further support their hypothesis, the authors should show that deletion of Pic1 or Atp1 wipes out the effect of a Sit4 deletion. In these petite-negative mutants, the phosphate export cycle cannot be carried out and thus, Sit4, should have no effect.

      The mitochondrial phosphate transport activity is electroneutral as it also pumps a proton together with inorganic phosphate. The F1 subunit of the ATP synthase (Atp1 and Atp2) is widely suggested in the literature to be responsible for the ATP hydrolysis. We performed tetrad dissection to generate atp1Δ or atp2Δ in the pho85Δ background. After streaking single colonies onto a fresh plate, we noticed that atp1Δ mct1Δ and atp2Δ mct1Δ cells are inviable, and knocking out PHO85 rescued this synthetic lethality. It is not surprising that atp1Δ mct1Δ or atp2Δ mct1Δ cells are inviable, since the F1 subunit is important to generate a minimum of MMP in mct1Δ cells when the ETC is absent (i.e., rho0 cells). However, knocking out PHO85 can generate MMP independently of the F1 subunit of ATP synthase, as suggested by the viable atp1Δ mct1Δ pho85Δ and atp2Δ mct1Δ pho85Δ cells. There are many ATPases in the mitochondrial matrix that could, in theory, hydrolyze ATP for the ADP/ATP carrier to generate MMP. However, we do not currently know exactly which ATPase(s) is activated by phosphate starvation. These data are now included as Figure 5—figure supplement 1F-G.

      4) What is the relevance of Sit4 for the Hap complex which regulates OXPHOS gene expression in yeast? The supplemental table suggests that Hap4 is strongly influenced by Sit4. Is this downstream of the proposed role in phosphate metabolism or a parallel Sit4 activity? This is a crucial point that should be addressed experimentally.

      To investigate the role of the Hap complex in MMP generation in sit4Δ cells, we overexpressed and knocked out HAP4, the catalytic subunit of the Hap complex, separately in wild-type and sit4Δ cells. We confirmed the HAP4 overexpression by the increased abundance of ETC complexes as shown by BN-PAGE (Figure 2—figure supplement 1E). However, we did not observe any rescue of ETC or ATP synthase in mct1Δ cells when HAP4 was overexpressed. The increased level of ETC complexes upon HAP4 overexpression is not sufficient to rescue the MMP (Figure 2—figure supplement 1F).

      Next, we knocked out HAP4 in sit4Δ cells. Knocking out SIT4 could still increase MMP in hap4Δ cells with a much-reduced magnitude, which phenocopied ETC subunit and RPO41 deletion in sit4Δ cells (Figure 2—figure supplement 1G).

      In conclusion, the Hap complex is involved in the MMP increase when SIT4 is absent. However, it is not sufficient to increase MMP by overexpressing HAP4. The Hap complex discussion is now included in the manuscript, and the data is presented as Figure 2—figure supplement 1E-G.

      5) The authors use the accumulation of Ilv2 precursors as proxy for mitochondrial protein import efficiency. Ilv2 was reported before as a protein which, if import into mitochondria is slow, is deviated into the nucleus in order to be degraded (Shakya,..., Hughes. 2021, Elife). Is it possible that the accumulation of the precursor is the result of a reduced degradation of pre-Ilv2 in the nucleus rather than an impaired mitochondrial import? Since a number of components of the ubiquitin-proteasome system were identified with Sit4 in the same screen, a role of Sit4 in proteasomal degradation seems possible. This should be tested.

      We thank the reviewer for pointing out this potential caveat with our Ilv2-FLAG reporter. With a limited search and testing, we could not find another reporter that behaves like Ilv2-FLAG. The reason Ilv2-FLAG is a perfect reporter for this study is that in wild-type cells, Ilv2-FLAG is not 100% imported. Therefore, we could demonstrate that mitochondria with higher MMP import more efficiently. Unfortunately, all of the other mitochondrial proteins that we tested were efficiently imported in wild-type cells. To identify other suitable mitochondrial proteins that behave like Ilv2-FLAG, we would need to conduct a more comprehensive screen.

      To address the concern of the involvement of protein degradation in obscuring the interpretation of Ilv2-FLAG import, we performed two experiments. First, we measured the proteasomal activity in wild-type and our mutants using a commercial kit (Cayman). We did not observe a statistically significant difference in 20S proteasomal activity between wild-type and sit4Δ cells.

      In the second experiment, we reduced the MMP of sit4Δ cells using CCCP treatment and measured the Ilv2-FLAG import. We first treated sit4Δ cells with different doses of CCCP for six hours and measured their MMP. sit4Δ cells treated with 75 µM CCCP had comparable MMP to wild-type cells. When we treated sit4Δ cells with higher concentrations of CCCP, most of the cells did not survive after six hours. Next, we performed the Ilv2-FLAG import assay. We observed a similar level of unimported Ilv2-FLAG (marked with *) in sit4Δ cells treated with 75 µM CCCP as in wild-type cells. This result confirms that sit4Δ cells have a similar Ilv2-FLAG turnover mechanism and activity to wild-type cells, because when we lower the MMP in the sit4Δ background we observe a similar level of unimported Ilv2-FLAG. We thus feel confident in concluding that the Ilv2-FLAG import results are indeed an accurate proxy for MMP level. These data are now included as Figure 1—figure supplement 1H-J in the manuscript.

      Author response image 1.

      Reviewer #2 (Public Review):

      This study reports interesting findings on the influence of a conserved phosphatase on mitochondrial biogenesis and function. In the absence of it, many nucleus-encoded mitochondrial proteins among which those involved in ATP generation are expressed much better than in normal cells. In addition to a better understanding of the mechanisms that regulate mitochondrial function, this work may help developing therapeutic strategies to diseases caused by mitochondrial dysfunction. However, there are a number of issues that need clarification.

      1) The rationale of the screening assay to identify genes required for the gene expression modifications observed in mct1 mutant is not clear. Indeed, after crossing with the gene deletion library, the cells become heterozygote for the mct1 deletion and should no longer be deficient in mtFAS. Thank you for clarifying this and if needed adjust the figure S1D to indicate that the mated cells are heterozygous for the mct1 and xxx mutations.

      We updated the methods section and the graphic for the genetic screen to clarify these points within the SGA workflow overview. After we created the heterozygote by mating mct1Δ cells with the individual KO cells in the collection, these diploids underwent sporulation and selection for the desired double KO haploid. As a result, the luciferase assay was performed in haploid cells with MCT1 and one additional non-essential gene deleted.

      2) The tests shown in Fig. S1E should be repeated on individual subclones (at least 100) obtained after plating for single colonies a glucose culture of mct1 mutant, to determine the proportion of cells with functional (rho+) mtDNA in the mct1 glucose and raffinose cultures. With for instance a 50% proportion of rho- cells, this could substantially influence the results of the analyses made with these cells (including those aiming to evaluate the MMP).

      We agree that this would provide a more confident estimate for population-level characterization of these colonies. It is important to note that we randomly chose 10 individual subclones, and 100% of these colonies were verified to be rho+. This suggests the population has functional mtDNA, and we thus felt confident in the identity of our populations.

      3) The mitochondria area in mct1 cells (Fig.S1G) does not seem to be consistent with the tests in Fig. 1C. that indicate a diminished mitochondrial content in mct1 cells vs wild-type yeast. A better estimate (by WB for instance) of the mitochondrial content in the analyzed strains would enable to better evaluate MMP changes monitored with Mitotracker since the amount of mitochondria in cells correlate with the intensity of the fluorescence signal.

      As this reviewer pointed out, we quantified mitochondrial area based on Tom70-GFP signal. This measurement is expressed as mitochondrial area over cell size. Cell size is an important parameter when measuring organelle size, as most organelles scale up and down with cell size. mct1Δ cells generally have a smaller cell size than WT cells. Therefore, the mitochondrial area of mct1Δ cells was not significantly different from that of WT cells when scaled to cell size. We believe this is the best method to compare mitochondrial area. As for quantifying MMP from these microscopy images, we measured the average MitoTracker Red fluorescence intensity of each mitochondrion as defined by Tom70-GFP. This method inherently normalizes away the influence of mitochondrial area when quantifying MMP.
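
      The two normalizations described in this response can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not the image-analysis pipeline used in the study: the function names, the representation of the Tom70-GFP and cell outlines as binary pixel masks, and all pixel values are hypothetical.

```python
# Hypothetical sketch: function names, masks and pixel values are illustrative
# assumptions, not the analysis code used in the study.

def area_fraction(mito_mask, cell_mask):
    """Mitochondrial area scaled to cell size: mitochondrial pixels / cell pixels."""
    mito = sum(m and c
               for mito_row, cell_row in zip(mito_mask, cell_mask)
               for m, c in zip(mito_row, cell_row))
    cell = sum(c for row in cell_mask for c in row)
    if cell == 0:
        raise ValueError("cell mask is empty")
    return mito / cell


def mean_intensity_in_mask(intensity, mito_mask):
    """Average dye intensity over the pixels inside the mitochondrial mask."""
    values = [v
              for int_row, mask_row in zip(intensity, mito_mask)
              for v, m in zip(int_row, mask_row) if m]
    if not values:
        raise ValueError("mitochondrial mask is empty")
    return sum(values) / len(values)


# Toy "small" cell: 4 cell pixels, 1 mitochondrial pixel.
small_cell = [[1, 1], [1, 1]]
small_mito = [[1, 0], [0, 0]]
small_red = [[100, 5], [5, 5]]

# Toy "large" cell: 16 cell pixels, 4 mitochondrial pixels, same per-pixel signal.
large_cell = [[1] * 4 for _ in range(4)]
large_mito = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
large_red = [[100, 100, 5, 5], [100, 100, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]]

for name, cell, mito, red in [("small", small_cell, small_mito, small_red),
                              ("large", large_cell, large_mito, large_red)]:
    print(name, area_fraction(mito, cell), mean_intensity_in_mask(red, mito))
```

      In this toy case the two cells differ fourfold in absolute mitochondrial area and in summed dye signal, yet both the area fraction and the mean intensity inside the mask come out identical, which is the sense in which a per-mask average is insensitive to the amount of mitochondria.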

      4) Page 12: "These data demonstrate that loss of SIT4 results in a mitochondrial phenotype suggestive of an enhanced energetic state: higher membrane potential, hyper-tubulated morphology and more effective protein import." Furthermore, the sit4 mutant shows higher levels of OXPHOS complexes compared to WT yeast.

      Despite these beneficial effects on mitochondria, the sit4 deletion strain fails to grow on respiratory substrates. It would be good to know whether the authors have some explanation for this apparent contradiction.

      We agree that this was initially puzzling. We provide a more complete explanation above (see comments to reviewer #1 - major concern #1). Briefly, the growth deficiency of sit4Δ cells in non-fermentable media was reported and studied by multiple groups (Arndt et al., 1989; Dimmer et al., 2002; Jablonka et al., 2006). These studies seem to indicate that sit4Δ cells contain more ETC complexes and have a higher OCR but cannot respire on non-fermentable carbon sources. However, we do not think there is yet a clear explanation for this phenotype. One interesting observation reported is that the addition of aspartate partly restores the cells’ growth on ethanol (Jablonka et al., 2006). One early study speculates that this defect could be related to the cAMP-PKA pathway. Sutton et al. pointed out genetic interactions between sit4 and multiple genes in the cAMP-PKA pathway (Sutton et al., 1991). In addition, sit4Δ cells have similar phenotypes to those of cAMP-PKA null mutants, such as glycogen accumulation, caffeine resistance, and failure to grow on non-fermentable media. However, to keep this manuscript succinct, we opted to stay focused on MMP.

      Reviewer #3 (Public Review):

      In this study, the authors investigate the genetic and environmental causes of elevated Mitochondrial Membrane Potential (MMP) in yeast, and also some physiological effects correlated with increased MMP.

      The study begins with a reanalysis of transcriptional data from a yeast mutant lacking the gene MCT1 whose deletion has been shown to cause defects in mitochondrial fatty acid synthesis. The authors note that in raffinose, mct1del cells, unlike WT cells, fail to induce expression of many genes that code for subunits of the Electron Transport Chain (ETC) and ATP synthase. The deletion of MCT1 also causes induction of genes involved in acetyl-CoA production after exposure to raffinose. The authors therefore conduct a screen to identify mutants that suppress the induction of one of these acetyl-CoA genes, Cit2. They then validate the hits from this screen to see which of their suppressor mutants also reduce expression in four other genes induced in a mct1del strain. This yielded 17 genes that abolished induction of all 5 genes tested in an mct1del background during growth on raffinose.

      The authors chose to focus on one of these hits, the gene coding for the phosphatase SIT4 (related to human PP6) which also caused an increase in expression of two respiratory chain genes. The authors then investigated MMP and mitochondrial morphology in strains containing SIT4 and MCT1 deletions and surprisingly saw that sit4del cells had highly elevated MMP, more reticular mitochondria, and were able to fully import the acetolactate synthase protein Ilv2p and form ETC and ATP synthase complexes, even in cells with an mct1del background, rescuing the low MMP, fragmented mitochondria, low import of Ilv2 and an inability to form ETC and ATP synthase complexes phenotypes of the mct1del strain. Surprisingly, the authors find that even though MMP is high and ETC subunits are present in the sit4del mct1del double deletion strain, that strain has low oxygen consumption and cannot grow under respiratory conditions, indicating that the elevated MMP cannot come from fully functional ETC subunits. The authors also observe that deleting key subunits of ETC complex III (QCR2) and IV (COX5) strongly reduced the MMP of the sit4del mutant, which would suggest that the majority of the increase in MMP of the sit4del mutant was dependent on a partially functional ETC. The authors note that there was still an increase in MMP in the qcr2del sit4del and cox4del sit4del strains relative to qcr2del and cox4del strains indicating that some part of the increase in MMP was not dependent on the ETC.

      The authors dismiss the possibility that the increase in MMP could have been through the reversal of ATP synthase because they observe that inhibition of ATP synthase with oligomycin led to an increase of MMP in sit4del cells, indicating that ATP synthase is operating in a forward direction in sit4del cells.

      Noting that genes for phosphate starvation are induced in sit4del cells, the authors investigate the effects of phosphate starvation on MMP. They found that phosphate starvation caused an increase in MMP and increased Ilv2p import even in the absence of a mitochondrial genome. They find that inhibition of the ADP/ATP carrier (AAC) with bongkrekic acid (BKA) abolishes the increase of MMP in response to phosphate starvation. They speculate that phosphate starvation causes an increase in MMP through the import and conversion of ATP to ADP and subsequent pumping of ADP and inorganic phosphate out of the mitochondria.

      They further show that MMP is also increased when the cyclin dependent kinase PHO85 which plays a role in phosphate signaling is deleted and argue that this indicates that it is not a decrease in phosphate which causes the increase in MMP under phosphate starvation, but rather the perception of a decrease in phosphate as signalled through PHO85. Unlike in the case of SIT4 deletion, the increase in MMP caused by the deletion of pho85 is abolished when MCT1 is deleted.

      Finally they show an increase in MMP in immortalized human cell lines following phosphate starvation and treatment with the phosphate transporter inhibitor phosphonoformic acid (PFA). They also show an increase in MMP in primary hepatocytes and in midgut cells of flies treated with PFA.

      The link between phosphate starvation and elevated MMP is an important and novel finding and the evidence is clear and compelling. Based on their experiments in various mammalian contexts, this link appears likely to be generalizable, and they propose and begin to test an interesting hypothesis for how MMP might occur in response to phosphate starvation in the absence of the Electron Transport Chain.

      The link between phosphate starvation and deletion of the conserved phosphatase SIT4 is also interesting and important, and while the authors' experiments and analysis suggest some connection between the two observations, that connection is still unclear.

      Major points

      Mitotracker is a great fluorescent dye, but it measures membrane potential only indirectly. There is a danger when cells change growth rates, ion concentrations, or when the pH changes: all MMP-indicating dyes change in fluorescence, and their signal is confounded. A change in phosphate levels can possibly do both, alter pH and ion concentrations. Because all conclusions of the manuscript are based on a change in MMP, it would be a great precaution to use a dye-independent measure of membrane potential, and confirm at least some key results.

      Mitochondrial MMP does strongly influence amino acid metabolism, and indeed the SIT4 knockout has a quite striking amino acid profile, with histidine, lysine, arginine, and tyrosine being increased in concentration. http://ralser.charite.de/metabogenecards/Chr_04/YDL047W.html Could this amino acid profile support the conclusions of the authors? At least lysine and arginine are down in petites due to a lack of membrane potential and iron sulfur cluster export, and here they are up. Along these lines, according to the same data resource, the knock-outs CSR2, ASF1, SSN8, YLR0358 and MRPL25 share the same metabolic profile. Due to limited time I did not re-analyse the data provided by the authors, but it would be worth checking if any of these genes did come up in the screens of the authors.

      We took the mutants within the same cluster as SIT4 shown in this paper from the deletion collection and measured their MMP. ylr358cΔ cells have a similarly high MMP to that observed in sit4Δ cells. However, this gene has an as yet undefined function. Beyond YLR358C, we did not observe similar MMP increases in other gene deletions from this panel, which does not support the notion that amino acids such as histidine, lysine, arginine, or tyrosine play a determining role in driving MMP.

      The media condition and strain used in the suggested paper is very different from what we used in our study. Instead of growing prototrophic cells in minimal media without any amino acids, we used auxotrophic yeast strains and grew them in media containing complete amino acids. So far, none of the other defects or signaling associated with SIT4 deletion could influence MMP as much as the phosphate signaling. We interpret these data to support the hypothesis that the MMP observation in sit4Δ cells is connected with the phosphate signaling as illustrated by the second half of the story in our manuscript.

      Author response image 2.

      One important claim in the manuscript attempts to explain a mechanism for the MMP increase in response to phosphate starvation which is independent of the ETC and ATP synthase.

      It seems to me the only direct evidence to support this claim is that inhibition of the AAC with BKA stops the increase of mitotracker fluorescence in response to phosphate starvation in both WT and rho0 cells (Figs 4B and 4C). It would strengthen the paper if the authors could provide some orthogonal evidence.

      This is similar to the comment raised by reviewer #1 - major concern #3. We refer the reviewer to our discussion and the new data above. Briefly, we do not think the F1 subunit is responsible for the ATP hydrolysis activity that generates MMP under phosphate-depleted conditions. We believe there are additional ATPase(s) in the mitochondrial matrix that can be coupled to the ADP/ATP carrier for MMP generation during phosphate starvation. However, we have not identified the relevant ATPase(s) at this point, and it is likely that multiple ATPases could contribute to this activity.

      Introduction/Discussion The author might want to make the reader of the article aware that the 'reversal' of the ATP synthase directionality - i.e. ATP hydrolysis by the ATP synthase as a mechanism to create a membrane potential (in petites), has always been a provocative idea - but one that thus far could never be fully substantiated. Indeed some people that are very familiar with the topic, are skeptical this indeed happens. For instance, Vowinckel et al 2021 (PMID: 34799698) measured precise carbon balances for petite cells, and found no evidence for a futile cycle - petites grow slower, but accumulate the same biomass from glucose as petites that re-evolve at a fast growth rate. Perhaps the manuscript could be updated accordingly.

      We thank the reviewer for pointing out this additional relevant study. We have rephrased the referenced sentence in the introduction. The MMP generation in phosphate starvation is independent of the F1 portion of ATP synthase. Therefore, our data neither support nor refute either of these arguments.

      In the introduction and conclusion there is discussion of MMP set points. In particular the authors state:

      "Critically, we find that cells often prioritize this MMP setpoint over other bioenergetic priorities, even in challenging environments, suggesting an important evolutionary benefit."

      This does not seem to be consistent with the central finding of the manuscript that MMP changes under phosphate starvation. MMP doesn't seem so much to have a 'set point' but rather be an important physiological variable that reacts to stimuli such as phosphate starvation.

      The reviewer raises a rational alternative hypothesis to the one that we have proposed. In reality, both of these are complete speculations to explain the data and we can’t think of any way to test the evolutionary basis for the mechanisms that we describe. We recognize that untested/untestable speculative arguments have limitations and there are viable alternative hypotheses. We have softened our language to ensure that it is clear that this is only a speculation.

      The authors suggest that deletion of Pho85 causes an increase in MMP because of cellular signaling. However, they also state in the conclusion:

      "Unlike phosphate starvation, the pho85Δ mutant has elevated intracellular phosphate concentrations. This suggests that the phosphate effect on MMP is likely to be elicited by cellular signaling downstream of phosphate sensing rather than some direct effect of environmental depletion of phosphate on mitochondrial energetics."

      The authors should cite the study that shows deletion of PHO85 causes increased intracellular phosphate concentrations. It also seems possible that the 'cellular signaling' that causes the increase in MMP could be a result of this increase in intracellular phosphate concentrations, which could constitute a direct effect of an environmental overload of phosphate on mitochondrial energetics.

      We now cite the literature that shows higher intracellular phosphate in pho85Δ cells (Gupta et al., 2019; Liu et al., 2017). Depleting phosphate in the media drastically reduced intracellular phosphate concentration, which is the opposite of the situation in pho85Δ cells. Nevertheless, we observed higher MMP in both situations. We concluded from these two observations that the increase in MMP is a response to the signaling activated by phosphate depletion rather than to the intracellular phosphate abundance.

      Related to this point, in the conclusion, the authors state:

      "We now show that intracellular signaling can lead to an increased MMP even beyond the wild-type level in the absence of mitochondrial genome."

      In sum, the data shows that signaling is important here- but signaling alone is only the message - not the biophysical process that creates a membrane potential. The authors then could revise this slightly.

      We have rephrased this sentence as suggested, which now reads “We now show that intracellular signaling triggers a process that can lead to an increased MMP even beyond the wild-type level in the absence of mitochondrial genome”.

      The authors state in the conclusion that

      "We first made the observation that deletion of the SIT4 gene, which encodes the yeast homologue of the mammalian PP6 protein phosphatase, normalized many of the defects caused by loss of mtFAS, including gene expression programs, ETC complex assembly, mitochondrial morphology, and especially MMP (Fig. 1)"

      The data shown, though, indicate that, in terms of MMP, deletion of SIT4 causes a huge increase (and a departure from normality) whether or not MCT1 is present (Fig 1D), rather than normalizing a defect in mtFAS.

      We changed the word “normalized” to “reversed”. In the discussion section, we also emphasized that many of these increases are independent of mitochondrial dysfunction induced by loss of mtFAS.

      The language "SIT4 is required for both the positive and negative transcriptional regulation elicited by mitochondrial dysfunction" feels strong. SIT4 seems to influence positive transcriptional regulation in response to mitochondrial dysfunction caused by MCT1 deletion (but may not be the only thing as there appears to be an increase in CIT2 expression in a sit4del background following a further deletion of MCT1). In terms of negative regulation, SIT4 deletion clearly affects the baseline, but MCT1 deletion still causes down regulation of both examples shown in Fig 1B, showing that negative transcriptional regulation can still occur in the absence of SIT4. The authors might consider showing fold change of expression as they do in later figures (Figs 4B and C) to help the reader evaluate the quantitative changes they demonstrate.

      We now display the fold change as suggested. This sentence now reads “These data suggest that SIT4 positively and negatively influences transcriptional regulation elicited by mitochondrial dysfunction”.

      The authors induce phosphate starvation by adding increasing amounts of potassium phosphate monobasic at a pH of 4.1 to phosphate dropout media supplemented with potassium. The authors did well to avoid confounding effects of removing potassium. The final pH of YNB is typically around 5.2. Is it possible that the authors are confounding a change in pH with phosphate starvation? One would expect the media in the phosphate starvation condition to have a higher pH than the phosphate replacement or control media. Is a change in pH possibly a confounding factor when interpreting phosphate starvation? Perhaps the authors could quantify the pH of the media they use for the experiment to understand how much of a factor that could be. One needs to be careful with MitoTracker and any other fluorescent dye when pH changes. Albeit having constraints on its own, MitoLoc as a protein rather than small molecule marker of MMP might be a good complement.

      We followed the protocol used by many other studies that depleted phosphate in the media. The reason we and others adjusted the media without inorganic phosphate to a pH of 4.1 is that this is the pH of phosphate monobasic. From there, we could add phosphate monobasic to create +Pi media without changing the media pH. Therefore, media containing different concentrations of phosphate all have the exact same pH. We now emphasize in the manuscript that all media containing different levels of inorganic phosphate have the same pH, to address this concern (see page 18).

      Although all media have the same pH, we also provided complementary data using a parallel approach to measure the MMP by assessing mitochondrial protein import, as demonstrated previously with Ilv2-FLAG, which relies on the same principle as MitoLoc.

      References

      Arndt, K. T., Styles, C. A., & Fink, G. R. (1989). A suppressor of a HIS4 transcriptional defect encodes a protein with homology to the catalytic subunit of protein phosphatases. Cell, 56(4), 527–537. https://doi.org/10.1016/0092-8674(89)90576-X

      Dimmer, K. S., Fritz, S., Fuchs, F., Messerschmitt, M., Weinbach, N., Neupert, W., & Westermann, B. (2002). Genetic basis of mitochondrial function and morphology in Saccharomyces cerevisiae. Molecular Biology of the Cell, 13(3), 847–853. https://doi.org/10.1091/mbc.01-12-0588

      Gupta, R., Walvekar, A. S., Liang, S., Rashida, Z., Shah, P., & Laxman, S. (2019). A tRNA modification balances carbon and nitrogen metabolism by regulating phosphate homeostasis. eLife, 8, e44795. https://doi.org/10.7554/eLife.44795

      Jablonka, W., Guzmán, S., Ramírez, J., & Montero-Lomelí, M. (2006). Deviation of carbohydrate metabolism by the SIT4 phosphatase in Saccharomyces cerevisiae. Biochimica et Biophysica Acta (BBA) - General Subjects, 1760(8), 1281–1291. https://doi.org/10.1016/j.bbagen.2006.02.014

      Liu, N.-N., Flanagan, P. R., Zeng, J., Jani, N. M., Cardenas, M. E., Moran, G. P., & Köhler, J. R. (2017). Phosphate is the third nutrient monitored by TOR in Candida albicans and provides a target for fungal-specific indirect TOR inhibition. Proceedings of the National Academy of Sciences, 114(24), 6346–6351. https://doi.org/10.1073/pnas.1617799114

      Sutton, A., Immanuel, D., & Arndt, K. T. (1991). The SIT4 protein phosphatase functions in late G1 for progression into S phase. Molecular and Cellular Biology, 11(4), 2133–2148.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      1) The authors demonstrate that Isw1 has a role in responding to antifungals in Cryptococcus. However, it is not clear if changes in Isw1 stability represent a general response to stress. This study would have benefited from experiments to test: (1) if levels of Isw1 change in response to other stressors (e.g., heat, osmotic, or oxidative stress) and (2) if loss of Isw1 impacts resistance to other stressors.

      A series of experiments were conducted to illustrate and measure phenotypic traits associated with virulence. These traits encompassed capsule formation, melanin synthesis, cell proliferation under stressful conditions, and Isw1 expression levels in response to diverse environmental stimuli. Please see Figure 3a, 3b, 3c, Figure 3-figure supplement 1 and line 237-241.

      2) The authors demonstrate a critical role in the acetylation of K97 and ubiquitination of K441 in regulating Isw1 stability. Additionally, this study shows that K113 is also likely involved in this process. However, it appears that K113 can be either acetylated or ubiquitinated, and it is, thus, less clear if one of the two modifications or both modifications is critical at this residue. Additional experiments may be required to answer this question. This study would have benefited from an additional discussion on the results related to the modification of K113.

      We express our genuine gratitude for this insightful critique pertaining to the K113 site. In our study, we observed the presence of acetylation and ubiquitination changes at the K113 site in our mass spectrometry data. This finding suggests that a proportion of Isw1 is acetylated, while another proportion of Isw1 is ubiquitinated. In order to analyze the K113 function, a series of experiments were conducted, involving the production of triple, double, and single mutations at positions K89, K97, and K113. In addition, the utilization of K-to-R (mimicking deacetylation) and K-to-Q (mimicking acetylation) methodologies was implemented. To elucidate the significance of the acetylation modification of K113, a series of mutants were created. The K-to-R mutation was employed to indicate the deacetylation and deubiquitylation status, while the K-to-Q mutation was utilized to represent the acetylation and deubiquitylation status. In our dataset, it was shown that neither the single mutation of K113 K-to-R nor K-to-Q exhibited any discernible drug resistance phenotype. This finding suggests that, within the physiological context of the Isw1 protein, both post-translational modifications (PTMs) of K113 had minimal or no impact on the regulation of drug resistance. The reason for this phenomenon is that the acetylation modification of K97 imitates the process of ubiquitination of Isw1, hence reducing the interaction between Isw1 and Cdc4, which is an E3 ligase. Hence, the ubiquitination of K113 does not play a crucial role in the regulation of Isw1 protein stability under conditions where K97 is completely acetylated. Nevertheless, upon deacetylation of K97, we observed a notable increase in the abundance of Isw1 protein when K113 is substituted with R. This finding strongly supports the notion that ubiquitination of K113 plays a crucial role in maintaining the stability of the Isw1 protein. 
Hence, in the case of K97 acetylation, the PTM modifications of K113 are not required for maintaining Isw1 protein levels. However, in the event of K97 deacetylation, the ubiquitination of K113 becomes crucial in regulating protein stability. Considering the intricate post-translational modification (PTM) regulation observed at the K113 site, it would be advantageous to generate antibodies specific to K113ac and K113ub in order to comprehensively investigate the functional role of K113 in the regulatory processes. Nevertheless, antibodies targeting site-specific ubiquitination are rarely reported in the scientific literature. We regret any confusion that may have arisen from the previous remark and have made revisions to the manuscript to address this issue. Please refer to lines 485-500.

      3) The authors demonstrate that overexpression of ISW1 in select clinical isolates of Cryptococcus increases sensitivity to antifungals. However, these experiments would have benefited from additional controls, such as including overexpression of ISW1 in the wild-type strain (H99) and antifungal-sensitive isolate (CDLC120).

      In response to your concern, we successfully generated the strains as required. In the revised manuscript, we demonstrated that the overexpression of the stable variant of Isw1 in H99 and CDLC120 strains induces heightened susceptibility to antifungal drugs. Please see Figure 8e, 8i and line 404-413.

      Reviewer #3 (Public Review):

      1) ISWI chromatin remodellers are well-characterised in many organisms. How many ISWI proteins does Cryptococcus contain? Why did the authors focus on ISWI?

      We express our gratitude for this criticism. The identification of Isw1 was conducted as a further investigation building upon the findings presented in our previously published data (Li Y, 2019). In prior research, the acetylome in C. neoformans was comprehensively analyzed, and a series of knockout strains were created to investigate the relationship between fungal pathogenicity and acetylation. The Isw1 mutant was identified as a modifier of drug resistance. Fungal paralogs of ISW genes were initially observed in Saccharomyces cerevisiae, a species of yeast that has experienced genome duplication. This species has two paralogs, Isw1 and Isw2, which emerged as a result of the whole genome duplication event (Kellis M, 2004; Tsukiyama T, 1999; Wolfe KH, 1997). Because C. neoformans has not gone through the whole genome duplication event, its genome encodes only one copy of the ISW gene. Please see lines 129-134.

      2) What is the ISWI protein complex(es)? The Mass-Spec analysis should reveal this.

      Prior research conducted on Saccharomyces cerevisiae has provided evidence that the ISWI complex is comprised of several subunits, namely Isw1, Ioc genes, Itc1, Chd1, and Sua7 (Mellor J, 2004; Smolle M, 2012; Sugiyama and Nikawa, 2001; Vary JC Jr, 2003; Yadon AN, 2013). Upon a thorough examination of the C. neoformans genome, we have not been able to identify a similar IOC gene family. This absence likely suggests an evolutionary loss of the IOC gene family in C. neoformans, as suggested on the FungiDB website. However, C. neoformans has Itc1, Chd1, and Sua7. While we concur with the aforementioned statement on the capability of Mass-Spec data to elucidate potential protein-protein interactions and aid in the identification of subunits within the ISWI complex, it is important to acknowledge that the PTM Mass-Spec methodology is solely employed for the purpose of identifying potential sites of protein modification. In order to comprehensively investigate the cryptococcal ISWI complex, we conducted a standardized Isw1-Flag protein immunoprecipitation procedure, followed by Mass-Spec analysis. In the present study, a total of 22 proteins that interact with Isw1 were found in our experimental data. Among these proteins, 11 have been previously reported to be associated with the regulatory networks including Isw1. In the mass spectrometry results, the protein Itc1 was found to be co-immunoprecipitated with the protein Isw1. Although the Mass-Spec analysis did not reveal the presence of Chd1 and Sua7, our study demonstrated that Chd1 can be co-immunoprecipitated with Isw1 through the utilization of co-IP and immunoblotting techniques. However, no interaction between Isw1 and Sua7 was shown utilizing any of these methods. In brief, cryptococcal ISWI regulatory machinery is distantly related to that from S. cerevisiae. Please see Figure 2 and lines 206-219.

      3) Is Cryptococcus ISWI a transcriptional activator or repressor?

      We regret the erroneous description of Isw1 in the previous version of the manuscript. Isw1 was misclassified as a transcriptional regulator; it functions as a chromatin remodeler instead, and the text has been revised accordingly. In the revised manuscript, we present a comprehensive transcriptome analysis of the isw1Δ strain with and without FLC treatment, which offers insight into the gene regulatory patterns associated with Isw1. In our dataset, Isw1 negatively regulates the expression of genes that encode drug pumps and positively regulates the expression of genes that are essential for 5-FC resistance. Moreover, the ChIP-PCR analysis demonstrated binding of Isw1 to the promoter regions of genes of interest. Hence, the chromatin remodeler Isw1 has a dual role, activating certain genes and repressing others, in response to different forms of drug resistance. Please see lines 142-153.

      4) Is ISWI function in drug resistance linked to its chromatin remodelling activity?

      To investigate whether the chromatin activity of Isw1 contributes to the modulation of multidrug resistance, we conducted protein truncation experiments. Specifically, we deleted the DNA binding domain, the helicase domain, and the SNF2 domain, which have been shown to regulate Isw1 chromatin activity in the model organism S. cerevisiae (Grune T, 2003; Mellor J, 2004; Pinskaya M, 2009; Rowbotham SP, 2011). The new data show that all Isw1 truncation mutants had a growth phenotype consistent with that of the deletion strain isw1Δ. In addition, gene expression levels in these strains were similar to those in isw1Δ. This finding provides evidence that these critical chromatin-related domains influence the regulation of the drug resistance mechanism. Moreover, chromatin immunoprecipitation and PCR experiments with the Isw1-Flag strain revealed that Isw1 binds directly to the promoter regions of target genes. The new findings substantially support the hypothesis that the chromatin activity of Isw1 is crucial to its protein function and that Isw1 acts as a central regulator of drug resistance in C. neoformans. Please see revised Figure 1g, 1h, 1i and lines 186-199 in the revised manuscript text.

      5) Does ISWI interact with chromatin? If so, which are ISWI-target genes? Does drug treatment modulate chromatin binding?

      To address this concern, we pursued two approaches to demonstrate the chromatin regulatory effects of Isw1. First, the DNA binding domain was removed by genetic manipulation. The truncated Isw1 mutants showed a multidrug-resistant growth phenotype that matched the phenotype of the isw1Δ deletion strain, and gene expression levels in the truncated strain were comparable to those in isw1Δ. This supports the notion that the drug resistance mechanism depends on the DNA binding capability of Isw1. Second, the Isw1-Flag strain was used for chromatin immunoprecipitation and PCR assays, which demonstrated direct binding of Isw1 to the promoter regions of target genes. Together, these revised data provide evidence that Isw1 interacts with chromatin and that its chromatin activity plays a pivotal role in modulating its protein function; this interaction serves as a central regulatory mechanism for drug resistance in C. neoformans. Furthermore, a transcriptome analysis was performed on both wild-type and isw1 deletion strains in the absence of FLC treatment. Comparing the two experimental conditions, with and without FLC, revealed a notable difference in the control of gene expression. In the isw1 deletion strain exposed to FLC, a set of 21 genes, including those belonging to the ABC/MFS family and efflux pumps, showed significant changes in expression: 9 genes were downregulated and 12 were upregulated. In contrast, without FLC supplementation, 9 genes showed altered expression, with 3 downregulated and 6 upregulated. Therefore, Isw1 plays a crucial role in activating certain genes while repressing others, and it reconfigures its regulatory machinery in response to drugs. Although ChIP-seq analysis was warranted in this study, treatment of fungal cells caused a marked decrease in Isw1 protein abundance, attributable to the activation of Isw1 protein degradation. Consequently, there was not enough Isw1 protein for successful enrichment and subsequent ChIP-seq analysis (please see Figure 4a and 4c). Nevertheless, the data collectively support the idea that Isw1 is a crucial master regulator of drug resistance in C. neoformans. The text has been revised to present our findings precisely and thoroughly. Please see Figure 1c, 1g, Supplementary File 2, and lines 145-153, 186-188.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      Multimodal experiences that for example contain both visual and tactile components are encoded as associative memories. This manuscript is a valuable contribution supporting structural and functional brain plasticity following associative training protocols that pair together different types of sensory stimuli. The results provide solid support for this plasticity being a basis for cross-modal associative memories.

      We appreciate eLife's assessment of our discovery that associative memory neurons are recruited in the cerebral cortices as a hub for first-order and second-order associative memory. Synaptic interconnections among associative memory neurons mediate the reciprocal retrieval, conversion and translation of associated signals learned across the life span.

      Reviewer #1 (Public Review):

      This manuscript by Xu and colleagues addresses the important question of how multi-modal associations are encoded in the rodent brain. They use behavioral protocols to link stimuli to whisker movement and discover that the barrel cortex can be a hub for associations. Based on anatomical correlations, they suggest that structural plasticity between different areas can be linked to training. Moreover, they provide electrophysiological correlates that link to behavior and structure. Knock-down of nlg3 abolishes plasticity and learning. This study provides an important contribution as to how multi-modal associations can be formed across cortical regions.

      We sincerely thank Reviewer #1 for these comments, which are a great driving force for us to move forward in revealing the specific roles of neural circuits in associative memory and its relevant cognitive activities and emotional reactions.

      Reviewer #2 (Public Review):

      This manuscript by Xu et al. explores the potential joint storage/retrieval of associated signals in learning/memory and how that is encoded by some associative memory neurons using a mouse model. The authors examined mouse associative learning by pairing multimodal signals, including olfactory, tactile, gustatory, and pain/tail heating signals. The key finding is that after associative learning, barrel neurons respond to other multi-modal stimulations. They found that these barrel cortical neurons interconnect with other structures, including piriform cortex, S1-Tr and gustatory cortical neurons. Further studies showed that Neuroligin 3 mediated the recruitment of associative memory neurons in the paired stimulation group. The authors found that knockdown of Neuroligin 3 in the barrel cortex suppressed associative memory cell recruitment in paired stimulation learning. Overall, while the findings of this study are interesting, the concept of associative learning involving multiple functionally connective cortical regions is not that novel. While some of the data presented are convincing, other data seem to lack rigor. In addition, more details and clarification of the experimental methods are needed.

      Thank you very much for your comments on our study of the recruitment of associative memory neurons as the hub for the joint storage and reciprocal retrieval of multi-modal associated signals. You are right that the concept of the associative memory neuron, and of newly established interconnections among cerebral cortices for the formation of associative memory, is not novel. The original finding was reported by the senior author’s lab many years ago and is also presented in the book by Jin-Hui Wang, “Associative Memory Cells: Basic Units of Memory Trace”, published by Springer-Nature in 2019. In addition, we have made certain clarifications in our revision, but detailed information about the experimental approaches and concepts can be found in our previous publications and in this book.

      Reviewer #1 (Recommendations For The Authors):

      I have two points that I find would strengthen the manuscript further:

      1. Associative memories are also based on specificity, which is not addressed in this manuscript. The authors could discuss this and also the magnitude of plasticity. In general, I would suggest also testing plasticity in response to a non-linked stimulus to prove specificity.

      This is a good point. We have demonstrated the specificity of associative memory in our model in previous studies, such as Wang et al., “Neurons in the barrel cortex turn into processing whisker and odor signals: a cellular mechanism for the storage and retrieval of associative signals”, Frontiers in Cellular Neuroscience 9-320:1-17, 2015, and Jin-Hui Wang, “Associative Memory Cells: Basic Units of Memory Trace”, published by Springer-Nature, 2019.

      1. Nlg3 knock-down is a strong intervention. The authors could discuss the implications of interfering with synapse assembly and mechanistic implications at the synaptic level. It could help to compare the consequences of this intervention to a post-training lesion.

      This is a good point. To rule out the possibility that the Nlg3 knockdown intervention acts as a post-training lesion, we used an shRNA-scramble control. In addition, a discussion of the Nlg3 knockdown intervention at the synaptic level has been added to our Discussion.

      1. In general, the clarity of the wording in some sections/sentences could be improved.

      Certain sentences have been reworded in our revision.

      Reviewer #2 (Recommendations For The Authors):

      1. The writing of the manuscript needs major editing, there are grammatical errors even in the title. The extremely long introduction and discussion section with repeated details can be distracting from the main focus of the work.

      This point has been addressed in our revision.

      1. Many bar graphs, such as Figure 5C and 5G, Figure 6C-6G, have low-resolution images, meaning that the axis titles and labels are unreadable.

      The resolution of the figures has been improved in our revision.

      1. The bar graph with data points and illustration in Figure 1E and 1G are misplaced.

      This mistake has been corrected in our revision.

      1. On page 23, Figure 2B, which layer(s) of the PC, S1Tr and GC were the images taken from? In the PSG group, why is there no red axon terminal signal observed in the three regions? Does it indicate that there is no significant projection from the BC axon to PC, S1Tr, or GC neurons? Given that Thy1-YFP labeled glutamatergic neurons at PC, S1Tr, and GC and there is no discernible co-localization of yellow and green cells, can we assume that the glutamatergic neurons at PC, S1Tr, and GC are not involved in the associative learning after the PSG paradigm? Lastly, the number of synapse contacts in Figure 2E is only 1-2 per 100 µm of dendrite, but this is not quite consistent with the confocal images in Figure 2D. In Figure 2D, there are at least three tdTomato boutons on the cropped dendrite, which is ~16 µm long according to the scale bar.

      If Figure 2B is magnified, red boutons are visible; they can be seen at higher magnification in Figure 2C. In addition, the distribution of synapse contacts is variable. Figure 2E shows the averaged number of synapse contacts over dendrites, so a single original image may not exactly match the statistical data.

      1. Figure 4C and Figure 8C, how were the percentages of associative neurons calculated after LFP recording? More details are needed on the method of this in vivo LFP/single unit recordings, including the spike sorting algorithm.

      The total number of neurons recorded in each group is given in the Results section. For instance, 70 neurons were recorded from PSG mice (Figure 4), and this number was used as the denominator. The percentage of associative memory neurons recruited in associative learning was calculated from the number of neurons that responded to two or more signals. This information has been added in our revision (please see the Results section).
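
      The calculation described above can be sketched as follows. This is a minimal illustration only: the total of 70 recorded neurons comes from the response, whereas the per-neuron response counts are invented, and the function name `percent_associative` is hypothetical.

```python
def percent_associative(signals_per_neuron, min_signals=2):
    """Percentage of recorded neurons that respond to at least `min_signals` signals."""
    total = len(signals_per_neuron)  # denominator: all recorded neurons
    if total == 0:
        raise ValueError("no neurons recorded")
    # Numerator: neurons responding to two or more signals (by default).
    recruited = sum(1 for n in signals_per_neuron if n >= min_signals)
    return 100.0 * recruited / total


# Hypothetical data: 70 neurons (the PSG total quoted above); the split into
# neurons responding to 1, 2 or 3 signals is made up for illustration only.
example_counts = [1] * 42 + [2] * 18 + [3] * 10
example_percent = percent_associative(example_counts)
```

      With these invented counts, 28 of the 70 neurons respond to two or more signals, so the function returns their share of the recorded population as a percentage.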

      1. The rationale for the authors choosing Neuroligin 3 as the target for investigating the formation of new synapse interconnections between BC, PC, S1Tr, and GC after PSG should be more clearly spelled out. Synaptic CAMs include SynCAM, NCAM, Neurexin, Cadherin et al all play a role in new synapse formation. Neuroligin 1 is expressed specifically in the CNS at excitatory synapses. Why did the authors choose to study Neuroligin 3 instead of Neuroligin 1?

      This is a good point. Based on our previous data, miRNA-324, which degrades neuroligin-3 mRNA, is upregulated during associative learning in our mouse model. The role of neuroligin-3 in the formation of new synapses and the recruitment of associative memory neurons is therefore studied in this paper.

      1. The behavioral results in Figure 5B-5G indicated that after pair-stimulation of WS-OS, WS-TS, or WS-GS, the memory learned in piriform, S1-Tr and gustatory cortical neurons can be retrieved from each other, by jumping over the barrel cortex. Is it possible that there is some direct interconnection formed between piriform, S1-Tr, and gustatory cortical neurons? Perhaps the authors could perform a barrel cortical lesion or chemogenetic inhibition after PSG training and then repeat the behavioral tests as in Figure 5B-5G.

      We have performed experiments to examine potential direct interconnections among piriform, S1-Tr and gustatory cortical neurons after the associative learning (about twelve days). We have no convincing data to support this possibility at this moment.

      1. Some of the images showing the location of virus injections look VERY similar, such as Figure 3A left and right, Figures 7A and 7D. Larger variability of different animals/injection sites is definitely expected.

      The viruses injected in Figure 3 and Figure 7 are different, since the AAV-carried fluorescent proteins differ among cortical areas. In addition, if the right and left panels of Figure 3A are carefully enlarged, the AAV-transfected areas can be seen to differ in morphology. The similarity of the injection areas noted by Reviewer #2 reflects the precision of our virus-injection sites.

      1. On page 49, are the green neurons in Figure 9B the BC cells? Just to be consistent, the authors should use the same color for BC cells as in Figure 9A. Also, label the primary and the secondary associative memory cells in Figure 9.

      Figure 9 has been thoroughly revised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Soudi, Jahani et al. provide a valuable comparative study of local adaptation in four species of sunflowers and investigate the repeatability of observed genomic signals of adaptation and their link to haploblocks, known to be numerous and important in this system. The study builds on previous work in sunflowers that has investigated haploblocks in those species and on methodologies developed to look at repeated signals of local adaptations. The authors provide solid evidence of both genotype-environment associations (GEA) and genome-wide association study (GWAS), as well as phenotypic correlations with the environment, to show that part of the local adaptation signal is repeatable and significantly co-occurs in regions harboring haploblocks. Results also show that part of the signal is species specific and points to high genetic redundancy. The authors rightfully point out the complexities of the adaptation process and that the truth must lie somewhere between two extreme models of evolutionary genetics, i.e. a population genetics view of large effect loci and a quantitative genetics model. The authors take great care in acknowledging and investigating the multiple biases inherent to the used methods (GEA and GWAS) and use a conservative approach to draw their conclusions. The multiplicity of analyses and their interdependence make them slightly hard to understand and the manuscript would benefit from more careful explanations of concepts and logical links throughout. This work will be of interest to evolutionary biologists and population geneticists in particular, and constitutes an additional applied example to the comparative local adaptation literature.

      Some thoughts on the last paragraph of the discussion (L481-497): I think it would be fine to have some more thoughts here on the processes that could contribute to the presence/absence of inversions, maybe in an "Ideas and Speculation" subsection. To me, your results point to the fact that though inversions are often presented as important for local adaptation, they seem to be highly contingent on the context of adaptation in each species. First, repeatability results are only at the window/gene level in your results, the specific mutations are not under scrutiny. Is it possible that inversions are only necessary when sets of small effect mutations are used, opposite to a large effect mutation in other species? Additionally, in a model with epistasis, fitness effects of mutations are dependent on the genomic background and it is possible that inversions were necessary in only certain contexts, even for the same mutations, i.e. some adaptive path contingency. Finally, do you have specific demographic history knowledge in this system that maps to the observations of the presence of inversions or not? For example, have the species "using" inversions been subject to more gene flow compared to others?

      Thank you for the great suggestions and helpful comments. Regarding the question of demography, each of the species actually harbours quite a large number of haploblocks (13 in H. annuus spanning 326 Mb, 6 in H. argophyllus spanning 114 Mb, and 18 in H. petiolaris spanning 467 Mb; see Todesco et al. 2020 for more details), so there does not seem to be any clear association with demography. We agree about the complexities that might underlie the evolution of inversions that you outline above, and have refined some of the text where we discuss their evolution in the Discussion.

      Reviewer #2 (Public Review):

      In this study the authors sought to understand the extent of similarity among species in intraspecific adaptation to environmental heterogeneity at the phenotypic and genetic levels. A particular focus was to evaluate if regions that were associated with adaptation within putative inversions in one species were also candidates for adaptation in another species that lacked those inversions. This study is timely for the field of evolutionary genomics, due to recent interest surrounding how inversions arise and become established in adaptation.

      Major strengths

      Their study system was well suited to addressing the aims, given that the different species of sunflower all had GWAS data on the same phenotypes from common garden experiments as well as landscape genomic data, and orthologous SNPs could be identified. Organizing a dataset of this magnitude is no small feat. The authors integrate many state-of-the-art statistical methods that they have developed in previous research into a framework for correlating genomic Windows of Repeated Association (WRA, also amalgamated into Clusters of Repeated Association based on LD among windows) with Similarity In Phenotype-Environment Correlation (SIPEC). The WRA/CRA methods are very useful and the authors do an excellent job at outlining the rationale for these methods.

      Thank you!

      Major weaknesses

      The study results rely heavily on the SIPEC measure, but I found the values reported difficult to interpret biologically. For example, in Figure 4 there is a range of SIPEC from 0 to 0.03 for most species pairs, with some pairs only as high as ~0.01. This does not appear to be a high degree of similarity in phenotype-environment correlation. For example, given the equation on line 517 for a single phenotype, if one species has a phenotype-environment correlation of 1.0 and the other has a correlation of 0.02, I would postulate that these two species do not have similar evolutionary responses, but the equation would give a value of (1 + 0.02) × 1 × 0.02 / 1 ≈ 0.02, which is a pretty typical "higher" value in Figure 4. I also question the logic behind using absolute values of the correlations for the SIPEC, because if a trait increases with an environment in one species but decreases with the environment in another species, I would not predict that the genetic basis of adaptation would be similar (as a side note, I would not question the logic behind using absolute correlations for associations with alleles, due to the arbitrary nature of signing alleles). I might be missing something here, so I look forward to reading the authors' responses on these thoughts.

      The reviewer makes a very good point about the range of SIPEC, and we have changed our analysis to reflect this, now reporting the maximum value of SIPEC for each environment (across the axes of the PCA on phenotypes that cumulatively explain 95% of the variance), in Figure 4 and Supplementary Figures S2 and S13. For consistency among manuscript versions and to illustrate the effect of this change, we retain the mean SIPEC value in one figure in the supplementary materials (S12), which shows the small effect of this change on the qualitative patterns. Figure 4 now shows that the maximum SIPEC value is regularly quite strong, which should address the reviewer’s concern that this is not being driven by anomalous and small values. We appreciate this point and think this change now more closely reflects how we are trying to estimate the biological feature of interest – that some axis of phenotypic space is strongly (or not) responding to selection from the environmental variable.

      With respect to the logic behind using absolute value, we still feel this is justified for traits, because if a trait evolves to be bigger or smaller, it may still use the same genes. For example, flowering time may change to be later or earlier, which would result in opposite correlations with a given environment, but might use the same gene (e.g. FT) for this. As such, we think keeping absolute value is more representative as otherwise species with strong but opposite patterns of adaptation would look like they were very different. We have added a statement on line 584 in the methods section to further clarify the reason for this choice.
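
      The reviewer's arithmetic and the two choices discussed in this reply (absolute values, and the maximum rather than the mean across phenotype axes) can be illustrated with a small sketch. It assumes the single-phenotype form implied by the reviewer's worked example, (|r1| + |r2|) × |r1| × |r2|; the definition actually used is the equation in the manuscript's Methods, which is not reproduced here and may differ, for example in its normalisation. The function names and the per-axis correlations are hypothetical.

```python
def sipec_single(r1, r2):
    """Similarity index for one phenotype axis, following the reviewer's worked example.

    ASSUMPTION: (|r1| + |r2|) * |r1| * |r2|; the manuscript's own equation may differ.
    """
    a, b = abs(r1), abs(r2)
    return (a + b) * a * b


def summarise(axis_pairs):
    """Return (mean, max) of the index across phenotype axes."""
    values = [sipec_single(r1, r2) for r1, r2 in axis_pairs]
    return sum(values) / len(values), max(values)


# Reviewer's example: one strong and one very weak correlation give a small value.
reviewer_value = sipec_single(1.0, 0.02)

# Opposite signs give the same value as matching signs, because of abs().
same_sign = sipec_single(0.6, 0.6)
opposite_sign = sipec_single(0.6, -0.6)

# Hypothetical correlations for three phenotype PC axes in two species.
axes = [(0.7, 0.65), (0.1, 0.05), (0.02, 0.3)]
mean_value, max_value = summarise(axes)
```

      With these made-up numbers the mean is pulled down by the axes with weak correlations, while the maximum reflects the single axis that responds most strongly in both species, which is the behaviour this reply describes.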

      An additional potential problem with the analysis is that from the way the analysis is presented, it appears that the 33 environmental variables were essentially treated as independent data points (e.g. in Figure 4, Figure 5). It's not appropriate to treat the environmental variables independently because many of them are highly correlated. For example in Figure 4, many of the high similarity/CRA values tend to be categorized as temperature variables, which are likely to be highly correlated with each other. This seems like a type of pseudo replication and is a major weakness of the framework.

      This is a good point and we fully agree. It is for this reason that we didn’t present any p-values or statistical tests of the overall patterns that are shown in these figures (i.e. the linear relationship between SIPEC and number of CRAs in figure 4 and the tendency for most points to fall above the 1:1 line in figure 5). But to make sure this is even more clear, we have added statements to the captions of these figures to remind readers that points are non-independent. We still feel that in the absence of a formal test, the overall patterns are strongly consistent with this interpretation. A smaller number of non-pseudo-replicated points in Figure 4 would still likely show linear patterns. Similarly, there are almost no significant points falling below the 1:1 line in Figure 5, and it seems unlikely that pseudoreplication would generate this pattern.

      Below I highlight the main claims from the study and evaluate how well the results support the conclusions.

      "We find evidence of significant genome-wide repeatability in signatures of association to phenotypes and environments" (abstract)
      Given the questions above about SIPEC, I did not find this conclusion well supported with the way the data are presented in the manuscript.

      We have changed the reporting of the SIPEC metric so that it more clearly reflects whichever axis of phenotypic space is most strongly correlated with environment in both species (using max instead of mean). This shows similar qualitative patterns but illustrates that this happens across much higher values of SIPEC, showing that it is in fact driven by high correlations in each species (or non-similar correlations resulting in low values of SIPEC). While we agree that the pseudo-replication problem prevents a formal statistical test of this hypothesis, the visual pattern is striking and seems unlikely to be an artefact, so we think this does still support this conclusion.

      "We find evidence of significant genome-wide repeatability in signatures of association to phenotypes and environments, which are particularly enriched within regions of the genome harbouring an inversion in one species." (Abstract) And "increased repeatability found in regions of the genome that harbour inversions" (Discussion)
      These claims are supported by the data shown in Figure 4, which shows that haploblocks are enriched for WRAs. I want to clarify a point about the wording here, as my understanding of the analysis is that the authors test if haploblocks are enriched with WRAs, not whether WRAs are enriched for haploblocks. The wording of the abstract is claiming the latter, but I think what they tested was the former. Let me know if I'm missing something here.

      We are actually not interested in whether WRAs are enriched for haploblocks; we want to know if WRAs tend to occur more commonly within haploblocks than outside of them. We have tried to clarify that this is our aim in various places in the manuscript. Our analysis for Figure 5 is the one supporting these claims, and it uses the Chi-square test statistic to assess the number of WRAs and non-WRAs that fall within vs. outside of inversions, and a permutation test to assess the significance of this observation, for each environmental variable and phenotype. We don’t think that this test has any direction to it – it’s simply testing if there is non-random association between the levels of the two factors. Thus, we think the wording we have used is consistent with the test result and our aims. Perhaps the confusion arose from the two methods that we present in the Methods (one is used for Figure 5, the other for Figure S6C & D), so we have added clarifications there.
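
      The kind of test described in this reply can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the response does not specify the permutation scheme, so the sketch simply shuffles the WRA labels across windows, and the window counts and function names are hypothetical.

```python
import random


def chi_square_2x2(is_wra, in_block):
    """Pearson chi-square statistic for the 2x2 table of two boolean factors."""
    n = len(is_wra)
    # table[i][j]: i = WRA (0) or not (1); j = inside haploblock (0) or outside (1).
    table = [[0, 0], [0, 0]]
    for w, b in zip(is_wra, in_block):
        table[0 if w else 1][0 if b else 1] += 1
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def permutation_p_value(is_wra, in_block, n_perm=999, seed=0):
    """Compare the observed statistic with statistics from shuffled WRA labels."""
    rng = random.Random(seed)
    observed = chi_square_2x2(is_wra, in_block)
    shuffled = list(is_wra)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # ASSUMPTION: labels are exchangeable across windows
        if chi_square_2x2(shuffled, in_block) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical windows: 40 inside haploblocks (20 of them WRAs),
# 160 outside (10 of them WRAs).
in_block = [True] * 40 + [False] * 160
is_wra = [True] * 20 + [False] * 20 + [True] * 10 + [False] * 150
stat, p_value = permutation_p_value(is_wra, in_block)
```

      Swapping the two arguments of `chi_square_2x2` transposes the table and leaves the statistic unchanged, which is one way to see the point made above that the test has no direction: it only asks whether the two factors are associated.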

      Notwithstanding the concerns about highly correlated environments potentially inflating some of the patterns in the manuscript, to my knowledge this is the first attempt in the literature to try this kind of comparison, and the results do generally suggest that inversions are more likely capturing, rather than accumulating, adaptive variation. However, I don't think the authors can claim that repeated signatures are enriched with haploblock regions, and the authors should take care to refrain from stating the relative importance of different regions of the genome to adaptation without an analysis.

      Actually, we don’t have a strong feeling about whether inversions are capturing vs. accumulating adaptive variation, as these results could be consistent with either. As described above, we do not understand why we can’t claim that repeated signatures are enriched within haploblocks. We think the reviewer is perhaps referring to the fact that the points are pseudo-replicated in the figures due to environment. We note that a very large number of points are significantly different from random in terms of the distribution of WRAs within vs. outside of haploblocks (light- vs. dark-shaded symbols), and that almost all of them fall above the 1:1 line. While there may be pseudo-replication preventing a test of the bigger multi-environment/multi-species hypothesis across all phenotypes and environments, there is almost a complete lack of significant results in the other direction. This seems like quite strong evidence of enrichment of WRAs within haploblocks, across many environments/species contrasts. We have added some text to the description of patterns in Figure 5 to try to clarify this.

      "While a large number of genomic regions show evidence of repeated adaptation, most of the strongest signatures of association still tend to be species-specific, indicating substantial genotypic redundancy for local adaptation in these species." (Abstract)
      Figure 3B certainly makes it look like there is very little similarity among species in the genetic basis of adaptation, which leaves the question as to how important the repeated signatures really are for adaptation if there are very few of them. (Is 3B for the whole genome or only that region?). This result seems to be at odds with the large number of CRAs and the claims about the importance of haploblock regions to adaptation, which extend from my previous point.

      Figure 3B is for the whole genome, we have added text to the figure caption to clarify this. We think that both interpretations are possible: that most of the regions of the genome that are driving adaptation are non-repeated, but that a small but significant proportion of regions driving adaptation are repeated above what would be expected at random. Thus, it seems that there is high redundancy, coupled with adaptation via some genes that seem particularly functionally important and non-redundant, and therefore repeated. We added clarifying text on lines 541-548.

      "we have shown evidence of significant repeatability in the basis of local adaptation (Figure 4, 5), but also an abundance of species-specific, non-repeated signatures (Figure 3)"

      While the claim is a solid one, I am left wondering how much of these genomes show repeated vs. non-repeated signatures, how much of these genomes have haploblocks, and how much overlap there really is. Finding a way to intuitively represent these unknowns would greatly strengthen the manuscript.

      We agree, and really struggled to find the best way to communicate both the repeated patterns and the large amount of non-repeated signatures. Unfortunately, we have more confidence in the validity of repeated patterns because for the non-repeated patterns, a strong signature of association to environment in only one species could just be the product of structure-environment correlation, as we didn’t control for population structure. Thus, trying to quantify the proportion of non-repeated signatures is difficult to do with any accuracy and we preferred to avoid putting too much emphasis on the simple calculation of the proportion of top candidate windows that were also WRAs.

      Overall, I think the main claims from the study, the statistical framework, and the results could be revised to better support each other.

      Although the current version of the manuscript has some potential shortcomings with regards to the statistical approaches, and the impact of this paper in its present form could be stifled because the biology tended to get lost in the statistics, these shortcomings may be addressed by the authors.

      With some revisions, the framework and data could have a high impact and be of high utility to the community.

      Thank you for your very helpful comments and suggestions on our paper, we really appreciate it.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Editor's comments:

      The reviewers make a series of reasonable suggestions that I echo. I found the paper quite hard to follow, and got fairly lost in the various layers of analyses done. Partially, this represents the complexity of empirical genomic data, which rarely deliver simple stories of convergence at a few genes. However, the properties of the various statistics used to detail local adaptation and convergence are not particularly clear and the figures presented were not intuitive representations of the data. This leaves the reader with an incomplete view of how much weight to put in the various lines of evidence marshaled. I would suggest simplifying the presentation of the results considerably. I add a few additional comments below.

      Great suggestion, we’ve added a schematic overview of the methods and main research questions to Figure S1 in the supplementary materials.

      A figure would help showing some of the signals of SNPs with putative signals of convergent environmental correlations across species, e.g. frequencies plotted against climate variables. This would help readers get a sense of how strong these signals were. These could be accompanied by the statistics calculated for these SNPs, that would allow the reader to start to get some intuitive sense of what the numbers mean.

      Great suggestion, we have added a schematic overview of the methods to Figure S1 that shows some of the values and illustrates how the methods work using visual examples from our data.

      In general, the introduction and some of the discussion of the inversion results feel oddly framed:

      Abstract line 36: "This shows that while inversions may facilitate local adaptation, at least some of the loci involved can still make substantial contributions without the benefit of recombination suppression."

      We have changed “some of the loci involved can still make substantial contributions without the benefit of recombination suppression” here to “some of the loci involved can still harbour mutations that make substantial contributions without the benefit of recombination suppression in species lacking a segregating inversion” as it hopefully clarifies that we’re not talking about individual alleles that are present in both species.

      Models of the role of local adaptation in the establishment of inversions (Kirkpatrick & Barton) assume that there are multiple locally adapted alleles already present. It is the load created by these alleles being constantly maintained in the face of migration and subsequent recombination that allow an inversion to be selected for because it keeps together locally adapted alleles. Thus these models predict that there could well be standing local adaptation at these loci in the absence of the inversion in other species, and that these locally adapted alleles while not fixed may be at high frequency. (After establishment, inversions housing locally adapted alleles, can shield more weakly, locally beneficial alleles from migration allow other alleles to build up.) Empirically it's interesting to find signals of local adaptation in other species that don't contain putative inversions. But the logic of the different predictions is not particularly clear from the introduction, and only becomes somewhat clearer in the discussion.

      Thank you for pointing out this murkiness, we have re-written portions of both the Introduction and Discussion to clarify this aspect.

      From the introduction: Inversions have been implicated in local adaptation in many species (Wellenreuther and Bernatchez 2018), likely due to their effect to suppress recombination among inverted and noninverted haplotypes, and thereby maintain LD among beneficial combinations of locally adapted alleles (Rieseberg 2001; Noor et al. 2001; Kirkpatrick and Barton 2006). This has been approached by models studying the establishment of inversions that capture combinations of locally adapted alleles present as standing variation (e.g., Kirkpatrick and Barton 2006), as well as models examining the accumulation of locally adapted mutations within inversions (e.g., Schaal et al. 2022). If there is variation in the density of loci that can potentially contribute to local adaptation, inversions would be expected to preferentially establish and be retained in regions harbouring a high density of such loci (and this expectation would hold for both the capture and accumulation models). We would also expect to see stronger signatures of repeated local adaptation in such high density regions. Despite mounting evidence of their importance in adaptation, it is unclear how inversions may covary with repeatability of adaptation among species. A fundamental parameter of importance in these models is the relationship between migration rate and strength of selection on individual alleles, which may not make persistent contributions to local adaptation without the suppressing effects of recombination if selection is too weak (Yeaman and Whitlock 2011; Bürger and Akerman 2011). If most alleles have small effects relative to migration rate and can only contribute to local adaptation via the benefit of the recombination-suppressing effect of an inversion, then we would expect little repeatability at the site of an inversion – other species lacking the inversion would not tend to use that same region for adaptation because selection would be too weak for alleles to persist. 
      On the other hand, if some loci are particularly important for local adaptation and regularly yield mutations of large effect, with these patterns being conserved among species, repeatability within regions harbouring inversions may be substantial. Thus, studying whether adaptation at the same genomic region harbouring an inversion is observed in other species lacking the inversion can give insights about the underlying architecture of adaptation, and the evolution and maintenance of inversions.

      From the Discussion: The observed repeatability associated with inversions further supports the local adaptation model as an explanation for the long-term persistence of segregating inversions (at least in sunflowers), rather than mechanisms based on dominance or meiotic drive (Rieseberg 2001). If there is variation across the genome in the density of loci with the potential to be involved in local adaptation, then the establishment and maintenance of inversions would be biased towards regions harbouring a high density of such loci under this model. If the genomic basis for local adaptation is conserved amongst species, then these same regions are more likely to have high repeatability. Thus, our observation of genomic regions harbouring inversions also being enriched for WRAs is consistent with this general model for inversion evolution. Unfortunately, our observations do not provide much insight into whether inversions evolve through the capture (e.g. Kirkpatrick and Barton 2006) or accumulation (e.g. Schaal et al. 2022) type of model, as either model would be consistent with our results. Most of the sunflower inversions are >1 My old, and therefore predate any current local adaptation patterns, but likely do not predate the genes underlying local adaptation (which appear to be shared among the species we studied). As for the alleles underlying local adaptation, they may be younger than the inversions, but as our work suggests, these regions are prone to harbouring locally adaptive alleles so it is possible that they also harboured other ancestral locally adaptive alleles.

      As a minor comment, there's a fair number of places where a more nuanced view of the field is needed, e.g.:

      "Models in evolutionary genetics tend to focus on extremes: population genetic approaches explore cases where strong selection deterministically drives a change in allele frequency" --This seems like a strange strawman. Population genetic models span a huge parameter range. The empirical approaches of looking for sweeps by detecting genome-wide statistical outliers is predicated on strong selection, but there are numerous papers that have looked for signals of weak selection genome-wide.

      Good point, we have changed our wording here.

      Reviewer #1 (Recommendations For The Authors):

      Comments

      My main comment on the manuscript is that the different levels and diversity of analyses are slightly hard to follow on the first, and even second, read. As there are several layers of correlations and comparisons, as well as some independent analyses, I wonder if it might be helpful to have a summary schematic figure of how all analyses fit together.

      Great idea, we have added Figure S1 that summarizes the main flow of the methods and research questions.

      • L169-171: Would it be more accurate to say that SIPEC is maximized when both species have strong correlations for an environmental variable across the same phenotypes? But maybe I misunderstood the index.

      Good point, we have now simplified SIPEC, reporting the max instead of the mean, which we think better reflects when similar patterns are happening in both species for some phenotype.

      • L191: Given the discussion in the introduction and elsewhere about the correction for population structure, which version is used here? Same for Figure 3.

      We have added clarification there.

      • L348: One [environmental] variable?

      Added

      • L353: Maybe add a percentage indication for 387 so that it is comparable to the following 23.3%.

      Good point, added

      • L388 and paragraph: You mention "significant repeatability" but it is hard from the results at this point to have a broad idea of the amount of signal that is repeatable. Would it be possible to add here some quantitative measure of the proportion of signal repeatable or not, even if approximated?

      I wish we could, but I think the precision implied by such an approximation would involve a huge amount of uncertainty and likely inaccuracy. Because it is so hard to conclusively identify how many loci are significant but non-repeated, we really don’t have a good handle on the denominator here. We are pretty confident that the repeated loci are strongly enriched for true positives, but the non-repeated loci are also almost certainly strongly enriched for false positives. While we really want to be able to quantify this explicitly, we don’t think it’s possible given our data.

      • L415-418: "If there is variation [...] involved in local adaptation", I do not follow this argument, could you rephrase?

      Changed

      • L447-450: As you say in the supplementary methods, your analyses exclude 3/4 of the genome. Do you think this choice has a large impact on the number of outliers observed here as the genome-wide baseline would change?

      This is a very good question, but one that is quite complex and without a clear answer – we chose not to delve into it in the paper to keep the discussion streamlined. My (SY) feeling is that it is unlikely that regions harbouring transposable elements would contribute much to adaptation, but I think we really don’t know if that is true. Even excluding ¾ of the genome harbouring TEs, ¼ of the genome still constitutes a huge amount of sequence and a very large number of genes and it seems plausible that most genes and genic regions would not contribute to adaptation for a given trait, so I don’t think this would change the results too much in a qualitative way – but would almost certainly change the number of windows that are significant, etc.

      • L455-457: "As we are unable [...] potentially important drivers" Could you provide the logical link here between loci of small effect and them being important drivers. I presume you mean that the large effect loci found here only account for a small proportion of the heritability?

      Yes that’s what we meant here, so we’ve added some clarification.

      • L482: "enriched within inversions" should that be 'in genomic regions where there exist inversions in at least one species'?

      Thanks for catching that, yes. Changed.

      • Methods/SIPEC L512: Compared to the Results section it is unclear here what is referred to as an "environment" Is it a variable or a set of environment variables?

      This is done per environmental variable.

      I find the presence of the PCA for environment variables in Figure 2 misleading as my first interpretation was that PCs for environment were also used.

      Good point, we have clarified this on line 190-193.

      Maybe one potential addition to the formula would be to add an environment variable $j$ notation such that it reads "$SIPEC_j = \sum_i (|r_{ij,1}| + ...) ...$ where ... between environment variable $j$". I had initial difficulties to understand how this SIPEC was computed relating to environmental variables and this might help.

      Given the other changes we made to SIPEC, we felt it was simpler to just present it as a single calculation on a given combination of phenotype and environment for a pair of species, and then discuss taking the mean and maximum of this later.

      Finally, PCA axes explaining 95% of the variance are used, I would find it interesting to see how many PCs are used in comparison to the number of traits being measured.

      We have added the following sentence to the methods describing this:

      "For comparisons including H. argophyllus, 95% of the variance was typically explained by 8-10 PC axes (out of 28 or 29 phenotypes), whereas for comparisons among other taxa this included 21 or 22 PC axes (out of 65 or 66 phenotypes)."
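For readers who want to see what "the number of PC axes explaining 95% of the variance" means in practice, the following is a minimal sketch, not the authors' pipeline. It assumes a samples-by-phenotypes matrix `X` with no missing values, and the function name `n_components_for_variance` is hypothetical. Whether phenotypes are standardized before the PCA is not stated in the response, so the sketch only centers the columns.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.95):
    """Smallest number of PC axes whose cumulative explained variance
    reaches `threshold` (hypothetical helper, for illustration only)."""
    X = np.asarray(X, dtype=float)
    # Center each phenotype (column); scaling is left to the caller.
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered matrix are proportional
    # to the variance captured by each principal component.
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # First index at which the cumulative fraction reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(s))
```

Comparing the returned count with `X.shape[1]` gives the kind of "PC axes out of phenotypes" figure quoted above.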

      Typos

      L52: --

      Changed

      L254: portions [of] their

      Changed

      L399: additional closing parenthesis

      Changed

      L458: signatures [of] repeated association

      Changed

      L554: performed [on]

      Changed

      L578: 5 ~~kp~~/kb windows

      Changed

      L601: ~~casual~~/causal SNPs

      Changed

      L615: ~~widow~~/window

      Changed

      L732: ~~Banding~~/Banting Postdoctoral Fellowship

      Changed

      L1002 & L960: [Supplementary] Figure

      Changed

      Supplementary: Some figure titles are in bold and others are not.

      Changed

      Reviewer #2 (Recommendations For The Authors):

      Overall I found the writing to be very clear and easy to follow. Despite my comments, it was clear that a lot of thought went into how to conduct the tests and visualize the results. I recommend ending the Discussion on a positive note, rather than an impossible test.

      Thanks for the positive suggestion, we have done this.

      In Figure 5, is the temperature variable missing in the legend and in the plot?

      No, for this plot we just combined the temperature/precipitation variables into one variable called “climate”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The first major issue is related to the imaging and tracking experiment to examine the formation and migration of F-actin foci as illustrated in Figure 3. The formation and centripetal migration of F-actin foci is a significant finding of this manuscript for the promotion of B cells to switch from the spreading to the contraction response. Thus, I recommend that the authors conduct one more rigorous fluorescent molecular tracking experiment to confirm this phenomenon. Molecular tracking usually requires low labeling density, and the LifeAct-GFP labeling here does not meet this requirement, which may cause misidentification of the moving molecules. Permeable dye-based fluorescent speckle microscopy is recommended here to track the actin foci if applicable (P. Risteski, Nat. Rev. Mol. Cell Biol., 2023, DOI: 10.1038/s41580-023-00588-w & K. Hu, et al, Science, 2007, 315, 111-115).

      We thank the reviewer for the suggestion. We conducted the suggested experiment using membrane-permeable SiR-actin to track B-cell actin dynamics. Unfortunately, two significant issues prevented us from confirming the LifeAct-GFP results using fluorescent speckle microscopy. First, the concentration of SiR-actin required to visualize F-actin in the contact zone of mouse primary B-cells was relatively high due to their smaller sizes (~6 µm diameter) and non-adherent nature. With such a relatively high concentration of SiR-actin, we could not perform fluorescent speckle microscopy. Second, we observed that SiR-actin appeared to stabilize actin structures and reduce actin dynamics, further limiting its use in studying actin dynamics in B-cells.

      Additionally, kymographs are used for foci tracking in Figure 3 and Figure 4. A kymograph is indeed a powerful tool for tracking cell protrusion and retraction but is not well suited here, since an F-actin focus is a concentrated point which may not move strictly along the eight selected lines used to generate the kymographs. Another image processing method should be used to track the foci; for example, a time series max projection is recommended if applicable.

      We thank the reviewer for the suggestion and have tried the time series max projection. Unfortunately, it did not provide the resolution to identify individual actin foci, again probably due to the small size of primary mouse B-cells. While kymographs may not track the entire paths of these moving foci, we believe that the conclusions drawn from the kymograph analysis in Figures 3 and 4 are reasonable. We generated eight kymographs for each cell in Figure 3 and three kymographs for each cell in Figure 4 to follow as many actin foci as possible within the spreading to contraction transition time window. Our analysis in Figure 3 identifies the fraction of actin foci originating from lamellipodia. In Figure 4, we used the kymographs to trace the path of putative clusters and used these to calculate their relative lifetimes and speed. While this is not what was suggested by the reviewer, our analysis provides qualitatively similar information to the time series max projection and reasonable comparisons between contracted and non-contracted cells, inhibitor-treated and untreated cells, and wild-type and WASP KO cells.
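The difference between the two approaches discussed in this exchange can be illustrated with a minimal sketch. It is not the analysis used in the manuscript: it assumes an image stack stored as a NumPy array of shape (time, y, x), and the function names `time_max_projection` and `kymograph` are hypothetical. A time series max projection keeps the brightest value seen at every pixel, so any moving focus leaves a track; a kymograph only records intensities along one chosen line, so a focus that leaves the line is missed.

```python
import numpy as np

def time_max_projection(stack):
    """Collapse a (T, Y, X) stack to (Y, X) by taking the per-pixel
    maximum over time (hypothetical helper, for illustration only)."""
    return np.asarray(stack).max(axis=0)

def kymograph(stack, start, end, n_samples):
    """Sample each frame along the straight line from `start` to `end`
    (both given as (y, x) pixel coordinates) using nearest-neighbour
    lookup. Returns an array of shape (T, n_samples)."""
    stack = np.asarray(stack)
    ys = np.rint(np.linspace(start[0], end[0], n_samples)).astype(int)
    xs = np.rint(np.linspace(start[1], end[1], n_samples)).astype(int)
    # Advanced indexing picks the same line of pixels in every frame.
    return stack[:, ys, xs]
```

In a kymograph built this way, a focus travelling along the line appears as a diagonal streak whose slope reflects its speed, which is the kind of readout described for Figure 4.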

      The second major issue is about the relationship between actin foci formation and NMII recruitment in Figure 5. The authors conclude that 'N-WASP and Arp2/3 mediated branched actin polymerization promotes the recruitment and the reorganization of NMII ring-like structures by generating inner F-actin foci in the contact zone'. However, there is a lack of strong evidence to directly show the mechanism by which myosin is recruited and the upstream and downstream relationship between actin foci migration and myosin recruitment. Since myosin-induced actin retrograde flow is a classical model in adherent cells, is it possible that, here also in activated B cells, the recruited myosin drives the formation and migration of actin foci? This reviewer recommends that the authors investigate whether myosin blocking (e.g., using Y27632) can eliminate F-actin foci formation and migration.

      This is an excellent suggestion! In the revised manuscript, we have included new data showing that treatment with the non-muscle myosin II motor inhibitor blebbistatin, which is known to inhibit B-cell contraction but not spreading on Fab’-PLB (Seeley-Fallen et al. 2022. Frontiers in Immunology), interferes with the formation of inner actin foci ring-like structures, which are associated with B-cell contraction. These results together suggest that the generation of inner actin foci ring-like structure depends on the coordination between N-WASP-mediated actin polymerization and myosin contractile activity. We chose to use blebbistatin rather than Y27632 to inhibit non-muscle myosin II because in addition to the ROCK pathway, myosin light chain kinase can also activate myosin II, and Y27632 may have additional effects besides inhibiting myosin activity. The new data are shown in Figure 5G and H and discussed in the revised manuscript.

      Reviewer #2 (Public Review):

      Weaknesses: Minor as listed below. The working hypothesis of molecular crowding as a way to push out signalling molecules from the BCR dense foci is interesting. The authors provide evidence that this is an active process mediated by N-WASP - Arp2/3 induced actin foci. Another possibility is that BCR dense foci formation is an indirect consequence of lamellipodia retraction. Future work should define the specific role of N-WASP, Arp2/3 and actin in the process of forming BCR dense foci, especially as the BCR continues to signal in the cytoplasm.

      We thank the reviewer for the comments. We have included the possibility that lamellipodial retraction may be involved in increasing the molecular density of BCR clusters and suggested future studies on the potential roles of N-WASP-dependent inner actin foci and actomyosin structures in BCR internalization and intracellular signaling in the Discussion section.

      Reviewer #3 (Public Review):

      The authors prove their claims by means of thorough image analysis, mainly observing and quantifying the fluorescence and the dynamics of single clusters of antigen and actin foci and analyzing two-color dynamic images. They perform their observations in control cells, in pharmacologically perturbed cells where the action of Arp2/3 or N-WASP is inhibited, and in modified primary cells (derived from genetically engineered mice) to silence N-WASP or WASP. The work is sound and complete, the experiments technically excellent and well explained. Some experiments and discussions are objectively harder to describe, and given the length of the work, the reader might sometimes get lost. A graphical abstract/summary of the main way N-WASP ultimately controls signal attenuation would solve this minor point.

      We greatly appreciate the reviewer’s confirmation of our data quality and are delighted to accept the reviewer’s suggestion. In the revised manuscript, we have included a new figure (Figure 10) in the Discussion section, summarizing the results presented in the manuscript as a working model.

      Reviewer #1 (Recommendations For The Authors):

      Some minor points: Figure 1C, E, G and I shows three individual symbols, indicating three independent experiments described in legend. Please double check for accuracy.

      It is better to show statistical data with representative repeat, not the merged means of independent experiments. For example, figure 1C even indicates three "0" data in CK-666 treated cells, meaning no contracting cell was found in ~75 cells, while there are other repeats showing 45% - 50% contracting cells. This applies to all figures involving individual cell imaging data, such as figure 2D, in which 30 cells from three independent experiments were pooled. The authors shall clearly state that those independent experiments are statistically indistinguishable before pooling the data.

      We agree with the reviewer’s comments that these data have variability from individual mice, the quality of isolated primary B-cells, and the lateral mobility of planar lipid bilayers. To show the variability, we displayed the data from each experiment as individual data points. In the revised manuscript, we have utilized three colors of dots to represent three independent experiments in Figure 1C, E, G, and I, Figure 2B-G, and new Figure 5H, which show that the data from the three experiments have the same trend despite the variability.

      In Figure 7B-C, Figure 8, and Figure 9, it was hard to understand which groups the significance tests compared. Please describe this in more detail in the figure legend or the Methods section.

      In the legend, the authors claimed blue points in Figure 7B represented individual pCD79a clusters within an equal number of BCR clusters from each time points. The authors used means to qualify the change of blue points distribution. These shall be clearly stated in the Methods. Total BCR cluster numbers shall be shown also. This applies to Figure 7B, 7C, 7D and all figures in figure 8 and figure 9.

      We thank the reviewer for pointing it out. We have revised Figures 7-9, where we utilized square braces to indicate groups of clusters (blue points) being compared. We have also provided additional information in the figure legend and Method sections.

      Reviewer #2 (Recommendations For The Authors):

      199-200: What is the consequence of increased WASP activation in N-WASP knockout B cells? Is this evaluated as increased pWASP activity and/or increased actin polymerization of WASP knockout B cells? Do WASP and N-WASP have an additive or counteractive effect on each other during spreading and contraction?

      Indeed, the relationship between WASP and N-WASP, which are co-expressed in B-cells and other immune cells, is fascinating. Our previous studies, using WASP germline knockout, B-cell-specific N-WASP knockout, and WASP and N-WASP double knockout mice, showed that WASP and N-WASP have both additive and counteractive effects during B-cell spreading, but B-cell contraction only depends on N-WASP (Liu et al. 2013. PLoS Biol). Double knockout B-cells fail to spread, and WASP knockout B-cells show reduced spreading but still contract, showing their additive effects. However, WASP and N-WASP suppress each other for activation, as detected by their phosphorylation. Phosphorylated WASP increases in the B-cell contact zone first, and phosphorylated N-WASP increases later when the phosphorylated WASP level decreases. Knocking out one of them enhances the phosphorylation of the other. Consequently, N-WASP knockout B-cells show increased spreading, probably due to enhanced activation of WASP, but exhibit delayed contraction. The revised manuscript has expanded the discussion on this area to relate it to the results presented in this manuscript.

      560-563: Was Syk and SHIP-1 measured in the same cell? If not, the conclusion should be tempered.

      Unfortunately, antibodies specific for Syk and SHIP-1 were from the same host, which did not allow us to stain them in the same cells. The revised manuscript has discussed this as a shortcoming of our work.

      1204-1205: Explain better "three randomly positioned kymographs were generated" - how were they selected?

      We apologize for this unclear sentence. The three kymographs were positioned to track as many inner F-actin foci as possible.

      328: Change "abolished" to "reduced" to describe the data. 354-356: Unclear sentence, please edit. 1171: (H) should be (G). 1325: "PI" should be "FI".

      We thank the reviewer for finding these typos and unclear sentences. We have made the corrections accordingly.

      Methods: The description of the TIRF microscopy method is good. Regarding the image analysis, it is somewhat difficult to have a good understanding of what was analyzed just by reading the text. Please show an example of the pipeline for the analysis from a raw image and the processing steps.

      Figure 6-figure supplement 2 shows the image analysis process for tracking Fab’ clusters. We utilized the same approach for the image analysis of Figures 7-9.

      Discussion: Add a paragraph to state the limitations of the study. How do the findings here translate into in vivo activation of B cells and how can this be addressed based on the data presented in this study.

      We thank the reviewer for the suggestion. In several paragraphs of the revised Discussion section, we have brought up the limitations of the study and how these limitations affect the data interpretation. In addition, we have added Figure 10 and the associated text to present our working model, which explains how our findings reveal the cellular mechanism by which BCR surface signaling amplification transitions into attenuation, likely occurring in vivo.

      Figure 2: Add an example of the image analysis for foci determination. From the images, it is not always clear what is a focus and what is not, which makes the "number of foci" data difficult to evaluate.

      We have added arrows to Figure 2A to indicate all identified inner F-actin foci in images.

      Figure 3: add a kymograph for the WKO analysis.

      In the revised Figure 4, we have provided a kymograph of a WKO B cell.

      Figure 4M: the analysis of the "relative speed" of the "WT" samples is lower compared to the other control samples "DMSO" and "CK-689". The conclusion is that WKO have similar "relative speed" as "WT" cells, but in fact the "WT" cells may have responded poorly in this experiment. What is the author's experience and explanation?

      We agree that the relative speeds of inner actin foci in the contact zone of WT and WKO B-cells are relatively low compared to DMSO and CK-689. Based on our experience, this parameter is very sensitive to the lateral mobility of planar lipid bilayers. We could only perform one pair of conditions using live cell images each time. The WT and WKO experiments were done at the end and might use relatively aged liposomes. However, it did not affect the number of inner actin foci formed and their relative lifetime, consistent with their similar relative speeds. Unfortunately, we lost the LifeAct-GFP-expressing WKO mouse colony and cannot redo this experiment using freshly made liposomes within a reasonable time.

      Figure 7B-D: Add a more detailed legend for the black and brown lines in the dot plots.

      We have expanded the legend for Figure 7B-D to provide additional details.

      Figure 8-9: Show representative images for SYK, pSYK, SHIP-1 and pSHIP-1. Add a more detailed legend for the black and brown lines in the dot plots.

      We have provided representative images for Syk, pSyk, SHIP-1, and pSHIP-1 in revised Figure 8 and 9.

      Reviewer #3 (Recommendations For The Authors):

      From the paper one understands that NMII is recruited by the actin foci and this recruitment pushes the foci towards the center of the synapse, in what resembles a positive feedback. Could the authors better elucidate this point? What happen at the peak of NMII recruitment? Could this be a mechanism used by the cell to end the contact and detach (which probably cannot be observed in this experimental setup)?

      This is an excellent comment! We have recently shown that NMIIA recruitment peaks right before B-cell contraction occurs, and inhibition of NMII by inhibitors or B-cell conditional knockout blocks B-cell contraction and enhances signaling (Seeley-Fallen et al. 2022. Frontiers in Immunology). In the revised manuscript, we have included new data showing that treatment with the NMII motor inhibitor blebbistatin, which is known to inhibit B-cell contraction but not spreading on Fab’-PLB (Seeley-Fallen et al. 2022. Frontiers in Immunology), interferes with the formation of inner actin foci associated with B-cell contraction. These results together suggest that the generation of inner actin foci depends on the coordination between N-WASP-activated actin polymerization and myosin contractile activity, supporting the reviewer’s comment. The new data are shown in Figure 5G and H and discussed in the revised manuscript.

      Whether the recruited NMII pulls B-cells away from antigen-presenting surfaces remains an interesting question. We have previously shown that high-affinity interaction of surface BCRs with membrane-anchored antigen can cause NMII-dependent B-cell membrane permeabilization, which triggers lysosome exocytosis and lysosomal enzyme-mediated antigen cleavage, allowing antigen internalization and presentation to T-cells (Maeda et al. 2021. eLife). Furthermore, NMII is required for B cells to internalize surface antigens (Natkanski et al. 2013. Science). These results support the possibility that actomyosin structures formed during B-cell contraction may further drive B-cells to internalize antigen. We have discussed this interesting point in the revised manuscript.

      Some experiments/quantifications are a bit more complex than others and a reader might find them hard to follow (in particular Figs 7, 8 and 9). The comprehension could be improved by providing a guide to read them. E.g. it is not clear what the population distribution represents (and it is not particularly affected by any manipulation). How were the groups for testing chosen? It seems they are based on intensity categories taken every 100 units: is that the case? Even if arbitrary, this should be stated in the legend.

      We thank the reviewer for understanding the complexity of image analysis and pointing out the unclear points. Based on the reviewer’s comments, we have revised Figures 7-9 and the figure legends. We used square brackets to indicate the groups of clusters (blue points) being compared. The comparison groups were chosen arbitrarily based on Fab’ peak fluorescence intensity, every 90 units for Figures 7 and 8 and every 100 units for Figure 9.
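
      The grouping described above can be illustrated with a minimal sketch. It assumes that bins start at zero and that each cluster is a pair of (Fab’ peak fluorescence intensity, measured value); the function name `group_by_intensity` and the example numbers are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict


def group_by_intensity(clusters, bin_width):
    """Group (peak_intensity, value) pairs into fixed-width intensity bins.

    Bin 0 covers [0, bin_width), bin 1 covers [bin_width, 2 * bin_width), etc.
    Starting the bins at zero is an assumption made for illustration.
    """
    groups = defaultdict(list)
    for peak_intensity, value in clusters:
        bin_index = int(peak_intensity // bin_width)
        groups[bin_index].append(value)
    return dict(groups)


# Hypothetical clusters: (Fab' peak fluorescence intensity, measured ratio).
clusters = [(35.0, 0.8), (95.0, 1.1), (170.0, 1.4), (185.0, 1.6), (260.0, 1.9)]

# A bin width of 90 units, as stated for Figures 7 and 8.
groups = group_by_intensity(clusters, bin_width=90)
for bin_index in sorted(groups):
    low, high = bin_index * 90, (bin_index + 1) * 90
    print(f"[{low}, {high}): n={len(groups[bin_index])}")
```

      Each resulting group would then be compared between conditions with the statistical test named in the figure legend.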

      Can the author speculate on how the actin organization passes from actin foci to recruitment of NMII and arc formation? Is it a rearrangement of the actin network (percolation) or simply recruitment of monomers?

      Our previous and new results show that both N-WASP-activated Arp2/3 and NMII are required to form inner F-actin foci. Based on these results, we speculate that N-WASP- and Arp2/3-mediated actin polymerization may initiate the process and recruit NMII, and recruited NMII coordinates with actin polymerization to reorganize actin structures, promoting inner actin foci maturation and arc formation. We have included these possibilities in the revised discussion.

      The role of SHIP recruitment as way to inhibit the signal downstream of the BCR is an interesting finding. Is this related to the termination of the synapse? Could we relate the time scales (accurately measured in this work) to contact times observed in vivo?

      The reviewer raises an interesting question. In the discussion section, we have speculated that the actomyosin structures responsible for B-cell contraction are potentially the precursor cytoskeleton structures for antigen internalization. However, the relationship of B-cell contraction and signaling attenuation with the termination of the synapse remains unclear.

      The BCR has been shown to be internalised mechanically: do these new data suggest a mechanisms for force generation in antigen internalization at the actin foci? Related to that, how do the dynamics of N-WASP recruitment relate to the force measurement highlighted in Traction Force Microscopy experiments (see for example Wang Sci.Signal. 2018, Kumari Nat.Comm.2019)? What happens in situation when the actin foci are unable to get transported, e.g. as on the more classical antigen on coverslip configuration?

      Indeed, our results allow us to speculate that the actomyosin structures responsible for B-cell contraction potentially contribute to antigen internalization by mechanical forces. We previously showed that the B-cell-specific N-WASP knockout drastically reduced BCR internalization of soluble antigen (Liu et al. 2013. PLoS Biol), and that NMII is required for BCR internalization of membrane-associated antigen (Maeda et al. 2021. eLife and Natkanski et al. 2013. Science). The effect of N-WASP knockout on the internalization of membrane-associated antigen and traction forces generated at the contact membrane and whether traction forces are generated from the inner F-actin foci have not been determined but will be pursued in the future.

      Our previous publication compared the BCR and actin dynamics of B-cells interacting with Fab’ tethered to planar lipid bilayers (Fab’-PLB) and cover glass (Fab’-G) (Ketchum et al. 2014. Biophys J). B-cells interacting with Fab’-G neither contract nor generate inner F-actin foci, and they exhibit less dynamic BCR clusters and actin cytoskeleton than B-cells interacting with Fab’-PLB. On glass, actin foci remain coincident with Fab’ clusters, whereas on PLB they are positioned behind Fab’ clusters, where they drive the clusters' centripetal movement.

      Minor remarks: When several experiments (mice) are presented in dot plots (e.g. fig 2D-G 4J-M), color dot plot (so called "smart plot") where each experiment is identified by a color, could be used to highlight the sample-to-sample variability.

      This is an excellent suggestion. In the revised manuscript, we have utilized three shades of dots to represent the data points from three independent experiments.

      Fig 6A: the fluorophore should be indicated in the picture (Fab'-AF546)

      The suggested correction has been made.

      Fig 6D: how is the contraction phase (purple rectangle) determined? Curve by curve or on the average curve? Please specify this in the legend.

      The contraction phase (purple rectangle) was determined using the average curve of the contact area by IRM over time. We have added this sentence to the revised figure legend.

      Minor typos in the material and methods: in some cases C56BL/6 is written instead of C57BL/6.

      Corrected.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The investigators sought to determine whether Marco regulates the levels of aldosterone by limiting uptake of its parent molecule cholesterol in the adrenal gland. Instead, they identify an unexpected role for Marco on alveolar macrophages in lowering the levels of angiotensin-converting enzyme in the lung. This suggests an unexpected role of alveolar macrophages and lung ACE in the production of aldosterone.

      Strengths:

      The investigators suggest an unexpected role for ACE in the lung in the regulation of systemic aldosterone levels. The investigators suggest important sex-related differences in the regulation of aldosterone by alveolar macrophages and ACE in the lung. Studies to exclude a role for Marco in the adrenal gland are strong, suggesting an extra-adrenal source for the excess Marco observed in male Marco knockout mice.

      Weaknesses:

      While the investigators have identified important sex differences in the regulation of extrapulmonary ACE in the regulation of aldosterone levels, the mechanisms underlying these differences are not explored. The physiologic impact of the increased aldosterone levels observed in Marco -/- male mice on blood pressure or response to injury is not clear. The intracellular signaling mechanism linking lung macrophage levels with the expression of ACE in the lung is not supported by direct evidence.

      Reviewer #2 (Public Review):

      Summary:

      Tissue-resident macrophages are more and more thought to exert key homeostatic functions and contribute to physiological responses. In the report of O'Brien and Colleagues, the idea that the macrophage-expressed scavenger receptor MARCO could regulate adrenal corticosteroid output at steady-state was explored. The authors found that male MARCO-deficient mice exhibited higher plasma aldosterone levels and higher lung ACE expression as compared to wild-type mice, while the availability of cholesterol and the machinery required to produce aldosterone in the adrenal gland were not affected by MARCO deficiency. The authors take these data to conclude that MARCO in alveolar macrophages can negatively regulate ACE expression and aldosterone production at steady-state and that MARCO-deficient mice suffer from secondary hyperaldosteronism.

      Strengths:

      If properly demonstrated and validated, the fact that tissue-resident macrophages can exert physiological functions and influence endocrine systems would be highly significant and could be amenable to novel therapies.

      Weaknesses:

      The data provided by the authors currently do not support the major claim of the authors that alveolar macrophages, via MARCO, are involved in the regulation of a hormonal output in vivo at steady-state. At this point, there are two interesting but descriptive observations in male, but not female, MARCO-deficient animals, and overall, the study lacks key controls and validation experiments, as detailed below.

      Major weaknesses:

      1) According to the reviewer's own experience, the comparison between C57BL/6J wild-type mice and knock-out mice for which precise information about the genetic background and the history of breedings and crossings is lacking, can lead to misinterpretations of the results obtained. Hence, MARCO-deficient mice should be compared with true littermate controls.

      2) The use of mice globally deficient for MARCO combined with the fact that alveolar macrophages produce high levels of MARCO is not sufficient to prove that the phenotype observed is linked to alveolar macrophage-expressed MARCO (see below for suggestions of experiments).

      3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. In addition, co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      1. Corticosterone levels in male Marco -/- mice are not significantly different, but there is (by eye) substantially more variability in the knockout compared to the wild type. A power analysis should be performed to determine the number of mice needed to detect a similar % difference in corticosterone to the difference observed in aldosterone between male Marco knockout and wild-type mice. If necessary, the experiments should be repeated with an adequately powered cohort.

      We thank the reviewer for their comments. We are prepared to carry out these power calculations and repeat the experiment if necessary.

      2. All of the data throughout the MS (particularly data in the lung) should be presented in male and female mice. For example, the induction of ACE in the lungs of Marco-/- female mice should be absent. Similar concerns relate to the dexamethasone suppression studies. It would also be useful if the single cell data could be examined by sex--should be possible even post hoc using Xist etc.

      We are prepared to measure Ace and biosynthetic enzyme expression in female mice by qPCR, and ACE protein expression by IF. Additionally, we will test females using the dexamethasone suppression study. The single cell RNA seq analysis was used primarily to inform our model, not for experimental readout. We will explore the dataset as the reviewer suggests and will add additional plots if the analysis substantively changes our previous findings.

      3. IF is notoriously unreliable in the lung, which has high levels of autofluorescence. This is the only method used to show ACE levels are increased in the absence of Marco. Orthogonal methods (e.g. immunoblots of flow-sorted cells, or ideally CITE-seq that includes both male and female mice) should be used.

      We have negative controls for antibody staining. Additionally, we also used qPCR to show an increase in Ace mRNA expression in the lung.

      4. Given the central importance of ACE staining to the conclusions, validation of the antibody should be included in the supplement.

      The vendor of this antibody has verified by cell treatment that the antibody binds to the stated antigen. We are prepared to additionally validate the antibody using other tissues as controls, though we point out that ACE is expressed, albeit at lower levels, in endothelial cells throughout the body, and so some signal is to be expected in most if not all tissues.

      5. The link between alveolar macrophage Marco and ACE is poorly explored.

      We are prepared to do co-culture experiments of alveolar macrophages and endothelial cells and measure ACE/Ace expression as a consequence.

      6. Mechanisms explaining the substantial sex difference in the primary outcome are not explored.

      We argue that this would be outside the scope of this project, though we would consider exploring such experiments in future studies.

      7. Are there physiologic consequences either in homeostasis or under stress to the increased aldosterone (or lung ACE levels) observed in Marco-/- male mice?

      We are prepared to measure blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice.
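
      The power calculation requested in the first comment above could be approached along the following lines. This is a minimal sketch, assuming a two-sided comparison of two equally sized groups and the standard normal approximation for sample size; the function name `mice_per_group` and the effect sizes shown are hypothetical and are not values from the study. The effect size is Cohen's d, i.e. the difference in group means divided by the pooled standard deviation.

```python
import math
from statistics import NormalDist


def mice_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate animals needed per group for a two-sided, two-group comparison.

    Uses the normal approximation n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    which slightly underestimates the requirement of an exact t-test
    when the groups are small.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes (Cohen's d), for illustration only.
for d in (0.5, 1.0, 2.0):
    print(f"d = {d}: {mice_per_group(d)} mice per group")
```

      Because the required group size scales with the inverse square of the effect size, the larger variability seen in the knockout group (a smaller d for the same % difference) would increase the number of mice needed.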

      Reviewer #2 (Recommendations For The Authors):

      Below is a suggestion of important control or validation experiments to be performed in order to support the authors' claims.

      1) It is imperative to validate that the phenotype observed in MARCO-deficient mice is indeed caused by the deficiency in MARCO. To this end, littermate mice issued from the crossing between heterozygous MARCO +/- mice should be compared to each other. C57BL/6J mice can first be crossed with MARCO-deficient mice in F0, and F1 heterozygous MARCO +/- mice should be crossed together to produce F2 MARCO +/+, MARCO +/- and MARCO -/- littermate mice that can be used for experiments.

      We thank the reviewer for their comments. We recognise the concern of the reviewer but due to limited experimenter availability we are unable to undertake such a breeding programme to address this particular concern.

      2) The use of mice in which AM, but not other cells, lack MARCO expression would demonstrate that the effect is indeed linked to AM. To this end, AM-deficient Csf2rb-deficient mice could be adoptively transferred with MARCO-deficient AM. In addition, the phenotype of MARCO-deficient mice should be restored by the adoptive transfer of wild-type, MARCO-expressing AM. Alternatively, bone marrow chimeras in which only the hematopoietic compartment is deficient in MARCO would be another option, albeit less specific for AM.

      We recognise the concern of the reviewer. We have access to an AM cell line which we plan to use to do co-culture experiments with an ACE-expressing endothelial cell line. In this way we will test whether this effect is linked to AMs.

      3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. Similar read-outs could also be performed in the models proposed in point 2).

      We are prepared to measure blood electrolytes and blood pressure (via tail cuff method) in Marco-deficient and Marco-sufficient mice.

      4) Co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      To address this concern, we plan to do a co-culture experiment as outlined above.

      Broadly, we thank the reviewers for taking the time to critically appraise this manuscript. The reviewers' primary concern seems to be the lack of direct evidence of an effect of AMs on endothelial Ace expression, which we plan to address as outlined above. We will adjust our conclusions as appropriate based on the results of these experiments.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Our comments on the initial eLife assessment

      “This study presents a useful inventory of the joint effects of genetic and environmental factors on psychotic-like experiences, and identifies cognitive ability as a potential underlying mediating pathway. The data were analyzed using solid and validated methodology based on a large, multi-center dataset. The claim that these findings are of relevance to psychosis risk and have implications for policy changes are partially supported by the results”

      We sincerely appreciate the editor and reviewers for their valuable feedback and their willingness to accommodate our perspectives in the first revision. In this revision, the comments from the reviewers have allowed us to further improve our manuscript. Regarding the eLife assessment, we would like to discuss two points.

      Firstly, regarding the statement that our “findings are of relevance to psychosis risk…partially supported…”, we wish to emphasize that our study is closely related to psychosis risk. Childhood psychotic-like experiences (PLEs) are closely linked to psychosis risk and have been shown to increase the risk of general psychopathology, as mentioned in our Introduction and Discussion.

      The reviewers asked for clearer differentiation between PLEs and schizophrenia, which we incorporated in this revision (line 100~111; line 419~430). So, this revised version now clearly points out that findings are relevant primarily to psychosis risk, and only partially relevant to schizophrenia risk.

      Secondly, regarding “…implications for policy changes are partially supported…”, we have described our study’s social contribution more clearly and specifically. Incorporating the comments, we now state that our study offers insight for future studies by showing the importance of integrative approaches that consider multi-factorial neurocognition and psychopathology, ranging from genes to environment (line 503~512), rather than offering direct policy implications.

      Our collaboration with eLife and the reviewers has proven satisfactory and enriching. The community, coupled with the innovative system and culture established around eLife, has significantly advanced the progression of scientific research. We are privileged to contribute to this endeavor.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I am happy with the revisions provided by the authors and I think most of my concerns have been addressed satisfactorily. One remaining concern is the authors' conflation of PLEs and schizophrenia. They stated, for example, that it is necessary to adjust for schizophrenia PGS. Even though studies have found a statistical relationship between schizophrenia PGS and PLEs, this relationship is not very strong (although statistically significant) and other studies have found no relationship. Similarly, having PLEs increases the risk of developing psychosis, but that does not necessarily mean that this risk is substantial or specific. I think this needs more nuance in the manuscript and the term 'schizophrenia' should be used sparsely and very carefully as the paper has focused on PLEs. Otherwise, great work on the revisions, thank you.

      Thank you for your comment on the use of PLEs and schizophrenia. We clearly understand the differences between the two and we made relevant corrections throughout the manuscript. In particular, we added that PLEs are not a direct predictor of schizophrenia and corrected any expressions that may imply that PLEs are closely related to schizophrenia in the Introduction.

      “Psychotic-like experiences (PLEs), which are prevalent in childhood, indicate the risk of psychosis (van der Steen et al., 2019; Van Os & Reininghaus, 2016). Although they are not a direct precursor of schizophrenia, children reporting PLEs in ages of 9-11 years are at higher risk of psychotic disorders in adulthood (Kelleher & Cannon, 2011; Poulton et al., 2000). PLEs also point towards the potential for other psychopathologies including mood, anxiety, and substance disorders (van der Steen et al., 2019), are linked to deficits in cognitive intelligence (Cannon et al., 2002; Kelleher & Cannon, 2011) and show a stronger association with environmental risk factors during childhood than other internalizing/externalizing symptoms (Karcher, Schiffman, et al., 2021).

      Maladaptive cognitive intelligence may act as a mediator for the effects of genetic and environmental risks on the manifestation of psychotic symptoms (Cannon et al., 2000; Keefe et al., 2006; Reichenberg et al., 2005).” (line 100~111)

      We also revised any expressions that could be perceived as implying relevance to schizophrenia in the Discussion. “Prior research identifying the mediation of cognitive intelligence focused on either genetic (Karcher, Paul, et al., 2021) or environmental factors (Lewis et al., 2020) alone. Studies with older clinical samples have shown that cognitive deficit may be a precursor for the onset of psychotic disorders (Eastvold et al., 2007; Fett et al., 2020; Vorstman et al., 2015). Our study advances this by demonstrating the integrated effects of genetic and environmental factors on PLEs through the cognitive intelligence in 9-11 years old children. Such comprehensive analysis contributes to assessing the relative importance of various factors influencing children's cognition and mental health, and it can aid future studies designed for identifying health policy implications. Considering the directions and magnitudes of the effects, though the effects of PGS remain significant, aggregated effects of environmental factors account for much greater degrees on PLEs.” (line 419~430)

      Reviewer #2 (Recommendations For The Authors):

      I thank the authors for addressing most of my comments. I feel the manuscript has already greatly improved.

      I have a few more comments.

      1) Although I did not make this comment, I find the authors' reply to the following comment by Reviewer #1 unclear: Original comment 'I like that the assessment of CP (cognitive performance) and self-reports PLEs is of good quality. However, I was wondering which 4 items from the parent-reported CBCL (Child Behavior Checklist) were used and how did they correlate with the child-reported PLEs? And how was distress taken into account in the child self-reported PLEs measurement? Which PLEs measures were used?'

      The authors' response refers to correlation coefficients, but I think Reviewer #1's inquiry was on more than these correlations.

      Thank you for your concern. We think that this comment was referring to our previous manuscript submitted elsewhere. In our initial submission to eLife, we already added the details about the four items from the parent-reported CBCL and how distress was considered in the child self-reported PLEs measurement (Appendix S1, page 48).

      2) Regarding the authors' reply that they have 'standardized the use of 'cognitive capacity' - I do not understand what this means. How exactly was this term standardized? In fact, I can find the term 'cognitive capacity' only once and it seemed to have been deleted from the manuscript. This is fine, but it doesn't clearly align with the statement that this term has been standardized.

      We apologize for causing such confusion. What we meant was that throughout our revised manuscript, we used the term “cognitive phenotypes” instead of “cognitive capacity”.

      3) Regarding my initial comment that 'it needs to be described how cognitive performance was defined in Lee 2018.' - I believe this is still not clarified. The authors write 'CP was measured as the respondent's score on cognitive ability assessments', but it remains unclear what exactly these assessments were.

      Thank you for pointing this out. We added that “CP, measured as the respondent's score on cognitive ability assessments of general cognitive function and verbal-numerical reasoning, was assessed in participants from the COGENT consortium and the UK Biobank” (line 204~206).

      4) Regarding the authors' reply to my comment 'In the 'Path Modeling' section, please explain what 'factors and components' concretely refer to. How is this different from a standard SEM with latent factors?'

      I can see that the authors explained 'components' (=the weighted sum of observed variables), but please also add what you mean by 'factors' - and how these are different from 'components' (line 284). Furthermore, I don't think it is correct that SEMs can only model latent factors, but not components (=measured variables). I also cannot see how using a weighted sum of observed variables controls more effectively for bias in estimation than latent factors. However, even though I do have some knowledge on this method, I'm not an expert and would appreciate the authors, other reviewer and/or editor to weigh in on this point.

      Thank you for pointing this out. We added that latent factors are indirectly measured indicators that explain the covariance among observed variables (line 263~271). We also added that the standard SEM method using latent factors assumes that observed variables within each construct share a common underlying factor; if this assumption is not met, the standard SEM method cannot effectively control for biases. This is why we used the IGSCA method, which addresses this limitation by allowing both composites and latent factors to be used as constructs.

      “Standard SEM using latent factors (i.e., indirectly measured indicators that explain the covariance among observed variables) to represent indicators such as PGS or family SES relies on the assumption that observed variables within each construct share a common underlying factor. If this assumption is violated, standard SEM cannot effectively control for estimation biases. The IGSCA method addresses this limitation by allowing for the use of composite indicators (i.e., components)—defined as a weighted sum of observed variables—as constructs in the model, more effectively controlling bias in estimation compared to the standard SEM. During estimation, the IGSCA determines weights of each observed variable in such a way as to maximize the variances of all endogenous indicators and components.” (line 263~271)
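
      To make the notion of a component concrete, the following minimal sketch computes a composite as a weighted sum of standardized observed variables. It does not implement IGSCA: in IGSCA the weights are estimated from the data, whereas here the weights, the function names, and the example values are hypothetical and supplied by hand.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a variable to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def component_scores(columns, weights):
    """Return one composite score per participant.

    Each score is the weighted sum of that participant's standardized
    observed variables. The weights here are given, not estimated.
    """
    z_columns = [standardize(column) for column in columns]
    n_participants = len(columns[0])
    return [
        sum(w * z[i] for w, z in zip(weights, z_columns))
        for i in range(n_participants)
    ]


# Hypothetical data: two observed variables for three participants.
observed = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
scores = component_scores(observed, weights=[0.5, 0.5])
print(scores)
```

      A latent factor, by contrast, is not computed directly from the observed variables in this way; it is inferred from their shared covariance.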

      5) I overall disagree with the authors' following statement 'It has been suggested from prior studies that these variables (PGS, family SES, neighborhood SES, positive family and school environment, and PLEs) are less likely to share a common factor', but I appreciate the authors' argument.

      Thank you for your comment. To clarify our statement in the manuscript, we changed the sentence to “Considering that the observed variables of the PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs are evaluated as a composite index by prior research, the IGSCA method can mitigate bias more effectively by representing these constructs as components” (line 274~277).

      6) Regarding 'genetic ethnicity': please describe your methods on how this was defined.

      Genetic ethnicity was defined as the genetic ancestry of participants, which is included as one of the variables in the original ABCD Study data. To avoid further confusion, we corrected ‘genetic ethnicity’ to ‘genetic ancestry’ throughout the manuscript.

      7) Regarding 'a more direct genetic predictor of PLEs' - I still don't understand what the contrast is here. More direct than what else?

      The description was unclear; we removed it from our manuscript.

      8) Regarding the factor loadings in Figure 3: I don't understand how deprivation loads positively on 'low neighborhood SES', but poverty loads negatively. Shouldn't they both show the same direction of effect/loading on neighbourhood SES, while 'years of residency' should show the opposite direction (i.e., deprivation and poverty = risk, while years of residency = protective)? Are these unexpected loadings?

      The authors did not yet respond to this point: 'Please also add the autocorrelations between the 3 PLE measures. I assume these were also modelled statistically, given the strong correlations between time points?' Were these correlations not modelled? Why not?

      Figure 3B is still unclear. Was intelligence included here? What is the difference between Figure 3A and B? The legend suggests that 3B shows the indirect effects, but figure 3B looks like a direct effect, while 3A seem to show the indirect effect.

      The reviewer’s confusion resulted from our incorrect description. The factor loadings of low neighborhood SES were marked incorrectly. The loadings for ‘years of residence’ and ‘poverty’ should be swapped: -0.3648 for ‘years of residence’ and +0.877 for ‘poverty’. We made this mistake when adding the factor loadings to the figure. We thank you for pointing this out.

      We apologize for missing your point on autocorrelation. Adding autocorrelations between the three PLEs is unrelated to our research goal. In this paper, we investigated how genetic and environmental factors explain the variations in PLEs between participants, regardless of changes over time. Since we used PLEs of multiple follow-ups to ensure that the results are robust irrespective of the timing of PLE measurements, taking autocorrelation into account is not necessary.

      The decision to add autocorrelation, which involves using the outcome variable at time (t-1) as a predictor for the outcome variable at time t, depends on the research focus. If your interest lies in explaining inter-individual variation in the rate of change in PLEs over a one-year period, then autocorrelation should be controlled for (typically, predictors measured at different time points are used in such cases). However, this was not the focus of this paper, which is why we did not apply autocorrelation in the SEM analysis.

      We apologize for the confusion between Figure 3A and 3B. To clarify, we added the titles “Direct effects” and “Indirect effects” to the figure panels. We also revised the legend.

      “A. Direct pathways from PGS, high family SES, low neighborhood SES, and positive environment to cognitive intelligence and PLEs. Standardized path coefficients are indicated on each path as direct effect estimates (significance level *p<0.05). B. Indirect pathways to PLEs via intelligence were significant for polygenic scores, high family SES, low neighborhood SES, and positive environment, indicating the significant mediating role of intelligence.” (line 968~973)

      Figure 3A shows direct effects: i.e., the coefficients of paths from PGS, family SES, neighborhood SES, and positive environment to intelligence and PLEs, as well as the coefficients of paths from intelligence to PLEs. This is why Figure 3A shows colored arrows starting from PGS, family and neighborhood SES, and positive environment towards intelligence and PLEs, as well as the arrows from intelligence to PLEs. On the other hand, in Figure 3B, the colored arrows starting from PGS, family and neighborhood SES, and positive environment go through intelligence and head towards PLEs. This is meant to show that the indirect effects in Figure 3B are the specific effects of PGS, family SES, neighborhood SES, and positive environment on PLEs mediated by intelligence.

      In short, Figure 3 can be seen as a diagram drawn from Table 2: direct effects of the genetic and environmental variables on intelligence and PLEs, and direct effects of intelligence on PLEs are shown in Figure 3A; indirect effects of genetic and environmental variables on PLEs mediated by intelligence are shown in Figure 3B.

      9) Regarding Supporting Information tables: to make these more digestible, I suggest using Excel and adding one table per sheet with a clear title and legend, indicating what each table shows. For example, Table S1 has 9(?) different subsections, all called the same (Linear Mixed Model: Multiethnic). It is not clear how each subsection differs from the others. Separate tables in separate excel sheets might be easier.

      Also, I think two decimal points might be good enough, enhancing readability of these tables.

      Thank you for your suggestion. We moved the supplementary tables into an external Excel file, with each sheet showing different tables, as well as titles, legends, and clear subsections.

      10) Regarding reporting exact p-values in Table 2: I don't understand. At the moment, categorical significance statements are reported. Were these not based on exact p-values (or how else was it decided if a finding was significant at a 0.05 (?) significance level).

      Either remove the significance column completely (as p-values cannot be estimated due to non-normality) or specify exactly/clarify what this column shows and this was derived.

      We apologize for the confusion. In Table 2, we checked the significance of each path using 95% confidence intervals with 5,000 bootstrapping iterations. Since a 95% confidence interval that does not include zero is equivalent to a p-value below the 0.05 significance level, we believe this is an appropriate alternative for reporting the significance of each path in the SEM model.

      We specified the reason why we were not able to calculate exact p-values (clean copy: line 299~303). “As a trade-off for obtaining robust nonparametric estimates without distributional assumptions for normality, the IGSCA method does not return exact p-values (Hwang, Cho, Jung, et al., 2021). As a reasonable alternative, we obtained 95% confidence intervals based on 5,000 bootstrap samples to test the statistical significance of parameter estimates.”
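
      The decision rule described above (a path counts as significant when its 95% bootstrap confidence interval excludes zero) can be illustrated with a minimal sketch. This is a generic percentile bootstrap on a simple regression slope using simulated data; it is not the IGSCA procedure, and the function names, the data, and the seeds are assumptions made for illustration only.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def percentile_bootstrap_ci(xs, ys, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        # Draw n observation indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


def excludes_zero(ci):
    """True when the whole interval lies on one side of zero."""
    lower, upper = ci
    return lower > 0 or upper < 0


# Simulated data (illustrative only): true slope 0.5 plus Gaussian noise.
data_rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [0.5 * x + data_rng.gauss(0, 1) for x in xs]

ci = percentile_bootstrap_ci(xs, ys)
print("95% CI:", ci, "| significant at 0.05:", excludes_zero(ci))
```

      In this sketch the interval is read directly off the sorted bootstrap estimates, so no normality assumption is needed, which is the trade-off the quoted passage describes.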

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would first like to thank the reviewers for their time and effort in their critical review of our manuscript, and we appreciate the opportunity to address these comments. We thank the reviewers for appreciating that our experimental design is well crafted and contributes to the broader understanding of dietary exercise recommendations for metabolic health and muscle development. We have revised the figures and text in accordance with the reviewers’ recommendations, and hope that they appreciate the revised version.

      Reviewer #1:

      1) A significant limitation of this study pertains to the absence of a detailed exploration into the mechanistic underpinnings of the interaction between high protein intake and resistance exercise at the molecular level. The authors should provide a comprehensive discussion on potential avenues or prospective research directions to address this gap in understanding.

      We agree and have added some theories in the discussion on page 14.

      2) Figure 4 and Figure 7 can be moved to supplementary and text in the description can be arranged accordingly to make a better flow of the story.

      We agree with this suggestion and have made adjustments.

      3) The authors have used a high protein diet (36% calorie from protein) and a low protein diet (7% calorie from protein) for this study. The authors should explain whether this mouse diet is practically comparable to the human's high protein (2% of BW) and low protein diet (less than 0.8% BW) or not.

      The high protein diet is comparable to a human diet of 180 grams of protein ((0.36x2000 calories)/4 calories per gram=180 g), which is in a range that some people consume, particularly bodybuilders and athletes. The low protein diet is equivalent to 35 grams of protein per day ((0.07x2000 calories)/4 calories/gram=35g), and a diet of just 7% protein is not recommended for humans per the Acceptable Macronutrient Distribution Range (AMDR) of 10-35% dietary protein set by the Institute of Medicine (IOM). We have addressed this on page 14.
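
      The conversion in the response can be written out as a small sketch. The 2000-calorie daily intake, the 4 calories per gram of protein, and the 10-35% AMDR range are the values quoted in the response; the function and constant names are hypothetical.

```python
KCAL_PER_GRAM_PROTEIN = 4      # conversion factor quoted in the response
AMDR_PROTEIN_PCT = (10, 35)    # AMDR range quoted in the response


def protein_grams_per_day(protein_pct_of_kcal, daily_kcal=2000):
    """Grams of protein per day for a diet with the given % of calories from protein."""
    return daily_kcal * protein_pct_of_kcal / 100 / KCAL_PER_GRAM_PROTEIN


def within_amdr(protein_pct_of_kcal):
    """True when the protein percentage falls inside the quoted AMDR range."""
    low, high = AMDR_PROTEIN_PCT
    return low <= protein_pct_of_kcal <= high


for label, pct in (("high protein", 36), ("low protein", 7)):
    grams = protein_grams_per_day(pct)
    print(f"{label}: {pct}% of calories = {grams:.0f} g/day, within AMDR: {within_amdr(pct)}")
```

      With these inputs the sketch reproduces the 180 g and 35 g figures, and flags both study diets as lying outside the quoted 10-35% range (one just above it, one below).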

      4) The color coding of the error bar and lines does not match with the group description in almost every figure. Maybe the authors could choose more contrasting colors.

      Thanks, we have adjusted the coloring of the error bars and lines in all figures.

      5) In Figure 3C-E it seems like the number of biological samples is not consistent in the LP+WP group. If the authors have excluded any outlier from the analysis, that should be included in the methodology.

      We did list outliers in the methodology in the statistics section (page 19): “Outliers were determined using GraphPad Prism Grubbs’ calculator (https://www.graphpad.com/quickcalcs/grubbs1/).”

      Reviewer #2:

      Very nice work! I do not have a whole lot to say in terms of experiments, analysis, or data to present other than what is in my public review (and you cannot really provide it as it was not in the experimental design). The manuscript is also very well written. My only question is about the following two sentences in the introduction:

      "Both exercise and amino acids activate the mechanistic target of TOR (mTOR) protein kinase, which stimulates the protein synthesis machinery needed to stimulate skeletal muscle hypertrophy (Schiaffino et al., 2021). Therefore, The Academy of Nutrition and Dietetics recommends consuming 1.2-2.0 grams of protein per kg of body weight (BW) per day in physically active individuals (Thomas et al., 2016)." I am not sure how the second sentence follows from the first, so I am not convinced that "therefore" is the right adverb in the right place.

      Thanks for pointing this out. We have added a clarifying transition to the text (page 3).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This important study from Godneeva et al. establishes a Drosophila model system for understanding how the activity of Tif1 proteins is modified by SUMO. The authors nicely show that Bonus, like homologous mammalian Tif1 proteins, is a repressor, and that it interacts with other co-repressors Mi-2/NuRD and setdb1 in Drosophila ovaries and S2 cells. They also show that Bonus is SUMOylated by Su(var)2-10 on at least one lysine at its N-terminus to promote its interaction with setdb1. By combining nice biochemistry with an elegant reporter gene approach, they show that SUMOylation is important for Bonus interaction with setdb1, and that this SUMO-dependent interaction triggers high levels of H3K9me3 deposition and gene silencing. While there are still major questions of how SUMO molecularly promotes this process, this study is a valuable first step that opens the door for interesting future experimentation.

      Major Point:

      The RNAseq and ChIPseq data is not available. This is critical for the review of the paper and would help the readers and reviewers interpret the Bonus mutant phenotype and its mechanism of repressing genes.

      The sequencing data have been deposited to the NCBI GEO archive. The accession number for all other RNA-seq and ChIP-seq data reported in this paper is GEO: GSE241375.

      1) The author's conclusion that Bonus SUMOylation is "essential for its chromatin localization" is not supported by the data. Figure 5F shows less 3KR mutant in the chromatin fraction but there is still significant signal.

      We appreciate the reviewer's feedback and agree that the term "essential" was not appropriate in this context. We have revised the manuscript to replace "essential" with "contributes to" to accurately reflect our findings.

      2) The author's conclusion that Bonus is SUMOylated at a single site close to its N-terminus is not necessarily true. In several SUMO and Bonus blots throughout the paper (5B, 6C, S4A), there are >2 differentially migrating species that could represent more than one SUMO added to Bonus. While the single K20R mutation eliminates all of these species in Fig 5C, it is possible that K20R SUMOylation is required for additional SUMOylation events on other residues. One way to determine if Bonus is SUMOylated on multiple sites is to add recombinant SUMO protease to the extract and see if multiple higher molecular weight bands collapse into a single migrating species (implying multiple SUMOs) or multiple migrating species (implying something else is altering gel migration).

      We appreciate the suggestion made by the reviewer. While we acknowledge the presence of occasional multiple bands in SUMO Western blots, the predominant pattern is the presence of unmodified Bon and a single additional band corresponding to SUMO-modified Bon. To investigate the possibility of multi-site SUMOylation, we performed the requested experiment, in which we added SENP2 SUMO protease to the extract and checked Bon's SUMOylation. In the presence of NEM, we observed the unmodified form of Bon, as well as a single additional band representing a SUMO-modified form of Bon. Following SENP2 SUMO protease treatment, the SUMOylated form of Bon was completely abolished in all samples, leaving only the unmodified Bon band (Extended Data Fig. 4D). This indicates that Bon is not SUMOylated on multiple sites and that the observed differentially migrating species likely result from other factors affecting gel migration.

      3) The authors state that most upregulated genes in BonusGLKD are not highly enriched in H3K9me3. The heatmap in figure 3D is not an ideal presentation of this argument. The authors should show an example of what the signal on a highly enriched gene looks like for comparison. The authors also argue that because most upregulated genes in BonusGLKD are not highly enriched in H3K9me3, they must be indirectly repressed. Another possibility is that bonus-mediated H3K9me3 is only important (and present) during early nurse cell differentiation and is later lost and dispensable during the rapid endocycles. After bonus establishes repression though H3K9me3, it might be maintained through bonus-Mi2/Nurd, something else, or nothing at all. The authors could discuss this possibility or perform H3K9me3 ChIP during cyst formation and early nurse cell differentiation rather than in whole ovaries, which are enriched for later stages.

      We thank the reviewer for their thoughtful comments and suggestions. In our revised manuscript we have included the tracks of a gene that is highly enriched in H3K9me3 but remains unchanged upon Bon GLKD (Extended Data Fig. 3B). This addition allows for a visual comparison and better supports our argument that the majority of genes upregulated in Bon GLKD are not enriched in the H3K9me3 mark. We also appreciate the reviewer's suggestion regarding the potential temporal dynamics of Bon-mediated H3K9me3. It is indeed possible that Bon's role in establishing H3K9me3 might be more prominent during early nurse cell differentiation and less critical in later stages. We included a discussion of this possibility in the revised manuscript. To explore this further, it would be valuable to perform H3K9me3 ChIP during cyst formation and early nurse cell differentiation. However, given our current resource and time constraints, we were unable to perform these experiments for the revised manuscript.

      4) The BonusGLKD RNAseq analysis is underwhelming. The conclusion that "Bonus represses tissue-specific genes" has limited value. Every gene that is not expressed in ovaries is "tissue-specific." What subset of tissue-specific genes does Bonus repress? What common features do these genes have and how do they compare to other sets of tissue-specific genes, such as those reportedly repressed by setdb1, Polycomb proteins, small ovary, l(3)mbt, and stonewall (among others in female germ cells). Comparing these available data sets could help the authors understand the mechanism of Bonus repression and how BonusGLKD leads to sterility. The authors could also further analyze the differences between nos-Gal4 and MT-Gal4 to better understand why nos- but not MT-driven knockdown is sterile.

      We appreciate the reviewer's feedback regarding the RNA-seq analysis and acknowledge the importance of identifying the specific subset of tissue-specific genes. Figure 2C shows the specific tissues where genes derepressed upon Bon GLKD are normally expressed. These are tissues and organs such as the head, digestive system, and nervous system. The reviewer's suggestion to compare our findings with existing datasets is valid and could indeed provide a more comprehensive understanding of Bon repression and its implications in female germ cells. However, many of the published datasets are based on mutant fly lines or use different GAL4 drivers to induce knockdowns, making direct comparisons challenging. We have conducted a preliminary analysis of available data, specifically nos-Gal4>SetDB1KD (GSE109852), and identified an overlap of 135 genes out of the 464 genes upregulated upon nos-Gal4>BonusKD with those affected by SetDB1 knockdown. We have included this result in the revised manuscript.
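
      The kind of overlap reported here can be sketched with plain set operations. The gene identifiers below are hypothetical placeholders, not genes from either dataset; only the counts 135 and 464 come from the response. The sketch reports the shared fraction only and makes no claim about statistical enrichment, which would require a background gene universe that is not stated here.

```python
def overlap_summary(query_genes, reference_genes):
    """Return (number of shared genes, fraction of the query set that is shared)."""
    query = set(query_genes)
    reference = set(reference_genes)
    shared = query & reference
    return len(shared), len(shared) / len(query)


# Hypothetical placeholder identifiers, for illustration only.
bon_kd_upregulated = {"geneA", "geneB", "geneC", "geneD"}
setdb1_kd_affected = {"geneB", "geneD", "geneE"}

n_shared, fraction = overlap_summary(bon_kd_upregulated, setdb1_kd_affected)
print(n_shared, fraction)

# Fraction implied by the counts reported in the response.
reported_fraction = 135 / 464
print(f"{reported_fraction:.1%} of Bon-knockdown upregulated genes overlap")
```

      On the reported counts, the overlap corresponds to roughly 29% of the genes upregulated upon Bon knockdown.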

      Main Study Limitations:

      1) It is unclear which genes are directly vs indirectly regulated by bonus, which makes it difficult to understand Bonus's repressive mechanism. Several lines of experiments could help resolve this issue. 1) Bonus ChIPseq, which the authors mentioned was difficult. 2) RNAseq of BonusGLKD rescued with KR3 mutation. This would help separate SUMO/setdb1-dependent regulation from Mi-2 dependent regulation. Similarly, comparing differentially expressed genes in Su(var)2-10GLKD, setdb1GLKD, 3KR rescue, and MI-2 GLKD could identify overlapping targets and help refine how bonus represses subsets of genes through these different corepressors.

      We appreciate the reviewer's suggestions and agree that discriminating between direct and indirect Bon targets should be the next step in understanding Bon's repressive mechanism. We have previously attempted to determine Bon's direct targets using a ChIP-seq approach. However, despite multiple efforts using both native Bon antibodies and GFP-tagged Bon fly lines, analysis of the ChIP-seq data did not reveal specific enrichment, indicating that Bon, like many other chromatin-bound proteins, is not amenable to ChIP. The recommendation for RNA-seq analysis of Bon GLKD rescued with the 3KR mutation is valuable, and we will certainly consider it for future investigations.

      We compared differentially expressed genes in Su(var)2-10 GLKD and Mi-2 GLKD and found limited overlap: out of the 231 genes affected by Bon GLKD, 39 genes were affected in Mi-2 GLKD and 42 in Su(var)2-10 GLKD. We acknowledge the importance of understanding which genes are directly or indirectly regulated by Bon and the potential for further experiments to address this question.

      2) The paper falls short in discussing how SUMO might promote repression. This is important when considering the conservation (or lack thereof) of SUMOylation sites in Tif1 proteins in distantly related animals. One piece of data that was not discussed is the apparent localization of SUMOylated bonus in the cytoplasmic fraction of the blot in Figure 5F. Su(var)2-10 is mostly a nuclear protein, so is bonus SUMOylated in the nucleus and then exported to the cytoplasm? Also, setdb1 is a nuclear protein, so it is unlikely that the SUMOylated bonus directly interacts with setdb1 on target genes. Together with Fig 5E (unSUMOylatable Bonus aggregates in the nucleus), one could make a model where SUMO solubilizes bonus (perhaps by disassembling aggregates) and indirectly allows it to associate with setdb1 and chromatin. It is also important to note that in Figure 5I, the K3R mutation appears to lessen but not eliminate Bonus interaction with setdb1. This data again disfavors a model where SUMO establishes an interaction interface between setdb1 and Bonus. To determine which form of Bonus interacts with setdb1, the authors could perform a setdb1 pulldown and monitor the SUMOylation state of coIPed Bonus through mobility shift. If mostly unSUMOylated bonus interacts with setdb1, and SUMO indirectly promotes Bonus interaction with setdb1 (perhaps by disassembling Bonus aggregates), then the precise locations of Bonus SUMOylation sites could more easily shift during evolution, disfavoring the author's convergent evolution hypothesis.

      We appreciate the reviewer's valuable feedback. Regarding the observation of SUMOylated Bon in the cytoplasmic fraction in Figure 5F, we recognize its significance. This finding has prompted us to consider a model in which SUMOylation may play a role in translocating Bon from the nucleus to the cytoplasm, potentially influencing interactions with SetDB1 and chromatin indirectly. Furthermore, Figure 5I, which shows only a partial reduction in Bon-SetDB1 interaction with the 3KR mutation, suggests that SUMO may not be the primary mediator of this interaction. We recognize the need for further investigations to clarify SUMO's exact role in this context. In response to the reviewer's suggestion, we conducted SetDB1 pulldown experiments in S2 cells. The results reveal that SetDB1 indeed primarily interacts with unmodified Bon, which is far more abundant than the SUMOylated form (Extended Data Fig. 5C). This experiment presents certain technical challenges, as the signal for Bon, when used as prey in co-IP experiments, is relatively faint, making it inherently difficult to detect the lower levels of SUMO-modified Bon. Additionally, in the revised manuscript we have added a new result identifying Bon interactors in the ovary by mass-spec analysis, which showed that SetDB1 associates with wild-type, but not SUMO-deficient, Bon. While our data support the idea that SUMO may contribute to Bon solubilization, possibly by disassembling aggregates, thereby indirectly facilitating its association with SetDB1 and chromatin, we acknowledge that the precise mechanism remains unclear.

      Reviewer #2 (Public Review):

      Summary:

      The authors analyze the functions and regulation of Bon, the sole Drosophila ortholog of the TIF1 family of mammalian transcriptional regulators. Bon has been implicated in several developmental programs; however, the molecular details of its regulation have not been well understood. Here, the authors reveal the requirement of Bon in oogenesis, thus establishing a previously unknown biological function for this protein. Furthermore, careful molecular analysis convincingly established the role of Bon in transcriptional repression. This repressor function requires interactions with the NuRD complex and histone methyltransferase SetDB1, as well as sumoylation of Bon by the E3 SUMO ligase Su(var)2-10. Overall, this work represents a significant advance in our understanding of the functions and regulation of Bon and, more generally, the TIF1 family. Since Bon is the only TIF1 family member in Drosophila, the regulatory mechanisms delineated in this study may represent the prototypical and important modes of regulation of this protein family. The presented data are rigorous and convincing. As discussed below, this study can be strengthened by a demonstration of a direct association of Bon with its target genes, and by analysis of the biological consequences of the K20R mutation.

      Strengths:

      1. This study identified the requirement for Bon in oogenesis, a previously unknown function for this protein.
      2. Identified Bon target genes that are normally repressed in the ovary, and showed that the repression mechanism involves the repressive histone modification mark H3K9me3 deposition on at least some targets.
      3. Showed that Bon physically interacts with the components of the NuRD complex and SetDB1. These protein complexes are likely mediating Bon-dependent repression.
      4. Identified Bon sumoylation site (K20) that is conserved in insects. This site is required for repression in a tethering transcriptional reporter assay, and SUMO itself is required for repression and interaction with SetDB1. Interestingly, the K20-mutant Bon is mislocalized in the nucleus in distinct puncta.
      5. Showed that Su(var)2-10 is a SUMO E3 ligase for Bon and that Su(var)2-10 is required for Bon-mediated repression.

      Weaknesses:

      The study would be strengthened by demonstrating a direct recruitment of Bon to the target genes identified by RNA-seq. Given that the global ChIP-seq was not successful, a few possibilities could be explored. First, Bon ChIP-qPCR could be performed on the individual targets that were functionally confirmed (e.g. rbp6, pst). Second, a global Bon ChIP-seq has been reported in PMID: 21430782 - these data could be used to see if Bon is associated with specific targets identified in this study. In addition, it would be interesting to see if there is any overlap with the repressed target genes identified in Bon overexpression conditions in PMID: 36868234.

      We greatly appreciate the reviewer's suggestion to demonstrate the direct recruitment of Bon to the target genes. As described in our answer to reviewer #1, we attempted to determine Bon's direct targets with a ChIP-seq approach, using both native Bon antibodies and GFP-tagged Bon fly lines. However, analysis of the ChIP-seq data did not reveal specific enrichment. Similarly, Bon ChIP-qPCR on individual targets showed the same results, suggesting that Bon, like many other chromatin-bound proteins, is not amenable to the ChIP protocol, at least under standard conditions. To further explore this issue, we analyzed the results of a global Bon ChIP-seq reported in PMID: 21430782. We did not find Bon binding to individual targets and, even more importantly, we did not see clear Bon enrichment elsewhere in the genome, supporting the conclusion that Bon targets on chromatin cannot be determined by ChIP. Additionally, we explored the possibility of overlap between target genes repressed by Bon in our study and those observed under Bon overexpression conditions in PMID: 36868234. While we did identify 41 genes in common, it's important to note that the datasets are derived from different tissues (pupal eyes vs. ovaries), making direct comparison problematic.

      The second area where the manuscript can be improved is to analyze the biological function of the K20R mutant Bonus protein. The molecular data suggest that this residue is important for function, and it would be important to confirm this in vivo.

      We appreciate the reviewer's suggestion to analyze the biological function of the K20R mutant Bon protein. While we acknowledge that we did not use a single-site K20R mutant for in vivo experiments, we demonstrated that the mutant with the three-residue substitution (3KR) is incapable of inducing repression (Figure 5G). Given that other experiments consistently showed that K20 is the primary SUMOylation site, this result supports the conclusion that K20 SUMOylation plays an important role in Bon-mediated transcriptional silencing.

      Reviewer #1 (Recommendations for The Authors):

      Make the RNAseq and ChIPseq data publicly available!

      The sequencing data have been deposited to the NCBI GEO archive. The accession number for all other RNA-seq and ChIP-seq data reported in this paper is GEO: GSE241375.

      Reviewer #2 (Recommendations for The Authors):

      It would be interesting to identify the biological basis of aberrant ovary development in Bon depletion conditions. Previous studies (e.g. PMID: 11336699) suggested that Bon loss of function clones are cell lethal, and the developmental defects in oogenesis presented in the current study offer an opportunity to delve more into the causes of cell loss, e.g. by showing that the cells die via apoptosis.

      Thank you for your valuable suggestion. In response to your comment, we performed a TUNEL assay to investigate whether germ cells in nos-Gal4>BonusKD ovaries undergo apoptosis. Our results indeed indicate that germ cells in these ovaries exhibit apoptosis, as evidenced by the TUNEL signal (Extended Data Fig. 1C). This information has been included in the revised manuscript to provide insights into the biological basis of aberrant ovary development in Bon depletion conditions.

      The K20 residue could also be ubiquitinated. This possibility could at least be discussed, particularly given the presence of the RING Ub ligase domain in Bon that might potentially perform self-ubiquitination.

      Indeed, the possibility that Bon can be ubiquitinated is a valid consideration, and we have explored it. We did not detect any signal with the Ubiquitin antibody in Bon immunoprecipitates from either wild-type or triple-mutant [3KR] ovaries (in which K20 is also mutated) (Extended Data Fig. 4C). This suggests that K20 is more likely a site of Bon SUMOylation than of ubiquitination. We appreciate the reviewer's suggestion and have included this information in the revised manuscript.

    1. Author Response

      We very much appreciate all the reviewers’ positive feedback and additional comments and suggestions for this manuscript!

      In this provisional reply, we’d like to quickly address only one selected key point, for which we have already collected relevant experimental data:

      Reviewer 1 suggests that ‘it would have been more rigorous for the authors to independently reproduce the kinetics reported for nsp8/9 using their specific experimental conditions.’ We absolutely agree with this and have already carried out these kinetic experiments while our paper was under review. We have now measured kinetic parameters for cleavage of the nsp8/9 peptide in our own hands under the same conditions as we used for nsp4/5 and TRMT1. We measured kcat and KM values of 0.019 +/- 0.002 s-1 and 40 +/- 7.5 µM, respectively, for nsp8/9 cleavage; these data are very much in line with the previously reported values from MacDonald et al (kcat = 0.013 +/- 0.001 s-1, KM = 36 +/- 6.0 µM) that we used for comparison in Figure 4 and listed in Table S2. We will add our own measured kinetic values for nsp8/9 in the next version of our manuscript, but wanted to report these numbers as soon as possible, because this further supports and validates our claim that the human TRMT1 sequence is cleaved at a similar rate to the known nsp8/9 viral polypeptide cleavage site.
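
      One common way to compare the two sets of kinetic parameters is the catalytic efficiency, kcat/KM. The sketch below applies it to the central values quoted above; the function name is hypothetical, KM is assumed to be in µM as stated, and the reported uncertainties are deliberately not propagated.

```python
def catalytic_efficiency(kcat_per_s, km_micromolar):
    """Return kcat/KM in M^-1 s^-1, given kcat in s^-1 and KM in micromolar."""
    km_molar = km_micromolar * 1e-6
    return kcat_per_s / km_molar


# Central values quoted in the response for nsp8/9 cleavage.
measured = catalytic_efficiency(0.019, 40)   # values measured in this work
reported = catalytic_efficiency(0.013, 36)   # values from MacDonald et al.

print(f"measured: {measured:.0f} M^-1 s^-1")
print(f"reported: {reported:.0f} M^-1 s^-1")
print(f"ratio measured/reported: {measured / reported:.2f}")
```

      On these central values the two efficiencies are roughly 475 and 361 M^-1 s^-1, which differ by well under twofold.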

      We will provide a detailed, point-by-point reply to all reviewer comments accompanying the forthcoming revised manuscript, in which we intend to have new and updated data and additional MD simulations that directly address key questions raised by the reviewers.

    1. Author Response

      We thank the reviewers for their suggestions for improving the manuscript. We are currently working on a formal revision and plan to submit a revised manuscript in the near future. However, we would be remiss if we did not address concerns regarding the conceptual merits of the paper. Below we speak to major points that address select reviewer comments and the eLife assessment of our manuscript.

      eLife assessment:

      However, the strength of evidence is incomplete due to the concern that larval contraction is a result of chilling the nervous system and muscles, which causes spreading depolarization and mechanical contraction of the body, rather than an active sensorimotor response to cold.

      Reviewer #3:

      The scientific premise is that a full body contraction in larvae that are exposed to noxious cold is a sensorimotor behavioral pathway. This premise is, to start with, questionable. A common definition of behavior is a set of "orderly movements with recognizable and repeatable patterns of activity produced by members of a species (Baker et al., 2001)." In the case of nociception behaviors, the patterns of movement are typically thought to play a protective role and to protect from potential tissue damage.

      Does noxious cold elicit a set of orderly movements with a recognizable and repeatable pattern in larvae? Can the patterns of movement that are stimulated by noxious cold allow the larvae to escape harm? Based on the available evidence, the answer to both questions is seemingly no.

      We thank the reviewer for their questions and clarify here. Exposure to cold temperatures does elicit a recognizable and repeatable pattern of behavior across multiple strains, including both wildtype and genetic control strains (w1118, Oregon R) and numerous control conditions that have been previously published (Himmel et al., 2021, Himmel et al., 2023, Patel et al., 2022, Turner et al., 2016, Turner et al., 2018, Tenedini et al., 2019). Our initial publication on Drosophila cold nociception demonstrated a variety of cold-evoked behavioral responses, including head and/or tail raising of the larva as well as contraction behavior. These behaviors were repeatedly observed in assays involving either local cold stimulation with a cold probe or global cold stimulation on a cold plate. Head and/or tail raise behaviors are consistent with behavior that displaces the larval body from the cold surface; however, exposure to increasingly colder temperatures leads to an increasing level of cold-evoked contraction (CT) responses, which result in a reduction of larval area (Turner et al., 2016). Presumably, increasing the level of CIII md neuron activation leads to greater activation of downstream circuitry. We previously performed optogenetic dose-response assays to further clarify the increased prevalence of CT responses to strong noxious cold stimuli and investigated how CIII md neurons discriminate between innocuous touch and noxious cold stimuli. Here, we found that lower-level activation of CIII md neurons led to predominantly touch-evoked behaviors, whereas high-level activation led predominantly to cold-evoked responses (Turner et al., 2016). These analyses were coupled with stimulus-evoked calcium imaging, which revealed that touch-evoked Ca2+ levels were significantly lower than cold-evoked Ca2+ levels (Turner et al., 2016).

      In this manuscript, we confirm our previously published findings that neural silencing of CIII md neurons, either by tetanus toxin expression or by impairing action potential propagation, results in impaired cold-evoked CT responses (Turner et al., 2016, Turner et al., 2018). However, neural silencing of CIII md neurons did not eliminate cold-evoked CT responses. We interpret this finding as evidence that some component of the cold-evoked CT response may be due to cold-induced muscle contraction. Furthermore, in this manuscript, we implicate chordotonal (Ch) neurons as required for cold-evoked CT and demonstrate cold-evoked Ca2+ increases in Ch neurons. Moreover, neural silencing of multiple sensory neuron types (CIII + Ch or CIII + CII) resulted in greater deficits in cold-evoked behaviors (Turner et al., 2016). Thus, the noxious cold stimulus is detected by multiple peripheral sensory neurons, and inhibiting neural activity in CIII md neurons alone cannot eliminate cold-evoked CT responses.

      In this manuscript and in several other publications, studies have shown that optogenetic activation of CIII md neurons, or of CIII neurons plus CII neurons or Ch neurons, elicits CT-like responses (Hwang et al., 2007, Shearin et al., 2013, Turner et al., 2016). Conversely, optogenetic stimulation of CIII md neurons knocked down for paralytic, the α-subunit of the voltage-gated sodium channel, did not elicit blue light-evoked CT responses due to impaired action potential propagation. These analyses collectively indicate that CIII md neuron activation is sufficient for eliciting CT-like responses. Additionally, we have previously published electrophysiological recordings of CIII md neurons under cold exposure. To address potential confounds of cold-induced muscle contraction on cold-induced electrical activity of CIII md neurons, we performed these analyses on de-muscled fillets, revealing that CIII neural activity in response to cold is not dependent upon muscles. Exposure to noxious cold stimuli results in temperature-dependent increases in CIII neuron firing, consisting of both bursting and tonic firing (Himmel et al., 2021, Himmel et al., 2023, Maksymchuk et al., 2022, Patel et al., 2022, Himmel et al., 2022, Maksymchuk et al., 2023).

      Reviewer #3:

      Can the patterns of movement that are stimulated by noxious cold allow the larvae to escape harm?

      We were similarly curious about the neuroethological and/or protective implications of cold-evoked behaviors. In Drosophila larvae, noxious mechanical stimuli-evoked body rolling allows for lateral escape from predatory wasps (Hwang et al., 2007). Reducing the overall surface area that is exposed to cold (e.g., huddling behavior) serves as a protective strategy in many species (Canals et al., 1997, Contreras, 1984, Gilbert et al., 2006, Vickery and Millar, 1984, Hayes et al., 1992). Low temperatures can be fatal to poikilotherms (e.g., insects); however, many species have evolved the ability to cold acclimate, thereby increasing their cold tolerance. To explore the potential evolutionary benefit of the CIII-mediated contraction response to cold, we previously published work revealing a neural basis for cold acclimation in Drosophila larvae implicating these neurons (Himmel et al., 2021). We demonstrated that cold-evoked CT behavior is evolutionarily conserved across 11 different drosophilid species and that other cold-induced behaviors (e.g., tail raise) were also observed. Furthermore, drosophilid species adapted to rapid temperature swings were more likely to retain the ability to locomote even at lower temperatures (Himmel et al., 2021). Next, we elucidated the role of CIII md neurons in cold acclimation. Silencing CIII md neurons resulted in the inability to cold acclimate. We additionally investigated roles of Ch or CII md neurons, which alone did not inhibit the ability of larvae to cold acclimate. However, combinatorial silencing of CIII with CII or Ch neurons resulted in an inability to cold acclimate but did not obviously increase baseline cold tolerance. We explored how developmental exposure to noxious cold temperature impacts the cold-evoked firing pattern of CIII md neurons. Electrophysiological analyses revealed that cold acclimation results in hypersensitization in CIII md neurons (Himmel et al., 2021). Lastly, developmental optogenetic activation of CIII md neurons led to increased cold tolerance. Therefore, CIII md neurons are necessary and sufficient for cold tolerance, and our collective evidence demonstrates that CIII-mediated cold nociception constitutes a peripheral neural basis for Drosophila larval cold acclimation (Himmel et al., 2021).

      Reviewer #3:

      It should be noted that this actuator drives very strong activation, and other studies with milder optogenetic stimulation of Class III neurons have shown that these cells produce behavioral responses that resemble gentle touch responses (Tsubouchi et al 2012 and Yan et al 2013)… The latter makes the reported Calcium responses to cold difficult to interpret in light of the fact that the strong muscle contractions driven by cold may actually be driving mechanosensory responses in these cells (ie through deformation of the mechanosensitive dendrites)… Are the cIII calcium signals still observed in a preparation where cold induced muscle contractions are prevented?

      We agree with the reviewer that mild activation of CIII md neurons results in gentle touch-like responses. In this manuscript, and other previously published work, it has been shown that optogenetic activation of CIII neurons, or CIII neurons and other sensory neurons, using a variety of optogenetic actuators (ChR2, ChETA, and CsChrimson) promotes bilateral contraction of the larval body along the anterior-posterior axis (Shearin et al., 2013, Hwang et al., 2007, Meloni et al., 2020, Turner et al., 2016, Patel and Cox, 2017, Patel et al., 2022, Himmel et al., 2023).

      As described above, in our initial publication documenting larval cold nociception in Drosophila, we investigated how CIII md neurons discriminate multimodal stimuli to elicit stimulus-relevant behavioral responses. We reported that increased activation of CIII md neurons results in cold-evoked behaviors, whereas lower activation results in touch-evoked behaviors. Subsequent calcium analyses revealed a greater stimulus-evoked calcium response to noxious cold and a milder calcium response to gentle touch (Turner et al., 2016).

      Though we have not performed cold-evoked Ca2+ imaging of CIII md neurons in larval preparations without muscles, we have recorded electrical responses of CIII md neurons in the absence of muscle contractions, using de-muscled larval fillets to analyze cold-evoked firing patterns of CIII md neurons (Himmel et al., 2021, Himmel et al., 2022, Himmel et al., 2023, Patel et al., 2022, Maksymchuk et al., 2022, Maksymchuk et al., 2023). These studies demonstrate that cold-evoked CIII neural activity is not dependent upon muscles.

      Reviewer #3:

      A major weakness of the study is that none of the second or third order neurons (that are downstream of CIII neurons) are found to trigger the CT behavioral responses even when strongly activated with the ChETA actuator (Figure 2 Supplement 2). These findings raise major concerns for this and prior studies and it does not support the hypothesis that the CIII neurons drive the CT behaviors.

      We conducted extensive screening of interneuron populations post-synaptically connected to CIII neurons in an effort to identify post-synaptic partners that were sufficient to trigger CT response. Much to our surprise, we were unable to find any individual neuron type or driver line that was sufficient to elicit a CT response. However, we provide substantial supporting evidence for our co-activation experiments including neural silencing, EM connectivity and calcium imaging. We also report necessity for the reported second/third order neurons in cold-evoked behavioral responses, where inhibiting neural activity resulted in reduced cold-evoked behavior. Second/third order neurons also exhibit cold-evoked calcium responses. Lastly, we also report CIII-evoked (using optogenetics) increases in calcium response in downstream post-synaptic neurons.

      Previously published literature investigating CIV md neuron circuitry has implicated downstream neurons that are not sufficient to elicit rolling behavior upon activation. In CIV md neuron circuit dissection, select neurons are reported as acting downstream of CIV md neurons but requiring additional circuit components in order to execute rolling behavior. For example, A00c neuron activation alone does not lead to rolling behavior; however, co-activation of A00c and Basin-4 neurons facilitates the rolling response (Ohyama et al., 2015). Similarly, co-activation of Basin-1 and Basin-4 neurons significantly enhances rolling probability relative to Basin-4 alone (Ohyama et al., 2015). Further, DnB neurons require Goro command neuron activity to promote rolling behavior (Burgos et al., 2018). Thus, there is precedent for co-activation requirements to elicit robust behavioral output in sensorimotor circuits, and we employed a similar strategy after we discovered that activation of second or third order neurons alone did not elicit a CT response.

      Reviewer #3:

      Later experiments in the paper that investigate strong CIII activation (with ChETA) in combination with other second and third order neurons does support the idea activating those neurons can facilitate body-wide muscle contractions. But many of the co-activated cells in question are either repeated in each abdominal neuromere or they project to cells that are found all along the ventral nerve cord, so it is therefore unsurprising that their activation would contribute to what appears to be a non-specific body-wide activation of muscles along the AP axis. Also, if these neurons are already downstream of the CIII neurons the logic of this co-activation approach is not particularly clear.

      We agree with the reviewer’s comment that various cell types that were investigated are repeated in every abdominal neuromere; however, only select post-synaptic neurons (Basin 1-4, DnB, mCSI, and Chair neurons) are segmentally repeated in every abdominal segment. Conversely, other projection and ascending neurons we investigated (A09e, A00c, A05q, Goro, TePn04/05, and A08n) are not segmentally repeated in every section. We used connectome evidence to guide our choice of neuron populations to explore in cold-evoked behavior and, as alluded to above, our co-activation approach was driven by the observation that no individual subpopulation of connected interneurons was found to be sufficient to elicit CT behavior. That said, it does not change the finding that inhibition of neural activity in these subpopulations impairs cold-evoked behavior, nor does it change the observation that connected interneurons exhibit cold-evoked Ca2+ responses that can also be observed with optogenetic activation of CIII neurons.

      Reviewer #3:

      The authors argument that the co-activation studies support "a population code" for cold nociception is a very optimistic interpretation of a brute force optogenetics approach that ultimately results in an enhancement of a relatively non-specific body-wide muscle convulsion.

      Many studies exploring circuit bases of behavior have applied large-scale optogenetic screens, including co-activation strategies, or silencing screens to identify circuit components involved in the specific behaviors under investigation. We employed similar methods in our circuit-based dissection, and our conclusions are not solely based upon optogenetic analyses.

      References:

      BURGOS, A., HONJO, K., OHYAMA, T., QIAN, C. S., SHIN, G. J.-E., GOHL, D. M., SILIES, M., TRACEY, W. D., ZLATIC, M., CARDONA, A. & GRUEBER, W. B. 2018. Nociceptive interneurons control modular motor pathways to promote escape behavior in Drosophila. eLife, 7:e26016.

      CANALS, M., ROSENMANN, M. & BOZINOVIC, F. 1997. Geometrical aspects of the energetic effectiveness of huddling in small mammals. Acta Theriologica, 42(3):321-328.

      CONTRERAS, L. C. 1984. Bioenergetics of Huddling: Test of a Psycho-Physiological Hypothesis. Journal of Mammalogy, 65, 256-262.

      GILBERT, C., ROBERTSON, G., LE MAHO, Y., NAITO, Y. & ANCEL, A. 2006. Huddling behavior in emperor penguins: Dynamics of huddling. Physiol Behav, 88, 479-88.

      HAYES, J. P., SPEAKMAN, J. R. & RACEY, P. A. 1992. The Contributions of Local Heating and Reducing Exposed Surface Area to the Energetic Benefits of Huddling by Short-Tailed Field Voles (Microtus agrestis). Physiological Zoology, 65, 742-762.

      HIMMEL, N. J., LETCHER, J. M., SAKURAI, A., GRAY, T. R., BENSON, M. N., DONALDSON, K. J. & COX, D. N. 2021. Identification of a neural basis for cold acclimation in Drosophila larvae. iScience, 24, 102657.

      HIMMEL, N. J., SAKURAI, A., DONALDSON, K. J. & COX, D. N. 2022. Protocols for measuring cold-evoked neural activity and cold tolerance in Drosophila larvae following fictive cold acclimation. STAR Protoc, 3, 101510.

      HIMMEL, N. J., SAKURAI, A., PATEL, A. A., BHATTACHARJEE, S., LETCHER, J. M., BENSON, M. N., GRAY, T. R., CYMBALYUK, G. S. & COX, D. N. 2023. Chloride-dependent mechanisms of multimodal sensory discrimination and nociceptive sensitization in Drosophila. eLife, 12:e76863.

      HWANG, R. Y., ZHONG, L., XU, Y., JOHNSON, T., ZHANG, F., DEISSEROTH, K. & TRACEY, W. D. 2007. Nociceptive Neurons Protect Drosophila Larvae from Parasitoid Wasps. Current Biology, 17, 2105-2116.

      MAKSYMCHUK, N., SAKURAI, A., COX, D. N. & CYMBALYUK, G. 2022. Transient and Steady-State Properties of Drosophila Sensory Neurons Coding Noxious Cold Temperature. Frontiers in Cellular Neuroscience, 16:831803.

      MAKSYMCHUK, N., SAKURAI, A., COX, D. N. & CYMBALYUK, G. S. 2023. Cold-Temperature Coding with Bursting and Spiking Based on TRP Channel Dynamics in Drosophila Larva Sensory Neurons. Int J Mol Sci, 24(19):14638.

      MELONI, I., SACHIDANANDAN, D., THUM, A. S., KITTEL, R. J. & MURAWSKI, C. 2020. Controlling the behaviour of Drosophila melanogaster via smartphone optogenetics. Scientific Reports, 10, 17614.

      OHYAMA, T., SCHNEIDER-MIZELL, C. M., FETTER, R. D., ALEMAN, J. V., FRANCONVILLE, R., RIVERA-ALBA, M., MENSH, B. D., BRANSON, K. M., SIMPSON, J. H., TRUMAN, J. W., CARDONA, A. & ZLATIC, M. 2015. A multilevel multimodal circuit enhances action selection in Drosophila. Nature, 520, 633-639.

      PATEL, A. & COX, D. 2017. Behavioral and Functional Assays for Investigating Mechanisms of Noxious Cold Detection and Multimodal Sensory Processing in Drosophila Larvae. BIO-PROTOCOL, 7(13):e2388.

      PATEL, A. A., SAKURAI, A., HIMMEL, N. J. & COX, D. N. 2022. Modality specific roles for metabotropic GABAergic signaling and calcium induced calcium release mechanisms in regulating cold nociception. Front Mol Neurosci 15:942548.

      SHEARIN, H. K., DVARISHKIS, A. R., KOZELUH, C. D. & STOWERS, R. S. 2013. Expansion of the Gateway MultiSite Recombination Cloning Toolkit. PLoS ONE, 8, e77724-e77724.

      TENEDINI, F. M., SÁEZ GONZÁLEZ, M., HU, C., PEDERSEN, L. H., PETRUZZI, M. M., SPITZWECK, B., WANG, D., RICHTER, M., PETERSEN, M., SZPOTOWICZ, E., SCHWEIZER, M., SIGRIST, S. J., CALDERON DE ANDA, F. & SOBA, P. 2019. Maintenance of cell type-specific connectivity and circuit function requires Tao kinase. Nature Communications, 10, 3506.

      TURNER, H. N., ARMENGOL, K., PATEL, A. A., HIMMEL, N. J., SULLIVAN, L., IYER, S. C., BHATTACHARYA, S., IYER, E. P. R., LANDRY, C., GALKO, M. J. & COX, D. N. 2016. The TRP Channels Pkd2, NompC, and Trpm Act in Cold-Sensing Neurons to Mediate Unique Aversive Behaviors to Noxious Cold in Drosophila. Curr Biol, 26, 3116-3128.

      TURNER, H. N., PATEL, A. A., COX, D. N. & GALKO, M. J. 2018. Injury-induced cold sensitization in Drosophila larvae involves behavioral shifts that require the TRP channel Brv1. PLoS One, 13, e0209577.

      VICKERY, W. L. & MILLAR, J. S. 1984. The Energetics of Huddling by Endotherms. Oikos, 43, 88-93.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary: The current study reports a cryo-EM structure of MFS transporter MelB trapped in an inward-facing state by a conformationally selective nanobody. The authors compare this structure to previously-resolved crystal structures of outward-facing MelB. Additionally, the authors report H/D exchange/ mass spec experiments that identify accessible residues in the protein.

      Strengths: The authors overcame very significant technical challenges to solve the first inward-facing structure of the small, model MFS transporter MelB by cryo-EM. The use of conformation-trapping nanobodies (which had been reported previously by this group) is particularly nice.

      We appreciate reviewer #1’s positive comments.

      Weaknesses: Maps and coordinates were not provided by the authors, which presents a gap in this assessment.

      We were not aware of a specific request for maps and coordinates at the time of the initial submission, but we will provide them on request.

      The authors highlight the use of HDX experiments as a measurement of protein conformational dynamics. However, this experiment does not measure the conformational dynamics of the transporter, since in these experiments exchange is not initiated by ligand addition or another trigger. The experiment instead measures the accessibility of different residues, and of course, a freely-exchanging sodium bound transporter would have more exchangeable positions than when a conformation-trapping nanobody is bound. It is not clear what new mechanistic information this provides, since this property of the nanobody has already been established.

      We thank you for your comment. We will address your and reviewer 2’s similar questions later.

      Based on the evidence presented, it is somewhat speculative that the structure represents the EIIa-bound regulatory state.

      We believe that we have presented convincing evidence obtained by ITC and gel-filtration chromatography to support this statement. The effects of Nb725 or EIIAGlc on MelB functions are similar: little change in Na+ binding, little change in Nb725 or EIIAGlc binding in the absence or presence of EIIAGlc or Nb725, respectively, but a great reduction in sugar-binding affinity (sFigs. 2&3; tables 1&2; published in two papers in J. Biol. Chem. 2014; 289: 33012-33019 and 2023; 299:104967). To make this clear, we will add the related data from the two JBC papers to Table 2. Nb725 and EIIAGlc can concurrently bind to MelBSt (sFigs. 2&3; tables 1&2). Further, we will provide a new figure to show that a complex composed of all three proteins can be isolated by gel-filtration chromatography. We have also established this finding with another nanobody, Nb733, from the same family (JBC, 2023; 299:104967). However, given that the EIIAGlc-bound structure has not been resolved yet, we will tone down the related argument.

      Reviewer #2 (Public Review):

      Summary: In this manuscript, Hariharan and colleagues present an elegant study regarding the mechanistic basis of sugar transport by the prototypical Na+-coupled transporter MelB. The authors identified a nanobody (Nb 725) that reduces melibiose binding but not Na+ binding. In vitro (ITC) experiments suggest that the conformation targeted by this nanobody is different from the published outward-open structures. They go on to solve the structure of this other conformational by cryo-EM using the Nanobody grafted with a fiducial marker and enhancer and, as predicted, capture a new conformation of MelB, namely the inward-open conformation. Through MD simulations and ITC measurements, they demonstrate that such state has a reduced affinity for sugar but that Na+ binding is mostly unaffected. A detailed observation and comparison between previously published structures in the outward-open conformation and this new conformational intermediate allows to strengthen and develop the mobile barrier hypothesis underpinning sugar transport. The conformational transition to the inward-facing state leads to the formation of a barrier on the extracellular side that directly affects the amino acid arrangement of the sugar binding site, leading to a decreased affinity that drives the direction of transport. In contrast, the Na+ binding remains the same. This structural data is complemented with dynamic insights from HDX-MS experiments conducted in the presence and absence of the Nb. These measurements highlight the overall protective effect of nanobody binding, consistent with the stabilization of one conformational intermediate.

      Strengths: The experimental strategy to isolate this elusive conformational intermediate is smart and well-executed. The biochemical and biophysical data were obtained in a lipid system (nanodiscs), which allows dismissing questions about detergent induced artefacts. The new conformation observed is of great interest and allows to have a better mechanistic understanding of ion-coupled sugar transport. The comparison between the two structures and the mobile barrier mechanism hypothesis is convincingly depicted and tested.

      We appreciate the reviewer’s insightful understanding of our novel findings and the associated explanations on the cation-coupled symport mechanisms.

      Weaknesses: This is excellent experimental work. My recommendations stem mostly from concerns regarding the interpretation of the observed results. In particular, I am somewhat puzzled by the important role the authors give to the regulatory protein EIIa with little structural or biophysical data to back up their claims. The hypothesis that the conformation captured by the Nb is physiologically and functionally equivalent to that caused by EIIa binding is definitely a worthy hypothesis, but it is not an experimental result. Evidence in support could include a structure with EIIa bound. Since it does not bind at the same location as the Nb, it seems feasible. Or, the authors could have performed HDX-MS in the presence of EIIa to determine if the effect is similar to that of Nb_725 binding. In the absence of these experiments, discussion about EIIa should be limited. Along the same lines, I find it misleading to put in the abstract a sentence such as "It is the first structure of a major facilitator superfamily (MFS) transporter with experimentally determined cation binding, and also a structure mimicking the physiological regulatory state of MelB under the global regulator EIIAGlc of the glucose-specific phosphoenolpyruvate:phosphotransferase system." None of this is supported by the experimental work presented in this article: the Na+ is modelled (with great confidence, but still) and whether this structure mimics the physiological state of MelB bound to EIIa is not known. The results of the paper are strong and interesting enough per se, and there is no need to inflate them with hypothesis that belongs to the discussion section.

      As stated in the response to reviewer 1, we believe that we presented strong data to argue for a structure mimicking the physiological regulatory state of MelB. The only missing piece is a structure of the EIIA-bound state. We will change the title and tone down the related discussions in a new version.

      Regarding the statement in our abstract that “It is the first structure of a major facilitator superfamily (MFS) transporter with experimentally determined cation binding”, we believe that our claim is supported by the resolved Na+ binding in the cryoEM structure. To our knowledge, no experimentally determined cation at its canonical binding site has been reported so far.

      I also note that the HDX-MS experiments do not distinguish between two conformational states, but rather an ensemble of states vs one state.

      We address the comments of reviewers 1 and 2 together. We agree with your comments: we compared one (inward-facing) state with an ensemble of (predominantly outward-facing) states. Many published data have demonstrated that WT MelBSt predominantly populates outward-facing states, especially in the presence of Na+. The major differences in HDX-MS between the inward-facing state in the presence of the Nb and the outward-facing ensemble in the absence of the Nb should reflect the conformational changes between the inward- and outward-facing states, though not quantitatively. The type of measurement we performed does not contain information on the rates of conformational changes, but this study identified the dynamic regions involved in this conformational switch.

      Reviewer #3 (Public Review):

      Summary: The manuscript authored by Lan Guan and colleagues reveals the structure of the cytosol-facing conformation of the MelB sodium/Li coupled permease using the nab-Fab approach and cryoEM for structure determination. The study reveals the conformational transitions in the melB transport cycle and allows understanding the role of sugar and ion specificities within this transporter.

      Strengths: The study employs a very exciting strategy of transferring the CDRS of a conformation specific nano body to the nab-fab system to determine the inward-open structure of MelB. The resolution of the structure is reasonable enough to support the major conclusions of the study. This is overall a well-executed study.

      Thank you for your positive comments.

      Weaknesses: The authors seem to have mixed up the exothermic and endothermic aspects of ITC binding in their description. Positive heats correspond to endothermic heat changes in ITC and negative heat changes correspond to exothermic heats. The authors seem to suggest the opposite.

      This is consistently observed throughout the manuscript.

      All of our ITC data are correctly presented. Our data were collected on a NanoITC (TA Instruments, Inc.), which directly measures the heat release/enthalpic changes and plots exotherms as positive values. This is in contrast to the MicroCal device, which detects heat changes through voltage compensation and depicts exotherms as negative values. We will further emphasize this in the related figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      This manuscript describes a set of four passage-reading experiments which are paired with computational modeling to evaluate how task-optimization might modulate attention during reading. Broadly, participants show faster reading and modulated eye-movement patterns of short passages when given a preview of a question they will be asked. The attention weights of a Transformer-based neural network (BERT and variants) show a statistically reliable fit to these reading patterns above-and-beyond text- and semantic-similarity baseline metrics, as well as a recurrent-network-based baseline. Reading strategies are modulated when questions are not previewed, and when participants are L1 versus L2 readers, and these patterns are also statistically tracked by the same transformer-based network.

      I should note that I served as a reviewer on an earlier version of this manuscript at a different venue. I had an overall positive view of the paper at that point, and the same opinion holds here as well.

      Strengths:

      • Task-optimization is a key notion in current models of reading and the current effort provides a computationally rigorous account of how such task effects might be modeled

      • Multiple experiments provide reasonable effort towards generalization across readers and different reading scenarios

      • Use of RNN-based baseline, text-based features, and semantic features provides a useful baseline for comparing Transformer-based models like BERT

      Thank you for the accurate summary and positive evaluation.

      Weaknesses:

      1) Generalization across neural network models seems, to me, somewhat limited: The transformer-based models differ from baseline models in numerous ways (model size, training data, scoring algorithm); it is thus not clear what properties of these models necessarily supports their fit to human reading patterns.

      Thank you for the insightful comment. To dissociate the effect of model architecture from the effect of training data, we have now compared the attention weights across three transformer-based models that have the same architecture but different training data/task: randomized (with all model parameters being randomized), pretrained, and fine-tuned models. Remarkably, even without training on any data, the attention weights in randomly initialized models exhibited significant similarity to human attention patterns (Figure 3A). The predictive power of randomly initialized transformer-based models exceeded that of the SAR model. Through subsequent pre-training and fine-tuning, the predictive capacity of the models was further elevated. Therefore, both model architecture and the training data/task contribute to human-like attention distribution in the transformer models. We have now reported this result:

      “The attention weights of randomly initialized transformer-based models could predict the human word reading time, and the predictive power, which was around 0.3, was significantly higher than the chance level and the SAR (Fig. 3A, Table S1). The attention weights of pre-trained transformer-based models could also predict the human word reading time, and the predictive power was around 0.5, significantly higher than the predictive power of heuristic models, the SAR, and randomly initialized transformer-based models (Fig. 3A, Table S1). The predictive power was further boosted for local but not global questions when the models were fine-tuned to perform the goal-directed reading task (Fig. 3A, Table S1).”

      In addition, we reported how training influenced the sensitivity of attention weights to text features and question relevance. As shown in Figure 4AB, attention in the randomized models was sensitive to text features across all layers. After pretraining, the models exhibited increased sensitivity to text features in the shallow layers and decreased sensitivity to text features in the deep layers. Subsequent fine-tuning on the reading comprehension task further attenuated the encoding of text features in the deep layers but strengthened the sensitivity to task-relevant information.

      2) Inferential statistics are based on a series of linear regressions, but these differ markedly in model size (BERT models involve 144 attention-based regressor, while the RNN-based model uses just 1 attention-based regressor). How are improvements in model fit balanced against changes in model size?

      Thank you for pointing out this issue. The performance of linear regressions was evaluated based on 5-fold cross-validation, and the performance we reported was the performance on the test set. To match the number of parameters, we have now predicted human attention using the average of all heads. The predictive power of the average head was still significantly higher than the predictive power of the SAR model. We have now reported this result in our revised manuscript:

      “For the fine-tuned models, we also predict the human word reading time using an unweighted average of the 144 attention heads and the predictive power was 0.3, significantly higher than that achieved by the attention weights of SAR (P = 4 × 10⁻⁵, bootstrap).”
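
      As an illustration only, the averaged-head analysis described above can be sketched as follows. This is a minimal sketch on synthetic data, not the authors' pipeline: the function names, the simulated attention weights and reading times, and the choice of the Pearson correlation between held-out predictions and observed reading times as the measure of "predictive power" are all assumptions made for the example.

```python
import math
import random
import statistics


def average_heads(attention):
    """Unweighted mean over heads; attention[w] holds the per-head weights of word w."""
    return [sum(heads) / len(heads) for heads in attention]


def fit_line(x, y):
    """Ordinary least squares for a single regressor; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def cross_validated_r(x, y, k=5, seed=0):
    """k-fold cross-validation: fit on k-1 folds, predict the held-out fold,
    then correlate all held-out predictions with the observed values."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    predictions = [0.0] * len(x)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            predictions[i] = intercept + slope * x[i]
    return pearson(predictions, y)


# Synthetic stand-in data (assumption): a latent per-word "relevance" drives
# both the noisy per-head attention weights and the simulated reading time.
rng = random.Random(1)
n_words, n_heads = 200, 144
relevance = [rng.random() for _ in range(n_words)]
attention = [[r + rng.gauss(0.0, 0.5) for _ in range(n_heads)] for r in relevance]
reading_time = [200.0 + 300.0 * r + rng.gauss(0.0, 30.0) for r in relevance]

mean_attention = average_heads(attention)
demo_r = cross_validated_r(mean_attention, reading_time)
print(f"cross-validated r = {demo_r:.2f}")
```

      Averaging the heads reduces the 144 attention-based regressors to a single one, so in this sketch the comparison with a one-regressor baseline is matched in parameter count, and scoring only held-out folds keeps a larger model from being rewarded for overfitting.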

      Also, it was not clear to me how participant-level variance was accounted for in the modeling effort (mixed-effects regression?) These questions may well be easily remedied by more complete reporting.

      In the previous manuscript, the word reading time was averaged across participants, and we did not consider the variance between participants. We have now analyzed the eye movements of each participant and used a linear mixed-effects model to test how different factors affected human word reading time, accounting for participant-level and item-level variance.

      “Furthermore, a linear mixed effect model also revealed that more than 85% of the DNN attention heads contribute to the prediction of human reading time when considering text features and question relevance as covariates (Supplementary Results).”

      “Supplementary Methods: To characterize the influences of different factors on human word reading time, we employed linear mixed effects models [5] implemented in the lmerTest package [6] of R. For the baseline model, we treated the type of questions (local vs. global; local = baseline) and all text/task-related features as fixed factors, and considered the interaction between the type of questions and these text/task-related features. We included participants and items (i.e., questions) as random factors, each with associated random intercepts…”

      “Supplementary Results: The baseline mixed model revealed significant fixed effects for question type and all text/task-related features, as well as significant interactions between question type and these text/task-related features (Table S7). Upon involving SAR attention, we observed a statistically significant fixed effect associated with SAR attention. When involving attention weights of randomly initialized BERT, the mixed model revealed that most attention heads exhibited significant fixed effects, suggesting their contributions to the prediction of human word reading time. A broader range of attention heads showed significant fixed effects for both pre-trained and fine-tuned BERT.”

      3) Experiment 1 was paired with a relatively comprehensive discussion of how attention weights mapped to reading times, but the same sort of analysis was not reported for Exps 2-4; this seems like a missed opportunity given the broader interest in testing how reading strategies might change across the different parameters of the four experiments.

      Thank you for the valuable suggestion. We have now also characterized how different reading measures, e.g., gaze duration and counts of rereading, were affected by text and task-related features in Experiments 2-4.

      For Experiment 2: “For local questions, consistent with Experiment 1, the effects of question relevance significantly increased from early to late processing stages that are separately indexed by gaze duration and counts of rereading (Fig. S9A, Table S3).”

      For Experiment 3: “For local questions, the layout effect was more salient for gaze duration than for counts of rereading. In contrast, the effect of word-related features and task relevance was more salient for counts of rereading than gaze duration (Fig. S9B, Table S3).”

      For Experiment 4: “Both the early and late processing stages of human reading were significantly affected by layout and word features, and the effects were larger for the late processing stage indexed by counts of rereading (Fig. S9C, Table S3).”

      4) Comparison of predictive power of BERT weights to human annotations of text relevance is limited: The annotation task asked participants to choose the 5 "most relevant" words for a given question; if >5 words carried utility in answering a question, this would not be captured by the annotation. It seems to me that the improvement of BERT over human annotations discussed around page 10-11 could well be due to this arbitrary limitation of the annotations.

      Thank you for the insightful comment. We allowed each participant to label only 5 words because we wanted participants to label only the most important information. As the reviewer pointed out, five words may not be enough. However, this problem is alleviated by having >26 annotators per question. Although each participant can label up to 5 words, pooling the results across >26 annotators results in nonzero relevance ratings for an average of 21.1 words for local questions and 26.1 words for global questions. More importantly, as outlined in Experimental Materials, we asked additional participants to answer questions based on only 5 annotated keywords. The question-answering accuracy was 75.9% for global questions and 67.6% for local questions, which was close to the accuracy achieved when the complete passage was present (Fig. 1B), suggesting that even 5 keywords could support question answering.

      5) Abstract ln 35: This concluding sentence didn't really capture the key contribution of the paper which, at least from my perspective, was something closer to "we offer a computational account of how task optimization modulates attention during reading"

      p 4 ln 66: I think this sentence does a good job capturing the main contributions of this paper

      Thanks for your suggestion. We have modified our conclusion in Abstract accordingly.

      6) p 4 ln 81: "therefore is conceptually similar" maybe "may serve a conceptually similar role"

      We have rewritten the sentence.

      “Attention in DNN also functions as a mechanism to selectively extract useful information, and therefore attention may potentially serve a conceptually similar role in DNN.”

      7) p. 7 ln 140: "disproportional to the reading time" I didn't understand this sentence

      Sorry for the confusion and we have rewritten the sentence.

      “In Experiment 1, participants were allowed to read each passage for 2 minutes. Nevertheless, to encourage the participants to develop an effective reading strategy, the monetary reward the participant received decreased as they spent more time reading the passage (see Materials and Methods for details).”

      8) p 8 ln 151: This was another sentence that helped solidify the main research contributions for me; I wonder if this framing could be promoted earlier?

      Thank you for the suggestion and we have moved the sentence to Introduction.

      9) p. 33: I may be missing something here, but I didn't follow the reasoning behind quantifying model fit against eye-tracking measures using accuracy in a permutation test. Models are assessed in terms of the proportion of random shuffles that show a greater statistical correlation. Does that mean that an accuracy value like 0.3 (p. 10 ln 208) means that 0.7 random permutations of word order led to higher correlations between attention weights and RT? Given that RT is continuous, I wonder if a measure of model fit such as RMSE or even R^2 could be more interpretable.

      We have now realized that the term “prediction accuracy” was not clearly defined and has caused confusion. Therefore, in the revised manuscript, we have replaced this term with “predictive power”. Additionally, we have now introduced a clear definition of “predictive power” at its first mention in Results:

      “…the predictive power, i.e., the Pearson correlation coefficient between the predicted and real word reading time, was around 0.2”

      The permutation test was used to test whether the predictive power is above chance. Specifically, if the predictive power is higher than the 95th percentile of the chance-level predictive power estimated using permutations, it is significant at the 0.05 level (i.e., p < 0.05). We have explained this in Statistical tests.
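The logic of this test can be sketched in a few lines. The sketch below is illustrative only: the function name `permutation_p_value`, the choice to shuffle the predicted values among words of the same passage, the number of permutations, and the add-one p-value estimate are assumptions, and the data are synthetic.

```python
import numpy as np

def permutation_p_value(predicted, observed, passage_ids, n_perm=1000, seed=0):
    """Compare the actual Pearson r with a within-passage shuffled null (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    passage_ids = np.asarray(passage_ids)

    actual = np.corrcoef(predicted, observed)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = predicted.copy()
        # Assumption: words are shuffled only within their own passage.
        for pid in np.unique(passage_ids):
            idx = np.flatnonzero(passage_ids == pid)
            shuffled[idx] = predicted[rng.permutation(idx)]
        null[i] = np.corrcoef(shuffled, observed)[0, 1]

    threshold = np.percentile(null, 95)  # chance-level cutoff for p < 0.05
    p = (1 + np.sum(null >= actual)) / (1 + n_perm)
    return actual, threshold, p

# Synthetic example: predictions that genuinely track the observations.
rng = np.random.default_rng(2)
passage_ids = np.repeat(np.arange(10), 20)
observed = rng.standard_normal(200)
predicted = observed + rng.standard_normal(200)
actual, threshold, p = permutation_p_value(predicted, observed, passage_ids, n_perm=500)
```

Under this reading, a predictive power of 0.3 is a correlation coefficient, not the fraction of shuffles that it beats; the shuffles only supply the chance-level threshold.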

      10) p. 33: FDR-based multiple comparisons are noted several times, but it wasn't clear to me what the comparison set is for any given test; more details would be helpful (e.g. X comparisons were conducted across passages/model-variants/whatever)

      Sorry for missing this important information. We have now mentioned which comparisons are corrected:

      “…Furthermore, the predictive power was higher for global than local questions (P = 4 × 10⁻⁵, bootstrap, FDR corrected for comparisons across 3 features, i.e., layout features, word features, and question relevance)…”
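To make concrete what a correction over a set of 3 comparisons involves, here is a small sketch. The response does not say which FDR procedure was used; the Benjamini-Hochberg procedure is assumed here because it is the most common choice, and the function name and the p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed procedure, illustrative only)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

# Hypothetical p-values for layout features, word features, and question relevance.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

The size of the comparison set matters because each raw p-value is scaled by the number of tests, which is why stating "across 3 features" makes the reported values interpretable.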

      Reviewer #2:

      In this study, researchers aim to understand the computational principles behind attention allocation in goal-directed reading tasks. They explore how deep neural networks (DNNs) optimized for reading tasks can predict reading time and attention distribution. The findings show that attention weights in transformer-based DNNs predict reading time for each word. Eye tracking reveals that readers focus on basic text features and question-relevant information during initial reading and rereading, respectively. Attention weights in shallow and deep DNN layers are separately influenced by text features and question relevance. Additionally, when readers read without a specific question in mind, DNNs optimized for word prediction tasks can predict their reading time. Based on these findings, the authors suggest that attention in real-world reading can be understood as a result of task optimization.

      The research question pursued by the study is interesting and important. The manuscript was well written and enjoyable to read. However, I do have some concerns.

      We thank the reviewer for the accurate summary and positive evaluation.

      1) In the first paragraph of the manuscript, it appears that the purpose of the study was to test the optimization hypothesis in natural tasks. However, the cited papers mainly focus on covert visual attention, while the present study primarily focuses on overt attention (eye movements). It is crucial to clearly distinguish between these two types of attention and state that the study mainly focuses on overt attention at the beginning of the manuscript.

      Thank you for pointing out this issue. We have explicitly mentioned that we focus on overt attention in the current study. Furthermore, we have also discussed that native readers may rely more on covert attention, so that they do not need to spend more time overtly fixating on task-relevant words.

      In Introduction:

      “Reading is one of the most common and most sophisticated human behaviors [16, 17], and it is strongly regulated by attention: Since readers can only recognize a couple of words within one fixation, they have to overtly shift their fixation to read a line of text [3]. Thus, eye movements serve as an overt expression of attention allocation during reading [3, 18].”

      In Discussion:

      “Therefore, it is possible that when readers are more skilled and when the passage is relatively easy to read, their processing is so efficient that they do not need extra time to encode task-relevant information and may rely on covert attention to prioritize the processing of task-relevant information.”

      2) The manuscript correctly describes attention in DNN as a mechanism to selectively extract useful information. However, eye-movement measures such as gaze duration and total reading time are primarily influenced by the time needed to process words. Therefore, there is a doubt whether the argument stating that attention in DNN is conceptually similar to the human attention mechanism at the computational level is correct. It is strongly suggested that the authors thoroughly discuss whether these concepts describe the same or different things.

      Thank you for bringing up this very important issue. We have added discussions about why humans and DNNs may generate similar attention distributions. For example, we found that both DNN and human attention distributions are modulated by task relevance and word properties, which include word length, word frequency, and word surprisal. The influence of task relevance is relatively straightforward since both human readers and DNNs should rely more on task-relevant words to answer questions. The influence of word properties is less apparent for models than for human readers, and we have added discussions:

      For DNN’s sensitivity to word surprisal:

      “The transformer-based DNN models analyzed here are optimized in two steps, i.e., pre-training and fine-tuning. The results show that pre-training leads to text-based attention that can well explain general-purpose reading in Experiment 4, while the fine-tuning process leads to goal-directed attention in Experiments 1-3 (Fig. 4B & Fig. 5A). Pre-training is also achieved through task optimization, and the pre-training task used in all the three models analyzed here is to predict a word based on the context. The purpose of the word prediction task is to let models learn the general statistical regularity in a language based on large corpora, which is crucial for model performance on downstream tasks [21, 22, 33], and this process can naturally introduce the sensitivity to word surprisal, i.e., how unpredictable a word is given the context.”

      For DNN’s sensitivity to word length:

      “Additionally, the tokenization process in DNN can also contribute to the similarity between human and DNN attention distributions: DNN first separates words into tokens (e.g., “tokenization” is separated into “token” and “ization”). Tokens are units that are learned based on co-occurrence of letters, and are not strictly linked to any linguistically defined units. Since longer words tend to be separated into more tokens, i.e., fragments of frequently co-occurring letters, longer words receive more attention even if the model pays uniform attention to each of its inputs, i.e., a token.”
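The word-length argument can be illustrated with a toy calculation. In this sketch the tokenizer is a deliberately crude stand-in that cuts words into fixed-length pieces; it is not the learned subword vocabulary that BERT-family models use, and both function names are hypothetical.

```python
def toy_tokenize(word, piece_len=5):
    """Crude stand-in for a subword tokenizer: cut a word into fixed-length pieces."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]

def word_attention_from_tokens(words, tokenize):
    """Word-level attention when every token receives the same attention weight."""
    tokens_per_word = [len(tokenize(w)) for w in words]
    total_tokens = sum(tokens_per_word)
    # Summing uniform token weights per word: each word gets n_tokens / total.
    return [n / total_tokens for n in tokens_per_word]

words = ["the", "tokenization", "works"]
weights = word_attention_from_tokens(words, toy_tokenize)
# "tokenization" splits into 3 pieces, the other words into 1 each,
# so the weights are [0.2, 0.6, 0.2].
```

Even with perfectly uniform token-level attention, the longest word ends up with three times the weight of the short ones once token weights are summed per word.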

      3) When reporting how reading time was predicted by attention weights, the authors used "prediction accuracy." While this measure is useful for comparing different models, it is less informative for readers to understand the quality of the prediction. It would be more helpful if the results of regression models were also reported.

      Sorry for the confusion. The prediction accuracy was defined as the correlation coefficient between the predicted and actual eye-tracking measures. We have now realized that the term “prediction accuracy” might have caused confusion. Therefore, in the revised manuscript, we have replaced this term with “predictive power”. Additionally, we have now introduced a clear definition of “predictive power” at its first mention in Results:

      “…the predictive power, i.e., the Pearson correlation coefficient between the predicted and real word reading time, was around 0.2”

      4) The motivations of Experiments 2 and 3 could be better described. In their current form, it is challenging to understand how these experiments contribute to understanding the major research question of the study.

      Thank you for pointing out this issue. In Experiment 1, different types of questions were presented in separate blocks, and all the participants were L2 readers. Therefore, we conducted Experiments 2 and 3 to examine how reading behaviors were modulated when different types of questions were presented in a mixed manner, or when participants were L1 readers. We have now clarified the motivations:

      “In Experiment 1, different types of questions were presented in blocks which encouraged the participants to develop question-type-specific reading strategies. Next, we ran Experiment 2, in which questions from different types were mixed and presented in a randomized order, to test whether the participants developed question-type-specific strategies in Experiment 1.”

      “Experiments 1 and 2 recruited L2 readers. To investigate how language proficiency influenced task modulation of attention and the optimality of attention distribution, we ran Experiment 3, which was the same as Experiment 2 except that the participants were native English readers.”

      Reviewer #3:

      This paper presents several eyetracking experiments measuring task-directed reading behavior where subjects read texts and answered questions.

      It then models the measured reading times using attention patterns derived from deep-neural network models from the natural language processing literature.

      Results are taken to support the theoretical claim that human reading reflects task-optimized attention allocation.

      STRENGTHS:

      1) The paper leverages modern machine learning to model a high-level behavioral task (reading comprehension). While the claim that human attention reflects optimal behavior is not new, the paper considers a substantially more high-level task in comparison to prior work. The paper leverages recent models from the NLP literature which are known to provide strong performance on such question-answering tasks, and is methodologically well grounded in the NLP literature.

      2) The modeling uses text- and question-based features in addition to DNNs, specifically evaluates relevant effects, and compares vanilla pretrained and task-finetuned models. This makes the results more transparent and helps assess the contributions of task optimization. In particular, besides finetuned DNNs, the role of the task is further established by directly modeling the question relevance of each word. Specifically, the claim that human reading is predicted better by task-optimized attention distributions rests on (i) a role of question relevance in influencing reading in Expts 1-2 but not 4, and (ii) the fact that fine-tuned DNNs improve prediction of gaze in Expts 1-2 but not 4.

      3) The paper conducts experiments on both L2 and L1 speakers.

      We thank the reviewer for the accurate summary and positive evaluation.

      WEAKNESSES:

      1) The paper aims to show that human gaze is predicted by the DNN-derived task-optimal attention distribution, but the paper does not actually derive a task-optimal attention distribution. Rather, the DNNs are used to extract 144 different attention distributions, which are then put into a regression with coefficients fitted to predict human attention. As a consequence, the model has 144 free parameters without apparent a-priori constraint or theoretical interpretation. In this sense, there is a slight mismatch between what the modeling aims to establish and what it actually does.

      Regarding Weakness (1): This weakness should be made explicit, at least by rephrasing line 90. The authors could also evaluate whether there is either a specific attention head, or one specific linear combination (e.g. a simple average of all heads) that predicts the human data well.

      Thank you for pointing out this issue. On the one hand, we have now also predicted human attention using the average of all heads, i.e., the simple average suggested by the reviewer. The predictive power of the average head was still significantly higher than the predictive power of the SAR model. We have now reported this result in our revised manuscript.

      “For the fine-tuned models, we also predict the human word reading time using an unweighted average of the 144 attention heads and the predictive power was 0.3, significantly higher than that achieved by the attention weights of SAR (P = 4 × 10⁻⁵, bootstrap).”

      On the other hand, since different attention weights may contribute differently to the prediction of human reading time, we have now also reported the weights assigned to individual attention heads in the original regression analysis (Fig. S4). The weights were widely distributed across attention heads and were not dominated by a single head.

      Even more importantly, we have now rephrased the statement in line 90 of the previous manuscript:

      “We employed DNNs to derive a set of attention weights that are optimized for the goal-directed reading task, and tested whether such optimal weights could explain human attention measured by eye tracking.”

      Furthermore, in Discussion, we mentioned that:

      “Furthermore, we demonstrate that both humans and transformer-based DNN models achieve task-optimal attention distribution in multiple steps… Similarly, the DNN models do not yield a single attention distribution, and instead they generate multiple attention distributions, i.e., heads, for each layer. Here, we demonstrate that basic text features mainly modulate the attention weights in shallow layers, while the question relevance of a word modulates the attention weights in deep layers, reflecting hierarchical control of attention to optimize task performance. The attention weights in both the shallow and deep layers of DNN contribute to the explanation of human word reading time (Fig. S4).”

      2) While Experiment 1 tests questions from different types in blocks, and the paper mentions that this might encourage the development of question-type-specific reading strategies -- indeed, this specifically motivates Experiment 2, and is confirmed indirectly in the comparison of the effects found in the two experiments ("all these results indicated that the readers developed question-type-specific strategies in Experiment 1") -- the paper seems to miss the opportunity to also test whether DNNs fine-tuned for each of the question-types predict specifically the reading times on the respective question types in Experiment 1. Testing not only whether DNN-derived features can differentially predict normal reading vs targeted reading, but also different targeted reading tasks, would be a strong test of the approach.

      Regarding Weakness (2): results after finetuning for each question type could be reported.

      Thank you for the valuable suggestion. We have now fine-tuned the models separately on global and local questions. The fine-tuning hyperparameters are presented in Author response table 1.

      Author response table 1.

      Hyperparameters for fine-tuning the DNN models on specific question types.

      The fine-tuning process yielded a slight reduction in loss (i.e., the negative logarithmic score of the correct option) on the validation set. Specifically, for BERT, the loss decreased from 1.08 to 0.96; for ALBERT, it decreased from 1.16 to 0.76; for RoBERTa, it decreased from 0.68 to 0.54. Nevertheless, the fine-tuning process did not improve the prediction of reading time (Author response image 1). A likely reason is that the number of global and local questions available for training is limited (local questions: 520; global questions: 280), and similar questions also exist in the RACE dataset used for the original fine-tuning (sample size: 87,866). Therefore, although a small number of questions can significantly change the reading strategy of human readers, using these questions to effectively fine-tune a model appears to be more challenging.
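To give the loss values above a reference point, the sketch below computes a negative log score for a multiple-choice question. It assumes the score of each option is turned into a probability with a softmax and that each question has four options; neither detail is stated in the response, and the function name and example scores are hypothetical.

```python
import math

def multiple_choice_loss(option_scores, correct_index):
    """Negative log probability of the correct option under a softmax (assumed form)."""
    m = max(option_scores)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in option_scores))
    return log_norm - option_scores[correct_index]

# A model with no preference among four options sits at ln(4), about 1.386.
chance_loss = multiple_choice_loss([0.0, 0.0, 0.0, 0.0], correct_index=0)

# Raising the score of the correct option lowers the loss.
better_loss = multiple_choice_loss([2.0, 0.0, 0.0, 0.0], correct_index=0)
```

Under these assumptions, all of the reported validation losses lie below the four-option chance level, and the reductions after question-type-specific fine-tuning are small relative to that baseline.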

      Author response image 1.

      Fine-tuning based on local and global questions does not significantly modulate the prediction of human reading time. Lighter-color symbols show the results for the 3 BERT-family models (i.e., BERT, ALBERT, and RoBERTa) and the darker-color symbols show the average over the 3 BERT-family models. trans_fine: model fine-tuned based on the RACE dataset; trans_local: models additionally fine-tuned using local questions; trans_global: models additionally fine-tuned using global questions.

      3) The paper compares the DNN-derived features to word-related features such as frequency and surprisal and reports that the DNN features are predictive even when the others are regressed out (Figure S3). However, these features are operationalized in a way that puts them at an unfair disadvantage when compared to the DNNs: word frequency is estimated from the BNC corpus; surprisal is derived from the same corpus and derived using a trigram model. The BNC corpus contains 100 Million words, whereas BERT was trained on several Billions of words. Relatedly, trigram models are now far surpassed by DNN-based language models. Specifically, it is known that such models do not fit human eyetracking reading times as well as modern DNN-based models (e.g., Figure 2 Dundee in: Wilcox et al, On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior, CogSci 2020). This means that the predictive power of the word-related features is likely to be underestimated and that some residual predictive power is contained in the DNNs, which may implicitly compute quantities related to frequency and surprisal, but were trained on more data. In order to establish that the DNN models are predictive over and above word-related features, and to reliably quantify the predictive power gained by this, the authors could draw on (1) frequency estimated from the corpora used for BERT (BookCorpus + Wikipedia), (2) either train a strong DNN language model, or simply estimate surprisal from a strong off-the-shelf model such as GPT-2.

      This concern does not fundamentally cast doubt on the conclusions, since the authors found a clear effect of the task relevance of individual words, which by definition is not contained in those baseline models. However, Figure S3 -- specifically Figure S3C -- is likely to inflate the contribution of the DNN model over and above the text-based features.

      Thank you for pointing out these issues. Following the valuable suggestion of the reviewer, we have now 1) computed word frequencies based on BookCorpus and Wikipedia and 2) calculated word surprisal using GPT-2.

      “The word features included word length, logarithmic word frequency estimated based on the BookCorpus [62] and English Wikipedia using SRILM [68], and word surprisal estimated from GPT-2 Medium [69].”

      The recalculated word frequency and surprisal measures are correlated with the original measures (correlation for word frequency: 0.98; for surprisal: 0.59), and the updated results are also closely aligned with those reported in the previous manuscript.

      Others:

      1) How does the statistical modeling take into account that measures are repeated both within the items (same texts read by different subjects) and within the subjects (some subjects read multiple texts)? I only see the items-level repetition addressed in line 715-721 in comparing between local and global questions, but not elsewhere. The standard approach in the literature on human reading times (e.g. the Wilcox et al paper mentioned above, or ref. 44) is to use mixed-effects regression with appropriate random effects for items and subjects. The same question applies to the calculation of chance accuracy (line 702-709), which is done by shuffling words within a passage. Relatedly, how exactly was cross-validation (line 681) calculated? On the level of subjects, individual words, trials, texts, ...?

      Thank you for raising this issue. In the previous manuscript, the word reading time was averaged across participants. The cross-validation was conducted on the level of texts (i.e., passages). Following the valuable suggestion, we have now separately analyzed each participant and applied the linear mixed effects models.

      “Furthermore, a linear mixed effect model also revealed that more than 85% of the DNN attention heads contribute to the prediction of human reading time when considering text features and question relevance as covariates (Supplementary Results).”

      “Supplementary Methods: To characterize the influences of different factors on human word reading time, we employed linear mixed effects models [5] implemented in the lmerTest package [6] of R. For the baseline model, we treated the type of questions (local vs. global; local = baseline) and all text/task-related features as fixed factors, and considered the interaction between the type of questions and these text/task-related features. We included participants and items (i.e., questions) as random factors, each with associated random intercepts…”

      “Supplementary Results: The baseline mixed model revealed significant fixed effects for question type and all text/task-related features, as well as significant interactions between question type and these text/task-related features (Table S7). Upon involving SAR attention, we observed a statistically significant fixed effect associated with SAR attention. When involving attention weights of randomly initialized BERT, the mixed model revealed that most attention heads exhibited significant fixed effects, suggesting their contributions to the prediction of human word reading time. A broader range of attention heads showed significant fixed effects for both pre-trained and fine-tuned BERT.”

      2) I could not find any statement about code availability (only about data availability). Will the source code and statistical analysis code also be made available?

      We have added the code availability statement.

      “The code is now available at https://github.com/jiajiezou/TOA.”

      3) The theoretical claim, and some basic features of the research, are quite similar to other recent work (Hahn and Keller, Modeling task effects in human reading with neural network-based attention, Cognition, 2023; cited with very little discussion as ref 44), which also considered task-directed reading in a question-answering task and derived task-optimized attention distributions. There are various differences, and the paper under consideration has both weaknesses and strengths when compared to that existing work -- e.g., that paper derived a single attention distribution from task optimization, but the paper under consideration provides more detailed qualitative analysis of the task effects, uses questions requiring more high-level reasoning, and uses more state-of-the-art DNNs.

      The paper would benefit from being more explicit about how the work under review provides a novel angle over Ref 44 (Hahn and Keller, Cognition, 2023).

      Thanks for bringing up this issue. We have now incorporated a more comprehensive discussion that compares the current study with the recent work conducted by Hahn and Keller:

      “When readers read a passage to answer a question that can be answered using a word-matching strategy [45], a recent study has demonstrated that the specific reading goal modulates the word reading time and the effect can be modeled using an RNN model [46]. Here, we focus on questions that cannot be answered using a word-matching strategy (Fig. 1B) and demonstrate that, for these challenging questions, attention is still modulated by the reading goal but the attention modulation cannot be explained by a word-matching model (Fig. S3). Instead, the attention effect is better captured by transformer models than an advanced RNN model, i.e., the SAR (Fig. 3A). Combining the current study and the study by Hahn et al. [46], it is possible that the word reading time during a general-purpose reading task can be explained by a word prediction task, the word reading time during a simple goal-directed reading task that can be solved by word matching can be modeled by an RNN model, while the word reading time during a more complex goal-directed reading task involving inference is better modeled using a transformer model. The current study also further demonstrates that elongated reading time on task-relevant words is caused by counts of rereading and further studies are required to establish whether earlier eye movement measures can be modulated by, e.g., a word matching task.”

      4) In Materials&Methods, line 599-636, specifically when "pretraining" is mentioned (line 632), it should be mentioned what datasets these DNNs were pretrained on.

      We have now mentioned this in the revised manuscript:

      “The pre-training process aimed to learn general statistical regularities in a language based on large corpora, i.e., BooksCorpus [62] and English Wikipedia…”

    1. Author Response

      Reviewer 1 (Public Review)

      Summary: The authors have made a novel and important effort to distinguish and include different sources of active deformations for fitting C. elegans embryo development: cyclic muscle contractions and actomyosin circumferential stresses. The combination and synchronisation of both contributions are, according to the model, responsible for different elongation rates, and can induce bending and torsion deformations, which are a priori not expected from purely contractile forces. The model can be applied to other growth processes in initially cylindrical shapes.

      Strengths: The model allows us to fit and deduce specific growth patterns, frequencies, and locations of contractions that yield the observed axial elongation during the 240 min of the studied process.

      The deformation gradient is decomposed according to muscle and actomyosin activity, which can be distinguished and quantified. An energy-transferring process allows for the retrieval of the necessary permanent deformations that embryo development requires.

      Weaknesses: Despite the completeness of the model, the explanation of the methodology needs to be improved. Parameters and quantities are not always explained in the main text and are introduced on some occasions in an ordered manner. This makes the comprehension and deduction of methodology difficult. There are some minor comments that are listed below. The most important points are:

      How are the authors sure that there is a torsional deformation? Without tracking the muscle fibers, bending with respect to different angles for different Zs may yield a shape similar to the one in Figure 6E. Furthermore, it is unclear why the model yields torsion deformation. If material points of actomyosin rings do not change in reference configuration, no helicoidal growth should be happening.

      Our torsional deformations were obtained computationally, and the results are plotted in Figure 6 according to our formalism. In our approach, the torsional deformation results from the interaction between the vertical muscles and the circumferential actin network: the muscles bend the cylinder and the bending modifies the direction of the actin fibers, as demonstrated in the experiment.

      -The triple decomposition 𝐹 = 𝐹𝑒 ⋅ 𝐺𝑖 ⋅ 𝐺0 seems to complicate the expressions of growth and requires the use of angles alpha and beta due to the initial deformation 𝐺0. Why not use a simpler decomposition 𝐹 = 𝐹𝑒 ⋅ 𝐺, where 𝐺 contains all contributions from actomyosin and muscle contractions in a material frame? This would avoid considering angles alpha and beta.

      𝐺0 represents the active strain during the early elongation stage and 𝐺𝑖 the active strain during the late elongation stage. Such a decomposition, which is not mandatory, allows a better understanding. In addition, in the late elongation stage, both muscle and actin networks must be considered, and their orientation changes with deformation. Therefore, it is clearer and simpler to express the active strain in terms of the angles alpha and beta.

      The section "Energy transformation and Elongation" is unclear. Indeed, stresses need to relax, otherwise, the removal of muscle and actin activity would send the embryo back to its initial state. However, the rationale behind the energy transfer is not explained. Authors seem to impose 𝑊𝑐 = 𝑊𝑟, and from this deduce the necessary actin contraction after muscle relaxation. Why should energy be maintained when muscle relaxes? Which mechanism physically imposes this energy transfer? Muscle contraction could indeed induce elongation if traction forces at the opposite side of the contracting muscle relax. In fact, an alternative approach for obtaining stress relaxation and axial elongation would be converting part of the elastic deformation 𝐹𝑒 to a permanent deformation 𝐹𝑝.

      In this section, we do assume that all the energy accumulated by the muscle contractions will be converted into the energy necessary for elongation, and as our estimate in the article shows, 𝑊𝑐 is indeed greater than 𝑊𝑟, indicating that a significant fraction of 𝑊𝑐 is converted into dissipation and friction, but also into the reorganization of the actin cables. Indeed, elongation of the cylinder induces a significant reduction in the experimentally observed and also in the actin cable density. However, this reduction in cable density is not observed experimentally. Thus, elongation requires a reorganization of the actin network, which is part of the energy consumption and which explains the existence of a permanent deformation 𝐹𝑝.

      Self contact is ignored. This may well be a shape generator and responsible for bending deforma- tions. The convoluted shape of the embryo in the confined space deserves at least commenting on this limitation of the model.

      Thank you for your suggestion. We have considered the effect of contact between C. elegans and the eggshell in the energy dissipation section but we also agree that the self-contact of the worm in confinement will be important. Here, we focus mainly on active filaments: actomyosin and muscle, and we restrict ourselves to a cylindrical shell that is far from the embryo.

      Reviewer 2 (Public Review)

      Summary

      During C. elegans development, embryos undergo elongation of their body axis in the absence of cell proliferation or growth. This process relies in an essential way on periodic contractions of two pairs of muscles that extend along the embryo’s main axis. How contraction can lead to extension along the same direction is unknown.

      To address this question, the authors use a continuum description of a multicomponent elastic solid. The various components are the interior of the animal, the muscles, and the epidermis. The different components form separate compartments and are described as hyperelastic solids with different shear moduli. For simplicity, a cylindrical geometry is adopted. The authors consider first the early elongation phase, which is driven by contraction of the epidermis, and then late elongation, where contraction of the muscles injects elastic energy into the system, which is then released by elongation. The authors get elongation that can be successfully fitted to the elongation dynamics of wild-type worms and two mutant strains.

      Strengths

      The work proposes a physical mechanism underlying a puzzling biological phenomenon. The framework developed by the authors could be used to explain phenomena in other organisms and could be exploited in the design of soft robots.

      Weaknesses

      1) This reviewer considers that the quality of the writing is poor. Because of this the main result of this work, how elongation is achieved by contraction, remains unclear to me. In the opinion of this reviewer, the work is not accessible to a biologist. This is a real pity because the findings are potentially of great interest to developmental biologists and engineers alike.

      We regret that, despite a general introduction and a number of figures, the work does not seem accessible to biologists.

      2) The authors assume that the embryo is elastic throughout all stages of development. Is this assumption appropriate? In my opinion, the authors need to critically discuss this assumption and provide justification. Would this still be true for the adult? If so could the adult relax back to the state prior to elongation? The embryo should be able to do that, if the contractility of the epidermis were sufficiently reduced, right?

      Soft tissues are elastic, and the modeling of soft tissues, even with large deformations, is now well established. The difference between a worm embryo and an adult is first of all the quality of the tissues, their low degree of heterogeneity, the weakness of the muscles and the absence of bones. As for the question of complete relaxation of the stresses, the fact that different components are attached to each other limits complete relaxation. We keep our fingerprints and cortical undulations, although they originate from an elastic instability that occurs in fetal life. It never disappears.

      3) The authors impose strains rather than stress. Since they want to understand the final deformation, I find this surprising. Maybe imposing strain or stress is equivalent, but then you should discuss this.

      Perhaps, the referee has in mind the question of active strain versus active stress and is concerned about the representation of biological forces such as those produced by actomyosin or muscle. In fact, both exist in morphoelasticity and are, of course, related. Usually, the choice is dictated by the simplicity of deriving quantitative results for comparison with experiments.

      4) Does your mechanism need 4 muscle strands or would 2 be sufficient?

      First, the 4 muscle strands are consistent with real C. elegans structures, and second, although we assume that two muscles on the same side contract simultaneously, their size and position affect the deformation results. Also, the time period we consider is just before the worm hatches. After that, the worm has to slide on the ground. So efficient muscles are needed.

      5) It is sometimes hard to understand, whether the authors are talking about the model or the worm.

      It will be corrected in the new version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      The authors thank the reviewers for their thoughtful and constructive comments. We address each comment below and have uploaded a revised manuscript.

      Public Reviews

      1) One key point that could use further clarification is how to interpret densities in the reconstruction that do overlap with the template. If the omitted regions can be reliably reconstructed, and the density is smooth throughout, it implies the detected particles are not only (mostly) true positives but also their poses must be essentially correct. Therefore, why cannot the entire reconstruction be trusted, including portions overlapping with the template? In the "Future applications" section, the authors state that in order to obtain a reconstruction that is entirely devoid of template bias, it would be necessary to successively omit parts of the template structure through its entirety. I wonder if that is really necessary and if the presented approach of omitting template portions could be better framed as a "gold-standard" validation procedure.

      Our assumption is indeed that the entire reconstruction can be trusted if the omitted features are faithfully reproduced in the reconstruction. We have added a sentence in the discussion to clarify this. However, we think that assessing template bias will still require the omit test (see also our reply below). Also, as discussed in the manuscript, there is likely a little bias left, even if it is not directly visible in the reconstruction. Therefore, if the goal is an entirely unbiased reconstruction, the only way will be to successively omit parts of the template structure throughout the template.

      2) In other words, given the compelling evidence provided by the reconstructions in the omitted areas, I find it hard to imagine how the procedure would be "hallucinating" features in the rest of the structure, as the entire reconstruction depends on the same pose and defocus parameters. A possible experiment to test this hypothesis would be to go the opposite way, deliberately adding an unrealistic feature to the bait and checking whether it comes up in the reconstruction, while at the same time checking how it behaves in omitted parts.

      Template bias might be generated in different ways. A common situation is the presence of noise, which causes biased deviations of the best template match from their “true” match that would just align the target signal to the template. Another type of bias may occur when there is a mismatch between the template and the detected target. The target may still be detected if there is sufficient structural overlap with the template. Since there might not be a clear “correct” alignment of a mismatching target to the template, the best alignment may again be biased, generating artificial density in the reconstruction. This second case may produce bias that is more pronounced in the mismatching regions. The different origins of bias will have to be investigated more thoroughly in another study. For the present study, however, we maintain that unless there is some assessment of bias in a given location, one cannot completely rule out bias based on the absence of it elsewhere in the reconstruction.

      3) When assessing their approach to in situ data (the yeast ribosome), it is intriguing to see that the resolution downgraded from 3.1 to 8 Å when refinement of the particle poses against the current reconstruction was attempted. The authors do provide some possible explanations, such as the reduced signal of the reconstruction at high resolution and the crowded background, but it leaves one to wonder if this means that a 3.1 Å reconstruction could never be obtained from these data by conventional single-particle analysis procedures.

      The refinement results with our in situ data do indeed appear to be limited to low resolution when using the conventional single-particle pipeline and software. It might be possible to improve refinement by introducing certain priors, filters and masking functions that are optimized for the increased background and spectral properties of in situ data. Also, we have not tested all available software, and some might perform better than others. It is worth noting that in a different study using our data, by Cheng et al (2023) and cited in our manuscript, the resolution of the refined reconstruction using different software was ~7 Å, i.e., close to what we report here. Finally, refinement of the detected targets against a high-resolution template does work, but since it involves the template, we regard this as part of the template matching process.

      4) Furthermore, in the section "Quantifying template bias", the authors make the intriguing statement that there can still be some overfitting of noise even in true positives. I understand this overfitting would occur in the form of errors in the pose and defocus estimation, but a clarification would be helpful.

      We have added a sentence in the Discussion to clarify where this bias may come from.

      5) In the Discussion, the claim that "it is not necessary to use tomography to generate high-resolution reconstructions of macromolecular complexes in cells" is a misconception, at least in part. As demonstrated in works by the same group and others (https://doi.org/10.1016/j.xinn.2021.100166, https://doi.org/10.1038/s41467-023-36175-y, https://doi.org/10.1038/s41586-023-05831-0), 2D imaging of native cellular environments does offer a faster and better way to obtain high-resolution reconstructions compared to tomography. However, tomography provides the entire 3D context of the macromolecules, such as their localization to membranes and the cellular architecture, which can be readily visualized in a tomogram even at low resolution, so methods for structure determination from tilt series data such as subtomogram averaging remain of paramount importance. Most likely, a combination of 2D and 3D imaging approaches will be necessary to retrieve both the highest structural resolution and their cellular context to address biological questions.

      We agree and have modified our statement accordingly.

      6) The "Materials and Methods" section lacks a description of transmission electron microscopy data collection.

      We are sorry for this oversight and have added these details.

      7) Finally, the preprint version of this work posted on bioRxiv (https://doi.org/10.1101/2023.07.03.547552) contains the following competing interests statement, which is missing from the submitted version: "The authors are listed as inventors on a closely related patent application named "Methods and Systems for Imaging Interactions Between Particles and Fragments", filed on behalf of the University of Massachusetts."

      This is correct. The statement was missing in the first version of the uploaded manuscript and was added after consultation with the eLife editorial office.

      8) Quantification of the amount of model bias is then performed using omit maps, where every 20th residue is removed from the template and corresponding reconstructions are compared (for those residues) with the full-template reconstructions. As expected, model bias increases with lower thresholds for the picking. Some model bias (Omega=8%) remains even for very high thresholds. The authors state this may be due to overfitting of noise when template-matching true particles, instead of introducing false positives. Probably, that still represents some sort of problem. Especially because the authors then go on to show that their expectation of the number of false positives does not always match the correct number of false positives, probably due to inaccuracies in the noise model for more complicated images. This may warrant further in-depth discussion in a revised manuscript.

      We have added further thoughts regarding the mismatch between expected and actual number of false positives in the Discussion section. A full understanding of the issue likely requires further study, which is currently underway.

      9) The authors evaluate the effect of high-resolution 2D template matching on template bias in reconstructions, and provide a quantitative metric for overfitting. It is an interesting manuscript that made me reevaluate and correct some mistakes in my understanding of overfitting and template bias, and I'm sure it will be of great use to others in the field. However, its main point is to promote high-resolution 2D template matching (2DTM) as a more universal analysis method for in vitro and, more importantly, in situ data. While the experiments performed to that end are sound and well-executed in principle, I fail to make that specific conclusion from their results.

      We do not see 2DTM as a more universal analysis method for in vitro and in situ data, but simply as another method that can be used. We have added a sentence in the introduction to clarify this.

      10) The authors correctly point out that overfitting is largely enabled by the presence of false-positives in the data set. They go on to perform their in situ experiments with ribosomes, which provide an extremely favorable amount of signal that is unrealistic for the vast majority of the proteome. This seems cherry-picked to keep the number of false-positives and false-negatives low. The relationship between overfitting/false-positive rate and the picking threshold will remain the same for smaller proteins (which is a very useful piece of knowledge from this study). However, the false-negative rate will increase a lot compared to ribosomes if the same high picking threshold is maintained. This will limit the applicability of 2DTM, especially for less-abundant proteins.

      The reviewer is correct that the lower SNR of smaller targets poses a fundamental limit to 2DTM. We have stated this in previous studies and have added a sentence in the introduction of the current manuscript to clarify this.

      11) I would like to see an ablation study: Take significantly smaller segments of the ribosome (for which the authors already have particle positions from full-template matching, which are reasonably close to the ground-truth), e.g. 50 kDa, 100 kDa, 200 kDa etc., and calculate the false-negative rate for the same picking threshold. If the resulting number of particles does plummet, it would be very helpful to discuss how that affects the utility of 2DTM for non-ribosomes in situ.

      The suggested ablation study is a good idea and was reported by Rickgauer et al (2020), cited in our manuscript. We added our own analysis for this dataset in Figure 4-figure supplement 1 and show the proportion of LSUs detected as a function of template mass, indicating a detection limit of ~300 kDa. We also added a note in the Results section to explain that the threshold we use to limit false positives means that there are also false negatives, with a rate that depends on their molecular mass.

      12) Another point of concern is the dramatic resolution decrease to 8 Å after multiple iterations of refinement against experimental reconstructions described in line 159. Was this a local search from the poses provided by 2DTM, or something more global? While this is not a manifestation of overfitting as the authors have conclusively shown, I think it adds an important point to the ongoing "But do we really need tomograms, or can we just 2D everything?" debate in the field, which is also central to the 2D part of 2DTM. Reaching 8 Å with 12k ribosome particles would be considered a rather poor subtomogram averaging result these days. Being in the "we need tilt series to be less affected by non-Gaussian noise" camp myself, I wonder if this indicates 2D images are inherently worse for in situ samples. If they are, the same limitations would extend to template matching. In that case, shouldn't the authors advocate for 3DTM instead of 2DTM? It may not be needed for ribosomes, but could give smaller proteins the necessary edge.

      We have extensively discussed the advantages and disadvantages of both tomography and 2DTM (Lucas et al, 2021) and think it is not useful to talk in terms of “better” and “worse”. Instead, each technique has its areas of application, and we maintain that a combination of the two may give the best results. The limitation of 8 Å does not apply to reconstructions aligned against high-resolution templates, as demonstrated in the present study. Regarding noise models, there is also need for these in 3DTM, as explained in recent publications: Maurer et al (2023), bioRxiv, doi.org/10.1101/2023.09.06.556487; Cruz-León et al (2023), bioRxiv, doi.org/10.1101/2023.09.05.556310; Chaillet et al (2023), Int. J. Mol. Sci. 24, 13375.

      13) Right now, this study is also an invitation to practitioners who do not understand the picking threshold used here and cannot relate it to other template-matching programs to do a lot of questionable template matching and claim that the results are true because templates are "unoverfittable". I think such undesirable consequences should be discussed prominently.

      We have added a discussion of this point in the Discussion section.

      Recommendations for the authors

      1) Lines 58-59: What does "nominally untilted" mean? Has the lamella pre-tilt (milling angle) been taken into account or not? If yes, how?

      The lamella milling angle was not taken into account, so there is a tilt built into the sample of about 8° that was not compensated for by a counter-tilt of the microscope goniometer. We have added a note to explain this in the text of the manuscript.

      2) Lines 113-114: A brief explanation of the threshold calculation method from Rickgauer et al, 2017 to achieve an expected false positive rate of one per micrograph would be helpful here.

      We describe the equation for estimating the false discovery rate later in the manuscript. We have added a note in the text to point the reader to the relevant section of the manuscript.
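      As a rough illustration of how such a threshold can be derived, the sketch below assumes that the normalized search scores of pure noise follow a standard normal distribution and are independent across search locations. Under that assumption, the threshold is the score at which the Gaussian tail probability, multiplied by the number of locations searched, equals the desired number of false positives. The function name and the example search size are hypothetical and are not taken from the manuscript or from cisTEM.

```python
from statistics import NormalDist


def detection_threshold(n_search_locations, expected_false_positives=1.0):
    """Return the score threshold t such that
    n_search_locations * P(Z > t) == expected_false_positives,
    where Z is a standard normal variable (the assumed noise model)."""
    tail_probability = expected_false_positives / n_search_locations
    if not 0.0 < tail_probability < 1.0:
        raise ValueError("expected_false_positives must be below the search size")
    # P(Z > t) = p  is equivalent to  t = -Phi^{-1}(p), by symmetry of the normal.
    return -NormalDist().inv_cdf(tail_probability)


# Hypothetical search size: pixels x orientations x defocus planes.
n_locations = 1e6
threshold = detection_threshold(n_locations)
print(f"threshold for one expected false positive: {threshold:.3f}")
```

      Because the threshold depends only on the total number of locations searched, a finer angular or defocus search raises it, which is one reason detection becomes harder for targets with low signal.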

      3) For consistency, it would be interesting to include a plot of the SNR peaks found by 2DTM in the in situ dataset, that could be directly compared to Figure 1 - figure supplement 1B.

      We have added this to Figure 2 - figure supplement 1A-C, to directly compare to Figure 1 – figure supplement 1A-C.

      4) Showing model-map FSC curves between the density retrieved from the omitted areas and their respective models would provide further evidence not only that they are correct but to what extent.

      An FSC calculation would be challenging for small regions, such as side chains and drugs, due to masking artifacts. Moreover, the model was built into an in vitro determined map and was not fit into the in vivo map calculated here. Therefore, deviations between the map and model may reflect differences between the two conditions and may not reflect the agreement of the map to the in vivo structure.

      5) Lines 128-130: The figure references are wrong. Here, Figure 1B should probably be Figure 1A (or 1B), and Figure 1C clearly refers to Supplementary Figure 1F (FSC curve).

      We have corrected the incorrect figure references.

      6) Line 125: Wrong figure reference, Figure 1A here refers to Supplementary Figure 1B (cross-correlation peaks).

      We have corrected the incorrect figure references.

      7) I haven't been able to find mention of code availability in the manuscript. Given that it is a major outcome of the study, I think it should be provided.

      The code is available from the cisTEM repository, github.com/timothygrant80/cisTEM, and an executable version of the program measure_template_bias has been posted for download on the cisTEM webpage, cistem.org. We have added a note in the Methods section to point the readers to these resources.

      8) Line 50: "An additional complication of subtomogram averaging for in situ imaging is the selection of valid targets" - This is not specific to subtomogram averaging, but to in situ samples.

      We agree and have updated the text to reflect this.

      9) Line 77: "if this is true for high-resolution features, which are more susceptible to noise overfitting" - This is not intuitive to me. High-resolution features require more information to be overfitted with a constant set of model parameters, thus making their overfitting harder.

      The reviewer is correct that there is more information at high resolution, partially compensating for the low SNR. However, the overall refinement behavior is still dominated by overfitting at high resolution, as we demonstrated in an earlier publication: Stewart & Grigorieff (2004), Ultramicroscopy 102, 67–84.

      10) Line 316: "Baited reconstruction is substantially faster and a more streamlined" - To back this and other similar statements, it would be helpful if the authors provided some time measurements for the execution of their potentially very computationally expensive search.

      The current implementation of 2DTM requires 45 GPU hours per template per K3 image to search 13 defocus planes. For a fair comparison, however, the manual work for annotation, as well as the additional processing to align and classify sub-tomograms to generate high-resolution averages, should also be considered. These are highly project-dependent and can exceed the time required for 3DTM manifold. We have clarified this in our Discussion section.

      11) Line 319: "We expect focused classification to identify sub-populations to further improve the resolution" - How would this work if refining the 2D data without a high-resolution template resulted in significantly worse resolution even for a ribosome? Or is this meant to be done with prior knowledge of every state?

      Classification can be done using existing single particle software. To avoid alignment errors, as described above, particle alignment angles and shifts are fixed during classification. This leaves only the particle occupancy per class to be refined, which appears to lead to good classification. We have added a brief note to explain this strategy. However, since this is not shown in this manuscript, we have not added a more extensive discussion of particle classification.

      12) Line 354: "without requiring manual intervention or expert knowledge" - Previous expert knowledge was arguably provided in the form of a high-resolution structure.

      We agree with the reviewer and have clarified our statement.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      Summary:

      Huang and colleagues present a method for approximation of linkage disequilibrium (LD) matrices. The problem of computing LD matrices is the problem of computing a correlation matrix. In the cases considered by the authors, the number of rows (n), corresponding to individuals, is small compared to the number of columns (m), corresponding to the number of variants. Computing the correlation matrix has cubic time complexity, which is prohibitive for large samples. The authors approach this using three main strategies:

      1. they compute a coarsened approximation of the LD matrix by dividing the genome into variant-wise blocks which statistics are effectively averaged over;

      2. they use a trick to get the coarsened LD matrix from a coarsened genomic relatedness matrix (GRM), which, with time complexity, is faster when n << m;

      3. they use the Mailman algorithm to improve the speed of basic linear algebra operations by a factor of log(max(m,n)). The authors apply this approach to several datasets.
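      The identity behind the second strategy can be checked numerically. For a standardized genotype matrix X (n individuals by m variants), the sum of squared entries of XᵀX equals the sum of squared entries of XXᵀ, so the mean squared correlation over all variant pairs can be obtained from the n-by-n relatedness matrix instead of the m-by-m LD matrix. The sketch below is a plain-Python illustration under that assumption; the function names are hypothetical, the data are simulated, and it is not the X-LD implementation (it omits the Mailman speed-up and any correction for sampling error).

```python
import random
from statistics import fmean, pstdev


def standardize(genotypes):
    """Scale each variant (column) to mean 0 and population variance 1."""
    columns = []
    for col in zip(*genotypes):
        mu, sd = fmean(col), pstdev(col)
        if sd > 0:  # drop monomorphic variants, which have no defined correlation
            columns.append([(g - mu) / sd for g in col])
    return [list(row) for row in zip(*columns)]  # back to one row per individual


def mean_ld_direct(x):
    """Mean r^2 over all m*m variant pairs, computed from the m-by-m LD matrix."""
    n, m = len(x), len(x[0])
    cols = list(zip(*x))
    total = 0.0
    for a in cols:
        for b in cols:
            r = sum(u * v for u, v in zip(a, b)) / n
            total += r * r
    return total / (m * m)


def mean_ld_via_grm(x):
    """Same quantity computed from the n-by-n relatedness matrix K = X X^T / m."""
    n, m = len(x), len(x[0])
    total = 0.0
    for a in x:
        for b in x:
            k = sum(u * v for u, v in zip(a, b)) / m
            total += k * k
    return total / (n * n)


# Simulated genotypes (0, 1 or 2 copies of an allele); sizes are arbitrary.
rng = random.Random(0)
n_individuals, n_variants = 20, 60
freqs = [rng.uniform(0.1, 0.9) for _ in range(n_variants)]
genotypes = [[sum(rng.random() < f for _ in range(2)) for f in freqs]
             for _ in range(n_individuals)]

x = standardize(genotypes)
direct = mean_ld_direct(x)
via_grm = mean_ld_via_grm(x)
print(direct, via_grm)
```

      The direct route loops over m² variant pairs and the relatedness route over n² pairs of individuals, which is where the saving comes from when n << m.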

      Strengths:

      The authors demonstrate that their proposed method performs in line with theoretical explanations.

      The coarsened LD matrix is useful for describing global patterns of LD, which do not necessarily require variant-level resolution.

      They provide an open-source implementation of their software.

      Weaknesses:

      The coarsened LD matrix is of limited utility outside of analyzing macroscale LD characteristics. The method still essentially has cubic complexity--albeit the factors are smaller and Mailman reduces this appreciably. It would be interesting if the authors were able to apply randomized or iterative approaches to achieve more fundamental gains. The algorithm remains slow when n is large and/or the grid resolution is increased.

      Thanks for your positive and accurate evaluation! We acknowledge the weakness and include some sentences in Discussion.

      “The obvious weakness of the proposed method is that the algorithm remains slow when the sample size is large or the grid resolution is increased. With the availability of data such as the UK Biobank (Bycroft et al., 2018), the proposed method may not be adequate, and more advanced methods, such as a randomized implementation of the proposed method, are needed.”

      Reviewer #2 (Public Review)

      Summary:

      In this paper, the authors point out that the standard approach of estimating LD is inefficient for datasets with large numbers of SNPs, with a computational cost of , where n is the number of individuals and m is the number of SNPs. Using the known relationship between the LD matrix and the genomic-relatedness matrix, they can calculate the mean level of LD within the genome or across genomic segments with a computational cost of . Since in most datasets, n<<m, this can lead to major computational improvements. They have produced software written in C++ to implement this algorithm, which they call X-LD. Using the output of their method, they estimate the LD decay and the mean extended LD for various subpopulations from the 1000 Genomes Project data.

      Strengths:

      Generally, for computational papers like this, the proof is in the pudding, and the authors appear to have been successful at their aim of producing an efficient computational tool. The most compelling evidence of this in the paper is Figure 2 and Supplementary Figure S2. In Figure 2, they report how well their X-LD estimates of LD compare to estimates based on the standard approach using PLINK. They appear to have very good agreement. In Figure S2, they report the computational runtime of X-LD vs PLINK, and as expected X-LD is faster than PLINK as long as it is evaluating LD for more than 8000 SNPs.

      Weakness:

      While the X-LD software appears to work well, I had a hard time following the manuscript enough to make a very good assessment of the work. This is partly because many parameters used are not defined clearly or at all in some cases. My best effort to intuit what the parameters meant often led me to find what appeared to be errors in their derivation. As a result, I am left worrying if the performance of X-LD is due to errors cancelling out in the particular setting they consider, making it potentially prone to errors when taken to different contexts.

      Thanks for your critical reading and evaluation. We apologize for the typos, which have been corrected, and the parameters are now clearly defined (see Eq 1 and Table 1). In addition, we include more detailed mathematical steps, which explain how the LD decay regression is constructed and how it should be interpreted (see the detailed derivation steps between Eq 3 and Eq 4).

      Impact:

      I feel like there is value in the work that has been done here if there were more clarity in the writing. Currently, LD calculations are a costly step in tools like LD score regression and Bayesian prediction algorithms, so a more efficient way to conduct these calculations would be useful broadly. However, given the difficulty I had following the manuscript, I was not able to assess when the authors’ approach would be appropriate for an extension such as that.

      See our replies below in responding to your more detailed questions.

      Reviewer #1 (Recommendations For The Authors)

      There are numerous linguistic errors throughout, making it challenging to read.

      It is unclear how the intercepts were chosen in Figure S2. Since theory only gives you the slopes, it seems like it would make more sense to choose the intercept such that it aligns with the empirical results in some way.

      Thank you for your critical evaluation. We apologize for the typos; we have read the manuscript through and clarified the text as much as possible. In addition, we have included Table 1, which introduces the mathematical symbols used in the paper.

      In Figure S2, the two algorithms being compared have different software implementations, PLINK vs X-LD. Their real performance depended not only on the time complexity of the algorithms (right-side y-axis), but also on how the software was coded. PLINK is known for its excellent programming. If we could have programmed as well as Chris Chang, the performance of X-LD would have been even better and would have approached the ratio m/n. However, even with less skilled programming, X-LD outperformed PLINK.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for the chance to review your manuscript. It looks like compelling work that could be improved by greater detail. Providing the level of detail necessary may require creating a Supplementary Note that does a lot of hand-holding for readers like me who are mathematically literate but who don’t have the background that you do. Then you can refer readers to the Supplement if they can’t follow your work.

      We have fixed the problems and style issues as far as we can.

      Regarding the weakness section in the public review, here are a few examples of where I got confused, though this list is not exhaustive.

      1) Consider Equation 1 (line 100), which I believe must be incorrect. Imagine that g consists of two SNPs on different chromosomes with correlation rho. Then ell_g (which is defined as the average squared elements of the correlation matrix) would be

      ell_g = 1/4 (1 + 1 + rho^2 + rho^2) = (1+rho^2)/2.

      But ell_1=1 and ell_2=1 and ell_12=rho^2 (The average squared elements of the chromosome-specific correlation matrices and the cross-chromosome correlation matrix, respectively). So

      sum(ell_i)+sum(ell_ij) = 1 + 1 + rho^2 + rho^2 = (1+rho^2)*2.

      I believe your formulas would hold if you defined your LD values as the sum of squared correlations instead of the mean, but then I don’t know if the math in the subsequent sections holds. I think this problem also holds for Eq 2 and therefore makes Eqs 3 and 4 difficult to interpret.
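
      The reviewer's two-SNP counterexample can be reproduced in a few lines. This is a minimal sketch with an arbitrarily chosen rho = 0.5; the variable names are illustrative and do not come from the manuscript. It confirms that the decomposition fails when LD is defined as a mean of squared correlations and holds when it is defined as a sum.

```python
import numpy as np

rho = 0.5  # arbitrary illustrative correlation between the two SNPs
R = np.array([[1.0, rho],
              [rho, 1.0]])
R2 = R ** 2  # squared correlations

# Block-level quantities: each "chromosome" holds a single SNP.
ell_1 = R2[0, 0]   # within chromosome 1
ell_2 = R2[1, 1]   # within chromosome 2
ell_12 = R2[0, 1]  # between chromosomes 1 and 2

ell_g_mean = R2.mean()  # genome-wide LD defined as a mean: (1 + rho^2) / 2
ell_g_sum = R2.sum()    # genome-wide LD defined as a sum: 2 * (1 + rho^2)

# Sum of the block values over i and over ordered pairs (1,2) and (2,1).
blocks_total = ell_1 + ell_2 + 2 * ell_12

print(ell_g_mean, ell_g_sum, blocks_total)
```

      Under the mean-based definition the block values would need to be weighted by their share of the matrix entries for the identity to hold.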

      Thank you for your attentive review and invaluable suggestions. We acknowledge the typo in calculating the mean in Eq 1, which made the equations difficult to understand. We sincerely apologize for this oversight. To address this issue and ensure clarity in the interpretation of Eq 3 and Eq 4, we have provided more detailed explanations (see the derivation between Eq 3 and Eq 4).

      2) I didn’t know what the parameters are in Equation 3. The vector ell needs to be defined. Is it the vector of ell_i for each chromosomal segment i? I’m also confused by the definition of m_i, which is defined on line 113 as the “SNP number of the i-th chromosome.” Do the authors mean the number of SNPs on the i-th chromosomal segment? If so, it wasn’t clear to me how Eq 2 and Eq 3 imply Eq 4. Further, it wasn’t clear to me why E(b1) quantifies the average LD decay of the genome. I’m used to seeing plots of average LD as a function of distance between SNPs to calculate this, though I’m admittedly not a population geneticist, so maybe this is standard. Standard or not, readers deserve to have their hands held a bit more through this either in the text or in a Supplementary Note.

      Thank you for your insightful feedback. When we were writing this paper, our actual focus was Eq 3 and establishing the relationship between chromosomal LD and the reciprocal of the chromosome length (Fig 6A), for which the number of SNPs served as a surrogate; that is, the correlation between ell_i and 1/m_i.

      We asked friends who are population geneticists, and they anticipated the correlation between chromosomal LD (ell) and 1/m. The rationale is simple if one knows the basics of population genetics. A long chromosome experiences more recombination, which weakens LD for a pair of loci. In particular, for a pair of loci, D_t=D_0 (1-c)^t, where D_t is the LD at generation t, D_0 the LD at generation 0, and c the recombination fraction. As recombination hotspots are nearly evenly distributed along the genome, as reported in Science 2019;363:eaau8861, the chromosome will be broken into the shape shown in Author response image 1 (Fig 1C, newly added). Along the diagonal one sees tight LD blocks, which will vanish over time as predicted by the D_t equation, and loci far away from each other will not be in LD unless LD is raised by other factors such as population structure. Ideally, we assume diagonal blocks of average size m×m, within which the average LD of a SNP with other SNPs inside the block (red) is l_u; in contrast, the off-diagonal average LD (light red) is l_uv. This logic is hidden but employed in methods such as LD score regression and PRS refinement using LD structure.
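
      The block picture can be turned into a small numerical check. The sketch below is an illustration under simplifying assumptions, not the authors' derivation: every entry inside a diagonal block is set to l_u (diagonal included), every entry outside to l_uv, the block size divides the chromosome size, and all function names are hypothetical. Under those assumptions the mean LD of a chromosome with m_i SNPs works out to l_uv + block * (l_u - l_uv) / m_i, which is linear in 1/m_i.

```python
import numpy as np


def ld_after(d0, c, t):
    """LD between two loci after t generations: D_t = D_0 * (1 - c)^t."""
    return d0 * (1.0 - c) ** t


def block_mean_ld(m_i, block, l_u, l_uv):
    """Average over all m_i*m_i entries of a block-structured LD matrix."""
    assert m_i % block == 0, "sketch assumes the block size divides m_i"
    ld = np.full((m_i, m_i), l_uv, dtype=float)  # background (off-block) LD
    for start in range(0, m_i, block):
        ld[start:start + block, start:start + block] = l_u  # tight LD block
    return ld.mean()


def block_mean_ld_closed_form(m_i, block, l_u, l_uv):
    """Closed form of the same average: intercept l_uv, slope in 1/m_i."""
    return l_uv + block * (l_u - l_uv) / m_i


# Illustrative values only; none of these numbers come from the manuscript.
sizes = [200, 400, 800]
block, l_u, l_uv = 20, 0.3, 0.001
simulated = [block_mean_ld(s, block, l_u, l_uv) for s in sizes]
predicted = [block_mean_ld_closed_form(s, block, l_u, l_uv) for s in sizes]
print(simulated)
print(predicted)
```

      In this toy model, longer chromosomes (larger m_i) have lower mean LD, and the intercept of a regression on 1/m_i recovers the background LD l_uv.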

      Author response image 1.

      But how does one estimate chromosomal LD (ell)? This is overwhelming, as our friends said! So Figure 6A is logically anticipated by a seasoned population geneticist, but has never been realized because the computation is a nightmare. Often, such signature patterns should have been employed as showcases when releasing new reference data, such as HapMap. However, to our knowledge, this signature linear relationship has never been illustrated in those reference data.

      If you further ask a population geneticist whether any chromosome will deviate from this line (Fig 6A), the answer will most likely be chromosome 6, because of the tight LD in the HLA region. However, it is chromosome 11, because of its most completely sequenced centromere. Chr 11 is a surprise! With a T2T-sequenced population, Chr 11 will not deviate much, we predict.

      However, as we doubted whether people would appreciate this point, we shifted our focus to the efficient computation of LD, which is more likely to be understood. We acknowledge the lack of clarity in notation definitions and the absence of the derivation for the interpretation of b1 and b0 for LD decay regression. So, we have added a table to explain the notation (see Table 1) and provided additional derivations, which explain how the LD decay regression was derived (see the derivation between Eq 3 and Eq 4). Figure 1C illustrates the underlying assumption about LD.

      The technique to bridge Eq 2~3 to Eq 4 is called “building interpretation”. It was once one of the core tasks of population genetics or statistical genetics, and a classical example is Haseman-Elston regression (Behavior Genetics, 1972, 2:3-19). As the field moves towards a data-driven style, the culture becomes “shut up, calculate”. Finding the interpretation of a regression is a vanishing craftsmanship, and people often end up with unclear results!

      3) In line 135, it’s not clear to me what is meant by . If it is , then wouldn’t the resulting matrix be a matrix of zeros since is zero everywhere except the lower off-diagonal? So maybe it is ? But then later in that line, you say that the square of this matrix is the sum of several terms of the form . Are these the scalar elements of the G matrix? But then the sum is a scalar, which can’t be true since is a matrix.

      Thank you for your attentive review. We indeed confused the definition of matrices and their elements, and should refer to the stacked off-diagonal elements of matrix . So, is a vector for variable – the relationship between sample i and j. We assume the reviewer uses R software; then corresponds to mean .

      See the text between Eq 5 and Eq 6.

      “We extract two vectors , which stacks the off-diagonal elements of , and , which takes the diagonal elements of .”

      In addition, , so the ground truth is that , but not zero.

      To clarify these math symbols, we replace G with K, so as to be consistent with our other works (see Table 1).

      To derive the means and the sampling variances for and , Eq 7 can be established by some modifications of the Delta method, as exemplified in Appendix I of Lynch and Walsh’s book (Lynch and Walsh, 1998). We added this sentence near Eq 7 in the main text.

    1. Author Response

      Reviewer #1:

      We thank Reviewer #1 for their review of our manuscript.

      Reviewer #1, comment #1: “The authors of this manuscript are from the Canadian, public interest open-science company YCharos.”.

      It is important to state that none of the authors work for YCharOS. The YCharOS company has created an open ecosystem consisting of antibody manufacturers, knockout cell lines providers, academics, granting agencies and publishers. The Antibody Characterization Group (participating authors are affiliated to the Department of Neurology and Neurosurgery, Structural Genomics Consortium, The Montreal Neurological Institute, McGill University) works in collaboration with YCharOS to have access to commercial antibodies and knockout cell lines donated by YCharOS’ manufacturer partners.

      Reviewer #1, comment #2: In regard to ZENODO antibody characterization reports prepared by this group, Reviewer #1 wrote: “While the results are convincing, they could be more accessible. In the current format, researchers have to download reports for each target and look through all images to identify the most useful antibodies from the images. The reports I reviewed did not draw conclusions on performance. A searchable database that returns validated antibodies for each application seems necessary.”

      After careful consideration and consultation with YCharOS industry partners, we decided not to rate the performance of the antibodies tested. It was determined that antibody selection is best left to the user, who should analyze all parameters, including the type of antibody to be chosen (recombinant-monoclonal, recombinant-polyclonal, monoclonal), the species used to generate the antibody, the species predicted to react with the antibody, performance in a specific application, antigen sequences, and antibody cost.

      Reviewer #1, comment #3: “A key question is to what extent off-target binding was predictable from the WBs provided by the manufacturers. Thus, how often did the authors find multiple bands when the catalogue image showed a single band and vice versa?”

      In many cases, the antibodies were tested on cell lines other than those used by the manufacturers. Given that protein expression is specific to each line, we can't answer this question properly.

      Reviewer #1, comment #4: “Cross-reactive proteins will generally not be detected when blots are stained with an antibody reactive with a different epitope than the one used for IP. Possible solutions to overcome this limitation such as the use of mass spectrometry as readout should be discussed (Nature Methods volume 12, pages 725-731 (2015))”.

      Our protocols only inform whether an antibody can capture the intended target, without any evaluation of the extent to which it captures unwanted, cross-reactive proteins. Thus, our data can only be used to aid in selection of the best performing antibodies for IP; our data do not inform profiling of non-specific interactions.

      IP/mass spec is an excellent approach for evaluating antibody performance for IP, and authors on this manuscript are experts in proteomics and recognize the importance of this methodology. We have considered implementing IP/mass spec in our platform. However, there are limitations, such as the cost of the approach and the difficulty of detecting smaller proteins or proteins with a certain amino acid composition (high presence of Cys, Arg or Lys). Fundamentally, we have decided to prioritize throughput over detail in this regard.

      Reviewer #1, comment #5: “Performance in immunofluorescence microscopy was performed on cells that were fixed in 4% paraformaldehyde and then permeabilized with 0.1% Triton-X100. It seems reasonable to assume that this treatment mainly yields folded proteins wherein some epitopes are masked due to cross-linking. The expectation is therefore that results from IP are more predictive for on-target binding in IF than are WB results (Nature Methods volume 12, pages 725-731 (2015)). It is therefore surprising that IP and WB were found to have similar predictive value for performance in IF (supplemental Fig. 3). It would be useful to know if failure in IF was defined as lack of signal, lack of specificity (i.e. off-target binding) or both. Again, it is important to note the IP/western protocol used here does not test for specificity.”

      The assessment of antibody performance is biased by how antibodies were originally tested by suppliers. Manufacturers primarily validate their antibodies by WB, so most antibodies detect their intended target in WB. In retrospect, we therefore tested a pool of antibodies biased towards those that detect linear epitopes. Still, we observed that a large cohort of antibodies show specificity for their target across all three applications or for specific combinations of applications. This slightly challenges the idea that antibodies are fit-for-purpose reagents that recognize either linear or native epitopes: a significant number of antibodies can specifically detect both types of epitope.

      Reviewer #1, comment #6: “The authors report that recombinant antibodies perform better than standard monoclonals/mAbs or polyclonal antibodies. Again, a key question is to what extent this was predictable from the validation data provided by the manufacturers. It seems possible that the recombinant antibodies submitted by the manufacturers had undergone more extensive validation than standard mAbs and polyclonals”.

      Our antibody manufacturing partners indicated that the recombinant antibodies are more recent products and have been more extensively characterized relative to standard polyclonal or monoclonal antibodies.

      The main message is that recombinant antibodies can be used in all applications once validated. Although recombinant antibodies are available for many proteins, the scientific community is not adopting these renewable reagents as we believe it should. We hope that the data provided will encourage scientists to adopt recombinant technologies when available to improve research reproducibility.

      Reviewer #1, comment #7: “Overall, the manuscript describes a landmark effort for systematic validation of research antibodies. The results are of great importance for the very large number of researchers who use antibodies in their research. The main limitations are the high cost and low throughput. While thorough testing of 614 antibodies is impressive and important, the feasibility of testing hundreds of thousands of antibodies on the market should be discussed in more detail.”

      We thank the reviewer for this comment. One of our challenges is to increase the platform's throughput to succeed in our mission to characterize antibodies for all human gene products. We will continue to test antibodies using protocols agreed upon with our partners, commonly used in the laboratory, to ensure that ZENODO reports can serve as a guide to the wider community.

      In terms of development, our marketing efforts have been substantially accelerated by our new partnership with the journal F1000. We have begun to convert our reports into peer-reviewed papers (20 ZENODO reports were converted into F1000 articles). This conversion allows researchers to find our work via PubMed, and easily cite any study. Producing peer-reviewed articles also further enhances the credibility of our research and our project as a whole: https://f1000research.com/ycharos

      Colleagues have published a letter to Nature explaining the problem and our technology platform: (Kahn, et al., Nature, 2023, DOI: https://doi.org/10.1038/d41586-023-02566-w).

      This project has been presented worldwide, with a presence at major antibody conferences, such as the annual Antibody Validation meeting in Bath (PSM attended the meeting in September 2023). The authors are organizing a sponsored mini-symposium on antibody validation at the next American Society for Cell Biology (ASCB) meeting in December 2023 (Boston, USA): https://plan.core-apps.com/ascbembo2023/event/6fb928f06b0d672e088c6fa88e4d77fb

      Colleagues have prepared petitions addressed to various governmental organizations (US, Canada, UK) to support characterization and validation of renewable antibodies: https://www.thesgc.org/news/support-characterization-and-validation-renewable-antibodies.

      Reviewer #2

      We thank Reviewer #2 for the review of the antibody characterization reports we have uploaded to ZENODO. A manuscript describing the full standard operating procedures of the platform, which has been used in all reports, is in preparation and should be available on a preprint server before the end of the year. Our protocols were reviewed and approved by each of YCharOS' manufacturer partners. Moreover, a recent editorial describes the platform used here and gives advice on how to interpret the data (https://doi.org/10.12688/f1000research.141719.1).

      Reviewer #2, comment #1: “A discussion of how the working concentrations of antibodies are selected and validated is required. Based on the dilutions described in the reports, it seems that dilutions suggested by the manufacturer were used - For LRRK2 it seems that antibody concentrations ranging from 0.06 to over 5 µg/ml for WB were used. Often commercial antibody comes in a BSA-containing buffer making it hard to validate the concentration of the antibody claimed by the manufacturer”.

      The concentration recommended by the manufacturer is our starting point. For WB, when the signal is at the limit of detectability, we repeat the experiment with a ~5-10 fold increase in antibody concentration. For >80% of the antibodies tested, the use of the recommended concentration led to the detection of bands (specific or not to the target protein).

      Reviewer #2, comment #2: “In the authors' experience are the manufacturer's concentrations reliable? Additionally, if the information regarding applications provided by the manufacturers is unreliable how do the authors suggest working concentrations for antibodies to be assessed”?

      We do not evaluate the concentration of antibodies internally. In the immunoprecipitation experiments, we use 2.0 µg of antibody for each IP, based on the concentration provided by the manufacturers. On Ponceau staining of membranes, we can observe the heavy and light chains of the primary antibodies used, giving an indication of the amount of antibodies added to the cell lysate. In most cases, the intensity of the heavy and light chains is comparable.

      Reviewer #2, comment #3: “We understand that it would not be feasible to test every antibody at different concentrations, but this is an issue that should at least be mentioned. An antibody might be put in the wrong performance category solely because of the wrong concentration being used. Ie if an excellent antibody is used at too high a concentration, it may detect non-specific proteins that are not seen at lower dilutions where the antibody still picks up the desired antigen well”.

      We agree with Reviewer #2: we do not use an optimal concentration for all tested antibodies. As mentioned previously, the concentration recommended by the manufacturer is our starting point. By testing multiple antibodies side-by-side against a single target protein, we can generally identify one or more specific and selective antibodies. We leave it to users of our reports to optimize the antibody concentration to suit their experimental needs.

      Reviewer #2, comment #4: “Do the authors check different WB conditions ie 2h primary antibody with BSA or milk vs. overnight at 4 degrees with BSA or Milk”?

      All primary antibodies are always tested in milk overnight at 4 degrees. The overnight incubation is convenient in the timeline of the protocol. All protocols were agreed upon after careful consultation with our partners.

      Reviewer #2, comment #5: “Do the authors provide detailed WB protocols that include the description of the electrophoresis and type of gels used, transfer buffer and transfer method and time used, and conditions for all the primary and secondary blotting including times, buffers and dilutions of all antibodies and other reagents”?

      This information is included in all ZENODO reports.

      Reviewer #2, comment #6: “Do the authors discuss detection approaches- we have noticed for some antibodies there are significant different results using LICOR, ECL and other detection methods, with certain especially weaker antibodies preferring ECL-based methods”.

      We only use ECL-based methods.

      Reviewer #2, comment #7: “For IPs the amount of antibody needed can also vary-for some we can use 1 microgram or less, but for others, we need 5 to 10 micrograms. The amount of antibody needed to get maximal IP should be stated”.

      We use 2.0 µg of antibody and have found this to be adequate for lower abundance proteins (e.g. Parkin - https://zenodo.org/records/5747356) and higher abundance proteins (e.g. PRDX6 - https://zenodo.org/records/4730953). Abundance is based on PaxDb.com. For Parkin and PRDX6, we were able to enrich the expected target in the IP and observe depletion in the unbound fraction. Optimization of the IP conditions is left to the antibody users.

      Reviewer #2, comment #8: “Doing IPs with commercial antibodies can be very expensive or infeasible if many micrograms are needed especially if only packages of 10 micrograms for several hundred dollars are provided”.

      This is a major advantage of the side-by-side comparison: the reader is free to choose between high-performance antibodies from different manufacturers, with varying antibody costs. We also work in partnership with the Developmental Studies Hybridoma Bank (DSHB), which supplies antibodies on a cost recovery basis.

      Reviewer #2, comment #9: “For IPs it is important to determine the percentage of antigen that is depleted from the supernatant for each IP. We think that this should be calculated and recorded in the Zenodo data. Some antibodies will only IP 10% of antigen whereas others may do 50% and others 80-90%. One rarely sees 100% depletion. For IPs the buffer detergent and salt concentration might also strongly influence the degree of IP and therefore these should be clearly stated”.

      In Box 1, we define criteria of success. For IP, “under the conditions used, a successful primary antibody immunocaptures the target protein to at least 10% of the starting material”. Colleagues have written an editorial on how to interpret and analyze antibody performance (https://f1000research.com/articles/12-1344).

      The cell lysis buffer is a critical reagent when considering IP experiments. We use a commercial buffer consisting of 25 mM Tris-HCl pH 7.4, 150 mM NaCl, 1 mM EDTA, 1% NP-40 and 5% glycerol (Thermo Fisher, cat. #87787). This buffer is efficient to extract the target proteins we have studied thus far.

      Reviewer #2, comment #10: “Whether antibodies cross-react with human, mouse and other species of antigens is always a major question. It is always good to test human and mouse cell lines if possible. If antibodies cross-react in WB, in the authors' experience will they also cross-react for IF and IP”?

      The authors started this initiative by focusing on the 20,000 human proteins, defining an end point. We and our collaborators found that most of the cherry-picked selective antibodies for WB for human proteins, which manufacturers claim react with the murine version of the target proteins, were selective for murine tissue lysates.

      Indeed, poorly performing antibodies in WB mostly failed IF and IP. However, selective antibodies for IF or specific for IP were generally (>90%) selective for WB.

      Reviewer #2, comment #11: “Cell lines express proteins at vastly different levels and it is possible that the selected cell line does not express the antigen or expresses it at very low levels - this could be a reason for wrongly assessing an antibody not working. It would be useful to use cell lines in which MS data has defined the copy number of protein per cell and this figure could be included in the antibody data if available. This MS data is available for the vast majority of commonly used cells”.

      We agree with Reviewer #2 that MS data are useful for target protein selection. At the moment, our approach using transcriptomic data provided on DepMap.org proved to be a successful mechanism for cell line selection. We have identified a specific antibody for WB for each target, enabling the validation of expression in the cell line selected.

      For some protein targets, the parental line corresponding to the only commercial or academic knockout line available has weak protein expression. We thus needed to generate a KO clone in a second cell line background with high expression, and indeed found that some antibodies which failed in the first commercial line were successful in the new higher-expressing line (e.g. CHCHD10 - https://zenodo.org/records/5259992).

      Reviewer #2, comment #12: “Some proteins are glycosylated, ubiquitylated or degraded rapidly making them hard to see in WB analysis”.

      We used the full gel/membrane length when analyzing antibody performance by WB. Indeed, proteins can show different isoforms and molecular weights compared to that based on amino acid sequence (e.g. SLC19A1 - https://zenodo.org/records/7324605).

      Reviewer #2, comment # 13: “We have occasionally had proteins that appear unstable when heated with SDS-sample buffer before WB. For these, we still use SDS-Sample buffer but omit the heating step. I often wonder how necessary the heating step is”.

      For WB, samples are heated to 65 degrees, then spun to remove any precipitate.

      Reviewer #2, comment # 14: “For IF the methods by which cells are fixed and stained, and the microscope and settings, can significantly influence the final result. It would be important to carefully record all the methods and the microscope used”.

      We agree with Reviewer #2 that many parameters influence antibody performance for imaging purposes. We are progressively implementing the OMERO software to monitor any experimental parameters and information (metadata) about the microscope itself.

      Reviewer #2, comment # 15: “How do the authors recommend antibodies are stored? These should be very stable, but I have had reports from the lab that some antibodies become less good when stored and others that recommend storing at 4 degrees”.

      Antibodies are aliquoted to avoid freeze-thaw cycles and stored at -20 degrees. If it is recommended to store antibodies at 4 degrees, we add glycerol to a final concentration of 50% and store them at -20 degrees.

      Reviewer #2, comment # 16: “Would other researchers not part of the authors' team, be able to add their own data to this database validating or de-validating antibodies? This would rapidly increase the number of antibodies for which useful data would be available for. It would be nice to greatly expand the number of antibodies being used in research and this is not feasible for a single team to undertake”.

      Yes! We believe that only a community effort can resolve the antibody liability crisis. We partner with the Antibody Registry (antibodyregistry.org - led by co-author Anita Bandrowski). In the Registry, each antibody is labelled with a unique identifier, and third-party validation information can be easily tagged to any antibody. Antibody users are invited to upload information about an antibody they have characterized into the Registry.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for your consideration and insightful comments on our article.

      We have gone through all the reviewers' comments and addressed all their questions and concerns point by point.

      As per their recommendation, we have amended our manuscript by providing more information about the experimental procedure and statistical analysis followed, and removed some analyses with a reduced number of imaging sessions. In addition, as a Resource and Tools article, the claim of our paper has been adjusted to a proof-of-concept paper showing robust and reliable preliminary results. In the meantime, we have provided 3 new Supplementary Figures, including one showing data from all individual animals.

      Reviewer #1 (Public Review):

      The authors apply a new approach to monitor brain-wide changes in sensory-evoked hemodynamic activity after focal stroke in fully conscious rats. Using functional ultrasound (fUS), they report immediate and lasting (up to 5 days) depression of sensory-evoked responses in somatosensory thalamic and cortical regions.

      Strengths: This is a technically challenging and proof-of-concept study that employs new methods to study brain-wide changes in sensory-evoked neural activity, inferred from changes in cerebral blood flow. Despite the minor typos/grammatical errors and small sample size, the authors provide compelling images and rigorous analysis to support their conclusions. Overall, this was a very technically difficult study that was well executed. I believe that it will pave the way for more extensive studies using this methodological approach. Therefore I support this study and my recommendations to improve it are relatively minor in nature and should be simple for the authors to address.

      Weaknesses: The primary weakness of this paper is the small sample sizes. Drawing conclusions based on the small sham control group (n=2) or 5-day stroke recovery group (n=2), is rather tenuous. One way to alleviate some uncertainty with regard to the conclusions would be to state in the discussion that the findings (ie. loss of thalamocortical function after stroke) are perfectly consistent with previous studies that examined thalamocortical function after stroke. The authors missed some of these supporting studies in their reference list (see PMID: 28643802, 1400649). A second issue that can easily be resolved is their analysis of the 69 brain regions. This seems like a very important part of the study and one of the primary advantages of employing efUS. As presented, I had difficulty seeing the data. I think it would be worthwhile to expand Fig 3 (especially 3C) into a full-page figure with an accompanying table in the Supplementary info section describing the % change in CBF for each brain region.

      Other Recommendations for the authors:

      • Since there is variability in spreading depolarizations, was there any trend in the relationship between # SD's and ischemic volume? I know there are few data points but a scatterplot might be of interest.

      • For statistical comparisons of 'response curves' in Fig 3 and 4, what exactly was the primary dependent measure: changes in peak amplitude (%) or area under the curve?

      • There are several typos and minor grammatical errors in the manuscript. Some editing is recommended.

      We thank the reviewer for the comments and suggestions. We have adapted our message to that of a proof-of-concept paper showing robust and reliable preliminary results. We also thank the reviewer for pointing out important references that support our observations and have added them to our article. We have provided a supplementary full-page version of the current Figure 3C (see Supplementary Figure 3).

      Regarding the recommendations, we strongly agree that it would be of interest to link SDs and ischaemia, but unfortunately this can't be done because our experimental design, i.e. narrow cranial window and single static plane, does not allow brain-wide quantification of ischemic volume. This would be possible either by scanning the brain or by using a matrix array (also discussed in the manuscript).

      For statistical analysis of the hemodynamic response curves, we have adapted them to compare the area under the curve (AUC). In addition, we have provided a new Supplementary Figure 4 showing the associated values and statistics.

      We have edited typos and errors.

      Reviewer #2 (Public Review):

      Brunner et al. present a new and promising application of functional ultrasound (fUS) imaging to follow the evolution of perfusion and haemodynamics upon thrombotic stroke in awake rats. The authors leveraged a chemically induced occlusion of the rat Middle Cerebral Artery (MCA) with ferric chloride in awake rats, while imaging cerebral perfusion with fUS at high spatial and temporal resolution (100µm x 110µm x 300µm x 0.8s). The authors also measured the evoked haemodynamic response at different timepoints following whisker stimulation.

      As the fUS setup of the authors is limited to 2D imaging, Brunner and colleagues focused on a single coronal slice where they identified the primary Somatosensory Barrel Field of the Cortex (S1BF), directly perfused by the MCA and relay nuclei of the Thalamus: the Posterior (Po) and the Ventroposterior Medial (VPM) nuclei of the Thalamus. All these regions are involved in the sensory processing of whisker stimulation. By investigating these regions the authors present the hyper-acute effect of the stroke with these main results:

      • MCA occlusion results in a fast and important loss of perfusion in the ipsilesional cortex.

      • Thrombolysis is followed by Spreading Depolarisation measured in the Retrosplenial cortex.

      • Stroke-induced hypo-perfusion is associated with a significant drop in ipsilesional cortical response to whisker stimulation, and a milder one in ipsilesional subcortical relays.

      • Contralesional hemisphere is almost not affected by stroke with the exception of the cortex which presents a mildly reduced response to the stimulation.

      In addition, the authors demonstrate that their protocol allows stroke evolution to be followed up to five days post-induction. They further show that fUS can estimate the size of the infarcted volume with brilliance mode (B-mode), confirming the presence of the identified lesional tissue with post-mortem cresyl violet staining.

      Upon measuring functional response to whisker stimulation 5 days after stroke induction, the authors report that:

      • The ipsilesional cortex presents no response to the stimulation

      • The ipsilesional thalamic relays are less activated than hyper acutely

      • The contralesional cortex and subcortical regions are also less activated 5d after the stroke.

      These observations mainly validate the new method as a way to chronically image the longitudinal sequelae of stroke in awake animals. However, the potentially more intriguing results the authors describe in terms of functional reorganization of functional activity following stroke appear to be preliminary and underpowered (N = 5 animals were imaged to describe the hyper-acute session, and N = 2 in a five-day follow-up). While highly preliminary, the research model proposed by the authors (where the loss of the infarcted cortex induces reduced activity in connected regions, whether by cortico-thalamic or cortico-cortical loss of excitatory drive) is interesting. This hypothesis would require a greatly expanded, sufficiently powered study to be validated (or disproven).

      We thank the reviewer for the careful and accurate description of our work. We have addressed all the comments, recommendations and concerns raised by providing details of the experimental procedure and statistical analysis followed, and by removing some analyses associated with a reduced number of imaging sessions (at d5, n=2).

      Reviewer #3 (Public Review):

      The authors set out to demonstrate the utility of functional ultrasound for evaluating changes in brain hemodynamics elicited acutely and subacutely by the middle cerebral artery occlusion model of ischemic stroke in awake rats.

      Functional ultrasound affords a distinct set of tradeoffs relative to competing imaging modalities. Acclimatization of rats for awake imaging has proven difficult with most, and the high quality of presented data in awake rats is a major achievement. The major weakness of the approach is in its being restricted to single-slice acquisitions, which also complicates the registration of acquisition across multiple imaging sessions within the same animal. Establishing that awake imaging represents an advancement in relation to studies under anesthesia hinges upon the establishment of the level of stress experienced by the animals in the course of imaging, i.e., requires providing data on the assessment of stress over the course of these long imaging sessions. This is particularly significant given how significant a stressor physical restraint has been established to be in rodent models of stress. Furthermore, assessment of the robustness of these measurements is of particular significance for supporting the wide applicability of this approach to preclinical studies of brain injury: the individual animal data (effect sizes, activation areas, kinetics) should thus be displayed and the statistical analysis expanded. Both within-subject, within/across sessions, and across-subjects variability should be evaluated. Thoughtful comments on the relationship between power doppler signal and cerebral blood volume are important to include and facilitate comparisons to studies recording other blood volume-weighted signals. Finally, the contextualization of the observations with respect to other studies examining acute and subacute changes in brain hemodynamics post focal ischemic stroke in rats is needed. It is also quite helpful, for establishing the robustness of the approach, when the statistical parametric maps are shown in full (i.e. unmasked).

      We would like to thank the reviewer for the comments, recommendations and concerns he/she/they raised. We have addressed all the points to clarify our article and make it more relevant and informative for readers.

      Reviewer #2 (Recommendations For The Authors):

      The work described by Brunner et al is primarily a methodological paper, with potentially interesting, yet not robust enough, novel biological insight into the mechanisms of stroke. Nonetheless, the method employed is interesting and potentially well-validated.

      General comments/suggestions

      1- One potential concern I have is related to the relatively low sample size used, with n=5 for the main results and only n=2 for the follow-up after 5d. I am not sure much can be generalized using only two animals in any research study and this N = 2 dataset should probably be removed entirely from the study. Moreover, I found the statistical methods used were only superficially described, which prevented me from assessing whether the results reported by the authors are biologically relevant or not (including some significant differences in rCBV well below 1% estimated over two individuals).

      We fully agree with the reviewer’s comment and have balanced our claim by presenting this work as a proof-of-concept on brain imaging of multiple aspects of stroke hemodynamics (ischemia, spreading depolarization-like events, cortico-thalamic functions) in awake head-fixed rats. Therefore, we attenuated our message throughout the manuscript to prevent misunderstanding and overstatement (e.g., Lines 356, 441, 455). We also removed statistics from the analysis at d5 post-stroke; see Figure 4 and the associated paragraph from Line 356.

      2- Based on their investigations, the authors propose a model where the loss of infarcted cortex induces reduced activity in connected regions, whether by cortico-thalamic or cortico-cortical loss of excitatory drive. This is an intriguing framework but this hypothesis would require a more complete, well-powered study to be substantiated.

      I think a clear recognition of the fact that these findings are just preliminary and not validated should be more explicitly reported. I also marginally note here that these results are in contrast with previous reports from the same team where occlusion of the MCA induced increased response to whisker stimulation in anaesthetised rats. These contradictory findings are not discussed in this manuscript.

      As mentioned above, we are now more explicit about the proof-of-concept nature of this work and clearly state the preliminary aspect of the findings described. We attenuated our message throughout the manuscript to prevent misunderstanding and overstatement (e.g., Lines 348, 433, 447). We also removed statistics from the analysis at d5 post-stroke; see Figure 4 and the associated paragraph from Line 348.

      We thank the reviewer for pointing out the missing link with our previous work performed under anesthesia. We have therefore added a discussion point on this contradictory finding (Line 441).

      3- In a previous study from the same group perfusion was imaged in 3D either by means of a motorized probe or by using a 2D matrix array. It would be interesting to discuss why a 2D approach was chosen in this study over those previous methods.

      Indeed, brain-wide coverage would be of great interest in such an experimental context. As mentioned by the reviewer, two strategies can be used:

      • One can scan the brain using a motorized probe as performed for different purposes by Sieu et al., Nature Methods, 2015; Hingot, Brodin et al., Theranostics 2020; Macé et al., Neuron 2019 and also by our group in Sans-Dublanc, Chrzanowska et al., Neuron, 2022; Brunner et al. Frontiers in Neuroscience 2022 and Brunner et al., JCBFM 2023. (This list of publications is not exhaustive.)

      • A second approach aims at using a 2D matrix array to capture functions at brain-wide scale. So far, this strategy has been employed in a couple of studies (Rabut et al., Nature Methods, 2019 and Brunner, Grillet et al., Neuron, 2020).

      The strategy of scanning (manually or using a motor) strongly limits the investigation of brain functions, as accurately covering the functional regions requires extensive and time-consuming scanning: brain functions must be probed several times to capture a reliable and robust signal for all the brain sections scanned (see Brunner et al., 2022). Unfortunately, this strategy prevents us from accurately capturing other brain hemodynamics such as the dynamics of the ischemia or the spreading depolarization events.

      On the other hand, volumetric functional ultrasound imaging (vfUSI) would be suited for brain-wide coverage, capturing large-scale brain functions (see Brunner, Grillet et al. Neuron 2020) and hemodynamic events (see Rabut et al., Nature Methods, 2019), but at the cost of resolution and frame rate, and with the need for a larger cranial window. Unfortunately, this technology was not available when this work was conducted.

      Such experimental opportunities have been suggested at the end of the manuscript: “To overcome such limitation, one can extend the size of the cranial window to allow for larger scale imaging either by sequentially scanning the brain27,28,31,32,59,69,71,72, or by using the recently developed volumetric fUS which provides whole-brain imaging capabilities in anesthetized73 and awake rats30.“

      4- Overall the registration scheme seems suboptimal which ultimately questions the specificity of the findings in thalamic regions. It would be interesting to validate this procedure, especially the probe repositioning five days after the stroke.

      Positioning was not a difficult part of this experiment. First, all head posts were implanted in the same position relative to the skull references bregma and lambda. Second, the head fixation ensures the same placement of the headpost for all animals. Finally, fine adjustments of the ultrasound probe position were made using a micromanipulator by finding key landmarks in the µDoppler image. In practice, minimal adjustments were needed to find the same imaging plane again. We provide additional information about the positioning in the Materials and Methods section.

      New text – Line 126: “Positioning.

      The mechanical fixation of the head-post ensures an easy and repeatable positioning of the ultrasound probe across imaging sessions. The ultrasound probe is indeed fixed to a micromanipulator enabling light adjustments. To find the plane of interest (containing both S1BF and thalamic relays: bregma - 3.4mm), we used brain landmarks (e.g., surface of the brain, hippocampus, superior sagittal sinus, large vessels). Note that as the headpost was carefully placed in the same position relative to the skull landmarks (bregma and lambda), the variation in the position of the region of interest was minimal across animals.”

      Second, at d5 post-stroke, we positioned the ultrasound probe over the imaging window as described in the Materials and Methods section and used brain landmarks from the baseline/post-stroke images to match the position of the brain image as closely as possible. We now describe the procedure followed in more detail.

      Original text: “First, we used the vascular markers and the shape of the hippocampus31,32 to find back the coronal cross-section imaged during the pre-stroke session. Five days after the MCA occlusion,….”

      New text – Line 360 :“Five days after the MCA occlusion, we first placed the ultrasound probe over the imaging window and adjusted its position (using micromanipulator) to find back the recording plane from Pre-Stroke session using Bmode (morphological mode) and µDoppler imaging using brain vascular landmarks (i.e., vascular patterns, brain surface and hippocampus34,35; see Figure 2B).”

      More detailed questions/comments/suggestions

      Methods

      ARRIVE methodology

      • Point 2b: sample size is not adequately explained, especially the use of n = 2 animals for 5d follow up

      We have made the sample size explicit by adding a short paragraph at the beginning of the Results section. We also made Supplementary Table 1 more accurate. New text – Line 239: “Animals

      Report on animal use, experimentation, exclusion criteria can be found in Supplementary Table 1. Rat#1 was excluded after the control session as the imaging window was too anterior to capture both cortical and thalamic responses. Rat#2 was excluded as hemodynamic responses were inconsistent during baseline (pre-stroke) period. Rat#3 showed early post-stroke reperfusion and was excluded from stroke analysis, the control session (pre-stroke) from Rat#3 was analyzed.”

      • Point 7: statistical methods: The quantification used to assess significant differences in stimulation traces is poorly described.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      New text – Line 221: “Activated brain regions were detected from hemodynamic response time-courses using GLM followed by t-test across animals as proposed in Brunner, Grillet et al.,34. The area under the curve (AUC) from hemodynamic response time-courses was computed for individual trials in S1BF, VPM and Po regions, for all the periods of the recording and for all rats included in this work. AUC were compared and analysed using a non-parametric Kruskal-Wallis test corrected for multiple comparison using a Dunn’s test. Tests were performed using GraphPad Prism 10.0.1. “

      Functional Ultrasound Imaging acquisition

      • References 26 and 28 imply 2.5Hz and 2Hz acquisition rates, respectively. Why does the same method result in a 1.25Hz acquisition rate here? Can you confirm the same spatial resolution in these conditions?

      The spatial resolution is independent of the temporal resolution (frame rate). The spatial resolution depends on the resolution of the compound image, and the temporal resolution is given by the number of compound images used to generate a single Doppler image (exposure time). Increasing the number of compound images decreases the frame rate while increasing the signal-to-noise ratio and sensitivity. In some work, a pause between two frames is used (mostly due to technical limitations in the software, such as processing time or the execution of a real-time display/processing by the user); however, this reduces the frame rate.

      Author response table 1.

      Comparing with the sequences used in references 26 and 28, we have the following timing parameters

      In this work, we decided to reduce the frame rate to have fewer images but with higher SNR. The 0.3s pause was added for technical reasons in this specific implementation.

      New text – Line 158:“ To obtain a single vascular image we acquired a set of 250 compound images in 0.5s, an extra 0.3s pause is included between each image to have some processing time to display the images for real-time monitoring of the experiment. “
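
      The timing arithmetic behind these numbers can be checked with a short sketch. It assumes only the values quoted in this response (250 compound images in 0.5 s, a 0.3 s pause, 40-s analysis windows); the function and variable names are illustrative and are not taken from the acquisition software.

```python
def frame_timing(n_compound, acquisition_s, pause_s):
    """Return (compound-image rate in Hz, frame period in s, frame rate in Hz)."""
    compound_rate_hz = n_compound / acquisition_s   # compound images per second
    frame_period_s = acquisition_s + pause_s        # one Doppler image plus the pause
    return compound_rate_hz, frame_period_s, 1.0 / frame_period_s


# Values quoted in the response: 250 compound images in 0.5 s, then a 0.3 s pause.
compound_rate_hz, frame_period_s, frame_rate_hz = frame_timing(250, 0.5, 0.3)

# Number of Doppler frames in one 40-s analysis window.
frames_per_window = round(40 / frame_period_s)

print(compound_rate_hz, frame_period_s, frame_rate_hz, frames_per_window)
```

      Under these assumptions the 0.8 s frame period corresponds to the 1.25 Hz acquisition rate raised by the reviewer and to the 50 frames per 40-s window quoted in the Activity Maps response.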

      Activity Maps

      • How is the use of a 40s window motivated?

      The 40s window was chosen to better compare hemodynamic responses to either left or right whisker stimulation, and the period of interest was centered on the start of the stimulation. Original text:” Pre- and post-stroke recordings are reshaped in shorter 40-s sessions, i.e., 50 frames, …”

      New text – Line 206:“ Pre- and post-stroke recordings are reshaped in 40-s sessions, i.e., 50 frames, centered on the start of the stimulation (at 20s), …”

      • I think the manuscript would benefit from the use of an established, event-based GLM for activity mapping.

      We thank the reviewer for this suggestion. Here we used a z-score for activity mapping, which is largely established in the neuroimaging realm.

      • The statistical thresholds used should account for multiple comparisons.

      We have amended the Materials and Methods section, and figure captions about statistics and provided Supplementary Figure 4.

      Statistical analyses

      • Overall this section is only superficially described, and lacks detailed information.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      New text – Line 221 : “Activated brain regions were detected from hemodynamic response time-courses using GLM followed by t-test across animals as proposed in Brunner, Grillet et al.,34. The area under the curve (AUC) from hemodynamic response time-courses was computed for individual trials in S1BF, VPM and Po regions, for all the periods of the recording and for all rats included in this work. AUC were compared and analysed using a non-parametric Kruskal-Wallis test corrected for multiple comparison using a Dunn’s test. Tests were performed using GraphPad Prism 10.0.1. “

      • Are average rCBV changes referred to in the 40s window?

      The rCBV changes refer to the pre-stimulation baseline. We have modified the text accordingly (Line 206).
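
      As an illustration of the two quantities discussed here, the following minimal sketch expresses a trial as a percentage change from its pre-stimulation baseline and computes the area under the curve (AUC) with the trapezoidal rule. The 25-frame baseline (20 s at 0.8 s per frame), the integration over the whole window and the toy trace are assumptions made for the example; the authors' exact integration window and code are not given in the response.

```python
def rcbv_percent(signal, n_baseline):
    """Percent change relative to the mean of the first n_baseline frames."""
    baseline = sum(signal[:n_baseline]) / n_baseline
    return [100.0 * (s - baseline) / baseline for s in signal]


def auc_trapezoid(values, dt):
    """Trapezoidal area under evenly sampled values (units: % x s)."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


# Toy 50-frame trial: flat baseline, a triangular response, then flat again.
trial = [100.0] * 25 + [100.0, 110.0, 120.0, 110.0, 100.0] + [100.0] * 20

rcbv = rcbv_percent(trial, n_baseline=25)   # assumed 20-s pre-stimulation baseline
auc = auc_trapezoid(rcbv, dt=0.8)           # 0.8 s per frame, as quoted above

print(max(rcbv), auc)
```

      Per-trial AUC values of this kind are what would then be passed to a group comparison such as the Kruskal-Wallis test mentioned in the Methods.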

      • Were normality and variance equality requirements verified in the group with n=2?

      Based on the reviewers’ comments on the limited number of recordings at 5d, we have decided to remove this statistical analysis. The manuscript, figure and caption were corrected accordingly.

      • There is no method for cresyl violet staining

      We thank the reviewer for highlighting this omission. We have provided a paragraph in the Materials & Methods section detailing the histology procedure – Line 228:

      “Histopathology

      Rats were killed 24hrs after the occlusion for histological analysis of the infarcted tissue. Rats received a lethal injection of pentobarbital (100mg/kg i.p. Dolethal, Vetoquinol, France). Using a peristaltic pump, they were transcardially perfused with phosphate-buffered saline followed by 4% paraformaldehyde (Sigma-Aldrich, USA). Brains were collected and post-fixed overnight. 50-μm thick coronal brain sections across the MCA territory were sliced on a vibratome (VT1000S, Leica Microsystems, Germany) and analyzed using the cresyl violet (Electron Microscopy Sciences, USA) staining procedure (see Open Lab Book for procedure). Slices were mounted with DPX mounting medium (Sigma-Aldrich, USA) and scanned using a bright-field microscope.”

      Results 1: Real time imaging of stroke induction in awake rats

      • Why is the window so narrow in the anteroposterior direction?

      The imaging window was defined based on the brain regions investigated in this work, namely the primary somatosensory cortex (S1BF) and the ventroposterior medial thalamic relay (VPM). According to the Paxinos atlas, a position of interest is located at Bregma -3.4mm. The cranial window was made accordingly and restricted to a couple of mm to avoid unnecessary surgery and brain exposure. We added a new sentence in the Materials & Methods section – Line 116: “This cranial window aims to cover bilateral thalamo-cortical circuits of the somatosensory whisker-to-barrel pathway.”

      • What validation was employed for the habituation protocol? Are animals stressed by the procedure? Do you have cortisol data to show? Are animal weights available throughout the procedure?

      The habituation protocol employed in this work follows recommendations from experts in the field and peers (Martin et al., Journal of Neuroscience Methods, 2002; Martin et al., Neuroimage 2006; Topchiy et al., Behav Brain Res 2009). We have amended the corresponding paragraph in the Materials & Methods section detailing the habituation procedure:

      Original text: “Body restraint and head fixation.

      Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada), progressively increasing the restraining period from minutes to hours33,34. After the headpost implantation (see below), rats were habituated to be head-fixed while restrained in the sling. The period of fixation was progressively increased from minutes to hours. Water and food gel (DietGel, ClearH2O, USA) were provided along the habituation session. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      New text - Line 90:“ Body restraint and head fixation.

      The body restraint and head fixation procedures are adapted from published protocols and setup dedicated for brain imaging of awake rats39–41. Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada) by progressively increasing restraining periods from minutes (5mins, 10mins, 30mins) to hours (1 and 3hrs) for one or two weeks. The habituation to head-fixation started by short (5 to 30s) and gentle head-fixation of the headpost between fingers. The headpost was then secured between clamps for fixation periods progressively increased following the same procedure as with the sling. For both body restraint and head fixation, the initial struggling and vocalization diminished over sessions. Water and food gel (DietGel, ClearH2O, USA) were provided for all body restraint and head-fixation habituation sessions. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      • The observation of contralateral oligemia is based only on RSG traces.

      We provided contralesional perfusion changes for all regions in Supplementary Figure 1.

      • The spatial and temporal distribution of Bmode measured hyperechogenicity is surprising and should be discussed. Reference 29 describes for instance non-overlap with an area of hypo-perfusion. Overlap between hypo-perfused and infarct volumes should be systematically investigated and coregistered with histology. Moreover, reference 40, while using a different model, presents hyperechogenicity at 5h.

      The B-mode images in Figure 2B are presented as an illustration of the potential morphological changes detected at different timepoints. However, our study focuses on functional responses and not on the evolution of the morphological changes. Indeed, these B-mode images remain difficult to interpret as they show a structural reorganization at the level of the ultrasound scatterers which has not been directly linked with tissue infarction, oedema, or other histological conditions.

      Regarding reference 40, the authors found hyper-echogenicity at 5h, a time window that is not covered by our protocol. In reference 29, we indeed detailed a mismatch between the µDoppler images and histopathology. As suggested by the reviewer, seeking other potential mismatches/overlaps between B-mode/µDoppler and histopathology is an interesting field of investigation, but remains outside the scope of this work.

      Results 3: Delayed alteration of the somatosensory thalamocortical pathway

      • These results are underpowered and as such should probably be removed entirely from the paper (or substantiated with greater Ns of animals).

      Based on the reviewers’ comments on the limited number of recordings at 5d, we have decided to remove this statistical analysis. The manuscript, figure and caption were corrected accordingly.

      • If I am not mistaken, reference 28 describes a protocol for awake mouse imaging, and thereby does not introduce any hippocampal landmark allowing effective positioning of the probe.

      We thank the reviewer for this comment. While not used in the figure detailing image registration in reference 28, step 42 (page 17) of the protocol mentions the use of the hippocampal landmark to position the imaged brain relative to the atlas. The hippocampal landmark is also used in Brunner et al., JCBFM 2023; we have added this reference, which is more appropriate to this work (i.e., rat model, digitized Paxinos atlas, linear ultrasound transducer).

      • Significant difference in ipsilesional VPM with post-stroke period looks spurious.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      Discussion:

      The sentence "might result from the direct loss of the excitatory corticothalamic feedback to the VPM" should be moderated in the absence of electrophysiology support. Such a decrease could be explained by reduced perfusion due to the challenge.

      The reviewer is right, and we believe the wording used in the sentence (“might”) already tempers the claim. However, we have clarified how such a result could be better validated.

      Original text: “Further work will need to dissect the complex and long-lasting post-stroke alterations of the functional whisker-to-barrel pathway, including at the neuronal level, as fUS only reports on hemodynamics as a proxy of local neuronal activity27,28,60,66–68“

      New text – Line 445: “Therefore, further studies will be needed to accurately dissect the complex and long-lasting post-stroke alterations of the functional whisker-to-barrel pathway, including at the neuronal level by direct electrophysiology recordings and imaging, as fUS only reports on hemodynamics as a proxy of local neuronal activity30,31,63,74–76.“

      Figure 2

      • Panel B would be more informative if presented as an average.

      The aim of this figure is to show the raw data of a typical case. Averaging µDoppler images wouldn’t be illustrative as individual vessels will not be visible anymore. Because the vessels are in different positions from one animal to another, an average image would be blurred.

      • Panel C lacks contralateral S1BF trace.

      We have provided contralesional perfusion changes for all regions in Supplementary Figure 1.

      • Methods for detection of SDs refer to non-peer-reviewed reference 29, where SD is defined as 50% over baseline level. What is the actual threshold/method used to define a SD in this study?

      We have described this procedure in more detail in the Materials & Methods section - Line 195: “The detection of hemodynamic events associated with spreading depolarizations (SDs) was performed based on the temporal analysis of the rCBV signal in the retrosplenial granular (RSGc) and dysgranular (RSD) cortices of the left hemisphere (ipsi-lesional). SDs were defined as transient increase of rCBV signal (+25%) detected with a temporal delay of <10 frames (i.e., 8secs) between the two regions of interest, validating both the hyperemia and spreading features of hemodynamic events associated with spreading depolarizations.”
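
      One possible reading of this detection rule is sketched below. It is an illustration under stated assumptions, not the authors' implementation: event onsets are taken as upward crossings of the +25% rCBV threshold, and an event is kept only when both regions cross within fewer than 10 frames of each other. The function names and the toy traces are hypothetical.

```python
def onsets(rcbv, threshold=25.0):
    """Frame indices where the trace crosses the threshold upward (assumed onset rule)."""
    return [i for i in range(1, len(rcbv)) if rcbv[i - 1] < threshold <= rcbv[i]]


def detect_sd_events(rcbv_a, rcbv_b, threshold=25.0, max_delay_frames=10):
    """Pair onsets from two regions that occur fewer than max_delay_frames apart."""
    onsets_b = onsets(rcbv_b, threshold)
    events = []
    for a in onsets(rcbv_a, threshold):
        matches = [b for b in onsets_b if abs(b - a) < max_delay_frames]
        if matches:
            # Keep the closest onset in the second region.
            events.append((a, min(matches, key=lambda b: abs(b - a))))
    return events


# Toy traces (% rCBV change): one hyperemic wave seen in both regions 4 frames apart,
# plus an isolated rise in the first region that should not be counted.
rsgc = [0.0] * 10 + [30.0] * 5 + [0.0] * 20 + [30.0] * 3 + [0.0] * 7
rsd = [0.0] * 14 + [30.0] * 5 + [0.0] * 26

events = detect_sd_events(rsgc, rsd)
print(events)
```

      With 0.8 s per frame, the 10-frame limit corresponds to the 8 s delay quoted above; requiring a rise in both regions is what separates a propagating event from a local fluctuation in this sketch.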

      • For panel F, a measure of variance would be more suited to show stereotypic profile across animals as the number of SDs varies between animals.

      Figure 2F indeed shows the average profile of hemodynamic events associated with spreading depolarizations (black line) with the variance (95% confidence interval error bands in gray). We have adjusted the corresponding figure caption to make this information clearer.

      Figure 3

      • The exact stimulation employed is not clear as the methods describe a 1.33 min delay between two whisker pad stimulations, but the figure reports 40s. The description is thereby ambiguous.

      We thank the reviewer for pointing out this potential confusion, which allowed us to correct a mistake:

      • The effective delay between two stimulations delivered to the whisker pads is 40 seconds

      • The effective delay between two stimulations delivered to the same whisker pad is 80 seconds from start to start or 75 seconds from end to start.

      The text was amended accordingly in line 144: “Thus, the effective delay between two stimulations delivered to the same whisker pad is 80 seconds from start to start.“
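
      The corrected timing can be laid out as a small schedule. This sketch assumes strict alternation between the two whisker pads every 40 s and a 5-s stimulus, the latter inferred from the 80 s start-to-start and 75 s end-to-start delays quoted above; which pad is stimulated first is arbitrary here, and the function name is hypothetical.

```python
def stimulation_schedule(n_stimulations, interval_s=40.0, duration_s=5.0):
    """Return (pad, start_s, end_s) tuples, alternating left and right pads."""
    pads = ("left", "right")  # assumption: the starting pad is arbitrary
    return [(pads[i % 2], i * interval_s, i * interval_s + duration_s)
            for i in range(n_stimulations)]


schedule = stimulation_schedule(4)
left = [s for s in schedule if s[0] == "left"]

start_to_start_s = left[1][1] - left[0][1]   # same pad, start to start
end_to_start_s = left[1][1] - left[0][2]     # same pad, end to start

print(schedule)
print(start_to_start_s, end_to_start_s)
```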

      • In panel B the choice of colormap and transparency for template overlay is not explained and is confusing given the employed threshold of 1.6. Which mask was used to overlay the activation map on the template? Why black color to represent a supposedly significant difference?

      We thank the reviewer for pointing out this potential confusion. We have adjusted the colormap in Figures 3 and 4.

      • The pre-stroke thalamic response is clearly localized in VPM for left stimulation, while it overlaps VPM and Po for the right stimulation. This questions the accuracy of the employed registration scheme and consequently the choice of these ROIs, which appear quite small as compared to the resolution and this positioning precision.

      We see the reviewer’s point; here the apparent difference arises because the brain is slightly tilted. By adjusting the angle for both activity maps (see Author response image 1), we confirm that both maps are very similar, including for the activated areas VPM and Po.

      Author response image 1.

      • It would be interesting to see the same activation maps for all animals in supplementary.

      We have provided Supplementary Figure 5, which contains both ipsilateral and contralateral responses to whisker stimulation (from both left and right pads) for all trials and all rats included in this work.

      • Looking at panel C, more cortical regions seem to respond to the stimulation above S1BF.

      The reviewer is right and we have indeed mentioned this point several times in the original manuscript in:

      • the result section: “We also detected significant increase of activity in S2, AuD, Ect (*p<0.0001) and PRh (p<0.001) cortices and VPL nucleus (**p<0.01; the list of acronyms is provided in Supplementary Table 2), brain regions receiving direct efferent projections from the S1BF45,48,49, VPM or Po nuclei50–52.”

      • the caption of Figure 4: “S1BF, S2, AuD, VPM, VPL and Po regions are brain regions significatively activated (all pvalue<0.01; GLM followed by t-test.”

      • the conclusion section : “Functional responses to mechanical whisker stimulation were detected in several regions relaying the information from the whisker to the cortex, including the VPM and Po nuclei of the thalamus, and S1BF, the somatosensory barrel-field cortex. Responses were also observed in the S2 cortex involved in the multisensory integration of the information43,44,61, the auditory cortex as it receives direct efferent projection from S1BF45,61, and the VPL nuclei of the thalamus connected via corticothalamic projections45.“

      • It would be interesting to see bilateral traces as supplementary figures.

      We have provided Supplementary Figure 5, which contains both ipsilateral and contralateral responses to whisker stimulation (from both left and right pads) for all trials and all rats included in this work.

      • In both panels C and D, n=5 is reported, but methods state the use of 7 animals. Please clarify how animals have been used in the different studies

      We have clarified the report on animal use and amended the Supplementary Table 1 accordingly.

      • In Panel D, the 95% CI intervals seem particularly narrow. Might this be the result of considering multiple trials as independent events? A GLM analysis would avoid this statistical fallacy.

      We have provided Supplementary Figure 5, which contains both ipsilateral and contralateral responses to whisker stimulation (from both left and right pads) for all trials and all rats included in this work. The statistical analysis has been adjusted (see Materials and Methods) and completed with Supplementary Figure 4.
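
      The reviewer's concern about narrow confidence intervals can be illustrated with a minimal sketch. It uses entirely hypothetical response values (not data from this study) for 5 rats with 5 trials each, and a normal approximation (z = 1.96) for the interval; with only 5 animals, a t-based interval would be wider still. It is not the adjusted analysis described in the Materials and Methods, only an illustration of why pooling trials as independent observations shrinks the interval.

```python
import math
import statistics

# Hypothetical peak responses (arbitrary units); NOT data from the study.
trials_per_rat = {
    "rat1": [10.0, 11.0, 9.0, 10.5, 9.5],
    "rat2": [14.0, 15.0, 13.0, 14.5, 13.5],
    "rat3": [6.0, 7.0, 5.0, 6.5, 5.5],
    "rat4": [12.0, 13.0, 11.0, 12.5, 11.5],
    "rat5": [8.0, 9.0, 7.0, 8.5, 7.5],
}

Z_95 = 1.96  # normal approximation; an assumption made for simplicity


def ci_half_width(values):
    """Half-width of an approximate 95% CI of the mean of `values`."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return Z_95 * sem


# (a) Every trial treated as an independent observation (n = 25).
pooled = [x for trials in trials_per_rat.values() for x in trials]
pooled_hw = ci_half_width(pooled)

# (b) One mean per animal, so the animal is the unit of analysis (n = 5).
animal_means = [statistics.mean(trials) for trials in trials_per_rat.values()]
animal_hw = ci_half_width(animal_means)
```

      With these made-up numbers, the between-animal spread dominates, so the animal-level interval is more than twice as wide as the pooled-trial interval, even though both are centered on the same mean.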

      Figure 4 - See comments above for Figure 3

      We have adjusted Figure 3 according to the reviewer’s suggestions.

      Reviewer #3 (Recommendations For The Authors):

      1) Introduction: Given the emphasis on the awake state, it would be helpful to note that a significant portion of strokes occur during sleep - as well as comment on its hemodynamic difference with respect to an awake state.

      We agree with the reviewer that some strokes occur during sleep. However, here the awake state, which has been poorly addressed in the literature, is contrasted with anesthesia, a condition largely used to investigate brain functions after stroke. We added a point and corresponding references about wake-up stroke (see Line 49).

      2) The effects of anesthetics on stroke are quite variable and the literature data on the topic is rather divergent: it would be helpful for the introduction to reflect the large level of discord in the literature and the wide-ranging mechanisms of action of different anesthetics.

      We thank the reviewer for this comment. We have expanded our original sentence in the introduction to better reflect the various effects of anesthetics on stroke (see Line 50).

      3) The reference list (14-17) to other studies of brain hemodynamic changes post ischemic stroke is egregiously short. Please expand. Similarly, the list of citations to other functional ultrasound rodent studies in the literature (23-24) is misleading: other groups have published similar work and ought to be cited.

      We thank the reviewer for this comment and have added complementary references. However, we believe that references 14-17 pointed out by the reviewer refer not only to brain hemodynamic changes but mostly to network and function, as stated in the manuscript. Regarding the references on fUS (23-24) mentioned by the reviewer, we did not limit our citations on functional ultrasound imaging to those 2 articles, but cited 15+ from 4 different research groups.

      4) It would be helpful if the authors used "spreading depolarization" the way it has been utilized in the many decades of research on them in the literature, namely, as waves of hyper/hypoactivity in the electrophysiological signals. Please use a distinct term to refer to waves of changes in the hemodynamic state.

      We have amended the terminology used in the manuscript. “Spreading depolarization” has been replaced by “hemodynamic events associated with spreading depolarizations” or similar.

      5) Why is this investigation restricted to male rats?

      As a proof of concept, we did not perform experiments in female rats. We agree that further investigation would require animals of both sexes. We added a line in the discussion.

      New text – Line 455:” Finally, it is important to note that this proof-of-concept work did not specifically focus the impact of sex dimorphism on the stroke or early behavioral outcomes following the insult that would greatly enhance the translational value of such preclinical stroke study80.”

      6) Were the animals tested during their active phase? If not, why not, and what are the implications of testing their responses during the sleep phase?

      We think there is a misunderstanding here, as we investigated brain functions in awake head-fixed rats. Therefore, the sleep/active phases were neither investigated nor mentioned in the manuscript.

      7) How is the level of stress monitored/established?

      In this work, we followed established procedures used to reduce the stress and discomfort of the rats throughout the experiment. The procedure used is now better detailed in the Materials and Methods section. However, the level of stress was not monitored; this would be of interest to consider in future experiments.

      8) What are the sequelae of stress on brain hemodynamics, especially given 1-4 hour long sessions.

      This is a good remark. While we cannot state how stress impacts brain hemodynamics, the extracted data show that hemodynamic response functions were stable and robust over hour-long recordings (see control and pre-stroke sessions in Supplementary Figure 5).

      9) How is the animal prepared for stroke induction? In general, the methodological steps surrounding animal handling and preparation are exceedingly terse.

      We provided more details about the handling and preparation of the rats in the Materials and Methods section.

      Original text: “Body restraint and head fixation.

      Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada), progressively increasing the restraining period from minutes to hours33,34. After the headpost implantation (see below), rats were habituated to be head-fixed while restrained in the sling. The period of fixation was progressively increased from minutes to hours. Water and food gel (DietGel, ClearH2O, USA) were provided along the habituation session. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      New text - Line 90:“ Body restraint and head fixation.

      The body restraint and head fixation procedures are adapted from published protocols and setup dedicated for brain imaging of awake rats39–41. Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada) by progressively increasing restraining periods from minutes (5mins, 10mins, 30mins) to hours (1 and 3hrs) for one or two weeks. The habituation to head-fixation started by short (5 to 30s) and gentle head-fixation of the headpost between fingers. The headpost was then secured between clamps for fixation periods progressively increased following the same procedure as with the sling. For both body restraint and head fixation, the initial struggling and vocalization diminished over sessions. Water and food gel (DietGel, ClearH2O, USA) were provided for all body restraint and head-fixation habituation sessions. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      10) What is the reproducibility of the chemo-thrombotic model timeline? What are its limitations?

      We have provided more information on the chemo-thrombotic model and its limitations in the discussion section.

      New text – Line 402:” However, to adequately and efficiently occlude the vessel of interest, removing a piece of skull remains required. As mentioned in the report on animal use, one rat was excluded from the analysis as the MCA spontaneously reperfuses, thus dropping the success rate of such model.”

      11) What is the motivation behind the 5-days post stroke timepoint selection?

      In addition to demonstrating the feasibility of imaging brain functions at different timepoints following the ischemia, the motivation to perform this delayed session was to capture functional diaschisis, which is known to occur a few days after the initial insult. More recurrent imaging sessions covering a longer post-stroke period would be of high interest to better capture the impact of ischemia on both brain hemodynamics and functions.

      12) How predictive is hyperacute hemodynamics imaging of the long-term outcome?

      We thank the reviewer for this question, which remains of major interest in the stroke field. However, predicting long-term outcome would require capturing brain hemodynamics at a larger scale, as performed in Hingot et al., Theranostics 2020 and Brunner et al. JCBFM 2023, a coverage not accessible with the imaging window proposed in this work.

      13) It would be greatly reassuring if the authors presented the statistical parametric maps without masking regions of interest (eg Fig3B).

      We thank the reviewer for pointing out this potential confusion. In the first version of the figure, the colormap used for the activity maps was indeed not optimal. Therefore, we i) adjusted the colormap used in Fig 3 and 4 and ii) provided non-thresholded z-score maps for all rats in Supplementary Figure 5.

      14) Fig 3C is hard to make out.

      We provided a full-page version of Figure 3C in Supplementary Figure 3.

      15) Figs 3,4 should incorporate box and whisker plots of data across all rats scatter plots of individual animal data.

      We are not sure which kind of data the reviewer would like to see displayed here. However, we have provided Supplementary Figure 5, which contains both ipsilateral and contralateral responses to whisker stimulation (from both left and right pads) for all trials and for each individual animal included in this work.

      16) The final panels in Figures 3,4 would more tellingly include the plots of the linear models fitted.

      Based on all reviewers’ comments, we have adjusted and clarified the statistical analysis performed (see Materials and Methods) and completed it with Supplementary Figure 4.

      17) The frame rate calculations are not adding up unless averaging and pauses are included so some more details should be stated. Are tilted plane waves averaged before compounding as in prior publications?

      The angles are averaged 6 times before compounding to increase the signal-to-noise ratio, and there is a pause of 0.3s between each Doppler image. See also question “Functional Ultrasound Imaging acquisition” from reviewer 2. We also provided supplementary and key information about the sequence used in this work.

      We have provided complementary information in the manuscript:

      Original text:” The ultrasound sequence generated by the software is the same as in Macé et al.,26 and Brunner, Grillet et al., Briefly, the ultrafast scanner images the brain with 5 tilted plane-waves (-6°, -3°, +0.5°, +3°, +6°) at a 10-kHz frame rate. The 5 plane-wave images are added to create compound images at a frame rate of 500Hz. Each set of 250 compound images is filtered to extract the blood signal. Finally, the intensity of the filtered images is averaged to obtain a vascular image of the rat brain at a frame rate of 1.25Hz. Then, the acquired images are processed with a dedicated GPU architecture, displayed in real-time for data visualization, and stored for subsequent off-line analysis.”

      New text – Line 146:” The ultrasound sequence generated by the software is adapted from Macé et al.31 and Brunner, Grillet et al.34 Ultrafast images of the brain were generated using 5 tilted plane-waves (-6°, -3°, +0.5°, +3°, +6°). Each plane wave is repeated 6 times and the recorded echoes are averaged to increase the signal to noise ratio. The 5 plane-wave images are added to create compound images at a frame rate of 500Hz. To obtain a single vascular image we acquired a set of 250 compound images in 0.5s, an extra 0.3s pause is included between each image to have some processing time to display the images for real-time monitoring of the experiment. The set of 250 compound images has a mixed information of blood and tissue signal. To extract the blood signal we apply a low pass filter (cutoff 15Hz) and an SVD filter that eliminates 20 singular values. This filter aims to select all the signal from blood moving with an axial velocity higher than ~1mm/s. To obtain a vascular image we compute the intensity of the blood signal i.e., Power Doppler image. This image is in first approximation proportional to the cerebral blood volume26,28. Overall, this process enables a continuous acquisition of power Doppler images at a frame rate of 1.25Hz during several hours.”
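
      The frame-rate arithmetic in the revised text can be verified with a few lines. The sketch below uses only the numbers quoted above (5 angles, 6 repeats, 500 Hz compound rate, 250 compound images, 0.3 s pause). The implied transmit rate is a derived lower bound that assumes no idle time between firings; it is not a value stated in the response.

```python
# Frame-rate bookkeeping for the described sequence (values quoted from the text).
N_ANGLES = 5              # tilted plane waves per compound image
N_REPEATS = 6             # repetitions of each plane wave, averaged
COMPOUND_RATE_HZ = 500.0  # compound images per second
N_COMPOUND = 250          # compound images per power Doppler image
PAUSE_S = 0.3             # processing pause between Doppler images


def doppler_frame_rate():
    """Power Doppler frame rate including the inter-image pause."""
    acquisition_s = N_COMPOUND / COMPOUND_RATE_HZ  # 250 / 500 Hz = 0.5 s
    period_s = acquisition_s + PAUSE_S             # 0.5 s + 0.3 s = 0.8 s
    return 1.0 / period_s


def implied_transmit_rate_hz():
    """Firings per second needed to sustain the compound rate.

    Assumption: firings are evenly spaced with no idle time, so this is a
    derived lower bound rather than a reported parameter.
    """
    return COMPOUND_RATE_HZ * N_ANGLES * N_REPEATS
```

      With these numbers, each Doppler image takes 0.8 s (0.5 s of acquisition plus the 0.3 s pause), which gives the stated 1.25 Hz, and sustaining 500 compound images per second with 30 firings each implies at least 15,000 firings per second.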

      18) Ultrasound data processing: The filtering process should have more description. It would be highly instructive to explain that the power Doppler signal is being used and comment clearly on its relationship to blood volume, commenting on stalled flow mircrovessels/RBC-devoid micrrovessels, and considerations of vessel orientation.

      The compound image contains a mixture of blood and tissue signal. To extract the blood signal, we applied a low pass filter (cutoff 15Hz) and an SVD filter that eliminates 20 singular values. This filter selects all the signal from blood moving with an axial velocity higher than ~1mm/s. To obtain a vascular image we compute the intensity of the blood signal (Power Doppler image). This power Doppler image is in first approximation proportional to the cerebral blood volume.

      This information has been added in the Materials and Methods section of the manuscript.
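
      As an illustration of the SVD step described above, the following is a minimal sketch and not the authors' implementation. It assumes NumPy, a stack of compound images arranged as (depth, width, time), and that the largest singular values (dominated by tissue) are the ones removed. The temporal filtering step is omitted, and the function names are hypothetical.

```python
import numpy as np


def svd_clutter_filter(frames, n_removed=20):
    """Zero the n_removed largest singular values of a compound-image stack.

    frames: array of shape (nz, nx, nt), real or complex.
    Returns an array of the same shape containing the residual (blood) signal.
    """
    nz, nx, nt = frames.shape
    # Casorati matrix: one row per pixel, one column per time sample.
    casorati = frames.reshape(nz * nx, nt)
    u, s, vh = np.linalg.svd(casorati, full_matrices=False)
    s_filtered = s.copy()
    s_filtered[:n_removed] = 0.0  # singular values are returned in descending order
    blood = (u * s_filtered) @ vh
    return blood.reshape(nz, nx, nt)


def power_doppler(blood):
    """Mean signal intensity over time, one value per pixel."""
    return np.mean(np.abs(blood) ** 2, axis=-1)
```

      For a block of 250 compound images, `power_doppler(svd_clutter_filter(frames))` would yield a single vascular image, which to a first approximation is proportional to cerebral blood volume, as stated above.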

      19) Does the SVD processing have the same cut off (20 singular values) as in prior publications as a standard value, or is that adjusted for each study? There are enough minor differences between sequences that these details are uncertain. Do the overall hemodynamics measurements (Fig 2) include all data acquired, or do they exclude the whisker stimulation events, and if so, how long of a window is excluded? The explanation of the activity maps should be rephrased e.g. "... recordings are segmented in shorter 40-s time windows encompassing the whisker stimulation trials..."

      We agree that these details are important. All of this information has been added to the manuscript:

      • SVD processing: We eliminate 20 singular values as in cited studies.

      • Sequence: we have included more details about the sequence.

      • Processing: all data during the whisker stimulation is used.

      • We have rephrased the explanation about the activity maps.

      20) Discuss the methodology behind histological data shown in Fig. 1.

      We thank the reviewer for highlighting this omission. We have provided a paragraph in the Materials & Methods section detailing the histology procedure (Line 228):

      “Histopathology

      Rats were killed 24hrs after the occlusion for histological analysis of the infarcted tissue. Rats received a lethal injection of pentobarbital (100mg/kg i.p. Dolethal, Vetoquinol, France). Using a peristaltic pump, they were transcardially perfused with phosphate-buffered saline followed by 4% paraformaldehyde (Sigma-Aldrich, USA). Brains were collected and post-fixed overnight. 50-μm thick coronal brain sections across the MCA territory were sliced on a vibratome (VT1000S, Leica Microsystems, Germany) and analyzed using the cresyl violet (Electron Microscopy Sciences, USA) staining procedure (see Open Lab Book for procedure). Slices were mounted with DPX mounting medium (Sigma-Aldrich, USA) and scanned using a bright-field microscope

    1. Author Response

      The following is the authors’ response to the original reviews.

      We were pleased to see our work published as a Reviewed Preprint online so swiftly. Now, we would like to take the opportunity to include our responses to the comments made by the reviewers into the Reviewed Preprint and also submit a revised version of the manuscript, in which we have incorporated and addressed the reviewers’ comments.

      We believe that our revisions have significantly improved the quality of the manuscript. Specifically, we have described our results more precisely and explained certain decisions that were made in the analysis pipeline more clearly. For example, Figure 4 was improved substantially, by incorporating a schematic representation of how ERP traces were extracted from neural data. Furthermore, we have added three paragraphs in the Discussion where we elaborate on 1) the two observed interaction effects between attention and drug condition, 2) the relation between behavioral, computational, and neural effects, and 3) the statistical robustness of our findings. As such, we believe our interpretation of the results and their robustness now more faithfully represents our observations.

      Moreover, we have incorporated the Supplementary Information and Figures, initially presented as a separate section of the manuscript, into the main manuscript and its accompanying supplementary figures. Thereby, the structure of the paper now better follows the eLife format. As a result, some of the previously included supplementary figures are now described in the text of the main manuscript.

      Reviewer #1 comments:

      In the results section on page 6, the authors conclude that "Attention and ATX both enhanced the rate of evidence accumulation towards a decision threshold, whereas cholinergic effects were negligible." I believe "negligible" is wrong here: the corresponding effects of donepezil had p-values of .09 (effect of donepezil on drift rate), .07 (effect of donepezil on the cue validity effect on drift rate) and .09 (effect of donepezil on non-decision time), and were all in the same direction as the effects of atomoxetine, and would presumably have been significant with a somewhat larger sample size. I would say the effects of donepezil were "in the same direction but less robust" (or at the very least "less robust") instead of "negligible".

      We agree with the reviewer that ‘negligible’ may not properly capture the effects of DNP on DDM parameter estimates. Although we do feel that caution is warranted in interpreting the effects of DNP on computational parameter estimates, we have now described these effects in line with the reviewer’s suggestion: in the same direction as the effects of ATX, but not (or less) statistically robust.

      "In the results section on page 8, the authors conclude that "Summarizing, we show that drug condition and cue validity both affect the CPP, but they do so by affecting different features of this component (i.e. peak amplitude and slope, respectively)." This conclusion is a bit problematic for two reasons. First, drug condition had a significant effect not only on peak amplitude but also on slope. Second, cue validity had a significant effect not only on slope but also on peak amplitude. It may well be that some effects were more significant than others, but I think this does not warrant the authors' conclusion.

      Indeed, we observed that cue validity affected both CPP peak amplitude and slope and some effects were more significant than others. As such, we agree with the reviewer that the conclusion that cue validity and drug condition affect different features of the CPP was too strongly formulated. We have changed this statement in the manuscript to reflect the observed data pattern more appropriately. We would however like to point out that this does not undermine our main conclusion. Spatial attention and drug condition showed only limited interaction effects in terms of behavior and neural data and their effects on occipital activity were separable in terms of timing and spatial profile. Therefore, our conclusion that catecholamines and spatial attention jointly shape perceptual decision-making remains valid.

      In the discussion section on page 11, the authors conclude that "First, although both attention and catecholaminergic enhancement affected centro-parietal decision signals in the EEG related to evidence accumulation (O'Connell et al., 2012; Twomey et al., 2015), attention mainly affected the build-up rate (slope) whereas ATX increased the amplitude of the CPP component (Figure 3D-F)." As I wrote above, I believe it is not correct that "attention mainly affected the build-up rate or slope", given that the effect of cue-validity on CPP slope was also significant. Also, while the authors' data do support the conclusion that ATX increased the amplitude and not the slope of the CPP component, a previous study in humans found the opposite: ATX increased the slope but did not affect the peak amplitude of the CPP (Loughnane et al 2019, JoCN, https://pubmed.ncbi.nlm.nih.gov/30883291). Although the authors cite this study (as from 2018 instead of 2019), they do not draw attention to this important discrepancy between the two studies. I encourage the authors to dedicate some discussion to these conflicting findings.

      We thank the reviewer for spotting this error: we cited the preprint version (from 2018) of Loughnane and colleagues and not the published JoCN paper (from 2019). We have changed this in the updated version of the manuscript. We further thank the reviewer for asking about this interesting discrepancy between our observation that ATX increased CPP peak amplitude in the absence of slope effects and the observation by Loughnane et al. (2019, JoCN) that ATX increased CPP slope, but not amplitude. We first would like to point out that the peak amplitude effect in Loughnane et al. (2019) was in the same direction as our reported effect, with numerically higher peak amplitudes for ATX compared to PLC (Figure 2A – right panel in Loughnane et al., 2019). However, as their omnibus main effect of drug condition on CPP peak amplitude was not significant, they did not provide statistics for a pairwise comparison of ATX and PLC in terms of CPP peak amplitude, which makes it hard to compare the effects directly. Regardless, Loughnane et al. (2019) did observe an effect on CPP slope, whereas we did not. Speculatively, this difference could be related to the behavioral tasks that were used in both studies. Below we have added a new paragraph from the Discussion in which we elaborate on this more.

      In Discussion, page 15:

      Here, we demonstrated that response accuracy and response speed are differentially represented in the CPP, with correct vs. erroneous responses resulting in a higher slope and peak amplitude, whereas fast vs. slow responses are only associated with increased slopes (Figure 3A-B). Speculatively, the specific effect of any (pharmacological) manipulation on the CPP may depend on task-setting. For example, Loughnane et al. (2019) used a visual task on which participants did not make many errors (hit rate>98%, no false alarms), whereas we applied a task in which participants regularly made errors (roughly 25% of all trials). Possibly, the effects of ATX from Loughnane et al. (2019) in terms of behavior (RT effect, not accuracy/d’) and CPP feature (slope effect, not peak) may therefore have been different from the effects of ATX we observed on behavior (d’ effect, not RT) and CPP feature (peak effect, not slope). Regardless, when we compared subjects with high and low drift rates (Figure 3C), we observed that both CPP slope and CPP peak were increased for the high vs. low drift group (independent of the drug or attentional manipulation). This indicates that both CPP slope and CPP peak were associated with drift rate from the DDM. Clearly, more work is needed to fully understand how evidence accumulation unfolds in neural systems, which could consequently inform future behavioral models of evidence accumulation as well.

      On page 12 and page 14 the authors suggest a selective effect of ATX on tonic catecholamine activity, but to my knowledge the exact effects of ATX on phasic vs. tonic catecholamine activity are unknown. Although microdialysis studies have shown that a single dose of atomoxetine increases catecholamine concentrations in rodents, it is unknown whether this reflects an increase in tonic and/or phasic activity, due to the limited temporal resolution of microanalysis. Thus, atomoxetine may affect tonic and/or phasic catecholamine activity, and which of these two effects dominates is still unknown, I think.

      We agree with the reviewer that the direct effects of ATX on tonic versus phasic catecholaminergic activity are not clear as initially stated in the manuscript. Equally problematic, previous work has demonstrated that changes in tonic neuromodulation shape evoked neuromodulatory discharge (Aston-Jones & Cohen, 2005, Annu. Rev. Neurosci; Knapen et al., 2016, PLoS ONE). As such, any effect of ATX on tonic neuromodulatory drive would probably have affected phasic catecholaminergic responses as well, although this claim will have to be experimentally addressed. We think that because of the close relation between tonic and phasic neuromodulation, it may indeed be better to refrain from the simplistic interpretation that ATX (and DNP) solely and specifically affects tonic neuromodulation. We have used more neutral language in that regard in the updated version of the manuscript, for example by only mentioning elevated neuromodulator levels (not specifying tonic or phasic). Moreover, we have extended a part of our previous Discussion, to elaborate this issue in more detail. An excerpt of this paragraph, consisting of previous and newly added text, can be seen below.

      In Discussion, page 14:

      In contrast with recent work associating catecholaminergic and cholinergic activity with attention by virtue of modulating prestimulus alpha-power shifts (Bauer et al., 2012; Dahl et al., 2020, 2022) and attentional cue-locked gamma-power (Bauer et al., 2012; Howe et al., 2017), the current work shows that the effects of neuromodulator activity are relatively global and non-specific, whereas the effects of spatial attention are more specific to certain locations in space. Our findings are, however, not necessarily at odds with these previous studies. Most recent work associates phasic (event-related) arousal with selective attention (for reviews see: Dahl et al., 2022; Thiele & Bellgrove, 2018). For example, cue detection in visual tasks is known to be related to cholinergic transients occurring after cue onset (Howe et al., 2017; Parikh et al., 2007). Contrarily, in our work we aimed to investigate the effects of increased baseline levels of neuromodulation by suppressing the reuptake of catecholamines and the breakdown of acetylcholine throughout cortex and subcortical structures. Tonic and phasic neuromodulation have previously been shown to differentially modulate behavior and neural activity (de Gee et al., 2014, 2020, 2021; McGinley et al., 2015; McGinley, Vinck, et al., 2015; van Kempen et al., 2019). Note, however, that it is difficult to investigate causal effects of tonic neuromodulation in isolation of changes in phasic neuromodulation, mostly because phasic and tonic activity are thought to be anti-correlated, with lower phasic responses following high baseline activity and vice versa (Aston-Jones & Cohen, 2005; de Gee et al., 2020; Knapen et al., 2016). As such, pharmacologically elevating tonic neuromodulator levels may have resulted in changes in phasic neuromodulatory responses as well. Concurrent and systematic modulations of tonic (e.g. with pharmacology) and phasic (e.g. with accessory stimuli; Bruel et al., 2022; Tona et al., 2016) neuromodulator activity may be necessary to disentangle the respective and interactive effects of tonic and phasic neuromodulator activity on human perceptual decision-making.

      Reviewer #2 comments:

      The main weakness of the paper lies in the strength of evidence provided, and how the results tally with each other. To begin with, there are a lot of significance tests performed here, increasing the chances of false positives. Multiple comparison testing is only performed across time in the EEG results, and not across post-hoc comparisons throughout the paper. In and of itself, it does not invalidate any result per se, but it does colour the interpretation of any results of weak significance, of which there are quite a few. For example, the effect of Drug on d' and subsequent post-hoc comparisons, also effect of ATX on CPP amplitude and others.

      We agree with the reviewer that the statistical evidence for some of the results presented in this study is limited. This issue mostly concerns the effects of the pharmacological manipulation (effects of attention were strong and robust), which is unfortunately often the case given the high inter-individual variability in responses to pharmaceutical agents. We have added a paragraph to the Discussion in which we discuss this limitation of the current study. Furthermore, we discuss our findings in the context of previous work, thereby showing that - although not always robust- most of the reported drug effects were in the direction that could be expected based on previous literature. We have pasted that paragraph below.

      In Discussion, page 16:

      Although the effects of the attentional manipulation were generally strong and robust, the statistical reliability of the effects of the pharmacological manipulation was more modest for some comparisons. This may partly be explained by high inter-individual variability in responses to pharmaceutical agents. For example, initial levels of catecholamines may modulate the effect of catecholaminergic stimulants on task performance, as task performance is supposed to be optimal at intermediate levels of catecholaminergic neuromodulation (Cools & D’Esposito, 2011). While acknowledging this, we would like to highlight that many of the observed effects of ATX were in the expected direction and in line with previous work. First, pharmacologically enhancing catecholaminergic levels has previously been shown to increase perceptual sensitivity (d’) (Gelbard-Sagiv et al., 2018), a finding that we have replicated here. Second, methylphenidate (MPH), a pharmaceutical agent that elevates catecholaminergic levels as well, has been shown to increase drift rate as derived from drift diffusion modeling on visual tasks (Beste et al., 2018) in line with our ATX observations. Third, a previous study using ATX to elevate catecholaminergic levels observed that ATX increased CPP slope (Loughnane et al., 2019). Although in our case ATX increased the CPP peak and not its slope, this provides causal evidence that centro-parietal ERP signals related to sensory evidence accumulation are modulated by the catecholaminergic system (Nieuwenhuis et al., 2005). Fourth, we observed that elevated levels of catecholamines affected stimulus driven occipital activity relatively late in time and close to the behavioral response, which resonates with previous observations (Gelbard-Sagiv et al., 2018). Finally, ATX had robust effects on physiological responses (heart rate, blood pressure, pupil size), cue-locked ERP signals and oscillatory power dynamics in the alpha-band, leading up to stimulus presentation. We concur, however, that more work is needed to firmly establish how (various forms of) attention and catecholaminergic neuromodulation affect perceptual decision-making.

      The lack of an overall RT effect of Drug leaves any DDM result a little underwhelming. How do these results tally? One potential avenue for lack of RT effect in ATX condition is increased drift rate but also increased non-decision time, working against each other. However, it may be difficult to validate these results theoretically.

      As the reviewer remarks, an increase in performance/d’ in the absence of any RT effects can be algorithmically explained by a combination of increased drift rate and prolonged non-decision time. This is indeed what we observed for ATX. Non-decision time is generally thought to reflect the time necessary for stimulus encoding and motor execution and as such is seen as separate from the evidence-accumulation decision process. We deem it possible that ATX simultaneously prolonged stimulus encoding/motor execution (reflected in changes in non-decision time) and accelerated evidence accumulation (reflected in changes in drift rate). Although our neural data did not provide evidence for this claim, previous work has demonstrated that increased baseline (pupil-linked) arousal/neuromodulation is associated with a decreased build-up rate of a neural signal associated with motor execution (β-power over motor cortex, Van Kempen et al., 2019, eLife), potentially linking increased non-decision time under ATX to slowing down of motor execution processes. The same authors also report relationships between baseline (pupil-linked) arousal/neuromodulation and activity over occipital and centroparietal cortices, respectively associated with sensory processing and sensory evidence accumulation, suggesting that baseline neuromodulation may affect all stages leading up to a decision (sensory processing, evidence accumulation and motor execution). Note also that the attentional manipulation seems to simultaneously increase drift rate and shorten non-decision time in our case, as one would expect (Figure 2E, Figure 2 – Supplements 4&5).
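
      The trade-off described above can be illustrated with the standard closed-form predictions of an unbiased two-boundary drift diffusion model. The sketch below is not the model fitted in the study: the function name and all parameter values are hypothetical and were chosen only to show that a higher drift rate combined with a longer non-decision time can raise accuracy while leaving mean RT unchanged.

```python
import math


def ddm_predictions(drift, boundary, non_decision_time, noise=1.0):
    """Closed-form accuracy and mean RT for an unbiased two-boundary DDM.

    The starting point is assumed to lie midway between the two bounds.
    """
    k = drift * boundary / noise ** 2
    accuracy = 1.0 / (1.0 + math.exp(-k))
    mean_decision_time = (boundary / (2.0 * drift)) * math.tanh(k / 2.0)
    return accuracy, mean_decision_time + non_decision_time


# Hypothetical parameter values, for illustration only.
boundary = 1.5
base_t0 = 0.30
placebo = ddm_predictions(drift=1.0, boundary=boundary, non_decision_time=base_t0)

# A higher drift rate shortens the decision time; compute how much extra
# non-decision time would exactly cancel that speed-up.
faster_drift = 1.3
_, rt_without_offset = ddm_predictions(faster_drift, boundary, base_t0)
offset = placebo[1] - rt_without_offset
drug = ddm_predictions(faster_drift, boundary, base_t0 + offset)

print(f"placebo: accuracy={placebo[0]:.3f}, mean RT={placebo[1]:.3f} s")
print(f"drug:    accuracy={drug[0]:.3f}, mean RT={drug[1]:.3f} s")
```

      Under these assumed values the two conditions share the same mean RT by construction, while accuracy is higher in the condition with the larger drift rate.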

      There is an interaction between ATX and Cue in terms of drift rate, this goes against the main thesis of the paper of distinct and non-interacting contributions of neuromodulators and attention. This finding is then ignored. There is also a greater EDAN later for ATX compared to PLA later in the results, which would also indicate interaction of neuromodulators and attention but this is also somewhat ignored.

      There are indeed some interesting interaction effects between ATX and spatial attention (cue), as pointed out by the reviewer. However, we did also observe striking differences in the effects of ATX and attention on stimulus-locked occipital activity (in timing and spatial specificity) as well as independent (main) effects on CPP amplitude and pre-stimulus alpha power. Therefore, throughout the paper we tried to carefully describe the effects of attention and ATX as largely independently and jointly modulating perceptual decision-making, while at the same time highlighting the interaction effects that we observed, where present. We have highlighted the effects the reviewer refers to even more explicitly in a separate paragraph that we added to the discussion, pasted below.

      In Discussion, pages 13-14:

      We did observe two striking interaction effects between the catecholaminergic system and spatial attention. First, effects of attention on drift rate were increased under catecholaminergic enhancement (Figure 2D). Although this interaction effect was not reflected in CPP slope/peak amplitude, this does suggest that catecholamines and spatial attention might together shape sensory evidence accumulation in a non-linear manner. Second, the amplitude of the cue-locked early lateralized ERP component (resembling the EDAN) was increased under ATX as compared to PLC. The underlying neural processes driving the EDAN ERP, as well as its associated functions, have been a topic of debate. Some have argued that the EDAN reflects early attentional orienting (Praamstra & Kourtis, 2010) but others have claimed it is merely a visually evoked response and reflects visual processing of the cue (Velzen & Eimer, 2003). Thus, whether this effect reflects a modulation by ATX of early attentional processes or rather a modulation of early visual responses to sensory input in general is a matter for future experimentation.

      The CPP results are somewhat unclear. Although there is an effect of ATX on drift rate algorithmically, there is no effect of ATX on CPP slope. On the other hand, even though there is no effect of DNP on drift rate, there is an effect of DNP on CPP slope. Perhaps one may say that the effect of DNP on drift rate trended towards significance, but overall the combination of effects here is a little unconvincing. In addition, there is an effect of ATX on CPP amplitude, but how does this tally with behaviour? Would you expect greater CPP amplitude to lead to faster or slower RTs? The authors do recognise this discrepancy in the Discussion, but discount it by saying the relationship between algorithmic and CPP parameters in terms of DDM is unclear, which undermines the reasoning behind the CPP analyses (and especially the one correlating CPP slope with DDM drift rate).

      We thank the reviewer for pointing out this dissociation of drug effects in terms of the algorithmic (DDM) and neural (CPP) ‘implementations’ of the evidence accumulating process underlying perceptual decisions. We have added a new paragraph to the discussion where we interpret the effects of ATX on the neural and algorithmic levels of evidence accumulation. Below we have pasted that paragraph:

      In Discussion, pages 14-15:

      We reported attentional and neuromodulatory effects on algorithmic (DDM, Figure 2) and neural (CPP, Figure 3) markers of sensory evidence accumulation. Recent work has started to investigate the association of these two descriptors of the accumulation process, aiming to uncover whether neural activity over centroparietal regions reflects evidence accumulation, as proposed by computational accumulation-to-threshold models (Kelly & O’Connell, 2015; O’Connell et al., 2018; O’Connell & Kelly, 2021; Twomey et al., 2015). Currently, the CPP is often thought to reflect the decision variable, i.e. the (unsigned) evidence for a decision (Twomey et al., 2015), and consequently its slope should correspond with drift rate, whereas its amplitude at any time should correspond with the so-far accumulated evidence. As -computationally- the decision is reached when evidence crosses a decision bound (the threshold), it may be argued that the peak amplitude of the CPP (roughly) corresponds with the decision boundary. This seems to contradict our observation that 1) ATX modulated drift rate, but not CPP slope and 2) ATX did not modulate boundary separation, but did modulate CPP peak. Note, however, that previous studies using pharmacology or pupil-linked indexes of (catecholaminergic) neuromodulation have also demonstrated effects on both CPP peak (van Kempen et al., 2019) and CPP slope (Loughnane et al., 2019).

      The posterior component effects are problematic. The main issue is the lack of clarification of and justification for the choice of posterior component. The analysis is introduced in the context of the target selection signal the N2pc/N2c, but the component which follows is defined relative to Cue, albeit post-target. Thus this analysis tells us the effect of Cue on early posterior (possibly) visual ERP components, but it is not related to target selection as it is pooled across target/distractor. Even if we ignore this, the results themselves wrt Drug lack context. There is a trending lower amplitude for ATX at later latencies at temporo-parietal electrodes, and more positive for DNP, relative to PLA. Is this what one would expect given behaviour? This is where the issue of correct component identification becomes critical in order to inform any priors on expected ERP results given behaviour.

      We thank the reviewer for raising this issue with the occipital ERP analysis, allowing us to clarify our decisions regarding the analyses and our interpretations of the results. First, the selection of electrodes was based on, and identical to, previous studies investigating lateralized target selection signals in visual tasks containing bilateral visual stimuli (Loughnane et al., 2016; Newman et al., 2017; Papaioannou & Luck, 2020; van Kempen et al., 2019). Second, the ERPs were defined relative to both the direction of the cue as well as the location of the target. As cue direction and target location were not always congruent (cue validity=80%), we could adopt a 2x2 (cue direction x stimulus identity) design for our ERP analyses (we are ignoring drug condition for explanation purposes). For example, for validly cued target trials we extracted two ERP traces: 1) from the hemisphere contralateral to both the cue and the target stimulus (representing processing of cued target stimulus) and 2) from the hemisphere ipsilateral to the cue and the target stimulus (representing processing of non-cued noise stimulus). However, for invalidly cued trials, ERP traces were extracted from 3) the hemisphere contralateral to cue direction and ipsilateral to the target stimulus (reflecting processing of cued noise stimuli) as well as 4) from the hemisphere ipsilateral to cue direction but contralateral to the target stimulus (reflecting processing of non-cued target stimuli). By defining our ERPs as such, we were able to gauge effects of cue direction (reflecting general shifts in attention), stimulus identity (reflecting target vs. noise selection processes) and their interaction (reflecting cue validity) on occipito-temporal activity. Third, we did not pool data (across target/noise stimuli) for statistical analyses, but only for visualization purposes. To clarify how we extracted ERP traces, we have changed Figure 4 substantially. The updated figure now contains a schematic of how these four distinct ERP traces (cue x stimulus identity) were extracted from neural activity. Moreover, for clarity's sake, we now show all 12 ERP traces (3x2x2, drug condition x cue direction x stimulus identity) as well as the three main effects that we observed after performing a 3x2x2 repeated measures (rm)ANOVA over time.
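
      The 2x2 labeling of hemispheres described above can be written out as a small lookup. This is only a schematic sketch of the logic, not the analysis code used in the study; the function name and the string labels are hypothetical.

```python
def label_hemispheres(cue_side, target_side):
    """Return the (attention, stimulus) label for each hemisphere on one trial.

    Each hemisphere is labeled by the contralateral visual hemifield: it is
    'cued' when the cue points to the opposite side, and it processes the
    'target' when the target appears on the opposite side (otherwise 'noise').
    """
    opposite = {"left": "right", "right": "left"}
    labels = {}
    for hemisphere in ("left", "right"):
        hemifield = opposite[hemisphere]
        attention = "cued" if cue_side == hemifield else "non-cued"
        stimulus = "target" if target_side == hemifield else "noise"
        labels[hemisphere] = (attention, stimulus)
    return labels


# Validly cued trial: cue and target on the same side.
valid = label_hemispheres(cue_side="left", target_side="left")
# Invalidly cued trial: cue and target on opposite sides.
invalid = label_hemispheres(cue_side="left", target_side="right")

print(valid)    # right hemisphere: cued target; left hemisphere: non-cued noise
print(invalid)  # right hemisphere: cued noise; left hemisphere: non-cued target
```

      Valid and invalid trials together yield the four traces listed above (cued target, non-cued noise, cued noise, non-cued target).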

      We observed robust (cluster-corrected) effects of cue direction (not validity) on early occipital activity (Fig. 4C – left panel) and of stimulus identity (target/noise) and drug condition on later occipital activity (Fig. 4C – middle and right panel). These results crucially highlight the different temporal (early/late) and spatial (lateralized/not lateralized) profiles of cue, target and drug effects on occipital activity. Moreover, we observed a specific order of drug effects on late occipital activity (DNP>PLC>ATX). The behavioral relevance of this pattern of effects remains elusive. Although the effects of drug condition coincide in time with those of target selection (i.e. when activity contralateral and ipsilateral to the target stimulus was different), the effects of drug were bilateral, meaning that occipito-temporal activity related to the processing of the target (task-relevant) stimulus and non-target (task-irrelevant) stimulus was equally modulated by these pharmaceutical agents. One might argue that these effects show that neither ATX nor DNP modulated the signal-to-noise ratio (SNR), a feature that describes how well relevant stimulus information (signal) can be discerned from irrelevant information (noise). Although it may be tempting to extrapolate this finding to behavior, by suggesting that on the basis of these drug effects neither ATX nor DNP could have modulated d’ (behavioral measure describing how well signal is separated from noise), we would like to point out that our behavioral task specifically concerned a discrimination task about the (orientation of the) target stimulus in which the difference between signal and noise was only relevant for localization purposes and thus has a less direct relation with task performance. As such it is difficult to grasp how the modulation of late occipito-temporal activity by ATX and DNP relates to their behavioral effects. Moreover, the bilateral effect of both ATX and DNP also suggests an absence of interaction effects between drug conditions and visuo-spatial attention, as the effects of ATX/DNP were similar across all cue and target identity conditions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Hariani et al. presents experiments designed to improve our understanding of the connectivity and computational role of Unipolar Brush Cells (UBCs) within the cerebellar cortex, primarily lobes IX and X. The authors develop and cross several genetic lines of mice that express distinct fluorophores in subsets of UBCs, combined with immunocytochemistry that also distinguishes subtypes of UBCs, and they use confocal microscopy and electrophysiology to characterize the electrical and synaptic properties of subsets of so-labelled cells, and their synaptic connectivity within the cerebellar cortex. The authors then generate a computer model to test the possible computational functions of such interconnected UBCs.

      Using these approaches, the authors report that:

      1) GRP-driven TDtomato is expressed exclusively in a subset (20%) of ON-UBCs, defined electrophysiologically (excited by mossy fiber afferent stimulation via activation of UBC AMPA and mGluR1 receptors) and immunocytochemically by their expression of mGluR1.

      2) UBCs ID'd/tagged by mCitrine expression in Brainbow mouse line P079 are expressed in a similar minority subset of OFF-UBCs defined electrophysiologically (inhibited by mossy fiber afferent stimulation via activation of UBC mGluR2 receptors) and immunocytochemically by their expression of Calretinin. However, such mCitrine expression was also detected in some mGluR1 positive UBCs, which may not have shown up electrophysiologically because of the weaker fluorophore expression without antibody amplification.

      This is correctly stated with the exception that the P079 mouse line itself expresses mCitrine. The Brainbow mouse line was used in the connectivity study by crossing it to the GRP-Cre or Calretinin-Cre lines.

      3) Confocal analysis of crossed lines of mice (GRP X P079) stained with antibodies to mGluR1 and calretinin documented the existence of all possible permutations of interconnectivity between cells (ON-ON, ON-OFF, OFF-OFF, OFF-ON), but their overall abundance was low, and neither their absolute nor relative abundance was quantified.

      They were certainly rare to observe using our approaches, but we reasoned that the densities of such connections are not possible to estimate accurately. Please see discussion below.

      4) A computational model (NEURON) indicated that the presence of an intermediary UBC (in a polysynaptic circuit from MF to UBC to UBC) could prolong bursts (MF-ON-ON), prolong pauses (MF-ON-OFF), cause a delayed burst (MF-OFF-OFF), cause a delayed pause (MF-OFF-ON) relative to solely MF to UBC synapses which would simply exhibit long bursts (MF-ON) or long pauses (MF-OFF).

      The authors thus conclude that the pattern of interconnected UBCs provides an extended and more nuanced pattern of firing within the cerebellar cortex that could mediate longer-lasting sensorimotor responses.

      The cerebellum's long-known role in motor skills and reflexes, and associated disorders, combined with our nascent understanding of its role in cognitive, emotional, and appetitive processing, makes understanding its circuitry and processing functions of broad interest to the neuroscience and biomedical community. The focus on UBCs, which are largely restricted to vestibular lobules of the cerebellum reduces the breadth of likely interest somewhat. The overall design of specific experiments is rigorous and the use of fluorophore expressing mouse lines is creative. The data that is presented and the writing are clear. However, the overall experimental design has issues that reduce overall interpretation (please see specific issues for details), which combined with a lack of thorough analysis of the experimental outcomes severely undermines the value of the NEURON model results and the advance in our understanding of cerebellar processing in situ (again, please see specific issues for details).

      Specific issues:

      1) All data gathered with inhibition blocked. All of the UBC response data (Fig. 1) was gathered in the presence of GABAAR and Glycine R blockers. While such an approach is appropriate generally for isolating glutamatergic synaptic currents, and specifically for examining and characterizing monosynaptic responses to single stimuli, it becomes problematic in the context of assaying synaptic and action potential response durations for long-lasting responses, and in particular for trains of stimuli, when feed-forward and feed-back inhibition modulates responses to afferent stimulation. That is, even for single MF stimuli, given the >500ms duration of UBC synaptic currents, there is plenty of time for feedback inhibition from Golgi cells (or feedforward, from MF to Golgi cell excitation) to interrupt AP firing driven by the direct glutamatergic synaptic excitation. This issue is compounded further for all of the experiments examining trains of MF stimuli. Beyond the impact of feedback inhibition on the AP firing of any given UBC, it would also obviously reduce/alter/interrupt that UBC's synaptic drive of downstream UBCs. This issue fundamentally undermines our ability to interpret the simulation data of Vm and AP firing of both the modeled intermediate and downstream UBC, in terms of applying it to possible cerebellar cortical processing in situ.

      The goal of Figure 1 was to determine the cell types of labeled UBCs in transgenic mouse lines, which is determined entirely by their synaptic responses to glutamate (Borges-Merjane and Trussell, 2015). Thus, blocking inhibition was essential to produce clear results in the characterization of GRP and P079 UBCs. While GABAergic/glycinergic feedforward and feedback inhibition is certainly important in the intact circuit, it was not our intention, nor was it possible, to study its contribution in the present study. Leaving inhibition unblocked does not lead to a physiologically realistic stimulation pattern in acute brain slices, because electrical stimulation produces synchronous excitation and inhibition by directly exciting Golgi cells, rather than their synaptic inputs. The main inhibition that UBCs receive that is crucial to determining burst or pause durations is not via GABA/glycine, but instead through mGluR2, which lasts for 100-1000s of milliseconds. The main excitation that drives UBC firing is mGluR1 and AMPA, which both last 100-1000s of milliseconds. Thus, these large conductances are unlikely to be significantly shaped by 1-10 ms IPSCs from feedforward and feedback GABA/glycine inhibition. Recent studies that examined the duration of bursting or pausing in UBCs had inhibition blocked in their experiments, presumably for the reasons outlined above (Guo et al., 2021; Huson et al., 2023).

      In Author response image 1 is an example showing the synaptic currents and firing patterns in an ON UBC before and after blocking inhibition. The GABA/glycinergic inhibition is fast, occurs soon after the stimuli and has little to no effect on the slow inward current that develops after the end of stimulation, which is what drives firing for 100s of milliseconds.

      Author response image 1.

      Example showing small effect of GABAergic and glycinergic inhibition on excitatory currents and burst duration. A) Excitatory postsynaptic currents in response to train of 10 presynaptic stimuli at 50 Hz before (black) and after (grey) blocking GABA and glycine receptors. The slow inward current that occurs at the end of stimulation is little affected. B) Expanded view of the synaptic currents evoked during the train of stimuli. GABA/glycine receptors mediate the fast outward currents that occur immediately after the first couple of stimuli. C) Three examples of the bursts caused by the 50 Hz stimulation in the same cell without blocking GABA and glycine receptors. D) Three examples in the same cell after blocking GABA and glycine receptors.

      2) No consideration for the involvement of polysynaptic UBCs driving UBC responses to MF stimulation in electrophysiology experiments. Given the established existence (in this manuscript and Dino et al. 2000 Neurosci, Dino et al. 2000 ProgBrainRes, Nunzi and Mugnaini 2000 JCompNeurol, Nunzi et al. 2001 JCompNeurol) of polysynaptic connections from MFs to UBCs to UBCs, the MF evoked UBC responses established in this manuscript, especially responses to trains of stimuli could be mediated by direct MF inputs, or to polysynaptic UBC inputs, or possibly both (to my awareness not established either way). Thus the response durations could already include extension of duration by polysynaptic inputs, and so would overestimate the duration of monosynaptic inputs, and thus polysynaptic amplification/modulation, observed in the NEURON model.

      We are confident that the synaptic responses shown are monosynaptic for several reasons. UBCs receive a single mossy fiber input on their dendritic brush, and thus if our stimulation produces a reliable, short-latency response consistent with a monosynaptic input, then there is not likely to be a disynaptic input, because the main input is accounted for by the monosynaptic response. In all cells included in our data set, the fast AMPA receptor-mediated currents always occurred with short latency (1.24 ± 0.29 ms; mean ± SD; n = 13), high reliability (no failures to produce an EPSC in any of the 13 GRP UBCs in this data set), and low jitter (SD of latency; 0.074 ± 0.046 ms; mean ± SD; n = 13). These measurements have been added to the results section. In some rare cases, we did observe disynaptic currents, which were easily distinguishable because a single electrical stimulation produced a burst of EPSCs at variable latencies. Please see example in Author response image 2. These cases of disynaptic input, which have been reported by others (Diño et al., 2000; Nunzi and Mugnaini, 2000; van Dorp and De Zeeuw, 2015) support the conclusion that UBCs receive input from other UBCs.
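
      The three per-cell measures quoted above (latency, reliability, and jitter defined as the SD of latency) can be computed as in the following sketch. It is illustrative only: the trial values are made up, the function name is hypothetical, and the use of the sample SD (rather than the population SD) is an assumption, since the response does not state which was used.

```python
import statistics


def summarize_epsc_latencies(latencies_ms):
    """Summarize first-EPSC latencies for one cell.

    `latencies_ms` holds one value per stimulus trial; None marks a failure
    (no EPSC detected on that trial).
    """
    detected = [t for t in latencies_ms if t is not None]
    failure_rate = 1.0 - len(detected) / len(latencies_ms)
    mean_latency = statistics.mean(detected)
    jitter = statistics.stdev(detected)  # sample SD of latency across trials
    return mean_latency, jitter, failure_rate


# Made-up latencies (ms) for a single hypothetical cell, no failures.
trials = [1.20, 1.25, 1.22, 1.30, 1.18]
mean_latency, jitter, failure_rate = summarize_epsc_latencies(trials)
print(f"latency={mean_latency:.2f} ms, jitter={jitter:.3f} ms, failures={failure_rate:.0%}")
```

      A short mean latency, low jitter and no failures is the pattern described above as consistent with a monosynaptic input; a burst of EPSCs at variable latencies would instead show up as high jitter.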

      Author response image 2.

      Example of GRP UBC with disynaptic input. Three examples of the effect of a single presynaptic stimulus (triangle) in a GRP UBC with presumed disynaptic input. Note the variable latency of the first evoked EPSC, bursts of EPSCs, and spontaneous EPSCs.

      3) Lack of quantification of subtypes of UBC interconnectivity. Given that it is already established that UBCs synapse onto other UBCs (see refs above), the main potential advance of this manuscript in terms of connectivity is the establishment and quantification of ON-ON, ON-OFF, OFF-ON, and OFF-OFF subtypes of UBC interconnections. But, the authors only establish that each type exists, showing specific examples, but no quantification of the absolute or relative density was provided, and the authors' unquantified wording explicitly or implicitly states that they are not common. This lack of quantification and likely small number makes it difficult to know how important or what impact such synapses have on cerebellar processing, in the model and in situ.

      As noted by the reviewer, the connections between UBCs were rare to observe. We decided against attempting to quantify the absolute or relative density of connections for several reasons. A major reason for the rare observations of anatomical connections between UBCs is likely the sparse labeling. First, the GRP mouse line only labels 20% of ON UBCs and we are unable to test whether postsynaptic connectivity of GRP ON UBCs is the same as that of the rest of the population of ON UBCs that are not labeled in the GRP mouse line. Second, the Brainbow reporter mouse only labels a small population of Cre expressing cells for unknown reasons. Third, the Brainbow reporter expression was so low that antibody amplification was necessary, which then limited the labeled cells to those close to the surface of the brain slices, because of known antibody penetration difficulties. Therefore, we refrained from estimating the density of these connections, because each of these variables reduced the labeling to unknown degrees and we reasoned that extrapolating our rare observations to the total population would be inaccurate.

      A paper that investigated UBC connectivity using organotypic slice cultures from P8 mice suggests that 2/3 of the UBC population receives UBC input, based on the observation that 2/3 of the mossy fibers did not degenerate as would be expected after 2 days in vitro if they were severed from a distant cell body (Nunzi and Mugnaini, 2000). It remains to be seen if this high proportion is due to the young age of these mice or is also the case in adult mice. Even if these connections are indeed rare, they are expected to have profound effects on the circuit, as each UBC has multiple mossy fiber terminals (Berthie and Axelrad, 1994), and mossy fiber terminals are estimated to contact 40 granule cells each (Jakab and Hamori, 1988). We have added a comment regarding this point to the discussion.

      4) Lack of critical parameters in NEURON model.

      A) The model uses # of molecules of glutamate released as the presumed quantal content, and this factor is constant. However, no consideration of changes in # of vesicles released from single versus trains of APs from MFs or UBCs is included. At most simple synapses, two sequential APs alter release probability, either up or down, and release probability changes dynamically with trains of APs. It is therefore reasonable to imagine UBC axon release probability is at least as complicated, and given the large surface area of contact between two UBCs, the number of vesicles released for any given AP is also likely more complex.

      B) the model does not include desensitization of AMPA receptors, which in the case of UBCs can paradoxically reduce response magnitude as vesicle release and consequent glutamate concentration in the cleft increases (Linney et al. 1997 JNeurophysiol, Lu et al. 2017 Neuron, Balmer et al. 2021 eLIFE), as would occur with trains of stimuli at MF to ON-UBCs.

      A) The model produces synaptic AMPA and mGluR2 currents that reproduce those we recorded in vitro. We did not find it necessary to implement changes in glutamate release during a train as the model was fit to UBC data with the assumption that the glutamate transient did not change during the train. If there is a change in neurotransmitter release during a train, it is therefore built into the model, which has the advantage of reducing its complexity. UBCs are a special case where the postsynaptic currents are mediated mostly by the total amount of transmitter released. Most of the evoked current occurs tens to hundreds of milliseconds after neurotransmitter release and is therefore much more sensitive to total release and less sensitive to how it is released during the train. Author response image 3 shows the effect of reducing the amount of glutamate released by 10% on each stimulus in the model. Despite a significant change in the pattern of neurotransmitter release, as well as a reduction in the total amount of glutamate, the slow EPSC still decays over the course of hundreds of milliseconds.

      Author response image 3.

      Effect of short-term depression of neurotransmitter release. A) The top trace shows the glutamate transient that drives the AMPA receptor model used in our study. No change in release is implemented, although the slow tail of each transient summates during the train. The bottom trace shows the modeled AMPA receptor mediated current. B) In this model the amount of glutamate released is reduced by 10% on each stimulus. The duration of the slow AMPA current that develops at the end of stimulation is similar, despite a profound change in the pattern of neurotransmitter exposure.
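
      To give a sense of the size of the manipulation in panel B, the sketch below computes the relative amount of glutamate released on each stimulus under a 10% reduction per stimulus. Two assumptions are made that the response does not specify: the reduction is taken to be multiplicative (each release is 90% of the previous one), and the train is taken to have 10 stimuli, as in the 50 Hz trains described earlier. The function name is hypothetical, and this is not the NEURON model itself.

```python
def release_per_stimulus(n_stimuli, reduction=0.10, initial=1.0):
    """Relative glutamate released on each stimulus of a train.

    Assumes multiplicative depression: each release is smaller than the
    previous one by the fraction `reduction`.
    """
    return [initial * (1.0 - reduction) ** i for i in range(n_stimuli)]


constant = release_per_stimulus(10, reduction=0.0)     # no change in release
depressing = release_per_stimulus(10, reduction=0.10)  # 10% less on each stimulus

total_ratio = sum(depressing) / sum(constant)
print(f"last release relative to first: {depressing[-1]:.3f}")
print(f"total release relative to constant train: {total_ratio:.3f}")
```

      Under these assumptions the final stimulus releases roughly 39% of the initial amount and the train as a whole releases roughly 65% of the constant-release total.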

      B) The detailed kinetic AMPA receptor model used here accurately reproduces desensitization, and in fact recovery from desensitization is what mediates the slow ON UBC current. This AMPA receptor is a 13-state model, including 4 open states with 1-4 glutamates bound, 4 closed states with 1-4 glutamates bound, 4 desensitized states with 1-4 glutamates bound, and 5 closed states with 0-4 glutamates bound. The forward and reverse rates between different states in the model were fit to AMPA receptor currents recorded from dissociated UBCs and they accurately reproduced the ON UBC currents evoked by synaptic stimulation in our previous work (Balmer et al., 2021).

      5) Lack of quantification of various electrophysiological responses. UBCs are defined (ON or OFF) based on inward or outward synaptic response, but no information is provided about the range of the key parameter of duration across cells, which seems most critical to the current considerations. There is a similar lack of quantification across cells of AP duration in response to stimulation or current injections, or during baseline. The latter lack is particularly problematic because, in agreement with previous publications, the raw data in Fig. 1 shows ON UBCs as quiescent until MF stimulation and OFF UBCs firing spontaneously until MF stimulation, but, for example, at least one ON UBC in the NEURON model is firing spontaneously until synaptically activated by an OFF UBC (Fig. 11A), and an OFF UBC is silent until stimulated by a presynaptic OFF UBC (Fig. 11C). This may be expected/explainable theoretically, but then such cells should be observed in the raw data.

      To address this reasonable concern of a general lack of quantification of electrophysiological responses we have added data characterizing the slow inward and outward currents evoked by synaptic stimulation in GRP and P079 UBCs in the results section and in new panels in Figure 1. We report the action potential pause lengths in P079 UBCs and burst lengths in ON UBCs in the results section. However, we favor the duration of the currents over the length of burst and pause, because the currents do not depend on a stable resting membrane potential, which is itself difficult to determine in intracellular recordings of these small cells. We have added peak times and decay time constants of the slow inward and outward currents in ON and OFF UBCs in the results section and have added new panels to Figure 1.

      In a series of recent publications that focused on UBC firing, the authors argue that cell-attached recordings are necessary to determine accurately the burst and pause lengths, as well as spontaneous firing rates (Guo et al., 2021; Huson et al., 2023). (The trade-off of these extracellular recordings is that the monosynaptic nature of the input is nearly impossible to confirm.) Spontaneous firing rates were variable within both GRP and P079 UBCs, from silent to firing regularly or in bursts, as previously reported for UBCs (Kim et al., 2012; van Dorp and De Zeeuw, 2015). For clarity, we chose to model the GRP UBCs as silent unless receiving synaptic input and P079 UBCs as active unless receiving synaptic input. As the reviewer suggests, we have observed UBCs firing in patterns similar to those shown in the model UBCs that have input from a spontaneously active presynaptic UBC. In Author response image 4 are some examples.

      Author response image 4.

      Examples of UBCs that receive spontaneous input. A) Three ON UBCs that had spontaneous EPSCs, suggesting the presence of an active presynaptic UBC. B) Two OFF UBCs that had spontaneous outward currents.

      Reviewer #2 (Public Review):

      In this paper, the authors presented a compelling rationale for investigating the role of UBCs in prolonging and diversifying signals. Based on the two types of UBCs known as ON and OFF UBC subtypes, they have highlighted the existing gaps in understanding UBCs connectivity and the need to investigate whether UBCs target UBCs of the same subtype, different subtypes, or both. The importance of this knowledge is for understanding how sensory signals are extended and diversified in the granule cell layer.

      The authors designed very interesting approaches to study UBC connectivity by utilizing transgenic mice expressing GFP and RFP in UBCs, a Brainbow approach, immunohistochemical and electrophysiological analysis, and computational models to understand how the feed-forward circuits of interconnected UBCs transform their inputs.

      This study provided evidence for the existence of distinct ON and OFF UBC subtypes based on their electrophysiological properties, anatomical characteristics, and expression patterns of mGluR1 and calretinin in the cerebellum. The findings support the classification of GRP UBCs as ON UBCs and P079 UBCs as OFF UBCs and suggest the presence of synaptic connections between the ON and OFF UBC subtypes. In addition, they found that GRP and P079 UBCs form parallel and convergent pathways and have different membrane capacitance and excitability. Furthermore, they showed that UBCs of the same subtype provide input to one another and modify the input to granule cells, which could provide a circuit mechanism to diversify and extend the pattern of spiking produced by mossy fiber input. Accordingly, they suggested that these transformations could provide a circuit mechanism for maintaining a sensory representation of movement for seconds.

      Overall, the article is well written, in a sound and detailed format, and very interesting, with an excellent discovery and a suggested model. However, I have some comments and suggestions that may help to improve this manuscript:

      • The discovery of UBCs innervating each other and their own subtypes, suggesting the presence of feed-forward networks in the cerebellum, is an incredibly fascinating and exciting finding followed by an intriguing model by authors. However, it is worth considering an alternative model as well. I acknowledge that visualizing such interactions using current tools and methods can be challenging ("The approaches used here were not able to determine the existence of networks of more than 2 UBCs connected one after the other. If present, 3 or more UBCs in series could extend and transform the input in even more dramatic ways. The temporal diversity that UBC circuits generate may underlie the flexibility of the cerebellum to coordinate movements over a broad range of behaviors."). Therefore, if this is the case in which more than 2 UBCs connected one after the other, then an alternative model PERHAPS resembles the basal nuclei, with its direct and indirect circuits, can be considered (maybe a type of circular model). The basal nuclei circuits are also regulated by modulators such as D1 dopamine receptors in the direct pathway, causing depolarization, and D2 dopamine receptors in the indirect pathway, resulting in hyperpolarization upon dopamine activation. This approach could involve using computational models to gain insight into potential alternatives within this pathway (may be a future direction).

      Thank you for this suggestion to consider the potentially similar circuit interactions in the basal nuclei. We will certainly investigate this further as we move forward with modeling the feed-forward networks in the cerebellum.

      • GRP UBCs are more densely distributed in lobes VI-IX, while P079 UBCs are more densely distributed in the dorsal leaflet of lobe X in sagittal sections. While the cerebellum is well known for its characteristic stripy pattern, are UBC distributions the same in coronal/transverse section?

      UBCs of different types, based on their expression of specific proteins, have overlapping but somewhat distinct distributions in coronal sections. The densities of calretinin-expressing UBCs are higher within Zebrin II positive zones and form sagittal stripes, whereas the densities of mGluR1-expressing and PLCb4-expressing UBCs vary less but are at their highest at the midline (Chung et al., 2009; Sekerkova et al., 2014). The difference noted by the reviewer between the dorsal and ventral leaflets of lobe X is the most distinct that we know of in the GRP and P079 populations.

      • The extension of the axons from both subtypes of UBCs show they are long enough to pass several UBCs and even projections are directed toward the white matter (e.g. Fig 9A), suggesting targeting the UBCs or granule cells in other lobules. Is it suggesting UBCs connectivity between different lobules (perhaps longitudinal connectivity)? Is there any observation or information in coronal/transverse section to visualize mediolateral connectivity?

      This is certainly worth exploring in future work. UBCs have been reported to project their axons into and across the white matter (Diño et al., 2000). To our knowledge, whether UBCs project their axons out of one lobule and into another has not been examined.

      • The limitation in identifying networks involving more than two sequentially connected UBCs was briefly noted. I suggest including a paragraph describing limitations and discussing the implications of the findings would enhance the overall impact of the research and broaden our understanding of cerebellar function.

      • It is a pity that there is no clear conclusion to the discussion of this very interesting study. I suggest providing the key points as a conclusion.

      Thank you for these suggestions. Limitations and implications are included throughout the discussion section and we feel that the summary figure and significance statement now sufficiently convey the key conclusions of the study.

      • Please make the correction in Figure 2A by relabeling it as IXa, IXb, and IXc to correct the typographical error.

      Fixed

      • I recommend rotating Figure 7A to align its orientation with the other figures for consistency.

      Fixed

      Reviewer #1 (Recommendations For The Authors):

      Minor comments that should be addressed for clarity:

      1) In the NEURON model, why was the reversal potential for the leak conductance and Gmax for Ih different for the two types of UBCs. Relatedly, why is Erev for GABAB -95mV if Ek is -90mV?

      The h-current (Ih) was estimated from a hyperpolarizing current step in both cell types, and these data have been added to the results section and as a panel in Figure 1. The conductance of Ih in the model cells was adjusted accordingly, with OFF UBCs having ~3 times that of ON UBCs, so that the model approximated the measured voltage sag, as we now describe in the methods section. The reversal potential of the model mGluR2 current (which is based on a model of GABAB) has been fixed.

      2) Line 69 justification for their dual genetic approach is a bit too strong: "Paired recordings not possible". It may be difficult, but it is certainly possible.

      Reworded

      3) Confusing wording, only one stat for two parameters? Line 93: These currents were produced by both mGluR1 and AMPA receptors, as they were blocked by their antagonists JNJ16259685 and GYKI53655, respectively (92.86% ± 3.25; paired t-test; P=0.0066; n = 9; mean ± SEM) (Fig 1D-E).

      Reworded

      References

      Balmer TS, Borges-Merjane C, Trussell LO (2021) Incomplete removal of extracellular glutamate controls synaptic transmission and integration at a cerebellar synapse. eLife 10:e63819.

      Berthie B, Axelrad H (1994) Granular layer collaterals of the unipolar brush cell axon display rosette-like excrescences. A Golgi study in the rat cerebellar cortex. Neuroscience Letters 167:161–165.

      Borges-Merjane C, Trussell LO (2015) ON and OFF unipolar brush cells transform multisensory inputs to the auditory system. Neuron 85:1029–1042.

      Chung SH, Sillitoe RV, Croci L, Badaloni A, Consalez G, Hawkes R (2009) Purkinje cell phenotype restricts the distribution of unipolar brush cells. Neuroscience 164:1496–1508.

      Diño MR, Schuerger RJ, Liu Y-B, Slater NT, Mugnaini E (2000) Unipolar brush cell: a potential feedforward excitatory interneuron of the cerebellum. Neuroscience 98:625–636.

      Guo C, Huson V, Macosko EZ, Regehr WG (2021) Graded heterogeneity of metabotropic signaling underlies a continuum of cell-intrinsic temporal responses in unipolar brush cells. Nat Commun 12:5491.

      Huson V, Newman LN, Regehr WG (2023) A continuum of response properties across the population of Unipolar Brush Cells in the Dorsal Cochlear Nucleus. J Neurosci Available at: https://www.jneurosci.org/content/early/2023/07/26/JNEUROSCI.0873-23.2023 [Accessed August 15, 2023].

      Jakab RL, Hamori J (1988) Quantitative morphology and synaptology of cerebellar glomeruli in the rat. Anatomy and embryology 179:81–88.

      Kim JA, Sekerkova G, Mugnaini E, Martina M (2012) Electrophysiological, morphological, and topological properties of two histochemically distinct subpopulations of cerebellar unipolar brush cells. Cerebellum 11:1012–1025.

      Nunzi M-G, Mugnaini E (2000) Unipolar brush cell axons form a large system of intrinsic mossy fibers in the postnatal vestibulocerebellum. Journal of Comparative Neurology 422:55–65.

      Sekerkova G, Watanabe M, Martina M, Mugnaini E (2014) Differential distribution of phospholipase C beta isoforms and diaglycerol kinase-beta in rodents cerebella corroborates the division of unipolar brush cells into two major subtypes. Brain structure & function 219:719–749.

      van Dorp S, De Zeeuw CI (2015) Forward signaling by unipolar brush cells in the mouse cerebellum. Cerebellum 14:528–533.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Cook, Watt, and colleagues previously reported that a mouse model of Spinocerebellar ataxia type 6 (SCA6) displayed defects in BDNF and TrkB levels at an early disease stage. Moreover, they have shown that one month of exercise elevated cerebellar BDNF expression and improved ataxia and cerebellar Purkinje cell firing rate deficits. In the current work, they attempt to define the mechanism underlying the pathophysiological changes occurring in SCA6. For this, they carried out RNA sequencing of cerebellar vermis tissue in 12-month-old SCA6 mice, a time when the disease is already at an advanced stage, and identified widespread dysregulation of many genes involved in the endo-lysosomal system. Focusing on BDNF/TrkB expression, localization, and signaling they found that, in 7-8 month-old SCA6 mice early endosomes are enlarged and accumulate BDNF and TrkB in Purkinje cells. Curiously, TrkB appears to be reduced in the recycling endosomes compartment, despite the fact that recycling endosomes are morphologically normal in SCA6. In addition, the authors describe a reduction in the Late endosomes in SCA6 Purkinje cells associated with reduced BDNF levels and a probable deficit in late endosome maturation.

      We would like to thank the reviewers for their careful reading of the paper; their feedback has helped us to add information and experiments that enhance the clarity of the findings.

      Strengths:

      The article is well written, and the findings are relevant for the neuropathology of different neurodegenerative diseases where dysfunction of early endosomes is observed. The authors have provided a detailed analysis of the endo-lysosomal system in SCA6 mice. They have shown that TrkB recycling to the cell membrane in recycling endosomes is reduced, and the late endosome transport of BDNF for degradation is impaired. The findings will be crucial in understanding underlying pathology. Lastly, the deficits in early endosomes are rescued by chronic administration of 7,8-DHF.

      We thank the reviewers for their positive feedback on this work.

      Weaknesses:

      The specificity of BDNF and TrkB immunostaining requires additional controls, as it has been very difficult to detect immunostaining of BDNF. In addition, in many of the figures, the background or outside of Purkinje cell boundaries also exhibits a positive signal.

      We agree with the reviewers that the performance of the BDNF and TrkB antibodies is an important concern. We have ourselves had difficulties with the performance of many antibodies and the images in this paper are the result of many years of optimization. We have therefore added further detail about the antibody optimization to the methods section of this paper, and have carried out new staining experiments with additional controls. We have added 2 new figure panels in supplementary figures 3 and 4 to demonstrate these tests.

      In the case of anti-BDNF antibodies, we have tested several antibodies and staining protocols and found that, in our hands, the only antibody that reliably stained BDNF with a good signal-to-noise ratio was the one used in this paper (abcam ab108319). Even for this antibody, the staining was greatly enhanced by the use of a heat-induced epitope retrieval (HIER) step, which allowed the visualization of BDNF within intracellular structures such as endosomes. When we quantified the intensity of this staining in our previous paper, the results were in agreement with those from a BDNF ELISA used to measure levels of BDNF in the cerebellar vermis of WT and SCA6 mice (Cook et al., 2022), which corroborates these results. As the staining was carried out in tissue sections and not dissociated cells, we also see positive signal from the BDNF staining outside of the Purkinje cells, since BDNF acts on cell-surface receptors and is thus released into the extracellular space around cells (Kuczewski et al., 2008) and is detectable in the extracellular matrix (Lam et al., 2019) and presynaptic terminals around neurons (Camuso et al., 2022; Choo et al., 2017). This is in contrast to studies that image BDNF mRNA with in-situ hybridization, which labels BDNF mRNA predominantly found in cells, and cannot tell us about sub-cellular or extracellular localization of BDNF protein. Together, these factors explain why we observe staining that is not cell-limited, but extends into the space around the cells of interest.

      We have added an additional supplemental figure to demonstrate the importance of using HIER when staining slices with anti-BDNF (Supplementary figure 3). We tested HIER protocols that involved heating the slices to 95°C in a variety of buffers. The buffers tested were sodium citrate buffer (10 mM sodium citrate, 0.05% Tween 20, pH 6), Tris buffer (10 mM TBS, 0.05% Tween 20, pH 10), EDTA buffer (1 mM EDTA, 0.05% Tween 20, pH 8) and neutral PBS. The PBS produced the best result, enhancing the staining of both anti-BDNF and anti-EEA1 antibodies (Supplementary figure 3). Therefore, all slices stained using those antibodies were heated to 95°C in PBS using a heat block or thermocycler for 10 minutes, then allowed to cool before staining proceeded.

      The antibody we use (abcam ab108319) has been used in hundreds of other publications, including Javed et al., 2021 who ectopically expressed BDNF and noted colocalization between the antibody staining and the GFP tag of the BDNF construct, and Lejkowska et al., 2019 who overexpressed BDNF and saw a dramatic increase in antibody staining as well. The colocalization between ectopically expressed BDNF and the antibody in these studies demonstrates the specificity of the antibody.

      However, to further validate antibody specificity we used liver tissue as a negative control. In liver tissue from rodents and humans, the majority of the liver contains negligible levels of BDNF (Koppel et al., 2009; Vivacqua et al., 2014), see also the Human Protein Atlas. The exception is some cholangiocytes: epithelial cells that express BDNF at high levels (Vivacqua et al., 2014). We obtained liver tissue from a WT mouse that was undergoing surgery for an unrelated project and fixed and processed the tissue as we did for brain tissue (outlined in methods section). As we would expect, most of the cells in the liver showed BDNF immunoreactivity that was comparable to background levels (Supplementary figure 3). Interestingly, we were also able to detect sparse highly BDNF-positive cells in the liver, presumed cholangiocytes (Supp. Fig. 3). This pattern of liver BDNF expression is as predicted in the literature, and thus acts as a control for our antibody. We therefore believe that in our hands this antibody is able to stain BDNF with an appropriate degree of specificity.

      We also carried out staining experiments using a second anti-TrkB antibody that we had previously used to detect TrkB via Western blotting. We carried out immunohistochemistry as previously described using tissue sections from a WT mouse. The staining with the two different antibodies was carried out at the same time and all other reagents were kept constant. We found that both antibodies labelled TrkB in a similar pattern of localization, including in the early endosomes of the Purkinje cells (Supplementary figure 4). The second antibody, however, did have a lower signal-to-noise ratio, and so we believe that the original anti-TrkB antibody used in this manuscript (EMD Millipore ab9872) is optimal for staining cerebellar tissue sections in our hands.

      One important concern about the conclusions is that the RNAseq experiment was conducted in 12-month-old SCA6 mice, suggesting that the defects in the endo-lysosomal system may be caused by other pathophysiological events and, likewise, the impairment in BDNF signaling may also be indirect, as also noted by the authors. Indeed, Purkinje cells in SCA6 mice have an impaired ability to degrade other endocytosed cargo beyond BDNF and TrkB, most likely because of trafficking deficits that result in a disruption in the transport of cargo to the lysosomes and lysosomal dysfunction.

      We agree with the reviewers that the defects in the endo-lysosomal system may be caused by other events occurring in the course of disease progression. As mentioned by the reviewers, we have noted this possibility in the text. Detailed investigation into the sequence of events and the root causes of signaling disruption in SCA6 merits future study and we aim to address this in future work. We have expanded this explanation in the text.

      Moreover, the beneficial effects of 7,8-DHF treatment on motor coordination may be caused by 7,8-DHF properties other than the putative agonist role on TrkB. Indeed, many reservations have been raised about using 7,8-DHF as an agonist of TrkB activity. Several studies have now debunked (Todd et al. PlosONE 2014, PMID: 24503862; Boltaev et al. Sci Signal 2017, PMID: 28831019) or at the very least questioned (Lowe D, Science 2017: see Discussion: https://www.science.org/content/blog-post/those-compounds-aren-t-what-you-think-they-are; Wang et al. Cell 2022 PMID: 34963057). Another interpretation is that 7,8-DHF possesses antioxidant activity and neuroprotection against cytotoxicity in HT-22 and PC12 cells, both of which do not express TrkB (Chen et al. Neurosci Lett 201, PMID: 21651962; Han et al. Neurochem Int. 2014, PMID: 24220540). Thus, while this flavonoid may have a beneficial effect on the pathophysiology of SCA6, it is most unlikely that mechanistically this occurs through a TrkB agonistic effect considering the potent anti-oxidant and anti-inflammatory roles of flavonoids in neurodegenerative diseases (Jones et al. Trends Pharmacol Sci 2012, PMID: 22980637).

      We thank the reviewers for raising this important point. We have noted in our previous paper (Cook et al., 2022) that 7,8-DHF may not be acting as a TrkB agonist in SCA6 mice, and are in agreement that other explanations are possible. We have now added information to the text of this paper to highlight this possibility. We did show in our previous paper that 7,8-DHF administration activates Akt signaling in the cerebellum of SCA6 mice, a signaling event that is known to take place downstream of TrkB activation. Additionally, 7,8-DHF treatment led to the increase of TrkB levels in the cerebellum of SCA6 mice (Cook et al., 2022), implicating TrkB in the mechanism of action, even if mechanistically, this is not via direct TrkB activation alone. However, even if the mechanism is currently incompletely explained, we believe that 7,8-DHF remains a valuable treatment strategy for SCA6. We have tried to rewrite the Discussion to highlight what we think is the most important takeaway: that 7,8-DHF can rescue endosomal and other deficits in SCA6, even if we do not currently know the full mechanism of action. We have therefore amended the text to add more detail about other potential explanations for the mechanism of action of 7,8-DHF.

      References

      Camuso S, La Rosa P, Fiorenza MT, Canterini S. 2022. Pleiotropic effects of BDNF on the cerebellum and hippocampus: Implications for neurodevelopmental disorders. Neurobiol Dis. doi:10.1016/j.nbd.2021.105606

      Choo M, Miyazaki T, Yamazaki M, Kawamura M, Nakazawa T, Zhang J, Tanimura A, Uesaka N, Watanabe M, Sakimura K, Kano M. 2017. Retrograde BDNF to TrkB signaling promotes synapse elimination in the developing cerebellum. Nat Commun 8:195. doi:10.1038/s41467-017-00260-w

      Cook AA, Jayabal S, Sheng J, Fields E, Leung TCS, Quilez S, McNicholas E, Lau L, Huang S, Watt AJ. 2022. Activation of TrkB-Akt signaling rescues deficits in a mouse model of SCA6. Sci Adv 8:3260. doi:10.1126/sciadv.abh3260

      Javed S, Lee YJ, Xu J, Huang WH. 2021. Temporal dissection of Rai1 function reveals brain-derived neurotrophic factor as a potential therapeutic target for Smith-Magenis syndrome. Hum Mol Genet 31:275–288. doi:10.1093/HMG/DDAB245

      Koppel I, Aid-Pavlidis T, Jaanson K, Sepp M, Pruunsild P, Palm K, Timmusk T. 2009. Tissue-specific and neural activity-regulated expression of human BDNF gene in BAC transgenic mice. BMC Neurosci 10:68. doi:10.1186/1471-2202-10-68

      Kuczewski N, Porcher C, Ferrand N, Fiorentino H, Pellegrino C, Kolarow R, Lessmann V, Medina I, Gaiarsa JL. 2008. Backpropagating action potentials trigger dendritic release of BDNF during spontaneous network activity. J Neurosci 28:7013–7023. doi:10.1523/JNEUROSCI.1673-08.2008

      Lam D, Enright HA, Cadena J, Peters SKG, Sales AP, Osburn JJ, Soscia DA, Kulp KS, Wheeler EK, Fischer NO. 2019. Tissue-specific extracellular matrix accelerates the formation of neural networks and communities in a neuron-glia co-culture on a multi-electrode array. Sci Rep 9. doi:10.1038/s41598-019-40128-1

      Lejkowska R, Kawa MP, Pius-Sadowska E, Rogińska D, Łuczkowska K, Machaliński B, Machalińska A. 2019. Preclinical Evaluation of Long-Term Neuroprotective Effects of BDNF-Engineered Mesenchymal Stromal Cells as Intravitreal Therapy for Chronic Retinal Degeneration in Rd6 Mutant Mice. Int J Mol Sci 20:777. doi:10.3390/IJMS20030777

      Vivacqua G, Renzi A, Carpino G, Franchitto A, Gaudio E. 2014. Expression of brain derivated neurotrophic factor and of its receptors: TrKB and p75NT in normal and bile duct ligated rat liver. Ital J Anat Embryol 119:111–129. doi:10.13128/IJAE-15138

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their thoughtful and careful evaluation of our manuscript. We appreciate your time and effort and have incorporated many of these suggestions to improve our revised manuscript.

      Reviewer #1 (Public Review):

      Summary: Cullinan et al. explore the hypothesis that the cytoplasmic N- and C-termini of ASIC1a, not resolved in x-ray or cryo-EM structures, form a dynamic complex that breaks apart at low pH, exposing a C-terminal binding site for RIPK1, a regulator of necrotic cell death. They expressed channels tagged at their N- and C-termini with the fluorescent, non-canonical amino acid ANAP in CHO cells using amber stop-codon suppression. Interaction between the termini was assessed by FRET between ANAP and colored transition metal ions bound either to a cysteine reactive chelator attached to the channel (TETAC) or metal-chelating lipids (C18-NTA). A key advantage to using metal ions is that they are very poor FRET acceptors, i.e. they must be very close to the donor for FRET to occur. This is ideal for measuring small distances/changes in distance on the scales expected from the initial hypothesis. In order to apply chelated metal ions, CHO cells were mechanically unroofed, providing access to the inner leaflet of the plasma membrane. At high pH, the N- and C-termini are close enough for FRET to be measured, but apparently too far apart to be explained by a direct binding interaction. At low pH, there was an apparent increase in FRET between the termini. FRET between ANAP on the N- and C-termini and metal ions bound to the plasma membrane suggests that both termini move away from the plasma membrane at low pH. The authors propose an alternative hypothesis whereby close association with the plasma membrane precludes RIPK1 binding to the C-terminus of ASIC1a.
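
      The point about weak acceptors can be illustrated with the standard Förster relation, E = 1 / (1 + (r / R0)^6). The sketch below is for illustration only: the two R0 values (12 Å for a metal-ion acceptor and 50 Å for a fluorescent-protein pair) are assumed, order-of-magnitude numbers and are not taken from the manuscript.

```python
def fret_efficiency(r_angstrom, r0_angstrom):
    """Förster relation: E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (r_angstrom / r0_angstrom) ** 6)


# Assumed, illustrative Förster distances (not values from the manuscript).
R0_METAL = 12.0  # short R0, typical of a weak acceptor such as a metal ion
R0_FP = 50.0     # long R0, typical of a fluorescent-protein pair

# Compare how the two pairs report distances in the 10-30 Å range.
for r in (10.0, 15.0, 20.0, 30.0):
    e_metal = fret_efficiency(r, R0_METAL)
    e_fp = fret_efficiency(r, R0_FP)
    print(f"r = {r:4.0f} A   E(metal) = {e_metal:.3f}   E(FP pair) = {e_fp:.3f}")
```

      With these assumed values, the short-R0 pair changes steeply between 10 and 20 Å, whereas the long-R0 pair stays close to saturation over the same range and therefore reports little about such small distances.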

      Strengths: The findings presented here are certainly valuable for the ion channel/signaling field and the technical approach only increases the significance of the work. The choice of techniques is appropriate for this study and the results are clear and high quality. Sufficient evidence is presented against the starting hypothesis.

      Weaknesses: I have a few questions about certain controls and assumptions that I would like to see discussed more explicitly in the manuscript.

      My biggest concern is with the C-terminal citrine tag. Might this prevent the hypothesized interaction between the N- and C-termini? What about the serine to cysteine mutations? The authors might consider a control experiment in channels lacking the C-terminal FP tag.

      While it is certainly possible that the C-terminal citrine tag is preventing the hypothesized interaction between the intracellular termini, there are a few things that mitigate (but do not eliminate) this concern. First, previous work looking at the interaction between the intracellular termini used FPs on both the N- and C-termini and concluded that in fact there is an interaction (PMID:31980622). Our channels have only a single FP, and we use a higher resolution FRET approach. Second, we attach our citrine tag with an 11-residue linker, allowing for enhanced flexibility of the region and hopefully allowing for more space for an interaction that was posited to be between the very proximal part of the C-terminus (near the membrane and away from the tag) and the untagged N-terminus. Third, we previously showed that Stomatin, a much larger protein than the NTD, could bind the distal C-terminus of rASIC3 with a large fluorescent protein connected by the same linker on the C-terminus. In the case of Stomatin, the interaction involved the residues at the distal portion of the C-terminus close to the bulky FP. Interestingly, while we did not publish this, without this flexible linker, Stomatin could not regulate the channel and likely did not bind.

      Despite this, we agree that this is possible and have added a statement in our limitations section explicitly saying this.

      Figure 2 supplement 1 shows apparent read-through of the N-terminal stop codons. Given that most of the paper uses N-terminal ANAP tags, this figure should be moved out of the supplement. Do N-terminally truncated subunits form functional channels? Do the authors expect N-terminally truncated subunits to co-assemble in trimers with full-length subunits? The authors should include a more explicit discussion regarding the effect of truncated channels on their FRET signal in the case of such co-assembly.

      The positions that show readthrough (E6, L18, H515) were not used in the study. We eliminated them largely on the basis of these westerns. We elected to put the bulk of the blots in the supplement simply because of how many there were. We believe this is the best compromise. It allows us to show representative blots for all our positions without making an illegible figure with 7 blots.

      The N-terminally truncated subunits would create very short peptides that are not able to create functional channels. A premature stop at say E8 would create a 7-mer. Our longest N-terminal truncation would only create a protein of 32 amino acids. These don’t contain the transmembrane segments and thus cannot make functional channels.

      As the epitope used for the western blots in Figure 2 and supplements is part of the C-terminal tag, these blots do not provide an estimate of the fraction of C-terminally truncated channels (those that failed to incorporate ANAP at the stop codon). What effect would C-terminally truncated channels have on the FRET signal if incorporated into trimers with full-length subunits?

      Alternatively, C-terminally truncated subunits would be able to form functional channels because they contain the full N-terminus, the transmembrane domains, the extracellular domain and a portion of the C-terminus. We don’t think this is a major contaminant to our experiments. The only two C-terminal ANAP positions we use are 464 and 505. In each of these cases, they are only used for memFRET. The ones that do not contain ANAP are essentially “invisible” to the experiment. Since we are measuring their proximity to the membrane, having some missing should not matter. However, there is some chance that truncations in some subunits could allosterically affect the position of the CT in other subunits. We have added a discussion of this in the manuscript.

      Some general discussion of these results in the context of trimeric channels would be helpful. Is the putative interaction of the termini within or between subunits? Are the distances between subunits large enough to preclude FRET between donors on one subunit and acceptor ions bound on multiple subunits?

      Thank you for this comment. We did not directly test whether the distances are within or between subunits. We considered using a concatemer to do this, however, the concatemeric channels do not express particularly well. Then, UAA incorporation hurts the expression as well. It was unlikely we would be able to get sufficient expression for tmFRET.

      However, the Maclean group has previously tested this using FRET between concatenated subunits and determined that FRET is stronger within than between subunits. We have updated the manuscript to reflect a more thorough discussion of our results in the context of their trimeric assembly.

      The authors conclude that the relatively small amount of FRET between the cytoplasmic termini suggests that the interaction previously modeled in Rosetta is unlikely. Is it possible that the proposed structure is correct, but labile? For example, could it be that the FRET signal is the time average of a state in which the termini directly interact (as in the Rosetta model) and one in which they do not?

      The proposed Rosetta model does not include the reentrant loop of the channel, so it is probable that this model would change if it were redone to include this new feature of the channel.

      However, we do discuss the limitation of FRET as a method that measures a time average that is weighted towards closest approach in our discussion section. The termini are most certainly dynamic, and it is possible that they spend some time in close proximity. Given that FRET is biased towards closest approach, we actually think this strengthens our argument that the termini don’t spend a great deal of time in complex. In addition, our MST data suggest that the termini do not bind. We have added some commentary on this to the discussion section for clarity.
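
      The bias towards closest approach can be made concrete with a small two-state calculation. This is a sketch under stated assumptions, not an analysis from the manuscript: the Förster distance (12 Å), the two distances (10 Å and 30 Å), and the 20% occupancy of the close state are hypothetical numbers chosen only to show the direction of the effect.

```python
def fret_efficiency(r_angstrom, r0_angstrom):
    """Förster relation: E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (r_angstrom / r0_angstrom) ** 6)


def apparent_distance(efficiency, r0_angstrom):
    """Invert the Förster relation to get the distance implied by E."""
    return r0_angstrom * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Hypothetical two-state picture of the termini (all values assumed).
R0 = 12.0         # Förster distance in angstroms
r_close = 10.0    # distance while the termini are in close proximity
r_far = 30.0      # distance while they are apart
frac_close = 0.2  # fraction of time spent in the close state

# The measurement reports the time-averaged efficiency, not the mean distance.
e_avg = (frac_close * fret_efficiency(r_close, R0)
         + (1.0 - frac_close) * fret_efficiency(r_far, R0))
r_app = apparent_distance(e_avg, R0)
r_mean = frac_close * r_close + (1.0 - frac_close) * r_far

print(f"time-averaged E   = {e_avg:.3f}")
print(f"apparent distance = {r_app:.1f} A")
print(f"mean distance     = {r_mean:.1f} A")
```

      In this example the apparent distance falls well below the mean distance, so even a modest occupancy of a close state pulls the readout towards short distances. Under this picture, apparent distances that remain long argue against substantial time spent in a tightly bound complex.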

      Reviewer #2 (Public Review):

      Summary:

      The authors use previously characterised FRET methods to measure distances between intracellular segments of ASIC and with the membrane. The distances are measured across different conditions and at multiple positions in a very complete study. The picture that emerges is that the N- and C-termini do not associate.

      Strengths:

      Good controls, good range of measurements, advanced, well-chosen and carefully performed FRET measurements. The paper is a technical triumph. Particularly, given the weak fluorescence of ANAP, the extent of measurements and the combination with TETAC is noteworthy.

      The distance measurements are largely coherent and favour the interpretation that the N and C terminus are not close together as previously claimed.

      Weaknesses:

      One difficulty is that we do not have a positive control for what binding of something to either N- or C-terminus would look like (either in FRET or otherwise).

      We acknowledge that this is a challenge for the approach. Having a positive control for binding would be great but we are not sure such a thing exists. You could certainly imagine a complex between two domains where the labels (ANAP and TETAC) are pointed away from one another (giving comparatively modest quenching) or one where they are very close (giving comparatively large quenching), both of which could still be bound. This is essentially a less significant version of the problem with using FPs to measure proximity…they are not very good proxies for the position of the termini. These small labels are certainly better proxies but still not perfect. Our conclusion here is based more on the totality of the data. We tried many combinations and saw no sign of distances closer than ~20 Å at resting pH. We think the simplest explanation is that they are not close to one another but we tried to lay out the limitations in the discussion.

      One limitation that is not mentioned is the unroofing. The concept of interaction with intracellular domains is being examined. But the authors use unroofing to measure the positions, fully disrupting the cytoplasm. Thus it is not excluded that the unroofing disrupts that interaction. This should be mentioned as a possible (if unlikely) limitation.

      Thank you for your comment. We discuss unroofing as a potential limitation because it exposes both sides of the plasma membrane to changes in pH. We have updated this section to include acknowledgement of the possibility that unroofing disrupts the interaction via washout of other critical proteins.

      Reviewer #3 (Public Review):

      Summary: The manuscript by Cullinan et al., uses ANAP-tmFRET to test the hypothesis that the NTD and CTD form a complex at rest and to probe these domains for acid-induced conformational changes. They find convincing evidence that the NTD and CTD do not have a propensity to form a complex. They also report these domains are parallel to the membrane and that the NTD moves towards, and the CTD away, from the membrane upon acidification.

      Strengths:

      The major strength of the paper is the use of tmFRET, which excels at measuring short distances and is insensitive to orientation effects. The donor-acceptor pairs here are also great choices as they are minimally disruptive to the structure being studied.

      Furthermore, they conduct these measurements over several positions with the N and C tails, both between the tails and to the membrane. Finally, to support their main point, MST is conducted to measure the association of recombinant N and C peptides, finding no evidence of association or complex formation.

      Weaknesses:

      While tmFRET is a strength, using ANAP as a donor requires the cells to be unroofed to eliminate background signal. This causes two problems. First, it removes any possible low affinity interacting proteins such as actinin (PMID 19028690). Second, the pH changes now occur to both 'extracellular' and 'intracellular' lipid planes. Thus, it is unclear if any conformational changes in the N and CTDs arise from desensitization of the receptor or protonation of specific amino acids in the N or CTDs or even protonation of certain phospholipid groups such as in phosphatidylserine. The authors do comment that prolonged extracellular acidification leads to intracellular acidification as well. But the concerns over disruption by unroofing/washing and relevance of the changes remain.

      We acknowledge that unroofing is a limitation of our approach and noted it in the discussion. However, we have updated the section to include the possibility that the act of unroofing and washing could also disrupt the potential interaction between the intracellular domains as well as between these domains and other intracellular proteins. This was the best approach we could use to address our questions and it required that we unroof the cells. However, we look forward to future studies or new techniques that do not require the unroofing of the cells.

      The distances calculated depend on the R0 between donor and acceptor. In turn, this depends on the donor's emission spectrum and quantum yield. The spectrum and yield of ANAP is very sensitive to local environment. It is a useful fluorophore for patch fluorometry for precisely this reason, and gating-induced conformational changes in the CTD have been reported just from changes in ANAP emission alone (PMID 29425514). Therefore, using a single R0 value for all positions (and both pHs at a single position) is inappropriate. The authors should either include this caveat and give some estimate of how big an impact changes spectrum and yield might have, or actually measure the emission spectra at all positions tested.

      This is a reasonable concern and one we considered. Measuring the quantum yield would be quite difficult. However, we have measured spectra at a number of positions and see a relatively minimal shift in the peak. Most positions peak between 481 and 484 nm. If you calculate the difference in R0 using theoretical spectra with a blue shift of 20 nm, the difference in R0 is only ~1.5 Å. A shift of 20 nm is on the higher side of anything we have seen in the literature (PMID 30038260), and since the difference is minimal even with that large a shift, we do not think measuring spectra for each position would impact the overall conclusions presented. As you noted, though, the quantum yield also changes. Assuming a change in yield from 0.22 to 0.47, the largest we found reported in the literature (PMID 29923827), the R0 would increase by 2 Å. This same paper showed that the blue-shifted position was the one with the higher extinction coefficient, so these changes would be working in opposition to one another, making the difference in R0 even smaller. It is important to note that while tmFRET is a much more powerful measure of distance than standard FRET, these distances, as you point out, are quite challenging to measure precisely. Our conclusions are based less on the absolute distances and more on the observation that no positions show large quenching and that if there is any change upon acidification, it is in the wrong direction.
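
      The size of the quantum-yield effect can be checked with a short sketch. It relies on the standard Förster result that R0 scales with the sixth root of the donor quantum yield when the spectral overlap, orientation factor, and refractive index are held fixed. The reference R0 of 15 Å is an assumption chosen for illustration, not a value taken from the manuscript, and the function name is hypothetical.

```python
# Sketch: how R0 changes when only the donor quantum yield changes.
# R0 is proportional to (quantum yield)**(1/6) with everything else held constant.

def scale_r0(r0_ref, q_ref, q_new):
    """Rescale a reference Forster distance for a new donor quantum yield."""
    return r0_ref * (q_new / q_ref) ** (1.0 / 6.0)

R0_REF = 15.0  # assumed reference R0 in angstroms (illustrative only)
Q_LOW = 0.22   # lower quantum yield quoted in the response
Q_HIGH = 0.47  # higher quantum yield quoted in the response

r0_high = scale_r0(R0_REF, Q_LOW, Q_HIGH)
delta = r0_high - R0_REF

print(f"R0 at Q={Q_LOW}: {R0_REF:.1f} A")
print(f"R0 at Q={Q_HIGH}: {r0_high:.1f} A (change of {delta:.1f} A)")
```

      Because of the sixth-root dependence, a roughly twofold change in quantum yield changes R0 by only about 13%, which for a reference R0 near 15 Å corresponds to roughly 2 Å.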

      Overall, the writing and presentation of figures could be much improved with specific points mentioned in the recommendations for authors section.

      See below.

      The authors argue that the CTD is largely parallel to the plasma membrane, yet appear to base this conclusion on ANAP to membrane FRET of positions S464 and M505. Two positions is insufficient evidence to support such a claim. Some intermediate positions are needed.

      We do not see in the paper where we suggest that the CTD is parallel. However, your point that we could try and determine if this was the case is correct. However, we attempted to create several other CTD TAG mutants but struggled with readthrough and poor expression of these mutants so we opted to just include S464 and M505. Our point from these data is only that the distal CTD (505) must spend significant time near the membrane to explain our FRET data.

      Upon acidification, NTD position Q14 moves towards the plasma membrane (Figure 8B). Q14 also gets closer to C515 or doesn't change relative to 505 (Figures 7C and B) upon acidification. Yet position 505 moves away from the membrane (Figure 8D). How can the NTD move closer to the membrane, and to the CTD but yet the CTD move further from the membrane? Some comment or clarification is needed.

      This is a reasonable question and one that is hard to definitively answer. Our goal here was to test the hypothesis that the termini are bound at rest. Mapping the precise positions of the termini is difficult for reasons we will enumerate in the question that asks why we didn’t make a model. There are potentially multiple explanations but the easiest one would be that the CTD could move away from the membrane but closer to Q14, for instance, if the distal termini, say, rotated towards the NTD. This would move 505 closer and have no impact on whether or not the NTD and CTD moved away or toward the membrane.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns

      The authors show the spectrum of ANAP attached to beads and use this spectrum to calculate R0 for their FRET measurements. Peak ANAP fluorescence is dependent on local environment and many reports show ANAP in protein blue-shifted relative to the values reported here. How would this affect the distance measurements reported?

      This is an important point. See above for the answer.

      Could the lack of interaction between the N- and C-terminal peptides in Figure 7 arise from the cysteine to serine mutations or lack of structure in the synthetic peptides. How were peptide concentrations measured/verified for the experiment?

      It is possible that cysteine to serine mutations could prevent the interaction. It is also possible that these peptides are not capable of adopting their native fold without the presence of the plasma membrane or due to being synthetically created. However, the termini are thought to be largely unstructured. We received these peptides in lyophilized form at >95% purity and resuspended to our desired stock concentration (3 mM C-terminus, 1 mM N-terminus). Even if our concentration was off, we see no signs of interaction up to quite a high concentration.

      How was photobleaching measured for correcting the data?

      We executed several mock experiments at various TAG positions using either pH 8 and pH 6, where we performed the experiments as usual but with a mock solution exchange when we would normally add the metal. We normalized the L-ANAP fluorescence to the first image and averaged together these values for pH 8 and pH 6. We then corrected using Equation 2 in the manuscript.
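
      Equation 2 is not reproduced in this response, so the following is only a sketch of one plausible form of such a correction, not the authors' actual equation. It assumes that the average fraction of fluorescence remaining after a mock exchange is divided out of the fraction remaining after metal addition. The function names and the intensity values are hypothetical.

```python
# Sketch of a bleaching correction built from mock solution exchanges.
# This is an assumed form, not Equation 2 from the manuscript.
from statistics import mean

def bleach_factor(mock_runs):
    """Mean fraction of fluorescence remaining after a mock exchange.

    mock_runs: list of (first_image_intensity, post_exchange_intensity) pairs.
    Each run is normalized to its own first image before averaging.
    """
    return mean(after / first for first, after in mock_runs)

def corrected_efficiency(f_initial, f_metal, bleach):
    """Apparent quenching efficiency with the bleaching loss divided out."""
    remaining = (f_metal / f_initial) / bleach
    return 1.0 - remaining

# Hypothetical intensities in arbitrary units.
mock = [(100.0, 95.0), (200.0, 190.0)]
bleach = bleach_factor(mock)
efficiency = corrected_efficiency(100.0, 76.0, bleach)

print(f"bleach factor: {bleach:.2f}")
print(f"corrected efficiency: {efficiency:.2f}")
```

      Without the correction, the same hypothetical measurement would give an apparent efficiency of 0.24, so ignoring bleaching would overstate the quenching.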

      We have updated the methods to include how we adjusted for bleaching.

      The authors may wish to make it more explicit that their Zn2+ controls also preclude the possibility that a changing FRET signal between ANAP and citrine may affect their data.

      Thank you for this comment. We agree, it would strengthen the manuscript to include this statement. We have now included this.

      It might be useful to the reader if the authors could include (as a supplement) plots of their data (like in Figure 6), in which FRET efficiency has been converted to distance.

      We considered this idea as well but felt like showing the actual data in the figures and the distances in a table would be best.

      Figure 5D is mentioned in the text before any other figures. This is unconventional. Could this panel be moved to Figure 1 or the mention moved to later?

      Changed

      western blot is not capitalized.

      Changed.

      Figure 1, the ANAP structure shown is the methyl ester, which is presumably cleaved before ANAP is conjugated to the tRNA. The authors may wish to replace this with the free acid structure.

      This is a fair point. We originally used the methyl ester structure to indicate the version of ANAP we chose to use. However, you are correct that the methyl ester is cleaved before conjugation to the tRNA. We replaced the methyl ester with the free acid structure to clarify this.

      Figures 1 and 4 should have scale bars for the images.

      Scale bars have been added to figures 1, 4, and 5.

      In Figure 3, the letters in the structures (particularly TETAC) are way too small. Please increase the font size.

      Changed

      In Figure 3 and Figure 3 supplement 1, the axes are labeled "Absorbance (M-1cm-1)." Absorbance is dimensionless. The authors are likely reporting the extinction coefficient.

      Thank you for catching this. We adjusted the axes to extinction coefficient.

      In Figures 5 B and C, it might be clearer if the headers read "Initial, +Cu2+/TETAC, DTT" rather than "Initial, FRET, Recovery."

      Changed

      The panel labels for Figure 8 seem to be out of order.

      Changed

      The L for L-ANAP should be rendered, by convention, in small caps.

      This is a good example of learning something new from the review process. This is the first I have ever heard of small caps. We can find no other papers that use small caps for L-ANAP so I am not 100% sure what convention this is referring to and don’t want to change the wrong thing in the paper. We are happy to change if the editorial staff at eLife agree but have left this for now.

      Reviewer #2 (Recommendations For The Authors):

      With so many distances measured, why was not even a basic structural model attempted?

      We certainly considered it, but a number of things led us to conclude that it might imply more certainty about the structure of these termini than we hope to give. 1) Given that the FRET is a time average of positions, these distance constraints would not do much constraining. 2) Given that the termini are likely unstructured and flexible, this makes the problem in 1 worse. 3) There is no structural information to use as a starting point for a model. 4) The flexibility of the linkers for each FRET pair also introduces uncertainty. This can, in theory, be modeled as they do in EPR but all of this together made us decide not to do this. What we hope readers take home is that the overall picture of the data is not consistent with the original RIPK1 hypothesis.

      Maybe it would be good to draw a band on the graphs in Figure 6 for the FRET signal expected for interaction (and thus, disfavoured by these data). This would at least give context.

      We agree this could be helpful, but it is not so easy to do. What distance would we choose? We could put a line at ~5Å (the model predicted distance). As we noted above, a number of distances could be compatible with an interaction. However, we think it’s unlikely that if a complex was formed that none of our measurements would show a distance closer than 20Å at rest and that an unbinding event would then lead to a decrease in distance. This, to us, is the take home message.

      Minor points:

      "After unroofing the cells, only fluorescence associated with the "footprint", or dorsal surface, of the cell membrane is left behind."

      The authors use dorsal and ventral in this section to describe parts of an adherent cell. But in the first instance, they remove the dorsal part of the cell, and then in this phrase, the dorsal part is left behind....I am a bit confused.

      Thank you for pointing out this mistake, we have fixed this. It is indeed the ventral surface left behind.

      "bind at rest an" - and?

      Changed

      "One previous study used a different approach to try and map the topography of the intracellular termini of ASIC1a comparable to our memFRET experiments." I think a citation is due.

      Citation added

      "great deal of precedent" even if this result is from my own lab, I would prefer that the authors note that it's one study from one lab! I think best just to delete "great deal of".

      “Great deal of” deleted

      I think the column "Significance" in the tables is unnecessary when the P value is given.

      Thank you for this suggestion. We agree and have made the change.

      Figure 7a Q14TAG has a clearly bimodal distribution at pH 8. What could be the meaning of this result? The authors do not mention it that I could find. Perhaps there is no meaning. The authors should state what they think is (or is not) going on.

      This is a good question and we don’t have a good answer. It appears to be experimental variability. The data from the “low FRET” population in this experimental condition all came from the same day, so something was different that day. We considered that they might be outliers to exclude but thought showing all of our data was the better path. We reperformed the ANOVA here separating out the “outlier” day and nothing of substance changed. Both populations were still different with P value less than 0.001.

      Typo: Lumencore

      Changed

      Maybe just a matter of taste but the panel created with Biorender in Figure 8 is not attractive and depicts the channel differently to in Figure 5D, which is again different from Figure 1A. Surely one advantage of using computer-generated artwork could be to have consistency.

      We agree and have used the same cartoon for all of our images with the one exception being the schematics that are just meant to show the positions that are present in each bar graph.

      Figure 4A was squashed to fit (text aspect ratio is wrong).

      Fixed

      Reviewer #3 (Recommendations For The Authors):

      Citrine is used to report incorporation. Yet citrine has a strong tendency to dimerize (PMID 27240257). Did the authors use mCitrine or just Citrine? This is quite important in interpreting their data.

      Thank you for pointing out this important distinction. We use mCitrine, which we have added to the methods.

      The manuscript has numerous instances of imprecise language. For example, page 10, last para, first line, "previous studies have looked at..." or page 7, final paragraph "tell a similar story". Related, the figures could be much better. For example, in Figure 1, where the authors depict the anap chemical in red, as opposed to the blue one might expect of a blue emitting fluorophore. In figure 6, ANAP is also in red with the quenching group in green. This is opposite to how one typically thinks of FRET with the warmer color being the acceptor not the donor. Moreover, the pH 6 condition is also colored the same shade of red as the ANAP. Labels of Cys positions would again be useful here. In Figure 3, the heteroatoms of TETAC and C18-NTA are very small and difficult to see. It would also be good to label these structures, and the spectra below, so the reader can tell at a glance without looking at the caption, what the structures and spectra arise from. Also, how are the absorption spectra normalized? This is not discussed in the methods. The lack of attention to presentation mars an otherwise nice study.

      Thank you for these points. We have made modifications to the manuscript to address these comments.

      Abstract, second last line "After prolonged acidification, ...", 'prolonged' could be interpreted as 'it takes a while for the domain to move' or 'the movement only happens after a while'. This is not what the authors intend to convey. Consider modifying to just 'After acidification,'

      We updated the main text to indicate that prolonged acidification is intended to describe acidification that occurs over the minutes timescale.

      Pdf page 6, bottom para on Anap incorporation not altering channel function: What is meant by 'steady state pH dependence of activation'? This implies the authors applied a pH stimulus, then waited until equilibrium was achieved ie. until desensitization was complete and measured the current at that point. It seems more likely they simply applied different pH stimuli and measured the peak response and that the use of 'steady state' here is a typo.

      We removed the phrase steady state.

      Same section, controls of electrophysiology allude to 485, 505 and 515 ANAP-containing channels. In fact, the authors have no way of determining what fraction (if any) of the pH evoked currents arise from channels containing Anap in those positions versus from simply having a translation stop but still functioning. This should be mentioned.

      This is correct. We cannot be sure the CTD TAG positions are not a mixture of ANAP-containing channels and truncations. See above for why we do not think this a big concern for the FRET experiments. Functionally, though, you are correct that we cannot tell. We now mention this in the paper.

      Methods, the abbreviation for SBT should be defined somewhere.

      Added.

      Methods, unroofing section, middle paragraph, the authors use nM not nm to list wavelengths of light.

      Changed.

      Figure 3C-D: There's an unexpected blip in the Anap emission spectra at ~500 nm. Are the grating efficiency of the spectrograph and quantum efficiency of the camera accounted for in these spectra?

      This is a good question. The data are not corrected for either camera efficiency or grating efficiency. We don’t have easy access to the actual data (although we can see a pdf version of each). There is a little blip in the grating efficiency graph that could partly explain the blip in our spectra.

      Figure 5C, were recovery experiments routinely done? If so, would be good to show more than n = 1 in the plot to get an idea of reproducibility.

      Recovery experiments were done in every experiment but are not shown for simplicity. We have included all FRET and recovery data for position Q14TAG-C469 at pH 6 in figure 5C to show reproducibility of our FRET and recovery data.

      Table 1, considering adding a Δ distance column (pH 8 versus 6) so the magnitude of changes are more easily seen.

      This is a reasonable suggestion but we decided not to include a Δ distance column. The data are whole numbers and people can easily determine the Δ distance. We felt that including that column would bring too much focus on what we think are pretty small changes. Our hope is that readers take away that the data are not consistent with complex formation between the termini and focus less on absolute distances.

      Figure 7A, Q14tag pH 8 condition has a quite a bit of spread and, likely, two populations. These data, as well as G11, are unlikely to be parametric and hence ANOVA is inappropriate. A normality test, and likely Kruskal-Wallis test is called for.

      After testing for normality, the data for Q14TAG C485 pH 8 are non-normally distributed. However, the Kruskal-Wallis test is a non-parametric counterpart of the one-way ANOVA and is not applicable here. We separated the data out into population 1 and 2 and repeated the two-way ANOVA statistical test. When Q14TAG pH 8 is split into 2 populations, the statistics hardly change. When the data are not separated, Q14TAG pH 8 relative to pH 6 has a p-value <0.0001. When the 2 populations are separated, both populations relative to Q14TAG pH 6 still have a p-value of <0.0001.

    1. Author Response

      eLife assessment

      This paper by Aitchison and colleagues describes nanobody neutralizing and binding activity against various SARS-CoV-2 variants of concern. The findings are important in that the described nanobodies may have broad therapeutic relevance against current and future variants of concern and may be able to avoid significant resistance. The claims are incomplete: while the study is well-executed and uses a nice balance of biochemical and cellular assays, the efficacy of the proposed nanobody library against VOCs is not completely supported as IC50 values appear to increase against newer variants and are higher than previously used therapeutic bNAbs, animal data showing in vivo efficacy is lacking, and protection against future possible variants is not proven.

      This manuscript is a follow-up of our previous eLife manuscript “Highly synergistic combinations of nanobodies that target SARS-CoV-2 and are resistant to escape” https://elifesciences.org/articles/73027 where we described an “impressive collection of hundreds of new nanobodies binding SARS-CoV-2 spike by combining in vivo antibody affinity maturation and proteomics. [Editor’s evaluation]”. As a follow-up this submission extends the findings of our previous eLife publication and thus focuses on how our repertoire functions in the context of a rapidly evolving SARS-CoV-2 virus, relying on the established methodologies and approaches of the original paper. We explore how nanobody functions have been influenced by the emergence of SARS-CoV-2 variants containing extensive mutations in spike protein, which largely reduced the usefulness of therapeutic monoclonal antibody therapeutics. Our findings show that while some nanobodies lost efficacy in binding to and neutralizing these evolved spikes, a surprising number of nanobodies retained their binding and neutralization activity. This is an important finding, because these efficacious nanobodies target regions that appear rarely targetable by monoclonal antibodies. We also provide experimental validation of the importance of the interplay between binding and neutralization in synergy experiments, where even weakened binding still contributed to strongly enhancing the neutralization.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ketaren, Mast, Fridy et al. assessed the ability of a previously generated llama nanobody library (Mast, Fridy et al. 2021) to bind and neutralize SARS-CoV-2 delta and omicron variants. The authors identified multiple nanobodies that retain neutralizing and/or binding capacity against delta, BA.1 and BA.4/5. Nanobody epitope mapping on spike proteins using structural modeling revealed possible mechanisms of immune evasion by viral variants as well as mechanisms of cross-variant neutralization by nanobodies. The authors additionally identified two nanobody pairs involving non-neutralizing nanobodies that exhibited synergy in neutralization against the delta variant. These results enabled the refinement of target epitopes of the nanobody repertoire and the discovery of several pan-variant nanobodies for further preclinical development.

      Strengths:

      Overall, this study is well executed and provides a valuable framework for assessing the impact of emerging SARS-CoV-2 variants on nanobodies using a combination of in vitro biochemical and cellular assays as well as computational approaches. There are interesting insights generated from the epitope mapping analyses, which offer possible explanations for how delta and omicron variants escape nanobody responses, as well as how some nanobodies exhibit cross-variant neutralization capacity. These analyses laid out a clear path forward for optimizing these promising next-gen therapeutics, particularly in the face of rapidly emerging SARS-CoV-2 variants. This work will be of interest to researchers in the fields of antibody/nanobody engineering, SARS-CoV-2 therapeutics, and host-virus interaction.

      Weaknesses:

      A main weakness of the study is that the efficacy statement is not thoroughly supported. While the authors comprehensively characterized the neutralizing ability of nanobodies in vitro, there is no animal data involving mice or hamsters to demonstrate the real protective efficacy in vivo. Yet, in the title and throughout the manuscript, the authors repeatedly used phrases like "retains efficacy" or "remains efficacious" to describe the nanobodies' neutralization or binding capacities.

      This claim is not well supported by the data and underestimates the impact of variants on the nanobodies, especially the omicron sublineages. For example, the authors showed that S1-RBD-15 had a ~100-fold reduction in neutralization titer against Omicron, with an IC50 at around 1 uM. This is much higher than the IC50 value of a typical anti-ancestral RBD nanobody reported in the previous study (Mast, Fridy et al. 2021). In fact, the authors themselves ascribe nanobodies with an IC50 above 1 uM as weak neutralizers. And there were many in the range of 0.1-1 uM.

      Furthermore, many nanobodies selected for affinity measurement against BA.4/5 had no detectable binding.

      Without providing in vivo protection data or including monoclonal antibodies that are known to be efficacious against variants in the in vitro assays as a benchmark, it is difficult to evaluate the efficacy just with the IC50 values.

      We respectfully disagree with the reviewer on several aspects of this critique.

      As to our use of the word efficacy - the quality of being successful in producing an intended result; effectiveness - we were specific to nanobody binding and in vitro neutralization of the variant spike proteins tested in the manuscript. Indeed, our manuscript made no claim of efficacy outside of this intended meaning. However, to prevent misinterpretation we will modify the final paragraph of our introduction to state explicitly that the nanobody repertoire retains efficacy in binding and neutralizing variants of spike. The final paragraph of the Introduction will include the following:

      “Here, we demonstrate that a subset of our previously published repertoire of nanobodies, generated against spike from the ancestral SARS-CoV-2 virus (Mast, Fridy et al. 2021), retains binding and in vitro neutralization efficacy against circulating variants of concern (VoC), including omicron BA.4/BA.5.”

      We agree that in vivo neutralization data would be an important complement to the in vitro binding and neutralization data. Experiments along these lines are ongoing, but are not considered part of a follow-up to our original paper where in vivo data were not included.

      We disagree with the Reviewer that “This claim is not well supported by the data and underestimates the impact of variants on the nanobodies, especially the omicron sublineages.” As we specifically state: “In comparison, groups I, I/II, I/IV, V, VII, VIII and the anti-S2 nanobodies contained the majority of omicron BA.1 neutralizers, though here the neutralization potency of many nanobodies was decreased compared to wild-type. This decrease in neutralization potency largely correlates with the accumulation of omicron BA.1 specific mutations throughout the RBD, which likely alters the epitope-binding site of these nanobodies, weakening their interaction with BA.1 spike (Fig. 1B). (emphasis added)”

      Naturally, we expected that some of our nanobodies would lose the ability to bind BA.4/BA.5. This enabled us to determine which areas on spike remained susceptible to our nanobodies. We show that 10/29 nanobodies tested retained binding to BA.4/5. We did not test our entire repertoire; only a subset was selected. We stated the following:

      “Of the nanobodies that neutralized both delta and omicron BA.1, representatives from each of the nanobody epitope groups were selected for SPR analysis, where S1 binders with mapped epitopes that neutralized one or both variants well, were prioritized.”

      Reviewer #2 (Public Review):

      Summary:

      Interest in using nanobodies for therapeutic interventions in infectious diseases is growing due to their ability to bind hidden or cryptic epitopes that are inaccessible to conventional immunoglobulins. In the present study, the authors were posed (sic) to characterize nanobodies derived from the library produced earlier with the Wuhan strain of SARS-CoV-2, map their epitopes on SARS-CoV-2 spike protein, and demonstrate that some nanobodies retain binding and even neutralization against antigenically distant Variants of Concern (VOCs) that are currently circulating.

      Strengths:

      The authors demonstrate that some nanobodies - despite being obtained against the ancestral virus strain - retain high affinity binding to antigenically distant SARS-CoV-2 strains. This is despite the majority of the repertoire losing binding. Although limited to only two nanobody combinations, the demonstration of synergy in virus neutralization between nanobodies targeting different epitopes is compelling.

      We thank the Reviewer for this positive summary of the strengths of our study. In our previous work, we applied stringent criteria for the down-selection of nanobodies based on their affinity and diversity, as elaborated on in https://elifesciences.org/articles/73027. The current dataset is a further judiciously curated subset, featuring 41 nanobodies chosen to represent and inform on the 10 structurally mapped epitope groups that we initially identified. This subset is but the tip of an iceberg. For each nanobody demonstrating high-affinity binding and neutralization, we possess multiple sequence variants, offering alternative avenues for investigation. Moreover, our repertoire has since been further elaborated by use of a yeast display library (Cross et al., 2023 JBC), providing additional nanobodies capable of targeting the same epitopes. Our findings presented here thus serve as a heuristic, enabling us to distill the much larger repertoire into manageable and informative clusters of data. We will modify our manuscript to be more explicit about these facts.

      Weaknesses:

      The authors imply that nanobodies that retain binding/neutralization of early Omicron sublineages will be active against currently circulating and future virus strains. Unfortunately, no reasoning for such a conclusion nor data supporting this prediction are provided.

      The nanobodies that we propose retain binding to the omicron sublineages current and emerging at the time (Fig. 4) are those that still bind to omicron BA.1 and BA.4/5. In the regions where we propose these nanobodies retain binding (Fig. 4), the structures of XBB and BQ.1 are not divergent enough from these omicron sublineages to result in loss of binding. Thus, we hypothesize that the epitopes where these nanobodies bind or are predicted to bind (outlined in black in Fig. 4) represent regions on spike vulnerable to nanobody intervention. Importantly, we also now have further experimental data to support our predictions that the nanobodies in Fig. 4 will retain binding (see plot in Author response image 1). We will provide additional data and complements to key figures to help illustrate this in the revised manuscript.

      Author response image 1.

    1. Author Response

      In this paper, we examine the behavioral context that generates foraging decisions at the boundaries of food patches in the nematode C. elegans. By analyzing animal locomotion at high spatial and temporal resolution, we identify discrete behavioral responses to encountering the edge of a food patch that can be understood as a decision: either to remain inside the food patch or to leave it. We find that the decision to leave a food patch is associated with increased behavioral arousal that unfolds on long and short timescales. The coupling of increased arousal to lawn leaving decisions is preserved across genetic, neuronal, and environmental manipulations that alter global arousal levels. However, genetic inactivation of a set of chemosensory neurons disrupts the coupling of arousal and lawn leaving, revealing a potential site of integration between internal signals and external sensation that governs foraging.

      We appreciate the reviewers’ thoughtful engagement with this work. In addition to modifications in the text to address minor concerns and ambiguities, we have conducted new analyses and made text and figure edits to strengthen or explain our conclusions. We have also investigated possible confounding explanations to our interpretation of the data.

      In newly added analysis, we show that increased arousal does not result in increased proximity to the lawn boundary, which would be a trivial reason why roaming animals leave more than dwelling ones (new Figure 2-Supplement 1E).

      We also addressed the concern that classifying the brief speed acceleration motif as a roaming state would inflate the apparent coupling of roaming to leaving. By measuring the duration of roaming states prior to leaving, we in fact found the opposite: roaming states that precede leaving are slightly longer than other roaming states, not short acceleration events (new Figure 2-Supplement 4).

      The reviewers also asked reasonable questions about variability between batches of experiments. In particular, reviewers pointed out high levels of roaming in wild type controls accompanying npr-1 mutants. Indeed, the simultaneously-tested wild type animals roamed more than usual in this experiment (Fig. 4C,K) and less than usual in other panels (Fig. 4A,B,I,J) in these small datasets. There is more to do here, but the results support the general point that roaming and leaving are correlated in several neuromodulatory mutants that regulate roaming. We have included a new sentence in the Figure 4 legend to draw the reader’s attention to the potential limitations of these results, and to explicitly state that results should not be compared across panels. Similarly, there is more to be done to understand tax-4, as we did not test all tax-4-expressing sensory neurons for their effects on roaming and leaving.

      In private comments, reviewers also asked about experimental design and statistics and were concerned that certain assays conducted on just a few days may not represent independent experiments. We have updated the Methods section to improve the description of the behavioral experiments, including more information about the behavioral chambers and imaging conditions. We note that for all experiments we tested all relevant genotypes in the same batches and days, enabling comparisons of experimental animals with matched controls conducted at the same time.

      Reviewers asked us to compare our results to those generated by Rhoades, et al. (2019) and Cermak, et al. (2020). To the best of our knowledge, our results are fully consistent with those studies. The study by Rhoades and co-authors is primarily concerned with behavioral slowing upon first encountering a food patch, and thus does not include data regarding roaming or lawn leaving (Rhoades et al., 2019). As we mention in the text, we were initially surprised that tph-1 did not eliminate regulation of roaming by feeding, but there are straightforward explanations (redundant transmitters, other neurons). tph-1 did have a significant, albeit small, effect. The study by Cermak and co-authors presents an alternative Hidden Markov Model that uses whole animal postures to segment on-food behavior into 9 states including 8 dwelling states and a single roaming state (Cermak et al., 2020); we refer to this analysis in the discussion. Cermak’s paper and ours differ in experimental conditions, the behaviors measured, and the models used to analyze them. The animals in the Cermak paper are exposed to a large bacterial lawn of uniform density, whereas animals in our study are recorded on small bacterial lawns with thick edges. The analysis tools also differ in their use of animal posture (Cermak only) and autoregressive dynamics (our work only). Further studies of the neurons and molecules involved may help to fully harmonize these models.

      References

      Cermak, N., Yu, S.K., Clark, R., Huang, Y.C., Baskoylu, S.N., and Flavell, S.W. (2020). Whole-organism behavioral profiling reveals a role for dopamine in state-dependent motor program coupling in C. elegans. Elife 9, 1–34.

      Rhoades, J.L., Nelson, J.C., Nwabudike, I., Yu, S.K., McLachlan, I.G., Madan, G.K., Abebe, E., Powers, J.R., Colón-Ramos, D.A., and Flavell, S.W. (2019). ASICs Mediate Food Responses in an Enteric Serotonergic Neuron that Controls Foraging Behaviors. Cell 176, 85-97.e14.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      We now make clear throughout the manuscript that our proposition, which holds the fast cassette as central to the control of powerful movements governed by the PMn, remains a hypothesis. However, we provide additional rationale for this view based on functional distinctions between the PMns and SMns. Both reviewers 1 and 2 also questioned why so few synaptic and ion channel genes are seen for the SMn type. As pointed out by the reviewer, small differences in birthdates between Mn types seem an unlikely explanation, and this idea was removed. We now better develop the idea that the low levels of expression of both ion channel and synaptic genes in SMns are consistent with findings from electrophysiology that point to greatly lowered levels of transmitter release compared to PMns. Additionally, for the purpose of identifying all synaptic and ion channel genes shared equally between Mn types, we re-examined the transcriptome. Figure 7A & B now reflect all genes in these two categories detected above threshold in PMn and SMn types, and not just examples.

      Reviewer 2

      We have added cell types in mammalian circuits shown to express the ion channel cassette members. Examples include the calyx of Held in the auditory circuit and the cerebellar Purkinje neurons. As we show for zebrafish PMns, these mammalian neurons form fast, reliable circuits. In these cases, it is noteworthy that our proposal is the first to link all three as functional partners in fast AP firing and high-fidelity synaptic transmission. The suggestion that pancreatic cells would be represented in our data is deemed highly unlikely, as our technique separated out the spinal cords prior to dissociation. Finally, as suggested, we added the disclaimer that we cannot exclude the possibility that clusters sharing both glial and neuronal markers may represent cell doublets. Other minor corrections were all made.

      Reviewer 3

      First, we agree that the role of PMns is not restricted to escape behavior. They have been shown to participate in the highest speed of swimming as well. We have made this clear throughout the paper.

      Second, we are at odds with this reviewer over the Type I and Type II V2a recruitment during high speed swimming. We agree that both V2a types of interneurons are involved in high speed swimming and likely escape, as both directly innervate the PMns, as pointed out by the reviewer in Figure 2c of Menelaou and McLean 2019. However, the reviewer interprets Figure 2c to show that Type I, not Type II, V2a is more highly recruited over the range of higher swimming speeds whereas we conclude just the opposite. These data, along with other papers we cited, have been firmed up in the text to support a central role played by Type II.

      Third, the reviewer recommends we remove Figures 6b and 6c relating to our two newly discovered SMn markers, fox1b and alcamb. The data in Figure 6a show that these markers label SMn somas in two distinct layers along the dorsal-ventral axis in the spinal cord. The reviewer objects to Figures 6b and 6c, which compare the location of our two markers to the distributions of two well studied SMn labeling transgenic lines, islet:GFP and gata2:GFP. The correspondence is not absolute but suggests that fox1b labels islet SMns and alcamb labels the gata2 SMns. In the previous version of the paper, we suggested that this correspondence might further signal different dorsal-ventral projections. This suggestion was based solely on reports that islet and gata2 transgenic lines preferentially label SMns with different projections. We do not view this particular point as important and, in light of the controversy surrounding these projections, as noted by the reviewer, we removed all reference to the subject of muscle target areas. We focus instead on our finding of two new markers that label different dorsal-ventral soma layers, which MAY correspond to previously described SMn types. This reasoning is made clear in the manuscript and, because of its potential importance, we elected to retain Figures 6b and 6c as a call for future testing.

      The reviewer makes other suggestions that were all incorporated. The CoLo estimates indeed were too high, as questioned by the reviewer, because, early on, we inadvertently counted two clusters rather than the single cluster that was later authenticated. This has been corrected to reflect 1.1% in Table 1. The evx1 and evx2 data have been added to Figure 4C. Nomenclature is corrected for KA neurons. We make clear that the axonal projections for CoLo were made with mCherry expression, not the in situ label. The Hayashi reference was added.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for The Authors)

      MAJOR CONCERNS

      1) Not addressed, but perhaps relevant, is that most of the postembryonic fish growth results from stem cells located in the ciliary marginal zone that make new neurons and Muller glia throughout the fish's life. Thus, Muller cell heterogeneity may result from the central to the peripheral gradient of Muller glial cell maturation.

      1a. Müller glial cell heterogeneity needs to be confirmed using, for example, in situ hybridization studies with gene-specific probes identified in the scRNAseq that distinguish these 2 populations. An additional approach could be the use of transgenic lines harboring tagged endogenous or transgene that reflects the promoter activity of the Muller glia subtype-specific gene.

      We thank the reviewer for the insightful comments and agree on the importance of substantiating the Müller glia heterogeneity in our manuscript. Ours is not the only study that provides evidence for Müller glia heterogeneity. In particular, we would like to refer to a recent publication (Krylov et al., 2023). Using single cell RNA sequencing, Krylov et al. detect Müller glia heterogeneity in the uninjured retina, as well as upon selective, genetic ablation of distinct subtypes of photoreceptors (e.g. long and short wavelength sensitive cones, as well as rods). They observe six distinct clusters of quiescent Müller glia that show differential spatial distribution along the dorsal/ventral retinal axis. For instance, they report a ventral quiescent Müller glia population that shares some marker genes (aldh1a3, rdh10a, smoc1) with our nonreactive Müller glia 2 (cluster 2, supplementary files 1 and 2). Moreover, the authors report that Müller glia located at different positions along the dorsal/ventral axis exhibit distinct patterns of pcna upregulation as well as subsequent re-activation upon photoreceptor ablation. We have added the supportive information from Krylov et al. in the discussion section (lines: 781-789) of our manuscript.

      2) Most interesting, but also least substantiated, is the authors' report of 2 different quiescent Muller glial cell populations in the uninjured retina and 2 different reactive Muller cell populations in the injured retina. If these populations exist independently of each other, it would be important to investigate if they differentially impacted retina regeneration.

      2a. CRISPR knockdown in F0 of factors thought to be involved in specific Müller glia-derived progenitor trajectories would be important to lend some functional significance to the data.

      We fully agree with the reviewer that the addition of functional data would enrich the manuscript with valuable information. However, we do not believe that the suggested CRISPR knockdown of selected genes in F0 animals (also known as crispants) represents a suitable approach. Crispants have been used successfully to investigate genetic contributions in embryonic-to-larval stages (the first few days) of zebrafish development, as injection of multiple gRNAs targeting the same gene is sufficient to achieve a bi-allelic knockout of the gene of up to 90% (Kroll et al., 2021). However, unless both alleles of the target gene(s) are mutated early on with nearly 100% efficiency, it is unlikely that the gRNA inactivation would work equally well during subsequent development into adult stages (several months later, and after exponential growth and volume increase of the animal). Even if bi-allelic inactivation in the crispants does work early on, it remains unclear whether and how crispants survive to adulthood, which will be necessary in order to address gene function in the context of retina regeneration. Moreover, since we observe that the genetic events during adult retina regeneration are highly similar to the events during retina development, we would expect the crispants to display developmental phenotypes already, which would further hamper the study of potential regeneration-specific phenotypes in adult animals. We are convinced that only ‘clean’ conditional gene inactivation studies will be suitable to address the impact of Müller glia and derived progenitor trajectories on retina regeneration. In this respect, we have recently developed the new conditional Cre-Controlled CRISPR mutagenesis system (Hans et al., Nature Comm 2021). We are currently establishing stable lines to enable controlled and specific gene inactivation, but have only obtained preliminary results so far; the final analysis will take much more time and is, therefore, beyond the scope of this work.

      3) The discussion should be modified to relate the data here presented with those described in Hoang et al., 2020.

      We followed the suggestions of the reviewer and compared our single cell RNA sequencing dataset to that described in Hoang et al., 2020. As one might expect, the comparison between the two datasets showed similarities but also significant differences due to the different experimental set-ups. We show the results of this comparison in additional main (new Figure 9) and supplementary figures (new Figure 9-figure supplement 1). In order to compare our newly obtained scRNAseq dataset of MG and MG-lineage-derived cells of the regenerating zebrafish retina to the previously published dataset of light-lesioned retina (Hoang et al., 2020), we employed the ingestion method (Scanpy, https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html) and mapped the clusters identified by Hoang and colleagues to our clusters (new Figure 9). While we applied a short-term lineage tracing strategy and only sequenced the enriched population of FAC-sorted MG and MG-derived cells of the regenerating zebrafish retina, Hoang and colleagues sequenced all retinal cells in the light-lesioned retina. Consequently, comparison between the two datasets uncovered similarities, but also significant differences, due to the different experimental set-ups (Figure 9A). Consistently, the cluster annotated as resting MG in Hoang et al. mapped to clusters annotated as non-reactive MG 1 and 2 in our dataset (new Figure 9B). The cluster annotated as activated MG in Hoang et al. mapped to clusters annotated as reactive MG 1 and 2, as well as to the cluster with hybrid identity of MG/progenitors in our dataset. Interestingly, some cells annotated as activated MG in Hoang et al. mapped also to neurogenic progenitor 2 and 3 clusters in our dataset (Figure 9B). The cluster annotated as progenitors in Hoang et al. mapped to the progenitor area in our dataset, which included neurogenic progenitors 2, 3 as well as photoreceptor and horizontal cell precursors (new Figure 9B).
      Finally, retinal ganglion cells, cones, GABAergic amacrine cells and bipolar cells annotated in Hoang et al. perfectly mapped to retinal ganglion cells, cone, amacrine and bipolar cells in our dataset (new Figure 9B). While we did not detect a mature horizontal cell cluster, Hoang and colleagues annotated a horizontal cell cluster, whose cells mapped to reactive MG 2, MG/progenitors 1 and part of progenitors 3 in our dataset (new Figure 9B). Moreover, Hoang and colleagues annotated rod photoreceptors that mapped to progenitors 3, photoreceptor precursors, red and blue cones, horizontal cell precursors and bipolar cells in our dataset (new Figure 9B). Finally, due to the different cell isolation protocol, Hoang and colleagues annotated additional cell clusters that did not map to any cluster in our more selective dataset, and included oligodendrocytes, pericytes, retinal pigmented epithelial cells as well as vascular/endothelial cells (new Figure 9B). Next, we selected representative marker genes per cluster from our scRNAseq dataset and checked their expression in the dataset by Hoang and colleagues (Figure 9-figure supplement 1). The dot plot showing the expression of selected gene candidates per cluster further corroborated the large overlap between clusters annotated in the present study with those annotated in the study by Hoang and colleagues. These novel comparisons to the data of Hoang et al. are now included in the resubmitted version, and are described and discussed in an additional paragraph in the results (lines: 482-517) as well as discussion (lines: 766-807) sections.
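
      For readers unfamiliar with this kind of cluster-to-cluster mapping, the underlying idea can be sketched as a nearest-neighbour label transfer: cells from one dataset are placed in the same low-dimensional space as annotated reference cells, and each takes the majority label of its closest reference cells. The sketch below is a minimal illustration in plain Python. It is not the Scanpy ingest implementation and not the code used for Figure 9; the coordinates, the label names and the function transfer_labels are hypothetical.

```python
from collections import Counter
from math import dist


def transfer_labels(ref_points, ref_labels, query_points, k=3):
    """Give each query point the majority label among its k nearest reference points."""
    if len(ref_points) != len(ref_labels):
        raise ValueError("ref_points and ref_labels must have the same length")
    predicted = []
    for q in query_points:
        # Rank reference cells by Euclidean distance to the query cell, keep the k closest.
        nearest = sorted(range(len(ref_points)), key=lambda i: dist(q, ref_points[i]))[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted


# Hypothetical 2-D embedding coordinates of annotated reference cells.
ref_points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
ref_labels = ["non-reactive MG"] * 3 + ["progenitor"] * 3

# Hypothetical cells from a second dataset, assumed to be projected into the same space.
query_points = [(0.1, 0.1), (5.1, 5.0)]
mapped = transfer_labels(ref_points, ref_labels, query_points, k=3)
print(mapped)
```

      In this toy example the first query cell lies among the reference cells labelled non-reactive MG and the second among those labelled progenitor, so each inherits the corresponding label. Real reference mapping additionally depends on how the shared embedding is computed, which this sketch leaves out.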

      MINOR CONCERNS

      1) Fig 1C is difficult to interpret. I am also confused by the color coding which is not presented in the figure legend - why 3 shades of red and two of blue? Please define each (for example, what's the difference between red, purple, and light red in the 6dpl panel?). What are the white areas outlined by blue and red circles/cells (looks like a topography plot)? It appears that there is a fairly large amount of pcna:EGFP expression in the uninjured retina - what are these cells?

      We have replaced Figure 1C with a clearer version and rephrased and extended the explanation of the figure in the results (lines: 192-195). Figure 1C depicts contour plots, which represent the relative frequency of data. Each contour line encloses an equal percentage of events (that is, cells), and contour lines that are closely packed indicate a high concentration of events. In flow cytometry, contour plots are used to represent highly frequent events, as this kind of plot is independent of sample size.
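
      The statement that contour lines enclose fixed percentages of events can be made concrete with a small sketch. Assuming a density estimate is available for every event, the level of the contour enclosing a fraction p of events is the density reached by the top fraction p of events when ranked from densest to sparsest. The function contour_levels and the density values below are hypothetical, and flow cytometry software uses its own density estimation and contouring, which are not reproduced here.

```python
def contour_levels(densities, fractions):
    """For each fraction p, return the density of the last event included when
    events are taken from densest to sparsest until a fraction p is enclosed
    (assumes no tied density values at the cut-off)."""
    if not densities:
        raise ValueError("densities must not be empty")
    ordered = sorted(densities, reverse=True)
    n = len(ordered)
    levels = []
    for p in fractions:
        if not 0 < p <= 1:
            raise ValueError("fractions must lie in (0, 1]")
        # Number of events enclosed by this contour (at least one).
        count = max(1, round(p * n))
        levels.append(ordered[count - 1])
    return levels


# Hypothetical per-event density estimates (arbitrary units) for ten events.
densities = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
fractions = [0.2, 0.4, 0.6, 0.8]
levels = contour_levels(densities, fractions)
print(levels)

# Doubling the number of events with the same density distribution
# leaves the contour levels unchanged.
print(contour_levels(densities * 2, fractions) == levels)
```

      Because the levels are defined by fractions of events rather than by absolute counts, a sample with twice as many events drawn from the same distribution gives the same contours, which is the sense in which such plots do not depend on sample size.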

      Concerning the observed pcna:EGFP expressing cells in the uninjured retina, we interpret them as proliferating cells coming from the ciliary marginal zone and from Müller glia of the central retina, which represent progenitors and Müller glia that have re-entered the cell cycle to generate rod progenitors, respectively. Consistent with that, we observe pcna:EGFP-positive cells in the ciliary marginal zone as well as central retina using immunofluorescence, as shown in Figure 1-figure supplement 1.

      2) Results, lines 186-188 are not presented clearly: EGFP+ cells may persist for some time after they leave the cell cycle, so stating EGFP+ cells are proliferating may not be correct. How long does PCNA promoter activity and EGFP expression remain after Muller cells exit the cell cycle? mCherry+/EGFP- cells may be non-reactive Muller glia or reactive Muller glia that have not entered the cell cycle. It seems likely that Muller glia start reprogramming before undergoing cell division.

      We agree with the reviewer that EGFP persists for some time after the cells have left the cell cycle, which we actually describe and use to our benefit in our study. We do not know for how long exactly the pcna promoter is active within the cell cycle, but EGFP is known to have a half-life of approximately 24 hours (Li et al., 1998). Even though we cannot make a statement about EGFP persistence in Müller glia, we note that previous reports (Lahne et al., 2015; Nagashima et al., 2013; Nelson et al., 2013; Thummel et al., 2008) and our study (Figure 3-figure supplement 2) show PCNA at the protein level in Müller glia cells between 24 and 48 hpl, including our sampled 44 hpl time point (lines: 69-73). We also agree with the reviewer that Müller glia will most likely become reactive to the injury prior (lines: 67-69) to activation of the pcna promoter, meaning that Müller glia are EGFP-negative at this time point due to the immature status of EGFP after translation. However, we are confident that our data also comprise this cell state (early phase of Müller glia activation) because we sampled proliferating (EGFP- and mCherry-double positive cells) as well as non-proliferating Müller glia (mCherry-only positive cells) at all time points (lines: 213-215 and Figure 1C). We interpret that the early phase of Müller glia activation corresponds to Müller glia transitioning from a nonreactive to a reactive state. With respect to our UMAP, we map this cell state in cluster 1 localizing to the top left part of the cluster, abutting cluster 3, the reactive Müller glia 1 (Figure 2B).
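
      To give a sense of scale for the cited half-life, the sketch below computes the fraction of EGFP expected to remain once promoter activity has stopped. It assumes simple first-order decay with a 24-hour half-life and no new synthesis; these are illustrative assumptions, the function fraction_remaining is hypothetical, and persistence in Müller glia specifically was not measured.

```python
def fraction_remaining(hours, half_life_hours=24.0):
    """Fraction of protein left after `hours` of first-order decay with no new synthesis."""
    if hours < 0 or half_life_hours <= 0:
        raise ValueError("hours must be >= 0 and half_life_hours must be > 0")
    return 0.5 ** (hours / half_life_hours)


# Remaining fraction one, two and four half-lives after expression stops.
for t in (24, 48, 96):
    print(f"{t} h: {fraction_remaining(t):.4f}")
```

      Under these assumptions, half of the protein remains after 24 hours, a quarter after 48 hours and one sixteenth after 96 hours, which illustrates why reporter signal can outlast promoter activity by days.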

      3) I am concerned by the observation that microglia were identified by scRNAseq as a contaminating cell population. Since FACS was based on gfap:mCherry expression, why did microglia end up in the mix? Also, what are the ‘...low-quality cells expressing many ribosomal transcripts...’ and why, if they are low-quality cells, did they pass the sequencing quality control as stated on lines 208-209?

      The reviewer is right that microglia should actually not end up in the sample when using the gfap:mCherry line. However, microglia always displayed a certain level of autofluorescence in our experimental set-up (possibly because they may have ingested some cell debris), which may have contributed to their presence in the FACS samples. In contrast to the reviewer, we were not concerned about this ‘contamination’, because the microglia could be easily identified and sorted out using bioinformatics. This is supported by the presented supplementary figure in which microglia separate from the core of clusters containing Müller glia and Müller glia-derived cells in the UMAP of the full dataset (Figure 2-figure supplement 1).

      We also acknowledge that ‘low quality cells’ is not an appropriate term for cells in the cluster expressing ribosomal mRNAs at high levels, as ribosomal enrichment actually does not give any information concerning their quality. We referred to them as ‘low quality’ because the enrichment in ribosomal transcripts masks their identity considerably. To correct this, we now renamed cells in this cluster descriptively as ‘ribosomal gene-enriched’ cells (Figure 2-figure supplement 1, line: 226).

      4) Fig. 2: please list in the text or fig legend the specific genes used to identify each cell cycle state. Why is cluster 3 considered a reactive Muller population when expressing S phase markers and PCNA? These features seem to distinguish cluster 3 from 4 and may suggest cluster 3 is a progenitor population. Further explanation is necessary to understand the assignments.

      Information about the specific genes used to identify each cell cycle state is provided in the paragraph “Bioinformatic analysis” (lines: 925-934) in the Materials and Methods section. We considered listing all the markers in either the results or the figure legends as well, but decided against it, as it impairs readability in our opinion. Nevertheless, we have now highlighted also in the results (line: 261) that the list of cell cycle markers is available in the Materials and Methods section.

      We understand the reviewer's point that cluster 3 represents progenitors and not Müller glia, when PCNA expression is considered as a sole marker of progenitors or of Müller glia reprogrammed to a progenitor state (Hoang et al., 2020). However, we disagree with this view for three reasons. First, although PCNA is used as a marker of Müller glia reprogrammed to a progenitor state and of progenitors in Hoang et al., 2020, it should be noted that PCNA-positive Müller glia cells are present in the central retina already in uninjured conditions, when regeneration-associated, Müller glia-derived progenitors are not detectable. Second, cluster 3 is evident only at 44 hpl, a time point at which Müller glia cells are about to divide or have undergone their first and only cell division. In this regard, we would like to refer to the discussion about Müller glia and Müller glia-derived progenitors as distinct populations in Lenkowski and Raymond, 2014. Third, we have performed in situ hybridization for starmaker (stm), a marker gene highly specific for cells in cluster 3 (supplementary files 1 and 3), combined with immunohistochemistry for GFAP and PCNA. The results of the staining are depicted in a new Figure 3-figure supplement 2. In strong agreement with our sequencing results, we observe stm expression only at 44 hpl, whereas no signal is detected in the uninjured as well as 4 and 6 dpl retina (Figure 3-figure supplement 2). Virtually all stm-positive cells at 44 hpl are also PCNA- and GFAP-double positive cells displaying a clear Müller glia morphology (Figure 3-figure supplement 2). Hence, we interpret cells in cluster 3 as reactive Müller glia, indicating that pcna can be used as a marker of progenitors, but not exclusively of progenitors, prevalently at later stages. At 44 hpl, Müller glia express pcna in order to undergo asymmetric cell division giving rise to the renewed Müller glia and the multipotent progenitor that will continue dividing.

      5) I am confused by the crlf1a scRNAseq data indicating it is associated with proliferating PCNA+ reactive Muller glia Cluster 3 and PCNA- reactive Muller glia Cluster 4 at 44 hpl (Fig. 3), yet in Fig. 4 crlf1a in situ signal is exclusively associated with proliferating Muller glia at 44 hpl. Why don't we observe the crlf1a+/PCNA- cell population?

      We highlight that crlf1a expression is actually detected also at 4 dpl (Fig. 3). We also note that immunofluorescence in Fig. 3 shows crlf1a mRNA and PCNA protein, whereas single cell RNA sequencing detects crlf1a and pcna transcripts. In this context, it is possible that crlf1a-, PCNA-double positive cells detected at 4 dpl are still positive for the PCNA protein, but no longer express the pcna transcript. Double in situ hybridization for pcna and crlf1a would be needed to fully address whether crlf1a-positive cells are still pcna-positive at 4 dpl. It is also possible that crlf1a-, GFAP-double positive, PCNA-negative Müller glia are fewer and only masked in the crowd of crlf1a-, PCNA-double positive, GFAP-negative progenitors at 4 dpl (Raymond et al., 2006). We amended the discussion section with this information (lines: 634-654).

      6) scRNAseq cluster 3 is a proliferating population that is assigned "reactive Muller glia", whereas cluster 5 is assigned Muller glia/progenitor and in the Discussion referred to as MG about to go or already underwent asymmetric division to generate a progenitor (lines 568-571). I don't understand why cluster 3 is not referred to as the one harboring reactive MG/progenitors that underwent or are undergoing asymmetric cell division - The timing is right, as are the markers.

      We would like to refer the reviewer to the discussion in point 4, including the changes we introduced in the Materials and Methods (Lines 925-934). As mentioned above, we do not agree that PCNA alone represents an exclusive marker of progenitors, but is rather a marker of cells undergoing proliferation. Moreover, we note that Müller glia first and only division occurs between 31 and 48 hpl. Finally, as mentioned above, expression of stm is a unique marker for cluster 3, which is only evident at 44 hpl, but not of cluster 5, which is evident at 4 dpl.

      It seems cluster 5 might better fit the amplifying progenitor stage where some MG markers are retained but diluted by cell division. Please clarify the reasoning behind the labeling of this cluster. It is not clear why this cluster has to contain self-renewed Muller glia - why wouldn't these Muller cells partition to quiescent MG clusters 1 and 2 or reactive Muller glia in clusters 3 and 4?

      We partially agree with the reviewer that cluster 5 might better fit the amplifying progenitor state, and this is why we indicate this cluster as a “crossroad in the trajectory” in the discussion (lines: 613-631). However, we cannot entirely exclude that cells in cluster 5 contain self-renewed Müller glia (differential gene expression analysis highlights glial markers too, see Figure 3A, supplementary file 6). Cells that we interpret as self-renewing Müller glia do not partition back to quiescent Müller glia (cluster 1 and 2) because they are on the way to becoming quiescent Müller glia again, yet they did not reach that point, maybe due to sampling reasons. Unfortunately, our short-term lineage tracing strategy ceases at 6 dpl. We also speculate in the discussion (lines: 679-682) that if we had sampled at later time points (e.g. at 14 dpl), we might have been able to detect the density of the cells in the glial area moving back to clusters 1 or 2 (cell density plots, Figure 2B).

      I also have trouble understanding cluster 4's assignment. The Discussion states it represents cells at the crossroad of glial and neurogenic trajectory containing self-renewed Muller glia as well as first-born MG-derived progenitors. However, it is populated by cells after 44 hpl (Fig. 2B) which is when reactive Muller glia are detected and lacks proliferative markers.

      We think that there is a misunderstanding here. We never refer to cluster 4 as a crossroad in the glial and neurogenic trajectory. We state that cluster 5 is actually the crossroad between the two trajectories (line 629). We further propose that self-renewed MG close the cycle via late reactive MG (cluster 4) and return into non-reactive Müller glia (clusters 1 and 2, red, dashed line in Figure 10) (now described in lines 631-633). The cell density plots support the direction of the cycle closing towards non-reactive Müller glia, in particular at 4 and 6 dpl (Figure 2B).

      Might cluster 4 represent a population of reactive MG remaining at 4 dpl that never entered the cell cycle and therefore would be devoid of Muller glia-derived progenitors?

      As stated in the manuscript, we actually think that marker expression as well as the cell density plots support our assignment of cluster 4 to represent self-renewed Müller glia closing the cycle to non-reactive Müller glia. Our assignment also fits well with the expected events following asymmetric cell division. However, as we cannot entirely rule out the reviewer's idea, we included the suggestion in the updated discussion (lines 651-654).

      7) Results, lines 163-164; Please provide a reference for "..... consistent with the previously described....."

      We thank the reviewer for this observation and we added the appropriate references (Fimbel et al., 2007; Lenkowski and Raymond, 2014; Thummel et al., 2008) in the updated version of the manuscript (lines: 171-172).

      Reviewer #2 (Recommendations For The Authors):

      Overall, this very thorough study provides interesting and unexpected results. The published data set will be useful for many subsequent studies. I have only a few remarks that the authors may consider discussing. Their cluster analysis revealed most of the expected cell clusters with some interesting surprises. One relates to photoreceptors where the authors describe well-separated clusters for red and green cones, while rods, UV and blue cones do not form clusters. For rods, this is discussed, but I miss a brief discussion on the "missing" cone subtypes.

      We thank the reviewer for the insightful comments. It is correct that we indeed detect only red and blue cones, as indicated by their expression of the red-sensitive opsin gene (opn1lw2) and the blue-sensitive opsin gene (opn1sw2), respectively. It is possible that missing cone subtypes are born later than 6 dpl. As the reviewer suggested, we amended the discussion and added information about the missing cone subtypes (lines: 724-726).

      I am also intrigued by the two, quite separated amacrine cell clusters, while bipolar cells cluster in one cluster, without separation in (say) ON and OFF bipolar cells. This may also merit a discussion. What are their ideas on the small and quite separated amacrine cell cluster (cluster 14).

      Bipolar cells in cluster 15 are very sparse in our dataset, with only 40 cells in total. Hence, the sample size might be too small to be separated into ON and OFF subtypes. Alternatively, cells might be still immature, as we use 6 dpl as our latest sampled time point. Concerning cells in cluster 14, we think they are starburst amacrine cells, as indicated by their simultaneous expression of gad1b and chata (Figure 8-figure supplement 2B), which is a characteristic feature of starburst amacrine cells in mouse (O'Malley et al., 1992). We added this observation in the discussion (lines: 706-712).

    1. Author Response

      The following is the authors’ response to the original reviews.

      The Authors wish to thank the Reviewers for their detailed and insightful comments. By properly addressing these critiques, we sincerely believe our finished product will be substantially improved and provide greater insight to the academic community.

      Both Reviewers noted the importance of identifying the limitations of our study with particular emphasis on embedded implant heating due to switching gradient coils. Understanding the limitations of any model and/or simulation process is critical when adopting its use, especially when estimating the safety of embedded devices. For this reason, we have included the following text and corresponding references in our Discussion section:

      While the workflow presented herein establishes a validated approach to estimate RF heating due to the presence of a passive implant within a human subject undergoing an MR procedure, certain limitations and proper use stipulations of this methodology should be identified. These include:

      1) The approach of embedding a given passive implant must be carefully considered and supervised by an orthopaedic subject matter expert, preferably an orthopaedic surgeon. While the procedures described above focus on insertion and registration of an implant to make it numerically suitable for simulation, relevant anatomic and physiological considerations must also be addressed to ensure a physically realistic and appropriate result. This will enable a proper simulated fit and no empty spaces or unintended tissue deformations.

      2) Temperature changes presented are due only to RF energy deposition. The results do not take into account the impact of low-frequency induction heating of metallic implants naturally caused by the switching gradient coils. Important work on this subject matter has recently been reported in [21],[22],[23],[24],[25],[26],[27]. Unless an orthopaedic implant has a loop path, heating due to gradient fields is typically less than heating due to RF energy deposition. The present testbed would be applicable to the induction heating of implants (and the expected temperature rise of nearby tissues), after switching from Ansys HFSS (the full wave electromagnetic FEM solver) to Ansys Maxwell (the eddy current FEM solver). Two examples of this kind have already been considered in [25],[45].

      3) The procedures presented in this work have been based on the response of a single human model of advanced age and high morbidity.

      4) Finally, validation was achieved using available published data [42]-[44] and relies upon the legitimacy and veracity of that data. Coil geometry, power settings, and other relevant parameters were taken explicitly from these sources and modeled to enable a faithful comparison.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Heitmann et al introduce a novel method for predicting the potential of drug candidates to cause Torsades de Pointes using simulations. Despite the fact that a multitude of such methods have been proposed in the past decade, this approach manages to provide novelty in a way that is potentially paradigm-shifting. The figures are beautiful and manage to convey difficult concepts intuitively.

      Strengths:

      (1) Novel combination of detailed mechanistic simulations with rigorous statistical modeling

      (2) A method for predicting drug safety that can be used during drug development

      (3) A clear explication of difficult concepts.

      Weaknesses:

      (1) In this reviewer's opinion, the most important scientific issue that can be addressed is the fact that when a drug blocks multiple channels, it is not only the IC50 but also the Hill coefficient that can differ. By the same token, two drugs that block the same channel may have identical IC50s but different Hill coefficients. This is important to consider since concentration-dependence is an important part of the results presented here. If the Hill coefficients were to be significantly different, the concentration-dependent curves shown in Figure 6 could look very different.

      See our response below.

      (2) The curved lines shown in Figure 6 can initially be difficult to comprehend, especially when all the previous presentations emphasized linearity. But a further issue is obscured in these plots, which is the fact that they show a two-dimensional projection of a 4-dimensional space. Some of the drugs might hit the channels that are not shown (INaL & IKs), whereas others will not. It is unclear, and unaddressed in the manuscript, how differences in the "hidden channels" will influence the shapes of these curves. An example, or at least some verbal description, could be very helpful.

      See our response below.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is generally well-written (with one important exception, see below). The manuscript can be improved with a few suggested modifications, ordered from most important to least important.

      (1) In this reviewer's opinion, the most important scientific issue that the authors need to address is the fact that when a drug blocks multiple channels, it is not only the IC50 but also the Hill coefficient that can differ. By the same token, two drugs that block the same channel may have identical IC50s but different Hill coefficients. This is important to consider since concentration-dependence is an important part of the results presented here.

      In a recent study (Varshneya et al, CPT PSP 2021 (PMID: 33205613)) they originally ran simulations with Hill coefficients of 1 for all the 4 drugs and 7 channels, then re-ran the simulations with differing Hill coefficients. The results were quantitatively quite different than what was originally obtained, even though the overall trends were identical. A look at the table provided in that paper's supplement shows that the estimated Hill coefficients range from 0.5 to 1.9, which is a pretty wide range.

      In this case, I don't think the authors should re-run the entire analysis. That would require entirely too much work and potentially detract from the elegant presentation of the manuscript in its current form. Although I haven't looked at the Llopis-Lorente dataset recently, I doubt that reliable Hill coefficients have been obtained for all 105 drugs. However, the Crumb et al dataset (PMID: 27060526) does provide this information for 30 drugs.

      Perhaps the authors could choose an example of two drugs that affect similar channels but with differences in the estimated Hill coefficients. Or even a carefully-designed hypothetical example could be of value. At the very least, Hill coefficients need to be mentioned as a limitation, but this would be stronger if it were coupled with at least some novel analyses.

      We fixed the Hill coefficients to h=1 because there is no evidence for co-operative drug binding in the literature that would require coefficients other than one. There is also the practical matter that only 17 of the 109 drugs in the dataset have a complete set of Hill coefficients. We have revised the Methods (Drug datasets) to make these justifications explicit:

      Lines 560-566: “… We also fixed the Hill coefficients at h = 1 because (i) there is no evidence for co-operative drug binding in the literature, and thus no theoretical justification for using coefficients other than one; (ii) only 17 of the 109 drugs in the dataset had a complete set of Hill coefficients (hCaL, hKr, hNaL, hKs) anyway. …”

      Out of interest, we re-ran our analysis using only those n=17 drugs (Amiodarone, Amitriptyline, Bepridil, Chlorpromazine, Diltiazem, Dofetilide, Flecainide, Mibefradil, Moxifloxacin, Nilotinib, Ondansetron, Quinidine, Quinine, Ranolazine, Saquinavir, Terfenadine and Verapamil). When the Hill coefficients were fixed at h=1, the prediction accuracy was 88.2% irrespective of the dosage (Author response image 1). When we used the estimated (free) Hill coefficients, the prediction accuracy remained unchanged (88.2%) for all doses except the lowest (1x to 2x) where it dropped to 82.4%. We concluded that using the Hill coefficients from the dataset made little difference to the results.

      Author response image 1.
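
      The role of the Hill coefficient discussed above can be illustrated with a minimal sketch, assuming the standard Hill equation for fractional block. The function names, the IC50 value, and the concentrations below are hypothetical and are not taken from the manuscript or its datasets; only the Hill-coefficient range (0.5 to 1.9) and the 17-drug subset size come from the text above. The final line checks that the reported accuracies of 88.2% and 82.4% are consistent with 15 and 14 correct classifications out of 17, which is an inference from the percentages rather than a count stated by the authors.

```python
def fraction_blocked(conc, ic50, hill=1.0):
    """Fraction of a channel blocked at concentration `conc` (standard Hill equation)."""
    return 1.0 / (1.0 + (ic50 / conc) ** hill)


def accuracy_percent(n_correct, n_total):
    """Classification accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * n_correct / n_total, 1)


if __name__ == "__main__":
    ic50 = 100.0  # hypothetical IC50, arbitrary concentration units
    # Compare h = 1 (the fixed value) with the extremes of the range quoted by the reviewer.
    for conc in (10.0, 100.0, 1000.0):  # hypothetical concentrations
        blocks = [fraction_blocked(conc, ic50, h) for h in (0.5, 1.0, 1.9)]
        print(conc, [round(b, 3) for b in blocks])
    # Assumed counts (15/17 and 14/17) that would reproduce the reported percentages.
    print(accuracy_percent(15, 17), accuracy_percent(14, 17))
```

      In this sketch the block is exactly one half when the concentration equals the IC50, whatever the Hill coefficient; below the IC50 a smaller coefficient gives more block, and above it less. The coefficient therefore changes the shape of the concentration-response curve without moving its midpoint.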

      (2) I initially had a hard time understanding the curved lines shown in Figure 6 when all the previous presentations emphasized linearity. After thinking for a while, I was able to get it, but there was a further issue that I still struggle with. That is the fact that the plots all show a two-dimensional projection of a 4-dimensional space. Some of the drugs might hit the channels that are not shown (INaL & IKs), whereas others will not. How will differences in the "hidden channels" influence the shapes of these curves? An example, or at least some verbal description, could be very helpful.

      We omitted GKs and GNaL from Figure 6 because they added little to the story. Those “hidden” channels operate in the same manner as GKr and GNaL. They are shown in Supplementary Dataset S1. We have included more explicit references to the Supplementary in both the main text and the caption of Figure 6. We have also rewritten the section on ‘The effect of dosage on multi-channel block’ (lines 249-268) to better convey that the drug acts in four dimensions.

      (3) I also struggled a bit with Figure 3 and the section "Drug risk metric." What made this confusing was the PQR notation on the figure and the equations represented as A and B. Can these be presented in a common notation, or can the relationship be defined?

      We have replaced the PQR notation in Figure 3A with vector notation A and B to be consistent with the equations.

      Also in Figure 3B, I was unclear about the units on the x-axis. Is each step (e.g. from 0 to 1) the same distance as a single log unit along the abscissa or ordinate in Figure 3A?

      Yes it is. We have revised the caption for Figure 3B to explain it better.

      (4) The manuscript manages to explain difficult concepts clearly, and it is generally well-written. The important exception, however, is that the manuscript contains far too many sentence fragments. These often occur when the authors explain a difficult concept, then follow up with something that is essentially "and this in addition" or "with the exception of this."

      Lines 220-223: "In comparison, Linezolid is an antibacterial agent that has no clinical evidence of Torsades (Class 4) even though it too blocks IKr. Albeit less than it blocks ICaL (Figure 5A, right)."

      Lines 242-245: "Conversely, Linezolid shifts the population 1.18 units away from the ectopic regime. So only 0.0095% of those who received Linezolid would be susceptible. A substantial drop from the baseline rate of 0.93%."

      There are several others that I didn't note, so the authors should perform a careful copy edit of the entire manuscript.

      Thank you. We have remediated the fragmented sentences throughout.

      Reviewer #2 (Public Review):

      Summary:

      In the paper from Heitmann, Vandenberg, and Hill entitled "Assessing drug safety by identifying the axis of arrhythmia in cardiomyocyte electrophysiology", the authors define a new metric, the "axis of arrhythmia", that essentially describes the parameter space of ion channel conductance combinations where early afterdepolarizations can be observed.

      Strengths:

      There is an elegance to the way the authors have communicated the scoring system. The method is potentially useful because of its simplicity, accessibility, and ease of use. I do think it adds to the field for this reason - a number of existing methods are overly complex and unwieldy and not necessarily better than the simple parameter regime scan presented here.

      Weaknesses:

      The method described in the manuscript suffers from a number of weaknesses that plague current screening methods. Included in these are the data quality and selection used to inform the drug-blocking profile. It's well known that drug measurements vary widely, depending on the measurement conditions.

      We agree and have added a new section to describe these limitations, as follows:

      Lines 467-478: Limitations. The method was evaluated using a dataset of drugs that were drawn from multiple sources and diverse experimental conditions (Llopis-Lorente et al., 2020). It is known that such measurements differ prominently between laboratories and recording platforms (Kramer et al., 2020). Some drugs in the dataset combined measurements from disparate experiments while others had missing values. Of all the drugs in the dataset, only 17 had a complete set of IC50 values for ICaL, IKr, INaL and IKs. The accuracy of the predictions is therefore limited by the quality of the drug potency measurements.

      There doesn't seem to be any consideration of pacing frequency, which is an important consideration for arrhythmia triggers, resulting from repolarization abnormalities, but also depolarization abnormalities.

      It is true that we did not consider the effect of pacing frequency. We have included this in the limitations:

      Lines 479-485: The accuracy of the axis of arrhythmia is likewise limited by the quality of the biophysical model from which it is derived. The present study only investigated one particular variant of the ORd model (O’Hara et al., 2011; Krogh-Madsen et al., 2017) paced at 1 Hz. Other models and pacing rates are likely to produce differing estimates of the axis.

      Extremely high doses of drugs are used to assess the population risk. But does the method yield important information when realistic drug concentrations are used?

      Yes it does. The drugs were assessed across a range of doses from 1x to 32x therapeutic dose (Figure 8A). The prediction accuracy at low doses is 88.1%.

      In the discussion, the comparison to conventional approaches suggests that the presented method isn't necessarily better than conventional methods.

      The comparison is not just about accuracy. Our method achieves the same results at greatly reduced computational cost without loss of biophysical interpretation. We emphasise this in the Conclusion:

      Lines 446-465: Conclusion. Our approach resolves the debate between model complexity and biophysical realism by combining both approaches into the same enterprise. Complex biophysical models were used to identify the relationship between ion channels and torsadogenic risk — as it is best understood by theory. Those findings were then reduced to a simpler linear model that can be applied to novel drugs without recapitulating the complex computer simulations. The reduced model retains a bio-physical description of multi-channel drug block, but only as far as necessary to predict the likelihood of early after-depolarizations. It does not reproduce the action potential itself. Our approach thus represents a convergence of biophysical and simple models which retains the essential biophysics while discarding the unnecessary details. We believe the benefits of this approach will accelerate the adoption of computational assays in safety pharmacology and ultimately reduce the burden of animal testing.

      In conclusion, I have struggled to grasp the exceptional novelty of the new metric as presented, especially when considering that the badly needed future state must include a component of precision medicine.

      Safety pharmacology has a different aim to precision medicine. The former concerns the population whereas the latter concerns the individual. The novelty of our metric lies in reducing the complexity of multi-channel drug effects to a linear model that retains a biophysical interpretation.

      Reviewer #2 (Recommendations For The Authors):

      A large majority of drugs have more complex effects than a simple reduction in channel conductance. Some of these are included in the 109 drugs shown in Figure 7. An example is ranolazine, which is well known to have potent late sodium channel blocking effects - how are such effects included in the model as presented? I think at least suggesting how the approach can be expanded for broader applicability would be important to discuss.

      Our method does consider the simultaneous effect of the drug on multiple ion channels, specifically the L-type calcium current (ICaL), the delayed rectifier potassium currents (IKr and IKs), and the late sodium current (INaL). In the case of ranolazine (class 3 risk), the dose-responses for all four ion channels, based on IC50s published in Llopis-Lorente et al. are given in Supplementary Dataset S1.

      The response curves in Author response image 2 show that in this dataset, ranolazine blocks IKr and INaL almost equally - being only slightly less potent against IKr. There are two issues to consider here that potentially contribute to ranolazine being misclassified as pro-arrhythmic. First, the cell model is more sensitive to block of IKr than INaL. As a result, in the context of an equipotent drug, the prolonging effect of IKr block outweighs the balancing effect of INaL block, resulting in a pro-arrhythmic risk score. Second, the potency of IKr block in this dataset may be overestimated, which in turn exaggerates the risk score. For example, measurements of ranolazine block of IKr from our own laboratory (Windley et al. J Pharmacol Toxicol 87, 99–107, 2017) suggest that the IC50 of IKr is higher (35700 nM) than that reported in the Llopis-Lorente dataset (12000 nM). If this were taken into account, there would be less block of IKr relative to INaL, resulting in a safer risk score.

      Author response image 2.
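
      The effect of the two IKr potency estimates quoted above can be made concrete with a short sketch. It assumes a Hill equation with the coefficient fixed at h = 1, as the authors describe for their analysis, and uses only the two IC50 values given in the response (12000 nM and 35700 nM). The function names and the concentrations in the loop are hypothetical illustrations, not values from the manuscript.

```python
IC50_DATASET_NM = 12000.0  # IKr IC50 for ranolazine quoted from the Llopis-Lorente dataset
IC50_WINDLEY_NM = 35700.0  # IKr IC50 for ranolazine quoted from Windley et al.


def fraction_blocked(conc_nm, ic50_nm, hill=1.0):
    """Fraction of current blocked at `conc_nm` (Hill equation, h = 1 by default)."""
    return 1.0 / (1.0 + (ic50_nm / conc_nm) ** hill)


def compare_ikr_block(conc_nm):
    """Return the IKr block predicted by each IC50 estimate at the same concentration."""
    return (
        fraction_blocked(conc_nm, IC50_DATASET_NM),
        fraction_blocked(conc_nm, IC50_WINDLEY_NM),
    )


if __name__ == "__main__":
    for conc_nm in (1000.0, 12000.0, 35700.0):  # hypothetical concentrations in nM
        dataset_block, windley_block = compare_ikr_block(conc_nm)
        print(conc_nm, round(dataset_block, 3), round(windley_block, 3))
```

      Under these assumptions the higher IC50 always predicts less IKr block at a given concentration, for example about 25% rather than 50% at 12000 nM, which is the direction of the argument made in the response.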

    1. Author Response

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Comments on the original submission:

      Trypanosoma brucei undergoes antigenic variation to evade the mammalian host's immune response. To achieve this, T. brucei regularly expresses different VSGs as its major surface antigen. VSG expression sites are exclusively subtelomeric, and VSG transcription by RNA polymerase I is strictly monoallelic. It has been shown that T. brucei RAP1, a telomeric protein, and the phosphoinositol pathway are essential for VSG monoallelic expression. In previous studies, Cestari et al. (ref. 24) have shown that PIP5pase interacts with RAP1 and that RAP1 binds PI(3,4,5)P3. RNAseq and ChIPseq analyses have been performed previously in PIP5pase conditional knockout cells, too (ref. 24). In the current study, Touray et al. did similar analyses except that a catalytic-dead PIP5pase mutant was used and the DNA and PI(3,4,5)P3 binding activities of RAP1 fragments were examined. Specifically, the authors examined the transcriptome profile and did RAP1 ChIPseq in the PIP5pase catalytic-dead mutant. The authors also expressed several C-terminal His6-tagged RAP1 recombinant proteins (full-length, aa 1-300, aa 301-560, and aa 561-855). These fragments' DNA binding activities were examined by EMSA analysis and their phosphoinositide binding activities were examined by affinity pulldown of biotin-conjugated phosphoinositides. As a result, the authors confirmed that VSG silencing (both BES-linked and MES-linked VSGs) depends on PIP5pase catalytic activity, but the overall knowledge improvement is incremental. The most convincing data come from the phosphoinositide binding assay as it clearly shows that the N-terminus of RAP1 binds PI(3,4,5)P3 but not PI(4,5)P2, although this is only assayed in vitro, while the in vivo binding of full-length RAP1 to PI(3,4,5)P3 has already been published by Cestari et al. (ref. 24).

      Considering that many phosphoinositides exert their regulatory role by modulating the subcellular localization of their bound proteins, it is reasonable to hypothesize that binding to PI(3,4,5)P3 can remove RAP1 from the chromatin. However, no convincing data have been shown to support the authors' hypothesis that this regulation is through an "allosteric switch".

      Comments on revised manuscript:

      In this revised manuscript, Touray et al. have responded to reviewers' comments with some revisions satisfactorily. However, the authors still haven't addressed some key scientific rigor issues, which are listed below:

      1) It is critical to clearly state whether the observations are made for the endogenous WT protein or the tagged protein. It is good that the authors currently clearly indicate the results observed in vivo are for the RAP1-HA protein. However, this is not as clearly stated for in vitro EMSA analyses. In addition, in discussion, the authors simply assumed that the C-terminally tagged RAP1 behaves the same as WT RAP1 and all conclusions were made about WT RAP1.

      There are two choices here. The authors can validate that RAP1-HA still retains RAP1's essential function as a sole allele in T. brucei cells (as was recommended previously). Indeed, HA-tagged RAP1 has been studied before, but it is the N-terminally HA-tagged RAP1 that has been shown to retain its essential functions. Adding the HA tag to the C-terminus of RAP1 may well cause certain defects to RAP1. For example, N-terminally HA-tagged TERT does not complement the telomere shortening phenotype in TERT null T. brucei cells, while C-terminally GFP-tagged TERT does, indicating that HA-TERT is not fully functional while TERT-GFP likely has its essential functions (Dreesen, RU thesis). Although RAP1-HA behaves similarly to WT RAP1 in many ways, it is still not fully validated that this protein retains essential functions of RAP1. By the way, it has been published that cells lacking one allele of RAP1 behave as WT cells for cell growth and VSG silencing (Yang et al. 2009, Cell; Afrin et al. 2020, mSphere). In addition, although RAP1 may interact with TRF weakly, the interaction is direct, as shown in yeast 2-hybrid analysis in (Yang et al. 2009, Cell).

      Alternatively, if the authors do not wish to validate the functionality of RAP1-HA, they need to add one paragraph at the beginning of the discussion to clearly state that RAP1-HA may not behave exactly as WT RAP1. This is important for readers to better interpret the results. In addition, the authors need to tune down the current conclusions dramatically, as all described observations are made on RAP1-HA but not the WT RAP1.

      The results with RAP1-HA are consistent with previous knowledge of RAP1 interactions with telomeric proteins and DNA. Hence, the C-terminal HA-tagged RAP1 seems, by all measures, functional. Nevertheless, to make it clear for the reader, we added a note in the discussion, lines 244-246: “Although we showed that C-terminal HA-tagged RAP1 protein has telomeric localization (Cestari et al. 2015, PNAS) and interactions with other telomeric proteins (Cestari et al. 2019 Mol Cell Biol), we cannot rule out potential differences between HA-tagged and non-tagged RAP1.”

      For a similar reason, the current EMSA results truly reflect how C-terminally His6-tagged RAP1 and RAP1 fragments behave. If the authors choose not to remove the His6 tag, it is essential that they use "RAP1-His6" to refer to these recombinant proteins. It is also essential for the authors to clearly state in the discussion that the tagged RAP1 fragments bind DNA, but the current data do not reveal whether WT RAP1 binds DNA. In addition, the authors incorrectly stated that "disruption of the MybL domain sequence did not eliminate RAP1-telomere binding in vivo" (lines 165-166). In ref 29, deletion of Myb domain did not abolish RAP1-telomere association. However, point mutations in MybL domain that abolish RAP1's DNA binding activities clearly disrupted RAP1's association with the telomere chromatin. Therefore, the current observation is not completely consistent with that published in ref 29.

      We stated in lines 149-150: “…we expressed and purified from E. coli recombinant 6xHis-tagged T. brucei RAP1 (rRAP1)”. To clarify, we replaced rRAP1 with rRAP1-His throughout the manuscript and figures. As for the statement that “disruption of the MybL domain sequence did not eliminate RAP1-telomere binding in vivo” (lines 165-166), we removed it from the manuscript.

      2) There is no evidence, in vitro or in vivo, that binding PI(3,4,5)P3 to RAP1 causes conformational change in RAP1. The BRCT domain of RAP1 is known for its ability to homodimerize (Afrin et al. 2020, mSphere). It is therefore possible that binding of PI(3,4,5)P3 to RAP1 simply disrupts its homodimerization function. The authors clearly have extrapolated their conclusions based on available data. It is therefore important to revise the discussion and make appropriate statements.

      We did not state that PI(3,4,5)P3 causes RAP1 conformational changes. We discussed the possibility. We stated in lines 199-201: “PI(3,4,5)P3 inhibition of RAP1-DNA binding might be due to its association with RAP1 N-terminus causing conformational changes that affect Myb and MybL domains association with DNA.” This is a reasonable discussion, given the data presented in the manuscript.

      Reviewer #2 (Public Review):

      In this manuscript, Touray et al investigate the mechanisms by which PIP5Pase and RAP1 control VSG expression in T. brucei and demonstrate an important role for this enzyme in a signalling pathway that likely plays a role in antigenic variation in T. brucei. While these data do not definitively show a role for this pathway in antigenic variation, the data are critical for establishing this pathway as a potential way the parasite could control antigenic variation and thus represent a fundamental discovery.

      The methods used in the study are generally well-controlled. The authors provide evidence that RAP1 binds to PI(3,4,5)P3 through its N-terminus and that this binding regulates RAP1 binding to VSG expression sites, which in turn regulates VSG silencing. Overall their results support the conclusions made in the manuscript. Readers should take into consideration that the epitope tags on RAP1 could alter its function, however.

      There are a few small caveats that are worth noting. First, the analysis of VSG derepression and switching in Figure 1 relies on a genome which does not contain minichromosomal (MC) VSG sequences. This means that MC VSGs could theoretically be mis-assigned as coming from another genomic location in the absence of an MC reference. As the origin of the VSGs in these clones isn't a major point in the paper, I do not think this is a major concern, but I would not over-interpret the particular details of switching outcomes in these experiments.

      We agree with the reviewer and thus made no speculations on minichromosomes. The data analysis must rely on the available genome, and the reference genome used is well-assembled with PacBio sequences and Hi-C data (Muller et al. 2018, Nature).

      Another aspect of this work that is perhaps important, but not discussed much by the authors, is the fact that signalling is extremely poorly understood in T. brucei. In Figure 1B, the RNA-seq data show many genes upregulated after expression of the Mut PIP5Pase (not just VSGs). The authors rightly avoid claiming that this pathway is exclusive to VSGs, but I wonder if these data could provide insight into the other biological processes that might be controlled by this signaling pathway in T. brucei.

      We published that the inositol phosphate pathway also plays a role in T. brucei development (Cestari et al. 2018, Mol Biol Cell; reviewed by Cestari I 2020, PLOS Pathogens).

      Overall, this is an excellent study which represents an important step forward in understanding how antigenic variation is controlled in T. brucei. The possibility that this process could be controlled via a signalling pathway has been speculated for a long time, and this study provides the first mechanistic evidence for that possibility.

      Reviewer #1 (Recommendations For The Authors):

      Please see the public review for recommendations.

      1. It is critical to clearly state whether the observations are made for the endogenous WT protein or the tagged protein. It is good that the authors currently clearly indicate the results observed in vivo are for the RAP1-HA protein. However, this is not as clearly stated for in vitro EMSA analyses. In addition, in discussion, the authors simply assumed that the C-terminally tagged RAP1 behaves the same as WT RAP1 and all conclusions were made about WT RAP1.

      There are two choices here. The authors can validate that RAP1-HA still retains RAP1's essential function as a sole allele in T. brucei cells (as was recommended previously). Indeed, HA-tagged RAP1 has been studied before, but it is the N-terminally HA-tagged RAP1 that has been shown to retain its essential functions. Adding the HA tag to the C-terminus of RAP1 may well cause certain defects to RAP1. For example, N-terminally HA-tagged TERT does not complement the telomere shortening phenotype in TERT null T. brucei cells, while C-terminally GFP-tagged TERT does, indicating that HA-TERT is not fully functional while TERT-GFP likely has its essential functions (Dreesen, RU thesis). Although RAP1-HA behaves similarly to WT RAP1 in many ways, it is still not fully validated that this protein retains essential functions of RAP1. By the way, it has been published that cells lacking one allele of RAP1 behave like WT cells for cell growth and VSG silencing (Yang et al. 2009, Cell; Afrin et al. 2020, mSphere). In addition, although RAP1 may interact with TRF weakly, the interaction is direct, as shown in yeast 2-hybrid analysis (Yang et al. 2009, Cell).

      Alternatively, if the authors do not wish to validate the functionality of RAP1-HA, they need to add one paragraph at the beginning of the discussion to clearly state that RAP1-HA may not behave exactly as WT RAP1. This is important for readers to better interpret the results. In addition, the authors need to tune down the current conclusions dramatically, as all described observations are made on RAP1-HA but not the WT RAP1.

      The results with RAP1-HA are consistent with previous knowledge of RAP1 interactions with telomeric proteins and DNA. Hence, the C-terminal HA-tagged RAP1 seems, by all measures, functional. Nevertheless, to make it clear for the reader, we added a note in the discussion, lines 244-246: “Although we showed that C-terminal HA-tagged RAP1 protein has telomeric localization (Cestari et al. 2015, PNAS) and interactions with other telomeric proteins (Cestari et al. 2019 Mol Cell Biol); we cannot rule out potential differences between HA-tagged and non tagged RAP1.”

      For a similar reason, the current EMSA results truly reflect how C-terminally His6-tagged RAP1 and RAP1 fragments behave. If the authors choose not to remove the His6 tag, it is essential that they use "RAP1-His6" to refer to these recombinant proteins. It is also essential for the authors to clearly state in the discussion that the tagged RAP1 fragments bind DNA, but the current data do not reveal whether WT RAP1 binds DNA. In addition, the authors incorrectly stated that "disruption of the MybL domain sequence did not eliminate RAP1-telomere binding in vivo" (lines 165-166). In ref 29, deletion of Myb domain did not abolish RAP1-telomere association. However, point mutations in MybL domain that abolish RAP1's DNA binding activities clearly disrupted RAP1's association with the telomere chromatin. Therefore, the current observation is not completely consistent with that published in ref 29.

      We stated in lines 149-150 “…we expressed and purified from E. coli recombinant 6xHis-tagged T. brucei RAP1 (rRAP1)”. To clarify this for the reader, we replaced rRAP1 with rRAP1-His throughout the manuscript text. As for the statement that “disruption of the MybL domain sequence did not eliminate RAP1-telomere binding in vivo” (lines 165-166), we removed it from the manuscript.

      2) There is no evidence, in vitro or in vivo, that binding PI(3,4,5)P3 to RAP1 causes conformational change in RAP1. The BRCT domain of RAP1 is known for its ability to homodimerize (Afrin et al. 2020, mSphere). It is therefore possible that binding of PI(3,4,5)P3 to RAP1 simply disrupts its homodimerization function. The authors clearly have extrapolated their conclusions based on available data. It is therefore important to revise the discussion and make appropriate statements.

      We did not state that PI(3,4,5)P3 causes RAP1 conformational changes. We discussed the possibility. We stated in lines 199-201: “PI(3,4,5)P3 inhibition of RAP1-DNA binding might be due to its association with RAP1 N-terminus causing conformational changes that affect Myb and MybL domains association with DNA.” This is a reasonable discussion, given the data presented in the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the valuable and constructive review of our manuscript. The reviewers’ comments have helped us to improve the quality of the paper. Here we provide detailed responses to the reviewers’ comments and discuss the new experiments we performed.

      Reviewer #1

      Summary:

      In this study, the authors generate a Drosophila model to assess disease-linked allelic variants in the UBA5 gene. In humans, variants in UBA5 have been associated with DEE44, characterized by developmental delay, seizures, and encephalopathy. Here, the authors set out to characterize the relationship between 12 disease-linked variants in UBA5 using a variety of assays in their Drosophila Uba5 model. They first show that human UBA5 can substitute all essential functions of the Drosophila Uba5 ortholog, and then assess phenotypes in flies expressing the various disease variants. Using these assays, the authors classify the alleles into mild, intermediate, and severe loss-of-function alleles. Further, the authors establish several important in vitro assays to determine the impacts of the disease alleles on Uba5 stability and function. Together, they find a relatively close correlation between in vivo and in vitro relationships between Uba5 alleles and establish a new Drosophila model to probe the etiology of Uba5-related disorders.

      Strengths:

      Overall, this is a convincing and well-executed study. There is clearly a need to assess disease-associated allelic variants to better understand human disorders, particularly for rare diseases, and this humanized fly model of Uba5 is a powerful system to rapidly evaluate variants and relationships to various phenotypes. The manuscript is well written, and the experiments are appropriately controlled.

      Recommendations For The Authors:

      1) It would seem of value to determine what tissue(s) the essential function of Uba5 resides. The authors nicely detail the expression of Uba5 in a subset of neurons and glia, and indicate it is expressed in a variety of other tissues. Null mutants are embryonic lethal, suggesting an essential function. From the mouse study cited, it appears Uba5 functions early in development in the hematopoietic system. The authors can express their UAS-Uba5 rescue construct using a variety of tissue-specific Gal4 lines to determine whether the essential function of Uba5 is in the nervous system or other tissues, which would be of interest in understanding key functions of Uba5.

      We thank the reviewer for the suggestion. We tried to rescue the lethality of the Uba5 mutants by expressing human UBA5 reference protein in different tissues. We found that ubiquitous expression of UBA5 (da-GAL4 or act-GAL4) successfully rescues the lethality; however, expression of UBA5 in neurons (elav-GAL4), glia (repo-GAL4), or both neurons and glia does not. In addition, expression of UBA5 in the fat body (SPARC-GAL4) or muscles (Mef2-GAL4) does not rescue the lethality either. These results suggest that Uba5 is required in multiple tissues in flies. These data are included in the revised manuscript.

      2) Do intermediate Uba5 alleles impact synaptic function or growth? The etiology of the disease is linked with epilepsy and neurodevelopmental disorders, and the interesting parallels the authors note between Uba5 and Para expression indicate perhaps shared roles in neurons that drive firing activity. Together, these lines of evidence may suggest the Uba5 alleles may have possible impacts on synaptic growth, morphology, and/or function. It would be of interest to examine the larval neuromuscular junction and assess NMJ growth, morphology, and perform some basic electrophysiology to determine if there are any functional defects.

      Following the reviewer’s suggestion, we tested the morphology of NMJs in the humanized flies. We did not observe any obvious changes in the number or size of the synaptic boutons caused by the Group II variants. Hence, we conclude that the Uba5 variants do not cause an obvious defect in synaptic growth. The results are included in Figure S3.

      More generally, can the authors comment on the expression pattern of Uba5? One might consider something like Uba5 to be a "housekeeping" gene and expressed/required in most if not all cell types. From the data presented in Fig. 2, it appears expression is more sparse, perhaps, as the authors point out, because of roles in mature neurons that actively fire (like Para). Are neuronal targets of Uba5 known, which might suggest key pathways it modulates?

      We showed that Uba5 is broadly expressed in third instar larvae. FlyAtlas2 and FlyCellAtlas datasets show that Uba5 is broadly expressed but not in all the cells. In the larval CNS and adult brain, Uba5 is not expressed in all cells either. Hence, we cannot say Uba5 is a “housekeeping” gene. Regarding the neuronal targets of Uba5, we do not know which types of neurons express Uba5 and which pathways Uba5 modulates. This could be studied in the future.

      3) Does strong overexpression of the various Uba5 alleles in otherwise wild-type flies cause any phenotypes? This might support possible antimorphic/dominant negative functions of some of the variants. Is it plausible that any of the alleles could impact oligomerization of Uba5?

      We have not observed compromised viability or any obvious phenotype in flies overexpressing human reference UBA5 or UBA5 variants. So, our results do not support a dominant negative effect of any of the variants.

      To our knowledge, people do not have sufficient knowledge on UBA5 dimerization to speculate on whether some variants could play a dominant negative role. There is one variant, V260M, that lies at the dimer interface. We showed that the V260M variant biochemically affects ATP binding as well as UFM1 activation, but we do not have evidence to support that it causes dominant negative effects by affecting UBA5 dimerization.

      Minor points:

      1) Page 5 line 45: It seems a reference is missing about the temperature dependence of Gal4 activity.

      We apologize for the missing reference. We have incorporated a reference to PMID 25824290.

      2) It might be of interest to assay the various transgenic rescue alleles at a higher temperature (say 29C) in addition to the nice work looking at 18/25C survival. Perhaps some of the alleles display temperature sensitivity at low (18) and high (29) temperatures.

      We now include the survival rate data at 29C. The enzyme dead and severe LoF variants fail to rescue the lethality at 29C, while the mild (Group IA and IB) variants fully rescue. For the three Group II variants, the survival rate at 29C is higher than that at 25C and 18C. The results support the dosage sensitive effects of UBA5 overexpression, but do not support any variant to be temperature sensitive within this range.

      Reviewer #2

      Relative simplicity and genetic accessibility of the fly brain make it a premier model system for studying the function of genes linked to various diseases in humans. Here, Pan et al. show that human UBA5, whose mutations cause developmental and epileptic encephalopathy, can functionally replace the fly homolog Uba5. The authors then systematically express in flies the different versions of the gene carrying clinically relevant SNPs and perform extensive phenotypic characterization such as survival rate, developmental timing, lifespan, locomotor and seizure activity, as well as in vitro biochemical characterization (stability, ATP binding, UFM-1 activation) of the corresponding recombinant proteins. The biochemical effects are well predicted by (or at least consistent with) the location of affected amino acids in the previously described Uba5 protein structure. Most strikingly, the severity of biochemical defects appears to closely track the severity of phenotypic defects observed in vivo in flies. While the paper does not provide many novel insights into the function of Uba5, it convincingly establishes the fly nervous system as a powerful model for future mechanistic studies.

      One potential limitation is the design of the expression system in this work. Even though the authors state that "human cDNA is expressed under the control of the endogenous Uba5 enhancer and promoter", it is in fact the Gal4 gene that is expressed from the endogenous locus, meaning that the cDNA expression level would inevitably be amplified in comparison. The fact that different effects were observed when some experiments were performed at different temperatures (18 vs. 25) is also consistent with this. While I do not think this caveat weakens the conclusions of this paper, it may impact the interpretation of future experiments that use these tools, and thus should be clearly discussed in the paper. Especially considering the authors argue that most disease variants of UBA5 are partial loss-of-functions, the amplification effect could potentially mask the phenotypes of milder hypomorphic alleles. If the authors could also show that the T2A-Gal4 expression pattern in the brain matches well with that of endogenous RNA or protein (e.g. using HCR-FISH or antibody), it would help to alleviate this concern.

      We thank the reviewer for pointing out the issue.

      Regarding the humanization strategy we used in the study, we agree that this is a binary system which could induce overexpression of the target protein. However, as the reviewer also points out, this temperature sensitive system also enables us to flexibly adjust the expression level of the target protein (PMIDs 34113007, 35348658, 36206744), which is especially useful to study partial LoF variants. In our study we have successfully compared the relevant allelic strength of most of the variants.

      We agree with the reviewer that a masking effect may exist in our system due to its gene overexpression nature. However, we cannot conclude that this masking effect really affects the three Group IA variants in our tests. The three variants are mild LoF, which is supported by our biochemical assays. Individuals homozygous for one of the Group IA variants, p.A371T, do not have any obvious phenotype, which is also consistent with our findings in flies.

      Regarding the expression pattern of the T2A-GAL4, the Bellen lab has generated T2A-GAL4 lines for more than 3,000 genes. The expression patterns of many GAL4 lines faithfully reflect the expression patterns of the endogenous genes, as shown in our previous publications (PMIDs 25824290, 29565247, 31674908).

      Recommendations For The Authors:

      As related to the expression pattern comment in the public review, I think the authors could also take advantage of Fly Cell Atlas or other available scRNA-seq atlases of the fly brain to present a much more detailed description of the Uba5 expression profile with minimal additional effort. If the cells that express it share other features or genes (other than the para that the authors mention), this could lead to further insights about the gene's neuronal or glial functions.

      In response to the reviewer, we show the expression pattern of Uba5 documented in FlyCellAtlas and another adult brain single-cell RNA seq profile (PMID 29909982) in the revised manuscript.

      In addition, one of the mutants (assuming the same one) is referred to as Leu254Pro in some parts of the manuscript while in some other parts (including tables 1-2) it is Lys254Pro.

      We apologize for the mistakes. The variant should be Leu254Pro and we have made these corrections in the revised manuscript.

      Reviewer #3

      Summary:

      Variants in the UBA5 gene are associated with rare developmental and epileptic encephalopathy, DEE44. This research developed a system to assess in vivo and in vitro genotype-phenotype relationships between UBA5 allele series by humanized UBA5 fly models and biochemical activity assays. This study provides a basis for evaluating current and future individuals afflicted with this rare disease.

      Strengths:

      The authors developed a method to measure the enzymatic reaction activity of UBA5 mutants over time by applying the UbiReal method, which can monitor each reaction step of ubiquitination in real time using fluorescence polarization. They also classified fruit flies carrying humanized UBA5 variants into groups based on phenotype. They found a correlation between biochemical UBA5 activity and phenotype severity.

      Weaknesses:

      In the case of human DEE44, compound heterozygotes with both loss-of-function and hypomorphic forms (e.g., p.Ala371Thr, p.Asp389Gly, p.Asp389Tyr) may cause disease states. The presented models have failed to evaluate such cases.

      We agree with the reviewer that our current system has a limitation that it evaluates one variant at a time rather than any combination of variants. However, our biochemical data do show that the three Group IA variants are mild LoF variants rather than benign variants. One of these variants, p.A371T, does not cause any obvious phenotype in homozygous individuals, which is also consistent with our findings in flies. The modeling of variant combinations, especially the Group IA/Group III combinations could be carried out in future studies.

      Recommendations For The Authors:

      Figure 3G. Typo. "ContonS" should be replaced by "CantonS."

      We apologize for the spelling mistake. We have corrected the typo in the revised manuscript.

      Figure 5. The labels should be in uppercase instead of lowercase.

      We have corrected the panel labels in the revised manuscript.

      Figure 6A. Is the molecular weight of UBA5~UFM1 intermediate (99 kDa) in model Figure correct? In Fig. 6B, the molecular weight of UBA5~UFM1 intermediate seems to be 70-75 kDa.

      Both are correct. The molecular weight depicted in the schematic of Figure 6A is based on the UBA5 dimer, which dissociates in the SDS-PAGE gel shown in Figure 6B. We have reconfigured the schematic to make this more apparent.

      Figure. 6E, F, H, and I. The time points for quantification in these figures should be specified.

      We apologize for the confusion. The details on data quantification are now more thoroughly explained in the Methods.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this manuscript, the authors investigate differences between Tibetans and Han Chinese at altitude in terms of placental transcriptomes during full-term pregnancy. Most importantly, they found that the inter-population differentiation is mostly male-specific and the observed direction of transcriptional differentiation seems to be adaptive at high altitude. In general, it is of great importance and provides new insights into the functional basis of Tibetan high-altitude adaptations, which so far have been mostly studied via population genetic measures only. More specifically, I firmly believe that we need more phenotype data (including molecular phenotypes such as gene expression data) to fully understand Tibetan adaptations to high altitude, and this manuscript is a rare example of such a study. I have a few suggestions and/or questions with which I hope to improve the manuscript further, especially in terms of 1) testing if the observed DEG patterns are truly adaptive, and 2) how and whether the findings in this study can be linked to EPAS1 and EGLN1, the signature adaptation genes in Tibetans.

      We appreciate the reviewer’s constructive comments. We have addressed these points and the details are discussed below.

      Major Comments:

      1) The DEG analysis is the most central result in this manuscript, but the discrepancy between sex-combined and sex-specific DEGs is quite mind-boggling. For those that were differentially expressed in the sex-specific sets but not in the sex-combined one, the authors suggest an opposite direction of DE as an explanation (page 11, Figure S5). But Figure S5A does not show such a trend, showing that down-regulated genes in males are mostly not at all differentially expressed in females. Figure S5B does show such a trend, but it doesn't seem to be a dominant explanation. I would like to recommend the authors test alternative ways of analysis to boost statistical power for DEG detection other than simply splitting data into males and females and performing analysis in each subset. For example, the authors may consider utilizing gene-by-environment interaction analysis schemes that treat biological sex as an environmental factor.

      We agree with the reviewer that the opposite direction of DEGs is likely only one of the possible explanations for the discrepancy between the sex-combined and the sex-specific DEGs. We have toned down the description of this point in the revised manuscript.

      Following the reviewer's suggestion, we performed an ANCOVA analysis to evaluate the variance in the expression data explained by sex. For each gene, univariate comparisons of the average gene expression between Tibetans and Han Chinese were made using the ANCOVA test with the R aov function and sex as a covariate: aov(Expression ~ Ethnicity + Fetal sex). We observed a significantly higher variance explained by sex than by ethnicity in six of the seven placental layers (all except the CN layer) (Author response image 1). For example, in the UC layer, fetal sex explains ~0.203 of the variance, while ethnicity explains ~0.107 (P-value = 4.9e-4). These results suggest a significant contribution of fetal sex to the observed variance of gene expression, consistent with the observed sex-biased DEG patterns.

      Author response image 1.

      The ANCOVA results of the seven layers of placenta. The scatter plot shows the comparison of the explained variance (y-axis) and significance (x-axis, denoted by –log10(P-value)) between ethnicity (dots in red) and fetal sex (dots in blue). Each dot represents an investigated gene, and only genes with P<0.05 in significance are shown in the plots. The table is the summary statistics of the ANCOVA analysis.
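
      As a rough illustration of the per-gene variance partition described above, the sketch below mimics the sequential (Type I) sums of squares that R's aov reports for a model of the form Expression ~ Ethnicity + Fetal sex. It is not the authors' code: the function names, the 0/1 coding of both predictors, and the expression values are assumptions made purely for illustration.

```python
# Illustrative sketch only: sequential variance partition for a single gene,
# analogous to aov(Expression ~ Ethnicity + FetalSex) with 0/1-coded predictors.
# All numbers below are made up.

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(y, columns):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    design = [[1.0] + [float(c[i]) for c in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def variance_explained(expression, ethnicity, sex):
    """Fraction of total variance attributed to each term, entered in order."""
    total = rss(expression, [])                # intercept-only model
    after_ethnicity = rss(expression, [ethnicity])
    after_both = rss(expression, [ethnicity, sex])
    return {
        "ethnicity": (total - after_ethnicity) / total,
        "sex": (after_ethnicity - after_both) / total,
    }


# Hypothetical gene measured in 8 placentas (balanced 2 x 2 design).
ethnicity = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = Han, 1 = Tibetan (assumed coding)
sex = [0, 0, 1, 1, 0, 0, 1, 1]         # 0 = female, 1 = male (assumed coding)
expression = [5.0, 5.2, 7.1, 6.9, 6.0, 6.2, 8.1, 7.9]
print(variance_explained(expression, ethnicity, sex))
```

      Because the sums of squares are sequential, the share attributed to each term can depend on the order in which the terms enter the model when the design is unbalanced.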

      2) Please clarify how the authors handled multiple testing correction of p-values.

      There were three analyses involving multiple testing in this study: 1) for the differential expression analysis, we obtained multiple-testing-corrected p-values using the Benjamini-Hochberg FDR (false discovery rate) procedure; 2) for the GO enrichment analysis, we calculated FDR-adjusted q-values from the overall p-values to correct for multiple testing.

      3) For the WGCNA analysis, 12 traits were involved: population, birth weight (BW), biparietal diameter (BPD), femur length (FL), gestation time (GT), placental weight (PW), placental volume (PLV), abdominal girth (AG), amniotic fluid maximum depth (AFMD), amniotic fluid index (AFI), fetal heart rate (FH) and fundal height (FUH). We calculated a Bonferroni threshold (p-value = 0.05/the number of independent traits) using the correlation matrix of the traits to evaluate the significant modules. We estimated that the number of independent traits among the 12 investigated traits was 4 (Author response image 2). Therefore, we used a more stringent significance threshold of p-value = 0.0125 (0.05/4) as the final threshold to correct for the multiple testing introduced by multiple traits in our WGCNA analyses. We have updated this section based on the new threshold.

      Author response image 2.

      The correlation matrix of 12 traits involved in the WGCNA analysis. The correlation coefficients larger than 0.2 (or smaller than -0.2) are regarded as significant correlation and marked in gradient colors.
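
      The two corrections named above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the example p-values are hypothetical, and the number of independent traits (4) is simply taken from the response rather than re-estimated from the trait correlation matrix.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_independent_tests


# Made-up p-values for five genes.
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
print(benjamini_hochberg(example_p))

# Threshold quoted in the response: 0.05 divided by 4 independent traits.
print(bonferroni_threshold(0.05, 4))
```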

      3) The "natural selection acts on the placental DEGs ..." section is potentially misleading readers to assume that the manuscript reports evidence for positive selection on the observed DEG pattern between Tibetans and Han, which is not.

      a) Currently the section simply describes an overlap between DEGs and a set of 192 genes likely under positive selection in Tibetans (TSNGs). The overlap is quite small, leading to only 13 genes in total (Figure 6). The authors are currently not providing any statistical measure of whether this overlap is significantly enriched or at the level expected for random sampling.

      We understand the reviewer’s point that the observed numbers of genes shared between TSNGs and the three DEG sets (4 for female + male; 9 for male only and 0 for female only) should be tested using a statistical method. Therefore, we adopted a permutation approach to evaluate the enrichment of the overlap between DEGs and TSNGs.

      For each permutation, we randomly extracted 192 genes from the human genome, then overlapped with DEGs of the three sets (female + male; female only and male only) and counted the gene numbers. After 10,000 permutations, we constructed a null distribution for each set, and found that the overlaps between DEGs and TSNGs were significantly enriched in the “female + male” set (p-value = 0.048) and the “male only” set (p-value = 9e-4), but not in the “female only” set (p-value = 0.1158) (Author response image 3). This result suggests that the observed DEGs are significantly enriched in TSNGs when compared to random sampling, especially for the male DEGs. We added this analysis in the revised manuscript.

      Author response image 3.

      The distribution of 10,000 permutation tests of counts of the overlapped genes between DEGs and the 192 randomly selected genes in the genome. The red-dashed lines indicate the observed values based on the 192 TSNGs.
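
      The permutation scheme described above can be sketched roughly as below. The gene identifiers, set sizes, and function name are hypothetical, and the add-one form of the empirical p-value is an assumption; the response does not state which estimator was used.

```python
import random


def overlap_permutation_test(genome, degs, selected, n_perm=10000, seed=0):
    """Empirical p-value for the overlap between DEGs and a selected gene set.

    genome   : list of all gene identifiers to sample from
    degs     : differentially expressed genes
    selected : candidate gene set (e.g. genes under selection)
    """
    rng = random.Random(seed)
    degs = set(degs)
    selected = set(selected)
    observed = len(degs & selected)
    k = len(selected)
    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size as the selected set.
        sample = rng.sample(genome, k)
        if sum(g in degs for g in sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero (assumption).
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: 2,000 genes, 100 DEGs, 192 selected genes sharing 20 with DEGs.
genome = list(range(2000))
degs = range(100)
selected = list(range(20)) + list(range(1000, 1172))
print(overlap_permutation_test(genome, degs, selected, n_perm=1000))
```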

      b) The authors are describing sets of DEGs that seem to affect important phenotypic changes in a consistent and adaptive direction. A relevant form of natural selection for this situation may be polygenic adaptation while the authors only consider strong positive selection at a single variant/gene level.

      We agree with the reviewer that polygenic adaptation might be a potential mechanism by which DEGs affect the adaptive phenotypes. Therefore, following the suggestion in the comment below, we conducted a polygenic adaptation analysis using eQTL information.

      c) The manuscript is currently providing no eQTL information that can explain the differential expression of key genes. The authors can actually do this based on the genotype and expression data of the individuals in this study. Combining eQTL info, they can set up a test for polygenic adaptation (e.g., Berg and Coop; https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004412). This will provide a powerful and direct test for the adaptiveness of the observed DEG pattern.

      Following the reviewer’s suggestion, we employed the PolyGraph (Racimo et al., 2018) tool to identify signatures of polygenic selection in Tibetans using eQTL information. We conducted eQTL analysis for the seven layers and collected a set of 5,251 eQTLs, i.e., SNPs associated with gene expression at a significance threshold of p-value < 5e-8. To obtain a list of independent eQTLs, we removed SNPs in linkage disequilibrium (r2 > 0.2 in the 1000 Genomes Project). Finally, we obtained 176 independent eQTLs. At the same time, we generated a set of 1,308,436 independent SNPs of Tibetans as the control panel. The PolyGraph result showed that Tibetans have a clear signature of polygenic selection on gene expression (Bonferroni-corrected p-value = 0.003) (Author response image 4).

      We have added this result in the revised manuscript (Figure S4), and added a detailed description of polygenic adaptation in the Methods section.

      Author response image 4.

      Polygraphs for the eQTLs that show evidence for polygenic adaptation in the five-leaf tree built using the allele frequency data of 1001 Tibetans (Zheng et al. 2023) and the 1000 Genomes Project. The colors indicate the marginal posterior mean estimate of the selection parameter for variants associated with gene expression. r, q, s and v in the tree nodes refer to the nodes in terminal branches and internal branches. TBN, Tibetans; CHB, Han Chinese in Beijing; JPT, Japanese in Tokyo, Japan; CEU, Northern Europeans from Utah; YRI, Yoruba in Ibadan, Nigeria.
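
      The step of reducing the significant eQTLs to an independent set (r2 > 0.2 treated as linked) could be done in several ways; the response does not say which tool or algorithm was used. The sketch below shows one common greedy approach under that assumption: keep the most significant SNP first, then keep each further SNP only if it is not in LD with any SNP already kept. The SNP names, p-values, and r2 values are invented.

```python
def prune_by_ld(snps, r2, threshold=0.2):
    """Greedy LD pruning.

    snps      : list of (snp_id, p_value) pairs
    r2        : dict mapping frozenset({snp_a, snp_b}) to their pairwise r^2;
                pairs absent from the dict are treated as unlinked (r^2 = 0)
    threshold : pairs with r^2 above this value count as linked
    """
    kept = []
    # Visit SNPs from most to least significant.
    for snp_id, _ in sorted(snps, key=lambda s: s[1]):
        linked = any(
            r2.get(frozenset((snp_id, other)), 0.0) > threshold for other in kept
        )
        if not linked:
            kept.append(snp_id)
    return kept


# Hypothetical eQTLs, all passing p < 5e-8.
snps = [("snp1", 1e-12), ("snp2", 3e-9), ("snp3", 2e-10), ("snp4", 4e-8)]
r2 = {
    frozenset(("snp1", "snp2")): 0.90,
    frozenset(("snp2", "snp3")): 0.50,
    frozenset(("snp3", "snp4")): 0.15,
}
print(prune_by_ld(snps, r2))
```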

      4) The manuscript is currently only minimally discussing how findings are linked to EPAS1 and EGLN1 genes, which show the hallmark signature of positive selection in Tibetans. In fact, the authors' group previously reported male-specific association between EPAS1 SNPs and blood hemoglobin level. Many readers will be intrigued to see a discussion about this point.

      According to the reviewer’s suggestion, in the revised manuscript, we added a paragraph to discuss the relationship between our transcriptomic data and the two genes with strong selective signals, i.e. EPAS1 and EGLN1.

      “As the gene with the strongest signal of natural selection in Tibetans, EPAS1 has been reported in numerous studies for its contribution to high altitude adaptation. In this study, we detected a significant expression reduction of EPAS1 in the Tibetan UC compared to the high-altitude Han. It was reported that the selected-for EPAS1 variants/haplotype were associated with lower hemoglobin levels in the Tibetan highlanders with a major effect (Beall et al., 2010; Peng et al., 2017), and the low hemoglobin concentration of Tibetans is causally associated with a better reproductive success (Cho et al., 2017). Therefore, we speculate that the selective pressure on EPAS1 is likely through its effect on hemoglobin, rather than directly on the reproductive traits. The down-regulation of EPAS1 in placentas likely reflects a blunted hypoxic response that may improve vasodilation of UC for better blood flow, eventually leading to the higher BW in Tibetans (He et al., 2023). For EGLN1, another well-known gene in Tibetans, we detected between-population expression difference in the male UC layer, but not in other placental layers. Considering the known adaptation mechanism of EGLN1 is attributed to the two Tibetan-enriched missense mutations, the contribution of EGLN1 to the gene expression changes in the Tibetan UC is unexpected and worth exploring in the future.”

      Reviewer #2 (Public Review):

      In this manuscript, the authors use newly-generated, large-scale transcriptomic data along with histological data to attempt to dissect the mechanisms by which individuals with Tibetan ancestry are able to mitigate the negative effects of high elevation on birth weight. They present detailed analyses of the transcriptomic data and find significant sex differences in the placenta transcriptome.

      I have significant concerns about the conclusions that are presented. The analyses also lack the information necessary to evaluate their reliability.

      The experimental design does not include a low elevation comparison and thus cannot be used to answer questions about how ancestry influences hypoxia responses and thus birthweight at high elevations. Importantly, because the placenta tissues (and trophoblasts specifically) are quickly evolving, there are a priori good reasons to expect to find population differences irrespective of adaptive evolution that might contribute to fetal growth protection. There are also significant details missing in the analyses that are necessary to substantiate and replicate the analyses presented.

      Although the datasets are ultimately valuable as reference sets, the absence of low elevation comparisons for Tibetans and Han Chinese individuals undermines the ability of the authors to assess whether differences observed between populations are linked to hypoxia responses or variation in the outcomes of interest (i.e., hypoxia-dependent fetal growth restriction).

      We understand the reviewer’s concern about the lack of low-altitude comparison. For the placenta transcriptomic data, actually, we previously studied the comparison of placenta from high-altitude Tibetans and low-altitude Han Chinese, including 63 placentas of Tibetans living at Lhasa (elevation: 3650m) and 14 placentas of Han in Kunming (elevation: 1800m) (Peng et al. 2017). The main finding was that in general, the expression profiles are similar between the high-altitude Tibetans and the low-altitude Han. In particular, most high-altitude Tibetans have a similar level of EPAS1 expression in the placenta as the lowlander Han Chinese, a reflection of Tibetans’ adaptation at altitude. In other words, (Peng et al. 2017). In this study, we observed a significant down-regulation of EPAS1 in the Tibetan UC when compared to Han Chinese living at the same high altitude. Therefore, the observed differences between Tibetans and Han Chinese placenta at high altitude are due to the adaptation of Tibetans.

      For phenotypic data, we made a systematic comparison of reproductive outcomes in our previous studies (He et al., 2023; He et al., 2022). We showed that polygenic adaptation of reproduction in Tibetans tends to reduce the chance of preterm birth and to remove the restriction on fetal development at high altitude. Compared to the high-altitude Han Chinese migrants, the high-altitude Tibetans exhibit a smaller hypoxia-induced reduction in birth weight and lower infant mortality, similar to the lowland Han Chinese used as reference.

      In summary, although we cannot perform a combined analysis of our high-altitude data and the published low-altitude data because of batch effects and differences in sampling strategy, we obtained more supportive evidence for the adaptation of placental expression regulation in Tibetans. To be objective, we have discussed the limitation of the lack of lowlander placenta data in the Discussion section.

      The authors attempt to tackle this phenotypic association by looking for correlations between gene networks (WGCNA) and individual genes with birthweight and other measurements collected at birth. I have some reservations about this approach with only two groups (i.e., missing the lowland comparison), but it is further problematic that the authors do not present data demonstrating that there are differences in birthweight or any other traits between the populations in the samples they collected.

      Throughout, I thus find conclusions about the adaptive value and hypoxia-responses made by the authors to be unsubstantiated and/or the data to be inadequate. There are also a gratuitous number of speculative statements about mechanisms by which differential gene expression leads to the protection of birthweight that are not evaluated and thus cannot be substantiated by the data presented.

      As currently presented and discussed, these results thus can only be used to evaluate population differences and tissue-specific variation therein.

      We understand the reviewer’s point that the observed differences of gene expression between Tibetan natives and Han immigrants living at high altitude might be explained by ancestral divergence, rather than hypoxia-associated response and genetic adaptation of native Tibetans.

      Firstly, we conclude that Tibetans have a better reproductive outcome based not only on the two highlander groups living at the same altitude, but also on the direction of change relative to the lowland level. For example, we observed a significantly higher BW in Tibetans than in Han migrants in our dataset (35 Tibetans vs. 34 Han: p-value = 0.012) (Author response image 5), and in a larger dataset (He et al. 2023) (1,317 Tibetans vs. 87 Han: p-value = 1.1e-6), suggesting an adaptation of Tibetans because BW decreases with increasing altitude. The same logic applies to the other traits. Following the reviewer's suggestion, we added these phenotype comparisons to the revised manuscript. The detailed information on the investigated samples and the statistical results were also added as supplementary tables in the revised version.

      For the WGCNA, we agree with the reviewer that the detected modules showing significant correlation with both population and other reproductive traits cannot be fully explained by adaptation of Tibetans. Therefore, we toned down the description of this section and added other possible explanations, such as population differences, in the discussion.

      Author response image 5.

      Comparison of 11 reproductive traits between Tibetans and Han immigrants. (A) Comparison based on the dataset of this study (35 Tibetans vs. 34 Han); (B) correlation between BW and altitude (left panel) and comparison analysis based on the larger sample size (data retrieved from He et al., 2023). Univariate comparisons of the average of each trait across populations were made using the ANCOVA test with the R aov function, with fetal sex and maternal age as covariates.

      There is also some important methodological information missing that makes it difficult or impossible to assess the quality of the underlying data and/or reproduce the analyses, further limiting the potential impact of these data:

      1) Transcriptome data processing and analyses: RNA quality information is not mentioned (i.e., RIN). What # of reads are mapped to annotated regions? How many genes were expressed in each tissue (important for contextualizing the # of DE genes reported - are these a significant proportion of expressed genes or just a small subset?).

      According to the reviewer’s suggestion, we added more information about transcriptome data processing and analyses in the revised Methods and Results:

      “After RNA extraction, we assessed the RNA integrity and purity using 1% agarose gel electrophoresis. The RIN value of extracted RNA was 7.56 ± 0.71.”

      “In total, 10.6 billion reads were mapped to the annotated regions, and 17,283 genes were expressed in all the investigated placentas.”

      “We identified 579 differentially expressed genes (DEGs) between Tibetans and Han, accounting for 3.4% of the total number of expressed genes.”

      2) The methods suggest that DE analyses were run using data that were normalized prior to reading them into DESeq2. DESeq2 has an internal normalization process and should not be used on data that was already normalized. Please clarify how and when normalization was performed.

      We used the raw read count matrix, not normalized data, as the input when conducting differential analysis with DESeq2. We have updated our description in the Methods section of the revised manuscript.
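
      The reason raw counts are the appropriate input can be illustrated with a simplified median-of-ratios size-factor calculation, which is the general idea behind the internal normalization the reviewer refers to. The sketch below is not DESeq2's code; the function name and the toy count matrix are hypothetical and serve only to show that scaling is estimated from the raw counts themselves.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one scaling factor per sample from raw counts.

    counts: list of genes, each a list of raw read counts (one per sample).
    Simplified median-of-ratios idea; illustrative only.
    """
    n_samples = len(counts[0])
    # Use only genes with a nonzero count in every sample so logs are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of this sample's count to the gene's geometric mean, in log space.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical matrix: sample 2 was sequenced twice as deeply as sample 1.
raw = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # excluded from the estimate because of the zero
]
factors = size_factors(raw)
normalized = [[c / f for c, f in zip(row, factors)] for row in raw]
```

      If the matrix had already been normalized before this step, the estimated factors would no longer reflect sequencing depth, which is why pre-normalized input is discouraged.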

      3) For enrichment analyses, the background gene set (all expressed genes? all genes in the genome? or only genes expressed in the tissue of interest?) has deterministic effects on the outcomes. The background sets are not specified for any analyses.

      We used the genes expressed in placenta as the background gene set for enrichment analyses. Genes with more than two transcripts per million (TPM) were regarded as expressed, which is a commonly used criterion for RNA-seq data.
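
      As a minimal sketch of how such a background set could be derived, the snippet below computes TPM from counts and gene lengths and keeps genes above the 2 TPM cutoff. The gene names, counts, and lengths are hypothetical, and the response does not state in how many samples a gene must pass the cutoff, so a single sample is assumed here.

```python
def tpm(counts, lengths_kb):
    """Convert raw counts to transcripts per million for one sample."""
    # Reads per kilobase corrects for gene length.
    rpk = [c / length for c, length in zip(counts, lengths_kb)]
    # Scale so that the values of one sample sum to one million.
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]


def expressed_genes(gene_ids, counts, lengths_kb, threshold=2.0):
    """Return genes whose TPM exceeds the threshold (background gene set)."""
    values = tpm(counts, lengths_kb)
    return [g for g, v in zip(gene_ids, values) if v > threshold]


# Hypothetical toy sample with three genes of equal length.
genes = ["geneA", "geneB", "geneC"]
counts = [500000, 500000, 1]
lengths_kb = [1.0, 1.0, 1.0]

background = expressed_genes(genes, counts, lengths_kb)
```

      Here geneC falls below the cutoff and is excluded from the background, so it would not count toward the denominator of an enrichment test.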

      4) In the WGCNA analysis, P-values for correlations of modules with phenotype data (birthweight etc.) should be corrected for multiple testing (i.e., running the module correlation for each outcome variables) and p.adjust used to evaluate associations to limit false positives given the large number of correlations being run.

      As we explained in response to comment #2 of Reviewer 1, we used a more stringent significance threshold of p-value = 0.0125 (0.05/4) as the final threshold to correct for the multiple testing introduced by multiple traits in the WGCNA analysis.

      5) The plots for umbilical histological data (Fig 5 C) contain more than 5 points, but the use of replicate sections is not specified. If replicate sections were used, the authors should control for non-independence of replicate sections in their analyses (i.e., random effects model).

      We did not use replicate sections. Figure 5C shows the umbilical artery intima and media. Because each human umbilical cord includes two umbilical arteries, the 5 vs. 5 individual comparison generates 10 vs. 10 umbilical artery comparison. To be clearer, we added an explanation in the revised manuscript.

      On more minor notes:

      There is significant and relevant published data on sex differences and hypoxia in rodents (see Cuffe et al 2014, "Mid- to late-term hypoxia in the mouse alters placental morphology, glucocorticoid regulatory pathways, and nutrient transporters in a sex-specific manner" and review by Siragher and Sferuzzi-Perro 2021, "Placental hypoxia: What have we learnt from small animal models?"), and historical work reporting sex differences in placental traits associated with high elevation adaptation in Andeans (series of publications by Moira Jackson in the late 1980s, reviewed in Wilsterman and Cheviron 2021, "Fetal growth, high altitude, and evolutionary adaptation: A new perspective").

      We thank the reviewer for the constructive comments on literature review. We have cited and discussed them in the revised manuscript.

      Reviewer #3 (Public Review):

      More than 80 million people live at high altitude. This impacts health outcomes, including those related to pregnancy. Longer-lived populations at high altitudes, such as the Tibetan and Andean populations show partial protection against the negative health effects of high altitude. The paper by Yue sought to determine the mechanisms by which the placenta of Tibetans may have adapted to minimise the negative effect of high altitude on fetal growth outcomes. It compared placentas from pregnancies from Tibetans to those from the Han Chinese. It employed RNAseq profiling of different regions of the placenta and fetal membranes, with some follow-up of histological changes in umbilical cord structure and placental structure. The study also explored the contribution of fetal sex in these phenotypic outcomes.

      A key strength of the study is the large sample sizes for the RNAseq analysis, the analysis of different parts of the placenta and fetal membranes, and the assessment of fetal sex differences.

      A main weakness is that this study, and its conclusions, largely rely on transcriptomic changes informed by RNAseq. Changes in genes and pathways identified through bioinformatic analysis were not verified by alternate methods, such as by western blotting, which would add weight to the strength of the data and its interpretations. There is also a lack of description of patient characteristics, so the reader is unable to make their own judgments on how placental changes may link to pregnancy outcomes. Another weakness is that the histological analyses were performed on n=5 per group and were rudimentary in nature.

      For the weaknesses raised by the reviewer, here are our responses:

      (1) Considering that our conclusions largely rely on the transcriptomic data, we agree with the reviewer that more experiments are needed to validate the results from our transcriptomic data. However, this study mainly aimed to provide a transcriptomic landscape of the high-altitude placenta, and to characterize the gene-expression differences between native Tibetans and Han migrants. Exploring the molecular mechanism is not the main task of this study, and more validation experiments are warranted in the future.

      (2) Regarding the lack of description of patient characteristics, we provided results at three levels on the placental changes of Tibetans: macroscopic phenotypes (higher placental weight and volume), histological phenotypes (larger umbilical vein walls and umbilical artery intima and media; lower syncytial knots/villi ratios) and transcriptomic phenotypes (DEGs and differential modules). Combined with the previous studies, these placental changes suggest a better reproductive outcome. For example, the placenta volume shows a significantly positive correlation with birth weight (R = 0.31, p-value = 2.5e-16); therefore, the larger placenta volume of Tibetans is beneficial to fetal development at high altitude. In addition, the larger umbilical vein wall and umbilical artery intima and media of Tibetans can explain their adaptation in preventing preeclampsia.

      (3) For the sample size of histological analyses, we understand the reviewer’s concern that 5 vs. 5 samples is not a large sample for histological analyses. This is because it was difficult to collect high-altitude Han placenta samples; we obtained only 13 Han samples, from which we selected 5 samples matched for infant sex.

      References

      Beall, C.M., Cavalleri, G.L., Deng, L.B., Elston, R.C., Gao, Y., Knight, J., Li, C.H., Li, J.C., Liang, Y., McCormack, M., et al. (2010). Natural selection on EPAS1 (HIF2 alpha) associated with low hemoglobin concentration in Tibetan highlanders. P Natl Acad Sci USA 107, 11459-11464.

      Cho, J.I., Basnyat, B., Jeong, C., Di Rienzo, A., Childs, G., Craig, S.R., Sun, J., and Beall, C.M. (2017). Ethnically Tibetan women in Nepal with low hemoglobin concentration have better reproductive outcomes. Evol Med Public Health 2017, 82-96.

      He, Y., Guo, Y., Zheng, W., Yue, T., Zhang, H., Wang, B., Feng, Z., Ouzhuluobu, Cui, C., Liu, K., et al. (2023). Polygenic adaptation leads to a higher reproductive fitness of native Tibetans at high altitude. Curr Biol.

      He, Y., Li, J., Yue, T., Zheng, W., Guo, Y., Zhang, H., Chen, L., Li, C., Li, H., Cui, C., et al. (2022). Seasonality and Sex-Biased Fluctuation of Birth Weight in Tibetan Populations. Phenomics 2, 64-71.

      Peng, Y., Cui, C., He, Y., Ouzhuluobu, Zhang, H., Yang, D., Zhang, Q., Bianbazhuoma, Yang, L., He, Y., et al. (2017). Down-Regulation of EPAS1 Transcription and Genetic Adaptation of Tibetans to High-Altitude Hypoxia. Mol Biol Evol 34, 818-830.

      Racimo, F., Berg, J.J., and Pickrell, J.K. (2018). Detecting Polygenic Adaptation in Admixture Graphs. Genetics 208, 1565-1584.

    1. Author Response

      We thank the reviewers and editorial team for their positive and thoughtful comments and recommendations for our paper. We will provide a detailed point-by-point response accompanying a revised version of our paper that carefully incorporates all the recommendations and clarifies several confusing points. Here we provide a brief provisional response to summarize the key points.

      1) Are the two factors in the enslavement patterns after stroke, changes in shape (loss of complexity) and magnitude (intrusion of flexor bias), dissociable? Our results show both a loss of shape (Fig. 5) and an increase of magnitude (Fig. 7) in enslavement patterns in the paretic hand. We agree with the reviewers that the key measures for these two factors, Angular (Cosine) and Euclidean Distances, are not mathematically orthogonal because, while Angular Distance is indeed only influenced by shape, Euclidean Distance is influenced by both magnitude and shape changes of the enslavement patterns. However, our LME results show that increased flexor bias in the paretic hand strongly predicts Euclidean Distance but not Angular Distance (Fig. 9), thereby suggesting that pattern shape change cannot be fully accounted for by flexor intrusion. This analysis was also recommended by Reviewer 1. In the revised version, we will further clarify the dissociation of the two components.
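
      The difference between the two distance measures can be sketched numerically. In the example below, the vectors are hypothetical enslavement patterns, and cosine distance is taken as one minus cosine similarity, which is an assumption about the exact definition used in the paper. Scaling a pattern changes its Euclidean distance from the original but leaves the cosine distance at zero, whereas reordering its elements changes both.

```python
import math


def cosine_distance(a, b):
    """One minus cosine similarity; depends only on pattern shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance; depends on both shape and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical enslavement pattern across three non-instructed fingers.
pattern = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]    # same shape, doubled magnitude
reshaped = [3.0, 2.0, 1.0]  # same magnitude, different shape

magnitude_only = (cosine_distance(pattern, scaled), euclidean_distance(pattern, scaled))
shape_only = (cosine_distance(pattern, reshaped), euclidean_distance(pattern, reshaped))
```

      This is why a predictor of magnitude alone, such as flexor bias, could in principle track Euclidean Distance without tracking Angular Distance.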

      2) Can biomechanical factors be ruled out from the enslavement patterns in the paretic hand? We agree with the reviewers that resting hand posture measures alone cannot fully assess biomechanical factors, given that biomechanical constraints during action and abnormal postures due to neural loss after stroke were not captured in these measures. In the paper, however, we used three analyses to justify this point. In the first analysis, we showed that resting hand posture (Mount Distance and Mount Angle) could not account for the Biases in any group (healthy, paretic, non-paretic). In the second analysis, we showed that resting hand posture could not account for Enslavement in any group. In the third analysis, we showed that Biases in the non-paretic hand could not predict Biases or Enslavement in the paretic hand within the same patients. The third analysis was motivated by existing literature indicating that secondary biomechanical change after stroke is likely not the major contributor to hand impairment: passive muscle stimulation could evoke a similar level of fingertip forces in both stroke and control hands (Hoffmann et al. 2016), and median nerve stimulation could significantly reduce intrusion of finger flexion (Kamper et al. 2003). The resting hand posture and non-paretic hand biases would include both biomechanical and neural factors, but since none of these measures could predict enslaving patterns, we maintain that biomechanical factors are unlikely to contribute substantially to the enslavement in the paretic hand.

      3) Neural correlates of behavioral changes were not tested, therefore claims such as "low-level," "subcortical," and "top-down cortical" contributions are not fully justified. We agree with the reviewers, and we will remove references to these neural correlates from the text of the Results section in the revised version of the paper. These neural correlates will only be discussed in the Discussion section.

      4) RDM construction for "by-Target Direction" was not clearly explained. We agree with the reviewer that the diagram in Fig. 4D was a little confusing. To construct these matrices, we analyzed differences in coactivation patterns of the non-instructed fingers when two fingers move in the same target direction. A cleaner pattern comparison should exclude both of the instructed fingers being compared from the enslavement matrices. This will be clarified in the revised version.

      References

      Hoffmann, Gilles, Megan O. Conrad, Dan Qiu, and Derek G. Kamper. 2016. “Contributions of Voluntary Activation Deficits to Hand Weakness after Stroke.” Topics in Stroke Rehabilitation 23 (6): 384–92. https://doi.org/10.1179/1945511915Y.0000000023.

      Kamper, D G, R L Harvey, S Suresh, and W Z Rymer. 2003. “Relative Contributions of Neural Mechanisms versus Muscle Mechanics in Promoting Finger Extension Deficits Following Stroke.” Muscle & Nerve 28 (3): 309–18. https://doi.org/10.1002/mus.10443.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Editorial comments:

      Comment 1 - Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors.

      We appreciate the feedback from the 3 Reviewers and Editor. We have enumerated each Reviewer comment and provide a detailed response. We endeavoured to include each suggestion in the revised manuscript. All changes in the manuscript are indicated in red font. In instances in which we respectfully disagree with the Reviewer, we have provided a fair rebuttal. We feel the comments from the Reviewers have significantly improved the clarity and quality of the manuscript.

      Comment 2 - The revision process has demonstrated the value of your work, highlighting both its strengths and shortcomings. Importantly, it provides detailed and achievable suggestions for improving the current version of your contribution.

      We thank the Reviewers and Editor for their time and expert input on our manuscript. We feel the suggestions from the Reviewers to address the shortcomings have resulted in a significantly improved manuscript.

      Comment 3 - There is a general consensus among the reviewers on three key aspects. Firstly, the article would greatly benefit from a clearer layout of the experimental design and methodology, potentially including schematics to help readers comprehend the complexity and details of the study.

      We appreciate the feedback from Reviewer 2 in particular. We have added a new schematic for Experiment 3 (see PUBLIC REVIEWS Reviewer #2 Comment 2). We have also revised the Results section by including subheadings and additional text to help explain the methods.

      Comment 4 - Secondly, conducting a more comprehensive analysis of the available dataset, utilizing tools such as WGCNA to explore gene co-expression networks beyond specific genes, is recommended. Additionally, it is advised to exercise greater caution when discussing the limitations of the employed methods.

      The suggestion for the WGCNA is excellent and very much appreciated. The revised manuscript includes WGCNA for both the MBH and pituitary gland (see Figure S3 and Table S6, and lines 166-182; 497-505).

      Comment 5 - Thirdly, expanding the results section to create a more engaging narrative that guides readers through the numerous findings, and extending the discussion and conclusions to emphasize the ecological relevance of learning photoperiodic/seasonal responses and highlighting the presented model, would be valuable.

      These were excellent suggestions that significantly improved the clarity and quality of the manuscript. The Results section now includes several subheadings to mark the transitions across experiments. We have also significantly revised the Introduction and Discussion to include the ecological relevance and the importance of considering sex as a factor in the interpretations.

      Comment 6 - Finally, please pay close attention to the comment on the statistical analysis provided by Rev#2.

      It is unclear why the Benjamini-Hochberg FDR analysis was suggested. The statistical test is a version of the Bonferroni test but is less stringent. We prefer to use conservative tests (i.e., Bonferroni correction). Moreover, the Bonferroni correction is the statistical test commonly used in the field. To be consistent with the field and to be careful in our statistical approach, we did not change the post-hoc correction in the revised manuscript.

      PUBLIC REVIEWS:

      Reviewer #1:

      Comment 1 - The authors investigated the molecular correlates in potential neural centers in the Japanese quail brain associated with photoperiod-induced life-history states. The authors simulated photoperiod to attain winter and summer-like physiology and samples of neural tissues at spring, and autumn life-history states, daily rhythms in transcripts in solstices and equinox, and lastly studies FSHb transcripts in the pituitary. The experiments are based on a series of changes in photoperiod and gave some interesting results. The experiment did not have a control for no change in photoperiod so it seems possible that endogenous rhythms could be another aspect of seasonal rhythms that lack in this study. The short-day group does not explain the endogenous seasonal response.

      We thank the Reviewer for the fair assessment of the manuscript. The statement ‘the experiment did not have a control for no change in photoperiod’ is not clear to us. We think the Reviewer is arguing that prolonged constant photoperiod was not used to examine circannual timing in avian reproduction. The constant short photoperiod in Exp3 does provide the ability to examine the initial stages of interval timing, a different endogenous mechanism used by animals. The revised manuscript has clarified the different physiological responses.

      Comment 2 - The manuscript would benefit from further clarity in synthesizing different sections. Additionally, there are some instances of unclear language and numerous typos throughout the manuscript. A thorough revision is recommended, including addressing sentence structure for improved clarity, reframing sentences where necessary, correcting typos, conducting a grammar check, and enhancing overall writing clarity.

      We have incorporated the suggestions from both Reviewer 1 and Reviewer 2 that aimed to increase the clarity of the manuscript. We have provided detailed responses to each comment below and state how each comment was incorporated in the revised manuscript. We also had the manuscript reviewed by a colleague to help identify issues associated with sentence structure, grammar, and spelling.

      Comment 3 - Data analysis needs more clarity particularly how transcriptome data explains different physiological measures across seasonal life-history states. It seems the discussion is built around a few genes that have been studied in other published literature on quail seasonal response. Extending results on the promotor of DEGs and building discussion is an extrapolating discussion on limited evidence and seems redundant.

      A new statistical analysis (i.e., WGCNA) was conducted to identify relations between photoperiod, physiology and transcripts. The focus on the few photoperiodic genes was kept in the discussion because the transcript expression is important for highlighting the differences from the prevailing hypotheses and the novel patterns of expression across seasonal timescales (see Figure S3 and Table S6, and lines 166-182; 497-505).

      Comment 4 - Last, I wondered if it would be possible to add an ecological context for the frequent change in the photoperiod schedule and not take account of the endogenous annual response. Adding discussion on ecological relevance would make more sense.

      This is an excellent suggestion. The introduction and discussion were substantially revised to include the ecological relevance.

      Reviewer #2:

      Comment 1 - This study is carefully designed and well executed, including a comprehensive suite of endpoint measures and large sample sizes that give confidence in the results. I have a few general comments and suggestions that the authors might find helpful.

      We appreciate the Reviewer’s support for our manuscript. We have endeavoured to incorporate all suggestions in the revised manuscript.

      Comment 2 - I found it difficult to fully grasp the experimental design, including the length of light treatment in the three different experiments (which appears to extend from 2 weeks up to 8 weeks). A graphical description of the experimental design along a timeline would be very helpful to the reader. I suggest adding the respective sample sizes to such a graphic, because this information is currently also difficult to keep track of.

      We have created a new figure panel to address the Reviewer’s concern (see Figure S4, panel ‘a’). The new schematic representation was designed to illustrate the similarity in experimental design between Experiment 1 and Experiment 2, while clearly illustrating the extended short photoperiod manipulation (4 weeks and not 8 weeks). We added the sample sizes to initial drafts but felt the added text hindered the clarity of the schematic representation (particularly for Fig. 1a). The sample sizes for each experiment and treatment are provided with the raw data in Supplementary Table 1. For this reason, we have opted not to add the sample size to each diagram. We hope that the Reviewer will understand our perspective.

      Comment 3 - The authors use a lot of terminology that is second nature to a chronobiologist but may be difficult for the general reader to keep track of. For example, what is the difference between "photoinducibility" and "photosensitivity"? Similarly, "vernal" and "autumnal" should be briefly explained at the outset, or maybe simply say "spring equinox" and "fall equinox."

      This is a very helpful suggestion, and we thank the Reviewer. Two changes were made to the manuscript to address this comment. First, we revised the second introductory paragraph to describe the photoperiodic response and the terms used. Second, we have removed all reference to ‘vernal’ and replaced with ‘spring’. We opted to keep ‘autumn’ as the change to ‘fall’ did not provide the clarity of seasonal state in some statements (as fall is also used as a downward direction).

      Comment 4 - What was the rationale for using only male birds in this study? The authors may want to include a brief discussion on whether the expected results for females might be similar to or different from what they found in males, and why.

      We agree with the Reviewer’s position that studies should include, or at least describe, male and female biology. We have revised the text to address this comment. In the Methods, we provide two sentences stating that the photoperiodic response is the same for both males and females, and why males were selected (see lines 352-355). Then, in the Discussion, we describe why females will be important for studying how other supplementary environmental cues impact seasonal timing of reproduction (see lines 312-330 and 334-339).

      Comment 5 - The authors used the Bonferroni correction method to account for multiple hypothesis testing of measures of testes mass, body mass, fat score, vimentin immunoreactivity and qPCR analyses in Study 1. I don't think Bonferroni is ever appropriate for biological data: these methods assume that all variables are independent of each other, an assumption that is almost never warranted in biology. In fact, the data show clear relationships between these endpoint measures. Alternatively, one might use Benjamini-Hochberg's FDR correction or various methods for calculating the corrected alpha level.

      This concern is not clear to us. The Benjamini-Hochberg FDR is a slight modification of the Bonferroni correction. Moreover, the FDR is a less stringent statistical test than the Bonferroni correction. We prefer to keep the Bonferroni approach to correct for multiple tests for two reasons. First, this test is commonly used in the field of chronobiology, and second, the Bonferroni correction is more conservative. We hope the Reviewer will appreciate our wish to remain consistent with the research field and to apply higher stringency in our statistical approach.
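
      The two procedures under discussion can be compared on a small set of p-values. The sketch below uses hypothetical p-values and a 0.05 level; it implements the standard Bonferroni cutoff (alpha divided by the number of tests) and the Benjamini-Hochberg step-up rule (reject the k smallest p-values, where k is the largest rank whose p-value is at most k/m times alpha).

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject a hypothesis if its p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            reject[i] = True
    return reject


# Hypothetical p-values for five endpoint measures.
toy_p = [0.001, 0.012, 0.02, 0.035, 0.30]

by_bonferroni = bonferroni_reject(toy_p)
by_bh = benjamini_hochberg_reject(toy_p)
```

      On these toy values Bonferroni retains one test and Benjamini-Hochberg retains four, which illustrates the difference in stringency that both the Reviewer and the authors refer to.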

      Comment 6 - The graphical interpretations of the results shown in Figure 1n and Figure 3e, along with the hypothesized working model shown in Figure S5, might best be combined into a single figure that becomes part of the Discussion. As is, I do not think these interpretative graphics (which are well done and super helpful!) are appropriate for the Results section.

      We appreciate the Reviewer’s suggestion. During the revision we developed a single figure to show the graphical representation for the respective experiments. Unfortunately, we found it very difficult to provide a clear description and overview of the findings in a single figure. We feel that the interpretations (admittedly unusual for a Results section) are best placed in the respective figures that correspond to the different experiments.

      Reviewer #3:

      Comment 1a - It is well known that as seasonal day length increases, molecular cascades in the brain are triggered to ready an individual for reproduction. Some of these changes, however, can begin to occur before the day length threshold is reached, suggesting that short days similarly have the capacity to alter aspects of phenotype. This study seeks to understand the mechanisms by which short days can accomplish this task, which is an interesting and important question in the field of organismal biology and endocrinology.

      We thank the Reviewer for their positive feedback.

      Comment 1b - The set of studies that this manuscript presents is comprehensive and well-controlled. Many of the effects are also strong and thus offer tantalizing hints about the endo-molecular basis by which short days might stimulate major changes in body condition. Another strength is that the authors put together a compelling model for how different facets of an animal's reproductive state come "on line" as day length increases and spring approaches. In this way, I think the authors broadly fulfill their aims.

      We thank the Reviewer for the positive support of our research and manuscript.

      Comment 1c - I do, however, also think that there are a few weaknesses that the authors should consider, or that readers should consider when evaluating this manuscript. First, some of the molecular genetic analyses should be interpreted with greater caution. By bioinformatically showing that certain DNA motifs exist within a gene promoter (e.g., FSHbeta), one is not generating robust evidence that corresponding transcription factors actually regulate the expression of the gene in question. In fact, some may argue that this line of evidence only offers weak support for such a conclusion. I appreciate that actually running the laboratory experiments necessary to generate strong support for these types of conclusions is not trivial, and doing so may even be impossible. I would therefore suggest a clear admission of these limitations in the paper.

      We agree with the Reviewer’s position. The transcription binding protein analysis was used as a means to identify potential factors involved in the regulation of transcript expression. We have written a new paragraph to address this comment. In the discussion, we highlight the links between the well characterised circadian regulation of photoperiodic transcripts (e.g., D- and E-box elements) and the photoperiodic control of TSHβ. We also indicate that our bioinformatic approach identified potentially new transcription binding motifs, provide a clear admission of the limitations, and state that functional analyses are required to determine the necessity of these pathways (e.g., MEF2). See lines 293-295.

      Comment 2 - Second, I have another issue with the interpretation of data presented in Figure 3. The data show that FSHbeta increases in expression in the 8Lext group, suggesting that endogenous drivers likely act to increase the expression of this gene despite no change in day length. However, more robust effects are reported for FSHbeta expression in the 10v and 12v groups, even compared to the 8Lext group. Doesn't this suggest that both endogenous mechanisms and changes in day length work together to ramp up FSHbeta? The rest of the paper seemed to emphasize endogenous mechanisms and gloss over the fact that such mechanisms likely work additively with other factors. I felt like there was more nuance to these findings than the authors were getting into.

      We agree with the Reviewer and a similar concern was raised by Reviewer 1. Our aim was to highlight that FSH expression increased in constant short photoperiod. We have revised the manuscript to address the concern raised by the Reviewer. We have added 2 sentences in the results to highlight the additive role of endogenous timing and photoperiodic effects on FSH expression (see lines 223-226). We have kept the text that describes endogenous increases in expression (e.g., FSH/GnRH) in response to short photoperiod in the manuscript as this observation is not influenced by long photoperiod.

      Comment 3 - Third, studies 1 - 3 are well controlled; however, I'm left wondering how much of an effect the transitions in day length might have on the underlying molecular processes that mediate changes in body condition. While the changes in day length are themselves ecologically relevant, the transitions between day length states are not. How do we know, for example, that more gradual changes in day length that occur over long timespans do not produce different effects at the levels of the brain and body? This seemed especially relevant for study 3, where animals experience a rather sudden change in day length. I recognize that these experimental methods are well described in the literature, and they have been used by endocrinologists for a long time; nonetheless, I think questions remain.

      There are two points raised in this comment. First, the effect of transition in day length on body condition. We are investigating the impact of photoperiodic transitions on body condition. The ongoing project has examined the changes in tissue lipid content and conducted transcriptomic analyses of multiple peripheral tissues involved in energy balance. Although we made an initial attempt to combine all the findings into a single manuscript, the large datasets resulted in an overwhelming manuscript that lacked clarity. Instead, we have opted for two manuscripts that focus on the respective physiological systems. Those data should be published shortly. We did expand the discussion by developing a single paragraph that focused on the pattern of POMC expression and changes in quail body mass and adipose tissue. See lines 300-311.

      Second, the Reviewer raised the issue of more gradual changes in day length over longer timespans. The day lengths and durations of exposure were selected to replicate previously used photoperiod manipulations, to ensure reproducibility in research programmes, and to reduce the impact of photoperiod history (see lines 367-369). The present manuscript is the first study in birds to examine multiple intervening (i.e., within the extreme long and short photoperiods) day length conditions, and we feel this is a major and novel contribution to the field. We agree that other time points (e.g., 13L:11D), or quicker/longer timespans, could provide additional insight into the molecular mechanisms that govern seasonal transitions in reproduction/energy balance. The question raised by the Reviewer requires the types of studies that use natural conditions with wild-caught animals (or semi-natural laboratory settings) and is beyond the focus of the current manuscript.

      Recommendations For The Authors:

      Reviewer #1

      Comment 1 - Abstract: Overall abstract needs more clarity in rationale, hypothesis, and result outcomes. How this study advances our knowledge in seasonal/ photoperiodic regulation of reproduction in birds. Particularly what knowledge gap FSHb results fill in.

      We have substantially revised the abstract considering the Reviewer’s suggestions. The abstract now clarifies the rationale, hypothesis and result outcomes. We have also added new introductory and concluding statements that place the work into a wider ecological context (as suggested below).

      Comment 2 - In general the introduction needs more clarity and doesn't seem to cover the ecological relevance of learning photoperiodic/seasonal response.

      We agree with the Reviewer that the introduction could be improved. We have substantially revised the introduction with the aim of increasing its clarity. This involved an addition on the ecological context, clarification of the photoperiodic states in birds, and a description of the general and specific objectives. Note that we did not include an introduction to ‘learning’ of the photoperiodic response, as the term implies that a cognitive component is involved, which is incorrect. See lines 61-67, 71-74, 80-86, and 100-105.

      Comment 3 - Line 58: What does the author mean by "future seasonal environment" Is it to introduce change in climate or future seasonal events? This sentence needs rephrasing and more clarity.

      In response to Comment 2, we have revised the introductory paragraph and the sentence was removed from the text.

      Comment 4 - Line 63: I would recommend the use of circannual rhythms with caution for the kind of experiments authors have proposed. The approach used here is beyond the scope of addressing circannual endogenous rhythms, which can be tested only independent of photoperiod change.

      We agree with the Reviewer’s concern. The use of circannual rhythms was limited to the first paragraph (lines 56-63) only to introduce the concept of endogenous rhythmicity. We were careful not to use the term ‘circannual’ in the rest of the manuscript, which, as the Reviewer has indicated, would be inappropriate. We have retained the use of ‘endogenous program’ to refer to the molecular and physiological changes that can occur independent of photoperiod change (i.e., Experiment 3). In this case, the use of endogenous is appropriate as this form of timing adheres to an interval timer. We also provided a definition of interval timer and ecological examples to illustrate the difference between circannual rhythms and an annual interval timer (see lines 71-74). We also reviewed the entire manuscript to ensure the distinction for the endogenous program was clear.

      Comment 5 - Another aspect authors missed is that Quail is not an absolute photorefractory (Robinson and Follett, 1982).

      We agree with the Reviewer that quail are not absolutely photorefractory (but instead relatively photorefractory). As our photoperiod manipulations do not address criterion 1 or criterion 2 of the avian photoperiodic response (MacDougall-Shackleton et al., 2009; see https://doi.org/10.1093/icb/icp048), we feel that adding the type of photorefractory response would be a distraction and reduce the clarity of the concepts/experimental design described in the manuscript.

      Comment 6 - Line 223-234: "Chicks were raised under constant light and constant heat lamp". Constant photoperiod experienced during development raises concern on how this pretreatment would shape the adult seasonal response, which could be different in the seasonal response of birds raised in natural photoperiod. If this is correct, the results shown are not tenable for birds inhabiting the natural environment.

      The light schedule used in our experiment is the most appropriate for laboratory-reared chicks. The light schedule and the use of an incubator and hatchery are common in research laboratories. The procedure serves to increase the hatch rate and welfare of chicks. Undoubtedly there will be some early developmental programming effects on quail development. However, the gonadal response across all 3 experiments was consistent with the vast scientific literature on the avian photoperiodic response in both laboratory and wild birds. As the robust gonadal response clearly replicated previous studies, we are confident the results are tenable for birds inhabiting natural environments.

      Comment 7 - Numerous studies done in mammals suggest that photoperiod experienced in the early life stage affects the circadian and seasonal response in adults (Ciarleglio et al., 2011, Perinatal photoperiod imprints the circadian clock, Nat Neurosceince; Stetson M., et al., 1986, Maternal transfer of photoperiodic information influences the photoperiodic response of prepubertal Djungarian hamsters).

      We agree with the Reviewer that developmental programming in mammals is important for the photoperiodic response. However, there are vast differences between the avian and mammalian photoperiodic responses. Critically, in mammals, the maternal transfer of information to the offspring is achieved via the hormone melatonin. Conversely, in birds, melatonin is neither necessary nor sufficient for photoperiodic time measurement (Juss et al., 1993; see https://doi.org/10.1098/rspb.1993.0121). It is not scientifically tenable to relate the mammalian and avian photoperiodic responses in adulthood based on early developmental programs. For this reason, we did not introduce or discuss developmental programming in our manuscript.

      Comment 8 - Please give details on the month in which these birds were exposed to different short and long photoperiods. It is not clear in the method section. The birds experience long to short day transition and then back to long day in 16 weeks (~ 4 months). The annual cycle is ~12 months long in nature. Again, what is the ecological relevance of such an experimental paradigm. This could give some idea on photoperiodic response, but not on how the endogenous annual cycle would respond.

      Birds were delivered in September 2019 and 2020. We have added these details to the manuscript (see lines 351-352). We agree with the Reviewer that the ecological relevance of the experimental design is limited. Our focus was to use laboratory conditions and well characterised photoperiodic manipulations to examine the role of the environmental, initial predictive cue to time seasonal transitions in reproduction. The 2-week duration for each photoperiod state in Experiment 1 provides the ability to eliminate the impact of photoperiodic history (see lines 367-369; Stevenson et al., 2012a) and reduces the time necessary for the research project. As described above in response to Comment 4, we did not examine the endogenous annual cycle but instead focused on an endogenous interval timer. Experiment 3 was designed to best examine an endogenous interval timer.

      Comment 9 - Line 251: "A jugular blood sample" Please rephrase this sentence and add 50 ul heparinized tubes

      We thank the Reviewer for identifying this oversight. The text was changed accordingly.

      Comment 10 - Line 259: The scale.....fat pads" - The sentence doesn't read correctly.

      The sentence was revised accordingly.

      Comment 11 - Line 274: Male.....six weeks. It is not clear from this sentence; what photoperiod birds were exposed to before transferring to 2 long days. Is it 16 or 14 LD.

      The birds were held in 16L. The text has been revised accordingly.

      Comment 12 - Line 276: It is not clear what is Home Office approved schedule 1. This may be a commonly used term for animal sacrifice protocol in UK and Europe. But it is not familiar jargon for the rest of the globe.

      We apologise for the jargon. The text was revised to include the exact methods (decapitation followed by exsanguination).

      Comment 13 - Line 277-284: Birds under SD for 4 weeks (8 Lext) is a bit confusing and particularly in the context of studying endogenous rhythm. Needs more clarity.

      The text was revised to improve the clarity. The manuscript now states: ‘A subset of birds (n=6) was maintained in short day photoperiods for four more weeks (8Lext). This group of birds provided the ability to examine whether an endogenous increase in FSHβ expression would occur in constant short day photoperiod condition.’

      Comment 14 - Line 322-323: Give RIN number (RNA integrity number) here which is a very common parameter to determine RNA degradation in RNAseq experiments. I guess, the MiniON is a portable sequencer and sequences one sample at a time. If this is true authors should consider any batch effect in sequencing and use it as a covariate in the model.

      Our extraction protocol reliably produces RIN values >9.0. The text now states: Isolated RNA reliably has RIN values >9.0 for both the mediobasal hypothalamus and pituitary gland. Our RIN values are well above the recommended 7.0 limit. The Reviewer is correct that the MinION is portable; however, more than one sample can be run at a time. We stated in the text (lines 454-460) that birds were counterbalanced across flow cells so that each sequencing run had 9 samples, one from each treatment group. Our counterbalancing approach and quality control steps prevented batch effects.
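
      The counterbalancing described above can be sketched as follows. This is an illustrative assumption about how such a layout might be generated, not the script used in the study; the group and bird labels and the number of birds per group are hypothetical.

```python
def counterbalance(samples_by_group):
    """Place the k-th sample of every treatment group on flow cell k.

    Returns a list of flow cells; each flow cell is a dict mapping
    treatment group -> sample ID, so every run holds one sample per group.
    """
    n_runs = min(len(samples) for samples in samples_by_group.values())
    flow_cells = []
    for k in range(n_runs):
        flow_cells.append(
            {group: samples[k] for group, samples in samples_by_group.items()}
        )
    return flow_cells


# Hypothetical layout: 9 treatment groups with 4 birds each (labels invented).
groups = {
    f"group_{g}": [f"group_{g}_bird_{b}" for b in range(1, 5)]
    for g in range(1, 10)
}
runs = counterbalance(groups)
```

      Because every run contains exactly one sample from each treatment group, sequencing run is not confounded with treatment in this layout.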

      Comment 15 - Line 397-398: Adding quail or chicken-specific vimentin peptide pre-incubation with primary Ab will serve more confirming control. Omitting primary Ab doesn't address cross-reactive/ nonspecific binding issues.

      We agree that a positive control (i.e., primary Ab) is the gold standard to support specificity of the antibody. Unfortunately, we have not found a supplier of the epitope for quail/chicken vimentin. We have conducted another in silico analysis and established that the sequence for the vimentin antibody is specific for vimentin. The next closest sequence alignment is only 68% for a protein that is not expressed in the brain. The immunoreactive pattern observed in our histology reproduces work from mammalian models in which the epitope is available. Therefore, we are confident that our immunoreactive signal for vimentin is specific. We have added the in silico analysis to the manuscript on lines 535-538.

      Comment 16 - Line 430: Was the GLM model used for testing all variables? Running a statistical model to explain Differentially expressed genes, photoperiod, and physiological variables together will give a more conclusive outcome to explain the photoperiod effect and seasonal state.

      A similar comment was raised by Reviewer 2. We have conducted a WGCNA analysis to examine the relationship between photoperiod, physiological variables and DEGs. See Figure S3, Table S6, and lines 166-182 and 497-505.

      Comment 17 - It is a bit unclear why the author used cherry-picking approach by talking about only a few genes that have been studied as key regulators of photoperiodic response in quail. What was the purpose of transcriptome? A better approach would have been to use a model to reduce the data (PCA) and explain the physiological response by regression against different PCs.

      We agree with the Reviewer that other statistical approaches could be conducted, and other genes could be discussed. However, we focussed on the key regulators of the photoperiodic response in quail as these are the well characterised genes. It is important that our discussion focused on these transcripts as most do not conform to the predicted patterns of expression. We feel it is best that we keep the focus on these genes.

      Comment 18 - TSHb result is inconsistent with past studies, where TSHb is the first responder gene on photoinduction. The author did not pay attention to explaining it further in the discussion.

      We respectfully disagree with the Reviewer. Our results are consistent with past studies and show that TSHβ expression is a molecular marker of long day photoperiod. Our study does not examine photoinduction, which precludes a direct comparison between our study and previous work (e.g., Nakao et al., 2008; see doi: 10.1038/nature06738). We have revised the text in consideration of the concern raised by the Reviewer. The text now states: ‘Previous reports established that TSHβ expression is significantly increased during the period of photoinducibility in quail (Nakao et al., 2008). Although the present study did not directly examine photoinduction, TSHβ expression was consistently elevated in long day photoperiod (i.e., 16L).’ (see lines 262-265).

      Comment 19 - PRL result seems interesting and there could be more discussion in relation to the rise in PRL transcripts levels termination of breeding. Elaborating on PRL expression and breeding termination can add more information to the discussion.

      This comment is not clear to us, and we would be happy to address a clarified comment in a revised manuscript. The increased expression of prolactin does not occur during the termination of breeding. The increase in prolactin occurs during the vernal increase in photoperiod (i.e., 14L) but does not have a clear link with gonadal growth.

      Comment 20 - Line 217-219: Based......respectively. Sounds like a big claim with less evidence.

      We have removed the sentence from the discussion.

      Comment 21 - Line 220-223: The .....Bird. The sentence is not clear about how this study would add to ecological studies. Need more clarity on the importance of such data.

      The sentence was removed from the text.

      Comment 22 - I think that it would be helpful to add a couple of caveats to provide more ecological context. First, the model is only based on males, and responses in females could be different.

      We agree with the Reviewer that there are undoubtedly sex differences in timing seasonal biology. However, the photoperiodic response (growth and regression) is similar in both males and females. Sex differences exist in response to supplementary environmental cues (e.g., temperature). Males were used in these studies as the gonadal response to photoperiod manipulations is much larger than the ovarian changes in females. The focus on males allows for fewer animals to be used in the experiments and greater statistical power. To address the Reviewer’s concern, we have added a paragraph in the discussion that describes the similarity in photoperiodic responses in males and females, and the importance of supplementary cues for full reproductive development in female birds. We also provide a couple of sentences in the methods that describe the justification for using only males in the present study. See lines (Methods 352-355; Discussion 312-330; and 334-339).

      Comment 23 - Last, I wondered if it would be possible to add an ecological context for the frequent change in the photoperiod schedule and not take account of the endogenous annual response. Would the procedure simulate a similar kind of underlined molecular response for a bird under natural conditions responding to changing daylight cycles on an annual time frame?

      The discussion was considerably revised to address the ecological relevance of the study and its findings. We have added a sentence at the beginning of the discussion to highlight that the laboratory-based approach and photoperiodic manipulations reliably replicate previous findings using semi-natural conditions (Robinson and Follett, 1982) (see lines 248-250). We have already reduced the focus on the endogenous annual response.

      Reviewer #2:

      Comment 1 - The writing is very terse and could benefit from a more narrating style, which would make it a lot easier for the reader to get through some of the very data-heavy text. Breaking up the Results with subheadings would also be helpful.

      We appreciate the suggestion to add subheadings to the Results. We added 3 descriptive headings, one for each of the studies conducted in the manuscript. We feel the added revisions (e.g., ecological context) have improved the narrative and made the manuscript accessible to a wider readership.

      Comment 2 - The transcriptome analyses could be developed a bit more. First, using the limma package would allow the authors to apply a more complete model to the DEG analyses, which would likely be superior to EdgeR. Second, the authors may want to consider WGCNA or a similar approach to discover gene co-expression modules, and then examine whether any of the resulting module eigengenes co-vary with any morphological or physiological measures and/or vary rhythmically.

      This is an excellent suggestion, and the new analysis was incorporated into the revised manuscript. Using the Langfelder and Horvath 2008 WGCNA package, we conducted module-trait analyses to examine co-variation in our findings. These data are presented in Figure S# and lines 476-484. We agree that other DEG analyses would be useful; our main objective was to use BioDare2.0 to identify rhythmic transcription in the seasonal transcriptomes. edgeR provides an excellent approach to identify transcripts and is commonly used.

      Comment 3 - In the Data and code availability statement (lines 226ff) the authors state that "all raw data are available in Extended data Table 1." However, they should be submitted to the GEO database or a similar public repository along with all relevant metadata. Also, and maybe I overlooked this, I did not see anywhere that the "R code used in Study 1 is freely available" (I was not sure what "the methods reference list" was supposed to refer to). Instead of stating that "the full R code used is available upon request" I suggest making all scripts available via GitHub or Dataverse, along with all non-omics data. The advantage of the latter platform is that a citable DOI is assigned to each upload.

      The data are now available in the GEO database under accession number GSE241775. We have added this information to the text. The R code is now provided as Table S11 so that the reader can directly access the script.

      Comment 4 - Line 191: Delete the extra "that"

      We thank the Reviewer for identifying the oversight. We have revised the text accordingly.

      Comment 5 - Line 24f: What does "pseudo-randomly" mean? Maybe "haphazardly" would be more appropriate here?

      The term pseudo-randomly is used to describe the organized manner in which subjects are assigned to each treatment group. The aim is to ensure that a particular physiological variable, such as body mass, is evenly distributed across treatment groups. (Note that the term is derived from the field of psychology.) This reduces bias in the experiment that could otherwise be established when assigning treatment groups. We are reluctant to replace pseudo-randomly with haphazardly as the latter does not imply a logical organization. We have added text to help clarify the reason. The text now states: At the end of each photoperiodic treatment a subset of quail (n=12) body mass was used as a measure to pseudo randomly select birds for tissue collection and served to reduce the potential for unintentional bias.
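
      One common way to achieve this kind of balance is blocked randomization on the ranked covariate. The sketch below is an assumption about how such an assignment could be implemented, not a description of the procedure used in the study; the bird labels and body masses are hypothetical.

```python
import random


def blocked_assignment(body_mass_by_bird, n_groups, seed=0):
    """Assign birds to groups so that body mass is spread evenly.

    Birds are ranked by body mass and cut into consecutive blocks of
    n_groups birds; within each block the order is shuffled and one bird
    goes to each group. Assumes the number of birds is a multiple of
    n_groups; otherwise the last, shorter block fills only the first groups.
    """
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    ranked = sorted(body_mass_by_bird, key=body_mass_by_bird.get)
    groups = [[] for _ in range(n_groups)]
    for start in range(0, len(ranked), n_groups):
        block = ranked[start:start + n_groups]
        rng.shuffle(block)
        for group, bird in zip(groups, block):
            group.append(bird)
    return groups


# Hypothetical body masses in grams (invented for illustration).
masses = {
    "bird_01": 212, "bird_02": 198, "bird_03": 225, "bird_04": 240,
    "bird_05": 205, "bird_06": 231, "bird_07": 219, "bird_08": 247,
    "bird_09": 201, "bird_10": 236, "bird_11": 222, "bird_12": 210,
}
assigned = blocked_assignment(masses, n_groups=3)
group_means = [sum(masses[b] for b in g) / len(g) for g in assigned]
```

      Because each group receives exactly one bird from every mass-ranked block, the group means cannot differ by more than the average within-block range, whichever way the shuffle falls.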

      Comment 6 - Figure 1e,j: The text indicates that 398 and 130 genes were "rhythmically expressed" in the MBH and pituitary, respectively, but considerably fewer genes are shown in the heatmaps in Figure 1e,j. How were these genes selected, and what was the rationale for doing so? Also, some autumnal and vernal expression patterns show some strong similarities (e.g., 16a and 16v in the MBH), which could be discussed. Consider showing the two heatmaps with the columns also hierarchically clustered in a supplementary figure.

      We agree with the Reviewer that the full heatmap for the transcripts should be provided. The heat maps in Figure 1 are based on the transcripts with the most significant change, and were selected to provide a graphical representation that would be easily digested by a wide readership. We have created a new figure (i.e., Fig. S1) that provides all the transcripts in heat maps for both the MBH and pituitary gland.

      Reviewer #3:

      Comment 1 - I do not have too much to add to this section of my review. Broadly speaking, I would suggest that the authors address some of the concerns I highlight above, and integrate their thoughts into the paper more than they currently do. I think this is particularly important with respect to the limitations of many of the bioinformatic analyses.

      We thank the reviewer for their input and time assessing the manuscript. We have revised the manuscript in many sections incorporating the suggestions by Reviewer 3 above, and Reviewers 1 and 2.

      Comment 2 - Some of the methods are also a little scant. For example, the qPCR analyses are not described in sufficient detail to replicate the study. What are the efficiencies? Were samples run in duplicate? What was the housekeeping control gene used? Was there only one, or were multiple housekeeping genes used?

      We apologise for the oversight; the absence of this information was a mistake that we missed in our earlier revisions. The revised manuscript includes all the requested information. Line 333 states that all samples were run in duplicate. The efficiency for each transcript was within the MIQE guidelines (indicated on line 342), falling within the 0.7 to 1.0 range. Actin and glyceraldehyde 3-phosphate dehydrogenase were used as the reference transcripts. The most stable reference transcript was used to calculate fold change in target gene expression (lines 343-345).
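
      For readers unfamiliar with efficiency-corrected fold change, the calculation can be sketched as below. The response does not state which formula was used, so the Pfaffl-style ratio here is an assumption, and all Ct values are hypothetical.

```python
from statistics import mean


def fold_change(target_ct_control, target_ct_treated,
                ref_ct_control, ref_ct_treated,
                target_efficiency=1.0, ref_efficiency=1.0):
    """Efficiency-corrected relative expression (Pfaffl-style ratio).

    Each Ct argument is a list of technical replicates (e.g. duplicates),
    which are averaged first. Efficiencies are fractions, where 1.0 means
    perfect doubling per cycle (amplification factor 1 + efficiency = 2).
    """
    delta_target = mean(target_ct_control) - mean(target_ct_treated)
    delta_ref = mean(ref_ct_control) - mean(ref_ct_treated)
    return ((1 + target_efficiency) ** delta_target
            / (1 + ref_efficiency) ** delta_ref)


# Hypothetical duplicate Ct values (invented for illustration).
example = fold_change(
    target_ct_control=[25.0, 25.0],
    target_ct_treated=[23.0, 23.0],
    ref_ct_control=[18.0, 18.0],
    ref_ct_treated=[18.0, 18.0],
)
```

      In this example the target transcript crosses threshold two cycles earlier in the treated sample while the reference transcript is unchanged, so with perfect efficiency the ratio corresponds to a four-fold increase.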

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important paper, the authors report a link between brumation and tissue size in frogs, summarizing convincing evidence that extended brumation is associated with smaller brain size and increased investment in reproduction-related tissues. The research will be of broad interest to ecologists, evolutionary biologists, and those interested in global change biology. While the dataset involves significant field work and advanced statistical analyses, the manuscript would benefit from more explanation of the models, including why frogs are a good model in which to address these questions, and from general improvement in the structure and conciseness.

      We highly appreciate your positive assessment and that you considered our paper important and convincing.

      Reviewer #1 (Public Review):

      The authors have conducted lots of field work, lab work and statistical analysis to explore the effect of brumation on individual tissue investments, the evolutionary links between the relative costly tissue sizes, and the complex non-dependent processes of brain and reproductive evolution in anurans. The topic fits well within the scope of the journal and the manuscript is generally written well. The different parameters used in the present study will attract a broad readership across ecology, zoology, evolutionary biology, and global change biology.

      Thank you for your positive and supporting feedback.

      Reviewer #2 (Public Review):

      The authors set out to show how hibernation is linked to brain size in frogs. If there were broader aims it is hard to decipher them. The authors present an extremely impressive dataset and a thorough set of cutting-edge analyses. However not all details are well explained. The main result about hibernation and brain size is fairly convincing, but it is hard to think of broader implications for this study. Overall, the manuscript is very confusing and hard to follow.

      Thank you for your compliments on our paper. As for your concerns, we have greatly revised our paper and, as we hope, improved its clarity. We have also added a few sentences to the conclusions to draw attention to potentially broader implications. Specifically, we describe how the focal traits of our study may all be affected by climate change. Differential constraints in necessary investments could be one of several reasons for the varying resilience to climate change between species in the same habitat.

      Reviewer #1 (Recommendations For The Authors):

      There are no issues on the availability of data and code.

      Thank you.

      Line 15: in the author contribution section, it seems that C.L.M. and J.P.Y are not in the author list.

      These two authors are not part of this study. This was a mistake.

      Line 24: I don't think it is vital or logical to address or compare too much on birds or mammals, which are not the focused taxa of the present study. Instead, it is better to clarify the reason why frogs and toads are ideal model taxon to this study.

      The reason for comparisons with birds and mammals was that all hypotheses related to the various trade-offs tested here had been developed in these taxa. One of the points of our paper was that these needed validation beyond the two taxa, in addition to being tested against one another (each prediction had been developed in a specific group and typically in isolation of all other hypotheses).

      Line 25-26: as the authors are shooting for eLife, as a general journal, it is not essential to provide the detailed methods in the abstract. But I think the authors need to strengthen the novelty of the work in the field here.

      The strength of our study was that all traits were measured directly in our species, including estimates of hibernation duration. Prior studies used various proxies, categorical classifications or datasets assembled from multiple sources. To us, this seemed like a sufficiently important advance in the field to mention it, but considering the reviewer’s comment, we have now removed it.

      Line 28: "protracted brumation reduces brain size and instead promotes reproductive investments", as a correlative study, it is much more precise to change this sentence to a similar description as "protracted brumation is negatively correlated with brain size but is positively correlated with reproductive investments" here and related statements throughout the whole text.

      We agree that, strictly speaking, a path analysis can only point toward possible causality and not provide hard evidence as experimental manipulation might. The wording may have been a bit too strong here in our attempt to minimize wordiness and because all our analyses combined very strongly pointed in this direction. However, we have now changed this as suggested even though it now reads almost as if we had done no more than conducting a simple correlation. We have further paid attention to the wording of our interpretations throughout the paper.

      Line 32-33: it needs a bigger ending linking your main findings with the implication in understanding species response to the sustained environment change.

      We have reworded the ending of the abstract to: “Our results provide novel insights into resource allocation strategies and possible constraints in trait diversification, which may have important implications for the adaptability of species under sustained environmental change.”

      Line 63-68: this sentence is too long to understand and please simplify it.

      We have split the sentence into two sentences.

      Line 125-130: it is known that there are various frog reproductive modes (Crump et al. 2015) such as trade-offs between clutch size and egg size, different number of breeding during one year, etc. These different reproductive forms may also influence the brain size evolution with food availability and seasonal variations. Please clarify it.

      Yes, anurans do have varying reproductive modes, but to us, there is no a priori reason to assume that such variation would have a direct effect on brain evolution. Rather, in our opinion, different reproductive modes would have indirect effects by affecting the environment in which reproduction occurs. For example, larvae developing under different environmental conditions (substrate, larval density, egg provisioning etc.) might affect developmental trajectories that could influence how resources are available and allocated to different organs, including the brain. Alternatively, reproductive modes could influence the choice of environment for reproduction, thereby possibly affecting mating strategies and ultimately trait investments associated with these strategies. Given we were asked to shorten our paper, we believe that ‘environmental effects’ remains broad enough to encompass such variation, thereby not necessitating disentangling the different, and likely primarily indirect, ways that reproductive modes could be linked to brain evolution. However, if the reviewer would find it important to go into such detail in the paper, we will be happy to do so.

      Line 186-187: it is necessary to mention here that the authors also conducted sensitivity analyses to apply 2°C or 4°C below their experimentally derived thresholds to test the robustness of the results to data uncertainty.

      We have added “(details on methodology and various sensitivity analyses for validation in Material and Methods)” to indicate the different types of sensitivity analyses, which included more than simply 2 or 4°C difference.

      Line 188: please change "In phylogenetic regressions" to "after controlling for phylogenetic autocorrelation/pseudo-replication" or similar sentence here.

      Our focus here was the phylogenetically informed GLS model rather than phylogenetic control itself. In the latter case, it would still not be clear what type of model was conducted with such phylogenetic control. To avoid any shorthand, we have reworded for more precision: “We employed phylogenetic generalized least-squares (PGLS) models, …”

      Line 177-287: please provide the exact variance explained by different predictor variables in brumation duration, individual tissue investments, and brain evolution. I also suggest that the authors consider conducting a multi-model inference-based model averaging analysis to test the relative importance of different variables. In addition, the present analyses did not include the interaction terms among variables, which may be more important than the effect of each individual factor.

      There may be some misunderstanding as these models represent separate analyses for each predictor as indicated by the associated λ values (never more than one value per model). We conducted separate models to determine which variables might even play a role in explaining variation in the corresponding response variables. Based on relevant predictors, we then conducted path analyses rather than general multi-predictor analyses. The relative effect sizes are represented by the correlation coefficients (r values) in the tables.

      Reviewer #2 (Recommendations For The Authors):

      Why exactly are the pairwise comparisons positively correlated (fig. S5) and then negatively correlated (fig. 3). What is actually driving this difference? For the phylogenetic path analyses 26 candidate models are chosen without explanation. What theory or hypotheses are these based on?

      We assume the reviewer is referring to the brain-body fat association. The two ‘pairwise’ analyses they mention were not the same. The correlation in Fig. S5 was a standard (albeit phylogenetically informed) partial correlation between the two focal tissues, controlling for SVL. By contrast, as described when introducing the analyses, negative associations were derived when additionally controlling for testes and hindlimb muscles, all of which deviated from isometry against body size. Here, the total mass of the four main tissues was divided by their proportional contribution to that mass in each species, then standardized for comparison across species. Since the total mass of these four tissues scaled directly with body size, larger-bodied species did not invest a different proportion of their body in these tissues than smaller-bodied species, thus essentially rendering body size irrelevant for this analysis. However, the relative representation of the four traits changed between species such that more resources devoted to body fat were associated with a smaller brain, hence a negative relationship. Similarly, the multivariate analysis and the PCA suggested similar trends when all four tissues were considered rather than purely pairwise comparisons.
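
      The arithmetic behind this sign change can be illustrated with a minimal sketch. It is not the authors' analysis: it uses no phylogenetic correction and no SVL control, and the four species, their tissue masses, and the helper names `tissue_shares` and `pearson` are hypothetical values and names chosen only for illustration. It shows that two tissues can both increase in absolute mass across species while their shares of the summed tissue mass move in opposite directions.

```python
from math import sqrt


def tissue_shares(masses):
    """Convert absolute tissue masses into proportions of their summed mass."""
    total = sum(masses.values())
    return {tissue: mass / total for tissue, mass in masses.items()}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient (no phylogenetic correction)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical tissue masses (arbitrary units) for four species of
# increasing body size; these are NOT data from the study.
species = [
    {"brain": 1.0, "fat": 1.0, "testes": 0.5, "muscle": 7.5},
    {"brain": 1.8, "fat": 3.0, "testes": 1.0, "muscle": 14.2},
    {"brain": 2.4, "fat": 6.0, "testes": 1.5, "muscle": 20.1},
    {"brain": 2.8, "fat": 10.0, "testes": 2.0, "muscle": 25.2},
]

# Absolute masses: both tissues grow with body size.
abs_r = pearson([s["brain"] for s in species], [s["fat"] for s in species])

# Shares of the four-tissue total: the fat share rises as the brain share falls.
shares = [tissue_shares(s) for s in species]
share_r = pearson([p["brain"] for p in shares], [p["fat"] for p in shares])

print(f"absolute masses: r = {abs_r:.2f}")
print(f"tissue shares:   r = {share_r:.2f}")
```

      In this toy example the correlation between absolute brain and fat mass is positive, whereas the correlation between their shares is negative, because the shares are constrained to sum to one across the four tissues.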

      Regarding the second comment: We indeed used 28 pre-defined predictions for our larger path analysis.

      The authors haven't really provided much additional context either, and the discussion is almost entirely a rehash of the results section. I can't see the analysis code but this may be of use to people performing similar analyses.

      It is true that the traits and core message of the Discussion relate directly to our results, but we believe that our Discussion provides the essential biological context to our findings and to how they are connected. We tried not to go on tangents or too much speculation as the many results provided enough material to discuss, with several different ways that we expanded the prior state-of-the-art in the field. However, we have now expanded the concluding paragraph to place our findings in the context of climate change, given that this could affect anurans and the different traits examined in many ways that are directly related to the current study. Yet, we decided to keep this short because such extrapolation of our findings

      We indeed held off making the code available to the public in case dramatic changes to the paper were requested by the reviewers. However, it will be published.

      Additional recommendations from the Reviewing Editor:

      • One of the reviewers and I found the text a little difficult to follow. I suggest simplifying the paper by being more concise. For example, the introduction could be shortened into 3-4 paragraphs of relevant text without overwhelming the reader. One of the reviewers wanted a better explanation of statistical models and I agree. The discussion could benefit from some structure - consider adding subheadings that would guide the reader as to the topic. Finally, the figures are difficult to see and should be made larger. For example, the graphs in Figure 1c could be on a panel below A and B so that readers can interpret the graph. In Figure 3 - the legend is far too small - please put above or below the graphs. In summary - I hope you consider a major re-write that would strengthen the accessibility of your paper to a broad audience.

      We have substantially shortened the paper despite adding further details on models and a broader context to the Discussion. We also condensed the Introduction to about two thirds of the original word count. However, we did not think that shortening it even further or splitting it into 3-4 paragraphs would improve readability. We still considered it important to introduce with sufficient context all major hypotheses that were tested against one another, provide at least some information on what was or was not known about the evolution of the focal traits and their links to one another or the environmental variables. We also found it important to touch on the differences between our study organisms and those typically studied in the context of hibernation or brain evolution, as this could affect the predictions. Given the number of hypotheses and traits, cutting the number of paragraphs would have meant merging some of them into very long ones, which we did not consider helpful.

      We further added short subheadings to the Discussion and adjusted the figures as requested.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We very much appreciate the constructive comments provided by the reviewers. We have incorporated many of their suggestions and believe the manuscript is much improved.

      In brief, we updated the text as suggested and have included three additional panels in supplementary fig. S2E-G. These additional data provide further support that the ectopically persisting neuroblasts are actively dividing and that cell cycle defects alone do not account for temporal patterning phenotypes.

      Reviewer #1 (Public Review):

      In this manuscript, the authors are building on their previous work showing Delta-Notch regulates the entrance and exit from embryo-larval quiescence of neural stem cells of the central brain (called CB neuroblasts (NB) (PMID: 35112131)). Here they show that continuous depletion of Notch in NBs from early embryogenesis leads to cycling NBs in the adult. This - cycling NBs in the adult - is not seen in controls. The assumption here is that these Notch-RNAi NBs in adults are those that did not undergo terminal differentiation in pupal development. The authors show that Notch is activated by its ligand Delta which is expressed on the GMC daughter cell and on cortex glia. They determine that the temporal requirement for Notch activity is 0-72 hours after larval hatching (ALH) (i.e., 1st instar through mid-3rd instar at 25C). In NBs/GMCs depleted for Notch, early temporal markers were still expressed at time points when they should be off and late markers were delayed in expression. These effects were observed in ~20-40% of NBs (Figures 5 and 6). Through mining existing data sets, they found that the early temporal factor Imp - an RNA binding protein - can bind Delta mRNA. They state that Delta transcripts decrease over time (without any reference to a Figure or to published work), leading to the hypothesis that Delta mRNA is repressed by the late temporal factors. Over-expressing late factors Syp or E93 earlier in development leads to downregulation of a Delta::GFP protein trap. These results lead to a model in which Notch regulates expression of early temporal factors and early temporal factors regulate Notch activity through translation of Delta mRNA.

      There are several strengths of this study. The authors report rigorous measurements and statistical analyses throughout the study. Their conclusions are appropriate for the results. Data mining revealed an important mechanism - that Imp binds Delta mRNA - supporting the model that early temporal factors promote Delta expression, which in turn promotes Notch signaling.

      There are also several weaknesses:

      1) The activation of Notch in NBs by Delta in GMCs was already shown by this group in their Dev 2022 paper, reducing some of the impact of this study.

      In our previous work, we reported that Delta-expressing GMCs transactivate Notch in neuroblasts during the embryonic to larval transition. In the current manuscript, we show that Delta is expressed in GMCs and cortex glia and that both sources transactivate Notch in neuroblasts during later developmental stages. This is in agreement with work published by others and, while not novel per se, is a necessary first step for understanding which neighboring cell types control Notch pathway activity. During the embryonic to larval transition, glia likely do not contribute because they have not yet grown to ensheath CB NBs and their recently born progeny.

      2) The authors do not explain their current results in the context of their prior paper (2022 Dev) until the Discussion, but this would be useful to read in the Introduction. Similarly, it would be good to mention that in the 2022 paper, they find a significant number of wor>Notch RNAi NBs at 2 hours ALH that are cycling. Are the adult Notch RNAi NBs in this study descended from those NBs at 2 hours ALH in the 2022 study? In other words, how does the early requirement for Notch between 0-72 hours ALH reported in the current study relate to the Notch-depleted NBs identified in the 2022 paper?

      We have now included the following text in the intro: “We recently reported that Notch signaling regulates CB NB quiescence during the embryonic to larval transition (Sood et al., 2022). When Notch is knocked down, some CB NBs continue dividing during this transition. We also reported that Notch activity becomes attenuated in quiescent CB NBs because CB NBs are no longer dividing and producing Delta-expressing GMC daughters for Notch pathway transactivation. Moreover, low Notch is necessary for CB NBs to reactivate from quiescence in response to dietary nutrients (Sood et al., 2022).

      Here we report that Notch signaling also regulates neurogenesis termination during pupal stages. When Notch is knocked down, CB NBs maintain early temporal factor expression longer resulting in a delay of late temporal factor expression with prolonged neurogenesis into late pupal stages and early adulthood. This defect in temporal patterning (switching from early to late) occurs after CB NB exit from quiescence suggesting that Notch is required at multiple times throughout development in controlling CB NB proliferation decisions.”

      We do not know whether the neuroblasts that fail to enter quiescence are the same ones that fail to terminate divisions during pupal stages; however, many more neuroblasts fail to terminate divisions during pupal stages.

      3) Most of the experiments rely upon continuous depletion of Notch from embryonic stage 8 until adulthood using the wor-GAL4 driver. There is no lineage tracing of this driver and there is no citation about the published expression pattern of this driver. The inclusion of these details is important for a broad audience journal.

      The reference for the driver is included in supplementary data, under the heading “Experimental model:Drosophila melanogaster”. This GAL4 driver is widely used and one of the most accepted in the field.

      4) Most of the experiments utilize a single RNAi transgene for Notch, Delta, Imp, Syp, E93. There are no experiments demonstrating the efficacy of the RNAi lines and no references to prior use and/or efficacy of these lines.

      All RNAi lines used in these studies have been published previously, by our group as well as others, and sources for the lines are listed in supplementary data, under the heading “Experimental model:Drosophila melanogaster”. The efficiency of these lines has been verified using antibody labeling (data not shown) and by assaying Notch activity reporters (shown in Fig. 2).

      An appraisal: The authors use temperature shifts with Gal80TS to show that Notch is required between 0-72 hours ALH. They show with the use of known markers of the temporal factors and Delta protein trap, that Imp promotes Delta protein expression and the later temporal factors reduce Delta, although the molecular mechanisms are not clearly delineated. Overall, these data support their model that the reduction of Delta expression during larval development leads to a loss of Notch activity.

      As noted in the Discussion, this study raises many questions about what Notch does in larval CB NBs. For example, does it inhibit Castor or Imp? Is Notch required in certain neural lineages and not others? These studies will be of interest to the community of developmental neurobiologists.

      Reviewer #2 (Public Review):

      Embryonic stem cells extensively proliferate to generate the necessary number of cells that are required for organogenesis, and their proliferation must be timely terminated to allow for proper patterning. Thus, timely termination of stem cell proliferation is critical for proper development. Numerous studies have suggested that cell-extrinsic changes in the surrounding niche environment drive the termination of stem cell proliferation. By contrast, cell-intrinsic mechanisms that terminate stem cell proliferation remain poorly understood. Fruit fly larval brain neuroblasts provide an excellent model for mechanistic investigation of intrinsic control of stem cell proliferation due to the wealth of information on molecular marks, gene functions and lineage hierarchy. Sood et al. conducted a genetic screen to identify genes that are required for the termination of neuroblast proliferation in metamorphosis and found that Notch and its ligand Delta contribute to their exit from cell cycle. They showed that knocking down Notch or delta function in larval neuroblasts allows them to persist into adulthood and remain proliferative when no neuroblasts can be detected in wild-type adult brains. By carrying out a well-designed temperature-shift experiment, the authors showed that Notch is required early during larval development to promote timely exit from cell cycle in metamorphosis. The authors went on to show that attenuating Notch signaling prolongs the expression of temporal identity genes castor and seven-up perturbing the switch from Imp to Syp/E93. Finally, they showed that knocking down Imp function or overexpressing E93 can restore the elimination of neuroblasts in Notch/delta mutant brains.

      Overall, the experiments are well conceived and executed, and the data are clear. However, the data reported in this study represent incremental progress in improving our mechanistic understanding of the termination of neuroblast proliferation.

      We respectfully disagree with this statement. Because Notch signaling is implicated in neurogenesis termination and Notch activity is regulated by GMCs and glia, it strongly suggests that NB proliferation and timing cues are controlled in a non-autonomous manner through direct interactions with NBs and their neighbors. This is in contrast to temporal patterning during embryogenesis which is largely believed to be controlled NB-autonomously. In addition, to our knowledge, no one has yet reported that CB NBs fail to terminate cell divisions on time when Notch activity is reduced during normal development. In fact, reported NB phenotypes associated with Notch loss of function have been surprisingly subtle until now.

      Some of the data seem to represent more careful analyses of previously published observations described in the Zacharioudaki et al., Development 2016 paper while others seem to contradict the results in this study.

      The Zacharioudaki et al., Development 2016 paper is terrific. One key difference between our work and theirs is that we look at Notch pathway knockdown and loss of function phenotypes, whereas in the Zacharioudaki 2016 paper, the authors report phenotypes associated with Notch constitutive activation. It has been known for some time that constitutively active Notch leads to tumorigenic phenotypes, particularly in type II lineages. Zacharioudaki and colleagues further determined that some of the classically known temporal transcription factors were ectopically expressed in these stem cell tumors. Here we show that under normal developmental conditions, Notch pathway activity controls CB NB temporal patterning.

      Gaultier et al., Sci. Adv. 2022 suggested that Grainyhead is required for the termination of neuroblast proliferation in a neuroblast tumor model, and grainyhead is a direct target of Notch signaling. Thus, Grainyhead should be a key downstream effector of Notch signaling in terminating castor and seven-up expression. Identical to Notch signaling, Grainyhead is also expressed through larval development. Grainyhead can function as a classical transcription factor as well as a pioneer factor raising the possibility that temporal regulation of neurogenic enhancer accessibility might be at play in allowing Notch signaling in early larval development to set up termination of castor and seven-up expression in metamorphosis. Diving deeper into how dynamic changes in chromatin in neurogenic enhancers affect the termination of neuroblast proliferation will significantly improve our understanding of termination of stem cell proliferation in diverse developing tissue.

      Reviewer #3 (Public Review):

      In this study, the authors investigate the effects of Notch pathway inactivation on the termination of Drosophila neuroblasts at the end of development. They find that termination is delayed, while temporal patterning progression is slowed down. Forcing temporal patterning progression in a Notch pathway mutant restores the correct timing of neuroblast elimination. Finally, they show that Imp, an early temporal patterning factor promotes Delta expression in neuroblast lineages. This indicates that feedback loops between temporal patterning and lineage-intrinsic Notch activity fine tunes timing of early to late temporal transitions and is important to schedule NB termination at the end of development.

      The study adds another layer of regulation that finetunes temporal progression in Drosophila neural stem cells. This mechanism appears to be mainly lineage intrinsic - Delta being expressed from NBs and their progeny, but also partly niche-mediated - Delta being also expressed in glia but with a minor influence. Together with a recent study (PMID: 36040415), this work suggests that Notch signaling is a key player in promoting temporal progression in various temporal patterning system. As such it is of broad interest for the neuro-developmental community.

      Strengths

      The data are based on genetic experiments which are clearly described and mostly convincing. The study is interesting, adding another layer of regulation that finetunes temporal progression in Drosophila neural stem cells. This mechanism appears to be mainly lineage intrinsic - Delta being expressed from NBs and their progeny, but also partly niche-mediated - Delta being also expressed in glia but with a minor influence. A similar mechanism has been recently described, although in a different temporal patterning system (medulla neuroblasts of the optic lobe - PMID: 36040415). It is overall of broad interest for the neuro-developmental community.

      Weaknesses

      The mechanisms by which Notch signaling regulates temporal patterning progression are not investigated in detail. For example, it is not clear whether Notch signaling directly regulates temporal patterning genes, or whether the phenotypes observed are indirect (for example through the regulation of the cell-cycle speed). The authors could have investigated whether temporal patterning genes are directly regulated by the Notch pathway via ChIP-seq of Su(H) or the identification of potential binding sites for Su(H) in enhancers.

      This is already known for svp and cas, and we have now included this information in the discussion. Thank you.

      “Whether Notch pathway activity curtails both Cas and Svp or just Cas remains an open question; however, it has been reported that both cas and svp are associated with at least one enhancer that is responsive to Notch activity (Zacharioudaki et al., 2016).”

      A similar approach has been recently undertaken by the lab of Dr Xin Li, to show that Notch signaling regulates sequential expression of temporal patterning factors in optic lobes neuroblasts (PMID: 36040415), which exhibit a different temporal patterning system than central brain neuroblasts in the present study. As such, the mechanistic insights of the study are limited.

      Reviewer #1 (Recommendations For The Authors):

      1) There are missing controls

      a) Fig. 1F and Fig. 6A - The authors should generate and show images of control clones (FRT19A) stained with the same markers as Notch clones.

      Fig. 1F is at 48 hours APF. In control clones, there are no Dpn positive cells present, as stated in the text and therefore no confocal images are shown. Same for Fig. 6A, there are no Dpn positive cells in control clones in the brain at this time, therefore nothing to double label.

      2) This result is incorrectly described in the Results

      a) P. 5 "Ectopically persisting N RNAi CB NBs expressed the NB transcription factor Deadpan (Dpn), the S-phase indicator pcnaGFP, and were small on average, similar in size to control CB NBs at earlier pupal stages (Fig. 1B,C,E)." The Notch RNAi NBs were larger (not smaller) than controls in Fig. 1E at 30, 48, 72 h APF and in adults.

      Thank you for this comment. We have changed the language in the main text as follows:

      “Ectopically persisting N RNAi CB NBs (CB NBs at 48 hours APF and beyond) expressed the NB transcription factor Deadpan (Dpn), the S-phase indicator pcnaGFP, and were small on average compared to control CB NBs during earlier developmental stages (L3 control, average diameter 10-15 μm) (Fig. 1B,C,E). However, at 30 hours APF when control CB NBs are still present, N RNAi CB NBs were larger on average (Fig. 1B,C,E).”

      3) This sentence needs clarification/editing

      a) P. 4: " Independent of neurogenesis timing and the mechanism by which CB NB stop divisions, temporal patterning plays a key role". A key role in what?

      Thank you again. We have changed the text to the following:

      “Independent of neurogenesis timing and the mechanism by which CB NB stop divisions, temporal patterning plays a key role in controlling numbers and types of neurons made within each of the NB lineages (Maurange et al., 2008; Tsuji et al., 2008; Bahrampour et al., 2017; Yang et al., 2017; Pahl et al., 2019).”

      4) Some sentences need references or data to support them.

      a) P. 9 Please provide a reference to support the statement that Delta is a known Notch target

      We have included a reference.

      b) P. 9 - please provide a reference or data to support the statement that Delta transcripts decrease over time in larval CB NBs.

      This result is shown in Fig. 7B.

      5) Fig. 7A - it is difficult to appreciate the purple highlighting.

      We have changed the colors as suggested.

      Reviewer #2 (Recommendations For The Authors):

      1) In Fig. 4C, why does late knockdown of delta lead to ectopic persistence of NBs but late knockdown of Notch has no effect?

      This could be due to many things including differences in efficiency of UAS-RNAi lines. The point is that Delta/Notch is required early, but not late. Although some DeltaRNAi CB NBs are still present, the number compared to 48 hours APF is greatly reduced.

      2) It is surprising that Delta expression in NBs/GMCs appears to play a more important role in activating Notch signaling in neuroblasts than Delta expression in cortex glia. Please explain how Delta can cell autonomously activate Notch signaling.

      We are not proposing that Delta activates Notch cell autonomously, but are proposing that Delta in GMCs transactivates Notch in NBs. After NBs divide Delta is partitioned to GMCs. Quiescent NBs have low to no Notch pathway activity, likely because they are not producing Delta expressing GMC daughters (Sood, 2022).

      Please also reconcile the difference in gene expression induced by delta[RNAi] in this study and the delta-mutant allele used in the Zacharioudaki et al study.

      We are unsure what the reviewer is asking here and therefore cannot reconcile any differences in gene expression between the dlRNAi line and the mutant allele. What gene expression needs to be reconciled? Zacharioudaki is listed as first author on four manuscripts. Which paper is being referred to?

      3) In Fig. 2J-L, why does knocking down delta in glia lead to loss of Scrib expression in neuroblasts and their surrounding progeny?

      We are not sure if it does or not. We only use Scrib as a membrane marker to identify and locate cells and neuropil regions of interest.

      4) The phrase "Notch is active early" is misleading when multiple labs have shown that Notch signaling is active in neuroblasts throughout larval development.

      Good point! We have rewritten the statement: “Somewhat paradoxically, we find that early Notch activity is required to terminate CB NB divisions late.”

      5) Neuroblasts that persist into adulthood are "smaller and Dpn-positive/PCNA-GFP-positive". Are they really neuroblasts? Can the authors verify the identity of these "persistent neuroblasts" with other molecular markers as well as functional assessment by inducing lineage clones?

      We have no doubt that these cells are NBs. Because we examine brains over time, these cells can be tracked using the markers Scrib, Dpn, and pcna. These cells also undergo asymmetric cell division (refer to Fig. S2F) and express other markers characteristic of CB NBs (mir and insc, not shown). We have made clones and see the same phenotype (ectopic persistence) in both MARCM clones and in “flip-out” clones.

      Reviewer #3 (Recommendations For The Authors):

      I have a few issues that need to be addressed to reinforce some of the conclusions:

      1) It is unclear whether NBs that persist in late pupal or adult stages have just failed to differentiate or whether they continue to divide, leading to supernumerary progeny (as shown for NBs that are stalled in temporal patterning like in svp mutant NBs (Maurange et al. 2008)). EdU or PH3 staining could be done in adults to clarify this point.

      In this manuscript, we make use of pcna:GFP, a reporter for E2F activity as an indicator of cell proliferation. We certainly observe Dpn positive cells that only weakly express the reporter, suggesting that these cells are not actively dividing or dividing at a reduced rate. However, by far most of the ectopically persisting CB NBs strongly express the reporter and generate pcnaGFP expressing progeny, indicating that these cells are dividing. We have also stained tissues with PH3 and have included an image of a telophase dlRNAi expressing CB NB at 48 hours APF (Fig. S2F).

      2) It is unclear whether Notch signaling directly or indirectly regulates temporal transitions. One possibility is that knockdown of Notch signaling decreases cell-cycle speed leading to delayed temporal transitions. The authors should test whether Notch KD affects cell cycle speed using EdU incorporation or PH3 staining. This could be done best using Notch mutant MARCM clones as wt NBs can be used as controls.

      We have quantified the number of PH3 positive CB NBs during wandering L3 stages in control and dlRNAi animals. We find that dlRNAi CB NBs are indeed proliferating at reduced rates compared to controls. To test whether this reduced proliferation rate causes the termination delay, we expressed a constitutively active form of PI3-kinase in dlRNAi animals to drive cell growth and proliferation. We found that CB NBs still ectopically persist (Fig. S2E-G).

      We have included the following in the text:

      “Defects in timing of temporal transitions could be due to defects in cell cycle progression, although embryonic NBs still transition independent of cell division (Grosskortenhaus et al., 2005). We used PH3 to assay CB NB mitotic activity. In Delta knock down animals, the percentage of PH3 positive CB NBs was reduced compared to control (Fig. S2E). At 48 h APF however, Delta knock down CB NBs were still dividing based on PH3 expression (Fig. S2F). To determine whether CB NBs ectopically persist due to defects in cell cycle rate, we co-expressed dp110 to constitutively activate PI3-kinase in Delta knock down animals. A significant number of pcnaGFP expressing, Dpn positive CB NBs were still observed, suggesting that defects in cell cycle timing and growth rates alone cannot account for ectopic persistence of CB NBs into later developmental stages and adulthood (Fig. S2G).”

      3) Cas is expressed in NBs during quiescence and shortly after quiescence. It is possible that the maintenance of Cas in Figure 5D, E is due to NBs that have not re-entered the cell cycle or have exited quiescence with a strong delay.

      Knockdown of Notch pathway has no effect on CB NB reactivation from developmental quiescence. In fact, low levels of Notch are required for CB NBs to reactivate in response to dietary nutrients (Sood, 2022).

      Indeed, the authors have previously shown that Notch signaling is important for NB cell cycle reentry during early larval stages (PMID: 35112131). Are Cas and Svp also maintained in late larval N-/MARCM clones (MARCM clones are made after quiescence exit)?

      We have not assayed Cas or Svp expression past 48 hours ALH.

      4) The authors have revisited some previously published RNA-seq data showing that Delta is temporally regulated in NB lineages. However, they do not clearly show that the same is true at the protein level.

      Moreover, they find that mis-expression of late temporal factors or Imp knockdown in early larval brains appear to decrease Delta expression. Such semi-quantitative analysis of gene expression by immunostainings in different conditions can be a bit complicated and not very convincing because variations on intensity levels can be due to slight variations in antibody concentration, or different parameters of image acquisition.

      We totally agree, but in this case the difference compared to controls was so readily apparent, that we felt it was not necessary to carry out experiments in clones. All images were acquired with the same confocal settings, experiments were repeated, and we consistently observed the same results. The data shown in Fig. 7D-G is representative.

      I suggest that the authors use clonal analysis rather than pan-neuroblast manipulation in order to have internal controls. For example, blocking temporal progression in Syp-RNAi clones (MARCM or Flp-out) and/or svp MARCM clones should lead to maintenance of Imp expression in late larval clones and maintenance of high levels of Delta, which would be easily assessed compared to surrounding NBs.

      Minor points:

      Fig 5: the sequential expression of Cas and Svp expression in larval NBs was first described by Maurange et al. 2008. Please cite appropriately.

      We have now added the requested citation to the following:

      “Over time, the percentage of Cas expressing CB NBs declined, while Svp expressing CB NBs modestly increased (Fig. 5B). Less than 1% of CB NBs co-expressed Cas and Svp at any stage and expression of both factors was absent by 48 hours ALH (Fig. 5B,C). This is consistent with work published previously (Isshiki et al., 2001; Tsuji et al., 2008; Chai et al., 2013; Maurange et al., 2008; Ren et al., 2017; Syed et al., 2017).”

      Fig 6A: Please indicate which immunostainings are shown in the overlay panels.

      Good catch! We have modified the figure.

      P9: "Delta co-immunoprecipitated with Imp.": Add "Delta mRNA co-immunoprecipitated with Imp in RIP-seq experiments" Otherwise, it suggests that you are talking about the protein.

      Done

      The scheme in Figure 7H is rather complicated to understand. In my opinion, it does not clearly convey the idea that Notch signaling favors the Imp-to-Syp transition.

      We have made a new model figure.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Several concerns are raised from the current study.

      1) Previous studies showed that iTregs generated in vitro from culturing naïve T cells with TGF-b are intrinsically unstable and prone to losing Foxp3 expression due to lack of DNA demethylation in the enhancer region of the Foxp3 locus (Polansky JK et al, Eur J Immunol., 2008, PMID: 18493985). It is known that removing TGF-b from the culture media leads to rapid loss of Foxp3 expression. In the current study, TGF-b was not added to the media during iTreg restimulation, therefore, the primary cause for iTreg instability should be the lack of the positive signal provided by TGF-b. NFAT signal is secondary at best in this culturing condition.

      During restimulation, the absence of TGFb is necessary to induce iTreg instability; otherwise, the setup resembles the iTreg-inducing environment (Author response image 1). Moreover, the ultimate goal of this study is to model a scenario that bears some resemblance to clinical treatment, where TGFb may not be available. The reviewer is correct that TGFb is essential for iTreg stability; here we study the role played by NFAT in iTreg instability in vitro, and possibly in the potential clinical use of iTregs.

      Author response image 1.

      Restimulation with TGFb maintains the iTreg-inducing environment, resulting in less pronounced instability. Sorted Foxp3-GFP+ iTregs were rested for 1 d, and then rested or restimulated in the presence of TGF-β for 2 d. Percentages of Foxp3+ cells were analyzed by intracellular staining of Foxp3 after 2 d.

      2) It is not clear whether the NFAT pathway is unique in accelerating the loss of Foxp3 expression upon iTreg restimulation. It is also possible that enhancing T cell activation in general could promote iTreg instability. The authors could explore blocking T cell activation by inhibiting other critical pathways, such as NF-kb and c-Jun/c-Fos, to see if a similar effect could be achieved compared to CsA treatment.

      We thank the reviewer for this suggestion. We performed this experiment to assess the extent of the role that NFAT plays and whether other major pathways are involved. As Author response image 2 shows, inhibiting NFAT alone effectively rescued the instability of iTreg. Inhibition of NFkB (BAY 11-7082), c-Jun (SP600125), or the c-Jun/c-Fos complex (T5224) had no discernible effect or, in one case, possibly reduced stability further. These results may indicate that NFAT plays a crucial and special role in TCR activation, which leads to iTreg instability. Other pathways, at least within the design of this experiment, do not appear to be significantly involved.

      Author response image 2.

      Comparing effects of NFAT, NF-kB and c-Jun/c-Fos inhibitors on iTreg instability. Sorted Foxp3-GFP+ iTregs were rested for 1d, then restimulated by anti-CD3 and CD28 in the presence of listed inhibitors. Percentages of Foxp3+ cells were analyzed by intracellular staining after 2d restimulation.

      3) The authors linked chromatin accessibility and increased expression of T helper cell genes to the loss of Foxp3 expression and iTreg instability. However, it is not clear how the former can lead to the latter. It is also not clear whether NFAT binds directly to the Foxp3 locus in the restimulated iTregs and inhibits Foxp3 expression.

      T helper gene activation is likely to cause instability in iTregs through increased secretion of inflammatory cytokines, for example IL-21, as shown in Figure Q9. Further investigation is needed to understand exactly how these genes contribute to Foxp3 instability. With our limited insight, we see two possibilities. 1. IL-21 directly affects Foxp3 through its impact on certain inflammation-related transcription factors (TFs). 2. There could be an indirect, competitive relationship in which, when iTreg instability appears, NFAT has a greater tendency to bind to those inflammatory TFs, promoting the upregulation of these Th genes as in activated T cells, while being less likely to bind to SMAD and Foxp3. At the moment we cannot resolve the intricacies that lead to the differential effects on T helper genes and Treg-related genes.

      With that said, we have previously attempted to explore the direct effect of NFAT on Foxp3 gene locus. Foxp3 transcription in iTregs primarily relies on histone modifications such as H3K4me3 (Tone et al., 2008; Lu et al., 2011) rather than DNA demethylation (Ohkura et al., 2012; Hilbrands et al., 2016). Previous studies have reported that NFAT and SMAD3 can together promote the histone acetylation of Foxp3 genes (Tone et al., 2008). In our previous set of experiments, we simultaneously obtained information of NFAT binding sites and H3K4me3. In Foxp3 locus, we observed a decreasing trend in NFAT binding to the CNS3 region of Foxp3 in restimulated iTregs compared to resting iTregs (Author response image 3). Additionally, the H3K4me3 modification in the CNS3 region of Foxp3 decreased upon iTreg restimulation, but inhibiting NFAT nuclear translocation with CsA could maintain this modification at its original level (Author response image 3).

      Author response image 3.

      The NFAT binding and histone modification on Foxp3 gene locus. Genome track visualization of NFAT binding profiles and H3K4me3 profiles in Foxp3 CNS3 locus in two batches of dataset.

      These preliminary explorations suggest that NFAT can directly bind to the Foxp3 locus, and that NFAT binding appears to decrease upon restimulation, accompanied by a decrease in H3K4me3, linking NFAT closely to Foxp3 instability. However, due to limited sample replicates, these data need to be verified before more solid conclusions can be drawn. We speculate that during the induction of iTregs, NFAT may recruit histone-modifying enzymes to open the Foxp3 CNS3 region, and that this effect is synergistic with SMAD. When instability occurs upon restimulation, NFAT binding to Foxp3 weakens due to the absence of SMAD's assistance, subsequently reducing the recruitment of histone-modifying enzymes and ultimately inhibiting Foxp3 transcription.

      Reviewer #2 (Public Review):

      (1) Some concerns about data processing and statistical analysis.

      The authors did not provide sufficient information on statistical data analysis; e.g. lack of detailed descriptions about

      -the precise numbers of technical/biological replicates of each experiment

      -the method of how the authors analyze data of multiple comparisons... Student t-test alone is generally insufficient to compare multiple groups; e.g. figure 1.

      These inappropriate data handlings are ruining the evidence level of the precious findings.

      We thank the reviewer for pointing out this important aspect. In the figure legend, numbers of independently-performed experiment repeats are shown as N, biological replicates of each experiment as n. Student’s t test was used for comparing statistical significance between two groups. In this manuscript, all calculations of significant differences were based on comparisons between two groups. There were no multiple conditions compared simultaneously within a single group, and thus, no other calculation methods were used.

      (2) Untransparent data production; e.g. the method of Motif enrichment analysis was not provided. Thus, we should wait for the author's correction to fully evaluate the significance and reliability of the present study.

      Per this reviewer’s request, we have provided detailed descriptions of the data analysis for Fig 5, including both the method section and the Figure legend, as presented below:

      “The peaks annotations were performed with the “annotatePeak” function in the R package ChIPseeker (Yu et al, 2015).

      The plot of Cut&Tag signals over a set of genomic regions were calculated by using “computeMatrix” function in deepTools and plotted by using “plotHeatmap” and “plotProfile” functions in deepTools. The motif enrichment analysis was performed by using the "findMotifsGenome.pl" command in HOMER with default parameters.

      The motif occurrences in each peak were identified by using FIMO (MEME suite v5.0.4) with the following settings: a first-order Markov background model, a P value cutoff of 10⁻⁴, and PWMs from the mouse HOCOMOCO motif database (v11).”

      Additionally, we have also supplemented the method section with further details on the analysis of RNA-seq and ATAC-seq data.
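
      As a rough illustration of how the deepTools, HOMER, and FIMO steps quoted above could be chained, the following sketch only assembles the command lines and does not run them. All file names, the `mm10` genome label, the ±3 kb window around peak centers, and the choice to estimate the first-order background from the peak sequences with `fasta-get-markov` are assumptions made for illustration; they are not stated in the response. The ChIPseeker `annotatePeak` step is an R call and is omitted here.

```python
import shlex


def build_commands(bigwig, peaks_bed, peak_fasta, motifs_meme,
                   genome="mm10", out="out"):
    """Return the analysis commands as argument lists; nothing is executed.

    All inputs are hypothetical placeholders, not files from the study.
    """
    matrix = f"{out}/matrix.gz"
    background = f"{out}/bg.txt"
    return [
        # deepTools: signal matrix centered on each peak (window size assumed)
        ["computeMatrix", "reference-point", "--referencePoint", "center",
         "-S", bigwig, "-R", peaks_bed, "-b", "3000", "-a", "3000",
         "-o", matrix],
        # deepTools: heatmap and average profile from the same matrix
        ["plotHeatmap", "-m", matrix, "-o", f"{out}/heatmap.png"],
        ["plotProfile", "-m", matrix, "-o", f"{out}/profile.png"],
        # HOMER motif enrichment with default parameters
        ["findMotifsGenome.pl", peaks_bed, genome, f"{out}/homer"],
        # MEME suite: first-order Markov background (assumed source: peaks)
        ["fasta-get-markov", "-m", "1", peak_fasta, background],
        # FIMO: motif occurrences at a P value cutoff of 1e-4
        ["fimo", "--thresh", "1e-4", "--bgfile", background,
         "--oc", f"{out}/fimo", motifs_meme, peak_fasta],
    ]


if __name__ == "__main__":
    for cmd in build_commands("nfat.bw", "peaks.bed", "peaks.fa",
                              "hocomoco_v11_mouse.meme"):
        print(shlex.join(cmd))
```

      Keeping the commands as argument lists makes it easy to log the exact parameters alongside the results, which speaks to the transparency concern raised by the reviewer.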

      (3) Lack of evidence in human cells. I wonder whether human PBMC-derived iTreg cells are similarly regulated.

      This is a rather complicated issue. Human T cells express FoxP3 upon TCR stimulation (PNAS, 103(17): 6659–6664); its function there is likely to protect T cells from activation-induced cell death, and it does not confer Treg-like properties. In contrast, in mice FoxP3 can be used as an indicator of Treg. Because FoxP3 is currently not a definitive marker for Treg in humans, our FoxP3-based readouts do not directly apply. Nevertheless, we have now investigated whether inhibiting calcium signaling or NFAT could enhance the stability of human iTreg. As shown in Author response image 4, we found that the proportion of Foxp3-expressing cells did not show significant changes across the different conditions, while the MFI analysis revealed that CsA-treated iTreg exhibited higher Foxp3 expression levels compared to both restimulated iTreg and rested iTreg. However, CM4620 had no significant effect on Foxp3 stability, consistent with the observation of its limited efficacy in suppressing long-term activation of human iTreg. In summary, our results suggest that inhibiting NFAT signaling through CsA treatment can help maintain higher levels of Foxp3 expression in human iTreg.

      Author response image 4.

      Effect of inhibiting NFAT and calcium on human iTreg stability. Human naïve CD4 cells from PBMC were subjected to a two-week induction process to generate human iTreg. Subsequently, human iTreg were restimulated for 2 days with dynabeads followed by 2 days of rest in the presence of CsA and CM-4620. Four days later, percentages of Foxp3+ cells and Foxp3 mean fluorescence intensity (MFI) were analyzed by intracellular staining.

      (4) NFAT regulation did not explain all of the differences between iTregs and nTregs, as the authors mentioned as a limitation. Also, it is still an open question whether NFAT can directly modulate the chromatin configuration on the effector-type gene loci, or whether NFAT exploits pre-existing open chromatin due to the incomplete conversion of Treg-type chromatin landscape in iTreg cells. The authors did not fully demonstrate that the distinct pattern of chromatin regional accessibility found in iTreg cells is the direct cause of an effector-type gene expression.

      To our surprise, the inhibition of NFkB (BAY 11-7082), c-Jun (SP600125), and the c-Jun/c-Fos complex (T5224) resulted in minimal alterations, as shown in Fig Q1. This seems to argue that NFAT may play a more special role in events leading iTreg instability.

      We hypothesize that NFAT takes advantage of a pre-existing open chromatin state due to the incomplete conversion of the chromatin landscape in iTreg cells, because iTreg cells, after induction, already exhibit inherent chromatin instability, with highly open inflammatory genes. Furthermore, when iTreg cells were restimulated, the subsequent change in chromatin accessibility was relatively limited and not rescued by NFAT inhibitor treatment (Author response image 5). Therefore, in the case of iTreg cells, we propose that NFAT exploits the easy access of those inflammatory genes, leading to rapid destabilization of iTreg cells in the short term.

      In contrast, tTreg cells possess a relatively stable chromatin structure to begin with; it would therefore be interesting to investigate whether NFAT or calcium signaling could disrupt chromatin accessibility during the activation or expansion of tTreg cells. It is possible that NFAT might cause the loss of the originally established demethylation map and open up inflammatory loci, thereby inducing a shift in gene transcriptional profiles, equally leading to instability.

      Author response image 5.

      Chromatin accessibility of Rest, Restimulated, CsA/ORAIinh treated restimulated iTreg. PCA visualization of chromatin accessibility profiles of different cell types. Color indicates cell type.

      To establish a direct relationship between gene locus accessibility and its overexpression, a controlled experimental approach can be employed. One such method involves precise manipulation of the accessibility of a specific genomic locus using CRISPR-mediated epigenetic modifications at targeted loci. Subsequently, the impact of this manipulation on the expression level of the target gene can be precisely examined. By conducting these experiments, it will be possible to determine whether the augmented gene accessibility directly causes the observed gene overexpression.

      Reviewer #1 (Recommendations For The Authors):

      1) It might be helpful to add TGF-b to the iTreg restimulation culture to remove the influence of the lack of TGF-b from the equation, and measure the influence of SOCE/NFAT on iTreg instability.

      Please refer to Author response image 1.

      2) Alternatively, authors can also culture iTreg cells with TGF-b for 2 weeks when they undergo epigenetic changes and become more stabilized (Polansky JK et al, Eur J Immunol., 2008, PMID: 18493985). At this point, the stabilized iTregs can be used to measure the influence of SOCE/NFAT on iTreg instability.

      In the study conducted by Polansky, it was observed in Figure 1 that prolonged exposure to TGF-β fails to induce stable Foxp3 expression and demethylation of the Treg-specific demethylated region (TSDR). Based on this finding, we could consider exploring alternative approaches to obtain a more stabilized iTreg population. One such approach could be isolating Foxp3+helios-Nrp1- iTreg cells directly from the peripheral in vivo, which are also known as pTregs. Generally, pTreg cells generated in vivo tend to be more stable compared to iTreg cells induced in vitro, and they already exhibit partial demethylation of the Treg signature, as shown in Fig 6C (Polansky JK et al, Eur J Immunol., 2008, PMID: 18493985). Investigating the role of NFAT and calcium signaling in pTreg cells would provide further insights into the additional roles of NFAT in Treg phenotypical transitions, particularly its role in chromatin accessibility.

      3) In Figure 3, NFAT binding to the inflammatory genes in iTreg cells was even stronger than in activated T conventional cells. This is possibly due to Tconv cells being stimulated only once while iTregs were restimulated. A fair comparison should be conducted with restimulated activated conventional T cells.

      Figure 3 demonstrates the accessibility of inflammatory gene loci, rather than NFAT binding. Comparing restimulated Tconvs with restimulated iTreg cells is indeed a valuable suggestion, as their activation state and polarization in iTreg directions could lead to distinct chromatin accessibility. Although one is activated long term regularly and the other is activated long term under iTreg polarization, it is highly likely that the chromatin state of both activated Tconvs and iTreg cells is highly open, especially in terms of the accessibility of inflammatory genes. This may provide us with a new perspective to understand iTreg cells, but will unlikely affect our central conclusion.

      4) In the in vivo experiment in Figure 6, a control condition without OVA immunization should be included as a baseline.

      We have performed this experiment in the absence of OVA, as depicted in Author response image 6. In the absence of OVA immunization, both WT-ORAI and DN-ORAI iTreg exhibited substantial stability, although DN-ORAI demonstrated a slightly less stable trend. Upon activation with 40 µg and 100 µg of OVA, DN-ORAI iTreg demonstrated greater stability than WT-ORAI iTreg, maintaining a higher proportion of Foxp3 expression.

      Author response image 6.

      Stability of DN-ORAI iTreg in vivo with or without OVA immunization. WT-ORAI/DN-ORAI-GFP+-transfected CD45.2+ Foxp3-RFP+ OT-II iTregs were transferred i.v. into CD45.1 mice. Recipients were left or immunized with OVA323-339 in Alum adjuvant. On day 5, mLN were harvested and analyzed for Foxp3 expression by intracellular staining.

      Reviewer #2 (Recommendations For The Authors):

      Major

      Some concerns about the data processing and statistical analysis, as mentioned in the public review. In the figure legend, what does it mean e.g. n=3, N=3? Technical triplicate experiments? Three mice? Three independently performed experiments? The authors should define it at least in the "Statistical analysis" in the method section otherwise the readers cannot determine the reason why they mainly use SEM for the data description.

      Moreover, in some cases, the number of experiments was not sure; e.g., Fig.1B, Fig. 5.

      How did the authors analyze data including multiple comparisons? Student t-test alone is generally insufficient to compare multiple groups; e.g. figure 1.

      We thank the reviewer for pointing out this omission. Now, in the figure legend, numbers of independently-performed experiment repeats are shown as N, biological replicates of each experiment as n. For Fig. 1B, N=2, and for Fig 5, we acquired NFAT Cut&Tag data twice (N=2). Student’s t test was used for comparing statistical significance between two groups. In this manuscript, all calculations of significant differences were based on comparisons between two groups. There were no multiple conditions compared simultaneously within a single group, and thus, no other calculation methods were involved apart from the Student's t-test.
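
      For readers who want to see what the two quantities discussed here amount to, the sketch below computes the SEM of one group and the pooled-variance (Student's) t statistic for a two-group comparison using only the Python standard library. The numbers are made-up percentages chosen for easy arithmetic, not data from the study, and the function names are ours.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    # Student's test assumes equal variances, so the two are pooled.
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    se_diff = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se_diff, na + nb - 2


# Hypothetical % Foxp3+ values (n = 3 per group), for illustration only.
rested = [80.0, 82.0, 84.0]
restimulated = [50.0, 54.0, 58.0]

t_stat, dof = student_t(rested, restimulated)
print(f"SEM rested = {sem(rested):.3f}, t = {t_stat:.3f}, df = {dof}")
```

      Turning the t statistic into a P value requires the t distribution, which the standard library does not provide; `scipy.stats.ttest_ind`, with its default equal-variance setting, performs the same test and reports the P value directly.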

      In Figure 1A, the difference in suppressiveness seemed subtle. Data collection of multiple doses of Tconv:Treg ratio will enhance the reliability of such kind of analysis.

      We have now attempted the suppression assay with varying Treg:Tconv ratios and observed that the suppressive effect of iTreg was more obvious than that of tTreg when co-cultured at a 1:1 ratio with Tconv cells. However, as the cell number of tTreg and iTreg decreased, the inhibitory effects converged.

      Author response image 7.

      Comparison of multiple Tconv:Treg ratios in the suppression assay. CFSE-labelled OT-II T cells were stimulated with OVA-pulsed DC, then different numbers of Foxp3-GFP+ iTregs and tTregs were added to the culture to suppress OT-II proliferation. After 4 days, CFSE dilution was analyzed. Left, representative histograms of CFSE in divided Tconvs. Right, graph of the percentage of divided Tconvs.

      In Figure 3F, to which group did the shaded peaks belong? In this context, the authors should focus on "Activation Region" peaks (open chromatin signature in both TcAct & iTreg defined in Fig. 4D) but I did not find the peak in the focusing DNA regions in TcAct (e.g. the shaded regions in IL-4 loci). The clear attribution of the peaks to the heatmap will enhance the visibility and understanding of readers.

      We have selected some typical peaks that belong to Fig 3D. These genes encompass some T-cell activation-associated transcription factors, such as Irf4, Atf3, as well as multiple members of the Tnf family including Lta, Tnfsf4, Tnfsf8, and Tnfsf14. Additionally, genes related to inflammation such as Il12rb2, Il9, and Gzmc are included. These genes show elevated accessibility upon T-cell activation, partially open in activated nTreg cells, referred to as the "Activation Region." They collectively exhibit high accessibility in iTreg cells, which may contribute to their instability.

      Author response image 8.

      Chromatin accessibility of some “Activation Region”. Genomic track showing chromatin accessibility of Irf4, Atf3, Lta, Tnfsf8, Tnfsf4, Tnfsf14, Il12rb2, Il9, Gzmc in activated Tconv and iTreg.

      In Figure 4A/S4A, the information on cell death will help the understanding of readers because the sustained SOCE is associated with cell survival as shown in Fig. S2. The authors can discuss the relationships between cell death and Foxp3 retention, which potentially leads to a further interesting question; e.g. the selective/resistance to activation-induced cell death as the identity of Treg cells.

      As shown in Author response image 9, activated iTreg cells indeed exhibit a certain degree of cell death compared to resting iTreg cells. The inhibition of NFAT by CsA enhances the survival rate of iTreg cells, but the inhibition of ORAI by CM-4620 leads to more severe cell death. The cell death induced by CsA and CM-4620 is not consistent, indicating that there may not be a direct proportional relationship between cell death and the expression of Foxp3 and Treg identity.

      Author response image 9.

      Relationship of cell death and Foxp3 stability in restimulated iTregs. Sorted Foxp3-GFP+ iTregs were rested for 1 d, then restimulated by anti-CD3 and CD28 in the presence of CsA or CM-4620. After 2 d restimulation, live cell percentage was analyzed by staining of Live/Dead fixable Aqua, and percentages of Foxp3+ cells were analyzed by intracellular staining of Foxp3. Upper, live cell percentage of iTregs. Lower, percentages of Foxp3 in iTregs.

      In Figure 5, the information for the data interpretation was insufficient.

      We have provided detailed descriptions of the data analysis for Fig 5, including both the method section and the Figure legend, as presented below:

      “The peaks annotations were performed with the “annotatePeak” function in the R package ChIPseeker (Yu et al, 2015). The plot of Cut&Tag signals over a set of genomic regions were calculated by using “computeMatrix” function in deepTools and plotted by using “plotHeatmap” and “plotProfile” functions in deepTools. The motif enrichment analysis was performed by using the "findMotifsGenome.pl" command in HOMER with default parameters. The motif occurrences in each peak were identified by using FIMO (MEME suite v5.0.4) with the following settings: a first-order Markov background model, a P value cutoff of 10⁻⁴, and PWMs from the mouse HOCOMOCO motif database (v11).”

      Additionally, we have also supplemented the method section with further details on the analysis of RNA-seq and ATAC-seq data.

      The correlation between the open chromatin status of the gene loci described in Fig.5E and the expression at mRNA level? e.g.; Do iTreg-Act cells produce a higher level of IL-21 than nTreg-act? The analysis in Fig.5F-G should be performed in parallel with nTreg cells to emphasize the distinct NFAT-chromatin regulation in iTreg cells.

      We have now compared, by ELISA, the secretion levels of IL-21 in tTreg and iTreg upon activation and CsA treatment. As shown in Author response image 10, tTreg did not secrete IL-21 regardless of activation status (undetectable), while iTreg did not secrete IL-21 at resting state but exhibited IL-21 secretion after 48 h of activation. Moreover, the secretion of IL-21 was inhibited by CsA and CM-4620 treatment. This observation aligns with our earlier findings, in which we observed nuclear binding of NFAT to the gene loci of these cytokines, enhancing their expression and rendering iTreg unstable under inflammatory conditions. These findings further underscore the likelihood that inhibiting calcium and NFAT signaling contributes to the stabilization of iTreg by suppressing the secretion of inflammatory cytokines.

      Author response image 10.

      IL-21 secretion in tTreg and iTreg upon activation. iTregs and tTregs were sorted and restimulated with anti-CD3 and anti-CD28 antibodies, in the presence of CsA and CM-4620. Cell culture supernatants were harvested after 2 d restimulation and IL-21 secretion was analyzed by ELISA.

      Performing a parallel comparison of NFAT activity between tTreg and iTreg cells was initially part of our experimental plan. However, it proved challenging in practice, as we encountered difficulties in efficiently infecting tTreg cells with NFAT-flag. Consequently, we could not obtain a sufficient number of tTreg cells for conducting Cut&Tag experiments.

      Based on our observations, we speculate that there might be substantial differences in the accessibility of genes in tTreg cells, leading to considerable variations in the repertoire of genes available for NFAT to regulate. As a result, we expect significant differences in the nuclear localization and activity of NFAT between iTreg and tTreg cells.

      In Figure 6C, what does the FCM plot between Foxp3-CFSE look like?

      The authors can discuss the mechanism of ORAI-DN-mediated through such analysis; e.g. the possibility that selective proliferation defect by ORAI-DN in Foxp3- cells led to an increased percentage of Foxp3, not only just unstable transcription of Foxp3.

      This is an in vitro experiment to assess the suppressive effect of iTreg on Tconv proliferation. CFSE was therefore used to stain Tconv cells, not iTreg cells, so we did not measure the proliferation of iTreg.

      Minor

      Confusing terminology of "tTreg" at line 47, etc. "natural Treg" contains both thymic-derived Treg and periphery-derived Treg cells. (A Abbas et al. Nat Immunol. 2013)

      We have now changed the designation to tTreg at line 47. tTreg refers to thymus-derived regulatory T cells, while nTreg includes both tTreg and pTreg. However, it is important to note that the Treg cells used in our study were isolated from the spleen of 2-4-month-old Foxp3-GFP or Foxp3-RFP mice. The CD4+ T cells were first enriched using the CD4 Isolation kit, and the FACSAriaII was utilized to collect CD4+ Foxp3-GFP/RFP+ Treg cells. Subsequently, Helios and Nrp-1 staining revealed that the majority of these cells were nTreg, with only approximately 6% being pTreg. Overall, we consider the cells we used as tTreg.

      In all FCM analyses, the authors should clarify how to detect Foxp3 expression; Foxp3-GFP/Foxp3-RFP/Intracellular staining like Figure S5A (but not specified in the other FCM plots)

      Foxp3 expression throughout the article was assessed using intracellular staining, as described in the Methods section, and we have added specific descriptions to each figure legend. We used intracellular staining because, in Foxp3-IRES-GFP mice, GFP and Foxp3 are not fused into a single protein but are expressed as separate proteins. During induction, the appearance of GFP can therefore indicate the presence of Foxp3. However, when Foxp3 is unstable, the degradation of GFP may not be synchronized with that of Foxp3, making GFP an unreliable indicator of Foxp3 expression levels. As a result, we used Foxp3-GFP/RFP fluorescence to purify iTreg cells and intranuclear staining of Foxp3 to assess instability.

      In Figure 6B, the captions were lacking in the two graphs on the right side

      The two restimulation conditions, 0.125+0.25 and 0.25+0.5, have been added to the right side of Fig 6B.

      In Figure S2, the annotation of the x-y axis was missing.

      Added.

      Lack of reference at line 292.

      References 42-46 were added.

      In the method section, the authors should note the further product information of antibodies and reagents to enhance reproducibility and transparency. Making a list that clarifies the suppliers, Ab clone, product IDs, etc. is encouraged. The authors did not specify the supplier of recombinant proteins and which type of TGF-beta (TGF-beta 1, 2, or 3?).

      A detailed description of the mice, antibodies, peptides and recombinant proteins, commercial kits, and software has been provided and incorporated into the Methods section.

      In the method section, the authors should clarify which Foxp3-reporter strain. There are many strains of Foxp3-reporter mice in the world. In line 373, is the "FoxP3-IRES-GFP transgenic mice" true? Knock-in strain or BAC-transgene?

      This mouse strain was a gift from the Hai Qi Lab at Tsinghua University. They acquired the strain from Jackson Laboratory; the strain name is B6.Cg-Foxp3tm2Tch/J, Strain #:006772. An IRES-EGFP-SV40 poly A sequence was inserted immediately downstream of the endogenous Foxp3 translational stop codon, but upstream of the endogenous polyA signal, generating a bicistronic locus encoding both Foxp3 and EGFP.

      The age of mice used in the experiments should be specified, and confusing words such as "young" should not be used in any method descriptions; e.g. line 405.

      The detailed mouse age has been added in the methods section. “To prepare Tconv, tTreg and iTreg for experiments, spleen was isolated from 2-4-month-old Foxp3-GFP mice for Tconv and tTreg sorting, and 6-week-old mice for iTreg induction.”

      The method of how the original ATAC-seq/Cut & Tag data were generated was not described in the method section.

      Added to the Methods section.

      The reference section was incomplete, and the style was not unified. e.g.; ref 7, 24, 25, 26 ... I gave up checking all.

      The style of refs 7, 22, 24, 26, 28, 31, 33, and 35 was modified.

      Changes in manuscript:

      Author Name: “Huiyun Lv” to “Huiyun Lyu”.

      Fig 1A was updated according to Reviewer 2’s suggestion.

      Fig S3E and the associated description were added according to Reviewer 2’s suggestion.

      Fig S4C and the associated description were added according to Reviewer 1’s suggestion.

      Fig 5H and the associated description were added according to Reviewer 2’s suggestion.

      Fig 6D was updated according to Reviewer 1’s suggestion.

      Fig 2D was corrected: the labels for gapdh and actin in the iTreg panel had been inadvertently switched. The mistake has been rectified, and the original gel image will be provided.

      Fig 2A and Fig 4A were updated.

      The style of Fig 6B and Fig S2A was modified.

      Method:

      Mice: FoxP3-IRES-GFP with more description.

      Flow Cytometry sorting and FACS: the detailed mouse age has been added. RNA-seq analysis, ATAC-sequencing, ATAC-seq analysis, Cut&Tag assay, Cut&Tag data analysis: more description was added.

      Statistical analysis: “Numbers of independently-performed experiment repeats are shown as N, biological replicates of each experiment as n.” was added.

      Reference: Refs 42-46 and 49-52 were added. The style of refs 7, 22, 24, 26, 28, 31, 33, and 35 was corrected.

      A detailed description of the mice, antibodies, peptides and recombinant proteins, commercial kits, and software has been provided.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The precise mechanism of how tetraspanin proteins engage in the generation of discs is still an open question in the field of photoreceptor biology. This question is of significance as the lack of photoreceptor discs or defects in disc morphogenesis due to mutations in tetraspanin proteins is a known cause of vision loss in humans. The authors of this study combine TEM and mouse models to tease out the role of tetraspanin proteins, peripherin, and Rom1 in the genesis of the photoreceptor discs. They show that the absence of Rom1 leads to an increase in peripherin and changes in disc morphology. Further rise in peripherin alleviates some of the defects observed in Rom1 knockout animals leading to the conclusion that peripherin can substitute for the absence of Rom1.

      Strengths:

      A mouse model of Rom1 generated by the McInnes group in 2000 predicted a role for Rom1 in rim closure. They also showed enlarged discs in the absence of Rom1. This study confirmed this finding and showed the compensatory changes in peripherin, maintaining the total levels of tetraspanin proteins. Lack of Rom1 leads to excessive open disks demonstrated by darkly stained tannic acid-accessible areas in TEM. Interestingly, increased peripherin expression can rescue some morphological defects, including maintaining normal disc diameters and incisures. Overall, these observations lead authors to propose a model that ROM1 can be replaced by peripherin.

      Thank you for your kind summary of our work.

      Weaknesses:

      The compensatory increase in peripherin and morphological rescue in the absence of ROM1 is expected, given the previous work from authors showing i) absence of peripherin showing increased ROM1 and ii) "Eliminating Rom1 also increased levels of Prph2/RRCT: mean Prph2/RRCT levels in P30 Prph2+/R retinas were 34% of WT, while levels in Prph2+/R/Rom1−/− retinas were 59% of WT" from Conley, 2019. The current study provides a comprehensive quantitative analysis. However, the mechanism behind the mechanism is unclear and warrants discussion.

      We now reference the result from the 2019 paper by Conley and colleagues in the revision. As noted by the reviewer, the new information in the current study is the precise quantification of the compensatory increase by a technique more accurate than semi-quantitative Western blotting. The nature of these compensatory increases is currently unknown and beyond the scope of the experiments described in the current study. While this is an intriguing area for future investigation, we prefer not to speculate on the underlying mechanisms to avoid any appearance of data overinterpretation.

      Photoreceptor morphology appears better when peripherin is overexpressed. Is there a rescue of rod function (assessed by ERG or equivalent measures) in peripherin OE/Rom1-/- mice? Given the extensive work in this area and the implications the authors allude to at the end, it is important to investigate this aspect.

      Whether PRPH2 overexpression can rescue the long-term degeneration and functional defects caused by the loss of ROM1 is indeed an interesting and potentially translationally relevant question. Unfortunately, our work in this direction remains severely hindered by the fact that the current line of ROM1 knockout mice breeds notoriously poorly, yielding only a handful of animals per year of breeding. Therefore, we decided to limit our current study to addressing the structural roles of ROM1 and PRPH2 in supporting disc formation.

      Reviewer #1 (Recommendations For The Authors):

      Line 210: "ROM1 is able to form disc rims in the absence of PRPH2" is not demonstrated. The data shows that the tetraspanin domains are interchangeable similar to Conley, 2019. Similar concern for lines 225-226.

      We agree with the point regarding the interchangeable tetraspanin domains and clarified it in the text by referring to the tetraspanin body of PRPH2 where applicable. However, the 2019 paper by Conley and colleagues did not show any ultrastructural images of disc rims in a mouse without at least one copy of WT PRPH2 being expressed. The presence of normal-looking disc rims in the complete absence of the tetraspanin body of PRPH2 is an original observation of the present study.

      Line 234: it is unclear what is meant by .."they are normally processed in the biosynthetic membranes" How does lack of ER localization lead to this conclusion?

      We clarified this point by replacing “normally processed” with “not trapped”.

      Lines 306-308: it is difficult to follow the rationale. How will a shift in the trafficking pathway affect disulfide bonds since these are formed in ER?

      The reviewer makes a good point: at least the bulk of S-S bridge formation takes place during protein maturation in the ER, and whether additional intramolecular S-S bonds can form in the Golgi is questionable. We therefore removed this speculation from the Discussion.

      Given the poor development of OS, the authors could provide an estimate of how many OS-like structures were observed, with and without rims, in RRCT animals.

      The gross development of outer segment structures in RRCT homozygous mice was part of the 2019 paper by Conley and colleagues. Rather than repeating experiments from the previous study, we preferred to focus specifically on disc rim formation, which was not analyzed in RRCT homozygous mice in that study.

      The term "function" is loosely defined throughout this manuscript. Specifically, the excess peripherin can resolve some of the morphological defects observed in Rom1 -/-, and these functional changes in morphology are the focus of this work.

      We removed the word “function” on three occasions where its meaning may have been ambiguous, as noted by the reviewer.

      Lines 115/116: Reference is missing for the statement that photoreceptor cell degeneration begins at P30.

      These lines reference Figures 1A,B, which include quantification of the number of photoreceptor nuclei. These results show that ROM1 knockout retinas exhibit a modest but statistically significant degeneration at P30. The text is modified to eliminate any ambiguity.

      Lines 143-144 are speculation and could be moved to the discussion section. "Prolonged delivery of disc membrane delivery to each disc" Any reference or experiments to support this statement?

      We respectfully disagree with moving this short speculative sentence to the Discussion. We believe that it helps the reader to follow the flow of the data, while being clearly presented as a potential explanation rather than a conclusion.

      Line 245-246: Results explained in the following paragraph (247-254) do not answer the question "whether disc rim formation in PRPH2 2C150S/C150S knockin mice was driven by disulfide-linked ROM1 molecules", which is a valid and intriguing question. However, the results explained in 247-254 answer the question "if C150S PRPH2 can form discs in the absence of ROM1".

      We changed the text to replace “To address this question” with “To explore whether disc rims can be formed in the absence of any disulfide-linked tetraspanin molecules”, which precisely reflects what was addressed.

      Reviewer #2 (Public Review):

      In this study, Lewis et al seek to further define the role of ROM1. ROM1 is a tetraspanin protein that oligomerizes with another tetraspanin, PRPH2, to shape the rims of the membrane discs that comprise the light-sensitive outer segment of vertebrate photoreceptors. ROM1 knockout mice and several PRPH2 mutant mice are reexamined. The conclusion reached is that ROM1 is redundant to PRPH2 in regulating the size of newly forming discs, although excess PRPH2 is required to compensate for the loss of ROM1.

      This replicates earlier findings while adding rigor using a mass spectrometry-based approach to quantitate the ratio of ROM1 and PRPH2 to rhodopsin (the protein packed in the body of the disc membranes) and careful analysis of tannic acid labeled newly forming discs using transmission electron microscopy.

      In ROM1 knockout mice PRPH2 expression was found to be increased so that the level of PRPH2 in those mice matches the combined amount of PRPH2 and ROM1 in wildtype mice. Despite this, there are defects in disc formation that are resolved when the ROM1 knockout is crossed to a PRPH2 overexpressing line. A weakness of the study is that the molar ratios between ROM1, PRPH2 and rhodopsin were not measured in the PRPH2 overexpressing mice. This would have allowed the authors to be more precise in their conclusion that a 'sufficient' excess of PRPH2 can compensate for defects in ROM1.

      Thank you for these kind comments about our work. Regarding the stated weakness that we did not measure the molar ratios between PRPH2, ROM1 and rhodopsin in the ROM1 knockout line with PRPH2 overexpression: this is one experiment that we really hoped to do but were limited by the poor breeding of the ROM1 knockout line described above. With the current breeding rate, we estimate that we would need to wait for another year to get enough material to do this experiment, which we cannot do in the context of this manuscript revision. We hope, however, that eventually this may be a part of one of our future papers.

      Reviewer #2 (Recommendations For The Authors):

      The p-value for statistical significance is not listed, readers will assume the most commonly used 0.05 value was used but this should still be defined, especially since only asterisks summarizing the p-value range are provided in place of the actual p-values.

      The definitions of various numbers of asterisks of significance (including p<0.05 as a minimal measure of significance) are provided in the Methods section, whereas the exact p-values are stated in figure captions.

      There are 3 phrasing issues that are potentially misleading.

      1) While PRPH2 and ROM1 are the most abundant tetraspanins in photoreceptors they are not the only ones. It would be more precise if for example the Table 1 title was changed to 'molar ratio of outer segment tetraspanins and rhodopsin'.

      We have changed the title of Table 1 to “Quantification of molar ratios between PRPH2, ROM1 and rhodopsin in WT and Rom1-/- outer segments” to be more accurate.

      2) The protein expressed in RRCT mice is described as the 'tetraspanin core' while the cartoon (and original paper) shows the protein as simply being ROM1 with a different cytoplasmic C-terminus (from PRPH2). Tetraspanin core in other places is used to mean just the transmembrane bundle or that bundle with the EC loops.

      We agree that the term “tetraspanin core” may be confusing. We modified the text to not use this term and, when needed, refer to this main part of the tetraspanin molecule as a “body”.

      3) Line 203-205, the 'somewhat restored' qualifier should be removed. If the authors think there is an effect that is different from chance, they should use a different alpha and justify that choice.

      We removed this line, as suggested.

      Reviewer #3 (Public Review):

      In this manuscript, Lewis et al. investigate the role of tetraspanins in the formation of discs - the key structure of vertebrate photoreceptors essential for light reception. Two tetraspanin proteins play a role in this process: PRPH2 and ROM1. The critical contribution of PRPH2 has been well established and loss of its function is not tolerated and results in gross anatomical pathology and degeneration in both mice and humans. However, the role of ROM1 is much less understood and has been considered somewhat redundant. This paper provides a definitive answer about the long-standing uncertainty regarding the contribution of ROM1 firmly establishing its role in outer segment morphogenesis. First, using an ingenious quantitative proteomic technique the authors show PRPH2 compensatory increase in ROM1 knockout explaining the redundancy of its function. Second, they uncover that despite this compensation, ROM1 is still needed, and its loss delays disc enclosure and results in the failure to form incisures. Third, the authors used a transgenic mouse model and show that deficits seen in ROM1 KO could be completely compensated by the overexpression of PRPH2. Finally, they analyzed yet another mouse model based on double manipulation with both ROM1 loss and expression of PRPH2 mutant unable to form dimerizing disulfide bonds further arguing that PRPH2-ROM1 interactions are not required for disc enclosure. To top it off the authors complement their in vivo studies by a series of biochemical assays done upon reconstitution of tetraspanins in transfected cultured cells as well as fractionations of native retinas. This report is timely, addresses significant questions in cell biology of photoreceptors, and pushes the field forward in a classical area of photoreceptor biology and mechanics of membrane structure as well. The manuscript is executed at the top level of technical standard, exceptionally well written, and does not leave much more to desire. It also pushes standards of the field: one such domain is the quantitative approach to analysis of the EM images which is notoriously open to alternative interpretations - yet this study does an exceptional job unbiasing this approach.

      According to my expertise in photoreceptor biology, there is nothing wrong with this manuscript either technically or conceptually and I have no concerns to express.

      Thank you for these incredibly kind comments.

      Reviewer #3 (Recommendations For The Authors):

      I have no recommendations to make.

    1. Author Response

      We would like to thank you and the reviewers for evaluating this manuscript and providing constructive recommendations. Please see our provisional response to the major comments made by the reviewers.

      Reviewer #1 (Public Review):

      "…the authors never show that HFS of cortical inputs has no effect in the absence of thalamic stimulation. It appears that there is a citation showing this, but I think it would be important to show this in this study as well"

      We understand that the reviewer would like us to induce an HFS protocol on cortical input and then test if there is any change in synaptic strength in thalamic input. We agree this is an important experiment which we will do.

      Reviewer #2 (Public Review):

      “…The experimental schemes in Figs. 1 and 3 (and Fig. 4e and extended data 4a,b) show that one group of animals was subjected to retrieval in the test context at 24 h, then received HFS, which was then followed by a second retrieval session. With this design, it remains unclear what the HFS impacts when it is delivered between these two 24 h memory retrieval sessions."

      We understand that the reviewer has raised the concern that the increase in freezing we observed after the HFS protocol (e.g., Fig. 1b, the bar labeled as Wth+24hHFSth) could be caused or modulated by the recall prior to the HFS (Fig. 1a, top branch).

      If our interpretation of the concern is correct, we think this is unlikely to be the case. The first test, the following HFS protocol, and the second test (Fig. 1a, top branch) were all performed in the same chamber. For both the first and the second tests, animals received two 30-second recall trials, separated by 2 minutes (the data are presented as the average of the two trials). We did not see a difference in freezing between the first and the second recall trials within each session (data not shown). It was only after the HFS protocol that we observed an increase in freezing.

      This shows that in our paradigm the first recall does not impact the next recall in terms of the animals’ freezing levels. It must be noted that in cases where we did not do any testing prior to the HFS protocol, we still observed an increase in freezing after the HFS protocol (e.g., Fig. 1a, middle branch and the corresponding data in Fig. 1b, the bar labeled as Wth+HFSth). Also relevant are the data shown in Fig. 3c. Here, although animals were tested twice (Fig. 3a, top branch), there was no increase in freezing in the second test (Fig. 3c, middle panel, Wth+24HFSCtx). That is, in the absence of an effective LTP, there is no significant difference between the two tests.

      To further confirm this, in a new group of mice, 24 hours after weak conditioning, we will induce the HFS protocol, followed by testing (that is, no testing prior to the HFS protocol).

      “The final experiment (Fig. 5a-c, extended data 5c) combines behavioral assessments with in vivo LFP recordings before and 24 h after hetero-HFS. While this experiment is demanding, it seems a bit underpowered”

      We agree with the reviewer that the number of mice used in this experiment is on the lower side. However, this is not unusual for such an experimental configuration. As the reviewer mentioned, this is a demanding experiment for multiple reasons. For example, to confidently demonstrate that our HFS protocol produces long-lasting synaptic changes in addition to long-lasting behavioral changes, we must see a significant increase in evoked LFP after the manipulation, which is predicted to last at least 24 hours. We must also be confident that the change in evoked LFP is not caused by unrelated fluctuations, such as movement of the recording probe. For this reason, we measured evoked LFP each day for 3-4 days prior to conditioning. Only those mice that had a stable evoked LFP during this time were used for further conditioning. We will provide exclusion criteria for this experiment in the revised manuscript.

      “ It would be critical to know if LFPs change over 24 h in animals in which memory is not altered by HFS,..”

      We will perform an experiment where mice undergo a weak conditioning protocol and will record the evoked LFP 1-2 hours following the conditioning protocol, as well as the next day.

      “…the slice experiments (Fig. 5d-f) are not well aligned with the in vivo experiments (juvenile animals, electrical vs. opto stimulation, different HFS protocols, timescale of hours).”

      Our aim in this part was to demonstrate that the pathways we chose for our study can undergo heteroLTP. For this purpose, we used an already established protocol, which uses electrical stimulation (Fonseca, 2013). For clarification, I have tried to induce optical LTP with a high-frequency stimulation protocol in slices, but I did not succeed. I am not aware of any work that has successfully induced optical LTP with a high-frequency protocol.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank Reviewer 1 for their time reviewing our revised manuscript and appreciate their thoughtful suggestions for further clarity. In regard to the public review statement, "However, parts of the methods (e.g. assessment of blanks and data filtering) and results (e.g. visualization of plant community data) could still be polished, and the figures should be improved to increase the clarity of the manuscript", we have made small modifications in the text and figures during production of the Version of Record to address these important suggestions.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript compiles the colonization of shrubs during the Late Pleistocene in Northern America and Europe by comparing plant sedimentary ancient DNA (sedaDNA) records from different published lake sediment cores and also adds two new datasets from Iceland. The major findings of this work aim to illuminate the colonization patterns of woody shrubs (Salicaceae and Betulaceae) in these sediment archives to understand this process in the past and evaluate its importance under future deglaciation and warming of the Arctic.

      We greatly appreciate the time and detailed consideration of our manuscript by Reviewer 1. Our responses to individual comments are highlighted in blue, with the original comments provided by the reviewer in black.

      The strength of evidence is solid as methods (sedimentary DNA) and data analyses broadly support the claims because the authors use an established metabarcoding approach with PCR replicates (supporting the replicability of PCR and thereby proving the occurrence of Salicaceae and Betulaceae in the samples) and quantitative estimation of plant DNA with qPCR (which defines the number of cycles used for each PCR amplification to prevent overamplification). However, the extraction methods need more explanation and the bioinformatic pipeline is not well-known and needs also some further description in the main text (not only referring to other publications).

      Thank you for bringing this to our attention. We have now provided greater detail on our extraction methods and bioinformatic pipeline.

      The authors compare their own data with previously published data to indicate the different timing of shrubification in the selected sites and show that Salicaceae occurs always like a pioneer shrub after deglaciation, followed by Betulaceae with a various time lag. The successive colonization of Salicaceae followed by Betulaceae is explained by its differences in environmental tolerance, the time lag of colonization in the compared records is e.g. explained by varying distance to source areas.

      However, there are some weaknesses in the strength of evidence because full sedaDNA plant DNA assessment, quality of the sedaDNA data (relative abundance and richness of sedaDNA plant composition) and results from Blank controls (for sedaDNA) are not fully provided. I think it is important to show how the plant metabarcoding in general worked out, because it is known that e.g. poor richness can be indicative of less preserved DNA and a full plant assessment (shown in the supplement) would be more comprehensive and would likely attract a larger readership.

      Thank you for bringing these important points to our attention. The DNA dataset, including the full taxa assemblage, will be included with the manuscript upon publication, and we apologize for not including it during the review process. This dataset will also include information on positive and negative blanks used for quality control. Following suggestions from Reviewer 2, we have now also calculated some recently proposed DNA quality metrics (Rijal et al., 2021), which collectively support our earlier conclusions that our record is of sufficient quality to draw the current conclusions. We hope that the inclusion of the complete DNA dataset will indeed draw a larger readership.

      Further, it would allow us to see the relative abundance in changes of plants and would make it easier to understand if the families Salicaceae and Betulaceae are a major component of the community signal. Further, the possibility to reach higher taxonomic resolution with sedaDNA compared to pollen or to facilitate a continuous record (which is different from macrofossils) is not discussed in the manuscript but should be added. Also, the taxonomic resolution within these families in the discussed datasets would be of interest, also on the sequence type level if tax. assignments are similar.

      Thank you for these suggestions. We have focused on these two families as it is known from numerous pollen records and floras that they are the major component of the vascular plant communities in the regions investigated. Betula (birch) and Salix (willow) are indeed the most dominant woodland shrubs of the tundra biome, which covers expansive areas of the Arctic. For example, in Iceland natural woodlands, which cover 1.5% of the total land area, are composed of 80% birch shrubs (Snorrason et al. 2016, Náttúrufræðingurinn 86). Salix mixes in with Betula, especially around wet sites. Species from both genera are common and widespread throughout Iceland, but dwarf and cold-tolerant species thrive best in the highlands or at glacial sites, while shrub-like species are more common in the lowlands, coastal areas and sheltered valleys. Flora of Iceland (http://www.floraislands.is/PDF-skjol/Checklist-vascular.pdf) lists Betula as the only genus of Betulaceae native to Iceland (page 79/80) and Salix as the major genus of Salicaceae (page 82-85), although Populus tremula (Salicaceae) exists in the wild but is rare (perhaps just a countable number of trees/shrubs in the whole country). The point is that, for Iceland, Betulaceae is Betula and Salicaceae is Salix, meaning that our sedaDNA method has taxonomic resolution at the genus level. And with the help of pollen analysis of the site near Stóra Viðarvatn (the novel sedaDNA work of the present paper), i.e., the Ytri-Áland site (Karlsdóttir et al. 2014), it is possible to interpret our results even to the species level, which we have only mentioned in the discussion. It has been suggested that matching sedaDNA results with botanical knowledge about the study site and the vegetation history (local reference database) is one way to increase the taxonomic resolution of the sedaDNA approach (e.g. Elliott et al. 2023, Quaternary 6,7). In the same way, we find that our sedaDNA analysis has sufficient resolution to answer the questions asked in the present study. For the future, although we do not include it in the discussion this time, it should be possible to increase the taxonomic resolution of plant metabarcoding by priming multiple genes simultaneously, as described as a proof of concept by Foster et al. (2021, Front Ecol Evol 9: 735744). In the revised version of the manuscript, we have now expanded on the power of sedaDNA in terms of increased taxonomic resolution and application in continuous lake sediment records in the introduction of the manuscript. Following Reviewer 2’s suggestion, we have now included the sequences used for taxonomic assignment in the supplementary information.

      Another important aspect is how the abundance/occurrence of Salicaceae is discussed. Many studies on sedaDNA confirm an overrepresentation of this family due to better preservation in the sediment, far-distance transport along rivers, or preferences of primers during amplification etc. As this family is the major objective of this study, such discussion should be added to the manuscript and data should be presented accordingly.

      Thank you for raising this point. The reviewer is indeed correct that Salicaceae is typically overrepresented in read abundance compared to other vascular plant taxa in sedaDNA studies. However, as we mention in the Results and Interpretation section for Stóra Viðarvatn, “As PCR amplification results in sequence read abundances that may not reflect original relative abundances in a sample (Nichols et al., 2018), we focus our discussion on taxa presence/absence,” so we do not place weight on the greater relative abundance of Salicaceae in our own dataset. As such, this difference in the relative abundance of plant taxa reads should not influence the conclusions drawn in the manuscript.

      I also miss more clarity about how the authors defined the source areas (refugia) of the shrubs. If these source areas are described in other literature I suggest to show them in a map or so. Further, it should be also discussed and explained more in detail which specific environmental preferences these families have; this is too short in the introduction and too unspecific. Also, it would be beneficial to show relative abundances rather than just highlighted areas in the Figures and it would allow us to see if Salicaceae will be replaced by Betulaceae after colonizing or if both families persist together, which might be important to understand future development of shrubs in these areas.

      Thank you for allowing us to clarify. As the regions studied with the lake sediment records shown in this manuscript were all covered by extensive ice sheets during the Last Glacial Maximum (LGM, Fig. 1), plant refugia and source areas must have been located somewhere south of the ice sheet margins. Thus, we calculate our distance to source as the minimum distance from a lake site to land beyond the extent of the ice sheet during the LGM. This has now been clarified in the text and highlighted in Fig. 1. We have also added in the discussion molecular results from Thórsson et al. (2010, J Biogeogr 37) on possible source origins of Betula in Iceland. Details on taxa environmental preferences have now been expanded upon in the Discussion section where we explore the various trait-based factors that may influence the relative differences in colonization timing between Salicaceae and Betulaceae. We have now also edited Figs. 3 and 4 to include PCR replicates instead of highlighted bars to better compare the DNA and pollen datasets from Iceland.
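
      The distance-to-source definition above (the minimum distance from a lake site to land beyond the LGM ice sheet extent) can be illustrated with a minimal sketch. It assumes the ice-free land is approximated by a list of sample points and that great-circle (haversine) distance on a spherical Earth is adequate. The function names and any coordinates passed in are hypothetical and are not taken from the manuscript.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_source_km(lake, ice_free_points):
    """Minimum distance from a lake (lat, lon) to any sampled ice-free land point.

    `ice_free_points` is a hypothetical list of (lat, lon) tuples sampled from
    land lying beyond the LGM ice sheet margin.
    """
    return min(haversine_km(lake[0], lake[1], p[0], p[1]) for p in ice_free_points)
```

      In practice the ice-free land would be sampled from a published LGM ice extent reconstruction; the denser the sampling along the margin, the closer this minimum approaches the true nearest distance.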

      The author started a discussion about shrubification in the future, but a more defined evaluation and discussion of how to use such paleo datasets to predict future shrubification and its consequences for the Arctic would give more significance to the work.

      Thank you for this suggestion and allowing us to expand on potential future changes. We have now edited this final section of the paper to provide a little more detail on how we envision these records being used to predict future shrubification and climate change.

      Reviewer #1 (Recommendations For The Authors):

      I list some more specific details here.

      You speak about "read counts", I guess you used relative abundance of read counts, you should state it like this.

      Thank you for allowing us to clarify. The data that we refer to in terms of read counts are from the previously published studies in the circum North Atlantic. The data provided from these studies are raw read counts, not relative abundances.

      Line 100: What do you mean here: "temperature changes in prior warm periods"?

      Thank you for allowing us to clarify. We have rephrased the sentence to “higher temperature in prior warm periods”, which we hope is clearer for the reader.

      Line 134: How is DNA diluted by minerogenic sediment? Did the sedimentation rate increase? Typically minerogenic input should be beneficial for DNA preservation.

      Thank you for allowing us to clarify. These samples were primarily comprised of tephra glass with minimal organic content. While we agree that minerogenic sediment is generally beneficial for DNA preservation, the predominance of inorganics (tephra) that fell from the sky, rather than being washed into the lake from the landscape, would not carry organic sediment with it. We have rephrased the sentence to make this clearer.

      I would suggest adding more citations to the text (for example statements in lines 106, 110, 368)

      Thank you for the suggestion. The manuscript has been edited accordingly.

      Better divide your discussion part: discussion about dispersal mechanisms occurs in both sections. Maybe you could divide it into environmental factors for colonization and trait-based factors (only an idea).

      Thank you for the suggestion. We have now edited the second dispersal section to “Environmental dispersal mechanisms” to be clearer about our focus on factors such as wind, sea ice, and birds that may transport the seeds across the North Atlantic. The previous section retains the trait-based factors that may influence relative timing in colonization between Salicaceae and Betulaceae.

      Which type of sequencing did you use, paired-end 76bp is unknown to me.

      Methods have now been edited to clarify this, along with details related to extraction methods as requested in the Public Review.

      Reviewer #2 (Public Review):

      Harding et al have analysed 75 sedaDNA samples from Store Vidarvatn in Iceland. They have also revised the age-depth model of earlier pollen, macrofossil, and sedaDNA studies from Torfdalsvatn (Iceland), and they review sedaDNA studies for first detection of Betulaceae and Salicaceae in Iceland and surrounding areas. Their Store Vidarvatn data are potentially very interesting, with 53 taxa detected in 73 of the samples, but only results on two taxa are presented. Their revised age-depth model casts new light on earlier studies from Torfdalsvatn, which allows a more precise comparison to the other studies. The main result from both sedaDNA and the review is that Salicaceae arrives before Betulaceae in Iceland and the surrounding area. This is a well-known fact from pollen, macrofossil, and sedaDNA studies (Fredskild 1991 Nordic J Bot, Birks & Birks QSR 2014, Alsos et al. 2009, 2016, 2022) and as expected as the northernmost Salix reach the Polar Desert zone (zone A, 1-3 °C July temperature) whereas the northernmost Betula rarely goes beyond the Southern Tundra (zone D, 8-9 °C July temperature, Walker et al. 2005 J. Veg. Sci., Elven et al. 2011 http://panarcticflora.org/ ).

      We greatly appreciate the time and detailed consideration of our manuscript by Reviewer 2. Our responses to individual comments are highlighted in blue, with the original comments provided by the reviewer in black.

      While we agree that previous studies have indeed indicated a relative delay in Betula colonization relative to Salix, most of these have relied on pollen and macrofossil evidence, which are complicated to use as proxies for the first appearance of a given taxon (see our Introduction in the main manuscript). A few studies have shown this also with sedaDNA (e.g., Alsos et al., 2022), which is a more robust proxy for a plant taxon’s presence, but these have been limited geographically (e.g., northern Fennoscandia). In our study, we show that this pattern is reflected in 10 different lakes across the North Atlantic, emphasizing the broad nature of Betula’s delayed colonization relative to other woody shrubs, such as Salix.

      My major concern is their conclusion that lag in shrubification may be expected based on the observations that there is a time gap between deglaciation and the arrival of Salicaceae and between the arrival of Salicaceae and Betulaceae. A "lag" in biological terms is defined as the time from when a site becomes environmentally suitable for a species until the species establish at the site (Alexander et al. 2018 Glob. Change Biol.). The climate requirement for Salicaceae highly depends on species. In the three northernmost zones (A-C), it appears as a dwarf shrub, and it only appears as a shrub in the Southern Tundra (D) and Shrub Tundra (E) zone, and further south it is commonly trees. Thus, Salicaceae cannot be used to distinguish between the shrub tundra and more northern other zones, and therefore cannot be used as an indicator for arctic shrubification. Betulaceae, on the other hand, rarely reach zone C, and are common in zone D and further south. Thus, if we assume that the first Betulaceae to arrive in Iceland is Betula nana, this is a good indicator of the expansion of shrub tundra. Thus, if they could estimate when the climate became suitable for B. nana, they would have a good indicator of colonisation lags, which can provide some valuable information about time lags in shrub expansion (especially to islands). They could use either independent proxy or information from the other species recorded in sedaDNA to reconstruct minimum July temperature (see e.g. Parducci et al. 2012a+b Science, Alsos et al. 2020 QSR).

      We appreciate the reviewer’s insight into the implications of our use of the word “lag”. Indeed, as we do not have site-specific climate timeseries for each lake record, we have adjusted our wording to “delay”, which we believe is more general and descriptive of our observations. We recognize the importance of independent paleotemperature records for each lake, but these are not yet available for all records, so we prefer to keep our study focused on the delay instead. In addition, we prefer not to derive temperature records from the vegetation sedaDNA records, as these are not independent and will incorporate changes driven by additional factors, such as soil and light (e.g., Alsos et al., 2022). We have added some text to the final section on Future Outlook that elaborates on the need for complementary records of past climate to pair with paleoecological records of colonization. We hope that this motivates the community to pursue these lines of research that we agree are needed.

      The study gives a nice summary of current knowledge and the new sedaDNA data generated are valuable for anyone interested in the post-glacial colonisation of Iceland. Unfortunately, neither raw nor final data are given. Providing the raw data would allow re-analysing with a more extensive reference library, and providing final data used in their publication will for sure interest many botanists and palaeoecologist, especially as 73 samples provide high time resolution compared to most other sedaDNA studies.

      Finally, the raw and final data, including blank controls, used in our study for Stóra Viðarvatn will ultimately be provided with the manuscript’s publication. We apologize for not including it with the original submission.

      Reviewer #2 (Recommendations For The Authors):

      Line 112-113: Difference in northward expansion rate is not the same as lag. Thus, your conclusion "As a result, the biospheres role in future high latitude temperature amplification may be delayed." does not derive directly from the data you present.

      Thank you for allowing us to clarify our wording. We have rephrased the sentence to align with our results more closely as stated in the Abstract of the manuscript.

      Line 133: From Figure S3, it looks like three or possibly four samples failed.

      Thank you for pointing this out. First, we realized that the DNA reads originally included in Figure S3 were from after filtering. We have now updated the figure to include the total raw reads, which is a better indicator of DNA reliability (Rijal et al., 2021). Based on the total raw reads, only two samples failed with total reads of 2 and 5.

      Line 141: You say you focus on presence/absence, but you do show quantitative results for Salix and Betula (0-5 PCR repeats) in Figure 2.

      Thank you for allowing us to clarify. Fig 2 shows the number of replicates that meet our criteria for taxa presence, where a higher number of replicates corresponds to a higher likelihood of presence.

      Line 142: Where are the other 51 taxa shown?

      We are providing the full DNA record in the supplement, which will be published alongside the main manuscript. We have also now included a plot of species richness against sample depth in Fig. S2.

      Line 178-179: Note that the revised date of first detection is close to what has been previously published (Salix ~10300 vs. 10227, Betula ~9500 vs 9680), so it does not make any changes to previous interpretation.

      Yes, this is true. However, we still believe it is important to always consider improvements in age models to best correlate the timing of events between different paleo records.

      Line 191-194 and Figure S2: I leave the evaluation of revised age-depth model to the geologist.

      As this aspect was not commented on, we assume that both reviewers are satisfied.

      Line 197: "Delay" is a more correct word than "lag".

      Thank you, edited.

      Line 210: Where do 1700 and 2500 come from? If your revised age of ice retreat is 11 800, and your revised date of Salix and Betula arrival are ~10 300 and ~9500, I make this 1500 and 2300.

      Yes, this is correct. Thank you for pointing out this error.
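
      The corrected values follow directly from the ages quoted in the reviewer's comment. As a minimal sketch of the arithmetic, the function name below is hypothetical, and the ages are the approximate values (in cal yr BP) cited in the comment above:

```python
def colonization_delay(deglaciation_age_bp, first_detection_age_bp):
    """Years elapsed between deglaciation and first detection (ages in cal yr BP)."""
    return deglaciation_age_bp - first_detection_age_bp


# Approximate ages as quoted in the reviewer's comment (cal yr BP)
ICE_RETREAT = 11_800
SALIX_FIRST = 10_300
BETULA_FIRST = 9_500

salix_delay = colonization_delay(ICE_RETREAT, SALIX_FIRST)    # 1500 years
betula_delay = colonization_delay(ICE_RETREAT, BETULA_FIRST)  # 2300 years
```

      Because ages are expressed as years before present, the older (larger) deglaciation age comes first in the subtraction.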

      Line 215-217: To be more certain about any bias caused by low DNA quality, I suggest you explore your data using the tools presented in Rijal et al. 2021 Science Advances. As you do not provide your data, I cannot evaluate the quality of them.

      Thank you for the suggestion. We have now calculated the various DNA quality indices developed by Rijal et al. (2021). This has been added to the methods and results section for the Stóra Viðarvatn record, as well as in Fig. S3. The MTQ and MAQ scores are known to correlate with species richness when richness is low (n<30, Rijal et al., 2021), which is likely an artifact of the requirement that the 10 best represented barcode sequences be used to calculate these scores. As this correlation is observed in our dataset and given that our species richness is low (n<30, Fig. S2), the low MTQ and MAQ scores are not likely indicative of low-quality DNA. We therefore judge the quality of our DNA on total raw reads and CT values, which remain relatively constant through time (Fig. S2).

      Line 226: Do you mean TDV?

      We intended to omit unnecessary abbreviations throughout the manuscript, such as lake names, in our original manuscript. We have now changed TORF, which we use as the lake’s abbreviation, to the full lake name, Torfdalsvatn.

      Line 282-283: Given that the basal sediments of Nordivatnet are marine (Brown et al. 2022 PNAS Nexus), even a low detection may be a strong indication of local presence.

      Thank you for this point. However, to standardize the records and compare across a wide range of geographical and depositional settings, we prefer to apply the same criteria for the taxa’s presence to each lake as outlined in our Methods.

      Line 289: See the definition of "lag"

      Changed to “delayed” per your earlier suggestion. Thank you.

      Line 298-303: I agree that the late appearance of Betula at Langfjordvatnet (10 000 cal BP) is anomalously long and a bit unexpected given that it is found at five other lakes in the region 13000-10200 cal BP (Alsos et al. 2022). However, a likely explanation is the lack of area with stable soil - B. nana requires a greater degree of soil development compared to other heath shrubs (Whittaker 1993) and Langfjordvatnet is surrounded by steep scree slopes (Otterå 2012 master thesis Univ. Bergen). At Jøkelvatnet, Salix appears in the four available samples from 10453 to 9811 whereas Betula arrives 9663. Here, the arrival of Betula is just at the drop of local glacier activity and at the temperature rise, suggesting that it arrives immediately after the climate becomes suitable (Elliott et al. 2023 Quaternary). Thus, based on N Fennoscandia where we have more data available, it does not show lags and does not support delayed shrubification (which contrasts with what we have shown for many other species including common dwarf shrubs, see Alsos et al. 2022). Would be very interesting to have similar data from Iceland, which has a large dispersal barrier.

      Thank you for these further considerations. We have incorporated those related to Langfjordvannet into the manuscript accordingly. We also appreciate the point regarding Jøkelvatnet. However, as stated in our Methods section for “Published sedaDNA datasets”, we do not include Jøkelvatnet in our comparison due to the impact of glacier activity as the reviewer notes: “Finally, both Jøkelvatnet and Kuutsjärvi were impacted by glacial meltwater during the Early Holocene when woody taxa are first identified (Wittmeier et al., 2015; Bogren, 2019), and thus the inferred timing of plant colonization is probably confounded in this unstable landscape by periodic pulses of terrestrial detritus.” Due to the glacier’s presence in the lake catchment, it is not possible to discern whether delay in Betulaceae would have occurred if the glacier were not present. Therefore, we prefer to keep this record excluded from our comparisons.

      Line 316-319 and 344: Based on contemporary genetic patterns, Alsos et al. analyse the relative importance of adaptation to dispersal compared to other factors.

      Thank you for bringing up this important point. We have now expanded our discussion to include these analyses from Alsos et al. (2022).

      Line 342+350: Original publication is Alsos et al. 2007 Science

      Thank you, edited.

      Line 343: Alsos et al. 2009 Salix study is the wrong citation here. Eidesen et al. 2015 Mol. Ecol. shows phylogeography of Greenland population but not Baffin - I am not aware of any contemporary genetic studies of Betula from Baffin.

      Thank you for pointing this out. We will also include the Eidesen et al. (2015) citation for reference to Greenland. However, there is one data point included for southern Baffin Island in Alsos et al. (2009), so we will retain this citation here as well.

      Line 351-353: See comment about Betula from Baffin above. Also, I am not sure I follow here - what do you mean by "these populations" - the Svalbard ones or Iceland? Eidesen et al. 2015 is the wrong citation for Salix - use Alsos et al. 2009. Alsos et al. 2009 suggest Iceland (and E Grenland) was colonized from north Scandinavia, although this was uncertain as no data were available from Faroe/Shetland. Svalbard was colonized from N Fennoscandia (Alsos et al. 2007).

      Regarding Baffin Island sources, we refer the reviewer to our response to their previous comment. We have clarified the wording of our sentence from “these populations” to “the modern populations from these locations [Baffin Island, Greenland, and Svalbard]”. We have removed reference to Eidesen et al. (2015), as this is for Betula rather than Salix. Finally, we have added a citation for Alsos et al. (2007) here for Svalbard.

      Line 354-355: AFLP suggest that Baffin and W Greenland were colonised from a refugia south of the Wisconsin Ice Sheet, see Alsos et al. 2009.

      Yes, we are aware, thank you. Our reference to “mid-latitude North America” in the sentence acknowledges this refugium, but we have now added “south of the Laurentide Ice Sheet” for further clarification.

      Line 363-381: See comment above; your Store Vidarvatn data do currently not demonstrate a lag between environmental suitability and climate, but using the rest of the DNA record, potentially it could. Would also be good to reflect on the distance to the source area for shrubs Late Glacial/Early Holocene compared to now.

      Thank you for these suggestions. We have edited this section of the manuscript to elaborate on the need for independent climate reconstructions as well as the fact that distances to plant refugia are shorter now than during the last postglacial period.

      Line 396-416: I am not an expert on tephra so I will not comment on this part.

      As this aspect was not commented on, we assume that both reviewers are satisfied.

      Line 459-457: Please provide results of how much data is lost at each step of filtering.

      We added the read loss following each filtering step as a table in the supplemental information (Table S4).

      Throughout the manuscript, you go only to species level although DNA in most cases is able to distinguish to genus level within Salicaceae and Betulaceae - which sequences did you identify?

      Sequences are now provided in the supplemental for Salicaceae and Betulaceae. Based on our bioinformatic pipeline, reference library and requirement for 100% match between sequence and taxonomy, we were only able to distinguish between species level.

      Figure 2: The detection of Betulaceae is very sporadic in Stóra Vidarvatn with occurrence in only seven samples and hardly ever in all 5 repeats, suggesting that if you apply a statistical model to estimate first arrival (see Alsos et al. 2022), you will have a large confidence interval. Thus, these uncertainties should be considered when estimating the delayed arrival of Betula compared to Salix. The data from Torfdalsvatn (which I assume are from Alsos et al. 2021 although not specified in the figure legend), shows detection in all samples from the first appearance and mostly in 8 of 8 repeats (shown in the original publication - you could do the same here), thus providing a more accurate estimate for the time gap between arrival of Salix and Betula.

      Thank you for bringing up this important point. The detection of Betulaceae is indeed sporadic, but we believe it reflects the genuine nature of its presence/absence during the Holocene in Northeast Iceland. This is supported by Betula pollen from a nearby peat record that shows a similar history (Fig. 4, Karlsdóttir et al., 2014), which we have now elaborated on in the Results and Interpretation section. As for the timing of Betulaceae colonization at this site, the first appearance in the DNA record should be a close minimum estimate as shown with modern DNA and plant survey comparisons (e.g., Sjögren et al., 2017; Alsos et al., 2018). The true first appearance could be biased by small amounts of plants being present in the early stages of colonization and not registering in the sedimentary record until enough dead plant material is transported to the depocenter of the lake. However, this is likely less than age model uncertainties and therefore not likely relevant on geologic timescales as in this study. In this sense, our age models and those published for the other records indicate this is generally on the order of several hundred years. In addition, we have now added the Alsos et al. (2021) reference for Torfdalsvatn. Unfortunately, this Torfdalsvatn study does not provide the number of PCR repeats, so we will keep the figure as is, as it best represents the available data.

      Figure 5: I suggest adding lake names to the figure. Is there a dot missing for lake 5 for Salicaceae?

      Thank you for the suggestion, we have added lake names to the figure. There is a dot marked for Salicaceae for lake 5, however, not for Betulaceae as this taxon was not identified. We refer the reviewer to the Discussion Section “Postglacial sedaDNA records from the circum North Atlantic” and the lake’s original publication (Volstad et al., 2020).

      Figure 6: I find it more relevant to plot colonization time versus distance to LGM ice sheet margin - lake number is just an arbitrary number.

      We appreciate the suggestion and have modified the figure accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In the present manuscript, Abele et al use Salmonella strains modified to robustly induce one of two different types of regulated cell death, pyroptosis or apoptosis in all growth phases and cell types to assess the role of pyroptosis versus apoptosis in systemic versus intestinal epithelial pathogen clearance. They demonstrate that in systemic spread, which requires growth in macrophages, pyroptosis is required to eliminate Salmonella, while in intestinal epithelial cells (IEC), extrusion of the infected cell into the intestinal lumen induced by apoptosis or pyroptosis is sufficient for early pathogen restriction. The methods used in these studies are thorough and well-controlled and lead to robust results, that mostly support the conclusions. The impact on the field is considered minor as the observations are somewhat redundant with previous observations and not generalizable due to cited evidence of different outcomes in other models of infection and a relatively artificial study system that does not permit the assessment of later time points in infection due to rapid clearance. This excludes the study of later effects of differences between pyroptosis and apoptosis in IEC such as IL-18 and eicosanoid release, which are only observed in the former and can have effects later in infection.

      We thank the reviewer for their time and effort in assessing our manuscript. We agree with the reviewer’s overall assessment. One minor clarification is that the engineering used does not express the proteins in “all growth phases”, but rather only when the SPI2 T3SS is expressed; we used the sseJ promoter, which is a SPI2 effector.

      Reviewer #2 (Public Review):

      In this study, Abele et al. present evidence to suggest that two different forms of regulated cell death, pyroptosis and apoptosis, are not equivalent in their ability to clear infection with recombinant Salmonella strains engineered to express the pro-pyroptotic NLRC4 agonist, FliC ("FliC-ON"), or the pro-apoptotic protein, BID ("BID-ON"). In general, individual experiments are well-controlled, and most conclusions are justified. However, the cohesion between different types of experiments could be strengthened and the overall impact and significance of the study could be articulated better.

      We thank the reviewer for their time and effort in assessing our manuscript. We agree with the reviewer’s overall assessment.

      Reviewer #1 (Recommendations For The Authors):

      Abstract: While new terms are sometimes useful for the visualization of concepts and I appreciate the "bucket list" analogy, it is not yet an accepted term in cell death research, and using it twice in the abstract seems out of order.

      We opted to keep the term, but reduce its use to once in the abstract with a specific comment on the recent coining of the term: “We recently suggested that such diverse tasks can be considered as different cellular “bucket lists” to be accomplished before a cell dies.” We recently coined this term in a review in Trends in Cell Biology, where three reviewers had quite positive comments about the concept. Time will tell whether this is a useful term for the cell death field or not.

      “In figure 2C-F Caspase 1 and Gsdmd deficient animals have higher levels of vector control strain than WT or Nlrc4. Could this be due to the redundancy with Nlrp3 in systemic infection described by Broz et al? Please mention in the description of the results.”

      The reviewer correctly points out a trend in the data. However, our experiments are not powered to show that this difference is statistically significant. Nevertheless, we now make note of the trend, and cite prior papers that have observed NLRC4 and NLRP3 redundancy against non-engineered S. Typhimurium strains.

      “The observation that apoptosis does not affect Salmonella systemically would be strengthened if the experiments using the BIDon strain could be taken out to a later time point, i.e. 72 or 96 h.”

      Indeed, we wanted to extend our studies to these timepoints. However, although expression of the SspH1 translocation signal is benign for 48 h, by 72 h this causes mild attenuation (regardless of whether the BID-BH3 domain is attached as cargo). We think that the degree of difficulty for SPI2 effectors to reprogram the vacuole increases over time, and that only beyond 48 h does SPI2 need to function at peak efficiency. This observation will be reported in a second manuscript that is written and will be submitted within this month. We are happy to supply this manuscript to reviewers if they would like to see the results. We also added text to the discussion to alert the reader to the caveats of engineering S. Typhimurium at later timepoints.

      “Discussion: The authors claim that pyroptotic and apoptotic signaling in IEC have the same outcome and IEC only has extrusion as a task. However, upon pyroptosis, IEC also releases IL-18 and eicosanoids, which is not the case during apoptosis. While the initial extrusion makes all the difference in early infection, Mueller et al 2016 showed that lack of IL-18 has an effect on salmonella dissemination at a 72h time point. The FlicON model can not test later time points as the bacteria will be cleared by then, but this caveat should be discussed.”

      We revised the text in the discussion to make it clear that extrusion is not the only bucket list item for IECs, and that IL-18 and eicosanoids are included in the bucket list for IECs after caspase-1 activation, and added the citation to Muller et al.

      Reviewer #2 (Recommendations For The Authors):

      1) The manuscript is written in a rather colloquial style. Additional editing is recommended.

      We edited the abstract to limit the use of the bucket list term and to make more clear that this is a new term that our lab has proposed in a recent review in Trends in Cell Biology. The managing editor for the current manuscript at eLife commented that the prose was lively and thoughtful. We would be happy to make edits if the reviewer has more specific suggestions.

      2) It is not obvious from the Results section that all mouse infections were, in fact, mixed infections. This should be stated more clearly. Additionally, there is a minor concern regarding in vivo plasmid loss over time.

      We added text to the results to make this clearer at the beginning of each in vivo figure in the paper. Our experiments are intentionally blind to any Salmonella that have lost the plasmid. These bacteria essentially convert to a wild type phenotype, and thus are no longer representative of FliCON or BIDON bacteria. We also verify the long established equal competition between pWSK29 (amp) and pWSK129 (kan) in Supplemental Figure 2A-B. Prior experiments from the laboratory of Sam Miller and others in the 1990s showed that plasmid loss occurs at a rate of less than 1%.

      3) Results shown in Figure 4 are difficult to interpret. Essentially, the experiment is aimed at comparing the two engineered Salmonella strains (FliC-ON and BID-ON). However, these strains are very different from one another, which may have a confounding effect on the interpretation of the data.

      The reviewer has interpreted the experiment correctly. We wanted to make clear to the reader that the two strains induce apoptosis with different kinetics. Indeed, it would be very surprising if two different engineering methods created strains that caused apoptosis with identical kinetics. We made two text edits to the results to make this clearer, concluding with “Overall, both ways of achieving apoptosis are successful in vitro, but with slightly different kinetics.”

      4) What new insights into mechanisms of bacterial pathogenesis and host response are gained by using recombinant Salmonella (over)expressing a pro-apoptotic protein is not clearly stated.

      We modify the introduction to make this more clear, stating: “Here, we investigate whether apoptotic pathways could be useful in clearing intracellular infection. Because S. Typhimurium likely evades apoptotic pathways, we again use engineering in order to create strains that will induce apoptosis. This allows us to study apoptosis in a controlled manner in vivo.”

      5) The Discussion section, while provocative, seems speculative and should be revised. Concepts of "backup apoptosis" and crosstalk between pyroptosis and apoptosis are intriguing, but it seems implausible to this reviewer that a cell might "know" that it will die, might "choose" how to die, and might aim to complete a "bucket list" before it loses all functional capacity. The usage of these types of terms does not help bolster the authors' central conclusions.

      We agree that cells do not “choose” pathways for regulated cell death. We had over-anthropomorphized the concepts surrounding these interconnected cell death pathways that are created by evolution. We edited the introduction and discussion to remove the “choose” term. However, we kept the second phrase using “know” in the discussion with an added clarifier: “Once a cell initiates cell death signaling, it “knows” that it will die (or rather evolution has created signaling cascades that are predicated upon the initiation of RCD).” Sometimes anthropomorphizing can be a useful tool to facilitate understanding of complex scientific concepts. For example, the “Red Queen hypothesis” clearly anthropomorphizes the concept of continuous evolution to maintain an evolutionary equilibrium. We have found that scientists in the cell death field often think that modes of cell death are or should be interchangeable. We hope that the idea of the “bucket list” will help to crystallize the idea that distinct processes leading up to different types of regulated cell death can have very different consequences during infection.

      Additional Comments from the Reviewing Editor:

      1) The authors show that FliC-ON is not cleared from the spleen of Casp1 KO or Gsdmd KO mice. The conclusion is that the backup apoptosis pathways that should be present in these mice are insufficient to clear the bacteria from the spleen. However, although it is shown that bone marrow macrophages undergo apoptosis in vitro, I believe it is not shown that the apoptotic pathways are actually activated in the spleen. This seems like an important caveat. Could it be shown (or has it previously been shown) that the cells infected in the spleens of Casp1 KO or Gsdmd KO are activating apoptosis? If not, it seems possible that the reason the bacteria are not cleared is due to a lack of apoptosis activation rather than an ineffectiveness of apoptosis, and the authors could consider explicitly acknowledging this.

      We agree, and added to the discussion “A final possibility is that our engineered strains are not successfully triggering apoptosis within splenic macrophages. This could be due to intrinsic differences between BMMs and splenic macrophages or could be due to bacterial virulence factors that fail to suppress apoptosis only in vitro. It is quite difficult to experimentally prove that apoptosis occurs in vivo due to rapid efferocytosis of the apoptotic cells.”

      2) Both reviewers were somewhat unhappy about some of the new terminology/metaphors that are introduced in the manuscript. I understand the reviewers' concerns but also feel that the writing is lively and thoughtful. It is up to the authors to decide whether to retain their new terminology, but the response of two expert reviewers might give the authors some pause. At a minimum, to address the concern about an unfamiliar term being used in the abstract, perhaps explicitly state that you are introducing "bucket list" as a new concept to help explain the results. The introduction of this concept may indeed be one of the novel contributions of the manuscript.

      We opted to keep the term, but reduce its use to once in the abstract with a specific comment on the recent coining of the term: “We recently suggested that such diverse tasks can be considered as different cellular “bucket lists” to be accomplished before a cell dies.” We recently coined this term in a review in Trends in Cell Biology, where three reviewers had quite positive comments about the concept. Time will tell whether this is a useful term for the cell death field or not.

      3) Perhaps this is implied in the discussion already, but it might make sense to state the obvious difference between IECs and splenic macrophages which is that the death of the former results in the removal of the cell and its contents (i.e., Salmonella) from the tissue, whereas the death of the latter does not. This seems like the simplest explanation for why apoptosis restricts bacterial replication in IECs but not macrophages, and I am not sure if introducing the concept of a "bucket list" improves the explanation or not.

      We agree that this narrative nicely distills the differences between these cell types. We edited the final paragraph of the discussion to include this narrative.

      4) Lastly, some minor comments

      -- p.2 "hyperactivate" instead of "hyperactive"?

      Corrected.

      -- the authors may also want to mention Shigella, as it might provide another example that apoptotic C8-dependent backup protects IECs

      Yes, indeed, this is a good comparison to make. We added this to the discussion.

      -- p.8, in case readers are unfamiliar with the concept of a PIT, the authors should perhaps cite their own work when they first mention this concept (at the top of the page)

      Indeed, citation added.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their thoughtful comments on the manuscript and to the editors for their assessment.

      We thank the reviewers for their positive feedback and appreciate that they consider our method a valid addition to previously established systems for generating recombinant RNA viruses.

      To strengthen this point, we have now included additional validation by the rescue of recombinant Chikungunya and Dengue virus from viral RNA directly, using the CLEVER protocol. This strengthens the potential of this method as a reverse genetics platform for positive-stranded viruses in general.

      The supporting data have been added to the Results section and reflected in Materials and Methods, and the corresponding supplementary figure (Figure S4) has been added.

      One key point raised by one of the reviewers, a comparison with different systems, could not be addressed in this manuscript as our lab does not perform BAC cloning at all. We currently do not have the necessary expertise to conduct an unbiased side-by-side comparison.

      All other comments were addressed in detail, either by including additional data or through specific clarification in the revised text. We are grateful for the careful review and constructive criticisms raised by the reviewers and feel that the corrections and additions have significantly improved the manuscript.

      We have revised the latest version posted May 30, 2023 on bioRxiv (https://doi.org/10.1101/2023.05.11.540343).

      Reviewer #1:

      Public Review:

      In this manuscript, Kipfer et al describe a method for a fast and accurate SARS-CoV2 rescue and mutagenesis. This work is based on a published method termed ISA (infectious subgenomic amplicons), in which partially overlapping DNA fragments covering the entire viral genome and additional 5' and 3' sequences are transfected into mammalian cell lines. These DNA fragments recombine in the cells, express the full length viral genomic RNA and launch replication and rescue of infectious virus.

      CLEVER, the method described here significantly improves on the ISA method to generate infectious SARS-CoV2, making it widely useful to the virology community.

      Specifically, the strengths of this method are:

      1) The successful use of various cell lines and transfection methods.

      2) Generation of a four-fragment system, which significantly improves the method efficiency due to lower number of required recombination events.

      3) Flexibility in choice of overlapping sequences, making this system more versatile.

      4) The authors demonstrated how this system can be used to introduce point mutations as well as insertion of a tag and deletion of a viral gene.

      5) Fast-tracking generation of infectious virus directly from RNA of clinical isolates by RT-PCR, without the need for cloning the fragments or using synthetic sequences.

      One weakness of the latter point, which is also pointed out by the authors, is that the direct rescue of clinical isolates was not tested for sequence fidelity.

      The manuscript clearly presents the findings, and the proof-of-concept experiments are well designed.

      Overall, this is a very useful method for SARS-CoV2 research. Importantly, it can be applicable to many other viruses, speeding up the response to newly emerging viruses that threaten the public health.

      We thank the reviewer for this positive feedback and the summary of the main points. Nevertheless, we would like to comment on point 5): “the direct rescue of clinical isolates was not tested for sequence fidelity”

      This impression by the reviewer suggests that the data was not sufficient on this point. However, the sequence fidelity after direct rescue from RNA was indeed tested in this study, even on a clonal level (please see: Table S2, or raw NGS data SRX20303605 - SRX20303607). For higher clarity, we added the following sentence to the manuscript: “Indeed, a slight increase of unintentional mutations was observed when sequencing clonal virus populations rescued from RNA directly”.

      Recommendations for the authors:

      Minor Points:

      1) On page 8, the authors write: "levels correlated very well with the viral phenotype". This sentence is not clear. Please clarify what you mean by "viral phenotype". Do you mean CPE on Vero cells?

      We corrected the sentence to: “(…) staining intensity and patterns correlated very well with the wild-type phenotype.”

      2) Page 9: "sequences were analyzed with a cut-off of 10%". Cutoff of what? Please clarify.

      The sentence was rephrased to: “(…)mutations with a relative abundance of >10% in the entire virus population were analyzed”

      3) Page 15: The authors refer to the time required for completion of each step of the process. It would be helpful and informative for the readers to include a panel in figure 4, visualizing the timelines.

      We included a timeline in Figure 4, Panel A.

      4) Materials and methods, first paragraph: Please specify which human samples were collected. Do the authors refer to clinical virus isolates?

      We added the following information to the Materials and Methods section: “Human serum samples for neutralization assays were collected from SARS-CoV-2 vaccinated anonymous donors (…)”

      Clinical virus isolates (Material and Methods; Virus) were used for control experiments, neutralization assays, or as templates for RT-PCR.

      5) Supplementary figure 4A: The color scheme makes it hard to differentiate between the BA.1 and BA.5 fragments. Please choose colors that are not as similar to each other.

      Colors were adapted for better distinction.

      Reviewer #2:

      Public Review:

      The authors of the manuscript have developed and used cloning-free method. It is not entirely novel (rather it is based on previously described ISA method) but it is clearly efficient and useful complementation to the already existing methods. One of strong points of the approach use by authors is that it is very versatile, i.e. can be used in combination with already existing methods and tools. I find it important as many laboratories have already established their favorite methods to manipulate SARS-CoV-2 genome and are probably unwilling to change their approach entirely. Though authors highlight the benefits of their method these are probably not absolute - other methods may be as efficient or as fast. Still, I find myself thinking that for certain purposes I would like to complement my current approach with elements from authors CLEVER method.

      The work does not contain much novel biological data - which is expected for a paper dedicated to development of a new method (or to improving an existing one). It may be a kind of shortcoming, as it is commonly expected that authors who have developed new methods apply them to the discovery of something novel. The work stops at the step of rescuing the viruses and confirming their biological properties. This part is done very well and represents a strength of the study. The properties of rescued viruses were also studied using NGS methods that revealed high accuracy of the used method, which is very important as the method relies on use of PCR, which is known to generate random mistakes and is therefore not always the method of choice.

      What I found missing is a real head-to-head comparison of the developed system with an existing alternatives, preferably some PCR-free standard methods such as use of BAC clones. There are a lot of comparisons but they are not direct, just data from different studies has been compared. Authors could also be more opened to discuss limitations of the method. One of these seems to be rather low rescue efficiency - 1 rescue event per 11,000 transfected cells. This is much lower compared to infectious plasmid (about 1 event per 100 cells or so) and infectious RNAs (often 1 event per 10 cells, for smaller genomes most of transfected cells become infected). This makes the CLEVER method poorly suitable for generation of large infectious virus libraries and excludes its usage for studies of mutant viruses that harbor strongly attenuating mutations. Many of such mutations may reduce virus genome infectivity by 3-4 orders of magnitude; with current efficiencies the use of CLEVER approach may result in false conclusions (mutant viruses will be classified as non-viable while in reality they are just strongly attenuated).
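
      The reviewer's efficiency argument can be made concrete with a short calculation. The sketch below assumes that rescue events are independent and Poisson-distributed, and it uses a hypothetical number of transfected cells per experiment; only the rate of 1 event per 11,000 cells and the 3 to 4 orders of magnitude of attenuation come from the comment above. It illustrates the reviewer's reasoning and is not a measurement of how CLEVER behaves with attenuated mutants.

```python
import math


def rescue_probability(transfected_cells: int, events_per_cell: float) -> float:
    """Probability of at least one rescue event, assuming independent Poisson events."""
    expected_events = transfected_cells * events_per_cell
    return 1.0 - math.exp(-expected_events)


BASELINE_RATE = 1 / 11_000  # rescue events per transfected cell, as quoted by the reviewer
CELLS = 1_000_000           # hypothetical number of transfected cells per experiment

for log10_attenuation in (0, 3, 4):
    # Attenuation is modelled as a simple division of the per-cell rescue rate.
    rate = BASELINE_RATE / 10 ** log10_attenuation
    probability = rescue_probability(CELLS, rate)
    print(
        f"attenuation 10^-{log10_attenuation}: "
        f"expected events = {CELLS * rate:.3g}, P(rescue) = {probability:.3f}"
    )
```

      Under these assumptions the chance of recovering any virus falls sharply once the expected number of events per experiment drops below one, which is the scenario in which a viable but attenuated mutant could be misclassified as non-viable.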

      We thank reviewer 2 for the careful review of our work and the valuable feedback. We agree that a direct comparison with other (PCR-free) methods such as BAC cloning, could be useful for demonstrating the unique benefits of the CLEVER method. However, as our laboratory does not use any BAC or YAC cloning methods, we could not ensure an unbiased side-by-side comparison using different techniques.

      We would like to highlight the avoidance of any yeast/bacterial cloning steps that render the CLEVER protocol significantly faster and easier to handle. A visualization of the key steps that could be skipped using CLEVER in comparison to common reverse genetics methods is given in Figure 6.

      Further, we firmly believe that the benefits of the CLEVER method become especially apparent for large viral genomes such as the one of SARS-CoV-2, where assembly, genome amplification and sequence verification of plasmid DNA are highly inefficient and more time-consuming than for small viruses like DENV, CHIKV or HIV.

      We agree with the reviewer that the overall transfection and recombination efficiencies observed with CLEVER seemed rather low. Although data on transfection/rescue efficiency is known for many techniques and viruses, we did not find any published data on the reconstitution of SARS-CoV-2 or viruses with similar genome sizes. Therefore, a useful comparator for our observations in relation to other techniques is currently simply missing. We therefore emphasize that the efficiencies of CLEVER were achieved with one of the largest plus-stranded RNA virus genomes, and our data can’t be directly compared to transfection efficiencies of short infectious RNAs.

      On the contrary, it was rather interesting to observe the very high rescue efficiency of infectious virus progeny. During the two years of establishing and validating the CLEVER protocol, we reached success rates for the genome reconstitution after transfection of >95%. This was even obtained with highly attenuated mutants including rCoV2∆ORF3678 (joint deletion of ORF3a, ORF6, ORF7a, and ORF8) (Liu et al., 2022) (see Author response image 1). We added these data in response to the reviewers’ comment and as an example of the successful rescue of an attenuated virus from five overlapping genome fragments (fragments A, B, C, D1, and D2∆ORF3678).

      The latter data were not added to the main manuscript since in this case the deletions were introduced using a different method: from the plasmid-based DNA fragment D2∆ORF3678 and not directly from PCR-based mutagenesis.

      Further, CLEVER was used for related substantial manipulations, including the complete deletion of the Envelope gene (E) which led to the creation of a single-cycle virus that may serve as a live, replication-incompetent vaccine candidate (Lett et al., 2023).

      Author response image 1.

      rCoV2∆ORF3678. Detection of intracellular SARS-CoV-2 nucleocapsid protein (N, green) and nuclei (Hoechst, blue) in Vero E6TMPRSS2 cells infected with rCoV2∆ORF3678 by immunocytochemistry. Scalebar is 200 µm in overview and 50 µm in ROI images.

      Recommendations for the authors:

      The work is nicely presented and the method the authors have developed is clearly valuable. As indicated in the Public Review section, the work would benefit from a direct comparison of CLEVER with infectious plasmid (or RNA) based methods; a direct comparison of data would be more convincing than an indirect one. The authors should also discuss possible limitations of the method - this is helpful for a reader.

      We were not able to perform a direct comparison of CLEVER with other methods (see our statement above).

      We added the following section to the discussion: “Along with the advantages of the CLEVER protocol, limitations must be considered: Interestingly, virus was never rescued after transfecting Vero E6 cells, as has been observed previously (Mélade et al., 2022). Whether this is due to low transfection efficiency or the cell’s inability to recombine remains to be elucidated. Other cell lines not tested within this study will have to be tested for efficient recombination and virus production first. Further, the high sequence integrity of rescued virus is highly dependent on the fidelity of the DNA polymerase used for amplification. The use of other enzymes might negatively influence the sequence integrity of recombinant virus, as it has been observed for the direct rescue from viral RNA using a commercially available one-step RT-PCR kit. Another limitation when performing direct mutagenesis is the synthesis of long oligos to create an overlapping region. Repetitive sequences, for example, can impair synthesis, and self-annealing and hairpin formation increase with prolonged oligos.”

      Some technical corrections of the text would be beneficial. In all parts of the text the use of terms applicable only for DNA or RNA is mixed and creates some confusion. For example, authors state that "the human cytomegalovirus promoter (CMV) was cloned upstream of 5' UTR and poly(A) tail, the hepatitis delta ribozyme (HDVr) and the simian virus 40 polyadenylation signal downstream of the 3' UTR". Strictly speaking it is impossible as such a construct would contain dsDNA sequence (CMV promoter) followed by ssRNA (5'UTR, polyA tail and HDV ribozyme) and then again dsDNA (SV40 terminator). So, better to be correct and add "sequences corresponding to", "dsDNA copies of" to the description of RNA elements.

      We thank the reviewer for the advice but would like to state that in scientific language it is common to assume that nucleic acid cloning is based on DNA.

      We have corrected the description in the Methods section: “The human cytomegalovirus promoter (CMV) was cloned upstream of the DNA sequence of the viral 5’UTR; herein, the first five nucleotides (ATATT) correspond to the 5’UTR of SARS-CoV. Sequences corresponding to the poly(A) tail (n=35), the hepatitis delta virus ribozyme (HDVr), and the simian virus 40 polyadenylation signal (SV40pA) were cloned immediately downstream of the DNA sequence of the viral 3’UTR.”

      For ease of reading and for consistent terminology, we kept the original spelling in the rest of the manuscript.

      In the description of the neutralization assay, the authors used a temperature of 34 °C for incubation of virus with antibodies as well as for subsequent incubation of infected cells. Why was this temperature used?

      The following sentence was added (Materials and Methods; Cells): “A lower incubation temperature was chosen based on previous studies (V’kovski et al., 2021).”

      References

      Lett MJ, Otte F, Hauser D, Schön J, Kipfer ET, Hoffmann D, Halwe NJ, Ulrich L, Zhang Y, Cmiljanovic V, Wylezich C, Urda L, Lang C, Beer M, Mittelholzer C, Klimkait T. 2023. Single-cycle SARS-CoV-2 vaccine elicits high protection and sterilizing immunity in hamsters. doi:10.1101/2023.05.17.541127

      Liu Y, Zhang X, Liu J, Xia H, Zou J, Muruato AE, Periasamy S, Kurhade C, Plante JA, Bopp NE, Kalveram B, Bukreyev A, Ren P, Wang T, Menachery VD, Plante KS, Xie X, Weaver SC, Shi P-Y. 2022. A live-attenuated SARS-CoV-2 vaccine candidate with accessory protein deletions. Nat Commun 13:4337. doi:10.1038/s41467-022-31930-z

      V’kovski P, Gultom M, Kelly JN, Steiner S, Russeil J, Mangeat B, Cora E, Pezoldt J, Holwerda M, Kratzel A, Laloli L, Wider M, Portmann J, Tran T, Ebert N, Stalder H, Hartmann R, Gardeux V, Alpern D, Deplancke B, Thiel V, Dijkman R. 2021. Disparate temperature-dependent virus–host dynamics for SARS-CoV-2 and SARS-CoV in the human respiratory epithelium. PLoS Biol 19:e3001158. doi:10.1371/journal.pbio.3001158

    1. Author Response

      The following is the authors’ response to the original reviews.

      Note to reviewer and editor:

      In the previous version of the manuscript, we referred to ‘prevalent’ disease at baseline (e.g., prevalent cardiovascular disease). We have since changed this throughout the manuscript to ‘past or prevalent’ disease. This is a more accurate description as we ascertained diseases which occurred prior to baseline but may have been resolved by the time of the accelerometry study.

      Responses to reviewer 1:

      • I assume that not every participant provided data on all 7 nights. Did the authors exclude those who had fewer number of nights with accelerometer data (e.g., only 2-3 days), as the SRI based on fewer nights may not reliably reflect sleep regularity compared with SRI based all 7 consecutive nights?

      It is correct that not every participant provided complete accelerometry data. Most participants (88%) provided complete data. We only included participants who provided at least 2 valid measurements of the SRI (requiring valid data for at least 2 pairs of contiguous 24-hour periods). This is described in the appendix, but we have additionally now added this detail to the main text:

      “Most participants (88%) provided complete accelerometry data. Participants with fewer than two valid SRI measurements (i.e., less than 2 contiguous 24-hour wear periods; <1%) were excluded.”

      We would also like to note that our statistical analysis accounted, to some extent, for the lower reliability of SRI estimates in those with fewer nights of data. In those with sparse data, their estimated average SRI value would be pulled towards the overall sample average relatively more than for those with complete data. This is a consequence of the ‘partial pooling’ of the linear mixed effects model.
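
      The shrinkage described here can be sketched with the textbook formula for a random-intercept model. The example assumes that the variance components are known and uses made-up SRI values; the function name and all numbers are hypothetical and are not taken from the study.

```python
def shrunken_mean(values, grand_mean, between_var, within_var):
    """Partially pooled estimate of one participant's mean under a random-intercept model.

    The weight on the participant's own mean grows with the number of
    observations, so sparse participants are pulled harder towards the
    overall sample mean.
    """
    n = len(values)
    raw_mean = sum(values) / n
    weight = between_var / (between_var + within_var / n)
    return weight * raw_mean + (1 - weight) * grand_mean


GRAND_MEAN = 80.0    # hypothetical overall sample mean SRI
BETWEEN_VAR = 100.0  # hypothetical between-participant variance
WITHIN_VAR = 100.0   # hypothetical within-participant (day-pair) variance

sparse = shrunken_mean([40.0, 40.0], GRAND_MEAN, BETWEEN_VAR, WITHIN_VAR)
complete = shrunken_mean([40.0] * 6, GRAND_MEAN, BETWEEN_VAR, WITHIN_VAR)
print(f"2 measurements: {sparse:.1f}; 6 measurements: {complete:.1f}")
```

      Both participants have the same raw mean, but the one with two measurements ends up closer to the sample mean than the one with six, which is the behaviour the response attributes to the linear mixed effects model.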

      • The primary analysis and results were based on restricted cubic spline models that allow assessment of nonlinearity. This is different from the usual strategy that starts with the simpler linear relationship and further explores potential nonlinear relationships. Did the authors have a strong rationale for a nonlinear dose-response relationship between sleep regularity and mortality, so that the assessment of linear relationships was skipped?

      We chose to model the SRI with a restricted cubic spline for two reasons. Firstly, we did expect non-linearity to be present a priori. Partly this was because other sleep exposures (especially sleep time) have known non-linear relationships with health outcomes. We also thought that it was plausible that a ‘plateau’ might be present, which we wanted to capture. Secondly, we decided that our primary model should be sufficiently flexible from the outset in order that we did not need to make data-driven adjustments to our model specification (e.g., adding non-linear terms depending on the results of hypothesis tests). This approach we believe to be generally safer as making data-driven changes can undermine the validity of standard errors and p-values.1
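
      For readers unfamiliar with restricted cubic splines, the following sketch builds the unnormalised truncated-power basis described in Harrell's text (reference 1). The knot locations are hypothetical SRI values chosen for illustration; the response does not state the knots used for the SRI, and statistical packages often rescale the non-linear terms.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns the linear term followed by len(knots) - 2 non-linear terms.
    The construction forces the fitted curve to be linear before the first
    knot and after the last knot.
    """
    if len(knots) < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")
    t_prev, t_last = knots[-2], knots[-1]
    span = t_last - t_prev

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    basis = [float(x)]
    for t_j in knots[:-2]:
        basis.append(
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / span
            + cube_plus(x - t_last) * (t_prev - t_j) / span
        )
    return basis


KNOTS = [10.0, 30.0, 60.0, 90.0]  # hypothetical knot locations on the SRI scale
print(rcs_basis(50.0, KNOTS))
```

      With four knots the exposure enters the model through three columns, so a plateau or other curvature can be captured without deciding on extra terms after looking at the data.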

      • Was the proportional hazards assumption violated in the Cox modeling? Were discrete-time hazard models used to address the violation of the modeling assumption? Please clarify.

      Yes, the proportional hazards assumption was violated for all models except for the cardiovascular disease death model. This was the rationale for the use of the discrete time hazards model. They allowed for the inclusion of a flexible time by SRI interaction, allowing the hazard ratio to vary over the follow-up period. We have made this clearer in our revision. The following text has been added to the statistical methods:

      “In addition to Cox models, discrete-time hazards models, including an interaction between SRI and time (aggregated into 3-month intervals and modeled with a restricted cubic spline with knots at the 5th, 35th, 65th, and 95th percentiles), were fitted to relax the assumption of proportionality and allow hazard ratios (HRs) to vary over time. The SRI by time interaction in this model provided a test of proportionality (a small p value would indicate strong evidence against the proportional hazards assumption).”
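
      A discrete-time hazards model is usually fitted to person-period data, with one row per participant per interval. The sketch below shows one way to build such rows for 3-month intervals; the function and field names are hypothetical, and the authors' actual data preparation is not described in the response.

```python
def person_period_rows(follow_up_months, died, interval_months=3):
    """Expand one participant's follow-up into one row per time interval.

    Each row records the interval index and whether the event occurred in
    that interval. Only the final interval can carry the event.
    """
    rows = []
    start = 0.0
    index = 0
    while start < follow_up_months:
        end = start + interval_months
        is_last = end >= follow_up_months
        rows.append({"interval": index, "event": int(died and is_last)})
        start = end
        index += 1
    return rows


# A participant who died 7 months into follow-up contributes three rows.
for row in person_period_rows(7, died=True):
    print(row)
```

      A binary regression on these rows, for example a logistic model containing the interval, the SRI and their interaction, then lets the hazard ratio for the SRI change across follow-up.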

      • Please provide correlations between different sleep regularity measures. Although different measures lead to the same conclusion, it is interesting that SRI appeared to provide stronger signals with mortality than the other two SD measures. In addition to what was discussed by the authors, another possibility is that SRI also captures the regularity of napping during the day which is common in older populations.

      Thank you for this helpful suggestion. We have added a correlation matrix for the different sleep regularity measures (Table S1). We have additionally added the following text to the Results:

      “The SRI was modestly negatively correlated with the sleep duration SD (-0.32) and sleep onset time SD (-0.42; see correlation matrix in Table S1).”

      Regarding napping during the day, the algorithm we used to make determinations of sleep and wake unfortunately is not able to identify napping. This is because, in the absence of a sleep diary, it is very difficult to distinguish napping from inactivity in accelerometry data. The algorithm that we used requires the estimation of a ‘sleep period time window’, defining the period from the beginning to the end of the main sleep bout, in which sleep can be identified. Any sleep outside of this window is treated as inactivity. While some methods have been developed to estimate napping time from accelerometry without a sleep diary, we are not aware of any that are validated for adults using wrist worn accelerometers.

      This was not sufficiently clear in the previous version of the manuscript. We have added the following text to make it clear in the revised version.

      Methods:

      “To distinguish sleep from sustained periods of inactivity without reference to a sleep diary (not available in the UKB), GGIR uses an algorithm to determine a daily ‘sleep period time window’ for each participant.11 This defines the time window between the onset and end of the main daily sleep period, during which periods of sustained inactivity are interpreted as sleep. The algorithm does not, by default, detect bouts of sleep outside of this window and hence is not able to identify naps.”

      Discussion:

      “In addition, sleep diaries in the UKB were not available. Consequently, the algorithm we used to determine sleep and wake relied on the identification of a main ‘sleep period time window’ and did not identify napping.”

      • Table 1 - I would suggest adding additional columns showing the variable distributions across quantiles of the SRI, which can help understand the confounding structure and the covariate associations with SRI.

      We agree that this is a good idea and we have adjusted Table 1 accordingly.

      • Figure 1 and related supplemental Figures: it would be good to label in the figure the specific HR estimate and 95% CI mentioned in the manuscript.

      Thank you for this suggestion. We agree that this would be helpful. After some consideration, we have decided to leave the figures as they are for one primary reason. This is that we want to avoid over-emphasising the 5th and 95th quantiles. As discussed above, we chose to present HRs for these quantiles as they would provide a complement to the Figures which would assist in communication (for some readers, the key results might be easier to glean from these numeric summaries than from the Figures). However, we don’t wish to overemphasise these quantiles when the full ‘dose-response’ function we believe to be of the greatest interest.

      • Additional stratified analyses by main sociodemographic factors (age, sex, SES, etc) and sleep variables (sleep duration and sleep quality) would be informative to understand the population heterogeneity in the association between sleep regularity and mortality

      Thank you for this suggestion. We have assessed effect modification across a range of key background variables (age, sex, household income, sleep duration, moderate to vigorous physical activity, prevalent CVD, and prevalent cancer). This has been added to the results. Where meaningful evidence of effect modification was noted, we have presented results within strata of the effect modifier.

      • Some brief discussion on socioeconomic aspects of sleep is needed (the authors focused on the biological mechanisms underlying the observed association), as emerging evidence suggests that sleep health is not only a biological but also a social construct. For example, a recent study in the US found that sleep regularity is the most important contributor to racial/ethnic disparities in sleep health (see PMID: 34498675).

      We agree that sleep health is both a biological and social construct. We have added the following text to the discussion to address this comment:

      Discussion:

      “Furthermore, identifying the determinants of poor sleep regularity may be of import, not only considering biological factors, but broader social determinants that impact circadian rhythmicity (e.g., racial/ethnic disparities32, neighbourhood factors33) and consequently overall health.”

      References

      1. Harrell FE. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. vol 608. Springer; 2001.
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Public review:

      In this study, Porter et al report on outcomes from a small, open-label, pilot randomized clinical trial comparing dornase-alfa to the best available care in patients hospitalized with COVID-19 pneumonia. As the number of randomized participants is small, investigators describe also a contemporary cohort of controls and the study concludes about a decrease of inflammation (reflected by CRP levels) after 7 days of treatment but no other statistically significant clinical benefit.

      Suggestions to the authors:

      • The RCT does not follow CONSORT statement and reporting guidelines

      We thank you for this suggestion and have now amended the order and content of the manuscript to follow the CONSORT statement as closely as possible.

      • The authors have chosen a primary outcome that cannot be at least considered as clinically relevant or interesting. After 3 years of the pandemic with so much research, why investigate if a drug reduces CRP levels as we already have marketed drugs that provide beneficial clinical outcomes such as dexamethasone, anakinra, tocilizumab and baricitinib?

      We thank the reviewer for bringing up this central topic. The answer to this question has both a historical and practical component. This trial was initiated in June of 2020 and was completed in June of 2021. At that time there were no known treatments for the severe immune pathology of COVID19 pneumonia. In June 2020, dexamethasone data came out and we incorporated dexamethasone into the study design. It took much longer for all other anti-inflammatories to be tested. Hence, our decision to trial an approved endonuclease was based purely on basic science work on the pathogenic role of cell-free chromatin and NETs in murine sepsis and flu models and the ability of DNase I to clear them and reduce pathology in these animal models. In addition, evidence for the presence of cell-free chromatin components in COVID-19 patient plasma had already been communicated in a pre-print. Finally, several studies had reported the anti-inflammatory effects of dornase treatment in CF patients. Hence there was a strong case for a cheap, safe, pulmonary noninvasive treatment that could be self-administered outside the clinical se]ng.

      The identification of novel or repurposed treatments effective for COVID-19 was hampered by patient recruitment to competing studies during a pandemic. This resulted in small studies with inconclusive or contradictory findings. In general, effective treatments were only picked up in very large RCTs. For example, demonstrating that dexamethasone is effective in COVID-19 required recruitment of 6,425 patients into the RECOVERY study. Multiple trials of anti-IL-6 therapy gave conflicting evidence until RECOVERY recruited 4,116 adults with COVID-19 (n=2,022 tocilizumab and 2,094 control); the same was true for baricitinib (4,148 randomised to treatment and 4,008 to standard care). Anakinra is approved for patients with elevated suPAR, based on data from one randomized clinical trial of 594 patients, of whom 405 had active treatment (PMID: 34625750). However, a systematic review analysing over 1,627 patients (anakinra 888, control 739) with COVID-19 showed no benefit (PMID: 36841793). Regarding the choice of the primary endpoint, there is a wealth of clinical evidence to support the relevance of CRP as a prognostic marker for COVID-19 pneumonia patients and it is a standard diagnostic and prognostic clinical parameter in infectious disease wards. This choice in March 2020 was supported by evidence of the prognostic value of IL-6; CRP is a surrogate of IL-6. We also provide our own data from a large study of severe COVID-19 pneumonia in figure 1, showing how well CRP correlates with survival.

      In summary, our data suggest that Dornase yields an anti-inflammatory effect that is comparable or potentially superior to cytokine-blocking monotherapies at a fraction of the cost and potentially without additional adverse effects such as an increased risk of co-infections.

      We now provide additional justification on these points in the introduction on pg.4 as follows:

      “The trial was initiated in June 2020 and was completed in September of 2021. At the start of the trial only dexamethasone had been proven to benefit hospitalized COVID-19 pneumonia patients and was thus included in both arms of the trial. To increase the chance of reaching significance under challenging constraints in patient access, we opted to increase our sample size by using a combination of randomized individuals and available CRP data from matched contemporary controls (CC) hospitalized at UCL but not recruited to a trial. These approaches demonstrated that when combined with dexamethasone, nebulized DNase treatment was an effective anti-inflammatory treatment in randomized individuals with or without the implementation of CC data.”

      We also added the following explanation in the discussion on pg. 16:

      “Our study design offered a solution to the early screening of compounds for inclusion in larger platform trials. The study took advantage of frequent repeated measures of quantifiable CRP in each patient, to allow a smaller sample size to determine efficacy/futility than if powered on clinical outcomes. We applied a CRP-based approach that was similar to the CATALYST and ATTRACT studies. CATALYST showed in much smaller groups (usual care, 54, namilumab, 57 and infliximab, 35) that namilumab that is an antibody that blocks the cytokine GM-CSF reduced CRP even in participants treated with dexamethasone whereas infliximab that targets TNF-α had no significant effect on CRP. This led to a suggestion that namilumab should be considered as an agent to be prioritised for further investigation in the RECOVERY trial. A direct comparison of our results with CATALYST is difficult due to the different nature of the modelling employed in the two studies. However, in general Dornase alfa exhibited comparable significance in the reduction in CRP compared to standard of care as described for namilumab at a fraction of the cost. Furthermore, endonuclease therapies may prove superior to cytokine blocking monotherapies, as they are unlikely to increase the risk for microbial co-infections that have been reported for antibody therapies that neutralize cytokines that are critical for immune defence such as IL-1β, IL-6 or GM-CSF.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      This information is provided in the analysis on pg. 8:

      “The primary outcome was the least square (LS) mean CRP up to 7 days or at hospital discharge whichever was sooner.”

      • Why day 35 was chosen for the read-out of the endpoint?

      We now state on pg. 8 that “Day 35 was chosen as being likely to include most early mortality due to COVID-19 being 4 weeks after completion of a week of treatment. ( i.e. d7 of treatment +28 (4 x 7 days))”

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      We initially aimed at a fully randomized trial. However, the swift implementation of trial prioritization strategies towards large and pre-established trial platforms in the UK made the recruitment of COVID-19 patients to small studies extremely challenging. Thus, we struggled to gain access to patients. Our power calculations suggested that a mixed trial with randomized and contemporary controls was the best way forward under these restrictions in patient access that could provide sufficient power.

      That being said, we also provide the primary endpoint (CRP) results in Fig. 3B as well as the results for the length of hospitalization (Fig. S3D) for the randomized subjects only.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      We apologize if this point was confusing. We performed the analysis on the ITT as defined in our SAP: “The primary analysis population will be all evaluable patients randomised to BAC + dornase alfa or BAC only who have at least one post-baseline CRP measurement, as well as matched historical comparators.”

      We understand that this might be mistaken for an mITT analysis because the N in the ITT (39) doesn’t match the number randomised and because we had stated on pg. 8 that “Efficacy assessments of primary and secondary outcomes in the modified intention-to-treat population were performed.”

      However, we did randomise 41 participants, but:

      One participant in the DA arm never received treatment. The individual withdrew consent and was replaced. We also have no CRP data for this participant in the database, so they were unevaluable, and we couldn’t include them in the baseline table even if we wanted to. In addition, one participant in the BAC arm had only a baseline CRP measurement available and was therefore not evaluable.

      We have corrected the confusing statement on pg. 8 and added an additional explanation.

      “Efficacy assessments of primary and secondary outcomes in the intention-to-treat (ITT) population were performed on all randomised participants who had received at least one dose of dornase alfa if randomized to treatment. For full details see Statistical Analysis Plan. The ITT was adjusted to mitigate the following protocol violations where one participant in the BAC arm and one in the DA arm withdrew before they received treatment and provided only a baseline CRP measurement available. The participant in the DA arm was replaced with an additional recruited patient. Exploratory endpoints were only available in randomised participants and not in the CC. In this case, a post hoc within group analysis was conducted to compare baseline and post-baseline measurements.”

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      Our protocol pre-specified that the primary analysis population should have at least one post-baseline CRP measurement (pg. 13 of protocol). The patient that was excluded was one that initially joined the trial but withdrew consent after the first treatment but before the first post-treatment blood sample could be drawn. Hence, the pre-treatment CRP of this patient alone provided no useful information.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatments as BAT received by those patients except for dexamethasone.

      Table 1 includes all 39 patients plus 60 CCs.
      Table 2 shows additional treatments given for COVID-19 as part of BAC.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      One of the main criticisms we have encountered in this study has been the choice of the primary endpoint. The best way to respond to these questions was to provide data to support the prognostic relevance of CRP in COVID-19 pneumonia from a separate independent study where no other treatments such as dexamethasone, anakinra or anti-IL6 therapies were administered. We think this is a very useful analysis that provides essential context for the trial and the choice of the primary endpoint, indicating that CRP has good enough resolution to predict clinical outcomes.

      • Propensity-score selected contemporary controls may introduce bias in favor of the primary study analysis, since controls are already adjusted for age, sex and comorbidities.

      The contemporary controls were selected to best match the characteristics of the randomized patients including that the first CRP measurement upon admission surpassed the trial threshold, so we do not see how this selection process introduces biases, as it was blinded with regards to the course of the CRP measurements. Given that this was a small trial, matching for baseline characteristics is necessary to minimize confounding effects.

      • The authors do not clearly present numerically survivors and non-survivors at day 34, even though this is one of the main secondary outcomes.

      We now provide the mortality numbers in the following paragraph on pg. 13.

      “Over 35 days follow up, 1 person in the BAC + dornase-alfa group died, compared to 8 in the BAC group. The hazard ratio observed in the Cox proportional hazards model (95% CI) was 0.47 (0.06, 3.86), which estimates that throughout 35 days follow-up, there was a 53% reduced chance of death at any given timepoint in the BAC + dornase-alfa group compared to the BAC group, though the confidence intervals are wide due to a small number of events. The p-value from a log-rank test was 0.460, which does not reach statistical significance at an alpha of 0.05.”
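
      The arithmetic behind the quoted percentage can be checked with a short sketch. It uses only the hazard ratio and confidence interval quoted above and does not refit the Cox model; the helper name `percent_hazard_reduction` is hypothetical and introduced here for illustration.

```python
def percent_hazard_reduction(hazard_ratio: float) -> float:
    """Convert a hazard ratio into a percent reduction in hazard."""
    return (1.0 - hazard_ratio) * 100.0


# Values quoted in the response: HR 0.47, 95% CI (0.06, 3.86).
hr, ci_low, ci_high = 0.47, 0.06, 3.86

reduction = percent_hazard_reduction(hr)

# A 95% CI that contains 1.0 is compatible with no difference between arms.
ci_includes_no_effect = ci_low <= 1.0 <= ci_high

print(f"Estimated reduction in hazard: {reduction:.0f}%")
print(f"95% CI includes a hazard ratio of 1: {ci_includes_no_effect}")
```

      The interval spans 1, which is consistent with the statement that the result does not reach statistical significance.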

      • It is unclear why another cohort (Berlin) was used to associate CRP with mortality. CRP association with mortality should (also) be performed within the current study.

      As we explained above, the Berlin cohort CRP data serve to substantiate the relevance of CRP as a primary endpoint in a cohort that experienced sufficient mortality as this cohort did not receive any approved anti-inflammatory therapy. Mortality in our COVASE trial was minimal, since all patients were on dexamethasone and did not reach the highest severity grade, since we opted to treat patients before they deteriorated further. The overall mortality was 8% across all arms of our study, which does not provide enough events for mortality measurements. In contrast the Berlin cohort did not receive dexamethasone and all patients had reached a WHO severity grade 7 category with mortality at 30%.

      My other concerns are:

      • This report is about an RCT and the authors should follow the CONSORT reporting guidelines. Please amend the manuscript and Figure 1b accordingly and provide a CONSORT checklist.

      We now provide a CONSORT checklist and have amended the CONSORT diagram accordingly.

      • Please provide in brief the exclusion criteria in the main manuscript

      We have now included the exclusion criteria in the manuscript on pg. 6.

      “1.1.1 Exclusion criteria

      1. Females who are pregnant, planning pregnancy or breastfeeding

      2. Concurrent and/or recent involvement in other research or use of another experimental investigational medicinal product that is likely to interfere with the study medication within (specify time period e.g. last 3 months) of study enrolment

      3. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. Oxygen saturation <=93% on high-flow oxygen

      4. Require mechanical invasive or non-invasive ventilation at screening

      5. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      6. Any major disorder that in the opinion of the Investigator would interfere with the evaluation of the results or constitute a health risk for the trial participant

      7. Terminal disease and life expectancy <12 months without COVID-19

      8. Known allergies to dornase alfa and excipients

      9. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period

      So briefly, patients were excluded if they were:

      1. Pregnant, planning pregnancy or breastfeeding

      2. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. Oxygen saturation <=93% on high-flow oxygen

      3. Require ventilation at screening

      4. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      5. Terminal disease and life expectancy <12 months without COVID-19

      6. Known allergies to dornase alfa and excipients

      7. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period”

      • "The final trial visit occurred at day 35." "Analysis included mortality at day 35". I am not sure I understand why. In clinicaltrials.gov all endpoints are meant to be studied at day 7 except for mortality rate day 28. Why day 35 was chosen? Please be consistent.

      Thank you for identifying this inconsistency. We have amended the record on clinicaltrials.gov to read “the time to event data was censored at 28 days post last dose (up to d35) for the randomised participants and at the date of the last electronic record for the CC.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      • Figure 1b as in CONSORT statement, please provide reasons why screened patients were not enrolled.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatment as BAT received those patients except for dexamethasone.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      • In Figure 2 the authors draw results about ITT although in methods describe that they performed an mITT analysis. Please be consistent.

      Please see answers provided to these queries above.

      Reviewer #2 (Recommendations For The Authors):

      1) Suppl Figure 2B would be more informative if presented as a Table with N of patients with per day sampling

      We now provide the primary end point daily sampling table in Table 3.

      2) The numbers at risk should figure under the KM curves

      The numbers at risk for figures 1E, 2C, 2D have been added as graphs either in the main figures or in the supplement.

      3) HD in Supplementary figure 3 should be explained

      We apologize for this omission. We now provide a description for the healthy donor samples that we used in the cell-free DNA measurements in figure S3B on pg. 14:

      “Compared to the plasma of anonymized healthy donor volunteers at the Francis Crick Institute (HD), plasma cf-DNA levels were elevated in both BAC and DA-treated COVASE participants.”

      4) Presentation is inappropriate for Table S4

      We thank the reviewer for pointing out this issue. We have now formatted Table S4 to be consistent with all other tables.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Hage et al. presents interesting results from a foraging behavior in Marmosets that explores the interactions of saccade and lick vigor with pupil dilation and performance as well as a marginal value theory and foraging theory-inspired value-based decision-making model thereof. The results are generally robust and carefully presented and analyses, particularly of vigor, are carefully executed.

      The authors constructed a model that makes two predictions: "In summary, this simple theory made two sets of predictions: in response to an increased cost of harvest, one should work longer, but move with reduced vigor. In response to an increased reward value, as in hunger, one should also work longer, but now move with increased vigor." Their behavioral data meets these predictions. It is not clear if the model was designed and tweaked in order to make those predictions and match the data, or derived from principles. Furthermore, it is not clear what other models would make similar predictions. It would help to assess what is predicted by other simple models, as well as different functional forms for the effort costs in their model.

      We chose this formulation of utility (Eq. 1) because it is a normative approach that ecologists have used to understand the decisions that animals make regarding how far to travel for food, what mode of travel to use, and how long to stay before moving on to another reward opportunity (Richardson and Verbeek 1986; Stephens and Krebs 1986; Bautista et al. 2001). In a typical formulation of the theory, the numerator represents the reward gained (in units of energy), minus the effort expended (also in units of energy). The denominator represents the amount of time spent during that behavior. We represented this idea in Eq. (1) with saccades that produced reward accumulation, and licks that produced reward consumption. Thus, the utility that we are trying to maximize is the rate of energy gained.

      The specific functions that we used to represent the energy acquired through reward acquisition, and the energy expended through effort expenditure, came a priori either from experiment design, or from the measurements we have made in other experiments. We modeled reward accumulation as a linear rise in energy stored because successful saccades produced a linear increase in the food cache. We modeled consumption of the food as a hyperbolic function of the number of licks to represent the fact that as the licking bout began, each successful lick depleted the food, and thus the first few licks produced a greater amount of food consumption than the last few licks. We modeled the effort cost of licking to grow linearly with the number of licks.

      A critical assumption that we made is that energy spent performing the saccade trials (which grew faster than linearly as a function of the number of trials attempted), grew faster than the time spent attempting those same trials (which grew linearly with the number of trials). This assumption is based on the heuristic that the average rate of energy lost following a large number of attempted trials is greater than the average rate of energy lost following a small number of attempted trials.

      Sensitivity to parameter values: The model’s simplicity provides closed-form solutions across all parameter values, allowing one to make predictions without having to fit the model to the measured data. For example, for all parameter values that produce a real solution (as opposed to imaginary), the optimal number of saccade trials increases with the square root of the cost of licking. Thus, the basic prediction of the model is that in order to maximize the capture rate, an increase in the effort that it takes to harvest the reward should produce a greater willingness to work longer, caching more food. The closed-form solutions are presented in the Mathematica supplementary document.

      Other models of utility: In composing our utility (Eq. 1), we chose to combine reward and effort additively. This is in contrast to other approaches in which effort discounts reward multiplicatively (47–49). Here, let us show that multiplicative interactions may have the limitation that they are incompatible with the observation that reward invigorates movements. To compare additive and multiplicative approaches, let us consider an arbitrary function 𝑈(𝑇) that specifies how effort varies with movement duration. Typically, this is a U-shaped function that describes energy expenditure as a function of movement duration, as in Shadmehr et al. (2016). In the case of multiplicative interaction between reward and effort, we can consider the following representation of utility:

      In the above formulation, reward 𝛼 is discounted hyperbolically with time, and an increase in reward increases the utility of the action. The optimum movement vigor has the duration 𝑇∗ that maximizes this utility. Notably, because increasing reward merely scales this utility, it has no effect on vigor. Thus, a utility in which reward is multiplied by a function of effort generally fails to predict dependence of movement vigor on reward.
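
      The contrast between the two kinds of utility can be illustrated numerically. The sketch below is not the manuscript's Eq. 1 or any equation from the paper: the effort function, the constants `A`, `B`, `GAMMA`, and all function names are assumptions chosen only to show the scaling argument. With a multiplicative form, reward scales the whole curve and the best duration does not move; with an additive rate form, higher reward shortens the best duration (greater vigor).

```python
# Hypothetical constants, chosen only for illustration.
A, B, GAMMA = 1.0, 0.5, 1.0


def effort(T):
    """Assumed U-shaped effort as a function of movement duration T."""
    return A * T + B / T


def utility_multiplicative(T, alpha):
    """Reward discounted hyperbolically in time and scaled by 1/effort."""
    return alpha / (1.0 + GAMMA * T) / effort(T)


def utility_additive(T, alpha):
    """Rate of net gain: (reward - effort) / time."""
    return (alpha - effort(T)) / T


def best_duration(utility, alpha):
    """Grid search for the duration that maximizes the given utility."""
    grid = [0.01 + 0.001 * i for i in range(5000)]
    return max(grid, key=lambda T: utility(T, alpha))


low_reward, high_reward = 1.0, 4.0

t_mult_low = best_duration(utility_multiplicative, low_reward)
t_mult_high = best_duration(utility_multiplicative, high_reward)
t_add_low = best_duration(utility_additive, low_reward)
t_add_high = best_duration(utility_additive, high_reward)

print("multiplicative:", t_mult_low, t_mult_high)  # same duration for both rewards
print("additive:      ", t_add_low, t_add_high)    # shorter duration at high reward
```

      For this assumed additive form the optimum can also be found by hand: setting the derivative of (α − A·T − B/T)/T to zero gives T* = 2B/α, so the best duration falls as reward α rises.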

      Line 37 page 6; the link of pupil to NE/LC is tenuous. Other modulators systems and circuits may be equally important and should be mentioned (e.g. Reimer, Jacob, Matthew J. McGinley, Yang Liu, Charles Rodenkirch, Qi Wang, David A. McCormick, and Andreas S. Tolias. "Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortex." Nature communications 7, no. 1 (2016): 13289.)

      Reimer et al. (2016) used two-photon microscopy to measure activity of ACh and NE projections in layer 1 of mouse visual cortex while tracking pupil diameter fluctuations. During stillness, elevated pupil diameter was followed by cholinergic and noradrenergic axonal activity. Notably, NE activity was larger and had shorter latency than ACh activity. In primates, Joshi et al. (2016) recorded from LC during a fixation task. Using spike-triggered averaging, they found that a spike in an LC neuron was followed by pupil dilation at 200-300 ms latency. Moreover, microstimulation in LC produced pupil dilation at 500 ms latency. More recently, Breton-Provencher and Sur (2019) provided causal evidence that LC activity drives pupil size. They optogenetically activated (1 s) or silenced (5 s) locus coeruleus noradrenergic neurons and found a strong increase or a modest decrease in pupil size, respectively; both changes had a slow time scale of 1 second or more. The LC-NA neurons are surrounded by GABA-ergic neurons, and stimulation of these GABA-ergic neurons produced mild, slow constriction. The authors identified GABA-ergic and NA neurons by photo-tagging and then tried to identify them via spike shape, finding that “spike shape of some GABA neurons were not well separated from NA neurons, demonstrating the difficulty of cell-type identification based on spike shape alone.” They noted that a subset of GABAergic neurons received coincident inputs with the NA neurons. When the GABA neurons were excited, the gain of the pupil response to an auditory tone was diminished, so that pupil size increased with tone intensity at a lower gain. Thus, LC-NA neurons causally drive pupil size, and the GABA neurons that surround them control the gain of the response of LC-NA neurons to arousal stimuli.

      Line 35 page 6-page 7 line 10 emphasizes a cognitive interpretation of the pupil dilations that is emphasized, in relation to effort costs. But there are also more concomitant vigorous movements. Could all of their pupil results be explained by motor correlates? This should be tested and ruled out before making cognitive interpretations.

      Pupil dilation is a proxy for activity in the brainstem neuromodulatory system (Vazey et al., 2018) and is a measure of arousal (Mathot, 2018). Control of pupil size is dependent on spiking of norepinephrine neurons in locus coeruleus (LC-NE): an increase in the activity of these neurons produces pupil dilation (Joshi et al., 2016; Breton-Provencher and Sur, 2019). Some of these neurons show a transient change in their activity when acquisition of reward requires expenditure of physical effort (Bornert and Bouret, 2021). However, the link between effort costs and pupil size appears to go beyond motor control, as a recent paper found that pupil size increases during effortful speech perception (Contadini-Wright et al., 2023). Thus, although in our work increases in pupil size were always associated with increased movement vigor, the results from other studies suggest that economic variables such as cognitive effort in tasks in which there is no concomitant movement also drive an increase in pupil size.

      Page 7, line 37-42: How would the model need to be modified in order to account for this discrepancy with the data? Ideally, this would be tested.

      We comment on potential modifications that can be made to the model that may account for the discrepancy referred to by the reviewer in the discussion section: “Notably, some of the predictions of the theory did not agree with the experimental data. An increased effort cost did not accompany a reduction in the duration of harvest, and hunger did not increase saccade vigor robustly. Indeed, earlier experiments have shown that if the effort cost of harvest increases, animals who expend the effort will then linger longer to harvest more of the reward that they have earned (2). This mismatch between observed behavior and theory highlights some of the limitations of our formulation. For example, our capture rate reflected a single work-harvest period, rather than a long sequence. Moreover, the capture rate did not consider the fact that the food tube had finite capacity, beyond which the food would fall and be wasted. This constraint would discourage a policy of working more but harvesting less. Finally, if we assume that a reduced body weight is a proxy for increased subjective value of reward, it is notable that we observed a robust effect on vigor of licks, but not saccades. A more realistic capture rate formulation awaits simulations, possibly one that describes capture rate not as the ratio of two sums (sum of gains and losses with respect to sum of time), but rather the expected value of the ratio of each gain and loss with respect to time (Bateson et al., 1995 & 1996).”

      Page 9, line 2-11: In this section, it would help to also consider 'baseline' pupil size (in between trials). This would give a signal that is not 'contaminated' by movements, and may reflect control state. Relatedly, changes in control state may impact and confound the movement-related dilation magnitudes due to e.g. floor and ceiling effects on pupil size, which has a strong tendency for reversion to the mean.

      The experiment design included little or no between-trial periods because during the trials the subjects worked (performed saccades to accumulate reward), while after completing a few trials they stopped working and started harvesting through licking. Because primates make saccades during their entire wake state, it is probably not possible to find a significant period in which the subjects do not make any movements. We selected a window of 500 ms around each lick in the harvest period, and each saccade during the work period, and computed the average pupil size per movement, which includes data from both before and after movements. We then computed a within-session z-score by normalizing these measures by the average pupil size acquired for that day.
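
      The normalization step described above can be sketched as follows. This is a minimal illustration, assuming the z-score uses the mean and standard deviation of the per-movement pupil averages from a single session; the function name and the sample values are hypothetical and are not data from the study.

```python
from statistics import mean, pstdev


def within_session_zscores(pupil_per_movement):
    """Z-score per-movement pupil averages against the same session.

    Each input value is assumed to be the mean pupil size in a 500 ms
    window around one saccade or lick.
    """
    mu = mean(pupil_per_movement)
    sigma = pstdev(pupil_per_movement)
    if sigma == 0:
        raise ValueError("pupil size did not vary within the session")
    return [(x - mu) / sigma for x in pupil_per_movement]


# Hypothetical per-movement averages from one session (arbitrary units).
session = [3.1, 3.4, 2.9, 3.6, 3.0]
z = within_session_zscores(session)
print([round(v, 2) for v in z])
```

      Normalizing within each session removes day-to-day differences in overall pupil size, so that values can be compared across sessions.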

      The hunger-related and reward-size related analyses are both heavily confounded since they were not manipulated directly and could co-vary with many latent factors. For example, why might a given Marmoset be lower weight on a given day? Could it affect sleep, stress, activity, or other factors during the preceding 24 hours? If so, could these other variables be driving the results that are interpreted as 'hunger?' Relatedly, since the reward size is determined by the animals behavior on each trial (how much they worked), factors (internal brain state, external noises, etc.) that alter how much they worked will influence the subsequent reward size. Therefore interpretations about reward expectancy are confounded. Both of these issues should be discussed and manipulations of them (different feeding schedules and reward size-work functions proposed, respectively).

      Weight of the subjects was measured prior to the start of the experiment on each day. The natural fluctuations are typically the result of factors such as time of the experiment and corresponding weight measurement (AM vs PM) relative to the time of feeding on the previous day, day of the week of the experiment (following a weekend vs. during the week), and volume of food given during the previous day. Animals were maintained at 90% of their baseline weight during food restriction, and fluctuations typically occurred within that range (Sedaghat-Nejad et al., 2019). We used weight as a proxy for hunger, and thus value of reward, and the resulting analyses yielded results consistent with predictions made by our model, as seen in Fig. 5. Critically, other factors that may co-vary with lower weights, like those mentioned by the reviewer (sleep conditions, stress levels, and activity levels) often lead to very poor task performance by the subjects. In sharp contrast, the model predicted increased work period, and increased movement vigor for high reward value, both of which we observed when the subject’s weight was low. Thus, a low relative weight did not seem to impair performance, but rather act as a motivating factor. Subjects were closely monitored for well characterized stress-related behaviors and impaired attentive states by experimenters, veterinarian staff, and caretaker staff, and, in the event of abnormalities, were removed from food restriction and experimentation until behavior stabilized.

      Effect of reward size: As you noted, we did not manipulate reward size directly. Rather, because our emphasis was on quantifying the effect of effort, the subjects received the same increment of reward per each completed trial, but on some sessions this reward was easy to harvest, while in other sessions the reward required greater effort to harvest. Because the reward amount accumulated during the work period, some harvests encountered a small amount of reward, while other harvests encountered a large amount of reward. Indeed, the amount of reward available for harvest depended linearly on the number of successful saccade trials completed during the work period. We found that the vigor of licks grew with the reward magnitude.

      A major issue is a lack of alternative models. The authors seem to have constructed a particular model designed to capture the behavioral patterns they observed in the data. The model fails in some instances, as they point out. Even more importantly, there are no results or discussion about how other plausible models could or couldn't fit the data. The lack of model comparisons makes it difficult to interpret the conclusions or put the results in a broader context.

      To model behavior, we chose a formulation of utility that represented a normative approach that ecologists have used to understand the decisions that animals make regarding how far to travel for food, what mode of travel to use, and how long to stay before moving on to another patch. In the model, the objective of decisions and actions is to maximize the sum of reward acquired, minus the efforts expended, divided by time. This is termed the capture rate. However, there are other models to consider, and thus we added a new section titled Model formulation and Other models of utility.

      Reviewer #2 (Public Review):

      The model proposed in the paper takes a very specific functional form that is neither motivated by the previous literature nor particularly useful for indexing the behavioral tendencies of individual monkeys (or of the same monkey in different contexts). For example, while it is clear that the saccade effort cost will need to outgrow the increase in the utility of the accumulated food for the monkey to start feeding, it is unclear why this needs to be modeled with a fixed quadratic exponent on the number of saccades? Similarly, why do licks deplete the food stash with the specific rate hard-coded in the model?

      We added a section titled Model formulation and Other models of utility to better explain the rationale behind the model.

      We chose this formulation of utility (Eq. 1) because it is a normative approach that ecologists have used to understand the decisions that animals make regarding how far to travel for food, what mode of travel to use, and how long to stay before moving on to another reward opportunity (Richardson and Verbeek, 1986; Stephens and Krebs, 1986; Bautista et al., 2001). In a typical formulation of the theory, the numerator represents the reward gained (in units of energy), minus the effort expended (also in units of energy), while the denominator represents the amount of time spent during that behavior. We represented this idea in Eq. (1) with saccades that produced reward accumulation, and licks that produced reward consumption. Thus, the utility that we aim to maximize is the rate of energy gained.

      The specific functions that we used to represent the energy gained through reward acquisition, and the energy expended through effort expenditure, came either from experiment design, or from the measurements we have made in other experiments. We modeled reward accumulation as a linear rise in energy stored because successful saccades produced a linear increase in the food cache. We modeled consumption of the food as a hyperbolic function of the number of licks to represent the fact that as the licking bout began, each successful lick depleted the food, and thus the first few licks produced a greater amount of food consumption than the last few licks. We modeled the effort cost of licking to grow linearly with the number of licks.

      A critical assumption that we made is that energy expended performing the saccade trials (which grew faster than linearly as a function of the number of trials attempted), grew faster than the time spent attempting those same trials (which grew linearly with the number of trials). This assumption is based on the heuristic that the average rate of energy lost following a large number of attempted trials is greater than the average rate of energy lost following a small number of attempted trials. A quadratic function is one example of such a function, which has the advantage of providing closed form solutions for the optimal policy.

      The model’s simplicity provided closed-form solutions across all parameter values, allowing us to make predictions without having to fit the model to the measured data. Critically, for all parameter values that produce a real solution (as opposed to imaginary), the optimal number of saccade trials increases with the square root of the cost of licking. Thus, the basic prediction of the model is that to maximize the capture rate, regardless of parameter values, an increase in the effort required for harvest should be met with a greater willingness to work. The closed-form solutions are presented in the supplementary document (simulations.nb).
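
      The qualitative argument above can be illustrated with a deliberately simplified capture-rate model. The sketch below is not the authors' Eq. 1: it assumes a linear reward per saccade trial, a quadratic saccade effort cost, and a single lumped effort cost and duration for the harvest bout. All names and parameter values (a, c_s, C_L, T_s, T_L) are hypothetical and chosen only for illustration. Under these assumptions the optimal number of trials has a closed form that grows as the square root of an affine function of the harvest cost.

```python
import math


def capture_rate(n, a, c_s, C_L, T_s, T_L):
    """Toy capture rate: (reward - effort) / time after n saccade trials.

    Illustrative assumptions (not the authors' Eq. 1):
      a   : reward gained per completed saccade trial
      c_s : coefficient of the quadratic saccade effort cost
      C_L : lumped effort cost of the harvest (licking) bout
      T_s : time spent per saccade trial
      T_L : duration of the harvest bout
    """
    return (a * n - c_s * n ** 2 - C_L) / (T_s * n + T_L)


def optimal_trials(a, c_s, C_L, T_s, T_L):
    """Closed-form maximizer of capture_rate with respect to n.

    Setting d(capture_rate)/dn = 0 gives the quadratic
        c_s*T_s*n**2 + 2*c_s*T_L*n - (a*T_L + T_s*C_L) = 0,
    whose positive root is returned here.
    """
    disc = (c_s * T_L) ** 2 + c_s * T_s * (a * T_L + T_s * C_L)
    return (-c_s * T_L + math.sqrt(disc)) / (c_s * T_s)


if __name__ == "__main__":
    # Hypothetical parameter values; raising the harvest cost C_L
    # lengthens the optimal work period in this toy model.
    for C_L in (1.0, 4.0, 9.0):
        n_star = optimal_trials(a=1.0, c_s=0.01, C_L=C_L, T_s=1.0, T_L=5.0)
        print(f"C_L = {C_L:.1f} -> optimal trials = {n_star:.2f}")
```

      In this toy version, making the harvest more costly raises the optimal number of work trials for any positive parameter values, which is the direction of the prediction described above; the full model with explicit lick counts would need the authors' actual equations.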

      Finally, the proportion of successful saccades and lick events is assumed to be fixed, even though it is very likely to be directly influenced by movement speed (speed-accuracy trade-off), which is also contained in the model. It would strongly increase the plausibility and potential impact of the model if the authors could clearly state where these hard-coded model terms come from. Ideally, they would formulate the model in more general terms and also consider other functional forms, as briefly suggested in the discussion. This latter point would be particularly important since not all model predictions were actually borne out in the data.

      Thank you for this excellent suggestion. Regarding saccades, contrary to the speed-accuracy trade-off hypothesis, we found that faster saccades were also more accurate (Fig. 3C). Thus, increased pupil size was not only associated with more vigorous saccades, but also more accurate saccades. Importantly, these vigor-related changes in accuracy were too small to affect the probability of reward: the reward area for the saccades was much larger (1.5 deg) than the changes in endpoint accuracy that were produced by changes in the food tube distance. For example, on average saccade vigor changed from 0.95 to 1.05 when the food tube distance changed from 12 mm to 8 mm. These changes in vigor would produce a fraction of a degree reduction in endpoint error (Fig. 3C).

      Regarding licks, we added new data to the manuscript to assess the relationship between vigor of the licks and endpoint accuracy. We saw no consistent relationship, across subjects or effort conditions, between protraction speed and the outcome of a lick, that is, if the lick was successful in making it inside the tube. On average, in subject R we observed an improvement in lick accuracy with increased vigor, and in subject M we saw no change (Fig. 4F). Thus, we used the average success rate of licks, which was roughly 30% for both subjects.

      The authors derive qualitative predictions, by simulating their model with apparently arbitrary parameters. They then test these qualitative predictions with conventional statistics (e.g., t-tests of whether monkeys lick more for high vs low effort trials). The reader wonders why the authors chose this route, instead of formulating their model with flexible parameters and then fitting these to data. This would allow them (and future researchers) to test their model not just qualitatively but also quantitatively, and to compare the plausibility of different functional forms. The authors certainly have enough data and power to do this, given the vast number of sessions the monkey completed.

      The model’s simplicity provides closed-form solutions across all parameter values, allowing one to make predictions without having to fit the model to the measured data. For example, for all parameter values that produce a real solution (as opposed to imaginary), the optimal number of saccade trials increases with the square root of the cost of licking. Thus, the basic prediction of the model is that to maximize the capture rate, an increase in the effort that it takes to harvest the reward should produce a greater willingness to work longer, caching more food. The closed-form solutions are presented in the Mathematica supplementary document.

      The effort manipulation chosen by the authors (distance of food tube) goes hand in hand with a greater need for precision since the monkey's tongue needs to hit an opening of similar size, but now located at a greater distance. This raises the question of whether the monkeys moved slower to enhance their chance of collecting the food (in line with a speed-accuracy trade-off). The manuscript would benefit from an explicit test of this possibility, for example by reporting whether for each of the two conditions, the speed of tongue movements on a trial-by-trial basis predicts the probability of food collection. At the very least, the manuscript should explicitly discuss this issue and how it affects the certainty with which effects of tube distance can be linked to anticipated effort cost alone.

      Thank you for the excellent point. We looked for but found no consistent relationship, across subjects or effort conditions, between protraction speed of the tongue and the success probability of a lick (probability of insertion into the tube). Regardless, we agree with you that it is an excellent alternate hypothesis that reductions in lick vigor that accompanied increased distance of the tube may be due to a desire to maintain accuracy, and not a reflection of increased effort cost of reward. To incorporate this idea into the model, we would need a measure of speed-accuracy for the licks, something that we do not have but hope to develop in the future.

      However, perhaps the most interesting aspect of our results is that when we increased tube distance, making reward more effortful, there was not only a reduction in lick vigor, but also a reduction in saccade vigor. That is, the decisions and actions during the work period responded to the increased effort cost of reward during the harvest period. These changes accompanied dilation of the pupil, both in the work period and in the harvest period. We now include a paragraph regarding this in the Discussion.

      The manuscript measures pupil dilation in a time period ranging from 250 ms before to 250 ms after saccade onset. However, the pupil changes strongly during saccade execution relative to the preceding baseline, leaving doubts as to whether the aggregated measure blurs several interesting and potentially different effects. It would be more conclusive if the manuscript could report the analyses of pupil size separately for a period prior to saccade onset and during/after the saccade.

      Our goal was to test for general correlations between the state of the pupil and both movement vigor and decisions. We chose a window of 500 ms around saccade onset, as referred to by the reviewer, as it allowed us a large enough time window to measure pupil size outside of the movement itself (~30 ms duration), to accurately capture the state of the animal around initiation and end of a saccade. Critically, pupil tracking during a saccade itself, when using infrared eye tracking techniques, can be prone to slight measurement error in certain cases due to tracking jitter. Thus, averaging across this window, following processing of the signal, results in a more accurate measure of pupil size.
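
      The window-averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' processing pipeline: the function name, the sample layout (parallel lists of timestamps in ms and pupil sizes), and the default window bounds of 250 ms on either side of saccade onset are assumptions made for the example, and any signal cleaning is taken to have happened beforehand.

```python
def mean_in_window(times_ms, values, onset_ms, before_ms=250.0, after_ms=250.0):
    """Average the samples that fall within a window around an event.

    times_ms  : sample timestamps in milliseconds (hypothetical layout)
    values    : pupil-size samples aligned with times_ms
    onset_ms  : time of saccade onset
    before_ms : window extent before onset (assumed default: 250 ms)
    after_ms  : window extent after onset (assumed default: 250 ms)
    """
    lo = onset_ms - before_ms
    hi = onset_ms + after_ms
    selected = [v for t, v in zip(times_ms, values) if lo <= t <= hi]
    if not selected:
        raise ValueError("no samples fall inside the window")
    return sum(selected) / len(selected)
```

      Setting after_ms=0 or before_ms=0 restricts the same function to the pre-onset or post-onset half of the window, which is one way to compute the separate measures the reviewer asked about.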

  2. Oct 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      We appreciate the critical review of our manuscript. We believe that we have addressed the questions and concerns raised by the reviewers to the best of our ability. As part of the revision, we conducted two new experiments to enhance the rigor of the conclusions and to provide more insights into the mechanism of STEAP proteins, and we reorganized the Results section, as suggested by the reviewers, to follow a clearer logical thread. The new data are briefly summarized below.

      1) Reduction of L230G STEAP1 by reduced FAD. We made the Leu230Gly STEAP1 mutant and measured the rate of heme reduction by reduced FAD. We found that the rate of heme reduction in L230G STEAP1 is slower than that in wild-type STEAP1. Since Leu230 is solvent accessible only from the intracellular side, this result supports the conclusion that reduced FAD binds to STEAP1 on the intracellular side and reduces the heme. This result also indicates that the residue at this position, a leucine in STEAP1, 2, and 3 and Phe359 in STEAP4, has a significant role in mediating electron transfer from FAD to the bound heme.

      2) Reduction of STEAP2 by reduced FAD. We showed that STEAP2 can be reduced when supplied with reduced FAD, and that the rate of heme reduction is significantly slower than that of reduction of STEAP1 by reduced FAD. This result is consistent with the presence of the oxidoreductase domain (OxRD)† in STEAP2, which hampers direct entrance of the isoalloxazine ring of FAD to its binding pocket in the transmembrane domain (TMD). On the other hand, the rate of heme reduction by reduced FAD is much faster than that of heme reduction in the presence of NADPH and FAD, indicating that reduction of FAD by NADPH is rate-limiting in the electron transfer chain in STEAP2.

      †: To be consistent with literature, we adopted the nomenclature “oxidoreductase domain (OxRD)” for the N-terminal soluble domain in STEAP proteins. We used the term “reductase domain (RED)” in the previous version of our manuscript.

      Reviewer #1 (Public Review):

      This important study reveals the structure of human STEAP2 for the first time and suggests the electron transport pathway, but some questions remain regarding the interpretation of the in vitro electron transport experiments, the lack of available redox couples, and potential alternative hypotheses that would, if addressed, strengthen the claims in the manuscript.

      Strengths

      One of the clear strengths of the manuscript that stands out is the determination of the structure of human STEAP2. The structures of some other homologs are known, but STEAP2's structure was not, and STEAP2 seems to have an unusually low activity towards certain metal chelates. The approach of producing the human STEAP2 in insect cells with the supplementation of cofactor biogenesis components nicely results in cofactor-replete protein. The structure of STEAP2 reveals a domain-swapped trimer, with the NADPH-binding domain of the neighboring protomer within electron-transport distance of the FAD-heme axis. The FAD has an interesting and somewhat unusual extended conformation and abuts a Leu residue that may regulate electron transport. Another strength of the manuscript is the demonstration that STEAP1, which does not have the internal NADPH binding domain, can interact modestly and shuttle electrons to the heme in STEAP1 through FAD. These experiments nicely expand information on the function of STEAP1 and provide a structural basis for electron transport in STEAP2.

      Weaknesses

      A major weakness in the manuscript lies with the kinetics data and how the data, as presented, are unclear to the reader regarding their impact and their connection to the purported electron transport scheme. While multiple sets of data are reported, the analysis in all cases is simply the reduction of a hexacoordinate heme and its related spectra and kinetic parameters. In most cases, it's unclear to the reader which part of the electron pathway is being tested in which experiment. Simple diagrams would be helpful in each case. However, it's also unclear if all of the potential order of addition experiments were actually performed; i.e., flavin but no NADPH; NADPH but no flavin; flavin before NADPH; flavin after NADPH, etc. As there are multiple permutations that should be tested to demonstrate the electron transport pathway, presenting the data in a way that makes it clear to the reader is challenging. Particularly missing are the determined redox potentials of the hemes in both STEAP1 and STEAP2. Could differences in these heme redox potentials be drivers of the difference in metal reduction rates?

      We re-structured the manuscript to follow a clearer logical thread. We provided explanations for which electron transfer steps are being examined in each experiment.

      We cannot reliably determine EM because of an insufficient amount of purified protein. We are inclined to think that the bound hemes in STEAP1 and STEAP2 have similar EM, based on their similar coordination geometry and nearly identical UV-Vis and MCD spectra. Thus, the different rates of Fe3+-NTA reduction by STEAP1 and STEAP2 are likely due to differences in the substrate-binding site rather than different EM.

      Also, the text indicates that STEAP2 does not show a reduction rate dependence on the [Fe3+-NTA], but Figure 1A shows a difference in rates dependent on [Fe3+-NTA], and the shape of the reduction curve also changes based on [Fe3+-NTA]. This discrepancy should be rectified.

      We fixed this error. The reduction of Fe3+-NTA by ferrous STEAP2 shows multiple phases and the reaction rates within the initial 2 seconds are weakly dependent on [Fe3+-NTA].

      A second major weakness is the lack of any verification of the relevance of the STEAP2 oligomerization to its in vivo function. Is the same domain-swapped trimer known to exist in vivo? If the protein were prepared in other detergents, is the oligomerization preserved? It is alluded to in the text that another STEAP protein is also a trimer. Was this oligomerization verified in vivo?

      The domain-swapped assembly is an interesting phenomenon, and it seems to provide a solution for bringing the FAD closer to heme. The same domain-swapped trimeric assembly is also observed in the structure of STEAP4, which was purified in a different detergent (Nat Commun (2018), 9, page 4337). It is likely that this feature is shared by STEAP2, 3, and 4, and preserved during the purification process.

      Could this oligomerization be disrupted to impede or abrogate electron transport to underscore the oligomerization relevance? This point is germane, as it would further suggest that the domain-swapped trimer observed in the STEAP2 cryo-EM structure is physiologically relevant, especially given how far the distance between the NADPH and the FAD would otherwise be to support electron transport.

      We agree with the reviewer’s reasoning that the oligomeric assembly is required for proper function of STEAPs and thus could potentially be utilized for functional regulation. However, we are not aware of any physiologically relevant stimuli or treatment that would allow regulation of STEAP functions by inducing or forming different oligomeric states. Our experience with STEAP proteins is that the trimeric assembly is stable and well-preserved during the purification process and can only be disrupted under denaturing conditions such as SDS-PAGE.

      Beyond these two areas in which the manuscript could be improved there are a couple of minor considerations. First, the modest resolution of the STEAP2 structure prevents assigning the states of NADP+/NADPH and FAD/FADH2 with confidence. An orthogonal measure would be useful for modeling the accurate states in the structure.

      We agree. We clarified the ambiguity and stated in the main text that the cryo-EM structure of STEAP2 was determined in the presence of NADP+ and FAD.

      Finally, the BLI b5R/STEAP1 binding/unbinding fits seem somewhat poor, especially at high concentrations of b5R in the dissociation regime, which likely influences the derived value of Kd. A different fitting equilibrium might yield better agreement between the experimental and theoretical results. Moreover, whether this binding strength is influenced by the reduction state of the NADPH would be helpful in understanding and contextualizing the weak binding affinity.

      We think that non-specific binding likely causes deviations from the simple binding model at higher b5R concentrations. We made a comment on this in the main text. We agree with the reviewer that the interactions between b5R and STEAP1 could be redox dependent, for example, a reduced FAD on b5R may enhance the affinity. We could implement this by performing the binding experiments in an anaerobic chamber, but this is beyond the scope of the current study.

      Reviewer #2 (Public Review):

      The manuscript provides new insight into a family of human enzymes. It demonstrates that STEAP2 can reduce iron and STEAP1 can be promiscuous regarding the source of electron donors that it can use. The quality of the kinetics experiment and the structural analysis is excellent. I am less enthusiastic about the interpretation of data and the take-home message that the manuscript intends to deliver. Above all, the work combines data on STEAP2 and STEAP1 and it remains unclear which questions are ultimately addressed. A second critical point is about the interpretation of the experiment demonstrating that STEAP1 can be reduced by cytochrome b5 reductase. The abstract states that "We show that STEAP1 can form an electron transfer chain with cytochrome b5 reductase" whereas the main text discusses the data by suggesting that "we speculate that FAD on b5R may partially dissociate to straddle between the two proteins". The two statements seem to be partly contradictory. Cytochrome b5 reductases do not easily release FAD but rather directly donate electrons to heme-protein acceptors (PMID: 36441026). According to the methods section, no FAD was added to the reaction mix used for the cytochrome b5 reductase experiment. Overall, the data seem to indicate that the reductase might reduce the heme of STEAP1 directly. Would it be possible to check whether FAD addition affects the kinetics of the process?

      We agree with the reviewer on this point. We do not have evidence indicating that FAD fully or partially dissociates from b5R to interact with STEAP1. We removed the statement in the revision.

      We have not tried to add free reduced FAD to the mixture of STEAP1/b5R/NADH, because we feel that this would increase the complexity of the system and complicate data interpretation. We are working on determining the structure of b5R in complex with STEAP1 to visualize the electron transfer pathway, and we hope that such a structure would provide a framework for understanding electron transfer between the two proteins.

      And to perform a control experiment to check that NAD(P)H does not directly reduce the heme of STEAP1 (though unlikely)?

      We did the control experiment and included the data in Fig. S3A. NADH alone did not reduce the heme.

      A final point concerns the "slow Fe3+-NTA reduction by STEAP2". The reaction is not that slow as the initial phase is 2 per second. The reaction does not show dependence on the substrate concentration suggesting "poor substrate binding". I am not convinced by this interpretation. Poor substrate binding would give rise to substrate dependency as saturation would not be achieved. A possible interpretation could be that substrate binding is instead tight and the enzyme is saturated by the substrate. Can it be that the reaction is limited by non-productive substrate-binding and/or by interconversions between active and non-active conformations?

      We agree with the reviewer on this point. We re-analyzed the data and revised the Results and Discussion. We found that the reaction rates within the first 2 seconds are weakly dependent on [Fe3+-NTA], while the rates beyond 2 seconds do not show dependence on [Fe3+-NTA]. More studies are required to unravel the mechanism that leads to the complicated kinetic data.

      Reviewer #3 (Public Review):

      The six-transmembrane epithelial antigen of the prostate (STEAP) family comprises four members in metazoans. STEAP1 was identified as an integral membrane protein highly upregulated on the plasma membrane of prostate cancer cells (PMID: 10588738), and it later became evident that other STEAP proteins are also overexpressed in cancers, making STEAPs potential therapeutic targets (PMID: 22804687). Functionally, STEAP2-4 are ferric and cupric reductases that are important for maintaining cellular metal uptake (PMIDs: 16227996, 16609065). The cellular function of STEAP1 remains unknown, as it cannot function as an independent metalloreductase. In recent years, structural and functional data have revealed that STEAPs form trimeric assemblies and that they transport electrons from intracellular NADPH, through membrane bound FAD and heme cofactors, to extracellular metal ions (PMIDs: 23733181, 26205815, 30337524). In addition, numerous studies (including a previous study from the senior authors) have provided strong implications for a potential metalloreductase function of STEAP1 (PMIDs: 27792302, 32409586).

      This new study by Chen et al. aims to further characterize the previously established electron transport chain in STEAPs in high molecular detail through a variety of assays. This is a well-performed, highly specialized study that provides some useful extra insights into the established mechanism of electron transport in STEAP proteins. The authors first perform a detailed spectroscopic analysis of Fe3+-NTA reduction by STEAP2 and STEAP1, confirming that both purified proteins are capable of reducing metal ions. A cryo-EM structure of STEAP2 is also presented. It is then established that STEAP1 can receive electrons from cytochrome b5 reductase, and the authors provide experimental evidence that the flavin in STEAP proteins becomes diffusible.

      The specific aims of the study are clear, but it is not always obvious why certain experiments are performed only on STEAP2, on STEAP1, or on both isoforms. A better justification of the performed experiments through connecting paragraphs and proper referencing of the literature would improve readability of the manuscript. Experimentally, the conclusions are appropriate and mostly consistent with the experimental data, although one important finding can benefit from further clarification. Namely, the observation that STEAP1 can form an electron transfer chain with cytochrome b5 reductase in vitro is an exciting finding, but its physiological relevance remains to be validated. The metalloreductase activity of STEAP1 in vitro has been described previously by the authors and by others (PMIDs: 27792302, 32409586). However, when overexpressed in HEK cells, STEAP1 by itself does not display metal ion reductase activity (PMID: 16609065), and it was also found that STEAP1 overexpression does not impact iron uptake and reduction in Ewing's sarcoma (cancer) cells (PMID: 22080479). Therefore, the physiological relevance of metal ion reduction by STEAP1 remains controversial. The current work establishes an electron transfer chain between STEAP1 and cytochrome b5 reductase in vitro with purified proteins. However, the confirmation of this metalloreductase activity of the STEAP1-cytochrome b5 complex will be required in a cell line to prove that the two proteins indeed form a physiologically relevant complex and that the results are not just an in vitro artefact from using high concentrations of purified proteins.

      The work will be interesting for scientists working within the STEAP field. However, some of the presented data are redundant with previous findings, moderating the study's impact. For instance, the new structural insights into STEAP2 are limited because the structure is virtually identical to the published structures of STEAP4 and STEAP1 (PMIDs: 30337524, 32409586), which is not surprising because of the high sequence similarity between the STEAP isoforms. Moreover, the authors provide experimental evidence to prove the previous hypothesis (PMID: 30337524) that the flavin in STEAP proteins becomes diffusible, but the molecular arrangement of a STEAP protein, in which the flavin interacts with NADPH, remains unknown. Based on the manuscript title, I would also expect the in-depth characterization of STEAP1/STEAP2 heterotrimers (first identified by the authors), but this is only briefly mentioned. When taken together, this study by Chen et al. strengthens and supports previously published biochemical and structural data on STEAP proteins, without revealing many prominent conceptual advances.

      We thank the reviewer for the information and the broader context. We have revised the manuscript to have a clearer logical thread.

      Reviewer #1 (Recommendations For The Authors):

      Please see the "Public Review" for recommendations.

      Reviewer #2 (Recommendations For The Authors):

      Specific suggestions

      -The introduction should more clearly state which questions are being addressed and why STEAP1 and STEAP2 are investigated.

      We have revised the Introduction to make that clearer.

      -The manuscript should discuss more extensively and provide possible explanations for the substrate-independent kinetics of iron-reduction by STEAP2.

      We re-analyzed the data and found that the rate constants of the reactions before 2 s are weakly [Fe3+-NTA]-dependent. We ascribe the weak [Fe3+-NTA]-dependence to substrate binding being partially rate-limiting. However, we do not have a good interpretation for the reaction kinetics after 2 s, which do not show [Fe3+-NTA]-dependence.

      -"The rate of STEAP1(Fe(II)) oxidation by Fe3+-NTA is similar to those by Fe3+-EDTA or Fe3+-citrate, but the rates are significantly faster than STEAP2(Fe(II)) re-oxidation by Fe3+-NTA (Fig. 1B)." The rates for STEAP1 should be given to substantiate this statement.

      We added Table S1 in the supplementary materials that includes the 2nd order association (kon) and the 1st order dissociation rate constants (koff) of iron substrates in STEAP1 and STEAP2. Data on Fe3+-EDTA or Fe3+-citrate by STEAP1 are from our previous study (Biochemistry, 2016). We also calculated the KDs of different iron substrates for STEAP1 and STEAP2.
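
      For readers less familiar with how a dissociation constant follows from the two rate constants, the relationship can be sketched as below. This assumes a simple one-step reversible binding scheme, which may not match the kinetic model actually used for Table S1; the function names and the numerical values in the usage lines are hypothetical and are not taken from the manuscript.

```python
def dissociation_constant(k_on, k_off):
    """Return KD = koff / kon for one-step reversible binding.

    k_on  : second-order association rate constant (e.g. M^-1 s^-1)
    k_off : first-order dissociation rate constant (e.g. s^-1)
    The result has units of concentration (e.g. M).
    """
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def observed_rate(k_on, k_off, substrate_conc):
    """Pseudo-first-order observed rate, k_obs = kon*[S] + koff.

    Valid only under the assumed one-step scheme with substrate in
    large excess over the protein.
    """
    return k_on * substrate_conc + k_off


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    kd = dissociation_constant(k_on=2.0e4, k_off=1.0)
    print(f"KD = {kd:.1e} M")
    print(f"k_obs at 100 uM substrate = {observed_rate(2.0e4, 1.0, 1.0e-4):.1f} s^-1")
```

      Under this scheme, a plot of the observed rate against substrate concentration is a straight line whose slope is kon and whose intercept is koff, so a weak concentration dependence corresponds to a shallow slope.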

      -"Our results indicate that STEAP2 can supply reduced FAD to initiate electron transfer in STEAP1." As discussed above, this statement should be discussed and analyzed.

      We mixed 0.9 μM STEAP1, 1.1 μM STEAP2, and 2.2 μM FAD and added 60 μM NADPH to the system and found that the hemes on both STEAP1 and STEAP2 were reduced. Since adding NADPH to STEAP1 plus FAD alone does not reduce the heme (Fig. S3B), we reasoned that reduction of the heme on STEAP1 is achieved by the reduced FAD produced on STEAP2. The reduced FAD likely dissociates from STEAP2 and then binds to STEAP1.

      -Experiments on "STEAP1 reduction by STEAP2". The experiments show that "STEAP2 can supply reduced FAD to initiate electron transfer in STEAP1." Could these results be explained by heterotrimer formation in agreement with the previous data published by the authors?

      In our experience, STEAP1 and STEAP2 homotrimers are stable and do not form heterotrimers when mixed. STEAP1/2 heterotrimers form only when the two proteins are co-expressed in cells (Biochemistry (2016) 55, 6673-6684).

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      1) As a very general point: the order in which the results are presented could be greatly improved to increase the readability for non-experts. To elaborate: The manuscript starts with the spectroscopic characterization of STEAP2, then suddenly the reductase activities of STEAP1 and STEAP2 are compared; subsequently, experiments are described involving STEAP1 and cytochrome b5 reductase; then the cryo-EM structure of STEAP2 is presented, etc. As a non-expert reader, this presentation of the results is confusing, especially because the paragraphs are not always connected well, and there is a lot of switching between STEAP1 and STEAP2 data. A more logical order would be to first present the STEAP2 spectroscopy and structural data, then write a connecting paragraph on why it is important to also study the electron transfer chain in STEAP1, followed by the comparison of the STEAP isoforms and the data on STEAP1 alone. The authors should include sentences on why they performed each experiment. For example, why did they determine the structure of STEAP2? What were they after that they could not retrieve from the homologous STEAP4 and STEAP1 structures? Justification of the performed experiments will make it easier for the reader, and will establish a better connection between the paragraphs.

      We reorganized the data presentation in Results per the reviewer’s suggestions.

      2) The physiological relevance of metal ion reduction by STEAP1 remains controversial. Because the current work establishes an electron transfer chain between STEAP1 and cytochrome b5 reductase, could the authors perform an easy experiment where they over express both STEAP1 and cytochrome b5 reductase in a cell line? If such an experiment would reveal STEAP1-dependent metal-ion reduction, it would greatly improve this part of the manuscript. If no activity is observed, the established electron transfer chain could just represent an in vitro artifact from using high concentrations of purified proteins.

      This is an excellent point. We are not set up to perform the proposed experiment but will do so in the future.

      3) The manuscript states that metal ion reduction of purified STEAP2 is slow, and the authors explain this by the absence of density for the extracellular region between helices 3 and 4 that are present in the structures of STEAP4 and STEAP1, resulting in a less-well defined substrate-binding site. Can the authors exclude that the less-well defined substrate-binding site is a result of the detergent extraction of STEAP2? Would it be possible to measure the reductase activity of STEAP2 in purified membranes?

      Detergent mostly interacts with the transmembrane domains and since the TMD region of STEAP2 aligns well with those of STEAP1 and STEAP4, we do not think that the disordered substrate binding region in STEAP2 is a consequence of detergent solubilization. It is difficult to conduct pre-steady state kinetic experiments using STEAP2 in membrane fractions.

      4) The manuscript would greatly benefit from citing the literature more comprehensively to acknowledge insightful findings from authors in the field; for example, the important work by the Lawrence lab from 2015 (PMID: 26205815), which biochemically proved that STEAPs bind a single heme and that FAD bridges the TMD and RED, is not cited. The authors also mention that STEAP proteins belong to the same family as NOX proteins and cite some NOX structure papers. However, they fail to cite the first NOX structure paper (PMID: 28607049), as well as the manuscript that structurally compares NOXs and STEAPs (PMID: 32815713). Similarly, the authors use SerialEM for their cryo-EM data collection but cite an old paper instead of the more recent (and relevant) SerialEM publication (PMID: 31086343).

      We agree and added the references.

      5) Generally, the data presented in the manuscript appear of good technical quality. However, a 'Table 1' with all relevant cryo-EM data collection and refinement statistics is completely missing as far as I can see. The authors should definitely add this to allow for the judgement of structural data quality. Without it, the manuscript is not suitable for publication.

      We added Table S2 that includes relevant cryo-EM statistics.

      Minor points:

      6) The authors write in the abstract: 'STEAP2 - 4, but not STEAP1, have an intracellular domain that binds to NADPH and FAD'. This is not correct, because it has clearly been established that FAD also majorly binds to the transmembrane domain (this is even shown by the authors in the current manuscript as well).

      Agree, we corrected that in the revision.

      7) Sentences from the abstract and introduction state: 'It is also unclear whether STEAP1 has metal ion reductase activity' and 'it is unclear whether STEAP1 can form a competent electron transfer chain from NADPH'. The authors should definitely add "physiologically relevant" to these sentences. Namely, the senior authors themselves concluded in their 2016 Biochemistry paper (PMID: 27792302) that STEAP1 is capable of reducing metal ion complexes. Further indications that the transmembrane domain of STEAP1 displays metalloreductase activity were published by the Gros lab (PMID: 32409586), and it was also shown that fusing the RED of STEAP4 to the TMD of STEAP1 yields a functional protein in cells that reduces metal ions.

      Good point and we revised the text and included the references.

      8) Why is scheme 1 not just a summarizing figure?

      We could change Scheme 1 to a Figure if required by the copy editor.

      9) What is the purpose of Fig. 6? A larger depiction of Fig. 5e would likely be more relevant and should be considered as a replacement. Alternatively, the structure of STEAP1 (pdb 6y9b) could be shown in combination with Fig. 7, as the mutation is performed in STEAP1.

      We agree and made changes to the structural figures to enhance clarity.

      10) The manuscript now contains many, single panel figures. Certain main figures could easily be combined, for example, Fig. 1 and 2 and/or Fig. 3 and 4.

      We agree and made changes to reduce single panel figures.

      11) In Fig. 2, 3 and 4, the spectra show changes in peak heights as a result of the ferric to ferrous heme transition. However, a time component is missing in the legend. How long do these transitions take?

      We added the reaction times to the figure legends.

      12) The last part of the discussion states: 'The assembly of an intracellular RED with a membrane-embedded TMD observed in NOX, DUOX, and STEAPs naturally led to the notion that NADPH, FAD, and heme form an uninterrupted rigid electron-transfer chain that shuttles electron from the intracellular cellular NADPH to the extracellular substrates. While this may be true for NOX and DUOX, in which rapid supply of electrons to their extracellular substrates are essential to their biological functions, it may not apply similarly to STEAPs since it has only one heme in the TMD, and their electron transfer relies on shuttling of FAD.' The authors should mention here that the activity of NOX and DUOX is tightly regulated by accessory proteins, Ca2+ etc. Similarly, do the authors expect that the large distance between NADPH and FAD in the structures could represent a way to regulate/dampen the metal ion reduction rates of STEAPs in vivo?

      We agree. We mentioned the regulation of NOX and DUOX in Discussion. We remain puzzled by the large distance between NADPH and FAD in STEAPs and are in pursuit of a structure in which the two cofactors are “in touch” for electron transfer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      The conclusions of this paper are mostly well supported by data, but some aspects need to be corrected.

      1) Line 99. The title is not suitable for summarizing this part of the results. In this paragraph, the results mainly describe SRSF1 expression pattern and binding of spermatogonia-associated gene's transcripts in testes. There is no functional assay to conclude SRSF1 has an essential role in mouse testes. The data only indicate that SRSF1 may have a vital role in posttranscriptional regulation in the testes.

      Thank you for the professional suggestions. Following this advice, we have corrected the text in this revised version (Page 4, Line 98 and 112).

      2) Line 141. In the mating scheme, Vasa-Cre Srsf1Fl/del mice should be obtained instead of Vasa-Cre Srsf1Fl/Fl mice.

      Thank you for the professional suggestions. Following this advice, we have corrected the text in this revised version (Page 4, Line 118).

      3) Fig 2 C, "PZLF" should be corrected to "PLZF".

      Thank you very much for the helpful comments. We have corrected this in Figure 2C.

      4) Fig 5 B, "VASA" and "Merge" should be interchanged.

      Thank you very much for the helpful comments. We have interchanged "VASA" and "Merge" in Figure 5B.

      5) Fig 5 D, "Ctrl" should be added in the up panel.

      Thank you very much for the helpful suggestions. We have added "Ctrl" in Figure 5C.

      6) The legend for Figure 6 D should be revised.

      Thank you very much for the helpful suggestions. We have revised the legend for Figure 7D.

      7) The legend for Figure 7 G should be revised.

      Thank you very much for the helpful suggestions. We have revised the legend for Figure 8D.

      8) Immunoprecipitation mass spectrometry (IP-MS) data showed that SRSF1 interacts with other RNA splicing-related proteins (e.g., SRSF10, SART1, RBM15, SRRM2, SF3B6, and SF3A2). The authors should verify the interactions in testis or cells.

      We thank the reviewer for the professional comments and suggestions. Following this advice, we performed co-transfection and co-IP to verify the protein-protein interactions in 293T cells. The results showed that the RRM1 domain of SRSF1 interacted with SART1, RBM15, and SRSF10 in 293T cells. In addition, fluorescence imaging showed complete co-localization of mCherry-SRSF1 with eGFP-SART1, eGFP-RBM15, and eGFP-SRSF10 in 293T cells. We have therefore incorporated these data into Figure 9G-J. They have also been described and highlighted in the text (Page 17, Lines 338-347).

      9) To avoid overstatement, the authors should pay attention to the use of adjectives and adverbs in the article, especially when drawing conclusions about the role of Tial1.

      We thank the reviewer for the professional comments and suggestions. To avoid overstatement, we have revised the entire text (Page 4, Lines 98, and 112; Page 16, Lines 308; Page 17, Lines 346-347; Page 20, Lines 413-414; Page 21, Lines 432-433).

      Reviewer #2 (Recommendations For The Authors):

      Major

      1) I find the use of "SSC homing" misleading/confusing because this "homing" or relocation of postnatal gonocytes/nascent spermatogonia to the basement membrane precedes the maturation of the nascent spermatogonia into SSCs. In addition, "SSC homing" is commonly used in the SSC transplantation field to describe a transplanted SSC's ability to find and colonize its niche within the seminiferous tubules. I appreciate that "postnatal gonocytes/nascent spermatogonia homing" is not easily grasped by a broader audience. Perhaps "homing of precursor SSCs" is more appropriate.

      Thank you very much for the helpful comments and suggestions. Following this advice, we have corrected the text in this revised version (Lines 1-2, 39, 44, 49, 54-55, 68, 70, 72-73, 77, 84, 93-95, 191, 201, 240, 384-387, 397, 417-422, and 433).

      2) If I am misunderstanding the description of the Srsf1 cKO phenotype, and the authors truly believe SSCs have formed in the Srsf1 cKO testis, I strongly recommend immunostaining to show that the cKO germ cells robustly express SSC markers, not just markers of undifferentiated spermatogonia.

      We thank the reviewer for the professional suggestions. We fully agree with the reviewer. Immunohistochemical staining for FOXO1 and statistical results indicated a reduced number of prospermatogonia (Figure 6C-E). So, we have corrected the text in this revised version (Line 1-2, 39, 44, 49, 54-55, 68, 70, 72-73, 77, 84, 93-95, 191, 201, 240, 384-387, 397, 417-422, and 433).

      3) If the authors have the available resources, the significance of this report would be enhanced by additional characterization of the cKO phenotype at the transition from gonocyte to nascent spermatogonia. Do any cKO germ cells exhibit defects in maturing from gonocytes to nascent spermatogonia at the molecular level? I.e., by P5-7, do all cKO germ cells express PLZF and localize FOXO1 to cytoplasm, as expected of nascent spermatogonia? If the cKO germ cells are actually a heterogenous population of gonocytes and nascent spermatogonia, what is the distribution of each subpopulation in the lumen vs basement membrane?

      Thank you for the professional suggestions. Following this advice, immunohistochemical staining for FOXO1 was performed on 5 dpp mouse testis sections (Figure 6C). Quantification of germ cells with nuclear FOXO1 showed a reduced number of prospermatogonia in cKO mice (Figure 6D), and germ cells with nuclear FOXO1 similarly underwent abnormal homing (Figure 6E). Thus, all the above data indicate that SRSF1 has an essential role in the homing of precursor SSCs. We have incorporated the data into Figure 6C-E. They have also been described and highlighted in the text (Page 9, Lines 191-201; Page 20, Lines 389-391).

      Minor

      1) Could the authors clarify why Tial1 exon exclusion in the cKO results in reduced protein expression? Is it creating a transcript isoform that undergoes nonsense-mediated decay?

      Thank you for the professional suggestions. Following this advice, we analyzed Tial1 transcripts again, and we found that Tial1 exon exclusion resulted in reduced expression of protein isoform X2 (Figure 8J). Since this region is not in the CDS, no clear evidence of nonsense-mediated decay was found in the analysis.

      2) Could the authors confirm that the TIAL1 antibody is not detecting the portion of the protein encoded by the alternatively spliced exon?

      Thank you for the helpful comments. The TIAL1 monoclonal antibody is produced by Proteintech Group under the product number 66907-1-Ig. Immunogen is TIAL1 fusion protein Ag11981. The sequence is as follows. MDARVVKDMATGKSKGYGFVSFYNKLDAENAIVHMGGQWLGGRQIRTNWATRKPPAPKSTQENNTKQLRFEDVVNQSSPKNCTVYCGGIASGLTDQLMRQTFSPFGQIMEIRVFPEKGYSFVRFSTHESAAHAIVSVNGTTIEGHVVKCYWGKESPDMTKNFQQVDYSQWGQWSQVYGNPQQYGQYMANGWQVPPYGVYGQPWNQQGFGVDQSPSAAWMGGFGAQPPQGQAPPPVIPPPNQAGYGMASYQTQ The homology was 99% in mice and all TIAL1 isoforms were detected. So, TIAL1 antibody is detecting the portion of the protein encoded by the alternatively spliced exon.

      3) Lines 143: should "cKO" actually be "control"?

      Thank you for the helpful suggestions. There was indeed an error in the text description. We have corrected the text in this revised version (Page 6, Lines 138-139).

      4) Lines 272-3 "visual analysis using IGV showed the peak of Tial1/Tiar was stabilized in 5 dpp cKO mouse testes (Figure 7H)": "peak stabilization" is not evident to me from the figure nor do I see Tial1 listed as differentially expressed in the supplemental. I would refrain from using IGV visualization as the basis for the differential abundance of a transcript.

      Thank you very much for the helpful comments and suggestions. Tial1/Tiar is one of 39 stabilizing genes that are bound by SRSF1 and undergo abnormal AS. Following this advice, we have replaced the IGV peak tracks for Tial1/Tiar with its FPKM values (Figure 8H). We have also corrected the text in this revised version (Page 15, Lines 296-300; Page 16, Lines 303-304).

      5) Lines 468-473: please clarify the background list used for GO enrichment analyses. By default, the genes expressed in the testis are enriched for spermatogenesis-related genes. To control for this and test whether a gene list is enriched for spermatogenesis-related genes beyond what is already seen in the testis, I recommend using a list of all expressed genes (for example, defined by TPM>=1) as the background list.

      We thank the reviewer for the professional comments and suggestions. Following this advice, all expressed genes (TPM sum of all samples >=1) were used as the background list for GO enrichment analyses. The results of GO enrichment analysis of the AS genes turned out to be the same. The results of GO enrichment analysis of the SRSF1 peak-containing genes, differential genes, and IP proteins-associated genes have been corrected in the figures (Figure 2A, 7E, and 9E).
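
      The background choice described above can be illustrated with a minimal sketch. It assumes a plain dictionary of per-sample TPM values and a one-sided hypergeometric test; the function names and the toy gene labels are hypothetical, and the GO tool actually used by the authors is not stated in the response.

```python
from math import comb


def expressed_background(tpm_by_gene, min_total_tpm=1.0):
    """Return genes whose TPM summed over all samples is >= min_total_tpm."""
    return {gene for gene, values in tpm_by_gene.items()
            if sum(values) >= min_total_tpm}


def hypergeom_enrichment(background, gene_list, term_genes):
    """One-sided hypergeometric test for over-representation of a GO term.

    Genes outside the background are ignored, so the test asks whether the
    term is enriched beyond what is already expected among expressed genes.
    Returns (overlap, p_value).
    """
    bg = set(background)
    study = set(gene_list) & bg      # query genes that are expressed
    term = set(term_genes) & bg      # term members that are expressed
    n_bg, n_term, n_study = len(bg), len(term), len(study)
    overlap = len(study & term)
    total = comb(n_bg, n_study)
    # P(X >= overlap) when drawing n_study genes from the background
    tail = sum(comb(n_term, i) * comb(n_bg - n_term, n_study - i)
               for i in range(overlap, min(n_term, n_study) + 1))
    return overlap, tail / total


# Toy example with hypothetical genes: B and E fall below the threshold.
tpm = {"A": [0.6, 0.6], "B": [0.1, 0.2], "C": [2.0, 0.0],
       "D": [0.0, 1.0], "E": [0.0, 0.0]}
background = expressed_background(tpm)
overlap, p_value = hypergeom_enrichment(background, ["A", "B"], ["A", "E"])
```

      Restricting both the query list and the term membership to the background keeps the comparison within expressed genes, which is the control the reviewer asked for.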

      6) Figure 2B: Could the authors mark where the statistically significant peaks appear on the tracks? There are many small peaks and it's unclear if they are significant or not.

      Thank you for the helpful suggestions. Following this advice, we have marked the areas of higher peaks in the figure (Figure 2B). We generally believe that any region above the peaks of IgG is likely to be a binding region, and of course, the higher the peak value, the more pre-mRNA is bound by SRSF1 in that region.

      7) Figure 7A: I assume the SRSF1 CLIP-seq genes are all the genes from the adult testis experiments. I would suggest limiting the CLIP-seq gene set to only those expressed in the P5 RNA-seq data, as if the target is not expressed at P5, there's no way it will be differentially expressed or differentially spliced in at P5.

      Thank you very much for the helpful comments and suggestions. Following this advice, we found that 3543 of the 4824 genes bound by SRSF1 were expressed in testes at 5 dpp. We have corrected this in the figure (Figure 8A). These changes have also been described and highlighted in the text (Page 14, Lines 274-277).

      8) Figure 7F: Could the authors clarify where the alternatively spliced exon is relative to the total transcript, shown in 7H?

      Thank you for the helpful suggestions. Following this advice, we have labeled the number of the exon where alternative splicing occurs (Figure 8F).

      9) Please include where the sequencing and mass spec data will be publicly available.

      Thank you very much for the helpful comments and suggestions. Following this advice, these have been incorporated into the text, given descriptions, and highlighted (Page 25, Lines 560-565).

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improving the data and analysis

      1) The claim that TIAL1 mediates SRSF1 effects is not well supported; this claim should be adjusted or additional supporting data should be provided. To support a claim that alternative splicing of Tial1 mediates the effects of SRSF1, at least two additional pieces of data are needed: first, a demonstration that the two alternative protein isoforms have different molecular functions, either in vitro or in vivo; and second, a better quantitation of the levels and ratios of expression of the two different isoforms in vivo.

      Thank you for the helpful comments and suggestions. Following this advice, we quantified the expression levels and ratios of the two different isoforms in vivo, and we found that Tial1 exon exclusion resulted in reduced expression of protein isoform X2 (Figure 8J). However, we are not able to prove that the two alternative protein isoforms have different molecular functions, so this claim has been adjusted in the text. These changes have been described and highlighted in the text (Lines 1-2, 43-45, 95, 306, 323-325, 408, 413-414).

      2) Likewise, the claim that "SRSF1 is required for "homing and self-renewal" of SSCs should be adjusted or better supported. As of now, the data supports a claim that SRSF1 is required for the establishment of the SSC population in the testis after birth. This could be due to defects in homing, self-renewal, or survival. To support claims about homing and self-renewal, these phenotypes should be tested more directly, for example by quantitating numbers of spermatogonia at the basal membrane in juvenile testes (homing) and expression of SSC markers in addition to the pan-germ cell marker VASA across early postnatal time points.

      Thank you very much for the helpful comments and suggestions. Immunohistochemical staining for FOXO1 was performed on 5 dpp mouse testis sections (Figure 6C). Quantification of germ cells with nuclear FOXO1 showed a reduced number of prospermatogonia in cKO mice (Figure 6D), and germ cells with nuclear FOXO1 similarly underwent abnormal homing (Figure 6E). Thus, all the above data indicate that SRSF1 has an essential role in the homing of precursor SSCs. We have incorporated the data into Figure 6C-E. They have also been described and highlighted in the text (Page 9, Lines 191-201; Page 20, Lines 387-389). Meanwhile, the phrase "homing and self-renewal" of SSCs has been corrected in the text of this revised version (Lines 1-2, 39, 44, 49, 54-55, 68, 70, 72-73, 77, 84, 93-95, 191, 201, 240, 384-387, 397, 417-422, and 433).

      3) Additional, more detailed analyses of CLIP-seq and RNA-seq data at least showing that the libraries are of good quality should be provided.

      Thank you very much for the suggestions. Following this advice, detailed analyses of the RNA-seq data have been incorporated into the figures (Figure S2). Detailed analyses of the CLIP-seq data have already been presented in another paper (Sun et al., 2023), so we have not provided them here in order to avoid reusing a published figure. Instead, we cite that paper in the article (Page 4, Line 105; Page 25, Lines 564-565).

      4) Gene Ontology analyses should be redone with a more appropriate background gene set.

      Thank you for the helpful suggestions. All expressed genes (TPM sum of all samples >=1) were used as the background list for GO enrichment analyses. The results of GO enrichment analysis of the AS genes turned out to be the same. The results of GO enrichment analysis of the SRSF1 peak-containing genes, differential genes, and IP proteins-associated genes have been corrected in the figures (Figure 2A, 7E, and 9E).

      Minor points about the text and figures

      5) The species (mouse) should be stated earlier in the Introduction.

      Thank you for the professional suggestions. Following this advice, the mouse has been stated earlier in the Introduction (Page 3, Line 65).

      6) In Fig. 1C (Western blot), the results would be more convincing if quantitation of band intensities normalized to the loading control was added.

      Thank you very much for the comments and suggestions. Following this advice, we quantified band intensities with ACTB as the loading control. The value in 16.5 dpc testes was set to 1.0, and the relative values for testes at other developmental stages are indicated. We have incorporated these data into the figure (Figure 1C).
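
      The normalization described above amounts to a ratio of ratios. The sketch below is only an illustration: the function name and the intensity values are hypothetical, and the densitometry software used by the authors is not stated.

```python
def relative_band_intensity(target, loading, reference):
    """Normalize target band intensities to a loading control, then express
    every sample relative to a reference sample (reference becomes 1.0)."""
    ratios = {sample: target[sample] / loading[sample] for sample in target}
    ref_ratio = ratios[reference]
    return {sample: ratio / ref_ratio for sample, ratio in ratios.items()}


# Hypothetical densitometry readings (arbitrary units), not data from the paper.
srsf1 = {"16.5 dpc": 100.0, "5 dpp": 300.0}
actb = {"16.5 dpc": 200.0, "5 dpp": 300.0}
relative = relative_band_intensity(srsf1, actb, reference="16.5 dpc")
```

      Dividing by the loading control first corrects for unequal loading between lanes; dividing by the reference sample afterwards only rescales the values so that 16.5 dpc reads 1.0.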

      7) In Fig 5D, TUNEL signal in the single-channel image is difficult to see; please adjust the contrast.

      Thank you for the professional suggestions. Following this advice, the images of the channels have been replaced by enlarged images for better visibility (Figure 5C).

      Major comments

      1) In Fig 1D, it appears that SRSF1 is expressed most strongly in spermatogonia by immunofluorescence, but this is inconsistent with the sharp rise in expression detected by RT-qPCR at 20 days post partum (dpp) (Fig. 1B), which is when round spermatids are first added; this discrepancy should be explained or addressed.

      We appreciate the important comments from the reviewer. In another of our studies, we showed that SRSF1 expression is higher in pachytene spermatocytes and round spermatids (Sun et al., 2023). The sharp rise in expression detected by RT-qPCR at 20 days post partum (dpp) is therefore expected.

      Author response image 1.

      Dynamic localization of SRSF1 in male mouse germ cells. (Sun et al., 2023)

      2) It is important to provide a more comprehensive basic description of the CLIP-seq datasets beyond what is shown in the tracks shown in Fig. 2B. This would allow a better assessment of the data quality and would also provide information about the transcriptome-wide patterns of SRSF1 binding. No information or quality metrics are provided about the libraries, and it is not stated how replicates are handled to maximize the robustness of the analysis. The distribution of peaks across exons, introns, and other genomic elements should also be shown.

      Thank you very much for the helpful comments and suggestions. In fact, detailed analyses of CLIP-seq have already been presented in another paper (Sun et al., 2023), and we have not provided it in order to avoid multiple uses of one figure. Meanwhile, we made a citation in the article (Page 4, Lines 105; Page 25, Lines 564-565). In addition, the distribution of peaks in exons, introns, and other genomic elements is shown in Figure 2B.

      3) The claim that SRSF1 is required for "homing and self-renewal" of SSCs is made in multiple places in the manuscript. However, neither homing nor self-renewal is ever directly tested. A single image is shown in Fig. 5E of a spermatogonium at 5dpp that does not appropriately sit on the basal membrane, potentially indicating a homing defect, but this is not quantified or followed up. There is good evidence for depletion of spermatogonia starting at 7 dpp, but no further explanation of how homing and/or self-renewal fit into the phenotype.

      Thank you very much for the helpful comments and suggestions. Following this advice, immunohistochemical staining for FOXO1 was performed on 5 dpp mouse testis sections (Figure 6C). Quantification of germ cells with nuclear FOXO1 showed a reduced number of prospermatogonia in cKO mice (Figure 6D), and germ cells with nuclear FOXO1 similarly underwent abnormal homing (Figure 6E). Thus, all the above data indicate that SRSF1 has an essential role in the homing of precursor SSCs. We have incorporated the data into Figure 6C-E. They have also been described and highlighted in the text (Page 9, Lines 191-201; Page 20, Lines 387-389). Meanwhile, the phrase "homing and self-renewal" of SSCs has been corrected in the text of this revised version (Lines 1-2, 39, 44, 49, 54-55, 68, 70, 72-73, 77, 84, 93-95, 191, 201, 240, 384-387, 397, 417-422, and 433).

      4) In Fig. 6A (lines 258-260) very few genes downregulated in the cKO are bound by SRSF1 and undergo abnormal splicing. The small handful that falls into this overlap could simply be noise. A much larger fraction of differentially spliced genes are CLIP-seq targets (~33%), which is potentially interesting, but this set of genes is not explored.

      Thank you for the helpful comments. Following this advice, this was specifically indicated by the fact that 39 stabilizing genes were bound by SRSF1 and underwent abnormal AS. In our study, Tial1/Tiar is one of 39 stabilizing genes that are bound by SRSF1 and undergo abnormal AS. Therefore, we fully agree with the reviewers' comments. These have been added in this revised version (Page 14, Lines 279-280; Page 15, Lines 296-300).

      5) The background gene set for Gene Ontology analyses is not specified. If these were done with the whole transcriptome as background, one would expect enrichment of spermatogenesis genes simply because they are expressed in testes. The more appropriate set of genes to use as background in these analyses is the total set of genes that are expressed in testis.

      We thank the reviewer for the professional comments and suggestions. All expressed genes (TPM sum of all samples >=1) were used as the background list for GO enrichment analyses. The results of GO enrichment analysis of the AS genes turned out to be the same. The results of GO enrichment analysis of the SRSF1 peak-containing genes, differential genes, and IP proteins-associated genes have been corrected in the figures (Figure 2A, 7E, and 9E).

      6) In general, the model is over-claimed: aside from interactions by IP-MS, little is demonstrated in this study about how SRSF1 affects alternative splicing in spermatogenesis, or how alternative splicing of TIAL1 specifically would result in the phenotype shown. It is not clear why Tial1/Tiar is selected as a candidate mediator of SRSF1 function from among the nine genes that are downregulated in the cKO, are bound by SRSF1, and undergo abnormal splicing. Although TIAL1 levels are reduced in cKO testes by Western blot (Fig. 7J), this could just be due to a depletion of germ cells from whole testis. The reported splicing difference for Tial1 seems very subtle and the ratio of isoforms does not look different in the Western blot image.

      Thank you very much for the helpful comments and suggestions. In our study, Tial1/Tiar is one of 39 stabilizing genes that are bound by SRSF1 and undergo abnormal AS. However, Western blotting showed that expression levels of TIAL1/TIAR isoform X2 were significantly suppressed (Figure 8J). So, the data indicate that SRSF1 is required for TIAL1/TIAR expression and splicing.

      Sun, L., Chen, J., Ye, R., Lv, Z., Chen, X., Xie, X., Li, Y., Wang, C., Lv, P., Yan, L., et al. (2023). SRSF1 is crucial for male meiosis through alternative splicing during homologous pairing and synapsis in mice. Sci Bull 68, 1100-1104. 10.1016/j.scib.2023.04.030.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript represents an elegant bioinformatics approach to addressing causal pathways in vascular and liver tissue related to atherosclerosis/coronary artery disease, including those shared by humans and mice and those that are specific to only one of these species. The authors constructed co-expression networks using bulk transcriptome data from human (aorta, coronary) and mouse (aorta) vascular and liver tissue. They mapped human CAD GWAS data onto these modules, mapped GWAS SNPs to putatively causal genes, identified pathways and modules enriched in CAD GWAS hits, assessed those shared between vascular and liver tissues and between humans and mice, determined key driver genes in CAD-associated supersets, and used mouse single-cell transcriptome data to infer the roles of specific vascular and liver cell types. The overall approach used by the authors is rigorous and provides new insights into potentially causal pathways in vascular tissue and liver involved in atherosclerosis/CAD that are shared between humans and mice as well as those that are species-specific. This approach could be applied to a variety of other common complex conditions.

      The conclusions are largely supported by the analyses. Some specific comments:

      1) It appears that GWAS SNPs were mapped to genes solely through the use of eQTLs. Current methods involve a number of other complementary approaches to map GWAS SNPs to effector genes/transcripts and there is the thought that eQTLs may not necessarily be the best way to map causal genes.

      We agree with the reviewer that multiple approaches can be used to map GWAS SNPs to genes, and eQTLs are only one way to do so. We focused on eQTLs mainly because we aimed to address tissue specificity and because eQTLs are relatively more abundant than other tissue-specific functional genomics data, such as pQTLs and epiQTLs. We now acknowledge this limitation in the discussion section of our revised manuscript and point to future studies utilizing other approaches to map GWAS signals to downstream effectors.

      2) Given the critical causal role of circulating apoB lipoproteins in atherosclerosis in both mice and humans and the central role of the liver in regulating their levels, perhaps the authors could use the 'metabolism of lipids and lipoproteins' network in Fig 3B as a kind of 'positive control' to illustrate the overlap between mice and humans and the driver genes for this network.

      We appreciate the reviewer’s excellent suggestion and now elaborate the findings in Fig 3B as a positive control in the results section.

      3) Is it possible to infer the directionality of effect of key driver genes and pathways from these analyses? For example, ACADM was found to be a KD gene for a human-specific liver CAD superset pathway involving BCAA degradation. Are the authors able to determine or predict the effect of genetically increased expression of ACADM on BCAA metabolism and on CAD? Or the directionality of the effect of the hepatic KD gene OIT3 on hepatic and plasma lipids and atherosclerosis.

      The Bayesian networks only have information on which genes likely regulate the others but not the up or down-regulation direction, and the network key driver analysis only considers the enrichment of GWAS candidate genes in the neighborhood of each key driver. Therefore, it is not possible to directly infer whether increasing or decreasing a key driver will lead to up or downregulation of the downstream pathways based on our current analysis. We could, however, examine correlations of key driver genes with downstream genes, or disease traits in relevant datasets. For instance, we checked the mouse atherosclerosis HMDP datasets for the correlations between select key drivers and clinical traits and found various key drivers shared and species-specific in aorta and liver significantly correlate with aortic lesion area and other traits of interest such as LDL levels, and inflammatory cytokines. We have added these new findings into the results section and supplemental tables.

      4) While likely beyond the scope of this manuscript, the substantial amount of publicly available plasma proteomic and metabolomic data could be incorporated into this multiomic bioinformatic analysis. Many of the pathways involve secreted proteins or metabolites that would further inform the biology and the understanding of how these pathways relate to atherosclerosis.

      We appreciate the reviewer’s valuable suggestion. Here we focused on liver and aorta gene regulatory networks to understand the tissue-specific mechanisms at the gene level. Indeed, plasma proteomic and metabolomic data could be incorporated in future studies to capture cross-tissue interactions mediated by proteins and metabolites secreted into the circulation by different tissues. We have addressed this as a future direction in the discussion section.

      The findings here will motivate the community of atherosclerosis investigators to pursue additional potential causal genes and pathways through computational and experimental approaches. It will also influence the approach around the use of the mouse model to test specific pathways and therapeutic approaches in atherosclerosis. In addition, the computational approach is robust and could (and likely will) be applied to a variety of other common complex conditions.

      Reviewer #2 (Public Review):

      Summary:

      Mouse models are widely used to determine key molecular mechanisms of atherosclerosis, the underlying pathology that leads to coronary artery disease. The authors use various systems biology approaches, namely co-expression and Bayesian Network analysis, as well as key driver analysis, to identify co-regulated genes and pathways involved in human and mouse atherosclerosis in artery and liver tissues. They identify species-specific and tissue-specific pathways enriched for the genetic association signals obtained in genome-wide association studies of human and mouse cohorts.

      Strengths:

      The manuscript is well executed with appropriate analysis methods. It also provides a compelling series of results regarding mouse and human atherosclerosis.

      Weaknesses:

      The manuscript has several weaknesses that should be acknowledged in the discussion. First, there are numerous models of mouse atherosclerosis; however, the HMDP atherosclerosis study uses only one model of mouse atherosclerosis, namely hyperlipidemic mice, due to the transgenic expression of human apolipoprotein E-Leiden (APOE-Leiden) and human cholesteryl ester transfer protein (CETP). Therefore, when drawing general conclusions about mouse pathways not being identified in humans, caution is warranted. Other models of mouse atherosclerosis may be able to capture different aspects of human atherosclerosis.

      We appreciate the reviewer’s valuable insight! Indeed, the specific HMDP atherosclerosis model may miss important mouse pathways that could have overlapped with the human pathways. We have added this important point to the limitations section under the discussion to caution the interpretation of the human-specific pathways, as they could be present in mice but are missed by the specific HMDP atherosclerosis dataset used.

      Second, the mouse aorta tissue is atherosclerotic, whereas the atherosclerosis status of the GTEX aorta tissues is not known. Therefore, it is possible that some of the human or mouse-specific gene modules/pathways may be due to the difference in the disease status of the tissues from which the gene expression is obtained.

      We agree with the reviewer that GTEx vascular tissues have unclear atherosclerosis status. However, in addition to GTEx, we also included the human STARNET dataset which contains vascular tissues from human patients with CAD. Therefore, we believe the comparability of the human and mouse vascular tissue datasets is reasonable.

      Third, it is unclear how the sex of the mice (all female in the HMDP atherosclerosis study and all male in the baseline HMDP study) and the sex of the human donors affected the results. Did the authors regress out the influence of sex on gene expression in the human data before performing the co-expression and preservation studies? If not, this should be acknowledged.

      We acknowledge that the effect of sex in the mouse and human datasets was not regressed out in our analysis. We have added this under the limitations section.

      Fourth, some of the results are unexpected, and these should be discussed. For example, the authors identify that the leukocyte transendothelial migration pathway and PDGF signaling pathway are human-specific in their vascular tissue analysis. These pathways have been extensively described in mouse studies. Why do the authors think they identified these pathways as human-specific? Similarly, the authors identified gluconeogenesis and branched-chain amino acid catabolism as human and mouse-shared modules in the vascular tissue. Is there evidence of the involvement of these pathways in atherosclerosis in vascular cells?

      We agree with the reviewer that these unexpected findings warrant further discussion. As pointed out by this reviewer, it is possible that the mouse HMDP atherosclerosis dataset cannot fully represent all mouse atherosclerosis biology and therefore missed the leukocyte migration and PDGF pathways that were identified in the human datasets. Regarding the surprising findings of pathways such as BCAA catabolism in vascular tissues, we acknowledge that future studies will need to further investigate such pathway predictions but also highlight that these pathway terms have many shared genes with more commonly known pathways such as the TCA cycle, which may indicate the involvement of energy metabolism in vascular tissues in CAD development. We have added these points to the discussion section under limitations and concluding remarks.

      Overall, acknowledging these drawbacks and adding points to the discussion will strengthen the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1) Could the authors comment on why MEGENA produces so many more co-expression modules per tissue than WGCNA?

      As described in the methods section, MEGENA uses a multi-scale clustering structure to generate network modules at different scales, with each scale representing a different compactness level of the modules. At lower compactness scales, larger modules are generated; at higher compactness scales, smaller modules are generated. Because we used all modules obtained from the different scales, the total number of modules is much larger than that from WGCNA, which generates a network at only one scale.

      2) Much of the results section involves repeating in the text lists of pathways, modules, and genes that are also listed in Figures 2 and 3. The text in this part of the results could be used more productively to focus on illustrative examples or potential new biology.

      We have revised the results section to reduce repeating long lists of pathways, modules, and genes as suggested.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the weaknesses I mentioned in the public review comments, there are a few minor issues that I outline below:

      1) The authors should introduce atherosclerosis as the underlying cause of CAD in the Introduction. In fact, I believe there are many places in the manuscript where the authors mean atherosclerosis instead of coronary artery disease, especially when presenting and discussing mouse results since the HMDP study did not examine the coronary arteries of mice. I believe the authors should make the appropriate changes throughout the manuscript.

      We have made the changes as suggested.

      2) The authors state in the introduction, "For example, mice tend to develop atherosclerotic lesions in the aorta and carotids, while humans often develop lesions in coronary arteries (Ma et al., 2012)." This is not entirely correct, so this sentence should be revised. Several models of mice show coronary artery atherosclerosis development, but most researchers study lesions in larger aortas. Further, humans develop lesions throughout the arterial tree, but perhaps what the authors meant was the most consequential plaque development is in the coronary arteries. Please rephrase.

      We have rephrased the statement as suggested.

      3) Last line of page 5 should read "...which will drive modules and pathways that are more likely..." not "derive"

      Typo corrected.

    1. Author Response

      We appreciate the editor's and reviewers' time to review our manuscript. We will work on the suggestions and have provided an initial assessment of what we can do for our revised submission.

      Reviewer #1 (Public Review):

      Summary:

      This study aimed to investigate the effects of optically stimulating the A13 region in healthy mice and a unilateral 6-OHDA mouse model of Parkinson's disease (PD). The primary objectives were to assess changes in locomotion, motor behaviors, and the neural connectome. For this, the authors examined the dopaminergic loss induced by 6-OHDA lesioning. They found a significant loss of tyrosine hydroxylase (TH+) neurons in the substantia nigra pars compacta (SNc) while the dopaminergic cells in the A13 region were largely preserved. Then, they optically stimulated the A13 region using a viral vector to deliver channelrhodopsin (CamKII promoter). In both sham and PD model mice, optogenetic stimulation of the A13 region induced pro-locomotor effects, including increased locomotion, more locomotion bouts, longer durations of locomotion, and higher movement speeds. Additionally, PD model mice exhibited increased ipsilesional turning during A13 region photoactivation. Lastly, the authors used whole-brain imaging to explore changes in the A13 region's connectome after 6-OHDA lesions. These alterations involved a complex rewiring of neural circuits, impacting both afferent and efferent projections. In summary, this study unveiled the pro-locomotor effects of A13 region photoactivation in both healthy and PD model mice. The study also indicates the preservation of A13 dopaminergic cells and the anatomical changes in neural circuitry following PD-like lesions that represent the anatomical substrate for a parallel motor pathway.

      Strengths:

      These findings hold significant relevance for the field of motor control, providing valuable insights into the organization of the motor system in mammals. Additionally, they offer potential avenues for addressing motor deficits in Parkinson's disease (PD). The study fills a crucial knowledge gap, underscoring its importance, and the results bolster its clinical relevance and overall strength.

      The authors adeptly set the stage for their research by framing the central questions in the introduction, and they provide thoughtful interpretations of the data in the discussion section. The results section, while straightforward, effectively supports the study's primary conclusion: the pro-locomotor effects of A13 region stimulation, both in normal motor control and in the 6-OHDA model of brain damage.

      We thank the reviewer for their positive comments.

      Weaknesses:

      1) Anatomical investigation. I have a major concern regarding the anatomical investigation of plastic changes in the A13 connectome (Figures 4 and 5). While the methodology employed to assess the connectome is technically advanced and powerful, the results lack mechanistic insight at the cell or circuit level into the pro-locomotor effects of A13 region stimulation in both physiological and pathological conditions. This concern is exacerbated by a textual description of results that doesn't pinpoint precise brain areas or subareas but instead references large brain portions like the cortical plate, making it challenging to discern the implications for A13 stimulation. Lastly, the study is generally well-written with a smooth and straightforward style, but the connectome section presents challenges in readability and comprehension. The presentation of results, particularly the correlation matrices and correlation strength, doesn't facilitate biological understanding. It would be beneficial to explore specific pathways responsible for driving the locomotor effects of A13 stimulation, including examining the strength of connections to well-known locomotor-associated regions like the Pedunculopontine nucleus, Cuneiformis nucleus, LPGi, and others in the diencephalon, midbrain, pons, and medulla.

      We considered two approaches initially. The first approach was to look at specific projections to the motor regions, focusing on the MLR. The second approach was to utilize a whole-brain analysis that is presented here. Given what we know about the zona incerta, especially its integrative role, we felt that a reasonable starting point was to examine the full connectome. The value of the whole-brain approach is that it provides a high-level overview of the afferents and efferents to the region. The changes in the brain that occur following Parkinson-like lesions, such as those in the nigrostriatal pathway, are known to be complex and can affect neighbouring regions such as the A13. Therefore, we wished to highlight the A13, which we considered a therapeutic target, and examine changes in connectivity that could occur following acute lesions affecting the SNc. We acknowledge that this study does not provide a causal link, but it presents the fundamental background information for subsequent hypothesis-driven, focused, region-specific analysis.

      The terms provided were from the Allen Brain Atlas terminology and were presented as abbreviations. We have looked at other ways to present it, including a greater emphasis on raw numbers and highlighting motor-related subareas. We will rewrite the connectomics section to make it more accessible, reflecting the change in the figures.

      Additionally, identifying the primary inputs to A13 associated with motor function would enhance the study's clarity and relevance.

      This is a great point and could help simplify the whole-brain results. We can present the motor-related inputs and outputs as part of a new figure in the main paper and add accompanying text in the results section. This will help highlight possible therapeutic pathways. We can also enhance our discussion of these motor-related pathways. We will retain the entire dataset and present it in a supplementary table for those who are interested.

      The study raises intriguing questions about compensatory mechanisms in Parkinson's disease and a new perspective on the preservation of dopaminergic cells in A13, despite the SNc degeneration, and the plastic changes to input/output matrices. To gain inspiration for a more straightforward reanalysis and discussion of the results, I recommend the authors refer to the paper titled "Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon" from the David Kleinfeld laboratory. This could guide the authors in investigating motor pathways across different brain regions.

      Thank you for the advice, and as pointed out, Kleinfeld’s group had a nice, focused presentation of their data. For the connectomic piece, we can certainly adopt their reporting style, which, as you point out, may highlight key motor-related regions. There are a few ideas here that we can explore further, as mentioned above.

      2) Description of locomotor performance. Figure 3 provides valuable data on the locomotor effects of A13 region photoactivation in both control and 6-OHDA mice. However, a more detailed analysis of the changes in locomotion during stimulation would enhance our understanding of the pro-locomotor effects, especially in the context of 6-OHDA lesions. For example, it would be informative to explore whether the probability of locomotion changes during stimulation in the control and 6-OHDA groups. Investigating reaction time, speed, total distance, and even kinematic aspects during stimulation could reveal how A13 is influencing locomotion, particularly after 6-OHDA lesions. The laboratory of Whelan has a deep knowledge of locomotion and the neural circuits driving it so these features may be instructive to infer insights on the neural circuits driving movement. On the same line, examining features like the frequency or power of stimulation related to walking patterns may help elucidate whether A13 is engaging with the Mesencephalic Locomotor Region (MLR) to drive the pro-locomotor effects. These insights would provide a more comprehensive understanding of the mechanisms underlying A13-mediated locomotor changes in both healthy and pathological conditions.

      Thank you for these suggestions. We will revise as suggested. We will provide additional and/or updated data in revised figures and text. We will also move Supplementary Figures S1 and S2, which present additional locomotor data, into the main text to partly address the reviewers' points.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. Using wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta, without impairing those of the A13 region and the ventral tegmental area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection. The study suggests that although the remodeling of the A13 region connectome does not promote recovery following chronic dopaminergic depletion, photostimulation of the A13 region restores locomotor functions.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients.

      Weaknesses:

      Electrical stimulation of the medial Zona Incerta, in which the A13 region is located, has been previously reported to promote locomotion (Grossman et al., 1958). Recent mouse studies have shown that, whereas optogenetic or chemogenetic stimulation of GABAergic neurons of the Zona Incerta promotes and restores locomotor functions after 6-OHDA injection (Chen et al., 2023), stimulation of glutamatergic ZI neurons worsens motor symptoms after 6-OHDA (Lie et al., 2022).

      Thank you - we will add this reference. It is useful as Grossman did stimulate the zona incerta in the cat and elicit locomotion, suggesting that stimulation of the area in normal mice has external validity. The area targeted by Chen et al. (2023) is in the lateral aspect of central/medial zona incerta, formed by dorsal and ventral zona incerta, which may account for the differing results. Our data were robust for stimulation of the medial aspect of the rostromedial zona incerta. The thigmotactic behaviour that we observed in our work that focused on CamKII neurons has not been observed with chemogenetic, optogenetic activation or with photoinhibition of GABAergic central/medial ZI (Chen et al. 2023).

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, behavioral results of this study raise questions about the neuronal population targeted in the vicinity of the A13 region. Moreover, although YFP and CHR2-YFP neurons express dopamine (TH) within the A13 region (Fig. 2), there is also a large population of transduced neurons within and outside of the A13 region that do not, thus suggesting the recruitment of other neuronal cell types that could be GABAergic or glutamatergic.

      We found that CamKII transfection of the A13 region was extremely effective in promoting locomotor activity, which was critical for our work in exploring its possible therapeutic potential. We acknowledge that specific viral approaches that target the GABAergic, glutamatergic, and dopaminergic circuits would be very useful. The range of tools to target A13 dopaminergic circuits is more limited than for the SNc, for example, because the A13 region lacks DAT, and TH-IRES-Cre approaches, while useful, are less specific than DAT-Cre mouse models. Intersectional approaches targeting multiple transmitters (glutamate & dopamine, for example) may be one solution, as we do not expect that a single transmitter-specific pathway would work as well as broad targeting of the A13 region. Recent work suggests that GABAergic neuron activation may have more general effects on behaviour rather than control of ongoing locomotor parameters. However, this is in contrast to recent work showing a positive valence effect of dopamine A13 activation on motivated food-seeking behavior, which differs from consummatory behavior observed with GABAergic modulation (Ye, Nunez, and Zhang 2023). Chemogenetic inactivation and ablation of dopaminergic A13 neurons revealed that they contribute to grip strength and prehensile movements, uncoupling food-seeking grasping behavior from motivational factors (Garau et al. 2023). Overall, this suggests differing effects of GABA compared to DA and/or glutamatergic cell types, consistent with our effects of stimulating CamKII.

      Regarding the analysis of interregional connectivity of the A13 region, there is a lack of specificity (the viral approach did not specifically target the A13 region), the number of mice is low for such correlation analyses (2 sham and 3 6-OHDA mice), and there are no statistics comparing 6-OHDA versus sham (Fig. 4) or contra- versus ipsilesional sides (Fig. 5). Moreover, the data are too processed, and the color matrices (Fig. 4) are too packed in the current format to enable proper visualization of the data. The A13 afferents/efferents analysis is based on normalized relative values; absolute values should also be presented to support the claim about their upregulation or downregulation.

      Generally, papers using tissue-clearing imaging approaches have low sample sizes due to technical complexity and challenges. The technical challenges of obtaining these data were substantial in both collection and analysis. There are multiple technical complexities arising from dual injections (A13 and MFB coordinates) and targeting the area correctly. The A13 region is difficult to target as it spans only around 300 µm in the anterior-posterior axis. While clearing the brain takes weeks, and light-sheet imaging also takes time, the time necessary to analyze the tissue using whole-brain quantification is labor intensive, especially with a lack of a standardized analysis pipeline from atlas registrations, signal segmentations, and quantifications. The field is still relatively new, requiring additional time to refine pipelines.

      Correlation matrices are often used in analyzing connectivity patterns on a brain-wide scale, as they can identify any observable patterns within a large amount of data. We used correlation matrices to display estimated correlation coefficients between the afferent and efferent proportions from one brain subregion to another across 251 brain regions in total in a pairwise manner (not for hypothesis testing). We provided descriptive statistics (mean and error bars) in Figure 5C and G. As mentioned in comments for Reviewer 1, we will also present data in a revised Figure 5 and/or a new figure that focuses specifically on motor-related pathways to provide information on possible therapeutic pathways. As suggested, absolute values will be shared in a supplemental table.
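
      To illustrate the descriptive, pairwise use of correlation matrices mentioned above, the sketch below builds a small matrix of Pearson coefficients between per-region profiles. It assumes each region is represented by a vector of projection proportions, one value per sample. The region names and numbers are hypothetical placeholders, and this is not the pipeline used in the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Return region names and the matrix of pairwise Pearson coefficients."""
    names = list(profiles)
    matrix = [[pearson_r(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical projection proportions, one value per sample (illustration only).
profiles = {
    "region_A": [0.10, 0.20, 0.30, 0.40],
    "region_B": [0.05, 0.10, 0.15, 0.20],  # rises together with region_A
    "region_C": [0.40, 0.30, 0.20, 0.10],  # falls as region_A rises
}

names, matrix = correlation_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

      With 251 regions the same construction yields a 251 x 251 matrix. With only a few samples per group the coefficients are descriptive and unsuited to hypothesis testing, in line with the response above.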

      In the absence of changes in the number of dopaminergic A13 neurons after 6-OHDA injection, results from this correlation analysis are difficult to interpret as they might reflect changes from various impaired brain regions independently of the A13 region.

      We acknowledge that models of Parkinson’s disease, particularly those using 6-OHDA, induce plasticity in various regions, which may subsequently affect A13 connectivity. Our aim is to emphasize the residual, intact A13 pathways that could serve as therapeutic targets in future investigations. This emphasis is pertinent in the context of potential clinical applications, as the overall input and output to the region fundamentally dictate the significance of the A13 region in lesioned nigrostriatal models. We agree with the reviewer that the changes certainly can be independent of A13; however, the fact that there was a significant change in the connectome post-6-OHDA injection and striatonigral degeneration is in and of itself important to document.

      There is no causal link between anatomical and behavioral data, which raises questions about the relevance of the anatomical data.

      This point was also addressed earlier in response to a comment from Reviewer 1. Focusing on specific motor pathways is one avenue to explore. However, given that the zona incerta acts as an integrative hub, we believed it was prudent to initially examine both afferent and efferent pathways using a brain-wide approach. For instance, without employing this methodology, the potential significance of cortical interconnectivity to the A13 region might not have been fully appreciated. As mentioned previously, we will place additional emphasis on motor-related regions in our revised paper, thereby enhancing the relevance of the anatomical data presented. With these modifications, we anticipate that our data will underscore specific motor-related targets for future exploration, employing optogenetic targeting to assess necessity and sufficiency.

      Overall, the study does not take advantage of genetic tools accessible in the mouse to address the direct or indirect behavioral and anatomical contributions of the A13 region to motor control and recovery after 6-OHDA injection.

      We acknowledge that our study has not specifically targeted neurons that express dopaminergic, glutamatergic, or GABAergic properties (refer to earlier comment for more detail). However, like others, we find that targeting one neuronal population often does not result in a pure transmitter phenotype. For instance, evidence suggests co-localization of dopamine neurons with a subpopulation of GABA neurons in the A13/medial zona incerta (Negishi et al. 2020). In the hypothalamus, research by Deisseroth and colleagues (Romanov et al. 2017) indicates the presence of multiple classes of dopamine cells, each containing different ratios of co-localized peptides and/or fast neurotransmitters. Consequently, we believe our work lays the foundation for the investigations suggested by the reviewer. Furthermore, if one considers this work in the context of a preclinical study to determine whether the A13 might be a target in human Parkinson's disease, the existing technology that could be utilized is deep brain stimulation (DBS) or electrical modulation, which would also affect different neuronal populations in a non-specific manner. While optogenetic stimulation therapy is longer term, using CamKII combined with the DJ hybrid AAV could be a translatable strategy for targeting A13 neuronal populations in non-human primates (Watakabe et al. 2015; Watanabe et al. 2020).

      Reviewer #3 (Public Review):

      Kim, Lognon et al. present an important finding on pro-locomotor effects of optogenetic activation of the A13 region, which they identify as a dopamine-containing area of the medial zona incerta that undergoes profound remodeling in terms of afferent and efferent connectivity after administration of 6-OHDA to the MFB. The authors claim to address a model of PD-related gait dysfunction, a contentious problem that can be difficult to treat with dopaminergic medication or DBS in conventional targets. They make use of an impressive array of technologies to gain insight into the role of A13 remodeling in the 6-OHDA model of PD. The evidence provided is solid and the paper is well written, but there are several general issues that reduce the value of the paper in its current form, and a number of specific, more minor ones. Also, some suggestions that may improve the paper compared to its current form come to mind.

      Thank you for the suggestions and careful consideration of our work - it is appreciated.

      The most fundamental issue that needs to be addressed is the relation of the structural to the behavioral findings. It would be very interesting to see whether the structural heterogeneity in afferent/efferent projections induced by 6-OHDA is related to the degree of symptom severity and motor improvement during A13 stimulation.

      As mentioned in comments for Reviewer 1, we will be highlighting motor-related A13 pathways in a revised Figure 5 and/or a new figure. We hope that our work will provide a roadmap for future studies to disentangle divergent or convergent A13 pathways that are involved in different or all PD-related motor symptoms. Because we could not measure behavioural change in the same animals studied with the anatomic study (essentially because the optrode would have significantly disrupted the connectome we are measuring), we cannot directly compare behaviour to structure.

      The authors provide extensive interrogation of large-scale changes in the organization of the A13 region afferent and efferent distributions. It remains unclear how many animals were included to produce Fig 4 and 5. Fig S5 suggests that only 3 animals were used, is that correct? Please provide details about the heterogeneity between animals. Please provide a table detailing how many animals were used for which experiment. Were the same animals used for several experiments?

      The behavioral set and the anatomical set were necessarily distinct. In the anatomical experiments, we employed both anterograde and retrograde viral approaches to target the afferent and efferent A13 populations with fluorescent proteins. For the behavioral approach, a single ChR2 opsin was utilized to photostimulate the A13 region; hence combining the two populations was not feasible. We were also concerned that the optrode itself would interfere with connectomics. Fewer animals were used for the whole-brain work due to the technical limitations described earlier. We will provide more details on animal numbers in a table and in the text.

      While the authors provide evidence that photoactivation of the A13 is sufficient in driving locomotion in the OFT, this pro-locomotor effect seems to be independent of 6-OHDA-induced pathophysiology. Only in the pole test do they find that there seems to be a difference between Sham vs 6-OHDA concerning the effects of photoactivation of the A13. Because of these behavioral findings, optogenetic activation of A13 may represent a gain of function rather than disease-specific rescue. This needs to be highlighted more explicitly in the title, abstract, and conclusion.

      We agree with the reviewer that this aspect needs to be highlighted more. Optogenetic activation of A13 may represent a gain of function in both healthy and 6-OHDA mice, highlighting a parallel descending motor pathway that remains intact. 6-OHDA lesions have multiple effects on motor and cognitive function. This makes a single pathway unlikely to rescue all deficits observed in 6-OHDA models. We can say that the lack of locomotion observed in 6-OHDA models can be reversed by A13 region stimulation. We have discussed some aspects of the gain of function possibility but will augment this in other areas of the paper as well, as suggested.

      The authors claim that A13 may be a possible target for DBS to treat gait dysfunction. However, the experimental evidence provided (in particular the lack of disease-specific changes in the OFT) seems insufficient to draw such conclusions. It needs to be highlighted that optogenetic activation does not necessarily have the same effects as DBS (see the recent review from Neumann et al. in Brain: https://pubmed.ncbi.nlm.nih.gov/37450573/). This is important because ZI-DBS so far had very mixed clinical effects. The authors should provide plausible reasons for these discrepancies. Is cell-specificity, which only optogenetic interventions can achieve, necessary? Can new forms of cyclic burst DBS achieve similar specificity (Spix et al, Science 2021)? Please comment.

      Thank you for the useful comments - we will update our discussion accordingly.

      Our study highlights a parallel motor pathway provided by the A13 region that remains intact in 6-OHDA mice and can be sufficiently driven to rescue the hypolocomotor pathology observed in the OFT and overcome bradykinesia and akinesia. The photoactivation of ipsilesional A13 also has an overall additive effect on ipsiversive circling, representing a gain of function on the intact side that contributes to the magnitude of overall motor asymmetry against the lesioned side. The effects of DBS are rather complex, ranging from micro-, meso-, to macro-scales, and involve activation, inhibition, informational lesioning, and network interactions. This could contribute to the mixed clinical effects observed with ZI-DBS, in addition to differences in targeting and DBS programming among the studies (see review (Ossowska 2019)). Also, DBS studies targeting the ZI have never targeted the rostromedial ZI, which extends towards the hypothalamus and contains the A13. Furthermore, DBS and electrical stimulation of neural tissue, in general, are always limited by current spread and lower thresholds of activation of axons (e.g., axons of passage), both of which can reduce the specificity of the true therapeutic target. Optogenetic studies have provided mechanistic insights that could be leveraged in overcoming some of the limitations in targeting with conventional DBS approaches. Spix et al. (2021) provided an interesting approach highlighting these advancements. They devised burst stimulation to facilitate population-specific neuromodulation within the external globus pallidus. Moreover, they found a complementary role for optogenetics in exploring the pathway-specific activation of neurons activated by DBS. To ascertain whether A13 DBS may be a viable therapy for PD gait, it will be necessary to perform many more preclinical experiments, and tuning of DBS parameters could be facilitated by optogenetic stimulation in these murine models.

      In a recent study, Jeon et al (Topographic connectivity and cellular profiling reveal detailed input pathways and functionally distinct cell types in the subthalamic nucleus, 2022, Cell Reports) provided evidence on the topographically graded organization of STN afferents and McElvain et al. (Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon, 2021, Neuron) have shown similar topographical resolution for SNr efferents. Can a similar topographical organization of efferents and afferents be derived for the A13/ ZI in total?

      The ZI can be subdivided into four subregions in the antero-posterior axis: rostral (ZIr), dorsal (ZId), ventral (ZIv), and caudal (ZIc) regions. The dorsal and ventral ZI are also referred to together as the central/medial/intermediate ZI. There are topographical gradients in different cell types and connectivity across these subregions (see reviews: Mitrofanis 2005; Monosov et al. 2022; Ossowska 2019). Recent work by Yang and colleagues (2022) demonstrated a topographical organization among the inputs and outputs of GABAergic (VGAT) populations across four ZI subregions. Given that the A13 region encompasses a smaller portion (the medial aspect) of both rostral and medial/central ZI (three of four ZI subregions) and coexpresses VGAT, the A13 region likely falls under the rostral and intermediate medial ZI dataset found in Yang et al. (2022). With our data, we would not be able to capture the breadth of topographical organization shown in Yang et al. (2022).

      In conclusion, this is an interesting study that can be improved by taking into consideration the points mentioned above.

      Reviewer #1 (Recommendations For The Authors):

      1) Figure 2 indeed presents valuable information regarding the effects of A13 region photoactivation. To enhance the comprehensiveness of this figure and gain a deeper understanding of the neurons driving the pro-locomotor effect of stimulation, it would be beneficial to include quantifications of various cell types:

      • cFos-Positive Cells/TH-Positive Cells: it can help determine the impact of A13 stimulation on dopaminergic neurons and the associated pro-locomotor effect in the healthy condition and especially in the context of Parkinson's disease (PD) modeling.

      • cFos-Positive Cells /TH-Negative Cells: Investigating the number of TH-negative cells activated by stimulation is also important, as it may reveal non-dopaminergic neurons that play a role in locomotor responses. Identifying the location and characteristics of these TH-negative cells can provide insights into their functional significance.

      Incorporating these quantifications into Figure 2 would enhance the figure's informativeness and provide a more comprehensive view of the neuronal populations involved in the locomotor effects of A13 stimulation.

      Agreed - we will add quantification and create graphs to present the data in Figure 2.

      2) Refer to Figure 3. In the main text (page 5), when describing the animal with 6-OHDA, the wrong panels are indicated. The text cites Figure 2A-E, but this should be replaced with Figure 3A-E. Please correct this.

      Will be done

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Page 1: Inhibitory or lesion studies will be necessary to support the claim that the global remodeling of afferent and efferent projections of the A13 region highlights the Zona Incerta's role as a crucial hub for the rapid selection of motor function.

      We believe that overall, there is quite a bit of evidence that the zona incerta is a hub for afferent/efferents. Mitrofanis (2005) and, more recently, Wang et al. (2020) summarize some of the evidence. Yang (2022) illustrates that the zona incerta shows multiple inputs to GABAergic neurons and outputs to diverse regions. Recent work suggests that the zona incerta contributes to various motor functions such as hunting, exploratory locomotion, and integrating multiple modalities (Zhao et al. 2019; Wang et al. 2019; Monosov et al. 2022; Chometton et al. 2017). We will update our paper to reflect these references.

      Introduction

      Page 2, paragraph 2: "However, little attention has been placed on the medial zona incerta (mZI), particularly the A13, the only dopamine-containing region of the rostral ZI" Is the A13 region located in the rostral or medial ZI or both?

      It should have been written “rostromedial” ZI. The A13 is located in the medial aspect of rostromedial ZI. We will update the introduction.

      Page 2, para 3: Li et al (2021) used a mini-endoscope to record the GCaMP6 signal. Masini and Kiehn, 2022 transiently blocked the dopaminergic transmission; they never used 6-OHDA. Please correct through the text.

      We will correct this.

      Page 2, para 4: the A13 connectome encompasses the cerebral cortex,... MLR. The MLR is a functional region, correct this for the CNF and PPN.

      Thank you, we will correct this.

      Page 3, the last paragraph of the introduction could be clarified by presenting the behavioral data first, followed by the anatomy.

      We will correct this.

      Figure 1 is nice and clear, and well summarizes the experimental design.

      Thank you.

      Figure 2 shows an example of the extent of the ChR2-YFP expression and the position of an optical fiber tip above the dopaminergic A13 region from a mouse. Without any quantification, these images could be included in Figure 1. Despite a very small volume (36.8nL) of AAV, the extent of ChR2-YFP expression is quite large and includes dopaminergic and unidentified neurons within the A13 region but also a large population of unidentified neurons outside of it, thus raising questions about the volume and the types of neurons recruited.

      This is an important consideration. As mentioned previously, we will provide more information on viral spread and optrode location. The issue of viral spread is complex and depends on factors including tissue type, serotype, and promoter of the virus. Li et al. (2021), for example, used different virus serotypes and promoters, injecting 150 nL, whereas we used AAV-DJ, injecting 36.8 nL. AAV-DJ is a hybrid viral type consisting of multiple serotypes. It has a high transduction efficiency, which leads to greater gene delivery than single-serotype AAV viral constructs (Mao et al. 2016). A secondary consideration regarding translation was that AAV-DJ could effectively transduce non-human primate neurons (Watanabe et al. 2020). We have addressed the issue of neurons recruited earlier and will provide c-Fos quantification to illustrate the extent of co-localization with TH.

      Anatomical reconstruction of the extent of the ChR2-YFP expression and the location of the tip of the optical fiber will be necessary to confirm that ChR2-YFP expression was restricted to the A13 region.

      We will provide additional information regarding viral spread, ferrule tip placement, and c-fos cell counts.

      Page 5, 1st para: Double-check the references, as not all of them are 6-OHDA injections in the MLF.

      Will correct.

      Page 5, 1st para, 4th line: Replace ferrule with optical cannula or fiber.

      Will correct.

      Page 5, 1st para, 9th line: Replace Figure 2 with Figure 3.

      Will correct.

      Page 5, 2nd para: About the refractory decrease in traveled distance by sham-ChR2 mice: is this significant?

      It was not significant (Figure S1, 1-way RM ANOVA: F5,25 = 0.486, P = 0.783). We will update this.

      Figure 3 showing behavioral assessments is nice, but the stats are not always clear. In Fig 3A, are each of the off and on boxes 1 minute long? The figure legend states the test lasts 1 min, but isn't it 4 minutes? In Figure 3B-E and 3J-M, what are the differences? Do the stats identify a significant difference only during the stimulation phase? Fig. 3F-I are nice and could have been presented as primary examples prior to data analysis in Fig. 3B-E. Group labels above the graph would help.

      Yes, the off-on boxes are 1 minute long. We will correct the error in the legend. Great suggestion for F-I - we will move them ahead of the summary figures.

      Fig. 3L-M, what do PreSur, Post, and Ferrule mean? I assume that Ferrule refers to mice tested with the optical fiber without stimulation, whereas Stim. refers to the stimulation. It would be helpful to standardize the format of stats in Fig. 3B-E and 3J-M. What are time points a, b, and c referring to?

      We will do this.

      Figure S2A: the higher variability in 6-OHDA-YFP mice in comparison to 6-OHDA-ChR2 mice prior to stimulation suggests that 6-OHDA-YFP mice were less impaired. Why use boxplots only for these data? Would a pairwise comparison be more appropriate?

      The data did not follow a normal distribution and were therefore plotted as box-and-whisker plots: the horizontal line through the box indicates the group median, the limits of the box indicate the interquartile range, and the whiskers indicate the group minimum and maximum. A pairwise comparison was indeed used, namely the non-parametric equivalent of the paired t-test (Wilcoxon signed-rank test).
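
      As a rough illustration of the summary and test named in this response, the sketch below computes the five values drawn in a box-and-whisker plot and an exact two-sided Wilcoxon signed-rank p-value for paired measurements. It is a minimal sketch under stated assumptions, not the analysis pipeline used for Figure S2: the function names are hypothetical, the inclusive quartile method is one of several conventions, the exact p-value is obtained by brute-force enumeration (practical only for small groups), and the numbers are made up for illustration.

```python
from itertools import product
from statistics import median, quantiles


def box_summary(values):
    """Return the five values drawn in a min/max box-and-whisker plot."""
    # Quartile convention is an assumption; plotting packages differ here.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
    }


def wilcoxon_signed_rank(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped and tied absolute differences
    receive average ranks. All 2**n sign patterns are enumerated, so this
    is only suitable for small n.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    ordered = sorted(abs(d) for d in diffs)

    def average_rank(x):
        first = ordered.index(x) + 1      # 1-based rank of first occurrence
        count = ordered.count(x)          # number of tied values
        return first + (count - 1) / 2

    ranks = [average_rank(abs(d)) for d in diffs]
    rank_total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, rank_total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, rank_total - s) <= w:
            hits += 1
    return w, hits / 2 ** n


# Made-up paired values for illustration only; not data from the study.
pre_stim = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
stim = [1.5, 3.0, 4.5, 6.0, 7.5, 9.0]

summary = box_summary(pre_stim)
w_stat, p_value = wilcoxon_signed_rank(pre_stim, stim)
```

      With six pairs that all change in the same direction, the smallest attainable two-sided exact p-value is 2/64, which is one reason small groups limit what a signed-rank test can show.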

      Fig. S2B: add the statistical marker.

      Will do

      Page 7, para 1, line 8: to add "in comparison to 6-OHDA-YFP and YFP mice" to during photostimulation... (Figure 3E).

      Will do

      Page 7, para 3, line 5: about larger improvement, replace "sham ChR2" with "6-OHDA."

      Will do

      Page 8, para 1, line 4: Perier et al., 2000 reported that 6-OHDA injection increased the firing frequency of the ZI over a month.

      We will add that time frame. Agreed, it is shorter than the behavioral work, which was started 3 weeks after 6-OHDA injection.

      Page 8, para 2, line 1: Since the results were expected, add some references.

      Will do

      Page 8, para 3, line 4. Double-check the reference.

      Will correct and update

      Page 8: About large-scale changes in the A13 region, the relevance of correlation matrices is difficult to grasp. Analysis of local connectivity would have been more informative in the context of GABAergic and glutamatergic neurons of the ZI in the vicinity of the A13 region.

      We will explore alternative methods to present the data.

      Page 8, para 3, line: given Fig. 2, there is concern about the claim that only the A13 region was targeted. The time of the analysis after 6-OHDA should be mentioned. Some sections of the paragraph could be moved to methods.

      As mentioned earlier, we will provide additional information regarding viral spread, ferrule tip placement, and c-fos cell counts. We will mention analysis time after 6-OHDA and update Figure 1a to include this.

      Fig. 4: The color code helps the reader visualize distribution differences. However, statistical analyses comparing 6-OHDA versus sham should be included. Quantification per region would greatly help readers visualize the data and support the conclusion. The relationship between the type of correlation (positive or negative) and absolute change (increase or decrease) is unknown in the current format, which limits the interpretation of the data. Moreover, examples of raw images of axons and cells should be presented for several brain regions. The experimental design with a timeline, as in Fig. 1, would be helpful. The legend for Fig. 4 is a bit long. Some sections are very descriptive, whereas others are more interpretive.

      We will explore alternative methods of presenting the data, as suggested in a previous comment. Should we retain the correlation matrix, we will incorporate the reviewer’s suggestions.

      Page 10, para 1, line 1: add "afferent" to "changes in -afferent and- projection patterns."

      Will do

      Page 10, para 1, line 9: remove the 2nd "compared to sham" in the sentence.

      Will do

      Page 10, para 1, line 10: remove "coordinated" in "several regions showed a coordinated reduction in afferent density." We cannot say anything about the timing of events, as there is only info at 1 month.

      Will do

      Page 10, para 2: the section should be written in the past tense.

      Will do

      Page 13, para 2, the last sentence is overstated. Please remove "cells" and refer to the A13 region instead.

      Will do

      About differential remodelling of the A13 region connectome: Figure 5C and 5G: The proportion of total afferents ipsi- and contralateral to 6-OHDA injection argues that the A13 region primarily receives inputs from the cortical plate and the striatum. Unfortunately, there are no statistics.

      Due to the small sample size, we provided descriptive statistics (mean and error bars) in Figure 5C and G. As mentioned in comments for Reviewers 1 and 2, we will revise Figure 5 to present data focusing on motor-related pathways to provide clarity. In addition, absolute values will be shared in a supplemental table.

      Figure 5 D and 5H: Changes in the proportion of total afferents/projections are relatively modest (less than 10% of the whole population for the highest changes). There is no standard deviation for these data and no statistics. Do they reflect real changes or variability from the injection site?

      The changes are relatively modest (less than 10%) since a small brain region usually provides a very small proportion of total input (McElvain et al. 2021; Yang et al. 2022). The changes in the proportions reflect real differences between average proportions observed in sham and 6-OHDA mice. The variability in the total labeling of neurons and fibers was minimized by normalizing individual regional counts against total counts found in each individual animal.
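
      The normalization described in this response can be sketched as follows: each animal's regional counts are divided by that animal's total count, the resulting proportions are averaged within a group, and the group means are subtracted. This is a minimal sketch assuming counts are stored per animal as region-to-count mappings; the function names, region labels, and numbers are hypothetical and are not taken from the study.

```python
def regional_proportions(counts):
    """Convert one animal's raw regional counts into fractions of its total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labeled cells or fibers in this animal")
    return {region: n / total for region, n in counts.items()}


def mean_proportions(animals):
    """Average the per-animal proportions across a group of animals."""
    per_animal = [regional_proportions(counts) for counts in animals]
    regions = sorted({region for p in per_animal for region in p})
    # A region absent from an animal's counts contributes a proportion of 0.
    return {
        region: sum(p.get(region, 0.0) for p in per_animal) / len(per_animal)
        for region in regions
    }


def proportion_change(sham_animals, lesion_animals):
    """Difference in mean proportion per region (lesion minus sham)."""
    sham = mean_proportions(sham_animals)
    lesion = mean_proportions(lesion_animals)
    regions = sorted(set(sham) | set(lesion))
    return {r: lesion.get(r, 0.0) - sham.get(r, 0.0) for r in regions}


# Made-up counts for illustration only; these are not data from the study.
sham_animals = [
    {"cortex": 400, "striatum": 100, "other": 500},
    {"cortex": 90, "striatum": 10, "other": 100},
]
lesion_animals = [
    {"cortex": 300, "striatum": 100, "other": 600},
    {"cortex": 70, "striatum": 30, "other": 100},
]

change = proportion_change(sham_animals, lesion_animals)
```

      Because every animal's proportions sum to 1, the changes across regions sum to zero: a gain in one region's share necessarily appears as a loss elsewhere, so absolute counts remain useful alongside proportions.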

      Fig 5F and H: The example in F shows a huge decrease in the striatum, but H indicates only a 2% change, which makes the example not very representative. Absolute values would be helpful.

      While a 2% change may seem small, it represents a relatively large change in the A13 efferent connectome. To provide further clarity, we will provide absolute values as suggested in our new supplemental table.

      Figure 6 is inaccurate and unnecessary.

      Agree - it is too simplistic. We will remove it and replace it with one outlined in comments to Reviewer 1.

      Discussion

      Although interesting, the discussion is too long.

      We will make it more concise in the revised paper.

      Page 12, para 2: Although the A13 region has a pro-locomotor effect and therapeutic potential, the claim about its plasticity relies on Fig. 4 and 5, which have a limited scope in the current analysis and presentation (see comments above).

      We will revise the paper per the comments above and then update this accordingly.

      Methods

      Page 17, para 1: include the stereotaxic coordinates of the optical cannula above the A13 region.

      We will include this information.

      References

      Chen, Fenghua, Junliang Qian, Zhongkai Cao, Ang Li, Juntao Cui, Limin Shi, and Junxia Xie. 2023. “Chemogenetic and Optogenetic Stimulation of Zona Incerta GABAergic Neurons Ameliorates Motor Impairment in Parkinson’s Disease.” iScience 26 (7). https://doi.org/10.1016/j.isci.2023.107149.

      Chometton, S., K. Charrière, L. Bayer, C. Houdayer, G. Franchi, F. Poncet, D. Fellmann, and P. Y. Risold. 2017. “The Rostromedial Zona Incerta Is Involved in Attentional Processes While Adjacent LHA Responds to Arousal: C-Fos and Anatomical Evidence.” Brain Structure & Function 222 (6): 2507–25.

      Garau, Celia, Jessica Hayes, Giulia Chiacchierini, James E. McCutcheon, and John Apergis-Schoute. 2023. “Involvement of A13 Dopaminergic Neurons in Prehensile Movements but Not Reward in the Rat.” Current Biology: CB, October. https://doi.org/10.1016/j.cub.2023.09.044.

      Li, Zhuoliang, Giorgio Rizzi, and Kelly R. Tan. 2021. “Zona Incerta Subpopulations Differentially Encode and Modulate Anxiety.” Science Advances 7 (37): eabf6709.

      Mao, Yingying, Xuejun Wang, Renhe Yan, Wei Hu, Andrew Li, Shengqi Wang, and Hongwei Li. 2016. “Single Point Mutation in Adeno-Associated Viral Vectors -DJ Capsid Leads to Improvement for Gene Delivery in Vivo.” BMC Biotechnology 16 (January): 1.

      McElvain, Lauren E., Yuncong Chen, Jeffrey D. Moore, G. Stefano Brigidi, Brenda L. Bloodgood, Byung Kook Lim, Rui M. Costa, and David Kleinfeld. 2021. “Specific Populations of Basal Ganglia Output Neurons Target Distinct Brain Stem Areas While Collateralizing throughout the Diencephalon.” Neuron 109 (10): 1721–38.e4.

      Mitrofanis, J. 2005. “Some Certainty for the ‘Zone of Uncertainty’? Exploring the Function of the Zona Incerta.” Neuroscience 130 (1): 1–15.

      Monosov, Ilya E., Takaya Ogasawara, Suzanne N. Haber, J. Alexander Heimel, and Mehran Ahmadlou. 2022. “The Zona Incerta in Control of Novelty Seeking and Investigation across Species.” Current Opinion in Neurobiology 77 (December): 102650.

      Negishi, Kenichiro, Mikayla A. Payant, Kayla S. Schumacker, Gabor Wittmann, Rebecca M. Butler, Ronald M. Lechan, Harry W. M. Steinbusch, Arshad M. Khan, and Melissa J. Chee. 2020. “Distributions of Hypothalamic Neuron Populations Coexpressing Tyrosine Hydroxylase and the Vesicular GABA Transporter in the Mouse.” The Journal of Comparative Neurology 528 (11): 1833–55.

      Ossowska, Krystyna. 2019. “Zona Incerta as a Therapeutic Target in Parkinson’s Disease.” Journal of Neurology. https://doi.org/10.1007/s00415-019-09486-8.

      Romanov, Roman A., Amit Zeisel, Joanne Bakker, Fatima Girach, Arash Hellysaz, Raju Tomer, Alán Alpár, et al. 2017. “Molecular Interrogation of Hypothalamic Organization Reveals Distinct Dopamine Neuronal Subtypes.” Nature Neuroscience 20 (2): 176–88.

      Spix, Teresa A., Shruti Nanivadekar, Noelle Toong, Irene M. Kaplow, Brian R. Isett, Yazel Goksen, Andreas R. Pfenning, and Aryn H. Gittis. 2021. “Population-Specific Neuromodulation Prolongs Therapeutic Benefits of Deep Brain Stimulation.” Science 374 (6564): 201–6.

      Wang, Xiyue, Xiaolin Chou, Bo Peng, Li Shen, Junxiang J. Huang, Li I. Zhang, and Huizhong W. Tao. 2019. “A Cross-Modality Enhancement of Defensive Flight via Parvalbumin Neurons in Zona Incerta.” eLife 8 (April). https://doi.org/10.7554/eLife.42728.

      Wang, Xiyue, Xiao-Lin Chou, Li I. Zhang, and Huizhong Whit Tao. 2020. “Zona Incerta: An Integrative Node for Global Behavioral Modulation.” Trends in Neurosciences 43 (2): 82–87.

      Watakabe, Akiya, Masanari Ohtsuka, Masaharu Kinoshita, Masafumi Takaji, Kaoru Isa, Hiroaki Mizukami, Keiya Ozawa, Tadashi Isa, and Tetsuo Yamamori. 2015. “Comparative Analyses of Adeno-Associated Viral Vector Serotypes 1, 2, 5, 8 and 9 in Marmoset, Mouse and Macaque Cerebral Cortex.” Neuroscience Research 93 (April): 144–57.

      Watanabe, Hidenori, Hiromi Sano, Satomi Chiken, Kenta Kobayashi, Yuko Fukata, Masaki Fukata, Hajime Mushiake, and Atsushi Nambu. 2020. “Forelimb Movements Evoked by Optogenetic Stimulation of the Macaque Motor Cortex.” Nature Communications 11 (1): 3253.

      Yang, Yang, Tao Jiang, Xueyan Jia, Jing Yuan, Xiangning Li, and Hui Gong. 2022. “Whole-Brain Connectome of GABAergic Neurons in the Mouse Zona Incerta.” Neuroscience Bulletin 38 (11): 1315–29.

      Ye, Qiying, Jeremiah Nunez, and Xiaobing Zhang. 2023. “Zona Incerta Dopamine Neurons Encode Motivational Vigor in Food Seeking.” bioRxiv : The Preprint Server for Biology, June. https://doi.org/10.1101/2023.06.29.547060.

      Zhao, Zheng-Dong, Zongming Chen, Xinkuan Xiang, Mengna Hu, Hengchang Xie, Xiaoning Jia, Fang Cai, et al. 2019. “Zona Incerta GABAergic Neurons Integrate Prey-Related Sensory Signals and Induce an Appetitive Drive to Promote Hunting.” Nature Neuroscience 22 (6): 921–32.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study provides a framework bearing on the role of Eph-Ephrin signaling mechanisms in the clinical condition of amyotrophic lateral sclerosis. It provides compelling evidence for the roles of glial cells in this condition. This novel astrocyte-mediated mechanism may help identify future therapeutic targets.

      Drs. Huang and Zaidi: Thank you for considering this revision of our manuscript for potential publication in eLife. We have addressed the excellent comments of the two reviewers, including the addition of new data. We have included detailed response-to-reviewer comments below to address each specific point, and we have highlighted all the changes in the manuscript text (using a red font color) made in response to these comments. Based on the reviewers’ critiques, we feel our re-working of the manuscript has made for a greatly improved study.

      Reviewer #1 (Public Review):

      In the manuscript by Urban et al., the authors attempt to further delineate the role which non-neuronal CNS cells play in the development of ALS. Toward this goal, the transmembrane signaling molecule ephrinB2 was studied. It was found that there is an increased expression of ephrinB2 in astrocytes within the cervical ventral horn of the spinal cord in a rodent model of ALS. Moreover, the reduction of ephrinB2 reduced motoneuron loss and prevented respiratory dysfunction at the NMJ. Further driving the importance of ephrinB2 is an increased expression in the spinal cords of human ALS individuals. Collectively, these findings present compelling evidence implicating ephrinB2 as a contributing factor towards the development of ALS.

      We thank Reviewer #1 for the very helpful critique. We address each of the specific comments below (in the “Recommendations for the Authors” section of this Response to Reviewer Comments document), and have made changes to the manuscript based on these excellent points.

      Reviewer #2 (Public Review):

      The contribution of glial cells to the pathogenesis of amyotrophic lateral sclerosis (ALS) is of substantial interest and the investigators have contributed significantly to this emerging field via prior publications. In the present study, authors use a SOD1G93A mouse model to elucidate the role of astrocyte ephrinB2 signaling in ALS disease progression. Erythropoietin-producing human hepatocellular receptors (Ephs) and the Eph receptor-interacting proteins (ephrins) signaling is an important mediator of signaling between neurons and non-neuronal cells in the nervous system. Recent evidence suggests that dysregulated Eph-ephrin signaling in the mature CNS is a feature of neurodegenerative diseases. In the ALS model, upregulated EphA4 expression in motor neurons has been linked to disease pathogenesis. In the present study, authors extend previous findings to a new class of ephrinB2 ligands. Urban et al. hypothesize that upregulated ephrinB2 signaling contributes to disease pathogenesis in ALS mice. The authors successfully test this hypothesis and their results generally support their conclusion.

      Major strengths of this work include a robust study design, a well-defined translational model, and complementary biochemical and experimental methods such that correlated findings are followed up by interventional studies. Authors show that ephrinB2 ligand expression is progressively upregulated in the ventral horn of the cervical and lumbar spinal cord through pre-symptomatic to end stages of the disease. This novel association was also observed in lumbar spinal cord samples from postmortem samples of human ALS donors with a SOD1 mutation. Further, they use a lentiviral approach to drive knock-down of ephrinB2 in the central cervical region of SOD1G93A mice at the presymptomatic stage. Interestingly, in spite of using a non-specific promoter, authors note that the lentiviral expression was preferentially driven in astrocytes.

      Since respiratory compromise is a leading cause of morbidity in the ALS population, the authors proceed to characterize the impact of ephrinB2 knockdown on diaphragm muscle output. In mice approaching the end stage of the disease, electrophysiological recordings from the diaphragm muscle show that animals in the knock-down group exhibited a ~60% larger amplitude. This functional preservation of diaphragm function was also accompanied by the preservation of diaphragm neuromuscular innervation. However, it must be noted that this cervical ephrinB2 knockdown approach had no impact on disease onset, disease duration, or animal survival. Furthermore, there was no impact of ephrinB2 knockdown on forelimb or hindlimb function.

      We thank Reviewer #2 for the very helpful critique. We address each of the specific comments below, and have made changes to the manuscript based on all of these excellent points.

      The major limitation of the manuscript as currently written is the conclusion that the preservation of diaphragm output following ephrinB2 knockdown in SOD1 mice is mediated primarily (if not entirely) by astrocytes. The authors present convincing evidence that a reduction in ephrinB2 is observed in local astrocytes (~56% transduction) following the intraspinal injection of the lentivirus. However, the proportion of cell types assessed for transduction with the lentivirus in the spinal cord was limited to neurons, astrocytes, and oligodendrocyte lineage cells. Microglia comprise a large proportion of the glial population in the spinal grey matter and have been shown to associate closely with respiratory motor pools. This cell type, amongst the many others that comprise the ventral gray matter, have not been investigated in this study. Thus, the primary conclusion that astrocytes drive ephrinB2-mediated pathogenesis in ALS mice is largely correlative.

      This is an excellent point. While the majority of transduced cells were astrocytes, we did not identify the lineage of a portion of the transduced cells, which could consist of cell types such as microglia, endothelial cells and others, some of which have been linked to ALS pathogenesis. Nevertheless, we find that the cells expressing high levels of ephrinB2 in ventral horn of SOD1G93A mice are all astrocytes (as seen in Figure 1O-Q), strongly suggesting – though not definitively demonstrating – that astrocyte ephrinB2 is the pathogenic source in this model (even if our viral transduction did not solely target astrocytes).

      In the revised version of the manuscript, we now include an extensive paragraph in the Discussion section dedicated to this point.

      Importantly, we have toned down our conclusion by modifying the title by removing “…in spinal cord astrocytes…”. We changed the title from “EphrinB2 knockdown in spinal cord astrocytes preserves diaphragm innervation in a mutant SOD1 mouse model of ALS" to “EphrinB2 knockdown in cervical spinal cord preserves diaphragm innervation in a mutant SOD1 mouse model of ALS”.

      Further, it is interesting to note that no other functional outcomes were improved in this study. The C3-C5 region of the spinal cord consists of many motor pools that innervate forelimb muscles. CMAP recordings conducted at the diaphragm are a reflection of intact motor pools. This type of assessment of neuromuscular health is hard to re-capitulate in the kind of forelimb task that is being employed to test motor function (grip strength). Thus, it would be interesting to see if CMAP recordings of forelimb muscles would capture the kind of motor function preservation observed in the diaphragm muscle.

      We did perform forelimb grip strength analysis on these animals and found no effect of focal ephrinB2 knockdown. However, this functional assay is impacted more by distal forelimb muscle groups controlled by motor neuron pools located at more caudal locations of the spinal cord (i.e. low cervical and high thoracic), likely explaining the lack of effect on grip strength.

      Unfortunately, we did not perform this CMAP recording on forelimb muscle, and these mice have all already been sacrificed. We have added discussion of this point to the revised manuscript.

      On a similar note, the functional impact of increased CMAP amplitude has not been presented. An increase in CMAP amplitude does not necessarily translate to improved breathing function or overall ventilation. Thus, the impact of this improvement in motor output should be clearly presented to the reader.

      This is a very important point. While CMAP recording is a powerful assay of functional innervation of diaphragm muscle by phrenic motor neurons, it does not directly measure respiratory function. There are assays to test outcomes such as ventilatory behavior and gas exchange (e.g. whole-body plethysmography; blood gas measurements, etc.). We did not, however, perform these analyses. Respiratory function involves contribution of a number of other muscle groups, and these muscles are innervated by various motor neuron pools located across a relatively large expanse of the CNS neuraxis. As we focally targeted ephrinB2 knockdown to only a small area, we would not expect effects on these other functional assays, which is why we restricted our testing to CMAP recording since this can be used to specifically study the phrenic motor neuron pool (and can be combined with detailed histological analyses in the cervical enlargement and at the diaphragm NMJ).

      Importantly, this is why we chose to use “preserves diaphragm innervation” in the manuscript title, as opposed to wording such as “preserves diaphragm function”. In addition, we have added this point to the Discussion section in the revised manuscript.

      Further, to the best of my knowledge, expression of Eph (or EphB) receptors has not been explicitly shown at the phrenic motor pool. It is thus speculative at best that the mechanism that the authors suggest in preserving diaphragm function is in fact mediated through Eph-EphrinB2 signaling at the phrenic motor pool. This aspect of the study would warrant a deeper discussion.

      We address this important comment with multiple pieces of data showing that Eph receptors are expressed in the phrenic motor neuron pool. EphrinB2 binds and activates EphBs, as well as EphAs such as EphA4. Importantly, previous work has linked expression of EphA4 in motor neurons to the rate of ALS progression (Van Hoecke, et al. Nature Medicine. 2012). Consistent with these studies, single-nucleus RNAseq on mouse cervical spinal cord shows that alpha motor neurons of cervical spinal cord express various EphA and EphB receptors (http://spinalcordatlas.org/; Blum et al., Nature Neuroscience, 2021; Alkaslasi et al., Nature Communications, 2021). In addition, this dataset identifies a phrenic motor neuron-specific marker (ErbB4); when we specifically look at the expression profile of only the ErbB4-expressing alpha motor neurons, the data reveal that phrenic motor neurons express a number of EphA and EphB receptors, including EphA4.

      To validate expression specifically of EphA4, we performed IHC for phosphorylated EphA4 (a marker of activated EphA4) on C3-C5 spinal cord sections from SOD1G93A mice injected with shRNAephrinB2 or control vector. We find that large ventral horn neurons are positive for phosphorylated EphA4. The ventral horn at these cervical spinal cord levels includes motor neuron pools in addition to just phrenic motor neurons; therefore, this result by itself does not conclusively show that phrenic motor neurons express EphA4, though they likely do since we find EphA4 expression in most ventral horn neuron cell bodies in C3-C5. A representative image is included in Supplemental Figure 1.

      In the revised manuscript, we added a paragraph to the Discussion section to address this important comment from the reviewer, including describing these data on Eph receptor expression.

      Lastly, although authors include both male and female animals in this investigation, they do not have sufficient power to evaluate sex differences. Thus, this presents another exciting future of investigation, given that ALS has a slightly higher preponderance in males as compared to females.

      As the reviewer notes, our studies are under-powered with respect to examining possible sex-specific effects. We now include a brief discussion of this issue in the revised manuscript.

      In summary, this study by Urban et al. provides a valuable framework for Eph-Ephrin signaling mechanisms imposing pathological changes in an ALS mouse model. The role of glial cells in ALS pathology is a very exciting and upcoming field of investigation. The current study proposes a novel astrocyte-mediated mechanism for the propagation of disease that may eventually help to identify potential therapeutic targets.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors.

      Both reviewers were enthusiastic about your paper. Reviewer (1) had some technical queries (see his/her items 2 and 4). Reviewer (2) had some questions about principles (items 1 and 2) with the remaining points being technical queries.

      We have addressed all comments of both reviewers. We detail our responses in this Response to Reviewer Comments document and have made the associated modifications to the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Questions and/or Recommendations:

      There is convincing evidence that there is increased expression of ephrinB2 over time in the mouse model of ALS. Is there a corresponding increase in astrocytes in this animal model?

      We previously published data showing quantification of astrocyte numbers within the spinal cord of this same SOD1G93A mouse model. Specifically, we performed this quantification in the ventral horn of the lumbar spinal cord following disease onset. We found that there was a modest increase in the number of GFAP+ astrocytes at this location and disease time point.

      [ Lepore et al. Selective ablation of proliferating astrocytes does not affect disease outcome in either acute or chronic models of motor neuron degeneration. Experimental Neurology. 211 (2): 423-32, 2008. ]

      One could speculate that the increase in ephrinB2 expression we observe across the ventral horn in the mutant SOD1 mice was solely due to this modest increase in astrocyte number. However, this is highly unlikely to be the case, as in wild-type mice and in mutant SOD1 mice prior to disease onset, astrocytes (and all other cell types) express very low levels of ephrinB2. Throughout the disease course in these mutant SOD1 mice, the ephrinB2 expression level in individual astrocytes dramatically increases (including across most or all astrocytes), suggesting that the total increase in ephrinB2 expression across the ventral horn was not due to just this modest increase in astrocyte numbers but was instead due to the dramatically elevated ephrinB2 expression in most/all astrocytes. We have added this point to the Discussion section in the revised manuscript.

      It would help the reviewer and readers to show a lower magnification image of Figure 2, panels O and P to demonstrate the reduction of ephrin B2 in the ventral horns.

      We have added the lower magnification images to Figure 2.

      It is commended that not all data was "positive". Figure 4 especially shows some of the limitations of ephrinB2 knockdown. This provides a realistic image - strengths and limitations - of this approach. Very well done!

      Thank you! In future work, we could employ alternative vector-based strategies to restrict transduction/knockdown to only astrocytes. With such experiments, it’s possible that the impact of ephrinB2 knockdown would not be the same, if ephrinB2 actions in non-astrocytes also play a role in disease pathogenesis. We have added discussion of this same point to the revised manuscript in response to a similar comment above from Reviewer #2.

      Reviewer comment 4: Fig 6 - if possible can you please add demographic (age/sex) with each band?

      We have added this information to the Legend. For aesthetic reasons, we chose not to add this information directly to the figure itself and instead included all of this information for each sample/band in the Legend.

      Reviewer #2 (Recommendations For The Authors):

      Overall, the manuscript addresses a novel aspect of the role of astrocytes in mediating ALS pathogenesis. I commend the authors for a well thought-out and clearly presented study. However, a few concerns limit the enthusiasm and deserve attention to improve the clarity of the report.

      The biggest limitation of this study is that microglia or other cell types (endothelial cells) have not been explored in this study. They constitute a big proportion of cell types in the spinal cord and to conclude that only astrocytes mediate ephrinB2 signaling in the ALS model would be a stretch without the proper stains.

      Please see our comments above to address this same point from Reviewer #2.

      A clear premise for the investigation of EphrinB2 ligands has not been presented in the introduction. The authors provide a good background on the emerging role of Eph-Ephrin interactions in neurodegenerative diseases. But it is unclear how the authors landed on this sub-class of ephrins.

      We have added this premise to the Introduction section of the revised manuscript. In published work, ephrinB2 has been shown to be upregulated in reactive astrocytes and to be involved in disease pathogenesis in other neurological disease models (e.g. traumatic spinal cord injury).

      There are several acronyms that have not been defined in the manuscript, e.g. GPI.

      We now define “GPI” and all other abbreviations in the revised manuscript.

      While the authors state that males and females had been included in the study, their individual n's for various outcomes have not been presented in the results section. Further, n's are missing from the figure legends, which will aid the clarity of the presentation. Further, please clarify the ages of the mice in the methods section.

      (1) We now provide the n’s for males versus females for all analyses in the figure legends. (2) We also now include the total n for each experimental condition in all of the figure legends. (3) We also now state the ages of the mice for the various analyses in the Methods section.

      It appears that several statistical interactions have been summarized in the results section but inconsistently reported on figures.

      We now provide the exact n’s for each analysis in all figure legends. We include all of the details of the statistical analysis in the text of the Results section and do not include this text in the Legends; we do this for all figures to maintain consistency.

      I presume that when the authors write "the number of neurons with somal diameter greater than 200 μm and with an identifiable nucleolus was determined", the 200 was a typo. Mouse motor neurons do not have a diameter of 200 μm and perhaps the authors meant an area of 200 μm²?

      We have corrected this: 200 μm²

      Authors should consider adding a quantification for the human tissue immunoblots.

      We have added the quantification of these human tissue data for ephrinB2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and reviewers for their overall positive assessment of this work. We have carefully revised the manuscript and implemented nearly all of the reviewers’ public and confidential recommendations. We believe these modifications have strengthened the manuscript and hope it will further convince the editors and reviewers.

      Below, we provide a point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      To further understand the plasticity of vestibular compensation, Schenberg et al. sought to characterize the response of the vestibular system to short-term and partial impairment using gaze stabilization behaviors. A transient ototoxic protocol affected type I hair cells and produced gain changes in the vestibulo-ocular reflex and optokinetic response. Interestingly, decreases in vestibular function occurred in coordination with an increase in ocular reflex gain at frequencies where vestibular information is more highly weighted over visual. Moreover, computational approaches revealed unexpected detriment from low reproducibility on combined gaze responses. These results inform the current understanding of visual-vestibular integration especially in the face of dysfunction.

      Strengths

      The manuscript takes advantage of VOR measurements which can be activated by targeted organs, are used in many species including clinically, and indicate additional adverse effects of vestibular dysfunction. The authors use a variety of experimental procedures and analysis methods to verify results and consider individual performance effects on the population data. The conclusions are well-justified by current data and supported by previous research and theories of visuo-vestibular function and plasticity.

      The authors thank reviewer 1 for emphasizing these positive aspects of the work.

      Weaknesses

      The manuscript describes the methodology as inducing reversible changes (lines 44, 67) but the data shows a reversible effect only in hair cell histology (Fig 3A-B) not in function as demonstrated by the persistent aVOR gain reduction in week 12 (Fig 1C) and increase of OKR gain in weeks 6-12 (Fig 4C/D).

      Rodents exposed to IDPN in the drinking water show anywhere from complete to no reversibility of the functional loss, depending on the IDPN concentration and duration of exposure, and the relationship between exposure and effect varies as a function of the species, strain and sex of the exposed animals (Llorens and Rodríguez-Farré, Neurotoxicol. Teratol., 1997; Seoane et al., J. Comp. Neurol. 2001; Sedó-Cabezón et al., Dis. Model. Mech., 2015; Greguske et al., Arch. Toxicol., 2019). In addition, there is individual variability. The concentration of IDPN and time of exposure used in this study were selected to result in a loss followed by complete reversion but, as noted by the referee, the reversion was complete for hair cells, while the gaze-stabilizing reflexes showed differing degrees of recovery depending on the functional test (complete recovery for OCR; partial for aVOR and OKR). This makes the IDPN subchronic protocol an interesting methodology to study the long-term consequences of partial/reversible inner ear impairment. To describe the reversibility more accurately, we have introduced the following changes:

      Line 43: Subchronic exposure to IDPN in drinking water at low doses allowed for progressive ototoxicity, leading to a partial and largely reversible loss of function.

      Lines 67-68: We demonstrate that despite the significant recovery in their vestibulo-ocular reflexes, the visuo-vestibular integration remains notably impaired in some IDPN-treated mice

      Line 578: Previous experiments (Greguske et al., 2021) had demonstrated that at these concentrations, ototoxic lesions produced by IDPN are largely reversible.

      Reviewer 1: The manuscript begins with the mention of fluctuating vestibular function clinically, but does not connect this to any specific pathologies nor does it relate its conclusions back to this motivation.

      Thank you. We have now added a conclusion (lines 525-552) to discuss the results in a clinical perspective.

      Reviewer 1: The conclusions of frequency-specific changes in OKR would be stronger if frequency-specific aVOR effects were demonstrated similar to Figure 4D.

      We have presented the frequency-selective effects in Figure 1 supplement and related text; changes observed in aVOR are mostly (see below) comparable for all frequencies >0.2Hz. However, we have edited the text to better highlight where IDPN differentially affects aVOR tested at different frequencies (see lines 97-99).

      Reviewer #2 (Public Review):

      This is a very nice study showing how partial loss of vestibular function leads to long term alterations in behavioural responses of mice. Specifically, the authors show that VOR involving both canal and otolith afferents are strongly attenuated following treatment and partially recover. The main result is that loss of VOR is partially "compensated" by increased OKR in treated animals. Finally, the authors show that treatment primarily affects type I hair cells as opposed to type II. Overall, these results have potentially important implications for our understanding of how the VOR is generated using input from both type I and type II hair cells. As detailed below however, more controls as well as analyses are needed.

      The authors thank reviewer 2 for the positive evaluation regarding the potential implications of the work.

      Major points:

      Reviewer 2: The authors analyze both canal and otolith contributions to the VOR which is great. There is however an asymmetry in the way that the results are presented in Figure 1. Please correct this and show time series of fixations for control and at W6 and W12. Moreover, the authors are plotting table and eye position traces in Fig. 1B but, based on the methods, gains are computed based on velocity. So please show eye velocity traces instead. Also, what was the goodness of fit of the model to the trace at W6? If lower than 0.5 then I think that it is misleading to show such a trace since there does not seem to be a significant VOR.

      Figure 1 was modified as suggested. Panel B now shows velocity traces, and goodness of fit is reported in figure legend. Panel E now shows raw OCR traces at W0, W6, W12.

      Reviewer 2: This is important to show that the loss is partial as opposed to total. It seems to me that the treatment was not effective at all for aVOR for at least some animals. What happens if these are not included in the analysis?

      The reviewer is correct: there is indeed variability in the alteration observed during the treatment, as described and expected from previous experiments (Llorens and Rodríguez-Farré, Neurotoxicol. Teratol., 1997; Seoane et al., J. Comp. Neurol. 2001; Sedó-Cabezón et al., Dis. Model. Mech., 2015; Greguske et al., Arch. Toxicol., 2019). It was actually one of the goals of the study to compare hair cell loss and VOR responses in individuals. The individual aVOR gain and phase responses during the IDPN treatment are all presented in Figure 1 supplement. aVOR was reduced in all animals, although 2 of 21 showed a decrease of less than 10% of their initial gain at W6. If these were excluded from the analysis, the statistical differences between the 2 groups would be reinforced.

      Reviewer 2: Figure 2A shows a parallel time course for gains of aVOR and OCR at the population level. Is this also seen at the individual level?

      Yes, this is seen in individuals. This result is presented in Figure 2C and 2D which illustrate the similar effect of IDPN on aVOR and OCR responses at week 6 and week 12 at the individual level (each symbol represents an individual mouse). The plotted delta gain of both aVOR and OCR represents the relative loss of vestibular function for each individual mouse at W6 and W12, respectively.

      Reviewer 2: Figure 3: please show individual datapoints in all conditions.

      Figure 3 was modified to show individual datapoints in all conditions (see Figure 3 A2, A3, C2 and C3).

      Reviewer 2: Figure 4: The authors show both gain and phase for OKR. Why not show gain and phase for aVOR and OCR in Figure 1. I realize that phase is shown in sup Figures but it is important to show in main figures. The authors show a significant increase in phase lead for aVOR but no further mention is made of this in the discussion. Moreover, how are the authors dealing with the fact that, as gain gets smaller, the error on the phase will increase. Specifically, what happens when the grey datapoints are not included?

      As pointed out by the reviewer, we have included all aVOR phase results in Figure 1 supplement and also stated this in the main text (lines 100-102). There is, however, no phase calculated for the OCR, which is a static test, as better illustrated in the new Figure 1E. Error in phase calculations increases as gain gets smaller. To take this into account, the phases corresponding to the grey points (VAF<0.5; corresponding to gains<0.10) were not included in the statistical analysis of the aVOR phase. This point is now made clearer in the methods, lines 639-640.
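      The exclusion rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used in the study: it assumes gain and phase come from a least-squares sinusoidal fit to the eye velocity trace, that the stimulus velocity is `stim_amp * sin(2*pi*f*t)`, and that VAF is defined as one minus the ratio of residual variance to trace variance. The function names `fit_sine`, `vaf` and `gain_and_phase` are hypothetical.

```python
import numpy as np

def fit_sine(t, y, freq_hz):
    """Least-squares fit y ~ a*sin(wt) + b*cos(wt) + c at a known frequency."""
    w = 2 * np.pi * freq_hz
    X = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    a, b, _ = coef
    amplitude = np.hypot(a, b)                # A in A*sin(wt + phi)
    phase_deg = np.degrees(np.arctan2(b, a))  # phi relative to sin(wt)
    return amplitude, phase_deg, X @ coef

def vaf(y, y_fit):
    """Variance accounted for (assumed definition): 1 - var(residual)/var(y)."""
    return 1.0 - np.var(y - y_fit) / np.var(y)

def gain_and_phase(t, eye_vel, stim_amp, freq_hz, vaf_min=0.5):
    """Return (gain, phase or None, VAF); phase is dropped when the fit is poor."""
    amp, phase, fit = fit_sine(t, eye_vel, freq_hz)
    quality = vaf(eye_vel, fit)
    gain = amp / stim_amp
    return gain, (phase if quality >= vaf_min else None), quality
```

      With this convention, a trace that is mostly noise still yields a (small) gain estimate, but its phase is returned as `None` and would be left out of any phase statistics, which mirrors the handling of the grey data points.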

      Reviewer 2: Discussion: As mentioned above, the authors should discuss the mechanisms and implications of the observed phase lead following treatment. Moreover, recent literature showing that VN neurons that make the primary contribution to the VOR (i.e., PVP neurons) tend to show more regular resting discharges than other classes (i.e., EH cells), and that such regularity is needed for the VOR should be discussed (Mackrous et al. 2020 eLife). Specifically, how are type I and type II hair cells related to discharge regularity by central neurons in VN?

      We have now added discussion regarding the mechanisms and implications of the phase changes in lines 363-371. The authors thank reviewer 2 for pointing out the Mackrous et al. 2020 eLife paper, which is now included in the updated discussion. The relation between type I and type II hair cells and discharge regularity in afferents and central VN neurons is further discussed in lines 442-449.

      Below we provide answers to specific recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors):

      Reviewer 1: Were hair cells counted for the whole organ? what was the control for epithelial size differences?

      The effect of the treatment on hair cells was estimated by counting numbers of cells in square area of the central and peripheral parts of the sensory epithelia. The text has been modified to better describe the method, lines 748-751.

      Reviewer 1: The title of the article leads readers to expect more emphasis on hair cell changes, while the content of the manuscript is more functional and encompassing the visual and vestibular systems.

      We have retained the original title.

      Reviewer 1: Please provide acronym definitions before they are used. Examples: HC (line 63), W6 etc (line 82-83)

      Done as suggested on lines 63, 82 and 107.

      Reviewer 1: Please describe the ages of animals used in the study.

      The animals used in the study were 6 weeks old at the beginning of the protocol and 20 weeks old at the end. The text has been modified accordingly, line 564.

      Reviewer 1: Consider changing "until" to "through" when describing time ranges (initially line 88), as the following time mentioned is included in the statement. E.g., line 216-217 sounds as if gain was insignificantly different at W12.

      Done as suggested, lines 88 and 219.

      Reviewer 1: Line 162: lower case for "immunostaining".

      Done, line 164.

      Reviewer 1: Consider regrouping or renumbering panels of Figure 3 for more clarity.

      Panels in Figure 3 were regrouped as suggested, with first the canal-related data in panels A-B followed by the utricle-related data in panels C-D.

      Reviewer 1: Lines 222-223: reword as gain increased not frequency.

      Thank you, the text has been reworded, line 224-225.

      Reviewer 1: It is unclear if the two subgroups revealed in CGR analysis (line 288) are relevant to the two groups described in VOR responses (line 137-138). Please clarify if these are the same mice or distinct clusters.

      The two subgroups found in the CGR analysis differ from the clusters revealed by the decrease of the aVOR gain; the text has been modified lines 300-301.

      Reviewer 1: Consider adding that irregular afferents + calyces are relevant specifically to type I HCs (lines 411-426).

      The text has been modified to clarify the contacts between the two types of vestibular afferents and hair cells, lines 431-435.

      Reviewer 1: Line 434: clarify which "scheme" given context before and after this phrase

      In order to clarify this part of the discussion, the text has been modified and this term no longer appears.

      Reviewer 1: Please indicate the time gap from surgery to treatment.

      The time gap from surgery to treatment, at least 72h, has been added to the methods, line 575.

      Reviewer 1: Line 619-620: It is unclear if VOR and OKR sessions were randomized in order or if the authors have considered training or adaptive effects from the initial test session.

      VOR and OKR sessions were performed on different days to limit cross effects, lines 639-640.

      Reviewer 1: Line 688: typo-change ifG to IgG.

      Modified, line 744.

      Reviewer 1: Line 692-693: were hair cells counted for the whole organ? what was the control for epithelial size differences?

      The effect of the treatment on hair cells was estimated by counting numbers of cells in square area of the central and peripheral parts of the sensory epithelia. The text has been modified to better explain the method, lines 748-751.

      Reviewer 1: Change decimal indicator to be consistent (commas used in lines 732, 759, 776, Figure 6C),

      Thank you; modified as suggested.

      Reviewer 1: Line 763: "stimulation at 0.5Hz & 10°/s" is unclear.

      The text has been modified, line 817.

      Reviewer 1: Line 765: bold (E)

      The font format has been updated, line 820.

      Reviewer 1: Line 770-771: A) insert OKR to be "mean delta aVOR and delta OKR gain", B) plot is OKR as a function of VOR.

      Thank you, done as suggested. The text has been modified, line 824.

      Reviewer 1: Describe Figure 6 delta at initial mention (line 784 instead of 786)

      Thank you, done as suggested, line 839.

      Reviewer 1: It is unclear why the tables are included if never mentioned in the text.

      The tables are now mentioned, lines 90 and 218.

      Reviewer 1: Figure 1: is the observed gain for Sham group expected value rather than closer to 1?

      Yes, as the value reported on Figure 1 is a mean of the values obtained during aVOR test in the dark at frequencies in range 0.2-1Hz (see also Figure 1 Supplement).

      Reviewer 1: Figure 2: A) give enough space to see error bars at W2. Consider making test data more easily distinguishable. B) is OCR mean or specific stimulation? C/D) move 1Hz label from title to x-axis label as it does not describe OCR test. Figure 5: C) consider making color specific to frequency for better distinction on C+D as symbols previously indicated individual data. D) 1Hz specific to OKR? move to axis label instead of title

      The Figures 2 and 5 have been modified according to reviewer 1 suggestions.

      Reviewer 1: Figure 6: A/B) what time point are these, W12?

      Those points correspond to W6 and W12, the text has been updated to specify the time points, lines 834 and 835.

      Reviewer #2 (Recommendations For The Authors):

      The authors should perform additional analyses that will help strengthen their results.

      We are unsure about the exact implementation of this recommendation. However, we have strengthened our results by following all reviewers’ specific recommendations.

    1. Author Response

      Reviewer #1 (Public Review):

      Assessment:

      The manuscript titled 'Rab7 dependent regulation of goblet cell protein CLCA1 modulates gastrointestinal homeostasis' by Gaur et al discusses the role of Rab7 in the development of ulcerative colitis by regulating the lysosomal degradation of Clca1, a mucin protease. The manuscript presents interesting data and provides a potential molecular mechanism for the pathological alterations observed in ulcerative colitis. Gaur et al demonstrate that Rab7 levels are lowered in UC and CD. However, a similar analysis of Rab7 levels in ulcerative colitis (UC) and Crohn's disease (CD) patient samples was conducted recently (Du et al, Dev Cell, 2020) which showed that Rab7 levels are found to be elevated under these conditions. While Gaur et al have briefly mentioned Du et al's paper in passing in the discussion, they need to discuss these contradictory results in their paper and clarify these differences. Additionally, Du et al are not included in the list of references.

      Strengths:

      The manuscript used a multi-pronged approach and compares patient samples, mouse models of DSS, and protocols that allow differentiation of goblet cells. They also use a nanogel-based delivery system for siRNAs, which is ideal for the knockdown of specific genes in the gut.

      Weaknesses:

      Du et al, Dev Cell 2020 (https://doi.org/10.1016/j.devcel.2020.03.002) have previously shown that Rab7 levels are elevated in a similar set of colonic samples (age group, number etc) from UC and CD patients. Gaur et al have not discussed this paper or its findings in detail, which directly contradicts their results. Clarification regarding this should be provided.

      We thank the reviewer for raising this point.

      The results shown by Du et al, Dev Cell, 2020 depict elevated expression of Rab7 in UC and CD patients compared to controls. At first glance, these results appear contradictory, but there are a few possible explanations for this.

      Firstly, Rab7 expression levels may fluctuate in the tissue depending on the degree of gut inflammation. This can be concluded from our observations in the DSS-mice dynamics model and the human patient samples with mild and moderate UC. Furthermore, Du et al provide no information on the severity of the condition among the patients included in their study. Our motive, in the current work, was to emphasise this aspect. This point was mentioned in the discussion section of the manuscript. However, in view of the reviewer’s concern, we now intend to add a detailed comment on this in the main text of the revised version of the manuscript.

      Secondly, the control biopsies in our investigation were acquired from non-IBD patients, unlike in Du et al., where biopsies from the normal para-carcinoma region of colorectal cancer patients were used. One cannot overlook the fact that physiological and molecular changes are apparent even in non-inflamed regions in the gut of an IBD or CRC patient. It is possible that the observed discrepancy arises from the differences in the sample type used for comparing Rab7 expression.

      Finally, the main sub-tissue region showing a decrease in Rab7 expression in UC samples appeared to be the goblet cells, which were not covered by Du et al.

      Keeping these points in mind, we do not think that our findings contradict those of Du et al., 2020. Some of these explanations will be incorporated in the revised submission.

      Include Du et al in the reference list and add the comment in main text.

      This was an oversight on our part. We did mention Du et al., 2020 in the discussion (line number 338), but the reference was missing from the main list. We will ensure that the reference is included in the revised version and that their findings are covered both in the main text and in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors report a role for the well-studied GTPase Rab7 in gut homeostasis. The study combines cell culture experiments with mouse models and human ulcerative colitis patient tissues to propose a model where Rab7, by delivering a key mucous component CLCA1 to lysosomes, regulates its secretion in the goblet cells. This is important for the maintenance of mucous permeability and gut microbiota composition. In the absence of Rab7, CLCA1 protein levels are higher in tissues as well as the mucus layer, corroborating with the anticorrelation of Rab7 (reduced) and CLCA1 (increased) from ulcerative colitis patients. The authors conclude that Rab7 maintains CLCA1 level by controlling its lysosomal degradation, thereby playing a vital role in mucous composition, colon integrity, and gut homeostasis.

      Strengths:

      The biggest strength of this manuscript is the combination of cell culture, mouse model, and human tissues. The experiments are largely well done and in most cases, the results support their conclusions. The authors go to substantial lengths to find a link, such as alteration in microbiota, or mucus proteomics.

      Weaknesses:

      There are also some weaknesses that need to be addressed. The association of Rab7 with UC in both mice and humans is clear, however, claims on the underlying mechanisms are less clear. Does Rab7 regulate specifically CLCA1 delivery to lysosomes, or is it an outcome of a generic trafficking defect? CLCA1 is a secretory protein, how does it get routed to lysosomes, i.e. through Golgi-derived vesicles, or by endocytosis of mucous components? Mechanistic details on how CLCA1 is routed to lysosomes will add substantial value.

      We thank the reviewer for the insightful comment. We offer the following explanation for each of these concerns:

      (a) Our immunofluorescence imaging experiments revealed co-localization of Rab7 protein with CLCA1 and the lysosomes (Fig 7I). In addition, the absence of Rab7 affects the transport of CLCA1 to lysosomes (Fig 7J). This demonstrates that Rab7 may be involved in regulation of CLCA1 transport (presumably along with other cargo), to lysosomes selectively. However, we do recognise that the point raised by the reviewer about possible effect of a generic trafficking defect is valid. (b) As mentioned in the manuscript, the trafficking of CLCA1 protein or CLCA1-containing vesicles within the goblet cell is unknown, with no information on the proteins involved in its mobility. The switching of CLCA1 containing vesicles from the secretory route to lysosomes needs extensive investigation involving overall trafficking of the protein. Taken together, the complete answer to both these important questions will need a series of experiments and those may be interesting avenues for future research.

      (a) Why does the level of Rab7 fluctuate during DSS treatment (Fig 1B)? (b) Does the reduction seen in Rab7 levels (by WB) also reflect in reduced Rab7 endosome numbers?

      This is a very thoughtful point from the reviewer. (a) We detected a distinct pattern of fluctuation in Rab7 expression in intestinal epithelial cells after DSS-dynamics treatment in mice. These changes are perhaps the result of complex cellular signalling in response to the DSS treatment. Rab7, being a fundamental protein of the protein sorting pathway, is expected to change according to the cell's requirements. At present, there are no reports describing the regulatory mechanisms that govern Rab7 levels in the gut. (b) We observed a reduction in Rab7 expression at both the RNA and protein levels. Confirming whether this alteration leads to reduced numbers of Rab7-positive endosomes may require detailed investigation.

      Are other late endosomal (and lysosomal) populations also reduced upon DSS treatment and UC? Is there a general defect in lysosomal function?

      There is no direct evidence showing a reduction in the late endosomal and lysosomal populations during gut inflammation, but a few studies link lysosomal dysfunction with risk for colitis (doi: 10.1016/j.immuni.2016.05.007).

      The evidence for lysosomal delivery of CLCA1 (Fig 7 I, J) is weak. Although used sometimes in combination with antibodies, lysotracker red is not well compatible with permeabilization and immunofluorescence staining. The authors can substantiate this result further using lysosomal antibodies such as Lamp1 and Lamp2. For Fig 7J, it will be good to see a reduction in Rab7 levels upon KD in the same cell.

      We used Lysotracker red in live cells followed by fixation, so permeabilization issues were resolved. Lamp1, as suggested by the reviewer, is definitely a better marker for lysosomes in immunofluorescence studies, but has also been shown to mark late endosomes (doi: 10.1083/jcb.132.4.565). As Rab7 protein also marks late endosomes, using Lamp1 may leave ambiguity as to whether CLCA1 is in Rab7-positive late endosomes or in lysosomes. Nevertheless, we will carry out this experiment and the data will be shared in the revised version of the work.

      In this connection, Fig S3D is somewhat confusing. While it is clear that the pattern of Muc2 in WT and Rab7-/- cells are different, how this corroborates with the in vivo data on alterations in mucus layer permeability -- as claimed -- is not clear.

      The data in Fig. S3D suggest the involvement of Rab7 in the packaging of Muc2. The purpose of this experiment was to support our observation in the Rab7KD mouse model, where the mucus layer was seen to be loose and more permeable in Rab7-deficient mice.

      Overall, the work shows a role for a well-studied GTPase, Rab7, in gut homeostasis. This is an important finding and could provide scope and testable hypotheses for future studies aimed at understanding in detail the mechanisms involved.

      We thank the reviewer for this comment.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study and associated data is compelling, novel, important, and well-carried out. The study demonstrates a novel finding that different chemotherapeutic agents can induce nucleolar stress, which manifests with varying cellular and molecular characteristics. The study also proposes a mechanism for how a novel type of nucleolar stress driven by CDK inhibitors may be regulated. The study sheds light on the importance of nucleolar stress in defining the on-target and off-target effects of chemotherapy in normal and cancer cells.

      We are thankful to the reviewers and the editor for their feedback and thorough assessment of our work. Our responses to the comments and suggestions are below.

      Reviewer #1 (Public Review):

      The study titled "Distinct states of nucleolar stress induced by anti-cancer drugs" by Potapova and colleagues demonstrates that different chemotherapeutic agents can induce nucleolar stress, which manifests with varying cellular and molecular characteristics. The study also proposes a mechanism for how a novel type of nucleolar stress driven by CDK inhibitors may be regulated. As a reviewer, I appreciate the unbiased screening approach and I am enthusiastic about the novel insights into cell biology and the implications for cancer research and treatment. The study has several significant strengths: i) it highlights the understudied role of nucleolar stress in the on- and off-target effects of chemotherapy; ii) it defines novel molecular and cellular characteristics of the different types of nucleolar stress phenotypes; iii) it proposes novel modes of action for well-known drugs. However, there are several important points that should be addressed:

      • The rationale behind choosing RPE cells for the screen is unclear. It might be more informative to use cancer cells to study the effects of chemotherapeutic agents. Alternatively, were RPE cells selected to evaluate the side effects of these agents on normal cells? Clarifying these points in the introduction and discussion would guide the reader.

      RPE1, a non-cancer-derived cell line, was chosen for this study to evaluate the effects of anticancer drugs on normal nucleolar function, with the underlying premise that nucleolar stress in normal cells can contribute to non-specific toxicity. This clarification is added to the introduction. Another factor that played in selecting a normal cell line for the drug screen and subsequent experiments was the spectrum of known and unknown genetic and metabolic alterations present in various cancer cell lines. These variables are often unique to a particular cancer cell line and may or may not impact nucleolar proteome and function. Therefore, the nucleolar stress response can be influenced by the spectrum of alterations inherent to each cancer. Our primary focus was to determine the impact of these drugs under normal conditions.

      That said, the selected hits of main drug classes were validated in a panel of cell lines that included two other hTERT lines (BJ5TA and CHON-002) and two cancer lines (DLD1 and HCT116). In cancer cells starting nucleolar normality scores were lower than in hTERT cells, suggesting that genetic and metabolic changes in these cells may indeed affect nucleolar morphology. Nonetheless, all drugs from a panel of selected hits from different target classes validated in both cancer cell lines (Fig. 2F).

      • Figure 2F indicates that DLD1 and HCT116 cells are less sensitive to nucleolar changes induced by several inhibitors, including CDK inhibitors. It would be crucial to correlate these differences with cell viability. Are these differences due to cell-type sensitivity or variations in intracellular drug levels? Assessing cell viability and intracellular drug concentration for the same drugs and cells would provide valuable insights.

      One of the reasons for the reduced magnitude of the effects of selected drugs in DLD1 and HCT116 cells is their lower baseline normality scores compared to hTERT cells (now shown in Sup. Fig. 1B-C). Other potential factors include proteomic and metabolic shifts and alterations in signaling pathways that control ribosome production. The less-likely possibility of variations in intracellular drug levels cannot be excluded, but measuring this for every compound in every cell line was not feasible in this study. These limitations are now noted in the results section.

      Regarding the point about viability - our initial screen output, in addition to normality scores, included cell count (cumulative count of cells in all imaged fields), which serves as a proxy for viability. By this measure, all hit compounds in our screen were cytostatic or cytotoxic in RPE1 cells (Fig. 2C). The impact of these drugs on the viability of cancer cells that can have various degrees of addiction to ribosome biogenesis merits a separate study of a large cancer cell line panel.

      • Have the authors interpreted nucleolar stress as the primary cause of cell death induced by these drugs? When cells treated with CDK inhibitors exhibit the dissociated nucleoli phenotype, is this effect reversible? Is this phenotype indicative of cell death commitment? Conducting a washout experiment to measure the recovery of nucleolar function and cell viability would address these questions.

      Whether nucleolar toxicity is the primary cause of cytotoxicity for a given chemotherapy drug is an incisive and thought-provoking question. Our screen did not discern whether the cytotoxic effects of our hits were due to inhibition of their intended targets, their impact on the nucleolus, or a combined effect. This point is now mentioned in the results section. Regarding the reversibility of the nucleolar disassembly phenotype seen with CDK inhibitors: in the case of flavopiridol, which is a reversible CDK inhibitor, we demonstrated that nucleoli re-assembled within 4-6 hours after the drug was washed out. An example of this is shown in Sup. Figure 3 and in Video 5. For these experiments, cells were pretreated with the drug for 5 hours, which is not long enough to cause cell death.

      • The correlation between the loss of Treacle phosphorylation and nucleolar stress upon CDK inhibition is intriguing. However, it remains unclear how these two events are related. Would Treacle knockdown yield the same nucleolar phenotype as CDK inhibition? Moreover, would point mutations that abolish Treacle phosphorylation prevent its interaction with Pol-I? Experiments addressing these questions would enhance our understanding of the correlation/causation between Treacle phosphorylation and the effects of CDK inhibition on nucleolar stress.

      We agree that the Treacle finding is interesting and warrants further investigation. In our attempts to knock down Treacle with siRNA, its protein levels were reduced by no more than 50%, which was not sufficient to cause a strong nucleolar stress response. Therefore, these data were not incorporated into the manuscript. However, in our view, Treacle is unlikely to be the only nucleolar CDK substrate whose dephosphorylation is causing the “bare scaffold” phenotype caused by the transcriptional CDK inhibitors. Our phospho-proteomics studies identified multiple nucleolar CDK substrates with established roles in the formation of the nucleolus. For instance, the granular component protein Ki-67 was also dephosphorylated on multiple sites and dispersed throughout the nucleus (shown in Sup. Fig 4). Given that CDKs typically phosphorylate many substrates that can have multiple phosphorylation sites, identifying a sole protein or phosphorylation site responsible for nucleolar disassembly may be an unattainable target.

      Overall, this study is significant and novel as it sheds light on the importance of nucleolar stress in defining the on-target and off-target effects of chemotherapy in normal and cancer cells.

      Thank you, we appreciate the positive and constructive assessment of our study.

      Reviewer #2 (Public Review):

      This is an interesting study with high-quality imaging and quantitative data. The authors devise a robust quantitative parameter that is easily applicable to any experimental system. The drug screen data can potentially be helpful to the wider community studying nucleolar architecture and the effects of chemotherapy drugs. Additionally, the authors find Treacle phosphorylation as a potential link between CDK9 inhibition, rDNA transcription, and nucleolar stress. Therefore I think this would be of broad interest to researchers studying transcription, CDKs, nucleolus, and chemotherapy drug mechanisms. However, the study has several weaknesses in its current form as outlined below.

      1) Overall the study seems to suffer from a lack of focus. At first, it feels like a descriptive study aimed at characterizing the effect of chemotherapy drugs on the nucleolar state. But then the authors dive into the mechanism of CDK inhibition and then suddenly switch to studying biophysical properties of nucleolus using NPM1. Figure 6 does not enhance the story in any way; on the contrary, the findings from Fig. 6 are inconclusive and therefore could lead to some confusion.

      This study was specifically designed to examine a broad range of chemotherapy drugs. The newly created nucleolar normality score enabled us to measure nucleolar stress precisely and in high throughput. Our primary objective was to find drugs that disrupt the normal nucleolar morphology and then study in-depth the most interesting and novel hits. We have made revisions to emphasize that these are the primary focal points of the manuscript.

      As context, we were motivated to explore the biophysical properties of the nucleolus because they are thought to underlie its formation and function, which also suggested a potential predictive value for modeling nucleolar responses to drug treatments. For this, we edited the RPE1 cell line by endogenously tagging NPM1, a granular component protein that behaves in line with the phase-separation paradigm in vitro and when over-expressed. We fully expected to confirm that its behavior in vivo would be consistent with LLPS, but instead found that even in an untreated scenario, the dynamics of endogenous NPM1 could not be fully explained by the phase separation theory (Fig. 6 A-C). Our message is that accurately predicting drug responses using the nucleolar normality score as a readout, based on our current understanding of the biophysical forces governing nucleolar assembly, is unworkable. For instance, normality scores decrease and NPM1 dynamics increase radically when CDKs are inhibited, without changes in NPM1 concentration or concentrations of other protein components (Fig.6 E-H). These observations are important because they highlight our gaps in understanding the relative contribution of phase separation versus active assembly in nucleolar formation. We believe that these observations are worth sharing with the scientific community.

      2) The justification for pursuing CDK inhibitors is not clear. Some of the top hits in the screen were mTOR, PI3K, HSP90, Topoisomerases, but the authors fail to properly justify why they chose CDKi over other inhibitors.

      We decided to focus on CDK inhibitors for several reasons. First, their effects were completely new and unexpected, suggesting the existence of an unknown mechanism regulating nucleolar structure and function. In addition, CDK inhibitors caused a very strong and distinct nucleolar stress phenotype with the lowest normality scores that merited its own term, the “bare scaffold” phenotype. One more reason for pursuing CDK-inhibiting drugs was their high rate of failure in clinics because of the intense and hard-to-explain toxicity. We suspect that this toxicity may be due at least in part to their profound effect on nucleolar organization and ribosome production throughout the body. We stated this rationale more explicitly in the manuscript.

      3) In addition to poor justification, it seems like a very superficial attempt at deciphering the mechanism of CDK9i-mediated nucleolar stress. I think the most interesting part of the study is the link between CDK9, Pol I transcription, and nucleolar stress. But the data presented is not entirely convincing. There are several important controls missing as detailed below.

      We agree with the reviewer that follow-up studies of CDK9, Pol I, and nucleolar stress connection are important long-term goals. However, the primary objective of this study was to ascertain the scope of anticancer agents that can cause nucleolar stress and the establishment of nucleolar stress categories. This is an important advance and could serve as the foundation for a standalone in-depth study or multiple studies. We have included the complete screen, proteomics, and phospho-proteomics results (Sup. Tables 1, 2, and 3), which will enable other investigators to mine the screen information based on their specific interests. Furthermore, we have made multiple text revisions to clarify rationale and interpretation, and incorporated additional data that strengthen the manuscript.

      4) The authors did not test if inhibition of CDK7 and/or CDK12 also induces nucleolar stress. CDK7 and CDK12 are also major kinases of RNAPII CTD, just like CDK9. Importantly, there are well-established inhibitors against both these kinases. It is not clear from the text whether these inhibitors were included in the screen library.

      Our anticancer compound library contained the CDK7 inhibitor THZ1·2HCl, and it was a hit at both 1 and 10 µM concentrations (Sup. Table 1). However, its nucleolar stress phenotype was morphologically distinct from that of CDK9 inhibitors, resembling the stress caps phenotype instead of the bare scaffold phenotype. We did not pursue CDK7 because of its two hard-to-separate functions: in addition to its role as an RNAPII CTD kinase, it also acts as a CDK-activating kinase (CAK) by promoting the associations of multiple CDKs with their cyclin partners. This dual role of CDK7 makes the interpretation of the THZ1-induced nucleolar stress phenotype difficult because it could be attributed to either or both of these functions. Moreover, it was reported to cause DNA damage, which may explain why it causes stress caps. An image depicting the nucleolar stress phenotype caused by THZ1·2HCl is provided in Author response image 1.

      Author response image 1.

      Control and THZ1 - treated RPE1 cells, images from screen plates.

      We are not aware of specific inhibitors of CDK12, as they also reportedly inhibit CDK13. None of the CDK12/CDK13 inhibitors were present in our library, therefore we can neither confirm nor exclude the possible involvement of these kinases in regulating nucleolar structure. Many other existing CDK inhibitors were absent from our library. Our work highlights the importance of assessing their potential to induce nucleolar stress and offers an approach for this assessment.

      5) In Figure 4E, the authors show that Pol I is reduced in nucleolus/on rDNA. The authors should include an orthogonal method like chromatin fractionation and/or ChIP

      We acknowledge the reviewer’s request for additional validation of reduced occupancy of rDNA by Pol I. Nucleolar chromatin fractionation in cells treated with CDK inhibitors is unlikely to work due to nearly complete nucleolar disassembly. Chromatin immunoprecipitation would require finding and validating a suitable ChIP-grade antibody. Moreover, the evaluation of repetitive regions by ChIP is non-trivial and error-prone. To help address this request and further confirm the POLR1A immunofluorescence results in 4E, we included additional immunofluorescence data obtained with a different POLR1A antibody (Sup. Fig. 3D), and the results were similar.

      6) In Fig. 5D, in vitro kinase lacks important controls. The authors should include S to A mutants of Treacle S1299A/S1301A to demonstrate that CDK9 phosphorylates these two residues specifically.

      7) To support their model, the authors should test if overexpression of Treacle mutants S1299A/S1301A can partially phenocopy the nucleolar stress seen upon CDK9 inhibition. This would considerably strengthen the author's claim that reduced Treacle phosphorylation leads to Pol I disassociation from rDNA and consequently leads to nucleolar stress.

      8) Additionally, it would be interesting if S1299D/S1301D mutants could partially rescue CDK9 inhibition.

      Points (6-8):

      We reiterate that transcriptional CDKs target multiple nucleolar proteins, and the observed phenotype might be due to the combined effects of de-phosphorylation of multiple substrates. We concur that deconstructing the role of Treacle phosphorylation sites is very interesting and warrants further in-depth studies. The phospho-proteomics enrichment method, while an effective first-pass strategy, might not capture 100% of the phosphorylated sites. Treacle is a phospho-protein with an abundance of serine and threonine residues. It could potentially have been selectively dephosphorylated on more sites than were detected by this method. Therefore, the suggested mutations may not be the exclusive contributors responsible for the functional phenotype. Additionally, overexpressing Treacle impairs the viability of RPE1 cells, complicating the interpretation of experiments involving overexpression of both wild-type and mutant proteins. A conceivable strategy would involve generating phosphomimetic and non-phosphorylatable mutants by gene editing, studying their interactions by biochemical approaches, and determining their impact on nucleolar function, but this may take years of additional work. We hope that our work will inspire further studies that explore Treacle phosphorylation and other functions of transcriptional CDKs in nucleolar formation.

      Thank you for the thoughtful review and suggestions.

      Reviewer #2 (Recommendations For The Authors):

      1) The manuscript could be re-organized to focus on 'CDK9-Treacle-Pol I-nucleolar stress' as the central part of the story.

      While we acknowledge this suggestion, it's important to emphasize that the primary focus of this manuscript is on the identification of anticancer drugs that induce nucleolar stress and the establishment of nucleolar stress categories.

      2) Include a "no ATP" control in the in vitro kinase assay and indicate molecular sizes.

      We provided an additional kinase assay (Sup. Fig. 4B) that includes no-ATP control lanes and a fragment of a Coomassie blue stained gel showing molecular weight markers. No-ATP control assays (lanes 4 and 5) were blank as expected. Molecular weight markers were added to all other kinase assays based on the known sizes of isolated Pol II holoenzyme subunits Rpb1 (191 kDa) and Rpb2 (138 kDa).

      3) For in vitro phosphorylation, please provide an explanation for using CDK9/cyclin K instead of Cyclin T1 which is the predominant cyclin for CDK9

      Recombinant CDK9/cyclin K complex was used for in vitro kinase assays for a technical reason: CDK9/cyclin T obtained from the same vendor appeared to be of low quality, as it showed only minimal activity toward our positive control, the isolated Pol II complex. The kinase assays using recombinant CDK9/cyclin T in parallel with CDK9/cyclin K are now presented in Sup. Fig. 4B. The first two assays in this experiment contained Pol II as a substrate, and it is evident that Pol II was phosphorylated much more strongly by CDK9/cyclin K than by CDK9/cyclin T (comparing lane 1 vs lane 2). Therefore, the lack of detectable Treacle phosphorylation by CDK9/cyclin T (lane 7), in contrast to strong phosphorylation by CDK9/cyclin K (lane 6), was likely attributable to poor reagent quality rather than physiological differences. We can conclude that CDK9/cyclin K reliably phosphorylates Treacle in vitro, but CDK9/cyclin T kinase assays were inconclusive.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Muthana et al. describes the effect of injection of an antibody specific for human CTLA4 conjugated to a cytotoxic molecule (Ipi-DM1) in knock-in mice expressing human CTLA4. The authors show that Ipi-DM1 administration causes a partial decrease (about 50% in absolute number) of mature B cells in blood and bone marrow 9-14 days after the beginning of treatment. Ipi-DM1 also results in a partial decrease in Foxp3+ Tregs (about 40% in absolute number) and a slight increase in activation of conventional T cells (Tconvs) in the blood at D9. Tconv depletion, CTLA4-Ig or anti-TNF mAb partially prevents the effect of ipi-DM1 on B cells. This work is interesting but has the following major limitations:

      1) This work could have been of more interest if the Ipi-DM1 molecule would be used in the clinic. As this is not the case, the intimate mechanism of the effect of this molecule in mice is of reduced interest.

      The goal of the current study is to use the Ipi-DM1 ADC as a probe to study the mechanism of B cell loss observed in Treg-deficient hosts.

      2) The fact that a partial deletion of Tregs is associated with activation of Tconvs and a decrease in B cells has been published several times and is therefore not new. According to the authors, their work would be the first to show that activation of Tconvs would lead to B cell depletion. However, this is shown in an indirect way and the mechanisms are not really elucidated. Indeed, this work shows a correlation between an increase in Tconv activation and a decrease in the number of B cells in the blood. The experiments to try to show a causal link are of 2 types: deletion of T cells (Fig 4) and blocking T cell activation with CTLA4-Ig (Fig 5) (neutralization of TNF addresses another question). Neither of these 2 experiments is totally convincing. Indeed, the absence of B cell depletion when T cells are deleted can be explained by other mechanisms than the preservation of B cell destruction by activated T cells. The phenomenon could be explained by B cell recirculation to lymphoid tissues or an effect of massive T cell death for example. The experiment shown in Fig. 5 with Belatacept is more convincing because this time the effect is targeted to activated T cells only. However, the prevention of B cell ablation is only partial. Again, since only blood is analyzed, other mechanisms could explain the B cell loss, such as their recirculation in lymphoid tissues.

      While it has been previously published that Treg depletion leads to activation of Tconv cells and reduced B cells, the B cell loss was explained on the basis of defective B cell lymphopoiesis due to low production of stromal cell-derived IL-7 or destruction of stromal cells by effector T cells. Our new data establish that the loss of B cells in the context of Treg depletion is not due to defects in the number of pre-/pro-B cells. Rather, it is due to the death of mature B cells in the bone marrow.

      To address the reviewer’s concern that the B cell loss was merely caused by a change in circulation pattern, we performed a new study on the effect of the ADC on B cells in bone marrow. Our new data reveal a loss of mature bone marrow B cells, and that this loss is associated with increased apoptosis of mature B cells. Therefore, the loss of B cells in the peripheral blood is not due to altered circulation. Furthermore, our data show that B cell progenitors (pre-B cells) are not changed. Therefore, defective B cell lymphopoiesis is not the reason for B cell loss in our model system.

      3) It is disappointing that only the blood (and sometimes the bone marrow) was studied in this work. The interest of doing experiments in mice is to have access to many tissues such as the spleen, lymph nodes, colon, lung, and liver. To conclude that there is B cell deletion without showing lymphoid organs (where the majority of B cells reside) is insufficient. As discussed above, the drop in B cells in the blood could be due to their recirculation in lymphoid organs. In addition, there is no measurement of functional B cells activity. Do mice treated with Ipi-DM1 have a decreased ability to develop an antibody response following immunization?

      We have analyzed lymph nodes and spleen at the same time points. Unfortunately, Treg depletion was no longer observed at these time points. As expected, we did not see a clear depletion of B cells (Figure 1-figure supplement 6). Regarding functional B cell activity, we observed an increase in plasma immunoglobulins, especially IgE, which is now shown in Figure 3-figure supplement 1.

      4) Although it is difficult to study in vivo, there is not a single evidence of increased B cell death after injection of Ipi-DM1.

      Figure 2 and Figure 2-supplement 1 provide B cell death comparisons between the Ipi-DM1 and hIgGFc groups for bone marrow, blood, spleen, and lymph nodes. A statistically significant increase in B cell death is observed in mature B cells in bone marrow.

      5) In most of the experiments, B cells are quantified with the B220 marker alone, but this marker, in some cases, can be expressed by other cells. It would have been preferable to use a marker more specific to B cells such as CD19 for example.

      We have added data to support the death of mature B cells using other markers.

      Minor points.

      1) It should be indicated whether human CTLA4 binds normally to mouse CD80 CD86. We do not know if knock-in mice with human CTLA4 have a fully functional immune system.

      We have indicated this point as suggested and cited our previous work (lines 226-227; refs 23 and 24).

      2) The manuscript is too long. Some of the data in the figures should be moved to supplemental figures. This is the case, for example, for some trivial stainings (Fig 1F, Fig 4B, 4F, Fig 5A, D, F, G). The figure legends and the Materials and Methods section are far too long. On the other hand, Fig 5-Fig Sup 1 could go into the main figures.

      The figure legends and the Materials and Methods section may be long, but our intention is to provide as much information as possible for others who may be interested in our model system.

      3) The anti-CTLA4 ADC reagent should be better explained and defined in the text.

      The synthesis of the anti-CTLA-4 ADC reagent is described in the Materials and Methods under “Antibody-drug conjugate preparation.”

      Reviewer #2 (Public Review):

      Despite the fact that CTLA-4 is a critical molecule for inhibiting the immune response, surprisingly individuals with heterozygous CTLA-4 mutations exhibit immunodeficiency, presenting with antibody deficiency secondary to B cell loss. Why the loss of a molecule that regulates T cell activation should lead to B cell loss has remained unclear. In this study, Muthana and colleagues use an anti-CTLA-4 antibody drug conjugate (aCTLA-4 ADC) to delete cells expressing high levels of CTLA-4, and show that this leads to a reduction in B cells. The aCTLA-4 ADC is found to delete a subset of Tregs, leading to hyperactivation of T cells that is associated with B cell depletion. Using blocking antibodies, the authors implicate TNFa in the observed B cell loss.

      The reciprocal regulation of T and B cell homeostasis is an important research area. While it has been shown that Treg defects are associated with B cell loss, the mechanisms at play are incompletely understood. CTLA-4 is not normally expressed in B cells so an indirect mechanism of action is assumed. The authors show that the decrease in Treg following aCTLA-4 ADC treatment is associated with activation of T cells, and that B cell loss is blunted if T cells are depleted. A role for both CD4 and CD8 T cells is identified by selective CD4/CD8 depletion. T cells appear to require CD28 costimulation in order to mediate B cell loss, since the response is partially inhibited in the presence of the costimulation blockade drug belatacept (CTLA-4-Ig). Finally, experiments using the anti-TNFa antibody adalimumab suggest a potential role for TNFa in the depletion of B cells.

      While the manuscript makes a useful contribution, a number of questions remain. Perhaps most important is the extent to which this model mimics the natural situation in individuals with CTLA-4 mutations (or following CTLA-4-based clinical interventions). aCTLA-4 ADC treatment permits acute deletion of Treg expressing high levels of CTLA-4, whereas in patients the Treg population remains but is specifically impaired in CTLA-4 function. Secondly, although the requirement for T cells to mediate B cell loss is convincingly demonstrated, the incomplete reversal by TNFa blockade suggests additional unidentified factors contribute to this effect. Finally, although the manuscript favours peripheral killing of mature B cells over alterations to B cell lymphopoiesis, one concern is that this may simply reflect the model employed: the short-term (6 day) treatment used here may be too acute to alter B cell development, but this may nevertheless be a feature of prolonged immune dysregulation in humans.

      We appreciate the reviewer’s comments. The difference between short-term depletion and permanent inactivation of Treg by genetic mutation is discussed. We would note that, apart from mutation, dynamic Treg perturbation does occur under autoimmune conditions. Therefore, our data have significant implications for T-B cell interactions.

      TNF-alpha is implicated in B cell loss, as evidenced by the partial rescue with anti-TNF treatment. We did not try to exclude the possibility that other mechanisms are involved.

      Our data show loss of circulating B cells in peripheral blood and of mature B cells in bone marrow. B cell progenitors (Pre-B cells) are not changed by Ipi-DM1-induced Treg impairment; therefore, altered B cell lymphopoiesis is not the reason for B cell loss in our model system. Evidence of increased cell death is only observed in mature B cells (Figure 2).

      1) Following aCTLA-4 ADC treatment, it is surprising how subtle the deletion of Treg is (from ~8% to ~7%, Fig 1G), compared to the marked deletion of CTLA-4-expressing CHO cells. Is this a feature of in vivo versus in vitro treatment? If Treg are treated in vitro is deletion more efficient? How does the expression level of CTLA-4 in the CHO cells compare with the Treg in these assays?

      We appreciate the reviewer’s comments. The anti-CTLA-4 ADC targets CTLA-4 on the cell surface. On average, about 5% of Tregs express surface CTLA-4 at any given moment, whereas >90% of cells in the human CTLA-4-expressing CHO cell line stain positive. Nevertheless, the Treg cell number in peripheral blood is reduced by >40%. Additionally, we have included bone marrow data, which show a greater percentage of Treg depletion (Figure 1J).

      2) The decrease in CTLA-4 seen after ipi-DM1 is complicated by the fact that the control DM1 conjugate (IgG1-DM1) appears to significantly increase CTLA-4 expression (Fig 1 supplement 2). It would be useful to clarify when hIgGFc is used versus hIgGFc-DM1 given the additional complexity introduced here (comparisons lacking a payload differ in more than one variable, while the hIgGFc-DM1 is clearly not inert).

      We appreciate the reviewer’s comments. We agree that the hIgGFc-DM1 control slightly increased the CTLA-4 level; nevertheless, it did not alter B cells, T cells, or their proliferation capacity when compared to hIgGFc. Our point here is that B cell depletion is not mediated by off-target release of the DM1 payload (new version: Figure 1-figure supplement 4; old version: Figure 1-figure supplement 2). As for when hIgGFc is used versus hIgGFc-DM1, this is clarified in the figure legend. Comparisons are made between hIgGFc and Ipi-DM1, or between hIgGFc and hIgGFc-DM1.

      3) T cell-derived IFNg is another potential contender for influencing B cell homeostasis - have you considered testing whether this also contributes in your model?

      We appreciate the reviewer’s suggestion. IFN was reported to induce apoptosis and cell arrest in Pre-B cells; however, these were in vitro studies (Garvey et al., Immunology. 1994 Mar; 81(3): 381–388; Grawunder et al., Eur. J. Immunol. 23, 544–551). Since we did not observe any effect on Pre-B cells, we have not followed this literature to investigate the role of IFNγ in B cell loss in our model.

      Reviewer #3 (Public Review):

      The co-suppressive molecule CTLA-4 has a critical role in the maintenance of peripheral tolerance, primarily by Treg mediated control of the co-stimulatory molecules CD80 and CD86. As stated by the authors, previous studies have found a variety of effects of anti-CTLA-4 antibody treatment or genetic loss of CTLA-4 on B-cells. These include increased B-cell activation and antibody production, autoantibody production, impairment of B-cell production in the bone marrow and loss of peripheral B-cells. In this article Muthana et al use a CTLA-4 humanized mouse model and examine the effects of drug conjugated CTLA-4 on the immune system. They observe a transient loss of B-cells in the blood of the treated mice. They then use a range of immune interventions such as T-cell depletion and blocking antibodies to demonstrate that this effect is dependent on T-cell activation.

      Since anti-CTLA-4 immunotherapy is in active clinical use, exploration of its effects is welcome; this is helped by the use of a humanized CTLA-4 system, which should be considered a strength of the paper. However, currently, the central premise of this paper, that B-cells are depleted, seems underexplored. Direct evidence of T-cell killing of B-cells is never presented; rather, it is inferred from the reduced numbers of B-cells in the blood. The status of B-cells in sites that contain a large proportion of B-cells, such as the spleen and lymph nodes, is not examined. Additionally, no examination of B-cell antibody production is performed.

      We appreciate the reviewer’s comments. To address the reviewer’s concerns, we performed additional experiments to evaluate the impact on B cells in other organs, as detailed in our responses to specific questions.

      1) Examination of B-cell apoptosis/cell death and T-cell mediated cytotoxicity is needed. The authors repeatedly refer to auto destructive T-cells without ever demonstrating their presence or any direct evidence that B-cells are dying. This is particularly important in the context of the blood since an alternative hypothesis would be a change in B cell trafficking and infiltration into tissues.

      We appreciate the reviewer’s comments. To address the reviewer’s concern that B cell loss in blood might be caused by a change in B cell trafficking pattern, we performed a new study on the effect of the ADC on B cells in bone marrow. Our new data reveal loss of mature bone marrow B cells, and this loss is associated with increased apoptosis of mature B cells (Figure 2). Therefore, the loss of B cells in the peripheral blood is not due to B cell trafficking and infiltration into tissues.

      2) The authors demonstrate that B-cells are mostly reduced in blood at around days 10 to 15, I believe it is critical to determine if this is also reflected in the lymphoid organs such as the spleen and lymph nodes.

      We appreciate the reviewer’s comments. We have analyzed lymph node and spleen at the same time points. Unfortunately, Treg depletion was no longer observed at these time points. As expected, we did not see a clear depletion of B cells (Figure 1-figure supplement 6).

      3) Related to the above point do the authors see evidence of Splenomegaly or lymphadenopathy?

      We appreciate the reviewer’s comment. Evidence of splenomegaly and lymphadenopathy is presented in Figure 3-figure supplement 2.

      4) Minimal examination of the status of the B-cells or antibody production is performed. Previous reports would suggest that plasma cell induction and antibody responses may be expected. Do serum antibody levels change in this system?

      We appreciate the reviewer’s comment. Increases in plasma immunoglobulins, especially IgE, are now shown in Figure 3-figure supplement 1.

      5) It's unclear how the authors interpret their experiment with anti-TNFa (figure 6). Are they suggesting that TNFa itself depletes B-cells or that it is part of the inflammatory milieu that contributes to wider T-cell activation and, in turn, B-cell depletion?

      We have discussed these possibilities in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for reviewing and assessing our paper. Reviewer 2 had only positive comments. Reviewer 1 also had positive comments but included a list of suggestions. The revised version includes text edits to address the suggestions.

      Reviewer 1:

      … First, it is unclear whether the experiments and analyses were set up to be able to rule out more specific candidate functions of the ZI.

      The list of possible functions performed by the ZI is broad. Nevertheless, our study considers a rather long list of neural processes related to the behaviors listed below.

      Second, many important details of the experiments and their results are hard to decipher given the current descriptions and presentations of the data.

      The procedures used in the present study have all been used and described in our previous studies (cited). We used the same descriptions and presentations as in the prior studies. We have gone over the Methods and figures to ensure that all details required to understand the experiments are provided, but we also added further details following the suggestions noted below.

      The paper could be significantly strengthened by including more details from each experiment, stronger justifications for the limited behaviors and experimental analyses performed, and, finally, a broader analysis of how the recorded activity in the ZI relates to behavioral parameters.

      The paper studied several behaviors, including: 1) spontaneous movement of head-fixed mice on a spherical treadmill, 2) tactile (whisker and body parts) and auditory (tones and white noise) stimuli applied to head-fixed mice, 3) spontaneous movement initiation, change, and turns in freely moving mice, 4) auditory tone (frequency and SPL) mapping in freely behaving mice, 5) auditory-evoked orienting head movements (responses) in the context of several behavioral tasks, 6) signaled active avoidance responses and escapes (AA1), 7) unsignaled/signaled passive avoidance responses (AA2ITI/AA3-CS2), 8) sensory discrimination (AA3), 9) CS-US interval timing discrimination (AA4), and 10) US-evoked unsignaled escape responses.

      In freely moving experiments, the behavior is continuously tracked and decomposed into translational and rotational movement components. Discrete responses are also evaluated (e.g., active avoids, escapes, passive avoids, errors, intertrial crossings, latencies, etc.). These behavioral procedures evaluate many neural processes, including decision making (Go/NoGo in AA1-3), response control/inhibition (unsignaled and signaled passive avoidance in AA2/3), and stimulus discrimination (AA3). The applied stimuli, discrete responses, and tracked movement are always related to the recorded ZI activity using a variety of techniques (e.g., cross-correlations, PSTHs, event-triggered time extractions, etc.), which relate the discrete and time-series parameters to the neural activity. We do not think all this qualifies as “limited behaviors”.

      (1) Anatomical specification: The ZI contains many distinct subdivisions--each with its own topographically organized inputs/outputs and putative functions. The current manuscript doesn't reference these known divisions or their behavioral distinctions, and one cannot tell exactly which portion(s) of the ZI was included in the current study. Moreover, the elongated structure of the ZI makes it very difficult to specifically or completely infect virally. The data could be better interpreted if the paper included basic information on the locations of recordings, the extent of the AAV spread in the ZI in each viral experiment, and what fraction of infected neurons were inside versus outside ZI.

      Our experiments employed Vgat-Cre mice to target ZI neurons. In this line, GABAergic neurons from the entire ZI express Cre, including the dorsal and ventral subdivisions (see Vong et al., 2011; Hormigo et al., 2020). Consequently, AAV injections in Vgat-Cre mice produce restricted expression in the ZI that can fully delineate the nucleus, as shown in the papers referenced above (including ours). There is nil expression in structures above or below ZI because they do not express Cre in these mice (e.g., thalamus and subthalamic nucleus), which allows for selective targeting of ZI. Our optogenetic manipulations and photometry recordings were not aimed at specific ZI subdivisions. We targeted the area of ZI indicated by the stereotaxic coordinates (see Methods), which are aimed at the center of the structure to maximize success in recording/manipulating neurons within ZI. While all the animals included in the study expressed opsins and GCaMP within ZI that in many animals fully delineated the nucleus, there was normal variability in the location of optical fibers, but we did not detect any differences in the results related to these variations.

      Fiber photometry and optogenetics experiments are performed with rather large diameter optical probes, which record/manipulate relatively large areas of the targeted structure. This is useful because our goal was to identify functional roles of the entire ZI, which could then be parsed. In the present study, we did not perform experiments to target specific ZI populations (e.g., retrograde Cre expression from target areas), which may have revealed differences attributed to their projection sites. However, in the last experiment, we selectively excited ZI fibers targeting three different areas (midbrain tegmentum, superior colliculus, and posterior thalamus), which revealed clear differences on movement. Thus, future experiments should explore these different populations (e.g., using retrograde/anterograde expression systems), which may be in different subdivisions.

      We have enhanced the Methods section to clarify these points, including the addition of these references.

      (2) Electrophysiological recording on the treadmill: The authors are commended for this technically very difficult experiment. The authors do not specify, however, how they knew when they were recording in ZI rather than surrounding structures, particularly given that recording site lesions were only performed during the last recording session. A map of the locations of the different classes of units would be valuable data to relate to the literature.

      We have added details about this procedure in the Methods section. These recordings are performed based on coordinates, and categorizing neurons as belonging to ZI is obviously an estimate based on the final histological verification. Nevertheless, the marking lesions revealed that the electrodes were on target, which likely resulted from the care taken during the surgical procedure to define reference points used later during the recording sessions (see Methods). Regarding a map of the unit locations, we performed several analyses that did not reveal clear differences based on site. For example, we compared depth vs cell class, “There was no difference in recording depth between the four classes of neurons (ANOVA F(3,337)= 1.06 p=0.3676)”. Future experiments that employ additional methods (labelling, opto-tagging, etc.) would be more appropriate to address mapping questions. Finally, as we state in the paper, “However, these recordings do not target GABAergic neurons and may sample some neurons in the tissue surrounding the zona incerta. Therefore, we used calcium imaging fiber photometry to target GABAergic neurons in the zona incerta”.

      (3) The rationale of the analysis of activity with respect to “movement peak”: It is unclear why the authors did not assess how ZI activity correlates with a broad set of movement parameters, but rather grouped heterogeneous behavioral epochs to analyze firing with respect to “movement peaks”.

      The reviewer is referring to movement peaks on the spherical treadmill. On the treadmill, we used the forward locomotor movement of the animal because this is the main activity of the mice on the treadmill. We considered “all peaks” (or movements) and “>4 sec peaks”, which select for movement onsets. Compared to the treadmill, in freely moving conditions during various behavioral tasks, there is a richer behavioral repertoire, which was analyzed in more detail (i.e., translational and rotational components during spontaneous ongoing movement and movement onsets, movement related to various behaviors such as orienting, active and passive avoidance, escape, sensory stimulation, discrimination, etc.). Thus, we focused on a broader set of movement parameters in the Cre-defined ZI cells of freely behaving mice.

      (4) The display of mean categorical data in various figures is interesting; however, the reader cannot gather a very detailed view of ZI firing responses or potential heterogeneity with so little information about their distributions.

      The PCA performs the heterogeneity classification in an unbiased manner, which we feel is a thoughtful approach. The firing rates and correlations with movement for each category of neurons are detailed in the results. Furthermore, the sensory responses for these neurons are also detailed. Together, we think this provides a detailed view of the units we recorded in awake/head-fixed mice. As already stated, further study would benefit from an additional level of cell site verification.

      (5) Somatosensory firing responses in ZI: It is unclear why the authors chose the specific stimuli used in the study. How often did they evoke reflexive motor responses? What was the latency of sensory-evoked responses in ZI activity and the latency of the reflexive movement?

      These are broad questions, and we assume that the reviewer is asking about somatosensory evoked responses on the spherical treadmill. We used air-puffs applied to the whiskers and on the back (left vs right) because the whiskers represent an important sensory representation for mice, and the back is a part of the body (trunk), which we often use to motivate the animals to move forward on the treadmill. Regarding the latency of the somatosensory evoked responses, in this case, we did not correct them based on the time it takes the air-puff to travel to the whiskers or body part, and therefore we did not provide latencies. Moreover, air-puffs are not a very good method to quantify whisker-evoked latencies, which are better measured using other methods (whisker deflections of single/multiple whiskers using piezo-devices or other mechanical devices, as we and others have done in many studies). We are not sure what the reviewer means by “reflexive behavior”; we did not measure any reflexive behavior under these conditions. We have gone over the Methods and Results to ensure that sufficient details are provided about these experiments.

      (6) It would be valuable to see example traces in Figure 3 to get a better sense of the time course and contexts under which Ca signals in ZI tracks movement. What is the typical latency? What is the typical range of magnitudes of responses? Does the Ca signal track both fast and slow movements? How are the authors sure that there are no movement artifacts contributing to the calcium imaging? It seems there is more information in the dataset that could be valuable.

      As is well known, fiber photometry calcium imaging is a slow population signal. We do not think it would be valuable to get into timing issues beyond what is already detailed in the study (i.e., magnitudes measured as areas or peaks, and timing as time-to-peaks). Regarding “movement artifacts”, these signals are absent (flat) in animals that do not express GCaMP. We agree that there must be additional valuable information in our datasets (as in most time-series). However, the current paper is already rather extensive. We will continue to peruse our datasets and report additional findings in new papers.

      (7) Figure 4: The rationale for quantifying the F/Fo responses over a 6-second window, rather than with respect to discrete movement parameters, is not well explained. What types of movement are binned in this approach and might this broad binning hinder the ability to detect more specific relationships between activity and movement?

      Figure 4 is focused on characterizing the relationship between turns (ipsiversive and contraversive) during movement and ZI activity. We tested different binning windows to find differences, including the 6 sec window in figure 4 for population measures (-3 to 3 sec around the turns). This binning approach is effective at revealing differences where they exist (e.g., superior colliculus) as shown in our previous studies (e.g., Zhou et al., 2023). Moreover, the turns in the different directions can be considered discrete responses at their peak, and the timing of the related activations (e.g., time to peaks), which we evaluated, are rather sensitive and would have revealed differences, but we did not find them.
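
      As an illustration of the kind of windowing described in this response, the following minimal Python sketch extracts a -3 to +3 s window around each turn, averages the windows, and reports an area and a time to peak. It is not the analysis code used in the study: the function names, the 10 Hz sampling rate, and the toy signal are all assumptions made only for this example.

```python
def event_window(signal, fs, event_idx, pre_s=3.0, post_s=3.0):
    """Return the samples from pre_s before to post_s after an event.

    Returns None when the window would run off either end of the recording.
    """
    pre = int(round(pre_s * fs))
    post = int(round(post_s * fs))
    start, stop = event_idx - pre, event_idx + post + 1
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]


def mean_trace(windows):
    """Average equally sized windows sample by sample."""
    return [sum(col) / len(col) for col in zip(*windows)]


def area_and_time_to_peak(trace, fs, pre_s=3.0):
    """Rectangle-rule area of the trace and time of its maximum relative to the event."""
    area = sum(trace) / fs
    peak_idx = max(range(len(trace)), key=lambda i: trace[i])
    return area, peak_idx / fs - pre_s


# Hypothetical data: a flat signal with a unit bump 0.5 s after each of two turns.
fs = 10.0                       # assumed sampling rate (Hz)
signal = [0.0] * 200
signal[105] = 1.0
signal[155] = 1.0
turn_indices = [100, 150, 198]  # the last turn is too close to the end and is skipped

windows = [w for w in (event_window(signal, fs, i) for i in turn_indices) if w is not None]
trace = mean_trace(windows)
area, t_peak = area_and_time_to_peak(trace, fs)
print(len(windows), area, t_peak)
```

      Computing the same measures separately for ipsiversive and contraversive turns, and for several window lengths, would be one way to make such a comparison explicit.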

      (8) Separation of sensory and motor responses in Figure 5: The current data do not adequately differentiate whether the responses are sensory or motor given the high correlation of the sensory inputs driving motor responses. Because isoflurane can diminish auditory responses early in the auditory pathway, this reviewer is not convinced the isoflurane experiments are interpretable.

      The reviewer is referring to Fig. 5C,D. Indeed, the point of this experiment was to show that it is difficult to differentiate whether neural responses are sensory or motor in awake and freely moving conditions. As we stated in the Results section, “Although arousal and movement were not dissected in the present experiment (this would likely require paralyzing and ventilating the animal), the results indicate that activation of zona incerta neurons by sensory stimulation is primarily associated with states when sensory-evoked movement is also present”. This is followed in the Discussion by, “…as already noted, the suppression of sensory responses may be due to changes in arousal (Castro-Alamancos, 2004; Lee and Dan, 2012) and not caused by the abolishment of the movements per se”.

      (9) Given the broad duration of the mean avoidance response (Fig. 6 C, bottom), it would be useful to know to what extent this plot reflects a prolonged behavior or is the result of averaging different animals/trials with different latencies. Given that the shapes of the F/Fo responses in ZI appear similar across avoids and escapes (Fig. 6D), despite their apparent different speeds and movement durations (Fig 6C), it would be valuable to know how the timing of the F/Fo relates to movement on a trial-by-trial basis.

      The duration of the avoidance response cannot be ascertained from CS onset (panel 6C bottom) and avoids are not wide but rather sharp. We have now made this clearer when Fig. 6C is first mentioned (“note that since avoids occur at different latencies after CS onset they are best measured from their occurrence as in Fig. 6D”). Like other related conditioned and unconditioned responses, avoids and escapes are similar, varying in the noted parameters. Regarding timing, as already mentioned above, we think that the characteristics of the population calcium signal make it unsuitable for further timing considerations than what we included, particularly for movements occurring at the fast speeds of avoids and escapes.

      (10) Lesion quantification: One cannot tell what rostral-caudal extent of ZI was lesioned and quantified in this experiment. It would be easier to interpret if also plotted for each animal, so the reader can tell how reliable the method is. The mean ablation would be better shown as a normalized fraction of cells. Although the authors claim the lesions have little impact on behavior, it appears the incompleteness of the lesions could warrant a more conservative interpretation.

      The lesion experiment was a complement to the optogenetics inactivation experiments we performed in our preceding ZI paper and in the present paper. Thus, the finding that the lesions had little impact on behavior is supportive of the optogenetics findings. Regarding cell counts, we did not select any parts of the ZI to quantify the number of neurons in either control or lesion mice. We considered the full rostrocaudal extent in our measurements. We are not sure what “fraction” the reviewer is suggesting, considering that these counts are from two different groups of mice (control vs lesion). Note that the red-marked neurons, as shown in Fig. 8A, reveal healthy non-Vgat-Cre neurons outside ZI that mark the extent of the AAV diffusion, which as shown spanned the full extent of the ZI in the coronal plane (and in other planes as the AAV spreads in all directions).

      (11) Optogenetics: the location of infected neurons is poorly described, including the rostral-caudal extent and the fraction of neurons inside and outside of ZI. Moreover, it is unclear how strongly the optogenetic manipulations in this study are expected to affect neuronal activity in ZI.

      We discussed the first point in (1) above. Regarding how optogenetic manipulations are expected to affect neuronal activity in ZI and its targets, we have conducted extensive electrophysiological recordings in slices and in vivo to detail the effects of our manipulations on GABAergic neurons (e.g., Hormigo et al., 2016; Hormigo et al., 2019; Hormigo et al., 2021a; Hormigo et al., 2021b), including ZI neurons (Hormigo et al., 2020). In fact, we never use an opsin we have not validated ourselves using electrophysiology. Moreover, our experiments employ a spectrum of optogenetic light patterns (including trains/Cont at different powers) that titrate the optogenetic effects within each session/animal. As shown in Fig. 11 and 12, these patterns produce different behavioral effects related to the different levels of neural firing they induce. For ChR2-expressing neurons in ZI, firing is frequency dependent and maximal during Cont blue light (at the same power). For Arch-expressing neurons only Cont is used, and inhibition is a function of the green light power. When blue light is applied in ZI fibers targeting different areas, this relationship changes. Blue light trains (1-ms pulses) at 40-66 Hz become the most effective means of inducing sustained postsynaptic inhibition compared to Cont or low frequencies.

      References

      Castro-Alamancos MA (2004) Dynamics of sensory thalamocortical synaptic networks during information processing states. Progress in Neurobiology 74:213-247.

      Hormigo S, Vega-Flores G, Castro-Alamancos MA (2016) Basal Ganglia Output Controls Active Avoidance Behavior. J Neurosci 36:10274-10284.

      Hormigo S, Zhou J, Castro-Alamancos MA (2020) Zona Incerta GABAergic Output Controls a Signaled Locomotor Action in the Midbrain Tegmentum. eNeuro 7.

      Hormigo S, Zhou J, Castro-Alamancos MA (2021a) Bidirectional control of orienting behavior by the substantia nigra pars reticulata: distinct significance of head and whisker movements. eNeuro.

      Hormigo S, Vega-Flores G, Rovira V, Castro-Alamancos MA (2019) Circuits That Mediate Expression of Signaled Active Avoidance Converge in the Pedunculopontine Tegmentum. J Neurosci 39:4576-4594.

      Hormigo S, Zhou J, Chabbert D, Shanmugasundaram B, Castro-Alamancos MA (2021b) Basal Ganglia Output Has a Permissive Non-Driving Role in a Signaled Locomotor Action Mediated by the Midbrain. J Neurosci 41:1529-1552.

      Lee SH, Dan Y (2012) Neuromodulation of brain states. Neuron 76:209-222.

      Vong L, Ye C, Yang Z, Choi B, Chua S, Jr., Lowell BB (2011) Leptin action on GABAergic neurons prevents obesity and reduces inhibitory tone to POMC neurons. Neuron 71:142-154.

      Zhou J, Hormigo S, Busel N, Castro-Alamancos MA (2023) The Orienting Reflex Reveals Behavioral States Set by Demanding Contexts: Role of the Superior Colliculus. J Neurosci 43:1778-1796.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editor and the reviewers for their very useful and constructive comments. We went through the list and gladly received all their suggestions. The reviewers mostly pointed to minor revisions in the text, and we acted on all of those. The one suggestion that required major work was the one raised in point 13, about the processing pipeline being unconvincingly scattered between different tools (R → Python → Matlab). I agree that this was a major annoyance, and I am happy to say we have solved it by integrating everything into a recent version of the ethoscopy software (available on bioRxiv at https://www.biorxiv.org/content/10.1101/2022.11.28.517675v2 and in press with Bioinformatics Advances). End users will now be able to perform coccinella analysis using ethoscopy only, thus relying on nothing but Python as their data analysis tool. This revised version of the manuscript now includes two Jupyter Notebooks as supplementary material with a “pre-cooked” sample recipe of how to do that. This should simplify adoption and provides more details on the pipeline used for phenotyping.

      Please find below a point-by-point description of how we incorporated all the reviewers’ excellent suggestions.

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      1) Line 38: "collecting data simultaneously from a large number of individuals with no or limited human intervention" is a bit misleading, as the entire condition the individuals are put in are highly modified by humans and most times "unnatural". I understand the point that once the animals are placed in these environments, then recording takes place without intervention, but it would be nice to rephrase this so that it reflects more accurately what is happening.

      We have now rephrased this into the following (L39):

      Collecting data simultaneously from a large number of individuals, which can remain undisturbed throughout recording.

      2) Line 63: please add a reference to the Ethoscopes so that readers can easily find it.

      Done.

      2b) And also add how much they cost and the time needed to build them, as this will allow readers to better compare the proposed system against other commercially available ones.

      This information is available on the ethoscope manual website (http://lab.gilest.ro/ethoscope). The price of one ethoscope, provided all necessary tools are available, is about £75, and the building time very much depends on the skillset of the builder and on whether they are building their first ethoscope or subsequent ones. In our experience, building and adopting ethoscopes for the first time is no more time-consuming than building, for example, a DeepLabCut setup for the first time. We have added this information at L81:

      Ethoscopes are open source and can be manufactured by a skilled end-user at a cost of about £75 per machine, mostly building on two off-the-shelf components: a Raspberry Pi microcomputer and a Raspberry Pi NoIR camera overlooking a bespoke 3D printed arena hosting freely moving flies.

      3) Line 88: The authors describe that in the current setting, their system is capable of an acquisition rate of 2.2 frames per second (FPS). Would reducing the resolution of the PiCamera allow for higher FPS? I raise this point because the authors state that max velocity over a ten second window is a good feature for classifying behaviors. However, if animals move much faster than the current acquisition rate, they could, for instance, be in position X, move about and be close to the initial position when the next data point is acquired, leading to a measured low max velocity, when in fact the opposite happened. I think it would be good to add a statement addressing this (either data from the literature showing that the low FPS does not compromise data acquisition, or a test where increasing greatly FPS leads to the same results).

      We have previously performed a comparison of data analysed using videos captured at different FPSs, which is published in Quentin Geissman’s doctoral thesis (2018, DOI: https://doi.org/10.25560/69514), chapter 2, section 2.8.3, figure 2.9. We have now added this work as one of the references at L95 (reference 19).

      4) Still on the low FPS, would a Raspberry Pi 4 help with the sampling rate? Given that they are more powerful than the RPi3 used in the paper?

      It would, but it would be a minor increase, probably from 2.2 to 3-5 FPS. A significantly higher number of FPSs would be best achieved by lowering the camera’s resolution, as the reviewer suggested, or by operating offline. I think the interesting point being implied by the reviewers is that, for Drosophila, the current limits of resolution are more than sufficient. For other animals, perhaps moving more abruptly, they may not be. The reviewer is right that we should add a line of caveat about this. We now do so in the discussion, lines 215-224.

      Coccinella is a reductionist tool, not meant to replace the behavioural categorization that other tools can offer but to complement it. It relies on Raspberry Pis as main acquisition devices, with associated advantages and limitations. Ethoscopes are inexpensive and versatile but have limitations in terms of computing power and acquisition rates. Their online acquisition speed is fast enough to successfully capture the motor activity of different species of Drosophilae (reference 28), but may not be sufficient for other animals moving more swiftly, such as zebrafish larvae. Moreover, coccinella cannot apply labels to behaviour (“courting”, “lounging”, “sipping”, “jumping” etc.) but it can successfully identify large behavioural phenotypes and generate unbiased hypotheses on how behaviour – and a nervous system at large – can be influenced by chemicals, genetics, artificial manipulations in general.

      5) Along the same line of thought, would using a simple webcam (with similar specs to the PiCamera - ELP has cameras that operate on infrared and are quite affordable too) connected to a more powerful computer lead to higher FPS? - The reason for the question about using a simple webcam is that this would make your system more flexible (especially useful in the current shortage of RPi boards on the market) lowering the barrier for others to use it, increasing the chances for adoption.

      Completely bypassing ethoscopes would require the users to set up their own tracking solution, with a final result that may or may not match what we describe here. If a greater temporal resolution is necessary, the easiest way to achieve a higher FPS would be to either decrease camera resolution or use the Pis to take videos offline and then process those videos at a later stage. The combination of these two would give an acquisition rate of 60 fps at 720p, which is the maximum the camera can achieve. We have now made this clear at lines 83-92.

      The temporal and spatial resolution of the collected images depends on the working modality the user chooses. When operating in offline mode, ethoscopes are capable of acquiring 720p videos at 60 fps, which is a convenient option with fast-moving animals. In this study, we instead opted for the default ethoscope working settings, providing online tracking and real-time parametric extraction, meaning that images are analysed by each Raspberry Pi at the very moment they are acquired (Figure 1b). This latter modality limits the temporal resolution of information being processed (one frame every 444 ms ± 127 ms, equivalent to 2.2 fps on a Raspberry Pi 3 at a resolution of 1280x960 pixels with each animal being constricted in an ellipse measuring 25.8 ± 1.4 x 9.85 ± 1.4 pixels - Figure 1a) but provides the most affordable and high-throughput solution, sparing the researcher from organising video storage or asynchronous video processing for animal tracking.
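
      The relation between the reported frame interval and frame rate, and the reviewer's concern that sparse sampling can underestimate peak velocity, can be illustrated with a minimal sketch. It is not part of the ethoscope software: the oscillating track, its amplitude and the function names are hypothetical and chosen only for illustration.

```python
import math

def fps_from_interval(interval_ms):
    """Convert a mean inter-frame interval in milliseconds to frames per second."""
    return 1000.0 / interval_ms

def max_speed(positions, dt):
    """Largest frame-to-frame speed for (x, y) positions sampled every dt seconds."""
    return max(math.dist(a, b) / dt for a, b in zip(positions, positions[1:]))

# Hypothetical track: an animal oscillating back and forth once per second
# along x (amplitude 10 units), recorded for 10 s at 60 fps.
FAST_DT = 1 / 60
track = [(10.0 * math.sin(2 * math.pi * i * FAST_DT), 0.0) for i in range(600)]

# Keep roughly one frame every 444 ms to mimic the online acquisition rate.
step = round(0.444 / FAST_DT)
slow_track = track[::step]

fast_max = max_speed(track, FAST_DT)
slow_max = max_speed(slow_track, step * FAST_DT)

print(f"{fps_from_interval(444):.2f} fps")  # 2.25 fps
print(f"max speed at 60 fps: {fast_max:.1f}, subsampled: {slow_max:.1f}")
```

      In this artificial case the subsampled track reports a lower maximum speed than the 60 fps track, which is the effect the reviewer describes. Whether it matters for a given species depends on how abruptly the animal moves relative to the sampling interval.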

      6) One last point about decreasing use barrier and increasing adoption: Would it be possible to use DeepLabCut (DLC) to simply annotate each animal (instead of each body part) and feed the extracted data into your current analysis with coccinella? This way different labs that already have pipelines in place that use DLC would have a much easier time in testing and eventually switching to coccinella? I understand that extracting simple maximal velocity this way would be an overkill, but the trade-off would again be a lowering of the adoption barrier.

      It would certainly be possible to calculate velocity from the whole-animal pose measurement and then use this with HCTSA or Catch22, thus mimicking the coccinella pipeline, but it would definitely be overkill, as the reviewer correctly points out. Given that we are trying to make an argument about high-throughput data acquisition, I would rather not suggest this option in the manuscript.

      7) Line 96: The authors state that once data is collected, it is put through a computational framework that uses 7700 tests described in the literature so that meaningful discriminative features are found. I think it would be interesting to expand a bit on the explanation of how this framework deals with multiple comparison/multiple testing issues.

      We always use the full set of features in aggregate to train a classifier (e.g., TS_Classify in HCTSA), which means no correction is necessary: the trained classifier only ever makes a single prediction (only one test is performed), so as long as this is done correctly (e.g., proper separation of training and test sets), multiple hypothesis correction is not appropriate. This has been confirmed with the HCTSA/Catch22 author (Dr Ben Fulcher, personal communication). We have added a clarifying sentence about this to the methods (L315-318).

      8) It would be nice to have a couple of lines explaining the choice of compounds used for testing and also why in some tests, 17 compounds were used, while in others 40, and then 12? I understand how much work it must be in terms of experiment preparation and data collection for these many flies and compounds, but these changes in the compounds used for testing without a more detailed explanation is suboptimal.

      This is another good point. We have now added this information to the methods, in a section renamed “choice, handling and preparation of drugs” L280-285, which now reads like this:

      The initial preliminary analysis was conducted using a group of 12 “proof of principle” compounds and a solvent control. These compounds were initially used to compare the video method and the ethoscope method. After testing these initial compounds, it was found that the ethoscope methodology was more successful, and the compound list was then expanded to 17 (including the control) using only the ethoscope method. As a final test, we included additional compounds at a single concentration, bringing the total to 40 (including control), also for the ethoscope method.

      9) Line 119 states: "A similar drop in accuracy was observed using a smaller panel of 12 treatments (Supplementary Figure 2a)". It is actually Supplementary Figure 1c.

      Thank you for noticing that! Now corrected. The Supplementary figures have also been renamed to obey eLife’s expected nomenclature (both Figure 1 – Figure supplements)

      10) In some places the language seems a little outlandish and should either be removed or appropriately qualified. a- Lines 56-59 pose three questions that are either rhetorical or ill-posed. For example, "...minimal amount of information...behavior" implies there is a singular response but the response depends on many details such as to what degree do the authors want to "classify behavior".

      Yes, those were indeed meant as rhetorical questions, but we prefer to keep them in, because we are hoping to prompt this type of thought in the readers. These are concepts that may not be so obvious to someone who is just looking to apply an existing tool, and they may prompt some reflection about what kind of data they really want or need to acquire.

      b) Some of the criticisms leveled at the state-of-the-art methods are probably unwarranted because the goals of the different approaches are different. The current method does not yield the type of rich information that DeepLabCut yields. So, depending on the application DeepLabCut may be the method of choice. The authors of the current manuscript should more clearly state that.

      In the introduction and discussion we do try to stress that coccinella is not meant to replace tools like DLC. We have now added more emphasis to this concept, for instance to L212:

      [tools like deeplabcut] are ideal – and irreplaceable – to identify behavioural patterns and study fine motor control but may be undue for many other uses.

      And L215:

      Coccinella is a reductionist tool not meant to replace the behavioural categorization that other tools can offer but to complement it

      11) The application to sleep data appears suddenly in the manuscript. The authors should attempt to make with text change a smoother transition from drug screen to investigation into sleep.

      I agree with this observation. We have now tried to add a couple of sentences to contextualise this experiment and hopefully make the connection appear more natural. Ultimately, this is a proof-of-principle example anyway, so hopefully the reader will take it for what it is (L169).

      Finally, to push the system to its limit, we asked coccinella to find qualitative differences not in pharmacologically induced changes in activity, but in a type of spontaneous behaviour mostly characterised by lack of movement: sleep. In particular, we wondered whether coccinella could provide biological insights comparing conditions of sleep rebound observed after different regimes of sleep deprivation. Drosophila melanogaster is known to show a strong, conserved homeostatic regulation of sleep that forces flies to recover at least in part lost sleep, for instance after a night of forceful sleep deprivation.

      11b) Additionally, the beginning section of sleep experiments talks about sleep depth yet the conclusion drawn from sleep rebound says more about the validity of the current 5 min definition of sleep than about sleep depth. If this conclusion was misunderstood, it should be clarified. If it was not, the beginning text of the sleep section should be tailored to better fit the conclusion.

      I am afraid we did not do a good job of explaining a critical aspect here: the data fed to coccinella are the “raw” activity data, in which we are not making any assumption about the state of the animal. In other words, we do not use the 5-minute rule at this or any other point to classify sleep and wakefulness. Nevertheless, coccinella picks the 300-second threshold as the critical one for discerning the two groups. This is interesting because it provides a fully agnostic confirmation of the five-minute rule in D. melanogaster. We recognise this was not necessarily obvious from the text and have now added a clarification at L189-201:

      However, analysis of those same animals during rebound after sleep deprivation showed a clear clustering, segregating the samples in two subsets with separation around the 300-second inactivity trigger (Figure 3d). This result is important for two reasons: on one hand, it provides, for the third time, strong evidence that the system is not simply overfitting data of no biological significance, given that it could not perform any better than a random classifier on the baseline control. On the other hand, coccinella could find biologically relevant differences on rebound data after different regimes of sleep deprivation. Interestingly enough, the 300-second threshold that coccinella independently identified has a deep intrinsic significance for the field, for it is considered to be the threshold beyond which flies lose arousal response to external stimuli, defining a “sleep quantum” (i.e.: the minimum amount of time required for transforming inactivity bouts into sleep bouts; references 23, 24, 28). Coccinella’s analysis ran agnostic of the arbitrary 5-minute threshold and yet identified the same value as the one able to segregate the two clusters, thus providing an independent confirmation of the five-minute rule in D. melanogaster.
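
      For readers unfamiliar with the five-minute rule referred to above, the following minimal sketch shows how the conventional rule turns runs of inactivity into sleep bouts. It is only an illustration of that convention: as stated above, coccinella does not apply this rule to its input. The 10-second bin size, the function name and the example activity trace are assumptions made for the example.

```python
from itertools import groupby

def sleep_bouts(moving, bin_s=10, min_sleep_s=300):
    """Return (start_s, duration_s) for each run of immobility of at least min_sleep_s.

    moving: sequence of booleans, one per time bin (True = animal moved in that bin).
    bin_s: assumed bin length in seconds (hypothetical value for this example).
    """
    bouts = []
    t = 0
    for is_moving, run in groupby(moving):
        duration = sum(1 for _ in run) * bin_s
        if not is_moving and duration >= min_sleep_s:
            bouts.append((t, duration))
        t += duration
    return bouts

# Hypothetical trace: 60 s active, 120 s still, 30 s active, 300 s still, 20 s active.
activity = [True] * 6 + [False] * 12 + [True] * 3 + [False] * 30 + [True] * 2
print(sleep_bouts(activity))  # [(210, 300)]
```

      Only the second immobile run reaches 300 seconds and is scored as sleep; the 120-second run is treated as quiet wakefulness under this convention.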

      12) Line 227: (standard food) - please add a link to a protocol or a detailed description on what is "standard food". This way others can precisely replicate what you are using. This is not my field, but I have the impression that food content/composition for these animals makes big changes in behaviour?

      Yes, good point. We have now added the actual recipe to the methods L240:

      Fly lines were maintained on a 12-hour light: 12-hour dark (LD) cycle and raised on polenta and yeast-based fly media (agar 96 g, polenta 240 g, fructose 960 g and Brewer’s yeast 1,200 g in 12 litres of water).

      13) Data acquisition and processing: please add links to the code used.

      Both the code and the raw data used to generate all the figures have been uploaded to Zenodo and are available through their repository. Zenodo has a limit of 50 GB per uploaded dataset, so we had to split everything into two files, with two DOIs, given in the methods (L356, section “code and availability” - DOIs: 10.5281/zenodo.7335575 and 10.5281/zenodo.7393689). We have now also created a landing page for the entire project at http://lab.gilest.ro/coccinella and linked that landing page in the introduction (L64).

      13b) Also your pipeline seems to use three different programming languages/environments... Any chance this could be reduced? Maybe there are R packages that can convert csv to matlab compatible formats, so you can avoid the Python step? (Nothing against using the current pipeline per se, I am just thinking that for usability and adoption by other labs, the fewer languages, the better.)

      This is a very important suggestion that highlights a clear limitation of the pipeline. I am happy to say that we worked on this and solved the problem integrating the Python version of Catch22 into the ethoscopy software. This means the two now integrate, and the entire analysis can be run within the Python ecosystem. HCTSA does not have a Python package unfortunately but we still streamlined the process so that one only has to go from Python to Matlab without passing through R. To be honest, Catch22 is the evolution of HCTSA and performs really well so I think that is what most users will want to use. We provide two supplementary notebooks to guide the reader through the process. One explains how to go from ethoscope data to an HCTSA compatible mat file. The other explains how ethoscope data integrate with Catch22 and provides many more examples than the ones found in the paper figures.

      14) There are two sections named "References" (which are different from each other) on the manuscript I received and also on BioRxiv. Should one of them be a supplementary reference? Please correct it. I spent a bit of time trying to figure out why cited references in the paper had nothing to do with what was being described...

      The second list of references actually applied only to the list of compounds in the supplementary table 1. When generating a collated PDF this appeared at the end of the document and created confusion. We have now amended the heading of that list in the following way, to read more appropriately:

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for reviewing our manuscript. We find the reviews constructive and meaningful. Accordingly, we incorporated most suggestions into our revision. We provide point-by-point responses to the reviews below.

      Reviewer #1 (Public Review):

      The evolution of dioecy in angiosperms has significant implications for plant reproductive efficiency, adaptation, evolutionary potential, and resilience to environmental changes. Dioecy allows for the specialization and division of labor between male and female plants, where each sex can focus on specific aspects of reproduction and allocate resources accordingly. This division of labor creates an opportunity for sexual selection to act and can drive the evolution of sexual dimorphism.

      In the present study, the authors investigate sex-biased gene expression patterns in juvenile and mature dioecious flowers to gain insights into the molecular basis of sexual dimorphism. They find that a large proportion of the plant transcriptome is differentially regulated between males and females with the number of sex-biased genes in floral buds being approximately 15 times higher than in mature flowers. The functional analysis of sex-biased genes reveals that chemical defense pathways against herbivores are up-regulated in the female buds along with genes involved in the acquisition of resources such as carbon for fruit and seed production, whereas male buds are enriched in genes related to signaling, inflorescence development and senescence of male flowers. Furthermore, the authors implement sophisticated maximum likelihood methods to understand the forces driving the evolution of sex-biased genes. They highlight the influence of positive and relaxed purifying selection on the evolution of male-biased genes, which show significantly higher rates of nonsynonymous to synonymous substitutions than female or unbiased genes. This is the first report (to my knowledge) highlighting the occurrence of this pattern in plants. Overall, this study provides important insights into the genetic basis of sexual dimorphism and the evolution of reproductive genes in Cucurbitaceae.

      Thank you for your positive comments. Greatly appreciated.

      There are, however, parts of the manuscript that are not clearly described or could be otherwise improved.

      • The number of denovo-assembled unigenes seems large and I would like to know how it compares to the number of genes in other Cucurbitaceae species. The presence of alternatively assembled isoforms or assembly artifacts may be still high in the final assembly and inflate the numbers of identified sex-biased genes.

      The majority of unigenes were annotated by homologs in species of Cucurbitaceae (63%), including Momordica charantia (16.3%), Cucumis melo (11.9%), Cucurbita pepo (11.9%), Cucurbita moschata (11.5%), Cucurbita maxima (10.1%) and other species of Cucurbitaceae (Fig. S1C). We admit that in the final assembly the number of transcripts may still be overestimated due to the unavoidable presence of isoforms, although we have tried our best to filter them using several clustering strategies. Additionally, we assessed the transcripts using BUSCO v5.4.5 and the embryophyta_odb10 database, which contains 1,614 plant orthologs. Some 95.0% of these orthologs were covered by the unigenes, of which 1,447 (89.7%) were “Complete BUSCOs”, 85 (5.3%) were “Fragmented BUSCOs”, and only 82 (5.0%) were “Missing BUSCOs” (Table S2). Overall, our assessment suggested that we have generated high-quality reference transcriptomes in the absence of a reference genome. Subsequently, we revised the manuscript (lines 175-181).
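
      As a quick consistency check on the BUSCO counts quoted above, the percentages can be recomputed from the raw counts. This is a small sketch using only the numbers given in the response; the variable names are ours. Direct division reproduces the reported complete and fragmented percentages, while the remaining figures can differ in the last decimal depending on how rounding is applied.

```python
# Counts as quoted in the response (embryophyta_odb10, 1,614 orthologs in total).
TOTAL = 1614
counts = {"complete": 1447, "fragmented": 85, "missing": 82}

# The three categories should account for every ortholog in the database.
assert sum(counts.values()) == TOTAL

# Percentage of the database in each category, rounded to one decimal place.
pct = {name: round(100 * n / TOTAL, 1) for name, n in counts.items()}

# Orthologs covered at least partially (complete + fragmented).
covered = round(100 * (counts["complete"] + counts["fragmented"]) / TOTAL, 1)

for name, value in pct.items():
    print(f"{name}: {value}%")
print(f"covered (complete + fragmented): {covered}%")
```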

      • It is interesting that the majority of sex-biased genes are present in the floral buds but not in the mature flowers. I think this pattern could be explored in more detail, by investigating the expression of male and female sex-biased genes throughout the flower development in the opposite sex. It is also not clear how the expression of the sex-biased genes found in the buds changes when buds and mature flowers are compared within each sex.

      Thank you for your advice on further understanding this interesting pattern. In the near future, we would like to study these issues across more developmental stages of flowers in each sex, probably with the aid of single-cell techniques and a reference genome. We have revised the manuscript to reflect this in Results, in the section "Tissue-biased/stage-biased gene expression" (lines 202-216).

      • The statistical analysis of evolutionary rates between male-biased, female-biased, and unbiased genes is performed on samples with very different numbers of observations, therefore, a permutation test seems more appropriate here.

      Thank you for your suggestion. However, all comparisons between sex-biased and unbiased genes were tested using the Wilcoxon rank sum test in R, which is more commonly used. Additionally, we tested some datasets, and the results were consistent with those of the Wilcoxon rank sum test.
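
      For context, the permutation test suggested by the reviewer can be sketched as follows. This is not the authors' analysis, which used the Wilcoxon rank sum test in R: the choice of test statistic (absolute difference in medians), the number of permutations and the example values are all assumptions made for illustration.

```python
import random
from statistics import median

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in medians of a and b.

    Group labels are shuffled while keeping the original (possibly very
    unequal) group sizes, so no distributional assumption is needed.
    """
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical evolutionary-rate values: a small group and a much larger one.
small_group = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
large_group = [0.10 + 0.01 * i for i in range(40)]
p_value = permutation_test(small_group, large_group)
print(f"permutation p-value: {p_value:.4f}")
```

      Because the labels are reshuffled with the original group sizes, the null distribution automatically reflects the imbalance between groups, which is the reviewer's motivation for suggesting this test.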

      • The impact of pleiotropy on the evolutionary rates of male-biased genes is speculative since only two tissue samples (buds and mature flowers) are used. More tissue types need to be included to draw any meaningful conclusions here.

      Thank you for your advice on further understanding the impact of pleiotropy. In the near future, we would like to make further investigations using more developmental stages of flowers in each sex and new technologies to consolidate the conclusion.

      Reviewer #2 (Public Review):

      Summary:

      This study uses transcriptome sequence from a dioecious plant to compare evolutionary rates between genes with male- and female-biased expression and distinguish between relaxed selection and positive selection as causes for more rapid evolution. These questions have been explored in animals and algae, but few studies have investigated this in dioecious angiosperms, and none have so far identified faster rates of evolution in male-biased genes (though see Hough et al. 2014 https://doi.org/10.1073/pnas.1319227111).

      Strengths:

      The methods are appropriate to the questions asked. Both the sample size and the depth of sequencing are sufficient, and the methods used to estimate evolutionary rates and the strength of selection are appropriate. The data presented are consistent with faster evolution of genes with male-biased expression, due to both positive and relaxed selection.

      This is a useful contribution to understanding the effect of sex-biased expression in genetic evolution in plants. It demonstrates the range of variation in evolutionary rates and selective mechanisms, and provides further context to connect these patterns to potential explanatory factors in plant diversity such as the age of sex chromosomes and the developmental trajectories of male and female flowers.

      Weaknesses:

      The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

      Thank you for your meaningful suggestions. We agree that identifying the chromosomal origin of transcripts would greatly improve insights into selection, and we will investigate these issues, probably with a reference genome, in the near future.

      Reviewer #3 (Public Review):

      The potential for sexual selection and the extent of sexual dimorphism in gene expression have been studied in great detail in animals, but hardly examined in plants so far. In this context, the study by Zhao, Zhou et al. represents a welcome addition to the literature.

      Relative to the previous studies in Angiosperms, the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers).

      The main limitation of the study is the very low number of samples analyzed, with only three replicate individuals per sex (i.e. the whole study is built on six individuals only). This provides low power to detect differential expression. Along the same line, only three species were used to evaluate the rates of non-synonymous to synonymous substitutions, which also represents a very limited dataset, in particular when trying to fit parameter-rich models such as those implemented here.

      A third limitation relates to the absence of a reference genome for the species, making the use of a de novo transcriptome assembly necessary, which is likely to lead to a large number of incorrectly assembled transcripts. Of course, the production of a reference transcriptome in this non-model species is already a useful resource, but this point should at least be acknowledged somewhere in the manuscript.

      Each of these shortcomings is relatively important, and together they strongly limit the scope of the conclusions that can be made, and they should at least be acknowledged more prominently. The study is valuable in spite of these limitations and the topic remains grossly understudied, so I think the study will be of interest to researchers in the field, and hopefully inspire further, more comprehensive analyses.

      We acknowledge that our sample size was relatively small. We will investigate these issues at the population level, probably with a reference genome, in the near future. We acknowledged in the revised manuscript that there may be some incorrectly assembled transcripts. We assessed the transcripts using BUSCO v5.4.5 and the latest embryophyta_odb10 database, which contains 1,614 plant orthologs. As mentioned, 95.0% of these orthologs were covered by the unigenes, of which 1,447 (89.7%) were “Complete BUSCOs”, 85 (5.3%) were “Fragmented BUSCOs”, and only 82 (5.0%) were “Missing BUSCOs” (Table S2). In short, the quality of the transcriptome was high in the absence of a reference genome.

      Reviewer #1 (Recommendations For The Authors):

      My main criticism of this manuscript is that it refers to gene names and orthogroups throughout the text; however, the assembled transcripts are not accessible. The reference transcriptome, orthology data, and alignments used for evolutionary analysis should be made available through a public repository to support reproducibility and efficient use of the resources produced in this study.

      We have uploaded these datasets to ResearchGate (https://www.researchgate.net/publication/373194650_Trichosanthes_pilosa_datasets Positive_selection_and_relaxed_purifying_selection_contribute_to_rapid_evolution of_male-biased_genes_in_a_dioecious_flowering_plant).

      Comments to the authors:

      1) I have an issue with the tissue-biased gene expression analysis. Looking at Fig. 3, it seems to me there are 3,204 male-biased genes that are expressed at the same level in male buds and mature flowers (same for 5,011 female-biased genes in female buds and flowers); however, only a handful of genes show sex bias between mature male and female flowers. Taking the male-biased genes as an example, if the 3,204 M1BGs have the same expression levels in mature male flowers and are no longer male-biased when mature male vs female flowers are compared, why are they not found as female tissue-biased (F2TGs)? I may be wrong, but one scenario would be that the M1BGs increase their expression in female flowers and become unbiased. However, that increase in expression (low expression in the female buds → higher expression in the female flowers) should classify them as female tissue-biased genes (F2TGs). Can you please clarify how the M1BGs and F1BGs are expressed in the flowers of the opposite sex?

      As to Fig. 3A, the 3,204 male-biased genes expressed in male floral buds are part of all male-biased genes (3,204 + 286 + 724 = 4,214), as shown in Fig. 2A. However, only 233 male-biased genes (88 + 1 + 144 = 233, Fig. 2B and Fig. 3B) are expressed in male mature flowers. So, they are not expressed at the same level in male floral buds and mature flowers. Only 288 genes are sex-biased (M1BGs) as well as tissue/stage-biased (M1TGs) in male floral buds. M1BGs (4,214 male-biased genes) and F1BGs (5,096 female-biased genes) have no overlap, except for 44,326 unbiased genes shown in Fig. 2A. That is, F1BGs (5,096 female-biased genes) show low expression or no expression in M1BGs (4,214 male-biased genes). The expression levels of some genes have been shown in Table S14.

      2) Paragraph (407-416) describes the analysis of duplicated genes under relaxed selection but there is no mention of this in the results.

      In fact, these results have been shown in Table S13. It is not necessary for us to describe them in detail in the results.

      3) How did the authors conclude that the identified functions in male flowers make them more adapted to biotic and abiotic environments (line 347-350)? In the paragraph above (line 338-342) the authors describe that female buds are better equipped against herbivores, which are a biotic factor?

      Following your concerns, we have revised the manuscript as follows: For line 338-342, we revised the text as “Indeed, functional enrichment analysis in chemical pathways such as terpenoid backbone and diterpenoid biosynthesis indicated that relative to male floral buds, female floral buds had more expressed genes that were equipped to defend against herbivorous insects and pathogens, except for growth and development (Vaughan et al., 2013; Ren et al., 2022) (Fig. S7A and Table S11).” For line 347-350, we revised text as “We also found that male-biased genes with high evolutionary rates in male buds were associated with functions to abiotic stresses and immune responses (Tables S12 and S13), which suggest that male floral buds through rapidly evolving genes are adapted to mountain climate and the environment in Southwest China compared to female floral buds through high gene expression.”

      4) Line 417-418: decreasing codon usage bias is linked to decreasing synonymous substitution rates, should this be the opposite?

      No. Codon usage bias was positively related to synonymous substitution rates. That is, stronger codon usage bias may be related to higher synonymous substitution rates (Parvathy et al., 2022).

      5) Figures and Tables are not standalone and are missing details in the legends. - Fig.2C, which genes are plotted on the heatmap and what is the color scale corresponding to?

      • All Supplementary figures are missing the descriptions of individual panels (A, B, C,etc.) in the legends. In addition, please add the numbers of observations under boxplots.

      • Supplementary Fig.5 and 6: Panel B is not a Venn diagram, I suggest removing it from the figures.

      • Supplementary Fig.7: Should be 'sex-biased genes'. What is the x-axis on the plot?

      • Supplementary Fig.8: Please add the description of the abbreviations in the legend. - Supplementary Tables S4, S5, S6: Please add information about the foreground and background branches.

      • Supplementary Table S6, S7, S8, S9, S10: Please add more details about the column headers (what is Model-A, background ω 2a, Unconstrained_1.p, K, which was the foreground branch etc.).

      • Supplementary Table S11: Please add gene IDs for each KEGG category.

      We have revised/fixed these issues following your concerns and suggestions.

      Minor comments:

      Line 28: 'algae' in place of 'algas'

      Line 53-56: Please provide more recent references.

      Line 65: 'most' instead of 'almost'

      Line 86-87: It is not clear from the sentence if the sex-biased expression was detected in flowers compared to leaves, or were the sex-biased genes detected between male and female leaves? Please clarify.

      Line 107-108: positive selection is referred to as adaptive evolution, please choose one or the other.

      Line 109: 'force' instead of 'forces'

      Line 110: 'algae' instead of 'alga'

      Line 132: '..mainly distributed from Southwest,' the country is missing.

      Line 202: 'protein sequence evolution'?

      Line 232: what does the 'number of evolutionary rates' refers to?

      Line 253: please provide a reference for the RELAX model.

      Line 274: 'relaxed selective male-biased genes' should be 'male-biased genes under relaxed purifying selection'?

      Line 318: Please add a sentence explaining why the Cucurbitaceae family is a great model to study the evolution of sexual systems.

      Line 321: 'genes' instead of 'gene'.

      Line 366: male-biased genes experience 'higher' or 'more rapid' evolutionary rates.

      Line 377: in the present study and in the case of Ectocarpus alga, positive selection plays an important role in male-biased genes evolution, but does not account for the majority of evolutionary change. Therefore, I would not call it a 'primary' force.

      Line 477: missing reference for DESeq2 package.

      Line 480: 'used'.

      Line 498: 'coding sequences'.

      Line 516: 'to' instead of 'by'.

      Line 553: 'the' is repeated twice.

      Sorry for the typos and grammatical issues. We have revised them accordingly.

      Reviewer #2 (Recommendations For The Authors):

      There are two areas for improvement, one empirical and one theoretical.

      Empirically, the analyses could be expanded by an attempt to distinguish between genes on the autosomes and the sex chromosomes. Genotypic patterns can be used to provisionally assign transcripts to XY or XX-like behavior when all males are heterozygous and all females are homozygous (fixed X-Y SNPs) and when all females are heterozygous and males are homozygous (lost or silenced Y genes). Comparing such genes to autosomal genes with sex-biased expression would sharpen the results because there are different expectations for the efficacy of selection on sex chromosomes. See this paper (Hough et al. 2014; https://www.pnas.org/doi/abs/10.1073/pnas.1319227111), which should be cited and does in fact identify faster substitution rates in Y-linked genes (and note that pollen-expressed genes, at least, are concentrated on the sex chromosome in this system: https://academic.oup.com/evlett/article/2/4/368/6697528, https://royalsocietypublishing.org/doi/10.1098/rstb.2021.0226).

      We have cited Hough et al. 2014 and noticed that several species have been observed to exhibit rapid evolutionary rates of sequences on sex chromosomes compared to autosomes, which has been related to the evolutionary theories of fast-X or fast-Z (lines 482-484).
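
      The genotype-based rule proposed in the reviewer's comment above could be sketched roughly as follows. This is only an illustrative sketch, not part of the authors' pipeline: the function name `classify_snp`, the `"het"`/`"hom"` genotype labels, and the returned category names are all hypothetical, and the rule is applied to a single SNP at a time.

```python
from typing import List


def classify_snp(male_genotypes: List[str], female_genotypes: List[str]) -> str:
    """Provisionally classify one SNP by its segregation pattern between sexes.

    Each genotype is "het" (heterozygous) or "hom" (homozygous).
    The labels and category names are hypothetical, for illustration only.
    """
    # Without genotypes from both sexes, no pattern can be assessed.
    if not male_genotypes or not female_genotypes:
        return "unresolved"

    all_males_het = all(g == "het" for g in male_genotypes)
    all_males_hom = all(g == "hom" for g in male_genotypes)
    all_females_het = all(g == "het" for g in female_genotypes)
    all_females_hom = all(g == "hom" for g in female_genotypes)

    # All males heterozygous, all females homozygous: fixed X-Y difference.
    if all_males_het and all_females_hom:
        return "XY-like"
    # All females heterozygous, all males homozygous: lost or silenced Y copy.
    if all_females_het and all_males_hom:
        return "Y-lost-like"
    # Anything else is treated as autosomal or unresolved.
    return "autosomal-or-unresolved"


print(classify_snp(["het", "het", "het"], ["hom", "hom", "hom"]))
```

      With only a few individuals per sex, a pattern like this can arise by chance at autosomal SNPs, so any such assignment would remain provisional.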

      On the theoretical side, this study is making a very specific intervention, namely identifying more rapid evolutionary rates in genes with male-biased than female-biased expression in a dioecious plant. The writing in the introduction and the discussion needs to be improved to differentiate between this comparison and similar comparisons, e.g. sex-biased expression in other dioecious plants (76-81), between X-linked and Y-linked genes (Hough et al. 2014), sex chromosomes and autosomes (several studies already cited), gametophytic and sporophytic tissue, and male and female reproductive tissue in hermaphroditic plants. Setting out this distinction early in the introduction will make the specific goals and novelty of this work clearer.

      Thank you for your constructive suggestions. We have revised the relevant part of the Introduction accordingly (lines 74-107).

      Specific comments by line:

      Sorry for the typos or wording issues. We have revised them.

      26 - driven not driving

      28 - check house style (algae vs algas)

      28-29 - consider clarifying the antecedent of "them" (evolutionary forces, not algas)

      35 - maybe, but don't the signalling genes involved in stress responses function in many capacities, not just stress? Also, there's evidence that reproductive recognition machinery in plants may ultimately derive from immune function (e.g. https://doi.org/10.1111/j.1469-8137.2008.02403.x), so the GO category "biotic stress" may be too vague

      39 - maybe clarify that "for the first time" refers to male rather than female, since there have been other studies in dioecious plants

      66-68 - asserting that something is "essential" after describing how rare it is doesn't quite follow, since dioecious plants - especially with sex chromosomes - are basically an exception. I agree that understanding the evolution of dioecious plants is important, but this isn't the most compelling way to make that case - perhaps try something else.

      137ff - this sentence can be consolidated and streamlined

      142 - "floral tissue" rather than "flowers tissue," here and elsewhere

      144 - divergence (singular)

      235 - "evidence for the contributions of" = "evidences" is unidiomatic

      250 - efficiency or efficacy?

      300 - why is "inositol" capitalized here and elsewhere?

      300ff - are these typical patterns in male tissue in other species?

      308 - is that interesting? It seems like exactly what I'd expect. Perhaps start with the unsurprising but reassuring observation (anther and pollen development genes are indeed expressed in male buds) before moving on to the more surprising findings.

      319 - remove "the"

      321 - genes (plural)

      330 - replace "these differences" with "the differences"

      336 - perhaps recap proportions / percents here?

      340 - unnecessary comma after diterpenoid

      341 - this seems like a big leap from the evidence, especially in the absence of supporting information about the chemical defenses of these species and how they differ by sex. Don't terpenoids have a diverse array of functions, not just defense? Here's a review: https://link.springer.com/chapter/10.1007/10_2014_295

      We have revised the text as “Indeed, functional enrichment analysis in chemical pathways such as terpenoid backbone and diterpenoid biosynthesis indicated that relative to male floral buds, female floral buds had more expressed genes that were equipped to defend against herbivorous insects and pathogens, except for growth and development (Vaughan et al., 2013; Ren et al., 2022) (Fig. S7A and Table S11)” (lines 373-378).

      349 - as mentioned in line 35, this is a big speculative leap. The discussion is the place for speculation, but consider other explanations too. How does the development of flowers work? Are male flowers suppressing or resorbing female primordial organs? Do male flowers in fact senesce faster? perhaps spell out the logic in more detail.

      We have revised the text as “In addition, the enrichment in regulation of autophagy pathways could be associated with gamete development and the senescence of male floral buds (Table S14) (Liu and Bassham, 2012; Li et al., 2020; Zhou et al., 2021). In fact, it was observed that male flowers senesced faster (Wu et al., 2011). We also found that homologous genes of two male-biased genes in floral buds (Table S14) that control the raceme inflorescence development (Teo et al., 2014) were highly expressed compared to female floral buds. Taken together, these results indicate that expression changes in sex-biased genes, rather than sex-specific genes play different roles in sexual dimorphic traits in physiology and morphology (Dawson and Geber, 1999).” (lines 390-402).

      351 - senescence of, not senescence for

      363 - but Hough et al. 2014 did show rapid evolution of Y-linked genes, and those are by definition sex biased ...

      391 - perhaps reiterate here that while some sex-BIASED genes did, sex-SPECIFIC genes did not, to avoid confusion

      We also revised them accordingly.

      Reviewer #3 (Recommendations For The Authors):

      1- lines 56-57 : « have facilitated » : this wording confounds correlation with causation. Consider rephrasing as « is associated with »

      2- lines 58-60 : vague wording : what are these variations ? e.g. which tissues and stages are generally enriched?

      3- line 63 : this sentence is a bit misleading: consider changing it to « Most dioecious plants possess homomorphic sex-chromosomes » [and explain what homomorphic means in this context].

      4- line 68 : a reference is missing here. Also perhaps, allude to the fact that sexual selection in plants has long been considered a contentious issue (e.g. https://doi.org/10.1016/j.cub.2010.12.035)

      5- lines 72-76 : beyond simply describing the pattern, say what evolutionary processes are revealed by these observations.

      6- line 92 : remind the reader what these 5 studies are.

      7- line 94-95 : explain why the comparison of vegetative vs vegetative and vegetative vs reproductive tissues is a problem.

      The published studies compared gene expression only between vegetative tissues or between vegetative and reproductive tissues, which limits our understanding of sexual selection at different stages of floral development. We have revised the text accordingly (lines 103-104). We are particularly interested in how sex-biased genes behave across floral development, and our dataset allows sexual selection to be investigated at two developmental stages (buds and mature flowers).

      8- line 100 « Evolutionary dynamic analyses » : this wording is vague

      9- line 110 : brown algae are NOT plants

      10- line 137-140 or in M&M : needs to describe somewhere how the male flowers differ from the female flowers and vice-versa: are the whole morphological structures related to female (male) reproduction entirely missing, or is their development arrested later on and they are still present but simply not producing gametes? This has consequences for the interpretation of the genes they express.

      We have revised the typos and wording issues accordingly. However, because the sampled floral buds were 3 mm or less in size, we did not observe much difference in morphological structure. Indeed, the male and female flowers at anthesis are markedly different in this dioecious plant, as shown in Fig. 1. Additionally, our analysis (not shown in this study) indicates that dioecy is the ancestral state of Trichosanthes, with later transitions to monoecy (Guo et al., 2020), which suggests that in the early stages of flower development, female floral buds may tend to masculinize, and vice versa (Fig. 2C).

      11- line 152 : it is important to be very transparent on the sample sizes here: « from three females and three males of the dioecious ... »

      12- line 153 : along the same line, explain here why a de novo transcriptome had to be generated here: « In the absence of an assembled reference genome for this nonmodel species, we de novo assembled ... »

      13- line 164-165 : « we have generated high-quality reference transcriptomes » : I am not entirely convinced of the quality of the transcriptome obtained without a reference genome, so I suggest simply removing this subjective sentence.

      Our assessment suggested that we have generated high-quality reference transcriptomes in the absence of a reference genome, which will be the next step of our work.

      14- line 169 : briefly explain the criteria used to call differentially expressed genes. Given the threshold (log-fold change >=1.3 if I read the figure correctly, but the M&M says >=1), explain how it was chosen.

      Sorry, you may have misunderstood the X, Y coordinates. The value of y coordinate represents -log10(FDR), and the value of x coordinate represents log2 (Fold Change).
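
      To make the axis convention in this reply concrete, a minimal sketch is given below. It assumes the |log2(fold change)| >= 1 cutoff that the reviewer reads from the Methods; the FDR cutoff of 0.05 and the function names `volcano_point` and `is_sex_biased` are assumptions introduced only for illustration and are not taken from the manuscript.

```python
import math
from typing import Tuple


def volcano_point(fold_change: float, fdr: float) -> Tuple[float, float]:
    """Return (x, y) for a volcano plot: x = log2(fold change), y = -log10(FDR)."""
    return math.log2(fold_change), -math.log10(fdr)


def is_sex_biased(fold_change: float, fdr: float,
                  min_abs_log2fc: float = 1.0,  # cutoff quoted from the Methods
                  max_fdr: float = 0.05) -> bool:  # assumed value, illustration only
    """Call a gene sex-biased when both cutoffs are met."""
    x, _ = volcano_point(fold_change, fdr)
    return abs(x) >= min_abs_log2fc and fdr <= max_fdr


# A two-fold expression difference sits at x = 1 on the log2 axis,
# so a cutoff of 1 on that axis corresponds to a fold change of 2.
x, y = volcano_point(2.0, 0.01)
print(x, round(y, 6))
```

      Read this way, a threshold line drawn at 1.3 on the vertical axis would correspond to FDR of about 0.05 rather than to a fold-change cutoff.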

      15- line 174 : Not clear to me how Fig2C is « revealing strong sexual dimorphism », since genes cluster neither by sex nor by tissue. This should be explained more clearly.

      16- line 174-177 : The fact that more sex-biased genes were identified in early buds than in mature flowers is an interesting observation that could be given more prominence in the manuscript, but it is not really explained. Could it reflect the fact that more genes are expressed in early buds because meiotic processes happen early in flower development? Also, the genes involved in male and female organ cell fate determination might also be expected to be expressed early, with mostly organ growth genes being expressed in the mature flower.

      17- line 181 : a wrap-up sentence might be useful here to drive the point home that sex-bias is more prevalent in buds than mature flowers.

      18- line 184 : « tissue-biased » : a more appropriate wording here would be « stage-biased », no ? These are indeed the same tissues but at different developmental stages.

      19- line 183-195 : this section could benefit from setting clear expectations in a hypothesis testing framework laying out the reasons to expect a different bias between stages and between sexes. How does that fit with the level of morphological divergence between sexes (relates to my point 10 above).

      20- line 197-204. A number of essential pieces of information are missing here: how many species, how divergent, say that one other is dioecious, and precise their relative phylogenetic placement (which is important to understand the models used below). Maybe adding a phylogeny of these species in Figure 4 could be useful. Also, briefly explain the « two-ratio » and « free-ratio » models here.

      21- line 196 and following: In these analyses, I could not understand the rationale for keeping buds vs mature flowers as separate analyses throughout. Why not combine both and use the full set of genes showing sex-bias in any tissue? This would increase the power and make the presentation of the results a lot more straightforward.

      As you pointed out earlier (in the public review, paragraph 2), “the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers)”, we totally agree with your points and were very interested in floral development stages for sex-biased genes.

      22- line 216 : say explicitly that the reason for not detecting a significant difference in spite of a relatively large effect size is probably related to the low number of genes, conferring low statistical power to detect a difference. An important feature also not highlighted here is that the trend (though not significant) is in the opposite direction than in the buds, and that both the 2-ratio and the free-ratio models agree on these inverted trends. This could be the basis for an interesting comparison.

      Thank you for your suggestions.

      23- line 220 : needs to explain more clearly how this « free-ratio » differs from the « two-ratio » model.

      24- line 232-234 : I don't see why this is necessary. Why not combine both? See also my comment 21 above.

      25- line 237 : the «A-model » was not defined before.

      26- line 237 : « male-biased » is missing after 343.

      27- line 253-258 : briefly explain what these other models are based on and how they are not redundant and instead complement the previous analyses and each other.

      28- line 266-268 : the use of a more precise set of codons for male-biased genes than the others (if I understood correctly) could also be interpreted as another sign of stronger selective constraint, no?

      Codon usage bias is influenced by many factors, such as levels of gene expression. Highly expressed genes have a stronger codon usage bias and could be encoded by optimal codons for more efficient translation (Frumkin et al., 2018; Parvathy et al., 2022).

      29- line 269-291 : removing genes on a post-hoc basis seems statistically suspicious to me. I don't think your analysis has enough power to hand-pick specific categories of genes, and it is not clear what this brings here. I suggest simply removing these analyses and paragraphs.

      30- line 325 : say whether this patterns parallels / or not those in animals.

      31- line 335 : yes, these biological pieces of information are important and should be given in the introduction already.

      32- the discussion should focus at some point on the observation that more female-biased genes are found in general, but that male-biased genes seem to be under greater selection. How do you reconcile these two apparently contradictory observations?

      We found that male-biased genes with high evolutionary rates in male floral buds were associated with functions related to abiotic stress and immune responses (Tables S12 and S13). This suggests that male floral buds adapt to the mountain climate and environment of Southwest China through rapidly evolving genes, whereas female floral buds do so through high gene expression (lines 387-390).

      33- line 355 : not clear how this follows from the previous sentences.

      34- line 356-358 : vague. not clear what the message of this sentence is.

      35- line 378-383 : say that these conclusions rely on the quality of gene annotation in this non-model species, which is probably pretty low (just like any other non-model species).

      36- line 403 : this conclusion seems far-fetched. Explain how exactly you reached this conclusion.

      37- line 406-416: these speculations on the role of paralogs seem unnecessary, in particular since the de novo transcriptome onto which all analyses are based cannot distinguish orthologs from paralogs.

      38- line 417-424. The discussion should not contain new results.

      39- line 510 : why were genes with dN/dS >2 discarded here? This might strongly bias the comparison, no? This needs to be properly justified.

      40- lines 516-523 : references to the models are missing.

      41- line 527: « omega = 1.5 » : why/how was this arbitrary threshold chosen?

      42- Fig 2 : write out « buds » and « mature flowers » on top of the graphs

      43- Fig 4 : add a phylogeny of the species showing the branch being compared. Also, add the number of genes in each category on each plot.

      Thanks, we revised/fixed these issues accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their thoughtful assessment and critiques. As detailed below in the point-by-point replies, we have modified the text and figures to clarify points of ambiguity and to document statistical significance in places where we had inadvertently neglected to do so. The manuscript is clearer and more rigorous as a result of the review process.

      Reviewer #1 (Public Review):

      This study addresses the fundamental question of how the nucleotide, associated with the beta-subunit of the tubulin dimer, dictates the tubulin-tubulin interaction strength in the microtubule polymer. This problem has been a topic of debate in the field for over a decade, and it is essential for understanding microtubule dynamics.

      McCormick and colleagues focus their attention on two hypotheses, which they call the "self-acting" model and the "interface-acting" model. Both models have been previously discussed in the literature and they are related to the specific way, in which the GTP hydrolysis in the beta-tubulin subunit exerts an effect on the microtubule lattice. The authors argue that the two considered models can be discriminated based on a quantitative analysis of the sensitivity of the growth rates at the plus- and minus-ends of microtubules to the concentration of GDP-tubulins in mixed nucleotide (GDP/GMPCPP) experiments. By combining computational simulations and in vitro observations, they conclude that the tubulin-tubulin interaction strength is determined by the interfacial nucleotide.

      The major strength of the paper is a systematic and thorough consideration of GDP as a modulator of microtubule dynamics, which brings novel insights about the structure of the stabilizing cap on the growing microtubule end.

      I think that the study is interesting and valuable for the field, but it could be improved by addressing the following critical points and suggestions. They concern (1) the statistical significance of the main experimental finding about the distinct sensitivity of the plus- and minus-ends of microtubules to the GTP-tubulin concentration in solution, and (2) the validity of the formulation of the "self-acting" model with an emphasis solely on the longitudinal bonds.

      We thank the reviewer for the comment about statistical significance, and we regret our oversight to have not included that analysis in the original manuscript. We have now included an analysis of statistical significance for the main experimental results supporting the interface-acting model (Fig. 2C and the replotting of those data against a different abscissa in Fig. 3C,D), and more broadly we have ensured that all figure legends contain information about the number of measurements and whether error bars indicate SD or SEM.

      The reviewer’s comment about the sole emphasis on longitudinal bonds helped us realize that a change to Fig. 1 (where we illustrate the two models) would improve clarity. We had originally chosen to illustrate Figure 1 using ‘pure’ longitudinal interactions (with no lateral contacts), and this may be what triggered the reviewer’s comment. We have now revised the figure to show ‘corner’ (longitudinal + lateral) interactions. There are two main reasons for this decision. First, the corner interactions are more long-lived and therefore more important for the phenomena under study. Second, illustrating corner interactions provides a better basis for us to discuss what is a subtle aspect of our model – that the ‘GDP penalty’ affecting longitudinal or lateral interactions in a corner site is completely equivalent. Thus, our model is not quite as narrow/exclusive as the reviewer suggested. We appreciate having had the chance to clarify this.

      Reviewer #2 (Public Review):

      McCormick, Cleary et al., explore the question of how the nucleotide state of the tubulin heterodimer affects the interaction between adjacent tubulins.

      (1) The setup of the authors' model, which attributes the dynamic properties of the growing microtubule only to the differences in interface binding affinities, is unrealistic. They excluded the influence of the nucleotide-dependent global conformational changes even in the 'Self-Acting Nucleotide' model (Fig. 1A). As the authors have found earlier, tubulin in its unassembled state may be curved irrespective of the species of the bound nucleotide (Rice et al., 2008, doi: 10.1073/pnas.0801155105), but at the growing end of microtubules, the situation could be different. Considering the recently published papers from other laboratories, it may be more appropriate to include the nucleotide-dependent change in the tubulin conformation in the Self-Acting Nucleotide model.

      We understand the reviewer’s perspective, which may be summarized as: “We know conformational changes are happening and that they affect tubulin:tubulin interactions, so why isn’t your model trying to account for that?” In text added to the revised manuscript, we address this critique in the following ways. First, there is not a consensus in the field about how to parameterize the different conformations of tubulin and how they influence tubulin:tubulin interactions. Second, any attempt to explicitly account for different conformations of tubulin would substantially increase the number of adjustable model parameters, which in turn makes the fitting to growth rates more complicated. Third, compared to traditional ‘dynamics’ assays that use GTP, using mixtures of GMPCPP and GDP simplifies the biochemistry by eliminating GTPase. This results in a more uniform composition of nucleotide state in the body of the microtubule polymer, which diminishes the importance of explicitly modeling nucleotide-influenced changes in conformation. Fourth, it seems likely that different conformations of tubulin will modulate both longitudinal interactions (as tubulin becomes straighter the longitudinal contact area grows larger) and lateral interactions (as tubulin becomes straighter, the lateral contact areas on α- and β-tubulin come into better alignment). Our model treats longitudinal and corner (defined as longitudinal + lateral) interactions as independent, so in principle it could be implicitly capturing some of these conformational effects. By refining the strengths of the longitudinal and corner interactions independently, the model effectively allows the strength of longitudinal contacts to be different for pure longitudinal and corner interactions, which might implicitly capture some variations in longitudinal contacts for different tubulin conformations. Our model treats ‘bucket’-type sites (one longitudinal and two lateral interactions) as simply having an additional lateral interaction of equal strength as the first, but because bucket sites have such a high affinity, they rarely dissociate and this small oversimplification is unlikely to have a substantial effect. We have introduced text in several places (bottom of p. 7 and elsewhere) to cover these points.

      (2) The result that the minus end is insensitive to GDP (Fig. 2) was previously published in a paper by Tanaka-Takiguchi et al. (doi: 10.1006/jmbi.1998.1877). The exact experimental condition was different from the one used in Fig. 2, but the essential point of the finding is the same. The authors should cite the preceding work, and discuss the similarities and differences, as compared to their own results.

      Thank you for reminding us of this paper! We agree that it is an ‘on target’ citation, and have cited and discussed it in the revised manuscript (last paragraph of Introduction, third paragraph of Discussion).

      Reviewer #1 (Recommendations For The Authors):

      1) In my opinion, the way in which the authors have depicted their "self-acting" model in Fig. 1 and in Supplementary Figure 1, makes the model look intuitively implausible. The drawings seem to imply that at the plus-end the GTP hydrolysis in the beta-tubulin subunit somehow allosterically affects the alpha-tubulin subunit of the same dimer to weaken its longitudinal bond with adjacent tubulin dimer. Conversely, at the minus end, the same reaction now affects the very same beta-tubulin subunit, and modulates its longitudinal interaction with the next dimer.

      However, a more realistic formulation of the "self-acting" model would be that the exchangeable nucleotide affects the lateral bonds, formed by the same beta-tubulin with its lateral neighbors. Although the experimental data in this regard are controversial, at least some supporting evidence for this idea comes from structural arguments, e.g. [Manka, S.W., Moores, C.A. Nat Struct Mol Biol 25, 607-615 (2018).] This "lateral self-acting", but not the "longitudinal self-acting" hypothesis, seems more natural, and it was the one previously implemented in the seminal paper by [Vanburen et al, 2002 Proceedings of the National Academy of Sciences 99.9 (2002): 6035-6040.] and later by some other models as well.

      This point has been addressed above, in part by modifying the cartoon in Fig. 1.

      2) To better clarify, which exact models are considered in this manuscript, it would be helpful if the authors provided a detailed table with all simulation parameters, including, k_off_loner, k_off_bucket and k_off_corner, for both nucleotide states, in both the self-acting and the interface-acting models.

      Thank you for the suggestion. We have added tables that show all simulation parameters, as well as the corresponding calculated on- and off-rates for each interaction.

      3) I am not sure that using some 'arbitrarily chosen' parameters is very helpful in Chapter 1 of Results. In fact, the results, obtained with an unconstrained set of parameters may be misleading or provide ambiguous answers. In other words, how reliable are the conclusions, based on the arbitrary parameter set? For example, could the dependences of the microtubule growth rate on the GDP-tubulin content be more or less pronounced with a different set of arbitrarily chosen parameters, compared to the graphs in Fig. 1BC?

      This is a fair criticism. In response, we have added three new sets of simulations that each test different choices of the biochemical parameters used in Figure 1. With respect to the original parameters, we tested a weaker and stronger choice for the longitudinal interaction (KDlong, a 100-fold range), the corner interaction (KDcorner, a 25-fold range), and the GDP weakening factor (a 100-fold range). The predicted supersensitivity of plus-end growth rates to GDP in the self-acting vs interface-acting mechanisms is robust across the range of different choices for the above parameters (Figure 1 Supplements 1 and 2). Parameters for these new simulations are shown in Tables 3 and 4.

      4) It took me some time to comprehend why the minus-end growth rate is assumed to be dependent only on the concentration of the GMPCPP-tubulin (in section 2 of Results). It would be great if the authors simply plotted the simulated dependence of the growth rate on the GMPCPP-tubulin concentration in the case when no GDP-tubulin was added. As I understand, that curve should almost exactly match the dependence observed in Fig 1B, correct? Otherwise, it does not seem obvious, why GDP-tubulin does not impede the minus-end growth. Again, is this conclusion model- and parameter-dependent? This question is related to point 3 above.

      The minus-end growth rates decrease in proportion to the concentration of GMPCPP-tubulin. We have added a note on minus-end growth rates in the Figure 1 legend.

      5) I was not quite convinced by the evidence for distinct sensitivities of the plus- and minus-end growth rates to GDP-tubulin concentration (Figure 2C and Fig 3C, D). These are the key experimental measurements in the paper. Therefore, I suggest that the authors try to strengthen this point by additional measurements to increase statistics. Or at least, please, explain the data points, the error bars, and provide some information on the number of independent measurements and the statistical significance between the curves. Maybe, they could be directly compared after normalizing by the "all GMPCPP growth rate"? How was the "1.5-fold" ratio obtained in Fig 2C? Does that number refer only to a certain GDP-tubulin concentration or does that value somehow characterize the whole range of the concentrations measured?

      This has been addressed above.

      Reviewer #2 (Recommendations For The Authors):

      These look identical to above and were addressed there.

      (1) The setup of the authors' model, which attributes the dynamic properties of the growing microtubule only to the differences in interface binding affinities, is unrealistic. They excluded the influence of the nucleotide-dependent global conformational changes even in the 'Self-Acting Nucleotide' model (Fig. 1A). As the authors have found earlier, tubulin in its unassembled state may be curved irrespective of the species of the bound nucleotide (Rice et al., 2008, doi: 10.1073/pnas.0801155105), but at the growing end of microtubules, the situation could be different. Considering the recently published papers from other laboratories, it may be more appropriate to include the nucleotide-dependent change in the tubulin conformation in the Self-Acting Nucleotide model.

      (2) The result that the minus end is insensitive to GDP (Fig. 2) was previously published in a paper by Tanaka-Takiguchi et al. (doi: 10.1006/jmbi.1998.1877). The exact experimental condition was different from the one used in Fig. 2, but the essential point of the finding is the same. The authors should cite the preceding work, and discuss the similarities and differences, as compared to their own results.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Public Reviews

      Reviewer #1:

      We thank this reviewer for their comments on our paper. We have adjusted the methods section to ensure it is clear, including an updated description of the statistics and in some cases updated statistical methods to ensure uniformity in analyses across datasets. The discussion has been modified so that the message regarding our results is set appropriately in the literature.

      Reviewer #2:

      We are grateful to this reviewer for their insight. We have modified the text of the discussion to address the points of this reviewer, including providing a greater focus on the significance of our results without overgeneralizing. We have additionally reframed our argument regarding the detection of pesticides by Bombus terrestris to more carefully consider conflicting results from other papers.

      Response to Recommendations For The Authors

      Response to Reviewer #1

      We thank this reviewer for their thoughtful comments and ideas. We have made several changes to the text of the manuscript to improve the clarity of our writing, and we are grateful to the reviewer for raising several important points that we had not sufficiently discussed in the paper previously. We feel the paper has been improved with the inclusion of a more thorough discussion and clarified methods. Please see below our responses to the points they raised.

      A few general thoughts that I had when reading your manuscript: I assume you have only tested the active pesticide ingredients, but not the formula generally applied in the field. Given that these formulas contain additional compounds but the active ingredients, might it not be possible that they could be perceived by bees?

      For this study, we were interested specifically in the taste of active pesticide compounds. Although we agree it could be interesting to explore the taste of other formula compounds, testing these was not within the scope of this paper.

      Is there an alternative to quinine as a negative control? As you state, quinine is generally used in studies and likely often in concentrations which are beyond what can be seen in e.g. floral nectar, which might explain its aversive effect. I would like to see it tested in natural concentrations and ideally in combination with other potentially toxic plant secondary metabolites (PSMs).

      The purpose of including quinine in our study was to provide an in-depth characterization of “bitter” taste responses using the sensilla on bumblebee labial palps and galea (i.e., through the attenuation of GRN firing). This was not to show how bumblebees may interact with plants containing quinine in the field, or other PSMs. It would indeed be interesting to explore other plant secondary metabolites; however, this was beyond the scope of our paper.

      L177-187 AND 233-238 Could you, please, provide a photo or schematic drawing to illustrate your assay? I have a very hard time picturing the actual set-up.

      We have provided a labeled diagram of the bumblebee mouthparts and sensillum types (Fig 1A), as well as an image of the bumblebee feeding from a capillary in the behavioural assay (Fig 1G). Further details about the feeding assay (including a JoVE video) can be found with the Ma 2016 paper that we cite throughout our methods section.

      L219 Why did you choose 5 sec here?

      This feeding bout duration was selected based on the criteria defined in Ma et al 2016. We have added a citation to that sentence.

      L221-224 How precisely was the behavior scored? Just length of bouts or also repeated short contacts? Please, specify.

      We used the first bout duration and the cumulative bout duration in our analyses. A sentence has been added to specify this.

      L231/233 Please, provide some brief details here, as many readers may find it annoying to find and read another study for methods' details.

      We have added three sentences in the methods to further explain the electrophysiological method.

      L238-245 See also my general methods comment: concentrations used for pesticides and quinine differ quite substantially, which may explain their different effects on the bees' perception. Are the concentrations used for quinine realistic? If not that is totally fine for a negative control, but it would be interesting to see a comparison of effects conducted for similar concentrations.

      The concentrations of quinine used were selected so that they would elicit a known “bitter response”. These concentrations are not meant to be field-realistic, and our data (and others, e.g., Tiedeken et al 2014) show that lower concentrations of quinine are not detected by bumblebees.

      L277-301 I assume that this is a standard statistical approach to analyze electrophysiological data. However, I am really struggling with fully understanding what you did here. It might be good to add some additional explanation/detail, e.g. on why you constructed firing rate histograms or how you derived slopes (aren't stimulus and bin categorical variables?).

      Firing rate histograms are indeed very commonly used for visualizing neuron spikes over time. We have changed the text somewhat in an effort to make it clearer. Likewise, the “slopes” were derived from the LMEs, and in this case “bin” is a continuous time variable. Any time variable will involve some binning depending on the resolution used, but it should not be considered categorical.

      L291-295 As you were averaging and normalizing your data, could you, please, provide some information on variation within animals?

      We have now included information on the coefficient of variation for spike rates across sensilla for a given animal/stimulus (CV range, median, and IQR).
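
      As an illustration of the kind of summary described here, the following is a minimal sketch of how a coefficient of variation (CV) across sensilla could be computed per animal/stimulus and then summarised as a range, median, and IQR. The spike-rate values, the grouping keys, and the function names are hypothetical and are not taken from the manuscript; the sketch also assumes the sample standard deviation and Python's default quartile method, neither of which is stated in the response.

```python
import statistics


def coefficient_of_variation(rates):
    """Return sample standard deviation divided by the mean (mean must be > 0)."""
    mean = statistics.mean(rates)
    if mean <= 0:
        raise ValueError("CV is undefined for a non-positive mean")
    return statistics.stdev(rates) / mean


def summarise_cvs(rates_by_group):
    """Summarise CVs over groups, where each group is one animal/stimulus pair."""
    cvs = sorted(coefficient_of_variation(r) for r in rates_by_group.values())
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(cvs, n=4)
    return {
        "min": cvs[0],
        "max": cvs[-1],
        "median": statistics.median(cvs),
        "iqr": q3 - q1,
    }


# Hypothetical spike rates (spikes/s) from three sensilla per animal/stimulus.
example_rates = {
    ("bee1", "sucrose"): [10.0, 12.0, 14.0],
    ("bee1", "quinine"): [2.0, 4.0, 6.0],
    ("bee2", "sucrose"): [20.0, 20.0, 20.0],
}
summary = summarise_cvs(example_rates)
```

      In this toy input, identical rates across sensilla give a CV of zero, while the same absolute spread around a lower mean gives a larger CV, which is why the CV is useful for comparing variability between stimuli with very different firing rates.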

      L295 I assume t-SNE represent a multivariate approach for ordination, correct? Can you explain why you chose to use this approach? Did you use Euclidean Distance?

      Yes, t-SNE is a multivariate technique for dimensionality reduction. It is particularly well-suited for the visualization of high-dimensional datasets, as it can reveal the underlying structure of the data by embedding it in a lower-dimensional space (e.g., 2D) while preserving the local structure of the data as much as possible. We used t-SNE because it has been shown to be effective in visualizing clusters of similar data points in high-dimensional data. Euclidean distance was used as the distance metric for the t-SNE embedding. Euclidean distance is the default distance metric for most implementations of t-SNE and is appropriate for continuous data like the spike counts in this study. We have adjusted the methods to clarify this.

      L304 Why did you not always use LMEs?

      We have adjusted the text to show that we used LME for all comparisons, and the statistics have been updated accordingly in the results section. None of the outcomes changed with the implementation of LME for all tests.

      L306 Would it not make sense to also include the interaction between stimulus and concentration in your models?

      We have now included a sentence to explain that the interaction term was removed because it was nonsignificant and because the models without the interaction term had better fit (as determined by lower AIC and BIC values).
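
      To make the model-selection criterion concrete, the following is a minimal sketch of an AIC/BIC comparison between a model with and without an interaction term. The log-likelihoods, parameter counts, and number of observations are invented for illustration and are not values from the manuscript, and the response does not state which software computed these criteria. The formulas are the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def reduced_model_preferred(full, reduced, n_obs):
    """Return True if the reduced model has both lower AIC and lower BIC.

    Each model is a (log_likelihood, n_params) tuple.
    """
    lower_aic = aic(*reduced) < aic(*full)
    lower_bic = bic(*reduced, n_obs) < bic(*full, n_obs)
    return lower_aic and lower_bic


# Hypothetical fits: the interaction adds two parameters but barely
# improves the log-likelihood.
full_model = (-120.0, 8)      # stimulus * concentration
reduced_model = (-120.5, 6)   # stimulus + concentration
keep_reduced = reduced_model_preferred(full_model, reduced_model, n_obs=100)
```

      Both criteria penalise extra parameters, so a term that improves the likelihood only marginally, as in this made-up case, raises AIC and BIC rather than lowering them.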

      Results:
      L337, 339 and more: I would prefer to see actual p-values, not just "p > 0.05".

      This has been adjusted on L337 and 339. As far as we are aware, there are no other instances where exact p-values were not given (except when p < 0.0001).

      Discussion:
      L470 This is true, but the bees' behavior changed significantly, indicating that they may respond to this small change in firing patterns already?

      It is true that the bees’ behaviour changed significantly with 0.1mM QUI, but this was not the case with the pesticides. Bees drank less overall of 0.1mM QUI than OSR because of the rapid post-ingestive effects of this compound. Importantly, the duration of the first bout was not affected (i.e., they didn’t directly avoid it by taste upon first encountering it, as they do with 1mM QUI); they simply drank less of the 0.1mM QUI over 2 minutes. Post-ingestive effects may occur as quickly as 30s after initial consumption. For the pesticides, the small changes in GRN firing were not associated with any effects on consumption or our other measures of feeding behaviour, and we suggest this results from a lack of rapid negative post-ingestive consequences. We now include further discussion of these important points.

      L481 But they consumed significantly less of the 0.1 mM QUI!?

      This is true, but they did not reject it (i.e., not drink it at all), and there were no changes in the amount of time the bees spent in contact with the 0.1mM QUI solution compared to OSR. We have adjusted the text for clarification.

      L504/505 AND 520/521 AND 533-536 I see your point, but I am wondering whether the bees may need some time but are generally able to learn the taste of pesticides, which may explain why e.g. Arce et al. only saw an effect over time. For example, learning to 'focus' on the pesticide taste may require some internal feedback (bees not feeling well) or larvae feedback. If I understood right, you tested workers only, which might be less sensitive than queens or larvae. I think these aspects should be discussed.

      In our study, we investigated the mechanism of taste detection of pesticides. We agree that bees likely use post-ingestive mechanisms to learn to associate the location (or another cue) of a food source with positive or negative post-ingestive cues. ‘Focus’ is a higher-order process that involves increased attention to sensory stimuli but does not affect sensation at the level of the receptor. We show that bees are unable to taste pesticides using the gustatory receptors on their mouthparts, so post-ingestive learning would not be able to associate the pesticides with any taste cue. Indeed, there may be caste-specific differences with foraging queens; however, a discussion of this would be beyond the scope of our paper.

      I also recommend broadening the scope of your discussion. For example, you only focus on nectar, while the story might be different for pollen, which is also contaminated with pesticides but represents a different chemical matrix with potentially different taste properties. Also, unlike nectar, pollen is collected with tarsae and hence on contact with other bee body parts.
      I would also like to see a discussion of your study's implications for other bee species and other potentially toxic compounds (e.g. PSMs).

      We do not include any data in this paper regarding tarsal or antennal taste or other potentially toxic compounds. In this paper we present one mechanism of bitter taste perception (i.e., of quinine) and specifically show that the buff-tailed bumblebee is unable to taste certain pesticides using their mouthparts. To avoid overgeneralizing, we have not included discussions about other species or compounds, which may or may not share similarities with our study.

      Response to Reviewer #2

      We thank this reviewer for their comments. We have adjusted the text to avoid overgeneralizations with our conclusions, and attempted to soften language in the discussion that may have been construed as combative towards the Arce et al (2018) paper. We hope this reviewer finds these adjustments to be in line with their expectations.

      1) In two parts of the manuscript, the authors made broad predictions and conclusions beyond what the evidence in the paper can support and wrote "Future studies will be necessary to confirm this." (Lines 508-509) and " Future studies that survey a greater variety of compounds will be necessary to confirm this." (563-564). Instead of making conclusions based on what experimental data in future studies may support, I would ask the authors instead to make conclusions that their current study can support based on experimental evidence in this paper.

      We have removed these predictions that extend beyond the scope of the paper.

      2) Line 315 "GRNs encode differences in sugar solution composition". The logic of the title is wrong.

      This has been fixed.

      3) Since this study is only performed in one bumblebee species, then I would suggest that the title be more specific e.g., "Mouthparts of the bumblebee Bombus terrestris exhibit poor acuity for the detection of pesticides in nectar".

      We have made this change.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for recognizing the importance of our work and for their insightful suggestions. A point-by-point response to their comments is listed underneath each reviewer’s section.

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      1) Have the authors optimized the expression level of dCas9? I cannot find this information in this paper or in their 2021 paper. It is important to avoid the toxicity phenomenon that occurs when using guide RNAs that share specific five base seed sequences (referred to as 'bad seeds').

      Cui L., Vigouroux A., Rousset F., Varet H., Khanna V., Bikard D. A CRISPRi screen in E. coli reveals sequence-specific toxicity of dCas9. Nat. Commun. 2018; 9:1912.

      Rostain W., Grebert T., Vyhovskyi D., Thiel Pizarro P., Tshinsele-Van Bellingen G., Cui L., Bikard D. Cas9 off-target binding to the promoter of bacterial genes leads to silencing and toxicity. Nucleic Acids Research, 2023, gkad170.

      2) One guide per gene is highly unusual given that different guides block the RNA polymerase with different efficiency. This was even shown by the Machner lab in the Legionella context in Figure 1c of Ellis et al. 2021 for sidM and vipD. Typically, genes need three guides minimum to ensure that the gene of interest is knocked down fully unless it is not possible as the gene is too small and/or it is difficult to find an NGG sequence. The authors have used one guide per effector, how can they be sure that each gene is knocked down? The Machner lab themselves in Figure 3c of Ellis et al. 2021 shows not all genes targeted using multiplex CRISPRi are equally efficiently knocked down. Please justify why only one guide per gene was chosen and add controls to validate the results. The authors themselves state that the strategy of one guide may be problematic. Lines 315-316 it reads... A possible explanation was the incomplete knockdown of a seemingly important process.

      3) Given what the Machner lab observed about spacer location in Ellis et al. 2021 would it not make more sense to take one set of redundant effectors and make multiplex randomized CRISPRi with them in different locations? For Figure 1 at least.

      4) Following infection, it seems that the bacteria were not plated onto antibiotic media, so it is not known how well the plasmid harboring guides is kept through infection.

      Specific comments

      A) The first results paragraph describes the set-up of 10-plex synthesized CRISPR arrays, where 10 effector encoding genes of specific gene families are selected. The rationale of the choice of these genes is not given. Please explain.

      B) Please also add some biological data on what these genes code for, and what is their known or predicted function. It is not very informative and exciting to have tables of lpg numbers without any knowledge of what these genes code for and why they were selected, at least some.

      C) Figure 1 A Why are only some of the MC arrays shown? Please, at least include in supplementary the others. Again one array in detail would be more informative, showing true knockdown of all genes by qPCR and ideally by western blot.

      D) I am not convinced that the gene silencing efficiency qPCR comparison is done in the correct way. In my opinion, each of the genes to be knocked down should be tested against the expression of a control gene e.g. rpoS and then these results should be compared and not the results of empty plasmid or CRISPR array containing plasmid directly. L. pneumophila are very sensitive to growth conditions and inoculum, thus the two strains might not be completely at the same growth stage when being compared which can impact the results.

      E) Figure 1 B As stated in general comment number 4, the authors do not appear to plate onto antibiotic so we don't know how well the plasmid harboring the guides is kept through infection. The sustained presence of the guide is particularly important for CRISPRi.

      F) The authors found only a few growth phenotypes and mainly this was due to single genes and not combinations of genes. This might again be due to the fact that only one guide per gene was used. How do the authors know that all genes targeted were indeed knocked down?

      G) Line 119 Alternatively, the genes were not 100% all knocked down, escaping the knockdown effect expected. Could authors take three genes with three guides each and look at impact instead of only one?

      H) The authors then develop the randomized multiplexed arrays and chose to test 44 TME encoding genes. Line 141 Justify why these effectors were chosen in the text.

      I) Unfortunately, the method is not clearly described, and many parts are complicated and the text needs to be re-read several times to be understood (lines 150 - 166). Please re-write to better explain to the reader. In line 156 the authors point to a supplementary note 1. This information should be in the main text.

      J) What is the copy number of the CRISPR plasmid? Please add in the Material and Method section also the origin of this plasmid.

      Figure 2

      K) In the paper (line 154-160) and the extra notes, it states that authors attempt to size select CRISPR arrays. However, this is not apparent in Figure 2 schematic. Or are the authors stating that plasmids only containing one guide were selected out? However, line 312 would suggest not. Please clarify

      L) A limiting factor in making multiplex guide CRISPR, as the authors are trying to establish in this study, is cloning of multiple guides. In the pre-determined CRISPR arrays in this study, the guides were synthesized. For the randomized multiplex CRISPR in this study, the authors adapt a Golden Gate cloning method to generate multiple sgRNAs in the Cas9 vector. A similar protocol was established in the below paper. Please add this reference.

      Zuckermann, M.; Hlevnjak, M.; Yazdanparast, H.; Zapatka, M.; Jones, D.T.W.; Lichter, P.; Gronych, J. A novel cloning strategy for one-step assembly of multiplex CRISPR vectors. Sci. Rep. 2018

      M) As the authors note, Zuckermann et al. similarly note that plex of 3 or 4 is most common and above 5 is rare. This thus appears to still be the limiting step of multiplex CRISPR technology. Please discuss

      Figure 4

      N) The idea of multiplexed CRISPRi seq to address the biological phenomenon of redundancy is an interesting one, however, I am missing the in-depth functional characterization and discussion of at least one of the redundant functions discovered. Please add.

      Figure5/6

      O) As noted above, the strength of the experiments is undermined by how CRISPRi is set up. Having an average multiplex of 2 or three and again only using one guide per gene weakens the study and the results obtained. Furthermore, as stated in general comment number 4, the authors do not appear to plate onto antibiotic so again, we don't know how well the plasmid harboring the guides is kept through infection. The sustained presence of the guide is particularly important for CRISPRi. Please add a validation that the guides are all present.

      Response to Reviewer #1

      We are grateful to the reviewers for their insightful comments and suggestions on how to further improve the manuscript.

      Regarding the issue of ‘bad seed sequences’ (comment #1), we had previously evaluated the expression level of dcas9 (plotted in Figure 1b of the 2021 Communications Biol paper) and tuned our induction conditions accordingly (40 ng/mL as described in the Methods). Since all strains used in this study express dcas9 from the chromosome, not a plasmid, this eliminates the possibility of fluctuations in expression levels due to variabilities in plasmid copy numbers.

      In the rare event that toxicity by any given guide occurs, we would expect that guide to already be underrepresented or missing in the input pool following 24+ hours of CRISPRi induction during axenic growth. Our data, now discussed in the manuscript (Lines 211-216 and Figure S2), revealed that this was not the case and that all guide-encoding spacers were present in roughly equal amounts (median of >5000 occurrences). As with any knockdown study, true chromosomal deletions were created throughout so as to alleviate the issue of false positives.

      Regarding comments #2, #3, and specific comments made under point F, G, and O, on the topic of using single vs. multiple guides, we agree that there are circumstances under which using more than one guide per target may be advantageous, for example when attempting to delete a gene from mammalian cells using conventional CRISPR engineering. In the study described here, this is not the case. In fact, we did create a second array library with alternative guides targeting the same group of genes at locations other than the “optimal location” identified in our 2021 paper and found that these “sub-optimal” guides were inefficient for identifying critical effectors as described in Supplemental Note S1 under the heading “Sub-optimal annealing sites” (Lines 919+). These data suggest that adding sub-optimal guides to the arrays of optimal guides might ‘poison’ the arrays and limit rather than enhance their ability to identify gene combinations.

      Regarding comment #2, #3, and specific comments made under point C, F, and G, on the topic of confirming efficient gene knockdown for the identification of critical genes, we remind Reviewer 1 that we did confirm knockdown of 60 of the target genes of the 10-plex screen to be at least 2-fold, with an average fold repression of one order of magnitude or more (Figure 1A). While knockdown of every gene in every 10-plex construct would be an unprecedented ask of any published CRISPR screen, we believe that these 60 genes provide a large enough sampling of all guides to elucidate the range of knockdown to be expected by our CRISPRi platform. As with other knockdown technologies, such as RNAi, there is no expectation of accomplishing complete knockdown for any given target. Hence, the data in Figure 1A suggest that the lack of identifying critical genes using pre-determined 10-plex arrays was not due to a lack of knockdown efficiency, but rather the difficulty to accurately predict redundancy within a cohort of uncharacterized genes, accentuating the need for array randomization with MuRCiS.

      On the topic of antibiotic use for plasmid selection (comments #4, E and O), we would like to clarify that the CRISPR plasmids were selected by thymidine prototrophy, not antibiotic resistance, and we apologize for not making this clearer. The laboratory strain Lp02 is a thymidine auxotroph (thyA-) L. pneumophila variant, and plasmid retention is routinely achieved by including the thymidine biosynthesis gene (thyA) on the plasmid backbone. Only with a plasmid bearing the thyA gene can L. pneumophila grow on CYE (thymidine-) plates. Our use of vectors bearing thyA and plating on CYE plates is described in the Methods section. Further, in Figure 7 of our 2021 paper, we show that CRISPR plasmids are efficiently retained by Lp02 for the duration of a 48-hour infection, resulting in efficient multi-gene knockdown even at the end of the intracellular growth experiment.

      Regarding comments A and B, on publishing the biological data used to classify genes in gene families for 10-plex silencing, we do not consider it critical to provide additional information beyond the broad classification (e.g. kinases, phosphatases, etc) described in Table S1. Structural predictions constantly change due to continuously evolving databases. Our initial analyses were made in 2015 using HHPRED Hidden-Markov models and, in all likelihood, those predictions have been refined since then. Moreover, with the recent advent of Alphafold, anyone interested in learning more about select effectors from our list is advised to simply access the most recent functional predictions directly on the Alphafold webpage (https://alphafold.ebi.ac.uk/). We clarify how predictions were made on Lines 97-101.

      Regarding specific comment D, on our method for qPCR normalization and comparison, we point Reviewer 1 to the Methods section (Lines 460+) where we describe that data obtained from each CRISPRi strain were in fact normalized to the levels of rpsL prior to comparing them to the normalized data from the strain with the empty control plasmid. This normalization to rpsL, a gene encoding a ribosomal protein, also corrects for growth differences between samples.
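
      For readers unfamiliar with reference-gene normalization, the following is a minimal sketch of the general idea: each strain's target signal is first normalized to the reference gene (rpsL here) and only then compared with the empty-plasmid control. It assumes the widely used comparative Ct (2^ΔΔCt) calculation with roughly 100% amplification efficiency. The response does not state which formula was used, and the Ct values and function name below are hypothetical.

```python
def fold_repression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Fold repression of a target gene in a knockdown strain vs. a control.

    Assumes the comparative Ct method with ~100% amplification efficiency,
    i.e. one Ct unit corresponds to a two-fold difference in template.
    """
    # Step 1: normalize each strain to its own reference gene (e.g. rpsL).
    delta_ct_kd = ct_target_kd - ct_ref_kd
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Step 2: compare the normalized values between strains.
    delta_delta_ct = delta_ct_kd - delta_ct_ctrl
    # A higher Ct in the knockdown strain means less transcript, so
    # repression (control / knockdown) is 2 ** (+delta_delta_ct).
    return 2.0 ** delta_delta_ct


# Hypothetical Ct values: the reference gene differs by one cycle between
# samples (different input amounts), which the normalization removes.
repression = fold_repression(
    ct_target_kd=28.0, ct_ref_kd=18.0,      # CRISPRi strain
    ct_target_ctrl=24.0, ct_ref_ctrl=17.0,  # empty-plasmid control
)
```

      Because the reference gene is measured in the same sample as the target, differences in input material or growth state that shift both Ct values together cancel out in the first step.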

      Regarding specific comment H, the justification for studying 44 transmembrane effector-encoding genes was driven by the fact that activities mediated by transmembrane proteins are difficult (though not impossible) to be replaced by cytosolic proteins, for example the transport of metabolites across the LCV membrane. And since transmembrane regions can be predicted with high confidence, we decided to probe this group of TMEs for synthetic lethality with the randomized CRISPRi approach as proof-of-concept. To make this clearer, we have added more detail to the text (Lines 151-155).

      Regarding specific comment I, we have further simplified the description of the cloning technique to increase clarity (Lines 156+). The information listed under Supplemental Note S1, though informative, is not critical for the overall understanding of this highly technical section, and since the reviewer already considered this section to be difficult to follow, we would prefer to not further complicate the text by including these non-essential details.

      Regarding the origin of the CRISPRi plasmid (specific comment J), we point Reviewer 1 to the reference (Hammer BK and Swanson MS (Mol Microbiol 1999)) listed in Table S10: Strains and Plasmids Used in this Study.

      Regarding specific comments K and O, on the clarity of depicting the CRISPR array size selection process, we have updated the Figure 2 schematic. Reviewer 1 is correct in that despite our best efforts to exclude short CRISPR arrays, inevitably some 1-plex arrays remained in our input vector pool. Still, the average length of all arrays used in our pilot study exceeded three crRNA-encoding spacers. Further, having a population of 1- or 2-plex arrays in our libraries did allow us to pin-point the most critical effectors of larger arrays within the same MuRCiS experiment (Table S5 and Table S7), a strength of MuRCiS as described in the discussion (Lines 378+).

      Regarding specific comment L, we appreciate Reviewer 1’s suggestion of an additional reference and we have added it to the manuscript as reference #23 (Line 71). While this reference does use a Golden Gate strategy to build a multiplex array, that array was not randomized but had a predefined order. Hence, our assembly method is unique due to its randomization.

      Regarding specific comment M, on array length cloning limitations, we agree with the conclusion of Zuckermann in Figure 1d of their article that longer inserts are generally harder to get into vector backbones. The challenge of cloning longer inserts is a common phenomenon of general biology and is not unique to cloning CRISPR arrays. We have altered the wording in our manuscript to better describe the intrinsic competition between short and long inserts during cloning (Lines 162-164).

      Regarding specific comment N, we second Reviewer 1’s desire to learn more about the critical effector pairs discovered here. With that said, the goal of this manuscript is to report the development of a novel MuRCiS pipeline to identify these critical pairs. Biochemical and molecular investigations of the encoded protein pairs are on-going and will be the topic of a future manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Specific points

      1) The effector repertoire of L. pneumophila seems to have evolved in response to the plethora of potential protozoan hosts (PMID: 31988381). To further assess evolutionary aspects of the vast L. pneumophila effector arsenal, it would be interesting to test the single and double effector mutant strains (Fig. 5FG, Fig. 6EF) for growth in protozoa other than A. castellanii.

      2) Most CRISPR arrays comprising genes encoding functionally similar proteins or encoding evolutionarily conserved proteins did not substantially affect intracellular growth of L. pneumophila (Fig. 1B). This rather surprising result should be further discussed.

      3) l. 118/119: "Similar results ..., where none of the MC arrays ..." This statement should be phrased more precisely, since some CRISPR arrays did indeed have an effect on intracellular growth of L. pneumophila in U937 macrophages, while none affected intracellular growth in A. castellanii (Fig. 1B).

      4) Typos:

      • l. 852: ... (arbitrarily set to -100).

      • l. 862: ... Legionella-containing vacuole (LCV).

      • l. 895: ... and so we would recommend ...

      Regarding point 1, we thank Reviewer 2 for the suggestion of testing effector mutants in different hosts. While the primary purpose of the current manuscript was to optimize the MuRCiS platform, future studies using this technology to investigate specific biological questions related to Legionella infection would certainly benefit from including more than one amoebaean species.

      Regarding point 2, we agree that the lack of substantial growth defects seems surprising. Yet only two of the seven core effectors (found in all Legionella sp.), lpg2300 and mavN, individually attenuated Legionella intracellular growth when deleted (Burstein 2016 Nat Genetics; Isaac et al., 2015 PNAS). Thus, we hypothesize that the functions many effectors fulfil are of such importance for intracellular survival that redundancy reaches beyond the boundary of conservation or like-function. We have added a statement emphasizing this at the end of the Figure 1 results section (Line 122-125).

      Regarding points 3 and 4, we thank Reviewer 2 for catching these errors and have corrected where needed in the text.

      -l. 852 (now Line 874): … (arbitrarily set to -100,000) is correct for Figure 6E.

    1. Author Response

      The following is the authors’ response to the previous reviews

      Comments from reviewer 1:

      Comment 1. Regarding SBSMMA, the authors may complement their discussion by mentioning recent work (PMID: 35738428) where SBSMMA was used to exemplify a potential fragment-based design approach for developing allosteric effectors for kinases.

      Thank you for the suggestion. We have added a short summary of the work where SBSMMA is used as a basis for developing small molecules to target kinases using a fragment-based design approach.

    2. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their generous comments on the manuscript and have made edits to address their concerns. The manuscript has been restructured and the reference (PMID: 35738428) has been added to the review. We addressed the reviewer's comment below.

      Reviewer #1 (Recommendations For The Authors):

      Regarding SBSMMA, the authors may complement their discussion by mentioning recent work (PMID: 35738428) where SBSMMA was used to exemplify a potential fragment-based design approach for developing allosteric effectors for kinases.

      Thank you for the suggestion. We have added a short summary of the work where SBSMMA is used as a basis for developing small molecules to target kinases using a fragment-based design approach.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The objective of this investigation was to determine whether experimental pain could induce alterations in cortical inhibitory/facilitatory activity observed in TMS-evoked potentials (TEPs). Previous TMS investigations of pain perception had focused on motor evoked potentials (MEPs), which reflect a combination of cortical, spinal, and peripheral activity, as well as restricting the focus to M1. The main strength of this investigation is the combined use of TMS and EEG in the context of experimental pain. More specifically, Experiment 1 investigated whether acute pain altered cortical excitability, reflected in the modulation of TEPs. The main outcome of this study is that relative to non-painful warm stimuli, painful thermal stimuli led to an increase in the amplitude of the TEP N45, with a larger increase associated with higher pain ratings. Because it has been argued that a significant portion of TEPs could reflect auditory potentials elicited by the sound (click) of the TMS, Experiment 2 constituted a control study that aimed to disentangle the cortical response related to TMS and auditory activity. Finally, Experiment 3 aimed to disentangle the cortical response to TMS and reafferent feedback from muscular activity elicited by suprathreshold TMS applied over M1. The fact that the authors accompanied their main experiment with two control experiments strengthens the conclusion that the N45 TEP peak could be implicated in the perception of painful stimuli.

      Perhaps the addition of a highly salient but non-painful stimulus (i.e. from another modality) would have helped rule out the possibility that the effects on the N45 are predominantly related to the intensity/saliency of the stimulus rather than to pain per se.

      We thank the reviewer for their comment on whether stimulus intensity, as opposed to pain per se, influences the N45. We agree that the ideal experiment would have included multiple levels of stimulation. We would argue, however, that in Experiment 1, despite the same level of stimulus intensity for all participants (46 degrees), individual differences in pain ratings were associated with the change in the N45 amplitude, suggesting that the results cannot be explained by stimulus intensity, but rather by pain intensity.

      Reviewer #2 (Public Review):

      The authors have used transcranial magnetic stimulation (TMS) and motor evoked potentials (MEPs) and TMS-electroencephalography (EEG) evoked potentials (TEPs) to determine how experimental heat pain could induce alterations in these metrics.
      In Experiment 1 (n = 29), multiple sustained thermal stimuli were administered over the forearm, with the first, second, and third block of stimuli consisting of warm but non-painful (pre-pain block), painful heat (pain block) and warm but non-painful (post-pain block) temperatures respectively. Painful stimuli led to an increase in the amplitude of the fronto-central N45, with a larger increase associated with higher pain ratings. Experiments 2 and 3 studied the correlation between the increase in the N45 in pain and the effects of a sham stimulation protocol/higher stimulation intensity. They found that the centro-frontal N45 TEP was decreased in acute pain. The study comes from a very strong group in the pain field with long experience in psychophysics, experimental pain, neuromodulation, and EEG in pain. They are among the first to report on changes in cortical excitability as measured by TMS-EEG over M1. While their results are in line with reductions seen in motor-evoked responses during pain and effort was made to address possible confounding factors (study 2 and 3), there are some points that need attention. In my view the most important are:

      1) The method used to calculate the rest motor threshold, which is likely to have overestimated its true value: calculating highly abnormal RMT may lead to suprathreshold stimulations in all instances (Experiment 3) and may lead to somatosensory "contamination" due to re-afferent loops in both "supra" and "infra" (aka. less supra) conditions.

      The method used to assess motor threshold was the TMS motor threshold Assessment Tool (MTAT), which estimates motor threshold using maximum likelihood parametric estimation by sequential testing (Awiszus, 2003; Awiszus and Borckardt, 2011). This was developed as a quicker alternative for calculating motor threshold compared to the traditional Rossini-Rothwell method, which involves determining the lowest intensity that evokes at least 5/10 MEPs of at least 50 microvolts. The method has been shown to achieve the same accuracy in determining motor threshold as the traditional Rossini-Rothwell method, but with fewer pulses (Qi et al., 2011; Silbert et al., 2013).

      We have now made this clearer in the manuscript:

      “The RMT was determined using the TMS motor thresholding assessment tool, which estimates the TMS intensity required to induce an MEP of 50 microvolts with a 50% probability using maximum likelihood parametric estimation by sequential testing (Awiszus, 2003; Awiszus & Borckardt, 2011). This method has been shown to achieve the accuracy of methods such as the Rossini-Rothwell method (Rossini et al., 1994; Rothwell et al., 1999) but with fewer pulses (Qi, Wu, & Schweighofer, 2011; Silbert, Patterson, Pevcic, Windnagel, & Thickbroom, 2013). The test stimulus intensity was set at 110% RMT to concurrently measure MEPs and TEPs during pre-pain, pain and post-pain blocks.”

      Therefore, the high RMTs in our study cannot be explained by the threshold assessment method. Instead, they are likely explained by aspects of the experimental setup that increased the distance between the TMS coil and the scalp, including the layer of foam placed over the coil, the EEG cap and the fact that the electrodes we used had a relatively thick profile. This has been explained in the paper:

      “We note that the relatively high RMTs are likely due to aspects of the experimental setup that increased the distance between the TMS coil and the scalp, including the layer of foam placed over the coil, the EEG cap and relatively thick electrodes (6mm)”
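      To illustrate the idea behind maximum likelihood threshold estimation described above, the following is a minimal sketch and not the MTAT implementation. It assumes that the probability of an MEP of at least 50 microvolts follows a cumulative Gaussian centred on the threshold, with a spread fixed at 7% of the threshold; that spread, the function names, the candidate grid and the trial data are all hypothetical choices made for illustration.

```python
import math


def p_response(intensity, threshold, rel_spread=0.07):
    """Assumed model: probability that a pulse at `intensity` (% of maximum
    stimulator output) evokes an MEP >= 50 uV, as a cumulative Gaussian
    centred on `threshold`. `rel_spread` is a hypothetical parameter."""
    sd = rel_spread * threshold
    z = (intensity - threshold) / (sd * math.sqrt(2))
    return 0.5 * (1.0 + math.erf(z))


def ml_threshold(trials, candidates):
    """Return the candidate threshold with the highest log-likelihood.

    `trials` is a list of (intensity, responded) pairs, where `responded`
    is True when the MEP reached 50 uV on that pulse."""
    best, best_ll = None, -math.inf
    for t in candidates:
        ll = 0.0
        for intensity, responded in trials:
            # Clamp to avoid log(0) for pulses far from the candidate.
            p = min(max(p_response(intensity, t), 1e-9), 1 - 1e-9)
            ll += math.log(p if responded else 1 - p)
        if ll > best_ll:
            best, best_ll = t, ll
    return best


# Hypothetical pulse history, not data from the study.
trials = [(40, False), (50, False), (55, False),
          (58, True), (60, True), (70, True)]
rmt = ml_threshold(trials, range(30, 91))
test_intensity = 1.10 * rmt  # test stimulus at 110% RMT, as in the text
```

      In a sequential procedure, the next pulse would typically be delivered at the current estimate, so each new response sharpens the likelihood where it matters most; this is why such methods can need fewer pulses than a fixed 5/10 criterion.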

      Awiszus, F. (2003). TMS and threshold hunting. In Supplements to Clinical neurophysiology (Vol. 56, pp. 13-23). Elsevier.

      Qi, F., Wu, A. D., & Schweighofer, N. (2011). Fast estimation of transcranial magnetic stimulation motor threshold. Brain stimulation, 4(1), 50-57.

      Silbert, B. I., Patterson, H. I., Pevcic, D. D., Windnagel, K. A., & Thickbroom, G. W. (2013). A comparison of relative-frequency and threshold-hunting methods to determine stimulus intensity in transcranial magnetic stimulation. Clinical Neurophysiology, 124(4), 708-712.

      2) The low number of pulses used for TEPs (close to ⅓ of the usual and recommended)

      We agree that increasing the number of pulses can increase the signal to noise ratio. During piloting, participants were unable to tolerate the painful stimulus for long periods of time and we were required to minimize the number of pulses per condition.

      We note that there is no set advised number of trials in TMS-EEG research. According to the recommendations paper, the number of trials should be based on the outcome measure e.g., TEP peaks vs. frequency domain measures vs. other measures and based on previous studies investigating test-retest reliability (Hernandez-Pavon et al., 2023). The choice of 66 pulses per condition was based on the study by Kerwin et al., (2018) showing that optimal concordance between TEP peaks can be found with 60-100 TMS pulses delivered in the same run (as in the present study). The concordance was particularly high for the N40 peak at prefrontal electrodes, which was the key peak and electrode cluster in our study. We have made this clearer:

      “Current recommendations (Hernandez-Pavon et al., 2023) suggest basing the number of TMS trials per condition on the key outcome measure (e.g., TEP peaks vs. frequency measures) and based on previous test-retest reliability studies. In our study the number of trials was based on a test-retest reliability study by (Kerwin, Keller, Wu, Narayan, & Etkin, 2018) which showed that 60 TMS pulses (delivered in the same run) was sufficient to obtain reliable TEP peaks (i.e., sufficient within-individual concordance between the resultant TEP peaks of each trial).”

      Further supporting the reliability of the TEP data in our experiment, we note that the scalp topographies of the TEPs for active TMS at various timepoints (Figures 5, 7 and 9) were similar across all three experiments, especially at 45 ms post-TMS (frontal negative activity, parietal-occipital positive activity).

      In addition to this, the intraclass correlation coefficient (two-way fixed, single measure) for the N45 to active suprathreshold TMS across timepoints for each experiment was 0.90 for Experiment 1 (across pre-pain, pain, post-pain time points), 0.74 for Experiment 2 (across pre-pain and pain conditions), and 0.95 for Experiment 3 (across pre-pain conditions). This suggests that even with the fluctuations in the N45 induced by pain, the N45 for each participant was stable across time, further supporting the reliability of our data. These ICCs are now reported in the supplementary material (subheading: Test-retest reliability of N45 Peaks).

      Hernandez-Pavon, J. C., Veniero, D., Bergmann, T. O., Belardinelli, P., Bortoletto, M., Casarotto, S., ... & Ilmoniemi, R. J. (2023). TMS combined with EEG: Recommendations and open issues for data collection and analysis. Brain Stimulation, 16(3), 567-593.
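      For readers unfamiliar with this statistic, here is a minimal sketch of a two-way, single-measure consistency ICC, often written ICC(3,1), computed from ANOVA mean squares. It assumes this is the variant meant by "two-way fixed, single measure"; the function name and the toy matrix are hypothetical and are not the study's data.

```python
def icc_3_1(data):
    """ICC(3,1): rows are participants, columns are timepoints/conditions.

    ICC = (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error)
    """
    n = len(data)        # number of participants
    k = len(data[0])     # number of timepoints
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Toy example: every participant shifts by the same amount across
# timepoints, so their rank order and spacing are perfectly preserved.
toy = [[1, 2, 3],
       [2, 3, 4],
       [4, 5, 6]]
icc = icc_3_1(toy)
```

      Because the column (timepoint) effect is removed before the error term is computed, a uniform shift between conditions, such as a pain-induced increase shared by all participants, does not lower this ICC; only changes in participants' relative standing do.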

      Kerwin, L. J., Keller, C. J., Wu, W., Narayan, M., & Etkin, A. (2018). Test-retest reliability of transcranial magnetic stimulation EEG evoked potentials. Brain stimulation, 11(3), 536-544.

      Lack of measures to mask auditory noise.

      In TMS-EEG research, various masking methods have been proposed to suppress the somatosensory and auditory artefacts resulting from TMS pulses, such as white noise played through headphones to mask the click sound (Ilmoniemi and Kičić, 2010), and a thin layer of foam placed between the TMS coil and EEG cap to minimize the scalp sensation (Massimini et al., 2005). However, recent studies have shown that even when these methods are used, sensory contamination of TEPs is still present, as shown by studies that show commonalities in the signal between active and sensory sham conditions that mimic the auditory/somatosensory aspects of real TMS (Biabani et al., 2019; Conde et al., 2019; Rocchi et al., 2021). This has led many authors (Biabani et al., 2019; Conde et al., 2019) to recommend the use of sham conditions to control for sensory contamination. To separate the direct cortical response to TMS from sensory evoked activity, Experiment 2 included a sham TMS condition that mimicked the auditory/somatosensory aspects of active TMS to determine whether any alterations in the TEP peaks in response to pain were due to changes in sensory evoked activity associated with TMS, as opposed to changes in cortical excitability. Therefore, the lack of auditory masking does not impact the main conclusions of the paper.

      We have made this clearer:

      “… masking methods have been used to suppress these sensory inputs, (Ilmoniemi and Kičić, 2010; Massimini et al., 2005). However recent studies have shown that even when these methods are used, sensory contamination of TEPs is still present, as shown by commonalities in the signal between active and sensory sham conditions that mimic the auditory/somatosensory aspects of real TMS (Biabani et al., 2019; Conde et al., 2019; Rocchi et al., 2021). This has led many leading authors (Biabani et al., 2019; Conde et al., 2019) to recommend the use of sham conditions to control for sensory contamination.”

      Ilmoniemi, R. J., & Kičić, D. (2010). Methodology for combined TMS and EEG. Brain topography, 22, 233-248.

      Massimini, M., Ferrarelli, F., Huber, R., Esser, S. K., Singh, H., & Tononi, G. (2005). Breakdown of cortical effective connectivity during sleep. Science, 309(5744), 2228-2232.

      Biabani, M., Fornito, A., Mutanen, T. P., Morrow, J., & Rogasch, N. C. (2019). Characterizing and minimizing the contribution of sensory inputs to TMS-evoked potentials. Brain stimulation, 12(6), 1537-1552.

      Conde, V., Tomasevic, L., Akopian, I., Stanek, K., Saturnino, G. B., Thielscher, A., ... & Siebner, H. R. (2019). The non-transcranial TMS-evoked potential is an inherent source of ambiguity in TMS-EEG studies. Neuroimage, 185, 300-312.

      Rocchi, L., Di Santo, A., Brown, K., Ibáñez, J., Casula, E., Rawji, V., ... & Rothwell, J. (2021). Disentangling EEG responses to TMS due to cortical and peripheral activations. Brain stimulation, 14(1), 4-18.

      3) A supra-stimulus heat stimulus not based on individual HPT, that oscillates during the experiment and that leads to large variations in pain intensity across participants is unfortunate.

      The choice of whether to calibrate or fix stimulus intensity is a contentious question in experimental pain research. A recent discussion by Adamczyk et al., (2022) explores the pros and cons of each approach and recommends situations where one method may be preferred over the other. That paper suggests that the choice of the methodology is related to the research question – when the main outcome of the research is objective (neurophysiological measures) and researchers are interested in the variability in pain ratings, the fixed approach is preferable. Given we explored the relationship between MEP/N45 modulation by pain and pain intensity, this question is better explored by using the same stimulus intensity for all participants, as opposed to calibrating the intensity to achieve a similar level of pain across participants.

      We have made this clearer:

      “Given we were interested in the individual relationship between pain and excitability changes, the fixed temperature of 46ºC ensured larger variability in pain ratings as opposed to calibrating the temperature of the thermode for each participant (Adamczyk et al., 2022).”

      Adamczyk, W. M., Szikszay, T. M., Nahman-Averbuch, H., Skalski, J., Nastaj, J., Gouverneur, P., & Luedtke, K. (2022). To calibrate or not to calibrate? A methodological dilemma in experimental pain research. The Journal of Pain, 23(11), 1823-1832.

      So is the lack of report on measures taken to correct for a fortuitous significance (multiple comparison correction) in such a huge number of serial paired tests.

      Note that we used a Bayesian approach for all analyses as opposed to the traditional frequentist approach. In contrast to the frequentist approach, the Bayesian approach does not require corrections for multiple comparisons (Gelman and Tuerlinckx, 2000), given that it provides a ratio representing the strength of evidence for the null vs. alternative hypotheses as opposed to accepting or rejecting the null hypothesis based on p-values. As such, throughout the paper, we frame our interpretations and conclusions based on the strength of evidence (e.g. anecdotal/weak, moderate, strong, very strong) as opposed to referring to the significance of the effects.

      Gelman A, Tuerlinckx F. (2000). Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational statistics, 15(3):373-90.
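      The evidence categories mentioned in the response above can be made concrete with a small sketch that maps a Bayes factor to a verbal label. The cut-offs used here (3, 10, 30) follow a commonly used convention and are an assumption; the response does not state which thresholds the authors applied, and the function name is hypothetical.

```python
def evidence_label(bf10):
    """Map a Bayes factor BF10 (evidence for H1 over H0) to a verbal
    category. Thresholds are an assumed convention, not the authors'."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive")
    favoured = "H1" if bf10 >= 1 else "H0"
    # Express the strength on a common scale regardless of direction.
    strength = bf10 if bf10 >= 1 else 1 / bf10
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    else:
        label = "very strong"
    return f"{label} evidence for {favoured}"


# The BF of 1.015 quoted for the MEP comparison falls in the weakest band.
print(evidence_label(1.015))
```

      Under this convention a Bayes factor near 1 is read as inconclusive rather than as support for the null, which is why the response speaks of strength of evidence instead of significance.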

      Reviewer #3 (Public Review):

      The present study aims to investigate whether pain influences cortical excitability. To this end, heat pain stimuli are applied to healthy human participants. Simultaneously, TMS pulses are applied to M1 and TMS-evoked potentials (TEPs) and pain ratings are assessed after each TMS pulse. TEPs are used as measures of cortical excitability. The results show that TEP amplitudes at 45 msec (N45) after TMS pulses are higher during painful stimulation than during non-painful warm stimulation. Control experiments indicate that auditory, somatosensory, or proprioceptive effects cannot explain this effect. Considering that the N45 might reflect GABAergic activity, the results suggest that pain changes GABAergic activity. The authors conclude that TEP indices of GABAergic transmission might be useful as biomarkers of pain sensitivity.

      Pain-induced cortical excitability changes is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are mostly convincing, and the interpretation is adequate. The following clarifications and revisions might help to improve the manuscript further.

      1) Non-painful control condition. In this condition, stimuli are applied at warmth detection threshold. At this intensity, by definition, some stimuli are not perceived as different from the baseline. Thus, this condition might not be perfectly suited to control for the effects of painful vs. non-painful stimulation. This potential confound should be critically discussed.

      In Experiment 3, we also collected warmth ratings to confirm whether the pre-pain stimuli were perceived as different from baseline. This detail has been added to the methods:

      “In addition to the pain rating in between TMS pulses, we collected a second rating for warmth of the thermal stimulus (0 = neutral, 10 = very warm) to confirm that the participants felt some difference in sensation relative to baseline during the pre-pain block. This data is presented in the supplementary material”.

      We did not include these data in the initial submission but have now included them in the supplemental material. These data showed warmth ratings were close to 2/10 on average. This confirms that the non-painful control condition produced some level of non-painful sensation.

      2) MEP differences between conditions. The results do not show differences in MEP amplitudes between conditions (BF 1.015). The analysis nevertheless relates MEP differences between conditions to pain ratings. It would be more appropriate to state that in this study, pain did not affect MEP and to remove the correlation analysis and its interpretation from the manuscript.

      The interindividual relationship between changes in MEP amplitude and individual pain rating is statistically independent from the overall group level effect of pain on MEP amplitude. Therefore, conclusions for the individual and group level effects can be made independently.
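      A small numerical sketch may help show why these two questions can be answered separately. The values below are invented for illustration only (they are not the study's data): the mean change in MEP amplitude is exactly zero, so there is no group-level effect, yet the individual changes track pain ratings closely.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-participant values (arbitrary units).
mep_change = [-2, -1, 0, 1, 2]    # pain minus pre-pain amplitude
pain_rating = [2, 3, 5, 6, 8]     # 0-10 verbal rating

group_effect = mean(mep_change)           # no average change across the group
individual_r = pearson_r(mep_change, pain_rating)
```

      Here the group mean is zero while the correlation is close to 1, so the absence of an average effect does not by itself rule out a meaningful relationship at the individual level.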

      It is also important to note that in the pain literature, there is now increasing emphasis placed on investigating the individual level relationship between changes in cortical excitability and pain as opposed to the group level effect (Seminowicz et al., 2019; Summers et al., 2019). As such, it is important to make these results readily available for the scientific community.

      We have made this clearer:

      “As there is now increasing emphasis placed on investigating the individual level relationship between changes in cortical excitability and pain and not only the group level effect, (Chowdhury et al., 2022; Seminowicz et al., 2018; Seminowicz, Thapa, & Schabrun, 2019; Summers et al., 2019) we also investigated the correlations between pain ratings and changes in MEP (and TEP) amplitude”

      Chowdhury, N. S., Chang, W. J., Millard, S. K., Skippen, P., Bilska, K., Seminowicz, D. A., & Schabrun, S. M. (2022). The Effect of Acute and Sustained Pain on Corticomotor Excitability: A Systematic Review and Meta-Analysis of Group and Individual Level Data. The Journal of Pain, 23(10), 1680-1696.

      Summers, S. J., Chipchase, L. S., Hirata, R., Graven-Nielsen, T., Cavaleri, R., & Schabrun, S. M. (2019). Motor adaptation varies between individuals in the transition to sustained pain. Pain, 160(9), 2115-2125.

      Seminowicz, D. A., Thapa, T., & Schabrun, S. M. (2019). Corticomotor depression is associated with higher pain severity in the transition to sustained pain: a longitudinal exploratory study of individual differences. The Journal of Pain, 20(12), 1498-1506.

      3) Confounds by pain ratings. The ISI between TMS pulses is 4 sec and includes verbal pain ratings. Considering this relatively short ISI, would it be possible that verbal pain ratings confound the TEP? Moreover, could the pain ratings confound TEP differences between conditions, e.g., by providing earlier ratings when the stimulus is painful? This should be carefully considered, and the authors might perform control analyses.

      It is unlikely that the verbal ratings contaminated the TEP response as the subsequent TMS pulse was not delivered until the verbal rating was complete and given that each participant was cued by the experimenter to provide the pain rating after each pulse (rather than the participant giving the rating at any time). As such, it would not be possible for participants to provide earlier ratings to more painful stimuli.

      We have made this clearer:

      "To avoid contamination of TEPs by verbal ratings, the subsequent TMS pulse was not delivered until the verbal rating was complete, and the participant was cued by the experimenter to provide the pain rating after each pulse.”

      4) Confounds by time effects. Non-painful and painful conditions were performed in a fixed order. Potential confounds by time effects should be carefully considered.

      Previous research suggests that pain alters neural excitability even after pain has subsided. In a recent meta-analysis (Chowdhury et al., 2022) we found effect sizes of 0.55-0.9 for MEP reductions 0-30 minutes after pain had resolved. As such, we avoided intermixing pain and warm blocks given subsequent warm blocks would not serve as a valid baseline, as each subsequent warm block would have residual effects from the previous pain blocks.

      Chowdhury, N. S., Chang, W. J., Millard, S. K., Skippen, P., Bilska, K., Seminowicz, D. A., & Schabrun, S. M. (2022). The Effect of Acute and Sustained Pain on Corticomotor Excitability: A Systematic Review and Meta-Analysis of Group and Individual Level Data. The Journal of Pain, 23(10), 1680-1696.

      At the same time, given there was no conclusive evidence for a difference in N45 amplitude between pre-pain and post-pain conditions of Experiment 1 (Supplementary Figure 1), it is unlikely that the effect of pain was an artefact of time i.e., the explanation that successive thermal stimuli applied to the skin result in an increase in the N45, regardless of whether the stimuli are painful or not. We will make this point in our next revision.

      We have discussed this issue:

      “Lastly, future research should consider replicating our experiment using intermixed pain and no pain blocks, as opposed to fixed pre-pain and pain blocks, to control for order effects i.e., the explanation that successive thermal stimuli applied to the skin results an increase in the N45 peak, regardless of whether the stimuli are painful or not. However, we note that there was no conclusive evidence for a difference in N45 peak amplitude between pre-pain and post-pain conditions of Experiment 1 (Supplementary Figure 1), suggesting it is unlikely that the observed effects were an artefact of time.”

      5) Data availability. The authors should state how they make the data openly available.

      We have uploaded the MEP, TEP and pain data to the Open Science Framework: https://osf.io/k3psu/

      Reviewer #1 (Recommendations For The Authors):

      I think the study is quite solid and I only have very minor recommendations for the authors:

      • Introduction, p. 3: "Functional magnetic resonance imaging has helped us understand where in the brain pain is processed". This is an overstatement. fMRI provides us with potential biomarkers (e.g. "the pain signature"), but the specificity of these responses for pain is debated and we still do not know where in the brain pain is processed.

      We have amended to:

      “functional magnetic resonance imaging has assisted in the localization of brain structures implicated in pain processing”

      • Introduction, p. 5: "neural baseline" should be "neutral baseline"?

      We thank the reviewer for identifying this – this has now been amended.

      Reviewer #2 (Recommendations For The Authors):

      INTRODUCTION

      The introduction mentions how important extra-motor areas can be explored by TMS-EEG, then the effects of DLPFC rTMS on TEPs ... but you do not explore the DLPFC... Perhaps the introduction should be reframed.

      The current work explores cortical excitability throughout the brain (as shown in our cluster-based permutation and source localization analyses), so our investigations are in line with the introduction's statement about the importance of studying non-motor areas.

      The reference to DLPFC rTMS was to highlight current existing research that has applied TMS-EEG to understand pain. It was not used as a methodological rationale to investigate the DLPFC in the present study. To make the research gap clearer, we state:

      “While these studies assist us in understanding whether TEPs might mediate rTMS-induced pain reductions, no study has investigated whether TEPs are altered in direct response to pain”

      Lines 63-65: the term "TMS" is used to refer to motor corticospinal excitability measures, in contrast to TMS-EEG measures of TEPs. Then the authors come back to TMS-EEG and then again back to MEPs. This is rather confusing: TMS means TMS... the concept of MEP/ motor corticospinal excitability measures is not intuitive when using the term "TMS". I suggest using motor corticospinal excitability measures when referring to MEP/MEP-based measures of cortical excitability...) and M1TMS-EEG-evoked potentials (usually abbreviated to TEPs) to refer to TMS-EEG responses as measured here.

      Throughout the manuscript, we now use the term TEPs when referring to TMS-EEG measures, and MEPs when referring to TMS-EMG measures. The use of TEPs vs. MEPs will make it easier for readers to follow which measures we are referring to.

      Line 83: "As such, the precise origin of the pain mechanism cannot be localized." Please rephrase, the sentence conveys the idea that it is indeed possible to localize the origin of a pain mechanism with a different approach, and we know this is not currently possible, irrespective of the methodological setup.

      We have replaced this with:

      “This makes it unclear as to whether pain processes occur at the cortical, spinal or peripheral level.”

      How can one predetermine the temperature that will be perceived as painful by someone else, and not base it on individual HPT? This is against principles of psychophysics. Please comment. Attesting all participants had HPT below 46 is important, but then being stimulated at 46C when our HPT is 45C is different from when our HPT is 39C. Please explain why the pain intensity was not standardised based on individual HPT.

      Please refer to our response to the public review on this issue.

      Line 38: "if we had used an alternative design with blocks of warm stimuli intermixed with blocks of painful stimuli, the warm stimuli blocks would not serve as a valid non-painful baseline". I do not understand why it is not possible to have a pain-free baseline, followed by a pain/warm sequence.

      In our study, we had the choice of either intermixing blocks or to use a fixed sequence. Previous research suggests that pain alters neural excitability even after pain has subsided. In a recent meta-analysis (Chowdhury et al., 2022) we found effect sizes of 0.55-0.9 for MEP reductions 0-30 minutes after pain had resolved. As such, we avoided intermixing pain and warm blocks given subsequent warm blocks would not serve as a valid baseline, as each subsequent warm block would have residual effects from the previous pain blocks.

      We have updated the manuscript to be clearer about why we used a fixed sequence:

      “The pre-pain/pain/post-pain design has been commonly used in the TMS-MEP pain literature, as many studies have demonstrated strong changes in corticomotor excitability that persist beyond the painful period. Indeed, in a systematic review, we showed effect sizes of 0.55-0.9 for MEP reductions 0-30 minutes after pain had resolved (Chowdhury et al., 2022). As such, if we had used an alternative design with blocks of warm stimuli intermixed with blocks of painful stimuli, the warm stimuli blocks would not serve as a valid non-painful baseline”

      Chowdhury, N. S., Chang, W. J., Millard, S. K., Skippen, P., Bilska, K., Seminowicz, D. A., & Schabrun, S. M. (2022). The Effect of Acute and Sustained Pain on Corticomotor Excitability: A Systematic Review and Meta-Analysis of Group and Individual Level Data. The Journal of Pain, 23(10), 1680-1696.

      Please explain, and provide evidence that stimulation of people with predetermined temperatures is able to create warm/pain/warm sensations, without entraining pain in the last warm stimulation.

      A previous study by Dubé and Mercier (2011) used sequences of warm (36°C), painful and neutral (32°C) stimuli and found that participants did not experience pain at any time when the temperature was at a warm temperature of 36°C. We have now cited this study:

      “Based on a previous study (Dubé & Mercier, 2011) which also used sequences of painful (50ºC) and warm (36°C) thermal stimuli, we did not anticipate that the stimulus in the pain block would entrain pain in the post-pain block”

      Dubé, J. A., & Mercier, C. (2011). Effect of pain and pain expectation on primary motor cortex excitability. Clinical neurophysiology, 122(11), 2318-2323.

      METHODS

      It is not clear if participants with chronic pain, present in 20% of the general population, were excluded. If they were, please provide "how" in methods.

      We excluded participants with a history or presence of acute/chronic pain. This has now been clarified:

      “Participants were excluded if they had a history of chronic pain condition or any current acute pain”

      Line 489: the definition of warm detection threshold is unusual, please provide a reference.

      We used an identical method to Furman et al., (2020). We have made the reference to this clearer: “Warmth, cold and pain thresholds were assessed in line with a previous study (Furman et al., 2020)”

      Furman, A. J., Prokhorenko, M., Keaser, M. L., Zhang, J., Chen, S., Mazaheri, A., & Seminowicz, D. A. (2020). Sensorimotor peak alpha frequency is a reliable biomarker of prolonged pain sensitivity. Cerebral Cortex, 30(12), 6069-6082.

      In Experiment 2, please explain how the lack of randomisation between "pre-pain" and "pain" may have influenced results.

      Given we tried to replicate Experiment 1’s methodology as closely as possible (to isolate the source of the effect from Experiment 1), we chose to repeat the same sequence of blocks as Experiment 1: pre-pain followed by pain.

      Given there was no conclusive evidence for a difference in N45 amplitude between pre-pain and post-pain conditions of Experiment 1 (Supplementary Figure 1), it is unlikely that the effect of pain was an order effect i.e., the explanation that successive thermal stimuli applied to the skin result in an increase in the N45, regardless of whether the stimuli are painful or not.

      We now discuss the issue of randomization:

      “Lastly, future research should consider replicating our experiment using intermixed pain and no pain blocks, as opposed to fixed pre-pain and pain blocks, to control for order effects i.e. the explanation that successive thermal stimuli applied to the skin results an increase in the N45 peak, regardless of whether the stimuli are painful or not. However, we note that there was no conclusive evidence for a difference in N45 peak amplitude between pre-pain and post-pain conditions of Experiment 1 (Supplementary Figure 1), suggesting it is unlikely that the observed effects were an artefact of time”

      Also, in Methods in general, disclose how pain intensity was assessed, and how.

      Pain intensity was assessed using a verbal rating scale (0 = no pain, and 10 = most pain imaginable). We have provided more detail:

      “During each 40 second thermal stimulus, TMS pulses were manually delivered, with a verbal pain rating score (0 = no pain, and 10 = worst pain imaginable) obtained between pulses. To avoid contamination of TEPs by verbal ratings, the subsequent TMS pulse was not delivered until the verbal rating was complete, and the participant was cued by the experimenter to provide the pain rating after each pulse”

      Please explain how auditory masking was made during data collection.

      Auditory masking noise was not played through the headphones, given that Experiment 2 controlled for auditory evoked potentials. We have made this clearer:

      “Auditory masking was not used. Instead, auditory evoked potentials resulting from the TMS click sound were controlled for in Experiment 2”

      Please explain if online TEP monitoring was used during data collection

      Online TEP monitoring was not available with our EEG software. We have made this clearer in the manuscript:

      “Online TEP monitoring was not available with the EEG software”

      Line 499: what is subthreshold TMS here? You are measuring TEPs, and not MEPs initially, so you may have a threshold for MEPs and TEPs, which are not the same.

      The intensity was calibrated relative to the MEP response (rather than TEP response) - this has now been clarified:

      “… and the inclusion of a subthreshold TMS (90% of resting motor threshold) condition intermixed within both the pre-pain and pain blocks.”

      Please provide a reference and a figure to illustrate the electric stimulation used in the sham procedure in Study 2

      The apparatus for the electrical stimulation is shown in Figure 7A, and was based on previous papers using electrical stimulation over motor cortex to simulate the somatosensory aspect of real TMS (Chowdhury et al., 2022; Gordon et al., 2022; Rocchi et al., 2021). We have made this clearer:

      “Electrical stimulation was based on previous studies attempting to simulate the somatosensory component of active TMS (Chowdhury et al., 2022; Gordon et al., 2022; Rocchi et al., 2021)”

      Gordon, P. C., Jovellar, D. B., Song, Y., Zrenner, C., Belardinelli, P., Siebner, H. R., & Ziemann, U. (2021). Recording brain responses to TMS of primary motor cortex by EEG–utility of an optimized sham procedure. Neuroimage, 245, 118708.

      Chowdhury, N. S., Rogasch, N. C., Chiang, A. K., Millard, S. K., Skippen, P., Chang, W. J., ... & Schabrun, S. M. (2022). The influence of sensory potentials on transcranial magnetic stimulation–Electroencephalography recordings. Clinical Neurophysiology, 140, 98-109.

      Rocchi, L., Di Santo, A., Brown, K., Ibánez, J., Casula, E., Rawji, V., ... & Rothwell, J. (2021). Disentangling EEG responses to TMS due to cortical and peripheral activations. Brain stimulation, 14(1), 4-18.

      It is not so common to use active electrodes for TMS-EEG. Please confirm the electrodes used and if they are c-ring TMS compatible and provide reference if otherwise (or actual papers recommending active ones)

      To be more specific about the electrode type we have indicated:

      “Signals were recorded from 63 TMS-compatible active electrodes (6mm height, 13mm width), embedded in an elastic cap (ActiCap, Brain Products, Germany), in line with the international 10-10 system”

      A paper directly comparing TEPs between active and passive electrodes found no difference between the two and concluded TEPs can be reliably obtained using active electrodes (Mancuso et al., 2021). There is also evidence that active electrodes have better signal quality than passive electrodes at higher impedance levels (Laszlo et al., 2014).

      This information has now been added to the paper:

      “Active electrodes result in similar TEPs (both magnitude and peaks) to more commonly used passive electrodes (Mancuso et al., 2021). There is also evidence that active electrodes have higher signal quality than passive electrodes at higher impedance levels (Laszlo, Ruiz-Blondet, Khalifian, Chu, & Jin, 2014).”

      There is a growing literature showing that monophonic pulses are not reliable for TEPs when compared to biphasic ones, please provide references. https://doi.org/10.1016/j.brs.2023.02.009

      The reference provided by the reviewer states that biphasic and monophasic pulses both have advantages and disadvantages, rather than stating “monophonic pulses are not reliable for TEPs”. While there is some evidence that the artefacts resulting from monophasic pulses are larger than biphasic pulses, the EEG signal still returns to baseline levels within 5ms of the TMS pulse (Rogasch et al., 2013). Moreover, one paper (Casula et al. 2018) found that the resultant TEPs evoked by monophasic pulses are larger than those resulting from biphasic pulses. The authors postulated that monophasic pulses are more effective at activating widespread cortical areas than biphasic pulses. Ultimately the reference provided by the reviewer concludes that “effect of pulse shape on TEPs has not been systematically investigated and more studies are needed”.

      Rogasch, N. C., Thomson, R. H., Daskalakis, Z. J., & Fitzgerald, P. B. (2013). Short-latency artifacts associated with concurrent TMS–EEG. Brain stimulation, 6(6), 868-876.

      Casula, E. P., Rocchi, L., Hannah, R., & Rothwell, J. C. (2018). Effects of pulse width, waveform and current direction in the cortex: A combined cTMS-EEG study. Brain stimulation, 11(5), 1063-1070.

      In most heads, a pulse in the PA direction is not obtained by a coil oriented 45° to the midline. The latter induces latero-medial pulses, good to obtain MEPs

      We followed previous studies measuring MEPs from the ECRB elbow muscle (Schabrun et al., 2016; de Martino et al., 2019) whereby the TMS coil handle was angled at 45 degrees relative to the midline in order to induce a posterior-anterior current. We are not aware of literature that shows that the 45 degrees orientation does not induce a posterior anterior current in most heads.

      Schabrun, S. M., Christensen, S. W., Mrachacz-Kersting, N., & Graven-Nielsen, T. (2016). Motor cortex reorganization and impaired function in the transition to sustained muscle pain. Cerebral Cortex, 26(5), 1878-1890.

      De Martino, E., Seminowicz, D. A., Schabrun, S. M., Petrini, L., & Graven-Nielsen, T. (2019). High frequency repetitive transcranial magnetic stimulation to the left dorsolateral prefrontal cortex modulates sensorimotor cortex function in the transition to sustained muscle pain. Neuroimage, 186, 93-102.

      The definition of RMT is (very) unusual. RMT provides small 50microV MEPs in 50% of times. If you obtain MEPs at 50microV you are supra threshold!

      The TMS motor threshold assessment tool calculates threshold in the same manner as other threshold tools – it calculates the intensity that elicits an MEP of 50 microvolts, 50% of the time. We have made this clearer:

      “The RMT was determined using the TMS motor thresholding assessment tool, which estimates the TMS intensity required to induce an MEP of 50 microvolts with a 50% probability using maximum likelihood parametric estimation by sequential testing (Awiszus and Borckardt, 2011). This method has been shown to achieve the accuracy of methods such as the Rossini-Rothwell method (Rossini et al., 1994; Rothwell et al., 1999) but with fewer pulses (Qi et al., 2011; Silbert et al., 2013).”
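      The estimation step described above can be illustrated with a minimal Python sketch. It is not the code of the TMS motor threshold assessment tool: the function names, the cumulative-Gaussian response model, the relative spread of 0.07 and the grid limits are all assumptions made for illustration. Only the maximum-likelihood fit is shown; the adaptive choice of the next stimulus intensity (the sequential testing part) is omitted.

```python
import math

REL_SPREAD = 0.07  # assumed spread of the response curve, relative to threshold


def p_response(intensity, threshold, rel_spread=REL_SPREAD):
    """Assumed probability of an MEP >= 50 microvolts (cumulative Gaussian)."""
    z = (intensity - threshold) / (rel_spread * threshold)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(threshold, intensities, responses):
    """Log-likelihood of the observed responses for a candidate threshold."""
    total = 0.0
    for x, responded in zip(intensities, responses):
        # Clamp to avoid log(0) for very unlikely observations.
        p = min(max(p_response(x, threshold), 1e-12), 1.0 - 1e-12)
        total += math.log(p if responded else 1.0 - p)
    return total


def estimate_threshold(intensities, responses, lo=20.0, hi=100.0, step=0.1):
    """Grid search for the threshold (% stimulator output) that maximises
    the likelihood; by construction p_response equals 0.5 at the threshold."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n_steps + 1)]
    return max(grid, key=lambda th: log_likelihood(th, intensities, responses))


# Hypothetical pulses: True means the MEP reached 50 microvolts.
intensities = [40, 42, 44, 48, 50, 52]
responses = [False, False, False, True, True, True]
rmt = estimate_threshold(intensities, responses)
print(f"Estimated RMT: {rmt:.1f}% of maximum stimulator output")
```

      Under this model, the estimate is the intensity expected to elicit a 50 microvolt MEP on half of the trials, which is the definition of threshold given in the response above.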

      Please inform the inter TMS pulse interval used of TEPs and whether they were randomly generated.

      The pulses were delivered manually – the interval was not randomly generated – as stated:

      “As TMS was delivered manually, there was no set interpulse interval. However, the 40 second stimulus duration allowed for 11 pulses for each heat stimulus …. (~ 4 seconds in between …)”

      Why have you stimulated suprathreshold on M1 when assessing TEPs? The whole idea is that large TEPs can be obtained at lower intensities below real RMT and that prevents re-entering loops of somatosensory and joint movement inputs that insert "noise" to the TEPs.

      The suprathreshold intensity was used to concurrently measure MEPs during pre-pain, pain and post-pain blocks.

      We have made this clearer:

      “The test stimulus intensity was set at 110% RMT to concurrently measure MEPs and TEPs during pre-pain, pain and post-pain blocks.”

      The influence of re-afferent muscle activity was controlled for in Experiment 3.

      Did you assess pain intensity after each of the TEP pulses? Please discuss how such a cognitive task may have influenced results

      Pain intensity was assessed after each TMS pulse, as stated:

      “TMS pulses were manually delivered, with a verbal pain rating score (0 = no pain, and 10 = most pain imaginable) obtained between pulses”

      Reviewer 3 also brought up a concern of whether the verbal rating task might have influenced the TEPs. However, it is unlikely that the task contaminated the TEP response as the subsequent TMS pulse was not delivered until the verbal rating was complete and given that each participant was cued by the experimenter to provide the pain rating after each pulse (rather than the participant giving the rating at any time). We have made this clearer where we state:

      “To avoid contamination of TEPs by verbal ratings, the subsequent TMS pulse was not delivered until the verbal rating was complete, and the participant was cued by the experimenter to provide the pain rating after each pulse”

      The QST approach is unusual. Please confirm the sequence of CDT, WDT and HPT were not randomised and that no interval beyond 6sec were used. Proper references are welcome.

      In line with a previous study (Furman et al., 2020), the sequence of the CDT, WDT and HPT was not randomized, and the interval was not more than 6 seconds.

      We have made this clearer:

      “A total of three trials was conducted for each test to obtain an average, with an interstimulus interval of six seconds. The sequence of cold, warmth and pain threshold was the same for all participants (Furman et al. 2020)”

      Performing 60 pulses for TEPs is unusual, and against the minimum number in recommendations

      Please explain and comment. https://doi.org/10.1016/j.brs.2023.02.009

      Please refer to our previous response to this concern in the public reviews.

      Line 578: when you refer to "heat" the reader may confound warm/heat with heat meaning suprathreshold. Please revise the wording.

      We have now replaced the word heat stimulus with thermal stimulus.

      Why were Bayesian statistics used instead of frequentist ones?

      We have made this clearer:

      “Given we were interested in determining the evidence for pain altering TEP peaks in certain conditions (e.g., active TMS) and pain not altering TEP peaks in other conditions (sham TMS), we used a Bayesian approach as opposed to a frequentist approach, which considers the strength of the evidence for the alternative vs. null hypothesis”

      RESULTS

      There is a huge response with high power after 100ms- Please discuss if you believe auditory potentials may have influenced it.

      It is indeed possible that auditory potentials were present at 100ms. We now state:

      “Indeed, the signal at ~100ms post-TMS from Experiment 1 may reflect an auditory N100 response”

      The presence of auditory contamination does not impact the main conclusions of the paper given this was controlled for in Experiment 2.

      Please discuss how pain ranging from 3-10 may have influenced results in the "PAIN" situation.

      It is anticipated that the fixed thermal stimulus intensity approach would lead to large variations in pain ratings (Adamczyk et al., 2022). This is a recommended approach when the aim of the research is to determine relationships between neurophysiological measures and individual differences in pain sensitivity (Adamczyk et al., 2022). Indeed, we were interested in whether alterations in neurophysiological measures were associated with pain intensity, and we found that higher pain ratings were associated with smaller reductions in MEP amplitude and larger increases in N45 amplitude.

      Adamczyk, W. M., Szikszay, T. M., Nahman-Averbuch, H., Skalski, J., Nastaj, J., Gouverneur, P., & Luedtke, K. (2022). To calibrate or not to calibrate? A methodological dilemma in experimental pain research. The Journal of Pain, 23(11), 1823-1832.

      Please indicate if any participants reported pain after warm stimulation (possible given secondary hyperalgesia after so many plateaux of heat stimulation).

      As stated in the results “All participants reported 0/10 pain during the pre-pain and post-pain blocks”.

      Please discuss the potential effects of having around 10% of "bad channels" on average per experiment per participant, and its impact on source localisation and on TEP measurement. Same for >5 epochs excluded per participant.

      The number of bad channels has been incorrectly stated by the reviewer as being 10% on average per experiment per participant, whereas the correct number of reported bad channels was 3%, 4.7% and 9.8% for Experiment 1, 2 and 3 respectively (see supplementary material). These numbers are below the accepted number of bad channels to interpolate (10%) in EEG pipelines (e.g., Debnath et al., 2020; Kayhan et al., 2022), so it is unlikely that our channel exclusions significantly influenced the quality of our source localization and TEP data.

      Debnath, R., Buzzell, G. A., Morales, S., Bowers, M. E., Leach, S. C., & Fox, N. A. (2020). The Maryland analysis of developmental EEG (MADE) pipeline. Psychophysiology, 57(6), e13580.

      Kayhan, E., Matthes, D., Haresign, I. M., Bánki, A., Michel, C., Langeloh, M., ... & Hoehl, S. (2022). DEEP: A dual EEG pipeline for developmental hyperscanning studies. Developmental cognitive neuroscience, 54, 101104.

      The number of excluded epochs is unlikely to have influenced the results given there was evidence for no difference in the number of rejected epochs between conditions (E1 BF10 = 0.145, E2 BF10 = 0.27, E3 BF10 = 0.169 – these BFs have now been reported in the supplementary material), and given the reliability of the N45 was high (see response to previous comment on the number of trials per condition).
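      As an illustration of how the Bayes factors quoted above can be read, the short Python sketch below labels a BF10 value. The cutoff of 3 (and its reciprocal, 1/3) is a commonly used convention and is an assumption here, as the response does not state the thresholds that were applied; the function name is hypothetical.

```python
def interpret_bf10(bf10, cutoff=3.0):
    """Label a Bayes factor (alternative vs. null) using an assumed cutoff."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive")
    if bf10 >= cutoff:
        return "evidence for a difference"
    if bf10 <= 1.0 / cutoff:
        return "evidence for no difference"
    return "inconclusive"


# BF10 values for the number of rejected epochs, as quoted above.
for label, bf10 in (("E1", 0.145), ("E2", 0.27), ("E3", 0.169)):
    print(label, bf10, interpret_bf10(bf10))
```

      Under this assumed convention, all three values fall below 1/3, in keeping with the statement that there was evidence for no difference between conditions.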

      HPT of 42.9 ± 2.5 °C means many participants had HPT close to 46 °C. Please discuss

      While some participants did indeed have pain thresholds close to 46 degrees, they nonetheless reported pain during the test blocks. While such participants may have reported less pain compared to others, we aimed for larger variations in pain ratings, given one of the research questions was to determine why pain intensity differs between individuals (given the same noxious stimulus). Indeed, we showed that this variation was meaningful (pain intensity was related to alterations in N45 and MEP amplitude).

      Please explain the sentence : line 139 "As such, if we had used an alternative design with blocks of warm stimuli intermixed with blocks of painful stimuli, the warm stimuli blocks would not serve as a valid non-painful baseline." I cannot see why.

      Please refer to our previous point on why the fixed sequence was included.

      And on the top of that heat was not individualised according to HPT.

      Please refer to our previous point on why we used a fixed stimulus approach.

      Sequences of warm/heat were not randomised.

      Please refer to our previous point on why the sequence of blocks was not randomized.

      Line 197: "However, as this is the first study investigating the effects of experimental pain on TEPs amplitude, there were no a priori regions or timepoints of interest to compare between conditions". This is not clear. It means you have not measured the activity (size of the N45) under the electrode closest to the TMS coil? The TEP is supposed to be higher under the stimulated target/respective corresponding electrode…

      We are not aware of any current recommendations that state that the region of interest should be based on the site of stimulation. The advantage of TMS-EEG is that it allows characterisation of cortical excitability changes throughout the brain, not just the site of stimulation. We based our region of interest on a cluster-based permutation analysis, as recommended by Frömer, Maier, & Abdel Rahman (2018).

      Frömer, R., Maier, M., & Abdel Rahman, R. (2018). Group-level EEG-processing pipeline for flexible single trial-based analyses including linear mixed models. Frontiers in neuroscience, 12, 48.

      Please explain where N45 values came from.

      The N45 was calculated using the TESA peak function (Rogasch et al., 2017), which identifies a data point that is larger/smaller than +/- 5 data points within a specified time window (e.g., 40-70ms post-TMS as in the present study). Where multiple peaks are found, the amplitude of the largest peak is returned. Where no peak is found, the amplitude at the specified latency is returned.

      Rogasch, N. C., Sullivan, C., Thomson, R. H., Rose, N. S., Bailey, N. W., Fitzgerald, P. B., ... & Hernandez-Pavon, J. C. (2017). Analysing concurrent transcranial magnetic stimulation and electroencephalographic data: A review and introduction to the open-source TESA software. Neuroimage, 147, 934-951.
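      The peak-selection rule described above can be sketched in a few lines of Python. This is an illustrative re-statement of the rule as worded in the response, not the TESA code itself (TESA is a MATLAB toolbox); the function name, its parameters and the synthetic waveform are hypothetical.

```python
import math


def find_peak(times_ms, voltage, window=(40.0, 70.0), target_ms=45.0,
              polarity="negative", n_neighbours=5):
    """Return (latency, amplitude, found) for a peak inside `window`.

    A sample counts as a peak when it is more extreme than the
    `n_neighbours` samples on each side. If several peaks exist, the
    largest is returned; if none exists, the amplitude at the sample
    nearest to `target_ms` is returned with found=False.
    """
    sign = -1.0 if polarity == "negative" else 1.0
    signed = [sign * v for v in voltage]  # peaks become local maxima

    candidates = []
    for i, t in enumerate(times_ms):
        if not (window[0] <= t <= window[1]):
            continue
        lo, hi = i - n_neighbours, i + n_neighbours
        if lo < 0 or hi >= len(signed):
            continue  # not enough neighbours to evaluate this sample
        neighbours = signed[lo:i] + signed[i + 1:hi + 1]
        if all(signed[i] > n for n in neighbours):
            candidates.append(i)

    if candidates:
        best = max(candidates, key=lambda i: signed[i])
        return times_ms[best], voltage[best], True

    nearest = min(range(len(times_ms)),
                  key=lambda i: abs(times_ms[i] - target_ms))
    return times_ms[nearest], voltage[nearest], False


# Synthetic waveform sampled every 1 ms with a negative deflection at 48 ms.
times = list(range(-100, 300))
wave = [-3.0 * math.exp(-((t - 48) / 6.0) ** 2) for t in times]
latency, amplitude, found = find_peak(times, wave)
print(latency, amplitude, found)
```

      The fallback branch matters for interpretation: when no local extremum exists in the window, the returned value is simply the voltage at the nominal latency rather than a true peak.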

      If only the cluster assessment was made please provide the comparison between P45 from the target TMS channel location in pre pain vs pain.

      We assume the reviewer is referring to the N45 rather than P45, and that by “target” TMS channel they are referring to the stimulated region.

      We first clarify that there is no “target” channel given the motor hotspot differs between individuals and so the channel that is closest to the site of stimulation will always differ.

      Secondly, as stated above, we are not aware of any current recommendations in TMS-EEG research that states that the region of interest for TEP analysis should be based on the site of stimulation. The advantage of TMS-EEG is that it allows characterisation of cortical excitability throughout the brain, not just the site of stimulation. If we based our ROI on the target channel only, we would lose valuable information about excitability changes occurring in other brain regions.

      Lastly, the N45 was localized at frontocentral electrodes, which is also where the cluster differences emerged. As such, we do not believe it would be informative to compare N45 peak amplitude at the region of stimulation.

      Also explain how correction for multiple comparisons was made

      Please refer to our response to the public review related to this issue.

      And report data from pain vs post-pain.

      The pain vs. post-pain comparisons are now reported in the Supplementary material.

      There is a strong possibility the response at N85 is an auditory /muscle signal. Please provide the location of this response.

      We have opted not to include the topography at 85ms in the main paper as it would introduce too much clutter into the figures (which are already very dense), and because the topography was very similar to the topography at 100ms. As an example, for the reviewer, in Author response image 1 we have shown the topography for the pre-pain condition of Experiment 1.

      Author response image 1.

      Experiment 2: I have a strong impression both active TEPs and sham TEPs were contaminated by auditory (and muscle) noise. Please explain.

      While it is possible that auditory noise may have influenced TEPs in the active and sham groups, it does not impact the main conclusions of the paper, given that the purpose of the sham condition was to control for auditory and somatosensory stimulation resulting from TMS.

      While muscle activity may also have influenced the TEPs in active and sham conditions, we used fastICA in all conditions to suppress muscle activity. The fastICA algorithm (Rogasch et al., 2017) runs an independent component analysis on the data, and classifies components as neural, TMS-evoked muscle, eye movements and electrode noise, based on a set of heuristic thresholding rules (e.g., amplitude, frequency and topography of the components). Components classified as TMS-evoked muscle/other muscle artefacts are then removed. In the supplementary material, we further report that the number of components removed did not differ between conditions, suggesting the impact of muscle artefacts is not larger in some conditions vs. others.

      Rogasch, N. C., Sullivan, C., Thomson, R. H., Rose, N. S., Bailey, N. W., Fitzgerald, P. B., ... & Hernandez-Pavon, J. C. (2017). Analysing concurrent transcranial magnetic stimulation and electroencephalographic data: A review and introduction to the open-source TESA software. Neuroimage, 147, 934-951.

      Experiment 3: One interpretation can be that both supra and sub-threshold TMS were leading to somatosensory re-afferent responses, based on the way RMT was calculated, which hyper estimate the RMT and delivers in reality 2 types of supra-threshold stimulations. Please discuss

      Please refer to our response to the public review related to this issue.

      Please provide correlation between N45 size and MEPs amplitudes.

      This has now been included:

      “There was no conclusive evidence of any relationship between alterations in MEP amplitude during pain, and alterations in N100, N45 and P60 amplitude during pain (see supplementary material).”

      The supporting statistics for these analyses have been included in the supplementary material.

      DISCUSSION

      Line 303: " The present study determined whether acute experimental pain induces alterations in cortical inhibitory and/or facilitatory activity observed in TMS-evoked potentials".

      Well, no. The study assessed the N45, and was based on it. It did not really explore other metrics in a systematic fashion. P60 and N100 changes were not replicated in experiments 2 and 3.

      We assume the reviewer is stating that we did not assess other TEP peaks (such as the N15, P30 and P180). However, we did indeed assess these peaks in a systematic fashion. First, we identified the ROI by using a cluster-based analysis. This is a recommended approach when the ROI is unclear (Frömer, Maier, & Abdel Rahman, 2018). We then analysed the TEP representing the mean voltage across the electrodes within the cluster, and then identified any differences in all peaks between conditions (not just the N45). This has been made clearer in the manuscript.

      This has now been included:

      “For all experiments, the mean TEP waveform of any identified clusters from Experiment 1 were plotted, and peaks (e.g., N15, P30, N45, P60, N100) were identified using the TESA peak function (Rogasch et al., 2017)”

      Frömer, R., Maier, M., & Abdel Rahman, R. (2018). Group-level EEG-processing pipeline for flexible single trial-based analyses including linear mixed models. Frontiers in neuroscience, 12, 48.

      And the N45 is not related to facilitatory or inhibitory activity, it is a measure of an evoked response indicating excitability

      Evidence suggests the N45 is mediated by GABAAergic neurotransmission (inhibitory activity), as drugs which increase GABAA receptor activity increase the amplitude of the N45 (Premoli et al., 2014) and drugs which decrease GABAA receptor activity decrease the amplitude of the N45 (Darmani et al., 2016). As such, we and various other empirical papers (e.g., Belardinelli et al., 2021; Noda et al., 2021; Opie et al., 2019) and review papers (Farzan & Bortoletto, 2022; Tremblay et al., 2019) have interpreted changes in the N45 peak as reflecting changes in cortical inhibitory/GABAA mediated activity.

      Premoli, I., Castellanos, N., Rivolta, D., Belardinelli, P., Bajo, R., Zipser, C., ... & Ziemann, U. (2014). TMS-EEG signatures of GABAergic neurotransmission in the human cortex. Journal of Neuroscience, 34(16), 5603-5612.

      Belardinelli, P., König, F., Liang, C., Premoli, I., Desideri, D., Müller-Dahlhaus, F., ... & Ziemann, U. (2021). TMS-EEG signatures of glutamatergic neurotransmission in human cortex. Scientific reports, 11(1), 8159.

      Darmani, G., Zipser, C. M., Böhmer, G. M., Deschet, K., Müller-Dahlhaus, F., Belardinelli, P., ... & Ziemann, U. (2016). Effects of the selective α5-GABAAR antagonist S44819 on excitability in the human brain: a TMS–EMG and TMS–EEG phase I study. Journal of Neuroscience, 36(49), 12312-12320.

      Noda, Y., Barr, M. S., Zomorrodi, R., Cash, R. F., Lioumis, P., Chen, R., ... & Blumberger, D. M. (2021). Single-pulse transcranial magnetic stimulation-evoked potential amplitudes and latencies in the motor and dorsolateral prefrontal cortex among young, older healthy participants, and schizophrenia patients. Journal of Personalized Medicine, 11(1), 54.

      Farzan, F., & Bortoletto, M. (2022). Identification and verification of a 'true' TMS evoked potential in TMS-EEG. Journal of neuroscience methods, 378, 109651.

      Opie, G. M., Foo, N., Killington, M., Ridding, M. C., & Semmler, J. G. (2019). Transcranial magnetic stimulation-electroencephalography measures of cortical neuroplasticity are altered after mild traumatic brain injury. Journal of Neurotrauma, 36(19), 2774-2784.

      Tremblay, S., Rogasch, N. C., Premoli, I., Blumberger, D. M., Casarotto, S., Chen, R., ... & Daskalakis, Z. J. (2019). Clinical utility and prospective of TMS–EEG. Clinical Neurophysiology, 130(5), 802-844.

      Line 321: why have you not measured SEPs in experiment 3?

      It is not possible to directly measure the somatosensory evoked potentials resulting from a TMS pulse, given that the TMS pulse produces a range of signals including cortical activity, muscle/eye blink responses, auditory responses, somatosensory responses and other artefacts. While some researchers attempt to isolate the SEP from TMS using pre-processing methods such as ICA, others use control conditions such as sensory sham conditions (to control for the “tapping” artefact) or subthreshold intensity conditions (to control for reafferent muscle activity), as we have done in Experiment 2 and 3 of our study.

      We have now stated this in the manuscript:

      “As it is extremely challenging to isolate and filter these auditory and somatosensory evoked potentials using pre-processing pipelines, masking methods have been used to suppress these sensory inputs (Ilmoniemi and Kičić, 2010; Massimini et al., 2005). However, recent studies have shown that even when these methods are used, sensory contamination of TEPs is still present, as shown by commonalities in the signal between active and sensory sham conditions that mimic the auditory/somatosensory aspects of real TMS (Biabani et al., 2019; Conde et al., 2019; Rocchi et al., 2021). This has led many leading authors (Biabani et al., 2019; Conde et al., 2019) to recommend the use of sham conditions to control for sensory contamination”

      Line 365: SICI is dependent on GABAa activity. But the way the text is written, it conveys the idea that TMS pulses "activate" GABA receptors, which is weird... Please rephrase.

      This has now been reworded.

      “SICI refers to the reduction in MEP amplitude to a TMS pulse that is preceded 1-5ms by a subthreshold pulse, with this reduction believed to be mediated by GABAA neurotransmission (Chowdhury et al., 2022)”

      Reviewer #3 (Recommendations For The Authors):

      -Key references Ye et al., 2022 and Che et al., 2019 need to be included in the reference list.

      These references have now been included in the reference list.

      -Heat pain stimuli and TMS stimuli are applied simultaneously. Sometimes the term "stimulus" is used without specifying whether it refers to TMS pulses or heat pain stimuli. Clarifying this whenever the word "stimulus" is used would enhance clarity for the reader.

      We have now clarified the use of the word “stimulus” throughout the paper.

      -Panels A-D in Figure 6 should be correctly labeled in the text and the figure legend.

      Figure 6 Panel labels have now been amended.

    1. Author Response

      We thank the reviewers and the editorial team for their assessment and valuable feedback on our manuscript. Their supporting comments reinforce the significance of our findings.

      Regarding the specific point raised about the partial effects observed in the TGN46 KO cell line, we acknowledge the importance of addressing this issue in more detail in the revised version of our manuscript. The partial effects observed when using the TGN46 KO cell line are likely caused by several factors:

      1) It is important to consider the phenomenon of cell adaptation/compensation, which is documented to occur in gene knockout cell lines. Cells often respond to genetic perturbations by adapting to compensate for the loss of a specific gene. These compensatory effects could potentially mitigate the full impact of TGN46 depletion and might explain the partial effects observed.

      2) Our data indicate that the absence of TGN46 reduces PAUF secretion, but does not completely block its export. These results align with our proposed role of TGN46 in cargo sorting. In its absence, the secretory proteins likely exit the TGN via alternative routes/mechanisms, such as "bulk flow" or by entering other transport carriers in an uncontrolled manner. The partial redistribution of the TGN46-∆lum mutant into VSVG carriers (Figure 4D) supports this likelihood. Importantly, similar situations are observed when unrelated sorting factors are depleted from the Golgi membranes. For example, when the cofilin/SPCA1/Cab45 sorting pathway is genetically disrupted, the secretion of this pathway's clients is inhibited but not completely halted (e.g., von Blume et al. Dev. Cell 2011; J. Cell Biol. 2012).

      3) As suggested by the reviewers, it remains possible that TGN46 is not the sole player for cargo sorting. The existence of redundant or alternative mechanisms cannot be ruled out.

      In our revised manuscript, we will provide a more in-depth discussion of these factors and their potential contributions to the observed partial effects in TGN46 KO cells. We believe that a comprehensive exploration of these possibilities will improve our understanding of the role(s) of TGN46 in cargo sorting and TGN export.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This work provides a new dataset of 71,688 images of different ape species across a variety of environmental and behavioral conditions, along with pose annotations per image. The authors demonstrate the value of their dataset by training pose estimation networks (HRNet-W48) on both their own dataset and other primate datasets (OpenMonkeyPose for monkeys, COCO for humans), ultimately showing that the model trained on their dataset had the best performance (performance measured by PCK and AUC). In addition to their ablation studies where they train pose estimation models with either specific species removed or a certain percentage of the images removed, they provide solid evidence that their large, specialized dataset is uniquely positioned to aid in the task of pose estimation for ape species.
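      For readers unfamiliar with the metrics named above, the following Python sketch shows one common way of computing PCK (percentage of correct keypoints) and an AUC summary over thresholds. It is a sketch under stated assumptions, not the evaluation code used for the dataset: normalising by bounding-box size, the threshold values, the function names and the example coordinates are all hypothetical.

```python
import math


def pck(pred, truth, bbox_size, threshold=0.2):
    """Fraction of keypoints predicted within threshold * bbox_size pixels."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and equally long")
    limit = threshold * bbox_size
    hits = sum(
        1
        for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= limit
    )
    return hits / len(truth)


def pck_auc(pred, truth, bbox_size, thresholds):
    """Mean PCK over evenly spaced thresholds, which approximates the
    normalised area under the PCK-versus-threshold curve."""
    return sum(pck(pred, truth, bbox_size, t) for t in thresholds) / len(thresholds)


# Hypothetical ground-truth and predicted keypoints (pixels) for one animal.
truth = [(10, 10), (50, 50), (90, 20), (30, 80)]
pred = [(12, 11), (50, 58), (65, 20), (30, 81)]

score = pck(pred, truth, bbox_size=100, threshold=0.1)
auc = pck_auc(pred, truth, bbox_size=100, thresholds=[0.1, 0.2, 0.3])
print(score, auc)
```

      A single PCK value depends strongly on the chosen threshold, which is why a summary across thresholds is often reported alongside it.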

      The diversity and size of the dataset make it particularly useful, as it covers a wide range of ape species and poses, making it particularly suitable for training off-the-shelf pose estimation networks or for contributing to the training of a large foundational pose estimation model. In conjunction with new tools focused on extracting behavioral dynamics from pose, this dataset can be especially useful in understanding the basis of ape behaviors using pose.

      We thank the reviewer for the kind comments.

      Since the dataset provided is the first large, public dataset of its kind exclusively for ape species, more details should be provided on how the data were annotated, as well as summaries of the dataset statistics. In addition, the authors should provide the full list of hyperparameters for each model that was used for evaluation (e.g., mmpose config files, textual descriptions of augmentation/optimization parameters).

      We have added more details on the annotation process and have included the list of instructions sent to the annotators. We have also included mmpose configs with the code provided. The following files include the relevant details:

      File including the list of instructions sent to the annotators: OpenMonkeyWild Photograph Rubric.pdf

      Mmpose configs:

      i) TopDownOAPDataset.py

      ii) animal_oap_dataset.py

      iii) __init__.py

      iv) hrnet_w48_oap_256x192_full.py

      Anaconda environment files:

      i) OpenApePose.yml

      ii) requirements.txt

      Overall this work is a terrific contribution to the field and is likely to have a significant impact on both computer vision and animal behavior.

      Strengths:

      • Open source dataset with excellent annotations on the format, as well as example code provided for working with it.

      • Properties of the dataset are mostly well described.

      • Comparison to pose estimation models trained on humans vs monkeys, finding that models trained on human data generalized better to apes than the ones trained on monkeys, in accordance with phylogenetic similarity. This provides evidence for an important consideration in the field: how well can we expect pose estimation models to generalize to new species when using data from closely or distantly related ones?

      • Sample efficiency experiments reflect an important property of pose estimation systems, which indicates how much data would be necessary to generate similar datasets in other species, as well as how much data may be required for fine-tuning these types of models (also characterized via ablation experiments where some species are left out).

      • The sample efficiency experiments also reveal important insights about scaling properties of different model architectures, finding that HRNet saturates in performance improvements as a function of dataset size sooner than other architectures like CPMs (even though HRNets still perform better overall).

      We thank the reviewer for the kind comments.

      Weaknesses:

      • More details on training hyperparameters used (preferably full config if trained via mmpose).

      We have now included mmpose configs and anaconda environment files that allow researchers to use the dataset with specific versions of mmpose and other packages we trained our models with. The list of files is provided above.

      • Should include dataset datasheet, as described in Gebru et al 2021 (arXiv:1803.09010).

      We have included a datasheet for our dataset in the appendix lines 621-764.

      • Should include crowdsourced annotation datasheet, as described in Diaz et al 2022 (arXiv:2206.08931). Alternatively, the specific instructions that were provided to Hive/annotators would be highly relevant to convey what annotation protocols were employed here.

      We have included the list of instructions sent to the Hive annotators in the supplementary materials. File: OpenMonkeyWild Photograph Rubric.pdf

      • Should include model cards, as described in Mitchell et al (arXiv:1810.03993).

      We have included a model card for the included model in the results section line 359. See Author response image 1.

      Author response image 1.

      • It would be useful to include more information on the source of the data as they are collected from many different sites and from many different individuals, some of which may introduce structural biases such as lighting conditions due to geography and time of year.

      We agree that the source could introduce structural biases. This is why we included images from so many different sources and captured images at different times from the same source—in hopes that a large variety of background and lighting conditions are represented. However, doing so limits our ability to document each source background and lighting condition separately.

      • Is there a reason not to use OKS? This incorporates several factors such as landmark visibility, scale, and landmark type-specific annotation variability as in Ronchi & Perona 2017 (arXiv:1707.05388). The latter (variability) could use the human pose values (for landmarks types that are shared), the least variable keypoint class in humans (eyes) as a conservative estimate of accuracy, or leverage a unique aspect of this work (crowdsourced annotations) which affords the ability to estimate these values empirically.

      The focus of this work is on overall keypoint localization accuracy, so we wanted a metric that is easy to interpret and implement. We therefore used PCK (Percentage of Correct Keypoints), a simple and widely used metric that measures the percentage of keypoints localized within a given distance threshold of their corresponding ground-truth keypoints.
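      As an illustration of the metric described above, the following is a minimal sketch of PCK for a single image. The normalization by a bounding-box size, the default threshold of 0.2, and the function name `pck` are assumptions made for this example; they are not taken from the manuscript or from mmpose.

```python
import math


def pck(predicted, ground_truth, bbox_size, threshold=0.2):
    """Fraction of annotated keypoints localized within threshold * bbox_size.

    predicted, ground_truth: sequences of (x, y) tuples in pixels.
    A ground-truth entry of None marks a landmark without annotation
    and is excluded from the score (an assumption of this sketch).
    """
    correct = 0
    counted = 0
    for pred, truth in zip(predicted, ground_truth):
        if truth is None:
            continue  # skip landmarks that were not annotated
        counted += 1
        if math.dist(pred, truth) <= threshold * bbox_size:
            correct += 1
    return correct / counted if counted else float("nan")
```

      With a 100-pixel bounding box and a threshold of 0.2, a keypoint counts as correct when it lies within 20 pixels of its annotation.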

      • A reporting of the scales present in the dataset would be useful (e.g., histogram of unnormalized bounding boxes) and would align well with existing pose dataset papers such as MS-COCO (arXiv:1405.0312) which reports the distribution of instance sizes and instance density per image.

      We have now included a histogram of unnormalized bounding boxes in the manuscript. See Author response image 2.

      Author response image 2.

      Reviewer #2 (Public Review):

      The authors present the OpenApePose database constituting a collection of over 70000 ape images which will be important for many applications within primatology and the behavioural sciences. The authors have also rigorously tested the utility of this database in comparison to available Pose image databases for monkeys and humans to clearly demonstrate its solid potential.

      We thank the reviewer for the kind comments.

      However, the variation in the database with regards to individuals, background, source/setting is not clearly articulated and would be beneficial information for those wishing to make use of this resource in the future. At present, there is also a lack of clarity as to how this image database can be extrapolated to aid video data analyses which would be highly beneficial as well.

      I have two major concerns with regard to the manuscript as it currently stands which I think if addressed would aid the clarity and utility of this database for readers.

      1) Human annotators are mentioned as doing the 16 landmarks manually for all images but there is no assessment of inter-observer reliability or the such. I think something to this end is currently missing, along with how many annotators there were. This will be essential for others to know who may want to use this database in the future.

      We thank the reviewer for pointing this out. Inter-observer reliability is important for ensuring the quality of the annotations. We first used Amazon MTurk to crowdsource annotations and found that the inter-observer reliability and the annotation quality were poor. This was the reason for choosing a commercial service such as Hive AI. As the crowdsourcing and quality control are managed by Hive through their internal procedures, we do not have access to data that would allow us to assess inter-observer reliability. However, the annotation quality was assessed by first author ND through manual inspection of the annotations visualized on all of the images in the database. Additionally, our ablation experiments, with their high out-of-sample performance, further validate the quality of the annotations.

      Relevant to this comment, in your description of the database, a table or such could be included, providing the number of images from each source/setting per species and/or number of individuals. Something to give a brief overview of the variation beyond species. (subspecies would also be of benefit for example).

      Our goal was to obtain as many images as possible from the most commonly studied ape species. In order to ensure a large enough database, we focused only on the species and combined images from as many sources as possible to reach our goal of ~10,000 images per species. With the wide range of people involved in obtaining the images, we could not ensure that all the photographers had the necessary expertise to differentiate individuals and subspecies of the subjects they were photographing. We could only ensure that the right species was being photographed. Hence, we cannot include more detailed information.

      2) You mention around line 195 that you used a specific function for splitting up the dataset into training, validation, and test but there is no information given as to whether this was simply random or if an attempt to balance across species, individuals, background/source was made. I would actually think that a balanced approach would be more appropriate/useful here so whether or not this was done, and the reasoning behind that must be justified.

      This is especially relevant given that in one test you report balancing across species (for the sample size subsampling procedure).

      We created the training set to reflect the species composition of the whole dataset, but used test sets balanced by species. This was done to give a sense of the performance of a model trained on the entire dataset, in which the species are not fully balanced. We believe that researchers interested in training models using this dataset for behavior tracking applications would use the entire dataset to fully leverage the variation in the dataset. However, for those interested in training models with balanced species, we provide an annotation file with all the images included, which would allow researchers to create their own training and test sets that meet their specific needs. We have added this justification to the manuscript to guide users with different needs. Lines 530-534: “We did not balance our training set for the species as we wanted to utilize the full variation in the dataset and assess models trained with the proportion of species as reflected in the dataset. We provide annotations including the entire dataset to allow others to create their own training/validation/test sets that suit their needs.”
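      For readers who want to build their own splits from the full annotation file, the following is a minimal sketch of one way to obtain a species-balanced test set and an unbalanced training set. The record schema (a dict with a `species` key), the function name `split_by_species`, and the fixed test size per species are hypothetical and are not the function used in the manuscript. Because the same number of images is held out from every species, the training remainder only approximately retains the dataset's species proportions.

```python
import random
from collections import defaultdict


def split_by_species(annotations, test_per_species, seed=0):
    """Hold out the same number of images per species for testing.

    annotations: list of dicts, each with a 'species' key (assumed schema).
    Returns (train, test); test is balanced across species, while train
    keeps whatever remains of each species.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_species = defaultdict(list)
    for ann in annotations:
        by_species[ann["species"]].append(ann)

    train, test = [], []
    for species in sorted(by_species):
        items = list(by_species[species])
        if len(items) <= test_per_species:
            raise ValueError(f"not enough images for species {species!r}")
        rng.shuffle(items)
        test.extend(items[:test_per_species])
        train.extend(items[test_per_species:])
    return train, test
```

      A validation set can be carved out of the training remainder in the same way.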

      And another perhaps major concern that I think should also be addressed somewhere is the fact that this is an image database tested on images while the abstract and manuscript mention the importance of pose estimation for video datasets, yet the current manuscript does not provide any clear test of video datasets nor engage with the practicalities associated with using this image-based database for applications to video datasets. Somewhere this needs to be added to clarify its practical utility.

      We thank the reviewer for this important suggestion. Since we can separate a video into its constituent frames, one can indeed use the provided model or other models trained using this dataset for inference on the frames, thus allowing video tracking applications. We now include a short video clip of a chimpanzee with inferences from the provided model visualized in the supplementary materials.

      Reviewer #1 (Recommendations For The Authors):

      • Please provide a more thorough description of the annotation procedure (i.e., the instructions given to crowd workers)! See public review for reference on dataset annotation reporting cards.

      We have included the list of instructions for Hive annotators in the supplementary materials.

      • An estimate of the crowd worker accuracy and variability would be super valuable!

      While we agree that this is useful, we do not have access to Hive internal data on crowd worker IDs that could allow us to estimate these metrics. Furthermore, we assessed each image manually to ensure good annotation quality.

      • In the methods section it is reported that images were discarded because they were either too blurry, small, or highly occluded. Further quantification could be provided. How many images were discarded per species?

      It’s not really clear to us why this is interesting or important. We used a large number of photographers and annotators, some of whom gave a high ratio of great images; some of whom gave a poor ratio. But it’s not clear what those ratios tell us.

      • Placing the numerical values at the end of the bars would make the graphs more readable in Figures 4 and 5.

      We thank the reviewer for this suggestion. While we agree that this can help, we do not have space to include the number in a font size that would be readable. Smaller font sizes that are likely to fit may not be readable for all readers. We have included the numerical values in the main text in the results section for those interested and hope that the figures provide a qualitative sense of the results to the readers.

    1. Author Response

      eLife assessment

      Building on their own prior work, the authors present valuable findings that add to our understanding of cortical astrocytes, which respond to synaptic activity with calcium release in subcellular domains that can proceed to larger calcium waves. The proposed concept of a spatial "threshold" is based on solid evidence from in vivo and ex vivo imaging data and the use of mutant mice. However, details of the specific threshold should be taken with caution and appear incomplete unless supported by additional experiments with higher resolution in space and time.

      We thank the reviewers and editors for the positive assessment of our work as containing valuable findings that add to our understanding of cortical astrocytes. We also appreciate their positive appraisal of the proposed concept of a spatial threshold supported by solid evidence.

      Regarding their specific comments, we truly appreciate them because they have helped to clarify issues and to improve the study. Provisional point-by-point responses to these comments are provided below. Regarding the general comment on the spatial and temporal resolution of our study, we would like to clarify that the resolution used in the current study (i.e., 2 - 5 Hz framerate using a 25x objective with 1.7x digital zoom with pixels on the order of 1 µm²) is within the norm in the field and neither compromises the results nor diminishes the main conceptual advancement of the study, namely the existence of a spatial threshold for astrocyte calcium surge.

      We respect the thoughtfulness of the reviewers and editors and look forward to improving the paper to fully answer both public and private comments with a revised manuscript.

      Reviewer #1 (Public Review):

      Lines et al., provide evidence for a sequence of events in vivo in adult anesthetized mice that begin with a footshock driving activation of neural projections into layer 2/3 somatosensory cortex, which in turn triggers a rise in calcium in astrocytes within "domains" of their "arbor". The authors segment the astrocyte morphology based on SR101 signal and show that the timing of "arbor" Ca2+ activation precedes somatic activation and that somatic activation only occurs if ≥22.6% of the total segmented astrocyte "arbor" area is active. Thus, the authors frame this ≥22.6% activation as a spatial property (spatial threshold) with certain temporal characteristics - i.e., must occur before soma and global activation. The authors then elaborate on this spatial threshold by providing evidence for its intrinsic nature - is not set by the level of neuronal stimulus and is dependent on whether IP3R2, which drives Ca2+ release from the endoplasmic reticulum (ER) in astrocytes, is expressed. Lastly, the authors suggest a potential physiologic role for this spatial threshold by showing ex vivo how exogenous activation of layer 2/3 astrocytes by ATP application can gate glutamate gliotransmission to layer 2/3 cortical neurons - with a strong correlation between the number of active astrocyte Ca2+ domains and the slow inward current (SIC) frequency recorded from nearby neurons as a readout of glutamatergic gliotransmission. This is interesting and would potentially be of great interest to readers within and outside the glia research community, especially in how the authors have tried to systematically deconstruct some of the steps underlying signal integration and propagation in astrocytes. Many of the conclusions posited by the authors are potentially important but we think their approach needs experimental/analytical refinement and elaboration.

      We thank the reviewer for her/his positive appraisal and comments, which have helped us to improve the study. In response to their insights, we aim to address the key points raised below:

      1. Sequence of Events: We acknowledge the reviewer's interest in our findings regarding the sequence of events. We will provide a more detailed description of the methods and results to clarify the temporal relationships between neural activation, astrocyte calcium dynamics, and astrocyte morphology segmentation.

      2. Spatial Threshold: The reviewer accurately identifies our characterization of a spatial threshold (≥22.6% activation) with temporal characteristics as a crucial aspect of our study. We will expand upon this concept by offering a clearer illustration of how this threshold relates to somatic and global activation.

      3. Intrinsic Nature of Spatial Threshold: The reviewer's observation that the spatial threshold is an intrinsic property, not set by the level of neuronal stimulus, is noteworthy. We will provide additional details to substantiate this claim, shedding more light on the fundamental nature of this phenomenon.

      4. Physiological Implications: The reviewer rightly highlights the potential physiological significance of our findings, particularly in relation to gliotransmission in cortical neurons. We will enhance our discussion by elaborating on the implications of these observations.
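      To make the spatial threshold concept concrete, the following is a minimal sketch that computes the percentage of segmented arbor area that is active and compares it with the reported 22.6% value. The per-domain area representation and the function names are assumptions of this example, not the authors' analysis code.

```python
SPATIAL_THRESHOLD_PERCENT = 22.6  # value reported in the manuscript


def percent_arbor_active(domain_areas, domain_active):
    """Percentage of total segmented arbor area covered by active domains.

    domain_areas: area of each arbor domain (any consistent unit).
    domain_active: one boolean per domain, True if the domain is active.
    """
    total = sum(domain_areas)
    if total <= 0:
        raise ValueError("total arbor area must be positive")
    active = sum(area for area, on in zip(domain_areas, domain_active) if on)
    return 100.0 * active / total


def crosses_spatial_threshold(domain_areas, domain_active,
                              threshold=SPATIAL_THRESHOLD_PERCENT):
    """True if the active arbor fraction reaches the threshold."""
    return percent_arbor_active(domain_areas, domain_active) >= threshold
```

      Under the proposed concept, somatic activation would be expected only for frames in which `crosses_spatial_threshold` returns True.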

      The primary issue for us, and which we would encourage the authors to address, relates to the low spatial-temporal resolution of their approach. This issue does not necessarily compromise the concept of a spatial threshold, but more refined observations and analyses are likely to provide more reliable quantitative parameters and a more comprehensive view of the mode of Ca2+ signal integration in astrocytes.

      We agree with the reviewer that our spatial-temporal resolution (2 – 5 Hz framerate using a 25x objective and 1.7x digital zoom with pixels on the order of 1 µm) does not compromise the proposed concept of the existence of a spatial threshold for the intracellular calcium expansion.

      For this reason, and because their observations might be perceived as both a conceptual and numerical standard in the field, we believe that the authors should proceed with both experimental and analytical refinement. Notably, we have difficulty with the reported mean delays of astrocyte Ca2+ elevations upon sensory stimulation. The 11s delay for response onset in "arbor" and 13s in the soma are extremely long, and we do not think they represent a true physiologic latency for astrocyte responses to the sensory activity. Indeed, such delays appear to be slower even than those reported in the initial studies of sensory stimulation in anesthetized mice with limited spatial-temporal resolution (Wang et al. Nat Neurosci., 2006) - not to speak of more recent and refined ones in awake mice (Stobart et al. Neuron, 2018) that identified even sub-second astrocyte Ca2+ responses, largely preserved in IP3R2KO mice. Thus, we are inclined to believe that the slowness of responses reported here is an indicator of experimental/analytical issues. There can be several explanations of such slowness that the authors may want to consider for improving their approach: (a) The authors apparently use low zoom imaging for acquiring signals from several astrocytes present in the FOV: do all of these astrocytes respond homogeneously in terms of delay from sensory stimulus? Perhaps some are faster responders than others and only this population is directly activated by the stimulus. Others could be slower in activation because they respond secondarily to stimuli. In this case, the authors could focus their analysis specifically on the "fast-responding population". (b) By focusing on individual astrocytes and using higher zoom, the authors could unmask more subtle Ca2+ elevations that precede those reported in the current manuscript. These signals have been reported to occur mainly in regions of the astrocyte that are GCaMP6-positive but SR101-negative and constitute a large percentage of its volume (Bindocci et al., 2017). By restricting analysis to the SR101-positive part of the astrocyte, the authors might miss the fastest components of the astrocyte Ca2+ response likely representing the primary signals triggered by synaptic activity. It would be important if they could identify such signals in their records, and establish if none/few/many of them propagate to the SR101-positive part of the astrocyte. In other words, it would be important to establish whether there is only a single spatial threshold, the one the authors reported, or two or more of them along the path of signal propagation towards the cell soma that leads eventually to the transformation of the signal into a global astrocyte Ca2+ surge.

      We thank the reviewer for these excellent and important comments. The long mean delays of astrocyte activation are indeed a result of averaging astrocyte responses to a 20-second stimulus. Astrocyte responses are heterogeneous, and many astrocytes respond much more quickly, as can be seen in the example traces in Figs. 1D, 1G, and 3C. Variability exists in any biological system; here, however, we use the averaged responses to identify a general property of astrocyte calcium dynamics: the existence of a spatial threshold for astrocyte calcium surge.

      Further, we used a lower stimulus frequency (2 Hz) than Stobart et al. (90 Hz) to assess subthreshold activities. We found that stronger stimuli decreased response delays and will include this result in the revised manuscript. Interestingly, as shown in Fig. 4F, stronger stimuli did not significantly alter the spatial threshold. In the revised version of the manuscript, we will provide a more detailed analysis and discuss its implications.

      In this context, there is another concept that we encourage the authors to better clarify: whether the spatial threshold that they describe is constituted by the enlargement of a continuous wavefront of Ca2+ elevation, e.g. in a single process, that eventually reaches 22.6% of the segmented astrocyte, or can it also be constituted by several distinct Ca2+ elevations occurring in separate domains of the arbor, but overall totaling 22.6% of the segmented surface? Mechanistically, the latter would suggest the presence of a general excitability threshold of the astrocyte, whereas the former would identify a driving force threshold for the centripetal wavefront. In light of the above points, we think the authors should use caution in presenting and interpreting the experiments in which they use SIC as a readout. Their results might lead some readers to bluntly interpret the 22.6% spatial threshold as the threshold required for the astrocyte to evoke gliotransmitter release. Indeed, SIC are robust signals recorded somatically from a single neuron and likely integrate activation of many synapses all belonging to that neuron. On the other hand, an astrocyte impinges on a myriad of synapses belonging to several distinct neurons. In our opinion, it is quite possible that more local gliotransmission occurs at lower Ca2+ signal thresholds (see above) that may not be efficiently detected by using SIC as a readout; a more sensitive approach, such as the use of a gliotransmitter sensor expressed all along the astrocyte plasma-membrane could be tested to this aim.

      The reviewer raised an excellent point. Whether the spatial threshold of 22.6% is reached through a single continuous wavefront or through several distinct calcium elevations occurring in separate domains of the arbor is an important question, and we aim to address it with a novel analysis that will be provided in the revised version of the manuscript.
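      One way to approach the distinction raised by the reviewer is to count spatially separate groups of active pixels within the segmented arbor. The following is a minimal sketch using 4-connected component labeling on a binary activity mask; the mask representation and the function name `active_component_sizes` are assumptions of this example and do not describe the analysis planned for the revision.

```python
from collections import deque


def active_component_sizes(mask):
    """Sizes (in pixels) of 4-connected groups of active pixels.

    mask: 2D list of truthy/falsy values, one entry per arbor pixel.
    A single large component suggests one continuous wavefront;
    several components suggest distinct elevations in separate domains.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited active pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes
```

      Dividing the summed sizes by the number of arbor pixels gives the total active fraction, so the same mask can be checked both for the number of separate elevations and for whether together they reach the threshold.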

      Regarding comments on SIC, we fully agree with the reviewer. In the revised version of the manuscript, we will include text in the discussion to ensure the correct interpretation of the results, i.e., the observed 22.6% spatial threshold for the SIC does not necessarily indicate an intrinsic property of gliotransmitter release; rather, since SICs have been shown to be calcium-dependent, it is not surprising that their presence, monitored at the whole-cell soma, matches the threshold for the intracellular calcium extension.

      Additional considerations are that the authors propose an event sequence as follows: stimulus - synaptic drive to L2/3 - arbor activation - spatial threshold - soma activation - post soma activation - gliotransmission. This seems reminiscent of the sequence underlying neuronal spike propagation - from dendrite to soma to axon, and the resulting vesicular release. However, there is no consensus within the glial field about an analogous framework for astrocytes. Thus, "arbor activation", "soma activation", and "post soma activation" are not established "terms-of-art". Similarly, the way the authors use the term "domain" contrasts with how others have (Agarwal et al., 2017; Shigetomi et al., 2013; Di Castro et al., 2011; Grosche et al., 1999) and may produce some confusion. The authors could adopt a more flexible nomenclature or clarify that their terms do not have a defined structural-functional basis, being just constructs that they justifiably adapted to deal with the spatial complexity of astrocytes in line with their past studies (Lines et al., 2020; Lines et al., 2021).

      We agree there is no consensus within the glial field about this event sequence. One major difference between this sequence of events and neuronal spike propagation is directionality from dendrite to soma to axon. It is unknown whether directionality of the calcium signal exists in astrocytes. The term “microdomain” is used in the references above to define distal subcellular domains in contact with synapses, and in order to dissociate from this term we adopt the nomenclature “domain” to define all subcellular domains in the astrocyte arborization. These items will be discussed and clarified in the revised version of the manuscript.

      Our previous points suggest that the paper would be significantly strengthened by new experimental observations focusing on single astrocytes and using acquisitions at higher spatial and temporal resolution. If the authors will not pursue this option, we encourage them to at least improve their analysis, and at the same time recognize in the text some limitations of their experimental approach as discussed above. We indicate here several levels of possible analytical refinement.

      We believe our spatial (25x objective and 1.7x digital zoom with pixels on the order of 1µm) and temporal (2 – 5 Hz framerate) resolution is within the range used in the glial field. In any case, the existence of a spatial threshold for astrocyte calcium surge is not compromised by the use of this imaging resolution.

      The first relates to the selection of astrocytes being analyzed, and the need to focus on a much narrower subpopulation than (for example) 987 astrocytes used for the core data. This selection would take into greater consideration the aspects of structure and latency. With the structural and latency-based criteria for selection, the number of astrocytes to analyze might be reduced by 10-fold or more, making our second analytical recommendation much more feasible.

      We agree that individual differences exist; however, establishing a general concept requires the sampling of many astrocytes. Nevertheless, we aim to further address this issue in the revised version of the manuscript by analyzing the calcium dynamics in individual domains.

      For structure-based selection - Genetically-encoded Ca2+ indicators such as GCaMP6 are in principle expressed throughout an astrocyte, even in regions that are not labelled by SR101. Moreover, astrocytes form independent 3D territories, so one can safely assume that the GCaMP6 signal within an astrocyte volume belongs to that specific astrocyte (this is particularly evident if the neighboring astrocytes are GCaMP6-negative). Therefore, authors could extend their analysis of Ca2+ signals in individual astrocytes to the regions that are SR101-negative and try to better integrate fast signals in their spatial threshold concept. Even if they decided to be conservative in their methods, and stick to the astrocyte segmentation based on the SR101 signal, they should acknowledge that SR101 dye staining quality can vary considerably between individual astrocytes within a FOV - some astrocytes will have much greater structural visibility in the distal processes than others. This means that some astrocytes may have segmented domains extending more distally than others and we think that authors should privilege such astrocytes for analysis. However, cases like the representative astrocytes shown in Figure 4A or Figure S1B have segmented domains localized only to proximal processes near the soma. Accordingly, given the reported timing differences between "arbor" and "soma" activation, one might expect there to be comparable timing differences between domains that are distal vs proximal to the soma as well. Fast signals in peripheral regions of astrocytes in contact with synapses are largely IP3R2-independent (Stobart et al., 2018). However, the quality of SR101 staining has implications for interpreting the IP3R2 KO data. There is evidence IP3R2 KO may preferentially impact activity near the soma (Srinivasan et al., 2015). Thus, astrocytes with insufficient staining - visible only in the soma and proximal domains - might show a biased effect for IP3R2 KO. While not necessarily disrupting the core conclusions made by the authors based on their analysis of SR101-segmented astrocytes, we think results would be strengthened if only astrocytes with sufficient SR101 staining - i.e. more consistent with previous reports of L2/3 astrocyte area (Lanjakornsiripan et al., 2018) - were included. This could be achieved by using max or cumulative projections of individual astrocytes in combination with SR101 staining to construct more holistic structural maps (Bindocci et al., 2017).

      We agree with the ideas concerning SR101, and indeed there could be variability in the origins of the astrocyte calcium signal. Astrocyte territory boundaries can be difficult to discern when both astrocytes express GCaMP6. Here we take a conservative approach to constrain ROIs to SR101-positive astrocyte territory outlines without invading neighboring cells in order to reduce error in the estimate of a spatial threshold. The effect of IP3R2 KO preferentially impacting activity near the soma is interesting, and in line with our conclusions. We agree that the findings from SR101-negative pixels would not necessarily disrupt the core conclusions of the study, and the additional analysis suggested would further strengthen results.

      For latency-based selection - The authors record calcium activity within a FOV containing 20 or more astrocytes over a period of 60s, during which a 2Hz hindpaw stimulation at 2mA is applied for 20s. As discussed above, presumably some astrocytes in a FOV are the first to respond to the stimulus series, while others likely respond with longer latency to the stimulus. For the shorter-latency responders (<3s), it is easier to attribute their calcium increases as "following the sensory information" projecting to L2/3. In other cases, when "arbor" responses occur at 10s or later, only after 20 stimulus events (at 2Hz), it is likely they are being activated by a more complex and recurrent circuit containing several rounds of neuron-glia crosstalk etc., which would be mechanistically distinct from astrocytes responding earlier. We suggest that authors focus more on the shorter latency response astrocytes, as they are more likely to have activity corresponding to the stimulus itself.

      We agree that differences in the timing of astrocyte calcium increases may be due to different mechanisms outside of the astrocyte. We believe the spatial threshold will be intrinsic to these external variables; yet we believe that longer-latency responses are physiological and may carry important information for determining the astrocyte calcium responses.

      The second level of analysis refinement we suggest relates specifically to the issue of propagation and timing for the activity within "arbor", "soma" and "post-soma". Currently, the authors use an ROI-based approach that segments the "arbor" into domains. We suggest that this approach could be supplemented by a more robust temporal analysis. This could for example involve starting with temporal maps that take pixels above a certain amplitude and plot their timing relative to the stimulus-onset, or (better) the first active pixel of the astrocyte. This type of approach has become increasingly used (Bindocci et al., 2017; Wang et al., 2019; Ruprecht et al., 2022) and we think its use can greatly help clarify both the proposed sequence and better characterize the spatial threshold. We think this analysis should specifically address several important points:

      We agree that the creation of temporal maps from our own data will be interesting. We will provide the results of the suggested analysis in the revised version of the manuscript.
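
      The temporal map proposed by the reviewer can be sketched as follows. This is only an illustrative outline, not the authors' pipeline: the function names, the per-pixel trace layout, and the amplitude threshold are all assumptions made for the example. Each pixel (or domain) is assigned the latency of its first supra-threshold frame after stimulus onset, and latencies can then be re-referenced to the first active pixel of the astrocyte.

```python
def onset_latencies(traces, frame_rate_hz, stim_onset_s, amp_threshold):
    """Latency (s) from stimulus onset to the first supra-threshold frame.

    traces: one fluorescence trace (list of dF/F values) per pixel or domain.
    Returns one latency per trace, or None if the trace never crosses the
    (hypothetical) amplitude threshold after stimulus onset.
    """
    onset_frame = int(round(stim_onset_s * frame_rate_hz))
    latencies = []
    for trace in traces:
        latency = None
        for frame in range(onset_frame, len(trace)):
            if trace[frame] > amp_threshold:
                latency = (frame - onset_frame) / frame_rate_hz
                break
        latencies.append(latency)
    return latencies


def relative_to_first_active(latencies):
    """Re-reference latencies to the earliest active pixel of the cell."""
    active = [t for t in latencies if t is not None]
    if not active:
        return list(latencies)
    first = min(active)
    return [None if t is None else t - first for t in latencies]
```

      Plotting the returned latencies at their pixel positions would give the kind of map the reviewer describes; pixels that are active only before the stimulus are reported as inactive under this definition.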

      1) Where/when does the astrocyte activation begin? Understanding the beginning is very important, particularly because another potential spatial threshold - preceding the one the authors describe in the paper - could gate the initial activation of more distal processes, as discussed above. This sequentially earlier spatial threshold could (for example) rely on microdomain interaction with synaptic elements and (in contrast) be IP3R2 independent (Srinivasan et al., 2015, Stobart et al., 2018). We would be interested to know whether, in a subset of astrocytes that meet the structure and latency criteria proposed above and can produce global activation, there is an initial local GCaMP6f response of a minimal size that must occur before propagation towards the soma begins. The data associated with varying stimulus parameters could potentially be useful here and reveal stimulus intensity/duration-dependent differences.

      This is a very important point. It is difficult to pinpoint the beginning of the signal, which is why we rely on the average of responses.

      2) Whether the propagation in the authors' experimental model is centripetal? This is implied throughout the manuscript but never shown. We think establishing whether (or not) the calcium dynamics are centripetal is important because it would clarify whether spatially adjacent domains within the "arbor" need to be sequentially active before reaching the threshold and then reaching the soma. More broadly, visualizing propagation will help to better visualize summation, which is presumably how the threshold is first reached (and overcome). The alternative hypothesis of a general excitability threshold, as discussed above, would be challenged here and possibly rejected, thereby clarifying the nature of the Ca2+ process that needs to reach a threshold for further expansion to the soma and other parts of the astrocyte.

      We agree that our view is centripetal. Indeed, we have found arborization activity precedes soma activity. However, whether this is intrinsic or due to the fact that synapses are more likely to occur in the periphery requires further studies.

      3) In complement to the previous point: we understand that the spatial threshold does not per se have a location, but is there some spatial logic underlying the organization of active domains before the soma response occurs? One can easily imagine multiple scenarios of sparse heterogeneous GCaMP6f signal distributions that correspond to ≥22.6% of the arborization, but that would not be expected to trigger soma activation. For example, the diagram in Figure 4C showing the astrocyte response to 2Hz stim (which lacks a soma response) underscores this point. It looks like it has ≥22.6% activation that is sparsely localized throughout the arborization. If an alternative spatial distribution for this activity occurred, such that it localized primarily to a specific process within the arbor, would it be more likely to trigger a soma response?

      This is an interesting point and an analysis of spatial clustering on pre-soma domain activation may be useful to answer it.

      4) Does "pre-soma" activation predict the location and onset time of "post-soma" activation? For example, are arbor domains that were part of the "pre-soma" response the first to exhibit GCaMP6f signal in the "post-soma" response?

      This is another interesting analysis that can be done with a spatial clustering analysis.

      Reviewer #2 (Public Review):

      Lines et al investigated the integration of calcium signals in astrocytes of the primary somatosensory cortex. Their goal was to better characterize the mechanisms that govern the spatial characteristics of calcium signals in astrocytes. In line with previous reports in the field, they found that most events originated and stayed localized within microdomains in distal astrocyte processes, occasionally coinciding with larger events in the soma, referred to as calcium surges. As a single astrocyte communicates with hundreds of thousands of synapses simultaneously, understanding the spatial integration of calcium signals in astrocytes and the mechanisms governing the latter is of tremendous importance to deepen our understanding of signal processing in the central nervous system. The authors thus aimed to unveil the properties governing the emergence of calcium surges. The main claim of this manuscript is that there would be a spatial threshold of ~23% of microdomain activation above which a calcium surge, i.e. a calcium signal that spreads to the soma, is observed. Although the study provides data that is highly valuable for the community, the conclusions of the current version of the manuscript seem a little too assertive and general compared with what can be deduced from the data and methods used.

      The major strength of this study is the experimental approach that allowed the authors to obtain numerous and informative calcium recordings in vivo in the somatosensory cortex in mice in response to sensory stimuli as well as in situ. Notably, they developed an interesting approach to modulating the number of active domains in peripheral astrocyte processes by varying the intensity of peripheral stimulation (its amplitude, frequency, or duration).

      We thank the reviewer for their kind and thoughtful review of our study.

      The major weakness of the manuscript is the method used to analyze and quantify calcium activity, which mostly relies on the analysis of averaged data and overlooks the variability of the signals measured. As a result, the main claims from the manuscript seem to be incompletely supported by the data. The choice of the use of a custom-made semi-automatic ROI-based calcium event detection algorithm rather than established state-of-the-art software, such as the event-based calcium event detection software AQuA (DOI: 10.1038/s41593-019-0492-2), is insufficiently discussed and may bias the analysis. Some references on this matter include: Semyanov et al, Nature Rev Neuro, 2020 (DOI: 10.1038/s41583-020-0361-8); Covelo et al 2022, J Mol Neurosci (DOI: 10.1007/s12031-022-02006-w) & Wang et al, 2019, Nat Neuroscience (DOI: 10.1038/s41593-019-0492-2). Moreover, the ROIs used to quantify calcium activity are based on structural imaging of astrocytes, which may not be functionally relevant.

      Unfortunately, there is no general consensus on calcium analysis in the astrocyte or neuronal fields, and many groups use either custom software written in-house or published packages such as GECIquant or AQuA. While AQuA is an event-based calcium event detection software, not including inactive domains that are SR101-positive may lead to an underestimate of the spatial threshold for calcium surge. Our analysis is not based on functional events but on calcium activity within the structural constraints of a single astrocyte. This is crucial to properly determine the ratio of active vs inactive pixels within a single astrocyte.
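
      The point about the denominator can be made concrete with a small sketch. It is a hypothetical illustration rather than the authors' code: the activity criterion (baseline mean plus a multiple of the baseline standard deviation) and all names are assumptions. The key feature is that every structurally defined domain stays in the denominator, whether or not it ever becomes active.

```python
from statistics import mean, pstdev


def active_domain_fraction(domain_traces, n_baseline_frames, n_sd=2.0):
    """Fraction of structurally defined domains that are active in each frame.

    domain_traces: one trace per domain of a single astrocyte, including
    domains that never respond. A domain counts as active in a frame when it
    exceeds its own baseline mean + n_sd * baseline SD (assumed criterion).
    """
    thresholds = []
    for trace in domain_traces:
        baseline = trace[:n_baseline_frames]
        thresholds.append(mean(baseline) + n_sd * pstdev(baseline))

    n_domains = len(domain_traces)
    n_frames = len(domain_traces[0])
    fractions = []
    for frame in range(n_frames):
        n_active = sum(
            1 for trace, thr in zip(domain_traces, thresholds) if trace[frame] > thr
        )
        # Silent domains remain in the denominator.
        fractions.append(n_active / n_domains)
    return fractions
```

      An event-based detector that reports only active regions has no record of the silent domains, so the same ratio cannot be formed from its output without adding a structural mask.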

      For the reasons listed above, the manuscript would probably benefit from some rephrasing of the conclusions and a discussion highlighting the advantages and limitations of the methodological approach. The question investigated by this study is of great importance in the field of neuroscience as the mechanisms dictating the spatio-temporal properties of calcium signals in astrocytes are poorly characterized, yet are essential to understand their involvement in the modulation of signal integration within neural circuits.

      We thank the reviewer for their suggestions to benefit the conclusions and discussion.

      Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the spatial dynamics of subcellular astrocytic calcium signaling. Specifically, they elucidate how subdomain activity above a certain spatial threshold (~23% of domains being active) heralds a calcium surge that also affects the astrocytic soma. Moreover, they demonstrate that processes on average are included earlier than the soma and that IP3R2 is necessary for calcium surges to occur. Finally, they associate calcium surges with slow inward currents.

      Strengths:

      The study addresses an interesting topic that is only partially understood. The study uses multiple methods including in vivo two-photon microscopy, acute brain slices, electrophysiology, pharmacology, and knockout models. The conclusions are strengthened by the same findings in both in vivo anesthetized mice and in brain slices.

      We thank the reviewer for the positive assessment of the study and his/her comments.

      Weaknesses:

      The method that has been used to quantify astrocytic calcium signals only analyzes what seems to be a small proportion of the total astrocytic domain on the example micrographs, where a structure is visible in the SR101 channel (see for instance Reeves et al. J. Neurosci. 2011, demonstrating to what extent SR101 outlines an astrocyte). This would potentially heavily bias the results: from the example illustrations presented it is clear that the calcium increases in what is putatively the same astrocyte go well beyond what is outlined with automatically placed small ROIs. The smallest astrocytic processes are an order of magnitude smaller than the resolution of optical imaging and would not be outlined by either SR101 or with the segmentation method judged by the ROIs presented in the figures. Completely ignoring these very large parts of the spatial domain of an astrocyte, in particular when making claims about a spatial threshold, seems inappropriate. Several recent methods published use pixel-by-pixel event-based approaches to define calcium signals. The data should have been analyzed using such a method within a complete astrocyte spatial domain in addition to the analyses presented. Also, the authors do not discuss how two-dimensional sampling of calcium signals from an astrocyte that has processes in three dimensions (see Bindocci et al, Science 2017) may affect the results: if subdomain activation is not homogeneously distributed in the three-dimensional space within the astrocyte territory, the assumptions and findings regarding a correlation between subdomain activation and somatic activation may be affected.

      In order to reduce noise from individual pixels, we chose to segment astrocyte arborizations into domains of several pixels. As pointed out previously, including pixels outside of the SR101-positive territory runs the risk of including a pixel that may be from a neighboring cell, and we chose to avoid this source of error. We agree that the results have limitations from being acquired in 2D instead of 3D, but it is reasonable to assume that subdomain activation in the 3D astrocyte is homogeneously distributed and that the 2D plane is representative of the whole astrocyte. Indeed, no dimensional effects were reported in Bindocci et al, Science 2017. We plan to include a paragraph in the discussion to address this limitation in our study.

      The experiments are performed either in anesthetized mice, or in slices. The study would have come across as much more solid and interesting if at least a small set of experiments were performed also in awake mice (for instance during spontaneous behavior), given the profound effect of anesthesia on astrocytic calcium signaling and the highly invasive nature of preparing acute brain slices. The authors mention the caveat of studying anesthetized mice but claim that the intracellular machinery should remain the same. This explanation appears a bit dismissive as the response of an astrocyte not only depends on the internal machinery of the astrocyte, but also on how the astrocyte is stimulated: for instance synaptic stimulation or sensory input likely would be dependent on brain state and concurrent neuromodulatory signaling which is absent in both experimental paradigms. The discussion would have been more balanced if these aspects were dealt with more thoroughly.

      Yes, we agree that this is a limitation, and we will acknowledge this in the discussion.

      The study uses a Heaviside step function to define a spatial 'threshold' for somata either being included or not in a calcium signal. However, Fig 4E and 5D showing how the method separates the signal provide little understanding for the reader. The most informative figure that could support the main finding of the study, namely a ~23% spatial threshold for astrocyte calcium surges reaching the soma, is Fig. 4G, showing the relationship between the percentage of arborizations active and the soma calcium signal. A similar plot should have been presented in Fig 5 as well. Looking at this distribution, though, it is not clear why ~23% would be a clear threshold to separate soma involvement, one can only speculate how the threshold for a soma event would influence this number. Even if the analyses in Fig. 4H and the fact that the same threshold appears in two experimental paradigms strengthen the case, the results would have been more convincing if several types of statistical modeling describing the continuous distribution of values presented in Fig. 4E (in addition to the Heaviside step function) were presented.

      We agree with the reviewer that we should add to the paper a discussion justifying our use of the Heaviside step function, and plan to include this. We chose the Heaviside step function to represent the on/off situation that we observed in the data. We agree with the reviewer that Fig. 4G is informative and demonstrates that under 23% most of the soma fluorescence values are clustered at baseline. We agree that a similar graph should be included in Fig. 5 as well. We agree that a different statistical model describing the data would be more convincing; we also confirmed the spatial threshold using a confidence interval in the text.
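
      For readers unfamiliar with the approach, a step-function threshold of this kind can be estimated as sketched below. This is a minimal least-squares illustration under our own assumptions (a grid search over midpoints between observed x values, with the two plateau levels set to the group means); it is not a description of the fitting procedure actually used in the manuscript, and the function name is hypothetical.

```python
def fit_step_threshold(x, y):
    """Least-squares fit of y ~ low + (high - low) * H(x - t).

    x: percentage of the arborization active per response.
    y: corresponding soma signal.
    Candidate thresholds t are midpoints between consecutive distinct x
    values, so both sides of the step always contain data.
    Returns (t, low, high, sse) for the best candidate.
    """
    xs = sorted(set(x))
    if len(xs) < 2:
        raise ValueError("need at least two distinct x values")

    best = None
    for left, right in zip(xs, xs[1:]):
        t = (left + right) / 2.0
        below = [yi for xi, yi in zip(x, y) if xi < t]
        above = [yi for xi, yi in zip(x, y) if xi >= t]
        low = sum(below) / len(below)
        high = sum(above) / len(above)
        sse = sum((v - low) ** 2 for v in below) + sum((v - high) ** 2 for v in above)
        if best is None or sse < best[3]:
            best = (t, low, high, sse)
    return best
```

      Comparing the residual error of this fit with that of a continuous alternative (for example a sigmoid or a straight line) is one way to address the reviewer's request for additional statistical models.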

      The description of methods should have been considerably more thorough throughout. For instance which temperature the acute slice experiments were performed at, and whether slices were prepared in ice-cold solution, are crucial to know as these parameters heavily influence both astrocyte morphology and signaling. Moreover, no monitoring of physiological parameters (oxygen level, CO2, arterial blood gas analyses, temperature etc) of the in vivo anesthetized mice is mentioned. These aspects are critical to control for when working with acute in vivo two-photon microscopy of mice; the physiological parameters rapidly decay within a few hours with anesthesia and following surgery.

      We will make our methods section more thorough, in particular by stating that body temperature and respiration were indeed monitored throughout anesthesia.

    1. Author Response

      We are grateful to the reviewers for recognizing the importance of our work on transcription-independent early recovery of proteasome activity. We also thank them for their thoughtful criticisms and suggested improvements, which we will address in the revised version as described below.

      The reviewers and editors asked for data to support the model that early recovery of proteasome activity is due to accelerated proteasome assembly. This model is backed by published data that proteasome assembly intermediates increase dramatically in cells treated with proteasome inhibitors (Fig. 6 in Ref. 46 of the revised manuscript). We will expand the discussion of this paper in a paragraph that describes our model. Another key experiment to confirm this model would be to determine what fraction of nascent polypeptides is degraded within minutes after synthesis, which is not trivial, and Ibtisam ran out of time to conduct these experiments because she had to graduate in spring before the expiration of her visa. This type of experiment usually uses metabolic labeling by a heavy or radioactive amino acid that always includes a prior depletion of a non-labeled amino acid. However, the fundamental flaw of this approach, which is not recognized by the scientific community, is that depletion of an amino acid stresses cells and reduces the rate of protein synthesis, especially if this amino acid is methionine. Thus, this model is not easy to test, and should be considered speculation. We will therefore move the description of this model, together with Fig. 4, into a separate "Ideas and Speculation" section and remove this model's description from the abstract.

      Reviewer 1 raised the possibility that a background band detected on the western blot of DDI2 KO cells could be a highly homologous protease DDI1. This is highly unlikely because, according to Protein Atlas, DDI1 is selectively expressed in the testis and is not expressed in the cell lines we used. Reviewer 1 also suggested that we should base our conclusion on Nrf1 KD, which we de-facto did because we confirmed that DDI2 KD blocks Nrf1 activation (Fig. 1d).

      In response to Reviewer 1's critiques regarding the presentation of proteasome subunit stability data in Fig. 4 (Ref. 45 of the revised manuscript), we will remove PSMB8 and replace chaperones with the subunits of the 26S base. We will change color palettes, symbols, and axis scales to improve clarity.

      We will acknowledge in the discussion that our work did not exclude a role for DDI2 in the recovery of proteasome activity after repeated pulse treatments, as suggested by Reviewer 1.

      We agree with Reviewer 2 that using proteasome levels is inaccurate when describing our activity measurement data. However, in the manuscript, we use "levels" only when discussing data in the literature. We believe measuring activity and not the total levels is more important because not all proteasomes are active, e.g., latent 20S proteasome core particles.

      Reviewer 3 expressed concern that our conclusions were based on data in HAP1 cells, which are haploid and appear not very sensitive to proteasome inhibitors. This is why we used DDI2 KD in MDA-MB-231 and SUM149 cells, which are highly sensitive to proteasome inhibitors (Weyburne et al., Ref. 11). In our experience, the full extent of proteasome inhibitor cytotoxicity is not revealed until 48hr after treatment, and viability determined at 12hr and 24hr, as in Fig. 1c, should not be used to determine sensitivity (it was used for activity assay normalization). We will add a new supplementary figure showing that HAP1 cells are as sensitive to proteasome inhibitors as MDA-MB-231 cells when cell viability is assayed 48hr after treatment (new Fig. S2). Another panel in this new figure will demonstrate that the baseline proteasome activity is very similar in HAP1, MDA-MB-231 and SUM149 cells. We will also add data demonstrating that inactivation of DDI2 by mutation does not change the recovery of proteasome activity in HCT-116 cells (new Fig. 1g). Recovery in MDA-MB-231, SUM149, and HCT-116 cells was measured at 18hr, which is still within the 12-24hr window when other investigators observed partially DDI2-dependent recovery.

      We have conducted an experiment in which we followed activity recovery for up to 72hr. We found that activity plateaued at 24hr and opted against repeating the experiment because there were no further changes. We feel that the manuscript should not include data from a single biological replicate. The fact that the recovery is incomplete and that cells seem to survive with lower levels of proteasome activity is interesting; however, investigating the molecular basis for this phenomenon is beyond the scope of the current project.

      We were not disputing the conclusions of previous studies that DDI2/Nrf1 is responsible for enhanced expression of proteasomal mRNA in cells continuously treated with proteasome inhibitors. In fact, we confirmed that pulse treatment causes a similar increase (Fig. 2b). As for papers that measured activity recovery after pulse treatment, we objectively discuss our results in the context of these papers.

      We will also respond to Reviewers' recommendations and minor points:

      • We will review the revised version carefully to eliminate spelling and grammatical errors and typos.

      • We will no longer refer to DDI2 as a novel protease, as suggested by Reviewer 1.

      • We agree with Reviewer 2 that our CHX results do not necessarily mean that recovery involves translation of proteasomal mRNAs, and we will now conclude that proteasome recovery requires protein synthesis.

      • We will revise Fig. 1c, 3a and 4a to improve clarity.

      • We have stated in the caption that data in Fig. 4a comes from Table S4 in Ref. 45.

      • We will accept an excellent suggestion of Reviewer 3 to change "recovery" to "early recovery" in the title.

      • Regarding Reviewer 3's request to assay activity recovery at additional time points before 12hr, this was done in the cycloheximide experiment in Fig. 3A.

      • Even if we assume that the differences in the observed recovery activity in MDA-MB-231 cells (Fig. 1f) are statistically significant, which may implicate DDI2 involvement in the activity recovery, the percentage is still small, suggesting that most activity recovery is DDI2-independent.

      • We will tone down the statement "the present findings suggest that DDI2 desensitizes cells to PI by a different mechanism," replacing "suggest" with "raise a possibility."

      • We will indicate that only Bortezomib is approved for mantle cell lymphoma.

      • We will change the description of clinical dosing as suggested by Reviewer 3. We will add a reference on PK of subcutaneous bortezomib (Ref. 9), even though the review we quoted (Ref. 7) discussed subcutaneous dosing.

    1. Author Response

      Reviewer #3 (Public Review):

      Youssef et al. have used a range of markers to identify cancer stem cells (CSCs) in patients with oral cancers. CSCs were identified in lab conditions and were often linked to the invasiveness of cancers. The authors found a combination of markers convincingly linked to known biology and found cells expressing them in the invading cancers.

      The major weakness of the paper is in the technical side. There isn't enough description as to how they discriminated between CSCs inside the tumour and those invading its surroundings. Similarly, the way the information is presented it is not clear why artificial intelligence was needed to enhance the accuracy of the method linking CSCs to cancer invasion (and ultimately deadly metastasis to other organs).

      The method for applying the tumour mask is displayed in Figure 2E for cohort 1 and Figure 2 figure supplement 3 for cohort 2. Briefly, in the image analysis pipeline, dense areas of EpCAM+ (cohort 1) or Vimentin+ (cohort 2) cells are merged to specify tumour/stroma regions. Thus, CSCs inside tumours (in the EpCAM dense tumour region) can be discriminated from CSCs invading the surroundings (in the Vimentin dense stromal region).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors aimed to understand whether the superficial, retinorecipient layers of the mouse superior colliculus (sSC) participate in figure-ground segregation and object recognition. To address this question, they use a combination of optogenetic perturbations of sSC and recordings. These data are consistent with SC being causally involved in object recognition. This would be useful information for the field and likely to be cited.

      Thank you for your positive evaluation.

      However, I have several concerns regarding their conclusions.

      A significant limitation of this study is methodological. The major novelty is the effect of optogenetic silencing, because the recordings are largely correlative, but the optogenetic silencing approach lacks appropriate controls for the effects of the optogenetic excitation light. The authors acknowledge that the optogenetic light is a potential confound, but attempt to address this by shielding the fiber to eliminate light leak and strobing a blue LED in the arena. The former does not account for the effects of excitation light scattering intracerebrally--during optogenetic experiments, intracerebral scattering causes the eyes to light up--and for the latter, there is no way to compare the intensity or qualia of the externally strobed LED and the intracerebral light. The proper control would be a cohort of mice lacking channelrhodopsin expression in sSC. Regardless, it is essential to acknowledge this potential confound.

      This is a good point. We have added discussion of this in lines 90-95. The proposed experiment was done in Kirchberger et al. (Sci Adv 2021, Suppl Figure 3). In mice without expression of channelrhodopsin trained on the same task as in our study, blue laser light in the cortex did not affect accuracy. Although the exact location of these fibers is different from ours, the distance from the fiber to the eye is very similar. Furthermore, in answer to this comment, we have done a new set of experiments with 4 wild type mice, in which we recorded neural activity in the sSC while delivering optogenetic light stimulation. The procedure was similar to our previous experimental animals except that they did not receive a virus injection. In these mice, we did not see any response in the superior colliculus to the laser light, but we noticed a 5% reduction in response to the visual stimuli (new Figure 1—figure supplement 3). This small reduction could be a small reduction of contrast of the visual stimulus due to the laser light hitting the retina, but given that we did not see any response to the laser alone, it is more likely to come from the known inhibiting effects of light on neural activity (e.g. through heat, see Owen et al. Nat Neurosci 2019). Because our aim was to silence sSC, this particular effect is not a strong confound for our study.

      Relatedly, as the authors note, there are GABAergic projection neurons in sSC that may be driving these effects via gain of function. This is a significant concern that has limited the widespread adoption of this approach in sSC despite its popularity in studies in cortex. Indeed, one recently published study of behavioral functions of deep SC found that activating inhibitory neurons actually caused paradoxical behavioral effects consistent with gain of function in the targeted hemisphere, due to the effects of long-range inhibitory projections on the other SC hemisphere. Given the presence of inhibitory projections in sSC, it would be preferable to use an orthogonal method for silencing and at least to thoroughly acknowledge these concerns and cite these recent studies.

      This is a valid point. When we started our study, we had some experience with inhibitory opsins (archaerhodopsin and halorhodopsin) and were not confident that we could widely inhibit the sSC reversibly, repeatedly and consistently for an extended period. Other labs have now shown this is feasible with improved inhibitory opsins, so this would now be our preferred option too. The method of silencing sSC by activation of GABAergic neurons, however, is still the most common optogenetic method to silence sSC for an extended period (e.g. Hu et al. Neuron 2019, Brenner et al. Neuron 2023).

      We thank the reviewer for pointing us to recently published paradoxical behavioral effects. These effects, which we found in Essig et al. (Comm. Biol. 2021), are very interesting, but are not really a concern for the interpretation of our results, partially because, as the reviewer pointed out, the GABAergic neurons activated there were in the deep and intermediate layers of the SC, below the sSC that we targeted. The paradoxical effects in that manuscript were attributed to direct inhibition of the contralateral superior colliculus. In our case, we activated the inhibitory neurons bilaterally, and this interhemispheric GABAergic connectivity, if it extends to sSC, would only have strengthened the bilateral silencing of the sSC. However, we have now discussed the possibility of our transfection of these deeper GABAergic neurons (lines 272-278). The more general point that activating GABAergic neurons in the sSC may also cause inhibition in other structures is indeed a concern. GABAergic neurons in the sSC project to the PBG and the LGN (in particular the vLGN) (Gale & Murphy, 2014; Whyland et al., 2019; Li et al., 2023). Although the primary effect of our manipulation is silencing of the superior colliculus, including the GABAergic neurons (see our answer further below), we cannot exclude the possibility that activating these extracollicular GABAergic projections has an effect. We have edited our discussion of this and updated the references (lines 268-272). However, our measurements in anesthetized (previous submission) and in awake mice (new Figure 1—figure supplement 2) show that, apart from a short period directly after the onset of the laser, almost all putative GABAergic neurons also show a reduced response (see also our answer to the next comment).

      A minor point is that although activation of GABAergic neurons in sSC is expected to cause inhibition of neighboring neurons, I would expect channelrhodopsin-expressing GABAergic cells to show an increase in firing during optogenetic excitation. However, it seems that none of the cells plotted (assuming each point in Supplementary Fig 4D is a cell, which the legend does not specify) had such an increase. Do these extracellular recordings not detect inhibitory neurons well?

      This is indeed an intriguing observation. The data in the original figure (Supp Fig 1D) was spiking data from 15 recording sites and not from sorted units. This was mentioned in panel C, but not in the caption. For the purpose of quantifying the amount of silencing, there was no need to sort single units. Still, it is surprising to see the reduction on almost all channels. The data of Supp Fig 1D came from experiments in anesthetized mice. Prompted by a question from another reviewer, we have now redone these experiments in head-fixed awake mice. The new Figure 1—figure supplement 2 shows these results, for single- and multi-unit clusters. In response to a short laser pulse (50 ms), we find that many units significantly increase their firing rate (Figure 1—figure supplement 2A-B). However, almost all activated units then reduce their firing rate, and again we see an overall reduction of responses to visual stimuli. Only one unit fires significantly more when the laser is on during the period of visual stimulation compared to when the laser is off, and the overall firing rate is strongly reduced (Figure 1—figure supplement 2C-E). It appears that optogenetically activating the inhibitory neurons in the sSC for a longer period also reduces the activity of these neurons. The effect that we are seeing might be similar to the paradoxical effects that may occur in visual cortex, where additional excitation of inhibitory neurons also leads to their reduced activity due to network dynamics (see e.g. Sadeh & Clopath, Nat Rev Neurosci 2021). However, the effect may also be due to a few inhibitory neurons having a strong inhibitory effect on other inhibitory neurons. This is an interesting point worthy of more investigation, but it falls outside the scope of this manuscript.

      Finally, the relationship between these stimuli and objects is not entirely clear. The authors acknowledge this but it would be worthwhile to devote more attention to this point. In effect, as the authors note, the gray screen and sinusoidal grating do not have any sharp edges on the screen, whereas each of the behaviorally relevant stimuli will create a sharp, step-like edge on the screen. Whether edge detection is truly object detection or simply a variant of more general visual detection is unclear.

      Indeed, the task can be solved by detection of texture edges, and it is not necessary to integrate the edge components into an object to successfully perform the task. A linear decoder fed with simple cell-like inputs is able to do the orientation task (Luongo et al., 2023). The same network failed to learn the phase task, but the image of a phase-defined figure also contains features that are not present in the background image, so this task could also be solved by learning only local features. Even the texture-defined figures used in Kirchberger et al. (2021) and in earlier monkey studies (Lamme, 1995), which do not contain any sharp stimulus edges, can be detected without integrating the local edges into objects and segregating the figure from the background. Several monkey studies show that late neuronal responses in V1 are enhanced for neurons with receptive fields on what we, humans, perceive as the figure. This effect has also been seen in mouse V1, even in the case where there are no local features distinguishing the figure from the background (Fig. 7 in Kirchberger et al. 2021). Interfering with activity in V1 in this late phase reduces the ability to detect the figure in humans (by TMS) and mice (by optogenetics). This suggests, but does not prove, that this figure-ground modulation is used in solving the task. To understand if mice solve the tasks by detecting a figure or by detecting specific features, we can look at generalization. Mice were previously shown to generalize to some degree for size, position and spatial phase of the figure grating patch (Schnabel et al., 2018), suggesting that the mice did not train to detect specific features at specific locations. Rats trained on a similar task had difficulty generalizing from a luminance-defined object to an orientation-defined object (De Keyser et al., 2015), as do mice (Khastkhodaei et al., 2016), but once the rats were acquainted with one set of oriented figures, they immediately generalized to other texture-orientations above chance. On a slightly different figure-detection task mice also showed generalization for different orientations once the initial task was learned (Luongo et al. 2023). This suggests that at least some generalization to object detection occurs in this task. We have added these observations to the discussion (lines 301-305).

      Reviewer #2 (Public Review):

      The goal of this study is to show that the superficial superior colliculus (sSC) of mouse signals figure-ground differences defined by contrast, orientation, and phase, and that these signals are necessary for the animal to detect such figure-ground differences. By inhibiting sSC while the animals perform a figure-ground detection task, the study shows that detection performance decreases when sSC activity is suppressed during the onset of the visual stimulus. The study then intends to show that sSC neurons exhibit surround suppression based on orientation differences, and that surround suppression is stronger when the animal detects the correct location of the figure on the background.

      The major strength of this study is the use of a behavioural paradigm to test detection performance of figure-ground stimuli while manipulating neural activity in the sSC during different times after stimulus onset. This paradigm would show whether activity in the sSC is relevant for performing the task. Secondly, the study collected data to confirm previous findings: sSC neurons exhibit orientation specific surround suppression. Additionally, it is impressive that the authors were able to train mice to generalize their task performance across different stimulus categories (figure-ground differences in orientation and phase). This should be highlighted as it may inform future studies.

      Thank you for your positive evaluation. We have extended our discussion on the generalization in object detection tasks in mice.

      The study has, however, methodological and analytical weaknesses so that the stated conclusions are not supported by the presented results.

      1) Optogenetic inhibition is not limited to sSC (even expression may not be limited). About 30% of inhibitory neurons in the sSC project to other areas, e.g. ventral LGN, parabigeminal nucleus and pretectum (Whyland et al, 2019, see ref in manuscript). This means that these areas receive direct inhibition when inhibitory sSC neurons are optogenetically stimulated. This fact is mentioned in the discussion but the consequences and implications for the results are ignored. This is a major flaw of the optogenetic experiments of this study. Additionally, no evidence is given that opsin expression was limited to the superficial layers (except for one histological slice), which the authors acknowledge in line 285. Deeper layers may have other inhibitory neurons with long-range projections.

      The finding that sSC neurons show no figure-ground modulation for phase while the optogenetic manipulation has behavioural effects may be an indication for other areas being affected by the optogenetic manipulation.

      This is a valid point, also raised by reviewer 1. Although the primary effect of activating the GABAergic neurons in the sSC is a strong reduction of activity in the sSC (see also new figure S1), we cannot rule out that we also activate GABAergic neurons below the sSC and that there are some effects of activating GABAergic connections to the LGN and PBG. We have extended our discussion of this point in lines 269-277. However, as shown in new Figure 1—figure supplement 2, the effect of optogenetically activating Gad2-positive neurons appears to lead to a counter-intuitive reduction of their activity. This effect has previously been observed in cortex.

      2) Could other behavioural variables explain the results?

      a) Are there any task events other than the visual stimuli that the mice could use to make their decisions? The authors state the use of a custom made lick spout but it is not clear how this spout works, i.e. how do mechanics of the spout deliver water to the right versus the left output and could the mouse perceive these mechanics?

      We believe there were no task events besides the visual stimuli that the mice could use to make their decisions. The lick spout was Y-shaped (see Figure 1B) to facilitate the two-alternative forced choice task. Each side of the lick spout was connected to a separate water tube. The water flow in each tube was controlled using a valve. Also, each side of the lick spout was connected to its own lick detector wire. The two valves and the two detector wires were connected to an Arduino which was controlled by our MATLAB task script. The task script was coded such that, when the lick of the mouse had been on the correct side, the valve controlling the water flow on the correct side would briefly open to deliver the water reward. To summarize, the water would only flow after the mouse had licked and if the first lick had been on the correct side. Hence, the water reward did not produce additional cues. We have edited the description of the lick spout in the Methods section to make the functioning of the lick spout more clear (lines 511-513).

      b) Could the different neural responses to figure versus ground shown in Fig 2I-J and Fig 3B be explained by behaviours varying between the trial types, e.g. by early lick movements (which are conceivable even if the spout is not present), eye movements or changes in pupil-linked arousal? A behavioural difference seems even more likely to occur between hit and error/miss trials (Fig 4). If these behaviours were not measured, the possibility of behavioural modulation should be discussed.

      In the awake behaving electrophysiology experiments, the lick spout was not present until 500 ms after stimulus onset, so the mouse could not lick the spout. We did not record whisking or other face and jaw movements, hence we cannot say for sure whether the mice performed early ‘licks’ in the absence of the lick spout. We did, however, add a supplementary figure showing the licking behavior of the mice in the optogenetic interference experiments (see Figure 1—figure supplement 5). In this experiment, the lick spout was present at all times so all early licks would be recorded. Any licks before 200 ms after stimulus onset were disregarded as this would be too early for the decision to include knowledge about the stimulus. Figure 1—figure supplement 5B shows that the mice indeed only performed very few early licks as they probably knew this would not yield reward. The mice that performed the awake electrophysiology experiments were trained on the same task as these mice before introducing the lick spout delay of 500 ms. So although we cannot rule out early licks during electrophysiology, we think early licks would be an unlikely explanation for the neural response differences.

      We have added a new supplementary figure (Figure 2—figure supplement 2) showing data for eye movements and pupil dilation during the tasks. We had excluded all trials where the mice performed eye movements between 0-450 ms after stimulus onset, and indeed we saw no eye movements during the peak of the visual response (0-250 ms). Furthermore, the pupil dilation of the mice also did not change in this period.

      All in all, we view it as unlikely that the differences in neural activity in sSC were caused by either licking, eye movements or pupil-linked arousal.

      3) What is the behavioural strategy of the animals? Only licks beyond 200 ms after stimulus onset determine the choice of the animal because "mice made early random licks" from 0 to 200 ms. To better understand the behavioural strategies of the animals we need to see their behavioural data, i.e. left and right licks aligned to stimulus onset. It would be particularly interesting to see how number and latency of licks changes during optogenetic manipulation.

      Based on these suggestions, we investigated the licking behavior of the mice during the optogenetic experiments in more detail. Our new Figure 1—figure supplement 5 taught us several things:

      1) The fully trained mice hardly perform any early licks; they seem to understand that early licks cannot yield reward.

      2) The mice typically only lick one side of the lick spout during one trial. In correct trials the fluid reward is given directly after a correct lick, which causes the mouse to lick the correct side of the spout even more. However, even if the first lick is incorrect (bottom rows), the mouse generally does not lick the other (correct) side afterward. They seem to know that correct licks after an incorrect lick do not yield reward.

      3) The maximum licking rates were not significantly affected by laser onset.

      4) The latency of the first lick (reaction time) was not significantly affected by laser onset. (Please also see our response to question 2b).

      4) Data relating to misses should be included in analyses to provide a complete picture of behaviour and neural responses

      a) In the optogenetic manipulations, an increase in misses seems to dominate the decreased accuracy (please, explain when a response was counted as a miss). A separate analysis of miss trials may be more robust than of error trials and also offers a different interpretation of the data, namely that the mouse did not see the stimulus rather than perceiving the figure on the opposite side. However, if the mice reduced their lick rate in general during optogenetic stimulation, this begs the question whether their motor performance was affected by optogenetic manipulation. Can this possibility be excluded?

      Trials were counted as follows: A trial was counted as a hit when the first lick after 200 ms after stimulus onset was on the correct side. A trial was counted as an error when the first lick after 200 ms after stimulus onset was on the incorrect side. A trial was counted as a miss when the mouse did not lick in the window between 200 and 2000 ms after stimulus onset. We have clarified this in the methods section (lines 517-526).
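
As a rough illustration of the counting rule above, the following sketch classifies a single trial from its lick times. It is not the task script used in the study: the function name `classify_trial`, the `(time_ms, side)` lick representation, and the inclusive handling of the 200 ms and 2000 ms window edges are all assumptions made for the example.

```python
from typing import List, Tuple

RESPONSE_WINDOW_START_MS = 200   # licks earlier than this are disregarded
RESPONSE_WINDOW_END_MS = 2000    # no lick inside the window counts as a miss


def classify_trial(licks: List[Tuple[float, str]], correct_side: str) -> str:
    """Return 'hit', 'error' or 'miss' for one trial.

    licks: hypothetical list of (time in ms after stimulus onset, side) pairs.
    Window edges are treated as inclusive here, which is an assumption.
    """
    in_window = sorted(
        (t, side) for t, side in licks
        if RESPONSE_WINDOW_START_MS <= t <= RESPONSE_WINDOW_END_MS
    )
    if not in_window:
        return "miss"
    # Only the first lick inside the response window determines the outcome.
    _, first_side = in_window[0]
    return "hit" if first_side == correct_side else "error"


# An early lick at 150 ms is ignored; the first valid lick is on the right.
outcome = classify_trial([(150.0, "left"), (350.0, "right")], correct_side="right")
```

Because only the first in-window lick is used, a later lick on the correct side after an incorrect first lick still yields an error, matching the rule that such licks are not rewarded.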

      Our previous text may not have been sufficiently clear but the decrease in accuracy during optogenetic trials is not dominated by an increase in missed trials. As we have now indicated explicitly in its caption, in figure 1, missed trials are excluded from the analysis. Hence, the significant effects shown in figure 1 are not driven by an increase in missed trials but rather by an increase in erroneous licks. When comparing figure 1 vs figure S3, where the missed trials are added to the analysis as if they were error trials, we can see an overall downward shift of the performances. Indeed, mice miss more trials when the laser is on. The increase in number of missed trials is lower than the increase in number of wrong choices. Furthermore, the range between the performances at early laser onset and late laser onset is still very similar. This indicates that the mice on average do not have higher miss rates when laser onset is early.

      Finally, neither the maximum licking rate nor the reaction time is affected by the laser onset (see the new figure S2).

      Related to Fig 4, it would be equally interesting to see how FGM changes during misses. Do the changes support the observations for error trials?

      We are not convinced that the neural data from missed trials can be interpreted in a simple way. Mice may have various reasons to miss a trial: they may be tired or not paying attention, they may not have seen the stimulus well, they may not feel thirsty enough, they might be distracted by some sensory input that humans might not be aware of, etc. This is why we specifically opted to use a 2-alternative forced choice task rather than a go/no-go task.

      5) Statistical tests do not support the conclusions, are missing or inadequate

      a) In Fig 1E, accuracy is significantly affected at only 1-2 time points in each task, specifically either the 1st and 3rd or the 2nd time point. How do the authors interpret these results? If inhibition starting at the 2nd time point has no significant effects, why would it be significant when inhibition starts later (at the 3rd time)? Furthermore, given that all other starting points of laser stimulation have no significant effects, there is no reason to trust the latency of inhibition effects based on mostly insignificant data points. This analysis in its current form should be removed, including a comparison of latencies between tasks, which was not tested for significance. It may be more meaningful to analyse accuracy for each animal separately. This may reduce variability.

      We can understand that the reviewer may have concerns regarding the post-hoc analysis of Fig 1E, but we feel these concerns stem from a misinterpretation of our goal with this analysis. In Figure 1E, we use a 1-way repeated-measures ANOVA. By using this test, we ask whether the performance of the animals is affected by the laser onset. More specifically “does the performance increase or decrease with increasing laser onset?” The test is significant, so indeed the performance goes up as laser onset goes up. This indicates that the performance of the mice is affected by the inhibition of sSC. For the sake of completeness we had included the post-hoc tests for each latency in the statistics table. Indeed, some individual latencies are not significantly different to the no-laser condition. However, this does not invalidate the conclusion of the main test: a repeated measures ANOVA can only be performed on data with 3 or more groups, so the conclusion of the repeated measures ANOVA could not have been drawn from simply those laser onset(s) that is/are significantly different from the no-laser condition. The main effect of higher performance with higher latencies is significant, even if some individual comparisons are non-significant. The difference in significance of the post-hoc tests does not indicate a significant difference between the groups, but insufficient power to do six individual tests.

      We have changed the wording in the reporting of the statistics of Figure 1E to hopefully more precisely indicate the conclusions we drew from the statistics. We do not draw conclusions from the post hoc tests. We have considered removing them from the statistics table 1, but believe that some readers might be interested. We can remove them if the reviewer believes that would be better.

      b) Analyses regarding the difference in neural response to figure and ground (Fig 2I-J, Fig 3B, Fig 4B, Fig 5C) would be more convincing and informative if the differences were analysed on the level of single neurons in response to the same orientation within their RF (or at the location where the figure is presented, for edge-RF neurons). A histogram of these differences would show how many neurons are affected and how large the effect is in single neurons.

      We fully appreciate this idea, but the way we set up the behavioural task does not quite allow for this type of statistical analysis. This is because we tested all three of the tasks during single sessions (contrast/orientation/phase), and on top of that, we varied the orientations of the stimuli (0/90deg), as well as the phase of the gratings (60 different phases). This all was done with the idea that it would prevent the mice from memorizing the individual stimuli of the task. This also had the effect that only very few trials per session contained the exact same stimulus type, figure-ground condition, orientation and phase. For example, suppose a mouse performs around 120 trials in a session: 25% of those were contrast-stimulus-trials, 37.5% were orientation-stimulus-trials and 37.5% were phase trials. If we look into the 120*0.375 = 45 orientation-stimulus-trials, half of those were figure trials and half were ground trials: about 22 trials each. If we split these trials up by their individual orientations, we are left with only about 11 trials per condition to analyse for figure-ground effects, each of which would probably have a different grating phase. Given the firing rate variations that the individual neurons show in awake mice, this number of trials would not provide enough statistical power to test the significance of modulation in single neurons.
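
The trial-count arithmetic in this example can be reproduced with a short sketch. The function name `trials_per_condition` and its parameters are illustrative only, and the inputs are the approximate numbers quoted in the text rather than measured session counts.

```python
def trials_per_condition(n_trials, task_fraction, n_figure_ground=2, n_orientations=2):
    """Approximate number of trials left per (figure/ground, orientation) cell.

    All arguments are illustrative; the defaults reflect the two figure-ground
    conditions and the two grating orientations (0 and 90 deg) described above.
    """
    task_trials = n_trials * task_fraction               # e.g. 120 * 0.375 = 45
    per_figure_ground = task_trials / n_figure_ground    # about 22 figure, 22 ground
    return per_figure_ground / n_orientations            # about 11 per orientation


orientation_cell = trials_per_condition(120, 0.375)
print(orientation_cell)  # 11.25
```

With 60 possible grating phases, these roughly 11 trials would almost never repeat the same phase, which is why a per-neuron test on identical stimuli is underpowered in this design.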

      Although we feel the study design would not allow analysis of individual neurons in response to the same orientation within their RF, we did perform an aggregated analysis on orientation selectivity. For this analysis, we included all the trials where the RF of the recorded neurons was on the background-half of the screen. We then computed the responses of each neuron to the trials where the background orientation was 0 and 90, respectively. This analysis showed that most neurons had no preference for either of the two tested orientations over the other. Only 4 out of 64 (6%) neurons showed a significant preference. We therefore believe that splitting the data by orientation preference would not be very informative.

      c) All statistical tests performed across neurons should account for dependencies due to simultaneous recordings (dependency on session) and due to recordings in the same animal (dependency on animal). This can be done in most cases by using linear mixed-effects models.

      We agree with the reviewer and have changed the analysis for figure 2I, 3B and 3E to an LME analysis (see also Table 1).

      d) There was no significant difference between model weights (Fig 3D), so the statement in line 210 (RF-edge neurons had higher weights) should be removed.

      In answer to the previous question, we changed the analysis for what is now Figure 3E to an LME. This shows that relative weights were significantly higher for the orientation compared to the phase task. We have adapted our conclusion accordingly (lines 214-218).

      e) Fig 4B compares FGM during correct and error trials. This comparison has to be performed with the same set of neurons in correct and error trials (not the case for orientation). Again, the most compelling and informative comparison would be on the level of single neurons: response difference between figure and ground (same visual features at figure position) during hits versus errors.

      As described above, we feel the study design does not allow analysis on the level of individual neurons. The analysis in 4B was actually performed using the same set of neurons; we have corrected the typo.

      f) There is no evidence that FGM for phase was different between hit and error trials as stated in line 234.

      Indeed, we had phrased this incorrectly. Since we recorded all tasks during single recording sessions, we have data for each task for most neurons. We were therefore able to pool the results from the different tasks, and the main d-prime difference between hit vs. error was significant. Post-hoc tests showed that this is mainly driven by the difference in the orientation task. We have edited the wording to be more accurate (lines 239-242).

      g) It is not clear why and how the mixed linear effects model was used pooling data across tasks (Fig 4C and Fig 5D). Different neurons were recorded for each task, so the sample points (neurons) are not affected by both task effects (orientation and phase). Each task should be analysed separately.

      Since we recorded all three task versions during single behavioral sessions, we have data for multiple tasks from each neuron. This is why the linear mixed effects model pools the data across the tasks. We have added a note in the main text for clarity (lines 238-242).

      h) Bonferroni correction in Fig 1E should correct multiple comparisons across time points, not across tasks (see Table 1).

      The multiple time points all belong to the same one-way repeated measures ANOVA, so there’s no need to correct the post-hoc analysis. We did run the ANOVA for three tasks, which is why we corrected the p-values of each task. We think that this is the best way, but can also present uncorrected p-values if needed.

      i) What is the reason to perform some tests one-tailed, others two-tailed?

      Following the reviewer comments, we changed some analyses to LME models. The remaining tests that require definition of the tails are all two-tailed.

      6) The results relating to "multisensory neurons" are ambiguous regarding their interpretation (if significant at all) and seem unrelated to the goal of the study. It is particularly likely that behaviours like licking or other movements cause the response differences between figure and ground.

      We agree with the reviewer that finding these neurons was not the aim of the study. We did not include enough types of tests in our paradigm to fully determine the properties of these neurons. Furthermore, we note that we have recorded too few of these neurons to draw strong conclusions. The data shown in new Figure 2—figure supplement 1H suggest that the responses of these neurons are not as strongly time-locked to the first lick as they are to the trial onset. We presented the behavior of these neurons in our manuscript, because, whatever their exact behavior, they are clearly distinct from the visually responsive cells that show a short latency response to the visual stimulus (Figure 2—figure supplement 1). We still feel that it is useful for the reader to know there are cells in the sSC that show such a distinct behavior, but we have moved the figure and the accompanying text to a figure supplement to avoid distraction from the main message of the manuscript.

      7) What depth were neurons recorded from (Fig 3 and 4)?

      The depths of the recorded visually responsive neurons are now shown in Figure 2—figure supplement 1E.

      Reviewer #3 (Public Review):

      The authors used optogenetic manipulations and electrophysiology recordings to study a causal role and the coding of superficial part of the mouse Superior Colliculus (SCs) during figure detection tasks.

      Authors previously reported that figure-ground perception relies on V1 activity (Kirchberger et al. 2021) and pointed out that silencing of V1 reduced the accuracy of the mice but still the performance was above the chance level. Therefore, visual information necessary in this task could be processed via alternative pathways. In this study, authors investigated specifically SCs and used a similar approach and analysis as in Kirchberger et al. 2021. Optogenetic silencing of the activity of visual neurons in SCs impaired the accuracy in all 3 versions of the figure detection task: contrast, orientation, and phase. Electrophysiology recordings revealed that SCs neurons are figure-ground modulated, but only by contrast- and orientation-based figures. They show SCs visually responsive neurons reflect behavioral performance in orientation-based figure task. The authors' conclusion is that SCs is involved in figure detection task.

      Overall, this study provides evidence that mouse SCs is involved in a figure detection task, and codes for task-related events. Authors heroically compared results between 3 different versions of the figure-based detection task. The logic of the study flows through the manuscript and authors prepared a detailed description of methods.

      Thank you for your positive comments.

      However, my main concern is with 1) the amount of data used to make the key arguments, and 2) the interpretation of results. The key findings of this study (figure-ground modulations in SCs) could be a result of the visual cortical feedback in SCs during the task, or pupil diameter changes. Unfortunately, the authors did not rule out these possibilities.

      Still, this study can be relevant to a general neuroscience audience, and results could be more convincing if the authors could clarify:

      1) Optogenetic inactivation

      a) The impact of laser stimulation on neural activity is not satisfactory (Supplementary Figure 1). The method seems to be insufficient to fully silence neurons. Electrophysiology control recordings of inactivation are performed in anesthetized mice, which is not a fair estimation of the effect in awake state. Therefore, it raises a major question: how effective is the inactivation during the task?

      We have conducted new control experiments for the impact of laser stimulation on neural activity, now in awake animals (see Figure 1—figure supplement 2). The reviewer was right to ask for these experiments. We had not expected much difference in the effect of silencing in the awake and anesthetized state. To minimize the animal discomfort, we had therefore done these control experiments in terminal experiments under anesthesia. However, this new set of experiments showed that the impact of laser stimulation was much stronger in awake mice than in anesthetized mice. We see an average spike rate reduction of 90% when the laser is on. Although it is not full silencing, we think this reduction is sufficient to draw some conclusions on the role of sSC in the behavioral tasks.

      b) Could authors provide more details if laser stimulation has an effect only on visual, or all sampled units? How many of units were recorded, and how many show positive and negative laser modulation?

      We defined visually responsive units as units that have an evoked rate of at least 2 spikes/s. In the new figure 1—figure supplement 2D from the new set of control experiments, we plotted, for every unit, the mean rate in laser ON and OFF trials - also including the non-visually responsive units. It is evident that the spiking activity of most units – including those that were not classified as ‘visual’ – is reduced in the laser ON compared to OFF trials. We observed 1 unit that showed strong positive laser modulation over the entire duration (figure 1—figure supplement 1D). Many units were activated by shorter laser pulses directly after laser onset (Figure 1—figure supplement 2A-B), but these also reduced in activity as the stimulation continued.
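
To make the unit-level bookkeeping concrete, here is a minimal sketch of how units could be tallied under the 2 spikes/s criterion and how a percentage rate reduction is computed. It is an illustration under stated assumptions, not the analysis code of the study: the dictionary keys, the function names, and the example rates are hypothetical, and the sign of the laser modulation is taken from a plain comparison of mean rates rather than from a significance test.

```python
VISUAL_THRESHOLD_SP_S = 2.0  # evoked-rate criterion quoted in the text


def summarize_units(units):
    """Count visually responsive units and the sign of their laser modulation.

    units: list of dicts with hypothetical keys (all in spikes/s):
      'evoked_rate' : visually evoked rate
      'rate_off'    : mean rate in laser OFF trials
      'rate_on'     : mean rate in laser ON trials
    """
    summary = {"visual": 0, "reduced": 0, "increased": 0}
    for unit in units:
        if unit["evoked_rate"] >= VISUAL_THRESHOLD_SP_S:
            summary["visual"] += 1
        # Sign of modulation from mean rates only (no statistical test).
        if unit["rate_on"] < unit["rate_off"]:
            summary["reduced"] += 1
        elif unit["rate_on"] > unit["rate_off"]:
            summary["increased"] += 1
    return summary


def percent_reduction(rate_off, rate_on):
    """Percentage drop in firing rate from laser OFF to laser ON."""
    return 100.0 * (rate_off - rate_on) / rate_off


# Made-up example units, for illustration only.
example_units = [
    {"evoked_rate": 8.0, "rate_off": 10.0, "rate_on": 1.0},
    {"evoked_rate": 0.5, "rate_off": 4.0, "rate_on": 2.0},
    {"evoked_rate": 3.0, "rate_off": 5.0, "rate_on": 9.0},
]
counts = summarize_units(example_units)
```

In this toy input, a unit going from 10 to 1 spikes/s corresponds to the 90% reduction scale mentioned for the awake recordings.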

      c) How local the inactivation effect is? Where was the silicon probe placed in relation to AAV expression and optical fiber position?

      The AAV was injected at 0.3 mm anterior and 0.5 mm lateral to the lambda cranial landmark. With this injection location we aimed to focus the expression at low/nasal receptive fields, in front of the mouse, because that is where the visual stimulation would take place. From there, the expression did spread laterally across sSC (see Figure 1C). The silicon probe was placed roughly in the same location as the viral injection. The optical fiber was positioned such that the tip would shine on the surface of the sSC at a slight angle, from a lateral distance of ~200 µm from the silicon probe. We have edited the methods section to make this more clear (lines 583-585). This procedure allowed us to record only relatively local effects of the inactivation. Although we did not record neural activity across the entirety of sSC, we did record from multiple electrode penetrations per mouse, each time slightly varying the recording location with up to ~300 µm and ~500 µm in the anterior and lateral directions, respectively. In these variations of recording location the optogenetic effect was always present (see new Figure 1—figure supplement 2G). Moreover, the suppressive effect of optogenetic stimulation of GAD2+ neurons was observed across the entire depth of the sSC (new Figure 1—figure supplement 2H).

      2) Number of sessions and units

      a) The inactivation effect on behavior (Figure 1E) during phase-task has a significantly larger effect at 66ms after stimulus onset. How can authors explain this? Could this result be biased by one animal/session, or low number of trials for this condition? There is no information about number of trials, or sessions from individual animals. Adding a single example of animal's performance, and sessions for individual mice could clarify results in Figure 1.

      The criterion for each mouse to be included in the analysis for one of the tasks was to have 100 trials where optogenetics were used (aggregated across the latencies). So at minimum, we would have about 100 trials/6 latencies = 17 trials per latency per mouse. For most mice though, the number of trials per latency was closer to about 40. We have added more information about this to the methods section (lines 567-570). Despite these inclusion criteria, the 66 ms effect is present for multiple mice (we have now added data visualizations for the individual mice in Figure 1—figure supplement 4). To address the reviewer’s concerns, we can only speculate as to why this happens. It might be random variation. A more speculative conclusion would be that perhaps this 66 ms laser onset is particularly disturbing to the visual processing and/or decision-making of the mouse. But we feel that we do not have enough evidence to conclude this.

      b) Figure 2H shows an example of neuron with an effect in the figure detection task based on phase difference, but Figure 2I/J (population response) shows there is no effect. Overall, the conclusion is that SCs neurons are not modulated by a phase-defined object. It seems that number of mice and hence units are smaller in phase-detection task comparing to two other tasks. How many of single units are modulated in each version of the task? How big is the FGM effect on single neuron response (could authors provide values in spikes/s)? One task is dropped from analysis which it is one of the main points of the paper: to compare responses across different versions of the figure detection task in SCs. But Figures 3-5 only focuses on two tasks, because there is not enough of data for figure-based contrast task.

      We have updated Figure 2H to show spikes/s of the example single neuron response. For the population responses, we explicitly normalized the individual neurons because they all have different baseline and peak firing rates. This normalization was important for the decoding, so we decided to print the data such that the data from Figures 2I and 3B went into the decoding as printed. If we look at the non-normalized values, the maximum amplitude of the average FGM effect is 22.3, 5.9 and 2.9 sp/s respectively for the three tasks (for neurons with RF on stimulus center).

      We have furthermore updated the FGM analysis such that the clustered statistic is now based on linear mixed effects statistics instead of t-test statistics. The results based on this new analysis are largely the same (see statistics table T1). We checked the significance of individual neurons in the time window where the grouped LME analysis was significant. For the phase task (n.s. in grouped analysis), we used the significant window from the orientation task. For this analysis, we want to stress that the number of trials for each version of the task for each individual neuron is quite limited, as we recorded all three of the tasks during each recording session. Individually, 7/23 neurons were significant for the contrast task, 1/49 were significant for the orientation task, and 0/32 were significant for the phase task (after Bonferroni-Holm correction).
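
      As an illustration only, the Bonferroni-Holm step-down procedure named above can be sketched as follows. This is not the analysis code used in the study; the function name `holm_bonferroni`, the significance level of 0.05 and the example p-values are assumptions made for the sketch.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction.

    Returns a list of booleans in the original order of `p_values`:
    True where the null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared to alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: every larger p-value is also retained
    return reject


# Hypothetical per-neuron p-values (not data from the manuscript).
example_p = [0.001, 0.04, 0.03, 0.5]
print(holm_bonferroni(example_p))
```

      Because the comparison threshold loosens with each rejected hypothesis, the procedure is never less powerful than a plain Bonferroni correction while still controlling the family-wise error rate.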

      To address the final part of this comment on dropping the contrast task: we indeed have recorded too few data points to draw conclusions on decoding (Fig. 3) and discriminability (Fig. 4) for the contrast task. However, we do not see the contrast detection task as the main point of the paper. As earlier work had already shown involvement of the sSC in visually-evoked behaviours based on objects that are clearly isolated from the background, the main focus in this work is to show involvement of sSC in complex object detection, where the visual contrast and luminance is the same across object and background.

      3) Figure-ground modulation in SCs

      a) How is neural activity correlated with pupil size, movement (eg. whisking, or face), or jaw movement (preparation to lick)? Can activity of FGM neurons in SCs be explained by these behavioral variables?

      We did not record whisking or other face and jaw movements. We did record the eye of the mice, so we have included a new Figure 2—figure supplement 2, which shows eye position and pupil dilation during the task. For the analysis in the originally submitted paper, trials with substantial eye movement (Z-score of eye speed > 2.5) between 0 and 450 ms had already been removed from the analysis. This way, we could exclude effects of eye movements (but not pupil dilation) on the visual responses in sSC. The additional figures and analyses have been done using the same inclusion criteria. Indeed, in the included trials mice did not move their eyes during the peak of the visual response (0-250 ms). The pupil dilation also did not change in this period.
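
      A minimal sketch of such a trial-exclusion rule is given below, for illustration only. The threshold (2.5) and the window (0 to 450 ms) come from the text above; how the Z-score was computed is not stated here, so pooling all samples of all trials to obtain the mean and standard deviation is an assumption, as are the function name `trials_to_keep` and the toy traces.

```python
from statistics import mean, pstdev


def trials_to_keep(speed_by_trial, times_ms, z_thresh=2.5, window=(0, 450)):
    """Flag trials whose eye speed stays below `z_thresh` inside `window`.

    speed_by_trial: list of per-trial eye-speed traces, each the same
                    length as `times_ms` (time relative to stimulus onset).
    Assumption: the Z-score uses the mean and SD pooled over all samples.
    """
    pooled = [s for trial in speed_by_trial for s in trial]
    mu, sigma = mean(pooled), pstdev(pooled)
    if sigma == 0:
        return [True] * len(speed_by_trial)  # no variation, nothing to exclude
    in_window = [i for i, t in enumerate(times_ms) if window[0] <= t <= window[1]]
    keep = []
    for trial in speed_by_trial:
        z_max = max((trial[i] - mu) / sigma for i in in_window)
        keep.append(z_max <= z_thresh)
    return keep


# Toy data: trial 1 has a fast eye movement at 150 ms (inside the window),
# trial 2 has one at 600 ms (outside the window).
times_ms = [-100, 0, 150, 300, 450, 600]
speeds = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 30.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 30.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
]
print(trials_to_keep(speeds, times_ms))
```

      In this toy example only the trial with the in-window movement is dropped; the movement at 600 ms does not affect inclusion because it falls after the analysis window.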

      b) Could authors describe in more detail how they measure a pupil position and diameter, by showing raw data, pupil size aligned to task events?

      We have added a new Figure 2—figure supplement 2 to show the pupil position and diameter aligned to task onset.

      c) How does pupil diameter change between tasks? Small pupil changes can affect responses of visual neurons, and this could be an explanation of FGM effect in SCs. Can authors rule out this possibility, by for example showing pupil size and changes in position at stimulus onset in different tasks?

      Our new Figure 2—figure supplement 2B shows that pupil dilation changes and differences in pupil dilation between figure/ground trials do occur, but only after ~300 ms, so after the peak of the visual response and after the FGM is present in sSC.

      d) Authors in discussion mentioned that the modulation of V1 could be transferred to SCs through the direct projection. Moreover, animals perform above chance in both inactivation experiments (V1 and SC), which could be also an effect of geniculate projections to HVAs (eg. Sincich et al. 2004). Could authors discuss different possibilities?

      The direct geniculate projection to HVAs is an interesting possibility that we had not yet considered. The dLGN in the mouse projects (apart from V1) mostly to the medial HVAs (Bienkowski et al. 2018). The lateral extrastriate regions receive only very sparse input from the dLGN. The medial HVAs, however, could be silenced without a drop in performance in a simple visual detection task (Goldback et al., 2020). Therefore, it does not seem likely that these geniculate-to-HVA projections would be important in the figure detection task.

      4) Interpretation of multisensory neurons is not clear. In Figure 5B, there is an example of neuron with two peaks of response. Authors speculate about the activity (pre-motor) but there is lack of clear measurement showing "multisensory" response of these neurons. Could these responses be related to the movement of the lick spout towards the mouth of the mouse (500 ms after the presentation of the stimulus)? Moreover, the number of "multisensory" units is very low (5 units, and 8 units).

      We have not done a definitive test to show what exactly these putative multisensory neurons respond to. Because their response occurred after the appearance of the lick spout and was time-locked to the trial start rather than to the licking response, we think it is likely that these neurons responded to the appearance of the spout. There might have been visual, auditory, vibrational or touch cues to which these neurons respond. We believe it is interesting for the reader to know that there is a class of neurons in the sSC that did not respond to the visual stimulus but was time-locked to the trial. This was the reason that we had included this figure in the manuscript. However, given the reviewer’s comments we have decided to move the figure and accompanying text to a figure supplement (Figure 2—figure supplement 1) in order not to distract from the main message of the manuscript.

    1. Author Response

      Joint Public Review:

      1) For the in vitro work, only one cell line is used in this article: HPAEpiC cells, an immortalized human cell line derived from alveolar epithelial type II cells. This limits the generalizability of the results obtained in this study, as SARS-CoV-2 is known to infect several kinds of cells.

      We appreciate the concerns of the reviewing editor. To test whether our findings were applicable to other cells, we performed similar experiments in human hepatoma cells (Huh-7) and renal tubular cells (HK-2), which are highly susceptible to SARS-CoV-2 (Yeung et al., 2021). We found that infection by SARS-CoV-2 upregulated the protein levels of ACE2, while colchicine treatment significantly inhibited the expression of ACE2 in HK-2 cells and Huh-7 cells (Revised Figure 3-figure supplement 2A-D). In addition, we found that colchicine treatment also reduced the viral load of SARS-CoV-2 in HK-2 cells and Huh-7 cells (Revised Figure 3-figure supplement 2E and F).

      2) From the results of two separate experiments (colchicine leading to reduced ACE2-expression in HPAEpiC cells & colchicine leading to reduced SARS-CoV-2 replication in HPAEpiC cells), the authors infer that inhibition of ACE2 expression by colchicine suppresses SARS-CoV-2 infection. However, their experiments do not explicitly prove this hypothesis and do not give weight to the importance of this reduced ACE2 expression in the colchicine antiviral effect they observed, as other mechanisms may play a (bigger) role in producing this effect.

      It has been well-established that the infection of SARS-CoV-2 and the Spike-RBD binding are dependent on ACE2 expression in different cell lines. ACE2 knockdown dramatically reduces SARS-CoV-2 infection in Caco2 cells (Shen et al., 2022), Spike-RBD binding, and SARS-CoV-2 replication in Calu-3 cells (Samelson et al., 2022). In contrast, overexpression of ACE2 greatly enhances SARS-CoV-2 virus infection in both A549 and H1299 cells (Chen et al., 2021). Meanwhile, two recent studies have demonstrated that androgen receptor positively regulates the expression of ACE2 at a transcriptional level (Qiao et al., 2021; Samuel et al., 2020). Importantly, inhibition of ACE2 expression by reducing the AR signaling attenuates SARS-CoV-2 infectivity (Qiao et al., 2021). A very recent study has demonstrated that ursodeoxycholic acid (UDCA), an inhibitor of the farnesoid X receptor (FXR), reduces ACE2 expression in human lung, intestinal, and liver organoids, thereby inhibiting SARS-CoV-2 infection (Brevini et al., 2022). These results clearly demonstrate that ACE2 expression levels determine the efficiency of SARS-CoV-2 infection to host cells.

      3) The authors refer to colchicine as a drug leading to mortality benefit when used as treatment for COVID-19 (line 101-105). However, whether colchicine is beneficial in COVID-19 is unclear. For instance, the randomized controlled trial by the RECOVERY Collaborative Group (Lancet Respir Med 2021), which included more than 11,000 patients, did not find benefit from colchicine in patients admitted to hospital with COVID-19. The authors refer to the review of Drosos et al to infer benefit of colchicine in COVID-19, however this review ignores the numerous trials contradicting this (as also stated in a letter from Finsterer in response to this review). The meta-analysis by Elshafei to which the authors refer was published before the largest RCT by the RECOVERY Group was published.

      We agree with the assessment made by the reviewing editor. Our goal is to discover a new mechanism of regulating ACE2 expression. Using colchicine, we have identified that SP1 is a crucial transcription factor that regulates ACE2 expression. In response to the reviewer’s comments, we added the sentences “This study has several limitations. Firstly, although SP1 was identified as a pivotal transcription factor in modulating ACE2 expression via the action of colchicine and MithA, neither of these compounds currently qualify as a candidate for the treatment of COVID-19.…Additionally, the efficacy of colchicine as a treatment for COVID-19 remains inconclusive. While some studies suggest benefits (Chiu et al., 2021; Drosos et al., 2022; Elshafei et al., 2021), others indicate negligible impact on mortality or disease progression (Group, 2021; Mikolajewska et al., 2021).” in the Discussion of the revised manuscript (Lines 329-342).

      4) The authors did not let a pathologist blinded to the infection/treatment state of the animals score the samples obtained in the animal experiments, which could have introduced bias in these results.

      We appreciate the concerns of the reviewing editor. In fact, histological observations were made by one of the authors, Dr. Li-Qiong Wang, who is a pathologist and was blinded to group identity. In response to the reviewer’s suggestion, we have now added the sentence “Tissue sections were evaluated by a trained pathologist (L.-Q. W.) blinded to group identity” to the Material and Methods section (Lines 516 and 517).

    1. Author Response

      We appreciate the insightful comments from the three reviewers on our manuscript. These comments help us improve the clarity of the manuscript. We will revise our manuscript comprehensively in a subsequent revision and enclose a detailed response to each of these comments. In this public reply, we focus on (a) clarifying the theoretical motivation and implications of the present study, and (b) discussing the implications of our LLM study. In addition, we provide a brief justification regarding some methodological concerns shared by the reviewers.

      1) Theoretical rationale and implication

      As we stated in the manuscript, the present study tested whether body size serves as a reference for locomotion and object manipulation, or alternatively, plays a pivotal role in shaping the representation of objects as suggested by Protagoras. Behind this question is the long-lasting debate regarding the representation versus direct perception of affordance.

      One outstanding theme shared by many embodied theories of cognition is the replacement hypothesis (e.g., Van Gelder, 1998). This hypothesis challenges the necessity of representation in the sense of computationalist cognitive theories (e.g., Fodor, 1975), which implies discretizing/categorizing inputs and then subjecting them to certain abstraction or symbolization so as to create discrete stand-ins for the input (e.g., representations/states). In this sense, our theoretical motivation can be restated explicitly as testing the ‘representationalization’ of affordance. That is, we tested whether object affordance would simply covary with its continuous constraints such as object size, in line with the representation-free view, or whether affordance would be ‘representationalized’, in line with the representation-based view, under the constraint of body size. Such representationalization would generate a categorization between the affordable (the objects) and those beyond affordance (the environment).

      Debates regarding the replacement hypothesis often turn into wrangling over the definition of representation (Shapiro, 2019). The present study tried to avoid this pitfall and instead examined where the embodied and computational theories make opposite predictions: discontinuity. Specifically, we considered two computationalist propositions about representation: (a) representations entail discretization of continuous input, and (b) the product of such discretization (representations) is supramodally accessible (that is, transcending sensorimotor processes). These claims are opposite to the prediction based on the idea of direct perception and other representation-free embodied theories.

      Thus, we tested whether, for continuous action-related physical features (such as object size relative to the agents), affordance perception introduces discontinuity and qualitative dissociation, i.e., to allow the sensorimotor input to be assigned into discrete states/kinds, as representations envisioned by computationalists. Alternatively, does the activity directly mirror the input, free from discretization/categorization/abstraction, as proposed by the replacement hypothesis that organisms do not need to re-present the world as they are always in contact with the world in a continuous way?

      All the experiment settings and analyses in the present study were organized around this motivation, following a progressive logic chain.

      First, we tested the discretization hypothesis, that is, whether affordance leads to discontinuity in perception. Here, the discontinuity in affordance perception would be in line with the representation-based view instead of the representation-free proposals. Second, to ensure that the observed discontinuity can be attributed to the discretization of sensorimotor input involved in human-object interaction rather than amodal sources, such as the discrete abstract concepts of the objects (independent from agent motor capability), we tested the embodied nature of this discontinuity through the body imagination experiment. If there is discontinuity in representing embodied information, this discontinuity should be locked to the motor capacity (constrained by the physical constitution such as body size) of the agent, rather than reflecting independent categorization of the absolute size of the objects. Finally, we probed the supramodality of this embodied discontinuity: whether this discontinuity is accessible beyond the sensorimotor domain. To do this, we leveraged the recent advance in AI and tested whether the discretization observed in affordance perception is supramodally accessible to disembodied agents which lack access to sensorimotor input but only have access to the linguistic materials built upon discretized representations, such as large language models (LLM).

      In this way, the experiments in the present study collectively contributed to the debate on the replacement theme of the embodiment of cognition, which serves as one of the three key themes of embodied theories of cognition (Shapiro, 2019). By addressing this theme, we hope to shed light on the nature of representation in, and resulting from, the vision-for-action processing. Our finding regarding discontinuity suggested that sensorimotor input undergoes discretization implied in the computationalism idea of representation. Further, not contradictory to the claims of the embodied theories, these representations do shape processes out of the sensorimotor domain, but after discretization.

      2) Implication in the development of LLM-based agents

      The finding that affordance was representationalized may have profound implications for the development of LLM-based agents. Traditional robots and non-LLM-based agents require implementation-level action instruction, acting as a tool for human beings to achieve desired results. In contrast, LLM-based agents (for a review, see Wang et al., 2023), such as Auto-GPT and BabyAGI, are able to autonomously perform tasks and achieve desired results based on LLMs’ planning ability. In this sense, LLM-based agents show a preliminary ability to interact on their own with the world. Generative agents, for instance, the agents in Smallville (Park et al., 2023), are a particularly applauded recent advance among LLM-based agents and show even greater potential in this respect. Drawing on generative models to simulate human behaviors, these agents can formulate their own memories and goals, generate new environment-dependent behaviors, and interact convincingly with humans, other agents and their environments in the process. This brings new possibilities for resolving the long-standing challenge in artificial general intelligence (AGI) development, that is, to bestow AI with human-level ability in agent-environment interactions. However, it is worth noting that current research on LLM-based agents is still largely confined to virtual environments. This leaves an open question as to how to equip these agents with the ability of agent-environment physical interaction. In particular, according to embodied theories of cognition, sensorimotor interactions with the environment provide unique knowledge upon which various cognitive domains are built. From this point of view, building agents with human-level ability in agent-environment physical interactions might provide an irreplaceable missing piece for AGI.

      By probing the representation of action possibilities (affordances) provided by the environment to the agent (or the absence of them), the present study provided a clue for achieving such ability by illustrating the representationalization of affordance and the supramodality of these representations. For instance, the finding of supramodality may alleviate doubts about whether LLM-based agents can achieve a physical interaction ability comparable to that of biological agents. Specifically, LLM-based agents can leverage the affordance representation distilled into language to interact with the physical world. Indeed, by clarifying and aligning such representation with the physical constitution of LLM-based agents, and even by explicitly constructing an agent-specific object space, we may facilitate the sensorimotor interactions of LLM-based agents so as to achieve animal-level interaction ability with the world. This in turn may provide new instances for embodied theories.

      3) Clarification on incomplete evidence

      In response to the methodological and validity concerns of the reviewers, we will provide a point-by-point detailed response to reviewers enclosed with the revised manuscript. Here, we reply to the most prominent concerns.

      Reviewers were concerned about the statistical power of both the body imagination experiment and the fMRI experiment. Regarding the number of participants in the imagination study, we would like to clarify that we did not remove 80% of the participants. Actually, a separate sample of participants was recruited in the body imagination experiment. The sample size for the body imagination experiment (100 participants) was indeed smaller than that recruited for the first experiment (528 participants). This is because the first experiment was set for exploratory purposes, and was designed to be over-powered.

      Admittedly, the fMRI experiment recruited a small sample (12 participants), which might lead to low power in estimating the affordance effect. In revision, we will acknowledge this issue explicitly. Having said this, note that the null hypothesis of this fMRI study is the lack of two-way interaction between object size and object-action congruency, which was rejected by the significant interaction. That is, the interpretation of the present study did not rely on accepting any null effect. In addition, the fMRI experiment provided convergent evidence for the affordance discontinuity at the neural level. We showed that behind the behavioral discontinuity in action judgement, neural activity was qualitatively different between objects within the affordance boundary and those beyond, which reinforces our statement that objects were discretized along the continuous size axis into two broad categories.

      Reviewers also commented that more objects and actions should be included. We agree, and in revision, we will advocate future studies with more objects and more actions to comprehensively portray discontinuity. The present set of objects was designed to cover a relatively large range of object sizes, ranging from 14 cm to 7,618 cm, to cover most size categories studied in Konkle and Oliva's (2011) work. In addition, the actions were selected to cover daily interactions between humans and objects or environments, from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing), with reference to the Kinetics human action video dataset (Kay et al., 2017). Thus, this set of selected objects and actions is sufficient to test the discontinuity.

      References

      Fodor, J. A. (1975). The Language of Thought (Vol. 5). Harvard University Press.

      Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442.

      Shapiro, L. (2019). Embodied Cognition. Routledge.

      Van Gelder, T. (1998). The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21(5), 615-628.

      Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., ... & Wen, J. R. (2023). A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432.

    1. Author Response

      eLife assessment

      This study presents potentially valuable results on glutamine-rich motifs in relation to protein expression and alternative genetic codes. The author's interpretation of the results is so far only supported by incomplete evidence, due to a lack of acknowledgment of alternative explanations, missing controls and statistical analysis and writing unclear to non experts in the field. These shortcomings could be at least partially overcome by additional experiments, thorough rewriting, or both.

      We thank both the Reviewing Editor and Senior Editor for handling this manuscript and will submit our revised manuscript after the reviewed preprint is published by eLife.  

      Reviewer #1 (Public Review):

      Summary

      This work contains 3 sections. The first section describes how protein domains with SQ motifs can increase the abundance of a lacZ reporter in yeast. The authors call this phenomenon autonomous protein expression-enhancing activity, and this finding is well supported. The authors show evidence that this increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance, and that this phenomenon is not affected by mutants in translational quality control. It was not completely clear whether the increased protein abundance is due to increased translation or to increased protein stability.

      In section 2, the authors performed mutagenesis of three N-terminal domains to study how protein sequence changes protein stability and enzymatic activity of the fusions. These data are very interesting, but this section needs more interpretation. It is not clear if the effect is due to the number of S/T/Q/N amino acids or due to the number of phosphorylation sites.

      In section 3, the authors undertake an extensive computational analysis of amino acid runs in 27 species. Many aspects of this section are fascinating to an expert reader. They identify regions with poly-X tracks. These data were not normalized correctly: I think that a null expectation for how often poly-X tracks occur should be built for each species based on the underlying prevalence of amino acids in that species. As a result, I believe that the claim is not well supported by the data.

      Strengths

      This work is about an interesting topic and contains stimulating bioinformatics analysis. The first two sections, where the authors investigate how S/T/Q/N abundance modulates protein expression level, are well supported by the data. The bioinformatics analysis of Q abundance in ciliate proteomes is fascinating. There are some ciliates that have repurposed stop codons to code for Q. The authors find that in these proteomes, Q-runs are greatly expanded. They offer interesting speculations on how this expansion might impact protein function.

      Weakness

      At this time, the manuscript is disorganized and difficult to read. An expert in the field, who will not be distracted by the disorganization, will find some very interesting results included. In particular, the order of the introduction does not match the rest of the paper.

      In the first and second sections, where the authors investigate how S/T/Q/N abundance modulates protein expression levels, it is unclear if the effect is due to the number of phosphorylation sites or the number of S/T/Q/N residues.

      There are three reasons why the number of phosphorylation sites in the Q-rich motifs is not relevant to their autonomous protein expression-enhancing (PEE) activities:

      First, we have reported previously that phosphorylation-defective Rad51-NTD (Rad51-3SA) and wild-type Rad51-NTD exhibit similar autonomous PEE activity. Mec1/Tel1-dependent phosphorylation of Rad51-NTD antagonizes the proteasomal degradation pathway, increasing the half-life of Rad51 from ∼30 min to ≥180 min (Ref 27; Woo, T. T. et al. 2020).

      1. T. T. Woo, C. N. Chuang, M. Higashide, A. Shinohara, T. F. Wang, Dual roles of yeast Rad51 N-terminal domain in repairing DNA double-strand breaks. Nucleic Acids Res 48, 8474-8489 (2020).

      Second, in our preprint manuscript, we have also shown that phosphorylation-defective Rad53-SCD1 (Rad53-SCD1-5STA) exhibits autonomous PEE activity similar to that of wild-type Rad53-SCD1 (Figure 2D, Figure 4A and Figure 4C).

      Third, as revealed by the results of our preprint manuscript (Figure 4), it is the percentages, and not the numbers, of S/T/Q/N residues that are correlated with the PEE activities of Q-rich motifs.

      The authors also do not discuss if the N-end rule for protein stability applies to the lacZ reporter or the fusion proteins.

      The autonomous PEE function of S/T/Q-rich NTDs is unlikely to be relevant to the N-end rule. The N-end rule links the in vivo half-life of a protein to the identity of its N-terminal residues. In S. cerevisiae, the N-end rule operates as part of the ubiquitin system and comprises two pathways. First, the Arg/N-end rule pathway, involving a single N-terminal amidohydrolase Nta1, mediates deamidation of N-terminal asparagine (N) and glutamine (Q) into aspartate (D) and glutamate (E), which in turn are arginylated by a single Ate1 R-transferase, generating the Arg/N degron. N-terminal R and other primary degrons are recognized by a single N-recognin Ubr1 in concert with ubiquitin-conjugating Ubc2/Rad6. Ubr1 can also recognize several other N-terminal residues, including lysine (K), histidine (H), phenylalanine (F), tryptophan (W), leucine (L) and isoleucine (I) (Bachmair, A. et al. 1986; Tasaki, T. et al. 2012; Varshavsky, A. 2019). Second, the Ac/N-end rule pathway targets proteins containing N-terminally acetylated (Ac) residues. Prior to acetylation, the first amino acid methionine (M) is catalytically removed by Met-aminopeptidases (MetAPs), unless the residue at position 2 is non-permissive (too large) for MetAPs. If a retained N-terminal M or otherwise a valine (V), cysteine (C), alanine (A), serine (S) or threonine (T) residue is followed by residues that allow N-terminal acetylation, the proteins containing these AcN degrons are targeted for ubiquitylation and proteasome-mediated degradation by the Doa10 E3 ligase (Hwang, C. S. et al. 2010).

      A. Bachmair, D. Finley, A. Varshavsky, In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186 (1986).

      T. Tasaki, S. M. Sriram, K. S. Park, Y. T. Kwon, The N-end rule pathway. Annu Rev Biochem 81, 261-289 (2012).

      A. Varshavsky, N-degron and C-degron pathways of protein degradation. Proc Natl Acad Sci 116, 358-366 (2019).

      C. S. Hwang, A. Shemorry, D. Auerbach, A. Varshavsky, The N-end rule pathway is mediated by a complex of the RING-type Ubr1 and HECT-type Ufd4 ubiquitin ligases. Nat Cell Biol 12, 1177-1185 (2010).

      The PEE activities of these S/T/Q-rich domains are unlikely to arise from counteracting the N-end rule for two reasons. First, the first two amino acid residues of Rad51-NTD, Hop1-SCD, Rad53-SCD1, Sup35-PND, Rad51-ΔN, and LacZ-NVH are MS, ME, ME, MS, ME, and MI, respectively, where M is methionine, S is serine, E is glutamic acid and I is isoleucine. Second, Sml1-NTD behaves similarly to these N-terminal fusion tags, despite its methionine and glutamine (MQ) amino acid signature at the N-terminus.

      The most interesting part of the paper is an exploration of S/T/Q/N-rich regions and other repetitive AA runs in 27 proteomes, particularly ciliates. However, this analysis is missing a critical control that makes it nearly impossible to evaluate the importance of the findings. The authors find the abundance of different amino acid runs in various proteomes. They also report the background abundance of each amino acid. They do not use this background abundance to normalize the runs of amino acids to create a null expectation from each proteome. For example, it has been clear for some time (Ruff, 2017; Ruff et al., 2016) that Drosophila contains a very high background of Q's in the proteome and it is necessary to control for this background abundance when finding runs of Q's.

      We apologize for not explaining sufficiently well the topic eliciting this reviewer’s concern in our preprint manuscript. In the second paragraph of page 14, we cite six references to highlight that SCDs are overrepresented in yeast and human proteins involved in several biological processes (32, 74), and that polyX prevalence differs among species (43, 75-77).

      1. Cheung HC, San Lucas FA, Hicks S, Chang K, Bertuch AA, Ribes-Zamora A. An S/T-Q cluster domain census unveils new putative targets under Tel1/Mec1 control. BMC Genomics. 2012;13:664.

      2. Mier P, Elena-Real C, Urbanek A, Bernado P, Andrade-Navarro MA. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J. 2020;18:306-13.

      3. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      4. Kuspa A, Loomis WF. The genome of Dictyostelium discoideum. Methods Mol Biol. 2006;346:15-30.

      5. Davies HM, Nofal SD, McLaughlin EJ, Osborne AR. Repetitive sequences in malaria parasite proteins. FEMS Microbiol Rev. 2017;41(6):923-40.

      6. Mier P, Alanis-Lobato G, Andrade-Navarro MA. Context characterization of amino acid homorepeats using evolution, position, and order. Proteins. 2017;85(4):709-19.

      We will cite the two references by Kiersten M. Ruff in our revised manuscript.

      K. M. Ruff and R. V. Pappu, (2015) Multiscale simulation provides mechanistic insights into the effects of sequence contexts of early-stage polyglutamine-mediated aggregation. Biophysical Journal 108, 495a.

      K. M. Ruff, J. B. Warner, A. Posey and P. S. Tan (2017) Polyglutamine length dependent structural properties and phase behavior of huntingtin exon1. Biophysical Journal 112, 511a.

      The authors could easily address this problem with the data and analysis they have already collected. However, at this time, without this normalization, I am hesitant to trust the lists of proteins with long runs of amino acid and the ensuing GO enrichment analysis.

      Ruff KM. 2017. Washington University in St.

      Ruff KM, Holehouse AS, Richardson MGO, Pappu RV. 2016. Proteomic and Biophysical Analysis of Polar Tracts. Biophys J 110:556a.

      We thank Reviewer #1 for this helpful suggestion and now address this issue by means of a different approach described below.

      Based on a previous study (43; Pablo Mier et al. 2020), we applied seven different thresholds to seek both short and long, as well as pure and impure, polyX strings in 20 different representative near-complete proteomes, including 4X (4/4), 5X (4/5-5/5), 6X (4/6-6/6), 7X (4/7-7/7), 8-10X (≥50%X), 11-20X (≥50%X) and ≥21X (≥50%X).
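      As a rough illustration of how such thresholds can be applied, the sketch below scans a sequence with a fixed-length window and reports windows that contain enough X residues. It is not the authors' pipeline: the function names are hypothetical, and the requirement that a window begin and end with X is an assumption made here to avoid counting flanking residues.

```python
from typing import List, Tuple


def min_x_required(window: int) -> int:
    """Smallest number of X residues a window of this length must contain.

    Windows of 4-7 residues need at least 4 X (4/4, 4/5, 4/6, 4/7);
    longer windows need at least 50% X.
    """
    if window < 4:
        raise ValueError("windows shorter than 4 residues are not considered")
    if window <= 7:
        return 4
    return (window + 1) // 2  # ceil(window / 2), i.e. at least 50%


def find_polyx_windows(seq: str, residue: str, window: int) -> List[Tuple[int, int]]:
    """Return (start, end) for every window meeting the threshold.

    Assumption (not from the source): a hit must start and end with
    `residue`. Overlapping hits are reported separately, not merged.
    """
    need = min_x_required(window)
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        if chunk[0] == residue and chunk[-1] == residue and chunk.count(residue) >= need:
            hits.append((start, start + window))
    return hits
```

      For example, `find_polyx_windows("QQAQQ", "Q", 5)` reports the single impure 4/5 window spanning the whole sequence.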

      To normalize the runs of amino acids and create a null expectation from each proteome, we determined the ratios of the overall number of X residues for each of the seven polyX motifs relative to those in the entire proteome of each species, respectively. The results of four different polyX motifs are shown below, i.e., polyQ (Author response image 1), polyN (Author response image 2), polyS (Author response image 3) and polyT (Author response image 4).
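      The ratio described above can be sketched as follows. This is a simplified stand-in that only counts pure runs of a minimum length; the function name, the `min_run` parameter and the toy sequences in the usage note are assumptions for illustration, not the authors' code or data.

```python
import re
from typing import Dict


def x_fraction_in_runs(proteome: Dict[str, str], residue: str, min_run: int = 4) -> float:
    """Fraction of all `residue` occurrences in the proteome that lie inside
    pure runs of at least `min_run` consecutive copies of that residue."""
    run = re.compile(re.escape(residue) + "{%d,}" % min_run)
    total = 0    # every occurrence of the residue in the proteome
    in_runs = 0  # occurrences that fall inside a qualifying run
    for seq in proteome.values():
        total += seq.count(residue)
        in_runs += sum(len(m.group(0)) for m in run.finditer(seq))
    return in_runs / total if total else 0.0
```

      With the toy input `{"p1": "MQQQQAQ", "p2": "QAQ"}`, four of the seven Q residues sit in a run of four or more, so the function returns 4/7. Dividing by the proteome-wide count is what makes species with different background Q usage comparable.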

      Author response image 1.

      Q contents in 7 different types of polyQ motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 2.

      N contents in 7 different types of polyN motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 3.

      S contents in 7 different types of polyS motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 4.

      T contents in 7 different types of polyT motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      The results summarized in these four new figures support the conclusion that polyX prevalence differs among species and that the overall X contents of polyX motifs often, but not always, correlate with the X usage frequency in entire proteomes (43; Pablo Mier et al. 2020).

      Most importantly, our results reveal that, compared to Stentor coeruleus or several non-ciliate eukaryotic organisms (e.g., Plasmodium falciparum, Caenorhabditis elegans, Danio rerio, Mus musculus and Homo sapiens), the five ciliates with reassigned TAAQ and TAGQ codons not only have higher Q usage frequencies, but also more polyQ motifs in their proteomes (Figure 1). In contrast, polyQ motifs prevail in Candida albicans, Candida tropicalis, Dictyostelium discoideum, Chlamydomonas reinhardtii, Drosophila melanogaster and Aedes aegypti, though the Q usage frequencies in their entire proteomes are not significantly higher than those of other eukaryotes (Figure 1). Due to their higher N usage frequencies, Dictyostelium discoideum, Plasmodium falciparum and Pseudocohnilembus persalinus have more polyN motifs than the other 23 eukaryotes we examined here (Figure 2). Generally speaking, all 26 eukaryotes we assessed have similar S usage frequencies and percentages of S contents in polyS motifs (Figure 3). Among these 26 eukaryotes, Dictyostelium discoideum possesses many more polyT motifs, though its T usage frequency is similar to that of the other 25 eukaryotes (Figure 4).

      In conclusion, these new normalized results confirm that the reassignment of stop codons to Q indeed results in both higher Q usage frequencies and more polyQ motifs in ciliates.  

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to understand the connection between protein sequence and function in disordered regions enriched in polar amino acids (specifically Q, N, S and T). While the authors suggest that specific motifs facilitate protein-enhancing activities, their findings are correlative, and the evidence is incomplete. Similarly, the authors propose that the re-assignment of stop codons to glutamine-encoding codons underlies the greater use of glutamine in a subset of ciliates, but again, the conclusions here are, at best, correlative. The authors perform extensive bioinformatic analysis, with detailed (albeit somewhat ad hoc) discussion on a number of proteins. Overall, the results presented here are interesting, but are unable to exclude competing hypotheses.

      Strengths:

      Following up on previous work, the authors wish to uncover a mechanism associated with poly-Q and SCD motifs explaining proposed protein expression-enhancing activities. They note that these motifs often occur in IDRs and hypothesize that structural plasticity could be capitalized upon as a mechanism of diversification in evolution. To investigate this further, they employ bioinformatics to investigate the sequence features of proteomes of 27 eukaryotes. They deepen their sequence space exploration, uncovering sub-phylum-specific features associated with species in which a stop-codon substitution has occurred. The authors propose this stop-codon substitution underlies an expansion of poly-Q repeats and increased glutamine distribution.

      Weaknesses:

      The preprint provides extensive, detailed, and entirely unnecessary background information throughout, hampering reading and making it difficult to understand the ideas being proposed. The introduction provides a large amount of detailed background that appears entirely irrelevant for the paper. In many places, detailed discussions of specific proteins that are likely of interest to the authors occur, yet without context, this does not enhance the paper for the reader.

      The paper uses many unnecessary, new, or redefined acronyms which makes reading difficult. As examples:

      (1) Prion forming domains (PFDs). Do the authors mean prion-like domains (PLDs), an established term with an empirical definition from the PLAAC algorithm? If yes, they should say this. If not, they must define what a prion-forming domain is formally.

      The N-terminal domain (1-123 amino acids) of S. cerevisiae Sup35 was already referred to as a “prion forming domain (PFD)” in 2006 (Tuite, M. F. 2006). Since then, PFD has also been employed as an acronym in other yeast prion papers (Cox, B.S. et al. 2007; Toombs, T. et al. 2011).

      M. F., Tuite, Yeast prions and their prion forming domain. Cell 27, 397-407 (2005).

      B. S. Cox, L. Byrne, M. F., Tuite, Protein Stability. Prion 1, 170-178 (2007).

      J. A. Toombs, N. M. Liss, K. R. Cobble, Z. Ben-Musa, E. D. Ross, [PSI+] maintenance is dependent on the composition, not primary sequence, of the oligopeptide repeat domain. PLoS One 6, e21953 (2011).

      (2) SCD is already an acronym in the IDP field (meaning sequence charge decoration) - the authors should avoid this as their chosen acronym for Serine(S) / threonine (T)-glutamine (Q) cluster domains. Moreover, do we really need another acronym here (we do not).

      SCD was first used in 2005 as an acronym for the Serine (S)/threonine (T)-glutamine (Q) cluster domain in the DNA damage checkpoint field (Traven, A. and Heierhorst, J. 2005). Almost a decade later, SCD became an acronym for “sequence charge decoration” (Sawle, L. et al. 2015; Firman, T. et al. 2018).

      A. Traven and J, Heierhorst, SQ/TQ cluster domains: concentrated ATM/ATR kinase phosphorylation site regions in DNA-damage-response proteins. Bioessays. 27, 397-407 (2005).

      L. Sawle and K, Ghosh, A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J. Chem Phys. 143, 085101(2015).

      T. Firman and Ghosh, K. Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins. J. Chem Phys. 148, 123305 (2018).

      (3) Protein expression-enhancing (PEE) - just say expression-enhancing, there is no need for an acronym here.

      Thank you. Since we have shown that addition of Q-rich motifs to LacZ affects protein expression rather than transcription, we think it is better to use the “PEE” acronym.

      The results suggest autonomous protein expression-enhancing activities of regions of multiple proteins containing Q-rich and SCD motifs. Their definition of expression-enhancing activities is vague and the evidence they provide to support the claim is weak. While their previous work may support their claim with more evidence, it should be explained in more detail. The assay they choose is a fusion reporter measuring beta-galactosidase activity and tracking expression levels. Given the presented data they have shown that they can drive the expression of their reporters and that beta gal remains active, in addition to the increase in expression of fusion reporter during the stress response. They have not detailed what their control and mock treatment is, which makes complete understanding of their experimental approach difficult. Furthermore, their nuclear localization signal on the tag could be influencing the degradation kinetics or sequestering the reporter, leading to its accumulation and the appearance of enhanced expression. Their evidence refuting ubiquitin-mediated degradation does not have a convincing control.

      Based on the experimental results, the authors then go on to perform bioinformatic analysis of SCD proteins and polyX proteins. Unfortunately, there is no clear hypothesis for what is being tested; there is a vague sense of investigating polyX/SCD regions, but I did not find the connection between the first and second sections compelling (especially given polar-rich regions have been shown to engage in many different functions). As such, this bioinformatic analysis largely presents as many lists of percentages without any meaningful interpretation. The bioinformatics analysis lacks any kind of rigorous statistical tests, making it difficult to evaluate the conclusions drawn. The methods section is severely lacking. Specifically, many of the methods require the reader to read many other papers. While referencing prior work is, of course, important, the authors should ensure the methods in this paper provide the details needed to allow a reader to evaluate the work being presented. As it stands, this is not the case.

      Thank you. As described in detail below, we have now performed rigorous statistical testing using the GOfuncR package.

      Overall, my major concern with this work is that the authors make two central claims in this paper (as per the Discussion). The authors claim that Q-rich motifs enhance protein expression. The implication here is that Q-rich motif IDRs are special, but this is not tested. As such, they cannot exclude the competing hypothesis ("N-terminal disordered regions enhance expression").

      In fact, “N-terminal disordered regions enhance expression” exactly summarizes our hypothesis.

      On pages 12-13 and Figure 4 of our preprint manuscript, we explained our hypothesis in the paragraph entitled “The relationship between PEE function, amino acid contents, and structural flexibility”.

      The authors also do not explore the possibility that this effect is in part/entirely driven by mRNA-level effects (see Verma Nat Comms 2019).

      As pointed out by the first reviewer, we show evidence that the increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance (Figure 2), and that this phenomenon is not affected by translational quality control mutants (Figure 3).

      As such, while these observations are interesting, they feel preliminary and, in my opinion, cannot be used to draw hard conclusions on how N-terminal IDR sequence features influence protein expression. This does not mean the authors are necessarily wrong, but from the data presented here, I do not believe strong conclusions can be drawn. That re-assignment of stop codons to Q increases proteome-wide Q usage. I was unable to understand what result led the authors to this conclusion.

      My reading of the results is that a subset of ciliates has re-assigned UAA and UAG from the stop codon to Q. Those ciliates have more polyQ-containing proteins. However, they also have more polyN-containing proteins and proteins enriched in S/T-Q clusters. Surely if this were a stop-codon-dependent effect, we'd ONLY see an enhancement in Q-richness, not a corresponding enhancement in all polar-rich IDR frequencies? It seems the better working hypothesis is that free-floating ciliate proteomes are enriched in polar amino acids compared to sessile ciliates.

      Thank you. These comments are not supported by the results in Figure 1.

      Regardless, the absence of any kind of statistical analysis makes it hard to draw strong conclusions here.

      We apologize for not explaining more clearly the results of Tables 5-7 in our preprint manuscript.

      To address the concerns about our GO enrichment analysis by both reviewers, we have now performed rigorous statistical testing for SCD and polyQ protein overrepresentation using the GOfuncR package (https://bioconductor.org/packages/release/bioc/html/GOfuncR.html). GOfuncR is an R package program that conducts standard candidate vs. background enrichment analysis by means of the hypergeometric test. We then adjusted the raw p-values according to the Family-wise error rate (FWER). The same method had been applied to GO enrichment analysis of human genomes (Huttenhower, C., et al. 2009).
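      For readers unfamiliar with candidate vs. background enrichment, the test can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GOfuncR itself: the function names are hypothetical, and a Bonferroni correction is used here only as a simple FWER-controlling stand-in, with no claim that it reproduces GOfuncR's own FWER procedure.

```python
from math import comb
from typing import Dict


def hypergeom_enrichment_p(population: int, annotated: int,
                           candidates: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    population: proteins in the background set
    annotated:  background proteins carrying the GO term
    candidates: proteins in the candidate set (e.g. polyQ-containing)
    hits:       candidate proteins carrying the GO term
    """
    upper = min(candidates, annotated)
    total = comb(population, candidates)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, candidates - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


def bonferroni(p_values: Dict[str, float]) -> Dict[str, float]:
    """Bonferroni adjustment: multiply each raw p-value by the number of
    tests and cap at 1. This controls the family-wise error rate."""
    m = len(p_values)
    return {term: min(1.0, p * m) for term, p in p_values.items()}
```

      A GO term is called overrepresented when its adjusted p-value falls below the chosen threshold; the background set is what supplies the null expectation for each proteome.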

      Huttenhower, C., Haley, E. M., Hibbs, M. A., Dumeaux, V., Barrett, D. R., Coller, H. A., and Troyanskaya, O. G. Exploring the human genome with functional maps. Genome Research 19, 1093-1106 (2009).

      The results presented in Author response image 5 and Author response image 6 support our hypothesis that Q-rich motifs prevail in proteins involved in specialized biological processes, including Saccharomyces cerevisiae RNA-mediated transposition, Candida albicans filamentous growth, peptidyl-glutamic acid modification in ciliates with reassigned stop codons (TAAQ and TAGQ), Tetrahymena thermophila xylan catabolism, Dictyostelium discoideum sexual reproduction, Plasmodium falciparum infection, as well as the nervous systems of Drosophila melanogaster, Mus musculus, and Homo sapiens (74). In contrast, peptidyl-glutamic acid modification and microtubule-based movement are not overrepresented with Q-rich proteins in Stentor coeruleus, a ciliate with standard stop codons.

      1. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      Author response image 5.

      Selection of biological processes with overrepresented SCD-containing proteins in different eukaryotes. The percentages and number of SCD-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the Family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 6.

      Selection of biological processes with overrepresented polyQ-containing proteins in different eukaryotes. The percentages and numbers of polyQ-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the Family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study provides valuable insights into allosteric regulation of BTK, a non-receptor protein kinase, challenging previous models. Using a variety of biophysical and functional techniques, the paper presents evidence that the N-terminal PH-TH domain of BTK exists in a conformational ensemble surrounding a compact SH3-SH2-kinase core, that the BTK kinase domain can form partially active dimers, and that the PH domain can form a novel inhibitory interface after SH2/SH3 disengagement. Overall the presented evidence is solid, but the EM results may be over-interpreted and the work would benefit from additional functional validation.

      We made every effort in our descriptions of the cryoEM data presented for full-length BTK not to overinterpret the results. In essence this is not an ideal EM target, but given the failure by us and others to capture the full-length multi-domain protein crystallographically, we decided that the cryoEM data, albeit low resolution, are useful to the field.

      Reviewer #1 (Public Review):

      The manuscript by Lin et al describes a wide biophysical survey of the molecular mechanisms underlying full-length BTK regulation. This is a continuation of this lab's excellent work on deciphering the myriad levels of regulation of BTKs downstream of their activation by plasma membrane localised receptors.

      The manuscript uses a synergy of cryo EM, HDX-MS and mutational analysis to delve into the role of how the accessory domains modify the activity of the kinase domain. The manuscript essentially has three main novel insights into BTK regulation.

      1) Cryo EM and SAXS show that the PHTH region is dynamic compared to the conserved Src module.

      2) A 2nd generation tethered PH-kinase construct crystal of BTK reveals a unique orientation of the PH domain relative to the kinase domain, that is different from previous structures.

      3) A new structure of the kinase domain dimer shows how trans-phosphorylation can be achieved.

      Excitingly these structural works allow for the generation of a model of how BTK can act as a strict coincidence sensor for both activated BCR complex as well as PIP3 before it obtains full activity. To my eye the most exciting result of this work is describing how the PH domain can inhibit activity once the SH3/SH2 domain is disengaged, allowing for an additional level of regulatory control.

      I have very few experimental concerns as the methods and figures are well-described and clear. As the authors are potentially saying that the previously solved PH domain-kinase interface is artefactual, additional evidence strengthening their model would be helpful to resolve any possible controversies.

      We do not argue that the previously solved PH domain-kinase interface is artefactual. Instead we point out that the PH/kinase interface identified in the prior structure is incompatible with the contacts between the SH3 and kinase domains in autoinhibited BTK. This then leads us to the suggestion that a PH/kinase inhibitory interaction may instead occur upon dissociation of the SH3-SH2 cassette from the kinase domain. Our data support that model. Moreover, our data suggest the PHTH domain is dynamic, likely not settling into one particular autoinhibitory state. Thus, it is possible the previously solved PH/kinase structure exists within the conformational ensemble of a range of PH/kinase domain interactions. In an effort to clarify our thinking we added two sentences to the Discussion (pg. 19).

      Reviewer #2 (Public Review):

      In this study, multiple biophysical techniques were employed to investigate the activation mechanism of BTK, a multi-domain non-receptor protein kinase. Previous studies have elucidated the inhibitory effects of the SH3 and SH2 domains on the kinase and the potential activation mechanism involving the membrane-bound PIP3 inducing transient dimerization of the PH-TH domain, which binds to lipids.

      The primary focus of the present study was on three new constructs: a full-length BTK construct, a construct where the PH-TH domain is connected to the kinase domain, and a construct featuring a kinase domain with a phosphomimetic at the autophosphorylation site Y551. The authors aimed to provide new insights into the autoinhibition and allosteric control of BTK.

      The study reports that SAXS analysis of the full-length BTK protein construct, along with cryoEM visualization of the PH-TH domain, supports a model in which the N-terminal PH-TH domain exists in a conformational ensemble surrounding a compact/autoinhibited SH3-SH2-kinase core. This finding is interesting because it contradicts previous models proposing that each globular domain is tightly packed within the core.

      Furthermore, the authors present a model for an inhibitory interaction between the N-lobe of the kinase and the PH-TH domain. This model is based on a study using a tethered complex with a longer tether than a previously reported construct where the PH-TH domain was tightly attached to the kinase domain (ref 5). The authors argue that the new structure is relevant. However, this assertion requires further explanation and discussion, particularly considering that the functional assays used to assess the impact of mutating residues within the PH-TH/kinase domain contradict the results of the previous study (ref 5).

      In our hands BTK activity is not significantly affected by mutation of just two residues, R133 and Y134. It is somewhat difficult to compare the previously reported activity assay for the same BTK mutant (Wang et al. ref 5, Figure 4D) with the data we report here. For unexplained reasons, the time scale for the quantitative assay in the previous work is truncated to 50 minutes for the R133/Y134 mutant data compared to 120 minutes for all of the other activity data reported in that figure. In our data, if we qualitatively examine the differences in a representative progress curve at 50 minutes between WT and the double R133/Y134 mutant (see Figure 6a, dark blue and pink traces), one might conclude that the R133/Y134 mutation is activating BTK. However, when we calculate the average kinase activity rate ± standard error for three independent experiments, we find that the difference between WT and the double R133/Y134 mutant is not significant (see Figure 6b and c). Thus, instead of making any assertions about the previously published data, we are trying to be as rigorous as possible in the presentation and interpretation of our own data.

      In addition, throughout the manuscript we tried to be very careful in our discussion of our data and that published previously, to avoid conclusive statements about the previously described interface. After all, one of our overriding conclusions is that the N-terminal region of BTK is highly dynamic. See response to reviewer 1 above.

      Additionally, the study presents the structure of the kinase domain with swapped activation loops in a dimeric form, representing a previously unseen structure along the trans-phosphorylation pathway. This structure holds potential relevance. To better understand its significance, employing a structure/function approach like the one described for the PH-TH/kinase domain interface would be beneficial.

      We completely agree with this comment and are pursuing such studies now.

      Overall, this study contributes to our understanding of the activation mechanism of BTK and sheds light on the autoinhibition and allosteric control of this protein kinase. It presents new structural insights and proposes novel models that challenge previous understandings. However, further investigation and discussion would significantly strengthen the study.

      As indicated we are pursuing further investigation and felt that the body of work presented here is sufficient for a single manuscript.

      Reviewer #3 (Public Review):

      Yin-wei Lin et al set out to visualize the inactive conformation of full-length Bruton's Tyrosine Kinase (BTK), a molecule that has evaded high-resolution structural studies in its full-length form to this date. An open question in the field is how the Pleckstrin Homology-Tec Homology (PHTH) domain inhibits BTK activity, with multiple competing models in the field. The authors used a complementary set of biophysical techniques combined with well-thought-out stabilizing mutations to obtain structural insights into BTK regulation in its full-length form. They were able to crystallize the full-length construct of BTK but unfortunately, the PHTH was not resolved, yielding a structure similar to that previously obtained in the field. The investigation of the same construct by SAXS yielded an elongated structural model, consistent with previous SAXS studies. Using cryo-EM the authors obtained a low-resolution model for the FL BTK with a loosely connected density assigned to the dynamic PHTH around the compact SH2-SH3-Kinase Domain (KD) core. To gain further molecular insights into PHTH-KD interactions the authors followed a previously reported strategy and generated a fusion of PHTH-KD with a longer linker, yielding a crystal structure with a novel PHTH-KD interface which they tested in biochemical assays. Lastly, Yin-wei Lin et al crystallized the BTK KD in a novel partially active state in a "face-to-face" dimer with kinases exchanging the activation loops, although partially disordered, being theoretically perfectly positioned for transphosphorylation. Overall this presents a valiant effort to gain molecular insights into what clearly is a dynamic regulatory motif on BTK and is a valuable addition to the field.

      However, this work can be improved by considering these points:

      1) The cryo-EM reconstructions are potentially over-interpreted. The reported resolution for all of the analyzed reconstructions is better than 8Å, at which point helices should be recognized as well-resolved structural elements. In the current view/depiction of the cryo-EM maps/models it is hard to see such structural features and it would be great if the authors could include a panel showing maps at higher thresholds to show correspondence between the helices in the kinase C lobe and the cryo-EM maps. Otherwise, the overall positioning of the models within the cryo-EM maps is hard to evaluate and may very well be wrong. (Fig 4, S2).

      First, we fully recognize the model is low-resolution and we are careful in our discussion of the cryo-EM data to use language that acknowledges the limitations of the model. Nevertheless, this is the model we have (specific data processing points are discussed below).

      The resolution numbers are from the Fourier Shell Correlation (FSC) curve given by cryoSPARC at the end of refinement. We do acknowledge the reviewer’s comment that the resolution could be overestimated in that calculation, but our main focus is to show that the overall domain arrangement of the autoinhibited BTK core (Src-module) fits into the reconstructions.

      We tested visualizing the maps at higher threshold, but the secondary structures of the reconstructions were still not well resolved. We do realize that with the current reconstructions, we do not have the structural details to correctly orientate and fit individual domains; this is why we chose to simply fit the available crystal structure of the autoinhibited BTK SH3-SH2-kinase core into the maps.

      2) With the above in mind, if the maps are not at the point where helices are well resolved, it may be beneficial to low-pass filter the maps to a more conservative resolution for fitting, analysis, and representation. (Fig 4, S2).

      Using low-pass filtered maps at 10Å or unsharpened maps, the fitting of the BTK model to the map does not change significantly.

      3) It would be valuable to get a quantitative metric on the model/map fitting for the cryo-EM work. One good package for this is Situs which provides cross-correlation values for the top orthogonal fits, without user input for initial fitting. This would again increase confidence in the correctness of model positioning on the map. (Fig 4, S2).

      Thank you for this suggestion. We tested the colores feature (Exhaustive One-At-A-Time 6D Search) in Situs to perform model to map fitting without user input as the reviewer suggested. The highest ranked fitting is identical to what we presented in the manuscript. The following are the cross-correlation values calculated with the “Fit-in-map” tool in Chimera and with the “collage” function in Situs. We now indicate this step in the caption to Figure 4.

      Author response table 1.
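      As background for the values above, a cross-correlation score between a model-derived density and an experimental map can be sketched as below. This is a generic normalized cross-correlation over matching voxel lists, offered as an assumption-laden illustration; it is not claimed to be the exact formula used by Chimera or Situs, and the function name is hypothetical.

```python
from math import sqrt
from typing import Sequence


def normalized_cross_correlation(a: Sequence[float], b: Sequence[float],
                                 about_mean: bool = True) -> float:
    """Normalized cross-correlation of two equally sized voxel lists.

    With about_mean=True each list is mean-centred first, so the score is
    insensitive to a constant density offset. Values lie in [-1, 1].
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if about_mean:
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        a = [x - mean_a for x in a]
        b = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant input")
    return numerator / denominator
```

      A score near 1 means the two densities rise and fall together voxel by voxel; at low resolution, different placements of a model can give similar scores, which is why comparing the top-ranked fits matters.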

      4) It would be great to see 2D class averages from the particles contributing to each of the 3D classes. Theoretically, a clear bright "blob" (hypothesized to be the PHTH domain) should be observable in the 2D class averages. In the current 2D class averages that region is unconvincingly weak. (Fig 4, S2).

      We attempted to improve both 2D and 3D reconstructions by feeding the particles from each 3D class through many cycles of 2D classification and selection to exclude ‘bad’ particles, but neither the 2D class averages nor 3D reconstructions could be improved.

      We agree the feature that appears in the 2D class averages is weak. The BTK protein is only 77kD in size and is highly dynamic and flexible. Thus, in reality this is not an ideal system for cryo-EM. As well, the PHTH domain itself is quite small and NMR data, acquired in the context of a different project, provide evidence that the isolated PHTH domain is dynamic in solution (NMR linewidths vary throughout the protein suggesting intermediate exchange). Nevertheless, given the inability to capture the PHTH domain in crystal structures of full-length BTK we reasoned that cryo-EM could provide some insight. In the future we anticipate building on these data to include inhibitory binding partners of BTK; however such an effort is beyond the scope of the current work.

      5) It seems like there was quite a large circular mask applied during 2D classification. Are authors confident that the weak density attributed to the PHTH domain is not neighboring particles making their way into the extraction box? It would be great if the authors would trim their particle stack with a very stringent interparticle distance cutoff (or report the cutoff in the manuscript if already done so) to minimize this possibility.

      We initially picked particles using a small radius (100 Å), and stringently selected 2D classes with particles that contained only density aligning to the core SH3-SH2-kinase domains. We found, however, that 3D ab initio reconstruction always resulted in an additional density located at different positions around the larger core density. The structure of a single BTK PHTH domain fits into that additional remote density. Given the additional density that consistently appeared in 3D reconstructions, we went back and picked particles using a larger circular mask (200 Å). Subsequent 2D classification and 3D reconstruction from this analysis gave similar results and are presented in the manuscript.

      Regardless of the mask radius, we used stringent conditions for particle picking and checked for the presence of duplicates. An interparticle distance cutoff of 0.1 to 0.5 times the particle diameter was used and resulted in fewer particles, but the extended density remained. We also made use of template picking (2D class averages) to repick the particles and found no significant difference in the number of particles or quality of 2D classifications.
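
      To make the idea of an interparticle distance cutoff concrete, the sketch below applies a simple greedy filter to a list of picks: the highest-scoring picks are kept first, and any pick closer than the cutoff to an already kept pick is discarded. This is an assumption-laden illustration, not the procedure of any particular cryo-EM package; the function name, the pick coordinates and scores, and the 100 Å particle diameter are hypothetical.

```python
import math


def filter_by_min_distance(picks, min_distance):
    """Greedy minimum-distance filter for particle picks.

    picks: iterable of (x, y, score) tuples, coordinates in the same unit
    as min_distance. Higher-scoring picks take priority.
    """
    kept = []
    for x, y, score in sorted(picks, key=lambda p: p[2], reverse=True):
        # Keep the pick only if it is far enough from every pick kept so far.
        if all(math.hypot(x - kx, y - ky) >= min_distance for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept


particle_diameter = 100.0  # hypothetical value, in Å
picks = [
    (0.0, 0.0, 0.9),
    (30.0, 0.0, 0.8),   # 30 Å from the first pick
    (60.0, 0.0, 0.7),   # 60 Å from the first pick
    (200.0, 200.0, 0.6),
]

# Cutoffs at both ends of the 0.1 to 0.5 x diameter range mentioned above.
kept_lenient = filter_by_min_distance(picks, 0.1 * particle_diameter)
kept_strict = filter_by_min_distance(picks, 0.5 * particle_diameter)
```

      With the lenient cutoff all four toy picks survive, whereas the strict cutoff removes the pick that lies 30 Å from a higher-scoring neighbour. A stricter cutoff therefore yields fewer particles, consistent with the behaviour described above.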

      6) The cryo-EM processing may benefit from more stringent particle picking. The authors picked over 2M particles from 750 micrographs which likely represents very heavy overpicking. I would encourage the authors to re-pick the micrographs with 2D class averages and use more stringent metrics to reduce the overpicking. This may result in higher-resolution reconstructions. (Fig 4, S2).

      This was an effort to maximize the number of particles extracted. After multiple rounds of 2D classification and selection to exclude empty and junk particles, the final number of particles selected for 3D ab initio reconstruction was only 68,788, with only ~20K particles for each 3D reconstruction. Thus, we are not concerned that we overpicked particles. This approach is described in Supp Figure S2.

      7) The Dmax from SAXS for the Full Length BTK is at 190Å. It would be great if the authors could make a cartoon of what domain arrangement may satisfy this distance, as it is quite extended for such a small particle. Can the authors rule out dimerization at SAXS concentrations? (Fig 1).

      SAXS data for full-length, wild-type BTK have been previously published (Márquez et al., EMBO J. (2003) 22:4616-4624). Our data for WT BTK are consistent with those published previously (and we have cited this previous work). In that work, the authors attribute the ~200 Å Dmax value to an elongated BTK conformation where the domains of BTK are arranged in a linear fashion (a figure showing this domain arrangement is provided by Márquez et al., precluding the need for such a cartoon here).

      In the present work we take advantage of targeted mutations to stabilize the autoinhibited SH3-SH2-kinase core, and the Dmax value that we report for this more autoinhibited version of full-length BTK (FL 4P1F) is ~150 Å. Notwithstanding low resolution in both SAXS and cryoEM, it is notable that superposition of the cryoEM models in Figure 4c & d gives a distance of ~150 Å between the PHTH domains from the two models.

      Finally, we cannot completely rule out that a small fraction of full-length BTK is forming dimers. However, in our experience purifying and working with this protein, we find that purified and concentrated monomeric full-length BTK proteins (as high as 15 mg/ml) are quite stable and remain monomeric and free of aggregation even after sitting at 4°C for more than a week. Here the BTK SAXS data were collected within 24 hours after the samples were thawed.

      8) In Figure S1 (C) it seems that the curves are just scattering curves with Guinier plots in the inserts, but are labeled as Guinier plots in the legend. The Guinier plots for some samples (FL 4P1F) show signs of aggregation, which may complicate the analysis, it could be beneficial to redo.

      We thank the reviewer for pointing out our mistake in the presentation of the SAXS data. We have now replaced the plots in Figure S1c with the correct scattering profiles for each construct, with the Guinier plots shown as insets. We revised the label of this panel to “Scattering profile and Guinier plots (insets)”.

      In addition, we re-processed the FL 4P1F data by performing buffer subtraction with a different buffer-alone scattering dataset, which was also collected during the original data acquisition. The data quality after reprocessing was significantly improved (see new scattering profiles and Guinier plots for full-length BTK in Supplementary Figure S1). Protein stability (see above) and the current data quality therefore suggest that aggregation is not complicating the SAXS analysis.
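
      As background to the Guinier plots discussed here, the sketch below shows the standard Guinier approximation, ln I(q) = ln I(0) − (Rg² / 3) q², fitted as a straight line of ln I against q² at low q. It is a minimal illustration on synthetic data, not the analysis pipeline used for the manuscript; the function name, the q range, and the Rg and I(0) values are assumptions. In practice, the fit is commonly restricted to roughly q·Rg < 1.3 for globular particles, and an upturn of ln I at the lowest q values is a typical sign of aggregation.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares line through ln I versus q^2; returns (Rg, I0)."""
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: no valid Guinier region")
    rg = math.sqrt(-3.0 * slope)  # slope = -Rg^2 / 3
    return rg, math.exp(intercept)


# Synthetic, noise-free data generated from the Guinier law itself
# (hypothetical Rg = 30 Å and I(0) = 100; not measured values).
true_rg, true_i0 = 30.0, 100.0
q = [0.005 + 0.002 * k for k in range(15)]  # Å^-1, so that q_max * Rg < 1.3
intensity = [true_i0 * math.exp(-((qi * true_rg) ** 2) / 3.0) for qi in q]

rg, i0 = guinier_fit(q, intensity)
```

      On ideal data the fit recovers the input Rg and I(0); on real data, systematic curvature of the residuals in this region would argue against a clean, monodisperse sample.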

      9) Have the authors verified that the activation loop mutations that they introduce do not disrupt the PHTH binding as they previously reported an activation loop on BTK to interact with PHTH, an interaction they do not see here? If so, a citation would be helpful in the text. If not, testing this would strengthen the paper.

      The same activation loop mutations were included in the constructs used in the previous solution studies of the PHTH/kinase domain interaction by NMR and HDX (see ref [11]). We clarify this point in the methods section. As well, all but one of the sequence changes introduced into the activation loop are at positions at the ‘base’ of the activation loop and therefore are not surface exposed. Only one amino acid change is on the exposed part of the activation loop (V555T).

      10) Can the authors comment on the surfaces which are accessible and inaccessible to the PHTH in the crystal (Fig 3E)? The fact that PHTH doesn't adopt a stable conformation in the solvent channel to some degree indicates that the accessible interaction surfaces are not suitable for PHTH interactions, as the "effective concentration" of the PHTH would be quite high. Are these surfaces consistent with the cryo-EM analysis?

      This is an excellent point and we did state the following in describing the crystallization results:

      “the crystallography results are consistent with a flexible N-terminal PHTH domain with the caveat that the domain swapped dimer organization might limit native autoinhibitory contacts between the PHTH and SH3-SH2-kinase regions.”

      In the domain swapped dimer seen in the crystal, a symmetry related molecule does partially block the G-helix region of the kinase domain while the activation loop and C-helix in the N-lobe remain accessible. Our previous solution studies (ref [11]) pointed to the G-helix as part of the interaction interface in addition to the activation loop and part of the N-lobe. We have now modified the sentence above to more clearly describe which parts of the kinase domain are inaccessible in the crystal and the possible ramifications of the steric environment on PHTH domain mobility in the crystal (see pg. 10). That said, all of our previous HDX data show little protection in the PHTH domain in full-length BTK (mapping of the PHTH/kinase interaction was only possible in trans using excess PHTH domain), and so our data can be best summarized by concluding that the PHTH domain visits a number of conformational states and makes transient contacts with various regions of the kinase domain (dependent upon whether the SH3-SH2 region is engaged or not). This is similar to the ‘fuzzy’ intramolecular contacts described for the N-terminal region of the SRC family. Like the SRC family, BTK (and other TEC kinases) contain a long disordered linker between the N-terminal region and the compact SH3-SH2-kinase core.

      11) For the novel active state dimer of the Kinase Domain it would be great to see some functional validation of the dimerization interface. It is structurally certainly quite suggestive, but without such experiments the functional significance is unclear. If appropriate mutations have been published previously a citation would be helpful.

      We completely agree. We scoured the literature and our own functional assay results over many years, but the appropriate mutations to test the functional significance of the kinase domain dimer have not been reported or previously studied in our lab. We are therefore actively pursuing this line of investigation now.

      Reviewer #1 (Recommendations For The Authors):

      I have the following proposed experiments/analysis that should help.

      1) To better validate the putative PH-kinase interface seen, the authors should try some AlphaFold-Multimer / RoseTTAFold modelling of just the PHTH module with the kinase domain. The advantage of this is that it will test how conserved over evolution the potential interface is, and will help to decipher discrepancies between the two structures. This may end up being similar to what is seen in Akt (in this case the AlphaFold prediction does not match the allosteric inhibitor structure, or the nanobody bound structure), but this could help provide additional insight into how the PH domain interacts.

      We have applied AlphaFold to this system. The PHTH-kinase fusion sequence was fed to AlphaFold and the separate PHTH and kinase domains to AlphaFold-Multimer. The results provide a range of ‘complexes’, none of which recapitulate the PHTH/kinase interface reported here or that reported by Wang et al. in previous work. Three of five results from AlphaFold-Multimer place the PHTH domain on the activation loop face of the kinase domain, consistent with the previous solution data pointing to a similar regulatory interface. This is interesting, but our experience in applying AlphaFold to dynamic, conformationally heterogeneous systems is that the results need to be considered with caution. For that reason we did not include any of the AlphaFold predictions in the manuscript.

      Evolutionary conservation is discussed further in the next section:

      2) Could the authors provide a detailed evolutionarily analysis of the binding surface between the PHTH and kinase domains and include this in Fig5, this also would help interpret the likelihood of this interface.

      This is an excellent question and we have in fact previously published a detailed evolutionary analysis of the BTK kinase domain in collaboration with Kannan Natarajan (see Amatya et al., PNAS, 2019, [ref 11]). In that work we found that evolutionarily conserved residues on the kinase domain map to the activation loop face, supporting the solution data that the PHTH interacts with the kinase domain across the activation loop face. That work predated AlphaFold, but it is interesting that, to the extent that AlphaFold predicts anything, it seems to converge on the PHTH domain contacting the activation loop face.

      In the context of our current work, and this question from the reviewer, we re-examined the evolutionary analysis carried out previously and find that BTK (or TEC family) specific residues on the kinase domain do not appear at the newly identified PHTH/kinase interface we report here. We could speculate that since the ‘back’ of the kinase domain N-lobe interacts with multiple binding partners (SH3, SH2-linker and PHTH), evolutionary pressures may have resulted in a certain degree of plasticity to allow recognition of multiple binding partners.

      Evolutionary analysis of the BTK PH domain was also carried out previously and shows that the conserved sites map to the phospholipid binding pocket of the PH domain. The analysis did not include TH domain residues. Since we find the TH domain contributes to the PHTH/kinase interface in our crystal structure, we do not have the data at this time to do a thorough analysis, but we appreciate this comment and can address this in future work with collaborators.

    1. Author Response

      First of all, we would like to thank you for the opportunity to get the three valuable sets of comments on our work from the reviewers and the important summary from the Chief Editor. If we understand correctly, at this moment, we are expected to check for any factual errors, and our response at this stage will affect the choice of which reviewer’s comment will be published as a part of the reviewed Preprint. If so, we want to comment on some of the reviewer's points (Part A). These are not factual errors but more misunderstandings that need to be corrected. Furthermore, it depends on your decision whether it will be a part of the response or not. In Part B, we will address the reviewer's comments.

      Part A:

      1) Reviewers #1 and #3 missed the PNA dynamics that we had already reported based on live-cell imaging (Reviewer #3 in particular stressed that the dynamics we present are extrapolated from fixed imaging). We previously published the detailed dynamics of PNAs as detected by live-cell imaging (Imrichova, Aging 2019, doi: 10.18632/aging.102248. Epub 2019 Sep 7). It seems that we did not sufficiently highlight this important aspect in the present eLife manuscript: although the Introduction describes the dynamic transitions between the individual PNA types/stages, it does not explicitly state that these insights were deduced from our live-cell imaging experiments.

      2) Reviewer #2 asked us to reconcile the different phenotypes after RNAi of TOP2A (KD induces PNAs) and TOP2B (KD does not induce PNAs), vis-à-vis the fact that the TOP2B-targeting drug doxorubicin is a strong inducer of PNA formation. We would like to stress that doxorubicin is not a specific poison of TOP2B (e.g., Atwal 2019; DOI: 10.1124/mol.119.117259). It can poison (at low concentration) or inhibit (at high concentration) all subtypes of topoisomerase 2. In other words, doxorubicin targets a wider spectrum of type 2 topoisomerases and hence can limit any potential redundant roles of the individual subtypes, which, on the other hand, can manifest under conditions when only one specific member is depleted genetically. We have further discussed this interesting issue in the Discussion of our manuscript, and we believe there is no discrepancy, given the wider impact of doxorubicin and an apparently more dominant role of TOP2A than TOP2B in preventing PNAs.

      3) We are aware that the biological significance of the interaction of PML with the nucleolus has not been fully resolved yet. At this moment, we can conclude that PNAs recognize and sequester the damaged/aberrant rDNA from the active nucleolus. This novel sorting mechanism might be necessary for maintaining the integrity of the repetitive rDNA loci that might otherwise be altered or lost during complex recombinational rDNA repair. Importantly, we also identified substances (mostly chemotherapeutics) that cause rDNA damage. Given that PML is a multifaceted protein involved in diverse processes, PML depletion might affect several stress-related processes. The rDNA quality/quantity analysis is also highly challenging because of the high number of rDNA copies (200-400). As preparing such experimental models is difficult and time-consuming, addressing this issue in more detail will be a part of our follow-up work. Nevertheless, we will perform the bulk of the experiments recommended by the reviewers, to strengthen the conclusions of this manuscript, as follows: A) We will explore whether PNA formation is linked to some specific cell cycle phase; B) To strengthen the experiments with inhibition of NHEJ (DNA PKi) and HR (B02i), we will perform RNA interference or use other inhibitors operating through a distinct mechanism yet targeting the same repair process; C) We will analyze the recovery from I-PpoI treatment and assess cell proliferation, ability to form colonies, and the presence of senescent cells.

      Part B:

      Reviewer #1 (Public Review):

      Summary:

      This paper described the dynamics of the nuclear substructure called PML Nucleolar Association (PNA) in response to DNA damage on ribosomal DNA (rDNA) repeats. The authors showed that the PNA with rDNA repeats is induced by the inhibition of topoisomerases and RNA polymerase I and that the PNA formation is modulated by RAD51, thus homologous recombination. Artificially induced DNA double-strand breaks (DSBs) in rDNA repeats stimulate the formation of PNA with DSB markers. This DSB-triggered PNA formation is regulated by DSB repair pathways.

      Strengths:

      This paper illustrates a unique DNA damage-induced sub-nuclear structure containing the PML body, which is specifically associated with the nucleolus. Moreover, the dynamics of this PML Nucleolar Association (PNA) require topoisomerases and RNA polymerase I and are modulated by RAD51-mediated homologous recombination and non-homologous end-joining. This study provides a unique regulation of DSB repair at rDNA repeats associated with the unique-membrane-less subnuclear structure.

      Weaknesses:

      Although the PNA formation on rDNA repeat is nicely shown by cytological analysis, the biological significance of PNA in DSB repair is not fully addressed.

      At this moment, we cannot mechanistically fully elucidate the biological significance of this peculiar process. However, our data shows that the dynamic interaction of PML with nucleolus can sequester damaged rDNA from reactivating nucleolus. We propose that in this way, the actively transcribed intact rDNA is protected from possible detrimental interaction with the defective, PNAs-sequestered rDNA, most likely to avoid the harmful intra- and inter-chromosomal recombination events that would otherwise likely occur during recombinational repair of the damaged rDNA, as the rDNA repeats present on 5 chromosomes are repetitive. Thus, this novel sorting mechanism might help sustain repetitive rDNA loci integrity.

      Reviewer #2 (Public Review):

      In this manuscript, the authors aim to study the PML-nucleoli association (PNAs) by different genotoxic stress and to determine the underlying molecular mechanisms.

      First, from a diverse set of genotoxic stress conditions (topoisomerases, RNA Pol I, rRNA processing, and DNA replication stress), the authors have found that the inhibition of topoisomerases and RNA Polymerase I has the highest PNA formation associated with p53 stabilization, gamma-H2AX, and PAF49 segregation. It was further demonstrated that the Rad51-mediated HR pathway but not the NHEJ pathway is associated with the PNA formation. Immuno-FISH assays show that doxorubicin induces DSBs (53BP1 foci) in rDNA and PNA interactions with rDNA/DJ regions. Furthermore, endonuclease I-PpoI induced DSB at a defined location in rDNA and led to PNAs.

      Most claims by the authors are supported by the data provided. However, below weaknesses/concerns may need to be addressed to improve the quality of the study.

      1) Top2B toxin doxorubicin had the highest degree of elevating PNAs; however, Top2B-knockdown had almost no noticeable effects on PNAs. How to reconcile the different phenotypes targeting Top2B?

      We thank the reviewer for this comment and explain below why there is no discrepancy in the observed phenotypes. Doxorubicin is not a specific poison of TOP2B (e.g., Atwal 2019; DOI: 10.1124/mol.119.117259). It can poison (stabilize the ternary complex at low concentration) or inhibit (e.g., cause defects in decatenation at high concentration) all subtypes of topoisomerase 2. It also intercalates into DNA (altering DNA torsion and evicting histones) and elevates oxidative stress. The observed effect of doxorubicin therefore reflects its broader impact, beyond inhibition of TOP2B: because doxorubicin targets a wider spectrum of type 2 topoisomerases, it can limit any potential redundant roles of the individual subtypes (which, on the other hand, can manifest when only one specific member is depleted genetically), thereby causing a robust induction of PNAs. We have further discussed this issue in the Discussion section of our manuscript, and we believe there is no discrepancy in the observed phenotypes, given the wider impact of doxorubicin and an apparently more dominant role of TOP2A than TOP2B (both of which are affected to some extent by doxorubicin) in preventing PNAs.

      2) To test the role of Rad51 and DNA-PKcs in the PNA formation, Rad51 inhibitor B02 and DNA-PKcs inhibitor NU-7441 were chosen to use in the study. To further exclude the possible off-target of B02 and NU-7441, siRNA-mediated knockdown of Rad51 and DNA-PKcs would be an appropriate complementary approach to the pharmaceutical inhibitor approach.

      We are grateful for this suggestion and will perform the recommended experiments the outcome of which will indeed help to exclude the possible off-target effects of B02 and NU-7441. We are now collecting/testing the necessary tools and will carry out these analyses proposed by the reviewer.

      3) Several previous studies have shown the activation of the nucleolar ATM-mediated DNA damage response pathway by I-PpoI-induced DSBs in rDNA. What is the role of nucleolar ATM in the regulation of PNAs?

      We are aware of the relevant literature on ATM, and appreciate this question from the reviewer. During the revision of this manuscript, we will therefore address the role of ATM signaling in the phenomena that we report here. As ATM signaling is essential for the repression of pre-rRNA synthesis and the compaction of rDNA into the nucleolar caps in response to rDNA damage, we will complement this knowledge by testing to what extent might ATM inhibition affect the induction of PNAs/PML-NDS in our model and experimental settings.

      Reviewer #3 (Public Review):

      Summary:

      Hornofova et al. examined interactions between the nucleolus and promyelocytic leukemia nuclear bodies (PML-NBs) termed PML-nucleolar associations (PNAs). PNAs are found in a minor subset of cells, exist within distinct morphological subcategories, and are induced by cellular stressors including genotoxic damage. A systematic pharmacological investigation identified that compounds that inhibit RNA Polymerase 1 (RNAPI) and/or topoisomerase 1 or 2A caused the greatest proportion of cells with PNA. A specific RAD51 inhibitor (B02) impacted the number of cells exhibiting PNAs and PNA morphology. Genetic double-strand break (DSB) induction within the rDNA locus also induced PNA structures that were more prevalent when non-homologous end joining (NHEJ) was inhibited.

      Strengths:

      PNA are morphologically distinct and readily visualized. The imaging data are high quality, and rDNA is amenable to studying nuclear dynamics. Specific induction of rDNA damage is a strong addition to the non-specific pharmacological damage characterized early in the manuscript. These data nicely demonstrate that rDNA double-strand breaks undermine PNA formation. Figure 1 is a comprehensive examination and presents a compelling argument that RNAPI and/or TOP1, TOP2A inhibition promote PNA structures.

      Weaknesses:

      The data are limited to fixed fluorescent microscopy of structures present in a minority of cells. Data are occasionally qualitative and/or based upon interpretation of dynamic events extrapolated from fixed imaging. This study would benefit from live imaging that captures PNA dynamics.

      We believe this comment reflects a misunderstanding, for the following reason. We fully agree with the reviewer that live-cell imaging is critical to properly capture the dynamics of PNA formation and evolution, and we apologize for not sufficiently highlighting that this was already presented in our previous study, in which we described the existence and dynamics of PNAs over time based on the live-cell imaging that the reviewer correctly regards as important. In Imrichova et al. (doi: 10.18632/aging.102248. Epub 2019 Sep 7), we used live-cell imaging to describe the dynamics of forming PNAs and the transition between individual types, and we referred to this work in the Introduction section of our present manuscript. In those experiments, including the live-cell imaging, we showed that after the recovery of RNAPI transcription, which usually follows the washout (removal) of the DNA-damaging agents, the funnel-like PNAs are transformed into PML-NDS. These newly emerging PNAs (PML-NDS) are placed next to the reactivated nucleolus. To document this, we paste below the relevant part of the Introduction text that was included in our submitted manuscript (see below in italics). Nevertheless, we did not emphasize that the transition between individual types of PNAs was observed using live-cell imaging of cells ectopically expressing PML-EGFP and B23-RFP. In the revised manuscript, we will include this critical information and will complement it with a scheme explaining the dynamics of PNA transitions.

      Copied text from our manuscript, relevant to this issue: Doxorubicin, a topoisomerase inhibitor and one of the PNAs inducers, provokes a dynamic interaction of PML with the nucleolus, where the different phases linked to RNAPI inhibition can be discriminated into four basic structural subtypes of PNAs termed according to the 3D structures obtained by super-resolution microscopy as PML 'bowls', PML 'funnels', PML 'balloons' and PML nucleolus-derived structures (PML-NDS; (36)). The doxorubicin-induced inhibition of RNAPI leads to a nucleolar cap formation around which diffuse PML accumulates to form the PML bowl. Note that this event is rare as a minority of nucleolar caps are enveloped by PML (36). As the RNAPI inhibition continues, PML bowls protrude into PML funnels or transform into PML balloons wrapping the whole nucleolus. When the stress is relieved and RNAPI resumes activity, a PML funnel transforms into distinct compartments placed next to the non-segregated (i.e., reactivated) nucleoli, PML nucleolus-derived structures (PML-NDS). PML-NDSs contain nucleolar material, rDNA, and markers of DNA DSBs (36,37).

      Cell cycle and cell division are not considered. Double-strand break repair is cell cycle dependent, and most experiments occur over days of treatment and recovery. It is unclear if the cultures are proliferating, or which cell cycle phase the cells are in at the time of analysis. It is also unclear if PNAs are repeatedly dissociating and reforming each cell division.

      We agree this is an important point. In a complementary setting we previously published (Imrichova et al., doi: 10.18632/aging.102248. Epub 2019 Sep 7) that exposure of RPE-1 hTERT cells to doxorubicin caused cell cycle arrest and cellular senescence. Thus, most of such cells will not enter the cell cycle again. Regarding the I-PpoI-based model, we indeed did not show in the present manuscript how I-PpoI activation (rDNA damage) affects the cell cycle. In our preliminary experiments that address this issue, we saw that only about 1–3% of cells can recover from the stress and form colonies in a colony-forming assay. We will further repeat and corroborate these preliminary data and include these results in the revised manuscript, together with β-galactosidase staining to demonstrate the presence of senescent cells.

      Furthermore, as suggested by this reviewer, we will assess the cell cycle phase/position of the cells in our experiments, to find out whether the cell cycle phase affects/correlates with the PNAs formation.

      The relationship of PNA morphologies (bowl, funnel, balloon, and PML-NDS) also remains unclear. It is possible that PNAs mature/progress through the distinct morphologies, and that morphological presentation is a readout of repair or damage in the rDNA locus. However, this is not formally addressed.

      This is partly explained by our response to Reviewer #1, related to our previous live-cell imaging analyses. The 'bowl' emerges first and can be transformed into a 'funnel' or 'balloon'. All these PML structures are in contact with the nucleolar cap (the RNAPI is inhibited). Upon reactivation of RNAPI, the funnel can transform into the PML-NDS. At this moment, we cannot conclude to which precise process the individual structure is linked. However, we already know (Hornofova et al., DOI: 10.1016/j.dnarep.2022.103319) that the funnels colocalize with the highest portion of rDNA, which may reflect some process of concentration/clustering of rDNA. This observation is supported by results presented in this manuscript, which show that individual acrocentric chromosomes (NORs) also accumulate in one funnel. To summarize, the formation of the bowl reflects the aberration in rDNA. The funnel can accumulate rDNA and NORs in one site. The transition between the funnel and PML-NDS mirrors the changes after the reactivation of RNAPI and facilitates the sequestration of damaged rDNA/NORs outside of the active nucleolus. As the processes linked to the individual PNA types are not resolved yet, we will at least address this issue in the Discussion.

      An I-PpoI targeted sequence within the rDNA locus suggests 3D structural rearrangement following damage. An orthogonal approach measuring rDNA 3D architecture would benefit comprehension.

      This is a very inspiring idea, although demanding and somewhat outside the focused scope of the present study. Our follow-up work will focus on the localization of individual NORs using immuno-FISH after introducing the rDNA damage by I-PpoI. In the context of those studies, we also plan to analyze rDNA 3D architecture.

      Following I-PpoI induction, it is possible that cells arrest in a G1 state. This may explain why targeting NHEJ has a greater impact on the number of 53BP1 foci and should be investigated.

      We fully agree with this possibility and, in response, we will perform a series of cell cycle analysis experiments to address this issue during the revision phase of this manuscript. We will analyze whether I-PpoI-induced PNAs are linked to some cell cycle phase(s).

      Conclusions: PNAs are a phenomenon of biological significance and understanding that significance is of value. More work is required to advance knowledge in this area. The authors may wish to examine the literature on APBs (Alt-associated PML-NBs), which are similar structures where telomeres associate with PML-NBs in a specific subset of cancers. It is possible that APBs and PNAs share similar biology, and prior efforts on APBs may help guide future PNA studies.

      We will follow this recommendation by the reviewer. In ALT, PML is essential for clustering several (damaged) telomeres into APB. In PML-deficient cells, there is not only a defect in the formation of APB, but also the ALT telomeric DNA synthesis in G2 cells is blocked. As we already mentioned, funnel-like PNAs can accumulate several NORs. Thus, the recombination process between NORs might be facilitated. We will highlight this link and its relevance for cancer in our revised manuscript, thank you.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their insightful comments, suggestions, and criticism. In the updated version of the manuscript, all these will be properly reflected. Here we briefly address the main points raised:

      Reviewer #1:

      1.1) Patient selection and tumor area selection are crucial for this study but not very carefully defined. Why are some core and others not? Figure referral is an issue here (sup figure 6 where all core and non-core samples are supposed to be according to the legend of Fig 4 is likely sup fig 7 but this is then a complete copy paste of Figure 4). In the methods it is stated that the core samples are based on limited contamination of additional morphotypes (<20%) but Fig 4 suggests that all tumours listed have multiple morphotypes.

      The tissue samples were obtained from a hospital cohort of patients with stage II-IV colorectal cancer (at diagnostic time), with no particular selection criteria imposed, as this was an exploratory study.

      Tumor regions were marked for macro-dissection by an experienced pathologist following the standard practice for whole-tumor transcriptomics studies. The subregions (morphological regions) were marked by the same pathologist for macro-dissection (in an adjacent section) and later reassessed with respect to their “morphological purity”. It is impossible to macro-dissect regions containing a single morphological pattern. Hence, those regions which contained a significant amount (>=20%) of other morphologies were considered “non-core”, while the rest were called “core” regions. This distinction applies solely to morphological regions and not to whole-tumor samples. Indeed, the reference in the caption of Figure 4 should point to Supp. Fig. 7 (and has been updated).
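      The core/non-core rule described above can be written as a small sketch. This is only an illustration of the stated >=20% cut-off; the function name `classify_region` and the representation of contamination as a fraction between 0 and 1 are assumptions made for the example, not part of the study's pipeline.

```python
def classify_region(other_morphology_fraction: float, threshold: float = 0.20) -> str:
    """Label a macro-dissected region as "core" or "non-core".

    other_morphology_fraction: estimated share (0..1) of the region occupied
        by morphologies other than the annotated one (hypothetical input).
    threshold: contamination at or above this value makes the region "non-core";
        0.20 mirrors the >=20% rule quoted in the response.
    """
    if not 0.0 <= other_morphology_fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return "non-core" if other_morphology_fraction >= threshold else "core"


# Illustrative values only, not measurements from the study.
labels = [classify_region(f) for f in (0.05, 0.19, 0.20, 0.45)]
```

      Note that the boundary case (exactly 20%) falls on the "non-core" side, following the ">=20%" wording.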

      1.2) CMS subtype should be performed with single sample predictor rather than CMScaller.

      We agree that a single-sample predictor for CMS is needed; however, CMScaller is the de facto classifier for CMS (>130 citations), so we used it to illustrate the practical implications.

      1.3) A couple of surprising observations need specification. MUC2 is a strong CMS3 reporter gene yet Mucinous tumours appear to end up in CMS4 rather than 3. Can the authors show that indeed stroma cells are very evident in these samples?

      We do not have a direct estimation of the amount of stromal cells, but the high scores of the various fibroblast-related signatures in mucinous regions (Fig2 B, D) indicate that, indeed, there is an enrichment in stroma. In the follow-up study we plan to perform specific staining as well as spatial transcriptomics of these regions to further investigate our findings.

      1.4) The SE PP and CT are assigned to CMS2, but in Figure 4 this appears a lot more variable than the authors would make the reader believe. The full data are not completely clear (see point 1).

      In the paper, we transparently state that PP, SE, and CT were assigned to CMS2 in 62.5%, 41.7% and 41.9% of cases, respectively. These proportions refer to all samples for which CMScaller made a prediction. In Fig. 4, we also show the proportion of cases in which CMScaller did not predict any subtype.

      1.5) The tumor response rates are rather weird as this is likely dependent on the complete tumour and not so much the subareas. It is not very well described what we see in this analysis.

      We did not compute any response rates but simple prognostic scores, defined as means (weighted, if weights were provided) of the genes in the specific signatures (see Methods). The question addressed was whether these scores were comparable between the whole tumor and the corresponding tumor regions (within the same tumor). Given the observed (relative) variability, the more important follow-up question (which we cannot answer with our limited survival data) is whether a higher score in a region in comparison with the whole tumor is indeed indicative of a higher risk of relapse.
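      As a minimal sketch of the scoring described above, a signature score can be computed as the (optionally weighted) mean expression of the signature genes and then compared between a whole-tumor profile and a region profile. The function name `signature_score`, the dictionary-based expression profiles, the gene names, and the normalization by the sum of weights are all assumptions made for illustration; the Methods of the manuscript remain the authoritative description.

```python
from typing import Mapping, Optional, Sequence


def signature_score(
    expression: Mapping[str, float],
    genes: Sequence[str],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Mean expression of the signature genes found in the profile.

    If weights are given, a weighted mean is returned, normalized by the sum
    of the weights (an assumption; other normalizations are possible).
    Genes missing from the profile are ignored.
    """
    present = [g for g in genes if g in expression]
    if not present:
        raise ValueError("no signature gene found in the expression profile")
    if weights is None:
        return sum(expression[g] for g in present) / len(present)
    total_weight = sum(weights[g] for g in present)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(weights[g] * expression[g] for g in present) / total_weight


# Hypothetical profiles (gene -> normalized expression), for illustration only.
whole_tumor = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 6.0}
region = {"GENE_A": 3.0, "GENE_B": 5.0, "GENE_C": 1.0}
signature = ["GENE_A", "GENE_B"]
example_weights = {"GENE_A": 1.0, "GENE_B": 3.0}

# Difference between the region score and the whole-tumor score.
delta = signature_score(region, signature) - signature_score(whole_tumor, signature)
```

      A positive `delta` means the region scores higher than the whole tumor for that signature; whether such a difference carries prognostic meaning is exactly the open question raised above.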

      1.6) Serrated adenomas have previously been aligned with CMS4. Is this different from serrated areas in cancers?

      We do not have data from adenomas to compare with the serrated carcinoma regions. But a comparison of (regions of) both traditional serrated and sessile serrated adenomas to serrated carcinoma would be interesting.

      1.7) The fact that iCMS2 and iCMS3 align rather well with the current analysis of the distinct regions suggests that the analysis that was reported last year is the proper way to view tumor intrinsic signatures. The authors now propose a rather similar outcome to this issue which does take away a lot of the novelty of the findings of this study.

      In the manuscript it is clearly stated that our goal was to describe the molecular characteristics associated with several morphological patterns. It was not to propose another stratification paradigm for colorectal cancer. As such, our analyses were not limited to molecular subtypes and the respective observations were but a small part of our findings. Indeed, the intrinsic subtypes (iCMS 2/3) were stable and robust, as they were based on the genes expressed in epithelial cells, and they might well prove to be of clinical importance too. However, they do not cover all aspects (e.g. fibroblasts subtypes) and, as stated in Joanito et al. Nat Gen 54, pages 963–975 (2022), “iCMS, MSI status and CMS jointly inform the molecular classification of CRC”. Last, in our opinion, the molecular classification of CRC, while a useful point of view in tumour classification, is not covering all the necessary perspectives on tumour heterogeneity.

      Reviewer #2:

      2.1) Overall, the manuscript provides an interesting histological/morphological framework through which we can consider heterogeneity in colorectal carcinoma and an approach by which we might improve the performance of gene expression-based classifiers in predicting clinical behaviour and/or responses to therapy. Exploration of CRC morphotypes and their differences was quite interesting. However, more work is needed to support the claims made by the authors. While I appreciate that the authors themselves identify limitations of their study within the manuscript, I believe awareness of these limitations is not reflected in some of the claims made in the abstract and at points in the main text when discussing the use of expression-based classifiers.

      The manuscript was improved to clarify several aspects that Reviewer 2 rightly pointed out:

      1. We clarify that for a patient (tumor) there might be one or several corresponding transcriptomics profiles (see Methods).

      2. The resulting “molecular portraits” were not derived with the goal to deconvolve the bulk tumor expression profiles and to estimate the proportions of morphotypes. Whether this is possible at all, is an open question and we mention this aspect in “Ideas and Speculation” section.

      3. We improved figure captions to be more descriptive.

      4. We included the reference for “Isela signature” at its first appearance.

    1. Author Response

      The following is the authors’ response to the current reviews.

      1) The main issue relates to Set2, and how STIM1 expression rescues Set2-dependent functions in Set2 KO flies. If Set2 is downstream of STIM1, how would STIM1 over-expression rescue a Set2-dependent effect?

      STIM rescue is of Set2 knockdown (RNAi) and NOT Set2 knockout flies. Overexpression of STIM raises SOCE in primary cultures of Drosophila neurons (as demonstrated in previous publications from our group: Agrawal et al., 2010; Chakraborty et al., 2016; Deb et al., 2016). The higher SOCE drives greater expression of Set2 from the endogenous locus, thus reducing the efficacy of Set2 RNAi. Hence the rescue by STIM of Set2 KD flies in Figure S2E. We have explained this in lines 227-234.

      2) There is still no characterization of SOCE in fpDANs from flies expressing native Orai or the dominant negative OraiE180A mutant.

      Measurement of SOCE is not technically feasible in ex-vivo preps due to the presence of extracellular calcium in the brain milieu. In the past we have measured SOCE from primary cultures of central dopaminergic neurons expressing either native Orai OR OraiE180A mutant (Pathak et al., 2015) where we found that all dopaminergic neurons expressing OraiE180A exhibit very low SOCE. This is the reason we have not measured SOCE in the fewer cells of the fpDAN subset marked by THD' GAL4. This point has been specifically mentioned and explained in the section on “limitations of the study” at the end of the manuscript.

      3) The revised version does not include an analysis of the STIM:Orai stoichiometry, which has been demonstrated to be essential for SOCE.

      To measure such stoichiometry we would need to perform direct measurements of STIM and Orai levels by protein extraction from the fpDANs of all appropriate genotypes. This is not feasible due to the small number of cells available from each brain.

      I confirm that there are no changes to the text OR figures from the previous version of the manuscript.


      The following is the authors’ response to the original reviews.

      […]

      The manuscript by Mitra and coworkers analyses the functional role of Orai in the excitability of central dopaminergic neurons in Drosophila. The authors show that a dominant-negative mutant of Orai (OraiE180A) significantly alters the gene expression profile of flight-promoting dopaminergic neurons (fpDANs). Among them, OraiE180A attenuates the expression of Set2 and enhances that of E(z) shifting the level of epigenetic signatures that modulate gene expression. The present results also demonstrate that Set2 expression via Orai involves the transcription factor Trl. The Orai-Trl-Set2 pathway modulates the expression of VGCC, which, in turn, are involved in dopamine release. The topic investigated is interesting and timely and the study is carefully performed and technically sound; however, there are several major concerns that need to be addressed:

      1) In Figure S2E, STIM is overexpressed in the absence of Set2 and this leads to rescue. It is presumed that STIM overexpression causes excess SOCE, yet this is rarely the case. Perhaps the bigger concern, however, is how excess SOCE might overcome the loss of SET2 if SET2 mediates SOCE-induced development of flight. These data are more consistent with something other than SET2 mediating this function.

      Our statement that STIM overexpression overcomes deficits in SOCE is based on the following published work, which has been highlighted in the revised version of the manuscript (see Lines 226-233):

      1. Studies of SOCE in wildtype cultured larval Drosophila neurons demonstrated that overexpression of STIM raised SOCE to the same extent as co-expression of STIM and Orai in the WT background (Chakraborty et al., 2016; Figure 1D).

      2. Both Carbachol-induced IP3-mediated Ca2+ release and SOCE (measured by Ca2+ add back after Thapsigargin-induced store depletion) were rescued in primary cultures of IP3R hypomorphic mutant (itprku) Drosophila neurons by overexpression of STIM (Agrawal et al., 2010; Figure 8A-G).

      3. Deb et al., 2016 (Supplementary Figure 2h,i) reaffirmed that overexpression of STIM significantly improves SOCE after Thapsigargin-induced passive store-depletion in Drosophila neurons expressing IP3RRNAi.

      4. Consistent with the cellular rescue of SOCE, defects in flight initiation and physiology observed in the heteroallelic IP3R hypomorphic background (itprku) could be rescued by overexpression of STIM (Agrawal et al., 2010; Figure 3A-E) as well as Orai (Venkiteswaran and Hasan, 2009; Figure 3).

      5. In Figure S2E, we show that flight deficits arising from THD’> Set2RNAi are rescued upon overexpression of STIM (i.e. THD’>Set2RNAi; STIMOE). Here and in another recent publication (Mitra et al., 2021) we show that neurons expressing Set2RNAi exhibit reduced expression of the IP3R and reduced ER-Ca2+ release presumably leading to reduced SOCE. As mentioned above we have consistently found that STIM overexpression raises both IP3-mediated Ca2+ release and SOCE in Drosophila neurons.

      In this study, we propose that Ca2+ release through the IP3R followed by SOCE are part of a positive feedback loop (described in the revised manuscript- see Lines 302-307) driving expression of Set2 which in turn upregulates expression of mAChR and IP3R (Figure 3F) to regulate dopaminergic neuron function. Our observation that loss of Set2 (THD’>Set2RNAi) can be rescued by STIM overexpression is consistent with this model because:

      1. Loss of Set2 (THD’>Set2RNAi) results in downregulation of several genes including mAChR and IP3R leading to decreased SOCE.

      2. As evident from our previous studies increased STIM expression in the Set2RNAi background (THD’>Set2RNAi; STIMOE) is expected to enhance SOCE which we predict would rescue Set2 expression leading to rescue of other Set2 dependent downstream functions like flight (Figure 2D).

      2) In Figure 3, data is provided linking SET2 expression and Cch-induced Ca2+ responses. The presentation of these data is confusing. In addition, the results may be a simple side effect of SET2-dependent expression of IP3R. Given that this article is about SOCE, why isn't SOCE shown here? More generally, there are no measurements of SOCE in this entire article. Measuring SOCE (not what is measured in response to Cch) could help eliminate some of this confusion.

      This section has been re-written in the revised version for better clarity and we have explained how Set2-dependent IP3R expression is an important component of Orai-mediated Ca2+ entry in fpDANs (see Lines 302-307). Here, we propose that IP3-mediated Ca2+ release and SOCE, through Orai, are together part of a positive feedback loop (see Lines 286-307) driving transcription of Set2 which in turn upregulates mAChR and IP3R expression (Figure 3F). We hypothesized that the observed loss of CCh-induced Ca2+ response in the Set2RNAi background (Figure 3B-D; THD’>Set2RNAi) results from decreased itpr and mAChR expression and verified this in Figure 3E. This is further validated by the rescue of CCh-induced Ca2+ response and itpr/mAChR expression in the OraiE180A background upon Set2 overexpression (Figure 3B-E; THD’>OraiE180A; Set2OE). We were constrained to measure CCh-induced Ca2+ responses in OraiE180A expressing neurons for the following reasons (highlighted in the revised version of the manuscript; see Lines 307-313; ‘Limitations of the study’-Lines 719-735):

      1. SOCE measurements through Tg mediated store Ca2+ release followed by Ca2+ add back require a 0 Ca2+ environment that can only be achieved in culture. The Drosophila brain is bathed in hemolymph which contains Ca2+ and there do not exist any methods to readily deplete Ca2+ from the tissue to create a 0 Ca2+ environment without also affecting the health of the neurons.

      2. Cultures of the subset of dopaminergic neurons (THD’) we have focused on in this study were not feasible due to the small number of neurons being studied from the total number of dopaminergic neurons in the brain (~35/400). In previous studies we have shown that SOCE post-Tg induced store depletion is abrogated in cultured dopaminergic neurons from Drosophila upon expression of OraiE180A (Pathak et al., 2015). Furthermore, Carbachol-induced IP3-mediated Ca2+ release is tightly coupled to SOCE in Drosophila neurons (Venkiteswaran and Hasan, 2009) and Ca2+ release from the IP3R is physiologically relevant for flight behavior in THD’ neurons (Sharma and Hasan, 2020).

      3) A significant gap in the study relates to the conclusion that trl is a SOCE-regulated transcription factor. This conclusion is entirely based on genetic analysis of STIMKO heterozygous flies in which a copy of the trl13C hypomorph allele is introduced. While these results suggest a genetic interaction between the expression of the two genes, the evidence that expression translates into a functional interaction that places trl immediately downstream of SOCE is not rigorous or convincing. All that can be said is that the double mutant shows a defect in flight which could arise from an interruption of the circuit. Further, it is not clear whether the trl13C hypomorph is only introduced during the critical 72-96 hour time window when the Orai1E180E phenotype shows up. The same applies to the over-expression of Set2 and the other genes. If the expression is not temporally controlled, then the phenotype could be due to the blockade of an entirely different aspect of flight neuron function.

      The idea that Trl functions downstream of Orai-mediated Ca2+ entry in THD’ neurons is based on the following genetic evidence (highlighted in the revised version; see Lines 339-341; 351-367; 647-65; ‘Limitations of the study’: 736-739):

      1. In Figure 4D, we show evidence of genetic interaction between trl-STIM and trl-Set2. The rescue of trl13c/STIMKO with STIM overexpression in THD’ neurons indicates that excess SOCE (driven by STIMOE) may activate the residual Trl (there exists a WT Trl copy in this genetic background) to rescue THD’ flight function. This is further supported by the rescue of trl/STIMKO with Set2 overexpression in THD’ neurons, which is consistent with the feedback loop model proposed in Figure 5C (see Lines 390-396) where we propose that reduced SOCE leads to reduced ‘activated’ Trl and thus reduced Set2 expression, and the latter is rescued by Set2OE. The manner in which SOCE ‘activates’ Trl is the subject of ongoing investigations.

      2. The trl hypomorphic alleles (including trl13C) exist as genetic mutants and they affect Trl function in all tissues throughout development. While we concede that these mutant alleles would affect multiple functions at other stages of development, which may impinge on the phenotypes noted in Figure S4B, we have used a targeted RNAi approach to validate Trl function specifically in the THD’ neurons (see Figure 4C; Lines 339-341).

      3. Overexpression mediated rescues (including Set2) were not induced only during the critical 72-96 hrs APF developmental window. Having established that Orai function drives critical gene expression during this window (Figure 1), it is reasonable to assume that Set2 rescue of loss of flight in OraiE180A occurs in the same time window where flight is disrupted (see Lines 221-224).

      4) In Figure 4, data is shown that SOCE compensates for the loss of Trl, the presumed mediator of SOCE-dependent flight. The fact that flight deficits are rescued by raising SOCE in the absence of Trl is very inconsistent with this conclusion.

      We apologise for this confusion and have clarified in the revision (see Lines 346-367). trl13c is a recessive allele of Trl and has been written as such throughout the text and in the figures (i.e. trl13c and NOT Trl13c). In all cases of Trl mutant rescue by STIMOE and Set2OE there exists residual Trl that can be activated by excess SOCE, thus leading to the rescue. This is true for trl13C/STIMKO where each mutant is present as a heterozygote (the complete genotype of this strain is STIMKO/+; trl13c/+; this has been corrected in the revision). Similarly, for TrlRNAi we expect reduced levels (but not complete loss) of Trl. Thus the SOCE rescue of loss of Trl occurs in conditions where Trl levels are reduced but NOT absent. Homozygous trl null mutants are lethal.

      5) In Figure 5 (A-C), data is provided that Trl transcripts are unaffected by loss of SOCE and that overexpression cannot rescue flightlessness. From this, the authors conclude that this gene "must" be calcium responsive. While that is one possibility, it is also possible that these genes are not functionally linked.

      The idea that Trl is functionally linked to SOCE is based on the following evidence (included in the revised version- see Lines 339-341; 346-367; 391-396)

      1. In Figure 4C we show that flight defects caused by partial loss of Trl (THD’>TrlRNAi) were rescued by STIM overexpression (THD’>TrlRNAi; STIMOE). As mentioned above we have found that STIM overexpression raises SOCE.

      2. Heteroalleles of the trl13C hypomorph exhibit a strong genetic interaction with a single copy of the null allele of STIMKO as shown by the flight deficit of trl13c/+; STIMKO/+ (trl13C/STIMKO) flies (Figure 4D). The genotypes will be corrected in the revision.

      3. Flight defects in trl13C/STIMKO flies could be rescued by STIM overexpression in the THD’ neurons (trl13C/STIMKO; THD’>STIMOE)

      4. In Figure 4E, we show that partial loss of Trl in THD’ neurons (THD’>TrlRNAi) leads to decreased expression of the Ca2+ responsive genes mAChR, itpr, and Set2 genes indicating that Trl is a constituent of the SOCE-driven transcriptional feedback loop (see Figure 5C).

      Since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it could be activated by a Ca2+ dependent post-translational modification. Phosphoproteome analysis of Trl demonstrated that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which lies within a potential site for CaMKII. Independently, CaMKII has been identified as a binding partner of Trl from a Trl interactome study (Lomaev et al., 2018). Past work from our group (Ravi et al., 2018) identified a role for CaMKII in THD’ neurons in the context of flight. We are currently testing if CaMKII functions downstream of SOCE in THD’ neurons to mediate flight and will update this information in the next version of the manuscript.

      (Now included in the revised version of the manuscript as Figure S5; Lines 397-424.)

      6) There is no characterization of SOCE in fpDANs from flies expressing native Orai or the dominant negative OraiE180A mutant. While the authors refer to previous studies, as the manuscript is essentially based on Orai function thapsigargin-induced SOCE should be tested using the Ca2+ add-back protocol in order to assess the release of Ca2+ from the ER in response to thapsigargin as well as the subsequent SOCE.

      The fpDANs consist of 16-19 neurons in each hemisphere (PPL1 are 10-12 and PPM3 are 6-7 cells; Pathak et al., 2015). Measuring SOCE from these neurons in vivo is not possible due to the presence of abundant extracellular Ca2+ in the brain. Given their sparse number, it proved technically challenging to isolate the fpDANs in culture to perform SOCE measurements using the Ca2+ add back protocol. Due to these reasons, we have relied upon using Carbachol to elicit IP3-mediated Ca2+ release and SOCE as a proxy for in vivo SOCE. In previous studies we have shown that Carbachol treatment of cultured Drosophila neurons elicits IP3-mediated Ca2+ release and SOCE (Agrawal et al., 2010; Figure 8). Moreover, expression of OraiE180A completely blocks SOCE as measured in primary cultures of dopaminergic neurons (Pathak et al., 2015; Figure 1E). Hence we have not repeated SOCE measurements from all dopaminergic neurons in this work. In the revised version we have explicitly stated this weakness of our study and the reasons for it (See Lines 307-313; ‘Limitations of the study’-Lines 719-735).

      7) In the experiments performed to rescue flight duration in Set2RNAi individuals the authors overexpress STIM and attribute the effect to "Excess STIM presumably drives higher SOCE sufficient to rescue flight bout durations caused by deficient Set2 levels.". This should be experimentally tested as the STIM:Orai stoichiometry has been demonstrated as essential for SOCE.

      The assumption that STIM overexpression drives higher SOCE is based upon previously published work from Drosophila neurons (Agrawal et al., 2010; Chakraborty et al., 2016; Deb et al., 2016) which demonstrates that excess WT STIM overcomes IP3R deficiencies (RNAi or hypomorphic mutants) to rescue SOCE. We agree that STIM-Orai stoichiometry is essential for SOCE, and propose that the rescue backgrounds possess sufficient WT Orai, which is recruited by the excess STIM to mediate the rescue. We have referenced the earlier work to validate our use of STIMOE for rescue of SOCE (See Lines 226-233).

      Here, we propose that Set2 is part of a positive feedback loop (see Lines 286-307) driving transcription of mAChR and IP3R (Figure 3F). In keeping with this hypothesis, we posit that the phenotypes observed in the Set2RNAi background (Figure 2D) result from decreased itpr and mAChR expression (validated in Figure 3E). This is further validated by the Set2 overexpression mediated rescue of OraiE180A (Figure 2D) and rescue of itpr/mAChR expression in the OraiE180A background (Figure 3B-E; THD’>OraiE180A; Set2OE).

      8) The authors show that overexpression of OraiE108A results in Stim downregulation at a mRNA level. What about the protein level? And more important, how does OraiE108A downregulate Stim expression? Does it promote Stim degradation? Does it inhibit Stim expression?

      We hypothesize that changes in STIM mRNA observed in the THD’ > OraiE180A neurons stem from an overall reduction in IP3-mediated Ca2+ release and SOCE due to loss of Trl-Set2 driven gene expression detailed in our transcriptional feedback loop model (Figure 5C; see Lines 286-307; 581-591). We have attempted to explain this aspect more clearly in the revised version of the manuscript. While we agree that measuring levels of STIM protein would be helpful, estimation of protein levels from a limited number of neurons (~35 cells per brain) is technically challenging. The STIM antibody does not work well in immunohistochemistry. In the absence of any experimental evidence we cannot comment on how expression of OraiE180A might affect STIM protein turnover (see Lines 307-313).

      9) Lines 271-273, the authors state "whereas overexpression of a transgene encoding Set2 in THD' neurons either with loss of SOCE (OraiE180A) or with knockdown of the IP3R (itprRNAi), lead to significant rescue of the Ca2+ response". This is attributed to a positive effect of Set2 expression on IP3R expression and the authors show a positive correlation between these two parameters; however, there is no demonstration that Set2 expression can rescue IP3R expression in cells where the IP3R is knocked down (itprRNAi). This should be further demonstrated.

      The rescue of IP3R expression by Set2 overexpression in itprRNAi was demonstrated in a different set of Drosophila neurons in an earlier study (Mitra et al., 2021) and has not been repeated specifically in THD’ neurons (see Lines 286-307). Similar to the previous study, here we tested CCh-stimulated Ca2+ responses of THD’ neurons with itprRNAi and itprRNAi; Set2OE (Fig S3), which are indeed rescued by Set2OE (see Lines 280-285).

      10) The data presented in Figure 3E should be functionally demonstrated by analyzing the ability of CCh to release Ca2+ from the intracellular stores in the absence of extracellular Ca2+.

      CCh-mediated Ca2+ release from the intracellular stores in the absence of extracellular Ca2+ has been described in primary cultures of Drosophila neurons in previously published work (Venkiteswaran and Hasan, 2009; Agrawal et al., 2010). This work focuses on a set of 16-19 dopaminergic neurons in a hemisphere of the Drosophila central brain. It is technically challenging to generate a 0 Ca2+ environment in vivo, which is essential for measuring store Ca2+ release. Given their meagre numbers, primary culture of these neurons is not readily feasible (see Lines 307-313; ‘Limitations of the study’-Lines 719-735).

      11) The conclusion that SOCE regulates the neuronal excitability threshold is based entirely on either partial behavioral rescue of flight, or measurements of KCl-induced Ca2+ rises monitored by GCaMP6m in DAN neurons. The threshold for neuronal excitability is a precise parameter based on rheobase measurements of action potentials in current-clamp. Measurements of slow calcium signals using a slow dye such as GCaMp6m should not be equated with neuronal excitability. What is measured is a loss of the calcium response in high K depolarization experiments, which occurs due to the loss of expression of Cav channels. Hence, the use of this term is not accurate and will confuse readers. The use of terms referring to neuronal excitability needs to be changed throughout the manuscript. As such, the conclusions regarding neuronal excitability should be strongly tempered and the data reinterpreted as there are no true measurements of neuronal excitability in the manuscript. All that can be said is that expression of certain ion channel genes is suppressed. Since both Na+ channels and K+ channel expression is down-regulated, it is hard to say precisely how membrane excitability is altered without action potential analysis.

      The claim that SOCE influences neuronal excitability is based on the following observations:

      1. Interruption of the transcriptional feedback loop involving SOCE, Trl, and Set2 through loss of any of its constituents, results in the downregulation of VGCCs (Figure 5G, 6H), which are essential components of action potentials.

      2. OraiE180A mediated loss of SOCE in THD’ neurons abrogates the KCl-evoked depolarization response (Figure 6B, C) measured using GCaMP6m. We verified that this response requires VGCC function using pharmacological inhibition of L-type VGCCs (Figure 6E, F).

      3. SOCE deficient THD’ neurons, which were presumably compromised in their ability to evoke action potentials could be rescued to undergo KCl-evoked depolarisation by expression of NachBac, which lowers the depolarization threshold (Figure 7C, D) or through optogenetic stimulation using CsChrimson (Figure 7F).

      We agree that ‘neuronal excitability threshold’ is a precise electrophysiological parameter that has not been directly investigated here by measurement of action potentials. Therefore, references to neuronal excitability have been tempered throughout the revised manuscript and replaced with a more generic reference to ‘neuronal activity’. In this context we have included further evidence supporting reduced activity of THD’ neurons upon loss of SOCE in the revision.

      Since one of the key functional outcomes of activity during critical developmental periods, such as the 72-96 hrs APF developmental window identified in this study, is remodelling of neuronal morphology, we decided to investigate the same in our context. Neuronal activity can drive changes in neurite complexity and axonal arborization (Depetris-Chauvin et al., 2011), especially during critical developmental periods (Sachse et al., 2007). To understand if Orai mediated Ca2+ entry and downstream gene expression through Set2 affects this activity-driven parameter, we investigated the morphology of fpDANs, and specifically measured the complexity of presynaptic terminals within the γ2α’1 lobe of the MB using super-resolution microscopy. We found striking changes in the neurite volume upon expression of OraiE180A which could be rescued by restoring either Set2 (OraiE180A; Set2OE) or by inducing hyperactivity through NachBac expression (OraiE180A; NachBacOE). These data have been included in the revised manuscript (Figure 8 B, C, D; see Lines 481-482; 519-534; 584-591; 701-704).

      12) Related, since trl does not contain any molecular domains that could be regulated by Ca2+ signaling, it is unclear whether trl is directly regulated by SOCE or the regulation is highly indirect. Reporter assays evaluating trl activation upon Ca2+ rises would provide much stronger and more direct evidence for the conclusion that trl is a SOCE-regulated TF. As such the evidence is entirely based on RNAi downregulation of trl which indicates that trl is essential but has no bearing on exactly what point of the signaling cascade it is involved.

      We agree that luciferase Trl reporters would provide a direct method to test SOCE-mediated activation. Future investigations will be targeted in this direction. Regarding possible mechanisms of Trl activation: since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it may be phosphorylation by a Ca2+ sensitive kinase. Phosphoproteome analysis of Trl indicates that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which may be mediated by the Ca2+ sensitive kinase CaMKII based on binding partners identified in the Trl interactome (Lomaev et al., 2018). Past work (Ravi et al., 2018) has indeed demonstrated a requirement for CaMKII in THD’ neurons for flight. We are currently testing whether CaMKII functions downstream of SOCE in these neurons to mediate flight, and will be updating this information in the next version of the manuscript.

      New data and analysis have been included (see Figure S5; ‘Limitations of the study’- Lines 397-424; 736-739).

      13) Are NFAT levels altered in the Orai1 loss of function mutant? If not, this should be explicitly stated. It would seem based on previous literature that some gene regulation may be related to the downregulation of this established Ca2+-dependent transcription factor. Same for NFkb.

      As mentioned in the revised version of the manuscript (see Lines 315-326), Drosophila NFAT lacks a calcineurin binding site and is therefore not sensitive to Ca2+ (Keyser et al., 2007). In the past we tested whether knockdown of NF-kB in dopaminergic neurons gave a flight phenotype and did not observe any measurable deficit. From the RNAseq data we find a slight downregulation of NFAT (0.49 fold, p value = 0.048) and NF-kB (0.26 fold, p value = 0.258), the significance of which is unclear at this point. We did not find any consensus binding sites for these two factors in the regulatory regions of downregulated genes from THD’ neurons.

      14) Does over-expression of Set2 restore ion channel expression especially those of the VGCCs? This would provide rigorous, direct evidence that SOCE-mediated regulation of VGCCs through Set2 controls voltage-gated calcium channel signaling.

      Set2 overexpression in the OraiE180A background indeed restores the expression of VGCC genes (see Figure 6H; Lines 461-468).

      15) All 6 representative panels from Figure 3B are duplicated in Figure 4G. Likewise, 2 representative panels from Figure 5H are duplicated in Figure 6D. Although these panels all represent the results from control experiments, the relevant experiments were likely not conducted at the same time and under the same conditions. Thus, control images from other experiments should not be used simply because they correspond to controls. This situation should be clarified.

      We regret the confusion caused by the same representative images for the control experiments. These have been replaced by new representative images for Figure 4G and 6D in the updated version of the manuscript.

      16) The figures are unusually busy and difficult to follow. In part this is because they usually have many panels (Fig. 1: A-I; Fig. 2, A-J, etc) but also because the arrangement of the panels is not consistent: sometimes the following panel is found to the right, other times it is below. It would help the reader to make the order of the panels consistent, and, if possible, reduce the number of panels and/or move some of the panels to new figures (eLife does not limit the number of display items).

      The image panels have been rearranged for ease of reading in the updated version of the manuscript.

      17) As a final recommendation, the reviewers suggest that the authors a- Reword the text that refers to membrane excitability since membrane excitability was not directly measured here. b-Explain why STIM1 rescues the partial loss of flight in Set2 RNAi flies (Fig. S2E); and c- Explain how/why trl is calcium regulated and test using luciferase (or other) reporter assays whether Orai activation leads to trl activation.

      a. Textual references to membrane excitability have been appropriately modified and some new data has been included in this regard (see Figure 8 B, C, D; Lines 481-483; 519-534; 584-591; 701-704).

      b. We have provided a detailed explanation for how STIM overexpression might rescue the phenotypes caused by Set2RNAi in Point 1 (see Lines 226-233). In short, these phenotypes depend upon IP3R-mediated Ca2+ entry driving a transcriptional feedback loop. We relied upon past reports that STIM overexpression upregulates IP3R-mediated Ca2+ release and SOCE in Drosophila itpr mutant neurons (Agrawal et al., 2010; Chakraborty et al., 2016; Deb et al., 2016). We therefore propose that STIM overexpression in the Set2RNAi background rescues IP3R-mediated Ca2+ release followed by SOCE, which drives enhanced Set2 transcription, counteracting the effects of the RNAi. We will explain this more clearly with past references in the next revision.

      c. We have provided a detailed response to this comment in Point 12. Briefly, we agree that building luciferase reporters for Trl could be an ideal strategy to test for its responsiveness to SOCE and needs to be done in future. As an alternate strategy, we have looked at data from existing studies of interacting partners of Trl (Lomaev et al., 2017) and identified CaMKII, which is Ca2+ responsive (Braun and Schulman, 1995; Yasuda et al., 2022) and thus might activate Trl through a phosphorylation-switch-like mechanism (see Figure S5; ‘Limitations of the study’, Lines 397-424; 736-739). Moreover, a previous publication identified a requirement for CaMKII in THD’ neurons for Drosophila flight (Ravi et al., 2018). We have tested the ability of a dominant active version of CaMKII to rescue THD’>E180A flight deficits and have included this information in the next version of the manuscript.

      References

      1. Agrawal N, Venkiteswaran G, Sadaf S, Padmanabhan N, Banerjee S, Hasan G. Inositol 1,4,5-Trisphosphate Receptor and dSTIM Function in Drosophila Insulin-Producing Neurons Regulates Systemic Intracellular Calcium Homeostasis and Flight. J Neurosci. 2010. 30:1301-1313. doi:10.1523/jneurosci.3668-09.2010

      2. Braun AP, Schulman H. A non-selective cation current activated via the multifunctional Ca(2+)-calmodulin-dependent protein kinase in human epithelial cells. J Physiol. 1995. 488:37-55. doi:10.1113/jphysiol.1995.sp020944

      3. Chakraborty S, Deb BK, Chorna T, Konieczny V, Taylor CW, Hasan G. Mutant IP3 receptors attenuate store-operated Ca2+ entry by destabilizing STIM-Orai interactions in Drosophila neurons. J Cell Sci. 2016. 129:3903-3910. doi:10.1242/jcs.191585

      4. Deb BK, Pathak T, Hasan G. Store-independent modulation of Ca2+ entry through Orai by Septin 7. Nat Commun. 2016. 7:11751. doi:10.1038/ncomms11751

      5. Depetris-Chauvin A, Berni J, Aranovich EJ, Muraro NI, Beckwith EJ, Ceriani MF. Adult-specific electrical silencing of pacemaker neurons uncouples molecular clock from circadian outputs. Curr Biol. 2011. 21:1783-1793. doi:10.1016/j.cub.2011.09.027

      6. Keyser P, Borge-Renberg K, Hultmark D. The Drosophila NFAT homolog is involved in salt stress tolerance. Insect Biochem Mol Biol. 2007. 37:356-362. doi:10.1016/j.ibmb.2006.12.009

      7. Kilo L, Stürner T, Tavosanis G, Ziegler AB. Drosophila Dendritic Arborisation Neurons: Fantastic Actin Dynamics and Where to Find Them. Cells. 2021. 10:2777. doi:10.3390/cells10102777

      8. Lomaev D, Mikhailova A, Erokhin M, et al. The GAGA factor regulatory network: Identification of GAGA factor associated proteins. PLoS One. 2017. 12:e0173602. doi:10.1371/journal.pone.0173602

      9. Mitra R, Richhariya S, Jayakumar S, Notani D, Hasan G. IP3/Ca2+ signals regulate larval to pupal transition under nutrient stress through the H3K36 methyltransferase dSET2. Development. 2021. 148:dev199018. doi:10.1101/2020.11.25.399329

      10. Pathak T, Agrawal T, Richhariya S, Sadaf S, Hasan G. Store-Operated Calcium Entry through Orai Is Required for Transcriptional Maturation of the Flight Circuit in Drosophila. J Neurosci. 2015. 35:13784-13799. doi:10.1523/jneurosci.1680-15.2015

      11. Ravi P, Trivedi D, Hasan G. FMRFa receptor stimulated Ca2+ signals alter the activity of flight modulating central dopaminergic neurons in Drosophila melanogaster. Barsh GS, ed. PLOS Genet. 2018. 14:e1007459. doi:10.1371/journal.pgen.1007459

      12. Sachse S, Rueckert E, Keller A, Okada R, Tanaka NK, Ito K, Vosshall LB. Activity-dependent plasticity in an olfactory circuit. Neuron. 2007. 56:838-850. doi:10.1016/j.neuron.2007.10.035

      13. Sharma A, Hasan G. Modulation of flight and feeding behaviours requires presynaptic IP3Rs in dopaminergic neurons. eLife. 2020. 9:e62297. doi:10.7554/elife.62297

      14. Venkiteswaran G, Hasan G. Intracellular Ca2+ signalling and store operated Ca2+ entry are required in Drosophila neurons for flight. Proc Natl Acad Sci. 2009. 106:10326-10331. doi:10.1073/pnas.0902982106

      15. Yasuda R, Hayashi Y, Hell JW. CaMKII: a central molecular organizer of synaptic plasticity, learning and memory. Nat Rev Neurosci. 2022. 23:666-682. doi:10.1038/s41583-022-00624-2

      16. Zhai B, Villén J, Beausoleil SA, Mintseris J, Gygi SP. Phosphoproteome Analysis of Drosophila melanogaster Embryos. J Proteome Res. 2008. 7:1675-1682. doi:10.1021/pr700696a

    1. Author Response

      eLife assessment

      This useful study addresses epilepsy caused by the loss of a molecule called Pten, resulting in hyperactivity of the mTOR pathway. The findings suggest that inhibiting two molecules called mTORC1 and mTORC2 can reduce epilepsy symptoms but there is much less effect when inhibited separately. The evidence supporting the conclusions is currently incomplete, but could be strengthened after additional experiments.

      We thank the editors for this assessment and the reviewers for their comments. We will consider each of the recommendations we received and revise the manuscript accordingly.

      Reviewer #1 (Public Review):

      Hyperactivation of mTOR signaling causes epilepsy. It has long been assumed that this occurs through overactivation of mTORC1, since treatment with the mTORC1 inhibitor rapamycin suppresses seizures in multiple animal models. However, the recent finding that genetic inhibition of mTORC1 via Raptor deletion did not stop seizures while inhibition of mTORC2 did, challenged this view (Chen et al, Nat Med, 2019). In the present study, the authors tested whether mTORC1 or mTORC2 inhibition alone was sufficient to block the disease phenotypes in a model of somatic Pten loss-of-function (a negative regulator of mTOR). They found that inactivation of either mTORC1 or mTORC2 alone normalized brain pathology but did not prevent seizures, whereas dual inactivation of mTORC1 and mTORC2 prevented seizures. As the functions of mTORC1 versus mTORC2 in epilepsy remain unclear, this study provides important insight into the roles of mTORC1 and mTORC2 in epilepsy caused by Pten loss and adds to the emerging body of evidence supporting a role for both complexes in the disease development.

      Strengths:

      The animal models and the experimental design employed in this study allow for a direct comparison between the effects of mTORC1, mTORC2, and mTORC1/mTORC2 inactivation (i.e., same animal background, same strategy and timing of gene inactivation, same brain region, etc.). Additionally, the conclusions on brain epileptic activity are supported by analysis of multiple EEG parameters, including seizure frequencies, sharp wave discharges, interictal spiking, and total power analyses.

      Weaknesses:

      1) The sample size of the study is small and does not allow for the assessment of whether mTORC1 or mTORC2 inactivation reduces seizure frequency or incidence. This is a limitation of the study.

      We agree that this is a minor limitation of the present study. However, for several reasons, we decided not to pursue this question by increasing the number of animals. First, we performed a power analysis of the existing data. This analysis showed that, to detect a significant difference from Pten alone in the frequency of generalized seizures (power = 0.8, p = 0.05, Mann-Whitney test), we would need 89 animals per group for the Pten-Raptor group and 31 animals per group for the Pten-Rictor group. It is simply not feasible to perform EEG monitoring on this many animals. Second, even if we did enough experiments to detect a reduction in seizure frequency, it is clear that neither Raptor nor Rictor deletion provides the kind of normalization in brain activity that we seek in a targeted treatment. Both Pten-Raptor and Pten-Rictor animals still have very frequent spike-wave events (Fig. 3D) and highly abnormal interictal EEGs (Fig. 4), suggesting that even if generalized seizures were reduced, epileptic brain activity persists. This is in contrast to the triple KO animals, which have no increase in SWD above control level and very normal interictal EEG.
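
      To illustrate how per-group sample sizes of this order can arise for a rank-based test, the following is a minimal sketch using Noether's normal approximation for the Mann-Whitney test with equal group sizes. It is not the power analysis performed for this study: the function name and the effect size `p_superiority` (the probability that a random observation from one group is smaller than one from the other) are assumptions introduced purely for illustration.

```python
import math
from statistics import NormalDist


def noether_n_per_group(p_superiority, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided Mann-Whitney test.

    Noether's approximation with equal group sizes:
        n = (z_{1-alpha/2} + z_{power})^2 / (6 * (p - 0.5)^2)
    where p = P(X < Y) is a hypothetical effect size, not a value
    taken from the study.
    """
    if not 0.0 < p_superiority < 1.0 or p_superiority == 0.5:
        raise ValueError("p_superiority must lie in (0, 1) and differ from 0.5")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)              # quantile for target power
    n = (z_alpha + z_power) ** 2 / (6.0 * (p_superiority - 0.5) ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # Smaller effects (p closer to 0.5) require many more animals per group.
    for p in (0.60, 0.65, 0.70):
        print(p, noether_n_per_group(p))
```

      Because the required n scales with the inverse square of the distance between `p_superiority` and 0.5, modest differences in effect size between two comparisons can translate into large differences in the number of animals needed.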

      2) The authors describe that they inactivated mTORC1 and mTORC2 in a new model of somatic Pten loss-of-function in the cortex. This is slightly misleading since Cre expression was found both in the cortex and the underlying hippocampus, as shown in Figure 1. Throughout the manuscript, they provide supporting histological data from the cortex. However, since Pten loss-of-function in the hippocampus can lead to hippocampal overgrowth and seizures, data showing the impact of the genetic rescue in the hippocampus would further strengthen the claim that neither mTORC1 nor mTORC2 inactivation prevents seizures.

      Thank you for pointing out this issue. Cre expression was observed in both the cortex and the dorsal hippocampus in most animals, and we agree that differences in cortical versus hippocampal mTOR signaling could have differential contributions to epilepsy. We focused our studies on the cortex because spike-and-wave discharge, the most frequent and fully penetrant EEG phenotype in our model, is associated with cortical dysfunction. We had also performed a preliminary analysis of the hippocampal Cre expression, which suggested that Cre expression in the hippocampus did not affect generalized seizure occurrence. We plan to include data on Cre expression in the hippocampus in the revised version of the manuscript.

      3) Some of the methods for the EEG seizure analysis are unclear. The authors describe that for control and Pten-Raptor-Rictor LOF animals, all 10-second epochs in which signal amplitude exceeded 400 μV at two time-points at least 1 second apart were manually reviewed, whereas, for the Pten LOF, Pten-Raptor LOF, and Pten-Rictor LOF animals, at least 100 of the highest-amplitude traces were manually reviewed. Does this mean that not all flagged epochs were reviewed? This could potentially lead to missed seizures.

      We reviewed at least 48 hours of data from each animal manually. All seizures that were identified during manual review were also identified by the automated detection program. It is possible but unlikely that there are missed seizures in the remaining data.
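
      The flagging rule quoted by the reviewer (10-second epochs in which the signal exceeds 400 μV at two time-points at least 1 second apart) can be sketched as follows. This is a hedged illustration rather than the detection program used in the study: the function name, the use of absolute amplitude, and the non-overlapping epoch layout are all assumptions.

```python
def flag_epochs(signal_uv, fs_hz, epoch_s=10.0, threshold_uv=400.0, min_sep_s=1.0):
    """Return indices of non-overlapping epochs that satisfy the amplitude rule.

    An epoch is flagged when at least two samples exceed the threshold in
    absolute amplitude (assumption) and the first and last such samples are
    at least `min_sep_s` seconds apart.
    """
    epoch_len = int(round(epoch_s * fs_hz))
    min_sep = int(round(min_sep_s * fs_hz))
    flagged = []
    # Walk over complete epochs only; a trailing partial epoch is ignored.
    for start in range(0, len(signal_uv) - epoch_len + 1, epoch_len):
        epoch = signal_uv[start:start + epoch_len]
        above = [i for i, v in enumerate(epoch) if abs(v) > threshold_uv]
        if above and above[-1] - above[0] >= min_sep:
            flagged.append(start // epoch_len)
    return flagged


if __name__ == "__main__":
    fs = 10  # hypothetical sampling rate in Hz, kept low for readability
    trace = [0.0] * 300
    trace[105], trace[120] = 450.0, -500.0  # crossings 1.5 s apart
    trace[205], trace[208] = 450.0, 450.0   # crossings only 0.3 s apart
    print(flag_epochs(trace, fs))
```

      Flagged epochs would then be passed to manual review; the rule itself only narrows the search and does not classify events.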

      4) Additionally, the inclusion of how many consecutive hours were recorded among the ~150 hours of recording per animal would help readers with the interpretation of the data.

      Thank you for this recommendation. We plan to include a table with more information about the EEG recordings in the revised version of the manuscript. The number of consecutive hours recorded varied because the wireless system depends on battery life, which was inconsistent, but each animal was recorded for at least 48 consecutive hours on at least two occasions.

      5) Finally, it is surprising that mTORC2 inactivation completely rescued cortical thickness since such pathological phenotypes are thought to be conserved down the mTORC1 pathway. Additional comments on these findings in the Discussion would be interesting and useful to the readers.

      Soma size was increased 120% by Pten inactivation and partially normalized to a 60% increase from Controls by mTORC2 inactivation (Fig. 2C). We and others have previously shown that mTORC2 inactivation in neurons reduces both soma size and dendritic outgrowth (PMIDs: 36526374, 32125271, 23569215). Thus, we do not find it completely surprising that mTORC2 inactivation reduces the cortical thickness increase caused by Pten loss. There may still be a slight increase in cortical thickness in Pten-Rictor animals, but it is statistically indistinguishable from Controls. We will elaborate on this in our revised submission.

      Reviewer #2 (Public Review):

      Summary:

      The study by Cullen et al presents intriguing data regarding the contribution of mTOR complex 1 (mTORC1) versus mTORC2 or both in Pten-null-induced macrocephaly and epileptiform activity. The role of mTORC2 in mTORopathies, and in particular Pten loss-of-function (LOF)-induced pathology and seizures, is understudied and controversial. In addition, recent data provided evidence against the role of mTORC1 in PtenLOF-induced seizures. To address these controversies and the contribution of these mTOR complexes in PtenLOF-induced pathology and seizures, the authors injected an AAV9-Cre into the cortex of conditional single, double, and triple transgenic mice at postnatal day 0 to remove Pten, Pten+Raptor or Rictor, and Pten+Raptor+Rictor. Raptor and Rictor are essentially binding partners of mTORC1 and mTORC2, respectively. One major finding is that despite preventing mild macrocephaly and increased cell size, Raptor knockout (KO, decreased mTORC1 activity) did not prevent the occurrence of seizures or reduce the rate of SWD events, and aggravated seizure duration. Similarly, Rictor KO (decreased mTORC2 activity) partially prevented mild macrocephaly and increased cell size but did not prevent the occurrence of seizures and did not affect seizure duration. However, Rictor KO reduced the rate of SWD events. Finally, the pathology and seizure/SWD activity were fully prevented in the double KO. These data suggest the contribution of both increased mTORC1 and mTORC2 in the pathology and epileptic activity of Pten LOF mice, emphasizing the importance of blocking both complexes for seizure treatment. Whether these data apply to other mTORopathies due to Tsc1, Tsc2, mTOR, AKT or other gene variants remains to be examined.

      Strengths:

      The strengths are as follows: 1) they address an important and controversial question that has clinical application, 2) the study uses a reliable and relatively easy method to KO specific genes in cortical neurons, based on AAV9 injections in pups, 3) they perform careful video-EEG analyses correlated with some aspects of cellular pathology.

      Weaknesses:

      The study nevertheless has a few weaknesses: 1) the conclusions are perhaps a bit overstated. The data do not show that increased mTORC1 or mTORC2 are sufficient to cause epilepsy. However, the data clearly show that both increased mTORC1 and mTORC2 activity contribute to the pathology and seizure activity and as such are necessary for seizures to occur.

      We agree that our findings do not directly show that either mTORC1 or mTORC2 hyperactivity is sufficient to cause seizures, as we do not individually hyperactivate each complex in the absence of any other manipulation. We interpreted our findings in this model as suggesting that either is sufficient based on the result that there is no epileptic activity when both are inactivated, and thus assumed that there is not a third, mTOR-independent, mechanism that is contributing to epilepsy in Pten, Pten-Raptor, and Pten-Rictor animals. In addition, the histological data show that Raptor and Rictor loss each normalize activity through mTORC1 and mTORC2 respectively, suggesting that one in the absence of the other is sufficient. However, we agree that there could be other potential mTOR-independent pathways downstream of Pten loss that contribute to epilepsy. We will revise the manuscript to reflect this.

      2) the data related to the EEG would benefit from having more mice. Adding more mice would have helped determine whether there was a decrease in seizure activity with the Rictor or Raptor KO.

      Please see response to Reviewer 1’s first Weakness.

      3) it would have been interesting to examine the impact of mTORC2 and mTORC1 overexpression related to point #1 above.

      We are not sure that overexpression of individual components of mTORC1 or mTORC2 would result in their hyperactivation or lead to increases in downstream signaling. We believe that cleanly and directly hyperactivating mTORC1 or especially mTORC2 in vivo without affecting the other complex or other potential interacting pathways is a difficult task. Previous studies have used mTOR gain-of-function mutations as a means to selectively activate mTORC1 or pharmacological agents to selectively activate mTORC2, but it is not clear to us that the former does not affect mTORC2 activity as well, or that the latter achieves activation of mTORC2 targets other than p-Akt 473, or that it is truly selective. We agree that these would be key experiments to further test the sufficiency hypothesis, but the amount of work that would be required to perform them is more than what we can do in this Short Report.

      Reviewer #3 (Public Review):

      Summary: This study investigated the role of mTORC1 and 2 in a mouse model of developmental epilepsy which simulates epilepsy in cortical malformations. Given activation of genes such as PTEN activates TORC1, and this is considered to be excessive in cortical malformations, the authors asked whether inactivating mTORC1 and 2 would ameliorate the seizures and malformation in the mouse model. The work is highly significant because a new mouse model is used where Raptor and Rictor, which regulate mTORC1 and 2 respectively, were inactivated in one hemisphere of the cortex. The work is also significant because the deletion of both Raptor and Rictor improved the epilepsy and malformation. In the mouse model, the seizures were generalized or there were spike-wave discharges (SWD). They also examined the interictal EEG. The malformation was manifested by increased cortical thickness and soma size.

      Strengths: The presentation and writing are strong. The quality of data is strong. The data support the conclusions for the most part. The results are significant: Generalized seizures and SWDs were reduced when both Torc1 and 2 were inactivated but not when one was inactivated.

      Weaknesses: One of the limitations is that it is not clear whether the area of cortex where Raptor or Rictor were affected was the same in each animal.

      We plan to include data further describing the location of knockout in each animal (in both the hippocampus and cortex) in the revised version of the paper. Initial analyses indicated that the affected area did not differ between groups.

      Also, it is not clear which cortical cells were measured for soma size.

      In the Methods it says “Soma size was measured by dividing Nissl stain images into a 10 mm2 grid. The somas of all GFP-expressing cells fully within three randomly selected grid squares in Layer II/III were manually traced.” Earlier under “Histology and imaging” it says “Three sections per animal at approximately Bregma -1.6, -2.1, and -2.6 were used.”

      Another limitation is that the hippocampus was affected as well as the cortex. One does not know the role of cortex vs. hippocampus. Any discussion about that would be good to add.

      See response to Reviewer 1’s second Weakness.

      It would also be useful to know if Raptor and Rictor are in glia, blood vessels, etc.

      Raptor and Rictor are thought to be ubiquitously active in mammalian cells including glia and endothelial cells. Previous studies have shown that mTOR manipulation can affect astrocyte function and blood vessel organization; however, our study induced gene knockout using an AAV that expressed Cre under control of the hSyn promoter, which has previously been shown to be selective for neurons. Manual assessment of Cre expression compared with DAPI, NeuN, and GFAP stains suggested that only neurons were affected.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript investigates how humans store temporal sequences of tones in working memory. The authors mainly focus on a theory named "Language of thought" (LoT). Here the structure of a stimulus sequence can be stored in a tree structure that integrates the dependencies of a stimulus stored in working memory. To investigate the LoT hypothesis, participants listened to multiple stimulus sequences that varied in complexity (e.g., alternating tones vs. nearly random sequence). Simultaneously, the authors collected fMRI or MEG data to investigate the neuronal correlates of LoT complexity in working memory. Critical analysis was based on a deviant tone that violated the stored sequence structure. Deviant detection behavior and a bracketing task allowed a behavioral analysis.

      Results showed accurate bracketing and fast/correct responses when LoT complexity is low. fMRI data showed that LoT complexity correlated with the activation of 14 clusters. MEG data showed that LoT complexity correlated mainly with activation from 100-200 ms after stimulus onset. These and other analyses presented in the manuscript lead the authors to conclude that such tone sequences are represented in human memory using LoT in contrast to alternative representations that rely on distinct memory slot representations.

      Strengths

      The study provides a concise and easily accessible introduction. The task and stimuli are well described and allow a good understanding of what participants experience while their brain activation is recorded. Results are extensive as they include multiple behavioral investigations and brain activation data from two different measurement modalities. The presentation of the behavioral results is intuitive. The analysis provided a direct comparison of the LoT with an alternative model based on estimating a transition-probability measure of surprise.

      For the fMRI data, the whole brain analysis was accompanied by detailed region of interest analyses, including time course analysis, for the activation clusters correlated with LoT complexity. In addition, the activation clusters have been set in relation (overlap and region of interest analyses) to a math and a language localizer. For the MEG data, the authors investigated the LoT complexity effect based on linear regression, including an analysis that also included transitional probabilities and multivariate decoding analysis. The discussion of the results focused on comparing the activation patterns of the task with the localizer tasks. Overall, the authors have provided considerable new data in multiple modalities on a well-designed experiment investigating how humans represent sequences in auditory working memory.

      Weaknesses

      The primary issue of the manuscript is the missing formal description of the LoT model and alternatives, inconsistencies in the model comparisons, and no clear argumentation that would allow the reader to understand the selection of the alternative model. Similar to a recent paper by similar authors (Planton et al., 2021 PLOS Computational Biology), an explicit model comparison analysis would allow a much stronger conclusion. Also, these analyses would provide a more extensive evidence base for the favored LoT model. Needed would be a clear argumentation for why the transitional probabilities were identified as the most optimal alternative model for a critical test. A clear description of the models (e.g., how many free parameters) and a description of the simulation procedure (e.g., are they trained, etc.) Here it would be strongly advised to provide the scripts that allow others to reproduce the simulations.

      We thank the reviewer for the requests and critiques. Although this paper follows upon our extensive prior behavioral work (Planton et al.), we agree that it should stand alone and that therefore the models need to be described more fully. We have now added a formal description of the LoT in the subsection The Language of Thought for binary sequences in the Results section and have added a formal and verbal description of the selected sequences in Figure 1-figure supplement 1. Furthermore, we added a model comparison similar to the one done in (Planton et al., 2021 PLOS Computational Biology). This analysis is now included in Figure 2 and in the Behavioral data subsection of the Results section. It replicates previous behavioral results obtained in Planton et al., 2021 PLOS Computational Biology, namely that complexity, as measured by minimal description length in the binary version of the “language of geometry” was the best predictor of participants’ behaviour.

      Interestingly, we found that the model that considered both complexity and surprise had an even lower AIC, suggesting that statistical learning is simultaneously occurring in the brain (Maheu, Dehaene and Meyniel, 2019, eLife, "Brain signatures of a multiscale process of sequence learning in humans"). In this respect, we do not consider surprise from transition probabilities as an alternative model but rather as a mechanism that is occurring in parallel to sequence compression. The main goal of this work was to determine how sequence processing was affected by sequence structure, as captured by the language of thought. In this line, we did not select the tested sequences in order to investigate statistical learning but, instead, chose them to have similar global statistical properties.
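
      As a rough illustration of this kind of AIC-based comparison, the sketch below fits two single-predictor least-squares models and compares their AIC values. It assumes Gaussian errors and drops the additive constant shared by all models; the predictor values and the behavioral measure are invented for illustration and are not data from the study, and the helper names are hypothetical.

```python
import math


def ols_rss(x, y):
    """Residual sum of squares of a least-squares line y ~ a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def gaussian_aic(rss, n_obs, n_params):
    """AIC for a Gaussian-error model, up to a constant common to all models."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


if __name__ == "__main__":
    # Hypothetical per-sequence predictors and a hypothetical behavioral score.
    predictor_a = [1, 2, 3, 4, 5, 6, 7]
    predictor_b = [3, 1, 4, 1, 5, 9, 2]
    score = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.9]
    n = len(score)
    k = 3  # intercept, slope, and error variance
    aic_a = gaussian_aic(ols_rss(predictor_a, score), n, k)
    aic_b = gaussian_aic(ols_rss(predictor_b, score), n, k)
    print("preferred:", "a" if aic_a < aic_b else "b")
```

      A model with an extra predictor pays a penalty of 2 per added parameter, so it is preferred only if it reduces the residual sum of squares enough to offset that penalty.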

      The MEG experiment provided us with the opportunity to separate temporally the contributions of statistical mechanisms from the ones of sequence compression according to the language of thought. Indeed, contrary to the fMRI experiment, we could model at the item level the statistical properties of individual sounds. We report the results when accounting jointly for statistical processing and LoT-complexity in Supplementary materials.

      The different models considered in previous work didn’t need to be trained. The sequence complexity they provided could be analytically computed based on sequence minimal description length.

      Furthermore, the manuscript needs a clear motivation for the type of sequences and some methodological decisions. Central here is the quadratic trend selectively used for the fMRI analysis but not for the other datasets.

      To design the MEG experiment, we had to decrease the number of sequences from 10 to 7. We selected them based on their LoT-complexity and the type of sequence information they spanned. As a consequence, the predictors for linear and quadratic complexity are very correlated (82%). Unfortunately, due to low SNR, this does not allow us to robustly account for the contribution of quadratic complexity to the MEG-recorded brain signals. Still, in response to the referee, we performed a linear regression as a function of quadratic complexity on the residuals of the regression as a function of statistics and complexity, which we report here. No significant clusters were found for habituation and standard trials, but two were found (corresponding to the same topography) for deviant trials at late time-points.

      Author response image 1 shows the regression coefficients for the quadratic complexity regressor, fitted on the residuals of the regression on surprise from transition probabilities and complexity. Author response image 2 shows the two significant clusters found for the deviant sounds.

      We also averaged the decoding scores from Figure 7A over the time-window obtained from the temporal cluster-based permutation test (see Author response image 2). The choice of complexity values did not allow any clear assessment of the contribution of the quadratic complexity term.

      In summary, in the current design, we do not think that the number of tested sequences allows us to clearly conclude that no quadratic effect can be found for Habituation and Standard trials. We would need to re-design an experiment to test specifically the quadratic complexity contribution to brain signals in MEG.
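
      The two-stage logic described above (regress the signal on a first predictor, then regress the residuals on a quadratic term) can be sketched as follows. This is a simplified single-predictor illustration under stated assumptions, not the MEG analysis pipeline: the helper names and the toy predictor values are hypothetical, and the first stage here uses only one regressor rather than both statistics and complexity.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y ~ a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """Residuals of y after removing its least-squares fit on x."""
    slope, intercept = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def second_stage_slope(linear, quadratic, signal):
    """Slope of the quadratic term fitted on residuals of signal ~ linear."""
    return fit_line(quadratic, residuals(linear, signal))[0]


if __name__ == "__main__":
    linear = [1, 2, 3, 4, 5, 6, 7]           # hypothetical complexity values
    quadratic = [c * c for c in linear]      # uncentered quadratic term
    signal = [2 * c + 0.5 * q for c, q in zip(linear, quadratic)]
    print("r(linear, quadratic) =", round(pearson_r(linear, quadratic), 3))
    print("second-stage slope =", second_stage_slope(linear, quadratic, signal))
```

      When the two predictors are strongly correlated, the first stage absorbs most of the shared variance, so the second-stage coefficient is a conservative estimate of the quadratic contribution. In this noiseless toy case the recovered slope is well below the true value of 0.5, which is one way to see why a design with few, highly correlated complexity values limits what can be concluded about the quadratic term.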

      Author response image 1.

      Author response image 2.

      Also, the description of the linear mixed models is missing (e.g., the random effect structure, e.g., see Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv preprint arXiv:1506.04967.). Moreover, sample sizes have not been justified by a power analysis.

      The linear mixed model considered in this work is very simple: it only uses Subject as a random variable. This is now stated clearly in the corresponding part of the Experimental procedures section:

      To test whether subject performance correlated with LoT complexity, we performed linear regressions on group-averaged data, as well as linear mixed models including participant as the (only) random factor. The random effect structure of the mixed models was kept minimal, and did not include any random slopes, to avoid the convergence issues often encountered when attempting to fit more complex models.
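
      For reference, a random-intercept model of the kind described above can be written as follows. The notation is our own assumption and is not taken from the manuscript: y_{ij} is the performance of participant i on sequence j, C_j is the LoT complexity of sequence j, and u_i is the participant-specific intercept.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 C_j + u_i + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

      Because no random slope is included, the complexity effect \beta_1 is shared by all participants; only the baseline level varies from one participant to the next.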

    1. Author Response

      Reviewer #3 (Public Review):

      Myelodysplastic syndrome (MDS) is a heterogenous, clonal hematopoietic stem cell disorder characterized by morphological dysplasia in one or more hematopoietic lineages, cytopenias (most frequently anemia), and ineffective hematopoiesis. In patients with MDS, transfusion therapy treatment causes clinical iron overload; however it has been unclear if treatment with iron chelation yields clinical benefits. In the present study, the authors use a transgenic mouse model of MDS, NUP98-HOXD13 (referred to here as "MDS mice") to investigate this area. Starting at 5 months of age (before MDS mice progress to acute leukemia), the authors administered DFP in the drinking water for 4 weeks, and compared parameters to untreated MDS mice and WT controls.

      The authors first show that MDS mice exhibit systemic iron overload and macrocytic anemia that is improved by treatment with the iron chelator deferiprone (DFP). They then perform a detailed characterization of the effects of DFP treatment on erythroid differentiation and various parameters related to iron transport and trafficking in MDS erythroblasts. Strengths of the work are the use of a well-characterized mouse model of MDS with appropriate animal group sizes and detailed analyses of systemic iron parameters and erythroid subpopulations. A remediable weakness is that in certain areas of the Results and Discussion, the authors overinterpret their findings by inferring causation when they have only shown a correlation. Additionally, when drawing conclusions based on changes in erythroblast mRNA expression levels between groups, the authors should consider that translation efficiency may be altered in MDS and that the NUP98 fusion protein itself, by acting as a chimeric transcription factor, may also impact gene expression profiles. Given that the application of chelators for treatment of MDS remains controversial, this work will be of interest to scientists focused on erythroid maturation and iron dysregulation in MDS, as well as clinicians caring for patients with this disorder.

      Major Comments

      1) The authors define the stages of erythroblast differentiation using the CD44-FSC method, which assumes that CD44 expression levels during the stages of erythroid differentiation are not altered by MDS itself. Are morphologically abnormal erythroblasts, such as bi-nucleate forms, captured in this analysis, and if so, are they classified in the appropriate subset? The percentage of erythroblasts in the bone marrow of MDS mice in this current study is lower than that reported by Suragani et al (Nat Med 2014), who employed a different strategy to define erythroid precursors. While representative erythroblast gating is presented as Supplemental Figure 17, it would be important to present representative gating from all 3 animal groups: WT, MDS, and MDS+DFP mice.

      We appreciate this comment and have added representative gating for all 3 groups to Supplemental Figure 17 (new Figure 3 – figure supplement 6 in the revised manuscript).

      2) Methods, "Statistical analysis." The authors state that all comparisons were done with a 2-tailed paired Student's t test, which would not be appropriate for comparisons made between independent animal groups (i.e. when groups are not "paired").

      We appreciate this comment and have reanalyzed all revised mouse data using one-way ANOVA with multiple comparisons and Tukey post-test analyses when more than 2 groups were compared. This has been edited in the Methods section in the revised manuscript.
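
      As a rough illustration of why a one-way ANOVA suits three independent groups, the F statistic compares the variance between group means with the variance within groups. The sketch below uses made-up values (not data from the study) and a hypothetical helper name; it computes only the omnibus F statistic and does not perform the Tukey post-test, for which a statistics package would normally be used.

```python
# Minimal one-way ANOVA F statistic for independent groups.
# The measurements below are invented for illustration only.

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of individual values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three hypothetical groups standing in for WT, MDS, and MDS+DFP.
wt = [1.0, 2.0, 3.0]
mds = [2.0, 3.0, 4.0]
mds_dfp = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = one_way_anova_f([wt, mds, mds_dfp])
```

      Unlike a paired t test, this calculation never matches an animal in one group with an animal in another, so it remains valid when group sizes differ.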

      3) The Results (p.7) indicates that both sexes showed similar responses to DFP; however, the figure legends do not indicate sex. Given that systemic iron metabolism in mice shows sex-related differences, sex should be specified.

      We appreciate this comment and present here the gender-specific data for the reviewers’ evaluation (Author response image 1). Similarly elevated transferrin saturation (a) (n = 3-4 male mice/group and n = 4-6 female mice/group) and hemoglobin (b) (n = 4-6 male mice/group and n = 4-9 female mice/group) are observed in male and female DFP-treated MDS mice. (c) Bone marrow erythroblasts are decreased to a greater degree in male relative to female DFP-treated MDS mice (n = 4-7 male mice/group and n = 8-9 female mice/group). We have added the data on gender-specific measures to new Figure 1 – figure supplement 3, Figure 2 – figure supplement 1, and Figure 3 – figure supplement 1 in the revised manuscript.

      Author response image 1.

    1. Author Response

      Reviewer #1 (Public Review):

      Erbacher and colleagues provide further evidence for the function of epithelial cells as major contributors to the transduction of sensory stimuli. This technically advanced imaging study of human skin advances support for the anatomical and functional association of nerve fibers and skin keratinocytes. With combined high-resolution imaging and immunolabeling, the authors also advance the idea that gap junctions are at least one means by which direct neurochemical (e.g., ATP) communication from stimulated keratinocytes to nerve fibers can be achieved.

      A major strength of the study is the combined use of super-resolution array tomography (srAT), expansion microscopy, structured illumination microscopy and immunolabeling to analyze human skin in situ as well as co-cultures of human neurons and keratinocytes. High resolution static and video imaging of skin clearly supports the ensheathment by keratinocytes of nerve fiber projections as they traverse layers of the epidermis. Another strength of this study is the srAT imaging combined with connexin Cx43 immunolabeling that focus on sites of nerve fiber-keratinocyte contact zones. Imaging of Cx43+ plaques support these sites as regions of direct epithelial-neural contact and as such, of communication.

      Although imaging data support Cx43+/connexin plaques and neural ensheathment as regions of direct epithelial-neural communication, e.g., via keratinocyte release of ATP, this relationship remains correlative and lacking in quantification.

      The conclusion of this paper regarding the anatomical relationship between nerves and keratinocytes is well supported. Data also support the proposal of connexin plaques as sites of communication, although analyses that validate this relationship, using experimental models and in human samples, remain for future studies.

      Please note, comments referring to specific pages within the revised manuscript always refer to the tracked-word file version.

      Reviewer #2 (Public Review):

      Erbacher et al. have used new techniques to explore the neuro-cutaneous structures of human epidermis, which is a valuable goal given the lack of in-depth studies in human skin. Human skin is less studied than rodent skin because it presents challenges in obtaining samples and finding excellent immunohistological labels. They have employed expansion microscopy and super resolution array tomography for histological studies and have developed a human keratinocyte and human iPSC-derived sensory neuron co-culture. The authors have used these techniques to investigate the relation of intraepidermal nerve fibers (IENF) and keratinocytes, as well as to probe the localization of connexin 43. The data offer some anatomical insights but, as presented, do not add to our understanding of keratinocyte-neuron coupling.

      Strengths:

      This paper applies newer techniques to probe structure in human skin and establishes some useful immunohistochemical labels to do this, which sets up a foundation that will be valuable for future studies. The observation that IENF sometimes tunnel through keratinocytes is interesting, and the manuscript does show that Cx43 hemichannels are localized near IENF. Their data definitely represent a technical achievement, as these studies are challenging.

      Weaknesses:

      Throughout the paper, the authors imply that they make discoveries that shed light on neuro-cutaneous interactions, but the data in this manuscript do not offer any functional insight into connections between IENF and keratinocytes. For example, the final figure legend indicates they have found evidence of "electrical and chemical synapse-like contacts to nerve fibers" (Figure 9), but no such evidence was shown. Only a single neuron vesicular marker (synaptophysin) was shown to localize to neurons in culture, as expected. They also "...propose a crucial role of nerve fiber ensheathment and Cx43-based keratinocyte-fiber contacts in neuropathic pain and small fiber pathology." but do not show any data regarding the contribution of their anatomical findings to sensory function.

      We recognize that our anatomical findings do not provide a complete picture of neuro-cutaneous interactions. Related findings at the functional level, namely activation of nerve fibers after keratinocyte stimulation, were previously reported (Klusch et al., 2013; Mandadi et al., 2009; Sondersorg et al., 2014). However, these studies lack the morphological and molecular grounding and the human biomaterial/cells that we aimed to provide in our study. We agree that functional and anatomical findings need to be connected in the future. We rephrased and attenuated our conclusions on Cx43 contacts in the context of IENF-keratinocyte interaction.

      Their data do show that IENF are anatomically closely apposed to keratinocytes, but this is inevitable given their location in the epidermis. The expression of Cx43 in human epidermis is also known (PMID: 7518858) and localizing Cx43 plaques near IENF does not add to current knowledge, as wide expression in keratinocytes naturally positions them near the embedded IENF. There is no indication whether IENF also expresses Cx43 to form gap junctions. Moreover, due to the lack of quantification, it is not clear whether Cx43 labeling is enriched at IENF sites as compared to other areas on the keratinocytes.

      We appreciate previous work on Cx43 and have integrated respective findings in the revised Introduction of our manuscript (see page 3-4):

      “Connexin 43 (Cx43) pores are well established as a major signaling route for keratinocyte-keratinocyte communication (Tsutsumi et al., 2009) and potentially transduce external stimuli likewise towards afferents.”

      As the Reviewer highlighted, Cx43 is widely clustered between keratinocytes and serves as an intercellular signaling route. Similar to keratinocyte-keratinocyte contacts, gap junctions (homomeric/heteromeric) or hemichannels towards IENF are possible. We aimed to quantify Cx43 contacts in healthy control and small fiber neuropathy patient-derived skin sections, since alterations in these contacts would affirm their biological relevance. We have generated pilot data for relative quantification of Cx43 contacts in skin samples of healthy controls (n = 5) and patients with small fiber neuropathy (n = 4). We have added respective passages in the Methods (see page 16-18), Results (see page 31-33), and Discussion (see page 41) sections of our revised manuscript. Please also see Figure 5.

      The authors' implication that their anatomical data offers insight into neuro-cutaneous functional coupling is a leap that is evident throughout the manuscript.

      We have attenuated our tone throughout the manuscript e.g. in:

      Abstract (page 2):

      “Unraveling human intraepidermal nerve fiber ensheathment and potential interaction sites advances research at the neuro-cutaneous unit.”

      Discussion (page 42):

      ”Our observation of Cx43 plaques along the course of IENF in native skin and a human co-culture model substantiates a morphological basis and suggests keratinocyte hemichannels or gap junctions as one potential signaling pathway towards IENF.”

      Conclusion (page 44):

      “Epidermal keratinocytes show an astonishing set of interactions with sensory IENF including ensheathment and potential electrical and chemical synapse-like contacts to nerve fibers which may have substantial implications for the pathophysiological understanding of neuropathic pain and neuropathies.”

      References

      Jiang, N., Rasmussen, J.P., Clanton, J.A., Rosenberg, M.F., Luedke, K.P., Cronan, M.R., Parker, E.D., Kim, H.-J., Vaughan, J.C., Sagasti, A., 2019. A conserved morphogenetic mechanism for epidermal ensheathment of nociceptive sensory neurites. eLife 8, e42455.

      Klein, T., Gruener, J., Breyer, M., Schlegel, J., Schottmann, N.M., Hofmann, L., Gauss, K., Mease, R., Erbacher, C., Finke, L., 2023. Small fibre neuropathy in Fabry disease: a human-derived neuronal in vitro disease model. bioRxiv, 2023.08.09.552621.

      Klusch, A., Ponce, L., Gorzelanny, C., Schafer, I., Schneider, S.W., Ringkamp, M., Holloschi, A., Schmelz, M., Hafner, M., Petersen, M., 2013. Coculture model of sensory neurites and keratinocytes to investigate functional interaction: chemical stimulation and atomic force microscope-transmitted mechanical stimulation combined with live-cell imaging. J. Invest. Dermatol. 133, 1387-1390.

      Kruger, L., Perl, E., Sedivec, M., 1981. Fine structure of myelinated mechanical nociceptor endings in cat hairy skin. J. Comp. Neurol. 198, 137-154.

      Mandadi, S., Sokabe, T., Shibasaki, K., Katanosaka, K., Mizuno, A., Moqrich, A., Patapoutian, A., Fukumi-Tominaga, T., Mizumura, K., Tominaga, M., 2009. TRPV3 in keratinocytes transmits temperature information to sensory neurons via ATP. Pflugers. Arch. 458, 1093-1102.

      Sondersorg, A.C., Busse, D., Kyereme, J., Rothermel, M., Neufang, G., Gisselmann, G., Hatt, H., Conrad, H., 2014. Chemosensory information processing between keratinocytes and trigeminal neurons. J. Biol. Chem. 289, 17529-17540.

      Talagas, M., Lebonvallet, N., Leschiera, R., Sinquin, G., Elies, P., Haftek, M., Pennec, J.P., Ressnikoff, D., La Padula, V., Le Garrec, R., 2020. Keratinocytes Communicate with Sensory Neurons via Synaptic‐like Contacts. Ann. Neurol. 88, 1205-1219.

      Tavares-Ferreira, D., Shiers, S., Ray, P.R., Wangzhou, A., Jeevakumar, V., Sankaranarayanan, I., Cervantes, A.M., Reese, J.C., Chamessian, A., Copits, B.A., Dougherty, P.M., Gereau, R.W.t., Burton, M.D., Dussor, G., Price, T.J., 2022. Spatial transcriptomics of dorsal root ganglia identifies molecular signatures of human nociceptors. Sci. Transl. Med. 14, eabj8186.

      Tenenbaum, C.M., Misra, M., Alizzi, R.A., Gavis, E.R., 2017. Enclosure of Dendrites by Epidermal Cells Restricts Branching and Permits Coordinated Development of Spatially Overlapping Sensory Neurons. Cell Rep. 20, 3043-3056.

      Tobin, D.J., 2006. Biochemistry of human skin--our brain on the outside. Chem. Soc. Rev. 35, 52-67.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors provide compelling evidence that the activation of distinct populations of NTS neurons provides stronger decreases in eating/body weight when co-activated. Avoidance is not necessarily linked to the extent of the effects but seems to depend on specific neurons which when activated, not only reduce eating but also induce avoidance reactions. The results of this study provide strong data promoting multi-targeted approaches to reduce eating and body weight in obesity. Interestingly, none of the pathways identified is necessary for the weight-reducing effect of vertical sleeve gastrectomy. Future studies will hopefully shed light on the type of neurotransmitters released by these distinct populations of NTS neurons.

      We thank the reviewer for these helpful and supportive comments.

      Reviewer #2 (Public Review):

      Prior results established that Lepr, Calcr, and Cck neurons are non-overlapping neuronal populations in the NTS that individually suppress food intake when activated. This paper examines the consequences of activating or inhibiting two or three of these populations simultaneously. Activating two or three populations inhibits food intake and body weight more than activating each individually. Activation of Lepr and/or Calcr neurons is not aversive based on the conditioned taste aversion test, whereas activating all three is aversive by this test, indicating that aversion due to Cck neuron activation is dominant. Vertical sleeve gastrectomy (VSG) causes weight loss, but inhibiting each of these neuron populations individually or all three of them does not prevent weight loss. Overall, this paper provides a solid set of results but does not provide mechanistic insight into any of the phenomena examined.

      We have now added data demonstrating differences in the activation of FOS-IR in the downstream targets of our NTS neuron types, alone or in combination (new Figure 6). Our findings reveal that each population (NTSLepr, NTSCalcr, and NTSCck) activates an at least partially distinct set of neurons and that only NTSCck cells activate the known aversive PBN CGRP cells. These data suggest that the cumulative effects mediated by each of these NTS populations stem in part from their ability to activate at least partly distinct populations of downstream neurons.

      Unfortunately, it is outside of the scope of this manuscript (and the realm of the currently possible) to define the neurons that mediate the response to VSG, and we have now reorganized the manuscript to clarify that our VSG data (along with the feeding-induced FOS-IR data) serve to reveal that additional populations of neurons (other than NTSLCK cells) must contribute to the restraint of feeding.

    1. Author Response

      Reviewer #1 (Public Review):

      I believe it is important for the authors to clarify how the time frames to test for group differences of ERP components were defined. Were the components defined based on a grand average across lesions and controls or based on the maximum range for both groups? As the paper is written currently this is unclear to me. It is also unclear why the group comparisons between controls and lateral PFC group were based only on the control group. To ensure no inadvertent biases towards the larger control group were introduced and ensure the study's findings were reliable, it would be appreciated if the authors could clarify this.

      We thank the reviewer for the helpful comment. We recognize the need for a clearer definition of time frames for testing group differences in the ERP components and apologize for any ambiguity in the previous version of the manuscript.

      Regarding the time frames to test for group differences of ERP components for the OFC and control groups, they were determined based on the combined maximum range for both groups. The time range for each group and each ERP component was derived from the statistical analysis of the condition contrasts run for each group. For instance, for the Local Deviance MMN, the condition contrast (i.e., Control condition versus Local Deviance condition) for the CTR group revealed a MMN component from 67 to 128 ms, while the same condition contrast for the OFC group revealed a MMN from 73 to 131 ms. The time frame used for the group comparison on the MMN time window was 50 to 150 ms to capture component activity for both groups. In the same way, for the Local Deviance P3a, the condition contrast (i.e., Control condition versus Local Deviance condition) for the CTR group revealed a P3a component ranging from 141 to 313 ms, while the same condition contrast for the OFC group revealed a P3a from 145 to 344 ms. The time frame used for the group comparison on the P3a time window encompassed 140 to 350 ms to capture component activity for both groups.
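
      The window logic described above can be checked with a short sketch. The helper names are hypothetical; the millisecond values are the ones quoted in this response. The sketch only verifies that each test window contains the combined range of both groups; it makes no assumption about how the outer margins of the test windows were chosen.

```python
# Check that a group-comparison window spans the component ranges of both groups.
from typing import Iterable, Tuple

Window = Tuple[int, int]  # (start_ms, end_ms)

def combined_range(windows: Iterable[Window]) -> Window:
    """Smallest window that contains every per-group component window."""
    windows = list(windows)
    return min(start for start, _ in windows), max(end for _, end in windows)

def covers(test_window: Window, span: Window) -> bool:
    """True if test_window fully contains span."""
    return test_window[0] <= span[0] and test_window[1] >= span[1]

# Local Deviance MMN: CTR 67-128 ms, OFC 73-131 ms.
mmn_span = combined_range([(67, 128), (73, 131)])
# Local Deviance P3a: CTR 141-313 ms, OFC 145-344 ms.
p3a_span = combined_range([(141, 313), (145, 344)])

# Windows used for the group comparisons.
mmn_ok = covers((50, 150), mmn_span)
p3a_ok = covers((140, 350), p3a_span)
```

      Defining the test window from both groups, rather than from the control group alone, avoids truncating a component that starts or ends later in the patient group.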

      In the “Results” section of the main manuscript, together with the results from the cluster-based permutation independent samples t-tests, we provide the time frames in which the latter were computed for each ERP component. These segments have been highlighted with yellow in the revised manuscript. Moreover, in the section “Materials and methods - Statistical analysis of event-related potentials” of the main manuscript [page 37, paragraph 2], we provide a revised description of how the time frames for group differences of ERPs were defined. The revised description states: “In a second step, to check for differences in the ERPs between the two main study groups, we ran the same cluster-based permutation approach contrasting each of the four conditions of interest between the two groups using independent samples t-tests. The cluster-based permutation independent samples t-tests were computed in the latency range of each component, which was determined based on the maximum range for both groups combined. The latency range for each group and component was based on the time frames derived from the statistical analysis of task condition contrasts.”

      Regarding the comparisons between the lateral PFC and control groups, they were not based solely on the control group condition contrast. This was miswritten. The approach to define time frames to test for ERP differences between the CTR and the lateral PFC group was the same as the one used to test differences between CTR and OFC groups. We apologize for any confusion this may have caused. We have revised the erroneous statements in the Supplementary File 1 [highlighted text, page 9-10].

      An additional potential weakness of the paper, and one that if addressed would increase our confidence that neural differences arise because of the specific lesion effect, is the lack of evidence that the lesion and control groups do not differ on measures that could inadvertently bias the neural data. For example, while the groups did not differ on demographics and a range of broad cognitive functions, were there any differences between the number or distribution of bad/noisy channels in each subject between the two groups? Were there differences in the number of blinks/saccades or distribution of blinks or saccades across the conditions in each subject across the two groups?

      We thank the reviewer for this suggestion. We have completed a number of measurements and tests to ensure that the OFC lesion group and the control group did not differ on measures that could affect the neural data. First, we computed the number of bad/noisy channels for each subject and group, and found that the two groups did not differ significantly. Second, we computed the number of trials remaining after removing the noisy segments across conditions for each subject and group, and found no significant differences between the groups. Third, the number of blinks/saccades across conditions for each subject and group showed no significant group differences. Altogether, the results indicate that the neural differences observed in our study arose because of the specific lesion effect.

      These additional EEG measures and the statistical test results are included in the Supplementary File 1 [page 15-16] and Supplementary File 1g. We have also added text in the section “Materials and methods - EEG acquisition and pre-processing” of the main manuscript [page 35, paragraph 3], which states: “To ensure the validity of the neural data analysis, potential sources of bias were assessed between the healthy control participants and the OFC lesion patients. Specifically, no significant differences were observed between the two groups in terms of the number of noisy channels, the number of noisy trials, or the number of blinks across the task blocks and the experimental conditions.”

      On a similar note, while I appreciate this is a well established task could the authors clarify whether task difficulty is balanced across the different conditions? The authors appear to have used the counting task to ensure equal attention is paid across conditions although presumably the blocks differ in the number of deviant tones and therefore in the task difficulty. Typically, tasks to maintain attention are orthogonal to the main task and equally challenging across the different blocks. Is there a way to reassure readers that this has not affected the neural results?

      Thank you for pointing this out. Indeed, the experimental blocks differ in the number of deviant tones and therefore in the task difficulty. Thus, it is a very good suggestion to look for behavioral performance differences across the different blocks. In the present set of analyses, two block types were used: Regular (xX) and Irregular (xY). In regular blocks, where the repeated sequence is xxxxx, participants were required to count the rare/uncommon sequences, i.e., xxxxy and xxxxo. In irregular blocks, where the repeated sequence is xxxxy, participants were required to count the rare/uncommon sequences, i.e., xxxxx and xxxxo. We have now updated the behavioral analysis, first by excluding the omission block’s counting performance, and second by calculating the counting performance separately for the two blocks. The new behavioral analysis revealed that participants from both groups performed better in the irregular block compared to the regular block. However, there was no statistically significant difference between the counting performances of the two groups.

      The new results are reported on page 5 of the main manuscript, section “Results - Behavioral performance”, paragraph 1: “Participants from both groups performed the task properly with an average error rate of 9.54% (SD 8.97) for the healthy control participants (CTR) and 10.55% (SD 6.18) for the OFC lesion patients (OFC). There was no statistically significant difference between the counting performance of the two groups [F(24) = 0.11, P = 0.75]. Participants from both groups performed better in the irregular block (CTR: 8.39 ± 8.24%; OFC: 7.50 ± 7.34%) compared to the regular block (CTR: 10.69 ± 11.36%; OFC: 13.60 ± 10.97%) [F(24) = 3.55, P = 0.07]. There was no block X group interaction effect [F(24) = 0.73, P = 0.40].”

      As with many patient lesion studies, while the comparison directly against the healthy age matched controls is critical it would have strengthened the authors claims if they could show differences between the brain damaged control group. Given the previous literature that also links lateral PFC with prediction error detection, I understand that this region is potentially not the clearest brain damaged control group and therefore another lesion group might have strengthened claims of specificity. Furthermore, the authors do not offer an explanation for why no differences between lateral PFC and control groups were found when others have previously reported them. Identifying those differences would strengthen our understanding of the involvement of different structures in this task/function.

      We thank the reviewer for raising this crucial issue. We recognize the importance of addressing the lack of neurophysiological differences between the lateral PFC lesion group and the control group. First, it is important to clarify that the lateral PFC lesion control group was initially included not as a control for specific lateral PFC lesions but rather a broader control group to account for potentially general effects of frontal brain damage. However, considering that previous studies have implicated specific areas of the lateral PFC (e.g., inferior frontal gyrus; IFG) in predictive processing, we also think that a more thorough justification of these null findings is needed.

      Intracranial EEG studies examining local and global level prediction error detection pointed to the role of inferior frontal gyrus (IFG) as a frontal source supporting top-down predictions in MMN generation (Dürschmid et al., 2016; Nourski et al., 2018; Phillips et al., 2016; Rosburg et al., 2005). However, other intracranial studies reported unclear (Bekinschtein et al., 2009) or weak (Dürschmid et al., 2016) frontal MMN effects. El Karoui et al. (2015) observed late ERP responses in the lateral PFC related to global deviants but no MMN to local deviants, and it was not clear where in the PFC these responses occurred, not showing responses in the IFG. Additionally, studies employing dynamic causal modeling of MMN consistently modeled frontal sources in the IFG region (Garrido et al., 2008; Garrido et al., 2009; Phillips et al., 2015). A review by Deouell (2007) highlighted the potential contributions of both IFG and middle frontal gyrus to MMN generation, suggesting that the specific source might vary depending on characteristics of the deviant stimuli, such as pitch or duration.

      In the lesion study by Alho et al. (1994), diminished MMN to local-level deviants was found after lesion to the lateral PFC, with the lesion cohort exhibiting a hemisphere ratio of 7/3 for left and right hemispheres, respectively, which is different from our cohort's ratio of 4/6. Furthermore, all individuals in that study had infarcts in the middle cerebral artery, resulting in a more uniform lesion location compared to our cohort. Notably, the lesions observed in our lateral PFC group appeared to be situated in more superior brain regions and towards the MFG compared to the predominantly reported involvement of the IFG in previous studies. Another factor that might contribute to the lack of significant effects is the heterogeneity of the lesions in our lateral PFC group (see Supplementary Figures 2, 3 and 4). Especially for the left hemisphere cohort, the individual lesions did not share a consistent anatomical location. The right hemisphere cohort had a greater lesion overlap, but overall, the lesions were not centered in the IFG area, with the highest overlap being in the MFG area. This distinction in lesion location might contribute to the absence of effects observed in our study.

      Regarding the global effect, often reflected in the P300 component, it appears that the neural sources responsible for processing global deviance exhibit a more distributed pattern. This means that the brain regions involved in detecting and processing global deviations may not be as localized or concentrated as those implicated in local deviance processing. Given that the neural mechanisms underlying global deviance detection and processing are likely to involve a wider network of brain regions, they may be less susceptible to disruptions caused by focal lesions in the lateral PFC.

      In response to your comment, we have expanded the “Discussion” to address this point by adding a new section titled “Lack of findings in the lateral PFC lesion group” [page 21]. In this section, we first present some of the findings implicating specific areas of the lateral PFC in the generation of MMN and in predictive processing, and then offer an account of the potential reasons behind the lack of neurophysiological differences between the lateral PFC and control groups.

      Finally, while the authors have already cited widely across multiple fields, again speaking to the likely large impact the study will make, there does appear to be an unexplored conceptual link between the conclusions here that the OFC supports "the formation of predictions that define the current task by using context and temporal structure to allow old rules to be disregarded so that new ones can be rapidly acquired" and that lesions of the lateral portions of the OFC disrupt the assignment of credit or value to a stimulus that occurred temporally close to the outcome (Walton et al 2010, Noonan et al 2010, PNAS, Rudebeck et al 2017 Neuron, Noonan et al 2017, JON, Wittmann et al 2023 PlosB, note the wider imaging literature in line with this work Jocham et al 2014 Neuron and Wang et al bioRxiv). Without the OFC, monkeys and humans appear to rely on an alternative, global learning mechanism that spreads the reinforcing properties of the outcome to stimuli that occurred further back in time. Could the authors speculate on how these two strands of evidence might converge? For example, does the OFC only assign credit in the event of a prediction error or does one mechanism subsume another?

      We thank the reviewer for this comment regarding the unexplored conceptual link between our study’s conclusion, which suggests that the OFC facilitates the detection of prediction errors, and the findings of other research that delves into the OFC’s role in assignment of credit to stimuli. We find this comment very interesting and appreciate the opportunity to speculate on the potential functional convergence of these two processes within the OFC.

      The OFC is a critical neural hub implicated in learning, decision-making, and adaptive behavior. The detection of prediction errors and the assignment of credit to stimuli are mechanisms linked with the OFC, which play an important role in all these functions (Noonan et al., 2012; Schultz & Dickinson, 2000; Sul et al., 2010; Tobler et al., 2006; Walton et al., 2010; Walton et al., 2011). Prediction errors involve recognizing discrepancies between expected and actual outcomes, which engages the OFC in rapidly updating stimulus valuations to align with newfound information (Holroyd & Coles, 2002; Kakade & Dayan, 2002). Signaling of errors provides a powerful mechanism whereby OFC facilitates adaptive learning and enables the brain to adjust its expectations based on novel experiences (Schultz, 2015; Seymour et al., 2004). Credit assignment, on the other hand, refers to properly identifying the causes of prediction errors. Without proper credit assignment, one might have intact error signaling mechanisms, but lose the ability to learn appropriately. This is especially true when multiple possible antecedents may be related to the error or when past choices have been unpredictable. In such situations, it is important to assign credit to the most recent choice and not get distracted by previous alternatives (Stalnaker et al., 2015).
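The distinction drawn above between error signaling and credit assignment can be made concrete with a small sketch. It uses a generic textbook delta rule, not a model fitted or proposed in our study; the function name `update_values`, the `spread` parameter, and all numerical values are hypothetical and chosen only for illustration. With `spread = 0`, the prediction error updates only the most recent choice (focused credit assignment); with `spread > 0`, the same error also leaks to earlier choices, loosely mimicking the global learning mechanism the reviewer describes.

```python
def update_values(values, choice_history, reward, learning_rate=0.2, spread=0.0):
    """Delta-rule update of stimulus values after one outcome.

    values: dict mapping stimulus name -> current value estimate.
    choice_history: stimuli chosen so far, oldest first, most recent last.
    spread: 0.0 credits only the most recent choice; values in (0, 1]
            also credit earlier choices with geometrically decaying weight.
    All parameters are illustrative assumptions, not estimates from data.
    """
    updated = dict(values)
    weight = 1.0
    for stimulus in reversed(choice_history):
        # Prediction error: discrepancy between actual and expected outcome.
        prediction_error = reward - values[stimulus]
        updated[stimulus] += learning_rate * weight * prediction_error
        weight *= spread
        if weight == 0.0:
            break
    return updated


start = {"A": 0.0, "B": 0.0}
history = ["A", "B"]  # "B" was chosen immediately before the reward

focused = update_values(start, history, reward=1.0, spread=0.0)
spread_out = update_values(start, history, reward=1.0, spread=0.5)
```

In the focused case only "B" gains value, whereas in the spread case "A" is also credited for an outcome it did not cause. The error signal itself is identical in both cases; only its assignment differs.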

      These mechanisms within the OFC appear interrelated yet distinct. While prediction errors could trigger credit assignment, the OFC's ability to continually assess stimuli's values extends beyond instances of prediction errors. The OFC is involved in continuously evaluating and updating the values of stimuli based on ongoing experiences (Padoa-Schioppa & Assad, 2006; Tremblay & Schultz, 1999). This process enables the brain to learn from both unexpected outcomes and regular, predictable interactions with the environment. In situations where outcomes are not solely determined by prediction errors, the assignment of credit remains important. Complex decision-making involves considering a variety of factors beyond just prediction errors, such as contextual information and long-term consequences. Clarifying the convergence of these mechanisms within the OFC holds profound implications for understanding the intricacies of learning dynamics and the orchestration of adaptive responses to the environment.

      While we recognize the value of this discussion, we believe it extends beyond the primary focus of our study. Consequently, we have made the decision not to incorporate it into the current manuscript.

      One remaining weakness, which plagues all patient studies, is that of anatomical specificity. The authors have analysed what is, for the field, a large group of patients, and while the lesions appear to be relatively focused on the OFC the individuals vary in the degree to which different subregions within the OFC are damaged. This is increasingly important as evidence over the last 10 years has identified functional roles of these specific structures (Rushworth et al 2011, Neuron, Rudebeck et al 2017 Neuron). It would be important to ultimately know whether the detection of prediction errors was specific to a particular OFC subregion, a general mechanism across this area of cortex, or whether different subregions were more involved during different contexts or types of stimuli/contexts/tasks etc. Some comments on this would be appreciated.

      The reviewer raises an important point, and it would have been interesting to explore this aspect. However, one challenge with focal lesion studies is establishing large patient cohorts. The group size of our study, although relatively large compared to other studies of focal PFC lesions, does not allow us to perform exploratory lesion-symptom mapping analyses. A larger patient sample would provide a stronger basis for drawing conclusions about the critical role of a particular OFC subregion in the detection of prediction errors and would allow statistical approaches to lesion subclassification and brain-behavior analysis (e.g., voxel-based lesion-symptom mapping; Bates et al., 2003; Lorca-Puls et al., 2018).

      Considering the average percentage of damaged tissue in our study, the medial part of OFC or Brodmann area 11 is affected more by the lesion (approx. 33%), followed by the anterior-most region of the prefrontal cortex or Brodmann area 10 (approx. 25%), and the lateral portions of the OFC or Brodmann area 47 (approx. 12%). From our analysis, it is difficult to conclude whether the detection of prediction errors in our study was specific to a certain OFC area, or whether different subregions were involved more than others during different types of stimuli/contexts processing.

      To provide a more balanced interpretation of our findings, we incorporated a section in the “Discussion”, titled “Limitations and future directions” [page 24-25], which delves into the limitations of our study and lesion studies generally with respect to anatomical specificity and the challenge to establish large patient cohorts.

      Reviewer #2 (Public Review):

      The current version of the manuscript is overall very long and verbose, for example, the introduction is 5 pages long and includes up to 102 references. In my view this is way too much. I suppose authors wish to be very detailed, but somehow they get an opposite effect, the main message of the introduction and aims get diluted.

      We thank the reviewer for the feedback on our manuscript's length and content. This prompted us to carefully reconsider the balance between providing necessary context and ensuring the clarity of our main message. Our intention was to establish a strong foundation for our research by presenting relevant literature and setting the stage for our aims. In our revised manuscript, we have condensed the Introduction while retaining the key elements necessary to understand the context and motivations behind our research. Specifically, the current version of the “Introduction” is three pages long and includes 83 references.

      I wonder if the presentation rate used (SOA: 150 ms) is too fast and the stimuli (50 ms) too short. Please provide a rationale for this.

      We appreciate the reviewer's thoughtful consideration of the stimulus duration and presentation rate (SOA) used in our study. We understand the importance of providing a rationale for our choices to ensure the validity of our experimental design. The decision to use a SOA of 150 ms and stimuli of 50 ms duration was grounded in established practices and relevant literature in the field. Similar presentation rates and stimulus durations were employed in previous studies using similar auditory oddball paradigms, investigating rapid cognitive processes in combination with event-related potentials (ERPs). For instance, Bekinschtein et al. (2009) first introduced the task by using a SOA of 150 ms and stimulus duration of 50 ms, demonstrating that this combination is sensitive to detecting auditory deviations and eliciting early and late ERP components. Additionally, Wacongne et al. (2011), Chennu et al. (2013), Uhrig et al. (2014), and El Karoui et al. (2015) employed similar task designs with the same SOA and stimulus duration in combination with scalp EEG, fMRI and intracranial recordings, further supporting the validity of this approach. Other studies, employing the same paradigm, such as Chao et al. (2018) and Doricchi et al. (2021), used a SOA of 200 ms but kept the same stimulus duration of 50 ms.
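For readers who want the timing spelled out, the arithmetic implied by these parameters can be sketched as follows. The SOA (150 ms) and tone duration (50 ms) are taken from the text above; the number of tones per sequence (`N_TONES = 5`) is an assumption made purely for illustration, and the function name `tone_schedule` is hypothetical.

```python
SOA_MS = 150            # stimulus onset asynchrony, from the text
TONE_DURATION_MS = 50   # tone duration, from the text
N_TONES = 5             # assumption for illustration only


def tone_schedule(n_tones, soa_ms, duration_ms):
    """Return (onset, offset) times in ms for each tone in one sequence."""
    return [(i * soa_ms, i * soa_ms + duration_ms) for i in range(n_tones)]


schedule = tone_schedule(N_TONES, SOA_MS, TONE_DURATION_MS)

# Silent gap between the offset of one tone and the onset of the next.
silent_gap_ms = SOA_MS - TONE_DURATION_MS

# Time from the first onset to the last offset.
sequence_length_ms = schedule[-1][1]
```

Under these values each tone is followed by 100 ms of silence, so consecutive tones never overlap, and a hypothetical five-tone sequence spans 650 ms from first onset to last offset.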

      One of the conditions is 'omissions', but results are not reported, so either authors do not mention this at all, or they report these data, which would be probably interesting.

      We thank the reviewer for the nice reminder. The “omissions” condition is indeed an integral part of our study, and we acknowledge its potential significance. However, we have decided to publish the detailed analysis of the 'omissions' condition in a separate paper, because we think that such analysis and discussion would make the current paper quite dense and complicated. We apologize for any confusion that might arise from the absence of the 'omissions' results in this manuscript. On page 33 of the main manuscript, we state the reason for not including the “omissions” condition in the current analysis: “In the present set of analyses, the Omission blocks were not further examined, because such analysis and discussion would make the current paper overly dense and complicated.”

      The Discussion is very long and in some aspects even too speculative. For example, in the conclusions the authors claim that the OFC contributes to a top-down predictive process that modulates the deviance detection system in the primary auditory cortices and may be involved in connecting PEs at lower hierarchical areas with predictions at higher areas. I am not sure the current data support this. This would probably be more appropriate if they could compare results from OFC and AC etc. so it is a more dynamic study.

      We thank the reviewer for this observation. We have made revisions to shorten and refine the discussion, with a primary focus on presenting and interpreting the key results in a more concise and straightforward manner (See tracked changes in the revised manuscript).

      However, the overall length of the Discussion has not been reduced significantly because we have introduced two additional sections (i.e., “Lack of findings in the lateral PFC lesion group” and “Limitations and future directions”) in response to the reviewers’ requests to address the lack of findings in the lateral PFC lesion group and certain limitations associated with the lesion method employed.

      We also agree that the claim mentioned by the reviewer is overly speculative and have therefore revised the sentence as follows [page 38, “Conclusion”]: “We suggest that the OFC likely contributes to a top-down predictive process that modulates the deviance detection system in lower sensory areas.”

      At the beginning of Discussion, the authors mention that overall, these findings provide novel information about the role of the OFC in detecting violation of auditory prediction at two levels of stimuli abstraction/time scale. I think this needs to be detailed more specifically rather than mention they provide novel results.

      We understand the importance of providing readers with a precise description of the novelty of our study. Therefore, we have revised the statement to provide more detailed information about the novel contributions offered by our study. The revised text reads as follows [“Discussion”, page 18]: “These findings indicate that the OFC is causally involved in the detection of local and local + global auditory PEs, thus providing a novel perspective on the role of OFC in predictive processing.”

      I am not sure I like to have a section as a general discussion within the discussion itself, probably this heading should be reformatted to be more specific to what is discussed.

      As suggested by the reviewer, we reformatted the heading to “OFC and hierarchical predictive processing” [page 22-24] to better capture the essence of the content covered in this section of the “Discussion”. Here, we discuss the functional relevance of our EEG findings under the umbrella of the predictive coding framework and the potential role of OFC in predictive processes (See tracked changes in the revised manuscript).

      Reviewer #3 (Public Review):

      The central claim of the study is that hierarchical predictive processing is altered in OFC patients. However, OFC patients were able to identify global deviants as well as controls. Thus, hierarchical predictive processing itself seems to be unaltered, even though its neural correlates were different. This begs the question of what exactly the functional meaning of the EEG findings is. From the evidence presented this is difficult to determine for three reasons (See comments below).

      We thank the reviewer for the detailed observations and valuable comments. The reviewer points out that hierarchical predictive processing is unaltered even though the neural correlates were altered, because OFC patients were able to identify global deviants as accurately as control participants. We respectfully disagree with the reviewer’s claim for two reasons: 1) The primary purpose of the behavioral data in this study was not to measure the participants’ deviant detection performance, but to confirm that they were paying attention to the global rule of each block. However, we agree that an effect of lesion on behavioral performance would strengthen the claim of altered high-level predictive processing. The reviewer’s point highlights the importance of looking more carefully at our behavioral results. In a follow-up study, which we are currently running, we explore the behavioral nuances of our task by measuring reaction times of correct deviant detections. 2) Earlier lesion studies reported typical performance on simple oddball tasks for patients with focal frontal lesions that did not significantly differ from control participants. However, despite normal task execution and neuropsychological profiles, patients with LPFC and OFC lesions present distinct neurophysiological evidence of alterations in novelty processing (Knight, 1984, 1997; Knight & Scabini, 1998; Løvstad et al., 2012; Yamaguchi & Knight, 1991).

      Regarding the central claim of our study being that hierarchical predictive processing is altered in OFC patients, we have tried not to make strong claims about our results showing altered hierarchical predictive processing. For example, the conclusion of the abstract states: “the altered magnitudes and time courses of MMN/P3a responses after lesions to the OFC indicate that the neural correlates of detection of auditory regularity violation is impacted at two hierarchical levels of stimuli abstraction.” Thus, we do not claim that detection of regularity violation is directly impaired (e.g., OFC patients were able to identify global deviants as well as healthy controls) but that the neural correlates of deviants’ detection are altered, and therefore impaired.

      Finally, we have gone through each of the reasons that, in the reviewer’s view, make it difficult to determine the functional meaning of our EEG findings, and addressed them one by one (see comments below). We hope that the revised manuscript has been improved accordingly and provides a more critical view on the extent to which the findings support hierarchical predictive coding.

      It is possible that the shifts in scalp potentials are due to volume conduction differences linked to post-lesion changes in neural tissue and anatomy rather than differences in information processing per se.

      We appreciate your comment regarding the potential influence of volume conduction differences on the observed shifts in scalp potentials in our study. We acknowledge that there are special challenges in interpreting ERP findings in brain lesion populations (Kutas et al., 2012; Rugg, 1995). To reliably interpret changes in the ERPs in lesion patients as reflecting impairments in certain cognitive processes, it is necessary to identify factors that might possibly affect the results and to apply the appropriate control measures. As noted by the reviewer, structural pathology and the replacement of neural tissue by cerebrospinal fluid following tumor resection likely cause inhomogeneities in the volume conduction of electrical activity and resulting changes in current flow patterns. Moreover, post-craniotomy skull defects can cause local inhomogeneities in the resistive properties of the skull (Løvstad & Cawley, 2011; Rugg, 1995). Both types of biophysical changes might alter the amplitude levels and/or topography (by altering the configuration of the generators) of surface-recorded ERPs (e.g., Swick (2005)). Consequently, caution is warranted when comparing the ERPs and their scalp distributions of intact and brain-lesioned groups. It is difficult to directly quantify the consequences of brain lesions on tissue conductivity. To conclude that ERP differences between patients and controls reflect functional abnormalities in particular cognitive processes, and not primarily nonspecific effects of structural brain damage, it is helpful to demonstrate that they are specific to certain ERP components/stages of information processing and task conditions. Changes confined to one or a subset of ERP components, that additionally may not manifest across all task conditions, can give some indication concerning the specificity of ERP changes (Kutas et al., 2012; Swaab, 1998).

      In our study, group differences pertaining to ERP amplitudes were limited to specific task conditions and were not present across all data. This condition-dependent pattern suggests that the observed shifts are related to the specific cognitive processes engaged during those task conditions rather than being a global artifact of volume conduction. If volume conduction were the main driver, we would expect these group differences to be more uniformly present across task conditions. Another piece of evidence against volume conduction effects is the latency difference in scalp potentials between the two groups observed for the Local + Global deviance detection. Group differences in the latencies of ERPs, such as the MMN and P3a, cannot be attributed to volume conduction alone (Hämäläinen et al., 1993). These differences in the timing of neural responses strongly indicate genuine variations in cognitive processing.

      To provide a more balanced interpretation of our findings, we have incorporated a section in the “Discussion” that delves into the limitations of our study and lesion studies generally with respect to volume conduction and amplitude changes, titled “Limitations and future directions” [page 24-25].

      It is unclear from the analyses whether the P3a amplitude differences are true amplitude differences or a byproduct of latency differences. The reason is that the statistical method used (cluster based permutations) might yield significant effects when the latency of a component is shifted, even if peak amplitudes are the same. Complementary analyses on mean or peak amplitudes could resolve this issue.

      We thank the reviewer for raising an important concern about the use of cluster-based permutation tests and their potential to yield significant effects when the latency of a component is shifted. We acknowledge this concern and recognize the need for complementary analyses to address this issue. To provide a clearer understanding of the nature of the observed ERP amplitude differences, we conducted complementary analyses on mean amplitudes of the MMN and P3a components on the midline sensors for the conditions where significant group differences were observed. For the MMN component elicited by the Local Deviance, we found group amplitude differences on the electrodes AFz (p = 0.021), Fz (p = 0.008), CPz (p = 0.015), and Pz (p < 0.001). Surprisingly, we also found amplitude differences for the P3a component elicited by the Local Deviance on the electrodes AFz (p < 0.001), Fz (p < 0.001), FCz (p < 0.001), and Cz (p = 0.002) that were not observed previously with the cluster-based permutation analysis. For the MMN component elicited by the Local+Global Deviance, our analysis showed group amplitude differences on the electrodes AFz (p = 0.007), FCz (p = 0.051), Cz (p = 0.004), CPz (p = 0.002), and Pz (p < 0.001). However, as the reviewer rightly pointed out, the group differences for the P3a elicited by the Local + Global Deviance seem to be a byproduct of latency differences, as we did not find amplitude differences on any of the midline electrodes. Overall, this complementary analysis shows that the OFC patients had an attenuated MMN/P3a to local level prediction violation, and an attenuated and delayed MMN followed by a delayed P3a to the combined local and global level prediction violation. The new analysis is added in the Supplementary File 1 [page 5-7] and Supplementary File 1c and 1d.
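The reviewer's concern, and the logic of the complementary analysis, can be illustrated with a toy simulation. This is a minimal sketch on synthetic Gaussian-shaped waveforms, not our data or our analysis pipeline; the latencies, amplitudes, window widths, and function names are all hypothetical. Two waveforms with identical peak amplitude but different peak latency differ substantially at individual time points and in a fixed-window mean amplitude, yet they are indistinguishable once amplitude is measured at the peak or in a window centred on each waveform's own peak.

```python
import math


def gaussian_component(times_ms, peak_latency_ms, peak_amplitude_uv, width_ms=40.0):
    """Synthetic ERP-like component: a Gaussian bump (illustrative only)."""
    return [
        peak_amplitude_uv * math.exp(-0.5 * ((t - peak_latency_ms) / width_ms) ** 2)
        for t in times_ms
    ]


def peak_latency(times_ms, wave):
    """Time of the maximum of the waveform."""
    return times_ms[wave.index(max(wave))]


def mean_amplitude(times_ms, wave, start_ms, end_ms):
    """Mean amplitude within an inclusive time window."""
    samples = [v for t, v in zip(times_ms, wave) if start_ms <= t <= end_ms]
    return sum(samples) / len(samples)


times_ms = list(range(0, 601, 2))

# Hypothetical groups: same peak amplitude (5 uV), 60 ms latency shift.
control = gaussian_component(times_ms, 300.0, 5.0)
patient = gaussian_component(times_ms, 360.0, 5.0)

# Pointwise comparison: large differences despite equal peaks.
max_pointwise_diff = max(abs(c - p) for c, p in zip(control, patient))

# Fixed window (same for both groups): means differ.
fixed_control = mean_amplitude(times_ms, control, 250, 350)
fixed_patient = mean_amplitude(times_ms, patient, 250, 350)

# Window centred on each waveform's own peak: means agree.
lat_c = peak_latency(times_ms, control)
lat_p = peak_latency(times_ms, patient)
own_control = mean_amplitude(times_ms, control, lat_c - 50, lat_c + 50)
own_patient = mean_amplitude(times_ms, patient, lat_p - 50, lat_p + 50)
```

A statistic computed sample by sample, which is what a cluster-based permutation test aggregates, would therefore register a group difference in this toy case even though the component is only delayed, not attenuated.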

      The MMN, P3a and P3b components are difficult to map to the hierarchical PC theory. Traditionally, the MMN is ascribed to lower level processing while P3a and P3b are ascribed to higher level processing. However, the picture is more complicated. For example, the current results show that the MMN is enhanced in local + global surprise while the P3a is elicited by local surprise. Furthermore, the P3a is classically interpreted as reflecting attention reorientation and the P3b as reflecting the conscious detection of task-relevant targets. How attention and conscious awareness fit in hierarchical PC is not entirely clear.

      Indeed, the relationships between MMN, P3a and P3b components and the predictive coding (PC) framework can be intricate. However, numerous studies employed the PC theory to interpret these common electrophysiological signatures as prediction error (PE) signals (Garrido et al., 2007, 2009; Lieder et al., 2013) and dissociations between these ERPs supported that there are successive levels of predictive processing (Chennu et al., 2013; El Karoui et al., 2015; Wacongne et al., 2011).

      In terms of hierarchical PC (Friston, 2005), the temporally constrained MMN has been traditionally linked with first-level predictive processing, known as the local effect of short-term stimulus deviance. PE signals at this level feed forward to a temporally extended, attention-dependent system that extracts longer-term patterns. PE signals at the higher level are usually indexed by the P300, identified as the global effect of longer-term stimulus deviance. The P300 reflects a more attention-driven process, emerging in response to novel or low-probability “target” stimuli that violate broader contextual expectations (Polich, 2007), such as those that form over multiple trials. Because the MMN, P3a and P3b also appear to exhibit varying degrees of sensitivity to preconscious and conscious perceptual predictions (Sculthorpe et al., 2009), they could serve as measures for examining the concept of a predictive neural hierarchy.

      Indeed, the MMN has been viewed as sensitive to local violation and essentially blind to higher-order regularities. However, this is a simplified view. For example, Wacongne et al. (2011) showed that violating a low-level perceptual expectation triggers the MMN, violating contextual expectations triggers the higher-level P3, and when both expectations are simultaneously violated, a larger response is evoked compared to either one alone. These findings, which are consistent with the results of our study, show that the local and global effects are not fully independent but interact in an early time window, indexed by enhanced and temporally extended MMN responses. They provide support not just for a hierarchical model, but for a predictive rather than a feedforward one. Moreover, the MMN has been found to be relatively insensitive to attention, because it is elicited in situations in which the subjects’ attention is directed away from the stimuli and there are no task demands (Chennu et al., 2013). Given that early MMN is a pre-attentive automatic ERP component (Näätänen et al., 2001; Pegado et al., 2010; Tiitinen et al., 1994), and given that it has been observed in comatose and vegetative state patients (Bekinschtein et al., 2009; Fischer et al., 2004; Naccache et al., 2004), the finding that even early MMN is impaired in OFC patients indicates that patients may suffer from a deficit in sensory predictive processing that is independent of attention and conscious awareness.

      The picture is more complicated when it comes to the predictive roles of P3a and P3b components. Following the MMN, a positive polarity P300 complex, sensitive to the detection of unpredicted auditory events, has been reported (Chennu et al., 2013; Doricchi et al., 2021; Kompus et al., 2020; Liaukovich et al., 2022). However, the two types of P300 (P3a and P3b) have not been clearly fitted into the hierarchical PC theory. The P3a is considered to be part of the brain's mechanism for detecting PEs (Wessel et al., 2012; Wessel et al., 2014) and may indicate that the brain is reallocating attentional resources to process and learn from these unexpected events. The P3a is typically interpreted as reflecting an involuntary attentional reorienting process (Escera & Corral, 2007; Ungan et al., 2019), which may relate to the operations of the ventral attention network (Corbetta et al., 2008; Corbetta & Shulman, 2002; Nieuwenhuis et al., 2005). Predictive coding emphasizes the role of contextual information in generating predictions with P3a being influenced by the context in which an unexpected event occurs (Schomaker et al., 2014). In the hierarchy of predictive processing, the P3a may reflect PEs at different hierarchical levels, depending on the complexity of the prediction and the degree to which it deviates from the sensory input. On the other hand, the P3b is linked to higher-level cognitive processes that involve updating long-term predictions based on incoming sensory information. It is highly dependent on attention, conscious awareness and active engagement with the task (Bekinschtein et al., 2009; Del Cul et al., 2007; Sergent et al., 2005; Strauss et al., 2015). It is thought to play a role in integrating the unexpected sensory input into the current context, potentially leading to updates of predictions in working memory (Chao et al., 1995; Donchin & Coles, 1988; Polich, 2007).

      Hierarchical PC theory is continually evolving, and the relationship between these ERP components and attention or conscious awareness remains an active area of research. We acknowledge the need for further investigation to better understand how attention and conscious awareness fit within this framework. In light of your comment, we provide a more comprehensive discussion about the functional meaning of the EEG findings in our “Discussion - OFC and hierarchical predictive processing” [page 22-24].

      The fact that lateral PFC patients show unaltered neural responses contradicts prominent views from PC identifying this region as a generator of the MMN and a source of predictions sent to temporal auditory areas.

      We appreciate the reviewer's comment and note that another reviewer raised the same concern. We have provided a detailed response above (see Response to Reviewer #1, Comment 4). We have expanded the “Discussion” to address this point by adding a new section titled “Lack of findings in the lateral PFC lesion group” [page 21]. In this section, we first present some of the findings implicating specific areas of the lateral PFC in the generation of MMN and in predictive processing, and then offer an account of the potential reasons behind the lack of neurophysiological differences between the lateral PFC and control groups.

      For these reasons, a more critical view on the extent to which the findings support hierarchical predictive coding is needed.

      By responding to the reviewer’s previous comments (i.e., the reasons why the reviewer thinks it is difficult to determine the functional meaning of the EEG findings), we believe that we have offered a more critical view on this matter.

      References

      Alho, K., Woods, D. L., Algazi, A., Knight, R., & Näätänen, R. (1994). Lesions of frontal cortex diminish the auditory mismatch negativity. Electroencephalography and clinical neurophysiology, 91(5), 353-362.

      Bates, E., Wilson, S. M., Saygin, A. P., Dick, F., Sereno, M. I., Knight, R. T., & Dronkers, N. F. (2003). Voxel-based lesion–symptom mapping. Nature neuroscience, 6(5), 448-450.

      Bekinschtein, T. A., Dehaene, S., Rohaut, B., Tadel, F., Cohen, L., & Naccache, L. (2009). Neural signature of the conscious processing of auditory regularities. Proceedings of the National Academy of Sciences, 106(5), 1672-1677.

      Chao, L., Nielsen-Bohlman, L., & Knight, R. (1995). Auditory event-related potentials dissociate early and late memory processes. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 96(2), 157-168.

      Chao, Z. C., Takaura, K., Wang, L., Fujii, N., & Dehaene, S. (2018). Large-scale cortical networks for hierarchical prediction and prediction error in the primate brain. Neuron, 100(5), 1252-1266. e1253.

      Chennu, S., Noreika, V., Gueorguiev, D., Blenkmann, A., Kochen, S., Ibánez, A., Owen, A. M., & Bekinschtein, T. A. (2013). Expectation and attention in hierarchical auditory prediction. Journal of Neuroscience, 33(27), 11194-11205.

      Corbetta, M., Patel, G., & Shulman, G. L. (2008). The reorienting system of the human brain: from environment to theory of mind. Neuron, 58(3), 306-324.

      Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature reviews neuroscience, 3(3), 201-215.

      Del Cul, A., Baillet, S., & Dehaene, S. (2007). Brain dynamics underlying the nonlinear threshold for access to consciousness. PLoS biology, 5(10), e260.

      Deouell, L. Y. (2007). The frontal generator of the mismatch negativity revisited. Journal of Psychophysiology, 21(3-4), 188-203.

      Donchin, E., & Coles, M. G. (1988). Is the P300 component a manifestation of context updating? Behavioral and brain sciences, 11(3), 357-374.

      Doricchi, F., Pinto, M., Pellegrino, M., Marson, F., Aiello, M., Campana, S., Tomaiuolo, F., & Lasaponara, S. (2021). Deficits of hierarchical predictive coding in left spatial neglect. Brain communications, 3(2), fcab111.

      Dürschmid, S., Edwards, E., Reichert, C., Dewar, C., Hinrichs, H., Heinze, H.-J., Kirsch, H. E., Dalal, S. S., Deouell, L. Y., & Knight, R. T. (2016). Hierarchy of prediction errors for auditory events in human temporal and frontal cortex. Proceedings of the National Academy of Sciences, 113(24), 6755-6760.

      El Karoui, I., King, J.-R., Sitt, J., Meyniel, F., Van Gaal, S., Hasboun, D., Adam, C., Navarro, V., Baulac, M., & Dehaene, S. (2015). Event-related potential, time-frequency, and functional connectivity facets of local and global auditory novelty processing: an intracranial study in humans. Cerebral cortex, 25(11), 4203-4212.

      Escera, C., & Corral, M. (2007). Role of mismatch negativity and novelty-P3 in involuntary auditory attention. Journal of psychophysiology, 21(3-4), 251-264.

      Fischer, C., Luauté, J., Adeleine, P., & Morlet, D. (2004). Predictive value of sensory and cognitive evoked potentials for awakening from coma. Neurology, 63(4), 669-673.

      Friston, K. (2005). A theory of cortical responses. Philosophical transactions of the Royal Society B: Biological sciences, 360(1456), 815-836.

      Garrido, M. I., Friston, K. J., Kiebel, S. J., Stephan, K. E., Baldeweg, T., & Kilner, J. M. (2008). The functional anatomy of the MMN: a DCM study of the roving paradigm. Neuroimage, 42(2), 936-944.

      Garrido, M. I., Kilner, J. M., Kiebel, S. J., & Friston, K. J. (2007). Evoked brain responses are generated by feedback loops. Proceedings of the National Academy of Sciences, 104(52), 20961-20966.

      Garrido, M. I., Kilner, J. M., Kiebel, S. J., & Friston, K. J. (2009). Dynamic causal modeling of the response to frequency deviants. Journal of Neurophysiology, 101(5), 2620-2631.

      Holroyd, C. B., & Coles, M. G. (2002). The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychological review, 109(4), 679.

      Hämäläinen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J., & Lounasmaa, O. V. (1993). Magnetoencephalography—theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of modern Physics, 65(2), 413.

      Kakade, S., & Dayan, P. (2002). Dopamine: generalization and bonuses. Neural Networks, 15(4-6), 549-559.

      Knight, R. T. (1984). Decreased response to novel stimuli after prefrontal lesions in man. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 59(1), 9-20.

      Knight, R. T. (1997). Distributed cortical network for visual attention. Journal of Cognitive Neuroscience, 9(1), 75-91.

      Knight, R. T., & Scabini, D. (1998). Anatomic bases of event-related potentials and their relationship to novelty detection in humans. Journal of clinical neurophysiology, 15(1), 3-13.

      Kompus, K., Volehaugen, V., Todd, J., & Westerhausen, R. (2020). Hierarchical modulation of auditory prediction error signaling is independent of attention. Cognitive neuroscience, 11(3), 132-142.

      Kutas, M., Kiang, M., & Sweeney, K. (2012). Potentials and Paradigms: Event‐Related Brain Potentials and Neuropsychology. The handbook of the neuropsychology of language, 1, 543-564.

      Liaukovich, K., Ukraintseva, Y., & Martynova, O. (2022). Implicit auditory perception of local and global irregularities in passive listening condition. Neuropsychologia, 165, 108129.

      Lieder, F., Daunizeau, J., Garrido, M. I., Friston, K. J., & Stephan, K. E. (2013). Modelling trial-by-trial changes in the mismatch negativity. PLoS computational biology, 9(2), e1002911.

      Lorca-Puls, D. L., Gajardo-Vidal, A., White, J., Seghier, M. L., Leff, A. P., Green, D. W., Crinion, J. T., Ludersdorfer, P., Hope, T. M., & Bowman, H. (2018). The impact of sample size on the reproducibility of voxel-based lesion-deficit mappings. Neuropsychologia, 115, 101-111.

      Løvstad, A., & Cawley, P. (2011). The reflection of the fundamental torsional guided wave from multiple circular holes in pipes. Ndt & E International, 44(7), 553-562.

      Løvstad, M., Funderud, I., Lindgren, M., Endestad, T., Due-Tønnessen, P., Meling, T., Voytek, B., Knight, R. T., & Solbakk, A.-K. (2012). Contribution of subregions of human frontal cortex to novelty processing. Journal of Cognitive Neuroscience, 24(2), 378-395.

      Naccache, L., Puybasset, L., Gaillard, R., Serve, E., & Willer, J.-C. (2004). Auditory mismatch negativity is a good predictor of awakening in comatose patients: a fast and reliable procedure. Clinical neurophysiology: official journal of the International Federation of Clinical Neurophysiology, 116(4), 988-989.

      Nieuwenhuis, S., Aston-Jones, G., & Cohen, J. D. (2005). Decision making, the P3, and the locus coeruleus--norepinephrine system. Psychological bulletin, 131(4), 510.

      Noonan, M., Kolling, N., Walton, M., & Rushworth, M. (2012). Re‐evaluating the role of the orbitofrontal cortex in reward and reinforcement. European Journal of Neuroscience, 35(7), 997-1010.

      Nourski, K. V., Steinschneider, M., Rhone, A. E., Kawasaki, H., Howard III, M. A., & Banks, M. I. (2018). Processing of auditory novelty across the cortical hierarchy: An intracranial electrophysiology study. Neuroimage, 183, 412-424.

      Näätänen, R., Pakarinen, S., Rinne, T., & Takegata, R. (2004). The mismatch negativity (MMN): towards the optimal paradigm. Clinical neurophysiology, 115(1), 140-144.

      Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001). ‘Primitive intelligence’ in the auditory cortex. Trends in neurosciences, 24(5), 283-288.

      Padoa-Schioppa, C., & Assad, J. A. (2006). Neurons in the orbitofrontal cortex encode economic value. Nature, 441(7090), 223-226.

      Pegado, F., Bekinschtein, T., Chausson, N., Dehaene, S., Cohen, L., & Naccache, L. (2010). Probing the lifetimes of auditory novelty detection processes. Neuropsychologia, 48(10), 3145-3154.

      Phillips, H. N., Blenkmann, A., Hughes, L. E., Bekinschtein, T. A., & Rowe, J. B. (2015). Hierarchical organization of frontotemporal networks for the prediction of stimuli across multiple dimensions. Journal of Neuroscience, 35(25), 9255-9264.

      Phillips, H. N., Blenkmann, A., Hughes, L. E., Kochen, S., Bekinschtein, T. A., & Rowe, J. B. (2016). Convergent evidence for hierarchical prediction networks from human electrocorticography and magnetoencephalography. cortex, 82, 192-205.

      Polich, J. (2007). Updating P300: an integrative theory of P3a and P3b. Clinical neurophysiology, 118(10), 2128-2148.

      Rosburg, T., Trautner, P., Dietl, T., Korzyukov, O. A., Boutros, N. N., Schaller, C., Elger, C. E., & Kurthen, M. (2005). Subdural recordings of the mismatch negativity (MMN) in patients with focal epilepsy. Brain, 128(4), 819-828.

      Rugg, M. D. (1995). Event-related potential studies of human memory.

      Schomaker, J., Roos, R., & Meeter, M. (2014). Expecting the unexpected: The effects of deviance on novelty processing. Behavioral neuroscience, 128(2), 146.

      Schultz, W. (2015). Neuronal reward and decision signals: from theories to data. Physiological reviews, 95(3), 853-951.

      Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual review of neuroscience, 23(1), 473-500.

      Sculthorpe, L. D., Stelmack, R. M., & Campbell, K. B. (2009). Mental ability and the effect of pattern violation discrimination on P300 and mismatch negativity. Intelligence, 37(4), 405-411.

      Sergent, C., Baillet, S., & Dehaene, S. (2005). Timing of the brain events underlying access to consciousness during the attentional blink. Nature neuroscience, 8(10), 1391-1400.

      Seymour, B., O'Doherty, J. P., Dayan, P., Koltzenburg, M., Jones, A. K., Dolan, R. J., Friston, K. J., & Frackowiak, R. S. (2004). Temporal difference models describe higher-order learning in humans. Nature, 429(6992), 664-667.

      Stalnaker, T. A., Cooch, N. K., & Schoenbaum, G. (2015). What the orbitofrontal cortex does not do. Nature neuroscience, 18(5), 620-627.

      Strauss, M., Sitt, J. D., King, J.-R., Elbaz, M., Azizi, L., Buiatti, M., Naccache, L., Van Wassenhove, V., & Dehaene, S. (2015). Disruption of hierarchical predictive coding during sleep. Proceedings of the National Academy of Sciences, 112(11), E1353-E1362.

      Sul, J. H., Kim, H., Huh, N., Lee, D., & Jung, M. W. (2010). Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron, 66(3), 449-460.

      Swick, D. (2005). ERPs in Neuropsychological Populations. Event-related potentials: A methods handbook, 299.

      Swaab, T. Y. (1998). Event-related potentials in cognitive neuropsychology: Methodological considerations and an example from studies of aphasia. Behavior Research Methods, Instruments, & Computers, 30(1), 157-170.

      Tiitinen, H., May, P., Reinikainen, K., & Näätänen, R. (1994). Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature, 372(6501), 90-92.

      Tobler, P. N., O’Doherty, J. P., Dolan, R. J., & Schultz, W. (2006). Human neural learning depends on reward prediction errors in the blocking paradigm. Journal of Neurophysiology, 95(1), 301-310.

      Tremblay, L., & Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398(6729), 704-708.

      Uhrig, L., Dehaene, S., & Jarraya, B. (2014). A hierarchy of responses to auditory regularities in the macaque brain. Journal of Neuroscience, 34(4), 1127-1132.

      Ungan, P., Karsilar, H., & Yagcioglu, S. (2019). Pre-attentive mismatch response and involuntary attention switching to a deviance in an earlier-than-usual auditory stimulus: an ERP study. Frontiers in Human Neuroscience, 13, 58.

      Wacongne, C., Labyt, E., van Wassenhove, V., Bekinschtein, T., Naccache, L., & Dehaene, S. (2011). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proceedings of the National Academy of Sciences, 108(51), 20754-20759.

      Walton, M. E., Behrens, T. E., Buckley, M. J., Rudebeck, P. H., & Rushworth, M. F. (2010). Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron, 65(6), 927-939.

      Walton, M. E., Behrens, T. E., Noonan, M. P., & Rushworth, M. F. (2011). Giving credit where credit is due: orbitofrontal cortex and valuation in an uncertain world. Annals of the New York Academy of Sciences, 1239(1), 14-24.

      Wessel, J. R., Danielmeier, C., Morton, J. B., & Ullsperger, M. (2012). Surprise and error: common neuronal architecture for the processing of errors and novelty. Journal of Neuroscience, 32(22), 7528-7537.

      Wessel, J. R., Klein, T. A., Ott, D. V., & Ullsperger, M. (2014). Lesions to the prefrontal performance-monitoring network disrupt neural processing and adaptive behaviors after both errors and novelty. Cortex, 50, 45-54.

      Yamaguchi, S., & Knight, R. (1991). Anterior and posterior association cortex contributions to the somatosensory P300. Journal of Neuroscience, 11(7), 2039-2054.

    1. Author Response

      Reviewer #2 (Public Review):

      Major weaknesses:

      1) The biggest weakness of the manuscript is the lack of appropriate explanation and interpretation of these observed cyclin D1 ubiquitination and degradation by at least five different combinations of Cullin-E3 ligases. Are all the five cullin-E3 combinations exclusive and/or redundant to each other for cyclin D1 ubiquitination? What are the speculations in terms of the underlying mechanism? At least a working model should be included to better interpret the data.

      Cyclin D1 is a recognized oncogene that is upregulated in multiple types of cancer. Different E3 ligases may mediate cyclin D1 degradation in different cell types. Even within the same cell, multiple E3 ligases may contribute to cyclin D1 degradation, ensuring that steady-state cyclin D1 protein levels remain under surveillance and finely tuned regulation.

      2) Although a phosphorylation-mutant cyclin D1 (i.e., T286) was included in the manuscript, there is no Lysine residue mutant within cyclin D1 identified and characterized for the critical function of cyclin D1 ubiquitination.

      It was reported that Lysine 269 is essential for cyclin D1 ubiquitination (Barbash et al., 2009). WT or mutant (K269R) cyclin D1 expression plasmids were co-transfected with Keap1, DDB2, and AMBRA1 expression plasmids into HEK293 cells. 48 hours after transfection, changes in cyclin D1 protein levels were detected by Western blot analysis. We found that the expression of WT cyclin D1 was decreased in HEK293 cells co-transfected with Keap1, DDB2, and AMBRA1, while the expression of K269R mutant cyclin D1 showed no significant decrease in cells co-transfected with Keap1, DDB2, and AMBRA1, suggesting that Lysine 269 is essential for cyclin D1 ubiquitination.

      3) The significance of these different Cullin 1-7 and associated E3 ligases (Keap1-CUL3, DDB2-CUL4A/4B, WSB2-CUL2/5, and RBX1-CUL1-7) in cyclin D1 ubiquitination is mainly determined by siRNA-mediated knockdown or overexpression of target cullin/E3 proteins. However, it is not clear whether the observed phenotypes of cyclin D1 are due to these cullin-E3 ligases directly or indirectly. In vitro ubiquitination assay with E1, E2, and E3 should be performed to demonstrate whether recombinant cyclin D1 is ubiquitinated.

      We have performed an in vitro ubiquitination assay as the reviewer suggested. The results demonstrated that Keap1, DDB2, and WSB2 can induce cyclin D1 ubiquitination. In particular, Keap1 induced cyclin D1 ubiquitination and formed a ubiquitination ladder similar to the AMBRA1-induced cyclin D1 ubiquitination ladder. In contrast, no clear ubiquitination ladder was observed in the Rbx1 group (Figure S16).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript provides a comprehensive investigation of the effects of the genetic ablation of three different transcription factors (Srf, Mrtfa, and Mrtfb) in the inner ear hair cells. Based on the published data, the authors hypothesized that these transcription factors may be involved in the regulation of the genes essential for building the actin-rich structures at the apex of hair cells, the mechanosensory stereocilia and their mechanical support - the cuticular plate. Indeed, the authors found that two of these transcription factors (Srf and Mrtfb) are essential for the proper formation and/or maintenance of these structures in the auditory hair cells. Surprisingly, Srf- and Mrtfb- deficient hair cells exhibited somewhat similar abnormalities in the stereocilia and in the cuticular plates even though these transcription factors have very different effects on the hair cell transcriptome. Another interesting finding of this study is that the hair cell abnormalities in Srf-deficient mice could be rescued by AAV-mediated delivery of Cnn2, one of the downstream targets of Srf. However, despite a rather comprehensive assessment of the novel mouse models, the authors do not have yet any experimentally testable mechanistic model of how exactly Srf and Mrtfb contribute to the formation of actin cytoskeleton in the hair cells. The lack of any specific working model linking Srf and/or Mrtfb with stereocilia formation decreases the potential impact of this study.

      Major comments:

      Figures 1 & 3: The conclusion on abnormalities in the actin meshwork of the cuticular plate was based largely on the comparison of the intensities of phalloidin staining in separate samples from different groups. In general, any comparison of the intensity of fluorescence between different samples is unreliable, no matter how carefully one could try matching sample preparation and imaging conditions. In this case, two other techniques would be more convincing: 1) quantification of the volume of the cuticular plates from fluorescent images; and 2) direct examination of the cuticular plates by transmission electron microscopy (TEM).

      In fact, the manuscript provides no single TEM image of the F-actin abnormalities either in the cuticular plate or in the stereocilia, even though these abnormalities seem to be the major focus of the study. Overall, it is still unclear what exactly Srf or Mrtfb deficiencies do with F-actin in the hair cells.

      Yes, we agree. As suggested by the reviewer, we conducted Transmission Electron Microscopy (TEM) analyses to directly examine the defects in F-actin organization within the cuticular plate of mutant mice. The results, presented in the revised Figures 1 and 4 (panels F, G, and E, F, respectively), provide crucial insights into the structural changes in the cuticular plate. In addition, we compared the volume of the phalloidin-labeled cuticular plate after 3-D reconstruction using Imaris software, as shown in Author response image 1. The cuticular plate (CP) volume results were consistent with the changes in relative F-actin intensity of the cuticular plate in the revised Figures 1B and 4B. For the TEM analysis of the stereocilia, we regret that, due to time constraints, we were unable to collect TEM images of stereocilia of sufficient quality for a meaningful comparison. However, we believe that the data we have presented sufficiently address the primary concerns, and we appreciate the reviewers’ understanding of these limitations.

      Author response image 1.

      Figures 2 & 4 represent another example of how deceiving could be a simple comparison of the intensity of fluorescence between the genotypes. It is not clear whether the reduced immunofluorescence of the investigated molecules (ESPN1, EPS8, GNAI3, or FSCN2) results from their mis-localization or represents a simple consequence of the fact that a thinner stereocilium would always have a smaller signal of the protein of interest, even though the ratio of this protein to the number of actin filaments remains unchanged. According to my examination of the representative images of these figures, loss of Srf produces mis-localization of the investigated proteins and irregular labeling in different stereocilia of the same bundle, while loss of Mrtfb does not. Obviously, a simple quantification of the intensity of fluorescence conceals these important differences.

      Yes, we agree. In addition to quantifying tip protein intensity, we have added further analyses in the revised Figure 3 and Figure 6, such as the percentage of row 1 stereocilia with tip protein staining and the percentage of IHCs with tip protein staining on row 2 tips. With these additional analyses, the differences between the control and the mutants in the expression level, row-specific distribution, and irregular labeling of tip proteins can be assessed more thoroughly.

      Reviewer #2 (Public Review):

      The analysis of bundle morphology using both confocal and SEM imaging is a strength of the paper and the authors have some nice images, especially with SEM. Still, the main weakness is that it is unclear how significant their findings are in terms of understanding bundle development; the mouse phenotypes are not distinct enough to make it clear that they serve different functions so the reader is left wondering what the main takeaway is.

      Based on the reviewer’s comments, in this revised manuscript, we put more emphasis on describing the effects of SRF and MRTFB on key tip proteins’ localization pattern during stereocilia development, represented by ESPN1, EPS8 and GNAI3, as well as the effects of SRF and MRTFB on the F-actin organization of cuticular plate using TEM. We have made substantial efforts to interpret the mechanistic underpinnings of the roles of SRF and MRTFB in hair cells. This is reflected in the revised Figures 1, 3, 4, 6, and 10, where we provide more comprehensive insights into the mechanisms at play.

      We interpret our data as indicating that SRF and MRTFB regulate the development and maintenance of the hair cell’s actin cytoskeleton in a complementary manner. Deletion of either gene thus results in somewhat similar phenotypes in hair cell morphology, despite the surprising lack of overlap between SRF and MRTFB downstream targets in the hair cell.

      In Figure 1 and 3, changes in bundle morphology clearly don't occur until after P5. Widening still occurs to some extent but lengthening does not and instead the stereocilia appear to shrink in length. EPS8 levels appear to be the most reduced of all the tip proteins (Srf mutants) so I wonder if these mutants are just similar to an EPS8 KO if the loss of EPS8 occurred postnatally (P0-P5).

      To address this question, we performed EPS8 staining on control and Srf cKO hair cells at P4 and P10. We found that the dramatic decrease of the row 1 tip signal for EPS8 began at P4 in Srf cKO IHCs. Although the major hair bundle phenotypes of Eps8 KO, including defective row 1 stereocilia lengthening and additional rows of short stereocilia, also appeared in Srf cKO IHCs, there are still some differences in bundle morphology between Eps8 KO and Srf cKO. First, both Eps8 KO OHCs and IHCs showed additional rows of short stereocilia, but we only observed additional rows of short stereocilia in Srf cKO IHCs. Second, in the study by Zampini et al., SEM and TEM images did not show an obvious reduction of row 2 stereocilia widening (P18-P35), while our analysis of SEM images confirmed that the width of row 2 IHC stereocilia was drastically reduced, by 40%, in Srf cKO (P15). Overall, although Srf cKO hair bundles are somewhat similar to those of Eps8 KO, we think the Srf cKO hair bundle phenotype might be governed by multiple candidate genes acting cooperatively.

      Reference:

      Valeria Zampini, et al. Eps8 regulates hair bundle length and functional maturation of mammalian auditory hair cells. PLoS Biol. 2011 Apr;9(4): e1001048.

      A major shortcoming is that there are few details on how the image analyses were done. Were SEM images corrected for shrinkage? How was each of the immunocytochemistry quantitation (e.g., cuticular plates for phalloidin and tip staining for antibodies) done? There are multiple ways of doing this but there are few indications in the manuscript.

      We apologize for not describing the image analysis procedures clearly enough. As described in a study from Nicolas Grillet’s group (Miller et al., 2021), live and mildly fixed IHC stereocilia have similar dimensions, while SEM preparation results in a hair bundle at a 2:3 scale compared to the live preparation. In our study, the hair cells selected for SEM imaging and measurements were located in the basal turn (30-32kHz), while the hair cells selected for fluorescence-based imaging and measurements were located in the middle turn (20-24kHz) or the basal turn (32-36kHz). Although our SEM imaging and fluorescence-based imaging of basal-turn hair bundles were not from exactly the same area, row 1 stereocilia in control hair bundles imaged by SEM were 10%-20% shorter than those in control hair bundles imaged by fluorescence (revised Figure 2 and Figure 5). Overall, our stereocilia dimension data showed the shrinkage expected from SEM preparation.

      Recognizing the need for clarity, we have provided a detailed description of our image quantification and analysis procedures in the “Materials and Methods” section, specifically under “Immunocytochemistry.” This will aid readers in understanding our methodologies and ensure transparency in our approach.

      Reference:

      Katharine K Miller, et al. Dimensions of a Living Cochlear Hair Bundle. Front Cell Dev Biol. 2021 Nov 25:9:742529.

      The tip protein analysis in Figs 2 and 4 is nice but it would be nice for the authors to show the protein staining separately from the phalloidin so you could see how restricted to the tips it is (each in grayscale). This is especially true for the CNN2 labeling in Fig 7 as it does not look particularly tip specific in the x-y panels. It would be especially important to see the antibody staining in the reslices separate from phalloidin.

      Thank you for the suggestions. We now show tip protein staining in grayscale, separately from the phalloidin, in the revised Figure 3 and Figure 6. To clearly show the tip-specific localization of CNN2, we conducted CNN2 staining at different ages during hair bundle development and show CNN2 labeling in grayscale and in reslices in the revised Figure 9-figure supplement 1B.

      In Fig 6, why was the transcriptome analysis at P2 given that the phenotype in these mice occurs much later? While redoing the transcriptome analysis is probably not an option, an alternative would be to show more examples of EPS8/GNAI/CNN2 staining in the KO, but at younger ages closer to the time of PCR analysis, such as at P5. Pinpointing when the tip protein intensities start to decrease in the KOs would be useful rather than just showing one age (P10).

      We agree with the reviewer. To address this question, we have performed ESPN1, EPS8 and GNAI3 staining on control and mutant hair cells at P4, P10 and P15 (the revised Figures 3 and 6). According to the new results, the dramatic decreases of the row 1 tip signal for ESPN1 and EPS8 began at P4 in Srf cKO IHCs, consistent with the appearance of a mild reduction of row 1 stereocilia length in P5 Srf cKO IHCs. For Mrtfb cKO hair cells, an obvious reduction of the row 1 tip signal for ESPN1 was not observed until P10. However, a few genes related to cell adhesion and regulation of the actin cytoskeleton were significantly down-regulated in the P2 Mrtfb-deficient hair cell transcriptome. We think that MRTFB may not play a major role in the regulation of stereocilia development in hair cells, so the morphological defects of stereocilia appeared much later in the Mrtfb mutant than in the Srf mutant.

      While it is certainly interesting if it turns out CNN2 is indeed at tips in this phase, the experiments do not tell us that much about what role CNN2 may be playing. It is notable that in Fig 7E in the control+GFP panel, CNN2 does not appear to be at the tips. Those images are at P11 whereas the images in panel A are at P6 so perhaps CNN2 decreases after the widening phase. An important missing control is the Anc80L65-Cnn2 AAV in a wild-type cochlea.

      We agree with the reviewer. We have conducted more immunostaining experiments to confirm the expression pattern of CNN2 during stereocilia development, from P0 to P11. The results are included in the revised Figure 9-figure supplement 1B. As the reviewer suggested, the CNN2 expression pattern in control cochleae injected with Anc80L65-Cnn2 AAV is also provided in the revised Figure 9E.

    1. Author Response

      Reviewer #1 (Public Review):

      The work by Yijun Zhang and Zhimin He et al. analyzes the role of HDAC3 within DC subsets. Using an inducible ERT2-cre mouse model they observe the dependency of pDCs but not cDCs on HDAC3. The requirement of this histone modifier appears to be early during development around the CLP stage. Tamoxifen-treated mice lack almost all pDCs besides lymphoid progenitors. Through bulk RNA seq experiment the authors identify multiple DC specific target genes within the remaining pDCs and further using Cut and Tag technology they validate some of the identified targets of HDAC3. Collectively the study is well executed and shows the requirement of HDAC3 on pDCs but not cDCs, in line with the recent findings of a lymphoid origin of pDC.

      1) While the authors provide extensive data on the requirement of HDAC3 within progenitors, the high expression of HDAC3 in mature pDCs may underlie a functional requirement. Have you tested IFN production in CD11c cre pDCs? Are there transcriptional differences between pDCs from HDAC CD11c cre and WT mice?

      We greatly appreciate the reviewer’s point. We have confirmed that Hdac3 can be efficiently deleted in pDCs of Hdac3fl/fl-CD11c Cre mice (Figure 5-figure supplement 1 in revised manuscript). Furthermore, in those Hdac3fl/fl-CD11c Cre mice, we have observed significantly decreased expression of key cytokines (Ifna, Ifnb, and Ifnl) by pDCs upon activation by CpG ODN (shown in Author response image 1). Therefore, HDAC3 is also required for proper pDC function. However, we have yet to conduct RNA-seq analysis comparing pDCs from HDAC CD11c cre and WT mice.

      Author response image 1.

      Cytokine expression in Hdac3 deficient pDCs upon activation

      2) A more detailed characterization of the progenitor compartment that is compromised following depletion would be important, as also suggested in the specific points.

      We thank the reviewer for this constructive suggestion. We have performed a thorough analysis of the phenotype of hematopoietic stem cells and progenitor cells at various developmental stages in the bone marrow of Hdac3 deficient mice, based on the gating strategy from the recommended reference. Briefly, we analyzed the progenitor subpopulations described in the published report by Pietras et al. (2015), namely MPP2, MPP3 and MPP4, using the same gating strategy for hematopoietic stem/progenitor cells. As shown in Author response image 2 and Author response image 3, we found that the number of LSK cells was increased in Hdac3 deficient mice, especially the MPP2 and MPP3 subpopulations, whereas MPP4 showed no significant change. In contrast, the numbers of LT-HSC, ST-HSC and CLP were all dramatically decreased. This result has been optimized and added as Figure 3A in the revised manuscript. The relevant description has been added and underlined in the revised manuscript Page 6 Line 164-168.

      Author response image 2.

      Gating strategy for hematopoietic stem/progenitor cells in bone marrow.

      Author response image 3.

      Hematopoietic stem/progenitor cells in Hdac3 deficient mice

      Reviewer #2 (Public Review):

      In this article Zhang et al. report that the Histone Deacetylase-3 (HDAC3) is highly expressed in mouse pDC and that pDC development is severely affected both in vivo and in vitro when using mice harbouring conditional deletion of HDAC3. However, pDC numbers are not affected in Hdac3fl/fl Itgax-Cre mice, indicating that HDAC3 is dispensable in CD11c+ late stages of pDC differentiation. Indeed, the authors provide wide experimental evidence for a role of HDAC3 in early precursors of pDC development, by combining adoptive transfer, gene expression profiling and in vitro differentiation experiments. Mechanistically, the authors have demonstrated that HDAC3 activity represses the expression of several transcription factors promoting cDC1 development, thus allowing the expression of genes involved in pDC development. In conclusion, these findings reveal HDAC3 as a key epigenetic regulator of the expression of the transcription factors required for pDC vs cDC1 developmental fate.

      These results are novel and very promising. However, supplementary information and eventual further investigations are required to improve the clarity and the robustness of this article.

      Major points

      1) The gating strategy adopted to identify pDC in the BM and in the spleen should be entirely described and shown, at least as a Supplementary Figure. For the BM the authors indicate in the M & M section that they negatively selected cells for CD8a and B220, but both markers are actually expressed by differentiated pDC. However, in the Figures 1 and 2 pDC has been shown to be gated on CD19- CD11b- CD11c+. What is the precise protocol followed for pDC gating in the different organs and experiments?

      We apologize for not clearly describing the protocols used in this study. Please see the detailed gating strategy for pDC in bone marrow, and for pDC and cDC in spleen (Figure 4 and Figure 5). This information has now been added to Figure 1−figure supplement 3. The relevant description has been underlined in the revised manuscript, Page 5 Line 113-116.

      We would like to clarify that in our study, we used two different antibody cocktail panels: one for bone marrow Lin- cells, including mAbs to CD2/CD3/TER-119/Ly6G/B220/CD11b/CD8/CD19; the other for DC enrichment, including mAbs to CD3/CD90/TER-119/Ly6G/CD19. We included B220 in the Lineage cocktail to deplete B cells and pDCs, in order to enrich for progenitor cells from bone marrow. However, when enriching for pDC and cDC, B220 and CD8a were not included in the cocktail, to avoid depleting the pDC and cDC1 subsets. For the flow cytometry analysis of pDCs, we gated pDCs as the CD19−CD11b−CD11c+B220+SiglecH+ population in both bone marrow and spleen. The relevant description has been underlined in the revised manuscript Page 16 Line 431-434.

      2) pDC identified in the BM as SiglecH+ B220+ can actually contain DC precursors, that can express these markers, too. This could explain why the impact of HDAC3 deletion appears stronger in the spleen than in the BM (Figures 1A and 2A). Along the same line, I think that it would be important to show the phenotype of pDC in control vs HDAC3-deleted mice for the different pDC markers used (SiglecH, B220, Bst2) and I would suggest to include also Ly6D, taking also in account the results obtained in Figures 4 and 7. Finally, as HDAC3 deletion induces downregulation of CD8a in cDC1 and pDC express CD8a, it would be important to analyse the expression of this marker on control vs HDAC3-deleted pDC.

      We agree with the reviewer’s points. In the revised manuscript, we incorporated major surface markers, including Siglec H, B220, Ly6D, and PDCA-1, all of which consistently demonstrated a substantial decrease in the pDC population in Hdac3 deficient mice. Moreover, we noticed that Ly6D+ pDCs showed a greater decrease in Hdac3 deficient mice. Additionally, the percentage and number of both CD8+ pDC and CD8- pDC were decreased in Hdac3 deficient mice (Author response image 4). These results are shown in Figure 1−figure supplement 4 of the revised manuscript. The relevant description has been added and underlined in the revised manuscript Page 5 Line 121-125.

      Author response image 4.

      Bone marrow pDCs in Hdac3 deficient mice revealed by multiple surface markers

      3) How do the authors explain that in the absence of HDAC3 cDC2 development increased in vivo in chimeric mice, but reduced in vitro (Figures 2B and 2E)?

      This point is also addressed in our response to Minor point 5 of Reviewer #1. Briefly, we suggest that the variability may be explained by the timing of analysis after HDAC3 deletion. In Figure 2C, we analyzed cells from the recipients one week after the final tamoxifen treatment and observed no significant change in the percentage of cDC2 when all the experimental data were pooled. In Figure 2E, where tamoxifen was administered at Day 0 of Flt3L-mediated DC differentiation in vitro, the DC subsets generated were analyzed at different time points. We observed no significant changes in cDCs and cDC2 at Day 5, but decreases in the percentage of cDC2 were observed at Day 7 and Day 9. This suggests that the cDC subsets at Day 5 might have originated from progenitors at a later stage, while those at Day 7 and Day 9 might have originated from earlier progenitors. Therefore, based on these in vitro and in vivo experiments, we believe that the variation in the cDC2 phenotype might be attributed to the different stages of the progenitors that generated these cDCs.

      4) More generally, as reported also by authors (line 207), the reconstitution with HDAC3-deleted cells is poorly efficient. Although cDC seem not to be impacted, are other lymphoid or myeloid cells affected? This should be expected as HDAC3 regulates T and B development, as well as macrophage function. This should be important to know, although this does not call into question the results shown, as obtained in a competitive context.

      In this study, we found that in Hdac3-ERT2-Cre mice after tamoxifen treatment, T cells, mature B cells and NK cells were not significantly affected, whereas immature B cells were significantly decreased (Figure 6). However, in the bone marrow chimera experiments, the numbers of major lymphoid cells were decreased due to the impaired reconstitution capacity of Hdac3-deficient progenitors. Consistent with our finding, it has been reported that HDAC3 is required for T cell and B cell generation in HDAC3-VavCre mice (Summers et al., 2013) and is necessary for T cell maturation (Hsu et al., 2015). Moreover, HDAC3 is also required for the expression of inflammatory genes in macrophages upon activation (Chen et al., 2012; Nguyen et al., 2020).

      5) What are the precise gating strategies used to identify the different hematopoietic precursors in the Figure 4 ? In particular, is there any lineage exclusion performed?

      We apologize for not describing the experimental procedures clearly. In this study, we enriched the lineage-negative (Lin−) cells from the bone marrow using a lineage-depleting antibody cocktail including mAbs to CD2/CD3/TER-119/Ly6G/B220/CD11b/CD8/CD19. We also provide the gating strategy used for sorting LSK and CDP populations from the Lin− cells in the bone marrow (Author response image 5), shown in Figure 3A and Figure 4−figure supplement 1 of the revised manuscript.

      Author response image 5.

      Gating strategy for LSK, CD115+ CDP and CD115− CDP in bone marrow

      6) Moreover, what is the SiglecH+ CD11c- population appearing in the spleen of mice reconstituted with HDAC3-deleted CDP, in Fig 4D?

      We also noticed the appearance of a SiglecH+CD11c− cell population in the spleen of recipient mice reconstituted with HDAC3-deficient CD115−CDPs, while the presence of this population was not as significant in the HDAC3-Ctrl group, as shown in Figure 4D. We speculate that this SiglecH+CD11c− cell population might represent some cells at a differentiation stage earlier than pre-DCs. Alternatively, the relatively increased percentage of this population derived from HDAC3-deficient CD115−CDP might be due to the substantially decreased total numbers of DCs. This could be clarified by further analysis using additional cell surface markers.

      7) Finally, in Fig 4H, how do the authors explain that Hdac3fl/fl express Il7r, while they are supposed to be sorted CD127- cells?

      This is indeed an interesting question. In this study, we confirmed that CD115−CDPs were isolated from the surface CD127− cell population for RNA-seq analysis, and the purity of the sorted cells was checked (Author response image 6), as shown in Figure 4−figure supplement 1 of the revised manuscript.

      A possible explanation for the expression of Il7r mRNA in some HDAC3fl/fl CD115−CDPs, as revealed in Figure 4H by RNA-seq analysis, is that these cells express a very low level of cell surface CD127 and therefore could not be efficiently excluded by sorting for surface CD127− cells.

      Author response image 6.

      CD115−CDPs sorting from Hdac3-Ctrl and Hdac3-KO mice

      8) What is known about the expression of HDAC3 in the different hematopoietic precursors analysed in this study? This information is available only for a few of them in Supplementary Figure 1. If not yet studied, they should be addressed.

      We conducted additional analysis of the expression of Hdac3 in various hematopoietic progenitor cells at different stages, based on the RNA-seq analysis. The data revealed a relatively consistent level of Hdac3 expression in progenitor populations, including HSC, MMP4, CLP, CDP and BM pDCs (Author response image 7). This suggests that HDAC3 may play an important role in the regulation of hematopoiesis at multiple stages. This information is now added in Figure 1−figure supplement 1B of the revised manuscript.

      Author response image 7.

      Hdac3 expression in hematopoietic progenitor cells

      9) It would be highly informative to extend CUT and Tag studies to Irf8 and Tcf4, if this is technically feasible.

      We totally agree with the reviewer. We have indeed attempted to use CUT and Tag to compare the binding sites of IRF8 and TCF4 in wild-type and Hdac3-deficient pDCs. However, it proved technically infeasible to obtain reliable results due to the limited number of cells we could obtain from the HDAC3-deficient mice. We are committed to exploring alternative approaches or technologies in future studies to address this issue.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very exciting manuscript from Meng Wang's lab on lysosomal proteomics. They used several different protein tags to identify the lysosomal proteome. The exciting findings include A) specific lysosomal proteins exist in a tissue-specific manner B) lipl-4 overexpression and daf-2 extend life span using different mechanisms C) identification of novel lysosomal proteins D) demonstration of the function of several lysosomal proteins in regulating lysosome abundance and function.

      We thank the reviewer for finding our manuscript exciting.

      Reviewer #2 (Public Review):

      In this manuscript, Yu and colleagues profile the lysosome content in C. elegans. They implement lysosome immunoprecipitation (Lyso-IP) for C. elegans and they convincingly show that this method successfully isolates lysosomes from whole worms. The authors find that the lysosomes of worms overexpressing the lysosomal lipase lipl-4 are enriched for AMPK subunits and nucleoporins and that these proteins are required for the longevity of lipl-4 overexpressing worms. The authors also show that this is specific to this longevity pathway given that another long-lived worm strain (daf-2) does not exhibit enrichment for nucleoporins nor does it require them for longevity. The authors go on to express the Lyso-IP tag in different tissues of C. elegans (muscle, hypodermis, intestine, neurons) and identify the tissue-specific lysosome proteomes. Finally, the authors use this method to identify lysosome proteins in mature lysosomes and they find new proteins that regulate lysosomal acidification.

      The authors present a powerful tool to unbiasedly identify lysosome-associated proteins in C. elegans, and they provide an in-depth assessment of how this method can be used to understand longevity pathways and identify novel proteins. Understanding lysosomal differences in specific tissues or in response to different longevity conditions are exciting as it provides new insight into how organelles could control specific homeostasis responses. This tool and proteomics datasets also represent a great resource for the C. elegans community and should pry open new studies on the regulation and role of the lysosome at the organismal level.

      We truly appreciate the reviewer’s positive comments on our work.

      Addressing the following suggestions would help strengthen this already strong manuscript. First, it would be helpful to validate selected candidates from the tissue-specific Lyso-IP to verify that the protocol is still specific with lower sample amounts. Second, it would be helpful to provide more details on the methods, notably for sample preparation and analysis, so that it can serve as a guideline for the community. Third, the manuscript contains a lot of data and conditions, which is great, but they may also feel disconnected in some cases and it could be helpful to focus the study on the main key findings.

      We thank the reviewer for these comments. As suggested, we have generated a CRISPR knock-in line for one hypodermis-specific candidate, Y58A7A.1, which encodes a copper transporter, and validated its hypodermis-specific lysosomal localization (new Supplementary Figure 2E).

      As suggested by the reviewer, we have extended the method section on Lyso-IP to include more details. We believe that the new version should be sufficient for any lab to follow this protocol and conduct their own analyses. We will also take advantage of the eLife “Request a Protocol” feature to share the detailed version of the Lyso-IP method with researchers who are interested.

      We have thoroughly reorganized the manuscript to increase the textual clarity and improve the connection between different analyses and results.

      Reviewer #3 (Public Review):

      The manuscript by Ji et al dissects the important role of lysosomes in cellular metabolism and signaling and their regulation by various associated proteins. The authors utilized deep proteomic profiling in C. elegans to identify lysosome-associated proteins involved in regulating longevity and discovered the recruitment of AMPK and nucleoporin proteins in response to increased lysosomal lipolysis. Additionally, the authors found lysosomal heterogeneity across different tissues and specific enrichment of the Ragulator complex on Cystinosin-positive lysosomes.

      Strengths of this work include the utilization of deep proteomic profiling to identify novel lysosome-associated proteins involved in longevity regulation, as well as the discovery of lysosomal heterogeneity and specific protein enrichments across different worm tissues. These findings point to a complex interplay between lysosomal protein dynamics, signal transduction, organelle crosstalk, and organism longevity.

      One weakness of this work may be the limited scope of the study, as it focuses primarily on the identification and characterization of lysosome-associated proteins involved in longevity regulation, with limited mechanistic follow-up and some unsubstantiated claims.

      We thank the reviewer for her/his helpful comments and suggestions. The primary goal of this manuscript is to provide new methods and resources to the community. The current study did yield several biological findings, and mechanistic follow-up on these findings will be an interesting future topic but may be beyond the scope of the current manuscript. In addition, we have provided new experimental results to further support several claims that the reviewer has commented on.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely thank the editor and reviewers for their constructive feedback on our manuscript. Based on their recommendations, we've conducted additional experiments, made revisions to the text and figures, and provide a point-by-point response below.

      Reviewer #1 (Recommendations for the authors):

      1) The lack of behavioral/physiological measures of the depth of anesthesia (ventilation, heart rate, blood pressure, temperature, O2, pain reflexes, etc...) combined with the lack of dose-response and the use of different routes of administration makes the data difficult to interpret. Sure, there is a clear difference in network activation between KET and ISO, but are those effects due to the depth of the anesthesia, the route of administration, and the dose used? The lack of behavioral/physiological measures prevents the identification of brain regions responsible for some of the physiological effects and different effects of anesthetics.

      We greatly appreciate the insightful feedback you have provided.

      In response to the concerns about anesthesia depth:

      a. We recorded EEG and EMG data both before and after drug administration. Supplementary Figure 1 showcases the changes in EEG and EMG power observed 30 minutes post-drug administration, normalized to a 5-minute baseline taken prior to the drug's administration. Notably, no significant differences were detected in the normalized EEG and EMG power between the ISO and KET groups. Given the marked statistical differences observed between the EEG power in the KET and saline groups, and the EMG power in the home cage and ISO groups, we infer that both anesthetics effectively induced a loss of consciousness.

      b. We used standard methods and doses for inducing c-Fos expression with anesthetics, as documented in prior studies (Hua, T, et al., Nat Neurosci, 2020; 23(7): 854-868; Jiang-Xie, L F, et al., Neuron, 2019; 102(5): 1053-1065.e4; Lu, J, et al., J Comp Neurol, 2008; 508(4): 648-62). In future research, it might be more optimal to adopt continuous intraperitoneal or intravenous administration of ketamine.

      c. Within the scope of our study, while disparities in anesthesia duration might potentially influence the direct statistical comparison of ISO and KET, such disparities wouldn't compromise the identification of brain regions activated by KET or ISO when assessed as distinct stimuli (ISO vs. home cage; KET vs. saline) or in relation to their individual functional network hub node results.

      We hope these additions and clarifications adequately address your concerns and enhance the comprehensibility of our data.

      2) Under anesthesia there should be an overall reduction of activity, is that the case? There is no mention of significantly downregulated regions. The authors use multiple transformations of the data to interpret the results (%, PC1 values, logarithm) without much explanation or showing the full raw data in Fig 1. It would be helpful to interpret the data to compare the average fos+ neurons in each region between treatment and control for each drug.

      Absence of Significantly Downregulated Regions Under Anesthesia: There are two primary reasons for this observation:

      a. Our study's sampling time for the home cage, ISO, saline, and KET groups was during Zeitgeber Time (ZT) 6-7.5. During this period, mice in both the home cage and saline groups typically showed reduced spontaneous activity or were in a sleep state. Our Supplementary Figure 1 EEG and EMG data corroborate this, revealing no significant statistical variations in EEG power between the home cage and ISO groups, nor in EMG power between the saline and KET groups.

      b. Our immunohistochemical data showed that the total number of c-Fos positive cells in the two control groups was notably lower than in the experimental groups (Saline group vs KET group: 11808±2386 versus 308705±106131, P = 0.006; Home cage vs ISO group: 3371±840 vs 12326±1879, P = 0.001). This is in line with previous studies, such as that of Cirelli and Tononi, which found minimal c-Fos expression throughout the mouse brain during physiological sleep (Cirelli, C, and G Tononi, Sleep, 2000; 23(4): 453-69). Thus, in our analysis, we did not detect regions with significant downregulation when comparing anesthetized mice with controls.

      Interpreting Raw Data from Figure 1: Regarding the average Fos+ neurons:

      In Figures 4 and 5, we utilized raw data (c-Fos cell count) to assess cell expression differences across 201 brain regions within each group. Only brain regions that had significant statistical differences after multiple comparison corrections are shown in the figures.

      3) I do not understand their interpretation of the PCA analyses. For instance, in Fig 2 they claim that KET is associated with PC1 while ISO is associated with PC2. Looking at the distribution of points it's clear that the KET animals are all grouped at around +2.5 on PC1 and -2.0 on PC2, this means that KET is associated with both PC1 and PC2 to a similar degree (2 to 2.5). Moreover, I'm confused about why they use PCA to represent the animals/group. PCA is a powerful technique to reduce dimensionality and identify groups of variables that may represent the same underlying construct; however, it is not the best way to identify clusters of individuals or groups.

      Clarification on PCA Analyses in Figure 2: Thank you for pointing out the ambiguities in our initial presentation of the PCA analyses. We are grateful for the opportunity to address these concerns.

      KET and ISO Associations with PC1 and PC2: You rightly observed that KET samples manifest both a positive value on PC1 (around +2.5) and a negative one on PC2 (around -2.0), suggesting that KET has a substantial influence on both principal components. In PCA, a positive score implies a positive association with that component, whereas a negative score suggests a negative association. Contrarily, ISO samples predominantly exhibit values around +2.5 on PC2, with nearly neutral values for PC1, underlining its stronger association with PC2 and lack of significant correlation with PC1. To ensure transparency and clarity, we've adjusted the corresponding descriptions in our manuscript, which can be found on Line 100.

      Rationale Behind Using PCA to Represent Animals/Groups: Our initial step was to conduct PCA clustering analysis on the 201 brain regions within both the ISO and KET groups. In the accompanying chart, varying colors denote different brain regions, while distinct shapes represent separate clusters. There wasn't a pronounced distribution pattern within the ISO and KET groups, which led us to adopt the current computational method presented in the paper. This approach was chosen to directly contrast the relative differential expressions between ISO and KET.

      We deeply value your feedback, which has steered us toward a clearer and more accurate presentation of our data. We genuinely appreciate your meticulous review.

      Author response image 1.

      4) The actual metric used for the first PCA is unclear, is it the FOS density in each of the regions (some of those regions are large and consist of many subregions, how does that affect the analysis) is it the %-fos, or normalized cells? The wording describing this is variable causing some confusion. How would looking at these different metrics influence the analysis?

      Thank you for raising concerns about the metrics used in our PCA analysis. We recognize the need for clearer exposition and appreciate the opportunity to clarify.

      PCA Metrics: The metric for our PCA is calculated by obtaining the ratio of the Fos density within a specific brain region to the global Fos density across the brain. Briefly, this entails dividing the number of Fos-positive cells in a given region by its volume, and then dividing this regional density by the Fos density of the whole brain. The logarithm of this ratio provides our PCA metric. We've elaborated on this in the Materials and Methods section (Line 401) and enhanced clarity in our revised manuscript, particularly at Line 96.
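
      The metric described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the input dictionaries, the use of the natural logarithm, and the approximation of the whole-brain density by pooling the listed regions are all assumptions, since the response does not specify them.

```python
import math

def log_relative_density(counts, volumes):
    """Return log(regional Fos density / global Fos density) per region.

    counts  : dict mapping region name -> number of Fos-positive cells
    volumes : dict mapping region name -> region volume (any consistent unit)

    Assumptions (not stated in the source): natural log, and the global
    density is taken as pooled counts divided by pooled volume of the
    regions supplied. A region with zero cells would make math.log raise
    ValueError; how zeros were handled in the study is not described.
    """
    total_count = sum(counts.values())
    total_volume = sum(volumes[region] for region in counts)
    global_density = total_count / total_volume

    metric = {}
    for region, n_cells in counts.items():
        regional_density = n_cells / volumes[region]
        metric[region] = math.log(regional_density / global_density)
    return metric


# Hypothetical toy input: two regions of equal volume.
example = log_relative_density({"A": 100, "B": 300}, {"A": 1.0, "B": 1.0})
```

      A region whose density equals the whole-brain density scores 0 under this definition; positive values indicate relative enrichment and negative values relative depletion, which is why the metric compares groups with very different total c-Fos counts on a common scale.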

      In Figure 2A, we employed 53 larger, mutually exclusive brain regions based on the reference from the study by Do et al. (eLife, 2016;5:e13214). However, in Figure 3A, we used a more detailed segmentation, incorporating 201 distinct brain areas that are more granular than those in Figure 2A. Notably, the PCA results from both representations were consistent. The rationale behind selecting either the 53 or 201 brain regions can be found in our response to Question 10.

      Rationale for Metric Choice: The log ratio of regional c-Fos densities relative to the global brain density was chosen due to:

      a. Notable disparities in c-Fos cell expression across the groups.

      b. A significant non-normal distribution of density values across animals within the group. Employing the log ratio effectively mitigates the impact of extreme values and outliers, achieving a more standardized data distribution.

      We've added PCA plots based on c-Fos densities, depicted in Author response image 2. However, the data dispersion has resulted in a significantly spread-out horizontal scale for these visuals.

      Author response image 2.

      5) Based on Fig 3 the authors concludes that ISO activates the hypothalamic regions and inhibits the cortex, however, Fig 1 shows neither an activation of the hypothalamus in the ISO nor an inhibition of the cortex when compared to home cage control. If anything it suggests the opposite.

      Thank you for your insightful observations regarding the discrepancies between Figures 2 and 3. We believe that when you refer to Figure 1, you are actually referencing Figure 2C.

      ISO activation in Hypothalamus: In Figure 2C, we regret the oversight where we inadvertently interchanged the positions of ISO and Saline. When accurately represented, Figure 2C indeed shows that ISO notably activates the periventricular zone (PVZ) and the lateral zone (LZ) of the hypothalamus compared to the home cage group. Moreover, there's a discernible difference in the hypothalamic response between ISO and KET.

      ISO's Effect on the Cortex: The main aim of Figure 3 was to highlight the differing responses between ISO and KET in the cortex. Notably, KET demonstrates a positive correlation with PC1 (+7 on PC1), whereas ISO shows a negative association (-3 on PC1). Given that the coefficient of PC1 for the cortical region is positive, it suggests that the cortical areas activated by KET are inhibited by ISO (with KET's distribution around 0 on PC2). However, the divergence between ISO and the home cage is most apparent in PC2, with ISO clusters at +4 and the home cage approximately at -2, suggesting that ISO activates a different set of cortical nuclei. In alignment with this, Figure 2C also illustrates that ISO activates specific cortical areas, such as ILA and PIR, in contrast to the home cage.

      Thus, Figure 3 primarily employs PCA to delineate the contrasts between ISO and KET, whereas Figure 2C emphasizes the comparison of each against their respective controls.

      6) Control for isoflurane should be air in the induction chamber rather than home cage. It is possible that Fos activation reflects handling/stress pre-anesthesia in the animals, which would increase Fos expression in the stress-related regions such as the BST, striatum (CeA), hypothalamus (PVH) and potentially the LC.

      Thank you for emphasizing the importance of an appropriate control for Isoflurane.

      In our efforts to minimize the potential impact of stress-induced c-Fos expression, we implemented several precautionary measures. Prior to the experiment, both groups of mice were subjected to handling and acclimatization within the induction chamber over four days. By the day of the experiment, for the mice in the experimental group, we ensured they were comfortable and exhibited no signs of distress or fear—such as cowering or evading. With care, we slowly relocated them to the nearby anesthesia induction chamber. Using 5% ISO, anesthesia was induced promptly, following a meticulously devised protocol to reduce stress impacts on c-Fos expression.

      Moreover, existing studies have shown Isoflurane's activation of BST/CeA (Hua, T, et al., Nat Neurosci, 2020, 23: 854-868), PVH (Xu, Z, et al., British Journal of Anaesthesia, 2023, 130: 446-458), and LC (Lu, J, et al., J Comp Neurol, 2008, 508: 648-62), even when using oxygen controls. Such literature supports our findings, indicating that the activation we observed was indeed due to Isoflurane and not purely stress-related.

      7) In the Ket network there are a few anticorrelated regions, most of which are amongst the list of the most activated regions, does this mean that the strong correlation results from an overall decreased activation? And if so, is it possible that the ketamine anesthesia was stronger than the isoflurane, causing a more general reduction in activity?

      The pronounced correlations observed within the ketamine (KET) network do not signify a generalized decrease in activation. Instead, these correlations reflect significantly enhanced activity in specific regions under KET anesthesia. This amplified correlation is an indication of a more widespread increase in activity, rather than a decrease. These findings are consistent with previous research, which showed that anesthetic doses of ketamine produce patterns of Fos expression in the CNS similar to wakefulness (Lu, J, et al., J Comp Neurol, 2008; 508(4): 648-62).

      Regarding the comparative strength of KET versus ISO anesthesia, our electroencephalographic evidence confirms that both agents induce a loss of consciousness. No significant differences were observed in EEG and EMG readings within the first 30 minutes post-administration. In future research, a continuous intravenous or intraperitoneal administration of KET might be a preferable method.

      8) Since they have established networks it would be easy and useful to look at how the different regions identified (sleep, pain, neuroendocrine, motor-related, ...) work together to maintain analgesia, are they within the same module? Do they become functionally connected and is this core network of functional connections similar for KET and ISO?

      Thank you for your suggestion. In response to your inquiry, we undertook an analysis of the core functional networks for KET and ISO, using a set threshold of r>0.82 and P<0.05. For evaluating the modularity of each network, we utilized Newman's spectral community detection algorithm.

      (A) The ISO core functional network (56 nodes, 372 edges) predominantly divides into two modules with a modularity quotient of 0.345. ISO-activated regions include arousal-associated regions (PL, ILA, PVT), analgesia-related regions (CeA, LC, PB), and neuroendocrine nuclei (TU, PVi, ARH, PVH, SON), as detailed in Figure 5. Notably, ARH and SON were not incorporated into the core network. Analgesia-associated regions, such as CeA, LC, and PB, reside within module 1, while neuroendocrine nuclei are spread between modules 1 and 2.

      (B) In contrast, KET's core functional network (61 nodes, 1820 edges) splits into three distinct modules, but its low modularity quotient (0.06) indicates a lack of clear functional modularization, suggesting denser interconnections among brain regions. Furthermore, functionally related regions, such as those for arousal (PL, ILA, PVT, DR), analgesia (ACA, APN, PAG, LC), and neuroendocrine regulation (PVH, SON), among others, as seen in Figure 4, are distributed across different modules. This distribution may imply that functions like analgesia and neuroendocrine regulation are not governed by simple, linear processes, but arise from complex, overlapping pathways spanning various modules and functional zones.

      In summary, the core functional networks of ISO and KET differ, with functionally-related regions spanning multiple modules, reflecting their diverse roles in varied physiological regulations.
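
      To make the two quantities in this response concrete, the sketch below builds a thresholded correlation network and scores a given partition with Newman's modularity Q. It is an illustration under stated assumptions, not the authors' pipeline: the function names and toy data are hypothetical, only the r>0.82 cut-off is applied (the P<0.05 filter is omitted), and the code evaluates the modularity of a partition supplied by the caller rather than running the spectral community detection itself.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def build_edges(expression, r_min=0.82):
    """Connect two regions when their across-animal correlation exceeds r_min.

    expression : dict mapping region -> list of per-animal values.
    The P-value filter mentioned in the text is not applied here.
    """
    edges = set()
    for a, b in combinations(sorted(expression), 2):
        if pearson(expression[a], expression[b]) > r_min:
            edges.add((a, b))
    return edges

def modularity(edges, community):
    """Newman's Q for an undirected, unweighted graph and a given partition.

    Q = sum over modules c of [ L_c / m - (d_c / (2 m))^2 ], where m is the
    number of edges, L_c the edges inside module c, and d_c the summed
    degree of the nodes in module c.
    """
    m = len(edges)
    within, degree_sum = {}, {}
    for a, b in edges:
        for node in (a, b):
            c = community[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[a] == community[b]:
            within[community[a]] = within.get(community[a], 0) + 1
    return sum(within.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = {(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)}
toy_partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
toy_q = modularity(toy_edges, toy_partition)
```

      In this toy graph the two triangles form well-separated modules, so Q is clearly positive; a densely interconnected network, as reported for KET, leaves few partitions that beat the random expectation, which pushes Q toward zero.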

      Author response image 3.

      9) The naming of the function of some of the regions is very much debatable. For instance, PL/ILA are named "sleep-wakefulness regulation" regions in the paper. I can think of many more important functions of the PL/IL including executive functions, behavioral flexibility, and emotional control. It is unclear how the functions of all the regions were attributed. I am not sure that this biased labeling of structure-function is useful to the reports, it may instead suggest wrong conclusions.

      Thank you for your thoughtful feedback regarding our classification of the functions of the PL/ILA regions in our manuscript.

      We recognize the challenge in accurately defining the functions of brain regions. While there is evidence highlighting the role of PL/ILA in arousal pathways, we also acknowledge their documented roles in executive functions, behavioral flexibility, and emotional control. In response to your comments, we have refined our description, changing "sleep-wakefulness regulation" to "wake-promoting pathways" (see Line: 159, 164).

      It's worth noting that many brain regions, including the PL/ILA, have multiple functions. We agree that a single label might not capture the entirety of their roles. To provide a broader perspective, we will add a section in our manuscript that sheds light on the varied functions of these regions (Line: 181).

      10) A point of concern and confusion is the number of brain regions analyzed. In the introduction, it is mentioned that 987 brain regions are considered, but this is reduced to 53 selected brain regions in Figure 2, then 201 brain regions in Figure 3, and reduced again to 63 for the network analysis. The rationale for selecting different brain regions is not clear.

      For the 987 brain regions: Using the standard mouse atlas available at http://atlas.brain-map.org/, the mouse brain is organized into nine levels. The broadest category is the grey matter, which then progresses to more specific subdivisions, totaling 987 unique regions.

      For the 53 brain regions: To effectively understand the activation patterns of ISO and KET, we started with a broad approach, looking at larger brain areas like the thalamus and hypothalamus. This broad view, presented in Figure 2, focuses on the 5th-level brain regions, encompassing 53 primary areas. This methodology is also employed in the study by Do et al. (eLife, 2016; 5: e13214). We have added the rationale for selecting these brain regions in the main text (Line: 92).

      Regarding the 201 brain regions in Figures 3, 4, and 5: We delved deeper, examining the 6th-level brain regions, a common granularity in neuroscience research. This detailed view allowed us to highlight specific areas, like the CeA and PVH (Line: 129).

      Finally, for Figures 6 and 7, we selected 63 regions that were activated by both ISO and KET, as well as regions previously reported to be related to the mechanism of general anesthesia (Leung, L, et al., Progress in neurobiology, 2014; 122: 24-44) (Line: 220). Using these regions, we analyzed the correlation of c-Fos expression, aiming to construct a functional brain network with strong positive connections.

      We hope this clarifies our approach and the rationale behind our region selection at each stage of the study. Thank you for your attention to this detail.

      11) The statistical analysis does not seem appropriate considering the high number of comparisons. They use simple t-tests without correction for multiple comparisons.

      Thank you for pointing out the concern regarding our statistical analysis. In the revised manuscript, we addressed the issue of multiple comparisons correction in our t-tests. We adopted the statistical methods detailed in the papers by Renier, N, et al., Cell, 2016; and Benjamini, Y, and Y Hochberg, 1995. P-values were adjusted for multiple comparisons using the two-stage linear step-up procedure of Benjamini, Krieger, and Yekutieli, with a false discovery rate (FDR) threshold (Q) of 0.05. This approach is now explained in the Materials and Methods section (Line: 434). After this adjustment, the brain regions we initially identified remained statistically significant. Furthermore, we revisited the original immunohistochemical images to confirm the differences in c-Fos cell expression between the experimental and control groups, reinforcing our conclusions.
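
      For readers unfamiliar with the procedure named above, the following is a minimal sketch of the two-stage linear step-up method of Benjamini, Krieger, and Yekutieli, built on the Benjamini-Hochberg step-up rule. It is an illustration only, not the authors' analysis code; the function names and the example P-values are hypothetical.

```python
def bh_reject(pvalues, q):
    """Benjamini-Hochberg step-up: reject the k smallest P-values, where k is
    the largest rank with p_(k) <= k * q / m. Returns booleans in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected

def bky_two_stage(pvalues, q=0.05):
    """Two-stage linear step-up procedure (Benjamini, Krieger, Yekutieli).

    Stage 1: run BH at the reduced level q' = q / (1 + q).
    Stage 2: estimate the number of true nulls as m0 = m - r1 and rerun BH
             at level q' * m / m0. If stage 1 rejects nothing or everything,
             its result is final.
    """
    m = len(pvalues)
    q_prime = q / (1 + q)
    stage1 = bh_reject(pvalues, q_prime)
    r1 = sum(stage1)
    if r1 == 0 or r1 == m:
        return stage1
    m0_hat = m - r1
    return bh_reject(pvalues, q_prime * m / m0_hat)


# Hypothetical P-values for six brain regions.
example_p = [0.3, 0.001, 0.04, 0.9, 0.008, 0.02]
example_bh = bh_reject(example_p, 0.05)
example_bky = bky_two_stage(example_p, 0.05)
```

      With these hypothetical values, plain BH at Q = 0.05 rejects three hypotheses while the two-stage procedure rejects four (it additionally rejects P = 0.04), illustrating how estimating the number of true nulls recovers some power at the same nominal FDR.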

      12) There is no statistical analysis in Fig 2C.

      Thank you for bringing to our attention the lack of statistical analysis in Fig 2C. We have now added the relevant statistical data in Supplementary Table 1 and provided annotations in Fig 2C to reflect this.

      Reviewer #2

      1) The authors report 987 brain regions in the introduction, but I cannot find any analysis that incorporates these or even which regions they are. Very little rationale is provided for the regions included in any of the analyses and numbers range from 53 in Figure 1, to 201 in Figure 3, to 63 in Figure 6. It would help if the authors could first survey Fos+ counts across all regions to identify a subset that is of interest (significantly changed by either condition compared to control) for follow up analysis.

      Thank you for your insightful comments on the number of brain regions analyzed in our study.

      987 Brain Regions: The reference to 987 brain regions from the standard mouse atlas (http://atlas.brain-map.org/) represents the entire categorization of the mouse brain across nine levels. We recognize that a comprehensive analysis of all these regions would be valuable, but to ensure clarity and depth, we took a focused approach.

      Region Selection Rationale:

      Figure 2: Concentrated on 5th-level brain regions (53 areas), inspired by methods from Do et al. (eLife, 2016;5:e13214). This provided a broad overview of c-Fos expression differences.

      Figures 4 and 5: Delved into 6th-level brain regions (201 areas), a common practice in neuroscience for more detailed study.

      Figure 6: We focused on 63 regions, which encompass not only the regions activated by both ISO and KET but also those previously reported to be associated with the mechanisms of general anesthesia.

      Methodological Approach: Our region selection was rooted in identifying areas with significant changes under anesthetic conditions compared to controls. This staged approach allowed a targeted analysis of the most affected regions, ensuring robust conclusions.

      Enhancements: We've incorporated comparative analyses of activated brain regions at different hierarchical levels in Figures 4 and 5. For clearer comprehension, we’ve added clarifications in the manuscript at Lines: 92, 130, and 220.

      2) Different data transformations are used for each analysis. One that is especially confusing is the 'normalization' of brain regions by % of total brain activation for each animal prior to PCA analysis in Figures 2 and 3. This would obscure any global differences in activation and make it unlikely to observe decreases in activation (which I think is likely here) that could be identified using the Fos+ counts after normalizing for region size (i.e., Fos+ count / mm3), which is standard practice in such Fos-based activity mapping studies. While PCA can be a powerful approach to identify global patterns, the purpose of the analysis in its current form is unclear. It would be more meaningful to show that regional activation patterns (measured as counts/mm3) are on separate PCs by group.

      Thank you for your thoughtful comments. We regret any confusion caused by our initial presentation. For the PCA analysis in Figures 2A and 3A, we calculated the ratio of cell density in each brain region to the overall brain density, and then applied a logarithmic transformation to this ratio. Our approach in Figure 2C was to use the proportion of c-Fos cell counts in individual brain regions to the total cell counts throughout the brain. This methodology considers variations in overall c-Fos cell counts across animals, effectively mitigating potential biases due to differential global activation levels across subjects.
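
      The two normalizations described above can be sketched as follows. This is an illustration only, not the study's code: the function names, the region labels, the choice of log base 10, and the definition of whole-brain density as total counts divided by total volume are all assumptions, and the numbers are made up.

```python
import math


def log_density_ratio(counts, volumes_mm3):
    """Log10 of (regional c-Fos density / whole-brain c-Fos density).

    Assumes every region has a non-zero count; math.log10 raises
    ValueError for zero, so real data would need explicit handling.
    """
    whole_brain_density = sum(counts.values()) / sum(volumes_mm3.values())
    return {
        region: math.log10((counts[region] / volumes_mm3[region]) / whole_brain_density)
        for region in counts
    }


def count_proportion(counts):
    """Share of the animal's total c-Fos+ cells found in each region."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


# Hypothetical values for one animal (not data from the study).
counts = {"region_a": 100, "region_b": 10}
volumes = {"region_a": 1.0, "region_b": 1.0}
print(log_density_ratio(counts, volumes))
print(count_proportion(counts))
```

      Both quantities are relative to the animal's own total, so a region scores above zero on the log scale only when it is denser than that animal's brain-wide average.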

      Furthermore, our direct comparison of differences in c-Fos cell counts between ISO, KET, and their respective control groups in Figures 4 and 5 addresses your concerns about potential decreases in activation. Notably, we did not identify any brain regions with significant suppression in these figures, which is consistent with the trends observed post-normalization in Figure 2C.

      Given your feedback, we conducted another PCA using cell densities for each region (counts/mm3). However, we found significant variability and non-normal distribution of c-Fos density across the groups, leading to extensive data dispersion. Consequently, normalizing the cell counts across regions and then applying a logarithmic transformation before PCA might be more appropriate.

      Author response image 4.

      Additionally, our exploration of regional activation patterns using PCA analysis for ISO and KET separately, based on the logarithm ratio of the c-Fos density, revealed that there was no distinct clustering feature among the different brain regions (as illustrated in Author response image 5: colors represented distinct brain regions, while the shapes were indicative of different clusters). This observation further suggests that our original statistical approach might be more suitable.

      Author response image 5.

      3) Critical problem: The authors include a control group for each anesthetic (ketamine vs. saline, isoflurane vs. homecage) but most analyses do not make use of the control groups or directly compare Fos+ counts across the groups. Strictly speaking, they should have compared relative levels of induction by ketamine versus induction by isoflurane using ANOVAs. Instead, each type of induction was separate from the other. This does not account for increased variability in the ketamine versus isoflurane groups. There is no mention in the Statistics section or in Results section that any multiple comparison corrections were used. It appears that the authors only used Student's t-test for each region and did not perform any corrections.

      We appreciate the reviewer's insights and have addressed your concerns:

      Given the pronounced difference in c-Fos cell count expression between the KET and ISO groups, a direct comparison of Fos+ counts may not effectively capture their inherent disparities. To better highlight these distinctions, we used the logarithm ratio of c-Fos density in our PCA analysis (Figure 3), mitigating potential disparities in overall cell counts between samples and emphasizing relative variations. However, in response to your feedback, we've included additional analyses. Author response image 6 depicts the c-Fos density (cells/mm^3) across different brain regions for the home cage, ISO, saline, and KET groups, with regions like the cerebral cortex, cerebral nuclei, thalamus, and others differentiated by shaded backgrounds. Data are represented as mean ± SEM. We performed a one-way ANOVA followed by Tukey’s post hoc test, marking significant differences between ISO and KET with asterisks: ***P < 0.001, **P < 0.01, *P < 0.05.
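
      As a rough illustration of the summary statistics mentioned above, the mean ± SEM and the one-way ANOVA F statistic can be computed as in the sketch below. The function names and the group values are hypothetical, and Tukey's post hoc test is deliberately left out because it needs the studentized range distribution, which the Python standard library does not provide.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    ss_within = sum(
        (v - statistics.mean(group)) ** 2 for group in groups for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical densities for three groups (not data from the study).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(mean_sem(groups[0]))
print(one_way_anova_f(groups))
```

      The F statistic would then be compared against an F distribution with (groups - 1, observations - groups) degrees of freedom to obtain a P-value.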

      Regarding multiple comparison corrections, we've conducted thorough analyses on the data in Figure 2C and Figures 4, 5, and 6, implementing multiple comparison corrections. The detailed methodology is provided in the “Statistical analysis” section.

      Author response image 6.

      4) Figures 4 and 5 show brain regions 'significantly activated' following KET or ISO respectively, but again a subset of regions are shown and the stats seem to be t-tests with no multiple comparisons correction. It would help to show these two figures side by side, include the same regions, and keep the y axis ranges similar so the reader can easily compare the 'activation patterns' across the two treatments. Indeed, it looks like KET/Saline induced activation is an order of magnitude or two higher than ISO/Homecage. I would also recommend that this be the first data figure before any other analyses and maybe further analysis could be restricted to regions that are significantly changed following KET or ISO here.

      Thank you for your constructive feedback regarding Figures 4 and 5.

      Comparison and Presentation of Figures 4 and 5: We acknowledge your suggestion to present these figures side by side for easier comparison. In the supplementary figure provided in the previous question, we've placed Figures 4 and 5 adjacent to each other, with consistent y-axis ranges, ensuring that readers can make direct comparisons between the activation patterns elicited by KET and ISO.

      Statistical Concerns and Region Selection: As mentioned in our previous response, we have conducted multiple comparison corrections on the data presented in Figures 4 and 5. Detailed procedures are elaborated in the “Statistical analysis” section. We believe this approach addresses your concerns regarding the use of t-tests without corrections for multiple comparisons.

      Difference in Activation Levels: We observed that the c-Fos activation due to KET is significantly higher than that from ISO. When presented side-by-side using the same scale, ISO activations appear less prominent, potentially masking subtle differences in the activation patterns of ISO, particularly if both KET and ISO showed changes in the same direction in certain brain regions but differed in magnitude. To address this, we used the proportion of c-Fos cell counts in Figure 2C and the logarithm ratio of c-Fos density in Figure 2A and Figure 3. This method emphasizes the relative changes, rather than absolute values, giving a more balanced view of the effects of each treatment.

      5) Analyses in Figure 6 and 7 are interesting but again the choice of regions to include is unclear and makes interpreting the results impossible. For example, in Figure 7 it is unclear why the list of regions in bar graphs showing Degree and Betweenness Centrality are not the same even within a single row?

      Thank you for your pertinent observation. The choice of brain regions in Figures 6 and 7 was carefully determined based on two main criteria: regions that were significantly activated by ISO or KET within the scope of our study, and those previously reported to be associated with anesthesia mechanisms and sleep-wake regulation.

      Regarding your second concern on Figure 7, the discrepancies observed in the x-axes of the bar graphs arise from our methodological approach. We prioritized presenting the top 20% of regions based on their Degree or Betweenness Centrality values. By separately ranking these regions from highest to lowest, the regions presented for each metric inherently differ. This approach was taken to elucidate nodes that consistently emerge as significant across both metrics, thereby highlighting core nodes in the functional network. Were we to use a consistent x-axis without this ranking, it would not only necessitate a more extensive presentation but might also dilute the emphasis on key information. To clarify this methodology and its rationale for our readers, we have expanded upon this in the manuscript at Line 243.
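
      The ranking logic described above, in which each metric is ranked separately, the top 20% is kept, and the regions present in both lists are treated as core nodes, can be sketched as follows. The function names, region labels, and metric values are hypothetical and serve only to show why the two bar graphs end up with different x-axes.

```python
def top_fraction(metric, fraction=0.2):
    """Region names in the top `fraction` of a metric, highest value first."""
    k = max(1, round(len(metric) * fraction))
    ranked = sorted(metric, key=metric.get, reverse=True)
    return ranked[:k]


def core_nodes(degree, betweenness, fraction=0.2):
    """Regions that fall in the top fraction of both metrics."""
    top_degree = top_fraction(degree, fraction)
    top_betweenness = set(top_fraction(betweenness, fraction))
    return [region for region in top_degree if region in top_betweenness]


# Hypothetical network metrics for ten regions (not data from the study).
degree = {"r1": 9, "r2": 8, "r3": 7, "r4": 3, "r5": 2,
          "r6": 1, "r7": 1, "r8": 1, "r9": 1, "r10": 0}
betweenness = {"r1": 0.40, "r2": 0.05, "r3": 0.50, "r4": 0.02, "r5": 0.01,
               "r6": 0.0, "r7": 0.0, "r8": 0.0, "r9": 0.0, "r10": 0.0}

print(top_fraction(degree))       # regions plotted for Degree
print(top_fraction(betweenness))  # regions plotted for Betweenness Centrality
print(core_nodes(degree, betweenness))
```

      In this toy case the two top lists share only one region, so the two plots would list different regions even though they come from the same network.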

      We hope these clarifications address your concerns and facilitate a clearer understanding of our findings.

      Reviewer #1 (Recommendations For The Authors):

      Minor points

      1) In Table 1: the separation of which substructures belong to which brain structure is not clear

      2) Line 132 on page 3 seems to repeat the sentence earlier in the paragraph "KET predominantly affects brain regions within the cerebral cortex (CTX), while significantly inhibiting the hypothalamus, midbrain, and hindbrain."

      3) Typos

      a) Line 99/100 and 130 Central nucleus (CNU) should be cerebral nucleus

      b) Comma on line 166

      c) Fig. 4D: KET instead of Keta

      d) Line 263 "ep"

      e) Line 332: 35" "ml (add space)

      4) Will data and code be made available?

      Thank you for your detailed feedback.

      1. We have revised Table 1 to clarify which substructures belong to which brain structures.

      2. We acknowledge the redundancy and have now edited line 139 on page 3 to remove the repeated sentence regarding the effects of KET on brain regions.

      3. We have addressed the typos you pointed out:

      a. The terms "Central nucleus (CNU)" have been corrected to "cerebral nucleus."

      b. The comma issue on line 166 has been rectified.

      c. In Fig. 4D, we have corrected "Keta" to "KET."

      d. We have corrected the typo "ep" on line 263.

      e. A space has been added between "35" and "ml" on line 332 as you indicated.

      4. Regarding the availability of data and code, we are currently conducting additional analyses related to this study. Once these analyses are completed, we will be more than happy to make the data and code available.

      Thank you for assisting us in improving our manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      6) The term 'whole-brain mapping' in the title suggests that the mapping was performed on 'intact brains' where in fact serial sections were used here. Maybe the authors could change to 'brain-wide mapping' to align better with the study.

      Thank you for your insightful comments.

      We have revised the title as suggested, changing "whole-brain mapping" to "brain-wide mapping".

      7) It is unclear if the mice were kept under anesthesia for the 90-min duration and how the authors monitored the level of sedation. Additionally, if the KET mice were already sedated why were they further sedated with ISO before perfusions and tissue extraction? The methods should be clarified and any potential confounds discussed.

      To maintain consistency in the experimental protocol and to reduce stress reactions in the mice, ISO was used before perfusion in all cases. However, this does not affect c-Fos expression as the expression of c-Fos protein starts 20-30 minutes after stimulation (Lara Aparicio, S Y, et al., NeuroSci, 2022; 3(4): 687-702).

      We appreciate your guidance in enhancing the clarity of our manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Recommendation: Minor corrections.

      1) The authors should delve deeper into the molecular mechanisms underlying the observed effects, particularly the changes associated with NMDA and GABA receptors. Exploring these mechanisms would provide a more comprehensive understanding of how Ketamine and Isoflurane modulate neural activity and induce anesthesia.

      2) The clinical relevance of these findings has not been sufficiently addressed. It would be valuable to elaborate on how the current research outcomes could potentially lead to changes in current anesthesia practices. For instance, identifying the distinct pathways of action for Ketamine and Isoflurane could aid anesthesiologists in selecting the most appropriate anesthetic based on the specific needs of individual patients or surgical procedures.

      3) Both Ketamine and Isoflurane have been associated with neurotoxicity. It is important to discuss how the c-Fos activation induced by these anesthetics could contribute, at least partially, to anesthesia-related neurotoxicity. Examining the potential neurotoxic effects would provide a more comprehensive understanding of the risks associated with these anesthetics and aid in the development of safer anesthesia protocols.

      Thank you for your valuable suggestions.

      Regarding the three points (1, 2, and 3) you've raised, we fully recognize their significance. In the current study, our primary focus was on the differential impacts of Isoflurane and Ketamine on widespread c-Fos expression in the brain. However, we indeed acknowledge the importance of delving deeper into these mechanisms and their clinical relevance. Therefore, we intend to explore these critical issues in greater detail in our future research endeavors.

      We appreciate your feedback, which provides constructive guidance for our subsequent research directions.

    1. Author Response

      We thank the three reviewers and the reviewing editor for their positive evaluation of our manuscript. We particularly appreciate that they unanimously consider our work as “important contributions to the understanding of how the CAF-1 complex works”, “The large amounts of data provided in the paper support the authors' conclusion very well” and “The paper effectively addresses its primary objective and is strong”.

      We also thank them for a careful reading and useful comments to improve the manuscript. We will build on this input to provide an improved version of the manuscript that we hope to submit soon to eLife, along with our point-by-point answer.

    1. Author Response

      eLife assessment

      This study uses a multi-pronged empirical and theoretical approach to advance our understanding of how differences in learning relate to differences in the ways that male versus female animals cope with urban environments, and more generally how reversal learning may benefit animals in urban habitats. The work makes an important contribution and parts of the data and analyses are solid, although several of the main claims are only partially supported or overstated and require additional support.

      We thank the Editor and both Reviewers for their time and for their constructive evaluation of our manuscript. We will work to address each comment and suggestion offered by the Reviewers in a revision.

      Reviewer #1 (Public Review):

      Summary:

      In this highly ambitious paper, Breen and Deffner used a multi-pronged approach to generate novel insights on how differences between male and female birds in their learning strategies might relate to patterns of invasion and spread into new geographic and urban areas.

      The empirical results, drawn from data available in online archives, showed that while males and females are similar in their initial efficiency of learning a standard color-food association (e.g., color X = food; color Y = no food) scenario when the associations are switched (now, color Y = food, X= no food), males are more efficient than females at adjusting to the new situation (i.e., faster at 'reversal learning'). Clearly, if animals live in an unstable world, where associations between cues (e.g., color) and what is good versus bad might change unpredictably, it is important to be good at reversal learning. In these grackles, males tend to disperse into new areas before females. It is thus fascinating that males appear to be better than females at reversal learning. Importantly, to gain a better understanding of underlying learning mechanisms, the authors use a Bayesian learning model to assess the relative role of two mechanisms (each governed by a single parameter) that might contribute to differences in learning. They find that what they term 'risk sensitive' learning is the key to explaining the differences in reversal learning. Males tend to exhibit higher risk sensitivity which explains their faster reversal learning. The authors then tested the validity of their empirical results by running agent-based simulations where 10,000 computer-simulated 'birds' were asked to make feeding choices using the learning parameters estimated from real birds. Perhaps not surprisingly, the computer birds exhibited learning patterns that were strikingly similar to the real birds. Finally, the authors ran evolutionary algorithms that simulate evolution by natural selection where the key traits that can evolve are the two learning parameters. They find that under conditions that might be common in urban environments, high-risk sensitivity is indeed favored.

      Strengths:

      The paper addresses a critically important issue in the modern world. Clearly, some organisms (some species, some individuals) are adjusting well and thriving in the modern, human-altered world, while others are doing poorly. Understanding how organisms cope with human-induced environmental change, and why some are particularly good at adjusting to change is thus an important question.

      The comparison of male versus female reversal learning across three populations that differ in years since they were first invaded by grackles is one of few, perhaps the first in any species, to address this important issue experimentally.

      Using a combination of experimental results, statistical simulations, and evolutionary modeling is a powerful method for elucidating novel insights.

      Thank you—we are delighted to receive this positive feedback, especially regarding the inferential power of our analytical approach.

      Weaknesses:

      The match between the broader conceptual background involving range expansion, urbanization, and sex-biased dispersal and learning, and the actual comparison of three urban populations along a range expansion gradient was somewhat confusing. The fact that three populations were compared along a range expansion gradient implies an expectation that they might differ because they are at very different points in a range expansion. Indeed, the predicted differences between males and females are largely couched in terms of population differences based on their 'location' along the range-expansion gradient. However, the fact that they are all urban areas suggests that one might not expect the populations to differ. In addition, the evolutionary model suggests that all animals, male or female, living in urban environments (that the authors suggest are stable but unpredictable) should exhibit high-risk sensitivity. Given that all grackles, male and female, in all populations, are both living in urban environments and likely come from an urban background, should males and females differ in their learning behavior? Clarification would be useful.

      Thank you for highlighting a gap in clarity in our conceptual framework. To answer the Reviewer’s question—yes, even with this shared urban ‘history’, it seems plausible that males and females could differ in their learning. For example, irrespective of population membership, such sex differences could come about via differential reliance on learning strategies mediated by an interaction between grackles’ polygynous mating system and male-biased dispersal system, as we discuss in L254–265. Population membership might, in turn, differentially moderate the magnitude of any such sex-effect since an edge population, even though urban, could still pose novel challenges—for example, by requiring grackles to learn novel daily temporal foraging patterns such as when and where garbage is collected (grackles appear to track this food resource: Rodrigo et al. 2021 [DOI: 10.1101/2021.06.14.448443]). We will make sure to better introduce this important conceptual information in our revision.

      Reinforcement learning mechanisms:

      Although the authors' title, abstract, and conclusions emphasize the importance of variation in 'risk sensitivity', most readers in this field will very possibly misunderstand what this means biologically. Both the authors' use of the term 'risk sensitivity' and their statistical methods for measuring this concept have potential problems.

      Please see our responses below concerning our risk-sensitivity term.

      First, most behavioral ecologists think of risk as predation risk which is not considered in this paper. Secondarily, some might think of risk as uncertainty. Here, as discussed in more detail below, the 'risk sensitivity' parameter basically influences how strongly an option's attractiveness affects the animal's choice of that option. They say that this is in line with foraging theory (Stephens and Krebs 2019) where sensitivity means seeking higher expected payoffs based on prior experience. To me, this sounds like 'reward sensitivity', but not what most think of as 'risk sensitivity'. This problem can be easily fixed by changing the name of the term.

      We apologise for not clearly introducing the field of risk-sensitive foraging, which focuses on how animals evaluate and choose between distinct food options, and how such foraging decisions are influenced by pay-off variance, i.e., risk associated with alternative foraging options (seminal reviews: Bateson 2002 [DOI: 10.1079/PNS2002181]; Kacelnik & Bateson 1996 [DOI: 10.1093/ICB/36.4.402]). We further apologise for not clearly explaining how our lambda parameter estimates such risk-sensitive foraging. To do so here, we need to consider our Bayesian reinforcement learning model in full. This model uses observed choice-behaviour during reinforcement learning to infer our phi (information-updating) and lambda (risk-sensitivity) learning parameters. Thus, payoffs incurred through choice simultaneously influence estimation of each learning parameter; that is, in a sense, they are both sensitive to rewards. But phi and lambda differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions (we note this does not imply that the two cannot influence one another, i.e., co-vary on the latent scale). Glossing over the mathematics, for phi, stronger reward sensitivity (bigger phi values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For lambda, stronger reward sensitivity (bigger lambda values) means stronger internal determinism about seeking the non-risk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching, i.e., ‘playing it safe’. We hope this information, which we will incorporate into our revision, clarifies the rationale and mechanics of our reinforcement learning model, and why lambda measures risk-sensitivity.

      In addition, however, the parameter does not measure sensitivity to rewards per se - rewards are not in equation 2. As noted above, instead, equation 2 addresses the sensitivity of choice to the attraction score which can be sensitive to rewards, though in complex ways depending on the updating parameter. Second, equations 1 and 2 involve one specific assumption about how sensitivity to rewards vs. to attraction influences the probability of choosing an option. In essence, the authors split the translation from rewards to behavioral choices into 2 steps. Step 1 is how strongly rewards influence an option's attractiveness and step 2 is how strongly attractiveness influences the actual choice to use that option. The equation for step 1 is linear whereas the equation for step 2 has an exponential component. Whether a relationship is linear or exponential can clearly have a major effect on how parameter values influence outcomes. Is there a justification for the form of these equations? The analyses suggest that the exponential component provides a better explanation than the linear component for the difference between males and females in the sequence of choices made by birds, but translating that to the concepts of information updating versus reward sensitivity is unclear. As noted above, the authors' equation for reward sensitivity does not actually include rewards explicitly, but instead only responds to rewards if the rewards influence attraction scores. The more strongly recent rewards drive an update of attraction scores, the more strongly they also influence food choices. While this is intuitively reasonable, I am skeptical about the authors' biological/cognitive conclusions that are couched in terms of words (updating rate and risk sensitivity) that readers will likely interpret as concepts that, in my view, do not actually concur with what the models and analyses address.

      To answer the Reviewer’s question: yes, these equations are very much standard and the canonical way of analysing individual reinforcement learning (see: Ch. 15.2 in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky 2018 [DOI: 10.1017/CBO9781316272503]; McElreath et al. 2008 [DOI: 10.1098/rstb/2008/0131]; Reinforcement Learning by Sutton & Barto 2018). To provide a “justification for the form of these equations”, equation 1 describes a convex combination of previous values and recent payoffs. Latent values are updated as a linear combination of both factors; there is no simple linear mapping between payoffs and behaviour as suggested by the reviewer. Equation 2 describes the standard softmax link function. It converts a vector of real numbers (here latent values) into a simplex vector (i.e., a vector summing to 1) which represents the probabilities of different outcomes. Similar to the logit link in logistic regression, the softmax simply maps the model space of latent values onto the outcome space of choice probabilities which enter the categorical likelihood distribution. We can appreciate how we did not make this clear in our manuscript by not highlighting the standard nature of our analytical approach. We will do better in our revision. As far as what our reinforcement learning model measures, and how it relates cognition and behaviour, please see our previous response.
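
      The two equations discussed above can be illustrated with a short sketch. It follows the verbal description only (a convex combination for the update, a softmax for the choice rule); the function names, the starting values, and the use of lambda as a multiplier on the latent values inside the softmax are assumptions made for illustration, and this is not the authors' model code.

```python
import math


def update_attraction(previous, payoff, phi):
    """Equation 1 as described: convex combination of old value and new payoff."""
    return (1 - phi) * previous + phi * payoff


def choice_probabilities(attractions, lam):
    """Equation 2 as described: softmax over latent values, scaled by lambda."""
    scaled = [lam * a for a in attractions]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: one rewarded choice, then the resulting choice odds.
rewarded_value = update_attraction(previous=0.0, payoff=1.0, phi=0.25)
print(rewarded_value)
print(choice_probabilities([rewarded_value, 0.0], lam=1.0))
print(choice_probabilities([rewarded_value, 0.0], lam=10.0))
```

      Holding the latent values fixed, a larger lambda pushes the probabilities further apart, which is the sense in which it governs how deterministically the higher-valued option is chosen, whereas phi only changes how quickly the latent values themselves move.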

      To emphasize, while the authors imply that their analyses separate the updating rate from 'risk sensitivity', both the 'updating parameter' and the 'risk sensitivity' parameter influence both the strength of updating and the sensitivity to reward payoffs in the sense of altering the tendency to prefer an option based on recent experience with payoffs. As noted in the previous paragraph, the main difference between the two parameters is whether they relate to behaviour linearly versus with an exponential component.

      Please see our two earlier responses on the mechanics of our reinforcement learning model.

      Overall, while the statistical analyses based on equations (1) and (2) seem to have identified something interesting about two steps underlying learning patterns, to maximize the valuable conceptual impact that these analyses have for the field, more thinking is required to better understand the biological meaning of how these two parameters relate to observed behaviours, and the 'risk sensitivity' parameter needs to be re-named.

      Please see our earlier response to these suggestions.

      Agent-based simulations:

      The authors estimated two learning parameters based on the behaviour of real birds, and then ran simulations to see whether computer 'birds' that base their choices on those learning parameters return behaviours that, on average, mirror the behaviour of the real birds. This exercise is clearly circular. In old-style, statistical terms, I suppose this means that the R-square of the statistical model is good. A more insightful use of the simulations would be to identify situations where the simulation does not do as well in mirroring behaviour that it is designed to mirror.

      Based on the Reviewer’s summary of agent-based forward simulation, we can see we did a poor job explaining the inferential value of this method, and we apologise. Agent-based forward simulations are posterior predictions, and they provide insight into the implied model dynamics and overall usefulness of our reinforcement learning model. R-squared calculations are retrodictive, and they say nothing about the causal dynamics of a model. Specifically, agent-based forward simulation allows us to ask: what would a ‘new’ grackle ‘do’, given our reinforcement learning model parameter estimates? It is important to ask this question because, in parameterising our model, we may have overlooked a critical contributing mechanism to grackles’ reinforcement learning. Such an omission is invisible in the raw parameter estimates; it is only betrayed by the parameters in actu. Agent-based forward simulation is ‘designed’ to facilitate this call to action, not to mirror behavioural results. The simulation has no a priori ‘opinion’ about computer ‘birds’’ behavioural outcomes; rather, it simply assigns these agents random phi and lambda draws (whilst maintaining their correlation structure), and tracks their reinforcement learning. The exercise only appears circular if no critical contributing mechanism(s) went overlooked; in this case computer ‘birds’ should behave similarly to real birds. A disparate mapping between computer ‘birds’ and real birds, however, would mean more work is needed with respect to model parameterisation that captures the causal, mechanistic dynamics behind real birds’ reinforcement learning (for an example of this happening in the human reinforcement learning literature, see Deffner et al. 2020 [DOI: 10.1098/rsos.200734]).
      In sum, agent-based forward simulation does not assess goodness-of-fit (we assessed the fit of our model a priori in our preregistration: https://osf.io/v3wxb), but it does assess whether one did a comprehensive job of uncovering the mechanistic basis of target behaviour(s). We will work to make the above points on the insight afforded by agent-based forward simulation explicitly clear in our revision.
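
      The logic of agent-based forward simulation described above can be illustrated with a minimal sketch. It is not the authors' code. The attraction-updating rule, the softmax choice rule, the starting attractions, the logit/log parameter scales, and every numeric value below are assumptions chosen for illustration, and the function names are hypothetical. Each simulated 'bird' receives a correlated (phi, lambda) draw and then learns an initial and a reversed reward contingency.

```python
import numpy as np


def draw_parameters(n_agents, latent_mean, latent_cov, rng):
    """Draw correlated (phi, lambda) pairs for new simulated agents.

    Assumption: phi lives on a logit scale and lambda on a log scale, so a
    bivariate normal draw keeps their correlation structure intact.
    """
    latent = rng.multivariate_normal(latent_mean, latent_cov, size=n_agents)
    phi = 1.0 / (1.0 + np.exp(-latent[:, 0]))  # updating rate in (0, 1)
    lam = np.exp(latent[:, 1])                 # sensitivity, strictly positive
    return phi, lam


def simulate_agent(phi, lam, rewarded_option, rng):
    """Simulate one agent choosing between two options on every trial.

    rewarded_option[t] is the index (0 or 1) of the option that pays off.
    Returns a boolean array marking which choices were correct.
    """
    attractions = np.array([0.1, 0.1])  # assumed equal starting attractions
    correct = np.zeros(len(rewarded_option), dtype=bool)
    for t, rewarded in enumerate(rewarded_option):
        # Softmax choice: lambda scales how strongly attractions drive choice.
        logits = lam * attractions
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        choice = rng.choice(2, p=probs)
        payoff = 1.0 if choice == rewarded else 0.0
        # Only the chosen option is updated, at rate phi.
        attractions[choice] = (1.0 - phi) * attractions[choice] + phi * payoff
        correct[t] = choice == rewarded
    return correct


def forward_simulate(n_agents=200, n_trials=40, seed=1):
    """Run initial learning (option 0 rewarded) followed by reversal (option 1)."""
    rng = np.random.default_rng(seed)
    latent_mean = np.array([-2.0, 1.0])                # illustrative values only
    latent_cov = np.array([[0.5, 0.1], [0.1, 0.3]])    # illustrative values only
    phi, lam = draw_parameters(n_agents, latent_mean, latent_cov, rng)
    schedule = np.array([0] * n_trials + [1] * n_trials)
    return np.array([simulate_agent(p, l, schedule, rng) for p, l in zip(phi, lam)])
```

      Comparing the per-trial proportion of correct choices from `forward_simulate()` with the real birds' learning curves is the kind of check the response describes: a clear mismatch would point to a mechanism missing from the model.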

      Reviewer #2 (Public Review):

      Summary:

      The study is titled "Leading an urban invasion: risk-sensitive learning is a winning strategy", and consists of three different parts. First, the authors analyse data on initial and reversal learning in Grackles confronted with a foraging task, derived from three populations labeled as "core", "middle" and "edge" in relation to the invasion front. The suggested difference between study populations does not surface, but the authors do find moderate support for a difference between male and female individuals. Secondly, the authors confirm that the proposed mechanism can actually generate patterns such as those observed in the Grackle data. In the third part, the authors present an evolutionary model, in which they show that learning strategies as observed in male Grackles do evolve in what they regard as conditions present in urban environments.

      Strengths:

      The manuscript's strength is that it combines real learning data collected across different populations of the Great-tailed grackle (Quiscalus mexicanus) with theoretical approaches to better understand the processes with which grackles learn and how such learning processes might be advantageous during range expansion. Furthermore, the authors also take sex into account revealing that males, the dispersing sex, show moderately better reversal learning through higher reward-payoff sensitivity. I also find it refreshing to see that the authors took the time to preregister their study to improve transparency, especially regarding data analysis.

      Thank you—we are pleased to receive this positive evaluation, particularly concerning our efforts to improve scientific transparency via our study’s preregistration (https://osf.io/v3wxb).

      Weaknesses:

      One major weakness of this manuscript is the fact that the authors are working with quite low sample sizes when we look at the different populations of edge (11 males & 8 females), middle (4 males & 4 females), and core (17 males & 5 females) expansion range. Although I think that when all populations are pooled together, the sample size is sufficient to answer the questions regarding sex differences in learning performance and which learning processes might be used by grackles but insufficient when taking the different populations into account.

      In Bayesian statistics, there is no strict lower limit of required sample size as the inferences do not rely on asymptotic assumptions. With inferences remaining valid in principle, low sample size will of course be reflected in rather uncertain posterior estimates. We note all of our multilevel models use partial pooling on individuals (the random-effects structure), which is a regularisation technique that generally reduces the inference constraint imposed by a low sample size (see Ch. 13 in Statistical Rethinking by Richard McElreath [PDF: https://bit.ly/3RXCy8c]). We further note that, in our study preregistration (https://osf.io/v3wxb), we formally tested our reinforcement learning model for different effect sizes of sex on learning for both target parameters (phi and lambda) across populations, using a modest N (edge: 10 M, 5 F; middle: 22 M, 5 F; core: 3 M, 4 F) similar to our actual final N, which was the N we anticipated at that time. This a priori analysis shows our reinforcement learning model: (i) detects sex differences in phi values >= 0.03 and lambda values >= 1; and (ii) infers a null effect for phi values < 0.03 and lambda values < 1, i.e., very weak simulated sex differences (see Figure 4 in https://osf.io/v3wxb). Together, these two points show that our across-population null results are not just due to small sample size. Nevertheless, the Reviewer is not wrong to wonder whether a bigger N might change our population-level results (it might; so might much-needed population replicates; see L270), but our Bayesian models still allow us to learn a lot from our current data.
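
      The regularising effect of partial pooling mentioned above can be shown with a small worked sketch. It uses the textbook normal-normal case with known variances, which is an assumption made for illustration and not the authors' multilevel model; the function name and all numbers are hypothetical. Each individual's estimate is a precision-weighted average of its own mean and the population mean, so individuals with few observations are pulled more strongly toward the population.

```python
import numpy as np


def partially_pooled_means(ind_means, n_obs, pop_mean, within_var, between_var):
    """Precision-weighted (partially pooled) estimates of individual means.

    Assumes a normal-normal model with known within-individual variance
    (within_var) and known between-individual variance (between_var).
    """
    ind_means = np.asarray(ind_means, dtype=float)
    n_obs = np.asarray(n_obs, dtype=float)
    data_precision = n_obs / within_var   # information from the individual's data
    prior_precision = 1.0 / between_var   # information from the population
    return (data_precision * ind_means + prior_precision * pop_mean) / (
        data_precision + prior_precision
    )


# Two individuals with the same raw mean but very different amounts of data.
estimates = partially_pooled_means(
    ind_means=[1.0, 1.0], n_obs=[2, 50], pop_mean=0.0,
    within_var=1.0, between_var=0.25,
)
```

      In this example the individual with 2 observations is shrunk much further toward the population mean than the one with 50, which is why a small sample widens uncertainty under partial pooling without invalidating the inference.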

      Another weakness of this manuscript is that it does not set up the background well in the introduction. Firstly, are grackles urban dwellers in their natural range and expand by colonising urban habitats because they are adapted to it? The introduction also fails to mention why urban habitats are special and why we expect them to be more challenging for animals to inhabit. If we consider that one of their main questions is related to how learning processes might help individuals deal with a challenging urban habitat, then this should be properly introduced.

      In L53–56 we introduce that the estimated historical niche of grackles is urban environments, and that shifts in habitat breadth (e.g., moving into more arid, agricultural environments) are the estimated driver of their rapid North American colonisation. We will work towards fleshing out how urban-imposed challenges faced by grackles, such as the wildlife management efforts introduced in L64–65, may apply to animals inhabiting urban environments more broadly.

      Also, the authors provide a single example of how learning can differ between populations from more urban and more natural habitats. The authors also label the urban dwellers as the invaders, which might be the case for grackles but is not necessarily true for other species, such as the Indian rock agama in the example which are native to the area of study. Also, the authors need to be aware that only male lizards were tested in this study. I suggest being a bit more clear about what has been found across different studies looking at: (1) differences across individuals from invasive and native populations of invasive species and (2) differences across individuals from natural and urban populations.

      We apologise for not specifying that the review we cite in L42 by Lee & Thornton (2021) covers additional studies on cognition in both urban invasive species as well as urban-dwellers versus nonurban counterparts—we will remedy this omission in our revision. We will also revise our labelling of the lizard species. We are aware only male lizards were tested but this information is not relevant to substantiating our use of this study; that is, to highlight that learning can differ between urban-dwelling and non-urban counterparts. Finally, the Reviewer’s general suggestion is a good one—we will work to add this biological clarity to our revision.

      Finally, the introduction is very much written with regard to the interaction between learning and dispersal, i.e. the 'invasion front' theme. The authors lay out four predictions, the most important of which is No. 4: "Such sex-mediated differences in learning to be more pronounced in grackles living at the edge, rather than the intermediate and/or core region of their range." The authors, however, never return to this prediction, at least not in a transparent way that clearly pronounces this pattern not being found. The model looking at the evolution of risk-sensitive learning in urban environments is based on the assumption that urban and natural environments "differ along two key ecological axes: environmental stability 𝑢 (How often does optimal behaviour change?) and environmental stochasticity 𝑠 (How often does optimal behaviour fail to pay off?). Urban environments are generally characterised as both stable (lower 𝑢) and stochastic (higher 𝑠)". Even though it is generally assumed that urban environments differ from natural environments the authors' assumption is just one way of looking at the differences which have generally not been confirmed and are highly debated. Additionally, it is not clear how this result relates to the rest of the paper: The three populations are distinguished according to their relation to the invasion front, not with respect to a gradient of urbanization, and further do not show a meaningful difference in learning behaviour possibly due to low sample sizes as mentioned above.

      Thank you for highlighting a gap in our reporting clarity. We will take care in our revision to transparently report our null result regarding our fourth prediction; more specifically, that we did not detect meaningful behavioural or mechanistic population-level differences in grackles’ learning. Regarding our evolutionary model, we agree with the Reviewer that this analysis is only one way of looking at the interaction between learning phenotype and apparent urban environmental characteristics. Indeed, in L282–288 we state: “Admittedly, our evolutionary model is not a complete representation of urban ecology dynamics. Relevant factors—e.g., spatial dynamics and realistic life histories—are missed out. These omissions are tactical ones. Our evolutionary model solely focuses on the response of reinforcement learning parameters to two core urban-like (or not) environmental statistics, providing a baseline for future study to build on”. But we can see now that ‘core’ is too strong a word, and instead ‘supposed’, ‘purported’ or ‘theorised’ would be more accurate—we will revise our wording. As far as how our evolutionary results relate to the rest of the paper, these results suggest successful urban living should favour risk-sensitive learning, and our other analyses in our paper reveal male grackles—the dispersing sex in this historically urban-dwelling and currently urban-invading species—show pronounced risk-sensitive learning, so it appears risk-sensitive learning is a winning strategy for urban-invading male grackles and urban-invasion leaders more generally (we note, of course, other factors undoubtedly contribute to grackles’ urban invasion success, as discussed in ‘Ideas and speculation’; see also our first response to R1). We will work to make these links clearer in our revision. Finally, please see our above response on the inferential sufficiency of our sample size.

      In conclusion, the manuscript was well written and for the most part easy to follow. The format of eLife having the results before the methods makes it a bit harder to follow because the reader is not fully aware of the methods at the time the results are presented. It would, therefore, be important to more clearly delineate the different parts and purposes. Is this article about the interaction between urban invasion, dispersal, and learning? Or about the correct identification of learning mechanisms? Or about how learning mechanisms evolve in urban and natural environments? Maybe this article can harbor all three, but the borders need to be clear. The authors need to be transparent about what has and especially what has not been found, and be careful to not overstate their case.

      Thank you, we are pleased to read that the Reviewer found our manuscript to be generally digestible. In our revision, we will work to add further clarity, and to temper our tone.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript tried to answer a long-standing question in an important research topic. I read it with great interest. The quality of the science is high, and the text is clearly written. The conclusion is exciting. However, I feel that the phenotype of the transgenic line may be explained by an alternative idea. At least, the results should be more carefully discussed.

      We thank reviewer #1 for his/her comments, which helped to improve the manuscript. We have incorporated changes to reflect the reviewer's suggestions. Here is a point-by-point response to the reviewer's specific and other minor comments.

      Specific comments:

      1) Stability or activity (Fv/Fm) was not affected in PSII with the W14F mutation in D1. If W14F really represents the status of PSII with oxidized D1, what is the reason for the degradation of almost normal D1?

      In this study, we used W14F mutation to mimic Trp-14 oxidation. The W14F mutant did not affect the stability and photosynthetic activity under normal growth conditions. However, the W14F mutant showed increased D1 degradation and reduced Fv/Fm values under high light. These results suggested that the W14F mutant has almost normal D1 protein stability under growth light conditions, as pointed out by the reviewer.

      However, it should be noted that D1 protein in the W14F strain rapidly degraded under high light. In the discussion part, we mentioned the possibility that other OPTMs may have additive effects on D1 degradation. Synergistic effects such as different amino acid oxidations may cause D1 degradation, and among those oxidative damages, W14 oxidation would be a key signal for D1 degradation by FtsH.

      2) To focus on the PSII in which W14 is oxidized, this research depends on the W14F mutant lines. It is critical how exactly the W-to-F substitution mimics the oxidized W. The authors tried to show it in Figure 5. Because of the technical difficulty, it may be unfair to request more evidence. But the paper would be more convincing with the results directly monitoring the oxidized D1 to be recognized by FtsH.

      We agree that confirming the direct interaction of oxidized D1 protein with FtsH provides more robust evidence. However, since FtsH progressively degrades the trapped substrate, it would be quite a challenging attempt to capture that moment. There are also technical limitations to obtaining sufficient substrate using Co-IP to compare its oxidation state. We included your suggested point in the discussion part. Thank you for your valuable suggestion.

      3) Figure 3. If the F14 mimics the oxidized W14 and is sensed by FtsH, I would expect the degradation of D1 even under the growth light. The actual result suggests that W14F mutation partially modifies the structure of D1 under high light and this structural modification of D1 is sensed by FtsH. Namely, high light may induce another event which is recognized by FtsH. The W14F is just an enhancer.

      Our results indicated that W14 oxidation is one of the keys to D1 degradation. On the other hand, we agree with the possibility that the reviewer points out: factors other than W14 may act synergistically to promote D1 degradation. High light triggered more D1 degradation in W14F, suggesting that unknown factor(s) may be required for D1 degradation, e.g., oxidative modification at other sites and/or conformational changes of PSII under high light. However, our current data cannot resolve this point. We have incorporated the reviewer's comment and discussed it in the discussion part.

      Reviewer #2 (Public Review):

      In their manuscript, Kato et al investigate a key aspect of membrane protein quality control in plant photosynthesis. They study the turnover of plant photosystem II (PSII), a hetero-oligomeric membrane protein complex that undertakes the crucial light-driven water oxidation reaction in photosynthesis. The formidable water oxidation reaction makes PSII prone to photooxidative damage. PSII repair cycle is a protein repair pathway that replaces the photodamaged reaction center protein D1 with a new copy. The manuscript addresses an important question in PSII repair cycle - how is the damaged D1 protein recognized and selectively degraded by the membrane-bound ATP-dependent zinc metalloprotease FtsH in a processive manner? The authors show that oxidative post-translational modification (OPTM) of the D1 N-terminus is likely critical for the proper recognition and degradation of the damaged D1 by FtsH. Authors use a wide range of approaches and techniques to test their hypothesis that the singlet oxygen (1O2)-mediated oxidation of tryptophan 14 (W14) residue of D1 to N-formylkynurenine (NFK) facilitates the selective degradation of damaged D1. Overall, the authors propose an interesting new hypothesis for D1 degradation and their hypothesis is supported by most of the experimental data provided. The study certainly addresses an elusive aspect of PSII turnover and the data provided go some way in explaining the light-induced D1 turnover. However, some of the data are correlative and do not provide mechanistic insight. A rigorous demonstration of OPTM as a marker for D1 degradation is yet to be made in my opinion. Some strengths and weaknesses of the study are summarized below:

      We thank reviewer #2 for his/her comments that helped to improve the manuscript. We have incorporated changes to reflect the suggestions pointed out as weaknesses by reviewer #2. Other minor comments were also answered in a point-by-point response.

      Strengths:

      1) In support of their hypothesis, the authors find that FtsH mutants of Arabidopsis have increased OPTM, especially the formation of NFK at multiple Trp residues of D1 including the W14; a site-directed mutation of W14 to phenylalanine (W14F), mimicking NFK, results in accelerated D1 degradation in Chlamydomonas; accelerated D1 degradation of W14F mutant is mitigated in an ftsH1 mutant background of Chlamydomonas; and that the W14F mutation augmented the interaction between FtsH and the D1 substrate.

      2) Authors raise an intriguing possibility that the OPTM disrupts the hydrogen bonding between W14 residue of D1 and the serine 25 (S25) of PsbI. According to the authors, this leads to an increased fluctuation of the D1 N-terminal tail, and as a consequence, recognition and binding of the photodamaged D1 by the protease. This is an interesting hypothesis and the authors provide some molecular dynamics simulation data in support of this. If this hypothesis is further supported, it represents a significant advancement.

      3) The interdisciplinary experimental approach is certainly a strength of the study. The authors have successfully combined mass spectrometric analysis with several biochemical assays and molecular dynamics simulation. These, together with the generation of transplastomic algal cell lines, have enabled a clear test of the role of Trp oxidation in selective D1 degradation.

      4) Trp oxidative modification as a degradation signal has precedent in chloroplasts. The authors cite the case of 1O2 sensor protein EXECUTER 1 (EX1), whose degradation by FtsH2, the same protease that degrades D1, requires prior oxidation of a Trp residue. The earlier observation of an attenuated degradation of a truncated D1 protein lacking the N-terminal tail is also consistent with authors' suggestion of the importance of the D1 N-terminus recognition by FtsH. It is also noteworthy that in light of the current study, D1 phosphorylation is unlikely to be a marker for degradation as posited by earlier studies.

      Weaknesses:

      1) The study lacks some data that would have made the conclusions more rigorous and convincing. It is unclear why the level of Trp oxidation was not analyzed in the Chlamydomonas ftsH 1-1 mutant as done for the var 2 mutant. Increased oxidation of W14 OPTM in Chlamydomonas ftsH 1-1 is a key prediction of the hypothesis.

      We thank the reviewer for this valuable comment. We agree with the reviewer that analysis of the oxidized Trp level would reinforce the importance of Trp oxidation in the N-terminus of D1. In our preliminary experiment, we observed a trend toward increased kynurenine at Trp-14 in the Chlamydomonas ftsH1-1 strain. However, the errors were large, and we could not conclude that this trend is significant. A possible reason for the large errors is that the signal intensity of oxidized Trp was insufficient for quantification in our series of Chlamydomonas experiments. In addition, the amount of D1 in each culture was not stable, which might be another reason. On the other hand, we note a previous result that more fragmentation of D1 protein was observed in the Chlamydomonas ftsH1-1 mutant than in Arabidopsis (Malnoë et al., Plant Cell 2014). This result suggests that an alternative D1 degradation pathway involving other proteases is more active in the Chlamydomonas ftsH1-1 mutant than in the Arabidopsis var2 mutant. Furthermore, the Chlamydomonas ftsH1-1 mutant, caused by an amino acid substitution, still has a significant amount of the FtsH1/FtsH2 heterohexamer, and the levels of FtsH1 and FtsH2 proteins increase significantly under high light irradiation. This is a significant difference from the Arabidopsis var2 mutant, which lacks the FtsH2 subunit and shows reduced protein accumulation. These factors may explain the lower detection levels of oxidized Trp in Chlamydomonas. We believe that improved sensitivity for detection of oxidized Trp peptides and more sophisticated experimental systems could solve this issue in the future.

      It is also unclear to me what is the rationale for showing D1-FtsH interaction data only for the double mutant but not for the single mutant (W14F).

      We thank the reviewer for the comment. As suggested by the reviewer, analysis of a mutant obtained by crossing ftsH with the W14F single mutant would provide more convincing evidence. Fig. 3 showed that the photosensitivity of both W14F and W14F/W317F was caused by enhanced D1 degradation due to the W14F mutation. Therefore, we crossed the ftsH mutant with W14F/W317F, which has the more severe phenotype, to confirm whether FtsH is involved in this D1 degradation.

      Why is the FtsH pulldown of D2 not statistically significant (p value ≤ 0.1)? Wouldn't one expect FtsH to pull down the RC47 complex containing D1, D2, and RC47? Probing the RC47 level would have been useful in settling this.

      For the immunoblot result of D2 and its statistical analysis, we answered in the following comment; No.2 in the reviewer's comment in Recommendations For The Authors.

      We agree with the reviewer's suggestion that further immunoblot analysis for CP47 protein would help our understanding of the FtsH and RC47 interaction. Indeed, we attempted the immunoblot analysis of CP47 after the FtsH Co-IP experiment. However, the detection of CP47 protein was not sensitive enough. This may be due to the lower titer of the CP47 antibody compared to the D1 and D2 antibodies.

      A key proposition of the authors' is that the hydrogen bonding between D1 W14 and S25 of PsbI is disrupted by the oxidative modification of W14. Can this hypothesis be further tested by replacing the S25 of PsbI with Ala, for example?

      It is an interesting question whether amino acid substitution in PsbI-S25 affects the stability of D1-N-term and its degradation by FtsH. We would like to analyze the possibility in the future. We thank the reviewer for this helpful suggestion.

      2) Although most of the work described is in vivo analysis, which is desirable, some in vitro degradation assays would have strengthened the conclusions. An in vitro degradation assay using the recombinant FtsH and a synthetic peptide encompassing D1 N-terminus with and without OPTM will test the enhanced D1 degradation that the authors predict. This will also help to discern the possibility that whether CP43 detachment alone is sufficient for D1 degradation as suggested for cyanobacteria.

      In vitro experimental systems are interesting. However, FtsH is known to function as a hexamer, which has not yet been successfully reconstituted in vitro. Therefore, it would not be easy to perform an in vitro experimental system using the N-terminal synthetic peptide of D1 as a substrate. Thank you for your valuable suggestions.

      3) The rationale for analyzing a single oxidative modification (W14) as a D1 degradation signal is unclear. D1 N-terminus is modified at multiple sites. Please see Mckenzie and Puthiyaveetil, bioRxiv May 04 2023. Also, why is modification by only 1O2 considered while superoxide and hydroxide radicals can equally damage D1?

      We agree with the possibility that oxidative modifications in other amino acids are also involved in the D1 degradation, as pointed out by the reviewer. We also thank the reviewer for pointing us to the interesting article of Mckenzie and Puthiyaveetil, which showed that additional oxidations occur in the D1 N-terminus and which we were not yet aware of when we submitted our manuscript. It will be interesting to see how these amino acid oxidations work with W14 oxidation on D1 degradation in the future. A protein carrying Trp oxidized by 1O2 can serve as a substrate for FtsH, as in the case of EX1, so we focused on the analysis of Trp oxidation. Singlet oxygen is believed to be the potential reactive species for Trp oxidation. However, we cannot be sure that the oxidative modifications detected in this study depended on singlet oxygen. Thus, we changed several sentences that mention tryptophan oxidation by singlet oxygen.

      4) The D1 degradation assay seems not repeatable for the W14F mutant. High light minus CAM results in Fig. 3 shows a statistically significant decrease in D1 levels for W14F at multiple time points but the same assay in Fig. 4a does not produce a statistically significant decrease at 90 min of incubation. Why is this? Accelerated D1 degradation in the Phe mutant under high light is key evidence that the authors cite in support of their hypothesis.

      In Fig. 4a, the p-value comparing the D1 level at 90 min between control and W14F was 0.1075, slightly larger than 0.1. This is probably because one of the control replicates showed a decrease in D1 level relative to 0 h. Given that the D1 level in the remaining three of the four replicates was unchanged in the control experiments, that replicate can be considered an outlier. We believe this result does not affect our hypothesis that D1 degradation occurs earlier in W14F.

      5) The description of results at times is not nuanced enough, for e.g. lines 116-117 state "The oxidation levels in Trp-14 and Trp-314 increased 1.8-fold and 1.4-fold in var2 compared to the wild type, respectively (Fig. 1c)" while an inspection of the figure reveals that modification at W314 is significant only for NFK and not for KYN and OIA.

      In this sentence, we described a comparison of the oxidized peptide levels calculated from all Trp-oxidized derivatives. However, as pointed out by the reviewer, this did not correctly describe the result in Fig. 1c. We corrected the sentence following the reviewer's suggestion as follows: “The levels of Trp-oxidized derivatives, OIA, NFK, and KYN in Trp-14 and the level of KYN in Trp-314 were significantly increased in var2 compared to the wild type, respectively (Fig. 1c).”

      Likewise, the authors write that CP43 mutant W353F has no growth phenotype under high light but Figure S6 reveals otherwise. The slow growth of this mutant is in line with the earlier observation made by Anderson et al., 2002.

      As pointed out by the reviewer, the growth of W353F seems to be a little slow under HL. We have changed our description in the results part. However, we still conclude that CP43 had little impact on the PSII repair, because the impaired growth in W353F is not as severe as that in W14F and W14F/W317F under HL.

      In lines 162-163, the authors talk about unchanged electron transport in some site-directed mutants and cite Fig. 2c but this figure only shows chl fluorescence trace and nothing else.

      We agreed with the reviewer's suggestion and changed the sentence. In this study, we did not perform detailed photosynthetic analysis. Based on the analysis of phototrophic growth, oxygen-evolving activity, and Chl fluorescence, we concluded that overall photosynthetic activity did not differ significantly in the mutants.

      6) The authors rightly discuss an alternate hypothesis that the simple disassembly of the monomeric core into RC47 and CP43 alone may be sufficient for selective D1 degradation as in cyanobacteria. This hypothesis cannot yet be ruled out completely given the lack of some in vitro degradation data as mentioned in point 2. Oxidative protein modification indeed drives the disassembly of the monomeric core (Mckenzie and Puthiyaveetil, bioRxiv May 04 2023).

      Thanks for your suggestion. We added a discussion of PSII disassembly by ROS-induced oxidation to the discussion part, and the reference is added.

      Reviewer #3 (Public Review):

      Light energy drives photosynthesis. However, excessive light can damage (i.e., photo-damage) and thus inactivate the photosynthetic process. A major target site of photo-damage is photosystem II (PSII). In particular, one component of PSII, the reaction center protein, D1, is very susceptible to photo-damage; however, this protein is maintained efficiently by an elaborate multi-step PSII-D1 turnover/repair cycle. Two proteases, FtsH and Deg, are known to contribute to this process, respectively, by efficient degradation of photo-damaged D1 protein processively and endoproteolytically. In this manuscript, Kato et al., propose an additional step (an early step) in the D1 degradation/repair pathway. They propose that "Tryptophan oxidation" at the N-terminus of D1 may be one of the key oxidations in the PSII repair, leading to processive degradation of D1 by FtsH. Both their data and arguments are very compelling.

      The D1 protein repair/degradation pathway in its simplest form can be defined essentially by five steps: (1) migration of damaged PSII core complex to the stroma thylakoid, (2) partial PSII disassembly of the PSII core monomer, (3) access of protease degrading damaged D1, (4) concomitant D1 synthesis, and (5) reassembly of PSII into grana thylakoid. An enormous amount of work has already been done to define and characterize these various steps. Kato et al., in this manuscript, are proposing a very early yet novel critical step in D1 protein turnover in which Tryptophan(Trp) oxidation in PSII core proteins influences D1 degradation mediated by FtsH.

      Using a variety of approaches, such as mass-spectrometry (Table 1), site-directed mutagenesis (Figures 2-4), D1 degradation assays (Figures 3, and 4), and simulation modeling (Figure 5), Kato et al., provide both strong evidence and reasonable arguments that an N-terminal Trp oxidation may be likely to be a 'key' oxidative post-translational modification (OPTM) that is involved in triggering D1 degradation and thus activating the PSII repair pathway. Consequently, from their accumulated data, the authors propose a scenario in which the unraveling of the N-terminal of the D1 protein facilitated by Trp oxidation plays a critical 'recognition' role in alerting the plant that the D1 protein is photo-damaged and thus to kick start the processive degradation pathway initiated possibly by FtsH. Coincidently, Forsman and Eaton-Rye (Biochemistry 2021, 60, 1, 53-63), while working with the thermophilic cyanobacterium, Thermosynechococcus vulcanus, showed that when the N-terminal DE-loop of the D1 protein is photo-damaged that occurs which may serve as a signal for PSII to undergo repair following photodamage. While the activation of the processive degradation pathways in Chlamydomonas versus Thermosynechococcus vulcanus have significant mechanistic differences, it's interesting to note and speculate that the stability of the N-terminal of their respective D1 proteins seems to play a critical role in 'signaling' the PSII repair system to be activated and initiate repair. But it's complicated. For instance, significant Trp oxidation also occurs on the lumen side of other PSII subunits which may also play a significant role in activating the repair processes as well. Indeed, Kato et al.,( Photosynthesis Research volume 126, pages 409-416 (2015)) proposed a two-step model whereby the primary event is disruption of a Mn-cluster in PSII on the lumen side.

      A secondary event is damage to D1 caused by energy that is absorbed by chlorophyll. But models adapt, change, and get updated. And the data provided by Kato et al. in this manuscript give us a unique glimpse/snapshot into the importance of the stability of the N-terminus during photo-damage and its role in D1 turnover. For instance, the authors' use of site-directed mutagenesis of Trp residues undergoing OPTM in the D1 protein, coupled with their D1 degradation assays (Figures 3 and 4), provides evidence that Trp oxidation (in particular the oxidation of Trp14) in coordination with FtsH results in the degradation of D1 protein. Indeed, their D1 degradation assays coupled with the use of an ftsh mutant provide further significant support that Trp14 oxidation and FtsH activity are strongly linked. But for FtsH to degrade D1 protein it needs to gain access to photo-damaged D1. FtsH access to D1 is achieved by having CP43 partially dissociate from the PSII complex. Hence, the authors also addressed the possibility that Trp oxidation may also play a role in CP43 disassembly from the PSII complex, thereby giving FtsH access to D1. Using a site-directed mutagenesis approach, they showed that Trp oxidation in CP43 appeared to have little impact on PSII repair (Supplemental Figure S6). This result shows that D1-Trp14 oxidation appears to be playing a role in D1 turnover that occurs after CP43 disassembly from the PSII complex. Alternatively, the authors cannot exclude the possibility that D1-Trp14 oxidation in some way facilitates CP43 dissociation. Further investigation is needed on this point. However, D1-Trp14 oxidation is causing an internal disruption of the D1 protein, possibly at the N-terminus of the protein. Consequently, the role of Trp14 oxidation in disrupting the stability of the N-terminal domain of the D1 protein was analyzed computationally.
Using a molecular dynamics approach (Figure 5), the authors attempted to create a mechanistic model to explain why, when D1 protein Trp14 undergoes oxidation, the N-terminal domain of the D1 protein becomes unraveled. Specifically, the authors propose that the interaction between D1 protein Trp14 and PsbI Ser25 becomes disrupted upon oxidation of Trp14. Consequently, the authors concluded from their molecular dynamics simulation analysis that "the increased fluctuation of the first α-helix of D1 would give a chance to recognize the photo-damaged D1 by FtsH protease". Hence, the authors' experimental and computational approaches employed here develop a compelling early-stage repair model that integrates 1) Trp14 oxidation, 2) FtsH activation and 3) D1 turnover being initiated at its N-terminal domain. However, a word of caution should be emphasized here. This model is just a snapshot of the very early stages of the D1 protein turnover process. The data presented here give us just a small glimpse into the unique relationship between Trp oxidation of the D1 protein and the significant N-terminal structural changes it may trigger, which both signal and provide an opportunity for FtsH to begin protease digestion of the D1 protein.

      However, the authors go to great lengths in their discussion section not to overstate the role of Trp14 oxidation alone in the complicated process of D1 turnover. The authors certainly recognize that there are a lot of moving parts involved in D1 turnover. And while Trp14 oxidation is the major focus of this paper, the authors show in Supplemental Fig S4 the structural positions of various additional oxidized Trp residues in the Thermosynechococcus vulcanus PSII core proteins. Indeed, this figure shows that the majority of oxidized Trps are located on the luminal side of the PSII complex, clustered around the oxygen-evolving complex. So, while oxidized Trp14 may be involved in the early stages of D1 turnover, oxidized Trps on the lumen side are more than likely playing a role in D1 turnover as well. To untangle this complex process will require additional research.

      Nevertheless, identifying and characterizing the role of oxidative modification of tryptophan (Trp) residues, in particular, Trp14, in the PSII core provides another critical step in an already intricate multi-step process of D1 protein turnover during photo-damage.

      We thank reviewer #3 for all the helpful comments and their supportive review of the manuscript.

      We thank the reviewer for raising this interesting study, which suggests that ROS might disrupt the interaction between PsbT and D1 in Thermosynechococcus vulcanus. The stroma-exposed DE-loop of D1 is one of the possible cleavage sites for Deg protease. Because D1 cleavage by Deg facilitates effective D1 degradation by FtsH under high-light conditions, it would be interesting to elucidate the cooperative degradation of D1 by Deg and FtsH further. We added this discussion to the manuscript. Other minor comments were also answered in a point-by-point response.

      Reviewer #1 (Recommendations For The Authors):

      Other minor points

      4) L227. How do you eliminate the possibility of reduced stability under high light?

      D1 synthesis under HL, as pointed out by the reviewer, was not tested in this study. Therefore, we cannot rule out the possibility of a reduced D1 synthesis rate under HL in the mutant. However, the rate of D1 turnover (coordinated degradation and synthesis) is increased under HL. Since the pulse-labeling experiment is affected by D1 degradation as well as D1 synthesis, even if there is a difference in the rate of D1 synthesis under HL, we cannot clearly distinguish whether the cause of reduced labeling is the increased D1 degradation seen in the W14F mutant or a delay in D1 synthesis. We thank the reviewer for this valuable comment.

      5) Ls25-26. It would be quite rare that P680 directly absorbs light energy.

      We changed the sentence.

      6) L28. intrinsic antenna? Is this commonly used? core antenna?

      Corrected to “core antenna”

      7) Ls41-43. Because the process is described as step iii), it is curious to mention it again as other critical steps.

      We removed the sentence.

      8) L75. Is it correct? Do you mean damage is caused by inhibition?

      We changed the sentence to “…the disorder of photosynthesis…”

      9) Figure 1c. +4, +16 and +32 should be explained in the legend.

      We added the explanation in the legend.

      10) Supplementary Figures S1 and S2. Title. Is it true that oxidation depends on singlet oxygen? This is a question. If it is not experimentally proved, modify the expression.

      In general, singlet oxygen (1O2) is believed to contribute to in vivo oxidation of Trp. However, as the reviewer suggests, it is not certain that the oxidative modifications detected here depend on singlet oxygen. Thus, we changed the titles of Fig S1 and S2.

      11) Figure 3. Correct errors in + or - in the Figure.

      Corrected

      12) L328. Cyc > Cys.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      1) A few suggestions on typos and style:

      • Lines 2-3, please rephrase the sentence. The meaning is unclear.

      Rephrased the sentence to “Photosynthesis is one of the most …”

      • Lines 28-29, "Despite its orchestrated coordination...". Tautology.

      We changed the sentence.

      • Line 31, "...one, known as the PSII repair...". Please rewrite.

      We followed the reviewer's suggestion and changed the sentence to “…synthesized one in the PSII repair.”

      • Line 49, "Their family proteins...". Rephrase.

      Rephrased the words.

      • Lines 64-66, please rewrite. I am not sure what the authors imply here. Are they talking about FtsH turnover or regulation of FtsH at the protein or gene level?

      FtsH itself is also degraded under high-light stress. To compensate for this, ftsH gene expression is upregulated and contributes to the proper FtsH level in thylakoid membranes. We rewrote the sentence as follows “increased turnover of FtsH is crucial for their function under high-light stress. That is compensated by upregulated FtsH gene expression”.

      • Line 68, "...to dislocate their substrates..."

      We changed the sentence to “to pull their substrates and push them into the protease chamber by ATPase activity”

      • Line 86, N-formylkymurenine => N-formylkynurenine

      Corrected

      • Lines 111-112, "Consistent with previous results...". Please specify which studies are being referred to and cite them if relevant.

      We added references.

      • Line 114, "...in extracts Arabidopsis..." => "...in extracts of Arabidopsis...".

      Corrected

      • Line 171, "influences in high-light sensitivity." Please rephrase.

      We rephrased the sentence.

      • Line 192, Fv/Fm. "v" and "m" should be subscripts.

      Corrected

      • Line 210, "...encounters...". Unclear meaning.

      We rephrased the sentence.

      • Line 358, hyphen usage. "fine-tuned". This sentence should be rewritten to make the role of phosphorylation clear. "Fine-tuning" is vague.

      We changed the sentence to “…spatiotemporal regulation of D1 degradation”

      • Fig. 6 legend, luminal => lumenal

      Changed to luminal

      2) The statistical notation used for some results is confusing. In Fig. 6b, "*" stands for p ≤ 0.1 while in fig. 4 it denotes p ≤ 0.05. If this is not a typo, this usage deviates from the standard one. How is a D2 change in Fig. 6b significant given its p value of ≤ 0.1? The Fig. 6b key for D2 does not correspond with the histogram pattern.

      Thank you for your comments and suggestions. The asterisk in Figure 6b is not a typo. To avoid confusion, we now use a single asterisk only for p values less than 0.05, and we mark p values less than 0.1 with the section sign “§” instead of a single asterisk. The generally accepted threshold for a statistically significant difference is a p value of less than 0.05. We found p = 0.03322 for D1 and p = 0.07418 for D2. Consistent with this difference in p values, the D2 signal fluctuated somewhat between measurements whereas the D1 signal did not, as you commented. The reason for the fluctuation in D2 signal intensity is not yet clear, but its p value fell between 0.05 and 0.10. We have also rewritten the description in the corresponding part of the Results.
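
      The revised notation can be sketched in a few lines of Python. This is only an illustration of the convention described in the response, not code from the study: the function name `significance_mark` is hypothetical, and the use of strict inequalities at the 0.05 and 0.1 boundaries is an assumption.

```python
def significance_mark(p_value):
    """Return the figure annotation for a p value.

    Assumed convention (from the response above): "*" for p < 0.05,
    the section sign for 0.05 <= p < 0.1, and no mark otherwise.
    """
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "§"
    return ""


# p values quoted in the response for Fig. 6b
marks = {
    "D1": significance_mark(0.03322),
    "D2": significance_mark(0.07418),
}
```

      With the two p values quoted above, D1 receives the asterisk and D2 receives the section sign.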

      3) There are no error bars in Fig. 5d while the error bars in Fig. 5e show that there are no significant differences between Cβ distances of W14F and W14ox with WT contrary to the authors' assertion in the text (lines 254-255).

      There are no error bars in Fig. 5d because the fluctuation value shown there was calculated from the entire trajectory (i.e., all snapshots) of the MD simulation, which yields a single value per residue. In contrast, the Cβ-Cβ distance can be obtained at each individual snapshot of the simulation. Thus, Fig. 5e shows the distances averaged over all these snapshots, with the standard deviations as error bars. To prevent any confusion for the reader, we have explicitly described the “averaged Cβ-Cβ distance” and added an explanation of the error bars in the caption of Fig. 5e. It is important to note that our focus in the text (lines 254-255) was not on comparing the Cβ-Cβ distance of W14F with that of W14ox, but on comparing the distance of W14F or W14ox with that of WT.
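
      The distinction between the two quantities can be illustrated with a small sketch. It assumes that the fluctuation in Fig. 5d is an RMSF-like quantity and that the error bars in Fig. 5e are population standard deviations; neither detail is stated in the response, and the coordinates below are toy values, not simulation output.

```python
import math
import statistics


def rmsf(positions):
    """Root-mean-square fluctuation of one atom over a whole trajectory.

    positions: list of (x, y, z) tuples, one per snapshot.
    The result is a single number, so there is nothing to draw an error bar from.
    """
    n = len(positions)
    mean = tuple(sum(p[i] for p in positions) / n for i in range(3))
    msd = sum(sum((p[i] - mean[i]) ** 2 for i in range(3)) for p in positions) / n
    return math.sqrt(msd)


def distance_mean_sd(atom_a, atom_b):
    """Mean and standard deviation of the per-snapshot distance between two atoms."""
    dists = [math.dist(a, b) for a, b in zip(atom_a, atom_b)]
    return statistics.mean(dists), statistics.pstdev(dists)


# Toy trajectory of four snapshots (hypothetical coordinates, in angstroms)
cb_trp14 = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
cb_ser25 = [(3.0, 0.0, 0.0), (3.0, 0.0, 2.0), (5.0, 0.0, 0.0), (5.0, 0.0, 2.0)]

fluctuation = rmsf(cb_trp14)                             # one value per atom
mean_dist, sd_dist = distance_mean_sd(cb_trp14, cb_ser25)  # mean with spread
```

      The fluctuation collapses the trajectory into one number, whereas the distance is defined at every snapshot and therefore has a spread that can be plotted as an error bar.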

      4) Figure 3 legends and figure labels do not correspond. Fig. 3b should be labeled as High light - Chloramphenicol and likewise, fig 3c should read growth light + Chloramphenicol to be consistent with the legend.

      Corrected

      5) How are OPTM levels of D1 Trp residues normalized? Is it against unmodified peptides or total proteins?

      Oxidation levels of three oxidative variants of Trp in Trp14 and Trp317 containing peptides were obtained by label-free MS analysis. Fig.1 shows the intensity values of oxidized variants of Trp14 and Trp317. In this analysis, the levels of unoxidized peptides were not significantly changed between var2 and WT.

      6) Fig. 1a cartoon might need work. It looks like the oxygen atom in OIA is misplaced.

      Corrected

      Reviewer #3 (Recommendations For The Authors):

      In regard to Table 1, the sequence of the mass spectra fragment listed for Trp14 (i.e., ENSSL(W)AR) in Table 1 is different from the sequence of the mass spectra fragment of Trp14 shown in Supplemental Figure S1 (i.e., ESESLWGR). Likewise, the sequence of the mass spectra fragment listed for Trp317 (i.e., VLNT(W)ADIINR) in Table 1 is different from the sequence of the mass spectra fragment of Trp317 shown in Supplemental Figure S2 (i.e., VINTWADIINR). This discrepancy, I think, can be simply explained.

      Table 1 shows the newly detected peptide of Trp oxidation in PSII core protein in Chlamydomonas. On the other hand, Figures S1 and S2 are the results of MS analysis used for the level of Trp oxidation analysis in Arabidopsis var2 mutant, as shown in Fig. 1C. To avoid confusion, we added in the supplemental figure title that it was detected in Arabidopsis.

      Labeling: In Figure 3, the figure legend states that b, high-light in the absence of CAM; but panel b, shows +CAM conditions. I think this labeling is incorrect and needs to be -CAM. Likewise, the figure legend states that c, growth-light in the presence of CAM. I think this labeling is incorrect and needs to be +CAM.

      Corrected

      This reviewer has a few comments/suggestions on the presentation of the sequence alignments showing the various positions of oxidized Trps within the D1(Figure 1), D2 and CP43 (Supplemental Figure S3) and CP47 (Supplemental Figure S3):

      The authors should consider highlighting in red all the various Trps shown in Table 1 with the corresponding alignments shown in Figure 1 for D1 protein and corresponding alignments in Supplemental Figure S3 (for D2 and CP43) and Supplemental Figure S3 continued (for CP47). Highlighting the locations of oxidized Trps across various species is very informative, but as presented here the red labeling is somewhat haphazard and confusing, and thus these figures lose some of their impact. For instance, in Supplementary Fig. S4, the reader can visualize the structural positions of oxidized Trp residues in the Thermosynechococcus vulcanus PSII core proteins. When one then looks at the various alignments presented by the authors, one can see that other species have a similar arrangement of oxidized Trp residues as well. Consequently, when you now collectively look at the data presented in Table 1, Figure 1, Supplemental Figure S3 and Supplemental Figure S4, a picture emerges that illustrates how common the phenomenon of overall Trp oxidation is and more specifically how oxidized Trp14 across species is playing a similar role in possibly activating D1 turnover. I think these Figures, if presented in a more comprehensive and unified fashion, will really add to the paper.

      Thank you for your suggestion. In this study, we tried to show the identified oxidized Trp by the MS-MS analysis, the residue conservation in the sequences, and its position in the structure. Since we have to show a lot of information, combining them into one figure is difficult. We hope you understand the reason for this.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We are grateful for the helpful comments of both reviewers and have revised our manuscript with them in mind.

      One of the main issues raised was that readers may by default assume that our models are correct. We in fact made it very clear in our discussion that the models are merely hypotheses that will need testing by “wet” experiments and we do not therefore agree that even readers unfamiliar with AF would assume that the models must be correct. It was also suggested that readers could be reassured by including extensive confidence estimates such as PAE plots. As it happens, every single model described in the manuscript had reasonably high PAE scores and more crucially the entire collection of output files, including PAE data, are readily accessible on Figshare at https://doi.org/10.6084/m9.figshare.22567318.v2, a fact that the reviewers appear to have overlooked. The Figshare link is mentioned three times in the manuscript. Embedding these data within the manuscript itself would in our view add even more details and we have therefore not included them in our revised manuscript. Likewise, it is rather simple for any reader to work out which part of a PAE matrix corresponds to an interaction observed in the corresponding pdb prediction. Besides which, it is our view that the biological plausibility and explanatory power of models is just as important as AF metrics in judging whether they may be correct, as is indeed also the case for most experimental work.

      Another important point was that the manuscript was too long and not readable. Yes, it is long and it could well be argued that we could have written a different type of manuscript, focusing entirely on what is possibly the simplest and most important finding, namely that our AF models suggest that in animal cells Wapl appears to form a quaternary complex with SA, Pds5, and Scc1 in a manner suggesting that a key function of Wapl’s conserved CTD is to sequester Scc1’s N-terminal domain after it has dissociated from Smc3. Rightly or wrongly, we decided that this story could not be presented on its own but also required 1) an explanation for how Scc1 is induced to dissociate from Smc3 in the first place and 2) an explanation for why the quaternary complex predicted for animal cells was not initially predicted for fungi such as yeast. The yeast situation was an exception that clearly needed explaining if the theory was to have any generality and it turned out that delving into the intricate details of the genetics of releasing activity in yeast was eventually required and yielded valuable new insights. We also believe that our work on the recruitment of Eco/Esco acetyl transferases to cohesin and the finding that sororin binds to the Smc3/Scc1 interface also provided important insight into how releasing activity is regulated. We acknowledge that the paper is indeed long but do not think that it is badly written. It is above all a long and complex story that in our view reveals numerous novel insights into how cohesin’s association with chromosomes is regulated, and we have endeavoured to eliminate any excessive speculation. We feel it is not our fault that cohesin uses complex mechanisms.

      Notwithstanding these considerations, we have in fact simplified a few sections and removed one or two others but acknowledge that we have not made substantial cuts.

      It was pointed out that a key feature of our modelling, namely the predicted association of Wapl’s C-terminal domain with SA/Scc3’s CES is inconsistent with published biochemical data. The AF predictions for this interface are universally robust in all eukaryotic lineages and crucially fully consistent with published and unimpeachable genetic data. We note that any model that explains all findings is bound to be wrong for the very simple reason that some of these findings will prove to be incorrect. There is therefore an art in Science of judging which data must be explained and accommodated and which should be ignored. In this particular case, we chose to ignore the biochemistry. Time will tell whether our judgement proves correct.

      Last but not least, it was suggested that we might provide some experimental support for our proposed SA/Scc3-Pds5-Scc1-WaplC quaternary complex. We are in fact working on this by introducing cysteine pairs (that can be crosslinked in cells) into the proposed interfaces but decided that such studies should be the topic of a subsequent publication. It would be impossible with the resources available to our labs to follow up all of the potential interactions and we therefore decided to exclude all such experiments.

      We are grateful for the detailed comments provided by both reviewers, many of which were very helpful, and in many but not all cases have amended the manuscript accordingly.

      With regard to the more specific comments:

      Reviewer #1 (Recommendations For The Authors):

      1) One concern is that observed interfaces/complexes arise because AF-multimer will aim to pack exposed, conserved and hydrophobic surfaces or regions that contain charge complementarity. The risk is that pairwise interaction screens can result in false positive & non-physiological interactions. It is therefore important to report the level of model confidence obtained for such AF calculations:

      A) The authors should color the key models according to pLDDT scores obtained as reported by AF. This would allow the reader to judge the estimated accuracy of the backbone and side chain rotamers obtained. At least for the key models and interactions it would be important to know if the pLDDT score is >90 (Correct backbone and most rotamers) or >70 (only backbone is correct).

      B) It would also be important to report the PAE plots to allow estimation of the expected position error for most of the important interactions. pLDDT coloring and PAE plots can be shown side-by-side as shown in other published data (e.g. https://pubmed.ncbi.nlm.nih.gov/35679397/ (Supplementary data)).

      C) The authors should include a Table showing the confidence of template modeling scores for the predicted protein interfaces as ipTM, ipTM+pTM as reported by AlphaFold-multimer. Ideally, they would also include DockQ scores but this may not be essential. Addition of such scores would help classification into Incorrect, Acceptable or of high quality. For example, line 1073 et seq the authors show a model of a SCC1-SA and ESCO1 complex (Fig. 37). Are the modeling scores for these interfaces high? It does not help that the authors show cartoons without side chains. Can the authors provide a close-up view of the two interfaces? Are the amino acids indeed packed in a manner expected for a protein interface? Can we exclude the possibility that the prediction is obtained merely because the sequence segments (e.g. in ESCO1 & ESCO2) are hydrophobic and conserved?

      We do not agree that including this level of detail to the text/figures of the manuscript would be suitable. All the relevant data for those who may be sceptical about the models are readily available at https://doi.org/10.6084/m9.figshare.22567318.v2. In our view, the cartoon versions of the models are easier for a reader to navigate. Anyone interested in the molecular details can look at the models directly.

      Importantly, no amount of statistical analysis can completely validate these models. What is required are further experiments, which will be the topic of further work from our and, I dare say, other laboratories.

      D) When they predict an interaction between the SA2:SCC1 complex and Sororin's FGF motif, they find that only 1/5 models show an interaction and that the interaction is dissimilar to that seen of CTCF. Again, it would be helpful to know about modeling scores. Can they show a close-up view of the SORORIN FGF binding interface to see if a realistic binding mode is obtained? Can they indicate the relevant region on the PAE plot?

      Given that AF greatly favours other interactions of Sororin’s FGF motif over its interaction with SA2-Scc1, we do not agree that dwelling on the latter would serve any purpose.
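
      For readers who download the Figshare output files and wish to apply the classification the reviewer describes in point 1, a minimal sketch follows. The pLDDT cut-offs (>90, >70) are those quoted in point 1A; the DockQ cut-offs (0.23, 0.49, 0.80) are the conventional CAPRI-style thresholds and are an assumption here, since neither the review nor the response states them. The function names are hypothetical and no scores from the manuscript are used.

```python
def plddt_band(plddt):
    """Interpret a per-residue pLDDT score using the cut-offs quoted in point 1A."""
    if plddt > 90:
        return "backbone and most rotamers expected correct"
    if plddt > 70:
        return "backbone expected correct"
    return "low confidence"


def dockq_class(dockq):
    """Classify an interface by DockQ score.

    Thresholds are the conventional ones (assumed, not taken from the manuscript):
    < 0.23 Incorrect, < 0.49 Acceptable, < 0.80 Medium, otherwise High.
    """
    if dockq >= 0.80:
        return "High"
    if dockq >= 0.49:
        return "Medium"
    if dockq >= 0.23:
        return "Acceptable"
    return "Incorrect"


# Illustrative values only; these are not scores from any model in the manuscript.
example_bands = [plddt_band(v) for v in (95.0, 82.5, 40.0)]
example_classes = [dockq_class(v) for v in (0.85, 0.60, 0.30, 0.10)]
```

      Such a classification summarises confidence only; as the response notes, it cannot replace experimental validation of the predicted interfaces.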

      2) Line 996: AF predicts with high confidence an interaction between Eco1 & SMC3hd. What are the ipTM (& DockQ if available) scores. Would the interface score High, Medium or Acceptable?

      As mentioned, see https://doi.org/10.6084/m9.figshare.22567318.v2.

      3) Line 1034 et seq: Eco1/ESCO1/ESCO2 interaction with PDS5. Interface scores need to be shown to determine that the models shown are indeed likely to occur. If these interactions have low model confidence, Fig. 36 and discussion around potential relevance to PDS5-Eco1 orientation relative to the SMC3 head remains highly speculative and could be expunged.

      See https://doi.org/10.6084/m9.figshare.22567318.v2. It should be clear that the predictions are very similar in fungi and animals. Crucially, we know that Pds5 is essential for acetylation in vivo, so the models appear plausible from a biological point of view.

      4) Considering the relatively large interface between ECO1 and SMC3, would the author consider the possibility that in addition to acetylating SMC3's ATPase domain, ECO1 remains bound to cohesin-DNA complex, as proposed for ESCO1 by Rahman et al (10.1073/pnas.1505323112)?

      This is certainly possible but we would not want to indulge in such speculation.

      5) E.g. Line 875 but also throughout the text: As there is no labeling of the N- and C-termini in the Figures, it is frequently unclear what the authors are referring to when they mention that AF models orient chains in a certain manner.

      Good point. This has been amended. In addition, the positions of the N- and C-termini are all available at https://doi.org/10.6084/m9.figshare.22567318.v2.

      6) Fig19B: PAE plots: authors should indicate which chains correspond to A, B, C. Which segment corresponds to the TYxxxR[T/S]L motif? Can they highlight this section on the PAE plot?

      Good point and amended in the revised manuscript.

      Minor comments:

      1) Line 440: the WAPL YSR motif is not shown in Fig. 14A

      2) Line 691: Scc3 spelling error.

      3) Line 931: Sentence ending '... SCC3 (SCC3N).' requires citation.

      4) Line 1008: Figure reference seems wrong. It should read: Fig. 34A left and right. Fig. 34B does not contain SCC1.

      Many thanks for spotting these. Hopefully, all corrected.

      5) Fig. 41 can be removed as it shows the absence of the interaction of Sororin with SMC1:SCC1. Sufficient to mention in the text that Sororin does not appear to interact with SMC1:SCC1.

      This is possible but we decided to leave this as is.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      (1) Are there any predicted models in which one of the two dimer interfaces of the hinge is open when the coiled coils are folded back, as seen in the cryo-EM structure of human cohesin-NIPBL complex in the clamped state?

      No AF runs ever predicted half opened hinges. It is possible that the introduction of mutations in one of the two interfaces might reveal a half-opened state and we ought to try this. However, it would not be appropriate for this manuscript, we believe.

      (2) Structures of the SA-Scc1 CES bound to [Y/F]xF motifs from Sgo1 and CTCF have been reported, suggesting that a similar motif could interact with SA/Scc3. Surprisingly, AF did not predict an interaction between Scc3/SA and Wapl FGF motifs, which only bind to the Pds5 WEST region. On the other hand, AF predicted interactions of the Sororin FGF motif with both Pds5 WEST and SA CES. Can the authors comment on this Wapl FGF binding specificity? What will happen if a Wapl fragment lacking the CTD is used in the prediction?

      This seems to be an academic point as the CTD is always present.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The study as a concept is well designed, although there are two issues I see in the methodology (these may be just needing further explanation or if I am correct in my interpretation of what was done, may need reanalysis to take into account). Both issues relate to the data that was extracted from the published literature on zoonotic malaria prevalence in the study area.

      1) No limit was set on the temporal range

      With no temporal limit on the range of studies, the landscape in many cases will have changed between the study being conducted and the spatial data. This will be particularly marked in areas where there has been clearing since the zoonotic malaria prevalence study. Also, population changes (either through population growth, decline or movement) will have occurred. All research is limited in what it can do with the available data, so I realise that there may not be much the authors can do to correct this. One possible solution would be to look at the land use change at each site between the prevalence study and the remote sensing data. I'm not sure if this is feasible, but if it is I would recommend the authors attempt this as it will make their results stronger.

      Thank you for the comments. We agree that matching the date of remote sensing data to samples is particularly important for environmental variables that change rapidly (such as forest loss). To clarify, no limit was set on the date range of the studies identified from the literature to ensure no articles were excluded due to arbitrary date restrictions. We have edited the manuscript to clarify this (line 422). Regarding landscape and environmental features, remote sensing data was extracted annually for every year for the full date range of the data (see Table 1 and S11, annual temporal resolution from 2006 to 2020). Forest was then matched contemporaneously (see lines 467–473) meaning that, insofar as it was possible, forest data was extracted for the same year as the data was collected. Where a date range was given for the primate data, the mean year was used. For human population density, covariate data were extracted for multiple years but were found to be relatively stable over the time period for the sites covered, so median year was used (see Supplementary Information, Appendix E and Table S11). Elevation is stable and typically only one time point is used as reference (in this instance the SRTM 90m Digital Elevation model, 2003).

      2) Most studies only gave a geographic area or descriptive location.

      The spatial analysis was based on a 5km and 20km radius of the 'study site' location, but for many of the studies the exact site is not known. Therefore the 'study site' was artificially generated using a polygon centroid. Considering that the polygon could be an administrative boundary (i.e., district/state/country), this is an extremely large area for which a 5km radius circle in the middle of the polygon is being taken as representative of the 'study site'. This doesn't make sense as it assumes that the landscape is uniform across the district, which in most cases it will not be (in rural areas it is going to be a mixture of villages, forest, plantation, crops etc which will vary across the landscape). This might just be a case of misunderstanding what was done (in which case the text needs rewording to make it clearer) or if I have interpreted it correctly the selection of the centroid to represent the study area does not make sense. I am not sure how to overcome this as it is probably not possible to get exact locations for the study sites. One possibility could be to make the remote sensing data the same scale as the prevalence data, i.e. if the study site is only identifiable at the polygon level, then the remote sensing data (fragmentation, cover and population) is used at the polygon level.

      Both these issues could have an impact on the study's findings. I would think that in both cases it might make the relationship between the environmental variables and prevalence even clearer.

      We would like to thank the reviewer for their concerns and provide some clarification on the methods used to extract environmental variables:

      • Using the centroid was initially explored but not pursued, for the same reasons raised by the reviewer: taking the centroid would be arbitrary, and the central point of a large polygon is unlikely to be representative of habitat across the entire sampling area and so introduces error (Cheng et al., 2021). We have clarified the wording in the manuscript with reference to centroids to avoid confusion on this point (line 491).

      • We demonstrate a method to account for the lack of precise geolocation by taking 10 ‘pseudo-sampling’ points instead of a single random location, with environmental variables extracted at 5, 10 and 20km for each site (lines 487-500). By including 10 environmental realisations, surveys conducted in smaller or more uniform landscapes will have more consistent covariates and this will lend more weight to the model. Conversely, samples taken from large administrative polygons are likely to be highly variable, and these associations will have less representation in the final model. This approach was used to demonstrate an alternative to using a single arbitrary site to represent the area.
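      The pseudo-sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the administrative polygon is given as a list of projected (x, y) vertices in km, and the names `point_in_polygon`, `pseudo_sample` and the L-shaped `district` are hypothetical. Points are drawn by rejection sampling from the polygon's bounding box.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def pseudo_sample(polygon, n_points=10, seed=None):
    """Draw n_points uniformly inside the polygon by rejection sampling."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < n_points:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points

# Hypothetical L-shaped "district" in projected coordinates (km); not real data.
district = [(0, 0), (40, 0), (40, 20), (20, 20), (20, 40), (0, 40)]
BUFFER_RADII_KM = (5, 10, 20)

points = pseudo_sample(district, n_points=10, seed=1)
# Each (point, radius) pair is one buffer in which covariates would be extracted.
extraction_jobs = [(pt, radius) for pt in points for radius in BUFFER_RADII_KM]
```

      Each survey without exact coordinates therefore yields 10 points and, with three buffer radii, 30 covariate extractions; the spread of those values reflects how heterogeneous the polygon is.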

      To further support the validity of this technique:

      • Figures illustrating the variance of the environmental variables across the 10 sampling sites at 5, 10 and 15km for GADM administrative classifications at country level (GID0), state (GID1), district (GID2) and exact coordinates (GPS) are now included in the SI (Figure S12).

      • Sensitivity analyses were conducted, in which final GLMM models were fit again but using only acceptable levels of variance in environmental variables and/or acceptable size of administrative boundary (Table S15 and S16). In sensitivity analyses, forest cover and fragmentation retained a significant effect on prevalence of P. knowlesi in macaques, suggesting this effect is robust to spatial uncertainty.
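      One way such a sensitivity filter could be expressed is sketched below. It is an assumption-laden illustration: the coefficient of variation as the variability measure, the `max_cv=0.5` cut-off, the field names and the toy surveys are all hypothetical, since the thresholds actually used are reported only in Tables S15 and S16.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Spread of a covariate across pseudo-sampling points, relative to its mean."""
    sd = pstdev(values)
    if sd == 0:
        return 0.0  # identical values at every point: perfectly consistent
    m = mean(values)
    if m == 0:
        return float("inf")
    return sd / abs(m)

def filter_surveys(surveys, max_cv=0.5, excluded_levels=("GID0",)):
    """Keep surveys that are geolocated finely enough and have consistent covariates."""
    kept = []
    for survey in surveys:
        if survey["admin_level"] in excluded_levels:
            continue  # e.g. drop surveys only geolocated to country level
        if coefficient_of_variation(survey["canopy_cover"]) > max_cv:
            continue  # covariate too variable across the 10 points to be informative
        kept.append(survey)
    return kept

# Toy surveys with made-up canopy cover (%) at 10 pseudo-sampling points.
surveys = [
    {"id": "A", "admin_level": "GPS", "canopy_cover": [60] * 10},
    {"id": "B", "admin_level": "GID2", "canopy_cover": [50, 55, 45, 52, 48, 50, 53, 47, 51, 49]},
    {"id": "C", "admin_level": "GID0", "canopy_cover": [50, 52, 49, 51, 50, 48, 50, 51, 49, 50]},
    {"id": "D", "admin_level": "GID1", "canopy_cover": [5, 90, 10, 85, 0, 95, 20, 80, 15, 100]},
]
kept_ids = [s["id"] for s in filter_surveys(surveys)]
```

      The final models would then be refitted on the retained surveys only, and the estimated effects compared with those from the full dataset.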

      We would also like to highlight that the main finding of this research is the novel synthesis of regional prevalence of P. knowlesi in simian reservoirs across Southeast Asia, which was formerly assumed to be ubiquitous high prevalence, and which can now be used to inform regionally specific transmission modelling, better estimate spatial risk and parameterise early warning systems for P. knowlesi malaria in countries approaching elimination of human malarias. The risk factor analysis here is provided to begin to understand what may be driving this geographic heterogeneity in P. knowlesi prevalence at finer scales and demonstrate methods that could be used to accommodate spatial uncertainty in secondary data. We appreciate that this may not have been clear and have edited the manuscript accordingly.

      Reviewer #2 (Public Review):

      This is the first comprehensive study aimed at assessing the impact of landscape modification on the prevalence of P. knowlesi malaria in non-human primates in Southeast Asia. This is a very important and timely topic both in terms of developing a better understanding of zoonotic disease spillover and the impact of human modification of landscape on disease prevalence.

      This study uses the meta-analysis approach to incorporate the existing data sources into a new and completely independent study that answers novel research questions linked to geospatial data analysis. The challenge, however, is that neither the sampling design of previous studies nor their geospatial accuracy are intended for spatially-explicit assessments of landscape impact. On the one hand, the data collection scheme in existing studies was intentionally opportunistic and does not represent a full range of landscape conditions that would allow for inferring the linkages between landscape parameters and P. knowlesi prevalence in NHP across the region as a whole. On the other hand, the absolute majority of existing studies did not have locational precision in reporting results and thus sweeping assumptions about the landscape representation had to be made for the modeling experiment. Finally, the landscape characterization was oversimplified in this study, making it difficult to extract meaningful relationships between the NHP/human intersection on the landscape and the consequences for P. knowlesi malaria transmission and prevalence.

      Thank you for the feedback on the manuscript. We agree that the data was not originally intended for spatial assessment of landscape impact nor represents a full range of landscape conditions across the region. However, we would like to highlight the first set of results from the meta-analysis. Here, the synthesis of all available data allows for the detection of regional disparities and geographic heterogeneity of prevalence in host species, which individual small-scale opportunistic studies are not powered to do, and which had not been identified before this investigation.

      In this context, the risk factor analysis is an exploratory analysis to understand what may be driving the observed geographic variation at broad scales, as well as to provide a framework for dealing with spatial uncertainty. Landscape data were extracted at a level deemed appropriate given the limitations of the data. The majority of surveys were geolocated to district level, and sensitivity analysis showed reasonable consistency of landscape features at our chosen scales (Table S8, Figure S12A). To address some of these concerns, we conducted further analysis to explore the deviation of environmental covariates in each sampling area and ran sensitivity analyses removing extremely variable datapoints (Table S15 and Table S16). When highly uncertain data and/or country-level data are removed, the effect of canopy cover on non-human primate malaria prevalence is retained, supporting the original findings.

      Despite many study limitations, the authors point to the critical importance of understanding vector dynamics in fragmented forested landscapes as the likely primary driver in enhanced malaria transmission. This is an important conclusion particularly when taken together with the emerging evidence of substantially different mosquito biting behaviors than previously reported across various geographic regions.

      Another important component of this study is its recognition and focus on the value of geospatial analysis and the availability of geospatial data for understanding complex human/environment interactions to enable monitoring and forecasting potential for zoonotic disease spillover into human populations. More multi-disciplinary focus on disease modeling is of crucial importance for current and future goals of eliminating existing and preventing novel disease outbreaks.

      Reviewer #1 (Recommendations For The Authors):

      A couple of minor points

      1) Were human density and forest cover correlated? If so, was this taken into account?

      Human density and forest cover at selected scales were not found to be strongly correlated (Spearman’s rank values -0.38 and -0.45 within 5km and 20km buffer radii for human population density respectively).

      In selecting variables for inclusion in the final model, we examined variance inflation factors (VIF) to detect and minimise multicollinearity in the model. VIF measures the correlation and strength of correlation between independent predictors. VIF of each predictor variable was examined starting with a saturated model and sequentially excluding the variable with the highest VIF score from the model. Stepwise selection continued until the entire subset of explanatory variables in the global model satisfied a conservative threshold of VIF ≤6 (Rogerson, 2001), which ensures that the remaining variables included in the final model have minimal correlation. Spearman’s correlation matrices for all variables at all scales and final selected variables (below VIF threshold) are included in the Supplementary Information (Figure S13 and Figure S14).
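      The stepwise VIF procedure described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: it uses the identity that the VIF of predictor j equals the j-th diagonal element of the inverse Pearson correlation matrix of the predictors, and the variable names and values in `toy` are made up.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(data):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(data)
    corr = [[pearson(data[a], data[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

def stepwise_vif(data, threshold=6.0):
    """Drop the predictor with the highest VIF until all VIFs are <= threshold."""
    data = dict(data)
    while len(data) > 1:
        scores = vif(data)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del data[worst]
    return list(data)

# Made-up toy predictors; the two canopy variables are nearly collinear by design.
toy = {
    "canopy_5km": [10, 20, 30, 40, 50, 60, 70, 80],
    "canopy_10km": [11, 19, 32, 38, 51, 61, 68, 82],
    "pop_density": [300, 700, 100, 800, 200, 600, 400, 500],
    "elevation": [50, 30, 80, 10, 70, 20, 60, 40],
}
kept = stepwise_vif(toy, threshold=6.0)
```

      Because the same covariate measured at two buffer radii is almost perfectly correlated with itself, at most one of the two canopy variables can survive the threshold in this toy example.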

      2) Reference (Speldewinde et al., 2019) is down as Davidson et al. in the reference list

      Thank you for the thoroughness of this review. There are two similar but separate references, both published in 2019 with the same co-authors, and (Speldewinde et al., 2019) was incorrectly referenced. They should be (Davidson et al., 2019a) and (Davidson et al., 2019b) respectively. This has now been corrected in the manuscript.

      Davidson, G., Chua, T.H., Cook, A. et al. Defining the ecological and evolutionary drivers of Plasmodium knowlesi transmission within a multi-scale framework. Malar J 18, 66 (2019). https://doi.org/10.1186/s12936-019-2693-2

      Davidson, G., Chua, T.H., Cook, A., Speldewinde, P., Weinstein, P. The Role of Ecological Linkage Mechanisms in Plasmodium knowlesi Transmission and Spread. EcoHealth 16(4), 594-610 (2019). https://doi.org/10.1007/s10393-019-01395-6

      Reviewer #2 (Recommendations For The Authors):

      Line 143: "We hypothesise that higher prevalence of P. knowlesi in primate host species is driven by landscape change..." without specifying here the kind of landscape change (e.g. "forest degradation and fragmentation") it is virtually impossible to confirm or reject this hypothesis.

      We agree that the wording of the hypotheses needed to be more specific. We have edited lines 142 – 145 to specify forest fragmentation as our landscape variable of interest, and to more explicitly include the regional meta-analysis of P. knowlesi prevalence.

      Table 1 vs Table S11 discrepancy regarding spatial resolution of Forest cover and fragmentation variables. The original dataset resolution is 30m but I don't think one can compute a PARA index at 30 m since it really requires a polygon that is larger than the single value pixel. Table S11 indicates a 30 km gridcell with some postprocessing of the original datasets.

      We appreciate this being identified. The resolution refers to the input layer (tree canopy cover, 30m). PARA was calculated from the binary forest cover layer (30m resolution) within each buffer radius (5, 10 and 20km). We have edited both Table 1 and Table S11 to help clarify this.
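      To make the distinction between input resolution and the scale of the index concrete, the sketch below computes a perimeter: area ratio over a window of 30m cells rather than for a single pixel. It is a simplified assumption, not the authors' exact computation: forest is defined as canopy cover above 50%, the ratio is total forest edge length over total forest area, the window is square rather than a circular buffer, and the window boundary is counted as edge.

```python
def perimeter_area_ratio(canopy, cell_size_m=30.0, forest_threshold=50.0):
    """Total forest edge length divided by total forest area within a grid window."""
    rows, cols = len(canopy), len(canopy[0])
    # Binary forest layer: True where canopy cover exceeds the threshold.
    forest = [[value > forest_threshold for value in row] for row in canopy]
    n_cells = 0
    n_edges = 0
    for r in range(rows):
        for c in range(cols):
            if not forest[r][c]:
                continue
            n_cells += 1
            # A cell side counts as edge if the neighbour is non-forest or off-grid.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                off_grid = not (0 <= rr < rows and 0 <= cc < cols)
                if off_grid or not forest[rr][cc]:
                    n_edges += 1
    if n_cells == 0:
        return None  # no forest in the window, ratio undefined
    perimeter_m = n_edges * cell_size_m
    area_m2 = n_cells * cell_size_m ** 2
    return perimeter_m / area_m2

# Toy canopy cover grids (%), each with four forest cells.
intact = [
    [80, 80, 0, 0],
    [80, 80, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
fragmented = [
    [80, 0, 80, 0],
    [0, 0, 0, 0],
    [80, 0, 80, 0],
    [0, 0, 0, 0],
]
para_intact = perimeter_area_ratio(intact)
para_fragmented = perimeter_area_ratio(fragmented)
```

      Both toy grids contain the same forest area, but the fragmented layout has twice the edge length, so its ratio is higher; this is the property that makes the index a measure of fragmentation.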

      It would be very helpful if you provided justification for selecting specific metrics to represent the key landscape variables. How are these particular landscape variables relevant? Why not other land cover/land use components?

      We have now included a paragraph in the Supplementary Information (Appendix D) to explain the choice of environmental covariates. Elevation was chosen as an important proxy for vector distribution (but was not retained in model selection). Human population density was chosen as a measure of proximity to human settlement, rather than relying on qualitative assessment of rural/peri-urban/urban. Tree canopy cover and fragmentation indices are key determinants of primate habitat selection and of vector breeding habitat, and justification for the use of perimeter: area ratio is included in the methods section (section beginning line 462).

      I think the other issues present substantial weaknesses that you cannot address without redoing the study. I will list those below just for reference.

      1) If the forest is so dominant (which I would agree with based on my understanding of macaque ecology), how does it make sense to select completely random points (especially at the country or even state level) to represent landscape covariates? At a minimum, I would suggest getting random points within the forest or better yet forest edge habitat. But even then, I doubt that these points would be at all representative of the conditions of a specific study. The geospatial uncertainty is just too large. The dataset simply doesn't support the analysis that is attempted here.

      On the point of selecting from only within forest: forest is a dominant habitat, but Long-tailed macaques are anthropophilic and not exclusively found in forest (Stark et al., 2019), and a proportion of the more opportunistic and nuisance samples caught were found in areas more associated with human activity (Li et al., 2021). As such, random points only within forested areas is also unlikely to capture the true habitat of the primates sampled and selecting only from forested areas would bias the results.

      Whilst fully georeferenced samples would be the ideal scenario, the idea behind selecting random points from the sampling polygon is that for smaller areas (with higher spatial certainty), habitat would be more consistent between random points and lend more weight to the final model, whereas large polygons with high uncertainty are likely to vary and lend less weight to the final model. In response to these comments, we have further supported this by running regression models only on samples within a reasonable administrative boundary size and on samples within a reasonable threshold of uncertainty (i.e., data points are removed if the deviation of environmental covariates across the 10 random points is so high that the sample is uninformative, or if datapoints can only be geolocated to country level). In these sensitivity analyses, forest cover and species are retained as factors associated with higher malarial prevalence in non-human primates (Table S15 and Table S16).

      2) Hansen et al. dataset reflects "tree cover" - which is not the same as "forest cover" since it would also include plantations that are very widely distributed across Southeast Asia. If the animal use of plantations differs from that of natural forests, it will present a large issue for the study.

      In this analysis the features of interest were habitat configuration (fragmentation) and deforestation (forest loss) rather than specific land class. We have defined forest as >50% canopy cover, which considers canopy density given historical forest loss and has precedent in other work (Fornace et al., 2016). In addition to their importance to macaque ecology, forest (canopy) cover, forest loss and forest edge are noted to be key determinants of vector breeding and vector habitat (Byrne et al., 2021; Chua et al., 2019). For this reason, these are important variables to include in analyses. More specific landscape variables were explored, but the temporal and spatial range of the data precluded the use of fine-scale land classification data. We consider this sufficient for investigating preliminary links to landscape configuration and habitat fragmentation at broad scales. We have also amended the manuscript to be more discerning with the use of ‘forest’ to avoid confusion throughout.

      3) Tree regrowth in the ecosystems of monsoonal Asia is very rapid. Based on the study description, tree regrowth was not accounted for in the study which could potentially lead to a very large underestimation of tree cover if only tree loss since 2000 was monitored. Again unless there is a reason to assume that macaques do not use young successional forests or use it at a highly reduced rate. Both of these points are acknowledged as limitations at the end of the discussion section but in my opinion they have a very strong impact on the study, making the results non-significant.

      This is an interesting suggestion. Macaques do forage in plantations and cultivated landscapes to supplement food, but preferentially roost and range in forest edges and interior forest, though ranging behaviour will be complex and vary across Southeast Asia. In this study the primary interest was in deforestation (forest loss) and fragmentation of old growth forested landscapes, which are key variables both for macaque ecology and for vector breeding sites. Therefore, it was felt that forest loss (transition from >50% canopy cover to <50% canopy cover since 2000) was sufficient to capture this. Ranging behaviour of individual animals and macaque troops would not be captured at this scale, and higher spatial and temporal resolution would be required to characterise relationships with tree regrowth and young plantations which is outside the scope of this study. In all regions, purposeful fine scale follow-up studies would be required to unpick fine scale relationships across a habitat gradient.

      I am not 100% sure I understand the geospatial design fully. The pieces are distributed between different subsections and it was challenging to string together the processing chain between subsections of the manuscript and the supplemental information. It would help to add a figure (a flowchart, perhaps?) to the supplemental section that walks through the entire geospatial covariates assembly. E.g.

      • GPS location → create 5, 10, and 20 km buffers → mean elevation, mean population, %(?) Forest, PARA(?) for each buffer - I still don't understand the 30m or 30 km spatial resolution reference for forest and PARA in this context.

      This was an error in the table in the Supplementary Information and has been corrected – the forest cover raster has a resolution of 30m, and the perimeter: area ratio is calculated within 5, 10 and 20km buffers.

      • landscape covariates receive the full weight (1) in the model. - This is defensible even though not ideal

      This is equivalent, but we felt more intuitive, to sampling GPS points x10 and inputting with equal weights to the areal data.

      • No GPS location → assign to the best identifiable administrative unit (country, state, or district) → generate 10 random points within the administrative unit → create 5, 10, and 20 km buffers → mean elevation, mean population, %(?) Forest, PARA(?) for each buffer → landscape covariates from each point receive the proportional weight (0.1) in the model. I do not believe that this approach is representative of macaque habitat/macaque human interaction characterization.

      In other examples dealing with spatial uncertainty, the centroid is taken to be representative of an area. This method generates considerable bias and uncertainty, particularly if the uncertainty is not then accounted for by weighting subsequent models (Cheng et al., 2021). In this exploratory analysis, pseudo-sampling from 10 random sites generates a more realistic generalised environmental realisation than taking a centroid or a single random point. This was used as an exploratory analysis to explain broad regional trends in prevalence, which can be used to guide further investigation in the fine-scale studies that are required to completely describe disease dynamics in specific macaque habitats.
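      The weighting scheme discussed in this exchange (full weight for a survey with exact coordinates, proportional weight of 0.1 for each of 10 pseudo-sampling points) can be illustrated with a small sketch. The function name, field names and toy values below are hypothetical, and the sketch only prepares weighted rows; it does not fit the GLMM.

```python
def build_weighted_rows(survey_id, n_positive, n_tested, realisations):
    """One model row per environmental realisation; weights sum to 1 per survey."""
    weight = 1.0 / len(realisations)
    return [
        {
            "survey_id": survey_id,
            "n_positive": n_positive,
            "n_tested": n_tested,
            "weight": weight,
            **covariates,
        }
        for covariates in realisations
    ]

# Survey with exact GPS coordinates: a single realisation carrying full weight.
gps_rows = build_weighted_rows("S1", 5, 40, [{"canopy": 62.0}])

# Survey known only to district level: 10 pseudo-sampling points, weight 0.1 each.
district_canopy = (40, 45, 38, 52, 47, 41, 55, 44, 39, 49)  # made-up values (%)
district_rows = build_weighted_rows(
    "S2", 12, 100, [{"canopy": float(c)} for c in district_canopy]
)
```

      Under this scheme every survey contributes the same total weight, but a survey from a heterogeneous polygon spreads that weight over dissimilar covariate values, which dilutes its influence on any single estimated association.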

      Thank you for this useful suggestion – we have taken this advice and added a flowchart of data processing to the Supplementary Information (Appendix D, Figure S8).

      Discussion:

      Based on information in Table S4, sampled NHPs were predominantly from human-dominated (peridomestic, agricultural, and urban) landscapes. In forested landscapes, only macaques that live in forest edge habitats were likely sampled in the first place, simply due to extreme challenges in getting to macaques in remote inaccessible areas. This very substantial spatial bias in sampling will undoubtedly reflect that fragmented habitat is a key landscape component impacting the prevalence of Pk in NHP, especially as the authors point out in the later part of the discussion that the critical vectors for transmission are also associated with forest edge habitats. High forest fragmentation is also linked to the presence/increase in migrant human workers (logging or plantation activities) - a population also strongly associated with higher malaria prevalence for a variety of P spp (although I am not aware of studies that are specific to Pk malaria). However, the living conditions for migrant workers have frequently been implicated in higher rates of malaria transmission and thus those could, hypothetically, also contribute to Pk infection rates in NHP. Ultimately, the discussion appears to suggest that the biggest gap in our understanding is within vector ecology and understanding the NHP-vector-human dynamics within local landscape settings. It is an interesting finding. However, my overall conclusion would be that the sampling strategy (both for NHP and geospatial covariates) renders this study as "exploratory" at maximum and that all findings would need to be tested and verified through independent and more rigorously designed studies.

      Thank you to the reviewer for a comprehensive assessment. We would first like to highlight the regional meta-analysis, which was one of the main findings. This is a novel result for the P. knowlesi literature: it is the first demonstration of regional differences in prevalence that correlate with regional hotspots of human incidence, suggesting that the force of infection from NHP may drive hotspots of P. knowlesi in human populations.

      We include a risk factor analysis that suggests a method for dealing with high spatial uncertainty, and an exploratory analysis that finds landscape complexity may be a contributory factor to broad regional heterogeneity. These associations are robust to sensitivity analysis where data with extreme variability in environmental variables is removed (Table S15-S16).

      Habitat descriptions in the original studies are qualitative and likely subjective. Whilst there is likely to be an important sampling bias, there were also evident differences in prevalence between the NHP sampled in different environments in the available data, which we have further characterised. Risk factors for human P. knowlesi do include forest loss (reduction in canopy cover) within 5 years and within 2km, as well as contact with macaques and occupations in plantations (Fornace et al., 2014; Fornace et al., 2016). Reverse spillover from humans to NHP is an interesting suggestion, but outside the scope and scale of the study. Given known links of deforestation (forest loss) with human incidence of P. knowlesi and also with increased vector breeding sites (Byrne et al., 2021), this analysis explores whether deforestation is linked to prevalence in reservoir species, thus contributing to the force of infection at broad scales.

    1. Author Response:

      We are sorry that both eLife and the Reviewers feel that our submitted studies are currently insufficient to support our hypothesis that loss of H2-O function affects thymic Treg selection. As this is the first study directly evaluating loss of H2-O in the thymus we do not feel that we overstated our finding as suggested by Reviewer 1. We hope that a revised version of the manuscript can satisfy the reviewers’ criticisms.

      -Reviewer 1 is asking us to address the presumed discrepancies between our previous work (Welsh et al 2020, https://doi.org/10.1371/journal.pbio.3000590) and data from Lee et al. 2021 (https://doi.org/10.4049/jimmunol.2100650) in this current manuscript, which does not report on the development of EAE in DO-KO and DO-WT mice. All experiments here are on naïve mice. As such, we wish to justify our lack of discussion of Lee et al (2021) findings.

      Lee et al (2021) reported the effects of DO on both EAE and SLE development, using mainly H2-Oβ KO mice. As we have never used these CRISPR-generated mice, we cannot make a direct in-house comparison. However, we did note that the reported disease curve for female H2-Oβ KO mice had a similar trend indicating increased EAE disease development, similar to what we reported in our 2020 paper (Welsh et al PLoS Biology). In the single experiment that utilized H2-Oβ KO mice for EAE development, Lee et al found a different disease trend than ours. However, Lee et al (2021) tested only 4-5 mice per group in that single experiment, and their measurement of disease development relied solely on visual assessment of limb and tail functionality. Our study verified EAE disease development by multiple approaches, including analyses of MOG-specific tetramer staining of the CNS CD4 lymphocyte infiltrate and in vivo NIRF whole-body imaging of diseased DO-WT and DO-KO mice using an antibody probe specific to MBP. We repeated our disease development experiments more than 15 times using 5-8 mice per group. Below is an excerpt from the Results section of Welsh et al PLoS Biology, clearly explaining how many experiments were performed and the number of mice per group per experiment:

      “From these studies, we found that DO-KO mice had an accelerated onset of disease compared to DO-WT mice (Fig 7A). Disease symptoms (Score 1) appeared around Day 8–10 and quickly progressed to advanced disease (Score 3–4) by Day 14–16 in DO-KO. In contrast, DO-WT mice started showing symptoms around Day 12 and progressed to advanced disease scores by Day 20. Total cell infiltration into the CNS tissue was slightly higher in DO-KO mice, but no change in total brain weight was observed (S5 Fig). To further correlate the state of disease with CD4 infiltration, we performed in vivo NIRF whole-body imaging on diseased DO-WT and DO-KO mice using an antibody (Ab) probe specific to myelin basic protein (MBP). The Ab reacts with MBP only when the myelinated glia cells are damaged during disease development [56]. Thus, by detecting demyelination, whole-body imaging allowed us to fully visualize the co-localization of CD4 T cells at the sites of demyelination occurring in diseased mice. Interestingly, when mice of various disease scores were imaged, we found increased co-localization of infiltrating CD4 T cells with anti-MBP staining in DO-KO mice, but not in DO-WT mice (Fig 7B). These data not only confirmed the flow cytometric findings that diseased DO-KO mice have a greater influx of lymphocytes into their CNS tissue (S5 Fig), it also verified the massive demyelination that occurs during the disease”

      And again in the Legend to Figure 7;

      “Representative curves showing the time course of disease development in DO-KO (red) and DO-WT mice (white). N = 5 mice per group, representative of >15 repeat experiments. Score system: 0 = no symptoms, 1 = limp tail, 2 = limp tail + partial hind limb paralysis, 3 = limp tail + total hind limb paralysis, 4 = limp tail + total hind limb paralysis + partial forelimb paralysis. Data represented as mean ± SEM.”

      Despite the clarity of the description of our experiments, Lee et al have publicly slandered us and grossly misrepresented our work by stating the following:

      “A recent study (11-Welsh et al) found that B6.Oa−/− mice were more susceptible to EAE than control B6J animals. However, that conclusion was based on a single experiment, in which control B6J mice developed very mild EAE disease with an average score of 1, which is far lower than the disease scores published by other groups (30–32) and also observed in our study. Thus, in this inducible model of autoimmunity, H2-O deficiency does not contribute to either disease development or severity.”

      -Another important variable between our studies and those of Lee et al (2021) was their use of a commercially available disease induction kit versus our immunization solutions, which followed the established protocols of Nancy Ruddle et al (J Exp Med. 1997 Oct 20; 186(8): 1233–1240. doi: 10.1084/jem.186.8.1233). EAE disease development is known to vary widely with the quantity and purity of (a) the MOG peptide, (b) the tuberculosis antigen in the CFA, and (c) the pertussis toxin, as well as with injection strategies and many other uncontrollable factors. While a comparison of these two sets of results is not relevant to our current study, we will be more than happy to compare the results of our previously published work with those of Lee et al. in the discussion.

      -We want to emphasize that we did follow Hogquist et al’s gating strategy for detecting auditing vs deleted thymocytes by subdividing total thymocytes into “Non-signaled” (TCR-β-, CD5-/inter) and “Signaled” (TCR-β+ CD5+/hi) populations before further gating on only medulla-localized CD4 T cells. The “CCR7+ CD4+” label in Figure 1 was meant to orient the reader without overwhelming the figure with numerous flow plots. To address this concern, the revised manuscript will include (1) updated Supplemental figures showing the complete gating strategy, (2) updated figure legends and text emphasizing that auditing/deletion gating came from CD4 T cells which passed positive selection (i.e. TCR-β+ CD5+/hi), and (3) representative flow plots for all Figure 1 panels.

      -Also, regarding “discrepancies between our data and Liljedahl et al 1998”;

      (A) The H2-O KO mice used by Liljedahl et al were on a 129/Ola genomic background. The H2-O KO mice used for both of our papers have been completely backcrossed to C57BL/6J. Clearly, non-MHC genes contribute to the impacts of MHC proteins, yet how the 129/Ola genomic background could affect the H2-O genes remains to be discovered. (B) No data were shown supporting their published statement below:

      “The proportions of B cells as well as of CD4+ and CD8+ T cells in the lymph node, spleen, and thymus were similar in H2-Oa–deficient and wild-type mice (data not shown)”. (Liljedahl et al 1998).

      Reviewer 2:

      scRNA-Seq analysis was performed by the Computational Biology Computing Core at Johns Hopkins School of Medicine. We missed including this acknowledgement as our core facility does not request authorship or acknowledgements. The sentence has been edited for the correct terminology.

      -Regarding the truncated bar graph: the entire paper contains only two bar graphs, neither of which is truncated, so we are puzzled as to which figure the reviewer is referring to.

      -We would like to remind Reviewer 2 that, since DO works together with DM and functions differently on peptides of different sequences, the reported cumulative effects of DO in vivo have notoriously been rather minor. In particular, since our current study focuses on naïve mice, major changes were not expected.

      -Regarding the omitted gating strategies: the original version did not provide gating strategies for all the figures. Full FACS gating strategies have now been provided in the new supplemental figures, and representative FACS plots have been added to ALL main figures.

    1. Author Response

      We would like to express our gratitude to the reviewers for their insightful comments and suggestions on our manuscript. We appreciate the time and effort they have devoted to evaluating our work. In response to their valuable feedback, we will undertake a comprehensive revision of our manuscript to address their concerns and enhance the clarity of our findings.

      Reviewer #1 has raised the important point of the need for a more thorough exploration of how ELF3 promotes cell tolerance to DNA damage.

      As the reviewer mentioned, we agree that genomic instability is key to cell transformation. In the original manuscript, we proposed that ELF3 might be an important factor allowing cells to tolerate the lethal genomic instability caused by BRCA1 deficiency, keeping an “appropriate” level of genomic instability and thus fueling cell transformation. We acknowledge the limitation that the mechanism by which ELF3 promotes cell tolerance to DNA damage requires further exploration. To address this, ELF3 overexpression and knockdown experiments in more BRCA1 wildtype or deficient breast cell lines are planned. In addition, since ELF3 is a transcription factor, we suspect that its role in promoting cell tolerance to DNA damage is mediated by transcription, and more downstream genes of ELF3 will be explored as well.

      Regarding the concerns raised by Reviewer #2, we acknowledge that our manuscript may have contained gaps and limitations of the datasets used.

      We appreciate the reviewer's feedback regarding the limitations of our cell models and their representativeness of LP cells. While we have utilized MCF10A cells for the knockdown experiments, we understand that these may not be a perfect representation of LP cells. To address this concern, we will incorporate a discussion on the limitations of our cell models and their relevance to LP cells, along with potential plans in LP cells that may be included in future studies.

      We will also clarify the rationale for focusing on ELF3 and discuss the other genes identified in our analysis for completeness. Regarding ELF3 functions in cells other than LP cells: in our analysis, ELF3 is highly expressed in LPs compared to other cell populations in the mammary gland, making ELF3 a previously undefined LP gene. Thus, we suspect that ELF3 functions may be more significant in LP cells. We are also interested in ELF3 functions in cells other than LP cells and will explore these further.

      We agree that different pathogenic variants of BRCA1 may cause diverse impacts on its function and tumorigenesis. We will add detailed information and discussion about BRCA1 pathogenic variants of patients in our single-cell RNA-seq. Also, to enhance the overall clarity of our manuscript, we will revise the figure legends to include critical details that were previously omitted. This will ensure that readers can better evaluate the presented data.

    1. Author Response

      We appreciate the feedback from all the reviewers. We will incorporate their comments into the revised manuscript.

      In response to reviewer three's suggestion regarding complementary approaches for identifying rootlet components, we'd like to provide further insight into the strategies we explored.

      We performed mass spectrometry on our purified rootlets. This identified the rootlet components rootletin and CCDC102B and various axonemal components, due to the association between the rootlet and axoneme. However, due to the limitations in quantifying components using mass spectrometry, we were unable to confidently identify novel rootlet constituents present in quantities comparable to rootletin.

      We further attempted cross-linking mass spectrometry on the rootlets to gain deeper insights into the interactions between rootletin molecules. Unfortunately, this effort resulted in a completely insoluble sample despite extended digestion times, leading to issues with mass spectrometry column clogging and rendering our results inconclusive.

      We attempted to express rootlet components recombinantly and were able to purify fibres, but they did not contain the characteristic repeat pattern seen in native rootlets. We also considered purifying native rootlets from cultured cells, but realized the yield would be too low for cryo-ET studies.

      We therefore regret that other approaches to validate our model are outside the scope of this current work.

    1. Author Response

      1) The analysis of Shh deletion in mossy cells and influences of aging-related NSC pool decline is not well connected with the rest of the study on the expression/requirement of Shh in mossy cells to regulate seizure-induced neurogenesis. To promote cohesion, the authors should examine/discuss what happens to mossy cells during aging - is it similar or different to what happens to mossy cell neuronal activity during seizures?

      We believe that the two mechanisms are similar. Seizure-induced neurogenesis increases NSC proliferation, which increases the demand for Shh to support self-renewal. Similarly, we assume that the increased NSC decline in Shh cKO mice is due to an increased demand for Shh for NSC self-renewal with aging. It has been shown that NSCs in young mice generally don’t self-renew and instead are consumed after one or two rounds of cell division. On the other hand, NSCs in old mice are known to undergo more rounds of cell division compared with younger mice. This suggests that NSCs may be more dependent on signals driving self-renewal in aged mice. Our suggestion is that Shh from mossy cells contributes to minimising the NSC pool decline with aging, and therefore loss of Shh from mossy cells results in increased decline of the NSC pool in aged Shh cKO mice. This aligns with our hypothesis that Shh from mossy cells contributes to maintenance of the NSC pool.

      The exact mechanism regulating the shift in the proliferative capacity of NSCs with aging remains unclear and would be an interesting topic for future studies. In addition, whether mossy cell neuronal activity is decreased with age or Shh release/expression is compromised in aged animals remains to be elucidated. Considering these factors together, identifying the brain region(s) and other factors that regulate the neuronal activity of mossy cells, thereby controlling Shh release, and how these are dysregulated in pathological conditions and in aging will be important topics for future research.

      2) Only male mice were analyzed in the seizure induction experiments, leaving open the possibility of sex differences since previous reports suggest sex differences in adult neurogenesis.

      Seizure-induced neurogenesis was observed in both male and female mice. Considering that, we assumed that mossy cell-derived Shh also regulates seizure-induced neurogenesis in female mice. However, we agree with the reviewers’ comments. We cannot exclude the possibility that female mice react to KA or seizures differently from male mice, or that Shh from mossy cells might have distinct effects in female mice in that paradigm. It is also an interesting possibility that female-specific behaviors may affect mossy cell activation and also regulate neurogenesis through Shh. Because these are large and unresolved questions, we elected to leave potential sex differences in mossy cell-regulated neurogenesis for future research.

      3) Several control groups are missing:

      -For seizure induction: missing vehicle (instead of no KA treatment).

      -For TAM induction: missing corn oil only to check leakiness and specificity of transgene.

      -For DREADD experiment: missing vehicle (to control for hM3 non-specific effects)

      Regarding the missing vehicle control in KA treatments, we used saline (0.9% NaCl) as the vehicle for kainic acid; saline is commonly used as a vehicle for water-soluble reagents in adult neurogenesis experiments. In addition, the average volume of KA solution that mice received intraperitoneally for seizure induction was less than 500 µl, which is below the maximum volume recommended by NIH and UCSF. We have not tested whether the saline injection makes a difference in our experiments, but based on previous reports using saline, we believe that saline would not affect our experimental results.

      Regarding tamoxifen injections, the Gli1-CreER mice have been widely used for fate tracing analysis, including in our previous research, where Gli1-CreER mice have shown specific recombination in Gli1-expressing NSCs. Our results in this study have consistently shown that Gli1-CreER;Ai14 mice label NSCs in the dentate gyrus. Given this, we believe that our results using the Gli1-CreER line are not affected by non-specific recombination without tamoxifen.

      Regarding clozapine (CLZ) injection, we decided to administer CLZ to both control and DREADD animals considering the possible side effects of CLZ. We agree with the reviewer that our experiment cannot exclude the possibility that expression of hM3Dq affects neurogenesis without CLZ or CNO. However, although we have not included an analysis using saline as a control in our experiments, we have confirmed that both transgenic and virus-injected DREADD-expressing mice respond to CLZ with increased neuronal activity of mossy cells compared with control animals. Therefore, we believe that it does not affect the interpretation of our data that mossy cell neuronal activity controls neurogenesis.

      We appreciate the reviewers' carefully considered comments and will apply the suggested controls in our future research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their positive feedback and very helpful comments. We agree that this manuscript focuses primarily on functional outcomes and phenotypes. The studies were designed to address an important clinical question, i.e., repurposing dantrolene for the treatment of ventricular tachyarrhythmias and the prevention of sudden cardiac arrest. Thus, the current manuscript emphasizes in vivo studies over in vitro studies.

      However, we also acknowledge the need for additional mechanistic studies. We are in the final stages of submitting a second manuscript in which we dissect the underlying mechanisms through detailed in vitro studies of mitochondrial antioxidant capacity, reactive oxygen species, phosphorylation of ryanodine receptors, autonomic dysfunction, beta-adrenergic signaling, etc. that are beyond the scope of the current manuscript.

      Additionally, a third manuscript in progress focuses on the mechanistic link between ion channels, dispersion of repolarization, and sudden cardiac death. We previously reported the preliminary results in abstract form (Circulation Research. 2019;125:A102). Briefly, current-voltage relationships from patch clamp studies of isolated LV myocytes revealed that pressure-overload stress strongly reduced K currents, including IK1, IKs, and IKr. These changes were driven by downregulation of K channels and their components at the mRNA level. As expected, the reduced K currents destabilized the resting membrane potential, especially in phases II and III of the cardiac action potential, and reduced repolarization reserve. Scavenging mitochondrial ROS stabilized repolarization, suggesting mROS is the upstream driver of K channel downregulation. However, we have not specifically tested whether dantrolene stabilizes repolarization via the same mechanism. As such, we agree that "lability" or "dispersion" are more precise terms than "reserve" for the phenomenon reported in the present manuscript, and we have made these changes. Thank you for pointing this out. We have also changed the title accordingly.

      The present study investigates the effect of dantrolene on male animals. We agree that we need to evaluate the effect on females, especially because females have historically been underrepresented in studies of sudden cardiac arrest. Based on our preliminary studies, female animals exhibit increased variability in their phenotypic response to pressure-overload stress. Given the importance of this issue, we will examine the sex differences in carefully controlled future experiments, including the effect of dantrolene in females controlled for hormonal effects (e.g., with and without oophorectomy).

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The manuscript focused on roles of a key fatty-acid synthesis enzyme, acetyl-coA-carboxylase 1 (ACC1), in the metabolism, gene regulation and homeostasis of invariant natural killer T (NKT) cells and impact on these T cells' roles during asthma pathogenesis. The authors presented data showing that the acetyl-coA-carboxylase 1 enzyme regulates the expression of PPARg then the function of NKT cells including the secretion of Th2-type cytokines to impact on asthma pathogenesis. The results are clearcut and data were logically presented.

      Thank you for your input into our work. Your comments have been very helpful in enhancing our work.

      Reviewer #2 (Public Review):

      In this study the authors sought to investigate how the metabolic state of iNKT cells impacts their potential pathological role in allergic asthma. The authors used two mouse models, OVA and HDM-induced asthma, and assessed genes in glycolysis, TCA, β-oxidation and FAS. They found that acetyl-coA-carboxylase 1 (ACC1) was highly expressed by lung iNKT cells and that ACC1 deficient mice failed to develop OVA-induced and HDM-induced asthma. Importantly, when they performed bone marrow chimera studies, when mice that lacked iNKT cells were given ACC1 deficient iNKT cells, the mice did not develop asthma, in contrast to mice given wildtype NKT cells. In addition, these observed effects were specific to NKT cells, not classic CD4 T cells. Mechanistically, iNKT cells that lack ACC1 had decreased expression of fatty acid-binding proteins (FABPs) and peroxisome proliferator-activated receptor (PPAR)γ, but increased glycolytic capacity and increased cell death. Moreover, the authors were able to reverse the phenotype with the addition of a PPARg agonist. When the authors examined iNKT cells in patient samples, they observed higher levels of ACC1 and PPARG, compared to healthy donors and non-allergic-asthma patients.

      Thank you for your thorough analysis of our work.

      Reviewer #1 (Recommendations For The Authors):

      1) I suggest the authors to remove one copy of the sentence "It should be noted that CD4-CreAcc1fl/fl mice lack ACC expression in both conventional CD4+ T cells and iNKT cells." in Lines 421-423.

      We have removed the redundant sentence originally shown in LINES 421-423. Thank you for pointing this out.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is a very strong study with few concerns.

      1) Are there tissue specific differences in the iNKT cell populations? The authors examined lung iNKT cells in the Figs 1-3, and used liver NKT cells for the mechanistic studies in Fig 4-5. The studies shown in Fig S2 suggest that ACC1 deficient iNKT cells have developmental defects and impaired homeostatic proliferative capacity. Does ACC1 impact lung and liver iNKT cells similarly and is the lack of allergic asthma in ACC1 deficient iNKT cells due to defective iNKT cell trafficking to the lungs or a failure to survive after transfer (Fig 3)?

      2) Similarly, are chemokine receptor expression patterns similar between WT and ACC1 deficient iNKTs (Fig 4)?

      3) The authors' data suggest that Tregs are not playing a major role in the regulation of asthma induction in their ACC1 deficient mice, based on FoxP3 expression. Did the authors perform suppressor assays to show that the Tregs function similarly in WT and ACC1 deficient mice?

      In the revised manuscript, the authors addressed my major concerns.

      Thank you for your previous comments. They were very helpful in upgrading our scientific work here.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We very much appreciate the comments and suggestions on our manuscript "Cylicins are a structural component of the sperm calyx being indispensable for male fertility in mice and human". In response, we performed a series of further experiments, re-worded and re-wrote several paragraphs, and re-structured the manuscript according to the reviewers’ comments. We think that the manuscript is now improved and look forward to further evaluation. We provide a point-by-point response to all comments and have prepared a version.

      Recommendations for the authors:

      Editor’s comment:

      1) As pointed out by all three reviewers, it is critical to show the specificity of the antibodies used. The authors should clarify how the specificity of antibodies is tested. Western blot analysis to show the absence of the protein in the knockout is essential.

      As suggested by all reviewers, we additionally performed Western Blot analysis on cytoskeletal protein fractions to further verify the specificity of the generated antibodies and the generation of functional knockout alleles. The results confirm those of the IF staining; however, both antibodies detected bands lower than the predicted molecular weight. In addition, Mass Spectrometry was performed to search for the presence of peptides in the cytoskeletal protein fractions. The paragraph reads now as follows:

      Line 127-134: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockout (Fig. 1 G), thereby demonstrating i) specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-. Next, mass spectrometry analysis of the cytoskeletal protein fraction of mature spermatozoa was performed, detecting both proteins in WT but not in the respective knockout samples (Figure 1 – supplement 5; Figure 1 – supplement 6).

      Specificity of antibodies was additionally proven by immunohistochemical staining, showing a specific staining only in testis sections but not in any other organ tested. The section reads now as follows:

      Line 115-117: Specificity of antibodies was proven by immunohistochemical stainings (IHC), showing a specific signal in testis sections only, but not in any other organ tested (Figure 1 – supplement 2)

      2) Re-structuring/streamlining of the figures is recommended. Please consider the flow suggested by reviewer #2 and shorten the evolutionary analysis which takes up more space than it adds to the value of the story.

      We thank the reviewers and editor for the valuable suggestion. We re-structured the figures as suggested and rewrote the results section accordingly. The evolutionary analysis was significantly shortened.

      3) Provide statistics for the imaging analysis such as TEM as only a single representative image is shown.

      We agree that the observed morphological defects require a detailed statistical evaluation. TEM analysis was performed to confirm the results from optical microscopy and representative images with high magnification are shown for a detailed visualization of the defects. For additional quantification, we included statistics for IF stainings against calyx proteins CCIN and CapZa (Fig. 2 I-J). For TEM, we added additional images to the supplement (Figure 3 – supplement 1). Furthermore, we quantified the manchette length of step 10-13 spermatids to prove the increased elongation of the manchette in Cylc2-/- and Cylc1-/y Cylc2-/- spermatids (Fig. 5 A-B).

      4) Please consider other points raised by the reviewers below to improve the manuscript and provide responses on how the authors have addressed them.

      We thank all reviewers for the detailed review of our manuscript and their valuable suggestions, which helped a lot to improve the manuscript. We considered all points raised by the reviewers to the best of our knowledge and hope that the reviewers will approve the manuscript ready for publication. We added a point-by-point discussion of all comments/suggestions hereafter.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Antibody specificity: Fig 1E - there are some unspecific binding in Cylc2-/- for CYLC2 and in Cylc1/y Cylc2+/- for CYLC1 in the testis (and elongating spermatids in Figure 1 – Supplement 4). Could authors elaborate/comment on this? Western blot analysis would be also helpful to further support the antibody specificity.

      The very weak unspecific staining in the testis for CYLC2 (in Cylc2-/-) and CYLC1 (in Cylc1-/y Cylc2+/-) is only present in the lumen of the seminiferous tubules and/or the residual bodies of the testicular sperm cells and can be referred to as background signal. Importantly, the signal is entirely lost in the PT region, proving specificity of the generated antibodies. We added the following paragraph to the results section:

      Line 124-127: The generated antibodies did not stain testicular tissue and mature sperm of Cylc1- and Cylc2-deficient males, except for a very weak unspecific background staining in the lumen of seminiferous tubules and the residual bodies of testicular sperm (Fig. 1 F).

      Specificity of antibodies was additionally proven by immunohistochemical staining, showing a specific staining only in testis sections but not in any other organ tested.

      Line 115-117: Specificity of antibodies was proven by immunohistochemical stainings, showing a specific staining in testis sections only, but not in any other organ tested (Figure 1 – supplement 2)

      To further verify the specificity of generated antibodies and the generation of functional knockout alleles, we additionally performed Western Blot analysis on cytoskeletal protein fractions, confirming the results of the IF staining. No unspecific bands were detected in the Western Blot, further supporting the notion that the weak unspecific signals in IF resemble staining artifacts.

      The paragraph reads now as follows:

      Line 127-132: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockout (Fig. 1 G), thereby demonstrating i) specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-.

      (2) Please provide more interpretation of the gene dosage effect of Cylicin 2. It is not common to see a gene dosage effect in the sperm phenotype as transcripts and proteins can be shared between haploids due to syncytium formation during spermatogenesis.

      We agree and we apologize for the misinterpretation. In Cylc2+/- mice expression of Cylc2 was reduced by half but there was no altered phenotype observed. The sentence now reads as follows:

      Line 112: In Cylc2+/- animals expression of Cylc2 was reduced by 50 %.

      (3) Line 194-196 - the authors say that the sperm are smaller, with shorter hooks and increased circularity of the nuclei, and reduced elongation. Are these statistically significant? There seems to be a high variation in the graph in S2D and the statistical analysis is not given.

      We agree, performed statistical analyses, and highlighted significantly altered values for sperm head elongation and circularity in Figure 2 – Supplement 3.

      (4) Line 153-164 It is interesting that the absence of Cylc2 affected many parts of sperm structure. I think some proportion of sperm always has morphological defects of diverse kinds, so it is hard to confirm the finding with only a single sperm image. I think that it will be important to do some statistical analysis or at the minimum show more TEM images.

      We agree that the observed morphological defects require a detailed statistical evaluation. TEM analysis was performed to confirm the results from optical microscopy and representative images with high magnification are shown for a detailed visualization of the defects. For additional quantification, we included statistics for IF stainings against calyx proteins CCIN and CapZa (Fig. 2 I-J). For TEM, we added additional images to the supplement (Figure 3 – Supplement 1).

      (5) Line 236-242 - I believe that the phenotype described applies to the sperm from Cylc2-/- and Cylc1/y Cylc2-/- animals; however, I think that the Cylc1-/y Cylc2+/- has a more subtle, intermediate phenotype between the WT and the genotypes missing both Cylc-/- alleles.

      We agree and we added a quantification of manchette length at step 10-13 to visualize the differences between the genotypes. The section reads now as follows:

      Line 268-272: Manchette length was measured starting from step 10 until step 13 spermatids and the mean was obtained, showing that the average manchette length was 76-80 nm in wildtype, Cylc1-/Y and Cylc2+/- while for Cylc2-/- and Cylc1-/Y Cylc2-/- spermatids mean manchette length reached 100 nm (Fig. 5 B). Cylc1-/Y Cylc2+/- spermatids displayed an intermediate phenotype with a mean manchette length of 86 nm.

      (6) Since CYLC1 staining is absent in Fig 5B, does that mean that the mutation resulted in protein degradation/instability? Is RNA present? Additional biochemical studies of Cylicins demonstrating the deleterious nature of the mutations would strengthen the molecular pathogenesis of the human mutations.

      Thank you for raising these important questions. The CYLC1 variant c.1720G>C is predicted to cause the amino acid substitution p.(Glu574Gln). It is, thus, conceivable that the RNA is present but either the protein is degraded or misfolded and, therefore, not detectable by IF. Unfortunately, for personal reasons of the patient, it is currently not possible to receive additional semen samples, preventing additional analyses of the semen, e.g. analysis of Cylicin transcript level.

      (7) Strongly suggest shortening the evolutionary analysis - all the corresponding materials are in supplemental while texts are extensive- or even consider entirely omitting. It does not add a lot to the current study.

      We agree that the evolutionary analysis was very detailed. However, we think that the results are important to understand the role of Cylicins for male reproduction in general. The results obtained from the mouse model might be transferable to other species, including humans. Further, the results present a possible explanation for the subfertility of Cylc1-deficient mice, in contrast to infertility of Cylc2-deficient males. We shortened the section, the paragraph reads as follows:

      Line 287-302: To address why Cylc2 deficiency causes more severe phenotypic alterations than Cylc1 deficiency in mice, we performed evolutionary analysis of both genes. Analysis of the selective constraints on Cylc1 and Cylc2 across rodents and primates revealed that both genes’ coding sequences are conserved in general, although conservation is weaker in Cylc1, trending towards a more relaxed constraint (Fig. 6). A model allowing for separate calculation of the evolutionary rate for primates and rodents did not detect a significant difference between the clades, neither for Cylc1 nor for Cylc2, indicating that the sequences are equally conserved in both clades.

      To analyze the selective pressure across the coding sequence in more detail, we calculated the evolutionary rates for each codon site across the whole tree. According to the analysis, for Cylc1, 34% of codon sites were conserved, 51% under relaxed selective constraint, and 15% positively selected. For Cylc2, 47% of codon sites were conserved, 44% under neutral/relaxed constraint, and 9% positively selected. Interestingly, codon sites encoding lysine residues, which are proposed to be of functional importance for Cylicins, are mostly conserved. For Cylc1, 17% of lysine residues are significantly conserved and 35% of significantly conserved codons encode for lysine. For Cylc2, this pattern is even more pronounced with 27.9% of lysine codons being significantly conserved and 24.3% of all conserved sites encoding for lysine (Fig. 6).

      Minor comments:

      (1) Line 114, 115, 118 → Figure 1D is already well-described in the previous paragraph and thus redundant. Ref Fig 1D, E; but only figure E shows IF. Maybe supposed to be E and F or just 1E?

      We apologize for the mix-up with the subfigures. The mentioned paragraph refers to Fig. 1 E-F, which was corrected accordingly.

      Line 117-123: Immunofluorescence staining of wildtype testicular tissue showed presence of both, CYLC1 and CYLC2 from the round spermatid stage onward (Fig. 1 E). The signal was first detectable in the subacrosomal region as a cap-like structure, lining the developing acrosome (Fig. 1 E-F, Figure 1 – supplement 3). As the spermatids elongate, CYLC1 and CYLC2 move across the PT towards the caudal part of the cell (Figure 1 – supplement 4). At later steps of spermiogenesis, the localization in the subacrosomal part of the PT faded, while it intensified in the postacrosomal calyx region (Fig. 1 E-F).

      (2) Figure 1F - Arguably, IF images show expression of both CYLC1 and CYLC2 to reach/include the acrosome/hook portion of the sperm head, but the diagram does not reflect that. Why is that?

      We agree and apologize for the inconsistency. The illustration was adjusted according to the experimental data showing localization of Cylicins in the whole ventral part of the sperm.

      (3) Line 124 - PAS staining mentioned on line 124, is not explained (Periodic acid Schiff staining) until line 605

      We agree and introduced the abbreviation accordingly. The PAS staining was moved to Fig. 4. The paragraph reads now as follows:

      Line 220-222: To study the origin of observed structural sperm defects, spermiogenesis of Cylicin deficient males was analyzed in detail. PNA lectin staining and Periodic Acid Schiff (PAS) staining of testicular tissue sections were performed to investigate acrosome biogenesis.

      (4) Some figures are hard to read due to being very small (S1B, 3F).

      We agree and we increased the figure size. For former Figure 3F (now figure 4A), insets with higher magnification of representative sperm were added. Insets are additionally shown in Figure 4 – Supplement 1 in higher resolution.

      (5) Line 139 Please specify whether the sperm was capacitated or not.

      Analysis of the flagellar beat was performed with non-capacitated sperm. We clarified this in the main text:

      Line 203: The SpermQ software was used to analyze the flagellar beat of non-capacitated Cylc2-/- sperm in detail 22.

      As described in the Material and Methods section, sperm were only activated in TYH medium, prior to analysis:

      Line 732-733: Sperm samples were diluted in TYH buffer shortly before insertion of the suspension into the observation chamber.

      (6) Line 142-145; The sentence is interrupted strangely, perhaps the authors meant to write: "Interestingly, we observed that the flagellar beat of Cylc2-/- sperm cells was similar to wildtype cells, however, with interruptions during which midpiece and initial principal piece appeared stiff whereas high-frequency beating occurs at the flagellar tip"

      We corrected the sentence accordingly.

      Line 206-208: Interestingly, we observed that the flagellar beat of Cylc2-/- sperm cells was similar to wildtype cells, however, with interruptions during which midpiece and initial principal piece appeared stiff whereas high frequency beating occurs at the flagellar tip (Fig. 3 C, Video 1, Video 2).

      (7) Line 142 -Wrong Figure number. Figure S4A is a phylogenic analysis.

      We regret the mix-up and corrected the Figure reference accordingly.

      Line 204-205: Cylc2-/- sperm showed stiffness in the neck and a reduced amplitude of the initial flagellar beat, as well as reduced average curvature of the flagellum during a single beat (Figure 3 – supplement 2).

      (8) L146-147 Better placed in Discussion.

      We agree, and we omitted this sentence from the results part.

      (9) Line 154-156 - The white arrowheads are present in both WT and KO sperm, thus it appears they denote the basal plate, not necessarily the dislocation/parallel position as the current text seems to suggest. Furthermore, the position of the WT and KO sperm is somewhat different with the tail coiling differently, so it is hard to see whether the two are comparable.

      We agree and we removed the white arrowhead in the WT sperm picture, and it now depicts only the dislocation of the basal plate in the Cylc2-/- sperm. Due to the morphological anomalies of Cylc2-/- sperm cells, it’s difficult to determine the exact angle of the depicted cell. However, we added more TEM pictures of the sperm cells (3 for WT and 6 for Cylc2-/-) in Figure 3 – Supplement 1.

      (10) Line 164 Please describe in detail what mitochondrial damage the readers expect to see from the TEM image.

      We evaluated the observed mitochondrial damage in more detail. Unfortunately, the defects described initially seem to be an artifact of apoptotic sperm cells and could not be identified in vital sperm cells in either of the knockout mouse models. We apologize for this misinterpretation, and we deleted this section in the manuscript.

      (12) Figure S2A - no WT comparison, difficult to compare without it (mitochondria in Cylc2-/-)

      See (10). We evaluated the observed mitochondrial damage in more detail and in comparison to WT. Unfortunately, the defects described initially seem to be an artifact of apoptotic sperm cells and could not be identified in vital sperm cells in either of the knockout mouse models. We apologize for this misinterpretation and we deleted this section in the manuscript.

      (13) Line 172-173 - Fig 3C denotes quantification of abnormal acrosome only, however, the text mentions sperm coiled tail being quantified within this graph - which is it? Is it both of them? Or only one of them?

      Figure 3 C (now Figure 2G) showed the percentage of abnormal sperm in general comprising acrosomal as well as flagellar defects. We modified the figure and evaluated acrosomal defects and tail defects separately. The results section was changed accordingly and reads now as follows:

      Line 152-159: Loss of Cylc1 alone caused malformations of the acrosome in around 38% of mature sperm, while their flagellum appeared unaltered and properly connected to the head. Cylc2+/- males showed normal sperm tail morphology with around 30% of acrosome malformations. Cylc2-/- mature sperm cells displayed morphological alterations of head and mid-piece (Fig. 2 F-G). 76% of Cylc2-/- sperm cells showed acrosome malformations, bending of the neck region, and/or coiling of the flagellum, occasionally resulting in its wrapping around the sperm head in 80% of sperm (Fig. 2 F). While 70% of Cylc1-/Y Cylc2+/- sperm showed these morphological alterations, around 92% of Cylc1-/Y Cylc2-/- sperm presented with coiled tail and abnormal acrosome (Fig. 2 F-G).

      (14) Fig 3D - CCIN in the text, cylicin in the figure - this should be consistent. Furthermore, since only the head is being shown, is CCIN ever detected in the WT sperm tail?

      We apologize for the inconsistency, and we added the abbreviation “CCIN” to the figure. CCIN is solely detectable in the sperm head of wildtype sperm as published previously. Irregular staining patterns showing signals in the tail region are only observed upon Cylicin deficiency.

      (15) Line 199-200 - To say that head of Cylc2-deficient sperm appears less concave seems redundant, likely the observed increased circularity is contributed to by sperm head being less concave in this region; unless there is an extra point that the authors are trying to make and if so, this needs to be elaborated on

      We agree and we deleted the sentence from the manuscript.

      (16) Figure legend of Fig S3 is wrong. Only S3A and S3B are present, and in the figure legend S3C corresponds to figure S3B.

      We agree and corrected the Figure legends accordingly. Due to the re-structuring of the manuscript, Figures and Supplementary figures were re-ordered as well.

      (17) Figure 4B - figure legend and/or text should specify that lectin is green and HOOK1 is in red

      We specified the figure legend as well as the main text accordingly: Line: 279-281: Co-staining of the spermatids with antibodies against PNA lectin (green) and HOOK1 (red) revealed that abnormal manchette elongation and acrosome anomalies simultaneously occurred in elongating spermatids of Cylc2-/- male mice (Fig. 5 C).

      Line: 560-562: Co-staining of the manchette with HOOK1 (red) and acrosome with PNA-lectin (green) is shown in round, elongating and elongated spermatids of WT (upper panel) and Cylc2-/- mice (lower panel).

      (18) Line 261-263 - It is difficult to see what is going on with microtubules in these images, as the resolution is low

      We enlarged the pictures and improved their quality. Microtubules are now also labeled with the letter ‘m’.

      (19) Line 265-266 - It seems that there is a persistence of manchette, rather than elongation. From these images, I cannot see gaps, and I am not sure where to look for them. This needs to be labelled further and higher-resolution images could be included for clarity.

      We agree, although we observed both excessive elongation and persistence of the manchette. The average length of the manchette is now shown in figure 5B.

      The paragraph now reads as follows:

      Line 235-239: Microtubules appeared longer on one side of the nucleus than on the other, displacing the acrosome to the side and creating a gap in the PT (Fig. 4 C). Whereas elongated spermatids at step 14-15 in wildtype sperm already disassembled their manchette and the PT appeared as a unique structure that compactly surrounds the nucleus, in Cylc2-/- spermatids, remaining microtubules failed to disassemble.

      The gaps in the perinuclear theca are better visible in TEM micrographs and the description is now in the paragraph describing TEM.

      (20) Line 269 Please include the information of "White arrowhead".

      We added the information accordingly.

      Line 240-242: In addition, at step 16, the calyx was absent, and an excess of cytoplasm surrounded the nucleus and flagellum (Fig. 4 C, white arrowhead).

      (21) Line 276-280 This should be placed in the Discussion.

      We agree, and we deleted this concluding remark from the results section.

      (22) Is Cylc1 and/or Cylc2 conserved/expressed amongst species other than rodents and primates?

      Yes, Cylc1 and Cylc2 homologs were identified in C. elegans for example. We added a schematic to the introduction showing the protein structure of human, mouse and C. elegans CYLC1 and CYLC2 (Figure 1 – supplement 1).

      The section reads now as follows:

      Line 73-78: In most species, two Cylicin genes, Cylc1 and Cylc2, have been identified (Figure 1 – supplement 1). They are characterized by repetitive lysine-lysine-aspartic acid (KKD) and lysine-lysine-glutamic acid (KKE) peptide motifs, resulting in an isoelectric point (IEP) > pH 10 [14, 15]. Repeating units of up to 41 amino acids in the central part of the molecules stand out by a predicted tendency to form individual short α-helices [14]. Mammalian Cylicins exhibit similar protein and domain characteristics, but CYLC2 has a much shorter amino-terminal portion than CYLC1 (Figure 1 – supplement 1).

      (23) The whole chapter "Cylc2 coding sequence is slightly more conserved among species than Cylc1" references only supplemental figures/tables. I find this unusual.

      We agree, and in order to show the results of the evolutionary analysis more clearly, we moved the panel to main Figure 6.

      Line 286-302: To address why Cylc2 deficiency causes more severe phenotypic alterations than Cylc1 deficiency in mice, we performed evolutionary analysis of both genes. Analysis of the selective constraints on Cylc1 and Cylc2 across rodents and primates revealed that both genes’ coding sequences are conserved in general, although conservation is weaker in Cylc1, trending towards a more relaxed constraint (Fig. 6 A). A model allowing for separate calculation of the evolutionary rate for primates and rodents did not detect a significant difference between the clades, neither for Cylc1 nor for Cylc2, indicating that the sequences are equally conserved in both clades.

      To analyze the selective pressure across the coding sequence in more detail, we calculated the evolutionary rates for each codon site across the whole tree. According to the analysis, 34% of Cylc1 codon sites were conserved, 51% under relaxed selective constraint, and 15% positively selected. For Cylc2, 47% of codon sites were conserved, 44% under neutral/relaxed constraint, and 9% positively selected. Interestingly, codon sites encoding lysine residues, which are proposed to be of functional importance for Cylicins, are mostly conserved. For Cylc1, 17% of lysine residues are significantly conserved and 35% of significantly conserved codons encode for lysine. For Cylc2, this pattern is even more pronounced with 27.9% of lysine codons being significantly conserved and 24.3% of all conserved sites encoding for lysine (Fig. 6 B).

      (24) Line 332 - CYCL2 should be CYLC2

      We corrected the typo accordingly.

      (25) Line 340 The ratio of head defects is different from Table 1 (98% here and 99 % in the table). Please check this information.

      We apologize for the inconsistency. We checked the raw data and corrected the table accordingly.

      (26) Line 344-345 From figure 5C it is difficult to determine whether the sperm are "headless" or whether the heads are attached to the highly coiled tails next to them

      We agree and we quantified the percentage of sperm showing abnormal flagella and a headless phenotype. Furthermore, we added an arrowhead to figure 6C to highlight headless sperm. The paragraph reads now as follows:

      Line 335-339: Bright field microscopy demonstrated that M2270’s sperm flagella coiled in a similar manner compared to flagella of sperm from Cylicin deficient mice. Quantification revealed 57% of M2270 sperm displaying abnormal flagella compared to 6% in the healthy donor (Fig. 7 D). Interestingly, DAPI staining revealed that 27% of M2270 flagella carry cytoplasmatic bodies without nuclei and could be defined as headless spermatozoa (Fig. 7 C, white arrowhead; Fig. 7 E).

      (27) L367-368 I agree with the authors' logic of this sentence. Although, it is better to show the co-localization of proteins using multi-channel immunocytochemistry. As you mentioned on L369 this will make your finding more obvious. If it is available, please include the colocalization images of the proteins.

      We performed multi-channel staining against Cylicin1 and Calicin, as well as Cylicin2 and Calicin, on mouse epididymal sperm; the results are shown in Figure 2 – supplement 4. Unfortunately, we did not manage to obtain stainings of tissue sections, since antibodies against Cylicins and Calicin require different sample processing.

      The sentence was added in the section describing calyx integrity:

      Line 168-169: In epididymal sperm, CCIN co-localizes with both CYLC1 and CYLC2 in the calyx (Figure 2 – supplement 4).

      (28) Line 376 Please keep the abbreviation. "Calicin" "CCIN".

      We included the abbreviation accordingly.

      Line 377-378: CCIN is shown to be necessary for the IAM-PT-NE complex by establishing bidirectional connections with other PT proteins.

      (29) Line 377-378 "Based on ~". The authors did not prove the interaction between CCIN and Cylicins in this article. The mislocalization of CCIN might be resulted in the loss of Cylicins, without any "interaction". To reach this conclusion, a more direct result should be provided.

      We agree that we overinterpreted this as we and others did not prove the interaction between CCIN and Cylicins so far. We therefore weakened this statement and formulated it as a hypothesis.

      Line 377-381: CCIN is shown to be necessary for the IAM-PT-NE complex by establishing bidirectional connections with other PT proteins. Zhang et al. found CYLC1 to be among proteins enriched in PT fraction 7. Based on their speculation that CCIN is the main organizer of the PT, we hypothesize that both CCIN and Cylicins might interact, either directly or in a complex with other proteins, in order to provide the ‘molecular glue’ necessary for the acrosome anchoring.

      (30) Line 499 Please specify which is the target of the immunostaining on the Figure legend. (Tubulin à acetylated α-tubulin)

      We specified that α-Tubulin was stained. The figure legend now reads as follows: Line 555-557: Immunofluorescence staining of α-Tubulin to visualize manchette structure in squash testis samples of WT, Cylc1-/y, Cylc2+/-, Cylc2-/-, Cylc1-/y Cylc2+/- and Cylc1-/y Cylc2-/- mice.

      (31) Line 502 Please specify which color indicates which target protein (not only cellular structure).

      Line 560-562: Co-staining of the manchette with HOOK1 (red) and acrosome with PNA-lectin (green) is shown in round, elongating and elongated spermatids of WT (upper panel) and Cylc2-/- mice (lower panel).

      (32) Line 509 Please include scale bar information in the figure legend like Figure 4 (The magnifications of Figure 5 B, C, and D seem different).

      We included the scale bar information accordingly (now Figure 6).

      Line 575-588: Figure 6: Cylicins are required for human male fertility

      (A) Pedigree of patient M2270. His father (M2270_F) is carrier of the heterozygous CYLC2 variant c.551G>A and his mother (M2270_M) carries the X-linked CYLC1 variant c.1720G>C in a heterozygous state. Asterisks (*) indicate the location of the variants in CYLC1 and CYLC2 within the electropherograms.

      (B) Immunofluorescence staining of CYLC1 in spermatozoa from healthy donor and patient M2270. In donor’s sperm cells CYLC1 localizes in the calyx, while patient’s sperm cells are completely missing the signal. Scale bar: 5 µm.

      (C) Bright field images of the spermatozoa from healthy donor and M2270 show visible head and tail anomalies, coiling of the flagellum as well as headless spermatozoa that carry cytoplasmatic residues without nuclei. Heads were counterstained with DAPI. Scale bar: 5 µm.

      (D-E) Quantification of flagellum integrity (D) and headless sperm (E) in the semen of patient M2270 and a healthy donor.

      (F-G) Immunofluorescence staining of CCIN (F) and PLCz (G) in sperm cells of patient M2270 and a healthy donor. Nuclei were counterstained with DAPI. Scale bar: 3 µm.

      (33) S2A is not clear. Please describe specifically what the left panel and right panel are about to show with a clear indication of what is PM, mitochondria, etc. On the right, in only one cross-section that shows both mitochondria and the 9+2 axoneme, they look both unaltered whereas on the left, there are unpacked, not aligned mitochondria but the tail boundary is not clear to grasp at first sight.

      We apologize for the bad quality of the TEM pictures showing the axonemes and the missing labeling. We recorded and included new images showing an intact 9+2 microtubular structure in Cylc2-/-. Furthermore, we added an image for the wildtype control.

      (34) S2D: colors of the last three plots of each graph are too close to tell apart

      We agree and changed the color scheme for better visualization.

      Reviewer #2 (Recommendations For The Authors):

      However, I find the manuscript a bit messy, and I will propose to reorganize the figures: following figure 1, showing the reproductive phenotype, I would continue with a figure showing the morphology of sperm in optical microscopy and showing the morphological defect of the nucleus (Fig 3B and 3E), followed with one figure focusing on the flagellum, with images obtained with optical and electronic microscopies, allowing to present the abnormalities of the flagellum and finally the impact on sperm motility and flagellum beating (mix of figure 2FG/3A); next, one figure focusing on acrosome. After that, I would present all data concerning spermiogenesis, starting with figure 2C.

      We thank the reviewer for the valuable suggestion, which helps a lot to improve the structure and comprehensibility of the manuscript. We re-organized the figures and the results section accordingly.

      Major remarks

      1) Line 111. The specificity of raised Ab is not clear. Please specify if antibodies are specific: what immune-decorates anti-CYLC1: only CYLC1 or CYLC1 and CYLC2. Same question for anti-CYLC2

      Both antibodies were raised against specific peptides of the CYLC1 or CYLC2 protein, respectively. The antigen peptides used for immunization are depicted in the Material and Methods section (AESRKSKNDERRKTLKIKFRGK and KDAKKEGKKKGKRESRKKR peptides for CYLC1; KSVGTHKSLASEKTKKEVK and ESGGEKAGSKKEAKDDKKDA for CYLC2). The peptides used for immunization are specific, as they do not resemble the highly conserved and repetitive KKD/KKE motifs present in both Cylc1 and Cylc2.

      The specificity of the raised antibodies was validated by IF staining of wildtype and Cylicin-deficient testis sections. The results clearly show that the CYLC1 signal is absent in Cylc1-deficient spermatids and the CYLC2 signal is absent in Cylc2-deficient spermatids.

      Specificity of antibodies was additionally proven by immunohistochemical stainings, showing a specific staining only in testis sections but not in any other organ tested.

      Line 115-117: Specificity of antibodies was proven by immunohistochemical stainings, showing a specific staining only in testis sections but not in any other organ tested (Figure 1 - supplement 2)

      To further verify the specificity of generated antibodies and the generation of functional knockout alleles, we additionally performed Western Blot analysis on cytoskeletal protein fractions, confirming the results of the IF staining.

      The paragraph reads now as follows:

      Line 127-134: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockout (Fig. 1 G), thereby demonstrating i) specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-. Next, mass spectrometry analysis of the cytoskeletal protein fraction of mature spermatozoa was performed, detecting both proteins in WT but not in the respective knockout samples (Figure 1 – supplement 5; Figure 1 – supplement 6).

      2) Line 115 and figure 1D. From the images presented in figure 1D, it is not clear where CYLC1 and CYLC2 are localized in the round and in elongated spermatids. Please make double staining using a second Ab to identify the acrosome such as DPY19L2 (best option) or SP56 and the manchette such as acetylated alpha-tubulin.

      We agree, and we added a double staining of CYLC1/CYLC2 and SP56 to the supplement (Figure 1 - supplement 3), showing co-localization of the developing acrosome and Cylicins. Manchette staining was not performed because the available antibodies were raised in the same species as those against Cylicins (rabbit).

      Line 117-120: Immunofluorescence staining of wildtype testicular tissue showed presence of both CYLC1 and CYLC2 from the round spermatid stage onward (Fig. 1 E, Figure 1 – supplement 3). The signal was first detectable in the subacrosomal region as a cap-like structure, lining the developing acrosome (Fig. 1 E-F, Figure 1 – supplement 3).

      3) Line 118 and figure 1. The drawing showing the localization of Cylicin in mature sperm does not fit with the experimental data. Cylicins are located on the whole ventral face of the sperm.

      We agree and apologize for the inconsistency. The illustration was adjusted according to the experimental data showing localization of Cylicins in the whole ventral part of the sperm.

      4) Figure 1: Change "expression of Cylicin" to "localization of cylicin" (green)

      We changed the legend accordingly.

      5) Line 124 and figure 2C. In the figure provided, the PAS staining seems defective. The acrosomes do not seem stained (in pink as expected for a PAS staining). It may be due to the low quality of the pdf file, nevertheless, it is important to provide in supplementary data, an enlargement of the spermatid region showing the staining of the acrosome.

      We apologize for the bad quality of the PDF file and the low magnification. We restructured the subfigure showing PAS stained spermatids at different steps of spermiogenesis at higher magnification. According to the initial reviewer’s suggestion, the PAS staining was moved to figure 4. The PAS staining in figure 2 was replaced by HE-stained overview testis sections in Figure 3 – supplement 1 showing intact spermatogenesis in all genotypes.

      6) Line 130. Please indicate a reference for the lower limit of 58%. If this lower limit corresponds to human sperm, it should be omitted.

      Indeed, the given reference limit of 58% is only valid for human sperm samples. Therefore, we omitted the reference limit. The paragraph reads now as follows: Line 144-146: Eosin-Nigrosin staining revealed that the viability of epididymal sperm from all genotypes was not severely affected (Fig. 2 D, Figure 2 – supplement 2).

      7) line 152 Sperm morphology. Before showing the ultrastructure of the sperm, it would be important to show sperm morphology observed by optical microscopy. Therefore, I recommend including figure S2 as a principal figure, with a mix of Figures 3B and 3E.

      We thank the reviewer for the suggestion. The results section was re-structured accordingly, first showing results of optical microscopy (Fig. 2), followed by an in-depth ultrastructural investigation of morphological defects and their effects on sperm motility. Brightfield images of epididymal sperm were moved from former Figure S2 to main Figure 2.

      8) Line 164. figure S2A, showing that the 9+2 pattern is normal in KO sperm, is not convincing. Enlargement is required to conclude that the axoneme structure is normal; from the pictures, it rather seems that some doublets are missing.

      We apologize for the bad quality of the TEM pictures showing the axonemes. We recorded and included new images showing an intact 9+2 microtubular structure.

      9) Line 196. I would suggest removing the term "mild globozoospermia". Globozoospermia is rather complete (100% of round sperm heads) or incomplete (<90 % of round sperm heads). The anomalies observed on sperm heads, sperm motility, and the decrease in sperm concentration are rather suggestive of an OAT.

      We agree and we omitted the term “mild globozoospermia”. Instead, we added a concluding remark to the section, summarizing the described defects as OAT syndrome. The section reads now as follows:

      Line 215-217: Taken together, observed anomalies of sperm heads, impaired sperm motility, and the decrease in epididymal sperm concentration show that Cylc deficiency results in a severe OAT phenotype (Oligo-Astheno-Teratozoospermia-syndrome) described in human.

      10) Line 248. It is not clear from the data of figure 4B that "the developing acrosome lost its compact adherence to the nuclear envelope". From this figure, only defective morphologies of the acrosome are observed

      We agree and we omitted the sentence. Furthermore, it does not add additional information to the manuscript, since defects in the attachment of the acrosome to the nuclear envelope have been described in detail in Figure 4C.

      11) line 260-264. Manchette defects appear at stages 9-10. At this stage, the HTCA is already attached to the nucleus of the spermatid. see for instance figure 2 from Shang Y, Zhu F, Wang L, Ouyang YC, Dong MZ, Liu C, Zhao H, Cui X, Ma D, Zhang Z, Yang X, Guo Y, Liu F, Yuan L, Gao F, Guo X, Sun QY, Cao Y, Li W. Essential role for SUN5 in anchoring sperm head to the tail. Elife. 2017 Sep 25;6:e28199. doi: 10.7554/eLife.28199 . Therefore, the hypothesis that "abnormal attachment of the developing flagellum to the basal plate and consequently flipping of the head and coiling of the tail in mature spermatozoa" is unlikely and I suggest modifying this paragraph. In the HOOK paper, the manchette defects occurred earlier.

      We read the suggested literature and we agree with the reviewer’s comment. The manchette defects that we observe appear at later stages and are probably not responsible for the morphological anomalies of the mature sperm. We also re-analyzed all the manchette staining pictures and did not find any defects at earlier stages, so we decided to delete the sentence from the manuscript.

      12) Line 344. Please indicate a percentage of headless spermatozoa. Many sperm is too vague.

      We agree and we quantified the percentage of sperm showing abnormal flagella and a headless phenotype. The paragraph reads now as follows:

      Line 335-339: Bright field microscopy demonstrated that M2270’s sperm flagella coiled in a similar manner compared to flagella of sperm from Cylicin deficient mice. Quantification revealed 57% of M2270 sperm displaying abnormal flagella compared to 6% in the healthy donor (Fig. 7 D). Interestingly, DAPI staining revealed that 27% of M2270 flagella carry cytoplasmatic bodies without nuclei and could be defined as headless spermatozoa (Fig. 7 C, white arrowhead; Fig. 7 E).

      13) Any data concerning the success of ICSI for this patient?

      Yes, the outcome of the ICSI was added to the main text. Line 309-311: The couple underwent one ICSI procedure which resulted in 17 fertilized oocytes out of 18 retrieved. Three cryo-single embryo transfers were performed in spontaneous cycles, but no pregnancy was achieved.

      14) Finally, it would be interesting to study the localization of PLCzeta in this model, since its localization in the perinuclear theca has been clearly shown (Escoffier et al, 2015 doi:10.1093/molehr/gau098 )

      We thank the reviewer for the valuable suggestion and performed PLCzeta staining on human sperm, clearly showing an irregular PT staining pattern in sperm of patient M2270 compared to healthy control sperm. Of note, staining was not possible in the mouse due to the antibody being reactive only for human samples.

      The section reads as follows:

      Line 343-349: Testis specific phospholipase C zeta 1 (PLCζ1) is localized in the postacrosomal region of PT in mammalian sperm (Yoon and Fissore, 2007) and has a role in generating calcium (Ca²⁺) oscillations that are necessary for oocyte activation (Yoon, 2008). Staining of healthy donor’s spermatozoa showed a previously described localization of PLCζ1 in the calyx, while sperm from M2270 patient presents signal irregularly through the PT surrounding sperm heads (Fig. 7 G). These results suggest that Cylicin deficiency can cause severe disruption of PT in human sperm as well, causing male infertility.

      Reviewer #3 (Recommendations For The Authors):

      1) Why the Cylc1-/y Cylc2+/- males were infertile? It would be helpful to show the homologue of the two proteins;

      To elaborate more on the homology of CYLC1 and CYLC2, we added a more detailed section about the protein and domain structure to the introduction.

      Line 73-78: In most species, two Cylicin genes, Cylc1 and Cylc2, have been identified (Figure 1 – supplement 1). They are characterized by repetitive lysine-lysine-aspartic acid (KKD) and lysine-lysine-glutamic acid (KKE) peptide motifs, resulting in an isoelectric point (IEP) > pH 10 [14, 15]. Repeating units of up to 41 amino acids in the central part of the molecules stand out by a predicted tendency to form individual short α-helices (Hess et al., 1993). Mammalian Cylicins exhibit similar protein and domain characteristics, but CYLC2 has a much shorter amino-terminal portion than CYLC1 (Figure 1 – supplement 1).

      Speculation about the infertility of Cylc1-/y Cylc2+/- males was added to the discussion:

      Line 410-413: Interestingly, Cylc1-/Y Cylc2+/- males displayed an “intermediate” phenotype, showing slightly less damaged sperm than Cylc2-/- and Cylc1-/Y Cylc2-/- animals. This further supports our notion, that loss of the less conserved Cylc1 gene might be at least partially compensated by the remaining Cylc2 allele.

      2) Western blot is important to show the absence of the two proteins in the mouse models;

      To further verify the specificity of generated antibodies and the generation of functional knockout alleles, we additionally performed Western Blot analysis on cytoskeletal protein fractions, confirming the results of the IF staining.

      A paragraph was added to the manuscript and reads as follows:

      Line 127-134: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockout (Fig. 1 G), thereby demonstrating i) specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-. Next, mass spectrometry analysis of the cytoskeletal protein fraction of mature spermatozoa was performed, detecting both proteins in WT but not in the respective knockout samples (Figure 1 – supplement 5; Figure 1 – supplement 6).

      3) On Page 7, line 227 and line 243, was the acetylated α-tubulin or α-tubulin antibody used?

      For all stainings α-tubulin antibody was used. We corrected this accordingly. Line 257-259: We used immunofluorescence staining of α-tubulin on squash testis samples containing spermatids at different stages of spermiogenesis to investigate whether the altered head shape, calyx structure, and tail-head connection anomalies originate from possible defects of the manchette structure.

      4) Fig. 2S: A cartoon showing the elongation and circularity of nuclei for evaluation is helpful; The TEM images from the control and Cylc1 KO mice are needed;

      Cylc1-/Y TEM picture was added in Figure 3A.

      5) The discussion should be rewritten. The current version is to repeat the experiments/findings. The authors should discuss more about the potential mechanisms.

      We discussed the observed defects of Cylc-deficient animals in relation to other published mouse models deficient in calyx components. Furthermore, we speculated about potential interaction partners of Cylicins and the importance of these protein complexes for male fertility. However, at this point, we think that it is too far-fetched to speculate about potential mechanisms without any evidence for Cylc interaction partners or their exact molecular function. This requires further research.

    1. Author Response

      We are grateful to the editors for considering our manuscript and facilitating the peer review process. Importantly, we would like to express our gratitude to reviewers for their constructive comments. Given eLife’s publishing format, we provide an initial author response now, which will be followed by a revised manuscript in the near future. Please find our responses below.

      eLife Assessment

      This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors’ claims is solid, although the inclusion of 1) more diverse candidate computational models, 2) more systematic analysis of the temporal regularity effects on the model fit, and 3) tests on clinical samples would have strengthened the study. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.

      Thank you very much again for considering the manuscript and judging it as a valuable contribution to understanding mechanisms of pain perception. We recognise the above-mentioned points of improvement and elaborate on them in the initial response to the reviewers.

      Reviewer 1

      Reviewer Comment 1.1 — Selection of candidate computational models: While the paper juxtaposes the simple model-free RL model against a Kalman Filter model in the context of pain perception, the rationale behind this choice remains ambiguous. It prompts the question: could other RL-based models, such as model-based RL or hierarchical RL, offer additional insights? A more detailed explanation of their computational model selection would provide greater clarity and depth to the study.

      Thank you for this point. Our models were selected a priori, following the modelling strategy of Jepma et al. (2018); we therefore considered the same set of core models, which allows a clear extension of the analysis to our non-cue paradigm. The key question for us was whether expectations were used to weight the behavioural estimates, so our main interest was to compare expectation-weighted vs non-expectation-weighted models.

      Model-based and hierarchical RL are very broad terms that can be used to refer to many different models, and we are not clear about which specific models the reviewer is referring to. Our Bayesian models are generative models, i.e. they learn the generative statistics of the environment (which is characterised by inherent stochasticity and volatility) and hence operate model-based analyses of the stimulus dynamics. In our case, this happened hierarchically and it was combined with a simple RL rule.

      Reviewer Comment 1.2 — Effects of varying levels of volatility and stochasticity: The study commendably integrates varying levels of volatility and stochasticity into its experimental design. However, the depth of analysis concerning the effects of these variables on model fit appears shallow. A looming concern is whether the superior performance of the expectation-weighted Kalman Filter model might be a natural outcome of the experimental design. While the non-significant difference between eKF and eRL for the high stochasticity condition somewhat alleviates this concern, it raises another query: Would a more granular analysis of volatility and stochasticity effects reveal fine-grained model fit patterns?

      We are sorry that the reviewer finds ”the depth of analysis concerning the effects of these variables on model fit” shallow. We are not sure which analysis the reviewer has in mind when suggesting a ”more granular analysis of volatility and stochasticity effects” to ”reveal fine-grained model fit patterns”, so we find it difficult to improve our manuscript in this regard. We are happy to add analyses to our paper, but we would be grateful for some specific pointers. We have already provided:

      • Analysis of model-naive performance across different levels of stochasticity and volatility (section 2.3, figure 3, supplementary information section 1.1 and tables S1-2)

      • Model fitting for each stochasticity/volatility condition (section 2.4.1, figure 4, supplementary table S5)

      • Group-level and individual-level differences of each model parameter across stochasticity/volatility conditions (supplementary information section 7, figures S4-S5).

      • Effect of confidence on scaling factor for each stochasticity/volatility condition (figure 5)

      Reviewer Comment 1.3 — Rating instruction: According to Fig. 1A, participants were prompted to rate their responses to the question, ”How much pain DID you just feel?” and to specify their confidence level regarding their pain. It is difficult for me to understand the meaning of confidence in this context, given that they were asked to report their subjective feelings. It might have been better to query participants about perceived stimulus intensity levels. This perspective is seemingly echoed in lines 100-101, ”the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli.”

      Thank you for raising this question, which allows us to clarify our paradigm. On half of the trials, participants were asked to report the perceived intensity of the previous stimulus; on the remaining trials, participants were requested to predict the intensity of the next stimulus. Therefore, we did query ”participants about perceived stimulus intensity levels”, as described at lines 49-55, 296-303, and depicted in figure 1.

      The confidence refers to the level of confidence that participants have regarding their rating - how sure they are. This is done in addition to their perceived stimulus intensity and it has been used in a large body of previous studies in any sensory modality.

      Reviewer Comment 1.4 — Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain. Notably, their initial preprint seemed to encompass data from a clinical sample (https://www.medrxiv.org/content/10.1101/2023.03.23.23287656v1), which, for reasons unexplained, has been omitted in the current version. Clarification on this discrepancy would be instrumental in discerning the true relevance of the study’s findings to clinical pain scenarios.

      The preprint that the Reviewer is referring to was an older version of the manuscript in which we combined two different experiments, which were originally conceived as separate studies: the one that we submitted to eLife (done in the lab, with noxious stimuli in healthy participants) and an online study with a different statistical learning paradigm (without noxious stimuli, in chronic back pain participants). Unfortunately, the paradigms were different and not directly comparable. Indeed, following submission to a different journal, the manuscript was criticised for this reason. We therefore split the paper in two, and submitted the first study to eLife. We are now planning to perform the same lab-based experiment with noxious stimuli on chronic back pain participants. Progress on this front has been slowed by the fact that I (Flavia Mancini) am on maternity leave, but it remains a top priority once I am back at work.

      Reviewer Comment 1.5 — Paper organization: The paper’s organization appears a little bit weird, possibly due to the removal of significant content from their initial preprint. Sections 2.1-2.2 and 2.4 seem more suitable for the Methods section, while 2.3 and 2.4.1 are the only parts that present results. In addition, enhancing clarity through graphical diagrams, especially for the experimental design and computational models, would be quite beneficial. A reference point could be Fig. 1 and Fig. 5 from Jepma et al. (2018), which similarly explored RL and KF models.

      Thank you for these suggestions. We will consider restructuring the paper in the revised version.

      Reviewer 2

      Reviewer Comment 2.1 — This is a highly interesting and novel finding with potential implications for the understanding and treatment of chronic pain where pain regulation is deficient. The paradigm is clear, the analysis is state-of-the-art, the results are convincing, and the interpretation is adequate.

      Thank you very much for these positive comments.

      Reviewer 3

      We are very grateful for the reviewer’s insightful comments and for the useful guidance regarding our methodology. We also thank the reviewer for highlighting the strengths of our manuscript. Below we respond to the individual weaknesses mentioned in the reviewer’s report.

      Reviewer Comment 3.1 — In Figure 1, panel C, the authors illustrate the stimulation intensity, perceived intensity, and prediction intensity on the same scale, facilitating a more direct comparison. It appears that the stimulation intensity has been mathematically transformed to fit a scale from 0 to 100, aligning it with the intensity ratings corresponding to either past or future stimuli. Given that the pain threshold is specifically marked at 50 on this scale, one could logically infer that all ratings falling below this value should be deemed non-painful. However, I find myself uncertain about this interpretation, especially in relation to the term ”arbitrary units” used in the figure. I would greatly appreciate clarification on how to accurately interpret these units, as well as an explanation of the relationship between these values and the definition of pain threshold in this experiment.

      Indeed, as detailed in Methods section 4.3, the stimulation intensity was transformed from the original 1-13 scale to a 0-100 scale to match the scales in the participant response screens. Following the method used to establish the pain threshold, we set the stimulus intensity of 7 as the threshold on the original 1-13 scale. However, during the rating part of the experiment, several participants never or very rarely selected a value above 50 (their individually defined pain threshold), despite having previously indicated, during the pain threshold procedure, the point at which a stimulus becomes painful. As a result, for these participants neither the re-scaled intensity values nor the perception ratings, both on the same 0-100 scale of arbitrary units, go above the pain threshold. Please see all participant ratings and inputs in the Figure below. We see that it would be more illustrative to re-plot Figure 1 with a different exemplary participant, whose ratings go above the pain threshold, perhaps with the input intensity on the 1-13 scale on an additional right-hand y-axis. We will add this in the revised version and highlight the point above.
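
      The rescaling can be illustrated with a minimal sketch, assuming a simple linear map from the 1-13 scale onto 0-100 (the exact transformation is the one specified in Methods section 4.3; the function name here is hypothetical). Under this assumption, the threshold intensity of 7 lands exactly at 50:

```python
def rescale_intensity(level, lo=1, hi=13):
    """Linearly map a stimulus level on the lo..hi scale onto 0..100.

    Assumption for illustration only: the rescaling is linear.
    """
    if not lo <= level <= hi:
        raise ValueError(f"level must lie between {lo} and {hi}")
    return 100.0 * (level - lo) / (hi - lo)


# The midpoint of the 1-13 scale (7, the pain threshold) maps to 50.
threshold_rescaled = rescale_intensity(7)
```

      Values below 50 on the rescaled axis therefore correspond to intensities below the threshold level of 7.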

      Importantly, while values below 50 are deemed non-painful by participants, the thermal stimulation still activates C-fibres involved in nociception, and we would argue that the modelling framework and analysis still applies in this case.

      Reviewer Comment 3.2 — The method of generating fluctuations in stimulation temperatures, along with the handling of perceptual uncertainty in modelling, requires further elucidation. The current models appear to presume that participants perceive each stimulus accurately, introducing noise only at the response stage. This assumption may fail to capture the inherent uncertainty in the perception of each stimulus intensity, especially when differences in consecutive temperatures are as minimal as 1°C.

      We agree with the reviewer that there are multiple sources of uncertainty involved in rating the intensity of thermal stimuli, including perceptual uncertainty. To account for inaccurate perception, one would have to consider the different sources that contribute to it, of which there may be many. In our approach, we consider one, which is captured in the expectation-weighted models and most clearly exemplified in the expectation-weighted Kalman Filter model (eKF). The model treats participants’ perception of the input as an imperfect indicator of the true level of pain. In this case, perception is corrupted by the expectation participants hold about the upcoming stimuli. The extent of this effect is partly governed by a subjective level of noise ϵ, which may also subsume other sources of uncertainty beyond the expectation effect. Moreover, the response noise ξ could also subsume any other unexplained sources of noise.

      Author response image 1.

      Stimulus intensity transformation

      Reviewer Comment 3.3 — A key conclusion drawn is that eKF is a better model than eRL. However, a closer examination of the results reveals that the two models behave very similarly, and it is not clear that they can be readily distinguished based on model recovery and model comparison results.

      While the eKF appears to rank higher than the eRL in terms of LOOIC and sigma effects, we do not wish to make sweeping statements regarding the significance of differences between eRL and eKF, but merely point to the trend in the data. We shall make this clearer in the revised version of the manuscript. However, the most important result is that the models involving expectation weighting arguably capture the data better.

      Reviewer Comment 3.4 — Regarding model recovery, the distinction between the eKF and eRL models seems blurred. When the simulation is based on the eKF, there is no ability to distinguish whether either eKF or eRL is better. When the simulation is based on the eRL, the eRL appears to be the best model, but the difference with eKF is small. This raises a few more questions. What is the range of the parameters used for the simulations?

      We agree that the distinction between eKF and eRL in the model recovery is not clear-cut, which may in turn point to the similarity between the two models. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values.

      Reviewer Comment 3.5 — Is it possible that either eRL or eKF are best when different parameters are simulated? Additionally, increasing the number of simulations to at least 100 could provide more convincing model recovery results.

      It could be a possibility, but it would require further investigation and a comparison of fits for different bins/ranges of parameters to see whether one model has a consistent advantage over the other in each bin. We will consider adding this analysis, and will provide an additional 50 simulations to paint a more convincing picture.

      Reviewer Comment 3.6 — Regarding model comparison, the authors reported that ”the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity, this difference was short of significance against the eRL model.” This interpretation is based on a significance test that hinges on the ratio between the ELPD and the surrounding standard error (SE). Unfortunately, there’s no agreed-upon threshold of SEs that determines significance, but a general guideline is to consider ”several SEs,” with a higher number typically viewed as more robust. However, the text lacks clarity regarding the specific number of SEs applied in this test. At a cursory glance, it appears that the authors may have employed 2 SEs in their interpretation, while only depicting 1 SE in Figure 4.

      Indeed, we considered a 2 sigma effect as the threshold. However, we recognise that there is no agreed-upon threshold, and in the revision we shall make this, and our interpretation of the trend in the data, clearer.
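
      The criterion can be sketched as follows, under the reading that the sigma effect is the absolute ELPD difference between two models divided by the standard error of that difference. The function names and the numerical values are hypothetical and serve only as illustration; they are not results from the study.

```python
def sigma_effect(elpd_diff, se_diff):
    """Ratio of the absolute ELPD difference to its standard error."""
    if se_diff <= 0:
        raise ValueError("standard error must be positive")
    return abs(elpd_diff) / se_diff


def exceeds_threshold(elpd_diff, se_diff, n_se=2.0):
    """True if the ELPD difference is at least n_se standard errors from zero.

    n_se=2.0 is a convention, not an agreed-upon significance threshold.
    """
    return sigma_effect(elpd_diff, se_diff) >= n_se


# Hypothetical values: a difference of -10 with SE 4 gives a 2.5 sigma effect.
example_ratio = sigma_effect(-10.0, 4.0)
```

      Reporting the ratio itself, rather than only whether it crosses the threshold, lets readers apply a stricter criterion if they prefer.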

      Reviewer Comment 3.7 — With respect to parameter recovery, a few additional details could be included for completeness. Specifically, while the range of the learning rate is understandably confined between 0 and 1, the range of other simulated parameters, particularly those without clear boundaries, remains ambiguous. Including scatter plots with the simulated parameters on the x-axis and the recovered parameters on the y-axis would effectively convey this missing information. Furthermore, it would be beneficial for the authors to clarify whether the same priors were used for both the modelling results presented in the main paper and the parameter recovery presented in the supplementary material.

      Thank you for this comment and for the suggestions. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values. The priors on the group and individual-level parameters in the recovery analysis were the same as in the fitting procedure. We will include the requested scatter plots in the next iteration of the manuscript.
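
      As a rough sketch of this sampling step, individual parameter values can be drawn from a normal distribution defined by a group mean and variance. The assumptions here are ours for illustration: the normal form, the logistic transform used to keep a bounded parameter such as the learning rate in (0, 1), the function name, and the numbers are all hypothetical and do not reproduce the actual simulation code.

```python
import math
import random


def sample_individual_parameters(group_mean, group_var, n, bounded=False, seed=0):
    """Draw n individual-level values from N(group_mean, group_var).

    If bounded is True, draws are treated as unconstrained values and passed
    through a logistic function so that they fall in (0, 1).
    """
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    sd = math.sqrt(group_var)
    draws = [rng.gauss(group_mean, sd) for _ in range(n)]
    if bounded:
        draws = [1.0 / (1.0 + math.exp(-x)) for x in draws]
    return draws


# Hypothetical group-level estimates for a learning rate on the logit scale.
learning_rates = sample_individual_parameters(0.0, 1.0, n=50, bounded=True)
```

      Plotting the simulated values against the values recovered by refitting gives the scatter plots requested by the reviewer.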

      Reviewer Comment 3.8 — While the reliance on R-hat values for convergence in model fitting is standard, a more comprehensive assessment could include estimates of the effective sample size (bulk ESS and/or tail ESS) and the Estimated Bayesian Fraction of Missing Information (EBFMI), to show efficient sampling across the distribution. Consideration of divergences, if any, would further enhance the reliability of the results.

      Thank you very much for this suggestion. We will aim to include these measures in the revised version.

      Reviewer Comment 3.9 — The authors write: ”Going beyond conditioning paradigms based in cuing of pain outcomes, our findings offer a more accurate description of endogenous pain regulation.” Unfortunately, this statement isn’t substantiated by the results. The authors did not engage in a direct comparison between conditioning and sequence-based paradigms. Moreover, even if such a comparison had been made, it remains unclear what would constitute the gold standard for quantifying ”endogenous pain regulation.”

      This is a valid point. Indeed, we do not compare paradigms in our study, and we will remove this statement in the revised version.

    1. Author Response

      We would first like to thank the reviewers for their time and effort in their critical review of our manuscript, and we appreciate the opportunity to address these comments. We thank the reviewers for appreciating that our experimental design is well crafted and contributes to the broader understanding of dietary exercise recommendations for metabolic health and muscle development. We have revised the figures and text in accordance with the reviewers’ recommendations, and hope that they appreciate the revised version.

      Reviewer #1:

      1) A significant limitation of this study pertains to the absence of a detailed exploration into the mechanistic underpinnings of the interaction between high protein intake and resistance exercise at the molecular level. The authors should provide a comprehensive discussion on potential avenues or prospective research directions to address this gap in understanding.

      We agree and have added some theories in the discussion on page 14.

      2) Figure 4 and Figure 7 can be moved to supplementary and text in the description can be arranged accordingly to make a better flow of the story.

      We agree with this suggestion and have made adjustments.

      3) The authors have used a high protein diet (36% calorie from protein) and a low protein diet (7% calorie from protein) for this study. The authors should explain whether this mouse diet is practically comparable to the human's high protein (2% of BW) and low protein diet (less than 0.8% BW) or not.

      The high protein diet is comparable to a human diet of 180 grams of protein ((0.36x2000 calories)/4 calories per gram=180 g), which is in a range that some people consume, particularly bodybuilders and athletes. The low protein diet is equivalent to 35 grams of protein per day ((0.07x2000 calories)/4 calories/gram=35g), and a diet of just 7% protein is not recommended for humans per the Acceptable Macronutrient Distribution Range (AMDR) of 10-35% dietary protein set by the Institute of Medicine (IOM). We have addressed this on page 14.
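
      The conversion above can be written out as a short sketch. The 2000-calorie daily intake and the 4 calories per gram of protein are the values used in the calculation above; the function and constant names are hypothetical.

```python
KCAL_PER_GRAM_PROTEIN = 4  # energy density of protein used in the calculation above


def protein_grams_per_day(protein_fraction, daily_kcal=2000):
    """Convert the fraction of calories from protein into grams of protein per day."""
    if not 0 <= protein_fraction <= 1:
        raise ValueError("protein_fraction must be between 0 and 1")
    return protein_fraction * daily_kcal / KCAL_PER_GRAM_PROTEIN


high_protein_g = protein_grams_per_day(0.36)  # about 180 g per day
low_protein_g = protein_grams_per_day(0.07)   # about 35 g per day
```

      The same function can be reused with a different daily energy intake to see how the equivalent gram amounts scale.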

      4) The color coding of the error bar and lines does not match with the group description in almost every figure. Maybe the authors could choose more contrasting colors.

      Thanks, we have adjusted the coloring of the error bars and lines in all figures.

      5) In Figure 3C-E it seems like the number of biological samples is not consistent in the LP+WP group. If the authors have excluded any outlier from the analysis, that should be included in the methodology.

      We did list outliers in the methodology in the statistics section (page 19): “Outliers were determined using GraphPad Prism Grubbs’ calculator (https://www.graphpad.com/quickcalcs/grubbs1/).”
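
      For readers unfamiliar with the test, the statistic behind Grubbs' test can be sketched as below. This computes only the test statistic G (the largest absolute deviation from the sample mean, divided by the sample standard deviation); the critical value it is compared against depends on the sample size and significance level and is what the calculator supplies. The function name and data values are hypothetical.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, suspect), where suspect is the value farthest from the mean.

    G = max |x_i - mean| / s, with s the sample standard deviation.
    """
    if len(values) < 3:
        raise ValueError("Grubbs' test needs at least three observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all values are identical")
    suspect = max(values, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / sd, suspect


# Hypothetical measurements with one value far from the rest.
g_value, suspect_value = grubbs_statistic([1.0, 2.0, 3.0, 4.0, 100.0])
```

      A value is flagged as an outlier only if G exceeds the critical value for the chosen significance level.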

      Reviewer #2:

      Very nice work! I do not have a whole lot to say in terms of experiments, analysis, or data to present other than what is in my public review (and you cannot really provide it as it was not in the experimental design). The manuscript is also very well written. My only question is about the following two sentences in the introduction:

      "Both exercise and amino acids activate the mechanistic target of TOR (mTOR) protein kinase, which stimulates the protein synthesis machinery needed to stimulate skeletal muscle hypertrophy (Schiaffino et al., 2021). Therefore, The Academy of Nutrition and Dietetics recommends consuming 1.2-2.0 grams of protein per kg of body weight (BW) per day in physically active individuals (Thomas et al., 2016)." I am not sure how the second sentence follows from the first, so I am not convinced that "therefore" is the right adverb in the right place.

      Thanks for pointing this out. We have added a clarifying transition to the text (page 3).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Major concerns.

      -The experimental details on the electron microscopy data and more specifically on the processing are too minimal. Because of the missing pieces of information, the data cannot be trusted in its current state. The authors should explain how they processed the data: number of particles, software used, 3D reconstruction algorithms, etc. For instance, they do not mention anything about the final resolution and whether they tried to improve it. What is the dimension of the boxes used for 2D classes and 3D reconstruction? Besides, the resulting 3D volumes should be displayed at different orientations or, at least, in a movie so one can see whether the modelled data actually fits into the 3D volume in various orientations. Have the authors tried cryo-EM to improve the resolution of the data? Have they generated 3D classes? Also, they should comment on why the resolution is rather low.

      Thank you for your valuable feedback on our work. We appreciate your suggestions for improvement and agree that we could provide more detailed information on the experimental details of our electron microscopy data. To address your concerns, we have provided additional information on the processing of the data in the revised manuscript.

      Regarding the use of cryo-EM, we attempted to use this technique to determine the structure of autoinhibited kinesin-1. Unfortunately, we encountered challenges in getting the kinesin-1 to behave well on the grids, which prevented us from obtaining meaningful results.

      -The report goes back and forth from focusing on KIF5B then KIF5C and back to KIF5B. It is thus confusing for the reader and the rationale for highlighting a specific isoform is not clear. Hence the authors should perform similar analysis for both isoforms. Specifically, the AlphaFold deep learning modeling should also be performed using KIF5C in parallel with the analysis performed on KIF5B.

      Thank you for your feedback on our manuscript. We apologize for any confusion caused by the shifting focus between KIF5B and KIF5C. KIF5B and KIF5C are both kinesin-1 isoforms, so we expect them to have high structural similarity and to adopt similar structures.

      In our current manuscript, we performed AlphaFold structure prediction on both the KIF5B and KIF5C stalks and found that they adopt the same structure. Furthermore, the XL-MS data suggest that KIF5B and KIF5C exhibit similar patterns. We therefore chose to model KIF5B in this case.

      For the kinesin-1 tetramer, we re-performed XL-MS on KIF5B-KLC1 and KIF5C-KLC1 (Author response image 1 and 2) to confirm our analysis in the manuscript. Both datasets showed that KIF5B-KLC1 and KIF5C-KLC1 have a similar folding pattern. The differences between the two are: (1) the crosslinks within KIF5B are sparse compared to KIF5C; (2) there are fewer crosslinks between KIF5B and KLC1 than between KIF5C and KLC1. These differences will need further investigation. Given that there are more crosslinks in KIF5C-KLC1, we chose to model KIF5C-KLC1 in our manuscript.

      Author response image 1.

      Crosslinked lysine pairs in KIF5B-KLC1 were mapped onto the domain diagram.

      Author response image 2.

      Crosslinked lysine pairs in KIF5C-KLC1 were mapped onto the domain diagram.

      -The proportion of compact versus extended form for KIF5B and KIF5C differs. It seems that KIF5B has a higher proportion of compact conformations both as homodimers and heterotetramers? Can the authors comment on this and suggest any possible molecular argument which would induce this difference? Can the authors comment on this discrepancy? What would induce any extended form given that the wild type constructs should be compact only? Is there any equilibrium in solution between the two conformations?

      Thank you for your comments on our manuscript. We appreciate your observation that the proportion of compact versus extended form for KIF5B and KIF5C appears to differ. We did observe that KIF5B has a higher proportion of compact conformations both as homodimers and heterotetramers. We have updated our main text and commented on this difference. We do not have a definitive explanation for this difference, but one possibility is that the differences in the sequence of the two isoforms may contribute to their differential propensities for compact versus extended conformations. It is possible that there is an equilibrium between the two conformations, but we did not explicitly investigate this in our study.

      • In Figure 1.C, lower panel, the "extended" conformation does not appear as extended as stated in the text, looking at the negative stain image. In particular, the one on the bottom right looks rather compact, instead. The resulting graph shown in Figure 1.E seems a bit off as compared with the images. How were the measurements performed to generate figure 1.E? Were all the particles selected for measurement or were only some of them picked or were the measurements done using class averages? In the same line, the authors should show class averages of the extended conformation as well.

      Thank you for your feedback on our manuscript. We appreciate your comments on the presentation of our data in Figure 1C. We agree that some kinesin molecules may not appear as extended in the negative stain images as we stated in the text. For EM sample preparation, we took the fraction corresponding to the extended conformation, crosslinked it with BS3, and then examined it under EM. The compact-looking kinesin-1 molecules could result from aggregation during the crosslinking process.

      Regarding the measurement, we measured the length of individual molecules that could be clearly identified as KIF5B in the raw micrographs. Molecules that showed any sign of aggregation were not measured. As for class averages of the extended state, given that the extended molecule is about 80 nm in length and very flexible, it would be hard to obtain meaningful averages. We have updated the methods section to include this measurement method.

      -In figure 2B, the EM envelope does not accommodate the CC1 domain which extends way beyond the contour of the 3D volume and thus suggest that the modeling and/or the 3D EM reconstruction is not correct. Also the authors do not comment at all on this even though this is a striking feature. The CC1 might thereby be less disorganized or more flexible than expected by the model.

      Thank you for your feedback on our manuscript, particularly with regard to Figure 2B. We appreciate your observation that the EM envelope does not accommodate the CC1 domain, which extends beyond the contour of the 3D volume. We agree that this is a striking feature that may suggest that the modeling and/or the 3D EM reconstruction is not entirely correct. We have added comments regarding this feature in the main text. However, given the current data, we could not generate a better model to describe the structure of CC1 besides using results from the AlphaFold prediction.

      -The so-called "C-shaped" feature on the class averages (Fig 3D) does not stand out clearly on all of the class averages. It is visible on the right hand panels but not visible on the left hand side. What is the proportion of classes and thus of the dataset which clearly displayed this peculiar C-shaped feature? Can the authors analyze this?

      Thank you for your feedback on our manuscript, particularly with regard to Figure 3D. We acknowledge your observation that the "C-shaped" feature is not clearly visible on all of the class averages. We believe that it could be due to the different orientations of the class averages. We have revised our main text to comment on this.

      -The different mutants were subjected to motility assays. However, mutations/truncations could strongly affect their structural features and conformation. The authors should thus, at least for some of them, check their global ultrastructure using electron microscopy, for instance, and 2D class averaging. In particular, it would be worthwhile testing how different mutations induce any transition from a compact to an extended state. Besides, it is not specified whether the truncated mutants are homo-dimeric or monomeric.

      Thank you for your valuable feedback on our manuscript, particularly with regard to the motility assays conducted on the different mutants. All the KIF5B mutants should be homodimers, like WT KIF5B. We agree that it would be beneficial to check some of the mutants under EM to examine their conformation. However, due to time constraints, we were unable to perform these analyses.

      Minor concerns

      • Does AlphaFold generate several possible models? Can a selection of those be displayed at least in the supplementary material so the reader can understand how any given model is selected? A short introduction on the AlphaFold methodology, on how the different structures obtained compare with one another, and ultimately on how the best structure is selected would be helpful.

      Yes, AlphaFold generates several possible models during the protein structure prediction process. These models are ranked based on their confidence scores, which reflect the degree of certainty with which AlphaFold has predicted each model. In our study, we chose the model with the highest score, while we noticed that the top 5 models from the AlphaFold prediction generally tend to be very similar in the case of the kinesin-1 structure prediction. We have updated the text in the method section to help the reader appreciate our approach.

      -When expressing the hetero-tetramers, do the authors generate homodimers as well? If so, can they estimate the relative proportion of all the possible populations?

      We used the MultiBac expression system to co-express the kinesin heavy chain and light chain in Sf9 cells. We believe that the hetero-tetramers should account for the majority of products, though we cannot rule out the formation of homodimers.

      -The motility assays should be better described.

      We have added more text to describe the assay.

      -The report does not discuss whether any combinations of isoforms (for instance KIF2B-KIF2C) could assemble into a complex and whether it has already been observed in cells?

      We believe that you are asking whether KIF5B and KIF5C form a heterodimer. We are not aware of any previous report of this in the literature and have not tested this possibility.

      -The authors should discuss why they do not obtain the same results as Kaan et al (2011). For instance, would the experimental conditions be responsible for the discrepancies observed?

      In the study done by Kaan et al (2011), their structures showed that kinesin-1 motor domains crystallized with a tail peptide holding the motors in an immotile conformation, which supports the model of kinesin-1 autoinhibition where the C-terminal tail of kinesin-1 drives autoinhibition to block motility. However, there are several limitations regarding this study as we mentioned in our manuscript. First, the authors used truncated kinesin heavy chains that only include the motor domain and the neck coil instead of the full length protein. Second, the crystal structure was obtained by adding the tail peptide in trans. Thus, how kinesin-1 folds into an autoinhibited state remains poorly understood, severely limiting our understanding of kinesin-1 regulation.

      Our model confirms the critical role of the tail domain, as in the study by Kaan et al (2011). We observe that the tail domain lies very close to the motor heads, which is consistent with what was reported by Kaan et al (2011). However, because the tail domain has few lysine residues and is unstructured, we could not resolve its exact conformation.

      We have addressed the question in our discussion section regarding the tail domain and IAK motif.

      -A final schematic model would be beneficial to support the model and could be inserted within the discussion section.

      We have added a final model figure as Figure 7 in the discussion section.

      -The authors should discuss why the shortest mutant is the most active in the motility assay and how this compares with the full length protein in vivo? Can full-length kinesin1 reach similar motility?

      The shortest mutant, KIF5B(1-420), contains only the motor domain and CC1, without any regulatory elements to lock it into the inhibited state. It should therefore reflect the intrinsic biophysical properties of the kinesin-1 motor domain on microtubules. We have revised our main text to include this point. However, kinesins in cells are all full-length proteins and are subject to multiple layers of regulation, so it would be hard to compare full-length kinesins in vivo with the shortest mutant, KIF5B(1-420).

      -Have the authors attempted to obtain the structure of a TRAK-1 kinesin-1 complex, for instance by electron microscopy? Will they consider addressing the structure of such full complexes to see whether the protein-protein interactions they infer are indeed reflected within the complexes?

      Yes, we attempted to examine the TRAK1-KIF5B complex by negative-stain EM. However, owing to the flexibility of the TRAK1-KIF5B complex and the low contrast of TRAK1 under negative stain, we could not obtain meaningful results.

      -Can the authors test kinesin-TRAK1 complexes in motility assays?

      There are already two studies (Canty et al., 2021, Henrichs et al., 2020) that confirmed that TRAK1 can activate the motility of kinesin-1, which we cited in our manuscript. Therefore, we did not test it in our studies.

      Reviewer 2

      -The lack of crosslinks seems to be interpreted as the lack of interactions, but this is not necessarily the case. Also, BS3 crosslinks mainly amino groups that are about 25 Å apart, which gives a readout of proximity rather than interactions. How many times were the crosslinking experiments done? In Figure 6, there are not many crosslinks for TRAK and kinesin-1, so it would be good to know if it has been repeated.

      XL-MS was performed the following number of times for each sample: KIF5B (three times), KIF5C (once), KIF5B-KLC1 (twice), KIF5C-KLC1 (twice), KIF5B(1-562) (once), KIF5B-TRAK1 (once) and KIF5B(IAK/AAA) (once). We have added this information to the XL-MS method section.

      For the kinesin-1 heterotetramers, we repeated XL-MS on KIF5B-KLC1 and KIF5C-KLC1 (Figure 1 and Figure 2) to validate our analysis; the results are consistent with those reported in our manuscript. For the KIF5B-TRAK1 complex, owing to time limitations, we performed XL-MS only once, but we would like to explore this further in the future.

      We have summarized the identified crosslinked pairs for each kinesin-1 sample in supplementary files.

      -Regarding the interaction between TRAP and Kif5b, the authors propose that TRAP activates Kif5b by disrupting the autoinhibited conformation, based on the lack of crosslinks and the position of the crosslinks identified. What does Kif5b+TRAP (after or before crosslinking) look like by negative stain EM? The authors have done this experiment for the other samples, Kif5b and Kif5b KLC, so it should be easy for the authors to do this for Kif5b-TRAP. Also, can AlphaFold multimer predict the Kif5b-TRAP interface?

      Thanks for bringing this up. We tried to obtain EM images of the TRAK1-KIF5B complex. We observed that both KIF5B alone and the TRAK1-KIF5B complex tend to fall apart if they are not crosslinked before being applied to the grids. For the crosslinked samples, we were unable to see TRAK1 clearly on KIF5B because of the flexibility of the TRAK1-KIF5B complex and the low contrast of TRAK1 under negative-stain EM. We would like to explore this further.

      As for the AlphaFold prediction of the KIF5B-TRAK1 complex, we found that AlphaFold did not perform well in predicting TRAK1 on the kinesin-1 stalk. We tried combinations of various TRAK1 and KIF5B fragments, but could not obtain any meaningful results.

      -Figure 4. Very long crosslinks are not explained by the model, and suggest the model could be partially incorrect. Can the authors state the distance between the crosslinked residues in their model in figures? Generally the authors should report all crosslink distance in their figures with molecular models.

      Thanks for bringing this up. For the model building, we used the XL-MS data as guidance to model the autoinhibited kinesin-1 with the input from AlphaFold structure prediction and EM map. We assembled the model by piecing together multiple rigid kinesin-1 fragments generated from AlphaFold structure prediction as described in the method section.

      We realize that some crosslinked residues in our model are separated by distances greater than the maximum allowed for the BS3 crosslinker, especially the crosslinked pairs between the TPR and motor domains. We acknowledge that our current model could be partially incorrect. Because we do not have high-resolution structural data on kinesin-1, we are unsure how to make our model satisfy all of the distance constraints. We have addressed these limitations in our discussion section.
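
      As an illustration of the kind of check the reviewer requests, the sketch below flags crosslinked residue pairs whose separation in a model exceeds a distance cutoff. It is a minimal sketch, not the procedure used in the manuscript: the residue labels and coordinates are hypothetical, and the 25 Å cutoff is an assumption taken from the reviewer's comment about BS3.

```python
import math

# Assumed upper bound (in angstroms) on the distance a BS3 crosslink can span;
# the value follows the reviewer's comment and is not taken from the manuscript.
MAX_DISTANCE = 25.0


def flag_violations(coords, crosslinks, cutoff=MAX_DISTANCE):
    """Return (res_i, res_j, distance) for every crosslink longer than cutoff."""
    violations = []
    for res_i, res_j in crosslinks:
        distance = math.dist(coords[res_i], coords[res_j])
        if distance > cutoff:
            violations.append((res_i, res_j, round(distance, 1)))
    return violations


# Hypothetical residue labels and coordinates, for illustration only.
coords = {
    "motor:res_a": (0.0, 0.0, 0.0),
    "tail:res_b": (12.0, 9.0, 0.0),   # 15 angstroms from motor:res_a
    "TPR:res_c": (30.0, 40.0, 0.0),   # 50 angstroms from motor:res_a
}
crosslinks = [("motor:res_a", "tail:res_b"), ("motor:res_a", "TPR:res_c")]

violations = flag_violations(coords, crosslinks)
for res_i, res_j, distance in violations:
    print(f"{res_i} - {res_j}: {distance} A exceeds the cutoff")
```

      Reporting the measured distance next to each crosslink, as in the printed output, would let readers judge directly which pairs the model fails to explain.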

      -Figure 5: motility assays, the amount of data analyzed seems quite low. There are only 2 repeats done for each condition. The number of microtubules is reported rather than number of measurements done-can the authors report number of events/motors measured. It would be useful to have the concentration of motors used in the figure. Landing rate: are authors not differentiating motile vs non motile tracks also? What do the mutants look like in EM class averages?

      Thanks for bringing this up. We have revised our method section about the single molecule assay to include this information.

      Finally, we agree that it would be beneficial to check the mutants under EM. However, due to time limitations, we were unable to perform this experiment.

      -The figure in 6D needs revising. This does not look like a pulldown experiment, controls are missing and the proteins do not seem to be stoichiometric. In particular, the third lane. There are also no protein markers.

      Thank you for bringing this up. We revised Figure 6 and added the protocol for the pulldown assay in our method section for protein expression and purification.

      Minor points

      -Is the data available in PRIDE, etc...? Could the authors provide a table of xlinks?

      We have included crosslinked pairs detected in our XL-MS as supplementary files for KIF5B, KIF5C, KIF5B-KLC1, KIF5C-KLC1, KIF5B(1-565), KIF5B(IAK/AAA) and KIF5B-TRAK1. We have added a new section called Data Availability in the main manuscript to fully describe this.

      -It would be better to have the mapping of the crosslinks in the same figures as the corresponding crosslink map.

      Due to the layout of the figure, we chose to show the model and the mapped crosslinks in the same figure.

      -No crosslinks were obtained between the IAK motif and the motor domain. This could be due to the lack of neighbouring groups that can crosslink with the K in the motif, rather than the tail not binding/crosslinking to the motor. The text could be edited to explain this

      Thanks for bringing this up. We edited the text to add this point.

      -Figure 5. Typo in mutation

      We revised Figure 5.

      -No hyphen between c and terminus (as that is a noun)

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Rai1 encodes the transcription factor retinoic acid-induced 1 (RAI1), which regulates expression of factors involved in neuronal development and synaptic transmission. Rai1 haploinsufficiency leads to the monogenic disorder Smith-Magenis syndrome (SMS), which is associated with excessive feeding, obesity and intellectual disability. Consistent with findings in human subjects, Rai1+/- mice and mice with conditional deletion of Rai1 in Sim+ neurons, which are abundant in the paraventricular nucleus (PVN), exhibit hyperphagia, obesity and increased adiposity. Furthermore, RAI1-deficient mice exhibit reduced expression of brain-derived neurotrophic factor (BDNF), a satiety factor essential for the central control of energy balance. Notably, overexpression of BDNF in PVN of RAI1-deficient mice mitigated their obesity, implicating this neurotrophin in the metabolic dysfunction these animals exhibit. In this follow up study, Javed et al. interrogated the necessity of RAI1 in BDNF+ neurons promoting metabolic health.

      Consistent with previous reports, the authors observed reduced BDNF expression in the hypothalamus of Rai1+/- mice. Moreover, proteomics analysis indicated impairment in neurotrophin signaling in the mutants. Selective deletion of Rai1 in BDNF+ neurons in the brain during development resulted in increased body weight, fat mass and reduced locomotor activity and energy expenditure without changes in food intake. There was also a robust effect on glycemic control, with mutants exhibiting glucose intolerance. Selective depletion of RAI1 in BDNF+ neurons in PVN in adult mice also resulted in increased body weight, reduced locomotor activity, and glucose intolerance without affecting food intake. Blunting RAI1 activity also leads to increases and decreases in the inhibitory tone and intrinsic excitability, respectively, of BDNF+ neurons in the PVN.

      Strengths:

      Overall, the experiments are well designed and multidisciplinary approaches are employed to demonstrate that RAI1 deficits in BDNF+ neurons diminish hypothalamic BDNF signaling and produce metabolic dysfunction. The most significant advance relative to previous reports is the finding from electrophysiological studies showing that blunting RAI1 activity leads to increases and decreases in the inhibitory tone and intrinsic excitability, respectively, of BDNF+ neurons in the PVN. Furthermore, the study shows that intact RAI1 function is required in BDNF+ neurons for the regulation of glucose homeostasis.

      Weaknesses:

      Some of the data need to be reconciled with previous findings by others. For example, the authors report that more than 50% of BDNF+ neurons in PVN also express pTrkB whereas about 20% of pTrkB+ cells contain BDNF, raising the possibility that autocrine mechanisms might be at play. This is in conflict with a previous study by An et al, (2015) showing that these cell populations are largely non-overlapping in the PVN.

      We fully agree with this assessment. Because immunostaining of membrane proteins in vivo is difficult and the specificity of the pTrkB antibody in different tissues remains unknown, the signals we observed are difficult to interpret. We have excluded these data because the histological analysis of p-TRKB and BDNF autocrine/paracrine signalling is not a focus of the present study. A more advanced genetic method (i.e., the Ntrk2CreER/+; Ai9 mouse line used by An et al., 2015) is more suitable and should be used in future studies to investigate the function of Rai1 in TRKB+ neurons.

      Another issue that deserves more in-depth discussion is that diminished BDNF function appears to play a minor part driving deficits in energy balance regulation. Accordingly, both global central depletion of Rai1 in BDNF+ neurons during development and deletion of Rai1 in BDNF+ neurons in the adult PVN elicited modest effects on body weight (less than 18% increase) and did not affect food intake. This contrasts with mice with selective Bdnf deletion in the adult PVN, which are hyperphagic and dramatically obese (90% heavier than controls). Therefore, the results suggest that deficits in RAI1 in PVN or the whole brain only moderately affect BDNF actions influencing energy homeostasis and that other signaling cascades and neuronal populations play a more prominent role driving the phenotypes observed in Rai1+/- mice, which are hyperphagic and 95% heavier than controls. The results from the proteomic analysis of hypothalamic tissue of Rai1 mutant mice and controls could be useful in generating alternative hypotheses. Depleting RAI1 in BDNF+ neurons had a robust effect compromising glycemic control. However, as the approach does not necessarily impact BDNF exclusively, there should be a larger discussion of alternative mechanisms.

      We thank the reviewer for these insightful comments. We want to highlight that global deletion of Rai1 from BDNF neurons did increase food intake in male mice (Figure 2—figure supplement 4K). We have incorporated the following paragraphs into the discussion section.

      Lines 364-384: “Notably, mice lacking one copy of Rai1 in the BDNF-producing cells do not exhibit obesity, whereas SMS patients and SMS mice show pronounced obesity (Burns et al., 2010; Huang et al., 2016; Smith et al., 2005). This indicates that although reduced Bdnf expression and BDNF-producing neurons contribute to regulating body weight, additional molecular changes and other hypothalamic populations also play important roles in regulating body weight homeostasis in SMS. Our RPPA data suggest that mTOR signalling is also misregulated in addition to the reduced activation of the neurotrophin downstream cascades. Hypothalamic mTORC1 is crucial to regulate glucose release from the liver, peripheral lipid metabolism, and insulin sensitivity (Burke et al., 2017; Caron et al., 2016; Smith et al., 2015), while mTORC2 regulates glucose tolerance and fat mass (Kocalis et al., 2014). How the impaired mTOR signalling contributes to energy homeostasis defects in SMS and the therapeutic potential of targeting this pathway to treat SMS-related obesity remains unclear and warrants future investigation.

      What additional Rai1-dependent hypothalamic cell types residing in brain regions other than PVH regulate obesity in SMS? Other important cell types such as TRKB neurons within the PVH (An et al., 2020) and several RAI1-expressing hypothalamic nuclei including the arcuate nucleus, ventromedial nucleus of the hypothalamus (VMH), and lateral hypothalamus all play important roles in regulating energy homeostasis. POMC- and AGRP-expressing neurons within the arcuate nucleus are known to regulate food intake and glucose and insulin homeostasis (Quarta et al., 2021; Vohra et al., 2022). Therefore, Rai1 function in these neurons could contribute to obesity in SMS, a topic that awaits future investigation.”

      Reviewer #2 (Public Review):

      Understanding disease conditions often yields valuable insights into the physiological regulation of biological functions, as well as potential therapeutic approaches. In previous investigations, the author's research group identified abnormal expression of brain-derived neurotrophic factor (BDNF) in the hypothalamus of a mouse model exhibiting Smith-Magenis syndrome (SMS), which is caused by heterozygous mutations of the Rai1 gene. Human SMS is associated with distinct facial characteristics, sleep disturbances, behavioral issues, and intellectual disabilities, often accompanied by obesity. Conditional knockout (cKO) of the Bdnf gene from the paraventricular hypothalamus (PVH) in mice led to hyperphagic obesity, while overexpression of the Bdnf gene in the PVH of Rai1 heterozygous mice restored the SMS-like obese phenotype. Based on these preceding findings, the authors of the present study discovered that homozygous Rai1 cKO restricted to Bdnf-expressing cells, or Rai1 gene knockdown solely in Bdnf-positive neurons in the PVH, induced obesity along with intricate alterations in adipose tissue composition, energy expenditure, locomotion, feeding patterns, and glucose tolerance, some of which varied between sexes. Additionally, the authors demonstrated that a brain-penetrating drug capable of activating the TrkB pathway, a downstream signaling pathway of BDNF, partially alleviated the SMS-like obesity phenotype in female mice with Rai1 heterozygous mutations. Although the specific (neural) cell type responsible for this TrkB signaling remains an open question, the present study unequivocally highlights the importance of Rai1 gene function in PVH Bdnf neurons for the obesity phenotype, providing valuable insights into potential therapeutic strategies for managing obesity associated with SMS.

      In the proteomic analysis (Fig. 1), the authors elucidated that multiple phospho-protein signaling pathways, including Akt and mTOR pathways, exhibited significant attenuation in the SMS model mice. Of significance, the manifestation of haploinsufficiency of the Rai1 gene exclusively within the BDNF+ cells demonstrated negligible impact on body weight (Fig. 2-supple 3D), despite observing a reduction in BDNF levels in the heterozygous Rai1 mutant (Fig. 1A). Conversely, the homozygous Rai1 cKO in the BDNF+ cells prominently displayed an obesity phenotype, suggesting substantial dissimilarities in the gene expression profiles between Rai1 heterozygous and homozygous conditions within the BDNF+ cell population. It would be advantageous to precisely identify the responsible differentially expressed genes, possibly including Bdnf itself, in the homozygous cKO model. The observed reduction in the excitability of PVH BDNF+ cells (Fig. 3) is presumably attributed to aberrant gene expression other than Bdnf itself, which may serve as a prospective target for gene expression analysis. Notably, the Rai1 homozygous cKO mice in BDNF+ cells exhibited some sexual dimorphisms in feeding and energy expenditures, as evidenced by Fig. 2 and related figures. Exploring the potential relevance of these sexual differences to human SMS cases and investigating the underlying cellular/molecular mechanisms in the future would provide valuable insights.

      Although the CRISPR-mediated knockdown of the Rai1 gene (Fig. 4) appears to be highly effective, given the broad transduction of AAV serotype 9, it may be helpful to exclude the possibility of other brain regions adjacent to the PVH, such as the DMH or VMH, being affected by this viral procedure. If the PVH-specificity is established, the majority of Rai1 cKO effects in Bdnf+ cells are primarily attributed to PVH-Bdnf+ cells based on the similarity of phenotypes observed. With regards to the apparent rescue of the body weight phenotype in Rai1 heterozygous mutants using a selective TrkB activator, the specific biological processes, and neurons responsible for this effect remain unclear to this reviewer. Elucidating these aspects would be significant when considering potential applications to human SMS cases.

      We appreciate the reviewer's insightful comments. We agree that the logical next step would be to identify the profile of the differentially expressed genes in our homozygous conditional knockout model. We have included the following paragraphs in the discussion.

      Lines 364-384: “Notably, mice lacking one copy of Rai1 in the BDNF-producing cells do not exhibit obesity, whereas SMS patients and SMS mice show pronounced obesity (Burns et al., 2010; Huang et al., 2016; Smith et al., 2005). This indicates that although reduced Bdnf expression and BDNF-producing neurons contribute to regulating body weight, additional molecular changes and other hypothalamic populations also play important roles in regulating body weight homeostasis in SMS. Our RPPA data suggest that mTOR signalling is also misregulated in addition to the reduced activation of the neurotrophin downstream cascades. Hypothalamic mTORC1 is crucial to regulate glucose release from the liver, peripheral lipid metabolism, and insulin sensitivity (Burke et al., 2017; Caron et al., 2016; Smith et al., 2015), while mTORC2 regulates glucose tolerance and fat mass (Kocalis et al., 2014). How the impaired mTOR signalling contributes to energy homeostasis defects in SMS and the therapeutic potential of targeting this pathway to treat SMS-related obesity remains unclear and warrants future investigation.

      What additional Rai1-dependent non-PVH hypothalamic cell types regulate obesity in SMS? Other important cell types such as TRKB neurons within the PVH (An et al., 2020) and several RAI1-expressing hypothalamic nuclei including the arcuate nucleus, ventromedial nucleus of the hypothalamus (VMH), and lateral hypothalamus all play important roles in regulating energy homeostasis. POMC- and AGRP-expressing neurons within the arcuate nucleus are known to regulate food intake and glucose and insulin homeostasis (Quarta et al., 2021; Vohra et al., 2022). Therefore, Rai1 function in these neurons could contribute to obesity in SMS, a topic that awaits future investigation.”

      Lines 409-418: “It is plausible that RAI1 regulates the expression of genes encoding inward rectifier K+ channels, which regulate neuronal activity and potentially energy homeostasis. For instance, KIR6 (a family of ATP-sensitive potassium channels, KATP) is widely expressed in the hypothalamus. Deleting the hypothalamic KIR6.2 subunit impairs KATP channel function and glucose tolerance (Miki et al., 2001; Parton et al., 2007). Moreover, reduced expression of hypothalamic GIRK4 (encoding an inwardly rectifying potassium channel) causes obesity (Perry et al., 2008). GABAergic neurotransmission from arcuate AGRP-expressing neurons to the PVH neurons is important to increase appetite by favouring hyperphagia (Atasoy et al., 2012). Disrupting the composition of these ion channels could contribute to reduced PVHBDNF neuronal firing, which awaits further investigations.”

      Moreover, to facilitate the future exploration of the potential relevance of sexual differences to human SMS cases, we have incorporated the following explanation in the discussion section.

      Lines 419-426: “Female mice with a conditional knockout of Rai1 from BDNF-producing neurons do not display a noteworthy difference in food intake. Conversely, their male counterparts exhibit a significant increase in food intake. Although SMS individuals of both genders tend to overeat, male patients who are obese show significantly higher food consumption than their female counterparts (Gandhi et al., 2022). This observation raises the possibility that Rai1 regulates eating behaviours through multiple cell types in the hypothalamus and that a male-specific involvement of BDNF-producing neurons in regulating food intake potentially provides a neurobiological basis for the observed pattern in SMS patients (Gandhi et al., 2022).”

      To exclude the possibility that brain regions adjacent to the PVH (such as the VMH and arcuate nucleus) were affected by our AAV-CRISPR-mediated Rai1 knockout, we analyzed other hypothalamic regions, including the VMH and arcuate nucleus, on the same slides used to confirm PVH viral expression, and confirmed that the AAV was not expressed in these regions. We have incorporated a representative image (Figure 4—figure supplement 1F) depicting limited AAV expression in these nuclei.

      Regarding LM22A-4: It is possible that LM22A-4 functions directly through binding to TRKB or indirectly engages TRKB downstream molecules by activating other receptors such as GPCRs. LM22A-4 appears to engage the neurotrophin downstream PI3K-AKT pathway, which our RPPA analysis identified as downregulated in the hypothalamus of Rai1-deficient mice. Reduced AKT activity is associated with insulin resistance and obesity in mice. Restoration of AKT functional activity by LM22A-4 could be the primary mode of action for this drug in the brain. However, since we observed that this drug only partially rescued the body weight defect, future research exploring more potent TrkB agonists or utilizing a combination therapy that targets both the neurotrophin and mTOR pathways might yield improved responses to the pharmacological interventions. We have included the following paragraph in the discussion:

      Lines 451-461: “We recognize that while several in vivo studies have demonstrated the potential of LM22A-4 in targeting neurotrophin downstream signalling (Kron et al., 2014; Li et al., 2017), an in vitro analysis failed to demonstrate the ability of LM22A-4 to activate TrkB directly (Boltaev et al., 2017). Therefore, the precise mechanism by which LM22A-4 enhances AKT cascades in the mammalian brain remains unclear and awaits further investigations. In the hypothalamus of SMS mice, LM22A-4 could indirectly engage neurotrophin downstream PI3K-AKT pathway through the G protein-coupled receptor-dependent transactivation of the TRKB receptor (Domeniconi & Chao, 2010) or other unknown mechanisms. Moreover, while LM22A-4 may have potential side effects, we found that wild-type mice treated with LM22A-4 did not show a further decrease in body weight, suggesting limited side effects regarding body weight regulation.”

      Overall, the present study represents a valuable addition to the authors' series of high-quality molecular genetic investigations into the in vivo functions of the Rai1 gene. This reviewer particularly commends their diligent efforts to enhance our comprehension of SMS and contribute to the future development of more effective therapies for this syndrome.

      We thank the reviewer for finding our study valuable in advancing the understanding of RAI1 function.

      Reviewer #3 (Public Review):

      Summary:

      Smith-Magenis syndrome (SMS) is associated with obesity and is caused by deletion or mutations in one copy of the Rai1 gene which encodes a transcriptional regulator. Previous studies have shown that Bdnf gene expression is reduced in the hypothalamus of Rai1 heterozygous mice. This manuscript by Javed et al. further links SMS-associated obesity with reduced Bdnf gene expression in the PVH.

      Strengths:

      The authors show that deletion of the Rai1 gene in all BDNF-expressing cells or just in the PVH BDNF neurons postnatally caused obesity. Interestingly, mutant mice displayed sexual dimorphism in the cause for the obesity phenotype. Overall, the data are well presented and convincing except the data from LM22A-4.

      Weaknesses:

      1) The most serious concern is about data from LM22A-4 administration experiments (Figure 5 and associated supplemental figures). A rigorous study has demonstrated that LM22A-4 does not activate TrkB (Boltaev et al., Science Signaling, 2017), which is consistent with unpublished results from many labs in the neurotrophin field. It is tricky to interpret body weight data from pharmacological studies because compounds always have some side effects, which can reduce body weight non-specifically.

      We thank this reviewer for their valuable comments. Indeed, the precise mechanism by which LM22A-4 exerts its effect is not entirely clear and there has been mixed evidence regarding its identity as a TRKB agonist in vitro. We have refrained from stating LM22A-4 as a partial agonist of TRKB, and instead have focused on highlighting the potential of this drug in activating neurotrophin downstream signalling through increasing AKT phosphorylation in vivo. We have modified the title to remove TRKB, and the following changes have been made in the discussion:

      Lines 451-461: “We recognize that while several in vivo studies have demonstrated the potential of LM22A-4 in targeting neurotrophin downstream signalling (Kron et al., 2014; Li et al., 2017), an in vitro analysis failed to demonstrate the ability of LM22A-4 to activate TrkB directly (Boltaev et al., 2017). Therefore, the precise mechanism by which LM22A-4 enhances AKT cascades in the mammalian brain remains unclear and awaits further investigations. In the hypothalamus of SMS mice, LM22A-4 could indirectly engage neurotrophin downstream PI3K-AKT pathway through the G protein-coupled receptor-dependent transactivation of the TRKB receptor (Domeniconi & Chao, 2010) or other unknown mechanisms. Moreover, while LM22A-4 may have potential side effects, we found that wild-type mice treated with LM22A-4 did not show a further decrease in body weight, suggesting limited side effects regarding body weight regulation.”

      2) The resolution of all figures is poor, and thus I could not judge the quality of the micrographs.

      We have updated the figures with higher-resolution images.

      3) Citation of the literature is not precise. The study by An et al. (2015) shows that deletion of the Bdnf gene in the PVH leads to obesity due to increased food intake and reduced energy expenditure (not just hyperphagic obesity; Line 72). Furthermore, the study by Unger et al. (2017) carried out Bdnf deletion in the VMH and DMH using AAV-Cre and did not discuss SF1 neurons at all (Line 354). The two studies by Yang et al. (Mol Endocrinol, 2016) and Kamitakahara et al. (Mol Metab, 2015) did use SF1-Cre to delete the Bdnf gene and did not observe any obesity phenotype.

      We thank the reviewer for bringing this to our attention. We have revised the text to ensure accurate representation of the cited publications. The following changes have been made: Lines 348-350: “Although BDNF is required in the VMH and DMH to regulate body weight (Unger et al., 2007), embryonic deletion of Bdnf from the SF1-lineage populations including the VMH did not result in obesity (Kamitakahara et al., 2016; Yang et al., 2016).”

      4) Animal number is not described in many figure legends.

      We thank the reviewer for pointing this out. We have revised the manuscript to incorporate the missing animal numbers.

      Reviewer #1 (Recommendations For The Authors):

      Additional points:

      1) The data provided indicating increased inhibitory tone onto BDNF neurons in PVN of Rai1 mutant mice are not convincing that inhibitory drive is significantly affected.

      We have modified the sentences as follows. We have also deleted these conclusions from the abstract and discussion:

      Lines 215-220: “We observed a slight rightward shift of the probability of miniature inhibitory postsynaptic current (mIPSC) frequency in cKO PVHBDNF neurons, although the average frequency (Fig 3K) was not significantly different between groups. The probability of mIPSC amplitude also showed a right shift without a significant change (Fig 3L, Figure 3—figure supplement 1D). However, we observed a significantly increased area under the curve (Fig 3M).”

      2) Fig. 3C - Was outlier analysis performed for these data? One of the data points for the control group looks like an outlier that might be skewing the data.

      We performed an outlier analysis and found that one data point was indeed an outlier. After removing this data point, the data remained statistically significant (*p<0.05), and the manuscript has been updated accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1) The manuscript would benefit from improved usage and precise descriptions of statistics. The authors often provided only general statements such as "one or two-way ANOVA" without specifying the exact statistical tests used. It is important to differentiate between one-way and two-way ANOVA, particularly when using the latter, by clearly indicating the within-group effects and interaction effects. The representation of p-values associated with ANOVA using asterisks requires clarification, specifying which statistics indicate ANOVA results and which ones correspond to post hoc analysis. It is advisable to assess the normality of the distribution before employing t-tests or consider non-parametric comparisons such as Wilcoxon's rank sum test if normality assumptions are not met. Additionally, it is essential to specify whether the tests are one-sided or two-sided and whether they are paired or unpaired. In some figure panels, such as Fig. 2H and K, the statistical tests used were not indicated at all.

      We have clarified the exact statistical tests in the figure legend for each figure.

      2) Rearranging the figures to facilitate a direct comparison of the sexual phenotypes (Fig. 2 and Fig. 2-supple 4) within the same figures would greatly improve reader comprehension.

      We have decided to keep the figure arrangement because of the focus on female mice in the main figures.

      3) To improve the comprehension of the figures and text, the following points should be addressed:

      • Fig. 1D: The definition of the expression level in the color code is not clear.

      An explanation of the color code has been added to the method section.

      Lines 652-656: “The vertical axis of the dendrogram represents the dissimilarity (measured as distance) between protein expressions, and the horizontal axis represents the individual test samples. The colour code (ranging from red to yellow to green) specifies the expression levels of different proteins, where red indicates low expression, yellow indicates intermediate expression, and green indicates high expression.”

      • Fig. 1F: One parenthesis is missing from the figure label.

      Fixed

      • Fig. 2C: It is unclear why there are so many dots for just n = 3 animals. It would be better to specify the conditions or use "animals" as a unit of measurement.

      The dots represent the percentage of cells quantified per slice from 3 animals. This has been clarified in the figures.

      • Fig. 2F: There seems to be an unnecessary label "I" in the middle of the panel.

      Fixed

      • It is not completely clear if the data in Fig. 2E-L were all obtained at 26 weeks of age.

      To clarify, the following line has been added to the Methods section:

      Lines 517-518: “After the 25th week, mice were subjected to body composition analysis.”

      • In Fig. 2-Supple 1, the legend should read "G-J." Additionally, please provide a definition for the arrowheads.

      Line 1086: “yellow arrowheads indicate Ai9 marked BDNF cells co-expressing endogenous BDNF.”

      • It is not completely clear if the data in Fig. 3 were all obtained from female mice.

      It is explained in the legend of Fig 3.

      • The description of the number of animals seems to be missing in Fig. 4

      The description for the number of animals has been added in the figure legend. Line 1004: “(Ctrl group: n=5, Exp group: n =5)”

      • On line 280-281, "Fig 4A." should be corrected to "Fig. 5A."

      Corrected.

      • In Fig. 5C-E, it is uncertain if multiple pairwise comparisons for three groups are statistically appropriate. At the very least, multiple comparisons should be corrected.

      We performed two-way ANOVA in which the mean body weights of age-matched groups were compared with each other (i.e. between control saline-injected and SMS saline-injected, SMS saline-injected and LM22A-4 -saline injected, and Control saline-injected and SMS LM22A-4 injected). We used Šidák’s multiple comparisons test, with statistical significance indicated as *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. We have clarified this in the Figure 5 legend.
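
      As an illustration of the Šidák adjustment mentioned above, here is a minimal sketch. It assumes m independent pairwise comparisons and applies the adjustment to already-computed p-values; the function names are hypothetical, and this is not the post hoc procedure of any particular statistics package, which may also pool variance across groups.

```python
def sidak_alpha(alpha, m):
    """Per-comparison threshold that keeps the family-wise error rate at alpha
    across m independent comparisons (Sidak): 1 - (1 - alpha)**(1/m)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p_values):
    """Sidak-adjusted p-values: 1 - (1 - p)**m, capped at 1.
    Assumes the raw p-values come from the m pairwise comparisons of interest."""
    m = len(p_values)
    return [min(1.0, 1.0 - (1.0 - p) ** m) for p in p_values]


# Hypothetical raw p-values for three pairwise group comparisons.
raw = [0.01, 0.02, 0.03]
adjusted = sidak_adjust(raw)
threshold = sidak_alpha(0.05, len(raw))
```

      Comparing each raw p-value with `threshold` and comparing each adjusted p-value with 0.05 lead to the same decisions.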

      • The unit of measurement should be standardized across figures, if possible, to facilitate better side-by-side comparisons. For example, most bodyweight figures use "g" (grams), but "mg" (milligrams) is used in Fig. 5.

      All measurements are corrected to be consistent (in grams).

      • It is unclear if nM (not mM) of glucose was actually measured in the glucose tolerance test (Fig. 2L and Fig. 4L).

      Fixed.

      Reviewer #3 (Recommendations For The Authors):

      1) The authors can remove the LM22A-4 data without much detrimental effects on the conclusion of the manuscript. Otherwise, the authors have to demonstrate that LM22A-4 activates TrkB, does not have any toxicity, and does not cause aversion.

      We thank the reviewer for the valuable comments and acknowledge the valid concern. Indeed, the precise mechanism by which LM22A-4 exerts its effects is not clear, and there have been mixed opinions regarding its function as a TRKB agonist in in vitro assays. To clarify, we have refrained from describing LM22A-4 as a partial agonist of TRKB, and have instead focused on highlighting the potential of this drug to activate neurotrophin downstream signalling through increased AKT phosphorylation in vivo.

      We have also modified the title of our article to exclude the word “TRKB Signalling”. The new title is as follows:

      “Smith-Magenis syndrome protein RAI1 regulates body weight homeostasis through hypothalamic BDNF-producing neurons and neurotrophin downstream signalling”

      2) Line 50: "40% > 95th percentile weight, 40% > 85th percentile weight" should be "40% > 95th percentile weight, 80% > 85th percentile weight".

      Corrected.

      3) Abbreviations for brain-derived neurotrophic factor: Bdnf for gene and BDNF for protein.

      Abbreviations have been corrected throughout the manuscript.

      4) Need to specify the animal age when viruses were injected into the PVH to inactivate the Bdnf gene.

      Line 235: Virus was injected at 3 weeks of age. It has been specified in the main text.

      5) Line 832: "3 technical triplicates" can be simplified as "3 technical repeats" because 3 and triplicates are redundant.

      Corrected.

      6) Figure 2B: The "O" in cKO is misplaced.

      Fixed.

      7) Figure 3: The black legends in E and F should include Ctrl.

      Fixed in the Figure 3.

    1. Author Response

      The data we produce are not criticized as such and thus do not require revision; the criticisms concern our interpretation of them. General themes of the reviews are that i) genetic signatures do not matter for defining neuronal types (here sympathetic versus parasympathetic); ii) that a cholinergic postganglionic autonomic neuron must be parasympathetic; and iii) that some physiology of the pelvic region would deserve the label “parasympathetic”. We answered the latter argument in (Espinosa-Medina et al., 2018), to which we refer the interested reader; and we fully disagree with the first two. Of note, part of the last sentence of the eLife assessment is misleading and does not reflect the referees’ comments. Our paper analyses genetic differences between the cranial and sacral outflow and uses them to argue that they cannot be both parasympathetic. The eLife assessment acknowledges the “genetic differences” but concludes that, somehow, they don’t detract from a common parasympathetic identity. We take issue with this paradox, of course, but it is coherent with the referees’ comments. On the other hand, the eLife assessment alone pushes the paradox one step further by stating that “functional differences” between the cranial and sacral outflows cannot prevent them from being both parasympathetic either. We would also object to this, but the only “functional differences” used by the referees to dismiss our diagnosis of a sympathetic-like character (rather than parasympathetic) for the sacral outflow are between noradrenergic and cholinergic, and between sympathetic and parasympathetic (and we also disagree with those, see above, and below), not between cranial and sacral.

      We will thus use the opportunity offered by eLife to keep the paper as it is (with a few minor stylistic changes). We respond below to the referees’ detailed remarks and hope that the publication, as per eLife’s new model, of the paper, the referees’ comments and our response will help move the field forward.

      Public review by Referee #1

      “Consistently, the P3 cluster of neurons is located close to sympathetic neuron clusters on the map, echoing the conventional understanding that the pelvic ganglia are mixed, containing both sympathetic and parasympathetic neurons”.

      The greater closeness of P3 than of P1/2/4 to the sympathetic cluster can be used to judge P1/2/4 less sympathetic than P3 (and more… something else), but not more parasympathetic. There is no echo of the “conventional understanding” here.

      “A closer look at the expression showed that some genes are expressed at higher levels in sympathetic neurons and in P2 cluster neurons [we assume that the referee means “in sympathetic neurons and in P3 cluster neurons”] but much weaker in P1, P2, and P4 neurons such as Islet1 and GATA2, and the opposite is true for SST. Another set of genes is expressed weakly across clusters, like HoxC6, HoxD4, GM30648, SHISA9, and TBX20.”

      These statements are inaccurate. The classification is not based on impressions from visual inspection of the heatmap, but on calculations using thresholds. Admittedly, the thresholds have an arbitrary aspect, but the referee can verify (by eye inspection of the heatmap) that genes which we calculate as being at “higher levels in sympathetic neurons and in P3 cluster neurons, but much weaker in P1, P2, and P4 neurons” or vice versa, i.e. noradrenergic or cholinergic neurons (genes from groups V and VI, respectively), show a much bigger difference than those cited by the referee, and indeed are quasi-absent from the weaker clusters or ganglia. In addition, even by subjective eye inspection:

      Islet is equally expressed in P4 and sympathetics.

      SST is equally expressed in P1 and sympathetics.

      Tbx20 is equally expressed in P2 and sympathetics.

      HoxC6, HoxD4, GM30648, SHISA9 are equally expressed in all clusters and all sympathetic ganglia.

      “Since the pelvic ganglia are in a caudal body part, it is not surprising to have genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa (to have genes expressed in sphenopalatine ganglia, but not in pelvic ganglia), according to well recognized rostro-caudal body patterning, such as nested expression of hox genes.”

      We do not simply show “genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa”, i.e. a genetic distance between pelvic and sphenopalatine, but many genes expressed in all pelvic cells and sympathetic ones, i.e. a genetic proximity between pelvic and sympathetic. This situation can be deemed “unsurprising”, but it can only be used to question the parasympathetic nature of pelvic cells (as we do), or considered irrelevant (as the referee does, because genes would not define cell types, see our response to an equivalent stance by Referee#2). Concerning Hox genes, we do take them into account, and speculate in the discussion that their nested expression is key to the structure of the autonomic nervous system, including its division into sympathetic and parasympathetic outflows.

      It is much simpler and easier to divide the autonomic nervous system into sympathetic neurons that release noradrenaline versus parasympathetic neurons that release acetylcholine, and these two systems often act in antagonistic manners, though in some cases, these two systems can work synergistically. It also does not matter whether or not pelvic cholinergic neurons could receive inputs from thoracic-lumbar preganglionic neurons (PGNs), not just sacral PGNs; such occurrence only represents a minor revision of the anatomy. In fact, it makes much more sense to call those cholinergic neurons located in the sympathetic chain ganglia parasympathetic.

      This “minor revision of the anatomy” would make spinal preganglionic neurons which are universally considered sympathetic (in the thoraco-lumbar cord) synapse onto large numbers of parasympathetic neurons (in the paravertebral chains for sweat glands and periosteum, and in the pelvic ganglion), robbing these terms of any meaning.

      Thus, from the functionality point of view, it is not justified to claim that "pelvic organs receive no parasympathetic innervation".

      There never was any general or rigorous functional definition of the sympathetic and parasympathetic nervous systems; it is striking, almost ironic, that Langley, creator of the term parasympathetic and the ultimate physiologist, provides an exclusively anatomic definition in his Autonomic Nervous System, Part I. Hence, our definition cannot clash with any “functionality point of view”. In fact, as we briefly say in the discussion and explore in (Espinosa-Medina et al., 2018), it is the “sacral parasympathetic” paradigm which is unjustified from a functionality point of view, for implying a functional antagonism across the lumbo-sacral gap, which has been disproven repeatedly. It remains to be determined which neurons are antagonistic to which on the blood vessels of the external genitals; antagonism within one division of the autonomic nervous system would not be without precedent (e.g. there exist both vasoconstrictor and vasodilator sympathetic neurons, and both inhibitor and activator enteric motoneurons). The way to this question is finally open to research, and as referee #2 says, “it is early days”.

      Public review by Referee #2

      This work further documents differences between the cranial and sacral parasympathetic outflows that have been known since the time of Langley - 100 years ago.

      We assume that the referee means that it is the “cranial and sacral parasympathetic outflows” which “have been known since the time of Langley”, not their differences (that we would “further document”): the differences were explicitly negated by Langley. As a matter of fact, the sacral and cranial outflows were first likened to each other by Gaskell, 140 years ago (Gaskell, 1886). This anatomic parallel (which is deeply flawed (Espinosa-Medina et al., 2018)) was inherited wholesale by Langley, who added one physiological argument (Langley and Anderson, 1895) (which has been contested many times (Espinosa-Medina et al., 2018) and references within).

      In addition, the sphenopalatine and other cranial ganglia develop from placodes and the neural crest, while sympathetic and sacral ganglia develop from the neural crest alone.

      Contrary to what the referee says, the sphenopalatine has no placodal contribution. There is no placodal contribution to any autonomic ganglion, sympathetic or parasympathetic (except an isolated claim concerning the ciliary ganglion (Lee et al., 2003)). All autonomic ganglia derive from the neural crest as determined a long time ago in chicken. For the sphenopalatine in mouse, see our own work (Espinosa-Medina et al., 2014).

      One feature that seems to set the pelvic ganglion apart is […] the convergence of preganglionic sympathetic and parasympathetic synapses on individual ganglion cells (Figure 3). This unusual organization has been reported before using microelectrode recordings (see Crowcroft and Szurszewski, J Physiol (1971) and Janig and McLachlan, Physiol Rev (1987)). Anatomical evidence of convergence in the pelvic ganglion has been reported by Keast, Neuroscience (1995).

      Contrary to what the referee says, we do not provide in Figure 3 any evidence for anatomic convergence, i.e. for individual pelvic ganglion cells receiving dual lumbar and sacral inputs. We simply show that cholinergic neurons figure prominently among targets of the lumbar pathway. This said, the convergence of both pathways on the same pelvic neurons, described in the references cited by the referee, is another major problem in the theory of the “sacral parasympathetic” (as we discussed previously (Espinosa-Medina et al., 2018)).

      It should also be noted that the anatomy of the pelvic ganglion in male rodents is unique. Unlike other species where the ganglion forms a distributed plexus of mini-ganglia, in male rodents the ganglion coalesces into one structure that is easier to find and study. Interestingly the image in Figure 3A appears to show a clustering of Chat-positive and Th-positive neurons. Does this result from the developmental fusion of mini ganglia having distinct sympathetic and parasympathetic origins?

      The clustering of Chat-positive and Th-positive cells could arise from a number of developmental mechanisms, that we have no idea of at the moment. This has no bearing on sympathetic and parasympathetic.

      In addition, Brunet et al. dismiss the cholinergic and noradrenergic phenotypes as a basis for defining sympathetic and parasympathetic neurons. However, see the bottom of Figure S4 and further counterarguments in Horn (Clin Auton Res (2018)).

      The bottom of Figure S4 simply indicates which cells are cholinergic and adrenergic. We have already expounded many times that noradrenergic and cholinergic do not coincide with sympathetic and parasympathetic. Henry Dale (Nobel Prize 1936) demonstrated this. Langley himself devoted several pages of his final treatise to this exception to his “Theory on the relation of drugs to nerve system” (Langley, 1921) (p43) (which was actually a bigger problem for him than it is for us, for reasons which are too long to recount here; it is as if the theoretical difficulties experienced by Langley had been internalized to this day in the form of a dismissal of the cholinergic sympathetic neurons as a slightly scandalous but altogether forgettable oddity). (Horn, 2018) reviews the evidence that the thoracic cholinergic sympathetic phenotype is brought about by a secondary switch upon interaction with the target and argues that this would be a fundamental difference with the sacral “parasympathetic”. But in fact the secondary switch is preceded by co-expression of ChAT and VAChT with Th in most sympathetic neurons (reviewed in (Ernsberger and Rohrer, 2018)); and we have no idea of the dynamic in the pelvic ganglion. It may also be mentioned in this context that target-dependent specification of neuronal identity has also been demonstrated for other types of sympathetic neurons (Furlan et al., 2016).

      What then about neuropeptides, whose expression pattern is incompatible with the revised nomenclature proposed by Brunet et al.?

      There was never any neuropeptide-inspired criterion for a nomenclature of the autonomic nervous system.

      Figure 1B indicates that VIP is expressed by sacral and cranial ganglion cells, but not thoracolumbar ganglion cells.

      Contrary to what the referee says, there are VIP-positive cells in our sympathetic data set and even strongly positive ones, except they are scattered and few (red bars on the UMAP). They correspond to cholinergic sympathetics, likely sudomotor, which are known to contain VIP (e.g. (Anderson et al., 2006; Stanke et al., 2006)). In other words, VIP is probably part of what we call the cholinergic synexpression group (but was not placed in it by our calculations, probably because of a low expression level even in sympathetic noradrenergic cells).

      The authors do not mention neuropeptide Y (NPY). The immunocytochemistry literature indicates that NPY is expressed by a large subpopulation of sympathetic neurons but never by sacral or cranial parasympathetic neurons.

      Contrary to what the referee says, Keast (Keast, 1995) finds 3.7% of pelvic neurons double stained for NPY and VIP in male rats, and says (Keast, 2006) that in females “co-expression of NPY and VIP is common” (thus in cholinergic neurons that the referee calls “parasympathetic”). Single cell transcriptomics is probably more sensitive than immunochemistry, and in our dichotomized data set (table S1), NPY is expressed in all pelvic clusters and all sympathetic ganglia. In other words, it is one more argument for their kinship. It does not appear in the heatmap because it ranks below the 100 top genes.

      References

      Anderson, C. R., Bergner, A. and Murphy, S. M. (2006). How many types of cholinergic sympathetic neuron are there in the rat stellate ganglion? Neuroscience 140, 567–576.

      Ernsberger, U. and Rohrer, H. (2018). Sympathetic tales: subdivisions of the autonomic nervous system and the impact of developmental studies. Neural Dev 13, 20.

      Espinosa-Medina, I., Outin, E., Picard, C. A., Chettouh, Z., Dymecki, S., Consalez, G. G., Coppola, E. and Brunet, J. F. (2014). Neurodevelopment. Parasympathetic ganglia derive from Schwann cell precursors. Science 345, 87–90.

      Espinosa-Medina, I., Saha, O., Boismoreau, F. and Brunet, J.-F. (2018). The “sacral parasympathetic”: ontogeny and anatomy of a myth. Clin Auton Res 28, 13–21.

      Furlan, A., La Manno, G., Lübke, M., Häring, M., Abdo, H., Hochgerner, H., Kupari, J., Usoskin, D., Airaksinen, M. S., Oliver, G., et al. (2016). Visceral motor neuron diversity delineates a cellular basis for nipple- and pilo-erection muscle control. Nat Neurosci 19, 1331–1340.

      Gaskell, W. H. (1886). On the Structure, Distribution and Function of the Nerves which innervate the Visceral and Vascular Systems. J Physiol 7, 1-80.9.

      Horn, J. P. (2018). The sacral autonomic outflow is parasympathetic: Langley got it right. Clin Auton Res 28, 181–185.

      Jänig, W. (2006). The Integrative Action of the Autonomic Nervous System: Neurobiology of Homeostasis. Cambridge: Cambridge University Press.

      Keast, J. R. (1995). Visualization and immunohistochemical characterization of sympathetic and parasympathetic neurons in the male rat major pelvic ganglion. Neuroscience 66, 655–662.

      Keast, J. R. (2006). Plasticity of pelvic autonomic ganglia and urogenital innervation. International Review of Cytology - a Survey of Cell Biology, Vol 248 248, 141-+.

      Langley, J. N. (1921). The Autonomic Nervous System (Pt. I). Cambridge: Heffer & Sons Ltd.

      Langley, J. N. and Anderson, H. K. (1895). The Innervation of the Pelvic and adjoining Viscera: Part II. The Bladder. Part III. The External Generative Organs. Part IV. The Internal Generative Organs. Part V. Position of the Nerve Cells on the Course of the Efferent Nerve Fibres. J Physiol 19, 71–139.

      Lee, V. M., Sechrist, J. W., Luetolf, S. and Bronner-Fraser, M. (2003). Both neural crest and placode contribute to the ciliary ganglion and oculomotor nerve. Developmental biology 263, 176–190.

      Stanke, M., Duong, C. V., Pape, M., Geissen, M., Burbach, G., Deller, T., Gascan, H., Parlato, R., Schütz, G. and Rohrer, H. (2006). Target-dependent specification of the neurotransmitter phenotype:cholinergic differentiation of sympathetic neurons is mediated in vivo by gp130 signaling. Development 133, 141–150.

      Zeisel, A., Hochgerner, H., Lönnerberg, P., Johnsson, A., Memic, F., van der Zwan, J., Häring, M., Braun, E., Borm, L. E., La Manno, G., et al. (2018). Molecular Architecture of the Mouse Nervous System. Cell 174, 999-1014.e22.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Response: Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below.

      Briefly, regarding clearer explanations of the methods, we added additional analyses (e.g., commonality analyses on ridge regression and on multiple regressions with a quadratic term for chronological age) to address some of the concerns and additional details in text and figures to ensure that the reader can fully understand our methodological procedures. Regarding the critical evaluation of the conceptual basis of the different models, we added discussions to help with interpretations and the scope of the generalisability of our findings. For instance, as opposed to treating Brain Cognition and Brain Age as separate biomarkers and comparing them in the ability to explain fluid cognition, we now treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. In other words, we now examined the extent to which Brain Age missed the variation in the brain MRI that could explain fluid cognition (for this particular issue, please see our response to Reviewer 3 Public Review #4).
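
      The variance partitioning behind a commonality analysis can be sketched for the simplest two-predictor case. This is an illustrative sketch only: the function name and the R² values are hypothetical and are not results from the manuscript.

```python
def commonality_two_predictors(r2_a, r2_b, r2_ab):
    """Partition the R^2 of a two-predictor regression into unique and common parts.

    r2_a  : R^2 of the model with predictor A only (e.g. a Brain Age index)
    r2_b  : R^2 of the model with predictor B only (e.g. chronological age)
    r2_ab : R^2 of the model with both predictors
    """
    unique_a = r2_ab - r2_b          # variance explained by A beyond B
    unique_b = r2_ab - r2_a          # variance explained by B beyond A
    common = r2_a + r2_b - r2_ab     # variance explained jointly by A and B
    return {"unique_a": unique_a, "unique_b": unique_b, "common": common}


# Hypothetical R^2 values, for illustration only.
parts = commonality_two_predictors(r2_a=0.30, r2_b=0.40, r2_ab=0.45)
```

      By construction the three components sum to the R² of the full model, so a predictor with a large single-model R² can still have a small unique effect when most of its explained variance is shared with the other predictor.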

      Reviewer 1:

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address which mostly relate to clarity and interpretation.

      Reviewer 1 Public Review #1:

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain-age models more generally. Further, I think that since brain-age models by construction confound relevant biological variation with the accuracy of the regression models used to estimate them, there may be limits to the interpretation of (e.g.) the brain-age gap as a dimensionless biomarker. This has also been discussed elsewhere (see e.g. https://academic.oup.com/brain/article/143/7/2312/5863667). I would suggest that the authors consider and comment on these issues.

      Response: Thank you Reviewer 1 for pointing out these important issues. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 (see below).

      Reviewer 1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. Stacked models can be prone to overfitting when combined with cross-validation. This is because the predictions from the first-level models (i.e. the features that are provided to the second level 'stacked' models) contain information about the training set and the test set. If cross-validation is not done very carefully (e.g. using multiple hold-out sets), information leakage can easily occur at the second level. Unfortunately, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand what was actually done. Please provide more information to enable the reader to better understand the stacked regression models. If the authors are not using an approach that fully preserves training and test separability, they need to do so.

      Response: Thank you Reviewer 1. We addressed this issue in our response to Reviewer 1 Recommendations For The Authors #2 (see below). Briefly, we now made it clearer that training models for both non-stacked and stacked models did not involve the test set, ensuring that there was no data leakage between training and test sets.
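
      One common way to keep training and test sets separate in a stacked model is a nested split, in which the first-level predictions that feed the second-level model are produced only from the outer training set. The sketch below builds such a split plan. It is an assumption about how this can be arranged, not a description of the procedure used in the manuscript, and the function names are hypothetical.

```python
import random


def k_folds(indices, k, seed):
    """Shuffle the indices and split them into k disjoint folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [sorted(shuffled[i::k]) for i in range(k)]


def nested_stacking_plan(n_samples, outer_k=5, inner_k=5, seed=0):
    """Return, for each outer fold, the held-out test indices, the outer
    training indices, and the inner folds used to produce out-of-fold
    first-level predictions within the outer training set only."""
    plan = []
    for fold_id, test_idx in enumerate(k_folds(range(n_samples), outer_k, seed)):
        test_set = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in test_set]
        inner = []
        for held_out in k_folds(train_idx, inner_k, seed + fold_id + 1):
            held = set(held_out)
            # First-level models are fit on "fit" and predict "predict";
            # neither list can contain an outer test index.
            fit_idx = [i for i in train_idx if i not in held]
            inner.append({"fit": fit_idx, "predict": held_out})
        plan.append({"test": test_idx, "train": train_idx, "inner": inner})
    return plan


plan = nested_stacking_plan(50, outer_k=5, inner_k=4, seed=1)
```

      Under this plan, the second-level model of each outer fold is trained on the pooled inner out-of-fold predictions and is evaluated once on the outer test indices, which no model saw during fitting.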

      Reviewer 1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      Response: Thank you Reviewer 1. We addressed this issue in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 1 Public Review #4:

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods, and bias-correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.

      Response: Thank you Reviewer 1. We addressed this issue in our response to Reviewer 1 Recommendations For The Authors #5-#6. Briefly, we followed your advice and added all of the suggested details.

      Reviewer 2 (Public Review):

      Reviewer 2 Public Review Overall:

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration. The study employs suitable data and methods, albeit with some limitations, to address the research questions. A more detailed discussion of methodological limitations in relation to the study's aims is required. For instance, the current commonality analysis may not sufficiently address potential multicollinearity issues, which could confound the findings. Importantly, given that the study did not provide external validation for the indices, it is unclear how well the models would perform and generalize to other samples. This is particularly relevant to their novel index, brain-cognition, given that brain-age has been validated extensively elsewhere. In addition, the paper's rationale for using elastic net, which references previous fMRI studies, seemed somewhat unclear. The discussion could be more nuanced and certain conclusions appear speculative.

      Response: Thank you for your encouragement. We have now added a discussion of methodological limitations (see below). Regarding potential multicollinearity issues, we addressed this comment using Ridge regressions (see our response to Reviewer 2 Recommendations For The Authors #2). Regarding external validation, we now added discussions about the consistency between our results and several recent studies that investigated similar issues with Brain Age in different populations (see Reviewer 2 Recommendations For The Authors #1). Regarding Brain Cognition, we also added previous studies showing similarly high prediction for cognitive functioning (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We added a discussion about Elastic Net (see Reviewer 1 Recommendations For The Authors #6).

      Discussion

      “There are several potential limitations of this study. First, we conducted an investigation relying only on one dataset, the Human Connectome Project in Aging (HCP-A) (Bookheimer et al., 2019). While HCP-A used state-of-the-art MRI methodologies, covered a wide age range from 36 to 100 years old and used several task-fMRI from different tasks that are harder to find in other bigger databases (e.g., UK Biobank from Sudlow et al., 2015), several characteristics of HCP-A might limit the generalisability of our findings. For instance, the tasks used in task-based fMRI in HCP-A are not used widely in clinical settings (Horien et al., 2020). This might make it challenging to translate the approaches used here. Similarly, HCP-A also excluded participants with neurological conditions, possibly making their participants not representative of the general population. Next, while HCP-A’s sample size is not small (n=725 and 504 people, before and after exclusion, respectively), other datasets provide a much larger sample size (Horien et al., 2020). Similarly, HCP-A does not include younger populations. But as mentioned above, a study with a larger sample in older adults (Cole, 2020) and studies in younger populations (8-22 years old) (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023) also found small effects of the adjusted Brain Age Gap in explaining cognitive functioning. And the disagreement between the predictive performance of age-prediction models and the utility of Brain Age found here is largely in line with the findings across different phenotypes seen in a recent systematic review (Jirsaraie, Gorelik, et al., 2023).”

      Reviewer 2 Public Review #1:

      The authors aimed to evaluate how brain-age and brain-cognition indices capture cognitive decline (as mentioned in their title) but did not employ longitudinal data, essential for calculating 'decline'. As a result, 'cognition-fluid' should not be used interchangeably with 'cognitive decline,' which is inappropriate in this context.

      Response: Thank you for raising this issue. We no longer use the term ‘cognitive decline’.

      Reviewer 2 Public Review #2:

      In their first aim, the authors compared the contributions of brain-age and chronological age in explaining variance in cognition-fluid. Results revealed much smaller effect sizes for brain-age indices compared to the large effects for chronological age. While this comparison is noteworthy, it highlights a well-known fact: chronological age is a strong predictor of disease and mortality. Has the brain-age literature systematically overlooked this effect? If so, please provide relevant examples. They conclude that due to the smaller effect size, brain-age may lack clinical significance, for instance, in associations with neurodegenerative disorders. However, caution is required when speculating on what brain-age may fail to predict in the absence of direct empirical testing. This conclusion also overlooks extant brain-age literature: although effect sizes vary across psychiatric and neurological disorders, brain-age has demonstrated significant effects beyond those driven by chronological age, supporting its utility.

      Response For aim 1, we focused our claims on cognitive functioning and not on any clinical significance for neurodegenerative disorders. We have now made it clearer that the small effects of the Corrected Brain Age Gap in explaining fluid cognition of aging individuals found here are consistent with a study with a larger sample in older adults (Cole, 2020) and studies in younger populations (8-22 years old) (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023).

      We believe the utility of Brain Age for cognitive functioning vs. neurological/psychological disorders requires a further consideration, namely the discrepancy between the training and test samples typically used in studies focusing on neurological/psychological disorders. We now make this point in the Discussion (see below).

      Discussion

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). That is, those Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. This means that age-prediction models from Brain Age studies focusing on neurological/psychological disorders might be under-fitted when applied to participants with neurological/psychological disorders because they were built from largely healthy participants. And thus the difference in Brain Age indices between participants without vs. with neurological/psychological disorders might be confounded by the under-fitted age-prediction models (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). In contrast, our study and other Brain Age studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning do not suffer from being under-fitted. We consider this a strength, not a weakness, of our study.”

      Reviewer 2 Public Review #3:

      The second aim's results reveal a discrepancy between the accuracy of their brain-age models in estimating age and the brain-age's capacity to explain variance in cognition-fluid. The authors suggest that if the ultimate goal is to capture cognitive variance, brain-age predictive models should be optimized to predict this target variable rather than age. While this finding is important and noteworthy, additional analyses are needed to eliminate potential confounding factors, such as correlated noise between the data and cognitive outcome, overfitting, or the inclusion of non-healthy participants in the sample. Optimizing brain-age models to predict the target variable instead of age could ultimately shift the focus away from the brain-age paradigm, as it might optimize for a factor differing from age.

      Response We discuss the discrepancy between the accuracy of our brain-age models in estimating age and Brain Age's capacity to explain variance in fluid cognition in our response to Reviewer 3 Public Review #9 (see below). A recent systematic review found this issue to be widespread (Jirsaraie, Gorelik, et al., 2023). We now provide several strategies to mitigate this issue and improve the utility of Brain Age in explaining other phenotypes, based on our current work and that of others using different MRI modalities and modelling techniques (Bashyam et al., 2020; Jirsaraie, Kaufmann, et al., 2023; Rokicki et al., 2021).

      Regarding potential confounding factors, we are not sure what the reviewer meant by “correlated noise between the data and cognitive outcome”. The current study, for instance, used ICA-FIX (Glasser et al., 2016) to remove noise in functional MRI. It is unclear how much ‘noise’ is still left and might confound our findings. More importantly, we are not sure how to define ‘noise’ as referred to by Reviewer 2 here. As for overfitting, we used nested cross-validation to ensure that training and test sets were separate from each other (see Reviewer 1 Recommendations For The Authors #2). If overfitting had occurred as suggested, we would expect lower out-of-sample predictive performance of the age-prediction and cognitive-prediction models, since the models would fit the training set well but would not generalise to the test set. This is not what we found. The predictive performance of our age-prediction and cognitive-prediction models was high and consistent with the literature. Regarding the inclusion of non-healthy participants in the sample, we discussed this above in our response to Reviewer 2 Public Review #2.
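The nested cross-validation logic mentioned above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the ridge model, the fold counts, the candidate penalties and the simulated data are all assumptions made for illustration. The point it shows is that hyperparameters are chosen using the outer-training data only, so the predictions for each outer test fold are genuinely out-of-sample.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on centred data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def ridge_predict(X, w, b):
    return X @ w + b

def k_folds(n, k, rng):
    """Split shuffled indices 0..n-1 into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def nested_cv_predictions(X, y, alphas, outer_k=5, inner_k=5, seed=0):
    """Return out-of-sample predictions for every observation."""
    rng = np.random.default_rng(seed)
    y_pred = np.empty_like(y)
    for test_idx in k_folds(len(y), outer_k, rng):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        X_tr, y_tr = X[train_idx], y[train_idx]
        # Inner loop: choose alpha using the outer-training data only.
        inner = k_folds(len(y_tr), inner_k, rng)
        inner_mse = []
        for alpha in alphas:
            errs = []
            for val_idx in inner:
                fit_idx = np.setdiff1d(np.arange(len(y_tr)), val_idx)
                w, b = ridge_fit(X_tr[fit_idx], y_tr[fit_idx], alpha)
                pred = ridge_predict(X_tr[val_idx], w, b)
                errs.append(np.mean((pred - y_tr[val_idx]) ** 2))
            inner_mse.append(np.mean(errs))
        best_alpha = alphas[int(np.argmin(inner_mse))]
        # Refit on the whole outer-training set, then predict the held-out fold.
        w, b = ridge_fit(X_tr, y_tr, best_alpha)
        y_pred[test_idx] = ridge_predict(X[test_idx], w, b)
    return y_pred

# Hypothetical data: 20 "brain features", 5 of which carry signal.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 20))
y = X[:, :5].sum(axis=1) + rng.normal(0, 1, 300)

y_pred = nested_cv_predictions(X, y, alphas=[0.1, 1.0, 10.0, 100.0])
out_of_sample_r = np.corrcoef(y, y_pred)[0, 1]
print(round(out_of_sample_r, 2))
```

Because every prediction comes from a model that never saw the corresponding observation, an overfitted model would show up here as a low out-of-sample correlation rather than an inflated one.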

      Reviewer 2 Public Review #4:

      While a primary goal in biomarker research is to obtain indices that effectively explain variance in the outcome variable of interest, thus favouring models optimized for this purpose, the authors' conclusion overlooks the potential value of 'generic/indirect' models, despite sacrificing some additional explained variance provided by ad-hoc or 'specific/direct' models. In this context, we could consider brain-age as a 'generic' index due to its robust out-of-sample validity and significant associations across various health outcome variables reported in the literature. In contrast, the brain-cognition index proposed in this study is presumed to be 'specific' as, without out-of-sample performance metrics and testing with different outcome variables (e.g., neurodegenerative disease), it remains uncertain whether the reported effect would generalize beyond predicting cognition-fluid, the same variable used to condition the brain-cognition model in this study. A 'generic' index like brain-age enables comparability across different applications based on a common benchmark (rather than numerous specific models) and can support explanatory hypotheses (e.g., "accelerated ageing") since it is grounded in its own biological hypothesis. Generic and specific indices are not mutually exclusive; instead, they may offer complementary information. Their respective utility may depend heavily on the context and research or clinical question.

      Response We thank Reviewer 2 for pointing out this important issue. Reviewer 1 (Recommendations For The Authors #4) and Reviewer 3 (Public Review #4) brought up a similar issue. We agree with Reviewer 2 that both a 'specific/direct' index and Brain Age as a 'generic/indirect' index have merit in their own right. We discuss this issue in our response to Reviewer 3 Public Review #4 (see below).

      Briefly, in the revision, rather than treating Brain Cognition and Brain Age as separate biomarkers and comparing them, we treat the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. In other words, we now examine the extent to which Brain Age misses the variation in the brain MRI that could explain fluid cognition. We also discuss using our commonality approach to test for this missing variation in future work:

      Discussion

      “Finally, researchers should test how much Brain Age misses the variation in the brain MRI that could explain fluid cognition or other phenotypes of interest. As demonstrated here, one straightforward method is to build a prediction model using a phenotype of interest as the target (e.g., fluid cognition) and incorporate the predicted value of this model (e.g., Brain Cognition), along with Brain Age and chronological age, into a multiple regression for commonality analyses. The unique effect of this predicted value will inform the missing variation in the brain MRI from Brain Age. If this unique effect is large, then researchers might need to reconsider whether using Brain Age is appropriate for a particular phenotype of interest.”
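As a rough illustration of the unique-effect idea in the paragraph above, the sketch below computes the unique effect of each predictor as the drop in R² when that predictor is removed from the full multiple regression. It covers only the unique effects, not the common effects of a full commonality analysis, and all data are simulated: the variable names, effect sizes and noise levels are assumptions, not values from the study.

```python
import numpy as np

def r_squared(y, *predictors):
    """R^2 of an ordinary least-squares regression of y on the predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# Hypothetical data: fluid cognition declines with age; Brain Age tracks age
# only, while Brain Cognition tracks fluid cognition directly.
rng = np.random.default_rng(2)
n = 500
age = rng.uniform(36, 100, n)
fluid_cognition = -0.05 * age + rng.normal(0, 1, n)
brain_age = age + rng.normal(0, 8, n)
brain_cognition = fluid_cognition + rng.normal(0, 1, n)

r2_full = r_squared(fluid_cognition, age, brain_age, brain_cognition)

# Unique effect = R^2 of the full model minus R^2 of the model without
# the predictor of interest.
unique_brain_cognition = r2_full - r_squared(fluid_cognition, age, brain_age)
unique_brain_age = r2_full - r_squared(fluid_cognition, age, brain_cognition)

print(f"unique effect of Brain Cognition: {unique_brain_cognition:.3f}")
print(f"unique effect of Brain Age:       {unique_brain_age:.3f}")
```

In this simulated setting, a large unique effect of Brain Cognition alongside a near-zero unique effect of Brain Age is the pattern that would suggest Brain Age misses brain variation relevant to the phenotype.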

      Reviewer 2 Public Review #5:

      The study's third aim was to evaluate the authors' new index, brain-cognition. The results and conclusions drawn appear similar: compared to brain-age, brain-cognition captures more variance in the outcome variable, cognition-fluid. However, greater context and discussion of limitations is required here. Given the nature of the input variables (a large proportion of models in the study were based on fMRI data using cognitive tasks), it is perhaps unsurprising that optimizing these features for cognition-fluid generates an index better at explaining variance in cognition-fluid than the same features used to predict age. In other words, it is expected that brain-cognition would outperform brain-age in explaining variance in cognition-fluid since the former was optimized for the same variable in the same sample, while brain-age was optimized for age. Consequently, it is unclear if potential overfitting issues may inflate the brain-cognition's performance. This may be more evident when the model's input features are the ones closely related to cognition, e.g., fMRI tasks. When features were less directly related to cognitive tasks, e.g., structural MRI, the effect sizes for brain-cognition were notably smaller (see 'Total Brain Volume' and 'Subcortical Volume' models in Figure 6). This observation raises an important feasibility issue that the authors do not consider. Given the low likelihood of having task-based fMRI data available in clinical settings (such as hospitals), estimating a brain-cognition index that yields the large effects discussed in the study may be challenged by data scarcity.

      Response Given the use of nested cross-validation, we do not consider the good predictive performance of Brain Cognition found here to be the result of overfitting. In fact, we previously found a similar level of predictive performance of Brain Cognition in another database with younger participants (Tetereva et al., 2022). However, we agree with Reviewer 2 that the prediction of fluid cognition might be driven by MRI modalities that are different from those that drive the prediction of chronological age. In our own work with other age groups, including young adults (Tetereva et al., 2022) and children (Pat, Wang, Anney, et al., 2022), cognitive functioning seems to be predicted well from task-based functional MRI. And Reviewer 2 is right that task-based fMRI is not commonly used in clinics, making it harder to translate our results. However, given our results, clinicians should be encouraged to use task-based fMRI if their goal is to predict cognitive functioning. Nevertheless, as suggested, we now list data scarcity as one of the limitations of our approach.

      Discussion “For instance, the tasks used in task-based fMRI in HCP-A are not used widely in clinical settings (Horien et al., 2020). This might make it challenging to translate the approaches used here.”

      Reviewer 2 Public Review #6:

      This study is valuable and likely to be useful in two main ways. First, it can spur further research aimed at disentangling the lack of correspondence reported between the accuracy of the brain-age model and the brain-age's capacity to explain variance in fluid cognitive ability. Second, the study may serve, at least in part, as an illustration of the potential pros and cons of using indices that are specific and directly related to the outcome variable versus those that are generic and only indirectly related.

      Response We are thankful for the encouragement. For the discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker for fluid cognition, we provide a detailed discussion in our response to Reviewer 3 Public Review #9. More specifically, to ensure that readers can benefit from our findings, we make suggestions on how to ensure the utility of Brain Age indices as a biomarker for other phenotypes by drawing from our own strategy, as well as strategies used by Rokicki and colleagues (2021), Jirsaraie and colleagues (2023) and Bashyam and colleagues (2020).

      As for the pros and cons of generic vs. specific biomarkers, we provide a detailed discussion in our response to Reviewer 3 Public Review #4. We also make some suggestions on how to make use of the difference in ability between generic and specific biomarkers (see Reviewer 2 Public Review #4, above).

      Reviewer 2 Public Review #7:

      Overall, the authors effectively present a clear design and well-structured procedure; however, their work could have been enhanced by providing more context for both the brain-age and brain-cognition indices, including a discussion of key concepts in the brain-age paradigm, which acknowledges that chronological age strongly predicts negative health outcomes, but crucially, recognizes that ageing does not affect everyone uniformly. Capturing this deviation from a healthy norm of ageing is the key brain-age index. This lack of context was mirrored in the presentation of the four brain-age indices provided, as it does not refer to how these indices are used in practice. In fact, there is no mention of a more common way in which brain-age is implemented in statistical analyses, which involves the use of brain-age delta as the variable of interest, along with linear and non-linear terms of age as covariates. The latter is used to account for the regression-to-the-mean effect. The 'corrected brain-age delta' the authors use does not include a non-linear term, which perhaps is an additional reason (besides the one provided by the authors) as to why there may be small, but non-zero, common effects of both age and brain-age in the 'corrected brain-age delta' index commonality analysis. The context for brain-cognition was even more limited, with no reference to any existing literature that has explored direct brain-cognitive markers, such as brain-cognition.

      Response Regarding Brain Age and negative health outcomes, we addressed this in our response to Reviewer 1 Recommendations For The Authors #1 (see below). Briefly, we now (1) discuss the consistency between our findings on fluid cognition and other recent works on negative health outcomes, (2) discuss the differences between Brain Age studies focusing on negative health outcomes vs. cognitive functioning and (3) suggest solutions to optimise the utility of Brain Age for both cognitive functioning and negative health outcomes.

      Regarding how Brain Age was used in practice, we addressed this in our response to Reviewer 3 Public Review #2 (see below). Our argument resonates with Butler and colleagues’ (2021) suggestion that the common practice for Brain Age analysis should be re-evaluated: “The MBAG and performance on the complex cognition tasks were not associated (r = .01, p = 0.71). These results indicate that the association between cognition and the BAG are driven by the association between age and cognitive performance. As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (e.g., Beheshti et al., 2018; Cole, Underwood, et al., 2017; Franke et al., 2015; Gaser et al., 2013; Liem et al., 2017; Nenadić et al., 2017; Steffener et al., 2016). (p. 4097).”

      Importantly, we also implemented “brain-age delta as the variable of interest, along with linear and non-linear terms of age as covariates” in our additional analyses along with other implementations (see Reviewer 2 Recommendations For The Authors #3). Of particular note, we found that adding a non-linear term (i.e., a quadratic term for chronological age) barely changed the results of commonality analyses.

      We have now written the following paragraph to recommend how future research should implement Brain Age:

      Discussion

      “First, they have to be aware of the overlap in variation between Brain Age and chronological age and should focus on the contribution of Brain Age over and above chronological age. Using Brain Age Gap will not fix this. Butler and colleagues (2021) recently highlighted this point, “These results indicate that the association between cognition and the BAG are driven by the association between age and cognitive performance. As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (p. 4097).” Similar to their recommendation (Butler et al., 2021), we suggest future work focus on Corrected Brain Age Gap or, better, unique effects of Brain Age indices after controlling for chronological age in multiple regressions. In the case of fluid cognition, the unique effects might be too small to be clinically meaningful as shown here and previously (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023).”

      Regarding Brain Cognition, we have now expanded our explanation of how it relates to Brain Age and of the predictive performance of Brain Cognition found previously.

      Introduction

      “Third and finally, certain variation in the brain MRI is related to fluid cognition, but to what extent does Brain Age not capture this variation? To estimate the variation in the brain MRI that is related to fluid cognition, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in the brain MRI that is related to fluid cognition and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. Consequently, the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age indicate what is missing from Brain Age -- the amount of co-variation between brain MRI and fluid cognition that cannot be captured by Brain Age.”

      Discussion

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation of brain MRI that is related to fluid cognition. Brain Cognition, from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022).”

      Reviewer 2 Public Review #8:

      While this paper delivers intriguing and thought-provoking results, it would benefit from recognizing the value that both approaches--brain-age indices and more direct, specific markers like brain-cognition--can contribute to the field.

      Response Thank you so much for recognising the value of our work. As we mentioned above in our response to Reviewer 2 Public Review #4 and #6, we made some suggestions on how to make use of the difference in the ability between generic vs specific biomarkers.

      Reviewer 3 (Public Review):

      Reviewer 3 Public Review Overall:

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" While this question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age, the authors are currently missing an opportunity to convey the inevitability of their results, given how brain-age and the brain-age gap are calculated. They also argue that brain-cognition is somehow superior to brain-age, but insufficient evidence is provided in support of this claim.

      Response We addressed the concerns below. The inevitability of our results is not obvious to many researchers who might be interested in Brain Age. We hope our findings might make many issues surrounding Brain Age more obvious, and we now make many suggestions on how to address some of these issues. We no longer argue that Brain Cognition is superior to Brain Age (Reviewer 3 Public Review #4). Rather, we treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. We used the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age to indicate how much Brain Age misses the variation in the brain MRI that could explain fluid cognition.

      Specific comments follow:

      Reviewer 3 Public Review #1:

      • "There are many adjustments proposed to correct for this estimation bias" (p3). Regression to the mean is not a sign of bias. Any decent loss function will result in over-predicting the age of younger individuals and under-predicting the age of older individuals. This is a direct result of minimizing an error term (e.g., mean squared error). Therefore, it is inappropriate to refer to regression to the mean as a sign of bias. This misconception has led to a great deal of inappropriate analyses, including "correcting" the brain age gap by regressing out age.

      Response: Thank you so much for raising this issue. We used the word ‘bias’ following many articles in the field. For instance,

      de Lange and Cole (2020) wrote: “brain-age estimation also involves a frequently observed bias: brain age is overestimated in younger subjects and underestimated in older subjects, while brain age for participants with an age closer to the mean age (of the training dataset) are predicted more accurately (Cole, Le, Kuplicki, McKinney, Yeh, Thompson, Paulus, Investigators, et al., 2018, Liang, Zhang, Niu, 2019, Niu, Zhang, Kounios, Liang, 2019, Smith, Vidaurre, Alfaro-Almagro, Nichols, Miller, 2019).”

      Cole (2020) wrote: “As recent research has highlighted a proportional bias in brain-age calculation, whereby the difference between chronological age and brain-predicted age is negatively correlated with chronological age (Le et al., 2018, Liang et al., 2019, Smith et al., 2019), an age-bias correction procedure was used. This entailed calculating the regression line between age (predictor) and brain-predicted age (outcome) in the training set, then using the slope (i.e., coefficient) and intercept of that line to adjust brain-predicted age values in the testing set (by subtracting the intercept and then dividing by the slope). After applying the age-bias correction the brain-predicted age difference (brain-PAD) was calculated; chronological age subtracted from brain-predicted age.”

      Beheshti and colleagues (2019) used bias in their title: “Bias-adjustment in neuroimaging-based brain age frameworks: a robust scheme”

      More recently, Cumplido-Mayoral and colleagues (2023) wrote: “As recent research has shown that brain-age estimation involves a proportional bias (de Lange et al., 2020a; Le et al., 2018; Liang et al., 2019; Smith et al., 2019), we applied a well-established age-bias correction procedure to our data (de Lange et al., 2020a; Le et al., 2018).”

      Still, we agree with Reviewer 3 that using ‘bias’ might lead to misinterpretation. As Butler and colleagues (2021) pointed out, “It is important to note that regression toward the mean is not a failure, but a feature, of regression and related methods.” We rewrote the paragraph and clarified the “regression towards the mean” issue. We no longer use the word “bias” here:

      Introduction

      “Note researchers often subtract chronological age from Brain Age, creating an index known as Brain Age Gap (Franke & Gaser, 2019). A higher value of Brain Age Gap is thought to reflect accelerated/premature aging. Yet, given that Brain Age Gap is calculated based on both Brain Age and chronological age, Brain Age Gap still depends on chronological age (Butler et al., 2021). If, for instance, Brain Age was based on prediction models with poor performance and made a prediction that everyone was 50 years old, individual differences in Brain Age Gap would then depend solely on chronological age (i.e., 50 minus chronological age). Moreover, Brain Age is known to demonstrate the “regression towards the mean” phenomenon (Stigler, 1997). More specifically, because Brain Age is a predicted value of a regression model that predicts chronological age, Brain Age is usually shrunk towards the mean age of samples used for training the model (Butler et al., 2021; de Lange & Cole, 2020; Le et al., 2018). Accordingly, Brain Age predicts chronological age more accurately for individuals who are closer to the mean age while overestimating younger individuals’ chronological age and underestimating older individuals’ chronological age. There are many adjustments proposed to correct for the age dependency, but the outcomes tend to be similar to each other (Beheshti et al., 2019; de Lange & Cole, 2020; Liang et al., 2019; Smith et al., 2019). These adjustments can be applied to Brain Age and Brain Age Gap, creating Corrected Brain Age and Corrected Brain Age Gap, respectively. Corrected Brain Age Gap in particular is viewed as being able to control for age dependency (Butler et al., 2021). Here, we tested the utility of different Brain Age calculations in capturing fluid cognition, over and above chronological age.”
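The constant-prediction thought experiment in the paragraph above can be checked numerically. The sketch below is illustrative only: the ages are simulated over the 36 to 100 range mentioned for HCP-A, and the "model" is deliberately useless, predicting 50 years for everyone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chronological ages (simulated, not HCP-A data).
chronological_age = rng.uniform(36, 100, size=200)

# A deliberately useless age-prediction model: everyone is predicted to be 50.
brain_age = np.full_like(chronological_age, 50.0)

# Brain Age Gap = Brain Age - chronological age.
brain_age_gap = brain_age - chronological_age

# With a constant prediction the gap is simply 50 minus age, so individual
# differences in the gap are fully determined by chronological age.
r_gap_age = np.corrcoef(brain_age_gap, chronological_age)[0, 1]
print(r_gap_age)
```

The correlation between Brain Age Gap and chronological age is -1 here (up to floating-point error), which is the extreme case of the age dependency that the adjustments described above are meant to reduce.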

      Reviewer 3 Public Review #2:

      • "Corrected Brain Age Gap in particular is viewed as being able to control for both age dependency and estimation biases (Butler et al., 2021)" (p3). This summary is not accurate as Butler and colleagues did not use the words "corrected" and "biases" in this context. All that the authors say in that paper is that regressing out age from the brain age gap - which is referred to as the modified brain age gap (MBAG) - makes it so that the modified brain age gap is not dependent on age, which is true. This metric is meaningless, though, because it is the variance left over after regressing out age from residuals from a model that was predicting age. If it were not for the fact that regression on residuals is not equivalent to multiple regression (and out of sample estimates), MBAG would be a vector of zeros. Upon reading the Methods, I noticed that the authors use a metric from Le et al. (2018) for the "Corrected Brain Age Gap". If they cite the Butler et al. (2021) paper, I highly recommend sticking with the same notation, metrics and terminology throughout. That would greatly help with the interpretability of the present manuscript, and cross-comparisons between the two.

      Response: We thank Reviewer 3 for pointing out the issues surrounding our choices of wording: "corrected" and "biases". We share the same frustration with Reviewer 3 in that different brain-age articles use different terminologies, and we tried to make sure our readers understand our calculations of Brain Age indices in order to compare our results with previous work.

      We commented on the word “bias” in our response to Reviewer 3 Public Review #1 above and refrain from using this word in the revised manuscript. Here we comment on the use of the term “Corrected Brain Age Gap” and, in doing so, clarify how we calculated it.

      Reviewer 3 is right that we cited the work of Butler and colleagues (2021), but it is not accurate to say that we used “a metric from Le et al. (2018) for the ‘Corrected Brain Age Gap’”. Instead, we used a method described in de Lange and Cole’s (2020) work. We have now added equations to explain this method in our Materials and Methods section (see below).

      It is important to note that Butler and colleagues (2021) did not come up with any adjustment methods. Instead, Butler and colleagues (2021) discussed three adjustment methods:

      1) A method proposed by Beheshti and colleagues (2019). Butler and colleagues (2021) called the result of this method Modified Brain Age Gap (MBAG). Importantly, Butler and colleagues (2021) discouraged the use of this method due to “researchers misinterpreting the reduced variability of the MBAG as an improvement in prediction accuracy.” Accordingly, in our article, we performed methods (2) and (3) below.

      2) A method proposed by de Lange and Cole (2020). We used this method in our article (see below for the equations). Briefly, we first fit a regression line predicting Brain Age from chronological age in each training set. We then used the slope and intercept of this regression line to adjust Brain Age in the corresponding test set, resulting in an adjusted index of Brain Age. Butler and colleagues (2021) called this index “Revised Predicted Age”, while de Lange and Cole (2020) originally called it “Corrected Predicted Age”. Butler and colleagues (2021) then subtracted the chronological age from this index and called the result “Revised Brain Age Gap (RBAG)”. We would have liked to follow the original terminology, but we did not want to use the term “Predicted Age”, since chronological age can be predicted by other variables beyond the brain. We therefore settled on the terms “Corrected Brain Age” and “Corrected Brain Age Gap”. We listed the terminologies used in the past in our article (see below).

      3) A method proposed by Le and colleagues (2018). Here, Butler and colleagues (2021) referred to one of the approaches done by Le and colleagues: “include age as a regressor when doing follow-up analyses.” Essentially this is what we did for the commonality analysis. Le and colleagues (2018)’ approach is the same as examining the unique effects of Brain Age in a multiple regression analysis with Chronological Age and Brain Age as regressors.

      While indexes from de Lange and Cole’s (2020) and Le and colleagues’ (2018) methods show poor performance in capturing fluid cognition in the current work, we need to stress that many research groups do not believe that these methods are meaningless. In fact, de Lange and Cole’s method (2020) is one of the most commonly implemented methods that can be seen elsewhere (e.g., Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022). This index just does not seem to work well in the case of fluid cognition.

      Here is how we described how we calculated Brain Age indexes in the revised manuscript:

      Methods

      “Brain Age calculations: Brain Age, Brain Age Gap, Corrected Brain Age and Corrected Brain Age Gap

      In addition to Brain Age, which is the predicted value from the models predicting chronological age in the test sets, we calculated three other indices to reflect the estimation of brain aging. First, Brain Age Gap reflects the difference between the age predicted by brain MRI and the actual, chronological age. Here we simply subtracted the chronological age from Brain Age:

      $\text{Brain Age Gap}_i = \text{Brain Age}_i - \text{chronological age}_i$, (2)

      where i is the individual. Next, to reduce the dependency on chronological age (Butler et al., 2021; de Lange & Cole, 2020; Le et al., 2018), we applied a method described by de Lange and Cole (2020), which has been implemented elsewhere (Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022):

      In each outer-fold training set: $\text{Brain Age}_i = \beta_0 + \beta_1 \, \text{chronological age}_i + \varepsilon_i$, (3)

      Then in the corresponding outer-fold test set: $\text{Corrected Brain Age}_i = (\text{Brain Age}_i - \beta_0) / \beta_1$, (4)

      That is, we first fit a regression line predicting the Brain Age from a chronological age in each outer-fold training set. We then used the slope ($\beta_1$) and intercept ($\beta_0$) of this regression line to adjust Brain Age in the corresponding outer-fold test set, resulting in Corrected Brain Age. Note de Lange and Cole (2020) called this Corrected Brain Age, “Corrected Predicted Age”, while Butler (2021) called it “Revised Predicted Age.”

      Lastly, we computed Corrected Brain Age Gap by subtracting the chronological age from the Corrected Brain Age (Butler et al., 2021; Cole et al., 2020; de Lange & Cole, 2020; Denissen et al., 2022):

      $\text{Corrected Brain Age Gap}_i = \text{Corrected Brain Age}_i - \text{chronological age}_i$, (5)

      Note Cole and colleagues (2020) called Corrected Brain Age Gap, “brain-predicted age difference (brain-PAD),” while Butler and colleagues (2021) called this index, “Revised Brain Age Gap”.”
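      The four indices defined in Equations 2 to 5 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `fit_age_bias` and `brain_age_indices` are hypothetical, the data are synthetic, and `np.polyfit` stands in for whatever regression routine was actually used. The key point is that the slope and intercept come from the training set and are then applied to the corresponding test set.

```python
import numpy as np

def fit_age_bias(brain_age_train, age_train):
    """Eq. 3: regress Brain Age on chronological age in a training set."""
    beta1, beta0 = np.polyfit(age_train, brain_age_train, deg=1)  # slope first
    return beta0, beta1

def brain_age_indices(brain_age_test, age_test, beta0, beta1):
    """Eqs. 2, 4 and 5, applied to the corresponding test set."""
    gap = brain_age_test - age_test                # Eq. 2: Brain Age Gap
    corrected = (brain_age_test - beta0) / beta1   # Eq. 4: Corrected Brain Age
    corrected_gap = corrected - age_test           # Eq. 5: Corrected Brain Age Gap
    return gap, corrected, corrected_gap

# Synthetic illustration (not study data): predictions compressed towards the mean.
rng = np.random.default_rng(0)
age_train = rng.uniform(36, 100, 400)
age_test = rng.uniform(36, 100, 100)
brain_age_train = 20 + 0.7 * age_train + rng.normal(0, 5, 400)
brain_age_test = 20 + 0.7 * age_test + rng.normal(0, 5, 100)

beta0, beta1 = fit_age_bias(brain_age_train, age_train)
gap, corrected, corrected_gap = brain_age_indices(brain_age_test, age_test, beta0, beta1)
```

      In this synthetic example, Brain Age Gap is negatively related to chronological age because the predictions are compressed towards the mean, whereas Corrected Brain Age Gap is much less dependent on chronological age.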

      Reviewer 3 Public Review #3:

      • "However, the improvement in predicting chronological age may not necessarily make Brain Age to be better at capturing Cognition_fluid. If, for instance, the age-prediction model had the perfect performance, Brain Age Gap would be exactly zero and would have no utility in capturing Cognition_fluid beyond chronological age" (p3). I largely agree with this statement. I would be really careful to distinguish between brain-age and the brain-age gap here, as the former is a predicted value, and the latter is the residual times -1 (i.e., predicted age - age). Therefore, together they explain all of the variance in age. Changing the first sentence to refer to the brain-age gap would be more accurate in this context. The brain-age gap will never be exactly zero, though, even with perfect prediction on the training set, because subjects in the testing set are different from the subjects in the training set.

      Response: Thank you so much for pointing this out. We agree to change “Brain Age” to “Brain Age Gap” in the mentioned sentence.

      Reviewer 3 Public Review #4:

      • "Can we further improve our ability to capture the decline in cognition_fluid by using, not only Brain Age and chronological age, but also another biomarker, Brain Cognition?". This question is fundamentally getting at whether a predicted value of cognition can predict cognition. Assuming the brain parameters can predict cognition decently, and the original cognitive measure that you were predicting is related to your measure of fluid cognition, the answer should be yes. Upon reading the Methods, it became clear that the cognitive variable in the model predicting cognition using brain features (to get predicted cognition, or as the authors refer to it, brain-cognition) is the same as the measure of fluid cognition that you are trying to assess how well brain-cognition can predict. Assuming the brain parameters can predict fluid cognition at all, it is then inevitable that brain-cognition will predict fluid cognition. Therefore, it is inappropriate to use predicted values of a variable to predict the same variable.

      Response: Thank you, Reviewer 3, for pointing out this important issue. Reviewer 1 (Recommendations For The Authors #4) and Reviewer 2 (Public Review #4) brought up a similar issue. While Reviewer 3 felt that “it is inappropriate to use predicted values of a variable to predict the same variable,” Reviewer 2 viewed Brain Cognition as a 'specific/direct' index and Brain Age as a 'generic/indirect' index, both of which have merit in their own right.

      Similar to Reviewer 2, we believe that the specific index is just as important and has commonly been used elsewhere in the context of biomarkers. For instance, to obtain neuroimaging biomarkers for Alzheimer’s, neuroimaging researchers often build a predictive model to predict Alzheimer's diagnosis (Khojaste-Sarakhsi et al., 2022). In fact, outside of neuroimaging, polygenic risk scores (PRSs) in genomics often follow the same logic of using “predicted values of a variable to predict the same variable” (Choi et al., 2020). For instance, a PRS of ADHD that indicates the genetic liability to develop ADHD is based on genome-wide association studies of ADHD (Demontis et al., 2019).

      Still, we now agree that it may not be fair to compare the performance of a specific index (Brain Cognition) and a generic index (Brain Age) directly (as pointed out in Reviewer 3 Public Review #6 below). Accordingly, in the revision, as opposed to treating Brain Cognition and Brain Age as separate biomarkers and comparing them, we treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. In other words, the strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in the brain MRI that is related to fluid cognition. Consequently, the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age indicate what is missing from Brain Age -- the amount of co-variation between brain MRI and fluid cognition that cannot be captured by Brain Age. According to Reviewer 2, a generic index (Brain Age) “sacrificed some additional explained variance provided” compared to a specific index (Brain Cognition). Here, we used the commonality analyses to quantify how much Brain Age sacrificed. See below for the re-conceptualisation of Brain Age vs. Brain Cognition in the revision:

      Abstract

      “Lastly, we tested how much Brain Age missed the variation in the brain MRI that could explain fluid cognition. To capture this variation in the brain MRI that explained fluid cognition, we computed Brain Cognition, or a predicted value based on prediction models built to directly predict fluid cognition (as opposed to chronological age) from brain MRI data. We found that Brain Cognition captured up to an additional 11% of the total variation in fluid cognition that was missing from the model with only Brain Age and chronological age, leading to around a 1/3-time improvement of the total variation explained.”

      Introduction:

      “Third and finally, certain variation in the brain MRI is related to fluid cognition, but to what extent does Brain Age not capture this variation? To estimate the variation in the brain MRI that is related to fluid cognition, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in the brain MRI that is related to fluid cognition and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. Consequently, the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age indicate what is missing from Brain Age -- the amount of co-variation between brain MRI and fluid cognition that cannot be captured by Brain Age.”

      “Finally, we investigated the extent to which Brain Age indices missed the variation in the brain MRI that could explain fluid cognition. Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition.“

      Discussion

      “Third, how much does Brain Age miss the variation in the brain MRI that could explain fluid cognition? Brain Age and chronological age by themselves captured around 32% of the total variation in fluid cognition. But, around an additional 11% of the variation in fluid cognition could have been captured if we used the prediction models that directly predicted fluid cognition from brain MRI.”

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation of brain MRI that is related to fluid cognition. Brain Cognition, from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. The unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”

      Reviewer 3 Public Review #5:

      • "However, Brain Age Gap created from the lower-performing age-prediction models explained a higher amount of variation in Cognition_fluid. For instance, the top performing age-prediction model, "Stacked: All excluding Task Contrast", generated Brain Age and Corrected Brain Age that explained the highest amount of variation in Cognition_fluid, but, at the same time, produced Brain Age Gap that explained the least amount of variation in Cognition_fluid" (p7). This is an inevitable consequence of the following relationship between predicted values and residuals (or residuals times -1): y = (y - ŷ) + ŷ. Let's say that age explains 60% of the variance in fluid cognition, and predicted age (ŷ) explains 40% of the variance in fluid cognition. Then the brain age gap (-(y - ŷ)) should explain 20% of the variance in fluid cognition. If by "Corrected Brain Age" you mean the modified predicted age from Butler et al (2021), the "Corrected Brain Age" result is inevitable because the modified predicted age is essentially just age with a tiny bit of noise added to it. From Figure 4, though, this does not seem to be the case, because the lower left quadrant in panel (a) should be flat and high (about as high as the predictive value of age for fluid cognition). So it is unclear how "Corrected Brain Age" is calculated. It looks like you might be regressing age out of brain-age, though from your description in the Methods section, it is not totally clear. Again, I highly recommend using the terminology and metrics of Butler et al (2021) throughout to reduce confusion. Please also clarify how you used the slope and intercept. In general, given how brain-age metrics tend to be calculated, the following conclusion is inevitable: "As before, the unique effects of Brain Age indices were all relatively small across the four Brain Age indices and across different prediction models" (p10).

      Response: We agree that the results are ‘inevitable’ due to the transformations from Brain Age to other Brain Age indices. However, the consequences of these transformations may not be very clear to readers who are not very familiar with the Brain Age literature, or to the community at large who think about the implications of Brain Age. This is appreciated by Reviewer 1, who mentioned “While the main message will not come as a surprise to anyone with hands-on experience of using brain-age models, I think it is nonetheless an important message to convey to the community.”

      Note we made clarifications on how we calculated each of the Brain Age indices above (see Reviewer 3 Public Review #2), including how we used the slope and intercept. We chose the terminology closer to the one originally used by de Lange and Cole (2020) and now list the many terminologies others have used to refer to this transformation.

      Reviewer 3 Public Review #6:

      "On the contrary, the unique effects of Brain Cognition appeared much larger" (p10). This is not a fair comparison if you do not look at the unique effects above and beyond the cognitive variable you predicted in your brain-cognition model. If your outcome measure had been another metric of cognition other than fluid cognition, you would see that brain-cognition does not explain any additional variance in this outcome when you include fluid cognition in the model, just as brain-age would not when including age in the model (minus small amounts due to penalization and out-of-sample estimates). This highlights the fact that using a predicted value to predict anything is worse than using the value itself.

      Response: Please see our response to Reviewer 3 Public Review #4 above. Briefly, we no longer make this comparison. Instead, we now view the unique effects of Brain Cognition as a way to test how much Brain Age missed the variation in the brain MRI that could explain fluid cognition.

      Reviewer 3 Public Review #7:

      "First, how much does Brain Age add to what is already captured by chronological age? The short answer is very little" (p12). This is a really important point, but the paper requires an in-depth discussion of the inevitability of this result, as discussed above.

      Response: We agree that the tight relationship between Brain Age and chronological age is inevitable. We mentioned this from the get-go in the introduction:

      Introduction: “Accordingly, by design, Brain Age is tightly close to chronological age. Because chronological age usually has a strong relationship with fluid cognition, to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      To make this point obvious, we quantified the overlap between Brain Age and chronological age using the commonality analysis. We hope that our effort to show the inevitability of this overlap can make people more careful when designing studies involving Brain Age.

      Reviewer 3 Public Review #8:

      "Third, do we have a solution that can improve our ability to capture Cognitionfluid from brain MRI? The answer is, fortunately, yes. Using Brain Cognition as a biomarker, along with chronological age, seemed to capture a higher amount of variation in Cognitionfluid than only using Brain Age" (p12). I suggest controlling for the cognitive measure you predicted in your brain-cognition model. This will show that brain-cognition is not useful above and beyond cognition, highlighting the fact that it is not a useful endeavor to be using predicted values.

      Response: This point is similar to Reviewer 3 Public Review #6. Again, please see our response to Reviewer 3 Public Review #4 above. Briefly, we no longer make this comparison or say whether Brain Cognition is ‘better’ than Brain Age. Instead, we now view the unique effects of Brain Cognition as a way to test how much Brain Age missed the variation in the brain MRI that could explain fluid cognition.

      Reviewer 3 Public Review #9:

      "Accordingly, a race to improve the performance of age-prediction models (Baecker et al., 2021) does not necessarily enhance the utility of Brain Age indices as a biomarker for Cognitionfluid. This calls for a new paradigm. Future research should aim to build prediction models for Brian Age indices that are not necessarily good at predicting age, but at capturing phenotypes of interest, such as Cognitionfluid and beyond" (p13). I whole-heartedly agree with the first two sentences, but strongly disagree with the last. Certainly your results, and the underlying reason as to why you found these results, calls for a new paradigm (or, one might argue, a pre-brain-age paradigm). As of now, your results do not suggest that researchers should keep going down the brain-age path. While it is difficult to prove that there is no transformation of brain-age or the brain-age gap that will be useful, I am nearly sure this is true from the research I have done. If you would like to suggest that the field should continue down this path, I suggest presenting a very good case to support this view.

      Response: Thank you for your comments on this issue.

      Since the submission of our manuscript, other researchers have made a similar observation regarding the disagreement between the predictive performance of age-prediction models and the utility of Brain Age. For instance, in their systematic review, Jirsaraie and colleagues (2023, p7) wrote, “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest. As a point of illustration, seven of the twenty studies in this review only evaluated the utility of their most accurate model, which in all cases was trained using multimodal features. This approach has also led to researchers to exclusively use T1-weighted and diffusion-weighted MRI scans when developing brain age models [36] since such modalities have been shown to have the largest contribution to a model’s predictive power [2, 67]. However, our review suggests that model accuracy does not necessarily provide meaningful insight about clinical utility (e.g., detection of age-related pathology). Taken with prior studies [16, 17], it appears that the most accurate models tend to not be the most useful.”

      We now discuss the disagreement between the predictive performance of age-prediction models and the utility of Brain Age, not only in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) but also in the context of neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). Following Reviewer 3’s suggestion, we also added several possible strategies to mitigate this problem of Brain Age, used by us and other groups. Please see below.

      Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often lead to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder.

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 (Recommendations For The Authors):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline using the HCP aging dataset by performing a commonality analysis in a downstream regression. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain-cognition') as an alternative that explains more unique variance in the downstream regression.

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. While the main message will not come as a surprise to anyone with hands-on experience of using brain-age models, I think it is nonetheless an important message to convey to the community. With that said, I have some comments that I believe the authors ought to address before publication.

      Reviewer 1 Recommendations For The Authors #1:

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. This is undeniably important, but is only one application area for brain age models. They are also used for example to provide biomarkers for many brain disorders. What would the results presented here have to say about these application areas? Further, I think that since brain-age models by construction confound relevant biological variation with the accuracy of the regression models used to estimate them, my own opinion about the limits of interpretation of (e.g.) the brain-age gap is as a dimensionless biomarker. This has also been discussed elsewhere (see e.g. https://academic.oup.com/brain/article/143/7/2312/5863667). I would suggest the authors nuance their discussion to provide considerations on these issues.

      Response: Thank you, Reviewer 1, for pointing out two important issues.

      The first issue was about applications for brain disorders. We now discuss this in detail, which also addresses Reviewer 3 Public Review #9. Briefly, we now bring up

      1) the consistency between our findings on fluid cognition and other recent works on brain disorders,

      2) the possibility that age-prediction models from Brain Age studies focusing on neurological/psychological disorders are under-fitted when applied to participants with neurological/psychological disorders, because the age-prediction models were built from largely healthy participants,

      and 3) solutions that we and others have suggested to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often lead to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). That is, those Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. This means that age-prediction models from Brain Age studies focusing on neurological/psychological disorders might be under-fitted when applied to participants with neurological/psychological disorders because they were built from largely healthy participants. And thus, the difference in Brain Age indices between participants without vs. with neurological/psychological disorders might be confounded by the under-fitted age-prediction models (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other Brain Age studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning do not suffer from being under-fitted. We consider this as a strength, not a weakness of our study.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder. As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. 
      In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      The second issue was about “the brain-age gap as a dimensionless biomarker.” We are not entirely clear on what the reviewer meant by “the dimensionless biomarker.” One possible meaning is that Brain Age from the same algorithm and the same modality can be computed such that it fits chronological age either tightly or loosely. This is what Bashyam and colleagues (2020) did in the article Reviewer 1 referred to. We now describe this strategy in the above paragraph in the Discussion.

      Alternatively, “the dimensionless biomarker” might be closer to Reviewer 2’s view of Brain Age as a “generic/indirect” index (as opposed to a 'specific/direct' index in the case of Brain Cognition) (see Reviewer 2 Public Review #4). We discussed this in our response to Reviewer 3 Public Review #4.

      Reviewer 1 Recommendations For The Authors #2:

      Second, from a methods perspective, I am quite suspicious of the stacked regression models the authors are using to combine regression models and I suspect they may be overfit. In my experience, stacked models are very prone to overfitting when combined with cross-validation. This is because the predictions from the first level models (i.e., the features that are provided to the second-level 'stacked' models) contain information about the training set and the test set. If cross-validation is not done very carefully (e.g. using multiple hold-out sets), information leakage can easily occur at the second level. Unfortunately, there is not sufficient explanation of the methodological procedures in the current manuscript to fully understand what was done. First, please provide more information to enable the reader to better understand the stacked regression models and if the authors are not using an approach that fully preserves training and test separability, please do so.

      Response: We would like to thank Reviewer 1 for the suggestion. We have now made it clearer in the text and in a new figure (see below) that we used nested cross-validation to ensure no information leakage between training and test sets. Regarding the stacked models more specifically, the hyperparameters of the stacked models were tuned in the same inner-fold CV as the non-stacked models (see Figure 7 below). That is, training both the non-stacked and stacked models did not involve the test set, ensuring that there was no data leakage between training and test sets.

      Methods:

      “To compute Brain Age and Brain Cognition, we ran two separate prediction models. These prediction models either had chronological age or fluid cognition as the target and standardised brain MRI as the features (Denissen et al., 2022). We used nested cross-validation (CV) to build these models (see Figure 7). We first split the data into five outer folds. We used five outer folds so that each outer fold had around 100 participants. This is to ensure the stability of the test performance across folds. In each outer-fold CV, one of the outer folds was treated as a test set, and the rest was treated as a training set, which was further divided into five inner folds. In each inner-fold CV, one of the inner folds was treated as a validation set and the rest was treated as a training set. We used the inner-fold CV to tune for hyperparameters of the models and the outer-fold CV to evaluate the predictive performance of the models.

      In addition to using each of the 18 sets of features in separate prediction models, we drew information across these sets via stacking. Specifically, we computed predicted values from each of the 18 sets of features in the training sets. We then treated different combinations of these predicted values as features to predict the targets in separate “stacked” models. The hyperparameters of the stacked models were tuned in the same inner-fold CV as the non-stacked models (see Figure 7). That is, training both the non-stacked and the stacked models did not involve the test set, ensuring that there was no data leakage between training and test sets. We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, in total, there were 26 prediction models for Brain Age and Brain Cognition.”
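
      As an illustration only (not the authors' code), the nested cross-validation with stacking described above could be sketched as follows. The function names, the small hyperparameter grid, the scoring metric and the use of in-sample training-set predictions as second-level features are all assumptions made for this sketch; the key property it shows is that the outer-fold test rows are never used for fitting or tuning at either level.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold


def tuned_elastic_net(X, y, seed=0):
    """Tune Elastic Net with a five-fold inner CV on the rows it is given."""
    # Deliberately small grid to keep the sketch fast (an assumption).
    grid = {"alpha": np.logspace(-1, 2, 5), "l1_ratio": [0.1, 0.5, 0.9]}
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(
        ElasticNet(max_iter=10_000),
        grid,
        cv=inner,
        scoring="neg_mean_absolute_error",
    )
    return search.fit(X, y)


def nested_cv_stacking(feature_sets, y, seed=0):
    """Return one out-of-sample stacked prediction per participant."""
    outer = KFold(n_splits=5, shuffle=True, random_state=seed)
    stacked_pred = np.full(len(y), np.nan)
    for train, test in outer.split(feature_sets[0]):
        # First level: one tuned model per feature set, training rows only.
        first = [tuned_elastic_net(X[train], y[train], seed) for X in feature_sets]
        z_train = np.column_stack(
            [m.predict(X[train]) for m, X in zip(first, feature_sets)]
        )
        z_test = np.column_stack(
            [m.predict(X[test]) for m, X in zip(first, feature_sets)]
        )
        # Second level: the stacked model sees training-set predictions only.
        second = tuned_elastic_net(z_train, y[train], seed)
        stacked_pred[test] = second.predict(z_test)
    return stacked_pred
```

      Calling `nested_cv_stacking` with a list of feature matrices (one per set of features) and a target vector returns predictions that are all out-of-sample with respect to the models that produced them.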

      Reviewer 1 Recommendations For The Authors #3:

      Third, the authors standardize the elastic net regression coefficients post-hoc. Why did the authors not perform the more standard approach of standardizing the covariates and responses, prior to model estimation, which would yield standardized regression coefficients (in the classical sense) by construction? Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      Response For model fitting, we did not “standardize the elastic net regression coefficients post-hoc.” Instead, we did all of the standardisation steps prior to model fitting (see Methods below). For regression strengths across different models and cross-validation splits, we now provided predictive performance at each of the five outer-fold test sets in Figure 1 (below). As you may have seen, the predictive performance was quite stable across the cross-validation splits.

      For visualising feature importance, we originally only standardised the elastic net regression coefficients post-hoc, so that feature importance plots were on the same scale across folds. However, as mentioned by Reviewer 3 (Recommendations for the Authors #7, below), this might make it difficult to interpret the directionality of the coefficients. In the revised manuscript, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images (see below).

      Methods

      “We controlled for the potential influences of biological sex on the brain features by first residualising biological sex from brain features in each outer-fold training set. We then applied the regression of this residualisation to the corresponding test set. We also standardised the brain features in each outer-fold training set and then used the mean and standard deviation of this outer-fold training set to standardise the test set. All of the standardisation was done prior to fitting the prediction models.”
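
      A minimal sketch of this fold-wise preprocessing, offered as an illustration rather than the authors' implementation: the function name is hypothetical, and the order (residualise first, then z-score the residuals) is an assumption. Both the sex regression and the scaling parameters are estimated on the training set and then applied unchanged to the test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def residualise_and_standardise(X_train, X_test, sex_train, sex_test):
    """Remove sex effects and z-score features using training-set estimates only."""
    s_train = np.asarray(sex_train, dtype=float).reshape(-1, 1)
    s_test = np.asarray(sex_test, dtype=float).reshape(-1, 1)
    # One regression per brain feature (multi-output), fitted on the training set.
    sex_model = LinearRegression().fit(s_train, X_train)
    r_train = X_train - sex_model.predict(s_train)
    r_test = X_test - sex_model.predict(s_test)
    # Mean and standard deviation come from the training residuals.
    scaler = StandardScaler().fit(r_train)
    return scaler.transform(r_train), scaler.transform(r_test)
```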

      “To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still indicate that a brain feature with a higher magnitude weighs relatively more strongly in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross validation, different outer folds may have different degrees of ‘α’ and ‘l_1 ratio’, making the final coefficients differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images using the Brainspace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note that, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we first multiplied the absolute PCA scores (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) with Elastic Net coefficients and then summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
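
      The back-projection from PCA components to ROI pairs described above can be sketched as below. This is an illustrative reading of the text, not the authors' code: the function name is hypothetical, `components` is assumed to have the shape of `PCA.components_` (components by ROI pairs), and `coefs` is assumed to hold one Elastic Net coefficient per component.

```python
import numpy as np


def roi_pair_importance(components, coefs):
    """Map component-level coefficients back to ROI pairs.

    components: array of shape (n_components, n_roi_pairs), as in PCA.components_
    coefs: array of shape (n_components,), one Elastic Net coefficient per component
    """
    components = np.asarray(components, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    # Weight each component's absolute loadings by its coefficient,
    # then sum over components to get one value per ROI pair.
    return (np.abs(components) * coefs[:, None]).sum(axis=0)
```

      With 75 components and 71,631 ROI pairs, the input would have shape (75, 71631) and the output would have shape (71631,).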

      Reviewer 1 Recommendations For The Authors #4:

      I do not really find it surprising that the level of unique explained variance provided by a brain-cognition model is higher than a brain-age model, given that the latter is considerably more accurate (also, in view of the comment above). As such I would recommend to tone down the claims about the utility of this method, also because it is only really applicable to one application area for brain age.

      Response Thank you for bringing this issue to our attention. We have now toned down the claims about the utility of Brain Cognition and importantly treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. Please see Reviewer 3 Public Review #4 above for a detailed discussion about this issue.

      Reviewer 1 Recommendations For The Authors #5:

      Please provide more details about the task designs and MRI processing procedures that were employed on this sample so that the reader is not forced to dig through the publications from the consortia contributing the data samples used. For example, comments such as "Here we focused on the pre-processed task fMRI files with a suffix "_PA_Atlas_MSMAll_hp0_clean.dtseries.nii." are not particularly helpful to readers not already familiar with this dataset.

      Response Thank you so much for pointing out this important point on the clarity of the description of our MRI methodology. We have now added additional details about the data processing done by the HCP-A and by us. We, for instance, explained the meaning of the HCP-A suffix “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii”. Please see below.

      Methods

      “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 18 sets of features.

      Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with a suffix, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linear detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned from potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high pass cutoff at 200s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Glasser et al., 2016) for cortical surface regions and Freesurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions, whose number was, in turn, the number of features for each Task Contrast set of features.

      HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009). First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button to all [Go] but not to two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.

      Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii” as the task contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors in FSL and regressed them out in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a highpass at .008 Hz. For parcellation, we used the same atlases as Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations of each pair of 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied r-to-z transformation and principal component analysis (PCA) of 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied its definition to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.”
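
      For illustration, the steps from parcellated time series to PCA-reduced FC features could look like the sketch below. It is not the authors' pipeline: the function names are hypothetical, and the inputs are assumed to be already denoised residual time series (time points by regions). The count of 71,631 follows from 379 × 378 / 2 unique region pairs.

```python
import numpy as np
from sklearn.decomposition import PCA

N_REGIONS = 379
N_PAIRS = N_REGIONS * (N_REGIONS - 1) // 2  # unique region pairs


def fc_vector(timeseries):
    """timeseries: (n_timepoints, n_regions) -> Fisher-z FC for each unique pair."""
    r = np.corrcoef(timeseries, rowvar=False)   # region-by-region Pearson r
    upper = np.triu_indices_from(r, k=1)        # drop the diagonal and duplicates
    return np.arctanh(r[upper])                 # r-to-z transformation


def reduce_fc(fc_train, fc_test, n_components=75):
    """Fit PCA on training participants only, then apply it to the test set."""
    pca = PCA(n_components=n_components).fit(fc_train)
    return pca.transform(fc_train), pca.transform(fc_test)
```

      Fitting the PCA inside `reduce_fc` on `fc_train` alone is what keeps test participants from influencing the component definitions.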

      Reviewer 1 Recommendations For The Authors #6:

      Similarly, please be more specific about the regression methods used. There are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted. The same goes for the methods used for correcting bias, e.g. what is "de Lange and Cole's (2020) 5th equation"?

      Response Thank you. We now made a detailed description of Elastic Net including its equation (see below). We also added more specific details about the methods used for correcting bias in Brain Age indices (see our response to Reviewer 3 Public Review #2 above).

      Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regressions (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par, if not better than, many non-linear and more-complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us the ability to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net simultaneously minimises the weighted sum of the features’ coefficients. The degree of penalty to the sum of the features’ coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘l_1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; l_1 ratio=0) or absolute (known as ‘Lasso’; l_1 ratio=1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

      $$\underset{w}{\operatorname{argmin}} \left( \frac{\lVert y - Xw \rVert_2^2}{2 \times n_{\mathrm{samples}}} + \alpha \times l_1\,\mathrm{ratio} \times \lVert w \rVert_1 + 0.5 \times \alpha \times (1 - l_1\,\mathrm{ratio}) \times \lVert w \rVert_2^2 \right), \quad (1)$$

      where X is the features, y is the target, and w is the coefficients. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and l_1 ratio using 25 numbers in linear space, ranging from 0 to 1.”
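
      As a hedged illustration of the objective and the grid described above (not the authors' code): `np.logspace` and `np.linspace` are assumed here as the way the 70 and 25 grid values were generated, and `elastic_net_objective` is a hypothetical helper that evaluates the sklearn-style objective for a given coefficient vector. The short demo uses synthetic data and checks that the fitted coefficients give a lower objective value than a shifted copy.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Hyperparameter grid (generation method is an assumption).
alphas = np.logspace(-1, 2, 70)    # 70 values from 0.1 to 100 in log space
l1_ratios = np.linspace(0, 1, 25)  # 25 values from 0 (Ridge) to 1 (Lasso)


def elastic_net_objective(X, y, w, alpha, l1_ratio, intercept=0.0):
    """Evaluate the Elastic Net objective for coefficients w."""
    n_samples = X.shape[0]
    resid = y - X @ w - intercept
    squared_error = (resid @ resid) / (2 * n_samples)
    l1_penalty = alpha * l1_ratio * np.abs(w).sum()
    l2_penalty = 0.5 * alpha * (1 - l1_ratio) * (w @ w)
    return squared_error + l1_penalty + l2_penalty


# Demo on synthetic data: the fitted coefficients should minimise the objective.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)
model = ElasticNet(alpha=0.5, l1_ratio=0.5, fit_intercept=False,
                   tol=1e-10, max_iter=100_000).fit(X, y)
fitted = elastic_net_objective(X, y, model.coef_, 0.5, 0.5)
shifted = elastic_net_objective(X, y, model.coef_ + 0.1, 0.5, 0.5)
```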

      Additional minor points:

      Reviewer 1 Recommendations For The Authors #7:

      • Please provide more descriptive figure legends, especially for Figs 5 and 6. For example, what do the boldface numbers reflect? What do the asterisks reflect?

      Response Thank you for the suggestion. We made changes to the figure legends to make it clearer what the numbers and asterisks reflect.

      Reviewer 1 Recommendations For The Authors #8:

      • Perhaps this is personal thing, but I find the nomenclature cognition_{fluid} to be quite awkward. Why not just define FC as an acronym?

      Response Thank you for the suggestion. We now used the word ‘fluid cognition’ throughout the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      Reviewer 2 Recommendations For The Authors #1:

      • Since the study did not provide external validation for the indices, it is unclear how well the models would perform and generalize to other samples. Therefore, it is recommended to conduct out-of-sample testing of the models.

      Response Thank you for the suggestion. We have now added discussions about the consistency between our results and several recent studies that investigated similar issues with Brain Age in different populations, e.g., large samples of older adults in the UK Biobank (Cole, 2020) and younger populations (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023), and in a broader context, extending to neurological and psychological disorders (for review, see Jirsaraie, Gorelik, et al., 2023). Please see below.

      Please also note that all of the analyses done were out-of-sample. We used nested cross-validation to evaluate the predictive performance of age- and cognition-prediction models on the outer-fold test sets, which are out-of-sample from the training sets (please see Reviewer 1 Recommendations For The Authors #2). Similarly, we also conducted all of the commonality analyses on the outer-fold test sets.

      Discussion

      “The small effects of the Corrected Brain Age Gap in explaining fluid cognition of aging individuals found here are consistent with studies in older adults (Cole, 2020) and younger populations (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023). Cole (2020) studied the utility of Brain Age on cognitive functioning of large samples (n>17,000) of older adults, aged 45-80 years, from the UK Biobank (Sudlow et al., 2015). He constructed age-prediction models using LASSO, a similar penalised regression to ours and applied the same age-dependency adjustment to ours. Cole (2020) then conducted a multiple regression explaining cognitive functioning from Corrected Brain Age Gap while controlling for chronological age and other potential confounds. He found Corrected Brain Age Gap to be significantly related to performance in four out of six cognitive measures, and among those significant relationships, the effect sizes were small with a maximum of partial eta-squared at .0059. Similarly, Jirsaraie and colleagues (2023) studied the utility of Brain Age on cognitive functioning of youths aged 8-22 years old from the Human Connectome Project in Development (Somerville et al., 2018) and Preschool Depression Study (Luby, 2010). They built age-prediction models using gradient tree boosting (GTB) and deep-learning brain network (DBN) and adjusted the age dependency of Brain Age Gap using Smith and colleagues’ (2019) method. Using multiple regressions, Jirsaraie and colleagues (2023) found weak effects of the adjusted Brain Age Gap on cognitive functioning across five cognitive tasks, five age-prediction models and the two datasets (mean of standardised regression coefficient = -0.09, see their Table S7). Next, Butler and colleagues (2021) studied the utility of Brain Age on cognitive functioning of another group of youths aged 8-22 years old from the Philadelphia Neurodevelopmental Cohort (PNC) (Satterthwaite et al., 2016). 
Here they used Elastic Net to build age-prediction models and applied another age-dependency adjustment method, proposed by Beheshti and colleagues (2019). Similar to the aforementioned results, Butler and colleagues (2021) found a weak, statistically non-significant correlation between the adjusted Brain Age Gap and cognitive functioning at r=-.01, p=.71. Accordingly, the utility of Brain Age in explaining cognitive functioning beyond chronological age appears to be weak across age groups, different predictive modelling algorithms and age-dependency adjustments.”

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation of brain MRI that is related to fluid cognition. Brain Cognition, from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. The unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). That is, those Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. This means that age-prediction models from Brain Age studies focusing on neurological/psychological disorders might be under-fitted when applied to participants with neurological/psychological disorders because they were built from largely healthy participants. And thus, the difference in Brain Age indices between participants without vs. with neurological/psychological disorders might be confounded by the under-fitted age-prediction models (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other Brain Age studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning do not suffer from being under-fitted. We consider this as a strength, not a weakness of our study.”

      Reviewer 2 Recommendations For The Authors #2:

      • Employ Variance Inflation Factor (VIF) to empirically test for multicollinearity.

      Response Given high common effects between many of the regressors in the models (e.g., between Brain Age and chronological age), VIF will be high, but this is not a concern for the commonality analysis. We now show that applying the commonality analysis to multiple regressions allowed us to have robust results against multicollinearity, as demonstrated elsewhere (Ray-Mukherjee et al., 2014, Using commonality analysis in multiple regressions: A tool to decompose regression effects in the face of multicollinearity). Specifically, when using multiple regressions by themselves without the commonality analysis, researchers have to rely on beta estimates, which are strongly affected by multicollinearity (e.g., a phenomenon known as the Suppression Effect). However, by applying the commonality analysis on top of multiple regressions, researchers can then rely on R2 estimates, which are less affected by multicollinearity. This can be seen in our case (Figures 5 and 6), where Brain Age indices had the same unique effects regardless of the level of common effects they had with chronological age (e.g., Brain Age vs. Corrected Brain Age Gap from stacked models).

      To directly demonstrate the robustness of the current commonality analysis regarding multicollinearity, we applied the commonality analysis to Ridge regressions (see Supplementary Figures 3 and 5 below). Ridge regression is a method designed to deal with multicollinearity (Dormann et al., 2013). As seen below, the results from commonality analyses applied to Ridge regressions are closely matched with our original results.

      Methods

      “Note that, to ensure that the commonality analysis results were robust against multicollinearity (Ray-Mukherjee et al., 2014), we also repeated the same commonality analyses done here on Ridge regression, as opposed to multiple regression. Ridge regression is a method designed to deal with multicollinearity (Dormann et al., 2013). See Supplementary Figure 3 for the Ridge regression with chronological age and each Brain Age index as regressors and Supplementary Figure 5 for the Ridge regression with chronological age, each Brain Age and Brain Cognition index as regressors. Briefly, the results from commonality analyses applied to Ridge regressions are closely matched with our results done using multiple regression.”
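
      To make the R2-based decomposition concrete, a two-regressor commonality analysis can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, ordinary least squares is used, and R2 is computed in-sample on one dataset, whereas the response above describes analyses on outer-fold test sets. For regressors 1 and 2, the unique effect of 1 is R2(1,2) − R2(2), the unique effect of 2 is R2(1,2) − R2(1), and the common effect is R2(1) + R2(2) − R2(1,2).

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def r_squared(X, y):
    """In-sample R2 of an ordinary least squares fit."""
    return LinearRegression().fit(X, y).score(X, y)


def commonality_two(x1, x2, y):
    """Decompose the total R2 of y ~ x1 + x2 into unique and common parts."""
    x1 = np.asarray(x1, dtype=float).reshape(-1, 1)
    x2 = np.asarray(x2, dtype=float).reshape(-1, 1)
    r2_1 = r_squared(x1, y)
    r2_2 = r_squared(x2, y)
    r2_12 = r_squared(np.hstack([x1, x2]), y)
    return {
        "unique_1": r2_12 - r2_2,        # explained by x1 beyond x2
        "unique_2": r2_12 - r2_1,        # explained by x2 beyond x1
        "common": r2_1 + r2_2 - r2_12,   # shared between x1 and x2
        "total": r2_12,
    }
```

      Because the three parts always sum to the total R2, strongly correlated regressors (for example chronological age and a Brain Age index) show up as a large common effect rather than as unstable beta estimates.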

      Reviewer 2 Recommendations For The Authors #3:

      • Incorporate non-linearities in the correction of brain-age indices, such as separate terms in the regression or statistical analyses.

      Response Thank you for the suggestion. We now added a non-linear term of chronological age in our multiple-regression models explaining fluid cognition (see Supplementary Figure 4 and 6 below). Originally we did not have the quadratic term for chronological age in our model since the relationship between chronological age and fluid cognition was relatively linear (see Figure 1 above). Accordingly, as expected, adding the quadratic term for chronological age as suggested did not change the pattern of the results of the commonality analyses.

      Methods

      “Similarly, to ensure that we were able to capture the non-linear pattern of chronological age in explaining fluid cognition, we added a quadratic term of chronological age to our multiple-regression models in the commonality analyses. See Supplementary Figure 4 for the multiple regression with chronological age, square chronological age and each Brain Age index as regressors and Supplementary Figure 6 for the multiple regression with chronological age, square chronological age, each Brain Age index and Brain Cognition as regressors. Briefly, adding the quadratic term for chronological age did not change the pattern of the results of the commonality analyses.”

      Reviewer 2 Recommendations For The Authors #4:

      • It would be helpful to include the complete set of results in the appendix - for instance, the statistical significance for each component for the final commonality analysis.

      Response Figures 5 and 6 (see above) already have asterisks to reflect the statistical significance of the unique effects. Because of this, we do not believe we need more figures/tables in the appendix to show statistical significance.

      Recommendations for improving the writing and presentation.

      Reviewer 2 Recommendations For The Authors #5:

      • The authors are encouraged to refrain from using terms such as 'fortunately', 'unfortunately', and 'unsettling', as they may appear inappropriate when referring to empirical findings.

      Response We agree with this suggestion and no longer use those words.

      Reviewer 2 Recommendations For The Authors #6:

      • It would be helpful to clarify in the methods that you end up with 5 test folds.

      Response We have now clarified why we chose five test folds.

      Methods

      “We used nested cross-validation (CV) to build these models (see Figure 7). We first split the data into five outer folds. We used five outer folds so that each outer fold had around 100 participants. This is to ensure the stability of the test performance across folds.”

      Minor corrections to the text and figures.

      Reviewer 2 Recommendations For The Authors #7:

      • Why use months, not years for chronological age? This seems inappropriate given the age range.

      Response We originally used months since they were units used in our prediction modelling. However, to make the figures easier to understand, we now used years.

      Reviewer 2 Recommendations For The Authors #8:

      • The formatting, especially regarding the text embedded within the figures, could benefit from significant improvements.

      Response Thank you for the suggestion. We made changes to the text embedded within the figures. They should be more readable now.

      Reviewer 2 Recommendations For The Authors #9:

      • The legend for the neuroimaging feature labels is missing, and the captions are incomplete.

      Response Please see Figure 2 above. We have now revised it by adding the letters L and R for the laterality of the brain images. We also made some changes to the captions to make sure they are complete.

      Reviewer 2 Recommendations For The Authors #10:

      • Figure 5's caption: SD has a missing decimal point.

      Response The numbers are not SD. The numbers to the left of the figure represent the unique effects of chronological age in %, the numbers in the middle of the figure represent the common effects between chronological age and Brain Age index in %, and the numbers to the right of the figure represent the unique effects of Brain Age Index in %. We now use the same one decimal point for these numbers.

      Reviewer #3 (Recommendations For The Authors):

      The main question of this article is as follows: “To what extent does having information on Brain Age improve our ability to capture declines in fluid cognition beyond knowing a person’s chronological age?” While this question is worthwhile, considering most of the field is confused about the nature of brain age, the authors are currently missing an opportunity to convey the inevitability of their results given how Brain Age and the Brain Age Gap are calculated. They also misleadingly convey that Brain Cognition is somehow superior to Brain Age. If the authors work on conveying the inevitability of their results and redo (or remove) their section on Brain Cognition, I can see how their results would be enlightening to the general neuroimaging community that is interested in the concept of brain age. See below for specific critiques.

      Response Please see our response to Reviewer 3 Public Review Overall. Note we no longer argue that Brain Cognition is superior to Brain Age (Reviewer 3 Public Review #4). Rather, we treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. We used the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age to indicate how much Brain Age misses the variation in the brain MRI that could explain fluid cognition.

      Reviewer 3 Recommendations For The Authors #1:

      “There are many adjustments proposed to correct for this estimation bias” (p3) → Regression to the mean is not a sign of bias. Any decent loss function will result in over-predicting the age of younger individuals and under-predicting the age of older individuals. This is a direct result of minimizing an error term (e.g., mean squared error). Therefore, it is inappropriate to refer to regression to the mean as a sign of bias. This misconception has led to a great deal of inappropriate analyses, including “correcting” the brain age gap by regressing out age.

      Response Please see our response to Reviewer 3 Public Review #1.
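The reviewer's point about regression to the mean can be illustrated with a small simulation. The sketch below is not taken from the manuscript: the age range, the single noisy "brain feature", and the noise level are all hypothetical choices, and the model is fitted and evaluated on the same simulated data for simplicity. It shows that a least-squares age predictor over-predicts younger participants and under-predicts older ones, so that Brain Age Gap (predicted age minus age) correlates negatively with age.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: chronological age and one noisy brain feature that tracks age.
age = rng.uniform(36, 100, size=n)
feature = age + rng.normal(0, 15, size=n)

# Ordinary least squares of age on the feature (minimises mean squared error).
slope, intercept = np.polyfit(feature, age, deg=1)
brain_age = intercept + slope * feature

# Brain Age Gap = predicted age minus chronological age.
brain_age_gap = brain_age - age

# Compare the youngest and oldest quarters of the sample.
young = age < np.percentile(age, 25)
old = age > np.percentile(age, 75)
mean_gap_young = brain_age_gap[young].mean()   # positive: over-prediction
mean_gap_old = brain_age_gap[old].mean()       # negative: under-prediction

# The gap depends on age even though the model is an unbiased least-squares fit.
gap_age_corr = np.corrcoef(age, brain_age_gap)[0, 1]
```

The sign pattern follows from shrinkage of the predictions towards the sample mean whenever the features explain less than all of the variance in age.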

      Reviewer 3 Recommendations For The Authors #2:

      “Corrected Brain Age Gap in particular is viewed as being able to control for both age dependency and estimation biases (Butler et al., 2021).” (p3) → This summary is not accurate as Butler and colleagues did not use the words "corrected" and "biases" in this context. All that the authors say in that paper is that regressing out age from the brain age gap - which is referred to as the modified brain age gap (MBAG) - makes it so that the modified brain age gap is not dependent on age, which is true. This metric is meaningless, though, because it is the variance left over after regressing out age from residuals from a model that was predicting age. If it were not for the fact that regression on residuals is not equivalent to multiple regression (and out of sample estimates), MBAG would be a vector of zeros. Upon reading your Methods, I noticed that you are using a metric from Le et al. (2018) for your “Corrected Brain Age Gap”. If you cite the Butler et al. (2021) paper, I highly recommend sticking with the same notation, metrics and terminology throughout. That would greatly help with the interpretability of your paper, and cross-comparisons between the two.

      Response Please see our response to Reviewer 3 Public Review #2.

      Reviewer 3 Recommendations For The Authors #3:

      “However, the improvement in predicting chronological age may not necessarily make Brain Age to be better at capturing Cognitionfluid. If, for instance, the age-prediction model had the perfect performance, Brain Age Gap would be exactly zero and would have no utility in capturing Cognitionfluid beyond chronological age.” (p3) → I largely agree with this statement. I would be really careful to distinguish between Brain Age and the Brain Age Gap here, as the former is a predicted value, and the latter is the residual times -1 (predicted age - age). Therefore, together they explain all of the variance in age. If you change the first sentence to refer to the Brain Age Gap, this statement makes more sense. The Brain Age Gap will never be exactly zero, though, even with perfect prediction on the training set, because subjects in the testing set are different from the subjects in the training set.

      Response Please see our response to Reviewer 3 Public Review #3.

      Reviewer 3 Recommendations For The Authors #4:

      “Can we further improve our ability to capture the decline in cognitionfluid by using, not only Brain Age and chronological age, but also another biomarker, Brain Cognition?” → This question is fundamentally getting at whether a predicted value of cognition can predict cognition. Assuming the brain parameters can predict cognition decently, and the original cognitive measure that you were predicting is related to your measure of fluid cognition, the answer should be yes. This seems like an uninteresting question to me. Upon reading your Methods, it became clear that the cognitive variable in the model predicting cognition using brain features (to get predicted cognition, or as you refer to it, Brain Cognition) is the same as the measure of fluid cognition that you are trying to assess how well Brain Cognition can predict. Assuming the brain parameters can predict fluid cognition at all, of course Brain Cognition will predict fluid cognition. This is inevitable. You should never use predicted values of a variable to predict the same variable.

      Response Please see our response to Reviewer 3 Public Review #4.

      Reviewer 3 Recommendations For The Authors #5:

      “We also examined if these better-performing age-prediction models improved the ability of Brain Age in explaining Cognitionfluid.” → Improved above and beyond what?

      Response We were asking whether better-performing age-prediction models improved the ability of Brain Age to explain fluid cognition over and above lower-performing age-prediction models. We made changes to the Introduction to clarify this point.

      Reviewer 3 Recommendations For The Authors #6:

      Figure 1 b & c → It is a little difficult to read the text by the horizontal bars in your plots. Please make the text smaller so that there is more space between the words vertically, or even better, make the plots slightly bigger. Please also put the predicted values on the y-axis. This is standard practice for displaying regression results. To make more room, you can get rid of your rPearson or your R2 plot, considering the latter is simply the square of the former. If you want to make it clear that the association is positive between all of your variables, I would keep rPearson.

      Response Thank you so much for the suggestions.

      1) We now made sure that the text by the horizontal bars in Figure 1b and c is readable.

      2) Note in prediction model/machine-learning literature, it is more common to plot observed/real values on the y-axis. Here is the logic of our practice: values in the x-axis are the predicted values based on the model, and we would like to see if the changes in the predicted values correspond to the changes in the observed/real value in the y-axis.

      3) Regarding Pearson correlation vs R2, please note that we wrote “for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020).” As such, R2 is NOT the square of the Pearson correlation. In fact, in Poldrack and colleagues’ “Establishment of Best Practices for Evidence for Prediction” paper (2020), they discourage 1) the use of Pearson correlation by itself and 2) the use of the squared correlation coefficient as R2 (as opposed to the sum of squares definition):

      “It is common in the literature to use the correlation between predicted and actual values as a measure of predictive performance; of the 64 studies in our literature review that performed prediction analyses on continuous outcomes, 30 reported such correlations as a measure of predictive performance. This reporting is problematic for several reasons. First, correlation is not sensitive to scaling of the data; thus, a high correlation can exist even when predicted values are discrepant from actual values. Second, correlation can sometimes be biased, particularly in the case of leave-one-out cross-validation. As demonstrated in Figure 4, the correlation between predicted and actual values can be strongly negative when no predictive information is present in the model. A further problem arises when the variance explained (R2) is incorrectly computed by squaring the correlation coefficient. Although this computation is appropriate when the model is obtained using the same data, it is not appropriate for out-of-sample testing [23]; instead, the amount of variance explained should be computed using the sum-of-squares formulation (as implemented in software packages such as scikit-learn).”

      Accordingly, we decided to keep both R2 and Pearson correlation (along with MAE) in our Figure 1.
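The difference between the two definitions of R2 can be shown with a toy example. The sketch below is an illustration only, using made-up numbers rather than any values from the manuscript; the function names are hypothetical. Predictions that are perfectly correlated with the observed values but badly scaled and shifted give a squared Pearson correlation of 1, whereas the sum-of-squares R2 is strongly negative.

```python
import numpy as np

def r2_sum_of_squares(observed, predicted):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def r2_squared_pearson(observed, predicted):
    """Square of the Pearson correlation between observed and predicted values."""
    return np.corrcoef(observed, predicted)[0, 1] ** 2

# Made-up values: predictions track the ordering perfectly but are mis-scaled.
observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = 2.0 * observed + 3.0

r2_ss = r2_sum_of_squares(observed, predicted)     # -18.0
r2_corr = r2_squared_pearson(observed, predicted)  # 1.0 (up to rounding)
```

The sum-of-squares version penalises predictions that are far from the observed values, which is why it can fall below zero out of sample while the squared correlation cannot.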

      Reviewer 3 Recommendations For The Authors #7:

      Figure 2 “We calculated feature importance by, first, standardizing Elastic Net weights across brain features of each set of features from each test fold.” → What do you mean by “standardize” here? Rescale to be mean 0, variance 1? If so, this seems like a misleading transformation, because it gives the impression that the relationships are negative, when they are not necessarily. Also, why did you choose to use elastic net weights in any form as measures of effect size (or importance)? The raw values are inherently penalized, which means they are under-estimates of the true effect size. It would be more meaningful (and less biased) to plot the raw correlations.

      Response For the first question regarding standardisation, we addressed this issue in our response to Reviewer 1 Recommendations For The Authors #3. Briefly, we agreed with Reviewer 3 that standardisation (with mean = 0, SD = 1) might make it difficult to interpret the directionality of the coefficients. For visualising feature importance in the revised manuscript, we refitted the Elastic Net model to the full dataset without splitting them into five folds and visualised the coefficients on brain images (see below).

      For the second question, regarding why we used Elastic Net coefficients as feature importance (as opposed to correlations), we need to mention the goal of feature importance: to understand how the model makes a prediction based on different brain features (Molnar, 2019). Correlations between a target and each brain feature do not achieve this. Instead, they show univariate/marginal relationships between a target and a brain feature. What we want to visualise is how the model made a prediction, which in the case of Elastic Net is a sum of the brain features weighted by their coefficients. In other words, multivariate models (including Elastic Net) focus on conditional relationships that take into account all brain features within each set of features.

      Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we can still say that a brain feature with a larger coefficient magnitude weighs relatively more heavily in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).
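The contrast between a feature's correlation with the target and its coefficient in a multivariate model can be sketched with simulated data. This is an illustration under stated assumptions, not the authors' analysis: the two features, their correlation of about 0.9, and the effect size are hypothetical, and ordinary least squares is swapped in for Elastic Net to keep the example dependency-free (there is no penalty here, so the coefficients are not shrunk).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Two hypothetical, strongly correlated features; only x1 actually drives y.
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)

# Multivariate linear model (unpenalised least squares, standing in for Elastic Net).
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2 = coef

# Marginal view: x2 correlates strongly with y because it is a proxy for x1.
marginal_r_x2 = np.corrcoef(x2, y)[0, 1]

# Model view: the prediction is the intercept plus the coefficient-weighted features,
# and x2 contributes almost nothing once x1 is in the model.
prediction = intercept + b1 * x1 + b2 * x2
```

A map of correlations would highlight both features, whereas a map of coefficients shows which feature the model actually relies on when making a prediction.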

      Reviewer 3 Recommendations For The Authors #8:

      Figure 3 → Again, what exactly do you mean by “standardised” here?

      Response It means subtracting the mean and then dividing by the SD. However, we no longer apply standardisation for feature importance. See our response to Reviewer 1 Recommendations For The Authors #3 and Reviewer 3 Recommendations For The Authors #7.

      Reviewer 3 Recommendations For The Authors #9:

      “However, Brain Age Gap created from the lower-performing age-prediction models explained a higher amount of variation in Cognitionfluid. For instance, the top performing age-prediction model, “Stacked: All excluding Task Contrast”, generated Brain Age and Corrected Brain Age that explained the highest amount of variation in Cognitionfluid, but, at the same time, produced Brain Age Gap that explained the least amount of variation in Cognitionfluid.” (p7) → Yes, but you did not need to run any models to show this, considering it is an inevitable consequence of the following relationship between predicted values and residuals (or residuals times -1): $y = (y - \hat{y}) + \hat{y}$. Let’s say that age explains 60% of the variance in fluid cognition, and predicted age ($\hat{y}$) explains 40% of the variance in fluid cognition. Then the brain age gap ($-(y - \hat{y})$) should explain 20% of the variance in fluid cognition. If by “Corrected Brain Age” you mean the modified predicted age from the Butler paper, the “Corrected Brain Age” result is inevitable because the modified predicted age is essentially just age with a tiny bit of noise added to it. From Figure 4, though, this does not seem to be the case, because the lower left quadrant in panel a should be flat and high (about as high as the predictive value of age for fluid cognition). So how are you calculating “Corrected Brain Age”? It looks like you might be regressing age out of Brain Age, though from your description in the Methods (How exactly do you use the slope and intercept? You need an equation if you are going to stick with this terminology), it is not totally clear. I highly recommend using terminology and metrics from the Butler et al. (2021) paper throughout to reduce confusion.

      Response Please see our response to Reviewer 3 Public Review #5.

      Reviewer 3 Recommendations For The Authors #10:

      “On the contrary, an amount of variation in Cognitionfluid explained by Corrected Brain Age Gap was relatively small (maximum R2 = .041) across age-prediction models and did not relate to the predictive performance of the age-prediction models.” (p7) → If by “Corrected Brain Age Gap” you mean MBAG from the Butler paper, yes, this is also inevitable, considering MBAG would be a vector of zeros if it were not for regression on residuals (and out of sample estimates), as I mentioned earlier. Also, it is not clear why you used “on the contrary” as a transition here.

      Response Please see our response to Reviewer 3 Public Review #2 for the ‘MBAG’ term. Briefly, we didn’t use Butler and colleagues' (2021) MBAG, but rather we used the method described in de Lange and Cole’s (2020), which was called RBAG by Butler and colleagues.

      de Lange and Cole’s (2020) method has been commonly implemented elsewhere (Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022). Accordingly, researchers who use Brain Age do not usually view this method as capturing a meaningless biomarker. Yet, the small effects of the Corrected Brain Age Gap in explaining fluid cognition of aging individuals found here are consistent with studies in older adults (Cole, 2020) and younger populations (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023) (see our response to Reviewer 2 Recommendations For The Authors #1).

      “On the contrary” refers to the fact that the other three Brain Age indices (i.e., those that did not account for the relationship between Brain Age and chronological age) explained a much higher amount of variation in fluid cognition. As mentioned above (our response to Reviewer 2 Public Review #7), our argument resonates with Butler and colleagues’ (2021) suggestion (p. 4097): “As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (e.g., Beheshti et al., 2018; Cole, Underwood, et al., 2017; Franke et al., 2015; Gaser et al., 2013; Liem et al., 2017; Nenadić et al., 2017; Steffener et al., 2016)”.

      Reviewer 3 Recommendations For The Authors #11:

      “As before, the unique effects of Brain Age indices were all relatively small across the four Brain Age indices and across different prediction models.” (p10) → Yes, again, this is inevitable considering how they are calculated. You can show these analyses to demonstrate your results in data, if you want, but ignoring the inevitability given how these variables are calculated is misleading.

      Response Accounting for the relationship between Brain Age and chronological age when examining the utility of Brain Age is not misleading. Similar to previous recommendations (Butler et al., 2021; Le et al., 2018), we believe that not doing so is misleading. That is, without accounting for the relationship between Brain Age and chronological age, Brain Age will likely explain the same variation of the phenotype of interest as chronological age. Please see our response to Reviewer 3 Recommendations For The Authors #18 below.

      Reviewer 3 Recommendations For The Authors #12:

      “On the contrary, the unique effects of Brain Cognition appeared much larger.” (p10) → This is not a fair comparison if you don’t look at the unique effects above and beyond the cognitive variable you predicted (fluid cognition) in your Brain Cognition model. When you do this, you will see that Brain Cognition is useless when you include fluid cognition in the model, just as Brain Age would be in predicting age when you include age in the model. This highlights the fact that using predicted values of a metric to predict that metric is a pointless path to take, and that using a predicted value to predict anything is worse than using the value itself.

      Response Please see our response to Reviewer 3 Public Review #6.

      Reviewer 3 Recommendations For The Authors #13:

      “First, how much does Brain Age add to what is already captured by chronological age? The short answer is very little.” (p12) → This is a really important point, but your paper requires an in-depth discussion of the inevitability of this result, which I have discussed previously in this review.

      Response Please see our response to Reviewer 3 Public Review #7.

      Reviewer 3 Recommendations For The Authors #14:

      “Second, do better-performing age-prediction models improve the ability of Brain Age to capture Cognitionfluid? Unfortunately, the answer is no.” (p12) → You need to be clear that you are talking about above and beyond age here.

      Response Thank you so much for your suggestion. We now made the change to this sentence accordingly.

      Discussion

      “Second, do better-performing age-prediction models improve the utility of Brain Age to capture fluid cognition above and beyond chronological age? The answer is also no.”

      Reviewer 3 Recommendations For The Authors #15:

      “Third, do we have a solution that can improve our ability to capture Cognitionfluid from brain MRI? The answer is, fortunately, yes. Using Brain Cognition as a biomarker, along with chronological age, seemed to capture a higher amount of variation in Cognitionfluid than only using Brain Age.” (p12) → Again, try controlling for the cognitive measure you predicted in your Brain Cognition model. This will show that Brain Cognition is not useful above and beyond cognition, highlighting the fact that it is not a useful endeavor to be using predicted values.

      Response Please see our response to Reviewer 3 Public Review #8.

      Reviewer 3 Recommendations For The Authors #16:

      “Accordingly, a race to improve the performance of age-prediction models (Baecker et al., 2021) does not necessarily enhance the utility of Brain Age indices as a biomarker for Cognitionfluid. This calls for a new paradigm. Future research should aim to build prediction models for Brain Age indices that are not necessarily good at predicting age, but at capturing phenotypes of interest, such as Cognitionfluid and beyond.” (p13) → I whole-heartedly agree with the first two sentences, and strongly disagree with the last. Certainly your results, and the underlying reason as to why you found these results, call for a new paradigm (or, one might argue, a pre-brain age paradigm). They do not, however, suggest that we should keep going down the Brain Age path. In fact, I think it should be abandoned altogether. While it is difficult to prove that there is no transformation of Brain Age or the Brain Age Gap that will be useful, I am nearly sure this is true from the research I have done. Therefore, if you would like to suggest that the field should continue down this path, you need to present a very good case to support this view.

      Response Please see our response to Reviewer 3 Public Review #9.

      Reviewer 3 Recommendations For The Authors #17:

      “Perhaps this is because the estimation of the influences of chronological age was done in the training set.” (p13) → I believe this is the case, and it is testable. Try re-running your analyses where parameters are estimated and performance is evaluated on the same data.

      Response Yes, we agreed with this. Based on the equations we used, this is inevitable.

      Reviewer 3 Recommendations For The Authors #18:

      “Similar to a previous recommendation (Butler et al., 2021), we suggest focusing on Corrected Brain Age Gap.” (p13) → To be clear, the authors did not use the term “Corrected” because it is very misleading. The authors also did not suggest that we proceed with any brain age metric; rather they mentioned that the modified brain age gap is independent of age. Note the following passage: “Further, the interpretability of the modified brain age gap (MBAG) itself is limited by the fact that it is a prediction error from a regression to remove the effects of age from a residual obtained through a regression to predict age. By virtue of these limitations, we suggest that the modified version may not provide useful information about precocity or delay in brain development. In light of this, as well as the complexities associated with interpretations of the BAG and its dependence on age, we suggest that further methodological and theoretical work is warranted.” I recognize that this statement is hedged, as is often required in the publication process, but I am all but certain that MBAG/BAG/modified predicted age are useless constructs. Therefore, if you are going to suggest that people continue to use them, as opposed to suggesting that further methodological or theoretical work is warranted, you need to make a strong case, which you did not try to make here. If anything, your results support abandoning the age-prediction endeavor altogether.

      Response Please see our response to Reviewer 3 Public Review #2 for the term. Briefly, we didn’t use Butler and colleagues’ (2021) MBAG, but rather RBAG. This index was originally described in de Lange and Cole’s (2020), and has now been implemented elsewhere (Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022).

      We do not intend to encourage people to abandon the Brain Age endeavour altogether. However, we made three main suggestions for future research on Brain Age to ensure its utility. First, researchers should account for the relationship between Brain Age and chronological age, either using Corrected Brain Age Gap (or other similar adjustments) or, better, examining the unique effects of Brain Age indices after controlling for chronological age through commonality analyses (see below). This is similar to the suggestion made by Le and colleagues (2018) and later rephrased by Butler and colleagues (2021). More specifically, Le and colleagues (2018) mentioned (p. 10): “Based on our observations in both real and simulated data, we recommend that the relationship between chronological age and BrainAGE should be accounted for. The two methods proposed in this study are either: (1) regress age on BrainAGE, producing BrainAGER, which is centered on 0 regardless of a participant's actual age or (2) include age as a regressor when doing follow-up analyses.”

      Second, we suggested that researchers should not select age-prediction models based solely on age-prediction performance (see our response to Reviewer 1 Recommendations For The Authors #1).

      Third, we suggested that researchers should test how much Brain Age misses the variation in the brain MRI that could explain fluid cognition or other phenotypes of interest (see our response to Reviewer 2 Public Review #4).

      Discussion

      “What does it mean then for researchers/clinicians who would like to use Brain Age as a biomarker? First, they have to be aware of the overlap in variation between Brain Age and chronological age and should focus on the contribution of Brain Age over and above chronological age. Using Brain Age Gap will not fix this. Butler and colleagues (2021) recently highlighted this point, “These results indicate that the association between cognition and the BAG are driven by the association between age and cognitive performance. As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (p. 4097).” Similar to previous recommendations (Butler et al., 2021; Le et al., 2018), we suggest future work should account for the relationship between Brain Age and chronological age, either using Corrected Brain Age Gap (or other similar adjustments) or, better, examining unique effects of Brain Age indices after controlling for chronological age through commonality analyses. Note we prefer using unique effects over beta estimates from multiple regressions, given that unique effects do not change as a function of collinearity among regressors (Ray-Mukherjee et al., 2014). In our case, Brain Age indices had the same unique effects regardless of the level of common effects they had with chronological age (e.g., Brain Age vs. Corrected Brain Age Gap from stacked models). In the case of fluid cognition, the unique effects might be too small to be clinically meaningful as shown here and previously (Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023).”
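The commonality analysis mentioned above can be sketched for the two-regressor case (chronological age plus one Brain Age index). The code below is a minimal illustration on simulated data, not the authors' pipeline: the helper names `r2` and `commonality`, the simulated effect sizes, and the use of in-sample ordinary least squares are all assumptions made for the example. It also shows why Brain Age and Brain Age Gap share the same unique effect: the gap is Brain Age minus age, so both pairs of regressors span the same space and only the common effect changes.

```python
import numpy as np

def r2(y, *predictors):
    """In-sample R2 of an ordinary least-squares fit of y on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def commonality(y, age, index):
    """Partition R2 into unique and common effects for two regressors."""
    r2_both = r2(y, age, index)
    r2_age = r2(y, age)
    r2_index = r2(y, index)
    return {
        "unique_age": r2_both - r2_index,
        "unique_index": r2_both - r2_age,
        "common": r2_age + r2_index - r2_both,
    }

# Hypothetical simulated data (all effect sizes are made up).
rng = np.random.default_rng(2)
n = 3000
age = rng.uniform(36, 100, size=n)
brain_age = 0.7 * age + 20 + rng.normal(0, 8, size=n)
brain_age_gap = brain_age - age
cognition = -0.05 * age - 0.01 * brain_age + rng.normal(0, 1, size=n)

with_brain_age = commonality(cognition, age, brain_age)
with_gap = commonality(cognition, age, brain_age_gap)
# unique_index is identical for the two indices; the common effect differs.
```

In this simulation the unique effect of either index is small relative to the effect shared with chronological age, which mirrors the pattern described in the quoted passage.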

      Reviewer 3 Recommendations For The Authors #19:

      “To compute Brain Age and Brain Cognition, we ran two separate prediction models. These prediction models either had chronological age or Cognitionfluid as the target.” (p16) → You should make it clear in the main text of your paper that the cognition variable in your Brain Cognition models is the same as what you refer to as Cognitionfluid. Some of your analyses would have been much more reasonable if you had two different measures of cognition.

      Response Thank you so much for the suggestion. We believe that, given the re-conceptualisation of Brain Cognition, this is now clear in the main text.

      Introduction

      “certain variation in the brain MRI is related to fluid cognition, but to what extent does Brain Age not capture this variation? To estimate the variation in the brain MRI that is related to fluid cognition, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data.”

      Reviewer 3 Recommendations For The Authors #20:

      “We controlled for the potential influences of biological sex on the brain features by first residualizing biological sex from brain features in the training set.” (p16) → Why? Your question is about prediction, not causal inference.

      Response While the question is about prediction, we still would like to, as much as possible, be confident about what kind of information we drew from. Here we focused on brain data and controlled for other variables that might not be neuronal. For instance, we controlled for movement and physiological noise using ICA-FIX (Glasser et al., 2016). Following conventional practices in brain-based predictive modelling, we also treated biological sex as another sort of noise (Vieira et al., 2022). The difference between movement/physiological noise and biological sex is that the former varies across TRs, and the latter varies across individuals. Thus we controlled for movement and physiological noise within each participant and controlled for biological sex within a group of participants who belonged to the same training set.
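Residualising a between-participant variable from brain features can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the data are simulated, and applying the training-set regression parameters to the test set is an assumed convention for avoiding leakage rather than a detail stated above.

```python
import numpy as np

def fit_confound_model(confound, features):
    """Regress every feature on the confound (plus intercept); return coefficients."""
    design = np.column_stack([np.ones(len(confound)), confound])
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    return beta  # shape: (2, n_features)

def remove_confound(confound, features, beta):
    """Subtract the confound-predicted part of each feature using fitted coefficients."""
    design = np.column_stack([np.ones(len(confound)), confound])
    return features - design @ beta

# Hypothetical simulated data: sex coded 0/1 shifts every brain feature.
rng = np.random.default_rng(3)
sex_train = rng.integers(0, 2, size=400).astype(float)
sex_test = rng.integers(0, 2, size=100).astype(float)
X_train = rng.normal(size=(400, 5)) + 0.8 * sex_train[:, None]
X_test = rng.normal(size=(100, 5)) + 0.8 * sex_test[:, None]

# Estimate the sex effect in the training set only, then apply it to both sets.
beta = fit_confound_model(sex_train, X_train)
X_train_resid = remove_confound(sex_train, X_train, beta)
X_test_resid = remove_confound(sex_test, X_test, beta)
```

Estimating the coefficients within each training fold keeps information from the held-out participants out of the preprocessing step.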

      Reviewer 3 Recommendations For The Authors #20:

      “Lastly, we computed Corrected Brain Age Gap by subtracting the chronological age from the Corrected Brain Age (Butler et al., 2021; Le et al., 2018).” (p17) → The modified brain age gap in that paper is the residuals from regressing BAG on age (see equation 6). I highly recommend using that terminology and notation throughout to provide consistency and interpretability across papers.

      Response Please see our response to Reviewer 3 Public Review #2 for the term.

      Reviewer 3 Recommendations For The Authors #21:

      Equations (pgs 17-19) → Please use statistical notation instead of pseudo-R code.

      Response We rewrote all of the equations using statistical notation.

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Beheshti, I., Nugent, S., Potvin, O., & Duchesne, S. (2019). Bias-adjustment in neuroimaging-based brain age frameworks: A robust scheme. NeuroImage: Clinical, 24, 102063. https://doi.org/10.1016/j.nicl.2019.102063

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Choi, S. W., Mak, T. S.-H., & O’Reilly, P. F. (2020). Tutorial: A guide to performing polygenic risk score analyses. Nature Protocols, 15(9), Article 9. https://doi.org/10.1038/s41596-020-0353-1

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Cole, J. H., Raffel, J., Friede, T., Eshaghi, A., Brownlee, W. J., Chard, D., De Stefano, N., Enzinger, C., Pirpamer, L., Filippi, M., Gasperini, C., Rocca, M. A., Rovira, A., Ruggieri, S., Sastre-Garriga, J., Stromillo, M. L., Uitdehaag, B. M. J., Vrenken, H., Barkhof, F., … Group, M. study. (2020). Longitudinal Assessment of Multiple Sclerosis with the Brain-Age Paradigm. Annals of Neurology, 88(1), 93–105. https://doi.org/10.1002/ana.25746

      Cumplido-Mayoral, I., García-Prat, M., Operto, G., Falcon, C., Shekari, M., Cacciaglia, R., Milà-Alomà, M., Lorenzini, L., Ingala, S., Meije Wink, A., Mutsaerts, H. J., Minguillón, C., Fauria, K., Molinuevo, J. L., Haller, S., Chetelat, G., Waldman, A., Schwarz, A. J., Barkhof, F., … OASIS study. (2023). Biological brain age prediction using machine learning on structural neuroimaging data: Multi-cohort validation against biomarkers of Alzheimer’s disease and neurodegeneration stratified by sex. ELife, 12, e81067. https://doi.org/10.7554/eLife.81067

      de Lange, A.-M. G., & Cole, J. H. (2020). Commentary: Correction procedures in brain-age prediction. NeuroImage: Clinical, 26, 102229. https://doi.org/10.1016/j.nicl.2020.102229

      Demontis, D., Walters, R. K., Martin, J., Mattheisen, M., Als, T. D., Agerbo, E., Baldursson, G., Belliveau, R., Bybjerg-Grauholm, J., Bækvad-Hansen, M., Cerrato, F., Chambert, K., Churchhouse, C., Dumont, A., Eriksson, N., Gandal, M., Goldstein, J. I., Grasby, K. L., Grove, J., … Neale, B. M. (2019). Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nature Genetics, 51(1), Article 1. https://doi.org/10.1038/s41588-018-0269-7

      Denissen, S., Engemann, D. A., De Cock, A., Costers, L., Baijot, J., Laton, J., Penner, I., Grothe, M., Kirsch, M., D’hooghe, M. B., D’Haeseleer, M., Dive, D., De Mey, J., Van Schependom, J., Sima, D. M., & Nagels, G. (2022). Brain age as a surrogate marker for cognitive performance in multiple sclerosis. European Journal of Neurology, 29(10), 3039–3049. https://doi.org/10.1111/ene.15473

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Franke, K., & Gaser, C. (2019). Ten Years of BrainAGE as a Neuroimaging Biomarker of Brain Aging: What Insights Have We Gained? Frontiers in Neurology, 10, 789. https://doi.org/10.3389/fneur.2019.00789

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Horien, C., Noble, S., Greene, A. S., Lee, K., Barron, D. S., Gao, S., O’Connor, D., Salehi, M., Dadashkarimi, J., Shen, X., Lake, E. M. R., Constable, R. T., & Scheinost, D. (2020). A hitchhiker’s guide to working with large, open-source neuroimaging datasets. Nature Human Behaviour, 5(2), 185–193. https://doi.org/10.1038/s41562-020-01005-4

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Khojaste-Sarakhsi, M., Haghighi, S. S., Ghomi, S. M. T. F., & Marchiori, E. (2022). Deep learning for Alzheimer’s disease diagnosis: A survey. Artificial Intelligence in Medicine, 130, 102332. https://doi.org/10.1016/j.artmed.2022.102332

      Le, T. T., Kuplicki, R. T., McKinney, B. A., Yeh, H.-W., Thompson, W. K., Paulus, M. P., Tulsa 1000 Investigators, Aupperle, R. L., Bodurka, J., Cha, Y.-H., Feinstein, J. S., Khalsa, S. S., Savitz, J., Simmons, W. K., & Victor, T. A. (2018). A Nonlinear Simulation Framework Supports Adjusting for Age When Analyzing BrainAGE. Frontiers in Aging Neuroscience, 10. https://www.frontiersin.org/articles/10.3389/fnagi.2018.00317

      Liang, H., Zhang, F., & Niu, X. (2019). Investigating systematic bias in brain age estimation with application to post-traumatic stress disorders. Human Brain Mapping, 40(11), 3143–3152. https://doi.org/10.1002/hbm.24588

      Luby, J. L. (2010). Preschool Depression: The Importance of Identification of Depression Early in Development. Current Directions in Psychological Science, 19(2), 91–95. https://doi.org/10.1177/0963721410364493

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Ray-Mukherjee, J., Nimon, K., Mukherjee, S., Morris, D. W., Slotow, R., & Hamer, M. (2014). Using commonality analysis in multiple regressions: A tool to decompose regression effects in the face of multicollinearity. Methods in Ecology and Evolution, 5(4), 320–328. https://doi.org/10.1111/2041-210X.12166

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Satterthwaite, T. D., Connolly, J. J., Ruparel, K., Calkins, M. E., Jackson, C., Elliott, M. A., Roalf, D. R., Hopson, R., Prabhakaran, K., Behr, M., Qiu, H., Mentch, F. D., Chiavacci, R., Sleiman, P. M. A., Gur, R. C., Hakonarson, H., & Gur, R. E. (2016). The Philadelphia Neurodevelopmental Cohort: A publicly available resource for the study of normal and abnormal brain development in youth. NeuroImage, 124, 1115–1119. https://doi.org/10.1016/j.neuroimage.2015.03.056

      Smith, S. M., Vidaurre, D., Alfaro-Almagro, F., Nichols, T. E., & Miller, K. L. (2019). Estimation of brain age delta from brain imaging. NeuroImage, 200, 528–539. https://doi.org/10.1016/j.neuroimage.2019.06.017

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Stigler, S. M. (1997). Regression towards the mean, historically considered. Statistical Methods in Medical Research, 6(2), 103–114. https://doi.org/10.1177/096228029700600202

      Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., Liu, B., Matthews, P., Ong, G., Pell, J., Silman, A., Young, A., Sprosen, T., Peakman, T., & Collins, R. (2015). UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLOS Medicine, 12(3), e1001779. https://doi.org/10.1371/journal.pmed.1001779

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.-J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript titled "Disease modeling and pharmacological rescue of autosomal dominant Retinitis Pigmentosa associated with RHO copy number variation" the authors describe the use of patient iPSC-derived retinal organoids to evaluate the pathobiology of a RHO-CNV in a family with dominant retinitis pigmentosa (RP). They find significantly increased expression of rhodopsin, especially within the photoreceptor cell body, and defects in photoreceptor cell outer segment formation/maturation. In addition, they demonstrate how an inhibitor of NR2E3 (a rod transcription factor required for inducing rhodopsin expression), can be used to rescue the disease phenotype.

      Strengths:

      The manuscript is very well written, the illustrations and data presented are compelling, and the authors' interpretation/discussion of their findings is logical.

      Weaknesses:

      A weakness, which the authors have addressed in the discussion section, is the lack of an isogenic control, which would allow for direct analysis of the RHO-CNV in the absence of the other genetic sequence contained within the duplicated region. As the authors suggest, CRISPR correction of a large CNV in the absence of inducing unwanted on-target editing events in patient iPSCs is often very challenging. Given that they have used a no-disease iPSC line obtained from a family member, controlled for organoid differentiation kinetics/maturation state, and that no other complete disease-causing gene is contained within the duplicated region, it is unlikely that the addition of an isogenic control would yield significantly different results.

      Aims and conclusions:

      This reviewer is of the opinion that the authors have achieved their aims and that their results support their conclusions.

      Discussion:

      The authors have provided adequate discussion on the utility of the methods and data as well as the impact of their work on the field.

      We thank the reviewer for their insightful and encouraging review of our work, which has taken several years to reach its current stage.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kandoi et al. describes a new 3D retinal organoid model of a mono-allelic copy number variant of the rhodopsin gene that was previously shown to induce autosomal dominant retinitis pigmentosa via a dominant negative mechanism in patients. With advancements in the low-cost genomics application to detect copy number variations, this is a timely article that highlights a potential disease mechanism that goes beyond the retina field. The evidence is relatively strong that the rod photoreceptor phenotype observed in an adult patient with RP in vivo is similar to that phenotype observed in human stem cell-derived retinal organoids. Increases in RHO expression detected by qPCR, RNA-seq, and IHC support this phenotype. Importantly, the amelioration of photoreceptor rhodopsin mislocalization and related defects using the small molecule drug photoregulin demonstrates an important potential clinical application.

      Overall, the authors succeeded in providing solid evidence that copy number variation via a genomic RHO duplication leads to abnormalities in rod photoreceptors that can be partially blocked by photoregulin. However, there are several points that should be addressed that will enhance this paper.

      Strengths:

      • The use of patient-derived organoids from patients that have visual defects is a major strength of this work and adds relevance to the disease phenotype.

      • The rod phenotype assessed by qPCR, RNA-seq, and IHC supports a phenotype that shares similarities with the patient.

      • The use of a small molecule drug that selectively targets rod photoreceptors, as opposed to cones, is a noteworthy strength.

      We thank the reviewers for highlighting the key strengths of the paper.

      Weaknesses:

      1) The chromosomal segment that was duplicated had 3 copies of RHO in addition to three copies of each of the flanking genes (IFT122, HIF100, PLXND1). Discussion of the involvement of these genes would be helpful. Would duplication of any of these genes alone cause or contribute to adRP? As an example, a missense mutation in IFT122 was previously implicated in photoreceptor loss (PMID: 33606121 PMCID: PMC8519925).

      Thank you for your comment. The contribution of the other duplicated genes is an interesting question. Of these, IFT122 is particularly interesting, as the reviewer points out. We surveyed the literature and the database of our genetic testing partner, BluePrint Genetics, and did not find any human retinal degeneration cases with variants in IFT122. IFT122 has been shown to cause a recessive phenotype in dogs and in a complete-knockout zebrafish model, but neither dominant variants nor overexpression have been shown to cause a phenotype. Interestingly, recessive biallelic IFT122 mutations can cause Cranioectodermal Dysplasia (Sensenbrenner syndrome, PMID: 24689072), and none of these patients exhibited retinal dystrophy. HIF100 is an epigenetic modifier gene, while PLXND1 is expressed in endothelial cells. We will include a discussion of this in the revised manuscript.

      2) Related to #1, have the authors considered inserting extra copies of RHO (and/or the flanking genes) of these at a genomic safe harbor site? Although not required, this would allow one to study cells with isogenic-matched genetic backgrounds and would partially address the technical challenge of repairing a 188kb duplication, which as the authors note would be difficult to do. Demonstrating that excess copy numbers in different genetic backgrounds would be a huge contribution to the field. At a minimum, a discussion of the role of the nearby genes should be included.

      Thank you for your suggestion. In our future mechanistic studies, we plan to test the relative role of one to three extra copies of RHO driven by an NRL promoter, so that expression is restricted to rods. We will include a discussion of the potential role of the other genes in the revised manuscript.

      3) In the patient, the central foveal region was spared suggesting that cones were normal. Was there a similar assessment that cones are unaffected in retinal organoids?

      We will include these data in our revised manuscript; overall, we did not see a cone defect in RHO-CNV organoids. Additionally, although it is true that the central foveal region was relatively spared in this patient, the cones are definitely not normal. The macular cones that remain have been damaged by chronic edema, and photoreceptor and RPE atrophy has progressed into the macula, sparing only the foveal cones.

      4) Pathway analysis indicated that glycosylation was perturbed and this was proposed as an explanation as to why rhodopsin was mislocalized. Have the authors verified that there is an actual decrease in glycosylation?

      These studies are ongoing. We are currently examining the cellular pathophysiology of RHO-CNV in detail, focusing on RHO trafficking, including the role of defects in glycosylation and other post-translational modifications.

      5) Line 182: by what criteria are the authors able to state that " there were no clear visible anatomical changes in apical-basal retinal cell type distribution during the early differentiation timeframe (data not shown)." Was this based on histological staining with antibodies, nuclear counter-staining, or some other evaluation?

      This was based on both IHC for various cell type markers and nuclear (DAPI) staining.

      6) Figure 2C - the appearance of the inner segments in RC and RM looks very different from one another. Have the authors ruled out the possibility that the RC organoid cell isn't a cone? In addition, the RM structure has what appears to be a well-defined OLM which would suggest well-formed Muller glia. Do these structures also exist in RC organoids? Typically the OLM does form in older organoids. In addition, was this representative in numerous EM preparations?

      To clarify the EM data, we will include additional images in the revision as supplementary data. We have not carefully compared the OLM between the patient and control organoids, but we do observe it in both conditions in the older organoids. The EM preparations were made from multiple organoids from two different batches, with consistent results.

      7) What criteria were used to assess cell loss? Has any TUNEL labeling been performed to confirm cell loss? From the existing data, it seems that rod outer segments appear to be affected in organoids. However, it's not clear if the photoreceptors themselves actually die in this model.

      TUNEL labeling was used to assess cell loss, and it was not significantly different between the control and patient organoids at the time points examined. We did not expect a change, as the disease in the patient developed over decades.

      8) Figure 5B. The RHO staining in the vehicle-treated sample is perturbed relative to the PR3 treatments as indicated in the text. In the vehicle-treated sample, the number of DAPI-positive cells that are completely negative proximal to the inner segments suggests that there might be non-rod cells there. Have the authors confirmed whether these are cones? Labels would be helpful in the left vehicle panel as the morphology looks very different than the treated samples.

      Thank you very much for these suggestions, which will be incorporated into the revised manuscript. A number of the cells in the negative regions are OTX2+/NRL- and are likely to be cones (Figure 4A and B). Unfortunately, we do not have a very good cone nuclear marker, as RXRγ does not consistently stain mature cones.

      9) It is interesting that in addition to increases in RHO, and photo-transduction, there are also increases in PTPRT which is related to synaptic adhesion. Is there evidence of ectopic neurites that result from PTPRT over-expression?

      You are absolutely correct that the PTPRT data are very interesting. Like RHO, PTPRT requires similar PTMs in photoreceptors for its synaptic localization. We did not specifically look at ectopic neurites and will test this in the revision. It will be interesting to follow up on its expression pattern to see whether it is processed and localized normally, if we can find a working antibody. It is also possible that the increase in gene expression is due to feedback upregulation secondary to improper protein processing.

      Reviewer #3 (Public Review):

      This manuscript reports a novel pedigree with four intact copies of RHO on a single chromosome which appears to lead to overexpression of rhodopsin and a corresponding autosomal dominant form of RP. The authors generate retinal organoids from patient- and control-derived cells, characterize the phenotypes of the organoids, and then attempt to 'treat' aberrant rhodopsin expression/mislocalization in the patient organoids using a small molecule called photoregulin 3 (PR3). While this novel genetic mechanism for adRP is interesting, the organoid work is not compelling. There are multiple problems related to the technical approaches, the presentation of the results, and the interpretations of the data. I will present my concerns roughly in the order in which they appear in the manuscript.

      Major concerns:

      (1) Individual human retinal organoids in culture can show a wide range of differentiation phenotypes with respect to the expression of specific markers, percentages of given cell types, etc. For this reason, it can be very difficult to make rigorous, quantitative comparisons between 'wild-type' and 'mutant' organoids. Despite this difficulty, the author of the present manuscript frequently presents results in an impressionistic manner without quantitation. Furthermore, there is no indication that the investigator who performed the phenotypic analyses was blind with respect to the genotype. In my opinion, such blinding is essential for the analysis of phenotypes in retinal organoids. To give an example, in lines 193-194 the authors write "we observed that while the patient organoids developing connecting cilium and the inner segments similar to control organoids, they failed to extend outer segments". Outer segments almost never form normally in human retinal organoids, even when derived from 'wild-type' cells. Thus, I consider it wholly inadequate to simply state that outer segment formation 'failed' without a rigorous, quantitative, and blinded comparison of patient and control organoids.

      We agree that it is challenging to generate outer segments in retinal organoids, but we are not the first to show that they can form. This has been demonstrated by multiple independent labs (Mayerl et al., PMID: 36206764; Wahlin et al., PMID: 28396597; West et al., PMID: 35334217), including ours (Chirco et al., PMID: 34653402). To clarify, we did not observe any OS-like tissue in the patient organoids across multiple EM preps of a number of organoids from two independent 300+ day experiments, which matched the phase microscopy data presented in Figure 2B.

      (2) The presentation of qPCR results in Figure 3A is very confusing. First, the authors normalize expression to that of CRX, but they don't really explain why. In lines 210-211, they write "CRX, a ubiquitously expressing photoreceptor gene maintained from development to adulthood." Several parts of this sentence are misleading or incomplete. First, CRX is not 'ubiquitously expressed' (which usually means 'in all cell types') nor is it photoreceptor-specific: CRX is expressed in rods, cones, and bipolar cells. Furthermore, CRX expression levels are not constant in photoreceptors throughout development/adulthood. So, for these reasons alone, CRX is a poor choice for the normalization of photoreceptor gene expression.

      As you are aware, all housekeeping genes have shortcomings when used for normalizing PCR data. We chose CRX because, within the time points chosen, its expression is not expected to change much, and it thus represents a good equalizer for relative photoreceptor numbers between the organoids and conditions. While we agree that CRX is weakly expressed in bipolar cells (Yamamoto et al 2020), this is not expected to bias the data substantially, as neither we nor others have seen a large relative difference in bipolar cell number in organoids. We also confirm this by showing equivalent expression of OTX2, RCVRN and NRL between all conditions.
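
      As an illustration only, the kind of reference-gene normalization discussed above can be sketched with the widely used 2^-ΔΔCt calculation. The sketch assumes CRX as the reference gene and equal amplification efficiency for both genes; the function name and all Ct values are hypothetical and are not taken from the manuscript, whose exact calculation is not described here.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene by the 2^-ddCt method.

    ct_target / ct_reference: Ct of the target gene (e.g. RHO) and of the
    reference gene (here CRX, assumed stable) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two values in the calibrator
    sample (e.g. a control organoid).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    return 2 ** -(delta_ct_sample - delta_ct_calibrator)


# Hypothetical Ct values, for illustration only (not data from the study).
control = {"RHO": [24.1, 23.9, 24.0], "CRX": [20.0, 20.1, 19.9]}
patient = {"RHO": [22.0, 22.1, 21.9], "CRX": [20.0, 19.9, 20.1]}

fold_change = relative_expression(
    mean(patient["RHO"]), mean(patient["CRX"]),
    mean(control["RHO"]), mean(control["CRX"]),
)
print(f"RHO fold change (patient vs control): {fold_change:.2f}")
```

      Because the target Ct is expressed relative to the reference Ct within each sample, a difference in the number of cells expressing the reference gene between organoids shifts both values together and largely cancels out, which is the rationale for choosing a photoreceptor-enriched reference.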

      Second, the authors' interpretation of the qPCR results (lines 216-218) is very confusing. The authors appear to be saying that there is a statistically significant increase in RHO levels between D120 and D300. However, the same change is observed in both control and patient organoids and is not unexpected, since the organoids are more mature at D300. The key comparison is between control and patient organoids at D300. At this time point, there appears to be no difference between control and patient. The authors don't even point this out in the main text.

      Thank you for the comment; we apologize for the confusion. As can be seen in the graph in Figure 3A, we do compare the expression of genes, including RHO, between control and patient organoids at two different time points. There are four conditions (D120-RC, D120-RM, D300-RC and D300-RM), with individual data points and error bars for each condition. For RHO, there is a statistically significant increase in patient organoids relative to control organoids at both time points. We also compared RHO expression between patient organoids at the two time points, and it was not statistically different.

      Third, the variability in the number of photoreceptor cells in individual organoids makes a whole-organoid comparison by qPCR fraught with difficulty. It seems to me that what is needed here is a comparison of RHO transcript levels in isolated rod photoreceptors.

      We agree that this makes the comparison challenging. This was exactly the reason for using CRX for normalization, since it is predominantly expressed in photoreceptors. This was validated by the data showing no difference in expression of the photoreceptor markers OTX2, RCVRN or NRL between the organoids.

      (3) I cannot understand what the authors are comparing in the bulk RNA-seq analysis presented in the paragraph starting with line 222 and in the paragraph starting with line 306. They write "we performed bulk-RNA sequencing on 300-days-old retinal organoids (n=3 independent biological replicates). Patient retinal organoids demonstrated upregulated transcriptomic levels of RHO... comparable to the qRT-PCR data." From the wording, it suggests that they are comparing bulk RNA-seq of patients and control organoids at D300. However, this is not stated anywhere in the main text, the figure legend, or the Methods. Yet, the subsequent line "comparable to the qRT-PCR data" makes no sense, because the qPCR comparison was between patient samples at two different time points, D120 and D300, not between patient and control. Thus, the reader is left with no clear idea of what is even being compared by RNA-seq analysis.

      We apologize if the conditions were not obvious and will clarify this in the revised version. The conditions compared are control and patient organoids at D300. Regarding the comparison to the qRT-PCR data, as stated above, the comparison shown there is between patient and control organoids at two different time points.

      Remarkably, the exact same lack of clarity as to what is being compared is found in the second RNA-seq analysis presented in the paragraph starting with line 306. Here the authors write "We further carried out bulk RNA-sequencing analysis to comprehensively characterize three different groups of organoids, 0.25 μM PR3-treated and vehicle-treated patient organoids and control (RC) organoids from three independent differentiation experiments. Consistent with the qRT-PCR gene expression analysis, the results showed a significant downregulation in RHO and other rod phototransduction genes." Here, the authors make it clear that they have performed RNA-seq on three types of samples: PR3-treated patient organoids, vehicle-treated patient organoids, and control organoids (presumably not treated). Yet, in the next sentence, they state "the results showed a significant downregulation in RHO", but they don't state what two of the three conditions are being compared! Although I can assume that the comparison presented in Fig. 6A is between patient vehicle-treated and PR3-treated organoids, this is nowhere explicitly stated in the manuscript.

      Thank you for the comment and we will explicitly state various comparisons in the revised version.

      (4) There are multiple flaws in the analysis and interpretation of the PR3 treatment results. The authors wrote (lines 289-294) "We treated long-term cultured 300-days-old, RHO-CNV patient retinal organoids with varying concentrations of PR3 (0.1, 0.25 and 0.5 μM) for one week and assessed the effects on RHO mRNA expression and protein localization. Immunofluorescence staining of PR3-treated organoids displayed a partial rescue of RHO localization with optimal trafficking observed in the 0.25 μM PR3-treated organoids (Figure 5B). None of the organoids showed any evidence of toxicity post-treatment."

      There are multiple problems here. First, the results are impressionistic and not quantitative. Second, it's not clear that the investigator was blinded with respect to the treatment condition. Third, in the sections presented, the organoids look much more disorganized in the PR3-treated conditions than in the control. In particular, the ONL looks much more poorly formed. Overall, I'd say the organoids looked considerably worse in the 0.25 and 0.5 microM conditions than in the control, but I don't know whether or not the images are representative. Without rigorously quantitative and blinded analysis, it is impossible to draw solid conclusions here. Lastly, the authors state that "none of the organoids showed any evidence of toxicity post-treatment," but do not explain what criteria were used to determine that there was no toxicity.

      Thank you for your critical insight. The RHO localization data are qualitative, as it is very difficult to accurately quantify rhodopsin trafficking within cells in an organoid. Thus, for quantitative comparison, we have provided expression level changes. Regarding toxicity, we analyzed the organoids by morphology and TUNEL and did not observe a significant difference between the conditions. This closely mimics mouse data on PR3, which suppressed rod function in mice following IP injection without any obvious toxicity.

      (5) qPCR-based quantitation of rod gene expression changes in response to PR3 treatment is not well-designed. In lines 294-297 the authors wrote "PR3 drove a significant downregulation of RHO in a dose-dependent manner. Following qRT-PCR analysis, we observed a 2-to-5 log2FC decrease in RHO expression, along with smaller decreases in other rod-specific genes including NR2E3, GNAT1 and PDE6B." I assume these analyses were performed on cDNA derived from whole organoids. There are two problems with this analysis/interpretation. First, a decrease in rod gene expression can be caused by a decrease in the number of rods in the treated organoids (e.g., by cell death) or by a decrease in the expression of rod genes within individual rods. The authors do not distinguish between these two possibilities. Second, as stated above, the percentage of cells that are rods in a given organoid can vary from organoid to organoid. So, to determine whether there is downregulation of rod gene expression, one should ideally perform the qPCR analysis on purified rods.

      The reviewer is correct in pointing out the potential reasons for the reduction in RHO levels following PR3 treatment. Thus, we have provided NRL expression levels in the graph to show that this key rod-specific gene does not change, suggesting an equivalent number of rod photoreceptor cells. The suggestion of using purified rods is not practical here, as we do not have any way to sort human rods due to the lack of a rod-specific cell surface marker.
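
      For readers less familiar with the units in this exchange, the sketch below shows how a log2 fold change is commonly derived from qPCR threshold cycles with the ΔΔCt approach. It is an illustration only: the response does not state which quantification method was used, the function name and Ct values are hypothetical, and 100% amplification efficiency is assumed.

```python
def log2_fold_change(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Return log2 fold change (treated vs. control) via the delta-delta-Ct method.

    Assumes ~100% primer efficiency, so one cycle equals one doubling.
    All inputs are threshold-cycle (Ct) values; the reference gene is a
    normaliser chosen by the experimenter (hypothetical here).
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # A higher Ct means less template, so the sign is flipped.
    return -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold 3 cycles later after treatment.
lfc = log2_fold_change(27.0, 18.0, 24.0, 18.0)
print(lfc, 2 ** lfc)  # log2FC and the corresponding linear fold change
```

      On this scale, a decrease of 2 to 5 log2 units corresponds to a 4- to 32-fold reduction. Because the measurement is made on whole organoids, the same value could arise from fewer rods or from lower expression per rod, which is the distinction the reviewer raises.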

      (6) In Figure 4B 'RM' panels, the authors show RHO staining around the somata of 'rods' but the inset images suggest that several of these cells lack both NRL and OTX2 staining in their nuclei. All rods should be positive for NRL. Conversely, the same image shows a layer of cells scleral to the cells with putative RHO somal staining which do not show somal staining, and yet they do appear to be positive for NRL and OTX2. What is going on here? The authors need to provide interpretations for these findings.

      Since RHO is a cytoplasmic marker and photoreceptors are tightly packed, it is difficult to make a 1:1 comparison between the NRL/OTX2 nuclear markers and RHO. Additionally, as the RHO+ cytoplasm moves towards the scleral surface, it is expected to pass adjacent to other nuclei. A few of the rods do still have normal rhodopsin trafficking, and it is likely that these will not have somal RHO, similar to control conditions. We do rarely observe these cells, as highlighted by the occasional RHO in the IS/OS of RM organoids in the figure. We do agree that the NRL staining in Figure 4B (>D250) is not extremely crisp and we will include an updated figure in the revised version.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary: This study presents fundamental new insights into vesicular monoamine transport and the binding pose of the clinical drug tetrabenazine (TBZ) to the mammalian VMAT2 transporter. Specifically, this study reports the first structure for the mammalian VMAT (SLC18) family of vesicular monoamine transporters. It provides insights into the mechanism by which this inhibitor traps VMAT2 into a 'dead-end' conformation. The structure also provides some evidence for a novel gating mechanism within VMAT2, which may have wider implications for understanding the mechanism of transport in the wider SLC18 family.

      Strengths: The structure is high quality, and the method used to determine the structure via fusing mVenus and the anti-GFP nanobody to the amino and carboxyl termini is novel. The binding and transport data are convincing, although limited. The binding position of TBZ is of high value, given its role in treating Huntington's chorea and for being a 'dead-end' inhibitor for VMAT2.

      Weaknesses: The lack of additional mutational data and/or analyses on the impact of pH on ligand binding reduces the insights from these experiments. This reduces the strength of the conclusions that can be drawn about the mechanism of binding and transport or the novelty of the gating mechanism discussed above.

      We greatly appreciate this summary and thank reviewer #1 for their comments and suggested experiments which we believe will further strengthen this work. We agree with these comments and plan to include more mutagenesis data in a revised manuscript in order to address this point and expand further on the mechanistic details of transport.

      Reviewer #2 (Public Review):

      Overview:

      As a report of the first structure of VMAT2, indeed the first structure of any vesicular monoamine transporter, this manuscript represents an important milestone in the field of neurotransmitter transport. VMAT2 belongs to a large family (the major facilitator superfamily, MFS) containing transporters from all living species. There is a wealth of information relating to the way that MFS transporters bind substrates, undergo conformational changes to transport them across the membrane, and couple these events to the transmembrane movement of ions. VMAT2 couples the movement of protons out of synaptic vesicles to the vesicular uptake of biogenic amines (serotonin, dopamine, and norepinephrine) from the cytoplasm. The new structure presented in this manuscript can be expected to contribute to an understanding of this proton/amine antiport process.

      The structure contains a molecule of the inhibitor TBZ bound in a central cavity, with no access to either luminal or cytoplasmic compartments. The authors carefully analyze which residues interact with bound TBZ and measure TBZ binding to VMAT2 mutated at some of those residues. These measurements allow well-reasoned conclusions about the differences in inhibitor selectivity between VMAT1 and VMAT2 and differences in affinity between TBZ derivatives.

      The structure also reveals polar networks within the protein and hydrophobic residues in positions that may allow them to open and close pathways between the central binding site and the cytoplasm or the vesicle lumen. The authors propose the involvement of these networks and hydrophobic residues in the coupling of transport to proton translocation and conformational changes. However, these proposals are quite speculative in the absence of supporting structures and experimentation that would test specific mechanistic details.

      Thank you for these comments and summary describing this work. We agree that the involvement of polar networks has not been experimentally tested; these are proposed as a possible mechanism, but we have not made mechanistic conclusions on how protons are translocated and coupled to transport. We believe we have made it clear in the manuscript when describing the polar networks that the corresponding discussion is largely descriptive and speculative and will further stress that in a future revision. We would like to point out however, that many of the polar and charged residues which make up these networks have been studied and that there is a wealth of biochemical and functional experiments in the literature which implicate these residues in this process. Yet, we agree that establishing the precise mechanistic details will require additional structures and likely also extensive computational experiments. We have cited these papers that have characterized these polar residues extensively throughout the text (30-32,37,49,55).

      We would like to submit that we have not proposed that the hydrophobic gates are involved in proton translocation. Gating residues, by definition, block access to the binding site (29,30,48); and since our structure is occluded, we directly observe the residues which participate in both gates. We have also performed extensive mutagenesis studies of many of these hydrophobic gating residues and our binding data are consistent with this conclusion. Transport experiments with mutations at these gates might be helpful toward gaining a deeper understanding of transport mechanism but given the current structural data it is conceivable that these residues play a role in gating neurotransmitter.

      Critique:

      Although the structure presented in this MS is clearly important, I feel that the authors have overstated several of the conclusions that can be drawn from it. I don't agree that the structure clearly indicates why TBZ is a non-competitive inhibitor; the proposal that specific hydrophobic residues function as gates will depend on lumen- and cytoplasm-facing structures for verification; the polar networks could have any number of functions - indeed it would be surprising if they were all involved in proton transport. Several of these issues could be resolved by a clearer illustration of the data, but I believe that a more rigorous description of the conclusions and where they fall between firm findings and speculation would help the reader put the results in perspective.

      The central argument made by this reviewer that is repeated throughout this critique is that more structures of various states are needed to make mechanistic conclusions with respect to how TBZ binds and alternating access. While additional structures would certainly add mechanistic detail, they are not required to make these conclusions. In fact, as we point out throughout the text, these conclusions have already been made in various publications which we have cited and discussed. Decades of mutagenesis, binding, transport, inhibition, and accessibility measurements all support the conclusion that TBZ binds from the luminal side and that VMAT2 uses an alternating mechanism to transport neurotransmitter (30-32,35-37,55). Structures are neither required nor sufficient to make such claims and more structures of various apo states in different conformations would not provide any additional support to this question. If the predominant apo state was luminal open, cytoplasm open or occluded, this would not prove how TBZ enters VMAT2. Structural data alone does not provide these details; biochemical data does and structures are useful for understanding the details of how these mechanisms work. Thus, our structure provides the molecular framework for understanding the binding site, conformation, gating, and polar networks and we have interpreted our own biochemical data as well as the available biochemical data in the literature in the context of our structure.

      The structure indicates why TBZ is a non-competitive inhibitor (35,36) because it is not possible for neurotransmitters to compete for binding to this state. Neurotransmitter initially binds to the cytosolic-facing state, where the intracellular gates are open; inhibition by binding to this state would result in a competitive mechanism. Since TBZ is non-competitive, it must bind through the luminal-open state, where the luminal gate is open. Further conformational change produces the occluded conformation, with both the luminal and intracellular gates closed, which is what we observe in the structure. This finding is supported by numerous biochemical and functional experiments and by extensive analysis of mutants in the gates using binding assays, transport experiments and cysteine accessibility experiments. We have cited and discussed these key papers (30-32,35-37,55) throughout the text and our results support the conclusions drawn from these works.

      Non-competitive inhibition occurs when the action of an inhibitor can't be overcome by increasing substrate concentration. The structure shows TBZ sequestered in the central cavity with no access to either cytoplasm or lumen. The explanation of competitive vs non-competitive inhibition depends entirely on how TBZ got there. If it is bound from the cytoplasm, cytoplasmic substrate should have been able to compete with TBZ and overcome the inhibition. If it is bound from the lumen, or from within the bilayer, cytoplasmic substrate would not be able to compete, and inhibition would be non-competitive. The structure does not tell us how TBZ got there, only that it was eventually occluded from both aqueous compartments and the bilayer.
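
      The kinetic distinction drawn here can be illustrated with the textbook steady-state rate equations. The sketch below is a generic Michaelis-Menten-style illustration rather than a model of VMAT2; the function names and the parameter values (vmax, km, ki all set to 1) are assumptions chosen only to make the contrast visible.

```python
def rate_competitive(s, i, vmax=1.0, km=1.0, ki=1.0):
    """Rate with a competitive inhibitor: the apparent Km is scaled by (1 + i/ki)."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def rate_noncompetitive(s, i, vmax=1.0, km=1.0, ki=1.0):
    """Rate with a pure non-competitive inhibitor: the apparent Vmax is divided by (1 + i/ki)."""
    return vmax * s / ((1.0 + i / ki) * (km + s))


if __name__ == "__main__":
    inhibitor = 1.0  # arbitrary units, equal to ki
    for substrate in (1.0, 10.0, 1e3, 1e6):
        print(substrate,
              round(rate_competitive(substrate, inhibitor), 4),
              round(rate_noncompetitive(substrate, inhibitor), 4))
```

      With a competitive inhibitor the rate approaches vmax as substrate increases, whereas with a non-competitive inhibitor it plateaus at vmax / (1 + i/ki) however much substrate is added. Neither curve says anything about the route by which the inhibitor reaches its site, which is the point under dispute.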

      TBZ is accepted to be a non-competitive inhibitor, based on decades of research, and not based solely on our structure (30-32,35,36). Our structure provides insight into the molecular mechanism by which non-competitive inhibition occurs. Previous studies have shown that TBZ enters through the luminal side of the transporter, resulting in non-competitive inhibition by binding to a conformation of the transporter which does not bind cytosolic neurotransmitter. We agree our structure does not prove how TBZ ‘got there’, but other studies have addressed this question (30-32, 35, 36) and have been discussed in detail.

      The issue of how VMAT2 opens access to the central binding site from luminal and cytoplasmic sides is an important and interesting one, and comparison with other MFS structures in cytoplasmic-open or extracellular/luminal-open is a very reasonable approach. However, any conclusions for VMAT2 should be clearly indicated as speculative in the absence of comparable open structures of VMAT2. As a matter of presentation, I found the illustrations in ED Fig. 6 to be less helpful than they could have been. Specifically, illustrations that focus on the proposed gates, comparing that region of the new structure with the corresponding region of either VGLUT or GLUT4 would better help the reader to compare the position of the proposed gate residues with the corresponding region of the open structure. I realize that is the intended purpose of ED Fig. 6b and 6c, but currently, those show the entire protein, and a focus on the gate regions might make the proposed gate movements clearer. I also appreciate the difference between the Alphafold prediction and the new structure, but I'm not convinced that ED Fig. 6a adds anything helpful.

      Thank you for the suggestion. We will prepare a new figure that focuses on the gates to make this clearer. The comparison with Alphafold is valuable since the luminal loops and gates are not well modeled. Many groups are using these structures to do biochemical and computational experiments and perhaps even to design small-molecules. Since Alphafold differs substantially in this area, it might be of interest to those in the community doing this type of work.

      The polar networks described in the manuscript provide interesting possibilities for interactions with substrates and protons whose binding to VMAT2 must control conformational change. Aside from the description of these networks, there is little evidence presented to assess the role of these networks in transport. Are the networks conserved in other closely related transporters? How could the interaction of the networks with substrate or protons affect conformational change? Of course, any potential role proposed for the networks would be highly speculative at this point, and any discussion of their role should point out their speculative nature and the need for experimental verification. Some speculation, however, can be useful for focusing the field's attention on future directions. However, statements in the abstract (three distinct polar networks... play a role in proton transduction.) and the discussion (...are likely also involved in mediating proton transduction.) should be clearly presented as speculation until they are validated experimentally.

      We agree these statements are speculative, which we acknowledged in the text. We will further emphasize this point in a future revision. Please note, however, that many of these residues have been highlighted in other studies (30-32,37,49,55), and we have cited them in the text. Please see previous response.

      Most of these residues are indeed highly conserved. It is a good idea to highlight this in our sequence alignment of related transporters. We will do so in our revised manuscript.

      The strongest aspect of this work (aside from the structure itself) is the analysis of TBZ binding. There is a problematic aspect to this analysis. The discussion on how TBZ stabilizes the occluded conformation of VMAT2 is premature without structures of apo-VMAT2 and possibly structures with other ligands bound. We don't really know at this point whether VMAT2 might be in the same occluded conformation in the absence of TBZ. Any statements regarding the effect of interactions between VMAT2 and TBZ depend on demonstrating that TBZ has a conformational effect. The same applies to the discussion of the role of W318 on conformation and to the loops proposed to "occlude the luminal side of the transporter" (line 131).

      Please see the response to this argument presented earlier. The occluded structure clearly shows the residues serving as gates. To understand how the gates open is a separate question. This does require additional structures and computations which are beyond the scope of this work. Our structure is interpreted in the context of all available biochemical data.

      The description of VMAT2 mechanism makes many assumptions that are based on studies with other MFS transporters. Rather than stating these assumptions as fact (VMAT2 functions by alternating access...), it would be preferable to explain why a reader should believe these assumptions. In general, this discussion presents conclusions as established facts rather than proposals that need to be tested experimentally.

      Indeed, the structural details of alternating access in MFS transporters are based on structures of other related proteins and we have cited review articles that describe this (29,30,48). We would like to highlight that these assumptions are not without merit, as previous studies investigating predicted gating residues (the same residues resolved in our structure) were based on studies of other MFS transporters and the demonstrated biochemical results are consistent with an alternating access transporter. These biochemical experiments also clearly demonstrate that a broadly similar mechanism of alternating access is used by VMAT2, see (30-32,48) which we have cited extensively when discussing these mechanisms.

      The MD simulations are not described well enough for a general reader. What is the significance of the different runs? ED Fig. 4d is not high enough resolution to see the details.

      We plan to provide additional experimental details and data to support the computational experiments in a revision. See response to reviewer #3.

      Reviewer #3 (Public Review):

      Summary:

      The vesicular monoamine transporter is a key component in neuronal signaling and is implicated in diseases such as Parkinson's. Understanding of monoamine processing and our ability to target that process therapeutically has been to date provided by structural modeling and extensive biochemical studies. However, structural data is required to establish these findings more firmly.

      Strengths:

      Dalton et al resolved a structure of VMAT2 in the presence of an important inhibitor, tetrabenazine, with the protein in detergent micelles, using cryo-EM and with the aid of domains fused to its N- and C-terminal ends. The resolution of the maps allows clear assignment of the amino acids in the core of the protein. The structure is in good agreement with a wealth of experimental and structural prediction data and provides important insights into the binding site for tetrabenazine and selectivity relative to analogous compounds.

      Weaknesses:

      The authors follow up their structures with molecular dynamics simulations. The simulations resulted in repositioning of the ligand, which does not seem to be well founded, and raises questions about the methodological choices made for the simulations.

      We appreciate the comments of reviewer #3 and thank them for these suggestions regarding the MD simulations. We will be supplying additional information to address the questions of reviewer #2 and #3 regarding the MD simulations including 1) movies which show there is not a substantial repositioning of ligand in any of the three runs 2) a table showing protonation states of residues and TBZ 3) data which shows that the number of waters which enter the binding site is relatively few compared with simulations of dopamine bound VMAT2 4) in run 2, more waters have entered the binding site vs. run 1 and 3 which likely explains why there is a small repositioning of TBZ.

      We will also be providing a substantially improved map in a revised manuscript where the peripheral TMHs and loops are better resolved.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their helpful comments which we have addressed, point-by-point, below:

      Reviewer #1:

      1) It might be useful to add more details to the methods (especially lines 191-196) to make them a bit more user-friendly for an audience who still may be unfamiliar with the relatively new and complex Mendelian randomisation technique.

      The following information has been included in this section of the methods, to describe the different MR models in more detail:

      “The IVW MR model will produce biased effect estimates in the presence of horizontal pleiotropy, i.e. where one or more genetic variant(s) included in the instrument affect the outcome by a pathway other than through the exposure. In the weighted median model, each genetic variant is weighted according to its distance from the median effect of all genetic variants. Thus, the weighted median model will provide an unbiased estimate when at least 50% of the information in an instrument comes from genetic variants that are not horizontally pleiotropic. The weighted mode model uses a similar approach but weights genetic instruments according to the mean effect. In this model, over 50% of the weight of the genetic instrument can be contributed to by genetic variants which are horizontally pleiotropic, but the most common amount of pleiotropy must be zero (known as the Zero Modal Pleiotropy Assumption (ZEMPA))[Hartwig et al., 2017].”
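
      As a rough illustration of why a pleiotropic variant distorts the IVW estimate more than the weighted median, the sketch below computes both from per-variant Wald ratios. It is not the analysis pipeline used in the study: the summary statistics are made up, the function names are hypothetical, the weights use a first-order approximation, and the weighted median is a simplified version without the interpolation step used in published implementations.

```python
def wald_ratios(beta_exp, beta_out, se_out):
    """Per-variant causal estimates and first-order inverse-variance weights."""
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    weights = [(be / so) ** 2 for be, so in zip(beta_exp, se_out)]
    return ratios, weights


def ivw_estimate(beta_exp, beta_out, se_out):
    """Inverse-variance weighted mean of the Wald ratios."""
    ratios, weights = wald_ratios(beta_exp, beta_out, se_out)
    return sum(w * r for r, w in zip(ratios, weights)) / sum(weights)


def weighted_median_estimate(beta_exp, beta_out, se_out):
    """Smallest ratio at which the cumulative weight reaches half the total."""
    ratios, weights = wald_ratios(beta_exp, beta_out, se_out)
    half = 0.5 * sum(weights)
    running = 0.0
    for ratio, weight in sorted(zip(ratios, weights)):
        running += weight
        if running >= half:
            return ratio
    raise ValueError("no variants supplied")


# Made-up data: true effect 0.5; the last variant has a direct (pleiotropic) effect.
beta_exp = [0.10, 0.20, 0.15, 0.12, 0.18]
beta_out = [0.5 * b for b in beta_exp]
beta_out[-1] += 0.20
se_out = [0.02] * 5

print(ivw_estimate(beta_exp, beta_out, se_out))
print(weighted_median_estimate(beta_exp, beta_out, se_out))
```

      In this toy example the valid variants carry more than half of the total weight, so the weighted median recovers the true effect while the IVW estimate is pulled towards the outlying variant.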

      2) I was just wondering why MR egger was not carried out as part of this analysis?

      We did consider also employing the MR Egger model as a further sensitivity analysis. However, given we were already employing the weighted median and weighted mode models, and given that MR-Egger suffers from reduced statistical power in comparison to the other models, we reasoned that adding in a further MR model would not add further clarity to our analyses, particularly given the relatively small sample size.

      3) Although it is included in Figure 1 flowchart, I think it is also important to explain clearly in the written text why only n=6,118 of the n=13,988 children in the ALSPAC study were included in this study.

      The following information has been included in the paragraph describing the ALSPAC study in the methods:

      “Sufficient information was available on 6,221 of these individuals to be included in our analysis, as metabolomics was not performed for all individuals in the ALSPAC study.”

      4) It is mentioned within the discussion 'the NMR metabolomics platform utilised in the analyses outlined here has limited coverage of fatty acids'. I think it might be useful to also add this detail into the methods section to aid readers when they are making their own interpretation whilst reading the results section.

      The following sentence has been included in the methods section:

      “This metabolomics platform has limited coverage of fatty acids.”

      5) However, I feel that the conclusion should be tempered slightly as although this study alongside other similar MR studies provides evidence of an association between genetic liability to CRC and levels of metabolites at certain ages, I do not think there is enough evidence at this stage to say that genetic liability for CRC actually alters the levels of metabolites.

      The first sentence of the conclusion has been changed to:

      “Our analysis provides evidence that genetic liability to CRC is associated with altered levels of metabolites at certain ages, some of which may have a causal role in CRC development.”

      Reviewer #2:

      1) The background is lacking introduction to the different components of the metabolic features tested. For instance, there is a broader discussion about polyunsaturated fatty acids (PUFA) in the discussion, however, this should have been introduced and defined already before that. What metabolites are included in that term (PUFA)? Are there other studies on PUFA and CRC?

      The following information has been included in the background section:

      “In particular, previous work has highlighted polyunsaturated fatty acids (PUFA) as potentially having a role in colorectal cancer development. The term PUFA includes omega-3 and -6 fatty acids. Recent MR work has highlighted a possible link between PUFAs, in particular omega 6 PUFAs, and colorectal cancer risk.”

      2) There seem to be indications for horizontal pleiotropy given the changed estimates when genetic variants in the FADS loci are removed. Could multivariable MR methods have been used to account for pleiotropy and differentiate individual fatty acid effects?

      Multivariable MR can be employed to investigate the effects of horizontal pleiotropy. However, the multiple exposures must have sufficiently distinct underlying genetic architecture in order to instrument each one whilst adjusting for the other, as determined by conditional F-statistics. Given the correlations across metabolite levels, this is unlikely to be the case.

      3) The ALSPAC sample sizes are decreasing across the different age groups, which is not strange given the longitudinal collection. However, does the altered sample composition affect the results? Have sensitivity analyses been done on the complete set of individuals from age 8-25?

      The altered sample composition could be affecting results. The limitations section of the discussion has been amended to reflect this:

      “Secondly, mostly due to the longitudinal nature of the ALSPAC study, our sample at each time point is composed of slightly different individuals. This could be influencing our results, and should be taken into account when comparing across time points.”

      We have not completed any sensitivity analyses to investigate this.

      4) Although beyond the scope of this paper, sex-stratified GWAS analyses on metabolites can easily be done in UK Biobank.

      We thank the reviewer for this suggestion, and agree that this would be an interesting future analysis. We have amended the discussion to mention this:

      “Fourthly, our analysis would benefit from being repeated with sex-stratified data. Although such GWAS results for metabolites are not currently available, the data to perform such GWAS are available in UK Biobank for future analyses.”

      5) Very minor, there is a difference in reporting a number of decimals in ALSPAC results. There is also a difference in reporting the units for the results comparing text and figures (per SD higher CRC liability or per doubling). Please include sample sizes and data sources in the figure legends as they should be stand-alone items.

      We have amended the ALSPAC results so that all are reported to two decimal places, made the reporting units consistent between the text and figures, and updated the figure legends to include sample sizes and data sources.

    1. Author Response

      We thank the reviewers for their suggestions. We are confident in the model that predicts odor vs odor (OCT-MCH) preference using calcium activity, but we acknowledge the relative weakness of the model that predicts odor (OCT) vs air preference. We are preparing an updated manuscript that will prioritize our interpretation of the OCT-MCH results and more fully document uncertainties around our estimates of prediction capacity.

      Reviewer #1 (Public Review):

      Summary: The authors seek to establish what aspects of nervous system structure and function may explain behavioral differences across individual fruit flies. The behavior in question is a preference for one odor or another in a choice assay. The variables related to neural function are odor responses in olfactory receptor neurons or in the second-order projection neurons, measured via calcium imaging. A different variable related to neural structure is the density of a presynaptic protein BRP. The authors measure these variables in the same fly along with the behavioral bias in the odor assays. Then they look for correlations across flies between the structure-function data and the behavior.

      Strengths: Where behavioral biases originate is a question of fundamental interest in the field. In an earlier paper (Honegger 2019) this group showed that flies do vary with regard to odor preference, and that there exists neural variation in olfactory circuits, but did not connect the two in the same animal. Here they do, which is a categorical advance, and opens the door to establishing a correlation. The authors inspect many such possible correlations. The underlying experiments reflect a great deal of work, and appear to be done carefully. The reporting is clear and transparent: All the data underlying the conclusions are shown, and associated code is available online.

      We are glad to hear the reviewer is supportive of the general question and approach.

      Weaknesses: The results are overstated. The correlations reported here are uniformly small, and don't inspire confidence that there is any causal connection. The main problems are

      We are working on a revision that overhauls the interpretations of the results. We recognize that the current version inadequately distinguishes the results that we have high confidence in (specifically, PC2 of our Ca++ data as a predictor of OCT-MCH preference) versus results that are suggestive but not definitive (such as the PC1 of Ca++ data as a predictor of Air-OCT preference).

      It’s true that the correlations are small, with r2 values typically in the 0.1-0.2 range. That said, we would call it a victory if we could explain 10 to 20% of the variance of a behavior measure, captured in a 3 minute experiment, with a circuit correlate. This is particularly true because, as the reviewer notes, the behavioral measurement is noisy.

      1) The target effect to be explained is itself very weak. Odor preference of a given fly varies considerably across time. The systematic bias distinguishing one fly from another is small compared to the variability. Because the neural measurements are by necessity separated in time from the behavior, this noise places serious limits on any correlation between the two.

      This is broadly correct, though to quibble, it’s our measurement of odor preference which varies considerably over time. We are reasonably confident that more of the variance in our measurements can be attributed to sampling error than to changes in true preference over time. As evidence, the correlations between sequential measures of individual odor preference, with delays of 3 hours or 24 hours, are not obviously different. We are separately working on methodological improvements to get more precise estimates of persistent individual odor preference, using averages of multiple, spaced measurements. This is promising, but beyond the scope of this study.
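      The limit that measurement noise places on an observable correlation can be illustrated with the classical attenuation relationship, in which the observed correlation equals the latent correlation scaled by the square root of the product of the two measurement reliabilities. The sketch below is not the analysis used in the study; the function names and the reliability values are hypothetical and chosen only for illustration.

```python
import math

def observed_correlation(r_true, reliability_x, reliability_y):
    """Expected observed correlation given a latent correlation and the
    reliabilities (between 0 and 1) of the two noisy measurements."""
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must be in (0, 1]")
    return r_true * math.sqrt(reliability_x * reliability_y)

def disattenuated_correlation(r_obs, reliability_x, reliability_y):
    """Invert the attenuation formula to estimate the latent correlation."""
    return r_obs / math.sqrt(reliability_x * reliability_y)

# Hypothetical values: a noisy behavioral measure (reliability 0.3),
# a cleaner imaging measure (reliability 0.8), latent correlation 0.8.
r_obs = observed_correlation(0.8, 0.3, 0.8)
r_latent = disattenuated_correlation(r_obs, 0.3, 0.8)
print(round(r_obs, 3), round(r_latent, 3))
```

      Under these assumed values, a strong latent relationship appears as a much weaker measured one, which is the sense in which averaging multiple spaced measurements could help.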

      2) The correlations reported here are uniformly weak and not robust. In several of the key figures, the elimination of one or two outlier flies completely abolishes the relationship. The confidence bounds on the claimed correlations are very broad. These uncertainties propagate to undermine the eventual claims for a correspondence between neural and behavioral measures.

      We are broadly receptive to this criticism. The lack of robustness of some results comes from the fundamental challenge of this work: measuring behavior is noisy at the individual level. Measuring Ca++ is also somewhat noisy. Correlating the two will be underpowered unless the sample size is huge (which is impractical, as each data point requires a dissection and live imaging session) or the effect size is large (which is generally not the case in biology). In the current version we tried, in some sense, to avoid discussing these challenges head-on, focusing instead on what we thought were the conclusions justified by our experiments with sample sizes ranging from 20 to 60. We are working on a revision that is more candid about these challenges.

      That said, we believe the result we view as the most exciting, that PC2 of Ca++ responses predicts OCT-MCH preference, is robust. 1) It is based on a training set with 47 individuals and a test set composed of 22 individuals. The p-value is sufficiently low in each of these sets (0.0063 and 0.0069, respectively) to pass an overly stringent Bonferroni correction for the 5 tests (each PC) in this analysis. 2) The BRP immunohistochemistry provides independent evidence that is consistent with this result: a PC2 that predicts behavior (p = 0.03 from only one test) and has loadings that contrast DC2 and DM2. Taken together, these results are well above the field-standard bar of statistical robustness.

      In the revision we are working on, we are explicit that this is the (one) result we have high confidence in. We believe this result convincingly links Ca++ and behavior, and warrants spotlighting. We have less confidence in other results, and say so, and we hope this addresses concerns about overstating our results.

      3) Some aspects of the statistical treatment are unusual. Typically a model is proposed for the relationship between neuronal signals and behavior, and the model predictions are correlated with the actual behavioral data. The normal practice is to train the model on part of the data and test it on another part. But here the training set at times includes the testing set, which tends to give high correlations from overfitting. Other times the testing set gives much higher correlations than the training set, and then the results from the testing set are reported. Where the authors explored many possible relationships, it is unclear whether the significance tests account for the many tested hypotheses. The main text quotes the key results without confidence limits.

      Our primary analyses are exactly what the reviewer describes, scatter plots and correlations of actual behavioral measures against predicted measures. We produced test data in separate experiments, conducted weeks to months after models were fit on training data. This is more rigorous than splitting into training and test sets data collected in a single session, as batch/environmental effects reduce the independence of data collected within a single session.

      We only collected a test set when our training set produced a promising correlation between predicted and actual behavioral measures. We never used data from test sets to train models. In our main figures, we showed scatter plots that combined test and training data, as the training and test partitions had similar correlations.

      We are unsure what the reviewer means by instances where we explored many possible relationships. The greatest number of comparisons that could lead to the rejection of a null hypothesis was 5 (corresponding to the top 5 PCs of Ca++ response variation or Brp signal). We were explicit that the p-values reported were nominal. As mentioned above, applying a Bonferroni correction for n=5 comparisons to either the training or test correlations from the Ca++ to OCT-MCH preference model remains significant at alpha=0.05.
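      The Bonferroni statement can be checked with simple arithmetic. The sketch below uses the two nominal p-values and the five comparisons quoted in this response; the helper function name is hypothetical.

```python
# Bonferroni check for the two nominal p-values quoted in the response.
# The p-values and the number of tests (5 PCs) come from the text above;
# the function name is a hypothetical helper for illustration.

def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """Return True if p_value is below the Bonferroni-corrected threshold."""
    return p_value < alpha / n_tests

n_tests = 5                       # one test per principal component
p_train, p_test = 0.0063, 0.0069  # training-set and test-set p-values

threshold = 0.05 / n_tests        # corrected per-test threshold
train_ok = bonferroni_significant(p_train, n_tests)
test_ok = bonferroni_significant(p_test, n_tests)
print(threshold, train_ok, test_ok)
```

      Both p-values fall below the corrected per-test threshold of 0.05 / 5, which is consistent with the claim made above.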

      Our revision will include confidence limits.
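      One common way to attach confidence limits to a Pearson correlation is the Fisher z-transform. The sketch below is an assumption about how such limits could be computed, not a description of the method used in the revision. The sample sizes (47 and 22) are the training and test set sizes quoted earlier in this response, while the correlation value is hypothetical.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation r
    from n paired observations, using the Fisher z-transform."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                  # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)        # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

r_example = 0.4                        # hypothetical correlation, for illustration only
lo_47, hi_47 = pearson_ci(r_example, 47)
lo_22, hi_22 = pearson_ci(r_example, 22)
print((round(lo_47, 2), round(hi_47, 2)), (round(lo_22, 2), round(hi_22, 2)))
```

      With the same hypothetical correlation, the interval is visibly wider at the smaller sample size, which is why reporting the limits alongside the point estimate is informative.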

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to identify the neural sources of behavioral variation in a decision between odor and air, or between two odors.

      Strengths:

      -The question is of fundamental importance.

      -The behavioral studies are automated, and high-throughput.

      -The data analyses are sophisticated and appropriate.

      -The paper is clear and well-written aside from some strong wording.

      -The figures beautifully illustrate their results.

      -The modeling efforts mechanistically ground observed data correlations.

      We are glad to read that the reviewer sees these strengths in the study. We hope the forthcoming revision will address the strong wording.

      Weaknesses:

      -The correlations between behavioral variations and neural activity/synapse morphology are (i) relatively weak, (ii) framed using the inappropriate words "predict", "link", and "explain", and (iii) sometimes non-intuitive (e.g., PC 1 of neural activity).

      Taking each of these points in turn: i) It would indeed be nicer if our empirical correlations were higher. One quibble: we primarily report relatively weak correlations between measurements of behavior and Ca++/Brp. This could be the case even when the correlation between true behavior and Ca++/Brp is higher. Our analysis of the potential correlation between latent behavioral and Ca++ signals was an attempt to tease these relationships apart. The analysis suggests that there could, in fact, be a high underlying correlation between behavior and these circuit features (though the error bars on these inferences are wide).

      ii) We are working to guarantee that all such words are used appropriately. “Predict” can often be appropriate in this context, as a model predicts true data values. Explain can also be appropriate, as X “explaining” a portion of the variance of Y is synonymous with X and Y being correlated. We cannot think of formal uses of “link,” and are revising the manuscript to resolve any inappropriate word choice.

      iii) If the underlying biology is rooted in non-intuitive relationships, there’s unfortunately not much we can do about it. We chose to use PCs of our Ca++/Brp data as predictors to deal with the challenge of having many potential predictors (odor-glomerular responses) and relatively few output variables (behavioral bias). Thus, using PCs is a conservative approach to deal with multiple comparisons. Because PCs are just linear transformations of the original data, interpreting them is relatively easy, and in interpreting PC1 and PC2, we were able to identify simple interpretations (total activity and the difference between DC2 and DM2 activation, respectively). All in all, we remain satisfied with this approach as a means to both 1) limit multiple comparisons and 2) interpret simple meanings from predictive PCs.
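      The point that PCs are linear transformations of the original data can be sketched as follows. This is a generic PCA via singular value decomposition on simulated data, not the authors' pipeline; the array dimensions and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_flies, n_features = 47, 10          # hypothetical dimensions
responses = rng.normal(size=(n_flies, n_features))  # simulated odor-glomerulus responses

# Center each feature, then take the SVD; the rows of vt are the PC loadings.
centered = responses - responses.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

n_pcs = 5                             # limit the number of candidate predictors
loadings = vt[:n_pcs]                 # shape (n_pcs, n_features)
scores = centered @ loadings.T        # each PC score is a weighted sum of the features
explained = s**2 / np.sum(s**2)       # fraction of variance carried by each PC

print(scores.shape, explained[:n_pcs].round(3))
```

      Because each score is a weighted sum of the original features, inspecting a row of `loadings` shows directly which features a given PC contrasts, and restricting attention to a handful of PCs keeps the number of tested predictors small.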

      -No attempts were made to perturb the relevant circuits to establish a causal relationship between behavioral variations and functional/morphological variations.

      We did conduct such experiments, but we did not report them because they had negative results that we could not definitively interpret. We used constitutive and inducible effectors to alter the physiology of ORNs projecting to DC2 and DM2. We also used UAS-LRP4 and UAS-LRP4-RNAi to attempt to increase and decrease the extent of Brp puncta in ORNs projecting to DC2 and DM2. None of these manipulations had a significant effect on mean odor preference in the OCT-MCH choice, which was the behavioral focus of these experiments. We were unable to determine if the effectors had the intended effects in the targeted Gal4 lines, particularly in the LRP experiments, so we could not rule out that our negative finding reflected a technical failure. We are reviewing these results to determine if they warrant including as a negative finding in the revision.

      We believe that even if these negative results are not technical failures, they are not necessarily inconsistent with the analyses correlating features of DC2 and DM2 to behavior. Specifically, we suspect that there are correlated fluctuations in glomerular Ca++ responses and Brp across individuals, due to fluctuations in the developmental spatial patterning of the antennal lobe. Thus, the DC2-DM2 predictor may represent a slice/subset of predictors distributed across the antennal lobe. This would also explain how we “got lucky” to find two glomeruli as predictors of behavior, when we were only able to image a small portion of the glomeruli. In analyses we did not report, we explored this possibility using the AL computational model. We are likely to include this interpretation in the revised discussion.

      Reviewer #3 (Public Review):

      Churgin et al. seek to understand the neural substrates of individual odor preference in the Drosophila antennal lobe, using paired behavioral testing and calcium imaging from ORNs and PNs in the same flies, and testing whether ORN and PN odor responses can predict behavioral preference. The manuscript's main claims are that ORN activity in response to a panel of odors is predictive of the individual's preference for 3-octanol (3-OCT) relative to clean air, and that activity in the projection neurons is predictive of both 3-OCT vs. air preference and 3-OCT vs. 4-methylcyclohexanol (MCH). They find that the difference in density of fluorescently-tagged brp (a presynaptic marker) in two glomeruli (DC2 and DM2) trends towards predicting behavioral preference between 3-OCT vs. MCH. Implementing a model of the antennal lobe based on the available connectome data, they find that glomerulus-level variation in response reminiscent of the variation that they observe can be generated by resampling variables associated with the glomeruli, such as ORN identity and glomerular synapse density.

      Strengths:

      The authors investigate a highly significant and impactful problem of interest to all experimental biologists, nearly all of whom must often conduct their measurements in many different individuals and so have a vested interest in understanding this problem. The manuscript represents a lot of work, with challenging paired behavioral and neural measurements.

      Weaknesses:

      The overall impression is that the authors are attempting to explain complex, highly variable behavioral output with a comparatively limited set of neural measurements…

      We would say that we are attempting to explain a simple, highly variable behavioral measure with a comparatively limited set of neural measurements. I.e. we make no claims to explain the complex behavioral components of odor choice, like locomotion, reversals at the odor boundary, etc.

      Given the degree of behavioral variability they observe within an individual (Figure 1- supp 1) which implies temporal/state/measurement variation in behavior, it's unclear that their degree of sampling can resolve true individual variability (what they call "idiosyncrasy") in neural responses, given the additional temporal/state/measurement variation in neural responses.

      We are confident that different Ca++ recordings are statistically different. This is borne out in the analysis of repeated Ca++ recordings in this study, which finds that the significant PCs of Ca++ variation contain 77% of the variation in that data. That this variation is persistent over time and across hemispheres was assessed in Honegger & Smith, et al., 2019. We are thus confident that there is true individuality in neural responses (Note, we prefer not to call it “individual variability” as this could refer to variability within individuals, not variability across individuals.) It is a separate question of whether individual differences in neural responses bear some relation to individual differences in behavioral biases. That was the focus of this study, and our finding of a robust correlation between PC2 of Ca++ responses and OCT-MCH preference indicates a relation. Because behavior and Ca++ were collected with an hours-to-day long gap, this implies that there are latent versions of both behavioral bias and Ca++ response that are stable on timescales at least that long.

      The statistical analyses in the manuscript are underdeveloped, and it's unclear the degree to which the correlations reported have explanatory (causative) power in accounting for organismal behavior.

      With respect, we do not think our statistical analyses are underdeveloped, though we acknowledge that the detailed reviewer suggestions included the helpful suggestion to include uncertainty in the estimation of confidence intervals around the point estimate of the strength of correlation between latent behavioral and Ca++ response states. We are considering those suggestions and anticipate responding to them in the revision.

      It is indeed a separate question whether the correlations we observed represent causal links from Ca++ to behavior (though our yoked experiment suggests there is not a behavior-to-Ca++ causal relationship — at least one where odor experience through behavior is an upstream cause). We attempted to be precise in indicating that our observations are correlations. That is why we used that word in the title, as an example. In the revision, we are working to make sure this is appropriately reflected in all word choice across the paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you for your thoughtful review and constructive feedback on our manuscript. We have implemented numerous revisions throughout the manuscript to address your comments and suggestions. Below are our point-by-point responses to the reviewers' remarks. We hope that our revisions adequately address all raised concerns.

      Reviewer #1

      One major drawback of the manuscript is the fact that the data were collected from male subjects only. One might expect similar behavioral outcomes from male and female rats receiving 2-shock and 10-shock training. However, increasing attention to sex as a biological variable has revealed an interesting truth, namely that males and females can engage distinct neural pathways to arrive at the same behavioral destination. It should not be taken for granted that retrieval of aversive contextual associations would reproduce the same networks in females, and, as such, the manuscript does not give a complete accounting of the phenomenon under study.

      We thank the reviewer for highlighting the importance of sex differences in fear memory and for encouraging us to discuss this issue. We agree that males and females can engage different behavioral and circuit mechanisms and that our findings may not be generalizable to female rats. We expanded the discussion section to state this limitation and to suggest future directions for research on sex differences in fear memory:

      “In addition, a growing body of evidence underscores the differences between males and females concerning fear memories (Fleischer and Frick, 2023). Given that our study was conducted only with male rats, future studies exploring sex differences will be instrumental in providing a more complete account of the network-level mechanisms underlying fear memory strength.”

      The aversive associative memories described by the authors are characterized as mild or strong. More analysis of the meaning of memory strength, and its relationship to conditioning parameters, is needed.

      In particular, the authors should discuss issues such as amount of training, US magnitude, and rate of shock delivery. If amount of training is important, would 2 vs 10 presentations of a milder shock produce the same networks at retrieval? Would a larger shock require fewer presentations to isolate amygdalar regions from the rest of the network? If the shocks were presented at the same rate during training, would you get the same result in both groups? More data examining these questions would be ideal, but, in the absence of that, a discussion of these issues is needed and missing from the manuscript in its current form.

      We appreciate the reviewer's feedback on the characterization of the fear memories in our study and agree that the labels "mild" and "strong" could oversimplify the complex nature of fear memories. Our study's main objective was not to delineate how varying conditioning protocols result in 'mild' or 'strong' fear memories, but to employ protocols of different intensities known to produce distinct behaviors, and then discern their brain differences. Our categorization was rooted in the resulting behavioral expressions, classifying 'mild' memories as those triggering sub-maximal fear responses with low generalization and a potential for extinction learning and reconsolidation. Conversely, 'strong' memories were defined by peak or near-peak fear responses, high generalization, and impeded extinction and reconsolidation processes. To isolate the number of foot shocks as the sole variable, we kept both shock intensity and session duration constant. While this decision allowed for a clear comparative analysis, we acknowledge its limitations in exploring other influential factors.

      A more ideal approach would be to reverse this process (first experimenting with several different conditioning parameters, then observing the resulting behaviors and brain mechanisms), but given the additional workload that would entail, particularly when combined with the c-fos and network analyses, we opted for our current approach. Nevertheless, we hope our study will stimulate research that goes deeper into the nuances of fear conditioning protocols, fostering a better understanding of adaptive and maladaptive fear memories. This is now discussed in the discussion section:

      “To generate mild and strong fear memories, we based our conditioning parameters on methods that have shown distinct behavioral outcomes in prior studies (Haubrich et al., 2020, 2015; Holehonnur et al., 2016; Poulos et al., 2016; Wang et al., 2009). To ensure a focused comparative analysis, our conditioning protocols differed only in the number of foot shocks, and maintained consistent shock intensities and session durations. Yet, the number of shocks is not the only factor that can affect the strength of fear memories (Gazarini et al., 2023). Other conditioning parameters, such as shock intensity, its predictability, and inter-shock intervals, can also play crucial roles. Moreover, different fear measures like freezing behavior, fear-potentiated startle, and inhibitory avoidance might manifest differently following varying conditioning protocols, adding another layer of complexity. A comprehensive understanding of fear memory strength will benefit from further studies scrutinizing these parameters and memory attributes.”

      Reviewer #2

      One alternative account to the weak vs. strong memory distinction made in the paper is the opportunity for extinction in the 2S compared to the 10S group. In the 2S group, the last shock occurs in the 3rd minute, leaving 9 minutes of context exposure without reinforcement to follow. This is not the case for the 10S group. If context fear extinction is engaged during this time, then this would mean that two memories (acquisition and extinction) are taking place in the 2S group, weakening the fear memory in this group, setting up the ground for stronger effects of extinction, less generalization and of course potential greater connectivity required for representing and linking these memories. Indeed, the IL, a brain area linked to extinction, is more predominant in the connectivity map of the 2S compared to the 10S group. While testing this alternative is beyond the scope of this paper, it will need to be discussed.

      We thank the reviewer for raising this interesting point. We agree that the structure of the 2S protocol might inadvertently provide an opportunity for within-session extinction. However, we would like to clarify that we made a mistake in the description of the 2S training protocol. The timing of the shock deliveries was not at the second and third minutes as stated (a usual protocol in the literature), but at the sixth and seventh minutes. We apologize for this mistake and are thankful for your help in identifying this discrepancy which had unfortunately persisted despite multiple proofreading rounds. We have amended this detail in the methods section of our manuscript.

      Nevertheless, we recognize that the subsequent minutes post-shock in the 2S group may still provide a window for potential extinction. To address this possibility, we scored the freezing expression during the training session minute by minute. In the 2S group, two videos were corrupted, and it was only possible to score freezing in six out of eight animals (this is acknowledged in the methods section). As presented in Figure 1.A (middle plot), freezing behavior increased post-shocks and showed no decline towards the session's end. These findings suggest that within-session extinction did not occur during our conditioning session. This analysis is now integrated into the relevant results subsection.

      Methodological detail is lacking re the timeline of study, and connectivity analyses.

      Thank you for your feedback. The formula for the discrimination index is now explained in the methods section. The new plot showing freezing behavior during training shows the exact time bin when shocks were delivered. We have expanded the description of the connectivity analysis.

      Reviewer #3

      Major concerns)

      1) Previous studies including Karim's lab have shown that protein synthesis in the hippocampus is required for the reconsolidation of contextual fear memory and that the retrieval of contextual fear memory activates gene expression such as c-fos in the hippocampus. However, the authors failed to confirm this observation. This may be due to the small number of rats or some technical problems.

      Thank you for this insightful observation. We believe that the absence of the expected increase in hippocampal c-fos activation is due to the unique experimental design employed for our control group. In our study, control rats were subjected to an equivalent duration of context exposure without receiving shocks. As a result, these animals formed and retrieved a neutral, rather than fearful, contextual memory. This likely elevated c-fos levels in the hippocampus in comparison to the more traditional home-cage condition frequently used in earlier studies. We used the NS (no shock) protocol for our control group to specifically elucidate the impact of the number of shock presentations on memory formation; therefore, the context exposure was kept the same across groups. Importantly, this aspect did not affect our connectivity analysis, since it is influenced more by the relative variance across structures than by the absolute magnitude of c-fos expression. We now emphasize the nature of our control group in the discussion:

      “Importantly, our control animals were exposed to the conditioning chamber for an equivalent duration without being subjected to shocks, thus encoding and recalling a non-fearful contextual memory.”

      2) The author's computation analyses suggested differences in neural networks activated by the retrieval of mild and strong fear memories. The results of computer analysis are interesting. However, it is not clear whether such results are actually occurring in vivo. At this moment, the author's findings are not a conclusion, but rather a suggestion or hypothesis. Therefore, it is also important to conduct interventional experiments to evaluate the validity of the authors' findings. Specifically, the authors' results could be validated by analyzing the effects of inhibition of specific brain regions on mild and strong fear memories retrieval using such as DREADD and other methods. These experiments seem hard, but would greatly improve the quality of the manuscript.

      We appreciate the reviewer's perspective and acknowledge the limitations of our current findings. While our data based on c-fos expression suggests functional connections reflective of neural activity during fear memory recall, we agree that it is not possible to deduce causality from this alone. Instead, our study aimed to uncover the network-level distinctions between mild and strong memories, laying the groundwork for subsequent, in-depth investigations of the causal relationships within these identified pathways. We agree that corroborating our findings with interventional experiments, such as using DREADDs, is an important next step. We also agree that such experiments would enhance our study and hope future research will address these points. These points were included in the discussion section:

      “To further elucidate the underlying mechanisms of fear memory strength in vivo, understanding the specific roles of individual network elements in fear regulation becomes essential. Future research will be important to probe the causal interplay among distinct nodes and edges, both individually and in combination, in shaping diverse aspects of fear expression.”

      Reviewer #2 (Recommendations For The Authors):

      Methodological detail is lacking:

      How is the discrimination index calculated?

      We have included this information in the methods section: “The generalization index was calculated as Freezing in Test B / (Freezing in Test A + Freezing in Test B).”
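      The quoted formula can be written as a small function. The sketch below follows the formula as stated; the function name, the zero-denominator handling, and the example freezing values are hypothetical additions for illustration.

```python
def generalization_index(freezing_a, freezing_b):
    """Freezing in Test B divided by total freezing across Tests A and B.

    A value near 0.5 means similar freezing in both tests, while a value
    near 0 means little freezing in Test B relative to Test A.
    """
    total = freezing_a + freezing_b
    if total == 0:
        # Hypothetical choice: the index is undefined when no freezing occurred.
        raise ValueError("index is undefined when total freezing is zero")
    return freezing_b / total

# Hypothetical freezing percentages, for illustration only.
low_generalization = generalization_index(freezing_a=60.0, freezing_b=20.0)
high_generalization = generalization_index(freezing_a=60.0, freezing_b=60.0)
print(low_generalization, high_generalization)
```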

      A distinction between complete spontaneous recovery (10S group) vs. partial spontaneous recovery (2S group) vs. extinction retention needs to be considered in discussing the extinction data.

      Thank you for this suggestion. To address this point, we now include Tukey’s post hoc comparisons between the first and last bins of extinction and the test session. The results show that in the 2S group, freezing during test remained consistent with the levels observed in the final extinction bin and was lower than the levels in the initial extinction bin. Conversely, in the 10S group, freezing levels increased from the final extinction bin to the test, reaching levels comparable to those observed in the initial extinction bin.

      Detail regarding the connectivity analyses is missing from the methods. For example, the calculation of the r value distributions should be detailed in the methods, not just the results; more detail regarding calculations is needed for the degree of centrality, betweenness centrality, nodal efficiency, small world analyses etc.

      We appreciate the reviewer’s feedback. We have expanded the description of the connectivity analysis.

      Justification for 'excluding edges with r values lower than the average plus one standard deviation of all 292 networks (Figure 4.B; r < 0.61)' is needed.

      Thank you for encouraging us to elaborate on the rationale behind our thresholding method. We acknowledge that there is no consensus in the literature on the optimal thresholding method for functional networks. Our primary objective with thresholding was to retain the most robust connections while minimizing potential noise from weakly correlated regions. Instead of opting for an arbitrary threshold, we determined our cut-off based on the average plus one standard deviation across all networks. Theoretically, this retains approximately the top 16% of connections. Given our 12 regions of interest, this translates to roughly 10 connections per network. This count is sufficient for a nuanced analysis of the network structures and between-group comparisons. Importantly, our method inherently accounts for variations in interregional correlations across groups. Groups with a distribution skewed towards higher r values will naturally have more edges, highlighting the enhanced synchronized activity between certain regions. On the other hand, networks with tendencies towards lower r values will exhibit fewer connections. Thus, our thresholding method is rooted in the data’s distribution and results in networks that reflect the differences across groups.

      We added the following sentence to the methods section summarizing this rationale:

      “This thresholding approach was used to provide a cut-off based on the data’s inherent distribution, therefore retaining the top edges according to the data variance.”
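      The arithmetic behind the "top 16%" and "roughly 10 connections" figures can be sketched as follows. The sketch assumes that the pooled r values are approximately normally distributed and that the population standard deviation is used; both are assumptions, and the function names are hypothetical rather than taken from the manuscript.

```python
import math
from statistics import mean, pstdev

n_regions = 12
# Number of distinct region pairs in an undirected network without self-loops.
n_possible_edges = n_regions * (n_regions - 1) // 2

# Fraction of a normal distribution lying above mean + 1 SD.
upper_tail = 0.5 * (1.0 - math.erf(1.0 / math.sqrt(2.0)))
expected_edges = n_possible_edges * upper_tail

def mean_plus_sd_cutoff(pooled_r_values):
    """Cut-off equal to the mean plus one (population) standard deviation."""
    return mean(pooled_r_values) + pstdev(pooled_r_values)

def threshold_edges(r_values, cutoff):
    """Keep edges at or above the cut-off; edges below it are excluded."""
    return [r for r in r_values if r >= cutoff]

print(n_possible_edges, round(upper_tail, 4), round(expected_edges, 1))
```

      Under the normality assumption, about 16% of the 66 possible edges lie above the cut-off, which matches the estimate of roughly 10 connections per network. A group whose r values are shifted upward relative to the pooled distribution would retain more edges under the same cut-off.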

      Line 81 - 'brain areas' is missing after '12'.

      Thank you, this is now fixed.

      Title for 2. is somewhat odd. Thought the following may be better, but obviously leaving this up to the author's discretion: 'Commonalities and differences in brain activation induced by recall of mild and strong fear memories'

      Thank you for this suggestion. We agree with the title suggested by the reviewer, and it was replaced in the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      1) Previous studies including Karim's lab have shown that protein synthesis in the hippocampus is required for the reconsolidation of contextual fear memory and that the retrieval of contextual fear memory activates gene expression such as c-fos in the hippocampus. However, the authors failed to confirm this observation. This may be due to the small number of rats or some technical problems.

      Thank you for this suggestion. As explained above, we believe that this is due to the nature of our control group, which is now highlighted in the discussion section.

      2) The author's computation analyses suggested differences in neural networks activated by the retrieval of mild and strong fear memories. The results of computer analysis are interesting. However, it is not clear whether such results are actually occurring in vivo. At this moment, the author's findings are not a conclusion, but rather a suggestion or hypothesis. Therefore, it is also important to conduct interventional experiments to evaluate the validity of the authors' findings. Specifically, the authors' results could be validated by analyzing the effects of inhibition of specific brain regions on mild and strong fear memories retrieval using such as DREADD and other methods. These experiments seem hard, but would greatly improve the quality of the manuscript.

      Thank you for your valuable feedback. As explained above, these points are now included in the discussion section.

      Minor comments)

      1) cfos should be c-fos or c-Fos.

      Thank you for your correction. All instances of ‘cfos’ were replaced by ‘c-fos’.

      2) Line 275; "Compared to the to re-exposure to" should be "Compared to the to re-exposure to".

      Thank you for your correction. This is now fixed.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study has uncovered some important initial findings about how certain extracellular vesicles (EVs) from the mother might impact the energy usage of an embryo. While the study's findings are in general solid, some experiments lack statistical power due to small sample sizes. The study's title might be a bit too assertive as the evidence linking maternal mtDNA transmission to changes in embryo energy use is still correlative.

      We would like to express our sincere gratitude to the editors and reviewers for their invaluable comments on this work. Their feedback has been instrumental in enhancing the quality of our manuscript; we have incorporated their suggestions to the best of our abilities.

      Reviewer #1 (Public Review):

      Q1. Bolumar et al. isolated and characterized EV subpopulations, apoptotic bodies (AB), Microvesicles (MV), and Exosomes (EXO), from endometrial fluid through the female menstrual cycle. By performing DNA sequencing, they found the MVs contain more specific DNA sequences than other EVs, and specifically, more mtDNA were encapsulated in MVs. They also found a reduction of mtDNA content in the human endometrium at the receptive and post-receptive period that is associated with an increase in mitophagy activity in the cells, and a higher mtDNA content in the secreted MVs was found at the same time. Last, they demonstrated that the endometrial Ishikawa cell-derived EVs could be taken by the mouse embryos and resulted in altered embryo metabolism.

      This is a very interesting study and is the first one demonstrating the direct transmission of maternal mtDNA to embryos through EVs.

      A1. Thank you for your kind comments.

      Reviewer #2 (Public Review):

      Q2. In Bolumar, Moncayo-Arlandi et al. the authors explore whether endometrium-derived extracellular vesicles contribute mtDNA to embryos and therefore influence embryo metabolism and respiration. The manuscript combines techniques for isolating different populations of extracellular vesicles, DNA sequencing, embryo culture, and respiration assays performed on human endometrial samples and mouse embryos.

      Vesicle isolation is technically difficult and therefore collection from human samples is commendable. Also, the influence of maternally derived mtDNA on the bioenergetics of embryos is unknown and therefore novel. However, several experiments presented in the manuscript fail to reach statistical significance, likely due to the small sample sizes. Additionally, the experiments do not demonstrate a direct effect of mtDNA transfer on embryo bioenergetics. This has the unfortunate consequence of making several of the authors' conclusions speculative.

      In my opinion the manuscript supports the following of the authors' claims:

      1) Different amounts of mtDNA are shed in human endometrial extracellular vesicles during different phases of the menstrual cycle

      2) Endometrial microvesicles are more enriched for mitochondrial DNA sequences compared to other types of microvesicles present in the human samples

      3) Fluorescently labelled DNA from extracellular vesicles derived from an endometrial adenocarcinoma cell line can be incorporated into hatched mouse embryos.

      4) Culture of mouse embryos with endometrial extracellular vesicles can influence embryo respiration and the effect is greater when cultured with isolated exosomes compared to other isolated microvesicles

      A2. Thank you for your detailed feedback. We have made every effort to enhance the manuscript in this revised version, ensuring that our conclusions are grounded in solid evidence and that they avoid any speculation.

      My main concerns with the manuscript:

      Q3. The authors demonstrate that microvesicles contain the most mtDNA, however, they also demonstrate that only isolated exosomes influence embryo respiration. These are two separate populations of extracellular vesicles.

      A3. This manuscript focuses on the DNA content secreted by the endometrium and captured by the embryo. We identified both mitochondrial DNA and genomic DNA. We have found that mitochondrial DNA is predominantly secreted and encapsulated within microvesicles, while all three types of vesicles encapsulate genomic DNA. Specifically, based on the results we presented in Response A8 to the reviewers and included in the latest version of the manuscript, we observed that exosomes contain the highest amount of genomic DNA. Furthermore, exosomes have the greatest impact on embryo bioenergetics, suggesting that this DNA content may primarily exert this effect. We have thoroughly revised the manuscript, focusing our message on DNA content.

      Q4. mtDNA is not specifically identified as being taken up by embryos only DNA.

      A4. We agree with the reviewer; as we mention in answer A9, EdU does not specifically label mitochondrial DNA. To solve this issue, we incubated a synthetic molecule of labeled mtDNA with embryos and analyzed mtDNA incorporation using confocal microscopy. We co-cultured hatched mouse embryos (3.5 days) with an ATP8 sequence conjugated with biotin overnight at 37 °C and 5% CO2. We then permeabilized embryos, incubated them with Streptavidin-Cy3 for 45 min, and visualized the results using an SP8 confocal microscope (Leica). We observed mtDNA internalization by cells of the hatched embryos; please see new supplementary Figure 7 and lines 234-237 on page 9 and lines 583-592 of the M&M on page 21.

      Q5. The authors do not rule out that other components packaged in extracellular vesicles could be the factors influencing embryo metabolism.

      A5. The vesicular subtypes contain molecules beyond DNA, such as microRNAs, proteins, or lipids. Our laboratory has studied the transmission of vesicles and their relationship with their contents (particularly microRNAs) and their connection to maternal-fetal communication. In this study, we focused on genomic/mitochondrial DNA. We cannot exclude the possibility that other molecules may influence metabolism; this statement is already noted in the discussion section on lines 328-331 on page 12.

      Q6. Taken together, these concerns seem to contradict the implication of the title of the manuscript – the authors do not demonstrate that inheritance of maternal mtDNA has a direct causative effect on embryo metabolism.

      A6. We have modified the title to better align with the manuscript’s results. The proposed new title for the manuscript is “Vertical transmission of maternal DNA through extracellular vesicles modulates embryo bioenergetics during the periconceptional period.”

      Reviewer #1 (Recommendations for The Authors):

      Q7. Would it be possible to validate the mtDNA content and mitophagy activity in different periods using the Ishikawa cells?

      A7. Unfortunately, this validation cannot be achieved with in vitro cultures of cell lines, especially with a cell line such as the endometrial adenocarcinoma-derived Ishikawa cell line. While mimicking the menstrual cycle (as observed in Figure 3 of the manuscript) is entirely artificial, we believe that the statistically significant results obtained in human samples faithfully represent the biological processes involved. Using a cell line, in our opinion, would not provide us with novel information.

      Q8. Characterization of the EVs subpopulations from Ishikawa cells and direct evidence to show the EdU labeled DNA is contained in the EVs are necessary.

      A8. To address this concern, we designed a novel experiment. We cultured Ishikawa cells in the presence of EdU, isolated the three types of vesicles, and evaluated labeled DNA content by flow cytometry (as illustrated in Supplementary Figure 5). All three types of vesicles exhibited positive EdU-DNA labeling; notably, the exosomal fraction demonstrated substantially higher DNA content than the other vesicle populations. Please see new supplementary Figure 5 and lines 217-218 on page 9, and lines 576-582 of the M&M on pages 20-21.

      Q9. Would EdU incorporate into the genomic DNA or mitochondrial DNA?

      A9. EdU (5-ethynyl-2′-deoxyuridine) is a nucleoside analog of thymidine and becomes incorporated into DNA during active DNA synthesis. EdU labels all newly synthesized DNA, both genomic and mitochondrial; however, we cannot differentiate between them with this technique.

      Q10. It is difficult to assess whether the EV-derived DNA was taken by the TE or ICM without immunostaining of cell lineage markers in mouse embryos.

      A10. We did not aim to label the inner cell mass, as the vesicles primarily enter through trophectodermal cells. The images presented in Figure 4 and Supplementary Figure 5 depict trophectoderm cells.

      Q11. It is also valuable to perform co-staining of Mitotracker to show the co-localization of EdU labelled DNA and the mitochondrial.

      A11. Per the reviewer's suggestion, we conducted an experiment as described in the following text. We isolated MVs from the culture media of EdU-treated Ishikawa cells and co-incubated them with embryos overnight. The resulting images (see Author response image 1) show an embryo stained for EdU-tagged DNA labeled with Alexa Fluor 488 (green), Mitotracker Deep Red (red), and nuclei (blue). Detailed views of the embryo are presented in panels A and B. Notably, we observed co-localization of mitochondria and EdU-tagged DNA, as indicated by the white arrows. Despite this intriguing finding, we chose not to include these results in the initial version of the manuscript; however, if the editor deems it appropriate, we would be delighted to incorporate them into the final version. The experimental procedure for co-localization of EdU-tagged DNA with mitochondria involved the following steps: Mitotracker Deep Red FM (Thermo Fisher Scientific, M22426) was added to the embryo media at a final concentration of 200 nM, and the embryos were subsequently incubated for 45-60 minutes prior to fixation.

      Author response image 1.

      Co-localization of mitochondria and EdU-tagged DNA in mouse embryos. Representative micrograph of an embryo co-incubated with MVs isolated from the culture media of Ishikawa cells treated with EdU. EdU-tagged DNA was labeled with Alexa Fluor 488 (green), mitochondria with Mitotracker Deep Red (red), and nuclei are shown in blue. A and B) Magnified images of the embryo show detailed co-localization of mitochondria and EdU-tagged DNA (white arrows). Negative control) Embryos incubated with MVs isolated from control Ishikawa cells (without EdU incubation) and stained with the click-it reaction cocktail. A and B show magnified images of the embryo. Note the absence of EdU-Alexa Fluor 488 signal (green).

      Reviewer #2 (Recommendations for The Authors):

      Q12. It would be helpful if the authors could provide citations and rationale for why they chose specific molecular markers to validate the different population of extracellular vesicles.

      A12. Different extracellular populations are defined by molecular marker signatures that reflect their origin. VDAC1 forms ionic channels in the mitochondrial membrane, has a role in triggering apoptosis, and has been described as characteristic of ABs.[1]

      The ER protein Calreticulin has also been used as an AB marker [2]; however, other studies have noted the presence of Calreticulin in MVs. [1] This apparent non-specificity may derive from apoptotic processes, during which the ER membrane fragments and forms vesicles smaller than ABs, which would contain Calreticulin and sediment at higher centrifugal forces.[3,4] In fact, proteomic studies have linked the presence of Calreticulin with vesicular fractions of a size range relevant for MVs [5] and ABs [6].

      ARF6, a GTP-binding protein implicated in cargo sorting and promoting MV formation, has been proposed as an MV marker. [7,8]

      Classic markers of EXOs include molecules involved in biogenesis, such as tetraspanins (CD63, CD9, CD81), Alix, TSG101, and flotillin-1.[9,10] Nonetheless, studies have recently reported the widespread nature of such markers among various EV populations, although with different relative abundances (such as is the case for CD9, CD63, HSC70, and flotillin-1[11]). Notably, certain molecular markers (such as TSG101[1,11]) have been ratified as specific to EXOs.

      References

      1. D. K. Jeppesen, M. L. Hvam, B. Primdahl-Bengtson, A. T. Boysen, B. Whitehead, L. Dyrskjøt, T. F. Orntoft, K. A. Howard, M. S. Ostenfeld, J. Extracell. Vesicles. 2014, 3, 25011, doi: 10.3402/jev.v3.25011.

      2. J. van Deun, P. Mestdagh, R. Sormunen, V. Cocquyt, K. Vermaelen, J. Vandesompele, M. Bracke, O. De Wever, A. Hendrix, J. Extracell. Vesicles. 2014, 3, 24858, doi: 10.3402/jev.v3.24858.

      3. L. Abas, C. Luschnig, Anal. Biochem. 2010, 401, 217-227, doi: 10.1016/j.ab.2010.02.030.

      4. C. Lavoie, J. Lanoix, F. W. Kan, J. Paiement, J. Cell Sci. 1996, 109(6), 1415-1425.

      5. M. Tong, T. Kleffmann, S. Pradhan, C. L. Johansson, J. DeSousa, P. R. Stone, J. L. James, Q. Chen, L. W. Chamley, Hum. Reprod. 2016, 31(4), 687-699, doi: 10.1093/humrep/dew004.

      6. P. Pantham, C. A. Viall, Q. Chen, T. Kleffmann, C. G. Print, L. W. Chamley, Placenta. 2015, 36, 1463-1473, doi: 10.1016/j.placenta.2015.10.006.

      7. V. Muralidharan-Chari, J. Clancy, C. Plou, M. Romao, P. Chavrier, G. Raposo, C. D'Souza-Schorey, Curr. Biol. 2009, 19, 1875-1885.

      8. C. Tricarico, J. Clancy, C. D'Souza-Schorey, Small GTPases. 2016, 0(0), 1-13.

      9. M. Colombo, G. Raposo, C. Théry, Annu. Rev. Cell. Dev. Biol. 2014, 30, 255-289, doi: 10.1146/annurev-cellbio-101512-122326.

      10. S. Mathivanan, H. Ji, R. J. Simpson, J. Proteomics. 2010, 73(10), 1907-1920.

      11. J. Kowal, G. Arras, M. Colombo, M. Jouve, J. P. Morath, B. Primdal-Bengtson, F. Dingli, D. Loew, M. Tkach, C. Théry, Proc. Natl. Acad. Sci. U. S. A. 2016, 113(8), E968-77.

      Q13. The PCA analysis in supplementary figure 4 A&B needs more explanation for why they think separation of the two conditions based on principal component 1 is sufficient. The small number of replicates makes me concerned because principal component 2 does not show similarity of replicates for the DNase treated samples. Also, 4C has no description in the figure legend.

      A13. The PCA results show a clear separation between the two conditions; we believe this separation is primarily driven by the differences observed in principal component 1 (PC1). We would like to address the concerns raised by the reviewer with the following points:

      1. Interpretation of PCs: In PCA, the principal components represent orthogonal axes capturing the highest variance in the data. PC1 accounts for 56% and 57% of the variance in the two conditions, respectively. The significant variance explained by PC1 suggests that it effectively captures the major sources of variation between the samples.

      2. Sample Replicates and Variability: The concern regarding the small number of replicates is acknowledged, and we understand its impact on the analysis. Despite the limited number of replicates, the consistent pattern of separation in PC1 between the two conditions provides confidence in the observed separation. We also agree that PC2 does not show an apparent similarity among the DNase-treated samples; however, this does not diminish the significance of PC1, which robustly separates the two conditions.

      We include the Figure legend for 4C: “C) Principal component analysis shows EV sample grouping due to specificity in coding-gene sequences.”

      Q14. I am confused by the phrasing in the last two sentences of the top paragraph on page 7. Why would apoptotic bodies all have similar content if they encapsulate a greater amount of material making their contents less specific? Please clarify.

      A14. This sentence was intended to convey that apoptotic bodies (ABs) are formed from apoptotic cells, are larger in size, and carry more non-specific content. This non-specific nature arises because, unlike the other two types of vesicles, ABs do not encapsulate molecules selectively. For more detailed information on ABs in human reproduction, we published an extensive review in 2018 (see below).

      Simon C, Greening DW, Bolumar D, Balaguer N, Salamonsen LA, Vilella F. Extracellular Vesicles in Human Reproduction in Health and Disease. Endocr. Rev. 2018 Jun 1;39(3):292-332. doi: 10.1210/er.2017-00229. PMID: 29390102.

      Q15. The first and last sentences of the last paragraph of page 8 seem to contradict each other. Please clarify.

      A15. We observe an enrichment in the amount of mitochondrial DNA in samples during the receptive and post-receptive phases. While the data may not show statistical significance, we observed a trend towards greater enrichment in receptivity compared to pre-receptivity. The lack of significant differences could be attributed to inherent variability among patients. We have also altered the text on page 8 to avoid confusion.

      Q16. Quantification of the rates of DNA incorporation into embryos would strengthen Figure 4 and Supplementary Figure 5.

      A16. We acknowledge the reviewer's feedback, and in response, we conducted an assay to quantify the total DNA incorporated into the embryos. To achieve this, we isolated EVs from control Ishikawa cell culture media and from EdU-treated Ishikawa cell culture media. Subsequently, we co-incubated both types of EVs with ten embryos overnight in G2 plus media at 37 °C and 5% CO2.

      After co-incubation, we collected embryos and the culture media containing co-incubated EVs. We then isolated total DNA using the QIAamp® DNA Mini kit (Qiagen; 51304). To label the EdU-DNA particles, we performed a click-it reaction using the Click-iT™ EdU Alexa Fluor™ 488 flow cytometry assay Kit (Thermo Fisher Scientific, ref: C10420) per the manufacturer's instructions. Subsequently, we cleaned and purified DNA using AMPure beads XP (Beckman Coulter, A63882) and eluted DNA in 150 µL of 0.1 M Tris-EDTA. Finally, we measured the fluorescence of each sample using a Victor3 plate reader (PerkinElmer). To ensure accuracy, we subtracted the background signal from non-labeled DNA-derived EVs and embryos incubated without EVs for each sample. Despite conducting the experiment twice, we encountered challenges in obtaining clear results, possibly due to the limited resolution of the technique.

      Q17. If mtDNA is most enriched in MVs but only embryos cultured with Exos demonstrated differences in respiration the authors need to comment on this discrepancy.

      A17. We ask the reviewer to refer to Answer A3; we have thoroughly revised the manuscript, focusing our message on DNA content.

      Q18. The authors should change the definitive language in the title of the manuscript because all evidence presented is correlative.

      A18. We have modified the title to better align with the manuscript's results. The proposed new title for the manuscript is “Vertical transmission of maternal DNA through extracellular vesicles modulates embryo bioenergetics during the periconceptional period.”

      Q19. I realize this is beyond what the authors intend for the scope of this paper, however, on page 6 the authors describe membranous structures within the ABs but say they couldn't study their presence with organelle-specific markers. Why? Presence of organelles in these vesicles is very interesting!

      A19. As the reviewer rightly points out, we did not study ABs in this manuscript. Analysis of the electron microscopy images suggests the presence of fragments of organelles, most likely originating from apoptotic processes; however, we did not use any specific markers to confirm our assertion. We have modified the text to avoid any confusion. Please see Page 6, Lines 120-121, for further details.

    1. Author Response

      We thank the reviewers and editorial team for the positive reaction to our paper and for the constructive recommendations and comments on our work. Here we provide a brief provisional response to key points that were identified. We will give a detailed point-by-point response with highlighted changes in our manuscript when we upload the revised version of our paper.

      Reviewer 1:

      Statistical evaluation of the null

      In Experiment 2, we inferred the existence of a null effect of image category on suppression depth based on frequentist statistics. At the reviewer’s suggestion we performed a statistical evaluation of the evidence in favour of the null effect using a Bayesian repeated measures ANOVA implemented in JASP. That analysis provides strong evidence for the null (BF01= 20.38) and will be included in the final version of the paper.

      Likelihood of exceptional cases

      We acknowledge that our selection of categories is only a sampling of possible categories to which our novel tCFS method can be applied for deriving suppression depth. Other possibilities that come to mind include objects that emerge from specific configurations of simple 'tokens' such as dots (such as actions defined by biological motion (Watson et al., 2004)) or different shaped tokens configured to generate pareidolia faces (Zhou et al., 2021). We will expand on the possibility of these exceptional cases impacting bCFS and reCFS thresholds in the discussion of our revised manuscript.

      Reviewer 2:

      In response to the claim “the paper overreaches by claiming breakthrough thresholds are insufficient for drawing certain conclusions about subconscious processing.”

      We agree that breakthrough thresholds can provide useful information to draw conclusions about unconscious processing – as our procedure is predicated on breakthrough thresholds. Our key point is that breakthrough provides only half of the needed information and will amend our manuscript accordingly. In so doing, we will also shift our focus toward the influence of semantics and low-level factors, including discussion of the possibility that suppression depth and bCFS thresholds could be driven by statistically orthogonal factors.

      Reviewer 3:

      On the appropriateness of log-transformed contrast

      Our motivation to quantify suppression depth after log-transform to decibel scale was two-fold. First, we recognised that the traditional use of a linear contrast ramp in bCFS is at odds with the well-characterised profile of contrast discrimination thresholds which obey a power law (Legge, 1981) and the observations that neural contrast response functions show the same compressive non-linearity in many different cortical processing areas (e.g.: V1, V2, V3, V4, MT, MST, FST, TEO. See Ekstrom et al., 2009). Increasing contrast in linear steps could thus lead to a rapid saturation of the response function, which may account for the overshoot that has been reported in many canonical bCFS studies. For example, in Jiang et al. (2007), target contrast reached 100% after 1 second, yet average suppression times for faces and inverted faces were 1.36 and 1.76 seconds respectively. As contrast response functions in visual neurons saturate at high contrast, the upper levels of a linear contrast ramp have less and less effect on the target's strength. This approach to response asymptote may have exaggerated small differences between stimulus conditions and may have inflated some previously reported differences. In sum, the use of a log-transformed contrast ramp allows finer increments in contrast to be explored before saturation, a simple manipulation which we hope will be adopted by our field.

      Second, by quantifying suppression depth as a decibel change, we enable the comparison of suppression depth between experiments and laboratories, which inevitably differ in presentation environments. As a comparison, a reaction-time for bCFS of 1.36 s cannot easily be compared without access to near-identical stimulation and testing environments. In addition, once ramp contrast is log-transformed it effectively linearises the neural contrast response function. This means that different studies that use different contrast levels for masker or target can be directly compared because a given suppression depth (for example, 15 dB) is the same proportionate difference between bCFS and reCFS regardless of the contrasts used in the particular study.
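      As a minimal sketch of why a decibel measure is comparable across contrast ranges, the snippet below computes suppression depth from a pair of thresholds. It assumes the common 20·log10 convention for expressing a contrast ratio in decibels; the function name and the threshold values are hypothetical and chosen only for illustration.

```python
import math


def suppression_depth_db(bcfs_contrast, recfs_contrast):
    """Suppression depth in dB between two contrast thresholds.

    Assumes the 20*log10 convention for contrast ratios (an assumption,
    not taken from the manuscript).
    """
    return 20.0 * math.log10(bcfs_contrast / recfs_contrast)


# Hypothetical thresholds from two setups that use different absolute contrasts
# but the same proportionate difference between bCFS and reCFS thresholds.
setup_a = suppression_depth_db(bcfs_contrast=0.50, recfs_contrast=0.10)
setup_b = suppression_depth_db(bcfs_contrast=0.05, recfs_contrast=0.01)

print(f"setup A: {setup_a:.2f} dB, setup B: {setup_b:.2f} dB")
```

      Under this convention both setups yield the same depth, because only the ratio of the two thresholds enters the calculation.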

      We also acknowledge that different stimulus categories may engage neural and visual processing associated with different contrast gain values (e.g., magno- vs parvo-mediated processing). But the breaks and returns to suppression of a given stimulus category would be dependent on the same contrast gain function appropriate for that stimulus which thus permits their direct comparison. Indeed, this is why our novel approach offers a promising technique for comparing suppression depth associated with various stimulus categories (a point mentioned above). Viewed in this way, differences in actual durations of break times (such as we report in our paper) may tell us more about differences in gain control within neural mechanisms responsible for processing of those categories.

      Consider that preferential processing could shift both bCFS and reCFS thresholds together

      This is related to the point raised in the previous comment. A stimulus that is preferentially processed (such as a face) could have lower bCFS and reCFS thresholds than other stimuli such that it emerges into awareness at a lower contrast but also remains visible at lower contrasts. We plan to address this interpretation of our data in our revised discussion and highlight that this type of preferential processing could well occur, and yet could still produce the same uniform suppression depth.

      Can the effect of contrast ramp be explained by slower RTs?

      A 500 ms reaction time estimate would not account for the magnitude of the changes observed in Experiment 3. Suppression depths in our slow, medium, and fast contrast ramps were 9.64 dB, 14.64 dB and 18.97 dB, respectively (produced by step sizes of .035, .07 and .105 dB per video frame at 60 fps). At each rate, assuming a 500 ms reaction time for both thresholds (1 second total) would capture a change of 2.1 dB, 4.2 dB, 6.3 dB. This difference cannot account for the size of the effects observed between our different ramp speeds.
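      The arithmetic above can be checked with a short sketch. The step sizes, frame rate, reaction time, and suppression depths are the values quoted in this response; the variable and function names are illustrative only.

```python
FRAME_RATE_HZ = 60      # video frames per second
TOTAL_RT_S = 1.0        # 500 ms assumed at each of the two thresholds

# Values quoted in the response, keyed by ramp speed.
STEP_DB_PER_FRAME = {"slow": 0.035, "medium": 0.07, "fast": 0.105}
OBSERVED_DEPTH_DB = {"slow": 9.64, "medium": 14.64, "fast": 18.97}


def rt_contribution_db(step_db_per_frame, total_rt_s=TOTAL_RT_S, fps=FRAME_RATE_HZ):
    """Contrast change (dB) that accrues while the observer is still reacting."""
    return step_db_per_frame * fps * total_rt_s


for speed, step in STEP_DB_PER_FRAME.items():
    rt_db = rt_contribution_db(step)
    remainder = OBSERVED_DEPTH_DB[speed] - rt_db
    print(f"{speed}: RT accounts for {rt_db:.1f} dB, remainder {remainder:.2f} dB")
```

      Subtracting the reaction-time contribution from each observed depth leaves remainders that still differ across ramp speeds.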

      Non-zero switch rate probability affecting ramping

      We agree that for a given ramp speed there is a variable probability of a switch in perceptual state for both bCFS and reCFS portions of the trial. To put it in other words, for a given ramp speed and a given observer the distribution of durations at which transitions occur will exhibit variance. We see that variance in our data (just as it’s present in conventional binocular rivalry duration histograms), as a non-zero probability of switches at very short durations (for example). One might surmise that slower ramp speeds would afford more opportunity for stochastic transitions to occur and that the measured suppression depths for slow ramps are underestimates of the suppression depth produced by contrast adaptation. Yet by the same token, the same underestimation would occur during fast ramp speeds, indicating that that difference may be even larger than we reported. In our revision we will spell this out in more detail, and indicate that a non-zero probability of switches at any time may lead to an underestimation of all recorded suppression depths.

      In our data, we believe the contribution of these stochastic switches is minimal. Our current Supplementary Figure 1(d) indicates that there is a non-zero probability of responses early in each ramp (e.g. durations < 2 seconds), yet these are a small proportion of all percept durations. This small proportion is clear in the empirical cumulative distribution function of percept durations, which we include in Author response image 1, and will address in our detailed response. Notably, during slow-ramp conditions, average percept durations actually increased, implying a resistance to any effect of early stochastic switching. We plan to expand on our analysis of these reaction-time differences in our revised manuscript.

      Author response image 1.

      The specificity of the DHO fit

      In our revised manuscript we will increase the justification for this model, and plan to include a comparison of model fits over time (as opposed to response number in the current manuscript).

      References

      Ekstrom, L. B., Roelfsema, P. R., Arsenault, J. T., Kolster, H., & Vanduffel, W. (2009). Modulation of the contrast response function by electrical microstimulation of the macaque frontal eye field. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 29(34), 10683–10694.

      Jiang, Y., Costello, P., & He, S. (2007). Processing of invisible stimuli: advantage of upright faces and recognizable words in overcoming interocular suppression. Psychological Science, 18(4), 349–355.

      Legge, G. E. (1981). A power law for contrast discrimination. Vision Research, 21(4), 457–467.

      Watson, T. L., Pearson, J., & Clifford, C. W. G. (2004). Perceptual grouping of biological motion promotes binocular rivalry. Current Biology: CB, 14(18), 1670–1674.

      Zhou, L.-F., Wang, K., He, L., & Meng, M. (2021). Twofold advantages of face processing with or without visual awareness. Journal of Experimental Psychology. Human Perception and Performance, 47(6), 784–794.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Comment. “The manuscript demonstrates that FGF4, FGF8, and FGF9 exhibit distinct binding modes towards FGFRs”

      No, this paper is not about ligand binding, and there are NO binding data in the manuscript. This paper is about ligand-dependent functional bias. Previously, differential effects of ligands on the signaling of one FGFR have been attributed to differences in ligand binding, but that paradigm is incomplete, if not incorrect. This manuscript is the first demonstration that three FGF ligands induce bias in FGFR1 signaling. FGF8 preferentially activates some of the probed downstream responses (FRS2 phosphorylation and extracellular matrix loss), while FGF4 and FGF9 preferentially activate different probed responses (FGFR1 phosphorylation and growth arrest). The bias we report here cannot be the result of differences in ligand binding. Indeed, if the differences between ligands are only in the binding strength, then a strongly binding ligand at low concentration will act identically to a weakly binding ligand at high concentration. Our article thus changes the current paradigm about how FGF ligands activate FGFR signaling.
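      The concentration argument can be illustrated with a deliberately simplified sketch. It assumes a one-site equilibrium binding model, which is not the model used in the manuscript and ignores receptor dimerization; the dissociation constants and concentrations are hypothetical.

```python
def fractional_occupancy(ligand_conc, kd):
    """Fraction of receptor bound under a simple one-site equilibrium model.

    This is an illustrative assumption, not the authors' model.
    """
    return ligand_conc / (ligand_conc + kd)


# Hypothetical ligands (arbitrary but matching concentration units):
# a strong binder at low concentration and a weak binder at high concentration.
strong_binder = fractional_occupancy(ligand_conc=1.0, kd=1.0)
weak_binder = fractional_occupancy(ligand_conc=100.0, kd=100.0)

print(strong_binder, weak_binder)
```

      In this simplified picture the two ligands occupy the same fraction of receptors whenever the ratio of concentration to dissociation constant matches, so a difference in binding strength alone would shift the dose at which responses occur rather than favor one response over another.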

      Comment. It is also proposed that FGF8 exhibits "biased ligand" characteristics.

      We do not “propose” the existence of ligand bias, we demonstrate it in the manuscript by following the latest IUPHAR community guidelines on bias identification and quantification (Kolb et al, 2022). We calculate bias coefficients, and we analyze the results using statistical tools.

      Comment. …“Unproven and speculative structural differences in the FGF-FGFR1 dimers”.

      This statement is not correct, as it is directly contradicted by the differences reported in Figure 6. This Figure presents the results of a quantitative FRET assay performed at high ligand concentration, which ensures that there are no monomeric receptors. Under these conditions, the measured FRET efficiency depends only on the dimer conformation. The measured differences in FRET efficiencies reveal distinct differences in the FGFR1 TM domain dimer conformations when FGF8 is bound to the extracellular domain of FGFR1, as compared to FGF4 and FGF9. The difference can be observed in the raw FRET data in Figure 6A. While these data do not reveal the exact molecular origin of the structural differences, they unequivocally prove that there are structural differences when different ligands are bound.

      References

      Kolb P, Kenakin T, Alexander SPH, Bermudez M, et al. Community guidelines for GPCR ligand bias: IUPHAR review 32. Br J Pharmacol. 2022;179, 3651-3674.


      The following is the authors’ response to the previous reviews.

      eLife assessment. This manuscript describes useful data on the mechanisms underlying the activation of the receptor tyrosine kinase FGFR1 and stimulation of intracellular signaling pathways in response to FGF4, FGF8, or FGF9 binding to the extracellular domain of FGFR1. Solid quantitative binding experiments are presented to demonstrate that FGF4, FGF8, and FGF9 exhibit distinct binding affinities towards FGFRs.

      No, this paper is not about binding, and there is NO binding data in the manuscript. This paper is about function. This is the first demonstration that three FGF ligands induce bias in FGFR1 signaling. Thus far, differential effects in the signaling of one FGFR have been attributed to differences in ligand binding, but this current paradigm is incomplete, if not incorrect. Our article changes the current paradigm of how FGF ligands activate downstream FGFR signaling.

      We have clarified this point by adding the following text in the Discussion.

      “Thus far, differential effects in the signaling of one FGFR in response to different FGF ligands have been attributed to differences in ligand binding. It can be reasoned, however, that differences in ligand binding strengths, alone, cannot explain differential signaling. Indeed, if the differences between ligands are only in the binding strength, then a strongly binding ligand at low concentration will act identically to weakly binding ligand at high concentration. Here we discovered, using tools that are novel for the RTK field, that there are qualitative differences in the actions of the ligands. FGF8 preferentially activates some of the probed downstream responses (FRS2 phosphorylation and collagen loss), while FGF4 and FGF9 preferentially activate different probed responses (FGFR1 phosphorylation and growth arrest). These effects occur in addition to previously measured differences in ligand binding coefficients (87).”

      We have also re-written the abstract.

      “Abstract

      “The mechanism of differential signaling of multiple FGF ligands through a single FGF receptor is poorly understood. Here, we use biophysical tools to quantify multiple aspects of FGFR1 signaling in response to FGF4, FGF8 and FGF9: potency, efficacy, bias, ligand-induced oligomerization and downregulation, and conformation of the active FGFR1 dimers. We find that the three ligands exhibit distinctly different potencies and efficacies for inducing responses in cells. We further discover qualitative differences in the actions of the three FGFs through FGFR1, as FGF8 preferentially activates some of the probed downstream responses (FRS2 phosphorylation and extracellular matrix loss), while FGF4 and FGF9 preferentially activate different probed responses (FGFR1 phosphorylation and cell growth arrest). Thus, FGF8 is a biased ligand, when compared to FGF4 and FGF9. Förster resonance energy transfer experiments reveal a correlation between biased signaling and the conformation of the FGFR1 transmembrane domain dimer. Our findings expand the mechanistic understanding of FGF signaling during development and bring the poorly understood concept of receptor tyrosine kinase ligand bias into the spotlight.”

      Reviewer #1 (Public Review):

      Comment. Quantitative binding experiments presented in the manuscript demonstrate that FGF4, FGF8, and FGF9 exhibit distinct binding affinities towards FGFRs.

      This paper is not about binding, and there is NO binding data in the manuscript. This paper is about function. Please see our response to the eLife assessment.

      Comment. It is also proposed that FGF8 exhibits "biased ligand" characteristics that is manifested via binding and activation FGFR1 mediated by "structural differences in the FGF-FGFR1 dimers, which impact the interactions of the FGFR1 transmembrane helices, leading to differential recruitment and activation of the downstream signaling adapter FRS2".

      We do not “propose” the existence of ligand bias, we demonstrate it in the manuscript by following the latest IUPHAR community guidelines on bias identification and quantification (Kolb et al, 2022). Specifically, we construct bias plots, we calculate bias coefficients, and we analyze the results using statistical tools.

      Also, please note that ligand bias has no direct connection to binding strength, so the statement that biased ligand characteristics “is manifested via binding” is not correct.

      Comment. In the absence of any structural experimental data of different forms of FGFR dimers stimulated by FGF ligands the model presented in the manuscript is speculative and misleading.

      Figure 6 presents the “structural experimental data”. A quantitative FRET assay is performed at high ligand concentration, which ensures that there are no monomeric receptors. Under these conditions, the measured FRET efficiency depends only on the dimer conformation. The measured FRET efficiencies reveal distinct differences in the FGFR1 TM domain dimer conformations when the ligand FGF8 is bound to the extracellular domain of FGFR1, as compared to the cases of FGF4 and FGF9.

      Because the Rosetta modeling of the kinase domains in the previous version of the paper is not based on experimental data, we have removed the modeling from the Results, and we have removed all references to it in the Discussion. Thus, all that is shown and discussed in the revised paper is based on experimental data.

      We have substituted two paragraphs in the discussion with the following two sentences:

      “The experimental data in Figure 6 hint at the possibility that ligand bias arises due to differences in FGFR1 dimer conformations. If this is so, then conformational differences in the signaling complex in the plasma membrane underlie biased signaling for both RTKs and GPCRs, the two largest receptor families in the human genome”.

      References

      Kolb P, Kenakin T, Alexander SPH, Bermudez M, et al. Community guidelines for GPCR ligand bias: IUPHAR review 32. Br J Pharmacol. 2022;179, 3651-3674.

    1. Author Response:

      Reviewer #1:

      Summary:

      This research study utilizes a realistic motoneuron model to explore the potential to trace back the appropriate levels of excitation, inhibition, and neuromodulation in the firing patterns of motoneurons observed in in-vitro and in-vivo experiments in mammals. The research employs high-performance computing power to achieve its objectives. The work introduces a new framework that enhances understanding of the neural inputs to motoneuron pools, thereby opening up new avenues for hypothesis testing research.

      Strengths: The significance of the study holds relevance for all neuroscientists. Motoneurons are a unique class of neurons with known distribution of outputs for a wide range of voluntary and involuntary motor commands, and their physiological function is precisely understood. More importantly, they can be recorded in-vivo using minimally invasive methods, and they are directly impacted by many neurodegenerative diseases at the spinal cord level. The computational framework developed in this research offers the potential to reverse engineer the synaptic input distribution when assessing motor unit activity in humans, which holds particular importance. Overall, the strength of the findings focuses on providing a novel framework for studying and understanding the inputs that govern motoneuron behavior, with broad applications in neuroscience and potential implications for understanding neurodegenerative diseases. It highlights the significance of the study for various research domains, making it valuable to the scientific community.

      Weaknesses: The exact levels of inhibition, excitation, and neuromodulatory inputs to neural networks are unknown. Therefore the work is based on fine-tuned measures that are indirectly based on experimental results. However, obtaining such physiological information is challenging and currently impossible. From a computational perspective it is a challenge that in theory can be solved. Thus, although we have no ground-truth evidence, this framework can provide compelling evidence for all hypothesis testing research and potentially solve this physiological problem with the use of computers.

      We agree with the reviewer. This work was intended to determine the feasibility of reverse engineering motor unit firing patterns, using neuron models with a high degree of realism. Given that the results support this feasibility, our model and technique can serve both to construct new hypotheses and to test them.

      Reviewer #2:

      The study presents an extensive computational approach to identify the motor neuron input from the characteristics of single motor neuron discharge patterns during a ramp up/down contraction. This reverse engineering approach is relevant due to limitations in our ability to estimate this input experimentally. Using well-established models of single motor neurons, a (very) large number of simulations were performed that allowed identification of this relation. In this way, the results enable researchers to measure motor neuron behavior and from those results determine the underlying neural input scheme. Overall, the results are very convincing and represent an important step forward in understanding the neural strategies for controlling movement.

      Nevertheless, I would suggest that the authors consider the following recommendations to strengthen the message further. First, I believe that the relation between individual motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties can be illustrated more clearly. Although this is explained in the text, I believe that this is not optimally supported by figures. Figure 6 to some extent shows this, but Figures 8 and 9 as well as Table 1 show primarily the goodness of fit rather than the actual fit.

      We agree with the reviewer that showing the relationship between the motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties would be a great addition to the manuscript. Because the regression models have multiple dimensions (7 inputs and 3 outputs) it is difficult to show the relationship in a static image. We thought it best to show the goodness of fit even though it is more abstract and less intuitive. We added a supplemental diagram to Figure 8 to show the structure of the reverse engineered model that was fit (see Figure 8D).

      Author response image 1: Figure 8. Residual plots showing the goodness of fit of the different predicted values: (A) Inhibition, (B) Neuromodulation and (C) excitatory Weight Ratio. The summary plots are for the models showing the highest R² results in Table 1. The predicted values are calculated using the features extracted from the firing rates (see Figure 7, section Machine learning inference of motor pool characteristics and Regression using motoneuron outputs to predict input organization). Diagram (D) shows the multidimensionality of the RE models (see Model fits) which have 7 feature inputs (see Feature Extraction) predicting 3 outputs (Inhibition, Neuromodulation and Weight Ratio).
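
      As a rough illustration of what residual plots and R² values summarize, the sketch below computes residuals and the coefficient of determination for a single predicted output. The numbers are made-up placeholders, not results from the study, and the helper names are hypothetical.

```python
# Goodness-of-fit sketch for one predicted output (e.g. inhibition).
# The values below are placeholders, not data from the study.

def residuals(actual, predicted):
    """Per-sample difference between the true and the predicted value."""
    return [a - p for a, p in zip(actual, predicted)]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual_inhibition = [0.0, 0.25, 0.5, 0.75, 1.0]
predicted_inhibition = [0.05, 0.2, 0.55, 0.7, 1.0]

print(residuals(actual_inhibition, predicted_inhibition))
print(r_squared(actual_inhibition, predicted_inhibition))
```

      In a setting with three predicted outputs, the same calculation would be repeated once per output, which is one reason a goodness-of-fit summary is easier to display than the fitted multidimensional relationship itself.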

      Second, I would have expected the discussion to have addressed specifically the question of which of the two primary schemes (push-pull, balanced) is the most prevalent. This is the main research question of the study, but it is to some degree left unanswered. Now that the authors have identified the relation between the characteristics of motor neuron behaviors (which has been reported in many previous studies), why not exploit this finding by summarizing the results of previous studies (at least a few representative ones) and discuss the most likely underlying input scheme? Is there a consistent trend towards one of the schemes, or are both strategies commonly used?

      We agree with the reviewer that our discussion should have addressed which of the two primary schemes – push-pull or balanced – is the most prevalent. At first glance, the upper right of Figure 6 looks the most realistic when compared to real data. We would thus expect the push-pull scheme to dominate for the given task. We added a brief section (Push-Pull vs Balance Motor Command) in the discussion to address the reviewer’s comments. This section is not exhaustive but frames the debate using relevant literature. We are also now preparing to deploy these techniques on real data.

      In addition, it seems striking to me that highly non-linear excitation profiles are necessary to obtain a linear CST ramp in many model configurations. Although somewhat speculative, one may expect that an approximately linear relation is desired for robust and intuitive motor control. It seems to me that humans generally have a good ability to accurately grade the magnitude of the motor output, which implies that either a non-linear relation has been learnt (complex task), or that the central nervous system can generally rely on a somewhat linear relation between the neural drive to the muscle and the output (simpler task).

      We agree with the reviewer, and we were surprised by these results. Our motoneuron pool is equipped with persistent inward currents (PICs) which are nonlinear. Therefore, for the motoneuron to produce a linear output the central nervous system would have to incorporate these nonlinearities into its commands.

      Following this reasoning, it could be interesting to report also for which input scheme, the excitation profile is most linear. I understand that this is not the primary aim of the study, but it may be an interesting way to elaborate on the finding that in many cases non-linear excitation profiles were needed to produce the linear ramp.

      This is a very interesting point. The most realistic firing patterns – with respect to human data – are found in the parameter regions in the upper right in Figure 6, which in fact produce the most nonlinear input (see push-pull pattern in Figure 4C). However, in future studies we hope to separate the total motor command illustrated here into descending and feedback commands. This may result in a more linear descending drive.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank all the reviewers for their comments and constructive feedback regarding our manuscript. We have made many changes to strengthen the manuscript, including addition of two new experiments (presented in Fig. S1) that help to clarify the nature and scope of activation of late response genes in striatal neurons. Our specific responses to individual reviewer comments are provided below.

      Reviewer #1

      Public review

      Weaknesses: The timing and the location of the accessibility changes are meaningfully different from other similar studies, which should be discussed. The authors provide good data for the function of a single enhancer near Pdyn, but could contextualize this with respect to other regulatory elements nearby.

      In the revised manuscript, we have expanded our discussion of the differences between chromatin accessibility changes observed in this study and those found in prior reports in different systems. These differences are also addressed in extended detail below. Unfortunately, limitations on resources and time prevented a deeper exploration of additional candidate enhancers near the Pdyn locus. However, we believe our efforts to characterize an activity-dependent enhancer in the Pdyn locus provides a useful starting point, and future studies may seek to more completely define the contributions of nearby regulatory elements.

      Recommendations For The Authors

      1) At 1hr after stimulation in previous papers (Su et al. 2017, which is reference #8 of Fernandez-Albert et al., Nat Neurosci. 2019 October; 22(10): 1718-1730) there are large increases in accessibility directly over the IEGs, consistent with the concerted transcription of these genes following stimulation. It is surprising that the authors do not see this here, either at 1hr or at 4hr. This difference in results needs to be addressed.

      We thank the reviewer for bringing this discrepancy to our attention. Indeed, Su et al. 2017 and Fernandez-Albert et al. 2019 both describe increases in chromatin accessibility at IEG promoters. There are several experimental differences that could be contributing to differences between our study and previously published studies. Two major reasons include the developmental timepoint of the tissue/cells and the cell type/brain region that is being assayed. Su et al. assayed chromatin accessibility in ex vivo slices containing the dentate gyrus from adult mice, while Fernandez-Albert et al. assayed chromatin accessibility in forebrain principal neurons of adult mice following kainic acid injection. Bulk ATAC-Seq experiments described in the present manuscript were generated from cultured embryonic rat striatal neurons. Additionally, baseline chromatin accessibility seems to be significantly different between forebrain principal neurons studied in Fernandez-Albert et al. 2019 and the current study. For example, in Figure 3a of Fernandez-Albert et al. 2019, the Npas4 gene body is not accessible in a saline treated animal. In vehicle treated, cultured embryonic rat striatal neurons, the Fos gene body and associated enhancers are accessible at baseline (Fig. S3), and do not increase with KCl depolarization.

      We have expanded our discussion of this discrepancy in the discussion section of the revised manuscript, and included additional citations addressing this difference.

      2) It is also somewhat surprising that the authors see almost no regions that show changes in accessibility at 1hr and then a very large number of differentially accessible regions at 4hr. This is quite different from the more rapid changes shown for example in Figure 7f in the human GABA neurons even though these are also studies in culture with rapid calcium channel opening. Can the authors speculate on the reason for the difference?

      Many previously published studies that use cultured neurons include a pre-treatment in which spontaneous neuronal activity is inhibited with the sodium channel blocker tetrodotoxin (Sanchez-Priego et al. Cell Reports, 2022; Kim et al. Nature, 2010; Malik et al. Nature Neuroscience, 2014). The Sanchez-Priego et al. Cell Reports manuscript also blocked NMDA receptor activity with the competitive NMDAR antagonist D-AP5 for 12 hours prior to depolarization. Rapid changes in chromatin accessibility observed in other studies at <1 hour timepoints could be due to prior silencing of the cells and subsequent reduction in the accessibility and transcriptional activity of IEGs. Decreased baseline accessibility and transcriptional activity of IEGs can be observed in Figure 1a of Malik et al. 2014, which displays ChIP-Seq tracks for both RNA pol II and H3K27ac. At baseline, H3K27ac and RNA pol II enrichment is low throughout the Fos locus. Subsequent depolarization of silenced neurons drives accessibility and transcription of the Fos gene and associated enhancers. In contrast, we found accessible chromatin at Fos enhancer elements at baseline (without stimulation; Fig. S3).

      The experiments described in the current study do not include any pre-treatment with tetrodotoxin or D-AP5, and thus the neurons are expected to be spontaneously active. This baseline electrophysiological activity may result in increased accessibility and transcription at IEG loci, which ultimately makes it difficult to identify activity-dependent increases in IEG accessibility at timepoints <1 hour. Furthermore, a previously published manuscript from our lab (Carullo et al. Nucleic Acids Research, 2020) conducted ATAC-seq on cultured embryonic rat cortical, hippocampal, and striatal neurons and found that transcribed enhancers for IEG loci (including Fos) had decreased chromatin accessibility following depolarization when compared to vehicle treatment. These differences in experimental design (including cell type, model organism, developmental timepoint, and treatment paradigm) may all contribute to differences in the temporal dynamics of chromatin remodeling between the current manuscript and previously published studies.

      3) Experimentally it can be challenging to repress a single enhancer and show a significant effect on gene regulation which makes the repression in Fig 6c somewhat unexpected. There are several regions near Pdyn that show activity-dependent changes in accessibility in the human cells (Fig. 7e) and presumably in the rat neurons too (Fig. 5a shows a few but most of the intervening region is cut out). Did the authors target any of these other regions?

      We chose the identified regulatory element upstream of the Pdyn TSS because it met several criteria that we determined are important for characterizing LRG enhancers. These criteria are outlined in the Results: “1) located in non-coding regions of the genome, 2) inaccessible at baseline and accessible following depolarization, and 3) inaccessible when depolarization was paired with protein synthesis inhibition.” Indeed, ATAC-seq experiments presented in the current study demonstrate that thousands of genomic regions undergo reprogramming, and many of these regions meet these criteria (including additional loci near Pdyn). However, we lacked the time and resources to systematically investigate all other enhancers, and did not target any other regions within the Pdyn locus. While many enhancers may regulate a single gene, the identified enhancer seems to be particularly important for activity-dependent Pdyn gene expression. Importantly, CRISPRi-based repression of this enhancer (Fig. 6c) did not reduce basal Pdyn expression as compared to a non-targeting control, but completely blocked stimulus-dependent induction of Pdyn transcription. We believe this is a useful starting point, and future studies may seek to more completely define the contributions of nearby regulatory elements.

      4) The authors should clarify in the methods or figure legends the number of independent replicate libraries for each experiment and were the RNA and ATAC libraries made from the same or different experiments.

      We thank the reviewer for bringing this to our attention. We have clarified the number of replicates in the methods as outlined below. Additionally, RNA and ATAC libraries were generated from different experiments, and this information is also now included in the methods.

      Within the ATAC-Seq library preparation and analysis methods section: “ATAC-seq libraries were generated from experiments independent of the RNA-seq experiments. For the ATAC-seq experiment of neurons treated with vehicle or KCl for 1 h, there were 3 replicates within each treatment group (3 Veh, 3 KCl). For the ATAC-seq experiment of neurons treated with vehicle or KCl for 4 h, there were 3 replicates within each treatment group (3 Veh, 3 KCl). For the ATAC-seq experiment of neurons pre-treated with DMSO or Anisomycin, there were 4 replicates within each treatment group (4 DMSO + Veh, 4 DMSO + KCl, 4 Anisomycin + KCl).”

      Within the RNA-seq library preparation and analysis methods section: “RNA-seq libraries were generated from experiments independent of the ATAC-seq experiments. For the RNA-seq experiment of neurons treated with vehicle or KCl for 1 h, there were 3 replicates within the KCl group and 4 replicates within the vehicle group. For the RNA-seq experiment of neurons treated with vehicle or KCl for 4 h, there were 4 replicates within each group (4 Veh, 4 KCl).”

      Reviewer #2

      Public review

      First of all, at a conceptual level, most of the findings related to the induction of particular transcriptional programs upon neuronal activation the changes in chromatin state, and the need for protein translation for proper induction of LRGs have been broadly characterized previously in the literature (Tyssowski et al., Neuron, 2018; Ibarra et al., Mol. Syst. Biol., 2022; and also reviewed by Yap and Greenberg, Neuron, 2018). In addition, it is not so obvious why to focus on Pdyn gene regulatory regions among the thousands of genes upregulated and with modified chromatin landscape after neuronal activation. The authors highlight three particular traits of this gene as the reason to choose it, but those traits are probably shared by most of the genes that are part of the LRGs set.

      We thank the reviewer for these comments, and have included these important publications as citations in our manuscript. With over 5,000 differentially accessible chromatin regions following KCl stimulation, it was not possible to follow up on all regulatory regions or linked genes in a rigorous way. Therefore, we selected a target candidate enhancer near the Pdyn locus for mechanistic studies. In addition to the criteria highlighted in the manuscript, we chose this locus due to decades of literature establishing the importance of prodynorphin in the striatum, and the role of this gene in human neuropsychiatric diseases. We would argue that this increases the relevance of more detailed exploration of this gene, and makes our results applicable to a broader pre-existing literature.

      At the methodological level, some attention should be put into the timings chosen for generating the data. The authors claim that these time points (1h and 4hrs) identify the first (i.e IEGs) and second (i.e LRGs) waves of transcription. However, at 4hrs the highest over-expressed genes are still IEGs, as shown in the volcano plots of Figure 1B and 1C, showing a high overlap with up-regulated genes found at 1h (Figure 1D). This might suggest that the 4hrs time point is somewhere in between the first and second wave of transcription, probably missing some of the still-to-be-induced LRGs of the latest one.

      Given that the depolarization applied in RNA-seq and ATAC-seq experiments is continuous, it was not unexpected to find IEGs present at both 1 h and 4 h timepoints. The revised manuscript contains a new experiment (Fig. S1d-f) demonstrating that a shorter depolarization period (1 h KCl followed by a 3 h wash off period) also induces Fos mRNA, but to a much lower extent than 4 h continuous stimulation. In contrast, both short (1 h) and long (4 h) depolarization periods induce Pdyn to equivalent levels when measured at 4 h after the onset of the stimulus. These experiments support our conclusion that LRGs require a temporal delay, and not simply extended stimulation. Nevertheless, the reviewer is correct that a 4 h timepoint may potentially miss some LRGs that are induced even later. We plan to explore the full timecourse of LRG induction in future studies.

      Finally, while only proposed as a suggestion, the assumption that from the data generated in this article, we can envision a mechanism by which AP-1 family of transcription factors interacts with the SWI/SNF chromatin remodeling complex is going too far, as no evidence is provided implicating SWI/SNF in the data presented in the manuscript.

      While speculative in the current context, we felt that it was important to highlight this prior literature to identify potential mechanisms that may link IEGs (specifically, AP-1 members) to chromatin remodeling machinery. We have altered this section of the discussion to emphasize that this link is speculative in the context of neuronal chromatin remodeling.

      Recommendations For The Authors

      1) I couldn't find the number of replicates generated for each dataset, neither for RNA nor for ATAC-seq. It could be worth adding these data to the figure legends or in the material and methods.

      We thank the reviewer for bringing this to our attention. The number of replicates generated for each dataset are now included in the methods section (see response to Reviewer #1, comment #4 above).

      2) In Figure 1D, Gene Ontology terms appear significant only for each of the individual datasets. While this might be expected for the 1h time-point, the 4hrs time-point comprises a big extent of the genes up-regulated at 1h as well, and it is surprising no term related to chromatin or transcription regulation appears as significant. Is this due to the fact that the analysis has been conducted with two separated lists of genes and only the top terms are shown without crossing the data? This could be misleading for the reader and maybe a comparative GO term analysis might be better such as using clusterProfiler or similar tools, that might allow for real comparison of terms enriched in each dataset.

      We thank the reviewer for pointing this out. For Figure 1d, GO term analysis was conducted with two separate gene lists, each consisting of timepoint-specific upregulated DEGs. Thus, 772 genes were included for the analysis of 4 h GO terms and 39 genes were included for the analysis of 1 h GO terms. Previously, comparisons of cellular component GO terms included in the current study only included the top 10 GO terms. The revised manuscript contains an updated analysis that compares all enriched GO terms and identifies that three of the top 10 cellular component GO terms for the 1 h gene set are also identified as significantly enriched in the 4 h gene set. We have revised the graph in Fig. 1f to reflect this updated analysis. Overall, our conclusion (that 1 h and 4 h DEG sets fall into distinct functional categories) remains supported by this analysis.

      3) In Figure 3D, the graphs show the density of motifs within the DARs in units of "Motifs/Kb/peak" while the x-axis represents the peaks coordinates from -500bp to +500bp. It is not clear to me how this graph is generated and how within 1000bp the profiles can reach values of 18-20 Motifs/Kb/peak. Could this be clarified?

      The motif enrichment score was calculated by identifying the number of total motifs within defined 50bp genomic bins surrounding the center of the DAR regions. HOMER builds enrichment histograms that normalize motif presence to set size (e.g., number of peaks or DARs), and also to genomic space (base pairs). While HOMER’s default histogram represents motifs/bp/peak, we converted this to motifs/kb/peak for ease of interpretation. However, to avoid confusion we have returned the y axis labels to the default HOMER settings (motifs/bp/peak). The normalization and units for this graph have been clarified in the methods section.
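
      The normalization described above can be sketched as follows. This is not HOMER's code; it only mirrors the described arithmetic (motif counts per 50 bp bin, divided by bin width and by the number of regions), and the motif offsets, region count, and function names are hypothetical.

```python
# Sketch of a motif-density histogram around region centers.
# Offsets are motif positions in bp relative to the center of each region.

BIN_SIZE_BP = 50
WINDOW_BP = 500


def motif_density(motif_offsets, n_regions, bin_size=BIN_SIZE_BP, window=WINDOW_BP):
    """Return {bin_start_bp: motifs per bp per region} over [-window, +window)."""
    n_bins = (2 * window) // bin_size
    counts = [0] * n_bins
    for offset in motif_offsets:
        if -window <= offset < window:
            counts[(offset + window) // bin_size] += 1
    return {
        -window + i * bin_size: count / (bin_size * n_regions)
        for i, count in enumerate(counts)
    }


def per_bp_to_per_kb(density):
    """Rescale motifs/bp/region to motifs/kb/region."""
    return {bin_start: value * 1000 for bin_start, value in density.items()}


# Hypothetical example: five motif hits pooled from two regions.
density = motif_density([-10, 5, 20, 30, 260], n_regions=2)
print(density[0])                    # bin covering offsets 0 to 49
print(per_bp_to_per_kb(density)[0])
```

      Under this normalization, a bin in which every region carries exactly one motif gives 1/50 = 0.02 motifs/bp/peak, which is 20 motifs/kb/peak. This is how per-kb values in the range the reviewer asked about can arise within a 1,000 bp window.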

      4) In Figure 4C the newly generated ATAC-seq data is just "targeted" analyzed, showing global tendencies are maintained between the initial generated data and this one. It could be interesting, however, to see the number of DARs obtained in these conditions, especially to see if some DARs are observed in the Anisomycin condition that might be translation-independent.

      The experiment described in Figure 4 was designed to both validate the 5,312 DARs and understand the role of protein translation in activity-dependent chromatin remodeling. One way to begin identifying translation-independent DARs is to compare the DMSO + Vehicle group to the Anisomycin + KCl group. With this comparison, any 4 h DAR that has increased accessibility in the Anisomycin + KCl group may be translation-independent as pretreatment with anisomycin did not prevent chromatin remodeling. After conducting this analysis, we identified a very small percentage (3.44%) of 5,312 4 h DARs that still exhibited significantly increased accessibility when pre-treated with Anisomycin. This small number is consistent with the robust effects of anisomycin on KCl-dependent remodeling shown in Fig. 4c-d. However, to confirm that these were in fact translation-independent activity-regulated DARs, we would need to perform direct comparison of chromatin accessibility between neurons pre-treated with Anisomycin and then treated with either vehicle or KCl. Since we did not include an anisomycin-only group in experiments in Fig. 4, we cannot confidently claim whether this 3.4% of DARs are translation-independent. Nevertheless, we agree with the reviewer that this is an interesting avenue of future exploration.

      Reviewer #3

      Public review

      1) Throughout the paper, the authors emphasize a "temporal decoupling" of transcriptional and chromatin response to depolarization, based on a lack of significant chromatin changes at 1h, despite IEG transcription. However, previous publications show significant chromatin remodeling at 1h (e.g. Su et al., NN 2017 in adult dentate gyrus) or 2h (Kim et al., Nature 2010; Malik et al., NN 2014 in cultured embryonic cortical neurons). The discussion briefly mentions this contrast, but it remains difficult to conclude decisively whether there is temporal decoupling when such decoupling is not found consistently. If one is to make broad conclusions about basic neural chromatin response to depolarization, it would be ideal to know under which conditions there is temporal decoupling, or if this is a region-specific phenomenon.

      Indeed, prior studies referred to in our manuscript have identified chromatin remodeling at earlier timepoints than we identified here. As addressed above (Reviewer #1, comments 1 & 2), it is possible that this discrepancy arises due to the difference in experimental model system, differences in the type of stimulation applied, pretreatment protocols used to silence neurons prior to activation, or even differences in developmental stage. Differences in each of these parameters make it difficult to make straightforward comparisons between datasets and results in this manuscript. It is possible that other cell types induce IEGs more quickly (or more robustly) in response to stimulation, which could lead to earlier chromatin remodeling. However, the common patterns of chromatin reorganization (e.g., the fact that changes are enriched at AP-1 motifs and are found in intergenic regions at putative enhancers) lend support for the idea that the transcriptional waves identified here can also be found in other cell types and in other contexts.

      2) The UMAP analysis is a novel way to probe transcription factor enrichment, but it's unclear what this is actually showing. The authors sought to ask whether "DARs could be separated based on transcription factor motifs in these regions." However, the motifs present in any genomic stretch are fixed based on genomic sequence, so it seems like this analysis might be asking whether certain motifs are more likely to be physically clustered together in the genome, in activity-regulated regions (rather than certain transcription factors acting in concert, as is implied in discussion). While still potentially interesting, this analysis does not seem to give much additional insight into activity-dependent chromatin remodeling beyond the motif enrichment analysis already performed. Nevertheless, to draw stronger conclusions, it would be necessary to compare clustering to a random set of genomic regions of the same length/size to interpret the clustering here. It would also be useful to know whether the ISL1 motif is also enriched in ubiquitously accessible genomic regions in the striatum (and not just DARs).

      We agree that additional analysis is needed to explore enrichment of various transcription factor motifs and activity at differentially accessible regions of the genome. The motif enrichment analysis in Figure 3 demonstrated the types of motifs that were enriched in DARs (Fig. 3a-c), the overall degree of enrichment (Fig. 3c), and the distribution of those motifs across DAR sites (Fig. 3d). This analysis allowed us to understand whether motifs for cell-defining transcription factors like ISL1 are enriched uniquely in DARs, or are also found in other regions that are accessible at baseline (see direct comparisons between vehicle/baseline peaks and DARs in Fig. 3d). However, these approaches represent enrichment across all DARs as a group, and do not show TF motif presence/absence at any specific DAR. The UMAP analysis presented in Figure 3e allowed identification of DAR clusters based on the presence or absence of specific transcription factor motifs, and allowed us to represent specific DARs in a reduced two-dimensional space. Because this analysis identifies the existence of distinct motifs within single DARs, it allowed us to speculate as to the possibility of transcription factor cooperation within DARs, or the meaning of DAR clusters that appear to be defined by specific motifs (e.g., KLF10 in Fig. 3f). Given the information that this adds to the initial analyses, we argue that its inclusion in the manuscript is useful and potentially informative for generating follow-up hypotheses.
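      To make the distinction between group-level enrichment and per-DAR motif content concrete, the sketch below builds the kind of binary presence/absence matrix that a dimensionality-reduction method such as UMAP could take as input, and compares individual DARs by Jaccard distance. The DAR identifiers, motif assignments, and helper names are hypothetical, and the exact features and distance metric used for Figure 3e are not specified here, so this is only an assumed illustration.

```python
def motif_presence_matrix(dar_motifs, motif_order):
    """Build one 0/1 row per DAR, one column per motif.

    dar_motifs: {dar_id: set of motif names found in that DAR}
    motif_order: list fixing the column order of the matrix
    """
    ids = sorted(dar_motifs)
    rows = [[1 if m in dar_motifs[d] else 0 for m in motif_order] for d in ids]
    return ids, rows

def jaccard_distance(row_a, row_b):
    """1 - (shared motifs / motifs present in either DAR)."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either

# Hypothetical DARs; motif names are taken from the text, assignments are invented.
dars = {
    "DAR_1": {"AP-1", "KLF10"},
    "DAR_2": {"AP-1"},
    "DAR_3": {"ISL1"},
}
ids, matrix = motif_presence_matrix(dars, ["AP-1", "ISL1", "KLF10"])
d12 = jaccard_distance(matrix[0], matrix[1])  # DARs sharing AP-1
d13 = jaccard_distance(matrix[0], matrix[2])  # DARs sharing no motif
```

      DARs with overlapping motif content end up closer together than DARs with none in common, which is the property a two-dimensional embedding would then try to preserve.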

      3) The authors identify late-response gene enhancers by 3 criteria. However, only Pdyn was highlighted thereafter. How many putative DARs met these three criteria in striatum? Only Pdyn?

      As illustrated in Figures 2 and 4, nearly all of the DARs in our dataset met these criteria, which included presence in non-coding genomic regions, increase in accessibility following stimulation, and prevention of chromatin accessibility changes by protein synthesis inhibition. We did not mean to indicate that the Pdyn locus was unique in this way. In addition to the criteria highlighted in the manuscript, we chose this locus due to decades of literature establishing the importance of prodynorphin in the striatum, and the role of this gene in human neuropsychiatric diseases. We would argue that this increases the relevance of more detailed exploration of the regulatory mechanisms that control expression of this gene, and makes our results applicable to a broader pre-existing literature. The revised manuscript includes additional experiments that examine Pdyn expression changes in response to different stimuli, which help to justify the focus on this gene from the beginning of the manuscript.

      Recommendations For The Authors

      1) Figure 1 volcano plots show a scatter primarily in the up-regulated portion at both the 1-h and 4-h time points. However, the Venn diagrams show largely similar numbers of up- and downregulated genes at the 4-h time point. Is the clustering of down-regulated genes tighter/more overlapping? If so, semi-translucent volcano dots or some acknowledgment of the visual discrepancy would be useful.

      We thank the reviewer for bringing this to our attention. Down-regulated genes cluster more tightly on the volcano plot because their fold changes are smaller. This visual discrepancy is acknowledged by the numeric indicators of up- and down-regulated genes in the upper left-hand corner of the volcano plot.

      2) Methods for RNA and ATAC seq analysis align to human genome Hg38, rather than rat?

      RNA- and ATAC-Seq analyses from rat neurons were aligned to the mRatBn7.2/Rn7 rat genome. RNA- and ATAC-Seq analyses from human neurons were aligned to the Hg38 human genome. We have updated the methods to make this clear.

      3) The introduction states that different classes of neurons induce distinct LRGs. Please add a citation. Citations are also needed for the last statement WRT consequences of chromatin remodeling near LRGs not being concretely linked to LRG transcription.

      We thank the reviewer for pointing this out. The revised manuscript now includes additional citations supporting each of these statements.

      4) Specify somewhere in Methods that DEGs were compared to vehicle for both 1-h and 4-h (and not 4 vs 1 h).

      We thank the reviewer for bringing this to our attention. We have updated the methods to include: “DEGs were calculated by comparing the KCl and Vehicle treatment groups at each respective timepoint.”

      5) In Figure 2E, why are the enrichments exactly opposite, especially given these are two different types of input (all baseline peaks vs DARs)?

      Odds ratios were calculated by comparing baseline peaks (i.e., ATAC-seq peaks identified in vehicle treated cells) to KCl-induced DARs. This allowed us to identify the enrichment of DARs in specific genomic annotations in comparison to the genomic features that are accessible at baseline, rather than making comparisons to random probe sets or genomic space dedicated to these distinct annotations. This analysis identified that relative to baseline peaks, DARs are significantly depleted in coding regions of the genome and enriched in non-coding regions of the genome. However, given this analysis we agree that it does not make sense to graph both the vehicle (baseline) and DARs on this graph, given that enrichment of each set is determined relative to the other (creating the reciprocal enrichment in this panel). We have updated Fig. 2e to only include points for 4 h DARs.
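      The reciprocal pattern noted by the reviewer follows directly from how an odds ratio between two sets is defined. The short sketch below shows this with a 2x2 table; only the DAR total (5,312) comes from the text, while the remaining counts and the function name are assumptions chosen for illustration.

```python
def odds_ratio(a_in, a_total, b_in, b_total):
    """Odds that a region from set A lies in an annotation, relative to set B."""
    a_out = a_total - a_in
    b_out = b_total - b_in
    return (a_in / a_out) / (b_in / b_out)

# Hypothetical counts of regions overlapping coding sequence.
dar_coding, dar_total = 200, 5312          # 5,312 DARs is from the text; 200 is invented
base_coding, base_total = 20000, 100000    # invented baseline peak counts

or_dar_vs_base = odds_ratio(dar_coding, dar_total, base_coding, base_total)
or_base_vs_dar = odds_ratio(base_coding, base_total, dar_coding, dar_total)

# Swapping the two sets inverts the ratio, so the log odds ratios are
# equal in magnitude and opposite in sign.
product = or_dar_vs_base * or_base_vs_dar
```

      Since each set's enrichment is fully determined by the other's, plotting both adds no information, which is consistent with showing only the 4 h DARs.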

      6) Some references are off. One that I noted was "...chromatin remodeling in the mouse dentate gyrus following 1 h of electricoconvulsive stimulation" should be Su et al 2017 not Malik 2014. For the statement that IEGs are critical regulators of non-neuronal IEGs, the authors may want to add Hrvatin 2017 ref.

      We thank the reviewer for bringing this to our attention. We have revised the manuscript to include the correct citation for this claim, and also to include the Hrvatin et al. reference.

      7) It would be helpful for the authors to write out the whole gene name for Pdyn somewhere.

      We have updated the text to include the gene name for Pdyn, both in the abstract and also in the introduction of the manuscript.

      8) Figure 5f: For ease, please include what is high vs low in the figure caption in addition to the main text.

      We thank the reviewer for bringing this to our attention. We have updated the figure caption and main text to include what is high vs low in Pseudotime estimates in Fig. 5f.

      9) How are the tracks ordered in Fig8c?

      Tracks within Fig. 8c demonstrate snATAC-seq signal at the Pdyn gene locus for transcriptionally distinct cell types within the NAc. The tracks are ordered by cluster size (nuclei number) in the snATAC-seq dataset.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors prepared several Acinetobacter baumannii strains from which an essential protein of known or unknown function can be depleted. They chose to study one of the proteins (AdvA) in more detail. AdvA is a known essential cell division protein that accumulates at cell division sites together with other such proteins. No clear homologs are present in model bacteria such as E.coli, and the precise role(s) of AdvA is still unclear. The authors rename AdvA here as Aeg1. The authors searched for suppressors of lethality caused by AdvA-depletion and recovered an allele of ftsA (E202K) that is capable of doing so. Based on similar superfission alleles previously recovered in other division genes in E.coli, they test several mutant genes and find that certain alleles in ftsB, L and W can also suppress lethality of AdvA-minus cells.

      In addition, the authors perform bacterial two-hybrid assays and protein sublocalization studies of AdvA and of other division proteins, but the results of these studies are either not new (confirming previous work) or not convincing.

      We appreciate the rigor of this reviewer.

      We agree that the essentiality of AdvA/Aeg1 described in our submission is not new, but we believe our work has firmly established its role as a cell division protein. Earlier work by the Geisinger and Isberg labs (1) showed its essentiality and the cell morphology changes upon its depletion (Fig. 3 of ref. 1 at the end of this rebuttal letter). This protein was one of many addressed in their study, and their results only suggest a role in cell division, based on the close phenotypic relationships between AdvA/Aeg1 and genes associated with chromosome replication/segregation and cell division.

      Reviewer #2 (Public Review):

      In this study the authors confirm that one of the genes classified as essential in a Tn-mutagenesis study in A. baumannii is in fact an essential gene. It is also present in other closely related Gram-negative bacteria and the authors designated it Aeg1. Depletion of Aeg1 leads to cell filamentation and it appears that the requirement for Aeg1 can be suppressed by what appear to be activation mutations in various genes. Overall, it appears that Aeg1 is involved in cell division but many of the images suffer from poor quality - it may be due to conversion to PDF. One of the main issues is that depletion of Aeg1 is carried out for such long times (18 hr) (Fig. 2, 4 and 5). Depleting a cell division protein for such long times may have pleiotropic effects on cell physiology. A. baumannii grows quite fast and even with a small inoculum, cells will probably be in stationary phase. If Aeg1 is that essential, cells should be quite filamentous 2-3 hours after Ara removal when they are still in exponential phase. Also, it would be better to see the recovery to small cells if cells are not grown such a long time before Ara is added back. Overall, Aeg1 is potentially interesting, but studies are needed to define its place in the assembly pathway for this to be published. What proteins are at the division site when Aeg1 is depleted and what proteins are required for Aeg1 to localize to the division site? These experiments should be done when cells are depleted of proteins for only 1-2 hours.

      We appreciate these insightful suggestions and have followed them to make necessary modifications in the revised manuscript, including:

      First, we have redone the experiment for Fig. 1C to obtain images of higher resolution.

      Second, we have more carefully examined the kinetics of Aeg1-mCherry depletion upon removal of the inducer arabinose from the medium. We first evaluated Aeg1-mCherry protein levels at 2, 4, and 6 h after withdrawing arabinose and found that at the 2 h and 4 h time points mCherry-Aeg1 was still readily detectable (Fig. S4). Importantly, we found that removal of arabinose for 6 h rendered Aeg1-mCherry undetectable in approximately 90% of the cells. We thus used the 6 h inducer depletion to examine the effects of Aeg1 depletion.

      In experiments aiming to analyze the co-localization of Aeg1 with other core divisome proteins, cultures of strains derived from Δaeg1(PBAD::mCherry-Aeg1) harboring the GFP fusions were induced by ara for 16 h. The saturated bacterial cultures were then diluted into fresh LB broth without ara for 6 h to induce the elongation morphology. IPTG (0.25 mM) and ara (0.25%) were added to induce the expression of fusion proteins for 4 h before samples were processed for microscopic analysis. Our results indicate that Aeg1 colocalized with ZipA, FtsK, FtsL, FtsB, and FtsW (Fig. 4C), which is consistent with results from the protein interaction experiments using the bacterial two-hybrid assay.

      We also determined the impact of Aeg1 depletion on the cellular localization of several core divisome proteins. In cells in which Aeg1 had been depleted (by removing the inducer arabinose), all of the examined core division proteins displayed midcell mistargeting, including ZipA, FtsK, FtsB, FtsL, and FtsN (Fig. 5A).

      Reviewer #1 (Recommendations For The Authors):

      Specific remarks 1) The manuscript title is misleading in that the 'novel cell division protein' studied in this paper has already been identified as such, and studied in some detail, by the Geisinger and Isberg labs (refs 37 and 20).

      We agree with this point. Because the data presented by the Geisinger and Isberg labs (1) demonstrated its essentiality and the morphological changes upon its depletion (Fig. 3 in ref 1), we have changed the title to “A unique cell division protein critical for the assembly of the bacterial divisome”.

      2) The Isberg/Geisinger labs named this division protein AdvA in 2020 (ref 37). The authors of the present manuscript should follow this terminology, as there is no compelling reason to rename the protein Aeg1 here. It will only confuse the field.

      We named this protein Aeg1 because we identified and named it before the work by Geisinger and Isberg labs (1) was published and this name has been used in all of our records. In addition, this is a part of our research exploring hypothetical essential genes in A. baumannii and we thus would like to keep the name in this manuscript.

      3) Membrane topology of AdvA? Line 103-104: The authors predict a single transmembrane domain in AdvA (Aeg1). However, reference 37 predicted two, and some prediction programs (e.g. CCTOP) predict three with the N-terminus periplasmic. A good understanding of the membrane topology of AdvA is important, if not only for the design of credible BACTH two-hybrid assays. Figure 6 indicates that the authors assume that the N-terminus of AdvA is periplasmic with the bulk of the protein cytoplasmic. But then they choose to use pKT25::AdvA for two-hybrid assays, which would place the CyaA T25 domain periplasmic as well. This should not yield faithful interaction data as both the T25 and T18 domains need to be cytoplasmic to restore CyaA activity.

      The Bacterial Adenylate Cyclase-Based Two-Hybrid (BACTH) technique is a powerful tool for studying protein-protein interactions, especially those involving integral membrane or membrane-associated proteins. It overcomes the limitations of traditional two-hybrid systems by allowing the detection of interactions that occur within the membrane or in other difficult-to-study protein environments (2). This method has been successfully used to analyze the relationships among bacterial cell division proteins (e.g., ref 3 and 4). Furthermore, our results from the bacterial two-hybrid and immunofluorescence techniques are consistent. We therefore believe the results presented here are valid.

      4) Strains and plasmids, Table S4 Far more detail is needed. a) Please provide complete genotypes of strains and, especially, of the plasmids used, including replication origin, antibiotic resistance markers, promoters, promoter repressors, inducible genes/fusions to be expressed, and the placement of genetic tags (T25, T18, XFP, Flag, etcetera).

      We have added the information to Table S4.

      b) In addition, provide details on how each strain/plasmid was constructed in the Methods section or as supplement. Currently, you only provide some details on one or two of the strains or plasmids.

      We have added the necessary details about how the constructs and plasmids used in this study were made.

      5) Lines 114-129, Fig 2. AdvA is needed for cell division. a) Similar results were already described by refs 37 and 20, so this is merely confirmatory.

      We revised the description accordingly.

      b) Refs 37 and 20 should be referenced here, as well as in the section above where you find AdvA to be essential for viability on rich medium.

      We have added the appropriate reference as suggested.

      c) The micrographs in panel C are of poor quality. Consider higher magnification and resolution.

      We have redone the experiments and images of higher resolution have been used in the revised manuscript.

      6) Lines 130-143, selection for suppressors of AdvA-depletion. I would expect quite a few mutations in araC repressor on the plasmid in this screen, rendering the promoter more constitutive (i.e. arabinose-independent). Did these not appear?

      This is an interesting point. Unfortunately, we did not recover suppressor mutants with mutations in araC or other elements of the BAD promoter. Given the complexity of AraC-mediated regulation (5), such mutants are likely rare, or we did not screen enough candidates.

      7) Lines 173-178, Fig3E. Sublocalization of AdvA-mCherry. a) The micrographs in Fig. 3E are very poor and I can not see any specific localization, or barely any signal whatsoever, of the AdvA-mCherry fusion. Thus, this result is not convincing

      We have replaced this image with a new one of higher-resolution.

      b) In contrast, accumulation of an AdvA-GFP fusion at constriction sites was already clearly and convincingly shown in ref 37.

      We have revised the text to reflect this fact.

      c) So, this section needs convincing images, as well as a reference to ref 37.

      We have added an image of higher resolution and revised the text accordingly. Thank you.

      8) Lines 179-188, Fig4a-b. BACTH assays

      a) As noted above (see point 3), the T25-AdvA fusion would likely place the T25 domain in the periplasm, casting doubt on the validity of these results.

      b) Similarly, the T18-ZipA fusion would place the T18 domain in the periplasm, casting further doubt.

      The Bacterial Adenylate Cyclase-Based Two-Hybrid (BACTH) technique is a powerful tool for studying protein-protein interactions, especially those involving integral membrane or membrane-associated proteins. It overcomes the limitations of traditional two-hybrid systems by allowing the detection of interactions that occur within the membrane or in other difficult-to-study protein environments (2). This method has been successfully used to analyze the relationships among bacterial cell division proteins (e.g., ref 3 and 4). Furthermore, our results from the bacterial two-hybrid and immunofluorescence techniques are consistent. We therefore believe the results presented here are valid.

      9) Lines 189-201, Fig4c, co-localization of proteins in AdvA-depleted filaments. These co-localization results are not convincing for several reasons:

      a) None of the proteins accumulate in specific ring-like structures, as might be expected for ZipA, at least. One possible reason is that division rings are not made at all due to the partial depletion of AdvA in these cells. But another possible reason is that some or all the fusions are simply non-functional. Do any of these proteins (co-)localize to the septal ring in wt cells?

      b) At least for the GFP-ZipA fusion, there is good reason to predict it is not functional, as correct membrane insertion of the fusion would place GFP in the periplasm. In E. coli this prevents GFP from becoming fluorescent in the first place. So the fluorescence seen here may reflect failure of the fusion to insert properly.

      c) Another possible reason for rings being absent is that the fusions are massively overexpressed. The plasmids are multicopy, the BAD and TAC promoters are strong, and the used levels of inducers (Ara and IPTG) are high. How do fusion levels compare to that of native proteins? Perhaps some of the bright spots we see are inclusion bodies or other types of non-specific protein aggregates.

      We appreciate these excellent suggestions and have carried out experiments to investigate the (co-)localization of these proteins at the septal ring in Δaeg1 cells under conditions of low-level inducers (Ara and IPTG) and reduced induction time.

      Cultures of strains derived from Δaeg1(PBAD::mCherry-Aeg1) harboring the GFP fusions were induced by ara for 16 h. The saturated bacterial cultures were then diluted into fresh LB broth without ara for 6 h to induce the elongation morphology. IPTG (0.2 mM) and ara (0.2%) were added to induce the expression of fusion proteins for 4 h before samples were processed for microscopic analysis. Consistent with results from the protein interaction experiments using the bacterial two-hybrid assay, Aeg1 colocalized with ZipA, FtsK, FtsL, FtsB, and FtsW (Fig. 4C). Thus, Aeg1 interacts with multiple core cell divisome proteins of A. baumannii.

      In cells of the wild-type A. baumannii strain, we have observed cell elongation upon overexpression of FtsL, FtsB, FtsW, or FtsN. This raises concerns regarding the physiological relevance of the results obtained in wild-type cells. Of note, the phenotype of cell elongation following overexpression of division proteins has been observed in Escherichia coli by several groups (6-11).

      10) Lines 202-214, Fig5a, localization of division proteins in AdvA-depleted filaments. These localization results are not convincing for the same reasons outlined above (see point 9).

      a) Do any of the fusions localize correctly under similar expression conditions, but in normally dividing cells?

      In wild-type A. baumannii cells, cell elongation occurs upon overexpression of FtsL, FtsB, FtsW or FtsN, which raises the concern that the results from the suggested experiments may not be physiologically relevant.

      b) Even the regular structures seen with GFP-FtsZ do not resemble rings, but appear more like blobs. Perhaps fixation with glutaraldehyde would preserve structures better?

      We have followed the suggestion to fix cells with glutaraldehyde. The new images have been used in the revised manuscript.

      11) Other points:

      a) Line 97, Fig1. Is AdvA essential on minimal medium (~ slow growth) as well?

      We have performed this experiment. Yes, AdvA/Aeg1 is essential for A. baumannii growth in the Vogel-Bonner minimal medium with succinate (VBS) as the sole carbon source (12) (Fig S1).

      b) Fig1. What residues are actually missing (or replaced?) in the delta-TM version of AdvA?

      We have added this information: residues 1-23 have been removed.

      c) Fig1D. Also, the delta-TM version of HA-AdvA runs slower than HA-AdvA itself. Why?

      We have also been puzzled by this phenomenon that full-length AdvA/Aeg1 migrated faster than the delta-TM mutant. Interestingly, this discrepancy did not occur when the proteins were expressed in E. coli (see Author response image 1). We do not have a good explanation for this phenomenon.

      Author response image 1.

      The expression of Aeg1 and Aeg1∆TM in A. baumannii and E. coli. Total proteins resolved by SDS-PAGE were probed by immunoblotting with the HA-specific antibody. The metabolic enzyme isocitrate dehydrogenase (ICDH) was probed as a loading control. Similar results were obtained in three independent experiments.

      d) Lines 159, 165 and elsewhere. The mutation in E. coli is actually FtsA(R286W), not Q286W.

      We have corrected this error. Thank you!

      e) Line 161. These alleles of ftsA should be referenced properly: ref 33 for I143L and ref 29 for E124A.

      We have made the correction. Thank you!

      f) Line 692, you incorrectly switched the two CyaA domains here.

      We have corrected this error.

      g) Fig4b. Is 'none' a vector control (pUT18C-Flag)?

      We have specified the control; it is the vector pUT18C-Flag.

      h) Lines 727-729. I don't understand this sentence. Please explain.

      We have revised this sentence.

      Reviewer #2 (Recommendations For The Authors):

      Line 159 and Fig. 2 Panel D. I am not sure that this panel should be in the paper for two reasons: 1) FtsA from E. coli and A. baumannii are only 50% identical and its not clear that one can make corresponding mutations and expect similar behavior. FtsA* from E. coli is R286W not Q286W. R286 does not appear to be conserved in A. baumannii. Also, what you label as Q286 appears to be Q285. Please check. 2) the alleles that are tested in this panel do not rescue the deletion of Aeg1. This may be due to the instability of the mutant proteins. It would be better to characterize the mutant that you have isolated - is it a superfission mutation; that is does it produce small cells in a strain that contains WT Aeg1?

      Thank you! We have more carefully examined the relevant sites in these proteins. We did not observe the small cell phenotype when FtsAE202K was overexpressed in WT strains (please see Author response image 2).

      Author response image 2

      The overexpression of FtsAE202K did not cause a small cell phenotype in A. baumannii. Bacterial strains derived from WT (Ptac::FtsAE202K) grown in LB broth overnight were diluted into fresh medium with the inducer and the cultures were induced with IPTG for 4 h prior to being processed for imaging (A). Total proteins were resolved by SDS-PAGE and proteins transferred onto nitrocellulose membranes were detected by immunoblotting with the HA-specific antibody. ICDH was probed as a loading control (B, right panels). Images were representatives of three parallel cultures. Bar, 10 µm.

      The images in Fig. 3, Panel C are quite poor (perhaps the original images [not PDF] are better). It is difficult to see the localization.

      We have redone the experiments and replaced the images with ones of higher resolution.

      Fig. 4. Panel C. This is an effort to show that Aeg1 colocalizes with known cell division proteins. Since in Fig. 3, panel C it is claimed that Aeg1 localizes to the division site, then it must colocalize with known division proteins. Doing the long term depletion of Aeg1 is likely causing artefacts. The localization of proteins seems very erratic. A better experiment would be to express the GFP fusions to the known proteins and then deplete Aeg1 and see what happens. Does depletion of Aeg1 prevent the localization of FtsZ, FtsK or FtsN? Another important question is if one of the known cell division proteins is depleted does Aeg1 localize to division sites. Since it is speculated that Aeg1 interacts with ZipA and FtsN, these proteins could be depleted and see if Aeg1 localizes.

      We greatly appreciate your insightful suggestions. We have carefully redone these experiments as follows: Each of the test strains was grown in LB broth with ara overnight prior to being diluted into fresh medium without ara for 6 h to induce the elongation morphology. IPTG (0.25 mM) and ara (0.25%) were added to induce the expression of fusion proteins for 4 h before samples were processed for microscopic analysis. Consistent with results from the protein interaction experiments using the bacterial two-hybrid assay, we observed that Aeg1 colocalized with ZipA, FtsK, FtsL, FtsB, or FtsW (Fig. 4C).

      In cells not expressing Aeg1, all of the examined core division proteins, including FtsZ, FtsK, and FtsN, displayed midcell mistargeting (Fig. 5A).

      As for the localization of Aeg1 upon depleting ZipA or FtsN, this is an ongoing project in our lab. Such information is beyond the scope of this manuscript.

      Fig. 5. Panel A. again the images are not of good quality. Also, why deplete for 18 hrs. This is too long.

      We have redone these experiments and images of higher resolution are now used in the revised manuscript. After extensive testing, we have chosen to use a 6-h depletion, which gave us the window to observe the phenotype (Fig. 5A).

      Line 25. Change 'so' to 'as'

      Corrected as suggested. Thank you!

      Line 28. "Induces' to 'induce'

      We have made the suggested correction. Thank you!

      Line 43. Change 'of' to 'with'

      Corrected as suggested. Thank you!

      Line 74. Change 'determine' to 'test'

      Corrected as suggested. Thank you!

      Line 89. Delete 'of the'

      We have made the suggested correction. Thank you!

      Line 102. Some strains of E. coli? Does that mean there are strains that do not contain Aeg1? What are they?

      Yes, this is indeed the case. The common strains of E. coli derived from strain K12 do not have a discernible homolog of aeg1. This gene is present in some clinical E. coli isolates (e.g. HAY5567682, HBI862710, HAY5567682, MDD9849866, EFE8345364, and KAE9874289).

      Line 112. Note this TM domain has a rare topology as it is similar to ZipA. Please mention that this is a Type 1b.

      We have made the suggested revision. Thank you!

      References:

      1. Geisinger E, Mortman NJ, Dai Y, Cokol M, Syal S, Farinha A, et al. Antibiotic susceptibility signatures identify potential antimicrobial targets in the Acinetobacter baumannii cell envelope. Nature communications. 2020;11:4522. doi: 10.1038/s41467-020-18301-2

      2. Karimova G, Gauliard E, Davi M, Ouellette SP, Ladant D. Protein-Protein Interaction: Bacterial Two-Hybrid. Methods in molecular biology (Clifton, NJ). 2017;1615:159-76. doi: 10.1007/978-1-4939-7033-9_13

      3. Karimova G, Dautin N, Ladant D. Interaction network among Escherichia coli membrane proteins involved in cell division as revealed by bacterial two-hybrid analysis. Journal of bacteriology. 2005;187:2233-43. doi: 10.1128/jb.187.7.2233-2243.2005

      4. Boldridge WC, Ljubetič A, Kim H, Lubock N, Szilágyi D, Lee J, et al. A multiplexed bacterial two-hybrid for rapid characterization of protein-protein interactions and iterative protein design. Nature communications. 2023;14:4636. doi: 10.1038/s41467-023-38697-x

      5. Schleif R. AraC protein, regulation of the l-arabinose operon in Escherichia coli, and the light switch mechanism of AraC action. FEMS microbiology reviews. 2010;34:779-96. doi: 10.1111/j.1574-6976.2010.00226.x

      6. Addinall SG, Cao C, Lutkenhaus J. FtsN, a late recruit to the septum in Escherichia coli. Molecular microbiology. 1997;25:303-9. doi: 10.1046/j.1365-2958.1997.4641833.x

      7. Pichoff S, Lutkenhaus J. Identification of a region of FtsA required for interaction with FtsZ. Molecular microbiology. 2007;64:1129-38. doi: 10.1111/j.1365-2958.2007.05735.x

      8. Du S, Henke W, Pichoff S, Lutkenhaus J. How FtsEX localizes to the Z ring and interacts with FtsA to regulate cell division. Molecular microbiology. 2019;112:881-95. doi: 10.1111/mmi.14324

      9. Park KT, Du S, Lutkenhaus J. Essential Role for FtsL in Activation of Septal Peptidoglycan Synthesis. mBio. 2020;11. doi: 10.1128/mBio.03012-20

      10. Barre FX, Aroyo M, Colloms SD, Helfrich A, Cornet F, Sherratt DJ. FtsK functions in the processing of a Holliday junction intermediate during bacterial chromosome segregation. Genes & development. 2000;14:2976-88. doi: 10.1101/gad.188700

      11. Cameron TA, Vega DE, Yu C, Xiao H, Margolin W. ZipA Uses a Two-Pronged FtsZ-Binding Mechanism Necessary for Cell Division. mBio. 2021;12:e0252921. doi: 10.1128/mbio.02529-21

      12. Vogel HJ, Bonner DM. Acetylornithinase of Escherichia coli: partial purification and some properties. The Journal of biological chemistry. 1956;218:97-106.doi:

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you and the two reviewers for the constructive feedback on our manuscript entitled: "Substrate evaporation drives collective construction in termites".

      Here, we submit a revised version in which, we believe, we supply the missing details identified by the reviewers and clarify the presentation of our results.

      From the eLife assessment we can identify a few main points that the reviewers found unclear or not well developed in our previous manuscript:

      • Insufficient details about computer simulation models. Is the match between simulations and experiments qualitative or quantitative?

      • Request for clarifications related to the wall stimulus: is evaporation stronger at the high-curvature wall corners or similar along all the wall edge? Why is there less consistency in the experimental results with the wall stimulus, with a minority of wall experiments in which something different happens?

      • Quantitative estimation of the humidity gradients in our experimental setup.

      • "Confirmation" that termites can sense humidity gradients of magnitude and scale comparable with those encountered in our experiments.

      • Request for additional background information about the considered termite species and their construction habits.

      The reviewers also made a number of interesting suggestions and other comments:

      • Suggestion of possible explanations and interpretations for a purported discrepancy with a previous work by Calovi and collaborators.

      • Suggestion of alternative experimental approaches (array of probes, alternative experimental setups).

      We address all these points below.

      Details about computer simulation models

      There are two different types of computer simulations in our experiments: 1. simulations of evaporation on the initial structure, and 2. simulations of structure growth based on curvature.

      1) Simulations of evaporation. We recall that these simulations rely on the hypothesis that humidity transport is diffusive, that is, the evaporation rate is proportional to the humidity gradient. New details on the implementation of these diffusive simulations have been added in section S.VI. We also adapted figures 4A and 4B, which are now expressed in units more comparable to the expected humidity field in experiments. Essentially, we show that the model underestimates the absolute magnitude of the humidity gradient |∇ℎ| in our setup, while it correctly predicts the relative importance of the same field across the topography.

      First, it is instructive to report the value of |∇ℎ| predicted by diffusive simulations with the bottom boundary at 100% humidity (like the clay disk) and the top boundary of the simulation box at 70%, like our experimental room. Note that, at a given temperature, relative humidity and absolute humidity are proportional, so we will assume here that temperature is constant and always refer to relative humidity. Thus, the humidity gradient will be measured in mm⁻¹, exactly like curvature. One then has:

      • flat disk: |∇ℎ| ∼ 0.01 mm⁻¹

      • wall tips: |∇ℎ| ∼ 0.13 mm⁻¹

      • wall top edge: |∇ℎ| ∼ 0.1 mm⁻¹

      • pillar tips: |∇ℎ| ∼ 0.19 mm⁻¹

      First, we remark that the value of |∇ℎ| on the flat portion of the disk is about 10 times smaller than the estimate |∇ℎ|0 ∼ 0.15 mm⁻¹ of the same quantity in our experiments, which is now given in the manuscript and discussed in a specific paragraph below. This discrepancy arises because our simulations overestimate the size of the diffusive region (i.e. the simulation box), setting it to 18 mm, whereas we expect the diffusive layer to be much thinner (i.e. 𝛿 ∼ 2 mm). As in all diffusive problems, the gradient of the diffusing field depends on the distance from the boundaries where the value of the field is imposed: the humidity gradient at any point of the bottom boundary (i.e. on the clay surface) depends on the distance of that point from the top boundary, and the closer the boundaries, the stronger the gradient. In principle, the size of the simulation box affects not only the overall magnitude of the humidity gradient but also its shape. However, in our simulations the topographic cues are only 30% closer to the top boundary than the flat bottom surface, yet the local gradient there is 10 to 20 times larger. This suggests that the 'curvature' effect is much stronger than the 'distance' effect, and supports the view that our approximation does not significantly affect the estimated relative importance of the humidity gradient across the bottom surface. We therefore conclude that our diffusive simulations do not provide a correct estimate of the order of magnitude of |∇ℎ|, but capture its relative variations across the topography well.
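      The dependence of the flat-surface gradient on the size of the diffusive region can be illustrated with a minimal sketch, assuming one-dimensional steady-state diffusion between two fixed-humidity boundaries. This is not the simulation code described in section S.VI; the function name, the grid size and the number of relaxation sweeps are illustrative choices only.

```python
def flat_surface_gradient(box_height_mm, h_bottom=1.0, h_top=0.7,
                          n_cells=20, n_sweeps=5000):
    """Relax the 1D Laplace equation d2h/dz2 = 0 between a saturated bottom
    boundary and a drier top boundary, then return |dh/dz| at the bottom
    surface in mm^-1 (relative humidity expressed as a fraction)."""
    dz = box_height_mm / n_cells
    # Initial guess: top-boundary humidity everywhere except at the bottom.
    h = [h_top] * (n_cells + 1)
    h[0] = h_bottom
    for _ in range(n_sweeps):
        for i in range(1, n_cells):
            # Gauss-Seidel update: each interior node becomes the mean
            # of its two neighbours.
            h[i] = 0.5 * (h[i - 1] + h[i + 1])
    return abs(h[1] - h[0]) / dz


if __name__ == "__main__":
    # Hypothetical comparison: 18 mm simulation box vs. 2 mm boundary layer.
    for height in (18.0, 2.0):
        print(f"{height:4.1f} mm -> {flat_surface_gradient(height):.3f} mm^-1")
```

      Under these assumptions the flat-surface gradient is simply Δℎ divided by the distance between the boundaries, so an 18 mm box gives roughly 0.017 mm⁻¹ and a 2 mm layer gives 0.15 mm⁻¹, the same order-of-magnitude gap as between the simulated and the estimated values quoted above.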

      2) Structure growth based on curvature. As observed by the reviewer, the dynamical simulations included here refer to a model that was developed in a previous study, so we chose not to include all the details of the simulations in the present one. At this stage, that model is still phenomenological: for example, we cannot provide a physical estimate of the dimensionless parameter 𝑑, which controls the typical size of the structures produced by simulations of the model. Thus, in principle, comparisons with real experiments can only be "qualitative". Indeed, pushing such a comparison further is not necessarily of interest, given the minimal, mean-field character of our model and the extreme complexity of the natural system studied here. However, our experimental setup was specifically designed to overcome this limit by using topographies in which the curvature cues were modulated in an almost discrete way, with flat regions and regions where curvature is strong 'for termites', i.e. where the curvature radius is of the order of termite body size. Our experimental results strongly support this choice, because deposition patterns also show an almost 'discrete' shape, with specific regions attracting most of the deposition events. Thus, we claim that the significance of the agreement is strong, and we suggest that when stimulus and response both behave in a quasi-discrete manner, the distinction between qualitative and quantitative is not well defined. Finally, we recall that throughout the discussion above curvature and humidity gradient can be exchanged, as we already pointed out in the manuscript. Consistently, the humidity gradient shows a strong variation between the curved regions and the flat ones.

      Results with the wall stimulus. One important point emerging from the reviews is that we did not clearly present the results with the wall stimulus. These concerns are best summarized by a comment from reviewer 2, who states: “evaporation rates seem inconclusive in the wall geometry, yet the termites still deposit material at the high-curvature wall corners”.

      We acknowledge that the interpretation of the experiments with the wall stimulus must address three key points: 1- Salt deposition experiments are inconclusive in showing variation of the evaporation rate across the top of the wall; 2- A portion (4/11) of termite experiments do not show a clear pellet deposition pattern; 3- Conversely, in the remaining portion (7/11), most experiments still show clear pellet deposition on the corners of the wall, in spite of small differences in evaporation between the corners and the top edge (as in our Fig. 3B). These points are now addressed in the manuscript and discussed below.

      The variation of the humidity gradient between the corners of the wall and the wall’s top edge is relatively small, while both are regions of relatively high curvature and higher evaporation compared to the flat surface of the clay disk. We now report precise values of the humidity gradient from numerical simulations, as discussed above. These indicate that the humidity gradient at the wall corners and upper edge is respectively 10 and 7 times larger than on the flat bottom, but that evaporation at the wall tips exceeds that on the wall upper edge by a factor of only about 0.3 (i.e. about 30%).

      Experiments with the saline solution qualitatively confirm the same result of an evaporation pattern more evenly distributed on the wall stimulus (point 1) than on the pillars.

      Taken together, these results might explain why not all wall experiments end up with depositions at the tips (point 2): simply, in the wall experiments the relative importance of the deposition cue between tips and wall upper edge is not high enough to always guide termite behavior in a deterministic way.

      But we should also point to the fact that the evaporation simulations presented in figure 4 and the experiments with the saline solution both reflect the humidity field on the clay templates before termite construction has started. As soon as termites start adding pellets to the wall, effectively starting to build a pillar, the humidity gradient will be reinforced at the locations of pellet deposition, and a self-reinforcing process is initiated, similar to our dynamical simulations based on local curvature. This explains why eventually termite activity can result in clear and localized depositions (point 3) also with the wall stimulus.

      Incidentally, we would like to include here another consideration: the nests of Coptotermes termites comprise a “scaffold” with multiple interconnected pillars. In other termite genera, the prevalent nest structure is made of surfaces rather than pillars, as in Nasutitermes, Apicotermes and Psammotermes nests, or in some fungus-growing structures in Macrotermes and Synacanthotermes. The fact that the wall stimulus presents some potential to stimulate construction everywhere on its edge is intriguing, as it might provide some cues on the construction of different nest architectures.

      Quantitative estimation of the humidity gradient in our setup. The moisture gradients in our experiments and simulations were only presented in a non-quantitative manner, because we were mainly interested in identifying locations of high and low evaporation. However, by combining the scaling arguments already discussed in S.IX with the results of our evaporation simulations, one can produce a lower boundary for the magnitude of the humidity gradient |∇ℎ|, predict its higher value at key positions on our setup, and compare it with the humidity variations experienced by termites in their natural environment. These considerations are now included in the manuscript and discussed below.

      First, we define a reference value |∇ℎ|0 for the humidity gradient on the (flat) clay disk, which can be estimated using the boundary layer thickness 𝛿 ∼ 2 mm (see section IX.A of the SI) and the variation of relative humidity Δℎ between the clay disk surface and the exterior, which was Δℎ = 30% (the difference between the fully wetted substrate and room air humidity at 70% saturation). Note that |∇ℎ|0 constitutes a lower boundary for the expected values of the humidity gradient in our setup, as confirmed by our experiments with saline solution. We can then write:

      |∇ℎ|0 ∼ Δℎ / 𝛿 = 0.3 / (2 mm) = 0.15 mm⁻¹

      Next, the results of diffusive simulations shown in figures 4A and 4B indicate that the humidity gradient at highly curved regions of the topographic cues is at least 10 times larger than |∇ℎ|0, which allows us to estimate an upper boundary for |∇ℎ| in our experimental setup, say |∇ℎ|max ∼ 1 mm⁻¹.

      Humidity sensing capabilities of termites. Our hypothesis that humidity gradients could guide termite building behavior implicitly assumes that termites can sense humidity gradients comparable with those existing in our experiments.

      Humidity is important to all termites because of their small size and unsclerotized body. Coptotermes termites in particular are wetwood termites that can only survive in high-humidity environments such as moist wood or soil. It is well documented that Coptotermes termites (like other termites and cockroaches) have humidity receptors in their antennae, and behavioral studies indicate that they can discriminate between chambers with different humidity content.

      For example, a study by Gautam and Henderson (2011, Environmental entomology, 40:1232) provided chambers with different relative humidity and, after 12 hours, almost all termites were in the highest humidity chamber (98% RH), leaving the other chambers with 75% or less RH empty. These results (which are also similar to other results testing termite response to chambers with different soil moisture) indicate that, given a sufficient amount of time, termites can detect a difference of humidity from 75% to 98% over a spatial scale of centimeters.

      The quantitative estimation of the humidity gradient described above indicates that in our experimental setup termites can experience humidity variations of 15% over a distance of only 1mm and even shorter, while the length of a single termite antenna is about 1.5 mm.

      In other words, the humidity gradients that we estimate for our experiments are well above those that termites were able to discriminate in previous experiments. Future experiments should aim to test the exact limits of resolution of the humidity-sensing ability of termites (e.g. in an environment where humidity is close to 100% everywhere), and the mechanisms by which they sense the gradient (e.g. by comparing information from the two antennae, or by integrating humidity information over time).
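      The comparison made in the paragraphs above can be written out as a short worked calculation. This is only a sketch under stated assumptions: the 10 mm spacing for the choice-chamber study is a hypothetical stand-in for "a spatial scale of centimeters" (the actual chamber geometry is not given here), and all variable names are illustrative.

```python
def gradient_per_mm(rh_high, rh_low, distance_mm):
    """Mean relative-humidity gradient (fraction per mm) between two points."""
    return (rh_high - rh_low) / distance_mm


# Lower boundary in the present setup: 100% -> 70% RH across a ~2 mm layer.
setup_lower = gradient_per_mm(1.00, 0.70, 2.0)

# Choice-chamber result quoted above: 98% vs 75% RH.
# ASSUMPTION: 10 mm is a hypothetical separation, not a reported value.
assumed_chamber_scale_mm = 10.0
chamber = gradient_per_mm(0.98, 0.75, assumed_chamber_scale_mm)

# Humidity difference sampled along one ~1.5 mm antenna at the lower boundary.
antenna_length_mm = 1.5
drop_along_antenna = setup_lower * antenna_length_mm

if __name__ == "__main__":
    print(f"setup lower boundary : {setup_lower:.3f} mm^-1")
    print(f"chamber (assumed)    : {chamber:.3f} mm^-1")
    print(f"ratio                : {setup_lower / chamber:.1f}")
    print(f"drop along antenna   : {drop_along_antenna:.3f}")
```

      With the assumed 10 mm scale, the lower-boundary gradient in the setup is several times steeper than the gradient implied by the choice-chamber result; a larger chamber separation would only widen that gap.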

      By definition, |∇ℎ|0 corresponds to a variation of humidity between a fully saturated atmosphere (i.e. 100%), comparable to the nest interior, and a "humid" atmosphere (i.e. 70%) comparable to the natural environment where termites live (say the nest exterior), occurring over a distance (2mm) which is comparable with their body size.

      We can then conclude that even the lower boundary |∇ℎ|0 of the humidity gradient corresponds to an atmosphere variation to which termites must be used, i.e. nest interior vs nest exterior, happening across one body length. If we add that the upper boundary |∇ℎ|𝑚𝑎𝑥 is one order of magnitude higher, it appears extremely unlikely that they could not detect these gradients.

      Additional background information about our considered termite species and their construction habits

      We have now added some details about the life history and nesting habits of termites in the Coptotermes genus in a new paragraph in section SI. Essentially, these are wetwood termites that nest in moist wood or soil, and their nests present a typical structure comprising a scaffold of interconnected pillars (we now show a picture of a typical structure from one of our lab-reared colonies).

      After the initial submission of our manuscript we have also obtained a more precise taxonomic identification of the termites we used, which indicated that our termites are better identified as Coptotermes gestroi than Coptotermes formosanus. The two species are extremely close and can also interbreed in the areas where they co-occur, but in this case C. gestroi is a better match. Hence, we have amended the name in the manuscript and in the supplementary material.

      Differences with previous results by Calovi and collaborators

      We believe that there is no real discrepancy between our results and those described by Calovi et al. (2019, Phil. Trans. Roy. Soc. B 374:20180374). What they measure, termite aggregation and activity, is similar to what we also observe in our experiments: termites aggregate in concave regions, such as at the base of the wall in our experiments, and they collect pellets at the locations that they visit more often. And, above all, we observe that concavities promote digging activity, which in turn promotes aggregation, as already observed in previous studies like Green et al. (2017, Proc. Roy. Soc. B 284:20162730). The main difference is that in our analyses we treat separately the three measurements of termite occupancy, pellet collection and pellet deposition, and in this way we identify a role of convexity for pellet deposition.

      It is possible that, apart from the differences in language and interpretations between our study and the study by Calovi, there were also real differences in termite building behavior between the two studies that we couldn’t fully appreciate from our own reading of the article by Calovi, but which the reviewer has spotted. The reviewer makes a very interesting suggestion that some of these differences might be due to the different humidity level used in our experiment, compared to the experiment by Calovi and collaborators. Room humidity was high, at around 70% in our experiments. The humidity in Calovi’s experiments was possibly even higher as they performed their experiments in a closed box, but we could not find precise reported information on the humidity level in their publication.

      Given that it is not clear that the building behavior in our experiments was qualitatively different from the building behavior in Calovi and collaborators’ experiments, and given that we don’t know the precise humidity value used in Calovi’s experiments (plus, we worked on different termite species that could have different sensitivity to humidity), we decided that, based on the information that we have, we could not meaningfully expand our discussion of similarities and differences with Calovi’s study in our manuscript.

      It is clear, though, and we completely agree with the referee on this point, that in light of Calovi’s and our own new results, it would now be extremely interesting if future experiments could characterize termite construction activity across a range of finely controlled air humidity values. Anecdotally, in preliminary experiments we did include some trials in which termites were hosted in a completely closed box, and we observed much reduced construction activity in those conditions. However, the fact that we could not easily track termite activity and pellet collections / depositions in those conditions (because of the box), together with the fact that the building activity itself was reduced, led us to converge towards the open arena experiments that we describe here.

      Suggestion of alternative experimental approaches. One reviewer made interesting suggestions for alternative experiments, including using an array of humidity probes for measuring humidity, or a different experimental setup analogous to those used in previous experiments by Bardunias and collaborators. It is often the case that only at the end of a series of experiments do we identify an alternative, and possibly better, way of doing the same experiment. In future, if we have the opportunity to run other similar experiments again, we will likely experiment with these suggestions. When we first designed our own experiments, one of our priorities was to be able to film all termites in the arena at all times, so that potentially we could also study individual termite behavior and task specialization. This partly constrained the type of experimental setups that we could use.

      One aspect that clearly emerged from our work and from the revision process is that any future experiments related to this topic should achieve a very precise control of air humidity, and test a wider range of stimuli of more varied and controlled size, humidity and curvature. Since our own experiments were conducted, three of us have moved to different institutions, which imposes practical constraints for us on working on the same termites in a similar way, but the suggestions from the reviewers will be helpful as we are planning our future research.

      We hope that the explanations above and the details that we have changed in the manuscript itself have helped to clarify the unclear aspects of our study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This work describes a structural analysis of the tripartite HipBST toxin-antitoxin (TA) system, which is related to the canonical two-component HipBA system composed of the HipA serine-threonine kinase toxin and the HipB antitoxin. The crystal structure of the kinase-inactive HipBST complex of enteropathogenic E. coli O127:H6 was solved and revealed that HipBST forms a hetero-hexameric complex composed of a dimer of HipBST heterotrimers that interact via the HipB subunit. The HipS antitoxin shows a structural resemblance to the HipA N-terminal region and the HipT toxin corresponds to the core kinase domain of HipA, indicating that in HipBST the hipA toxin gene was likely split into two genes, namely hipS and hipT.

      The structure also reveals a conserved and essential Trp residue within the HipS antitoxin, which likely prevents the conserved "Gly-rich loop" of HipT from adopting an inward conformation needed for ATP binding. This work also shows that the regulating Gly-rich loop of the HipT toxin contains conserved phosphoserine residues essential for HipT toxicity that are key players within the HipT active site interacting network and which likely control antitoxin binding and/or activity.

      Strengths:

      The manuscript is well written and the experimental work well executed. It shows that major features of the classical two-component HipAB TA system have somehow been rerouted in the case of the tripartite HipBST. This includes the N-terminal domain of the HipA toxin, which now functions as bona fide antitoxin, and the partly relegated HipB antitoxin, which could only function as a transcription regulator. In addition, this work shows a new mode of inhibition of a kinase toxin and highlights the impact of the phosphorylation state of key toxin residues in controlling the activity of the antitoxin.

      Weaknesses:

      A major weakness of this work is the lack of data concerning the role of HipB, which likely does not act as an antitoxin. Does it act as a transcriptional regulator of the hipBST operon and to what extent both HipS and HipT contribute to such regulation? These are still open questions.

      We thank the reviewer for their feedback and have included a supplementary figure (Figure 1 supplement 2) and accompanying text that shows the transcriptional role of HipB, and how HipS and HipT influence this regulatory effect.

      In addition, there is no in-depth structural comparison between the structure of the HipBST solved in the work and the two recent structures of HipBST from Legionella. This is also a major weakness of this work.

      A structural comparison to the recent structures from Legionella has now been included in the discussion, including Figure 6 supplement 1.

      Reviewer #2 (Public Review):

      The work by Bærentsen et al., entitled "Structural basis for regulation of a tripartite toxin-antitoxin system by dual phosphorylation" deals with the structural aspects of the control of the hipBST TA operon, the role of auto-phosphorylation in the activation and neutralisation of the enzyme and the direct effects of HipS and HipB in neutralisation. This is a follow-up to the Vang Nielsen et al., and Gerdes et al., papers from the same authors on this very unique TA module, that brings forth a thorough and well written dissection of an unusually complex regulatory system.

      This is a much improved manuscript, the paper is more focused and the message is now clear.

      Reviewer #1 (Recommendations For The Authors):

      My main recommendation would be to include an in-depth structural comparison between the structure of the HipBST solved in the work and the two recent structures of similar HipBST from Legionella.

      We thank the reviewer and have included a new supplementary figure (Figure 6 supplement 1) and expanded the comparison in the discussion to accommodate this.

      Reviewer #2 (Recommendations For The Authors):

      So I only have some minor comments.

      1) The authors should accompany Fig.1 (a supplementary panel is sufficient) with a surface electrostatic representation of the complex to better illustrate the potential role of the complex in transcription auto-regulation.

      We have included a new panel in Figure 1 supplement 3 to show the electrostatic surface of the DNA-binding domains of HipB of HipBST and HipBASo.

      2) When the Gly-rich loop is first introduced, please provide from which residue to which residue the loop expands.

      Corrected for both the first mention of the Gly-rich loop of HipA and HipT.

      3) In Fig 2. The authors try to show how the interaction of the main helix of HipS with HipT is different in HipBST compared to HipAB. I think it would be helpful if these two panels show the surface of HipT and HipA coloured by electrostatics so that not only the differences in HipS become apparent, but also the local differences between both toxins.

      We thank the reviewer for this excellent idea. The electrostatics did in fact reveal that this region differs between the two toxins. We have updated Figure 2b to show this difference.

      4) Fig. 4 Shows the experimental SAXS curves for the HipT D210Q variants SIS (blue), SID (red), and DIS (orange). In each case a black curve is fitted to the data (presumably the fitting of the model-derived scattering curve to the data). Could the authors clarify this in the figure?

      We agree that this information is missing in the legend. The black curves are the fits for the models based on the crystal structure after rigid-body refinements and inclusion of a structure factor to account for oligomerization of the complexes. This is now included in the figure caption.

      5) Also regarding the SAXS analysis, in the manuscript the authors state that all three models "gave good fits to the data" as assessed by the fitting χ2. These χ2 values should be explicit in the figure or the figure legend.

      We thank the reviewer for this suggestion. The chi squared values for the best fits are now given in the text.

      In addition, is the SAXS data (the parameters derived from the experimental scattering, including the MW) consistent with the lack of HipS from the complex? (it should be...).

      This is a good point, however, the partial oligomerization (dimerization) of the complexes (heterohexamers) and the variation of the dimerization degree between samples prevent extraction of useful mass values from the I(0) determinations. Therefore, we decided not to give the values explicitly in the text but only state “…consistent with analysis of the forward scattering that revealed partial oligomerisation of the samples with an average mass corresponding to roughly a dimer of the HipBST heterohexamer.”

      6) Please improve this sentence: "Moreover, since it has previously been shown that only the HipT Gly-rich loop never is observed in doubly phosphorylated form with both Ser57 and Ser59 modified simultaneously, it is unlikely that the effects are due to autophosphorylation of the remaining serine residue in either case (Vang Nielsen et al., 2019)."

      Done

    1. Author Response

      We are happy that the novelty and strengths of the study have been appreciated by the editor/s and reviewer/s. We thank the editor/s and reviewer/s for a considerably detailed and constructive review of the manuscript. Here are the responses and proposed revisions from the authors.

      • The weakness pointed out in the editorial comment, namely the absence of data on the role of Piezo1 in migrating T cells under varying physico-chemical conditions, concerns experiments that were, in the opinion of the authors, beyond the scope of the present manuscript. Moreover, introducing external forces using invasive techniques followed by assessment of Piezo1 function was intentionally avoided. This was the reason for using a non-invasive microscopy technique such as IRM to assess membrane tension generation in migrating T cells.

      • With regard to the explanation sought for the statement 'these high tension edges are usually further emphasized at later time points', the edges are visible from as early as 1 min (Supp fig 2B) and are seen to be emphasized at 30 min. In Fig 2D, the 3 min time point is the one at which increased tension at the edges is visible together with a clear difference in median tension. Fig 2C and Supp fig 2C are averaged over all cells; hence it is possible that, at a time point when a particular cell still shows higher tension at its edges, the median tension in Fig 2C is not significantly different. Also, if only a thin section of the cell edge shows enhanced tension, it may contribute to a second peak without affecting the median much.

      • With regard to the query regarding experimental replicates, all data shown is derived from at least 3 experimental replicates for Jurkat cells or independent blood donors for primary CD4+ T lymphocytes as specified in the respective figure legends.

      • With regard to the comments on the unavailability of representative images/videos for Figures 1A and 1B, in the revised manuscript we will add a representative video of GFP (-) and GFP (+) tracks. The transwell experiments were assessed by collecting cells from the bottom chamber followed by flow cytometry. We did not take microscopic images of the bottom chambers before collecting the cells.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the editor and all the reviewers for their time and thoughtful consideration of our manuscript. We appreciate the valuable comments. Our provisional response to the “public review” has been published, and we have now corrected factual errors and improved the clarity of the writing based on the “recommendations for the authors.” We believe these corrections will improve the quality and accuracy of our manuscript.

      Specific responses to the reviewers' recommendations for the authors are as follows:

      Reviewer #1 (Recommendations For The Authors):

      1) Is the Slack current amplitude dependent on the Nav subtype? Differences in Slack current amplitude might explain the sensitization of Slack to quinidine.

      We appreciate the reviewer for raising this point. We examined Slack current amplitudes upon co-expression of Slack with specific NaV subtypes in HEK293 cells. The results have shown that there are no significant differences in Slack current amplitudes upon co-expression of Slack with different NaV channel subtypes (Author response image 1), suggesting whole-cell Slack current amplitudes cannot explain the varied ability of NaV subtypes to sensitize Slack to quinidine blockade.

      Author response image 1.

      The amplitudes of Slack currents upon co-expression of Slack with specific NaV subtypes in HEK293 cells. ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      2) Is the open probability changed by the presence of Nav1.6 and/or by the other Nav subtypes? Changes in open probability might explain the Nav1.6 induced sensitization of Slack to quinidine block.

      We appreciate the reviewer for raising this point. To investigate the effect of different NaV channel subtypes on Slack open probability, we will perform the single-channel recordings in future studies.

      3) Could the authors elaborate more on the coupling between INaT mediated sensitization of Slack to block by quinidine and the Nav1.6 N-and C-tail induced sensitization?

      We appreciate the reviewer for raising this point. We fully agree on the importance of investigating the detailed mechanism underlying the sensitization of Slack to quinidine blockade. To address these questions, we plan to employ structural biology methods, such as cryo-electron microscopy (cryo-EM).

      4) Line 85: The authors use an outdated nomenclature of AMPAR subtypes. I would suggest changing to GluA1, GluA2, GluA3 and GluA4.

      We appreciate the reviewer’s suggestion. We have changed the term “GluR” to “GluA” in the revised manuscript.

      The authors do not explain the rationale by using the different homomeric AMPAR subtypes. Most often the AMPARs express as heteromeric receptors decorated by auxiliary subunits. Also, is the GluA2 the edited version?

      We thank the reviewer for raising this point. While AMPARs are often expressed as heteromeric receptors with auxiliary subunits, we focused on the homomeric AMPAR subtypes for initial screening. Through our investigation, we found no significant effects on sensitizing Slack to quinidine blockade. Additionally, the GluA2 used in our study is unedited.

      5) Line 144: I expect a reduction in current amplitude caused by blocking INaT and INaP is tested at +100mV?

      We thank the reviewer for raising this point. The reduction in current amplitude was indeed tested at +100 mV and we have included this information in the revised manuscript.

      6) Line 157 and line 162: Reference to Supplementary table S3 should be Table S2.

      We thank the reviewer for pointing this out. The reference to "Table S3" has been corrected to "Table S2" in the revised manuscript.

      7) How many times did the authors repeat the co-immunoprecipitation? Some of the bands are very weak, and repeats are necessary for all blots.

      We thank the reviewer for raising this concern. We performed the co-immunoprecipitation experiments three times independently.

      8) Line 288: The authors are showing the chimeric construct in Figures 7A and B but are referring to the full length Nav1.6 in the main text line 288.

      We apologize for the confusion. We have clarified in the revised manuscript that we used NaV1.5/6NC in our study.

      9) Figure 1 line 23: 1 uM quinidine must be 30 uM quinidine?

      We thank the reviewer for catching this error. We have corrected the concentration value in the caption of Figure 1 from "1 μΜ" to "30 μΜ" in the revised manuscript.

      10) Figure 2 line 53: I expect IC50 is measured at +100mV? Same question for line 60 in same figure text.

      We thank the reviewer for pointing this out. We have now included this information in the revised manuscript.

      11) Figure 4B color coding is confusing.

      We apologize for the confusion. We would like to clarify that Fig. 4B illustrates the domain architecture of the human NaV channel pore-forming α subunit, and we have changed the color from dark blue to black in the revised figure.

      12) Figure S6: Text for figure S6E and S6F has been swapped (line 96 to 106).

      We thank the reviewer for raising this point. We have rectified the swapped captions for Fig. S6E and Fig. S6F in the revised manuscript.

      13) Methods section line 652: Kainite acid should be changed to kainic acid

      We thank the reviewer for catching this typo. The term “kainite acid” has been corrected to “kainic acid” in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) Discuss limitations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system.

      We thank the reviewer for raising this point. We have discussed the limitations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system (line 344 to line 348).

      2) Riluzole is not a selective drug, so the limitations of this drug should be discussed.

      We thank the reviewer for raising this point. We have discussed the limitations of riluzole in the revised manuscript (line 360 to line 364).

      3) Remove the term in vivo.

      We thank the reviewer for raising this point. In our experiments, although we did not conduct experiments directly in living organisms, our results demonstrated the coimmunoprecipitation of NaV1.6 with Slack in homogenates from mouse cortical and hippocampal tissues (Fig. 3C). This result may support that the interaction between Slack and NaV1.6 occurs in vivo.

      4) Figure 1

      ①C Why does Nav1.2 have a small inward current before the large inward current in the inset? The slope of the rising phase of the larger sodium current seems greater than Nav1.6 or Nav1.5. Was this examined?

      We apologize for the confusion. We would like to clarify that the small inward current can be attributed to the current of membrane capacitance (slow capacitance or C-slow). The larger inward current is mediated by NaV1.2. Additionally, we did not compare the slope of the rising phase of NaV subtypes sodium currents but primarily focused on the current amplitudes.

      ②D-E

      For Nav1.5 the sodium current is very large compared to Nav1.6. Is it possible the greater effect of quinidine for Nav1.6 is due to the lesser sodium current of Nav1.6?

      We thank the reviewer for raising this point. We would like to clarify that our results indicate that transient sodium currents contribute to the sensitization of Slack to quinidine blockade (Fig. 2C,E). Therefore, it is unlikely that the greater effect observed for NaV1.6 in sensitizing Slack is due to its lower sodium currents.

      ③The differences between WT and KO in G -H are hard to appreciate. Could quantification be shown? The text uses words like "block" but this is not clear from the figure. It seems that the replacement of Na+ with Li+ did not block the outward current or effect of quinidine.

      We apologize for the confusion. We would like to clarify the methods used in this experiment. The lithium ion (Li+) is a much weaker activator of sodium-activated potassium channel Slack than sodium ion (Na+)1,2.

      1. Zhang Z, Rosenhouse-Dantsker A, Tang QY, Noskov S, Logothetis DE. The RCK2 domain uses a coordination site present in Kir channels to confer sodium sensitivity to Slo2.2 channels. J Neurosci. Jun 2 2010;30(22):7554-62. doi:10.1523/JNEUROSCI.0525-10.2010

      2. Kaczmarek LK. Slack, Slick and Sodium-Activated Potassium Channels. ISRN Neurosci. Apr 18 2013;2013(2013). doi:10.1155/2013/354262

      Therefore, we replaced Na+ with Li+ in the bath solution to measure the current amplitudes of sodium-activated potassium currents (IKNa)3.

      3. Budelli G, Hage TA, Wei A, et al. Na+-activated K+ channels express a large delayed outward current in neurons during normal physiology. Nat Neurosci. Jun 2009;12(6):745-50. doi:10.1038/nn.2313

      The following equation was used for quantification:

      Furthermore, the remaining IKNa after application of 3 μM quinidine in the bath solution was measured as the following:

      The quantification results were presented in Fig. 1K. The term "block" used in the text referred to the inhibitory effect of quinidine on IKNa.

      ④In K, for the WT, why is the effect of quinidine only striking for the largest currents?

      We thank the reviewer for raising this point. After conducting an analysis, we found no correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (p = 0.6294) (Author response image 2). Therefore, the effect of quinidine is not solely limited to targeting the larger currents.

      Author response image 2.

      The correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (data from manuscript Fig. 1K). r = 0.1555, p=0.6294, Pearson correlation analysis.
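      As an illustration of how a Pearson correlation and its two-sided p-value are related, a minimal sketch follows. It uses only the Python standard library and integrates the Student t density numerically. The function names are hypothetical, and the sample size is not stated in this response, so `n = 12` below is purely an assumed value for illustration and not a figure taken from the manuscript.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for H0: no correlation, given r and n (requires |r| < 1, n > 2)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the integral of the t density from 0 to |t| (steps is even).
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

# r as reported in the legend above; n is an ASSUMED sample size.
p_value = two_sided_p(0.1555, 12)
print(round(p_value, 3))
```

      A weak correlation with a modest sample size gives a large p-value, which is the pattern described in the response above.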

      5) Figure 2

      ①A. The argument could be better made if the same concentration of quinidine were used for Slack and Slack + Nav1.6. It is recognized a greater sensitivity to quinidine is to be shown but as presented the figure is a bit confusing.

      We apologize for the confusion. We would like to clarify that the presented concentrations of quinidine were chosen to be near the IC50 values for Slack and Slack+NaV1.6.

      ②C. Can the authors add the effect of quinidine to the condition where the prepulse potential was - 90?

      We apologize for the confusion. We would like to clarify that the condition with a prepulse potential of -90 mV is the same as the condition in Fig. 1. We changed only one experimental condition: the prepulse potential was changed from -90 mV to -40 mV.

      6) Figure 3.

      ①line 80 should be coronal not coronary

      We thank the reviewer for catching this error. We have corrected the term “coronary” to “coronal” in the caption of Figure 3.

      ②A. Clarify these 6 panels.

      We thank the reviewer for raising this point. We have clarified the captions of Fig. 3A in the revised manuscript.

      ③Please enlarge fonts in D.

      We thank the reviewer for the suggestion. We’ve enlarged the fonts in Fig. 3D in the revised manuscript.

      ④F. The variances should be checked with a test to determine if they are significantly different because they look different - if so, data can be transformed and if transformed data have variances that are equivalent a t-test can be used on the transformed data. Otherwise, Mann-Whitney should be used.

      We thank the reviewer for pointing this out. We have reanalyzed the data in Fig. 3F using the Mann-Whitney test after finding that the variances of the two groups differ.
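      For readers unfamiliar with the test, a minimal sketch of a two-sample Mann-Whitney U calculation follows. It is not the analysis code used for Fig. 3F: the function names and the example values are hypothetical, and the p-value uses a normal approximation without tie or continuity correction, which is only a rough guide for small samples.

```python
import math

def mann_whitney_u(a, b):
    """Return (U_a, U_b): U_a counts pairs with a > b, ties counted as 0.5."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

def normal_approx_p(a, b):
    """Two-sided p-value from the normal approximation to U (no tie correction)."""
    n1, n2 = len(a), len(b)
    u = min(mann_whitney_u(a, b))
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical measurements for two groups.
group_1 = [1.0, 2.0, 3.0]
group_2 = [4.0, 5.0, 6.0]
print(mann_whitney_u(group_1, group_2))
print(normal_approx_p(group_1, group_2))
```

      Because the test works on ranks, it does not assume equal variances or normally distributed data, which is why it is a common fallback when a variance check fails.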

      7) Figure 7. The images need more clarity. They are very hard to see. Text is also hard to see.

      We apologize for the lack of clarity in the images and text. We would like to provide a concise summary of the key findings shown in this figure.

      Figure 7 illustrates an innovative intervention for treating SlackG269S-induced seizures in mice by disrupting the Slack-NaV1.6 interaction. Our results showed that blocking NaV1.6-mediated sodium influx significantly reduced Slack current amplitudes (Fig. 2D,G), suggesting that the Slack-NaV1.6 interaction contributes to the current amplitudes of epilepsy-related Slack mutant variants, aggravating the gain-of-function phenotype. Additionally, Slack’s C-terminus is involved in the Slack-NaV1.6 interaction (Fig. 5D). We hypothesized that overexpressing Slack’s C-terminus can disrupt the Slack-NaV1.6 interaction (by competing with Slack) and thereby counteract the increased current amplitudes of epilepsy-related Slack mutant variants.

      In HEK293 cells, overexpression of Slack’s C-terminus indeed significantly reduced the current amplitudes of epilepsy-related SlackG288S and SlackR398Q upon co-expression with NaV1.5/6NC (Fig. 7A,B). Subsequently, we evaluated this intervention in an in vivo epilepsy model by introducing the Slack G269S variant into C57BL/6N mice using AAV injection, mimicking the human Slack mutation G288S that we previously identified (Fig. 7C-G).

      ②It is not clear how data were obtained because injection of kainic acid does not lead to a convulsive seizure every 10 min for several hours, which is what appears to be shown. Individual seizures are just at the beginning and then they merge at the start of status epilepticus. After the onset of status epilepticus the animals twitch, have varied movements, sometime rear and fall, but there is not a return to normal behavior. Therefore one can not call them individual seizures. In some strains of mice, however, individual convulsive seizures do occur (even if the EEG shows status epilepticus is occurring) but there are rarely more than 5 over several hours and the graph has many more. Please explain.

      We apologize for the confusion. Regarding the data acquisition in relation to kainic acid injection, we started timing after intraperitoneal injection of kainic acid and recorded the seizure score of each mouse at ten-minute intervals, following the methodology described in previous studies4.

      4. Huang Z, Walker MC, Shah MM. Loss of dendritic HCN1 subunits enhances cortical excitability and epileptogenesis. J Neurosci. Sep 2 2009;29(35):10979-88. doi:10.1523/JNEUROSCI.1531-09.2009

      The seizure scores were determined using a modified Racine, Pinel, and Rovner scale5,6: (1) facial movements; (2) head nodding; (3) forelimb clonus; (4) dorsal extension (rearing); (5) loss of balance and falling; (6) repeated rearing and falling; (7) violent jumping and running; (8) stage 7 with periods of tonus; (9) death.

      5. Pinel JP, Rovner LI. Electrode placement and kindling-induced experimental epilepsy. Exp Neurol. Jan 15 1978;58(2):335-46. doi:10.1016/0014-4886(78)90145-0

      6. Racine RJ. Modification of seizure activity by electrical stimulation. II. Motor seizure. Electroencephalogr Clin Neurophysiol. Mar 1972;32(3):281-94. doi:10.1016/0013-4694(72)90177-0

      8) The graphical abstract is quite complicated and somewhat hard to follow. Please simplify and clarify. One aspect of the abstract to clarify is the direction of what is first and second and third (etc.) because arrows point to many directions.

      We thank the reviewer for raising this point. In the revised manuscript, we have numbered the three components of the graphical abstract:

      1. Pathological phenotype: Increased Slack currents.

      2. Two types of interventions:

      2a. Disruption of the Slack-NaV1.6 interaction.

      2b. NaV1.6-mediated sensitization of Slack to quinidine blockade.

      3. Therapeutic effects: Reduced Slack currents.

      Reviewer #3 (Recommendations For The Authors):

      1) A reference to homozygous knockout is made in the abstract; however, only heterozygous mice are mentioned in the methods section. The genotype of the mice needs to be made clear in the manuscript. Furthermore, at what age were these mice used in the study. Since homozygous knockout of NaV1.6 is lethal at a very young age (<4 wks), it would be important to clarify that point as well.

      We thank the reviewer for pointing this out. In the revised manuscript, we have included information about the source of the primary cortical neurons used in our study. These neurons were obtained from postnatal homozygous NaV1.6 knockout C3HeB/FeJ mice and their wild-type littermate controls.

      2) Coimmunoprecipitation studies in Fig. 3C are not convincing. There appears to be a signal in the control lane. Furthermore, it appears that brightness levels were adjusted of that image, thereby removing completely the background.

      We thank the reviewer for pointing this out. We have replaced Fig. 3C with an unadjusted version in the revised manuscript.

      3) In Fig. 1B, the authors indicate that 30 microM of quinidine was used, while the corresponding figure legend suggest that 1 microM. Please clarify.

      We apologize for this error. We have corrected the concentration value in the caption of Figure 1 from "1 μΜ" to "30 μΜ" in the revised manuscript.

      4) How long were the cells exposed to quinidine before the functional measurement were performed?

      We thank the reviewer for pointing this out. The cells were exposed to the bath solution with quinidine for about one minute before applying step pulses.

      5) In Fig. 6B-D, it is not clear to what extent co-expression of Slack mutants and NaV1.6 increases sodium-activated potassium current.

      We thank the reviewer for pointing this out. We notice that the current amplitudes of Slack mutants exhibit a considerable degree of variation, ranging from less than 1 nA to over 20 nA (n = 5-8). To accurately measure the effects of NaV1.6 on increasing current amplitudes of Slack mutants, we plan to apply tetrodotoxin in the bath solution to block NaV1.6 sodium currents upon coexpression of Slack mutants with NaV1.6.

      6) In Fig.7A and B, it appears that some recordings had no sodium-activated potassium currents. Why were these included in analysis? How was transfection efficacy assessed?

      We apologize for the confusion. We would like to clarify that all recordings included in analysis indeed exhibited outward sodium-activated potassium currents. The current density data in Fig. 7A-B are listed in Author response table 1 (in pA/pF):

      Author response table 1.

      Regarding the assessment of transfection efficacy, we estimated it approximately by using fluorescent proteins as reporters, which were co-expressed with the relevant proteins via the self-cleaving 2A peptide.

      7) Greater detail needs to be provided for the generation of NaV1.5 and NaV1.6 chimeras. Specifically, what AA residues were changed between sodium channel isoforms?

      We thank the reviewer for pointing this out. In the revised manuscript, we have included the specific amino acid residues that were changed between NaV1.5 and NaV1.6 to generate the chimeric constructs.

      8) In line 481, the authors refer to Fig. S2d instead of Fig. S6D. This should be corrected. Furthermore, the unusual shift in sodium current kinetics that the authors observe might be due in part to junction potential. Did the authors take that into consideration?

      We apologize for this error. The reference to "Fig. S2d" has been corrected to "Fig. S6D" in the revised manuscript.

      Regarding the unusual shift observed in the sodium current kinetics, we agree with the reviewer's suggestion that the junction potential may contribute to this phenomenon. During patch-clamp recordings, we ensured that the junction potential was properly compensated by the amplifier. Additionally, the replacement of CsF in the pipette solution may have contributed to the observed unusual shift, as CsF in the pipette solution has been reported to shift the voltage dependence of activation and fast/slow inactivation of NaV channels towards more negative potentials7.

      7. Korngreen A. Advanced patch-clamp analysis for neuroscientists. Neuromethods. Humana Press; 2016:xii, 350 pages.

      9) Legends for Fig.S6E and S6F are flipped. Please correct.

      We apologize for this error. We have rectified the flipped captions for figure S6E and S6F in the revised manuscript.

      10) Variance should be provided for the IC50 values and kinetic parameters of the sodium channels in the supplemental tables.

      We thank the reviewer for raising this point. We have included the 95% confidence interval (95%CI) for the IC50 values and kinetic parameters in the revised supplementary tables.

      Additionally, we have corrected some equations in the methods section:

      1. Line 500 and line 503: We have corrected equation (1) by adding the Hill coefficient as a parameter.

      2. Line 514: We have revised equation (4) from to
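      To make the role of a Hill coefficient concrete, a minimal sketch of a concentration-response relation follows. Equation (1) of the manuscript is not reproduced in this response, so the sketch assumes the commonly used form I/I0 = 1 / (1 + ([drug]/IC50)^h). The function name and all numerical values are hypothetical.

```python
def fraction_remaining(conc, ic50, hill):
    """Fraction of current remaining at a blocker concentration (assumed Hill form).

    conc and ic50 must be in the same units (for example micromolar);
    hill is the dimensionless Hill coefficient.
    """
    return 1.0 / (1.0 + (conc / ic50) ** hill)

# Hypothetical IC50 and Hill coefficient, chosen only for illustration.
ic50_um = 30.0
for hill in (1.0, 2.0):
    curve = [round(fraction_remaining(c, ic50_um, hill), 3) for c in (3.0, 30.0, 300.0)]
    print(hill, curve)
```

      At a concentration equal to the IC50 the remaining fraction is one half for any Hill coefficient; the coefficient changes only how steeply the curve falls around that point.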

    1. Author Response

      We thank the two reviewers and the reviewing editor for their positive evaluation of our manuscript. Especially, we appreciate the useful comments and suggestions on how the manuscript can be improved and which directions would be promising for future work on this topic. We would like to point out that we did consider the possibility that the plant enzymes produce ethylene in the same manner as EFE, but so far we did not obtain any evidence for such an activity (Supplementary Figure 3). We also performed some preliminary experiments with plants subjected to biotic stress, but the results suggested that neither defence responses nor pipecolate and proline biosynthesis depend to a significant extent on the 2-ODD-C23 enzymes. We plan to address these questions in more detail in further experiments. Depending on the outcome, we will either incorporate the results into a revised version of the present manuscript, or present them as follow-up studies. Concerning the possibility of testing all types of pathogens that affect expression of the 2-ODD-C23 genes, it is beyond our capacity and beyond the scope of the present manuscript. We hope, however, that such experiments can be the subject of a future research project in collaboration with experts in plant-pathogen interactions.

    1. Author Response

      Reviewer #1 (Public Review):

      • A summary of what the authors were trying to achieve.

      The authors cultured pre- and post-vaccine PBMCs with overlapping peptides encoding S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells expanded during the culture, including by scRNAseq, scTCRseq, and examination of reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with HLA restrictions (which by itself represents a major achievement) together with their subset, based on their transcriptional profiling. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand by vaccinations. Thus, the authors concluded that highly-responding S-reactive T cells were established by vaccination from rare clonotypes.

      • An account of the major strengths and weaknesses of the methods and results.

      Strengths

      • Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two shots of mRNA vaccinations.

      • Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.

      Weaknesses

      • Fig. 3 provides the epitopes, and the type of T cells, yet the composition of subsets per subject was not provided. It is possible that only one subject out of 4 sustainers expressed many Tfh clonotypes and explained the majority of Tfh clonotypes in the sustainer group. To exclude this possibility, the data on the composition of the T cell subset per subject (all 8 subjects) should be provided.

      We thank the reviewer for this comment. We will show the data in the revised manuscript.

      • S-specific T cells were obtained after a 10-day culture with peptides in the presence of multiple cytokines. This strategy tends to increase a background unrelated to S protein. Another shortcoming of this strategy is the selection of only T cells amenable to cell proliferation. This strategy will miss anergic or less-responsive T cells and thus create a bias in the assessment of S-reactive T cell subsets. This limitation should be described in the Discussion.

      We will describe the limitation and advantage of our strategy in the revised manuscript.

      • Fig. 5 shows the epitopes and the type of T cells present at baseline. Do they react to HCoV-derived peptides? I guess not, as it is not clearly described. If the authors have the data, it should be provided.

      We apologize for not mentioning it clearly. As we have confirmed the unresponsiveness using synthetic HCoV peptides, we will include these data in the revised manuscript.

      • As the authors discussed (L172), pre-existing S-reactive T cells were of low affinity. The raw flow data, as shown in Fig. S3, for pre-existing T cells may help discuss this aspect.

      We thank the reviewer for this helpful comment. We will add the discussion to the revised manuscript.

      Reviewer #3 (Public Review):

      Summary: The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S-protein-specific T cells, in people who receive SARS-CoV2 mRNA vaccines. To do this, the paper recruited a cohort of Covid-19 naive individuals who received the SARS-CoV2 mRNA vaccines and collected sera and PBMCs samples at different timepoints. Then they mainly generate three sets of data: 1) Anti-S protein antibody titers on all timepoints. 2) Single-cell RNAseq/TCRseq dataset for divided T cells after stimulation by S-protein for 10 days. 3) Corresponding epitopes for each expanded TCR clones. After analyzing these results, the paper reports two major findings and claims: A) Individuals having sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset, which suggests Tfh-polarization of S-specific T cells can be a marker to predict the longevity of anti-S antibody. B) S-reactive T cells do exist before the vaccination, but they seem to be unable to respond to Covid-19 vaccination properly.

      The paper's strength is it uses a very systemic and thorough strategy trying to dissect the relationship between antibody titers, T cell phenotypes, TCR clonotypes and corresponding epitopes, and indeed it reports several interesting findings about the relationship of Tfh/sustained antibody and about the S-reactive clones that exist before the vaccination. However, the main weakness is these interesting claims are not sufficiently supported by the evidence presented in this paper. I have the following major concerns:

      1) The biggest claim of the paper, which is the acquisition of S-specific Tfh clonotypes is associated with the longevity of anti-S antibodies, should be based on proper statistical analysis rather than just a UMAP as in Fig2 C, E, F. The paper only shows the pooled result, but it looks like most of the so-called Tfh cells come from a single donor #27. If separating each of the 4 decliners and sustainers and presenting their Tfh% in total CD4+ T cells respectively, will it statistically have a significant difference between those decliners and sustainers? I want to emphasize that solid scientific conclusions need to be drawn based on proper sample size and statistical analysis.

      We will carefully describe the interpretation of the data with statistical analysis in the revised manuscript.

      2) The paper does not provide any information to justify its cell annotation as presented in Fig 2B, 4A. Moreover, in my opinion, it is strange to see that there are two clusters of cells sit on both the left and right side of UMAP in Fig2B but both are annotated as CD4 Tcm and Tem. Also Tfh and Treg belong to a same cluster in Fig 2B but they should have very distinct transcriptomes and should be separated nicely. Therefore I believe the paper can be more convincing if it can present more information and discussion about the basis for its cell annotation.

      We apologize for the insufficient explanation and will describe how we performed cell annotation in the revised manuscript.

      3) Line 103-104, the paper claims that the Tfh cluster likely comes from cTfh cells. However considering the cells have been cultured/stimulated for 10 days, cTfh cells might lose all Tfh features after such culture. To my best knowledge there is no literature to support the notion that cTfh cells after stimulated in vitro for 10 days (also in the presence of IL2, IL7 and IL15), can still retain a Tfh phenotype after 10 days. It is possible that what actually happens is, instead of having more S-specific cTfh cells before the cell culture, the sustainers' PBMC can create an environment that favors the Tfh cell differentiation (such as express more pro-Tfh cytokines/co-stimulations). Thus after 10-days culture, there are more Tfh-like cells detected in the sustainers. The paper may need to include more evidence to support cTfh cells can retain Tfh features after 10-days' culture.

      We thank the reviewer for raising this important point. We will describe the limitation of the strategy. In addition, we will include some data in accordance with the reviewer’s recommendation.

      4) It is in my opinion inaccurate to use cell number in Fig4B to determine whether such clone expands or not, given that the cell number can be affected by many factors like the input number, the stimulation quality and the PBMC sample quality. A more proper analysis should be considered by calculating the relative abundance of each TCR clone in total CD4 T cells in each timepoint.

      We will also show the proportion of clonotypes in the revised manuscript.
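      The difference between raw cell numbers and relative abundance can be illustrated with a minimal sketch. This is not the analysis pipeline of the study: the function name, clonotype labels and cell counts are hypothetical, and each sample is assumed to be a list with one clonotype label per CD4+ T cell.

```python
from collections import Counter

def clonotype_fractions(cell_clonotypes):
    """Map each clonotype label to its fraction of all cells in one sample."""
    counts = Counter(cell_clonotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {clone: n / total for clone, n in counts.items()}

# Hypothetical samples: the post-vaccination sample simply contains more cells.
pre = ["clone_A"] * 2 + ["clone_B"] * 2
post = ["clone_A"] * 10 + ["clone_B"] * 30

pre_frac = clonotype_fractions(pre)
post_frac = clonotype_fractions(post)
for clone in sorted(pre_frac):
    print(clone, pre.count(clone), post.count(clone), pre_frac[clone], post_frac[clone])
```

      In this toy case the number of clone_A cells rises from 2 to 10 while its share of the sample falls, which is the kind of discrepancy that normalizing to total CD4+ T cells is meant to expose.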

      5) It is well-appreciated to express each TCR in cell line and to determine the epitopes. However, the author needs to make very sure that this analysis is performed correctly because a large body of conclusions of the paper are based on such epitope analysis. However, I notice something strange (maybe I am wrong) but for example, Table 4 donor #8 clonotype post_6 and _7, these two clonotypes have exactly the same TRAV5 and TRAJ5 usage. Because alpha chain don't have a D region, in theory these clonotypes, if have the same VJ usage, they should have the same alpha chain CDR3 sequences, however, in the table they have very different CDR3α aa sequences. I wish the author could double check their analysis and I apologize in advance if I raise such questions based on wrong knowledge.

      We thank the reviewer for carefully reading our manuscript. Although the two clonotypes, donor #8 clonotype post_6 and _7, have exactly the same TRAV5 and TRAJ5 usage, they have different CDR3α aa sequences due to random nucleotide addition during rearrangement. Likewise, donor #27 clonotype post_1 and donor #13 clonotype post_15 had the same TRAV9-2 and TRAJ17 usage but different CDR3α sequences.

    1. Author Response

      Reviewer #1 (Public Review):

      Drawing on insights from preceding studies, the researchers pinpointed mutations within the spag7 gene that correlate with metabolic aberrations in mice. The precise function of spag7 has not been fully described yet, thereby the primary objective of this investigation is to unravel its pivotal role in the development of obesity and metabolic disease in mice. First, they generated a mice model lacking spag7 and observed that KO mice exhibited diminished birth size, which subsequently progressed to manifest obesity and impaired glucose tolerance upon reaching adulthood. This behaviour was primarily attributed to a reduction in energy expenditure. In fact, KO animals demonstrated compromised exercise endurance and muscle functionality, stemming from a deterioration in mitochondrial activity. Intriguingly, none of these effects was observed when using a tamoxifen-induced KO mouse model, implying that Spag7's influence is predominantly confined to the embryonic developmental phase. Explorations within placental tissue unveiled that mice afflicted by Spag7 deficiency experienced placental insufficiency, likely due to aberrant development of the placental junctional zone, a phenomenon that could impede optimal nutrient conveyance to the developing fetus. Overall, the authors assert that Spag7 emerges as a crucial determinant orchestrating accurate embryogenesis and subsequent energy balance in the later stages of life.

      The study boasts several noteworthy strengths. Notably, it employs a combination of animal models and a thorough analysis of metabolic and exercise parameters, underscoring a meticulous approach. Furthermore, the investigation encompasses a comprehensive evaluation of fetal loss across distinct pregnancy stages, alongside a transcriptomic analysis of skeletal muscle, thereby imparting substantial value. However, a pivotal weakness of the study centres on its translational applicability. While the authors claim that "SPAG7 is well-conserved with 97% of the amino acid sequence being identical in humans and mice", the precise role of spag7 in the human context remains enigmatic. This limitation hampers a direct extrapolation of findings to human scenarios. Additionally, the study's elucidation of the molecular underpinnings behind the spag7-mediated anomalous development of the placental junction zone remains incomplete. Finally, the hypothesis positing a reduction in nutrient availability to the fetus, though intriguing, requires further substantiation, leaving an aspect of the mechanism unexplored.

      Hence, in order to fortify the solidity of their conclusions, these concerns necessitate meticulous attention and resolution in the forthcoming version of the manuscript. Upon the comprehensive addressing of these aspects, the study is poised to exert a substantial influence on the field, its significance reverberating significantly. The methodologies and data presented undoubtedly hold the potential to facilitate the community's deeper understanding of the ramifications stemming from disruptions during pregnancy, shedding light on their enduring impact on the metabolic well-being of subsequent generations.

      We thank this reviewer for their thoughtful analysis and commentary. Human mutations in SPAG7 are exceedingly rare (see the SPAG7 pLoF entry at genebass.org), potentially because of the deleterious effects of SPAG7 deficiency on prenatal development. This makes investigation into the causative effects of SPAG7 in humans challenging. There exist mutations in the SPAG7 region of the genome that are associated with BMI, but no direct coding variants within the spag7 gene itself have been studied.

      We agree with the reviewer that the precise role of spag7 in the placenta remains unknown. However, given its robust expression and high protein levels in the placenta, including in key cells, such as the syncytiotrophoblast (https://www.proteinatlas.org/ENSG00000091640-SPAG7/tissue/Placenta), it is highly likely that spag7 is critical for normal placenta development and function. Multiple studies (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9716072/) have recently shown that sperm associated RNAs play a critical role in embryonic and early placenta development. Our findings will provide the basis for future studies that can elucidate the role of spag7 in human placenta.

      Reviewer #2 (Public Review):

      Summary: The authors of this manuscript are interested in discovering and functionally characterizing genes that might cause obesity. To find such genes, they conducted a forward genetic screen in mice, selecting strains which displayed increased body weight and adiposity. They found a strain, with germ-line deficiency in the gene Spag7, which displayed significantly increased body weight, fat mass, and adipose depot sizes manifesting after the onset of adulthood (20 weeks). The mice also display decreased organ sizes, leading to decreased lean body mass. The increased adiposity was traced to decreased energy expenditure at both room temperature and thermoneutrality, correlating with decreased locomotor activity and muscle atrophy. Major metabolic abnormalities such as impaired glucose tolerance and insulin sensitivity also accompanied the phenotype. Unexpectedly, when the authors generated an inducible, whole body knockout mouse using a globally expressed Cre-ERT2 along with a globally floxed Spag7, and induced Spag7 knockout before the onset of obesity, none of the phenotypes seen in the original strain were recapitulated. The authors trace this discrepancy to the major effect of Spag7 being on placental development.

      Strengths: Strengths of the manuscript are its inherently unbiased approach, using a forward genetic screen to discover previously unknown genes linked to obesity phenotypes. Another strong aspect of the work was the generation of an independent, complementary, strain consisting of an inducible knockout model, in which the deficiency of the gene could be assessed in a more granular form. This approach enabled the discovery of Spag7 as a gene involved in the establishment of the mature placenta, which determines the metabolic fate of the offspring. Additional strengths include the extensive array of physiological parameters measured, which provided a deep understanding of the whole-body metabolic phenotype and pinpointed its likely origin to muscle energetic dysfunction.

      Weaknesses: Weaknesses that can be raised are the lack of molecular mechanistic understanding of the numerous phenotypic observations. For example, the specific role of Spag7 to promote placental development remains unclear. Also, the reason why placental developmental abnormalities lead to muscle dysfunction, and whether indeed the entire metabolic phenotype of the offspring can be attributed solely to decreased muscle energetics is not fully explored.

      Overall, the authors achieved a remarkable success in identifying genes associated with development of obesity and metabolic disease, discovering the role of Spag7 in placental development, and highlighting the fundamental role of in-utero development in setting future metabolic state of the offspring.

      We thank this reviewer for their thoughtful analysis and commentary. Significant effort has been made to understand the causes of the metabolic phenotypes observed in SPAG7-deficient mouse models. It is clear that hyperphagia is not the cause and the muscle energetics deficit is likely not the sole cause. We expect that decreased access to nutrition in utero will lead to widespread and varied metabolic adaptation.

      We agree with the reviewer that further work can be done to understand the molecular mechanism driving the metabolic phenotypes of SPAG7-deficient animals. We believe that a full investigation of the processes behind the developmental abnormalities is beyond the scope of this paper and is best addressed in a separate paper.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Flaherty III S.E. et al. identified the SPAG7 gene in a forward mutagenesis screen and created germline knockout and inducible knockout mice. The authors reported that the SPAG7 germline knockout mice had lower birth weight, likely due to intrauterine growth restriction and placental insufficiency. The SPAG7 KO mice later developed an obesity phenotype as a result of reduced energy expenditure. However, the inducible SPAG7 knockout mice had normal body weight and composition.

      Strengths:

      In this reviewer's opinion, this study has high significance in the field of metabolic research for the following reasons.

      (1) The authors' findings are significant in the field of obesity research, especially from the perspective of maternal-fetal medicine. The authors created and analyzed the SPAG7 KO mice and found that the KO mice had a "thrifty phenotype" and developed obesity.

      (2) SPAG7 gene function hasn't been thoroughly studied. The reported phenotype will fill the gap of knowledge.

      Overall, the authors have presented their results in a clear and logically organized structure, clearly stated the key question to be addressed, used appropriate methodology, and produced significant and innovative main findings.

      Weaknesses:

      The manuscript can be further strengthened with more clarification on the following points.

      1) The germline whole-body KO mice were female mice (Line293), however the inducible knockout mice were male mice (Line549). Sexual dimorphism is often observed in metabolic studies, therefore the metabolic phenotype of both female and male mice needs to be reported for the germline and inducible knockouts in order to make the justified conclusion.

      We thank the reviewer for their thoughtful analysis and commentary. All inducible KO animals described in the paper are female (the typo in Line 549 has been corrected). We did perform studies in both male and female animals for both of these lines. Males display similar metabolic phenotypes, though not as robustly as the females. A table summarizing key data from male and female germline KO animals and inducible KO animals has been included in Author response table 1.

      Author response table 1.

      2) SPAG7 has an NLS. Does this protein function in gene expression? Whether the overall metabolic phenotype is the direct cause of SPAG7 ablation is unclear. For example, the Hsd17b10 gene was downregulated in all tissues in the KO mice. Could this have been coincidentally selected for and thus be the cause of the developmental issues and adulthood obesity? Do the iSpag7 mice demonstrate reduced expression of Hsd17b10?

      SPAG7 contains an R3H domain, which is predicted to bind polynucleotides, and other proteins that contain R3H domains are known to bind RNA or ssDNA. The iSPAG7 mice do display decreased hsd17b10 expression (to a lesser degree than the germline KOs) in the tissues examined. When we knock down SPAG7 in specific tissues, we also see hsd17b10 expression decrease specifically in those tissues. These data all suggest that hsd17b10 expression is, at least, linked to spag7 expression. They also raise the question of why the iSPAG7 animals have no metabolic phenotype. Possible explanations are that hsd17b10 expression is essential only during early development, or that the smaller downregulation of hsd17b10 in the iSPAG7 mice is insufficient to produce the metabolic phenotypes seen in the germline KOs, where the downregulation is larger.

      3) Figure 2c should display the energy expenditure normalized to body weight (or lean body mass).

      How best to normalize total energy expenditure data is a subject of debate within the energy expenditure field. As the animals have increased body weight and decreased lean mass, normalizing to either will skew the results in different directions. We have included the data normalized to body weight and to lean mass in Author response image 1. The decrease in total energy expenditure remains significant in either scenario.

      Author response image 1.

      4) Please provide more information for the figure legend, including the statistical test that was conducted for each data set, animal numbers for each genotype and sexes.

      This information has been added to all figures.

      5) The authors should report how long after treatment the data was collected for figures 4F-M.

      Weeks after treatment have been added to the figure legends for Figures 4F-M.

      6) The authors should justify ending the data collection after 8 weeks for the iSPAG7 mice in Figures 4C-E. In the WT vs germline KO mice, there was no clear difference in body weight or lean mass at 15 weeks of age.

      Highly significant changes in fat mass, glucose tolerance and insulin sensitivity are already present in the germline SPAG7 KO mice at 15 weeks of age or earlier. Tamoxifen injection effectively induced SPAG7 gene KO in less than a week in the iSPAG7 KO mice. Given the absence of significant changes, or any trends towards significance, in glucose and insulin tolerance tests as well as other metabolic tests in the iSPAG7 KO mice at 15 weeks of age (the same age at which these changes were observed in the germline KO) and 8 weeks after SPAG7 gene KO, we did not anticipate seeing changes beyond this point and decided to stop the study at 9 weeks after treatment.

    1. Author Response

      Reviewer #1 (Public Review):

      Gambelli et al. provide a structural study of the SlaA/SlaB S-layer of the archaeon Sulfolobus acidocaldarius. S-layers form an essential component of most archaeal cell envelopes, where their self-assembling properties and activity as cell envelope support structures have raised substantial interest, both from researchers seeking to understand the fundamental biology of archaea, as well as researchers seeking to exploit the biomaterial properties of S-layers in biotechnological applications. Both interests are hampered by the paucity of structural information on archaeal S-layer assembly, structure, and function to date, in large part due to technical difficulties in their study.

      In this study, Gambelli and coworkers overcome these difficulties and report the high-resolution 3D cryoEM structures of the purified SlaA monomers at three different pH, as well as the medium resolution 3D cryoET structures of the SlaA/SlaB lattices determined from S-layer fragments isolated from the Sulfolobus cells.

      The structural work is generally well executed, although lacks in detail in places to allow a proper review, particularly in the cryoET. A further drawback of the current manuscript is that the structural work remains rather descriptive and speculative, with little validation of the proposed models.

      The authors run a plethora of representation, analyses, prediction, and simulation software on their structures resulting in an abundance of Figures that risk overloading the reader and in several cases bring little new insight beyond unsubstantiated speculation.

      We understand the reviewer’s concern about the number of figures presented in the manuscript. To avoid overloading the reader, we have further simplified the supplementary figures and provided additional context and explanations in the narrative of the manuscript to ensure that the reader can follow the data presented. We have also clarified the figure legends so that they explain the data more clearly. Additionally, we have taken extra care to connect each figure to the main findings, emphasising how each piece of data contributes to the overall understanding of the structures.

      We find it difficult to agree with the assertion of unsubstantiated speculation. We carefully justified our interpretation of our data, referring to well-established principles and relevant literature. Nevertheless, we have attempted to provide further context and clarification in the revised manuscript. Where appropriate, we have acknowledged the limitations of our analyses and have made sure to note where further research is needed to confirm our findings.

      The structural description of the S. acidocaldarius S-layer will be of high general interest and the authors have made a substantial leap forward, but the current manuscript would benefit from a better validation and basic atomic description of the SlaA/SlaB S-layer.

      Specific points.

      • It is not possible to review the quality of the SlaA and SlaA/SlaB models in the cryoET reconstruction. No detailed fits of the map and model are shown, and no correlation statistics are given (the latter is also true for the higher resolution 3D reconstructions at pH4, 7, and 10). To be of use to the community, the S-layer model and cryoET maps should also be deposited in PDB and EMDB, and an autodep report and ideally the cryoET maps should be available.

      Maps and models for the SlaA single-particle reconstructions at pH 4, 7 and 10 have now been released in the Protein Data Bank under the accession codes PDB-7ZCX, PDB-8AN3 and PDB-8AN2, and all validation statistics can be accessed there. We have also provided a standard cryoEM statistics table with the manuscript.

      We have also changed main Figures 4 and 5 to include more detail about the STA maps and models. We have deposited the sub-tomogram averaging map in the EMDB (EMD-18127) and models of the hexameric and trimeric pores in the Protein Data Bank under accession codes PDB-8QP0 and PDB-8QOX, respectively (to be released upon publication). We have also attached the map and models as supporting files to this rebuttal.

      • The authors spend a great deal on the MD simulation of the SlaA glycans and the description of the 'glycan shield' and its possible role in subunit electrostatics and intersubunit contacts. This does not result in testable hypotheses, however, and does not bring much more than vague speculation on the role of the glycans or the subunits contacts in S-layer assembly and stability.

      We propose that our glycan analysis does lead to a testable hypothesis, which could for example be tested by a future study involving the genetic or enzymatic ablation of glycosylation sites and the subsequent investigation of the structure and stability of the S-layer. We have included this statement in our manuscript to inspire future research in this direction.

      • For the primary description of the SlaA/B S-layer, more important would be a detailed atomic description and validation of the intermolecular contacts in the proposed lattice model. Given the low resolution of the cryoET, this would require MD simulation of the contacts. Lattice stability during MD simulation and/or the confirmation of lattice contacts by cross-linking mass spectrometry would go a great way in validating the proposed lattice model.

      We have improved our map and model by reprocessing our sub-tomogram averages (STA) using a different pipeline (Warp and M). We are now able to visualise more of SlaB, and the new map agrees with our Alphafold predictions of the SlaB trimer. The new map also clearly shows the interaction sites between SlaA and SlaB, as well as how SlaB integrates into the lipid bilayer. We have made new figures that now correlate the STA with the atomic model more clearly.

      Taking the reviewer’s suggestions on board, we have used Namdinator, a molecular dynamics-based flexible fitting tool, to refine our model. Due to RAM limitations, we had to split our model into two pdb files. The first contains 6 SlaA monomers delineating a hexameric pore, and the second contains 3 SlaB monomers and 5 SlaA monomers in the region of a trimeric pore. While the new models largely agree with the original, Namdinator did improve them. The IgG domains of SlaB now fill previously unoccupied areas of the map and any clashes have been removed. Notably, the way that SlaA is modelled is the only way in which the subunits can be reconciled with the map. This is especially true for the surface glycans, which in our model are excluded from all of the intermolecular interfaces and thus remain free to move around in the solvent. In any other SlaA configuration, there would be severe clashes between neighbouring polypeptide backbones, or between proteins and surface glycans, and the configuration would thus be sterically or entropically unfavourable.

      Unfortunately, full MD simulations of the entire S-layer array would necessitate the simulation of at least 36 SlaA monomers, including glycans, in addition to 9 SlaB monomers integrated into a membrane and solvent environment, implying >8 million atoms. Models of this size could only be simulated for very short times (on the order of no more than 100 nanoseconds). Such time scales would preclude the observation of major changes, even if the model was sub-optimally configured.

      • The discussion of the subunit electrostatics and the role they could play in subunit assembly/disassembly remains superficial and speculative. No real model or hypothesis is put forward, let alone validated.

      We have rephrased the discussion to clearly state our hypothesis regarding S-layer disassembly. It should now be clearer that, from our data, we deduce that S-layer disassembly at high pH is likely not driven by protein unfolding or pH-induced conformational change. We hypothesise that the pH-induced disassembly is instead likely caused by a weakening or abolishment of hydrogen bonds as the proton concentration is reduced.

      • The authors solve the cryoEM structure of SlaA released and purified from S. acidocaldarius S-layers by an alkaline pH shift. When shifted back to acidic pH, does this native material self-assemble in vitro? If not, do the authors have an explanation for this? Are components missing or could the solved structures represent SlaA conformations that are no longer assembly competent?

      We have previously shown that S. acidocaldarius S-layers disassembled by a pH shift from acidic to alkaline reassemble when the pH is shifted back to acidic. We also demonstrated that this disassembly / reassembly works with both SlaB present and absent, showing that SlaA alone can assemble into an S-layer (Gambelli et al, PNAS, 2019). This means that the SlaA protein that we imaged in this manuscript is indeed reassembly competent. We have included a sentence clarifying this in the first paragraph of the Results section and have discussed our hypothesis for the mechanism underlying assembly and disassembly in detail.

      Reviewer #2 (Public Review):

      Gambelli et al. investigated the surface layer (S-layer) of Sulfolobus acidocaldarius by using combined single particle cryo-electron microscopy (cryoEM), cryo-electron tomography (cryoET), and Alphafold2 predictions to generate an atomic model of this outermost cell envelope structure. As known from previous studies, the two-dimensional lattice comprises two distinct S-layer glycoproteins (SLPs) termed SlaA, the outer component interacting with the harsh living environment of this archaeon, and SlaB, comprising a dominant hydrophobic domain, which anchors this SLP in the cytoplasmic membrane, respectively. The interwoven S-layer lattice of S. acidocaldarius shows a hexagonal lattice symmetry with a p3 topography. It is built very complex as the unit cell constitutes of one SlaB trimer and three SlaA dimers (SlaB3/3SlaA2). Despite the complexity of this distinct proteinaceous S-layer lattice, the authors not only investigated the SLP structures but also considered the glycans in their structure predictions.

      The strengths of this study are that it was possible, and the first approach taken, to divide the Y-shaped SlaA SLP, starting from the N-terminus into six domains, D1 to D6. As previous studies revealed that SlaA assembly and disassembly are pH-sensitive processes, the structure of SlaA was investigated at different pH conditions. This approach led to the striking result that the cryoEM maps of SlaA D1 to D4 are virtually identical at the three pH conditions, demonstrating remarkable pH stability of these protein domains. For SlaA at low pH, however, the domains D5 and D6 were too flexible to be resolved in the cryoEM maps. Nevertheless, the authors were able to hypothesize that jackknife-like conformational changes of a link between domains D4 and D5, as well as pH-induced alterations in the surface charge of SlaA play important roles in S-layer assembly. This study showed in addition, that the surface charges of SlaA shift significantly from positive at acidic pH to negative at basic pH. A comparison of the surface charge between glycosylated and non-glycosylated SlaA showed that the glycans contribute considerably to the negative charge of the protein at higher pH values. This change in electrostatic surface potential may therefore be a key factor in disrupting protein-protein interactions within the S-layer, causing its disassembly as it is highly desired for new practical applications in biomolecular nanotechnology and synthetic biology. An excellent approach was to use exosomes to determine the structure of the entire S-layer structure comprising of SlaA and SlaB. By this approach, effectively two zones in the SlaA assembly could be distinguished: an outer zone constituted by D1 to D4, and one inner zone formed by D5 and D6. Moreover, for the first time, deeper insights into how SlaA forms the hexagonal and triangular pores within the S-layer lattice of S. acidocaldarius are provided. 
      The SlaA dimers found here are very interesting; they are suggested to be formed by two SlaA monomers through their D6 domains, with each SlaA dimer spanning two adjacent hexagonal pores.

      The weaknesses in this work are in the introduction, where the citation is incomplete. In the comparisons drawn between archaeal and bacterial S-layers, basic citations are missing for the latter. One gets the impression that there is a deliberate avoidance of citing individual prominent S-layer research groups here. The same is true for citations of glycosylation of archaeal S-layer proteins and Sulfolobus mutants lacking SlaB.

      We thank the reviewer for suggesting the inclusion of additional references. We would like to reassure the reviewer that we did not intend any deliberate omissions. Instead, we aimed to focus on archaeal S-layers and thus did not provide a detailed overview of bacterial S-layers. We have now incorporated more references on bacterial S-layers, hoping that this will provide a more balanced overview.

      The authors show many pictures and schematic drawings of high quality. In the main text, these illustrations should be briefly commented on if there is any ambiguity. For example, it is somewhat difficult to understand that in one schematic drawing the angle between the SlaA longitudinal axis and the membrane plane is 28 degrees and at the same time in another schema, the angle of the longitudinal axes in SlaA dimers is given as 160 degrees.

      We thank the reviewer for their appreciation of our figures. To clarify, the two angles mentioned are different. The 28-degree angle lies between the cytoplasmic membrane and the longitudinal axis of an SlaA monomer in the assembled S-layer. The 160-degree angle lies between the longitudinal axes of the two SlaA monomers forming a dimer.

      The authors argue that by a pH shift to 10, SlaA disassembles and exists exclusively as a single molecule. The presence of exclusively single SlaA proteins and the purity of the fractions were assessed by SDS/PAGE analysis and cryoEM micrographs. However, one can doubt that, due to the strong denaturing effect of SDS and the subsequent dissociation of protein complexes, SlaA dimers or oligomers could have been determined with SDS/PAGE.

      To clarify, we did not assess the assembly state of the S-layer by SDS PAGE, as we are aware that assembled S-layers would not travel into the gel. Instead, we assessed the assembly state by negative stain electron microscopy. Class averages of purified SlaA did not reveal any dimers or higher oligomers.

      Moreover, the shown representative micrographs (supplementary figure 2, a-c) show a heterogeneous structure and thus, do not support the exclusive presence of disassembled SlaA monomers.

      We are not sure what exactly the reviewer is referring to; only single SlaA particles are visible in supplementary figure 2, a-c. (new ) Larger, amorphous “blobs” in the panels are likely ethane contamination on the cryoEM grid.

      An interesting finding is SlaA dimerization. SlaA dimers can obviously be found in co-existence with SlaA-only S-layer as shown in supplementary figure 15. A short discussion on whether dimers are an intermediate structure in the process of S-layer lattice formation from monomeric SlaA or if this structure was just a coincident observation could help the reader to better understand the meaning of these dimeric structures and at which stage they are formed.

      We thank the reviewer for their suggestion and have added a brief statement to the discussion to clarify this point: “Their co-existence with assembled S-layer may indicate that SlaA dimers are an intermediate of S-layer assembly or disassembly.” The figure numbering has been updated, so supplementary figure 15 is now Figure 4-figure supplement 4.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Royall et al. builds on previous work in the mouse that indicates that neural progenitor cells (NPCs) undergo asymmetric inheritance of centrosomes and provides evidence that a similar process occurs in human NPCs, which was previously unknown.

      The authors use hESC-derived forebrain organoids and develop a novel recombination tag-induced genetic tool to birthdate and track the segregation of centrosomes in NPCs over multiple divisions. The thoughtful experiments yield data that are concise and well-controlled, and the data support the asymmetric segregation of centrosomes in NPCs. These data indicate that at least apical NPCs in humans undergo asymmetric centrosome inheritance. The authors attempt to disrupt the process and present some data that there may be differences in cell fate, but this conclusion would be better supported by a better assessment of the fate of these different NPCs (e.g. NPCs versus new neurons) and would support the conclusion that younger centriole is inherited by new neurons.

      We thank the reviewer for their supportive comments (“…thoughtful experiments yield data that are concise and well-controlled…”).

      Reviewer #2 (Public Review):

      Royall et al. examine the asymmetric inheritance of centrosomes during human brain development. In agreement with previous studies in mice, their data suggest that the older centrosome is inherited by the self-renewing daughter cell, whereas the younger centrosome is inherited by the differentiating daughter cell. The key importance of this study is to show that this phenomenon takes place during human brain development, which the authors achieved by utilizing forebrain organoids as a model system and applying the recombination-induced tag exchange (RITE) technology to birthdate and track the centrosomes.

      Overall, the study is well executed and brings new insights of general interest for cell and developmental biology with particular relevance to developmental neurobiology. The Discussion is excellent, it brings this study into the context of previous work and proposes very appealing suggestions on the evolutionary relevance and underlying mechanisms of the asymmetric inheritance of centrosomes. The main weakness of the study is that it tackles asymmetric inheritance only using fixed organoid samples. Although the authors developed a reasonable mode to assign the clonal relationships in their images, this study would be much stronger if the authors could apply time-lapse microscopy to show the asymmetric inheritance of centrosomes.

      We thank the reviewer for their constructive and supportive comments (“…the study is well executed and brings new insights of general interest for cell and developmental biology with particular relevance to developmental neurobiology….”). We understand the request for clonal data or dynamic analyses in organoids (e.g., using time-lapse microscopy), and we agree that such data would certainly strengthen our findings. However, as outlined above (please refer to point #1 of the editorial summary), this is unfortunately not currently feasible. We have explicitly discussed this shortcoming in our revised manuscript and explained why future studies with more advanced methodology will be needed to address it.

      Reviewer #3 (Public Review):

      In this manuscript, the authors report that human cortical radial glia asymmetrically segregates newly produced or old centrosomes after mitosis, depending on the fate of the daughter cell, similar to what was previously demonstrated for mouse neocortical radial glia (Wang et al. 2009). To do this, the authors develop a novel centrosome labelling strategy in human ESCs that allows recombination-dependent switching of tagged fluorescent reporters from old to newly produced centrosome protein, centriolin. The authors then generate human cortical organoids from these hESCs to show that radial glia in the ventricular zone retains older centrosomes whereas differentiated cells, i.e. neurons, inherit the newly produced centrosome after mitosis. The authors then knock down a critical regulator of asymmetric centrosome inheritance called Ninein, which leads to a randomization of this process, similar to what was observed in mouse cortical radial glia.

      A major strength of the study is the combined use of the centrosome labelling strategy with human cortical organoids to address an important biological question in human tissue. This study is similarly presented as the one performed in mice (Wang et al. 2009) and the existence of the asymmetric inheritance mechanism of centrosomes in another species grants strength to the main claim proposed by the authors. It is a well-written, concise article, and the experiments are well-designed. The authors achieve the aims they set out in the beginning, and this is one of the perfect examples of the right use of human cortical organoids to study an important phenomenon. However, there are some key controls that would elevate the main conclusions considerably.

      We thank the reviewer for their overall support of our findings (“..authors achieve the aims they set out in the beginning, and this is one of the perfect examples of the right use of human cortical organoids to study an important phenomenon…”). We also understand the reviewer’s request for additional experiments/controls that “…would elevate the main conclusions considerably.”

      1) The lack of clonal resolution or timelapse imaging makes it hard to assess whether the inheritance of centrosomes occurs as the authors claim. The authors show that there is an increase in newly made non-ventricular centrosomes at a population level but without labelling clones and demonstrating that a new or old centrosome is inherited asymmetrically in a dividing radial glia would grant additional credence to the central conclusion of the paper. These experiments will put away any doubt about the existence of this mechanism in human radial glia, especially if it is demonstrated using timelapse imaging. Additionally, knowing the proportions of symmetric vs asymmetrically dividing cells generating old/new centrosomes will provide important insights pertinent to the conclusions of the paper. Alternatively, the authors could soften their conclusions, especially for Fig 2.

      We understand the reviewer’s request. As outlined above (please refer to point #1 of the editorial summary), we had previously tried to add data from single-cell time-lapse imaging. However, due to the small size and therefore weakness of the fluorescent signal, we failed despite extensive efforts. Following the reviewer’s suggestion, we have now explicitly discussed this shortcoming and softened our conclusions.

      2) Some critical controls are missing. In Fig. 1B, there is a green dot that does not colocalize with Pericentrin. This is worrying and providing rigorous quantifications of the number of green and tdTom dots with Pericentrin would be very helpful to validate the labelling strategy. Quantifications would put these doubts to rest. Additionally, an example pericentrin staining with the GFP/TdTom signal in figure 4 would also give confidence to the reader. For figure 4, having a control for the retroviral infection is important. Although the authors show a convincing phenotype, the effect might be underestimated due to the incomplete infection of all the analyzed cells.

      We have included more rigorous quantifications in our revised manuscript.

      For Figure 1: There are indeed some green speckles that might be misinterpreted as green centrosomes. However, the speckles are usually smaller, and we exclude them by applying a strict size requirement. To check whether the classifier might interpret any speckles as centrosomes, we manually checked 60 green “dots” that were annotated as centrosomes. In these images, all green spots detected as centrosomes co-localized with Pericentrin signal (images shown in Author response image 1).

      For Figure 4: Because we compare cells infected with a retrovirus expressing either scrambled or Ninein-targeting shRNA, both groups experienced a similar treatment. In addition, only cells infected with the virus express Cre-ERT2, so only the centrosomes of targeted cells were analyzed. Accordingly, we only compare cells expressing scrambled or Ninein-targeting shRNA; surrounding “wt” cells are not considered.

      Author response image 1.

      Pictures used to test the classifier. Each of the green “dots” recognized by the classifier as a Centriolin-NeonGreen-containing centrosome (green) co-localized with Pericentrin signal (white).

      3) It would be helpful if the authors expand on the presence of old centrosomes in apical radial glia vs outer radial glia. Currently, in figure 3, the authors only focus on Sox2+ cells but this could be complemented with the inclusion of markers for outer radial glia and whether older centrosomes are also inherited by oRGCs. This would have important implications on whether symmetric/asymmetric division influences the segregation of new/old centrosomes.

      That is an interesting question, and we agree that additional analyses stratified by ventricular RGCs vs. oRGCs would be valuable. However, at the time points analysed there are only very few oRGCs present (if any) in human ESC-derived organoids (Qian et al., Cell, 2016). We have now added this point to our discussion as a direction for future experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      “In analyzing neural activity accompanying the behavioral persistence of the dominant sequence after a block change, the authors find that the ACC ensemble firing pattern is closer to the original dominant sequence pattern during reinforcement and less like this pattern during exploration… As time, and trials, progress the rat is approaching the point at which it explores another strategy. The authors find strengthened "prevalence" encoding with increasing sequence repetition, but if this parameter is related to behavioral change/flexibility, this was not clear to me. Might there be something unique about the last trials in a tail "predicting" an upcoming switch? Can the authors please expand? Relatedly, if the prediction of upcoming behavioral change is not observed in the neural activity from sequence steps 2-6, it is notable that these are the steps 'within' the sequence, that leaves out the initiation (first center poke) and termination (reward/reward omission). Thus one could imagine this information is "missed" in the current analysis given that both the reward period and the initiation of a trial at the center are not analyzed. This does lead me to suggest a softening of some claims made of identifying "unifying principles" of ACC function, as the authors state, based on the analyses included in the current report, since the neural activity related to the full unit of behavior is not considered. (I appreciate the motivation behind this focus on within-sequence behavior - the wish to compare time periods with similar movement parameters.)”

      We apologize for the confusion; while the sequence prevalence itself tends to be high for ‘dominant tails’, we do not claim that the fit of the prevalence model is better at those sequence instances. We do share the interest in linking prevalence encoding to behavioral adaptation as well as the Reviewer’s intuition that block transitions should be among the epochs where strategy prevalence is tracked particularly well. Indeed, we spent a considerable amount of time thinking about whether we could identify and interpret periods during the session where our prevalence model fits better or worse. Two arguments convinced us to abandon that direction: a technical one and a conceptual one. The technical argument is that when the explanatory power of a variable is limited, regression residuals are proportional to the variable itself. Thus, any meaningful comparison of the model’s fit would have had to be done for periods where strategy prevalence is within a similar range. The conceptual argument is even more disarming: imagine we do identify a putative session epoch where the model fits worse. While it is possible that this truly means that the animal tracks less closely how much it has pursued this strategy in the recent past, it is equally possible that we were simply off in selecting the specific window over which the prevalence signal is estimated, the exact behavioral statistic tracked, or the exact form of the dependence between that statistic and neural activity. We certainly do see changes leading up to behavioral switches at block transitions – something we plan to elaborate on in a subsequent paper – but whether those are related to prevalence tracking is something we believe is hard to crack.

    1. Author Response

      Reviewer 1 (Public Review):

      Weakness: Although the cross-links stimulate ATP hydrolysis, further controls are needed to convince me that the TM1 conformations observed in the structures are physiologically relevant, since they have been trapped by "large" substrates covalently-tethered by crosslinks.

      Reviewer 1 raised concerns about the relatively large size of our covalently attached AAC substrate that would potentially distort TM1 in Pgp. We would like to clarify that AAC has a molecular weight of 462 Da, which, in comparison to many known Pgp substrates ranging from 250 to over 1,000 Da, is not a large compound. For instance, the few other Pgp substrates mentioned in our manuscript all have a comparable or larger size: verapamil, 455 Da; doxorubicin, 544 Da; FK506, 804 Da; valinomycin, 1,111 Da; cyclosporin A, 1,203 Da.

      Furthermore, AAC was strategically attached to a site distant from TM1 in the inward-facing Pgp conformation. After it was exported to the outward-facing state, several TM helices accommodate the compound. The observation that only TM1 exhibited significant conformational changes suggests its potential role in the transport mechanism. This hypothesis is supported by our findings, where a conservative substitution (G72A) in TM1 resulted in a dramatic loss of transport function for various drug substrates and impaired verapamil-stimulated ATPase activity.

      Reviewer 1 (Recommendations for the Authors):

      I understand the need for an unconventional approach to understanding the translocation pathway. What would help to support this model is to cross-link a much smaller substrate, as the one used is quite large and could potentially distort TM1 in the outward-state when cross-linked.

      We thank the reviewer for this recommendation, and we have outlined plans for future experiments involving other substrates, including smaller ones, to further investigate our proposed model. However, it is important to acknowledge that conducting these studies will require a significant amount of effort and resources, which we believe extend beyond the scope of our current manuscript.

      In unbiased MD simulations starting from the IF state are there any simulations where the substrate follows the same path as proposed here?

      All our MD simulations were performed in the outward-facing state to focus on potential substrate release pathways. Starting MD simulations from the inward-facing state would introduce complexities in capturing the necessary domain motions and the nucleotide binding and hydrolysis required for substrate translocation. Therefore, we opted not to perform MD studies starting from the inward-facing state.

      Reviewer 2 (Public Review):

      Weakness: There is much to like about the experimental work here but I am less sanguine on the interpretation. The main idea is to covalently link via disulfide bonds a model tripeptide substrate under different conditions that mimic transport and then image the resulting conformations. The choice of the Pgp cysteine mutants here is critical but also poses questions regarding the interpretation. What seems to be missing, or not reported, is a series of control experiments for further cysteine mutations.

      Reviewer 2 raised concerns about the interpretation of our results and suggested the need for additional mutant designs to validate our proposed TM1 mechanism. Firstly, we believe that the observed TM1 conformational changes are valid in our cryoEM structures, despite the use of different conditions and several mutants to capture Pgp in the outward-facing state.

      Regarding the G72A mutant, we consider it conclusive that this single point mutation in TM1 has a profound effect. Importantly, the G72A mutant was readily expressed and purifiable as a stable protein. We were able to resolve a high-resolution structure of the G72A mutant (without the substrate), confirming that the protein is not generally destabilized but properly folded.

      Above all, we appreciate the Reviewer’s suggestion to explore additional mutations and intend to do so in future studies.

      Reviewer 2 (Recommendations for the Authors):

      I am sold on the results regarding TM1 conformational changes as they are evident in the cryoEM structures. However, the set of states compared between mutants are not biochemically equivalent: for 335 and 978 they used an ATP-impaired Pgp whereas for 971 they used what appears to be WT, and the conformation was imaged presumably subsequent to ATP hydrolysis and Vanadate trapping. This is significant if the authors were unable to trap the OF in the impaired mutant background and should be highlighted. I have to believe that they tried that condition but I could be wrong.

      We acknowledge the point made by the Reviewer about the biochemical equivalence of mutant states and the potential significance of using an ATP-impaired mutant for trapping the outward-facing conformation of 971. We have not yet attempted to use the ATPase-deficient 971C mutant for crosslinking and intend to address this question in future studies.

      In our current approach, we used the ATPase-active 971C for two specific reasons:

      1) Our biochemistry data, as shown in Fig 1C, indicates that 971C only crosslinks in the presence of ATP hydrolysis conditions. Vanadate trapping was employed to stabilize the outward-facing conformation.

      2) Based on our experience, the conformations of ATP-bound (mutant) and vanadate-trapped states of an ABC transporter are structurally equivalent at the resolution level of our study (see ref. 21: Hoffmann et al., Nature, 2019).

      The authors propose a new model for substrate translocation. It is based on three mutants and a number of structures. If the authors were not challenging the current dogma I would not have written the next comment. Considering the impact of the findings, I would have designed a couple more cysteine mutants based on their model. For instance, this pathway has a number of stabilizing interactions, can't they make a mutant that preserves conformational switching but eliminates substrate translocation? I like the G97A mutant result but I am worried that the effect could just be a general destabilization or misfolding as part of the cryoEM particles seem to suggest. The authors advance one interpretation of the disorder observed in this mutant but it could easily be my interpretation.

      We thank the reviewer for the suggestion to design additional mutants to further validate our proposed model for substrate translocation. We agree that this would be highly valuable, considering the potential impact of our findings. However, given the time-intensive nature of our approach, we believe that presenting these additional designs in a future study is a reasonable course of action.

      Regarding the G72A mutation, we believe that our current data fully supports our model and the role of TM1 in regulating the Pgp activity. Importantly, we would like to emphasize that the G72A mutant was readily expressed and purifiable as a stable protein. Additionally, our cryoEM structural determination of the G72A mutant at high resolution confirmed that the protein is not generally destabilized but properly folded.

      There are a couple of troubling methodological questions that I want the authors to address or clarify:

      1- In the methods they report that the final sample for cryoEM was prepared on a SEC devoid of detergent. It is obvious that the sample was folded but I was wondering why the detergent was removed? Was that critical for observing these structures with multiple ligands? Did they observe any lipids in their cryoEM?

      We omit detergent from the buffer in the final SEC purification step. This removes free detergent from the background, which helps during cryoEM imaging. Of course, this cannot be done with every detergent, but it is possible with LMNG because of its very low CMC. We have since verified this method for several other transporters with the same success. While this procedure helps us to obtain better images, it is not necessary for obtaining specific conformations or ligand-bound states, nor does it affect these states or conformations.

      In our cryoEM structures, we did observe multiple cholesterol hemisuccinate (CHS) molecules on the outer transmembrane surface of Pgp.

      2- Can the authors comment on why labeling was carried out in the presence of ATP? Does it matter if the substrate was added prior to ATP and incubated for a few minutes?

      For every dataset, we first added the substrate to be cross-linked and afterwards added the ATP. In the cases of 335C and 978C, labeling was successful before ATP was added, as evidenced by the inward-facing structures with cross-linked substrate.

      However, for 971C, cross-linking only occurred after the addition of ATP. We interpret these data to suggest that the 971 site is inaccessible to the substrate in the inward-facing state, and that cross-linking can only occur after the transporter transitions to the outward-facing state. This is in line with our inward-facing structure, which does not show a cross-linked substrate, and with our biochemical data shown in Fig 1C, where 971C only crosslinked in the presence of ATP.

      3- I am not an expert on MD simulations and I understand that carrying out simulations at higher temperatures used to be a trick to accelerate the process. Is this still necessary? Why didn't the author use approaches such as WESTPA?

      Most so-called enhanced sampling methods, including WESTPA, explicitly define a reaction coordinate for the process of interest, usually based on intuition or prior studies. If this coordinate is chosen poorly, enhanced sampling usually fails, either because the sampling becomes inefficient or because the sampling biases the transition pathway (or both). Lacking reliable intuition or prior knowledge on which motions would result in substrate release, we chose temperature to speed up the process. High temperature largely avoids the introduction of any bias through the definition of a progress coordinate. By contrast, the weighted ensemble method underlying WESTPA is a great method to simulate unbiased dynamics of a process with a known progress coordinate, but it unfortunately requires choosing a progress coordinate prior to the simulation and will then mostly sample the process along this coordinate, because this is the only direction in which sampling is improved. High-temperature MD, on the other hand, accelerates all processes in the system under study. Indeed, we have now confirmed that the pathway found at high temperature is also feasible at near-ambient conditions.

      In new simulations, we have now observed a similar release pathway at T=330 K. The only difference is that the substrate has not fully dissociated from the protein after 2.5 µs, with weak interactions persisting at the top part of TM1 from the extracellular side. Importantly, this configuration is also observed in higher-temperature simulations, but with a much shorter lifetime.

      In response, we will include these new findings in the revised manuscript.

      4- One way to show that the two substrates binding mode is biochemically relevant is to measure Vmax at different substrate concentrations. One would expect a cooperative transition if that interaction is mechanistically important.

      We have measured Vmax as a function of QZ-Ala concentration in a previous report (ref. 24), supporting positive cooperativity for binding to two sites.

      Reviewer 3 (Public Review and Recommendations for the Authors):

      We thank Reviewer 3 for recommending the acceptance of our manuscript as is. We will address all minor comments from Reviewer 3 in the revised manuscript.

    1. Author Response

      We appreciate the insightful feedback provided by the editors and reviewers, who have recognized the novelty of our study. We have mapped the spatial distribution of six endogenous somatic histone H1 variants within the nuclei of several human cell lines using specific antibodies; the results strongly suggest functional differences between variants. We will submit a revised version of the manuscript to accommodate the reviewers’ comments.

      To answer the reviewers’ comments at this stage:

      1. We have investigated co-localization of H1 variants with HP1 proteins, and we are eager to add some of these data to a revised version of this manuscript.

      2. With respect to the functional significance of the results presented here, we want to stress that, as a consequence of the differential distribution and abundance of H1 variants among cell types, depletion of different variants has different consequences. For example, depletion of H1.2, but not of other variants, has a great impact on chromatin compaction. In addition, cell lines lacking H1.3/H1.5 expression present a basal up-regulation of some interferon-stimulated genes (ISGs) and particular repetitive elements, as previously described upon induced depletion of H1.2/H1.4 in a breast cancer cell line or in pancreatic adenocarcinomas with lower levels of replication-dependent H1 variants (Izquierdo et al. 2017 NAR 45:11622). Our results thus reinforce the existing link between H1 content and immune signature. We are eager to add these data to a revised version of this manuscript. Moreover, we also analyzed the chromatin structural changes upon combined depletion of H1.2 and H1.4. Combined H1.2/H1.4 depletion triggers a global chromatin decompaction, which supports previous observations from ATAC-Seq and Hi-C experiments in these cells (Izquierdo et al. 2017 NAR 45:11622; Serna-Pujol et al. 2022 NAR 50:3892). Although H1 content is more compromised in these cells (30% total H1 reduction) compared to single H1 KDs, the phenotype observed could not be recapitulated when other H1 KD combinations, in which total H1 content was reduced similarly, were investigated (Izquierdo et al. 2017 NAR 45:11622), supporting that the deleterious defects were due to the non-redundant role of H1.2 and H1.4 proteins. Indeed, this manuscript supports this notion, as H1.2 and H1.4 show a different genome-wide and nuclear distribution.

      3. We totally agree with the reviewers that the use of commercially available antibodies does not guarantee their quality and specificity. As this issue was crucial for our studies, we extensively assayed the performance and specificity of the antibodies using different approaches. The validations were shown in our previous publications, where these antibodies were successfully used for ChIP-seq (Serna-Pujol et al. 2022 NAR 50:3892; Salinas-Pena et al, under revision). In summary, performance of H1.0 (05-629l, Millipore), H1.2 (ab4086, abcam), H1.4 (702876; Invitrogen), H1.5 (711912, Invitrogen) and H1X (ab31972; abcam) antibodies was tested by Western blot, ChIP and proteomic analyses (all the results are included in Supplementary Figure 1 in Serna-Pujol et al. 2022 NAR 50:3892). Specifically, we tested specificity using inducible KDs for the depletion of each of the somatic H1 variants in T47D. We also checked that the antibodies did not recognize additional H1 variants using recombinant proteins or cell lines naturally lacking some of the variants. All the experiments confirmed that the antibodies were variant-specific. In addition, when the corresponding epitope was absent, the antibodies did not gain new cross-reactivity with other variants. More recently, the specificity of the H1.3 antibody (ab203948) was validated following the same experimental approaches described for the rest of the antibodies (Salinas-Pena et al, under revision).

      4. Our immunofluorescence data, together with ChIP-seq data, do not rule out binding of H1 variants to a great variety of chromatin types, but show enrichment or preferential binding to certain regions or chromatin types. Our data on interphase nuclei do not suggest any type of quenching or saturation. Detection with antibodies does depend on epitope accessibility, as in all immunofluorescence studies, and we have acknowledged that post-translational modifications of H1 may occlude antibody accessibility, as some phospho-H1 antibodies give distribution patterns different from those of total/unmodified H1 antibodies. Thus, we cannot exclude that specific modified H1s exhibit particular distribution patterns that are not recapitulated in our data. This represents another layer of complexity in H1 diversity, and we agree that the repertoire of H1 PTMs and their functional roles is an interesting matter of study that needs to be addressed. Still, our data are highly relevant, as they demonstrate for the first time the unique distribution patterns of H1 variants among multiple cell lines without relying on overexpression of tagged H1 variants, which in our experience produces mislocalization of H1s.

      5. We will further explain how the relative quantification of H1 variants in different cell lines was performed, if it is not clear enough. We agree that more sophisticated mass spectrometry-based quantification is desirable, and we are collaborating to do this using internal H1 peptide controls, but this is beyond the scope of this manuscript, as the observed distribution patterns of H1 variants do not depend on mild differences in variant abundance. Only the absence of H1.3 and H1.5 in some cell lines alters the distribution of other variants.

      6. We have also studied the spatial distribution of H1 variants in non-tumorigenic cell lines, and we are eager to add this to a revised version of the manuscript.

    1. Author Response

      We thank the Editors and Reviewers for the thorough assessment of our work. We are pleased that you agree with us that our proof-of-concept study of the ATUM Tomo technology advances volume electron microscopy and has the potential to solve research questions in diverse biological areas. Based on your comments, we are planning to revise the manuscript to optimize readability, clarify the fields of applicability of our approach more, and add some data related to questions you raised. We plan the following revisions:

      Reviewer #1 The authors may consider moving the supplemental figures into the main body of the paper since they finally would end up with a total of eight figures.

      As part of the supplemental figures describe essential experimental details, we will move them into the main part of the manuscript.

      Reviewer #1 In general, the methods and techniques used here are beside some required but important additions described in sufficient detail.

      Reviewer #2 Given the identified importance of glow-discharge treatment of precoated tape to the flat deposition of sections during ATUM, a corresponding schematic or appropriate reference(s) providing more information about the custom-built tape plasma device would likely be a prerequisite for effective reproduction of this technique in other laboratories.

      Thank you for the valuable comments on the missing experimental details, which could affect the ease of establishing ATUM-Tomo in other labs. We will clearly highlight the ATUM-Tomo-specific vs. some general EM processing steps of the workflow in the proposed way. A detailed description of the custom-built tape plasma device will be added to the methods section. In addition, we will reference more explicitly our published protocols, which describe the standard electron microscopy embedding steps in great detail (Kislinger et al., STAR protocols, 2020; Kislinger et al., Meth Cell Biol, 2023).

      Reviewer #1 Concerning the results section: In my opinion, the results section is a bit unbalanced. There is a mismatch between the detailed description of the methodology (experimental approach) and the scientific findings of the paper. The reviewer can see the enormous methodological impact of the paper, which on the other hand is the major drawback of the paper. To my opinion, the authors should also give a more detailed description of their scientific results.

      Concerning the discussion: It would have been nice to give a perspective to which the described methodology can be used not only to describe diverse biological aspects that can be addressed and answered by this experimental approach. For example, how could this method be used to address various questions about the normal and pathologically altered brain?

      In my opinion, the paper has one major drawback which is that it is more methodologically based although the authors included a scientific application of the method. The question here is to balance the methodology vs. the scientific achievement of this paper, a decision hard to take. In other words, one could recommend this paper to more methodologically based journals, for example, Nature Methods.

      Balancing the technological and biological parts is indeed a difficult issue. We agree that this manuscript mainly describes a technical advancement and demonstrates its power to answer previously unsolved scientific questions. We exemplify this in our model system, neuropathology of the blood-brain barrier. The biological impact of ATUM-SEM has been described in detail in Khalin et al., Small, 2022, and is referenced accordingly. Here we describe how ATUM-Tomo can be applied to reveal biological insights exceeding the capabilities of ATUM-SEM and other volume electron microscopy techniques. However, the description of the methodological development far outweighs that of the biological details. We consider eLife’s Tools and Resources (which, in our view, is similar in scope to Nat Methods) an ideal format for this technically focused manuscript while targeting eLife’s readership with diverse biological fields of interest for potential applications of the method. We will add more suggestions for possible applications to the discussion to accommodate the Reviewer’s concern that having only a single application might seem arbitrary or even suggest a very narrow utility of the technique.

      Reviewer #2 Is the separation of sections from permanent marker-treated tape sensitive to the time interval between deposition/SEM imaging and acetone treatment?

      Thank you for pointing out this important methodological aspect. We have not systematically investigated whether there is a critical time window between microtomy, SEM, and detachment. From the samples generated for this study, we will try to assess the importance of timing in retrospect.

      Reviewer #2 To what extent is slice detachment from permanent marker-treated tape resin-dependent [i.e. has ATUM-Tomo been tested on resin compositions beyond LX112 (LADD)]?

      We appreciate this comment addressing the broader technical applicability of ATUM-Tomo. We aim to test the general workflow with tissue embedded in other commonly used resin types.

      Reviewer #2 Minor corrections to the text and figures.

      Thank you for the detailed corrections. We will apply them accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Sun and co-authors have determined the crystal structures of EHEP with/without phlorotannin analog, TNA, and akuBGL. Using the akuBGL apo structure, they also constructed model structures of akuBGL with phlorotannins (inhibitor) and laminarins (substrate) by docking calculation. They clearly showed the effects of TNA on akuBGL activity with/without EHEP and resolubilization of the EHEP-phlorotannin (eckol) precipitate under alkaline conditions (pH >8). Based on this knowledge, they propose the molecular mechanism of the akuBGL- phlorotannin/laminarin-EHEP system at the atomic level. Their proposed mechanism is useful for further understanding of the defensive-offensive association between algae and herbivores. However, there are several concerns, especially about structural information, that authors should address.

      Thank you for reviewing our manuscript. We addressed all comments below.

      1) TNA binding to EHEP

      The electron densities could not show the exact conformations of the five gallic acids of TNA, as the authors mentioned in the manuscript. On the other hand, the authors describe and discuss the detailed interaction between EHEP and TNA based on structural information. The above seems contradictory. In addition, the orientation of TNA, especially the core part, in Fig. 4 and PDB (8IN6) coordinates seem inconsistent. The authors should redraw Fig. 4 and revise the description accordingly to be slightly more qualitative.

      We apologize for the mistake with the PDB file. We forgot to re-upload the final coordinate file of 8IN6, which had been modified according to the PDB requirements. We have now re-uploaded the correct PDB file. We carefully checked Fig. 4 (Fig. 3 in the revised version), which used the final coordinate file of 8IN6.

      2) Two domains of akuBGL

      The authors concluded that only the GH1D2 domain affects its catalytic activity from a detailed structural comparison and the activity of recombinant GH1D1. That conclusion is probably reasonable. However, the recombinant GH1D2 (or GH1D1+GH1D2) and inactive mutants are essential to reliably substantiate conclusions. The authors failed to overexpress recombinant GH1D2 using the E. coli expression system. Have the authors tried GH1D1+GH1D2 expression and/or other expression systems?

      Following the precedent of other BGLs (six were expressed in E. coli and one in Pichia), we attempted the overexpression of akuBGL, GH1D1, GH1D2, and GH1D1+GH1D2 only in the E. coli expression system, using several different vectors. As the reviewer mentioned, inactive mutants are essential to substantiate our conclusion reliably, so we will try yeast or other cell-based expression systems in future work to confirm it. We added this limitation as “Future assay of GH1D2 and inactive mutants is the complement to validate the molecular mechanism of akuBGL” in the discussion (Line 343-345).

      3) Inhibitor binding of akuBGL

      The authors constructed the docking structure of GH1D2 with TNA, phloroglucinol, and eckol because they could not determine complex structures by crystallography. The molecular weight of akuBGL would also allow structure determination by cryo-EM, but have the authors tried it? In addition, the authors describe and discuss the detailed interaction between GH1D2 and TNA/phloroglucinol/eckol based on docking structures. The authors should describe the accuracy of the docking structures in more detail, or in more qualitative terms if difficult.

      Yes, it is possible to try cryo-EM to obtain the structure of akuBGL complexed with the ligand. However, we did not try it because the 110 kDa akuBGL consists of two 55 kDa GH1Ds linked by a long loop, and we were concerned that the ligand might not be visualized using cryo-EM.

      Following the comment, we added the description of the accuracy of the docking structures as “Those docking scores corroborated well with the inhibition activity toward akuBGL, that TNA had a more robust inhibition activity than phloroglucinol, indicating that the docking results are reasonable.” (Line 322-324)

      Reviewer #2 (Public Review):

      In this study the authors try to understand the interaction of a 110 kDa β-glucosidase from the mollusk Aplysia kurodai, named akuBGL, with its substrate, laminarin, the main storage polysaccharide in brown algae. On the other hand, brown algae produce phlorotannin, a secondary metabolite that inhibits akuBGL. The authors study the interaction of phlorotannin with the protein EHEP, which protects akuBGL from phlorotannin by sequestering it in an insoluble complex.

      The strongest aspect of this study is the outstanding crystallographic structures they obtained, including akuBGL (TNA soaked crystal) structure at 2.7 Å resolution, EHEP structure at 1.15 Å resolution, EHEP-TNA complex at 1.9 Å resolution, and phloroglucinol soaked EHEP structure at 1.4 Å resolution. EHEP structure is a new protein fold, constituting the major contribution of the study.

      We thank you for reviewing our manuscript.

      The drawback on EHEP structure is that protein purification, crystallization, phasing and initial model building were published somewhere else by the authors, so this structure is incremental research and not new.

      We have published the results of protein purification, crystallization, phasing, and initial model building, but not the structure itself, because further structural refinement was indispensable. The data published in [Acta F] were a preparatory step toward obtaining the structure.

      We believe that the structure of EHEP holds great importance, and this is the first time it has been published.

      Most of the conclusions are derived from the analysis of the crystallographic structures. Some of them are supported by other experimental data, but remain incomplete. The impossibility to obtain recombinant samples, implying that no mutants can be tested, makes it difficult to confirm some of the claims, especially about the substrate binding and the function of the two GH1Ds from akuBGL.

      As mentioned by the reviewer, mutant analysis would be the best way to substantiate our conclusions. However, obtaining recombinant samples proved challenging, although we tried to overexpress them (akuBGL, GH1D1, GH1D2, and GH1D1+GH1D2). We therefore used structural comparison and docking simulation to propose the molecular mechanism. We added these limitations as “Further assay of GH1D2 and inactive mutants is the complement to validate the molecular mechanism of akuBGL” in the discussion part (Line 343-345).

      The authors hypothesize from their structure that the interaction of EHEP with phlorotannins might be pH dependent. Then they succeed to confirm their hypothesis, showing they can recover EHEP from precipitates at alkaline pH, and that the recovered EHEP can be reutilized.

      A weakness in the model is raised by the fact that the stoichiometry of the complex EHEP:TNA is proposed to be 1:1, but in Figure 1 they show that 4 µM of EHEP protects akuBGL from 40 µM TNA, meaning EHEP sequesters more TNA than expected, this should be addressed in the manuscript.

      The assay experiment in Figure 1 does not directly provide the stoichiometric ratio of EHEP:TNA because the activity assay system consists of the akuBGL substrate, akuBGL, TNA, and EHEP, and therefore involves multiple equilibration processes: akuBGL⇋substrate, akuBGL⇋TNA, and EHEP⇋TNA. To avoid misunderstanding, we added the description ″As this activity assay system involves multiple equilibration processes: akuBGL⇋substrate, akuBGL⇋TNA, and EHEP ⇋TNA.″ (Line 120-121).

      The authors study the interaction of akuBGL with different ligands using docking. This technique is good for understanding the possible interaction between the two molecules but should not be used as evidence of binding affinity. This implies that the claims about the different binding affinities between laminarin and the inhibitors should be taken out of the preprint.

      Following the suggestion, we deleted the descriptions of binding-affinity differences based on docking scores from the last paragraph of [Inhibitor binding of akuBGL].

      In the discussion section there is a mistake in the text that contradicts the results. It is written "EHEP-TNA could not dissolve in the buffer of pH > 8.0" but the result obtained is the opposite, the precipitate dissolved at alkaline pH.

      We apologize for this mistake and corrected it to " EHEP–TNA could dissolve in the buffer of pH > 8.0." (Line 394).

      Solving a new protein fold, as the authors report for EHEP, is relevant to the community because it contributes to the understanding of protein folding. The study is also relevant due to the potential biotechnological application of the system in biofuel production. Understanding how an enzyme such as akuBGL can discriminate between substrates is important for manipulating such an enzyme to improve its activity or change its specificity. The authors also provide preliminary data that can be used by others to produce the proteins described or to design a strategy to recover EHEP from precipitates with phlorotannin at industrial scales.

      In general methods are not carefully described, the section should be extended to improve the manuscript.

      Following the comment, we added the following method descriptions:

      1. Recombinant GH1D1 domain expression and purification in [EHEP and akuBGL preparation].

      2. Sections of [recomGH1D1 activity assay], and [N-terminal sequencing of akuBGL]

      3. More details of resolubilization of EHEP and activity in [Resolubilization of the EHEP–eckol precipitate].

      Reviewer #3 (Public Review):

      The manuscript by Sun et al. reveals several crystal structures that help underpin the offensive-defensive relationship between the sea slug Aplysia kurodai and algae. These centre on TNA (an algal glycosyl hydrolase inhibitor), EHEP (a slug protein that protects against TNA and like compounds) and BGL (a glycosyl hydrolase that helps digest algae). The hypotheses generated from the crystal structures herein are supported by biochemical assays.

      The crystal structures of apo and TNA-bound EHEP reveal the binding (and thus protection) mechanism. The authors then demonstrate that the precipitated EHEP-TNA complex can be resolubilised at an alkaline pH, potentially highlighting a mechanism for EHEP recycling in the A. kurodai midgut. The authors also present the crystal structures of akuBGL, a beta-glucosidase utilised by Aplysia kurodai to digest laminarin in algae into glucose. The structure revealed that akuBGL is composed of two GH1 domains, with only one GH1 domain having the necessary residue arrangement for catalytic activity, which was confirmed via hydrolytic activity assays. Docking was used to assess binding of the substrate laminaritetraose and the inhibitors TNA, eckol and phloroglucinol to akuBGL. The docking studies revealed that the inhibitors bound akuBGL at the glycone-binding site, suggesting a competitive inhibition mechanism. Overall, most of the claims made in this work are supported by the data presented.

      We thank you very much for reviewing our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      • Fig. 3 should be moved to the Supplements because acetylation modification at the N-terminus is not essential for the function of EHEP.

      Following the recommendation, we moved Fig.3 to Supplements (Fig. S2).

      • EHEP2 is processed at 1.4 Å resolution, however, the statistics at highest resolution shell indicate you can process at higher resolution. Why 1.4 Å resolution?

      We tried to process this dataset at a higher resolution of 1.35 Å, but the completeness and I/sigma of the highest resolution shell dropped to 88.9% and 2.16, respectively. The I/sigma value is acceptable, but the completeness fell substantially. So, we set a cutoff of 1.4 Å.

      • Fig. S1A should be revised to include the gallic acid numbers (1, 2, 3, 4, 6) and the 3.0 σ map.

      As presented in Fig. S1A, the omit map (Fo–Fc map) of the ligand TNA, contoured at 2.0 σ, showed that gallic acid 2 has poor density and gallic acid 4 has weak density. Moreover, TNA is relatively large compared with EHEP (7.5 %), and the omit map contoured at 3.0 σ could not clearly show the gallic acids. So, we kept the map at 2.0 σ in Fig. S3A.

      • The authors should provide more information on "co-cage-1 nucleant".

      Our lab is currently publishing a paper that provides detailed information on the co-cage-1 nucleant, including components, synthesis, nucleation mechanism, and application. Once the paper is published, we will cite it in this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      • Is the word "offence" the appropriate word for referring to the activity of EHEP? Is this word used in the literature for this system? I find it confusing but might be because I am not in the specific topic.

      In the prey–predator field, the term defense–offensive is commonly used. According to Charles D. Amsler's book ″Algal Chemical ecology″, herbivore offensive refers to the traits that allow herbivores to increase feeding rates on algae. Therefore, in our opinion, the term offensive is appropriate.

      Taking into consideration that I am not an English language expert I find the writing of the manuscript could be improved in general. Here are some lines as examples of where the grammar could be better:

      Line 193: "decrement of the loop part"

      Following the comment, we corrected it to "decrease of the loop part" (Line 197).

      Line 199: there is a typographical error.

      We apologize for our mistake and corrected it to “EHEP” (Line 202).

      Line 205-206: "only hydrophobically interacted with"

      Following the comment, we modified it to "only interacted hydrophobically with EHEP" (Line 209).

      Line 224: "phlorotannin–precipitate activity"

      Following the comment, we modified it to “phlorotannin-precipitate activity” (Line 227).

      Line 232: "without the N-terminal 25 residues"

      Following the comment, we modified it to "lacked the N-terminal 25 residues" (Line 236).

      Line 353: "bound" should be "bind"

      We apologize for our mistake and modified it (Line 356).

      Line 359: "predator mammals"

      We apologize for our mistake and modified it to "predatory mammals" (Line 363).

      Line 363: "at an alkaline pH of insect midgut"

      Following the comment, we modified it to "at the alkaline pH of the insect midgut" (Line 367).

      Line 370: "nonstructural proteins" means "unstructured proteins"?

      Yes. We modified it to "unfolding proteins with randomly coils" (Line 374).

      Line 374: "similar strategy with mammals"

      Following the comment, we modified it to "similar strategy to mammals" (Line 379).

      Line 403: "to forming"

      We apologize for our mistake and modified it to "to form" (Line 404).

      Line 404: "considered no binding"

      We apologize for our mistake and modified it to "considered not binding" (Line 405).

      Line 406: "activity pocket" means the active site?

      Yes, we modified it to "active site" (Line 407).

      Line 424: "step purification"

      Following the comment, we corrected it to "one step for purification" (Line 425).

      Line 431

      Following the comment, we corrected it to “To verify whether the chemical modifications which was indicated by previous study affects” (Line 432-433).

      Line 812: there is typographical error

      We apologize for our mistakes and corrected all instances of “Tris–HCl” to “Tris-HCl” (from Line 878).

      Line 223: eckol is not mentioned in the text and appears for the first time in the figure caption.

      Following the comment, we added “eckol” in the first section of the [Result] (Line 117).

      The paragraph between lines 271 and 280 is disconnected from the previous one and it is not about results, it should be at the discussion section.

      Following the comment, we moved them to the discussion part (Line 335-343).

      Line 324: "the three inhibitors inhibited": this claim should be corrected to "the three inhibitors interacted", since the word inhibited would imply the authors measured activity experimentally.

      We modified it as suggested (Line 325).

      Line 392: "could not dissolve" is contradicting the result.

      We apologize for our mistake and corrected it to "could dissolve" (Line 394).

      They describe acetylation but they try overexpressing in E. coli, could it be that they needed to express the construct in a system where they would get the acetylation? At least this should be discussed in the text.

      Because our acetylated EHEP sample was purified from its natural source, the digestive fluid of A. kurodai, we only needed to express EHEP without acetylation. Following the comment, we modified the descriptions to clarify this (Lines 170-173 and 177-179).

      “Consistent with the molecular weight results obtained using MALDI–TOF MS, the apo structure2 (1.4 Å resolution) clearly showed that the cleaved N-terminus of Ala21 underwent acetylation, demonstrating that EHEP is acetylated in A. kurodai digestive fluid.”

      "To explore whether acetylation affects the protective effects of EHEP on akuBGL, we used the E. coli expression system to obtain the unmodified recomEHEP (A21–K229)."

      From the text it is not clear in which biological context the brown algae meet the attack by the hydrolase, the information is spread all over the manuscript, it should be clearly described at the introduction.

      When the brown algae are consumed as food by the sea hare A. kurodai, they are attacked by the hydrolase akuBGL. Following the comment, we clarified the description in the introduction as follows (Line 42-45).

      ″In brown algae Eisenia bicyclis, laminarin is a major storage carbohydrate, constituting 20%–30% of algae dry weight. The sea hare Aplysia kurodai, a marine gastropod, preferentially feeds on the E. bicyclis with its 110 and 210 kDa β-glucosidases (akuBGLs), hydrolyzing the laminarin and releasing large amounts of glucose.″

      Affinity ranking based on docking is not reliable, the differences in free energy are in the same order of magnitude. I would recommend erasing this claim since it is not fundamental to the study. Another option would be to determine affinities experimentally.

      We agree with the comment and removed the text about affinity ranking with docking scores.

      Figure 1: relative activity is not defined. HPLC data should be shown as supplementary material.

      Following the comment, we added the definition of relative activity and the HPLC data as Fig. S1 in the revised version.

      Figure 4: Sephacryl resin is mentioned here but not described in the methods.

      Following the comment, we added the description in the methods (Line 515).

      Protein N-terminal sequencing analysis should be described in the methods.

      Following the comment, we added the sequencing analysis in the methods (Line 476-483).

      Figure S1 C: it should be specified how the surface electrostatic potential at different pH was calculated.

      Following the comment, we added the descriptions of how the surface electrostatic potential at different pH was calculated in the figure legend of Fig. S2 of the revised version (Line 876-877).

      Since the authors are capable of producing good amounts of akuBGL and have already conducted glycosidase activity assays using ONPG, it would not be difficult for them to run some kinetics experiments for the enzyme in the presence of the different inhibitors to confirm their hypothesis derived from the docking calculations.

      As mentioned by the reviewer, kinetics experiments are the best way to confirm our hypothesis derived from docking calculations. However, purifying akuBGL from the digestive fluid of the sea hare A. kurodai is quite difficult, and we could not obtain enough akuBGL to conduct the kinetic experiments. So, we stopped at docking simulation in this study. We added this limitation as ″Future kinetic experiments are required to validate quantitatively the competitive inhibition of phlorotannin against akuBGL″ (Line 359-360).

      Some citations are missing in the discussion section, for example in lines 362, 364 and 396.

      Following the comment, we added the citations.

      Reviewer #3 (Recommendations For The Authors):

      Please see comments/suggestions below for revisions.

      Line 176-178 - Text explains that recombEHEP precipitated after incubation with TNA to a comparable level to natural EHEP. However, figure 3B shows no comparison between recombinant and natural EHEP.

      As the reviewer suggested, we repeated the binding assay of recomEHEP to confirm the precipitation with TNA and added a precipitation result of natural EHEP (Fig. S2B right) for comparison.

      Line 223 - The work presented in Figure S1E goes partway towards demonstrating the activity of resolubilised EHEP. This claim would be strengthened if resolubilised EHEP was used in the akuBGL Galactoside hydrolytic activity assay and is then seen to rescue akuBGL activity in the presence of TNA.

      Yes, our claim would be strengthened by adding resolubilized EHEP to the akuBGL assay in the presence of TNA. However, since we have already obtained and presented the relationship between the precipitation of EHEP with TNA and the rescue of akuBGL activity from TNA, we used only the precipitation to demonstrate the activity of resolubilized EHEP.

      Line 380-384 - Here it is discussed how TNA simultaneously binds to three EHEP molecules thus crosslinking them. It is then proposed that this could be the mechanism of precipitation. However, it is noted that TNA is soaked into crystals, therefore it is likely that this lattice exists whether TNA is present or not (this absolutely needs to be mentioned in the text). It would be possible to test this mechanism through mutagenesis. If the sites where TNA packs in between chains of EHEP were mutated to prevent crosslinking, it could then be determined whether crosslink-null EHEP can still precipitate TNA.

      As the reviewer mentioned, we do not have enough experiments to propose that the TNA crosslink may cause the EHEP-TNA precipitation. So, we deleted the discussion of the TNA crosslink and the corresponding figure.

      All docked models need to be deposited (perhaps modelarchive.org) and this resource referred to in the text.

      The structures on the modelarchive.org site are either homology models or de novo models. We think docked models are outside the scope of this site. So, we did not deposit them.

      The x-ray data table contains data previously published in the referenced Acta cryst publication. What is eLife policy on this "double use" of data?

      We apologize for our mistake and have deleted the SAD data from Table 1.

      Minor points

      Line 26 - use "apo akuBGL" so as not to infer a tannic-acid bound form of this also

      Following the comment, we modified it to “apo akuBGL” (Line 26).

      Line 48 - The sentence currently reads as A. kurodai is being digested.

      Following the comment, we modified it to “by A. kurodai” (Line 48).

      Line 49-50 & Line 65-66 - Both these lines make the same point about the impact of phlorotannin inhibition on the use of brown algae as feedstocks for biofuel, please remove one.

      Following the comment, we deleted lines 49-50.

      Line 115 - This needs attention as it's an unusual opening sentence

      Following the comment, we modified it to “Phlorotannin, a type of tannin, is a chemical defense metabolite of brown algae.” (Line 114).

      Line 130 - Should the EHEP concentration be 3.96 µM not 3.36?

      We apologize for our mistake. 3.36 is correct, and we corrected the x-axis label in Fig. 1B.

      Line 133 - consider using "non-recombinant" rather than "natural"

      To distinguish between non-recombinant and recombinant samples, we used “EHEP” and “akuBGL” as purified from the native source and recomEHEP and recomakuBGL as the samples overexpressed from E. coli in this manuscript. So, we added the definition in [Introduction] (Line 100-101).

      Line 134 - "The residues A21-V227 of A21-K229..." This sentence could be written more clearly.

      Following the comment, we re-wrote it to “The residues A21–V227 in purified EHEP (1–20 aa were cleaved during maturation) were built” (Line 135-136).

      Line 136 - switch "appropriately visualized" for "tracable"?

      Following the comment, we modified it to “built” (Line 136).

      Line 158 - use "70% of backbone in a loop conformation"

      We modified it as suggested (Line 159-160).

      Line 184 - reword "map showed an electron density blob". (Map showed positive electron density)

      Following the comment, we modified it to “map showed the electron density” (Line 188).

      Line 193-194 - Is EHEP really more stable when bound to TNA? It is not shown experimentally? It is difficult to see which loop changes. Is the difference a result of crystal packing? Please switch "decrement" for another term

      The regions with conformational changes between EHEP and EHEP–TNA are close to TNA but not at the intermolecular interface. As the reviewer mentioned, we could not establish whether EHEP stability depends on TNA binding, so we deleted the corresponding descriptions in the second paragraph of [TNA binding to EHEP].

      Following the comment, we redrew Fig. S1B (Fig. S3B in the revised version) to show the conformational changes clearly. We also modified "decrement" to "decrease" (Line 197).

      Fig S1B - Can an extra figure be added to show the secondary differences more clearly?

      We redrew this figure (Fig. S3B) using a close-up view to show the differences.

      Line 212-213 - There is a slight discrepancy between the text and Figure 4B. Gallic acid 4 interacts with P201 and gallic acid 6 interacts with P77.

      We apologize for our mistake in the text and corrected it to “gallic acid4 and 6 showed alkyl–π interaction with P201 and P77, respectively” (Line 216).

      Figure 4D - Change x axis from tube number to elution volume. Both chromatograms could also be superimposed for interpretability.

      Since we used raw data from the experiment, we kept the x-axis in tube number and added the information “2.7 ml/tube” (Fig. 3D).

      Line 229 - Please change "there was no blob of TNA in the electron density" to there was no electron density for TNA or something similar.

      Following the comment, we modified it to “there was no electron density of TNA or something similar in the 2Fo–Fc and Fo–Fc map” (Line 232).

      Line 231 - asymmetric unit is a more standard term (also in Fig S2 legend)

      We modified it as suggested (Line 235 and 885).

      Line 234-235 - Reword "the residues L26-P978 of L26-N994" to make it more concise.

      Following the comment, we deleted “of L26-N994” (Line 239).

      Lines 296-299 could be written more carefully - pi stacking with what?

      We apologize for our mistake and corrected it to CH–𝜋 (Line 293).

      Line 349 - which putatively enables it to......

      We modified it as suggested (Line 353 in the revised manuscript).

      Line 370 - "nonstructural" is the wrong term because they remain structured - use something akin to non-classical secondary structure

      Following the comment, we modified it to “are unfolding proteins with randomly coils in solution” (Line 374).

      Throughout - use phenix autobuild, not autobuil

      We apologize for our mistakes and corrected them throughout the manuscript.

      Figure 1 - the graphs would be more interpretable with all data points shown overlaid

      The two graphs in Figure 1 show two experiments with different reaction conditions. Figure 1A presents various TNA concentrations, while Figure 1B maintains a constant TNA concentration of 40 μM with varying EHEP concentrations. So, overlaying the graphs is not feasible. Therefore, we would like to keep them separate, and we added the reaction conditions to the figure legend.

      Figure 4 - in part D add an extra statement outlining what the S-100 analysis demonstrated

      S-100 analysis refers to gel filtration on a column with Sephacryl S-100 media. We added an extra statement in the method and the legend (Fig. 3, Lines 515 and 879).

      Figure 5 (and elsewhere) - the structures referred to need a PDB code and reference given in legend

      Following the comment, we checked the manuscript carefully and added PDB code to the referred structures.

      Fig S1 - please add an additional panel showing part D but in proper structure form, not schematic shapes

      Since we do not have enough experiments to validate the TNA-crosslink, we deleted the discussion of the TNA crosslink and Fig. S1D.

      Figure sig 4 - Text contains in-depth information on side chain hydrogen bonding and π-π interactions between akuBGL and laminaritetraose. However, the figure only shows a surface model. Consider adding a figure showing these interactions.

      Following the suggestion, we added a close-up view to show these detailed interactions (Fig. S6B).

    1. Author Response

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public Review):

      DeKraker et al. propose a new method for hippocampal registration using a novel surface-based approach that preserves the topology of the curvature of the hippocampus and boundaries of hippocampal subfields. The surface-based registration method proved to be more precise and resulted in better alignment compared to traditional volumetric-based registration. Moreover, the authors demonstrated that this method can be performed across image modalities by testing the method with seven different histological samples. This work has the potential to be a powerful new registration technique that can enable precise hippocampal registration and alignment across subjects, datasets, and image modalities.

      We thank the Reviewer, and feel this is an accurate summary of our work.

      Reviewer #3 (Public Review):

      Summary:

      In the current manuscript, Dekraker and colleagues have demonstrated the ability to align hippocampal subfield parcellations across disparate 3D histology samples that differ in contrast, resolution, and processing/staining methods. In doing so, they validated the previously generated Big-Brain atlas by comparing across seven different ground-truth subfield definitions. This is an impressive effort that provides important groundwork for future in vivo multi-atlas methods.

      Strengths:

      DeKraker and colleagues have provided novel evidence for the tremendously complicated curvature/gyrification of the hippocampus. This work underscores the challenge that this complicated anatomy presents in our ability to co-register other types of hippocampal data (e.g. MRI data) to appropriately align and study a structure in which the curvature varies considerably across individuals.

      This paper is also important in that it highlights the utility of using post-mortem histological datasets, where ground truth histology is available, to inform our rigorous study of the in vivo brain.

      This work may encourage readers to consider the limitations of the current methods that they currently use to co-register and normalize their MRI data and to question whether these methods are adequate for the examination of subfield activity, microstructure, or perfusion in the hippocampal head, for example. Thus the implications of this work could have a broad impact on the study of hippocampal subfield function in humans.

      Weaknesses:

      As the authors are well aware, hippocampal subfield definitions vary considerably across laboratories. For example, some neuroanatomists (Ding, Palomero-Gallagher, Augustinack) recognize that the prosubiculum is a distinct region from subiculum and CA1 but others (e.g. Insausti, Duvernoy) do not include this as a distinct subregion. Readers should be aware that there is no universal consensus about the definition of certain subfields and that there is still disagreement about some of the boundaries even among the agreed upon regions.

      We thank the Reviewer, and feel this is an accurate summary of our work that also provides useful scientific context.

      Reviewer #2 (Recommendations For The Authors):

      The authors have done a great job with the revisions and have addressed all my concerns. They have clarified aspects of the method and procedure and have included a helpful walk-through explanation of an example subject. The authors have also expanded the discussion and addressed the motivation and justification for certain steps of the procedure.

      We thank the Reviewer.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed my previous comments and I believe the impact and take-home message of the paper are clearer.

      We thank the Reviewer.

      In Figure 1, is the proximal-distal label reversed for panel B? I think P (proximal) should be closer to CA4/DG and D (distal) should be closer to subiculum. Am I misreading the graph?

      We thank the Reviewer for this consideration, but the label is as intended. The terms proximal/distal in the hippocampal literature are sometimes relative to the dentate gyrus and sometimes relative to the rest of the cortex. In our case, we use the terms relative to the neocortex, following Ding and Van Hoesen (2015). We have now added the following to clarify this point at the first use of these terms (p.5):

      “The current work, however, defined this tessellation as a regular mesh grid in unfolded space consisting of 256×128 points across the anterior-posterior (A-P) and proximal-distal (P-D) (relative to the neocortex) axes of the unfolded hippocampus, respectively.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      After thoroughly reviewing the comments and suggestions provided by the reviewers, we have revised our manuscript. We sincerely appreciate the reviewers' constructive approach and valuable feedback. We believe that the edited version of the manuscript is now more comprehensible and reader-friendly. Please find our responses to the comments below.

      Reviewer #1 (Public Review):

      This EEG study probes the prediction of a mechanistic account of P300 generation through the presence of underlying (alpha) oscillations with a non-zero mean. In this model, the P300 can be explained by a baseline shift mechanism. That is, the non-zero mean alpha oscillations induce asymmetries in the trial-averaged amplitudes of the EEG signal, and the associated baseline shifts can lead to apparent positive (or negative) deflections as alpha becomes desynchronized at around P300 latency. The present paper examines the predictions of this model in a substantial data set (using the typical P300-generating oddball paradigm and careful analyses). The results show that all predictions are fulfilled: the two electrophysiological events (P300, alpha desynchronization) share a common time course, anatomical sources (from inverse solutions), and covariations with behaviour; plus relate (negatively) in amplitude, while the direction of this relationship is determined by the non-zero-mean deviation of alpha oscillations pre-stimulus (baseline shift index, BSI). This is indicative of a tight link of the P300 with underlying alpha oscillations through a baseline shift account, at least in older adults, and hence that the P300 can be explained in large parts by non-zero mean brain oscillations as they undergo post-stimulus changes.

      Specific comments

      1) The baseline shift model predicts an inverse temporal similarity between alpha envelope changes and P300, confirmed over posterior regions (negative maxima over Pz, Fig 2B). It is therefore intriguing to see in this Figure a very high (positive) correlation in left frontal electrodes. I acknowledge that this is covered in the discussion, but given that this is somewhat unexpected at this point, I suggest providing the readers with a pointer in the Figure legend to this observation and the discussion. Also, I would recommend being more careful with the discussion of this left frontal positive correlation, where a "negative P300" over these areas is mentioned. Given the use of average-referenced sensor data (as opposed to source localized data) and the clear posterior localization of the P300 (Fig 4A), it is likely that what is picked up as "negative ERP potential" over left frontal sites is the posterior P300 forward-projected and inverted through the calculation of the average reference. Accordingly, the interpretation in terms of polarity (positive) of the correlation is likely misleading but what this observation seems to suggest is that other oscillatory processes (than posterior alpha) (e.g. of motor preparation during evidence accumulation) do substantially correlate with the posterior P300 build-up.

      We agree that the name P300 should be reserved for the positive potential over posterior sites. We edited the text, replacing mentions of “negative P300” with “negative ER”. Also, the following text has been added to the legend of Figure 2:

      “Note the positive correlation between the low-frequency signal and the alpha amplitude envelope over central sites. Due to the negative polarity of ER over the fronto-central sites, such correlation may still indicate a temporal relationship between the P300 process and oscillatory amplitude envelope dynamics (due to the use of a common average reference). However, it cannot be entirely excluded that additional lateralized response-related activity contributes to this positive correlation (Salisbury et al., 2001).”

      2) Parts of the conclusions are based on a relationship between alpha-amplitude modulation and size of P300-amplitude (amplitude-amplitude) using data binning (illustrated in Fig 3) and the bins seem to include different participants, rather than trials. As this is an analysis of EEG data, I wonder how much of this relationship can be explained by a confound of skull thickness (or other individual differences in anatomy picked up with the scalp measures such as gyral folding patterns and current source orientations etc). E.g. those with thicker/thinner skulls are expected to show less/more of a modulation in all signals. This could be ruled out by relating the bins in alpha modulation not to the P300 but to another event that does not coincide in time with the alpha changes (e.g. P100), where no changes across bins would be expected.

      We are grateful for the suggestions on confound estimation. We repeated the analysis of binning of alpha rhythm amplitude normalised change in relation to early ER, which in our auditory paradigm was N100. The largest change in the alpha amplitude occurs later in the poststimulus window, but that does not necessarily mean that the activity in the window right after the stimulus onset is unaffected. As can be seen in Figure 3 (t-statistics between alpha bins), there is already a significant difference around 100 ms over the central regions of the scalp. For this plot, the broadband data was filtered from 0.1 to 3 Hz, thus assessing only changes in low-frequency signals. We repeated the same analysis for broadband data (0.1–45 Hz) and also observed a significant difference between two extreme bins around 100 ms over the central region (Figure S5A). However, if we filter the signal from 4 to 45 Hz, these significant differences almost completely disappear (only electrode TP9 was significant; Figure S5B). Importantly, this range (4–45 Hz) includes the frequency of N100, which is typically in the alpha range. It means that the differences in N100 are riding on top of the baseline shift created by an unfolding alpha amplitude decrease. When this low-frequency baseline shift was removed, significant differences were no longer visible. This is an indication that differences in P300 amplitude between alpha bins are restricted to the low-frequency range and are not propagated to other ERs with higher frequency content.
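
      As an illustration of the binning analysis referred to in this response, the following minimal Python sketch splits participants into five bins by normalised alpha amplitude change and averages the P300 amplitude within each bin. It is not the code used in the study: the equally sized bins, the function name `bin_by_alpha_change`, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def bin_by_alpha_change(alpha_change, p300_amp, n_bins=5):
    """Mean P300 amplitude per bin of normalised alpha change.

    Participants are ordered from the weakest to the strongest alpha
    decrease and split into n_bins (approximately) equally sized groups.
    Equal-sized bins are an assumption of this sketch.
    """
    alpha_change = np.asarray(alpha_change, dtype=float)
    p300_amp = np.asarray(p300_amp, dtype=float)
    # Most positive change first, most negative (strongest decrease) last
    order = np.argsort(alpha_change)[::-1]
    bins = np.array_split(order, n_bins)
    return np.array([p300_amp[idx].mean() for idx in bins])

# Synthetic (hypothetical) data: a stronger alpha decrease goes with a larger P300
rng = np.random.default_rng(0)
n_participants = 2000
alpha_change = rng.uniform(-0.9, 0.6, n_participants)
p300_amp = -2.0 * alpha_change + rng.normal(0.0, 1.0, n_participants)

bin_means = bin_by_alpha_change(alpha_change, p300_amp)
print(bin_means)
```

      In this toy data set the bin means increase from bin 1 to bin 5 by construction; in real data the same comparison can be repeated for any other evoked response (for example N100) by swapping the second argument.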

      We added Figure S5 to the Supplementary material and introduced it in the main text, the Results section, as follows:

      “The cluster within the earlier window (100–200 ms) over central regions (Figure 3C) possibly reflects the previously shown effect of prestimulus alpha amplitude on earlier ERs (Brandt et al., 1991, Babiloni et al., 2008) but may also be a manifestation of BSM. We tested this assumption for early ER, which in our auditory task was N100. We repeated the binning analysis for broadband data (0.1–45 Hz) and also observed a significant difference between two extreme bins around 100 ms over the central region (Figure S5A). However, if we filter the signal from 4 to 45 Hz (the range that includes the frequency of N100 but not low-frequency baseline shifts), these significant differences almost completely disappear (only electrode TP9 was significant; Figure S5B). It means that the difference in N100 amplitudes over frontal sites is driven by the baseline shift created by an unfolding alpha amplitude decrease. The significant difference at the TP9 electrode possibly reflects a genuine physiological effect of alpha rhythm amplitude on the excitability of a neuronal network and, as a consequence, on the amplitude of ER (as opposed to the baseline-shift mechanism, where the alpha rhythm doesn’t affect the amplitude of ER but creates an additional component of ER; Iemi et al. 2019).”

      3) Related to the above: I assume it can be ruled out that the relationship between baseline-shift index and P300 amplitude (also determined through binning, Fig 6) could be influenced by the above-mentioned confounds, given the inverse relationship?

      Because alpha rhythm power has previously been found to depend on the size of the head (Candelaria-Cook et al., Cerebral Cortex, 2022), we agree that the contribution of this confounding factor should be estimated, and we did estimate it. However, we would like to point out that we looked into dependencies based on ratios, which removes the absolute units that could be affected by head size, skull thickness, etc. For instance, the baseline-shift index is estimated as the Pearson correlation coefficient between the alpha rhythm envelope and the low-frequency signal during the resting state. Therefore, multiplying the alpha amplitude envelope by an arbitrary positive scale factor would not change the correlation. Nonetheless, for a subset of participants (1034 participants, mean age 69.8 years, 496 female), we had MRI data, from which we extracted total intracranial volume. For each electrode, we computed the Pearson correlation between the variable of interest and total intracranial volume. Variables of interest were the peak amplitude of P300, the attenuation-peak amplitude of alpha rhythm, alpha rhythm normalised amplitude (computed as $(A_{post} - A_{pre})/A_{pre}$, with $A_{pre}$ and $A_{post}$ the pre- and poststimulus alpha amplitudes), and the magnitude of the baseline shift index (BSI). The significance threshold was set at a Bonferroni-corrected p-value of 0.05. For P300, only one electrode, namely C4, demonstrated a significant correlation of –0.10. However, the C4 electrode is outside of the typical electrode range for P300. For alpha envelope amplitude, significant correlations were observed all over the head (19 out of 31 electrodes, maximum at Cz), and a larger total intracranial volume was related to a higher amplitude of alpha rhythm.

      Candelaria-Cook et al. (Cerebral Cortex, 2022) showed a similar association in longitudinal data from children and adolescents, but the increase in alpha rhythm power in that study might have been due to additional factors beyond a growing head. Conversely, normalised alpha amplitude showed no significant correlations. Similarly, the absolute value of BSI did not correlate significantly with total intracranial volume at any electrode. Overall, only absolute alpha amplitude correlates prominently with total intracranial volume, whereas the relational variables (normalised amplitude change, BSI) do not, which reduces the concern that head size may be a confound.
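
      The scale-invariance argument made in this response can be checked numerically. The sketch below uses synthetic signals and treats the BSI simply as a Pearson correlation between an alpha envelope and a low-frequency signal, as described above; the full BSI procedure used in the study may include further steps, and the helper name `pearson_r` is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic stand-ins for a resting-state alpha envelope and low-frequency signal
rng = np.random.default_rng(1)
envelope = 1.0 + 0.3 * rng.standard_normal(5000)
low_freq = -0.5 * envelope + 0.2 * rng.standard_normal(5000)

bsi = pearson_r(envelope, low_freq)
# Rescaling the envelope (e.g. a thinner skull giving larger amplitudes)
bsi_scaled = pearson_r(3.7 * envelope, low_freq)

print(bsi, bsi_scaled)
```

      Both values agree up to floating-point error, because a positive rescaling of one variable changes the covariance and the standard deviation by the same factor.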

      4) This study is based on a sample of older participants. One wonders to what extent this is needed to reveal the alpha-P300 relationships (e.g. more variability in this population than in younger controls), and/or whether other mechanisms may be at play across the lifespan.

      Our study is indeed based on a sample of older participants. However, in our previous study (Studenova et al., PLOS Comp Bio, 2022), we compared young and elderly participants using resting-state data. There, we measured the baseline-shift index (BSI) at rest, and BSI serves as a proxy for baseline shifts present in the task-based data (under the assumptions of the baseline-shift mechanism, ER is in essence a baseline shift). We found that BSIs for elderly participants were smaller in comparison to those for young participants. Yet, the distribution of BSI values across the scalp (as in Figure 6A) was similar between the two age groups.

      Additionally, we observed that larger alpha rhythm power was positively correlated with the magnitude of BSI, but only for younger participants, which points out possible difficulties arising from the fact that elderly people have reduced alpha power. Therefore, we believe that for a sample of young participants, the results should not be different.

      5) Legend to Figure 6: sentence under A: "A positive deflection of P300 at posterior sites coincides with a decrease in alpha amplitude, a case that corresponds to negative mean oscillations." I find this sentence at this place in the legend confusing, as Fig 6A seems to illustrate the BSI only (not yet any relationship?).

      We expanded the text in the legend with this paragraph:

      “BSI serves as a proxy for the relation between ER polarity and the direction of alpha amplitude change (Nikulin et al., 2010). Here, we observe predominantly negative BSIs (and thus negative mean oscillations) at posterior sites, which indicates the inverted relation between P300 and alpha amplitude change. Indeed, in the task data, a positive deflection of P300 at posterior sites coincides with a decrease in alpha amplitude.”

      6) Page 4: repetition of "has been" "has been" one after each other in the text

      We are thankful for this catch. We removed the repetition.

      Reviewer #2 (Public Review):

      The authors attempt to show that event-related changes in the alpha band, namely a decrease in alpha power over parieto/occipital areas, explain the P300 during an auditory target detection task. The proposed mechanism by which this happens is a baseline-shift, where ongoing oscillations which have a non-zero mean undergo an event-related modulation in amplitude which then mimics a low frequency event-related potential. In this specific case, it is a negative-mean alpha-band oscillation that decreases in power post-stimulus and thus mimics a positivity over parieto-occipital areas, i.e. the P300. The authors lay out 4 criteria that should hold if indeed alpha modulation generates the P300, which they then go about providing evidence for.

      Strengths:

      • The authors do go about showing evidence for each prediction rigorously, which is very clearly laid out. In particular, I found the 3rd section connecting resting-state alpha BSI to the P300 quite compelling.

      • The study is obviously very well-powered.

      • Very well-written and clearly laid out. Also, the EEG analysis is thorough overall, with sensible analysis choices made.

      • I also enjoyed the discussion of the literature, albeit with certain strands of P300 research missing.

      Weaknesses:

      In general, if one were to be trying to show the potential overlap and confound of alpha-related baseline shift and the P300, as something for future researchers to consider in their experimental design and analysis choices, the four predictions hold well enough. However, if one were to assert that the P300 is "generated" via alpha baseline shift, even partially, then the predictions either do not hold, or if they do, they are not sufficient to support that hypothesis. This general issue is to be found throughout the review. I will briefly go through each of the predictions in turn:

      1) The matching temporal course of alpha and P300 is not as clear as it could be. Really, for such a strong statement as the P300 being generated by alpha modulation, one would need to show a very tight link between the signals temporally. There are many neural and ocular signals which occur over the course of target detection paradigms: P300, alpha decrease, motor-related beta decrease, the LRP, the CNV, microsaccade rate suppression etc. To specifically go above and beyond this general set of signals and show a tighter link between alpha and P300 requires a deeper comparison. To start, it would be a good idea to show the signals overlapping on the same plot to really get an idea of temporal similarity. Also, with the P300-alpha correlation, how much of this correlation is down to EEG-related issues such as skull thickness, cortical folding, or cognitive issues such as task engagement? One could perhaps find another slow wave ERP, e.g. the Lateralised Readiness Potential, and see if there is a similar strength correlation. If there is not, that would make the P300 relationship stand out.

      Thank you for this comment. In our study, we outline the prerequisites for the baseline-shift mechanism (BSM) and show how they hold for the obtained data. Overall, for all the prerequisites, evidence was found in favour of BSM. However, as is the case for all EEG/MEG data, the non-invasive nature of the data puts constraints on the interpretation of the results. In order to specifically address the points raised by the reviewer about the results, we provide additional information about the overlap (Figure 2) and non-specific anatomical parameters.

      The baseline-shift mechanism makes a general prediction about the generation of some ERs (those that coincide with a change in oscillatory amplitudes). The fact that neuronal oscillations (especially alpha oscillations) are modulated in almost any task indicates that other ERs can also contain a contribution from the baseline-shift mechanism. In our study, it is plausible that several sources of alpha oscillations orchestrated several ER components that appeared on the scalp after the presentation of a target stimulus. Due to the substantial spatial mixing and temporal overlap, it is difficult to disentangle the processes indexing perceptual, memory, or motor functions. However, we are currently working on showing that the readiness potential (movement-related potential) in the classical Libet paradigm also complies with the baseline-shift mechanism.

      Concerns about confounds such as skull thickness are valid; therefore, we performed additional analysis. For a subset of participants (1034 participants, mean age 69.8 years, 496 female), we had MRI data, from which we extracted total intracranial volume. We tested the correlation between total intracranial volume and several variables of interest: the peak amplitude of P300, the attenuation-peak amplitude of alpha rhythm, alpha rhythm normalised change, and the magnitude of the baseline shift index (BSI). For P300 amplitude, only the C4 electrode showed a significant correlation of –0.10. For alpha envelope amplitude, there were significant correlations all over the head (19 out of 31 electrodes, maximum at Cz). The correlations showed that a larger total intracranial volume was related to a higher amplitude of alpha rhythm. For a normalised change in alpha amplitude, we observed no significant correlations. Similarly, the absolute value of BSI did not correlate significantly with total intracranial volume at any electrode. Overall, alpha amplitude indeed shows a prominent correlation to total brain volume, but none of the relational variables (normalised amplitude change, BSI) show any correlation.
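
      A per-electrode correlation screen with Bonferroni correction, of the kind described in this response, might look like the sketch below. It is an illustration under stated assumptions rather than the analysis code: p-values come from a Fisher z normal approximation (reasonable for about a thousand participants), the function names are hypothetical, and the data are synthetic.

```python
import math
import numpy as np

def fisher_z_pvalue(r, n):
    """Two-sided p-value for a Pearson r, using the Fisher z
    transform with a normal approximation (adequate for large n)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def screen_electrodes(data, covariate, alpha=0.05):
    """Correlate each electrode's values with a covariate.

    data: array of shape (n_participants, n_electrodes)
    covariate: array of shape (n_participants,)
    Returns the correlations and a boolean mask of electrodes that pass
    the Bonferroni-corrected threshold alpha / n_electrodes.
    """
    n, n_electrodes = data.shape
    r = np.array([np.corrcoef(data[:, e], covariate)[0, 1]
                  for e in range(n_electrodes)])
    p = np.array([fisher_z_pvalue(ri, n) for ri in r])
    return r, p < alpha / n_electrodes

# Synthetic example: 1034 participants, 31 electrodes, a covariate that
# stands in for total intracranial volume and affects the first five electrodes
rng = np.random.default_rng(2)
volume = rng.standard_normal(1034)
data = rng.standard_normal((1034, 31))
data[:, :5] += 0.5 * volume[:, None]

r, mask = screen_electrodes(data, volume)
print(np.flatnonzero(mask))
```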

      In Figure 3, it is clear that alpha binning does not account for even 50% of the variance of P300 amplitude. Again, if there is such a tight link between the two signals, one would expect the majority of P300 variance to be accounted for by alpha binning. As an aside, the alpha binning clearly creates the discrepancy in the baseline period, with all alpha hitting an amplitude baseline at approx. 500ms. I wonder if you could NOT, in fact, baseline your slow wave ERP signal, instead using an appropriate high pass filter (see "EEG is better left alone", Arnaud Delorme, 2023) and show that the alpha binning creates the difference in ERP at the baseline which then is reinterpreted as a P300 peak difference after baselining.

      The difference in the baseline window for alpha rhythm amplitude is indeed prominent (Figure R1A,B), so we proceeded with the suggested analysis. Before anything else, we would like to reiterate that the baseline correction per se does not generate an ER; it just moves the whole curve (in the pre- and poststimulus intervals) up or down. Firstly, we repeated the analysis without baseline correction (filter 0.1–3 Hz) and still observed the difference in P300 amplitude across bins (Figure R1D). Moreover, based on cluster-based permutation testing, ERs in the two most extreme bins were not significantly different in the prestimulus window. However, even when we opt for no baseline correction, there is still an implicit baseline; namely, the average of the signal is zero within a filtering window (e.g., 10 sec for a high-pass filter at 0.1 Hz). Thus, secondly, we computed an ER with the baseline in the poststimulus window (400–600 ms; Figure R1E). In this case, the difference between bin 1 and bin 5 in the prestimulus interval (before 0 ms) was significant in the posterior regions. The differences in the baseline appear smaller than the differences in alpha amplitude. This can be attributed to the fact that there are other low-frequency processes in the EEG signal that are different from alpha baseline shifts. Additionally, P300 in bin 1 differs significantly in shape from P300 in bin 5 (Figure R1C). This can be an indication of overlapping components; namely, for bin 5 (where alpha amplitude change is the highest), the associated baseline shift dominates, and for bin 1 (where alpha amplitude change is the smallest), the associated baseline shift is hidden behind other components. We believe that this analysis demonstrates the intuition behind the baseline-shift mechanism: the baseline shift is generated by a change in the oscillatory amplitude, and the change is simply the difference between two time points.
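
      The point that baseline correction only shifts the whole curve can be made concrete with a small sketch. The waveform below is synthetic and the helper `baseline_correct` is a hypothetical name; the two windows mirror the prestimulus and the 400–600 ms poststimulus baselines discussed above.

```python
import numpy as np

def baseline_correct(erp, times, window):
    """Subtract the mean of `erp` within `window` = (start, end), in seconds."""
    mask = (times >= window[0]) & (times <= window[1])
    return erp - erp[mask].mean()

# Synthetic evoked response: constant offset plus a P300-like bump at 400 ms
times = np.arange(-0.2, 0.8, 0.002)
erp = 0.5 + 3.0 * np.exp(-((times - 0.4) ** 2) / (2 * 0.08 ** 2))

pre_baselined = baseline_correct(erp, times, (-0.2, 0.0))
post_baselined = baseline_correct(erp, times, (0.4, 0.6))

# The two versions differ only by a constant vertical offset
offset = pre_baselined - post_baselined
print(offset.min(), offset.max())
```

      Because the two corrected curves differ by a constant, a difference between bins that shows up at the P300 peak with a prestimulus baseline reappears in the prestimulus window when the baseline is placed after the stimulus.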

      Author response image 1.

      The difference in the strength of alpha amplitude modulation correlates with the difference in P300 amplitude. A. The alpha rhythm amplitude was binned according to the percentage of change. The bins were the following: (66, –25), (–25, –37), (–37, –47), (–47, –58), (–58, –89) % change. A is identical to Figure 3A, main text. B. The alpha rhythm amplitude is multiplied by –1 and evened within the prestimulus window. This may be an approximation for baseline shifts in the low-frequency signal. C. P300 responses are sorted into the corresponding bins. C is identical to Figure 3B, main text. D. P300 responses are obtained without applying a baseline correction and are sorted into the corresponding bins. The difference in peak amplitude of P300 remains visible and significant. E. P300 is baselined at 400–600 ms. As a consequence, there are significant differences in the prestimulus window.

      2) The topographies are somewhat similar in Figure 4, but not overwhelmingly so. There is a parieto-occipital focus in both, but to support the main thesis, I feel one would want to show an exact focus on the same electrode. Showing a general overlap in spatial distribution is not enough for the main thesis of the paper, referring to the point I make in the first paragraph re Weaknesses. Obviously, the low density montage here is a limitation. Nevertheless, one could use a CSD transform to get more focused topographies (see https://psychophysiology.cpmc.columbia.edu/software/csdtoolbox/), which apparently does still work for lower-density electrode setups (see Kayser and Tenke, 2006).

      As we mentioned in our provisional response, we believe that we would not benefit from using CSD. First, the CSD transform is a spatial high-pass filter, and, hence, it is commonly used for spatially localised activities. In our case, we have two activities—P300 and alpha amplitude decrease—that are widespread with low spatial frequency, and we believe that applying CSD is not helpful. Second, CSD is more sensitive to surface sources that emanate from the crowns of gyri. For activity in the P300 window, there is a possibility that sources are localised within the longitudinal fissure. Third, as we completely agree that low density montage is a limitation, we used source reconstruction with eLoreta (Figure 5) to clarify the spatial localisation of the potential source of P300 and alpha amplitude change, which indeed shows a considerable spatial overlap.

      3) Very nice analysis in Figure 6, probably the most convincing result comparing BSI in steady state to P300, thus at least eliminating task-related confounds.

      4) Also a good analysis here, wherein there seem to be similar correlation profiles across P300 and alpha modulation. One analysis that would really nail this down would be a mediation analysis (Baron and Kenny, 1986; https://davidakenny.net/cm/mediate.htm), where one could investigate if e.g. the relationship between P300 amplitude and CERAD score is either entirely or partially mediated by alpha amplitude. One could do this for each of the relationships. To show complete mediation of P300 relationship with a cog task via alpha would be quite strong.

      We agree that mediation analysis better suits the purpose of our claim. We added this analysis to the edited version of the manuscript. Additionally, we became concerned that the total alpha power effect may be driving the correlation. Therefore, we used alpha amplitude change in percentage instead of the absolute values of the amplitude. Significant mediation was present only for attention and executive scores.

      In the updated version of the manuscript, the Methods section reads as follows:

      “The correlation between cognitive scores (see Methods/Cognitive tests) and the amplitude and latency of P300 and alpha oscillations was calculated with linear regression using age as a covariate (R lme4, Bates et al., 2015). To estimate what proportion of the correlation between P300 and cognitive score is mediated by alpha oscillations, we used mediation analysis (Baron and Kenny, 1986; R mediation, Tingley et al., 2014). First, we estimated the effect of P300 on the cognitive variable of interest (total effect, cogscore ~ P300+age). Second, we computed the association between P300 and alpha oscillations (the effect on the mediator, alpha ~ P300). Third, we ran the full model (the effect of the mediator on the variable of interest, cogscore ~ P300+alpha+age). Lastly, we estimated the proportion mediated.”
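
      The regression steps listed above can be sketched in Python as a simplified product-of-coefficients mediation. This is an illustration only: the study used the R mediation package, which estimates the mediated effect by simulation, whereas this sketch takes the indirect effect as a·b and the total as indirect plus direct. The function names and the synthetic data are assumptions.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients: intercept first, then one per predictor."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def proportion_mediated(cog, p300, alpha, age):
    """Share of the P300 -> cognitive score effect carried by alpha."""
    a = ols(alpha, p300)[1]             # mediator model: alpha ~ P300
    full = ols(cog, p300, alpha, age)   # full model: cog ~ P300 + alpha + age
    direct, b = full[1], full[2]
    indirect = a * b
    return indirect / (indirect + direct)

# Synthetic data with a known structure: a = 0.5, b = 0.4, direct = 0.3,
# so the expected proportion mediated is 0.2 / 0.5 = 0.4
rng = np.random.default_rng(4)
n = 20000
age = rng.standard_normal(n)
p300 = rng.standard_normal(n)
alpha = 0.5 * p300 + rng.standard_normal(n)
cog = 0.3 * p300 + 0.4 * alpha - 0.2 * age + rng.standard_normal(n)

prop = proportion_mediated(cog, p300, alpha, age)
print(round(prop, 2))
```

      A proportion well below 1, as in this toy example, corresponds to partial rather than complete mediation.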

      The Results section reads as follows:

      “Stimulus-based changes in brain signals are thought to reflect cognitive processes that are involved in the task. A simultaneous and congruent correlation of P300 and alpha rhythm to a particular cognitive score would be further evidence in favour of the relation between P300 and alpha oscillations. Moreover, if thus found, the correlation directions should correspond to the predictions according to BSM. Along with the EEG data, in the LIFE data set, a variety of cognitive tests were collected, including the Trail-making Test (TMT) A&B, Stroop test, and CERADplus neuropsychological test battery (Loeffler et al., 2015). From the cognitive tests, we extracted composite scores for attention, memory, and executive functions (Liem et al., 2017, see Methods/Cognitive tests) and tested the correlation between composite cognitive scores vs. P300 and vs. alpha amplitude modulation. The scores were available for a subset of 1549 participants (out of 2230), age range 60.03–80.01 years old. Cognitive scores correlated significantly with age (age and attention: −0.25, age and memory: −0.20, age and executive function: −0.23). Therefore, correlations between cognitive scores and electrophysiological variables were evaluated, regressing out the effect of age. To rule out the possibility of an absolute alpha power association with cognitive scores, for this analysis, we used alpha amplitude normalised change computed as $(A_{post} - A_{pre})/A_{pre}$, where $A_{pre}$ is the prestimulus amplitude and $A_{post}$ is the amplitude at the latency of the strongest amplitude decrease. Computed this way, negative alpha amplitude change would correspond to a more pronounced decrease, i.e., stronger oscillatory response.

      To increase the signal-to-noise ratio of both P300 and alpha rhythm, we performed spatial filtering (see Methods/Spatial filtering, Figures 7B,C). Following this procedure, both P300 and alpha latency, but not amplitude, significantly correlated with attention scores (Figure 7A, left column). Larger latencies were related to lower attentional scores, which corresponded to a longer time-to-complete of TMT and Stroop tests and hence poorer performance. The proportion of correlation between P300 latency and attention, mediated by alpha attenuation peak latency, is 0.12. Memory scores were positively related to P300 amplitude and negatively to P300 latency (Figure 7A, middle column). The direction of correlation is such that higher memory scores, which reflected more recalled items, corresponded to a higher P300 amplitude and an earlier P300 peak. The association between alpha rhythm parameters and memory scores is not significant, but it goes in the same direction as the association for P300. Executive function scores (Figure 7A, right column) were related significantly to both P300 and alpha amplitude latencies. The proportion of correlation between P300 latency and executive function, mediated by alpha attenuation peak latency, is 0.14. Overall, the direction of correlation is similar for P300 and alpha oscillations, as expected for BSM. Moreover, the direction of correlation is consistent across cognitive functions.”

      And an additional paragraph in the Discussion:

      “The mediation analysis showed that the modulation of alpha oscillations only partially explained the correlation between P300 and cognitive variables. This, in general, corresponds to the idea that not the whole P300 but only its fraction can be explained by the changes in the alpha amplitudes. Figure 5 shows that alpha oscillations change not only in the cortical areas where P300 is generated; therefore, we cannot expect a complete correspondence between the two processes. Moreover, since cognitive tests and EEG recordings were performed at different time points, the associations between the cognitive variables and EEG markers are expected to be rather weak and to reflect only some neuronal processes common to P300, alpha rhythm, and tasks. For these reasons, a complete mediation of one EEG variable through another EEG variable in the context of a separate cognitive assessment cannot be expected.”

      One last point, from the methods it appears that the task was done with eyes closed? That is an extremely important point when considering the potential impact of alpha amplitude modulation on any other EEG component due to the well-known substantial increase in alpha amplitude with eyes closed versus open. I wonder, would we see any of these effects with eyes opened?

      The task was auditory and was indeed conducted in an eyes-closed state. In an eyes-closed state, alpha rhythm amplitude in the occipital regions shows a prominent increase. However, we believe that in our case, it was neither an advantage nor a disadvantage. First, occipital sources of alpha rhythm that demonstrate an increase in amplitude are not likely to be those sources that attenuate as a reaction to a target tone. The source reconstruction of alpha rhythm amplitude change (although with a limited number of channels) displayed widespread regions with a prominent decrease on the posterior midline, including the precuneus and posterior cingulate cortex (which contain polymodal association areas; Leech et al., Brain, 2014; Al-Ramadhani et al., Epileptic Disord, 2021). Second, in our previous study, we tested resting-state data with both eyes-closed and eyes-open conditions. There, we computed the baseline-shift index (BSI), which serves as an approximation for estimating if oscillations have a non-zero mean. We found no significant difference between the eyes-open and eyes-closed states in terms of the absolute value of the BSI. Moreover, the average distribution of BSIs on the scalp was the same for both conditions.

      Overall, there is a mix here of strengths of claims throughout the paper. For example, the first paragraph of the discussion starts out with "In the current study, we provided comprehensive evidence for the hypothesis that the baseline-shift mechanism (BSM) is accountable for the generation of P300 via the modulation of alpha oscillations." and ends with "Therefore, P300, at least to a certain extent, is generated as a consequence of stimulus-triggered modulation of alpha oscillations with a non-zero mean." In the limitations section, it says the current study speaks for a partial rather than exhausting explanation of the P300's origin. I would agree with the first part of that statement, that it is only partial. I do not agree, however, that it speaks to the ORIGIN of the P300, unless by origin one simply means the set of signals that go to make up the ERP component at the scalp-level (as opposed to neural origin).

      We have edited parts of the manuscript that have overly exuberant claims. However, we would argue further that alpha rhythm amplitude change does partially explain P300 origin. When a stimulus is being processed by the neuronal network, some part of this network presumably breaks from synchronous oscillation mode. Hence, on the scalp, we observe a decrease in oscillatory amplitude. According to the baseline-shift mechanism (BSM), this stimulus-related decrease in the amplitude generates the baseline shift in the frequency range of modulation (under 3 Hz for alpha rhythm). The P300 component that is explained by alpha rhythm amplitude modulation is, in essence, a baseline shift. Therefore, the origin of a part of P300 is the oscillating network that was pushed out of its synchronous oscillating regime.
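
      The baseline-shift argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis used in the study: the sampling rate, alpha frequency, mean offset, and the size of the amplitude drop are all hypothetical values chosen for illustration. It shows that when an oscillation with a non-zero mean decreases in amplitude, averaging out the oscillatory part leaves a slow shift proportional to the amplitude envelope.

```python
import numpy as np

def simulate_baseline_shift(fs=500, f_alpha=10.0, mean_offset=0.3, duration=2.0):
    """Simulate an oscillation with non-zero mean whose amplitude drops at t = 1 s.

    All parameter values are hypothetical and chosen only for illustration.
    """
    t = np.arange(0, duration, 1.0 / fs)
    # Amplitude envelope: 1.0 before the "stimulus" at 1 s, 0.5 afterwards
    envelope = np.where(t < 1.0, 1.0, 0.5)
    # Oscillation with a non-zero mean: the offset is scaled by the envelope too
    signal = envelope * (np.cos(2 * np.pi * f_alpha * t) + mean_offset)
    # A moving average over exactly one alpha cycle cancels the oscillatory part
    win = int(round(fs / f_alpha))
    kernel = np.ones(win) / win
    low_freq = np.convolve(signal, kernel, mode="valid")
    t_valid = t[win - 1:]  # time of the last sample in each averaging window
    return t_valid, low_freq

t_valid, low_freq = simulate_baseline_shift()
pre = low_freq[t_valid < 0.9].mean()   # low-frequency level before the amplitude drop
post = low_freq[t_valid > 1.2].mean()  # low-frequency level after the amplitude drop
print(pre, post)
```

      In this toy signal the low-frequency level follows the product of the mean offset and the envelope, so halving the oscillation amplitude halves the slow component. The direction of the shift depends on the sign of the assumed mean offset.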

      Again, I can only make these hopefully helpful criticisms and suggestions because the paper is very clearly written and well analysed. Also, the fact that alpha amplitude modulation potentially confounds with P300 amplitude via baseline shift is a valuable finding.

      Specific comments:

      Perhaps give a brief overview of the task involved at the start. I know it is not particularly relevant, but I think necessary for those unfamiliar with cog tasks.

      We added a short description of the task in the Introduction section.

      “In this data set, the experimental task was an auditory oddball paradigm. Participants would hear tones, one type of which—the target tone—would occur in only 12% of trials. Target tones elicit both P300 and the modulation of the alpha amplitude.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work provides new insights into history-dependent biases in human perceptual decision-making. It provides compelling behavioral and MEG evidence that humans adapt their history-dependent biases to the correlation structure of uncertain sensory environments. Further neural data analyses would strengthen some of the findings, and the studied bias would be more accurately framed as a stimulus- or outcome-history bias than a choice-history bias because tested subjects are biased not by their previous choice, but by the previous feedback (indicating the category of the previous stimulus).

      Thank you for your constructive evaluation of our manuscript. We have followed your suggestion to frame the studied bias as ‘stimulus history bias’. We now use this term whenever referring to our current results. Please note that we instead use the generic term ‘history bias’ when referring to the history biases studied in the previous literature on this topic in general. This is because these biases were dependent on previous choice(s), previous stimuli, or previous outcomes, or combinations of some (or all) of these factors. We have also added several of your suggested neural data analyses so as to strengthen the support for our conclusions, and we have elaborated on the Introduction so as to clarify the gaps in the literature that our study aims to fill. Our revisions are detailed in our replies below. We also took the liberty to reply to some points in the Public Review, which we felt called for clarification of the main aims (and main contribution) of our study.

      Reviewer #1 (Public Review):

      This paper aims to study the effects of choice history on action-selective beta band signals in human MEG data during a sensory evidence accumulation task. It does so by placing participants in three different stochastic environments, where the outcome of each trial is either random, likely to repeat, or likely to alternate across trials. The authors provide good behavioural evidence that subjects have learnt these statistics (even though they are not explicitly told about them) and that they influence their decision-making, especially on the most difficult trials (low motion coherence). They then show that the primary effect of choice history on lateralised beta-band activity, which is well-established to be linked to evidence accumulation processes in decision-making, is on the slope of evidence accumulation rather than on the baseline level of lateralised beta.

      The strengths of the paper are that it is: (i) very well analysed, with compelling evidence in support of its primary conclusions; (ii) a well-designed study, allowing the authors to investigate the effects of choice history in different stochastic environments.

      Thank you for pointing out these strengths of our study.

      There are no major weaknesses to the study. On the other hand, investigating the effects of choice/outcome history on evidence integration is a fairly well-established problem in the field. As such, I think that this provides a valuable contribution to the field, rather than being a landmark study that will transform our understanding of the problem.

      Your evaluation of the significance of our work made us realize that we may have failed to bring across the main gaps in the literature that our current study aimed to fill. We have now unpacked this in our revised Introduction.

      Indeed, many previous studies have quantified history-dependent biases in perceptual choice. However, the vast majority of those studies used tasks without any correlation structure; only a handful of studies have quantified history biases in tasks entailing structured environments, as we have done here (Abrahamyan et al., 2016; Kim et al., 2017; Braun et al., 2018; Hermoso-Mendizabal et al., 2020). The focus on correlated environments matters from an ecological perspective, because (i) natural environments are commonly structured rather than random (a likely reason for history biases being so prevalent in the first place), and (ii) history biases that change flexibly with the environmental structure are a hallmark of adaptive behavior. Critically, the few previous studies that have used correlated environments and revealed flexible/adaptive history biases were purely behavioral. Ours is the first to characterize the neural correlates of adaptive history biases.

      Furthermore, although several previous studies have identified neural correlates of history biases in standard perceptual choice tasks in unstructured environments (see (Talluri et al., 2021) for a brief overview), most have focused on static representations of the bias in ongoing activity preceding the new decision; only a single monkey physiology study has tested for both a static bias in the pre-stimulus activity and a dynamic bias building up during evidence accumulation (Mochol et al., 2021). Ours is the first demonstration of a dynamic bias during evidence accumulation in the human brain.

      The authors have achieved their primary aims and I think that the results support their main conclusions. One outstanding question in the analysis is the extent to which the source-reconstructed patches in Figure 2 are truly independent of one another (as often there is 'leakage' from one source location into another, and many of the different ROIs have quite similar overall patterns of synchronisation/desynchronisation.).

      We do not assume (and nowhere state) that the different ROIs are “truly independent” of one another. In fact, patterns of task-related power modulations of neural activity would be expected to be correlated between many visual and action-related cortical areas even without leakage (due to neural signal correlations). So, one should not assume independence even for intracortically recorded local field potential data, fMRI data, or other data with minimal spatial leakage effects. That said, we agree that filter leakage will add a (trivial) component to the similarity of power modulations across ROIs, which can and should be quantified with the analysis you propose.

      A possible way to investigate this further would be to explore the correlation structure of the LCMV beamformer weights for these different patches, to ask how similar/dissimilar the spatial filters are for the different reconstructed patches.

      Thank you for suggesting this analysis, which provides a very useful context for interpreting the pattern of results shown in our Figure 2. We have now computed (Pearson) correlation coefficients of the LCMV beamformer weights across the regions of interest. The results are shown in the new Figure 2 – figure supplement 1. This analysis provided evidence for minor leakage between the source estimates for neighboring cortical regions (filter correlations ≤ 0.22 on average across subjects) and negligible leakage for more distant regions. We now clearly state this when referring to Figure 2.
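
      A filter-correlation check of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes one spatial filter (weight vector) per region of interest, stored as rows of an array, and uses simulated weights in which two "neighboring" regions share structure. The sensor count and the region labels are hypothetical.

```python
import numpy as np

def filter_correlations(weights):
    """Pairwise Pearson correlations between spatial filters.

    weights: array of shape (n_rois, n_sensors), one filter per ROI
    (this layout is an assumption made for the sketch).
    """
    # np.corrcoef treats each row as one variable and returns an
    # (n_rois, n_rois) matrix of Pearson correlation coefficients
    return np.corrcoef(weights)

rng = np.random.default_rng(0)
n_sensors = 270  # hypothetical number of sensors
shared = rng.standard_normal(n_sensors)
weights = np.stack([
    shared + 0.5 * rng.standard_normal(n_sensors),  # ROI A
    shared + 0.5 * rng.standard_normal(n_sensors),  # ROI B, simulated neighbor of A
    rng.standard_normal(n_sensors),                 # ROI C, simulated distant region
])
corr = filter_correlations(weights)
print(np.round(corr, 2))
```

      Off-diagonal entries close to zero indicate spatial filters that pick up largely distinct sensor patterns, whereas large entries flag pairs of regions whose estimates may share signal through leakage.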

      That said, we would also like to clarify our reasoning behind Figure 2. Our common approach to these source-reconstructed MEG data is to focus on the differences, rather than the similarities between ROIs, because the differences cannot be accounted for by leakage. Our analyses show clearly distinct, and physiologically plausible functional profiles across ROIs (motion coherence encoding in visual regions, action choice coding in motor regions), in line with other work using our general approach (Wilming et al., 2020; Murphy et al., 2021; Urai and Donner, 2022).

      Most importantly, our current analyses focus on the impact of history bias on the build-up of action-selective activity in downstream, action-related areas; and we chose to focus on M1 only in order to avoid hard-to-interpret comparisons between neighboring action-related regions. Figure 2 is intended as a demonstration of the data quality (showing sensible signatures for all ROIs) and as a context for the interpretation of our main neural results from M1 shown in the subsequent figures. So, all our main conclusions are unaffected by leakage between ROIs.

      We have now clarified these points in the paper.

      Reviewer #2 (Public Review):

      In this work, the authors use computational modeling and human neurophysiology (MEG) to uncover behavioral and neural signatures of choice history biases during sequential perceptual decision-making. In line with previous work, they see neural signatures reflecting choice planning during perceptual evidence accumulation in motor-related regions, and further show that the rate of accumulation responds to structured, predictable environments suggesting that statistical learning of environment structure in decision-making can adaptively bias the rate of perceptual evidence accumulation via neural signatures of action planning. The data and evidence show subtle but clear effects, and are consistent with a large body of work on decision-making and action planning.

      Overall, the authors achieved what they set out to do in this nice study, and the results, while somewhat subtle in places, support the main conclusions. This work will have impact within the fields of decision-making and motor planning, linking statistical learning of structured sequential effects in sense data to evidence accumulation and action planning.

      Strengths:

      • The study is elegantly designed, and the methods are clear and generally state-of-the-art

      • The background leading up to the study is well described, and the study itself conjoins two bodies of work - the dynamics of action-planning processes during perceptual evidence accumulation, and the statistical learning of sequential structure in incoming sense data

      • Careful analyses effectively deal with potential confounds (e.g., baseline beta biases)

      Thank you for pointing out these strengths of our study.

      Weaknesses:

      • Much of the study is primarily a verification of what was expected based on previous behavioral work, with the main difference (if I'm not mistaken) being that subjects learn actual latent structure rather than expressing sequential biases in uniform random environments.

      As we have stated in our reply to the overall assessment above, we realize that we may have failed to clearly communicate the novelty of our current results, and we have revised our Introduction accordingly. It is true that most previous studies of history biases in perceptual choice have used standard tasks without across-trial correlation structure. Only a handful of studies have quantified history biases in tasks entailing structured environments that varied from one condition to the next (Abrahamyan et al., 2016; Kim et al., 2017; Braun et al., 2018; Hermoso-Mendizabal et al., 2020), and showed that history biases change flexibly with the environmental structure. Our current work adds to this emerging picture, using a specific task setting analogous to one of these previous studies done in rats (Hermoso-Mendizabal et al., 2020).

      Critically, all the previous studies that have revealed flexible/adaptive history biases in correlated environments were purely behavioral. Ours is the first to characterize the neural correlates of adaptive history biases. And it is also the very first demonstration of a dynamic history-dependent bias (i.e., one that gradually builds up during evidence accumulation) in the human brain.

      Whether this difference - between learning true structure or superstitiously applying it when it's not there - is significant at the behavioral or neural level is unclear. Did the authors have a hypothesis about this distinction? If the distinction is not relevant, is the main contribution here the neural effect?

      We are not quite sure what exactly you mean by “is significant”, so we will reply to two possible interpretations of this statement.

      The first is that you may be asking for evidence for any difference between the estimated history biases in the structured (i.e., Repetitive, Alternating) vs. the unstructured (i.e., Neutral) environments used in our experiment. We do, in fact, provide quantitative comparisons between the history biases in the structured and Neutral environments at the behavioral level. Figure 1D and Figure 1 – figure supplement 2A and accompanying text show a robust and statistically significant difference in history biases. Specifically, the previous stimulus weights differ between each of the biased environments and the Neutral environment, and the weights shifted in expected and opposite directions for both structured environments, indicating a tendency to repeat the previous stimulus category in Repetitive and vice versa in Alternating (Figure 1D). Going further, we also demonstrate that the adjustment of the history bias is behaviorally relevant in that it improves performance in the two structured environments, but not in the unstructured environment (Figure 1F and Figure 1 – figure supplement 2A and figure supplement 3).

      The second is that you refer to the question of whether the history biases are generated via different computations in structured vs. random environments. Indeed, this is a very interesting and important question. We cannot answer this question based on the available results, because we here used a statistical (i.e., descriptive) model. Addressing this question would require developing and fitting a generative model of the history bias and comparing the inferred latent learning processes between environments. This is something we are doing in ongoing work.

      • The key effects (Figure 4) are among the more statistically on-the-cusp effects in the paper, and the Alternating group in 4C did not reliably go in the expected direction. This is not a huge problem per se, but does make the key result seem less reliable given the clear reliability of the behavioral results

      The model-free analyses in Figure 3C and 4B, C from the original version of our manuscript were never intended to demonstrate the “key effects”, but only to supplement the results from the model-based analyses in Figures 3C and 4D, E in our current version of the manuscript. The latter show the “key effects” because they are a direct demonstration of the shaping of the build-up of action-selective activity by history bias.

      To clarify this, we now decided to focus Figures 3 and 4 on the model-based analyses only. This decision was further supported by noticing a confound in our model-independent analyses in new control analyses prompted by Reviewer #3.

      Please note that the alternating bias in the Alternating environment is also less strong at the behavioral level compared to the bias in the Repetitive condition (see Figure 1D). A possible explanation is that a sequence of repetitive stimuli produces stronger prior expectations (for repetition) than an equally long sequence of alternating stimuli (Meyniel et al., 2016). This might also induce the bias to repeat the previous stimulus category in the Neutral condition (Figure 1D). Moreover, this intrinsic repetition bias might counteract the bias to alternate the previous stimulus category in Alternating.

      • The treatment of "awareness" of task structure in the study (via informal interviews in only a subsample of subjects) is wanting

      Agreed. We have now removed this statement from the Discussion.

      Reviewer #3 (Public Review):

      This study examines how the correlation structure of a perceptual decision making task influences history biases in responding. By manipulating whether stimuli were more likely to be repetitive or alternating, they found evidence from both behavior and a neural signal of decision formation that history biases are flexibly adapted to the environment. On the whole, these findings are supported across an impressive range of detailed behavioral and neural analyses. The methods and data from this study will likely be of interest to cognitive neuroscience and psychology researchers. The results provide new insights into the mechanisms of perceptual decision making.

      The behavioral analyses are thorough and convincing, supported by a large number of experimental trials (~600 in each of 3 environmental contexts) in 38 participants. The psychometric curves provide clear evidence of adaptive history biases. The paper then goes on to model the effect of history biases at the single trial level, using an elegant cross-validation approach to perform model selection and fitting. The results support the idea that, with trial-by-trial accuracy feedback, the participants adjusted their history biases due to the previous stimulus category, depending on the task structure in a way that contributed to performance.

      Thank you for these nice words on our work.

      The paper then examines MEG signatures of decision formation, to try to identify neural signatures of these adaptive biases. Looking specifically at motor beta lateralization, they found no evidence that starting-level bias due to the previous trial differed depending on the task context. This suggests that the adaptive bias unfolds in the dynamic part of the decision process, rather than reflecting a starting level bias. The paper goes on to look at lateralization relative to the chosen hand as a proxy for a decision variable (DV), whose slope is shown to be influenced by these adaptive biases.

      This analysis of the buildup of action-selective motor cortical activity would be easier to interpret if its connection with the DV was more explicitly stated. The motor beta is lateralized relative to the chosen hand, as opposed to the correct response which might often be the case. It is therefore not obvious how the DV behaves in correct and error trials, which are combined together here for many of the analyses.

      We have now unpacked the connection of the action-selective motor cortical activity and decision variable in the manuscript, as follows:

      “This signal, referred to as ‘motor beta lateralization’ in the following, has been shown to exhibit hallmark signatures of the DV, specifically: (i) selectivity for choice and (ii) ramping slope that depends on evidence strength (Siegel et al., 2011; Murphy et al., 2021; O’Connell and Kelly, 2021).”

      Furthermore, we have added a figure of the time course of the motor beta lateralization separately for correct and error trials, locked to both stimulus onset and to motor response (Figure 2 – figure supplement 2). This signal reached statistical significance earlier for correct than error trials, and during the stimulus interval it ramped to a larger (i.e., more negative) amplitude for correct trials (Figure 2 – figure supplement 2, left). But the signal was indistinguishable in amplitude between correct and error trials around the time of the motor response (Figure 2 – figure supplement 2, right). This pattern matches what would be expected for a neural signature of the DV, because errors are more frequently made on weak-evidence trials than correct choices and because even for matched evidence strength, the DV builds up more slowly before error trials in accumulator models (Ratcliff and McKoon, 2008).

      --

      As you will see, all three reviewers found your work to provide valuable insights into history-dependent biases during perceptual decision-making. During consultation between reviewers, there was agreement that what is referred to as a choice-history bias in the current version of the manuscript should rather be framed as a stimulus- or outcome-history bias (despite the dominant use of the term 'choice-history' bias in the existing literature), and the reviewers pointed toward further analyses of the neural data which they thought would strengthen some of the claims made in the preprint. We hope that these comments will be useful if you wish to revise your preprint.

      We are pleased to hear that the reviewers think our work provides valuable insights into history-dependent biases in perceptual decision-making. We thank you for your thoughtful and constructive evaluation of our manuscript.

      We have followed your suggestion to frame the studied bias as ‘stimulus history bias’. We now use this term whenever referring to our current results. Please note that we instead use the generic term ‘history bias’ when referring to the history biases studied in the previous literature on this topic in general. This is because these biases were dependent on previous choice(s), previous stimuli, or previous outcomes, or combinations of some (or all) of these factors.

      We have also performed several of your suggested neural data analyses so as to strengthen the support for our conclusions.

      Reviewer #1 (Recommendations For The Authors):

      One suggestion is to explore the correlation structure of the LCMV beamformer weights for the regions of interest in the study, for the reasons outlined in my public review.

      Again, thank you for suggesting this analysis, which provides a very useful context for interpreting the pattern of results shown in our Figure 2. We have now computed (Pearson) correlation coefficients of the LCMV beamformer weights across the regions of interest. The results are shown in the new Figure 2 – figure supplement 1. This analysis provided evidence for minor leakage between the source estimates for neighboring cortical regions (filter correlations ≤ 0.22 on average across subjects) and negligible leakage for more distant regions. We now clearly state this when referring to Figure 2.

      That said, we would also like to clarify our reasoning behind Figure 2. Our common approach to these source-reconstructed MEG data is to focus on the differences, rather than the similarities between ROIs, because the differences cannot be accounted for by leakage. Our analyses show clearly distinct, and physiologically plausible functional profiles across ROIs (motion coherence encoding in visual regions, action choice coding in motor regions), in line with other work using our general approach (Wilming et al., 2020; Murphy et al., 2021; Urai and Donner, 2022).

      Most importantly, our current analyses focus on the impact of history bias on the build-up of action-selective activity in downstream, action-related areas; and we chose to focus on M1 only in order to avoid hard-to-interpret comparisons between neighboring action-related regions. Figure 2 is intended as a demonstration of the data quality (showing sensible signatures for all ROIs) and as a context for the interpretation of our main neural results from M1 shown in the subsequent figures. So, all our main conclusions are unaffected by leakage between ROIs.

      We have now also clarified these points in the paper.

      I also wondered if the authors had considered:

      (i) the extent to which the bias changes across time, as the transition probabilities are being learnt across the experiment? given that these are not being explicitly instructed to participants, is any modelling possible of how the transition structure is itself being learnt over time, and whether this makes predictions of either behaviour or neural signals?

      We refer to this point in the Discussion. The learning of the transition probabilities can and should be addressed. This requires generative models that capture the learning of the transition structure over time (Yu and Cohen, 2009; Meyniel et al., 2016; Glaze et al., 2018; Hermoso-Mendizabal et al., 2020).

      The fact that our current statistical modeling approach successfully captures the bias adjustment between environments implies that the learning must be sufficiently fast. Tracking this process explicitly would be an exciting and important endeavor for the future. We think it is beyond the scope of the present study focusing on the trial-by-trial effect of history bias (however generated) on the build-up of action-selective activity.

      (ii) neural responses at the time of choice outcome - given that so much of the paper is about the update of information in different statistical environments, it seems a shame that no analyses are included of feedback processing, how this differs across the different environments, and how might be linked to behavioural changes at the next trial.

      We agree that the neural responses to feedback are a very interesting topic. We currently analyze these in another ongoing project on (outcome) history bias in a foraging task. We will consider re-analyzing the feedback component in the current data set, in this new study as well.

      However, this is distinct from the main question that is in the focus of our current paper – which, as elaborated above, is important to answer: whether and how adaptive history biases shape the dynamics of action-selective cortical activity in the human brain. While interesting and important, neural responses to feedback were not part of this question. So, we prefer to keep the focus of our paper on our original question.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      -pg. 7: "inconstant"

      -some citations (e.g., Barbosa 2020) are missing from the bibliography

      Thank you for pointing this out. We have fixed these.

      -figure S2 is very useful! could probably go in main text.

      We agree that this figure is important. But we decided to show it in the Supplement (now Figure 1 – figure supplement 2) after careful consideration for two reasons. First, we wanted to put the reader’s focus on the stimulus weights, because it is those weights that are flexibly adjusted to the statistics of the environment, rather than the choice weights, which seem less adaptive (i.e., stereotypical across environments) and idiosyncratic. Second, plotting only the previous stimulus weights enabled us to add the individual weights in the Neutral condition, which would have been too cluttered to add to figure S2.

      For these reasons, we feel that this Figure is more suitable for expert readers with a special interest in the details of the behavioral analyses and would be better placed in the Supplement. These readers will certainly be able to find and interpret that information in the Supplement.

      Reviewer #3 (Recommendations For The Authors):

      I would suggest that a more in depth description of the previous literature that explains exactly how the features of the lateralized beta--as it is formulated here-- reflect the decision variable would assist with the readers' understanding. A demonstration of how the lateralized beta behaves under different coherence conditions, or for corrects vs errors, for example, might be helpful for readers.

      We now provide a more detailed description of how/why the motor beta lateralization is a valid proxy of DV in the revised paper.

      We have demonstrated the dependence of the ramping of the motor beta lateralization on the motion coherence using a regression model with current signed motion coherence as well as single trial bias as regressors. The beta weights describing the impact of the signed motion coherence on the amplitude as well as on the slope of the motor beta lateralization are shown in Figure 4G (now 4E). As expected, stronger motion coherence induces a steeper downward slope of the motor beta lateralization.
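
      The form of such a single-trial regression can be sketched as below. This is a hedged illustration on simulated data, not the model fitted in the paper: the variable names, the ordinary least-squares fit, the simulated weights, and the noise level are all assumptions, and the sign convention of the slope is arbitrary here.

```python
import numpy as np

def fit_slope_regression(slope, signed_coherence, bias):
    """Regress a single-trial slope measure on signed coherence and single-trial bias.

    Returns the weights [intercept, coherence weight, bias weight] from an
    ordinary least-squares fit (an assumption made for this sketch).
    """
    design = np.column_stack([np.ones_like(signed_coherence), signed_coherence, bias])
    beta, *_ = np.linalg.lstsq(design, slope, rcond=None)
    return beta

rng = np.random.default_rng(1)
n_trials = 2000
signed_coherence = rng.uniform(-1.0, 1.0, size=n_trials)  # hypothetical values
bias = rng.standard_normal(n_trials)                      # hypothetical single-trial bias
# Simulated slopes with known weights (0.2, 1.5, 0.7) plus noise
slope = 0.2 + 1.5 * signed_coherence + 0.7 * bias + 0.1 * rng.standard_normal(n_trials)

beta = fit_slope_regression(slope, signed_coherence, bias)
print(np.round(beta, 2))
```

      Because both regressors enter the same model, the bias weight reflects the contribution of history bias to the slope over and above the effect of the current evidence strength.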

      Furthermore, we have added a figure of the time course of the motor beta lateralization separately for correct and error trials, locked to both stimulus onset and to motor response (Figure 2 – figure supplement 2). This signal reached statistical significance earlier for correct than error trials, and during the stimulus interval it ramped to a larger (i.e., more negative) amplitude for correct trials (Figure 2 – figure supplement 2, left). But the signal was indistinguishable in amplitude between correct and error trials around the time of the motor response (Figure 2 – figure supplement 2, right). This pattern matches what would be expected for a neural signature of the DV, because errors are more frequently made on weak-evidence trials than correct choices and because even for matched evidence strength, the DV builds up more slowly before error trials in accumulator models (Ratcliff and McKoon, 2008).

      Finally, please note that our previous studies have demonstrated that the time course of the beta lateralization during the trial closely tracks the time course of a normative model-derived DV (Murphy et al., 2021) and that the motor beta ramping slope is parametrically modulated by motion coherence (de Lange et al., 2013), which is perfectly in line with the current results.

      Along similar lines, around figures 3c and 4B, some control analyses may be helpful to clarify whether there are differences between the groups of responses consistent and inconsistent with the previous trial (e.g. correctness, coherence) that differ between environments, and also could influence the lateralized beta.

      Thank you for pointing us to this important control analysis. We have done this, and indeed, it identified accuracy and motion strength as possible confounds (Author response image 1). Specifically, proportion correct as well as motion coherence were larger for consistent vs. inconsistent conditions in Repetitive and vice versa in Alternating. Those differences in accuracy and coherence might indeed influence the slope of the motor beta lateralization that our model-free analysis had identified, rendering the resulting difference between consistent and inconsistent difficult to interpret unambiguously in terms of bias. Thus, we have decided to drop the consistency (i.e., model-independent) analysis and focus completely on the model-based analyses.

      Author response image 1.

      Proportion correct and motion coherence split by environment and consistency of current choice and previous stimulus. In the Repetitive environment (Rep.), accuracy and motion coherence are larger for current choice consistent vs. inconsistent with previous stimulus category and vice versa in the Alternating environment (Alt.).

      Importantly, this decision has no implications for the conclusions of our paper: The model-independent analyses in the original versions of Figure 3 and 4 were only intended as a supplement to the most conclusive and readily interpretable results from the model-based analyses (now in Figs. 3C and 4D, E). The latter are the most direct demonstration of a shaping of build-up of action-selective activity by history bias, and they are unaffected by these confounds.

      In addition, I wondered whether the bin subsampling procedure to match trial numbers for choice might result in unbalanced coherences between the up and down choices.

      The subsampling itself did not cause any unbalanced coherences between the up and down choices, which we now show in Figure 4 – figure supplement 1. A slight imbalance in coherences between up and down choices was already present before the subsampling and carried over into the subsampled trials, but the coherences were distributed in the same way before and after the subsampling.

      Also, please note that the purpose of this analysis was to make the neural bias directly “visible” in the beta lateralization data, rather than just regression weights. The issue does not pertain to the critical single-trial regression analysis, which yielded consistent results.

      References

      Abrahamyan A, Silva LL, Dakin SC, Carandini M, Gardner JL (2016) Adaptable history biases in human perceptual decisions. Proceedings of the National Academy of Sciences 113:E3548–E3557.

      Braun A, Urai AE, Donner TH (2018) Adaptive History Biases Result from Confidence-weighted Accumulation of Past Choices. The Journal of Neuroscience:2189–17.

      de Lange FP, Rahnev DA, Donner TH, Lau H (2013) Prestimulus Oscillatory Activity over Motor Cortex Reflects Perceptual Expectations. Journal of Neuroscience 33:1400–1410.

      Glaze CM, Filipowicz ALS, Kable JW, Balasubramanian V, Gold JI (2018) A bias–variance trade-off governs individual differences in on-line learning in an unpredictable environment. Nat Hum Behav 2:213–224.

      Hermoso-Mendizabal A, Hyafil A, Rueda-Orozco PE, Jaramillo S, Robbe D, de la Rocha J (2020) Response outcomes gate the impact of expectations on perceptual decisions. Nat Commun 11:1057.

      Kim TD, Kabir M, Gold JI (2017) Coupled Decision Processes Update and Maintain Saccadic Priors in a Dynamic Environment. The Journal of Neuroscience 37:3632–3645.

      Meyniel F, Maheu M, Dehaene S (2016) Human Inferences about Sequences: A Minimal Transition Probability Model. Gershman SJ, ed. PLOS Computational Biology 12:e1005260.

      Mochol G, Kiani R, Moreno-Bote R (2021) Prefrontal cortex represents heuristics that shape choice bias and its integration into future behavior. Current Biology 31:1234-1244.e6.

      Murphy PR, Wilming N, Hernandez-Bocanegra DC, Prat-Ortega G, Donner TH (2021) Adaptive circuit dynamics across human cortex during evidence accumulation in changing environments. Nat Neurosci 24:987–997.

      O’Connell RG, Kelly SP (2021) Neurophysiology of Human Perceptual Decision-Making. Annu Rev Neurosci 44:495–516.

      Ratcliff R, McKoon G (2008) The Diffusion Decision Model: Theory and Data for Two-Choice Decision Tasks. Neural Computation 20:873–922.

      Siegel M, Engel AK, Donner TH (2011) Cortical Network Dynamics of Perceptual Decision-Making in the Human Brain. Frontiers in Human Neuroscience 5 Available at: http://journal.frontiersin.org/article/10.3389/fnhum.2011.00021/abstract [Accessed April 8, 2017].

      Talluri BC, Braun A, Donner TH (2021) Decision making: How the past guides the future in frontal cortex. Current Biology 31:R303–R306.

      Urai AE, Donner TH (2022) Persistent activity in human parietal cortex mediates perceptual choice repetition bias. Nat Commun 13:6015.

      Wilming N, Murphy PR, Meyniel F, Donner TH (2020) Large-scale dynamics of perceptual decision information across human cortex. Nat Commun 11:5109.

      Yu A, Cohen JD (2009) Sequential effects: Superstition or rational behavior. Advances in neural information processing systems 21:1873–1880.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this ms, Tejeda-Muñoz and colleagues examine the roles of macropinocytosis in WNT signalling activation in development (Xenopus) and cancer (CRC sections, cell lines and xenograft experiments). Furthermore, they investigate the effect of the inflammation inducer Phorbol-12-myristate-13-acetate (PMA) in WNT signalling activation through macropinocytosis. They propose that macropinocytosis is a key driver of WNT signalling, including upon oncogenic activation, with relevance in cancer progression.

      I found the analyses and conclusions of the relevance of macropinocytosis in WNT signalling compelling, notably upon constitutive activation both during development and in CRC.

      Thank you.

      However, I think this manuscript only partially characterises the effects of PMA in WNT signalling, largely due to a lack of an epistatic characterisation of PMA roles in Wnt activation. For example: 1- The authors show that PMA cooperates with 1) GSK3 inhibition in Xenopus to promote WNT activation, and 2) (possibly) with APCmut in SW480 to induce b-cat and FAK accumulation. To sustain a specific functional interaction between WNT and PMA, the effects should be tested through additional epistatic experiments. For example, does PMA cooperate with Wnt8 in axis duplication analyses? Does PMA cooperate with any other WNT alteration in CRC or other cell lines? Importantly, does APC re-introduction in SW480 rescue the effect of PMA? Such analyses could be critical to determine specificity of the functional interactions between WNT and PMA. This question could be addressed by performing classical epistatic analyses in cell lines (CRC or HEK) focusing on WNT activity, and by including rescue experiments targeting the WNT pathway downstream of the effects e.g., dnTCF, APC re-introduction, etc.

      We agree that there was a need for additional direct evidence of functional interactions between macropinocytosis, Wnt signaling, and PMA beyond the previously provided target gene assays in Xenopus (now shown in Figure 1I) and luciferase assays in cultured cells (Figure 1J), which used LiCl and inhibition by Bafilomycin. We therefore carried out a new experiment using 3T3 cells, now shown in Figure 1K-P. Wnt3a protein increased the uptake of TMR-dextran 70 kDa, and PMA enhanced this response. The macropinocytosis inhibitor EIPA blocked induction of macropinocytosis by Wnt3a and PMA. These results were quantitated in Figure 1Q. We think this new experiment strengthens the main conclusion that the tumor promoter PMA increases macropinocytosis. Thank you.

      2) While the epistatic analyses of WNT and macropinocytosis are clear in frog, the causal link in CRC cells is contained to b-catenin accumulation. While it is clear that macropinocytosis reduces spheroid growth in SW480, the lack of rescue experiments with e.g., constitutive active b-catenin or any other WNT perturbation and/or APC re-introduction, limits the conclusions of this experiment.

      We now provide new experiments in 3T3 cells treated with LiCl or overexpressing constitutively active β-catenin or constitutively active Lrp6 (Figure 4, panels I through L’’); the new results indicate that Wnt signaling activation increases protein levels of the macropinocytosis activator Rac1.

      Minor comments:

      3- Different compounds targeting membrane trafficking are used to rescue modes of WNT activation (Wnt8 vs LiCl) in Xenopus.

      The main goal of our experiments was to test the requirement of membrane trafficking for tumor promoter activity through the Wnt pathway. We therefore used PMA and a variety of inhibitors such as EIPA (a Na+/H+ exchanger inhibitor, Figure 1I and Figure 3D), Bafilomycin A (Figure 1H), DN-Rab7 (Figure 3G) and EHT1864 (a Rac1 inhibitor, Figure 4G). One could argue that using a wide variety of membrane trafficking inhibitors is a plus.

      4- The abstract does not state the results in CRC/xenografts

      We have added a sentence to the abstract.

      5- Labels of Figure 2E might be swapped

      Thank you for detecting this error; we now label the last two columns in Figure 2E correctly.

      6- Figure 4i,j, 6 and s4 rely on qualitative analyses instead of quantifications, which underscores their evaluation. On the other hand, the detailed quantifications in Figure S3A-D strongly support the images of Figure 5

      The quantifications of the previous Figure 4I-J supported the data in the initial reviewed preprint, shown in Author response image 1:

      Author response image 1.

      However, these data have now been deleted from this version to make space for new experiments showing the stabilization of Rac1 by stabilized β-catenin and CA-LRP6. Quantifications in Figure 6C-F’’ are not shown because they represent changes in subcellular localization, but a western blot is provided in Figure 6B. Quantifications for Figure 6H-I’’ are shown in panel 6G. Supplemental Figure S4 already has 24 panels so introducing quantifications would be unwieldy.

      Thank you for the thoughtful comments.

      Reviewer #2 (Public Review):

      Tejeda Muñoz et al. investigate the intersection of Wnt signaling, macropinocytosis, lysosomes, focal adhesions and membrane trafficking in embryogenesis and cancer. Following up on their previous papers, the authors present evidence that PMA enhances Wnt signaling and embryonic patterning through macropinocytosis. Proteins that are associated with the endo-lysosomal pathway and Wnt signaling are co-increased in colorectal cancer samples, consistent with their pro-tumorigenic action. The function of macropinocytosis is not well understood in most physiological contexts, and its role in Wnt signaling is intriguing. The authors use a wide range of models - Xenopus embryos, cancer cells in culture and in xenografts and patient samples to investigate several endolysosomal processes that appear to act upstream or downstream of Wnt. A downside of this broad approach is a lack of mechanistic depth. In particular, few experiments monitor macropinocytosis directly, and macropinocytosis manipulations have pleiotropic effects that are open to alternative interpretations. Several experiments are confirmatory of previous findings; the manuscript could be improved by focusing on the novel relationship between PMA-induced macropinocytosis and better supporting these conclusions with additional experiments.

      New additional experiments focusing on the role of PMA are now provided.

      The authors use a range of inhibitors that suppress macropinosome formation (EIPA, Bafilomycin A1, Rac1 inhibition). However, these are not specific macropinocytosis inhibitors (EIPA blocks an Na+/H+ exchanger, which is highly toxic and perturbs cellular pH balance; Bafilomycin blocks the V-ATPase, which has essential functions in the Golgi, endosomes and lysosomes; Rac1 signals through multiple downstream pathways). A specific macropinocytosis inhibitor does not exist, and it is thus important to support key conclusions with dextran uptake experiments.

      We used a wide range of inhibitors because the main idea is to show that membrane trafficking is important in Wnt and PMA activity. We would like to point out that the current experimental definition of macropinocytosis in the field, despite any caveats, is the ability to block dextran uptake with EIPA. Because inhibitors may not be entirely specific, we think using a broad approach to target membrane trafficking might be a plus. We now provide in Figure 1K-Q a new experiment showing that Wnt3a protein treatment increases dextran uptake and PMA stimulates this macropinocytosis in 3T3 cells. EIPA inhibited dextran macropinocytosis in the presence of Wnt and PMA (Figure 1N and 1Q). We also provide a time-lapse video of the rapid induction of macropinocytic vesicles by PMA in SW480 CRC cells in which the plasma membrane is tagged (Supplemental Movie S1).

      The title states that PMA increases Wnt signaling through macropinocytosis. However, the mechanistic relationship between PMA-induced macropinocytosis and Wnt signaling is not well supported. The authors refer to a classical paper that demonstrates macropinocytosis induction by PMA in macrophages (PMID: 2613767). Unlike most cell types, macrophages display growth factor-induced and constitutive macropinocytic pathways (PMID: 30967001). It would thus be important to demonstrate macropinocytosis induction by PMA experimentally in Xenopus embryos / cancer cells. Does treatment with EIPA / Bafilomycin / Rac1i decrease the dextran signal in embryos? In macrophages, the PKC inhibitor Calphostin C blocks macropinocytosis induction by PMA (PMID: 25688212). Does Calphostin C block macropinocytosis in embryos / cancer cells? Do the various combinations of Wnts / Wnt agonists and PMA have additive or synergistic effects on dextran uptake? If the authors want to conclude that PMA activates Wnt signaling, it would also be important to demonstrate the effect of PMA on Wnt target gene expression.

      We now provide a new experiment showing macropinocytosis induction by PMA experimentally in cancer cells. CRC SW480 cells, despite having a mutant APC, are able to respond to PMA by further increasing TMR-dextran 70 kDa uptake over background within 1 hour (now shown in Figure S1):

      Investigating PKC and Calphostin C is outside the goals of this paper. With respect to the final point on the effect of PMA on Wnt target gene expression, this was shown in the context of the Xenopus embryo in Figure 1I (Siamois and Xnr3 are direct targets of Wnt).

      Author response image 2.

      The experiments concerning macropinosome formation in Xenopus embryos are not very convincing. Macropinosomes are circular vesicles whose size in mammalian cells ranges from 0.2 - 10 µm (PMID: 18612320). The TMR-dextran signal in Fig. 1A does not obviously label structures that look like macropinosomes; rather the signal is diffusely localized throughout the dorsal compartment, which could be extracellular (or perhaps cytosolic). I have similar concerns for the cell culture experiments, where dextran uptake is only shown for SW480 spheroids in Fig. S2. It would be helpful to quantify size of the circular structures (is this consistent with macropinosomes?).

      In response, we have deleted the TMR experiments in Xenopus embryos; they will be reinvestigated at a later time. With respect to macropinosome sizes in cultured cells, they are indeed large at the plasma membrane level (see new Supplemental Movie S1), but rapidly decrease in size once dextran is concentrated inside the cell. This can be visualized in the new experiments showing dextran vesicles in Supplemental Figure S1J-K and Figure 1K-P.

      In Fig. 4I - J, the dramatic decrease in b-catenin and especially in Rac1 after overnight EIPA treatment is rather surprising. How do the authors explain these findings? Is there any evidence that macropinocytosis stabilizes Rac1? Could this be another effect of EIPA or general toxicity?

      We now provide new evidence that Wnt signaling stabilizes Rac1. The old data relying on overnight EIPA treatment have been replaced by new experiments in 3T3 cells showing (i) that LiCl treatment increases Rac1 and β-catenin protein levels (Figure 4I-J’’), (ii) that cells transfected with constitutively active β-catenin-GFP have higher levels of Rac1 than control untransfected cells (Figure 4K-K’’) and (iii) that Rac1 is stabilized in cells transfected with CA-Lrp6-GFP when compared to untransfected cells (Figure 4L-L’’).

      On a similar note, Fig. 6 K - L the FAK staining in control cells appears to localize to focal adhesions, but in PMA-treated cells is strongly localized throughout the cell. Do the authors have any thoughts on how PMA stabilizes FAK and where the kinase localizes under these conditions? Does PMA treatment increase FAK signaling activity?

      The previous Figure 6K-L’’ panels are now found in Supplementary Figure S4, panels C-D’’. The result is that FAK is greatly stabilized by overnight incubation with PMA. How this is achieved is unknown; it may be the result of increased macropinocytosis, but we do not wish to speculate in the main manuscript. We have not measured FAK activity, but the FAK inhibitor PF-00562271 strongly decreased β-catenin signaling induced by GSK3 inhibition (Figure 6J) and has strong effects in neural development that mimic inhibition of the early Wnt signal (new experiments shown in Figure 6K-L’’’). The results suggest that FAK activity affects Wnt signaling and dorsal development; the molecular mechanism of this interaction is unknown but worthy of future studies.

      The tumor stainings in Figure 5 are interesting but correlative. Pak1 functions in multiple cellular processes and Pak1 levels are not a direct marker for macropinocytosis. In the discussion, the authors discuss evidence that the V-ATPase translocates to the plasma membrane in cancer to drive extracellular acidification. To which extent does the Voa3 staining reflect lysosomal V-ATPase? Do the authors have controls for antibody specificity?

      It is true that Pak1 has multiple functions, yet it is essential for the actin machinery that drives macropinocytosis. We have now rephrased the discussion to say “Rac1 is an upstream regulator of the Pak1 kinase required for the actin machinery that drive macropinocytosis (Redelman-Sidi et al., 2018)”. We also explain that: “V-ATPase has been associated with acidification of the extracellular milieu in tumors (Capecci and Forgac, 2013; Hinton et al., 2009; Perona and Serrano, 1988). Extracellular acidification is probably due to increased numbers of lysosomes which are exocytosed, since V0a3 was located within the cytoplasm in advanced cancer or xenografts in mice (Figures 5I and S3I)”. The antibody we used for V0a3 is highly specific and has been used widely (Ramirez et al., 2019).

      Reviewer #3 (Public Review):

      The manuscript by Tejeda-Munoz examines signaling by Wnt and macropinocytosis in Xenopus embryos and colon cancer cells. A major problem with the study is the extensive use of pleiotropic inhibitors as "specific" inhibitors of macropinocytosis in embryos. It is true that BafA and EIPA block macropinocytosis, but they do many other things as well. A major target of EIPA is the NheI Na+/proton transporter, which also regulates invasive structures (podosomes, invadopodia) which could have major roles in development. Similarly, Baf1 will disrupt lysosomes and the endocytic system, which secondary effects on mTOR signaling and growth factor receptor trafficking. The authors cannot assume that processes inhibited by these drugs demonstrate a role of macropinocytosis. While correlations in tumor samples between increased expression of PAK1 and V0a3 and decreased expression of GSK3 are consistent with a link between macropinocytosis and Wnt-driven malignancy, the cell and embryo-based experiments do not convincingly make this connection. Finally, the data on FAK and TES are not well integrated with the rest of the manuscript.

      The criticism that drugs are not entirely specific is a valid one. Our approach of using a variety of drugs, such as EIPA, BafA, EHT1864 and the FAK inhibitor PF-00562271, points to the main conclusion that membrane trafficking is important in signaling by Wnt and in the action of the tumor promoter PMA. The data on FAK, TES and focal adhesions have been better integrated in the manuscript, and new experiments on the effect of the FAK inhibitor on embryonic dorsal development are now provided (Figure 6K-L’’’).

      1) The data in Fig. 1A do not convincingly demonstrate macropinocytosis - it is impossible to tell what is being labeled by the dextran.

      In response, we have deleted the TMR-dextran experiments in Xenopus embryos; they will be reported at a later time.

      2) The data in Fig. 2 do not make sense. LiCl2 bypasses the WNT activation pathway by inhibiting GSK3. If subsequent treatment with BafA blocks the effects of GSK3 inhibition, then BafA is doing something unrelated to Wnt activation, whose target is the inhibition/sequestration of GSK3. While BafA might block GSK3 sequestration by inhibiting MVB function, it should have no effect on the inhibition of GSK3 by LiCl2.

      We now explain in the Results text describing Figure 2 that the initial effect of GSK3 inhibition by LiCl is to trigger macropinocytosis (Albrecht et al., 2020). If the downstream acidification of lysosomes is inhibited, then the brief treatment with LiCl (7 min at 32-cell stage) has no effect (LiCl 1st+BafA 2nd, Figure 2H). BafA inhibits lysosomal acidification at the 32-cell stage, resulting in ventralization, but the effect of brief BafA treatment can be reversed by inducing membrane trafficking with LiCl (BafA 1st+LiCl 2nd, Figure 2C). The labelling of figure panels C and H has been modified to indicate that this is an order-of-addition experiment. These order-of-addition experiments strongly support the proposal that endogenous lysosomal activity is required to generate the initial endogenous Wnt signal that takes place at the 32-cell stage of development (Tejeda-Muñoz and De Robertis, 2022a).

      3) The effect of EHT on MP in SW480 cells is not clearly related to what is happening in the embryos. The nearly total loss of staining for Rac and β-catenin after overnight EIPA does not implicate MP in protein stability - critical controls for cell viability and overall protein turnover are absent. Inhibition of WNT signaling might be expected to enhance β-catenin turnover, but the effect on Rac1 is surprising. A more quantitative analysis by western blotting is required.

      The results on EIPA inhibition in SW480 cells have been replaced in Figure 4 by new experiments in 3T3 cells, which provide new evidence that Wnt signaling stabilizes Rac1. These experiments show (i) that LiCl treatment increases Rac1 and β-catenin protein levels (Figure 4I-J’’), (ii) that cells transfected with constitutively active β-catenin-GFP have higher levels of Rac1 than control untransfected cells (Figure 4K-K’’) and (iii) that Rac1 is stabilized in cells transfected with CA-Lrp6-GFP when compared to untransfected cells (Figure 4L-L’’). In the original EIPA experiment in SW480 cells, now deleted from this version of the manuscript, we tested cell viability using a Vi-Cell Beckman-Coulter Viability Analyzer and found that cells were 96-98% viable but proliferation was strongly decreased after 12 h of EIPA treatment. The effect of brief Rac1 inhibition (7 min) in decreasing dorsal development in embryos at the critical 32-cell stage is robust (Figure 4A-C). In addition, coinjection of EHT is able to entirely block the effects of microinjected xWnt8 mRNA (compare Figure 4E to 4G, see also Figure 4H), suggesting that Rac1 is required for Wnt signaling. Quantitative target gene expression analysis is provided for the embryo experiments (Figure 4C and 4H); for the stabilization of Rac1 by Wnt we are not providing quantitative measurements, but we found similar results with 3 independent approaches (LiCl, CA-β-catenin and CA-Lrp6).

      4) The data on FAK inhibition and TES trafficking are poorly integrated with the rest of the paper.

      We attempted to better relate the TES trafficking to our previous paper showing that canonical Wnt signaling induces focal adhesion and Integrin-β1 endocytosis. We now write in the results: “We have previously reported a crosstalk between the Wnt and focal adhesion (FA) signaling pathways. Wnt3a treatment rapidly led to the endocytosis of Integrin β1 and of multiple focal adhesion proteins into MVBs (Tejeda-Muñoz et al., 2022). FAs link the actin cytoskeleton with the extracellular matrix (Figure 6A), and we now investigated whether FA activity is affected by Wnt signaling, PMA treatment and CRC progression”.

      Reviewer #3 (Recommendations For The Authors):

      The reliance on pleiotropic inhibitors is a weakness and should be supplemented by genetic approaches to inhibit macropinocytosis.

      We agree, but that would be outside of the scope of this study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful assessment of our work and their valuable critiques which we will address in the “Recommendations for the authors” section below. In particular, we appreciate Reviewer #3 noting the value of the C. elegans model system and our efforts to bridge models with our study. We agree with the reviewer that there is a need to clarify the rationale, presentation and interpretation of our results. We have substantially revised the text in our manuscript and Figure legend to address this issue, and provided extensive new commentary and citations to lay out the logic behind our experiments. Indeed, it was our oversight not being more thorough about this initially. We have further adjusted our conclusions to be less unequivocal. Finally, we added an RPM-1 signaling diagram (Fig. 8A) to more clearly annotate the players in the RPM-1/MYCBP2 signaling network that were evaluated genetically in Fig. 8. Importantly, we provide clearer commentary on how genetic enhancer effects with known RPM-1 binding proteins and the absence of genetic suppression in vab-1/Eph receptor double mutants with components of the RPM-1/FSN-1 ubiquitin ligase complex are consistent with the biochemical finding that MYCBP2 stabilizes but does not degrade EphB2. Text edits reflecting these points are in the abstract, the C. elegans results section starting on line 411, and the discussion on lines 499, 502-504 and 541.

      Following extensive discussions between the three reviewers, all three agree that the C. elegans data, as presented, does not add to, and in fact might harm, your bottom line. Our combined suggestion is to take this data out unless you plan to improve it substantially. All reviewers are perplexed by Figure 2F and the presumed interactions of cytosolic proteins with the extracellular domain of EPHB2. At the very least, please provide some suggestions/model/interpretation.

      We have adjusted our manuscript substantially to address this. Please see detailed comments in the individual Reviewer sections below.

      We would like to thank the reviewers for their thorough examination of our manuscript, constructive criticisms, and helpful suggestions.

      Reviewer #1 (Recommendations For The Authors):

      The work is extensive in my view, and mostly of high quality. See minor comments on some of the figures below.

      Thank you very much.

      Two more major comments :

      • I don't think the C. elegans work adds to - in fact I think it hurts - the statement that this regulatory mechanism is specific to EphB2. I would advise the authors to take it out.

      We agree that C. elegans has a sole Eph receptor called VAB-1 and is therefore not a specific model for EPHB2. However, testing MYCBP2 specificity for EPHB2 was not the goal or our perceived value for the C. elegans experiments. We now clarify this in the text of the Results section.

      Rather, we are providing evidence that the C. elegans ephrin receptor interacts genetically with known MYCBP2/RPM-1 binding proteins. Moreover, we now provide an extensive array of citations to note that genetic enhancer interactions between different RPM-1/MYCBP2 binding proteins are well established. The reviewer has nicely highlighted for us that we handled the C. elegans genetics in too cursory a fashion in our original manuscript. We appreciate this being noted and have now aimed to make this substantially clearer. We hope the reviewer agrees that our revised C. elegans section accomplishes this goal.

      Furthermore, we extensively revised the text of the Results to emphasize a key point: our observation that axon termination defects are not suppressed in vab-1; fsn-1 and vab-1; rpm-1 double mutants excludes the possibility that the VAB-1 Eph receptor is a substrate that is inhibited or degraded by the RPM-1/FSN-1 ubiquitin ligase complex. If the VAB-1 Eph receptor were ubiquitinated and degraded by the RPM-1/FSN-1 complex, we would have observed a suppression of phenotype in vab-1; rpm-1 double mutants. The precedent for this genetic relationship between the RPM-1 ubiquitin ligase and its substrates that are degraded has been established by several prior studies (PMID: 15707898; PMID: 31676756; PMID: 35421092). We now more clearly note that the absence of genetic suppression in vab-1; rpm-1 double mutants and vab-1; fsn-1 double mutants is consistent with the non-canonical stabilizing role of MYCBP2 on EPHB2 that was observed in our biochemical experiments with mammalian cells.

      We also adjusted the text of the manuscript to stress that we are testing genetic interactions between the VAB-1 Eph receptor and known RPM-1 binding proteins. This is a key point, as genetic enhancer interactions are consistent with the Eph receptor functioning in the RPM-1 signaling network. This concept has been well established for RPM-1 binding proteins as now noted in our revised text with an extensive number of additional citations to published work.

      Based on the above arguments, we respectfully disagree with the reviewer that our C. elegans data should be removed from the paper. To re-iterate, we are not trying to evaluate specificity for MYCBP2 and EPHB2 in C. elegans. Rather, our goals are twofold: 1) To ask whether there is an evolutionarily conserved functional genetic link between Eph receptors and known RPM-1 binding proteins. 2) To provide further in vivo genetic evidence invalidating the hypothesis that Ephrin receptors could be ubiquitination substrates that are inhibited/degraded by MYCBP2.

      Text edits reflecting these points are in the abstract, the C. elegans results section starting on line 411, and the discussion on lines 499, 502-504 and 541.

      • The cellular responses are not robust and the effects of MYCBP2 KO - although significant - are minor in most cases. But I don't think more experiments will help here.

      We interpret the comment about robustness to mean that the extent to which a given cellular response is affected by the loss of MYCBP2 is minor. First, the cellular responses themselves are typical of previous studies and depend on the cellular biology underlying them. For example, a growth cone collapse of ~50-60% over a background of 10% (Fig. 7) is typical for these sorts of assays (PMID: 37369692; PMID: 33972524; PMID: 17785182). A decrease of cell area by ~25% (Fig. 3) is quite substantial if one considers how much of a cell’s volume is taken up by the nucleus and organelles. Second, the phenotypes elicited by the loss of MYCBP2 are likely brought on by a decrease in EphB2 protein levels, but not its complete absence, as suggested by our biochemical experiment. Given that even complete loss of EphB2 only affects the cellular responses to a limited extent, the minor effects are not a surprise (e.g. for GC collapse: PMID: 23143520). Nevertheless, the subtle changes in cellular phenotypes elicited by EPHB2 signaling are often sufficient to achieve proper cell positioning and cell response to guidance cues. For instance, regulation of the growth cone collapse of outgrowing axons requires delicate changes that are dynamic and temporal.

      Minor:

      Fig 1C - EPHA3 and EPHB2 seem to run in different sizes, is this the case? In 2A they run at the same size.

      We believe this size discrepancy is due to the different percentages of SDS-PAGE gels used to resolve the proteins. In Fig. 1C, we used a 6% gel for Western blot analysis of both EPHA3/-B2-FLAG (~130 kDa) and MYCBP2 (~510 kDa). In Fig. 2A, however, we performed Western blot analysis using a 10% resolving gel to separate and detect EPHA3/-B2-FLAG along with MYC-FBXO45 (~30 kDa). We have reviewed the results obtained from additional biological replicates of this experiment and observed a similar pattern in gel migration of EPHA3/-B2-FLAG across all replicates.

      Fig1F - I can't trust the MYCBP2 blot.

      Indeed, the MYCBP2-EPHB2 co-IP with endogenous proteins was not convincing. We have now repeated this experiment using rat cortical neurons, and the results replace the previous Fig. 1F panel, as mentioned on line 158.

      In Fig2b the authors claim that there is enhancement in the binding of MYCBP2 and EPHB2 upon FBXO45 expression. For this type of statement quantification is required.

      The quantification is now included in Fig. 2C and its significance is mentioned on line 180. Our conclusion about the enhancement stands.

      Fig2G - it remained unclear to me where the binding site to MYCBP2 is, how long is the cytoplasmic tail in the DeltaICD protein?

      Based on our experimental observations from Fig. 2E-H, we concluded that the fragment encompassing the extracellular domain(s) and/or transmembrane (TM) domain of EPHB2 is necessary for the protein complex formation with MYCBP2. We would like to emphasize that the EPHB2-MYCBP2 interaction might not be direct and might involve other transmembrane protein(s) acting as a scaffold for EPHB2 and MYCBP2 binding. We did not pursue experiments to determine the exact region of the extracellular-TM portion of EPHB2 that is required for the interaction with MYCBP2.

      The cytoplasmic tail in ΔICD protein consists of 25 aa of the N-terminal fragment of EPHB2 juxtamembrane (JM) region, which is adjacent to the TM helix, and followed by the 8 aa FLAG tag (EPHB2 ΔICD domain composition: extracellular domains – TM domain – 25 aa fragment of JM region – FLAG). We have determined the TM and JM sequences based on Hedger et al. (PMID: 25779975) and included the N-terminal portion of the JM region to facilitate proper ΔICD protein localization within the plasma membrane (PMID: 35793621). We modified the schematic in Fig. 2G to better visualise the EPHB2 truncations and now provide information on their size in the figure legend.

      Always good to have a model of how all these proteins work together.

      While we acknowledge that this would be helpful, we do not have a clear answer on how the EPHB2-MYCBP2 complex formation occurs. This requires further elucidation of the putative proteins involved in this ternary complex or testing the possibility that a MYCBP2 fragment is extruded extracellularly. Without these experiments there are too many possibilities to summarise into a clear model figure. We thus did not make any edits regarding these possibilities in the section starting on line 195.

      Reviewer #2 (Recommendations For The Authors):

      Overall, the experiments are classical experiments of co-immunoprecipitations, swapping experiments, collapse assays, and stripe assays which all are well carried out and are convincing.

      Thank you for your encouraging comments.

      Controls for the stripe assay may include Fc / Fc stripe assays.

      We have performed these control experiments and now include their quantifications in the results section concerning Fig. 3, starting on line 249, and in the section concerning Fig. 6, on line 381.

      It is not clear to me why SD and not SEM has been used here for presentations.

      Standard deviation (SD) measures the dispersion of a dataset relative to its mean. The standard error of the mean (SEM) measures how much discrepancy is likely in a sample’s mean compared with the population mean. Thus, SEM includes a statistical inference about the sampling distribution while SD is a less “processed” measurement that by definition is larger than SEM. SEM might make the data look less dispersed and many journals encourage the use of SD in bar graphs (PMID: 16223828).
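
      The relationship described above (SEM equals SD divided by the square root of the number of replicates) can be illustrated with a minimal sketch. The replicate values below are hypothetical and are not data from the study; the function name is likewise an assumption made for illustration.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    sd = statistics.stdev(values)   # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(n)         # SEM shrinks as the number of replicates grows
    return sd, sem


# Hypothetical replicate values, for illustration only (not data from the study).
replicates = [42.0, 48.0, 51.0, 45.0, 54.0]
sd, sem = sd_and_sem(replicates)
print(f"SD = {sd:.2f}, SEM = {sem:.2f}")
```

      Because the SD is divided by the square root of n, the SEM is smaller than the SD whenever there are two or more replicates, which is why error bars drawn with SEM look tighter than those drawn with SD for the same data.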

      Fig 7A: it is rather difficult to see 'branches' in Fig. 7A, better pictures and close-ups should be provided. How are branches defined? This piece of work needs more attention.

      To remedy this shortcoming, we now provide inverted images with GFP signal in dark pixels overlaid on Fc (white) / eB2 (pink) stripes next to the original images.

      Reviewer #3 (Recommendations For The Authors):

      1) My most important suggestion to the authors would be to more carefully describe the results and their interpretation of the results. Sometimes, the distinction is not clear.

      We modified the text throughout the manuscript to address this.

      2) There are several cases, when the authors report on trends that are not statistically significant (1D, for example), or report no change, when it is clear that the addition of one more sample could have dramatically made a difference (4M - see point 12).

      We agree that some of the nonsignificant differences could become significant if we added more Ns. But we prefer not to move our experimental design towards N-chasing and p-hacking (PMID: 25768323). The number of biological replicates is normally pre-determined before the onset of the experiment. Of course, some replicates can be discarded if there is a valid reason, such as a technical issue with the experiment or a positive control not working but this is not relevant for the dataset we have provided.

      3) Data in 1F is very difficult to interpret.

      As in our response to Reviewer #1: Indeed, the MYCBP2-EPHB2 co-IP with endogenous proteins was not convincing. We have now repeated this experiment using rat cortical neurons, and the improved results are in revised Fig. 1F.

      4) Figure 2 puts Figure 1 in a strange perspective. If I understand correctly, fig 2 claims that EPHB2 interaction with MYCBP2 depends on FBXO45 - if that is the case then how does the binding in Figure 1 occur?

      Indeed, we propose that the EPHB2-MYCBP2 interaction depends on FBXO45. In Fig. 2, we reveal that FBXO45 enhances the formation of the EPHB2-MYCBP2 complex. Thus, we suspect that the endogenous FBXO45 present in HeLa cells and neurons would mediate the interaction between EPHB2 and MYCBP2 in the Fig. 1 experiments. Although we were unable to show this by Western blotting due to a lack of reliable commercial antibodies against FBXO45, a complex containing endogenous FBXO45 and EPHB2 is also implied by our AP-MS data (Fig. 1B) and published databases.

      5) I am still trying to wrap my mind around the results in 2G-H. So do MYCBP2 and FBXO45 bind the extracellular domain of EPHBP2? What does that mean?

      (see also our response to Reviewer #1, end of their section) Based on our experimental observations from Fig. 2G-H, we conclude that the fragment encompassing the extracellular domain(s) and/or transmembrane domain of EPHB2 is necessary for the protein complex formation with MYCBP2 and FBXO45. Although there is a possibility that MYCBP2 directly binds the extracellular portion of EPHB2, we have not formally tested this hypothesis. MYCBP2 has been previously shown to interact with the extracellular portion of transmembrane N-cadherin (CDH2) via BioID proximity labeling and AP-MS proteomics approaches (PMID: 32341084).

      Considering the results in Fig. 2A-B, we suspect that the EPHB2-MYCBP2 interaction is indirect, as FBXO45 enhances this association. Secretion of FBXO45 and direct binding of FBXO45 to the extracellular cadherin (EC1-2) domains of N-cadherin have been documented (PMID: 25143387; PMID: 32341084). Although not tested, this is also a possible mode of interaction between EPHB2 and FBXO45. Nevertheless, we also cannot rule out the possibility that an unknown transmembrane protein binds EPHB2 extracellularly and that the same unknown protein binds MYCBP2/FBXO45 intracellularly. Resolving this model is beyond the scope of this study and will require us to pursue extensive new lines of investigation.

      6) I don't understand the stable Hela cell line CRISPR - is this a stable MYCBP2 deletion? In which case why is there only a reduction, not complete elimination of the protein? Or, is this a stable integration of a plasmid generating gRNA against MYCBP2? In which case, I would expect a homozygous null to emerge at some point. In any case, this is not well explained.

      These lines are not derived from single cells infected with the CRISPR sgRNA-carrying viruses, therefore they are not clonal and probably contain some cells that express normal levels of MYCBP2, hence its detection on a Western. This is now clarified starting on line 221 and on line 608.

      7) In 3C - is this the right statistical analysis?? I would say you want to claim the different effect of the control +/- eB2 compared to the effect in the mutant +/- eB2. Still should be significant but I think a more correct analysis.

      We now include this comparison in Fig. 3C as well in the results section starting on line 234.

      8) The robustness of the assay in Figure 3D is underwhelming – how was the area measured?

      This is a live imaging experiment. Fig. 3D plots cell area at 60 minutes after ephrin-B2 addition as a fraction of the same cell’s area at 0 minutes (ephrin-B2 addition). For control cells that is a decrease of ~25%. If one considers that a cell’s nucleus and organelles like the Golgi Apparatus take up most of its volume, the magnitude is not that surprising.

      9) Figure 3F – did you try to plot the relative area of overlap divided by the total cellular area? You might get a more striking phenotype. Also – claiming that this confirms that MYCBP2 is REQUIRED for EPHB2 function is a bit overstated, especially given that we don’t know (do you?) the EPHB2 mutant phenotype in this assay.

      We preferred to stay with the original method of image quantification, which we use for other assays. With respect to the requirement of MYCBP2 for EPHB2 function in the stripe assay, our logic is rooted in the observation that native HeLa cells do not respond to ephrin-B2 stripes (45.46 ± 7.62% of cells on eB2 stripes v. Fc; data not shown). When they are transfected with EPHB2 expression plasmids, they do; we therefore assume that EPHB2 expression endows them with a sensitivity to eB2 stripes. A loss of MYCBP2 attenuates this sensitivity. We clarified this starting on line 246 and on line 251.

      10) I didn't quite get the difference between 4A and 4B.

      We apologize for the confusion. In Fig. 4A, we used a stable HeLa cell line that has tetracycline-inducible expression of EPHB2-FLAG. Using these cells, we subsequently generated CTRLCRISPR or MYCBP2CRISPR cells. In these cells, we then induced EPHB2 expression with tetracycline and observed that deletion of MYCBP2 resulted in a reduction of EPHB2 protein levels. To confirm this observation and to rule out the possibility that the reduction in EPHB2 protein is an effect of generating the CRISPR lines, we tested whether MYCBP2 deletion also reduces EPHB2 that has been transiently overexpressed (Fig. 4B). We hence conclude that loss of MYCBP2 decreases EPHB2 that was expressed either from a stable locus (Fig. 4A) or from transient transfection (Fig. 4B). We modified the Results section starting on line 262 to make this point clear.

      11) The entire link to lysosomal degradation should be strengthened. Perhaps I am confused, but if the reduced EPHB2 levels in MYCBP2 mutant cells result from impaired lysosomal degradation then inhibiting the lys-deg should bring the protein levels back to normal (i.e. CRISPR control) - no? As currently presented, I do not understand nor do I think the claim is strongly supported by the data.

      Before treatment with inhibitors, EPHB2 levels in MYCBP2CRISPR cells are already 40% lower than in CTRLCRISPR cells, and in all our attempts, inhibitors could only restore EPHB2 in MYCBP2CRISPR cells to a level that is lower than in CTRLCRISPR cells. However, this restoration is greater in MYCBP2CRISPR than in CTRLCRISPR cells (BafA1: 19% increase in CTRL cells and 40% in MYCBP2CRISPR cells; CoQ: 10% compared with 35%). This indicates that EPHB2 degradation through the lysosomal pathway is stronger in MYCBP2CRISPR cells, which is compatible with the reduced EPHB2 levels and enhanced EPHB2 ubiquitination observed in these cells.
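
      The arithmetic behind this argument can be made explicit with a short sketch. It assumes that each quoted percentage increase is relative to the untreated level of the same cell line, and that untreated CTRLCRISPR cells are normalized to 1.0; both are assumptions made for illustration, and the function name is hypothetical.

```python
def level_after_inhibitor(baseline, percent_increase):
    """EPHB2 level after treatment, given a baseline and a relative increase in %."""
    return baseline * (1.0 + percent_increase / 100.0)


# Assumption: untreated CTRLCRISPR cells are normalized to 1.0, so untreated
# MYCBP2CRISPR cells sit at 0.60 ("40% lower").
ctrl_baseline = 1.00
ko_baseline = 0.60

# Percentage increases quoted in the response: (CTRL cells, MYCBP2CRISPR cells).
treatments = {"BafA1": (19, 40), "CoQ": (10, 35)}

for name, (ctrl_gain, ko_gain) in treatments.items():
    ctrl_level = level_after_inhibitor(ctrl_baseline, ctrl_gain)
    ko_level = level_after_inhibitor(ko_baseline, ko_gain)
    print(f"{name}: CTRL = {ctrl_level:.2f}, MYCBP2CRISPR = {ko_level:.2f}")
```

      Under these assumptions, the larger relative gain in MYCBP2CRISPR cells still leaves the treated level below that of untreated control cells, which matches the statement that inhibitors restore EPHB2 only partially.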

      12) 4M, O - reporting ns based on these data seems a bit strange to me... Add one point and it will be strongly significant.

      See our response to point (2), above. We prefer not to invoke potential p-hacking.

      13) 7d - so what are you claiming? That the cellular response to eB1 but not eB2 is affected by the addition of FBD1? this is almost the opposite of what you wrote in the text...

      We treated the cells with two different ephrin-B ligands to draw a stronger conclusion. When using ephrin-B1, growth cone collapse in FBD1 WT is not significant compared with Fc treatment. When using ephrin-B2, growth cone collapse in FBD1 WT is not as significant as it is in the FBD1 mut group (* versus ). We interpret this as meaning that the EPHB2-mediated growth cone collapse in response to both ligands is dampened when we disrupt the EPHB2-MYCBP2 association. The difference between these two ligands might be due to their different affinities for the receptor or their signalling kinetics.

      14) By far the weakest link in this paper is the worm part. I think it's a pity because strengthening this would affect the significance of the finding. First, the authors mention new genes without introducing their relationship to the signaling pathway tested. Second, the textual logics should be strengthened. Finally and most importantly, when the difference between the phenotypic severity is so strong (vab-1 and rpm-1) then I think it's impossible to say anything from the double mutant.

      We appreciate the reviewer’s recognition of the value and importance of the C. elegans model. The goals of our C. elegans experiments were twofold:

      1) To evaluate genetic interactions between the VAB-1 Eph receptor and known RPM-1 binding proteins. This was not clearly explained in the original manuscript nor was the published precedent for these types of genetic enhancer experiments provided. We have now rectified this by substantially revising the text of the Results C. elegans section starting on line 431 and by adding several citations.

      2) Our C. elegans genetics confirmed that the VAB-1 Eph receptor is not inhibited/degraded by the RPM-1/MYCBP2 ubiquitin ligase complex. We have now revised the text to draw this point out more clearly.

      To further address the reviewer’s concerns, we have added a new schematic (Fig. 8A) to show the relationship between the RPM-1 and the RPM-1 binding proteins (FSN-1/FBXO45 and GLO-4/SERGEF) we are testing. We chose FSN-1 because it is part of the RPM-1 ubiquitin ligase complex and we chose GLO-4 because it functions outside the context of RPM-1 ubiquitin ligase signaling via the GLO-1 Rab GTPase to influence late endosomal/lysosomal biogenesis.

      We respectfully disagree with the reviewer’s concern that the different penetrance/frequency of defects between rpm-1 mutants and vab-1 mutants means that outcomes with vab-1; rpm-1 double mutants cannot be interpreted. An extensive number of published studies have demonstrated that RPM-1 binding proteins have milder phenotypes than rpm-1 mutants and display genetic enhancer effects as double mutants with one another (PMID: 17698012, PMID: 22357847, PMID: 25010424, PMID: 24810406). We now make this point much more clearly. While the frequency of axon termination defects in rpm-1 mutants is high, it is not completely saturated, as the defect is not 100% penetrant. Moreover, a major point of the vab-1; rpm-1 double mutants is that they do not have a significant reduction in phenotypic penetrance/frequency. Thus, our system is fully capable of resolving genetic suppression, which did not occur. We now make this point much more carefully and clearly.

      To further address the reviewer’s concern, we have softened language about the VAB-1/Eph receptor functioning in the same pathway as RPM-1 throughout the manuscript. While we think this is still the case, we acknowledge that the frequency of axon termination defects is not fully saturated in rpm-1 mutants and that defects could potentially become more severe (i.e. the hook might occur closer to the head of the animal rather than in the midbody). Nonetheless, this is not a critical point, and we think it is more important to be clear about the two major goals and objectives of our C. elegans experiments. We hope the reviewer agrees that our rationale, logic and conclusions are more clearly and accurately drawn in the revised paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Although the main conclusions are well-evidenced, this paper would be further improved if the following concerns can be properly addressed.

      1) The key data to demonstrate the role of condensin in telomere disjunction is reduced telomere foci in cut14 mutants at the restrictive temperature (Fig 2A). However, this could be due to defected telomere declustering or failed separation of sister telomeres since authors suggested that condensin functions in both processes. To distinguish these, authors can directly measure the separation of sister telomeres using FISH or TETO-labelled telomeres.

      We now provide strong evidence for the role of condensin in telomere disjunction by simultaneously visualizing the behavior of centromeres 3L (imr3-tdTomato), Gar1-CFP (nucleolus), and telomeres 1L (Tel1-GFP) during mitotic progression (Figure S2B). As previously reported (Tada et al. 2011), we visualized the centromere of chromosome 3 by simultaneously inserting tetO repeats into the imr3 region (1093757-1094520 and 1094521-1095451 of chromosome 3) and expressing tdTomato fused to tetR. The left arm of telomere 1 was visualized by inserting lacO repeats into this telomeric region (9282-9805 and 9806-10254 of chromosome 1) and expressing green fluorescent protein (GFP) fused to LacI. With these additional data, we confirm that a cut14-208 mutant grown at the non-permissive temperature exhibits a striking defect in the disjunction of Tel1L.

      Note, however, that such an experimental approach is not without risk, as it has been reported that LacO repeats tightly bound by LacI proteins form a barrier to the recoiling activity of condensin (PMID: 31204167). This is discussed further below in our response to point 2).

      2) To prove the defective telomere disjunction in condensin mutant is not due to failed transmission of pulling force from centromeres, the authors showed that Top2 inactivation has no effect on telomere disjunction (Fig 2E). However, this result contradicts a previous study in budding yeast (MBC, 2002, 13:632-645). This needs careful discussion. Moreover, it is puzzling why Top2 inactivation would not cause defective decatenation of telomeres.

      We thank the reviewer for bringing this apparent discrepancy to our attention. A likely explanation is that we monitored telomere separation using the shelterin protein Taz1 tagged with GFP, whereas in the study mentioned by the reviewer, the authors used LacO arrays inserted in the vicinity of TELV and bound by LacI-GFP. It has been shown in budding yeast that such a construct constitutes a barrier for the recoiling activity of condensin in anaphase (PMID: 31204167). Thus, this insertion of LacO/LacI arrays at TELV most likely created an experimental condition in which condensin activity at TELV was reduced, thereby revealing the otherwise dispensable contribution of Topo II. This is now mentioned in the Discussion section as follows:

      Our results do not rule out the possibility that Topo II contributes to telomere disentanglement, but nevertheless imply that Topo II catalytic activity is dispensable for telomere separation provided that condensin is active. The close proximity of DNA ends could explain Topo II’s dispensability. It has been reported in budding yeast that the segregation of LacO repeats inserted in the vicinity of TELV is impaired by the top2-4 mutation (Bhalla et al. 2002). At first sight, this appears at odds with our observations made using the telomere protein Taz1 tagged with GFP. However, since LacO arrays tightly bound by LacI proteins constitute a barrier for the recoiling activity of condensin in anaphase (Guérin et al. 2019), the insertion of such a construct might have created an experimental condition in which condensin activity was specifically impaired at TELV, hence revealing the contribution of Topo II.

      In addition, we would like to point out that telomere structure differs significantly between budding yeast and fission yeast. Budding yeast protects its telomeres via two independent factors, Rap1 and the Cdc13-Stn1-Ten1 complex, whereas in fission yeast Taz1 and Pot1 are bridged by a complex protein interaction network (Rap1-Poz1-Tpz1). This is a remarkably conserved structural feature between the shelterin of S. pombe and the human shelterin. Recently, the group of M. Lei showed that some of the telomeric components of S. pombe can dimerize, leading to a higher-order organization of the shelterin (Sun et al., 2022). It is likely that dimerization of Taz1, Poz1, and the Tpz1-Ccq1 subcomplex also contributes to the clustering of sister and non-sister chromatid telomeres. The architectural differences in telomere organization between budding and fission yeast may require different mechanisms to properly segregate telomeres during mitosis.

      3) The authors claimed that the reduced telomere disjunction in condensin mutants is because compromising condensin function defects the resolution of cohesin-mediated cohesion of sister telomere. The evidence is that cohesin's inactivation remedied telomere disjunction defect in condensin mutants (Fig 6A). However, there could be an alternative explanation: abnormal telomere structure caused by defective condensin might lead to the entanglement of sister telomeres, which requires telomere cohesion. If cohesin is inactivated before the G2 phase, which is the likely case in this experiment, the entanglement would not happen. To distinguish these, the experiment in Fig 6 can be repeated using G2-synchronised cells.

      The hypothesis raised by the reviewer is certainly relevant. To test this possibility, we purified cut3-477 and cut3-477 rad21-K1 mutant cells in early G2 using a lactose gradient. After cell selection of the two mutants grown at permissive temperature, the entire cell population was in G2 (0% of cells in mitosis or cytokinesis). After releasing the cells to the non-permissive temperature of 36°C, we measured the number of telomeric foci as a function of spindle size as the cells entered the first mitosis. The results presented in Figure S6 confirm that cohesin inactivation in G2 cells is able to complement the telomere disjunction defects of a condensin mutant.

      4) The authors further revealed that compromising condensin function leads to overaccumulation of cohesin at the telomere (Fig 6B). Then they proposed that condensin counteracts cohesin at telomeres. However, the over-accumulated telomeric cohesin was observed at the G2 phase (t=0 min, Fig 6B) in the condensin mutant. At this stage, cells were grown at the permissive temperature, and condensin activity is expected to largely remain (Fig 2A). The subsequent inactivation of condensin didn't further increase the telomeric association of cohesin (t=30 min, Fig 6B). Moreover, condensin does not bind telomeres at G2 phase (1B). It is difficult to reconcile the scenario that condensin would inhibit cohesin telomere association even though condensin is absent.

      We suspect that there was a misunderstanding because T=0 min in Figure 6B corresponds to cells arrested in G2 and shifted to 36°C while still arrested, as mentioned in the original text "Cells were arrested at the G2/M transition, shifted to the restrictive temperature and released into a synchronous mitosis (Figure 6B)".

      However, this experimental setup has been made clearer in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Further analysis of the telomere segregation foci data could provide additional support for the claim that condensin promotes the uncoupling of telomeres (vs telomere disjunction), in addition to the hiC data presented in Fig 3. The observation that many data points in Figure 2 have less than six foci ( often 2-4) suggests that this data not only shows a defect in disjunction but also in telomere uncoupling. If somehow the two defects could be unpicked in the dataset that would be beneficial to their argument.

      We agree with the reviewer that our data show not only a defect in disjunction but also in telomere uncoupling (confirmed with Hi-C). We now provide new microscopy data showing the role of condensin in telomere disjunction (as opposed to uncoupling) by simultaneously visualizing the behavior of centromere 3 (imr3-tdTomato), the nucleolus (Gar1-CFP), and telomere 1L (Tel1-GFP) during mitotic progression (Figure S2B). We confirm that the cut14-208 mutant grown at the non-permissive temperature has a striking defect in telomere disjunction as opposed to centromere disjunction.

      Reviewer #3 (Recommendations For The Authors):

      The experiments are robust, and the results are well described. However, it should be explicitly stated that the main finding that condensin is needed for chromosome end disjunction could have been anticipated from previous studies (as outlined below). Its novelty does not need to be overstated.

      1) Reyes et al. (2015) previously demonstrated that sister telomere disjunction requires the Aurora B kinase. They also showed that a phosphomimic condensin allele reinstates sister telomere disjunction in cells lacking Aurora B, indicating that condensin is likely the target activated by Aurora B and the primary driver of sister telomere disjunction.

      2) Berthezene et al. (2020) previously revealed the requirement of condensin for sister telomere disjunction during the first meiotic division (Meiosis I).

      3) The Tanaka group described in 2010 the role of condensin in promoting sister chromatid separation by antagonizing residual cohesin during anaphase (DOI 10.1016/j.devcel.2010.07.013). This study should be cited and discussed.

      The novelty of our study resides in the fact that we now provide evidence that condensin contributes to TEL separation in cis, and not through the recoiling of chromosome arms, which had not been addressed in our previous manuscripts (Reyes et al. 2015, Berthezene et al. 2020).

      We have now added and discussed the reference from Tanaka's group.

    1. Author Response

      The following is the authors’ response to the current reviews.

      eLife assessment

      This paper provides valuable information regarding visuospatial working memory performance in patients with MS compared to healthy controls, using a relatively novel continuous measure of visual working memory. There are some weaknesses with the way the clinical groups were matched, but those limitations are acknowledged and the strength of evidence for the claims is otherwise convincing. The paper will be of interest to those working in the field of clinical neuroscience.

      We are grateful to the editors and reviewers for their careful review of our manuscript and their dedicated time and effort. Their valuable feedback has been instrumental in improving the quality of our work.

      Reviewer #1 (Public Review):

      This study compares visuospatial working memory (VWM) performance between patients with MS and healthy controls, assessed using analog report tasks that provide continuous measures of recall error. The aim is to advance on previous studies of VWM in MS that have used binary (correct/incorrect) measures of recall, such as from change detection tasks, that are not sensitive to the resolution with which features can be recalled, and to use mixture modelling to disentangle different contributions to overall performance. The results identify a specific decrease in the precision of VWM recall in MS, although the possibility that visual and/or motor impairments contribute to performance decrements on the memory task cannot be ruled out.

      Although we try to address this matter by clinical screening, as the reviewer mentioned, the possibility that visual and/or motor impairments contribute to performance decrements on the memory task cannot be ruled out. Therefore, in future studies, including a control condition matched to the experimental paradigm where only the memory components are removed is needed to elucidate this issue.

      Reviewer #2 (Public Review):

      The authors applied two visual working memory tasks, a memory-guided localization (MGL), examining short-term memory of the location of an item over a brief interval, and an N-back task, examining orientation of a centrally presented item, in order to test working memory performance in patients with multiple sclerosis (including a subgroup with relapsing-remitting and one with secondary progressive MS), compared with healthy control subjects. The authors used an approach in testing and statistically modelling visual working memory paradigm previously developed by Paul Bays, Masud Husain and colleagues. Such continuous measure approaches make it possible to quantify the precision, or resolution, of working memory, as opposed to measuring working memory using discretised, all-or-none measures. This represents an advance beyond prior work in this area.
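
      To make the idea of modelling a continuous recall error concrete, here is a minimal sketch of a two-component mixture of the kind commonly used for analog report data: a von Mises distribution centred on the target (memory precision) plus a uniform component (random guessing). It is an illustrative simplification and not the model fitted in the study, which may include further components such as swap errors. The function names and parameter values are assumptions, and the error is assumed to have already been mapped onto a full circle in radians.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)   # ratio of consecutive series terms
        total += term
    return total


def mixture_density(error, kappa, guess_rate):
    """Density of a recall error (radians) under a von Mises + uniform mixture."""
    von_mises = math.exp(kappa * math.cos(error)) / (2.0 * math.pi * bessel_i0(kappa))
    uniform = 1.0 / (2.0 * math.pi)
    return (1.0 - guess_rate) * von_mises + guess_rate * uniform


def negative_log_likelihood(errors, kappa, guess_rate):
    """Quantity a fitting routine would minimize over kappa and guess_rate."""
    return -sum(math.log(mixture_density(e, kappa, guess_rate)) for e in errors)


# Hypothetical recall errors in radians, for illustration only.
example_errors = [0.05, -0.20, 0.10, 1.90, -0.02]
print(negative_log_likelihood(example_errors, kappa=5.0, guess_rate=0.2))
```

      In such a model, a larger concentration parameter (kappa) corresponds to more precise recall, while the guess rate captures trials on which the response is unrelated to the target. A binary correct/incorrect measure cannot separate these two contributions.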

      The authors of the present study found that both MS subgroups performed worse than controls on the N-back task and that only the secondary progressive MS subgroup was significantly impaired on the MGL task. The underlying sources of error including incorrect association of an object's identity with its location or serial order, were also examined. The application of more precise psychophysiological methods to test visual working memory in multiple sclerosis should be applauded. It has the potential to lead to more sensitive and specific tests which could potentially be used as useful outcome measures in clinical trials of disease-modifying drugs, for example. The present study does not compare the continuous-report testing with a discrete measure task so it is unclear whether the former is more sensitive, or more feasible in this patient group, although this may not have been the purpose of the study.

      The reviewer brought up an important point, but as they stated, it was not the focus of our current study. Nevertheless, it is a valuable suggestion for future research to compare continuous with discrete measure paradigms to assess their sensitivity and feasibility in the MS population.


      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for their thorough reading of this manuscript and valuable suggestions. We appreciate the time and effort they have put into this manuscript to provide feedback for improving our work. Based on their comments, we carefully considered their suggestions and revised the manuscript to address their concerns. Our one-by-one response to reviewer comments is as follows.

      Reviewer #1 (Public Review):

      This study compares visuospatial working memory performance between patients with MS and healthy controls, assessed using analog report tasks that provide continuous measures of recall error. The aim is to advance on previous studies of VWM in MS that have used binary (correct/incorrect) measures of recall, such as from change detection tasks, that are not sensitive to the resolution with which features can be recalled, and to use mixture modelling to potentially disentangle different contributions to overall performance. This aim is met in part, but there are some problems with the authors' interpretation of their findings:

      1) How can the authors be confident the performance deficits in the patient groups are impairments of working memory and not visual or motor in nature? I appreciate there was some kind of clinical screening, but it seems like there should have been a control condition matched to the experimental tasks with only the memory components removed.

      We appreciate the reviewer’s concern regarding the potential confounding effects of visual or motor impairment on the outcomes of our study.

      While we acknowledge that a control condition with only the memory components removed could have further strengthened our results, we did not include one, which is a limitation of the current study.

      To address this limitation, we conducted clinical screening to help ensure that the observed deficit was due to working memory impairment and was not visual or motor in nature. Based on the Expanded Disability Status Scale (EDSS) evaluation, we excluded individuals with impaired visual acuity, visual field defects, extraocular movement impairment, scotoma, nystagmus, or upper-extremity tremor, any of which could interfere with the study. Moreover, participants were screened using the 9-Hole Peg Test (9-HPT) before entering the study. These evaluations helped us to ensure that participants with impaired visual or motor performance, which could potentially confound the study, were not included. This clinical screening improves the interpretability of the results. We have updated our inclusion/exclusion criteria accordingly and included this limitation in our discussion.

      2) The participant groups are large, which is definitely a strength, but not particularly well-matched in terms of demographics, with notable differences in age (mean and spread), years of education and gender. These could potentially contribute to differences in performance between groups and tasks.

      We appreciate the reviewer's comment and agree that a matched control group would be ideal. However, we addressed this issue using hierarchical regression analysis.

      Our study assessed visual working memory (VWM) resolution using two analog recall paradigms: the sequential paradigm with bar stimuli and memory-guided localization (MGL). While gender, age, and education were matched between the patient and control groups in the MGL paradigm, these factors differed significantly between groups in the sequential paradigm.

      To address this issue, we performed hierarchical regression analyses to compare recall parameters in the sequential paradigm with 3-bar and 1-bar stimuli. We assessed the confounding effects of gender, age, and education; the results are presented in supplementary tables 3 and 5.

      In the sequential paradigm with 3-bar stimuli (high memory load condition), we found that all recall parameters were significantly different between groups. However, after adjusting for age and education, the difference in uniform response proportion was no longer significant. In the 1-bar paradigm (low memory load condition), the results were significantly different between groups, but after adjusting for gender, age, and education, the differences in target and uniform response proportions were no longer significant (uniform proportion = 1 – target proportion, since there was no swap error in the 1-bar condition).
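
      For readers unfamiliar with the mixture attributes named above, the relation between the proportions can be written out. The form below is the commonly used three-component mixture model for analog report tasks; the symbols are illustrative, and it is an assumption that the manuscript's parameterization matches this form exactly.

```latex
% \hat{\theta}: reported orientation, \theta: target orientation,
% \theta^{*}_{i}: orientations of the m non-target items,
% \phi_{\kappa}: von Mises density with concentration \kappa.
p(\hat{\theta}) = \alpha\,\phi_{\kappa}(\hat{\theta}-\theta)
  + \beta\,\frac{1}{m}\sum_{i=1}^{m}\phi_{\kappa}(\hat{\theta}-\theta^{*}_{i})
  + \gamma\,\frac{1}{2\pi},
\qquad \alpha + \beta + \gamma = 1 .
% 1-bar condition: no non-target items, so \beta = 0 and \gamma = 1 - \alpha.
```

      Here α is the target proportion, β the non-target (swap) proportion, and γ the uniform proportion. With a single bar there are no non-targets, so β = 0 and the uniform proportion is fully determined by the target proportion.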

      3) The authors interpret the mixture model parameter described as "misbinding error" as reflecting failures of feature binding, and propose a link to hippocampus on that basis, however there is now quite strong evidence that these errors (often called swaps) are explained mostly or entirely by imprecision in memory for the cue feature (bar color in this case), e.g. McMaster et al. (2022), already cited in the ms.

      We thank the reviewer for this valuable comment regarding interpreting the mixture model parameter, described as a “misbinding error” in our study.

      Swap error has been attributed to different mechanisms, including the variability in cue feature dimension, cue-independent sources, and strategic guessing. As the reviewer mentioned, in a recent study by McMaster et al., a comprehensive evaluation of these hypotheses was performed and determined that the variability in cue feature dimension could solely explain the occurrence of swap error.

      In response to this comment, we have added a discussion of this matter, the neural correlates of swap error, and the possible explanation for this phenomenon in the multiple sclerosis (MS) population to the seventh paragraph of the discussion. Additionally, since our study did not include neuroimaging assessment, we have discussed the results from a neuroanatomical point of view to further explain the possible structures involved in the occurrence of swap errors in MS. The seventh and eighth paragraphs of the discussion have been revised for further clarification.

      4) The methodology of the ROC analyses should be described in more detail: it is not clear what measures are being used to classify participants or how.

      This matter is clarified in the results and the last paragraph of materials and methods. In both paradigms, recall error was used for classification purposes.
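
      As a rough illustration of how a single measure such as recall error can serve as a classification score, the area under the ROC curve can be computed directly from pairwise comparisons. This is a minimal sketch and not the analysis code used in the study; the function name and the example values are hypothetical.

```python
def roc_auc(patient_errors, control_errors):
    """Area under the ROC curve when recall error is the classification score.

    Equals the probability that a randomly chosen patient has a larger
    recall error than a randomly chosen control (ties count as one half).
    """
    wins = 0.0
    for p in patient_errors:
        for c in control_errors:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_errors) * len(control_errors))


# Hypothetical recall errors, for illustration only.
patients = [3.0, 4.0]
controls = [1.0, 2.0]
auc = roc_auc(patients, controls)
```

      An AUC of 0.5 corresponds to chance-level separation of the groups, and 1.0 to perfect separation.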

      5) There are a number of unusual choices of terminology that could potentially confuse or mislead the reader: The tasks are not "n-Back" tasks by the usual meaning: they are analog report tasks with sequential presentation. The terms recall "error", "variability", "precision" and "fidelity" are used idiosyncratically. Variability and precision usually refer to the same thing: they describe the dispersion or spread of errors. The measure described as recall error in the sequential tasks is presumably absolute (or unsigned) error. For the mixture model parameters I suggest describing them more explicitly in terms of the mixture attributes, e.g. "Von Mises SD", "Target proportion", "Non-target proportion" "Uniform proportion".

      We thank the reviewer for this suggestion. We have made revisions to clarify the terminology used in our study.

      The term "n-back" is changed to an analog recall paradigm with sequential presentation. Additionally, as mentioned in the materials and methods, the recall error in the MGL paradigm is the Euclidean distance between the target's location and the subject's response, in visual degrees. In the sequential paradigms, this value is the angular difference between the response and target value. Both are absolute errors. To avoid confusion, we have added the term "absolute error" alongside the term "recall error" to provide a clear understanding of this measurement. Moreover, as the reviewer suggested, we changed "recall variability" to "von Mises SD" for a more precise description. We also changed the remaining terms to "target proportion", "swap error (non-target proportion)", and "uniform proportion".
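
      The two absolute-error measures described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, and the 180° period for bar orientation is an assumption (a bar looks the same after a half turn), exposed as a parameter so that a 360° feature space can be used instead.

```python
import math


def localization_error(target_xy, response_xy):
    """Euclidean distance between target and response positions.

    The result is in the same units as the inputs (e.g. visual degrees).
    """
    dx = response_xy[0] - target_xy[0]
    dy = response_xy[1] - target_xy[1]
    return math.hypot(dx, dy)


def orientation_error(target_deg, response_deg, period=180.0):
    """Absolute (unsigned) angular difference, wrapped to [0, period / 2]."""
    diff = (response_deg - target_deg) % period
    return min(diff, period - diff)


# A response at (3, 4) to a target at the origin is 5 units away.
mgl_error = localization_error((0.0, 0.0), (3.0, 4.0))
# Orientations of 10 and 170 degrees differ by 20 degrees on a 180-degree circle.
bar_error = orientation_error(10.0, 170.0)
```

      Wrapping the difference before taking the minimum keeps the error unsigned and prevents responses near the wrap-around point from being scored as large errors.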

      Reviewer #2 (Public Review):

      The authors applied two visual working memory tasks, a memory-guided localization (MGL), examining short-term memory of the location of an item over a brief interval, and an N-back task, examining orientation of a centrally presented item, in order to test working memory performance in patients with multiple sclerosis (including a subgroup with relapsing-remitting and one with secondary progressive MS), compared with healthy control subjects. The authors used an approach in testing and statistically modelling visual working memory paradigm previously developed by Paul Bays, Masud Husain and colleagues. Such continuous measure approaches make it possible to quantify the precision, or resolution, of working memory, as opposed to measuring working memory using discretised, all-or-none measures.

      The authors of the present study found that both MS subgroups performed worse than controls on the N-back task and that only the secondary progressive MS subgroup was significantly impaired on the MGL task. The underlying sources of error, including incorrect association of an object's identity with its location or serial order, were also examined.

      The application of more precise psychophysiological methods to test visual working memory in multiple sclerosis should be applauded. It has the potential to lead to more sensitive and specific tests which could potentially be used as useful outcome measures in clinical trials of disease modifying drugs, for example.

      However, there are some significant limitations which severely affect the scientific validity and interpretability of the study:

      1) There is a striking lack of key clinical information:

      1.1) There is a striking lack of key clinical information. The inclusion and exclusion criteria are unclear and a recruitment flowchart has not been provided. Therefore it is unclear what proportion of MS patients were ineligible due to, for example, visual impairment.

      We thank the reviewer for raising this matter. To address this issue, we revised the first section of materials and methods to include detailed inclusion/exclusion criteria information. However, it is important to note that we recruited the patients in a full-census manner, where only the patients who fulfilled the inclusion criteria participated. Unfortunately, we did not record the number of patients who did not meet the inclusion criteria.

      1.2) Basic clinical data such as EDSS scores, disease duration, treatment history, and performance on standard cognitive testing were not provided. Basic clinical and demographic data for each subgroup were not provided in a clear format. This severely limits the interpretability of the study and its significance for this clinical population. For example, might it be that the SPMS patients performed worse on the MGL task because they were more cognitively impaired than RRMS patients? That question might be easily answered, but the answer is unclear based on the data provided.

      We appreciate the reviewer for bringing up this important concern. To further clarify the basic clinical and demographic data, we have revised tables 1 and 2 to include detailed information regarding gender, age, education, cognitive ability, disease duration, EDSS score, and disease-modifying therapy (DMT) for each group, respectively. The information is reported as mean ± standard deviation except for the categorical data.

      Regarding the participants' cognitive ability, we added the Montreal Cognitive Assessment (MoCA) results for both paradigms. MoCA is a standard cognitive screening tool with scores ranging from 0 to 30. Different ranges of MoCA scores correspond to different levels of cognitive function: a score ≥ 26 is considered normal cognitive ability, 18-25 denotes mild cognitive impairment, 10-17 denotes moderate cognitive impairment, and a score < 10 is considered severe impairment.
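
      The score bands above amount to a simple lookup, sketched here for clarity. The function name and category labels are hypothetical, and assigning a score of exactly 10 to the moderate band is an assumption made so that the bands do not overlap.

```python
def moca_category(score):
    """Map a MoCA total score (0 to 30) to a cognitive-function band."""
    if not 0 <= score <= 30:
        raise ValueError("MoCA scores range from 0 to 30")
    if score >= 26:
        return "normal"
    if score >= 18:
        return "mild impairment"
    if score >= 10:  # assumption: a score of exactly 10 counts as moderate
        return "moderate impairment"
    return "severe impairment"


categories = [moca_category(s) for s in (28, 20, 12, 5)]
```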

      First, we classified the participants based on their MoCA score and compared the groups with each other. While the primary results showed that patient groups were more impaired compared to healthy controls, our results remained significant after adjusting for MoCA using hierarchical regression analysis. This suggests that the observed difference was not solely due to greater cognitive impairment in the patient population.

      Moreover, the information regarding the treatment history of patients is added in the following format. DMT is classified into two groups, i.e., platform and non-platform treatments. In our study, the platform treatments include interferon beta-1a and glatiramer acetate, and non-platform treatments include rituximab, ocrelizumab, fingolimod, dimethyl fumarate, and natalizumab. In both paradigms, the patients did not significantly differ based on the received therapy. The MoCA assessment and treatment history information is added to tables 1 and 2 and supplementary tables 1, 3, and 5. Moreover, the second paragraph of materials and methods, second paragraph of statistical analysis in materials and methods, and the appropriate sections of the results are revised.

      2) The study is completely agnostic to the underlying pathophysiology. There is no neuroimaging available, therefore it is unclear how the specific working memory impairments observed might relate to lesioned underlying brain networks which are crucial for specific aspects of working memory. This severely limits the scientific impact of the results. This limitation is acknowledged by the authors, but the authors did not put forward any hypotheses on how their results may be underpinned by the underlying disease processes.

      We appreciate the reviewer for this valuable suggestion. To further strengthen the connection between our findings and the possible underlying mechanisms of WM dysfunction in MS, we have added a discussion from the neuroanatomical perspective in the eighth paragraph of the discussion section.

      3) The present study does not compare the continuous-report testing with a discrete measure task so it is unclear if the former is more sensitive, or more feasible in this patient group, although this may not have been the purpose of the study.

      The reviewer pointed out an interesting matter. However, this was not the focus of the current study. Nonetheless, it is a valuable suggestion for future work to compare continuous vs. discrete measure paradigms to determine their sensitivity and feasibility in the MS population.

    1. Author Response

      We outline the reviewer/editor queries, with our responses indicated below. We thank the reviewers for their suggestions, which we address below and with minor edits that do not appreciably change the content (such as figure lettering and methods information).

      Reviewer #1 (Public Review):

      The paper by Dongsheng Xiao, Yuhao Yan and Timothy H Murphy presents a timely approach to record neuronal activity at multiple temporal and spatial scales. Such approaches are at the forefront of systems neuroscience, and a few examples include, among others, fMRI alongside electrophysiology (Logothetis et al, 2021. Nature) or widefield calcium imaging (Lake et al, 2020. Nat Meth), or functional ultrasound imaging and multi-unit recording (Claron et al, 2023 Cell Reports). The method presented here combines "low resolution" (i.e. cortical regions) widefield calcium imaging across most of the dorsal portions of the murine cortex with electrical recording of single neurons in specific cortical and subcortical locations (as a matter of fact, this latter component can be used everywhere in the murine brain).

      The method presented here is straightforward to implement and very well documented. Examples of novel insights that this approach can generate are well presented and demonstrate the strength of the approach, although some aspects of the analysis require clarification.

      For example, the authors reveal Spike-Triggered average cortical activation Maps (STMs) linked to the activity of single neurons (Figs 4 and 5). This allows one to directly assess the functional connectivity between cortical and sub-cortical areas. It is nevertheless unclear how stable the established relationships are. The nature of the "recordings" in Fig 4 is unclear. It looks like these are imaging sessions on the same day, but the length of these recordings as well as the interval between them is not stated. It will be fundamental to build a metric to compare STM variability across sessions/recordings/days; a root-mean-square from an average map across all recordings could provide a starting point.

      Our goal was to present a well-documented protocol for implanting electrodes (tetrodes and peripheral nerve) that do not impede cortical mesoscale imaging and support chronic investigation of spike trains. We do provide examples of repeated spiking measurements across days from the same electrodes and animals. Unfortunately, due to the pandemic interrupting data collection and other factors, this dataset does not contain a thorough analysis of response longevity using these electrodes, but we do show examples in the figures. In Figure 1F, G, we showed that the single unit activity was relatively stable during one week, two weeks, and two months of recordings after implantation. In Figure 4B, we showed that spiking activity in the hippocampus was stable across day 8 and day 9. We also showed that the STM of the hippocampus neuron was consistently associated with the RSP, BCS, and M2 regions for 10 recording sessions across days. In Figure 4D, we showed that the STMs of a midbrain neuron were relatively stable over 2 months. The spiking activity of the neuron on different days was consistently correlated with the lower limb, upper limb, and trunk sensorimotor areas in both hemispheres of the cortex.
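
      The root-mean-square metric proposed by the reviewer could be sketched as below. This was not part of the reported analysis; it is a minimal illustration that assumes each STM has been flattened to a list of pixel values of equal length, and the function names and example values are hypothetical.

```python
import math


def mean_map(maps):
    """Pixel-wise average of several flattened maps of equal length."""
    n_maps = len(maps)
    n_pixels = len(maps[0])
    return [sum(m[i] for m in maps) / n_maps for i in range(n_pixels)]


def rms_deviation_from_mean(maps):
    """Root-mean-square deviation of each map from the average map.

    Returns one value per recording; smaller values indicate that a
    session's map is closer to the across-session average.
    """
    avg = mean_map(maps)
    return [
        math.sqrt(sum((v - a) ** 2 for v, a in zip(m, avg)) / len(avg))
        for m in maps
    ]


# Two hypothetical 2-pixel maps; each deviates from the average by 1.
deviations = rms_deviation_from_mean([[0.0, 0.0], [2.0, 2.0]])
```

      Comparing these values across sessions recorded on the same day versus sessions recorded weeks apart would give one simple, quantitative view of map stability.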

      Also with respect to the STM analysis, the data-driven choice of 10 clusters might need a bit more exploration. While the silhouette clustering accuracy peaks at 10 (Fig 5A), this metric comes without confidence intervals, making it difficult to know if a difference of less than 10% (i.e. 11 or 13 clusters) should be deemed different. Maybe a bootstrapping approach could be used here to build such confidence intervals. Another approach to choosing the number of clusters could be based on "consensus" between different partitioning algorithms (e.g. Strehl, A. & Ghosh, J. J. Mach. Learn. Res. 3, 583-617 (2001)). A much stronger argument should be provided for using the 0.3 correlation cutoff value, which seems to be arbitrarily low. The main point here is that the authors should show that their conclusions hold within a range of parameter values (number of clusters and correlation threshold).

      Thank you for the interesting suggestions regarding cluster numbers. We agree that the number (10 clusters) could be taken as an arbitrary value. However, in previous work examining cortical connectivity maps (Mohajerani et al. 2013, Nature Neurosci.), we found that cortical mesoscale activity has degrees of freedom (number of unique elements) in the range of 10-15. This number is also supported by major structural networks found by the Allen Brain Connectivity Atlas and within functional imaging data. In other work using unsupervised methods (Xiao et al. 2021, Nature Comm.), a similar number of clusters was identified, so these numbers are not without some basis.

      Reviewer #1 (Recommendations For The Authors):

      I enjoyed very much reading the manuscript!

      Minor comments (aesthetics and typos)

      Please clarify how the hemodynamic correction was performed. The text refers to "substracted". This usually involves the computation of a general or per-pixel weight. Is this correction constant along the longitudinal imaging session (i.e. over weeks)?

      The hemodynamic correction was calculated based on the results of each daily session. Typically these corrections have minimal impact on overall values and are not expected to appreciably change over time.

      In Figure 3, authors might reconsider scaling down the size of panel A and enlarging the data presented in D. Also, with respect to panel D, what does the gray band represent, confidence intervals, standard dev? Please clarify.

      The gray bands correspond to the standard deviation of random trigger average traces.

      Lines in 4E could be made thicker.

      In the caption of fig6, panel D is mentioned twice (should be E).

      Thanks for catching this mistake; we have changed the caption in the online version.

      Reviewer #2 (Public Review):

      The article presents 'Mesotrode,' a technique that integrates chronic widefield calcium imaging and electrophysiology recordings using tetrodes in head-fixed mice. This approach allows recording the activity of a few single neurons in multiple cortical/subcortical structures, in which the tetrodes are implanted, in combination with widefield imaging of dorsal cortex activity on the mesoscale level, albeit without cellular resolution. The authors claim that Mesotrode can be used to sample different combinations of cortico-subcortical networks over prolonged periods of time, up to 60 days post-implantation. The results demonstrate that the activity of neurons recorded from distinct cortical and subcortical structures is coupled to diverse but segregated cortical functional maps, suggesting that neurons of different origins participate in distinct cortico-subcortical pathways. The study also extends the capability of Mesotrode by conducting electrophysiological recordings from the facial motor nerve. It demonstrates that facial nerve spiking is functionally associated with several cortical areas (PTA, RSP, and M2), and optogenetic inhibition of the PTA area significantly reduced the facial movement of the mice.

      Studying the relationship between widefield cortical activity patterns and the activity of individual neurons in cortical and subcortical areas is very important, and Murphy's lab has been a pioneer in the field. However, the choice of low-yield recording methods (tetrode) instead of more high-yield recording techniques, such as silicon probes, makes the approach presented in this study somewhat less appealing. Also, the authors claim that a tetrode-based approach can allow chronic recordings of single neural activity over days - a topic that is very controversial. In terms of results, I was under the impression that most of the conclusions presented in the bulk of the paper (Figures 1-5) are very similar to what previous work from Murphy's lab and other labs has shown using acute preparation. In this respect, the paper can benefit from a more in-depth analysis of the heterogeneity of single-neuron functional coupling. The last part of the facial nerve recording is interesting (Figure 6), but I think it can be integrated better into the rest of the paper.

      Reviewer #2 (Recommendations For The Authors):

      Major Comments:

      1) The methodology described in the paper is based on chronic tetrode recordings combined with widefield calcium imaging. The authors emphasize the advantages of using tetrodes in that they are 1) easy to implant 2) have a small footprint, and 3) allow to record the same neurons over days.

      I agree regarding the first advantage, however, the ability to reliably record the activity of the same neurons over days using electrophysiological recordings is controversial. The authors claim that:

      'We found that the single unit activity was relatively stable, during one week, two weeks, and two months of recordings after implantation (Figure 1F, G)',

      The only 'proof' the authors show for recording stability are waveforms of one neuron on one channel (out of presumably four channels), which seem to differ in amplitude over days. Two-dimensional plots of the neuron waveform for all channel combinations could be a more convincing way to make this claim. But, as I already mentioned - the ability to record from the same neurons chronically with electrophysiological methods is rather controversial, especially with tetrodes that don't allow for laminar profiling of neuronal response to account for a potential drift over time.

      We now make it more clear that examples of mesotrode stability are indicated in the figures. Furthermore, we acknowledge caveats that spike sorting experiments required to more conclusively identify single neurons would be improved with larger format silicon probes. Our work employs compact tetrode electrodes that permit simultaneous resolution of single units and mesoscale GCAMP activity. It is conceivable that improvements in spike sorting fidelity could be made by switching to more densely spaced silicon probes. While this is an obvious advantage, these probes do not have a compact footprint and would interfere with regional imaging.

      2) The authors present little analysis justifying the advantage of conducting chronic electrophysiological recordings instead of acute recordings with their data. In fact, throughout the paper, the authors mention that the results were consistent with their previous work with acute recordings. The only longitudinal analysis in this paper is qualitative and suggests that cortical maps were stable over days. I believe this was also shown in the past already. More in depth analysis of across days dynamics or showcase of an experiment centered on across days dynamics will strengthen the appeal of this approach. Generally speaking, there is very little quantitative analysis of longitudinal maps/functional coupling of single neurons over days. The paper will benefit from at least some quantification of this part.

      To our knowledge data showing the persistence of spike-associated maps longer than an acute experiment is novel. However, due to a low yield of recorded single neurons, we have not been able to follow these maps over a longer period in a population that would permit group statistics. We suggest that future experiments could be done using silicon probes with larger yields which would help to better align electrophysiological features with mesoscale GCAMP maps.

      3) Recording with tetrodes gives very low yields compared to silicon probe recordings. While silicon probes have a larger footprint and may occlude the widefield imaging on the side of the silicon probe implant, it is unclear why the authors did not use denser electrode arrays on one side of the brain and image from the other hemisphere, given that the maps are very correlated across hemispheres.

      Taking advantage of mirrored activity in the opposite hemisphere is a great idea. Future studies could include experiments that would take advantage of bilateral symmetry by placing high-resolution silicon probes in one hemisphere and then reading out mesoscale maps in the other.

      4) The advantage of the electrophysiological recordings is in providing access to single-neuron activity at high temporal resolution. The authors could add more quantifications regarding individual neuron functional coupling diversity. For instance, in the per-area distributions in Figure 5D -- did all neurons from a given area participate in the same functional maps, or did different neurons show diversity in the functional coupling. Did simultaneous recordings of neurons from the same tetrode show more similar maps, than recordings of other neurons from the same area conducted on different days/in different animals? Did the map differ when the neurons were bursting/were at specific phases of the LFP, etc.

      Unfortunately, the yield of neurons was not enough to investigate some of the interesting state-dependent phenomena the reviewer describes. In previous acute work, we examined heterogeneity between single-neuron responses in more detail (Xiao et al. 2017).

      5) Facial nerve stimulation. This part feels detached from the rest of the paper and is not explained/discussed in sufficient detail. For example, there is no description of the surgical procedure or the electrode used for facial nerve recordings in the Methods (in the Results section, the authors mention 'micro-wires', but the Method section only contains information about tetrodes).

      Thank you for bringing up the issue of surgical details for the facial nerve experiments, which are now in the methods. This information is also available by contacting the authors and is given below.

      For facial nerve recordings, peripheral nerve activity was measured by fine wire recording directly from the nerves subserving the whisker. During surgery, mice were anesthetized and positioned on a warming pad connected to a rectal probe, and body temperature was maintained at 37 °C. A skin incision was made, exposing a small part of the buccal branch of the left facial nerve. Magnification of the surgical field with a dissecting microscope allowed a careful dissection of a nerve branch with minimum disruption of the tissues and blood supply surrounding the nerve. The appropriate site of exposure was determined by using two projection lines: a vertical line running downward, posterior from the outer corner of the eye, and a horizontal line running in the caudal direction, starting at the whisker E-row. Then two insulated fine wires (about 25 µm tips) were hooked and placed around the nerve, separated by about 2 mm from one another. The insulation at the ends of the wires was removed and a knot was made on each wire to prevent it from slipping. The opposite ends of each wire were soldered to a mini connector attached by dental cement to the skull. Finally, 6-0 silk sutures were used to close the skin incisions.

      The functional maps associated with facial nerve spiking show different patterns from the optogenetic stimulation maps that led to significant facial nerve responses. Specifically, the STM maps show responses in the posterior parts of the cortex, but the photostimulation map showed almost an opposite pattern, where the effects were observed in the anterior parts. The authors do not discuss this mismatch in sufficient detail. Further, the authors refer to area PTA but use partitions based on the Allen Institute, which does not indicate this area.

      The location of the posterior parietal area is based on our previous work (Mohajerani et al. 2013), using the Allen Institute Brain Atlas for guidance.

      Minor comments

      6) The authors mention that "on average, we obtained 3-5 neurons per tetrode implanted, and this yield was consistent across regions (Figure 2C). " -- for how long, on average, could the authors record single-neuron activity from each tetrode?

      The 3-5 neurons obtained per tetrode were recorded 1 week after tetrode implantation.

      7) Figure 4B - it is unclear what the labels "recording 1, ...5, " correspond to. Are these different recording sessions within the same day "day 8"?

      The labels "recording 1, ...5, " correspond to different recording sessions within the same day.

    1. Author Response

      Review 1:

      Major concerns that need to be addressed:

      Investigate the effects of Malat1 on the clearance of Listeria or LCMV.

      In our prior publication (Gagnon et al, Cell Reports) we showed that miR-15/16 deficiency in T cells does not affect the clearance of LCMV, and that transferred memory T cells formed in these mice can function normally to clear a secondary infection with Listeria expressing the LCMV gp33 peptide. However, the size of the memory pool was clearly changed, as was the programming of memory cells. Here, we show that disrupting miR-15/16 binding to Malat1 induces a reciprocal phenotype, validating a biological function for this RNA:RNA interaction. We employed these systems because they are widely used to reveal key aspects of T cell memory, but both infections are readily cleared by the host. These changes in the memory response likely play a limiting role in some biological context(s), and we agree that further investigation to uncover such situations would further validate the importance of this RNA circuit.

      Demonstrate that Malat1 shuttles to the cytosol, this will strengthen the conclusions that Malat1 sponges miR15/16.

      The location of miR-15/16 interaction with Malat1 is an interesting area for future study. Many prior studies have shown clearly that Malat1 is primarily located in the nucleus, but since T cells express such a large excess of this lncRNA, even the remaining fraction detected in the cytosol may be sufficient to “sponge” a significant amount of miR-15/16. Alternatively, these molecules may interact in the nucleus, or during mitosis. As the reviewer suggests, Malat1 may shuttle between compartments, raising the intriguing possibility that it could not only “sponge” but “drag” miR-15/16 away from its targets into the nucleus. A proper analysis of the mechanism of ceRNA function is beyond the scope of this paper, but we do believe that this circuit may be an especially good one for further study.

      Through flow cytometry or immunoblot analyses, investigate the effects of Malat1-miR15/16 on genes listed in table 3. This would add credence to the sequencing and CLIP data.

      We thank the reviewer for bringing to our attention the manuscript’s overemphasis on the former Table 3 gene set, which represented just a few of the hundreds of genes for which our data provide evidence for miR-15/16 binding and inhibition of expression. We have removed this table to avoid the appearance of suggesting an oversimplified model for how miR-15/16 regulate T cell responses, and replaced it with a short description of two targets (Pik3r1 and Mapk8) that link the roles of miR-15/16 in T cell activation and tumor suppression. Like transcription factors, miRNAs function as network regulators of gene expression, gaining biological power through their ability to coregulate many genes with convergent effects on cell behavior. In the case of miR-15/16, our published data, reinforced by the data in this manuscript, indicate that the relevant target network is very large, and that even very small changes in the expression of these targets are sufficient to alter the fate of antigen-responsive T cells in the setting of acute infection.

      This comment also raises the important issue of target validation, which is often difficult, since the effect size for each miRNA target is small (typically 10-30%, sometimes reaching 50% reduction). The expected effect of Malat1 inhibition of miR-15/16 is some fraction of that. Nevertheless, in Figure 3 and Figure 7, we validated two direct targets (CD28 and Bcl2) using flow cytometry, a technique that facilitates precise sampling of protein expression on a large number of individual cells.

      Minor concerns:

      The discussion is too broad and does not address the limitations of the study.

      We added a sentence to acknowledge the limitation regarding small effect sizes and the shortcomings of the acute infection models used in this study:

      “The magnitude of this effect was modest in acute LCMV and Listeria infection, two models that feature robust pathogen clearance, allowing assessment of memory T cells in the absence of chronic antigen persistence. Further work is needed to assess other settings in which Malat1:miR-15/16 interaction may have a bigger impact on the outcome of immune responses.”

      Reviewer 2:

      1) Given the lack of an effect on microRNA or Malat1 levels following the genetic modification, is it possible that Malat1 is actually not directly bound by the miRNA? Could the knock-out of the miRNA induce Ago2 loss on Malat1 by indirect mechanisms? If there is any room for doubt about a direct interaction, the authors should at least mention and discuss it.

      There is very little room for doubt about the direct interaction between miR-15/16 and Malat1. The AHC data we report indicate that the loss of Ago2 binding to the mutant Malat1 occurs predominantly at the site containing the miR-15/16 binding site of interest. This suggests that the mutation we created does not affect global Ago2 levels or occupancy across the rest of the transcript. Further, the miR-15/16 KO data directly support this result, showing that miR-15/16 is necessary for Ago2 binding at that site. If loss of miR-15/16 resulted in a non-specific indirect loss of binding to Malat1, we would expect that other binding events would be affected as well, which we do not observe.

      In the Results, the authors write: "miR-15/16 has not been previously shown to interact with Malat1", but they should cite/discuss: MALAT1 regulates the transcriptional and translational levels of proto-oncogene RUNX2 in colorectal cancer metastasis, Qing Ji et al, 2019.

      We thank the reviewer for bringing this study to our attention, and we have cited it in our updated version of the manuscript. While the interaction between miR-15/16 and Malat1 has been shown before, our study represents a significant step beyond this study in two important ways: The rigorous biochemical mapping of the miR-15/16:Malat1 interaction site, and direct evidence for the role of a miR:lncRNA interaction in an in vivo physiological phenotype.

      2) The authors write: "Only a few studies demonstrate sequence dependent function of lncRNAs (Elguindy and Mendell, 2021; Kleaveland et al., 2018; Lee et al., 1999)". But this seems more common than the statement implies (see for example this review: https://www.sciencedirect.com/science/article/pii/S0022283612008960#s0065). Moreover, SNPs in lncRNAs are associated with pathologies (see for example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6306726/, where SNPs in Malat1 are also presented). The authors could acknowledge this by reformulating their sentence and citing these.

      A large number of studies have uncovered lncRNA functions without identifying the RNA sequences responsible for that activity, but evidence for sequence-specific effects remains rare. We thank the reviewer for providing direction to additional sequence-specific studies and we have now cited several of them in the updated version of the introduction:

      “Studies demonstrating sequence dependent function of lncRNAs are comparatively rare (Carrieri et al., 2012; Elguindy and Mendell, 2021; Faghihi et al., 2008; Gong and Maquat, 2011; Kleaveland et al., 2018; Lee et al., 1999; Yoon et al., 2012).”

      In particular, association of important SNPs with lncRNA loci is an exciting motivator in the study of lncRNAs and can be informative in the dissection of lncRNA function. For Malat1 in the linked Minotti et al publication, we do not believe the SNPs referenced represent indications of sequence-specific transcript function. The SNPs identified for Malat1 are rs1194338, rs4102217, and rs591291. In the UCSC genome browser screenshot in Author response image 1, you can see that all of these SNPs are upstream of Malat1 and in regions of extremely dense H3K27Ac, suggesting enhancer function. These SNPs do not represent sequence specific function of the Malat1 transcript, but rather more likely genomic sequence regulation of Malat1 (or nearby gene) expression.

      Author response image 1.

      • Figure 2H: In the figure legend, could the authors clarify what they mean by "same conditions as in F"?

      We have updated the figure legend for clarity.

      • Figure 3 panel labels B, C, D don't match figure.

      We have corrected this and provided an updated figure.

      • Figure 4 D, E, F: Can the authors comment more about why in their opinion early activation genes are not significantly decreased in Malat1 scr/scr?

      Figure 4A shows that interrupting Malat1 interaction with miR-15/16 does affect the early induction of the immediate early gene CD69. Even miR-15/16 deficiency did not affect Nur77 expression, indicating that Malat1 and miR-15/16 regulate specific cues and signaling pathways involved in T cell activation. In particular, the transcriptomic analysis led us to focus on effects on costimulation-induced genes (Figure 3). Figure panels 4D, E, and F show the production of cytokines, including IL-2, which has been well documented to be responsive to CD28 signaling and clearly was in our experiments. These data show a consistent increase in miR-15/16-deficient T cells, despite considerable noise in the assay. The trend toward reduced IL-2 in Malat1 scr/scr T cells is of smaller magnitude, as expected, and not statistically significant. Repeating this assay to obtain a better p value does not seem warranted. However, we did independently observe decreased IL-2 production in Malat1 scr/scr T cells in an ex vivo cytokine capture assay (Figure 7F-G).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1) In general, given that several of the "equivalence groups" were distinguished from each other in Packer et al's annotation, can the authors comment more on why they aren't able to distinguish them? Are the markers listed for those cell states in Packer not expressed appropriately in these data? Or are they expressed but the states are not different enough to form discrete clusters? I suggest the possibility that the analysis choices of 20 "initial dimensions" or 1000 most variable genes filtered out some of these differences, which may be encoded in later principal components, or that the use of t-SNE projection is not sufficient to resolve these distinct states.

      2) I was a bit confused by the spatial gene expression analysis. Several distinct ideas appear to be posed in the text. These ideas aren't really supported by any quantitative analysis, just the visual patterns in Figure 4B/C which I'm not sure I always agree with.

      For example, ceh-43 expression is mentioned as having "physically proximate" expression. But it is well established that different lineages form specific spatial territories (e.g. Schnabel et al 1997). Thus it seems logical that genes with specific lineage patterns will have specific spatial patterns as well. If the claim is that the observed patterns are more clustered along the A-P axis than expected by chance given their lineal complexity then I'm not sure this is shown. Maybe some comparison with control lineage patterns of similar complexity of non-TFs or non-HD TFs could get whether these genes specifically are more spatially patterned? Visually it looks to me like some patterns are more like "blobs" or even lateral or D-V specific patterns than they are like "stripes."

      In addition there is a long history in the literature discussing the origin of position-specific patterns in C. elegans - most I'm aware of support the idea that positional information arises primarily from intrinsic lineage mechanisms (e.g. Cowing and Kenyon 1996). Perhaps the authors are making this same argument here, but if so this isn't clear from the text.

      Or maybe the authors are trying to make the argument that combinations of TFs encode more precise position than individual TFs? This seems more likely to me from the images presented, but it is still not well supported without quantitative or statistical analyses.

      3) The comparison with Drosophila is interesting but also under-developed. I think all I would feel comfortable claiming from the data as shown is that genes that are spatially patterned in early fly development are also usually patterned in the C. elegans lineage. But to even say this is an enrichment over expectation would require more analysis.

      Minor comments:

      Methods: some statement about temperature control during cell isolation would be useful. In other words were embryos continuing to develop or put at low temperature such as in a cold room to prevent temporal differences between the first and last cells collected from a given embryo?

      Current links to data at GEO are incorrect and link to Levin et al 2016 instead. I was not able to access the raw single cell data, just the processed data in Table S6.

      The standardization of expression in embryos isn't well explained - would be good to expand a little on the types of batch effects being addressed and how this approach was chosen or a relevant citation.

      Page 2: Including P0 and cell deaths there are 1,341 branches in the hermaphrodite lineage (2n-1 for 671 terminal cells including deaths).

      -"as their each have" (grammar error)

      -"very large nuclear hormone receptor domain" (add "family")

      Page 3: As noted Packer et al largely missed cells prior to the 50-cell stage as described - but the reason for this is likely that the use of 10 micron filters or centrifugation to remove undissociated embryos also removes early stage cells.

      -"few new expressions occur" (grammar). Also, in both the Tintori and Hashimshony datasets there are well over 1000 newly expressed genes detectable (see for example Sivaramakrishnan et al 2021 biorxiv).

      Figure S1 would be easier to interpret with a legend explaining what fates are represented by each color

      Some genes listed as markers in Figure S2 are not included in the marker table such as flh-3, oma-2, sma-9.

      "New markers were required" - this is plural but only F19F10.1 is mentioned. Were other markers examined this way or should it be singular?

      In Figure S2 the lower ("robustness") plots are nice but could be explained more clearly. What is the nature of the "cell similarity score"? How many (if any) cells were excluded due to not being most similar to their own cluster?

      "transcriptomically very similar shortly after division" - can the authors comment on any information they have about how long after division the cells were collected?

      GFP reporter lineaging - the methods are minimally described (what brand of microscope, which strains/transgene/CRISPR configurations etc). And data are not presented. If these embryos are all incorporated into Ma et al 2021, that is fine, but should be clearly cited. Otherwise it is important in my view to include some way to access the quantitative values from the lineaging and understand these details.

      "as illustrated for ceh-43, dmd-4 and unc-30" - were there other examples as suggested from this wording? I'd also note that similar fluorescent reporter imaging data have been published previously for all three genes listed (Walton et al 2015 for UNC-30, Ma et al 2021 for DMD-4 and CEH-43 protein reporters, Murray et al 2012 for dmd-4 and ceh-43 promoter reporters).

      Zacharias and Murray are cited as promoting "continuous symmetry breaking" but actually that review argued for a "non-monophyletic" architecture similar to that supported by the data .

      The text and figure don't always agree. For example mec-3 expression is listed in the text as part of one of the stripes, but mec-3 is not labeled on the figures.

      The stage of each embryo in figure 4B/C should be explicitly labeled (and maybe also given specific figure panel designations to clarify what statements in the text correspond to which figures).

      In the discussion it is unclear what the numbers "97 to 104" refer to

      The scRNA-seq reads were mapped to a relatively old genome build and annotation set (WS230) - thus current users may find discrepancies with current gene names in WormBase. Also, since the CEL-seq data are 3' biased, it is worth noting that Packer et al found that a substantial number of genes (~1000) in a slightly later annotation set (WS260) were undercounted (sometimes dramatically) with the similarly biased 10x data due to incomplete 3'UTR annotations. While I would be reluctant to ask for a requantification for the purposes of the manuscript given the challenges of repeating the various analyses, it is worth explicitly mentioning whether this was dealt with.

      Reviewer #2 (Recommendations For The Authors):

      The writing was otherwise good, at least to my eye, and the data was presented very well and made freely available to other researchers. I am not as well-versed in the statistical methods and will leave comments on these to a better-equipped reviewer(s).

      Fig. 1 legend 'P' should be P4 (subscript 4).

      p. 9 'ceh-51' should be italicized. Only one factor seems to have been confirmed by smFISH, F19F10.1. There are available reporters; did they show a similar pattern? From CGC website: RW12347 F19F10.1(st12347[F19F10.1::TY1::EGFP::3xFLAG]) V endogenous tagged reporter; RW11620 unc-119(tm4063) III; stIs11620 [F19F10.1::H1-wCherry + unc-119(+)] array reporter.

      Reviewer #3 (Recommendations For The Authors):

      Typo: on page 11, where it says nanog it should read nanos.

      Reviewer #4 (Recommendations For The Authors):

      I found some sentences and paragraphs to be a bit unclear. There are no page or line numbers in the manuscript, so I point in the general direction, and hope the authors find what I am referring to.

      • 2nd paragraph of the Introduction - "their" should be "they", but the sentence as a whole is not clear.

      • 3rd para. of the Intro. - The last sentence of this paragraph doesn't make sense. Please rephrase and/or break up into shorter sentences.

      • 1st Para. of Results - "the maternal deposit" is not clear. Perhaps "maternally deposited transcripts" or something similar.

      • 1st Para. after Figure 3. The last sentence "Thus, continuous symmetry breaking..." is unclear. What is "continuous symmetry breaking"? Please define and expand.

      • Fig. 4 - the genes seem to be listed from posterior to anterior. The common way of presenting Hox gene lists and other regionally expressed genes is from anterior to posterior.

      • For the benefit of the non-C. elegans crowd, please give names of Drosophila homologs where relevant (e.g., when comparing to Drosophila expression patterns)

      In a few places there are citations of popular science books or general textbooks (e.g., Carroll et al., 2004; Wolpert et al., 2019). I think it would be better to cite review papers from the scientific literature or relevant primary papers.

      I am very happy to submit the revised manuscript. We were very happy to have received reports from four reviewers!

      We have decided not to prepare a separate response to the public comments of the reviewers, as we did not undertake any further major revisions.

      We did address most of the minor editorial suggestions.

    1. Author Response

      eLife assessment

      This paper presents a series of experiments investigating the role of cadherin-11 mediated interactions between cancer cells and fibroblasts in metastasis using updated 3D cell co-invasion assays. The primarily descriptive data are a valuable contribution to our understanding of the nature of cross cell-type interactions in metastasis, but are incomplete with respect to the far-reaching conclusions about the central role of cadherin-11, especially given the complex nature of the phenotype and the need to better contextualize these observations in a complete picture of metastasis.

      We extend our gratitude to eLife for affording us the opportunity to publish our manuscript as a peer-reviewed preprint. We acknowledge that our exploration of the novel cell hijacking mechanism underlying cancer metastasis remains an evolving endeavor. As this is the first study to introduce this phenotype, substantiated by comprehensive in vivo investigations that underscore its real-world significance, we eagerly anticipate forthcoming research in this domain. The concept of cancer metastasis dates back to the 18th century. Against the extensive history of this field, marked by millions of publications, our work introduces a transformative and disruptive dimension with the unveiling of this cell hijacking mechanism. At the same time, it initiates a deeper exploration of the intricacies of the metastatic process. We sincerely value the meticulous assessment of our work and look forward to subsequent investigations that will elucidate these findings within the broader context of metastasis.

      Joint Public Review:

      The authors of this manuscript studied cell-cell interaction between fibroblast and cancer cells as an intermediary model of tumor cell migration/invasion. The work focused on the mesenchymal cadherin-11 (CDH11) which is expressed in the later stages of the epithelial mesenchymal transition (EMT) in tumor cellular models, and whose expression is correlated with tumor progression in vivo. The authors employed 3-D matrix and live cell imaging to visualize the nutrient-dependent co-migration of fibroblast and cancer cells. By siRNA-based suppression of CDH11 expression in tumor cell line and/or fibroblast cells, the authors observed decreased co-movement and attenuated growth of mixed xenograft. Accordingly, the authors conclude that post-EMT cancer cells are capable of migrating/invading through CDH11-mediated cell-cell contact.

      While the data point to the involvement of CDH11 in fibroblast mediated co-invasion, as it stands it is difficult to fully contextualize these observations within the broader context of the molecular mechanisms underlying metastasis, and in particular do not firmly establish a primary role for CDH11 at this time. The reviewers were specifically concerned about indirect effects of CDH11 manipulation on the physiology and cell biology of the tumor cells, and the possibility that several of the results could be consequences of these changes rather than due specifically to CDH11 mediated interactions.

      The reviewers acknowledge the difficulty in fully controlling for these phenomena, and believe this work will be of interest to the large number of researchers investigating the molecular basis for metastasis and specifically of trans cell-type interactions. However, until experiments establishing the specific formation and CDH11-mediated interactions in co-invasion are carried out, the authors' conclusions about the prominent role of CDH11 should be treated as intriguing, but speculative.

      We extend our sincere gratitude to the peer reviewers for their invaluable and constructive feedback. We also wish to express our appreciation for the concise summary of our study and the recognition of the challenges posed by the current technological landscape in fully elucidating the phenotype.

      In response to the reviewer's concerns regarding the indirect effects of CDH11 manipulation on the physiology and cellular biology of tumor cells, we encourage readers to revisit Figure 3. In this figure, we not only silenced CDH11 in cancer cells but also in fibroblasts. The outcomes of this intricate experiment have been comprehensively discussed in the main text and are visually summarized in Supplemental Figure S2.

      Furthermore, we draw attention to a comprehensive review of our in vivo studies presented in Figure 6, wherein we exclusively silenced CDH11 in fibroblasts without any manipulation of the cancer cells. These findings underscore the molecular underpinnings of CDH11 as the mediator of cell hijacking. Consequently, we are confident that the reviewer's concerns regarding potential side effects of CDH11 manipulation on tumor cells, which could weaken the manuscript's conclusions, can be addressed.

      In conclusion, we wish to emphasize that we shared the same initial concerns as our reviewers when designing these studies. We have diligently endeavored to alleviate these concerns through a series of comprehensive in vitro, ex vivo, and in vivo experiments. Once again, we strongly encourage readers to explore our supplemental data for a more in-depth understanding. Thank you.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting and somewhat unusual paper supporting the idea that creatine is a neurotransmitter in the central nervous system of vertebrates. The idea is not entirely new, and the authors carefully weigh the evidence, both past and newly acquired, to make their case. The strength of the paper lies in the importance of the potential discovery - as the authors point out, creatine ticks more boxes on criteria of neurotransmitters than some of the ones listed in textbooks - and the list of known transmitters (currently 16) certainly is textbook material. A further strength of the manuscript is the careful consideration of a list of criteria for transmitters and newly acquired evidence for four of these criteria: 1. evidence that creatine is stored in synaptic vesicles, 2. mutants for creatine synthesis and a vesicular transporter show reduced storage and release of creatine, 3. functional measurement that creatine release has an excitatory or inhibitory (here inhibitory) effect in vivo, and 4. ATP-dependence. The key weakness of the paper is that there is no single clear 'smoking gun', like a postsynaptic creatine receptor, that would really demonstrate the function as a transmitter. Instead, the evidence is of a cumulative nature, and not all bits of evidence are equally strong. On balance, I found the path to discovery and the evidence assembled in this manuscript to establish a clear possibility, positive evidence, and to provide a foundation for further work in this direction.

      It is notable that, historically, no neurotransmitter has ever been established in a single paper. While creatine will not be an exception, the data presented in this paper go further than any previous paper in demonstrating the possibility of a new neurotransmitter. However, we added an entire paragraph to the Discussion about differences between Cr and classic neurotransmitters such as Glu, beginning with the absence of a molecularly defined receptor at this point and the Ca2+-independent component of Cr release induced by extracellular K+.

      We appreciate the reviewer noting that the evidence we obtained now supports that creatine satisfies all 4 criteria of transmitters.

      We respectfully disagree with the point about a smoking gun: any of these four is a smoking gun, while the satisfaction of all 4 is quite strong, more than a smoking gun.

      We disagree that a receptor “would really demonstrate the function of a transmitter”. Textbook criteria for a transmitter usually require postsynaptic responses, not a molecularly defined receptor. Molecularly defining the receptors for many of the known transmitters required many years of work, while those molecules were accepted as transmitters before their receptors were finally molecularly defined. As long as there is a postsynaptic response, there is of course a receptor, though its molecular properties should be further studied. For example, responses to choline were discovered in 1900 (Hunt, Am J Physiol 3, xviii-xix, 1900), those to acetylcholine in 1906 (Hunt and Taveau, Br Med J 2:1788-1789, 1906), those to suprarenal glands before 1894 (Oliver and Schäfer, J Physiol 18:230-276 1895). Henry Dale was awarded a Nobel prize in 1936 partly for his work on acetylcholine. Receptors for acetylcholine and noradrenaline were not molecularly defined until the 1970s and 1980s. Before then, they were known only as mediators of responses to natural transmitters and synthesized chemicals.

      There were two previous reports that creatine could be taken into brain slices (Almeida et al., 2006) or synaptosomes (Peral, Vázquez-Carretero and Ilundain, 2010). These were used by the reviewer to argue that the idea of creatine as a neurotransmitter “is not entirely new”. However, no one has followed up these studies for 10 years, thus they would not be considered as good smoking guns. While we have reproduced the synaptosome uptake result (together with our new finding that this uptake was dependent on SLC6A8), it should be noted that uptake of molecules into synaptosomes is not absolutely required for a neurotransmitter because degradation of a transmitter is equally valid. Furthermore, molecules required synaptically but not as a transmitter can also be transported into the synaptic terminal.

      Our detection of Cr in the synaptic vesicles provides much stronger evidence supporting its importance. If a smoking gun is important, the detection of creatine in the SVs is the best smoking gun; this discovery was in fact what led us to study its release and postsynaptic responses, and to repeat the uptake experiment with genetic mutants.

      Reviewer #2 (Public Review):

      Summary:

      Bian et al studied creatine (Cr) in the context of central nervous system (CNS) function. They detected Cr in synaptic vesicles purified from mouse brains with anti-Synaptophysin using capillary electrophoresis-mass spectrometry. Cr levels in the synaptic vesicle fraction were reduced in mice lacking the Cr synthetase AGAT, or the Cr transporter SLC6A8. They provide evidence for Cr release within several minutes after treating brain slices with KCl. This KCl-induced Cr release was partially calcium-dependent and was attenuated in slices obtained from AGAT and SLC6A8 mutant mice. Cr application also decreased the excitability of cortical pyramidal cells in one third of the cells tested. Finally, they provide evidence for SLC6A8-dependent Cr uptake into synaptosomes, and ATP-dependent Cr loading into synaptic vesicles. Based on these data, the authors propose that Cr may act as a neurotransmitter in the CNS.

      Strengths:

      1) A major strength of the paper is the broad spectrum of tools used to investigate Cr.

      2) The study provides strong evidence that Cr is present in/loaded into synaptic vesicles.

      Weaknesses:

      (in sequential order)

      1) Are Cr levels indeed reduced in Agat-/-? The decrease in Cr IgG in Agat-/- (and Agat+/-) is similar to the corresponding decrease in Syp (Fig. 3B). What is the explanation for this? Is the decrease in Cr in Agat-/- significant when considering the drop in IgG? The data should be normalized to the respective IgG control.

      We measured the Cr concentration in whole brain lysates using a Creatine Assay Kit (Sigma, MAK079). Cr levels in the brain were reduced in Agat-/- mice: the Cr concentration in Agat-/- mice was about 1/10 of that in Agat+/+ and Agat+/- mice (Author response image 1).

      Author response image 1.

      Cr concentration in brain from AGAT+/+, AGAT+/- and AGAT-/- mice (n=5 male mice for each group). *, p<0.05; ***, p<0.001, one-way ANOVA with Tukey’s correction.

      As pointed out by the reviewer, the decrease in Cr IgG in Agat-/- seems similar to the corresponding decrease in Syp (Fig. 3B in the paper). Cr pulled down by IgG was 0.46 ± 0.04, 0.37 ± 0.06 and 0.17 ± 0.03 pmol/μg anti-Syp antibody for Agat+/+, Agat+/-, and Agat-/- mice respectively. There was a trend toward reduced Cr pulled down by IgG in Agat-/-; however, there were no statistically significant differences between Agat-/- and Agat+/+, or between Agat-/- and Agat+/-, as determined by one-way ANOVA (Fig. 3B in the paper). Because Cr concentration in the brain was reduced in Agat-/- mice, we speculate that the apparent drop in Cr pulled down by IgG may have partially resulted from the overall reduction of Cr content in the brain.

      The absolute content of Cr pulled down by Syp in Agat-/- mice was reduced to 21.6% of that in Agat+/+ mice and 23.6% of that in Agat+/- mice (Fig. 3B in the paper). As suggested by the reviewer, we normalized the Cr pulled down by Syp to the respective IgG control (Author response image 2). The normalized Cr content in Agat-/- mice tended to decrease, but the difference was not statistically significant, as compared to Agat+/+ and Agat+/- mice (n=10 for each group, one-way ANOVA).

      Author response image 2.

      Normalized Cr content in brain from AGAT+/+, AGAT+/- and AGAT-/- mice (n=10 for each group). Cr pulled down by anti-Syp antibody was normalized to that of IgG.
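
      As a rough check on how much of the Syp reduction survives normalization, the group means quoted above can be combined directly. The sketch below is illustrative only: it assumes that a ratio of group means approximates the per-sample normalization used for Author response image 2, and the names `igg_mean`, `syp_ko_fraction` and `normalized_ko_fraction` are hypothetical helpers, not part of the authors' analysis.

```python
# Group-mean Cr pulled down by IgG, as quoted in the response
# (pmol Cr per ug anti-Syp antibody).
igg_mean = {"Agat+/+": 0.46, "Agat+/-": 0.37, "Agat-/-": 0.17}

# Cr pulled down by Syp in Agat-/- as a fraction of each reference genotype,
# as quoted in the response (21.6% and 23.6%).
syp_ko_fraction = {"Agat+/+": 0.216, "Agat+/-": 0.236}


def normalized_ko_fraction(reference):
    """Syp/IgG ratio in Agat-/- relative to the same ratio in `reference`.

    Assumption: ratios of group means stand in for per-sample ratios.
    """
    igg_fraction = igg_mean["Agat-/-"] / igg_mean[reference]
    return syp_ko_fraction[reference] / igg_fraction


for ref in ("Agat+/+", "Agat+/-"):
    print(ref, round(normalized_ko_fraction(ref), 2))
```

      Under this assumption the normalized Agat-/- signal is roughly half to three-fifths of the reference genotypes, a much smaller drop than the unnormalized 21.6% and 23.6%, which is consistent with the normalized difference no longer reaching significance.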

      2) The data supporting that depolarization-induced Cr release is SLC6A8 dependent is not convincing because the relative increase in KCl-induced Cr release is similar between SLC6A8-/Y and SLC6A8+/Y (Fig. 5D). The data should be also normalized to the respective controls.

      As suggested by the reviewer, we normalized the Cr release during KCl stimulation to the baseline (Author response image 3). The ratio of Cr release evoked by high KCl stimulation to the baseline was similar in WT and Slc6a8 knockouts. This suggests that Cr is not released through the SLC6A8 transporter.

      Author response image 3.

      Normalized Cr release from slices from Slc6a8+/Y and Slc6a8-/Y mice (n=7 slices for each group). Cr release evoked by high KCl stimulation was normalized to baseline.

      However, without Slc6a8, KCl-induced release of Cr was significantly reduced (Figure 5D in the paper). This is because Slc6a8 is a transporter that mediates Cr uptake into synaptic terminals (Figure 5D and 8C in the paper). Therefore, loss of Slc6a8 reduced the Cr content in SVs (Figure 2C in the paper), which indirectly reduced Cr release.

      3) The majority (almost 3/4) of depolarization-induced Cr release is Ca2+ independent (Fig. 5G). Furthermore, KCl-induced, Ca2+-independent release persists in SLC6A8-/Y (Fig. 5G). What is the model for Ca2+-independent Cr release? Why is there Ca2+-independent Cr release from SLC6A8 KO neurons? How does this relate to the prominent decrease in Ca2+-dependent Cr release in SLC6A8-/Y (Fig. 5G)? They show a prominent decrease in Cr control levels in SLC6A8-/Y in Fig. 5D. Were the data shown in Fig. 5D obtained in the presence or absence of Ca2+? Could the decrease in Ca2+-dependent Cr release in SLC6A8-/Y (Fig. 5G) be due to decreased Cr baseline levels in the presence of Ca2+ (Fig. 5D)?

      These are interesting questions that, at this point, can only be addressed by reference to the literature. For example, one possibility is that Ca2+-independent Cr release occurs in glia, since, as pointed out by the reviewer in Point 6, high GAMT levels were reported for astrocytes and oligodendrocytes (Schmidt et al. 2004; Rosko et al. 2023). Other neuromodulators such as taurine have been reported to be released from astrocytes (Philibert, Rogers, and Dutton 1989) or slices (Saransaari and Oja 2006) in a Ca2+-independent manner. In addition, in the absence of potassium stimulation, Ca2+ depletion led to increased release of taurine in cultured astrocytes (Takuma et al. 1996) or in the striatum in vivo (Molchanova, Oja, and Saransaari 2005). Similarly, in SLC6A8 KO slices, Ca2+ depletion (Figure 5G) also increased creatine baseline levels compared with those in normal ACSF (Figure 5D). Another possibility is that Ca2+-independent Cr release occurs in neurons lacking SLC6A8 expression.

      As mentioned in the paper, the data shown in Figure 5D were obtained in the presence of Ca2+. The reduction of Ca2+-dependent Cr release evoked by potassium in SLC6A8-/Y (Figure 5G) may be due to decreased Cr baseline levels in the presence of Ca2+ and reduced Cr in synaptic vesicles (Figure 5D).

      4) Cr levels are strongly reduced in Agat-/- (Figure 6B). However, KCl-induced Cr release persists after loss of AGAT (Figure 6B). These data do not support that Cr release is Agat dependent.

      Although KCl-induced Cr release persisted in AGAT-/- mutants, it dropped to 11.6% of that in WT mice (Figure 6B). AGAT is not directly involved in the release, but is required for providing sufficient Cr.

      5) The authors show that Cr application decreases excitability in ~1/3 of the tested neurons (Figure 7). How were responders and non-responders defined? What justifies this classification? The data for all Cr-treated cells should be pooled. Are there indeed two distributions (responders/non-responders)? Running statistics on pre-selected groups (Figure 7H-J) is meaningless. Given that the effects could be seen 2-8 minutes after Cr application - at what time points were the data shown in Figure 7E-J collected? Is the Cr group shown in Figure 7F significantly different from the control group/wash?

      The responders were defined by three criteria: (1) When Cr was applied, the rheobase was increased compared with both control and wash conditions. (2) The number of total evoked spikes was lower during Cr application than under both control and wash. (3) The number of total evoked spikes was decreased by at least 10% relative to control or wash.
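
      The three criteria can be written down as a simple per-neuron check. The following is a minimal sketch, not our analysis code: the function name, the dictionary layout and the `min_drop` parameter are hypothetical, and criterion 3 is read here as a decrease of at least 10% relative to either control or wash.

```python
def is_responder(rheobase, spikes, min_drop=0.10):
    """Return True if one neuron meets all three responder criteria.

    rheobase, spikes: dicts with keys "control", "cr" and "wash"
    (hypothetical layout chosen for this sketch).
    """
    # Criterion 1: rheobase under Cr exceeds both control and wash.
    higher_rheobase = (rheobase["cr"] > rheobase["control"]
                       and rheobase["cr"] > rheobase["wash"])
    # Criterion 2: fewer evoked spikes under Cr than in control and wash.
    fewer_spikes = (spikes["cr"] < spikes["control"]
                    and spikes["cr"] < spikes["wash"])
    if not (higher_rheobase and fewer_spikes):
        return False
    # Criterion 3 (assumed reading): spikes drop by at least min_drop
    # relative to control or to wash. Both denominators are positive
    # here because criterion 2 already holds.
    drop_vs_control = (spikes["control"] - spikes["cr"]) / spikes["control"]
    drop_vs_wash = (spikes["wash"] - spikes["cr"]) / spikes["wash"]
    return max(drop_vs_control, drop_vs_wash) >= min_drop
```

      A neuron passes only if all three conditions hold, so a cell whose spike count falls by a few percent without a rheobase shift is counted as a non-responder.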

      For all individual responders, the rheobase was increased when Cr was applied (Figure 7E and 7F). In individual non-responders, by contrast, the rheobase following Cr application was either identical to both control and wash (n=19/35), identical to either control or wash (n=11/35), between control and wash (n=2/35) or smaller than both control and wash (n=3/35). Thus, the responders and non-responders were separable. When the rheobase data were pooled, many points overlapped, so we did not pool the data here.

      As suggested, we pooled the ratios of spike changes in response to 100 μM Cr application for all neurons (Author response image 4). Evoked spikes of non-responders typically (34/35) changed within the range of -10% to 10%.

      Author response image 4.

      Relative changes of total evoked spikes in response to 100 μM Cr. Responders are represented by red dots and non-responders by black dots. Dashed black line indicates 10%. Relative change = (Cr - (Control + Wash)/2) / ((Control + Wash)/2) × 100%.
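
      The relative-change formula in the legend can be expressed as a short function. This is a sketch for illustration only; the function and argument names are hypothetical, and the inputs are assumed to be total evoked spike counts for a single neuron.

```python
def relative_change(control, cr, wash):
    """Percent change in evoked spikes under Cr relative to the mean
    of the control and wash conditions."""
    baseline = (control + wash) / 2
    return (cr - baseline) / baseline * 100
```

      For example, a neuron with 40 spikes in control, 30 during Cr and 40 after wash gives a relative change of -25%.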

      In Figure 7E-J, we collected data at time points when the maximal response was reached. The Cr group shown in Figure 7F was indeed significantly different from the control group/wash (p<0.05, paired t test, for data points collected under 75-500 pA current injection).

      6) Indirect effects: The phenotypes could be partially caused by indirect effects of perturbing the Cr/PCr/CK system, which is known to play essential roles in ATP regeneration, Ca2+ homeostasis, neurotransmission, intracellular signaling systems, axonal and dendritic transport... Similarly, high GAMT levels were reported for astrocytes (e.g., Schmidt et al. 2004; doi: 10.1093/hmg/ddh112), and changes in astrocytic Cr may underlie the phenotypes. Cr has been also reported to be an osmolyte: a hyperosmotic shock of astrocytes induced an increase in Cr uptake, suggesting that Cr can work as a compensatory osmolyte (Alfieri et al. 2006; doi: 10.1113/jphysiol.2006.115006). Potential indirect effects are also consistent with a trend towards decreased KCl-induced GABA (and Glutamate) release in SLC6A8-/Y (Figure 5C). These indirect effects may in part explain the phenotypes seen after perturbing Agat, SLC6A8, and should be thoroughly discussed.

      We discussed the possibility of non-transmitter roles for creatine/phosphocreatine in the Discussion, and we added the possibility of astrocytic Cr there. The trend toward decreased KCl-induced GABA (and glutamate) release in SLC6A8-/Y (Figure 5C) was not statistically significant.

      7) As stated by the authors, there is some evidence that Cr may act as a co-transmitter for GABAA receptors (although only at high concentrations). Would a GABAA blocker decrease the fraction of cells with decreased excitability after Cr exposure?

      We performed another experiment in hippocampal CA1 pyramidal neurons showing that Cr at 100 μM did not change GABAergic neurotransmission (n=8, Author response image 5). Inhibitory postsynaptic currents (IPSCs) recorded in the presence of glutamate receptor blockers (10 μM APV and 10 μM CNQX) were not changed by 100 μM creatine, in either frequency or amplitude (Author response image 5B and 5C). These results do not support Cr activation of GABAA receptors.

      Author response image 5.

      IPSCs recorded in hippocampal CA1 pyramidal neurons. (A) Representative raw traces before (Control), during (Creatine) and after (Wash) the application of 100 μM creatine. (B, C) Group data of IPSC frequency (B) and amplitude (C), averaged over 1 min.

      8) The statement "Our results have also satisfied the criteria of Purves et al. 67,68, because the presence of postsynaptic receptors can be inferred by postsynaptic responses." (l.568) is not supported by the data and should be removed.

      We have deleted this sentence, though what could mediate postsynaptic responses other than receptors?

      Reviewer #3 (Public Review):

      SUMMARY:

      The manuscript by Bian et al. promotes the idea that creatine is a new neurotransmitter. The authors conduct an impressive combination of mass spectrometry (Fig. 1), genetics (Figs. 2, 3, 6), biochemistry (Figs. 2, 3, 8), immunostaining (Fig. 4), electrophysiology (Figs. 5, 6, 7), and EM (Fig. 8) in order to offer support for the hypothesis that creatine is a CNS neurotransmitter.

      We thank the reviewer for the summary.

      STRENGTHS:

      There are many strengths to this study.

      • The combinatorial approach is a strength. There is no shortage of data in this study.

      • The careful consideration of specific criteria that creatine would need to meet in order to be considered a neurotransmitter is a strength.

      • The comparison studies that the authors have done in parallel with classical neurotransmitters are helpful.

      • Demonstration that creatine has inhibitory effects is another strength.

      • The new genetic mutations for Slc6a8 and AGAT are strengths and potentially incredibly helpful for downstream work.

      WEAKNESSES:

      • Some data are indirect. Even though Slc6a8 and AGAT are helpful sentinels for the presence of creatine, they are not creatine themselves. Therefore, the conclusions that are drawn should be circumspect.

      SLC6A8 and AGAT mutants are not essential for Cr’s role as a neurotransmitter.

      • Regarding Slc6a8, it seems to work only as a reuptake transporter - not as a transporter into SVs. Therefore, we do not know what the transporter is.

      Indeed, SLC6A8 is a transporter on the plasma membrane only, not on synaptic vesicles. We have shown this biochemically here, and we have unpublished data identifying other SLCs on SVs, which did not include SLC6A8.

      • Puzzlingly, Slc6a8 and AGAT are in different cells, setting up the complicated model that creatine is created in one cell type and then processed as a neurotransmitter in another.

      • No candidate receptor for creatine has been identified postsynaptically.

      • Because no candidate receptor has been identified, is it possible that creatine is exerting its effects indirectly through other inhibitory receptors (e.g., GABAergic Rs)?

      As shown in our response to Question 7 of Reviewer 2, Cr did not exert its effects through inhibitory GABAA receptors.

      • More broadly, what are the other possibilities for roles of creatine that would explain these observations other than it being a neurotransmitter? Could it simply be a modifier that exists in the SVs (lots of molecules exist in SVs)?

      We discussed the possibility of a non-transmitter role for creatine/phosphocreatine in the Discussion.

      • The biochemical studies are helpful in terms of comparing relevant molecules (e.g., Figs. 8 and S1), but the images of the westerns are all so fuzzy that there are questions about processing and the accuracy of the quantification.

      Multiple members (>4) of our group have carried out SV purifications repeatedly over the last decade, and we are highly confident of the SV purifications presented in Figs. 8 and S1.

      There are several criteria that define a neurotransmitter. The authors nicely delineated many criteria in their discussion, but it is worth it for readers to do the same with their own understanding of the data.

      By this reviewer's understanding (and the Purves' textbook definition) a neurotransmitter: 1) must be present within the presynaptic neuron and stored in vesicles; 2) must be released by depolarization of the presynaptic terminal; 3) must require Ca2+ influx upon depolarization prior to release; 4) must bind specific receptors present on the postsynaptic cell; 5) exogenous transmitter can mimic presynaptic release; 6) there exists a mechanism of removal of the neurotransmitter from the synaptic cleft.

      The six criteria seem to be required only by the reviewer. As noted in our Discussion, Purves’ textbook lists not six criteria but three: “the substance must be present within the presynaptic neuron; the substance must be released in response to presynaptic depolarization, and the release must be Ca2+ dependent; specific receptors for the substance be present on the postsynaptic cell” (Purves et al., 2001, 2016).

      Kandel et al. (2013, 2021) listed 4 criteria for a neurotransmitter: “it is synthesized in the presynaptic neuron; it is present within vesicles and is released in amounts sufficient to exert a defined action on the postsynaptic neuron or effector organ; when administered exogenously in reasonable concentrations it mimics the action of the endogenous transmitter; a specific mechanism usually exists for removing the substance from the synaptic cleft”.

      While we agree that any neuroscientist can have his/her own criteria, it is more reasonable to accept the textbooks that have been widely read for decades.

      For a paper to claim that the work has identified a new neurotransmitter, several of these criteria would be met - and the paper would acknowledge in the discussion which ones have not been met. For this particular paper, this reviewer finds that condition 1 is clearly met.

      Conditions 2 and 3 seem to be met by electrophysiology, but there are caveats here. High KCl stimulation is a blunt instrument that will depolarize absolutely everything in the prep all at once and could result in any number of non-specific biological reactions as a result of K+ rushing into all neurons in the prep. Moreover, the results in 0 Ca2+ are puzzling. For creatine (and for the other neurotransmitters), why is there such a massive uptick in release, even when the extracellular saline is devoid of calcium?

      To avoid the disadvantage of high KCl stimulation, we performed optogenetic experiments recently, with encouraging preliminary data. We do not know the source of Ca2+-independent release of Cr and neurotransmitters, though astrocytes are a possibility.

      Condition 4 is not discussed in detail at all. In the discussion, the authors elide the criterion of receptors specified by Purves by inferring that the existence of postsynaptic responses implies the existence of receptors. True, but does it specifically imply the existence of creatinergic receptors? This reviewer does not think that is necessarily the case. The authors should be appropriately circumspect and consider other modes of inhibition that are induced by activation or potentiation of other receptors (e.g., GABAergic or glycinergic).

      Our results did not support Cr stimulation of inhibitory GABAA receptors (see our answer to Point 7 of Reviewer 2).

      Condition 5 may be met, because the authors applied exogenous creatine and observed inhibition (Fig. 7). However, this is tough to know without understanding the effects of endogenous release of creatine. if they were to test if the absence of creatine caused excess excitation (at putative creatinergic synapses), then that would be supportive of the same.

      After the submission of our manuscript, we found a recent paper showing that slc6a8 knockout led to increased excitation in pyramidal neurons in the prefrontal cortex (PFC), with increased firing frequency (Ghirardini et al., 2023). Because we have shown that slc6a8 knockout causes a decrease of Cr in SVs (Figure 2 in our paper), this result provides the evidence described under Condition 5 by this reviewer: a decrease of Cr in SVs led to excess excitation.

      For condition 6, the authors made a great effort with Slc6a8. This is a very tough criterion to understand for many synapses and neurotransmitters.

      In terms of fundamental neuroscience, the story would be impactful if proven correct. There are certainly more neurotransmitters out there than currently identified.

      The impact as framed by the authors in the abstract and introduction for intellectual disability is uncertain (forming a "new basis for ID pathogenesis") and it seems quite speculative beyond the data in this paper.

      We deleted this sentence.

      Reviewer #1 (Recommendations For The Authors):

      To strengthen the manuscript, I suggest the following considerations:

      1) The key missing evidence to my mind is a receptor - but this is clearly outside the scope of this paper. Yet, I am surprised that in the list of criteria for neurotransmitters in general there is no mention of a receptor. Furthermore, many receptors have been identified through receptor agonists or antagonists, like neurotoxins or drugs. The authors do not talk about putative receptors except for a sentence in the discussion where they speculate on a GPCR. There are numerous GPCR agonists and antagonists, which may be a long-shot, or something even a bit more designed based on knowledge about creatine? I do not think the publication of this manuscript should have been made dependent on finding an agonist or antagonist of this specific unknown receptor (if it exists), but it would be good to have at least some leads on this from the authors what has been tried or what could be done? How about a manipulation of G-protein-coupled signal transduction to support the idea that there IS such a GPCR? There may be a real opportunity here to test existing compounds in wild type, the slc6a8 and agat mutants.

      We will keep trying, but accept the reality that Rome was not built in a single day and that no transmitter was proven by one single paper.

      A key new puzzle piece of evidence is the identification of creatine in synaptic vesicles. The experiment relies heavily on the purity of the SV fraction using the anti-synaptophysin antibody. I am quite sure that these preparations contain many other compartments - and of course a big mix of synaptic (and other) vesicles. Would it be possible to purify with an anti slc6a8 antibody?

      Slc6a8 is expressed on the plasma membrane of neurons (refs. 7-9), not on synaptic vesicles. Consistent with this, we could not detect an obvious Slc6a8-HA signal in the starting material (Lane S in Author response image 6) that was used for SV purification. We have tried to purify SVs with an anti-HA antibody from Slc6a8-HA mice, but SV markers could not be detected.

      Author response image 6.

      Lack of Slc6a8-HA in our starting material. In Slc6a8-HA knock-in mice, the HA signal was present in whole brain homogenate (H), but not obvious in supernatants (S) following centrifugation at 35000 × g. In contrast, the SV marker Syp was present in supernatants.

      The K stimulation protocol in slices is relatively crude, as all neurons in the slice get simultaneously overactivated - and some of the effects on Ca-dependent release are not very strong (e.g. the 35 neurons that were not responsive to creatine at all). A primary neuronal culture of neurons that respond to creatine would strengthen this section.

      To avoid the disadvantage of K stimulation, we also performed optogenetic experiments recently and obtained encouraging preliminary results.

      Reviewer #2 (Recommendations For The Authors):

      1) The different sections of the manuscript are not separated by headers.

      2) The beginning of the results section either does not reference the underlying literature or refers to unpublished data.

      We have kept a bit of background at the beginning of the Results section.

      3) The text contains many opinions and historical information that are not required (e.g., "It has never been easy to discover a new neurotransmitter, especially one in the central nervous system (CNS). We have been searching for new neurotransmitters for 12 years."; l. 17).

      This is a field that has been dormant for decades and such background introductions are helpful for at least some readers.

      4) Almeida et al. (2008; doi: 10.1002/syn.20280) provided evidence for electrical activity-, and Ca2+-dependent Cr release from rat brain slices. This paper should be introduced in the introduction.

      Those were stand-alone papers that have not been reproduced or received attention. Our Introduction did not mention them because our research did not begin with those papers. We had no idea that those papers existed when we began. We started with SV purification and only read those papers afterwards. Thus, they were not necessary background to our paper but can be discussed after we discovered Cr in SVs.

      5) Fig. 7: A Y-scale for the stimulation protocol is missing.

      Revised.

      Reviewer #3 (Recommendations For The Authors):

      The main suggestion by this reviewer (beyond the details in the public review) is to consider the full spectrum of biology that is consistent with these results. By my reading, creatine could be a neurotransmitter, but other possibilities also exist, and the authors need to highlight those too.

      We have discussed a possible non-transmitter role in the Discussion.

      References

      Ghirardini, E. et al. Cell-specific vulnerability to metabolic failure: the crucial role of parvalbumin expressing neurons in creatine transporter deficiency. Acta Neuropathol Commun 11, 34 (2023). https://doi.org/10.1186/s40478-023-01533-w

      Lowe, M. T., Faull, R. L., Christie, D. L. & Waldvogel, H. J. Distribution of the creatine transporter throughout the human brain reveals a spectrum of creatine transporter immunoreactivity. J Comp Neurol 523, 699-725 (2015). https://doi.org/10.1002/cne.23667

      Mak, C. S. et al. Immunohistochemical localisation of the creatine transporter in the rat brain. Neuroscience 163, 571-585 (2009). https://doi.org/10.1016/j.neuroscience.2009.06.065

      Molchanova, S. M., Oja, S. S. & Saransaari, P. Mechanisms of enhanced taurine release under Ca2+ depletion. Neurochem Int 47, 343-349 (2005). https://doi.org/10.1016/j.neuint.2005.04.027

      Philibert, R. A., Rogers, K. L. & Dutton, G. R. K+-evoked taurine efflux from cerebellar astrocytes: on the roles of Ca2+ and Na+. Neurochem Res 14, 43-48 (1989). https://doi.org/10.1007/BF00969756

      Rosko, L. M. et al. Cerebral Creatine Deficiency Affects the Timing of Oligodendrocyte Myelination. J Neurosci 43, 1143-1153 (2023). https://doi.org/10.1523/JNEUROSCI.2120-21.2022

      Saransaari, P. & Oja, S. S. Characteristics of taurine release in slices from adult and developing mouse brain stem. Amino Acids 31, 35-43 (2006). https://doi.org/10.1007/s00726-006-0290-5

      Schmidt, A. et al. Severely altered guanidino compound levels, disturbed body weight homeostasis and impaired fertility in a mouse model of guanidinoacetate N-methyltransferase (GAMT) deficiency. Hum Mol Genet 13, 905-921 (2004). https://doi.org/10.1093/hmg/ddh112

      Speer, O. et al. Creatine transporters: a reappraisal. Mol Cell Biochem 256-257, 407-424 (2004). https://doi.org/10.1023/b:mcbi.0000009886.98508.e7

      Takuma, K. et al. Ca2+ depletion facilitates taurine release in cultured rat astrocytes. Jpn J Pharmacol 72, 75-78 (1996). https://doi.org/10.1254/jjp.72.75

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their remarks, which significantly improved the paper. Following these remarks we completed the analysis and validation of our cryo-EM data and performed several biochemical tests to support our conclusions, lending credibility to the paper. Please find our detailed answers below each recommendation of the reviewers.

      Major recommendations

      1) Errors and omissions in the presentation make the manuscript difficult to access.

      a) The text should be edited for grammatical errors more carefully

      • We corrected the grammatical errors.

      b) Figures should be labeled to allow the reader to follow the logic of the presentation and identify the features being discussed. Identification through the color coding (the identity of the histones, the location of zinc fingers, the active site, and so on) would be helpful.

      • We labeled the Rossmann fold and Zn-finger domains in Figure 1 and described the histone color codes. The active site of SIRT6 is depicted in Figure 4.

      2) The recent publications from the Farnung/Cole and Peterson/Tan/Armache labs need to be cited and the results from Smirnova et al. compared and contrasted with those publications explicitly.

      • We added the following paragraph to the discussion section: “While this manuscript was under review two studies describing the structure of SIRT6-NCP appeared in press (Wang et al., 2023; Chio et al., 2023). The conclusion of these papers regarding the position of SIRT6 on the nucleosome and the unwinding of DNA by the enzyme are similar to our findings. We however dissected in addition the movements of SIRT6 on the nucleosome and analyzed via molecular dynamics the conformations of the H3 tail with respect to the SIRT6 active site. Our results point to the importance of the flexibility between the globular domains of SIRT6 and also explain how SIRT6 can access lysines that are much closer to the histone core than H3K9.”

      a) Notably, the Peterson/Tan/Armache labs suggest that H3K27 cannot be deacetylated by SIRT6 whereas the Farnung/Cole labs show deacetylation of H3K27 by SIRT6. Do the results of the Smirnova et al. structure help to resolve this situation?

      • We performed deacetylation tests of H3K27Ac nucleosomes and show that SIRT6 deacetylates H3K27Ac, albeit at somewhat lower efficiency than H3K9Ac. Our molecular dynamics simulations explain how H3K27, which is close to the histone core, can still be reached by the SIRT6 active site. We added the following text to the paper: “To lend support to this claim we tested whether SIRT6 can deacetylate residue H3K27 that was first acetylated by SAGA (Supplemental Fig. 7c). We find that indeed SIRT6 could efficiently deacetylate H3K27Ac, although at a somewhat slower rate than H3K9Ac. We conclude that partial DNA unwrapping by SIRT6 allows H3-tail conformations that make lysines that are close to the core of H3 accessible to the enzyme.”

      b) The Farnung/Cole labs have visualized an intermediate state of deacetylation. How does this compare to the structure presented in this manuscript? Addressing these points would facilitate further research and discussion in the community.

      • We believe the resolution of the SIRT6 Rossmann fold precludes addressing these points.

      c) Can the authors exclude the possibility that the additional density observed in Supplemental Figure 6 is not coming from the H3 tail, as observed in the two other structures?

      • One density is the continuation of the H2A histone tail. We strongly believe that this density corresponds to this tail. The other density indeed can originate from the H3 tail. Therefore, we didn’t model anything inside it.

      d) It would be useful to comment on how much flexibility has been observed in the other structures for the SIRT6 interaction with the acidic patch, and also how other acidic-patch binding proteins compare with the results here.

      • We refrain from estimating the flexibility observed in the other structures as no such analysis is provided by these papers. Regarding the interaction with the acidic patch we mention that R175 packs against H2B L103 and serves as a classical “arginine anchor motif” and refer the reader to a review on the topic.

      e) Does the presence or absence of NAD+ affect the comparisons among the structures?

      • NAD+ binding might affect the fine structure of the active site, although NAD+ was not observed in crystal structures of SIRT6 obtained in its presence. The resolution of this part precludes further addressing this issue.

      3) The lack of biochemical validation of conclusions should be acknowledged and the reasoning behind this choice discussed.

      • We added experiments to validate our conclusions with biochemical tests. We produced nucleosomes with acetylated histone H3 by employing purified SAGA acetyltransferase complex. We isolated SIRT6 in which the four residues implicated in interactions with the acidic patch are mutated to alanines (SIRT6-4A). We show that this mutant has very weak interaction with the nucleosome and much lower H3K9Ac deacetylation activity than WT. Similarly, SIRT6-3A, with mutations in the residues we suggest are involved in binding to nucleosomal DNA, also shows weak activity and binding to the nucleosome. We added Supplement Figure 7, which depicts the results of these experiments, and embedded references to these results in the appropriate sections of the text. Furthermore, we also show that SIRT6 is active in deacetylating H3K27Ac. This supports our molecular dynamics simulations showing that when SIRT6 binds the nucleosome, the H3 tail can assume conformations where H3K27 is accessible to the enzyme’s active site. These results also appear in Supplement Figure 7.

      4) The authors nicely analyze and discuss the conformational flexibility of SIRT6 binding. This is an interesting finding, but Fig. 2 does not adequately convey this flexibility.

      • We have now considerably improved Figure 2. We added panels c and f, which clearly depict the movements we observe.

      5) The authors need to explain why two cryo-EM datasets were collected but were not merged, and the labeling of the datasets in the Supplemental Table appear to be switched.

      • The two datasets were collected with two very different pixel spacings; therefore, merging the two was possible only in Relion. This process, however, did not improve the resolution of the SIRT6 Rossmann fold domain. We thank the reviewer for noticing the discrepancy between the text and Supplemental Table 1; it has been corrected.

      6) Supplemental Figure 4 should be expanded to show additional representative densities with the respective fit of the model. This will allow the reader to better judge the quality of the data. At least the acidic patch interaction, the DNA-SIRT6 interactions, and the H2A should be shown in this context.

      • To illustrate the high-resolution features of the structure as well as the key regions we added Supplemental Figure 4.

      7) Standard elements of data analysis and validation should be included (angular distribution plots for cryo-EM reconstructions, a 3D FSC sphericity plot, a Q-score and EMRinger score for the cryo-EM data and atomic model, a model-to-map FSC curve). In general, model building is poorly described as it is unclear which maps (or to what degree different maps) were used for this process. This should be clarified in the methods section and in the Supplemental Table 1.

      • The model validation and data analysis details were added to Supplemental Figures 2 and 3 as well as in Supplemental Table 1.

      8) The provided maps also do not fully recapitulate the path of the H2A tail. The various density maps and PDB provided for this review do not support the final modeled residues of H2A between residues #118/119-123. This affects the validity of figure 3E and the discussion of the proximity of the potential substrates to the active site. The authors should clarify how they inferred that this is the H2A tail rather than the loosely bound SIRT6 N-terminal loop (whose stability could be altered by the presence or absence of NAD+) as suggested by overlaying the relevant crystal structures.

      • We added a panel to Supplemental Figure 4 (d) depicting the density where the H2A tail was modelled.

      9) The authors should explain how the data produced an asymmetrically oriented complex with a single SIRT6 molecule bound to one face. Were complexes with two SIRT6 molecules excluded? Is supplementary figure 4A the basis for the orientation and is this sufficient for this purpose?

      • Complexes with two SIRT6 molecules were present but only at around 1.5 percent of the whole dataset. These images were excluded from the refinement (shown in Supplementary Figure 2). The DNA orientation is depicted in Supplementary Figure 5A. The resolution obtained at the dyad (~2.5Å) allowed us to distinguish purine and pyrimidine bases. The Widom 601 sequence is asymmetric and the densities clearly show that there is only one orientation of the DNA observed with respect to SIRT6.

      10) The authors should clarify how supplemental figure 4B supports the conclusion that DNA is unwrapped. The density is not readily visible and docking of a simple DNA model in the ZN-focused map does not clearly rule out the possibility that this density comes from the H3 N-terminal tail.

      • We added to this figure the cryo-EM densities used to model the DNA path and the orientation of SIRT6. This image is now Supplemental Figure 5c.

      Minor recommendations

      1) The scale bar is missing for the 2D classes shown in Supplemental Figure 2.

      • We added the scale bar to the image depicting the 2D classes in Supplemental Figure 2.

      2) Masked classifications should be shown in the classification tree (Supplemental Figure 2 +3) with the masks shown as a transparent volume.

      • We now show the mask used for the 3D classifications of SIRT6’s Rossmann-fold domain in Supplemental Figure 2.

      3) Supplemental Figure 3 should show the indicated 3D classifications in the classification tree.

      • We added the 3D classifications in Supplemental Figure 3.

      4) The authors should consider applying local CTF refinement and particle polishing to improve their resolution.

      • We did local and global CTF refinements. Polishing didn’t improve the resolution as movie frame alignment was done outside of Relion.

      5) The descriptions of the Widom 601 sequence orientation should be less ambiguous, perhaps mentioning the AT-rich and AT-poor arms instead of left and right arms.

      • We corrected the text as required.

      6) The statement "Such a large change in DNA trajectory is reminiscent of the chromatin-remodeler ATPases or pioneer transcription factors binding to nucleosome but was not observed in other histone modifiers" requires a citation.

      • We added appropriate references.

      7) The authors should provide a supplemental figure of the nucleosome-SIRT6 and PRC1-nucleosome structure comparison to complement the discussion section.

      • We refer the reader to the paper describing the PRC1-nucleosome structure.
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      1) Here are a few sentences that could potentially benefit from further discussion, particularly in the context of the plant developmental framework of an effective germline. It is important to note that the idea of an effective germline is supported by many, but not all, scientists. Nevertheless, as long as this concept remains relevant, a discussion based on it may be appropriate.

      The early establishment of germlines during development is crucial in addressing the impact of somatic mutation on the next generation. To emphasize this aspect, we have included an additional sentence addressing this point in ll. 242–244.

      2) Lines 161-163: The suggestion that long-lived tropical trees do not necessarily suppress somatic mutation rates to the same extent as their temperate counterparts might warrant additional examination.

      We have revised our statement to present a more balanced perspective, and we have also included a sentence to emphasize the importance of conducting further studies in future.

      3) Lines 200-202: The observation of potential influences of GC-biased gene conversion during meiosis or biased purifying selection for C>T inter-individual nucleotide substitutions could be further elaborated upon.

      Our data does not provide enough information to delve into a more detailed discussion regarding GC-biased gene conversion during meiosis or biased purifying selection for C>T substitution. However, future studies that obtain genome sequences from somatic cells, male or female gametophytes, and offspring (such as seeds or seedlings) would offer opportunities to assess these phenomena.

      4) Line 245: The statement "somatic mutations can be transmitted to seeds" might be correct, but it would be helpful to explore the extent to which this occurs.

      In response to the comments from Reviewer 1 (#4) and Reviewer 2 (#16), we have decided to remove the discussion about the heritability of somatic mutations in the next generation. We have completely rewritten the final paragraph to discuss the possibility of a disparity in the relationship between lifespan and somatic mutation rates between plants and animals.

      Reviewer #2

      5) l. 108- 115: The authors seem to have made a really great work at assembling and annotating two reference genomes. Even if this does not represent the main result of the manuscript, these genomic resources are a plus for the community, especially given that reference genomes from tropical trees are known to be underrepresented in the literature (e.g. Plomion et al. 2016). The authors have made the particular effort of generating two high-quality reference genome assemblies for two species of the same genus, including one with an excellent contiguity. Even if they do not explicitly indicate the divergence time between the two species, it is clear that the cheapest solution would have been to map the reads of the two species against a single assembly, but this could have generated some biases. So by generating two de novo assemblies, the authors have used here the best design possible to control for some potential biases for the detection of somatic mutations. However, given the interests these two assemblies represent by themselves, I consider that a couple of additional investigations could have been made on local synteny and orthologous genes in particular. Thanks to whole-genome alignments and orthology (e.g. Lovell et al. 2022), they could have generated more general information regarding the two assembles and investigated additional questions regarding mutations, e.g. mutations in collinear / non-collinear (if any) segments, intensity of purifying selection (or neutral evolution) at single vs. multiple copies or between shared vs. private genes, etc.

      To address the comment by Reviewer 2, we performed synteny analysis using MCScanX in TBtools-II and added Supplementary Figure 3 to illustrate the conserved synteny relationship between S. laevis and S. leprosula. Detecting selection in the genome will be the subject of a future study, as our current data are not sufficient for this aim because of the limited number of individuals (n = 2 for each species).

      6) l. 123-124. Here, the authors indicate that they have "validated" 93.9% of the mutations. It would be more accurate to indicate that they have "validated" 31/33 mutations (94%), 22/24 mutations on S1 and 9/9 on S2 (Table S5). Can the authors indicate why no somatic mutations from the F1 and F2 were tested? According to me, the use of the word "validation" is not totally accurate (see also Schmitt et al. 2022), since amplicon sequencing can be viewed as a kind of validation but it doesn't represent a complete validation since it represents new sequencing data that are mapped against the same reference assembly, in such a way that we could always imagine that the same biases are at play, leading to a similarly false positive call. Reciprocally, a "non-validated" mutation could be associated to a mutation that is at a too low allele frequency, at least after amplification, in such a way that the call is not heterozygous despite the fact that the mutation is real. I think that another terminology than "validated" could be used, plus one or two sentences explaining this degree of complexity.

      To improve the clarity of the statement, we have modified the sentence as follows: “We conducted an independent evaluation of a subset of the inferred single nucleotide variants (SNVs) using amplicon sequencing. Our analysis demonstrated accurate annotation for 31 out of 33 mutations (94% overall), with 22 out of 24 mutations on S1 and all 9 mutations on S2 (Supplementary Table 5).”

      While we did not conduct additional assessments using F1 and F2, we anticipate a similar high level of agreement between the somatic SNV calls and amplicon sequencing in these trees. We have included sentences in the Materials and Methods section to elucidate the challenges involved in validating true somatic mutations.

      7) l. 135-137 the reasoning appears to be quite circular to me. As indicated by the authors in the line just before, an incongruent pattern could also be explained biologically, in such a way that the overall congruency between the phylogenetic tree and the tree architecture cannot be considered as a way to prove the reliability of the detection. In some species, it seems clear that the phylogenetic tree do not seem to follow the plant architecture (Zahradnikova et al. 2020) in such a way that we should argue to not consider the plant architecture in the design and not consider this represents either a way to validate mutations or a way to validate the methodological framework. I suggest removing this sentence.

      We have removed the sentence as suggested by Reviewer 2.

      8) l. 150. It seems that the differences in length and diameter between the two species come from two different studies and therefore that no statistical test has been performed to test its significance.

      We agree with Reviewer 2. To clarify this point, we have replaced “significantly” with “substantially” in the revised text.

      9) l. 156-159: the same sentence is repeated twice.

      We have removed the repeated sentence.

      10) l. 159-161: Comparing somatic mutation rates between studies is difficult. It is too sensitive to the methodology used, here again see Schmitt et al. 2022. I propose to remove these two sentences. It represents an interesting working hypothesis but would require a better design, or at least, to reanalyze all the data with the same pipeline.

      We have toned down our statement, and added a sentence that additional studies are required to compare somatic mutation rates among trees in tropical, temperate, and boreal regions, employing standardized methodologies.

      11) l. 171-175: Here I am wondering if the authors could provide more information regarding the enrichment at CpG sites? I suggest first estimating the proportion of CpG sites thanks to the two genome assemblies and then using this information as a way to weight the results and therefore to estimate the level of enrichment of mutations at CpG sites.

      In response to the comment by Reviewer 2, we first determined the proportion of CpG sites as 0.030 and 0.028 for S. laevis and S. leprosula, respectively, based on the triplet matrix using the reference genome of each species. Subsequently, we estimated the proportion of somatic mutations at CpG sites. The results revealed a 4.54-fold and 3.53-fold increase in somatic mutations at CpG sites for S1 and S2, and a 3.38-fold and 2.56-fold increase for F1 and F2, respectively. We have incorporated this finding into ll. 172–175.
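      The fold-enrichment calculation described above can be sketched as follows. This is a minimal illustration, assuming that enrichment is defined as the fraction of somatic mutations falling at CpG sites divided by the genomic proportion of CpG sites; the function name and the mutation counts are hypothetical and are not taken from the study.

```python
def cpg_fold_enrichment(n_cpg_mutations, n_total_mutations, genome_cpg_fraction):
    """Observed fraction of mutations at CpG sites divided by the genomic CpG fraction.

    Assumed definition for illustration only; the authors' exact procedure
    (based on a triplet matrix) may differ.
    """
    if n_total_mutations <= 0:
        raise ValueError("n_total_mutations must be positive")
    if not 0 < genome_cpg_fraction <= 1:
        raise ValueError("genome_cpg_fraction must be in (0, 1]")
    observed_fraction = n_cpg_mutations / n_total_mutations
    return observed_fraction / genome_cpg_fraction


# Hypothetical counts: 15 of 100 somatic mutations at CpG sites,
# with a genomic CpG proportion of 0.030 (the value reported for S. laevis).
fold = cpg_fold_enrichment(15, 100, 0.030)
print(f"fold enrichment: {fold:.2f}")
```

      A value above 1 indicates that mutations occur at CpG sites more often than expected from their genomic proportion alone.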

      12) l. 176-187. Interesting comparison and insights. You could also indicate that SBS5 is detected in all human cancers too. So the detection of SBS1 and SBS5 signatures indeed suggests some shared mutation biases. Note that in humans, a specific signature of UV is associated with TCG -> TTG mutations (Martincorena & Campbell, 2015). It seems that there is a substantial difference in the mutation spectra between the two trees for this specific category; I am not sure whether this difference could be associated with UV.

      We slightly modified the sentence to indicate that SBS5 is also detected in all human cancers. We are very interested in the potential impact of UV on somatic mutations in tropical trees, considering the high levels of UVR in the tropics. Conducting a comparative analysis of the mutational spectrum among trees inhabiting diverse UVR environments would provide valuable insights to substantiate this hypothesis.

      13) l. 206: I rather suggest "the somatic mutation rate per year is roughly the same, suggesting that somatic mutations rates are independent of growth rate".

      In response to the suggestion from Reviewer 2, we have revised the sentence as follows: "The somatic mutation rate per year remains largely consistent, indicating that somatic mutation rates are independent of the growth rate."

      14) l. 207-232: Here, the section looks like a mixture of results and discussion. I guess the authors consider that it remains a verbal model at this stage and that it therefore belongs more to the discussion. If so, I agree, but it would be good to discuss this part further, in particular how this model could be improved and empirically tested.

      The argument based on the model will be more accurate when the cell cycle duration can be directly estimated for each tree. We have added this explanation in the revised text.

      15) l. 238-239: The parallel drawn with the molecular clock is interesting but according to me, it remains a working hypothesis at this stage, since it is not validated outside the two focal species. I encourage the readers to continue to work on this question and to investigate also some annual plants for instance in the future (assuming that they have a higher α) in order to be able to derive a global model. In addition, even if I consider that the authors use and interpret this parallel wisely, I consider that the use of this terminology could be misleading for some readers. That's why I also suggest removing "molecular clock" from the title and using a more explicit one, e.g. "Somatic mutation rates scale with time not growth rate in dipterocarp trees".

      We agree with Reviewer 2. We have changed the title to “Somatic mutation rates scale with time not growth rate in long-lived tropical trees.”

      16) l. 245-249: The results rather suggest that (i) there is little diversity due to somatic mutations and that (ii) most heritable non-synonymous mutations are deleterious and therefore purged from the population. So rather than this last section of this discussion that has little interest and could be quite debatable, I consider that the authors could extend their discussion, e.g. the differences with somatic mutations in mammals (recently, Cagan and coauthors (2022) demonstrated that somatic mutation rates are inversely correlated with lifespan in mammals) or the overall low rate of molecular evolution in trees could be some directions. But there are many others.

      We have completely rewritten the final paragraph to propose the possibility of a disparity in the relationship between lifespan and somatic mutation rates between plants and animals, rather than discussing the heritability of somatic mutations in the next generation.

      17) l. 570-571: I guess, the reader should understand here "fixed at the heterozygous state"

      To avoid confusion, we have modified the text as follows: “If the alternative allele was present or absent in all eight branches in the amplicon sequence, the site was determined as fixed within an individual tree.” We have also removed “heterozygote” in Supplementary Figure 5.

      18) Fig. 4d. the y-axis would be easier to interpret by writing "Delta Inter-individual vs. Somatic SNPs" and/or by adding arrows on the right margin of the plot to indicate the directions with some short sentences such as "more somatic mutations observed than expected assuming the inter-individual comparison", "less somatic mutation than expected". According to me, some statistical tests are lacking here. Are the differences in the mutation spectra significant given the relatively limited amount of somatic mutations detected?

      We have added short sentences explaining the directions.

      19) Supplementary Tables (excel file): please correct the typos. There are many on these supplementary tables.

      We carefully checked supplementary tables and corrected the typos.

      Reviewer #3

      20) To estimate false negative rates, the authors might consider using mutation insertion tools such as Bamsurgeon (https://github.com/adamewing/bamsurgeon) to create simulated mutations. Alternatively, one could assess the calling rate of high-confidence SNPs that differ between individuals of the same species to get at the FNR.

      We agree with Reviewer 3. To calibrate our pipeline, we previously performed simulations to estimate the false negative and false positive rates in a different tree species (Betula platyphylla) using wgsim v0.1.11 (https://github.com/lh3/wgsim). Based on our simulations, we found that the false negative and false positive rates were very low, averaging 0.050 and 0.046, respectively. It is important to note that the estimated false positive rate obtained from the simulation data was substantially lower than the proportion of potential false positive SNVs (as shown in Supplementary Fig. 5). This observation suggests that simulation-based evaluation of the false positive rate is not reliable, at least for the tree species we studied. The same argument could be applied to the false negative rate. Therefore, we conclude that the simulation-based analysis for estimating false positive and false negative rates is not informative for our study.

      The rate of true-positive or false-negative mutation calls can be estimated only when the true mutational status is known, but the data are not currently available. However, under the assumption that the final set of SNVs represents true somatic mutations, we were able to calculate the potential false negative rate. Our findings indicate that this rate is low, specifically less than 10%, when using less stringent filtering thresholds such as BQ20 and MQ20. While these estimated values may not precisely represent the true false negative rate, we included them as potential false negative rates in Supplementary Figure 7 of the revised manuscript. This information provides additional insights into the performance of our pipeline under different filtering thresholds and contributes to the overall assessment of our study.

      21) It may be interesting to examine the mutation trees for constancy (or not) in mutation rate per meter. Examining Figure 1, it appears that the number of mutations near the crown "4" node is consistently higher than in nearby nodes (3-1 and 3-2).

      We calculated the branch-level increment of SNVs per meter by dividing the number of single nucleotide variations (SNVs) by the physical distance. Our analysis revealed a slight increase in the number of SNVs per meter as the branch position became higher in S. laevis, as shown in Author response table 1. However, this trend was not clearly observed in S. leprosula. We found this observation in S. laevis intriguing, particularly because our recent analysis (Tomimoto et al., in preparation) demonstrated that genetic distance increases in branch pairs located in the upper part of a tree. This was elucidated through a mathematical model that describes the dynamics of the stem cell population during elongation and branching. We opted not to delve further into the findings in the current manuscript, as this topic will be extensively investigated in a future study.

      Author response table 1.

      The branch-level increment of SNVs per meter.
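      The per-branch calculation described above (number of SNVs divided by physical distance) can be sketched as follows. This is a minimal illustration; the branch labels, SNV counts, and lengths are hypothetical and are not the values in Author response table 1.

```python
def snvs_per_meter(snv_count, branch_length_m):
    """Branch-level increment of SNVs per meter: SNV count / physical length."""
    if branch_length_m <= 0:
        raise ValueError("branch_length_m must be positive")
    return snv_count / branch_length_m


# Hypothetical branches: name -> (number of SNVs, physical length in meters).
branches = {
    "branch_a": (12, 4.0),
    "branch_b": (9, 2.0),
}

rates = {name: snvs_per_meter(n, d) for name, (n, d) in branches.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} SNVs per meter")
```

      Comparing these rates across branches ordered by height is one simple way to look for the kind of trend reported for S. laevis.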

      22) Line 150: Use of "significantly different" is confusing as the phrase is usually reserved for statistical significance. Consider replacing with "substantially different."

      We have replaced “significantly” with “substantially” in the revised text.

      23) In the Discussion, a clearer explanation of the assumptions that underlie the authors' reasoning would be welcome: e.g., constancy in mutation rate per meter within an individual tree. In particular, the authors assume that mutations that are seen in one leaf and not in another cannot have predated the most recent common meristematic node linking the two leaves. Is this a reasonable assumption? Since the meristem is multicellular, is it possible for a mutation to have arisen earlier in development and "assorted" into one cell lineage but not another?

      We greatly appreciate this important comment. It is true that when the meristem is multicellular and the stem cell lines are retained during mutation accumulation (e.g. a structured meristem as analyzed in Tomimoto and Satake 2023), it is possible for a mutation to have arisen before the bifurcation. Using a mathematical model, we have shown that the intercept and slope of the linear regression between the pairwise genetic distance and physical distance are influenced by the type of meristem (the strength of somatic genetic drift in a meristem) as well as by the branching architecture of the tree. We have included an explanation of this point in the revised manuscript (ll. 244–249).

      24) Supplementary Data 7: Column J should be "2_2"

      We corrected the typo.

    1. Author Response

      We would like to express our gratitude to the Editors and Reviewers for their thoughtful and helpful comments. We sincerely appreciate the opportunity to submit our revised manuscript titled “Predicting Ventricular Tachycardia Circuits in Patients with Arrhythmogenic Right Ventricular Cardiomyopathy using Genotype-specific Heart Digital Twins” to eLife. We are delighted that our research in ARVC has garnered the interest of the three reviewers. Below, we provide our point-by-point responses to the reviewers’ comments. We have also incorporated the suggestions provided by the reviewers in our revised manuscript.

      Comments from Reviewer 1

      We thank Reviewer 1 for their positive assessment and thoughtful suggestions. Here are the responses to the comments of reviewer 1:

      Comment 1: One addition that could add more insight is to predict the effect of structural remodeling alone well, considering only normal electrophysiological models.

      We thank the reviewer for this thoughtful suggestion regarding our experimental design. We would like to highlight that this suggestion was indeed taken into consideration in our study, as all the patients’ hearts were modeled using the gene-elusive cell model before the structural-EP mismatch was implemented. The gene-elusive cell model is a baseline ten Tusscher (TT2) human ventricular model described in the “Cell-level modeling” section of our Methods. Therefore, we have already examined the impact of structural remodeling alone in the study.

      Comment 2: Another interesting approach would be a sensitivity analysis, to determine how sensitive the VT circuits are to the specific geometry of the patient and remodeling that occurs during the disease, such an approach could also be used to determine how sensitive the outputs are to electrophysiological model inputs.

      We think this suggestion is of great value and could benefit our future ARVC studies. The reviewer pointed out the importance of investigating how sensitive the VT circuits are to the specific geometry/remodeling of the patient during disease progression. To achieve this, for each patient, a sequence of LGE-CMR images at different stages of this disease is required for model reconstruction; unfortunately, our cohort for this study does not incorporate such data.

      Comments from Reviewer 2

      We thank Reviewer 2 for the positive assessment, and here are the responses to the comments:

      Comment 1: I appreciate that the types of computational models detailed in this paper take enormous time to develop. However, to identify bottlenecks in the clinical workflow (and thus targets for future research), it may be nice for the authors to discuss the time taken to generate and run the models for each patient?

      We sincerely appreciate the valuable feedback from the reviewer. We recognize the importance of considering model generation and run time. In the introduction, we have highlighted the clinical challenge in managing ARVC ablation procedures, which is the inability to capture all the VT due to an incomplete understanding of VT mechanisms. We acknowledge the reviewer’s concern regarding the potential time taken by the model to predict VT circuits and whether this could hinder the integration into the current ablation procedure. However, it is important to clarify that our model is primarily based on clinical images obtained in advance of the procedure. As a result, there is sufficient time available to generate the results required for ablation planning.

      Comment 2: In the Materials and Methods section, some references are underlined? Is this a typo or meant to convey some particular information?

      We thank the reviewer for pointing this typo out and we have removed the underlining of references in our revised manuscript.

      Comment 3: The authors state that the cellular models are available from the CellML model repository. This is an excellent practice. However, the URL that is given points to the entire CellML website. It will be more useful for URLs that point to the specific models used in the study so that readers can be sure they are looking at the correct model.

      We appreciate the reviewer for this suggestion, and we have edited the URL in Data Availability to link to a specific cell model on the CellML website.

      Comment 4: In the abstract, the authors report the sensitivity, specificity, and accuracy of their computer models but fail to comment in the abstract that they are comparing against recordings from the patient during a previous EPS study. To assist further readers who are scanning the abstract, the authors may wish to add a sentence or two to detail what they are comparing their model results to.

      We thank the reviewer for the suggestion. This is a retrospective study. We recognize the importance of wording clarity in the abstract; in response, we have added a sentence in the abstract to clarify that we compared VT locations of Geno-DT with the ones recorded during clinical EPS to obtain sensitivity, specificity, and accuracy.

      Comment 5: In Table 1 some of the data is discrete e.g., the number of patients on a beta-blocker. The authors give a p-value for comparing the GE and PKP2 data and state in the caption that a Student's t-test has been used. Strictly speaking, a t-test is not really appropriate for the population proportion with non-parametric data. That said, the size (n) of the data here makes the p-values from any statistic very unreliable. Perhaps the authors might like to reconsider if p-values add anything to such data? If so, then the statistical test should be reconsidered.

      We truly appreciate the reviewer pointing out this error in the caption of Table 1. For the non-parametric discrete data, we used a z-test, a common statistical method for comparing percentages, to obtain the p-values, but we mistakenly mentioned only the t-test in our caption. We acknowledge the limitation of our sample size, and we have corrected this error in our revision.
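      A z-test for comparing two percentages can be sketched as follows. This is a minimal illustration, assuming a pooled two-proportion z-test with a two-sided p-value; the response does not specify which variant was used, and the function name and counts are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    x1, x2 are the counts with the attribute (e.g. patients on a beta-blocker);
    n1, n2 are the group sizes. Assumed variant for illustration only.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for two small groups.
z, p = two_proportion_z_test(8, 10, 3, 10)
print(f"z = {z:.3f}, p = {p:.3f}")
```

      The normal approximation behind this test is rough for small groups, which is consistent with the reviewer's caution about the reliability of p-values at this sample size.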

      Comment 6: I found Table 1 and its caption a little confusing. The authors put the range in [] brackets and then abbreviated standard deviation with () brackets. On initial reading, I incorrectly assumed that the numbers in the table in () brackets were standard deviations when, in fact, they are percentages. Perhaps the authors could consider changing the caption so that the percentage is in, say, {} brackets and make the caption say that values are given as n {%} etc.

      We appreciate the reviewer for pointing this out and we recognize that certain expression in the Table 1 caption is confusing. In our revised manuscript, we used n {%} to replace n (%) and deleted the abbreviated standard deviation which has not been used.

      Comment 7: In the caption for Figure 2 the authors present action potentials "at steady state". Adding the pacing frequency (or cycle length) for the steady state would be useful.

      We thank the reviewer for pointing this out. We agree that showing pacing frequency is important and we have made the edit in our revision.

      Comment 8: In Table 2 the VT locations are compared between the EPS and the Geno-DT model. The comparison metrics listed in the table should be better described in the table caption. It is unclear if the authors compare VT locations in the AHA segments or if the specific geometric location is used. If it is a geometric location, then I would have expected to see information on the mean error distance or similar information? If it is a comparison of AHA segments, there could be a problem if a VT location was very close to the border between segments. The predicted VT location might be very close to the measured VT location but may end up in a different segment? The authors may like to clarify the methodology and/or discuss these issues.

      We thank the reviewer for this comment. We recognize the need for clarification on the comparison metrics of Table 2. In the text related to Table 2, we used the wording “anatomical location” to avoid excessive repetition of mentioning AHA segments. However, we agree that reverting it back to the “AHA segment” will reduce confusion. Regarding the point of comparing exact locations the reviewer mentioned, in clinical settings, clinicians primarily rely on AHA segments to describe the VT locations during ablation and descriptions in the EP report, rather than using exact coordinates. As such, a match between our predicted AHA segments and clinical AHA segments is a direct comparison. This alignment provides a meaningful comparison and is sufficient for assisting ablation procedures.
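      A segment-level comparison of this kind can be sketched as follows. This is a minimal illustration, assuming that each segment is scored as a true or false positive or negative according to whether a VT was predicted and clinically observed there; the segment labels, the number of segments, and the function name are hypothetical and do not reproduce the study's data or exact scoring rules.

```python
def segment_metrics(predicted, observed, all_segments):
    """Sensitivity, specificity, and accuracy from sets of segment labels."""
    predicted, observed, all_segments = set(predicted), set(observed), set(all_segments)
    if not (predicted <= all_segments and observed <= all_segments):
        raise ValueError("predicted and observed must be subsets of all_segments")
    tp = len(predicted & observed)                  # predicted and observed
    fp = len(predicted - observed)                  # predicted only
    fn = len(observed - predicted)                  # observed only
    tn = len(all_segments - predicted - observed)   # neither
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    accuracy = (tp + tn) / len(all_segments) if all_segments else nan
    return sensitivity, specificity, accuracy


# Hypothetical example with 17 numbered segments.
segments = range(1, 18)
sens, spec, acc = segment_metrics({1, 2, 7}, {2, 7, 8}, segments)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, accuracy={acc:.2f}")
```

      Because agreement is scored per segment, a predicted location just across a segment border from the measured one counts as a mismatch, which is the boundary issue the reviewer raises.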

      Comment 9: In Figure 7, activation maps are shown, and the row is labelled as Induced VTs/Geno-DT. Are the colour maps from the model or the EPS measurements? The last sentence of the caption indicates they are from the measurements, but such detailed full-wall maps seem to be from a model. The authors may like to clarify what the figure shows.

      We thank the reviewer for this comment. We understand the reviewer’s concern regarding the clarity of Figure 7’s caption. While we believe that the first bold sentence in the caption adequately clarifies that the results in Figure 7 are derived from the Geno-DT model, we agree with the reviewer that the wording could be made clearer. In response, we have made the necessary edits to the caption in our revised manuscript.

      Comments from Reviewer 3

      We thank Reviewer 3 for giving the positive assessment. Here are the responses to the comments.

      Comment 1: The small sample size is a limitation but has already been acknowledged and documented by the authors.

      We thank the reviewer for this comment, and we have acknowledged the small sample size as a limitation in our manuscript.

      Comment 2: Another limitation is the consideration of only two of the possible genotypes in developing the cell membrane kinetics, but again has been acknowledged by the authors.

      We thank the reviewer for this comment, and we have acknowledged the consideration of only two genotypes as a limitation in our manuscript. We hope to enlarge the genotype groups in our future ARVC studies.

    1. Author Response

      We thank the reviewers for their helpful comments and thorough assessment of our manuscript, which will allow us to improve the work in a subsequent revision. Many suggestions, such as mutating residues to help validate the proposed site, will be included in a future revision. Below we clarify three aspects that led to confusion in the initial review.

      The comment of reviewer 2 that “... the main interaction site of PIPs with Nav1.4 is the VSD-DIV and DIII-DIV linker, an interaction that is expected to delay fast inactivation if it happens at the resting state." is true. However, as explained in our manuscript (Fig. 7), we don’t expect binding at this position to happen in the resting state as the C-terminal domain is bound to this region, impeding PIP binding.

      Reviewer 2 also suggests that we produce a resting state model of Nav1.4 to replace/supplement the results we obtained using our resting Nav1.7 model. We chose to model Nav1.7 due to the availability of structures with different VSDs in the deactivated conformation, something that is not true for Nav1.4. While we plan to explore a Nav1.4 resting state based on the reviewer's suggestion, we note that this introduces an extra layer of uncertainty. However, due to sequence conservation of the gating charges and proposed binding site residues between Nav subtypes, we propose very similar modes of PIP binding among the Nav subtypes across the different conformations.

      Finally, we strongly disagree with the reviewer’s assessment that ‘There are a lot of incorrect statements in many areas’; this may have come from a misreading of the mentioned sentence. The sentence in question reads "These diseases are associated with accelerated rates of channel recovery from inactivation, consistent with our observations that an interaction between PI(4,5)P2 and the residue corresponding to R1469 in other Nav subtypes could be important for prolonging the fast-inactivated state." To which reviewer 2 states ‘Prolonging the fast inactivated state would actually reduce recovery from inactivation and not accelerate it.’ The statement quoted is not incorrect – from the original experiments we know that the presence of PIP prolongs the time spent in the fast inactivated state. Mutations at the PIP binding site are likely to reduce PIP binding, and with less PIP present the channel will recover from inactivation more quickly. We appreciate that this sentence could be reworded for clarity and will address this in our revision to prevent such misreading.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for your recent editorial decision on our manuscript. I have included a revised version of our manuscript in which we have addressed all of the required editorial and referees’ comments as requested. In summary, we have added substantial amounts of new data and analysis (new Fig. 5D; Supplementary Figures S1E, S3C, S3E, S3I, S4C), amended several figures (Figures 2 and 3), added a new supplementary Table (Table S2) and we have changed the text and figure labelling/presentation in appropriate places to clarify or correct the issues raised by the reviewers.

      In summary, we firmly believe that we have addressed all the outstanding issues in a positive manner and that the manuscript is now suitable for publication in eLife. I look forward to receiving your final editorial decision on this manuscript.

      eLife assessment:

      ZMYM2 is a transcriptional corepressor but little was known about how it is recruited to chromatin. This study reveals that ZMYM2 homes to distinct classes of retrotransposons bound by the TRIM28 and ChAHP complexes in human cells, an important finding in the field of transcriptional regulation. The evidence supporting the claims of the authors is solid, although inclusion of more functional data would have strengthened the original model proposed.

      We have taken all the comments on board and provided additional new experimental data where requested and more data analysis to substantiate our claims.

      Reviewer #1 (Public Review):

      Owen D et al. investigated the protein partners and molecular functions of ZMYM2, a transcriptional repressor with key roles in cell identity and mutated in several human diseases, in human U2OS cells using mass spectrometry, siRNA knockdown, ChIP-seq and RNA-seq. They tried to identify chromatin bound complexes containing ZMYM2 and identified known and novel protein partners, including ADNP and the newly described partner TRIM28. Focusing mainly on these two proteins, they show that ZMYM2 physically interacts with ADNP or TRIM28, and co-occupies an overlapping set of genomic regions with ADNP and TRIM28. By generating a large set of knockdown and RNA-seq experiments, they show that ZMYM2 co-regulates a large number of genes with ADNP and TRIM28 in U2OS cells. Interestingly, ZMYM2-TRIM28 do not appear to repress genes directly at promoters, but the authors find that ZMYM2/TRIM28 repress LTR elements and suggest that this leads to gene deregulation at distance by affecting the chromatin environment within TADs.

      A strength of the study is that, compared to previous studies of ZMYM2 protein partners, it investigates binding partners of ZMYM2 using the RIME method on chromatin. The RIME method makes it possible to identify low-affinity protein-protein interactions and proteins interactions occurring at chromatin, therefore revealing partners most relevant for gene regulation at chromatin. This allowed the identification of novel ZMYM2 partners not identified before, such as TRIM28. The authors present solid interaction data with appropriate controls and generated an impressive amount of datasets (ChIP-seq for TRIM28 and ADNP, RNA-seq in ZMYM2, ADNP and TRIM28 knockdown cells) that are important to understand the molecular functions of ZMYM2. These datasets were generated with replicates and will be very useful for the scientific community. This study provides important novel insights into the molecular roles of ZMYM2 in human U2OS cells.

      The authors could have been more precise in the manuscript title and abstract to emphasize that these findings apply to human cells, as indeed there is no demonstration yet that the findings presented here can be transposed to mouse cells.

      We have slightly changed the title and abstract to emphasise that the findings are in human cells.

      The manuscript's main conceptual advance is that the authors propose a novel model of gene regulation whereby transcriptional repressors of transposable elements could regulate genes at distance by modulating the local chromatin environment within TADs. Additional experiments would be needed to strengthen this model. For example the authors could have performed TRIM28 ChIP in ZMYM2-kd cells to test if ZMYM2 favors the recruitment of TRIM28 to its genomic targets, as well as ChIP-seq of repressive chromatin marks (such as H3K9me3) in ZMYM2-kd cells to investigate if the loss of ZMYM2 leads to reduced H3K9me3 in ERVs and over large regions surrounding the ERVs.

      We have tested whether ZMYM2 is required for TRIM28 binding at several loci and find no evidence for this (new Supplementary Fig. S3E). We now discuss this in the results text and discussion where we already suggested that TRIM28 is likely recruited by KRAB-zinc finger proteins and ZMYM2 is subsequently recruited to this complex. Future extensive work is required to understand the mechanistic functions of ZMYM2 in these regions.

      Reviewer #2 (Public Review):

      In this study the authors investigate functional associations made by transcription factor ZMYM2 with chromatin regulators, and the impact of perturbing these complexes on the transcriptome of the U2OS cell line. They focus on validating two novel chromatin-templated interactions: with TRIM28/KAP1 and with ADNP, concluding that via these distinct chromatin regulators, ZMYM2 contributes to transcriptional control of LTR and SINE retrotransposons, respectively.

      Strengths and weakness of the study:

      • The co-localization of ZMYM2 with ADNP and TRIM28 is validated through RIME, ChIP-seq and co-IP. (Notably, since both RIME and ChIP-seq rely on crosslinking, and the co-IP with TRIM28 required crosslinking due to being SUMO-dependent, only the ZMYM2-ADNP co-IP experiment demonstrates an interaction in the absence of crosslinking).

      This is not correct as the co-IP experiments between endogenous ZMYM2 and TRIM28 were not performed in the presence of cross linkers. They did have NEM added, but this was to inactivate SUMO proteases rather than to cross link proteins.

      • It is good that uniquely-mapped reads are used in the ChIP-seq analysis given the interest in repetitive elements. Likewise, though the RT-qPCR data in Fig5 should be complemented by analysis of the RNA-seq data that the authors already have, it seems that the primers are carefully designed such that a single retrotransposon copy is amplified.

      We re-analysed our RNA-seq data using the TEtranscripts tool and looked at TE transcription genome-wide. However, very few TEs were expressed at high enough levels to yield statistically significant data beyond a few additional transposable elements. This likely results from the relatively low read depth we used and the absence of specific protocols to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein-coding gene expression before we made the connection to TEs). We added additional text to the results section to indicate that we could not see widespread deregulation of subclasses of TEs but that this needs further work.

      • The top-scoring interactors are highly-abundant nuclear proteins: for example, data from the contaminant repository for affinity purification mass-spec data (https://reprint-apms.org/) show that TRIM28 is identified in 466 / 716 AP-MS experiments with a mean spectral count of 16. While this does not indicate that the ZMYM2-TRIM28 interaction is not 'true', it would have been helpful to further dissect the interaction to strengthen this conclusion. For example, it would be nice to see the co-IP (fig 3A) repeated from the cells expressing the ZMYM2 mutant that is no longer competent to bind SUMO (used in the ChIP-seq data of Fig 2). Alternatively - if the model is that ZMYM2 recruits SUMOylated TRIM28 with well-characterized TRIM28 mutants that lack SUMOylation.

      We are aware that TRIM28 is often present as an apparent contaminant in many mass spec studies. However we have provided co-IP, PLA and ChIP-seq data to support their co-association on chromatin. We also convincingly show that ZMYM2 and TRIM28 functionally converge on regulating the same gene expression programmes. As requested by the referee, we have added further data showing that the ZMYM2 protein that is defective in SUMO binding (ZMYM2(SIM2mut); new Supplementary Fig. S3C) shows reduced binding to TRIM28 in co-IP assays. This further strengthens the (SUMO-dependent) association between ZMYM2 and TRIM28.

      • The transcriptional response using bulk RNA-seq in ZMYM2-depleted cells is rather gene-centric despite the title of the paper being about TE transcription. In fact the only panels about TE transcription are the RT-qPCR data in Fig 5D,F. I may be missing something (and there aren't many details given about the RNA-seq experiments) but why not look at TE transcription in an unbiased way with the transcriptomic data at hand? I appreciate potential hazards of multi-mapping etc but it would be interesting to see at least some subfamily analysis (e.g. using the TEtranscripts tool). On a similar point, why not show some RNA-seq in the genome browser snapshots of the epigenomics - together with a RepeatMasker annotation track of TEs...

      See response to the same point above.

      While the results broadly support the authors' conclusions, I have the overall impression that the central claim of TE transcriptional regulation by ZMYM2 could be strengthened a lot with some fairly straightforward additional experiments and analyses.

      Reviewer #3 (Public Review):

      ZMYM2 is a transcriptional repressor known to bind to the post-translational modification SUMO2/3. It has been implicated in the silencing of genes and transposons in a variety of contexts, but lacking sequence-specific DNA binding, little is known about how it is targeted to specific regions. At least two reports indicate association with TRIM28 targets (Tsusaka 2020 Epigenetics & Chromatin, Graham-Paquin 2022 bioRxiv) but no physical association with TRIM28 targets had been observed. Tsusaka 2020 theorizes an indirect, potentially SUMO-independent, interaction via ATF7IP and SETDB1.

      Here, Owen and colleagues show that a subset of ZMYM2-binding sites in U2OS cells are clearly TRIM28 sites, and further find that hundreds of genes are silenced by both ZMYM2 and TRIM28. They next demonstrate that ZMYM2 homes to chromatin, and interacts with TRIM28, in a SUMOylation-dependent manner, suggesting that ZMYM2 is recognizing SUMOylation on TRIM28 itself. ZMYM2 separately homes to SINE elements bound by the ChAHP complex, in an apparently SUMOylation independent manner. Although this is not the first report to show physical interaction between ZMYM2 and ChAHP, it is the first to show that ZMYM2 homes to ChAHP-binding sites and functions as a corepressor at these sites.

      The mode by which ZMYM2 and TRIM28 coregulate genic targets remains somewhat unclear. TRIM28/ZMYM2 bind to LTR elements, and loss of these proteins results in upregulation of genes distal to (but in the same TAD as) these binding sites.

      Overall, the manuscript is well-written, convincing, and fills a significant hole in our understanding of ZMYM2's mechanistic function.

      We thank the referee for his/her positive evaluation of the mechanistic insights we provide. We have further added to these through addressing the specific issues raised in their “recommendations for authors”.

      Recommendations for the authors:

      The reviewers appreciated the novelty of the findings, and in particular, the use of the RIME method to identify the protein partners of ZMYM2 while bound on chromatin, and multiple validation steps of these novel ZMYM2 interactors. However, they also felt that the model presented at the end of the manuscript seems preliminary and would deserve additional experiments to be really supported, the essential ones being listed below:

      1 - Despite the claimed scope of the manuscript on TE regulation, their expression analysis is limited to RT-qPCR and targeted to a few families or copies. Please use the RNA-seq data generated in U2OS cells depleted for ZMYM2 to assess retrotransposon expression genome-wide, performing both family-level and copy-level analyses, and compare with TRIM28-depleted U2OS cells.

      We re-analysed our RNA-seq data using the TEtranscripts tool and looked at TE transcription genome-wide. However, very few TEs were expressed at high enough levels to yield statistically significant data beyond a few additional transposable elements. This likely results from the relatively low read depth we used and the absence of specific protocols to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein-coding gene expression before we made the connection to TEs). We added additional text to the results section to indicate that we could not see widespread deregulation of subclasses of TEs but that this needs further work.

      2 - Clarify the relationship between dysregulated genes and TAD boundaries, as this seems important to support the model of distant gene regulation by the action of ZMYM2 on local chromatin environment within TADs (see comment of Reviewer #1 and #3).

      We have now provided further support for the idea that ZMYM2 functions within TADs, as detailed below in response to the reviewers’ comments. New bioinformatics analysis has been performed and is incorporated into the paper in Fig. 4D and Supplementary Fig. S4C.

      3 - Perform TRIM28 ChIP-seq in ZMYM2-kd cells, to prove that ZMYM2 indeed participates in TRIM28 recruitment to TE loci. This could be complemented by H3K9me3 ChIP-seq, to see if ZMYM2 depletion reduces H3K9me3 at retrotransposons, and over the regions surrounding ERVs. This last experiment seems also important for reinforcing the distant regulation model of nearby genes through ZMYM2-mediated repression of retrotransposons.

      As suggested by the referees below, we have tested whether ZMYM2 is required for TRIM28 binding at several loci and find no evidence for this (new Supplementary Fig. S3E). We now discuss this in the results text and discussion where we already suggested that TRIM28 is likely recruited by KRAB-zinc finger proteins and ZMYM2 is subsequently recruited to this complex. Future extensive work is required to understand the mechanistic functions of ZMYM2 in these regions.

      Reviewer #1 (Recommendations For The Authors):

      • Figure S1D is not clear. The authors want to investigate if ADNP and ZMYM2 regulate gene expression in the same directionality. They compare the genes down in siADNP and up in siZMYM2 (or vice versa) and show very small overlaps. If I understand correctly, this shows that very few genes are regulated in opposite directions by ADNP and ZMYM2 and consequently that they tend to regulate genes in the same directionality. This is not what is said in the text page 19 ("with no clear common roles as either an activator or repressor") and should be clarified. Furthermore, to compare if ADNP and ZMYM2 regulate genes in the same directionality, there are better ways to represent this, for example scatter plots of log2 FC in ADNP kd vs ZMYM2 kd. Similar criticisms apply to Fig S3F.

      We agree that the text could be clearer and have rewritten it as “….although the large numbers of genes directionally co-regulated by these two proteins (ie either positively or negatively) indicates no clear common role as either an activator or repressor”. We have also added a scatter plot to the supplementary data (Fig. S1E) to further emphasise the common directionality of effect as suggested by the reviewer. Similarly, we changed the text and have added a scatter plot to support the conclusions on ZMYM2 and TRIM28 functional interactions (new Fig. S3I).

      • The authors suggest an indirect control of genes by ZMYM2 within TADs (Fig 4C). Yet Fig 4C does not seem to address this point. Fig 4C shows that TADs with a ZMYM2/cluster 1 peak contain more upregulated than downregulated genes, but the key question should be: are upregulated genes significantly enriched in TADs containing a ZMYM2/cluster 1 peak compared to other TADs or other genomic regions?

      We have taken this suggestion on board and determined the frequency distribution of the number of TADs containing a gene upregulated (fold change >1.6; Padj <0.01) following ZMYM2 depletion. 10,000 iterations were performed by randomly selecting 216 TADs across all 3062 TADs. The observed number of TADs containing an upregulated gene (42) from 216 TADs containing a cluster 1 ZMYM2 peak is a clear outlier in this distribution (P-value = 0.0002) (see Supplementary Fig. S4C).
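
      As an illustration only (this is not the authors' code), the resampling test described above can be sketched in Python. The function name, the seed, and the set `tads_with_up_gene` (indices of TADs that contain at least one upregulated gene) are hypothetical; the only values taken from the response are 216 sampled TADs, 3062 TADs in total, 42 observed TADs, and 10,000 iterations.

```python
import random


def tad_enrichment_pvalue(tads_with_up_gene, n_total_tads, n_sampled,
                          observed, n_iter=10_000, seed=0):
    """One-sided empirical P-value for the observed number of TADs
    containing an upregulated gene, against random draws of TADs.

    tads_with_up_gene: set of TAD indices (0 .. n_total_tads - 1) that
    contain at least one upregulated gene (hypothetical input).
    """
    rng = random.Random(seed)
    all_tads = range(n_total_tads)
    at_least_as_extreme = 0
    for _ in range(n_iter):
        # Draw n_sampled TADs without replacement from all TADs.
        sample = rng.sample(all_tads, n_sampled)
        count = sum(1 for tad in sample if tad in tads_with_up_gene)
        if count >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_iter + 1)


# Example call with the numbers quoted in the response
# (the set itself would come from the RNA-seq and TAD annotations):
# p = tad_enrichment_pvalue(tads_with_up_gene, 3062, 216, 42)
```

      The add-one correction is a common convention for empirical P-values; the response does not state whether it was applied, so it is an assumption of this sketch.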

      • A key question not addressed in the manuscript is whether ZMYM2 participates in the recruitment of TRIM28 to ERVs. I recommend performing TRIM28 ChIP in ZMYM2-kd cells.

      We have tested whether ZMYM2 is required for TRIM28 binding at several loci and find no evidence for this (new Supplementary Fig. S3E). We now discuss this in the results text and discussion where we already suggested that TRIM28 is likely recruited by KRAB-zinc finger proteins and ZMYM2 is subsequently recruited to this complex. Future extensive work is required to understand the mechanistic functions of ZMYM2 in these regions.

      Reviewer #2 (Recommendations For The Authors):

      Please give more details of RNA-seq analyses in the experimental section (this will be particularly important if the comment about analysing TE transcription genome-wide is acted on).

      We have now expanded on the description of the RNA-seq analysis, including adding the mapping statistics to a new Supplementary table. We followed the referee’s useful suggestion of looking at TE transcription genome-wide. However, very few TEs were expressed at high enough levels to yield statistically significant additional data. This likely results from the relatively low read depth we used and the absence of specific protocols to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein-coding gene expression before we made the connection to TEs).

      Reviewer #3 (Recommendations For The Authors):

      Major Comments:

      • The relationship of TRIM28/ZMYM2 repression of LTRs and silencing within/between TADs is interesting but underdeveloped. Upon ZMYM2 depletion, the authors observe simultaneous upregulation of genes within TADs more often than would be expected by chance, but this analysis does not distinguish "proximal to" from "in the same TAD". If a ZMYM2 binding site is X bases from a gene TSS, is it more likely to regulate that gene if it is in the same TAD? This can and should be tested bioinformatically.

      The basic question the referee is asking is whether ZMYM2 affects gene expression at a certain distance irrespective of whether the TSS of the gene is in the same TAD. We have now tested this and added text to the results section. Basically we took all of the ZMYM2 regions associated with genes upregulated by ZMYM2 depletion that resided in the same TAD and calculated the peak to TSS distance. Then we searched in the opposite direction for the TSS of genes at a similar distance (+/-25%) that resided in an adjacent TAD. We then asked whether these genes were upregulated by ZMYM2 depletion. 102 ZMYM2 peaks were positioned within these distance constraints with at least one gene in an adjacent TAD (716 genes in total). Of these genes, only 11 were upregulated following ZMYM2 depletion. There is therefore not a general spreading of deregulation around ZMYM2 peaks in a distance-dependent manner.
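
      The distance-matched control described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: all names are hypothetical, coordinates are for a single chromosome, TADs are represented by their sorted start positions, and "adjacent" is taken to mean the TAD immediately next to the one containing the peak.

```python
from bisect import bisect_right


def tad_index(pos, tad_starts):
    """Index of the TAD containing pos, given sorted TAD start coordinates."""
    return bisect_right(tad_starts, pos) - 1


def matched_adjacent_genes(peak, same_tad_tss, all_tss, tad_starts, tol=0.25):
    """Names of genes whose TSS lies on the opposite side of the peak from
    the reference gene, at a similar distance (within +/- tol), and in a
    TAD adjacent to the one containing the peak.

    peak: peak position; same_tad_tss: TSS of the upregulated gene in the
    peak's own TAD; all_tss: dict of gene name -> TSS position.
    """
    d = abs(same_tad_tss - peak)
    direction = 1 if same_tad_tss > peak else -1
    peak_tad = tad_index(peak, tad_starts)
    matches = []
    for name, tss in all_tss.items():
        offset = tss - peak
        if offset * direction >= 0:
            continue  # same side as the reference gene, or at the peak
        if not (1 - tol) * d <= abs(offset) <= (1 + tol) * d:
            continue  # distance not within the tolerance window
        if abs(tad_index(tss, tad_starts) - peak_tad) == 1:
            matches.append(name)
    return matches
```

      The matched genes returned for each peak could then be checked against the list of genes upregulated after ZMYM2 depletion, which is the comparison summarised by the 11 of 716 genes quoted above.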

      Furthermore, the authors note in the text and discussion that LTRs can demarkate TAD boundaries, but this is a distinct concept from the idea that they regulate genes within a TAD. Is there evidence that ZMYM2 binding sites are found at TAD boundaries?

      We have provided more evidence to support the associations of ZMYM2 peaks with TADs and now show that they are closer than randomly expected to TAD boundaries (Fig. 4D). However they are clearly not all located very close to the boundaries.

      • The analysis of transposons expression was limited to qPCR of a handful of elements. Since the authors have conducted RNA-seq of U2OS cells depleted for both TRIM28 and ZMYM2, they can determine if certain classes of transposons are globally upregulated.

      We re-analysed our RNA-seq data using the TEtranscripts tool and looked at TE transcription genome-wide. However, very few TEs were expressed at high enough levels to yield statistically significant additional data. This likely results from the relatively low read depth we used and the absence of specific protocols to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein-coding gene expression before we made the connection to TEs). We added additional text to the results section to indicate that we could not see widespread deregulation of subclasses of TEs but that this needs further work.

      Minor Comments:

      • Typo: "human HEK393 cells". They are HEK293 cells.

      We have corrected this error.

      • "These ADNP peaks showed enrichment of binding motifs for several transcription factors with the top two motifs for HBP1 and IRF both found in over 35% of target regions (Figure 1D)." According to Ostapcuk 2018, ADNP has its own motif (CGCCCYCTNSTG). It is intriguing that this does not appear enriched in ADNP sites in U2OS cells; this seems worthy of comment.

      This is a good point, so we did an additional search using the motif found in Ostapcuk 2018 and found this in 15% of ADNP binding regions. This value is substantially lower than the 63% seen previously. The motif is therefore present but is not the dominant one. These new data and their implications regarding chromatin targeting mechanisms are now discussed in the Results section around Fig. 1D.
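
      For readers who wish to reproduce this kind of check, a degenerate motif such as CGCCCYCTNSTG can be searched on both strands by translating its IUPAC codes into a regular expression. The sketch below is an assumption-laden illustration (the function names are hypothetical and this is not necessarily how the authors performed the search); it reports the fraction of input sequences containing at least one match.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of an IUPAC motif."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compiled pattern matching the motif on either strand."""
    forward = "".join(IUPAC[base] for base in motif)
    reverse = "".join(IUPAC[base] for base in reverse_complement(motif))
    return re.compile(f"{forward}|{reverse}")


def fraction_with_motif(sequences, motif):
    """Fraction of sequences with at least one motif match (0.0 if empty)."""
    if not sequences:
        return 0.0
    pattern = motif_regex(motif)
    hits = sum(1 for seq in sequences if pattern.search(seq.upper()))
    return hits / len(sequences)
```

      Applied to the sequences under the ADNP peaks, `fraction_with_motif(peak_sequences, "CGCCCYCTNSTG")` would give a proportion comparable in kind to the 15% quoted above, although dedicated motif tools typically score matches against a background model rather than counting exact matches.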

      • Figures S2F and S2G are central to the paper and belong in the main text.

      We have now added these to the main figures as requested (meaning that Fig. 2 has now been split into two separate figures {2 and 3}, as it became too large for a single figure).

      • A supplementary table including libraries generated and mapping statistics should be included.

      We have now added this (new Supplementary Table S2).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The enteroviruses comprise a medically important genus in the large and diverse picornavirus family, and are known to be released without lysis from infected cells in large vesicles containing numerous RNA genome-containing capsids - a feature allowing for en bloc transmission of multiple viral genomes to newly infected cells that engulf these vesicles. SIRT-1 is an NAD-dependent protein deacetylase that has numerous and wide ranging effects on cellular physiology and homeostasis, and it is known to be engaged in cellular responses to stress and autophagy.

      Jassey et al. show that RNAi depletion of SIRT-1 impairs the release of enterovirus D-68 (EVD68) in EVs recovered from the supernatant fluids of infected cells using a commercial exosome isolation kit. The many functions attributed to SIRT-1 in the literature reflect its capacity to deacetylate various cell proteins engaged in transcription, DNA repair, and regulation of metabolism, apoptosis and autophagy. However, Jassey et al. make the surprising claim that the proviral role of SIRT-1 in promoting enterovirus release is not dependent on its deacetylase activity. Fig. S1C is crucial to this suggestion, as it is said to show that reconstituting expression with a catalytically-inactive mutant can rescue virus release from SIRT-1 depleted cells. However, no information is provided concerning the levels of endogenous and ectopically expressed SIRT-1 proteins in this experiment, making it very difficult to interpret the results. Is the mutant SIRT-1 protein expressed at a higher level than the non-mutant protein? Is there a 'sponging' effect with these transfections that lessens the siRNA efficiency and reduces knockdown of the endogenous protein? Fig. S1B and Fig. 4C convincingly show that EX527, a small molecule inhibitor of the deacetylase activity of SIRT-1, inhibits extracellular release of the virus. This suggests that the deacetylase activity of SIRT-1 is in fact required for the proviral effect of SIRT-1. This is a fundamentally important question that will require more investigation.

      We have included western blot data (Fig. S1D), which shows comparable levels of expression between the wild-type and mutant SIRT-1 constructs as well as the endogenous SIRT-1. While both constructs partially rescued EV-D68 titers in SIRT-1 knockdown cells, only the wild-type construct rescued SERCA2A protein levels, indicating that SIRT-1 deacetylase activity is required for SERCA2A expression but not for EV-D68 infection.

      Fig. 6 shows how SIRT-1 knockdown impacts the release of enterovirus D68 in EVs recovered from cell culture supernatant using a commercial 'Total Exosome Isolation Kit'. The authors should describe the principle this kit exploits to isolate 'exosomes' (affinity isolation?) and specify which antibodies it involves (anti-phosphatidylserine, anti-CD63, others?). This could impact the outcome of these experiments, and moreover is important to include in the long-term scientific record. The authors are appropriately cautious in describing the vesicles they presume to be isolated by the kit as simply 'extracellular vesicles', since there are multiple types of EVs with very different mechanisms of biogenesis, of which 'exosomes' are but one specific type. It would have been more elegant had the authors shown that SIRT-1 is required for EVD68 release in detergent-sensitive vesicles with low buoyant density in isopycnic gradients, and characterized the size and number of viral capsids in these vesicles by electron microscopy.

      We have added a description of the Total Exosome Isolation Kit principle to the materials and methods. In brief, the reagent ties up water molecules and forces less soluble components, such as vesicles, out of the culture media; these can then be pelleted by centrifugation. The purity and size distribution of exosomes isolated with this kit are comparable to those obtained by ultracentrifugation.

      Fig. 6 shows that SIRT-1 depletion upregulates CD63 expression, but has no apparent impact on the release of CD63-positive 'EVs' from uninfected cells. EV-D68 infection also upregulates CD63 expression in SIRT-1 replete cells, and in this case, increases the release of CD63-positive EVs. The combination of infection and SIRT-1 depletion massively upregulates CD63 expression, but appears to eliminate the enhanced release of CD63-positive EVs resulting from infection alone. These are interesting results, from which the authors infer CD63 is associated with EVs containing EV-D68. But, do we know this? Can a CD63 pulldown immunoprecipitate EV-D68 capsid proteins or viral RNA? CD63 is strongly associated with exosomes released from cells through the multi-vesicular body pathway, which are distinct from the LC3-positive EVs released by secretory autophagy that have previously been associated with enteroviruses. The authors suggest that 'knockdown of SIRT-1 may prevent the exocytosis of CD63-positive EVs", but this is a very broad claim (and not really demonstrated by Fig. 6): it requires a clearer definition of what the authors mean by 'exocytosis' and a much more detailed analysis of the size and buoyant density of EVs released in a SIRT-1-dependent process.

      We have toned down this suggestion, which sets up the logic for what is now Figure 7 but, we agree, does not prove the specific nature of these vesicles.

      The authors suggest that almost all EV-D68 released from infected cells is released without cell lysis in EVs. However, they generally show data from only a single time point following infection (5 or 6 hrs post-infection). It would have been interesting to see a more complete temporal analysis, and to know whether a high proportion of virus continues to be released in EVs, or if it is swamped out ultimately by lytic release of nonenveloped virus.

      In these cells, very little virus is released at earlier timepoints, and after 6hpi it is difficult to analyze virus release because of cell detachment and lysis. In a future publication we will use less susceptible cells to analyze a time course of release.

      Fig. 1D indicates that a small fraction of SIRT-1 leaks from the nucleus in EV-D68 infected cells. The authors suggest this is due to targeted nuclear export, rather than simply leaky nuclear pores which are well known to exist in enterovirus-infected cells. The authors present similar fluorescent microscopy data showing inhibition of TFEB export in leptomycin-B treated cells in Fig. S2A in support of their claim that this is specific SIRT-1 export, but these data are far from convincing - there is equivalent residual TFEB and SIRT-1 in the cytoplasm of the treated cells. Quantitative immunoblots of cytoplasmic and nuclear cell fractions might prove more compelling.

      We have changed the text to remove the word “block” and instead suggest that there is inhibition, given the difference we observe with and without leptomycin-B.

      Finally, the authors should be more specific in describing the viruses they have studied (EV-D68 and PV). It would be preferable to describe these as 'enteroviruses' (including in the title of the manuscript), rather than more broadly as 'picornaviruses'. There is no certainty that the requirement for SIRT-1 in non-lytic release of virus extends to hepatoviruses or other picornaviral genera, for which mechanisms of nonlytic release may be quite different.

      We have made this change and thank the reviewer for pointing this out.

      Reviewer #2 (Public Review):

      The authors aimed to connect SIRT-1 to EV-D68 virus release through mediating ER stress. They are successful in robustly connecting these pathways experimentally and show a new role for SIRT-1 in EV-D68 infection. These results extend to additional viruses, suggesting role(s) for SIRT-1 in diverse virus infection.

      The authors note that EV-D68 does not significantly impact SIRT-1 protein levels (Fig 1E and F), though this has been described for other picornaviruses (Xander et al., J Immunol 2019; Han et al., J Cell Sci 2016; Kanda et al Biochem Biophys Res Commun 2015). This may be of interest to note in the manuscript.

      We have cited the above papers in the manuscript and thank the reviewer for these suggestions.

      The data regarding CVB3 (Fig S4) are especially interesting because they show no discernable impact on infection. The manuscript should describe this further and perhaps speculate on potential reasons. Could it be due to inefficient knockdown?

      We have shown that both genetic and pharmacological inhibition of SIRT-1 does not significantly alter CVB3 titers. We do not think this is due to inefficient knockdown since the CVB3 and PV experiments were done concurrently. We are currently investigating why CVB3 responds differently from EV-D68 and PV.

      SIRT-1 (and other sirtuins) have been linked to an innate interferon response. Are any of the phenotypes observed here due to IFN responses? The use of H1HeLa cells would suggest this is not the case.

      We think this is unlikely because H1HeLa cells are not IFN-competent and the knockdown of SIRT-1 did not significantly alter viral RNA replication.

      Reviewer #1 (Recommendations For The Authors):

      In Fig. 1, it would be informative to show an immunoblot of the protein in knockdown vs control cells (this is shown in different experiments in Fig. 2A and 3C, with variable degrees of knockdown efficiency, but ideally should be shown here also).

      The knockdown efficiency of SIRT-1 is now shown in Fig. S1D. We thank the reviewer for this suggestion.

      Why is the extracellular virus titer in the control cells in Fig. 1C so much lower (over a 1.5 logs) than in Fig. 1B? Has the plasmid transfection induced an innate immune response, and could this be confounding the experiment?

      We think this is due to stress induced by transfection and not an innate immune response, since H1HeLa cells are not interferon-competent.

      SIRT-1 is recognized to have a regulatory role in autophagy, but the authors' claim that it is "essential for stress induced and basal autophagy" would be strengthened by including in Fig. 2B control images of starved and CCCP-treated cells.

      LC3 lipidation and p62 degradation are the hallmarks of autophagy initiation and flux, which are shown in Fig. 2A. The goal of Fig. 2B was to verify the impact of SIRT-1 knockdown in restricting basal autophagic degradation. We will examine the effect of starvation and CCCP treatment in future studies. We thank the reviewer for understanding.

      The BiP immunoblot shown in Fig. 4B does not support the claim that 'TG [thapsigargin] treatment induced BiP protein levels' whereas 'EV-D68 infection reduced BiP levels...suggesting that EV-D68 blocks ER stress.' The apparent differences in BiP expression are minimal and of questionable biological significance.

      We have consistently observed a reduction in BiP levels during EV-D68 infection in both hSABCi-NS1.1 as indicated in Fig. 4B and H1HeLa (see Author response image 1), which is consistent with an ER stress blockade during EV-D68 infection.

      Author response image 1.

      Minor comments:

      1) The variable and wide-ranging scale of the y-axis in Figs. 1A-C and S1 is distracting, exaggerates small differences, and makes it difficult to assess the magnitude of differences in virus titers. The scale should be standardized and held constant in graphs showing results from similar types of experiments.

      Our graphs are plotted based on the viral titers from experiments, mostly done on different days. We are confident that the variabilities in the y-axis do not affect the statistical analyses.

      2) The number and type (technical or biological?) of experimental replicates should be indicated in the figure legends. Ideally, each replicate should be individually plotted in graphs.

      All experiments are repeated at least three times unless otherwise indicated. We have added this information to the figure legends.

      3) Fig. S5C - how many replicates were done, and is there a statistically significant difference in viral RNA abundance at the last time point?

      The experiment was done three times, twice with a low MOI (0.1) and once with a high MOI (30). There is no statistical difference at the last time point as shown in the graphs in Author response image 2.

      Author response image 2.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1D would benefit from staining for viral replication compartments (J2, for instance) to correlate the amount of viral dsRNA with nuclear egress of SIRT-1. Similar data would benefit Figure 5A. The data in Figure S5 suggests that most, but not all cells, are infected, so having this control seems important for their IFA experiments.

      SIRT-1 dsRNA staining for EV-D68 infection is shown in Fig. S5A and all cells appear to be infected. The IFA data (Author response image 3) shows dsRNA staining of CVB3-infected cells.

      Author response image 3.

      Are EVs not released as efficiently with SIRT-1 knockdown? The authors show that knockdown reduces CD63 levels in purified EVs, but this could be explained if exosomes are not generated as robustly with SIRT-1 knockdown.

      We don’t want to use the word “exosomes” since their definition is very specific, and only use it once in our manuscript, to describe known membrane associations of CD63. We do not think SIRT-1 knockdown affects the intracellular generation of EVs, since depleting SIRT-1 leads to the buildup of CD63 positive signals in the whole cell lysates compared to the scramble control (Fig. 7B and C). Instead, our data suggest that SIRT-1 regulates the release of EVs during EV-D68 infection.

      Labels of graphs for "Infection" versus treatment ("TG" or "EX527") are unclear. All samples are presumably infected, so perhaps the authors meant to label these diagrams as untreated.

      We have made the changes in the labels and thank the reviewer for helping make these graphs more clear.

      The induction of ER stress with TG and repression of stress with EV-D68 infection is clear from BiP western blots. Are BiP levels reduced in SIRT-1 knockdown cells? Their data with TG treatment and knockdown suggests this may be possible.

      We have not examined the impact of SIRT-1 knockdown on BiP protein levels. But since SIRT1 KD increases ER stress, as evidenced by a reduction in SERCA2A levels (Fig. 3C and E), we would expect an increase in BiP levels in SIRT-1 depleted cells.

      Would the authors expect TG to reduce EVs with EV-D68 as well? Presumably, combination of TG with SIRT-1 would reduce EVs similar to the results shown in Figure 6C. They mention in the discussion that TG and SIRT-1 "share common cellular targets" so it would be interesting to determine if TG acts similar to SIRT-1 knockdown with regard to EVs.

      We think TG will similarly reduce EVs in EV-D68-infected cells, and we are currently testing this hypothesis.

      Because of the inclusion of the SARS-CoV-2 data and mention in the abstract, it may be appropriate to include that data (Fig S7) in the main figures. The authors mention SIRT-1 as important to MERS-CoV infection in the introduction, but SIRT-1 has been implicated in RNA virus infection, including picornaviruses (noted above). The expansion of this section to provide additional context would benefit the introduction and discussion.

      We have moved the former Fig. S7 to the main manuscript as Fig. 6.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for submitting your article "Microhomology-Mediated Circular DNA Formation from Oligonucleosomal Fragments During Spermatogenesis" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the assessment has been overseen by a Reviewing Editor and Diane Harper as the Senior Editor.

      eLife assessment

      This study provides valuable information on the biogenesis of eccDNAs during spermatogenesis, i.e., eccDNAs in spermatogenic cells are not derived from meiotic recombination hotspots but represent oligonucleosomal DNA fragments from apoptotic male germ cells, whose ends are ligated through microhomology-mediated end-joining. The study is currently incomplete because the method of bioinformatics needs more details and data interpretation should take the amplification bias into consideration.

      We highly appreciate the positive assessment of our manuscript. Following the insightful suggestions by the editors and two reviewers, we have fully addressed the two major concerns, i.e., the missing method details and the biased data interpretation.

      First, to provide the details of our bioinformatics methods, i) we have illustrated the principle and steps of our eccDNA detection method in Figure 4C and Figure 4-figure supplement 2B, and submitted our source code to GitHub (website); ii) we compared the performance of our method with four established bioinformatics tools on both simulated and real datasets, and found that it has comparable sensitivity and specificity (Figure 4—figure supplement 2C and E) and much higher accuracy in assigning eccDNA boundaries (Figure 4—figure supplement 2A, D and F); and iii) we have added more description to help readers better understand our method (see Methods – eccDNA Detection).

      Second, the amplification bias is indeed a problem of Circle-seq. Following the editors’ and Reviewer #1’s insightful suggestions, we analyzed other datasets generated by amplification-free strategies (Mouakkad-Montoya et al., PNAS, 2021) and long-read sequencing (Henriksen et al., Mol Cell, 2022). We identified the presence of homologous sequences surrounding eccDNA breakpoints in both datasets (Figure 5-figure supplement 1E and F), suggesting the involvement of MMEJ-mediated ligation for the size populations of eccDNAs not explored by Circle-seq as well. We have discussed this point and added one section to remind readers of the limitations of rolling-circle amplification-based Circle-seq (the 2nd paragraph of the Discussion section).

      For your and reviewers’ convenience, all changes in the revised manuscript have been marked in red. We hope the modified manuscript addresses your and the reviewers’ concerns satisfactorily and is suitable for publication in eLife now.

      Reviewer #1 (Public Review):

      This study aims to address the mechanism of eccDNA generation during spermatogenesis in mice. Previous efforts for cataloging eccDNA in mammalian germ cells have provided inconclusive results, particularly in the correlation between meiotic recombination and the generation of eccDNA. The authors employed an established approach (Circle-seq) to enrich and amplify eccDNA for sequencing analyses and reported that sperm eccDNA is not associated with meiotic recombination hotspots. Rather, the authors reported that eccDNAs are widespread, and oligonucleosomal DNA fragments from sperm undergoing apoptosis, with the ligation of DNA ends by microhomology-mediated end-joining, would be a major source of eccDNA.

      The strength of the study includes evaluating the eccDNA contents not only in sperm but also from earlier stages of cells in spermatogenesis. The differences in eccDNA size peaks between sperm and other progenitors, in particular, the unique peak in sperm around 360 bp, are intriguing. Results from sequencing data analysis were presented elegantly.

      We are grateful to Reviewer #1 for his or her recognition of the strength of this study.

      I also have critiques. First, the lack of eccDNA quality control step is a concern. Previous studies employed electron microscopy to ensure that DNA species are mostly circular before rolling-circle amplification. Phi29 polymerase is widely used for DNA amplification, including whole genome amplification of linear chromosomal DNA. Phi29 polymerase has a high processivity and strand displacement activity. When those activities occur within a molecule, it creates circular DNA from linear DNA in vitro. In vitro-created eccDNA from linear DNA would be randomly distributed in the genome, which may explain the low incidence of common eccDNA between replicates. Therefore, it will be crucial to show that DNA prior to amplification is dominantly circular. Electron microscopy would be challenging for the study because the relatively small number of cells were processed to enrich eccDNA. An alternative method for quality controls includes spiking samples with linear and circular exogenous DNA and measuring the ratios of circular/linear control DNA before and after column purification/exonuclease digestion. eccDNA isolation procedures can be validated by a very high circular/linear control DNA ratio.

      We greatly appreciate Reviewer #1's valuable suggestions. We have introduced an exogenous circular DNA (pUC19) into our samples and measured its abundance relative to a linear DNA locus (H19 gene) before and after eccDNA isolation procedures according to Reviewer #1's suggestion. As anticipated, we observed significant enrichment of pUC19 following eccDNA isolation (new Figure 1-figure supplement 2A). These results affirm the high selectivity of our protocol in enriching eccDNAs.

      Another critique is regarding the limitation of the study. It is important to remind the readers of the limitations of the study. As the authors mentioned, rolling circle amplification preferentially increases the copy numbers of smaller eccDNA. Therefore, the native composition of eccDNA is skewed. In addition, the candidate eccDNAs are identified by split reads or discordant read pairs. The details of the mapping process are unclear from the methods, but such a method would require reads with high mapping quality; the identification of eccDNA is expected to require sequencing reads that are mapped to genomic locations uniquely with high confidence, and reads mapped to more than one genomic location, such as highly similar repeat sequences or duplications, are eliminated. Such identification criteria would favor eccDNA formed by little or no homology at the junction sequences, and eliminate eccDNA formed by long homologies at the ends, such as eccDNA formed exclusively by satellite DNA. Therefore, it is not surprising that the authors found the dominance of microhomology-mediated eccDNA. It remains to be determined whether small eccDNA with microhomologies are the dominant species of eccDNA in the native composition. In this regard, it is noted that similar procedures of eccDNA enrichment (column purification, exonuclease digestion, and rolling circle amplification ) revealed variable sizes and characteristics of eccDNA in sperm (human from Henriksen et al. or mice from this study), dependent on the methods of sequencing (long-read or short-read sequencing). Considering these limitations, the last sentence of the introduction, "We conclude that germline eccDNAs are formed largely by microhomology mediated ligation of nucleosome protected fragments, and barely contribute to de novo genomic deletions at meiotic recombination hotspots" needs to be revised.

      We thank Reviewer #1 for bringing attention to the limitations of the study. Since rolling circle amplification preferentially increases the copy numbers of smaller eccDNA, the exact size distribution of eccDNA in native composition is yet to be determined. As pointed out by Reviewer #1, our mapping and eccDNA detection processes might indeed introduce some biases since we only focused on uniquely-mapped reads. We have addressed and incorporated Reviewer #1’s perspectives in our revised manuscript, as detailed in the 2nd paragraph of Discussion section.

      Despite these limitations, microhomology mediated ligation of DNA fragments seems to be the major mechanism of eccDNA biogenesis nonetheless. We analyzed eccDNA datasets generated through long-read sequencing (Henriksen et al., Mol Cell, 2022) or amplification-free strategies (Mouakkad-Montoya et al., PNAS, 2021). Although these eccDNAs represented size populations that were largely missed by this study, our sequence feature analyses also revealed the presence of homologous sequences surrounding eccDNA breakpoints, as depicted in the newly added Figure 5-figure supplement 1E and F. Considering that we could not totally overcome these biases in this study, we have toned down some statements and revised the last sentence of the introduction as follows: “We conclude that germline eccDNAs are likely formed by microhomology mediated ligation of nucleosome-protected fragments, and barely contribute to de novo genomic deletions at meiotic recombination hotspots.”

      Small eccDNA (microDNA) data from various mouse tissues are available from the study by Dillon et al. (Cell Reports 2015). Authors are encouraged to examine whether the notable findings in this study (oligonucleosomal-sized eccDNA peaks and the association with apoptotic cell death) are unique to sperm or common in the eccDNA from other tissues.

      We are thankful to Reviewer #1 for this suggestion. We analyzed eccDNA data from various mouse tissues (Dillon et al., Cell Rep, 2015) to see whether our findings are unique to sperm or common to other tissues. Sequence-based prediction revealed significantly higher nucleosome occupancy probability for ~180 bp and ~360 bp eccDNA regions, suggesting their origin from oligonucleosomal fragments (Figure 5-figure supplement 1A). In contrast to simulated controls (~20%), more than 1/3 of eccDNAs had microhomologous sequences, most of which were shorter than 5 bp (Figure 5-figure supplement 1B). The remaining 2/3 of eccDNAs had the same sequence motifs between eccDNA starts and sequences following eccDNA ends, and between eccDNA ends and sequences in front of eccDNA starts (Figure 5-figure supplement 1C). The genomic distribution of eccDNAs closely matched that of eccDNAs whose generation was dependent on apoptotic DNA fragmentation (new Figure 5-figure supplement 1D). Altogether, these results indicate that microhomology directed ligation of oligonucleosomal fragments in apoptotic cells significantly contributes to eccDNA biogenesis in different mouse tissues. We have described this part in the revised manuscript (see the second-to-last paragraph of the Results section).
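      The two comparisons described above (eccDNA start versus the sequence following the eccDNA end, and eccDNA end versus the sequence in front of the eccDNA start) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `microhomology_lengths`, the half-open coordinate convention, the `max_len` cap, and the toy sequence are all assumptions made for illustration.

```python
def microhomology_lengths(genome: str, start: int, end: int, max_len: int = 20) -> tuple[int, int]:
    """Return (downstream, upstream) microhomology lengths for the
    half-open interval genome[start:end] (hypothetical convention)."""
    # Downstream: compare the eccDNA start with the bases right after the eccDNA end.
    down = 0
    while (down < max_len
           and start + down < end
           and end + down < len(genome)
           and genome[start + down] == genome[end + down]):
        down += 1
    # Upstream: compare the eccDNA end with the bases right before the eccDNA start.
    up = 0
    while (up < max_len
           and start - 1 - up >= 0
           and end - 1 - up >= start
           and genome[start - 1 - up] == genome[end - 1 - up]):
        up += 1
    return down, up


# Toy example: a 10-bp "eccDNA" whose first 3 bases (ACG) recur right after its end.
toy_genome = "TTGCA" + "ACGTTTTTTT" + "ACGCC"
result = microhomology_lengths(toy_genome, 5, 15)
```

      On this toy sequence the downstream comparison matches three bases before diverging and the upstream comparison matches none, so an interval like this would be counted as carrying a short (under 5 bp) microhomology.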

      Reviewer #2 (Public Review):

      This study presents a useful investigation of eccDNAs in spermatogenesis of mouse. It provides evidence about the biogenesis of eccDNAs and suggests that eccDNAs are derived from oligonucleosomal DNA fragmentation during apoptosis by MMEJ and may not be the direct products of germline deletions. However, the methods of data analysis were not fully described and data analysis is incomplete. It provides additional observations about the eccDNA biogenesis and can be used as a starting point for functional studies of eccDNA in sperms. However, many aspects about data analyses and data interpretations need to be improved.

      We thank Reviewer #2 for his or her critical reading. We have provided more method details, performed additional analyses and made some clarifications in our revised manuscript (see below).

      • Most of the conclusions made by the work are based only on the bioinformatics analyses; validation of these findings using other methods (biochemistry/molecular biology) is missing. For example, no QC results are presented for the eccDNA purification, which may show whether contaminants such as linear DNA or mitochondrial DNA have been fully removed. Additionally, it is also helpful to use simple PCR to test the existence of identified eccDNAs in sperm or other samples to validate the specificity of the Circle-seq method.

      Following both this Reviewer’s and Reviewer #1’s suggestions, we performed quality control of eccDNA purification. First, we introduced an exogenous circular DNA (pUC19) into our samples and measured its abundance relative to a linear DNA locus (H19 gene) before and after eccDNA isolation procedures. As anticipated, we observed significant enrichment of pUC19 following eccDNA isolation (Figure 1-figure supplement 2A). Second, mitochondrial DNA is supposed to be cleaved into linear DNA by PacI and degraded by exonuclease. As expected, the abundance of mitochondrial DNA significantly decreased after eccDNA isolation procedures (Figure 1-figure supplement 2B). Third, we performed PCR using outward primers and validated three randomly-selected eccDNAs (Figure 1-figure supplement 2C).

      • The reliability of the data analysis methods is uncertain, as the authors constructed and utilized their own pipeline to identify eccDNAs, despite the availability of established bioinformatics tools such as ECCsplorer, eccFinder, and Amplicon Architect. Moreover, the lack of validation of the pipeline using either ground truth datasets or simulation data raises concerns about its accuracy. Additionally, the methodology employed for identifying eccDNA that encompasses multiple gene loci remains unclear.

      We thank Reviewer #2 for pointing out this problem. In the original version of our manuscript, focusing on one eccDNA dataset generated in this study, we compared the performance of our method with that of established methods for identification of eccDNA regions, such as Circle_finder, Circle_Map and ecc_finder. Our method has comparable sensitivity and specificity to existing methods, especially Circle_finder and Circle_Map (original Figure 4—figure supplement 2C). We also used one specific genomic region to show that existing methods identified the same eccDNA regions but misassigned the eccDNA boundaries (original Figure 4—figure supplement 2A). In the revised manuscript, we have further included ECCsplorer for comparison. Since Amplicon Architect is more specifically designed for detection of ecDNAs, it was not included in our comparison. Following Reviewer #2’s suggestions, we simulated paired-end reads derived from a set of eccDNAs with homologous sequences around breakpoints and employed all methods for eccDNA identification. In total, 97.9%, 97.9%, 97.4%, 95.3% and 91.1% of eccDNA regions could be detected by our method, Circle_Map, Circle_finder, ecc_finder and ECCsplorer, respectively (Figure 4—figure supplement 2C). This result suggests that our method has comparable performance in detecting eccDNA regions. However, only our method could faithfully assign breakpoints with 97.4% accuracy, in contrast to no more than 15% by other methods (Figure 4—figure supplement 2D).

      As pointed out by Reviewer #2, similar to ECCsplorer, Circle_finder, Circle_Map and ecc_finder, our method fails to identify eccDNAs that encompass multiple gene loci. We have reminded readers of this limitation in our revised manuscript. Besides the schematic workflow (Figure 4—figure supplement 2B), we have included more method details to help readers better understand how our method works (see Methods – eccDNA Detection).

      • Although the author stated that previous studies utilizing short-read sequencing technologies may have incorrectly annotated eccDNA breakpoints, this claim requires careful scrutiny and supporting evidence, which was not provided in the manuscript.

      Following this Reviewer’s suggestions, we conducted a systematic evaluation of the performance of various existing methods, namely Circle_finder, Circle_Map, ECCsplorer and ecc_finder, for eccDNA breakpoint annotation.

      First, we simulated paired-end reads derived from a set of eccDNAs with homologous sequences around breakpoints and employed all the different methods for eccDNA identification. As expected, our method could correctly assign breakpoints for 97.4% of eccDNAs, in contrast to no more than 15% by other methods (Figure 4—figure supplement 2D).

      Second, we examined the performance of all methods on one dataset generated in this study. Our method detected 59,680, 54,898, 32,993 and 22,019 eccDNAs with homologous sequences that were also detected by Circle_finder, Circle_Map, ECCsplorer and ecc_finder, respectively. Remarkably, we observed that at least 60% of breakpoints were misannotated by the existing methods (Figure 4—figure supplement 2F).

      We have included an example in Figure 4—figure supplement 2A, where all existing methods incorrectly annotated the eccDNA breakpoints when homologous sequences were present. These results highlight the advantage of our method over existing methods in accurately annotating eccDNA breakpoints in the presence of homologous sequences.
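      Why homologous sequences make breakpoint annotation error-prone can be seen in a small sketch: when the start of an eccDNA region repeats immediately after its end, several shifted coordinate pairs describe exactly the same circular sequence, so short reads spanning the junction cannot distinguish them without an explicit rule. The code below is an illustrative assumption, not any of the tools compared above; `canonical_circle`, `equivalent_breakpoints`, and the toy sequence are hypothetical.

```python
def canonical_circle(seq: str) -> str:
    """Lexicographically smallest rotation, so identical circles compare equal."""
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def equivalent_breakpoints(genome: str, start: int, end: int) -> list[tuple[int, int]]:
    """List every shifted half-open interval of the same length whose
    circularised sequence equals that of genome[start:end]."""
    length = end - start
    reference = canonical_circle(genome[start:end])
    hits = []
    for shift in range(-(length - 1), length):
        s, e = start + shift, end + shift
        if s < 0 or e > len(genome):
            continue  # shifted interval would fall off the toy genome
        if canonical_circle(genome[s:e]) == reference:
            hits.append((s, e))
    return hits


# Toy example: a 10-bp region followed by a 3-bp repeat (ACG) of its own start.
toy_genome = "TTGCA" + "ACGTTTTTTT" + "ACGCC"
candidates = equivalent_breakpoints(toy_genome, 5, 15)
```

      In this toy case a 3-bp homology yields four indistinguishable coordinate pairs (shifts of 0 to 3 bp), which is why a caller needs a consistent convention for reporting breakpoints when homology is present.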

      • The similarity between the eccDNA profiles of human and mouse sperm remains uncertain, and therefore, analyses of human eccDNA data and comparisons between the two are necessary if the authors claim that their findings of widespread eccDNA formation in mouse spermatogenesis extend to human sperms.

      Fig. 5 shows that human sperm eccDNAs originate from oligonucleosomal fragmentation (Fig. 5A-C), are not associated with meiotic recombination hotspots (Fig. 5D and E), and are formed by microhomology directed ligation (Fig. 5F and G). These findings are consistent with what we observed in mouse sperm eccDNAs. To further substantiate our findings, we analyzed an additional eccDNA dataset from human sperm generated by long-read sequencing (Henriksen et al., Mol Cell, 2022). Although this dataset was predominantly composed of large-sized eccDNAs, the analysis of sequence features also indicated their association with microhomology directed ligation (Figure 5-figure supplement 1E). Overall, the eccDNA profiles in human and mouse sperm exhibit notable similarities.

      Reviewer #1 (Recommendations For The Authors):

      In the last sentence of the abstract, the authors stated, "provide a potential new way for quality assessment of sperms." There is no basis for the claim in the abstract. The authors need to mention the association of eccDNA with apoptosis somewhere to claim it.

      We have revised the Abstract as suggested.

      Some of the references need to be clarified. For example, Coquelle et al., 2002 described the BFB cycles and common fragile sites, but the report does not seem to be relevant to eccDNA. Mouakkad-Montoya et al., 2021 enriched eccDNA without rolling-circle amplification.

      Thanks for pointing this out. We cited Coquelle et al., 2002 to list known biogenesis mechanisms for ecDNAs but not eccDNAs. We have deleted Mouakkad-Montoya et al., 2021 in our revised manuscript, as it did not involve rolling-circle amplification.

      Reviewer #2 (Recommendations For The Authors):

      • It is not clear why the authors took 3000bp as the cutoff to divide eccDNAs into short and long categories. How many long eccDNAs in these samples?

      Henriksen et al. identified the size range of sperm eccDNAs as ~3–50 kb. We therefore used 3 kb as an arbitrary cutoff to better compare the two eccDNA populations with those reported by Henriksen et al. SPA, RST, EST and sperm cells have 278, 609, 373 and 691 eccDNAs, respectively, that are longer than 3000 bp. We have clarified this in the revised manuscript.

      • In figure 2D,2E, what is the zero point in the heatmaps? The 5', 3' end or center of eccDNA? Please make it clear in figure and main text.

      The zero point represents the center of eccDNA regions. We have clarified this in the revised manuscript.

      • In line 245, the author mentioned that "periodic distribution of nucleosomes was observed for ~360bp eccDNAs but not for ~180bp ones, indicating that eccDNAs from di-nucleosomes but not mono-nucleosomes preferentially originate from well-positioned nucleosome arrays (Figure 2E)". Please explain how to make the conclusion from the Figure 2E?

      Taking the H3K27me3-marked nucleosome as an example, vertical stripes were distributed every ~180bp for ~360bp eccDNAs, as shown by heatmap (more evident if in an enlarged view), and periodic signal distribution was apparent for ~360bp eccDNAs (Figure 2E), as shown by meta-gene analysis on top of heatmap (Figure 2B). However, such pattern was not observed for ~180bp eccDNAs. Similar results could also be observed for nucleosomes marked with other histone variants and histone modifications (H3, H3K27ac, H3K4me1, H3K9ac, H3K36me3, H3K9me3 in Figure 2E). Thus, eccDNAs from di-nucleosomes but not mono-nucleosomes preferentially originate from well-positioned nucleosome arrays in sperm.

      • In line 261, the author mentioned: "the large-sized sperm eccDNAs detected in this study also displayed weak but apparent negative correlation with gene density and Alu elements (Figure 3C and D)". However, the data didn't show the "apparent negative correlation", as only one or two data points may support this conclusion and the p-values are not even close to 0.05.

      Many thanks for pointing this out. We have toned down this statement as “the large-sized sperm eccDNAs detected in this study displayed a weak negative correlation with gene density or Alu elements (Figure 3C and D)”.

      • The enrichment of both active (H3K27ac, H3K9ac) and repressive (H3K9me3) histone markers in the original loci of eccDNA poses an intriguing question: how can this seemingly contradictory pattern be explained? In the H3K9me3 heatmap, the average level of H3K9me3 in eccDNA is lower than control's, how to interpret the result?

      We found that small-sized eccDNAs were more enriched at H3K27ac-marked euchromatin regions (Figure 2C-E and 3A), while large-sized ones were more enriched at H3K9me3-marked heterochromatin regions (Figure 3A). This is probably because heterochromatin regions are too condensed to be fragmented into smaller pieces for small-sized eccDNA formation, in comparison with euchromatin regions. We have included this information in our revised manuscript.

      H3K9me3 histone marks are enriched at repeat sequences that are widely distributed within the mouse genome. Moreover, the H3K9me3 ChIP-seq dataset we analyzed in this study had the highest number of ChIP-seq peaks, compared to ChIP-seq datasets of other histone modifications. Thus, even random control would probably have stronger ChIP-seq signals than small-sized eccDNAs (e.g., ~180bp or ~360bp eccDNAs) that were preferentially generated from active regions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thanks for your comments and suggestions concerning our manuscript entitled “miR-252 targeting temperature receptor CcTRPM to mediate the transition from summer-form to winter-form of Cacopsylla chinensis”. These comments are all of great importance and extremely helpful for revising and improving our manuscript. We have revised the manuscript carefully according to all your comments. Our point-by-point responses to the comments are listed below.

      Reviewer #1 (Recommendations For The Authors):

      1) If the authors wish to improve their phylogenetic analysis, I strongly suggest using their hemipteran sequences alongside the Drosophila homolog and at least all of the human paralogs. This should be generally sufficient to recapitulate the generally accepted TRPM phylogeny. If the authors contend that this is in fact a separate lineage from other insect TRPMs, a phylogeny that is as taxonomically inclusive as possible, and as methodologically rigorous as possible, would be ideal.

      Thanks for your great suggestion. We have redone the phylogenetic analysis in Figure S1B using the CcTRPM sequence together with 16 homologs from other species, including 8 human paralogs, 1 Mus musculus homolog, 1 Drosophila homolog, and 6 insect homologs. The related description was added in Line 489-491 and Line 1044-1049 of our revised manuscript.

      2) If the authors wish to conclude that this is a cold-sensitive ion channel, I strongly suggest repeating at least the Ca2+ imaging with a cold stimulus. In the absence of this experiment, I think that the conclusions need to be significantly softened/hedged, making it clear that the only evidence of cold sensitivity is indirect (resulting from the knockdown experiments).

      Thanks for your excellent suggestion. We have performed Ca2+ imaging with a cold stimulus of 10°C. As expected, a clear increase in Ca2+ concentration was observed upon the 10°C cold stimulus, similar to that seen with menthol treatment. We can therefore conclude that CcTRPM is a directly cold-sensitive ion channel in C. chinensis. We have also added the Ca2+ imaging result with the 10°C cold stimulus to Figure 2D and moved the Ca2+ imaging results with menthol treatment to Figure S2I. The related results and methods were added in Line 193-200, Line 919-923, and Line 1065-1069 of our revised manuscript.

      3) Lines 173 and 181: The method used to identify the putative transmembrane domains was not described (although the 3D model does have the correct TRP structure, these methodological details would be appreciated).

      Thanks for your great suggestion. We used the online tool SMART (Simple Modular Architecture Research Tool) to identify the putative transmembrane domains of CcTRPM, and have added these methodological details in Line 485-487 of the Materials and Methods of our revised manuscript.

      4) Lines 176-178: The authors state that "phylogenetic analysis revealed that CcTRPM was most closely related to the DcTRPM homologue (Diaphorina citri, XP_017299512.2), which was consistent with the evolutionary relationships predicted from the multiple alignment of amino acid sequences." The meaning of this sentence is unclear to me. I'm not sure what it means to be "consistent with the evolutionary relationships predicted from the multiple alignment of amino acid sequences."

      Thanks for your excellent suggestion. We have revised this sentence in Line 176-179 of our revised manuscript.

      5) Lines 474-475: The authors state that the NCBI database was used to identify homologous sequences, but there isn't sufficient methodological detail to repeat the search. For example, was this a BLASTP search? Was it taxonomically restricted? What statistical thresholds for homology inference were used? These details would be much appreciated.

      Thanks for your great suggestion. We used BLASTP against the NCBI database to identify homologous sequences and gave preference to representative species for which TRPM sequences have been reported. We have added more methodological detail on the phylogenetic analysis in Line 489-491 of our revised manuscript.

      6) It would be very interesting, but not critical, to know if menthol and borneol alone have an effect on cuticle thickness.

      Thanks for your excellent suggestion. We did in fact test the effect of menthol and borneol alone on cuticle thickness at the outset. At 25°C, treatment with menthol or borneol alone induced 30-40% of 1st instar nymphs to transition from summer-form to winter-form, but had only a slight effect on cuticle thickness, weaker than that of the 10°C low temperature, because of the opposing effect of 25°C. At 10°C, however, we could not determine whether the effect on cuticle thickness came from the low temperature or directly from menthol and borneol.

      7) It would be interesting, but not critical, to confirm the authors' ab initio protein folding by comparing their model to the AlphaFold2-derived model, either by folding it themselves or extracting it from the AlphaFold Protein Structure Database, if it has already been folded by DeepMind.

      Thanks for your great suggestion. We have predicted the tertiary structure of CcTRPM with AlphaFold2, and the result is shown in Author response image 1. Compared with the result in Figure 2A, the conserved ankyrin repeats (ANK) and six transmembrane domains were closely similar.

      Author response image 1.

      The tertiary structures of CcTRPM predicted with AlphaFold2 software.

      8) Figures 1F-G, 3F, 4A-B, 5G-J, S6C, and S7C-D do not plot replicates (although these are plotted in other figures).

      Thanks for your excellent suggestion. Figure 1F-G is a stacked, grouped graph to which replicates could not be added; we have added the replicates to Figures 3F, 4A-B, 5G-J, S6C, and S7C-D of our revised manuscript.

      9) Figure 5A-C, and associated text: The significance of these findings is somewhat lost on me, coming from a position of general naivety concerning chitin biosynthesis. My interpretation of Figure 5A was that each of these steps was a necessary component of chitin biosynthesis. It was thus surprising that not all of the steps were required. I think it would be exceptionally helpful if the authors spent more time describing this pathway, alternative pathways to generating the intermediate steps, and ultimately, their hypothesis of why only two steps seem critical.

      Thanks for your great suggestion. The chitin biosynthesis pathway in Figure 5A was modified from Doucet and Retnakaran, 2012. De novo biosynthesis of chitin has eight enzymatic steps: one for trehalose, two in glycolysis, four in the hexosamine pathway, and one for chitin synthesis. Glycolysis and the hexosamine pathway are two complex cellular metabolic processes. We suppose there are two reasons why not all of these steps were required: (1) the function of some enzymes may be replaced or supplemented by other enzymes; for example, hexokinase and glucokinase have similar functions. (2) The absence of obvious phenotypic defects might be caused by insufficient RNAi interference efficiency. It would therefore be worthwhile to study the functions of these chitin biosynthesis enzymes further using CRISPR-Cas9 in the future. We have added more description of this chitin biosynthesis pathway in Line 379-390 of our revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) Line 19, should be morphological transition.

      Thanks for your excellent suggestion. We have changed “behavioral transition” to “morphological transition” in Line 19 of our revised manuscript.

      2) Line 21, delete the novel.

      Thanks for your excellent suggestion. We have deleted the word “novel” in Line 21 of our revised manuscript.

      3) Fig. 2B, did authors examine the CcTRPM expression level before 3 d? Given that CcTRPM acts as a cold sensor, it is supposed to respond to temperature change quickly.

      Thanks for your excellent suggestion. We have examined the CcTRPM expression level at 1 d and 2 d after 10°C treatment compared with 25°C treatment. As expected, CcTRPM expression levels were also clearly increased at 1 d and 2 d after 10°C treatment. We have added the related results in Figure S2F and the related description in Line 184-185, Line 500, and Line 1059-1060 of our revised manuscript.

      4) Fig. 2I, from the figure legend and the text in the panel, it's hard for readers to understand what the authors intend to say. This data is important since knockdown of CcTRPM decreases the winter-form from 90% to 30% at 10℃. Provide more information in the figure legend.

      Thanks for your excellent suggestion. We have added more information in the figure legend of Figure 2I in Line 933-939 of our revised manuscript.

      5) Line 224, ...CcTRPM functions as a molecular switch to modulate the transition from .... The phrase 'molecular switch' is inappropriate because knockdown of CcTRPM partially decreases the form ratio as shown in Fig.2I instead of reversing the effect completely. So, use other words instead of 'molecular switch'.

      Thanks for your excellent suggestion. We have changed “a molecular switch” to “an essential molecular signal” in Line 225 of our revised manuscript.

      6) Fig. 4G, this data is important. It's nice to see that this data is provided.

      Thanks for your excellent suggestion. We have provided the data of Figure 4G in Table S2 of our revised manuscript.

      7) Authors showed that CcTRPM functions as a cold receptor to regulate the transition of C. chinensis from summer-form to winter-form. Does this mean that a heat receptor gene functions oppositely by transiting winter-form into summer-form? Did the authors test the function of a heat TRP in the form transition? At least, discuss this in the discussion part.

      Thanks for your excellent suggestion. The TRPV ion channel has been reported by David Julius and colleagues to function as a heat receptor in mammals (Caterina et al., 1997; Cao et al., 2013). We therefore suppose that TRPV may function as a heat receptor to induce the transition from winter-form to summer-form in C. chinensis. The related tests are ongoing. We have added two references in Line 681-686 and some discussion of the heat receptor in Line 341-345 of our revised manuscript.

      8) Line 433, which tissue was used for transmission electron microscopy?

      Thanks for your excellent suggestion. The thorax was used for transmission electron microscopy, and we have added the information in Line 448 and Line 453 of our revised manuscript.

      9) How is the conservation of miR-252? Does the regulatory role of CcTRPM and miR-252 apply to the psylla family in addition to C. chinensis?

      Thanks for your excellent suggestion. Besides C. chinensis, the summer-form and winter-form phenomenon also exists in other psylla species, such as Cyamophila willieti. Because no genomic information has been reported for most psylla species, we could not evaluate the conservation of miR-252 among psylla species. However, it would be worthwhile and interesting to clarify in the future whether the functions of TRPM and miR-252 are conserved.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Developing vaccination capable of inducing persistent antibody responses capable of broadly neutralizing HIV strains is of high importance. However, our ability to design vaccines to achieve this is limited by our relative lack of understanding of the role of T-follicular helper (Tfh) subtypes in the responses. In this report Verma et al investigate the effects of different prime and boost vaccination strategies to induce skewed Tfh responses and its relationship to antibody levels. They initially find that live-attenuated measles vaccine, known to be effective at inducing prolonged antibody responses has a significant minority of germinal center Tfh (GC-Tfh) with a Th1 phenotype (GC-Tfh1) and then explore whether a prime and boost vaccination strategy designed to induce GC-Tfh1 is effective in the context of anti-HIV vaccination. They conclude that a vaccine formulation referred to as MPLA before concluding that this is the case.

      Clarification: MPLA serves as the adjuvant, and the vaccine formulation is characterized as a Th1 formulation based on the properties of the adjuvant.

      Strengths: While there is a lot of literature on Tfh subtypes in blood, how this relates to the germinal centers is not always clear. The strength of this paper is that they use a relevant model to allow some longitudinal insight into the detailed events of the germinal center Tfh (GC-Tfh) compartment across time and how this related to antibody production.

      Weaknesses: The authors focus strongly on the numbers of GC-Tfh1 as a proportion of memory cells and their comparison to GC-Tfh17. There seems to be little consideration of the large proportion of GC-Tfh which express neither CCR6 and CXCR3 and currently no clear reasoning for excluding the majority of GC-Tfh from most analysis. There seems to be an assumption that since the MPLA vaccine has a higher number of GC-Tfh1 that this explains the higher levels of antibodies. There is not sufficient information to make it clear if the primary difference in vaccine efficacy is due to a greater proportion of GC-Tfh1 or an overall increase in GC-Tfh of which the percentage of GC-Tfh1 is relatively fixed.

      We appreciate the reviewer's comment. Indeed, while there is substantial literature on Tfh subtypes in blood, the strength of our study lies in utilizing a relevant model to provide longitudinal insights into the dynamics of the germinal center Tfh (GC-Tfh) compartment over time and its relationship to antibody production. Regarding the concern about the comprehensive analysis of GC Tfh subsets, including GC-Tfh1, GC-Tfh17, and others not expressing CCR6 and/or CXCR3, we fully acknowledge its importance. To address this, we will conduct a detailed analysis of GC Tfh and GC Tfh1 frequencies, encompassing subsets without CCR6 and CXCR3 expression, to provide a more comprehensive view of the GC-Tfh population in our analysis.

      Reviewer #2 (Public Review):

      Summary:

      Anil Verma et al. have performed prime-boost HIV vaccination to enhance HIV-1 Env antibodies in the rhesus macaque model. The authors used two different adjuvants, a cationic liposome-based adjuvant (CAF01) and a monophosphoryl lipid A (MPLA)+QS-21 adjuvant. They demonstrated that these two adjuvants promote different transcriptomes in the GC-TFH subsets. The MPLA+QS-21 adjuvant induces abundant GC TFH1 cells expressing CXCR3 at first priming, while the CAF01 adjuvant predominantly induced GC TFH1/17 cells co-expressing CXCR3 and CCR6. Both adjuvants initiate comparable Env antibody responses. However, MPLA+QS-21 shows more significant IgG1 antibodies binding to gp140 even after 30 weeks.

      The enhancement of memory responses by MPLA+QS-21 consistently associates with the emergence of GC TFH1 cells that preferentially produce IFN-γ.

      Strengths:

      The strength of this manuscript is that all experiments have been done in the rhesus macaque model with great care. This manuscript beautifully indicated that MPLA+QS-21 would be a promising adjuvant to induce the memory B cell response in the HIV vaccine.

      Weaknesses:

      The authors did not provide clear evidence to indicate the functional relevance of GC TFH1 in IgG1 class-switch and B cell memory responses.

      We appreciate the recognition of our meticulous work in the rhesus macaque model and the potential of MPLA+QS-21 as an adjuvant for HIV vaccine-induced humoral immunity. We acknowledge the need to provide clearer evidence of the functional relevance of GC Tfh1 in IgG1 class-switching and B cell memory responses. We will attempt to address this concern in our revisions.

    1. Author Response:

      We thank the editors and reviewers for their thoughtful and constructive assessment of our manuscript. In the upcoming revision process, we plan to address key concerns highlighted by the reviewers. While the bulk of our data involved the use of chemical SOD1 inhibitors, we intend to assess their on-target efficacy by measuring SOD activity after treatment. Additionally, we plan to perform key experiments to measure oxidative stress and DNA damage in SOD1-deletion cell lines to compare against the effects of chemical SOD1 inhibition. We acknowledge the lack of consideration for SOD2 and plan to explore changes in mitochondrial SOD2 expression and function in PPM1D-mutant cells at baseline and after SOD1-deletion. We will refine the text to clarify the data interpretation and elaborate on the limitations of our study in the discussion. Altogether, we thank the reviewers for their suggestions to improve our study and we hope that these additional experiments will provide additional evidence that SOD1 is a dependency in PPM1D-mutant leukemia cells.

    1. Author Response

      Reviewer #1 (Public Review):

      The current manuscript by Liu et al entitled "Discovery and biological evaluation of a potent small molecule CRM1 inhibitor for its selective ablation of extranodal NK/T cell lymphoma" reports the identification of a novel CRM1 inhibitor and shows its efficiency against extranodal natural killer/T cell lymphoma cells (ENKTL).

      This is a very timely and very original study with potential impact in a variety of pathologies not only in ENKTL. However, the main conclusions of the work are not supported by experimental evidence.

      Many thanks for your very kind words about our work. We are excited to hear that you think our manuscript is original with considerable translational impact to the field. We are grateful for your valuable time and efforts you have spent to provide your very insightful comments, which are of great help for our revision.

      The study claims that LFS-1107 reversibly inhibits the nuclear export receptor CRM1 but the authors only show that the compound binds to CRM1 and that the CRM1 substrate IκBα accumulates in the cell nucleus upon LFS-1107 treatment. The evidence is indirect and alternative scenarios are certainly possible.

      Many thanks for this critical comment. We have conducted extra experiments to demonstrate that LFS-1107 can reversibly inhibit the nuclear transport machinery mediated by CRM1. Namely, culturing the medium for two hours after LFS-1107 treatment restored the transport of IκBα from the nucleus to the cytoplasm. Please see Figure 2 -Figure Supplement 3 for more details.

      On the other hand, the manuscript is not always well-written and insufficiently referenced.

      Thanks for this critical comment. This has been fixed. We have checked through the manuscript with extensive language editing. Moreover, we have added more references to the manuscript.

      The nuclear translocation in figure 2G is not convincing. The western blot in figure 2G shows that LFS-1107 treatment induces IκBα expression, and both cytoplasmic and nuclear amounts increase in a dose-dependent manner. Together, these data do not support nuclear IκBα accumulation upon LFS-1107 treatment.

      Thanks for this critical comment. This has been fixed. We have repeated the Western blot experiments, and our results revealed that only the nuclear IκBα amount increased upon LFS-1107 treatment. In contrast, the cytoplasmic IκBα amount decreased after LFS-1107 treatment. Please see Figure 2J for more details.

      Reviewer #2 (Public Review):

      Indeed, ENKTL is a rather deadly tumor with unmet medical needs. The work is novel in the sense that they designed and identified a very potent inhibitor homing at CRM1 via a deep-reinforcement learning model to suppress the overactivation of NF-κB signaling, an underlying mechanism of ENKTL pathogenesis. The authors demonstrated that LFS-1107 binds more strongly with CRM1 (approximately 40-fold) as compared to KPT-330, an existing CRM1 inhibitor. Another merit of the small-molecule inhibitor is that LFS-1107 can selectively eliminate ENKTL cells while sparing normal blood cells. Their animal results clearly demonstrated that the small-molecule inhibitor was able to extend mouse survival and eliminate tumor cells considerably. Overall, the manuscript may provide a possible therapeutic strategy to treat ENKTL with a good safety profile. The manuscript is also well-written. The weakness of the manuscript is that some details for the design and evaluation of the small-molecular inhibitor are missing.

      We are truly grateful for your very kind words about our work. It is very encouraging to know that you think our work is relatively novel and of significance for the field. We sincerely appreciate the valuable time and kind efforts that you have spent on the thorough review of our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes the neural activity, measured by intrinsic optical imaging in reach-to-grasp, and reach-only conditions in relation to the Intra-cortical micro stimulation maps. The paper mostly describes a relatively unique and potentially useful data set. However, in the current version, no real hypotheses about the organization of M1 and PMd are tested convincingly. For example, the claim of "clustered neural activity" is not tested against any quantifiable alternative hypothesis of non-clustered activity, and support for this idea is therefore incomplete.

      The combination of intrinsic optical imaging and intra-cortical micro-stimulation of the motor system of two macaque monkeys promised to be a unique and highly interesting dataset. The experiments are carefully conducted. In the analysis and interpretation of the results, however, the paper was disappointing to me. The two main weaknesses in my mind were:

      a) The alternative hypotheses depicted in Figure 1B are not subjected to any quantifiable test. When is an activity considered to be clustered and when is it distributed? The fact that the observed actions only activate a small portion of the forelimb area (Figure 5G, H) is utterly unconvincing, as this analysis is highly threshold-dependent. Furthermore, it could be the case that the non-activated regions simply do not give a good intrinsic signal, as they are close to microvasculature (something that you actually seem to argue in Figure 6b). Until the authors can show that the other parts of the forelimb area are clearly activated for other forelimb actions (as you suggest on line 625), I believe the claim of cluster neural activity stands unsupported.

      We appreciate the reviewer’s concerns and we have made several revisions.

      (1) The two panels in Fig 1B should have been presented as potential outcomes as opposed to hypotheses in need of quantifiable testing. We revised the Introduction (line 105-111) and the Results (line 149-152) accordingly.

      (2) We agree that the thresholding procedure adopted in the original submission could have impacted the spatial measurements of cortical activity (i.e., Fig 5G-H in original submission). We have completely revised the thresholding procedure and it is now based on statistical comparisons that include all trials (instead of thresholding by number of sessions in the original submission). Thus, the thresholded maps in Fig 5G & 5J are now obtained from pixel-by-pixel comparisons (t-tests, p<1e-4) between frames acquired post-movement and frames acquired before movement. Nevertheless, even with this relatively relaxed threshold, the largest activity maps overlapped <40% of the forelimb representations.

      It is important to note that major vessels were excluded from the thresholded map and from the motor map. Thus, uncertainty about imaging in and around vessels was likely not a factor in the calculated overlap between thresholded maps and the motor map.

      (3) We agree that showing activation in other parts of the forelimb representations in response to action other than reach-to-grasp would have supported some of the arguments that we previously put forth. Unfortunately, we do not have the supporting data and obtaining it would take months/years. We have therefore expanded the Discussion to include limitations of the behavioral task (line 439-443).

      b) The most interesting part of the study (which cannot be easily replicated with human fMRI studies) is the correspondence between the evoked activity and intra-cortical stimulation maps. However, this is impeded by the subjective and low-dimensional description of the evoked movement during stimulation (mainly classifying the moving body part), and the relatively low-dimensional nature (4 conditions) of the evoked activity.

      We agree with the reviewer on all accounts. We expanded the Discussion to consider the low dimensionality of the motor maps and the behavioral task (line 439-449).

      Measuring cortical activity in a variety of motor tasks would likely have provided additional insight about movement-related cortical activity. Nevertheless, including additional tasks, even if it were possible to do so in the same monkeys, would have delayed study completion by months/years. The hidden challenge of the experimental design is that each monkey is trained to not move for many seconds to minimize contamination of ISOI signals. For example, from trial initiation to Go Cue, the monkey must hold its hand in the start position for 5 seconds. Similarly, after movement completion, the monkey must hold its hand in the start position for another 5 seconds. In between successful trials, a monkey must wait for ~12 seconds before it can initiate a new trial. These durations are >1 order of magnitude longer than in electrophysiological studies in comparable tasks. Achieving consistent task performance with the long durations used here, took months of daily training. Moreover, our monkeys typically run out of steam after ~60-70 min of working on the task. This forces us to limit the overall number of task conditions tested in a session, to obtain a large enough number of trials from each condition.

      c) Many details about the statistical analysis remain unclear and seem not well motivated.

      We address the reviewer’s specific concerns.

      Reviewer #2 (Public Review):

      Chehade and Gharbawie investigated motor and premotor cortex in macaque monkeys performing grasping and reaching tasks. They used intrinsic signal optical imaging (ISOI) covering an exceedingly large field-of-view extending from the IPS to the PS. They compared reaching and fine/power-grip grasping ISOI maps with "motor" maps which they obtained using extensive intracranial microstimulation. The grasping/reaching-induced activity activated relatively isolated portions of M1 and PMd, and did not cover the entire ICM-induced 'motor' maps of the upper limbs. The authors suggest that small subzones exist in M1 and PMd that are preferentially activated by different types of forelimb actions. In general, the authors address an important topic. The results are not only highly relevant for increasing our basic understanding of the functional architecture of the motor-premotor cortex and how it represents different types of forelimb actions, but also for the development of brain-machine interfaces. These are challenging experiments to perform and add to the existing yet complementary electrophysiology, fMRI, and optical imaging experiments that have been performed on this topic - due to the high sensitivity and large coverage of the particular ISOI methods employed by the authors. The manuscript is generally well written and the analyses seem overall adequate - but see below for some additional analyses that should be done. Although I'm generally enthusiastic about this manuscript, there are two major issues that should be clarified. These major questions relate mainly to potential thresholding issues and clustering issues.

      Major:

      1) The main claim of the authors is that specific forelimb actions activate only a small fraction of what they call the motor map (i.e., those parts of M1/PMd that evoke muscle contractions upon ICM). The action-related activity is measured by ISOI. When looking at the 'raw' reflectance maps, it is rather clear that relatively wide portions of the exposed cortex are activated by grasping/reaching, especially at later time points after the action. In fact, another reading of the results may be that there are two zones of 'deactivation' that split a large swath of motor-premotor cortex being activated by the grasping/reaching actions. (e.g. at 6 seconds after the cue in Fig 3A, 5A). At first sight, the 'deactivated' regions seem to be located in the cortex representing the trunk/shoulder/face - hence regions not necessarily activated (or only weakly) during the grasping/reaching actions. If true, this means that most of the relevant M1/PMd cortex IS activated during the latter actions - opposing the 'clustering' claims of the authors. This raises the question of whether the 'granularity' claimed by the authors is

      a. threshold dependent. In this context, the authors should provide an analysis whereby 'granularity' is shown independent of statistical thresholds of the ISOI maps.

      We appreciate the reviewer’s concerns and have completely revised the analyses central to Fig 5. We believe that the figure now contains evidence from both thresholded and unthresholded ISOI data in support of limited spatial extent of cortical activation (i.e., “granularity” in the reviewer’s comments).

      For evidence from unthresholded ISOI data, we examined reflectance change time courses from different size ROIs (line 764-768). (A) Small circular ROIs (0.4 mm radius), which we placed in the M1 hand, M1 arm, and PMd arm zones (Fig 5B). (B) Large ROI inclusive of the M1 and PMd forelimb representations (Fig 5B). We reasoned that if cortical activity is spatially widespread, then the small and large ROIs would report similar time courses. In contrast, if cortical activity is spatially focal, then activity would be detected in the small ROI time courses but would be washed out in the large ROI time courses. Our results support the second possibility (Fig 5C-F). Thus, in the movement conditions, time courses from the small ROIs had a large negative peak after movement completion (Fig 5C-E). In contrast, the characteristic negative peak was absent in the time courses obtained from the large ROI (Fig 5F).
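      The small-versus-large ROI reasoning above can be illustrated with a toy calculation. The sketch below is not the analysis code used in the study: the grid size, ROI radii in pixels, the Gaussian-shaped time course, and all function names are assumptions chosen only to show why a focal signal survives averaging over a small ROI but is diluted in a large one.

```python
import math

def circular_roi(center, radius, shape):
    """Return the set of (row, col) pixels within `radius` pixels of `center`."""
    rows, cols = shape
    cr, cc = center
    return {(r, c) for r in range(rows) for c in range(cols)
            if math.hypot(r - cr, c - cc) <= radius}

def roi_time_course(frames, roi):
    """Mean reflectance change inside `roi`, one value per frame."""
    return [sum(frame[r][c] for r, c in roi) / len(roi) for frame in frames]

def synthetic_frames(shape, n_frames, focus, focus_radius, peak_frame):
    """Toy data (hypothetical): a focal negative reflectance change that
    peaks at `peak_frame`; every pixel outside the focus stays at zero."""
    rows, cols = shape
    active = circular_roi(focus, focus_radius, shape)
    frames = []
    for t in range(n_frames):
        gain = -math.exp(-((t - peak_frame) ** 2) / 8.0)
        frames.append([[gain if (r, c) in active else 0.0
                        for c in range(cols)] for r in range(rows)])
    return frames

shape = (40, 40)
frames = synthetic_frames(shape, n_frames=20, focus=(10, 10),
                          focus_radius=3, peak_frame=14)

# Small ROI centred on the focus vs. a large ROI spanning the whole field.
small_roi = circular_roi((10, 10), 2, shape)
large_roi = {(r, c) for r in range(shape[0]) for c in range(shape[1])}

small_tc = roi_time_course(frames, small_roi)
large_tc = roi_time_course(frames, large_roi)

print("small ROI negative peak:", min(small_tc))
print("large ROI negative peak:", min(large_tc))
```

      In this toy case the small ROI reports the full negative peak, whereas the large ROI averages the same focal signal with many silent pixels and the peak nearly vanishes, which is the pattern described for Fig 5C-F.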

      Separately, we revised our thresholding approach to make those results less sensitive to thresholding effects (more details in our response to the first major point from Reviewer 1). The revised results – thresholded/binarized maps – are consistent with focal cortical activity. Fig 5G & 5J show activity maps thresholded (t-test, p<0.0001) without correction for multiple comparisons, and therefore represent the least restrictive estimate of the spatial extent of cortical activity. Measurements from these maps showed that significantly active pixels overlapped <40% of the M1 & PMd forelimb representations. We interpret the thresholded results as evidence in support of focal cortical activity.
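      A pixel-by-pixel threshold followed by an overlap measurement could look roughly like the sketch below. Several things here are assumptions rather than details from the response: the test is taken to be paired across trials, the critical value uses a large-sample normal approximation to the t distribution (which is somewhat lenient for small trial counts), vessel exclusion is not modeled, and the toy data, grid layout, and function names are hypothetical.

```python
import math
import random
import statistics

def paired_t(pre, post):
    """t statistic for the mean of the paired differences (post - pre)."""
    diffs = [b - a for a, b in zip(pre, post)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))

def active_pixels(pre_trials, post_trials, pixels, alpha=1e-4):
    """Pixels whose pre/post difference exceeds the two-sided threshold.
    Assumption: normal approximation to the t critical value, and no
    correction for multiple comparisons."""
    crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    active = set()
    for p in pixels:
        t = paired_t([trial[p] for trial in pre_trials],
                     [trial[p] for trial in post_trials])
        if abs(t) > crit:
            active.add(p)
    return active

def overlap_fraction(active, motor_map):
    """Fraction of the motor-map pixels that are significantly active."""
    return len(active & motor_map) / len(motor_map)

# Toy data (hypothetical): 20x20 field, motor map = top half of the field,
# true activity = a 5x5 block with a negative reflectance change after movement.
rng = random.Random(0)
pixels = [(r, c) for r in range(20) for c in range(20)]
motor_map = {(r, c) for r in range(10) for c in range(20)}
focus = {(r, c) for r in range(3, 8) for c in range(3, 8)}

n_trials = 40
pre_trials = [{p: rng.gauss(0.0, 1.0) for p in pixels} for _ in range(n_trials)]
post_trials = [{p: rng.gauss(-3.0 if p in focus else 0.0, 1.0) for p in pixels}
               for _ in range(n_trials)]

active = active_pixels(pre_trials, post_trials, pixels)
frac = overlap_fraction(active, motor_map)
print(f"active pixels overlap {frac:.1%} of the motor map")
```

      Because the threshold is uncorrected, a few isolated false-positive pixels are possible, which is why such a map can be read as a lenient upper estimate of spatial extent.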

      This raises the question of whether the 'granularity' claimed by the authors is

      b. dependent on the time-point one assesses the maps. Given the sluggish hemodynamic responses, it is unclear which part of the ISOI maps conveys the most information relative to the cue and arm/hand movements. I suspect that timepoints > 6 s will reveal even larger 'homogeneous' activations compared to the maps < 6s.

      We agree with the reviewer that the lag in hemodynamic signals complicates frame selection. Nevertheless, it is unlikely that cortical activity maps would have been larger at time points >6s from Cue. We provide three supporting arguments.

      (1) In the imaging sessions used in Fig 4, we acquired images for 9s per trial and systematically varied Cue onset time. The time courses in Fig 4A-B show that for all Cue onset conditions, the negative peak occurred <6s from Cue. This observation from unthresholded results does not support the notion of greater cortical activity at time points >6s from Cue.

      (2) From the same experiment, Fig 4C shows 9 thresholded/binarized maps generated from different time points in relation to Cue. We measured the size of each map (i.e., overlap with the M1/PMd forelimb representations). We present the results in Author response image 1. The largest maps came from an average frame captured +5.8-6.0s from Cue. Those maps are on the diagonal in Fig 4E (top left to bottom right). This result from thresholded data therefore does not support the notion of greater cortical activity at time points >6s from Cue.

      Author response image 1.

      (3) In all other sessions, we acquired images for 7s per trial (-1.0 to +6.0 s from Cue) without varying Cue onset time. At every time point (100 ms), we measured the size of the thresholded/binarized map in relation to the size of the M1 and PMd forelimb representations. The results are presented in Fig 5I & 5L and indicate that thresholded maps plateau in size by 5.0-5.5 s from Cue. At peak size, the maps overlapped <50% of the M1 and PMd forelimb representations. This result indicates that it is unlikely that we underreported the size of activity maps by not measuring map size beyond 6s from Cue.
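
      The map-size measurement described in point (3) amounts to computing, at each time point, the fraction of a reference representation covered by significantly active pixels. A minimal sketch follows, assuming pixels are stored as coordinate sets; the pixel layouts and time points are invented for illustration and are not data from the study.

```python
# Toy sketch (not the authors' pipeline): size of a thresholded map over time,
# expressed as the fraction of a reference representation that it overlaps.

def overlap_fraction(active_pixels, representation_pixels):
    """Fraction of the representation covered by significantly active pixels."""
    return len(active_pixels & representation_pixels) / len(representation_pixels)

# Hypothetical forelimb representation: a 10 x 10 block of pixel coordinates.
representation = {(r, c) for r in range(10) for c in range(10)}

# Hypothetical thresholded maps at successive time points (seconds from Cue).
maps_by_time = {
    3.0: {(r, c) for r in range(2) for c in range(5)},  # 10 pixels
    4.0: {(r, c) for r in range(4) for c in range(5)},  # 20 pixels
    5.0: {(r, c) for r in range(8) for c in range(5)},  # 40 pixels
    6.0: {(r, c) for r in range(8) for c in range(5)},  # 40 pixels (plateau)
}

fractions = {t: overlap_fraction(m, representation) for t, m in maps_by_time.items()}
for t in sorted(fractions):
    print(f"{t:.1f} s from Cue: {fractions[t]:.0%} of representation active")
```

      A plateau appears in such a series when consecutive time points return the same fraction, which is the pattern the response describes for the later frames.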

      In fact, Fig 5F (which is highly thresholded) shows a surprisingly good match between the different forelimb actions, which argues against the existence of small subzones that are preferentially activated by different types of forelimb actions -the main claim of the authors.

      Our original proposal should have been more clearly stated. We were proposing that the thresholded maps, which had similar spatial organizations across conditions as the reviewer suggested, reported on subzones tuned for reach-to-grasp actions. Adjacent to those subzones could be other subzones that are preferentially active during other types of forelimb actions (e.g., pulling, pushing, grooming). We could not test this possibility in our study because the behavioral task examined a narrow range of arm and hand actions. We therefore revised the Discussion to state the limitations of our task and to lean more on published work that supports the present proposal (439-443 and 504-508).

      2) Related to the previous point, the ROI selections/definitions for the time course analyses seem highly arbitrary. As indicated in the introduction, the clustering hypothesis dictates that "an arm function would be concentrated in subzones of the motor arm zones. Neural activity in adjacent subzones would be tuned for other arm functions." To test this hypothesis directly in a straightforward manner, the authors could use the results from the ICM experiment to construct independent ROIs and to evaluate the ISOI responses for the different actions. In that case, the authors could do a straightforward ANOVA (if the data permits parametric analyses) with ROI, action, and time point (and possibly subject) as factors.

      We agree with the reviewer, and we now leverage the ICMS map for guiding ROI placement. All time courses are now derived from 1 of 2 types of ROIs. (1) Small ROIs (0.4 mm radius) placed in zones defined from ICMS (e.g., M1 hand zone). (2) Large ROIs that include the entire forelimb representations in M1 or in PMd (Fig 5B).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper evaluates the effect of knocking out CST7 (Cystatin F) on the APPNL-G-F Alzheimer's disease mouse model. They found sexually dimorphic outcomes, with differential transcriptional responses, increased phagocytosis (but interestingly a higher plaque burden) in females and suppressed inflammatory microglial activation in males (but interestingly no change in plaque burden). This study offers new insight into the functional role of CST7 that is upregulated in a subset of disease-associated microglia in AD models and human brain. Despite the discovery of disease-associated microglia several years ago, there has been little effort in understanding the function of the different genes that make up this profile, making this paper especially timely. Overall, the experiments are well-controlled and the data support the main conclusions and the manuscript could be strengthened by addressing the below comments and clarifying questions that could impact the interpretation of their data/findings.

      1) In the first section discussing CST7 expression levels in AD models, it would be good to involve a discussion of levels of CST7 change in human AD samples. There are sufficient available datasets to look at this, and it would help us understand how comparable the animal models are to human patients. For example, while in mice CST7 is highly enriched in microglia/macrophages, in human datasets it seems like it is not quite so specific to microglia - it is equally expressed in endothelial cells. This might have a significant impact on the interpretation of the data, and it would be good to introduce and assess the findings in mice through the human subjects lens. There is a discussion of the human data in the discussion section, but it would be more appropriately assessed in the same way as the mouse data and comparatively presented in the results section. The authors could also include the data from Gerrits et al. 2021 in their first figure.

      We agree with the reviewer on the importance of considering the work in the context of human disease. While CST7 is not as strongly upregulated in human AD brain as it is in mouse, expression is observed predominantly in myeloid cells in the brain, with very minimal expression detected in endothelial cells (see screenshots in Author response image 1 from the Brain Myeloid Landscape platform, http://research-pub.gene.com/BrainMyeloidLandscape/BrainMyeloidLandscape2/), and is enriched in AD clusters vs homeostatic clusters in scRNASeq studies (Gerrits et al., 2021). We attempted immunostaining for human CF (CST7) in AD brains to assess expression and co-localisation with microglial markers but failed to validate any of the antibodies tested. Additionally, King et al., 2023 (PMID: 36547260) recently showed an increase in CST7 expression in bulk hippocampal RNASeq in AD vs mid-life controls, suggesting an ageing/AD mechanism. CST7 has also been shown to be expressed following overexpression of TREM2 in human microglia in vitro, and siRNA-mediated knockdown of its expression leads to an increase in phagocytosis (Popescu et al., 2023 - PMID: 36480007), mirroring our data and suggesting a conserved role in human cells. Overall, we believe that, even in the context of mouse models, the understanding of the function of genes upregulated in disease is of importance to the field and that this study paves the way for further work investigating human CST7 in disease. We have added this (with citations to the datasets mentioned) to the discussion (highlighted).

      Author response image 1

      2) The differential RNAseq data is perhaps one of the most striking results of this paper; however it is difficult to see exactly how similar the male v female APPNL-G-F profiles are, in addition to the genes shared or not between the KO condition. Venn diagrams, in addition to statistical tests, would enhance this part of the paper and add more clarity.

      We have added Venn diagrams showing DEGs between male and female AppNL-G-F microglia vs WT control to illustrate how similar the male and female APPNL-G-F profiles are. Additionally, to exemplify the Cst7KO-Sex interaction, we have added a Venn diagram showing DEGs between male and female AppNL-G-F microglia vs. AppNL-G-FCst7-/- microglia (Fig. 2 – Fig. supplement 3). We confirm we have derived all differential gene expression changes reported (including those represented in the Venn diagrams) using appropriate Padj statistical approaches (see Methods).

      3) A major argument in the paper is a continuation of Sala-Frigerio 2019 which says that the female phenotype is an acceleration of the male phenotype. Does this mean that if males were assessed at later timepoints, they would be more similar to the females? Or are there intrinsic differences that never resolve? It would be helpful to see a later timepoint for males to get at the difference between these two options.

      This is an interesting question and, while we acknowledge that addressing it empirically with a later timepoint could add insight, we believe it would actually need multiple closely-spaced timepoints, as choosing which single later timepoint would be optimal is difficult to judge (and likely not possible at all) for the reasons below. We also believe that data already published, combined with our observations, show it is most likely a cell-intrinsic effect that explains our sex-specific differences.

      First, we emphasize that the previously published acceleration of the microglial phenotype in female AppNL-G-F mice is fairly subtle and relative rather than absolute, e.g. the DAM/ARM microglia state represents ~50% of all microglia in males and ~55% of all microglia in females at 12 months old; therefore both sexes have similarly abundant microglia in the state that most highly expresses Cst7. Indeed, after the age at which DAM/ARM state microglia appear in appreciable numbers (~ 6 months), females and males both have an abundance of them. It is important to note that a 12-month male is far more “progressed” than a 6-month female, hence the stepped age effect is temporally short.

      Second, Cst7 deletion in the AppNL-G-F mice condition caused qualitative differences affecting distinct genes and/or overlapping genes moving in different directions between female and male mice - if a stepped age effect explained sex differences from Cst7 deletion, given that it could only be stepped by a very short timeframe (several weeks maximum) from reasoning above, we would expect to see similar qualitative changes but of different magnitude in female and male mice arising from Cst7 deletion; this is not the pattern we see.

      Third, beyond 12 months old, regression from ARM/DAM actually occurs, again making it unlikely males would “catch up” with females to show the same profile from Cst7 deletion but just at an older age – practically, this also complicates choosing a single later timepoint (and age-related systemic morbidity emerges as a potential confounder as well).

      In summary, while the acceleration of the DAM signature in female microglia offers an intriguing possible explanation for our observation of sexual dimorphism in response to deletion of one of the key genes in this signature, we believe it more likely that intrinsic effects are responsible for the sex-related impact of Cst7 deletion. Taking the alternative perspective, even if a stepped age effect in the underlying progression of the model could explain our findings, this would need multiple timepoints with short gaps between them (e.g. monthly at 12, 13, 14, 15 months old) to provide the temporal resolution to expose this pattern; we would not have the resources to conduct such a resource-intensive and lengthy study. We hope this reasoning appears logical. Conscious of the importance of conveying this in our manuscript, we have revised the Discussion to capture, as concisely as possible, some of the key points outlined above.

      4) If the central argument is that CST7 in females decreases phagocytosis and in males increases microglia activation, are there changes in amyloid plaque burden or structure in the APPNL-G-F /CST 7 KO mice compared to APPNL-G-F/CST7 WT that reflect these changes? Please address. If not, how does this affect the functional interpretation of differential expression observed in phagocytic/reactive microglia genes? Pieces of this are discussed but it could be clearer.

      We emphasise the data already presented in Fig 6 and Fig. 6 – Fig. Supplement 2 showing altered Aβ burden (6E10 staining) and plaque count (MeX04) but no change in plaque area. Regarding the functional interpretation of Cst7-dependent gene changes in microglia beyond the endolysosomal function we present in figures 3-5, we have included additional data using simple immunohistochemistry, as suggested by the reviewer, to assess synapse abundance. We show loss of Sy38 coverage around plaques (Fig. 6I) and a moderate but significant decrease in coverage between AppNL-G-F/Cst7-/- vs AppNL-G-F brains only in females (Fig. 6J). This reflects the effect observed with plaque coverage whereby we observe increased burden in AppNL-G-F/Cst7-/- vs AppNL-G-F females but not males (Fig. 6B-F) suggesting the increased plaque burden in Cst7-/- female mice may lead to increased synapse loss. We would also emphasise that altered expression of phagolysosomal genes could affect disease in ways beyond interactions with amyloid and synapses.

      5) It is confusing that increased phagocytosis in the APPNL-G-F/CST7 KO females leads to greater plaque burden, considering proteolysis is not affected. What might explain this observation? Additionally, it is interesting that suppression of microglial activation doesn't lead to an increase in plaques in the male APPNL-G-F/CST7 KO mice. How does the profile of phagocytic microglia in the male APPNL-G-F/CST7 KO mice differ from the APPNL-G-F males?

      We emphasize our comments on this topic in the discussion, where we speculate that the greater plaque burden in females is linked to increased uptake of Aβ (which we observe in Fig. 4B&C) and deposition into plaques, as suggested by Huang et al., 2021 (PMID: 33859405), d’Errico et al., 2022 (PMID: 34811521) and Shabestari et al., 2022 (PMID: 35705056). Regarding the lack of effect in males despite the suppression of inflammatory genes, we agree this is a curious observation, although it may point to as yet ill-defined mechanisms for how inflammatory pathways influence plaque pathology. Unfortunately, we were not able to specifically compare the profile of phagocytic microglia in AppNL-G-F vs AppNL-G-FCst7-/- as we did not perform single-cell RNASeq. However, our bulk RNASeq profiling suggests modest downregulation of phagocytic/endolysosomal genes (e.g. Lilrb4a, Fig. 2I) and reduced expression of LAMP2 in microglia by immunostaining. We have added further comment on this in the discussion.

      6) Seems that the authors have potentially discovered an unusual mechanism for how CST7 could regulate cell autonomous function without impacting its canonical protease target. The authors deal with this extensively in the discussion but an ELISA or ICC to localize CST7 to microglia in vitro or in vivo would help address this point.

      We have added FISH data localising Cst7 expression to IBA1+ cells specifically around plaques in App brains (Fig. 1B-E). We agree that assessing the subcellular localisation and any non-microglial expression of Cystatin-F (the protein coded by Cst7) would offer valuable insight into the protease target and may reveal details on the precise mechanism by which CF deletion leads to the phenotype we observe in this study. However, despite attempting numerous commercially available and gifted antibodies to detect CF, we were unable to validate (using Cst7-/- as controls) any methods other than FISH.

      7) The authors focus on plaques in their final figure, however dysregulated microglial phagocytosis could impact many other aspects of brain health. Simple immunohistochemistry for synapses and myelin/oligodendrocytes (especially given the results of the in vitro phagocytosis assay) could provide more insight here.

      We fully agree with the reviewer. As also outlined in our responses elsewhere, phagocytic changes could have multiple consequences, and we have included additional data using immunohistochemistry as advised for synapses in WT, AppNL-G-F, and AppNL-G-F/Cst7-/- brains. We show loss of Sy38 coverage around plaques (Fig. 6I) and a moderate but significant decrease in coverage between AppNL-G-F/Cst7-/- vs AppNL-G-F brains only in females (Fig. 6J). This reflects the effect observed with plaque coverage whereby we observe increased burden in AppNL-G-F/Cst7-/- vs AppNL-G-F females but not males (Fig. 6B-F) suggesting the increased plaque burden in Cst7-/- female mice may lead to increased synapse loss.

      We also performed immunohistochemistry for myelin markers MAG and MBP but found no plaque-associated pathology. Finally, we searched for dystrophic neurites using LAMP1 but found that the antibody stained microglial lysosomes rather than dystrophic neurites in this model (see Author response image 2), an observation that has been made by others (Sharoar et al., 2021 - PMID: 34215298).

      Overall, our data suggest Cst7 may play a protective role in females, limiting phagocytosis, reducing plaque burden and blunting synapse loss.

      Author response image 2.

      Reviewer #3 (Public Review):

      In this manuscript, Daniels et al explored the role of Cystatin F in an Aβ-driven mouse model of Alzheimer's disease. By crossing a constitutive knockout mouse lacking the gene that encodes Cystatin F, Cst7, to the AppNL-G-F mouse line, the authors describe impairments in microglial gene expression and phagocytic function that emerge more prominently in females versus males lacking Cst7. A strength of the study is its focus: given mounting evidence that microglia are a hub of neurological dysfunction with particular potential to trigger or exacerbate neurodegenerative disorders, it is essential to determine the changes in microglia that occur pathologically to promote disease progression. Similarly, the wide-spread identification of the gene in question, Cst7, as upregulated in AD models makes this gene a good target for mechanistic studies.

      The paper in its current form also has several weaknesses which limit the insights derived, weaknesses that are largely related to the experimental tools and approaches chosen by the authors to test their hypotheses. For example, the paper begins with a figure replotting data from previous studies showing that Cst7 is upregulated in mouse models of Alzheimer's disease. Though relevant to the current study, there are no new insights provided here. Next, the authors perform bulk RNA-sequencing on microglia isolated from male and female mice in the Cst7-/-; AppNL-G-F mouse line. In the methods, it is unclear whether the authors took precautions to preserve the endogenous transcriptional state of these cells given evidence that microglia can acquire a DAM-like signature simply due to the process of dissociation (Marsh et al, Nature Neuroscience, 2022). If the authors did not control for this, their results may not support the conclusions they draw from the data. Relatedly, it appears the authors pooled all microglia together here, instead of just isolating DAMs specifically or analyzing microglia at single-cell resolution, which could reveal the heterogeneous nature of the role of Cst7 in microglia. In addition to losing information about heterogeneity, another concern is that they could be diluting out the major effects of the model on microglial function by including all microglia. Overall, the biggest issue I have with the RNA-sequencing data is the lack of validation of the gene expression changes identified using a different method that does not require dissociation, like immunohistochemistry or fluorescence in situ hybridization. Especially given the limited number of genes they found to be mis-regulated (see Fig. 2 E and G), I worry that these changes might simply be noise, especially since the authors provide no further evidence of their mis-regulation. Without further validation, the data presented are not sufficient to support the authors' claims.

      We believe we have addressed this comment in the “Essential Revisions (for the authors)” section above. Please see again below:

      We took standard precautions to minimise the risk of aberrant ex vivo cell activation, including maintaining cells on ice during non-enzyme steps of the procedure and carrying out preps in small batches to minimise time taken from removal of brain to purification of microglial RNA. Importantly, we also validated key expression data by in situ methods such as RNA FISH for Cst7 and Lilrb4a (Fig. 1B-E, Fig 2. - Fig. supplement 3) thus eliminating dissection-induced effects. Additionally, when performing qPCR on microglia from non-disease mice to test the disease-specific role of Cst7-dependent gene regulation we did not observe the same gene changes (Fig 2. - Fig. supplement 4) which, if such changes were dependent on tissue dissociation, we would expect to observe in WT or disease animals. We utilised the resources provided by Marsh et al. 2022 to search for overlap between enzyme-induced genes and our DEG lists from our key comparisons. We found the enzyme-induced gene set had very minimal overlap with any of our comparisons with overlap of only 4 genes between enzyme-induced genes and Cst7-dependent genes in males and no overlap between enzyme-induced genes and Cst7-dependent genes in females. We would further point out that the disease-induced microglial RNAseq profile in the AppNL-G-F Cst7+/+ (i.e. disease WT) condition mirrors those observed previously by multiple methods including in situ profiling (Zeng et al 2023 - PMID: 36732642) and RiboTag approaches (Kang et al 2018 - PMID: 30082275). We believe these combined approaches provide convincing validation of the RNAseq data.
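
      The overlap check against the dissociation-induced gene set described above is, in essence, a set intersection between a reference gene list and each DEG list. The sketch below shows that comparison under stated assumptions: the gene identifiers are placeholders (not genes from Marsh et al. 2022 or from this study), and `overlap_report` is a hypothetical helper name.

```python
# Minimal sketch of a gene-set overlap check. All gene names are placeholders,
# not genes from the cited resource or from the study's DEG lists.

def overlap_report(reference_set, deg_lists):
    """Return, for each named DEG list, the sorted genes shared with the reference set."""
    return {name: sorted(reference_set & genes) for name, genes in deg_lists.items()}

# Hypothetical dissociation/enzyme-induced gene set.
enzyme_induced = {"GeneA", "GeneB", "GeneC"}

# Hypothetical Cst7-dependent DEG lists for each sex.
deg_lists = {
    "male": {"GeneB", "GeneX"},
    "female": {"GeneY", "GeneZ"},
}

report = overlap_report(enzyme_induced, deg_lists)
for name, shared in report.items():
    print(f"{name}: {len(shared)} shared gene(s) {shared}")
```

      A small or empty intersection, as in the female placeholder list here, is the kind of result that argues against the DEGs being driven by the dissociation procedure.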

      In assessing the changes in microglial function and Aβ pathology that occur in males and females of the Cst7-/-; AppNL-G-F line, the authors identify some differences between how females and males are affected by the loss of Cst7. While the statistical analyses the authors perform as given in the figure legends appear to be correct, the plots do not show significant changes between males and females for a given parameter. Take for example Figure 3H. Loss of Cst7 decreases IBA+Lamp+ microglia in males but increases this parameter in females. However, it does not appear that there is a significant difference in IBA+Lamp+ microglia in male versus female mice lacking Cst7. If there is no absolute difference between males and females, can the differential effects of Cst7 knockout on the sexes really be so relevant to the sexual dimorphism observed in the disease? I question this connection, but perhaps a greater discussion of what the result might mean by the authors would be helpful for placing this into context.

      We understand the reviewer’s perspective and we agree that the interpretations could be presented and explained better in the text - we have updated the discussion as suggested to address this.

      We designed our study initially to search for sex-specific effects of Cst7. Therefore, whilst our ANOVA does include main effects analysis for disease or sex, we carried out post-hoc analysis primarily to investigate effects of Cst7 deletion within sex. In the case of Fig. 3H pointed out by the reviewer, we observe a main effect for disease in the ANOVA and for disease-sex interaction but not for sex. Post-hoc analysis revealed the sex-specific effects of Cst7 we describe in the manuscript. This approach to analysis was also taken by Hoghooghi et al. (2020 - PMID: 33027652), who show that the related pathway gene Cstc is detrimental in EAE in females but not males (included in the discussion in this manuscript). The observation in Fig. 3H that there appears to be a Cst7 effect in males and females but not a sex effect in Cst7-/- is accurate but a relative anomaly in this study. Generally, we find that, alongside Cst7 deletion affecting females differently to males, we also see a sex effect in Cst7-/- animals but not in Cst7+/+ animals, i.e. absolute levels in the disease condition as well as relative changes from control to disease condition are different between males and females. This is exemplified in Fig. 4B&C where we observe increased microglial Aβ in female Cst7-/- animals vs male Cst7-/- animals and in Fig. 6D where we observe increased Aβ plaque burden in female Cst7-/- animals vs male Cst7-/- animals. This is most strikingly demonstrated in the case of our RNASeq data where we observe a difference in sex-dependent genes in AppNL-G-F vs AppNL-G-F/Cst7-/- (Fig. 2 – Fig. supplement 3B), implying that removal of the Cst7 gene led to an ‘unlocking’ of sexual dimorphism in our cohort, which we comment on in the discussion.

      Finally, the use of in vitro assays of microglial function can be helpful as secondary analyses when coupled with in vivo or ex vivo approaches, but are not on their own sufficient to support the authors' conclusions. Quantitative engulfment assays (see Schafer et al, Neuron, 2012) on brain tissue showing that male and female microglia lacking Cst7 engulf different amounts of material (e.g. plaques, synapses, myelin) in the intact brain would be more convincing.

      We agree that in vitro assays for microglial function are not always sufficient as standalone methods to support conclusions on functions in disease. The reviewer may have missed our in vivo MeX04 uptake assays (Fig 4A-D), which use measurements by flow cytometry on isolated microglia. These measurements reflect microglial uptake in vivo following MeX04 injection pre-mortem, and this experiment showed increased microglial Aβ in female Cst7-/- animals vs male Cst7-/- animals (Fig. 4B&C). Our in vitro assays complement and extend insight in ways not possible in vivo; for example, they offer key insight into uptake/degradation kinetics that would be extremely challenging to obtain in vivo.

      In general, a major limitation to the insights that can be derived in the study is the decision of the authors to perform all experiments at a single late-stage time point of 12 months of age. As this is quite far into disease progression for many AD models, phenotypic changes identified by the authors could arise due to the downstream effects of plaque deposition and therefore may not implicate Cst7 as a mechanism driving neurodegeneration rather than one of many inflammatory changes that accompany AD mouse models nearing the one-year time point. A related problem is that the study uses a constitutive KO mouse that has lacked Cst7 expression throughout life, not just during disease processes that increase with aging. In summary, the topic of the article is important and timely, but the connection between the data and the authors' conclusions is not as strong as it could be.

      As described above, Cst7 expression is absent at steady-state and low until 6-12 months. Therefore, we predict that deletion would have little effect until 12+ months whereby cells expressing Cst7 have had the temporal window to affect disease pathology, as we find in the current study. This was a key part of the reasoning in our choice of the 12-month age for analyses. The negligible expression of Cst7 at baseline/early stages of disease suggests constitutive KO of the gene will not impact the phenotype until disease onset. This is substantiated by the lack of any genotype-related differences in the WT vs Cst7-/- comparisons in the non-disease condition.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents an interesting data set from historic Western Eurasia and North Africa. Overall, I commend the authors for presenting a comprehensive paper that focuses the data analysis of a large project on the major points, and that is easy to follow and well-written. Thus, I have no major comments on how the data was generated, or is presented. Paradoxically, historical periods are undersampled for ancient DNA, and so I think this data will be useful. The presentation is clever in that it focuses on a few interesting cases that highlight the breadth of the data.

      The analysis is likewise innovative, with a focus on detecting "outliers" that are atypical for the genetic context where they were found. This is mainly achieved by using PCA and qpAdm, established tools, in a novel way. Here I do have some concerns about technical aspects, where I think some additional work could greatly strengthen the major claims made, and lay out if and how the analysis framework presented here could be applied in other work.

      clustering analysis

      I have trouble following what exactly is going on here (particularly since the cited Fernandes et al. paper is also very ambiguous about what exactly is done, and doesn't provide a validation of this method). My understanding is the following: the goal is to test whether a pair of individuals (let's call them I1 and I2) are indistinguishable from each other, when we compare them to a set of reference populations. Formally, this is done by testing whether all statistics of the form F4(Ref_i, Ref_j; I1, I2) = 0, i.e. the difference between I1 and I2 is orthogonal to the space of reference populations, or that you test whether I1 and I2 project to the same point in the space of reference populations (which should be a subset of the PCA-space). Is this true? If so, I think it could be very helpful if you added a technical description of what precisely is done, and some validation on how well this framework works.
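
      The statistic the reviewer refers to can be written down directly: F4(Ref_i, Ref_j; I1, I2) is the average over SNPs of the product of the allele-frequency difference between the two references and the difference between the two individuals. The sketch below computes only that point estimate on invented allele frequencies. It is an illustration of the definition, not the qpWave/qpAdm procedure, which additionally estimates standard errors (typically by block jackknife) and tests all such statistics jointly.

```python
# Point estimate of an f4 statistic from per-SNP allele frequencies.
# All frequencies below are invented for illustration.

def f4(ref_i, ref_j, ind1, ind2):
    """Average over SNPs of (p_i - p_j) * (p_1 - p_2)."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(ref_i, ref_j, ind1, ind2)]
    return sum(terms) / len(terms)

# Hypothetical reference population allele frequencies at four SNPs.
ref_i = [0.1, 0.5, 0.9, 0.3]
ref_j = [0.2, 0.4, 0.7, 0.3]

# Hypothetical pseudo-haploid calls (0 or 1) for three individuals.
ind_a = [0, 1, 1, 0]
ind_b = [0, 1, 1, 0]  # identical to ind_a at every SNP
ind_c = [1, 1, 0, 0]  # differs from ind_a at two SNPs

same_pair = f4(ref_i, ref_j, ind_a, ind_b)
diff_pair = f4(ref_i, ref_j, ind_a, ind_c)
print("f4 for identical individuals:", same_pair)
print("f4 for differing individuals:", diff_pair)
```

      When the two individuals carry identical genotypes every term vanishes, which is the limiting case of the "indistinguishable relative to the references" hypothesis described in the comment.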

      We agree that the previous description of our workflow was lacking, and have substantially improved the description of the entire pipeline (Methods, section “Modeling ancestry and identifying outliers using qpAdm”), making it clearer and more descriptive. To further improve clarity, we have also unified our use of methodology and replaced all mentions of “qpWave” with “qpAdm”. In the reworked Methods section mentioned above, we added a discussion on how these tests are equivalent in certain settings, and describe which test we are exactly doing for our pairwise individual comparisons, as well as for all other qpAdm tests downstream of cluster discovery. In addition, we now include an additional appendix document (Appendix 4) which, for each region, shows the results from our individual-based qpAdm analysis and clustering in the form of heatmaps, in addition to showing the clusters projected into PC space.

      An independent concern is the transformation from p-values to distances. I am in particular worried about i) biases due to potentially different numbers of SNPs in different samples and ii) whether the resulting matrix is actually a sensible distance matrix (e.g. additive and satisfies the triangle inequality). To me, a summary that doesn't depend on data quality, like the F2-distance in the reference space (i.e. the sum of all F4-statistics, or an orthogonalized version thereof) would be easier to interpret. At the very least, it would be nice to show some intermediate results of this clustering step on at least a subset of the data, so that the reader can verify that the qpWave-statistics and their resulting p-values make sense.

      We agree that calling the matrix generated from p-values a “distance matrix” is a misnomer, as it does not satisfy the triangle inequality, for example. We still believe that our clustering generates sensible results, as UPGMA simply allows us to project a positive, symmetric matrix to a tree, which we can then use, given some cut-off, to define clusters. To make this distinction clear, we now refer to the resulting matrix as a “dissimilarity matrix” instead. As mentioned above, we now also include a supplementary figure for each region visualizing the clustering results.
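
      The clustering step described above can be sketched as average-linkage (UPGMA) agglomeration on a symmetric dissimilarity matrix, stopped at a cut-off. The example below is a minimal pure-Python illustration, not the pipeline used in the study: the p-values are invented, and both the transformation `1 - p` and the cut-off of 0.5 are assumptions made only for this example, since the response does not specify them here.

```python
from itertools import combinations

def upgma_clusters(dissim, cutoff):
    """Average-linkage (UPGMA) agglomeration that stops merging at `cutoff`.

    dissim: symmetric matrix (list of lists) of non-negative dissimilarities.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dissim))]

    def average_link(a, b):
        # Mean of all pairwise dissimilarities between members of a and b.
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), best = min(
            (((a, b), average_link(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > cutoff:
            break  # remaining clusters are too dissimilar to merge
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)

# Invented pairwise p-values for four individuals (high p = not distinguishable).
p_values = [
    [1.0,   0.8,   0.001, 0.001],
    [0.8,   1.0,   0.001, 0.001],
    [0.001, 0.001, 1.0,   0.6],
    [0.001, 0.001, 0.6,   1.0],
]

# Assumed transformation from p-values to dissimilarities (1 - p).
dissimilarity = [[1.0 - p for p in row] for row in p_values]

clusters = upgma_clusters(dissimilarity, cutoff=0.5)
print(clusters)
```

      Because UPGMA only needs a symmetric, non-negative matrix, it runs on such input even when the triangle inequality fails, which is why the term "dissimilarity matrix" is the more accurate label.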

      Regarding the concerns about p-values conflating both signal and power, we employ a stringent minimum SNP coverage filter for these analyses to avoid extremely-low coverage samples being separated out (min. SNPs covered: 100,000). In addition, we now show that cluster size and downstream outlier status do not depend on SNP coverage (Figure 2 - Suppl. 3).

      The methodological concerns lead me to some questions about the data analysis. For example, in Fig2, Supp 2, very commonly outliers lie right on top of a projected cluster. To my understanding, apart from using a different reference set, the approach using qpWave is equivalent to using a PCA-based clustering and so I would expect very high concordance between the approaches. One possibility could be that the differences are only visible on higher PCs, but since that data is not displayed, the reader is left wondering. I think it would be very helpful to present a more detailed analysis for some of these "surprising" clustering where the PCA disagrees with the clustering so that suspicions that e.g. low-coverage samples might be separated out more often could be laid to rest.

      To reduce the risk of artifactual clusters resulting from our pipeline, we devised a set of QC metrics (described in detail below) on the individuals and clusters we identified as outliers. Driven by these metrics, we implemented some changes to our outlier detection pipeline that we now describe in substantially more detail in the Methods (see comment above). Since the pipeline involves running many thousands of qpAdm analyses, it is difficult to manually check every step for all samples – instead, we focused our QC efforts on the outliers identified at the end of the pipeline. To assess outlier quality we used the following metrics, in addition to manual inspection:

      First, for an individual identified as an outlier at the end of the pipeline, we check its fraction of non-rejected hypotheses across all comparisons within a region. The rationale here is that by definition, an outlier shouldn’t cluster with many other samples within its region, so a majority of hypotheses should be rejected (corresponding to gray and yellow regions in the heatmaps, Appendix 4). Through our improvements to the pipeline, the fraction of non-rejected hypotheses was reduced from an average of 5.3% (median 1.1%) to an average of 3.8% (median 0.6%), while going from 107 to 111 outliers across all regions.
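
      The first metric can be sketched as follows. This is a simplified illustration rather than the pipeline's code; the function name `non_rejected_fraction`, the significance threshold `alpha`, and the example p-values are assumptions.

```python
def non_rejected_fraction(p_values, alpha=0.01):
    """Fraction of pairwise tests for one individual that are NOT rejected.

    p_values: p-values from comparing the individual against every other
              sample in its region (hypothetical input layout)
    alpha:    rejection threshold (assumed value, not taken from the Methods)
    """
    if not p_values:
        raise ValueError("need at least one comparison")
    return sum(p >= alpha for p in p_values) / len(p_values)


# A plausible outlier: only one of four comparisons is not rejected
fraction = non_rejected_fraction([0.5, 0.0001, 0.002, 0.00001])
```

      A low value is what is expected of a genuine outlier, since it should fail to cluster with most other samples in its region.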

      Second, we wanted to make sure that outlier status was not affected by the inclusion of pre-historic individuals in our clustering step within regions. To represent majority ancestries that might have been present in a region in the past, we included Bronze and Copper Age individuals in the clustering analysis. We found that including these individuals in the pairwise analysis and clustering improved the clusters overall. However, to ensure that their inclusion did not bias the downstream identification of outliers, we also recalculated the clustering without these individuals. We inspected whether an individual identified as an outlier would be part of a majority cluster in the absence of Bronze and Copper Age individuals, which was not the case (see also the updated Methods section for more details on how we handle time periods within regions).

      In response to the “surprising” outliers based on the PCA visualizations in Figure 2, Supplement 2: with our updated outlier pipeline, some of these have disappeared, for example in Western and Northern Europe. However, in some regions the phenomenon remains. We are confident this isn’t a coverage effect, as we’ve compared the coverage between outliers and non-outliers across all clusters (see previous comment, Figure 2 - Suppl. 3), as well as specifically for “surprising” outliers compared to contemporary non-outliers – none of which showed any differences in the coverage distributions of “surprising” outliers (Author response images 1 and 2). In addition, we believe that the quality metrics we outline above were helpful in minimizing artifactual associations of samples with clusters, which could influence their downstream outlier status. As such, we think it is likely that the qpAdm analysis does detect a real difference between these sets of samples, even though they project close to each other in PCA space. This could be the result of an actual biological difference hidden from PCA by the differences in reference space (see also the reply to the following comment). Still, we cannot fully rule out the possibility of latent technical biases that we were not able to account for, so we do not claim the outlier pipeline is fully devoid of false positives. Nevertheless, we believe our pipeline is helpful in uncovering true, recent, long-range dispersers in a high-throughput and automated manner, which is necessary to glean this type of insight from hundreds of samples across a dozen different regions.

      Author response image 1.

      SNP coverage comparison between outliers and non-outliers in region-period pairings with “surprising” outliers (t-test p-value: 0.242).

      Author response image 2.

      PCA projection (left) and SNP coverage comparison (right) for “surprising” outliers and surrounding non-outliers in Italy_IRLA.

      One way the presentation could be improved would be to be more consistent in what a suitable reference data set is. The PCAs (Fig2, S1 and S2, and Fig6) argue that it makes most sense to present ancient data relative to present-day genetic variation, but the qpWave and qpAdm analysis compare the historic data to that of older populations. Granted, this is a common issue with ancient DNA papers, but the advantage of using a consistent reference data set is that the analyses become directly comparable, and the reader wouldn't have to wonder whether any discrepancies in the two ways of presenting the data are just due to the reference set.

      While it is true that some of the discrepancies are difficult to interpret, we believe that both views of the data are valuable and provide complementary insights. We considered three aspects in our decision to use both reference spaces: (1) conventions in the field (including making the results accessible to others), (2) interpretability, and (3) technical rigor.

      Projecting historical genomes into the present-day PCA space allows for a convenient visualization that is common in the field of ancient DNA and exhibits an established connection to geographic space that is easy to interpret. This is true especially for more recent ancient and historical genomes, as spatial population structure approaches that of present day. However, there are two challenges: (1) a two-dimensional representation of a fairly high-dimensional ancestry space necessarily incurs some amount of information loss and (2) we know that some axes of genetic variation are not well-represented by the present-day PCA space. This is evident, for example, by projecting our qpAdm reference populations into the present-day PCA, where some ancestries which we know to be quite differentiated project closely together (Author response image 3). Despite this limitation, we continue to use the PCA representation as it is well resolved for visualization and maximizes geographical correspondence across Eurasia.

      On the other hand, the qpAdm reference space (used in clustering and outlier detection) has higher resolution to distinguish ancestries by more comprehensively capturing the fairly high-dimensional space of different ancestries. This includes many ancestries that are not well resolved in the present-day PCA space, yet are relevant to our sample set, for example distinguishing Iranian Neolithic ancestry against ancestries from further into central and east Asia, as well as distinguishing between North African and Middle Eastern ancestries (Author response image 3).

      To investigate the differences between these two reference spaces, we chose pairwise outgroup-f3 statistics (to Mbuti) as a pairwise similarity metric representing the reference space of f-statistics and qpAdm in a way that’s minimally affected by population-specific drift. We related this similarity measure to the Euclidean distance on the first two PCs between the same set of populations (Author response image 4). This analysis shows that while there is almost a linear correspondence between these pairwise measures for some populations, other comparisons fall off the diagonal in a manner consistent with the PCA projection (Author response image 3), where samples are close together in PCA but not very similar according to outgroup-f3. Taken together, these analyses highlight the non-equivalence of the two reference spaces.
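
      The two pairwise measures being compared can be sketched as below. This is a deliberately naive illustration: the function names and the toy allele frequencies are assumptions, and the f3 estimator shown omits the finite-sample bias correction and block-jackknife standard errors that standard f-statistics software applies.

```python
import math


def naive_outgroup_f3(freq_out, freq_a, freq_b):
    """Naive outgroup-f3: mean over SNPs of (o - a) * (o - b).

    Inputs are per-SNP allele frequencies in the outgroup (e.g. Mbuti) and
    in populations A and B. Larger values mean more drift shared by A and B
    since their split from the outgroup, i.e. higher similarity.
    """
    if not (len(freq_out) == len(freq_a) == len(freq_b)):
        raise ValueError("frequency vectors must have equal length")
    products = [(o - a) * (o - b) for o, a, b in zip(freq_out, freq_a, freq_b)]
    return sum(products) / len(products)


def pc_distance(pc_a, pc_b):
    """Euclidean distance between two samples on the first two PCs."""
    return math.dist(pc_a[:2], pc_b[:2])


# Toy values (hypothetical): two SNPs, identical populations A and B
f3 = naive_outgroup_f3([0.0, 1.0], [0.5, 0.5], [0.5, 0.5])
dist = pc_distance((0.0, 0.0), (3.0, 4.0))
```

      Pairs with a small `pc_distance` but a low outgroup-f3 are the ones that fall off the diagonal in Author response image 4.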

      In addition, we chose to base our analysis pipeline on the f-statistics framework to (1) afford us a more principled framework to disentangle ancestries among samples and clusters within and across regions (using 1-component vs. 2-component models of admixture), while (2) keeping a consistent, representative reference set for all analyses that were part of the primary pipeline. Meanwhile, we still use the present-day PCA space for interpretable visualization.

      Author response image 3.

      Projection of qpAdm reference population individuals into present-day PCA.

      Author response image 4.

      Comparison of pairwise PCA projection distance to outgroup-f3 similarity across all qpAdm reference population individuals. PCA projection distance was calculated as the euclidean distance on the first two principal components. Outgroup-f3 statistics were calculated relative to Mbuti, which is itself also a qpAdm reference population. Both panels show the same data, but each point is colored by either of the two reference populations involved in the pairwise comparison.

      PCA over time

      It is a very interesting observation that the Fst-vs distance curve does not appear to change after the bronze age. However, I wonder if the comparison of the PCA to the projection could be solidified. In particular, it is not obvious to me how to compare Fig 6 B and C, since the data in C is projected onto that in Fig B, and so we are viewing the historic samples in the context of the present-day ones. Thus, to me, this suggests that ancient samples are most closely related to the folks that contribute to present-day people that roughly live in the same geographic location, at least for the middle east, north Africa and the Baltics, the three regions where the projections are well resolved. Ideally, it would be nice to have independent PCAs (something F-stats based, or using probabilistic PCA or some other framework that allows for missingness). Alternatively, it could be helpful to quantify the similarity and projection error.

      The fact that historical period individuals are “most closely related to the folks that contribute to present-day people that roughly live in the same geographic location” is exactly the point we were hoping to make with Figures 6 B and C. We do realize, however, that the fact that one set of samples is projected into the PC space established by the other may suggest that this is an obvious result. To make it more clear that it is not, we added an additional panel to Figure 6, which shows pre-historical samples projected into the present-day PC space. This figure shows that pre-historical individuals project all across the PCA space and often outside of present-day diversity, with degraded correlation of geographic location and projection location (see also Author response image 5). This illustrates the contrast we were hoping to communicate, where projection locations of historical individuals start to “settle” close to present-day individuals from similar geographic locations, especially in contrast with pre-historic individuals.

      Author response image 5.

      Comparing geographic distance to PCA distance between pairs of historical and pre-historical individuals matched by geographic space. For each historical period individual we selected the closest pre-historical individual by geographic distance in an effort to match the distributions of pairwise geographic distance across the two time periods (left). For these distributions of individuals matched by geographic distance, we then queried the euclidean distance between their projection locations in the first two principal components (right).
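
      The matching step can be sketched as below. It assumes great-circle (haversine) distance on a spherical Earth and matching with replacement; neither detail is specified above, and all names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_nearest(historical, prehistoric):
    """Map each historical individual to its geographically closest
    pre-historical individual (matching with replacement).

    Both arguments are dicts: individual id -> (lat, lon).
    """
    return {
        hid: min(prehistoric, key=lambda pid: haversine_km(*hcoord, *prehistoric[pid]))
        for hid, hcoord in historical.items()
    }


# Hypothetical sample locations
historical = {"h1": (41.9, 12.5)}
prehistoric = {"p1": (45.5, 9.2), "p2": (30.0, 31.2)}
matches = match_nearest(historical, prehistoric)
```

      With the matched pairs in hand, the Euclidean distance between their projections on the first two principal components gives the quantity shown in the right panel.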

    1. Author Response

      Reviewer #1 (Public Review):

      “The authors use hM4Di to "silence" Fos-tagged neurons in the basal forebrain, but they have not validated the efficiency or the possible various effects of this reagent.

      It is possible that hM4Di actually has a relatively small effect on suppressing the AP activity of neurons. Nevertheless, hM4Di might still be an effective manipulation, because it was shown to additionally reduce transmitter release at the nerve terminal (see e.g. Stachniak et al. (Sternson) 2014, Neuron). Thus, the authors should evaluate in control experiments whether hM4Di expression plus CNO actually electrically silences the AP-firing of ChAT neurons in the BF as they seem to suggest, and/or if it reduces ACh release at the terminals. For example, one experiment to test the latter would be to perfuse CNO locally in the BLA; after expressing hM4Di in the cholinergic neurons of the BF. At the very least, the assumed action of hM4Di, and the possible caveats in the interpretation of these results should be discussed in the paper.”

      We find that activation of hM4Di with clozapine in basal forebrain cholinergic neurons results in clear alterations to neuronal activation in projection targets and in behavior (Figure 3, Figure 3-Supplement 1, Figure 5, Figure 5-Supplement 1, Figure 5-Supplement 2, Figure 6-Supplement 1 and Figure 8). Previous studies demonstrated that activation of hM3Dq or hM4Di in cholinergic neurons results in changes to electrical activity and behavioral response (Zhang et al. 2017 & Jin et al. 2019). Though we are unable to distinguish whether the effects on behavior in our experiments are a result of decreases in ACh release at terminals, inhibition of action potential firing, or both, our behavioral findings are consistent with demonstrations that inhibition of basal forebrain cholinergic neurons can alter behavior. See Page 17 Lines 488-493 for a discussion.

      “The names of brain areas like "NBM/SIp" and "VP-SIa" need to be better introduced, and somehow contextualized (in the Introduction, and also at first reading in the Results).”

      We agree that our prior presentation of these regions was confusing and in general the boundaries of these regions are not well-defined in the field. We have included a description of anatomical landmarks and bregma coordinates to clarify our definitions of the regions NBM/SIp (Page 4 Line 103-104) and VP/SIa (Page 4 Line 107-108).

      “Figure 3C: Application of CNO on the memory recall day leads to a strong reduction in CS-driven freezing. However, in this experiment, and also in Fig. S7, the pre-tone value of freezing is also strongly reduced. This would indicate that the activity of NBM/SIp cells (or else, ACh-release from these cells - see also Major point 1), also influences contextual learning. The authors should, first, statistically, test these effects (I am not sure this was done). If these differences are significant, a possible role of ACh in contextual fear learning should be discussed. Has it been shown before whether ACh is involved in contextual fear learning? Does this indicate the involvement of another target area of ACh neurons (e.g., the hippocampus?).”

      We statistically compared the pre-tone freezing response between Sham and hM4Di groups across our experiments and found no significant differences in pre-tone freezing between the groups (Figure 3D- Sham vs. ADCD-hM4Di, Pre-tone p=0.3544; Figure 5B- Sham vs. hM4Di, Pre-tone p=0.0679; Figure 5C- Sham vs. hM4Di, Pre-tone p=0.0966; Figure 5-Supplement 2A- Sham vs. hM4Di, Pre-tone p>0.99). These comparisons can also be reviewed in the statistical reporting table uploaded along with the manuscript.

      “The discussion could be improved by better comparing what they found, to the wider literature. For example, previous papers studying other neuromodulatory systems found evidence for a modulation of neuromodulator release after learning, e.g. see Martins and Froemke 2015 Nat. Neuroscience for the noradrenergic system, Tang et al. (Schneggenburger lab) 2020 J. Neuroscience for the dopaminergic system and fear learning; and Uematsu et al., 2017, Nat. Neuroscience for the noradrenergic system and fear learning. Maybe the authors could include these and similar references when revising their discussion to take into account a broader view of previous findings related to other neuromodulatory systems.”

      Our study joins the growing body of literature demonstrating stimulus-encoding and rapid stimulus-contingent responses in various neuromodulatory systems in learning and memory recall. We have now added a substantial discussion, detailing both the similarities and differences between our findings and those found in the dopaminergic, serotonergic, noradrenergic, and oxytocinergic systems in fear learning. See Pages 20-21 Lines 575-605.

      Reviewer 2 (Public Review):

      “Throughout the paper, the authors use comparisons of cell activity between groups to address questions about projection-specific and cue-specific cell activation and reactivation. However, statistical comparisons are sometimes done between biological replicates (e.g. Fig. 5A), whereas a lot of them are done between technical replicates (e.g. Fig. 2B, 5B, 7B). Adding statistics that compare biological replicates would help increase confidence in the results.”

      We have replotted our data as a comparison of biological replicate (by individual animal) in new versions of Figures 1-8, and Figure 1-Supplements 1-3, Figure 5-Supplements 1 & 2, Figure 6-Supplements 1 & 2, Figure 7-Supplement 1, and Figure 8-Supplement 1. Correspondingly, all statistical analyses have been conducted comparing biological replicates. To note, these changes have not changed the overall conclusions of each figure. The sample size, statistical test and p-values for our comparisons are included in the figure legends and in the newly included statistical reporting table.

      "To demonstrate engram-like specificity, in figure 4C the authors show fold change in cholinergic reactivation in low and high responders (animals that show low and high defensive freezing upon cue presentation) as normalized by cell activity while sitting in the home cage. However, the authors also collected a better control for this comparison, which is shown in figure S4, where the animals were exposed to an unconditioned tone cue. Comparing fold change to this tone-alone condition would provide stronger evidence for the authors' point, as this would directly compare the specificity of cholinergic reactivation to a conditioned vs an unconditioned cue. A discussion of the same comparison is relevant for figure 2 (and is shown in figure S4) but is not mentioned in the text.”

      We have evaluated the cholinergic response to the tone using GRABACh3.0 as a readout of ACh release in the BLA, and using IEG expression as a readout of cholinergic neuron activation. We find no significant increase in ACh release in the BLA in response to tone presentation (Figure 1C-left, 1D-left) and no significant increase in tone-associated reactivation of cholinergic neurons (using IEG as a readout, 2C/D, Figure 1-Supplement 2, Figure 1-Supplement 3, Figure 6-Supplement 1A) unless the tone has been previously paired with a foot shock (see Figure 1C-right, 2C, 3D). In addition, we find no statistically significant differences between home cage and tone alone conditions (Figure 2C – home cage-home cage condition vs. tone-tone condition, p=0.5012). Based on these analyses, we use the home cage group as our control group for comparison.

      “The significant correlation between cue-evoked percent change in defensive freezing from pretone and fold change in cholinergic cell activity relative to the home cage that is shown in figure 4D is somewhat confusing. Is the correlation considering all the points shown (high and low responders as depicted by black and grey points)? It's first reported as one correlation but then is discussed as two populations that have different results. Further, is the average amount of reactivation for the home-cage controls used here the same denominator for each reported animal? Similarly to the point above, a correlation looking at fold change from tone-alone would also be helpful to determine the degree to which cholinergic reactivation is specific to threat-association learning versus the more general attentional component that this system is known for.”

      We have substantially modified this figure, now new Figure 6, to clarify our point. Along with this revision, we have removed the correlation plots and corresponding analyses from the revised version of the manuscript and figures.

      Figure 6 now begins with behavior data from a distinct cohort of mice outlining our criteria for high vs. low responders (Figure 6A/B). In Figure 6C, conducted in a separate cohort of mice that only underwent behavioral testing to clarify the definition of high vs. low responders, we note via schematic that ADCD labeling was carried out during the recall session (unlike Figure 2). In panel D, we show fold change of activated cholinergic neurons stratified by High vs. Low responder status. This fold change is normalized to the average activation from the home cage control animals in each experimental cohort. Taken together we find animals with a ~2 fold increase in activation of cholinergic neurons display significant, distinguishable freezing in response to the tone as compared to pretone freezing. We find that this cluster of activated neurons is segregated to the anterior NBM/SIp (Figure 6E).
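
      The normalization used in panel D can be sketched as follows. This is a minimal illustration; the function name and the example counts are hypothetical and do not correspond to measured data.

```python
from statistics import mean


def fold_change(values, control_values):
    """Express each animal's activated-cell measure as a fold change over
    the mean of the home cage controls from the same experimental cohort."""
    baseline = mean(control_values)
    if baseline == 0:
        raise ValueError("control mean is zero; fold change is undefined")
    return [v / baseline for v in values]


# Hypothetical cohort: two conditioned animals, two home cage controls
fc = fold_change([20, 10], [8, 12])
```

      Because the denominator is the cohort-level control mean, every animal within a cohort shares the same baseline.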

      Regarding whether cholinergic reactivation reflects a general response to the tone (attention) rather than learning: in Figure 1-Supplement 3, we evaluate ACh release and behavioral response in mice that were exposed to three shocks alone (no tone) on day 1 and then exposed to a single (novel) tone on day 2. In these mice we find no significant change in ACh release in the BLA in response to tone, and no significant increase in freezing behavior in response to the tone. In Figure 2D, we evaluate reactivation of cholinergic neurons in a similar context and find that this group does not significantly differ from the home cage → home cage group. Further, we show that this home cage group does not significantly differ from Low Responders. As such, we find significant reactivation of cholinergic neurons in animals with increased responsiveness to the CS tone during the recall session (High Responders).

      “The compelling argument of this paper is that the authors are separating out the general attention role typically attributed to the cholinergic system from a more specific, engram-based role. Given the importance of untangling this, it would useful to see the recorded traces and behavioral scoring for the data shown in figure S2B. For example, was the higher slope in the recorded cholinergic response during unconditioned tone 1 also accompanied by an increase in freezing, which later went away with additional non-reinforced tones? Given that the animals were not habituated to tones (according to the Methods), this activity could be related to a habituation/general attention response, which may then be weaker than the learned response.”

      We include individual traces of GRABACh3.0 release in the BLA in response to the unconditioned tone from a protocol with 3x tone presentation on Day 1 and tone presentation on Day 2 (Figure 1-Supplement 2C). We have also included average + SEM traces for the entire duration of the tone presentation for the three unconditioned tones in this paradigm along with an inset showing 1s before and after tone onset (Figure 1-Supplement 2D). Finally, we include individual traces of GRABACh3.0 release in the BLA in response to the first (naïve) tone from mice that underwent the training (tone + shock) followed by recall (tone) paradigm in Figure 1-Supplement 4C, left. None of the unconditioned tone responses were statistically significantly different from the preceding baseline. Instead, we find the learned response is significantly higher than the response baseline (Figure 1D).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors used MD simulations to investigate the role of N-terminal myristoylation and the presence of two SH domains on the allosteric regulation of c-Abl kinase. Standard established MD simulation methods and analyses were applied, including the force distribution analysis (FDA) method developed by Grater et al. some time ago.

      The system is large and the conformational changes are complicated. In light of this, and aggravated by the fact that direct comparison with - and critical testing against - experimental data is not possible in the present case, I consider the overall simulation times to be rather short (several repeats, but only 500 ns). So there might be statistical convergence issues. Especially also because at least some of the starting structures were generated from available experimental structures after some modifications/modelling, and they might thus be out of equilibrium and need some time to fully relax during the MD simulations.

      Unfortunately, I cannot find any convergence tests concerning the length of the simulations, which are usually considered to be standard analyses (Appendix Fig. 5 shows the effect of different thermostats and capping of the peptide chain, but no tests concerning simulation time). This could be critical in the present case, where the authors acknowledge themselves (e.g., on p. 4) that there are only subtle differences between the different simulation systems and the variations within a given system are larger than the relevant (putative) differences between systems (Fig. 1 C, D, E).

      We thank the reviewer for taking the time to critically assess our manuscript. We appreciate the concerns raised and have addressed them as follows. We have quadrupled the simulation time to 2 µs for 20 out of the 30 replicates and show the updated results for these. We refer the reviewer to the modified Fig. 2 and 3 (former Fig. 1 and 2) with the updated data. Our main conclusions remained unchanged, namely that Myr unbinding shifts the overall kinase domain dynamics towards an active state. We furthermore still observe allosteric signal propagation from the Myr binding site to the active site along the alpha_F helix and a collaborative effect of Myr and the SH domains. Only some minor points were not confirmed after analyzing the longer simulations, for example the force differences transmitted to the A-loop upon SH domain binding/unbinding (former Fig. 2D), and changes in amplitude of N- and C-lobe opening upon Myr unbinding (former Fig. 1E). Furthermore, to demonstrate convergence, we added block and autocorrelation analyses for Fig. 1 (now Fig. 2) to Fig. 2 – fig supplement 3, and observed good convergence across all systems. Finally, we also increased simulation times of the umbrella sampling from 50 ns to 200 ns, again without changes to the quantitative trends or to our conclusions (see also next point).
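
      For readers unfamiliar with block analysis, the idea can be sketched as follows. This is a generic illustration of block averaging, not the analysis script used for the figure supplements; the function name and the toy series are assumptions.

```python
import math
from statistics import mean, stdev


def block_standard_error(series, n_blocks):
    """Block averaging: split a time series into `n_blocks` contiguous
    blocks and estimate the standard error of the mean from the spread of
    the block means. If the estimate stops growing as blocks get longer,
    the blocks are effectively uncorrelated and sampling has converged.
    """
    block_len = len(series) // n_blocks
    if n_blocks < 2 or block_len < 1:
        raise ValueError("need at least 2 blocks with at least 1 frame each")
    block_means = [
        mean(series[k * block_len:(k + 1) * block_len]) for k in range(n_blocks)
    ]
    return mean(block_means), stdev(block_means) / math.sqrt(n_blocks)


# Toy observable sampled at four frames (hypothetical values)
avg, sem = block_standard_error([1.0, 1.0, 3.0, 3.0], n_blocks=2)
```

      In practice the calculation is repeated for a range of block lengths, and the plateau of the standard error is taken as the converged estimate.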

      Issues with statistical convergence are expected not only for the standard MD simulations but also for the umbrella sampling simulations, as 50 ns sampling per window is nowadays not considered state of the art and is likely insufficient for quantitative binding free energy calculation, especially for membranes (see, e.g., DOI 10.1021/ct200316w). However, worrying about this latter aspect might neither be useful nor needed, because in our view the statement that myristoyl groups can bind to the membrane and that they can compete with binding in the hydrophobic protein pocket can hardly be considered a surprise and would not have required any simulation at all in my view because the experimental K_D values are available (Table 1). The very unfavourable K_d values for unbinding of Myr from both the hydrophobic protein pocket as well as from the membrane in fact show that this is not how it is expected to work in reality. The fully solvated state will be avoided due to its high free energy. Instead, isn't the myristoyl expected to directly transition from the pocket into the membrane, after membrane binding of the kinase in a proper orientation?

      The experimental values were determined with different methods: zeta potential measurements in the case of the membrane, and calorimetry in the case of Abl, where only the kinase domain rather than the SH3-SH2-kinase complex was considered. We thus found it appropriate to perform Umbrella Sampling simulations to ensure comparability. Additionally, these allowed us to study the effects of different alpha_I helix conformations, which had a significant impact on the free energy of Myr unbinding: Abl with a partially unfolded helix reflected the experimental energy better than the crystal structure with a kinked helix. We highlight this more explicitly in the corresponding Discussion section. Regarding the simulation time per sampling window, we did a block analysis (Fig. 5 – fig supplement 1) as suggested in the cited reference and also extended the time of each sampling window from 50 ns to 200 ns. This did not significantly alter the results and, importantly, the relative differences between Abl and the membrane stayed the same and are in good agreement with the experimental values.
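
      Comparing a simulated free energy with an experimental dissociation constant relies on the standard relation ΔG° = RT ln(K_D / c°), with c° = 1 M. The short sketch below applies it; the function name is hypothetical, and the 1 µM input is an arbitrary illustration, not a value from Table 1.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant
    given in mol/L, using a 1 M standard state: dG = R * T * ln(K_D / 1 M).
    Negative values indicate favourable binding."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


# Arbitrary illustrative input: K_D = 1 micromolar at 298.15 K
dg = binding_free_energy(1e-6)
```

      A tenfold change in K_D shifts ΔG° by about 1.4 kcal/mol at room temperature, which gives a sense of scale when comparing the pocket and the membrane.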

      Concerning the metadynamics simulations, these are usually done to obtain a free energy landscape. Why was this not attempted here? In the present case, the authors seemed to have used metadynamics only for generating starting structures, with different degrees of helicity of the alpha_I part, for subsequent standard MD simulations. Not surprisingly, nothing much happened during the latter, and conformers with kinked/partially unfolded alpha_I as well as conformers with straight alpha_I were both found to be "stable", at least on the short simulation time scale. It could also not be expected that the SH domain would spontaneously detach in response to helix straightening - again, this would require much longer simulation times than 500 ns. Nevertheless, alpha_I straightening might very well reduce the binding affinity towards SH - this can only be explicitly studied with free energy simulations, however.

      Our main goal was indeed to obtain different alpha_I helix conformations for subsequent Umbrella Sampling simulations, and we found that helix formation is in principle possible without SH2 domain unbinding. We would like to emphasize the impact of the different helix conformations on the free energy of Myr unbinding, which further highlights the need to investigate these structures. We chose Metadynamics to obtain them because it only facilitates the transition away from the kinked conformation without biasing towards certain end structures or transition pathways, which we found advantageous compared to alternative methods such as targeted MD. The reason for not reporting a free energy surface is that we considered the helicity of all seven residues making up the kink within a single CV, which smeared the energy landscape to the point that it is almost completely flattened. Furthermore, orthogonal CVs such as new interactions between the alpha_I helix and the SH2 domain or positional adjustments of the SH2 domain would have to be considered for a reliable quantitative result. We nevertheless observed transient SH2 domain unbinding during the applied time scale and added histograms to Fig. 4 – fig supplement 1 (former appendix Fig. 4) to make this more obvious.

      Reviewer #2 (Public Review):

      The manuscript aims at understanding how the fatty acid ligand MYR inhibits the activity of Abl kinase. Despite a wealth of structural and biochemical data, a key mechanistic understanding of how MYR binding could inactivate Abl was missing.

      The authors used equilibrium and enhanced molecular dynamics (MD) simulations to masterfully answer open questions left by extensive experimental data in the mechanistic understanding of this system. The authors took advantage of several state-of-the-art simulation techniques and carefully planned simulations to extract a coherent understanding from a wealth of experimental facts.

      The manuscript convincingly identifies an allosteric regulation by MYR. Allostery is often a source of confusion and sometimes is used as a magic catch-it-all explanation for poorly understood phenomena. Here, the authors show very compelling evidence of the existence of an allosteric mechanism. Also, they identify the physical origin of the allosteric pathway, providing a clear mechanistic understanding at the residue-level resolution. This is an impressive achievement.

      We thank the reviewer for appreciating our work and its significance for understanding Abl regulation.

      By leaving a pocket in the protein, MYR enables the protein's activation. But MYR is a highly hydrophobic molecule surrounded by water. Where could it go rather than quickly binding back to the protein pocket? By asking this reasonable question, the authors propose an exciting mechanistic hypothesis. The physical proximity of Abl kinase to a cellular membrane could lead to a competition between the protein and the membrane for MYR, leading to a novel layer of regulation for this kinase. Free energy calculations performed by the authors show that this hypothesis is reasonable from the thermodynamic point of view.

      From a broader perspective, this manuscript is an important contribution to the discussion of four outstanding topics. 1) Myristoylation is an example of lipidation, a post-translational modification where an acyl chain is covalently linked to a protein. The role of post-translational modifications has been greatly underappreciated and underinvestigated in the MD community. However, as all the work on SARS-CoV-2 and this contribution show, post-translational modifications can be crucial to understanding function. Ignoring them could lead to severely biased results. 2) The debate on the nature of allostery is still raging. Some authors claim that looking for a residue-level mechanistic chain of events that explains the allosteric action does not make sense and that the only way of thinking about allostery is as a sudden global change of the conformational landscape. Here, the authors show that instead, it is possible and leads to an essential understanding. 3) The authors hypothesize a novel crosstalk between the Abl and cellular membranes mediated by MYR. This exciting and far-reaching hypothesis opens the door to new complex layers of regulation. I suspect that these crosstalks between cytosolic proteins, or the soluble domain of membrane-tethered proteins and membranes, are much more ubiquitous than what has been appreciated so far. 4) From a methodological point of view, this manuscript represents a masterful use of simulations to put existing experimental data in a coherent picture. It is an example of the use of MD simulations at its best, where the simulations make sense of experiments, integrate existing data into a unified picture, and lead to new hypotheses that can be tested in future experiments.

      We thoroughly appreciate the reviewer's positive feedback and the valuable suggestions for improvement below.

      It would be superb if the authors could propose precise predictions that could inspire future experiments. Now that they present a residue-resolution allosteric pathway, can they suggest point mutations that would interrupt it?

      We have added a short segment to the end of the discussion proposing possible experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Hekselman et al. presents analyses linking cell types to monogenic disorders using over-expression of monogenic disease genes as the signal. The manuscript analyses data from 6 tissues (bone marrow, lung, muscle, spleen, tongue and trachea) together with ~1,000 rare diseases from OMIM (with ~2,000 associated genes) to identify cell types of interest for a disease of choice. The signal used by the approach is the expression of OMIM genes in a particular cell type relative to the expression of the gene in the tissue of interest, identifying cell type-disease pairs that are then investigated through literature review and recapitulated using mouse expression. A potentially interesting finding is that disease genes manifesting in multiple tissues seem to hit the same cell types. Overall this important study combines multiple data analyses to quantify the connection between cell types and human disorders. However, whereas some of the analyses are compelling, the statistical analyses are incomplete as they don't provide full treatment of type I error.

      Statistical analyses were changed to include permutation testing and a different threshold (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2). Assessments of type I error were based on literature text-mining and expert curation, and showed that false-positive rates were low in both (0.01 and 0.07, respectively; Figure 1F and Figure 1–figure supplement 4A).

      Reviewer #2 (Public Review):

      This study identifies 110 disease-affected cell types for 714 Mendelian diseases, based on preferential expression of known disease-associated genes in single-cell data. It is likely that many or most of the results are real, and the results are biologically interesting and provide a valuable resource. However, updates to the method are needed to ensure that inference of statistical significance is appropriately stringent and rigorous.

      Strengths: a systematic evaluation of disease-affected cell types across Mendelian diseases is a valuable addition to the literature, complementing systematic evaluations of common disease and targeted analyses of individual Mendelian diseases. The validation via excess overlap with disease-cell type pairs from literature co-appearance provides compelling evidence that many or most of the results are real. In addition, many of the results are biologically interesting. In particular, it is interesting that diseases with multiple affected tissues tend to affect similar cell types in the respective tissues.

      Limitations: the main limitation of the study is that, although many or most of the results are likely to be real, the criterion for statistical significance is probably not stringent enough, and is not well-justified. For diseases with only 1 disease-associated gene, the threshold is a z-score>2 for preferential expression in the cell type, but this threshold is likely to be often exceeded by chance. (For diseases with many disease-associated genes, the threshold is a median (across genes) z-score>2 for preferential expression in the cell type, which is less likely to occur by chance but still an arbitrary threshold.) Thus, there is a good chance that a sizable proportion of the reported disease-affected cell types might be false positives. The best solution would be to assess statistical significance via empirical comparison with results for non-disease-associated control genes, and assess the statistical significance of the resulting P-values using FDR.

      We thank the reviewer for the valuable insights and suggestions. We revised the method to assess statistical significance by using empirical comparison followed by FDR correction, as suggested by the reviewer (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2).

      The re-analysis using mouse single-cell data adds an interesting additional dimension to the study, with the small caveat that mouse single-cell data does not provide statistically independent information across genes (for the same reason that adding data from independent human individuals would not provide statistically independent information across genes, given that human and mouse expression are partially correlated).

      We acknowledge this caveat in the text (Discussion, page 17, 2nd paragraph, lines 8-11).

      Reviewer #3 (Public Review):

      The authors describe the method, PrEDiCT, which helps identify disease affected cell types based on gene sets. As I understand it, the method is based on finding which "disease genes" (from an annotation) are relatively highly expressed. The idea is nice, however, I have concerns about how "significance" is assessed and the relative controls.

      Overall, I find the idea interesting, but the execution raises some concerns.

      1) From a causal perspective, there is an association of high expression of these genes within these cell types, but without also assessing individuals with those specific diseases, I do not think it is fair to say "disease affected" cell types. It is possible that these genes might behave completely fine but are highly expressed in those cell types while being affected in other cell types.

      We agree with the reviewer. We changed the terminology to "likely disease-affected cell types” and added this caveat to the Discussion, page 16, 2nd paragraph.

      2) It is unclear to me what the "null" comparison is in the method and if there is one. For example, by chance, would I expect this gene to be highly expressed because other genes are also highly expressed in this cell type? Some way to assess "significance" or "enrichment" beyond simply using ranks and thresholds would be helpful in deciding whether these associations are robust.

      We revised the procedure for assessing statistical significance to include permutation tests. Specifically, given a disease D with n disease-associated genes, the null hypothesis was that the PrEDiCT score of these genes is not significantly different from the PrEDiCT score of a random set of n genes. To test this, we randomly selected n genes expressed in any cell type, and computed the PrEDiCT score for this random gene set in each cell type of the disease-affected tissue (referred to as ‘random score’). We repeated this procedure 1,000 times, resulting in 1,000 random scores per disease and cell type. The p-value of the PrEDiCT score of disease D in cell type c was set to the fraction of random scores in c that were at least as high as the original PrEDiCT score of D in c. The acquired p-values were adjusted for multiple hypothesis testing per disease using the Benjamini-Hochberg procedure. To increase stringency, we treated only statistically significant disease–cell-type pairs with PrEDiCT score≥1 as 'likely affected'. The procedure is detailed in Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2. Additionally, we estimated type I error by using literature text-mining or expert curation (Results, page 7, 2nd paragraph; Methods, page 22, ‘Text-mining of PubMed records’, and page 23, ‘Expert curation and assessment of disease-affected cell types’; Figure 1F and Figure 1–figure supplement 4A).
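
      The permutation procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy input (a dictionary of precomputed per-gene preferential-expression values for a single cell type) and the fixed random seed are all hypothetical.

```python
import random
from statistics import median


def permutation_p_value(observed, z_by_gene, n_genes, n_perm=1000, rng=None):
    """Fraction of random gene sets whose median score is >= the observed score.

    z_by_gene: hypothetical dict mapping gene -> preferential expression in one cell type.
    n_genes:   number of disease-associated genes (size of each random set).
    """
    rng = rng or random.Random(0)  # fixed seed only to make the sketch reproducible
    genes = list(z_by_gene)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, n_genes)  # random set of n genes, without replacement
        random_score = median(z_by_gene[g] for g in sample)
        if random_score >= observed:
            hits += 1
    return hits / n_perm


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      In this sketch the adjustment would be applied to the p-values of all cell types for one disease, after which only pairs that are both significant and have a score of at least 1 would be kept, as described in the response.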

      3) Additionally, it is unclear to me, but I suspect that there are unequal cell numbers in the scores computed as well as between relevant tissues. This is related to point (2) above, but as a result, the estimates of the scores will inherently have different variances, thus making comparisons between them difficult/unreliable unless accounted for. If I understand correctly, the score is first the average expression within a tissue, then, the Z-score? If so, my comment applies.

      To clarify, the PrEDiCT score of a disease D in cell type c was set to the median preferential expression P of its disease genes (Equation 1 below). The preferential expression of each gene in c was computed as a Z-score, by comparing the average expression of the gene in c to its average expression in all cell types of the tissue, divided by the standard deviation (SD, Equation 2 below). Tissues indeed had unequal numbers of cell types; however, the distributions of PrEDiCT scores were similar between tissues (now in Supplementary File 13). We revised this part of Methods and added Equations 1 and 2 (Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’) and Supplementary File 13.
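
      As a minimal sketch of the score described above (not the authors' code), the calculation could look like this. The function names and the toy expression table are hypothetical, and the use of the population standard deviation is an assumption, since the response does not say which SD estimator was used.

```python
from statistics import mean, median, pstdev


def preferential_expression(expr_by_cell_type, cell_type):
    """Z-score of a gene's average expression in one cell type
    relative to its average expression across all cell types of the tissue."""
    values = list(expr_by_cell_type.values())
    sd = pstdev(values)  # assumption: population SD
    if sd == 0:
        return 0.0  # gene expressed identically everywhere: no preference
    return (expr_by_cell_type[cell_type] - mean(values)) / sd


def predict_score(disease_genes, expression, cell_type):
    """Median preferential expression of the disease genes in the given cell type."""
    return median(
        preferential_expression(expression[g], cell_type) for g in disease_genes
    )


# Toy data: average expression of each gene per cell type within one tissue.
expression = {
    "G1": {"A": 1.0, "B": 1.0, "C": 4.0},
    "G2": {"A": 2.0, "B": 2.0, "C": 2.0},
}
score_c = predict_score(["G1", "G2"], expression, "C")
```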

      4) There is a large set of work done in gene enrichment sets which appears to not be mentioned (e.g. GSEA and other works by the Price group). It would be helpful for the authors to summarize these methods and how their method differs.

      We added work done in gene enrichment sets (including two relevant and recent studies from the Price group) and summarized these methods in the Introduction (page 2-3).

      5) Additionally, it should be noted that a caveat of this analysis is that the comparisons are all done only relative to the cell types sampled and the diseases which have Mendelian genes associated with them. I would expect these results to change, possibly drastically, if the sampled cell types and diseases were to be changed.

      We agree with the reviewer and now discuss the generalizability of our results, relating to the extent of the sampled cell types (Discussion, page 18, 1st paragraph).

      6) Finally, I would appreciate a more detailed explanation in the methods of how the score is computed. Some equations and the data they are calculated from would be helpful here.

      We now provide a detailed explanation of how the score and its statistical significance were computed and added Equations 1 and 2 (Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’).

      In summary, the general idea is an interesting one, but I do think the issues above should be addressed to make the results convincing.

      We thank the reviewer for the important feedback which helped us strengthen our analyses.

    1. Author Response

      Thank you for providing us with the reviewer comments. We will provide the revised manuscript at a later stage as recommended.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors used a machine learning algorithm to analyze published exosome datasets to find biomarkers to differentiate exosomes of different origin.

      Strengths:

      The performance of the algorithm is generally of good quality.

      Weaknesses:

      The source datasets are heterogeneous as described in Figure 1 and Figure 2, or Line 72-75; and therefore questionable.

      We thank the reviewer for this assessment. The commonly used biomarkers of exosomes exhibit heterogeneous presence and abundance within the exosomes derived from different cell lines, tissue, and biological fluids. The primary goal of this study was to identify universal exosomal biomarkers that remain consistent across different sources of exosomes, unaffected by potential isolation and quantification bias. This objective was achieved through an integration of datasets from different sources, which allowed for the subsequent identification of common proteins associated with exosomes. Among the 18 protein markers identified, it is noteworthy that they are universally abundant in all cell lines and their exosomes. We believe that despite the heterogeneity of the datasets used here, the identification of 18 universal protein markers in exosomes from diverse sources is a strength of this analysis.

      Reviewer #2 (Public Review):

      Summary:

      This is a fine work on the development of computational approaches to detect cancer through exosomes. Exosomes are an emerging biomarker resource and have attracted considerable interest in the biomedical field. Kalluri and co-workers collected a large sample pool and used random forest to identify a group of protein markers that are universal to exosomes and to cancer exosomes. The results are very exciting and not only added new knowledge in cancer research but also a new and advanced method to detect cancer. Data was presented very nicely and the manuscript was well written.

      Strengths:

      Identified new biomarkers for cancer diagnosis via exosomes.

      Developed a new method to detect cancer non-invasively.

      Results were presented nicely and the manuscript was well written.

      Weaknesses:

      N/A.

      We appreciate the enthusiastic assessment of our study by the reviewer.

      Reviewer #3 (Public Review):

      In the current study, Li et al. address the difficulty in early non-invasive cancer diagnosis due to the limitations of current diagnostic methods in terms of sensitivity and specificity. The study brings attention to exosomes - membrane-bound nanovesicles secreted by cells, containing DNA, RNA, and proteins reflective of their originating cells. Given the prevalence of exosomes in various biological fluids, they offer potential as reliable biomarkers. Notably, the manuscript introduces a new computational approach, rooted in machine learning, to differentiate cancers by analyzing a set of proteins associated with exosomes. Utilizing exosome protein datasets from diverse sources, including cell lines, tissues, and various biological fluids, the study spotlights five proteins as predominant universal exosome biomarkers. Furthermore, it delineates three distinct panels of proteins that can discern cancer exosomes from non-cancerous ones and assist in cancer subtype classification using random forest models. Impressively, the models based on proteins from plasma, serum, or urine exosomes achieve AUROC scores above 0.91, outperforming other algorithms such as Support Vector Machine, K Nearest Neighbor Classifier, and Gaussian Naive Bayes. Overall, the study presents a promising protein biomarker signature tied to cancer exosomes and proposes a machine learning-driven diagnostic method that could potentially revolutionize non-invasive cancer diagnosis.

      We appreciate this positive assessment of our work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study by O'Reilly and Delis provides a valuable data-driven framework for extracting task-related muscle synergies in a step towards the understanding and practical use of synergies in real scenarios (e.g., evaluation of patients in a clinical environment). The approach is incomplete since the authors did not compare their method with classical physiologically grounded approaches for assessing muscle synergies. In this sense, the comparisons with classical approaches would clarify if physiological assemblies were preserved and were not altered to incorporate task space variables. Despite limitations, the proposed framework would interest motor control and neural engineering researchers.

      We thank the editors for the positive assessment of our work and appreciate their constructive feedback. In our revised manuscript, we believe we have sufficiently addressed the identified limitations by a) comparing our approach to existing physiologically-based methods, providing thorough comparisons of their respective outputs, b) applying it to a dataset of post-stroke participants to demonstrate that it can identify physiologically-interpretable markers of motor recovery and c) providing examples to demonstrate how readers can interpret the novel perspective introduced.

      Reviewer #1 (Public Review):

      The proposed study provides an innovative framework for the identification of muscle synergies taking into account their task relevance. State-of-the-art techniques for extracting muscle interactions use unsupervised machine-learning algorithms applied to the envelopes of the electromyographic signals without taking into account the information related to the task being performed. In this work, the authors suggest including the task parameters in extracting muscle synergies using a network information framework previously proposed. This allows the identification of muscle interactions that are relevant, irrelevant, or redundant to the parameters of the task executed.

      The proposed framework is a powerful tool to understand and identify muscle interactions for specific task parameters and it may be used to improve man-machine interfaces for the control of prostheses and robotic exoskeletons.

      With respect to the network information framework recently published, this work added an important part to estimate the relevance of specific muscle interactions to the parameters of the task executed. However, the authors should better explain what is the added value of this contribution with respect to the previous one, also in terms of computational methods.

      We thank the reviewer for their constructive comments. We have adjusted the introduction section of the manuscript to better explain the added value of this framework over previous work. Specifically, we draw the reviewer’s attention to the following updated section of the introduction:

      “In [11], we considered key limitations among current approaches to muscle synergy analysis in extracting functionally relevant and interpretable patterns of muscle activity [12]. We proposed a combinatorial approach based on information- and network-theory and dimensionality reduction (the network-information framework (NIF)) that significantly improved the generalisability of the extraction process by, among others, removing restrictive model assumptions (e.g. linearity, same mixing coefficients) and the reliance on variance-accounted-for (VAF) metrics [12]. By determining the pairwise mutual information between muscles, this innovation paved the way for the appropriate mapping of muscular interactions to the task space. To elaborate on the significance of this development, the extraction of motor patterns in isolation of the task space comes at the expense of both functional and physiological relevance [12,13]. Furthermore, effective methods for mapping large-scale physiological dynamics to behaviour is a current gap across the neurosciences [14]. Thus, here we build on this work by, for the first time, directly including task space parameters during muscle synergy extraction. In doing so, we address these current research gaps, progressing muscle synergy research and successful engineering applications in a fruitful direction [12,15,16]. This enables us, in a novel way, to dissect the concept of the muscle synergy and therefore quantify interactions between muscle activations with shared or complementary functional roles.”

      In general, the method proposed relies on several hyperparameters and cost functions that have been optimized for the specific datasets. A sensitivity analysis should be performed, varying these parameters and reporting the performance of the framework.

      We thank the reviewer for this comment which enabled us to clarify a potential misunderstanding. Our proposed framework does not require setting or varying hyperparameters to optimise cost functions.

      For model-rank specification, a modularity maximising cost-function is used which determines what partitioning of the networks results in maximal modularity. We have offered two alternative approaches using this cost-function which consistently converge on the same solution. To further ensure the representativeness of this solution, we also offer a consensus-based approach where we apply these alternative approaches to individual participant or task data, then group the collective partitions together and re-apply the approaches. One of these approaches (Equation 2.2) requires two hyperparameters, γ and ω, which adjust the intra- and inter- network layer resolutions. As stated in the manuscript, we set both of these parameters to 1, thus nullifying their presence in the cost-function and aligning our work with the classical notion of modularity. Across the two alternative approaches to model-rank specification, the solution is unique and data-driven and has a demonstrable generalisability across datasets.

      The only other cost-function present in the framework is during dimensionality reduction, which is a standard loss function used across the muscle synergy analysis literature. Thus, the approach is essentially parameter-free and we now have mentioned this more explicitly in the manuscript:

      “To empirically determine the number of components to extract in a parameter-free way, we then concatenated these adjacency matrices into a multiplex network and employed network community-detection protocols to identify modules across spatial and temporal scales (fig.3(D)) [29–32,44].”

      “In its generalised multilayer form, the Q-statistic is given an additional term to consider couplings between layers l and r with intra- and inter-layer resolution parameters γ and ω (Equation 2.2). Here, μ is the total edge weight across the network and γ and ω were set to 1 in the current study for classical modularity [30], thus removing the need for any hyperparameter tuning.”
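
      To make the role of the resolution parameter concrete, the sketch below computes the classical single-layer modularity Q of a given partition, with an intra-layer resolution γ that defaults to 1. It is an illustrative assumption only: it does not reproduce Equation 2.2 of the manuscript, it omits the inter-layer coupling term governed by ω, and the function name and toy network are hypothetical.

```python
def modularity(adjacency, communities, gamma=1.0):
    """Single-layer modularity Q of a partition of an undirected weighted network.

    adjacency:   symmetric list-of-lists matrix of edge weights.
    communities: community label for each node.
    gamma:       resolution parameter; gamma = 1 gives classical modularity.
    """
    n = len(adjacency)
    strength = [sum(row) for row in adjacency]  # weighted degree of each node
    two_m = sum(strength)                       # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # Observed weight minus the weight expected under the null model.
                q += adjacency[i][j] - gamma * strength[i] * strength[j] / two_m
    return q / two_m


# Toy network: two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    A[a][b] = A[b][a] = 1.0

q_two_modules = modularity(A, [0, 0, 0, 1, 1, 1])
q_one_module = modularity(A, [0, 0, 0, 0, 0, 0])
```

      A community-detection routine would search over partitions for the one that maximises this quantity; with γ fixed at 1 there is nothing left to tune in the single-layer case.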

      It is not clear how the well-known phenomenon of cross-talk during the recording of electromyographic muscle activity may affect the performance of the proposed technique and how it may bias the overall outcomes of the framework.

      Indeed, artifacts such as crosstalk are a standard issue across the EMG literature and may impact the performance of subsequent analyses where prevalent in the dataset. Crosstalk is expected to be present irrespective of the task and so should not affect redundant and synergistic muscle representations; however, it could be present in the task-irrelevant muscle interactions extracted. Due to the prominence of long-range functional connections with the task-irrelevant representations extracted, we suggest that such artifacts are unlikely to have played a prominent role in the extracted patterns. Nonetheless, we have recognised this possibility with the following updated sentence in the Discussion section:

      “Although distinguishing task-irrelevant muscle couplings may capture artifacts such as EMG crosstalk, our results convey several physiological objectives of muscles including gross motor functions [65], the maintenance of internal joint mechanics and reciprocal inhibition of contralateral limbs [20,50].”

      Reviewer #2 (Public Review):

      This paper is an attempt to extend or augment muscle synergy and motor primitive ideas with task measures. The authors' idea is to use information metrics (mutual information, co-information) in 'synergy' creation including task information directly. My reading of the paper is that the framework proposed radically moves from attempts to be analytic in terms of physiology and compositionality with physiological bases, instead into more descriptive ML frameworks that may not support physiological work easily.

      We thank the reviewer for taking the time to provide a thorough commentary on this manuscript. An overall aim in developing this framework is to build on other recent developments in providing a more fine-grained functional architecture underlying movement control [1,2]. It is a requirement for the successful communication and introduction of this toolbox to the field to provide readers with an understanding of how to use the framework and an intuition on how to interpret the results. Thus, we agree with the reviewer that functional interpretations are of crucial use.

      We also agree with the reviewer that maintaining a physiological underpinning is a desirable direction for the field and should not be made secondary to functional descriptions. In our updated version of this manuscript, we have therefore included direct comparisons with the gold-standard in the field for muscle synergy extraction, namely non-negative matrix factorisation based muscle synergy extraction (see ‘Building on current approaches to muscle synergy analysis’ and fig.5-6 of revised manuscript) [3,4]. In these comparisons, we show how our framework goes beyond this current approach in terms of functional insight while still maintaining physiological relevance. Indeed, in the revised manuscript we also include a fourth dataset comprising post-stroke participants and healthy controls (Fig.6). We demonstrate, through a simple example application to this dataset, how our proposed framework can produce more predictive representations of motor impairment than the gold-standard approach. The representations we identified were discriminative of motor impairment measured via the Fugl-Meyer assessment using just one trial per participant. This improves considerably upon the sensitivity of the current approach to altered motor patterns which have predominantly required many trials and participants to gain significance [5,6]. Thus, the patterns we extract are a more comprehensive representation of the actual underlying physiological state of the participants.

      This approach is very different from the notions of physiological compositional elements as muscle synergies and motor primitives, and to me seems to really be striving to identify task relevant coordinative couplings. This is a meta problem for more classical analyses. Classical analyses seek compositional elements stable across tasks. These elements may then be explored in causal experiments and generative simulations of coupling and control strategies. The present work does not convince me that the joint 'meta' analysis proposed with task information added is not unmoored from physiology and causal modeling in some important ways. It also neglects publications and methods that might be inconvenient to the new framework.

      We would be very interested in receiving the reviewer’s suggestions of existing approaches that we have not incorporated here and would be happy to discuss these in the revised manuscript.

      Information based separation has been used in muscle synergy analyses using infomax ICA, which is information not variance based at core. Though linear mixing of sources is assumed, minimized mutual information is the basis.

      We agree with the reviewer that ICA relies on information measures, however it does not incorporate task-space information. The novelty of our approach lies in the characterisation of muscle interactions with respect to the task at hand. If the reviewer could provide references to this statement, we would be able to consider this further.

      Physiological causal testing of synergy ideas is neglected in the literature reviews in the paper. Although these are in animal work, the clear connection of muscle synergy choices and analyses to physiology is important and needs to be managed in the new methods proposed. Is any correspondence assumed? Possible?

      We agree with the reviewer that this is a crucial element of muscle synergy research and will aim to address it in our future work. However, we would like to point out that the current manuscript is a “tools and resources” article aiming to introduce a new framework. In our revised manuscript, we have incorporated an application of the framework to a dataset from post-stroke patients to demonstrate the use of the framework in clinical settings to identify biomarkers and use them to make predictions of motor recovery (see Fig.6 of updated manuscript).

      Questions and concerns with the framework as an overall tool:

      First, muscle based motor information sources have influences on different time scales in the task mechanics. Analyses of synergies in the methods proposed will be very much dependent on the number and quality of task variables included and how these are managed. Standardizing and comparing among labs, task sets and instrumentation differences is not well enough considered as a problem in this new proposed method toolset, at least in my reading. Will replication, and testing across groups ever be truly feasible in this framework?

      We agree with the reviewer that this important point can be a limitation of the applicability of the framework. For this reason, we chose a “holistic” approach, applying the framework to several datasets collected in different settings, and selecting different kinds of task variables to extract muscle networks from. Crucially, we used a leave-one-task-out and leave-one-participant-out cross validation procedure to specifically address this point. Our results showed that the extracted couplings are robust irrespective of the task variable and/or participant excluded and this lends credit to the generalisability of the framework.

      Muscle based motor information sources have influences on different time scales in the task mechanics. Kinematic analyses, dynamic analyses and force plate analyses of the same task may provide task variables that alter the results in the proposed framework it seems.

      As we have mentioned above, here we used all the above types of task variables together to illustrate the range of measures that can be included in the proposed framework and showed that the outputs are robust to the exclusion of any task/participant. This point is especially evident for dataset 3 results, where high levels of generalisability were found despite the inclusion of kinematic, dynamic and IMU data (see Table 1. of original submission and updated manuscript). We believe that this is an advantage of the approach as it allows researchers to apply the method to different kinds of measurements they may have collected and gain insights into the relationships of muscle couplings with kinematic/dynamic/force parameters. This will also enable scientists to attribute different functional roles to the identified couplings and it is something we plan to do in future applications of the framework.

      Second, there is a sampling problem in all synergy analyses. We cannot record all muscles or all task parameters. Examining synergies across multiple tasks seeks 'stationary' compositionality. Including task specific elements may or may not reinforce or give increased coordinative precision to the stationary compositionality.

      We fully agree that this is a limitation of all synergy analyses. We consider this study a step towards addressing it, as it provides the research community with a toolbox that can be used to quantify muscle couplings with different levels of task specificity.

      To me the new methods proposed seem partly orthogonal to the ideas of stable compositionality. The 'synergies' obtained will likely differ, and are more likely to be coordinative control groupings of recurrent task and muscle motifs (based on instrumentation) which may or may not relate to core compositionality in physiology. Is there any expectation that the framework should relate to core compositionality and physiology. This is not clear in the paper as written.

      In our new analysis, we have compared the proposed approach to existing physiologically-based methodologies and showed that the new framework can capture several salient physiological features of movement that the current NMF-based approach cannot. For example, as we have moved away from optimising variance accounted for metrics, our framework can identify subtle muscle couplings that have important functional roles. These subtle couplings are often not captured in current muscle synergy analysis as, against physiological relevance, higher amplitude muscles often take prominence. Further, by directly including task parameters during extraction, we can determine the muscles that have a functional role concerning the included task parameter rather than inferring this relationship indirectly using knowledge about the task executed. In our updated manuscript, by applying the framework to post-stroke participants (see Fig.6), we were also able to demonstrate that the extracted couplings are associated with functional parameters of motor recovery and have a clear link with the physiological state of individual participants.

      It would be useful to explore the approach with a range of neuromechanical models and controllers and simulated data to explore the issues I am raising and convince readers that this analysis framework adds clarity rather than dissolving the generalizability and interpretability of analyses in terms of underlying causal mechanisms.

      The authors need to better frame their work in relation to causal analyses if they are claiming links to muscle synergies analyses and claim extension/refinement. Alternatively, these may not be linked, and instead parallel approaches exploring different hypotheses and goals using different organizational data descriptors.

      To address the reviewer's concerns here, we have included in the updated manuscript a toy example simulating situations in which pairs of muscles would have a redundant or synergistic functional relationship (see Fig.2). This simulation gives clear intuition on situations where two muscles (e.g. an antagonist-agonist pair) may share functionally similar or complementary information about task direction (left vs right). In particular, within the main text describing this figure, we state how current NMF-based approaches consider muscles functionally equivalent when they share similar magnitude activations, whereas our framework captures muscles with identical task information. Thus, our work is an extension of current approaches towards understanding causal mechanisms. The suggestion to use neuromechanical models is valuable; however, we consider it beyond the scope of this work. This “Tools and Resources” paper is aimed at introducing the computational framework for the analysis of large-scale muscle couplings in task space. Our future work will use this framework to address unanswered questions in the field and we hope that it will be helpful for other scientists in testing their hypotheses.
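
      The distinction between redundant and synergistic task information can be illustrated with a generic information-theoretic sketch. It is not the simulation of Fig.2 and not the estimator used in the framework: it assumes binary muscle states, a binary task direction, and a simple co-information measure (positive for net redundancy, negative for net synergy). All names and data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    p_xy = Counter(zip(xs, ys))
    p_x = Counter(xs)
    p_y = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
        for (x, y), c in p_xy.items()
    )


def net_redundancy(m1, m2, task):
    """I(m1;task) + I(m2;task) - I(m1,m2;task).

    Positive: the muscles carry overlapping (redundant) task information.
    Negative: the task information emerges only jointly (synergy).
    """
    joint = list(zip(m1, m2))
    return (
        mutual_information(m1, task)
        + mutual_information(m2, task)
        - mutual_information(joint, task)
    )


# Redundant case: both muscles mirror the task direction (0 = left, 1 = right)
task_red = [0, 0, 1, 1]
m1_red = [0, 0, 1, 1]
m2_red = [0, 0, 1, 1]

# Synergistic case: direction is readable only from the pair (XOR pattern)
m1_syn = [0, 0, 1, 1]
m2_syn = [0, 1, 0, 1]
task_syn = [a ^ b for a, b in zip(m1_syn, m2_syn)]

redundant_score = net_redundancy(m1_red, m2_red, task_red)
synergistic_score = net_redundancy(m1_syn, m2_syn, task_syn)
```

      In the first case each muscle alone already predicts the direction, so the score is positive; in the second, neither muscle alone is informative but the pair is, so the score is negative.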

      To me this appears a data science tool that may not help any reductionist efforts and leads into less interpretable descriptions of motor control. Not invalid, but sufficiently different that common term use muddies the water.

      We believe that the novel evidence we provided on both simulated and real data has contributed to a better interpretability of the approach outcomes. Specifically, we have introduced examples showing the functional roles of the different types of interactions as well as the predictive power of the outputs. Concerning the use of the term synergy, we have provided a clear description throughout the manuscript regarding the interpretation of synergy vs redundancy in the novel perspective we propose. For example, in the Discussion section:

      “ We thus sought to provide greater nuance to the notion of ‘working together’ by defining motor redundancy and synergy in information-theoretic terms [6,56]. In our framework, redundancy and synergy are terms describing functionally similar and complementary motor signals respectively, introducing a new perspective that is conceptually distinct from the traditional view of muscle synergies as a solution to the motor redundancy problem [3,6,7]. In this new definition of muscle interactions in the task space, a group of muscles can ‘work together’ either synergistically or redundantly towards the same task. In doing so, the perspective instantiated by our approach provides novel coverage to the partitioning of task-relevant and -irrelevant variability implemented by the motor system along with an improved specificity regarding the functional roles of muscle couplings [20–22]. Our framework emphasises not only the role of functionally redundant muscle couplings that result from the underlying degeneracy of the motor system, but also of complementary, synergistic dependencies that are important for communication and integration across specialised neural circuitry [57,58]. Thus, the present study aligns the muscle synergy concept with the current mechanistic understanding of the nervous system whilst offering an analytical approach amenable to the continued advances in large-scale data capture [14,59].”

      Reviewer #3 (Public Review):

      In this study, the authors developed and tested a novel framework for extracting muscle synergies. The approach aims at removing some limitations and constrains typical of previous approaches used in the field. In particular, the authors propose a mathematical formulation that removes constrains of linearity and couple the synergies to their motor outcome, supporting the concept of functional synergies and distinguishing the task-related performance related to each synergy. While some concepts behind this work were already introduced in recent work in the field, the methodology provided here encapsulates all these features in an original formulation providing a step forward with respect to the currently available algorithms. The authors also successfully demonstrated the applicability of their method to previously available datasets of multi-joint movements.

      Preliminary results positively support the scientific soundness of the presented approach and its potential. The added values of the method should be documented more in future work to understand how the presented formulation relates to previous approaches and what novel insights can be achieved in practical scenarios and confirm/exploit the potential of the theoretical findings.

      Strengths:

      This work proposes a novel framework that addresses physiologically non-verified hypothesis of standard muscle synergy methods: it removes restrictive model assumptions (e.g. linearity, same mixing coefficients) and the reliance on variance-accounted-for (VAF) metrics.

      The method is solid and achieves the prescribed objectives at a computational level and in preliminary laboratory data.

      A toolbox is available for testing the methods on a larger scale.

      The paper is well written and shows a high level of innovation, original content and analysis

      Weaknesses:

      Task performance variables could be specified in more quantitative definition in future work (e.g.: articular angles rather than a generic starting point- end point).

      We agree with this point and will incorporate it in future work. Our aim here was to show that the framework would work with any task variable and that scientists can use it to identify the relevance of muscle interactions to different types of task parameters.

      The paper does not show a comparison with previous approaches (e.g.: NMF) or recently developed approaches (such as MMF).

      We have now illustrated such a comparison on two datasets and explained in more detail how the new framework can dissect the different types of muscle groupings (see ‘Building on current approaches to muscle synergy analysis’ section and Fig.5-6 of revised manuscript).

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      In our revised manuscript, we have introduced 2 new applications of the framework to real data to exemplify its use for a) functional interpretability and b) identification of biomarkers (see ‘Building on current approaches to muscle synergy analysis’ section and Fig.5-6 of revised manuscript). We also point towards its use in movement restoration and augmentation devices and in the clinical setting in the discussion section:

      “The separate quantification of these muscle interaction types opens up novel opportunities in the practical application of muscle synergy analysis, as demonstrated in the current study through the identification of a significant predictor of motor impairment post-stroke from single-trials [5,12,65]. For instance, these distinct representations may encapsulate different neural substrates that can be specifically assessed at the muscle-level for the purpose of bodily restoration and augmentation [66]. Uncovering their neural underpinnings is an interesting topic for future research.”

      In this work, the effort of the authors aimed at developing the field is clear. It is fundamental to develop novel frameworks for synergy extraction and use them to make them more interpretable and applicable to real scenarios, as well as more adherent to recent findings achieved in motor control and neuroscience that are not reflected in the standard models. At the same time, muscle synergies are being used more and more in research but their impact in practical scenarios is still limited, probably because synergies have rarely been analyzed in a functional context. This paper shows a very in-depth analysis and a novel framework to interpret data that links to the task space from a functional perspective. I also found that the results on the datasets are very well commented but could expand more to show why using this framework is advantageous.

      There are some key points for discussion that follow from this paper which can be described more, maybe in future work, and that might contribute to major developments in the field, including:

      The understanding of how the separation between relevant (redundant and synergistic) and irrelevant synergies impact on synergy analysis in practical works;

      We have now introduced new figures (Fig. 5 and 6) to the revised manuscript, demonstrating simple applications of the framework and providing intuition regarding the outputs. We have also added points to the Discussion commenting on the differences between types of couplings and how they can be interpreted in future works:

      “Our framework emphasises not only the role of functionally redundant muscle couplings that result from the underlying degeneracy of the motor system, but also of complementary, synergistic dependencies that are important for communication and integration across specialised neural circuitry [57,58]. Thus, the present study aligns the muscle synergy concept with the current mechanistic understanding of the nervous system whilst offering an analytical approach amenable to the continued advances in large-scale data capture [14,59].”

      “Although distinguishing task-irrelevant muscle couplings may capture artifacts such as EMG crosstalk, our results convey several physiological objectives of muscles including gross motor functions [64], the maintenance of internal joint mechanics and reciprocal inhibition of contralateral limbs [19,49]. Thus, task-irrelevant muscle interactions reflect both biomechanical- and task-level constraints that provide a structural foundation for task-specific couplings. The separate quantification of these muscle interaction types opens up novel opportunities in the practical application of muscle synergy analysis, as demonstrated in the current study through the identification of a significant predictor of motor impairment post-stroke from single-trials [5,12,65]. For instance, these distinct representations may encapsulate different neural substrates that can be specifically assessed at the muscle-level for the purpose of bodily restoration and augmentation [66]. Uncovering their neural underpinnings is an interesting topic for future research.”

      Interpreting how different synergistic organizations described in this work allows to better describe data from real scenarios (e.g.: motor recovery of patients after neurological diseases);

      We have now added an example application of the framework to a dataset of stroke patients (Fig.6) and identified redundant muscle patterns that are predictive of functional measures.

      Discussing in detail how the presented findings compare with standard algorithms such as NMF to determine the added value provided with this approach;

      As indicated above, we have now shown such a comparison on two new datasets (see Fig.5-6 of revised manuscript).

      Describe how redundant synergies reflect real neural organization and - if their "existence" is confirmed - how they contribute to redesign the concept of muscle synergies and of modular/synergistic control in general.

      This is an important point that we have now addressed more in our Discussion by relating redundant muscle couplings to degeneracy in the motor system and synergistic couplings to integrative dynamics by higher-level processes. We have also added a simple simulation illustrating how synergistic and redundant interactions co-exist and represent different contributions to task performance (see Fig.2 of revised manuscript).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Summary of changes

      I thank the reviewers for their thorough feedback on this paper and providing me with such a detailed list of recommendations. I have been able to incorporate many of their suggestions, which I believe has greatly improved this paper.

      The most important changes:

      • I added comparisons to the lexicon- and rule-based sentiment algorithms TextBlob and VADER to Supplementary Fig. 4. This shows the superiority of ChatGPT in scoring the sentiment of scientific texts compared to existing and already-validated tools for sentiment analysis based on natural language processing. [Suggestion Reviewer 2]

      • I added the measure intra-class correlation to Fig. 3b, emphasizing the inconsistency in sentiment scores across different reviews of the same paper. [Suggestion Reviewer 3]

      • I added Supplementary Fig. 6, in which I directly propose different experiments to test the causes of the observed gender effects on peer review. [Suggestion Reviewer 3]

      • I further studied the issue of variability in responses by ChatGPT (Supplementary Fig. 2), and learned that this has greatly improved in the latest version of ChatGPT (for Version Aug 3, 2023, R2 values of 0.99 (sentiment) and 0.86 (politeness) were reached). I show these findings in Supplementary Fig. 2. [Suggestions Reviewers 1 and 3]

      • Throughout the manuscript (most notably in the Abstract and Discussion), I emphasize that this is a proof-of-concept study, and make suggestions on how to scale this up across journals and fields. I also toned down certain claims given the relatively small sample size of this study, including in the abstract. I also more prominently and elaborately discuss the limitations of the study in the Discussion section. [Suggestions Reviewers 1, 2 and 3]

      • I made many smaller changes to text, figures and references on the basis of the reviewers’ comments. [Suggestions Reviewers 1, 2 and 3]

      Notably, Reviewer 3 has provided me with a very detailed list of recommendations for follow-up experiments. I appreciate their ideas, and I am currently considering different options for future work. Specifically I am looking to team up with a journal to perform the experiments laid out in Supplementary Fig. 6 of the new paper, to study whether I can find evidence of bias across rejected and accepted papers. As suggested by this reviewer, I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review.

      Based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      Reviewer #1 (Public review)

      Strengths:

      The innovative method is the biggest strength of this article. Moreover, the method can be implemented across fields and disciplines. I myself would like to see this method implemented in a grander scale. The author invested a lot of effort in data collection and I especially commend that ChatGPT assessed the reviews twice, to ensure greater objectivity.

      I want to thank this reviewer for commending the innovative methodology of this study. I appreciate that this reviewer would like to see this methodology implemented at a grander scale, which is a view that I share. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores).

      The reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work. Specifically, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across the rejected and accepted manuscripts of a journal. In addition, as suggested by Reviewer #3, I am looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Importantly, based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Weaknesses:

      I have several concerns regarding the methodology of the article. The first relates to the fact that the sample is not random. The selection of journal and inclusion and exclusion criteria do not contribute well to the strength of the evidence.

      Indeed, the inclusion of only accepted manuscripts from a single journal is the biggest caveat of this paper. I have re-written much of the Abstract to emphasize that this is a proof-of-concept paper, in the hope that other researchers will concurrently expand this method to larger and more diverse datasets.

      An important methodological fact is that the correlation between the two assessments of peer reviews was actually lower than we would expect (around 0.72 and 0.3 for the different linguistic characteristics). If the ChatGPT gave such different scores based on two assessments, should it not be sound to do even more assessments and then take the average?

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #3. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations).

      Interestingly, I observed that ChatGPT has become significantly more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R2 = 0.992 for sentiment and R2 = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.
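
      For readers who wish to check the agreement between two scoring iterations on their own data, a minimal sketch follows. It assumes that R2 is taken as the squared Pearson correlation between the two iterations (the response does not state how it was calculated), and the scores shown are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def r_squared(x, y):
    """Squared Pearson correlation, used here as the agreement measure."""
    return pearson_r(x, y) ** 2


# Hypothetical sentiment scores for four reviews, scored twice
iteration_1 = [60, 70, 80, 90]
iteration_2 = [60, 80, 70, 90]

agreement = r_squared(iteration_1, iteration_2)  # approximately 0.64

# If agreement is low, the two iterations can be pooled by averaging
pooled = [(a + b) / 2 for a, b in zip(iteration_1, iteration_2)]
```

      A value close to 1 would suggest that a single scoring pass is sufficient, whereas lower values would argue for averaging several iterations.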

      Reviewer #1 (Recommendations to author)

      I had some difficulties reading the article, so it would maybe help to structure the article more (e.g. In the introduction there are three aims stated, so the Statistical Analysis section could be divided in three sections, and instead of the link to figures, the author could state which variables were analysed in a specific manner) to be easier to comprehend the details. Also, I found on one place that the sample consisted of 572 reviews, and on other that it was 558.

      These are very good points. I re-wrote the statistical analysis for clarity (Page 7 of the manuscript). The figure of 558 reviews was a mistake on my part, as I forgot to include the fourth review for the 14 papers that received four reviews in the histograms of Fig. 2b and the accompanying text. This has been updated.

      For figures 1a and 1b it could be considered to enter the table instead of several figures.

      I thank the reviewer for pointing this out. I tried this suggestion, but I found it to reduce the readability of the paper. As an alternative, I now provide an Excel spreadsheet with all the raw data, so people can find all the characteristics of the included papers.

      99.8% of the reviews analysed were assessed as polite. This is, in my opinion, extremely important finding, which shows that reviewers are still holding to certain degree of standards in communication, and it can be mentioned in the abstract.

      I very much agree with this reviewer; this has now been added to the Abstract.

      In results you state that QS World Ranking is "imperfect" measure. When stating that in the results section, it poses the question why it is used in the study, so maybe it is more suitable for the discussion.

      This point is well taken. Even though the QS World Ranking score is imperfect, I still think it can be useful, as a rough proxy of perceived prestige of an institution. I now removed this “imperfect measure” statement from the Results section, and moved it to the Discussion (Page 5).

      In the Results section, instead of using only p values, please add measures of effect (correlations, mean differences), to make it easier to place in the context.

      For the significant effects of Fig. 4, I have added these to the figure legends. Please note that the statistical tests used are non-parametric, so I reported the Hodges-Lehmann difference (the median of all possible pairwise differences between observations from the two groups).
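
      The Hodges-Lehmann difference described above can be computed directly from its definition. The following is a minimal sketch with invented scores; the function name and the data are hypothetical and serve only to illustrate the calculation.

```python
from itertools import product
from statistics import median


def hodges_lehmann_difference(group_a, group_b):
    """Median of all pairwise differences a - b between two groups."""
    diffs = [a - b for a, b in product(group_a, group_b)]
    return median(diffs)


# Hypothetical sentiment scores for two groups of reviews
scores_group_a = [62.0, 70.0, 75.0, 81.0]
scores_group_b = [55.0, 60.0, 72.0]

shift = hodges_lehmann_difference(scores_group_a, scores_group_b)
print(shift)  # 9.5
```

      With 4 and 3 observations there are 12 pairwise differences, and the estimate is the median of those 12 values.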

      I think the results interpretation should be softened a bit, or the limitations of the study should be placed as the second paragraph in the discussion, since this was only specific journal with specific subfield.

      I agree with this reviewer that the relatively small sample size of this paper demands more careful wording. Throughout the manuscript, I have toned down claims, and emphasized the “proof of concept” nature of this study (for example in the Abstract). I also moved the limitations section to the second paragraph of the Discussion, and elaborated on the study’s caveats.

      Methods:

      The measure Review time was assessed from submission to acceptance, but this does not need to be review time since it takes a lot of time sometimes to find reviewers. that needs to be stated as the limitation.

      This point is well taken. I changed this to “Paper acceptance time” in Fig. 3 and the accompanying text.

      Gender name determination methods differed between the assessment of the first authors and the last authors, and that needs stronger explanation.

      I appreciate this reviewer raising this point, which has also been raised by Reviewer #3. For this paper, I have carefully weighed the pros and cons of automated versus manual gender determination. Initially, my intention was to rely only on a programmatic method to identify authors' names. However, I came to realize that there were inaccuracies in senior author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to manually perform this task, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first author as well, this was considerably more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that, unlike senior authors, reviewers may not necessarily be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically possess knowledge of senior authors' identities, for example through conferences, whereas the same is not true for first authors. Yet, this may be different for other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process.

      I also realize that my rationale for the different methods of gender determination was not explained well enough in the original submission; I now explain my reasoning more elaborately on Page 7 of the manuscript.

      For sentiment analysis: Please state based on what the GPT made a decision? Which program? (e.g. for gender it used genderize.io)

      This has been added to Page 7.

      Finally, your entire analysis can be made reproducible (since everything is publicly available). You can share ChatGPT chats as online materials with variables entered with the dataset analysed and the code. This would increase the credibility of the findings.

      I will make the entire raw dataset available through the eLife website, including all reviews and their scores.

      Reviewer #2 (Public review)

      Strengths include:

      1) Given the variability in responses from ChatGPT, the author pooled two scores for each review and demonstrated significant correlation between these two iterations. He confirmed also reasonable scoring by manipulating reviews. Finally, he compared a small subset (7 papers) to human scorers and again demonstrated correlation with sentiment and politeness.

      2) The figures are consistently well presented and informative. Figure 2C nicely plots the scores with example reviews. The supplementary data are also thoughtful and include combination of first/last author genders. It is interesting that first author female last author male has the lowest score.

      3) A series of detailed analysis including breaking down reviews by subfield (interesting to see the wide range of reviewer sentiment/politeness scores in computational papers), institution, and author's name and inferred gender using Genderize. The author suggests that peer review to blind the reviewers to authors' gender may be helpful to mitigating the impoliteness seen.

      Thank you.

      Weaknesses include:

      1) This study does not utilize any of the wide range of Natural Language Processing (NLP) sentiment analysis tools. While the author did have a small subset reviewed by human scorers, the paper would be strengthened by examining all the reviews systematically using some of the freely available tools (for example, many resources are available through Hugging Face [https://huggingface.co/blog/sentiment-analysis-python]). These methods have been used in previous examinations of review text analysis (Luo et al. 2022. Quantitative Science Studies 2:1271-1295). Why use ChatGPT rather than these older validated methods? How does ChatGPT compare to these established methods? See also: colab.research.google.com/drive/1ZzEe1lqsZIwhiSv1IkMZdOtjPTSTlKwB?usp=sharing

      This was a great recommendation by this reviewer, and I have tested ChatGPT against TextBlob and VADER, the two algorithms also used by the Luo et al. study (see Supplementary Fig. 4). Perhaps unsurprisingly, these algorithms performed very poorly at scoring sentiment of the reviews. Please note that I also tested these two algorithms at scoring individual sentences, Tweets and Amazon reviews, which they did very well (i.e., the software package was working correctly). Thus, ChatGPT is better at scoring scientific texts than TextBlob and VADER, likely because these algorithms struggle with finding where in the review the sentiment is conveyed. I now discuss this on Pages 1, 3 and 4 of the manuscript.

      2) The author's claim in the last paragraph that his study is proof of concept for NLP to analyze peer review fails to take into account the array of literature already done in this domain. The statement in the introduction that past reports (only three citations) have been limited to small dataset sizes is untrue (Ghosal et al. 2022. PLoS One 17:e0259238 contains over 1000 peer review documents, including sentiment analysis) and reflects a lack of review on the topic before examining this question.

      I thank this reviewer for pointing me to this very useful study. I regret missing this one in my initial submission; I now discuss this paper on Pages 1 and 5 of the manuscript.

      3) The author acknowledges the limitation that only papers under neuroscience were evaluated. Why not scale this method up to other fields within Nature Communications? Cross-field analysis of the features of interest would examine if these biases are present in other domains.

      I share this reviewer’s opinion that it would be very interesting to expand this analysis to different subfields. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in (the new) Supplementary Fig. 6 of the new paper, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Yet, based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Reviewer #3 (Public review)

      Strengths:

      On the positive side, I thought the use of ChatGPT to score the sentiment of text was novel and interesting, and I was largely convinced by the parts of the methods which illustrate that the AI provides broadly similar sentiment and politeness scores to humans who were asked to rank a sub-set of the reviews. The paper is mostly clear and well-written, and tackles a question of importance and broad interest (i.e. the potential for bias in the peer review process, and the objectivity of peer review).

      Thank you.

      Weaknesses:

      The sample size and scope of the paper are a bit limited, and I have written a long list of recommendations/critiques covering diverse aspects including statistical/inferential issues, missing references, and suggestions for other material that could be included that would greatly increase the usefulness of the paper. A major limitation is that the paper focuses on published papers, and thus is a biased sample of all the reviews that were written, which prevents the paper properly answering the questions that it sets out to answer (e.g. is peer review repeatable, fair and objective).

      I very much appreciate this reviewer taking the time to provide me with such a detailed list of recommendations. Below, I will respond to this list in a point-by-point manner.

      Reviewer #3 (Recommendations to author)

      My main issues with the paper are that it is not very ambitious, and gave me the impression the aim was to write the first paper using ChatGPT to address this question, rather than to conduct the most thorough and informative investigation that would have been feasible (many obvious questions that could be addressed are not tackled, since the sample size is small and restricted). There are also issues with selection bias, and the statistical analysis, that have possibly led to erroneous inferences and greatly limit what conclusions can be drawn from the analysis. I hope my comments are of use in further improving the paper.

      The repeatability of ChatGPT when calculating the two linguistic characteristics is low. Taking the average of multiple assessments is one way to deal with this. To verify that taking the average of, say, 5 scores gives a repeatable score, the author could consider calculating 10 scores for a set of 20-30 reviews, calculating two scores for each review using the first 5 and second 5 ChatGPT ratings, and then calculating repeatability across the 20-30 reviews. It is important to demonstrate that ChatGPT is sufficiently repeatable for this new method to be useful.

      Also, it might be possible to automate this process a bit to save time - e.g. the author could change the ChatGPT prompt, like "please rate the politeness of this review from -100 to +100, do it 10 times independently, and print your 10 ratings as well as their average". Hopefully the AI is smart enough to provide 10 independently-computed ratings this way, saving the need to copy-paste the prompt into the chat box 10 times per review.

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #1. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations). I also tested this Reviewer’s suggestion to ask ChatGPT to score many times and give separate scores for each iteration; this worked very well.

      Interestingly, I observed that ChatGPT has become significantly more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R² = 0.992 for sentiment and R² = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.
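
      The split-half check proposed by the reviewer can be sketched in a few lines of Python. This is an illustration only, not the code used in the study: the ratings are simulated, and the function names (`pearson_r`, `split_half_r2`) and noise level are assumptions made for the example.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_r2(scores_per_review):
    """R^2 between the mean of the first half and the mean of the second
    half of each review's repeated scores."""
    first, second = [], []
    for scores in scores_per_review:
        half = len(scores) // 2
        first.append(mean(scores[:half]))
        second.append(mean(scores[half:2 * half]))
    return pearson_r(first, second) ** 2


# Simulated stand-in for real ratings (assumption): 25 reviews, each scored
# 10 times with Gaussian noise around a hidden "true" score.
rng = random.Random(0)
true_scores = [rng.uniform(-100, 100) for _ in range(25)]
simulated = [[t + rng.gauss(0, 20) for _ in range(10)] for t in true_scores]

print(f"split-half R^2 = {split_half_r2(simulated):.3f}")
```

      With real data, `simulated` would be replaced by the 10 ratings collected per review; a split-half R² close to 1 would indicate that the 5-score average is repeatable.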

      To my mind, the main reason to use an AI instead of one or more human readers to rank the sentiment/politeness of peer reviews is to save time, and thereby allow this study to have a larger sample size than would be feasible using human readers. With this in mind, why did you choose to download only 200 papers, all from the discipline of Neuroscience, and only from Nature Communications? It seems like it would be relatively easy to download papers from many more journals, fields of research, or time periods if using AI-based methods, and in fact it would have been feasible (though fairly laborious) for one person to read and classify the sentiment of the reviews for 200 papers.

      As well as providing more precise estimates of the parameters you are interested in (e.g. the consistency of reviews, and the size of the difference in reviewer sentiment between author genders), expanding the sample beyond this small set of papers would allow you to address other interesting questions. For example, you could ask whether the patterns observed for neuroscience are similar to those in other research disciplines, whether Nature Comms is representative of all journals (given there are other journals with public reviews), and you could test whether the male-female differences have become greater or smaller over time (e.g. by comparing the male-female differences observed in the past to the effect size observed in 2022-23). Additionally, the main analyses in this paper would have higher statistical power - for example, you only include 53 papers with a female senior author, giving you quite low power/precision to estimate the gender difference in the average sentiment of reviews (given the high variance in sentiment between papers).

      I want to thank this reviewer for taking the time to think about possible ways to increase the impact of this work. I agree, these are all great suggestions, and there are many possibilities to apply ChatGPT-based natural language processing to scientific peer review. Respectfully, I chose to continue with publishing this work in the form of a proof-of-concept paper, because I currently do not have the resources to perform this (quite labor intensive) study. Below I will explain my reasoning, which I also shared with Reviewers #1 and #2.

      I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in (the new) Supplementary Fig. 6 of the new paper, to study whether I can find evidence of bias across rejected and accepted manuscripts of a journal. I am also looking into ways to automate data collection using APIs, and by utilizing the rapidly expanding databases for transparent peer review. Yet, based on this preprint, I have received messages from academics that are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals. The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Also, if you could include some reviews of papers that were reviewed double-blind, you could test whether the gender-related differences in peer reviews are ameliorated by double-blind reviewing. Nature Comms (and many other journals with open review) do have some double-blinded papers, and there is evidence that double-blinding is preferentially selected by authors who think they will experience discrimination in the peer review process (DOI: 10.1186/s41073-018-0049-z), and also that double-blinding does ameliorate bias (DOI: 10.1111/1365-2435.14259), so this seems very relevant to the ideas under study here.

      I note that the PLOS journals allow open peer review, and there is an API for PLOS which one can use to download the reviews for a given paper (e.g. try this query to get to the XML file of a paper which has open peer review: http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0239518&type=manuscript). Using an API could allow this project to be scaled up, because you can programmatically search for the papers with open reviews, download those reviews using the API and some code, and then score them using the same ChatGPT-based methods used for Nature Comms. Also, Publons recently merged with Web of Science (Clarivate), and you can now read all the open peer reviews on Web of Science for papers which had open review (e.g. for this paper: https://www-webofscience-com.napier.idm.oclc.org/wos/woscc/fullrecord/WOS:000615934800001). It would be possible to write to Web of Science, request access to their data or search engine, and programmatically download many thousands of papers and their associated reviews, and then use ChatGPT or a similar AI to score them all (especially if you can pass the reviews to ChatGPT for scoring programmatically, instead of manually copy-pasting the reviews into the chat box one at a time as it appears was done in the present study).

      These are great suggestions, and I have different plans for follow-up studies, including the use of APIs to download large batches of peer reviews. The analyses in this paper have been performed in February of this year, even before the ChatGPT API had been released, which did not let me automate the process at that time. As a result, these analyses have been performed manually. I realize that the field is moving rapidly, and that there are now different options to scale this up quickly.

      I plan on using the suggestions from this Reviewer for follow-up experiments in a next paper, and on publishing this revision as a proof-of-concept paper. In this way, different researchers can optimally use ChatGPT-based sentiment analyses for similar studies without a delay.
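
      As a small illustration of the programmatic route the reviewer describes, the sketch here rebuilds the PLOS query URL quoted in the comment from a DOI. The endpoint and the `id`/`type` parameters are taken from that example URL only; the function name is hypothetical, and downloading or parsing the review text is deliberately left out because the layout of the returned XML is not described in this exchange.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL in the reviewer's comment.
PLOS_FILE_ENDPOINT = "http://journals.plos.org/plosone/article/file"


def plos_manuscript_xml_url(doi):
    """Build the manuscript-XML URL for a PLOS ONE article DOI.

    Hypothetical helper: it only mirrors the single example URL quoted
    in the review and makes no claim about other PLOS journals.
    """
    # safe="/" keeps the slash inside the DOI unescaped, as in the example.
    query = urlencode({"id": doi, "type": "manuscript"}, safe="/")
    return f"{PLOS_FILE_ENDPOINT}?{query}"


print(plos_manuscript_xml_url("10.1371/journal.pone.0239518"))
```

      A batch job would loop this helper over a list of DOIs and hand each URL to an HTTP client of choice.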

      As you acknowledge, there is a selection bias in this study, since you only include papers that were ultimately published in Nature Comms (missing reviews of papers that were rejected). This is a really big limitation on the usefulness of some of your analyses. For example, you found no relationship between author institutional prestige and reviewer sentiment. This could be evidence of a fair and impartial review process (which seems unlikely!), or it could be a direct result of selection bias (specifically a "collider bias", like the famous example involving height and skill among professional basketball players). Because the likelihood that a paper is published is positively related both to its quality and to the prestige held by the authors, we might expect a flatter (or even negative) correlation between prestige and reviewer sentiment among papers that were published than among the whole set of papers (like how the correlation between height and speed/skill is less positive among NBA players than among the general population, since both height and speed/skill provide advantages in basketball).

      I agree with this reviewer that the selection bias is a major limitation of this study. I rewrote much of the Abstract and Discussion to tone down claims, and more prominently discuss the limitations of this study. I also made several suggestions for follow-up experiments.
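
      The collider effect raised by the reviewer can be made concrete with a toy simulation. Everything in it is an assumption chosen for illustration (independent Gaussian "quality" and "prestige", sentiment that tracks quality only, and an acceptance rule based on their sum); it is not a model of the actual Nature Communications data.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n = 20000
quality = [rng.gauss(0, 1) for _ in range(n)]
prestige = [rng.gauss(0, 1) for _ in range(n)]        # independent of quality
sentiment = [q + rng.gauss(0, 0.5) for q in quality]  # reviews track quality only

# Hypothetical acceptance rule: publication depends on quality AND prestige.
accepted = [i for i in range(n) if quality[i] + prestige[i] > 1.5]

r_all = pearson_r(prestige, sentiment)
r_accepted = pearson_r([prestige[i] for i in accepted],
                       [sentiment[i] for i in accepted])

print(f"all submissions: r = {r_all:+.3f}")
print(f"accepted only:   r = {r_accepted:+.3f} (n = {len(accepted)})")
```

      In this toy setting prestige and sentiment are unrelated across all submissions, yet become negatively correlated once only accepted papers are kept, which is why a flat relationship among published papers cannot be read as evidence of impartial review.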

      In the section "Consistency across reviewers", you write that there was little similarity between review sentiment scores from different reviewers from the same paper, and then write "This surprising result indicates high levels of disagreement between the reviewers' favorability of a paper, suggesting that the peer review process is subjective." However I disagree with this conclusion for three reasons:

      • Firstly, your dataset only includes papers that were published, and thus there is a selection bias against manuscripts where both/all reviewers disliked the paper - the removal of this (probably large) set of reviews will add a (potentially very strong) downward bias to your estimate of how consistent the review process is (since you are missing all those papers where the reviewers agreed). I think that one cannot properly answer the question "are reviewers consistent in their appraisals" without having access to papers that were rejected as well as those that were accepted.

      I agree with this reviewer that there is a selection bias in this study, which I acknowledged throughout the initial submission of this manuscript. Indeed, having access to reviews of rejected papers will greatly increase my confidence in this finding. However, if there is consistency across reviewers in the entire pool of (post-review rejected + accepted) manuscripts, some of that has to trickle down into the pool of accepted papers. The correlation between sentiment scores of the different reviewers is so strikingly low (or even absent) that I simply cannot envision a way in which there is consistency across reviewers in the pre-editorial decision stage. Yet, I realize that this point is debatable. Therefore, I changed the phrasing of the Discussion section, including the following sentence:

      That being said, the extremely low (or even absent) relation between how different reviewers scored the same paper was striking, at least to this author.

      • Secondly, the method used to assess whether the reviews for each paper tend to be similar (shown in Figure 3b) does not fully utilize the information contained in the data and could be replaced with another method. (In the paper 3 univariate regressions compare the sentiment scores for R1 vs R2, R1 vs R3, and R2 vs R3, which needlessly splits up the data in the case of papers with more than 2 reviewers, reducing power.) You could instead calculate the intraclass correlation coefficient (aka 'repeatability'), to determine what proportion of the variance in sentiment scores is between vs within papers (I suggest using the excellent R package rptR for this). Note that the sentiment scores are not normally distributed, and so regular regression (as you used) or one-way ANOVA (which you might be tempted to use for the ICC calculation) are not ideal - consider using a GLM or transformation (the rptR package automates the tricky calculation of repeatability for generalized models).

      I thank this reviewer for pointing me towards this option. I added this analysis to Fig. 3b, which confirmed the inconsistency in sentiment scores for reviews of the same paper (ICC = 0.055). As suggested by this reviewer, I decided to perform the ICC on log-transformed data, as ICC calculation is very sensitive to non-normally distributed data.
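
      For readers unfamiliar with the statistic, here is a minimal sketch of a one-way random-effects ICC for papers with unequal numbers of reviewers. It uses the classical ANOVA estimator rather than the rptR package recommended by the reviewer, it applies no transformation, and the scores in the example are invented, so it will not reproduce the reported ICC = 0.055.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC (ANOVA estimator), unequal group sizes allowed.

    `groups` is a list of lists: one inner list of review scores per paper.
    """
    groups = [g for g in groups if len(g) > 0]
    a = len(groups)                                  # number of papers
    n_total = sum(len(g) for g in groups)            # number of reviews
    grand = sum(sum(g) for g in groups) / n_total

    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (n_total - a)

    # Effective group size, which corrects for unequal numbers of reviewers.
    k0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (a - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Invented sentiment scores for four papers with 2-3 reviewers each.
example = [[62, 55, 70], [40, 48], [75, 71, 80], [30, 52]]
print(f"ICC = {icc_oneway(example):.3f}")
```

      An ICC near 1 means most of the variance lies between papers (reviewers agree); an ICC near 0 means scores for the same paper are no more alike than scores for different papers.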

      • Thirdly, an alternative and very plausible hypothesis for this lack of similarity (besides peer review being highly subjective) is that ChatGPT is estimating the "true sentiment" of a review (i.e. what the reviewer intended to say) with some amount of error (e.g. due to limitations/biases in the AI, or reviewers struggling to make themselves understood due to issues such as writing in a second language, typos, or writing under time pressure), which dilutes the similarity in the estimated sentiment of the reviews. In other words, if the true sentiment values are strongly correlated, but there is random error in how those values are estimated by ChatGPT, then the correlation between reviewer scores for each paper will tend to zero as the error tends to infinity. Furthermore a nebulous quality like "sentiment" cannot be fully summarised in a single variable running from -100 to +100, and if you had used a more multi-dimensional classification system for the reviews (or qualitative assessment by human readers) you might have found that there is a bit more correspondence (I'm speculating here, but I think you cannot really exclude this and the paper doesn't mention this limitation).

      This point is well taken. I added caveats to the Discussion section on Page 5. Altogether, after taking these caveats into account, I do believe that this analysis convincingly demonstrates subjectivity in the peer review of this subset of papers. That said, I hope that my re-written discussion and additional analysis have added the necessary nuance to this point.
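
      The attenuation argument in the reviewer's third point can be checked numerically. Under the simplifying assumptions used here (unit-variance true sentiments with correlation rho, plus independent Gaussian scoring error of standard deviation s on each review), the expected observed correlation is rho / (1 + s^2), which shrinks towards zero as s grows. The values of rho and s are illustrative, not estimates from the study.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def expected_r(rho, error_sd):
    """Expected correlation between two noisy measurements of
    unit-variance true scores that correlate at rho."""
    return rho / (1 + error_sd ** 2)


rng = random.Random(2)
rho, n = 0.8, 20000
# True sentiments of reviewer A and reviewer B for n simulated papers.
true_a = [rng.gauss(0, 1) for _ in range(n)]
true_b = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in true_a]


def observed_r(error_sd):
    """Correlation after adding independent scoring error to both reviews."""
    noisy_a = [a + rng.gauss(0, error_sd) for a in true_a]
    noisy_b = [b + rng.gauss(0, error_sd) for b in true_b]
    return pearson_r(noisy_a, noisy_b)


for sd in (0.0, 1.0, 3.0):
    print(f"error sd {sd}: observed r = {observed_r(sd):.3f}, "
          f"expected r = {expected_r(rho, sd):.3f}")
```

      A low observed correlation is therefore compatible with either genuinely inconsistent reviewers or consistent reviewers scored with substantial error; separating the two requires an independent estimate of the scoring error, such as a repeatability measurement.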

      In Figure 3C, you write "Contribution of paper scores to review time". This strongly implies to the reader that the sentiment scores inferred for the reviews have a causal effect on the review time. This is imprecise writing (since the scores were calculated by you after the papers were published, and thus cannot be causal - you mean that the actual reviews affected the review time, not the scores), but more importantly you cannot infer any causality here since your dataset is observational/correlational. You could fix this by re-phrasing to emphasise this, e.g. "Statistical associations between paper scores and review time".

      This is a very good point raised by this reviewer. I have corrected the phrasing so it no longer implies causality.

      For the analysis shown in Figure 4d and Figure 4e, I am not certain what you mean by "data split per lowest/median/highest sentiment score". This is ambiguous, and I am also not sure what the purpose of this analysis is or what it shows - I suggest re-writing for greater clarity (and ideally providing the code used in all your analyses) and perhaps revising the analysis. Additionally, an important missing piece of information from this analysis (and most analyses in the paper) is the effect size. For example, you don't report what is the difference in politeness score and sentiment score between male and female authors, and what is the SE and 95% CIs for this difference. From eyeballing the figure, it looks like the difference in politeness is about 4 points on your 200-point scale - this is small in absolute terms, but might be quite large in relative terms given that "politeness score" usually hovered around a small part of the full 200-point scale. What is this as a standardised effect size (i.e. in terms of standard deviations, as captured by effect sizes like Cohen's d and Hedges' g)? Calculating this (and its 95% CIs) would allow you to say whether the difference between genders is a "big effect", and give an idea of your confidence in your effect size estimate and any inferences drawn from it. You even discuss the effect size in your discussion, so it would help to calculate the standardised effect size. If you're not familiar with effect size and why it's useful, I found this paper very instructive: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-185X.2007.00027.x

      I agree with this reviewer that this phrasing was ambiguous. I now rephrased this on Page 4 of the manuscript:

      To study whether these more impolite reviews for female first authors were due to an overall lower politeness score, or due to one or some of the reviewers being more impolite, I split the reviews for each paper by its lowest/median/highest politeness score. I observed that the lower politeness scores for first authors with a female name were driven by significantly lower low and median scores (Fig. 4d, bottom panel). Thus, the least polite reviews a paper received were even more impolite for papers with a female first author.

      I also added effect sizes of the significant effects from Fig. 4 to its figure legend. Please note that the used statistical tests are non-parametric, so I reported the Hodges-Lehmann differences (which is the median of all possible pairwise differences between observations from the two groups).
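
      The two effect sizes discussed in this exchange are easy to compute by hand. The sketch implements the Hodges-Lehmann difference exactly as defined in the response (the median of all pairwise differences between the two groups) alongside the Cohen's d requested by the reviewer; the politeness scores are invented for illustration and the function names are not taken from the manuscript.

```python
from statistics import mean, median, variance


def hodges_lehmann(x, y):
    """Two-sample Hodges-Lehmann estimate: the median of all
    pairwise differences x_i - y_j."""
    return median([a - b for a in x for b in y])


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / pooled_var ** 0.5


# Invented politeness scores for two groups of papers (illustration only).
group_a = [70, 72, 75, 68, 74]
group_b = [66, 70, 69, 64, 71]

print(f"Hodges-Lehmann difference = {hodges_lehmann(group_a, group_b)}")
print(f"Cohen's d = {cohens_d(group_a, group_b):.2f}")
```

      The Hodges-Lehmann difference stays in the original score units and pairs naturally with rank-based tests, whereas Cohen's d is expressed in standard deviations and assumes roughly normal data.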

      "Double-blind peer review has been debated before, but has come under scrutiny for various reasons" - this is vague and unhelpful. I think it's worthwhile to properly engage with the debate and the substantial body of evidence in your paper, given your main focus is on potential bias in the review process based on authors' identities (e.g. gender, institutional prestige).

      I thank the reviewer for pointing this out. I rephrased this sentence to indicate that there is evidence that it helps to remove certain forms of bias (Page 5):

      To address this issue, double-blind peer review, where the authors' names are anonymized, could be implemented. Evidence suggests that this is useful in removing certain forms of bias from reviewing [8,9], but has thus far not been widely implemented, perhaps because some studies have cast doubt on its merits [21,22].

      I have also added a Supplementary Fig. 6 to this paper, in which I lay out how my tool can be used to study bias by applying it to single- and double-blinded reviews (see also my answer to the other question about this topic below).

      On a related note, in the first paragraph, when discussing the potential of single-blind review to allow reviewers to essentially discriminate against papers by women, there is a key missing citation. This year, the first truly experimental test of this hypothesis was published (DOI: 10.1111/1365-2435.14259); a journal conducted a randomised controlled trial in which submitted manuscripts were reviewed either single- or double-blind. They found no effect of author gender on reviewer ratings or editorial decisions (though there was an effect of review type on success rate of authors from different countries). It would be better to cite this instead of reference 6, which as you acknowledge is methodologically flawed. This paper is also worth a read given your focus on Nature journals: DOI: 10.1186/s41073-018-0049-z.

      This point is well taken. I now cite this paper (citation #8) and rephrased this part of the Introduction (Page 1).

      "Another - arguably more simple - solution [compared to double-blind peer review] could be for reviewers to be more mindful of their language use." Here, you seem to be saying that we don't need to blind author names during peer reviewers, because it would simpler if all reviewers were simply nicer! I object to this because A) double-blind review is easy to implement, and greatly reduces the opportunity to tune the review to the author's identity (and there is some experimental evidence that it works in this regard), and B) it seems like wishful thinking to say that we don't need to implement measures that reduce the scope for bias, because all reviewers could instead stop using impolite language.

      This is a very valuable comment. I rephrased this to emphasize that this is an additional measure.

      "reviewers may want to use ChatGPT to extract a politeness score for their review before submitting" Yes, that's an interesting idea, and I can imagine that some (probably small) proportion of reviewers will be interested in doing this. But I think you should think bigger about wholesale changes to the review system that are possible because of AI like ChatGPT. For example, the submission platforms where reviewers submit their reviewers (e.g. ScholarOne, Manuscript Central) could be updated to use AI to pre-screen draft reviews, and issue a warning to reviewers, like "Our AI assistant has indicated that the writing in this review might be impolite (example phrases here) - would you like to edit your review before you submit it?" Also, reviewcredit platforms like Publons could display not only the number of reviews that someone wrote, but an AI-generated assessment of how constructive, detailed, and polite their reviews are (this would help nudge people into writing better reviews, and also give credit where it's due to careful reviewers, which is part of the aim of Publons and similar platforms). This is just off the top of my head - there are many other good ideas about how AI could transform the peer review process. Indeed, AI is already good enough to generate quite useful peer reviews and constructive criticism of draft papers, and will surely get better at this... this surely has lots of implications for science publishing over the coming decades.

      These are great suggestions for implementation of this tool. I now end the first paragraph of the Discussion (Page 4) with the following sentence:

      Such an automated language analysis of peer reviews can be used in different ways, such as after-the-fact analyses (as has been done here), providing writing support for reviewers (for example by implementation in the journal submission portal), or by helping editors pick the best papers or most constructive reviewers.

      "Further research is required to investigate the reasons behind this effect and to identify in what level of the academic system these differences emerge." Here you could mention what this research would be - I think you'd need the full sample of reviewed papers, not just those that were accepted. Spell out what analyses would be required to test and falsify the various (very plausible and interesting) competing hypotheses that you mention for the male-female difference in sentiment scores.

      Great point. I added a Supplementary Fig. 6, in which I show a visual depiction of the experiments that can be performed to answer these questions.

      "areas of concern were discovered within the academic publishing system that require immediate attention. One such area is the inconsistency between the reviews of the same paper, highlighting the need for greater standardization in the peer review process." I disagree here. I think it is natural for there to sometimes be differences in how two or more reviewers rate the quality of a paper, even if the peer review process were carefully standardised (e.g. via the use of a detailed "peer review form", which helps guide reviewers to comment on all important aspects of the paper - some journals use these). This is because reviewers differ in their experience, expertise, or interests, and so some reviewers will catch mistakes that others miss, or request stylistic changes that others would not. More broadly, it's often not possible to write a version of the paper that satisfies all possible reviewers.

      I re-phrased part of the Discussion on Page 5 to indicate other sources of inter-reviewer variability. Specifically, I mention that some variability in sentiment can be expected based on the different backgrounds of the reviewers:

      Notably, some level of variability may be expected, for example due to different backgrounds, experiences, and biases of the reviewers. In addition, ChatGPT may not always reliably assess a review's sentiment, adding some spurious inter-reviewer variability.

      Yet, as also mentioned in my response to one of the previous questions, I still find the extremely low levels of consistency striking, even after taking these possible sources of inter-reviewer variability into account.

      "the maximum score an institution could receive was 100 (in 2023 this was Massachusetts Institute of Technology)" - this seems unnecessary information (just mention the score runs from 0-100).

      I agree with this reviewer that this was unnecessary information. This has been removed.

      "reviewers are generally familiar with the senior author of papers they review and thus are likely aware of their gender identity." This seems like a strong assumption, and you don't provide any evidence for it Speaking personally, as a reviewer and journal editor I am often not familiar with the senior author, or I am familiar with the first author - I am not sure how often I know the senior author but not the first author or vice versa. It's also not always the case that the first author is a junior scientist and the last author a senior, famous one, as you imply. I suggest that you use the same approach to score the gender of both author positions, namely inferring their gender programmatically from their name (I agree that generally the important thing for the purposes of this study is the gender that reviewers will infer from the name, not the author's actual gender, and so gender estimation from first names is the correct approach).

      I appreciate this reviewer raising this point, and I have carefully weighed the pros and cons of both approaches. Initially, my intention was to rely only on a programmatic method to identify authors' names. However, I came to realize that there were inaccuracies in senior author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to manually perform this task, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first author as well, this was far more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that, unlike senior authors, reviewers may not necessarily be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically possess knowledge of senior authors' identities, for example through conferences, whereas the same is not true for first authors. Yet, this may be different for other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process. I now explain in more detail why I made this decision on Page 7 of the manuscript.

      In the Abstract, you write "suggesting a gender disparity in academic publishing". This part of the sentence contains no information about what you think is the cause of the male/female difference, and no further interpretation of its ramifications, so I think you can just remove it (because "disparity" just means a difference, so you are effectively saying something redundant like "there was a difference between papers with male and female senior authors, suggesting there is a difference")

      I thank the reviewer for pointing this out. I replaced the latter part of this sentence with “(…) for which I discuss potential causes.”, which I think is better than a short summary of potential causes which may lack the nuance that such a topic deserves.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First of all, we would like to again thank the reviewers for their work. We appreciate the constructive review comments and useful suggestions to further improve our article. With those comments in mind, we have now revised our manuscript. Please see below for a point-by-point response (our responses in green) to all comments.

      Reviewer #1 (Recommendations For The Authors):

      Sun and colleagues outline structural and mechanistic studies of the bacterial adhesin PrgB, an atypical microbial cell surface-anchored polypeptide that binds DNA. The manuscript includes a crystal structure of the Ig-like domains of PrgB, cryo-EM structures of the majority of the intact polypeptide in DNA-bound and free forms, and an assessment of the phenotypes of E. faecalis strains expressing various PrgB mutants.

      Generally, the study has been conducted with a good level of rigor, and there is consistency in the findings. However, I do have some specific technical concerns relating to the study that necessitate the undertaking of additional experiments. These are summarized as follows:

      1) Recombinant PrgB188-1233 produced in the study purifies as a mixture of monomeric and dimeric species separatable by SEC. There is very limited discussion in the text re. the significance and/or implications of this. Is it feasible that the dimeric form is biologically relevant in the context of the in vivo situation? Or alternatively, is this simply an artifact of protein production?

      Experimental data that we published in 2018 indeed indicate that the dimer is relevant in vivo. We did not discuss this here because it was discussed in detail in the previous paper (Schmitt et al., 2018). We have now added more information on this point to the results section so that it is clearer to the reader (lines 114-116).

      2) The authors see no evidence of the adhesive domain of PrgB in their PX structure highlighting that this must have been cleaved during crystallisation. Is this claim supported by an inspection of the crystal packing? It could be that this region of the protein is dynamic within the context of the crystal and is thus not observed. This should be clarified in the text either way.

      The crystal packing does not provide any space for the PAD. We have added a sentence describing this to the results section (lines 122-124).

      3) The Cryo-EM structures reported are both at ~10-angstrom resolution. Are the authors truly confident in the placement of their crystal structures on these maps? Visual inspection indicates that their positioning of the PrgB domains into the EM envelopes is somewhat questionable. The authors need to provide some quantitative measures of the quality of their domain fitting. The narrative of the manuscript very much hinges on this being correct.

      This is something that the other reviewer also commented on. The fitting of the crystal structures in the maps is indeed not optimal, but was the best we could do with the available data. In line with point #6, we have now constructed new protein variants of the stalk domain (the four Ig-like domains) alone, and have assayed its interaction with the PAD in vitro using native gels and size exclusion chromatography. The outcome of these experiments is that the two domains do not interact in any substantial way on their own. Thus, the added experiments do not support the hypothesis that the PAD interacts with the Ig-like domains, at least not without the local high concentration provided by the linker region in the in vivo situation.

      To account for these new experiments, we have moved the cryo-EM structure to the supplement and rewritten this part of the manuscript to say that the cryo-EM data indicated that there might be an interaction, but that we have not been able to verify this in vitro, indicating that if the interaction exists at all it must have a low affinity and is likely not physiologically relevant. We have also modified the text throughout the manuscript accordingly.

      4) The manuscript would be significantly strengthened if the authors could include confirmatory hydrodynamic data in support of the observed conformational reorganization of PrgB in the presence of DNA. SAXS analysis of the DNA-free and bound complexes would be ideal for this and would also help address the issues raised above in pt 3.

      To analyze the PrgB radius with and without DNA, we tried both SEC-MALS and DLS experiments. It proved difficult to obtain precise and reproducible values, but the initial data indicated that no large changes were observed upon DNA binding. As we could also not measure a specific interaction between the PAD and the stalk in vitro, we did not perform SAXS experiments. As mentioned in the response to point #3, we have modified the results and discussion regarding the potential interaction of the PAD and stalk domains.

      5) The authors present binding studies of various PrgB mutant-expressing strains. A number of the mutations generated delete significant portions of the polypeptide. Can the authors confirm that these mutant proteins are correctly folded despite the introduced mutations? It could be that loss of function is simply a consequence of mutation-induced misfolding. I would like to see some confirmatory data (CD, SEC, etc.) in support of the foldedness of the mutant proteins.

      We cannot completely rule out that the folding of some of the variants is affected in E. faecalis. However, CD or SEC experiments would only give indications of the contrary if the overall fold had been majorly affected in an in vitro situation where the protein is not anchored to the E. faecalis cell wall.

      To alleviate this valid concern, we probed whether all variants are correctly exported and linked to the cell wall. To this end, we extracted the cell wall of E. faecalis producing wild-type or variant PrgB and performed a Western blot. The results of the Western blot with cell wall extract largely match the whole-cell experiments that were in the initial manuscript. If a protein variant were largely misfolded, it would likely not be targeted and linked to the cell wall, nor would it be stable in vivo. We have added these new data as a new fig 3 – figure supplement 1 and on lines 201-214.

      6) The authors suggest a direct interaction between the PAD and the stalk domains in PrgB. The discussion of this is very generic and no evidence to support this is provided other than the 10-angstrom resolution EM map. If they believe this to be the case, then additional evidence should be provided.

      Answer: As mentioned previously, we have now performed additional in vitro experiments to probe this potential interaction, but conclude that this indication from the EM data is likely not a real high-affinity interaction. In line with this, we have modified the results and discussion regarding this point; see also the responses to points #3 and #4.


      Reviewer #2 (Recommendations For The Authors):

      As currently presented, I don't feel that the cryoEM data support the authors' proposed model, largely because the fit of the crystal structures to the EM volumes does not seem entirely reasonable for the apo- dataset and because the EM volume for the ssDNA bound dataset is not even contiguous. For me to believe the model as it is currently built, I would want to see a dataset with the PAD deleted, showing that its proposed density disappears, or a dataset with a PAD-specific antibody as a fiducial marker. It would be nice to see some goodness of fit metric with a comparison to other crystal structures fit such low-resolution data as well. At the very least, the authors must include the standard cryoEM workflow supplementary figure showing representative micrographs, 2Ds, and 3Ds along with particle numbers.

      In line with the comments raised by reviewer #1, we have now added more experiments in which we analyzed the potential interaction between the PAD and the stalk domain. From these new data, it appears that they do not interact with any substantial affinity, at least not on their own without a linker region holding them together, and that this interaction, if it exists at all, is likely not physiologically relevant. The cryo-EM data have been moved to the supplement, as we agree with both reviewers that the resolution, and the fitted model, are not good enough to draw any hard conclusions. The standard table for the cryoEM workflow was present as supplementary table 2, where e.g. particle numbers are described, but we have now also added a new supplementary fig 2 – figure supplement 2 that shows the EM processing workflow, including representative micrographs, 2D and 3D classes. We debated whether we should remove the EM data, but decided against it in the interest of transparency and to explain why the interaction studies with the PAD and stalk domains were performed.

      The X-ray crystallographic structure is very nice, but I was a bit surprised by the R factors in Table 1. After downloading the structure factors and coordinates from the PDB (thank you for depositing before submission!) I was able to see quite a few positive peaks in the difference map that could probably use some cleaning up. I realize I may just be a bit of a masochist when it comes to adding/deleting waters and moving around side chains to get things just right, but for such lovely data, I would have liked to see the model polished up a bit more. I was going to say that the isopeptide bond should be modelled, but I can see from a cursory Google that the authors did in fact try to find a way to model this and that it is indeed a bit of a pain.

      The model refinement proved surprisingly recalcitrant with regard to the remaining difference density, so we took the decision to only model what was solidly there (which leads to slightly higher R factors). We did indeed try to model the isopeptide bond, but we did not find a good way to do so (despite trying quite extensively), and ended up defining it as a linker in the PDB file, so that the bond shows up when one opens the structure in, e.g., PyMOL.

      For protein production/purification in general I would have liked to see actual traces for the gel filtration and pure protein on a gel in a supplementary figure. I strongly believe that this type of information is so critical for future researchers looking to replicate or build upon published work so that they have some sense that what they are doing is working in the way it should be.

      We have now added a supplementary figure (as new Fig. 1 – figure supplement 1) that shows SEC and SDS-PAGE for the purification of PrgB188-1233.

      Finally, I think for the in vivo data it only makes sense to show the reader whether any or all the differences measured across your different mutants are statistically significant. Having done the graphing and analysis in GraphPad this should be a simple thing to achieve.

      We have now added a statistical test (one-way ANOVA) that shows the statistical significance of the differences between the mutants, and present the results in Fig 3 and Fig 4.
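
      For readers unfamiliar with the test named above, the following is a minimal sketch of what a one-way ANOVA computes. It is not the analysis used for Fig 3 and Fig 4; the function name `one_way_anova_f` and the example numbers are hypothetical placeholders, not data from the study. The sketch returns only the F statistic and its degrees of freedom; obtaining a p-value additionally requires the F distribution, which statistics packages provide.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups.

    Each element of `groups` is a list of measurements for one condition
    (for example, one strain). Hypothetical helper for illustration only.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual measurements around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Placeholder values, NOT measurements from the manuscript.
    example = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    f_stat, df_b, df_w = one_way_anova_f(example)
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A large F indicates that the differences between group means are large relative to the scatter within groups; pairwise comparisons between specific mutants would need a post hoc test on top of this.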

      Overall, I think it's a very nice paper and while I feel that the cryoEM data in its current form doesn't support the model of occlusion from PrgA, I also don't think that removing the cryoEM data and that specific mechanistic idea from the paper detracts from its overall message and impact.

      Thank you for those comments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      p. 5, l. 87-90: The control of flgM by OmrA/B (PMID 32133913) and the antisense RNA to flhD (PMID 36000733) are other examples of known regulatory RNAs that impact the flagellar regulon.

      We thank the reviewer for pointing out these references and have added citations to them (page 5, lines 87-91).

      p.11/Fig. 3: it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA. I realize that it is outside of the scope of this study, but have the authors considered the possibility that ArcZ or McaS could have a role in the previously reported repression of rpoS by LrhA (PMID 16621809)?

      We agree that it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA, and added mention of this regulatory connection (page 12, lines 247-250).

      p. 13/l. 272: I do not understand why the authors say that "r-proteins were almost exclusively found in chimeras with MotR and FliX and no other sRNAs...", given that several other chimeras between r-prot and other sRNAs are found

      While some r-protein-encoding genes were found with other sRNAs in RIL-seq datasets, MotR and FliX generally had the highest numbers. The text was revised to better describe the RIL-seq data for r-protein interaction partners (page 14, lines 291-295), and a new panel showing the S10 operon with all the interacting sRNAs was added to Figure 3—figure supplement 1B.

      Fig. 4 and 5: One possible improvement would be to more systematically assess the effect of base-pairing mutants of the sRNAs, such as MotRM1 or FliXM1 on fliC and rps/rpl genes in vivo. This is especially important for the mutants that affected the sRNA effects in the in vitro probing assays, such as UhpU-M2, MotR-M1 and FliX-S-M1 on fliC (Fig. S7)

      As suggested, we examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays, now shown in Figure 8—figure supplement 1, are consistent with our model as we observed delayed expression of fliC mRNA in motR-M1 background and premature expression in fliX-M1 background (page 21, lines 444-446, 449-453).

      Fig. 5: it may be worth including a schematic of the whole S10 operon to highlight its length and its organization?

      As suggested, a schematic representation of the S10 operon was added to Figure 3—figure supplement 1 with a summary of the RIL-seq data for this operon.

      Probing data (Fig. 5, S7 and S9): in general, it is difficult to differentiate the thin and thick brackets, and what is indicated by the dashed brackets is not always clear. Maybe using a color-code instead could help? Highlighting the predicted pairing regions on the different gels could be useful as well.

      We thank the reviewer for this suggestion and color-coded the brackets (Figure 5, Figure 4—figure supplement 2, and Figure 5—figure supplement 2). The correspondences to regions of predicted pairing are described in the figure legends.

      Fig. S10: The experimental evidence used to support FliX-dependent degradation of the rpsS mRNA is indirect (primer extension to observe higher levels of cleavage intermediates). It would be nice to be able to observe a decrease in the mRNA levels as well, either by Northern, or primer extension from a region more distant to the FliX pairing site.

      The S10 operon is long (~5 kb). We have tried multiple probes for this mRNA and detect many bands with each, likely due to extensive regulation of this operon. We think teasing out the origin of the different bands to appropriately interpret changes in patterns will require a significant amount of work.

      legend of Fig. S10: from the gel, it seems that only the plasmids differ in the samples, and it is not clear where the data corresponding to the WT strain mentioned in the legend is shown

      The samples shown in this figure are all for the indicated plasmids in the WT strain. We corrected the figure legend.

      Table S1: please define the NOR (normalized odds ratio?)

      The definition of Normalized Odds Ratio was added to the legend of Supplementary file 1.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Figure 1B. Please add a negative control (which could be in the supplementary section) from a large section showing transcripts that are not directly influenced by Hfq.

      We think the flgKLO browser in this figure serves as a negative control; flgK and flgL clearly are not enriched on Hfq in contrast to FlgO. Figure 1B was generated using published datasets that are easily accessible to the readers at a genome browser and show many other examples of transcripts that are not influenced by Hfq: https://genome.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/hub.hub.txt&hgS_loadUrlName=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/session.txt&hgS_doLoadUrl=submit

      Line 158. MotR* is a more abundant version of [the constitutively overexpressed] MotR. Is there a Northern or qPCR to confirm this? While I understand the relevance of these mutated constructs, their high expression can lead to artefactual effects.

      This is a valuable point and therefore we provided a northern blot to document the relative levels of MotR and MotR* (Figure 2—figure supplement 1A).

      Figure 2. The overexpression of MotR/MotR* from a plasmid is increasing the number of flagella. However, when the MotR gene is deleted, is there a reduction of the number of flagella? Same question with FliX: what happens when the fliX gene is deleted? According to the model described in the manuscript, we should expect fewer flagella in ΔmotR background and an increased number of flagella in ΔfliX background. Both Figure 2 and Figure 8 would benefit from additional experiments with deleted motR and fliX genes.

      We agree that experiments regarding the effects of endogenous sRNAs are important. We provided such data in Figure 8 and Figure 8—figure supplement 1 for MotR and FliX in a variety of assays: flagella numbers by electron microscopy, motility and competition assays, expression of flagellar genes by RT-qPCR and western analysis. The chromosomally expressed MotR-M1 and FliX-M1 base pairing mutants did show the expected phenotypes of reduced and increased numbers of flagella, respectively (Figure 8A-B). As suggested by reviewer 1, we added northern analysis that examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays are consistent with our model as we observed delayed expression of fliC mRNA in motR-M1 background and premature expression in fliX-M1 background. We went to the trouble of constructing strains carrying point mutations in the chromosomal copies of these genes rather than deletions to avoid interfering with the expression of motA and fliC given that MotR and FliX encompass the 5’ and 3’ UTRs, respectively.

      Figure 3 is key to demonstrating the sRNAs pairing with their specific targets and potential effect on bacterial swimming. However, these results would be more relevant with endogenous expression of the sRNAs and demonstration of their effects on the same targets. A Northern blot showing the overproduced sRNA level compared to endogenous sRNA level could help us appreciate the expression ratio.

      The levels of UhpU, MotR and FliX expressed from the overexpression plasmids are at least 100-fold higher than the endogenous levels. Thus, we agree that assays of chromosomal deletion/point mutants are important experiments. We did construct chromosomal uhpU-M1 and uhpU∆seed sequence mutants. However, under the conditions assayed, the uhpU chromosomal mutations did not result in observable effects on motility or FlhD-SPA protein levels. It is possible we would be able to detect differences between the wild type and uhpU chromosomal mutant strains under different growth conditions or in different assays, but this would require a significant amount of work. For many other sRNAs, chromosomal mutations have no or only subtle effects, suggesting redundancy between sRNAs or sRNA roles in fine-tuning gene expression.

      Figure 4. In panel B, the empty plasmid pZE alone seems to positively affect the flagellin expression when compared to the WT background. This can also be seen in Figure 4C. There is no fliC signal with empty plasmid pBR* but a strong fliC signal with empty plasmid pZE. Maybe the authors can explain this in the manuscript.

      With respect to panel B and Figure 4—figure supplement 1A, we agree that there is some variation between the levels of flagellin in the WT and pZE control samples, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4—figure supplement 1 to better document the changes in flagellin levels.

      With respect to panel C, the pBR samples were collected in crl+ background while the pZE samples were collected in crl- background, which explains the lack of fliC signal in the pBR control sample. This is now noted in the figure legend.

      In lines 154-157, the justification for using two plasmids is described. An IPTG-inducible Plac promoter, the pBR*, is used because the constitutive overexpression of UhpU is resulting in mutated UhpU clones. These observations suggest a toxic expression level of UhpU that the cell can only tolerate when the UhpU RNA is somewhat deactivated by mutations. This does not seem like a detail and could be discussed further.

      We agree with the reviewer that this observation is important and now mention that it hints at a critical UhpU role (page 8, lines 160-163).

      Figure 5E and I. While the bindings of MotR on rpsJ and Flix-S on rpsS are clear, the resolution of both gels in the areas of binding (upper part of both gels) could be improved.

      We found it tricky to choose the mRNA fragments for the in vitro structure probing for the regions of predicted pairing internal to CDSs. Given that we hoped to retain native RNA folding, we chose long fragments; for rpsJ, we started with the +1 of S10 leader and for rpsS, we started 147 nt into the CDS, a region that overlaps the region that was cloned to the rpsS-rplV-gfp fusion. Consequently, the region of base pairing is in the upper part of both gels. The gels were already run for an unusually long time. Thus, we do not think the resolution could be improved further. Nevertheless, we think the region of protection is evident for both mRNAs.

      Minor comments:

      Fig 1B. The promoter symbols are extremely small, please increase the size.

      As suggested, we have enlarged the promoter symbols in Figure 1B as well as in Figure 3A.

      Line 211. "the lrhA mRNA has an unusually long 5´ UTR". How long exactly?

      The 5’ UTR of the lrhA mRNA is 371 nt long. This is now mentioned in the text (page 11, line 224).

      Line 320. Should "Fig 9C" be "Fig S9C" instead?

      We thank the reviewer for noticing this typo. Callouts to supplementary figures have now been renumbered per eLife format.

      Line 384. Something seems to be missing in the sentence "a representative combined class 2 and 3 promoter".

      The sentence has been modified to clarify the designation (page 19, lines 409-411).

      Reviewer #3 (Recommendations For The Authors):

      Recommendation to clarify/strengthen the presentation of science in the paper:

      Lines 102-103: Can the authors provide some more information on how the sRNAs were initially discovered to be potentially sigma-28 dependent and selected?

      As suggested, we expanded the section discussing the discovery and the selection of these sRNAs (page 6, lines 104-109).

      Lines 192-193: It would be helpful to provide a bit more information in the main text about what are the different RIL-seq data sets (18 in total).

      As suggested, we now provide more details about the different RIL-seq datasets we used in the analysis (page 10, lines 202-205).

      It would be helpful to specify the criteria for "top" interactions in targets retrieved from RIL-seq data (Table S1 and text, e.g., line 273): e.g. number of conditions, number of chimeras, etc.

      As suggested, we now more explicitly specify the criteria for selecting targets to characterize (page 10, lines 205-206).

      Fig. 4B/ S6 and line 242: The flagellin amount in the empty vector control (pZE) looks higher than in WT, and the stated effect of MotR/MotR* OE on flagellin is not very clear from the blot. The "cross-reacting band" above flagellin also seems to vary among strains. Could the authors include a quantification of flagellin protein amount and normalize relative to a housekeeping protein (e.g., GroEL), instead of Ponceau S as loading control?

      We agree that there is some variation between the levels of flagellin in the WT and pZE control sample, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4—figure supplement 1 to better document the changes in flagellin levels.

      Figure legends: It would be helpful to have a bit more information about the method used/displayed image rather than stating results in the legends.

      As suggested, we now provide a bit more information about the methods used/displayed image in the figure legends to allow for easier comprehension of the data presented in the figures (while trying to balance this with the length of the legends).

      Fig. 2: Please include a scale for all electron microscopy images or, if it is the same for all panels, state it in the figure legend. Moreover, the same image is used for the pZE control in panel C, E and Figure S4A/C. It would be better to show different fields of bacteria for the pZE sample.

      As is now mentioned in the legends to Figure 2, Figure 2—figure supplement 2, and Figure 8, the same scale was used for all panels. We thought it was better to show the same image for the pZE control in the different panels to emphasize that these samples were all analyzed on the same day.

      Fig. 2: The sRNA OE strains seem to show some heterogeneity in cell length (pZE-MotR) or width (pZE-FliX). The authors could, e.g., check whether this is a phenotype correlated to sRNA OE by quantifying these parameters for different fields and comparing to WT or comment on this in the text if this is not consistently seen.

      We also were intrigued by the slightly different sizes and widths of cells in the EM images. However, our statistical analysis did not reveal significant differences between the different samples. We now comment on this (page 53, lines 1178-1179).

      As a follow-up to this study, it would be interesting to assess the impact of MotR and FliX regulation of ribosomal protein synthesis on overall ribosome activity (e.g., via Ribo-seq), also considering that antitermination regulates rRNA transcription. In the case of MotR, the authors suggest that MotR upregulation of S10 protein might not only impact antitermination, but also lead to the formation of more active ribosomes that would increase flagellar protein synthesis (lines 359-362). However, in the RNA-seq performed in OE MotR* several transcripts encoding rRNA and ribosomal proteins are significantly downregulated compared to EVC (Supplementary Table S2). Could the authors comment on this?

      We share the reviewer’s enthusiasm for follow-up work and thank the reviewer for the suggested experiments. We hope we will be able to decipher the full mechanism of MotR and FliX action on ribosomal protein synthesis in future experiments. The observation that some ribosomal protein-coding gene levels are reduced in the RNA-seq experiment with overexpression of MotR* is interesting but we do not have an explanation other than the fact that the samples were collected early in exponential growth. We now mention the observation in the text (page 19, lines 404-407).

      Considering that OE of the WT MotR appears to increase fliC mRNA abundance but has no strong impact on flagellin protein levels, can the authors speculate what is the physiological relevance of MotR* for flagellin production?

      We agree that, while we do see significant increases in the flagella number and fliC mRNA abundance with MotR and MotR* overexpression, the western analysis did not reveal a striking increase in flagellin levels. We also wonder how MotR strongly increases the flagella number, which requires flagellin subunits, but only has a weak effect on the intercellular levels of flagellin. One possible explanation is that it is more difficult to see significant increases for a protein whose levels are high to begin with. These points are now discussed (page 13, lines 264-269).

      Fig. 4C: The pZE samples seem to show variable expression of fliC mRNA although the samples are collected at the same timepoints. Try to clarify in the text.

      The northern membrane on the bottom was exposed for a longer time due to the lower fliC mRNA levels in the samples with FliX overexpression. We now note these differences in the legends to Figure 4 and Figure 4—figure supplement 1.

      Fig. 7/S13: While a volcano plot for MotR* is shown in Fig. 7A, quantification of GFP reporter fusion regulation is shown for MotR. Quantifications of MotR* are shown in Fig. S13. Maybe swap the figures.

      Given that the data for MotR* are in the supplement figures for all other figures, we would also like to retain this distribution for Figure 7 (aside from the volcano plot, since this experiment was only carried out for MotR*).

      Lines 135-136 (Fig. S1B): on the northern blots, only sRNA levels of MotR are comparable between rich and minimal media (excluding M63 G6P and M63 gal). Most other sRNA seem to be more abundantly expressed in minimal media conditions compared to LB. Maybe rephrase.

      As suggested, the text was revised to point out the differences in the sRNA levels for cells grown in different growth media (page 7, lines 140-144).

      Lines 229-234: this paragraph seems not directly connected to the aims of the study (i.e., no effect on motility tested of these other sRNAs) and could be removed (or moved to discussion).

      We appreciate the reviewer’s suggestion but, considering Reviewer 1’s comments, think that showing the regulation of lrhA by other sRNAs has value in highlighting the complexity of the regulatory circuit. We have revised the text to incorporate Reviewer 1’s suggestions and better explain why these results are intriguing (page 12, lines 247-250).

      Line 200 and Fig. S5: For FlgO sRNA only one target was identified in RIL-seq. This gene could be specified and labeled in Fig. S5 and the text. Does FlgO also bind ProQ?

      We now mention the single FlgO target (gatC) detected in four datasets (page 10, lines 213-215). In Figure 3—figure supplement 1, we labeled only targets that we followed up with in the current study. Therefore, to be consistent, we prefer not to label gatC in the FlgO plot. FlgO was found to co-immunoprecipitate with ProQ but at much lower levels than with Hfq, and to have very few RNA partners (Melamed et al., 2020).

      Lines 493-498: It is mentioned that the four sRNAs were also detected in recent RIL-seq experiments of Salmonella and EPEC. Are any of the here identified targets also found in other species or was none detected as analyses were carried out under conditions that do not favor flagella expression?

      The targets identified in this study were not detected in the Salmonella and EPEC RIL-seq datasets. However, the Salmonella and EPEC experiments were carried out under different growth conditions. Based on the sequence conservation of the Sigma 28-dependent sRNAs across several bacterial species (Figure 8—figure supplement 2), we do think overlapping targets will be found in other bacterial species under the appropriate growth conditions.

      The strongest evidence of MotR dependent target regulation is the one on rpsJ, which does not necessarily require the additional experiments with MotR. Since the authors were able to show upregulation of the rpsJ-gfp reporter upon OE of MotR WT, it would have strengthened the results if they had performed the experiments in Fig. S8C with MotR WT. Similarly, as an increase in flagella number was seen with OE of MotR WT in Fig. 2A, the effect of OE of S10∆loop could be compared to OE of MotR instead of OE of MotR* (Fig. 6A). At least it would be helpful to briefly comment on why MotR* was used instead of MotR WT for these experiments.

      As suggested, we now state that MotR* was used in some assays given its stronger effects for some phenotypes (page 10, lines 196-197). Given that we established that MotR and MotR* cause the same effects, with increased intensity for the latter, we think it is reasonable to use MotR* in some of the experiments.

      Lines 482-491 and 508-511: The authors discuss that both UhpU sRNAs and RsaG sRNA from S. aureus are derived from the 3'UTR of uhpT, but conclude there is no overlap regarding flagella regulation, suggesting independent evolution of these sRNAs. However, the authors also mention that UhpU sRNA has many additional targets beyond LrhA involved in carbon and nutrient metabolism. Thus, maybe regulation of metabolic traits could be a conserved theme and function for UhpU and RsaG? Maybe try to comment on or better connect these two parts in the discussion.

      As suggested, we now comment on the possibility of the regulation of metabolic traits being a conserved theme and function for UhpU and RsaG (page 24, lines 520-527).

      Check the text for consistency regarding the use of italics for gene names (e.g., legend of Figs. 7 and 8)

      The text was corrected.

      Please introduce abbreviations, e.g., G6P (line 139), REP (line 150), ARN (line 258), NOR/U (Table S1 legend)

      As suggested, we now introduce the abbreviations for G6P (page 7, line 142), REP (page 8, lines 155-156), and NOR (Supplementary file 1 legend). Regarding ARN, these sequences are already written in parentheses in the same sentence. However, we revised this to “ARN motif sequences” (page 13, line 278).

      Fig. S1A: Highlight REP sequence mentioned in text (line 150).

      REP sequences are now highlighted in gray in Figure 1—figure supplement 1A.

      Fig. S1C: It would be helpful to list number nt positions on the sRNAs based on full-length transcripts.

      The corresponding positions based on the full-length transcripts have also been added to this figure.

      Fig. S2: Adjust the position of UhpU-S label.

      UhpU-S label position was adjusted.

      Fig. S6: Include UhpU in the figure title.

      UhpU was added to the title.

      Fig. S10: It would be helpful to indicate on the figure (or state more clearly in the legend) which RNA was extracted from WT or ΔfliCX background.

      The samples shown in the Figure are all in a WT strain. We corrected the figure legend accordingly.

      Line 290: the effect is on flagella number, not motility.

      This typo is now corrected (page 15, line 312).

      Fig. S8: One-way ANOVA (panel A legend)

      This typo is now corrected (page 64, line 1433).

      Line 320: Fig. S9C instead of 9C

      We thank the reviewer for noticing the typo. The numbering of the supplementary figures has now been changed to the eLife format.

      It would be helpful to add reference for statement in line 57.

      A reference to (Fitzgerald et al., 2014) was added as suggested.

      Add PMID:32133913 as reference for post-transcriptional regulation of the flagellar regulon in the introduction (lines 87-91)

      The indicated reference was added as suggested (page 5, lines 87-91).

      Legend Fig. S6: expand view -> expanded view

      This typo is now corrected (page 63, line 1406).

      line 513: sRNA -> sRNAs

      This typo is now corrected (page 25, line 549).

      Fig. 8G: Maybe include lrhA as target of UhpU sRNA at top of the cascade.

      As suggested lrhA has been added as a target of UhpU at the top of the cascade.

  3. Sep 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      • The improvement of the gene annotations of the ferret genome was an important part of this study, and so I would recommend that the authors have a results section and figure dedicated to documenting this.

      Thank you so much for appreciating our efforts on improving gene models, which was indeed a critical part in this study. According to the reviewer’s suggestion, we added a new section to the main text, “Improvement of the gene model for scRNA-seq of ferrets” with a figure (Fig.1 C, D, E).

      • Are the references to figure S8A, B alright (line 306)? In fact, that entire figure was not well described or out of place. In general, unlike the rest of the manuscript, the section dealing with the human-ferret comparison was a little bit confusing, and the figure legends were not extremely helpful. Could the authors please revisit the main text and figure legends of this section for clarity?

      We agree with the reviewer’s recommendation. We removed the references to Figure S8A, B. In their place, we explained our reasoning more carefully: “We chose a recently published human dataset (Bhaduri et al, 2021) for comparison, because this study contains a GW25 dataset, which included more tRG cells than previous studies that did not contain GW25 data. Furthermore, we used only data at GW25.”

      We also revised several parts of this section, adding explanations to make it easier to follow, and did the same in the legends of Fig. 7 and Fig. S8.

      Reviewer #2 (Recommendations For The Authors):

      I have a few very minor comments on the manuscript.

      • I would caution the authors against claiming that they have demonstrated bona fide generation of ependymal cells from tRG cells. While the expression of FOXJ1 is a very good indication, they have not demonstrated the morphological transformation of a tRG cell into an ependymal cell.

      We agree with the reviewer’s opinion. We never thought that we had proved that tRG differentiates into ependymal cells, but we consider this highly likely to be the case (we use the term “suggest” in the abstract). To prove this genetically, we tried extensively to knock the EGFP gene into the CRYAB gene by the CRISPR/Cas9 method, which would allow us to show the lineage relationship between tRG and ependymal cells. However, after a year of trials we have so far failed to do this. We also tried simply labeling tRG with EGFP and following it in slice culture.

      However, we failed to maintain the slices in culture long enough to observe the transition from the tRG shape to the ependymal shape, which seems to be a slow process. What we could do was to observe the transition from a single cilium to multiple cilia, which is part of the morphological transition from epithelial neural stem cells such as radial glia to an ependymal-like sheet form. Proving this transition from tRG to ependymal cells (and also astrocytes) is one of the most important open issues, and it needs a new idea, technique, or strategy.

      • There are several typos throughout the manuscript that I would recommend fixing for example, page 5 line 123 says "OLIGO2" instead of "OLIG2"

      Thank you so much. We carefully read the manuscript and corrected typos. We hope we have corrected all of them.

      Besides these two points, the manuscript is already prepared to a high standard.

      We really appreciate the reviewers’ efforts to finish their reviews in a short time, in response to our request related to the first author’s thesis application.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable investigation of the chromatin dynamics throughout the cell cycle by using fluorescence signals and patterns of GFP-PCNA and CY3-dUTP, which labels newly synthesized DNA. The authors report reduced chromatin mobility in S relative to G1 phase. The technology and methods used are solid, but the significance of the work is reduced by the model system employed, the HeLa cell line, which has a greatly abnormal genome.

      We have obtained data from a diploid human cell that validates the reduction of S-phase chromatin mobility.

      Public Review:

      The manuscript presented by Pabba et al. studied chromatin dynamics throughout the cell cycle. The authors used fluorescence signals and patterns of GFP-PCNA (GFP tagged proliferating cell nuclear antigen) and CY3-dUTP (which labels newly synthesized DNA but not the DNA template) to determine cell cycle stages in asynchronized HeLa (Kyoto) cells and track movements of chromatin domains. PCNA binds to replication forks and forms replication foci during the S phase. The major conclusions are: (1) Labeled chromatin domains were more mobile in G1/G2 relative to the S-phase. (2) Restricted chromatin motion occurred at sites in proximity to DNA replication sites. (3) Chromatin motion was restricted by the loading of replisomes, independent of DNA synthesis. This work is based on previous work published in 2015, entitled "4D Visualization of replication foci in mammalian cells corresponding to individual replicons," in which the labeling method was demonstrated to be sound. Although interesting, reduced chromatin mobility in S relative to G1 phase is not new to the field.

      It was first shown in yeast (Heun et al. 2001; DOI:10.1126/science.1065366) that the S-phase mobility is reduced compared to the G1 phase. This was followed by other papers showing the same in yeast [(Gasser 2002; DOI: 10.1126/science.1067703), (Smith et al. 2019; DOI: 10.1091/mbc.E19-08-0469)]. The relation between chromatin motion and cell cycle progression in the mammalian genome is less studied. Over recent years there have been a few studies that addressed chromatin mobility and cell cycle progression but from a different perspective. In the publication Nozaki et al. (2017; DOI:10.1016/j.molcel.2017.06.018) chromatin motion analysis was performed on single histones. The study did not find a significant change of histone/nucleosome mobility measured during cell cycle progression. Using CRISPR/dCas9 to label random DNA loci, Ma et al. (2019; DOI:10.1083/jcb.201807162) found that chromatin motion in S-phase was significantly lower than in the G1 phase. However, most of the studies measure the chromatin motion using either insertion of ectopic loci or proteins marking the loci (dCas9) or histones. Using either ectopic loci addition or CRISPR/dCas9 might have an effect on the chromatin mobility itself and measuring single histone motion is not equivalent to measuring the motion of DNA segments. We, therefore, opted to label the DNA directly using the replication of the DNA. In this manner we preserve the native chromatin structure and, thus, motion.

      Importantly, in addition to measuring decreased DNA motion in S-phase, our study indicates that it is not the DNA synthesis per se but the loading of replisomes onto chromatin that slows down its motion. This allowed us to propose a mechanism on how chromatin motion is affected by DNA replication in S-phase.

      The genome in HeLa cells is greatly abnormal with heterogeneous aneuploidy, which makes quantification complicated and weakens the conclusions.

      We agree that the HeLa cells are aneuploid and we have addressed the heterogeneity of HeLa Kyoto within our detection methods (for clarification see point 3). To validate our conclusions in normal diploid human cells, we performed the chromatin mobility analysis using human fibroblasts (IMR90 cells in figures 2, 3 and S2) and plotted the MSD curves for different cell cycle stages. The outcome of this analysis showed that the mobility of chromatin in diploid fibroblasts in S-phase is lower than in G1 and G2. In fact, this effect is stronger in IMR90 cells than in HeLa Kyoto cells. Hence, this is not an aneuploid tumor cell phenomenon.

      The manuscript is difficult to follow in places due to insufficient clarity. The manuscript should be written in a way that can be understood without referencing previous articles. Overall, the work is moderately impactful to the field.

      Major recommendations:

      1) In Figure 1B, the illustration and images for S phase are confusing. The author should specify which is early S and which is late S. Do the yellow circles represent GFP-PCNA foci? How did the authors distinguish mid S from early S and late S (in Figure 2)? Are all images in Figure 1 scaled to the same contrast threshold?

      The yellow circles correspond to the colocalized signal of GFP-PCNA and Cy3-dUTP that overlap and represent the labeled chromatin sites that are replicated in the next cell cycle.

      We clarified all the points mentioned above and updated figure 1 and figure 2 accordingly.

      2) In Figure 2B, the y-axis is marked as "Frequency of cells" but the equation listed below is counting DNA (per focus). How to convert DNA (per focus) to DNA (per cell)? The x-axis is marked as "Genome size" without any unit (e.g., kb? Mb?) The x-axis seems to be the C factor, not the genome size.

      To determine the amount of DNA present in each labeled DNA focus, we first segmented the whole nucleus and measured the total DAPI intensity (DNA amount), called IDNA total. The labeled replication foci were then segmented and the intensity in each segmented focus was measured (IRFi). Over S-phase progression, the amount of DNA increases twofold from early to late S-phase. The cell cycle stage of each cell was determined using the PCNA pattern. By plotting the frequency (number of cells) against the genome content normalized to the G1 stage, we calculated the relative genome size, also called the cell cycle correction factor, for each stage from G1 to G2. The ratio of the DNA intensity in a labeled replication focus (IRFi) to the total DAPI intensity of the nucleus (IDNA total) gives the fraction of the nuclear DNA present in that focus. This ratio was then multiplied by the genome size (Kbp) of HeLa Kyoto cells, which was measured and published in Chagin et al. (2016; DOI:10.1038/ncomms11231), giving the approximate amount of DNA in each labeled replication focus in Kbp. Since the genome duplicates over the cell cycle, the DNA content derived from IRFi was corrected for the cell cycle stage (determined by PCNA) by multiplying by the cell cycle correction factor.
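
The calculation described in this response can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis code: the function name, argument names, and the numbers in the usage example (including the genome size) are hypothetical placeholders, and the sketch assumes the three factors are simply multiplied as the text describes.

```python
def dna_per_focus_kbp(focus_intensities, total_dna_intensity,
                      genome_size_kbp, cell_cycle_factor):
    """Estimate the DNA content (in Kbp) of each labeled focus.

    focus_intensities   : intensity measured in each segmented focus (IRFi)
    total_dna_intensity : total DAPI intensity of the nucleus (IDNA total)
    genome_size_kbp     : genome size of the cell line in Kbp (placeholder value below)
    cell_cycle_factor   : relative genome size of the cell's stage, normalized to G1
    """
    if total_dna_intensity <= 0:
        raise ValueError("total_dna_intensity must be positive")
    return [
        # fraction of nuclear DNA in the focus, scaled to Kbp and cell cycle stage
        (intensity / total_dna_intensity) * genome_size_kbp * cell_cycle_factor
        for intensity in focus_intensities
    ]


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
example = dna_per_focus_kbp(
    focus_intensities=[10.0, 20.0],
    total_dna_intensity=1000.0,
    genome_size_kbp=1_000_000.0,   # placeholder, not the published HeLa Kyoto value
    cell_cycle_factor=1.5,         # e.g. a cell roughly halfway through S-phase
)
```

Because the correction factor is applied per cell, two foci with the same intensity fraction are assigned more DNA in a late S-phase cell than in a G1 cell.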

      3) HeLa cells are known to be highly heterogeneous and heavily aneuploid. Cells in one sample have different numbers of chromosomes ranging from 50 - 80. Therefore, GS (genome size) for each cell should not be the same. Using one constant GS in the equation for every cell introduces errors. Has the cell-to-cell variation been considered and corrected in the data? If not, the authors should provide information regarding cell-to-cell variations, such as the intensity variation of nuclear DAPI signals in synchronized cells.

      It is true that the HeLa genome is aneuploid. Genomic heterogeneity has been documented when different HeLa strains are compared, as in Frattini et al. (2015; DOI:10.1038/srep15377), who show variability in genome and RNA expression profiles and small genomic rearrangements among HeLa strains. To our knowledge, however, it has not been studied extensively or shown whether this heterogeneity and aneuploidy also amount to cell-to-cell variation within a strain. We therefore performed a control experiment to assess the variability between HeLa Kyoto cells: cells were either synchronized or not, stained with DAPI, and the DNA content profiles of all cells were plotted as a histogram (supplementary figure 1B). After synchronization, the cells in G1 have similar DNA content, showing that cell-to-cell variability is negligible within our detection methods. Nonetheless, we have obtained data using normal diploid human fibroblasts, which validated our outcome.

      STABLE:

      Macville, Merryn, et al. "Comprehensive and definitive molecular cytogenetic characterization of HeLa cells by spectral karyotyping." Cancer research 59.1 (1999): 141-150.

      UNSTABLE:

      Liu, Yansheng, et al. "Multi-omic measurements of heterogeneity in HeLa cells across laboratories." Nature biotechnology 37.3 (2019): 314-322.

      Landry, Jonathan JM, et al. "The genomic and transcriptomic landscape of a HeLa cell line." G3: Genes, Genomes, Genetics 3.8 (2013): 1213-1224.

      4) The chromatin foci are in a variety of sizes and intensities. How were boundaries of foci determined? Weak foci were picked up in one image but not in another. This is a concern because the size of the chromatin domain could influence mobility measurement. The authors should provide control experiments or better explanations for detecting and selecting chromatin foci.

      The method for detecting chromatin foci is described in “Materials and Methods” section “Automated tracking of chromatin structures in time-lapse videos”. “Chromatin structures are detected by the spot-enhancing filter (SEF) (Sage et al., 2005; doi:10.1109/TIP.2005.852787) which consists of a Laplacian-of-Gaussian (LoG) filter followed by thresholding the filtered image and determination of local maxima. The threshold is automatically determined by the mean of the absolute values of the filtered image plus a factor times the standard deviation.” For reasons of consistency, we used the same threshold factor for all images of an image sequence. Therefore, depending on the intensity distribution in an image, it can happen that weak foci are not detected in some images. Alternatively, one could manually adapt the threshold factor for all single images, which, however, would be subjective. We now added the information that we used the same threshold factor for all images of an image sequence.
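
To make the quoted detection steps concrete, here is a minimal pure-Python sketch of a spot-enhancing filter: a Laplacian-of-Gaussian (LoG) filter, an automatic threshold, and a local-maximum search. It is not the authors' implementation. The kernel radius, the edge handling, the 3x3 neighbourhood, the default `factor`, and the choice to take the standard deviation of the filtered image itself are all assumptions made for illustration; in practice a library routine such as `scipy.ndimage.gaussian_laplace` would normally replace the hand-written filter.

```python
import math
from statistics import mean, pstdev


def log_kernel(sigma):
    """Sign-inverted LoG kernel, so that bright spots give positive responses."""
    radius = int(math.ceil(3 * sigma))  # assumed truncation radius
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            row.append((2 * sigma ** 2 - r2) * math.exp(-r2 / (2 * sigma ** 2)))
        kernel.append(row)
    # Remove the mean so flat image regions produce a zero response.
    offset = mean(v for row in kernel for v in row)
    return [[v - offset for v in row] for row in kernel]


def filter_image(image, kernel):
    """Apply the kernel to a 2D list of intensities, clamping at the image edges."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-r, r + 1):
                yy = min(max(y + ky, 0), h - 1)
                for kx in range(-r, r + 1):
                    xx = min(max(x + kx, 0), w - 1)
                    acc += kernel[ky + r][kx + r] * image[yy][xx]
            out[y][x] = acc
    return out


def detect_spots(image, sigma=1.0, factor=3.0):
    """Return (row, col) positions of local maxima above the automatic threshold."""
    filtered = filter_image(image, log_kernel(sigma))
    values = [v for row in filtered for v in row]
    # Threshold: mean of absolute filtered values plus factor times the standard deviation.
    threshold = mean(abs(v) for v in values) + factor * pstdev(values)
    h, w = len(image), len(image[0])
    spots = []
    for y in range(h):
        for x in range(w):
            v = filtered[y][x]
            if v <= threshold:
                continue
            neighbours = [
                filtered[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (j, i) != (y, x)
            ]
            if all(v > n for n in neighbours):
                spots.append((y, x))
    return spots


# Toy image: one bright pixel on a dark background.
toy = [[0.0] * 15 for _ in range(15)]
toy[7][7] = 100.0
spots = detect_spots(toy)
```

Because the threshold is derived from each image's own filtered intensities, keeping `factor` fixed across a sequence, as the response describes, can still leave a weak focus above the threshold in one frame and below it in another.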

      5) In Figure 3, the authors combined MSD from G1 and G2 in one group. Has any published data suggested that chromatin dynamics are the same in G1 and G2?

      To clarify this we separated G1 and G2 mobility measurements in supplementary figure S2 and updated the figures and text accordingly.

      6) In Figure 3B, cytoplasmic CY3-dUTP foci are found in the G1/G2 and S images. Are these CY3-dUTP aggregates? If so, are they also found in the nucleus? What is the mobility of the cytoplasmic CY3-dUTP foci?

      These are aggregates and not found in the nucleus. These foci were excluded from the analysis by using a nuclear mask based on the PCNA signal. This information was added to the figure 3B legend.

      7) In Figure 4, how is colocalization defined? 1.8 um is approximately the size of a chromosome territory, which is much larger than 0.5 Mb. Two foci that are 1.8 um apart should not be considered in the same chromosome.

      We agree that "colocalized" would imply that the signals overlap. We therefore updated the figures and text to refer to a center-to-center distance, or proximity, analysis.

      Minor comments:

      1) Figure 3D should be presented by a box and whisker plot. The histogram does not show an actual distribution of the data.

      The histogram in figure 3D shows the average mean square displacement values for the different cell cycle stages, which are the same data shown in the table. Therefore, the histogram was removed and the table in figure 3C is retained.

      2) Please explain Figure 3C error bars in the figure legend. Are they SD?

      The error bars of the MSD curves (highlighted in bright color around the curves) in figure 3C show the standard error of the mean (SEM) representing the deviations between the MSD curves for an image sequence. We clarified this in the legend of Figure 3C.
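
As a rough illustration of how an averaged MSD curve and its SEM band could be obtained, the sketch below computes a time-averaged MSD per track and then the mean and SEM across tracks at each time lag. It is a simplified assumption about the procedure, not the authors' pipeline: the function names are hypothetical, tracks are taken to be 2D, and the SEM is computed as the sample standard deviation divided by the square root of the number of curves.

```python
import math
from statistics import mean, stdev


def msd_curve(track, max_lag):
    """Time-averaged mean square displacement of one track [(x, y), ...].

    Returns one value per lag from 1 to max_lag (in frames).
    The track must contain more than max_lag positions.
    """
    curve = []
    for lag in range(1, max_lag + 1):
        squared = [
            (x2 - x1) ** 2 + (y2 - y1) ** 2
            for (x1, y1), (x2, y2) in zip(track, track[lag:])
        ]
        curve.append(mean(squared))
    return curve


def mean_and_sem(curves):
    """Average several MSD curves lag by lag and return (means, sems).

    Needs at least two curves, since the SEM uses the sample standard deviation.
    """
    n = len(curves)
    means, sems = [], []
    for values in zip(*curves):
        means.append(mean(values))
        sems.append(stdev(values) / math.sqrt(n))
    return means, sems


# Two toy tracks moving at constant speed along x (1 and 2 units per frame).
slow = [(0, 0), (1, 0), (2, 0), (3, 0)]
fast = [(0, 0), (2, 0), (4, 0), (6, 0)]
means, sems = mean_and_sem([msd_curve(slow, 2), msd_curve(fast, 2)])
```

Plotting `means` with a shaded band of plus or minus `sems` gives the kind of curve with a highlighted error region that the legend describes.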

      3) In Figure 5C, some western blotting results seem to be assembled from replicate experiments. Comparing signals from one experiment with the same background is suggested.

      We made sure that the western blots from the same replicates are cropped and the information is also added to the respective figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their thorough assessment of our study, and their acknowledgment of its strengths and weaknesses. We did our best below to address the weaknesses raised in their public review, and to comply with their recommendations.

      Reviewer #1 (Public Review):

      Segas et al. present a novel solution to an upper-limb control problem which is often neglected by academia. The problem the authors are trying to solve is how to control the multiple degrees of freedom of the lower arm to enable grasp in people with transhumeral limb loss. The proposed solution is a neural network based approach which uses information from the position of the arm along with contextual information which defines the position and orientation of the target in space. Experimental work is presented, based on virtual simulations and a telerobotic proof of concept.

      The strength of this paper is that it proposes a method of control for people with transhumeral limb loss which does not rely upon additional surgical intervention to enable grasping objects in the local environment. A challenge the work faces is that it can be argued that a great many problems in upper limb prosthesis control can be solved given precise knowledge of the object to be grasped, its relative position in 3D space and its orientation. It is difficult to know how directly results obtained in a virtual environment will translate to real world impact. Some of the comparisons made in the paper are to physical systems which attempt to solve the same problem. It is important to note that real world prosthesis control introduces numerous challenges which do not exist in virtual spaces or in teleoperation robotics.

      We agree that the precise knowledge of the object to grasp is an issue for real world application, and that real world prosthesis control introduces many challenges not addressed in our experiments. Those were initially discussed in a dedicated section of the discussion (‘Perspectives for daily-life applications’), and we have amended this section to integrate comments by reviewers that relate to those issues (cf below).

      The authors claim that the movement times obtained using their virtual system, and a teleoperation proof of concept demonstration, are comparable to natural movement times. The speed of movements obtained and presented are easier to understand by viewing the supplementary materials prior to reading the paper. The position of the upper arm and a given target are used as input to a classifier, which determines the positions of the lower arm, wrist and the end effector. The state of the virtual shoulder in the pick and place task is quite dynamic and includes humeral rotations which would be challenging to engineer in a real physical prosthesis above the elbow. Another question related to the pick and place task used is whether or not there are cases where both the pick position and the place position can be reached via the same, or very similar, shoulder positions? i.e. with the shoulder flexion-extension and abduction-adduction remaining fixed, can the ANN use the remaining five joint angles to solve the movement problem with little to no participant input, simply based on the new target position? If this was the case, movements times in the virtual space would present a very different distribution to natural movements, while the mean values could be similar. The arguments made in the paper could be supported by including individual participant data showing distributions of movement times and the distances travelled by the end effector where real movements are compared to those made by an ANN.

      In the proposed approach users control where the hand is in space via the shoulder. The position of the upper arm and a given target are used as input to a classifier, which determines the positions of the lower arm, wrist and the effector. The supplementary materials suggest the output of the classifier occurs instantaneously, in that from the start of the trial the user can explore the 3D space associated with the shoulder in order to reach the object. When the object is reached a visual indicator appears. In a virtual space this feedback will allow rapid exploration of different end effector positions which may contribute to the movement times presented. In a real world application, movement of a distal end-effector via the shoulder is unlikely to be as graceful, and a speed-accuracy trade-off would be necessary to ensure objects are grasped, rather than knocked or moved.

      As correctly noted by the reviewer and easily visible on videos, the distal joints predicted by the ANN are realized instantaneously in the virtual arm avatar, and a discontinuity occurs at each target change whereby the distal part of the arm jumps to the novel prediction associated with the new target location. As also correctly noted by the reviewer, there are indeed some instances where minimal shoulder movements are required to reach a new target, which in practice implies that on those instances, the distal part of the arm avatar jumps instantaneously close to the new target as soon as this target appears. Please note that we originally used median rather than mean movement times per participant precisely to remain unaffected by potential outliers that might come from this or other situations. We nevertheless followed the reviewer’s advice and have now also included individual distributions of movement times for each condition and participant (cf Supplementary Fig. 2 to 4 for individual distributions of movement time for Exp1 to 3, respectively). Visual inspection of those indicates that despite slight differences between participants, no specific pattern emerges, with distributions of movement times that are quite similar between conditions when data from all participants are pooled together.

      Movement times analysis indicates therefore that the overall participants’ behavior has not been impacted by the instantaneous jump in the predicted arm positions at each of the target changes. Yet, those jumps indicate that our proposed solution does not satisfactorily reproduce movement trajectory, which has implications for application in the physical world. Although we introduced a 0.75 s period before the beginning of each trial for the robotic arm to smoothly reach the first prediction from the ANN in our POC experiment (cf Methods), this would not be practical for a real-life scenario with a sequence of movements toward different goals. Future developments are therefore needed to better account for movement trajectories. We are now addressing this explicitly in the manuscript, with the following paragraph added in the discussion (section ‘Perspectives of daily-life applications’):

      “Although our approach enabled participants to converge to the correct position and orientation to grasp simple objects with movement times similar to those of natural movements, it is important to note that further developments are needed to produce natural trajectories compatible with real-world applications. As easily visible on supplementary videos 2 to 4, the distal joints predicted by the ANN are realized instantaneously such that a discontinuity occurs at each target change, whereby the distal part of the arm jumps to the novel prediction associated with the new target location. We circumvented problems associated with this discontinuity on our physical proof of concept by introducing a period before the beginning of each trial for the robotic arm to smoothly reach the first prediction from the ANN. This issue, however, needs to be better handled for real-life scenarios where a user will perform sequences of movements toward different objects.”

      Another aspect of the movement times presented which is of note, although it is not necessarily incorrect, is that the virtual prosthesis performance is close to perfect. In that, at the start of each trial period, either pick or place, the ANN appears to have already selected the position of the five joints it controls, leaving the user to position the upper arm such that the end effector reaches the target. This type of classification is achievable given a single object type to grasp and a limited number of orientations, however scaling this approach to work robustly in a real world environment will necessitate solving a number of challenges in machine learning and in particular computer vision which are not trivial in nature. On this topic, it is also important to note that, while very elegant, the teleoperation proof of concept of movement based control does not seem to feature a similar range of object distance from the user as the virtual environment. This would have been interesting to see and I look forward to seeing further real world demonstrations in the authors' future work.

      According to this comment, the reviewer has the impression that the ANN had already selected a position of the five joints it controls at the start of each trial, and maintained those fixed while the user operates the upper arm so as to reach the target. Although the jumps at target changes discussed in the previous comment might give this impression, and although this would be the case had we used an ANN trained with contextual information only, it is important to stress that our control does take shoulder angles as inputs, and therefore produces changes in the predicted distal angles as the shoulder moves.

      To substantiate this, we provide in Author response image 1 the range of motion (angular difference at each joint between the beginning and the end of each trial) of the five distal arm angles, regrouped for all angles and trials of Exp1 to 3 (one circle and line per participant, representing the median of all data obtained by that participant in the given experiment and condition, as in Fig. 3 of the manuscript). Please note that those ranges of motion were computed on each trial just after the target changes (i.e., after the jumps) for conditions with prosthesis control, and that the percentages noted on the figure below those conditions correspond to the proportion of the range of motion obtained in the natural movement condition. As can be seen, distal angles were solicited in all prosthesis control conditions by more than half the amount they moved in the condition of natural movements (between 54 and 75% depending on conditions).
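
The range-of-motion summary described here could be computed along the following lines. This is a hedged sketch rather than the authors' code: the function names and the toy angles are hypothetical, the angular difference is taken as a plain absolute difference in degrees (no wrap-around handling), and the percentage is assumed to be the ratio of a prosthesis-condition value to the natural-movement value.

```python
from statistics import median


def range_of_motion(start_angles, end_angles):
    """Absolute angular change of each joint between trial start and trial end."""
    return [abs(end - start) for start, end in zip(start_angles, end_angles)]


def participant_median_rom(trials):
    """Median range of motion over all joints and trials of one participant.

    trials: list of (start_angles, end_angles) pairs, one pair per trial.
    """
    values = []
    for start_angles, end_angles in trials:
        values.extend(range_of_motion(start_angles, end_angles))
    return median(values)


def percent_of_natural(prosthesis_rom, natural_rom):
    """Range of motion under prosthesis control as a percentage of natural movement."""
    return 100.0 * prosthesis_rom / natural_rom


# Toy data: two trials with two joints each (degrees).
toy_trials = [([0, 10], [30, 50]), ([5, 5], [15, 45])]
toy_median = participant_median_rom(toy_trials)
```

For conditions with prosthesis control, the start angles would be those taken just after the jump at target change, as the response specifies, so that the jump itself does not count as range of motion.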

      Author response image 1.

      With respect to the last part of this comment, we agree that scaling this approach to work robustly in a real world environment will necessitate solving a number of challenges in machine learning and in particular computer vision. We address those in a specific section of the discussion (‘Perspectives for daily-life application’) which has been further amended in response to the reviewers’ comments. As mentioned earlier and in our replies to other reviewers’ comments, we also agree that our physical proof of concept is quite preliminary, and we look forward to conducting future work to solve some of the issues discussed and get closer to real world demonstrations.

      Reviewer #2 (Public Review):

      Segas et al motivate their work by indicating that none of the existing myoelectric solutions for people with transhumeral limb difference offer four active degrees of freedom, namely forearm flexion/extension, forearm supination/pronation, wrist flexion/extension, and wrist radial/ulnar deviation. These degrees of freedom are essential for positioning the prosthesis in the correct plane in space before a grasp can be selected. They offer a controller based on the movement of the stump.

      The proposed solution is elegant for what it is trying to achieve in a laboratory setting. Using a simple neural network to estimate the arm position is an interesting approach, despite the limitations/challenges that the approach suffers from, namely, the availability of prosthetic hardware that offers such functionality, information about the target and the noise in estimation if computer vision methods are used. Segas et al indicate these challenges in the manuscript, although they could also briefly discuss how they foresee the method could be expanded to enable a grasp command beyond the proximity between the end-point and the target. Indeed, it would be interesting to see how these methods can be generalised to more than one grasp.

      Indeed, we have already indicated those challenges in the manuscript, including the limitation that our control “is suitable to place the hand at a correct position and orientation to grasp objects in a wide workspace, but not for fine hand and grasp control ...” (cf 4th paragraph of the ‘Perspectives for daily-life applications’ section of the discussion). We have nevertheless added the following sentence at the end of this paragraph to stress that our control could be combined with recently documented solutions for multiple grasp functions: “Our movement-based approach could also be combined with semi-autonomous grasp control to accommodate for multiple grasp functions39,42,44.”

      One bit of the results that is missing in the paper is the results during the familiarisation block. If the method is "intuitive" I would have thought no familiarisation would be needed. Do participants show any sign of motor adaptation during the familiarisation block?

      Please note that the familiarization block indicated in Fig. 3a contains approximately half of the trials of the subsequent initial acquisition block (about 150 trials, which represents about 3 minutes of practice once the task is understood and proficiently executed), and that those were designed to familiarize participants with the VR setup and the task rather than with the prosthesis controls. Indeed, it is important that participants were made familiar with the setup and the task before they started the initial acquisition used to collect their natural movements. In Exp1 and 2, there was therefore no familiarization to the prosthesis controls whatsoever (and thus no possible adaptation associated with it) before participants used them for the very first time in the blocks dedicated to test them. This is slightly different in Exp3, where participants with an amputated arm were first tested on their amputated side with our generic control. Although slight adaptation to the prosthesis control might indeed have occurred during those familiarization trials, this would be difficult in practice to separate from the intended familiarization to the task itself, which was deemed necessary for that experiment as well. In the end, we believe that this had little impact on our data since that experiment produced behavioral results comparable to those of Exp1 and 2, where no familiarization to the prosthesis controls could have occurred.

      In Supplementary Videos 3 and 4, how would the authors explain the jerky movement of the virtual arm while the stump is stationary? How would it be possible to distinguish the relative importance of the target information versus body posture in the estimation of the arm position? This does not seem to be easy/clear to address beyond looking at the weights in the neural network.

      As discussed in our response to Reviewer1 and now explicitly addressed in the manuscript, there is a discontinuity in our control, whereby the distal joints of the arm avatar jump instantaneously to the new prediction at each target change at the beginning of a trial, before being updated online as a function of ongoing shoulder movements for the rest of that trial. In a sense, this discontinuity directly reflects the influence of the target information in the estimation of the distal arm posture. Yet, as also discussed in our reply to R1, the influence of proximal body posture (i.e., shoulder movements) is made evident by substantial movements of the predicted distal joints after the initial jumps occurring at each target change. Although those features demonstrate that both target information and proximal body posture were involved in our control, they do not establish their relative importance. While offline computation could be thought to quantify their relative implication in the estimation of the distal arm posture, we believe that further human-in-the-loop experiments with selective manipulation of this implication would be necessary to establish how this might affect the system controllability.

      I am intrigued by how the Generic ANN model has been trained, i.e. with the use of the forward kinematics to remap the measurement. I would have thought an easier approach would have been to create an Own model with the native arm of the person with the limb loss, as all your participants are unilateral (as per Table 1). Alternatively, one would have assumed that your common model from all participants would just need to be 'recalibrated' to a few examples of the data from people with limb difference, i.e. few shot calibration methods.

      AR: Although we could indeed have created an Own model with the native arm of each participant with a limb loss, the intention was to design a control that would involve minimal to no data acquisition at all, and more importantly, that could also accommodate bilateral limb loss. Indeed, few shot calibration methods would be a good alternative involving minimal data acquisition, but this would not work on participants with bilateral limb loss.

      Reviewer #3 (Public Review):

      This work provides a new approach to simultaneously control elbow and wrist degrees of freedom using movement based inputs, and demonstrates performance in a virtual reality environment. The work is also demonstrated using a proof-of-concept physical system. This control algorithm is in contrast to prior approaches which use electrophysiological signals, such as EMG, which do have limitations as described by the authors. In this work, the movements of proximal joints (eg shoulder), which generally remain under voluntary control after limb amputation, are used as input to neural networks to predict limb orientation. The results are tested by several participants within a virtual environment, and preliminarily demonstrated using a physical device, albeit without it being physically attached to the user.

      Strengths:

      Overall, the work has several interesting aspects. Perhaps the most interesting aspect of the work is that the approach worked well without requiring user calibration, meaning that users could use pre-trained networks to complete the tasks as requested. This could provide important benefits, and if successfully incorporated into a physical prosthesis allow the user to focus on completing functional tasks immediately. The work was also tested with a reasonable number of subjects, including those with limb-loss. Even with the limitations (see below) the approach could be used to help complete meaningful functional activities of daily living that require semi-consistent movements, such as feeding and grooming.

      Weaknesses:

      While interesting, the work does have several limitations. In this reviewer's opinion, the main limitation is the number of 'movements' or tasks that would be required to train a controller that generalized across more tasks and limb postures. The authors did a nice job spanning the workspace, but the unconstrained nature of reaches could make restoring additional activities problematic. This remains to be tested.

      We agree and have partly addressed this in the first paragraph of the ‘Perspective for daily life applications’ section of the discussion, where we expand on control options that might complement our approach in order to deal with an object after it has been reached. We have now amended this section to explicitly stress that generalization to multiple tasks including more constrained reaches will require future work: “It remains that generalizing our approach to multiple tasks including more constrained reaches will require future work. For instance, once an intended object has been successfully reached or grasped, what to do with it will still require more than computer vision and gaze information to be efficiently controlled. One approach is to complement the control scheme with subsidiary movements, such as shoulder elevation to bring the hand closer to the body or sternoclavicular protraction to control hand closing26, or even movement of a different limb (e.g., a foot45). Another approach is to control the prosthesis with body movements naturally occurring when compensating for an improperly controlled prosthesis configuration46.”

      The weight of a device attached to a user will impact the shoulder movements that can be reliably generated. Testing with a physical prosthesis will need to ensure that the full desired workspace can be obtained when the limb is attached, and if not, then a procedure to scale inputs will need to be refined.

      We agree and have now explicitly included this limitation and perspective to our discussion, by adding a sentence when discussing possible combination with osseointegration: “Combining those with osseointegration at humeral level3,4 would be particularly relevant as this would also restore amplitude and control over shoulder movements, which are essential for our control but greatly affected with conventional residual limb fitting harness and sockets. Yet, testing with a physical prosthesis will need to ensure that the full desired workspace can be obtained with the weight of the attached device, and if not, a procedure to scale inputs will need to be refined.”

      The reliance on target position is a complicating factor in deploying this technology. It would be interesting to see what performance may be achieved by simply using the input target positions to the controller and exclude the joint angles from the tracking devices (eg train with the target positions as input to the network to predict the desired angles).

      Indeed, the reliance on precise pose estimation from computer vision is a complicating factor in deploying this technology, despite progress in this area which we now discuss in the first paragraph of the ‘Perspective for daily life applications’ section of the discussion. Although we are unsure what precise configuration of input/output the reviewer has in mind, part of our future work along this line is indeed explicitly dedicated to explore various sets of input/output that could enable coping with availability and reliability issues associated with real-life settings.

      Treating the humeral rotation degree of freedom is tricky, but for some subjects, such as those with OI, this would not be as large of an issue. Otherwise, a device would need to be constructed that allows this movement.

      We partly address this when referring to osseointegration in the discussion: “Combining those with osseointegration at humeral level3,4 would be particularly relevant as this would also restore amplitude and control over shoulder movements, which are essential for our control but greatly affected with conventional residual limb fitting harness and sockets.” Yet, despite the fact that our approach proved efficient in reconstructing the required humeral angle, it is true that realizing it on a prosthesis without OI is an open issue.

      Overall, this is an interesting preliminary study with some interesting aspects. Care must be taken to systematically evaluate the method to ensure clinical impact.

      Reviewer #1 (Recommendations For The Authors):

      Page 2: Sentence beginning: "Here, we unleash this movement-based approach by ...". The approach presented utilises 3D information of object position. Please could the authors clarify whether or not the computer vision references listed are able to provide precise 3D localisation of objects?

      While the references initially cited in this sentence do support the view that movement goals could be made available in the context of prosthesis control through computer vision combined with gaze information, it is true that they do not provide the precise position and orientation (i.e., 6D pose estimation) necessary for our movement-based control approach. Six-dimensional object pose estimation is nevertheless a very active area of computer vision that has applications beyond prosthesis control, and we have now added to this sentence two references illustrating recent progress in this research area (cf. references 30 and 31).

      Page 6: Sentence beginning: "The volume spread by the shoulder's trajectory ...".

      • Page 7: Sentence beginning: "With respect to the volume spread by the shoulder during the Test phases ...".

      • Page 7: Sentence beginning: "Movement times with our movement-based control were also in the same range as in previous experiments, and were even smaller by the second block of intuitive control ...".

      On the shoulder volume presented in Figure 3d. My interpretation of the increased shoulder volume in Figure 3D Expt 2 shown in the Generic ANN was that slightly more exploration of the upper arm space was necessary (as related to the point in the public review). Is this what the authors mean by the action not being as intuitive? Does the reduction in movement time between TestGeneric1 and TestGeneric 2 not suggest that some degree of exploration and learning of the solution space is taking place?

      Indeed, the slightly increased shoulder volume with the Generic ANN in Exp2 could be interpreted as a sign that slightly more exploration of the upper arm space was necessary. At present, we do not relate this to intuitiveness in the manuscript. And yes, we agree that the reduction in movement time between TestGeneric1 and TestGeneric 2 could suggest some degree of exploration and learning.

      Page 7: Sentence beginning: "As we now dispose of an intuitive control ...". I think dispose may be a false friend in this context!

      This has been replaced by “As we now have an intuitive control…”.

      Page 8: Section beginning "Physical Proof of Concept on a tele-operated robotic platform". I assume this section has been added based on suggestions from a previous review. Although an elegant PoC the task presented in the diagram appears to differ from the virtual task in that all the targets are at a relatively fixed distance from the robot. In respect to the computer vision ML requirements, this does not appear to require precise information about the distance between the user and an object. Please could this be clarified?

      Indeed, the Physical Proof of Concept has been added after the original submission in order to comply with requests formulated at the editorial stage for the paper to be sent for review. Although preliminary and suffering from several limitations (amongst which a reduced workspace and number of trials as compared to the VR experiments), this POC is a first step toward realizing this control in the physical world. Please note that as indicated in the methods, the targets varied in depth by about 10 cm, and their position and orientation were set with sensors at the beginning of each block instead of being determined from computer vision (cf section ‘Physical Proof of Concept’ in the ‘Methods’: “The position and orientation of each sponge were set at the beginning of each block using a supplementary sensor. Targets could be vertical or tilted at 45 and -45° on the frontal plane, and varied in depth by about 10 cm.”).

      Page 10: Sentence beginning: "This is ahead of other control solutions that have been proposed ...". I am not sure what this sentence is supposed to convey and no references are provided. While the methods presented appear to be a viable solution for a group of upper-limb amputees who are often ignored by academic research, I am not sure it is appropriate for the authors to compare the results obtained in VR and via teleoperation to existing physical systems (without references it is difficult to understand what comparison is being made here).

      The primary purpose of this sentence is to convey that our approach is ahead of other control solutions proposed so far to solve the particular problem as defined earlier in this paragraph (“Yet, controlling the numerous joints of a prosthetic arm necessary to place the hand at a correct position and orientation to grasp objects remains challenging, and is essentially unresolved”), and as documented to the best we could in the introduction. We believe this to be true and to be the main justification for this publication. The reviewer’s comment is probably directed toward the second part of this sentence, which states that performances of previously proposed control solutions (whether physical or in VR) are rarely compared to that of natural movements, as this comparison would be quite unfavorable to them. We have softened that statement by removing the last reference to unfavorable comparison, but have maintained it as we believe it reflects a reality that is worth mentioning. Please note that after this initial paragraph, and an exposition of the critical features of our control, most of the discussion (about 2/3) is dedicated to limitations and perspectives for daily-life application.

      Page 10: Sentence: "Here, we overcame all those limitations." Again, the language here appears to directly compare success in a virtual environment with the current state of the art of physical systems. Although the limitations were realised in a virtual environment and a teleoperation PoC, a physical implementation of the proposed system would depend on advances in machine vision to include movement goal. It could be argued that limitations have been traded, rather than immediately overcome.

      In this sentence, “all those limitations” refers to all three limitations mentioned in the previous sentences in relation to our previous study which we cited in that sentence (Mick et al., JNER 2021), rather than to limitations of the current state of the art of physical systems. To make this more explicit, we have now changed this sentence to “Here, we overcome those three limitations”.

      Page 11: Sentence beginning: "Yet, impressive progresses in artificial intelligence and computer vision ...".

      • Page 11: Sentence beginning: "Prosthesis control strategies based on computer vision ..."

      The science behind self-driving cars is arguably of comparable computational complexity to real-world object detection with concurrent real-time grasp selection. The market for self-driving cars is huge and a great deal of R&D has been funded, yet they are not yet available. The market for advanced upper-limb prosthetics is very small, it is difficult to understand who would deliver this work.

      We agree that the market for self-driving cars is much larger than that for advanced upper-limb prosthetics. Yet, as mentioned in our reply to a previous comment, 6D object pose estimation is a very active area of computer vision that has applications far beyond prosthesis control (cf. in robotics and augmented reality). We have added two references reflecting recent progress in this area in the introduction, and have amended the discussion accordingly: “Yet, impressive progress in artificial intelligence and computer vision is such that what would have been difficult to imagine a decade ago appears now well within grasp38. For instance, we showed recently that deep learning combined with gaze information enables identifying an object that is about to be grasped from an egocentric view on glasses33, and this even in complex cluttered natural environments34. Six-dimensional object pose estimation is also a very active area of computer vision30,31, and prosthesis control strategies based on computer vision combined with gaze and/or myoelectric control for movement intention detection are quickly developing39–44, illustrating the promises of this approach.”

      Page 15: Sentence beginning: "From this recording, 7 signals were extracted and fed to the ANN as inputs: ...".

      • Page 15: Sentence beginning: "Accordingly, the contextual information provided as input corresponded to the ...".

      The two sentences appear to contradict one another and it is difficult to understand what the Own ANN was trained on. If the position and the orientation of the object were not used due to overfitting, why claim that they were used as contextual information? Training on the position and orientation of the hand when solving the problem would not normally be considered contextual information, the hand is not part of the environment or setting, it is part of the user. Please could this section be made a little bit clearer?

      The Own ANN was trained using the position and the orientation of a hypothetical target located within the hand at any given time. This approach has been implemented to increase the amount of available data. However, when the ANN is utilized to predict the distal part of the virtual arm, the position and orientation of the current target are provided. We acknowledge that the phrasing could be misleading, so we have added the following clarification to the first sentence: "… (3 Cartesian coordinates and 2 spherical angles that define the position and orientation of the hand as if a hypothetical cylindrical target was placed in it at any time, see an explanation for this choice in the next paragraph)".
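
      To make the distinction between training and use more concrete, the assembly of the ANN input vector could be sketched as below. This is only a sketch under stated assumptions and does not reproduce our implementation: the function names are hypothetical, the split of the 7 inputs into 2 shoulder angles plus 5 contextual values is assumed here, and the spherical-angle convention (azimuth in the x-y plane, elevation from that plane) is an arbitrary choice made for the example.

```python
import math


def axis_to_spherical(axis):
    """Convert a 3D direction (e.g. the long axis of a cylinder) to two angles.

    Assumed convention: azimuth measured in the x-y plane, elevation from it.
    """
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("axis must be a non-zero vector")
    azimuth = math.atan2(y, x)
    elevation = math.asin(z / norm)
    return azimuth, elevation


def build_ann_inputs(shoulder_angles, position, axis):
    """Concatenate shoulder angles with the 5 contextual values (3 + 2)."""
    azimuth, elevation = axis_to_spherical(axis)
    return [*shoulder_angles, *position, azimuth, elevation]


# Training: the context is the pose of the hand itself, treated as if a
# hypothetical cylindrical target were held in it at that instant.
hand_position, hand_axis = (0.3, 0.4, 0.5), (0.0, 0.0, 1.0)
training_inputs = build_ann_inputs((0.1, 0.2), hand_position, hand_axis)

# Use: the same slots receive the pose of the current target instead.
target_position, target_axis = (0.5, 0.1, 0.6), (1.0, 0.0, 0.0)
control_inputs = build_ann_inputs((0.1, 0.2), target_position, target_axis)
```

      The point of the sketch is that the input layout is identical in both cases; only the source of the contextual values differs (hand pose during training, target pose during control).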

      Page 16: Sentence beginning: "A trial refers to only one part of this process: either ...". Would be possible to present these values separately?

      Although it would be possible to present our results separately for the pick phase and for the place phase, we believe that this would overload the manuscript for little to no gain. Indeed, nothing differentiates those two phases other than the fact that the bottle is on the platform (waiting to be picked) in the pick phase, and in the hand (waiting to be placed) in the place phase. We therefore expect to have very similar results for the pick phase and for the place phase, which we verified as follows on Movement Time: Author response image 2 shows movement time results separated for the pick phase (a) and for the place phase (b), together with the median (red dotted line) obtained when results from both phases are pooled together. As illustrated, results are very similar for both phases, and similar to those currently presented in the manuscript with both phases pooled (Fig. 3c).

      Author response image 2.

      Page 19: Sentence beginning "The remaining targets spanned a roughly ...". Figure 2 is a very nice diagram but it could be enhanced with a simple visual representation of this hemispherical region on the vertical and horizontal planes.

      We made a few attempts at enhancing this figure as suggested. However, the resulting figures tended to be overloaded and were not conclusive, so we opted to keep the original.

      Page 19: Sentence beginning "The Movement Time (MT) ..."

      • Page 19: Sentence beginning "The shoulder position Spread Volume (SV) ..." Would it be possible to include a traditional timing protocol somewhere in the manuscript so that readers can see the periods over which these measures are calculated?

      We have now included Fig. 5 to illustrate the timing protocol and the periods over which MT and SV were computed.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments

      Page 6: "Yet, this control is inapplicable "as is" to amputees, for which recording ..." -> "Yet, this control is inapplicable "as is" to amputees, for WHOM recording ... "

      This has been modified as indicated.

      Throughout: "amputee" -> "people with limb loss" also "individual with limb deficiency" -> "individual with limb difference"

      We have modified throughout as indicated.

      It would have been great to see a few videos from the tele-operation as well. Please could you supply these videos?

      Although we agree that videos of our Physical Proof of Concept would have been useful, we unfortunately did not collect videos that would be suitable for this purpose during those experimental phases. Please note that this Physical Proof of Concept was not meant to be published originally, but has been added after the original submission in order to comply with requests formulated at the editorial stage for the paper to be sent for review.

      Reviewer #3 (Recommendations For The Authors):

      Consider using the terms: intact-limb rather than able-bodied, residual limb rather than stump, congenital limb difference rather than congenital limb deficiency.

      We have modified throughout as indicated.

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1:

      The authors present a carefully controlled set of experiments that demonstrate an additional complexity for GPCR signaling in that endosomal signaling may be different when b-arrestin is or isn't associated with a G protein-bound V2R vasopressin receptor. It uses state of the art biosensor-based approaches and b-arrestin KO lines to assess this. It adds to a growing body of evidence that G proteins and b-arrestin can associate with GPCR complexes simultaneously. They also demonstrate the possibility that Gaq might also be activated by the V2R receptor. My sense is that one thing that may need to be considered is the possibility that such "megacomplexes" might actually involve receptor dimers or oligomers.

      1.1 Can the authors please review the data that describes the concept of "GPCR megacomplexes"? I feel this is missing from the introduction. The notion means different things to different people. As you will see from my other comments, you should especially focus on evidence at the level of the single receptor.

      We appreciate the reviewer’s comments and have now included a more complete description of the GPCR megacomplex, or ‘megaplex’, concept in the introduction (page 2, 1st paragraph).

      1.2 The authors use mini-G proteins to conclude that V2R receptors interact with Gaq (in addition to Gas). I would prefer if there were a more direct measure of this. Can the authors show that the receptor interacts with full length Gaq (and not the other G proteins in Figure)? Is there a signaling phenotype associated with Gaq coupling? Is it sensitive to Gaq inhibition?

      Excellent point and we are happy to expand further on this. The ability of the V2R to activate Gq/11 has already been demonstrated before (Zhu, X. et al. Mol Pharmacol 46(3):460-9 (1994); Lykke, K. et al. Physiol Rep. 3(8):e12519 (2015); Avet, C. et al. eLife 11: e74101 (2022); Heydenreich, F.M. et al. Mol Pharmacol 102(3):139-49 (2022)). Therefore, we did not attempt to document this activation using more traditional assays. On the other hand, demonstrating an interaction between V2R and Ga subunit in cells is challenging for several reasons. First, the full-length Ga subunit is already located at the plasma membrane at basal state, and thus, generates high background signals in proximity assays. Second, upon receptor activation, the Ga subunit interaction with V2R is so transient that it is difficult, if not impossible, to catch this transient moment in a proximity assay. Although the miniG proteins are highly engineered, coupling specificity of the different subtypes (Gas, Gai/o, Gaq/11, and Ga12/13) to GPCRs is maintained. In addition, as they are homogeneously expressed in the cytosol under basal states rather than at the membrane, they generate low background noise. Upon agonist stimulation, miniG proteins are recruited from the cytosol to the V2R at the plasma membrane, resulting in a robust signal in proximity assays. Thus, miniG proteins are unique in that they can actually detect GPCR–G protein interactions in cellular proximity assays, which is very challenging using full-length Ga subunits.

      That being said, we fully understand the reviewer’s concern and greatly value the effort in enhancing robustness of our study. Therefore, we have now monitored downstream signaling events of Gaq/11 in the absence or presence of the selective Gaq/11 inhibitor YM-254890 as a secondary method of documenting Gaq/11 activity. Specifically, we used a newly developed biosensor to measure diacylglycerol (DAG) production, a downstream second messenger of Gaq/11 activation, at both the plasma membrane and endosomes. Using a second biosensor, we detect general protein kinase C (PKC) activation, which is another downstream signaling event of Gaq/11 activation. We demonstrated that AVP-stimulation leads to DAG production at both the plasma membrane and endosomes (Fig. 1C-D) as well as PKC activation (Fig. 1E), all of which are sensitive to YM-254890 inhibition (Fig. 1C-E). Together, these results strongly suggest that the V2R interacts with and activates Gaq/11.

      1.3 I raise a similar concern with Gaq coupling in endosomes.

      For similar reasons that miniG proteins are excellent tools for demonstrating V2R interaction with G proteins at the plasma membrane, miniG proteins can also be used to detect V2R interaction with G proteins at endosomes by measuring proximity between miniG and an endosomal marker in response to agonist challenge. However, to ensure that the endosomal recruitment of miniGsq to the V2R demonstrated in our study corresponds to endosomal Gaq/11 activation, we monitored the production of DAG at the early endosomes in a similar way to which we detected DAG production at the plasma membrane. As shown in Fig. 1D, stimulation of V2R with AVP induces recruitment of the DAG-binding biosensor to the early endosomal marker Rab5. Pre-treatment of the cells with the selective Gaq/11 inhibitor YM-254890 abrogated this response, confirming that V2R activation leads to production of DAG at the early endosomes in a Gaq/11-dependent manner (Fig. 1D).

      1.4 Can the confocal data be shown for Gai and Ga12?

      Yes, we can certainly show this data as negative control. We have now included the confocal data using Halo-mGsi as a negative control for confocal microscopy (Fig. 2). As seen in this figure, mGsi does not colocalize with Lck (plasma membrane), nor with EEA1 (early endosomes) upon stimulation of cells with AVP, in line with a receptor that does not couple to Gai/o.

      We did not include data using Halo-mG12, as this G protein subtype, similar to Gi/o, does not couple functionally to V2R. Therefore, it is highly unlikely we would obtain different results from the experiments using Halo-mGsi.

      1.5 The authors want us to believe that there is simultaneous binding of G proteins and b-arrestin. This is never demonstrated and is at odds with the structural basis of G protein and b-arrestin binding. Have the authors considered that "simultaneous" occupancy might simply reflect binding at distinct GPCR monomers in the context of dimeric or oligomeric receptors? They could I suppose provide data at the level of a single receptor rather than using the bulk BRET approaches used.

      We appreciate the comment and the opportunity to highlight some of our previous work, which addresses the megacomplexes at the level of a single receptor. First, we have characterized the megacomplex biochemically and structurally at a low resolution (Thomsen ARB et al. 2016, Cell 166(4):907-19). The results unequivocally demonstrate that a single GPCR interacts simultaneously with heterotrimeric G protein, at the receptor core, and with b-arrestin via the phosphorylated receptor carboxy-terminal tail. We also documented functionality of the megacomplex, as the receptor can interact with and activate the G protein, which was shown by three different biochemical approaches (Thomsen ARB et al. 2016, Cell 166(4):907-19). In addition, we solved a high-resolution cryo-EM structure of a megacomplex, further highlighting the architecture of this complex (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31). As both biochemical and structural analyses were done in vitro with the receptor embedded in a detergent micelle, we also confirmed in molecular dynamics simulations that the megacomplex architecture fits naturally within the context of a membrane (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31).

      In cells, we and others have also shown that GPCRs such as the V2R can bind b-arrestins exclusively via the phosphorylated carboxy-terminal tail as it does in the megacomplex (Kumari P et al. 2016, Nat Commun 7:13416; Cahill III TJ et al. 2017, PNAS 114(10):2562-67; Kumari P et al. 2017, Mol Biol Cell 28(8):1003-10; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x)). In addition, we and others have used BRET and confocal microscopy to show that the V2R and other GPCRs recruit G protein and b-arrestin simultaneously and that the three components colocalize in endosomes upon prolonged agonist exposure (Thomsen ARB et al. 2016, Cell 166(4):907-19; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x)). As the reviewer correctly points out, in these cellular experiments (as well as in single molecule microscopy), the working resolution is not high enough to rule out that the receptors that co-recruit G protein and b-arrestin in endosomes could be dimeric instead of monomeric. Thus, we conducted a series of experiments with GPCR–b-arrestin fusions where the two proteins are covalently attached at the receptor carboxy-terminal tail. We showed that despite the GPCR–b-arrestin coupling being fully functional (with respect to b-arrestin promoting a high-affinity state of the receptor for agonist binding and constitutively internalizing the receptor), the receptor could still activate G proteins (Thomsen ARB et al. 2016, Cell 166(4):907-19; Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31), which demonstrates that the single receptor megaplex can physically form in cells.

      We have now included an extra paragraph in the discussion to go over these megaplex-related considerations (5th paragraph in the discussion), and we thank the reviewer for raising this point.

      1.6 Please introduce abbreviations when you first use this- this was not done consistently.

      Thank you for noticing these errors, which we have now corrected.

      REVIEWER #2:

      This manuscript by Daly et al. probes the emerging paradigm of GPCR signaling from endosomes using the V2R as a model system with an emphasis on Gaq/11 and b-arrestins. The study employs cellular imaging, enzyme complementation assays and energy transfer-based sensors to probe the potential formation of GPCR-G-protein-b-arrestin megaplexes. While the study is certainly very interesting, it appears to be very preliminary at many levels, and clearly requires further development in order to make robust conclusions. The authors should consider expanding on this work further to make the points more convincingly and to make the work solid and impactful. The two corresponding authors are among the leaders in the field having demonstrated the existence of megaplexes, and building on the work in a systematic fashion should certainly move the paradigm forward. As the work presented in the current manuscript is already pre-printed, the authors should take this opportunity to present a more complete and comprehensive story to the field.

      We are grateful for the time and effort the reviewer has put into reviewing our work. We are certainly excited to learn that the reviewer finds our work “very interesting”. Regarding the robustness, we have added extra control experiments to increase the completeness of the study. These experiments include:

      • Measurements of AVP-stimulated diacylglycerol production, a signaling event downstream of Gaq/11 activation. These measurements were conducted both at the plasma membrane (Fig. 1C) and at early endosomes (Fig. 1D) using a newly developed DAG-binding biosensor, and demonstrate that the V2R activates Gaq/11 at both of these subcellular locations.

      • Monitoring AVP-promoted protein kinase C activation, another downstream signaling effect of Gaq/11 activation (Fig. 1E). The result of this approach shows in another way that V2R activates Gaq/11.

      • Inhibition of signaling events downstream of Gaq/11 activation using the selective Gaq/11 inhibitor YM-254890. YM-254890 inhibits AVP-stimulated DAG production at the plasma membrane and endosomes as well as PKC activation (Fig. 1C-E), which strongly confirms that these signaling outputs are results of Gaq/11 activation.

      • We have also included the confocal data using Halo-mGsi as a negative control for confocal microscopy (Fig. 2). As seen in this figure, mGsi does not translocate to the plasma membrane or early endosomes upon stimulation with AVP, which validates that V2R activation does not couple to and activate Gai/o.

      Finally, we would like to kindly remind the reviewer that the production of the pre-print manuscript is part of the peer-review process in eLife.

      2.1 The use of miniG proteins in these experiments is a major concern as these are highly engineered and may not represent the true features of G proteins. While these have been used as a readout in other publications, their use in demonstrating megaplex formation is sub-optimal, and native, full-length G proteins should be used.

      We are a bit unsure as to what the reviewer means by using native full-length G proteins. If the reviewer is suggesting to co-immunoprecipitate V2R with native unlabeled G protein and b-arrestin, it should be considered that the G protein interaction with the receptor is extremely transient and unlikely to survive the pull-down procedure unless stabilized by a nanobody or crosslinking. Although the b-arrestin interaction with the receptor is more stable in nature, co-immunoprecipitation with the receptor requires crosslinking or stabilization with a Fab/nanobody. Therefore, we do not think this approach can be used as a more accurate way of detecting native megaplexes.

      If the reviewer is suggesting the use of full-length G proteins in our cell-based proximity assays instead of miniG proteins, we would like to highlight that this approach is somewhat prone to false-positive responses. The major reason behind this is that G proteins are located at regions in membranes close to the receptor whereas b-arrestins are distributed throughout the cytosol. Upon activation of the V2R, b-arrestins translocate to the receptor at the plasma membrane, which results in enhanced BRET between V2R-coupled G protein subtypes and b-arrestins (see Author response image 1 below of preliminary data). This translocation also results in non-specific BRET signals between b-arrestins and G protein subtypes at the plasma membrane that do not couple to V2R but are located in close proximity to the receptor. As these non-specific BRET signals do not report on the formation of functional V2R megaplexes (see Author response image 1), we have purposely not used this approach.

      Author response image 1.

      To overcome this technical hurdle in detection of functional megaplexes, we have replaced full-length G proteins by miniG proteins as the latter are located in the cytosol at resting states and only translocate to the membrane area if a receptor adopts an active conformation. This replacement is advantageous since activation of megaplex-forming receptors such as the V2R results in simultaneous translocation of miniG proteins and b-arrestins from the cytosol to the receptor at the plasma membrane, which produces a highly specific proximity signal (see Author response image 2 below of preliminary data). When stimulating the V2R, we only observe increases in proximity between b-arrestin1 and miniG proteins that are activated by the V2R (miniGs and miniGsq) but not the miniG proteins that are not activated by this receptor (miniGsi and miniG12) (see Author response image 2). Therefore, usage of miniG proteins offers a more accurate experimental approach to detect functional megaplexes as compared to the usage of full-length G proteins.

      Author response image 2.

      2.2 The interpretation of complementation (NanoLuc) or proximity (BRET) as evidence of signaling is not appropriate, especially when overexpression system and engineered constructs are being used.

      We thank the reviewer for raising this concern. We have previously demonstrated global Gas activation and Gas signaling in the form of cAMP stimulated by internalized V2R (Thomsen ARB et al. 2016, Cell 166(4):907-19). As mentioned previously, in the current updated manuscript we have now included experiments to document downstream signaling events in response to Gaq/11 activation. These experiments include measurement of DAG production at the plasma membrane (Fig. 1C) and early endosomes (Fig. 1D), as well as phosphorylation/activation of PKC (Fig. 1E). Pre-incubation with the selective Gaq/11 inhibitor YM-254890 abrogated all these downstream signals, confirming that the V2R stimulates Gaq/11 protein signaling at both the plasma membrane and endosomes (Fig. 1C-E).

      2.3 After the original work from the same corresponding authors on megaplex formation, the major challenge in the field is to demonstrate the existence and relevance of megaplex formation at endogenous levels of components, and the current study focuses solely on showing the proximity of Gaq and b-arrestins.

      We completely agree with the reviewer that it will be important to demonstrate the functionality of endogenous megaplexes, and we are currently working on this in other studies using different receptor systems. However, doing this is not trivial and we will have to overcome major technical barriers that we feel are somewhat out of the scope of the current study. The goal of our V2R study is to demonstrate that V2R megaplexes form with Gaq/11, resulting in Gaq/11 activation at endosomes, and that endosomal G protein activation by the V2R can occur independently of b-arrestin, which in our humble opinion we accomplish.

      2.4 The study lacks a coherent approach, and the assays are often shifted back and forth between the two b-arrestin isoforms (1 and 2), for example, confocal vs. complementation etc.

      We understand the reviewer’s concern. However, as opposed to the β2-adrenergic receptor that binds β-arrestin2 with higher affinity than β-arrestin1, V2R has a strong affinity for both β-arrestin1 and β-arrestin2 (Oakley et al. 2000, JBC 275(22):17201-10). The V2R’s almost identical affinity for β-arrestin1 and β-arrestin2 is well illustrated in Fig. 3B. Thus, although different β-arrestin isoforms were used in some experiments, it is very unlikely that the overall results and conclusions from this study will change by adding extra experiments to ensure that both β-arrestin isoforms are used in every experiment.

      2.5 In every assay, only the G proteins and b-arrestins are monitored without a direct assessment of the presence of receptor, and absent that data, it is difficult to justify calling these entities megaplexes.

      MiniG proteins and b-arrestin come into close proximity upon agonist stimulation of the V2R. Using confocal microscopy, we observed this co-recruitment of miniGs/miniGsq and b-arrestin in response to prolonged V2R stimulation at endosomes specifically (Fig. 3D-F). In the absence of GPCR stimulation, both miniG and b-arrestin would be homogeneously distributed throughout the cytosol, and thus, the only explanation for why both proteins are recruited to endosomes in response to AVP challenge is that they are recruited to internalized and active V2R. This point was not adequately described in the original manuscript, and thus, we have now clarified this further in the updated manuscript at the 8th sentence of the last paragraph of the "The V2R recruits Gas/Gaq and barrs simultaneously" section.

      REVIEWER #3:

      The manuscript by Daly et al. examines endosomal signaling of the vasopressin type 2 receptors using engineered mini G protein (mG proteins) and a number of novel techniques to address if sustained G protein signaling in the endosomal compartment is enhanced by b-arrestin. Employing these interesting techniques they show how V2R could activate Gas and Gaq in the endosomal compartments and how this modulation could occur in an arrestin-dependent and -independent manner. Although the phenomenon of endosomal signaling is complex to address, the authors have tried their best to examine these using a number of well-controlled experiments. Though this is an interesting and well carried out study of endosomal signaling of G proteins, my concerns are:

      3.1 The study is done in overexpressed HEK 293 cells with these engineered constructs making me wonder if the kinetics would be the same in primary cells?

      The reviewer raises an interesting and valid point. It is possible that in the context of primary cells the kinetics would differ slightly and it would definitely be interesting to address this in a subsequent study. However, despite being an interesting aspect of our study, the kinetics themselves are not our major take-home message, but rather the subcellular localization of the G protein activation and the role of β-arrestin in these events. We have now highlighted this aspect in our updated manuscript (1st paragraph of the discussion) and we thank the reviewer for addressing this.

      3.2 The use of the phrase "G protein activation independent of b-arrestins to a minor degree" would make me question its physiological relevance. The authors should discuss the relevance of their findings in physiological or pathological context.

      We are glad that the reviewer focuses on this point, and we would like to highlight that other GPCRs including the glucagon-like peptide-1 receptor (GLP1R) internalize in a β-arrestin-independent manner (Claing A et al. 2000 PNAS 97(3):1119-24), while signaling through Gas from endosomes. In the case of the GLP1R, this endosomal Gas signaling promotes glucose-stimulated insulin secretion in pancreatic β-cells (Kuna RS et al. 2013 Am J Physiol Endocrinol Metab 305:E161-70). Consequently, β-arrestin-independent endosomal G protein signaling appears to have some physiological relevance. Similarly, in a very recent pre-print from the von Zastrow group (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997), it was reported that endogenously-expressed vasoactive intestinal peptide receptor 1 (VIPR1), which regulates gastro-intestinal functions, promotes robust G protein signaling from endosomes in a completely β-arrestin-independent fashion. This again suggests that endogenously expressed GPCRs can internalize and activate G proteins from endosomes independently from β-arrestin to produce physiological responses. We have now discussed these studies in the 6th paragraph of the discussion.

      3.3 The confocal colocalization studies shown in Figure 2 and their conclusion "suggesting a certain level of endosomal Gas/Gaq signaling despite the absence of barr2" seems rather inconclusive.

      As opposed to V2R, a receptor that retains β-arrestin in endosomes upon internalization, V2b2AR quickly releases β-arrestin after internalization due to the low affinity of the carboxy-terminal of β2AR for β-arrestin. In the previous Fig. 2 (now Fig. 3), after 45 minutes of AVP stimulation, no β-arrestin is visible at endosomes in cells expressing V2b2AR as β-arrestin has already dissociated from the receptor and translocated back to the cytosol. However, clear green clusters of mGs and mGsq are still visible at endosomes, indicating the presence of active receptor interacting with Gas or Gaq despite the fact that β-arrestin is back in the cytosol. We quantified the percentage of the green mGs or mGsq clusters that do not colocalize with β-arrestin and have added this information to the updated version of the manuscript (Fig. 3G). In V2R-expressing cells, almost all active receptors that interact with Gas or Gaq/11 also associate with β-arrestin (Fig. 3G). In contrast, in V2b2AR-expressing cells, approximately 75% of the active receptors do not interact with β-arrestin (Fig. 3G). This suggests that β-arrestin binding to V2R is not an absolute requirement for endosomal Gas and Gaq activation by V2R. This point was not addressed adequately in the original manuscript, and thus, we have now elaborated further on this in the updated version in the last paragraph of the "The V2R recruits Gas/Gaq and βarrs simultaneously" section.

      3.4 Though a novel observation it is not clear to me how V2R would internalize after activation without arrestin. Is it some sort of generalized microcytosis occurring in these overexpressed cells? Should discuss.

      This is certainly a very interesting observation and something other research laboratories have also seen recently, in particular in the context of endosomal G protein signaling (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997). The main and best characterized pathway for GPCR internalization is clathrin-dependent, where receptors most commonly are associated with β-arrestins. However, for some GPCRs, the β-arrestin association is not required for clathrin-mediated internalization. One example is the apelin receptor, which can internalize via clathrin-coated pits, but in a β-arrestin-independent manner (Pope GR et al. 2016 Mol Cell Endocrinol. 437:108-19). Alternatively, GPCRs can also internalize independently of any clathrin and β-arrestin associations via caveolae or fast endophilin-mediated endocytosis (FEME). We have now expanded our discussion of possible mechanisms for β-arrestin-independent receptor internalization in the updated manuscript in the 6th paragraph of the discussion, and we thank the reviewer for the suggestion.

      3.5 Is use of mini G protein a good representation? The authors should justify.

      Excellent point and something we have comprehensively discussed in our responses to reviewers 1 and 2 (points 1.2 and 2.1).

    1. Author Response

      Reviewer #1 (Public Review):

      Like the "preceding" co-submitted paper, this is again a very strong and interesting paper in which the authors address a question that is raised by the finding in their co-submitted paper - how does one factor induce two different fates. The authors provide an extremely satisfying answer - only one subset of the cells neighbors a source of signaling cells that trigger that subset to adopt a specific fate. The signal here is Delta and the read-out is Notch, whose intracellular domain, in conjunction with, presumably, SuH cooperates with Bsh to distinguish L4 from L5 fate (L5 is not neighbored by signal-providing cells). Like the back-to-back paper, the data is rigorous, well-presented and presents important conclusions. There's a wealth of data on the different functions of Notch (with and without Bsh). All very satisfying.

      Thanks!

      I have again one suggestion that the authors may want to consider discussing. I'm wondering whether the open chromatin that the authors convincingly measure is the CAUSE or the CONSEQUENCE of Bsh being able to activate L4 target genes. What I mean by this is that currently the authors seem to be focused on a somewhat sequential model where Notch signaling opens chromatin and this then enables Bsh to activate a specific set of target genes. But isn't it equally possible that the combined activity of Bsh/Notch(intra)/SuH opens chromatin? That's not a semantic/minor difference, it's a fundamentally different mechanism, I would think. This mechanism also solves the conundrum of specificity - how does Notch know which genes to "open" up? It would seem more intuitive to me to think that it's working together with Bsh to open up chromatin, with chromatin accessibility then being a "mere" secondary consequence. If I'm not overlooking something fundamental here, there is actually also a way to distinguish between these models - test chromatin accessibility in a Bsh mutant. If the authors' model is true, chromatin accessibility should be unchanged.

      I again finish by commending the authors for this terrific piece of work.

      Thanks! It is a crucial question whether Notch signaling regulates chromatin landscape independently of a primary HDTF. We will include this discussion in the text and pursue it in our next project. We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF based on our observation: in larval ventral nerve cord, all motor neurons are NotchON neurons while all sensory neurons are NotchOFF neurons; NotchON neurons share similar functional properties, despite expressing distinct HDTFs, possibly due to the common chromatin landscape regulated by Notch signaling.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors explore how Notch activity acts together with Bsh homeodomain transcription factors to establish L4 and L5 fates in the lamina of the visual system of Drosophila. They propose a model in which differential Notch activity generates different chromatin landscapes in presumptive L4 and L5, allowing the differential binding of the primary homeodomain TF Bsh (as described in the co-submitted paper), which in turn activates downstream genes specific to either neuronal type. The requirement of Notch for L4 vs. L5 fate is well supported, and complete transformation from one cell type into the other is observed when altering Notch activity. However, the role of Notch in creating differential chromatin landscapes is not directly demonstrated. It is only based on correlation, but it remains a plausible and intriguing hypothesis.

      Thanks for the positive feedback!

      Strengths:

      The authors are successful in characterizing the role of Notch to distinguish between L4 and L5 cell fates. They show that the Notch pathway is active in L4 but not in L5. They identify L1, the neuron adjacent to L4 as expressing the Delta ligand, therefore being the potential source for Notch activation in L4. Moreover, the manuscript shows molecular and morphological/connectivity transformations from one cell type into the other when Notch activity is manipulated.

      Thanks!

      Using DamID, the authors characterize the chromatin landscape of L4 and L5 neurons. They show that Bsh occupies distinct loci in each cell type. This supports their model that Bsh acts as a primary selector gene in L4/L5 that activates different target genes in L4 vs L5 based on the differential availability of open chromatin loci.

      Thanks!

      Overall, the manuscript presents an interesting example of how Notch activity cooperates with TF expression to generate diverging cell fates. Together with the accompanying paper, it helps thoroughly describe how lamina cell types L4 and L5 are specified and provides an interesting hypothesis for the role of Notch and Bsh in increasing neuronal diversity in the lamina during evolution.

      Thanks for the positive feedback on both manuscripts.

      Weaknesses:

      Differential Notch activity in L4 and L5:

      ● The manuscript focuses its attention on describing Notch activity in L4 vs L5 neurons. However, from the data presented, it is very likely that the pool of progenitors (LPCs) is already subdivided into at least two types of progenitors that will give rise to L4 and L5, respectively. Evidence to support this is the activity of E(spl)-mɣ-GFP and the Dl puncta observed in the LPC region. Discussion should naturally follow that Notch-induced differences in L4/L5 might preexist L1-expressed Dl that affect newborn L4/L5. Therefore, the differences between L4 and L5 fates might be established earlier than discussed in the paper. The authors should acknowledge this possibility and discuss it in their model.

      We agree. Historically, LPCs are thought to be homogeneous; our data suggest otherwise. We now emphasize this in the Discussion as requested. We are also investigating this question using single cell RNAseq on LPCs to look for molecular heterogeneities. Thanks for the great comment!

      ● The authors claim that Notch activation is caused by L1-expressed Delta. However, they use an LPC driver to knock down Dl. Dl-KD should be performed exclusively in L1, and the fate of L4 should be assessed.

      Dl is transiently expressed in newborn L1 neurons. To knock down Dl in L1, we need to express Dl-RNAi before Dl protein is expressed in newborn L1; the only known Gal4 line expressed that early is the LPC-Gal4 that we used. There is no L1-gal4 line expressed early enough to eliminate L1 expression of Dl.

      ● To test whether L4 neurons are derived from NotchON LPCs, I suggest performing MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter.

      We agree! Whether L4 neurons are derived from NotchON LPCs is a great question. However, MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter will not work because E(spl)-mɣ-GFP reporter is only expressed in LPCs but not lamina neurons. We now mention this in the Discussion.

      ● The expression of different Notch targets in LPCs and L4 neurons may be further explored. I suggest using different Notch-activity reporters (i.e., E(spl)-GFP reporters) to further characterize these differences. What causes the switch in Notch target expression from LPCs to L4 neurons should be a topic of discussion.

      Thanks! It is a great question why Notch induces E(spl)-mɣ in LPCs but Hey in newborn neurons. However, it is not the question we are tackling in this paper, and it will be a great direction to pursue in the future. We will add this to our Discussion.

      Notch role in establishing L4 vs L5 fates:

      ● The authors describe that 27G05-Gal4 causes a partial Notch Gain of Function caused by its genomic location between Notch target genes. However, this is not further elaborated. The use of this driver is especially problematic when performing Notch KD, as many of the resulting neurons express Ap, and therefore have some features of L4 neurons. Therefore, Pdm3+/Ap+ cells should always be counted as intermediate L4/L5 fate (i.e., Fig3 E-J, Fig3-Sup2), irrespective of what the mechanistic explanation for Ap activation might be. It's not accurate to assume their L5 identity. In Fig4 intermediate-fate cells are correctly counted as such.

      Thanks for the comment! We will annotate Pdm3+/Ap+ cells as intermediate L4/L5 fate in the corresponding figures.

      ● Lines 170-173: The temporal requirement for Notch activity in L5-to-L4 transformation is not clearly delineated. In Fig4-figure supplement 1D-E, it is not stated if the shift to 29°C is performed as in Fig4-figure supplement 1A-C.

      Thank you for catching this. We will correct it in the text.

      ● Additionally, using the same approach, it would be interesting to explore the window of competence for Notch-induced L5-to-L4 transformation: at which point in L5 maturation can fate no longer be changed by Notch GoF?

      Our data show that Bsh with Notch signaling in newborn neurons specifies L4 fate while Bsh without Notch signaling in newborn neurons specifies L5 fate. Therefore, we think the window of fate competence is the newborn neuron stage. We will include the data to support this.

      L4-to-L3 conversion in the absence of Bsh

      ● Although interesting, the L4-to-L3 conversion in the absence of Bsh is never shown to be dependent on Notch activity. Importantly, L3 NotchON status is assumed based on their position next to Dl-expressing L1, but it is not empirically tested. Perhaps screening Notch target reporter expression in the lamina, as suggested above, could inform this issue.

      Our data show that, in the absence of Bsh, the L4-to-L3 conversion occurs in the presence of Notch activity, whereas the L5-to-L1 conversion occurs in the absence of Notch activity. Therefore, Notch activity is necessary for the L4-to-L3 conversion. Unfortunately, Hey is currently the only available Notch target reporter in newborn neurons. To tackle this challenge in the future, we will profile the genome-binding targets of endogenous Notch in newborn neurons. This will identify novel genes as Notch signaling reporters in neurons for the field.

      ● Otherwise, the analysis of Bsh Loss of Function in L4 might be better suited to be included in the accompanying manuscript that specifically deals with the role of Bsh as a selector gene for L4 and L5.

      That is an interesting suggestion, but without knowing that Bsh + Notch = L4 identity the experiment would be hard to interpret. Note that we took advantage of Notch signaling to trace the cell fate in the absence of Bsh and found the L4-to-L3 conversion (see Figure 5G-K).

      Different chromatin landscape in L4 and L5 neurons

      ● A major concern is that, although L4 and L5 neurons are shown to present different chromatin landscapes (as expected for different neuronal types), it is not demonstrated that this is caused by Notch activity. The paper proves unambiguously that Notch activity, in concert with Bsh, causes the fate choice between L4 and L5. However, that this is caused by Notch creating a differential chromatin landscape is based only on correlation (NotchON cells having a different profile than NotchOFF cells). Although the authors are careful not to claim that differential chromatin opening is caused directly by Notch, this is heavily suggested throughout the text and must be toned down, e.g., Line 294: "With Notch signaling, L4 neurons generate distinct open chromatin landscape" and Line 298: "Our findings propose a model that the unique combination of HDTF and open chromatin landscape (e.g. by Notch signaling)". These claims are not supported well enough, and alternative hypotheses should be provided in the discussion. An alternative hypothesis could be that LPCs are already specified towards L4 and L5 fates. In this context, different early Bsh targets in each cell type could play a pioneer role generating a differential chromatin landscape.

      We agree and appreciate the comment; it is well justified. We have toned down our comments and clearly state that this is a correlation that needs to be tested for a causal relationship. Thank you for requesting it!

      ● The correlation between open chromatin and Bsh loci with Differentially Expressed genes is much higher for L4 than L5. It is not clear why this is the case, and should be discussed further by the authors.

      We agree. We think that in L5 neurons the secondary HDTF Pdm3, in addition to Bsh, also contributes to L5-specific gene transcription during the synaptogenesis window. We will include this in the text.

    1. Author Response

      Reviewer #1 (Public Review):

      In this very strong and interesting paper the authors present a convincing series of experiments that reveal a molecular mechanism of neuronal cell type diversification in the nervous system of Drosophila. The authors show that a homeodomain transcription factor, Bsh, fulfills several critical functions - repressing an alternative fate and inducing downstream homeodomain transcription factors with whom Bsh may collaborate to induce L4 and L5 fates (the authors' accompanying paper reveals how Bsh can induce two distinct fates). The authors make elegant use of powerful genetic tools and an arsenal of satisfying cell identity markers.

      Thanks!

      I believe that this is an important study because it provides some fundamental insights into the conservation of neuronal diversification programs. It is very satisfying to see that similar organizational principles apply in different organisms to generate cell type diversity. The authors should also be commended for contextualizing their work very well, giving a broad, scholarly background to the problem of neuronal cell type diversification.

      Thanks!

      My one suggestion for the authors is to perhaps address in the Discussion (or experimentally address if they wish) how they reconcile that Bsh is, on the one hand, (a) continuously expressed in L4/L5 and (b) binding directly to a cohort of terminal effectors that are also continuously expressed but, on the other hand, is not required for maintaining L4 fate. A few questions: Is Bsh only NOT required for maintaining Ap expression, or is it also NOT required for maintaining other terminal markers of L4? The former could be easily explained - Bsh simply kicks off Ap, Ap then autoregulates, but Bsh and Ap then continuously activate terminal effector genes. The second scenario would require a slightly more complex mechanism: Bsh binding of targets (with Notch) may open chromatin, but once that's done, Bsh is no longer needed and Ap alone can continue to express genes. I feel that the authors should at least discuss this. The postmitotic Bsh removal experiment in which they only checked Ap and derepression of other markers is a little unsatisfying without further discussion (or experiments, such as testing terminal L4 markers). I hasten to add that this comment does not take away from my overall appreciation for the depth and quality of the data and the importance of their conclusions.

      Great suggestions, we will discuss these two hypotheses as requested.

      Bsh initiates Ap expression in L4 neurons, which then maintain Ap expression independently of Bsh expression, likely through Ap autoregulation. During the synaptogenesis window, Ap expression becomes independent of Bsh expression, but Bsh and Ap are both still required to activate the synapse recognition molecule DIP-beta. Additionally, Bsh shows putative binding to other L4 identity genes, e.g., those required for neurotransmitter choice and electrophysiological properties, suggesting Bsh may initiate L4 identity genes as a suite of genes. The mechanism of maintaining identity features (e.g., morphology, synaptic connectivity and functional properties) in the adult remains poorly understood. It is a great question whether the primary HDTF Bsh maintains the expression of L4 identity genes in the adult. To test this, in our next project, we will specifically knock out Bsh in L4 neurons of the adult fly and examine the effect on L4 morphology, connectivity and functional properties.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors explore the role of the Homeodomain Transcription Factor Bsh in the specification of Lamina neuronal types in the optic lobe of Drosophila. Using the framework of terminal selector genes and compelling data, they investigate whether the same factor that establishes early cell identity is responsible for the acquisition of terminal features of the neuron (i.e., cell connectivity and synaptogenesis).

      Thanks for the positive words!

      The authors convincingly describe the sequential expression and activity of Bsh, termed here as 'primary HDTF', and of Ap in L4 or Pdm3 in L5 as 'secondary HDTFs' during the specification of these two neurons. The study demonstrates the requirement of Bsh to activate either Ap or Pdm3, and therefore to generate the L4 and L5 fates. Moreover, the authors show that in the absence of Bsh, L4 and L5 fates are transformed into L1- or L3-like fates.

      Thanks!

      Finally, the authors used DamID and Bsh:DamID to profile the open chromatin signature and the Bsh binding sites in L4 neurons at the synaptogenesis stage. This allows the identification of putative Bsh target genes in L4, many of which were also found to be upregulated in L4 in a previous single-cell transcriptomic analysis. Among these genes, the paper focuses on Dip-β, a known regulator of L4 connectivity. They demonstrate that both Bsh and Ap are required for Dip-β, forming a feed-forward loop. Indeed, the loss of Bsh causes abnormal L4 synaptogenesis and therefore defects in several visual behaviors. The authors also propose the intriguing hypothesis that the expression of Bsh expanded the diversity of Lamina neurons from a 3 cell-type state to the current 5 cell-type state in the optic lobe.

      Thanks for the excellent summary of our findings!

      Strengths:

      Overall, this work presents a beautiful practical example of the framework of terminal selectors: Bsh acts hierarchically with Ap or Pdm3 to establish the L4 or L5 cell fates and, at least in L4, participates in the expression of terminal features of the neuron (i.e., synaptogenesis through Dip-β regulation).

      Thanks!

      The hierarchical interactions among Bsh and the activation of Ap and Pdm3 expression in L4 and L5, respectively, are well established experimentally. Using different genetic drivers, the authors show a window of competence during L4 neuron specification during which Bsh activates Ap expression. Later, as the neuron matures, Ap becomes independent of Bsh. This allows the authors to propose a coherent and well-supported model in which Bsh acts as a 'primary' selector that activates the expression of L4-specific (Ap) and L5-specific (Pdm3) 'secondary' selector genes, that together establish neuronal fate.

      Thanks again!

      Importantly, the authors describe a striking cell fate change when Bsh is knocked down from L4/L5 progenitor cells. In such cases, L1 and L3 neurons are generated at the expense of L4 and L5. The paper demonstrates that Bsh in L4/L5 represses Zfh1, which in turn acts as the primary selector for L1/L3 fates. These results point to a model where the acquisition of Bsh during evolution might have provided the grounds for the generation of new cell types, L4 and L5, expanding lamina neuronal diversity for more refined visual behaviors in flies. This is an intriguing and novel hypothesis that should be tested from an evo-devo standpoint, for instance by identifying a species in which L4 and L5 do not exist and/or Bsh is not expressed in L neurons.

      Thanks for the appreciation of our findings!

      To gain insight into how Bsh regulates neuronal fate and terminal features, the authors have profiled the open chromatin landscape and Bsh binding sites in L4 neurons at mid-pupation using the DamID technique. The paper describes a number of genes that have Bsh binding peaks in their regulatory regions and that are differentially expressed in L4 neurons, based on available scRNAseq data. Although the manuscript does not explore this candidate list in depth, many of these genes belong to classes that might explain terminal features of L4 neurons, such as neurotransmitter identity, neuropeptides or cytoskeletal regulators. Interestingly, one of these upregulated genes with a Bsh peak is Dip-β, an immunoglobulin superfamily protein that has been described by previous work from the authors' lab to be relevant to establishing proper L4 connectivity. This work proves that Bsh and Ap work in a feed-forward loop to regulate Dip-β expression, and therefore to establish normal L4 synapses. Furthermore, Bsh loss of function in L4 impairs visual behaviors.

      Thanks for the excellent summary of our findings.

      Weaknesses:

      ● The last paragraph of the introduction is written using rhetorical questions and does not read well. I suggest rewriting it in a more conventional direct style to improve readability.

      We agree, and will update the text as suggested.

      ● A significant concern is the way in which information is conveyed in the Figures. Throughout the paper, understanding of the experimental results is hindered by the lack of information in the Figure headers. Specifically, the genetic driver used for each panel should be adequately noted, together with the age of the brain and the experimental condition. For example, R27G05-Gal4 drives early expression in LPCs and L4/L5, while the 31C06-AD, 34G07-DBD Split-Gal4 combination drives expression in older L4 neurons, and the use of one or the other to drive Bsh-KD has dramatic differences in Ap expression. The indication of the driver used in each panel will facilitate the reader's grasp of the experimental results.

      We agree, and will update the figure annotation.

      ● Bsh role in L4/L5 cell fate:

      o It is not clear whether Tll+/Bsh+ LPCs are the precursors of L4/L5. Morphologically, these cells sit very close to L5, but are much more distant from L4.

      Our current data show L4 and L5 neurons are generated by different LPCs. However, currently we don’t have tools to demonstrate which subset of LPCs generate which lamina neuron type. We are currently working on a followup manuscript on LPC heterogeneity, but those experiments have just barely been started.

      o Somatic CRISPR knockout of Bsh seems to have a weaker phenotype than the knockdown using RNAi. However, in several experiments down the line, the authors use CRISPR-KO rather than RNAi to knock down Bsh activity: it should be explained why the authors made this decision. Alternatively, a null mutant could be used to consolidate the loss of function phenotype, although this is not strictly necessary given that the RNAi is highly efficient and almost completely abolishes Bsh protein.

      We chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) because it effectively removed Bsh expression from the majority of L4 neurons. In contrast, Bsh-RNAi driven by L4-split Gal4 failed to knock down Bsh in L4 neurons, because L4-split Gal4 expression itself depends on Bsh. We will include this explanation in the text.

      o Line 102: Rephrase "R27G05-Gal4 is expressed in all LPCs and turned off in lamina neurons" to "is turned off as lamina neurons mature", as it is kept on for a significant amount of time after the neurons have already been specified.

      Thanks; we will make that change.

      o Line 121: "(a) that all known lamina neuron markers become independent of Bsh regulation in neurons" is not an accurate statement, as the markers tested were not shown to be dependent on Bsh in the first place.

      Good point. We will rephrase it as “that all known lamina neuron markers are independent of Bsh regulation in neurons”.

      o Lines 129-134: Make explicit that the LPC-Gal4 was used in this experiment. This is especially important here, as these results are opposite to the Bsh Loss of Function in L4 neurons described in the previous section. This will help clarify the window of competence in which Bsh establishes L4/L5 neuronal identities through ap/pdm3 expression.

      Thanks! We will include Gal4 information in the text for every manipulation.

      ● DamID and Bsh binding profile:

      ○ Figure 5 - figure supplement 1C-E: The genotype of the Control in (C) has to be described within the panel. As it is, it can be confused with a wild type brain, when it is in fact a Bsh-KO mutant.

      Great point! Thank you for catching this and we will update it.

      ○ It is not clear how L4-specific Differentially Expressed Genes were found. Are these genes DEG between Lamina neuron types, or are they upregulated genes with respect to all neuronal clusters? If the latter is the case, it could explain the discrepancy between scRNAseq DEGs and Bsh peaks in L4 neurons.

      We did not use “L4-specific Differentially Expressed Genes”. Instead, we used all genes that are significantly transcribed in L4 neurons (line 209-210).

      ● Dip-β regulation:

      ○ Line 234: It is not clear why CRISPR KO is used in this case, when Bsh-RNAi presents a stronger phenotype.

      As we explained above, we chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) because it effectively removed Bsh expression from the majority of L4 neurons. In contrast, Bsh-RNAi driven by L4-split Gal4 failed to knock down Bsh in L4 neurons, because L4-split Gal4 expression itself depends on Bsh. We’ll include this explanation in the text.

      ○ Figure 6N-R shows results using LPC-Gal4. It is not clear why this driver was used, as it makes a less accurate comparison with the other panels in the figure, which use L4-Split-Gal4. This discrepancy should be acknowledged and explained, or the experiment repeated with L4-Split-Gal4>Ap-RNAi.

      I think you mean 6J-M shows results using LPC-Gal4. We first tried L4-Split-Gal4>Ap-RNAi but it failed to knock down Ap because L4-Split-Gal4 expression depends on Ap. We will add this to the text.

      ○ Line 271: It is also possible that L4 activity is dispensable for motion detection and only L5 is required.

      Thanks! Work from Tuthill et al, 2013 showed that L5 is not required for any motion detection. We will include this citation in the text.

      ● Discussion: It is necessary to de-emphasize the relevance of HDTFs, or at least acknowledge that other, non-homeodomain TFs, can act as selector genes to determine neuronal identity. By restricting the discussion to HDTFs, it is not mentioned that other classes of TFs could follow the same Primary-Secondary selector activation logic.

      That is a great point, thank you! We will include this in the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This important study shows that two methods of sleep induction in the fly, optogenetic activation of the dorsal fan-shaped body (which is rapidly reversible and maintains a neuronal activity signature similar to wakefulness), and Gaboxadol-induced sleep (which shuts down neuronal activity), produce distinct forms of sleep and have different effects on brain-wide neural activity. The majority of the conclusions of the paper are supported by compelling data, but the evidence supporting the claim that the two interventions trigger distinct transcriptional responses is incomplete.

      Thank you for the helpful and detailed reviews. We feel that these have improved the manuscript considerably, and hopefully the additional figures in this Reply letter will help further convince our readers.

      Public Review

      In this study, Anthoney and coworkers continue an important, unique, and technologically innovative line of inquiry from the van Swinderen lab aimed at furthering our understanding of the different sleep stages that may exist in Drosophila. Here, they compare the physiological and transcriptional hallmarks of sleep that have been induced by two distinct means, a pharmacological block of GABA signaling and optogenetic activation of dorsal fan-shaped-body neurons. They first employ an incredibly impressive fly-on-the-ball 2-photon functional imaging setup to monitor neural activity during these interventions, and then perform bulk RNA sequencing of fly brains at different stages. These transcriptomic analyses lead them to (a) knock out nicotinic acetylcholine receptor subunits and (b) knock down AkhR throughout the fly brain, testing the impact of these genetic interventions on sleep behaviors in flies. Based on this work, the authors present evidence that optogenetically and pharmacologically induced sleep produces highly distinct brain-wide effects on physiology and transcription. The study is of significant interest, is easy to read, and the figures are mostly informative. However, there are features of the experimental design and the interpretation of results that diminish enthusiasm.

      a- Conditions under which sleep is induced for behavioral vs neural and transcriptional studies

      1- There is a major conceptual concern regarding the relationships between the physiological and transcriptomic effects of optogenetic and pharmacological sleep promotion, and the effects that these manipulations have on sleep behavior. The authors show that these two means of sleep-induction produce remarkably distinct physiological and transcriptional responses, however, they also show that they produce highly similar effects on sleep behavior, causing an increase in sleep through increases in the duration of sleep bouts. If dFB neurons were promoting active sleep, the sleep it produces should be more fragmented than the sleep induced by the drug, because the latter is supposed to produce quiet sleep. Yet both manipulations seem to be biasing behavior toward quiet sleep.

      This is a correct observation, which is already evident in our sleep architecture data (Figure 2E-H): chronic optogenetic sleep induction promotes longer sleep bouts that are similar in structure (bout number vs bout duration) to those produced by THIP feeding. Since our plots in Figure 2E-H follow the 5min sleep criterion cutoff, upon the Reviewer’s advice we re-analyzed our optogenetic experiments for short (1-5min) sleep. These are graphed below in Author response image 1. As can be seen, and as suspected by the Reviewer, the optogenetic manipulation does not increase the total amount of short sleep; indeed, it decreases it compared to baseline (these are for the exact same data as in Figure 2). Optogenetic sleep induction does not create a bunch of short sleep bouts.

      Author response image 1.

      Short sleep in optogenetic experiments. A. Average baseline (±SEM) 1-5min sleep across a day and night. B. Average (±SEM) 1-5min sleep in optogenetically activated flies, across a day and night.

      We agree with the reviewer that this observation might seem inconsistent with the idea that optogenetic activation promotes active sleep, and that short sleep is active sleep. However, it does not necessarily follow that optogenetic activation has to produce short sleep. Indeed, we know from our brain imaging data (and the associated behavioral analysis) that active sleep will persist for as long as we induce it with red light. While we have not induced it for longer than 15 minutes (Tainton-Heap et al, Current Biology, 2021; Troup et al, J. of Neuroscience, 2023), this is already clearly longer than a <5min sleep bout. So our interpretation is that the longer sleep bouts induced by optogenetic activation are prolonged active sleep, rather than quiet sleep. In other words, this artificial sleep manipulation induces prolonged active sleep, rather than many short sleep bouts. This is of course different from what happens during spontaneous sleep. We have tried to be clearer about sleep bout durations in the revised manuscript (e.g., the new Figure 3), and we now admit early in the results (lines 376-380) that we don’t know what optogenetic activation looks like in the fly brain beyond 15 minutes.
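
      To make the bout criteria in this exchange concrete, here is a minimal sketch of how per-minute activity data might be split into short (1-5min) and longer (5min or more) sleep bouts. It is illustrative only and is not the analysis code used in the study: the function names, the one-sample-per-minute input format, and the choice to assign a bout of exactly 5 minutes to the longer category are all assumptions.

```python
from itertools import groupby


def inactivity_bouts(active_per_minute):
    """Return the length (in minutes) of each run of consecutive inactive minutes.

    `active_per_minute` is a hypothetical input: one truthy/falsy value per
    minute, truthy meaning the fly moved during that minute.
    """
    bouts = []
    for is_active, run in groupby(bool(a) for a in active_per_minute):
        if not is_active:
            bouts.append(sum(1 for _ in run))
    return bouts


def split_short_long(bouts, cutoff_min=5):
    """Split bouts into short (< cutoff) and long (>= cutoff) sleep.

    Assumption: a bout of exactly `cutoff_min` minutes counts as long sleep.
    """
    short = [b for b in bouts if 1 <= b < cutoff_min]
    long_ = [b for b in bouts if b >= cutoff_min]
    return short, long_


# Toy trace: 2 inactive minutes, then 6, then 1 (separated by activity).
trace = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0]
short, long_ = split_short_long(inactivity_bouts(trace))
total_short_minutes = sum(short)
total_long_minutes = sum(long_)
```

      Summing each list gives the total minutes of short and long sleep, which is the kind of quantity plotted in Author response image 1 and Figure 2E-H respectively.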

      2- The authors show that the pharmacological block of GABA signaling and the optogenetic activation of dorsal fan-shaped-body neurons cause different responses on brain activity. Based on these recordings and the behavioral and brain transcriptomic data they then claim that these responses correspond to different sleep states and are associated with the expression and repression of a different constellation of genes. Nevertheless, neural activity in animals was recorded following short stimulations whereas behavioral and transcriptomic data were obtained following chronic stimulation. In this regard, it would be interesting to determine how the 12-hour pharmacological intervention they employed for their transcriptomic analysis changes neural activity throughout the brain - 12 hours will likely be too long for the open-cuticle preps, but an in-between time-point (e.g. 1h) would probably be equally informative.

      The longest we’ve imaged brain activity for optogenetic sleep induction is 15 minutes, as discussed above. We see no changes in activity across this time, which would normally have led to a quiet sleep stage in spontaneous sleep recordings. Whole-brain imaging after 10 hours of optogenetic sleep induction (our RNA collection timepoint) is not realistic, and even 1 hour is difficult. We have however conducted overnight electrophysiological recordings (with multichannel silicon probes), where we activated the same R23E10 neurons for successive 20-minute bouts (alternating with 20min of no red light). We are preparing this work for publication (Van De Poll, et al). We see no evidence of optogenetic activation of this circuit ever producing anything resembling quiet sleep. Since we are not in a position to provide this new electrophysiological data in the current study, we are careful to clarify that we have not investigated what brain imaging looks like after chronic optogenetic activation (lines 376-380). We are showing through diverse lines of evidence that what is called sleep can look different in flies.

      b- Efficiency of THIP treatment under different conditions

      1- There are no data to quantify how THIP alters food consumption. It is evident that flies consume it otherwise they would not show increased sleep. However, they may consume different amounts of food overall than the minus THIP controls. This might have an influence on the animal's metabolism, which could at least explain the fact that metabolism-related genes are regulated (Figure 5). Therefore, in the current state, it is not possible to be certain that gene regulation events measured in this experiment are solely due to THIP effects on sleep.

      We have two arguments against this reasonable criticism. First, as discussed above, the optogenetic flies are sleeping at least as much as the THIP-fed flies, so in principle they also might be feeding less. But we see no metabolic gene downregulation in the optogenetic dataset. We include this counterargument in the discussion (lines 752-756). Second, together with our co-author Paul Shaw, we have shown by tracking dye consumption that THIP-fed flies are not eating less than controls (Dissel et al, Current Biology, 2015). We show those results again below in Author response image 2 to support our reasoning that feeding is not an issue.

      Author response image 2.

      Flies were fed blue dye in their food while being sleep deprived (SD), or while being induced to sleep with 0.1mg/ml THIP in their food, or both. Dye consumption was measured in triplicate for pooled groups of 16 flies. Average absorbance at 625nm (±standard deviation) is shown. Experiments were not significantly different (ANOVA of means).

      2- A similar problem exists in the sleep deprivation experiments. If flies are snapped every 20 seconds, they may not have the freedom to consume appropriate amounts of food, and therefore their consumption of THIP or ATR may be smaller than in non-sleep deprived controls. Thus, it would be crucial to know whether the flies that are sleep-deprived (i.e. shaken every 20 seconds for 12 hours) actually consume comparable amounts of food (and therefore THIP) as those that are undisturbed. If not, then perhaps the transcriptional differences between the two groups are not sleep-specific, but instead reflect varying degrees of exposure to THIP.

      Please see our response to the similar critique above, and how Author response image 2 addresses this concern.

      3- The authors should further discuss the slow action of THIP perfusion vs dFB activation, especially as flies only seem to fall asleep several minutes after THIP is being washed away. Is it a technical artifact? If not, it may not be unreasonable to hypothesize that THIP, at the concentration used, could prevent flies from falling asleep, and that its removal may lower the concentration to a point that allows its sleep-promoting action. The authors could easily test this by extending THIP treatment for another 4-5 minutes.

      The reviewer is partially correct in suggesting a technical artifact: THIP does not get washed away immediately after 5min of perfusion. The drip system we employ means that THIP concentration will slowly increase to the maximum concentration of 0.2mg/ml, and then slowly get diluted away at a rate of 1.25ml/minute (this is all in the Methods). In a previous study (Yap et al, Nature Communications, 2017) we used this exact same perfusion procedure to test a range of THIP concentrations, and settled on 0.2mg/ml as the lowest that reliably induced quiet sleep within 5 minutes. Higher concentrations induced quiet sleep faster, so the alternate explanation proposed by the Reviewer is not supported. We feel that our previous electrophysiological study provided the necessary groundwork for using the same approach and dosage here for our whole-brain imaging readout.

      c- Comments regarding the behavioral assays

      1- L319-322: the authors conclude that dFB stimulation and THIP consumption have similar behavioral effects on sleep. However, this is inaccurate as in Figure S1 they explain that one increases bout number in both day and night and the other one only during the day.

      We have now added a caveat about night bout architecture being different (lines 353-356). Figure S1 is now Figure 3.

      2- The behavioral definitions used for active and quiet sleep do not fit well with strong evidence that deep sleep (defined by lowered metabolic rates) is probably most closely associated with bouts of inactivity that are much longer than the >5min duration used here, i.e., probably 30min and longer (Stahl et al. 2017 Sleep 40: zsx084). Given that the authors are providing evidence that quiet sleep is correlated with changes in the expression of metabolism related genes, they should at least discuss the fact that reductions in metabolism have been shown to occur after relatively long bouts of inactivity and might reconsider their behavioral sleep analysis (i.e., their criteria for sleep state) with this in mind.

      Interestingly, induced sleep bout durations are on average longer for the optogenetic manipulation (40min vs 25min); this was evident in Figure S1C vs S1F (now Figure 3). So as discussed above, this provides a counterargument for sleep bout duration alone being indicative of metabolic processes associated with quiet sleep: the optogenetic dataset did not uncover metabolic-related pathways as relevant to that sleep manipulation. We refer to Stahl et al, Sleep, 2017, in our discussion (lines 748-750), making exactly this point about metabolic rates being decreased in longer sleep bouts, and following up with our observation that optogenetic flies sleep just as much, and their bouts are actually longer. So clearly different processes must be involved.

      d- Comments regarding the recordings of neuronal activity

      1- There is an additional concern regarding the proposed active and quiet sleep states that rest at the heart of this study. Here these two states in the fly are compared to the REM and NREM sleep states observed in mammals and the parallels between active fly sleep and REM and quiet fly sleep and NREM provide the framework for the study. The establishment of such parallel sleep states in the fly is highly significant and identifying the physiological and molecular correlates of distinct sleep stages in the fly is of critical importance to the field. However, the proposal that the dorsal fan shaped body (dFB) neurons promote active sleep runs counter to the prevailing model that these neurons act as a major site of sleep homeostasis. If quiet sleep were akin to NREM, wouldn't we expect the major site of sleep homeostasis in the brain to promote it? Furthermore, the authors state that the effects of dFB neuron excitation on transcription have "almost no overlap" (line 500) with the transcriptomic effects of sleep deprivation (Supplementary Table 3), which is not what would be expected if dFB neurons are tracking sleep pressure and promoting sleep, as suggested by a growing body of convergent work summarized on page four of the manuscript. Wouldn't the 10h excitation of the dFB neurons be predicted to mimic the effects of sleep deprivation if these neurons "...serve as the discharge circuit for the insect's sleep homeostat..." (line 60)? Shouldn't their prolonged excitation produce an artificial increase in sleep drive (even during sleep) that would favor deep, restorative sleep? How do the authors interpret their results with regard to the current prevailing model that dFB neurons act as a major site of sleep homeostasis? This study could be seen as evidence against it, but the authors do not discuss this in their Discussion.

      These are all excellent and thoughtful points, which have made us re-think parts of our discussion. First off, the potential comparison with REM and NREM is entirely speculative, and we have tried to make that more obvious in the introduction and the discussion (e.g., see lines 43, 708, 818). The evidence that the FB neurons (and maybe others) are involved in the homeostatic regulation of sleep is well-supported in the literature, so that part of the discussion holds. However, we concede that the timing of our sleep manipulations could benefit from more explanation. We conducted these during the flies’ subjective day, after the animals had presumably had a good night’s sleep. This means that we induced either kind of sleep for 10 daytime hours, which presumably replaced whatever behavioural states would ‘naturally’ be happening during the day. Female flies sleep less during the day than at night, and we have shown in previous work that daytime sleep quality is different from night-time sleep (van Alphen et al, Journal of Neuroscience, 2013), leading us to suggest that most ‘deep’ or quiet sleep happens at night, for flies. Following this reasoning, daytime optogenetic activation might not be depriving flies of much quiet sleep, or accumulating a deep sleep drive as the Reviewer proposes. Rather, both induced sleep manipulations could be providing 10 hours of either kind of sleep that the flies don’t really ‘need’. Why did we design it this way? Firstly, we were interested in simply asking what these chronic sleep manipulations do to gene expression in rested flies, and how they might be similar or different. We focussed on daytime manipulations to avoid precisely the confound of sleep pressure, and also because we observed red-light artifacts at night for our optogenetic experiments (which we reported). Our sleep deprivation strategy was designed specifically as a control for the THIP (Gaboxadol) experiments, to control for non-sleep related effects of the drug (see below our rationale for why this was less crucial for the optogenetic experiments). In conclusion, we had a logical rationale for how the experiments were done, centred on the straightforward question of whether these two different approaches to sleep induction were having similar effects in well-rested flies. In retrospect, we were not anticipating the Reviewer’s thoughtful logic regarding the dFB’s potential role in also regulating deep sleep homeostasis. We now provide some discussion along these lines to make readers aware of this line of reasoning, as well as our rationale for why prolonged optogenetic sleep induction was not sleep-depriving (lines 768-777).

      2- Regarding the physiological effects of Gaboxadol, to what extent is the quieting induced by this drug reminiscent of physiology of the brains of flies spontaneously meeting the behavioral criterion for quiet sleep? Given the relatively high dose of the drug being delivered to the de-sheathed brain in the imaging experiments (at least when compared to the dose used in the fly food), one worries that the authors may be inducing a highly abnormal brain state that might bear very little resemblance to the deeply sleeping brain under normal conditions. As the authors acknowledge, it is difficult to compare these two situations. Comparing the physiological state of brains put to sleep by Gaboxadol and brains that have spontaneously entered a deep sleep state therefore seems critical.

      As discussed above, our Gaboxadol (THIP) perfusion concentration (0.2mg/ml) was the minimal dosage that effectively induced sleep within 5 minutes, based upon previously published work (Yap et al, Nature Communications, 2017). Lower concentrations were unreliable, with some never inducing sleep at all. Comparisons with feeding THIP are tenuous, and we make that clear in our discussion (lines 731-735). Nevertheless, the Reviewer makes an excellent point about comparisons with spontaneous ‘quiet’ sleep. Here, we feel well supported (please see Author response image 3 below, comparing THIP-induced sleep (this work, B) and spontaneous sleep (A) from a previous study). In our previous study (Tainton-Heap et al, 2021) we showed that neural activity and connectivity decrease during spontaneous quiet sleep. This is what we also see with THIP perfusion. In contrast, in Troup et al, J. of Neuroscience (2023) we confirm that neither neural activity nor connectivity changes during optogenetic R23E10 activation, and general anesthesia – unlike THIP – does NOT produce a quiet brain state. Our finding that THIP effects are nothing like general anesthesia (at the level of brain activity levels) suggests a physiological sleep state closer to spontaneous quiet sleep. We elaborate on this important observation in our results, also pointing to crucial differences with general anesthesia (lines 411-415).

      Author response image 3.

      THIP-induced sleep resembles quiet spontaneous sleep. A. Calcium imaging data from spontaneously sleeping flies, taken from Tainton-Heap et al, 2021. Left, percent neurons active; right, mean degree, a measure of connectivity among active neurons. Both measures decrease during later stages of sleep. B. Calcium imaging data from flies induced to sleep with 5min of 0.2mg/ml THIP perfusion (this study). Left, percent neurons active; right, mean degree. Both measures are significantly decreased, resembling the later stages of spontaneous sleep, which we have termed ‘quiet sleep’. Hence THIP-induced sleep resembles quiet sleep. Note that the genetic background is different in A and B, hence the different baseline activity levels.

      3- There are some issues with Figure 3, in particular 3C-D. It is not clear whether these panels show representative traces or an average, however both the baseline activity and fluorescence are different between C and D, in particular in their amplitude. Therefore, it is difficult to attribute the differences between C and D to the stimulation itself or to the previously different baseline. In addition, the fact that flies with dFB activation seem to keep a basal level of locomotor activity whereas THIP-treated ones don't is quite striking, however it is not being discussed. Finally, the authors claim that the flies eventually wake up from THIP-induced sleep (L360-361), however there are no data to support this statement.

      These are representative traces, which is a way of showing the raw calcium data (Cell ID) so readers can see for themselves that one manipulation silences whereas the other does not – even though flies become inactive for both. The Y-axis scale is standard deviation of the experiment mean. Since THIP decreases neural activity, then the baseline is comparatively higher. Since optogenetic activation does not change average neural activity levels, the baseline is centered on zero. This is an outcome of our analysis method and does not reflect any ‘true’ baseline. We have now clarified this in our figure legend. We now also confess that flies rendered asleep optogenetically can be ‘twitchy’ (line 374). Finally, we show data for 3 flies that were recorded until they woke up. The rest were verified behaviorally, after the experiment. This is now explained in the Methods.

      4- In Figure 4C, it is strange that the SEM is always exactly the same across the whole experiment. Readers should be aware that there might have been an issue when plotting the figure.

      This is not a mistake, the standard errors are just all quite close (between 0.17 and 0.22). This is because of the way we did the analysis, asking how many flies responded to each stimulus event, with incremental levels of responsiveness. This is explained in the Methods. The figure makes the important point of sleep and recovery.

      e- Comments regarding the transcript analyses

      1- General comment: the title of this manuscript is inaccurate - the "transcriptome" commonly refers to the entirety of all transcripts in a cell/tissue/organ/animal (including genes that are not differentially expressed following their interventions), and it is therefore impossible to "engage two non-overlapping transcriptomes" in the same tissue. Perhaps the words "transcriptional programs" or "transcriptional profiles" would be more accurate here?

      We thank the Reviewer for this advice and have changed the title as proposed.

      2- Given the sensitivity of transcriptomic methods, there is a significant concern that the optogenetic experiments are not as well controlled as they could be. Given the need for supplemental all-trans retinal (ATR) for functional light gating of channelrhodopsins in the fly, it is convenient to use flies with Gal4-driven opsin that have not been given supplemental ATR as a negative control, particularly as a control for the effects of light. However, there is another critical control to do here. Flies bearing the UAS-opsin responder element but lacking the GAL4 driver and that have been fed ATR are critical for confirming that the observed effects of optogenetic stimulation are indeed caused by the specific excitation of the targeted neurons and not due to leaky opsin expression, or the effect of ATR feeding under light stimulation or some combination of these factors. Given the sensitivity of transcriptomic methods, it would be good to see that the candidate transcripts identified by comparing ATR+ and ATR- R23E10GAL4/UAS-Chrimson flies are also apparent when comparing R23E10GAL4/UAS-Chrimson (ATR+) with UAS-Chrimson (ATR+) alone.

      We have not done these experiments on UAS-Chrimson/+ controls. Like many others in our field, we viewed non-ATR flies as the best controls, because this involves identical genotypes. Since we were however aware that ATR feeding itself could affect gene expression, we specifically checked for this with our early (1-hour) collection timepoint. We only found 26 gene expression differences between +ATR and -ATR flies at this early timepoint, compared with 277 for the 10-hour timepoint. We detail this rationale in our results, explaining why this is a convincing control for ATR feeding. If there was leaky opsin expression / activity, this would have been evident in our design. Regarding the cumulative effect of light, this would also have been accounted for in our design, as only 1 hour would have elapsed in our first timepoint compared to 10 hours in our second. While the Reviewer is correct in saying that parental controls are called for in many Drosophila experiments, this becomes quickly unmanageable in transcriptomic studies, which is exactly why well-designed +ATR vs -ATR comparisons in the exact same strain are most appropriate. We feel that our 1-hr timepoint mostly addresses this concern.

      3- Figures about qPCR experiments (5G and 6G) are problematic. First, whereas the authors seem satisfied with the 'good correspondence' between their RNA-seq and qPCR results, this is true for only ~9/19 genes in 5G and 2/6 genes in 6G. Whereas discrepancies are not rare between RNA-seq and qPCR, the text in L460-461 and 540-541 is misleading. In addition, it is unclear whether the n=19 in L458 refers to the number of genes tested or the number of replicates. If the qPCR includes replicates, this should be more clearly mentioned, and error bars should be added to the corresponding figures.

      We consider that our qPCR validations were convincing, as they were all mostly changed in the ‘right’ direction. We agree that there are some discrepancies, so have modified our language to reflect this. We have also clarified that 19 refers to the number of genes validated by qPCR in that THIP dataset. All qPCRs involved three technical replicates. We prefer to keep these histograms the way they are to convey these simple trends. For complete transparency, we now provide a supplemental Excel worksheet with all of the qPCR data, alongside corresponding RNAseq data and stats for the selected genes (Supplementary Table 9).
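      One simple way to put a number on "changed in the ‘right’ direction" is to count the genes whose RNA-seq and qPCR fold changes share a sign. The sketch below is only an illustration of that idea and is not the analysis performed in the manuscript; the function name and the fold-change values are hypothetical placeholders, not data from the study.

```python
def direction_concordance(rnaseq_log2fc, qpcr_log2fc):
    """Count genes whose log2 fold change has the same sign in both assays.

    Hypothetical helper for illustration only. A fold change of exactly
    zero in either assay is counted as discordant.
    """
    if len(rnaseq_log2fc) != len(qpcr_log2fc):
        raise ValueError("both lists must describe the same genes")
    if not rnaseq_log2fc:
        raise ValueError("at least one gene is required")
    # Same sign means the product of the two fold changes is positive.
    same = sum(1 for a, b in zip(rnaseq_log2fc, qpcr_log2fc) if a * b > 0)
    return same, len(rnaseq_log2fc)


# Made-up log2 fold changes for four imaginary genes (not study data).
example_rnaseq = [1.2, -0.8, 0.5, -1.5]
example_qpcr = [0.9, -0.3, -0.2, -1.1]
n_same, n_genes = direction_concordance(example_rnaseq, example_qpcr)
print(f"{n_same}/{n_genes} genes change in the same direction")
```

      Reporting the count together with the number of genes tested keeps the claim checkable against the full qPCR table.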

      4- There is a lack of error bars for all their RNAseq and qPCR comparisons, which is particularly surprising because the authors went to great lengths and analyzed an applaudably large amount of independent biological replicates, yet the variability observed in the corresponding molecular data is not reported.

      The genes reported in each of our datasets and associated supplemental figures and tables were all significant, as determined by criteria outlined in the Methods. However, we appreciate that readers might want to get a sense of the values and variances involved, as well as access to the entire gene datasets. We now provide all of these as additional ‘sheets’ in our existing supplemental tables (S2-S7), so this should be very easy to navigate and evaluate. In addition to the previously provided lists for significant genes, in the second Excel sheet (‘All genes’) readers will be able to see the data for all 5 replicates, for the significant genes as well as all other ~15,000 genes (listed in alphabetical order). We feel that this will be a helpful resource, because admittedly significance thresholds can still be a little arbitrary and some readers might want to look up ‘their’ genes of interest.

      Comments to authors

      Other comments

      1- Text in L441 & 606 is misleading. According to ref 52, AkhR is involved specifically in starvation-induced sleep loss, and not in general sleep regulation.

      Corrected.

      2- The language used in L568-570 and 573-574 is confusing. The authors should specify that the knock down of cholinergic subunits, rather than the subunits themselves is what causes sleep to increase or decrease.

      Corrected.

      3- The authors' investigation of cholinergic receptor subunits function is very preliminary, and it is difficult to draw any conclusion from what is presented here. In particular, their behavioral data is difficult to reconcile with the RNA-seq data showing overexpression of both short sleep increasing and short sleep decreasing subunits. Without knowing where in the brain these subunits are required for controlling sleep, the data in Figure 7 is difficult to appreciate.

      We have now conducted additional experiments where we specifically knocked down these alpha receptor subunits (all 7 of them) in the R23E10 neurons. This seemed an obvious knockdown location, to determine if any of these subunits regulated activity in the same sleep promoting neurons that were the focus of this study. We found that alpha1 knockdown in these neurons had similar sleep phenotypes, which we believe is an important result. Since this functional localisation is a logical ending for the paper, we have now made it the final figure.

      Suggestions & comments

      1- It would be interesting if the authors could discuss their findings that metabolism genes are downregulated in THIP flies in the context of recent work that showed upregulation of mitochondrial ROS after sleep deprivation (Kempf et al, 2019).

      We now add the Kempf 2019 reference and allude to how those findings could be consistent with ours.

      2- The fact that THIP-induced sleep persists long after THIP removal (Fig 3D) is very intriguing and interesting. This suggests that the drug might trigger a sleep-inducing pathway that can continue on its own without the drug, once activated.

      This is correct, and in stark contrast to the optogenetic manipulation we employ, which does not appear to show such sleep inertia. We have now added a sentence highlighting this interesting difference (lines 394-396).

      3- The authors identify many new genes regulated in response to specific methods for sleep induction. These are all potentially interesting candidates for further studies investigating the molecular basis of sleep. It would be interesting to know which of these genes are already known to display circadian expression patterns.

      By providing all of the gene lists, these are now available to ask questions such as these. We hesitate however to delve into this domain for this work, as our main goal was to compare these two kinds of sleep in flies.

      4- The brain-wide monitoring of neural activity invites a number of very exciting follow-up experiments - most importantly, it would be fascinating to establish, which neurons are active in the different phases the authors describe! Are these neurons that are involved in transmitting external visual stimuli to the central brain? Do they also project into the central complex? They could make use of the large collection of existing driver lines in the fly and they could also exploit the extraordinary knowledge of the connectome and transcriptome of the fly brain.

      Thank you for sharing our enthusiasm for these likely future directions.

      5- The Dalpha2,3,4,6 and 7 Knock-out strains they generate will be a useful reagent for the Drosophila neuroscience community once the efficiency/success of the knock-out has been confirmed by qPCR.

      These knockout strains have all been confirmed by our co-authors Hang Luong, Trent Perry, and Philip Batterham. These knockout confirmations are outlined in publications that we reference (Perry et al, 2021).

      Materials and methods:

      1- This study has employed custom-built apparatus and custom-written code/scripts, but these do not appear to be available to the reader. For the sake of replicability, the authors should make these available.

      The code/scripts are available via the University of Queensland research data management system as described in the Methods, and can be sent by the Lead Contact. The imaging hardware and analysis code are identical to what was described in a previous publication, and available as directed therein (Tainton-Heap et al, 2021).

      2- Also, the authors should give details on the food used to rear their flies. Fly media comes in several common forms and sleep is sensitive to diet.

      This has now been elaborated in the beginning of the Methods.

      3- The light regime used for optogenetic excitation of dFB neurons consists of 12h of uninterrupted bright red LED light. Most optogenetic stimulations consist of pulsed high frequency flashes interlaced with pauses in illumination. Can dFB neurons be driven constitutively with 12 hours of bright light?

      We showed in Tainton-Heap (2021) that 7Hz pulsed red light had exactly the same effect on R23E10/Chrimson readouts as continuous red light, which is why we opted here to provide continuous red light. That optogenetic sleep induction can be driven continuously for 12 hours is evident by our 24-hour sleep profiles. However, we agree that one could question whether sleep quality is similar after 12 hours. To address this, we did an additional experiment where we stimulated the flies hourly, to determine if their behavioural responsiveness to mechanical stimuli changed over the course of continued sleep induction, for both optogenetic and THIP-induced sleep. We present the data below in Author response image 4. As can be seen in these new analyses, while optogenetic sleep induction persists across 12 daytime hours (speed is close to zero throughout), flies do indeed become more responsive later in the day. This could have two different interpretations: either some sleep functions are being satisfied over time, or the activation regime is becoming less effective over time. Either way, these data show that at our 10-hour daytime timepoint, unstimulated flies are still largely inactive, even though their arousal thresholds might have gradually changed; so the uninterrupted red-light regime is still effective. The comparison with THIP is interesting: here there does not seem to be a change in responsiveness over time; the drug just decreases behavioral responsiveness throughout. Together, these experiments support our view that both approaches are sleep-promoting throughout the 12-hour day, although we appreciate that sleep quality is not identical.

      Author response image 4.

      A) The average speed of baseline (grey) and optogenetically-activated flies (green) across 24 hours. Red dots indicate vibration stimulus times. B) The average speed of control (grey) and THIP-fed flies (blue) across 24 hours. Flies are all R23E10/Chrimson. n=87 for optogenetic, n=88 for -THIP, n=85 for +THIP.

      4- The authors use the SNAP apparatus to prevent THIP-treated flies from sleeping to tease out possible sleep-independent effects. This is an excellent control. Why have the authors not done the same with the optogenetic treatment? It's surprising not to see this control given the concern the authors express (lines 501 - 502) that the dFB manipulation might be paralyzing awake flies, which certainly seems possible given the light regimes used. Why not test this directly with SNAP?

      We appreciate that this may have been a valuable additional control. However, we designed this control for the THIP experiments specifically because of concerns about THIP’s (yet unknown) mechanism of action in flies. THIP is a GABAergic drug with most likely many off-target effects that have little to do with sleep, hence the need for a control where we compare to flies that ingested THIP but have been prevented from sleeping. In contrast, R23E10-driven sleep induction is exactly that: a circuit that induces sleep when activated. Whatever specific neurons might really be involved, the Gal4 circuit is sleep-inducing. This is well supported by multiple publications. The most appropriate control for assessing transcriptomic effects during optogenetic sleep here is not preventing sleep, but rather no increased sleep in flies that have not ingested ATR, and comparing that to effects of ATR alone, which is what we have done. Adding a sleep-deprivation layer onto both of these analyses may have been interesting, but would have required many more analyses and is not strictly needed to identify relevant sleep-related genes. We have rephrased the misleading sentence about paralyzing flies, to instead clarify that lack of overlap with the SD dataset suggests that optogenetic activation is not preventing sleep functions from being engaged.

      5- A pairwise comparison of ZT01 and ZT10 does not address circadian expression cycles in a meaningful way. There will be strong effects of the LD cycle here. I suggest toning this down. (Though it is gratifying to see the expected changes in the core clock genes.)

      We have changed the language from ‘circadian’ to ‘light-dark’ to address this, although have kept the word ‘circadian’ when referring specifically to genes such as per, clock, timeless, etc.

      6- Line 109: There is a reference missing.

      We now provide the relevant reference.

      Results

      1- General comment regarding the figures: a general effort could be made to improve the design and quality of the figures and make them more readable. There are a lot of issues such as stretched or misaligned text, badly drawn frames, etc.

      We think we know which figures this might relate to (e.g., Figures 3,4B), so we have adjusted where appropriate.

      2- Instead of 'dFB-induced' (e.g., L77) it would be more accurate to use 'optogenetically-induced'

      Thank you for this helpful advice. We have changed our language throughout to say ‘optogenetically-induced’.

      3- Figure S1 should be integrated in the main figure to make the quantification more easily accessible.

      We have integrated Figure S1 into the main figures. It is now Figure 3.

      5- It would be good to include red light controls in Figure 2C, E, G.

      Making Figure S1 a main figure has better highlighted the fact that we have done red light controls (‘baseline’).

      6- line 313: Fig2E-H - these graphs would benefit if the authors made it more obvious where the maximum sleep amount would fall - i.e. the combination of bouts and minutes that add up to 12 hours (and therefore the entire day/night)

      If a fly were to sleep uninterrupted for all 12 hours of a day or night, that would amount to a sleep bout 720 minutes long. We do not feel that identifying this maximum on these graphs would be helpful. It should be clear from the data that a floor is reached with very few sleep bouts exceeding 60 minutes in our paradigm. To help orient the reader though, we now clarify in the figure legend that the maximum is 720 minutes or 12 hours.

      7- Fig. 2B, D: It was not clear why the authors took the 3-day average here. Doesn't that lead to a whole range of very different behaviors? I could, perhaps naively, imagine that a fly's behavior changes after 2 days of almost-permanent sleep?

      We took the 3-day average because the effect of THIP on each successive day was not significantly different (see Author response image 5, below). Flies wake up enough to have a good feed (see Author response image 2) and then go back to sleep. Since this is however an important point raised by the reviewer, we now mention in the Methods that sleep duration was not different among the 3 averaged days and nights (lines 193-195).

      Author response image 5.

      Data from THIP feeding experiment (Figure 2B) in manuscript, separated into 3 successive days and nights, with THIP-fed flies (blue) compared to controls (white). Averages ± SD are shown; sample sizes are the same as in Figure 2D. No THIP data were significantly different across days and nights (ANOVA of means).

      8- In Figure 2C the authors compare optogenetically induced to "spontaneous sleep," which I think refers to baseline sleep before stimulation, according to the figure. I think the proper comparison would be to the red light control (ATR-; though see the comment above regarding optogenetic controls).

      This information was provided in Figure S1. We now provide it as a main Figure 3, as requested above.

      We also made a point about red light having an effect at night, which is why we focussed on daytime effects for our transcriptomic comparisons. We feel that the ATR-fed flies (minus red light) are an appropriate control here for optogenetically-induced sleep: same exact genotype and ATR feeding, just no optogenetic activation. We therefore would prefer to keep these graphs as they are, especially since we show -ATR data subsequently.

      9- Figures 3A and 4A are redundant; Figure 3B has some active ROIs that are outside of the brain. I am not sure how this is possible?

      We have removed the redundant 4A and replaced it with the THIP molecule to clearly signal what this figure is focussed on. In Figure 3B (now 4B), the brain mask is a visual estimate made from the middle of the image stack. Some neurons in other layers are outside this single-layer estimate. All neurons were accounted for.

      10- Figure 4B is confusing. It took me a while to understand and so it can do with re-drawing in a more accessible way.

      We agree that this was confusing, e.g. there were too many arrows. We have redrawn and simplified (Now 5A).

      11- The authors state that flies wake up from THIP-induced sleep on the ball, but in Figure 4D there appears to be fewer samples for flies who have woken up from THIP (3) compared to those observed before THIP administration. Are flies dying?

      None of the flies died. Most flies were removed from imaging to confirm recovery, while 3 were left in our imaging setup to measure brain activity upon recovery. These results are in Figure 5C and now clarified in the Methods.

      12- Fig5C,D: I'm surprised that by far the most significant changes (in terms of log2-FC and p-val) occur in the sleep-deprived flies? It is not clear to me what the authors mean by effects that "relate waking process"? Perhaps they could elaborate on this?

      We have removed the phrase ‘relates to waking processes’. We now also remark on the high level of fold-change in many of these genes but refrain from discussing this further in the results. It is interesting though.

      13- The sentence in L425-428 is unclear - it would be good to rephrase this.

      We have rephrased this sentence; hopefully it is clearer now.

      14- Text in L544-545 is confusing. What do you mean by 'less clear'?

      We have replaced ‘less clear’ with ‘not dominated by a single category’.

      15- It is unclear what is the control in Fig 7A. It would be good to mention what strain was used.

      Different knockout strains had different controls. These are identified in the figure legend and Methods.

      16- L579-581: it would be helpful to include this data in a supplementary figure.

      We now provide this as a supplementary figure as requested (Supplementary Figure 6).

      17- There is no information about R57C10 in the methods - it would be good to explain which neurons this line labels, and why you chose it.

      We now clarify in the methods that R57C10-Gal4 is a pan-neural driver, and provide a reference.

      18- Table S5 - If I'm not mistaken then the first line should say 1h, not 10h.

      Corrected.

    1. Author Response

      We are grateful for the constructive comments of the reviewers and for the succinct assessment of our work by the editors. Here we provide a brief summary of our response to answer the major criticism of our reviewers. We will give a detailed point-to-point response soon when we upload a revision of our paper.

      1) The MATLAB code for the spatial autocorrelation analysis is now freely available at the following site: https://github.com/dcsabaCD225/Moran_Matlab/blob/main/moran_local.m. If any questions arise during its implementation, please contact Csaba Dávid (david.csaba@koki.hu).

      2) Concerning the computer resources and times required to perform Moran’s I image analysis, here we provide a brief description of the hardware and the calculations for images with different sizes.

      Hardware used for performing the analysis:

      Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 2594 MHz, 4 cores, 64GB RAM, NVIDIA GeForce GTX 1080 graphics card.

      MATLAB R2021b software was used for implementation.

      Computation times are shown in Author response table 1.

      Author response table 1.

      3) In response to the comment:

      “While the method's avoidance of AI training appeals to those lacking computational know-how and shows improved accuracy over basic threshold-based techniques, there are valid concerns regarding its performance in comparison to advanced methodologies”.

      Comparison of Moran’s I image analysis with AI-based segmentations raises conceptual problems which will be addressed in detail in the revised version. Briefly, the basis of AI-based analyses is that the ground truth is known and, using a large teaching set, AI learns to extract the relevant information for image segmentation. In several cases, however (like protein distribution in the membrane), the ground truth is not known and cannot be easily determined by any single observer. Defining spatial inhomogeneities in protein distribution, and differentiating proteins involved vs not involved in clusters, is highly subjective. Indeed, our analysis showed that the 23 expert human observers varied hugely in establishing the boundaries of a protein cluster. As a consequence, establishing and using a teaching set would be highly contentious in these cases. In an average laboratory setting, generating a teaching set using hundreds of images examined by two dozen people would not be impossible, but it is not really practical. The beauty of Moran’s I analysis is that it is able to extract the relevant signals from an image generated in different, often noisy conditions using a simple algorithm that allows quantitative characterization and identification of changes in many biological and non-biological samples.
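      To make concrete how simple the underlying statistic is, the sketch below computes the textbook global and local Moran’s I for a 2D image in Python with NumPy. It is not a port of the MATLAB code linked above: the function names, the binary 8-pixel neighbourhood, and the treatment of image borders are all assumptions chosen for illustration, and the published implementation may differ in each of them.

```python
import numpy as np


def neighbour_sum(values):
    """Sum over the 8 surrounding pixels of each pixel.

    Pixels beyond the image border contribute nothing (assumed choice).
    """
    height, width = values.shape
    padded = np.pad(values, 1, mode="constant", constant_values=0.0)
    total = np.zeros((height, width), dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # a pixel is not its own neighbour
            total += padded[1 + dy:1 + dy + height, 1 + dx:1 + dx + width]
    return total


def morans_i(image):
    """Return (global Moran's I, local Moran's I map) for a 2D image.

    Uses binary weights: w_ij = 1 if pixels i and j touch (including
    diagonally), else 0.
    """
    x = np.asarray(image, dtype=float)
    z = x - x.mean()                 # deviation from the image mean
    m2 = (z ** 2).mean()             # variance of pixel intensities
    if m2 == 0:
        raise ValueError("Moran's I is undefined for a constant image")
    lag = neighbour_sum(z)           # sum_j w_ij * z_j
    local = z * lag / m2             # local statistic I_i
    total_weight = neighbour_sum(np.ones_like(x)).sum()  # sum of all w_ij
    global_i = local.sum() / total_weight
    return global_i, local


# Toy image: a dark left half and a bright right half (clustered signal).
toy = np.zeros((8, 8))
toy[:, 4:] = 1.0
global_value, local_map = morans_i(toy)
print("global Moran's I:", round(global_value, 3))
```

      In this formulation, positive local values mark pixels whose neighbours deviate from the mean in the same direction, which is the property a cluster-detection step can build on; how the local map is thresholded is a separate choice not shown here.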

    1. Author Response

      Reviewer #2 (Public Review):

      The authors describe the synthesis and testing of the anti-cancer activity of a new molecule CK21 against pancreatic cancer mouse models. This part of the study is very strong showing regression of pancreatic tumors at non-toxic concentrations, which is very hard to achieve for practically incurable pancreatic cancer. Authors synthesized CK21 as an analog of a known inhibitor of RNA synthesis which is very toxic. The authors made very little attempt to understand whether the mechanism of anti-cancer efficacy of CK21 is similar to this known inhibitor of transcription or not. One cannot compare gene expression profiles between untreated and CK21-treated cells, taking into account that CK21 may inhibit the expression of all genes. The effect of CK21 on general transcription needs to be tested first, and then based on this data absolute changes in the expression of genes may be considered for the revealing of the mechanism of activity of CK21.

      We also appreciated the toxicity concerns; thus, we designed the transcriptomic analysis on the human organoid cultured cells for early time points of 3, 6, 9 and 12 h, and with a CK21 concentration of 50nM, to ensure that at the time of harvest, the cells were ~100% viable. At these time points, many genes were upregulated but defined by IPA as enriched for cell death (apoptosis and necrosis), senescence and cell cycle arrest (Fig 5). This led us to hypothesize that the direct effect of CK21 on the tumor cells is the induction of apoptosis, but via multiple pathways.

      Reviewer #3 (Public Review):

      This manuscript describes CK21, a modified version of Triptolide, a natural compound with anticancer activities, to improve its bioavailability. The authors tested the compound in two human pancreatic cancer cell lines, in vitro and in vivo. The authors also use two human organoid lines derived from pancreatic cancer, and mouse KC and KPC cell lines. In all models, CK21 treatment induces dose-dependent cytotoxicity. In vivo, CK21 causes tumor regression. The authors perform gene expression analysis and show that treated organoids have generally lower transcription, consistent with cytotoxicity, and a reduction in the NFkB pathway activation.

      Key experiments that would strengthen the current manuscript are: the inclusion of normal cell lines and organoids to, presumably, show no cytotoxic effect. If that is the case, the authors would have the opportunity to compare responses and determine whether a tumor-specific mechanism can be defined.

      Our in vivo studies suggest that CK21 is more specific to tumors, as mice treated with CK21 at ≤3 mg/kg were 100% viable and gained weight comparably to the no-treatment group (Fig. 2d). Furthermore, in vitro studies with primary fibroblast cells indicate that comparable significant toxicity to CK21 after 72 h of culture was observed at 500 nM (Fig. S2). In contrast, CK21 induced significant toxicity in AsPC1 and Panc-1 cells at 50 nM (Fig. 1f).

      The authors observe that few gene changes, aside from an overall lowering in transcription, occur upon treatment with CK21. They suggest that the drug acts through inhibition of the NFkB pathway and an increase in reactive oxygen species (ROS). However, there are no experiments to test whether either or both of these findings explain the cytotoxic effect (rescue experiments would be particularly valuable).

      We performed a rescue study using an ROS inhibitor (acetylcysteine) but observed no significant effect (data not shown). We speculate that ROS and/or NF-κB might function synergistically; additionally, it is possible that other mechanisms might be involved in the anti-tumor effects of CK21.

      In the last figure, the authors test whether CK21 is immunosuppressive by testing immunity against a mis-matched tumor cell line (using KPC tumors, mixed strain, in mixed strain mice). The immunity against HLA mis-matched cells is a very strong immune reaction, and mild immune suppression might be missed, which diminishes the value of these findings.

      KPC-960 tumor cells were derived from KPC (C57BL/6 background); therefore, KPC-960 tumors were HLA matched with host C57BL/6 mice. We were surprised to observe spontaneous rejection of the KPC-960 tumor line, since this contrasts with Torres et al. 2013. We speculate that this could be due to the increased number of passages resulting in antigenic drift, which may result in the accumulation of mutations that induce spontaneous rejection.

      We agree that there might be mild immunosuppression that we did not detect; we have included this caveat in the discussion. KC-6141 tumor cells used as CTL targets were from KC mice (mixed background – B6.129).

    1. Author Response

      Reviewer #1:

      This is a very timely paper that addresses an important and difficult-to-address question in the decision-making field - the degree to which information leakage can be strategically adapted to optimise decisions in a task-dependent fashion. The authors apply a sophisticated suite of analyses that are appropriate and yield a range of very interesting observations. The paper centres on analyses of one possible model that hinges on certain assumptions about the nature of the decision process for this task which raises questions about whether leak adjustments are the only possible explanation for the current data. I think the conclusions would be greatly strengthened if they were supported by the application and/or simulation of alternative model structures.

      We thank the reviewer for this positive appraisal of our study. We now entirely agree with their central comment about whether leak adjustments are the only (or even the best) explanation for the current data. We hope that the additional modelling sections that we have discussed in response to main comment 1 above have strengthened the paper. We have responded point-by-point to their public review, as this contained their main recommendations for revision.

      The behavioural trends when comparing blocks with frequent versus rare response periods seem difficult to tally with a change in the leak. […] Are there other models that could reproduce such effects? For example, could a model in which the drift rate varies between Rare and Frequent trials do a similar or better job of explaining the data?

      We can see why the reviewer has advocated for a possible change of drift rate (or ‘gain’ applied to sensory evidence) between conditions to explain our behavioural findings. We found, however, that changes in drift rate could elicit qualitatively similar changes in integration kernels to changes in decision threshold:

      Author response image 1.

      Changes in gain applied to incoming sensory evidence (A parameter in model) have similar effects on recovered integration kernels from Ornstein-Uhlenbeck simulation as changes in decision threshold.

      The likely reason for this is that the overall probability of emitting a response at any point in the continuous decision process is determined by the ratio of accumulated evidence to decision threshold. A similar logic applies to effects on reaction times and detection probability (main figure 2): increasing sensory gain/decreasing decision threshold will lead to faster reaction times and increased detection probability during response periods.

      Both parameters may even have a similar effect on ‘false alarms’, because (as the reviewer notes below) false alarms in our paradigm are primarily being driven by the occurrence of stimulus changes as well as internal noise. In fact, the false alarm findings mean it is difficult to fully reconcile all of our behavioural findings in terms of changes in a single set of model parameters in the O-U process. It is possible that other changes not considered within our model (such as expectations of hazard rates of inter-response intervals leading to dynamic thresholds etc.) may have had a strong impact upon the resulting false alarm rates. A full exploration of different variations in O-U model (with varying urgency signals, hazard rates, etc.) is beyond the scope of this paper.

      For this reason, we have decided in our new modelling section to focus primarily on a single, well-established model (the O-U process) and explore how changes in leak and threshold affect task performance and the resulting integration kernels. We note that this is in line with the suggestion of reviewer #2, who focussed on similar behavioural findings to reviewer #1 but suggested that we look at decision threshold rather than drift rate as our primary focus.
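
      The ratio argument above can be illustrated with a minimal sketch of a leaky accumulator. This is not the model code used for the paper: the function name `simulate_ou`, the white-noise stimulus, the reset-to-zero rule after each response and all parameter values are assumptions chosen for illustration, and internal noise is deliberately omitted so that the accumulated evidence scales exactly with the gain.

```python
import numpy as np

def simulate_ou(stimulus, gain=1.0, leak=0.1, threshold=0.05, dt=0.01):
    """Leaky (O-U style) accumulation without internal noise (an assumption).

    Returns the frame indices at which |x| reaches the threshold.
    The accumulator is reset to zero after each response (also an assumption).
    """
    x = 0.0
    responses = []
    for i, s in enumerate(stimulus):
        # Euler step of dx/dt = -leak * x + gain * s
        x += dt * (-leak * x + gain * s)
        if abs(x) >= threshold:
            responses.append(i)
            x = 0.0
    return responses

rng = np.random.default_rng(0)
stimulus = rng.normal(0.0, 0.5, size=20000)  # hypothetical noisy evidence stream

base = simulate_ou(stimulus, gain=1.0, threshold=0.05)
# Doubling gain and threshold together leaves the evidence/threshold ratio unchanged.
scaled = simulate_ou(stimulus, gain=2.0, threshold=2 * 0.05)
# Raising the threshold alone makes responses rarer.
high_threshold = simulate_ou(stimulus, gain=1.0, threshold=2 * 0.05)
```

      In this noise-free sketch, `base` and `scaled` contain the same response frames, because only the ratio of gain to threshold matters. With additive internal noise the equivalence would hold only approximately.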

      This ties in to a related query about the nature of the task employed by the authors. Due to the very significant volatility of the stimulus, it seems likely that the participants are not solely making judgments about the presence/absence of coherent motion but also making judgments about its duration (because strong coherent motion frequently occurs in the inter-target intervals). If that is so, then could the Rare condition equate to less evidence because there is an increased probability that an extended period of coherent motion could be an outlier generated from the noise distribution? Note that a drift rate reduction would also be expected to result in fewer hits and slower reaction times, as observed.

      As mentioned above, the rare and frequent targets are indeed matched in terms of the ease with which they can be distinguished from the intervening noise intervals. To confirm this, we directly calculated the variance (across frames) of the motion coherence presented during baseline periods and response periods (until response) in all four conditions:

      Author response image 2.

      The average empirical standard deviation of the stimulus stream presented during each baseline period (‘baseline’) and response period (‘trial’), separated by each of the four conditions (F = frequent response periods, R = rare, L = long response periods, S = short). Data were averaged across all response/baseline periods within the stimuli presented to each participant (each dot = 1 participant). Note that the standard deviation shown here is the standard deviation of motion coherence across frames of sensory evidence. This is smaller than the standard deviation of the generative distribution of ‘step’-changes in the motion coherence (std = 0.5 for baseline and 0.3 for response periods), because motion coherence remains constant for a period after each ‘step’ occurs.
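
      The final point of this caption can be checked with a small simulation. The sketch below is illustrative only: the helper `piecewise_constant_stream`, the geometric hold durations, the 300-frame window and the mean hold of 30 frames are assumptions rather than the task's actual parameters, and any clipping of coherence values is not modelled.

```python
import numpy as np

def piecewise_constant_stream(n_frames, step_std, mean_hold, rng):
    """Stream that steps to a new Gaussian value and then holds it for a while."""
    stream = np.empty(n_frames)
    i = 0
    while i < n_frames:
        hold = rng.geometric(1.0 / mean_hold)  # frames until the next step (>= 1)
        stream[i:i + hold] = rng.normal(0.0, step_std)
        i += hold
    return stream

rng = np.random.default_rng(1)
generative_std = 0.5  # s.d. of the 'step' distribution, as quoted for baseline periods

# Across-frame s.d. within each simulated period, averaged over many periods.
period_stds = [
    piecewise_constant_stream(300, generative_std, 30, rng).std()
    for _ in range(500)
]
mean_empirical_std = float(np.mean(period_stds))
```

      Because each period contains only a handful of distinct held values, `mean_empirical_std` falls below `generative_std` under these assumptions, in line with the explanation given in the caption.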

      Some adjustment of the language used when discussing FAs seems merited. If I have understood correctly, the sensory samples encountered by the participants during the inter-response intervals can at times favour a particular alternative just as strongly (or more strongly) than that encountered during the response interval itself. In that sense, the responses are not necessarily real false alarms because the physical evidence itself does not distinguish the target from the non-target. I don't think this invalidates the authors' approach but I think it should be acknowledged and considered in light of the comment above regarding the nature of the decision process employed on this task.

      This is a good point. We hope that the reviewer will allow us to keep the term ‘false alarms’ in the paper, as it does conveniently distinguish responses during baseline periods from those during response periods, but we have sought to clarify the point that the reviewer makes when we first introduce the term.

      “Indeed, participants would occasionally make ‘false alarms’ during baseline periods in which the structure of the preceding noise stream mistakenly convinced them they were in a response period (see Figure 4, below). Indeed, this means that a ‘false alarm’ in our paradigm has a slightly different meaning than in most psychophysics experiments; rather than it referring to participants responding when a stimulus was not present, we use the term to refer to participants responding when there was no shift in the mean signal from baseline.”

      And:

      “The fact that evidence integration kernels naturally arise from false alarms, in the same manner as from correct responses, demonstrates that false alarms were not due to motor noise or other spurious causes. Instead, false alarms were driven by participants treating noise fluctuations during baseline periods as sensory evidence to be integrated across time, and the physical evidence preceding ‘false alarms’ need not even distinguish targets from non-targets.”

      The authors report that preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods. It is not clear what identifies this signal as reflecting motor preparation. Did the authors consider using other effector-selective EEG signatures of motor preparation such as beta-band activity which has been used elsewhere to make inferences about decision bounds? Assuming that this central ERP signal does reflect the decision bounds, the observation that it has a larger amplitude at the response on Rare trials appears to directly contradict the kernel analyses which suggest no difference in the cumulative evidence required to trigger commitment.

      Thanks for this comment. First, we should simply comment that this finding emerged from an agnostic time-domain analysis of the data time-locked to button presses, in which we simply observed that the negative-going potential was greater (more negative) in RARE vs. FREQUENT trials. So it is simply the fact that it precedes each button press that we relate it to motor preparation; nonetheless, we note that Kelly and O’Connell (2013) found similar negative-going potentials at central sensors without applying CSD transform (as in this study). Like them, we would relate this potential to either the well-established Bereitschaftspotential or the contingent negative variation (CNV).

      We agree that many other studies have focussed on beta-band activity as another measure of motor preparation, and to make inferences about decision bounds. To investigate this, we used a Morlet wavelet transform to examine the time-varying power estimate at a central frequency of 20Hz (wavelet factor 7). We repeated the convolutional GLM analysis on this time-varying power estimate.
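
      For readers unfamiliar with this step, a minimal sketch of a Morlet-based power estimate is given below. It is not the analysis code used here: the function `morlet_power`, the reading of "wavelet factor 7" as seven cycles, the unit-energy normalisation, the 250 Hz sampling rate and the synthetic signal are all assumptions made for illustration.

```python
import numpy as np

def morlet_power(signal, sfreq, freq=20.0, n_cycles=7.0):
    """Time-varying power at `freq` via convolution with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)  # temporal s.d. of the Gaussian envelope
    t = np.arange(-5 * sigma_t, 5 * sigma_t, 1.0 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit-energy normalisation
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.abs(analytic) ** 2

# Synthetic example: a 20 Hz oscillation whose amplitude drops after 2 s,
# mimicking a desynchronisation.
sfreq = 250.0
t = np.arange(0, 4.0, 1.0 / sfreq)
amplitude = np.where(t < 2.0, 1.0, 0.2)
signal = amplitude * np.sin(2 * np.pi * 20.0 * t)

power = morlet_power(signal, sfreq)
early = power[(t > 0.5) & (t < 1.5)].mean()
late = power[(t > 2.5) & (t < 3.5)].mean()
```

      In this toy signal, `early` exceeds `late`, which is the signature a desynchronisation would leave in the power estimate. A power time course of this kind could then be entered into a regression in place of the time-domain signal.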

      We first examined average beta desynchronisation at a central cluster of electrodes (CPz, CP1, CP2, C1, Cz, C2) in the run-up to correct button presses during response periods. We found that a reliable beta desynchronisation occurred, and, just as in the time-domain signal, this reached a greater threshold in the RARE trials than in the FREQUENT trials:

      Author response image 3.

      Beta desynchronisation prior to a correct response is greater over central electrodes in the RARE condition than in the FREQUENT condition.

      We agree with the reviewer that this is likely indicative of a change in decision threshold between rare and frequent trials. We also note that our new computational modelling of the O-U process suggests that this in fact reconciles well with the behavioural findings (changes in integration kernels). We now mention this at the relevant point in the results section:

      “As large changes in mean evidence are less frequent in the RARE condition, the increased neural response to |Δevidence| may reflect the increased statistical surprise associated with the same magnitude of change in evidence in this condition. In addition, when making a correct response, preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods (Figure 7b; p=0.041, cluster-based permutation test). We found similar effects in beta-band desynchronisation prior, averaged over the same electrodes; beta desynchronisation was greater in RARE than FREQUENT response periods. As discussed in the computational modelling section above, this is consistent with the changes in integration kernels between these conditions as it may reflect a change in decision threshold (figure 2d, 3c/d). It is also consistent with the lower detection rates and slower reaction times when response periods are RARE (figure 2 b/c).”

      We did also investigate the lateralised response (left minus right beta-desynchronisation, contrasted on left minus right responses). We found, however, that we were simply unable to detect a reliable lateralised signal in either condition using these lateralised responses. We suspect that this is because we have far fewer response periods than conventional trial-based EEG experiments of decision making, and so we did not have sufficient SNR to reliably detect this signal. This is consistent with standard findings in the literature, which report that the magnitude of the lateralised signal is far smaller than the magnitude of the overall beta desynchronisation (e.g. Doyle et al., 2005).

      P11, the "absolute sensory evidence" regressor elicited a triphasic potential over centroparietal electrodes. The first two phases of this component look to have an occipital focus. The third phase has a more centroparietal focus but appears markedly more posterior than the change in evidence component. This raises the question of whether it is safe to assume that they reflect the same process.

      We agree. We have now referred to this as a ‘triphasic component over occipito-parietal cortex’ rather than centroparietal electrodes.

      Reviewer #2:

      Overall, the authors use a clever experimental design and approach to tackle an important set of questions in the field of decision-making. The manuscript is easy to follow with clear writing. The analyses are well thought-out and generally appropriate for the questions at hand. From these analyses, the authors have a number of intriguing results. So, there is considerable potential and merit in this work. That said, I have a number of important questions and concerns that largely revolve around putting all the pieces together. I describe these below.

      Thanks to the reviewer for their positive appraisal of the manuscript; we are obviously pleased that they found our work to have considerable potential and merit. We seek to address the main comments from their public review and recommendations below.

      1) It is unclear to what extent the decision threshold is changing between subjects and conditions, how that might affect the empirical integration kernel, and how well these two factors can together explain the overall changes in behavior.

      I would expect that less decay in RARE would have led to more false alarms, higher detection rates, and faster RTs unless the decision threshold also increased (or there was some other additional change to the decision process). The CPP for motor preparatory activity reported in Fig. 5 is also potentially consistent with a change in the decision threshold between RARE and FREQUENT. If the decision threshold is changing, how would that affect the empirical integration kernel? These are important questions on their own and also for interpreting the EEG changes.

      This important comment, alongside the comments of reviewer 1 above, made us carefully consider the effects of changes in decision threshold on the evidence integration kernel via simulation. As discussed above (in response to ‘essential revisions for the authors’), we now include an entirely new section on how changes in decision threshold and leak may affect the evidence integration kernel, and be used to optimise performance across the different sensory environments. In particular, we agree with the reviewer that the motor preparatory activity that differs between RARE and FREQUENT is consistent with a change in decision threshold, and our simulations have suggested that our behavioural findings on evidence integration are also consistent with this change as well. These are detailed on pp.1-4 of the rebuttal, above.

      2) The authors find an interesting difference in the CPP for the FREQUENT vs RARE conditions where they also show differences in the decay time constant from the empirical integration kernel. As mentioned above, I'm wondering what else may be different between these conditions. Do the authors have any leverage in addressing whether the decision threshold differs? What about other factors that could be important for explaining the CPP difference between conditions? Big picture, the change in CPP becomes increasingly interesting the more tightly it can be tied to a particular change in the decision process.

      We fully agree with the spirit of this comment, and we’ve tried much more carefully to consider what the influences of decision threshold and leak would be on our behavioural analyses. As discussed in the response to reviewer 1, we think that the negative-going potential at the time of responses (which is greater in RARE vs. FREQUENT, main figure 7b, and mirrored by equivalent changes in beta desynchronisation, see Reviewer Response Figure 5 above) are both reflective of a change in decision threshold between RARE and FREQUENT conditions. We have tried to make this link explicit in the revised results section:

      “As large changes in mean evidence are less frequent in the RARE condition, the increased neural response to |Δevidence| may reflect the increased statistical surprise associated with the same magnitude of change in evidence in this condition. In addition, when making a correct response, preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods (Figure 7b; p=0.041, cluster-based permutation test). We found similar effects in beta-band desynchronisation prior, averaged over the same electrodes; beta desynchronisation was greater in RARE than FREQUENT response periods. As discussed in the computational modelling section above, this is consistent with the changes in integration kernels between these conditions as it may reflect a change in decision threshold (figure 2d, 3c/d). It is also consistent with the lower detection rates and slower reaction times when response periods are RARE (figure 2 b/c).”

      I'll note that I'm also somewhat skeptical of the statements by the authors that large shifts in evidence are less frequent in the RARE compared to FREQUENT conditions (despite the names) - a central part of their interpretation of the associated CPP change. The FREQUENT condition obviously has more frequent deviations from the baseline, but this is countered to some extent by the experimental design that has reduced the standard deviation of the coherence for these response periods. I think a calculation of overall across-time standard deviation of motion coherence between the RARE and FREQUENT conditions is needed to support these statements, and I couldn't find that calculation reported. The authors could easily do this, so I encourage them to check and report it.

      See Author response image 2.

      3) The wide range of decay time constants between subjects and the correlation of this with another component of the CPP is also interesting. However, in trying to interpret this change in CPP, I'm wondering what else might be changing in the inter-subject behavior. For instance, it looks like there could be up to 4 fold changes in false alarm rates. Are there other changes as well? Do these correlate with the CPP? Similar to my point above, the changes in CPP across subjects become increasingly interesting the more tightly it can be tied to a particular difference in subject behavior. So, I would encourage the authors to examine this in more depth.

      Thanks for the interesting suggestion. We explored whether there might be any interindividual correlation in this measure with the false alarm rate across participants, but found that there was no such correlation. (See Author response image 4; plotting conventions are as in main figure 9).

      Author response image 4.

      No evidence of between-subject correlations in CPP responses and false alarm rates, in any of the four conditions.

      We hope instead that the extended discussion of how the integration kernel should be interpreted (in light of computational modelling) provides at least some increased interpretability of the between-subject effects that we report in figure 9.

      Reviewer #3 (Public Review):

      The main strength is in the task design which is novel and provides an interesting approach to studying continuous evidence accumulation. Because of the continuous nature of the task, the authors design new ways to look at behavioral and neural traces of evidence. The reverse-correlation method looking at the average of past coherence signals enables us to characterize the changes in signal leading to a decision bound and its neural correlate. By varying the frequency and length of the so-called response period, that the participants have to identify, the method potentially offers rich opportunities to the wider community to look at various aspects of decision-making under sensory uncertainty.

      We are pleased that the reviewer agrees with our general approach as a novel way of characterising various aspects of decision-making under uncertainty.

      The main weaknesses that I see lie within the description and rigor of the method. The authors refer multiple times to the time constant of the exponential fit to the signal before the decision but do not provide a rigorous method for its calculation and neither a description of the goodness of the fit. The variable names seem to change throughout the text which makes the argumentation confusing to the reader. The figure captions are incomplete and lack clarity.

      We apologise that some of our original submission was difficult to follow in places, and we are very grateful to the reviewer for their thorough suggestions for how this could be improved. We address these in turn below, and we hope that this answers their questions, and has also led to a significant improvement in the description and rigour of the methodology.

    1. Author Response

      Reviewer #3 (Public Review):

      Dysbiosis has a substantial impact on host physiology. Using the nematode C. elegans and E.coli as a model of host-microbe interactions, Yang et al. defined a mechanism by which the host deals with gut dysbiosis to maintain fitness. They found that accumulation of E. coli in the intestine secreted indole, a tryptophan metabolite, and activated the transcription factor DAF-16. DAF-16 induced the expression of lys-7 and lys-8, which in turn limited E. coli proliferation in the gut of worms and maintained the longevity of worms. Finally, these authors demonstrated that indole-activated DAF-16 via TRPA-1 in neurons of worms.

      This study revealed a new mechanism of host-microbe interaction. The concept of their work is of broad interest and the results they present are convincing. However, there are some issues that need to be addressed to support the conclusions.

      Major issues

      1) The authors isolated the crude extract from a high-performance liquid chromatograph (HPLC). A candidate compound was detected by activity-guided isolation and further identified as indole with mass spectrometry and NMR data. The HPLC fractionations and activity-guided isolation experiments should be described in more detail with a schematic figure to reveal how these experiments were performed and how indole was identified. Showing a chemical characterization of indole in Figure 2A is not sufficient for the evaluation of the results. Rather, a figure comparing the fraction 26th with standard indole by MS and NMR is more appealing.

      We appreciate the concerns of the reviewer. Activity-guided isolation was performed as follows: The crude extract of E. coli supernatant metabolites was divided into 45 fractions according to polarity using Ultimate 3000 HPLC (Thermofisher, Waltham, MA) coupled with automated fraction collector. After freeze-drying each fraction, 1 mg of metabolites were dissolved in DMSO for DAF-16 nuclear localization assay in worms (Please see new Supplementary Table S2). The 26th fraction with DAF-16 nuclear translocation-inducing activity was then separated on silica gel column (200-300 mesh) with a continuous gradient of decreasing polarity (100%, 70%, 50%, 30%, petroleum ether/acetone) to yield four fractions (26a-d). Only the fraction of 26b could induce DAF-16 nuclear translocation. Then the fraction was further separated using a Sephadex LH-20 column to yield 32 fractions. The 26b-11th fraction with DAF-16 nuclear translocation-inducing activity contained a single compound identified by thin layer chromatography, mass spectrometry and nuclear magnetic resonance (NMR). The compound exhibited a quasimolecular ion peak at m/z 181.0782 [M+H]+ in the positive APCI-MS, and was assigned to a molecular formula of C8H7N. A comparison of these 1H NMR and 13C NMR spectra with the data reported in the literature revealed that the compound was indole (Yagudaev, 1986). The figure shows the comparison of the 26b-11 fraction with the standard indole by MS (Author response image 1).

      Author response image 1.

      High resolution mass spectrum of the candidate compound and indole.

      2) DAF-16::GFP was mainly located in the cytoplasm of the intestine in worms expressing daf-16p::daf-16::gfp fed live E. coli OP50 on Day 1 (Figure 1A and 1B). The nuclear translocation of DAF-16 in the intestine was increased in worms fed live E. coli OP50 on Days 4 and 7, but not in age-matched WT worms fed heat-killed (HK) E. coli OP50 (Figure 1A and 1B). Since DAF-16 functions downstream of DAF-2, have the levels of DAF-2 been tested during aging on OP50 and (HK) OP50, or with and without indole supplementation?

      In response to the reviewer’s suggestion, we carried out an RT-PCR experiment in 4-day-old and 7-day-old worms. It has been shown that DAF-2 initiates a kinase cascade that leads to the phosphorylation and cytoplasmic retention of DAF-16. By contrast, a reduction in DAF-2 signaling leads to the dephosphorylation of DAF-16, allowing its nuclear translocation. We therefore tested the expression of daf-2 in 4-day-old and 7-day-old worms fed with OP50 and (HK) OP50. We found that the mRNA levels of daf-2 were significantly increased in worms on days 4 and 7 in the presence of either live or dead E. coli OP50, compared with those in worms on day 1 (Author response image 2A). In addition, supplementation with indole did not alter the mRNA levels of daf-2 in young adult worms (Author response image 2B). To conclude, the activation of DAF-16 is independent of DAF-2.

      Author response image 2.

      DAF-16 nuclear translocation is independent of DAF-2. (A) The mRNA levels of daf-2 were gradually increased in worms with age. P < 0.01; *P < 0.001; ns, not significant. (B) The mRNA levels of daf-2 were not altered after treatment with indole for 24 hours. ns, not significant.

      3) In lines 155-157, the author argued that the increase in the levels of indole in worms results from the intestinal accumulation of live E. coli OP50, rather than exogenous indole produced by E. coli OP50 on the NGM plates. However, the work also showed that supplementation with indole (50-200 μM) could significantly increase the indole levels in young adult worms on Day 1 (Figure 2-figure supplement 3B), which could induce nuclear translocation of DAF-16 in worms (Figure 2B). This result suggested that worms could take in indole from outside culturing environment. The concentration of indole in OP50 and (HK) OP50 could be measured.

      We appreciate the concerns of the reviewer. Reviewer #2 also pointed out this problem. In this study, our data showed that the levels of indole were 30.9, 71.9, and 105.9 nmol/g dry weight in worms fed live E. coli OP50 on days 1, 4, and 7, respectively (Figure 2C). This increase in the levels of indole in worms was accompanied by an increase in CFU of live E. coli OP50 in the intestine of worms with age (Figure 2C). In addition, we determined the levels of indole in worms fed HK E. coli OP50, and found that the levels of indole were 28.2, 31.6, and 36.1 nmol/g dry weight in worms fed HK E. coli OP50 on days 1, 4, and 7, respectively (Figure 2-figure supplement 3A). It should be noted that the levels of indole in worms fed dead E. coli OP50 on day 1 were comparable to those in worms fed live E. coli OP50 on day 1 (28.2 vs 30.9 nmol/g dry weight). However, the levels of indole were not increased in worms fed HK E. coli OP50 on days 4 and 7. Furthermore, the observation that DAF-16 was retained in the cytoplasm of the intestine in worms fed live E. coli OP50 on day 1 (Figure 1A and 1B) also indicated that indole produced by E. coli OP50 on the NGM plates is not enough to induce DAF-16 nuclear translocation. By contrast, supplementation with indole (50-200 μM) significantly increased the indole levels in worms on day 1 (Figure 2-figure supplement 3B), which could induce nuclear translocation of DAF-16 in worms (Figure 2B). Thus, the increase in the levels of indole in worms with age results from intestinal accumulation of live E. coli OP50, rather than indole produced by E. coli OP50 on the NGM plates.

      4) Recent work showed that the multicopy DAF-16 transgene acts differently from the single copy GFP knock in DAF-16 transgene. Which DAF-16 transgene was used in this work?

      The strain we used is TJ356. Its genotype has been described as zIs356 [daf-16p::daf-16a/b::GFP+rol-6(su1006)] (Lee, Hench, & Ruvkun, 2001; Lin, Hsin, Libina, & Kenyon, 2001), from the Caenorhabditis Genetics Center (CGC).

      5) In lines 190-193, the author argued that the supplementation with indole (100 μM) inhibited the CFU of E. coli K-12 in WT worms, but not daf-16(mu86) mutants, on Days 4 and 7 (Figure 3H and 3I). These results suggest that endogenous indole is involved in maintaining a normal lifespan in worms. This is overstating. The data here more likely suggest that indole could inhibit the proliferation of E. coli through DAF-16.

      We really appreciate this reviewer’s precision. In response to the reviewer’s suggestion, we have changed "...indole is involved in maintaining a normal lifespan in worms" to "...indole produced by bacteria in the gut could inhibit the proliferation of E. coli via DAF-16 in worms".

      6) Sonowal (2017) reported that AHR mediates indole-promoted lifespan extension at 16 °C. Yet this work argued that RNAi knockdown of ahr-1 did not affect the nuclear translocation of DAF-16 in worms fed E. coli K12 strain on Day 7 (Figure 4-figure supplement 1A) or young adult worms treated with indole (100 μM) for 24 h. The difference between these two works should be discussed.

      We really appreciate this reviewer’s precision. It has been shown that AHR-1 mediates indole-promoted lifespan extension in worms at 16 °C (Sonowal et al., 2017). However, our data show that AHR-1 is not involved in the indole-induced nuclear translocation of DAF-16 at 20 °C. This suggests that the AHR-1-mediated and TRPA-1-mediated routes to lifespan extension by indole are essentially different. In our study, indole is added to NGM plates when worms reach the young adult stage. In the study by Sonowal et al., indole is supplemented at the L1 larval stage. In addition, the lifespan of C. elegans varies at different temperatures (Xiao et al., 2013). Thus, indole may promote lifespan extension via different mechanisms, depending on exposure time and temperature.

      7) Sonowal (2017) conducted mRNA profiling for worms growing on K12 and K12△tnaA. Is TRPA1 in their de-regulated gene list? Have other de-regulated genes been tested in this work?

      We appreciate the concerns of the reviewer. We found that TRPA-1 is not included in the de-regulated gene list. Sonowal et al. focus on the gene expression profiles in worms from L1 larvae to young adults, whereas we pay attention to gene expression profiles in worms from young adults to aged worms. Thus, we did not test the de-regulated genes in their work.

      8) How does indole activate TRPA1? In the absence of trpa1, what is the concentration of indole in worms? Since TRPA1 is a channel, is there any possibility that TRPA1 is involved in the transport of indole? It is really interesting and surprising that neuronal TRPA-1, but not intestinal TRPA-1, mediates the beneficial effect of indole. How does indole specifically activate TRPA-1 in neurons to preserve the longevity of worms?

      We appreciate the concerns of the reviewer. TRPA1 is a nonselective cation channel permeable to Ca2+, Na+, and K+ (Zygmunt & Hogestatt, 2014). It is unlikely that TRPA1 is capable of transporting heterocyclic organic compounds, such as indole.

      In response to the reviewer’s suggestion, we measured the indole content of trpa-1(ok999) worms. We found that indole levels in trpa-1(ok999) worms were slightly increased on days 4 and 7, compared to those in WT worms (Author response image 3).

      Recently, Ye et al. have demonstrated that indole and indole-3-carboxaldehyde (IAld) are agonists of TRPA1, which is conserved in vertebrates (Ye et al., 2021). Thus, it is most likely that indole acts as an agonist of TRPA-1 in C. elegans by directly binding to TRPA-1. One possibility is that activation of TRPA-1 in neurons by indole could induce a pathway that releases a neurotransmitter, which in turn triggers a signaling pathway that extends the lifespan of worms by activating DAF-16 in a non-cell-autonomous manner. In contrast, the activation of TRPA-1 in the intestine by indole is unable to release such a neurotransmitter. Indeed, TRPA1 induces the release of calcitonin gene-related peptide in perivascular sensory nerves, leading to membrane hyperpolarization and arterial dilation on smooth muscle cells (Talavera et al., 2020). Moreover, the activation of TRPA1 by indole and IAld induces the secretion of the neurotransmitter serotonin in zebrafish (Ye et al., 2021).

      Author response image 3.

      The indole levels in trpa-1 mutants are increased on days 4 and 7, compared with those in WT worms. *P < 0.05.

      9) How was neuronal- and intestinal-specific knockdown of trpa-1 by RNAi conducted? And what is the tissue-specific expression pattern of trpa-1? Speculating how indole was transported to neuron cells is pretty appealing.

      We appreciate the concerns of the reviewer. SID-1 is required cell-autonomously for systemic RNAi (Winston, Molodowitch, & Hunter, 2002). Thus, sid-1 mutants are resistant to RNAi. In the neuronal- and intestinal-specific RNAi strains, sid-1 was expressed under the control of the neuronal-specific unc-119 and the intestinal-specific vha-6 promoters, respectively. Although it has been reported that TRPA-1 is expressed in neurons, muscles, hypodermal cells, and the intestine, Xiao et al. showed that only TRPA-1 expressed in the intestine and neurons contributes to lifespan extension at low temperature (Xiao et al., 2013). The transporter of indole has not been identified. In Arabidopsis, ATP-binding cassette (ABC) transporter G family 37 (ABCG37) has been reported to transport a range of indole derivatives (Ruzicka et al., 2010). However, all fifteen C. elegans ABC transporters share less than 30% sequence identity with ABCG37. Thus, it is currently not possible to determine which one is the transport channel for indole and indole derivatives in C. elegans.

      10) Supplementation with indole only up-regulated the expression of lys-7 and lys-8 in worms subjected to intestinal-specific (Figure 7-figure supplement 2C), but not neuronal-specific, RNAi of trpa-1 (Figure 7-figure supplement 2D). If this is the case, should the addition of indole specifically induce the expression of lys-7p::gfp or lys-8p::gfp in neurons?

      We really appreciate this reviewer’s preciseness. Indeed, lys-7 and lys-8 are expressed in both neurons and the intestine (Author response image 4A and 7B). However, the expression of lys-8p::gfp and lys-7p::gfp in neurons was not altered in worms after treatment with indole or knockdown of trpa-1 by RNAi (Author response image 4C and 4D).

      Author response image 4.

      The expression of LYS-7 and LYS-8 in neurons is not altered after treatment with indole or knockdown of trpa-1 by RNAi. (A and C) Representative images of lys-7p::gfp (A) and lys-8p::gfp (C). Both lys-7 and lys-8 could be expressed in neurons and the intestine. (B and D) Quantification of fluorescent intensity of lys-7p::gfp (B) and lys-8p::gfp (D) in neurons. These results are means ± SD of three independent experiments. ns, not significant.

      11) The authors demonstrated that K-12△tnaA strain had undetectable tnaA mRNA or indole levels. Furthermore, the deletion of tnaA significantly inhibited the nuclear translocation of DAF-16 in worms. However, mutations in E. coli still have non-specific effects as there are several transposon insertions or polar mutations influencing downstream genes. The authors should demonstrate that only disruption of TnaA causes the failure of nuclear translocation of DAF-16.

      In response to the reviewer’s suggestion, we rescued the expression of tnaA in the K-12 △tnaA strain. As expected, the indole level in the supernatant of the K12 △tnaA::tnaA strain cultures was 34.1 μmol/L, which was comparable to that in the K12 strain cultures (42.5 μmol/L) (new Figure 2-figure supplement 4D). In addition, DAF-16 nuclear accumulation was increased in worms grown on the K12 △tnaA::tnaA strain on days 4 and 7 (new Figure 2-figure supplement 4E).

    1. Author Response

      Reviewer #1 (Public Review):

      The study by Akter et al demonstrates that astrocyte-derived L-lactate plays a key role in schema memory formation and promotes mitochondrial biogenesis in the Anterior Cingulate Cortex (ACC).

      The main tool used by the authors is the DREADD technology that allows to pharmacologically activate receptors in a cell-specific manner. In the study, the authors used the DREADD technique to activate appropriately transfected astrocytes, a subtype of muscarinic receptor that is not normally present in cells. This receptor being coupled to a Gi-mediated signal transduction pathway inhibiting cAMP formation, the authors could demonstrate cell-(astrocyte) specific decreases in cAMP levels that result in decreased L-lactate production by astrocytes.

      Behaviorally this pharmacological manipulation results in impairments of schema memory formation and retrieval in the ACC in flavor-place paired associate paradigms. Such impairments are prevented by co-administration of L-lactate.

      The authors also show that activation of Gi signaling resulting in L-lactate decreased release by astrocytes impairs mitochondrial biogenesis in neurons in an L-lactate reversible manner.

      By using MCT 2 inhibitors and an NMDAR antagonist the authors conclude that the molecular mechanisms underlying the observed effects are mediated by L-lactate entering neurons through MCT2 transporters and involve NMDAR.

      Overall, the article's conclusions are warranted by the experimental evidence, but some weak points could be addressed which would make the conclusions even stronger.

      The number of animals in some of the experiments is on the low side (4 to 6).

      In the revised manuscript, we have increased the animal numbers in two key experimental groups (hM4Di-CNO and Control groups) of behavioral experiments. Now the animal numbers in different groups are as follows:

      • 15 rats in hM4Di-CNO group

      o Further divided into two subgroups for probe tests (PT1-4) conducted during flavor-place paired associate training; 8 rats in the hM4Di-CNO (saline) and 7 rats in the hM4Di-CNO (CNO) subgroups receiving I.P. saline or I.P. CNO, respectively, before these PTs.

      • 8 rats in the Control group

      • 7 rats in the Rescue group (hM4Di-CNO+L-lactate)

      • 4 rats in the Control-CNO group. Animal number in this group was not increased as it was apparent from these 4 rats that CNO alone was not impairing the PA learning and memory retrieval in these rats (AAV8-GFAP-mCherry injected). Their result was very similar to the control group. Additionally, in a previous study (Liu et al., 2022), we showed that CNO administration in the rats injected with AAV8-GFAP-mCherry into the hippocampus does not show any impairments in schema.

      Also, in the newly added open field test experiments investigating locomotor activity, as suggested by Reviewer #2, 8 rats were used in each group.

      The use of CIN to inhibit MCT2 is not optimal. Authors may want to decrease MCT2 expression by using antisense oligonucleotides.

      In the revised manuscript, we have conducted the experiment using MCT2 antisense oligodeoxynucleotide (ODN) as suggested.

      To test whether the L-lactate-induced neuronal mitochondrial biogenesis is dependent on MCT2, we bilaterally injected MCT2 antisense oligodeoxynucleotide (MCT2-ODN, n=8 rats, 2 nmol in 1 μl PBS per ACC) or scrambled ODN (SC-ODN, n=8 rats, 2 nmol in 1 μl PBS per ACC) into the ACC. After 11 hours, bilateral infusion of L-lactate (10 nmol, 1 μl) or ACSF (1 μl) was given into the ACC and the rats were kept in the PA event arena. After 60 mins (12 hours from MCT2-ODN or SC-ODN administration), the rats were sacrificed. As shown in Author response image 1B, SC-ODN+L-lactate group showed significantly increased relative mtDNA copy number compared to the SC-ODN+ACSF group (p<0.001, ANOVA followed by Tukey's multiple comparisons test). However, this effect was completely abolished in MCT2-ODN+L-lactate group, suggesting that MCT2 is required for the L-lactate-induced mitochondrial biogenesis in the ACC.

      We have integrated this new data and results in the revised manuscript.

      Author response image 1.

      Mitochondrial biogenesis by L-lactate is dependent on MCT2 and NMDAR. A. Experimental design to investigate whether MCT2 and NMDAR activity are required for L-lactate-induced mitochondrial biogenesis. B and C. mtDNA copy number abundance in the ACC of different rat groups relative to nDNA. Data shown as mean ± SD (n=4 rats in each group). ***p<0.001, ANOVA followed by Tukey's multiple comparisons test.
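      As an illustration of what "mtDNA copy number relative to nDNA" can mean in practice, the following minimal sketch assumes a qPCR readout analysed with the common 2^ΔCt approach. The response does not state which quantification method was used, so the method, the function names, and all Ct values below are hypothetical and serve only to show the arithmetic.

```python
import statistics


def relative_mtdna_copy_number(ct_mtdna, ct_ndna):
    """Relative mtDNA abundance as 2^(Ct_nDNA - Ct_mtDNA).

    Assumption: qPCR with comparable amplification efficiency for the
    mitochondrial and nuclear targets. Some protocols multiply by 2 to
    account for the diploid nuclear genome; that factor cancels when
    values are expressed relative to a control group.
    """
    return 2.0 ** (ct_ndna - ct_mtdna)


def fold_change_vs_control(treated, control):
    """Mean of the treated group divided by the mean of the control group."""
    return statistics.mean(treated) / statistics.mean(control)


# Hypothetical (Ct_mtDNA, Ct_nDNA) pairs, four animals per group
sc_odn_acsf = [(18.0, 26.0), (18.2, 26.1), (17.9, 26.0), (18.1, 26.0)]
sc_odn_lactate = [(17.2, 26.0), (17.4, 26.1), (17.1, 26.0), (17.3, 26.1)]

control = [relative_mtdna_copy_number(m, n) for m, n in sc_odn_acsf]
treated = [relative_mtdna_copy_number(m, n) for m, n in sc_odn_lactate]
fold = fold_change_vs_control(treated, control)
```

      With these made-up values, `fold` is greater than 1, which corresponds to the direction of the effect reported for the SC-ODN+L-lactate group; the group comparison itself would still require the ANOVA and Tukey's test named in the legend.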

      The experiment using AVP to block NMDAR only partially supports the conclusions. Indeed, blocking NMDAR will knock down any response that involves these receptors, whether L-lactate is necessary or not.

      In the current study, we found that astrocytic Gi activation in the ACC reduced the L-lactate level in the ECF of the ACC, which was also associated with decreased PGC-1α/SIRT3/ATPB/mtDNA abundance, suggesting downregulation of the mitochondrial biogenesis pathway. We also found that exogenous administration of L-lactate into the ACC of astrocytic Gi-activated rats rescued this downregulation. In line with this, in a recently published study (Akter et al., 2023), we found upregulation of the mitochondrial biogenesis pathway in the hippocampal neurons of exogenous L-lactate-treated anesthetized rats. Another recent study has demonstrated that exercise-induced L-lactate release from skeletal muscle or I.P. injection of L-lactate can induce hippocampal PGC-1α (which is a master regulator of mitochondrial biogenesis) expression and mitochondrial biogenesis in mice (Park et al., 2021). Together, these results provide compelling evidence that L-lactate promotes mitochondrial biogenesis.

      L-lactate is known to promote expression of synaptic plasticity genes like Arc, c-Fos, and Zif268 in neurons (Yang et al., 2014). After entry into the neuronal cytoplasm, mainly through MCT2, it is converted into pyruvate by lactate dehydrogenase 1 (LDH1). This conversion also produces NADH, affecting the redox state of the neuron. NADH positively modulates the activity of NMDAR resulting in enhanced Ca2+ currents, the activation of intracellular signaling cascades, and the induction of the expression of plasticity-associated genes (Yang et al., 2014; Magistretti & Allaman, 2018). The study demonstrated that L-lactate–induced plasticity gene expression was abolished in the presence of NMDAR antagonists including D-APV (Yang et al., 2014). These results suggested that the MCT2 and NMDAR are key players in the regulation of L-lactate induced plasticity gene expression.

      In the current study, we investigated whether similar mechanisms might be involved in L-lactate-induced neuronal mitochondrial biogenesis. We used MCT2 antisense oligodeoxynucleotide to decrease the expression of MCT2 (as mentioned in the previous response and Author response image 1B) and showed that MCT2 is necessary for L-lactate-induced mitochondrial biogenesis to manifest, indicating that L-lactate’s entry into the neuron is required. As mentioned before, after entry into the neuron, L-lactate is converted into pyruvate by LDH, which also produces NADH, which in turn potentiates NMDAR activity. Therefore, we investigated whether NMDAR activity is required for L-lactate-induced mitochondrial biogenesis. We used D-APV to inhibit NMDAR (Author response image 1C) and found that L-lactate does not increase mtDNA copy number abundance if D-APV is given, suggesting that NMDAR activity is required for L-lactate to promote mitochondrial biogenesis.

      NMDAR serves diverse functions. Therefore, as mentioned by the reviewer, blocking NMDAR may knock down many such functions. While our current data only suggests the involvement of MCT2 and NMDAR in the upregulation of mitochondrial biogenesis by L-lactate, we have not investigated other mechanisms and pathways modulating mitochondrial biogenesis that are either dependent or independent of MCT2 and NMDAR activity. Further studies are needed in future to dissect and better understand this interesting observation. We have now clarified this in the discussion section of the manuscript.

      Is inhibition of glycogenolysis involved in the observed effects mediated by Gi signaling? Indeed, L-lactate is formed both by glycolysis and glycogenolysis. The authors could test whether the glycogen metabolism-inhibiting drug DAB would mimic the effects of Gi activation.

      In this study we have shown that astrocytic Gi activation in the ACC leads to a decrease in the cAMP and L-lactate. L-lactate is produced by glycogenolysis and glycolysis. cAMP in astrocytes acts as a trigger for L-lactate production (Choi et al., 2012; Horvat, Muhič, et al., 2021; Horvat, Zorec, et al., 2021; Zhou et al., 2021) by promoting glycogenolysis and glycolysis (Vardjan et al., 2018; Horvat, Muhič, et al., 2021; Horvat, Zorec, et al., 2021). Therefore, one promising explanation of reduced L-lactate level observed in our study is the reduction of L-lactate production in the astrocyte due to decreased glycogen metabolism as a result of decreased cAMP. We have now mentioned this in the discussion.

      DAB is an inhibitor of glycogen phosphorylase that suppresses L-lactate production. It was shown to impair memory by decreasing L-lactate (Newman et al., 2011; Suzuki et al., 2011; Iqbal et al., 2023). As we found that the impairment in the schema memory and mitochondrial biogenesis was associated with decreased L-lactate level in the ACC and that the exogenous L-lactate administration can rescue the impairments, it is likely that DAB will mimic the effect of Gi activation in terms of schema memory and mitochondrial biogenesis. However, further study is needed to confirm this.  

      Reviewer #2 (Public Review):

      The manuscript of Akter et al is an important study that investigates the role of astrocytic Gi signaling in the anterior cingulate cortex in the modulation of extracellular L-lactate level and consequently impairment in flavor-place associates (PA) learning. However, whereas some of the behavioral observations and signaling mechanism data are compelling, the conclusions about the effect on memory are inadequate as they rely on an experimental design that does not allow to differentiate acute or learning effect from the effect outlasting pharmacological treatments, i.e. effect on memory retention. With the addition of a few experiments, this paper would be of interest to the larger group of researchers interested in neuron-glia interactions during complex behavior.

      • Largely, I agree with the authors' conclusion that activating Gi signaling in astrocytes impairs PA learning, however, the effect on memory retrieval is not that obvious. All behavioral and molecular signaling effects described in this study are obtained with the continuous presence of CNO, therefore it is not possible to exclude the acute effect of Gi pathway activation in astrocytes. What will happen with memory on retrieval test when CNO is omitted selectively during early, middle, or late session blocks of PA learning?

      We have now added 8 more rats to the hM4Di-CNO group (i.e., the group with astrocytic Gi activation) to clarify the effect on memory retrieval. These rats underwent flavor-place paired associate (PA) training similar to the previously described rats (n=7) of this group; that is, they received CNO 30 minutes before and 30 minutes after the PA training sessions (S1-2, S4-8, S10-17). However, in contrast to the previous rats of this group, which received CNO before PTs (PT1, PT2, PT3), we omitted the CNO (and instead administered I.P. saline) selectively on these PTs conducted at the early, middle, and late stages of PA training, as suggested by the reviewer. These newly added rats did not show memory retrieval in these PTs, suggesting that the rats were not learning the PAs from the PA training sessions. See Author response image 2C-E, where this subgroup is denoted as hM4Di-CNO (Saline).

      We then continued more PA training sessions (S21 onwards, Author response image 2B) for these rats without CNO. They gradually learned the PAs. PTs (PT5, PT6, PT7; Author response image 2G-I) were done during this continuation phase of PA training; once without CNO (i.e., with I.P. saline instead), and another one with CNO. As seen in the Author response image 2H and 2I, they retrieved the memory when PT6 and PT7 were done without CNO. However, if these PTs were done with CNO, they could not retrieve the memory. Together these results suggest that ACC astrocytic Gi activation by CNO during PT can impair memory retrieval in rats which have already learned the PAs.

      As shown in the Author response image 2B, we replaced two original PAs with two new PAs (NPA 9 and 10) at S34. This was followed by PT8 (S35). As seen in Author response image 2J, these rats retrieved the NPA memory if the PT is done without CNO. However, they could not retrieve the NPA memory if the PT was done with CNO. This result suggests that ACC astrocytic Gi activation by CNO during PT can impair NPA memory retrieval.

      In summary, these data show that astrocytic Gi activation in the ACC can impair PA memory retrieval. We have integrated this new data and results in the revised manuscript.

      Author response image 2.

      A. PI (mean ± SD) during the acquisition of the six original PAs (OPAs) (S1-2, 4-8, 10-17) and new PAs (NPAs) (S19) of the control (n=8), hM4Di-CNO (n=15), and rescue (hM4Di-CNO+L-lactate) (n=7) groups. From S6 onwards, the hM4Di-CNO group consistently showed lower PI compared to control. However, concurrent L-lactate administration into the ACC (rescue group) can rescue this impairment. B. PI (mean ± SD) of the hM4Di-CNO group (n=8) from S21 onwards showing a gradual increase in PI when CNO was withdrawn. C, D, and E. Non-rewarded PTs (PT1, PT2, and PT3 conducted on S3, S9, and S18, respectively) to test memory retrieval of OPAs for the control, hM4Di-CNO, and rescue groups. The percentage of digging time at the cued location relative to that at the non-cued locations is shown (mean ± SD). In both PT2 and PT3, the control group spent significantly more time digging the cued sand well above the chance level, indicating that the rats learned the OPAs and could retrieve them. In contrast, the hM4Di-CNO group did not spend more time digging the cued sand well above the chance level irrespective of CNO administration before the PTs. The rescue group showed results similar to the hM4Di-CNO group if CNO was given without L-lactate. On the other hand, they showed results similar to the control group if L-lactate was concurrently given with CNO, indicating that this group learned the OPAs and could retrieve them. *p < 0.05, **p < 0.01, ***p < 0.001, one-sample t-test comparing the proportion of digging time at the cued sand well with the chance level of 16.67%. F. Non-rewarded PT4 (S20), which was conducted after replacing two OPAs with two NPAs (NPA 7 & 8) in S19 for the control, hM4Di-CNO, and rescue groups. Results show that the control group spent significantly more time digging the new cued sand well above the chance level, indicating that the rats learned the NPAs from S19 and could retrieve them in this PT. In contrast, the hM4Di-CNO group did not spend more time digging the new cued sand well above the chance level irrespective of CNO administration before the PT. The rescue group showed results similar to the hM4Di-CNO group if CNO was given without L-lactate. On the other hand, they showed results similar to the control group if L-lactate was concurrently given with CNO, indicating that this group learned the NPAs from S19 and could retrieve them. ***p < 0.001, one-sample t-test comparing the proportion of digging time at the new cued sand well with the chance level of 16.67%. G, H, and I. Non-rewarded PTs (PT5, PT6, and PT7 conducted on S23, S27, and S33, respectively) to test memory retrieval of OPAs for the hM4Di-CNO group. In both PT6 and PT7, the rats spent significantly more time digging the cued sand well above the chance level if the tests were done without CNO, indicating that the rats learned the OPAs and could retrieve them. However, CNO prevented memory retrieval during these PTs. ***p < 0.001, one-sample t-test comparing the proportion of digging time at the cued sand well with the chance level of 16.67%. J. Non-rewarded PT8 (S35), which was conducted after replacing two OPAs with two NPAs (NPA 9 & 10) in S34 for the hM4Di-CNO group. Results show that the rats spent significantly more time digging the new cued sand well above the chance level if CNO was not given before the PT, indicating that the rats learned the NPAs from S34 and could retrieve them in this PT. However, if CNO was given before the PT, the retrieval was impaired. ***p < 0.001, one-sample t-test comparing the proportion of digging time at the new cued sand well with the chance level of 16.67%.
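      The probe-test statistic named in the legend, a one-sample t-test against the 16.67% chance level, can be sketched as follows. This is a minimal illustration using only the Python standard library; the chance level is taken here as 100/6 on the assumption that it reflects six sand wells, and the digging-time percentages are hypothetical values, not data from the study.

```python
import math
import statistics

# Assumption: six candidate sand wells, so chance is 100/6 = 16.67 %
CHANCE_LEVEL = 100.0 / 6


def one_sample_t(values, mu):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample SD with n - 1 in the denominator
    standard_error = sd / math.sqrt(n)
    return (mean - mu) / standard_error, n - 1


# Hypothetical percentages of digging time at the cued sand well (n = 8 rats)
digging_pct = [35.0, 42.0, 28.0, 39.0, 31.0, 45.0, 37.0, 33.0]

t_stat, df = one_sample_t(digging_pct, CHANCE_LEVEL)

# Two-sided 5 % critical value of the t distribution for df = 7
T_CRIT_DF7 = 2.365
above_chance = t_stat > T_CRIT_DF7
```

      Here `above_chance` indicates whether the hypothetical group dug at the cued well significantly more than chance would predict; a statistics package would normally be used to obtain the exact p-value rather than a tabulated critical value.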

      • I found it truly exciting that the administration of exogenous L-lactate is capable to rescue CNO-induced PA learning impairment, when co-applied. Would it be possible that this treatment has a sensitivity to a particular stage of learning (acquisition, consolidation, or memory retrieval) when L-lactate administration would be the most efficacious?

      The hM4Di-CNO group, when continued with PA training without CNO (S21-S32) (Author response image 2B), was able to learn the six original PAs (OPAs). In PT7, done at S33 (Author response image 2I), this group of rats was able to retrieve the memory if the test was done without CNO but could not retrieve the memory if CNO was given. Similarly, the Rescue group (hM4Di-CNO+L-lactate) (Author response image 2A), which received both CNO and L-lactate during PA training sessions (S1-S17), was able to learn the OPAs. At PT3, done at S18 (Author response image 2E), these rats were able to retrieve the memory when the test was done with CNO+L-lactate but not when the test was done with only CNO. Together, these results show that ACC astrocytic Gi activation with CNO impairs memory retrieval and that exogenous L-lactate can rescue the impairment. Therefore, it can be concluded that memory retrieval is sensitive to L-lactate.

      The PA learning is hippocampus-dependent. Over the course of repeated PA training, systems consolidation occurs in the ACC, after which the already learned PA memory (schema) becomes hippocampus-independent (Tse et al., 2007; Tse et al., 2011). A higher activation (indicated by expression of c-Fos) in the hippocampus relative to the ACC during the early period of schema development, and the reverse at the late stage was observed in our previous study (Liu et al., 2022). However, rapid assimilation of new PA into the ACC requires simultaneous activation/retrieval of previous schema from ACC and hippocampus dependent new PA learning (Tse et al., 2007; Tse et al., 2011). During new PA learning, increase of c-Fos neurons in both CA1 and ACC was detected (Liu et al., 2022).

      Our hM4Di-CNO group received CNO 30 mins before and after each PA training session in S1-S17 (Author response image 2A). Also, the Rescue group similarly received CNO+L-lactate before and after each PA training session in S1-S17. Therefore, while this study design allowed us to conclude that ACC astrocytic Gi activation impairs PA learning and that exogenous L-lactate can rescue the impairment, it does not allow clear differentiation of the effects of these treatments on memory acquisition and consolidation. Further studies are needed to investigate this.

      • The hypothesis that observed learning impairments could be associated with diminished mitochondrial biogenesis caused by decreased l-lactate in the result of astrocytic Gi-DREADDS stimulation is very appealing, but a few key pieces of evidence are missing. So far, the hypothesis is supported by experiments demonstrating reduced expression of several components of mitochondrial membrane ATP synthase and a decrease in relative mtDNA copy numbers in ACC of rats injected with Gi-DREADDs. L-lactate injections into ACC restored and even further increased the expression of the above-mentioned markers. Co-administration of NMDAR antagonist D-APV or MCT-2 (mostly neuronal) blocker 4-CIN with L-lactate, prevented L-lactate-induced increase in relative mtDNA copy. I am wondering how the interference with mitochondrial biogenesis is affecting neuronal physiology and if it would result in impaired PA learning or schema memory.

      The observation of diminished mitochondrial biogenesis in the astrocytic Gi-activated rats that showed impaired PA learning is exciting. However, our study does not provide experimental data on how mitochondrial biogenesis could be associated with impaired PA learning and schema memory. Results from several previous studies linked mitochondrial biogenesis and its regulators such as PGC-1α and SIRT3 to diverse neuronal and cognitive functions as described in the discussion section of the manuscript. In the revised manuscript, we have provided further discussion as follows to discuss potential mechanisms:

      “In this study, we have demonstrated that ACC astrocytic Gi activation impairs PA learning and schema formation, PA memory retrieval, and NPA learning and retrieval by decreasing L-lactate level in the ACC. Although we have shown that these impairments are associated with diminished expression of proteins of mitochondrial biogenesis, the precise mechanisms of how astrocytic Gi activation affects neuronal functions and schema memory remain to be elucidated. We previously demonstrated that neuronal inhibition in either the hippocampus or the ACC impairs PA learning and schema formation (Hasan et al., 2019). In another recent study (Liu et al., 2022), we showed that astrocytic Gi activation in the CA1 impaired PA training-associated CA1-ACC projecting neuronal activation. Yao et al. recently showed that reduction of astrocytic lactate dehydrogenase A (an enzyme that reversibly catalyzes L-lactate production from pyruvate) in the dorsomedial prefrontal cortex reduces L-lactate levels and neuronal firing frequencies, promoting depressive-like behaviors in mice (Yao et al., 2023). These impairments could be rescued by L-lactate infusion. It is possible that the impairment in PA learning and schema observed in our study might have involved a similar functional consequence of reduced neuronal activity in the ACC neurons upon astrocytic Gi activation.

      Schema consolidation is associated with synaptic plasticity-related gene expression (such as Zif268, Arc) in the ACC (Tse et al., 2011). L-lactate, after entry into neurons, can be converted to pyruvate during which NADH is also produced, promoting synaptic plasticity-related gene expression by potentiating NMDA signaling in neurons (Yang et al., 2014; Margineanu et al., 2018). Furthermore, L-lactate acts as an energy substrate to fuel learning-induced de novo neuronal translation critical for long-term memory (Descalzi et al., 2019). On the other hand, mitochondria play a crucial role in fueling local translation during synaptic plasticity (Rangaraju et al., 2019). Therefore, it could be hypothesized that the rescue of astrocytic Gi activation-mediated impairment of schema by exogenous L-lactate could have been mediated by facilitating synaptic plasticity-related gene expression by directly fueling the protein translation, potentiating NMDA signaling, as well as increasing mitochondrial capacity for ATP production by promoting mitochondrial biogenesis. Furthermore, the potential involvement of HCAR1, a receptor for L-lactate that may regulate neuronal activity (Bozzo et al., 2013; Tang et al., 2014; Herrera-López & Galván, 2018; Abrantes et al., 2019), cannot be excluded. Future research could explore these potential mechanisms, examining the interactions among them, and determining their relative contributions to schema. Our previous study also showed that ACC myelination is necessary for PA learning and schema formation, and that repeated PA training is associated with oligodendrogenesis in the ACC (Hasan et al., 2019). Oligodendrocytes facilitate fast, synchronized, and energy-efficient transfer of information by wrapping axons in a myelin sheath. Furthermore, they supply axons with glycolysis products, such as L-lactate, to offer metabolic support (Fünfschilling et al., 2012; Lee et al., 2012). 
The association of oligodendrogenesis and myelination with schema memory may suggest an adaptive response of oligodendrocytes to enhance metabolic support and neuronal energy efficiency during PA learning. Given the impairments in PA learning observed in the ACC astrocytic Gi-activated rats in the current study, it is reasonable to conclude that the direct metabolic support to axons provided by oligodendrocytes is not sufficient to rescue the schema impairments caused by decreased L-lactate levels upon astrocytic Gi activation. On the other hand, L-lactate was shown to be important for oligodendrogenesis and myelination (Sánchez-Abarca et al., 2001; Rinholm et al., 2011; Ichihara et al., 2017). Therefore, it is tempting to speculate that a decrease in L-lactate level may also impede oligodendrogenesis and myelination, consequently preventing the enhanced axonal support provided by oligodendrocytes and myelin during schema learning. Recently, a study has demonstrated that upon demyelination, mitochondria move from the neuronal cell body to the demyelinated axon (Licht-Mayer et al., 2020). Enhancement of this axonal response of mitochondria to demyelination, by targeting mitochondrial biogenesis and mitochondrial transport from the cell body to axon, protects acutely demyelinated axons from degeneration. Given the connection between schema and increased myelination, it remains an open question whether L-lactate-induced mitochondrial biogenesis plays a beneficial role in schema through a similar mechanism. Nevertheless, our results contribute to the mounting evidence of the glial role in cognitive functions and underscore the new paradigm in which glial cells are considered as integral players in cognitive functions alongside neurons. Disruption of neurons, myelin, or astrocytes in the ACC can disrupt PA learning and schema memory.”

      Reviewer #3 (Public Review):

      Akter et al. investigated how the astroglial Gi signaling pathway in the rat anterior cingulate cortex (ACC) affects cognitive functions, in particular schema memory formation. Using a stereotactic approach, they intracranially introduced AAV8 vectors carrying mCherry-tagged hM4Di DREADD (Designer Receptor Exclusively Activated by Designer Drugs) under the astrocyte-selective GFAP promoter (AAV8-GFAP-hM4Di-mCherry) into the ACC region of the rat brain. hM4Di DREADD is a genetically modified form of the human M4 muscarinic (hM4) receptor that is insensitive to endogenous acetylcholine but is activated by the inert clozapine metabolite clozapine-N-oxide (CNO), triggering the Gi signaling pathway. The authors confirmed that hM4Di DREADD is selectively expressed in astrocytes after the application of the AAV8 vector by analysing the mCherry signals and immunolabeling of astrocytes and neurons in the ACC region of the rat brain. They activated hM4Di DREADD (Gi signalling) in astrocytes by intraperitoneal administration of CNO and measured cognitive functions in animals after CNO administration. Activation of Gi signaling in astrocytes by CNO application decreased paired-associate (PA) learning, schema formation, and memory retrieval in tested animals. This was associated with a decrease in cAMP in astrocytes and L-lactate in extracellular fluid as measured by immunohistochemistry in situ and in awake rats by microdialysis, respectively. Administration of exogenous L-lactate rescued the astroglial Gi-mediated deficits in PA learning, memory retrieval, and schema formation, suggesting that activation of astroglial Gi signalling downregulates L-lactate production in astrocytes and its transport to neurons affecting memory formation. 
The authors also show that the expression level of proteins involved in mitochondrial biogenesis, which is associated with cognitive functions, is decreased in neurons when Gi signalling is activated in astrocytes and is rescued when exogenous L-lactate is applied, suggesting the involvement of astrocyte-derived L-lactate in the maintenance of mitochondrial biogenesis in neurons. The latter depended on lactate MCT2 transporter activity and glutamate NMDA receptor activity.

      The paper is very well written and discussed. The conclusions of this paper are well supported by the data. Although this is a study that uses established and previously published methodologies, it provides new insights into L-lactate signalling in the brain, particularly in the ACC, and further confirms the role of astroglial L-lactate in learning and memory formation. It also raises new questions about the molecular mechanisms underlying astrocyte-derived L-lactate-mediated mitochondrial biogenesis in neurons and its contribution to schema memory formation.

      • The authors discuss astrocytic L-lactate signalling without considering the recently discovered L-lactate-sensitive Gs and Gi protein-coupled receptors in the brain, which are present in both astrocytes and neurons. The use of nonendogenous L-lactate receptor agonists (Compound 2, 3-chloro-5-hydroxybenzoic acid) would clarify the implication of L-lactate receptor signalling in schema memory formation.

      In the revised manuscript, we have included this point in the discussion section to mention the potential role of HCAR1 in schema memory as follows:

      “Schema consolidation is associated with synaptic plasticity-related gene expression (such as Zif268, Arc) in the ACC (Tse et al., 2011). L-lactate, after entry into neurons, can be converted to pyruvate during which NADH is also produced, promoting synaptic plasticity-related gene expression by potentiating NMDA signaling in neurons (Yang et al., 2014; Margineanu et al., 2018). Furthermore, L-lactate acts as an energy substrate to fuel learning-induced de novo neuronal translation critical for long-term memory (Descalzi et al., 2019). On the other hand, mitochondria play a crucial role in fueling local translation during synaptic plasticity (Rangaraju et al., 2019). Therefore, it could be hypothesized that the rescue of astrocytic Gi activation-mediated impairment of schema by exogenous L-lactate could have been mediated by facilitating synaptic plasticity-related gene expression by directly fueling the protein translation, potentiating NMDA signaling, as well as increasing mitochondrial capacity for ATP production by promoting mitochondrial biogenesis. Furthermore, the potential involvement of HCAR1, a receptor for L-lactate that may regulate neuronal activity (Bozzo et al., 2013; Tang et al., 2014; Herrera-López & Galván, 2018; Abrantes et al., 2019), cannot be excluded. Future research could explore these potential mechanisms, examining the interactions among them, and determining their relative contributions to schema.”

      • The use of control animals transduced with an "empty" AAV8 vector (AAV8-GFAP-mCherry) compared with animals transduced with AAV8-GFAP-hM4Di-mCherry throughout the study would strengthen the results of this study, since transfection itself, as well as overexpression of the mCherry protein, may affect cell function.

      We thank the reviewer for pointing this out. The schema experiment includes a control group (Control-CNO group) of rats injected with AAV8-GFAP-mCherry bilaterally into the ACC. As shown in Author response image 3, after habituation and pretraining, these rats were trained for PA learning in the same way as the other groups. They received I.P. CNO 30 min before and 30 min after each PA training session. Their PA learning, schema formation, memory retrieval, NPA learning and retrieval, and latency (time needed to commence digging at the correct well) were similar to those of the control group of rats. This result is consistent with our previous study, in which rats bilaterally injected with AAV8-GFAP-mCherry into CA1 of the hippocampus did not show impairments in PA learning and schema formation upon CNO treatment (Liu et al., 2022).

      Author response image 3.

      A. PI (mean ± SD) during the acquisition of the original six PAs (OPAs) (S1-2, 4-8, 10-17) and new PAs (NPAs) (S19) of the control (n=6) and control-CNO (n=4) groups. B. Non-rewarded PTs (PT1, PT2, and PT3 done on S3, S9, and S18, respectively) to test memory retrieval of OPAs for the control-CNO group. C. Non-rewarded PT4 (S20) which was done after replacing two OPAs with two NPAs (NPA 7 & 8) in S19 for the control-CNO group. D. Latency (in seconds) before commencing digging at the correct well for control and control-CNO groups. Data shown as mean ± SD.

      References

      Abrantes, H. d. C., Briquet, M., Schmuziger, C., Restivo, L., Puyal, J., Rosenberg, N., Rocher, A.-B., Offermanns, S., & Chatton, J.-Y. (2019). The Lactate Receptor HCAR1 Modulates Neuronal Network Activity through the Activation of Gα and Gβγ Subunits. The Journal of Neuroscience, 39(23), 4422-4433. https://doi.org/10.1523/jneurosci.2092-18.2019

      Akter, M., Ma, H., Hasan, M., Karim, A., Zhu, X., Zhang, L., & Li, Y. (2023). Exogenous L-lactate administration in rat hippocampus increases expression of key regulators of mitochondrial biogenesis and antioxidant defense [Original Research]. Frontiers in Molecular Neuroscience, 16. https://doi.org/10.3389/fnmol.2023.1117146

      Bozzo, L., Puyal, J., & Chatton, J.-Y. (2013). Lactate Modulates the Activity of Primary Cortical Neurons through a Receptor-Mediated Pathway. PLoS One, 8(8), e71721. https://doi.org/10.1371/journal.pone.0071721

      Choi, H. B., Gordon, G. R., Zhou, N., Tai, C., Rungta, R. L., Martinez, J., Milner, T. A., Ryu, J. K., McLarnon, J. G., Tresguerres, M., Levin, L. R., Buck, J., & MacVicar, B. A. (2012). Metabolic communication between astrocytes and neurons via bicarbonate-responsive soluble adenylyl cyclase. Neuron, 75(6), 1094-1104. https://doi.org/10.1016/j.neuron.2012.08.032

      Covelo, A., Eraso-Pichot, A., Fernández-Moncada, I., Serrat, R., & Marsicano, G. (2021). CB1R-dependent regulation of astrocyte physiology and astrocyte-neuron interactions. Neuropharmacology, 195, 108678. https://doi.org/10.1016/j.neuropharm.2021.108678

      Descalzi, G., Gao, V., Steinman, M. Q., Suzuki, A., & Alberini, C. M. (2019). Lactate from astrocytes fuels learning-induced mRNA translation in excitatory and inhibitory neurons. Communications Biology, 2(1), 247. https://doi.org/10.1038/s42003-019-0495-2

      Endo, F., Kasai, A., Soto, J. S., Yu, X., Qu, Z., Hashimoto, H., Gradinaru, V., Kawaguchi, R., & Khakh, B. S. (2022). Molecular basis of astrocyte diversity and morphology across the CNS in health and disease. Science, 378(6619), eadc9020. https://doi.org/10.1126/science.adc9020

      Fünfschilling, U., Supplie, L. M., Mahad, D., Boretius, S., Saab, A. S., Edgar, J., Brinkmann, B. G., Kassmann, C. M., Tzvetanova, I. D., Möbius, W., Diaz, F., Meijer, D., Suter, U., Hamprecht, B., Sereda, M. W., Moraes, C. T., Frahm, J., Goebbels, S., & Nave, K.-A. (2012). Glycolytic oligodendrocytes maintain myelin and long-term axonal integrity. Nature, 485(7399), 517-521. https://doi.org/10.1038/nature11007

      Harris, R. A., Lone, A., Lim, H., Martinez, F., Frame, A. K., Scholl, T. J., & Cumming, R. C. (2019). Aerobic Glycolysis Is Required for Spatial Memory Acquisition But Not Memory Retrieval in Mice. eNeuro, 6(1). https://doi.org/10.1523/ENEURO.0389-18.2019

      Hasan, M., Kanna, M. S., Jun, W., Ramkrishnan, A. S., Iqbal, Z., Lee, Y., & Li, Y. (2019). Schema-like learning and memory consolidation acting through myelination. FASEB J, 33(11), 11758-11775. https://doi.org/10.1096/fj.201900910R

      Herrera-López, G., & Galván, E. J. (2018). Modulation of hippocampal excitability via the hydroxycarboxylic acid receptor 1. Hippocampus, 28(8), 557-567. https://doi.org/10.1002/hipo.22958

      Horvat, A., Muhič, M., Smolič, T., Begić, E., Zorec, R., Kreft, M., & Vardjan, N. (2021). Ca2+ as the prime trigger of aerobic glycolysis in astrocytes. Cell Calcium, 95, 102368. https://doi.org/10.1016/j.ceca.2021.102368

      Horvat, A., Zorec, R., & Vardjan, N. (2021). Lactate as an Astroglial Signal Augmenting Aerobic Glycolysis and Lipid Metabolism [Review]. Frontiers in Physiology, 12. https://doi.org/10.3389/fphys.2021.735532

      Ichihara, Y., Doi, T., Ryu, Y., Nagao, M., Sawada, Y., & Ogata, T. (2017). Oligodendrocyte Progenitor Cells Directly Utilize Lactate for Promoting Cell Cycling and Differentiation. J Cell Physiol, 232(5), 986-995. https://doi.org/10.1002/jcp.25690

      Iqbal, Z., Liu, S., Lei, Z., Ramkrishnan, A. S., Akter, M., & Li, Y. (2023). Astrocyte L-Lactate Signaling in the ACC Regulates Visceral Pain Aversive Memory in Rats. Cells, 12(1), 26. https://www.mdpi.com/2073-4409/12/1/26

      Jourdain, P., Rothenfusser, K., Ben-Adiba, C., Allaman, I., Marquet, P., & Magistretti, P. J. (2018). Dual action of L-Lactate on the activity of NR2B-containing NMDA receptors: from potentiation to neuroprotection. Sci Rep, 8(1), 13472. https://doi.org/10.1038/s41598-018-31534-y

      Kofuji, P., & Araque, A. (2021). G-Protein-Coupled Receptors in Astrocyte-Neuron Communication. Neuroscience, 456, 71-84. https://doi.org/10.1016/j.neuroscience.2020.03.025

      Lee, Y., Morrison, B. M., Li, Y., Lengacher, S., Farah, M. H., Hoffman, P. N., Liu, Y., Tsingalia, A., Jin, L., Zhang, P. W., Pellerin, L., Magistretti, P. J., & Rothstein, J. D. (2012). Oligodendroglia metabolically support axons and contribute to neurodegeneration. Nature, 487(7408), 443-448. https://doi.org/10.1038/nature11314

      Licht-Mayer, S., Campbell, G. R., Canizares, M., Mehta, A. R., Gane, A. B., McGill, K., Ghosh, A., Fullerton, A., Menezes, N., Dean, J., Dunham, J., Al-Azki, S., Pryce, G., Zandee, S., Zhao, C., Kipp, M., Smith, K. J., Baker, D., Altmann, D., Anderton, S. M., Kap, Y. S., Laman, J. D., Hart, B. A. t., Rodriguez, M., Watzlawick, R., Schwab, J. M., Carter, R., Morton, N., Zagnoni, M., Franklin, R. J. M., Mitchell, R., Fleetwood-Walker, S., Lyons, D. A., Chandran, S., Lassmann, H., Trapp, B. D., & Mahad, D. J. (2020). Enhanced axonal response of mitochondria to demyelination offers neuroprotection: implications for multiple sclerosis. Acta Neuropathologica, 140(2), 143-167. https://doi.org/10.1007/s00401-020-02179-x

      Liu, S., Wong, H. Y., Xie, L., Iqbal, Z., Lei, Z., Fu, Z., Lam, Y. Y., Ramkrishnan, A. S., & Li, Y. (2022). Astrocytes in CA1 modulate schema establishment in the hippocampal-cortical neuron network. BMC Biol, 20(1), 250. https://doi.org/10.1186/s12915-022-01445-6

      Magistretti, P. J., & Allaman, I. (2018). Lactate in the brain: from metabolic end-product to signalling molecule. Nat Rev Neurosci, 19(4), 235-249. https://doi.org/10.1038/nrn.2018.19

      Margineanu, M. B., Mahmood, H., Fiumelli, H., & Magistretti, P. J. (2018). L-Lactate Regulates the Expression of Synaptic Plasticity and Neuroprotection Genes in Cortical Neurons: A Transcriptome Analysis. Front Mol Neurosci, 11, 375. https://doi.org/10.3389/fnmol.2018.00375

      Netzahualcoyotzi, C., & Pellerin, L. (2020). Neuronal and astroglial monocarboxylate transporters play key but distinct roles in hippocampus-dependent learning and memory formation. Progress in Neurobiology, 194, 101888. https://doi.org/10.1016/j.pneurobio.2020.101888

      Newman, L. A., Korol, D. L., & Gold, P. E. (2011). Lactate produced by glycogenolysis in astrocytes regulates memory processing. PLoS One, 6(12), e28427. https://doi.org/10.1371/journal.pone.0028427

      Park, J., Kim, J., & Mikami, T. (2021). Exercise-Induced Lactate Release Mediates Mitochondrial Biogenesis in the Hippocampus of Mice via Monocarboxylate Transporters. Front Physiol, 12, 736905. https://doi.org/10.3389/fphys.2021.736905

      Peterson, S. M., Pack, T. F., & Caron, M. G. (2015). Receptor, Ligand and Transducer Contributions to Dopamine D2 Receptor Functional Selectivity. PLoS One, 10(10), e0141637. https://doi.org/10.1371/journal.pone.0141637

      Rangaraju, V., Lauterbach, M., & Schuman, E. M. (2019). Spatially Stable Mitochondrial Compartments Fuel Local Translation during Plasticity. Cell, 176(1), 73-84.e15. https://doi.org/10.1016/j.cell.2018.12.013

      Rinholm, J. E., Hamilton, N. B., Kessaris, N., Richardson, W. D., Bergersen, L. H., & Attwell, D. (2011). Regulation of oligodendrocyte development and myelination by glucose and lactate. J Neurosci, 31(2), 538-548. https://doi.org/10.1523/JNEUROSCI.3516-10.2011

      Sánchez-Abarca, L. I., Tabernero, A., & Medina, J. M. (2001). Oligodendrocytes use lactate as a source of energy and as a precursor of lipids. Glia, 36(3), 321-329. https://doi.org/10.1002/glia.1119

      Suzuki, A., Stern, S. A., Bozdagi, O., Huntley, G. W., Walker, R. H., Magistretti, P. J., & Alberini, C. M. (2011). Astrocyte-neuron lactate transport is required for long-term memory formation. Cell, 144(5), 810-823.

      Tang, F., Lane, S., Korsak, A., Paton, J. F. R., Gourine, A. V., Kasparov, S., & Teschemacher, A. G. (2014). Lactate-mediated glia-neuronal signalling in the mammalian brain. Nature Communications, 5(1), 3284. https://doi.org/10.1038/ncomms4284

      Tauffenberger, A., Fiumelli, H., Almustafa, S., & Magistretti, P. J. (2019). Lactate and pyruvate promote oxidative stress resistance through hormetic ROS signaling. Cell Death Dis, 10(9), 653. https://doi.org/10.1038/s41419-019-1877-6

      Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82. https://doi.org/10.1126/science.1135935

      Tse, D., Takeuchi, T., Kakeyama, M., Kajii, Y., Okuno, H., Tohyama, C., Bito, H., & Morris, R. G. (2011). Schema-dependent gene activation and memory encoding in neocortex. Science, 333(6044), 891-895. https://doi.org/10.1126/science.1205274

      Vardjan, N., Chowdhury, H. H., Horvat, A., Velebit, J., Malnar, M., Muhič, M., Kreft, M., Krivec, Š. G., Bobnar, S. T., Miš, K., Pirkmajer, S., Offermanns, S., Henriksen, G., Storm-Mathisen, J., Bergersen, L. H., & Zorec, R. (2018). Enhancement of Astroglial Aerobic Glycolysis by Extracellular Lactate-Mediated Increase in cAMP [Original Research]. Frontiers in Molecular Neuroscience, 11. https://doi.org/10.3389/fnmol.2018.00148

      Vezzoli, E., Cali, C., De Roo, M., Ponzoni, L., Sogne, E., Gagnon, N., Francolini, M., Braida, D., Sala, M., Muller, D., Falqui, A., & Magistretti, P. J. (2020). Ultrastructural Evidence for a Role of Astrocytes and Glycogen-Derived Lactate in Learning-Dependent Synaptic Stabilization. Cereb Cortex, 30(4), 2114-2127. https://doi.org/10.1093/cercor/bhz226

      Wang, J., Tu, J., Cao, B., Mu, L., Yang, X., Cong, M., Ramkrishnan, A. S., Chan, R. H. M., Wang, L., & Li, Y. (2017). Astrocytic l-Lactate Signaling Facilitates Amygdala-Anterior Cingulate Cortex Synchrony and Decision Making in Rats. Cell Rep, 21(9), 2407-2418. https://doi.org/10.1016/j.celrep.2017.11.012

      Yang, J., Ruchti, E., Petit, J. M., Jourdain, P., Grenningloh, G., Allaman, I., & Magistretti, P. J. (2014). Lactate promotes plasticity gene expression by potentiating NMDA signaling in neurons. Proc Natl Acad Sci U S A, 111(33), 12228-12233. https://doi.org/10.1073/pnas.1322912111

      Yao, S., Xu, M.-D., Wang, Y., Zhao, S.-T., Wang, J., Chen, G.-F., Chen, W.-B., Liu, J., Huang, G.-B., Sun, W.-J., Zhang, Y.-Y., Hou, H.-L., Li, L., & Sun, X.-D. (2023). Astrocytic lactate dehydrogenase A regulates neuronal excitability and depressive-like behaviors through lactate homeostasis in mice. Nature Communications, 14(1), 729. https://doi.org/10.1038/s41467-023-36209-5

      Yu, X., Zhang, R., Wei, C., Gao, Y., Yu, Y., Wang, L., Jiang, J., Zhang, X., Li, J., & Chen, X. (2021). MCT2 overexpression promotes recovery of cognitive function by increasing mitochondrial biogenesis in a rat model of stroke. Anim Cells Syst (Seoul), 25(2), 93-101. https://doi.org/10.1080/19768354.2021.1915379

      Zhou, Z., Okamoto, K., Onodera, J., Hiragi, T., Andoh, M., Ikawa, M., Tanaka, K. F., Ikegaya, Y., & Koyama, R. (2021). Astrocytic cAMP modulates memory via synaptic plasticity. Proc Natl Acad Sci U S A, 118(3), e2016584118. https://doi.org/10.1073/pnas.2016584118

      Zhu, J., Hu, Z., Han, X., Wang, D., Jiang, Q., Ding, J., Xiao, M., Wang, C., Lu, M., & Hu, G. (2018). Dopamine D2 receptor restricts astrocytic NLRP3 inflammasome activation via enhancing the interaction of β-arrestin2 and NLRP3. Cell Death Differ, 25(11), 2037-2049. https://doi.org/10.1038/s41418-018-0127-2

    1. Author Response

      Reviewer #2 (Public Review):

      Zou et al. presented a comprehensive study where they generated single-cell RNA profiling of 138,982 cells from 13 samples of six patients including AK, squamous cell carcinoma in situ (SCCIS), cSCC, and their matched normal tissues, covering comprehensive clinical courses of cSCC. Using bioinformatics analysis, they identified keratinocytes, CAFs, immune cells, and their subpopulations. The authors further compared signatures within subpopulations of keratinocytes along with the clinical progression, especially basal cells, and identified many interesting genes. They also further validate some of the markers in an independent cohort using IHC, followed by some knockdown experiments using cSCC cell lines.

      The strength of this study is the unique data set they have created, providing the community with invaluable resources to study and validate their findings. However, a lot of analyses were not robust enough to support the claims and conclusions in the paper. More clarification and cross-comparison with polished data are needed to further strengthen the study and claims.

      1) Stemness markers were used. The authors used COL17A1, TP63, ITGB1, and ITGA3 to represent stemness markers. However, these were not common classic stemness markers used in cSCC. What is the source claiming these genes were stemness markers in cSCC? TP63 is a master regulator and early driver event in SCC, while COL17A1, ITGB1, and ITGA3 are all ECM genes. The authors need to use commonly well-known stem cell markers in cSCC, e.g., LGR5, to mark stem-like cells.

      Thanks for raising this good point. We may not have provided a clear description of the markers COL17A1, TP63, ITGB1, and ITGA3 in the previous version of the text. We would like to clarify that these genes were used as markers of epidermal stem cells in normal skin samples rather than of tumor stem cells in cSCC. To avoid any possible misunderstanding, we revised the main text accordingly and added the references [4-11].

      2) Cell proportion analysis. The authors used the mean proportions to compare different clinical groups for subpopulations of keratinocytes, e.g., Figure 2B, and Figure 5B. This is not robust, as no statistics can be derived from this. For example, from Fig 2A, it is clearly shown there is a high level of heterogeneity of cellular compositions for normal samples. One cannot say which group is higher or lower based simply on the mean without also considering the variance.

      We replotted the proportion analysis with statistics and presented the new graphs in Figure 2-figure supplement 1 for Figure 2B and Figure 5-figure supplement 1 for Figure 5B.

      3) Basal tumour cells in SCCIS and SCC. To make the findings valid, authors need to compare these cells/populations with the keratinocyte cell populations defined by Ji et al. Cell 2020. Do basal-SCCIS-tumours cells, also in SCC samples, resemble any of the population defined in Ji et al. Ji et al. also had 10 match normal, thus the authors need to validate their findings of SCC vs normal analysis using the Ji et al. dataset.

      Thanks for this valuable suggestion. We compared the basal tumor cells in our study with the cell populations defined in the Ji et al. Cell 2020 data using SingleCellNet [1]. The results showed that both the basal-SCCIS-tumor cells of SCCIS and the basal tumor cells of cSCC in our study closely resemble the Tumor_KC_Basal subcluster defined in Ji et al.’s paper (Figure 4-figure supplement 4, C and D). Tumor_KC_Basal highly expressed CCL2, CXCL14, FTH1, and MT2A, which is consistent with our findings in basal tumor cells.

      4) Copy number analysis. Authors used inferCNV to perform copy number analysis using scRNA-seq data and identified CNVs in subpopulations of keratinocytes in SCCIS and SCC. To ensure these CNVs were not artefacts, were some of the CNVs identified by inferCNV well-known copy number changes previously reported in cSCC?

      In the poorly-differentiated cSCC sample, the significant gains in chromosomes 7 and 9 and the deletion in chromosome 10 have been reported in a previous study, indicating the reliability of the CNV analysis results (Figure 5-figure supplement 2) [12].

      5) Pseudotime analysis lines 308-313. Not sure the pseudotime analysis added much, as it is unclear whether two distinct subgroups were identified from this analysis. Suggest removing this to keep it neater.

      Thank you for this suggestion. We have deleted the result of pseudotime analysis.

      6) Selection of candidate genes for validation using IHC and cell line work. For example, lines 205-206, lines 352-356 and lines 437-441, authors selected several genes associated with AK and SCC to further validate using IHC and cell line knockdown work. What are the criteria for selecting those genes for validation? It is unclear to readers how these were selected. It reads like a fishing experiment, then followed by a knockdown. Clear rationale/criteria need to be elaborated.

      The first consideration in candidate gene selection was the fold change of expression. We have provided the statistical results of DEGs in Supplementary file 1b, 1h, 1j-1m. We then selected the top changed genes and conducted an extensive literature search on these genes. We prioritized genes that, although not directly associated with cSCC development, have a close relationship with related pathways, as determined through functional enrichment analysis. These genes were taken forward for further verification experiments. We have added more details in the main text and methods section.

      7) TME. Compared to keratinocytes populations, the investigation of TME cells was weak. (a) can authors produce UMAP files just for T cells, DC cells, and fibroblasts separately? Figure 7B is not easy to see those subclusters. (b) similar to what was done for keratinocytes, can authors find differentially expressed clusters and genes among the different clinical groups, associated with disease progression? (c) where are the myeloid cell populations, also B cells?

      Thank you for your suggestions. (a) We have added the UMAP files for T cells, DC cells and stromal cells separately in new Figure 7A. (b) We identified DEGs in TME cells among the different groups. Several key genes showed monotonically changing trends associated with disease progression. For example, with the increase of malignancy, FOS shows down-regulation while S100A8 and S100A9 monotonically increase in all three types of TME cells (Figure 7C). (c) We identified two types of myeloid cell populations, macrophages and monocyte-derived DCs (MoDC). We didn’t find other myeloid cells, such as neutrophils. For B cells, there were only 28 B cells in the poorly-differentiated cSCC sample, which didn’t meet the threshold for further cell-cell communication analysis.

      8) Heat shock protein genes line 327-329. HSP signature was well-known to be induced via tissue dissociation and library prep during the scRNA experiment. How could the authors be sure these were not artefacts induced by the experiment? If authors regress their gene expression against HSP gene signatures, would this cluster still be identified?

      Thank you for this valuable suggestion. It is important to note that the Basal-SCCIS-tumor cluster was identified through CNV analysis, rather than the HSP signature. To address this concern and further validate this result, the “AddModuleScore” function in the Seurat package was used to regress gene expression against HSP gene signatures for the retrieved basal cells. Our result showed that the Basal-SCCIS-tumor population can still be identified after regression, even more clearly (Author response image 1).

      Author response image 1.

      The identity of Basal-SCCIS-tumor cluster considering regression against HSP signatures.

      9) Cell-cell communication analysis. The authors claimed that cell-to-cell interaction was significantly enhanced in poorly-differentiated cSCC, and multiple interaction pathways were significantly active. How was this kind of analysis carried out? How did the authors define significance? What statistical method was used? These were all unclear. Furthermore, it is difficult to judge the robustness of the cell-cell communication analysis. Were these findings also supported by another method, such as celltalker or CellPhoneDB?

      To determine the significance of the increased overall cell-to-cell interaction strength between two groups, we utilized CellChat to obtain the communication strength in different samples. We combined the communication strength based on cell type pairs, where missing values were set to 0. We performed a paired Wilcoxon test to determine whether the enhancement of cell-to-cell interaction between samples was significant.

      For the comparison of outgoing or incoming interaction strength of the same cell types between two groups, we first extracted the communication strength of each signal pathway contributing to outgoing or incoming strength, and then merged the strengths of signal pathways among samples, where the strength of non-shared pathways with missing value was determined to be 0. Subsequently, we performed a paired Wilcoxon test to define the significance.
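
      The alignment step described in the two paragraphs above (merging strengths from two samples and setting missing entries to 0 before a paired test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `align_strengths`, the cell-type labels, and all numeric values are hypothetical. The two aligned vectors it returns are the kind of paired input that a paired Wilcoxon signed-rank test (for example `scipy.stats.wilcoxon`) expects.

```python
from typing import Dict, List, Tuple

# A key is a (sender cell type, receiver cell type) pair; the same idea
# applies if the keys are signaling-pathway names instead.
Pair = Tuple[str, str]


def align_strengths(
    a: Dict[Pair, float], b: Dict[Pair, float]
) -> Tuple[List[Pair], List[float], List[float]]:
    """Align two strength tables on the union of their keys.

    Keys present in only one table get a strength of 0.0 in the other,
    so both output vectors have the same length and order.
    """
    keys = sorted(set(a) | set(b))
    xs = [a.get(k, 0.0) for k in keys]
    ys = [b.get(k, 0.0) for k in keys]
    return keys, xs, ys


# Hypothetical values for illustration only; not data from the study.
normal = {("Basal", "T cell"): 0.10, ("Fibroblast", "Basal"): 0.25}
tumor = {("Basal", "T cell"): 0.40, ("Macrophage", "Basal"): 0.30}

pairs, normal_vec, tumor_vec = align_strengths(normal, tumor)
```

      Filling absent pairs with 0 keeps every observation paired, at the cost of treating "not detected" as "no communication"; that choice is the one stated in the response above.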

      For multiple-group comparisons, the Kruskal-Wallis rank sum test was performed first. If the p-value was less than 0.1, the pairwise Wilcoxon test was used for subsequent pairwise comparisons. The comparison of individual signaling pathways between groups was carried out in the same way. We defined p-value < 0.1 as the significance threshold. We have added the significance test method in the figure legends for Figure 7 and Figure 8, as well as detailed statistical data in new Supplementary file 1q-1u.
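
      The two-stage decision rule in the paragraph above (an omnibus test that gates the pairwise comparisons, both at the 0.1 threshold) can be sketched as follows. This is an assumption-laden sketch: the p-values are taken as already computed, and the function name `significant_pairs`, the group labels, and the numbers are hypothetical rather than results from the study.

```python
from typing import Dict, Tuple

ALPHA = 0.1  # significance threshold stated in the response

GroupPair = Tuple[str, str]


def significant_pairs(
    omnibus_p: float,
    pairwise_p: Dict[GroupPair, float],
    alpha: float = ALPHA,
) -> Dict[GroupPair, float]:
    """Return pairwise comparisons that pass the two-stage rule.

    Stage 1: the omnibus (Kruskal-Wallis) p-value must be below alpha.
    Stage 2: only then are pairwise p-values below alpha reported.
    """
    if omnibus_p >= alpha:
        return {}
    return {pair: p for pair, p in pairwise_p.items() if p < alpha}


# Hypothetical p-values for illustration only.
pairwise = {
    ("Normal", "AK"): 0.40,
    ("Normal", "cSCC"): 0.03,
    ("AK", "cSCC"): 0.08,
}
reported = significant_pairs(0.02, pairwise)  # omnibus test passes
skipped = significant_pairs(0.35, pairwise)   # omnibus test fails
```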

      As suggested, we also used the CellPhoneDB approach with the CellChatDB database to verify our cell-cell communication results. Between 55% and 58% of the ligand-receptor interactions predicted by CellChat were also predicted by CellPhoneDB (Author response image 2). The enhancement of cell interaction through the MHC-II, Laminin and TNF signaling pathways in the poorly-differentiated cSCC sample compared to the normal sample was consistent in both CellChat and CellPhoneDB (Figure 8C and Figure 8-figure supplement 1B).
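
      The overlap figure quoted above is read here as the fraction of one tool's predicted ligand-receptor interactions that the other tool also predicts. A minimal sketch of that calculation follows; the function name `overlap_fraction` and the placeholder pair labels (`L1`, `R1`, and so on) are hypothetical, and the resulting number is not a result from the study.

```python
from typing import Set, Tuple

Interaction = Tuple[str, str]  # (ligand, receptor)


def overlap_fraction(reference: Set[Interaction], other: Set[Interaction]) -> float:
    """Fraction of interactions in `reference` that also appear in `other`."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & other) / len(reference)


# Placeholder ligand-receptor pairs for illustration only.
cellchat_pairs = {("L1", "R1"), ("L2", "R2"), ("L3", "R3"), ("L4", "R4")}
cellphonedb_pairs = {("L1", "R1"), ("L3", "R3"), ("L5", "R5")}

fraction = overlap_fraction(cellchat_pairs, cellphonedb_pairs)
```

      Note that the measure is asymmetric: swapping the two arguments changes the denominator and, in general, the result.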

      Author response image 2.

      The overlap of the predicted ligand-receptor interactions between CellChat and CellPhoneDB.

      10) Statistics and significance. In general, the detail of statistics and significance was lacking throughout the paper. Authors need to specify what statistical tests were used, and the p-values. It is difficult to judge the correctness of the test, and robustness without seeing the stats.

      We have included all statistics and significance values in the figure legend and supplemental tables, and described the statistical tests in the methods section. In this revision, we have added the necessary details of statistics and significance in the main text and figures.

      11) Overall, this manuscript needs a lot of re-writing. A lot of discussion was also included in the results, making it really difficult to read overall. The authors should simplify the results sections, remove the discussion bits, and further highlight and streamline with the key results of this paper.

      Thanks a lot for this advice. We have revised the paper thoroughly and removed discussion from the results section to make the manuscript easier to read.

    1. Author Response

      Reviewer #1 (Public Review):

      Zhao et al. investigated the molecular nature of the binding site for carbohydrates within the UDP-sugars known to activate the P2Y14 receptor. In order to do so, they built a molecular model of the hP2Y14, docked the corresponding agonists, and performed MD simulation on the resulting complexes. The modeling was used to identify the key molecular interactions with a cluster of charged residues in the extracellular side of the TM region of the receptor, which they show are conserved within the P2Y receptors. The binding site of the UDP region was, not surprisingly, overlapping with the analogous ADP binding site experimentally observed for the P2Y12 receptor, and consequently, the region that recognizes the sugars could be anticipated. Nevertheless, the detailed modeling and simulation work shows the consistency of this hypothesis and provides a quantification of the particular interactions involved, pinpointing specifically the residues candidate to be involved in the recognition of sugars.

      It follows the characterization, by functional assays, of the effect of single-point mutations of these residues in the efficacy of the different UDP-sugars. Here the results show a tendency to correlate with the molecular models, however some of the data has very low statistical significance and consequently the interpretation and conclusions extracted from this data should be taken with caution. This pertains to the particular role of the identified residues in the binding of the different sugars, which in some cases should be taken as a suggestion rather than a proof, though the general conclusion of the identification of the binding region for the sugar, its conservation among P2Y receptors and the role of some specific residues in sugar recognition seems convincing and the data are conveniently presented.

      Finally, the design of ADP-sugars that activate the P2Y12 receptor, based on the transferability of the observations with the UDP-sugars for the P2Y14 receptor, is a first indication that such a recognition is possible and should happen in an analogous binding region. However, the low potencies exhibited by the ADP-sugars, in the micromolar range, are too far from the ADP agonist and the relevance of this mechanism remains to be proved. The difference between P2Y12 and P2Y14, with the last one showing much higher potencies for UDP-sugar derivatives than P2Y12 for the corresponding ADP-sugars, remains an interesting question not explored in this manuscript.

      Thanks for your valuable comments. We have revised the interpretation of the data that have relatively low statistical significance, and the conclusions drawn from these data are now presented as suggestions. In this work, to investigate whether sugar nucleotides can also activate human P2Y12, we tested three ADP-sugars on human P2Y12. Discovering highly potent P2Y12 agonists requires screening a large number of compounds, and it is possible that other ADP-sugars are highly potent P2Y12 agonists. ADP-sugars are technically challenging to synthesize; currently, we can only obtain ADP-Glc, ADP-GlcA and ADP-Man. Once other ADP-sugars become available to us, we will test them in future work to try to discover highly potent agonists. Such agonists would be useful chemical tools to establish the relevance of this mechanism for P2Y12. To explore the nature of the binding sites of P2Y12 and P2Y14, we performed additional mutagenesis experiments and added the relevant data to the revised manuscript.

      Reviewer #2 (Public Review):

      The manuscript employs multiple approaches, including molecular docking, molecular dynamic simulations, and functional experiments to uncover a distinct uridine diphosphate-sugar-binding site on P2Y14 - a key drug target for inflammation and immune responses. Overall, the manuscript is clearly written, and the experimental techniques are well-documented. However, it may benefit from further analysis, particularly in terms of validating the binding pose.

      Thanks for your comments. We used MMPBSA to analyze the ligand-binding energy for each receptor residue using MD trajectories. To further characterize the ligand-binding pose, we calculated the percentage of occurrence of hydrogen bonding between the ligand and the carbohydrate-binding site (K277, E278, R253 and K77). We also calculated the ligand RMSF and ligand RMSD to show the stability of the ligand-binding pose and the simulation convergence. These data have been included in the revised manuscript.
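
      The pose-stability metrics named in this response can be illustrated with a short sketch. It is not the authors' analysis pipeline: it assumes ligand coordinates have already been superposed on the receptor, and the function names, toy coordinates and per-frame contact list are hypothetical.

```python
import math

def ligand_rmsd(reference, frame):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates.

    Assumes the frame has already been aligned to the reference receptor.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for ref_atom, atom in zip(reference, frame)
        for a, b in zip(ref_atom, atom)
    )
    return math.sqrt(squared / len(reference))

def hbond_occupancy(present_per_frame):
    """Percentage of frames in which a given hydrogen bond is present."""
    if not present_per_frame:
        raise ValueError("need at least one frame")
    return 100.0 * sum(present_per_frame) / len(present_per_frame)

# Toy data (hypothetical): a two-atom ligand shifted by 1 angstrom along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
shifted = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
print(ligand_rmsd(reference, shifted))
print(hbond_occupancy([True, True, False, True]))
```

      In practice the per-frame hydrogen-bond flags would come from a geometric criterion (donor-acceptor distance and angle) applied to each trajectory frame; that criterion is not specified in the text and is left out here.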

    1. Author Response

      Reviewer #3 (Public Review):

      Seeking a selective inhibitor that precisely inhibits on-target activities and avoids side effects is a major challenge in the field of drug discovery and therapeutics. The authors proposed an alternative method that combines multiple inhibitors to maximize on-target inhibition and minimize off-target inhibition. Focusing on the kinase-inhibitor interaction dataset, the authors developed a quantitative way to measure the selectivity for mixtures of inhibitors by using the Jensen-Shannon distance metric. The method sounds technical.

      From their computation and assays, the multi-compound-multitarget scoring (MMS) method framework was validated to be able to select a combination of inhibitors that is more selective than a single highly selective inhibitor for one kinase target, or for multiple targets. The MMS method is a promising solution to reduce off-target effects and could be applicable to other inhibitor-target interactions. My suggestion is that a comparative analysis of MMS with other similar methods can be conducted to highlight the advantage of MMS over others.

      We thank the reviewer for this excellent summary and their suggestions. We agree that comparing new methods to prior ones is an important step in benchmarking new approaches and methods. However, to our knowledge, no other method exists for calculating selective combinations of kinase inhibitors. We compare our JSD selectivity scoring metric to other representative target-specific and non target-specific selectivity metrics (Figure 2 Figure Supplement 2).
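
      For readers unfamiliar with the metric, the Jensen-Shannon distance between two activity profiles can be sketched as follows. This shows only the generic metric, with base-2 logarithms so the result lies between 0 and 1; how the MMS method builds its reference and penalty distributions is not described in this response, so the example profiles and function names below are hypothetical.

```python
import math

def normalise(values):
    """Scale non-negative activities so they sum to 1."""
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("activities must be non-negative with a positive sum")
    return [v / total for v in values]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same kinases")
    p, q = normalise(p), normalise(q)
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
    return math.sqrt(max(divergence, 0.0))

# Hypothetical profiles over four kinases; the first kinase is the target.
ideal = [1.0, 0.0, 0.0, 0.0]
selective = [0.90, 0.05, 0.05, 0.0]
promiscuous = [0.25, 0.25, 0.25, 0.25]

print(jensen_shannon_distance(selective, ideal))
print(jensen_shannon_distance(promiscuous, ideal))
```

      Under this toy setup a smaller distance to the ideal on-target profile indicates a more selective inhibitor or inhibitor mixture.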

      The paper is not well organized and not easily readable. For example, first, the captions of the figures are too long; some of these texts could be moved to methods or results sections. Second, the concept of "penalty distribution" or "penalty prior" is vital to understand the MMS method, thus, at least a brief definition and introduction should be put in the main text rather than supporting method, as well as the rationale to use it. Third, the method section can be divided into several subsections with clear organizations and connections. Fourth, what is the difference between "a less selective inhibitor profile" and "an even less selective inhibitor profile" in Figure 3? Overall, the details of the paper are difficult to understand in the current version. I suggest rewriting the paper in a more concise and logical style.

      We appreciate these suggestions and have significantly edited and revised our manuscript in order to facilitate clear communication. Specifically:

      1) We have added an additional description of the penalty distribution to the description of the MMS method in the main Results section of the manuscript as opposed to solely in the Materials and Methods section.

      2) We have provided a concise, high-level summary of the MMS method in the Results section to help orient the reader. This description follows the same order (1 to 5) as the associated Figure 2, which we hope communicates the method more clearly.

      3) We have moved descriptive figure captions to the methods section and, in general, substantially reduced the size of figure captions.

      4) We have subdivided the Materials and Methods section as suggested.

      5) We now describe in our main text how the simulated profiles were generated by smoothing the PKIS2-645-like profile with two restraints: non-zero activity for LS inhibitors, and similar on-target probability for PKIS2-645-like, RS, and LS inhibitors, to facilitate direct comparisons. We provide a new figure to quantify the selectivity of these simulated inhibitors and their similarity with true compounds (Figure 3 Figure Supplement 1).

      6) We have removed content from the introduction and results sections that was less important to communicate to a general audience in order to make the manuscript more concise. We have also removed or condensed extraneous supplemental figures that were not required to communicate the central results and findings of experiments (ex: supplemental figures for Figure 3 and Figure 4 from the prior submission).

    1. Author Response

      Joint Public Review

      (1) The developed model considers the interaction of multiple signaling networks that are essential for morphogenesis and homeostasis in the intestinal tissue, as well as other elements that had been proposed as relevant in the literature. Nevertheless, the details of how these interactions are modeled couldn't be evaluated in the current revision as the model was not shared with the reviewers and it is not available yet online, nor specified in any detail in the current manuscript. Additionally, how quantitative information from Wnt and BMP signaling pathways is incorporated in a quantitative way in the model is not clear.

      Model files are provided with this reply. These are ‘.jl’ files for use with Julia. The model (the files provided with this reply) will be freely publicly available through BioModels upon acceptance of this manuscript for publication.

      The model includes abstracted values to reproduce Wnt and BMP signalling gradients and their effect on cell proliferation and differentiation to generate the three-dimensional crypt spatial cell distribution. To further clarify how quantitative information from the Wnt and BMP signalling pathways is implemented in the model, we have added the following paragraph to Appendix Section 8 (Cell fate: proliferation, differentiation, arrest, apoptosis):

      "…During this migration the Wnt content in absorptive progenitors is halved in each division and, away from Wnt sources, progressively decreases, while BMP signals increase, towards the villus. In our model, differentiation into enterocytes occurs when progenitors encounter a BMP signal level higher than their Wnt signal content. For instance, in the ileal crypt in homeostasis this occurs approximately at cell position 16 from the crypt base, where progenitors migrating from the stem cell niche reach a reduced content of Wnt signals of about 8 a.u. On the other hand, the BMP signalling level has a maximum value of 64 at approximately cell position 23 from the crypt base, where BMP signals are generated by mature enterocytes. These BMP signals diffuse towards the crypt base and, hence, decrease exponentially to reach values of 8 a.u. at approximately position 16, which, hence, enable differentiation into enterocytes. Epithelial injuries resulting in a decreased number of enterocytes reduce BMP signal production and its diffusion range which results in the enlargement of the proliferation compartment as cells encounter the required level of BMP signals for differentiation only at higher positions in the crypt."

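      The differentiation rule quoted above can be sketched numerically. This is an illustration only, not the authors' Julia model: it assumes a single exponential decay fitted to the two quoted BMP values (64 a.u. at position 23 and 8 a.u. at position 16), and the starting Wnt content of 128 a.u. is a hypothetical value chosen so that four halvings give the quoted 8 a.u.

```python
import math

BMP_MAX = 64.0         # a.u., maximum BMP level quoted in the text
BMP_SOURCE_POS = 23    # cell position of the BMP maximum
REFERENCE_POS = 16     # position at which BMP has decayed to about 8 a.u.
BMP_AT_REFERENCE = 8.0

# Decay constant implied by 64 -> 8 a.u. over 7 cell positions
# (assumption: one exponential describes the whole gradient).
DECAY = math.log(BMP_MAX / BMP_AT_REFERENCE) / (BMP_SOURCE_POS - REFERENCE_POS)

def bmp_level(position):
    """BMP signal at a cell position; positions above the source are
    clipped to the maximum, which is a simplifying assumption."""
    distance = max(BMP_SOURCE_POS - position, 0)
    return BMP_MAX * math.exp(-DECAY * distance)

def wnt_after_divisions(initial_wnt, divisions):
    """Wnt content of a progenitor after it has been halved at each division."""
    return initial_wnt / (2 ** divisions)

def differentiates(wnt_content, position):
    """Rule from the text: differentiate when local BMP exceeds Wnt content."""
    return bmp_level(position) > wnt_content

# Hypothetical progenitor leaving the niche with 128 a.u. of Wnt.
for divisions in range(6):
    wnt = wnt_after_divisions(128.0, divisions)
    print(divisions, wnt, differentiates(wnt, REFERENCE_POS))
```

      In this sketch, lowering BMP_MAX (fewer mature enterocytes) moves the position at which the rule is satisfied further up the crypt, in line with the enlargement of the proliferative compartment described in the quoted paragraph.
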
      (2) Some conclusions by the authors are not properly justified in the text, as "Paneth cells are the main driver behind the differential mechanical environment in the niche", "Wnt-mediated feedback loop prevents the uncontrolled expansion of the niche", the specific effect of p27 in contrast with Wee1 phosphorylation over the cell cycle length, and "their recovery [absorptive progenitors] started before the end of the treatment, driven by a negative feedback loop from mature enterocytes to their progenitors".

      We have reworded these statements as described below.

      The paragraph “Paneth cells are the main driver behind the differential mechanical environment in the niche, where cells with longer cycles accumulate more Wnt and Notch signals. In agreement with experimental reports {Pin, 2015 #719}, in our model Paneth cells are assumed to be stiffer and larger than other epithelial cells, requiring higher forces to be displaced and generating high intercellular pressure in the region” has been modified and now reads as follows “In agreement with experimental reports {Pin, 2015 #719}, Paneth cells are assumed to be stiffer and larger than other epithelial cells, requiring higher forces to be displaced and generating high intercellular pressure in the niche. Due to this increased mechanical pressure, cells in the niche have longer division cycles and can accumulate more Wnt and Notch signals.”

      The sentence “Wnt-mediated feedback loop prevents the uncontrolled expansion of the niche” has been deleted from the paragraph, which now reads: “To generate a niche of stable size, we implemented a negative Wnt-mediated feedback loop that resembles the reported stem cell production of RNF43/ZNRF3 ligands to increase the turnover of Wnt receptors in nearby cells {Hao, 2012 #2086;Koo, 2012 #2089;Clevers, 2013 #538;Clevers, 2013 #2098}. Similarly, in our model, a number of stem cells in excess of the homeostatic value reduces cell tethering of Wnt ligands and hence inhibits Paneth and stem cell generation (Figures 1A-B).”

      Regarding the specific effect of p27, in contrast with Wee1 phosphorylation, on cell cycle length, we have simplified the text in the main manuscript, which now reads: “Using the model of Csikasz-Nagy et al. {Csikasz-Nagy, 2006 #1870}, we modulated the duration of G1 through the production rate of the p27 protein. The p27 protein has been reported to regulate the duration of G1 by preventing the activation of Cyclin E-Cdk2 which induces DNA replication and the beginning of S-phase {Morgan, 2007 #2073}. We, hence, hypothesized that rapid cycling absorptive progenitors located in regions of low mechanical pressure outside the stem cell niche have low levels of p27, which bring forward the start of S-phase to shorten G1 (Figure 2D). In support of this hypothesis, it has been demonstrated that p27 inhibition has no effect on the proliferation of absorptive progenitors {Zheng, 2008 #2074} (see the Appendix for a full description).”

      In Appendix Section 2 we provide an extended explanation of how the kinetic parameters governing p27 and Wee1 are used to shorten the cell cycle, mainly by decreasing G1 while keeping the length of S phase constant. It reads as follows:

      "Regarding G1 phase, the p27 protein has been reported to regulate the duration of G1 by preventing the activation of Cyclin E-Cdk2 which induces DNA replication and defines the beginning of S-phase {Morgan, 2007 #2073}. We hypothesized that fast cycling cells have low levels of p27 which result in earlier DNA replication, bringing forward the start of S-phase and shortening the length of G1. In support of this hypothesis, it has been experimentally demonstrated that inhibiting p27 has no effect on the proliferation of absorptive progenitors {Zheng, 2008 #2074}. In the Csikasz-Nagy model {Csikasz-Nagy, 2006 #1870}, the duration of G1 can be modulated through the parameter V_si, which is the basal production rate of p21/p27 (in the Csikasz-Nagy model, the p21 and p27 proteins are represented by a single variable, here we refer to that model quantity as p21/p27).

      Additionally, the end of S-phase is associated with the decrease of Wee1 to basal levels due to Cdc14 mediated phosphorylation of Wee1. In the Csikasz-Nagy model {Csikasz-Nagy, 2006 #1870}, this reaction is described by a Goldbeter-Koshland function, which includes the parameter KA_Wee1p to regulate the level of Cdc14 required for the phosphorylation of Wee1.

      Therefore, we modified these two parameters, V_si and KA_Wee1p, to ensure that variations of the cycle duration mostly impact on G1 while the length of S phase remains constant. We assumed that the value of the two parameters scales linearly with the duration of the division cycle, t_cycle, between a lower and upper bound, which prevent aberrant behaviour of the cell cycle model in the dynamically changing conditions of the crypt."
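
      The bounded linear scaling described in the last quoted paragraph can be sketched as below. This is one possible reading, in which the bounds apply to the parameter value rather than to t_cycle; the slope, intercept and bounds are hypothetical placeholders and are not taken from the model or from Csikasz-Nagy et al.

```python
def scaled_parameter(t_cycle, slope, intercept, lower, upper):
    """Parameter value that is linear in the cycle duration t_cycle and
    clamped to [lower, upper] to keep the cell cycle model well behaved."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return min(max(slope * t_cycle + intercept, lower), upper)

# Hypothetical coefficients for illustration only.
for t_cycle in (1.0, 10.0, 100.0):
    v_si = scaled_parameter(t_cycle, slope=0.5, intercept=0.0, lower=2.0, upper=8.0)
    print(t_cycle, v_si)
```

      The same helper would be called once for V_si and once for KA_Wee1p, each with its own coefficients and bounds.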

      The paragraph related to “their recovery started before the end of the treatment…” sentence has been amended in the text and now reads “Simulated proliferative absorptive progenitors were indirectly affected by stem cell ablation and their decrease was followed by a reduction in mature enterocytes. The progenitors recovered soon after treatment interruption to later reach values above baseline when responding to the negative feedback signalling from mature enterocytes (Figure 3A).”

      (3) Only the results of the "main" model are shown, with no information about its sensitivity to parameter values, and how their conclusions depend on specific decisions on the model. For example, the authors said that "an optimal crypt cell composition is achieved when BMP and Wnt differentiation thresholds result in progenitors dividing approximately four times before differentiating into enterocytes", but the results of alternative scenarios are not shown.

      To address this comment, we have included a new section in the Appendix, called “What-if Analysis”, and new figures (Figures S4-S8) with simulations of alternative scenarios affecting the main signalling pathways that govern crypt composition; in particular, we simulated stronger and weaker Wnt, BMP, Notch and ZNRF3/RNF43 signalling.

      We attach the new section here:

      "10) What-if Analysis

      We investigated the effect on the simulated crypt of increasing and decreasing the strength of the main signalling pathways, Wnt, BMP and ZNRF3/RNF43 signalling, and modifying the Notch thresholds. For each alternative parameterisation, except when decreasing ZNRF3/RNF43 signalling, the simulation was run for 30 days to ensure stability was reached with the new parameter set and the final 10 days were included in the analysis. When decreasing ZNRF3/RNF43 signalling, we simulated 60 days to demonstrate the expansion of the niche and analysed the final 10 days. The reference parameter set used as baseline was the ileal mouse crypt parameter set reported in Appendix Table 1. In all cases, we only consider modifications of one signalling mechanism at a time.

      To study alternative Wnt signalling scenarios, we used the WntRange parameter (Appendix Table 1) to double and halve the spreading area of Wnt signals emitted by Paneth cells while we maintained the original WntRange value for Wnt-emitting mesenchymal cells at the bottom of the crypt (Appendix Section 7.1) (Figures S4A-S4F). When WntRange was doubled, we observed an increased number of stem and Paneth cells in a noticeably enlarged niche (Figures S4C-S4D), with cells choosing the stem cell fate instead of differentiating into absorptive progenitors. On the other hand, decreasing Wnt signalling, by halving WntRange in Paneth cells but maintaining its homeostatic value in mesenchymal cells, resulted in no apparent changes in the niche cell composition (Figures S4E-S4F), which resembled published experimental results of persisting functional stem cells after Paneth cell ablation {Durand, 2012 #434}.

      The ZNRF3/RNF43-mediated negative feedback mechanism regulates the size of the niche by modulating Wnt signalling. We simulated increasing and decreasing the strength of ZNRF3/RNF43 signalling by doubling and halving, respectively, the parameter Z described in Appendix Section 7.2 (Figures S5A-S5F). Following the increase in the intensity of ZNRF3/RNF43 signalling, we observed a decrease in the number of stem and Paneth cells together with relatively minor changes in the transit-amplifying region (Figures S5C-S5D). On the other hand, when decreasing ZNRF3/RNF43 signalling levels, the niche expanded, resulting in a crypt dominated by Paneth and stem cells (Figures S5E-S5F), which replicates reported experimental phenotypes {Koo, 2012 #2089}.

      To modify Notch signalling, we increased and decreased by 1 A.U. the Notch threshold required for lateral inhibition (Figures S6A-S6F). This Notch signalling threshold determines the number of contacting Notch-secreting cells (secretory lineage) to inhibit the differentiation of stem cells into the secretory lineage. Thus, increasing this Notch threshold enhances the production of secretory cells leading to the increase of Paneth, goblet and enteroendocrine cells (Figure S6C-S6D). Alternatively, decreasing the Notch threshold enhances differentiation into the absorptive lineage, reducing the number of Paneth and secretory cells (Figure S6E-S6F).

      We modified the range of diffusion of BMP signals by doubling and halving the parameter A (Figures S7A-S7F), which denotes the amount of diffusing BMP signals towards the base of the crypt (Appendix Section 7.4). When we increased the BMP signalling range, enterocytes differentiated at lower crypt positions, effectively reducing the transit-amplifying zone (Figure S7A, Figure S7B). Decreasing BMP signalling strength by halving A resulted in the increase of proliferative absorptive progenitors, which reach higher positions in the crypt (Figure S7C-S7D). The niche was largely unaffected in both cases (Figure S7E-S7F)."
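
      The one-factor-at-a-time design of this what-if analysis can be sketched as a small scenario generator. The parameter names WntRange, Z and A come from the quoted appendix; the key NotchThreshold, all baseline values and the dictionary layout are hypothetical, and the sketch does not distinguish Paneth-cell from mesenchymal WntRange as the authors do.

```python
# Hypothetical baseline values; the real ones are in the authors' Appendix Table 1.
BASELINE = {"WntRange": 1.0, "Z": 1.0, "A": 1.0, "NotchThreshold": 3.0}

def what_if_scenarios(baseline):
    """Build scenarios that each change exactly one signalling parameter."""
    scenarios = {}
    for name in ("WntRange", "Z", "A"):
        for label, factor in (("doubled", 2.0), ("halved", 0.5)):
            params = dict(baseline)
            params[name] = baseline[name] * factor
            # Text: 60 simulated days when ZNRF3/RNF43 signalling is decreased,
            # 30 days otherwise; the final 10 days are analysed in both cases.
            days = 60 if (name == "Z" and label == "halved") else 30
            scenarios[f"{name}_{label}"] = {"params": params, "days": days}
    for label, delta in (("plus1", 1.0), ("minus1", -1.0)):
        params = dict(baseline)
        params["NotchThreshold"] = baseline["NotchThreshold"] + delta
        scenarios[f"NotchThreshold_{label}"] = {"params": params, "days": 30}
    return scenarios

for scenario_name, scenario in what_if_scenarios(BASELINE).items():
    print(scenario_name, scenario["days"], scenario["params"])
```

      Keeping each scenario as a full copy of the baseline with a single changed entry makes it easy to confirm that only one signalling mechanism is modified at a time.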

      (4) Regarding the construction of the model, the authors used "counts of Ki-67 positive cells recorded by position" while the original data reported "overall cell counts per crypt and villus". Some explanation about how this conversion was made, why it is valid, as well as any potential problems, is needed. Additionally, the model is based on experiments done by others in mouse models; the similarity to the response in human intestinal crypts is not discussed.

      Ki-67 immunostaining data during 5-FU treatment was derived from the same experiments. The overall cell counts per crypt and villus are published in {Jardi, 2022 #2416}. For this manuscript, we reanalysed the intestinal samples to estimate counts of cell types by position in the crypt.

      We have clarified the text, which now reads: “The samples from this later study {Jardi, 2022 #2416} were analysed again to count Ki-67 positive cells at each position along the longitudinal crypt axis, for 30-50 individual hemi crypt units per tissue section per mouse as previously described {Williams, 2016 #2165}.”

      We agree that the understanding of the translation of results derived from animal models into a human or clinical context is of high relevance. The mouse crypt is a model of choice to study epithelial biology and exhibits remarkable similarities with the human crypt. In our team, we are focussed on developing translational modelling strategies and have a version of the model that describes a human crypt. That model assumes mostly conserved crypt biology and structure across species and includes changes in parameter values needed to compensate reported differences in morphometrics and cell cycle duration. Due to the relevance and extent of this translational work, we chose to focus on the mouse crypt entirely in this manuscript. We think the translational modelling strategy to explore the quantitative translation between human and mouse and/or other species/settings merits a full report.

      (5) The authors imply that their mathematical model of the intestinal crypt is an improvement over those already published but there is no direct comparison or review of the literature to substantiate this claim.

      An extended literature review including more details of previous ABMs to enable a direct comparison with our model is now included in the manuscript and reads as follows:

      “Several agent-based models (ABMs) have been proposed to describe the complexity and dynamic nature of the intestinal crypt. Early models were used as in silico platforms to study the dynamics and cellular organisation of the crypt. For instance, one of the pioneering ABMs was used to study the distribution and organisation of labelling and mitotic indices {Meineke, 2001 #326}. This model comprises a fixed ring of Paneth cells beneath a row of stem cells, which divide asymmetrically to produce a stem cell and a transit-amplifying cell that terminally differentiates after a fixed number of divisions. Some subsequent models are lattice-free, recapitulate neutral drift of equipotent stem cells and describe proliferation and cell fate regulated by a fixed Wnt signalling spatial gradient, which is defined by the distance from the crypt base, with proliferating cells progressing through discrete phases of the cell cycle and showing variable duration of the G1 phase {Pitt-Francis, 2009 #129}. Further model refinements can be seen in the model of Buske et al (2011), with stochastic cell growth and division time {Buske, 2011 #1}, Wnt levels defined by the fixed local curvature of the crypt and lateral inhibition driven by Notch signalling. Here, we present a lattice-free agent-based model that describes the spatiotemporal dynamics of single cells in the small intestinal crypt driven by the interaction of surface tethered Wnt signals, cell-cell Notch signalling, BMP diffusive signals, RNF43/ZNRF3-mediated feedback mechanisms and the cycle protein network responding to the crypt mechanical environment. We show that our computational model enables the simulation of the ablation and recovery of the stem cell niche as well as of how drug-induced molecular perturbations trigger a cascade of disruptive events spanning from the cell cycle to single cell arrest and/or apoptosis, altered cell migration and turnover and ultimately loss of epithelial integrity.”

      (6) The authors claim that the simulated data and the available mouse data match up. Nevertheless, the data vs the model still appear both quantitatively and qualitatively different (as presented in Figures 2E, F, and 5C, D). This puts in doubt how much the model can actually reproduce the experimental data. In conclusion, the model would benefit from further refinement, particularly if the goal is to use the model for predicting the dynamics of oncogenic drug candidates.

      To address this comment, we have made several adjustments: we refined the counting algorithm that determines cell position and improved the Ki67 and BrdU staining simulations by modifying the simulated staining criteria and adding an estimation of the experimental error to the simulated responses. These changes are described in a new section of the Appendix called “ABM simulation of Ki-67 and BrdU Staining”.

      With these changes we think we have achieved a more satisfactory agreement between observed and predicted results and updated all figures with Ki67 and BrdU staining simulated results.

    1. Author Response

      We are grateful to the editors and the reviewers for the thorough evaluation of our manuscript and their feedback, as it allows us to provide additional clarification of our findings and improve the manuscript.

      In their evaluation, the reviewers raised a key conceptual point linked to the inhibitory mechanism that appeared to be insufficiently explained in the manuscript, leading to a misconception regarding its physiological relevance. They have also missed experimental data related to the concentrations of Aβ used and their relevance for Alzheimer’s disease (AD). We believe that our studies, although performed in vitro in model systems, provide a novel conceptual framework and shed light on unexplored mechanisms underlying AD.

      We discuss these points below in a provisional response to their comments.

      Reviewer #1 (Public Review):

      Summary:

      Human Abeta42 inhibits gamma-secretase activity in biochemical assays.

      Strengths:

      Determination of the inhibitory concentration of human Abeta42 on gamma-secretase activity in biochemical assays.

      Weaknesses:

      Human Abeta42 may concentrate up to microM order in endosomes.

      This is correct.

      If so, production of Abeta42 would be attenuated and then lead to less Abeta deposition in the brain. The authors' finding is interesting but does not fit the physiological condition in the brain.

      We thank the reviewer for raising this key conceptual point, as this gives us the opportunity to clarify it for the future readers.

      The characterized inhibitory mechanism is more complex than the reviewer’s interpretation, and a number of factors must be considered. Indeed, our data show that Aβ42 upon intracellular concentration inhibits γ-secretase activity, resulting in increased γ-secretase substrate (C-terminal fragment, CTF) levels. It is important, however, to highlight that this inhibition is competitive in nature, implying that it is partial, reversible, and regulated by the relative concentrations of the Aβ42 peptide (inhibitor) and the substrates. The model that we put forward is that cellular uptake and intracellular concentration of Aβ42 facilitates γ-secretase inhibition, which results in the accumulation of APP-CTFs (and γ-secretase substrates in general). However, as Aβ42 levels fall, the increased concentration of substrates shifts the equilibrium towards their processing and Aβ production. As Aβ42 concentration rises again, the equilibrium is shifted back towards inhibition, and so on. This inhibitory mechanism will translate into pulses of (partial) γ-secretase inhibition, which will alter γ-secretase mediated signalling (arising from increased CTF levels or decreased release of soluble intracellular domains from substrates). These alterations may affect the dynamics of systems oscillating in the brain, such as NOTCH signalling, implicated in memory formation (2), and potentially others (related to e.g. cadherins, p75 or neuregulins).

      It is worth noting that oscillations in γ-secretase activity induced by treatment with a γ-secretase inhibitor (semagacestat) have been proposed to have contributed to the cognitive alterations observed in semagacestat treated patients in the failed Phase-3 IDENTITY clinical trial (2, 3); and that semagacestat, like Aβ42, acts as a high affinity competitor of substrates (Koch et al, 2023). We will include this clarification in the discussion of the revised manuscript and create an additional figure presenting the proposed mechanism.
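
      The partial, reversible character of competitive inhibition described in this response can be illustrated with the textbook Michaelis-Menten rate law for a competitive inhibitor, v = Vmax·[S] / (Km·(1 + [I]/Ki) + [S]). The sketch below is generic enzymology, not a fit to the authors' data: all concentrations and constants are in arbitrary units and are hypothetical.

```python
def competitive_rate(substrate, inhibitor, vmax, km, ki):
    """Michaelis-Menten rate in the presence of a competitive inhibitor."""
    if substrate < 0 or inhibitor < 0 or min(vmax, km, ki) <= 0:
        raise ValueError("concentrations must be >= 0 and constants > 0")
    return vmax * substrate / (km * (1.0 + inhibitor / ki) + substrate)

def fractional_activity(substrate, inhibitor, km, ki):
    """Inhibited rate divided by the uninhibited rate (Vmax cancels out)."""
    if substrate <= 0:
        raise ValueError("substrate concentration must be > 0")
    inhibited = competitive_rate(substrate, inhibitor, 1.0, km, ki)
    uninhibited = competitive_rate(substrate, 0.0, 1.0, km, ki)
    return inhibited / uninhibited

# Hypothetical values: inhibitor held at its Ki while substrate (CTF) accumulates.
for substrate in (1.0, 3.0, 9.0):
    print(substrate, fractional_activity(substrate, inhibitor=1.0, km=1.0, ki=1.0))
```

      With the inhibitor concentration fixed, the fractional activity rises as substrate accumulates, which is the qualitative behaviour the response invokes: substrate build-up progressively relieves inhibition until the inhibitor concentration rises again.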

      It is not clear whether the FRET-based assay in living cells really reflect gamma-secretase activity.

      The specificity of this assay is supported by the γ-secretase inhibitor treatment included as a positive control (Figure 3). In addition, previous literature supports that this assay faithfully reports γ-secretase activity in a cellular context (4-7).

      Processing of APP-CTF in living cells is not only the cleavage by gamma-secretase.

      This is correct, and therefore we have analysed the contribution of other APP-CTF degradation pathways by performing a cycloheximide-based stability assay in the presence of γ-secretase inhibitor. Quantitative analysis of the levels of both APP-CTFs and APP-FL over the 5 h time-course failed to reveal significant differences between Aβ42 treated cells and controls. As expected, Bafilomycin A1 treatment markedly prolonged the half-life of both proteins (Figure 7B & C). The lack of a significant impact of Aβ42 on the half-life of APP-CTFs under the conditions of γ-secretase inhibition is consistent with the proposed inhibitory mechanism. Finally, we note that the inhibition will not only affect APP-CTF, but also the processing of γ-secretase substrates in general.

      Reviewer #2 (Public Review):

      Summary:

      In the current study, the authors tested the hypothesis that Aβ42 toxicity arises from its proven affinity for γ-secretases. Specifically, the increases in Aβ42, particularly in the endolysosomal compartment, promote the establishment of a product feedback inhibitory mechanism on γ-secretases, and thereby impair downstream signaling events. They showed that human Aβ42 peptides, but neither murine Aβ42 nor human Aβ17-42 (p3), inhibit γ-secretases and trigger accumulation of unprocessed substrates in neurons, including CTFs of APP, p75 and pan-cadherin. Moreover, Aβ42 dysregulated cellular homeostasis by inducing p75-dependent neuronal death. γ-Secretases process many other membrane proteins, including NOTCH, ERB-B2 receptor tyrosine kinase 4 (ERBB4), N-cadherin (NCAD) and p75 neurotrophin receptor (p75-NTR), and thereby affect a broad range of downstream signaling pathways, including those critical for neuronal structure and function. Hence, the authors propose a selective role for the Aβ42 peptide, and raise the intriguing possibility that compromised γ-secretase activity against the CTFs of APP and/or other neuronal substrates contributes to the pathogenesis of AD. Overall, the data are not very convincing in support of the main claim.

      Strengths.

      Different in vitro and cellular approaches are employed to test the hypothesis.

      Weaknesses.

      The experimental concentrations for Aβ42 peptide in the assay are too high, which are far beyond the physiological concentrations or pathological levels. The artificial observations are not supported by any in vivo experimental evidence.

      It is correct that in the majority of the experiments we used low μM concentrations of Aβ42. However, we would like to note that we also performed experiments where conditioned medium collected from human APP.Swe expressing neurons was used as a source of Aβ. In these experiments the total Aβ concentration was in the low nM range (0.5-1 nM) (Figure 4G). Treatment with this conditioned medium led to an increase in APP-CTF levels, supporting that low nM concentrations of Aβ are sufficient for partial inhibition of γ-secretase.

      We would like to underline that Aβ is estimated to be present in the brain at concentrations ranging from fM to mM, depending on the pool (soluble, aggregated, fibrillar, etc.) that is considered (8, 9). However, it is the local rather than the global concentration of Aβ that is critical for disease pathogenesis. In this regard, it is proposed that as AD progresses Aβ42 slowly accumulates in the endo-lysosomal system, wherein it reaches the μM concentrations that are required for aggregation and seeding (1, 10, 11). Our findings are consistent with the analysis showing that extracellular soluble Aβ42 peptide, at low nM concentrations, is taken up by cortical neurons and neuroblastoma (SH-SY5Y) cells, and concentrated in the endo-lysosomal system, wherein effective peptide concentrations reach ~2.5 μM (1). Hence, slow vesicular peptide accumulation and/or a degradation imbalance (1, 11, 12) could lead to increases of several orders of magnitude in the effective concentration of Aβ42 over the span of years to decades in AD pathogenesis. We note that our experimental settings, using low μM concentrations of extracellular Aβ42 over a 24 h treatment, were designed to accelerate this ‘peptide concentration’ process in vitro. As discussed in our report, a high μM Aβ peptide concentration in the endo-lysosomal system not only leads to aggregation but also facilitates γ-secretase inhibition. Of note, we are currently developing protocols and will undertake follow-up studies to quantitatively define the Aβ concentration in synaptosomes and endosomes in AD brain, as well as in in vitro systems (i.e. cells treated with Aβ preparations obtained from AD brains).

      Finally, we would like to highlight that analyses of the brains of individuals affected by AD have shown that APP-CTFs accumulate in both sporadic and genetic forms of the disease (13-15); and recently, Ferrer-Raventós et al. have revealed a correlation between APP-CTFs and Aβ levels at the synapse (13).

      To conclude, we would like to highlight that as clarified above, the Aβ peptide concentrations and the conditions tested fit well within pathophysiology, and that the data presented in our report collectively provide evidence in support of an Aβ42-mediated inhibitory effect on γ-secretase.

      References:

      1. X. Hu et al., Amyloid seeds formed by cellular uptake, concentration, and aggregation of the amyloid-beta peptide. Proc Natl Acad Sci U S A 106, 20324-20329 (2009).
      2. B. De Strooper, Lessons from a failed γ-secretase Alzheimer trial. Cell 159, 721-726 (2014).
      3. R. S. Doody et al., A phase 3 trial of semagacestat for treatment of Alzheimer's disease. N Engl J Med 369, 341-350 (2013).
      4. M. C. Houser et al., A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel) 20, (2020).
      5. M. C. Q. Houser et al., Limited Substrate Specificity of PS/γ-Secretase Is Supported by Novel Multiplexed FRET Analysis in Live Cells. Biosensors (Basel) 11, (2021).
      6. M. Maesako et al., Visualization of PS/γ-Secretase Activity in Living Cells. iScience 23, 101139 (2020).
      7. M. Maesako, M. C. Q. Houser, Y. Turchyna, M. S. Wolfe, O. Berezovska, Presenilin/γ-Secretase Activity Is Located in Acidic Compartments of Live Neurons. J Neurosci 42, 145-154 (2022).
      8. B. R. Roberts et al., Biochemically-defined pools of amyloid-β in sporadic Alzheimer's disease: correlation with amyloid PET. Brain 140, 1486-1498 (2017).
      9. J. A. Raskatov, What Is the "Relevant" Amyloid β42 Concentration? Chembiochem 20, 1725-1726 (2019).
      10. M. P. Schützmann et al., Endo-lysosomal Aβ concentration and pH trigger formation of Aβ oligomers that potently induce Tau missorting. Nat Commun 12, 4634 (2021).
      11. E. Wesén, G. D. M. Jeffries, M. Matson Dzebo, E. K. Esbjörner, Endocytic uptake of monomeric amyloid-β peptides is clathrin- and dynamin-independent and results in selective accumulation of Aβ(1-42) compared to Aβ(1-40). Sci Rep 7, 2021 (2017).
      12. M. F. Knauer, B. Soreghan, D. Burdick, J. Kosmoski, C. G. Glabe, Intracellular accumulation and resistance to degradation of the Alzheimer amyloid A4/beta protein. Proc Natl Acad Sci U S A 89, 7437-7441 (1992).
      13. P. Ferrer-Raventós et al., Amyloid precursor protein Neuropathol Appl Neurobiol 49, e12879 (2023).
      14. M. Pera et al., Distinct patterns of APP processing in the CNS in autosomal-dominant and sporadic Alzheimer disease. Acta Neuropathol 125, 201-213 (2013).
      15. L. Vaillant-Beuchot et al., Accumulation of amyloid precursor protein C-terminal fragments triggers mitochondrial structure, function, and mitophagy defects in Alzheimer's disease models and human brains. Acta Neuropathol 141, 39-65 (2021).
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      As written in my public review I consider the science of this work to be high quality. I have some suggestions for the write-up though. As a general comment, I think that too much has been put into the appendices. In particular, the main text could contain more details about the model.

      We are pleased that this Reviewer considers our work to be of “high quality”. We value the reviewer’s insightful suggestions and comments. Following this Reviewer’s suggestion, we have moved certain sections to the main text.

      In what follows, we provide responses to each of the reviewer’s inquiries, and indicate the appropriate changes in the revised version.

      P2 -

      ϕ is introduced as packing fraction - on p3 it’s called cell density. Also it is not clear whether it is an area fraction or a cell number density. Please define properly and I would suggest sticking to one notion.

      ϕ is the cell packing fraction. In two dimensions (as is the case in our simulations) it is the area fraction. However, in order to stick to one general notation (independent of dimension) we use “packing fraction” to represent how densely the cells are packed. We changed it in the revised manuscript to ensure uniformity.

      P3 -

      “which should and should slow down the overall dynamics” Typo?

      Corrected it in the revised manuscript.

      “One would intuitively expect that the ϕfree should decrease with increasing cell density” Please, define ϕfree

      ϕfree is defined in Eqn. 4. We ought to have defined it in the introduction.

      “When ϕ exceeds ϕS, the free area ϕfree saturates because the soft cells interpenetrate each other,” I suggest clearly distinguishing between biological cells and the agents (disks) used in the simulation. Please also clarify what interpenetration of agents corresponds to in tissues.

      We have rewritten the sentence as, “The simulations show that when...” The soft disks used in the simulations are a reasonably realistic model for biological cells. The small deformations noted in our model are not that different from those of cells in tissues. For visual reference, please see Author response image 1. In the left panel of the figure, a 2D snapshot of the experimental zebrafish tissue displays the deformation of the cells labeled 1 and 2. Likewise, the right panel illustrates the extent to which such deformations are replicated in the simulation by allowing two cells to overlap (the white area in the right panel of Author response image 1 represents the interpenetration). In the revised manuscript, we have made the necessary change from “soft cells” to “soft disks.”

      Author response image 1.

      Snapshots of zebrafish tissue (left panel) (Ref. [14] main text) and model two dimensional tissue (right). In the right panel the white area represents the overlap and the black vertical line represents the intersection.

      “The facilitation mechanism, invoked in glassy systems [22] allows large cells to move with low mobility.” What is the facilitation mechanism?

      Facilitation is an intuitive idea that refers to a mechanism by which cells in a highly jammed environment can only move if the neighboring cells get out of the way. In our case (as shown in the text, Fig. 3 (A) and Fig. 13 (A) & (B)), the smaller cells move faster, almost independent of ϕ. When a small cell moves, it creates a void, which could facilitate the movement of neighboring cells (including big ones).

      “η (or relaxation time)” I suggest explaining the link between η and the relaxation time.

      First, in making this point on aging we only showed that the relaxation time is independent of the waiting time. In the revised manuscript we deleted η.

      Although not germane to this study, in the literature on the glass transition it is not uncommon to use the relaxation time τα (as a proxy for the viscosity η) to describe the dynamics. The relation between τα and η is given by

      $\eta = G_\infty \tau_\alpha$,

      where G∞ is the “infinite frequency” shear modulus; this relation holds in unjammed systems or in liquids. It suggests that τα is proportional to η, which is almost never satisfied in glass forming systems.

      P5 - “In addition, the elastic forces characterizing cell-cell interactions are soft, which implies that the cells can penetrate with rij − (Ri + Rj) < 0 when they are jammed.” Is this about the model or the biological tissue? Presumably the former, because real cells do not penetrate each other, right? What are rij, Ri and Rj?

      This is about the model. The cells are sufficiently soft that they can be deformed, which allows for modest interpenetration. Real cells exhibit similar behavior (see Fig. 1). In the inset of Fig. 4 (b), rij is the center-to-center distance between cells with radii Ri and Rj. It is better to use the word overlap instead of penetrate, which is what we have done in the revised version.

      “we simulated a highly polydisperse system (PDs) in which the cell sizes vary by a factor of ∼ 8” Is it important to have a factor 8 - the zebra fish tissue presents a factor 5 − 6?

      This is an important question, which is difficult to answer using analytic theory. Unfortunately, it does require simulations. We do not know a priori the polydispersity value needed to observe saturation in η at high values of ϕ. However, we have shown that a system with one type of cell (monodisperse) crystallizes. Furthermore, mixtures of two cell types do not show any saturation in η over the parameter range that we explored. A systematic simulation study is needed to explore a range of parameter values to determine the minimum PD that would match the experimental findings.

      We performed 3D simulations to determine whether a much lower PD would yield saturation in η. Preliminary simulations in three dimensions with a lower value of PD (11.5%, with size variation by a factor of ≈ 2) exhibit saturation in the relaxation time. For comparison, the value of PD in the current work is ≈ 24%, with size variation by a factor of 8.

      P6 -

      “which is related to the Doolittle equation [26] for fluidity (1/η)” What is the Doolittle equation? Is it important here? Also: “VFT equation for cells”? Is it the same as given on p.2 - so nothing special for cells - or a different one?

      Historically, the Doolittle equation was proposed over 60 years ago to describe the change in η in terms of free volume in the context of polymer systems. The physics of polymers is very different from that of the soft models for cells considered here. Nevertheless, the equation has meaning in this context as well. The Doolittle equation (other names associated with similar equations are Ferry, Flory...) is given by

      $\eta = A \exp\left[ B \, V_{hc} / (V - V_{hc}) \right]$,

      where A and B are constants, V is the total volume and Vhc is the hardcore volume. Essentially, $(V - V_{hc})/V_{hc}$ is the relative free volume. It can be shown that one can arrive at the VFT equation starting from the Doolittle equation.

      The VFT equation for cells is the same as that given on page 2, which we restate for completeness. Here, we introduce the apparent activation energy.
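
      The step from a Doolittle-type equation to a VFT-type equation can be sketched as follows. This is only an illustration under stated assumptions: the Doolittle form with constants A and B is the standard textbook one, the identification of the hardcore volume fraction with ϕ/ϕ0 is an assumption made here (ϕ0 being the packing fraction at which the free volume vanishes), and the VFT form with parameters η0 and D is likewise assumed rather than quoted from the manuscript.

```latex
% Doolittle-type equation (assumed standard form)
\eta = A \exp\!\left[ \frac{B\, V_{hc}}{V - V_{hc}} \right]

% Assumption: the hardcore volume fraction tracks the packing fraction,
% V_{hc}/V = \phi/\phi_0, so that the free volume vanishes at \phi = \phi_0.
\frac{V_{hc}}{V - V_{hc}}
  = \frac{1}{V/V_{hc} - 1}
  = \frac{1}{\phi_0/\phi - 1}

% Substituting gives a VFT-type dependence on packing fraction,
% with \eta_0 = A and D = B:
\eta = \eta_0 \exp\!\left[ \frac{D}{\phi_0/\phi - 1} \right]
```

      Under these assumptions the viscosity diverges as ϕ approaches ϕ0, which is the behavior the VFT fit describes for ϕ ≤ ϕS.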

      “The stress-stress tensor” Why not simply stress tensor?

      We have corrected it.

      “shows qualitatively the same behavior as the estimate of viscosity (using dimensional arguments) made in experiments.” Where is this shown?

      The dependence of viscosity as a function of ϕ is shown in Figure 1 (c).

      P7 -

      Fig 2A caption “dashed line” Maybe full line?

      This should be full line. It is fixed in the revised manuscript.

      P8 -

      “a puzzling finding that is also reflected” Why is it puzzling?

      Figure 2 (C) shows that the increase in the duration of the plateau of Fs(q,t) ceases when ϕ exceeds ≈ 0.90. This is puzzling to us (always a matter of perspective) because we expected the duration of the Fs(q,t) plateau to increase as a function of ϕ, based on the VFT behavior for ϕ ≤ ϕS. As a result, we imagined that the relaxation time τα would continue to increase beyond ϕS. However, the simulations show that the relaxation time is essentially constant for ϕ > 0.90, which implies that the soft disk system (our model for the tissue) is unusual, with behavior that has no counterpart in the material world.

      “If the VFT relation continues” –“If the VFT relation continued”

      We have fixed it.

      First paragraph does not seem to be coherent

      What is RS (or Rs)?

      RS is the radius of the small cell. In the revised manuscript we have made this clear.

      P10 -

      Please, define the waiting time.

      The waiting time refers to the period between sample preparation and data collection, either in experiments or in simulations. In an ergodic system, the properties should not depend on the waiting time, provided it is large. In other words, after the system reaches thermal equilibrium, the waiting time tω should not have an impact on the properties of the system.

      “fully jammed” Please, define.

      The term “fully jammed” refers to a state in which the constituent particles in a system do not move. For example, a hard sphere system at a packing fraction of approximately 0.84 is fully jammed, which implies there is no wiggle room for a particle to move without violating the excluded volume restriction. At this specific packing fraction, the hard sphere system undergoes a jamming transition, resulting in the particles becoming completely immobile. The nonconfluent tissue modeled here is not fully jammed.

      P11 -

      Fig.4 it is hard to see that the width of P(hij) increases with ϕ.

      Please see Author response image 2, which shows fewer curves for better visualization. We have replaced this figure in the revised version.

      Author response image 2.

      Probability of overlap (hij) between two cells, P(hij), for various ϕ values.

      “Thus, even if the cells are highly jammed at ϕ ≈ ϕS, free area is available because of an increase in the overlap between cells.” This conclusion seems premature at this point.

      The Referee is correct. This is shown in Fig. 5. We amended the ends of the sentence to reflect this observation.

      P12 -

      “as is the case when the extent of compression increases” extent of compression = density?

      This is correct. Extent of compression corresponds to the packing fraction or the density.

      “This effect is expected to occur with high probability at ϕS and beyond,” Why? What is special about ϕS.

      To achieve high packing fractions beyond a certain value of ϕ, soft cells have to overlap, which would occur at a certain value ϕS. In the system studied here, ϕS ≈ 0.90. Note that ϕS could be altered by changing the system parameters.

      P15 -

      “local equilibrium” In a thermodynamic sense? There is also cell migration, so thermodynamic equilibrium does not seem to be appropriate.

      This is an important point. The observation that equilibrium concepts hold in what is manifestly a non-equilibrium system is a surprise. We do mean “local equilibrium” in a thermodynamic sense. We agree with the reviewer that, because of cell division (in Ref. [14] main text) and cell death, thermodynamic equilibrium does not seem to be appropriate. This is exactly the point we raise in the introduction. However, considering the timescales of cell division and death, it appears that there may be a local steady state, which we call a “local equilibrium”. As a consequence, phase transition ideas and Green-Kubo relations are applicable. Indeed, a surprise in the conclusion of Ref. [14] is that an equilibrium description seems adequate for zebrafish morphogenesis.

      “number of near neighbor cells that is in contact with the ith cell. The jth cell is the nearest neighbor of the ith cell, if hij > 0” A neighbour cell or the nearest neighbor?

      A neighbour cell is accurate.
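
      The neighbor criterion can be illustrated with a minimal sketch. It is not the simulation code used in the study: it assumes hij = Ri + Rj − rij (consistent with the overlap condition rij − (Ri + Rj) < 0 quoted earlier), ignores periodic boundary conditions, and uses made-up positions and radii. The function names `overlap` and `neighbor_counts` are hypothetical.

```python
import math


def overlap(center_i, center_j, radius_i, radius_j):
    """Return h_ij = R_i + R_j - r_ij; positive when the two disks overlap."""
    r_ij = math.dist(center_i, center_j)  # center-to-center distance
    return radius_i + radius_j - r_ij


def neighbor_counts(centers, radii):
    """For each disk i, count the disks j (j != i) with h_ij > 0."""
    n = len(centers)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if overlap(centers[i], centers[j], radii[i], radii[j]) > 0:
                counts[i] += 1
                counts[j] += 1
    return counts


# Hypothetical configuration: disks 0 and 1 overlap, disk 2 is isolated.
centers = [(0.0, 0.0), (1.5, 0.0), (5.0, 0.0)]
radii = [1.0, 1.0, 0.5]
counts = neighbor_counts(centers, radii)
print(counts)
```

      In this sketch every disk with hij > 0 counts as a neighbor of disk i, so a cell can have several neighbors rather than a single nearest one.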

      P16 -

      “In our model there is no dynamics with only systematic forces because the temperature is zero.” What is a systematic force? I do not understand the sentence.

      Systematic force between two cells is defined in Eqn. 5 in the main text. Because temperature is not a relevant variable in our model, we want to emphasize that in the absence of self propulsion, the cells would not move at all.

      Reviewer #2

      Major comments:

      A/ Role of size polydispersity

      In the text, and also in the methods (Appendix A), the authors mention that they need large polydispersity of particle sizes to explain the viscous plateau, as the dynamics of small vs large cells are ”dramatically different” (Appendix G). They simulate a system where cell sizes vary by a factor 8, mentioning this is typical in tissues, but I found this quite surprising - this would be heterogeneities in cell volume of 500, many orders of magnitude above what has been measured in tissues. As far as I’m aware, divisions are quite symmetric and synchronous in early vertebrate embryogenesis, so volume variations are expected to be very small (similarly in epithelial tissues, where jamming has been looked at extensively, I’m not aware of examples with ratio of 8 between cell diameters). One question I had is that when the authors look at ”small polydispersity”, there are 50 − 50 mixtures. Would small polydispersity with continuous distributions change this picture? Could they take their current simulations but smoothly change the ratio of polydispersity from 8 to 0 to see exactly how much they need to explain viscosity plateauing, and at which point is the transition?

      We thank the reviewer for raising this important question, which was also a concern for Reviewer #1. The value of polydispersity (PD) required to observe such behavior is not known a priori, even within the simple model used. We selected a PD value, with a size variation of a factor of 8, guided in part by the experiment (projection onto 2D) shown in Figure 1(B) and Figure 6(D). We also showed that the monodisperse system crystallizes, and that the binary system does not show signs of saturation within the explored range of parameter space and ϕ. This suggests that a certain degree of size dispersity is necessary to obtain saturation in η.

      As discussed in Appendix B, the binary system is characterized by the variables λ = RB/RS, where RB and RS represent the radii of the big and small cells, respectively, and the packing fraction ϕ. By exploring the parameter space encompassing λ and ϕ more fully than we did, it may be possible, as the Referee suggests, that a system with two different cell sizes would yield the experimentally observed dependence of η on ϕ.

      As part of an answer to Reviewer #1 on the same issue, we mentioned the results of preliminary simulations in three dimensions with reduced levels of polydispersity, and discovered that at lower levels of polydispersity (variation in size by a factor of ≈ 2 and a polydispersity value of 11.50%), the relaxation time does saturate beyond a certain packing fraction (see Author response image 3). We have not established whether η, the key quantity of interest, would exhibit a similar behavior in 3D.

      Author response image 3.

      (A) τα as a function of ϕ for 11% polydispersity with size variation by a factor of ∼ 2 in the three dimensional system. (B) Same as (A) except polydispersity value is 24% and a size variation by a factor of ∼ 8.

      B/ Role of fluctuations/self-propulsion in this system, and relationship to recent findings

      “A priori it is unclear why equilibrium concepts should hold in zebrafish morphogenesis, which one would expect is controlled by non-equilibrium processes such as self-propulsion, growth and cell division. ”

      This is raised as a key paradox, but is not very clear to me in the context raised by the authors. In particular, they use self-propulsion as a source of activity and explain the evolution of viscosity by a facilitation process involving re-arrangements/motility. But I don’t think self-propulsion has been argued to play a role in zebrafish blastoderm - Ref 14 argues that this is effectively a zero-temperature phenomenon and that cell motility/rearrangements do not show any correlation with viscosity. So this part of the model assumption was not clear to me in relationship with the proposed experimental system. Active noise has been proposed to play key roles in other systems, including motility-driven and tension fluctuation-driven unjamming (among many others Bi et al, PRX, 2016, Mitchel et al, Nat Comm, 2020, Pinheiro et al, Nat Phys, 2022 as well as Kim & Campas, Nat Physics, 2021) - maybe this is somewhere where the authors’ model could fit? In Kim & Campas, Nat Phys, 2021 in particular, the authors develop simulations of non-confluent tissues with noise, which seem to bear some resemblance to the model developed here, so it would be important to discuss the similarities and distinctions (usually I think polydispersity is not considered indeed). In general, the authors look here at a particle based model, but cells have adhesions with well-defined contact angles, so there is a question of the cross-over between their findings and the large body of recent literature on active foams/vertex models (which are not really discussed there).

      We appreciate the lengthy comment here, and there is a lot to unpack. We also thank the referee for the references, some of which we did not know about earlier.

      The primary objective of our study is to determine the simplest minimal model that would explain the experimentally observed dependence of viscosity in zebrafish blastoderm tissue as ϕ is increased beyond a certain packing fraction during morphogenesis. In Reference 14, the authors analyzed the data using the framework of rigidity percolation theory and presented evidence of a genuine equilibrium phase transition. Consequently, one would expect zebrafish blastoderm tissue to be in equilibrium, which is surprising from many perspectives. However, since the tissue is a growing system involving numerous cell divisions and cell death, it is not immediately evident whether the assumption of equilibrium is valid. Indeed, the same problem arises when considering the glass transition, where rapid cooling drives the system out of equilibrium. Nevertheless, heat capacity and η are often analyzed using the notion of equilibrium. Hence, considering this issue within the context of our research appears to be reasonable.

      To the best of our knowledge, the authors of Ref. 14 did not provide an explanation for the η behavior. The focus, which was excellent and is the basis on which we initiated this study, was on the use of rigidity percolation theory to explain the results. Indeed, they performed an experiment by mildly reducing myosin II activity, which apparently affects cell motility. The quantitative effect was not reported.

      We did not impose any requirement of cell rearrangements, etc., in the model. There is essentially one variable, the free area available, that explains the dependence of η on ϕ. It is possible that one can come up with other zero-temperature models that could also explain the data. To the best of our knowledge, none has been proposed.

      It would be interesting to set our model in the context of the other models that the referee points out. This would be an interesting research topic to explore. The only comment we would like to make is that it is unclear how a vertex model for confluent tissues could explain the viscosity data.

      C/ Calculation of the effective shear viscosity

      The authors calculate viscosity from a Green-Kubo relation, although it would be good to clarify at which time scale (and maybe even shear amplitude) they expect this to be valid. These kinds of model would be expected to show plastic rearrangements for large deformations for instance, could the authors simulate realistic rheological deformations (e.g. Kim & Campas, 2021 applying external shear on the simulations) to see how much this matches both their expectation and the data?

      Once it is established that there is local equilibrium (as implied by the use of phase transition ideas to analyse the experimental data in Ref. 14), it is natural to use the Green-Kubo relation to calculate transport properties. Hence, for our purposes, it is valid for all time scales and amplitude. The Reviewer also wonders if the model could be used to simulate response to shear in order to probe rheological properties. There is no conceptual issue here and indeed this is an excellent suggestion that we intend to pursue in the future.

      D/ Role of cell adhesion

      The authors consider soft elastic disks of different sizes but unless I missed it, there is no adhesion being considered. This is expected to play a key role in jamming and multicellular mechanics, so I think the authors should either look at what this changes in their simulations, or at least discuss why they are neglecting it. One reason I’m asking is that it’s not totally clear to me that the ”free space” picture, coming from the fact that cells can interpenetrate in their model would hold in a model of deformable cells adhering to each other with constant volume (leading to more equilibration of deformations it would seem?).

      The referee raises another question regarding the lack of adhesion in the simulations. As pointed out before, we were trying to create a minimal model to account for the experimental observations of η upon changing the packing fraction. Thus, we used a coarse-grained model in which we considered polydisperse cells with elastic interactions, which recapitulates the experimental observations. The referee is correct that adhesion plays a role in jammed systems, and how it would affect the results is an aspect that would be interesting to consider in the future. We hasten to add that even systems without attractive adhesion-type interactions become jammed. In principle, in many-body systems, the parameter space is large and one needs to carefully determine which parameter is important for the problem at hand. Therefore, in the first pass we did not find the need to consider the role of adhesion.

      Minor comments:

      The writing could be condensed in some places, with some details being moved to SI (for instance, section E on ageing is very short and seems more suited for supplements, or at least not as an independent section). Note that the figure numbering also jumps to Fig. 9 there, although it’s Fig. 3 just before and Fig. 9 just after; re-ordering into main and supporting figures would be clearer.

      We thank the Reviewer for this recommendation. The ageing section, although short, does provide a line of evidence that equilibrium approaches could be valid. We have modestly expanded the section by moving Appendix D to the main text, a general suggestion made by Referee 1. We have tried to be consistent in the numbering of figures in the revision.

      Reviewer #3

      I am very much in favor of the manuscript in its present form - I only suggest commenting (in the manuscript) on the issue described below.

      Motivated by the fact that the experimental system consists of living, motile cells, the authors use an active particle model (eq. 6) with stochastic self-propulsion as the only source of noise (zero-temperature). It would be useful to elaborate briefly how important this stochastic self-propulsion is for the emergent rheological properties of the system (as summarized above): would these properties also be present in the “passive” version of the same model at “non-vanishing” temperature, and if not, why? Or analogously in a “passive” version which is “shaken”, reminiscent of shaken granular matter? To clarify these issues would relate this study to (or discriminate it from) passive, but complex, liquids or granular matter.

      We appreciate the reviewer’s positive feedback on our work. The reviewer has raised an important question concerning our model, in which self-propulsion serves as the source of noise. Without self-propulsion, the system would come to a stationary state after reaching mechanical equilibrium. As mentioned in Eqn. (6) (in the main text), we can define a characteristic time τ. It is possible that scaling the time t by τ would not alter the results.

      The second question raised by the reviewer is also important. A passive version of the model would be to consider Eq. 6 in our article and, instead of using activity, use the standard stochastic force. The resulting stochastic force would correspond to a finite temperature. The coefficient of the noise (a diffusion term) would be related to γi through the fluctuation-dissipation theorem (FDT). Such a system of equations cannot be mapped to Eq. 6, in which µ and γi are independently varied. It is unlikely that such a model, incorporating a “non-vanishing” temperature, would result in the observed dependence of η on ϕ, for the following reason. The passive model represents a polydisperse system, which would form a glass with η increasing with volume fraction, following the VFT law, as has been demonstrated in the glass transition literature for harmonic glasses. The other proposal, whether the shaken version would explain the experiments, is also interesting. These are worth pursuing in future studies.
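
      The distinction drawn here between the active and passive descriptions can be written schematically. The active equation below is an assumed overdamped form, inferred only from the statement that µ and γi are varied independently; it is not quoted from Eq. 6 of the manuscript. The passive equation is the standard overdamped Langevin equation, in which the noise amplitude is fixed by temperature and friction through the FDT.

```latex
% Active model (assumed schematic form): the noise strength \mu is a free
% parameter, independent of the friction coefficient \gamma_i.
\dot{\mathbf{r}}_i = \frac{\mathbf{F}_i}{\gamma_i} + \mu\, \mathbf{W}_i(t),
\qquad
\langle W_i^{a}(t)\, W_j^{b}(t') \rangle = \delta_{ij}\,\delta_{ab}\,\delta(t - t')

% Passive model at temperature T: the noise amplitude is tied to \gamma_i
% by the fluctuation-dissipation theorem.
\dot{\mathbf{r}}_i = \frac{\mathbf{F}_i}{\gamma_i}
  + \sqrt{\frac{2 k_B T}{\gamma_i}}\; \boldsymbol{\xi}_i(t),
\qquad
\langle \xi_i^{a}(t)\, \xi_j^{b}(t') \rangle = \delta_{ij}\,\delta_{ab}\,\delta(t - t')
```

      In the passive case the noise amplitude changes whenever γi changes at fixed T, which is why the two sets of equations cannot be mapped onto each other when µ and γi are set independently.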

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you very much for the kind comments about our manuscript. We have improved the text to address all reviewers’ comments and suggestions. Additionally, we corrected and improved the supplementary tables.

      Reviewer #1 (Public Review):

      This paper provides new evidence on the relationship between genetic/chromosome divergence and capacity for asexual reproduction (via unreduced, clonal gametes) in hybrid males or females. Whereas previous studies have focussed just on the hybrid combinations that have yielded asexual lineages in nature, the authors take an experimental approach, analysing meiotic processes in F1 hybrids for combinations of species spanning different levels of divergence, whether or not they form asexual lineages in nature. As such, the findings here are a substantial advance towards understanding how new asexual lineages form.

      The quality of the work is high, the analyses are sound, and the authors sensibly link their observations to the speciation continuum. I should also add that the cytogenetic work here is just beautiful!

      A key finding is that the precondition for asexual reproduction - the formation of unreduced gametes - is not unusual among hybrid females, so that we have to consider other factors to explain the rarity of asexual species - a major unresolved issue in evolutionary biology. This work also highlights a previously overlooked effect of chromosome organisation on speciation.

      Thank you for the nice comments about our work as well as for appreciating our cytogenetics work and figures.

      Reviewer #2 (Public Review):

      The authors investigate the origin of asexual reproduction through hybridization between species. In loaches, diploid, polyploid, and asexual forms have been described in natural populations. The authors experimentally cross multiple species of loaches and conduct an impressively detailed characterization of gametogenesis using molecular cytogenetics to show that although meiosis arrests early in male hybrids, a subset of cells in females undergo endoreplication before meiosis, producing diploid eggs. This only occurred in hybrids of parental species that were of intermediate divergence. This work supports an expanding view of speciation where asexuality could emerge during a narrow evolutionary window where genomic divergence between species is not too high to cause hybrid inviability, but high enough to disrupt normal meiotic processes.

      Thank you.

      I enjoyed reading this study and I appreciate the amount of work it takes to conduct these types of cytogenetic experiments. But, my main concern with this study is I was left wondering if the sample sizes are large enough to get a sense how variable endoreplication is in these loach species. Most of the hybrids between species are the result of crosses between 1-2 families. Within males and females, meiocyte observations are limited to a handful of pachytene and diplotene stages. I think it would be helpful to be more transparent about the sample sizes in the main text.

      Thank you for raising this point. We have improved Supplementary Tables S2 and S3 to clarify how many individuals we analyzed from each genetic family and added this information to the main text. In total we obtained 12 combinations with 19 F1 hybrid families. For the C. elongatoides x C. taenia combination we obtained three families; for C. elongatoides x C. ohridana, C. elongatoides x C. tanaitica, C. elongatoides x C. bilineata and C. ohridana x C. bilineata we obtained two families each; for the rest of the combinations we obtained a single family. From these families, 79 individuals were used for the analysis of the meiocytes. Additionally, 24 parental individuals, males and females, were analysed. For the parental species we analysed 852 cells; for hybrid males we investigated 244 cells, and for hybrid females 665 cells.

      Along these lines, the authors argue against the possibility that endoreplication may be predisposed to occur at a higher rate in some species (line 291). Instead, they suggest that endoreplication is a result of perturbing the cell cycle by combining the genomes of two different species. Their main argument is based on gonocyte counts from parental females in a previous reference. It is essential to include counts from the parents used in this study to make a clear comparison with the F1s.

      Thank you, we agree with your comment and have included the observations of meiocytes from several parental species, i.e. C. elongatoides, C. taenia, C. pontica, C. tanaitica, and C. ohridana. Among 852 cells analyzed, we did not observe any cells with duplicated genomes or abnormalities in chromosomal pairing. By contrast, among 665 pachytene cells of F1 hybrid females, ~1% were endoreplicated. We tested these data with a binomial GLM and found the difference to be significant, suggesting that sexuals, even if they may have some unnoticed duplication events, clearly have a significantly lower incidence of abnormal pachytene cells. We have now included this information in the main text.
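
      As an illustration only, the kind of comparison described here can be sketched with a one-sided Fisher's exact test. This is a simpler stand-in for the binomial GLM reported above, not the analysis itself. The function name is an assumption, and the number of endoreplicated hybrid cells used in the usage lines is a hypothetical placeholder, because only the approximate figure of ~1% of 665 cells is given.

```python
from math import comb


def fisher_one_sided(events_a, total_a, events_b, total_b):
    """One-sided Fisher's exact test for two groups of cells.

    Returns the probability, under the null hypothesis of equal incidence,
    that group B contains at least `events_b` of all observed events,
    given the fixed group sizes and the fixed total number of events.
    """
    total_events = events_a + events_b
    n = total_a + total_b
    denom = comb(n, total_events)
    upper = min(total_events, total_b)
    p = 0.0
    for k in range(events_b, upper + 1):
        # Hypergeometric probability of k events in B and the rest in A.
        p += comb(total_b, k) * comb(total_a, total_events - k) / denom
    return p


# Group sizes are taken from the text; the event count is HYPOTHETICAL.
parental_cells = 852           # no endoreplicated cells observed
hybrid_female_cells = 665      # pachytene cells of F1 hybrid females
hypothetical_hybrid_events = 7 # placeholder for "~1%"; not a reported count

p_value = fisher_one_sided(0, parental_cells,
                           hypothetical_hybrid_events, hybrid_female_cells)
print(f"one-sided p = {p_value:.4f}")
```

      With zero events in one group a logistic-type model faces complete separation, which is one reason an exact test is a convenient way to sketch the comparison; the conclusion depends on the true count, which should be taken from the supplementary tables.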

      In the discussion (lines 320-333), the authors postulate the sex-specific clonality they observe could be a result of Haldane's rule. Given these fish do not have known sex chromosomes, I do not find this argument strong. Haldane's rule refers to the exposure of recessive incompatibilities with the sex chromosomes in the hybrid heterogametic sex. This effect would therefore be limited to degenerated sex chromosomes where much of the sequence content on the Y or W has been lost. These species may have homomorphic sex chromosomes, but if this is the case, they likely are not very degenerated. Instead, it seems more plausible that the sex-specific effect the authors observe is due to intrinsic differences of spermatogenesis and oogenesis. Is there any information about sex-specific differences in the fidelity of gametogenesis from other species that would support a higher likelihood of endoreplication?

      Thank you for this important question; however, we think there was a misunderstanding. We do not postulate that our observation conforms to Haldane’s rule. In contrast to this rule, which is based on sex chromosomes, our previous publication demonstrated that whatever the gonadal sex differentiation is in our taxa, the ability to overcome sterility by asexual gametogenesis is always confined to the female gonadal environment (or oogenesis in general), even in transplanted spermatogonial cells (Tichopad et al. 2022). What we meant by our text is that our results do not fully conform to Haldane’s rule. We have therefore reworded our text to rule out such a misconception.

      Nonetheless, we note that Haldane’s rule has been demonstrated to also apply to species with little-differentiated sex chromosomes (e.g. Presgraves and Orr 1998) and that recessive incompatibilities are not the only explanation, as the faster-male or faster-X theories may also apply in such cases (Dufresnes et al. 2016). Therefore, we have kept our remarks about Haldane’s rule here. Moreover, for several parental species, our preliminary results indicate an XY gonadal sex differentiation system, although these findings are unpublished and need further validation.

      The final thing I was left wondering about was this missing link between endoreplication and activating the embryonic development of the diploid egg. In these loach species, a sperm is required to activate egg development, but the sperm genome is discarded (line 100). What is the mechanism of this and how does it evolve concurrently during hybridization?

      Thank you for the comment. There have been many speculations about why gynogens actually need sperm to activate their egg development, but to our knowledge, no explanation has been validated to date. Interestingly, a recent theoretical model by Fyon et al. (bioRxiv, 2023) suggested that the ability to exclude sperm may evolve separately from the ability to produce clonal eggs. Hence, this topic is complex and remains unresolved, and we feel that it is beyond the scope of the present manuscript. We have slightly modified the text and added two references to address your suggestion.

      Reviewer #1 (Recommendations For The Authors):

      The paper is well prepared - though the resolution of Fig 1 on the pdf is rather poor.

      Thank you! We have now provided the high-resolution figures.

      Overall, I have few suggestions for improvements:

      Line 58. How does endoduplication itself "overcome accumulated incompatibilities" other than failure of synapsis? Perhaps by maintaining the F1 state, and so avoiding reduced fitness arising from recombination and disruption of coadapted gene combinations.

      We have added a sentence to the main text “Premeiotic genome endoreplication thus not only ensures clonal reproduction but also allows hybrids to overcome problems in chromosome pairing that would otherwise lead to their sterility 15,17.” that we hope sufficiently addresses this issue.

      Line 118 - please explain the AKD index here - as you have some in SI. Also please be clearer on how you measure genetic divergence as proportion of heterozygous SNPs - presumably this is via exon sequences from F1 females?

      Please note that we have already explained the AKD index in the relevant part of the Methods section. However, we have now also added a brief explanation to the Results section, as suggested. We apologize for the imprecise description of the genetic divergence measurements. As described in the Methods section, divergence is not measured by heterozygosity (as we wrongly stated here), but as the p-distance among sequences of coding regions between parental species.
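
      For readers unfamiliar with the term, the p-distance is the proportion of compared sites at which two aligned sequences differ. The sketch below illustrates that definition only; the function name and the choice to skip gaps and ambiguous bases are assumptions, and the pipeline described in the Methods may handle such sites differently.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy aligned sequences (illustrative only, not data from the study).
print(p_distance("ACGTACGT", "ACGTACGA"))
```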

      Lines 126 ff. It is unfortunate that the design of the crosses was not more balanced or extensive. Nonetheless, I do appreciate the effort involved here and think the results are solid as is.

      Thank you.

      Line 142. Please define PS and TB (and other acronyms) at first use.

      We have added the definition for all acronyms at the first use.

      Lines 192-193. What about EP and EN - as shown to have unreduced gametes in Fig. 2?

      Thank you for this question. Based on analyses of the diplotene stage, we showed that EP and EN hybrids produced diploid eggs. However, in pachytene, we did not find duplicated oocytes because endoreplication is rare. A similarly low incidence of duplicated pachytene cells has been observed in natural as well as F1 hybrids in loaches and reptiles (Newton et al., 2016, Dedukh et al., 2021, 2022).

      Lines 217-219. The observed correlation of chromosome divergence (AKD index) and numbers of bivalents in pachytene makes sense and is an important observation. Did this GLM simultaneously consider the effect of genetic divergence (as implied in methods)?

      Thank you for this comment. We originally tested the fit of two models separately, one with AKD and the other with SNP divergence. Since the AKD model significantly outperformed the SNP-based one, we focused our interpretation on the former. However, as you suggested, we have now re-calculated the model taking into account the joint effects of both predictors in a single model, and indeed this model outperformed both single-predictor models. In conclusion, while AKD is still the strongest single predictor of the observed number of bivalents, the additional effect of genetic distance still significantly improves the model fit. We have now included this result in the main text.

      This finding does not alter our conclusions, it just suggests that the effect of chromosomal morphology is probably more complex, involving the role of more subtle sequence divergence or structural variants.

      Line 242. The Discussion is a great read - careful interpretation and a really interesting interpretation in context of the broader literature.

      Thank you for the appreciation. Your positive feedback and evaluation strongly motivate us to expand our work.

      Line 396. Some references from book chapters (18, 52) are incomplete. Please fix.

      We have now corrected these references accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Transparency about meiocyte sample sizes: These counts are all in supplemental table 3. From this table, it is unclear if a majority of these meiocytes are from a single individual or from multiple males or females. Or, in the crosses where there are multiple families, are the meiocytes sampled from all families? I am trying to get a sense whether endoreplication and the fidelity of oogenesis could be influenced by genetic variants segregating within species. If the meiocytes are only sampled from a single individual from a single cross, you may not see this variation. If this is the case, perhaps the correlation between genetic divergence and the formation of asexual clones may not be as strong. Additional replicates may not be feasible, but at a minimum I think it would be helpful to address whether endoreplication could or could not be variable and if the sample sizes are sufficient.

      Thank you for raising this point. We have improved the Supplementary table to clarify how many individuals we analyzed from each family and added this information to the main text. Unfortunately, additional replicates are not feasible due to the long generation time of the fish. We otherwise agree with your comment and included this point in the Discussion.

      Gonocyte counts from parental females: The authors say they "analysed hundreds of gonocytes of sexual females without a single incidence of genome endoreplication." I could not find a clear count in the references given. They note that the incidence of endoreplication was very low in pachytene cells in this study (0.7%).

      Thank you, we agree with your comment and have included the observations of meiocytes from several parental species, i.e. C. elongatoides, C. taenia, C. pontica, C. tanaitica, and C. ohridana. Among 852 cells analyzed, we did not observe any cells with duplicated genomes or abnormalities in chromosomal pairing. By contrast, among 665 pachytene cells of F1 hybrid females, ~1% were endoreplicated. We tested these data with a binomial GLM and found the difference to be significant, suggesting that sexuals, even if they may have some unnoticed duplication events, clearly have a significantly lower incidence of abnormal pachytene cells. We have now included this information in the main text.

      They refer to supplemental table 4 (line 196), which does not exist in the supplement. The authors should report these numbers in the revised manuscript.

      Thank you for pointing this out. We have corrected the name of the supplementary table; it is actually Supplementary Table S3.

    1. Author Response

      eLife assessment

      The authors' finding that PARG hydrolase removal of polyADP-ribose (PAR) protein adducts generated in response to the presence of unligated Okazaki fragments is important for S-phase progression is potentially valuable, but the evidence is incomplete, and identification of relevant PARylated PARG substrates in S-phase is needed to understand the role of PARylation and dePARylation in S-phase progression. Their observation that human ovarian cancer cells with low levels of PARG are more sensitive to a PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation, suggests that low PARG protein levels could serve as a criterion to select ovarian cancer patients for treatment with a PARG inhibitor drug.

      Thank you for the assessment and summary. Please see below for details as we have now addressed the deficiencies pointed out by the reviewers.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      Reviewer #1 (Public Review):

      I have a major conceptual problem with this manuscript: How can the full deletion of a gene (PARG) sensitize a cell to further inhibition by its chemical inhibitor (PARGi) since the target protein is fully absent?

      Please see below for details about this point. Briefly, we found that PARG is an essential gene (Fig. 7). There was residual PARG activity in our PARG KO cells, although the loss of full-length PARG was confirmed by Western blotting and DNA sequencing (Fig. S9). The residual PARG activity in these cells can be further inhibited by the PARG inhibitor, which eventually leads to cell death.

      The authors state in the discussion section: "The residual PARG dePARylation activity observed in PARG KO cells likely supports cell growth, which can be further inhibited by PARGi". What does this statement mean? Is the authors' conclusion that their PARG KOs are not true KOs but partial hypomorphic knockdowns? Were the authors working with KO clones or CRISPR deletion in populations of cells?

      The reviewer is correct that our PARG KOs are not true KOs. We were working with CRISPR edited KO clones. As shown in this manuscript, we validated our KO clones by Western blotting, DNA sequencing and MMS-induced PARylation. Despite these efforts and our inability to detect full-length PARG in our KO clones, we suspect that our PARG KO cells may still express one or more active fragments of PARG due to alternative splicing and/or alternative ATG usage.

      As shown in Fig. 7, we believe that PARG is essential for proliferation. Our initial KO cell lines are not complete PARG KO cells and residual PARG activity in these cells could support cell proliferation. Unfortunately, due to lack of appropriate reagents we could not draw solid conclusions regarding the isoforms or the truncated PARG expressed in these cells (Please see Western blots below).

      Are there splice variants of PARG that were not knocked down? Are there PARP paralogues that can complement the biochemical activity of PARG in the PARG KOs? The authors do not discuss these critical issues nor engage with this problem.

      There are five reviewed or potential PARG isoforms identified in the Uniprot database. The sgRNAs used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. However, it is likely that sgRNA-mediated genome editing may lead to the creation of new alternatively spliced PARG mRNAs or the use of alternative ATG, which can produce catalytically active forms of PARG. Instead of searching for these putative spliced PARG RNAs, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown in Author response image 1. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoform was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and showed that residual PARG activity was still detectable in these cells. These data clearly indicate that residual PARG activity is present in our KO cells, but the precise nature of these truncated forms of PARG remains elusive.

      Author response image 1.

      These issues have to be dealt with upfront in the manuscript for the reader to make sense of their work.

      We thank this reviewer for his/her constructive comments and suggestions. We will include the data above and additional discussion upfront in our revised manuscript to avoid any further confusion by our readers.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Nie et al investigate the effect of PARG KO and PARG inhibition (PARGi) on pADPR, DNA damage, cell viability, and synthetic lethal interactions in HEK293A and Hela cells. Surprisingly, the authors report that PARG KO cells are sensitive to PARGi and show higher pADPR levels than PARG KO cells, which are abrogated upon deletion or inhibition of PARP1/PARP2. The authors explain the sensitivity of PARG KO to PARGi through incomplete PARG depletion and demonstrate complete loss of PARG activity when incomplete PARG KO cells are transfected with additional gRNAs in the presence of PARPi. Furthermore, the authors show that the sensitivity of PARG KO cells to PARGi is not caused by NAD depletion but by S-phase accumulation of pADPR on chromatin coming from unligated Okazaki fragments, which are recognized and bound by PARP1. Consistently, PARG KO or PARG inhibition shows synthetic lethality with Pol beta, which is required for Okazaki fragment maturation. PARG expression levels in ovarian cancer cell lines correlate negatively with their sensitivity to PARGi.

      Thank you for your nice comments. The complete loss of PARG activity was observed in PARG complete/conditional KO (cKO) cells. These cKO clones were generated using wild-type cells transfected with sgRNAs targeting the catalytic domain of PARG in the presence of PARP inhibitor.

      Strengths:

      The authors show that PARG is essential for removing ADP-ribosylation in S-phase.

      Thanks!

      Weaknesses:

      1) This begs the question as to the relevant substrates of PARG in S-phase, which could be addressed, for example, by analysing PARylated proteins associated with replication forks in PARG-depleted cells (EdU pulldown and Af1521 enrichment followed by mass spectrometry).

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      2) The results showing the generation of a full PARG KO should be moved to the beginning of the Results section, right after the first Results chapter (PARG depletion leads to drastic sensitivity to PARGi), otherwise, the reader is left to wonder how PARG KO cells can be sensitive to PARGi when there should be presumably no PARG present.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell lines, i.e. the identification of frame-shift mutations, absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make this clear.

      3) Please indicate in the first figure which isoforms were targeted with gRNAs, given that there are 5 PARG isoforms. You should also highlight that the PARG antibody only recognizes the largest isoform, which is clearly absent in your PARG KO, but other isoforms may still be produced, depending on where the cleavage sites were located.

      The sgRNAs used to generate PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), while isoforms 4 and 5 are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends.

      Although the manufacturer's instructions state that the Anti-PARG antibody (66564S) can only recognize isoform 1, this antibody could also recognize isoforms 2 and 3, albeit weakly, based on Western blot results with lysates prepared from PARG cKO cells reconstituted with different PARG isoforms, as shown below. As suggested, we will add a statement in the revised manuscript and provide the Western blotting data in Author response image 2.

      Author response image 2.

      To test whether other isoforms were expressed in 293A and/or HeLa cells, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown in Author response image 3. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands, some of them were reduced or absent in PARG KO cells, others were not. Thus, we could not draw a clear conclusion which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Author response image 3.

      4) FACS data need to be quantified. Scatter plots can be moved to Supplementary while quantification histograms with statistical analysis should be placed in the main figures.

      We agree with this reviewer that quantification of FACS data may provide straightforward results for some of our data. However, it is challenging to quantify positive S phase pADPr signaling in some panels, for example in Fig. 3A and Fig. 4C. In both panels, pADPr signaling was detected throughout the cell cycle and therefore it is difficult to know the percentage of S phase pADPr signaling in these samples. Thus, we decided to keep the scatter plots to demonstrate the dramatic and S phase-specific pADPr signaling in PARG KO cells treated with PARGi. We hope that these data are clear and convincing even without any quantification.

      5) All colony formation assays should be quantified and sensitivity plots should be shown next to example plates.

      As suggested, we will include the sensitivity plot next to Fig. 3D. However, other colony formation assays in this study were performed with a single concentration of inhibitor and therefore we will not provide sensitivity plots for these experiments. Nevertheless, the results of these experiments are straightforward and easy to interpret.

      6) Please indicate how many times each experiment was performed independently and include statistical analysis.

      As suggested, we will add this information in the revised manuscript.

      Reviewer #3 (Public Review):

      Here the authors carried out a CRISPR/sgRNA screen with a DDR gene-targeted mini-library in HEK293A cells looking for genes whose loss increased sensitivity to treatment with the PARG inhibitor, PDD00017273 (PARGi). Surprisingly they found that PARG itself, which encodes the cellular poly(ADP-ribose) glycohydrolase (dePARylation) enzyme, was a major hit. Targeted PARG KO in 293A and HeLa cells also caused high sensitivity to PARGi. When PARG KO cells were reconstituted with catalytically-dead PARG, MMS treatment caused an increase in PARylation, not observed when cells were reconstituted with WT PARG or when the PARG KO was combined with PARP1/2 DKO, suggesting that loss of PARG leads to a strong PARP1/2-dependent increase in protein PARylation. The decrease in intracellular NAD+, the substrate for PARP-driven PARylation, observed in PARG KO cells was reversed by treatment with NMN or NAM, and this treatment partially rescued the PARG KO cell lethality. However, since NAD+ depletion with the FK866 nicotinamide phosphoribosyltransferase (NAMPT) inhibitor did not induce a similar lethality the authors concluded that NAD+ depletion/reduction was only partially responsible for the PARGi toxicity. Interestingly, PARylation was also observed in untreated PARG KO cells, specifically in S phase, without a significant rise in γH2AX signals. Using cells synchronized at G1/S by double thymidine blockade and release, they showed that entry into S phase was necessary for PARGi to induce PARylation in PARG KO cells. They found an increased association of PARP1 with a chromatin fraction in PARG KO cells independent of PARGi treatment, and suggested that PARP1 trapping on chromatin might account in part for the increased PARGi sensitivity. They also showed that prolonged PARGi treatment of PARG KO cells caused S phase accumulation of pADPr eventually leading to DNA damage, as evidenced by increased anti-γH2AX antibody signals and alkaline comet assays. Based on the use of emetine, they deduced that this response could be caused by unligated Okazaki fragments. Next, they carried out FACS-based CRISPR screens to identify genes that might be involved in cell lethality in WT and PARG KO cells, finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity, whereas loss of PARP1 had the opposite effects. They also found that BER pathway disruption exhibited synthetic lethality with PARGi treatment in both PARG KO cells and WT cells, and that loss of genes involved in Okazaki fragment ligation induced S phase pADPr signaling. In a panel of human ovarian cancer cell lines, PARGi sensitivity was found to correlate with low levels of PARG mRNA, and they showed that the PARGi sensitivity of cells could be reduced by PARPi treatment. Finally, they addressed the conundrum of why PARG KO cells should be sensitive to a specific PARG inhibitor if there is no PARG to inhibit and found that the PARG KO cells had significant residual PARG activity when measured in a lysate activity assay, which could be inhibited by PARGi, although the inhibited PARG activity levels remained higher than those of PARG cKO cells (see below). This led them to generate new, more complete PARG KO cells they called complete/conditional KO (cKO), whose survival required the inclusion of the olaparib PARPi in the growth medium. These PARG cKO cells exhibited extremely low levels of PARG activity in vitro, consistent with a true PARG KO phenotype.

      We thank this reviewer for his/her constructive comments and suggestions.

      The finding that human ovarian cancer cells with low levels of PARG are more sensitive to inhibition with a small molecule PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation (pADPr) that are toxic to cells is quite interesting, and this could be useful in the future as a diagnostic marker for preselection of ovarian cancer patients for treatment with a PARG inhibitor drug. The finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity is in keeping with the conclusion that PARG activity is essential for cell fitness, because it prevents excessive protein PARylation. The observation that increased PARylation can be detected in an unperturbed S phase in PARG KO cells is also of interest. However, the functional importance of protein PARylation at the replication fork in the normal cell cycle was not fully investigated, and none of the key PARylation targets for PARG required for S phase progression were identified. Overall, there are some interesting findings in the paper, but their impact is significantly lessened by the confusing way in which the paper has been organized and written, and this needs to be rectified.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      As suggested, we will revise our manuscript accordingly and provide additional explanation/statement upfront to avoid any misunderstandings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1) Utilization of known AhR ligands as controls will strengthen the interpretation of the conclusions.

      We agree with the reviewer that AhR ligands could be used as controls for delineating structure-activity relationships and cell context-specific effects. However, such studies are beyond the scope of the current manuscript. The AhR has many endogenous ligands, including several tryptophan-derived metabolites, that have been shown to elicit different responses depending on the dose and cell type. Our unpublished data show that the expression of AhR target genes such as Cyp1a1, Cyp2e1, and Tiparp was not modulated by I3A in RAW cells, which suggests that the observed effects may occur independently of the AhR.

      Reviewer #2:

      Specific comments:

      1) The title is misleading: "Microbially-derived indole-3-acetate" suggests that this article is about the production of I3A by the gut microbiota; in fact, this is a dietary supplementation article. The title needs to reflect this fact.

      Our title reflects the natural source of I3A in mice. We used oral supplementation to study the effects of this metabolite. Per suggestion by the reviewer, we changed the title as follows: “Oral supplementation of gut microbial metabolite indole-3-acetate alleviates diet-induced steatosis and inflammation in mice”

      2) The amount of I3A in the drinking water is not properly described. The actual concentration in the drinking water should be given.

      The concentration of I3A in drinking water was as follows: WD50 = 0.5 mg/ml and WD100 = 1 mg/ml. We added this information in the revised manuscript.

      3) The serum concentration data of I3A is critical data and should be moved in Figure 1.

      We have now included serum levels of I3A as part of Figure 1.

      4) The authors should have determined the actual concentration of indole-3-acetate in serum by running a standard curve of I3A during the LC-MS analysis. Also, recovery and matrix effects should be determined. Without this information their data will be difficult to compare to other studies.

      We agree with the reviewer that quantification of I3A in serum would be useful. However, we are unable to do so due to limited sample available as well as concerns with sample integrity after long-term storage.

      5) In the data in Figure S1C, there appears to be only 2-3 mice out of nine that exhibit a difference in serum indole-3-acetate levels between the WD-50 and WD-100. Do the authors have an explanation for this small difference compared to the other endpoints assessed?

      The serum I3A measurements at week 16 are a snapshot that may not reflect tissue levels due to differences in water intake, I3A metabolism in the body, and/or elimination of I3A. The other phenotypic assays are physiological measurements that reflect the result of sustained administration of I3A.

      6) Since the Ah receptor may play a role in the results obtained CYP1A1 mRNA levels in the liver and intestinal tract should have been measured.

      We measured alterations in Cyp1a1 mRNA in the liver and no significant change was observed in the WD50 and WD100 groups relative to controls. Also, see response to reviewer 1.

      7) The main mechanistic experiment performed is shown in Figure 6, and the figure legend states that they are examining macrophages, but these are cell lines; they are macrophage models, and this should be clearly stated. The first two panels are liver data, so the title of the figure legend needs to reflect that fact.

      We agree and have changed the title of Figure 6 to “I3A modulates AMPK phosphorylation and suppresses RAW 264.7 macrophage cell inflammation in an AMPK dependent manner”.

      8) In Figure 6, 1 mM I3A is added to the cells, how is this very high concentration relevant to the concentrations observed in vivo? Does adding 1 mM acetate to the cell culture media lower the pH of the media and could this influence the results obtained? Would acetic acid yield the same results? Could treatment with an acid even explain in vivo results?

      It is difficult to match the concentration of I3A in the in vitro experiments to liver tissue concentrations. Addition of 1 mM I3A did not lower the pH of cell culture media or reduce the viability of cultured RAW 264.7 macrophages. As I3A is not known to degrade into acetic acid and indole, we do not expect acetic acid to recapitulate the effects elicited by I3A.

      Reviewer #3:

      My primary concern with the manuscript is the organization and interpretation of the data. It appears that little effort was given by the authors to interpreting the data and digesting it for the reader into a coherent package. Rather, the authors have collected a vast amount of data and organized it without much thought about what the reader would take away from it. Furthermore, it seems the authors have taken this as an opportunity to overload this manuscript with data that are superfluous to the conclusions the authors draw at the end. Based on this, I think the authors need to invest more time in distilling their complex biological data into a unifying scientific interpretation for the readers that advances our understanding of I3A. My suggestions for the authors are described below.

      1) The data lack a rationale behind how they are organized within the manuscript. For example, the authors will combine disparate biological pathways and lump data together without logic as in Figure 2. Why are inflammatory pathways and bile acid synthesis combined in a figure? What was the rationale?

      We respectfully disagree that the data are presented without rationale. Both inflammation and bile acid dysregulation are commonly observed with NAFLD and thus are presented in two separate panels of Figure 2 (A, inflammatory cytokines, and B bile acids).

      2) The authors give very little effort to performing integrative omics analysis even though multi-omics data are provided. For example, the authors provide proteomic data on the fatty acid metabolism pathway but make no mention of this pathway within the metabolomic dataset. Conversely, the authors provide an in-depth investigation of the metabolic changes within the tryptophan pathway but no investigation into the proteomic changes that may underlie this phenomenon. It is recommended that the authors invest more energy into performing a more in-depth analysis of the multi-omics data presented.

      We attempted to co-analyze the proteomic and metabolomic data, but this analysis was not informative. Protein and metabolite abundances do not necessarily correlate, and the two types of omics data carry different observation biases. For example, label-free, untargeted proteomics data favor abundant proteins, whereas untargeted metabolomics data are influenced by concentration and ionization efficiency, among other factors. Therefore, we opted to analyze the two datasets independently, and then linked the findings from the two analyses using biological pathways as guides. For example, we describe changes in acyl-carnitine and discuss how this observation is consistent with changes in abundance of fatty acid metabolism enzymes.

      3) Figures 1&2 shows that low dose treatment reduces inflammation but does not alter hepatic TG levels. This is in direct disagreement with the graphical model provided by the authors (Supp. Fig 9). In the author's model, I3A is directing hepatic lipid metabolism through modulation of macrophage inflammation. This interpretation is erroneous and needs to be reevaluated by the authors. Furthermore, the tryptophan pathway and bile acid pathways are not even represented in the model, which begs the question of why that data are included in the manuscript to begin with.

      We would like to respectfully point out that Figure 1D does show a statistically significant (p < 0.05) difference in liver TG between the WD and WD100 groups. Supp. Figure S9 is meant to be a summary of the main biochemical changes elicited by I3A that we have shown in the current study (e.g., the involvement of AMPK) rather than an atlas of all the changes detected in the metabolomic and proteomic data. Specifically, we have not included the tryptophan or bile acid pathways as we do not have mechanistic information on how these changes are mediated by I3A.

      4) The authors switch from hepatocytes to macrophages without giving any rationale. The authors need to invest more time into describing a logical flow of thought when assembling the manuscript.

      We mention the rationale for investigating the effect of I3A on macrophages in the introduction (last paragraph of the section): “In vitro, both I3A and TA attenuated the expression of inflammatory cytokines (Tnfα, Il-1β and Mcp-1) in macrophages exposed to palmitate and LPS.” We also explain why we used an in vitro model, RAW cells, at the beginning of the corresponding Results section: “Since our previous study found that the metabolic effects of I3A in hepatocytes depend on the AhR, we tested if this was also the case in macrophages.” Moreover, the strong effects of I3A on liver inflammatory cytokines also motivate the macrophage experiments.

    1. Author Response

      We thank the Editors and the Reviewers for the time spent on our manuscript entitled “The CD4 transmembrane GGXXG and juxtamembrane (C/F)CV+C motifs mediate pMHCII-specific signaling independently of CD4-Lck interactions”. We appreciate the helpful feedback and the opportunity to participate in eLife’s new model for publishing.

      We are writing to provide the following provisional author responses for posting with the first version of the reviewed preprint:

      1) To address comments about the limited scope of this study and referencing of the Methods section to our prior study, we would like to note that we submitted the current study via the Research Advance mechanism. Our goal was to build upon the conclusions of our 2022 eLife publication (PMID: 35861317) and address an unresolved question from that study (as nicely summarized by Reviewer #2). In the current manuscript we present data from reductionist experiments that were designed specifically for this purpose and, as noted by the reviewers, we provide answers to the question being asked. We think that the Research Advance mechanism is an ideal opportunity to make these results available to the field given the stated purpose of such articles (for reference: “A Research Advance might use a new technique or a different experimental design to generate results that build upon the conclusions of the original research by, for example, providing new mechanistic insights or extend the pathway under investigation…”).

      a. The Methods were not duplicated in this manuscript because we referenced our prior study as per instructions for the Research Advance mechanism.

      2) The constituent residues of the motifs analyzed in this and our prior study were determined to be functionally significant in vivo through the computational reconstruction of CD4’s evolutionary history, which provided us with data from ~435 million years of natural experiments with CD4 in numerous jawed vertebrate species. We agree that having conditional knock-in mice of these CD4 mutants, and those characterized in our last study, would be useful for determining how these mutations impact T cell development, activation, differentiation, and effector function. Given the costs involved with making genetically engineered mouse model systems, the computational and experimental data we have generated in the current and prior study will help us prioritize next steps to dig deeper into the details of why the residues we are studying are under purifying selection (fail to propagate to progeny if mutated, meaning terminal). In short, only now, with the data in hand, can we prioritize mouse studies. We think it is important for the advancement of the field that we make these results available in a timely manner rather than waiting to report them together with the results of mouse models once generated and analyzed.

      3) The reductionist experimental data presented here provide us with mechanistic insights into why the residues we are studying are functionally important. We therefore think it is of value to note that 58a-b- T cell hybridomas were used in seminal work that established a link between CD4-Lck association, via motifs in the CD4 intracellular domain, and signaling output as measured by IL-2 production (Glaichenhaus, et al., 1991). Importantly, the impact of disrupting CD4-Lck interactions on proximal signaling was not interrogated until the work we describe here and in our preceding study, wherein we establish that CD4-Lck association does not regulate proximal signaling in 58a-b- T cell hybridomas. Given that this experimental system was used to help establish the dominant paradigm (i.e. the widely held view that CD4 recruits Lck to TCR-CD3 to initiate pMHCII-specific signaling), we think it is a legitimate system to directly test this model and further test core questions of CD4 function by employing more modern experimental techniques.

    1. Author Response:

      We would like to express our heartfelt gratitude for the reviewers’ scholarly and insightful reviews of our manuscript. The constructive comments and thought-provoking experimental proposals have been invaluable not only in improving the quality of this study but also in shaping the direction of future research. In revision, all comments will be addressed point-by-point, and the manuscript will be revised thoroughly. Here in this reply, we focus on the most critical issue regarding the source of noises during stability inference.

      When faced with a stack of objects, individuals are more likely to assess taller stacks as being more unstable than shorter ones (Fig. 2b & 2d). This bias persists even when comparing single objects of different heights that share the same contact area with the supporting surface. Known as “stability inference bias,” this phenomenon challenges deterministic models with a single, fixed vector for the representation of gravity’s direction (i.e., directly downward). To reconcile this bias with deterministic models, previous studies (e.g., Allen et al., 2020; Battaglia et al., 2013; Kubricht et al., 2017) have incorporated external noises such as perceptual uncertainty and external force perturbations to increase their fit to human performance, as also pointed out by Reviewer 1.

      In this study, we introduced an alternative perspective through a stochastic model in which variability is instead embedded in the representation of gravity’s direction. In this framework, gravity’s direction is not a fixed vector but a distribution of possible vectors, with the vertical direction serving as the maximum likelihood. While the distinction between deterministic and stochastic models is conceptually clear, mathematically they are equivalent. In addition, our stochastic model does not negate the role of external noises in stability inference, because gravity is seldom the sole force acting upon a moving object in the physical world, as pointed out by Reviewer 1. Together, these two factors make it challenging to ascribe the source of variability to either external or internal noises (Smith & Vul, 2013). This is the major concern raised by all three reviewers.
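
      To make the idea of a distribution over gravity's direction concrete, the following is a minimal illustrative sketch, not the model code used in the study. It assumes (all hypothetical choices) that the tilt of gravity away from vertical is drawn as the absolute value of a zero-mean Gaussian with width 𝜎, that the horizontal direction 𝜑 is uniform, and that the object is a uniform rigid cylinder, which topples when the line of action through its centre of mass leaves the base. The function name and parameter values are ours.

```python
import math
import random

def p_judged_unstable(height, radius, sigma_deg, n_samples=20000, seed=0):
    """Fraction of sampled gravity directions under which an upright cylinder topples.

    Illustrative assumptions: tilt from vertical is |N(0, sigma)|, the azimuth
    phi is uniform, and the cylinder is uniform and rigid.
    """
    rng = random.Random(seed)
    # Tilt at which the line of action from the centre of mass (at height / 2)
    # passes beyond the edge of the circular base.
    critical_tilt = math.atan2(radius, height / 2.0)
    sigma = math.radians(sigma_deg)
    falls = 0
    for _ in range(n_samples):
        theta = abs(rng.gauss(0.0, sigma))     # polar angle away from vertical
        phi = rng.uniform(0.0, 2.0 * math.pi)  # drawn for completeness; a cylinder's toppling does not depend on it
        if theta > critical_tilt:
            falls += 1
    return falls / n_samples

# Same contact area, different heights (hypothetical sizes and sigma).
p_short = p_judged_unstable(height=2.0, radius=0.5, sigma_deg=10.0)
p_tall = p_judged_unstable(height=8.0, radius=0.5, sigma_deg=10.0)
print(p_short, p_tall)
```

      Under these assumptions the taller cylinder is judged unstable far more often than the shorter one even though both are physically stable, which is the qualitative pattern of the stability inference bias; the sketch says nothing about whether the variability is internal or external.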

      To distinguish between the deterministic and stochastic models, we designed a series of experiments aimed at demonstrating that internal noises, rather than external noises such as perceptual uncertainty or external force perturbations, influence our inference about object stability. However, the supporting evidence was dispersed and at times implicit throughout the manuscript. In revision, we will thoroughly clarify the ambiguities. In this reply, we will consolidate and present the evidence comprehensively.

      1. The examination of external noises.

      1.1 External Force Perturbations. Deterministic models suggest that during object stability inference, individuals implicitly assume the presence of external forces (e.g., wind) that could destabilize stacks. While this assumption aligns with the omnipresence of such forces in natural settings, it overlooks a crucial variable: the directionality of these external forces. In psychological studies, individual differences are commonly observed, and the perceived force direction is no exception. That is, some may assume that it comes from the left, while others from the right. In essence, if external forces were to play a significant role in stability inference, one would expect the perceived force directions to exhibit non-uniform distributions (i.e., anisotropy) in the horizontal plane within individuals and to show substantial variability between individuals.

      Contrary to this expectation, our study revealed a different pattern. In the study, we specifically measured the distribution of 𝜑, the horizontal component reflecting the direction of object collapse. Our results indicated that all participants exhibited a uniform distribution of gravity’s directions in the horizontal plane (Fig. 1d right; Extended Data Fig. 2 and 3). This uniformity suggests that if external forces were a key determinant in stability inference, participants would have to assume a varying direction of external force in each trial—an assumption we consider unlikely. Instead, our RL model simulation suggests that the isotropy of 𝜑 arises from agent-environment interactions, notably in the absence of external forces (Extended Data Fig. 6).
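
      As an illustration of how isotropy of 𝜑 can be checked, the sketch below applies a Rayleigh test of circular uniformity to simulated angles. This is an assumption for illustration only: we do not claim it is the test used in the manuscript, the data are synthetic, and the p-value uses the simple first-order approximation exp(-nR²).

```python
import math
import random

def rayleigh_test(angles):
    """Rayleigh test of circular uniformity for angles in radians.

    Returns the mean resultant length R and the first-order p-value
    approximation exp(-n * R**2); a small p suggests a preferred direction.
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    r = math.hypot(c, s)
    p = math.exp(-n * r * r)
    return r, p

rng = random.Random(1)
# Synthetic examples: isotropic directions versus directions clustered around 0.
isotropic = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(2000)]
biased = [rng.vonmisesvariate(0.0, 2.0) for _ in range(2000)]

r_iso, p_iso = rayleigh_test(isotropic)
r_bias, p_bias = rayleigh_test(biased)
print(r_iso, p_iso)
print(r_bias, p_bias)
```

      A consistently assumed external force direction would appear as a large resultant length R, as in the clustered sample, whereas a uniform distribution yields R close to zero.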

      In summary, the uniform distribution of horizontal direction component, 𝜑, observed in all participants, challenges the argument for the dominant role of external forces in stability inference. We are sorry that this aspect was not explicitly emphasized in the original text, and in revision we will explain why external forces are unlikely to substantially shape our perception of object stability.

      1.2 Perceptual uncertainty. To assess the impact of perceptual uncertainty on stability inference, we examined whether the representation of gravity’s direction is cognitively impenetrable. Specifically, we posited that if noises are external (i.e., perceptual uncertainty), the inference bias should be modulated by task context; in contrast, if noises are internal, the stochastic representation of gravity’s direction will be encapsulated from the context. To test this idea, we inverted the virtual environment, making gravity appear to point upward (also see a similar idea by Reviewer 3). In this unfamiliar context, which diverges dramatically from daily experiences, one would expect heightened perceptual uncertainty, which according to deterministic models would result in a larger inference bias, manifested as an increased width of the distribution (𝜎) of gravity’s direction. Contrary to this prediction, we observed that the width of the distribution remained unchanged (Fig. 1d and 1f). Furthermore, there was a high correlation (r = 0.91) between widths in the upright and inverted conditions across participants (Extended Data Fig. 2 and 3).

      In summary, this finding suggests that the manipulation of perceptual uncertainty is unable to cognitively penetrate the representation of gravity’s direction, casting doubt on its dominant role in stability inference. We are sorry that in the original text, we did not clarify the rationale for employing the approach of cognitive impenetrability. In revision, this will be clarified.

      2. The origin of intrinsic noises in stability inference.

      In deterministic models, either external force perturbations or perceptual uncertainty is often assumed but rarely empirically tested. Indeed, these external noises are introduced primarily to account for observed biases in stability inference. In this study, we explicitly examined the possible origin of the intrinsic noises embedded in the representation of gravity’s direction. Without assumed perceptual uncertainty or external force perturbations, the RL model simulation showed that the distribution could evolve naturally based mainly on the agent’s experience, as it used the mismatch between the expected and the observed state of the stack under natural gravity to update its representation of gravity’s direction (Fig. 3a). Importantly, the width of the distribution for the agent was comparable to that of human participants as measured in the psychophysics experiments (Fig. 3b). Therefore, experience alone may be sufficient to generate a stochastic representation of gravity’s direction, obviating the need for external noises.

      Taken together, these findings underscore the limitations of the combination of deterministic models and external noises in accounting for stability inference, and suggest that intrinsic noises embedded in the representation of gravity play a pivotal role in shaping our stability inference of the physical world.

      3. Thought experiments.

      Although the evidence shown above may provide valuable insights, our study does not definitively settle the debate between deterministic models and our proposed stochastic model. Specifically, our study only preliminarily investigates two sources of external noise, perceptual uncertainty and external force perturbations, leaving many other factors, such as object mass and surface friction, unexplored (for studies on these factors, please see Hamrick et al., 2016). As such, the reviewers have proposed a series of thought experiments that warrant further investigation. Below, we enumerate some of them, followed by ours.

      3.1 Experiment 1. Reviewer 3 proposed a thought experiment in which participants assess the stability of a single block of varying heights. The reviewer argues that a block, regardless of its height, will remain stable on a horizontal surface unless externally disturbed. This assertion is perfectly true in the physical realm. However, in the cognitive domain, both deterministic models and our stochastic model predict differently. Take an extreme example of a standing needle: while it would remain upright in the physical world without external disturbances, both deterministic and stochastic models, which account for mental inference of physical events, will predict a likelihood of it falling, aligning with our subjective feelings. This is because in both models, noises are considered in the intuitive physics engine. In deterministic models, external force perturbations, as well as perceptual uncertainty, are assumed to be omnipresent noises in probabilistic reasoning. In our stochastic model, noises are embedded in the representation of gravity’s direction. Therefore, although this thought experiment, along with other thought experiments on object mass, surface friction (proposed by Reviewer 3), and falling trajectories behind an occluder (proposed by Reviewer 1), is insightful, it cannot serve to differentiate deterministic and stochastic models.

      3.2 Experiment 2. Reviewer 2 suggested constructing a wall on one side of the virtual scene to make it improbable that participants would infer an external force perturbation emanating from that direction. In this setting, deterministic models would predict a non-uniform distribution of the horizontal component, 𝜑, skewed away from the wall. In contrast, according to our stochastic model, the distribution of 𝜑 would remain unaffected, maintaining the uniform distribution observed in previous experiments. Extending this logic, another test scenario could contrast an indoor scene with an outdoor scene. In a confined and static indoor environment, the likelihood of external force perturbations should be much lower than in a dynamic, open outdoor setting. Here, deterministic models would predict an increase in the width of the distribution, 𝜎, in the outdoor environment, whereas our model would anticipate no such change. The underlying rationale for these experiments parallels that of our previous setup (Fig. 1e), where we inverted the virtual environment and reversed the direction of gravity. Indeed, they all aim to assess the extent to which manipulations of external factors can cognitively penetrate the representation of gravity’s direction.

      3.3 Experiment 3. A noteworthy insight derived from our RL model simulation relates to variations in the number of blocks within the virtual worlds. Deterministic models would predict an enlarged bias in stability inference as the number of blocks increases, which is attributed to elevated levels of perceptual uncertainty and an expanded area susceptible to external force perturbations. However, the results from our RL model simulation contradict this prediction, revealing that a larger number of blocks instead led to a narrowing of the width of the distribution. This decrease in width can be ascribed to the richer information that a larger number of blocks provides to the agent for refining its representation of gravity’s direction. In line with this rationale, we propose a new experiment from the perspective of ecological psychology, which emphasizes that cognitive processes are shaped by our interactions with the environment. Specifically, we hypothesize that individuals raised in mountainous terrains may exhibit more accurate representations of gravity’s direction than those raised in flat terrains. This proposed experiment could not only help resolve the ongoing debate between the two models to some extent, but also encourage future studies on intuitive physics within a more ecologically valid framework.

      To conclude, both deterministic and stochastic models align closely with Bayesian principles, where stability inference is conceptualized as probabilistic reasoning. Nevertheless, the divergence between them is not trivial, as it hinges on distinct philosophical assumptions about the relationship between the inner mind and the external world. Deterministic models propose that the mind serves as a faithful reflection of the world; therefore, gravity’s direction is represented as a single, fixed vector directly downward, the same as that in the world. In these models, uncertainty for probabilistic reasoning emanates from factors external to the module of the intuitive physics engine. In contrast, our stochastic model underscores the notion that the mind is an active inference machine, continually reinterpreting inputs from the outside world; therefore, the mind gains increased adaptability, allowing for a more nuanced accounting of uncertainty in the world, a factor often crucial for survival. Such active inference necessitates flexible representations; accordingly, within the model of the intuitive physics engine, variations are embedded into the representation of gravity’s direction. While resolving this philosophical debate is beyond the capacity of the present study, we contend that the field of intuitive physics offers a valuable lens through which to pry open the complex interplay between the mind and the world we live in.

      References

      • Allen, K. R., Smith, K. A., & Tenenbaum, J. B. (2020). Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences, 117(47), 29302–29310.
      • Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45), 18327–18332.
      • Kubricht, J. R., Holyoak, K. J., & Lu, H. (2017). Intuitive physics: Current research and controversies. Trends in Cognitive Sciences, 21(10), 749–759.
      • Smith, K. A., & Vul, E. (2013). Sources of uncertainty in intuitive physics. Topics in Cognitive Science, 5(1), 185–199.
    1. Author Response:

      Reviewer #1 (Public Review):

      Summary: The authors made significant updates to Hippocampome.org including 50 new cell types.

      Strengths: The authors have been thorough in basing their views on peer-reviewed literature. They have made the data highly accessible and the user has the ability to control what is included.

      Weaknesses: There are many inconsistencies in the literature regarding cell types and how these are incorporated into hippocampome.org is not clear.

      We agree with the Reviewer that there can be inconsistencies in the literature, especially when it comes to nomenclature. This is why for Hippocampome.org v1.0 we decided to focus on the morphologies, the distributions of axons and dendrites across the layers and parcels of the hippocampal formation, rather than the names authors have applied to the neurons they are studying. We have also clarified our stance on nomenclature in our Brain Informatics manuscript that accompanied v1.1. We will revise the manuscript to make these points explicit.

      Properties are often a result of modeling and not biological data, and caveats to this approach, and other assumptions are unclear.

      The foundation for Hippocampome.org has always been the data that are published in the literature. Those include, among others, the axonal and dendritic spans in each layer and subregion, the molecular expression patterns, the total neuron count by layer and subregion, the membrane properties, firing patterns, and experimental synaptic signals and corresponding covariates. For all of those, we do not depend on how the data are modeled, although there is always some level of interpretation of the data to make them machine readable and ready for incorporation into our database. However, some of the simulation-ready parameters now also included in Hippocampome.org are indeed the result of modeling, such as the neuronal input/output functions (Izhikevich model) and the unitary synaptic values (Tsodyks-Markram model). Other simulation-ready parameters are the result of specific analysis approaches, including the connection probabilities (axonal-dendritic spatial overlaps) and the neuron type census (numerical optimization of all constraints). We plan to explicitly distinguish among these various cases in the revised manuscript.
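
      To illustrate what a model-derived, simulation-ready parameter set looks like, here is a minimal sketch of an Izhikevich neuron. It is an assumption-laden example rather than the Hippocampome.org implementation: it uses the widely known four-parameter form of the model with generic "regular spiking" values (a, b, c, d), forward-Euler integration, and a hypothetical function name; the parameterization and fitted values used by Hippocampome.org may differ.

```python
def count_spikes(current, a=0.02, b=0.2, c=-65.0, d=8.0, t_ms=1000.0, dt=0.5):
    """Simulate a four-parameter Izhikevich neuron and count spikes.

    v is the membrane potential (mV), u the recovery variable, and `current`
    a constant input. Parameter values are generic illustrative defaults,
    not values taken from Hippocampome.org.
    """
    v = c          # start at the reset potential
    u = b * v      # recovery variable starts on its nullcline
    spikes = 0
    for _ in range(int(t_ms / dt)):
        if v >= 30.0:       # spike cutoff reached: reset v and bump u
            v = c
            u += d
            spikes += 1
        # Forward-Euler step of the model equations.
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
    return spikes

print(count_spikes(0.0), count_spikes(10.0))
```

      The point of the sketch is the distinction drawn above: the handful of numbers (a, b, c, d) are fitted model parameters that summarize a neuron's input/output behaviour, whereas quantities such as axonal and dendritic distributions are extracted directly from published data.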

      Several interneuron subtypes in the dentate gyrus do not appear to be listed, such as neurogliaform cells.

      The neuron types listed in Figure 2 of the current manuscript are only the new additions to the catalog of neuron types at Hippocampome.org v2.0. DG Neurogliaform cells were included in our original eLife manuscript, which described the deployment of v1.0 of the website. We will clarify this in the revisions.

      The nomenclature HIPROM should be distinguished or made synonymous with HIPP. Same for MOCAP and MOPP/HICAP.

      The Reviewer has referred to 5 separate neuron types in Hippocampome.org. Each neuron type has a unique distribution of axonal and dendritic invasions of the 26 layers and parcels of the hippocampal formation. For example, HIPROM cells have dendrites in the inner one-third of stratum moleculare, stratum granulosum, and hilus and axons in all four layers of the dentate gyrus in addition to axonal projections into CA3 stratum radiatum, stratum lucidum, stratum pyramidale, and stratum oriens. HIPP cells in contrast have dendrites only in the hilus and axons only in the outer two-thirds of stratum moleculare with no cross-subregional projections. Similar considerations distinguish MOPP, MOCAP, and HICAP cells in Hippocampome.org. In expanding the nomenclature to include the neuron types we first described at Hippocampome.org, we attempted to mimic the styling of the already established neuron types of the DG: HIPROM (Hilar Interneuron with PRojections to the Outer Molecular layer), HIPP (HIlar Perforant Path-associated), MOCAP (MOlecular Commissural-Associational Pathway-related axons and dendrites), MOPP (MOlecular layer Perforant Path-associated), and HICAP (HIlar Commissural-Associational Pathway-related). We intend to insert a paragraph in the revised version to clarify these issues.

      Dorsal ventral and sex differences are not mentioned.

      We thank the Reviewer for pointing this out. As a result of the dearth of literature describing differences between dorsal and ventral hippocampus when we first assembled Hippocampome.org v1.0, we made the decision to focus solely on the distributions of the axons and dendrites along the depth, or layers, of the hippocampal formation. As the amount of literature relating to the other axes of the hippocampus continues to grow, we will gradually incorporate information along the added dimensions into our knowledge base. In the revised manuscript we intend to note this, and also stress the fact that Hippocampome.org contains knowledge from a mixture of sexes, and that whenever the original papers report the animal sex, so does our knowledge base. The revised manuscript will also mention that, whenever possible (e.g. synaptic physiology parameters), values are reported separately for males and females.

      Reviewer #2 (Public Review):

      Summary and strengths: The authors have developed a helpful resource for the community regarding hippocampal cell types and their interactions from many perspectives. There have been many updates to hippocampome v1.0 to v1.12, that are nicely summarized and explained (e.g., Table 1). The content and impact are also presented (Fig. 4).

      Weaknesses: My main comment is that it is not completely clear and/or it is a bit buried as to what makes this v2.0 (rather than v1.13). The title would seem to encompass it ('... enabling data-driven spiking neural network simulations...'), but in the introduction, the authors seem to emphasize "50 newly identified neuron types...". Is it the case that launching network simulations (using CARLsim) was not possible up to v1.12? I don't think so? I think that this research advance is to announce and summarize the various updates and to demonstrate how network simulations can be easily done? If so, this should and could be made more clear so that the reader does not necessarily have to go through all the previous versions to understand what is 'special' or different about v2.0. This could perhaps be achieved by situating their tool and its goals relative to other efforts (e.g., blue brain project) that are mentioned in the Discussion?

      We thank the Reviewer for their helpful suggestions. Hippocampome.org v1.12 included the final piece needed, the synaptic physiology parameter values, to start fully simulating the hippocampal formation. In the revised manuscript, we will endeavor to state more clearly in the Abstract, Introduction, and Discussion what distinguishes v2.0 from the various v1.X releases, in part by more fully describing the differences between our work and that of other efforts, such as the Blue Brain Project.

      Reviewer #3 (Public Review):

      Summary: The authors aim to provide a multidisciplinary resource on the structural and physiological organization of the hippocampal system and make the available experimental data available for further theoretical work, providing tools to do so in a very flexible and user-friendly way. Since this is a new version of an already existing data-resource, the authors certainly reach their aim and fulfil expectations that the reader might have. The content of the database is as good as the original data, collected from the published knowledge-database, sometimes with the help of the original authors, and the overall quality depends further on how the data are curated by the team of authors and many others who helped them. That process is briefly described and more details are available in descriptions of previous versions and on the website. The data extraction, examples of how data can be used, and the part on attempts to model the hippocampus are exciting and open doors to new and exciting research opportunities.

      Strengths: Excellent description with many outlined opportunities. Nicely illustrated and inviting to explore the online database.

      Weaknesses: The figures are complex, containing a heavy information load with many abbreviations. You need some general knowledge of the system in order to grasp the enormous potential of what is provided.

      We agree with the Reviewer that we generously used abbreviations throughout our figures as a means of conserving limited space. We have attempted to balance that by providing a complete glossary of all the abbreviations used throughout the manuscript. However, we will make an effort to supply definitions of the abbreviations in the figure captions and at their first use in the manuscript, or even replace the abbreviations altogether in key places in the figures.

    1. Author Response

      We are very thankful for the editors' and reviewers' thoughtful feedback and criticisms on our manuscript. We have carefully considered all of the comments and will provide a revised manuscript with detailed responses as soon as we can. In the meantime, we will make our best effort to conduct additional experiments to further support our conclusions. We greatly appreciate the time and consideration given to improving our work.

      Reviewer #1 (Public Review):

      Summary:

      The question at hand is whether astrocytes contribute to the mechanism of long-term synaptic potentiation (LTP) at synaptic contacts between excitatory glutamatergic neurons and inhibitory neurons (E-I synapses). This is a legitimate query considering the immense body of work that has now established synaptic plasticity (LTP, LTD and spike-timing dependent plasticity) as an astrocyte-dependent process at excitatory synapses and, by contrast, the lack of knowledge on whether and how astrocytes control IN activity. Taking direct inspiration from that same body of work, authors recapitulate a number of experiments and approaches from prior seminal studies and provide evidence that E-I synapses in the stratum radiatum of the hippocampus display NMDAR-dependent plasticity, which can be suppressed by pharmacologically hindering astrocytes physiology, preventing astrocyte Ca2+ transients or blocking endocannabinoid CB1 receptors. Under any of these conditions, LTP can still be rescued by exogenously applying D-serine, a naturally occurring co-agonist of NMDARs primarily released by astrocytes. Coincidently, authors show that the conditions used to elicit LTP also cause a transient increase in NMDAR co-agonist site occupancy. Lastly, based on some evidence that gamma-CaMKII is predominantly expressed in INs rather than excitatory neurons, authors conducted AAV-mediated IN-specific gamma-CaMKII shRNA experiments and found that this is sufficient to suppress LTP at E-I synapses. They found that this approach also impairs contextual fear learning in behaving mice. Authors conclude that astrocytes gate LTP at E-I synapses via a mechanism wherein neuronal depolarization during LTP induction elicits endocannabinoid release which drives CB1-dependent astrocyte Ca2+ activity, causing the release of the NMDAR co-agonist D-serine (required for NMDAR activation).

      Strengths:

      This is an important question and the experimental work seems to have been conducted at high standards. The electrophysiology traces are impeccable, the experiments are well powered, including the behavioral testing, and multiple controls and validations are provided throughout. The figures are clear and easy to understand. Overall, the conclusions from the study are consistent, or partially consistent, with the findings.

      We greatly appreciate you taking the time to evaluate our study thoroughly and provide such thoughtful feedback.

      Main Weaknesses:

      1) A major point of concern is the lack of proper acknowledgment of the seminal studies that were mimicked in this manuscript, notably Henneberger et al, Nature 2010, Adamsky et al, Cell 2018; and Robin et al., Neuron 2017. The entire study design is a replica of these landmark studies: it isn't built upon or inspired from them, it exactly repeats the experiments and methods performed in them, coming dangerously close to being simply a hidden attempt to plagiarize published work. The resemblance goes as far as using an identical figure display (see Fig4.D vs Fig 2D of Ref#4). The issue is that authors frame the problem, scientific logic, reasoning, technical tricks, approaches, and interpretations as their own whereas, in reality, they were taken verbatim out of previous work and applied to a (shockingly) similar problem. The probity of the present study is thus in question. Authors need to clearly acknowledge, in all relevant instances, that the work presented here recapitulates the approach, reasoning and methodology used in past seminal studies that tackled the mechanisms of astrocyte regulation of LTP.

      Thank you very much for your review and valuable comments on our manuscript. We greatly appreciate your concern regarding the proper acknowledgment of previous studies. We sincerely apologize for not adequately citing and acknowledging the seminal works in our manuscript. We highly value avoiding academic misconduct.

      For the research design, although there are some similarities between our work and other studies, our key scientific questions and technical approaches are markedly different, as evidenced by our central hypothesis and experimental methods. We did not completely replicate their research design.

      Regarding research methods, many basic techniques such as electrophysiology and chemogenetics are common experimental methods, not patented by any one paper. Our choice of methods is based on the research needs, not on a wish to replicate a particular paper. We do recognize, however, that there are similarities in our experimental methods, specifically the chemogenetic stimulation of astrocytes to induce de novo LTP, which was inspired by previous studies (Van Den Herrewegen et al. Molecular Brain (2021), Adamsky et al. Cell (2018), Nam et al. Cell reports (2019)). We were also inspired by the previous work of Henneberger et al. in Nature (2010) to investigate whether stimulation, specifically TBS (theta burst stimulation), could transiently increase NMDA receptor-mediated synaptic responses.

      The similarity between our Fig. 4D and Fig. 2D of Ref. 4 arises primarily because both studies have a similar purpose (we monitored NMDA currents in interneurons, whereas others monitored them in pyramidal cells) and use similar methods; our figure layout follows a regular display pattern. Additionally, we would like to draw your attention to our previous studies, specifically Shen et al., Scientific Reports (2017), Supplementary figure 4, and Shen et al., Journal of Neurochemistry (2021), Supplementary figures 8 and 9. In these studies, we also employed a regular display pattern in our figure layouts. It is important to note that while there may be similarities in the figure arrangement, each study presents distinct findings and contributes to the broader understanding of the topic. Our use of a similar way to present data does not equal plagiarism. We apologize again for any confusion caused by the lack of explicit citation and acknowledgment in our manuscript. In the revised version, we will be sure to provide clear and detailed references to all relevant studies.

      In terms of citations, we have cited the work of Henneberger et al., Nature 2010; Adamsky et al., Cell 2018; and Robin et al., Neuron 2017 in multiple places, indicating that we have learned from their research ideas and findings. We will supplement any missing citations. But overall, our work has distinct differences and innovations.

      Our experiments are not intended as a hidden attempt to plagiarize or simply replicate their methods. Rather, they are part of a deliberate effort to establish a comparable and reproducible experimental framework. Our study aims to validate and further explore the conclusions of these seminal studies by replicating their experiments, and to deepen our understanding of the mechanisms of astrocyte regulation of LTP at E-I synapses.

      We sincerely appreciate your review and guidance. We will carefully consider your criticism and incorporate more accurate and thorough citations in the revised version, ensuring proper respect and acknowledgment of the previous works.

      2) Relatedly, in past work, field recordings were used to monitor LTP in hippocampal slices (refs 4, 26 and others). This method captures indiscriminately all excitatory synapses where glutamate is released to cause AMPAR-dependent (and NMDAR) transmembrane flux of cations in the postsynaptic element, including E-I synapses and not just E-E synapse like the authors claim. Therefore, a strong argument can be made that there is no actual ground to differentiate the present results from past ones.

      Thank you for your thoughtful comments regarding the differentiation of our results from previous studies. We appreciate the opportunity to address this issue and provide further clarification.

      Indeed, in past studies, field recordings were commonly utilized to monitor long-term potentiation (LTP) in hippocampal slices. It is true that this method captures all flux of cations in excitatory synapses, inhibitory synapses and glia. This includes both excitatory-excitatory (E-E) and excitatory-inhibitory (E-I) synapses.

      When using the LTP recording protocol, one limitation is that the experimenter cannot determine the exact contribution of E-E and E-I currents to the recorded current. Additionally, it is not possible to know, with the same induction protocol, the specific effects on E-E synapses versus E-I synapses. It is plausible that E-E synapses could undergo LTP, while E-I synapses could undergo LTD, or vice versa.

      Thus, it becomes crucial to carefully dissect the functioning of E-I synapses and investigate how astrocytes modulate these synapses. While past field recordings have provided important insights, our selective interrogation of the astrocyte-E-I synapse interface represents a conceptual advance in delineating the nuanced modulation of distinct synaptic connections by astrocytes. We specifically focus on studying the modulation of E-I synapses by astrocytes and aim to elucidate the intricate dynamics and underlying mechanisms. By untangling the complex contributions of astrocytes to E-I synapse function and plasticity, we can unveil novel aspects of neuroglial interactions and advance our understanding of the fundamental principles governing neural network activity.

      3) There is a general lack of excitement about this study. One reason is that it replicates almost identically past work, as mentioned above. Another is that the scientific question and importance of the findings are not framed appropriately. The work is presented as an astrocyte-focused investigation, but it has very limited value to the astrocyte field. The findings are, by all accounts, identical to those unveiled by previous work especially because E-I synapses are, in fact, excitatory synapses. Where this study does bring value, however, is to the field of interneurons, but it would need to be reframed to shift the emphasis from astrocytes to E-I connections. Authors would need to elevate the text by framing their work around relevant considerations, such as IN diversity, mechanisms of LTP in IN subtypes, role of E-I connections in hippocampal circuit function, information processing, behavior, spatial learning, navigation, or grid cells activity etc...

      We appreciate your insightful comments and concerns regarding the lack of excitement surrounding our study. We would like to clarify that while our study uses certain similar methodologies, for example electrophysiology, chemogenetics and pharmacology, our research aims to provide a deeper understanding of the underlying mechanisms of how astrocytes regulate E-I synapses. We apologize if this replication aspect was not adequately highlighted in our manuscript, and we will make sure to emphasize the novel contributions of our study in the revised version.

      Regarding the framing of our study, we recognize the importance of interneurons and the role of E-I connections in hippocampal circuit function, information processing, behavior, spatial learning, navigation, and other relevant aspects. However, the scientific question and scope of the study are to explore whether and how astrocytes modulate E-I synapses. We believe that this study brings value to the field of astrocyte-neuron interaction. Of course, this study also brings value to the field of interneurons. Perhaps the lack of excitement among audiences stems from the fact that the mechanisms by which astrocytes modulate E-I and E-E synapses are the same.

      4) A clear weakness of the study is that it fails to consider the molecular and functional diversity of interneurons in the stratum radiatum and provides no insights or considerations related to it. Authors provide no information on what type of IN were patched, or the location of their cell body in the s.r., effectively treating all patched IN as a homogeneous ensemble of cells - which they are not. Relatedly, the study is extremely evasive on the importance of the results in the context of inhibitory interneurons. This renders the significance of the insights highly uncertain and dampens both the impact of the study and the excitement it generates. Hippocampal interneurons are very diverse in molecular identity, sub-anatomical location, morphology, projections, connectivity and functional importance. Some experts go as far as recognizing 29 subtypes in the CA1, including 9 in the stratum radiatum alone (based on the location of their soma). However, this is neither addressed nor acknowledged by the authors, with the exception of a statement (line 659) where they claim to have "focused on a subpopulation of interneurons in the stratum radiatum" without providing any precision or evidence to corroborate this assertion. This diversity, alone, could explain why not all cells showed LTP, or why the mechanisms authors describe in the radiatum do not seem to be at play in the oriens. Hence, carefully considering the diversity of INs in the present work is necessary. It would refine and augment the conclusions of the paper. Instead of a sub-region specificity, the study might fuel the notion of an IN subtype specificity of LTP mechanisms, which is more useful to the field.

      Thank you very much for your review and valuable comments on our study. We agree with the point you raised regarding a clear weakness in our study, specifically the lack of consideration of the diversity of interneurons in the stratum radiatum.

      As the reviewer notes, there are many subtypes of interneurons in hippocampal region CA1 that likely contribute in distinct ways to circuit function. Unfortunately, we did not gather information on the specific molecular or morphological identity of the interneurons we recorded from. This is a limitation of our study. We will add discussion of this issue as a caveat, and highlight it as an opportunity for future work to dissect how astrocyte-regulated long-term potentiation in interneurons may differ across interneuron subpopulations. Thank you once again for your insightful comments.

      5) Authors take several shortcuts. Some of the conclusions are a leap from the experiments and are only acceptable due to the close analogy with very similar investigations conducted in the past that provided identical results. For instance, the present study provides no evidence of any sort that D-serine is involved - rather, it provides evidence that the pathway at hand contributes to increasing the occupancy of the co-agonist binding site of NMDARs. Considering the absence of work demonstrating that D-serine is the endogenous co-agonist of NMDARs at E-I synapses, most of the authors' claims on D-serine are unfounded. This would necessitate using tools such as the canonical D-serine scavengers DAAO or DsdA, serine racemase KO mice etc. Similarly, authors provide no compelling evidence that endocannabinoid CB1 receptors involved in this pathway are located on astrocytes.

      Thank you for your insightful comments on our study. We appreciate your attention to detail and your concerns regarding our conclusions. We agree that further evidence is needed to establish the involvement of D-serine as the endogenous co-agonist of NMDARs at E-I synapses. We will take into consideration your suggestion of using tools such as D-serine scavengers to provide clearer evidence.

      Regarding the involvement of endocannabinoid CB1 receptors on astrocytes in this pathway, we provide evidence that astrocytic calcium signaling could be blocked by the CB1 receptor antagonist AM251, as shown in figure 3. However, we agree that further research is necessary to accurately identify the localization of CB1 receptors. We will note this limitation in our discussion and emphasize the need for additional studies to explore the precise location of CB1 receptors. In addition, we will endeavor to perform immunohistochemistry to determine whether CB1 receptors are located on astrocytes.

      Thank you once again for your valuable feedback. We will carefully address these concerns and make appropriate revisions to ensure the clarity and accuracy of our findings.

      6) An important caveat in this study is the protocol employed to induce LTP, which includes steps of sustained depolarization of the patched IN to -10mV. Neuronal depolarization is known to induce endocannabinoid production. In several instances, this was shown to 'activate' astrocytes and elicit the release of astrocyte-derived transmitters at nearby synapses. This implies that the endocannabinoid-dependent pathway described in the study is, most likely, artificially engaged by the protocol itself. Hence, the present work only provides evidence that an astrocyte-dependent, CB1-D-serine-pathway can be artificially called upon with this specific LTP protocol, but does not convincingly demonstrate that it is naturally occurring or necessary for plasticity at E-I synapses. Authors would need to thoroughly address this caveat by replicating some of their key findings (AM251, calcium-clamp, D-serine and CaMKII shRNA) using a protocol that does not entail the artificial depolarization of the patched interneuron.

      Thank you for raising this important point. We agree that the sustained depolarization protocol we used to induce LTP could potentially engage endocannabinoid signaling and astrocyte activation. However, our observation that preventing astrocyte Ca2+ transients or blocking endocannabinoid CB1 receptors prevented the induction of LTP by this depolarization protocol suggests that this astrocyte-endocannabinoid-dependent pathway is necessary.

      Importantly, synaptic depolarization of neurons can occur naturally during learning and memory. Though ‘artificial’ here, our protocol may mimic aspects of natural activity patterns that engage ‘endocannabinoid release’ and astrocyte involvement in plasticity.

      Another limitation of our study is that we currently cannot conclusively determine the source of the CB1. We cannot distinguish whether the CB1 originates from neurons or astrocytes based on our current experiments. We will explicitly acknowledge this caveat in the discussion, noting that further experiments are needed to clarify the cellular origin of the CB1. Thank you for drawing our attention to this critical issue - we will refine the manuscript accordingly to more comprehensively and accurately present the study conclusions and limitations. Your feedback helps improve the rigor of our research.

      7) Reading and understanding are hindered by a rather vast array of issues with the text itself. It needs thorough editing for typos, misnomers, meaning-altering errors in syntax, and a number of issues with English.

      Thank you very much for your review and feedback on our text. We highly appreciate your comments and take them seriously. We will carefully address the issues you mentioned and thoroughly edit the text to eliminate any typos, misnomers, syntax errors that may alter the meaning, and other English-related issues. We truly value your input and appreciate your patience as we work on these improvements.

      Reviewer #2 (Public Review):

      Summary:

      This work explores the implication of astrocytes in the regulation of long-term potentiation of excitatory synapses onto inhibitory neurons in CA1 hippocampus. They found that astrocytes of a sub-region of CA1 regulate this plasticity through their activation of endocannabinoids that lead to the release of the NMDA receptor co-agonist, D-serine.

      Strengths:

      The experiments are well considered and conceptualized, and use appropriate tools to explore the role of astrocytes in the tripartite synapse. The results highlight a novel role of astrocytes in an important aspect of the synaptic regulation of the hippocampal circuit. There are extensive levels of analysis for each experimental group of evidence.

      Thank you for your positive feedback on our study. We appreciate your recognition of the careful consideration and conceptualization of our experiments, as well as the use of appropriate tools to investigate the role of astrocytes in the tripartite synapse. We are pleased to hear that the results have highlighted a novel role of astrocytes in an important aspect of synaptic regulation in the hippocampal circuit.

      Thank you for taking the time to review our work and for providing such positive feedback. We will continue to improve and refine our study based on your valuable comments.

      Weaknesses:

      The authors underscore and use an oversimplified view of the heterogeneity of interneuron populations and their selective roles in the hippocampal network. Also, there is an uneven level of astrocyte-selective tools used in the different experiments which creates an uneven strength of arguments and conclusions regarding the role of glial cells. Finally, the wording used by the authors often leads to some confusion or sense of overinterpretation.

      We appreciate the reviewer raising these important points about the characterization of interneuron and astrocyte populations in our study. We agree that oversimplifying or overlooking cellular heterogeneity could undermine the conclusions. In the revised manuscript, we will:

      1) Add more detailed discussion of interneuron diversity. We will note this as an area for further study.

      2) Review the wording used when describing results and conclusions, ensuring we avoid overstating interpretations of the data.

      Thank you again for the thoughtful feedback.

    1. Author Response

      We thank the reviewers for truly valuable advice and comments. We have made multiple corrections and revisions to the original pre-print accordingly. Here we address 2 major points.

      1) Regarding the genetic association of the common COL11A1 variant rs3753841 (p.(Pro1335Leu)), we do not propose that it is the sole risk variant contributing to the association signal we detected and have clarified this in the manuscript. We concluded that it was worthy of functional testing for reasons described here. Although there were several common variants in the discovery GWAS within and around COL11A1, none were significantly associated with AIS and none were in linkage disequilibrium (R2>0.6) with the top SNP rs3753841. We next reviewed rare (MAF<=0.01) coding variants within the COL11A1 LD region of the associated SNP (rs3753841) in 625 available exomes representing 46% of the 1,358 cases from the discovery cohort. The LD block was defined using Haploview based on the 1KG_CEU population. Within the ~41 kb LD region (chr1:103365089-103406616, GRCh37) we found three rare missense mutations in 6 unrelated individuals (Author response table 1). Two of them (NM_080629.2:c.G4093A:p.A1365T; NM_080629.2:c.G3394A:p.G1132S), from two individuals, are predicted to be deleterious based on CADD and GERP scores and are plausible AIS risk candidates. At this rate we could expect to find only 4-5 individuals with linked rare coding variants in the total cohort of 1,358, which collectively are unlikely to explain the overall association signal we detected. Of course, there could also be deep intronic variants contributing to the association that we would not detect by our methods. However, given this scenario, the relatively high predicted deleteriousness of rs3753841 (CADD=25.7; GERP=5.75), and its occurrence in a Gly-X-Y triplet repeat, we hypothesized that this variant itself could be a risk allele worthy of further investigation.

      Author response table 1.
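
      The "4-5 individuals" extrapolation in point 1 can be checked with a short calculation. This is only a sketch: it assumes the estimate comes from scaling the two carriers of predicted-deleterious variants seen in the 625 screened exomes linearly to the full discovery cohort of 1,358 cases, which the response implies but does not spell out. The function and variable names are illustrative.

```python
def expected_carriers(observed: int, screened: int, cohort: int) -> float:
    """Scale an observed carrier count linearly from the screened subset to the cohort."""
    return observed / screened * cohort


EXOMES_SCREENED = 625    # exomes available from the discovery cohort
DISCOVERY_CASES = 1358   # total cases in the discovery cohort

# Assumption: the two individuals carrying predicted-deleterious variants
# are the basis of the "4-5 individuals" estimate.
deleterious_estimate = expected_carriers(2, EXOMES_SCREENED, DISCOVERY_CASES)

# Share of the cohort covered by exome data (stated as 46% in the response).
fraction_screened = EXOMES_SCREENED / DISCOVERY_CASES

print(f"Expected carriers in full cohort: {deleterious_estimate:.2f}")
print(f"Fraction of cohort screened: {fraction_screened:.0%}")
```

      Under this linear-scaling assumption the estimate falls between 4 and 5, in line with the figure quoted in the response.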

      We also appreciate the reviewer’s suggestion to perform a rare variant burden analysis of COL11A1. We conducted a pilot gene-based analysis in 4534 European ancestry exomes, including 797 of our own AIS cases and 3737 controls, and tested the burden of rare variants in COL11A1. The SKATO P value was not significant (COL11A1_P=0.18), but this could be due to lack of power and/or background from rare benign variants that could be screened out using the functional testing we have developed.

      2) Regarding functional testing, by knockdown/knockout cell culture experiments, we showed for the first time that Col11a1 negatively regulates Mmp3 expression in cartilage chondrocytes, an AIS-relevant tissue. We then tested the effect of overexpressing the human wt or variant COL11A1 by lentiviral transduction in SV40-transformed chondrocyte cultures. We deleted endogenous mouse Col11a1 by Cre recombination to remove the background of its strong suppressive effects on Mmp3 expression. We acknowledge that Col11a1 missense mutations could confer gain of function or dominant negative effects that would not be revealed in this assay. However, as indicated in our original manuscript, spinal deformity is described in the cho/cho mouse, a Col11a1 loss of function mutant. We also note the recent publication by Rebello et al. showing that missense mutations in Col11a2 associated with congenital scoliosis fail to rescue a vertebral malformation phenotype in a zebrafish col11a2 KO line. Although the connection between AIS and vertebral malformations is not altogether clear, we surmise that loss of the components of collagen type XI disrupts spinal development. In vivo experiments in vertebrate model systems are needed to fully establish the consequences and genetic mechanisms by which COL11A1 variants contribute to an AIS phenotype.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank both reviewers for their detailed and positive assessment of our work.

      To Reviewer #2, we have now explicated the pattern -- (QXQXQX>3)4 where X>3 denotes any length of three or more residues of any composition -- in the first paragraph of the discussion.
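
      As a rough illustration of the pattern described above, the motif can be written as a regular expression. This is a sketch under stated assumptions, not the authors' own method: it reads each plain X as exactly one residue of any identity, reads X>3 as a run of three or more residues of any composition (as the response defines it), and treats the trailing 4 as four consecutive repeats. The function name and test sequences are hypothetical.

```python
import re

# Assumed reading of (QXQXQX>3)4:
#   Q        -> glutamine
#   [A-Z]    -> a single X, any one residue (assumption)
#   [A-Z]{3,} -> X>3, three or more residues of any composition
#   {4}      -> the unit repeated four times in a row
Q_ZIPPER_MOTIF = re.compile(r"(?:Q[A-Z]Q[A-Z]Q[A-Z]{3,}){4}")


def has_motif(sequence: str) -> bool:
    """Return True if the sequence contains the assumed motif anywhere."""
    return Q_ZIPPER_MOTIF.search(sequence.upper()) is not None


# Hypothetical sequences for illustration only.
four_units = "QNQNQGGG" * 4    # four Q-X-Q-X-Q units, each followed by a 3-residue spacer
three_units = "QNQNQGGG" * 3   # one unit short of the pattern

print(has_motif(four_units))
print(has_motif(three_units))
```

      Because the spacer may have any composition, a pure polyQ tract of sufficient length also satisfies this reading of the pattern.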

      To Reviewer #3, we have made slight modifications to the text in the “Q zippers poison themselves” results section, to attempt to further clarify the mechanism of self-poisoning.

      Briefly, the reviewer questions if an alternative model -- where inhibition involves non-structured rather than Q-zipper containing oligomers -- better explains the data. We provided two lines of evidence that we believe exclude this alternative model. First, we point out in the first paragraph of the “Q zippers poison themselves” section that the cells that unexpectedly lack amyloid in the high concentration regime have negligible levels of AmFRET, indicating that the inhibitory oligomers themselves occur at low concentrations regardless of the total concentration, and are therefore limited by a kinetic barrier. Second, we point out in the third paragraph of the section that the severity of amyloid inhibition with respect to concentration has a sequence dependence that matches the expectation of converging phase boundaries for crystal polymorphs -- specifically, inhibition is most severe for sequences that have a local Q density just high enough to form a Q zipper on both sides of each strand. Inhibition is relaxed for sequences having more or fewer Qs than that threshold. In contrast, disordered oligomerization is not expected to have such a dependence on the precise pattern of Qs and Ns.


      The following is the authors’ response to the original reviews.

      We are pleased that the editors find our study valuable. We find that the reviewers’ criticisms largely arise from misunderstandings inherent to the conceptually challenging nature of the topic, rather than fundamental flaws, as we will elaborate here. We are grateful for the opportunity afforded by eLife to engage reviewers in what we intend to be a constructive public dialogue.

      Response to Reviewer 1

      This review is highly critical but lacks specifics. The reviewer’s criticisms reflect a position that seems to dismiss a critical role for (or perhaps even the existence of) conformational ordering in polyQ amyloid, which is untenable.

      The reviewer states that our objective to characterize the amyloid nucleus “rests on the assertion that polyQ forms amyloid structures to the exclusion of all other forms of solids”. We do not fully agree with this assertion because our findings show that detectable aggregation is rate-limited by conformational ordering, as evident by 1) its discontinuous relationship to concentration, 2) its acceleration by a conformational template, and 3) its strict dependence on very specific sequence features that are consistent with amyloid structure but not disordered aggregation.

      We strongly disagree with the reviewer’s subjective statement that we have not critically assessed our findings and that they do not stand up to scrutiny. This statement seems to rest on the perceived contradiction of our findings with that of Crick et al. 2013. Contrary to the reviewer’s assessment, we argue here that the conclusions of Crick et al. do more to support than to refute our findings. Briefly, Crick et al. investigated the aggregation of synthetic Q30 and Q40 peptides in vitro, wherein fibrils assembled from high concentrations of peptide were demonstrated to have saturating concentrations in the low micromolar range. As explained below, this finding of a saturating concentration does not refute our results. More relevant to the present work are their findings that “oligomers” accumulated over an hours-long timespan in solutions that are subsaturated with respect to fibrils, and these oligomers themselves have (nanomolar) critical concentrations. The authors postulated that the oligomers result from liquid–liquid demixing of intrinsically disordered polyglutamine. However, phase separation by a peptide is expected to fix its concentration in both the solute and condensed phases, and, because disordered phase separation is faster than amyloid formation, the postulated explanation removes the driving force for any amyloid phase with a critical solubility greater than that of the oligomers. In place of this interpretation that truly does appear to -- in the reviewer’s words -- “contradict basic physical principles of how homopolymers self-assemble”, we interpret these oligomers as evidence of Q zipper-containing self-poisoned multimers, rounded as an inherent consequence of self-poisoning (Ungar et al., 2005), and plausibly akin to semicrystalline spherulites that have been observed in other polymer crystal and amyloid-forming systems (Crist and Schultz, 2016; Vetri and Foderà, 2015). 
      Importantly, the physical parameters governing the transition between amyloid spherulites and fibrils have been characterized in the case of insulin (Smith et al. 2012), where it was found that spherulites form at lower protein concentrations than fibrils. This mirrors the observation by Crick et al. that fibrils have a higher solubility limit than the spherical oligomers.

      Further rebuttal to the perceived incompatibility of monomeric nucleation with the existence of a critical concentration for amyloid

      We appreciate that the concept of a monomeric nucleus can superficially appear inconsistent with the fact that crystalline solids such as polyQ amyloid have a saturating concentration, but this is only true if one neglects that polyQ amyloids are polymer crystals with intramolecular ordering. The perceived discrepancy is perhaps most easily dispelled by the fact that folded proteins can form crystals. These crystals have critical concentrations, and the protein subunits within them each have intramolecular crystalline order (in the form of secondary structure). When placed in a subsaturated solution, the protein crystals dissolve into the constituent monomers, and yet those monomers still retain intramolecular order. Our present findings for polyQ are conceptually no different.

      To further extrapolate this simple example to polyQ, one can also draw on the now well-established phenomenon of secondary nucleation, whereby transient interactions of soluble species with ordered species lead to their own ordering (Törnquist et al., 2018). Transience is important here because it implies that intramolecular ordering can in principle propagate even in solutions that are subsaturated with respect to bulk crystallization. This is possible in the present case because the pairing of sufficiently short beta strands (equivalent to “stems” in the polymer crystal literature) will be more stable intramolecularly than intermolecularly, due to the reduced entropic penalty of the former. Our elucidation that Q zipper ordering can occur with shorter strands intramolecularly than intermolecularly (Fig. S4C-D) demonstrates this fact. It is also evident from published descriptions of single molecule “crystals” formed in sufficiently dilute solutions of sufficiently long polymers (Hong et al., 2015; Keller, 1957; Lauritzen and Hoffman, 1960).

      In suggesting that a saturating concentration for amyloid rules out monomeric nucleation, the reviewer assumes that the Q zipper-containing monomer must be stable relative to the disordered ensemble. This is not inherent to our claim. The monomeric nucleating structure need not be more stable than the disordered state, and monomers may very well be disordered at equilibrium at low concentrations. To be clear, our claim requires that the Q zipper-containing monomer is both on pathway to amyloid and less stable than all subsequent species that are on pathway to amyloid. The former requirement is supported by our extensive mutational analysis. The latter requirement is supported by our atomistic simulations showing the Q zipper-containing monomer is stabilized by dimerization (included in our 2021 preprint). Hence, requisite ordering in the nucleating monomer is stabilized by intermolecular interactions. We provide in Author response image 1 an illustration to clarify what we believe to be the discrepancy between our claim and the reviewer’s interpretation.

      Author response image 1.

      That the rate-limiting fluctuation for a crystalline phase can occur in a monomer can also be understood as a consequence of Ostwald’s rule of stages, which describes the general tendency of supersaturated solutes, including amyloid forming proteins (Chakraborty et al., 2023), to populate metastable phases en route to more stable phases (De Yoreo, 2022; Schmelzer and Abyzov, 2017). Our findings with polyQ are consistent with a general mechanism for Ostwald’s rule wherein the relative stabilities of competing polymorphs differ with the number of subunits (De Yoreo, 2022; Navrotsky, 2004). As illustrated in Fig. 6 of Navrotsky, a polymorph that is relatively stable at small particle sizes tends to give way to a polymorph that -- while initially unstable -- becomes more stable as the particles grow. The former is analogous to our early stage Q zipper composed of two short sheets with an intramolecular interface, while the latter is analogous to the later stage Q zipper composed of longer sheets with an intermolecular interface. Subunit addition stabilizes the latter more than the former, hence the initial Q zipper that is stabilized more by intra- than intermolecular interactions will mature with growth to one that is stabilized more by intermolecular interactions.

      We have added a new figure (Fig. 6) to the manuscript to illustrate qualitative features of the amyloid pathway we have deduced for polyQ.

      Rebuttal to the perceived necessity of in vitro experiments

      The overarching concern of this reviewer and reviewing editor is whether in-cell assays can inform on sequence-intrinsic properties. We understand this concern. We believe however that the relative merit of in-cell assays is largely a matter of perspective. The truly sequence-intrinsic behavior of polyQ, i.e. in a vacuum, is less informative than the “sequence-intrinsic” behaviors of interest that emerge in the presence of extraneous molecules from the appropriate biological context. In vitro experiments typically include a tiny number of these -- water, ions, and sometimes a crowding agent meant to approximate everything else. Obviously missing are the myriad quinary interactions with other proteins that collectively round out the physiological solvent. The question is what experimental context best approximates that of a living human neuron under which the pathological sequence-dependent properties of polyQ manifest. We submit that a living yeast cell comes closer to that ideal than does buffer in a test tube.

      The reviewer’s statement that our findings must be validated in vitro ignores the fact -- stressed in our introduction -- that decades of in vitro work have not yet generated definitive evidence for or against any specific nucleus model. In addition to the above, one major problem concerns the large sizes of in vitro systems that obscure the effects of primary nucleation. For example, a typical in vitro experimental volume of, e.g., 1.5 ml is over one billion-fold larger than the femtoliter volume of a cell. This means that any nucleation-limited kinetics of relevant amyloid formation are lost, and any alternative amyloid polymorphs that have a kinetic growth advantage -- even if they nucleate at only a fraction of the rate of relevant amyloid -- will tend to dominate the system (Buell, 2017). Novel approaches are clearly needed to address these problems. We present such an approach, stretch it to the limit (as the reviewer notes) across multiple complementary experiments, and arrive at a novel finding that is fully and uniquely consistent with all of our own data as well as the collective prior literature.
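
      The volume comparison above can be checked with a short calculation. This is only an illustrative sketch: the 1.5 ml tube volume comes from the text, while the two cell volumes (1 fL and 100 fL) are assumed values chosen to bracket "the femtoliter volume of a cell", and the function name is hypothetical.

```python
# Unit conversions to liters.
LITERS_PER_ML = 1e-3
LITERS_PER_FL = 1e-15


def volume_fold_difference(tube_ml: float, cell_fl: float) -> float:
    """Return how many times larger a tube volume (ml) is than a cell volume (fL)."""
    return (tube_ml * LITERS_PER_ML) / (cell_fl * LITERS_PER_FL)


if __name__ == "__main__":
    # Assumed cell volumes; only the 1.5 ml figure is taken from the text.
    for cell_fl in (1.0, 100.0):
        fold = volume_fold_difference(1.5, cell_fl)
        print(f"cell of {cell_fl:g} fL: tube is {fold:.1e}-fold larger")
```

      Under either assumed cell volume the ratio exceeds one billion, in line with the statement above.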

      That the preceding considerations are collectively essential to understand relevant amyloid behavior is evident from recent cryoEM studies showing that in vitro-generated amyloid structures generally differ from those in patients (Arseni et al., 2022; Bansal et al., 2021; Radamaker et al., 2021; Schmidt et al., 2019; Schweighauser et al., 2020; Yang et al., 2022). This is highly relevant to the present discourse because each amyloid structure is thought to emanate from a different nucleating structure. This means that in vitro experiments have broadly missed the mark in terms of the relevant thermodynamic parameters that govern disease onset and progression. Note that the rules laid out via our studies are not only consistent with structural features of polyQ amyloid in cells, but also (as described in the discussion) explain why the endogenous structure of a physiologically relevant Q zipper amyloid differs from that of polyQ.

      A recent collaboration between the Morimoto and Knowles groups (Sinnige et al.) investigated the kinetics of aggregation by Q40-YFP expressed in C. elegans body wall muscle cells, using quantitative approaches that have been well established for in vitro amyloid-forming systems of the type favored by the reviewer. They calculate a reaction order of just 1.6, slightly higher than what would be expected for a monomeric nucleus but nevertheless fully consistent with our own conclusions when one accounts for the following two aspects of their approach. First, the polyQ tract in their construct is flanked by short poly-Histidine tracts on both sides. These charges very likely disfavor monomeric nucleation because all possible configurations of a four-stranded bundle position the beginning and end of the Q tract in close proximity, and Q40 is only just long enough to achieve monomeric nucleation in the absence of such destabilization. Second, the protein is fused to YFP, a weak homodimer (Landgraf et al., 2012; Snapp et al., 2003). With these two considerations, our model -- which was generated from polyQ tracts lacking flanking charges or an oligomeric fusion -- predicts that amyloid nucleation by their construct will occur more frequently as a dimer than a monomer. Indeed, their observed reaction order of 1.6 supports a predominantly dimeric nucleus. Like us and others, Sinnige et al. did not observe phase separation prior to amyloid formation. This is important because it not only argues against nucleation occurring in a condensate, it also suggests that the reaction order they calculated has not been limited by the concentration-buffering effect of phase separation.

      While we agree that our conclusions rest heavily on DAmFRET data (for good reason), we do provide supporting evidence from molecular dynamics simulations, SDD-AGE, and microscopy.

      To summarize, given the extreme limitations of in vitro experiments in this field, the breadth of our current study, and supporting findings from another lab using rigorous quantitative approaches, we feel that our claims are justified without in vitro data.

      Rebuttals to other critiques

      We do not deny that flanking domains can modulate the kinetics and stability of polyQ amyloid. However, as stated and referenced in the introduction, they do not appear to change the core structure. We have also added a paragraph concerning flanking domains to the discussion, and acknowledged that “the extent to which our findings will translate in these different contexts remains to be determined.” Nevertheless, that the intrinsic behavior of the polyQ tract itself is central to pathology is evident from the fact that the nine pathologic polyQ proteins have similar length thresholds despite different functions, flanking domains, interaction partners, and expression levels.

      The reviewer states that we found nucleation potential to require 60 Qs in a row. Our data are collectively consistent with nucleation occurring at and above approximately 36 Qs, a point repeated in the paper. The reviewer may be referring to our statement, “Sixty residues proved to be the optimum length to observe both the pre- and post-nucleated states of polyQ in single experiments”. The purpose of this statement is simply to describe the practical consideration that led us to use 60 Qs for the bulk of our assays. We do appreciate that the fraction of AmFRET-positive cells is very low for lengths just above the threshold, especially Q40. It is nevertheless highly significant (p = 0.004 in [PIN+] cells, one-tailed t-test), and we have modified the figure and text to clarify this.

      The reviewer characterizes self-poisoning as the hallmark of crystallization from polymer melts, which would be problematic for our conclusions if self-poisoning were limited to this non-physiological context. In fact the term was first used to describe crystallization from solution (Organ et al., 1989), wherein the phenomenon is more pronounced (Ungar et al., 2005).

      Response to Reviewer 2

      We thank the reviewer for their detailed and helpful critique.

      The reviewer correctly notes that the majority of our manipulations were conducted with 60-residue long tracts (which corresponds to disease onset in early adulthood), and this length facilitates intramolecular nucleation. However, we also analyzed a length series of polyQ spanning the pathological threshold, as well as a synthetic sequence designed explicitly to test the model nucleus structure with a tract shorter than the pathological threshold, and both experiments corroborate our findings.

      The reviewer mentions “several caveats” that come with our result, but their subsequent elaboration suggests they are to be interpreted more as considerations than caveats. We agree that increasing sequence complexity will tend to increase homogeneity, but this is exactly the motivation of our approach. We explicitly set out to determine the minimal complexity sequence sufficient to specify the nucleating conformation, which we ultimately identified in terms of secondary and tertiary structure. We do not specify which parts of a long polyQ tract correspond to which parts of the structure, because, as the reviewer points out, they can occur at many places. Hence, depending on the length of the polyQ tract, the nucleus we describe may have any length of sequence connecting the strand elements. We do not think that the effects of N-residue placement can be interpreted as a confounding influence on hairpin position because the striking even-odd pattern we observe implicates the sides of beta strands rather than the lengths. Moreover, we observe this pattern regardless of the residue used (Gly, Ser, Ala, and His in addition to Asn).

      We thank the reviewer for noting the novelty and plausibility of the self-poisoning connection. We would like to elaborate on our finding that self-poisoning inhibits nucleation (in addition to elongation), as this will be confusing to many readers. While self-poisoning is claimed to inhibit primary nucleation in the polymer crystal literature (Ungar et al., 2005; Zhang et al., 2018), the semantics of “nucleation” in this context warrants clarification. Technically, the same structure can be considered a nucleus in one context but not in another. The Q zipper monomer, even if it is rate-limiting for amyloid formation at low concentrations (and is therefore the “nucleus”), is not necessarily rate-limiting when self-poisoned at high concentrations. Whether it comprises the nucleus in this case depends on the rates of Q zipper formation relative to subunit addition to the poisoned state. If the latter happens slower than Q zipper formation de novo, it can be said that self-poisoning inhibits nucleation, regardless of whether the Q zipper formed. We suspect this to be the mechanism by which preemptive oligomerization blocks nucleation in the case of polyQ, though other mechanisms may be possible.

      We believe the revised text also now incorporates the remaining suggestions of this reviewer, with two exceptions. 1) We retain the phrase “hidden pattern”, because we believe our data argue for a nucleus whose formation requires that Qs occur in a pattern that we now elaborate as (QXQXQX_{>3})_4, where X_{>3} denotes any length of three or more residues of any composition. In amyloids formed from long polyQ molecules, the nucleus will involve any subset of 12 Qs that match this pattern. 2) We decided not to re-order the manuscript to discuss self-poisoning after establishing the monomer nucleus (even though we agree that doing so would improve the logical flow) because the interpretation of the data with respect to self-poisoning helps to establish critical strand lengths, and self-poisoning creates an anomaly in the DAmFRET data that is difficult to ignore. We add text clarifying that high local concentrations “effectively shifts the rate-limiting step to the growth of a higher order relatively-disordered species”.
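
      For readers who wish to scan sequences for the pattern described above, it can be written as a regular expression. The sketch below is a literal reading and not the authors' analysis code: it assumes each X between Qs is exactly one residue of any type, that the trailing X is a linker of three or more residues, and that the unit repeats four times (giving 12 patterned Qs). The function name and the example sequences are hypothetical.

```python
import re

# One unit: Q, any residue, Q, any residue, Q, then a linker of >= 3 residues.
# Four consecutive units are required. Read literally, this also requires
# a linker after the fourth strand; the text does not say whether that
# final linker is needed, so treat it as an assumption of this sketch.
NUCLEUS_PATTERN = re.compile(r"(?:Q.Q.Q.{3,}){4}")


def has_nucleus_pattern(sequence: str) -> bool:
    """Return True if the sequence contains a match to the Q pattern anywhere."""
    return NUCLEUS_PATTERN.search(sequence) is not None


if __name__ == "__main__":
    examples = {
        "Q60": "Q" * 60,                    # uninterrupted polyQ
        "alternating": "QNQNQNNN" * 4,      # Qs at every other position
        "every third": "QNNQNNQNNN" * 4,    # Qs never two positions apart
        "Q20": "Q" * 20,                    # too short for four units
    }
    for name, seq in examples.items():
        print(name, has_nucleus_pattern(seq))
```

      Because the linker may have any composition, an uninterrupted polyQ tract of sufficient length matches at many offsets, consistent with the statement that the nucleus may involve any subset of 12 Qs that fit the pattern.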

      Response to Reviewer 3

      We thank the reviewer for their helpful comments.

      We opted to retain Figures 1A and B because we think they are important for comprehending the subject and objectives of the study. We modified the former to attempt to make it more clear. We have also elaborated on DAmFRET as it is a relatively new approach that may be unfamiliar to many readers. Beyond this, we refer the reviewer and readers to our cited prior work describing the theory and interpretation of DAmFRET. Note that the y-axes of DAmFRET plots are not raw FRET but rather “AmFRET”, a ratio of FRET to total expression level. As explained thoroughly in our cited prior work, the discontinuity of AmFRET with expression level indicates that the high AmFRET-population formed via a disorder-to-order transition. When the query protein is predicted to be intrinsically disordered, the discontinuous transition to high AmFRET invariably (among hundreds of proteins tested in prior published and unpublished work) signifies amyloid formation as corroborated by SDD-AGE and tinctorial assays.
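
      The normalization mentioned above, AmFRET as a ratio of FRET to total expression level, can be sketched in a few lines. This is a simplified illustration rather than the published DAmFRET pipeline; the function name, argument names, and numbers are hypothetical.

```python
def amfret(fret_signal: float, expression_level: float) -> float:
    """Return the FRET signal normalized to total expression level for one cell."""
    if expression_level <= 0:
        raise ValueError("expression_level must be positive")
    return fret_signal / expression_level


if __name__ == "__main__":
    # Two hypothetical cells with the same raw FRET but different expression:
    # the normalized value is lower in the cell that expresses more protein.
    print(amfret(50.0, 100.0))
    print(amfret(50.0, 400.0))
```

      Normalizing in this way is what allows cells with very different expression levels to be compared on one plot.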

      When performed using standard flow cytometry as in the present study, every AmFRET measurement corresponds to a cell-wide average, and hence does not directly inform on the distribution of the protein between different stoichiometric species. As there is only one fluorophore per protein molecule, monomeric nuclei have no signal. DAmFRET can distinguish cells expressing monomers from stable dimers from higher order oligomers (see e.g. Venkatesan et al. 2019), and we are therefore quite confident that AmFRET values of zero correspond to cells in which a vast majority of the respective protein is not in homo-oligomeric species (i.e. is monomeric or in hetero-complexes with endogenous proteins). The exact value of AmFRET, even for species with the same stoichiometry, will depend both on the effect of their respective geometries on the proximity of mEos3.1 fluorophores, and on the fraction of protein molecules in the species. Hence, we only attempt to interpret the plateau values of AmFRET (where the fraction of protein in an assembled state approaches unity) as directly informing on structure, as we did in Fig. S3A.

      We believe that AmFRET decreases with longer polyQ because the mass fraction of fluorophore decreases in the aggregate, simply because the extra polypeptide takes up volume in the aggregate.

      Yes, the fraction of positive cells in a discontinuous DAmFRET plot does increase with time. However, given the more laborious data collection and derivation of nucleation kinetics in a system with ongoing translation, especially across hundreds of experiments with other variables, ours is a snapshot measurement to approximately derive the relative contributions of intra- and intermolecular fluctuations to the nucleation barrier, rather than the barrier’s magnitude.

      We have revised the tautological statement by removing “non-amyloid containing”.

      Concerning the correlation of our data with the pathological length threshold -- as we state in the first results section, “Our data recapitulated the pathologic threshold -- Q lengths 35 and shorter lacked AmFRET, indicating a failure to aggregate or even appreciably oligomerize, while Q lengths 40 and longer did acquire AmFRET in a length and concentration-dependent manner”. Hence, most of our experiments were conducted with 60Q not because it resembles the pathological threshold, but rather because it was most convenient for DAmFRET experiments.

      Self-poisoning is a widely observed and heavily studied phenomenon in polymer crystal physics, though it seems not yet to have entered the lexicon of amyloid biologists. We were new to this concept before it emerged as an extremely parsimonious explanation for our results. As described in the text, two pieces of evidence exclude the alternative mechanism suggested by the reviewer -- that non-structured oligomers form and subsequently engage and inhibit the template. Specifically, 1) inhibition occurs without any detectable FRET, even at high total protein concentration, indicating the species do not form in a concentration-dependent manner that would be expected of disordered oligomers; and 2) inhibition itself has strict sequence requirements that match those of Q zippers. Hence our data collectively suggest that inhibition is a consequence of the deposition of partially ordered molecules onto the templating surface.

      We have softened the subheading and text of the relevant section in the discussion to more clearly indicate the speculative nature of our statements concerning the possible role of self-poisoned oligomers in toxicity.

      We stand by our statement 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', as this follows directly from self-poisoning.

      Regarding the arguments for lateral and axial growth, we agree that the data are indirect. However, that polyQ forms lamellar amyloids both in vitro and in vivo is now established, so we do not feel it necessary to rigorously show that here. Nevertheless, we need to include this section primarily because it introduces the fact that ordering in polyQ amyloid occurs in the lateral as well as axial dimensions, and the onset of lateral ordering (lamellar growth) explains the very different behaviors of QU and QB sequences apparent on the DAmFRET plots. Ultimately, the two dimensions of growth are important to understand self-poisoning and maturation of the short nucleating zipper to amyloid.

      References

      Arseni D, Hasegawa M, Murzin AG, Kametani F, Arai M, Yoshida M, Ryskeldi-Falcon B. 2022. Structure of pathological TDP-43 filaments from ALS with FTLD. Nature 601:139–143. doi:10.1038/s41586-021-04199-3

      Bansal A, Schmidt M, Rennegarbe M, Haupt C, Liberta F, Stecher S, Puscalau-Girtu I, Biedermann A, Fändrich M. 2021. AA amyloid fibrils from diseased tissue are structurally different from in vitro formed SAA fibrils. Nat Commun 12:1013. doi:10.1038/s41467-021-21129-z

      Buell AK. 2017. The Nucleation of Protein Aggregates - From Crystals to Amyloid Fibrils. Int Rev Cell Mol Biol 329:187–226. doi:10.1016/bs.ircmb.2016.08.014

      Chakraborty D, Straub JE, Thirumalai D. 2023. Energy landscapes of Aβ monomers are sculpted in accordance with Ostwald’s rule of stages. Sci Adv 9:eadd6921. doi:10.1126/sciadv.add6921

      Crist B, Schultz JM. 2016. Polymer spherulites: A critical review. Prog Polym Sci 56:1–63. doi:10.1016/j.progpolymsci.2015.11.006

      De Yoreo JJ. 2022. Casting a bright light on Ostwald’s rule of stages. Proc Natl Acad Sci USA 119. doi:10.1073/pnas.2121661119

      Hong Y, Yuan S, Li Z, Ke Y, Nozaki K, Miyoshi T. 2015. Three-Dimensional Conformation of Folded Polymers in Single Crystals. Phys Rev Lett 115:168301. doi:10.1103/PhysRevLett.115.168301

      Keller A. 1957. A note on single crystals in polymers: Evidence for a folded chain configuration. Philosophical Magazine 2:1171–1175. doi:10.1080/14786435708242746

      Landgraf D, Okumus B, Chien P, Baker TA, Paulsson J. 2012. Segregation of molecules at cell division reveals native protein localization. Nat Methods 9:480–482. doi:10.1038/nmeth.1955

      Lauritzen JI, Hoffman JD. 1960. Theory of Formation of Polymer Crystals with Folded Chains in Dilute Solution. J Res Natl Bur Stand A Phys Chem 64A:73–102. doi:10.6028/jres.064A.007

      Navrotsky A. 2004. Energetic clues to pathways to biomineralization: precursors, clusters, and nanoparticles. Proc Natl Acad Sci USA 101:12096–12101. doi:10.1073/pnas.0404778101

      Ohhashi Y, Ito K, Toyama BH, Weissman JS, Tanaka M. 2010. Differences in prion strain conformations result from non-native interactions in a nucleus. Nat Chem Biol 6:225–230. doi:10.1038/nchembio.306

      Organ SJ, Ungar G, Keller A. 1989. Rate minimum in solution crystallization of long paraffins. Macromolecules 22:1995–2000. doi:10.1021/ma00194a078

      Radamaker L, Baur J, Huhn S, Haupt C, Hegenbart U, Schönland S, Bansal A, Schmidt M, Fändrich M. 2021. Cryo-EM reveals structural breaks in a patient-derived amyloid fibril from systemic AL amyloidosis. Nat Commun 12:875. doi:10.1038/s41467-021-21126-2

      Sahoo B, Singer D, Kodali R, Zuchner T, Wetzel R. 2014. Aggregation behavior of chemically synthesized, full-length huntingtin exon1. Biochemistry 53:3897–3907. doi:10.1021/bi500300c

      Schmelzer JWP, Abyzov AS. 2017. How do crystals nucleate and grow: Ostwald’s rule of stages and beyond. In: Šesták J, Hubík P, Mareš JJ, editors. Thermal Physics and Thermal Analysis, Hot Topics in Thermal Analysis and Calorimetry. Cham: Springer International Publishing. pp. 195–211. doi:10.1007/978-3-319-45899-1_9

      Schmidt M, Wiese S, Adak V, Engler J, Agarwal S, Fritz G, Westermark P, Zacharias M, Fändrich M. 2019. Cryo-EM structure of a transthyretin-derived amyloid fibril from a patient with hereditary ATTR amyloidosis. Nat Commun 10:5008. doi:10.1038/s41467-019-13038-z

      Schweighauser M, Shi Y, Tarutani A, Kametani F, Murzin AG, Ghetti B, Matsubara T, Tomita T, Ando T, Hasegawa K, Murayama S, Yoshida M, Hasegawa M, Scheres SHW, Goedert M. 2020. Structures of α-synuclein filaments from multiple system atrophy. Nature 585:464–469. doi:10.1038/s41586-020-2317-6

      Snapp EL, Hegde RS, Francolini M, Lombardo F, Colombo S, Pedrazzini E, Borgese N, Lippincott-Schwartz J. 2003. Formation of stacked ER cisternae by low affinity protein interactions. J Cell Biol 163:257–269. doi:10.1083/jcb.200306020

      Törnquist M, Michaels TCT, Sanagavarapu K, Yang X, Meisl G, Cohen SIA, Knowles TPJ, Linse S. 2018. Secondary nucleation in amyloid formation. Chem Commun 54:8667–8684. doi:10.1039/c8cc02204f

      Ungar G, Putra EGR, de Silva DSM, Shcherbina MA, Waddon AJ. 2005. The Effect of Self-Poisoning on Crystal Morphology and Growth Rates. In: Allegra G, editor. Interphases and Mesophases in Polymer Crystallization I, Advances in Polymer Science. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 45–87. doi:10.1007/b107232

      Vetri V, Foderà V. 2015. The route to protein aggregate superstructures: Particulates and amyloid-like spherulites. FEBS Lett 589:2448–2463. doi:10.1016/j.febslet.2015.07.006

      Wild EJ, Boggio R, Langbehn D, Robertson N, Haider S, Miller JRC, Zetterberg H, Leavitt BR, Kuhn R, Tabrizi SJ, Macdonald D, Weiss A. 2015. Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington’s disease patients. The Journal of Clinical Investigation.

      Yang Y, Arseni D, Zhang W, Huang M, Lövestam S, Schweighauser M, Kotecha A, Murzin AG, Peak-Chew SY, Macdonald J, Lavenir I, Garringer HJ, Gelpi E, Newell KL, Kovacs GG, Vidal R, Ghetti B, Ryskeldi-Falcon B, Scheres SHW, Goedert M. 2022. Cryo-EM structures of amyloid-β 42 filaments from human brains. Science 375:167–172. doi:10.1126/science.abm7285

      Zhang X, Zhang W, Wagener KB, Boz E, Alamo RG. 2018. Effect of Self-Poisoning on Crystallization Kinetics of Dimorphic Precision Polyethylenes with Bromine. Macromolecules 51:1386–1397. doi:10.1021/acs.macromol.7b02745

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The current manuscript provides a timely contribution to the ongoing discussion about the mechanism of the apical sodium/bile acid transporters (ASBTs). Recent structures of the mammalian ASBT transporters exhibited a substrate binding mode with few interactions with the core domain (classically associated with substrate binding), prompting an unusual proposal for the transport mechanism. Early structures of ASBT homologues from bacteria also exhibit unusual substrate binding in which the core substrate binding domain is less engaged than expected. Due to the ongoing questions of how substrate binding and mechanism are linked in these transporters, the authors set out to deepen our understanding of a model ASBT homolog from the bacterium N. meningitidis (ASBT-NM).

      The premise of the current paper is that the bacterial ASBT homologs are probably not physiological bile acid transporters, and that structural elucidation of a natively transported substrate might provide better mechanistic information. In the current manuscript, the authors revisit the first BASS homologue to be structurally characterized, ASBT-NM. Based on bacteriological assays in the literature, the authors identify the coenzyme A precursor pantoate as a more likely substrate for ASBT-NM than taurocholate, the substrate in the original structure. A structure of ASBT-NM with pantoate exhibits interesting differences in structure. The structures are complemented with MD simulations, and the authors propose that the structures are consistent with a classical elevator transport mechanism.

      The structural experiments are generally solid, although showing omit maps would bolster the identification of the substrate binding site.

      We have added an omit map in Fig S2.

      One shortcoming is that, although pantoate binding is observed, the authors do not show transport of this substrate, undercutting the argument that the pantoate structure represents binding of a "better" or more native substrate. Mechanistic proposals, like the proposed role of T112 in unlocking the transporter, would be much better supported by transport data.

      In the absence of being able to source radiolabelled pantoate at a reasonable cost, we decided to focus on binding studies, relying on the fact that pantoate/pyruvate uptake has been shown in other BASS transporters. While we agree that transport needs to be substantiated, our crystallographic and molecular dynamics studies combined provide a picture of sodium ions stabilising the substrate binding site to enable the binding of the substrate, which in turn induces further conformational changes. Such changes would be consistent with a mechanism of sodium driven transport with clear coupling of the sodium ions to substrate translocation. We are not saying this is a “better” substrate but rather that a substrate binding like this would be able to elicit the conformational changes necessary for transport – something that has been missing from previous studies.

      Reviewer #2 (Public Review):

      The manuscript starts with a demonstration of pantoate binding to ASBTnm using a thermostability assay and ITC, and follows with structure determinations of ASBTnm with or without pantoate. The structure of ASBTnm in the presence of pantoate pinpoints the binding site of pantoate to the "crossover" region formed by the partially unwound helices TMs 4 and 9. Binding of pantoate induces modest movements of side chain and backbone atoms at the crossover region that are consistent with providing coordination of the substrate. The structures also show movement of TM1 that opens the substrate binding site to the cytosol and mobility of loops between the TMs. MD simulations of the ASBT structure embedded in a lipid bilayer suggest a stabilizing effect of the two sodium ions that are known to co-transport with the substrate. A binding study on pantoate analogs further demonstrates the specificity of pantoate as a substrate.

      The weakness of the manuscript includes a lack of transport assay for pantoate and a lack of demonstration that the observed conformational changes in TM1 and the loops are relevant to the binding or transport of pantoate.

      We agree that the manuscript would have been bolstered by transport data (see response to reviewer 1). The take-home message from the movement of TM1 and the loops is that they are flexible. It is probably unlikely that TM1 moves like this during the transport cycle and we have avoided overplaying the significance of this movement. Instead, we have focussed on the conformational changes in the pantoate binding site. We have made an additional movie concentrating on the binding site and not including TM1.

      Overall, the structural, functional and computational studies are solid and rigorous, and the conclusions are well justified. In addition, the authors discussed the significance of the current study in a broader perspective relevant to recent structures of mammalian BASS members.

      Reviewer #3 (Public Review)

      The manuscript describes new ligand-bound structures within the larger bile acid sodium symporter family (BASS). This is the primary advance in the manuscript, together with molecular simulations describing how sodium and the bile acids sit in the structure when thermalized. What I think is fairly clear is that the ligands are more stable when the sodiums are present, with a marked reduction in RMSD over the course of repeated trajectories. This would be consistent with a transport model where sodium ions bind first, and then the bile acid binds, followed by a conformational change to another state where the ligands unbind.

      While the authors mention that BASS transporters are thought to undergo an elevator transport mechanism, this is not tested here. In my reading, all the crystal structures describe the same conformational state, and the simulations do not make an attempt to induce a transition on accessible simulation timescales. Instead, there is a morph between two states where different substrates are bound, which induces a conformational change that looks unrelated to the transport cycle.

      To make our conclusions clearer we have added another movie showing a morph between the structure without substrate (instead of using the structure with taurocholate, which we were using as a representative of the unbound structure) and that with pantoate and have omitted the panel domain including TM1. While both of these structures are inward-facing, there are significant conformational changes within TM4 that we have described in the article.

      Instead, the focus is on what kinds of substrates bind to this transporter, interrogating this with isothermal calorimetry together with mutations. With a Kd in the micromolar range, even the best binder, pantoate, actually isn't a particularly tight binder in the pharmaceutical sense. For a transporter, tight binding is not actually desirable, since the substrate needs to be able to leave after conformational change places it in a position accessible to the other side.

      As the referee points out, the Kd that we observe is consistent with those for substrates of other transporters.
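
      The reviewer's point about binding strength can be made concrete with the standard relation between a dissociation constant and the binding free energy, ΔG° = RT ln(Kd / 1 M). The following is a minimal sketch only: the Kd values are illustrative round numbers, not measurements from the manuscript, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) for a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd / 1 M); more negative means tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the ITC data).
    for label, kd in [("micromolar binder", 1e-6), ("nanomolar binder", 1e-9)]:
        print(f"{label}: Kd = {kd:.0e} M, dG = {binding_free_energy(kd):.2f} kcal/mol")
```

      Under these assumptions, each tenfold decrease in Kd corresponds to roughly 1.4 kcal/mol of additional binding free energy at room temperature, which is why a micromolar substrate can still be released readily after the conformational change.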

      There is one really important point that readers and authors should be aware of. In Figure 2A, the names are not consistent with the chemical structure. "-ate" denotes when a carboxylic acid is in the deprotonated form, creating a charged carboxylate. What is drawn is pantoic acid, ketopantoic acid, and pantothenic acid. Less importantly, the wedges and hashes for the methyl group are arguably not appropriate, since the carbon they are attached to is not a chiral center. For the crystallization, this makes no difference, since at near-neutral pH the carboxylic acid will spontaneously deprotonate, and the carboxylate form will be the most common. However, if the structures in Figure 2A were used for classical molecular simulation, that would be a big problem, since now that would be modeling the much rarer neutral form rather than the charged state. I am reasonably sure based on Figure 5 that the MD correctly modeled the deprotonated form with a carboxylate, but that is inconsistent with Figure 2A. Otherwise, the structure and simulation analysis falls into the mainstream of modern structural biology work.

      We have corrected the inconsistency of the protonation state in the naming of the molecular structures. Thank you for pointing this out – though the names represented the predominant form in solution, the more aesthetically pleasing protonated form got the better of us in our representations. The correct form was used in the MD.
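
      The claim that the carboxylate dominates in solution follows from the Henderson–Hasselbalch relation. The sketch below is illustrative only: the pKa of 4.0 is an assumed, typical value for a carboxylic acid, not a value reported for pantoic acid in the manuscript, and the function name is hypothetical.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Fraction of an acid present as its conjugate base at a given pH.

    Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa), so the
    deprotonated fraction is 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    ASSUMED_PKA = 4.0  # assumption: generic carboxylic acid, not a measured value
    for ph in (4.0, 7.0):
        frac = fraction_deprotonated(ph, ASSUMED_PKA)
        print(f"pH {ph}: {100 * frac:.1f}% carboxylate")
```

      With this assumed pKa, about 99.9% of the ligand is in the charged carboxylate form at pH 7, which is why the "-ate" names and the deprotonated MD model are the appropriate choice.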

      Reviewer #1 (Recommendations For The Authors):

      1) Omit maps (Fo-Fc) should be shown for pantoate and for the sodiums in the structure.

      This has been added to supplementary Figure 2.

      2) Line 86 - could you briefly describe the alternative mechanism proposed for the mammalian NTPCs?

      We have added an extra line to describe this deviation from the classical alternating access model.

      3) Line 124 - where is the lipid like molecule, and does it interact with either the kinked helix or the substrate? A supplemental figure would be helpful.

      The lipid like molecule lies between the substrate and the kinked helix, but doesn’t interact strongly with either. It would appear that the lipid would bind in the crevice rather than causing the crevice. We add Author response image 1 here but have not added it to the supplementary figures. The maps and PDB file are available for download.

      Author response image 1.

      The 2mFo-DFc density is at 1σ, the mFo-DFc density is at 2.5σ.

      4) I notice that the apo and pantoate structures are crystallized in different space groups. How does this compare to the original TCH structure? Is there any chance that crystal packing is altering the TM1 geometry or loop 1?

      We cannot rule out the effect of the crystallisation conditions on the movement of the TM1. We have now solved a number of different structures of ASBTNM and this is the first time we observe TM1 in this conformation. As stated above we have refrained from overplaying the significance of the movement of TM1 to transport, other than to say that some adjustments need to be made to accommodate the pantoate.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      Pg 3, "... with a 5-fold inverted repeat...", Should be 2-fold?

      Changed, thank you.

      Reviewer #3 (Recommendations For The Authors):

      Is there any chance that the MD simulations (even in a reduced form) could be uploaded to Zenodo or a similar repository?

      We have taken up this suggestion and added the information in the paper: MD trajectories in the GROMACS XTC format were deposited in the OSF.io repository under DOI 10.17605/OSF.IO/KFDT5 under the open CC-BY Attribution 4.0 International license. The trajectories contain all atoms and were subsampled at 5-ns intervals. GROMACS run input files (TPR format) and initial coordinate files (GRO format) together with topology files (GROMACS format) are also included.

      Watch the "Å" symbol in Figures 5, S6, S7. This looks like they were made in matplotlib, and probably used something like: "$\AA$", which puts the symbol in math mode. This makes the Å symbol in italics. Matplotlib has gotten better UTF-8 support.

      Changed, thank you.
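
      The reviewer's suggestion can be sketched as follows. This is a minimal illustration, not the authors' plotting script: the helper names, the output file name, and the plotted values are all made up. The idea is to pass the Unicode character U+00C5 directly in the label string rather than the math-mode form "$\AA$".

```python
ANGSTROM = "\u00c5"  # U+00C5, LATIN CAPITAL LETTER A WITH RING ABOVE


def axis_label(quantity: str, math_mode: bool = False) -> str:
    """Build an axis label such as 'RMSD (Å)'.

    math_mode=True reproduces the form the reviewer suspects was used
    ("$\\AA$"), which matplotlib typesets through its math renderer.
    """
    unit = r"$\AA$" if math_mode else ANGSTROM
    return f"{quantity} ({unit})"


def demo_plot(path: str = "rmsd_demo.png") -> None:
    """Save a small example figure using the plain Unicode label."""
    # Imported here so the label helper works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 5, 10, 15], [0.0, 1.2, 1.5, 1.4])  # made-up values
    ax.set_xlabel("Time (ns)")
    ax.set_ylabel(axis_label("RMSD"))
    fig.savefig(path)
    plt.close(fig)
```

      Calling demo_plot() writes a figure whose y-axis label uses the ordinary text font for the Å symbol.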

      Your citation for LINCS duplicates the citation for PME. I think you want the Hess 1997 paper. 10.1002/(SICI)1096-987X(199709)18:12<1463::AID-JCC4>3.0.CO;2-H

      Changed, thank you.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors performed a meta-analysis of GC concentrations and metabolic rates in birds and mammals. They found close associations for all studies showing a positive association between these two traits. As GCs have been viewed with close links to "stress," authors suggest that this overlooks the importance of metabolism and perhaps GC variation does not relate to "stress" per se but an increase in metabolism instead.

      This is an important meta-analysis: although most researchers acknowledge the link between GCs and metabolism, metabolism is often overlooked in studies. The field of conservation physiology is especially focused on GCs being a "stress" hormone, which overlooks the importance of GCs in mediating energy balance, i.e., an animal that has high GC concentrations may not be doing that poorly compared to an animal with low GC concentrations, it might just be expending more energy, e.g., caring for young. The results, with overwhelming directionality and strong effect sizes, support the link for a positive association with these two variables.

      My main concern lies in that most of the studies come from a few labs, therefore there may be limited data to test this relationship. I would include lab as a random effect to see how strong this effect might be.

      We think this is a good point, and we ran the main models included in the manuscript including Lab as random effect (N= 35 experiments, 21 studies, 16 labs). This did not affect the results, leading to negligible changes in the model parameters (alternative model tables are shown in Author response table 1 and 2). In the revised version of the manuscript we mention that we tested the effect of Lab but did not keep this variable in the models (lines 183-185).

      Author response table 1.

      Meta regression model testing the association between metabolic rate (MR) effect sizes and glucocorticoid effect sizes.

      Author response table 2.

      Meta regression model (quantitative approach) testing the effect of (a) Taxa, (b) Before / after effect, (c) Experiment / control effect, (d) Use of Metabolic Rate or Heart Rate as metabolic variable and (e) Treatment type, on the association between metabolic rate (MR) and glucocorticoid effect sizes across studies.

      Furthermore, I would like to see a test of the directionality of the two variables. Authors suggest that changes in metabolism affect GC levels but likely changes in GC levels would affect metabolism. Why not look into studies that have altered GC levels experimentally and see the effect on metabolism? Based on the close link, authors suggest that GCs may not play a role outside of "stress" beyond the stressor's effect on metabolic rate. However, if they were to investigate manipulations of GCs on metabolic rate, the link may or may not be there, which would be interesting to look at. I firmly believe that GCs are tightly linked to metabolism; however, I also think that GCs have a range of effects outside of metabolism as well, depending on the course and strength of the stressor.

      The directionality of the two variables is indeed a question of interest – we show that changes in metabolic rate affect GCs, but does the reverse also happen? In the schematic model we propose in Box 1, we propose that the effect is uni-directional, i.e. metabolic rate affects GC-levels, but GCs have no direct effect on metabolic rate. We note that there may however be an indirect effect, in that in the absence of a GC-response to an increase in metabolic rate the organism would after some time no longer be able to fuel the metabolic rate. Because we anticipate that more readers may raise this question, we have added the following paragraph to the discussion:

      “We selected studies in which experimental treatments affected MR, leading us to conclude that the most parsimonious explanation of our finding is that GC levels were causally related to MR. Suppose however that instead we reported a correlation between MR and GCs, using for example unmanipulated individuals. The question would then be justified whether changes in GCs affected MR or vice versa. Direct effects of GCs could be studied using pharmacological manipulations. However, while many studies show that GC administration induces a cascade of effects, when the function of GCs is to facilitate a level of MR, as opposed to regulate variation in MR, we do not anticipate such manipulations to induce an increase in MR (Box 1). On the other hand, when MR is experimentally increased in conjunction with pharmacological manipulations that suppress the expected GC-increase (an experiment that to our best knowledge has not yet been done), we would predict that the increase in MR can be maintained less well compared to the same MR treatment in the absence of the pharmaceutical manipulation. This result, we would interpret to demonstrate that maintaining a particular level of MR may be dependent on GCs as facilitator, but it would be misleading to interpret this pattern to indicate that GCs regulate MR, as is sometimes proposed. Additionally, it would be informative to investigate whether energy turnover immediately before blood sampling is a predictor of GC levels, as we would predict on the basis of the interpretation of our findings. Increasing the use of devices and techniques that monitor energy expenditure or its proxies (e.g. accelerometers) may be a way to increase our understanding of the generality of the GC-MR association.”

      We based our hypotheses and searching criteria on the assumption that GCs induce physiological processes to help the organism facilitate energetic demands. Pharmacologically induced increases in GCs would lead to physiological responses and associations that we consider not comparable to the ones reported in this work, as we base our hypotheses on natural (i.e. non pharmacologically induced) GC and MR variation. This said, with exogenous GC administration, we may expect GC cascade effects, but not necessarily an increase in MR. Here - and acknowledging that the link between GCs and metabolic rate may entail complex steps - we predict that GC administration may lead to an increase in blood glucose and may affect energy allocation at a tissue-specific level. However, such increase may have no effect on whole-organism energy expenditure, unless energy expenditure is limited by glucose availability. We however acknowledge that it would be interesting to investigate the kind of associations between MR, GCs and other physiological variables (e.g. glucose) that appear when inducing an increase in GCs, as these would broaden our understanding of the mechanistic processes underlying these associations.

      We show that variation in GC levels was explained by variation in MR, independent of the stimulus that caused the increase in MR. We propose that the most parsimonious interpretation of our findings is that GC variation is an indicator of variation in MR, independent of the cause of variation in MR. We do not intend to prove causality when making predictions on the co-dependency of metabolic rate and GCs. In fact, our predictions do not imply that one trait necessarily affects the other per se, as this interplay is likely to be shaped by the environmental or physiological context (Box 1). Thus, the specific mechanisms underlying how changes in metabolic rate induce changes in GCs - or the other way around - need to be investigated. One step to tackle this in upcoming research would indeed be studying the effects of exogenous GCs on metabolic rate.

      In the manuscript, we clarify that GCs have a variety of cascade effects besides metabolism (Box 1). On the basis of our results, however, we suggest that many of the downstream effects of GCs may be interpreted as allocation adjustments to the metabolic level at which organisms operate (lines 235-236), but we do acknowledge that these cascade effects are complex and affect many systems besides metabolism.

      This work helps in the thinking that GCs are not the same as a "stress" hormone or labelling hormones with only one function. As hormones are naturally pleiotropic, the view of any one hormone being X is overly simplistic.

      We fully agree, but stress that we focus on how GCs are regulated, which may be less complex than its pleiotropic functions. Indeed, we consider that the many functions of GCs have potentially clouded the question as to how GCs are regulated.

      Reviewer #2 (Public Review):

      Where this study is interesting is that the authors do a meta-analysis of studies in which metabolic rate was experimentally manipulated and both this rate and glucocorticoid levels were simultaneously measured. Unsurprisingly, there are relatively few such studies and many are from the lab of Michael Romero. While the results of the analysis are compelling, they are not surprising. That said, this work is important.

      It is worth noting that in this analysis, the majority of the studies, if not all, are dealing with variation in baseline levels of glucocorticoids. That means the hormone is mostly acting metabolically at these lower levels and not as a stress response hormone as it does when levels are much higher. This difference is probably due to differences in receptors being activated. This could be discussed.

      As mentioned in Box 1, within our hypothesis framework we make no distinction between baseline and stress-induced GC-levels, and thereby in effect assume these to be points in a continuum from a metabolic perspective. Our results support this view, as our sample includes GC values in both the baseline and stress-induced ranges, and these are not distinguishable (Fig. 3). We do however recognize that we did not return to this issue in the Discussion, while the same issue may well occur to many readers familiar with the literature. We therefore added the following paragraph to the discussion:

      “Note that in the context of our analysis we made no distinction between ‘baseline’ and ‘stress-induced’ GC-levels (Box 1). Firstly, because these concepts are not operationally well defined – baseline GC-levels are usually no better defined than ‘not stress-induced’. Secondly, when considering the facilitation of metabolic rate as primary driver of GC regulation, there does not appear a need to invoke different classes of GC-levels instead of the more parsimonious treatment as continuum. This is not to say that this also applies to the functional consequences of GC-level variation: it is well known that receptor types differ in sensitivity to GCs (Landys et al. 2006; Sapolsky et al. 2000; Romero 2004), thereby potentially generating step functions in the response to an increase in GC-levels.”

      We note further that to our best knowledge there are no standard or established thresholds that allow us to separate GC levels into “baseline” and “stress-induced”, and in any case these concentration ranges differ strongly among species and experimental set-ups (e.g. captive vs. free-living individuals). Consequently, many of the studies included in our work report what would typically be interpreted as “stress-induced” levels, and thus within the range of those reported by standardized stress protocols (e.g. levels above 20-30 ng/ml for corticosterone in bird species, Cohen et al. 2007, Jimeno et al. 2018; levels between 150-300 ng/ml in captive rats, Buwalda et al. 2012, Beerling et al. 2011; levels 2-10 times above baseline in humans, Sramek et al. 1999). We also want to note that we work with effect sizes, i.e. not GC levels, and that GC measurement units differ among studies. Mean GC values by study in the original units are shown in Table S3.

      Reviewer #1 (Recommendations For The Authors):

      L26: why is the causality in this direction? Not that I don't think that metabolic rate drives GC variation but the meta-analyses here could suggest the opposite direction as well? That GC phenotype could limit or promote metabolic activity? (In terms of the natural variation studies and not the experimental ones)

      See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.

      L27: again, I am not sure the meta-analyses can lead to this question. Although there is a tight link between GC and metabolic rate, there is still variation around that is unexplained.

      See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.

      L45: I think there is plenty of literature in the field that would say that GCs are linked to metabolism and don't define GCs as synonymous with stress. See MacDougall and others that you cite later in the paragraph: "GCs and stress are not synonymous." I think maybe shifting the strong language at the beginning might help with your argument later on.

      We do not disagree, but two considerations made us retain the ‘strong language’. Firstly, while many authors mention links between GCs and metabolic rate, as we read the literature, the quantitative importance of this link to understand GC variation is underestimated in our view. Secondly, the literature is rife with articles that clearly do not consider metabolic rate variation as a driver of the GC variation they observe.

      Box 1: on the diagram the link between GCs and learning is problematic as there are plenty of studies that show a negative effect on learning with GC exposure. It usually depends on the time course of GCs and learning outcomes.

      We agree with the referee´s point. Learning was deleted from the diagram to avoid confusion.

      The diagram also suggests that GCs in the blood decreases insulin. For Aves that are rather insulin insensitive, the evidence that GCs affect insulin concentrations are very limited, even in the poultry literature.

      Indeed, and we now mention in box 1 that GC effects on insulin are primarily found in mammals, and less so in birds.

      Box 1 at the end also makes a point about GCs having complex downstream effects at baseline and stress-induced levels, besides energy mobilization, but the abstract seems to indicate that there are limited effects of GCs outside of metabolism. Hence why I also advocate being careful about the wording in the abstract.

      The related abstract sentence has been rewritten to avoid this inconsistency (lines 17-18).

      L107: "being or not significant" meaning significant or not? The wording is awkward

      We reworded the sentence for clarity. We included studies reporting both significant and nonsignificant increases in metabolic rate.

      L110: why not look at whether experimental increases in GCs also induce increases in metabolic rate, i.e., the directionality of the two variables. (point 2)

      See our detailed response above, on the directionality of the association and the hypotheses underlying our searching criteria and the paragraph on this topic added to the discussion.

      The studies, although there are ~30, are overlapping in terms of labs, i.e., a lot of them came from the same lab. Did you think to include lab as a random effect to see if there are effects of one or two labs doing work that strengthened the results?

      We think this is a good point, and we ran the main models included in the manuscript including Lab as random effect (N= 35 experiments, 21 studies, 16 labs). Including Lab as random factor did not affect the results, leading to negligible changes in the model parameters. We provide tables with the model results in our previous response. In the text we now mention that we tested the effect of Lab but did not keep this variable in the models (lines 183-185).

      L314: I think it depends on the time course and intensity of the stressor. I firmly believe that outside of metabolic demands, high levels of GCs chronically or the inability to mount a proper stress response is indicative of pathology or something outside of metabolism.

      Whether the association between GCs and MR holds under a context of ‘chronic stress’ (i.e. understood as chronically elevated GCs) remains to be tested. We note, however, that chronically high levels of metabolic rate may potentially have pathological effects.

      Reviewer #2 (Recommendations For The Authors):

      I find the title a bit misleading. The conclusion from the study is that glucocorticoid levels can reflect metabolic rate, not that glucocorticoid levels do not indicate stress. Remember, stress can certainly affect metabolic rate.

      We see the point but note that other drivers of variation in metabolic rate also increase GCs, as we show in our analysis. Hence we propose that GC variation always indicates variation in metabolic rate, and indicates stress only when stress is the cause of the increase in metabolic rate.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their insightful and detailed analysis of our work, in particular to reviewer 2. We also would like to thank the eLife editorial team for organizing this form of public review and debate, which we believe will be of interest to the science community.

      Reviewer #1 (Public Review):

      Despite durable viral suppression by antiretroviral therapy (ART), HIV-1 persists in cellular reservoirs in vivo. The viral reservoir in circulating memory T cells has been well characterized, in part due to the ability to safely obtain blood via peripheral phlebotomy from people living with HIV-1 infection (PWH). Tissue reservoirs in PWH are more difficult to sample and are less well understood. Sun and colleagues describe isolation and genetic characterization of HIV-1 reservoirs from a variety of tissues including the central nervous system (CNS) obtained from three recently deceased individuals at autopsy. They identified clonally expanded proviruses in the CNS in all three individuals.

      Strengths of the work include the study of human tissues that are under-studied and difficult to access, and the sophisticated near-full length sequencing technique that allows for inferences about genetic intactness and clonality of proviruses. The small sample size (n=3) is a drawback. Furthermore, two individuals were on ART for just one year at the time of autopsy and had T cells compatible with AIDS, and one of these individuals had a low-level detectable viral load (Figure S1). This makes generalizability of these results to PWH who have been on ART for years or decades and have achieved durable viral suppression and immune reconstitution difficult.

      While anatomic tissue compartment and CNS region accompany these PCR results, it is unclear which cell types these viruses persist in. As the authors point out, it is possible that these reservoir cells might have been infiltrating T cells from blood present at the time of autopsy tissue sampling. Cell type identification would greatly enhance the impact of this work. Several other groups have undergone similar studies (with similar results) using autopsy samples (links below). These studies included more individuals, but did not make use of the near-full length sequencing described here. In particular, the Last Gift cohort, based at UCSD and led by Sara Gianella and Davey Smith, has established protocols for tissue sampling during autopsy performed soon after death. https://pubmed.ncbi.nlm.nih.gov/35867351/ https://pubmed.ncbi.nlm.nih.gov/37184401/

      We agree with reviewer 1 that studies to identify specific cell types that harbor intact HIV-1 in individual tissue compartments would be very informative; our group has recently initiated such studies.

      Overall, this small, thoughtful study contributes to our understanding of the tissue distribution of persistent HIV-1, and informs the ongoing search for viral eradication.

      We thank reviewer 1 for these encouraging remarks.

      Reviewer #2 (Public Review):

      The manuscript by Sun et al. applies the powerful technology of profiling viral DNA sequences in numerous anatomical sites in autopsy samples from participants who maintained their antiviral therapy up to the time of death. The sequencing is of high quality in using end-point dilution PCR to generate individual viral genomes. There is a thoughtful discussion, although there are points that we disagree with. This is an important data set that increases the scope of how the field thinks about the latent reservoir with a new look at the potential of a reservoir within the CNS.

      We greatly appreciate the comments by reviewer 2 and would like to thank them for their detailed and very knowledgeable analysis of this paper.

      1) The participants are very different in their exposure to HIV replication and disease progression. Participant 1 appears to have been on ART for most of the time after diagnosis of infection (16 years) and died with a high CD4 T cell count. The other two participants had only one year on ART and died with relatively low CD4 T cell counts (under 200). This could lead to differences in the nature of the reservoir. In this regard, the amount of DNA per million cells appears to be about 10-fold lower across the compartments sampled for participant 1. Also, one might expect fewer intact proviruses surviving after 16 years on ART compared to only 1 year on ART. The depth of sampling may be too limited and the number of participants too few to assess if these differences are features of these participants because of their different exposures to HIV replication. On the positive side, finding similarities across these big differences in participant profiles does reinforce the generalizability of the observations.

      Many thanks for pointing this out. We also noticed that the total number of HIV-1 proviruses is smaller in study participant 1 (who had been on ART for 16 years) than in study participants 2 and 3, who had more limited treatment durations (1-2 years). However, given the small number of study participants, we do not think these results can be used to infer how treatment duration influences viral reservoir size in tissues.

      2) The following analysis will be limited by sampling depth but where possible it would be interesting to compare the ratio of intact to defective DNA. A sanctuary might allow greater persistence of cells with intact viral DNA even without viral replication (i.e. reduced immune surveillance). Detecting one or two intact proviruses in a tissue sample does not lend itself to a level of precision to address this question, but statistical tests could be applied to infer when there is sampling of 5 or more intact proviruses to determine if their frequency as a ratio of total DNA in different anatomical sites is similar or different. This would allow adjustment for the different amount of viral DNA in different compartments while addressing the question of the frequency of intact versus defective proviruses. One complication in this analysis is if there was clonal expansion of a cell with an intact genome which would represent a fortuitous overrepresentation of intact genomes in that compartment.

      We have performed the analysis suggested by reviewer 2 and included a diagram reflecting the ratio of intact/defective proviruses as a new supplemental figure (Figure S2). Unfortunately, we do not feel comfortable drawing any firm conclusions from this additional analysis; the sample sizes are simply too limited.
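
      As an illustration only (this is not the analysis performed for Figure S2), the kind of test the reviewer describes could be a two-sided Fisher's exact test comparing intact versus defective provirus counts between two tissues. The function name and all counts below are hypothetical, and the sketch assumes that each sampled provirus is an independent observation, which clonal expansion would violate.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical (intact, defective) counts for two tissues; not data from the study.
lymph_node = (6, 94)
brain = (1, 49)
p_value = fisher_exact_two_sided(*lymph_node, *brain)
```

      With only one or two intact proviruses in a compartment, such a test has very little power, which is consistent with the caution expressed above.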

      3) The key point of this work is that the participants were on therapy up to the time of death ("enforcing" viral latency). The predominance of defective genomes is consistent with this assumption. Is there data from untreated infections to compare to as a signature of whether the viral DNA population was under selective pressure from therapy or not? Presumably untreated infections contain more intact DNA relative to total DNA. This would represent independent evidence that therapy was in place.

      We agree that an analysis of autopsy samples from untreated persons living with HIV-1 would be of great interest, and are actively collaborating with neuropathologists from multiple sites to obtain such samples. Yet, we are not convinced that selection pressure on reservoir cells during ART can be appropriately identified through quantitative virological assays. Rather, we feel that the selection of proviruses can be best assessed when qualitative parameters, including proviral integration sites and their position relative to host epigenetic chromatin features, are evaluated.

      4) There are several points in Figure 5 to raise about V3 loop sequences. The analysis includes a large number of "undetermined" sequences that did not have a V3 loop sequence to evaluate. We would argue it is a fair assumption that the deleted proviruses have the same distribution of X4 and R5 sequences as the ones that have a V3 sequence to evaluate. In this view it would be possible to exclude the sequences for which there is no data and just look at the ratio of X4 and R5 in the different compartments, specifically does this ratio change in a statistically significant way in different compartments? The authors use "CCR5 and non-CCR5" as the two entry phenotypes. The evidence is pretty strong that the "other" coreceptor the virus routinely uses is CXCR4, and G2P is providing the FPR for X4 viruses. Perhaps the authors are trying to create some space for other coreceptors on microglia, but we are pretty sure what they are measuring is X4 viruses, especially in this late disease state of participant 2. Finally, we have previously observed that the G2P FPR score of <2 is a strong indicator of being X4, FPR scores between 2 and 10 have a 50% chance of being X4, and FPR scores above 10 are reliably R5 (PMID27226378). In addition, we observed that X4 viruses form distinct phylogenetic lineages. The authors might consider these features of X4 viruses in the evaluation of their sequences. Specifically, it would be helpful to incorporate the FPR scores of the reported X4 viruses.

      Many thanks for these thoughts. We have now included FPR scores for all sequences and considered sequences with FPR score <2 as X4-tropic. Among 497 proviral sequences derived from all three participants, only 14 proviral sequences had FPR scores between 2 and 10 and their tropism was classified as CCR5 in the new Figure 5. We agree that viral tropism analysis of proviral sequences from the CNS would be of particular interest for study subject 2; however, most brain-derived sequences from that person had large deletions in the env region, precluding an analysis of viral tropism.
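
      The classification rule described in this response can be written down as a short sketch. The function name and the "undetermined" label for sequences without an evaluable V3 loop are assumptions made for illustration; the thresholds follow the text (FPR <2 called X4, all other scored sequences called CCR5, with the 2-10 range flagged as ambiguous per the reviewer's comment).

```python
def classify_tropism(fpr_percent):
    """Return (call, ambiguous) from a geno2pheno false-positive rate in percent.

    fpr_percent is None when no V3 sequence is available (e.g. env deletion).
    """
    if fpr_percent is None:
        return ("undetermined", False)
    if fpr_percent < 2:
        return ("X4", False)
    # Scores of 2-10 are assigned to CCR5 here, but flagged because the
    # reviewer reports roughly even odds of X4 usage in this range.
    return ("R5", fpr_percent <= 10)

# Hypothetical FPR values, not taken from the study.
calls = [classify_tropism(x) for x in (0.5, 4.0, 35.0, None)]
```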

      5) We have puzzled over the many reports of different cell types in the CNS being infected. When we examined these cell types (both as primary cells and as iPSC-derived cells), all cells could be infected with a version of HIV that had the promiscuous VSV-G protein on the virus surface as a pseudotype. However, only macrophages and microglia could be infected using the HIV Env protein, and then only if it was the M-tropic version and not the T-tropic version (PMID35975998). RNAseq analysis was consistent with this biological readout in that only macrophages and microglia expressed CD4, neurons and astrocytes do not. From the virology point of view, astrocytes are no more infectable than neurons.

      We appreciate these comments. As described in our discussion, we agree that the role of astrocytes as target cells for HIV-1 infection is highly controversial; we look forward to future opportunities to evaluate HIV sequences in sorted astrocytes from autopsy tissues.

      6) The brain gets exposed to virus from the earliest stages of infection but this is not synonymous with viral replication. Most of the time there is virus in the CSF but it is present at 1-10% of the level of viral load in the blood and phylogenetically it looks like the virus in the blood, most consistent with trafficking T cells, some of which are infected (PMID25811757). The fact that the virus in the blood is almost always T cell-tropic in needing a high density of CD4 for entry makes it unlikely that monocytes are infected (with their low density of CD4) and thus are not the source of virus found in the CNS. It seems much more likely that infected T cells are the "Trojan Horse" carrying virus into the CNS.

      We appreciate the reviewer’s referral to Greek mythology and agree that the hypothesis of infected T cells acting as “Trojan horses” is more intuitive and better supported by available data. We have adjusted our discussion accordingly.

      7) While all participants were taking antiretroviral therapy at the time of their death, they were not all suppressed when the tissues were collected. The authors are careful not to mention "suppressive ART" in the text, which is appreciated. However, the title should be changed to also reflect this fact.

      Thanks for pointing this out. From our perspective, ART is never fully suppressive, as low-level viremia (below the detection threshold of commercial PCR assays) is detectable in almost all ART-treated persons. As such, it is not clear to us that “suppressive” necessarily implies suppression below the detection limits of commercial PCRs. We argue that ART can also be suppressive when plasma viral loads are in the range of 100 copies/ml, as they are in our study subject 3. Nevertheless, we have changed the title to avoid confusion.

      Reviewer #1 (Recommendations For The Authors):

      I encourage the authors to compare their autopsy and tissue sampling procedures to those used by The Last Gift researchers and consider including references to this ongoing study. If the authors plan to continue in this line of research, the field would greatly benefit from a collaboration that would bring together their excellent and advanced PCR technique with the larger sample size offered by The Last Gift. Lastly, is there some way to simultaneously determine cell type when NFL sequencing is performed?

      We look forward to collaborating with investigators from the Last Gift Cohort in the future and have integrated additional references in the manuscript to acknowledge their work. At the current stage of technology development, we think that sorting of infected cells based on canonical markers of defined cell populations is the preferred approach for identifying phenotypic properties of infected cells; however, expansion of the PheP-Seq assay (Sun et al., Nature 2023), may facilitate this process in the future.

      Reviewer #2 (Recommendations For The Authors):

      1) The authors have chosen to lump all R5 viruses together in terms of their entry phenotype, giving all viruses an equal chance of infecting all potentially susceptible cell types. This ignores the fact that normal HIV is selected to infect cells, requiring a high density of CD4 as is found on T cells. We use the term R5 T cell-tropic to describe "normal" HIV. The ability to efficiently enter cells that have a low density of CD4, such as macrophages and microglia, involves the evolution of a distinct phenotype, termed macrophage tropism (PMID24307580, and work of others). This happens most often in the CNS where T cells are infrequent thus potentiating evolution to infect an alternative cell type. This change in entry phenotype is dramatic and, like X4 viruses, results in phylogenetically distinct lineages (PMID22007152). There are no sequence signatures for M-tropic viruses as there are for X4 viruses, but the fact that there are sequences shared between the CNS and lymphoid tissue makes it much more likely that there are T cells migrating around the body, including into the CNS, that are carrying R5 T cell-tropic virus with them, with the cells potentially clonally expanding in situ in the CNS. The persistence of a potential CNS T cell reservoir was the point we were trying to make in our recent paper (ref. 38), not only that these CSF rebound viruses were R5 viruses but they were selected for replication in T cells as seen by their dependence on a high density of CD4 for entry. This is the conclusion one would reach if clonally expanded viral sequences were shared between two lymphoid compartments. It is not necessary to ascribe properties of infection and clonal amplification to microglia cells when a more parsimonious explanation is that there are low levels of T cells in the CNS, especially in the absence of entry phenotype data showing these sequences encode an M-tropic entry phenotype. As is, the authors are just adding to the unproven belief that virus in the CNS must be in myeloid cells, which in this case in particular we suspect is the wrong interpretation.

      We are impressed by reviewer 2’s recent work, suggesting the viral reservoir in the CNS may primarily consist of clonally-expanded R5 T-cell tropic viruses. We have adjusted our discussion to emphasize this possibility, and to highlight that viral entry phenotyping data will be informative for better understanding viral persistence in the brain.

      2) The authors noted that the frequency of intact proviruses is highest in the lymph nodes of 2/2 participants for which they had lymph node samples, relative to the other tissues examined. They thus conclude, "Together, these results indicate that intact HIV-1 proviruses are preferentially detected in lymphoid and gastrointestinal (GI) tissues." However, an examination of Figure 2 reveals that the total HIV copy number is highest in the lymph nodes of these two people. Thus, it doesn't seem like HIV is preferentially intact in the lymph nodes as much as they sampled more provirus from that tissue and therefore were able to detect more intact proviruses.

      We have adjusted our manuscript to indicate that the highest numbers of intact HIV-1 proviruses were present in lymph nodes, both in terms of absolute numbers and after normalization to the total numbers of cells analyzed.
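
      The two ways of expressing intact provirus frequency discussed in this exchange (normalized to cells analyzed, and as a share of all proviruses sampled) can be sketched as follows. The function names and the example counts are hypothetical and are not taken from the manuscript.

```python
def per_million_cells(provirus_count, cells_analyzed):
    """Proviral copies per million cells analyzed."""
    if cells_analyzed <= 0:
        raise ValueError("cells_analyzed must be positive")
    return provirus_count / cells_analyzed * 1_000_000

def intact_fraction(intact_count, total_provirus_count):
    """Share of all sampled proviruses that are genome-intact."""
    if total_provirus_count <= 0:
        raise ValueError("total_provirus_count must be positive")
    return intact_count / total_provirus_count

# Hypothetical tissue: 4 intact among 200 proviruses detected in 2,000,000 cells.
intact_per_million = per_million_cells(4, 2_000_000)
intact_share = intact_fraction(4, 200)
```

      The first measure answers where intact proviruses are most abundant; the second addresses the reviewer's question of whether proviruses in a tissue are preferentially intact.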

      3) In Figure 1A, the legend should be changed so that "PMSC" is spelled out as "premature stop codon" for ease of reading. This is done for Figure 1B.

      We have corrected this issue as suggested by the reviewer.

      4) The pie charts in Figure 5 could be better labeled for ease of interpreting. In Figure 5C, instead of just labeling it as "P2" it could be "Distribution of CXCR4-using proviruses, P2", as an example. As it stands, it is hard to know what the figure is describing without reading the text.

      We have changed this accordingly.

      5) While all participants were taking antiretroviral therapy at the time of their death, they were not all suppressed when the tissues were collected. The authors are careful not to mention "suppressive ART" in the text, which is appreciated. However, the title should be changed to also reflect this fact.

      Thanks for pointing this out. From our perspective, ART is never fully suppressive, as low-level viremia (below the detection threshold of commercial PCR assays) is detectable in almost all ART-treated persons. As such, it is not clear to us that “suppressive” necessarily implies suppression below the detection limits of commercial PCRs. We argue that ART can also be suppressive when plasma viral loads are in the range of 100 copies/ml. Nevertheless, we have changed the title to avoid confusion.

      Editorial comments:

      In addition to the reviewers' suggestions, we feel that more information is needed on how you define an intact proviral sequence, e.g. are only disruptions in essential genes considered, or also those in accessory genes? Previous studies have shown that brain-derived HIV-1 strains are usually CCR5-tropic, show high affinity for the CD4 receptor and frequently contain defective vpu genes. Some information and discussion on whether the brain-derived sequences confirm these previous findings seems of significant interest.

      As described in our previous work (e.g. Lee et al, JCI 2017; Jiang et al, Nature 2020), accessory genes are not considered in our definition of “genome intactness”; this is consistent with approaches other investigators have chosen (e.g. Hiener et al, Cell Reports 2017). Within the genome-intact sequences we identified in the CNS in our study persons, we found no evidence for deletions of vpu sequences; this has been emphasized in the revised manuscript.

    1. Author Response

      We thank the reviewers and editors for their deep, thoughtful and constructive assessment of our manuscript. We would nevertheless like to reply to the reviewers' reports.

      Reviewer #1.

      (...) The data can be well described by three components involving a closed state and two open states O1 and O2, in which the second component O2 is the one affected by the mutations and deletions

      This statement is not completely clear to us. What we propose is that O1 is not visible in WT, only in the mutants. What would be affected is the access to O1 and the transition between O1 and O2, but not O2 itself.

      From the beginning, it becomes challenging for non-experts to grasp the structural basis of the perturbations that are introduced (ΔPASCap and E600R), because no structural data or schematic cartoons are provided to illustrate the rationale for those deletions or their potential mechanistic effects. In addition, the lack of additional structural information or illustrations, and a somewhat confusing discussion of the structural data, make it challenging for a reader to reconcile the experimental data and mathematical model with a particular structural mechanism for gating, limiting the impact of the work.

      Thank you very much for pointing this out and our apologies for the missing cartoon. It will be provided in the revised version.

      There are several concerns associated with the analysis and interpretations that are provided. First, the conductance-voltage (G-V) relations for the mutants do not seem to saturate, and the absolute open probability is not quantified for any mutant under any condition. This makes it impossible to quantitatively compare the relative amplitudes of the two components because the amplitude of the second component remains undetermined. […] This reduces confidence in the parameters associated with G-V relations, as the shape and position of both components might change significantly if longer pulses were used.

      We agree that the endpoint of activation is ill-defined in the cases where a steady-state is not reached. This does indeed hamper quantitative statements about the relative amplitude of the two components. However, while the overall shape does change, its position (voltage dependence) would not be affected by this shortcoming. The data therefore supports the claim of the “existence of mutant-specific O1 and its equal voltage dependence across mutants.”

      Further, because the mutant channel currents do not saturate at the most positive potentials and time intervals examined, the kinetic characterization based on reaching 80% of the maximum seems inappropriate, because the 100% mark is arbitrary.

      We agree that the assessment of kinetics by a t80% is not ideal. We originally refrained from exponential fits because they introduce other issues when used for processes that are not truly exponential (as is the case here). To address the concerns, we will add time constants from these fits in the revised version. Please note that in Figure 3, we do provide time constants, and they support the statement made.
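
      The reviewer's concern about an arbitrary 100% mark can be illustrated with a deliberately simplified sketch. It assumes a single-exponential rise, I(t) = 1 - exp(-t/tau), which the response itself notes is not strictly true for these currents; the function name and all values are hypothetical.

```python
import math

def t80_apparent(tau, pulse_duration):
    """Time to reach 80% of the end-of-pulse current for I(t) = 1 - exp(-t/tau)."""
    # Amplitude at the end of the pulse, which is treated as "100%".
    i_end = 1.0 - math.exp(-pulse_duration / tau)
    return -tau * math.log(1.0 - 0.8 * i_end)

tau = 1.0                                    # arbitrary time units
true_t80 = -tau * math.log(0.2)              # t80 measured against true steady state
short_pulse_t80 = t80_apparent(tau, pulse_duration=1.0)
long_pulse_t80 = t80_apparent(tau, pulse_duration=50.0)
```

      Under this assumption, a pulse that ends before steady state gives a shorter apparent t80 than the true value, whereas a fitted time constant does not depend on where the pulse ends.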

      Further, the kinetics for some of the other examined mutants (e.g. those in Fig. 2A) are not shown, making it difficult to assess the extent to which the data could be affected by having been measured before full equilibration.

      This seems to be a misunderstanding. ∆2-10 kinetics is shown in Fig. 2c. ∆-eag is shown in Fig. 3. We will make sure to state this explicitly in the revised version.

      For example, I would expect that the enhanced current amplitudes from Figure 5 are only transient, ultimately reaching a smaller steady-state current magnitude that depends only on the stimulation voltage and is independent of the pre-pulse. The entire time course including the rise-time and decay is not examined experimentally. This raises concern on whether occupancy of state O1 might be overestimated under some experimental conditions if a fraction of the occupancy is only transient. The mathematical model is not utilized to examine some of these slower relaxations - this may be because the model does not reproduce these slow processes, which would represent a serious shortcoming given that the slow kinetics appear to be intrinsic to transitions around state O1.

      Thank you for thinking so deeply about the problem. We identified the same questions and did explore them using the model (Figure 8 c). Your intuition is confirmed there: the slow kinetics lead to a decrease of O1 occupancy after a transient accumulation. We intend to study this experimentally as well in the revised version.

      The significance of the results with the Δ2-10.L341Split is unclear. First, structural as well as functional data has established that the coupling of the voltage sensor and pore does not entirely rely on the S4-S5 linker, and thus the Split construct could still retain coupling through other mechanisms, which is consistent with the prominent voltage dependence that is observed. If both state O1 and O2 require voltage sensor activation, it is unclear why the Split construct would affect state O1 primarily, as suggested in the manuscript, as opposed to decreasing occupancy of both open states.

      Thank you for pointing out the unclear nature of our arguments. We rephrase in the following and will do so in the revised document: If, in non-split mutants, the upward transition of S4 allows entry to O1, it is reasonable to assume that the movement is not transmitted the same way in the split and the transition into O1 is less probable. The observation that, in the split, entry into O1 requires higher depolarization and appears to be less likely, suggests that downstream of S4 (beyond position 342), there is a mechanism to convey S4 motion to the gate of the mutants.

      The figure legends and text do not describe which solutions exactly were utilized for each experiment, [...] Because no zero-current levels are shown on the current traces, it becomes very hard to determine which voltages correspond to each of the currents (see Fig. 1A).

      Will be corrected.

      … the rationale for choosing some solutions over others is not properly explained. […] The reversal potential for solutions used to measure voltage-activation curves falls right at the spot where occupancy of the first component peaks (e.g. see Figure 1B). […] It is unclear whether any artifacts could have been introduced to the mutant activation curves at voltages close to the reversal potential.

      The high potassium extracellular solution was chosen to obtain tail currents of sufficient size, warranting precise determination of the reversal potential for every individual experiment. In this way, we ensured that there were no artifacts introduced to the activation curves. Tail currents were used when closing was reasonably fast (∆PASCapL322H and E600RL322H), but otherwise, we used the amplitude at the end of the pulse to get the reversal potential.

      One key assumption that is not well-supported by the data pertains to the difference in single-channel conductance between states O1 and O2 - no analysis or discussion is provided on whether the data could also be well described by an alternative model in which O1 and O2 have the same conductance. No additional experimental evidence is provided related to the difference in conductance, which represents a key aspect of the mathematical model utilized to interpret the data.

      We agree that the relative conductance of O1 and O2 is a key point. Our proposal mainly stems from the data presented in Fig. 4 and the amplitudes of the two components of the tail at potentials where both states are visible. We also agree that whole cell currents represent a product of occupancy and conductance and that only single channel recordings can produce unambiguous proof for the higher conductance of O1. We have embarked on a series of experiments directly addressing this in the mutants that will be reported in the revised version. Still, we did explore this issue with the model. Following the path of the least number of assumptions, we initially tested models with equal conductance for both states. None of these models was able to reproduce the shape of the tails and the prepulse-dependent increase.
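
      The statement that whole-cell currents are a product of occupancy and conductance can be made concrete with a small sketch. The function name, units and every parameter value are hypothetical; the point is only that a single macroscopic amplitude cannot separate the two factors.

```python
def whole_cell_current(n_channels, p_o1, p_o2, g_o1, g_o2, v_mv, e_rev_mv):
    """Macroscopic current from two open states (conductances in arbitrary units)."""
    return n_channels * (p_o1 * g_o1 + p_o2 * g_o2) * (v_mv - e_rev_mv)

# Two hypothetical parameter sets: low O1 occupancy with high O1 conductance,
# and high O1 occupancy with O1 conductance equal to that of O2.
high_conductance = whole_cell_current(1000, p_o1=0.2, p_o2=0.1, g_o1=2.0,
                                      g_o2=1.0, v_mv=40.0, e_rev_mv=0.0)
high_occupancy = whole_cell_current(1000, p_o1=0.4, p_o2=0.1, g_o1=1.0,
                                    g_o2=1.0, v_mv=40.0, e_rev_mv=0.0)
```

      Both parameter sets give the same current at this one voltage and time point, which is why the argument above rests on the shape of the tails and the prepulse dependence rather than on a single amplitude, and why single-channel recordings are needed for direct proof.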

      The CaM experiments are potentially very interesting and could have wide physiological relevance. However, the approach utilized to activate CaM is indirect and could result in additional non-specific effects on the oocytes that could affect the results.

      Thank you for the appreciative comments about the relevance of our results. We are aware of the potential side effects of the use of thapsigargin and ionomycin, but we still used this approach as an established method to raise intracellular Ca2+. This said, we would like to point out that the effects of Ca2+ increase on channel behavior do revert with a time course that mirrors the estimated time course of Ca2+ itself (supplement 1 to figure 7), suggesting that we are monitoring a Ca2+-dependent event.

      The description of the mathematical model that is provided is difficult to follow, and some key aspects are left unclear, such as the precise states from which state O1 can be accessed, and whether there is any direct connectivity between states O1 and O2 - different portions of the text appear to give contradictory information regarding these points.

      This seems to be a misunderstanding: supplement 1 to figure 8 graphically details the model’s layout and explicitly shows the connections to the two open states. It also shows that these are not connected. We will make sure that the text states this fact more clearly. We did explore models with one open state connected to more than one other state (loops) and found that none of these models can reproduce the large range of depolarizations for which conductance is reduced as compared to lower and higher depolarization (Figure 1).

      Several rate constants other than those explicitly mentioned to represent voltage sensor activation are also assigned a voltage dependence - the mechanistic basis of that voltage dependence is unclear.

      Some fundamental properties we observed in the mutants can be explained with constant, voltage-independent rate constants into and out of both open states. Specifically, it was possible to achieve behavior very close to that displayed in Figure 8c with constant η, θ, ε, and ζ. We then attempted to also reproduce the strong prepulse-dependence (Figure 6A and B) and found that we needed additional degrees of freedom to incorporate both behaviors with one parameter set. We could either add more states, and thereby rates, or introduce voltage dependence to η and θ. With already 32 states and 10 rates, we decided to adopt the less complex model variant. We agree that this probably reduced the interpretability of the model. As a rule, a transition with a voltage-dependence of the functional form of Eq.1 corresponds to the kinetic properties of two or three transitions, where one is voltage-independent (setting the maximal rate) and one has the classical exponential shape expected from truly molecular transitions.
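
      Eq. 1 of the manuscript is not reproduced in this response, so the following is only a sketch of the general idea stated above, under an assumed functional form: two sequential steps, one voltage-independent (rate k_max) and one with a classical exponential voltage dependence, combine into an effective rate that saturates at k_max. All names and values are hypothetical.

```python
import math

def effective_rate(v_mv, k_max, k0, slope_mv):
    """Effective rate through two sequential steps (assumed form, not Eq. 1).

    The mean time to pass both steps is the sum of the mean dwell times,
    so the effective rate is the reciprocal of that sum.
    """
    k_voltage = k0 * math.exp(v_mv / slope_mv)   # exponential, voltage-dependent step
    return 1.0 / (1.0 / k_max + 1.0 / k_voltage)

# Hypothetical parameters: k_max = 10 /s, k0 = 1 /s at 0 mV, e-fold per 25 mV.
rate_negative = effective_rate(-100.0, k_max=10.0, k0=1.0, slope_mv=25.0)
rate_positive = effective_rate(200.0, k_max=10.0, k0=1.0, slope_mv=25.0)
```

      At negative potentials the exponential step dominates, while at strongly positive potentials the rate levels off near k_max.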

      We also agree that, conceptually, the transitions between the two layers – tentatively associated with a transition in the ring structure– should be voltage-independent. Interestingly, their voltage dependence is very similar to the voltage dependence of the early activation, i.e. centered at -100 and -120mV, similar to β. We therefore attempted to replace the voltage dependence of κ and λ with a state-dependence. To this end, we introduced a parameter that modified κ and λ depending on the state’s position along the α-β axis. While it seemed possible to include all desired features in a model with state-dependent κ and λ, it proved extremely difficult to tune the parameters. Eventually, we reverted to purely voltage-dependent and not state-dependent transition rates κ and λ. Nevertheless, we believe that their voltage dependence could be replaced by some form of state-dependence, i.e. by rates κ and λ that change systematically from the left-hand side of the scheme to its right-hand side.

      Finally, a clear mechanistic explanation for the full range of effects that the ΔPASCap and E600R mutants have on channel function is lacking, as well as a detailed description of how those newly uncovered transitions would influence the activity of the WT channel.

      We agree. Ultimate mechanistic explanations will have to await data from protein structures of intermediate states and in particular the mutant-specific open state.

      …as well as a detailed description of how those newly uncovered transitions would influence the activity of the WT channel; this latter point is important when considering whether the findings in the manuscript advance our understanding of the gating mechanism of Kv10 channels in general, or are specific to the particular mutants that are studied.

      We still do not know if the transitions to O1 are identical in the mutants and WT, although our data open the path to dissecting the interplay of intracellular domains and voltage sensor. We think that the results are relevant for KCNH channels in general because we have made visible otherwise invisible states.

      It is unclear, for example, how both the mutation or the deletion at the cytoplasmic gating ring enable conduction by state O1, especially when considering the hypothesis put forward in this study that transition to O1 exclusively involves transitions by the voltage sensor and not the cytoplasmic gating ring.

      In our model, the transition to O1 is made possible by a displacement of the voltage sensor. In our view, when this occurs with a properly folded and positioned intracellular ring, permeation (access to O1) is precluded. It is precisely the distortion in the intracellular ring induced by mutation or deletion that allows access to O1.

      It is also not clearly described whether a non-conducting state with the equivalent state-connectivity as O1 can be accessed in WT channels, or if a state like O1 can only be accessed in the mutant channels. Importantly, if a non-conducting state with the same connectivity to O1 were to be accessed in WT channels, it would be expected that an alternating pulse protocol as in Fig. 4 would result in progressively decreasing currents as the occupancy of the non-conducting state equivalent to O1 is increased. Because this is not the case, it means that mutation and deletion cause additional perturbations on the gating energetics relative to WT, which are not clearly fleshed out.

      Thank you for highlighting this important question. Following the arguments in the answer to the previous comment, our experiments cannot provide proof for the existence or accessibility of O1 in WT channels. We favor the interpretation that it is not accessible, because, as you point out, this is supported by the outcome of the alternating pulse on WT (figure 4A) and the paradoxical effect of CaM activation. However, this interpretation hinges on the hypothesis that the kinetics of entry into and departure from O1 would be the same in WT channels as they are in the mutants. Because transitions into a non-conducting O1 would be only indirectly observable in the WT channel, this assumption would be extremely difficult to test.

      Reviewer #2.

      WT EAG currents are far right shifted compared to previously published data. It is not clear whether it is the recording conditions but at 0 mV very few channels are open. Compare this with recordings reported previously of the same channel hEAG1 by Gail Robertson's lab (Zhao et. al. (2017) JGP). In that case, most of the channels are open at 0 mV. There must be at least 25 mV shift in voltage-dependence. These differences are unusually large.

      G-V curves presented in the literature show a large variability. Depending on the conditions, reported V1/2 values in Xenopus oocytes range from -43 mV (Schönherr et al., 2002 DOI: 10.1016/s0014-5793(02)02365-7) to +16 mV (Lörinczi et al, 2015 DOI: 10.1038/ncomms7672) through +4.1 mV (Lörinczi et al., 2016 DOI: 10.1074/jbc.M116.733576), or +10 mV (in the IUPHAR database). The results in the current manuscript are not significantly different from our previously published results on WT channels. In the report the reviewer is referring to, one source of the difference could be that Zhao et al. had no independent information about the reversal potential. In our experiments, we used solutions with high [K]ext. This places the reversal potential in a voltage range within measurable eag currents and thus allows direct determination of the reversal potential, together with the slow kinetics of the tails and the negative shift in the activation. We would argue that this makes the G-V curves less prone to assumptions, albeit for the price of large error bars around the reversal potential. Additionally, the presence of Mg2+ in the extracellular solutions can change the apparent V1/2 depending on the stimulation protocol.
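
      The "large error bars around the reversal potential" mentioned above follow directly from how conductance is obtained from current. The sketch below assumes a chord conductance, G = I / (V - E_rev), and propagates only the noise in the current; the function name and all numbers are hypothetical and do not describe the authors' analysis pipeline.

```python
def chord_conductance(current, current_sd, v_mv, e_rev_mv):
    """Return (G, SD of G) for G = I / (V - E_rev), propagating current noise only."""
    driving_force = v_mv - e_rev_mv
    if driving_force == 0:
        raise ValueError("conductance is undefined at the reversal potential")
    return current / driving_force, current_sd / abs(driving_force)

# Hypothetical values: identical current noise, test potentials near and far from E_rev.
e_rev = -10.0                                   # mV, assumed for illustration
_, sd_near = chord_conductance(0.02, 0.01, v_mv=-8.0, e_rev_mv=e_rev)
_, sd_far = chord_conductance(0.60, 0.01, v_mv=50.0, e_rev_mv=e_rev)
```

      The same current noise translates into a much larger conductance uncertainty when the driving force is only a few millivolts.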

      In most of the mutants, O2 state becomes more prevalent at potentials above +50 mV. At these potentials, endogenous voltage-dependent currents are often observed in xenopus oocytes. The observed differences between the various mutants might simply be a function of the expression level of the channel versus endogenous currents.

      Because we were aware of the potential issue of endogenous chloride currents in oocytes, we included data recorded in chloride-free solutions. Those show comparable results, and thus we conclude that endogenous currents are not the origin of the differences between mutants. We will clarify which solutions were used in the figure legends of the revised version and also include the argument against sizable endogenous current contributions in the revision. In a separate line of experiments, we expressed some of the mutants in HEK cells. Despite small current amplitudes, we were able to replicate the findings of two components, providing oocyte-independent evidence for the existence of a second open state.

      Voltage-dependence of the kinetics of WT currents appears a bit strange. Why is the voltage-dependence saturated at 0 mV even though very few channels have activated at that point? I cannot imagine any kinetic model that can lead to such unusual voltage-dependence of kinetics.

      The fact that voltage dependence of open probability and voltage dependence of activation time constant do not align reflects the multi-state nature of the underlying gating scheme. More than one of several sequential transitions limit the overall kinetics. In this case, the apparent kinetics can reflect a different “bottleneck” transition at different voltage ranges.
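
      A toy example shows how open probability and activation kinetics can decouple in a sequential scheme. The sketch below is not the model from the manuscript: it assumes a hypothetical three-state chain C1 <-> C2 <-> O in which the first step is slow and voltage-independent and the second is fast and voltage-dependent. All rate constants are invented for illustration.

```python
import math

def rates(v_mv, a=10.0, b=1.0, fast=1000.0, k_mv=25.0):
    """Hypothetical rates (1/s) for C1 <-> C2 <-> O.

    a, b: slow, voltage-independent C1 -> C2 and C2 -> C1 rates.
    c, d: fast, voltage-dependent C2 -> O and O -> C2 rates.
    """
    c = fast * math.exp(v_mv / (2.0 * k_mv))
    d = fast * math.exp(-v_mv / (2.0 * k_mv))
    return a, b, c, d

def open_prob_and_tau(v_mv):
    """Steady-state open probability and slow time constant (s)."""
    a, b, c, d = rates(v_mv)
    # Non-zero eigenvalues of a linear three-state chain solve
    # lambda^2 - s*lambda + p = 0 with s and p as below.
    s = a + b + c + d
    p = a * c + a * d + b * d
    lam_slow = (s - math.sqrt(s * s - 4.0 * p)) / 2.0
    p_open = a * c / p
    return p_open, 1.0 / lam_slow

v = 25.0 * math.log(10.0)          # about 57.6 mV with the assumed slope
po_neg, tau_neg = open_prob_and_tau(-v)
po_pos, tau_pos = open_prob_and_tau(+v)
```

      With these assumed rates, the open probability rises roughly tenfold between the two test potentials while the slow time constant changes by less than 10%, because the voltage-independent step remains the bottleneck throughout.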

      One of the other concerns I have is that in many cases, it is clear that the pulse is too short to measure steady-state voltage-dependence. For instance, the currents in -160 mV and -100 mV in Figure 6A and 6B are not saturated.

      While we agree that steady-state curves can simplify quantitative evaluation (especially the normalization applied in the I/Imax curves in Figure 6), the conclusion of two components is independent of the absolute amplitude at steady state. The fact that in the raw current traces in Figure 6A, after a -160 mV prepulse, the same current amplitude is reached for two depolarizations (60 and 90 mV) but not for the intermediate depolarization, can only be explained by an I-V curve that has a minimum. Therefore, the raw data directly support the finding of two components, even if the subsequent analysis is affected by insufficient test pulse durations.
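
      The argument about a minimum in the I-V curve can be stated as a simple check. This is a minimal sketch with a hypothetical helper and invented amplitudes; the intermediate test voltage of 75 mV is an assumption, since the response does not specify it.

```python
def has_interior_minimum(voltages_mv, currents, tol=0.0):
    """Return True if any interior point lies below both of its neighbours.

    Points are sorted by voltage first; tol is an optional noise margin.
    """
    ordered = [i for _, i in sorted(zip(voltages_mv, currents))]
    return any(
        ordered[j] < ordered[j - 1] - tol and ordered[j] < ordered[j + 1] - tol
        for j in range(1, len(ordered) - 1)
    )

# Invented amplitudes: equal at 60 and 90 mV, lower in between.
non_monotonic = has_interior_minimum([60, 75, 90], [1.0, 0.7, 1.0])
monotonic = has_interior_minimum([60, 75, 90], [0.5, 0.8, 1.0])
```

      A single monotonic activation curve cannot produce the first pattern, whatever the normalization, which is why the conclusion does not hinge on reaching steady state.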

      Reviewer #3

      Although very well established, the experimental conditions used in the present manuscript introduce uncertainties, weakening their conclusions and complicating the interpretation of the results. The authors performed most of their functional studies in Cl-based solutions that can become a non-trivial issue when the range of voltages explored extends to very depolarizing potentials such as +120mV. Oocytes endogenously express Ca2+-activated Cl- channels that will rectify Cl- at very depolarizing potentials -due to an increase in the driving force- and contribute dramatically to the current's amplitude observed at the test pulse in the voltage ranges where the authors identify the second open state.

      As stated above, because we were aware of the potential issue of endogenous chloride currents in oocytes, we performed many of the experiments in chloride-free solutions. We conclude that endogenous currents are not the origin of the differences between mutants because the results were comparable regardless of the presence of chloride. We will clarify which solutions were used in the figure legends of the revised version and also include the argument against sizable endogenous current contributions in the revision. In a separate line of experiments, we expressed some of the mutants in HEK cells. Despite small current amplitudes, we were able to replicate the findings of two components, providing oocyte-independent evidence for the existence of a second open state.

      The authors propose a two-layer Markov model with two open states approximating their results. However, the results obtained with the mutants suggest an inactivated state accessible from closed states and a change in the equilibrium between the closed/inactivated/open states that could also explain the observed results; therefore, other models could approximate their data.

      In the process of model development, we tested a large number of configurations. Those included models with a single open state which we connected to two closed (or inactivated) states that were not directly connected to each other and populated at different voltage ranges. In doing so, we attempted to allow access to the single open state from different regions of the “state-space”, reflecting the two voltage ranges of high conductance. However, in our hands, such a “loop” in the state-space inevitably leads to a weak separation of the two states and a weak effect of prepulse potentials. The underlying reason is that given the short activation and deactivation time constants, a single open state in a loop provides an effective short-cut, linking otherwise separated parts of the state-space. To achieve the clear separation of the two components’ voltage dependence, two open states that are not connected to each other were essential. As we wrote in response to other comments above, the ultimate proof of two different open states cannot come from modeling, but from single channel measurements.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In their manuscript, Brischigliaro et al. show that the disruption of respiratory complex assembly in Drosophila melanogaster results in the accumulation of respiratory supercomplexes. Further, they show that the change in supercomplex abundance does not impact respiratory function, suggesting that the main role of supercomplex formation is structural. Overall, the manuscript is well written and the results and conclusions are supported. The D. melanogaster system, in which the abundance of supercomplexes can be altered through the genetic disruption of the assembly of the individual complexes, will be important for the field to discover the role of the supercomplexes. This manuscript will be of broad interest to the field of mitochondrial bioenergetics. The findings are valuable and the evidence is convincing.

      Strengths

      The system developed in which the relative levels of SCs can be varied will be extremely useful for studying SC physiology.

      The experiments are clearly described and interpreted.

      Weaknesses

      The statement in the abstract regarding low amounts of SCs in "insect tissues" needs further support or should be narrowed. I am only aware of detailed characterization of the mitochondrial SC composition from D. melanogaster, which is insufficient to make a broad statement about the large and diverse category of insects. This should be rewritten.

      Thank you for the comment. We have amended the text accordingly.

      In the introduction (line 76) and discussion (line 283), the authors reference the CoQ binding sites in CI and CIII2 being "too far apart" to allow for substrate channeling. The distance between the active sites, though significant, is insufficient to rule out substrate channeling. A stronger argument arises from the fact that the CoQ sites of both CI and CIII2 are open to the membrane and that there are no clear barriers for the free exchange of CoQ with the membrane pool.

      Thank you for the comment. We have modified both sentences accordingly.

      Line 195, the slight elevation in CI amounts referred to here does not appear to be statistically significant and therefore should not be discussed as being altered relative to the control.

      To address this point of criticism, we have revisited the statistical analysis, originally done by 2-way ANOVA and post-hoc test. After giving it some thought, we now consider that this might not have been the correct way to analyze either the mitochondrial respiratory chain (MRC) activity data or the densitometric quantifications. We have now used an unpaired two-tailed Student’s t-test to compare each KO or KD with its CTRL. The reason is that the measurement of each individual MRC activity is an independent assay and should therefore be considered separately. The same applies to the densitometry, because the absolute intensity values of individual CI and of CI within SCs differ widely. Therefore, we think that it is more correct to compare the abundance of individual CI in the WT vs. either KO or KD, and the abundance of CI in SCs, independently using a t-test. With these new statistical analyses, the difference in the enzyme activity of CI reported in Figure 4D is now significant, which we consider better reflects our observations. Also, with these new analyses, the amounts of CI+CIII are significantly higher in the Coa8 KD (Figure S1B). Therefore, the original statement is correct and we have left the sentence as it was.
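
      For reference, the pairwise comparison described above can be sketched with the Python standard library. This is a generic equal-variance Student's t statistic, not the authors' analysis script; the function name and the triplicate values are invented for illustration.

```python
import math
import statistics

def student_t(sample_a, sample_b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t_stat, df

# Invented normalized activities, three independent samples per group.
ctrl = [1.00, 1.05, 0.95]
kd = [1.20, 1.25, 1.15]
t_stat, df = student_t(kd, ctrl)

# Two-tailed critical value for alpha = 0.05 at 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
```

      Each KO or KD versus CTRL comparison is run separately in this scheme, so no variance is shared across different complexes or assays. Running several such tests raises the usual multiple-comparison question, which this sketch does not address.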

      Figure 4H, the assignments of the observed larger bands seem incorrect. The largest band (currently assigned as SC I1+III2+IV1) represents too large of a shift for only the addition of CIV and the band currently assigned at SC I1+III2 appears to also contain CIV. The identity of these bands should be reevaluated and additional experiments are needed to definitively prove their identity. This uncertainty should be addressed experimentally or made more explicit in the text.

      Thank you for the comment. Taking a closer look at the images, we have to agree with the Reviewer that the assignment was incorrect. The higher band is too large indeed and the reviewer is correct that the band that we previously assigned as CI1+CIII2 does appear to contain CIV as well. Therefore, we have changed the labeling of that to CI1+CIII2+CIV1 because the stoichiometry is compatible with the apparent MW. Also, we have renamed the higher MW band to HMW-SC (high-MW SC) of uncertain nature (unknown stoichiometry) but clearly containing all three complexes I, III and IV. We amended the text (lines 219-221) plus figures 5H and S1 accordingly.

      Line 302, the authors state that the structural basis for less SC in D. melanogaster is "due to a more stable association of the NDUFA11 subunit..." However, this would not result in a less stable SC association and only explains why NDUFA11 is more stably associated with CI in the absence of CIII2. The more likely structural reason for the observation of less SC in D. melanogaster is the N-terminal truncation of Dm-NDUFB4 relative to mammalian NDUFB4. This truncation results in the loss of a major SC interaction site between CI and CIII2 in the matrix.

      Thank you for pointing this out. We have amended the text accordingly.

      Reviewer #2 (Public Review):

      Respiratory chain complexes assemble in higher-ordered structures termed supercomplexes or respirasomes. The functional significance of these assemblies is currently under investigation; two main hypotheses are being tested, namely that supercomplexes provide kinetic advantages or structural stability. Here, the authors use the fruit fly to reveal that, while the respiratory chain in this organism normally does not form higher-order assemblies, it does so under conditions in which assembly of the individual complexes is impaired. Because the rather moderate increase in supercomplex formation does not change oxygen consumption stimulated by CI or CII substrates, the authors conclude that supercomplex formation has more of a structural than a functional role. The main strength of this work is that the technical quality of the experiments is high and that the authors induced defects in respiratory chain assembly through sets of well-controlled genetic models. The obtained data are mostly descriptive, use standard approaches, and are very well executed. The authors claim that their experiments allow them to conclude that the role of supercomplex formation is restricted to a structural one and, hence, to exclude a function directly related to electron transport efficiency. However, while the authors can show convincingly that supercomplexes form in the mutants, but not in the wild type, their main claim is not well supported by data, and both the structural mechanism of supercomplex formation and its significance remain unknown. While the supercomplex formation observed only in mitochondrial mutants is interesting per se, it would be good to define structural aspects of supercomplex formation and their potential impact on the stability of the respiratory chain complexes in these mutants.

      We thank the Reviewer for the positive assessment of our work and the suggestions to improve the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      The sentence on line 90, which starts "This is in contrast with..." is unclear and needs to be rewritten.

      Thank you. We have modified the sentence to make it clearer.

      Lines 153 and 155, reference is made to tissue specific expression patterns but no literature reference is provided.

      Thank you for the comment. The tissue-specific expression patterns of the different isoforms are reported in the FlyBase database. We added a link to the website in the text.

      Line 188, "...homogenates in presence of..." should read "homogenates in the presence of..."

      Thank you. Amended.

      Line 336, "...lower to the increase..." should read "...lower than the increase..."

      Thank you. Amended.

      Reviewer #2 (Recommendations For The Authors):

      • In order to unravel the molecular mechanism by which supercomplexes form in the mutant, it would be important to identify the factor mediating this. Prime candidates would be additional proteins that co-purify or co-fractionate with the respiratory chain when they assemble into supercomplexes, or changes in the lipid composition of the mitochondria, where cardiolipin has been shown to stabilize supercomplex formation. The inclusion and analysis of complexome data for all mutants would be excellent, plus an MS analysis of a purified supercomplex.

      Thank you for the suggestion, with which we completely agree. We have taken a closer look at the hierarchical clustering of peptide intensities in our complexome profiling data, which clusters the proteins according to their similarity in electrophoretic migration within the complexes. We have specifically looked for proteins in which the peptide intensity changed in a similar fashion to that of the complex I structural subunits. Among the four candidate proteins (UniProt IDs Q8SXY6, Q95T19, Q9W0Y6, Q9VJQ3), only Q95T19 (Serine--tRNA synthetase-like protein Slimp) is annotated as a mitochondrial protein. This protein is a Drosophila-specific paralog of the mitochondrial Serine-tRNA synthetase generated by gene duplication (PMID: 20870726), which carries out a function linking mitochondrial translation with mtDNA maintenance (PMID: 30943413). Therefore, in principle we would not consider it a good candidate for an ‘SC assembly factor’. The identification of factors promoting the formation of SCs in Drosophila under these conditions is definitely an important point warranting future investigation.

      • The authors could define the stability of the respiratory chain complexes through metabolic pulse-chase labeling experiments. This could reveal that the role of supercomplex formation is indeed structural, improving stability.

      We agree that this would be an important piece of information to understand the phenomenon we have observed. Unfortunately, it is technically impossible to perform metabolic labeling of mitochondrial proteins in whole flies. It would be possible to perform in organello pulse-chase labelling, however our previous experience indicates that complex I does not completely assemble de novo in isolated mitochondria (PMID: 20385768).

      • The authors should analyze oxygen consumption from mitochondria isolated from larvae as in the other experiments on enzyme activities or the (high-quality) BN-PAGE, and not from whole flies that are homogenized. Moreover, they need to determine the quantities of the complexes by complementary experiments (MS, Western blotting or spectroscopy).

      Thank you for the comments. However, we believe that repeating the entire analyses with the larvae would not add significant information to the work and the main interpretation would not change, as the main claim of the paper is based on the data collected on adult flies. In addition, the band patterns of MRC complexes in the BNGE are the same in larvae and adults and therefore do not depend on the developmental stage. Regarding the quantification of the complexes, we think that the data provided by complementary approaches such as in-gel activity assays (IGA), western blot (WB) and kinetic assays of MRC enzymatic activities allowed us to confidently determine the amount of the individual complexes. Hence, we performed IGA assays and enzymatic activity assays (which reflect the amounts of fully assembled and functional complexes) in triplicate (independent samples). For the WB analyses, due to the scarcity of some of the antibodies available to detect the Dm MRC proteins, which were a kind gift of Dr. Edward Owusu-Ansah (Columbia University), we decided to pool the three independent samples of each group before running them through the Blue-Native gels. The densitometric curves of the WB bands (Figure S2) show the abundance of each individual MRC complex within the ‘free’ and SC forms. We prioritized the BN analyses over SDS-PAGE and WB analysis, as we consider that just measuring the steady-state levels of MRC subunits is not as informative, because it is possible that certain subunits are present in the mitochondrial membranes but not assembled into the final mature structures.

      • Can changes in Coenzyme Q levels explain the absence of a defect on electron transport? This could be determined for the mutant as well as the wild type animals.

      We agree that this would be a relevant aspect to investigate. For example, it would be very interesting to determine whether lower CoQ levels are able to maintain the same respiratory activities in the models in which higher amounts of SCs are formed, as was proposed in Shimada et al. (PMID: 29191512). However, the fact that the mild KD models show no MRC enzymatic defects whatsoever (Figure 4D, Figure 5I and Figure 6I) provides the most straightforward explanation for the observed absence of respiratory defects.