  1. Dec 2023
    1. Reviewer #2 (Public Review):

      This paper demonstrates that model-free reinforcement learning, with relatively small networks, is sufficient to observe collaborative hunting in predator-prey environments. The paper then studies the conditions under which collaborative hunting emerges (namely, the difficulty of hunting and the sharing of the spoils), which is an interesting question, and it contains a fascinating study in which a human is tasked with controlling the prey. However, the simplicity of the environment, a 2-D particle world with simple dynamics, makes it unclear how generalizable the results are, and the results rely heavily on visual interpretation of t-SNE plots rather than on more direct metrics.

      Strengths:
      - The distinct behaviors uncovered between the predators in the shared vs. not-shared reward conditions are quite interesting!
      - The realization that the ability of deep RL models to solve predator-prey problems has implications for models of what is needed for collaborative hunting is clever.

      Weaknesses:
      - The paper seems to claim that, since this problem is solvable with model-free learning or a model-free decision tree, complicated cognition is not needed for collaborative hunting. However, the settings in which this hunting takes place are exceedingly simple, and in more complex settings, such as partially observable ones or ones where the capabilities of the partners are unknown, more complicated forms of cognition might still be needed.
      - The problem is (I think) fully observed, so there may be a single, uniquely good strategy that the predators can use successfully against all prey. If so, the human studies are of limited value: they merely confirm that the problem has a near-deterministic solution on the part of the predators.

    2. Reviewer #3 (Public Review):

      This paper aims to understand the nature of collaborative hunting. It begins by defining simple conditions under which collaborative hunting emerges, which leads to the design of a toy environment. The environment itself is simple: K predators chasing a single prey, with no occlusions. I find this a little strange, since it was my understanding that collaborative hunting emerges in part because occlusions allow for more complex strategies that require planning.

      That being said, I do think the environment is sufficient for this paper, and I quite enjoyed using it to run some toy experiments. It reminds me of some of the simpler environments in PettingZoo, a library of multi-agent reinforcement learning environments.

      Once a simple environment was established, the authors fit a reinforcement learning model to it. In this case, the model is Q-learning. The predators and the prey are treated as separate agents, each with its own independent Q-function, and each agent has full observability of its surroundings. As far as I understand, the predators do not share an action space, so they can only collaborate implicitly, by inferring the actions of the other predators. However, there is a version of these experiments in which the reward is shared: all predators receive a reward of 1 when the prey is caught. One limitation of the current work is that it does not consider reinforcement learning methods in which a value function is shared, which is currently a dominant strategy in multi-agent RL; see, for example, OpenAI Five and "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments". Omitting these algorithms limits the scope of the work.
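      To make the reviewer's distinction concrete: in both reward conditions each predator keeps its own value function, and only the reward assignment differs. The sketch below is a minimal tabular illustration under that assumption; the paper itself uses deep Q-networks, and the function names, state labels, and hyperparameters here are hypothetical.

```python
from collections import defaultdict


def rewards(captor, n_predators, shared):
    """Hypothetical reward assignment for one capture event.

    captor: index of the predator that caught the prey, or None if no capture.
    shared=True: every predator gets 1 on any capture (shared reward).
    shared=False: only the captor gets 1 (individual reward).
    """
    if captor is None:
        return [0.0] * n_predators
    if shared:
        return [1.0] * n_predators
    return [1.0 if i == captor else 0.0 for i in range(n_predators)]


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, done=False):
    """One independent Q-learning step on a single agent's own table.

    Nothing is shared between agents: each predator calls this on its own q.
    """
    if done:
        target = reward
    else:
        target = reward + gamma * max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: in the shared-reward condition, a capture by predator 0
# moves BOTH predators' (separate) tables toward the reward.
q_tables = [defaultdict(float), defaultdict(float)]
for i, r in enumerate(rewards(captor=0, n_predators=2, shared=True)):
    q_update(q_tables[i], "s", "stay", r, "terminal", ["stay", "move"], done=True)
```

      With `shared=False`, only predator 0's table would move; a centralized or shared critic, as in the methods the reviewer cites, would instead replace the per-agent tables with a single value function.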

      Having fit an RL model, the next order of business is to search for internal representations in the agents' models that correspond to collaboration. The authors use t-SNE embeddings of the agents' last hidden layers in the policy network.

      Analyzing these embeddings in Figure 3, we see that some representations correspond to specific types of collaborative behavior, which indicates that the model is indeed learning to encode collaboration. I should note that this is not surprising from an RL perspective: we already know that multi-agent actor-critic methods can exhibit cooperative behavior. See, for example, "Emergent Tool Use from Multi-Agent Interaction", where agents jointly learn to push a table together. It is true that earlier work did not specifically identify the units responsible for this behavior, and I think this work should be lauded for the novelty it brings to this discussion.

      A large underlying point of this paper seems to be that we need to consider simple toy environments, in which Q-learning is easy to train, because it is impossible to analyze the behaviors that emerge from real animal behavior (see lines 187-189). This makes sense on the surface, because there are no policy weights in the case of real-world behavior. Unfortunately, however, it is misleading. It is entirely possible to take existing animal behavior, fit a linear model (or a deep net) to it, and then run t-SNE on the fitted model; this is referred to as behavioral cloning. What's more, offline RL makes it entirely possible to fit a Q-function to animal behavior, in which case the exact same t-SNE analysis can be carried out without ever running Q-learning in the environment. From my perspective, the fact that RL is not needed to carry out the paper's main analysis is the biggest weakness of the paper.
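      The behavioral-cloning route the reviewer describes amounts to a supervised fit of a policy to logged (state, action) pairs. The sketch below assumes discrete actions, a linear softmax policy, and plain gradient descent on the negative log-likelihood; the function names and the synthetic "demonstrator" are hypothetical stand-ins for real tracking data. The fitted logits (or, for a deep net, its last hidden layer) could then be embedded with a tool such as scikit-learn's `sklearn.manifold.TSNE`, a step not shown here.

```python
import numpy as np


def fit_behavioral_clone(states, actions, n_actions, lr=0.5, n_iter=500, seed=0):
    """Fit a linear softmax policy pi(a|s) to observed (state, action) pairs
    by gradient descent on the mean negative log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = states.shape
    X = np.hstack([states, np.ones((n, 1))])          # append a bias column
    W = 0.01 * rng.standard_normal((d + 1, n_actions))
    Y = np.eye(n_actions)[actions]                    # one-hot observed actions
    for _ in range(n_iter):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)             # softmax probabilities
        W -= lr * X.T @ (P - Y) / n                   # gradient of the mean NLL
    return W


def policy_logits(W, states):
    """Logits of the cloned policy; these are what one would embed or inspect."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])
    return X @ W


# Usage with a synthetic demonstrator (hypothetical): actions are the argmax
# of a fixed linear score, standing in for recorded animal choices.
rng = np.random.default_rng(1)
S = rng.standard_normal((2000, 2))
W_true = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
A = np.argmax(S @ W_true, axis=1)
W = fit_behavioral_clone(S, A, n_actions=3)
accuracy = np.mean(np.argmax(policy_logits(W, S), axis=1) == A)
```

      No environment interaction occurs anywhere in this fit, which is the crux of the reviewer's objection.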

      Meanwhile, I do think the comparisons with human players were exceptionally interesting, and I'm glad they were included in this work.

      Finally, I would like to speak to the reinforcement learning sections of this paper, as this is my personal area of expertise. The RL used in this paper is valid and correct, and the descriptions of Q-learning and its modifications are technically accurate. It is worth noting, however, that the performance of the Q-learning methods in this paper is not particularly close to optimal, in two senses. First, cooperative RL methods are much more performant. Second, the Q-learning implementation considered by the authors is far below state-of-the-art standards.

      I will also note that, from the perspective of RL, there is no novelty in the paper. Indeed, many DeepMind papers, including the original deep Q-network (DQN) paper, include similar t-SNE embeddings to try to understand the learned representations. And works such as "Sentiment Neuron" and "Visualizing and Understanding Recurrent Networks", among many others, have focused on understanding the correspondence between network weights and behaviors. Thus, the novelty must come from a biological perspective, or perhaps from a perspective at the intersection of biology and RL. I do believe this is an area worth further study.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors develop new models of sequential effects in a simple Bernoulli learning task. In particular, the authors show evidence for both a "precision-cost" model (precise posteriors are costly) and an "unpredictability-cost" model (expectations of unpredictable outcomes are costly). Detailed analyses of experimental data partially support the model predictions.

      Strengths:

      • Well-written and clear.

      • Addresses a long-standing empirical puzzle.

      • Rigorous modeling.

      Weaknesses:

      • No model adequately explains all of the data.

      • New empirical dataset is somewhat incremental.

      • Aspects of the modeling appear weakly motivated (particularly the unpredictability model).

      • Missing discussion of some relevant literature.

      We thank Reviewer #1 for her/his positive comments on our work and her/his comments and suggestions.

      Reviewer #2 (Public Review):

      This paper argues for an explanation of sequential effects in prediction based on the computational cost of representing probability distributions. This argument is made by contrasting two cost-based models with several other models in accounting for first- and second-order dependencies in people's choices. The empirical and modeling work is well done, and the results are compelling.

      We thank Reviewer #2 for her/his positive comments on our work.

      The main weaknesses of the paper are as follows:

      1) The main argument is against accounts of dependency based on sensitivity to statistics (i.e., modeling the time series as having dependencies it does not have). However, such models are not included in the model comparison, which makes it difficult to compare these hypotheses.

      Many models in the sequential-effects literature (Refs. [7-12] in the manuscript) are ‘leaky-integration’ models that interpret sequential effects as resulting from an attempt to learn the statistics of a sequence of stimuli through exponentially decaying counts of the simple patterns in the sequence (e.g., single stimuli, repetitions, and alternations). In some studies, the ‘forgetting’ of remote observations that results from the exponential decay is justified by the fact that people live in environments that usually change: it is thus natural that they should expect the statistics underlying the task’s stimuli to undergo changes (although in most experiments they do not), and if they expect changes, they should discard old observations that are no longer relevant. This theoretical justification raises the question of why subjects do not seem to learn that the generative parameters in these tasks are in fact not changing, all the more so as other studies suggest that subjects are able to learn the statistics of changes (and, consistently, to adapt their inference) when the environment does undergo changes (Refs. [42,57]).

      Our models are derived from a different approach: we derive behavior from the solution of a constrained-optimization problem on the inference process. It is not a phenomenological model. When the constraint that weighs on the inference process is a cost on the precision of the posterior, as measured by its entropy, we find that the resulting posterior is one in which remote observations are ‘forgotten’ through an exponential discount, i.e., we recover the predictions of the leaky-integration models, which past studies have empirically found to be reasonably good accounts of sequential effects. (Thus these models are already in our model comparison.) In our framework, the sequential effects do not stem from the subjects’ irrevocable belief that the statistics of the stimuli change from time to time, but rather from the difficulty that they have in representing precise beliefs: a rather different theoretical justification.

      Furthermore, we show that a large fraction of subjects are not best fitted by precision-cost models (i.e., they are not best fitted by leaky integration), but instead by unpredictability-cost models. These models suggest a different explanation of sequential effects: that they result from the subjects favoring predictable environments in their inference. In the revised version of the manuscript, we have made clearer that the derivation of the optimal posterior under a precision cost results in the exponential forgetting of remote observations, as in the leaky-integration models. We mention it in the abstract, in the Introduction (l. 76-78), in the Results when presenting the precision-cost models (l. 264-278), and in the Discussion (l. 706-716).

      2) The task is not incentivized in any way. Since incentives are known to affect probability-matching behaviors, this seems important. In particular, we might expect incentives to trade off against computational costs: people should increase the precision of their representations if doing so generates more reward.

      We thank Reviewer #2 for her/his attention to our paper and for her/his comments. As for the point on the models, see answer above (point 1).

      As for the point on incentivization: we agree that it would be very interesting to measure whether, and to what extent, the performance of subjects increases with the level of incentivization. Here, however, we wanted, first, to establish that subjects’ behavior could be understood as resulting from inference under a cost, and second, to examine the sensitivity of their predictions to the underlying generative probability, rather than to manipulate a trade-off involving this cost (e.g., with financial reward). We note that we do find that subjects are sensitive to the generative probability, which implies that they exhibit some degree of motivation to put effort into the task (which is the goal of incentivization), in spite of the lack of economic incentives. But it would indeed be interesting to know how a potential sensitivity to reward interacts with the sensitivity to the generative probability. Furthermore, as Reviewer #2 mentions, some studies show that incentives affect probability-matching behavior: it is therefore unclear whether introducing incentives in our task would change the inference of subjects (through a modification of the optimal trade-off that we model), their probability-matching behavior (as modeled by our generalized probability-matching response-selection strategy), or both. Note that we disentangled these two aspects in our modeling and that our conclusions concern the inference, not the response-selection strategy. We deem incentivization effects very much worth investigating, but they fall outside the scope of our paper.

      We now mention this point in the Discussion of the revised manuscript (l. 828-840).

      3) The sample size is relatively small (20 participants). Even though a relatively large amount of data is collected from each participant, this does make it more difficult to evaluate the second-order dependencies in particular (Figure 6), where there are large error bars and the current analysis uses a threshold of p < .05 across a large number of tests hence creating a high false-discovery risk.

      Indeed, we agree with Reviewer #2 that as the number of tests increases, so does the probability that at least one null hypothesis is rejected at a given level, even if all the null hypotheses are true. But in panels a, b and c of Figure 6, about half of the tests are rejected, which is very unlikely under the null hypothesis that the stimulus history has no effect on the prediction, all the more so as the signs of the non-significant results are in most cases consistent with the direction of the significant results. (In panel e, which reports a finer analysis in which the number of subjects is essentially halved, about a fourth of the tests are rejected, and here also the non-significant results are almost all in the same direction as the significant ones.)
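      As a rough heuristic for the "very unlikely" claim: if m tests were independent and every null hypothesis were true, the number rejected at α = 0.05 would follow a Binomial(m, 0.05) distribution. The sketch below computes the tail probability. The values m = 20 and k = 10 are hypothetical (the exact number of tests per panel is not given here), and since the tests in Fig. 6 share subjects, independence is an assumption, so the result is only indicative.

```python
from math import comb


def prob_at_least_k_rejections(m, k, alpha=0.05):
    """P(X >= k) for X ~ Binomial(m, alpha): the chance that at least k of m
    independent level-alpha tests reject when every null hypothesis is true."""
    return sum(comb(m, j) * alpha**j * (1 - alpha) ** (m - j)
               for j in range(k, m + 1))


# Hypothetical example: half of 20 independent tests rejected under the global null.
p_half = prob_at_least_k_rejections(20, 10)
```

      Under these assumptions `p_half` is on the order of 1e-8; positive dependence between tests would inflate it, but not by enough to make "about half rejected" a plausible null outcome.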

      However, we agree that there remains a risk of false discovery, so we applied a Bonferroni-Holm-Šidák correction to the p-values in order to mitigate this risk. With these more conservative p-values, fewer tests are rejected, but in most cases in Fig. 6abc the effects remain significant. In particular, we are confident that there is a repulsive effect of the third-to-last stimulus in the case of Fig. 6c, while there is an attractive effect in the other cases.
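      For readers unfamiliar with the correction, a minimal sketch of the Holm-Šidák step-down adjustment, written from its standard definition rather than from the authors' analysis code: sort the m p-values in ascending order; the value of rank i (counting from 0) is adjusted to 1 − (1 − p)^(m − i); a running maximum keeps the adjusted values monotone. A test is rejected when its adjusted p-value is below α. The function name is hypothetical; `statsmodels.stats.multitest.multipletests(..., method='holm-sidak')` implements the same procedure.

```python
def holm_sidak(pvalues):
    """Holm-Sidak step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Sidak adjustment for the (m - rank) hypotheses not yet tested
        adj = 1.0 - (1.0 - pvalues[i]) ** (m - rank)
        running_max = max(running_max, adj)   # step-down monotonicity
        adjusted[i] = min(running_max, 1.0)
    return adjusted


# Usage: three raw p-values; reject those whose adjusted value is below 0.05.
raw = [0.01, 0.04, 0.03]
adj = holm_sidak(raw)
rejected = [a < 0.05 for a in adj]
```

      Here only the smallest p-value (0.01, adjusted to about 0.0297) survives, even though all three raw values are below 0.05, which illustrates why fewer tests are rejected after the correction.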

      In the revised manuscript, Figure 6 now reports whether the tests are rejected when the p-values are corrected with the Bonferroni-Holm-Šidák correction.

      (We also applied this correction to the p-values of the tests in Fig. 2, which has more data: the corrected p-values are all below 1e-13, which we now indicate in the caption of this figure.)

      4) In the key analyses in Figure 4, we see model predictions averaged across participants. This can be misleading, as the average of many models can produce behavior outside the class of functions the models themselves can generate. It would be helpful to see the distribution of raw model predictions (ideally compared against individual data from humans). Minimally, showing predictions from representative models in each class would provide insight into where specific models are getting things right and wrong, which is not apparent from the model comparison.

      In the main text of the original manuscript, we showed the behavior of the pooled responses of the best-fitting models, and we agree with Reviewer #2 that this did not make clear to the reader that the apparent ability of the models to reproduce the subjects’ behavioral patterns was not a misleading byproduct of averaging different models. In the original version of the manuscript, we had put a figure showing the behavior of each individual model (each cost type with each Markov order) in the Methods section; but this could easily be overlooked, and it would indeed be beneficial for the reader to be shown the typical behaviors of the models in the main text. We have reorganized the presentation of the models’ behaviors: the first panels of Fig. 4 (in the main text) are now dedicated to showing the individual sequential effects of the precision-cost and unpredictability-cost models with Markov orders 0 and 1. Figure 4 is reproduced in the response to Reviewer #1, above, along with comments on the sequential effects produced by these models (and on the impact of the generalized probability-matching response-selection strategy, in comparison with traditional probability matching). We believe that this figure makes clearer how the individual models are able to reproduce the patterns in subjects’ predictions; in particular, it shows that this ability is not just an artifact of averaging many models, which was the legitimate concern of Reviewer #2. We have left the illustration of the first-order sequential effects of the other models (with Markov orders 2 and 3) in the Methods section (Fig. 7), so as not to overload Fig. 4, and because they do not bring new critical conceptual points.

      As for the higher-order sequential effects, the updated Figure 5, also reproduced above in the responses to Reviewer #1, now includes the sequential effects obtained with the precision-cost model of a Bernoulli observer (m=0), in addition to the precision-cost model of a Markov observer (m=1) and the unpredictability-cost model of a Markov observer (m=3), in order to better illustrate the behaviors of the different models. The higher-order sequential effects of the other models can be found in Fig. 8 in Methods.

      Reviewer #3 (Public Review):

      This manuscript offers a novel account of history biases in perceptual decisions in terms of bounded rationality, more specifically in terms of a finite-resources strategy. Bridging two bodies of literature on the suboptimalities of human decision-making (cognitive biases and bounded rationality) is very valuable per se; the theoretical framework is well derived, building upon the authors' previous work; and the choice of experiment and analysis to test their hypothesis is adequate. However, I have important concerns regarding the work that prevent me from fully grasping its impact. Most importantly, I am not sure whether the hypothesis that inference is biased towards avoiding high-precision posteriors is equivalent to the standard hypothesis that inference "leaks" across time due to the belief that the environment is not stationary. This and other important issues are detailed below. I also think that the clarity and architecture of the manuscript could be greatly improved.

      We thank Reviewer #3 for her/his positive comments on our work and her/his comments and suggestions.

      1) At this point, it remains unclear what the relationship is between the finite-resources hypothesis (the only bounded-rationality hypothesis supported by the data) and more standard accounts of history effects in terms of adaptation to a (believed to be) changing environment. The Discussion suggests that the two approaches are similar (if not identical) at the algorithmic level: in one case, the posterior belief is stretched (compared to the Bayesian observer for stationary environments) due to a precision cost, in the other because of possible changes in the environment. Are the two formalisms equivalent? Or could the two accounts provide dissociable predictions for a different task? In other words, if the finite-resources hypothesis is not meant to be taken as brain circuits explicitly minimizing the cost (as stated by the authors), and if it produces the same type of behavior as more classical accounts, is the hypothesis testable experimentally?

      We agree with Reviewer #3 that the relation between our approach and other approaches in the literature should be made clearer to the reader.

      Since the 1990s, in the psychology and neuroscience literature, many models of perception and decision-making have featured an exponential decay of past observations, resulting in an emphasis, in decisions, of the more recent evidence (‘leaky integration’, Refs. [7-12, 76-86]). In the context of sequential effects, this mechanism has found a theoretical justification in the idea that people believe that statistics typically change, and thus that remote observations should indeed be discarded [8,12]. In inference tasks with binary signals, in which the optimal Bayesian posterior is in many cases a Beta distribution whose two parameters are the counts of the two signals, one way to conveniently incorporate a forgetting mechanism is to replace these counts with exponentially-filtered counts, in which more recent observations have more weight (e.g., Ref. [12]).
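      To make the exponentially filtered counts concrete, here is a minimal sketch for binary signals. It assumes a Beta(a, b) prior, a single decay factor applied to both counts before each new observation, and the posterior mean as the prediction; the names and this exact parameterization are illustrative and are not taken from Ref. [12]. With `decay = 1` it reduces to the Bayesian estimate for a stationary Bernoulli source (Laplace's rule when a = b = 1); with `decay < 1`, recent observations weigh more.

```python
def leaky_beta_prediction(observations, decay, prior_a=1.0, prior_b=1.0):
    """Posterior-mean probability that the next signal is 1, using a Beta
    posterior whose two counts are exponentially filtered (leaky)."""
    n1 = n0 = 0.0
    for x in observations:            # x is 1 (e.g., stimulus A) or 0
        n1 = decay * n1 + x           # older evidence shrinks by 'decay'
        n0 = decay * n0 + (1 - x)
    return (prior_a + n1) / (prior_a + prior_b + n1 + n0)


# Usage: the same sequence under a non-leaky and a leaky observer.
stationary = leaky_beta_prediction([1, 1, 0], decay=1.0)
leaky = leaky_beta_prediction([1, 1, 0], decay=0.5)
```

      For the sequence 1, 1, 0, `decay = 1` gives (1 + 2)/(2 + 3) = 0.6, whereas `decay = 0.5` gives 1.75/3.75 ≈ 0.47: the most recent 0 now outweighs the two earlier 1s, which is the recency effect that leaky integration is used to explain.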

      Our approach to sequential effects is not grounded in the history of leaky-integration models: we assume, first, that subjects attempt to learn the statistics of the signals presented to them (this is also the assumption in many studies [7-12]), and second, that their inference is subject to a cost, which prevents them from reaching the optimal, Bayesian posterior; under the constraint of this cost, however, they choose the optimal posterior. We formalize this as a problem of constrained optimization.

      The two formalisms are thus not equivalent. Beyond the fact that we clearly state the problem that we assume the brain is solving, we do not propose that sequential effects originate in an adaptation to putatively changing environments: instead, we assume that they originate in a cognitive cost internal to the decision-maker. If this cost is proportional to the entropy of the posterior, as in our precision cost, then the optimal approximate posterior is one in which remote observations are ‘forgotten’ through an exponential filter, as in the leaky-integration models. In other words, in the context of this task and with this kind of cost, the models are, as Reviewer #3 writes, identical at the algorithmic level. The unpredictability cost, by contrast, does not result in a solution that resembles leaky integration; yet about half the subjects are best fitted by unpredictability-cost models. We thus provide a different rationale for sequential effects (that the brain favors predictable environments in its inference), and this alternative account is successful in capturing the behavior of a large fraction of the subjects.

      In the revised manuscript, we now clarify that the precision cost results in leaky integration, in the abstract, in the Introduction (l. 76-78), in our presentation of the precision-cost models (Results section, l. 264-275), and in the Discussion (l. 706-716). (We also refer Reviewer #3 to our response to the first comment of Reviewer #2, above.)

      Finally, Reviewer #3 asks the interesting question of whether the “two accounts provide dissociable predictions for a different task”. Given that the leaky-integration approach is justified by an adaptation to potential changes, while our approach relies on the hypothesis that precision in beliefs is costly, one way to disentangle the two would be to eliminate the sequential nature of the task and instead present the observations simultaneously. This would eliminate the very notion of change across time. In this case, the leaky account would predict that subjects’ inference becomes optimal (because the leak should disappear in the absence of change), while under our approach the precision cost would still weigh on the inference and result in approximate posteriors that are “wider” (less precise) than the optimal one. The resulting divergence in the predictions of these models is very interesting, but outside the scope of this study on sequential effects.

      2) The current analysis of history effects may be confounded by effects of the motor responses (independently from the correct response), e.g. a tendency to repeat motor responses instead of (or on top of) tracking the distribution of stimuli.

      We thank Reviewer #3 for pointing out the possibility that subjects may have a tendency to repeat motor responses that is not related to their inference.

      We note that in Urai et al., 2017, as in many other sensory 2AFC tasks, successive trials are independent: the stimulus at a given trial is a random event independent of the stimulus at the preceding trial; the response at a given trial should in principle be independent of the stimulus at the preceding trial; and the response at the preceding trial conveys no information about the response that should be given at the current trial (although subjects might exhibit a serial dependency in their responses). By contrast, in our task an event is more likely than not to be followed by the same event (because observing this event suggests that its probability is greater than .5); and a prediction at a given trial should be correlated with the stimuli at the preceding trials, and with the predictions at the preceding trials. In a logit model (or any other GLM), this would mean that the predictors exhibit multicollinearity, i.e., they are strongly correlated. Multicollinearity does not reduce the predictive power of a model, but it makes the identification of parameters extremely unreliable: in other words, we wouldn’t be able to confidently attribute to each predictor (e.g., the past observations and the past responses) a reliable weight in the subjects’ decisions. Furthermore, our study shows that past stimuli can yield both attractive and repulsive effects, depending on the exact sequence of past observations. To capture this in a (generalized) linear model, we would have to introduce interaction terms for each possible past sequence, resulting in a very high number of parameters to be identified.

      However, this does not preclude the possibility that subjects may have a motor propensity to repeat responses. To take this hypothesis into account, we examined the behavior, and the ability to capture subjects’ data, of models in which the response-selection strategy allows for the possibility of repeating, or alternating, the preceding response. Specifically, we consider models that are identical to those in our study except for the response-selection strategy, which is an extension of the generalized probability-matching strategy in which a parameter η, with -1 < η < 1, determines the probability that the model subject repeats its preceding response or, conversely, alternates and chooses the other response. With probability 1-|η|, the model subject follows the generalized probability-matching response-selection strategy (parameterized by κ). With probability |η|, the model subject repeats the preceding response if η > 0, or chooses the other response if η < 0. We included the possibility of an alternation bias (negative η), but we find that no subject is best fitted by a negative η; we thus focus on the repetition bias (positive η). We fit the models by maximizing their likelihoods, and we compare, using the Bayesian Information Criterion (BIC), the quality of their fit to that of the original models, which do not include a repetition propensity.
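      The response rule just described can be sketched as a two-component mixture. The mixture weights (1-|η| and |η|) follow the description above; the specific generalized probability-matching form, P(A) = p̂^κ / (p̂^κ + (1 − p̂)^κ), with p̂ the model subject's estimated probability that the next stimulus is A, is an assumption made here for illustration (κ = 1 gives probability matching; large κ approaches always choosing the more likely option). Function names are hypothetical.

```python
def gpm_prob_a(p_hat, kappa):
    """Assumed generalized probability-matching rule for P(predict A)."""
    a = p_hat ** kappa
    b = (1.0 - p_hat) ** kappa
    return a / (a + b)


def prob_predict_a(p_hat, kappa, eta, prev_response):
    """P(predict A) with a repetition (eta > 0) or alternation (eta < 0) propensity.

    With probability 1 - |eta| the GPM rule is used; with probability |eta| the
    previous response ('A' or 'B') is repeated (eta > 0) or switched (eta < 0).
    """
    if not -1.0 < eta < 1.0:
        raise ValueError("eta must lie in (-1, 1)")
    base = gpm_prob_a(p_hat, kappa)
    if eta >= 0:
        motor = 1.0 if prev_response == "A" else 0.0   # repeat
    else:
        motor = 0.0 if prev_response == "A" else 1.0   # alternate
    return (1.0 - abs(eta)) * base + abs(eta) * motor


# Usage: p_hat = 0.7, kappa = 1, eta = 0.2 (close to the average fitted value).
after_a = prob_predict_a(0.7, 1.0, 0.2, "A")
after_b = prob_predict_a(0.7, 1.0, 0.2, "B")
```

      With these values, the probability of predicting A is 0.8 × 0.7 + 0.2 = 0.76 after a previous A response and 0.8 × 0.7 = 0.56 after a previous B response; the inference component (p̂) is untouched, which is why λ and κ can stay stable when η is added.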

      Taking into account the repetition bias of subjects leaves the assignment of subjects into two families of inference cost mostly unchanged. We find that for 26% of subjects the introduction of the repetition propensity does not improve the fit (as measured by the BIC) and can therefore be discarded. For 47% of subjects, the fit is better with the repetition propensity (lower BIC), and the best-fitting inference model (i.e., the type of cost, precision or unpredictability, and the Markov order) is the same with or without repetition propensity. Thus for 73% (=26+47) of subjects, allowing for a repetition propensity does not change the inference model. We also find that the best-fitting parameters λ and κ, for these subjects, are very stable, when allowing or not for the repetition propensity. For 11% of subjects, the fit is better with the repetition propensity, and the cost type of the inference model is the same (as without the repetition propensity), but the Markov order changes. For the remaining 16%, both the cost type and the Markov order change.
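      The classification above rests on per-subject BIC comparisons. A minimal sketch, assuming the standard definition BIC = k ln n − 2 ln L̂ (k free parameters, n trials, L̂ the maximized likelihood): adding η (one extra parameter) lowers the BIC only if it raises the maximized log-likelihood by more than ln(n)/2. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion, k*ln(n) - 2*ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def prefer_repetition_model(ll_without, k_without, ll_with, n_obs):
    """True if adding the single parameter eta lowers the BIC."""
    return bic(ll_with, k_without + 1, n_obs) < bic(ll_without, k_without, n_obs)


# Usage with hypothetical values: 1000 trials, 2 base parameters (e.g., lambda, kappa).
gain_4 = prefer_repetition_model(-500.0, 2, -496.0, 1000)
gain_3 = prefer_repetition_model(-500.0, 2, -497.0, 1000)
```

      With n = 1000, ln(1000)/2 ≈ 3.45, so a log-likelihood gain of 4 favors the model with repetition propensity, while a gain of 3 does not; subjects in the 26% group would correspond to the latter case.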

      Thus for a majority of subjects, the BIC is improved when a repetition propensity is included, suggesting that there is indeed a tendency to repeat responses, independent of the subjects’ inference process and generative stimulus probability. In Figure 7, in Methods, we show the behavior of the models without repetition propensity, and with repetition propensity, with a parameter η = 0.2, close to the average best-fitting value of η across subjects. We show, in Methods, that (i) the unconditional probability of a prediction A, p(A), is the same with and without repetition propensity, and that (ii) the conditional probabilities p(A|A) and p(A|B) when η ≠ 0 are weighted means of the unconditional probability p(A) and of the conditional probabilities when η = 0 (see p. 47-49 of the revised manuscript).
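      One way to see why (i) and (ii) hold, under assumptions stated here rather than taken from the manuscript: within a block, the stimuli x_t are i.i.d. Bernoulli draws; r_t is the prediction at trial t; p(A|A) denotes the probability of predicting A given that the preceding stimulus was A; p_0 refers to the model with η = 0; 0 < η < 1; and the prediction process is stationary. Since r_{t-1} is issued before x_{t-1} is drawn, and x_{t-1} is independent of everything that precedes it, P(r_{t-1} = A | x_{t-1} = A) = p(A), which gives:

```latex
\begin{aligned}
p_\eta(A) &= (1-\eta)\,p_0(A) + \eta\,p_\eta(A)
  \;\;\Longrightarrow\;\; p_\eta(A) = p_0(A), \\
p_\eta(A \mid A) &= (1-\eta)\,p_0(A \mid A) + \eta\,P(r_{t-1}=A \mid x_{t-1}=A)
  = (1-\eta)\,p_0(A \mid A) + \eta\,p_0(A), \\
p_\eta(A \mid B) &= (1-\eta)\,p_0(A \mid B) + \eta\,p_0(A).
\end{aligned}
```

      Each conditional probability is thus pulled toward the unconditional one by a factor η, which flattens the first-order sequential effects without shifting p(A).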

      In summary, our results suggest that a majority of subjects do exhibit a propensity to repeat their responses. Most subjects, however, are best-fitted by the same inference model, with or without repetition propensity, and the parameters λ and κ are stable, across these two cases; this speaks to the robustness of our model fitting. We conclude that the models of inference under a cost capture essential aspects of the behavioral data, which does not exclude, and is not confounded by, the existence of a tendency, in subjects, to repeat motor responses.

      In the revised manuscript, we present this analysis in Methods (p.47-49), and we refer to it in the main text (l. 353-356 and 400-406).

      3) The authors assume that subjects should reach their asymptotic behavior after passively viewing the first 200 trials but this should be assessed in the data rather than hypothesized. Especially since the subjects are passively looking during the first part of the block, they may well pay very little attention to the statistics.

      The assumption that subjects reach their asymptotic behavior after being presented with 200 observations in the passive trials should indeed be tested. To that end, we compared the behavior of the subjects in the first 100 active trials with their behavior in the remaining 100 active trials. The results of this analysis are shown in Figure 9.

      For most values of the stimulus generative probability, the unconditional proportions of predictions A, in the first and the second half (panel a, solid and dashed gray lines), are not significantly different (panel a, white dots), except for two values (p-value < 0.05; panel a, filled dots). Although in most cases the difference between the two is not significant, in the second half the proportions of prediction A seem slightly closer to the extremes (0 and 1), i.e., closer to the optimal proportions. As for the sequential effects, they appear very similar in the two halves of trials. We conclude that for the purpose of our analysis we can reasonably consider that the behavior of the subjects is stationary throughout the task.

      4) The experiment methods are described quite poorly: when is the feedback provided? What is the horizontal bar at the bottom of the display? What happens in the analysis with timeout trials and what percentage of trials do they represent? Most importantly, what were the subjects told about the structure of the task? Are they told that probabilities change over blocks but are maintained constant within each block?

      We thank Reviewer #3 for her/his close attention to the details of our experiment. Here are the answers to the reviewer’s questions:

      • The feedback (i.e., a lightning strike on the left or the right rod, with the rod and the battery turning yellow if the strike is on the side predicted by the subject) is immediate: it is provided right after the subject makes a prediction, with no delay. We now indicate this in the caption of Figure 1.

      • The task is presented to the subjects as a game in which predicting the correct location of the lightning strike results in electric power being collected in the battery. The horizontal bar at the bottom of the display is a gauge that indicates the amount of power collected in the current block of trials. It has no operational value in the task. We now mention it in the Methods section (l. 872-874).

      • The timeout trials were not included in the analysis. The timeout trials represented 1.27% of the trials, on average (across subjects); and for 95% of the subjects the timeout trials represented less than 2.5% of the trials. This information was added in Methods (l. 887-889).

      • Each new block of trials was presented to the subject as the lightning strikes occurring in a different town. The 200 passive trials at the beginning of each block, in which subjects were asked to observe a sequence of 200 strikes, were presented as the ‘track record’ for that town, and the instructions indicated that it was ‘useful’ to know this track record. No information was given on the mechanism governing the locations of the strikes. In the main text of the revised manuscript, we now include these details when describing the task (p. 6).

    1. eLife assessment

      This study provides a valuable investigation into whether phenotypic variance due to interactions between genetic variants can be measured using genome-wide association summary statistics. The authors present a method, i-LDSC, that uses statistics on the correlations between genotypes at different loci (linkage disequilibrium) to estimate the phenotypic variance explained by both additive genetic effects and pairwise interactions. While the authors present extensive simulations on the performance of their method and empirical results indicating the presence of epistasis (as they define epistasis) it is unclear how their method and results relate to the traditional definitions of additive and non-additive genetic effects, which are different from the authors' definitions.

    2. Joint Public Review:

      LD Score regression (LDSC) is a software tool widely used in the field of genome-wide association studies (GWAS) for estimating heritabilities, genetic correlations, the extent of confounding, and biological enrichment. LDSC is for the most part not regarded as an accurate estimator of \emph{absolute} heritability (although useful for relative comparisons). It is relied on primarily for its other uses (e.g., estimating genetic correlations). The authors propose a new method called \texttt{i-LDSC}, extending the original LDSC in order to estimate a component of genetic variance in addition to the narrow-sense heritability: epistatic genetic variance, although not necessarily all of it. Epistasis in quantitative genetics refers to the component of genetic variance that cannot be captured by a linear model regressing total genetic values on single-SNP genotypes. \texttt{i-LDSC} seems aimed at estimating that part of the epistatic variance residing in statistical interactions between pairs of SNPs. To simplify, the basic model of \texttt{i-LDSC} for two SNPs $X_1$ and $X_2$ is
      \begin{equation}\label{eq:twoX}
      Y = X_1 \beta_1 + X_2 \beta_2 + X_1 X_2 \theta + E,
      \end{equation}
      and estimation of the epistatic variance associated with the product term proceeds through a variant of the original LD Score that measures the extent to which a SNP tags products of genotypes (rather than genotypes themselves). The authors conducted simulations to test their method and then applied it to a number of traits in the UK Biobank and Biobank Japan. They found that for all traits the additive genetic variance was larger than the epistatic, but for height the absolute size of the epistatic component was estimated to be non-negligible. An interpretation of the authors' results that perhaps cannot be ruled out, however, is that pairwise epistasis overall does not make a detectable contribution to the variance of quantitative traits.

      Major Comments

      This paper has a lot of strong points, and I commend the authors for the effort and ingenuity expended in tackling the difficult problem of estimating epistatic (non-additive) genetic variance from GWAS summary statistics. The mere possibility of the estimated univariate regression coefficient containing a contribution from epistasis, as represented in the manuscript's Equation~3 and elsewhere, is intriguing in and of itself.

      Is \texttt{i-LDSC} Estimating Epistasis?

      Perhaps the issue that has given me the most pause is uncertainty over whether the paper's method is really estimating the non-additive genetic variance, as this has been traditionally defined in quantitative genetics with great consequences for the correlations between relatives and evolutionary theory (Fisher, 1930, 1941; Lynch & Walsh, 1998; Burger, 2000; Ewens, 2004).

      Let us call the expected phenotypic value of a given multiple-SNP genotype the \emph{total genetic value}. If we apply least-squares regression to obtain the coefficients of the SNPs in a simple linear model predicting the total genetic values, then the partial regression coefficients are the \emph{average effects of gene substitution} and the variance in the predicted values resulting from the model is called the \emph{additive genetic variance}. (This is all theoretical and definitional, not empirical. We do not actually perform this regression.) The variance in the residuals---the differences between the total genetic values and the additive predicted values---is the \emph{non-additive genetic variance}. Notice that this is an orthogonal decomposition of the variance in total genetic values. Thus, in order for the variance in $\mathbf{W}\bm{\theta}$ to qualify as the non-additive genetic variance, it must be orthogonal to $\mathbf{X} \bm{\beta}$.
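
      The definitional regression in this paragraph can be made concrete with a short simulation. This is a sketch under assumptions introduced here, not part of the manuscript: two unlinked biallelic loci with binomially drawn allele counts, an arbitrary genotype-to-value map containing dominance and an additive-by-additive term, and a large simulated population standing in for the population regression.

```python
import numpy as np


def additive_split(genotypes, genetic_values):
    """Least-squares regression of total genetic values on allele counts.

    Returns the fitted coefficients (average effects of gene substitution),
    the variance of the fitted values (additive genetic variance), and the
    variance of the residuals (non-additive genetic variance).
    """
    design = np.column_stack([np.ones(len(genotypes)), genotypes])
    coef, *_ = np.linalg.lstsq(design, genetic_values, rcond=None)
    fitted = design @ coef
    residuals = genetic_values - fitted
    return coef[1:], fitted.var(), residuals.var()


rng = np.random.default_rng(0)
n = 100_000
# Allele counts (0, 1, 2) at two unlinked loci; allele frequencies are assumed.
x1 = rng.binomial(2, 0.3, size=n)
x2 = rng.binomial(2, 0.6, size=n)
# Hypothetical total genetic values: additive terms, dominance at locus 1,
# and an additive-by-additive interaction.
g = 0.5 * x1 + 0.3 * x2 + 0.4 * (x1 == 1) + 0.2 * x1 * x2

effects, var_add, var_nonadd = additive_split(np.column_stack([x1, x2]), g)
# The residuals are orthogonal to the fitted values (and have mean zero),
# so the two parts add up to the total genetic variance.
print(effects, var_add, var_nonadd, g.var())
```

      The residual variance here is what the paragraph calls the non-additive genetic variance; it lumps dominance and epistatic deviations together.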

      At first, I very much doubted whether this is generally true. And I was not reassured by the authors' reply to Reviewer~1 on this point, which did not seem to show any grasp of the issue at all. But to my surprise I discovered in elementary simulations of Equation~\ref{eq:twoX} above that for mean-centered $X_1$ and $X_2$, $(X_1 \beta_1 + X_2 \beta_2)$ is uncorrelated with $X_1 X_2 \theta$ for seemingly arbitrary correlation between $X_1$ and $X_2$. A partition of the outcome's variance between these two components is thus an orthogonal decomposition after all. Furthermore, the result seems general for any number of independent variables and their pairwise products. I am also encouraged by the report that standard and interaction LD Scores are ``lowly correlated'' (line~179), meaning that the standard LDSC slope is scarcely affected by the inclusion of interaction LD Scores in the regression; this behavior is what we should expect from an orthogonal decomposition.
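
      An elementary simulation of the kind described here might look like the following. The distribution used in the reviewer's simulations is not stated; this sketch assumes mean-centered, jointly Gaussian predictors whose correlation `rho` stands in for LD, with arbitrary coefficients. Real genotypes are discrete and, when allele frequencies differ from 0.5, skewed, so the sketch covers the symmetric case only.

```python
import numpy as np


def additive_vs_product_corr(rho, n=200_000, seed=0, beta=(0.5, -0.3), theta=0.4):
    """Sample correlation between X1*b1 + X2*b2 and X1*X2*theta for
    mean-centered Gaussian X1, X2 with correlation rho (an assumed
    stand-in for LD between two SNPs)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    x -= x.mean(axis=0)  # mean-center each predictor
    x1, x2 = x[:, 0], x[:, 1]
    additive = beta[0] * x1 + beta[1] * x2
    interaction = theta * x1 * x2
    return np.corrcoef(additive, interaction)[0, 1]


for rho in (0.0, 0.5, 0.9):
    print(rho, round(additive_vs_product_corr(rho), 4))
```

      The printed correlations should sit close to zero, up to sampling noise, at every value of `rho`.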

      I have therefore come to the view that the additional variance component estimated by \texttt{i-LDSC} has a close correspondence with the epistatic (non-additive) genetic variance after all.

      In order to make this point transparent to all readers, however, I think that the authors should put much more effort into placing their work into the traditional framework of the field. It was certainly not intuitive to multiple reviewers that $\mathbf{X}\bm{\beta}$ is orthogonal to $\mathbf{W}\bm{\theta}$. There are even contrary suggestions. For if $(\mathbf{X}\bm{\beta})^\intercal \mathbf{W} \bm{\theta} = \bm{\beta}^\intercal \mathbf{X}^\intercal \mathbf{W} \bm{\theta} $ is to equal zero, we know that we can't get there by $\mathbf{X}^\intercal \mathbf{W}$ equaling zero because then the method has nothing to go on (e.g., line~139). We thus have a quadratic form---each term being the weighted product of an average (additive) effect and an interaction coefficient---needing to cancel out to equal zero. I wonder if the authors can put forth a rigorous argument or compelling intuition for why this should be the case.
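
      One way to see where such a cancellation can come from is to write the cross term out for the population (expected) quantities. The following is a sketch under an assumption introduced here, mean-centered and jointly Gaussian predictors; it is not offered as the authors' argument and does not by itself cover discrete genotypes.

```latex
\begin{align*}
\operatorname{Cov}\Big(\sum_i \beta_i X_i,\ \sum_{j<k} \theta_{jk} X_j X_k\Big)
  &= \sum_i \sum_{j<k} \beta_i \theta_{jk}\, \mathbb{E}[X_i X_j X_k]
     && \text{(since } \mathbb{E}[X_i] = 0\text{)} \\
  &= 0
     && \text{if } (X_1, \dots, X_p) \text{ is zero-mean jointly Gaussian,}
\end{align*}
```

      because every odd-order moment of a zero-mean Gaussian vector vanishes, whatever the correlations (LD) among the $X$'s. For centered allele counts the third moments need not vanish: a single SNP with allele frequency $p$ has $\mathbb{E}[X^3] = 2p(1-p)(1-2p)$, and a term such as $\mathbb{E}[X_j^2 X_k]$ reduces to this when SNPs $j$ and $k$ are in perfect LD. The Gaussian argument therefore does not transfer to genotype data automatically.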

      In the case of two polymorphic sites, quantitative genetics has traditionally partitioned the total genetic variance into the following orthogonal components:
      \begin{itemize}
      \item additive genetic variance, $\sigma^2_A$, the numerator of the narrow-sense heritability;
      \item dominance genetic variance, $\sigma^2_D$;
      \item additive-by-additive genetic variance, $\sigma^2_{AA}$;
      \item additive-by-dominance genetic variance, $\sigma^2_{AD}$; and
      \item dominance-by-dominance genetic variance, $\sigma^2_{DD}$.
      \end{itemize}
      See Lynch and Walsh (1998, pp. 88-92) for a thorough numerical example. This decomposition is not arbitrary or trivial, since each component has a distinct coefficient in the correlations between relatives. Is it possible for the authors to relate the variance associated with their $\mathbf{W}\bm{\theta}$ to this traditional decomposition? Besides justifying the work in this paper, the establishment of a relationship can have the possible practical benefit of allowing \texttt{i-LDSC} estimates of non-additive genetic variance to be checked against empirical correlations between relatives. For example, if we know from other methods that $\sigma^2_D$ is negligible but that \texttt{i-LDSC} returns a sizable $\sigma^2_{AA}$, we might predict that the parent-offspring correlation should be equal to the sibling correlation; a sizable $\sigma^2_D$ would make the sibling correlation higher. Admittedly, however, such an exercise can get rather complicated for the variance contributed by pairs of SNPs that are close together (Lynch & Walsh, 1998, pp. 146-152).
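
      The check proposed here can be sketched with the classical coefficients (Kempthorne/Cockerham) that link each variance component to the genetic covariance between relatives. The sketch assumes unlinked loci, random mating, no shared environment, and no interaction terms beyond the two-locus ones listed; as noted above, closely linked SNPs complicate these coefficients. The function name and the example values are hypothetical.

```python
from fractions import Fraction


def relative_covariances(var_a, var_d, var_aa, var_ad, var_dd):
    """Genetic covariances implied by the two-locus variance partition.

    Uses Cov = sum over terms of (2 * kinship)^r * u^s * V(A^r D^s), where u is
    the probability that two relatives share both alleles identical by descent:
    kinship 1/4 and u = 0 for parent-offspring, kinship 1/4 and u = 1/4 for
    full sibs. Returns (parent_offspring, full_sib).
    """
    var_a, var_d, var_aa, var_ad, var_dd = map(
        Fraction, (var_a, var_d, var_aa, var_ad, var_dd))
    parent_offspring = var_a / 2 + var_aa / 4
    full_sib = var_a / 2 + var_d / 4 + var_aa / 4 + var_ad / 8 + var_dd / 16
    return parent_offspring, full_sib


# Hypothetical components as fractions of phenotypic variance, so that the
# covariances are also correlations.
po, fs = relative_covariances(Fraction(40, 100), 0, Fraction(8, 100), 0, 0)
print(po, fs)  # no dominance terms: the two values coincide
po_d, fs_d = relative_covariances(Fraction(40, 100), Fraction(10, 100), Fraction(8, 100), 0, 0)
print(po_d, fs_d)  # a sizable dominance variance raises only the sibling value
```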

      I would also like the authors to clarify whether LDSC consistently overestimates the narrow-sense heritability in the case that pairwise epistasis is present. The figures seem to show this. I have conflicting intuitions here. On the one hand, if GWAS summary statistics can be inflated by the tagging of epistasis, then it seems that LDSC should overestimate heritability (or at least this should be an upwardly biasing factor; other factors may lead the net bias to be different). On the other hand, if standard and interaction LD Scores are lowly correlated, then I feel that the inclusion of interaction LD Score in the regression should not strongly affect the coefficient of the standard LD Score. Relatedly, I find it rather curious that \texttt{i-LDSC} seems increasingly biased as the proportion of genetic variance that is non-additive goes up---but perhaps this is not too important, since such a high ratio of narrow-sense to broad-sense heritability is not realistic.

      How Much Epistasis Is \texttt{i-LDSC} Detecting?

      I think the proper conclusion to be drawn from the authors' analyses is that statistically significant epistatic (non-additive) genetic variance was not detected. Specifically, I think that the analysis presented in Supplementary Table~S6 should be treated as a main analysis rather than a supplementary one, and the results here show no statistically significant epistasis. Let me explain.

      Most serious researchers, I think, treat LDSC as an unreliable estimator of narrow-sense heritability; it typically returns estimates that are too low. Not even the original LDSC paper pressed strongly to use the method for estimating $h^2$ (Bulik-Sullivan et al., 2015). As a practical matter, when researchers are focused on estimating absolute heritability with high accuracy, they usually turn to GCTA/GREML (Evans et al., 2018; Wainschtein et al., 2022).

      One reason for low estimates with LDSC is that if SNPs with higher LD Scores are less likely to be causal or to have large effect sizes, then the slope of univariate LDSC will not rise as much as it ``should'' with increasing LD Score. This was a scenario actually simulated by the authors and displayed in their Supplementary Figure~S15. [Incidentally, the authors might have acknowledged earlier work in this vein. A simulation inducing a negative correlation between LD Scores and $\chi^2$ statistics was presented by Bulik-Sullivan et al. (2015, Supplementary Figure 7), and the potentially biasing effect of a correlation over SNPs between LD Scores and contributed genetic variance was a major theme of Lee et al. (2018).] A negative correlation between LD Score and contributed variance does seem to hold for a number of reasons, including the fact that regions of the genome with higher recombination rates tend to be more functional. In short, the authors did very well to carry out this simulation and to show in their Supplementary Figure~S15 that this flaw of LDSC in estimating narrow-sense heritability is also a flaw of \texttt{i-LDSC} in estimating broad-sense heritability. But they should have carried the investigation at least one step further, as I will explain below.
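
      For readers less familiar with LDSC, the link between the slope and $h^2$ can be sketched as follows. Under the model of Bulik-Sullivan et al. (2015), $\mathbb{E}[\chi^2_j] = N h^2 \ell_j / M + N a + 1$, so $h^2$ is read off the slope as slope $\times M / N$. The sketch uses unweighted least squares on statistics simulated exactly from that model; the released software also applies regression weights and block-jackknife standard errors, which are omitted here, and all sizes are assumptions.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD Score regression of chi-square statistics on LD Scores.

    Returns (h2, intercept), with h2 = slope * M / N.
    """
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return slope * n_snps / n_samples, intercept


rng = np.random.default_rng(1)
m, n, h2_true = 100_000, 100_000, 0.5  # assumed numbers of SNPs and samples, true h2
ell = rng.uniform(1, 200, size=m)  # hypothetical LD Scores
# A 1-df noncentral chi-square has mean 1 + noncentrality, so these statistics
# have expectation 1 + N * h2 * ell / M (no confounding, a = 0).
chi2 = rng.noncentral_chisquare(1, n * h2_true * ell / m)
h2_hat, intercept = ldsc_h2(chi2, ell, n, m)
print(round(h2_hat, 3), round(intercept, 2))
```

      If SNPs with high LD Scores contribute less variance than this model assumes, their $\chi^2$ statistics rise less steeply with $\ell_j$, the fitted slope flattens, and the recovered $h^2$ falls: the downward bias described in the paragraph above.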

      Another reason for LDSC being a downwardly biased estimator of heritability is that it is often applied to meta-analyses of different cohorts, where heterogeneity (and possibly major but undetected errors by individual cohorts) leads to attenuation of the overall heritability (de Vlaming et al., 2017).

      The optimal case for using LDSC to estimate heritability, then, is incorporating the LD-related annotation introduced by Gazal et al. (2017) into a stratified-LDSC (s-LDSC) analysis of a single large cohort. This is analogous to the calculation of multiple GRMs defined by MAF and LD in the GCTA/GREML papers cited above. When this was done by Gazal et al. (2017, Supplementary Table 8b), the joint impact of the improvements was to increase the estimated narrow-sense heritability of height from 0.216 to 0.534.

      All of this has at least a few ramifications for \texttt{i-LDSC}. First, the authors do not consider whether a relationship between their interaction LD Scores and interaction effect sizes might bias their estimates. (This would be on top of any biasing relationship between standard LD Scores and linear effect sizes, as displayed in Supplementary Figure~S15.) I find some kind of statistical relationship over the whole genome, induced perhaps by evolutionary forces, between \emph{cis}-acting epistasis and interaction LD Scores to be plausible, albeit without intuition regarding the sign of any resulting bias. The authors should investigate this issue or at least mention it as a matter for future study. Second, it might be that the authors are comparing the estimates of broad-sense heritability in Table~1 to the wrong estimates of narrow-sense heritability. Although the estimates did come from single large cohorts, they seem to have been obtained with simple univariate LDSC rather than s-LDSC. When the estimate of $h^2$ obtained with LDSC is too low, some will suspect that the additional variance detected by \texttt{i-LDSC} is simply additive genetic variance missed by the downward bias of LDSC. Consider that the authors' own Supplementary Table~S6 gives s-LDSC heritability estimates that are consistently higher than the LDSC estimates in Table~1. E.g., the estimated $h^2$ of height goes from 0.37 to 0.43. The latter figure cuts quite a bit into the estimated broad-sense heritability of 0.48 obtained with \texttt{i-LDSC}.
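
      The size of the overlap can be put in rough numbers. This is a naive subtraction that mixes estimates from Table~1 and Supplementary Table~S6 of the manuscript, so it indicates scale only:

```latex
\begin{align*}
\text{LDSC } h^2 \text{ (Table 1):}\quad & 0.48 - 0.37 = 0.11,\\
\text{s-LDSC } h^2 \text{ (Supp.\ Table S6):}\quad & 0.48 - 0.43 = 0.05.
\end{align*}
```

      On this crude reading, the variance left over for a non-additive component shrinks by more than half ($0.05/0.11 \approx 0.45$) once the additive part is estimated with s-LDSC.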

      Here we come to a critical point. Lines 282--286 are not entirely clear, but I interpret them to mean that the manuscript's Equation~5 was expanded by stratifying $\ell$ into the components of s-LDSC and this was how the estimates in Supplementary Table~S6 were obtained. If that interpretation is correct, then the scenario of \texttt{i-LDSC} picking up missed additive genetic variance seems rather plausible. At the very least, the increases in broad-sense heritability reported in Supplementary Table~S6 are smaller in magnitude and \emph{not statistically significant}. Perhaps what this means is that the headline should be a \emph{negligible} contribution of pairwise epistasis revealed by this novel and ingenious method, analogous to what has been discovered with respect to dominance (Hivert et al., 2021; Pazokitoroudi et al., 2021; Okbay et al., 2022; Palmer et al., 2023).

      REFERENCES

      Bulik-Sullivan, B., Loh, P.-R., Finucane, H. K., Ripke, S., Yang, J., Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson, N., Daly, M. J., Price, A. L., & Neale, B. M. (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics, 47, 291-295.

      Burger, R. (2000). The mathematical theory of selection, recombination, and mutation. Wiley.

      de Vlaming, R., Okbay, A., Rietveld, C. A., Johannesson, M., Magnusson, P. K. E., Uitterlinden, A. G., van Rooij, F. J. A., Hofman, A., Groenen, P. J. F., Thurik, A. R., & Koellinger, P. D. (2017). Meta-GWAS Accuracy and Power (MetaGAP) calculator shows that hiding heritability is partially due to imperfect genetic correlations across studies. PLoS Genetics, 13, e1006495.

      Evans, L. M., Tahmasbi, R., Vrieze, S. I., Abecasis, G. R., Das, S., Gazal, S., Bjelland, D. W., de Candia, T. R., Haplotype Reference Consortium, Goddard, M. E., Neale, B. M., Yang, J., Visscher, P. M., & Keller, M. C. (2018). Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nature Genetics, 50, 737-745.

      Ewens, W. J. (2004). Mathematical population genetics I. Theoretical introduction (2nd ed.). Springer.

      Fisher, R. A. (1930). The genetical theory of natural selection. Oxford University Press.

      Fisher, R. A. (1941). Average excess and average effect of a gene substitution. Annals of Eugenics, 11, 53-63.

      Gazal, S., Finucane, H. K., Furlotte, N. A., Loh, P.-R., Palamara, P. F., Liu, X., Schoech, A., Bulik-Sullivan, B., Neale, B. M., Gusev, A., & Price, A. L. (2017). Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nature Genetics, 49, 1421-1427.

      Hivert, V., Sidorenko, J., Rohart, F., Goddard, M. E., Yang, J., Wray, N. R., Yengo, L., & Visscher, P. M. (2021). Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals. American Journal of Human Genetics, 108, 786-798.

      Lee, J. J., McGue, M., Iacono, W. G., & Chow, C. C. (2018). The accuracy of LD Score regression as an estimator of confounding and genetic correlations in genome-wide association studies. Genetic Epidemiology, 42, 783-795.

      Lynch, M., & Walsh, B. (1998). Genetics and the analysis of quantitative traits. Sinauer.

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., Sidorenko, J., Kweon, H., Goldman, G., Gjorgjieva, T., Jiang, Y., Hicks, B., Tian, C., Hinds, D. A., Ahlskog, R., Magnusson, P. K. E., Oskarsson, S., Hayward, C., Campbell, A., ... Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54, 437-449.

      Palmer, D. S., Zhou, W., Abbott, L., Wigdor, E. M., Baya, N., Churchhouse, C., Seed, C., Poterba, T., King, D., Kanai, M., Bloemendal, A., & Neale, B. M. (2023). Analysis of genetic dominance in the UK Biobank. Science, 379, 1341-1348.

      Pazokitoroudi, A., Chiu, A. M., Burch, K. S., Pasaniuc, B., & Sankararaman, S. (2021). Quantifying the contribution of dominance deviation effects to complex trait variation in biobank-scale data. American Journal of Human Genetics, 108, 799-808.

      Wainschtein, P., Jain, D., Zheng, Z., TOPMed Anthropometry Working Group, NHLBI Trans-Omics for Precision Medicine Consortium, Cupples, L. A., Shadyab, A. H., McKnight, B., Shoemaker, B. M., Mitchell, B. D., Psaty, B. M., Kooperberg, C., Liu, C.-T., Albert, C. M., Roden, D., Chasman, D. I., Darbar, D., Lloyd-Jones, D. M., Arnett, D. K., . . . Visscher, P. M. (2022). Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nature Genetics, 54, 263-273.

    1. eLife assessment

      This study presents valuable findings regarding inter-individual variability in the neural and behavioral effects of ketamine. The methodological approach used to characterize this variability is compelling, but the evidence to support the specificity of the changes and their genetic correlates is incomplete. The study would benefit from a more thorough examination of the specificity of the pharmacological and genetic results.

    2. Reviewer #1 (Public Review):

      In this work, 40 healthy volunteers underwent a placebo infusion followed by a ketamine infusion during a resting-state fMRI scan. The authors use principal component analysis (PCA) of the difference in global brain connectivity (GBC) between the ketamine and placebo infusions as their summary neural measure. First, a GBC map is computed for each scan (~4.5 min, TR = 700 ms) after processing with the HCP minimal pipeline and removal of the global brain signal. Then the significant PCA components of the difference between ketamine and placebo GBC maps are taken as the neural effect of interest and compared with the mean delta GBC. The first two principal components account for 24.5% of the variance of the data and have above-chance correlations with SST and PVALB cortical gene expression patterns. No significant correlations were found between the mean change in GBC and these genes. Additionally, compared with the mean GBC, the PCs were found to correlate better with behavioral measures.
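
      The summary measure can be sketched as follows: GBC as the mean Fisher-z correlation of each parcel with every other parcel, a ketamine-minus-placebo difference map per subject, and PCA across subjects. This is a minimal sketch on random data; the preprocessing (HCP minimal pipeline, global signal regression), the parcellation, the array sizes, and the function names are omissions or assumptions, not the authors' code.

```python
import numpy as np


def global_brain_connectivity(timeseries):
    """Mean Fisher-z correlation of each parcel with all other parcels.

    `timeseries` has shape (n_timepoints, n_parcels).
    """
    r = np.corrcoef(timeseries, rowvar=False)
    np.fill_diagonal(r, 0.0)  # drop self-correlations (arctanh(0) = 0)
    z = np.arctanh(r)
    return z.sum(axis=1) / (r.shape[0] - 1)


def pca_of_maps(maps):
    """PCA of subject-by-parcel maps via SVD of the column-centered matrix.

    Returns the component maps (rows) and the fraction of variance each explains.
    """
    centered = maps - maps.mean(axis=0)
    _, s, components = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return components, explained


rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_parcels = 40, 300, 50  # hypothetical sizes
# Ketamine minus placebo GBC, one row per subject (random stand-in data).
delta_gbc = np.array([
    global_brain_connectivity(rng.standard_normal((n_timepoints, n_parcels)))
    - global_brain_connectivity(rng.standard_normal((n_timepoints, n_parcels)))
    for _ in range(n_subjects)
])
components, explained = pca_of_maps(delta_gbc)
print(explained[:2].sum())  # share of variance captured by the first two PCs
```

      The mean-delta-GBC alternative mentioned in the text corresponds to `delta_gbc.mean(axis=0)` in this sketch.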

      To further support their aim of establishing the multi-dimensionality of the ketamine response with their neural measure, the authors estimated the PCA dimensionality in external psilocybin and LSD datasets, matched for sample size and processed identically, and found lower dimensionality in these datasets.

      Effective dimensionality was calculated using the participation ratio and dataset re-sampling was used to control for sample size in this calculation, but dimensionality is also affected by motion within the sample among other noise sources, which are not well discussed. In particular, each drug may affect physiological noise in different ways and this may in turn affect their dimensionality measurement.
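
      Effective dimensionality by participation ratio, $\mathrm{PR} = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ over covariance eigenvalues $\lambda_i$, combined with subsampling to a common number of subjects, might be sketched as below. The subsampling scheme (draws without replacement), the number of draws, and the function names are assumptions; the sketch does nothing about the motion or physiological-noise confounds raised above.

```python
import numpy as np


def participation_ratio(eigenvalues):
    """(sum of eigenvalues)^2 / sum of squared eigenvalues."""
    lam = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)  # drop tiny negative round-off
    return lam.sum() ** 2 / np.sum(lam**2)


def subsampled_pr(maps, n_sub, n_draws=100, seed=0):
    """Average participation ratio of the parcel-by-parcel covariance of
    `maps` (subjects x parcels) over random subsets of n_sub subjects."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_draws):
        idx = rng.choice(maps.shape[0], size=n_sub, replace=False)
        cov = np.cov(maps[idx], rowvar=False)
        values.append(participation_ratio(np.linalg.eigvalsh(cov)))
    return float(np.mean(values))


rng = np.random.default_rng(0)
maps = rng.standard_normal((40, 50))  # hypothetical subject-by-parcel delta maps
print(subsampled_pr(maps, n_sub=20))
```

      Comparing drugs would then amount to calling `subsampled_pr` on each dataset with the same `n_sub`, so that differences in PR are not driven by sample size alone.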

      A PCA decomposition of the changes (ketamine minus placebo) in 31 measured behavioral variables was also performed, which resulted in two major PCs that accounted for 41.4% of the variance. Following prior work, behavioral PCs were mapped onto the neural PCs to create neuro-behavioral PCs. The weighting of the PCs at the individual level was explored to compare inter-individual variability.

      In an earlier fMRI study of the timeseries response to ketamine (De Simoni, 2013) it was clear that there are both individual and regional brain response differences. Behaviorally, there is known individual variability in the response to ketamine insofar as only about 60% of depressed people will experience symptom improvement and even then to varying levels. Thus, it is good to see that the compound summary measure of the PCA of the change in GBC after ketamine follows this pattern and shows inter-individual differences.

      A strength of this paper is that it brings together multimodal and external datasets and combines them in a linked analysis to support their investigational aims. Several sets of analyses are used to draw relations between fMRI results, genetics, and behavioral measures, but the range of conclusions is limited by the understandably small sample size for this kind of drug challenge study. A weakness is that the chosen summary measure (delta GBC of ketamine-placebo, followed by a group-level PCA) has been developed principally by this lab and has not been widely replicated. The presentation of the analyses could be simplified to increase readability and impact. Nevertheless, this study provides informative steps toward the development of markers for individualized drug response.

    3. Reviewer #2 (Public Review):

      In this interesting work on the neuropharmacological effects of ketamine, the authors conducted a pharmacological functional magnetic resonance imaging (fMRI) study in 40 healthy participants receiving bolus and constant infusion of ketamine during resting-state fMRI. Data were preprocessed with the human connectome-based standard pipeline previously used successfully by the lab (FS parcellation and application of an atlas published by the group, HCP pipeline, FSL, global brain connectivity with and without global-signal regression). Briefly, GBC and principal component maps of the positive and negative syndrome scale (PANSS) were related to somatostatin and parvalbumin cortical gene expression patterns. In addition, the authors compared the effective dimensionality, i.e., eigenvalues of covariance matrices of drug vs. placebo, and found higher complexity of responses to ketamine than to LSD and psilocybin, which is very interesting. Also, there was substantial inter-individual variation in behavioral and neurobehavioral results, which was captured by PC and GBC maps. In supplementary results, the authors also showed that the principal component PS1 correlated highly with the fMRI global signal.

      Although a complex set of analyses is presented, the paper is written very clearly and is easy to follow. The authors did a good job of outlining the steps of their analyses in supplemental diagrams, and the source code is provided. As a general remark, I consider the main strength of this work to be that it acknowledges the very diverse inter-individual variation of ketamine's effects and uses advanced methodological approaches to disentangle it.

      Since the drug also exhibits strong variation in clinical antidepressant response, the methodology applied here will very likely yield interesting results when applied to clinical datasets of patients with major depressive disorder.

    1. Author Response

      Reviewer #1 (Public Review):

      Sun et al. investigated the circuit mechanism of a novel type of synaptic plasticity in the projection from the visual cortex to the auditory cortex (VC-AC), which is thought to play an important role in visuo-auditory associative learning. The key question behind this paper is: what is the role of the CCK-positive projection from the entorhinal cortex in the plasticity of VC-AC projections? They discover that the strength of VC-AC projections does not change when pairing the stimulation of this pathway with the acoustic stimulation of the auditory cortex (AC) unless CCK is applied to the AC or the CCK-positive projection from the entorhinal cortex to the auditory cortex (EC-AC) is optogenetically stimulated. In contrast, optogenetically stimulating VC-AC projections, which express a lower level of CCK than the EC-AC projection, does not induce such synaptic plasticity. Interestingly, the data also indicate that even if the EC-AC pathway is stimulated 500 ms ahead of the pairing of VC-AC pathway stimulation and AC stimulation, the VC-AC synaptic strength can still be potentiated, consistent with the long-lasting nature of CCK as a neuropeptide. By performing a fear conditioning assay, the authors demonstrate that CCK signaling is indeed required for the association of visual and auditory cues.

      The proposed mechanism is interesting because it not only helps explain the heterosynaptic plasticity of the visual-auditory projection but also will provide insight into how the entorhinal cortex as an association area contributes to the association of visual and auditory cues. Nevertheless, this study suffers from the lack of a few key experiments, which prevents drawing a conclusion on the contribution of CCK release from the EC-AC projection to the plasticity of the VC→AC projection.

      We are grateful for the constructive comments provided by the reviewers and appreciate the significant effort they have dedicated to reviewing our manuscript. To enhance our study and strengthen our conclusions, we have made the following revisions in response to their feedback.

      1) One main conclusion from figures 1-3 is that CCK released from the EC-AC projection is required for the plasticity of the VC-AC projection in addition to pairing VALS with noise/electrical stimulation. But the data in those figures cannot exclude the alternative explanations that CCK alone, or pairing CCK with either VALS or noise, is sufficient to make the VC-AC synaptic connection more potent. This concerns the mechanism underlying the effect of CCK: CCK may function simply as a neuromodulator that regulates excitatory synaptic transmission rather than one that promotes long-term synaptic plasticity.

      Thank you for this valuable comment and for pointing out the weakness. In response, we have conducted additional control experiments to reinforce our conclusions. For Figure 1G, we introduced three control groups: CCK alone (Figure 1-figure supplement 1F-G), CCK + presynaptic activation of VC-to-AC inputs (Figure 1-figure supplement 1H-I), and CCK + postsynaptic firing induced by noise (Figure 1-figure supplement 1J-K). These control experiments indicate that in none of the three scenarios were the VC-to-AC inputs potentiated. Further details can be found in Figure 1-figure supplement 1F-K.

      For Figure 2E, we introduced three control groups: HFS laser EC-to-AC alone (Figure 2-figure supplement 1H-I), HFS laser EC-to-AC + presynaptic activation of VC-to-AC inputs (Figure 2-figure supplement 1L-M), and HFS laser + postsynaptic firing induced by noise (Figure 2-figure supplement 1P-Q). And we found that in all three scenarios, the VC-to-AC inputs were not significantly potentiated. Please see details in Figure 2-figure supplement 1.

      Given that our in vivo results already demonstrated that neither HFS laser EC-to-AC alone, nor its combination with presynaptic or postsynaptic activation, potentiated the VC-to-AC inputs, we did not replicate these control groups in our ex vivo setup. These additional experiments enhance the robustness of our findings and address the initial concerns raised.

      2) Similar issue exists in Fig. 2H and 3J. Without proper controls, it is impossible to tell whether all three conditions (HFLSEA, VALA, noise/electrical stimulation) are necessary for potentiated AC responses to acoustic/electrical stimulation.

      Same as above, we have conducted additional control experiments to reinforce our conclusions. These include:

      For Figure 2H, we also tested the noise response in the above three control groups: HFS laser EC-to-AC alone (Figure 2-figure supplement 1J-K), HFS laser EC-to-AC + presynaptic activation of VC-to-AC inputs (Figure 2-figure supplement 1N-O), and HFS laser + postsynaptic firing induced by noise (Figure 2-figure supplement 1R-S). We found that fEPSPs evoked by noise stimuli were significantly potentiated after HFS laser EC-to-AC + Post (Figure 2-figure supplement 1R-S). However, no potentiation was observed following HFS laser EC-to-AC alone (Figure 2-figure supplement 1J-K) or HFS laser EC-to-AC + Pre (Figure 2-figure supplement 1N-O).

      These results suggest that both HFS laser targeting the EC-to-AC projection and noise-induced AC firing are required to potentiate the AC's response to acoustic stimuli. In contrast, activation of the VC-to-AC projection is not necessary. This finding aligns with our previous research (Li et al., 2014).

      Given the similarity in experimental design, we opted not to replicate these specific control groups in our ex vivo setup.

      These additional control experiments have been crucial in reinforcing the conclusions of our study.

      3) Fig. 2E and 3G show that the stimulation of CCK-positive EC-AC projection is required for the plasticity of VC-AC projection. However, considering that most EC-AC projection neurons co-release glutamate and CCK, we cannot tell whether CCK, glutamate, or both matter for this type of plasticity. Even though the long delay in Fig 5B is consistent with the neuropeptide nature of CCK, direct experimental evidence is needed, since this is where the novelty of the paper lies.

      Thank you for your constructive feedback. In response to the suggestions, for Figure 2E, we have incorporated two additional experiments: one with a CCKB receptor (CCKBR) antagonist and another with ACSF infused into the AC prior to HFS laser EC-to-AC + Pre/Post Pairing (Figures 2N-P). Our findings demonstrate that the CCKBR antagonist effectively inhibited the potentiation of the VC-to-AC inputs following the HFS laser EC-to-AC + Pre/Post Pairing. Conversely, ACSF did not exhibit this inhibitory effect. For further information, please refer to Figures 2N-P. Given the similarity in experimental design, we opted not to replicate these groups in our ex vivo setup.

      4) In Fig. 6, the authors examined the necessity of CCK for the generation of the visuo-auditory association. The experimental approach of injecting a CCK receptor blocker or CCK-4 is not specific to the EC-AC pathway, and there is no link between VC-AC plasticity and this behavioral result. Thus, the explanatory power of this experiment is limited in the context set up by the first 5 figures.

      Thank you for highlighting this area for improvement. To enhance the explanatory power of our behavioral experiments, we conducted the following additional studies:

      1) Assessing the Necessity of CCK+ EC-to-AC Projection in Establishing Visuo-Auditory Association:

      We bilaterally injected AAV9-syn-DIO-hM4Di-eYFP or AAV9-syn-DIO-eYFP into the EC and implanted cannulae in the AC of Cck Ires-Cre mice. During the encoding phase, we inactivated the CCK+ EC-to-AC pathway via CNO infusion into the AC. Our results show that this inactivation prevents the behavioral establishment of an association between the visual stimulus (VS) and auditory stimulus (AS), without affecting the fear conditioning memory to the AS (Figure 6B, beige).

      2) Determining the Role of VC-to-AC Projection in Establishing Visuo-Auditory Association: We bilaterally injected AAV9-syn-hM4Di-eYFP or AAV9-syn-eYFP into the visual cortex (VC) and also implanted cannulae in the AC of Cck Ires-Cre mice. We then inactivated the VC-to-AC pathway during the encoding phase with CNO infusion into the AC. This inactivation hindered the behavioral establishment of an association between VS and AS, but did not interfere with the fear conditioning memory to the AS (Figure 6B, red).

      3) Investigating the Importance of CCK+ EC-to-AC Projection in Recalling Recent Visuo-Auditory Association:

      Again, AAV9-syn-DIO-hM4Di-eYFP or AAV9-syn-DIO-eYFP was injected bilaterally into the EC, and cannulae were implanted in the AC of Cck Ires-Cre mice. By inactivating the CCK+ EC-AC pathway during the retrieval phase with CNO infusion into the AC, we found that such inactivation disrupted the recall of the recent association between VS and AS behaviorally, yet did not affect the fear conditioning memory to the AS (Figure 6D, beige).

      4) Assessing the Necessity of VC-to-AC Projection in Recalling Recent Association Memory: For this experiment, AAV9-syn-hM4Di-eYFP or AAV9-syn-DIO-eYFP was injected bilaterally into the VC, and cannulae were placed in the AC of Cck Ires-Cre mice. Inactivating the VC-AC pathway during the retrieval phase with CNO infusion into the AC disrupted the behavioral recall of the recent association between VS and AS, but did not disrupt the fear conditioning memory to the AS (Figure 6D, red).

      These additional experiments significantly contribute to our understanding of the roles played by the CCK+ EC-AC and VC-AC projections in both the establishment and recall of visuo-auditory associative memories.

      5) On page 16, lines 322-326, the authors concluded that to induce the plasticity of VC→AC projection, Delay 1 should be longer than 10 ms and Delay 2 should be longer than 0 ms. This conclusion was not fully supported by the data from Figure 5B-D, because there is no data point between -65 ms and 10 ms for Delay 1 (for example 0 ms), and no negative values for Delay 2.

      We have rewritten this paragraph and hope it is now more accurate:

      “Taken together, our study indicates that significant potentiation of the VC-to-AC inputs can be observed (Figure 5D, black cube) across five pairing trials with a 10-second inter-trial interval, under certain tested conditions: (i) the frequency of repetitive laser stimulation of the CCK+ entorhinal cortex (EC) to AC projection was maintained at 10 Hz or higher (as we did not test frequencies between 1 and 10 Hz), (ii) Delay 1 was set within the tested range of 10 to 535 ms (noting the absence of data between -65 and 10 ms), and (iii) Delay 2 was within the range of 0 to 200 ms (acknowledging that negative values for Delay 2 were not explored).”

      Reviewer #2 (Public Review):

      The manuscript by Sun et al., investigates the synaptic plasticity underlying visuo-auditory association. Through a series of in vivo and ex vivo electrophysiology recordings, the authors show that high-frequency stimulation (HFLS) of the cholecystokinin (CCK) positive neurons in the entorhino-auditory projection paired with an auditory stimulus can evoke long-term potentiation (LTP) of the visuo-auditory projection. However, LTP of the visuo-auditory projection could not be elicited by HFLS of the visuo-auditory projection itself or by an unpaired stimulus. They further demonstrate that auditory stimulus pairing with CCK is required to elicit LTP of the visuo-auditory projection as well as visuo-auditory association in a fear conditioning behavioral experiment. As they found elevated expression of CCK in entorhinal neurons which project to the auditory cortex, they conclude that HFLS of the entorhino-auditory projection causes CCK release.

      Strengths:

      The authors use an elegant approach with Chrimson and Chronos to stimulate different auditory inputs in the same mouse in vivo and also in slice and demonstrate that potentiation of the visuo-auditory projection is dependent on HFLS of the entorhino-auditory projection paired with auditory stimulus. Furthermore, they test several parameters in a systematic fashion, generating a comprehensive analysis of the plasticity changes that regulate visuo-auditory association.

      Weaknesses:

      In their previous publications (Chen et al., 2019; Li et al., 2014; Zhang et al., 2020), it has been established that HFLS of the entorhino-auditory projection and CCK release are important for visuo-auditory association via electrophysiology and behavioral experiments. The Chrimson and Chronos approach was applied by Zhang et al., 2020, where they already found that the visuo-auditory projection was potentiated through HFLS of entorhino-neocortical fibers. This manuscript extends those findings by testing different parameters of pairing, which may not represent a major conceptual advance. Unlike the electrophysiological recordings, drug infusion is used in behavioral manipulations to show that HFLS of the entorhino-auditory projection is important for visuo-auditory association. While the use of drugs to inhibit CCK receptors is important, it does not directly demonstrate that CCK release from the entorhino-auditory projection is necessary.

      We deeply appreciate the reviewer's constructive and insightful feedback. Building on our previous work (Zhang et al., 2020), which highlighted the potentiation of the VC-to-AC projection through high-frequency laser stimulation (HFS laser) of entorhino-neocortical fibers, our current study probes further into the intricacies of this process. We have thoroughly explored the specific conditions necessary for the potentiation of the VC-to-AC projection, assessing a wide range of parameters.

      A significant advancement in our current research is the elucidation of why HFS of the VC-to-AC pathway alone fails to induce potentiation, whereas HFS of the EC-to-AC pathway, coupled with Pre/Post Pairing, is effective. This critical distinction is linked to the heightened expression of CCK in EC neurons projecting to the AC, in contrast to those from the VC. In this revised version of our study, we have also demonstrated that HFS laser stimulation of the EC-to-AC CCK+ projection induces the release of endogenous CCK in the AC using a combination of a CCK sensor and fiber photometry.

      Behaviorally, our revised research emphasizes the vital role of the CCK+ EC-AC projection in both establishing and retrieving visuo-auditory memories, thereby highlighting its fundamental importance in memory processing. Moreover, beyond confirming that the CCK+ EC-AC projection is crucial for memory formation and retrieval, our results indicate that the VC-to-AC projection is the anatomical basis for establishing visuo-auditory associations and serves as the principal storage site for visuo-auditory associative memory. These findings represent significant strides in our understanding of synaptic plasticity and memory mechanisms.

      To establish, in a behavioral context, that HFS laser stimulation of the EC-to-AC CCK+ projection is important for visuo-auditory association, we conducted the following additional behavioral studies (for details please see the response to comment 4 of reviewer 1):

      1) Assessing the Necessity of CCK+ EC-to-AC Projection in Establishing Visuo-Auditory Associative memories, by inactivating the pathway with inhibitory DREADD during the encoding phase.

      2) Investigating the Importance of CCK+ EC-to-AC Projection in Recalling Visuo-Auditory Association, by inactivating the pathway with inhibitory DREADD during the retrieval phase.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper combines an array of techniques to study the role of cholecystokinin (CCK) in motor learning. Motor learning in a pellet reaching task is shown to depend on CCK, as both global and locally targeted CCK manipulations eliminate learning. This learning deficit is linked to reduced plasticity in the motor cortex, evidenced by both slice recordings and two-photon calcium imaging. Furthermore, CCK receptor agonists are shown to rescue motor cortex plasticity and learning in knockout mice. While the behavioral results are clear, the specific effects on learning are not directly tested, nor is the specificity of the pathway between rhinal CCK neurons and the motor cortex. In general, the results present interesting clues about the role of CCK in motor learning, though the specificity of the claims is not fully supported.

      Since all CCK manipulations were performed throughout learning, rather than after learning, it is not clear whether it is learning that is affected or if there is a more general motor deficit. Related to this point, Figure 1D appears to show a general reduction in reach distance in CCK-/- mice. A general motor deficit may be expected to produce decreased success on training day 1, which does not appear to be the case in Figure 1C and Figure 2B, but may be present to some degree in Figure 5B. Or, since the task is so difficult on day 1, a general motor deficit may not be observable. It is therefore inconclusive whether the behavioral effect is learning-specific.

      Thanks for your comments and suggestions.

      We have tested the basic movement ability of CCK-/- and WT mice and found no significant difference between CCK-/- and WT mice in stride length, stride time, step cycle ratio, or grasp force (Figure S1C, S1D, S1E, S1F). In addition, we tested the performance of mice injected with a CCKBR antagonist, or injected with hM4Di together with clozapine, after they had learned the task (Figure S2D, S8D). Performance before and after antagonist injection or chemogenetic manipulation was comparable. These results suggest that none of the CCK manipulations caused a general deficit in the movement ability of the mice.

      The paper implicates motor cortex-projecting CCK neurons in the rhinal cortex as being a key component in motor learning. However, the relative importance of this pathway in motor learning is not pinned down. The necessity of CCK in the motor cortex is tested by injecting CCK receptor antagonists into the contralateral motor cortex (Figure 2), though a control brain region is not tested (e.g. the ipsilateral motor cortex), so the specificity of the motor cortex is not demonstrated.

      Thanks for your comments and suggestions.

      In this study, we focus on the role played by CCK from the rhinal cortex to the motor cortex, and on how CCK affects motor learning. The single pellet reaching task was selected to study the role of CCK from the rhinal cortex to the motor cortex in motor skill learning, because the motor cortex is considered the main area that generates motor memory during training in this task (Komiyama et al., 2010; Peters et al., 2014; Richard et al., 2019). Our emphasis on the importance of the motor cortex in motor learning does not mean that other brain areas that also receive CCK-positive projections from the rhinal cortex, for example the hippocampus (spatial memory), are unimportant for performance in this task. In fact, specifically inhibiting the projection from the rhinal cortex to the contralateral motor cortex is not enough to suppress the motor learning ability of mice, but inhibiting the projections to both sides (contralateral and ipsilateral) does suppress it, suggesting that the whole motor cortex is critical for motor skill learning (Figure 6, S8). In this paper, we studied the relationship between the rhinal cortex and the motor cortex and the role played by CCK in this circuit. The specificity of the motor cortex is task-dependent and was not the main focus of this study.

      The learning-related source of CCK in the motor cortex is also unclear, since even though it is demonstrated that CCK neurons in the rhinal cortex project to the motor cortex in Figure 4D, Figure 4C shows that there is also a high concentration of CCK neurons locally within the motor cortex. Likewise, the importance of the projection from the rhinal cortex to the motor cortex is not specifically tested, as rhinal CCK neurons targeted for inactivation in Figure 5 include all CCK cells rather than motor cortex-projecting cells specifically.

      Thanks for your comments and suggestions.

      In the revised version of the manuscript, we studied the specificity of the CCK projection from the rhinal cortex to the motor cortex for motor skill learning using chemogenetic methods. We first determined that over 98% of the rhinal cortex neurons that project to the motor cortex are CCK positive (Figure 6A, S6A, S6B). Next, we injected the retro-Cre virus into the motor cortex and the Cre-dependent hM4Di into the rhinal cortex of C57BL/6 mice to specifically inhibit the CCK neurons projecting from the rhinal cortex to the motor cortex. Compared with two control groups, the learning ability of the experimental group was significantly suppressed, suggesting that CCK projections from the rhinal cortex to the motor cortex are critical for motor skill learning (Figure 6). A detailed description has been added to the Results section of the manuscript.

      Two-photon imaging experiments suggest that CCK plays a role in producing reliable activity in the motor cortex through learning. This is useful in demonstrating what looks like normal motor cortex activity in the presence of CCK receptor antagonist, indicating that the manipulations in Figure 2 are not merely shutting off the motor cortex. It is also notable that, as the paper points out, the activity appears less variable in the CCK manipulations (Figure 3G). However, this could be due to CCK manipulation mice having less-variable movements throughout training. The Hausdorff distance is used for quantification against this point in Figure 1E, though the use of the single largest distance between trajectories seems unlikely to give a robust measure of trajectory similarity, which is reinforced by the CCK-/- traces looking much less variable than WT traces in Figure 1D. The activity effects may therefore be expected from a general motor deficit if that deficit prevented the mice from normal exploratory movements and restricted the movement (and activity) to a consistently unsuccessful pattern.

      Thanks for your comments and suggestions.

      To fully suppress CCK receptors in the motor cortex, some diffusion of the antagonist into adjacent brain areas is unavoidable, because the motor cortex is irregularly shaped. However, the motor cortex should be the most strongly inhibited area. We applied chemogenetic methods to further determine the specificity of the motor cortex in motor skill learning: bilateral inhibition of the specific projection from the RC to the MC suppressed motor learning ability.

      In a wild-type mouse, neurons are activated when the mouse tries to get the food pellet. The neuronal pattern corresponding to each trial is remembered, and the patterns corresponding to successful movements tend to be repeated. Manipulations of CCK prevented neurons from remembering the patterns they had tried, so they repeated previously tried patterns regardless of whether those patterns were successful. This corresponds to the neuronal activation patterns shown in Figure 3D, 3E and 3G: population activities are comparable, while the trial-to-trial population correlation is slightly higher in the CCK-manipulation groups on Day 1. Behaviorally, manipulations of CCK decreased the likelihood of exploring the best path to the food pellet, so the mouse simply repeated its reach as if it were the first attempt. In addition, multiple tests, including the movement ability of CCK-/- mice and the performance of the antagonist injection and chemogenetic manipulation groups after learning, indicated that CCK manipulation did not affect basic movement ability.

      The Hausdorff distance is the greatest of all the distances from a point in one trajectory to the closest point in the other trajectory, evaluated in both directions. It is therefore not simply the largest distance between two trajectories; every point of each trajectory enters the calculation. The Hausdorff distance is widely used to assess the variation between two trajectories. We did not use shape similarity in our analysis because it is not an effective measure of a mouse's performance: because the starting site and the food site are fixed, all trajectories are single lines running in the same direction, so trajectory shapes are very similar across trials. Two trajectories with similar shapes that lie far from each other (a large Hausdorff distance) should be treated as highly variable because their outcomes can differ completely (success vs. miss). Therefore, the Hausdorff distance is a more reliable measure of mouse performance.
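      To illustrate the definition above, here is a minimal Python sketch of the symmetric Hausdorff distance between two sampled 2-D trajectories: h(A, B) = max(d(A, B), d(B, A)), where d(A, B) is the largest distance from any point of A to its nearest point in B. The trajectories are hypothetical toy data, not data from the study, and the function names are illustrative; this is not the authors' analysis code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical reach trajectories, each sampled at five points along the reach.
reach_1 = [(0.0, float(y)) for y in range(5)]
reach_2 = [(0.0, float(y)) for y in range(5)]  # identical path
reach_3 = [(1.5, float(y)) for y in range(5)]  # same shape, shifted 1.5 units sideways

print(hausdorff(reach_1, reach_2))  # 0.0
print(hausdorff(reach_1, reach_3))  # 1.5
```

      Two paths with identical shape but a lateral offset of 1.5 units receive a Hausdorff distance of 1.5, which is the property the argument above relies on. For real data, SciPy provides `scipy.spatial.distance.directed_hausdorff`, which returns the directed distance together with the indices of the contributing points; the symmetric distance is the maximum of the two directed calls.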

      Finally, slice experiments are used to demonstrate the lack of LTP in the motor cortex following CCK knockout, which is rescued by CCK receptor agonists. This is a nice experiment with a clear result, though it is unclear why there are such striking short-term depression effects from high-frequency stimulation observed in Figure 6A that are not observed in Figure 1H. Also, relating to the specificity of the proposed rhinal-motor pathway, these experiments do not demonstrate the source of CCK in the motor cortex, which may for example originate locally.

      Thanks for your comments.

      1. Because CCK4 is a small molecule that degrades very rapidly (half-life less than 1 min in rat serum and 13 min in human serum), we added the drug directly to the recording dish while ACSF perfusion was stopped, which led to relatively low-oxygen conditions. As shown in Figure 6A, it took about 15 min for the brain slices to recover. The depression was stronger in the vehicle group than in the CCK4 group, possibly because CCK4-induced LTP after HFS compensated for the depression.

      2. In the motor cortex, many CCK-positive neurons are γ-aminobutyric acid-ergic (GABAergic) neurons, in which the role played by CCK is not very clear (Whissell et al., 2015). However, evidence shows that GABA may inhibit the release of CCK in the neocortex (Yaksh et al., 1987). Many glutamatergic neurons in the neocortex also express CCK (Watakabe et al., 2012). In this study, the stimulation electrode was placed in layer 1, which receives most of the CCK projections from the rhinal cortex, to evoke CCK release from rhinal cortex terminals; however, we cannot rule out the possibility that some CCK was released from local CCK neurons (Figure 4B). This experiment focused on the importance of CCK for neural plasticity in the motor cortex and was not designed to determine the role of cortical CCK-positive neurons, whether inhibitory or excitatory, in neuronal plasticity and motor skill learning.

      Therefore, the specificity of the projections from the rhinal cortex to the motor cortex was further studied by chemogenetic manipulation. Inhibiting the activity of these projections suppressed learning ability compared with two types of control manipulations, indicating that the CCK projections from the RC to the MC are critical for motor skill learning.

      Reviewer #2 (Public Review):

      This study aims to test whether, and if so how, cholecystokinin (CCK) from the mouse rhinal cortex influences neural activity in the motor cortex and motor learning behavior. While CCK has been previously shown to be involved in neural plasticity in other brain regions/behavioral contexts, this work is the first to demonstrate its relationship with motor cortical plasticity in the context of motor learning. The anatomical projection from the rhinal cortex to the motor cortex is also a novel and important finding and opens up new opportunities for studying the interactions between the limbic and motor systems. I think the results are convincing in supporting the claim that CCK, and in particular CCK-expressing neurons in the rhinal cortex, are critical for learning certain dexterous movements such as single pellet reaching. However, more work needs to be done, or at least the following concerns should be addressed, to support the hypothesis that it is specifically the projection from the rhinal cortex to the motor cortex that controls motor learning ability in mice.

      1) Because CCK is expressed in multiple brain regions, as the authors recognized, results from the CCK knock-out mice could be due to a global loss of neural plasticity. In comparison, the antagonist experiment is in my opinion the most convincing result to support the specific effect of CCK in the motor cortex. However, it is unclear to me whether the CCK knock-out mice exhibited an impaired ability to learn in general, i.e., not confined to motor skills. For instance, it would be very valuable to show whether these mice also had severe memory deficits; this would help the field to understand different or similar behavioral effects of CCK in the case of global vs. local loss of function. If the CCK knock-out mice only exhibited motor learning deficits, that would be surprising but also very interesting given previous studies on its effect in other brain areas.

      Thanks for your comments. Our previous studies have found that CCK is critical for neural plasticity in the auditory cortex, hippocampus and amygdala, and that CCK-/- mice performed much worse than wild-type mice in associative, spatial and fear memory tasks (Li et al., 2014; Chen et al., 2019; Su et al., 2019; Feng et al., 2021).

      2) Related to my last point, I believe that normal neural plasticity should be essential to motor skill learning throughout development not just during the current task. Thus, it would be important to show whether these CCK knock-out mice present any motor deficits that could have resulted from a lack of CCK-mediated neural plasticity during development. If not, the authors should explain how this normal motor learning during development is consistent with their major hypothesis in this study (e.g., is CCK not critical for motor learning during early development).

      Thanks for your comments and suggestions.

      Development is mainly genetically guided and prepares the physical structure for learning, whereas learning depends on neural plasticity and a period of experience (such as the motor training in this research). In addition, development is considered "experience-expectant", relying on common environmental information, whereas learning is "experience-dependent", sensitive to specific individual experiences (Greenough et al., 1987; Galván, 2010). Moreover, development generally takes longer to form a specific ability in a species. The role CCK plays in development is not clear. Duchemin et al. (1987) studied CCK gene expression in the rat brain pre- and postnatally. They found that CCK mRNA was detectable on embryonic day 14 (E14) and gradually increased to its maximum level on postnatal day 14 (P14), indicating that CCK might participate in the development of rats. Paolo et al. (2007) mapped the expression of CCK in the mouse brain. Plentiful CCK expression was observed at E12.5 in the thalamus and spinal cord, and by E17.5 CCK expression extended to the cortex, hippocampus and hypothalamus, suggesting that CCK might also regulate development in mice. Paolo et al. (2004) found that CCK suppressed the migration of GnRH-1 neurons through the CCK-A receptor in the brain. In addition, early postnatal learning may participate in development: administration of a CCK-B receptor antagonist (6 hours postnatal) suppressed the development of mother preference in infant sheep, indicating that CCK might be important for this form of early learning. However, the role of CCK in the development of the motor system is unknown.

      In this study, the performance of CCK-/- and WT mice was at the same level on Day 1, with no significant difference in the percentages of "miss", "no-grasp", "drop" and "success". In addition, movement abilities, including stride length, stride time, step cycle ratio and grasp force, were comparable between CCK-/- and WT mice (Figure S1C, S1D, S1E, S1F), suggesting that knockout of the cck gene did not affect basic movement ability. This could be because the development of basic movement ability is determined by physical structure rather than guided by learning. However, all these tests were at the physical level; how CCK affects the motor system at the molecular and cellular levels is not known. Therefore, we further applied a CCK-BR antagonist and chemogenetic methods to study the role of CCK in motor learning.

      3) Lines 198-200 and Fig. 2C: The authors found that the vehicle group showed significantly increased "no grasp" behavior, and reasoned that the implantation of a cannula may have caused injuries to the motor cortex. In order to support their reasoning and make the control results more convincing, I think it would be helpful to show histology from both the antagonist and control groups and demonstrate motor cortical injury in some mice of the vehicle group but not the antagonist group. Otherwise, I'm a bit concerned that the methods used here could be a significant confounding factor contributing to motor deficits.

      Thanks for your comments and suggestions.

      Injury to the motor cortex cannot be avoided, because the cannula was inserted below the cortical surface (Figure S2C). The significant increase in the "no-grasp" rate in the Vehicle group reflects an improvement in its miss rate: trials that had been misses became "no-grasp" trials but did not progress further to drops or successes. In the Antagonist group, by contrast, there was no significant shift from "miss" to "no-grasp", so the "no-grasp" rate did not change.

      4) The authors showed that chemogenetic inhibition of CCK neurons in the rhinal cortex impaired motor skill learning in the pellet-reaching task. However, we know that the rhinal cortex projects to multiple brain regions besides the motor cortex (e.g., other cortical areas and the hippocampus). Thus, the conclusion/claim that the observed behavioral deficits resulted from inhibited rhinal-motor cortical projections is not strongly supported without more targeted loss-of-function or rescue experiments.

      It would also be very informative to the field to compare the specific behavioral deficits, if any, of inhibiting specific downstream targets of the rhinal CCK neurons. As a concrete example, the hippocampus may be involved in learning more sophisticated motor skills (as the authors pointed out in the Discussion) besides the motor cortex. It would be a critical result if the authors could either show or exclude the possibility that the motor learning deficits observed in CCK-/- mice were at least partially due to the inhibition of hippocampal plasticity. This echoes my earlier point (point 1) that it is unclear whether the effect of lacking CCK in knock-out mice is specific in the motor cortex or engages multiple brain regions.

      Lastly, because Fig. 4 only showed histology in the rhinal and motor cortices, I am not sure whether the motor cortex solely receives CCK input from the rhinal cortex. A more comprehensive viral tracing result could be important to both supporting the circuit-specificity of the observed behavior in this study and providing a clearer picture of where the motor cortex receives CCK inputs.

      Thanks for your comments.

      In the revised version of the paper, we studied the specificity of the CCK projection from the rhinal cortex to the motor cortex for motor skill learning using chemogenetic methods. We first determined that over 98% of the rhinal cortex neurons that project to the motor cortex are CCK positive (Figure 6A, S6A, S6B). Next, we injected the retro-Cre virus into the motor cortex and the Cre-dependent hM4Di into the rhinal cortex of C57BL/6 mice to specifically inhibit the CCK neurons projecting from the rhinal cortex to the motor cortex. Compared with two control groups, the learning ability of the experimental group was significantly suppressed, suggesting that CCK projections from the rhinal cortex to the motor cortex are critical for motor skill learning (Figure 6). A detailed description has been added to the Results section of the manuscript.

      In this study, we focus on the role played by CCK from the rhinal cortex, and on how CCK affects motor learning. The single pellet reaching task was selected to study the role of CCK from the rhinal cortex in motor skill learning, because the motor cortex is considered the main area that generates motor memory during training in this task (Komiyama et al., 2010; Peters et al., 2014; Richard et al., 2019). Our emphasis on the importance of the contralateral motor cortex in motor learning does not mean that other brain areas that also receive CCK-positive projections from the rhinal cortex, for example the hippocampus (spatial memory), are unimportant for performance in this task. In fact, specifically inhibiting the projection from the rhinal cortex to the contralateral motor cortex is not enough to suppress motor learning ability, but inhibiting the projections to both sides (contralateral and ipsilateral) does suppress the learning ability of mice, suggesting that the whole motor cortex is critical for motor skill learning (Figure 6, S8). In our lab, we found that the CCK projection from the entorhinal cortex to the hippocampus is critical for spatial memory formation (Su et al., 2019). Hippocampal impairment affects, to some extent, performance in the single pellet reaching task (Shwuhuey et al., 2007). Therefore, manipulation of CCK projections from the rhinal cortex to the hippocampus may also affect performance in the single pellet reaching task. In this paper, we aim to study the relationship between the rhinal cortex and the motor cortex and the role played by CCK in this circuit. Other brain areas involved in the single pellet reaching task are not the core concern of this study.

      The motor cortex also receives CCK projections from other regions, such as the contralateral motor cortex, the deep layers of the visual and auditory cortices, and the thalamus (Figure S4).

      5) I am glad to see the CCK4 rescue experiment to demonstrate the sufficiency of CCK in promoting motor learning. However, the rescue experiment lacked specificity: IP injection did not allow specific "gain of function" in the motor cortex but instead, the improved learning ability in CCK knock-out mice could be a result of a global effect of CCK4 across multiple brain regions. CCK4 injection specifically targeted at the motor cortex would be necessary to support the sufficiency of CCK-regulated neuroplasticity in the motor cortex to promote motor learning.

      Thanks for your comments.

      First, we studied the specificity of the circuit by injecting a Cre virus into the MC and a Cre-dependent hM4Di virus into the RC. After clozapine injection, motor learning ability was significantly suppressed compared with both the saline control and the control virus combined with clozapine.

      In addition, our emphasis on the importance of the motor cortex in motor learning does not mean that other brain areas that also receive CCK-positive projections from the rhinal cortex, for example the hippocampus (spatial memory), are unimportant for performance of this task. Infusing the drug specifically into the motor cortex is unlikely to rescue the motor learning ability of CCK-/- mice, because the motor cortex is very large (spanning AP -1.3 to 2.46 mm and ML ±0.5 to ±2.75 mm) and other areas receiving CCK projections from the rhinal cortex could also be important for motor learning. Indeed, we tried infusing CCK into the motor cortex through a drug cannula, but this was not sufficient to compensate for the whole-brain knockout of the cck gene or to rescue motor learning ability (Figure S11D, S11E). Moreover, cannula implantation unavoidably injures the motor cortex, because the cannula must be inserted into the brain for drug infusion. This injury may itself affect task performance, as the motor cortex is critical for motor learning. Therefore, cannula infusion is not the best method for rescuing motor skill learning.

      In contrast, i.p.-injected CCK4 can reach the whole brain because CCK4 is able to cross the blood-brain barrier; this compensates for the knockout of the cck gene throughout the brain and rescues motor learning ability. Furthermore, i.p. injection is widely used in drug discovery because it is convenient, simple to perform, and causes no direct injury to the brain. Thus, we applied i.p. injection both for whole-brain CCK compensation and with a view to further studies of its application in drug discovery.

      Reviewer #3 (Public Review):

      The authors elucidated the roles of cholecystokinin (CCK)-expressing excitatory neurons, which project from the rhinal cortex to the motor cortex, in motor skill learning. The authors found CCK knock-out mice exhibited learning defects in the pellet reaching task while the baseline success rate of the knock-out mice was similar to that of the wild-type mice. Application of a CCK B receptor (CCKBR) antagonist into the motor cortex lowered the success rate in the motor task. The authors found the population activity which was observed in the in vivo calcium imaging during motor learning was elevated after motor learning, but this increase disappeared in CCK knock-out mice and animals with CCKBR antagonist administration. Anterograde and retrograde viral tracing revealed that CCK-expressing excitatory neurons in the rhinal cortex projected to the motor cortex. Chemogenetic inhibition of the CCK-expressing neurons in the rhinal cortex lowered the ability for motor learning. The application of a CCKBR agonist increased the motor learning ability of CCK knock-out animals as well as long-term potentiation (LTP) observed in the slice of the motor cortex.

      However, the manuscript contains several shortcomings:

      First, the "Discussion" has several statements that are only supported weakly by the results, for example, ll. 429-431, ll. 432-433, and ll. 447-448. In addition, most of the sentences in this section are not divided into subsections. The paragraphs should be composed in multiple subsections with appropriate subheadings, even though the initial section summarizing the results can lack a subheading.

      Thanks for your suggestions. The statements were revised and the discussion was divided into subsections.

      Second, it would be important that the authors showed which area(s) of the brain is affected by the CCKBR antagonist in the experiments described in ll. 166-206 and Fig. 2. The authors injected the drug into the motor cortex, but the chemical can spread to neighboring cortical areas (e.g. somatosensory cortex) or wider brain regions. If so, the blockade of the CCKBR in the brain areas other than the motor cortex could cause the defects of the motor task learning observed in these experiments. I think it is desirable that such a possibility should be excluded. Conversely, it is possible that the antagonist had an effect on a limited subarea of the motor cortex (e.g. only the primary motor cortex (M1)). In this case, the information about the field altered by the CCKBR blocker would be useful to interpret the results of the learning defects.

      Thanks for your comments and suggestions.

      The drug cannula was implanted in the motor cortex (coordinates: AP 1.4 mm, ML -/+1.6 mm, DV 0.25-0.3 mm) contralateral to the dominant forelimb of the mice (Figure S2C). To fully inhibit CCKBR in the motor cortex, we injected an excess dose of the antagonist. Thus, we cannot completely exclude the possibility that some antagonist spread to neighboring cortices. However, the motor cortex is very large, spanning AP -1.3 to 2.46 mm and ML ±0.5 to ±2.75 mm, so it is unlikely that the antagonist spread beyond the motor cortex at high concentration.

      Third, the authors need to show bilateral data about their anterograde and retrograde tracking of CCK-expressing neurons in the rhinal cortex. In ll. 290-292, they described as follows: "Both anterograde and retrograde tracking results indicated that CCK-expressing neurons in the rhinal cortex projecting to the motor cortex were asymmetric, showing a preference for the ipsilateral hemisphere." However, they provided only unilateral data for the anterograde (Fig. 4B) and the retrograde (Fig. 4D) experiments.

      Thanks for your comments. Anterograde and retrograde tracing data from both hemispheres have been added to the supplementary file (Figure S4).

      Fourth, unilateral (contralateral to the dominant forelimb) experiments are needed in the chemogenetic inhibition of the CCK neurons. In ll. 301-338 and Fig. 5, the authors inhibited the CCK-expressing neurons in both hemispheres by injecting the virus into both sides. However, the CCKBR antagonist injection into the motor cortex contralateral to the dominant forelimb caused defects in motor learning ability, as described in ll. 166-206. The authors also observed that the population neuronal activity in the motor cortex contralateral to the dominant forelimb changed in accordance with the improvement of the motor skill in ll. 208-269. Therefore, it may be the case that inhibition of CCK neurons only on the side contralateral to the dominant forelimb (not bilaterally, as the authors did) could lower motor learning ability. Such unilateral inhibition can be carried out by unilateral injection of the virus. In relation to the point above, in the chemogenetic inhibition experiments, it would be important to show which neurons in which cortical area are inhibited. This could be done by examining the distributions of the mCherry-labeled somata in the rhinal cortex using histochemistry.

      Thanks for your comments and suggestions.

      In the revised version of the paper, we studied the specificity of the CCK projection from the rhinal cortex to the motor cortex for motor skill learning using chemogenetic methods. We first determined, by retrograde virus injection and immunostaining, that over 98% of rhinal cortex neurons projecting to the motor cortex are CCK-positive (Figure 6A, S6A, S6B). Next, we injected a retro-Cre virus into the motor cortex and a Cre-dependent hM4Di virus into the rhinal cortex of C57BL/6 mice to specifically inhibit the CCK neurons projecting from the rhinal cortex to the motor cortex. Compared with two control groups, the learning ability of the experimental group was significantly suppressed, suggesting that CCK projections from the rhinal cortex to the motor cortex are critical for motor skill learning (Figure 6). Furthermore, we injected the retro-Cre virus only into the motor cortex contralateral to the dominant forelimb, together with the Cre-dependent hM4Di virus in the rhinal cortex. After clozapine injection, motor learning ability was not significantly suppressed, suggesting that the motor cortex in both hemispheres is important for motor skill learning. This is consistent with previous findings that GluA1 expression increases bilaterally in the motor cortex after training in the single pellet reaching task. A detailed description has been added to the Results section of the manuscript.

      Fifth, it would be valuable to further examine differences in task performance across sessions and groups. The paragraph in ll. 138-153 needs a comparison of the "miss" rates of CCK-/- animals between Day 1 vs. Day 6 (related to ll. 429-431). This paragraph also needs comparisons of the "no-grasp" and "drop" rates of CCK-/- animals between Day 1 vs. Day 6 (related to ll. 432-433). The paragraph in ll. 175-190 needs comparisons of success rates between Day 1 and Day 5/6 within the antagonist group (related to ll. 447-448).

      Thanks for your comments. The comparisons were made in the revised manuscript.


    1. Reviewer #2 (Public Review):

      The authors set out to study the potent HIV capsid inhibitor lenacapavir (LEN) and how it alters capsid stability. They use a previously developed single-molecule fluorescence imaging assay to take two measurements of individual viral particles over time: 1) they track the release of GFP from GFP-loaded particles to determine whether the capsid is intact or open, and 2) they track the disassembly of the capsid lattice by measuring the signal intensity of a capsid binding fluorophore (AF568-CypA), which diminishes as the capsid lattice subunits disassociate.

      As in their previous work, the authors report that most of their capsids are "leaky" and rapidly lose GFP after the viral membrane is permeabilized, followed by disassembly of the capsid lattice. A subset of capsids maintain GFP signal for various periods of time until they spontaneously "open," and a smaller subset remains closed for the entire length of the imaging experiment (typically 30 min). Interestingly, the authors find that LEN has two effects in this assay: it not only promotes a more rapid release of GFP (interpreted to mean loss of capsid integrity), but it also prevents the capsid lattice from disassembling after opening. As expected, the cellular cofactor IP6 (which stabilizes capsids in cells and in vitro) was found to protect against capsid rupture and counteracted the effects of LEN (although high concentrations of LEN could override any protective effects of IP6).

      Their single-molecule experiments are nicely buttressed by in vitro assembly reactions of purified CA protein, with IP6 promoting cone formation and LEN promoting aberrant assembly into tubes. The authors go further to test the kinetics of LEN's effects on HIV infection and reverse transcription, and they perform experiments in comparison to other factors that target the FG binding pocket (BI-2, PF-74, and a peptide from the host factor CPSF6). They find that LEN works differently than these other capsid binders, and stabilizes the lattice structure much more effectively, which the authors suggest is due to how well LEN bridges between CA-CA monomers and rigidifies CA hexamers.

      It's particularly interesting that the results of their kinetic studies indicate that LEN's effects on capsid strain (which may ultimately promote rupture) may not happen immediately, but instead, take time to build as the drug occupies more and more binding sites. The authors estimate that roughly 30% of binding sites need to be occupied by LEN to reach half-maximal inhibition of infection, and based on their binding curves, it may take ~20h to reach this level of occupancy in the presence of sub-nM concentrations of LEN. Although other mechanisms in addition to catastrophic rupture of capsids are likely at play during inhibition of infection (such as inhibition of host factor binding), these kinetics support previous reports that the most potent functions of capsid inhibition occur at or between the steps of nuclear entry and integration.
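      The occupancy-over-time argument above can be illustrated with a simple pseudo-first-order binding model. This is a minimal sketch, not the authors' analysis: it assumes 1:1 binding of drug to independent, identical sites, a constant free drug concentration (no ligand depletion), and empty sites at time zero. The function names and the rate constants (`KON`, `KOFF`) in the example are hypothetical placeholders, not measured values for LEN.

```python
import math


def equilibrium_occupancy(conc_m: float, kon: float, koff: float) -> float:
    """Fraction of binding sites occupied at equilibrium for simple 1:1 binding."""
    kd = koff / kon  # dissociation constant (M)
    return conc_m / (conc_m + kd)


def occupancy_at(t_s: float, conc_m: float, kon: float, koff: float) -> float:
    """Fractional occupancy after t_s seconds, starting from empty sites.

    Assumes the free drug concentration stays constant (no ligand depletion),
    so occupancy relaxes exponentially with rate k_obs = kon * C + koff.
    """
    k_obs = kon * conc_m + koff
    theta_eq = equilibrium_occupancy(conc_m, kon, koff)
    return theta_eq * (1.0 - math.exp(-k_obs * t_s))


def time_to_occupancy(target: float, conc_m: float, kon: float, koff: float) -> float:
    """Seconds needed to reach a target fractional occupancy (e.g. 0.3)."""
    theta_eq = equilibrium_occupancy(conc_m, kon, koff)
    if not 0.0 < target < theta_eq:
        raise ValueError("target occupancy cannot be reached at this concentration")
    k_obs = kon * conc_m + koff
    # Invert theta(t) = theta_eq * (1 - exp(-k_obs * t)) for t.
    return -math.log(1.0 - target / theta_eq) / k_obs


if __name__ == "__main__":
    # Hypothetical rate constants for illustration only (not measured LEN values).
    KON = 1e5    # association rate constant, M^-1 s^-1
    KOFF = 1e-6  # dissociation rate constant, s^-1
    for conc_nm in (0.1, 0.5, 1.0):
        hours = time_to_occupancy(0.3, conc_nm * 1e-9, KON, KOFF) / 3600.0
        print(f"{conc_nm} nM -> {hours:.1f} h to reach 30% occupancy")
```

      Because both the equilibrium occupancy and the relaxation rate k_obs increase with drug concentration, the time needed to reach a given occupancy lengthens as the concentration falls into the sub-nM range; the absolute times depend entirely on the assumed rate constants.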

      It is important to note that although in vitro uncoating assays can help us understand the physical nature of HIV capsid and capsid inhibitor interactions, the assays in this paper might not accurately model the capsid dynamics that are experienced in a cell during infection. The authors report that more than half of their capsids are "leaky" at the start of their assay, but this could be an artifact of the experimental system. Several groups have now demonstrated that capsids remain intact or largely intact for several hours after infection. Thus, while their method is valuable to the research community and can provide insight into capsid stability (and how it can be influenced by capsid binding factors), the authors should be cautious about using pore-forming proteins to permeabilize the virion and interpreting the release of GFP in their single-molecule fluorescence system as an accurate reflection of HIV dynamics in vivo.

      In this regard, it would be helpful to establish whether the pore-forming proteins used in vitro to permeabilize the virus membrane have an impact on capsid integrity. It's possible that the concentration of pore-forming proteins used in this paper (200 nM) actually promotes "leaky" capsids and rapid opening of capsids in vitro, whereas capsids in their native state in the cytoplasm could remain mostly intact until disrupted by host factors and/or small molecules. Determining whether lower concentrations of DLY/SLO (or PFO as used in Marquez et al., 2018) change the ratio of leaky to closed capsids, or delay the time to capsid opening (either in the presence of IP6 or in the presence of LEN) would be informative. It may be possible to optimize the concentration of pore-forming proteins (and other buffer constituents) to achieve permeabilization of the membrane with minimal disruption to capsid integrity, which could approximate conditions within the cell.

      Experiments with capsid mutations that stabilize or destabilize the lattice structure (and exhibit different sensitivities to IP6) could help support the authors' conclusions, as would testing mutations that confer resistance to LEN (e.g., Q67H+N74D, M66I). It would be of great interest to determine whether CA mutations affect either GFP release or the CypA paint signal, and whether resistance mutations mitigate the effects of LEN in single-molecule experiments.

      The discussion section of this paper is expertly written and places the work into the larger context of HIV research. The authors have thoughtfully analyzed their experiments with capsid inhibitors in relation to kinetics, occupancy, the potential for rigidification, and cofactor binding. They offer reasonable explanations for how LEN exhibits opposing effects on the HIV capsid at high occupancy through inducing capsid rupture while simultaneously preventing the dissociation of CA subunits. Many lines of evidence are now converging on the concept that the capsid evolved to be stable enough to protect its contents, yet flexible enough to navigate the steps of reverse transcription, nuclear entry, and uncoating. With this paper, the authors make a strong case that LEN functions as an antiviral, at least in part, through engaging "lethal hyperstabilization" of the capsid, promoting rigid lattice formations that are incompatible with closed cone structures.

    1. Author Response

      Reviewer #1 (Public Review):

      This thorough study expands our understanding of BMP signaling, a conserved developmental pathway involved in processes as diverse as body patterning and neurogenesis. The authors applied multiple state-of-the-art strategies to the anthozoan Nematostella vectensis in order to first identify the direct BMP signaling targets (bound by the activated pSMAD1/5 protein) and then dissect the role of a novel pSMAD1/5 gradient modulator, zswim4-6. The list of target genes features multiple developmental regulators, many of which are bilaterally expressed and which are notably shared between Drosophila and Xenopus. In particular, the analysis identified zswim4-6 as a novel nuclear modulator of the BMP pathway that is also conserved in vertebrates. A combination of loss-of-function (injection of antisense morpholino oligonucleotides, CRISPR/Cas9 knockout, expression of a dominant negative) and gain-of-function assays, together with transcriptome sequencing, showed that zswim4-6 mediates transcriptional repression downstream of BMP signaling. Functional manipulation of zswim5 in zebrafish shows a conserved role in modulating BMP signaling in a vertebrate.

      The particular strength of the study lies in the careful and thorough analysis performed. This is solid developmental work, where one clear biological question is progressively dissected, with the most appropriate tools. The functional results are further validated by alternative approaches. Data is clearly presented and methods are detailed. I have a couple of comments.

      1) I was intrigued, as were the authors, by the fact that the ChIP-Seq did not identify any known BMP ligand bound by pSMAD1/5. Are these genes found in the published ChIP-Seq data of the other species used for the comparative analysis? One hypothesis could be that there is a change in the regulatory interactions and that the initial set-up of the gradient requires a feedback loop, which is then turned off at late gastrula. In this case, immunoprecipitation at early gastrula, prior to the set-up of the pSMAD1/5 gradient, could reveal a different scenario. Alternatively, the regulation could be indirect, for example, through RGM, an additional regulator of BMP signaling expressed on the side of lower BMP activity, which is among the targets of the ChIP-Seq. This aspect could be discussed. Additionally, even if this is perhaps outside the scope of this study, I think it would be informative to further assess the effect of ZSWIM manipulation on RGM (and vice versa).

      Indeed, BMP genes are direct BMP signaling targets in Drosophila (dpp) (Deignan et al., 2016, https://doi.org/10.1371/journal.pgen.1006164) and frog (bmp2, bmp4, bmp5, bmp7) (Stevens et al., 2021, https://doi.org/10.1242/dev.145789). Of all these ligands, only the dorsally expressed Xenopus bmp2 is repressed by BMP signaling, while another dorsally expressed Xenopus BMP gene admp is not among the direct targets. All other BMP genes listed here are expressed in the pMad/pSMAD1/5/8-positive domain and are activated by BMP signaling.

      In Nematostella, we do not find BMP genes among the ChIP-Seq targets, but this is not that surprising considering the dynamics of bmp2/4, bmp5-8 and chordin expression, as well as the location of the pSMAD1/5-positive cells. In late gastrulae/early planulae, Chordin appears to shuttle BMP2/4 and BMP5-8 away from their production source and over to the gdf5-like side of the directive axis (Genikhovich et al., 2015; Leclere and Rentsch, 2014). By 4 dpf, chordin expression stops, and BMP2/4 and BMP5-8 start to be both expressed AND signal in the mesenteries. If bmp2/4 and bmp5-8 expression were directly suppressed by pSMAD1/5 (as is the case for chordin and rgm expression), this mesenterial expression would not be possible. Therefore, in our opinion, it is most likely that at late gastrula and early planula the regulation of bmp2/4 and bmp5-8 expression by BMP signaling is indirect. We do not have an explanation for why gdf5-like (another BMP gene expressed on the “high pSMAD1/5” side) is not retrieved as a direct BMP target in our ChIP data. Since we do not understand well enough how BMP gene expression is regulated, we do not discuss this at length in the manuscript.

      As the Reviewer suggested, we analyzed the effect of ZSWIM4-6 KD on the expression of rgm. As expected for a gene expressed on the “low BMP” side, its expression was strongly expanded (Figure 6 - Figure Supplement 4).

      2) I do not fully understand the rationale behind the choice of performing the comparative assays in zebrafish: as the conservation was initially identified in Xenopus, I would have expected the experiment to be performed in frog. Furthermore, reading the phylogeny (Figure 4A), it is not obvious to me why ZSWIM5 was chosen for the assay (over the other paralog ZSWIM6). Could the Authors comment on this experiment further?

      The comparison was done in zebrafish because we were planning to generate zswim5 mutants, whose analysis is currently in progress. Based on available zebrafish expression data (White et al., 2017), ZSWIM6 is not expressed at the developmental stages we were interested in, whereas ZSWIM5 is.

      Reviewer #2 (Public Review):

      The authors provide a nice resource of putative direct BMP target genes in Nematostella vectensis by performing ChIP-seq with an anti-pSmad1/5 antibody, while also performing bulk RNA-seq with BMP2/4 or GDF5 knockdown embryos. Genes that exhibit pSmad1/5 binding and have changes in transcription levels after BMP signaling loss were further annotated to identify those with conserved BMP response elements (BREs). Further characterization of one of the direct BMP target genes (zswim4-6) was performed by examining how expression changed following BMP receptor or ligand loss of function, as well as how loss or gain of function of zswim4-6 affected development and BMP signaling. The authors concluded that zswim4-6 modulates BMP signaling activity and likely acts as a pSMAD1/5 dependent co-repressor. However, the mechanism by which zswim4-6 affects the BMP gradient or interacts with pSMAD1/5 to repress target genes is not clear. The authors test the activity of a zswim4-6 homologue in zebrafish (zswim5) by over-expressing mRNA and find that pSMAD1/5/9 labeling is reduced and that embryos have a phenotype suggesting loss of BMP signaling, and conclude that zswim4-6 is a conserved regulator of BMP signaling. This conclusion needs further support to confirm BMP loss of function phenotypes in zswim5 over-expression embryos.

      Major comments

      1) The BMP direct target comparison was performed between Nematostella, Drosophila, and Xenopus, but not with existing data from zebrafish (Greenfeld 2021, Plos Biol). Given the functional analysis with zebrafish later in the paper, it would be nice to see whether there are conserved direct target genes in zebrafish and, in particular, whether zswim5 (or other zswim genes) are direct targets. Since conservation of zswim4-6 as a direct BMP target between Nematostella and Xenopus seemed to be part of the rationale for further functional analysis, it would also be nice to know if this is a conserved target in zebrafish.

      Thank you for the suggestion. In the paper by Greenfeld et al., 2021, zebrafish zswim5 was downregulated approximately 2.4-fold in the bmp7 mutant at 6 hpf, while zswim6 was barely expressed and not affected at this stage. We added this information to the text of the manuscript. Expression of several other zebrafish zswim genes was also affected in the bmp7 mutant, but these genes do not appear relevant for our study since their corresponding orthologs were not identified as pSMAD1/5 ChIP-Seq targets in Nematostella. Notably, zebrafish zswim5 is not clearly differentially expressed under BMP or Chd overexpression conditions (see Supplementary file 1 in Rogers et al. 2020). Importantly, in the paper we wanted to compare ChIP-Seq data only with other ChIP-Seq data; unfortunately, no pSMAD1/5/8 ChIP-Seq data are currently available for zebrafish, which precludes such a comparison.

      Related to this, in the discussion it is mentioned that zswim4/6 is also a direct BMP target in mouse hair follicle cells, but it wasn't obvious from looking at the supplemental data in that paper where this was drawn from.

      Please see Supplementary Table 1, second Excel sheet labeled “Mx ChIP_Seq” in Genander et al., 2014, https://doi.org/10.1016/j.stem.2014.09.009. Zswim4 has a single pSMAD1 peak associated with it, Zswim6 has two.

      2) The loss of zswim4-6 function via MO injection results in changes to pSmad1/5 staining, including a reduction in intensity in the endoderm and a gain in intensity in the ectoderm, while overexpression results in a loss of intensity in the ectoderm and no apparent change in the endoderm. While this is interesting, it is not clear how zswim4-6 functions to modify BMP signaling, or how this might explain the differential effects in ectoderm vs. endoderm. Is the assumption that the mechanism involves repression of chordin? If so, one could test a double knockdown of zswim4-6 and chordin and look for rescue of pSmad1/5 levels or of the morphological phenotype.

      We do not think that ZSWIM4-6 acts via repression of Chordin. Since loss of chordin leads to the loss of pSMAD1/5 in Nematostella (Genikhovich et al., 2015), the proposed experiment unfortunately cannot test this hypothesis. Currently, we see two distinct effects of modulating zswim4-6 expression. First, it affects the pSMAD1/5 gradient, possibly by destabilizing nuclear SMAD1/5, as has been proposed for vertebrate Zswim4 by Wang et al., 2022. This is in line with our results shown in Fig. 6C-F’ and Fig. 6-Figure supplement 3. In our opinion, the response of the genes expressed on the “high BMP” side of the directive axis to overexpression or KD of ZSWIM4-6 (Fig. 6I-K’, 6N-P’) can be explained by these changes in pSMAD1/5 signaling intensity. Second, zswim4-6 appears to promote pSMAD1/5-mediated gene repression. This is in line with the response of the genes expressed on the “low BMP” side of the directive axis (Fig. 6G-H’, 6L-M’, Fig. 6-Figure Supplement 4). These genes are repressed by BMP signaling, yet their expression expands upon zswim4-6 KD despite the increased pSMAD1/5. Our ChIP experiment (Fig. 6Q) supports this view.

      3) Several experiments are done to determine how zswim4-6 expression responds to the loss of function of different BMP ligands and receptors, with the conclusion being that zswim4-6 is a BMP2/4 target but not a GDF5 target, and a lot of the discussion is dedicated to this as well. However, the authors show a binary response to the loss of BMP2/4 function, where zswim4-6 is expressed normally until pSmad1/5 levels drop low enough, at which point expression is lost. Since the authors also show that GDF5 morphants do not have as strong a reduction in pSmad1/5 levels as BMP2/4 morphants, perhaps GDF5 plays a positive but redundant role in zswim4-6 expression. To test this possibility, the authors could inject suboptimal doses of BMP2/4 MO together with GDF5 MO and look for synergy in the loss of zswim4-6 expression.

      Thanks for this great suggestion! We performed this experiment (Fig. 5H’’-L) and indeed, a suboptimal dose of BMP2/4MO + GDF5lMO results in a complete radialization of the embryo and abolishes zswim4-6 expression, similar to the effect of a high dose of BMP2/4MO. This result suggests that zswim4-6 activation is not a ligand-specific signaling function: GDF5-like signaling alone still provides sufficiently high pSmad1/5 levels to activate zswim4-6 expression to apparently wildtype levels, demonstrating the sensitivity of this gene to even very low levels of BMP signaling.

      4) The zswim4-6 morphant embryos show increased expression of zswim4-6 mRNA, which is said to indicate that zswim4-6 negatively regulates its own expression. However, in zebrafish, translation-blocking MOs can sometimes stabilize target transcripts, causing an artifact that can be mistaken for increased transcription (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7162184/). Some additional controls would be warranted before drawing this conclusion.

      Thanks for raising this important experimental consideration. To date, we do not have any evidence for MO-mediated transcript stabilization in Nematostella, and we have not found such data in the literature on models other than zebrafish. mRNA stabilization by the MO also seemed unlikely because we were unable to KD zswim4-6 using several independent shRNAs, an effect we frequently observe with genes whose activity negatively regulates their own expression. However, to test the possibility that zswim4-6MO binding stabilizes zswim4-6 mRNA, we injected mRNA containing the zswim4-6MO recognition sequence followed by the mCherry coding sequence (zswim4-6MO-mCherry) with either zswim4-6MO or control MO. We could clearly detect mCherry fluorescence at 1 dpf if control MO was co-injected with the mRNA, but not if zswim4-6MO was co-injected with the mRNA. At 2 dpf (the stage at which we showed upregulation of zswim4-6 upon zswim4-6MO injection in Fig. 6I-I’), zswim4-6MO-mCherry mRNA was undetectable by in situ hybridization with our standard FITC-labeled mCherry probe, regardless of whether zswim4-6MO-mCherry mRNA was co-injected with the control MO or ZSWIM4-6MO, while hybridization with the FITC-labeled FoxA probe worked perfectly.

      Author response image 1.

      In the paper, rather than stating explicitly that ZSWIM4-6 negatively regulates its own expression, we currently offer two alternative hypotheses for the observed increase in zswim4-6 levels: “The KD of zswim4-6 translation resulted in a strong upregulation of zswim4-6 transcription, especially in the ectoderm, suggesting that ZSWIM4-6 might either act as its own transcriptional repressor or that zswim4-6 transcription reacts to the increased ectodermal pSMAD1/5 (Fig. 6I-I’).” Given the sensitivity of zswim4-6 to even the weakest pSMAD1/5 signal (zswim4-6 is expressed upon GDF5-like KD, which drastically reduces pSMAD1/5 signaling intensity; see Fig. 1 and 2 in Genikhovich et al., 2015, http://doi.org/10.1016/j.celrep.2015.02.035, and Fig. 6-Figure supplement 3 of this paper), the latter option (that zswim4-6 transcription reacts to the increased ectodermal pSMAD1/5) is, in our opinion, clearly the more probable one.

      5) Zswim4-6 is proposed to be a co-repressor of pSmad1/5 targets based on the occupancy of zswim4-6 at the chordin BRE (which is normally repressed by BMP signaling) and lack of occupancy at the gremlin BRE (normally activated by BMP signaling). This is a promising preliminary result but is based only on the analysis of two genes. Since the authors identified BREs in other direct target genes, examining more genes would better support the model.

      We suggest that ZSWIM4-6 may be a co-repressor of pSMAD1/5 targets because it is a nuclear protein (Fig. 4G) whose knockdown results in the expansion of the ectodermal expression of several genes repressed by pSMAD1/5, despite the expansion of pSMAD1/5 itself (Fig. 6G-H’, 6L-M’, Fig. 6-Figure Supplement 4). Our limited ChIP analysis supports this idea by showing that ZSWIM4-6 is bound to the pSMAD1/5 site of chordin (repressed by pSMAD1/5) but not to that of gremlin (activated by pSMAD1/5). We agree that analyzing more targets in order to challenge our hypothesis would be good. However, given technical limitations (for each biological replicate, many thousands of eggs have to be injected with the EF1a::ZSWIM4-6-GFP plasmid to obtain enough nuclei to extract sufficient immunoprecipitated chromatin for qPCR on just 3 genes: chordin, gremlin and GAPDH), it is unfortunately not currently feasible to test more genes. In follow-up studies, it will be of great interest to generate a knock-in line with tagged zswim4-6 to analyze target binding on a genome-wide scale. In the discussion, we stress that the power of our conclusion is currently low.

      6) The rationale for further examination of zswim4-6 function in Nematostella was based in part on it being a conserved direct BMP target in Nematostella and Xenopus. The analysis of zebrafish zswim5 function however does not examine whether zswim5 is a BMP target gene (direct or indirect). BMP inhibition followed by an in situ hybridization for zswim5 would establish whether its expression is activated downstream of BMP.

      In the paper by Greenfeld et al., 2021, zebrafish zswim5 was downregulated approximately 2.4-fold in the bmp7 mutant at 6 hpf. However, this gene was not among the 57 genes that were considered to be direct BMP targets because their expression was affected by bmp7 mRNA injection into cycloheximide-treated bmp7 mutants (Greenfeld et al., 2021). We added this information to the text of the manuscript.

      7) Although there is a reduction in pSmad1/5/9 staining in zebrafish injected with zswim5 mRNA, it is difficult to tell whether the resulting morphological phenotypes closely resemble zebrafish with BMP pathway mutations (such as bmp2b). More analysis is warranted here to determine whether stereotypical BMP loss of function phenotypes are observed, such as dorsalization of the mesoderm and loss of ventral tail fin.

      We agree, and we have toned down all zebrafish arguments. Analyses of zswim5 mutants are currently ongoing.

    2. Reviewer #1 (Public Review):

      This thorough study expands our understanding of BMP signaling, a conserved developmental pathway involved in processes as diverse as body patterning and neurogenesis. The authors applied multiple state-of-the-art strategies to the anthozoan Nematostella vectensis in order to first identify the direct BMP signaling targets, bound by the activated pSMAD1/5 protein, and then dissect the role of a novel pSMAD1/5 gradient modulator, zswim4-6. The list of target genes features multiple developmental regulators, many of which are bilaterally expressed and notably shared between Drosophila and Xenopus. In particular, the analysis identified zswim4-6 as a novel nuclear modulator of the BMP pathway that is also conserved in vertebrates. A combination of loss-of-function (injection of antisense morpholino oligonucleotides, CRISPR/Cas9 knockout, expression of a dominant negative) and gain-of-function assays, together with transcriptome sequencing, showed that zswim4-6 mediates transcriptional repression downstream of BMP signaling. Functional manipulation of zswim5 in zebrafish shows a conserved role in modulating BMP signaling in a vertebrate.

      The particular strength of the study lies in the careful and thorough analysis performed. This is solid developmental work, in which one clear biological question is progressively dissected with the most appropriate tools. The functional results are further validated by alternative approaches. Data are clearly presented and methods are detailed.

      I have a couple of comments.

      1) Like the authors, I was intrigued by the fact that the ChIP-seq did not identify any known BMP ligand bound by pSMAD1/5. Are these genes found in the published ChIP-seq data of the other species used for the comparative analysis? One hypothesis could be that the regulatory interactions change, and that the initial set-up of the gradient does require a feedback loop, which is then turned off at late gastrula. In this case, immunoprecipitation at early gastrula, prior to the set-up of the pSMAD1/5 gradient, could reveal a different scenario. Alternatively, the regulation could be indirect, for example through RGM, an additional regulator of BMP signaling expressed on the side of lower BMP activity, which is among the ChIP-seq targets. This aspect could be discussed. Additionally, even if this is perhaps outside the scope of this study, I think it would be informative to further assess the effect of ZSWIM manipulation on RGM (and vice versa).

      2) I do not fully understand the rationale behind performing the comparative assays in zebrafish: as the conservation was initially identified in Xenopus, I would have expected the experiment to be performed in frog. Furthermore, from the phylogeny (Figure 4A), it is not obvious to me why ZSWIM5 was chosen for the assay over its paralog ZSWIM6. Could the authors comment further on this experiment?

    1. Author Response

      Reviewer #1 (Public Review):

      Strengths:

      The study addresses an intriguing research question that fills a gap in existing literature, and was carefully designed and well-executed, with a series of experiments and control experiments.

      We thank the reviewer for the positive statement about the conception and execution of the study, and for noting its potential interest to the broader community.

      Weaknesses:

      1) My main concern is the null effect on the pattern of precision estimates between cued and un-cued trials. It is well established that, relative to un-cued stimuli, cued stimuli obtain more attentional resources, and this study claims serial attentional resource allocation during parallel feature-value tracking. However, none of Experiments 3a-c found any difference in precision estimates between these two types of trials.

      We would like to note that the terminology “cued versus uncued trials”, in the usual sense of distinguishing between attended and unattended stimuli, is admittedly somewhat misleading in the current work. In the cued and uncued trials of Experiments 3a-c, the allocation of attention is equal. The difference is that the color stream that is attended first is defined (knowable) in the cued but not in the uncued trials. In all cases, subjects had to track both color streams and report whichever stream was probed as accurately as possible. In other words, the overall allocation of attention is the same in cued and uncued trials. Moreover, the “cue” did not provide any information about the upcoming probe (unlike in a typical attention experiment, it did not indicate a higher likelihood of a probe in that stream). It was entirely irrelevant to the task and was therefore not expected to alter subjects’ overall performance, as confirmed by the null result the reviewer mentions. This test shows that the reported bias of ~2:1 does not depend on whether one stream is cued in a given set of trials. The sole purpose of the “cue” was to briefly and subconsciously redirect attention towards that particular stream at the start of each trial in order to ‘phase-reset’ any process that switches or oscillates feature-based resources over time. This phase reset does not alter the performance imbalance across streams, which remains constant because the precision ratio is estimated across a large number of trials and durations. To clarify this issue, we rephrased the relevant descriptions in the methods section.

      2) The results of Exp. 1 in the main text differ from those in the figure.

      Thank you for spotting that error. We have corrected the figure accordingly.

      3) It would be helpful to add more details on the assignment of response 1 and response 2 to target 1 and target 2, respectively, in all experiments.

      In Experiments 2 and 3, subjects gave only one response per trial. This design was chosen to avoid potentially ambiguous response-target assignments.

      However in the first experiment, as the reviewer points out, subjects gave two color estimates (one for each of the tracked color streams) within each trial. Given that we intend to split subjects’ target-response differences (precisions) into two distributions (based on the idea that each stream is being maintained by an independent attentional resource), there are two possible ways of assigning responses:

      (1) We split responses into a best and a worst response, regardless of which response was given first.

      (2) Alternatively, we assign target-response pairs based on the order of responses. The assumption is that the first response is the one given with the highest confidence and is paired with the closest target, independently of the second response, which is consequently paired with the remaining target. This leaves open the possibility that the second target-response difference is smaller than the first one due to resource fluctuations. In general, this strategy is less ‘rigid’ in dividing the two precision responses into ‘good’ and ‘bad’ ones and was therefore chosen.

      To avoid problems arising from the ambiguity of target-response assignments, subjects were required to give only one response per trial in all subsequent experiments (2 and 3). We address this issue in further detail, including a numerical example, in our response to Reviewer 3. The logic behind the target-response assignments in Experiment 1 is now described in more detail in the methods.
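      To make the two assignment strategies concrete, here is a minimal Python sketch for a single Experiment 1 trial. It is an illustration under stated assumptions, not the authors' analysis code: colors are treated as angles in degrees on a circular color space, the helper names (`circ_diff`, `best_worst_split`, `order_based`) are hypothetical, and for strategy (1) we assume the pairing with the smaller summed error is used before the two errors are labeled best and worst.

```python
def circ_diff(a, b):
    """Signed circular difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def best_worst_split(targets, responses):
    """Strategy (1), hypothetical version: ignore response order, pick the
    target-response pairing with the smaller summed absolute error, then
    return (best_error, worst_error)."""
    t1, t2 = targets
    r1, r2 = responses
    straight = [abs(circ_diff(r1, t1)), abs(circ_diff(r2, t2))]
    crossed = [abs(circ_diff(r1, t2)), abs(circ_diff(r2, t1))]
    errors = min(straight, crossed, key=sum)
    return min(errors), max(errors)


def order_based(targets, responses):
    """Strategy (2): pair the first response with its closest target and the
    second response with the remaining target. The second error is allowed
    to be smaller than the first."""
    r1, r2 = responses
    dists = [abs(circ_diff(r1, t)) for t in targets]
    first_idx = 0 if dists[0] <= dists[1] else 1
    remaining_target = targets[1 - first_idx]
    return dists[first_idx], abs(circ_diff(r2, remaining_target))
```

With made-up targets at 10° and 200° and responses of 30° and 15°, `order_based` yields errors of 20° and 175°, whereas `best_worst_split` re-pairs the responses and yields 5° and 170°. The example illustrates how the choice of strategy reshapes the two error distributions, which is why single-response designs sidestep the issue.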

      Reviewer #2 (Public Review):

      The authors asked the question about whether and how changing feature values within the same feature dimensions are tracked. Using a series of behavioral studies combined with modeling approaches, the authors report interesting results regarding a robust, uneven distribution of attentional resources between two changing feature values (in a 2:1 ratio), alternating at 1 Hz. Although the results are clear, it is important to rule out the possible biases due to computational processes. The results advanced our understanding of how parallel tracking of multiple feature values within the same dimension is achieved.

      We thank the reviewer for the summary, including the potential impact on the field, and we look forward to clarifying methodological imprecisions.

      Reviewer #3 (Public Review):

      The study is interesting and the results are informative in how well people can report colors of two superimposed dot clouds. It reveals that there are trade-offs between reporting two colors. However, I have a few basic but major concerns with the present study and its conclusions about people's abilities to continuously track color values and the rate at which attention may be allocated across the two streams which I am outlining below.

      We thank the reviewer for the positive description of our findings and look forward to addressing any remaining issues.

      1) The first concern regards the task used to measure continuous tracking of feature values, which in my view is ambiguous as to whether it truly assesses active tracking of features or rather short-term memory of the last-seen colors. Specifically, participants viewed two colored dot clouds that then turned gray, and were asked to report each of the colors they saw using continuous report. The test usually occurred after 6-8 s (in Exp. 1 & 2), so while not completely predictable, participants could easily perform the task without tracking both feature streams continuously, simply reporting the very last colors they saw. In other words, it does not seem necessary to know which color belonged to which stream, or what color it was before, to perform the task successfully. Thus, it is unclear to what extent this task actually measures active tracking in the same way that tracking of spatial locations has been studied in multiple-object tracking tasks, the literature to which the authors are trying to draw parallels. In multiple-object tracking tasks, targets and nontargets look identical, so to keep track of which of the moving objects are targets, participants need to attend to them actively and selectively. (Similarly, in the main experiment of the original feature-tracking study by Blaser et al., people were asked to track an object superimposed on a second object, which required continuous and selective tracking of that object.)

      The reviewer raises a very fundamental point regarding ‘tracking’ in general: does tracking rely on attentional processes or on mere perception?

      The reviewer posits that subjects may simply ‘report based on the very last color they saw’ without the need to track both feature streams continuously. Our argument, supported by a broad literature on change blindness, inattentional blindness and related phenomena (cf. Rensink, 2000), is that one cannot consciously report a changing feature value without continuously attending to it, in particular when it moves randomly through feature space. Reporting a feature value at a random, unpredictable time t by ‘identifying it’ requires attentive processing of that value immediately before t. Since the time of the probe is random, this processing must continue throughout the trial. We also rule out any strategy in which subjects only start tracking after some time (the probe appears 6-8 s after trial onset), since such a strategy would additionally involve temporal attention and increase difficulty.

      Lastly, the reviewer refers to Blaser et al. as an example in which attentive tracking would be required, since ‘an object [is] superimposed on a second object’. We fully agree. However, the same design principle applies in the current experiment: two objects with separate, continuously changing values in feature space are superimposed, that is, spatially inseparable. We believe that the continuous movement of the feature values through color space distinguishes this work from previous feature-tracking studies such as Re et al., in which the presented features remained static. That work allows alternative explanations in terms of working memory (raised in the reviewer's next point). Once feature values keep changing and remain relevant, their internal representations must be continuously updated to keep them accessible, which requires attention.

      2) The main claim that tracking two colors relies on a shared and strictly limited resource is based primarily on the relation between the two responses people give: across participants, the first response, about one color, tends to be more accurate than the second response, about the other color. In my view, this is a relatively weak way of looking at resource trade-offs. It would have been more compelling to show such trade-offs at the single-trial level, or to assess them with well-established methods developed to study attentional bottlenecks, such as attention-operating characteristics, which quantify the cost of adding an additional task in a precise and much more direct manner.

      The reviewer suggests showing trade-offs at the single-trial level within subjects, which is in essence what we did in Experiment 1. Testing both streams simultaneously, however, has the drawback of introducing interference effects during the report (reporting the first stream may degrade the precision of reporting the second) as well as the aforementioned ambiguity between targets and responses. The second and third experiments circumvent this by probing only one color stream, allowing us to analyze the data with a minimal set of assumptions. As the dependent measure, ‘precision’, fluctuates strongly across trials, we have to estimate an overall tracking resource by building a ‘precision’ distribution across many trials.
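      As an illustration of estimating a tracking resource from a distribution of errors rather than from single trials, the Python sketch below computes a per-stream precision as the inverse of the circular standard deviation of response errors pooled across trials, so that two streams can be compared as a ratio. This is an assumed, simplified stand-in: the authors may well use a different precision measure (for example, a mixture-model fit), the function names are hypothetical, and the error values in the usage note are invented.

```python
import math


def circular_sd_deg(errors_deg):
    """Circular standard deviation (degrees) of response errors pooled
    across trials: sqrt(-2 ln R), where R is the mean resultant length."""
    n = len(errors_deg)
    c = sum(math.cos(math.radians(e)) for e in errors_deg) / n
    s = sum(math.sin(math.radians(e)) for e in errors_deg) / n
    r = math.hypot(c, s)
    if r >= 1.0:
        return 0.0  # all errors identical: no dispersion
    if r == 0.0:
        raise ValueError("errors are uniformly spread; SD is undefined")
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


def precision(errors_deg):
    """Precision as inverse circular SD (infinite if there is no dispersion)."""
    sd = circular_sd_deg(errors_deg)
    return math.inf if sd == 0.0 else 1.0 / sd
```

For instance, `precision([-5, 5]) / precision([-10, 10])` is close to 2: under this measure, a stream whose errors are half as dispersed counts as roughly twice as precise. Pooling many trials before computing the statistic is what makes such a ratio stable despite large trial-to-trial fluctuations.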

      3) Finally, the data from the last experiment are taken as evidence that feature-based selection oscillates at 1 Hz between the two streams. This is based on response errors changing across time points relative to an exogenous cue that is thought to "reset" attentional allocation to one stream. Only one of three data sets (which uses relatively sparse temporal sampling) shows a significant interaction between cue and time, and given that there was no a priori prediction of when such an interaction should occur, this result calls for replication to ensure that it is not a false positive. Furthermore, given the analyses done in the paper, the presumed "switching rate" may very well be entirely non-oscillatory: a recent, very important paper by Geoffrey Brookshire (2022, Nature Human Behaviour) demonstrates that frequency analyses are sensitive not only to periodic but also to aperiodic temporal structure. That paper also suggests a series of analyses that could be used here to further test the current conclusions.

      The reviewer is right to doubt the oscillatory nature of the results in Exp. 3. Importantly, in our discussion we do not claim that a regular periodicity of the attentional process maintains both color streams. Instead, we stress the point of ‘one feature at a time’, indicating a constraint that entails alternation between two representations. We do not presume any regularity of this process; rather, we consider the switching to be determined by the recurrent processing of tuning towards one of the two relevant values. Our interpretation is therefore largely in line with Brookshire’s criticism of previous attentional oscillation studies. In fact, we share his doubts about interpretations of attentional oscillations that transfer mathematical modelling onto functional processes. In our study, we use the Fourier transformation purely as a methodological tool to quantify alternations between our color streams, not to imply an underlying oscillatory process. We cannot draw conclusions about underlying attentional oscillations, especially since we quantify the alternation/switch across only one full period and one half period, in Exp. 3a and Exp. 3b respectively.

      We have now made the distinction between oscillations as a methodological tool and as a functional cognitive process clearer in the paper.
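      The difference between Fourier analysis as a descriptive tool and oscillation as a mechanism can be illustrated with a single-frequency discrete Fourier amplitude. The Python sketch below is an illustration under stated assumptions, not the authors' pipeline: `signal` stands for, e.g., a hypothetical cued-minus-uncued precision difference at each probe time after the cue, and `times_s` gives those probe times in seconds. A large amplitude at 1 Hz only says that the sampled time course contains a 1 Hz component. As Brookshire (2022) points out, this does not by itself establish a periodic process, and when only one period or half a period is sampled, the frequency resolution is coarse.

```python
import cmath
import math


def amplitude_at(signal, times_s, freq_hz):
    """Amplitude of the freq_hz component of a sampled time course, from a
    single-frequency discrete Fourier sum after removing the mean.
    Exact only for evenly spaced samples spanning whole periods of freq_hz;
    otherwise spectral leakage makes it an approximation."""
    n = len(signal)
    mean = sum(signal) / n
    acc = sum((x - mean) * cmath.exp(-2j * math.pi * freq_hz * t)
              for x, t in zip(signal, times_s))
    return 2.0 * abs(acc) / n
```

For a 1 Hz cosine sampled at ten evenly spaced points over 1 s, the amplitude at 1 Hz is 1 and the amplitude at 2 Hz is essentially 0. The same number would be obtained for any time course, periodic or not, that happens to project onto a 1 Hz sinusoid over the sampled window.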

    1. Reviewer #1 (Public Review):

      In several developmental systems, the core Planar Cell Polarity (PCP) pathway organises the dynamics of the cellular behaviours underlying morphogenesis. During pupal stages, the Drosophila wing undergoes a complex morphogenetic process that results in the simultaneous elongation and narrowing of the wing blade along the proximal-distal and anterior-posterior axes, respectively. It was proposed that this dynamic process is driven by mechanical stress that results in cell deformations and cell rearrangements. However, prior work by Etournay et al. (eLife 2015) shows that mutants that reduce mechanical stress do not completely eliminate oriented cell rearrangements. Here, Piscitello-Gomez et al. use imaging techniques previously developed by them and others, combined with a computational analysis of a rheological model, to evaluate the role of the core-PCP pathway as a possible patterning cue that could orient cell rearrangements in this system. Surprisingly, the authors found that core-PCP mutants affect only the early retraction velocity upon laser ablation, but do not seem to drive overall morphogenesis in this system. Therefore, the original question of the work, namely, identifying the patterning cues that establish oriented cell rearrangements in this system, remains unanswered.

      The work exemplifies how the integration of mechanical perturbations, image analysis, and computational modelling can be used to investigate the contribution of a specific patterning cue to morphogenesis. While the conclusions of the manuscript are solid, and the data support the conclusion that core-PCP pathway mutants do not display an altered cell dynamics or cell elongation phenotype relative to wild-type controls, one challenge of the approach is that time-lapse imaging was performed on only a handful of pupal wings. This does not allow one to conclude whether subtle changes in cell elongation or cell rearrangements could account for the observed changes in the shape of adult wings (which are rounder in these mutants). Other patterning and polarity cues, such as Fat-Dachsous or Toll-like signalling, are suggested by the authors, but their examination is left for future studies.

    2. Reviewer #2 (Public Review):

      The core planar cell polarity (PCP) pathways are known to control tissue morphogenesis in vertebrates and also in a number of developing tissues in the fruitfly Drosophila. However, it has long been observed that beyond effects on hair polarity, core PCP activity does not have dramatic effects on Drosophila wing morphogenesis. Here the authors carry out detailed quantitative studies of cell behaviors in flies mutant for core PCP genes during pupal wing morphogenesis between about 16 to 32 hours of pupal life to further try to determine if core PCP activity affects cell behaviors in the wing.

      Their overall conclusion is that there is no effect on tissue morphogenesis. However, the number of wings looked at for each genotype is low due to the enormous amount of work required to analyze the cell behaviors on an entire wing surface over 16 hours of development. Thus, rigorous statistics cannot be applied to support the statement that there is no change in morphogenesis. Moreover, by eye, the average cell behaviors do appear different and the authors themselves say there are subtle differences. They also note that adult wings have a change in size. Also, a previous publication suggested a change in cell arrangements at the late stages of the period studied (Sugimura & Ishihara 2013).

      Interestingly, the authors do report a change in local mechanical properties of the tissue in flies with altered core PCP pathway activity, by using laser ablation to study tissue rheology. This seems to support the view that there could be a subtle change in tissue morphogenesis.

      Ultimately, this is a valuable set of results that help to clarify core PCP pathway function in Drosophila tissues. It clearly demonstrates effects on tissue mechanics, but also indicates that this does not result in gross changes in tissue morphogenesis - the latter being consistent with previous observations.

    1. Author Response

      eLife Assessment:

      The fluorescently tagged SYT-1 mouse line will be useful for the field. Importantly, the authors used a comprehensive set of immunohistochemical and physiological experiments to demonstrate that the fluorescence tagging did not alter the function of SYT-1. These are important control experiments that will make the strain useful for physiological experiments in the future. However, the advance of this manuscript is less clear.

      We thank the editor for raising this point. In the revised manuscript, we performed additional experiments, including testing the expression level of Syt1-TDT and testing the co-labeling of Syt1-TDT with synaptic markers in situ. We also discuss the advantages of our model compared with existing ones in lines 285-300 of the discussion. Briefly, we summarize the advances of our model as follows: First, Syt1-TDT can label synapses in situ, especially in the glomerular layer of the olfactory bulb (compared with B6SJL-Tg(Thy1-Syt1/ECFP)1Sud/J (Han et al. 2005)). Second, our model can potentially be used for electrophysiological recording and imaging in vivo, as the electrophysiological properties of neurons from Syt1-TDT mice are normal (this was not analyzed in B6.Cg-Tg(Thy1-YFP/Syp)10Jrs/J and B6;CBA-Tg(Thy1-spH)21Vnmu/J (Umemori et al. 2004; Li et al. 2005)), which might result from the relatively low expression of Syt1-TDT compared with native Syt1. Third, neurons from the transgenic mice can be used in ASF screening without the immunostaining step, which saves time, reagents and labor.

      Reviewer #1 (Public Review):

      In this manuscript, Zhang and colleagues created a transgenic mouse strain that expresses SYT-1-tdt in all neurons. They showed that the labelled SYT-1 colocalizes with multiple synaptic markers and labels synapses in different regions. More importantly, they showed using ephys assays that the transgenic expression does not alter synaptic function. This is a straightforward paper that generated a useful reagent that will be used broadly.

      We are grateful for the reviewer’s positive comments.

      Reviewer #2 (Public Review):

      Yang et al. produced a transgenic mouse line (Syt1-TDT) that could be used for labeling both excitatory and inhibitory synaptic sites in cultured neurons and in vivo neurons. The strength of the current study is to provide a series of thorough analyses to claim the applicability of this mouse line in the relevant neuroscience research field(s). The weakness is the potential impact/usefulness of this mouse line. To strengthen the merit of this mouse line, the authors should present evidence showing its advantage over other similar genetic approaches.

      We thank the reviewer for raising this point. To strengthen the merit of this mouse line, we tested the application of Syt1-TDT in labeling synapses in situ. We found that Syt1-TDT highly overlaps with synapsin in brain slices, especially in the hippocampus, cerebellum and olfactory bulb, which suggests a potential use of our model for imaging synapses in vivo. We also compared our transgenic model with existing ones in lines 285-300 of the discussion in the revised manuscript:

      “Several transgenic mouse models expressing fluorescently tagged synaptic proteins, such as YFP-tagged synaptophysin and pHluorin-tagged synaptobrevin, have been developed to label synapses [49, 50]. While these models label synapses well, they lack a functional analysis of neurotransmitter release in the overexpressing neurons, which matters because synaptophysin and synaptobrevin were reported to play a role in regulating neurotransmitter release. Considering that overexpression of synaptobrevin or synaptophysin was reported to promote neurite elongation or enhance neurotransmitter secretion, synaptic organization and synaptic transmission might be altered in these models. Weiping Han et al. [47] previously generated transgenic mice expressing a Syt1-ECFP fusion protein. The Syt1-ECFP mice expressed ECFP in the cortex, midbrain, and cerebellum. However, the expression pattern in their model differed somewhat from ours: in the olfactory bulb, Syt1-TDT signals were highly enriched in the glomerular layer in our model, which was not observed in the previously reported Syt1-ECFP transgenic mice [47]. This suggests a potential advantage of our model for labeling synapses in the glomerular layer of the olfactory bulb compared with Syt1-ECFP transgenic mice.”

      Reviewer #3 (Public Review):

      Yang and colleagues provide a thorough characterization of a transgenic mouse model expressing fluorescently tagged synaptotagmin. In particular, they present key controls validating this mouse model as a tool, including co-localization of the tagged synaptotagmin with other synaptic markers as well as normalcy of synaptic transmission mediated by synaptic terminals expressing the tagged synaptotagmin. Importantly, the authors present data on the potential use of neuronal cultures obtained from these mice in synaptic co-culture assays. In these assays, synaptic cell adhesion molecules expressed on non-neuronal cell lines such as HEK-293 cells or COS cells are used to test the sufficiency of these molecules to trigger synapse assembly. This mouse model will be a useful addition to existing models expressing fluorescently-tagged synaptic vesicle proteins such as synaptophysin, synaptotagmin as well as synaptobrevin.

      We are grateful for the reviewer’s positive comments.

    1. Author Response

      Reviewer #1 (Public Review):

      Bakoyiannis et al. investigated the distinct contribution of ventral hippocampal outputs to the nucleus accumbens and medial prefrontal cortex on memory in mice exposed to a high-fat diet (HFD) beginning in adolescence. The authors first characterize the hippocampal to accumbens or mPFC circuits using intersectional viral approaches. They then replicate their previous finding that adolescent HFD contributes to the overactivation of the ventral hippocampus during contextual learning via quantification of c-fos+ cells. In this manuscript, the authors further explore the distinct contribution of these two outputs from the ventral hippocampus using chemogenetics to specifically inhibit one circuit or the other. Interestingly, the authors find that inhibition of either circuit returns c-fos+ cell number to control levels, but the effects on memory are dissociable. They demonstrate that inhibition of output to the NAc rescues HFD-induced deficits on object recognition, while inhibition of mPFC outputs rescues HFD-induced deficits on object location recall. The authors further confirmed that chemogenetic manipulations resulted in alterations in c-fos+ cells that were specific to CA1, and not CA3 or DG. Behaviorally, they excluded any contribution of anxiety on recall, finding no effect on the elevated plus maze.

      The strengths of this manuscript include robust behavioral findings that can be attributed to specific circuits. The conclusions of this paper are largely well supported by the data, although some of the methods could provide more detail and the statistical approaches used for analysis need improvement.

      We thank the Reviewer for thoroughly summarizing the main results of the study and for providing the comments that we address below.

      Reliance on only one measure of anxiety to exclude this as a confound on recall performance is a weakness of the manuscript. To be more convincing that anxiety is not a confound, more than one behavioral assay should be performed.

      Reviewer #2 (Public Review):

      Bakoyiannis et al. aim to analyze the impact of high-fat diet (HFD) intake during the preadolescent period on memory performance by chemogenetically manipulating the circuits responsible for the related memory performance. In previous work, they showed that object-based memory impairments in HFD-exposed animals could be rescued by silencing the ventral hippocampus (vHPC). Here they further investigated its projections to the nucleus accumbens (NAc) and medial prefrontal cortex (mPFC), two of the main monosynaptic targets of the vHPC.

      They used a precise strategy to target and manipulate only vHPC cells that project to either the NAc or the mPFC. They found that preadolescent HFD can induce different types of memory deficits related to different vHPC pathways. In particular, silencing the vHPC-NAc pathway, but not the vHPC-mPFC pathway, restored the HFD-induced object recognition memory deficit. Conversely, silencing the vHPC-mPFC pathway, but not the vHPC-NAc pathway, rescued HFD-induced object location memory deficits. Moreover, these pathways do not control anxiety-like behaviours, since their inactivation has no effect on anxiety levels.

      We thank the Reviewer for summarizing the findings of the study and for their positive comments on our manuscript.

      The conclusions of the manuscript are mostly supported by the results, but there are some points and controls that need to be addressed and clarified:

      • While identifying the relevance of hippocampal cells projecting to the NAc and mPFC, a missing control is to verify the activity of vHPC neurons that do not project to these two regions, both under normal conditions and when the investigated pathways are manipulated. This control is essential to refine their previous finding that the vHPC as a whole is involved in the process and to establish what is novel here.

      • A downstream effect of their chemogenetic manipulation on NAc and mPFC cellular populations should be shown if they want to claim that their chemogenetic inhibition decreases the activation of the pathway and not only that of vHPC projecting neurons.

      New c-Fos experiments were performed. Please see our response to points 4-5-6 in the “Essential Revision” section.

      Reviewer #3 (Public Review):

      "Obesogenic diet induces circuit-specific memory deficits in mice" by Bakoyiannis et al. investigates the role of specific ventral hippocampal circuits (specifically those projecting to the nucleus accumbens and mPFC) in high-fat diet-induced memory deficits. The authors had previously shown that increased activity in the ventral hippocampus accompanies high-fat diet-induced memory deficits, and that inhibiting this activity normalizes those deficits. In this manuscript, the authors extend these findings to specific projections, showing that inhibiting each of the two pathways normalizes a different type of memory.

      The strengths of the paper include the pathway-specific manipulations that reveal a difference between the two types of memory. The results are a modest step forward for the field of feeding and learning and memory and would be of interest to that subgroup of neuroscientists. However, the paper also has a number of weaknesses which I detail below.

      We thank the Reviewer for summarizing the findings of our study and for the positive feedback.

      1) First, the authors show an effect on c-Fos in both pathways during object learning in Figure 2. However, the inactivation studies show a pathway-specific effect on object recognition and object location, with no experiments to delineate how this divergence occurs. The authors do not specify whether they compared c-Fos in the control group between NAc and mPFC projections (presumably they ran some controls with each injection), which might reveal differences.

      We have added new groups and presented/analyzed the results for each pathway (either vHPC-NAc pathway or vHPC-mPFC pathway) separately for c-Fos (new Figure 2 and Figure 2-Figure Supplement 1) or behaviours (new Figure 3 and Figure 3-Figure Supplement 1). Please see our responses to points 2, 4-5-6 and 9 in the “Essential Revision” section.

      2) Related to this, it is unclear how the pathways end up diverging for memory if they do not show any differences in cfos during training. Perhaps there are pathway-specific differences in cfos following the ORM and OLM tests? It is difficult to support the claim that there are pathway differences in memory following inactivation if we do not see any pathway-specific change in activity.

      We thank the Reviewer for this comment. Please see our answer to point 7 in the “Essential Revision” section above.

      3) Figure 2 and Figure 3 are also hard to interpret because of the use of a one-way ANOVA, which is not the appropriate statistical test when there are two independent variables (HFD and DREADD manipulation). Examining the statistical design also reveals that a critical control is missing: HFD-, hM4di+CNO+. It is possible that inactivation simply lowers c-Fos levels regardless of diet. While this might benefit memory in the case of HFD, it is critical to know whether the manipulation is specific to the overactivation caused by HFD or simply produces a general decrease in activity.

      Based on this comment, we added new HFD-hM4di+CNO+ groups and modified the statistical analyses accordingly. Indeed, inactivation of each pathway (vHPC-NAc or vHPC-mPFC) decreases c-Fos in both HFD+ and HFD- (CD+) groups (new Figure 2), whereas it has opposite effects on behavior, improving memory performance in HFD+ groups but impairing it or having no effect in HFD- (CD+) groups (new Figure 3). We have corrected this in the manuscript (please see our responses to points 2 and 9 of the “Essential Revision” section).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper reports the fundamental discovery of adrenergic modulation of spontaneous firing through the inhibition of the Na+ leak channel NALCN in cartwheel cells in the dorsal cochlear nucleus. This study provides unequivocal evidence that the activation of alpha-2 adrenergic or GABA-B receptors inhibits NALCN currents to reduce neuronal excitability. The evidence supporting the conclusions is compelling, the electrophysiological data are of high quality, and the experimental design is rigorous.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study uses electrophysiological techniques in vitro to address the role of the Na+ leak channel NALCN in various physiological functions in cartwheel interneurons of the dorsal cochlear nucleus. Comparing wild type and glycinergic neuron-specific knockout mice for NALCN, the authors show that these channels 1) are required for spontaneous firing and 2) are modulated by noradrenaline (NA, via alpha2 receptors) and GABA (through GABAB receptors), and 3) they show how the modulation by NA enhances IPSCs in these neurons.

      This work builds on previous results from the Trussell lab on the physiology of cartwheel cells, and from other labs on the role of NALCN channels, which have recently been characterized in a growing number of brain areas; for this reason, this study could be of interest to researchers who work in other preparations as well. The general conclusions are strongly supported by results that are clearly and elegantly presented.

      I have a few comments that, in my opinion, might help clarify some aspects of the manuscript.

      1. It is mentioned throughout the manuscript, including the abstract, that the results suggest a close apposition of NALCN channels and alpha2 and GABAB receptors. From what I understand, this conclusion comes from the fact that GABAB receptors activate GIRK channels through a membrane-delimited mechanism. Is it possible that these receptors converge on other effectors, for example adenylate cyclase (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6374141/)?

      We have now tested the role of adenylyl cyclase modulation in the control of NALCN, by saturating the cells with a cAMP analogue 8-Br-cAMP and found no effect on the NA response. These data are included in the paper. While further experiments are necessary, these results argue in favor of a direct gating by G-proteins.

      1. In Figure 2G, the neurons from NALCN KO mice appear to reach a significantly higher frequency than those from WT (figure 2E, 110 vs. 70 spikes/s). Was this higher frequency a feature of all experiments? The results mention a rundown of peak firing rate due to whole-cell dialysis, but, from what I understand, the control conditions should be similar for all experiments.

      The peak firing rates in control solutions for WT and KO CWC are not statistically different.

      1. Also in Figure 2, the firing patterns for neurons from WT and NALCN KO mice appear to be quite different, with spikes appearing to be generated during the hyperpolarization of the bursts in the second half of the current step for WT neurons but always during the depolarization in KO neurons. Was this always the case? If so, could NALCN channels be involved in this type of firing? Along these lines, it would be interesting to show an example of a firing pattern of neurons from WT mice in the presence of NA, which inhibits NALCN channels.

      The specific pattern of spikes in CWC is quite variable from trial to trial and from cell to cell, as it depends on multiple CaV and calcium-dependent K channel subtypes, and it does not depend on the genotypes used here. The primary effects observed in the KO are in background firing and sensitivity to NA, both reflecting alterations in rheobase. The requested firing pattern example is shown in the raster plot of Fig 2B2.

      1. It might be interesting to discuss how the hyperpolarization induced by the activation of GIRK channels and inhibition of NALCN channels could have different consequences due to their opposite effect on the input resistance.

      We considered this as a point of discussion, but decided that making sense of it would depend on assumptions about the location of the channels (dendritic vs somatic, distance to AIS) that we do not have data for. For example, a dendritic increase in resistance through NALCN block, leading to a hyperpolarization of the soma, might have actions similar to a somatic hyperpolarizing conductance increase by GIRK, as far as the voltage at the AIS is concerned.

      Reviewer #2 (Public Review):

      This is a very interesting paper with several important findings related to the working mechanism of the cartwheel cells (CWC) in the dorsal cochlear nucleus (DCN). These cells generate spontaneous firing that is inhibited by the activation of α2-adrenergic receptors, which also enhances the synaptic strength in the cells, but the mechanisms underlying the spontaneous firing and the dual regulation by α2-adrenergic receptor activation have remained elusive. By recording from these cells with the NALCN sodium-leak channel conditionally knocked out, the authors discovered that both the spontaneous firing and the regulation by noradrenaline (NA) require NALCN. Mechanistically, the authors found that activation of the adrenergic receptor or GABAB receptor inhibits NALCN. Interestingly, these receptor activations also suppress the low [Ca2+] "activation" of NALCN currents, suggesting crosstalk between the pathways. The finding of such a dominant contribution of the NALCN conductance to the regulation of firing by NA is somewhat surprising, considering that NA is known to regulate K+ conductances in many other neurons.

      The studies reveal the molecular mechanisms underlying well-known modulation of neuronal processes in the auditory pathway. The results will be important to the understanding of auditory information processing in particular and, more generally, to the understanding of the regulation of inhibitory neurons and ion channels. The results are convincing and are clearly presented.

      Reviewer #3 (Public Review):

      The study by Ngodup and colleagues describes the contribution of the NALCN sodium leak conductance to the effects of noradrenaline on cartwheel interneurons of the DCN. The manuscript is very well written and the experiments are well controlled. The scope of the study is of high biological relevance and recapitulates a primary finding of the Khaliq lab (Philippart et al., eLife, 2018) in ventral midbrain dopamine neurons, that Gi/o-coupled receptors inhibit NALCN current to reduce neuronal excitability. Together these studies provide unequivocal evidence for NALCN as a downstream target of these receptors. There are no major concerns. I have only minor suggestions:

      Minor

      1. As noted in the Introduction, NALCN is inhibited by extracellular calcium, which has led to some debate about the relevance of NALCN when it is recorded in 0.1 mM calcium. A strength of this study is that the effect of NA on NALCN is recorded at physiological calcium levels (1.2 mM). I suggest stating the extracellular calcium concentration of the aCSF in the Results section instead of relying on the reader to look it up in the Methods.

      Done.

      1. It would be interesting to include the basal membrane properties of the KO compared to wildtype, including membrane resistance and resting membrane potential. From the example recording in Figure 2, one might think that the KOs have lower membrane resistance, so it is interesting that the 2 mV hyperpolarization produced similar effects on rheobase. In addition, from the example in Figure 2G, it appears that NA has an effect on firing frequency with large current injection in the KO. Is this true in grouped data and if so, is there any speculation into how this occurs?

      We have included in the text a comparison of the input resistance in WT and KO. These were not different. This should not be too surprising given the wide range of values between animals, and the necessity to compare populations. Measurements of resting potential are complicated by the fact that CWC are normally spontaneously active. As was discussed in the text, peak firing frequency declined with time during recording in both control and KO, necessitating normalization as shown in Fig 2E-H.

      1. Please expand on the rationale for why GABAB and alpha2 must be physically close to NALCN. To my knowledge, the mechanism by which these receptors inhibit NALCN is not known. Must it be membrane-delimited?

      Given the known membrane-delimited modulation of GIRK by GABAB, that alpha2 and GABAB receptors appear to share the same population of NALCN channels, and that alpha2 receptors do not appear to target GIRK channels, we felt the simplest explanation would be coupling through G-proteins, with spatial segregation of different receptor/channel pools providing the means for separating GIRK and NALCN effects. Given that the alpha2 receptor is a Gi/o GPCR, we have now included in the revision new experiments using 8-Br-cAMP, as discussed above. These showed no effect on the NA response, consistent with a direct, membrane-delimited effect of G-proteins. We acknowledge, however, that further experiments are warranted.

      Reviewer #1 (Recommendations For The Authors):

      1. I suggest labeling the voltage traces in Figure 2 with WT and KO for easier comprehension; in addition, I suggest adding the average data to the plots in Figure 2, as in Figure 2-supplementary Figure 1 panel F.

      We have added the figure labels as requested. We chose not to add the average data as we noticed that averaging the full FI plots led to a smearing of the curves and a distortion in the apparent rheobase. Thus, we instead measured the rheobase for individual cells and report their average.

      1. For readers that are not familiar with the field, more details should be given about the electrical stimulation to evoke IPSCs in cartwheel cells, and what they represent.

      Done.

      1. The methods should mention whether and how the concentrations of divalents were adjusted in the experiments with 0.1 mM extracellular Ca2+.

      Done.

      Reviewer #2 (Recommendations For The Authors):

      I only have several minor comments.

      1. The total lack of spontaneous firing in CWCs in the NALCN KO (Fig. 1) is interesting and provides an opportunity to probe the in vivo function of such spontaneous firing. Besides being a little smaller, do the mutant mice have any sign of abnormality in sound signal processing?

      Figure 1 – Figure supplement 1 showed that there are no effects on auditory brainstem responses in the KO.

      1. In Figs. 3 & 4 (and several other figures with voltage-clamp recordings), a line indicating the zero-current level would be useful.

      Done

      1. page 7, "Outward current generated by suppression of NALCN": it might be better to state as "Outward response generated by suppression of NALCN", as the authors correctly pointed out that the NA-induced apparently outward current response is largely a result of an inhibition of NALCN-mediated inward Na+ current. One way to clarify this might be to record at the Nernst potential of K+ to isolate the contribution of Na+ currents (unclear if K+- or Cs+-based pipette was used in the experiment in Fig 3).

      Text has been modified.

      1. Figs. 5,6&7: do the dashed lines indicate initial current level or zero current level?

      Initial current. See legends.

      1. The labeling of some of the bar graphs can be made more clear. For example, in Fig. 2K, the right two columns should be labeled as WT as well. Fig. 3C & Fig. 4C, the left two columns should be labeled as WT and the right two as KO.

      Added labels to Fig 2 as requested.

      1. Figs. 5-7: The suppression of low extracellular [Ca2+]-induced NALCN-dependent current by NA and baclofen is very interesting. As the tonic inhibition of NALCN by extracellular Ca2+ is likely through a Ca2+-sensing GPCR (CaSR) and G-proteins (lowering [Ca2+] releases the inhibition and generates inward current) (Lu et al. 2010), the action of NA and baclofen may all converge onto the same G-protein dependent pathway of the Ca2+-sensing receptor. I'd include this in the discussion to provide a potential mechanistic explanation of the interesting observation.

      This is indeed an interesting idea. We prefer not to discuss it here, as 1) the source of the channel's Ca2+ sensitivity seems to be controversial (Chua et al 2020), and 2) the effect of Ca2+ reduction is much slower than the effect of the modulators (Fig 5-7), implying distinct mechanisms.

      Reviewer #3 (Recommendations For The Authors):

      Typos/general comments

      1. Figure 2 would be easier to comprehend with WT and KO labels as in the other figures.

      Done.

      2. Page 11, size of the IPSCs in NA is missing the minus sign.

      Corrected.

      1. Is the y-axis correct on Figure 8B? This looks like it is doubling the size of the IPSC.

      Thank you for catching this mistake. The formula used to calculate % change was in error. We have corrected all the data analysis in the figure, which fortunately did not change the conclusion. Regarding the axis, note that the measurement was % change, not ratio of drug vs control.
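      For reference, the two quantities the authors distinguish can be written as follows. This uses the standard definitions, with $I_{\mathrm{ctrl}}$ and $I_{\mathrm{NA}}$ as assumed symbols for IPSC amplitude before and during NA; it is not the authors' analysis code.

```latex
\%\,\Delta = 100 \times \frac{I_{\mathrm{NA}} - I_{\mathrm{ctrl}}}{I_{\mathrm{ctrl}}},
\qquad
R = \frac{I_{\mathrm{NA}}}{I_{\mathrm{ctrl}}} = 1 + \frac{\%\,\Delta}{100}
```

      For example, a control IPSC of 100 pA that grows to 150 pA in NA gives a 50% change and a ratio of 1.5. A plotted value of 100 on a % change axis therefore corresponds to a doubling (R = 2), whereas the same value expressed as a ratio in percent would mean no change.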

    2. eLife assessment

      This paper reports the fundamental discovery of adrenergic modulation of spontaneous firing through the inhibition of the Na+ leak channel NALCN in cartwheel cells in the dorsal cochlear nucleus. This study provides unequivocal evidence that the activation of alpha-2 adrenergic or GABA-B receptors inhibits NALCN currents to reduce neuronal excitability. The evidence supporting the conclusions is exceptional, the electrophysiological data are of high quality, and the experimental design is rigorous.

    3. Reviewer #1 (Public Review):

      This study uses electrophysiological techniques in vitro to address the role of the Na+ leak channel NALCN in various physiological functions in cartwheel interneurons of the dorsal cochlear nucleus. Comparing wild type and glycinergic neuron-specific knockout mice for NALCN, the authors show that these channels 1) are required for spontaneous firing and 2) are modulated by noradrenaline (NA, via alpha2 receptors) and GABA (through GABAB receptors), and 3) they show how the modulation by NA enhances IPSCs in these neurons.

      This work builds on previous results from the Trussell lab on the physiology of cartwheel cells, and from other labs on the role of NALCN channels, which have recently been characterized in a growing number of brain areas; for this reason, this study could be of interest to researchers who work in other preparations as well. The general conclusions are strongly supported by results that are clearly and elegantly presented.

      In this revised submission, the authors addressed all my questions. This is very interesting work that could be of interest to researchers working in other brain areas as well.

    4. Reviewer #2 (Public Review):

      This is a very interesting paper with several important findings related to the working mechanism of the cartwheel cells (CWC) in the dorsal cochlear nucleus (DCN). These cells generate spontaneous firing that is inhibited by the activation of α2-adrenergic receptors, which also enhances the synaptic strength in the cells, but the mechanisms underlying the spontaneous firing and the dual regulation by α2-adrenergic receptor activation have remained elusive. By recording from these cells with the NALCN sodium-leak channel conditionally knocked out, the authors discovered that both the spontaneous firing and the regulation by noradrenaline (NA) require NALCN. Mechanistically, the authors found that activation of the adrenergic receptor or GABAB receptor inhibits NALCN. Interestingly, these receptor activations also suppress the low [Ca2+] "activation" of NALCN currents, suggesting crosstalk between the pathways. The finding of such a dominant contribution of the NALCN conductance to the regulation of firing by NA is somewhat surprising, considering that NA is known to regulate K+ conductances in many other neurons.

      The studies reveal the molecular mechanisms underlying well-known modulation of neuronal processes in the auditory pathway. The results will be important to the understanding of auditory information processing in particular and, more generally, to the understanding of the regulation of inhibitory neurons and ion channels. The results are convincing and are clearly presented.

      In this revision, the authors have satisfactorily addressed all my previous comments.

    5. Reviewer #3 (Public Review):

      The study by Ngodup and colleagues describes the contribution of the NALCN sodium leak conductance to the effects of noradrenaline on cartwheel interneurons of the DCN. The manuscript is very well written and the experiments are well controlled. The scope of the study is of high biological relevance and recapitulates a primary finding of the Khaliq lab (Philippart et al., eLife, 2018) in ventral midbrain dopamine neurons, that Gi/o-coupled receptors inhibit NALCN current to reduce neuronal excitability. Together these studies provide unequivocal evidence for NALCN as a downstream target of these receptors.

      In re-review of this study, the authors have addressed the concerns sufficiently. This is a very nice study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their constructive comments on the manuscript. We have extensively revised the manuscript to address these concerns and comments. Our specific responses follow.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript "Long‐read single‐cell sequencing reveals expressions of hypermutation clusters of isoforms in human liver cancer cells", S. Liu et al. present a protocol combining the 10x Genomics single-cell assay with Element LoopSeq synthetic long-read sequencing to study single nucleotide variants (SNVs) and gene fusions in hepatocellular carcinoma (HCC) at the single-cell level. The authors were the first to combine LoopSeq synthetic long‐read sequencing technology and 10x Genomics barcoding for single-cell sequencing. For each cell and each somatic mutation, they obtain fractions of mutated transcripts per gene and per transcript isoform. The manuscript states that these values (as well as gene fusion information) provide better features for tumor-normal classification than gene expression levels. The authors identified many SNVs in genes of the human major histocompatibility complex (HLA), with up to 25 SNVs in the same molecule of the HLA‐DQB1 transcript. The analysis shows that most mutations occur in HLA genes and suggests evolutionary pathways that led to these hypermutation clusters. Yet very little is said about novel isoforms and alternative splicing in HCC cells, differences in isoform ratios between cells carrying different mutations, or the diversity of alternative isoforms across cells. While the manuscript by Liu et al. presents a promising combination of technologies, it lacks significant insights and a comprehensive introduction, and it has significant problems with data description and presentation.

      Answer: Thank you for this valuable suggestion. Our long-read single-cell sequencing discovered an average of 442 novel isoform transcripts per benign liver cell and 450 novel isoform transcripts per HCC cell, according to SCANTI v1.2 analysis. These numbers are stated in the revised manuscript. Alternative splicing was detected by differential isoform expression, as demonstrated in Supplemental Figures 6 and 7 and Supplemental Tables 8-11. Examples of differences in isoform ratios between cells carrying different mutations are now shown for DOCK8 and STEAP4 (Figure 5 in the revised manuscript). A new section was added to the Results to discuss the mutation expression of these two genes. The diversity of isoforms of the selected genes is shown in Supplemental Figure 10.

      This study showed how mutations in the same allele evolved in liver cancer. In particular, HLA hypermutations were found to develop from specific sites of the molecules into large clusters of mutations within the same molecules. A new paragraph was added to the Introduction about the role of mutations in human cancer development. We also revised the figures to present the information more clearly. All the HLA genes expressed only one known isoform, as shown in Figure 4 and Supplemental Figure 3, regardless of mutations.

      Major comments:

      1. The introduction section is sparse. It lacks a description of important previous work on clustered mutations in cancers (for example, PMID35140399) and on deriving the process of cancer development through somatic evolution (PMID32025013; from single-cell data, PMID32807900). Moreover, some key concepts, e.g. mutational gene expression and mutational isoform expression, are not defined. The introduction and the abstract contain slang expressions, e.g. "protein mutation", a combination of terms I teach my students not to use.

      Answer: We thank the reviewer for suggesting a more thorough background and clearer definitions of terms. We added a new paragraph to the Introduction describing the role of mutations and hypermutations in human cancers, citing important prior work. We added a new section to the Methods defining "mutation gene expression share" and "mutation isoform expression share". "Protein mutation" has been replaced by "genetic mutation".

      1. In the results section, to select the mutations of interest, the authors apply UMAP dimensionality reduction to the mutation isoforms expression and cluster samples in UMAP space, then select the mutations that are present only in one cluster, then apply UMAP to the selected mutations only and cluster the samples again. The motivation for such a procedure seems unclear, could it be replaced with a more straightforward feature selection?

      Answer: Thank you for raising this important question. The goal of the analysis is an unbiased classification of the cell populations in the samples. We found that after removing mutated isoform expressions that were at similar levels in all cells, UMAP clustering clearly segregated three cell populations. When the unique mutated isoform expressions from each group were applied, UMAP generated eight highly distinct groups of cells, each with a distinct mutation isoform expression pattern. Incorporating prior knowledge into the analysis could introduce unwanted bias. Specifically, the first UMAP was performed in an unbiased way to cluster cells, while the second step is a supervised approach that selects the mutations unique to each cluster to identify the classifiers. The second UMAP matches the benign/HCC labeling well.
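      The intermediate selection step described above (keep only mutated isoforms found in a single first-round cluster) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a binary cells × mutated-isoform detection matrix and cluster labels from the first UMAP-based clustering, the function name `cluster_unique_features` is hypothetical, and the detection threshold, the UMAP steps, and the use of expression levels are all left out.

```python
import numpy as np


def cluster_unique_features(detected, labels):
    """Return column indices of features detected in exactly one cluster.

    detected: (n_cells, n_features) array, truthy where a cell expresses the
              mutated isoform (hypothetical threshold applied upstream).
    labels:   (n_cells,) cluster labels from a first-round clustering.
    """
    detected = np.asarray(detected, dtype=bool)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # present[k, j] is True if any cell in cluster k expresses feature j.
    present = np.array([detected[labels == c].any(axis=0) for c in clusters])
    # Keep features that appear in one cluster only.
    return np.flatnonzero(present.sum(axis=0) == 1)


if __name__ == "__main__":
    cells = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
    print(cluster_unique_features(cells, ["A", "A", "B", "B"]))
```

      Here feature 0 occurs only in cluster A, feature 1 only in cluster B, and feature 2 in both, so only columns 0 and 1 are kept. The reviewer's suggestion of a more direct feature selection would replace exactly this step.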

      1. As I understand it, the first "mutated isoform"-based UMAP clustering was built from the expression levels of 205 "mutational isoforms". What was the purpose and outcome of the second "mutated isoform"-based UMAP clustering (Figure 2E)? In the manuscript the authors simply describe the clusters and do not draw any conclusions from them or use the results of the clustering anywhere further.

      Answer: Thanks for pointing this out. Figure 2E was generated from unique mutation isoform expressions in groups A, B, and C from Figure 2D. The purpose of Figure 2E is to investigate whether these unique mutation isoforms can further classify the cell populations free of prior biological knowledge. We added a sentence in the revision to clarify the purpose of the clustering. The conclusion from this analysis, including Figure 2F and Figure 3 (which is an extension of Figure 2E), is that HLA mutation isoform expressions dominated the classifications of cell populations.

      1. The authors just cluster the data three times based on expression levels of different sets of "mutational isoforms" and describe the clusters. What do we need to gather from these clustering attempts besides the set of 113 mutations used for further analysis? What was the point of the reclusterings? Did the authors observe improvement of the classification at each step?

      Answer: Thank you for this important question. The improvement gained by re-clustering is the clear segregation of 8 different groups of cells without any manual classification based on prior knowledge. The distances among groups were much larger than in the first clustering (Figure 2B). Re-clustering achieved detailed subclassifications of cell populations that could not be segregated in the first clustering.

      1. The alignment of short reads generated from hypermutated transcriptomes is non-trivial. The proposed approach could address the issue without the need for whole genome sequencing and offer insights about the cancer development through somatic evolution. Why didn't the authors use modern phylogenetic approaches in the "Evolution of mutations in HLA molecules" section or at least utilize the already performed clustering to infer cell lineages?

      Answer: We appreciate this great question. For tracing the evolution of mutations within single molecules, clustering based on a single gene may not produce a desirable and robust result. The simple evolution snowball chart in Figure 4B may be easier to understand.

      1. I am not sure I understood the definition of "mutated gene expression levels" and "mutated isoform expression levels" in the "Mutational gene expression and fusion transcript enhanced transcriptome clustering of benign hepatocytes and HCC" section. The authors mention that gene lists included all the isoforms within the same range of standard deviation. If I understand it correctly, they are equal if there is only one expressed transcript isoform. In that case, this overlap is not surprising at all.

      Answer: We thank the reviewer for the great question. Mutation gene expression level, mutation isoform expression level, and fusion gene expression level are now defined in the "Methods" section. For all HLA mutation transcripts, a single dominant isoform had multiple transcripts with or without mutations.

      1. "To investigate the roles of gene expression alterations that were not accompanied with isoform expression changes, UMAP analyses were performed based on the non‐overlapped genes." Venn diagrams (Sup Figure 8) show that there are many fewer "non-overlapped genes" than "genes that showed both gene and isoform level changes" for each SD threshold (for example, for SD>=0.8, 59 vs 275). Could that be the reason why clustering based on the former group is worse, i.e., the cancer and normal cells are separated less clearly?

      Answer: The number of genes (attributes) could be a contributing factor in the segregation of cell populations. However, it is not the underlying reason for the worse performance of the gene-only classifier, because a much smaller set of overlapping isoforms/genes (22, SD>=1) outperformed a larger set of genes (59, SD>=0.8). This suggests that the 59-gene expression classifier is less efficient at segregating the cell populations. To address this concern, we took SD>=0.8 as an example: when we subsampled the 275 overlapping genes/isoforms to 59 (matching the number of non-overlapping genes), we still obtained better separation than with the 59 DEGs alone. We repeated this subsampling three times with similar results. The new data were added to Supplemental Figure 8.
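The size-matched subsampling control can be sketched as below, assuming feature names are plain strings and each draw is made without replacement with a fixed seed. `subsample_feature_sets` and the placeholder names are hypothetical; rerunning UMAP on each subset and scoring the Benign/HCC separation is not shown.

```python
import random


def subsample_feature_sets(features, k, n_repeats=3, seed=0):
    """Draw n_repeats random subsets of size k (without replacement) from features."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [rng.sample(features, k) for _ in range(n_repeats)]


# Placeholder names standing in for the 275 overlapping genes/isoforms.
overlapped = [f"feature_{i}" for i in range(275)]
# Three draws of 59 features each, matching the 59 non-overlapping genes.
subsets = subsample_feature_sets(overlapped, 59)
```

Each subset would then replace the full overlapping set as UMAP input, so that any gain in separation cannot be attributed to simply having more variables.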

      Reviewer #2 (Public Review):

      In the present study, Liu et al present an analysis of benign and HCC liver samples which were subjected to a new technology (LOOP-Seq) and paired WES. By integrating these data, the authors find isoforms, fusions and mutations which uniquely cluster within HCC samples, such as in the HLA locus, which serve as candidate leads for further investigation. The main appeal of the study is in the potential of LOOPSeq as a method to present isoform-resolved data without actually performing long-read sequencing. While this presents an exciting new method, the current study lacks systematic comparisons with other technologies/data to test the robustness, reproducibility and utility of LOOPSeq. Further, this study could be further improved by giving more physiologic context and examples from the analyses, thus providing a new resource to the HCC community. A few suggestions based on these are below:

      Answer: We thank the reviewer for raising these important questions and for the helpful suggestions. The LOOPseq technology was compared with Oxford Nanopore and PacBio long-read sequencing in our previous study, which we cite in the Introduction. The HLA mutation clusters in single molecules are a finding of major physiological significance, since these mutations may help liver cancer cells evade immune surveillance. We discuss the potential impact of these mutations on cancer development extensively in the Discussion. In addition, we added a new section on DOCK8 and STEAP4 mutation expression in the Results (page 11, new Figure 5), which is highly relevant to the pathogenesis of HCC.

      1. A primary consideration is that this seems to be the first implementation of LOOP-Seq, where the technology, while intriguing, has not been evaluated systematically. It seems like a standard 10x workflow is performed, where exons are selectively pulled down and amplified. Subsequent ultra-deep sequencing is assumed to give isoform-resolution of the sc-seq data. To demonstrate the utility of the approach it would benefit the study to compare the isoform-resolved results with studies where long-read sequencing was actually performed (ex: https://journals.lww.com/hep/Fulltext/2019/09000/Long_Read_RNA_Sequencing_Identifies_Alternative.19.aspx, https://www.jhep-reports.eu/article/S2589-5559(22)00021-0/fulltext, https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010342). Presumably, a fair amount of overlap should occur to justify the usage.

      Answer: In the revision (Results, page 12), we discuss the utility of the methodology in comparison with the previous studies by these three groups.

      1. Related to this point, the sc-seq cell types and benign vs HCC genes should be compared with the wealth of data available for HCC sc-seq (https://www.nature.com/articles/s41467-022-32283-3, https://www.nature.com/articles/s41598-021-84693-w). These seem to be important to benchmark the technology in order to demonstrate that the probe-based selection and subsequent amplification does not bias cell type definition and clustering. In particular, https://www.nature.com/articles/s41586-021-03974-6 seems quite relevant to compare mutational landscapes from the data.

      Answer: This is a great point. The consistency of probe-based analysis was demonstrated in our previous analyses and in the analyses mentioned in the comments. We further discuss it in the Results section (page 12).

      1. From the initial UMAP clustering, it will be important to know what the identities are of the cells themselves. Presumably, there is quite a bit of immune cells and hepatocytes, but without giving identities, downstream mechanistic interpretation is difficult.

      Answer: When mutation analyses were combined with cell marker analysis (i.e., cells positive for immune markers but negative for HLA mutations), we found only one bona fide immune cell in the HCC sample. Thus, immune cells may not be significant in the current analysis.

      1. In general, there are a fair amount of broad analyses, such as comparisons of hierarchical clustering of cell types, but very little physiologic interpretations of what these results mean. For example, among the cell clusters from Fig 6, knowing the pathways and cell annotations would help to contextualize these results. Without more biologically-meaningful aspects to highlight, most of the current appeal for the manuscript is dependent on the robustness of LOOP-seq and its implementation.

      Answer: To address this comment, a new pathway analysis was performed on the cluster results of Figure 6. A new supplemental table was generated. The results are now discussed on page 13.

      1. Many of the specific analyses are difficult and the methods are brief. Especially given that this technology is new and the dataset potentially useful, I would strongly recommend the authors set up a git repository, galaxy notebook or similar to maximize utility and reproducibility.

      Answer: The script file has been uploaded to Git to facilitate the reproducibility of the analysis. We also added a description of the pipeline scripts to the Methods (pages 19-20).

      1. The authors claim that clustering between benign and HCC samples was improved by including isoform & gene (Suppl fig 8). This seems like an important conclusion if true, especially to justify the use of a long-read implementation. Given that the combination of isoform + gene presents ~double the number of variables on which to cluster, it would be important to show that the improved separation on UMAP distance is actually due to the isoforms themselves and not just to sampling more variables from either gene or isoform.

      Answer: The number of genes (attributes) could be a contributing factor in the segregation of cell populations. However, it is not the underlying reason for the worse performance of the gene-only classifier, because a much smaller set of overlapping isoforms/genes (22, SD>=1) outperformed a larger set of genes (58, SD>=0.8). This suggests that the 58-gene expression classifier is less efficient at segregating the cell populations. To address this comment, we performed random subsampling to reduce the number of overlapping isoforms/genes over repeated iterations and obtained similar results. A new supplemental figure was generated to reflect the new analyses.

      1. Consider a SQANTI implementation to identify fusions relevant for the HCC/benign comparison. How do the fusions compare with those already identified for HCC? These analyses can be quite messy when performed on WES alone, so it seems that having such deep RNA-seq would improve the capacity to see which fused genes are strongly expressed/suppressed. This is not evident from the current analysis. There are quite a few WES datasets that could be compared: https://www.nature.com/articles/ng.3252, https://www.nature.com/articles/s41467-018-03276-y

      Answer: Exome sequencing is not an ideal tool to identify fusion genes. Very few fusion genes have been discovered based on RNA sequencing so far. The fusion genes discovered in the study appeared mostly novel. No exome sequencing was involved in the identification of fusion genes.

      1. Figure 4 is fairly unclear. The matrix graphs showing gene position mutations are tough to interpret and make out. Usually, gene track views with bars or lollipop graphs can make these results more readily interpretable. Also, how Figure 4 B infers causal directions from mutations is unclear.

      Answer: We thank the reviewer for pointing this out. We have revised the diagram in Figure 4A to reflect the proper distances between the mutations in HLA-DQB1 NM_002123. Because these positions are in the same allele (protein), a gene track view or lollipop graph may not show them properly. The mutation clusters started from an isolated mutation, and a mutation did not revert to the wild-type sequence once it occurred. Based on these two principles, we showed several mutation accumulation pathways leading to hypermutation clusters.
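The two principles above (clusters start from one isolated mutation; mutations do not revert) imply that a molecule carrying a set of mutations can only descend from one carrying a subset of them. A minimal sketch of linking observed mutation sets into candidate one-step accumulation edges follows. The positions are hypothetical, and the rule of adding exactly one mutation per step is an illustrative assumption, not the authors' stated procedure.

```python
def accumulation_edges(observed):
    """Link each observed mutation set to observed supersets with exactly one extra mutation.

    observed: iterable of collections of mutation positions, one per molecule.
    Because mutations never revert, a set can only be reached from one of its
    subsets, so each edge (A, B) with A a proper subset of B is a candidate step.
    """
    # Deduplicate and order by size, then by sorted positions, for stable output.
    sets = sorted({frozenset(s) for s in observed}, key=lambda s: (len(s), sorted(s)))
    edges = []
    for a in sets:
        for b in sets:
            if len(b) == len(a) + 1 and a < b:  # b adds exactly one mutation to a
                edges.append((tuple(sorted(a)), tuple(sorted(b))))
    return edges


# Hypothetical positions: two routes from mutation 101 to the triple mutant.
print(accumulation_edges([{101}, {101, 205}, {101, 205, 330}, {101, 330}]))
```

Following edges from single-mutation sets to the largest observed sets enumerates the alternative pathways toward a hypermutation cluster.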

      Reviewer #3 (Public Review):

      The Liu et al. manuscript focuses on the interesting topic of evaluating, on an almost genome-wide scale, the number of transcriptional isoforms and fusion genes present in single cells across the annotated protein-coding genome. They also seek to determine the occurrence of single nucleotide variations/mutations (SNVs) in the same isoform molecule emanating from the same gene expressed in normal and hepatocellular carcinoma (HCC) cells. This study was accomplished using modified LoopSeq long‐read technology (developed by several of the authors) and single-cell isolation (10X) technologies. While this effort addresses a timely and important biological question, the reader encounters several issues in the report that are problematic:

      1. Much of the analysis of the evolution of mutations and of the biological effects of the fusion genes is conjecture and is not supported by empirical data. The conclusions leave the reader with a sense that the LoopSeq results have substantive biological implications; however, these are extended interpretations of the data. For example: "The fusion protein likely functions as a decoy interference protein that negatively impacts the microtubule organization activity of EML4" (pg 9), and other statements presented in a similar fashion.

      Answer: We thank the reviewer for the helpful comment. The mutation results were experimentally validated by exome sequencing on the same samples. Furthermore, these mutations were filtered by requiring their presence in three different transcriptomes. The biological significance of these mutations will probably be the subject of the next phase of investigation. Because such a large number of HLA mutations could not have arisen at once, an analysis of their accumulation pathways was warranted, given the extensive evidence for such a process. The impact of mutations on HLA molecules appeared obvious and should be discussed. For the ACTR2-EML4 fusion, we revised the text to "The loss of microtubule binding domain may negatively impact the microtubule organization activity of EML4 domain of the fusion protein." We only discussed the obvious impact of losing a large protein domain.

      1. LoopSeq has the advantage of using short-read sequencing to characterize the exome capture results and thus benefits from a lower error rate than standard long-read sequencing techniques. However, there is no evidence from standard long-read sequencing that the isoforms observed with LoopSeq are also obtained with parallel long-read technologies. It is not clear how much discordance there is between the LoopSeq results and those of either PacBio or ONT long-read technologies.

      Answer: The comparative analyses among LOOPSeq, Oxford nanopore, and PacBio sequencing were performed in our previous study. We have cited the study in our introduction.

      1. There is no proteome evidence (empirically derived or present in proteome databases) from the HCC and normal samples that confirms the presence or importance of the identified novel isoforms, nor is there support indicating that changes in HLA gene levels translate to effects observed at the protein level. Since the stability and transport of isoforms from the same gene are often regulated at the post-transcriptional level, the biological importance of the isoform variations is unclear.

      Answer: Because our data are transcriptome sequencing data, we can only analyze isoform variation and cannot directly link it to protein-level variation, given post-transcriptional regulation. We discuss this in the revised manuscript (page 14).

      1. It is unclear why certain thresholds were chosen for standard deviation (SD) <0.4 (page 5) and SD >1.0 (page 11).

      Answer: The threshold is flexible and arbitrary. We tested different thresholds, and the same conclusion holds. We chose the thresholds that gave better separation and a reasonable number of genes/isoforms for downstream analysis (Supplemental Figures 6-7 show different thresholds; see also Supplemental Tables 4-12).
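The threshold-based feature selection can be illustrated as follows. This sketch assumes the sample standard deviation across cells and expression values that are already normalized; neither detail is specified here, and `select_by_sd` is a hypothetical name.

```python
import statistics


def select_by_sd(expr, threshold):
    """Keep features whose expression SD across cells is at least `threshold`.

    expr: dict mapping feature name -> list of per-cell expression values
    (at least two cells, as required by statistics.stdev).
    """
    return [f for f, values in expr.items() if statistics.stdev(values) >= threshold]


# "flat" has SD 0; "var" has a sample SD of about 1.15.
expr = {"flat": [1, 1, 1, 1], "var": [0, 2, 0, 2]}
print(select_by_sd(expr, 1.0))  # ['var']
```

Sweeping `threshold` over several values and rerunning the clustering is one way to check that a conclusion does not hinge on a single cutoff.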

      1. HLA is known to accumulate considerable somatic variation. Of the many non-immunological genes determined to have multiple isoforms what are the isoform specific mutation rates in the same isoform molecule? Are the HLA genes unique in the number of mutations occurring in the same isoform?

      Answer: We thank the reviewer for this important suggestion. We now show mutation expression patterns in isoforms of DOCK8 and STEAP4 in Figure 5, and a new section discusses the mutation expression of these two genes. As shown in Supplemental Figure 10, HLA-DQB1, HLA-DRB1, HLA-B, and HLA-C each have only one known isoform detected.

      Editorial comments:

      The present study pairs single-cell seq with LoopSeq synthetic long-read sequencing on samples of HCC and benign liver to identify mutations and fusion transcripts specific to cancer cells. The authors present a potentially important resource; however the overall support remains incomplete.

      While the approach of evaluating isoform-specific changes in cancer at the cellular level seeks to address a timely and important topic, there is currently incomplete evidence in support of the major claims in the manuscript. In particular, major recommendations to provide stronger support for the combination of technologies and interpretation regarding cancer-associated genomic changes include: 1) systematic evaluation of UMAP-based clustering methods, to what subsets of data they are applied and subsequent interpretations, 2) direct comparisons of results with additional methods to quantify long-read sequencing data and those evaluating mutational consequences of HCC progression and 3) detailed expansion of the description of methods and rationale for selecting specific parameters and cell types for further analyses. Including these changes would significantly strengthen the support for utility of combining 10x single-cell with Loop-seq and provide compelling evidence for usage of this resource in dissecting HCC-associated molecular changes.

      Answer: We appreciate the frank and constructive comments. The goal of UMAP is to obtain biological knowledge through unbiased data selection. Systematically, we select classifiers without any prior knowledge (blind to the samples). In our case, classifiers with high standard deviation across all the cells were chosen. We stressed this in the Results section. The comparison among LOOPSeq, PacBio, and Oxford Nanopore was made in our previous study. We cited that analysis in this paper. Analysis details and pipelines were added in the revised manuscript to improve reproducibility. The mutation expression analysis was quite clear-cut. The clustering classified the HCC and benign liver cells by itself and identified a few cancer cells in the benign liver sample. All of this was accomplished without applying any prior knowledge.

      Reviewer #1 (Recommendations For The Authors):

      Overall, there are numerous problems with data presentation and insufficient description, which the authors could fix.

      1. Figure 4. A. It would be more clear if the figure showed the distribution of mutations in the molecule. Otherwise, it's hard to see if we see clusters of mutations or just 25 mutations spread uniformly across the transcript. B. It's unclear what the reader needs to take away from these columns of numbers.

      Answer: The mutation positions are now presented in proportion to their locations in the molecule. Column B shows the distribution of the mutated molecules from the left panel across each cell cluster (from Figure 3A) and their sample origin (HCC or benign liver). We have clarified this further in the legend of Figure 4A.

      1. As a reader, I did not understand how "mutated gene expression levels" and "mutated isoform expression levels" were calculated in terms of sequenced long reads.

      Answer: We defined the term and calculations in the methods section of the revised manuscript.

      1. Page 6 "genes involving antigen presentation"

      Answer: The full sentence of the subtitle is "Mutations of genes involving antigen presentation dominated the mutation expression landscape."

      1. Page 6 "These unique mutational isoforms" - how are these isoforms unique?

      Answer: We have removed most uses of the adjective "unique" when describing the non-redundant mutations.

      1. Page 6. Unclear "All but one clusters contained cells co‐migrated with cells of their sources."

      "Among 113 mutation isoforms, the major histocompatibility complex (HLA) was the most prominent with 68 iterations (60.2%) (Supplemental Table 3, Figure 3B)" There is nothing about HLA in Figure 3B.

      Answer: We revised the sentence as "Cells in all but one clusters co-migrated with cells of their sources". The mutation isoform expressions were listed in supplemental Table 3. They are too small and become unreadable when put in the figure.

      1. Page 10 "genes or isoforms that across all samples had with expression standard deviations less than" - probably "with" should not be there.

      Answer: We have corrected the error and thank the reviewer for the comment.

      1. Page 11 "UMAP analysis was performed using genes with standard deviations ≥ 1.0 (182 wild‐type genes) and standard deviations >0.4 (282 mutated genes)". What do "wild-type" and "mutated" mean here?

      Answer: We edited the sentence to read "UMAP analysis was performed using gene expressions with standard deviations ≥ 1.0 (182 non-mutated genes) and gene mutation expression with standard deviations > 0.4 (282 mutated genes)."

      1. I could not find the description of Supplementary Tables.

      Answer: The supplemental table legends are added in the revised manuscript.

      1. In the Discussion section, the authors mention that mutations were mainly expressed in a specific isoform of a gene for a given cell. I suggest to emphasize this point in the Results section and illustrate it with a comparison of abundance of mutated and non-mutated isoforms.

      Answer: For HLA molecules, expression appeared to be restricted to one known isoform, regardless of mutation status. This sentence has been removed in the revision, and a new section on DOCK8 and STEAP4 mutation expression has been added to the Results.

      1. It is also mentioned that mutations may have an impact on the RNA splicing process. The authors should compare the observed isoform ratio to a prediction of the effect of variants on splicing by SpliceAI or similar tools.

      Answer: This sentence was removed from the discussion.

      1. Figure 3c: triangles corresponding to HLA-positive cells are hard to distinguish

      Answer: We have enlarged the triangles and circles in Figure 3c in the revision.

      Reviewer #2 (Recommendations For The Authors):

      Many of my comments could be addressed by spending time to provide the code/data and a walkthrough of analyses so that other users would be able to answer these questions on their own.

      Answer: We have included a script section in the revision to ensure the reproducibility of the analysis. The raw data have been uploaded to GEO (see Methods).

    2. Reviewer #2 (Public Review):

      In the present study, Liu et al present an analysis of benign and HCC liver samples which were subjected to a new technology (LOOP-Seq) and paired WES. By integrating these data, the authors find isoforms, fusions and mutations which uniquely cluster within HCC samples, such as in the HLA locus, which serve as candidate leads for further investigation. The main appeal of the study is in the potential of LOOP-Seq as a method to present isoform-resolved data without actually performing long-read sequencing.

      Comments on revised version:

      I made several comments on the previous version which have been adequately addressed.

    3. eLife assessment

      The authors pair single-cell sequencing technology with the LoopSeq synthetic long-read method to examine samples of hepatocellular carcinoma and benign liver, with the goal of identifying mutations and fusion transcripts specific to cancer cells. The authors present a valuable resource and the overall support for the major claims is solid.

    4. Reviewer #1 (Public Review):

      In the manuscript "Long‐read single‐cell sequencing reveals expressions of hypermutation clusters of isoforms in human liver cancer cells", S. Liu et al present a protocol combining 10x Genomics single-cell assay with Element LoopSeq synthetic long-read sequencing to study single nucleotide variants (SNVs) and gene fusions in Hepatocellular carcinoma (HCC) at single‐cell level. The authors were the first to combine LoopSeq synthetic long‐read sequencing technology and 10x Genomics barcoding for single cell sequencing. For each cell and each somatic mutation, they obtain fractions of mutated transcripts per gene and per each transcript isoform. The manuscript states that these values (as well as gene fusion information) provide better features for tumor-normal classification than gene expression levels. The authors identified many SNVs in genes of the human major histocompatibility complex (HLA) with up to 25 SNVs in the same molecule of HLA‐DQB1 transcript. The analysis shows that most mutations occur in HLA genes and suggests evolution pathways that led to these hypermutation clusters.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1. The finding that TF binding produces microdomains at medium and long linker DNA but not short linker DNA is very interesting. Although the differences can be observed in the figure, a quantitative comparison is still lacking. The exact definition of a microdomain observed in the simulations is not clear, nor is the number of microdomains that can be identified under different conditions. A quantitative comparison of different conditions could also be provided.

      We thank the reviewer for this suggestion. Our intent was to show qualitatively how TF binding locations that we design can direct fiber folding and create microdomains, which we define in the paper as high frequency contact regions in the contact maps, similar to the TADs observed in HiC maps. Together with the fiber configurations, contact maps allow us to identify formation of such microdomains, and to observe how these microdomains change depending on the conditions we build into the model, such as TF binding region or linker DNA length.

      To address your point, we have added a clustering analysis of the contact matrices at nucleosome resolution and assigned each contact along the genome position (nucleosome index) to a cluster. In Supporting Figure S6, we show how DBSCAN clustering provides a clustering distribution that quantitatively describes the microdomains observed in the matrices and estimates the number of microdomains. For example, in the 44 and 62 bp systems, the contacts along the genomic distance separate into 5, 2, and 1 nucleosome groups for topologies 1 to 3, and into 2 and 1 group for topology 4, respectively. In the 26 bp and Life-Like systems, where microdomains are more diffuse due to fiber rigidity or polymorphism, the clustering results are not as TF-topology-dependent as in the 44 and 62 bp systems. We also decomposed the contact matrices into one-dimensional plots that depict the magnitude of i, i ± k internucleosome interactions. Internucleosome patterns change with the TF binding topology, and the 26 bp and Life-Like systems show the least change.
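The one-dimensional decomposition into i, i ± k interactions can be sketched by averaging each diagonal of a contact matrix. This assumes a square, symmetric contact-frequency matrix at nucleosome resolution (so i + k and i - k give the same values); the DBSCAN step is not reproduced here (scikit-learn's `sklearn.cluster.DBSCAN` is one available implementation), and `diagonal_profile` is a hypothetical name.

```python
def diagonal_profile(contacts, max_k):
    """Mean contact frequency between nucleosomes i and i + k, for k = 1..max_k.

    contacts: square, symmetric matrix (list of lists) at nucleosome resolution.
    """
    n = len(contacts)
    profile = {}
    for k in range(1, max_k + 1):
        # The k-th upper diagonal holds every pair separated by k nucleosomes.
        values = [contacts[i][i + k] for i in range(n - k)]
        profile[k] = sum(values) / len(values) if values else 0.0
    return profile


toy = [[0, 4, 1],
       [4, 0, 2],
       [1, 2, 0]]
print(diagonal_profile(toy, 2))  # {1: 3.0, 2: 1.0}
```

Comparing these profiles between TF binding topologies gives a single curve per condition, which is easier to compare than full matrices.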

      1. When increasing TF concentration, from 0 to 100%, it seems that both packing ratio and sedimentation coefficients are not sensitive to the TF concentrations after 25%. Is it due to the saturation of TF binding? How many TF binding sites are considered at each concentration?

      Yes. In most cases, at TF concentrations higher than 25%, fiber compaction does not change because TF binding is saturated. Higher TF concentrations, such as 50%, 70%, or 100%, do not further influence the fiber architecture: additional folding and compaction cannot be reached because excluded-volume interactions prevent beads in the model from overlapping.

      We have clarified this in the manuscript.

      As stated in the Methods section, the TF concentration refers to the number of linker DNA beads that can engage in a constraint compared to the total number of linker DNA beads. Thus, at 25% TF, 25% of linker DNA beads are engaged in TF constraints. We have added a comment on this in the Results section.
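Under the definition above, a TF concentration is the fraction of linker DNA beads that can engage in a constraint. A sketch of turning a fraction into a set of engaged beads follows. Random selection is an assumption made only for illustration (in the designed topologies the binding regions are chosen rather than sampled), and the function name is hypothetical.

```python
import random


def assign_tf_beads(linker_bead_ids, tf_fraction, seed=0):
    """Pick which linker DNA beads may engage in TF constraints.

    Only linker beads are passed in, so nucleosomal DNA can never bind a TF.
    tf_fraction: e.g. 0.25 for a 25% TF concentration.
    """
    n_engaged = round(tf_fraction * len(linker_bead_ids))
    rng = random.Random(seed)  # fixed seed for reproducible assignments
    return sorted(rng.sample(linker_bead_ids, n_engaged))


linker_beads = list(range(100))  # hypothetical linker bead indices
engaged = assign_tf_beads(linker_beads, 0.25)
print(len(engaged))  # 25
```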

      1. It is shown that the contact maps that reveal microdomains are ensemble-based maps and single trajectories do not show clear formation of microdomains. Does the formation of microdomains increase with the number of combined trajectories?

      The formation of microdomains occurs in each single trajectory. However, the microdomains formed in each trajectory can be different. That is why ensemble-based maps show clearer trends of microdomains that might not be as visible in single-trajectory maps. If we increase the number of trajectories, the microdomains will be more visible and there will be more microdomains in the contact map, but the formation of microdomains will not increase in each single trajectory.

      1. "As we see from Figure 4A, when the linker DNA is short, such as 26 and 35 bp, TF binding does not increase the packing ratio of the fiber." The results of 35bp cannot be found in Figure 4A. In addition, the color of 44 and 62 bp should be changed since they are very similar in the figure.

      Thank you for catching this. The results corresponding to the 35 bp system are presented in the Supporting Figure 7. We have changed the text to read “As we see from Figure 4A and Figure S7..”.

      We have changed the color of the 62 bp trace to blue in the plots of Figure 4. Consistently, we have also changed the color of the 62 bp fiber in Figure 2 and Figure 5.

      1. For modelling of TF binding at increasing concentrations, it is mentioned that in these three conditions, TFs are allowed to bind to any region. Do you mean TF can also bind to nucleosomal DNA? Nucleosome structure prevents the binding of many TFs.

      In our model, only linker DNA beads can engage in the constraints (bind TF).

      We have changed the text to read “TFs are allowed to bind to any linker DNA region”.

      1. The details of the Mnase-seq dataset and how NFRs are identified should be provided, such as the coverage of the data and what read fragments are selected for NFR mapping.

      MNase data in bedGraph format were downloaded from the Gene Expression Omnibus (GSM2083107) repository and loaded without further processing into the Genome Browser. NFRs were visually inspected and identified as genomic regions without peaks. As detailed in the GEO repository, the sequenced paired-end reads were mapped to the mm9 genome. Only uniquely mapped reads with no more than two mismatches were retained, and reads with insert sizes smaller than 50 or larger than 500 bp were discarded.

      We have clarified this in the manuscript.
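The NFRs above were identified visually. As a hypothetical scripted alternative, not the procedure used here, adjacent bedGraph intervals whose signal stays at or below a cutoff can be merged into candidate low-signal regions. The cutoff, the minimum length, and the choice to treat gaps between listed intervals as breaking a run are all assumptions.

```python
def low_signal_regions(bedgraph_lines, max_signal, min_length):
    """Merge adjacent bedGraph intervals with value <= max_signal into runs of >= min_length bp.

    bedgraph_lines: iterable of "chrom start end value" lines (0-based, half-open).
    Returns a list of (chrom, start, end) tuples.
    """
    regions = []
    run = None  # [chrom, start, end] of the low-signal run being extended

    def flush():
        # Keep the current run only if it is long enough.
        if run is not None and run[2] - run[1] >= min_length:
            regions.append(tuple(run))

    for line in bedgraph_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and header lines
        chrom, start, end, value = line.split()[:4]
        start, end, value = int(start), int(end), float(value)
        if value <= max_signal and run is not None and run[0] == chrom and run[2] == start:
            run[2] = end  # adjacent low-signal interval: extend the run
        elif value <= max_signal:
            flush()
            run = [chrom, start, end]  # start a new run
        else:
            flush()
            run = None  # a signal peak breaks the run
    flush()
    return regions
```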

      1. The calculations of volume and area of the Eed promoter region should be further elucidated.

      Thank you. We now elaborate upon these calculations. In particular, the Eed promoter region is defined between cores 123 and 129. The x,y or x,y,z coordinates of those cores are used to create the bounding area or volume by defining the shape’s vertices.
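One way to build a bounding area from the core coordinates is to take their convex hull and apply the shoelace formula. The convex hull is an assumption, since the exact shape construction is not specified; for the 3D volume, SciPy's `scipy.spatial.ConvexHull(points).volume` is an existing alternative. The sketch below covers the 2D case only, and the function names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop endpoints duplicated between the chains


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given in order."""
    s = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


# Hypothetical x,y coordinates; the interior point does not change the hull.
xy = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(polygon_area(convex_hull(xy)))  # 4.0
```

In practice the input would be the x,y coordinates of cores 123 to 129 taken from a simulation frame.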

      1. In Figure 3, it is not clear how different topology are identified.

      In Figure 3 the topology, or TF binding regions, is the same for each of the 10 contact maps as these emerge from trajectory replicas of the same system which we named Topology 1. Different microdomains are formed in each individual trajectory as the high-frequency regions appear in different locations on each contact map. However, when these 10 maps are summed, the ensemble contact map clearly shows consensus microdomains in each region where TF binds.

      Reviewer #2:

      To further improve the manuscript, I have the following suggestions/comments.

      1. While most of the conclusions in this paper follow from the evidence provided by the simulations, the result in section 3.3, titled "Gene locus repression is mediated by TF binding," may not follow from the results. In my opinion, repression is a more complex process, and many more factors (such as nucleosome positioning, nucleosome sliding, histone methylation, and other proteins such as PRC or HP1) may be involved in repression. While compaction is often associated with repressed chromatin (heterochromatin), recent studies have shown that heterochromatin fibers are highly diverse, and compaction alone may not be the criterion for repression (e.g., see Spracklin et al. Nat. Struct. Mol. Biol. 30, 38-51 (2023)). In this light, I would recommend slightly modifying the title to say "TF binding-mediated compaction can help in gene locus repression" or something similar.

      Yes! We completely agree that gene repression is a very complex phenomenon involving many factors, which our modeling approaches starting from the simplest strategy. Thus, we have changed the subtitle to read “TF binding-mediated compaction as possible mechanism of gene locus repression”.

      1. Authors could also present the contact probability versus genomic distance. This may provide some generic features at nucleosome resolution, given the variability in linker length and LH density.

      We thank the reviewer for this suggestion. We have now calculated the contact probability for the EED gene with and without TF binding (Supporting Figure 8). We see that the contact probability corresponding to short-range interactions (i ± 2, 3, 4, 5, and 6) is slightly lower for the EED gene upon TF binding. However, a striking increase in the contact probability upon TF binding is seen in the genomic region between 3 and 5 kb, which corresponds to local loop interactions. Thus, TF binding slightly decreases local interactions but increases chromatin loops. Such changes are not observed for the EED system with LH density 0.8 (Supporting Figure 9), further supporting the idea that an increase in LH density hampers the effect of TF binding for the EED gene architecture. We have now added these results to the manuscript.
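      Because linker lengths vary, nucleosome index does not map uniformly onto base pairs, so a contact-probability curve in kilobases needs each nucleosome's genomic position. The sketch below bins pairs by genomic separation and averages their contact probability per bin; the input layout (`prob`, `positions_bp`) and the 1 kb bin width are assumptions for illustration, not the authors' analysis code.

```python
from collections import defaultdict
from typing import Dict, Sequence


def contact_probability_vs_distance(
    prob: Sequence[Sequence[float]],
    positions_bp: Sequence[int],
    bin_bp: int = 1000,
) -> Dict[int, float]:
    """Mean contact probability per genomic-distance bin.

    prob[i][j]: fraction of frames in which nucleosomes i and j are in contact.
    positions_bp[i]: genomic coordinate of nucleosome i (spacing varies with linker length).
    Returns {bin_start_bp: mean probability over all pairs in that bin}.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    n = len(positions_bp)
    for i in range(n):
        for j in range(i + 1, n):
            separation = abs(positions_bp[j] - positions_bp[i])
            b = (separation // bin_bp) * bin_bp
            sums[b] += prob[i][j]
            counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Toy example: four nucleosomes with uneven spacing.
prob = [
    [0.0, 0.6, 0.3, 0.1],
    [0.6, 0.0, 0.9, 0.2],
    [0.3, 0.9, 0.0, 0.3],
    [0.1, 0.2, 0.3, 0.0],
]
curve = contact_probability_vs_distance(prob, [0, 200, 400, 1400], bin_bp=1000)
print(curve)
```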

      1. Write a short paragraph about the limitations of the model/study. For example, one of the limitations could be that, as of now, it has only the effect of a few proteins, but to predict repression, one may need to incorporate the effect of several proteins.

      We agree with the reviewer that our model is a simple, first-step approach. Nonetheless, even the simplest mathematical model can be enlightening in helping dissect essential factors. Here, our model clearly shows how TF binding location modulates fiber architecture and the interplay between TF binding and other chromatin elements, like linker DNA length, LH density, and histone acetylation. We have now stated in the Discussion section that although limited due to being implicit and not considering other protein partners, our model can provide insights on the regulation of chromatin architecture by protein binding. Future modeling with explicit protein binding or combination of several proteins will further help us understand genome folding regulation.

      1. The radius of gyration of 26 kb chromatin is ~60 nm in this paper. Is there any experimental measurement to compare (approximate order of magnitude)? While I do not know of any measurement for the Eed gene locus, I am aware of the results in the Boettiger et al. paper from Xiaowei Zhuang's lab (Nature 2016). There, they find that the Rg of a 26 kb region is above 100 nm. But that is for a different organism and a different set of genes. Also, see Sangram Kadam et al. Nature Communications 14 (1), 4108, 2023.

      Thank you for this suggestion. To the best of our knowledge, there are no radius of gyration measurements for the EED gene. Regarding the two papers you cite, Boettiger et al. (1) determined by microscopy experiments that Rg ∝ L^c, where L is the genomic length and c is 0.37 ± 0.02 for active chromatin (Figure 1d of that paper). In that case, the Rg for a 26 kb region would be 43 ± 9 nm. Considering that these are Drosophila cells, our value of 62 nm is in good agreement with that estimate. Regarding the Kadam et al. paper (2), they find by coarse-grained modeling an Rg of around 100 nm for different genes. Considering that the radius of gyration depends on cell type and fiber configuration (see, for example, (3) for the dependence of Rg on loop number and persistence length), we believe that our measurements, being in the same ballpark as experimental results and other theoretical modeling studies, are good indicators of our model’s reasonableness.

      We have added this comparison to the manuscript.
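      Because only the exponent c is quoted here, the power law Rg ∝ L^c fixes how Rg scales with genomic length but not its absolute value; converting to nanometres (as in the 43 ± 9 nm estimate) requires the prefactor from the original fit, which is not reproduced here. The sketch below shows only the prefactor-free ratio, with the exponent and its uncertainty taken from the response above; the function name is illustrative.

```python
def rg_scaling_ratio(length_a_bp: float, length_b_bp: float, exponent: float) -> float:
    """Rg(L_b) / Rg(L_a) under Rg = A * L**exponent; the unknown prefactor A cancels."""
    if length_a_bp <= 0 or length_b_bp <= 0:
        raise ValueError("genomic lengths must be positive")
    return (length_b_bp / length_a_bp) ** exponent


# Exponent reported by Boettiger et al. for active chromatin: c = 0.37 +/- 0.02.
for c in (0.35, 0.37, 0.39):
    ratio = rg_scaling_ratio(13_000, 26_000, c)
    print(f"c = {c:.2f}: doubling L scales Rg by a factor of {ratio:.3f}")
```

With c = 0.37, doubling the genomic length increases Rg by roughly 29%, so modest differences in region length shift Rg far less than differences in fiber configuration or cell type would.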

      1. The reason why it is useful to compare some distance measurements (physical dimensions) with experiments is the following: the contact map in Hi-C only gives relative contact probabilities, not absolute ones. To convert a Hi-C map into a physical distance, one requires comparison with some experimentally measured 3D distance. The radius of gyration is an ideal quantity to compare. In my experience, the contact probability is often much smaller than 1, suggesting that the chromatin is more expanded. But this could be due to the effect of many other proteins in vivo, crowding, etc. I do not expect this work to incorporate all those effects. However, it may be useful to comment on it in the manuscript.

      Thank you. We have added to the discussion a comment on our first-generation model of TF binding to chromatin and the neglect of many associated protein and RNA cofactors that certainly influence chromosome folding and domain formation on higher scales. Some distance measures are also added to the Results as mentioned above.

      References

      1. Boettiger,A.N., Bintu,B., Moffitt,J.R., Wang,S., Beliveau,B.J., Fudenberg,G., Imakaev,M., Mirny,L.A., Wu,C. and Zhuang,X. (2016) Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature, 529, 418–422.

      2. Kadam,S., Kumari,K., Manivannan,V., Dutta,S., Mitra,M.K. and Padinhateeri,R. (2023) Predicting scale-dependent chromatin polymer properties from systematic coarse-graining. Nat. Commun., 14, 4108.

      3. Wachsmuth,M., Knoch,T.A. and Rippe,K. (2016) Dynamic properties of independent chromatin domains measured by correlation spectroscopy in living cells. Epigenetics Chromatin, 9, 57.

    2. eLife assessment

      In this important study, chromatin is simulated as a polymer at the scale of genes, and the 3D organization of chromatin is analyzed at nucleosome resolution. There is convincing evidence for the emergence of chromatin microdomains due to the action of transcription factors, based on the simulation incorporating well-known biophysical properties of DNA, of nucleosomes, of linker histones, and of the transcription factor pair Myc:Max, as well as considering how the 3D organization of chromatin results from bending and looping of DNA. The work greatly improves our understanding of how the joint action of transcription factors and chromatin features affects chromatin structure and accessibility, which is of interest to anyone studying gene regulation.

    3. Reviewer #1 (Public Review):

      In this study, the authors performed multiple sets of mesoscale chromatin simulations at nucleosome resolution to study the effects of TF binding on chromatin structure. Through simulations under various conditions, the authors performed a systematic analysis of how linker histones, tail acetylation, and linker DNA length can operate together with TFs to regulate chromatin architecture. Using the gene Eed as an example, the authors found that binding of Myc:Max could repress gene expression by increasing fiber folding and compaction, and that this repression can be reversed by the linker histone. Understanding how transcription factors bind to regulatory DNA elements and modulate chromatin structure and accessibility is an essential question in epigenetics. Through modelling of TF binding to chromatin structures at the nucleosome level, the authors demonstrated that TF binding could create microdomains that are visible in the ensemble-based contact maps and that short DNA linkers prevent the formation of microdomains. It has also been shown that tail acetylation and TF binding have opposite effects on chromatin compaction and that linker histones can compete with TFs for linker DNA, impairing the effect of TF binding. This study improves our knowledge of how TFs collaborate with different epigenetic marks and chromatin features to regulate chromatin structure and accessibility, which will be of broad interest to the community.

    4. Reviewer #2 (Public Review):

      In this paper, Portillo-Ledesma et al. study chromatin organization at the length scale of a gene, simulating the polymer at nucleosome resolution. The authors have presented an extensive simulation study with an excellent model of chromatin. The model has linker DNA and nucleosomes with all relevant interactions (electrostatics, tails, etc.). The authors simulate 10 to 26 kb chromatin with varying linker lengths, linker histones (LH), and acetylated tails. The authors then study the effect of binding of the transcription factor (TF) Myc:Max. The critical physical feature of the TF in the model is that it binds to the linker region and bends the DNA to make loops/intra-chromatin contacts. The authors systematically investigate the interplay between different variables, such as linker DNA length, LH density, and TF concentration, in determining chromatin compaction and 3D organization.

      The manuscript is well-written and is a relevant study with many useful results. The biggest strength of the work is the fact that the authors start with a relevant model that incorporates well-known biophysical properties of DNA, nucleosomes, linker histones, and the transcription factor Myc:Max. One of the novel results is the demonstration of how linker lengths play an important role in chromatin compaction (measured by computing the packing ratio) in the presence of DNA-bending TFs. As the TF concentration increases, chromatin with short linker lengths does not compact much (only a small change in packing ratio). If the linker lengths are long, a higher percentage of TFs leads to an increase in packing ratio (higher compaction). The authors further show that TFs are able to compact a life-like chromatin fiber with linker lengths taken from a realistic distribution. The authors compute inter-nucleosomal contact maps from their simulated configurations and show that the maps have features similar to those observed in Hi-C/Micro-C experiments. The authors study the compaction of the Eed gene locus and show that TF binding leads to the formation of small domains known as microdomains. The authors have predicted many relevant and testable quantities. Many of the results, such as the formation of microdomains, agree with known experiments. Hence, the conclusions made in this study are justified: they follow from the simulation results.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I have only a few very minor suggestions for improvement.

      • the text repeatedly uses the terms "central nervous system" and "enteric nervous system", which are not in standard use in the field. These terms are not defined until the bottom of p. 12 even though they are used earlier. It would be useful for the authors to explicitly describe their definitions of these terms earlier in the paper.

      Fixed.

      • the inclusion of four pre-trained models is a powerful and useful aspect of WormPsyQi. Would it be possible to develop a simple tool that, when given the user's images, could recommend which of the four models would be most appropriate?

      We appreciate the reviewer for bringing this up. To address this, we have now added an additional function in the pipeline to test all pre-trained models on representative input images. Before processing an entire dataset, users can view all segmentation results for images in Fiji to assess which model performed best, judged by the user. The GUI, running guide document, and manuscript have been modified accordingly.

      In addition, we would like to emphasize that the pre-trained models were developed by iterative analyses of many reporters, often with multiple rounds of parameter tuning; the results were validated post hoc to choose the optimal model for each reporter, and we have listed this information in Supplemental Table 1 to inform the choice of the pre-trained model for commonly used reporter types.

      • On p. 11 (and elsewhere), the differences in the performance of WormPsyQi and human experimenters are called "statistically insignificant". This statement is not particularly informative (absence of evidence is not evidence of absence). Can the authors provide a more rigorous analysis here - or provide an estimate of the typical effect size of the machine-vs-human difference?

      To address this, we have included additional analysis in Figure 2 – figure supplement 3. For two reporters, I5 GFP::CLA-1 and M4 GFP::RAB-3, we compare WormPsyQi vs. labelers and inter-labeler puncta quantification. A high squared Pearson correlation coefficient (r²) reflects greater correspondence between two independent scoring methods. We chose these two test cases to demonstrate that the machine-vs-human effect size is reporter-dependent. For I5, where the CLA-1 signal is very discrete and the S/N ratio is high, the discrepancy between WormPsyQi, labeler 1, and labeler 2 is minimal (r²=0.735); moreover, scoring correspondence depends on the labeler (r²=0.642 and 0.942, respectively). In other words, WormPsyQi mimics some labelers better than others, which is to be expected. For M4, where the RAB-3 signal is diffuse and synapse density in the ROI is high, the inter-labeler discrepancy is high (r²=0.083), and the discrepancy between WormPsyQi and labeler 1 or 2 is slightly reduced (r²=0.322 and 0.116, respectively). The problematic regions for the M4 RAB-3 reporter are emphasized in Figure 6 - figure supplement 1A. Overall, the additional analysis suggests that the effect size depends on reporter type and image quality and, importantly, that for difficult strains WormPsyQi may average out inter-labeler scoring variability.
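      The agreement measure used above can be sketched as follows: r² is the square of the Pearson correlation between two scorers' per-worm puncta counts. The counts in the usage line are hypothetical. Because r² is unchanged by a constant offset, the sketch also reports the mean difference between scorers, one simple way to expose systematic over- or undercounting.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one scorer gives constant counts")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson_r(x, y) ** 2


def mean_difference(x: Sequence[float], y: Sequence[float]) -> float:
    """Average of y - x; captures systematic over- or undercounting that r^2 ignores."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


# Hypothetical per-worm puncta counts from an automated scorer and a human labeler.
tool_counts = [12, 15, 9, 20, 17]
labeler_counts = [14, 16, 10, 23, 18]
print(r_squared(tool_counts, labeler_counts), mean_difference(tool_counts, labeler_counts))
```

A labeler who always counts five extra puncta would still give r² = 1 against the tool, which is why an offset measure is reported alongside it.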

      • p. 12: "Again, relying on alternative reporters where possible..." This is an incomplete sentence - are some words missing?

      Edited.

      Reviewer #2 (Recommendations For The Authors):

      1. The authors effectively validated the sexually dimorphic synaptic connectivity by comparing the synapse puncta numbers of PHB>AVA, PHA>AVG, PHB>AVG, and ADL>AVA. However, these differences appear to be quite robust. It would be beneficial for the authors to test whether WormPsyQi can detect more subtle changes at the synapses, such as 10-20% changes in puncta number and fluorescence intensity.

      While the dimorphic strains were used to first validate WormPsyQi against the ground truth of very well-characterized reporters, the reviewer reasonably asks whether our pipeline can pick up more subtle differences. To address this, we have now included an additional figure (Figure 9 – figure supplement 2), in which we performed pairwise comparisons between L4 and adult timepoints for the reporter M3 GFP::RAB-3. As reflected in panels A and C, although the differences in puncta number and mean intensity between L4 and adult are marginal (a 22% increase in puncta number and a 13% increase in mean intensity from L4 to adult), WormPsyQi picks them up as statistically significant.
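      The response does not name the statistical test used. As one common choice for comparing two groups with possibly unequal variances, Welch's t-test is sketched below together with the percent change in group means; the per-animal values are hypothetical. A p-value requires the t distribution CDF, which is not in the Python standard library; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns it.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def percent_change(before: Sequence[float], after: Sequence[float]) -> float:
    """Change in the group mean, as a percentage of the `before` mean."""
    return 100.0 * (mean(after) - mean(before)) / mean(before)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic for mean(b) - mean(a) and the Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of each mean (sample variance, n - 1)
    se2_b = variance(b) / len(b)
    t = (mean(b) - mean(a)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-animal puncta counts at two developmental stages.
l4 = [18, 20, 19, 21, 17, 20]
adult = [23, 24, 22, 25, 23, 24]
print(percent_change(l4, adult), welch_t(l4, adult))
```

A modest percent change can still yield a large t statistic when per-animal variability is low, which is the situation in which a subtle difference becomes statistically detectable.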

      1. On page 10, the authors mentioned that "cell-specific RAB-3 reporters have a more diffuse synaptic signal compared to the punctate signal in CLA-1 reporters for the same neuron, as shown for the neuron pair ASK (Figure 4 -figure supplement 1B, C)". It is important to note that in this case, the reporter gene expressing RAB-3 is part of an extrachromosomal array, whereas the reporter gene expressing CLA-1 is integrated into the chromosome. It's possible that the observed difference in pattern may arise from variations in the transgenic strategies employed.

      To emphasize the difference in puncta features inherent to the reporter type, we have now added WormPsyQi segmentation results for the ASK CLA-1 extrachromosomal reporter (otEx7455) next to the ASK CLA-1 integrant (otIs789) and the ASK RAB-3 reporter (otEx7231) in Figure 4 – figure supplement 1C. Importantly, otEx7455 was integrated to generate otIs789, so they belong to the same transgenic line. The literature shows that RAB-3 and CLA-1 have different localization patterns and corresponding functions at presynaptic specializations, and this is shown qualitatively and quantitatively by the significant difference in puncta area between RAB-3 and both CLA-1 reporters, i.e., both CLA-1 reporters have smaller, discrete puncta compared to RAB-3 (Figure 4 – figure supplement 1C). Quantitatively, in the case of ASK, where the synapse density is sparse enough that even diffuse RAB-3 puncta can be segmented without confounding adjacent puncta, overall puncta numbers for otEx7231 and otIs789 are similar. However, the RAB-3 signal is diffuse, and this poses quantification problems where the synapse density is higher (e.g., AIB, SAA in Figure 4 – figure supplement 1D); WormPsyQi fails to score puncta in these reporters because the signal is not punctate. As for integrated vs. extrachromosomal reporters, the reviewer is right to point out that some differences may stem from reporter type, as our additional analysis of otIs789 and otEx7455 indeed shows fewer puncta in the latter owing to variable expressivity.

      1. The authors mentioned that having a cytoplasmic reporter in the background of the synaptic reporter enhanced performance. It would be more informative to provide comparative results with and without cytoplasmic reporters, particularly for scenarios involving dim signals or densely distributed signals.

      The presence of a cytoplasmic marker is critical in two specific scenarios: 1) images where the S/N ratio is poor, and 2) when the image S/N ratio is good, but the ROI is large, which would make the image processing computationally expensive.

      To demonstrate the first scenario, we have included an additional panel in Figure 4 – figure supplement 1(B) to show how WormPsyQi performs on the PHB>AVA GRASP reporter with and without the cytoplasmic-marker channel. In the former case, the original image was processed as-is, with the synaptic marker in green and the cytoplasmic marker in red; for comparison, only the green channel containing the synaptic marker was used, to simulate a strain without a cytoplasmic marker. As shown in the figure, in the presence of background autofluorescence from the gut (which can easily be confounded with GRASP puncta depending on the worm’s orientation), WormPsyQi quantified GRASP puncta much more robustly with the cytoplasmic label; without the cytoplasmic marker, gut puncta are incorrectly segmented as synapses (highlighted with red arrows), while some dim synaptic puncta are not picked up (highlighted with yellow arrows).

      To demonstrate the second scenario, we now highlight the case of ASK CLA-1 in Figure 2 - figure supplement 4E. Additionally, we have emphasized in the manuscript that in cases where the S/N ratio is good and the image is restricted to a small ROI, WormPsyQi will perform well even in the absence of a cytoplasmic marker. This is equally important to note as having a specific cytoplasmic marker in the background may not always be feasible and, in fact, if the cytoplasmic marker is discontinuous or dim relative to puncta signal, using a suboptimal neurite mask for synapse segmentation would result in undercounting synapses.
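      The role of the neurite mask can be illustrated with a minimal sketch: candidate puncta are kept only if their centroid falls on a mask voxel, so an off-neurite signal such as gut autofluorescence is rejected, while a discontinuous mask drops real puncta. The voxel-set representation, the rounding of centroids, and the one-voxel `dilate` step are assumptions for illustration and are not taken from WormPsyQi.

```python
from typing import List, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]             # (z, y, x) indices
Centroid = Tuple[float, float, float]

NEIGHBOURS_6 = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def dilate(mask: Set[Voxel]) -> Set[Voxel]:
    """Grow the mask by one voxel along each axis to tolerate puncta at the neurite edge."""
    grown = set(mask)
    for z, y, x in mask:
        for dz, dy, dx in NEIGHBOURS_6:
            grown.add((z + dz, y + dy, x + dx))
    return grown


def puncta_in_mask(centroids: Sequence[Centroid], mask: Set[Voxel]) -> List[Centroid]:
    """Keep candidate puncta whose rounded centroid falls on a voxel of the neurite mask."""
    kept = []
    for z, y, x in centroids:
        if (round(z), round(y), round(x)) in mask:
            kept.append((z, y, x))
    return kept


neurite = {(0, 0, 0), (0, 0, 1), (0, 0, 2)}
# Second candidate sits just off the mask edge; third mimics gut autofluorescence.
candidates = [(0.1, 0.0, 1.2), (0.0, 1.2, 2.0), (5.0, 5.0, 5.0)]
print(len(puncta_in_mask(candidates, neurite)), len(puncta_in_mask(candidates, dilate(neurite))))
```

The edge punctum is lost with the tight mask and recovered after dilation, while the distant signal is rejected either way, mirroring the trade-off between a suboptimal mask and background rejection.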

      1. On page 12, the author stated "We also note that in several cases, GRASP quantification differed from EM scoring". However, the EM scoring is primarily based on a single sample, making it challenging to conduct a statistical analysis for the purpose of comparison.

      This is correct and is indeed a limitation of EM for this type of analysis. We have now reworded this sentence (page 14) to emphasize the reviewer’s point, and it is also elaborated further in the limitations section.

      1. In Figure 6F, a discrepancy between WormPsyQi and human quantification is observed in the analysis of RAB-3. The authors stated that "the RAB-3 signal was too diffuse to resolve all puncta". To better illustrate this discrepancy, it would be beneficial to include images highlighting the puncta that WormPsyQi cannot score, providing direct evidence that diffuse signals cannot be detected automatically.

      To highlight puncta that were not segmented by WormPsyQi but were successfully scored manually, we have included arrows in Figure 6. In addition, for reporter M4p::GFP::RAB-3, we have included magnified insets in Figure 6 - figure supplement 1A to highlight the region where human annotator scores more puncta than WormPsyQi owing to the high synapse density. In future implementations, additional functionality can be built for separating these merged puncta into instances based on geometrical features such as shape and intensity contour.

      1. In Figure 9 S1D, the results from WormPsyQi and manual quantification differ substantially. To address this notable discrepancy, the authors should highlight and illustrate the areas of discrepancy in the images. This visual representation can help future users identify signal types that may not be well suited for WormPsyQi analysis and inspire the development of new strategies to tackle such challenges.

      This is now addressed in additional figure panels in Figure 4 – figure supplement 1B and Figure 6 - figure supplement 1A.

      Reviewer #3 (Recommendations For The Authors):

      I found the comparison between manual quantification and WormPsyQi-based quantification to be very informative. In my opinion, quantifying the number of puncta is not the most tedious/difficult quantification even when done manually. Would the authors be able to include manual-WormPsyQi comparison for more time-consuming and potentially more prone to human error/bias quantifications such as puncta size or distribution patterns using a few markers with some inter/intra animal variabilities?

      To address this point, we have now included an additional figure supplement to Figure 2 (Figure 2 – figure supplement 4). We focused on the ASK GFP::CLA-1 reporter and had two human annotators manually label the masks of puncta for each worm by scanning Z-stacks and drawing all pixels belonging to each punctum in Fiji; these masks were then processed by WormPsyQi’s quantification pipeline to score puncta number, volume, and distribution. We also compared the overall image processing time for each annotator and WormPsyQi. For ASK CLA-1, the differences between WormPsyQi and the human annotators are not statistically significant for multiple puncta features. Importantly, WormPsyQi reduces overall processing time by at least an order of magnitude; while this is already advantageous for counting puncta, it is especially useful for other important puncta features because a) they may not be easily discernible, and b) they are extremely laborious to quantify manually in large datasets when pixel-wise labels are required.
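      Turning pixel-wise annotations into puncta counts and volumes amounts to connected-component labeling. The sketch below assumes 6-connectivity in 3D and a hypothetical per-voxel volume; WormPsyQi's own quantification step may use different connectivity or filtering rules.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) indices of annotated voxels
NEIGHBOURS_6 = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def label_puncta(foreground: Set[Voxel]) -> List[Set[Voxel]]:
    """Split annotated voxels into 6-connected components, one per punctum (breadth-first search)."""
    remaining = set(foreground)
    components: List[Set[Voxel]] = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS_6:
                nb = (z + dz, y + dy, x + dx)
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return components


def puncta_features(foreground: Set[Voxel], voxel_volume_um3: float) -> Dict[str, object]:
    """Puncta count and sorted per-punctum volumes (voxel count times a hypothetical voxel volume)."""
    components = label_puncta(foreground)
    return {
        "count": len(components),
        "volumes_um3": sorted(len(c) * voxel_volume_um3 for c in components),
    }


annotated = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (5, 5, 5)}
print(puncta_features(annotated, voxel_volume_um3=1.0))
```

Under 6-connectivity, voxels that touch only diagonally count as separate puncta; a 26-connected neighbourhood would merge them, so the choice directly affects counts in dense regions.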

      The authors listed minimum human errors and biases as one of the benefits of WormPsyQi. For the markers with discrepancies in quantifications between human and WormPsyQi, have the authors encountered or considered human errors/biases as potential reasons for such discrepancies?

      This is the same point brought up by reviewer 1. We added Figure 2 - figure supplement 3 to compare WormPsyQi to different human labelers, and show that because human labelers can introduce systematic bias, WormPsyQi reduces such bias by scoring all images with the same metric.

      The authors noted that WormPsyQi would be useful for comparing different genotypes/environments. Some mutants have known changes in synapse patterning/number. It would be helpful if the authors could validate WormPsyQi using some of the mutants with known synapse defects. For instance, the zig-10 mutant increases cholinergic synapse density only slightly (Cherra and Jin, Neuron 2016), and the nlr-1 mutant disrupts the punctate localization of the UNC-9 gap junction in the nerve ring (Meng and Yan, Neuron 2020), which may be detectable only by experts' eyes. It would be interesting to see if WormPsyQi picks up such subtle phenotypes.

      We agree that our pipeline would need to be tested in multiple paradigms to assess its ability to detect additional subtle phenotypes. In the context of this paper, we note that the developmental analysis of puncta in Figure 8 was performed to validate the ground truth from previous EM-based analyses (Witvliet et al., 2021), albeit the latter were limited by sample size. We extended this developmental analysis to the pharyngeal reporters, and in some cases the difference across timepoints was marginal (as emphasized by the additional Figure 9 - figure supplement 2) but was still detected by WormPsyQi. Lastly, our synapse localization analysis in Figure 10 assigns the probability of finding a synapse at a particular location along a neurite, which is not easily discernible by manual scoring.

      One of the benefits of an automated data analysis program is the ability to notice differences you do not expect. For example, there are situations where you feel that the synapses in certain genotypes differ from wild type, but you cannot tell what is different. In such cases, you may not know what to quantify. I think it would be beneficial if more parameters, such as puncta number/size/intensity/distribution, were included in the default quantifications of the pipeline, so that users may find unexpected phenotypes from one of the default quantifications.

      We apologize if this was not clearer in the manuscript where we first describe the pipeline in detail. To clarify, the output of WormPsyQi is a CSV file which includes several quantitative features, such as mean/max/min fluorescence intensity, puncta volume, and position. While most of our analyses are focused on puncta count, the user can perform downstream statistical analyses on all additional features scored to infer which features are most significantly variable across conditions. To make this clearer, we have elaborated the text when we first describe our pipeline, and along with the new Figure 2 - figure supplement 4, we hope that this point is clearer now.

      In addition, most proof-of-principle analysis we performed was focused on an ROI where we expect the synapses to localize. In practice, the user can input images and perform quantification across the entire image without biasing toward an ROI (this can be done in the GUI synapse corrector window) to also evaluate synaptic changes in regions outside the usual ROI.
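      The kind of downstream analysis described above can be sketched with the standard library: group rows of a per-punctum table by condition and average each feature. The column names (`condition`, `volume`, `mean_intensity`) are hypothetical stand-ins for the CSV output, and the condition column is assumed to be added by the user when pooling files.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def summarize(csv_text: str, group_col: str, feature_cols: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Mean of each feature column, per value of `group_col`."""
    groups: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in feature_cols:
            groups[row[group_col]][col].append(float(row[col]))
    return {g: {c: mean(vals) for c, vals in cols.items()} for g, cols in groups.items()}


# Hypothetical per-punctum table pooled from two conditions.
example = """condition,volume,mean_intensity
wild_type,0.20,110
wild_type,0.30,130
mutant,0.50,90
mutant,0.70,70
"""
summary = summarize(example, "condition", ["volume", "mean_intensity"])
print(summary)
```

For a file on disk, the text can instead be read with `open(path, newline="")` and passed to `csv.DictReader` directly; the per-condition means are a starting point for the statistical comparisons mentioned above.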

      The authors stated that WormPsyQi could mitigate the problems stemming from scoring images with a low signal-to-noise ratio or in regions with high background autofluorescence, the laboriousness of scoring large datasets, and inter-dataset variability. Other than the 'laboriousness of scoring large datasets', it appeared to me that WormPsyQi does not do better than manual quantification, especially regarding inter-dataset variability, since the authors noted variability among transgenes as one of the limitations of the toolkit. If two datasets are acquired with completely different setups, such as two independent arrays imaged on two distinct confocal microscopes, would WormPsyQi make these two datasets comparable?

      We have included additional figure supplements to address the reviewer’s point. A significant advantage WormPsyQi offers over manual scoring is that it provides a standardized method of quantifying synapse features. As shown in Figure 2 – figure supplement 3, human labelers can introduce systematic bias (e.g., some overcount puncta, while others undercount). In addition, while puncta number may be relatively easy to quantify, especially in a high-quality dataset, more subtle puncta features such as size, intensity, and distribution are much more laborious to quantify and require a priori knowledge of signal localization (Figure 2 – figure supplement 4, Figure 10). Altogether, our pipeline facilitates multiple measurements while also enabling robust quantification in hard-to-score cases, such as the example shown for the PHB>AVA reporter (Figure 4 - figure supplement 1B).

      Minor comments:

      The limitations listed are not quite specific to this work; they are general limitations of concatemeric transgenes and fluorescently labeled synaptic proteins. I'd appreciate a discussion of limitations specific to WormPsyQi related to image acquisition. For instance, for neurons with 3D structures, would WormPsyQi handle z-slices close to the coverslip and deeper slices in a similar manner? Would users need to be aware of such limitations when comparing different genotypes?

      To address the reviewer’s comment, we have elaborated the last paragraph of the limitations section to explicitly discuss where the user should exercise caution. The reviewer reasonably points out that the fluorescent signal away from the coverslip is typically dimmer, and neurite masking in this case is indeed compromised if the cytoplasmic signal is dim to start with. In such cases, we recommend that the user either perform some preprocessing, such as deconvolution, denoising, or contrast enhancement, to boost the neurite signal, or segment synapses without the neurite mask if the puncta signal is brighter than that of the cytoplasmic marker. We hope that our additional figure supplements will clarify that WormPsyQi’s performance is contingent on reporter type and image quality, making it easier for the user to discern where automated quantification falls short and alternative reporters should be explored. In general, if puncta are not discernible to the user, for instance due to a very poor S/N ratio, we do not recommend using WormPsyQi to process such datasets; this will be evident in the results of the new “test all models” feature we added in the revised version.
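      Of the preprocessing options mentioned, contrast enhancement is the simplest to illustrate. The sketch below is a generic percentile-based linear stretch on a flat list of intensities, with hypothetical default percentiles; it is not part of WormPsyQi, and deconvolution or denoising would need dedicated tools.

```python
from typing import List, Sequence


def contrast_stretch(
    pixels: Sequence[float],
    low_pct: float = 1.0,
    high_pct: float = 99.0,
    out_max: float = 255.0,
) -> List[float]:
    """Linearly map the [low_pct, high_pct] percentile range of intensities onto [0, out_max].

    Percentiles use a simple rounded-index rule on the sorted values; values outside
    the range are clipped.
    """
    if not pixels:
        return []
    ordered = sorted(pixels)
    n = len(ordered)

    def percentile(p: float) -> float:
        return ordered[round(p / 100.0 * (n - 1))]

    lo, hi = percentile(low_pct), percentile(high_pct)
    if hi <= lo:  # flat image: nothing to stretch
        return [0.0] * n
    scale = out_max / (hi - lo)
    return [min(max((v - lo) * scale, 0.0), out_max) for v in pixels]


# A dim slice: values cluster in a narrow band, plus one hot pixel.
dim_slice = [10, 11, 12, 12, 13, 14, 200]
print(contrast_stretch(dim_slice, low_pct=0.0, high_pct=80.0))
```

Clipping at an upper percentile keeps a few very bright pixels (for example, autofluorescent spots) from compressing the range available to the dim neurite signal.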

      Some RAB-3 fusion proteins are described as RAB-3::GFP(BFP). Do these represent C-terminal fusions of the fluorescent proteins? RAB-3 is a small GTPase with a lipid modification site at its C-terminus that is essential for its localization and function. Is it possible that the diffuse signal of some RAB-3 markers is caused by C-terminal fusion of the fluorescent protein?

      While we do have reporters with N- and C-terminal RAB-3 fusions for different neurons, we do not have both for the same neuron to perform a fair comparison. However, as noted in response to a previous comment by reviewer 2, RAB-3 and CLA-1 have distinct localization patterns at the synapse, and this aligns with their distinct functions: while RAB-3 localizes to synaptic vesicles, CLA-1 is an active zone protein required for synaptic vesicle clustering. Accordingly, we have observed diffuse RAB-3 signal in reporters irrespective of where the protein is tagged, and while this is not problematic for ROIs with a low synapse density, it confounds quantification in synapse-dense regions. In contrast, CLA-1 puncta are typically easier to quantify discretely, which is particularly relevant for features such as synapse distribution, size, and intensity.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this very strong and interesting paper the authors present a convincing series of experiments that reveal molecular mechanisms of neuronal cell type diversification in the nervous system of Drosophila. The authors show that a homeodomain transcription factor, Bsh, fulfills several critical functions: repressing an alternative fate and inducing downstream homeodomain transcription factors with which Bsh may collaborate to induce L4 and L5 fates (the authors' accompanying paper reveals how Bsh can induce two distinct fates). The authors make elegant use of powerful genetic tools and an arsenal of satisfying cell identity markers.

      Thanks!

      I believe that this is an important study because it provides some fundamental insights into the conservation of neuronal diversification programs. It is very satisfying to see that similar organizational principles apply in different organisms to generate cell type diversity. The authors should also be commended for contextualizing their work very well, giving a broad, scholarly background to the problem of neuronal cell type diversification.

      Thanks!

      My one suggestion for the authors is to perhaps address in the Discussion (or experimentally address if they wish) how they reconcile that Bsh is, on the one hand, (a) continuously expressed in L4/L5 and (b) binding directly to a cohort of terminal effectors that are also continuously expressed, but, on the other hand, is not required for maintaining L4 fate. A few questions: Is Bsh only NOT required for maintaining Ap expression, or is it also NOT required for maintaining other terminal markers of L4? The former could be easily explained: Bsh simply kicks off Ap, Ap then autoregulates, and Bsh and Ap then continuously activate terminal effector genes. The second scenario would require a somewhat more complex mechanism: Bsh binding of targets (with Notch) may open chromatin, but once that's done, Bsh is no longer needed and Ap alone can continue to express genes. I feel that the authors should at least discuss this. The postmitotic Bsh removal experiment, in which they only checked Ap and derepression of other markers, is a little unsatisfying without further discussion (or experiments, such as testing terminal L4 markers). I hasten to add that this comment does not take away from my overall appreciation for the depth and quality of the data and the importance of their conclusions.

      Great suggestions, we will discuss these two hypotheses as requested.

      Bsh initiates Ap expression in L4 neurons, which then maintain Ap expression independently of Bsh, likely through Ap autoregulation. During the synaptogenesis window, Ap expression becomes independent of Bsh, but Bsh and Ap are both still required to activate the synapse recognition molecule DIP-beta. Additionally, Bsh shows putative binding to other L4 identity genes, e.g., those required for neurotransmitter choice and electrophysiological properties, suggesting that Bsh may initiate the expression of a suite of L4 identity genes. The mechanism that maintains identity features (e.g., morphology, synaptic connectivity, and functional properties) in the adult remains poorly understood, and whether the primary HDTF Bsh maintains the expression of L4 identity genes in the adult is an important open question. To test this, in our next project, we will specifically knock out Bsh in L4 neurons of the adult fly and examine the effect on L4 morphology, connectivity, and functional properties.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors explore the role of the Homeodomain Transcription Factor Bsh in the specification of Lamina neuronal types in the optic lobe of Drosophila. Using the framework of terminal selector genes and compelling data, they investigate whether the same factor that establishes early cell identity is responsible for the acquisition of terminal features of the neuron (i.e., cell connectivity and synaptogenesis).

      Thanks for the positive words!

      The authors convincingly describe the sequential expression and activity of Bsh, termed here the 'primary HDTF', and of Ap in L4 or Pdm3 in L5 as 'secondary HDTFs' during the specification of these two neurons. The study demonstrates that Bsh is required to activate Ap or Pdm3, and therefore to generate the L4 and L5 fates. Moreover, the authors show that in the absence of Bsh, L4 and L5 fates are transformed into L1- or L3-like fates.

      Thanks!

      Finally, the authors used DamID and Bsh:DamID to profile the open chromatin signature and the Bsh binding sites in L4 neurons at the synaptogenesis stage. This allows the identification of putative Bsh target genes in L4, many of which were also found to be upregulated in L4 in a previous single-cell transcriptomic analysis. Among these genes, the paper focuses on Dip-β, a known regulator of L4 connectivity. They demonstrate that both Bsh and Ap are required for Dip-β expression, forming a feed-forward loop. Indeed, the loss of Bsh causes abnormal L4 synaptogenesis and therefore defects in several visual behaviors. The authors also propose the intriguing hypothesis that the expression of Bsh expanded the diversity of Lamina neurons from a 3 cell-type state to the current 5 cell-type state in the optic lobe.

      Thanks for the excellent summary of our findings!

      Strengths:

      Overall, this work presents a beautiful practical example of the framework of terminal selectors: Bsh acts hierarchically with Ap or Pdm3 to establish the L4 or L5 cell fates and, at least in L4, participates in the expression of terminal features of the neuron (i.e., synaptogenesis through Dip-β regulation).

      Thanks!

      The hierarchical interactions among Bsh and the activation of Ap and Pdm3 expression in L4 and L5, respectively, are well established experimentally. Using different genetic drivers, the authors show a window of competence during L4 neuron specification during which Bsh activates Ap expression. Later, as the neuron matures, Ap becomes independent of Bsh. This allows the authors to propose a coherent and well-supported model in which Bsh acts as a 'primary' selector that activates the expression of L4-specific (Ap) and L5-specific (Pdm3) 'secondary' selector genes, that together establish neuronal fate.

      Thanks again!

      Importantly, the authors describe a striking cell fate change when Bsh is knocked down from L4/L5 progenitor cells. In such cases, L1 and L3 neurons are generated at the expense of L4 and L5. The paper demonstrates that Bsh in L4/L5 represses Zfh1, which in turn acts as the primary selector for L1/L3 fates. These results point to a model where the acquisition of Bsh during evolution might have provided the grounds for the generation of new cell types, L4 and L5, expanding lamina neuronal diversity for more refined visual behaviors in flies. This is an intriguing and novel hypothesis that should be tested from an evo-devo standpoint, for instance by identifying a species in which L4 and L5 do not exist and/or Bsh is not expressed in L neurons.

      Thanks for the appreciation of our findings!

      To gain insight into how Bsh regulates neuronal fate and terminal features, the authors have profiled the open chromatin landscape and Bsh binding sites in L4 neurons at mid-pupation using the DamID technique. The paper describes a number of genes that have Bsh binding peaks in their regulatory regions and that are differentially expressed in L4 neurons, based on available scRNAseq data. Although the manuscript does not explore this candidate list in depth, many of these genes belong to classes that might explain terminal features of L4 neurons, such as neurotransmitter identity, neuropeptides or cytoskeletal regulators. Interestingly, one of these upregulated genes with a Bsh peak is Dip-β, an immunoglobulin superfamily protein that previous work from the authors' lab has shown to be important for establishing proper L4 connectivity. This work proves that Bsh and Ap work in a feed-forward loop to regulate Dip-β expression, and therefore to establish normal L4 synapses. Furthermore, Bsh loss of function in L4 impairs visual behaviors.

      Thanks for the excellent summary of our findings.

      Weaknesses:

      ● The last paragraph of the introduction is written using rhetorical questions and does not read well. I suggest rewriting it in a more conventional direct style to improve readability.

      We agree and have updated the text as suggested.

      ● A significant concern is the way in which information is conveyed in the Figures. Throughout the paper, understanding of the experimental results is hindered by the lack of information in the Figure headers. Specifically, the genetic driver used for each panel should be adequately noted, together with the age of the brain and the experimental condition. For example, R27G05-Gal4 drives early expression in LPCs and L4/L5, while the 31C06-AD, 34G07-DBD Split-Gal4 combination drives expression in older L4 neurons, and the use of one or the other to drive Bsh-KD has dramatic differences in Ap expression. The indication of the driver used in each panel will facilitate the reader's grasp of the experimental results.

      We agree and have updated the figure annotation.

      ● Bsh role in L4/L5 cell fate:

      ● It is not clear whether Tll+/Bsh+ LPCs are the precursors of L4/L5. Morphologically, these cells sit very close to L5, but are much more distant from L4.

      Our current data show that L4 and L5 neurons are generated by different LPCs. However, we currently lack tools to demonstrate which subset of LPCs generates which lamina neuron type. We are working on a follow-up manuscript on LPC heterogeneity, but those experiments have only just begun.

      ● Somatic CRISPR knockout of Bsh seems to have a weaker phenotype than the knockdown using RNAi. However, in several experiments down the line, the authors use CRISPR-KO rather than RNAi to knock down Bsh activity: it should be explained why the authors made this decision. Alternatively, a null mutant could be used to consolidate the loss of function phenotype, although this is not strictly necessary given that the RNAi is highly efficient and almost completely abolishes Bsh protein.

      The reason we chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) is that it effectively removed Bsh expression from the majority of L4 neurons. In contrast, Bsh-RNAi driven by L4-split Gal4 failed to knock down Bsh in L4 neurons, because L4-split Gal4 expression itself depends on Bsh. We have updated this explanation in the text.

      ● Line 102: Rephrase "R27G05-Gal4 is expressed in all LPCs and turned off in lamina neurons" to "is turned off as lamina neurons mature", as it is kept on for a significant amount of time after the neurons have already been specified.

      Thanks; we have made that change.

      ● Line 121: "(a) that all known lamina neuron markers become independent of Bsh regulation in neurons" is not an accurate statement, as the markers tested were not shown to be dependent on Bsh in the first place.

      Good point. We have rephrased it as “that all known lamina neuron markers are independent of Bsh regulation in neurons”.

      ● Lines 129-134: Make explicit that the LPC-Gal4 was used in this experiment. This is especially important here, as these results are opposite to the Bsh Loss of Function in L4 neurons described in the previous section. This will help clarify the window of competence in which Bsh establishes L4/L5 neuronal identities through ap/pdm3 expression.

      Thanks! We have updated Gal4 information in the text for every manipulation.

      ● DamID and Bsh binding profile:

      ● Figure 5 - figure supplement 1C-E: The genotype of the Control in (C) has to be described within the panel. As it is, it can be confused with a wild type brain, when it is in fact a Bsh-KO mutant.

      Great point! Thank you for catching this and we have updated it.

      ● It is not clear how L4-specific Differentially Expressed Genes were found. Are these genes DEG between Lamina neurons types, or are they upregulated genes with respect to all neuronal clusters? If the latter is the case, it could explain the discrepancy between scRNAseq DEGs and Bsh peaks in L4 neurons.

      We did not use “L4-specific Differentially Expressed Genes”. Instead, we used all genes that are significantly transcribed in L4 neurons (lines 209-213).

      ● Dip-β regulation:

      ● Line 234: It is not clear why CRISPR KO is used in this case, when Bsh-RNAi presents a stronger phenotype.

      As we explained above, the reason we chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) is that it effectively removed Bsh expression from the majority of L4 neurons. In contrast, Bsh-RNAi driven by L4-split Gal4 failed to knock down Bsh in L4 neurons, because L4-split Gal4 expression itself depends on Bsh. We have updated this explanation in the text.

      ● Figure 6N-R shows results using LPC-Gal4. It is not clear why this driver was used, as it makes a less accurate comparison with the other panels in the figure, which use L4-Split-Gal4. This discrepancy should be acknowledged and explained, or the experiment repeated with L4-Split-Gal4>Ap-RNAi.

      We believe the reviewer is referring to Figure 6J-M, which shows results using LPC-Gal4. We first tried L4-Split-Gal4>Ap-RNAi, but it failed to knock down Ap because L4-Split-Gal4 expression depends on Ap. We have added this to the text.

      ● Line 271: It is also possible that L4 activity is dispensable for motion detection and only L5 is required.

      Thanks! Work from Tuthill et al., 2013 showed that L5 is not required for any motion detection. We have included this citation in the text.

      ● Discussion: It is necessary to de-emphasize the relevance of HDTFs, or at least acknowledge that other, non-homeodomain TFs, can act as selector genes to determine neuronal identity. By restricting the discussion to HDTFs, it is not mentioned that other classes of TFs could follow the same Primary-Secondary selector activation logic.

      That is a great point, thank you! We have included this in the discussion.

    2. eLife assessment

      This paper, offering insights into the mechanisms of neuronal cell type diversification, provides important findings that have theoretical or practical implications beyond a single subfield. The data are compelling and the evidence features methods, data and analyses that are more rigorous than the current state-of-the-art.

    3. Reviewer #1 (Public Review):

      In this very strong and interesting paper the authors present a convincing series of experiments that reveal molecular mechanisms of neuronal cell type diversification in the nervous system of Drosophila. The authors show that a homeodomain transcription factor, Bsh, fulfills several critical functions: repressing an alternative fate and inducing downstream homeodomain transcription factors with which Bsh may collaborate to induce L4 and L5 fates (the authors' accompanying paper reveals how Bsh can induce two distinct fates). The authors make elegant use of powerful genetic tools and an arsenal of satisfying cell identity markers.

      I believe that this is an important study because it provides some fundamental insights into the conservation of neuronal diversification programs. It is very satisfying to see that similar organizational principles apply in different organisms to generate cell type diversity. The authors should also be commended for contextualizing their work very well, giving a broad, scholarly background to the problem of neuronal cell type diversification.

      My one suggestion for the authors is to perhaps address in the Discussion (or experimentally address if they wish) how they reconcile that Bsh is, on the one hand, (a) continuously expressed in L4/L5 and (b) binding directly to a cohort of terminal effectors that are also continuously expressed, but, on the other hand, is not required for maintaining L4 fate. A few questions: Is Bsh only NOT required for maintaining Ap expression, or is it also NOT required for maintaining other terminal markers of L4? The former could be easily explained: Bsh simply kicks off Ap, Ap then autoregulates, and Bsh and Ap then continuously activate terminal effector genes. The second scenario would require a somewhat more complex mechanism: Bsh binding of targets (with Notch) may open chromatin, but once that's done, Bsh is no longer needed and Ap alone can continue to express genes. I feel that the authors should at least discuss this. The postmitotic Bsh removal experiment, in which they only checked Ap and derepression of other markers, is a little unsatisfying without further discussion (or experiments, such as testing terminal L4 markers). I hasten to add that this comment does not take away from my overall appreciation for the depth and quality of the data and the importance of their conclusions.

    4. Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors explore the role of the Homeodomain Transcription Factor Bsh in the specification of Lamina neuronal types in the optic lobe of Drosophila. Using the framework of terminal selector genes and compelling data, they investigate whether the same factor that establishes early cell identity is responsible for the acquisition of terminal features of the neuron (i.e., cell connectivity and synaptogenesis).

      The authors convincingly describe the sequential expression and activity of Bsh, termed here the 'primary HDTF', and of Ap in L4 or Pdm3 in L5 as 'secondary HDTFs' during the specification of these two neurons. The study demonstrates that Bsh is required to activate Ap or Pdm3, and therefore to generate the L4 and L5 fates. Moreover, the authors show that in the absence of Bsh, L4 and L5 fates are transformed into L1- or L3-like fates.

      Finally, the authors used DamID and Bsh:DamID to profile the open chromatin signature and the Bsh binding sites in L4 neurons at the synaptogenesis stage. This allows the identification of putative Bsh target genes in L4, many of which were also found to be upregulated in L4 in a previous single-cell transcriptomic analysis. Among these genes, the paper focuses on Dip-β, a known regulator of L4 connectivity. They demonstrate that both Bsh and Ap are required for Dip-β expression, forming a feed-forward loop. Indeed, the loss of Bsh causes abnormal L4 synaptogenesis and therefore defects in several visual behaviors.

      The authors also propose the intriguing hypothesis that the expression of Bsh expanded the diversity of Lamina neurons from a 3 cell-type state to the current 5 cell-type state in the optic lobe.

      Strengths:

      Overall, this work presents a beautiful practical example of the framework of terminal selectors: Bsh acts hierarchically with Ap or Pdm3 to establish the L4 or L5 cell fates and, at least in L4, participates in the expression of terminal features of the neuron (i.e., synaptogenesis through Dip-β regulation).

      The hierarchical interactions among Bsh and the activation of Ap and Pdm3 expression in L4 and L5, respectively, are well established experimentally. Using different genetic drivers, the authors show a window of competence during L4 neuron specification during which Bsh activates Ap expression. Later, as the neuron matures, Ap becomes independent of Bsh. This allows the authors to propose a coherent and well-supported model in which Bsh acts as a 'primary' selector that activates the expression of L4-specific (Ap) and L5-specific (Pdm3) 'secondary' selector genes, that together establish neuronal fate.

      Importantly, the authors describe a striking cell fate change when Bsh is knocked down from L4/L5 progenitor cells. In such cases, L1 and L3 neurons are generated at the expense of L4 and L5. The paper demonstrates that Bsh in L4/L5 represses Zfh1, which in turn acts as the primary selector for L1/L3 fates. These results point to a model where the acquisition of Bsh during evolution might have provided the grounds for the generation of new cell types, L4 and L5, expanding lamina neuronal diversity for more refined visual behaviors in flies. This is an intriguing and novel hypothesis that should be tested from an evo-devo standpoint, for instance by identifying a species in which L4 and L5 do not exist and/or Bsh is not expressed in L neurons.

      To gain insight into how Bsh regulates neuronal fate and terminal features, the authors have profiled the open chromatin landscape and Bsh binding sites in L4 neurons at mid-pupation using the DamID technique. The paper describes a number of genes that have Bsh binding peaks in their regulatory regions and that are differentially expressed in L4 neurons, based on available scRNAseq data. Although the manuscript does not explore this candidate list in depth, many of these genes belong to classes that might explain terminal features of L4 neurons, such as neurotransmitter identity, neuropeptides or cytoskeletal regulators. Interestingly, one of these upregulated genes with a Bsh peak is Dip-β, an immunoglobulin superfamily protein that previous work from the authors' lab has shown to be important for establishing proper L4 connectivity. This work proves that Bsh and Ap work in a feed-forward loop to regulate Dip-β expression, and therefore to establish normal L4 synapses. Furthermore, Bsh loss of function in L4 impairs visual behaviors.

      Weaknesses:

      ● The last paragraph of the introduction is written using rhetorical questions and does not read well. I suggest rewriting it in a more conventional direct style to improve readability.

      ● A significant concern is the way in which information is conveyed in the Figures. Throughout the paper, understanding of the experimental results is hindered by the lack of information in the Figure headers. Specifically, the genetic driver used for each panel should be adequately noted, together with the age of the brain and the experimental condition. For example, R27G05-Gal4 drives early expression in LPCs and L4/L5, while the 31C06-AD, 34G07-DBD Split-Gal4 combination drives expression in older L4 neurons, and the use of one or the other to drive Bsh-KD has dramatic differences in Ap expression. The indication of the driver used in each panel will facilitate the reader's grasp of the experimental results.

      ● Bsh role in L4/L5 cell fate:

      ○ It is not clear whether Tll+/Bsh+ LPCs are the precursors of L4/L5. Morphologically, these cells sit very close to L5, but are much more distant from L4.

      ○ Somatic CRISPR knockout of Bsh seems to have a weaker phenotype than the knockdown using RNAi. However, in several experiments down the line, the authors use CRISPR-KO rather than RNAi to knock down Bsh activity: it should be explained why the authors made this decision. Alternatively, a null mutant could be used to consolidate the loss of function phenotype, although this is not strictly necessary given that the RNAi is highly efficient and almost completely abolishes Bsh protein.

      ○ Line 102: Rephrase "R27G05-Gal4 is expressed in all LPCs and turned off in lamina neurons" to "is turned off as lamina neurons mature", as it is kept on for a significant amount of time after the neurons have already been specified.

      ○ Line 121: "(a) that all known lamina neuron markers become independent of Bsh regulation in neurons" is not an accurate statement, as the markers tested were not shown to be dependent on Bsh in the first place.

      ○ Lines 129-134: Make explicit that the LPC-Gal4 was used in this experiment. This is especially important here, as these results are opposite to the Bsh Loss of Function in L4 neurons described in the previous section. This will help clarify the window of competence in which Bsh establishes L4/L5 neuronal identities through ap/pdm3 expression.

      ● DamID and Bsh binding profile:

      ○ Figure 5 - figure supplement 1C-E: The genotype of the Control in (C) has to be described within the panel. As it is, it can be confused with a wild type brain, when it is in fact a Bsh-KO mutant.

      ○ It is not clear how L4-specific Differentially Expressed Genes were found. Are these genes DEG between Lamina neurons types, or are they upregulated genes with respect to all neuronal clusters? If the latter is the case, it could explain the discrepancy between scRNAseq DEGs and Bsh peaks in L4 neurons.

      ● Dip-β regulation:

      ○ Line 234: It is not clear why CRISPR KO is used in this case, when Bsh-RNAi presents a stronger phenotype.

      ○ Figure 6N-R shows results using LPC-Gal4. It is not clear why this driver was used, as it makes a less accurate comparison with the other panels in the figure, which use L4-Split-Gal4. This discrepancy should be acknowledged and explained, or the experiment repeated with L4-Split-Gal4>Ap-RNAi.

      ○ Line 271: It is also possible that L4 activity is dispensable for motion detection and only L5 is required.

      ● Discussion: It is necessary to de-emphasize the relevance of HDTFs, or at least acknowledge that other, non-homeodomain TFs, can act as selector genes to determine neuronal identity. By restricting the discussion to HDTFs, it is not mentioned that other classes of TFs could follow the same Primary-Secondary selector activation logic.

    1. eLife assessment

      This study presents a useful deep learning-based inter-protein contact prediction method named PLMGraph-Inter, which combines protein language models and geometric graphs. The evidence supporting the claims of the authors is solid, although there may be information leakage between the training and test sets, and more emphasis should be given to predictions starting from unbound monomer structures. The authors show that their approach may be useful in some cases where AlphaFold-Multimer performs poorly. This work will be of interest to researchers working on protein complex structure prediction, particularly when accurate experimental structures are available for one or both of the monomers in isolation.

    2. Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that, given the same input, PLMGraph-Inter performs substantially better than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, the authors identify a potentially useful heuristic criterion for acceptable contact prediction quality: namely, at least 50% precision in the prediction of the top 50 contacts.

      Weaknesses:

      My biggest issue with this work is the evaluations made using *bound* monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.

      In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, *not* Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.

      Besides, in cases where *any* experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.

      It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?

      It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.

      Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.

      Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.

    3. Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) still remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively small computational costs for model training.

      The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.

      1) The sequence identity cutoff used to remove redundancy between the training and test sets was set to 40%, which is somewhat high for excluding test examples that are homologous to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancy between training and test examples. To make their results more solid, the authors should curate test examples with a lower sequence identity cutoff, or report how performance changes as a function of sequence identity to the closest training example.

      2) The head-to-head comparison scatter plots are hard to read because too many methods are combined into a single plot and distinguished only by color. It would be better to provide individual head-to-head scatter plots for each method as supplementary figures rather than in the main figure.

      3) The authors claim that PLMGraph-Inter is complementary to AlphaFold-multimer as it shows better precision for the cases where AlphaFold-multimer fails. To strengthen the point, the qualities of predicted complex structures via protein-protein docking with predicted contacts as restraints should have been compared to those of AlphaFold-multimer structures.

      4) It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).
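      As an illustration of the stratified analysis suggested in point 1, per-target precision can be grouped by each target's maximum sequence identity to the training set. This is a minimal sketch under the assumption that such (identity, precision) pairs have already been computed; the function name and the bin edges (chosen to bracket the 30% and 40% cutoffs mentioned above) are hypothetical.

```python
from statistics import mean


def precision_by_identity(results, edges=(0.0, 0.3, 0.4, 1.0)):
    """Mean per-target precision within bins of max sequence identity to the training set.

    results: iterable of (max_identity, precision) pairs, identity as a fraction in [0, 1].
    edges: bin boundaries; bins are [lo, hi), except the last bin also includes hi.
    (Hypothetical helper for illustration only.)
    """
    groups = {}
    for identity, precision in results:
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = lo <= identity < hi or (hi == edges[-1] and identity == hi)
            if in_bin:
                groups.setdefault((lo, hi), []).append(precision)
                break
    return {b: mean(v) for b, v in sorted(groups.items())}
```

      A flat profile across bins would suggest little leakage from homologous training examples, whereas a drop in the lowest-identity bin would indicate that the headline numbers are partly driven by homology.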

    1. eLife assessment

      The bacterial neurotransmitter:sodium symporter homologue LeuT is a well-established model system for understanding the fundamental basis of how human monoamine transporters, such as the dopamine and serotonin transporters, couple ions to neurotransmitter uptake. Here the authors provide convincing data showing that K+ binding on the intracellular side catalyses the return step of the transport cycle in LeuT by binding to one of the two sodium sites. The mechanistic consequences of K+ binding could be to facilitate the re-setting of LeuT and/or to prevent the rebinding and possible efflux of Na+ and substrate.

    1. eLife assessment

      This important study explores infants' attention patterns in real-world settings using advanced protocols and cutting-edge methods. The presented evidence for the role of EEG theta power in infants' attention is currently incomplete. The study will be of interest to researchers working on the development and control of attention.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The paper investigates the physiological and neural processes that relate to infants' attention allocation in a naturalistic setting. Contrary to experimental paradigms that are usually employed in developmental research, this study investigates attention processes while letting the infants be free to play with three toys in the vicinity of their caregiver, which is closer to a common, everyday life context. The paper focuses on infants at 5 and 10 months of age and finds differences in what predicts attention allocation. At 5 months, attention episodes are shorter and their duration is predicted by autonomic arousal. At 10 months, attention episodes are longer, and their duration can be predicted by theta power. Moreover, theta power predicted the proportion of looking at the toys, as well as a decrease in arousal (heart rate). Overall, the authors conclude that attentional systems change across development, becoming more driven by cortical processes.

      Strengths:<br /> I enjoyed reading the paper; I am impressed with the level of detail of the analyses, and I am strongly in favour of the overall approach, which tries to move beyond in-lab settings. The collection of multiple sources of data (EEG, heart rate, looking behaviour) at two different ages (5 and 10 months) is a key strength of this paper. The original analyses, which build on robust EEG preprocessing, are an additional feat that improves the overall value of the paper. The careful consideration of how theta power might change before and during attention episodes, and of how it predicts them, is especially remarkable. However, I have a few major concerns that I would like the authors to address, especially on the methodological side.

      Points of improvement<br /> 1. Noise<br /> The first concern is the level of noise across age groups, periods of attention allocation, and metrics. Starting with EEG, I appreciate the analysis of noise reported in the supplementary materials. The analysis focuses on a broad level (average noise in 5-month-olds vs 10-month-olds), but variations might be more fine-grained (for example, noise in 5-month-olds might be due to fussiness and crying, while at 10 months it might be due to increased movement). More importantly, noise might even be the same across age groups but correlated with other aspects of the infants' behaviour (head or eye movements) that are directly related to the measures of interest. Is it possible that noise co-varies with some of the behaviours of interest, thus leading to either spurious effects or false negatives? One way to address this issue would be to check whether noise in the signal predicts attention episodes. If it does, noise should be added as a covariate in many of the analyses of this paper.<br /> Moving on to the video coding, I see that inter-rater reliability was not very high. Is this due to the fine-grained nature of the coding (20ms)? Is it driven by differences in expertise between the two coders? Or is coding this fine-grained behaviour from video data simply too difficult? The main dependent variable (looking duration) is extracted from the video coding, and I think the authors should be confident that they are maximising measurement accuracy.
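      To illustrate the covariate suggestion above, the simplest version is an ordinary least-squares fit of look duration on theta power with an EEG noise measure as an additional regressor. This is a deliberately simplified sketch: it assumes one value per attention episode and ignores the repeated-measures structure (a mixed model would be more appropriate for the real data), and the function name is hypothetical.

```python
import numpy as np


def ols_with_covariate(duration, theta, noise):
    """Fit duration ~ 1 + theta + noise by least squares.

    Returns (intercept, b_theta, b_noise). If b_theta shrinks markedly once noise
    is included, the theta effect may partly reflect artefact rather than attention.
    (Hypothetical helper for illustration only.)
    """
    theta = np.asarray(theta, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Design matrix: intercept column, predictor of interest, nuisance covariate.
    X = np.column_stack([np.ones(len(theta)), theta, noise])
    coef, *_ = np.linalg.lstsq(X, np.asarray(duration, dtype=float), rcond=None)
    return tuple(float(c) for c in coef)
```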

      2. Cross-correlation analyses<br /> I would like to raise two issues here. The first is the potential problem of using autocorrelated variables as input for cross-correlations. I am not sure whether theta power was significantly autocorrelated. If it was, could this explain the cross-correlation result? The fact that the cross-correlation plots in Figure 6 peak at zero, and are significant (but lower) around zero, makes me think that the result could be a consequence of the periods around zero being autocorrelated. Relatedly: how does the fact that the significant lags include zero, and a little before zero, affect the interpretation of this effect?
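      One way to probe the autocorrelation concern above is to compute lagged correlations after pre-whitening, a standard time-series remedy in which the autocorrelation of one series is modelled and the same filter is applied to both series before cross-correlating. The sketch below is a simplified illustration, not the authors' pipeline: it assumes evenly sampled series, uses a first-order autoregressive (AR(1)) filter, and approximates the AR(1) coefficient by the lag-1 Pearson autocorrelation. Function names are hypothetical.

```python
import numpy as np


def lagged_corr(x, y, lag):
    """Pearson r between x[t] and y[t + lag]; a positive lag means x leads y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return float(np.corrcoef(x, y)[0, 1])


def prewhiten_ar1(x, y):
    """Remove lag-1 autocorrelation from x and apply the same AR(1) filter to y.

    The AR(1) coefficient is approximated by the lag-1 autocorrelation of x.
    Both returned series are one sample shorter than the inputs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    phi = lagged_corr(x, x, 1)
    return x[1:] - phi * x[:-1], y[1:] - phi * y[:-1]
```

      Comparing the cross-correlation function of theta power and looking behaviour before and after pre-whitening would show whether the broad peak around zero survives once autocorrelation is removed.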

      A second issue with the cross-correlation analyses is the coding of the looking behaviour. If I understand correctly, if an infant looked at the same object for a full second, they would get a maximum score (e.g., 1), while if they looked at the object for 500ms and away from it for 500ms, they would receive a score of, e.g., 0.5. However, if they looked at one object for 500ms and at another object for 500ms, they would also receive the maximum score (e.g., 1). This seems problematic to me, because these are different attention episodes, yet they would be treated as one. In addition, the authors also show that theta power changes within an attentional episode (for 10-month-olds). What is the rationale behind this scoring system? Wouldn't it be better to adjust for the number of attention switches, e.g., with the formula looking-time/(1+N_switches), so that if infants looked for a full second but made one switch from one object to another, the score would be 0.5, thus reflecting that attention was terminated within that episode?
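      The switch-adjusted score proposed above can be sketched for a single 1-second window coded at the 20ms resolution mentioned earlier (50 frames). This is an illustrative sketch under one assumption not stated in the text: a switch is counted whenever the looked-at object changes between successive on-object frames, so A, away, B counts as one switch while A, away, A counts as none. The function name is hypothetical.

```python
def window_score(labels):
    """Switch-adjusted looking score for one window: looking proportion / (1 + N_switches).

    labels: per-frame gaze codes (e.g. 50 frames of 20 ms for a 1-s window);
    None means looking away, any other value identifies the toy being looked at.
    (Hypothetical helper for illustration only.)
    """
    looking = [lab for lab in labels if lab is not None]
    if not looking:
        return 0.0
    proportion = len(looking) / len(labels)
    # Count changes of target between successive on-object frames (away frames are skipped).
    switches = sum(1 for a, b in zip(looking, looking[1:]) if a != b)
    return proportion / (1 + switches)
```

      With this scoring, 50 frames on one toy give 1.0, 25 frames on each of two toys give 0.5 rather than 1.0, and 25 frames on a toy followed by 25 frames away also give 0.5.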

      3. Clearer definitions of variables, constructs, and visualisations<br /> A third issue is the overall clarity and systematicity of the paper. The concept of attention appears under many different names. In the abstract alone, it is described as attention control, attentional behaviours, attentiveness, attention durations, attention shifts and attention episode. More names are used elsewhere in the paper. Although some of them are indeed meant to describe different aspects, others overlap. As a consequence, the main results also become more difficult to grasp. For example, it is stated that autonomic arousal predicts attention, but it is hard to tell which specific aspect (duration of looking, disengagement, etc.) it predicts. Relatedly, the cognitive process under investigation (e.g., attention) and its operationalization (e.g., duration of consecutive looking toward a toy) are used interchangeably. I would like to see a clearer demarcation between different concepts, and between concepts and measurements.

      General Remarks<br /> In general, the authors achieved their aim in that they successfully showed the relationship between looking behaviour (as a proxy of attention), autonomic arousal, and electrophysiology. Two aspects are especially interesting. First, the fact that at 5 months, autonomic arousal predicts the duration of subsequent attention episodes, but at 10 months this effect is not present. Conversely, at 10 months, theta power predicts the duration of looking episodes, but this effect is not present in 5-month-old infants. This pattern of results suggests that younger infants have less control over their attention, which mostly depends on their current state of arousal, but older infants have gained cortical control of their attention, which in turn impacts their looking behaviour and arousal.

    3. Reviewer #2 (Public Review):

      Summary:<br /> This manuscript explores infants' attention patterns in real-world settings and their relationship with autonomic arousal and EEG oscillations in the theta frequency band. The study included 5- and 10-month-old infants during free play. The results showed that the 5-month-old group exhibited a decline in heart rate (HR) that forward-predicted attentional behaviors, while the 10-month-old group exhibited increased theta power following shifts in gaze, indicating the start of a new attention episode. Additionally, this increase in theta power predicted the duration of infants' looking behavior.

      Strengths:<br /> The study's strengths lie in its utilization of advanced protocols and cutting-edge techniques to assess infants' neural activity and autonomic arousal associated with their attention patterns, as well as the extensive data coding and processing. Overall, the findings have important theoretical implications for the development of infant attention.

      Weaknesses:<br /> Certain methodological procedures require further clarification, e.g., details on EEG data processing. Additionally, it would be beneficial to rule out possible confounding factors and consider alternative interpretations, e.g., whether the differences observed between the two age groups were partly due to different levels of general arousal and engagement during the free play.

    4. Reviewer #3 (Public Review):

      Summary:<br /> Much of the literature on attention has focused on static, non-contingent stimuli that can be easily controlled and replicated--a mismatch with the actual day-to-day deployment of attention. The same limitation is evident in the developmental literature, which is further hampered by infants' limited behavioral repertoires and the general difficulty in collecting robust and reliable data in the first year of life. The current study engages young infants as they play with age-appropriate toys, capturing visual attention, cardiac measures of arousal, and EEG-based metrics of cognitive processing. The authors find that the temporal relations between measures are different at age 5 months vs. age 10 months. In particular, at 5 months of age, cardiac arousal appears to precede attention, while at 10 months of age attention processes lead to shifts in neural markers of engagement, as captured in theta activity.

      Strengths:<br /> The study brings to the forefront sophisticated analytical and methodological techniques to bring greater validity to the work typically done in the research lab. By using measures in the moment, they can more closely link biological measures to actual behaviors and cognitive stages. Often, we are forced to capture these measures in separate contexts and then infer in-the-moment relations. The data and techniques provide insights for future research work.

      Weaknesses:

      The sample is relatively modest, although this is somewhat balanced by the sheer number of data points generated by the moment-to-moment analyses. In addition, the study is cross-sectional, so the data cannot capture true change over time. Larger samples, followed over time, will provide a stronger test for the robustness and reliability of the preliminary data noted here. Finally, while the method certainly provides for a more active and interactive infant in testing, we are a few steps removed from the complexity of daily life and social interactions.

    1. eLife assessment

      This paper explores how Notch activity acts together with the homeodomain transcription factor Bsh to establish distinct cell fates (L4 vs L5) in the visual system of Drosophila. The findings are important and have theoretical or practical implications beyond a single subfield. The methods, data, and analyses are compelling and support the claims with only minor weaknesses.

    2. Reviewer #1 (Public Review):

      Like the "preceding" co-submitted paper, this is again a very strong and interesting paper in which the authors address a question that is raised by the finding in their co-submitted paper - how does one factor induce two different fates. The authors provide an extremely satisfying answer - only one subset of the cells neighbors a source of signaling cells that trigger that subset to adopt a specific fate. The signal here is Delta and the read-out is Notch, whose intracellular domain, in conjunction with, presumably, SuH cooperates with Bsh to distinguish L4 from L5 fate (L5 is not neighbored by signal-providing cells). Like the back-to-back paper, the data is rigorous, well-presented and presents important conclusions. There's a wealth of data on the different functions of Notch (with and without Bsh). All very satisfying.

      I again have one suggestion that the authors may want to consider discussing. I'm wondering whether the open chromatin that the authors convincingly measure is the CAUSE or the CONSEQUENCE of Bsh being able to activate L4 target genes. What I mean is that the authors currently seem to focus on a sequential model in which Notch signaling opens chromatin and this then enables Bsh to activate a specific set of target genes. But isn't it equally possible that the combined activity of Bsh/Notch(intra)/SuH opens chromatin? That's not a semantic or minor difference; it's a fundamentally different mechanism, I would think. This mechanism also solves the conundrum of specificity: how does Notch know which genes to "open" up? It would seem more intuitive to me that Notch works together with Bsh to open up chromatin, with chromatin accessibility then being a "mere" secondary consequence. If I'm not overlooking something fundamental here, there is actually a way to distinguish between these models: test chromatin accessibility in a Bsh mutant. If the authors' model is true, chromatin accessibility should be unchanged.

      I again finish by commending the authors for this terrific piece of work.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, the authors explore how Notch activity acts together with the Bsh homeodomain transcription factor to establish L4 and L5 fates in the lamina of the visual system of Drosophila. They propose a model in which differential Notch activity generates different chromatin landscapes in presumptive L4 and L5, allowing differential binding of the primary homeodomain TF Bsh (as described in the co-submitted paper), which in turn activates downstream genes specific to either neuronal type. The requirement of Notch for L4 vs. L5 fate is well supported, and complete transformation from one cell type into the other is observed when Notch activity is altered. However, the role of Notch in creating differential chromatin landscapes is not directly demonstrated. It rests only on correlation, although it remains a plausible and intriguing hypothesis.

      Strengths:<br /> The authors are successful in characterizing the role of Notch to distinguish between L4 and L5 cell fates. They show that the Notch pathway is active in L4 but not in L5. They identify L1, the neuron adjacent to L4 as expressing the Delta ligand, therefore being the potential source for Notch activation in L4. Moreover, the manuscript shows molecular and morphological/connectivity transformations from one cell type into the other when Notch activity is manipulated.

      Using DamID, the authors characterize the chromatin landscape of L4 and L5 neurons. They show that Bsh occupies distinct loci in each cell type. This supports their model that Bsh acts as a primary selector gene in L4/L5 that activates different target genes in L4 vs L5 based on the differential availability of open chromatin loci.

      Overall, the manuscript presents an interesting example of how Notch activity cooperates with TF expression to generate diverging cell fates. Together with the accompanying paper, it helps thoroughly describe how lamina cell types L4 and L5 are specified and provides an interesting hypothesis for the role of Notch and Bsh in increasing neuronal diversity in the lamina during evolution.

      Weaknesses:<br /> Differential Notch activity in L4 and L5:<br /> ● The manuscript focuses on describing Notch activity in L4 vs L5 neurons. However, the data presented make it very likely that the pool of progenitors (LPCs) is already subdivided into at least two types of progenitors that will give rise to L4 and L5, respectively. Evidence for this includes the activity of E(spl)-mɣ-GFP and the Dl puncta observed in the LPC region. The discussion should therefore address the possibility that Notch-induced differences between L4 and L5 predate the L1-expressed Dl that acts on newborn L4/L5 neurons, in which case the differences between L4 and L5 fates might be established earlier than the paper proposes. The authors should acknowledge this possibility and discuss it in their model.<br /> ● The authors claim that Notch activation is caused by L1-expressed Delta. However, they use an LPC driver to knock down Dl. Dl-KD should be performed exclusively in L1, and the fate of L4 should be assessed.<br /> ● To test whether L4 neurons are derived from NotchON LPCs, I suggest performing MARCM clones in early pupae with an E(spl)-mɣ-GFP reporter.<br /> ● The expression of different Notch targets in LPCs and L4 neurons could be explored further. I suggest using different Notch-activity reporters (e.g., E(spl)-GFP reporters) to characterize these differences in more detail. What causes the switch in Notch target expression from LPCs to L4 neurons should be a topic of discussion.

      Notch role in establishing L4 vs L5 fates:<br /> ● The authors report that 27G05-Gal4 causes a partial Notch gain of function because of its genomic location between Notch target genes. However, this is not elaborated further. The use of this driver is especially problematic when performing Notch KD, as many of the resulting neurons express Ap and therefore have some features of L4 neurons. Therefore, Pdm3+/Ap+ cells should always be counted as having an intermediate L4/L5 fate (i.e., Fig3 E-J, Fig3-Sup2), irrespective of the mechanism behind Ap activation. It is not accurate to assume their L5 identity. In Fig4, intermediate-fate cells are correctly counted as such.<br /> ● Lines 170-173: The temporal requirement for Notch activity in the L5-to-L4 transformation is not clearly delineated. In Fig4-figure supplement 1D-E, it is not stated whether the shift to 29°C is performed as in Fig4-figure supplement 1A-C.<br /> ● Additionally, using the same approach, it would be interesting to explore the window of competence for the Notch-induced L5-to-L4 transformation: at which point in L5 maturation can fate no longer be changed by Notch GoF?

      L4-to-L3 conversion in the absence of Bsh<br /> ● Although interesting, the L4-to-L3 conversion in the absence of Bsh is never shown to be dependent on Notch activity. Importantly, L3 NotchON status is assumed based on their position next to Dl-expressing L1, but it is not empirically tested. Perhaps screening Notch target reporter expression in the lamina, as suggested above, could inform this issue.<br /> ● Otherwise, the analysis of Bsh Loss of Function in L4 might be better suited to be included in the accompanying manuscript that specifically deals with the role of Bsh as a selector gene for L4 and L5.

      Different chromatin landscape in L4 and L5 neurons<br /> ● A major concern is that, although L4 and L5 neurons are shown to have different chromatin landscapes (as expected for different neuronal types), it is not demonstrated that this is caused by Notch activity. The paper proves unambiguously that Notch activity, in concert with Bsh, drives the fate choice between L4 and L5. However, the claim that this works through Notch creating a differential chromatin landscape rests only on correlation (NotchON cells having a different profile than NotchOFF cells). Although the authors are careful not to claim that differential chromatin opening is caused directly by Notch, this is heavily suggested throughout the text and must be toned down.<br /> e.g.: Line 294: "With Notch signaling, L4 neurons generate distinct open chromatin landscape" and Line 298: "Our findings propose a model that the unique combination of HDTF and open chromatin landscape (e.g. by Notch signaling)". These claims are not sufficiently supported, and alternative hypotheses should be provided in the discussion. One alternative hypothesis is that LPCs are already specified towards L4 and L5 fates. In this context, different early Bsh targets in each cell type could play a pioneer role in generating a differential chromatin landscape.

      ● The correlation of open chromatin and Bsh-bound loci with differentially expressed genes is much higher for L4 than for L5. It is not clear why this is the case, and the authors should discuss it further.

    1. eLife assessment

      This useful study presents a possible solution to a significant problem: the sensitivity of functional MRI to draining veins, which complicates the interpretation of laminar-fMRI results. The addition of a low diffusion-weighted gradient is presented as a way to remove the draining-vein signal and obtain functional responses with higher spatial fidelity. However, the strength of the evidence is inadequate, as most tests appear to have been performed in a single subject. Significance thresholds in the presented maps are very low, and most cortical depth-dependent response profiles do not differ from baseline, even in the BOLD data shown as reference. Curiously, even the BOLD group data fail to replicate the well-known pattern of draining towards the cortical surface.

    2. Reviewer #1 (Public Review):

      Summary:

      This study aims to provide imaging methods for users in the field of human layer-fMRI. This is an emerging field with 240 papers published so far. Contrary to what the manuscript implies, 3T is well represented among those papers; see, e.g., the papers below, which are not cited in the manuscript. Thus, the claimed impact of developing 3T methodology for wider dissemination is not justified, especially because some of the previous papers perform whole-brain layer-fMRI (also at 3T) with more efficient and more established procedures.

      The authors implemented a sequence with many nice features, including their own SMS EPI, diffusion bipolar pulses, and eye-saturation bands, and they built their own reconstruction around it. This is not trivial. Only a few labs around the world have this level of engineering expertise. I applaud this technical achievement. However, I doubt that any of this is the right tool for layer-fMRI, or that it represents an advancement for the field. In the thermal-noise-dominated regime of sub-millimeter fMRI (especially at 3T), it is established practice to use 3D readouts rather than 2D (SMS) readouts. While it is not trivial to implement SMS, the vendor implementations (as well as the CMRR and MGH implementations) are already used in the majority of current fMRI studies. The authors' work on this does not address any previous shortcomings in the field.

      The mechanism to use bi-polar gradients to increase the localization specificity is doubtful to me. In my understanding, killing the intra-vascular BOLD should make it less specific. Also, the empirical data do not suggest a higher localization specificity to me.

      Embedding this work in the literature of previous methods is incomplete. Recent trends of vessel signal manipulation with ABC or VAPER are not mentioned. Comparisons with VASO are outdated and incorrect.

      The reproducibility of the methods and the result is doubtful (see below).

      I don't think that this manuscript is in the top 50% of the 240 layer-fmri papers out there.

      3T layer-fMRI papers that are not cited:<br /> Taso, M., Munsch, F., Zhao, L., Alsop, D.C., 2021. Regional and depth-dependence of cortical blood-flow assessed with high-resolution Arterial Spin Labeling (ASL). Journal of Cerebral Blood Flow and Metabolism. https://doi.org/10.1177/0271678X20982382

      Wu, P.Y., Chu, Y.H., Lin, J.F.L., Kuo, W.J., Lin, F.H., 2018. Feature-dependent intrinsic functional connectivity across cortical depths in the human auditory cortex. Scientific Reports 8, 1-14. https://doi.org/10.1038/s41598-018-31292-x

      Lifshits, S., Tomer, O., Shamir, I., Barazany, D., Tsarfaty, G., Rosset, S., Assaf, Y., 2018. Resolution considerations in imaging of the cortical layers. NeuroImage 164, 112-120. https://doi.org/10.1016/j.neuroimage.2017.02.086

      Puckett, A.M., Aquino, K.M., Robinson, P.A., Breakspear, M., Schira, M.M., 2016. The spatiotemporal hemodynamic response function for depth-dependent functional imaging of human cortex. NeuroImage 139, 240-248. https://doi.org/10.1016/j.neuroimage.2016.06.019

      Olman, C.A., Inati, S., Heeger, D.J., 2007. The effect of large veins on spatial localization with GE BOLD at 3 T: Displacement, not blurring. NeuroImage 34, 1126-1135. https://doi.org/10.1016/j.neuroimage.2006.08.045

      Ress, D., Glover, G.H., Liu, J., Wandell, B., 2007. Laminar profiles of functional activity in the human brain. NeuroImage 34, 74-84. https://doi.org/10.1016/j.neuroimage.2006.08.020

      Huber, L., Kronbichler, L., Stirnberg, R., Ehses, P., Stocker, T., Fernández-Cabello, S., Poser, B.A., Kronbichler, M., 2023. Evaluating the capabilities and challenges of layer-fMRI VASO at 3T. Aperture Neuro 3. https://doi.org/10.52294/001c.85117

      Scheeringa, R., Bonnefond, M., van Mourik, T., Jensen, O., Norris, D.G., Koopmans, P.J., 2022. Relating neural oscillations to laminar fMRI connectivity in visual cortex. Cerebral Cortex. https://doi.org/10.1093/cercor/bhac154

      Strengths:

      See above. The authors developed their own SMS sequence with many features. This is important to the field, and it does not leave sequence development work to a few isolated monopoly labs. This work democratises SMS.<br /> The questions addressed here are highly relevant to the field: obtaining tools with good sensitivity, user-friendly applicability, and locally specific brain activity mapping is an important topic in layer-fMRI.

      Weaknesses:

      1. I feel the authors need to justify why flow crushing helps localization specificity. There is an entire family of recent papers that aim to achieve higher localization specificity by doing the exact opposite. Namely, MT or ABC fMRI aims to increase localization specificity by highlighting the intravascular BOLD signal through suppression of non-flowing tissue. To name a few:

      Priovoulos, N., de Oliveira, I.A.F., Poser, B.A., Norris, D.G., van der Zwaag, W., 2023. Combining arterial blood contrast with BOLD increases fMRI intracortical contrast. Human Brain Mapping hbm.26227. https://doi.org/10.1002/hbm.26227.

      Pfaffenrot, V., Koopmans, P.J., 2022. Magnetization Transfer weighted laminar fMRI with multi-echo FLASH. NeuroImage 119725. https://doi.org/10.1016/j.neuroimage.2022.119725

      Schulz, J., Fazal, Z., Metere, R., Marques, J.P., Norris, D.G., 2020. Arterial blood contrast (ABC) enabled by magnetization transfer (MT): a novel MRI technique for enhancing the measurement of brain activation changes. bioRxiv. https://doi.org/10.1101/2020.05.20.106666

      Based on this literature, it seems that the proposed method will make the vein problem worse, not better. The authors could make it clearer how they reason that making GE-BOLD signals more extra-vascular weighted should help to reduce large vein effects.

      The empirical evidence for the claim that flow crushing helps localization specificity should be made clearer. The response magnitude with and without flow crushing looks practically identical to me (see Fig. 6d).<br /> It's unclear to me what to look for in Fig. 5. I cannot discern any layer patterns in these maps; they are too noisy. The two maps at TE=43ms look like identical copies of each other. Maybe an editorial error?

      The authors discuss bipolar crushing with respect to SE-BOLD where it has been previously applied. For SE-BOLD at UHF, a substantial portion of the vein signal comes from the intravascular compartment. So I agree that for SE-BOLD, it makes sense to crush the intravascular signal. For GE-BOLD however, this reasoning does not hold. For GE-BOLD (even at 3T), most of the vein signal comes from extravascular dephasing around large unspecific veins, and the bipolar crushing is not expected to help with this.

      2. The bipolar crushing is limited to a single direction of flow. This introduces a lot of artificial variance across the cortical folding pattern, which is not mentioned in the manuscript. There is an entire family of papers that perform layer-fMRI with black-blood imaging and solve this with a 3D contrast preparation (VAPER) that is applied over a longer time period, thus nulling the blood signal while it flows in all directions through the vascular tree. Here, the signal crushing is applied with a 2D readout as "snapshot" crushing, which does not allow the blood to flow in multiple directions during the preparation.<br /> VAPER also accounts for BOLD contamination from larger draining veins by means of tag-control sampling. The approach proposed here does not account for this contamination.

      Chai, Y., Li, L., Huber, L., Poser, B.A., Bandettini, P.A., 2020. Integrated VASO and perfusion contrast: A new tool for laminar functional MRI. NeuroImage 207, 116358. https://doi.org/10.1016/j.neuroimage.2019.116358

      Chai, Y., Liu, T.T., Marrett, S., Li, L., Khojandi, A., Handwerker, D.A., Alink, A., Muckli, L., Bandettini, P.A., 2021. Topographical and laminar distribution of audiovisual processing within human planum temporale. Progress in Neurobiology 102121. https://doi.org/10.1016/j.pneurobio.2021.102121

      If I were to recommend that anyone perform layer-fMRI with blood crushing, it seems that VAPER would be the superior approach. The authors could make it clearer why users might want to use unidirectional crushing instead.

      3. The comparison with VASO is misleading.<br /> The authors claim that previous VASO approaches were limited to TRs of 8.2s. The authors would be well advised to check the literature of recent years.<br /> Koiso et al. performed whole-brain layer-fMRI VASO at 0.8mm with TRs of 3.9 seconds (with reliable activation), 2.7 seconds (with an unconvincing activation pattern, though), and 2.3 seconds (without activation).<br /> Also, whole-brain layer-fMRI BOLD at 0.5mm and 0.7mm has previously been performed by the Juelich group at TRs of 3.5s (their TR definition is 'fishy', though).

      Koiso, K., Müller, A.K., Akamatsu, K., Dresbach, S., Gulban, O.F., Goebel, R., Miyawaki, Y., Poser, B.A., Huber, L., 2023. Acquisition and processing methods of whole-brain layer-fMRI VASO and BOLD: The Kenshu dataset. Aperture Neuro 34. https://doi.org/10.1101/2022.08.19.504502

      Yun, S.D., Pais‐Roldán, P., Palomero‐Gallagher, N., Shah, N.J., 2022. Mapping of whole‐cerebrum resting‐state networks using ultra‐high resolution acquisition protocols. Human Brain Mapping. https://doi.org/10.1002/hbm.25855

      Pais-Roldan, P., Yun, S.D., Palomero-Gallagher, N., Shah, N.J., 2023. Cortical depth-dependent human fMRI of resting-state networks using EPIK. Front. Neurosci. 17, 1151544. https://doi.org/10.3389/fnins.2023.1151544

      The authors are correct that VASO is not advised as a turn-key method for lower brain areas, including the hippocampus and subcortex. However, the authors use this word of caution, which is intended for inexperienced "users", as a statement that this cannot be performed. This statement is taken out of context; it is not from the academic literature. It is advice for the 40+ user base that wants to perform layer-fMRI as a plug-and-play routine tool in neuroscience. In fact, sub-millimeter VASO is routinely performed by MRI physicists across all brain areas (including deep brain structures, the hippocampus, etc.). See, e.g., Koiso et al. and an overview lecture from a layer-fMRI workshop that I recently attended: https://youtu.be/kzh-nWXd54s?si=hoIJjLLIxFUJ4g20&t=2401

      Thus, the authors could place this phrasing in the context of the method they propose in the manuscript. For example, the authors could state whether they think their sequence has the potential to be disseminated across sites, considering that it requires slow offline reconstruction in MATLAB.
      Do the authors think that the results shown in Fig. 6c suggest turn-key acquisition of a routine mapping tool? In my humble opinion, they look like random noise, with most of the activation outside the ROI (in white matter).

      4. The repeatability of the results is questionable.
      The authors perform experiments on the robustness of the method (line 620). The corresponding results do not suggest any robustness to me. In fact, the layer profiles in Fig. 4c vs. Fig. 4d are completely opposite: the locations of peaks turn into locations of dips and vice versa.
      The methods are not described in enough detail to reproduce these results.
      The authors mention that their image reconstruction is done "using in-house MATLAB code" (line 634). They do not post a link to GitHub, nor do they say whether they share this code.

      It is not trivial to obtain good phase data for fMRI. The authors do not mention how they perform the respective coil combination.
      No data are shared for reproduction of the analysis.

      5. The application of NORDIC is not validated.
      Previous applications of NORDIC in 3T layer-fMRI have met with mixed success. When not adjusted for the right SNR regime, it can result in artifactual reductions of beta scores, depending on the SNR across layers. The authors could validate their application of NORDIC and confirm that the average layer profiles are unaffected by it. Also, the NORDIC version should be explicitly mentioned in the manuscript.
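      One way to run the requested check is to compare the average layer profile with and without NORDIC, separating a uniform change in amplitude from a depth-dependent change in shape (the artifact described above). The sketch below is a minimal illustration under stated assumptions: each profile is a list of mean beta estimates per cortical-depth bin, and the function `compare_profiles` and all numbers are hypothetical, not taken from the manuscript.

```python
from statistics import fmean


def compare_profiles(raw, denoised):
    """Compare two laminar profiles (one mean beta per depth bin).

    Returns (amplitude_ratio, max_shape_difference):
    - amplitude_ratio: mean(denoised) / mean(raw); a uniform scaling of the
      whole profile changes only this number.
    - max_shape_difference: largest absolute difference between the two
      profiles after each is divided by its own mean; a depth-dependent
      (e.g. SNR-dependent) change in the profile shows up here.
    """
    if len(raw) != len(denoised):
        raise ValueError("profiles must have the same number of depth bins")
    raw_mean, den_mean = fmean(raw), fmean(denoised)
    raw_norm = [r / raw_mean for r in raw]
    den_norm = [d / den_mean for d in denoised]
    shape_diff = max(abs(a - b) for a, b in zip(raw_norm, den_norm))
    return den_mean / raw_mean, shape_diff


# Hypothetical profiles, ordered from deep (left) to superficial (right) bins.
raw = [0.8, 1.2, 0.9, 1.3, 1.0]
uniform = [0.9 * x for x in raw]              # same shape, 10% lower amplitude
depth_dependent = [0.8, 1.2, 0.9, 1.0, 0.7]   # only superficial bins reduced

print(compare_profiles(raw, uniform))
print(compare_profiles(raw, depth_dependent))
```

      In this toy example, the uniformly scaled profile yields a shape difference of essentially zero, whereas the depth-dependent reduction yields a clearly non-zero one. In practice, a tolerance for "unaffected" would have to be justified against the across-run variability of the profiles.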

      Akbari, A., Gati, J.S., Zeman, P., Liem, B., Menon, R.S., 2023. Layer Dependence of Monocular and Binocular Responses in Human Ocular Dominance Columns at 7T using VASO and BOLD (preprint). Neuroscience. https://doi.org/10.1101/2023.04.06.535924

      Knudsen, L., Guo, F., Huang, J., Blicher, J.U., Lund, T.E., Zhou, Y., Zhang, P., Yang, Y., 2023. The laminar pattern of proprioceptive activation in human primary motor cortex. bioRxiv. https://doi.org/10.1101/2023.10.29.564658

    3. Reviewer #2 (Public Review):

      This study developed a setup for laminar fMRI at 3T that aimed to get the best from all worlds in terms of brain coverage, temporal resolution, sensitivity to detect functional responses, and spatial specificity. They used a gradient-echo EPI readout to facilitate sensitivity, brain coverage and temporal resolution. The former was additionally boosted by NORDIC denoising and the latter two were further supported by parallel-imaging acceleration both in-plane and across slices. The authors evaluated whether the implementation of velocity-nulling (VN) gradients could mitigate macrovascular bias, known to hamper the laminar specificity of gradient-echo BOLD.

      The setup allows for 0.9 mm isotropic acquisitions with large coverage at a reasonable TR (at least for block designs) and the fMRI results presented here were acquired within practical scan-times of 12-18 minutes. Also, in terms of the availability of the method, it is favorable that it benefits from lower field strength (additional time for VN-gradient implementation, afforded by longer gray matter T2*).

      The well-known double-peak feature in M1 during finger tapping was used as a test-bed to evaluate the spatial specificity. They were indeed able to demonstrate two distinct peaks in group-level laminar profiles extracted from M1 during finger tapping, which were largely free from superficial bias. This is rather intriguing as, even at 7T, clear peaks are usually only seen with spatially specific non-BOLD sequences. This is in line with their simple simulations, which nicely illustrated that, in theory, intravascular macrovascular signals should be suppressible with only minimal suppression of microvasculature when small b-values of the VN gradients are employed. However, the authors do not state how ROIs were defined, making the validity of this finding unclear: were they defined from independent criteria, or were they selected based on the region mostly expressing the double peak, which would clearly be circular? In any case, results are based on a very small sub-region of M1 in a single slice; it would be useful to see the generalizability of superficial-bias-free BOLD responses across a larger portion of M1.

      As repeatedly mentioned by the authors, a laminar fMRI setup must demonstrate adequate functional sensitivity to detect (in this case) BOLD responses. The sensitivity evaluation is unfortunately quite weak. It is mainly based on the argument that significant activation was found in a challenging sub-cortical region (LGN). However, it was a single participant, the activation map was not very convincing, and the demonstration of significant activation after considerable voxel-averaging is inadequate evidence to claim sufficient BOLD sensitivity. How well sensitivity is retained in the presence of VN gradients, high acceleration factors, etc., is therefore unclear. The ability of the setup to obtain meaningful functional connectivity results is reassuring, yet, more elaborate comparison with e.g., the conventional BOLD setup (no VN gradients) is warranted, for example by comparison of tSNR, quantification and comparison of CNR, illustration of unmasked-full-slice activation maps to compare noise-levels, comparison of the across-trial variance in each subject, etc. Furthermore, as NORDIC appears to be a cornerstone to enable submillimeter resolution in this setup at 3T, it is critical to evaluate its impact on the data through comparison with non-denoised data, which is currently lacking.

      The proposed setup might potentially be valuable to the field, which is continuously searching for techniques to achieve laminar specificity in gradient echo EPI acquisitions. Nonetheless, the above considerations need to be tackled to make a convincing case.

    4. Reviewer #3 (Public Review):

      Summary:
      The authors are looking for a spatially specific functional brain response that can be visualised non-invasively with 3T (clinical field strength) MRI. They propose a velocity-nulled weighting to remove the signal from draining veins in a submillimeter multiband acquisition.

      Strengths:
      - This manuscript addresses a real need in the cognitive neuroscience community interested in imaging responses in cortical layers in vivo in humans.
      - An additional benefit is the proposed implementation at 3T, a widely available field strength.

      Weaknesses:
      - Although the VASO acquisition is discussed in the introduction, the VN sequence seems closer to diffusion-weighted functional MRI. The authors should make clearer to the reader what the differences are and how the results are expected to differ. More generally, it is not clear why the introduction is so focused on VASO (which, curiously, lacks a reference to Lu et al. 2013). There are many more alternatives to BOLD-weighted imaging for fMRI: CBF-weighted ASL and GRASE have been around for a while, and ABC and double-SE have been proposed more recently.
      - The comparison in Figure 2 for different b-values shows % signal changes. However, as the baseline signal changes dramatically with added diffusion weighting, this is rather uninformative. A plot of t-values against cortical depth would be much more insightful.
      - Surprisingly, the % signal change for a b-value of 0 is not significantly different from 0 in the gray matter. This raises some doubts about the task or ROI definition. A finger-tapping task should reliably engage the primary motor cortex, even at 3T, and even in a single participant.
      - The BOLD-weighted images in Figure 3 show a very clear double-peak pattern. This contradicts the results in Figure 2 and is unexpected given the existing literature on BOLD responses as a function of cortical depth.
      - Given that the data in Figures 2, 3, and 4 are each derived from a single participant, order and attention effects might have dramatically affected the observed patterns. Especially for Figure 4, neither the BOLD nor the VN profiles are clearly different from 0, and without statistical values or inter-subject averaging, no conclusions can be drawn from them.
      - In Figure 5, phase regression is applied to the data presented in Figure 4. However, for phase regression to work, there has to be a (macrovascular) response to start with. As none of the responses in Figure 4 are significant for the single-participant dataset, phase regression should probably not have been undertaken. In this case, the functional 'responses' appear to increase with phase regression, which is counter-intuitive and deserves an explanation.
      - Consistency of responses is indeed expected to increase with the removal of the more variable vascular component. However, the microvascular component is always expected to be smaller than the combined microvascular + macrovascular response. Note that the use of % signal changes may obscure this effect somewhat because of the modified baseline. Another expected feature of BOLD profiles containing both micro- and macrovasculature is draining towards the cortical surface. In the profiles shown in Figure 7, this is completely absent. In the group data, no significant responses to the task are shown anywhere in the cortical ribbon.
      - Although I'd like to applaud the authors for their ambition with the connectivity analysis, I feel that acquisitions so SNR-starved that they fail to show a significant response to a motor task should not be used for brain-wide directed connectivity analysis.
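      The objection to % signal change when the baseline is modified can be made concrete with a toy calculation. The sketch below assumes thermal noise that does not depend on the diffusion weighting and uses the contrast-to-noise ratio (CNR) as a crude stand-in for a t-value; the function names and all numbers are hypothetical, chosen only to show that % signal change can rise while detectability falls when the baseline drops more than the response.

```python
def percent_signal_change(baseline: float, delta: float) -> float:
    """Task-induced signal change expressed relative to the baseline (%)."""
    return 100.0 * delta / baseline


def contrast_to_noise(delta: float, noise_sd: float) -> float:
    """Task-induced signal change relative to the noise SD (crude proxy for t)."""
    return delta / noise_sd


NOISE_SD = 5.0  # assumed thermal noise, identical for both acquisitions

# Hypothetical values: velocity nulling lowers the baseline more than the response.
conditions = {
    "b = 0": {"baseline": 1000.0, "delta": 10.0},
    "velocity-nulled": {"baseline": 400.0, "delta": 6.0},
}

for name, c in conditions.items():
    psc = percent_signal_change(c["baseline"], c["delta"])
    cnr = contrast_to_noise(c["delta"], NOISE_SD)
    print(f"{name}: PSC = {psc:.2f}%, CNR = {cnr:.2f}")
```

      With these made-up numbers, the % signal change rises from 1.00% to 1.50% while the CNR falls from 2.00 to 1.20, which is why a depth profile of t-values (or CNR) is the more informative comparison across b-values.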

      The claim of specificity is supported by the observation of the double-peak pattern in the motor cortex, previously shown in multiple non-BOLD studies. However, the same pattern appears in some of the BOLD-weighted data, which suggests that the double-peak pattern is not solely due to the added velocity-nulling gradients. In addition, the well-known draining towards the cortical surface is not replicated for the BOLD-weighted data in Figures 3, 4, or 7. This casts some doubt on whether the data actually have the SNR needed to draw conclusions about the observed patterns.

    1. eLife assessment

      This important study combines psychophysics, fMRI, and TMS to reveal a causal role of FEF in generating an attention-induced ocular dominance shift, with potential relevance for clinical applications. The evidence supporting the claims of the authors is solid, but the theoretical and mechanistic interpretation of results and experimental approaches need to be strengthened. The work will be of broad interest to perceptual and cognitive neuroscience.

    2. Reviewer #1 (Public Review):

      Summary:
      Based on a "dichoptic-background-movie" paradigm that modulates ocular dominance, the present study combines fMRI and TMS to examine the role of the frontoparietal attentional network in ocular dominance shifts. The authors claimed a causal role of FEF in generating the attention-induced ocular dominance shift.

      Strengths:
      A combination of fMRI, TMS, and "dichoptic-background-movie" paradigm techniques is used to reveal the causal role of the frontoparietal attentional network in ocular dominance shifts. The conclusions of this paper are mostly well supported by data.

      Weaknesses:
      The relationship between eye dominance, eye-based attention shift, and cortical functions remains unclear and merits further delineation. The rationale of the experimental design related to the hemispheric asymmetry in the FEF and other regions should be clarified.

      Theoretically, how the eye-related functions of this area could be achieved, and how this area interacts with the ocular representation in V1, warrant further clarification.

    3. Reviewer #2 (Public Review):

      Summary
      Song et al. investigate the role of the frontal eye field (FEF) and the intraparietal sulcus (IPS) in mediating the shift in ocular dominance (OD) observed after a period of dichoptic stimulation during which attention is selectively directed to one eye. This manipulation has previously been found to transiently shift OD in favor of the unattended eye, similar to the effect of short-term monocular deprivation. To this end, the authors combine psychophysics, fMRI, and transcranial magnetic stimulation (TMS). In the first experiment, the authors determine the regions of interest (ROIs) based on the responses recorded by fMRI during either dichoptic or binocular stimulation, showing selective recruitment of the right FEF and IPS during the dichoptic condition, in line with the involvement of eye-based attention. In a second experiment, the authors investigate the causal role of these two ROIs in mediating the OD shift observed after a period of dichoptic stimulation by selectively inhibiting them with TMS (using continuous theta burst stimulation, cTBS) before the adaptation period (50 min exposure to dichoptic stimulation). They show that, when cTBS is delivered to the FEF, but not the IPS or the vertex, the shift in OD induced by dichoptic stimulation is reduced, indicating a causal involvement of the FEF in mediating this form of short-term plasticity. A third control experiment rules out the possibility that TMS interferes with the OD task (binocular rivalry) rather than with the plasticity mechanisms. From this evidence, the authors conclude that the FEF is one of the areas mediating the OD shift induced by eye-selective attention.

      Strengths
      1. The experimental paradigm is sound and the authors have thoroughly investigated the neural correlates of an interesting form of short-term visual plasticity combining different techniques in an intelligent way.

      2. The results are solid and the appropriate controls have been performed to exclude potential confounds.

      3. The results are very interesting, providing new evidence both about the neural correlates of eye-based attention and the involvement of extra-striate areas in mediating short-term OD plasticity in humans, with potential relevance for clinical applications (especially in the field of amblyopia).

      Weaknesses
      1. Ethics: more details about the ethics need to be included in the manuscript. It is only mentioned for experiment 1 that participants "provided informed consent in accordance with the Declaration of Helsinki. This study was approved by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences". (Which version of the Declaration of Helsinki? The latest version requires the pre-registration of the study. The code of the approved protocol, together with the code and date of the approval, should be provided.) There is no mention of informed consent procedures or ethics approval for the TMS experiments. This is a huge concern, especially for brain stimulation experiments!

      2. Statistics: the methods section should include a sub-section describing in detail all the statistical analyses performed for the study. Moreover, in the results section, statistical details should be added to support the fMRI results. In the current version of the manuscript, the claims are not supported by statistical evidence.

      3. Interpretation of the results: the TMS results are very interesting and convincing regarding the involvement of the FEF in the build-up of the OD shift induced by dichoptic stimulation, however, I am not sure that the authors can claim that this effect is related to eye-based attention, as cTBS has no effect on the blob detection task during dichoptic stimulation. If the FEF were causally involved in eye-based attention, one would expect a change in performance in this task during dichoptic stimulation, perhaps a similar performance for the unattended and attended eye. The authors speculate that the sound could have an additional role in driving eye-based attention, which might explain the lack of effect for the blob discrimination task, however, this hypothesis has not been tested.

      4. Writing: in general, the manuscript is well written, but clarity should be improved in certain sections.

      a. fMRI results: the first sentence is difficult to understand at first read, but it is crucial to understand the results, please reformulate and clarify.

      b. Experiment 3: the rationale for experiment one should be straightforward, without a long premise explaining why it would not be necessary.

      c. Discussion: the language is somewhat informal in places; a more straightforward style should be preferred (one example: p. 19, second paragraph).

      5. Minor: the authors might consider using the term "participant" or "observer" instead of "subject" when referring to the volunteers who participated in the study.

    4. Reviewer #3 (Public Review):

      Summary:
      This study investigated the neural mechanisms underlying the shift in ocular dominance induced by "dichoptic-backward-movie" adaptation. The study is self-consistent.

      Strengths:
      The experimental design is solid and progressive (with a clear relationship among the three studies), and all of the research questions raised were well answered.

      The logic behind the neural mechanisms is solid.

      The findings regarding cTMS (especially the stimulation position/site) could be useful for future medical applications.

      Weaknesses:
      Why does the "dichoptic-backward-movie" adaptation matter? This part is severely lacking. This kind of adaptation is neither intuitive, like classical (Gibson) visual adaptation, nor established as a practical research paradigm or as a fundamental neural mechanism. If this is not clearly stated and discussed, the study is merely self-consistent with respect to its own research question. There are many "cool" phenomena whose neural mechanisms seem apparent (e.g., "FEF controls visual attention") but have never been tested using TMS and fMRI, yet we all know that this kind of research has only incremental implications.

    1. eLife assessment

      This important study presents a detailed investigation of the early development of cardiac and respiratory interoceptive sensitivity in infants aged 3, 9, and 18 months. The evidence supporting the conclusions is solid and based on convincing statistical analyses, despite the limited sample size for the younger and older age groups. This study will be of significant interest to developmental psychologists and neuroscientists working on interoception and its influence on socio-cognitive development.

    2. Reviewer #1 (Public Review):

      Summary:
      The authors of this study investigated the development of interoceptive sensitivity in the context of cardiac and respiratory interoception in 3-, 9-, and 18-month-old infants using a combination of both cross-sectional and longitudinal designs. They utilised the cardiac interoception paradigm developed by Maister et al (2017) and also developed a new paradigm to investigate respiratory interoception in infants. The main findings of this research are that 9-month-old infants displayed a preference for stimuli presented synchronously with their own heartbeat and respiration. The authors found less reliable effects in the 18-month-old group, and this was especially true for the respiratory interoceptive data. The authors replicated a visual preference for synchrony over asynchrony for the cardiac domain in 3-month-old infants, while they found inconclusive evidence regarding the respiratory domain. Considering the developmental nature of the study, the authors also investigated the presence of developmental trajectories and associations between the two interoceptive domains. They found evidence for a relationship between cardiac and respiratory interoceptive sensitivity at 18 months only and preliminary evidence for an increase in respiratory interoception between 9 and 18 months.

      Strengths: The conclusions of this paper are mostly well supported by data, and the data analysis procedures are rigorous and well-justified. The main strengths of the paper are:
      - A first attempt to explore the association between two different interoceptive domains. How different organ-specific axes of interoception relate to each other is still an open question, and exploring this from a developmental lens can help shed light on possible relationships. The authors are to be commended for developing novel interoceptive tasks aimed at assessing respiratory interoceptive sensitivity in infants and toddlers, and for trying to assess the relationship between cardiac and respiratory interoception across developmental time.
      - A thorough justification of the developmental ages selected for the study. The authors provide a rationale for their choice to examine interoceptive sensitivity at 3, 9, and 18 months of age, which is well justified by the literature on self- and social development. At times, I wondered whether explaining the link between these self and social processes and interoception would have been beneficial, as a reader not familiar with the topics may miss the point.
      - An explanation of the direction of looking behaviour using latent curve analysis. I found this additional analysis extremely helpful in providing a better understanding of the data based on previous research and analytical choices (though see comment under weaknesses). As the authors explain in the manuscript, it is often difficult to interpret the direction of infant looking behaviour, as novelty and familiarity preferences can also be driven by hidden confounders (e.g. task difficulty). The authors provide some evidence that analytical choices can explain some of these effects. Beyond the field of interoception, these findings will be relevant to developmental psychologists and will inform future studies using looking time as a measure of infants' ability to discriminate among stimuli.
      - The use of simulation analysis to account for the small sample size. The authors acknowledge that some of the effects reported in their study could be explained by a small sample size (i.e. the 3-month-old and 18-month-old data). Using a simulation approach, the authors try to overcome some of these limitations and provide convincing evidence of interoceptive abilities in infancy and toddlerhood (but see also my next point).

      Weaknesses:
      - The authors should carefully address the potential confound of not counterbalancing the conditions of the first trial in both interoceptive tasks for the 9-month and 18-month age groups. The results of these groups could indeed be driven by having seen the synchronous trial first.
      - The conclusion that cardiac interoception remains stable across infancy is not fully warranted by the data. Given the small sample size of 18-month-old toddlers included in the final analyses, it might be misleading to state this without including the caveat that the study may be underpowered. In other words, the small sample size could explain the direction of the results for this age group.

    3. Reviewer #2 (Public Review):

      Summary:
      This study by Tünte et al. investigated the development of interoceptive sensitivity in the first two years of life, focusing specifically on cardiac and respiratory sensitivity in infants aged 3, 9, and 18 months. The research employed a previously developed experimental paradigm in the cardiac domain and adapted it for a novel paradigm in the respiratory domain. This approach assessed infants' cardiac and respiratory sensitivity based on their preferential-looking behavior toward visuo-auditory stimuli displayed on a monitor, which moved either in sync or out of sync with the infants' own heartbeats or breathing. The results for the cardiac domain showed that infants, across all age groups, preferred stimuli moving synchronously rather than asynchronously with their heartbeat, suggesting the presence of cardiac sensitivity as early as 3 months of age. However, it is noteworthy that the direction of this preference contradicts a previous study, which found that 5-month-old infants looked longer at stimuli moving asynchronously, rather than synchronously, with their heartbeat (Maister et al., 2017). In the respiratory domain, only the younger age group(s) of infants showed a preference for stimuli presented synchronously with their breathing, unlike the 18-month-olds. The authors conducted various statistical analyses to thoroughly examine the obtained data, an effort that provides deeper insights and is valuable for future research in this field.

      Strengths:
      Few studies have explored the early development of interoception, making the replication of the original study by Maister et al. (2017) particularly valuable. Beyond replication, this study expands the investigation into the respiratory domain, significantly enhancing our understanding of interoceptive development. The provision of longitudinal and cross-sectional data from infants at 3, 9, and 18 months of age is instrumental in understanding their developmental trajectory.

      Weaknesses:
      (1) My primary concern is that this study did not counterbalance the conditions of the first trial in both iBEAT and iBREATH tests for the 9-month and 18-month age groups. In these tests, the first trial invariably involved a synchronous stimulus. I believe that the order of trials can significantly influence an infant's looking duration, and this oversight could potentially impact the results, especially where a marked preference for synchronous stimuli was observed among infants.
      (2) The analysis indicated that the study's sample size was too small to effectively assess the effects within each age group. This limitation fundamentally undermines the reliability of the findings.
      (3) The authors attribute the infants' preferential-looking behavior solely to the effects of familiarity and novelty. However, the meaning of "familiarity" in relation to external stimuli moving in sync with an infant's heartbeat or breathing is not clearly defined. A deeper exploration of the underlying mechanisms driving this behavior, such as from the perspectives of attention and perception, is necessary.

    1. eLife assessment

      This study presents a valuable framework and findings for our understanding of the brain as a fractal object, by observing the stability of its shape property within 11 primate species. Although the framework is well-detailed, the evidence presented is incomplete and would be strengthened by additional analyses to support the authors' claims, particularly on the effects of aging and on the interpretation of links between brain shape and the underlying anatomy. This study will be of interest to neuroscientists interested in brain morphology, and to physicists and mathematicians interested in modeling the shapes of complex objects.

    2. Reviewer #1 (Public Review):

      This study examined a universal fractal primate brain shape. However, the paper does not seem well structured and is not well written. It is not clear what the purpose of the paper is, and there is a lack of explanation for why the proposed analysis is necessary. As a result, it is challenging to understand what the novelty of the paper is and what the main findings are. Additionally, several terms are introduced without adequate explanation and contextualization, further complicating comprehension. Does the second section, "2. Coarse-graining procedure", serve as an introduction or a method? Moreover, the rationale behind the use of the coarse-graining procedure is not adequately elucidated. Overall, it is strongly recommended that the paper undergo significant improvements in structure, explanatory depth, and overall clarity to enhance its comprehensibility.

    3. Reviewer #2 (Public Review):

      In this manuscript, Wang and colleagues analyze the shapes of cerebral cortices from several primate species, including subgroups of young and old humans, to characterize commonalities in patterns of gyrification, cortical thickness, and cortical surface area. The work builds on the scaling law introduced previously by co-author Mota, and Herculano-Houzel. The authors state that the observed scaling law shares properties with fractals, where shape properties are similar across several spatial scales. One way the authors assess this is to perform a "cortical melting" operation that they have devised on surface models obtained from several primate species. The authors also explore differences in shape properties between the brains of young (~20 year old) and old (~80) humans. My main criticism of this manuscript is that the findings are presented in too abstract a manner for the scientific contribution to be recognized.
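      For readers unfamiliar with the scaling law referred to here: Mota and Herculano-Houzel reported that the total (pial) cortical area A_t, the average cortical thickness T, and the exposed area A_e approximately obey A_t * T^(1/2) = k * A_e^(5/4) across species. Taking logarithms turns this into a single quantity, K = log A_t + (1/2) log T - (5/4) log A_e, which equals log k whenever the law holds. The sketch below assumes that this is the form of the shape parameter K used in the manuscript (an assumption; the exact definition, log base, and units there may differ) and checks one property that makes K a measure of shape rather than size: it is unchanged when a brain is uniformly rescaled. The function name and numbers are hypothetical.

```python
import math


def shape_parameter_k(total_area: float, thickness: float, exposed_area: float) -> float:
    """Assumed form K = log10(A_t) + 0.5*log10(T) - 1.25*log10(A_e).

    Equals log10(k) whenever A_t * T**0.5 == k * A_e**1.25.
    """
    return (
        math.log10(total_area)
        + 0.5 * math.log10(thickness)
        - 1.25 * math.log10(exposed_area)
    )


# Hypothetical brain (areas in mm^2, thickness in mm).
a_t, t, a_e = 2000.0, 2.5, 800.0
k_original = shape_parameter_k(a_t, t, a_e)

# Uniform rescaling by a factor s: areas scale by s**2, thickness by s.
# The exponents cancel (2 + 0.5 - 1.25 * 2 = 0), so K should not change.
s = 3.0
k_scaled = shape_parameter_k(s**2 * a_t, s * t, s**2 * a_e)
print(k_original, k_scaled)  # equal up to floating-point rounding
```

      Because K is dimensionless in this form, changes in K across coarse-graining scales reflect changes in the relation between folding, thickness, and exposed area, not merely changes in overall brain size.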

      1. The series of operations to coarse-grain the cortex illustrated in Figure 1 constitutes a novel procedure, but it is not strongly motivated, and it produces image segmentations that do not resemble real brains. The process to assign voxels in downsampled images to cortex and white matter is biased towards the former, as only 4 corners of a given voxel need to intersect the original pial surface, whereas all 8 corners are needed for a voxel to be assigned to white matter (section S2). This causes the cortical segmentation, such as the bottom row of Figure 1B, to increase in thickness with successive melting steps, to unrealistic values. In the rightmost panel, the cortex consists of several voxels 4.9 mm on a side and is thus >2 cm thick. A structure with these morphological properties is not consistent with the anatomical organization of a typical mammalian neocortex.
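      The asymmetry described in point 1 can be illustrated with a toy version of the labeling rule. The sketch below assumes, following one reading of the description of section S2, that a coarse voxel is labeled white matter only if all 8 of its corners lie inside the white-matter surface, and cortex if at least 4 of its corners lie inside the pial surface. The function `label_voxel` and its boolean inputs are hypothetical simplifications, not the authors' code.

```python
from typing import Sequence


def label_voxel(inside_pial: Sequence[bool], inside_white: Sequence[bool]) -> str:
    """Label one coarse-grained voxel from tests on its 8 corners.

    Hypothetical rule: white matter requires all 8 corners inside the white
    surface; otherwise cortex requires at least 4 corners inside the pial
    surface; anything else is outside the brain.
    """
    if len(inside_pial) != 8 or len(inside_white) != 8:
        raise ValueError("expected one test per voxel corner (8)")
    if all(inside_white):
        return "white matter"
    if sum(inside_pial) >= 4:
        return "cortex"
    return "outside"


# A voxel lying almost entirely in white matter (7 of 8 corners) but fully
# inside the pial surface is still labeled cortex.
print(label_voxel([True] * 8, [True] * 7 + [False]))
# A voxel only half inside the pial surface is also labeled cortex.
print(label_voxel([True] * 4 + [False] * 4, [False] * 8))
```

      Under this toy rule, boundary voxels on both sides of the cortical ribbon are labeled cortex, so the cortical label can only grow as voxels get larger, which is consistent with the progressive thickening described above.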

      2. For the comparison between 20-year-old and 80-year-old brains, a well-documented difference is that the older age group has more cerebrospinal fluid due to tissue atrophy, and the distances between the walls of gyri become greater. This difference is borne out in the left column of Figure 4c. It seems this additional spacing between gyri in 80-year-olds requires more extensive down-sampling (larger scale values in Figure 4a) to achieve a shape parameter K similar to that of the 20-year-olds. A case could be made that the familiar way of describing brain tissue (cortical volume, white matter volume, thickness, etc.) is a more direct and intuitive way to describe differences between young and old adult brains than the obscure shape metric described in this manuscript. At a minimum, a demonstration of an advantage of the Figure 4a and 4b analyses over current methods for interpreting age-related differences would be valuable.

      3. In Discussion lines 199-203, it is stated that self-similarity, operating on all length scales, should be used as a test for existing and future models of gyrification mechanisms. First, the authors do not show (and it would be surprising if it were true) that self-similarity is observed at length scales smaller than the resolution of the acquired MRI data for any of the datasets analyzed. The analysis is restricted to coarse-graining, not fine-graining. Therefore, self-similarity on all length scales would seem to be too strong a constraint. Second, it is hard to imagine how this test could be used in practice. Specific examples of how gyrification mechanisms support or fail to support the generation of self-similarity across length scales would strengthen the authors' argument.

      Some additional, specific comments are as follows:

      4. The definition of the term A_e as the "exposed surface" was difficult to follow at first. It might be helpful to state that this parameter is operationally defined as the convex hull surface area. Also, for the pial surface, A_t, several researchers advocate instead for analyzing the surface area of a cortical mid-thickness surface, as the pial surface area is subject to bias depending on the gyrification index and the shape of the gyri. It would be helpful to know whether the same results are obtained from mid-thickness surfaces.

      5. In Figure 2c, the surfaces get smaller as the coarse-graining increases, making it impossible to visually assess the effects of coarse-graining on the shapes. Why aren't all cortical models shown at the same scale?

      6. The text in Section 3.2 emphasizes that K is invariant with scale (horizontal lines in Figure 3) and asserts that this is important for the formation of all cortices. However, unless I am mistaken, K appears to vary with scale in Figure 4a, and the text indicates that differences in the S dependence are important for distinguishing young from old brains. Is this an inconsistency?
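      To make the labeling asymmetry raised in point 1 concrete, the rule can be sketched as below. This is a hypothetical illustration of the thresholds as the review paraphrases section S2 (at least 4 of 8 corners for cortex, all 8 for white matter), not the authors' code; the function name, the string labels, and the reading of "intersect" as "the corner lies inside the surface" are all assumptions.

```python
def classify_voxel(corners_inside_pial, corners_inside_white):
    """Hypothetical voxel-labeling rule following the thresholds paraphrased in the review.

    Each argument is a sequence of 8 booleans saying whether the corresponding
    voxel corner lies inside the pial surface or the white-matter surface.
    """
    if len(corners_inside_pial) != 8 or len(corners_inside_white) != 8:
        raise ValueError("a voxel has exactly 8 corners")
    if all(corners_inside_white):
        return "white_matter"  # strict threshold: every corner must be inside
    if sum(corners_inside_pial) >= 4:
        return "cortex"        # lenient threshold: half of the corners suffice
    return "outside"


if __name__ == "__main__":
    # 7 of 8 corners lie inside the white-matter surface (and hence inside the
    # pial surface), yet the voxel is labeled cortex under this rule.
    print(classify_voxel([True] * 8, [True] * 7 + [False]))  # cortex
```

      Under such a rule, every voxel straddling the white-matter boundary falls to cortex, which is one way the segmented cortex could thicken with each melting step.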

    4. Reviewer #3 (Public Review):

      Summary:

      Through a detailed methodology, the authors demonstrated that, across 11 different primates, the shape of the brain matched a fractal of dimension 2.5. They strengthened the generality of this result by showing that it agrees with a previous study of 70 mammalian brains and disagrees with other folded objects that are not brains. They incidentally illustrated a potential application of this fractal property of the brain by observing a scale-dependent effect of aging on the human brain.

      Strengths:

      - New hierarchical way of expressing cortical shapes at different scales, derived from the previous report through the implementation of a coarse-graining procedure.
      - Positioning of results in comparison to previous works, reinforcing the validity of the observation.
      - Illustration of the scale dependence of the effects of brain aging in humans.

      Weaknesses:

      - The impact of the contribution should be clarified compared to previous studies (implementation of the new coarse-graining procedure, dimensionality of primate brains vs. previous studies, and brain aging observations).
      - The rather small sample sizes, counterbalanced by the strength of the effect demonstrated.
      - The use of either averaged or individual brains for the different sub-studies could be made clearer.
      - The model discussed hypothetically in the Discussion is not very clear, and may not be state-of-the-art (axonal tension driving cortical folding? cf. https://doi.org/10.1115/1.4001683).

    5. Author Response:

      We thank all reviewers for their comments and effort to improve our paper. We appreciate that the writing can be clarified overall, and some sections need more elaboration. We will provide these in the next revision within the coming months. Particularly, we will focus on some common themes identified by all reviewers:

      1. We will clarify that the coarse-grained brain surfaces are an output of our algorithm alone and are not to be directly or naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our analysis focuses purely on the similarity in whole-brain morphometrics between actual brains and coarse-grained brains. Specifically, on the point of “thickening” of the brain: this is anatomically well-founded, as less folded brains have a “thicker” cortex than more folded brains when they are all normalised to the same size. This is fundamentally why the universal scaling law also applies to these coarse-grained brains. We will provide more detail to highlight this.

      2. We will clarify the motivation behind our coarse-graining procedure better: mathematically, this is directly inspired by box-counting algorithms in fractal geometry; but this algorithm also has elegant parallels with other algorithms which we will highlight.

      3. The age effects are demonstrated here in a small sample as a proof-of-principle, but we will update our latest results using ~100 subjects from the CamCAN data demonstrating the same effect. We have additionally described and verified these age effects in more detail in a separate preprint (https://arxiv.org/abs/2311.13501) with ~1500 subjects, and additionally showed that scale-dependent metrics substantially improve understanding and applications such as brain age prediction.

      4. We have also independently received feedback that we need to clarify how our method interacts with different resolutions of the original MRI. We will add this as a new set of results, demonstrating that the MRI acquisition resolution (within a reasonable range) has a very small effect, as our method takes the reconstructed surfaces as a starting point.

      5. We agree that it may be confusing to emphasise a constant K in the first set of results across species, and then later highlight a changing K in the human ageing results. We will clarify that in the first set of results, we find a “constant” K relative to a changing S: the range in K across melted primate brains is approximately 0.1, whereas in S it is over 1.2. In other words, changes in S are an order of magnitude larger than changes in K. Hence, we described K as “constant” relative to S. Nevertheless, K shows subtle changes within individuals, which is what we describe in the human ageing results. These changes are within the range of K values described in the across-species results.

      6. Finally, we will also make sure to summarise our specific contributions beyond existing work:

        (i) Showing for the first time that representative primate species follow the exact same fractal scaling – as opposed to previous work showing that they have a similar fractal dimension, i.e. slope, but not necessarily the same offset, as previous methods had no consistent way of comparing offsets.

        (ii) Previous work could also not show direct agreement in morphometrics between the coarse-grained brains of primate species and other non-primate mammalian species.

        (iii) Demonstrating in proof-of-principle that multiscale morphometrics, in practice, can have much larger effect sizes for classification applications. This moves beyond our previous work where we only showed the scaling law across and within species, but all on one (native) scale with comparable effect sizes for classification applications.
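      Point 2 above states that the coarse-graining is directly inspired by box-counting algorithms in fractal geometry. For readers unfamiliar with that technique, a minimal textbook box-counting estimate on a 2-D binary image is sketched below. It is not the authors' 3-D surface-melting procedure; the function names, the 2-D setting, and the choice of box sizes are assumptions made for illustration.

```python
import numpy as np


def box_count(mask: np.ndarray, box_size: int) -> int:
    """Number of box_size x box_size boxes that contain at least one occupied pixel."""
    h, w = mask.shape
    count = 0
    for i in range(0, h, box_size):
        for j in range(0, w, box_size):
            if mask[i:i + box_size, j:j + box_size].any():
                count += 1
    return count


def box_counting_dimension(mask: np.ndarray, box_sizes) -> float:
    """Estimate D from N(s) ~ s**(-D) with a least-squares line in log-log space."""
    counts = [box_count(mask, s) for s in box_sizes]
    slope, _intercept = np.polyfit(np.log(box_sizes), np.log(counts), 1)
    return -slope


if __name__ == "__main__":
    sizes = [1, 2, 4, 8, 16]
    square = np.ones((64, 64), dtype=bool)  # filled 2-D region
    line = np.zeros((64, 64), dtype=bool)
    line[32, :] = True                      # a straight 1-D curve
    print(round(box_counting_dimension(square, sizes), 3))  # 2.0
    print(round(box_counting_dimension(line, sizes), 3))    # 1.0
```

      The slope of log N(s) against log s recovers the familiar dimensions of simple shapes; a folded surface would fall between these integer values, which is the sense in which a cortical dimension of about 2.5 is fractal.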

    1. eLife assessment

      In the last 15 years, genome-wide association studies (GWAS) have served to estimate the association between genome-wide common variants and a large number of disparate traits and diseases in humans. This valuable method provides a new way to find correlations between the genetic component of a phenotype of interest and this wealth of genetic information. The software adds a new tool for investigating genetic correlation between traits, generating new mechanistic hypotheses, and dissecting the role of the observed associations in disease heterogeneity. The results of the application of their method are solid and generally agree with what others have seen using similar AD and UKB data.

    2. Reviewer #1 (Public Review):

      The major aim of the paper was a method for determining genetic associations between two traits using common variants tested in genome-wide association studies. The work includes a software implementation and application of their approach. The results of the application of their method generally agree with what others have seen using similar AD and UKB data.

      The paper has several distinct portions. The first is a method for testing genetic associations between two or more traits using genome-wide association test statistics. The second is a Python implementation of the method. The last portion is the results of their method using GWAS from AD and the UK Biobank.

      Regarding the method, it appears to have similarities to LDSC, and it is not clear how it differs from LDSC or other similar methods. The implementation used Python 2.7 (or at least was reportedly tested with that version), which was retired in 2020. The implementation was committed between Wed Oct 3 15:21:49 2018 and Mon Jan 28 09:18:09 2019, using data that existed at the time, so it was somewhat surprising that it used Python 2.7, since that version was originally scheduled for end-of-life in 2015. Moreover, attempting to run the package resulted in unmet dependency errors, which I think are related to an internal package not being installed. I would expect published software to be installable using the standard tooling for the language and, ideally, to have automated tests of its key components.

      Regarding the main results, they largely reproduce what others have shown using the same or similar data, which adds prima facie validity to the work. The portions of the work dealing with AD subgroups, pathology, biomarkers, and cognitive traits of interest. I was puzzled that the authors expressed surprise that parental history and high cholesterol were not associated with MCI or cognitive composite scores, since this would seem to be the likely consequence of how the WRAP cohort was selected. The Discussion paragraph that starts "What's more, environmental factors may play a big role in the identified associations." confused me. I think the authors are referring to how selection, especially in a biobank dataset, can induce correlations, which is not what I think of as an environmental effect.

      Overall, the work has merit, but I am left without a clear impression of the improvement of this approach over similar methods. Likewise, the results are interesting, but similar findings have already been described using the data used in this study, which were over 5 years old at the time of this review.

    3. Reviewer #2 (Public Review):

      Summary:

      Yan, Hu, and colleagues introduce BADGERS, a new method for biobank-wide scanning to find associations between a phenotype of interest and the genetic component of a battery of candidate phenotypes. Briefly, BADGERS capitalizes on publicly available weights of genetic variants for a myriad of traits to estimate polygenic risk scores for each trait, and then identifies associations with the trait of interest. Of note, the method works using summary statistics for the trait of interest, which is especially beneficial in population-based cohorts that are not enriched for any particular phenotype (i.e., with few actual cases of the phenotype of interest).

      Here, they apply BADGERS to Alzheimer's disease (AD) as the trait of interest and a battery of circa 2,000 phenotypes with publicly available precalculated genome-wide summary statistics from the UK Biobank. Running it on two AD cohorts, they discover at least 14 significant associations between AD and other traits. These include expected associations with dementia, cognition (educational attainment), and phenotypes related to socioeconomic status. Through multivariate modelling, they distinguish (1) clearly independent components associated with AD from (2) by-product associations that are inflated in the original bivariate analysis. Analyses stratified by inclusion of the APOE region show that this region does not seem to play a role in the association of some of the identified phenotypes. Notably, the associations identified with BADGERS overlap with, but differ significantly from, those identified with Mendelian randomization (MR), hinting that BADGERS is more powerful than classical MR approaches based on top variants. They then extend BADGERS to other AD-related phenotypes, which serves to refine hypotheses about the mechanisms underlying the genetic correlation patterns originally identified for AD. Finally, they run BADGERS on a preclinical cohort with mild cognitive impairment. They observe important differences in the association patterns, suggesting that this preclinical phenotype (at least in this cohort) has a different genetic architecture from general AD.
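      The core idea in the Summary, scoring each individual's genetic component of a trait from published per-variant weights and then relating that score to another phenotype, can be illustrated with a minimal individual-level sketch. Note that BADGERS itself works from summary statistics of the trait of interest, which this sketch does not reproduce; the function names, the uniform 0/1/2 dosage simulation, and the use of a plain Pearson correlation are all assumptions for illustration only.

```python
import numpy as np


def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of allele dosages (0, 1 or 2) for each individual (rows)."""
    return dosages @ weights


def score_trait_correlation(dosages, weights, phenotype) -> float:
    """Pearson correlation between one trait's polygenic score and another phenotype."""
    score = polygenic_score(dosages, weights)
    return float(np.corrcoef(score, phenotype)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dosages = rng.integers(0, 3, size=(2000, 50))  # simulated genotypes, not real data
    weights = rng.normal(size=50)                   # simulated per-variant weights
    phenotype = polygenic_score(dosages, weights) + rng.normal(size=2000)
    print(score_trait_correlation(dosages, weights, phenotype))
```

      A scan in this spirit would repeat the correlation for every candidate trait's weight set; the summary-statistics formulation is what lets a method such as BADGERS skip individual-level phenotypes for the trait of interest.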

      Strengths:

      BADGERS is an interesting new addition to a stream of attempts to "squeeze" biobank data beyond pure association studies for diagnosis. Increasingly available biobank cohorts do not usually focus on specific diseases. However, they tend to be data-rich, opening the door to deep explorations that can be useful for refining our knowledge of the latent factors that lead to diagnosis. Indeed, the possibility of running genetic correlation studies in specific sub-settings of interest (e.g. preclinical cohorts) is arguably the most interesting aspect of BADGERS. Classical methods like LDSC or two-sample MR capitalize on publicly available summary statistics from large cohorts, or on access to individual genotype data from large cohorts, to ensure statistical power. Seemingly, BADGERS provides a balanced opportunity to dissect the correlation between traits of interest in settings with small sample sizes, in which other methods do not work well.

      Weaknesses:

      However, the increased statistical power is only hinted at; for instance, the authors do not explore whether LDSC would have identified these associations. Although I suspect that is the case, this evidence is important to ensure that the abovementioned balance is right. Finally, as discussed by the authors, the reliance on polygenic risk scoring necessarily undermines the causal evidence gained through BADGERS. In this sense, BADGERS provides an alternative to strict instrumental-variable-based analysis, which can be particularly useful for generating new mechanistic hypotheses.

      In summary, after 15 years of focus on diagnosis that would require having individual access to large patient cohorts, BADGERS can become an excellent tool to dig into trait heterogeneity, especially if it turns out to be more powerful than other available methodologies.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      The paper contains multiple instances of non-scientific language, as indicated below. It would also benefit from additional details on the cryo-EM structure determination in the Methods and inclusion of commonly accepted requirements for cryo-EM structures, like examples of 2D class averages, raw micrographs, and FSC curves (between half-maps as well as between rigid-body fitted (or refined) atomic models of the different polymorphs and their corresponding maps). In addition, cryo-EM maps for the control experiments F1 and F2 should be presented in Figure 9.

      We will include the suggested data on the Cryo-EM analyses in a revised version of the preprint. We did not collect data on the sample used for the seeds in the cross seeding experiments because we had already confirmed in multiple datasets that the conditions in F1 and F2 reproducibly produce fibrils of Type 1 and Type 3, respectively. In a revised version we will include the analyses of several more datasets at the F1 and F2 conditions to support this statement.

      Reviewer #3 (Public Review):

      Weaknesses:

      1. The authors reveal that the Type 1 monofilament fibril polymorph (reminiscent of the JOS-like polymorph) and the Type 5 polymorph (akin to the tissue-amplified-like polymorph) can both form under the same condition. Additionally, this condition also fosters the formation of flat ribbon-like fibrils across different batches. Notably, at pH 5.8, variations between experimental groups yield disparate abundance ratios of polymorphs 3B and 3C, indicating a degree of instability in fibril formation. This variability could pose challenges for replicability in subsequent research. In light of these observations, I propose the following recommendations:

      (1) An explicit elucidation of the factors contributing to these divergent outcomes under similar experimental conditions is warranted. This should include an exploration of whether variations in purified protein batches are contributing factors to the observed heterogeneity.

      We are in complete agreement that understanding the factors that lead to polymorph variability is of utmost importance (and was the impetus for the manuscript itself). However, the number of variables to explore is overwhelming, and we will continue to investigate this in our future research. Regarding variability between batches of purified protein, we also think that this could be a factor in the polymorph variability observed for otherwise “identical” aggregation conditions, particularly at pH 7, where the largest variety of polymorphs has been observed. While our data still indicate that Type 1, 2 and 3 polymorphs are strongly selected by pH, the selection between interface variants 3B vs. 3C and 2A vs. 2B might also be affected by protein purity. Our standard purification protocol produces a single band on Coomassie-stained SDS-PAGE; however, minor truncations and other impurities below a few percent would go undetected and, given the proposed roles of the N- and C-termini in secondary nucleation, could have a large effect on polymorph selection and seeding. In line with the reviewer’s comments, we now include a batch number for each EM dataset. While no new conclusions can be drawn from this additional data, we feel that it is important to acknowledge the possible role of batch-to-batch variability.

      (2) To enhance the robustness of the conclusions, additional replicates of the experiments under the same condition should be conducted, ideally a minimum of three times.

      The pH 5.8 conditions that yield Type 3 fibrils have already been repeated several times in the original manuscript. The pH 7.4 conditions were mentioned only twice, once as an unseeded and once as a cross-seeded fibrilization. We solved a second Type 1 structure from a second dataset from the same protein batch, fibrillized under similar conditions at pH 7.4 but with the addition of inositol trisphosphate in the hope that we could replicate one of the in vivo polymorphs. However, only Type 1 polymorphs were observed, so we will add this data point to the revised manuscript. We are currently screening more fibrils produced at pH 7.0 and will include any replicates of the Type 5 or Type 1M polymorphs, or any new structures, obtained under these conditions. However, as noted in the original manuscript, reproducibility at this pH might be difficult because a wider range of polymorphs appears to be accessible. As will be mentioned in the revised version, the Type 5 structure was solved from a manually picked set of fibers that represented 10-20% of the observed fibrils. The remaining fibers in the sample comprised polymorphs that could not be analyzed due to their inhomogeneity or lack of twist.

      (3) Further investigation into whether different polymorphs formed under the same buffer condition could lead to distinct toxicological and pathology effects would be a valuable addition to the study.

      The correlation of toxicity with structure would in principle be interesting. However, the Type 1 and Type 3 polymorphs (formed at pH 7.4 and 5.8, respectively) are not likely to be biologically relevant. The pH 7 polymorphs (Type 5 and 1M) would be more interesting because they form under the same conditions and might be related to some disease-relevant structures. Still, it is rare for a single polymorph to appear at pH 7.0 (the Type 5 polymorph represented only 10-20% of the fibrils in the sample, and the Type 1M sample also contained unidentified double-filament fibrils). We plan to pursue this line of research and hope to include it in a future publication.

      2. The cross-seeding study presented in the manuscript demonstrates the pivotal role of pH conditions in dictating conformation. However, an intriguing aspect that emerges is the potential role of seed concentration in determining the resultant product structure. This raises a critical question: at what specific seed concentration does the determining factor for polymorph selection shift from pH condition to seed concentration? A methodologically robust approach to address this would be a series of experiments across a range of seed concentrations. Such an approach could delineate a clear boundary at which seed concentration begins to predominantly dictate the conformation, as opposed to pH conditions. Incorporating this aspect into the study would not only clarify the interplay between seed concentration and pH conditions, but also add a fascinating dimension to the understanding of polymorph selection mechanisms.

      A more complete analysis of the mechanisms of aggregation, including the effect of seed concentration and the resulting polymorph specificity of the process, are all very important for our understanding of the aggregation pathways of alpha-synuclein and are currently the topic of ongoing investigations in our lab.

      Furthermore, the study prompts additional queries regarding the behavior of cross-seeding production under the same pH conditions when employing seeds of distinct conformation. Evidence from various studies, such as those involving E46K and G51D cross-seeding, suggests that seed structure plays a crucial role in dictating polymorph selection. A key question is whether these products consistently mirror the structure of their respective seeds.

      We thank the reviewer for reminding us to include a reference to these studies as a clear example of polymorph selection by cross-seeding, which we will do in the revised version. Unfortunately, it is not entirely clear from the G51D cross-seeding manuscript (https://doi.org/10.1038/s41467-021-26433-2) what conditions were used in the cross-seeding, since different conditions were used for the seedless wild-type and mutant aggregations. However, it appears that the wild-type without seeds was in Tris at pH 7.5 (although at 37 °C the pH could have dropped to about 7) and the cross-seeded wild-type was in phosphate buffer at pH 7.0. In the E46K cross-seeding manuscript, it appears that Tris at pH 7.5 was used for all fibrilizations (https://doi.org/10.1073/pnas.2012435118). In any event, both results point to the fact that at pH 7.0-7.5 under low-seed conditions (0.5%) the Type 4 polymorph can propagate in a seed-specific manner.

      3. In the Results section "The buffer environment can dictate polymorph during seeded nucleation", the authors reference previous cell biological and biochemical assays to support the polymorph-specific seeding of MSA and PD patient material under the same buffer conditions. This discussion is juxtaposed with recent research that compares the in vivo biological activities of hPFF, ampLB, and LB, particularly in terms of seeding activity and pathology. Notably, this research suggests that ampLB, rather than hPFF, can accurately model the key aspects of Lewy Body Diseases (LBD) (refer to: https://doi.org/10.1038/s41467-023-42705-5). The critical issue here is the need to reconcile the phenomena observed in vitro with those in in vivo or in-cell models. Given the low seed concentration reported in these studies, the authors should explain in more detail why possibly similar conformations could lead to divergent pathologies, including differences in cell-type preference and seeding capability.

      We thank the reviewer for bringing this recent report to our attention. The findings that ampLB and hPFF have different PK digestion patterns, and that only the former is able to model key aspects of Lewy Body disease, support the seed-specific nature of some types of alpha-synuclein aggregation. We will add more discussion regarding the significant role that seed type and seed conditions likely play in polymorph selection.

      4. In the Methods section "Image processing", the authors describe the helical reconstruction procedure without giving much detail about the 3D reconstruction and refinement process. For the benefit of reproducibility and to facilitate a deeper understanding among readers, the authors should expand this part to include more comprehensive information, akin to the level of detail found in similar studies (refer to: https://doi.org/10.1038/nature23002).

      As suggested by reviewer #2, we will add more comprehensive information on the 3D reconstruction and refinement process to a revised version.

      5. The abbreviations of amino acids should be unified. In the Results section "On the structural heterogeneity of Type 1 polymorphs", amino acids are denoted using three-letter abbreviations. Conversely, in the section "On the structural heterogeneity of Type 2 and 3 structures", amino acids are abbreviated using the one-letter format. For clarity and consistency, a standardized format for amino acid abbreviations should be adopted throughout the manuscript.

      That makes perfect sense and will be corrected in a revised version.

      Reviewing Editor:

      After discussion among the reviewers, it was decided that point 2 in Reviewer #3's Public Review (about the experiments with different concentrations of seeds) would probably lie outside the scope of a reasonable revision for this work.

      We agree as stated above and will continue to work on this important point.

    2. eLife assessment

      This study presents important findings on the different polymorphs of alpha-synuclein filaments that form at various pH values during in vitro assembly reactions with purified recombinant protein. Of particular note is the discovery of two new polymorphs (1M and 5A) that form in PBS buffer at pH 7. The strength of the evidence presented is solid, but the addition of replicate experiments with re-purified proteins at pH 5.8 and pH 7 would further strengthen the conclusions. The work will be of interest to biochemists and biophysicists working on protein aggregation and amyloids.

    3. Reviewer #1 (Public Review):

      Summary:

      Frey et al. report the structures of aSyn fibrils that were obtained under a variety of conditions. These include aSyn fibrils generated without seeds, but in different buffers and at different pH values. They also include aSyn fibrils generated in the presence of seeding fibrils, again in different buffers and at different pH values, with the seeds themselves generated under different conditions. The authors find that fibril polymorphs primarily correlate with the fibril growth buffer conditions, and not so much with the type of seed. However, the presence of a seed is still required, likely because fibrils can also seed along their lateral surfaces, not only at the blunt ends.

      Strengths:

      The manuscript includes an excellent review of the numerous available structures of aSyn. As the authors state, "it seems that there are about as many unique atomic-resolution structures of these aggregates as there are publications describing them."

      The text is interesting to read, figures are clear and not redundant.

      Weaknesses:

      The manuscript is excellently written, but a few commas are sometimes missing.

    4. Reviewer #2 (Public Review):

      Summary:

      This is an exciting paper that explores the in vitro assembly of recombinant alpha-synuclein into amyloid filaments. The authors changed the pH and the composition of the assembly buffers, as well as the presence of different types of seeds, and analysed the resulting structures by cryo-EM.

      Strengths:

      By doing experiments at different pH values, the authors found that so-called type-2 and type-3 polymorphs form in a pH-dependent manner. In addition, they find that type-1 filaments form in the presence of phosphate ions. One of their in vitro assembled type-1 polymorphs is similar to the alpha-synuclein filaments that were extracted from the brain of an individual with juvenile-onset synucleinopathy (JOS). They hypothesize that additional densities, located at positions similar to those of the additional densities in the JOS fold, correspond to phosphate ions.

      Weaknesses:

      The paper contains multiple instances of non-scientific language, as indicated below. It would also benefit from additional details on the cryo-EM structure determination in the Methods and inclusion of commonly accepted requirements for cryo-EM structures, like examples of 2D class averages, raw micrographs, and FSC curves (between half-maps as well as between rigid-body fitted (or refined) atomic models of the different polymorphs and their corresponding maps). In addition, cryo-EM maps for the control experiments F1 and F2 should be presented in Figure 9.

    5. Reviewer #3 (Public Review):

      Summary:

      The highly heterogeneous nature of α-synuclein (α-syn) fibrils poses significant challenges for structural reconstruction of the ex vivo conformation, and the factors influencing the formation of various α-syn polymorphs remain poorly understood. The manuscript by Frey et al. provides a comprehensive exploration of how pH variations (ranging from 5.8 to 7.4) affect the selection of α-syn polymorphs (specifically, Types 1, 2, and 3) in vitro, using cryo-electron microscopy (cryo-EM) and helical reconstruction techniques. Crucially, the authors identify two novel polymorphs at pH 7.0 in PBS. These polymorphs resemble the structure of the patient-derived juvenile-onset synucleinopathy (JOS) polymorph and of α-syn fibrils amplified from diseased tissue. The manuscript supports the notion that seeding is not polymorph-specific in the context of secondary-nucleation-dominated aggregation, underscoring the irreplaceable role of pH in polymorph formation. Nevertheless, certain areas of the manuscript would benefit from further refinement and elaboration to more robustly substantiate this hypothesis.

      Strengths:

      This study systematically investigates the effects of environmental conditions and seeding on the structure of α-syn fibrils. It emphasizes the significant influence of environmental factors, especially pH, in determining the selection of α-syn polymorphs. The high-resolution structures obtained through cryo-EM enable a clear characterization of the composition and proportion of each polymorph in the sample. Collectively, this work provides strong support for the pronounced sensitivity of α-syn fibril structures to environmental conditions and systematically categorizes previously reported α-syn fibril structures. Furthermore, the identification of a JOS-like polymorph also demonstrates the possibility of reconstructing brain-derived α-syn fibril structures in vitro.

      Weaknesses:

      1. The authors reveal that the Type 1 monofilament fibril polymorph (reminiscent of the JOS-like polymorph) and the Type 5 polymorph (akin to the tissue-amplified-like polymorph) can both form under the same condition. Additionally, this condition also fosters the formation of flat ribbon-like fibrils across different batches. Notably, at pH 5.8, variations between experimental groups yield disparate abundance ratios of polymorphs 3B and 3C, indicating a degree of instability in fibril formation. This variability could pose challenges for replicability in subsequent research. In light of these observations, I propose the following recommendations:

      (1) An explicit explanation of the factors contributing to these divergent outcomes under similar experimental conditions is warranted. This should include an assessment of whether variation between batches of purified protein contributes to the observed heterogeneity.

      (2) To make the conclusions more robust, additional replicates of the experiments under the same condition should be performed, ideally at least three.

      (3) Investigating whether different polymorphs formed under the same buffer condition lead to distinct toxicological and pathological effects would be a valuable addition to the study.

      2. The cross-seeding study presented in the manuscript demonstrates the pivotal role of pH in dictating conformation. However, it also raises the intriguing possibility that seed concentration helps determine the structure of the product. This raises a critical question: at what seed concentration does the determining factor for polymorph selection shift from pH to seed concentration? A methodologically robust way to address this would be a series of experiments across a range of seed concentrations, which could delineate a clear boundary at which seed concentration, rather than pH, begins to dictate the conformation. Incorporating this aspect would not only clarify the interplay between seed concentration and pH but also add a fascinating dimension to the understanding of polymorph selection mechanisms.

      Furthermore, the study prompts additional questions about the products of cross-seeding under the same pH conditions when seeds of distinct conformations are used. Evidence from various studies, such as those involving E46K and G51D cross-seeding, suggests that seed structure plays a crucial role in dictating polymorph selection. A key question is whether these products consistently mirror the structure of their respective seeds.

      3. In the Results section "The buffer environment can dictate polymorph during seeded nucleation", the authors cite previous cell biological and biochemical assays to support polymorph-specific seeding by material from MSA and PD patients under the same buffer conditions. This discussion should be considered alongside recent research comparing the in vivo biological activities of hPFF, ampLB and LB, particularly in terms of seeding activity and pathology. Notably, this research suggests that ampLB, rather than hPFF, accurately models the key aspects of Lewy body diseases (LBD) (refer to: https://doi.org/10.1038/s41467-023-42705-5). The critical issue here is the need to reconcile the phenomena observed in vitro with those in in vivo or cellular models. Given the low seed concentrations reported in these studies, the authors should explain in more detail why possibly similar conformations could lead to divergent pathologies, including differences in cell-type preference and seeding capability.

      4. In the "Image processing" part of the Methods section, the authors describe the helical reconstruction procedure but give little detail about the 3D reconstruction and refinement process. For reproducibility, and to give readers a deeper understanding, the authors should expand this part to include more comprehensive information, similar to the level of detail found in comparable studies (refer to: https://doi.org/10.1038/nature23002).

      5. Amino-acid abbreviations should be standardized. In the Results section "On the structural heterogeneity of Type 1 polymorphs", amino acids are denoted by three-letter abbreviations, whereas in the section "On the structural heterogeneity of Type 2 and 3 structures" they are given in one-letter format. For clarity and consistency, a single format should be adopted throughout the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths

      This paper is well situated theoretically within the habit learning/OCD literature.

      Daily training in a motor-learning task, delivered via smartphone, was innovative, ecologically valid and more likely to assay habitual behaviors specifically. Daily training is also more similar to studies with non-humans, making a better link with that literature. The use of a sequential-learning task (cf. tasks that require a single response) is also more ecologically valid.

      The in-laboratory tests (after the 1 month of training) allowed the researchers to test if the OCD group preferred familiar, but more difficult, sequences over newer, simpler sequences.

      The authors achieved their aims in that two groups of participants (patients with OCD and controls) engaged with the task over the course of 30 days. The repeated nature of the task meant that 'overtraining' was almost certainly established, and automaticity was demonstrated. This allowed the authors to test their hypotheses about habit learning. The results are supportive of the authors' conclusions.

      Response: We truly appreciate the positive assessment of referee 1, particularly the consideration that our study is theoretically strong and that ‘the results are supportive of the authors' conclusions’. This is an important external endorsement of our conclusions, contrasting somewhat with the views of referee 2.

      Weaknesses

      The sample size was relatively small. Some potentially interesting individual differences within the OCD group (e.g., preference for familiar sequences) could have been examined more thoroughly with a bigger sample. A larger sample may also have allowed statistical testing of any effects of medication status. The authors were not able to test one criterion of habits, namely resistance to devaluation, due to the nature of the task.

      Response: We agree with the reviewer that the proof of principle established in our study opens new avenues for research into the psychological and behavioral determinants of the heterogeneity of this clinical population. However, given the study timeline and the pandemic constraints, a bigger sample was not possible. Our sample can indeed be considered small compared with current online studies, which do not require in-person/laboratory testing and are therefore much easier to recruit for and conduct. However, given the nature of our protocol (2 demanding test phases, 1 month of engagement per participant, and the inclusion only of OCD patients without comorbidities) and the fact that this study also involved laboratory testing, we consider our sample size reasonable and comparable to that of other laboratory studies (which typically include 30-50 participants per group).

      This article is likely to be impactful: the delivery of a task across 30 days to a patient group is innovative and represents a new approach to the study of habit learning that is superior to an in-laboratory approach.

      An interesting aspect of this manuscript is that it prompts a comparison with previous studies of goal-directed/habitual responding in OCD that used devaluation protocols, and which may have had their effects due to deficits in goal-directed behavior and not enhanced habit learning per se.

      Response: Thank you for acknowledging the impact of our study, in particular the unique ability of our task to interrogate the habit system.

      Reviewer #2 (Public Review):

      In this study, the researchers employed a recently developed smartphone application to provide 30 days of training on action sequences to both OCD patients and healthy volunteers. The study tested learning and automaticity-related measures and investigated the effects of several factors on these measures. Upon training completion, the researchers conducted two preference tests comparing learned and unlearned action sequences under different conditions. While the study provides some interesting findings, I have a few substantial concerns:

      1. Throughout the entire paper, the authors' interpretations and claims revolve around the domain of habits and goal-directed behavior, despite the methods and evidence clearly focusing on motor sequence learning/procedural learning/skill learning. There is no evidence to support this framing and interpretation and thus I find them overreaching and hyperbolic, and I think they should be avoided. Although skills and habits share many characteristics, they are meaningfully distinguishable and should not be conflated or mixed up. Furthermore, if anything, the evidence in this study suggests that participants attained procedural learning, but these actions did not become habitual, as they remained deliberate actions that were not chosen to be performed when they were not in line with participants' current goals.

      Response: We acknowledge that habit learning is a topic of current controversy, especially when it comes to how to induce and measure habits in humans. Within this context, therefore, referee 2’s criticism was to be expected. Across distinct fields of research, different methodologies have been used to measure habits, which are relatively stereotyped and autonomous behavioral sequences enacted in response to a specific stimulus without consideration, at the time the sequence is initiated, of the value of the outcome or of any representation of the relationship between the response and the outcome. Hence these are stimulus-bound responses, which may or may not require the implementation of a skill during subsequent performance. Behavioral neuroscientists define habits similarly, as stimulus-response associations that are independent of reward or outcome, and use devaluation or contingency degradation to probe habits (Dickinson and Weiskrantz, 1985; Tricomi et al., 2009). Others conceptualize habits as a form of procedural memory, along with skills, and use motor sequence learning paradigms to investigate and dissect different components of habit learning such as action selection, execution and consolidation (Abrahamse et al., 2013; Doyon et al., 2003; Squire et al., 1993). It is also generally agreed that the autonomous nature of habits and the fluid proficiency of skills are both usually achieved with many hours of training or practice, respectively (Haith and Krakauer, 2018).

      We consider that Balleine and Dezfouli (2019) made an excellent attempt to bring all these different criteria within a single framework, which we have followed. We also consider that our discussion in fact followed a rather cautious approach to interpretation solely in terms of goal-directed versus habitual control.

      Referee 2 does not actually specify the criteria by which they define habits and skills, except for asserting that skilled behavior is goal-directed, without mentioning what the actual goal of implementing such a skill is in the present study: the fulfillment of a habit? We assume that their definition of habit hinges on the effects of devaluation as the single criterion of habit, which according to Balleine and Dezfouli (2019) is only 1 of their 4 listed criteria. We carefully addressed this specific criterion in our manuscript: “We were not, however, able to test the fourth criterion, of resistance to devaluation. Therefore, we are unable to firmly conclude that the action sequences are habits rather than, for example, goal-directed skills. Regardless of whether the trained action sequences can be defined as habits or goal-directed motor skills, it has to be considered…”. Therefore, we took due care in our conclusions concerning habits and thus found the referee’s comment misleading and unfair.

      We note that our trained motor sequences did in fact fulfil the other 3 criteria listed by Balleine and Dezfouli (2019), unlike many studies employing only devaluation (e.g. Tricomi et al 2009; Gillan et al 2011). Moreover, we cited a recent study using very similar methodology where the devaluation test was applied and shown to support the habit hypothesis (Gera et al., 2022).

      Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1). Transitions between habitual and goal-directed control over behavior are quite well established in the experimental literature, especially when choice opportunities become available (Bouton, 2021; Frölich et al., 2023) or when a new goal-directed schema is recruited to fulfill a habit (Fouyssac et al., 2022). This switching between habits and goal-directed responding may reflect the coordination of these systems in producing effective behavior in the real world.

      • Fouyssac M, Peña-Oliver Y, Puaud M, Lim NTY, Giuliano C, Everitt BJ, Belin D. (2021). Negative urgency exacerbates relapse to cocaine seeking after abstinence. Biological Psychiatry. doi: 10.1016/j.biopsych.2021.10.009

      • Frölich S, Esmeyer M, Endrass T, Smolka MN, Kiebel SJ. (2023). Interaction between habits as action sequences and goal-directed behavior under time pressure. Front. Neurosci. 16:996957. doi: 10.3389/fnins.2022.996957

      • Bouton ME. (2021). Context, attention, and the switch between habit and goal-direction in behavior. Learn Behav 49:349–362. doi: 10.3758/s13420-021-00488-z

      2. Some methodological aspects need more detail and clarification.

      3. There are concerns regarding some of the analyses, which require addressing.

      Response: We thank referee 2 for their detailed review of the methods and analyses of our study and for the helpful feedback, which clearly helps improve our manuscript. We will clarify the methodological aspects in detail and conduct the suggested analysis. Please see below our answers to the specific points raised.

      Introduction:

      4. It is stated that "extensive training of sequential actions would more rapidly engage the 'habit system' as compared to single-action instrumental learning". In explaining the rationale for this statement, the authors describe the concept of action chunking, its benefits and its relevance to habits, but there is no explanation of why sequential actions would engage the habit system more rapidly than a single action. Clarifying this would be helpful.

      Response: We agree that there is no evidence that action sequences become habitual more readily than single actions, although action sequences clearly allow ‘chunking’ and thus likely engage neural networks, including the putamen, that are implicated in habit learning as well as skill. In our revised manuscript we will instead state: “we have recently postulated that extensive training of sequential actions could be a means for rapidly engaging the ‘habit system’ (Robbins et al., 2019)”

      Done on page 2.

      5. In the Hypothesis section the authors state: “we expected that OCD patients... show enhanced habit attainment through a greater preference for performing familiar app sequences when given the choice to select any other, easier sequence”. I find it particularly difficult to interpret preference for familiar sequences as enhanced habit attainment.

      Response: We agree that choice of the familiar response sequence should not be a necessary criterion for habitual control, although choice of a familiar sequence is, in fact, not inconsistent with this hypothesis. In a recent study, Zmigrod et al (2022) found that 'aversion to novelty' was a relevant factor in the subjective measurement of habitual tendencies. It should also be noted that this preference was present in patients with OCD. If one assumes instead, like the referee, that the familiar sequence is goal-directed, then this contravenes the well-known 'egodystonia' of OCD, which suggests that such tendencies are not goal-directed.

      To clarify our hypothesis, we will amend the sentence to the following: “Finally, we expected that OCD patients would generally report greater habits, as well as attribute higher intrinsic value to the familiar app sequences manifested by a greater preference for performing them when given the choice to select any other, easier sequence”.

      Done on page 5. We have now rephrased it: “Additionally, we hypothesized that OCD patients would generally display stronger habits and assign greater intrinsic value to the familiar app sequences, evidenced by a marked preference for executing them even when presented with a simpler alternative sequence.”

      A few notes on the task description and other task components:

      6. It would be useful to give more details on the task. This includes more details on the timing and conditions of the gradual removal of visual and auditory stimuli, and on the within-practice dynamic structure (i.e., the different levels that appear in the video).

      Response: These details will be included in the revised manuscript. Thank you for pointing out the need for further clarification of the task design.

      Done on page 7.

      7. Some more information on engagement-related exclusion criteria would be useful (e.g., what happened if participants did not use the app for more than one day, how many days they were allowed to skip, etc.).

      Response: This additional information will be added to the revised manuscript. If participants failed to train for more than 2 days, the researcher sent a reminder asking them to catch up. If a participant did not respond and skipped a third day, the researcher called them to understand the reasons for the lack of engagement and to gauge motivation. Participants were excluded if they missed more than 5 consecutive days of training. Only 2 participants were excluded for lack of engagement.

      Done on page 8.

      8. According to the (very useful) video demonstrating the task and the paper describing the task in detail (Banca et al., 2020), the task seems to include other relevant components that were not mentioned in this paper. I refer to the daily speed test, the daily random switch test, and daily ratings of each sequence's enjoyment and confidence of knowledge.

      If these components were not included in this procedure, then the deviations from the procedure described in the video and in Banca et al. (2020) should be explicitly mentioned. If these components were included, at least some of them may be relevant to automaticity, habitual action control, participants' enjoyment of the app, etc. I think these components should be mentioned and analyzed (or an explanation should be given for why it was decided not to analyze them).

      This is also true for the reward removal (extinction) from the 21st day onwards, which is potentially of particular relevance for the research questions.

      Response: The task procedure was indeed the same as detailed in Banca et al., 2020. We did not include these extra components in the current manuscript for reasons of succinctness and because the manuscript was already considerably longer than a typical research article, given that we present three different, though highly interdependent, experiments in order to answer key interrelated questions in an optimal manner. However, since referee 2 considers this additional analysis to be important, we will be happy to include it in the supplementary material of the revised manuscript.

      These additional components of the task as well as the respective analysis are now described in the Supplementary Materials.

      Training engagement analysis:

      9. I find referring to the number of trials including successful and unsuccessful trials as representing participants "commitment to training" (e.g. in Figure legend 2b) potentially inadequate. Given that participants need at least 20 successful trials to complete each practice, more errors would lead to more trials. Therefore, I think this measure may mostly represent weaker performance (of the OCD patients as shown in Figure 2b). Therefore, I find the number of performed practice runs, as used in Figure 2a (which should be perfectly aligned with the number of successful trials), a "clean" and proper measure of engagement/commitment to training.

      Response: We acknowledge the referee’s concern on this matter and agree to replace the y-axis variable of Figure 2b with the number of performed practices (thus aligning it with Figure 2a). This amendment will remove any potential effect of weaker performance on the engagement measure and will provide clearer results.

      We have now decided to remove this figure, as it does not add much to Figure 2a. Instead, we replaced Figures 2b and 2c with new plots, following a new analysis prompted by the reviewer's next request (point 10).

      10. Also, to provide stronger support for the claim about different diurnal training patterns (as presented in Figure 2c and the text) between patients and healthy individuals, it would be beneficial to conduct a statistical test comparing the two distributions. If the results of this test are not significant, I suggest emphasizing that this is a descriptive finding.

      Response: Done, see revised Figures 2b and 2c. We assessed the diurnal training patterns within each group using circular statistics, followed by independent-samples testing of those circular distributions with Watson’s U² test (Landler et al., 2021). While OCD participants show a significant group-level peak of practice at ~18:00 and HV participants an earlier significant peak at ~15:00, Watson’s U² test did not detect statistically significant between-group differences.

      • Landler L, Ruxton GD, Malkemper EP. Advice on comparing two independent samples of circular data in biology. Scientific reports. 2021 Oct 13;11(1):20337.
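      The circular comparison described above can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' analysis code: practice times are assumed to be clock hours, the function names are hypothetical, and ties between time stamps are assumed absent. It computes each group's circular mean time and concentration, plus the two-sample Watson U² statistic using the standard formula U² = (n1·n2/N²)[Σd² − (Σd)²/N], where d is the difference between the two empirical distribution functions at each pooled observation. P-values (from critical-value tables or a permutation test) are omitted.

```python
import numpy as np

HOURS_PER_DAY = 24.0


def hours_to_radians(hours):
    """Map clock times (hours, 0-24) onto the unit circle."""
    return 2.0 * np.pi * (np.asarray(hours, dtype=float) % HOURS_PER_DAY) / HOURS_PER_DAY


def circular_mean_hour(hours):
    """Circular mean of clock times, returned in hours within [0, 24)."""
    theta = hours_to_radians(hours)
    mean_angle = np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())
    return float((mean_angle * HOURS_PER_DAY / (2.0 * np.pi)) % HOURS_PER_DAY)


def mean_resultant_length(hours):
    """Concentration R in [0, 1]; values near 1 indicate a sharp diurnal peak."""
    theta = hours_to_radians(hours)
    return float(np.hypot(np.cos(theta).mean(), np.sin(theta).mean()))


def watson_u2(hours_a, hours_b):
    """Two-sample Watson U^2 statistic (hypothetical helper; assumes no tied times)."""
    a = np.sort(hours_to_radians(hours_a))
    b = np.sort(hours_to_radians(hours_b))
    n1, n2 = a.size, b.size
    n = n1 + n2
    pooled = np.sort(np.concatenate([a, b]))
    # Empirical CDF of each sample evaluated at every pooled observation.
    f1 = np.searchsorted(a, pooled, side="right") / n1
    f2 = np.searchsorted(b, pooled, side="right") / n2
    d = f1 - f2
    # Subtracting the mean of d makes the statistic independent of where "0 h" is placed.
    return float((n1 * n2 / n**2) * (np.sum(d**2) - np.sum(d) ** 2 / n))
```

      For example, practice at 23:00 and 01:00 has a circular mean at midnight, whereas the arithmetic mean of the clock values would wrongly place it at 12:00; this is why circular rather than linear statistics suit time-of-day data.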

      Learning results:

      11. When describing the Learning results (p10), I think it would be useful to provide the descriptive stats for the MT0 parameter (as done above for the other two parameters).

      Response: Thank you for pointing this out. The descriptive stats for MT0 will be added to the revised version of the manuscript.

      Done on page 11.

      12. Sensitivity of sequence duration and IKI consistency (C) to reward:

      I think it is important to add details on how incorrect trials were handled when calculating ∆MT (or C) and ∆R, specifically in cases where the trial preceding a successful trial was unsuccessful. If incorrect trials were simply ignored, this may not adequately represent trial-by-trial changes, particularly when testing the effect of a trial's outcome on performance change in the next trial.

      Response: This is an important question. Our analysis protocol was designed to ensure that incorrect trials do not contaminate or confound the results. To estimate the trial-to-trial difference in ∆MT (or C) and ∆R, we exclusively included pairs of contiguous trials where participants achieved correct performance and received feedback scores for both trials. For example, if a participant made a performance error on trial 23, we did not include ∆R or ∆MT estimates for the pairs of trials 23-22 and 24-23. Instead of excluding incorrect trials from our analyses, we retained them in our time series but assigned them a NaN (not a number) value in Matlab. As a result, ∆R and ∆MT was not defined for those two pairs of trials. Similarly for C. This approach ensured that our analyses are not confounded by incremental or decremental feedback scores between noncontiguous trials. In the past, when assessing the timing of correct actions during skilled sequence performance, we also considered events that were preceded and followed by correct actions. This excluded effects such as post-error slowing from contaminating our results (Herrojo Ruiz et al., 2009, 2019). Therefore, we do not believe that any further reanalysis is required.

      • Ruiz MH, Jabusch HC, Altenmüller E. Detecting wrong notes in advance: neuronal correlates of error monitoring in pianists. Cerebral cortex. 2009 Nov 1;19(11):2625-39.

      • Bury G, García-Huéscar M, Bhattacharya J, Ruiz MH. Cardiac afferent activity modulates early neural signature of error detection during skilled performance. NeuroImage. 2019 Oct 1;199:704-17.
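      The exclusion rule described above can be illustrated with a short sketch. The authors worked in Matlab; this Python/NumPy version is only an assumed re-expression of the rule, and the function name is hypothetical. Incorrect trials stay in the time series as NaN, so any trial-to-trial difference that involves them is itself NaN and drops out of later analyses.

```python
import numpy as np


def contiguous_correct_differences(values, correct):
    """Trial-to-trial differences x(n+1) - x(n), defined only when both
    trials n and n+1 were performed correctly; NaN otherwise."""
    series = np.asarray(values, dtype=float).copy()
    # Keep incorrect trials as NaN placeholders instead of deleting them,
    # so that non-contiguous trials are never differenced against each other.
    series[~np.asarray(correct, dtype=bool)] = np.nan
    return np.diff(series)  # any pair touching a NaN yields NaN
```

      With movement times [5.0, 4.0, 6.0, 3.0, 3.5] and an error on the third trial, the result is [-1.0, NaN, NaN, 0.5]: both pairs involving the erroneous trial are undefined, mirroring the trial-23 example in the response. The same function applied to the score series gives ∆R, and valid pairs are those where neither difference is NaN.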

      13. I have a serious concern with respect to how the sensitivity of sequence duration to reward is framed and analyzed. Since reward is proportional to performance, a reduction in reward essentially indicates a trial with poor performance, and thus even regression to the mean (along with a floor effect in performance [asymptote]) could explain the observed effects. It is possible that even occasional poor performance could lead to a participant demonstrating this effect, potentially regardless of the reward. Accordingly, the reduced improvement in performance following a reward decrease as a function of training length, described in the Figure 5b legend, may reflect training-induced improvement in performance that leaves less room for improvement after poor trials, which are no longer as poor as before. To address this concern, controlling for performance (e.g., by taking into consideration the baseline MT for the previous trial) may be helpful. If the authors can conduct such an analysis and still show the observed effect, it would establish the validity of their findings.

      Response: Thank you for raising this point. This has been done; see updated Figures 5 and 6. After normalizing the difference values ∆MT(n+1) := MT(n+1) – MT(n) by dividing them by the baseline MT(n) at trial n, we obtain the same results. Similar results are also obtained for IKI consistency (C).
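      As a worked illustration of this normalization: a one-second speed-up counts as −0.2 when the reference trial took 5 s, but only as −0.1 when it took 10 s, so changes are expressed relative to the current performance level rather than in absolute seconds. A minimal Python/NumPy sketch, assuming MT is stored per trial together with a boolean correctness flag (the function name is hypothetical; this is not the authors' code):

```python
import numpy as np


def normalized_delta_mt(mt, correct):
    """Relative change (MT(n+1) - MT(n)) / MT(n); NaN when either trial is incorrect."""
    series = np.asarray(mt, dtype=float).copy()
    series[~np.asarray(correct, dtype=bool)] = np.nan  # incorrect trials stay as NaN placeholders
    return (series[1:] - series[:-1]) / series[:-1]
```

      For movement times [5.0, 4.0, 4.0, 2.0] (all correct), the normalized changes are approximately [-0.2, 0.0, -0.5].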

      See below our initial response from June 2023.

      Thank you for raising this point. Figure 5b illustrates two distinct effects of reward changes on behavioral adaptation, which are expected based on previous research.

      I. Practice effects: Firstly, we observe that as participants progress across bins of practice, the degree of improvement in behavior (reflected by faster movement time, MT) following a decrease in reward (∆R−) diminishes, consistent with our expectations based on previous work. Conversely, we found that ∆MT does not change across bins of practices following an increase in reward (∆R+).

      We appreciate the reviewer’s suggestion regarding controlling for the reference movement time (MT) in the previous trial when examining the practice effect in the p(∆T|∆R−) and p(∆T|∆R+) distributions. In the revised manuscript, we will conduct the proposed control analysis to better understand whether the sensitivity of MT to score decrements changes across practice when normalising MT to the reference level on each trial. But see below for a preliminary control analysis.

      II. Asymmetry of the effect of ∆R− and ∆R+ on performance: Figure 5b also depicts the distinct impact of score increments and decrements on behavioural changes. When aggregating data across practice bins, we consistently observed that the centre of the p(∆T|∆R−) distribution was smaller (more negative) than that of p(∆T|∆R+). This suggests that participants exhibited a greater acceleration following a drop in scores compared to a relative score increase, and this effect persisted throughout the practice sessions. Importantly, this enhanced sensitivity to losses or negative feedback (or relative drops in scores) aligns with previous research findings (Galea et al., 2015; Pekny et al., 2014; van Mastrigt et al., 2020).

      We have conducted a preliminary control analysis to exclude the potential impact that reference movement time (MT) values could have on our analysis. We have assessed the asymmetry between behavioural responses to ∆R− and ∆R+ using the following analysis: We estimated the proportion of trials in which participants exhibited speed-up (∆T < 0) or slow-down (∆T > 0) behaviour following ∆R− and ∆R+ across different practice bins (bins 1 to 4). By discretising the series of behavioural changes (∆T) into binary values (+1 for slowing down, -1 for speeding up), we can assess the type of changes (speed-up, slow-down) without the absolute ∆T or T values contributing to our results. We obtained several key findings:

      • Consistent with expectations (sanity check), participants exhibited more instances of speeding up than slowing down across all reward conditions.

      • Participants demonstrated a higher frequency of speeding up following ∆R− than following ∆R+, and this asymmetry persisted throughout the practice sessions (greater proportion of -1 events than +1 events). In the p(∆T|∆R+) distribution, 53% of events were speed-up events in the first bin of practices and 55% in the last bin. In the p(∆T|∆R−) distribution, 63% of events were speed-up events in every bin of practices, with no change in this proportion over time.

      • Accordingly, the asymmetry of reward changes on behavioural adaptations, as revealed by this analysis, remained consistent across the practice bins.

      Thus, these preliminary findings provide an initial response to referee 2 and offer valuable insights into the asymmetrical effects of positive/negative reward changes on behavioural adaptations. We plan to include these results in the revised manuscript, as well as the full control analysis suggested by the referee. We will further expand upon their interpretation and implications.
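      The sign-based control analysis described above can be sketched as follows. This is an assumed Python/NumPy re-expression, not the authors' code, and the function name is hypothetical. It supposes that delta_r[i] is the score change on one trial and delta_t[i] the change in movement time on the following trial, drops undefined (NaN) pairs, and, as an additional assumption, ignores pairs with exactly zero change, since these are neither speed-ups nor slow-downs. It could be applied separately to the trials of each practice bin.

```python
import numpy as np


def speedup_proportions(delta_r, delta_t):
    """Proportion of speed-up events (sign -1) after score decrements and increments."""
    delta_r = np.asarray(delta_r, dtype=float)
    delta_t = np.asarray(delta_t, dtype=float)
    # Keep only defined pairs with a non-zero reward change and a non-zero behavioural change.
    valid = ~np.isnan(delta_r) & ~np.isnan(delta_t) & (delta_r != 0) & (delta_t != 0)
    signs = np.sign(delta_t[valid])  # -1 = speed-up, +1 = slow-down
    dr = delta_r[valid]

    def proportion_speedup(mask):
        # NaN when no events of this type occur, rather than a misleading 0.
        return float(np.mean(signs[mask] == -1)) if mask.any() else float("nan")

    return {
        "after_decrement": proportion_speedup(dr < 0),
        "after_increment": proportion_speedup(dr > 0),
    }
```

      Because only the sign of each behavioural change enters the proportions, the absolute movement times cannot drive the result, which is the point of this control.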

      14. Another way to support the claim of reward-change directionality effects on performance (rather than effects of performance on performance), at least to some extent, would be to analyze the data from the last 10 days of training, during which no rewards were given (calculating the reward, for analysis purposes, as if it had been presented to participants). If the effect persists, it is less likely that the effect in question can be attributed to the reward dynamics.

      Response: The reviewer’s concern is addressed in our response to the previous question. Moreover, this analysis would not be possible, because our Gaussian fit analyses use the time series of continuous reward scores, in which ∆R− or ∆R+ events are embedded. These events cannot be analyzed once reward feedback is removed, because there are no longer any behavioral events that follow ∆R− or ∆R+.

      Done

      15. This concern is also relevant and should be considered with respect to the sensitivity of IKI consistency (C) to reward. While the relationship between previous reward/performance and future performance in terms of C has a different structure, similar potential confounding effects could still be present.

      Response: We will conduct this analysis for the revised manuscript, similarly to the control analysis suggested by referee 2 for MT. Our preliminary control analysis, as explained above, suggests that the fundamental asymmetry in the effects of ∆R− and ∆R+ on behavioral changes persists when the impact of reference performance values is excluded from our Gaussian fit analysis.

      Done. See updated Figure 6. The results are very similar once we normalize the IKI consistency index C by the IKI of the baseline performance at trial n.

      16. Another related question (also of general interest) is whether the preferred app sequence (as indicated by participants for Phase B) was consistently the one that yielded more reward. Was the continuously rewarded sequence the preferred one? This might reveal something about the effectiveness of the reward in the task.

      Response: We have now conducted this analysis. There is in fact no evidence that the continuously rewarded sequence was the preferred one: 54.5% of HV and 29% of the OCD sample considered the continuous sequence to be their preferred one, a difference that was not statistically significant. Note that this preference may not be linked simply to programmed reward. Overall preference may be influenced by many other factors, such as the aesthetic appeal of particular combinations of finger movements.

      Regarding both experiments 2 and 3:

      1. The change in context in experiments 2 and 3 is substantial and includes many different components. These changes should be described in more detail in the Results section before describing the results of experiments 2 and 3.

      Response: Following the referee’s advice, we will move these details (currently written in the Methods section) to the Results section, when we introduce Phase B and before describing the results of experiments 2 and 3.

      Done on page 21.

      Experiment 2:

      1. In Experiment 2, the authors sometimes refer to the "explicit preference task" as testing for habitual and goal-seeking sequences. However, I do not think there is any justification for interpreting it as such. The other framings used by the authors - testing whether trained action sequences gain intrinsic/rewarding properties or value, and preference for familiar versus novel action sequences - are more suitable and justified. In support of the point I raised here, assigning intrinsic rewarding properties to the learned sequences and thereby preferring these sequences can be conceptually aligned with goal-directed behavior just as much as it could be with habit.

      Response: We clearly defined the theoretical framing of experiment 2 as a test of whether trained action sequences gain intrinsic value and we are pleased to hear that the referee agrees with this framing. If the referee is referring to the paragraph below (in the Discussion), we actually do acknowledge within this paragraph that a preference for the trained sequences can either be conceptually aligned with a habit OR a goal-directed behavior.

      “On the other hand, we are describing here two potential sources of evidence in favor of enhanced habit formation in OCD. First, OCD patients show a bias towards the previously trained, apparently disadvantageous, action sequences. In terms of the discussion above, this could possibly be reinterpreted as a narrowing of goals in OCD (Robbins et al., 2019) underlying compulsive behavior, in favor of its intrinsic outcomes”

      This narrowing of goals model of OCD refers to a hypothetically transitional stage of compulsion development driven by behavior having an abnormally strong, goal-directed nature, typically linked to specific values and concerns.

      If the referee is referring to the penultimate sentence of the Hypothesis section, this has been amended in response to Q5. We cannot find any other instances in this manuscript stating that experiment 2 is a test of habitual or goal-directed behavior.

      Experiment 3:

      1. Similar to Experiment 2, I find the framing of arbitration between goal-directed/habitual behavior in Experiment 3 inadequate and unjustified. The results of the experiment suggest that participants were primarily goal-directed and there is no evidence to support the idea that this reevaluation led participants to switch from habitual to goal-directed behavior.

      Also, given the explicit choice of the sequence to perform participants had to make prior to performing it, it is reasonable to assume that this experiment mainly tested bias towards familiar sequence/stimulus and/or towards intrinsic reward associated with the sequence in value-based decision making.

      Response: This comment is aligned with (and follows) the referee’s criticism of experiment 1 not achieving automatic and habitual actions. We have addressed this matter above, in response 1 to Referee 2.

      Mobile-app performance effect on symptomatology: exploratory analyses:

      1. Maybe it would be worth testing whether the patients with improved symptomatology (who attribute some of their symptom improvement to the app) also chose to play more during the training stage.

      Response: We have conducted an analysis to address this relevant question. There is no correlation between the YBOCS score change and the number of total practices, meaning that the patients whose symptomatology improved post-training did not necessarily choose to play the app more during the training stage (rs = 0.25, p = 0.15). Additionally, we statistically compared the improvers (patients with reduced YBOCS scores post-training) and the non-improvers (patients with unchanged or increased YBOCS scores post-training) on the number of app practices completed during the training phase, and no differences were observed (U = 169, p = 0.19).

      The result from the correlational analysis has been added to the revised manuscript (page 28).

      Discussion:

      1. Based on my earlier comments highlighting the inadequacy and mis-framing of the work in terms of habit and goal-directed behavior, I suggest that the discussion section be substantially revised to reflect these concerns.

      Response: We do not agree that the work is either "inadequate or mis-framed" and will not therefore be substantially revising the Discussion. We will however clarify further the interpretation we have made and make explicit the alternative viewpoint of the referee. For example, we will retitle experiment 3 as “Re-evaluation of the learned action sequence: possible test of goal/habit arbitration” to acknowledge the referee’s viewpoint as well as our own interpretation.

      Done

      1. In the sentence "Nevertheless, OCD patients disadvantageously preferred the previously trained/familiar action sequence under certain conditions" the term "disadvantageously" is not necessarily accurate. While there was potentially more effort required, considering the possible presence of intrinsic reward and chunking, this preference may not necessarily be disadvantageous. Therefore, a more cautious and accurate phrasing that better reflects the associated results would be useful.

      Response: We recognize that the term "disadvantageously" may be semantically ambiguous for some readers and therefore we will remove it.

      Done

      Materials and Methods:

      1. The authors mention: "The novel sequence (in condition 3) was a 6-move sequence of similar complexity and difficulty as the app sequences, but only learned on the day, before starting this task (therefore, not overtrained)." - for the sake of completeness, more details on the pre-training done on that day would be useful.

      Response: Details of the learning procedure of the novel sequence (in condition 3, experiment 3) will be provided in the methods of the revised version of the manuscript.

      Done on page 40.

      Minor comments:

      1. In the section discussing the sensitivity of sequence duration to reward, the authors state that they only analyzed continuous reward trials because "a larger number of trials in each subsample were available to fit the Gaussian distributions, due to feedback being provided on all trials." However, feedback was also provided on all trials in the variable reward condition, even though the reward was not necessarily aligned with participants' performance. Therefore, it may be beneficial to rephrase this statement for clarity.

      Response: We will follow this referee’s advice and will rephrase the sentence for clarity.

      Done. See page 16.

      1. With regard to experiment 2 (Preference for familiar versus novel action sequences) in the following statement "A positive correlation between COHS and the app sequence choice (Pearson r = 0.36, p = 0.005) further showed that those participants with greater habitual tendencies had a greater propensity to prefer the trained app sequence under this condition." I find the use of the word "further" here potentially misleading.

      Response: The word "further" will be removed.

      Done

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting manuscript, which was a pleasure to review. I have some minor comments you may wish to consider.

      1. I believe that it is possible to include videos as elements in eLife articles - please consider if you can do this to demonstrate the action sequence on the smartphone. I followed the YouTube video, and it was very helpful to see exactly what participants did, but it would be better to attach the video directly, if possible.

      Response: This is a great idea and we will definitely attach our video demonstrating the task to the revised manuscript (Version of Record) if the eLife editors allow.

      We ask the editor’s permission to add the video.

      1. The abstract states that the study uses a "novel smartphone app", but it is the same one as described in Banca et al. Suggest writing simply "smartphone app".

      Response: We will remove the word novel.

      Done

      1. Some of the hypotheses described in the second half of the Hypothesis section could be stated more explicitly. For example: "We also hypothesized that the acquisition of learning and automaticity would differ between the two action sequences based on their associated rewarded schedule (continuous versus variable) and reward valence (positive or negative)." The subsequent sentence explains the prediction for the schedule but what is the hypothesized direction for reward valence? More detail is subsequently given on p. 14, Results, but it would be better to bring these details up to the Introduction. "We additionally examined differential effects of positive and negative feedback changes on performance to build on previous work demonstrating enhanced sensitivity to negative feedback in patients with OCD (Apergis-Schoute et al 2023, Becker et al., 2014; Kanen et al., 2019)." In general, the second part of the Hypothesis section is a bit dense, sometimes with two predictions per sentence. It could be useful for the reader if hypotheses were enumerated and/or if a distinction was made among the hypotheses with respect to their importance.

      Response: Thank you for pointing out the need for clarity in our hypothesis section. This is a very important point and we will carefully rewrite our hypotheses in the revised manuscript to make them as clear as possible.

      We fully revised the hypothesis section, on page 5, following this reviewer’s suggestion. We think this section is much clearer now in our revised manuscript.

      1. Did medication status correlate with symptom severity in the OCD group (e.g., higher symptoms for the 6 participants on SSRI+antipsychotics?). Could this, or SSRI-only status, have impacted results in any way? I appreciate that there is no way to test medication status statistically but readers may be interested in your thoughts on this aspect.

      Response: We have now conducted an exploratory analysis to assess the potential effect of medication on the following outcome measures: app engagement (as measured by completed practices), explicit preference and YBOCS change post-training. The patients who were on combined therapy (SSRIs + antipsychotic) did not perform significantly differently on these measures compared with the remaining patients, and no other effects of interest were observed. Their symptomatology was indeed slightly more severe, although the difference was not statistically significant [Y-BOCS combined = 26.2 (6.5); Y-BOCS SSRI only = 23.8 (6.1); Y-BOCS No Med = 23.8 (2.2), mean (std)]. Of the patients on combined therapy, only one showed symptom improvement after the app training, another became worse, and the remaining patients remained stable during the month.

      Palminteri et al. (2011) found that unmedicated OCD patients exhibited instrumental learning deficits, which were fully alleviated by SSRI treatment. Therefore, it is possible that the SSRI medication (present in our sample) may have reduced habit formation and facilitated behavioral arbitration. However, since such an effect would go against the habit hypothesis, it is unlikely to have confounded our measure of automaticity. If anything, medication would have rendered experiments 2 and 3 more goal-oriented. We agree that further studies are warranted to address the effect of SSRIs on these measures.

      1. You could explain earlier why devaluation could not be tested here (it is only explained in the Limitations section near the end)

      Response: The revised manuscript will be amended to account for this note.

      Done on page 25.

      1. Capitalize 'makey-makey', I didn't realize there was a product called Makey Makey until I Googled it.

      Response: Sure. We will capitalize 'Makey-Makey'. Thank you for pointing this out!

      Done

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for the authors (ordered by the paper sections):

      In the introduction

      1. regarding this part "We used a period of 1-month's training to enable effective consolidation, required for habitual action control or skill retention to occur. This acknowledged previous studies showing that practice alone is insufficient for habit development as it also requires off-line consolidation computations, through longer periods of time (de Wit et al., 2018) and sleep (Nusbaum et al., 2018; Walker et al., 2003)." I advise the authors to re-check whether what is attributed here to de Wit et al. (2018) is indeed justified (if I remember correctly they have not mentioned anything about off-line consolidation computations).

      Response: When we revise the manuscript, we will remove the de Wit et al. (2018) citation from this sentence.

      Done

      in the Outline paragraph

      1. it stated: "We continuously collected data online, in real time, thus enabling measurements of procedural learning as well as automaticity development." I think this wording implies that the fact that the data was collected online in real time was advantageous in that it enabled to assess measurements of procedural learning and automaticity development, which in my understanding is not the case.

      Response: To make this sentence clearer, we will change it to the following: ‘We continuously collected data online, to monitor engagement and performance in real time and to enable acquisition of sufficient data to analyze, a posteriori, procedural learning and automaticity development’.

      Done on page 4: ‘We collected data online continuously to monitor engagement and performance in real-time. This approach ensured we acquired sufficient data for subsequent analysis of procedural learning and automaticity development’.

      1. In the final sentence of this paragraph "or and" should be changed to "or/end".

      Response: This was a typo. The word ‘and’ will be removed.

      Done

      1. In Figure 1c - Note that in the figure legend it says "Each sequence comprises 3 single press moves, 2 two-finger moves..." whereas in the example shown in the figure it's the other way around (2 single press moves and 3 two-finger moves).

      Response: Thank you so much for spotting this! The example shown in the figure is incorrect. We apologize for the mistake. It should depict 3 single press moves, 2 two-finger moves and 1 three-finger move. The figure will be amended.

      Done

      In the results section:

      1. Regarding the "were followed by a positive ring tone and the unsuccessful ones by a negative ring tone", I suggest mentioning that there was also a positive visual (rewarding) effect.

      Response: Thank you. A mention of the visual effect will be added for both the positive (successful) and negative (unsuccessful) trials. Done on page 7.

      1. p 10. - Note a typo in the following sentence where the word "which" appears twice consecutively:

      "Furthermore, both groups exhibited similar motor durations at asymptote which, which combined with the previous conclusion, indicates that OCD patients improved their motor learning more than controls, but to the same asymptote."

      Response: Thank you for spotting this typo. The second word will be removed. Done

      1. I have a few suggestions with respect to Figure 3:

      2. Keeping the y-axis scale the same in all subplots would be more visually informative.

      We have now kept the same y-axis scale in all subplots except one, whose different scale was needed to capture all the data.

      1. For the subplots in 3b I would recommend for the transparent regions, instead of the IQR, to use the median +/- 1.57 * IQR/sqrt(n) which is equivalent to how the notches are calculated in a box-plot figure (It is referred to as an approximate 95% confidence interval for the median). This should make the transparent area narrower and thus better communicate the results.

      Done
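      The notch-style band the reviewer proposes, median ± 1.57 × IQR/√n (an approximate 95% confidence interval for the median), can be computed in a few lines. The sketch below is illustrative only and is not the authors' plotting code: the helper name `median_notch_interval` is hypothetical, and the quartiles use Python's "inclusive" (linear-interpolation) method, which is an assumed convention; other quartile definitions give slightly different IQRs.

```python
import math
import statistics


def median_notch_interval(values):
    """Return (lower, upper) = median -/+ 1.57 * IQR / sqrt(n).

    Hypothetical helper for illustration. Quartiles use the 'inclusive'
    method (linear interpolation), an assumed convention.
    """
    data = list(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    med = statistics.median(data)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(n)
    return med - half_width, med + half_width


if __name__ == "__main__":
    # Toy data, not from the study: median 5, IQR 4, n = 9
    low, high = median_notch_interval([1, 2, 3, 4, 5, 6, 7, 8, 9])
    print(f"{low:.3f} to {high:.3f}")  # prints "2.907 to 7.093"
```

      The full band width is 3.14 × IQR/√n, so it is narrower than the IQR band whenever √n exceeds about 3.14, that is, for roughly ten or more observations per point, which is consistent with the reviewer's expectation that it would better communicate the results.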

      1. I think the significance levels mentioned in the Figure 3b legend (which refer to the group effect measured for each reward schedule type separately) are not mentioned in the text. While not crucial, maybe consider adding them in the text.

      We don’t think this is necessary, and it may actually lead to confusion, because in the text we report a Kruskal–Wallis H test (which is the most appropriate statistical test), including its H and p values for the group and reward effects. Since in the figure we separated the analyses and plots for the variable and continuous reward schedules (for visual purposes), we reported a U test for each reward schedule separately. Therefore, we consider that the correct statistics are reported in the appropriate places of the manuscript.

      Response: Thank you for this very helpful suggestion. We will amend figure 3 accordingly.

      1. In the Automaticity results (pp. 12 and 13), when describing the descriptive stats, the wrong parameter indicators are used (DL instead of CL and nD instead of nC).

      Response: Thank you for noticing it. We will amend.

      Done

      1. In Sensitivity of IKI consistency (C) to reward results:

      In Figure 6a legend: with respect to "... and for reward increments (∆R+, purple) and decrements (∆R-, green)" - note that there are also additional colors indicating these ∆Rs.

      Response: Done. We had used a 2 x 2 color scheme: green hues for ∆R-, and purple hues for ∆R+. Then, OCD is denoted by dark colors, and HV by light colors. This represents all four colors used in the figure. For instance, OCD and ∆R- is dark green, whereas OCD and ∆R+ is denoted by dark purple.

      1. p.21 - the YBOCS abbreviation appears before the full form is spelled out in the text.

      Response: In the revised version, we will make sure the YBOCS abbreviation is spelled out the first time it is mentioned.

      Done on page 24.

      Experiments 2 and 3:

      1. If there is a reason behind presenting the conditions sequentially rather than using intermixed trials in experiments 2 and 3, it would be useful to mention it in the text.

      Response: Experiment 2 could have used intermixed trials. However, we were concerned that the use of intermixed trials in experiment 3 would excessively increase the memory load of the task, which could then be a confound.

      Done on page 41.

      1. I wonder whether the presentation order of the conditions in experiments 2 and 3 affected participants' results? Maybe it is worth adding this factor to the analysis.

      Response: As we mentioned in both the Methods and Results sections, we counterbalanced all the conditions across participants in both experiments 2 and 3. This procedure balances any order effects across conditions.

      Experiment 2:

      1. Regarding this sentence (pp. 21-22): "However, some participants still preferred the app sequence, specifically those with greater habitual tendencies, including patients who considered the app training beneficial." I think the part that mentions that there are "patients who considered the app training beneficial" appears below and it may confuse the reader. I suggest either providing a brief explanation or indicating that further details will be provided later in the text ("see below in...").

      Response: We will clarify this section.

      We added “see below, exploratory analyses of ‘Mobile-app performance effect on symptomatology’” at the end of the sentence so that the reader knows this is explained further below. Page 25.

      1. Finally, in addition to subgrouping maybe it is worth testing whether there is a correlation between the YBOCS score change and the app-sequences preference (as to learn if the more they change their YBOCS the more they prefer the learned sequences and vice versa?)

      Response: Thank you for suggesting this relevant correlational analysis, which we have now conducted. Indeed, there is a correlation between the YBOCS score change and the preference for the app-sequences, meaning that the higher the symptom improvement after the month training, the greater the preference for the familiar/learned sequence. This is particularly the case for the experimental condition 2, when subjects are required to choose between the trained app sequence and any 3-move sequence (rs = 0.35, p=0.04). A trend was observed for the correlation between the YBOCS score change and the preference for the app-sequences in experimental condition 1 (app preferred sequence versus any 6-move sequence): rs = 0.30, p=0.09.

      This finding represents an additional corroboration of our conclusion that the app seems to be more beneficial to patients more prone to routine habits, who are somewhat more averse to novelty.

      This analysis was added on pages 24, 25 and 35.

      Experiment 3:

      1. You mention "The task was conducted in a new context, which has been shown to promote reengagement of the goal system (Bouton, 2021)." In my understanding this observation is true also for experiment 2. In such case it should be stated earlier (probably under: "Phase B: Tests of action-sequence preference and goal/habit arbitration").

      Response: As answered above (Q17), we will follow Referee 2’s suggestion and describe the contextual details of experiments 2 and 3 in the Results section, when we introduce Phase B.

      Done on page 21.

      1. w.r.t. this sentence - "...that sequence (Figure 8b, no group effects (p = 0.210 and BF = 0.742, anecdotal evidence)" I would add what the anecdotal evidence refers to (as done in other parts of the paper), to prevent potential confusion.

      Response: OK, this will be added.

      Added on page 27

      Discussion:

      1. w.r.t. "Here we have trained a clinical population with moderately high baseline levels of stress and anxiety, with training sessions of a higher order of magnitude than in previous studies (de Wit et al., 2018, 2018; Gera et al., 2022) (30 days instead of 3 days)." The Gera et al. 2022 (was more than 3 days), you probably meant Gera et al. 2023 ("Characterizing habit learning in the human brain at the individual and group levels: a multi-modal MRI study", for which 3 days is true).

      Response: Thank you for pointing this out. We will keep the citation to Gera et al 2022 given its relevance to the sentence but we will remove the information inside the parenthesis. This amendment will solve the issue raised here.

      Done on page 32.

      1. w.r.t "to a simple 2-element sequence with less training (Gera et al., 2022)" - it's a 3-element sequence in practice.

      Response: Thank you for this correction. We will amend this sentence accordingly.

      Done on page 32.

      1. (p.30) w.r.t "and enhanced error-related negativity amplitudes in OCD" - a bit more context of what the negative amplitudes refer to would be useful (So the reader understands it refers to electrophysiology).

      Response: We will add a sentence in our revised manuscript addressing this matter. In the end, this sentence was removed from the revised manuscript.

      Supplementary materials:

      1. under "Sample size for the reward sensitivity analysis":

      It is stated "One practice corresponded to 20 correctly performed sequences. We therefore split the total number of correct sequences into four bins." I was not able to follow this reasoning here (20 correct trials per practice => splitting the data into 4 bins). More clarity here would be useful.

      Response: We will clarify this procedure of our analysis in the revised version of the manuscript. Thanks.

      Done. See Supplementary materials.

      1. Also, maybe I am missing something, but I couldn't understand why the number of sequences available per bin is different for the calculation of ∆MT and C. Aren't any two consecutive sequences that are good for the calculation of one of these measures also good for the calculation of the other?

      Response: Thank you for pointing this out. Indeed, the number of trials was the same for both analyses, ∆MT and C. We had saved an incorrect variable as number of trials. We will amend the text.

      We have re-analyzed the trial number data. The average number of trials per bin both for the ∆MT and C analyses was 109 (9) in the HV and 127 (12) in OCD groups. Although the number was on average larger in the patient group, we did not find significant differences between groups (p = 0.47).

      When assessing p(∆T|∆R+) and p(∆T|∆R-) separately, more trials were available for p(∆T|∆R+), 107 (10), than for p(∆T|∆R-), 98 (8). These trial numbers differed significantly (p = 0.0046), but were identical for the ∆MT and C analyses.

      Done. Included in Supplementary materials.

      Minor comments:

      1. Not crucial, but maybe for the sake of consistency consider merging the "Self-reported habit tendencies" section and the "Other self-reported symptoms" section, preferably where the latter is currently placed.

      Response: We fully understand the referee’s rationale underlying this suggestion. We indeed considered initially presenting the self-reported questionnaires all together, in a last, single section of the results, as suggested by the referee. However, we decided to report the higher habitual tendencies of OCD as an initial set of results, not only because it is a novel and important finding (which justifies it to be highlighted) but also because it is essential to the understanding of some of the remaining results presented.

      1. In some figure legends the confidence level of the reported confidence intervals (probably 95%) is missing. I suggest adding it.

      Response: OK, this will be added.

      Done

      1. The NHS abbreviation appears without spelling out the full form.

      Response: This will be amended accordingly.

      We removed NHS as it is not relevant.

      1. In p.38 the citation (Rouder et al., 2012) is duplicated (appears twice consecutively).

      Response: Thank you for pointing this out. We will amend accordingly.

      Done

      In the results section:

      1. The authors mention: "To promote motivation, the total points achieved on each daily training sessions were also shown, so participants could see how well they improved across days". Yet, if the score is based on the number of practices, it may not represent participants' improvement if more practices are performed on some days. I suggest clarifying this point.

      Response: The goal of providing the scoring feedback was, as explained in the sentence, to promote motivation and inform participants about their performance. With this goal in mind, it does not really matter if on one day their score was higher simply because they had practiced more on that day. Participants could easily understand that the score reflected their performance on each practice, so they would realize that the more they practiced, the greater their improvement, and that the score would increase across days of practice. We will amend the sentence to the following: "To promote motivation, the total points achieved on each training session (i.e. practice) was also shown, so participants could see how well they improved across practice and across days".

      Done on pages 7 and 8.

    2. eLife assessment

      The study provides valuable insights into OCD patients' acquisition of automaticity, skill learning, and the impact of intrinsic rewards on action sequence completion. The data provide incomplete evidence for the main claims as it is not clear that the participants' performance on the task meets the criteria for habitual behaviour.

    3. Reviewer #1 (Public Review):

      It is known that aberrant habit formation is a characteristic of obsessive-compulsive disorder (OCD). Habits can be defined according to the following features (Balleine and Dezfouli, 2019): rapid execution, invariant response topography, action 'chunking' and resistance to devaluation.

      The extent to which OCD behavior is derived from enhanced habit formation relative to deficits in goal-directed behavior is a topic of debate in the current literature. This study examined habit-learning specifically (cf. deficits in goal-directed behavior) by regularly presenting, via smartphone, sequential learning tasks to patients with OCD and healthy controls. Participants engaged in the tasks every day over the course of a month. Automaticity, including the extent to which individual actions in the sequence become part of a unified 'chunk', was an important outcome variable. Following the 30 days of training, in-laboratory tasks were then administered to examine 1) if performing the learned sequences themselves had become rewarding and 2) differences in goal-directed vs. habitual behavior.

      Several hypotheses were tested, including:
      - Patients would have impaired procedural learning vs. healthy volunteers (this was not supported, possibly because there were fewer demands on memory in the task used here)
      - Once the task had been learned, patients would display automaticity faster (unexpectedly, patients were slower to display automaticity)
      - Habits would form faster under a continuous (vs. variable) reinforcement schedule

      Exploratory analyses were also conducted: an interesting finding was that OCD patients with higher self-reported symptoms voluntarily completed more sessions with the habit-training app and reported a reduction in symptoms.

      Strengths

      This paper is well situated theoretically within the habit learning/OCD literature.

      Daily training in a motor-learning task, delivered via smartphone, was innovative, ecologically valid and more likely to assay habitual behaviors specifically. Daily training is also more similar to studies with non-humans, making a better link with that literature. The use of a sequential-learning task (cf. tasks that require a single response) is also more ecologically valid.

      The in-laboratory tests (after the 1 month of training) allowed the researchers to test if the OCD group preferred familiar, but more difficult, sequences over newer, simpler sequences.

      Weaknesses

      The authors were not able to test one criterion of habits, namely resistance to devaluation, due to the nature of the task.

      The sample size was relatively small. Some potentially interesting individual differences within the OCD group could have been examined more thoroughly with a bigger sample (e.g., preference for familiar sequences). A larger sample may have allowed the statistical testing of any effects due to medication status.

      The authors achieved their aims in that two groups of participants (patients with OCD and controls) engaged with the task over the course of 30 days. The repeated nature of the task meant that 'overtraining' was almost certainly established, and automaticity was demonstrated. This allowed the authors to test their hypotheses about habit learning. The results are supportive of the authors' conclusions.

      This article is likely to be impactful -- the delivery of a task across 30 days to a patient group is innovative and represents a new approach for the study of habit learning that is superior to an in-laboratory approach.

      An interesting aspect of this manuscript is that it prompts a comparison with previous studies of goal-directed/habitual responding in OCD that used devaluation protocols, and which may have had their effects due to deficits in goal-directed behavior and not enhanced habit learning per se.

    4. Reviewer #2 (Public Review):

      I would like to express my appreciation for the authors' dedication to revising the manuscript. It is evident that they have thoughtfully addressed numerous concerns I previously raised, significantly contributing to the overall improvement of the manuscript.

      My primary concern regarding the authors' framing of their findings within the realm of habitual and goal-directed action control persists. I will try to explain my point of view and perhaps clarify my concerns.<br /> While acknowledging the historical tendency to equate procedural learning with habits, I believe a consensus has gradually emerged among scientists, recognizing a meaningful distinction between habits and skills or procedural learning. I think this distinction is crucial for a comprehensive understanding of human action control. While these constructs share similarities, they should not be used interchangeably. Procedural learning and motor skills can manifest either through intentional and planned actions (i.e., goal-directed) or autonomously and involuntarily (habitual responses).

      Watson et al. (2022) aptly detailed my concerns in the following statements: "Defining habits as fluid and quickly deployed movement sequences overlaps with definitions of skills and procedural learning, which are seen by associative learning theorists as different behaviours and fields of research, distinct from habits."<br /> "...the risk of calling any fluid behavioural repertoire 'habit' is that clarity on what exactly is under investigation and what associative structure underpins the behaviour may be lost."<br /> I strongly encourage the authors, at the very least, to consider Watson et al.'s (2022) suggestion: "Clearer terminology as to the type of habit under investigation may be required by researchers to ensure that others can assess at a glance what exactly is under investigation (e.g., devaluation-insensitive habits vs. procedural habits)", and to refine their terminology accordingly (to make this distinction clear). I believe adopting clearer terminology in these respects would enhance the positioning of this work within the relevant knowledge landscape and facilitate future investigations in the field.

      Regarding the authors' use of Balleine and Dezfouli's (2018) criteria to frame the recorded behavior as habitual, and to acknowledge the study's limitations, it is important to highlight that while the authors labeled the fourth criterion (which they did not fulfill) as "resistance to devaluation," Balleine and Dezfouli define it as "insensitive to changes in their relationship to their individual consequences and the value of those consequences." In my understanding, this definition potentially aligns with the authors' re-evaluation test; that is, the test is conceptually adequate for evaluating the fourth criterion (which is the most accepted in the field and probably the one that differentiates habits from skills). Notably, during this test, participants exhibited goal-directed behavior.

      The authors characterized this test as possibly assessing arbitration between goal-directed and habitual behavior, stating that participants in both groups "demonstrated the ability to arbitrate between prior automatic actions and new goal-directed ones." From my perspective, there is no justification for calling it a test of arbitration. Notably, the authors inferred that participants were habitual before the test based on some criteria, but then transitioned to goal-directed behavior based on a different criterion. While I agree with the authors' comment that "Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1)", they implicitly assert a shift from habit to goal-directed behavior without providing evidence that relies on the same probed mechanism.<br /> Therefore, I think it would be more cautious to refer to this test solely as an outcome revaluation test. Again, the results of this test, if anything, provide evidence that the fourth criterion was tested but not met, suggesting that participants had not become habitual (or at least undermining this possibility).

    1. Author Response

      We thank the reviewers for their fair assessment of our work and will submit a revised version edited for clarity of presentation and precision of interpretations.

    2. eLife assessment

      This study investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage, and it identified a number of biological and experimental factors that influence which genes are selected for investigation. The study is a valuable contribution to this branch of meta-research, and while the evidence in support of the findings is solid, the interpretation and presentation of the results (especially the figures) needs to be improved.

    3. Reviewer #1 (Public Review):

      Summary and strengths<br /> The authors tried to address why only a subset of genes is highlighted in many publications. Is it because these highlighted genes are more important than others? Or is it because of non-genetic reasons? This is a critical question because, in the effort to discover new genes for drug targets and clinical benefit, we need to expand the pool of genes for deep analyses. So I appreciate the authors' efforts in this study, as it is timely and important. They also provided a framework called FMUG (short for Find My Understudied Gene) to evaluate genes for a number of features for subsequent analyses.

      Weaknesses<br /> Many of the figures are hard to comprehend, and the figure legends do not sufficiently explain them.<br /> # For example, what was plotted in Fig 1b? The number of articles increased from results -> write-ups -> follow-ups in all four categories, to different degrees. But this does not seem to match what the authors meant to convey.<br /> # Fig 4 is also confusing. It appears that the genes were clustered by many features that the authors developed. But does this clustering have any relationship with genes being under- or over-studied?

    4. Reviewer #2 (Public Review):

      Summary and strengths<br /> In this manuscript the authors analyse the trajectory of understudied genes (UGs) from experiment to publication and study the reasons why UGs remain underrepresented in the scientific literature. They show that UGs are not underrepresented in experimental datasets, but are underrepresented in the titles and abstracts of the manuscripts reporting experimental data, as well as in subsequent studies referring to those large-scale studies. They also develop an app that allows researchers to find UGs and their annotation state. Overall, this is a timely article that makes an important contribution to the field. It could help to boost the future investigation of understudied genes, a fundamental challenge in the life sciences. It is concise and overall well-written, and I very much enjoyed reading it. However, there are a few points that I think the authors should address.

      Weaknesses<br /> The authors conclude that many UGs "are lost" from genome-wide assays at the manuscript-writing stage. If I understand correctly, this is based on gene names not being reported in the title or abstract of these manuscripts. However, for genome-wide experiments, it would be quite difficult for authors to mention large numbers of understudied genes in the abstract. In contrast, one might mention the expected behaviour of a well-studied protein simply to show that the genome-wide study provides credible results. Could this bias the authors' conclusions and, if so, how could this be addressed? For example, would it be worth normalising studies based on the total number of genes they cover?

      Figure 1B is confusing in its present form. I think the plot and/or the legend need revising. For example, what "numbers to the right of each box plot" are the authors referring to? Also, I assume that the filled boxes are understudied genes and the empty/white box is "all genes", but that's not explained in the legend. In the main text, the figure is referred to with the sentence "we found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature". I cannot follow how the figure shows this. My interpretation is that the y-axis is not showing the number of articles, but represents the percentage of articles mentioning a gene in the title/abstract, displayed on a log scale. If so, better axis labels and legend text could be sufficient. But then one would also need to connect this to the statement in the main text about the 20% highest-studied genes (a dashed line?). Alternatively, the authors could consider other ways of plotting these data, e.g. simply plotting the "% of publications in which a gene appears" from 0-100% or so.

    5. Reviewer #3 (Public Review):

      Summary and strengths<br /> The manuscript investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage and identified biological and experimental factors associated with the selection of highlighted genes.

      It is very important for the research community to recognize the systematic bias in research on human genes and to take precautions when designing experiments and interpreting results. The authors have tried to profile this issue comprehensively and to promote greater awareness and investigation of understudied genes.

      Weaknesses<br /> Regarding result section 1 "Understudied genes are abandoned at synthesis/writing stage", the figures are not clear and do not convey the messages written in the main text. For example, in Figure 1B, figure S5 and S6,<br /> - There is no "numbers to the right of each box plot".<br /> - Do these box plots only show understudied genes? How many genes are there in each box plot? The definition and numbers of understudied genes are not clear.<br /> - "We found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature (Figure 1B)". This is not clear from the figure.

      Regarding result section 2 "Subsequent reception by other scientists does not penalize studies on understudied genes", the authors showed in figure 2 that there is a negative correlation between articles per gene before 2015 and median citations to articles published in 2015. Another explanation could be that for popular genes, there are more low-quality articles that didn't get citations, not necessarily that less popular genes attract more citations.

      Regarding result section 3 "Identification of biological and experimental factors associated with selection of highlighted genes", in Figure 3 and Table S2, the authors stated that "hits with a compound known to affect gene activity are 5.114 times as likely to be mentioned in the title/abstract in an article using transcriptomics". The number 5.144 comes out of nowhere both in the figure and the table. In addition, Figure 4 is not informative enough to be included as a main figure.

    1. eLife assessment

      This valuable manuscript demonstrates that the glycosyltransferase UGGT slows the degradation of endoplasmic reticulum (ER)-associated degradation substrates through a mechanism involving re-glucosylation of asparagine-linked glycans following release from the calnexin/calreticulin lectins. The evidence supporting this conclusion is solid using genetically-deficient cell models and biochemical methods to monitor the degradation of trafficking-incompetent ER-associated degradation substrates, although the manuscript could be improved through additional studies directed towards defining potential functional differences between UGGT1 and UGGT2 and additional insights into the impact of UGGT on the nature of substrate glycosylation within the ER. This work will be of specific interest to those interested in mechanistic aspects of ER protein quality control and protein secretion.

    2. Reviewer #1 (Public Review):

      Summary:<br /> Using UGGT-KO cells and a number of different ERAD substrates, the authors showed that UGGTs are involved in preventing the premature degradation of misfolded glycoproteins. They proposed a concept by which the fate of glycoproteins is determined by a tug-of-war between UGGTs and EDEMs.

      Strengths:<br /> The authors provided a wealth of data indicating that UGGT1 competes with EDEMs, which promote glycoprotein degradation.

      Weaknesses:<br /> Less clear, though, is the involvement of UGGT2 in the process. Also, to this reviewer, some data do not necessarily support the conclusion.

      Major criticisms:

      1. One of the biggest problems I had when reading through this manuscript is that, while the authors appear to have generated UGGT-KO cells from both HCT116 and HeLa cells, it was not clearly indicated which cell line was used for each experiment. I assume that HCT116 cells were used in most cases, but I did not see this clearly stated. As the expression level of UGGT2 relative to UGGT1 is quite different between the two cell lines, it is critical to know which cells were used for each experiment.

      2. While most of the authors' conclusions are sound, some claims, to this reviewer, were not fully supported by the data. In particular, I am puzzled by the authors' claim about the involvement of UGGT2 in the ERAD process. In most cases, KO of UGGT2 does not seem to affect the stability of ERAD substrates (e.g., Fig. 1C, 2A, 3D). Where the authors suggest that UGGT2 is also involved in ERAD, the evidence is far from convincing (e.g., Fig. 2D/E). Especially because it has now been suggested that the main role of UGGT2 may be distinct from that of UGGT1, namely a role in lipid quality control (Hung, et al., PNAS 2022), it is imperative to provide convincing evidence if the authors want to claim the involvement of UGGT2 in a protein quality control system.

      In fact, it was not clear at all whether even UGGT1 is involved in the process in Fig. 2D/E, as the difference, if any, is so subtle. How can the authors be sure that this is significant enough? While the authors claim that the difference is statistically significant (n=3), it may reflect experimental artifacts. At the very least, I would urge the authors to try rescue experiments with UGGT1 or 2, to clarify that the defect in UGGT-DKO cells can be reversed. It may also be interesting to test whether the subtle difference the authors observed is indeed N-glycan-dependent, using a non-glycosylated version of the protein (just like the NHK-QQQ mutants in Fig. 2C).

      To this reviewer, it is still possible that the involvement of UGGT1 (or 2, if any) could be totally substrate-dependent, and the substrates used in Fig 2D or E happen not to depend on the action of UGGTs. In my view, even without the data in Fig. 2D and E, the authors provide enough evidence to demonstrate the involvement of UGGT1 in preventing premature degradation of glycoprotein ERAD substrates. I am just afraid that the authors may have overinterpreted the data, as if UGGTs were involved in the stabilization of all glycoproteins destined for ERAD.

      3. I am a bit puzzled by the DNJ treatment experiments. First, I do not see the detailed conditions of the DNJ treatment (concentration? time?). Second, I was surprised to see that so few G3M9 glycans were formed, and that about the same amount of G2M9 was also formed (Figure 1 Figure supplement 4B-D), even though glucose trimming of newly synthesized glycoproteins is expected to be completely impaired (unless the authors used a DNJ concentration that does not completely impair the trimming of the first Glc). Even considering the involvement of Golgi endo-alpha-mannosidase, similar amounts of G3M9 and G2M9 may suggest that the experimental conditions used (i.e., concentration of DNJ, duration of treatment, etc.) were not properly optimized.

    3. Reviewer #2 (Public Review):

      In this study, Ninagawa et al. shed light on UGGT's role in ER quality control of glycoproteins. Using UGGT1/UGGT2 DKO cells, they demonstrate that several model misfolded glycoproteins undergo early degradation. One such substrate is ATF6alpha, whose premature degradation hampers the cell's ability to mount an ER stress response.

      While this study convincingly demonstrates early degradation of misfolded glycoproteins in the absence of UGGTs, my major concern is the need for additional experiments to support the "tug of war" model in which UGGTs and EDEMs influence the substrate's fate: whether misfolded glycoproteins are pulled into the folding or the degradation route. Specifically, it would be valuable to investigate how overexpression of UGGTs and EDEMs in WT cells affects the choice between folding and degradation for misfolded glycoproteins. Considering previous studies indicating that monoglucosylation influences glycoprotein solubility and stability, an essential question is: what is the nature of glycoproteins in UGGTKO/EDEMKO and potentially UGGT/EDEM overexpression cells? Understanding whether these substrates become more soluble/stable when GM9 rather than mannose-only glycans accumulate would provide valuable insights.

      The study delves into the physiological role of UGGT, but is limited in scope, focusing solely on the effect of ATF6alpha in UGGT KO cells' stress response. It is crucial for the authors to investigate the broader impact of UGGT KO, including the assessment of basal ER proteotoxicity levels, examination of the general efflux of glycoproteins from ER, and the exploration of the physiological consequences due to UGGT KO. This broader perspective would be valuable for the wider audience. Additionally, the marked increase in ATF4 activity in UGGTKO requires discussion, which the authors currently omit.

      The discussion section is brief and could benefit from being a separate section. It is advisable for the authors to explore and suggest other model systems or disease contexts to test UGGT's role in the future. This expansion would help the broader scientific community appreciate the potential applications and implications of this work beyond its current scope.

    4. Reviewer #3 (Public Review):

      This manuscript focuses on defining the importance of UGGT1/2 in the process of protein degradation within the ER. The authors prepared HCT116 cells lacking UGGT1, UGGT2, or both UGGT1/UGGT2 (DKO) and then monitored the degradation of specific ERAD substrates. Initially, they focused on the ER stress sensor ATF6 and showed that loss of UGGT1 increased the degradation of this protein. ATF6 was stabilized by deletion of ERAD-specific factors (e.g., SEL1L, EDEM) or by treatment with mannosidase inhibitors such as kifunensine, indicating that this degradation is mediated through a process involving increased mannose trimming of the ATF6 N-glycan. This increased degradation of ATF6 impaired the function of this ER stress sensor, as expected, reducing the activation of downstream reporters of ER stress-induced ATF6 activation. The authors extended this analysis to monitor the degradation of other well-established ERAD substrates including A1AT-NHK and CD3d, demonstrating similar increases in the degradation of destabilized, misfolding protein substrates in cells deficient in UGGT. Importantly, they performed experiments suggesting that re-overexpression of wild-type, but not catalytically deficient, UGGT rescues the increased degradation observed in UGGT1 knockout cells. Further, they demonstrated the dependence of this sensitivity to UGGT depletion on N-glycans using ERAD substrates that lack any glycans. Ultimately, these results suggest a model whereby depletion of UGGT (especially UGGT1, which is the most highly expressed in these cells) increases degradation of ERAD substrates through a mechanism involving impaired re-glucosylation and subsequent re-entry into the calnexin/calreticulin folding pathway.

      I must say that I was under the impression that the main conclusions of this paper (i.e., UGGT1 functions to slow the degradation of ERAD substrates by allowing re-entry into the lectin folding pathway) were well-established in the literature. However, I was not able to find papers explicitly demonstrating this point. Because of this, I do think that this manuscript is valuable, as it supports a previously assumed assertion of the role of UGGT in ER quality control. However, there are a number of issues in the manuscript that should be addressed.

      Notably, the focus on well-established, trafficking-deficient ERAD substrates, while a traditional approach to studying these types of processes, limits our understanding of global ER quality control of proteins that are trafficked to downstream secretory environments where proteins can be degraded through multiple mechanisms. For example, in Figure 1-Figure Supplement 2, UGGT1/2 knockout does not seem to increase the degradation of secretion-competent proteins such as A1AT or EPO, instead appearing to stabilize these proteins against degradation. They do show reductions in secretion, but it isn't clear exactly how UGGT loss is impacting ER Quality Control of these more relevant types of ER-targeted secretory proteins.

      Lastly, I don't understand the link between UGGT, ATF6 degradation, and ATF6 activation. I understand that the idea is that increased ATF6 degradation afforded by UGGT depletion will impair activation of this ER stress sensor, but if that is the case, how does UGGT2 depletion, which only minimally impacts ATF6 degradation (Fig. 1), impact activation to levels similar to the UGGT1 knockout (Fig 4)? This suggests UGGT1/2 may serve different functions beyond just regulating the degradation of this ER stress sensor. Also, the authors should quantify the impaired ATF6 processing shown in Fig 4B-D across multiple replicates.

      Ultimately, I do think the data support a role for UGGT (especially UGGT1) in regulating the degradation of ERAD substrates, which provides experimental support for a role long-predicted in the field. However, there are a number of ways this manuscript could be strengthened to further support this role, some of which can be done with data they have in hand (e.g., the stats) or additional new experiments.

    1. eLife assessment

      In this study, Ger and colleagues present a valuable new technique that uses recurrent neural networks to distinguish between model misspecification and behavioral stochasticity when interpreting cognitive-behavioral model fits. Evidence for the usefulness of this technique, which is currently based primarily on a relatively simple toy problem, is considered incomplete but could be improved via comparisons to existing approaches and/or applications to other problems. This technique addresses a long-standing problem that is likely to be of interest to researchers pushing the limits of cognitive computational modeling.

    2. Reviewer #1 (Public Review):

      Summary:<br /> Ger and colleagues address an issue that often impedes computational modeling: the inherent ambiguity between stochasticity in behavior and structural mismatch between the assumed and true model. They propose using RNNs to estimate the ceiling on explainable variation within a behavioral dataset. With this information in hand, it is possible to determine the extent to which "worse fits" result from behavioral stochasticity versus failures of the cognitive model to capture nuances in behavior (model misspecification). The authors demonstrate the efficacy of the approach in a synthetic toy problem and then use the method to show that poorer model fits to 2-step data in participants with low IQ are actually due to an increase in inherent stochasticity, rather than systematic mismatch between model and behavior.

      Strengths:<br /> Overall I found the ideas conveyed in the paper interesting and the paper to be extremely clear and well-written. The method itself is clever and intuitive and I believe it could be useful in certain circumstances, particularly ones where the sources of structure in behavioral data are unknown. In general, the support for the method is clear and compelling. The flexibility of the method also means that it can be applied to different types of behavioral data - without any hypotheses about the exact behavioral features that might be present in a given task.

      Weaknesses:<br /> That said, I have some concerns with the manuscript in its current form, largely related to the applicability of the proposed methods to problems of importance in computational cognitive neuroscience. This concern stems from the fact that the toy problem explored in the manuscript is somewhat simple, and the theoretical problem addressed in it could have been identified through other means (for example, through the use of posterior predictive checking for model validation), and the actual behavioral data analyzed were interpreted as a null result (failure to reject the behavioral stochasticity hypothesis), rather than as an actual identification of model misspecification. I expand on these primary concerns and raise several smaller points below.

      A primary question I have about this work is whether the method described would actually provide any advantage for real cognitive modeling problems beyond what is typically done to minimize the chance of model misspecification (in particular, posterior predictive checking). The toy problem examined in the manuscript is pretty extreme (two of the three synthetic agents are very far from what a human would do on the task, and the models deviate from one another to a degree that detecting the difference should not be difficult for any method). The issue posed in the toy data would easily be identified by following good modeling practices, which include using posterior predictive checking over summary measures to identify model insufficiencies, which in turn would call for a broader set of models (see Wilson & Collins 2019). Thus, I am left wondering whether this method could actually identify model misspecification in real-world data, particularly in situations where standard posterior predictive checking would fall short. The conclusions from the main empirical data set rest largely on a null result, and the utility of a method for detecting model misspecification should depend on its ability to detect its presence, not just its absence, in real data.

      Beyond the question of its advantage over data- and hypothesis-informed methods for identifying model misspecification, I am also concerned that if the method does identify a model insufficiency, you would still need to use these other methods to understand which aspect of behavior deviated from model predictions in order to design a better model. In general, the authors should be clear that this is a tool that might be helpful in some situations, but that it will need to be used in combination with other well-described modeling techniques (posterior predictive checking for model validation and for guiding cognitive model extensions to capture unexplained features of the data). A general stylistic concern I have with this manuscript is that it presents and characterizes a new tool to help with cognitive computational modeling, but it does not really adhere to best modeling practices (see Wilson & Collins 2019, eLife), which involve looking at data to identify core behavioral features and simulating data from best-fitting models to confirm that these features are reproduced. One could take away from this paper that you would be better off fitting a neural network to your behavioral data rather than carefully comparing the predictions of your cognitive model to your actual data, but I think that would be a highly misleading takeaway, since summary measures of behavior would just as easily have diagnosed the model misspecification in the toy problem, and have the added advantage that they provide information about which cognitive processes are missing in such cases.

      As a more minor point, it is also worth noting that this method could not distinguish behavioral stochasticity from deterministic structure that is not repeated across training/test sets (for example, because a specific sequence is present in the training set but not the test set). This should be included in the discussion of method limitations. It was also not entirely clear to me whether the method could be applied to real behavioral data without extensive pretraining (on >500 participants), which would certainly limit its applicability for standard cases.

      The authors focus on model misspecification, but in reality, all of our models are misspecified to some degree, since the true process generating behavior almost certainly deviates from our simple models (i.e., as George Box is frequently quoted as saying, "all models are wrong, but some of them are useful"). It would be useful to have a more nuanced discussion of situations in which misspecification is and is not problematic.

    3. Reviewer #2 (Public Review):

      SUMMARY:<br /> In this manuscript, Ger and colleagues propose two complementary analytical methods aimed at quantifying the model misspecification and irreducible stochasticity in human choice behavior. The first method involves fitting recurrent neural networks (RNNs) and theoretical models to human choices and interpreting the better performance of RNNs as providing evidence of the misspecifications of theoretical models. The second method involves estimating the number of training iterations for which the fitted RNN achieves the best prediction of human choice behavior in a separate, validation data set, following an approach known as "early stopping". This number is then interpreted as a proxy for the amount of explainable variability in behavior, such that fewer iterations (earlier stopping) correspond to a higher amount of irreducible stochasticity in the data. The authors validate the two methods using simulations of choice behavior in a two-stage task, where the simulated behavior is generated by different known models. Finally, the authors use their approach in a real data set of human choices in the two-stage task, concluding that low-IQ subjects exhibit greater levels of stochasticity than high-IQ subjects.
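      To make the early-stopping proxy summarised above concrete, the selection step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name best_stopping_epoch and the per-epoch validation-loss inputs are hypothetical, and the loss values below are made up purely for illustration.

```python
def best_stopping_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss.

    val_losses is a hypothetical list of per-epoch validation losses
    (e.g., negative log-likelihood of held-out choices after each epoch).
    Ties resolve to the earliest epoch, because min() returns the first
    minimal element it encounters.
    """
    if not val_losses:
        raise ValueError("need at least one epoch")
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Made-up validation curves for two hypothetical subjects.
noisier_subject = [0.69, 0.66, 0.67, 0.68]
more_structured_subject = [0.69, 0.60, 0.52, 0.49, 0.50]
print(best_stopping_epoch(noisier_subject))          # 2
print(best_stopping_epoch(more_structured_subject))  # 4
```

      Under the interpretation described above, the earlier optimum for the first (made-up) subject would be read as more irreducible stochasticity; the caveat raised in the Weaknesses section below is that an earlier optimum can also arise when the RNN fits that subject's generating process poorly.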

      STRENGTHS:<br /> The manuscript explores an extremely important topic to scientists interested in characterizing human decision-making. While it is generally acknowledged that any computational model of behavior will be limited in its ability to describe a particular data set, one should hope to understand whether these limitations arise due to model misspecification or due to irreducible stochasticity in the data. Evidence for the former suggests that better models ought to exist; evidence for the latter suggests they might not.

      To address this important topic, the authors elaborate carefully on the rationale of their proposed approach. They describe a variety of simulations - for which the ground truth models and the amount of behavioral stochasticity are known - to validate their approaches. This enables the reader to understand the benefits (and limitations) of these approaches when applied to the two-stage task, a task paradigm commonly used in the field. Through a set of convincing analyses, the authors demonstrate that their approach is capable of identifying situations where an alternative, untested computational model can outperform the set of tested models, before applying these techniques to a realistic data set.

      WEAKNESSES:<br /> The most significant weakness is that the paper rests on the implicit assumption that the fitted RNNs explain as much variance as possible, an assumption that is likely incorrect and which can result in incorrect conclusions. While in low-dimensional tasks RNNs can predict behavior as well as the data-generating models, this is not *always* the case, and the paper itself illustrates (in Figure 3) several cases where the fitted RNNs fall short of the ground-truth model. In such cases, we cannot conclude that a subject exhibiting a relatively poor RNN fit necessarily has a relatively high degree of behavioral stochasticity. Instead, it is at least conceivable that this subject's behavior is generated precisely (i.e., with low noise) by an alternative model that is poorly fit by an RNN - e.g., a model with long-term sequential dependencies, which RNNs are known to have difficulties in capturing.

      These situations could lead to incorrect conclusions for both of the proposed methods. First, the model misspecification analysis might show equal predictive performance for a particular theoretical model and for the RNN. While a scientist might be inclined to conclude that the theoretical model explains the maximum amount of explainable variance and therefore that no better model should exist, the scenario in the previous paragraph suggests that a superior model might nonetheless exist. Second, in the early-stopping analysis, a particular subject may achieve optimal validation performance with fewer epochs than another, leading the scientist to conclude that this subject exhibits higher behavioral noise. However, as before, this could instead result from the fact that this subject's behavior is produced with little noise by a different model. Admittedly, the existence of such scenarios *in principle* does not mean that such scenarios are common, and the conclusions drawn in the paper are likely appropriate for the particular examples analyzed. However, it is much less obvious that the RNNs will provide optimal fits in other types of tasks, particularly those with more complex rules and long-term sequential dependencies, and in such scenarios, an unwary scientist might draw incorrect conclusions from the application of the proposed approaches.

      In addition to this general limitation, the paper makes a few claims that are not fully supported by the provided evidence. For example, Figure 4 highlights the relationship between the optimal epoch and agent noise. Yet it is possible that the optimal epoch is also influenced by model parameters other than inverse temperature (e.g., learning rate). This could again lead to invalid conclusions, such as concluding that low IQ is associated with the optimal epoch when an alternative account might be that low IQ is associated with a low learning rate, which in turn is associated with the optimal epoch. Additional factors, such as deep double descent (Nakkiran et al., ICLR 2020), can also influence the optimal epoch value as computed by the authors.

      An additional issue is that Figure 4 reports an association between optimal epoch and noise, but noise is normalized by the true minimal/maximal inverse temperature of hybrid agents (Eq. 23). It is thus possible that the relationship does not hold for more extreme values of inverse temperature such as β = 0 (extremely noisy behavior) or β = ∞ (deterministic behavior), two important special cases that should be incorporated in the current study. Finally, even taking the association in Figure 4 at face value, there are potential issues with inferring noise from the optimal epoch when their correlation is only r ≈ 0.7. As shown in the figures, upon finding a very low optimal epoch for a particular subject, one might be compelled to infer high amounts of noise, even though several agents may exhibit a low optimal epoch despite having very little noise.
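      To put a correlation of about 0.7 in perspective, and assuming the relationship is approximately linear, the squared correlation gives the share of the variance in agent noise that the optimal epoch accounts for:

$$
r^2 \approx 0.7^2 = 0.49, \qquad 1 - r^2 \approx 0.51
$$

      In other words, roughly half of the variance in noise would remain unexplained by the optimal epoch, which is why inferring an individual subject's noise level from this quantity alone is risky.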

      APPRAISAL AND DISCUSSION:<br /> Overall, the authors propose a novel method that aims to solve an important problem, but whose generality may be limited to special cases. In the future, it would be beneficial to test the proposed approach in a broader setting, including simulations of different tasks, different model classes, different model parameters, and different amounts of behavioral noise. Nonetheless, even without such additional work, the proposed methods are likely to be used by cognitive scientists and neuroscientists interested in assessing the quality and limits of their behavioral models.

    1. eLife assessment

      This important study evaluates a model for multisensory correlation detection, focusing on the detection of correlated transients in visual and auditory stimuli. Overall, the experimental design is sound and the evidence is compelling. The synergy between the experimental and theoretical aspects of the paper is strong. The work will be of interest to neuroscientists and psychologists working in the domain of sensory processing and perception.

    2. Reviewer #1 (Public Review):

      The authors present a model for multisensory correlation detection that is based on the neurobiologically plausible Hassenstein-Reichardt detector. It modifies their previously reported model (Parise & Ernst, 2016) in two ways: a bandpass (rather than lowpass) filter is initially applied, and the filtered signals are then squared. The study shows that this model can account for synchrony judgments, temporal order judgments, etc., in two new data sets (acquired in this study) and a range of previous data sets.
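      The two modifications described above (bandpass filtering, then squaring) can be illustrated with a minimal pure-Python sketch. The bandpass is approximated as the difference of two first-order lowpass filters, and the time constants (`tau_fast`, `tau_slow`, `tau_av`) and function names are hypothetical choices, not the authors' parameters. The subunit structure follows the general Hassenstein-Reichardt scheme as an assumption: each subunit multiplies one modality with a lowpass-delayed copy of the other, the correlation output is the averaged product of the subunits, and the lag output is their averaged difference.

```python
def lowpass(signal, tau):
    """First-order (exponential) lowpass filter; tau is in samples."""
    out, y = [], 0.0
    for x in signal:
        y += (x - y) / tau
        out.append(y)
    return out


def transient_channel(signal, tau_fast=2.0, tau_slow=8.0):
    """Bandpass (fast minus slow lowpass), then squared: an unsigned transient."""
    fast = lowpass(signal, tau_fast)
    slow = lowpass(signal, tau_slow)
    return [(f - s) ** 2 for f, s in zip(fast, slow)]


def mcd(visual, audio, tau_av=10.0):
    """Return (corr, lag) outputs of a Reichardt-style correlation detector."""
    v = transient_channel(visual)
    a = transient_channel(audio)
    v_delayed = lowpass(v, tau_av)
    a_delayed = lowpass(a, tau_av)
    u1 = [vd * x for vd, x in zip(v_delayed, a)]  # large when vision leads
    u2 = [x * ad for x, ad in zip(v, a_delayed)]  # large when audition leads
    n = len(u1)
    corr = sum(p * q for p, q in zip(u1, u2)) / n
    lag = sum(p - q for p, q in zip(u1, u2)) / n
    return corr, lag


def step(onset, length=300):
    """Hypothetical stimulus: a sustained signal switching on at `onset`."""
    return [1.0 if t >= onset else 0.0 for t in range(length)]


corr_sync, lag_sync = mcd(step(50), step(50))
corr_far, _ = mcd(step(50), step(200))
_, lag_v_first = mcd(step(50), step(55))
_, lag_a_first = mcd(step(55), step(50))
print(lag_sync)                                   # 0.0 (identical subunits)
print(lag_v_first > 0, lag_a_first < 0)           # True True
print(corr_sync > corr_far)                       # True
```

Because the squared bandpass output responds only at signal changes, a sustained step contributes nothing after its onset transient, which is the "transient channel" behaviour the manuscript emphasizes. Squaring also discards the sign of the change, which is relevant to the question below about signed versus unsigned correlation detection.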

      Strengths:<br /> 1. The model goes beyond descriptive models such as cumulative Gaussians for TOJ and differences in cumulative Gaussians for SJ tasks by providing a mechanism that builds on the neurobiologically plausible Hassenstein-Reichardt detector.<br /> 2. This modified model can account for results from two new experiments that focus on the detection of correlated transients and frequency doubling. The model also accounts for several behavioural results from experiments including stochastic sequences of A/V events and sinewave modulations.

      Additional thoughts:<br /> 1. The model introduces two changes: bandpass filtering and squaring of the inputs. The authors emphasize that these changes allow the model to focus selectively on transient rather than sustained channels. But shouldn't the two changes be introduced separately? Transients may also be detected for signed signals.

      2. Because the model is applied only to rather simple artificial signals, it remains unclear to what extent it can account for AV correlation detection for naturalistic signals. In particular, speech appears to rely on correlation detection of signed signals. Can this modified model account for SJ or TOJ judgments for naturalistic signals?

      Even Nidiffer et al. (2018), whose data the authors explicitly model, report a significant difference in performance for correlated and anti-correlated signals. This seems to disagree with the results of study 1 reported in the current paper and with the model's predictions. How can these contradicting results be explained? If the brain performs correlation detection on both signed and unsigned signals, is a more complex mechanism needed to arbitrate between the two?

      3. The number of parameters seems quite comparable for the authors' model and descriptive models (e.g. PSF models). This is because the time constants require refitting (at least for some experimental data sets) and the correlation values need to be passed through a response model (e.g. a probit function) to account for behavioural data. It remains unclear how the brain adjusts the time constants to different sensory signals.

      4. Fujisaki and Nishida (2005, 2006) proposed mechanisms for AV correlation detection based on the Hassenstein-Reichardt motion detector (though not formalized as a computational model).

    3. Reviewer #2 (Public Review):

      Summary:<br /> This is an interesting and well-written manuscript that seeks to detail the performance of two human psychophysical experiments designed to look at the relative contributions of transient and sustained components of a multisensory (i.e., audiovisual) stimulus to their integration. The work is framed within the context of a model previously developed by the authors and is now somewhat revised to better incorporate the experimental findings. The major takeaway from the paper is that transient signals carry the vast majority of the information related to the integration of auditory and visual cues, and that the Multisensory Correlation Detector (MCD) model not only captures the results of the current study but is also highly effective in capturing the results of prior studies focused on temporal and causal judgments.

      Strengths:<br /> Overall the experimental design is sound and the analyses are well performed. The extension of the MCD model to better capture transients makes a great deal of sense in the current context, and it is very nice to see the model applied to a variety of previous studies.

      Weaknesses:<br /> My one major issue with the paper revolves around its significance. In the context of a temporal task(s), is it in any way surprising that the important information is carried by stimulus transients? Stated a bit differently, isn't all of the important information needed to solve the task embedded in the temporal dimension? I think the authors need to better address this issue to punch up the significance of their work.

      In a more minor comment, I think a bit more effort needs to be put into articulating the biological plausibility and potential instantiations of this sustained versus transient dichotomy. As written, the paper suggests that these are different "channels" in sensory systems, when in reality many neurons (and neural circuits) carry both on the same lines.

    1. eLife assessment

      This valuable study advances our understanding of the brain nuclei involved in rapid-eye movement (REM) sleep regulation. Using a combination of imaging, electrophysiology, and optogenetic tools, the study provides convincing evidence that inhibitory neurons in the preoptic area of the hypothalamus influence REM sleep. This work will be of interest to neurobiologists working on sleep and/or brain circuitry.

    2. Reviewer #1 (Public Review):

      Summary:<br /> This paper identifies GABA cells in the preoptic hypothalamus that are involved in REM sleep rebound (the increase in REM sleep) after selective REM sleep deprivation. By calcium photometry, these cells are most active during REM, and show larger calcium signals during REM deprivation, suggesting they respond to "REM pressure". Inhibiting these cells optogenetically diminishes REM sleep. The optogenetic and photometry work is carried out to a high standard, the paper is well-written, and the findings are interesting.

      Points that could be addressed or discussed:<br /> 1. The circuit mechanism for REM rebound is not defined. How do the authors see REM rebound as working from the POAGAD2 cells? Although the POAGAD2 does project to the TMN, the actual REM rebound could be mediated by a projection of these cells elsewhere. This could be discussed.

      2. The "POAGAD2 to TMN" name for these cells is somewhat confusing. The authors chose this name because they approach the POAGAD2 cells via retrograde AAV labelling (rAAV injected into the TMN). However, the name also seems to imply that neurons (perhaps histamine neurons) in the TMN are involved in the REM rebound, but there is no evidence in the paper that this is the case. Although it is nice to see from the photometry studies that the histamine cells are selectively more active (as expected) in NREM sleep (Fig. S2), I could not see how this finding is relevant to REM rebound or to the subject of the paper. There are many other types of cells in the TMN area, not just histamine cells, so are the authors suggesting that these non-histamine cells in the TMN could be involved?

      3. It is a puzzle why most of the neurons in the POA seem to have their highest activity in REM, as also found by Miracca et al. 2022, yet presumably some of these cells are going to be involved in NREM sleep as well. Could the same POAGAD2-TMN cells identified by the authors also be involved in inducing NREM sleep by inhibiting histamine neurons (Chung et al.)? Could some of these POA cells also be involved in NREM sleep homeostasis (e.g. Ma et al., Curr Biol)? Is NREM sleep rebound necessary before REM sleep rebound can occur? Indeed, can these two things (NREM and REM sleep rebound) be separated?

      4. Is it possible to narrow down the POA area where the GAD2 cells are located more precisely?

      5. It would be ideal to further characterize these particular GAD2 cells by RT-PCR or RNA seq. Which other markers do they express?

    3. Reviewer #2 (Public Review):

      Maurer et al. investigated the contribution of GAD2+ neurons in the preoptic area (POA), projecting to the tuberomammillary nucleus (TMN), to REM sleep regulation. They applied an elegant design to monitor and manipulate the activity of this specific group of neurons: a GAD2-Cre mouse, injected with retrograde AAV constructs in the TMN, thereby presumably only targeting GAD2+ cells projecting to the TMN. Using this set-up in combination with technically challenging techniques including EEG with photometry and REM sleep deprivation, the authors found that this cell type becomes active shortly (≈40 sec) before entering REM sleep and remains active during REM sleep. Moreover, optogenetic inhibition of GAD2+ cells reduces REM sleep by a third and also impairs the rebound in REM sleep in the following hour. Despite a few reservations or details that would benefit from further clarification (outlined below), the data make a convincing case for the role of GAD2+ neurons in the POA projecting to the TMN in REM sleep regulation.

      The authors found that optogenetic inhibition of GAD2+ cells suppressed REM sleep in the hour following the inhibition (e.g. Fig2 and Fig4). If the authors have the data available, it would be important to include the subsequent hours in the rebound time (e.g. from ZT8.5 to ZT24) to test whether REM sleep rebound remains impaired, or recovers, albeit with a delay.

      REM sleep is under tight circadian control (e.g. Wurts et al., 2000 in rats; Dijk, Czeisler 1995 in humans). To contextualize the results, it would be important to mention that it is not clear whether the role of the manipulated neurons in REM sleep regulation holds at other circadian times of the day.

      The effect size of the REM sleep deprivation using the vibrating motor method is unclear. In FigS4-D, the experimental mice reduce their REM sleep to 3% whereas the control mice spend 6% in REM sleep. In Fig4, mice are either subjected to REM sleep deprivation with the vibrating motor (controls), or REM sleep deprivations + optogenetics (experimental mice). The control mice (vibrating motor) in Fig4 spend 6% of their time in REM sleep, which is double the amount of REM sleep compared to the mice receiving the same treatment in FigS4-D. Can the authors clarify the origin of this difference in the text?

    1. eLife assessment

      This important study seeks to advance the current understanding of intergenerational olfactory changes associated with odor-induced fear conditioning in mice. Whilst the overall approach employed by the authors is appropriate and the evidence presented in support of claims is solid, there is general agreement that specific points - particularly the lack of effect in the F1 generation - deserve further attention.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The study by Liff et al significantly advances our understanding of transgenerational olfactory changes resulting from fear conditioning, particularly in revealing elevated odor-encoding neurons in both conditioned mice (F0) and their progeny (F1). The authors attribute F0 increases to biased stem cell receptor selection, building upon the seminal work of Dias and Ressler (2014). While the dedication and use of novel histological techniques add strength to the study, there are notable weaknesses, including the need for clarification on discrepancies with previous findings, the decision to modify paradigms, and the presentation of behavioral data in supplementary materials.

      Overall, the manuscript has strong potential but would benefit from addressing these weaknesses and minor recommendations to enhance its quality and contribution to the field.

      Strengths:<br /> - Significant contribution to understanding transgenerational olfactory changes induced by fear conditioning.<br /> - Use of novel histological techniques and exploration of stem cell involvement adds depth to the study.

      Weaknesses:<br /> Discrepancies with previous findings need clarification, especially regarding the absence of similar behavioral effects in F1. Lack of discussion on the decision to modify paradigms instead of using the same model. Presentation of behavioral data in supplementary materials, with a recommendation to include behavioral quantification in main figures. Absence of quantification for freezing behavior, a crucial measure in fear conditioning.

    3. Reviewer #2 (Public Review):

      Summary:<br /> The authors examined inherited changes to the olfactory epithelium produced by odor-shock pairings. The manuscript demonstrates that odor fear-conditioning biases olfactory neurogenesis toward more production of the olfactory sensory neurons engaged by the odor-shock pairing. Further, the manuscript reveals that this bias remains in first-generation male and female progeny produced by trained parents. Surprisingly, there was a disconnect between the morphological changes in the olfactory epithelium for the conditioned odor and the response to odor presentation. The expectation based on previous literature and the morphological results was that F1 progeny would also show an aversion to the odor stimulus. However, the authors found that F1 progeny were not more sensitive to the odor compared to littermate controls.

      Strengths:<br /> The manuscript includes conceptual innovation and some technical innovation. The results validate previous findings that were deemed controversial in the field, which is a major strength of the work. Moreover, these studies were conducted using a combination of genetically modified animals and state-of-the-art imaging techniques, highlighting the rigorous nature of the research. Lastly, the authors provide novel mechanistic details regarding the remodeling of the olfactory epithelium, demonstrating that biased neurogenesis, as opposed to changes in survival rates, accounts for the increase in odorant receptors after training.

      Weaknesses:<br /> The main weakness is the disconnect between the morphological changes reported and the lack of change in aversion to the odorant in F1 progeny. The authors also do not address the mechanisms underlying the inheritance of the phenotype, which may lie outside of the scope of the present study.

    4. Reviewer #3 (Public Review):

      In their paper entitled "Fear conditioning biases olfactory stem cell receptor fate" Liff et al. address the still enigmatic (and quite fascinating) phenomenon of intergenerationally inherited changes in the olfactory system in response to odor-dependent fear conditioning.

      In the abstract / summary, the authors raise expectations that are not supported by the data. For example, it is claimed that "increases in F0 were due to biased stem cell receptor choice." While olfactory receptor gene choice is an active field of study that has seen remarkable progress in the past decade, the timing of this choice in particular is still unresolved. Here, Liff et al. do not pinpoint at what stage during differentiation the "biased choice" is made.

      Similarly, the concluding statement that the study provides "insight into the heritability of acquired phenotypes" is somewhat misleading. The experiments do not address the mechanisms underlying heritability.

      The statement that "the percentage of newborn M71 cells is 4-5 times that of MOR23 may simply reflect differences in the birth rates of the two cell populations" should, if true, result in similar differences in the occurrence of mature OSNs with either receptor identity. According to Fig. 1H & J, however, this is not the case.

      An important result is that Liff et al., in contrast to results from other studies, "do not observe the inheritance of odor-evoked aversion to the conditioned odor in the F1 generation." This discrepancy needs to be discussed.

      The authors speculate that "the increase in neurons responsive to the conditioned odor could enhance the sensitivity to, or the discrimination of, the paired odor in F0 and F1. This would enable the F1 population to learn that odor predicts shock with fewer training cycles or less odorant when trained with the conditioned odor." This is a fascinating idea that, in fact, could have been readily tested by Liff and coworkers. If this hypothesis were found true, this would substantially enhance the impact of the study for the field.

    1. eLife assessment

      This useful study explores how archerfish adapt their shooting behavior to environmental changes, particularly airflow perturbations. It will be of interest to experts interested in mechanisms for motor learning. While the evidence for an internal model for adaptation is solid, evidence for adaptation to light refraction, as initially hypothesized, is inconclusive. As such, the evidence supporting an egocentric representation might reflect mechanisms specific to adapting to airflow perturbations rather than to light refraction.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The authors examined whether archerfish have the capacity for motor adaptation in response to airflow perturbations. Through two experiments, they demonstrated that archerfish could adapt. Moreover, when the fish flipped its body position with the perturbation remaining constant, it did not instantaneously counteract the error. Instead, the archerfish initially persisted in correcting for the original perturbation before eventually adapting, consistent with the notion that the archerfish's internal model has been adapted in egocentric coordinates.

      Evaluation:<br /> The results of both experiments were convincing, given the observable learning curve and the clear aftereffect. The ability of these fish to correct their errors is also remarkable. Nonetheless, certain aspects of the experiment's motivation and conclusions temper my enthusiasm.

      1. The authors motivated their experiments with two hypotheses, asking whether archerfish can adapt to light refractions using an innate look-up table as opposed to possessing a capacity to adapt. However, the present experiments are not designed to arbitrate between these ideas. That is, the current experiments do not rule out the look-up table hypothesis, which predicts, for example, that motor adaptation may not generalize to de novo situations with arbitrary action-outcome associations. Such look-up table operations may also show set-size effects, whereas other mechanisms might not. Whether their capacity to adapt is innate or learned was also not directly tested, as noted by the authors in the discussion. Could the authors clarify how they see their results positioned in light of the two hypotheses noted in the Introduction?

      2. The authors claim that archerfish use egocentric coordinates rather than allocentric coordinates. However, the current experiments do not make clear whether the archerfish are "aware" that their position was flipped (as the authors noted, no visual cues were provided). As such, if the fish were "unaware" of the switch, can the authors still assert that generalization occurs in egocentric coordinates? Or can they only assert that, when archerfish are ostensibly unaware of changes in body position, they continue with previously successful actions?

      3. The experiments offer an opportunity to examine whether archerfish demonstrate any savings from one session to another. Savings are often attributed to a faster look-up table operation. As such, if archerfish do not exhibit savings, it might indicate a scenario where they do not possess a refined look-up table and must rely on implicit mechanisms to relearn each time.

      4. The authors suggest that motor adaptation in response to wind may hint at mechanisms used to adapt to light refraction. However, how strong of a parallel can one draw between adapting to wind versus adapting to light refraction? This seems important given the claims in this paper regarding shared mechanisms between these processes. As a thought experiment, what would the authors predict if they provided a perturbation more akin to light refraction (e.g., a film that distorts light in a new direction, rather than airflow)?

      5. The number of fish excluded was greater than those included. This raises the question as to whether these fish are merely elite specimens or representative of the species in general.

    3. Reviewer #2 (Public Review):

      Summary:<br /> The work of Volotsky et al. presented here shows that adult archerfish are able to adjust their shooting in response to their own visual feedback, taking consistent alterations of their shot, here by an air flow, into account. The evidence provided points to an internal mechanism of shooting adaptation that is independent of external cues, such as wind. The authors provide evidence for this by forcing the fish to shoot from 2 different orientations relative to the external alteration of their shots (the airflow). This paper thus provides behavioral evidence of an internal correction mechanism that underlies adaptive motor control of this behavior. It does not provide direct evidence of shot adjustment associated with the refractive index.

      Strengths:<br /> The authors have used a high number of trials and strong statistical analysis to analyze their behavioral data.

      Weaknesses:<br /> While the introduction, the title, and the discussion refer to the refractive index, the latter was not altered, and neither was the position of the target. The "shot" was altered; this is a simple motor adaptation task and not a question related to the refractive index. The title, abstract, and introduction are thus misleading. The authors appear to deduce from their data that the wind is not taken into account and thus conclude that the fish perceive a different refractive index. This might be based on the assumption that fish always hit their target, which is not the case. The airflow does not alter the position of the target, so the airflow does not alter the refractive index. The fish likely does not perceive the airflow, so the alteration of its shots is likely attributed to an "internal problem" of shooting. I am sorry, but I am not able to understand the conclusion they draw from their data.

    1. eLife assessment

      This valuable study uses a novel experimental design to elegantly demonstrate how we exploit stimulus structure to overcome working memory capacity limits. While the behavioural evidence is convincing, the neural evidence is incomplete, as it only provides partial support for the proposed information compression mechanism. This study will be of interest to cognitive neuroscientists studying structure learning and memory.

    2. Reviewer #1 (Public Review):

      Summary:<br /> Huang and Luo investigated whether regularities between stimulus features can be exploited to facilitate the encoding of each set of stimuli in visual working memory, improving performance. They recorded both behavioural and neural (EEG) data from human participants during a sequential delayed response task involving three items with two properties: location and colour. In the key condition ('aligned trajectory'), the distance between locations of successively presented stimuli was identical to their 'distance' in colour space, permitting a compression strategy of encoding only the location and colour of the first stimulus and the relative distance of the second and third stimulus (as opposed to remembering 3 locations and 3 colours, this would only require remembering 1 location, 1 colour, and 2 distances). Participants recalled the location and colour of each item after a delay.
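      The compression strategy described above can be sketched in a few lines of Python. This is a toy illustration, assuming that both location and colour are angles on 360° rings and that "distance" means the signed angular step between successive items; the function names and values are hypothetical and do not come from the study.

```python
def compress(locations, colours):
    """Encode a sequence as (first location, first colour, steps) if aligned.

    Returns None when successive location steps differ from colour steps,
    i.e. a 'misaligned' trajectory that cannot share one code.
    """
    loc_steps = [(b - a) % 360 for a, b in zip(locations, locations[1:])]
    col_steps = [(b - a) % 360 for a, b in zip(colours, colours[1:])]
    if loc_steps != col_steps:
        return None
    return locations[0], colours[0], loc_steps


def decompress(code):
    """Rebuild every location and colour from the compressed code."""
    loc0, col0, steps = code
    locations, colours = [loc0], [col0]
    for step in steps:
        locations.append((locations[-1] + step) % 360)
        colours.append((colours[-1] + step) % 360)
    return locations, colours


aligned = compress([30, 90, 150], [200, 260, 320])
print(aligned)              # (30, 200, [60, 60]): 4 values instead of 6
print(decompress(aligned))  # ([30, 90, 150], [200, 260, 320])
print(compress([30, 90, 150], [200, 300, 320]))  # None (misaligned)
```

In the aligned case the three items are fully recoverable from four stored values (one location, one colour, two distances) rather than six, which is the saving the compression account predicts; the misaligned case offers no such shared code.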

      Consistent with the compression account, participants' location and colour recall errors were correlated and were overall lower compared to a non-compressible condition ('misaligned trajectory'). Multivariate analysis of the neural data permitted decoding of the locations and colours during encoding. Crucially, the relative distance could also be decoded - a necessary ingredient for the compression strategy.

      Strengths:<br /> The main strength of this study is a novel experimental design that elegantly demonstrates how we exploit stimulus structure to overcome working memory capacity limits. The behavioural results are robust and support the main hypothesis of compressed encoding across a number of analyses. The simple and well-controlled design is suited to neuroimaging studies and paves the way for investigating the neural basis of how environmental structure is detected and represented in memory. Prior studies on this topic have primarily studied behaviour only (e.g., Brady & Tenenbaum, 2013).

      Weaknesses:<br /> The main weakness of the study is that the EEG results do not make a clear case for compression or demonstrate its neural basis. If the main aim of this strategy is to improve memory maintenance, it seems that it should be employed during the encoding phase. From then on, the neural representation in memory should be in the compressed format. The only positive evidence for this occurs in the late encoding phase (the re-activation of decoding of the distance between items 1 and 2, Fig. 5A), but the link to behaviour seems fairly weak (p=0.068). Stronger evidence would be showing decoding of the compressed code during memory maintenance or recall, but this is not presented. On the contrary, during location recall (after the majority of memory maintenance is already over), colour decoding re-emerges, but in the un-compressed item-by-item code (Fig. 4B). The authors suggest that compression is consolidated at this point, but its utility at this late stage is not obvious.

      Impact:<br /> This important study elegantly demonstrates that the use of shared structure can improve capacity-limited visual working memory. The paradigm and approach explicitly link this field to recent findings on the role of replay in structure learning and will therefore be of interest to neuroscientists studying both topics.

    3. Reviewer #2 (Public Review):

      Summary:<br /> In this study, the authors wanted to test if using a shared relational structure by a sequence of colors in locations can be leveraged to reorganize and compress information.

      Strength:<br /> They applied machine learning to EEG data to decode the neural mechanism of reinstatement of visual stimuli at recall. They were able to show that when the location of colors is congruent with the semantically expected location (for example, green is closer to blue-green than purple) the related color information is reinstated at the probed location. This reinstatement was not present when the location and color were not semantically congruent (meaning that x displacement in color ring location did not displace colors in the color space to the same extent) and semantic knowledge of color relationship could not be used for reducing the working memory load or to benefit encoding and retrieval in short term memory.

      Weakness:<br /> The experiment and results did not address the reorganization of information or the neural mechanisms of working memory maintenance (i.e., during the gap between encoding and retrieval). There was also no evidence ruling out that the current observations can be explained by schematic abstraction rather than by the use of a cognitive map.<br /> The likely impact of the initial submission lies in the utility of its methods for studying a sequence of stimuli at recall. The paper was discussed in a narrow and focused context, referring to a limited set of studies on cognitive maps and replay. The bigger picture, namely the long history of research on the encoding and retrieval of schema-congruent and schema-incongruent events, is not discussed.

    1. eLife assessment

      This is an important theoretical study providing insight into how fluctuations in excitability can contribute to gradual changes in the mapping between population activity and stimulus, commonly referred to as representational drift. The authors provide convincing evidence that fluctuations can contribute to drift. Overall, this is a well-presented study that explores the question of how changes in intrinsic excitability can influence distinct memory representations.

    1. eLife assessment

      This study provides important findings based on convincing evidence demonstrating that females and males have different strategies to regulate energy consumption in the brain in the context of low energy intake. While food deprivation reduces energy consumption and visual processing performance in the visual cortex of males, the female cortex is unaffected, likely at the expense of other functions. This study is relevant for scientists interested in body metabolism and neuroscience.

    2. Reviewer #1 (Public Review):

      Padamsey et al. followed up on their previous study in which they found that male mice sacrifice visual cortex computation precision to save energy in periods of food restriction (Padamsey et al. 2021, Neuron). In the present study, the authors find that female mice show much lower levels of adaptation in response to food restriction at the level of metabolic signaling and visual cortex computation. This is an important finding for understanding sex differences in adaptation to food scarcity and also impacts the interpretation of studies employing food restriction in behavioral analyses and learning paradigms.

      The manuscript is, in general, very clear and the conclusions are straightforward. The main limitation, that the number of experiments is insufficient to compare the effects of food restriction in males and females directly, is discussed by the authors: to address this point, they use Bayes factor analysis to estimate the strength of evidence that females and males indeed differ in terms of energy metabolism and sensory processing adaptations during food restriction.

      The following points are not entirely clear yet.<br /> 1. For a number of experiments the authors use their new data set on females and compare it with the data set previously published on males. To what extent are these data sets comparable? Were the experiments originally performed in parallel, for example using siblings of different sexes, or were they conducted several years apart? What variability would be expected if these experiments were repeated in the same sex, considering differences and similarities between experimental setups, housing conditions, interindividual differences, etc.?

      2. Energy consumption and visual processing may differ between periods in which animals are in different behavioral states. Is there a possibility that male and female mice differed in behavioral state during measurements? Were animals running or resting during visual stimulation and during ATP measurements?

      3. Related to the previous point: the authors show that ATP consumption was reduced in male mice during visual stimulation. What about visual cortex ATP consumption in the absence of visual stimulation? Do food-deprived males and/or females show lower ATP consumption in the visual cortex e.g. during sleep?

    3. Reviewer #2 (Public Review):

      Summary:<br /> Padamsey et al. build on previous significant work from the same group which demonstrated robust changes in the visual cortex of male mice following long-term (2-3 weeks) food restriction. Here, the authors extend this finding and reveal striking sex-specific differences in the way the brain responds to food restriction. The measures included the whole-body measure of serum leptin levels, and V1-specific measures of the activity of key molecular players (AMPK and PPARα), gene expression patterns, ATP usage in V1, and the sharpness of visual stimulus encoding (orientation tuning). All measures supported the conclusion that the female mouse brain (unlike the male brain) does not change its energy usage and cortical functional properties under comparable food restriction.

      While the effect of food restriction on more peripheral tissues such as muscle and bone has been well studied, this result contributes to our understanding of how the brain responds to food restriction. This result is particularly significant given that the brain accounts for a large fraction (20%) of the body's energy consumption, with the cortex accounting for half of that amount. The sex-specific differences found here are also relevant for studies using food restriction to investigate cortical function.

      Strengths:<br /> The study uses a wide range of approaches mentioned above which converge on the same conclusion, strengthening the core claim of the study.

      Weaknesses:<br /> Since the absence of a significant effect does not prove the absence of any changes, the study cannot claim that the female mouse brain does not change in response to food restriction. However, the authors do not make this claim. Instead, they make the well-supported claim that there is a sex-specific difference in the response of V1 to food restriction.

    4. Reviewer #3 (Public Review):

      Summary:<br /> The authors food-deprived male and female mice and observed a much stronger reduction of leptin levels, energy consumption in the visual cortex, and visual coding performance in males than females. This indicates a sex-specific strategy for the regulation of the energy budget in the face of low food availability.

      Strengths:<br /> This study extends a previous study demonstrating the effect of food deprivation on visual processing in males by providing a set of clear experimental results that demonstrate the sex-specific difference. It also proposes, based on the literature, hypotheses about the strategy used by females to reduce their energy budget.

      Weaknesses:<br /> The authors do not provide evidence that visually guided behaviors are unaffected in food-restricted females, in contrast to what was shown in males in the previous study.

    1. eLife assessment

      Zhang et al. deliver an important transcriptomic atlas of the human spinal cord, combining single-cell and spatial transcriptomics to unveil molecular insights. While convincingly overcoming Visium limitations using snRNA-seq, the manuscript is criticized for its largely observational approach and lack of quantitative analysis, especially in supporting claims about sex differences in motor neurons and DRG-spinal cord neuronal interactions.

    2. Reviewer #1 (Public Review):

      Summary:<br /> Zhang et al. provide valuable data for understanding molecular features of the human spinal cord. The authors made considerable efforts to acknowledge and objectively address the limitations of Visium while attempting to overcome them by utilizing single-nucleus RNA sequencing (snRNA-seq) from the same tissue. By mapping snRNA-seq clusters to Visium data, they offer spatial information, complemented by RNA-ISH and immunofluorescence (IF) validation. They also discuss gender-related differences and the similarities between human and mouse data, aiming to establish a crucial foundation for experimental research. However, I have some comments below.

      1. The observation of gender-related differences is interesting. The authors reported that SCN10A, associated with nociceptors, exhibited stronger expression in females. While they intend to validate this finding through IF, the quantitative difference is not clearly observed in the IF data (Figure 5f). It would be essential to provide validation through DAPI-based cell counts demonstrating the difference in CHAT/SCN10A co-expression.

      2. It is commendable that, as novel features of this transcriptomic study, the authors considered gender-related differences and similarities between humans and mice. Nevertheless, despite the extensive bioinformatics-based analyses performed, the results mostly confirm what has been previously reported (Nguyen et al. 2021; Yadav et al. 2023; Jung et al. 2023).

      3. The study did not perform snRNA-seq in the DRG. The limitations of Visium in cell type separation are acknowledged, and the authors are aware that Visium alone has limitations in describing cell expression patterns. The authors need to validate their findings via analyses of public DRG snRNA-seq data (Jung et al. 2023, Nat Commun; Nguyen et al. 2021, eLife) before drawing broad conclusions.

      4. Figure 7's comparison between human Visium spot data and Renthal et al.'s mouse snRNA-seq may have limitations as Visium spot data could not provide a transcriptional profile at the single cell resolution. The authors need to clarify this point.

      5. Recent findings indicate that type 2 cytokines can directly stimulate sensory neurons. This includes the expression of IL-4RA, IL31RA, and IL13RA in DRG. These findings support the role of JAK kinase inhibitors in mediating chronic itch. Demonstrating the expression of these itch receptors in DRG would be valuable.

      6. Given that juxtacrine and paracrine signals operate over distances of 0 to 200 µm, spatial information is vital to understanding intercellular communication. The presentation of spatial information using Visium is meaningful, and more comprehensive analyses of potential interactions based on distance should be provided, beyond the top 10 interactions (Figure 8).

      7. The gender-related differences are interesting and, if possible, it would be interesting to explore whether age-related differences or degeneration-related factors exist. Using public data could allow the examination of age-related changes.

    3. Reviewer #2 (Public Review):

      Summary:<br /> In this paper, the authors generated a comprehensive dataset of human spinal cord transcriptome using single-cell RNA sequencing and the Visium spatial transcriptomics platform. They employed Visium data to determine the spatial orientation of each cell type. Using single-cell RNA sequencing data, they identified differentially expressed genes by comparing human and mouse samples, as well as male and female samples.

      Strengths:<br /> This study offers a thorough exploration of both cellular and spatial heterogeneity within the human spinal cord. The resulting atlas datasets and analysis findings represent valuable resources for the neuroscience community.

      Weaknesses:<br /> The spatial transcriptomics data were analyzed as if they were single-cell RNA-seq data. However, there are established tools for effectively integrating these two types of data. The incorporation of deconvolution methods could enhance the characterization of each spot's cell type composition.

    4. Reviewer #3 (Public Review):

      Summary:<br /> Zhang et al sought to use spatial transcriptomics and single-nucleus RNA sequencing to classify human spinal cord neurons. The authors reported 17 clusters on 10x Visium slides (6 donors) and 21 clusters by single-nucleus sequencing (9 donors). The authors tried to compare the results to those reported in mice and claimed similar patterns with some differing genes.

      Strengths:<br /> The manuscript provides a valuable database for the molecular and cellular organization of adult human spinal cords in addition to published datasets (Andersen, et al. 2023; Yadav, et al. 2023).

      Weaknesses:<br /> The results are largely observational and lack quantitative analysis. Moreover, the assertions regarding the sex differences in motor neurons and the potential interactions between DRG and spinal cord neuronal subclusters appear preliminary and necessitate more rigorous validation.

    1. eLife assessment

      This study presents findings on the structure and dynamics of the Type I ABC importer and bacterial osmolarity regulator OpuA, addressing the question of whether the substrate-binding domains physically interact in a salt-dependent manner. Based on a collective assessment of the single-molecule fluorescence resonance energy transfer and cryogenic electron microscopy data, the researchers convincingly conclude that the substrate-binding domains directly interact. These findings are valuable and it will be interesting to see if future studies can provide further evidence of this direct interaction and define it in further detail.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The type I ABC importer OpuA from Lactococcus lactis is the best-studied transporter involved in osmoprotection. In contrast to most ABC import systems, the substrate binding protein is fused via a short linker to the transmembrane domain of the transporter. Consequently, this moiety is called the substrate binding domain (SBD). OpuA has been studied in the past in great detail and we have a very detailed knowledge about function, mechanisms of activation and deactivation as well as structure.

      Strengths:<br /> Application of smFRET to unravel transient interactions of the SBDs. The method is applied at a superb quality and the data evaluation is excellent.

      Weaknesses:<br /> The proposed model is not directly supported by experimental data. Rather, all alternative models are excluded because they do not fit the obtained data.

    3. Reviewer #2 (Public Review):

      Summary:<br /> In this report, the authors used solution-based single-molecule FRET and low-resolution cryo-EM to investigate the interactions between the substrate-binding domains of the ABC-importer OpuA from Lactococcus lactis. Based on their results, the authors suggest that the SBDs interact in an ionic strength-dependent manner.

      Strengths:<br /> The strength of this manuscript is the uniqueness and importance of the scientific question, the adequacy of the experimental system (OpuA), and the combination of two very powerful and demanding experimental approaches.

      Weaknesses:<br /> A demonstration that the SBDs physically interact with one another and that this interaction is important for the transport mechanism will greatly strengthen the claims of the authors. The relation to cooperativity is also unclear.

    1. Author Response

      Reviewer #3 (Public Review):

      [...] Weaknesses:

      The study produces a large amount of data that is in general cohesive and supports the main conclusions, but more thorough consideration of some of the findings may be helpful, as exemplified by the following:

      1) The effect of microglial ablation on chloral hydrate-induced RORR in Fig. 1B appears to differ from that of the other anesthetics. What does this mean?

      2) Macrophage ablation impedes anesthesia emergence from pentobarbital (Fig. 3C). How might this occur?

      3) Examination of the potential effect of microglial depletion on dendritic spine density is interesting, but the experimental design does not seem to align well with the PPR and eEPSC data, which indicate a reduction in presynaptic release (Fig. 10E) and an increase in postsynaptic function (Fig. 10H), respectively. The PPR data appear to suggest a presynaptic effect of microglial ablation.

      The reviewer may have confused the brain regions used for our spine quantification (Figure 11) and for our patch-clamp recordings (Figure 10). All spine quantifications were conducted in the mPFC, whereas the patch-clamp recordings were performed in the SON (Figure 10B-F) and LC (Figure 10G-K), which are different brain regions. Because one of our conclusions is that microglia differentially modulate neuronal network activity in a brain region-specific manner, neurons in different brain regions may exhibit different electrophysiological alterations upon microglial depletion. Therefore, this comment may rest on a factual misunderstanding.

    2. eLife assessment

      This study presents a valuable finding on the mechanisms underlying general anesthesia, with a focus on microglial regulation. The evidence supporting the claims of the authors is solid, although some of the novelty of these findings may be reduced based on the recent publication of a similar study. The work will be of interest to medical biologists working on mechanisms of anesthesia, microglia, and neuron-microglia interaction.

    3. Reviewer #1 (Public Review):

      Summary:

      This study by He, Liu, and He et al. investigated the fundamental role of microglia in modulating general anesthesia. While microglia have been previously shown to regulate neuronal network activity, their role in the induction of (i.e., LORR) and emergence from (i.e., RORR) anesthesia has only recently been explored. Recently published work by Cao et al. reported that microglia modulate general anesthesia via P2Y12 receptor. The present study largely reproduces those findings and does so using an impressive array of techniques and clever approaches. Following the serendipitous discovery that microglia-depleted mice exhibit increased LORR and decreased RORR, the authors go on to demonstrate that microglia regulate neuronal activity in a region-specific manner during anesthesia via purinergic receptor-mediated calcium signaling. The manuscript is well written and the data are convincing, elegantly validated using several different methods and controls, and largely complete. Nevertheless, this Reviewer has a few minor comments and suggestions to further strengthen the manuscript.

      Strengths:

      Impressive number of genetic mouse models, techniques, controls, and methods of validation.

      Weaknesses:

      Some of the novelty of these findings may be reduced based on the recent publication of a similar study.

    4. Reviewer #2 (Public Review):

      In this manuscript, He et al. have found that delayed anesthesia induction and early anesthesia emergence were observed in microglia-depleted mice. They also showed that neuronal activities were differentially regulated by microglia depletion, possibly via suppressing the neuronal network of anesthesia-activated brain regions and activating emergence-activated brain regions. Mechanistically, this influence was found to be dependent on the activation of microglial P2Y12 receptors and subsequent calcium influx. These findings contribute to a better understanding of the role microglia play in regulating anesthesia and shed light on the underlying mechanisms involved. Nonetheless, there are still some aspects that require further investigation and clarification.

      1. In Figure 3A the authors used IBA1 to label microglia, and the corresponding description is 'brain microglia were not influenced'. However, IBA1 is not a specific marker for brain-resident microglia. It is recommended to use other markers, such as TMEM119 and P2RY12, to better examine the efficiency of microglial depletion.<br /> 2. In Figures 7, 8 and 9 the authors state that they aim to investigate the impact microglia exert on neuronal activity. However, c-Fos alone is not sufficient to identify neurons. The authors should combine c-Fos with specific neuronal markers to better validate their conclusions.<br /> 3. In Figure 11 the authors use C1qa-/- knockout mice and draw the conclusion 'microglia mediated anesthesia modulation does not result from spine pruning'. However, as C1q comprises multiple subunits, I have some reservations about whether this conclusion is entirely warranted based solely on the knockout of a single C1q subunit.<br /> 4. In Figure 14E the authors show that expression levels of Stim1 are significantly down-regulated in CX3CR1CreER::STIM1fl/fl mouse brains. While this is not incorrect, I would suggest that the authors sort microglia with FACS or MACS to perform q-RT-PCR and examine Stim1 expression levels, since the Cre-LoxP system here is microglia-specific.<br /> 5. The flow of the manuscript could be improved. For instance, the results on repopulated microglia in Figure 1B are described only after Figures 2 and 3, which makes the manuscript somewhat confusing. Additionally, in Figure 14, it would be beneficial to provide a more comprehensive introduction to molecules such as hM3Dq and Stim1 to improve the clarity and readability of the result descriptions.

    5. Reviewer #3 (Public Review):

      Summary:<br /> This work aims to understand the contribution of microglia to anesthesia induced by general anesthetics. The authors report that ablation of microglia shortens anesthesia, manifested by delayed anesthesia induction and early anesthesia emergence. They show that microglial depletion suppresses activity in the neuronal network of anesthesia-activated brain regions but enhances activity in emergence-activated brain regions. Based on these findings, the authors suggest that microglia facilitate and stabilize the anesthetic state. To elucidate the underlying mechanism, they further tested the potential contribution of microglia-mediated dendritic spine plasticity and microglial P2Y12-Ca2+ signaling, and identified the latter as a critical pathway through which microglia regulate anesthesia.

      Strengths:<br /> A major strength of this study is the systematic experimental design, which includes multiple anesthetics and complementary approaches, leading to very compelling data. As a result, a significant contribution of microglia to instating and maintaining the state of anesthesia is convincingly established. In addition, the results also shed light on the potential underlying microglial mechanisms. The findings are of relevance to both medical practice and the basic understanding of microglial biology and neuron-glia interactions.

      Weaknesses:<br /> The study produces a large amount of data that is in general cohesive and supports the main conclusions, but more thorough consideration of some of the findings may be helpful, as exemplified by the following:

      1) The effect of microglial ablation on chloral hydrate-induced RORR in Fig. 1B appears to differ from that of the other anesthetics. What does this mean?

      2) Macrophage ablation impedes anesthesia emergence from pentobarbital (Fig. 3C). How might this occur?

      3) Examination of the potential effect of microglial depletion on dendritic spine density is interesting, but the experimental design does not seem to align well with the PPR and eEPSC data, which indicate a reduction in presynaptic release (Fig. 10E) and an increase in postsynaptic function (Fig. 10H), respectively. The PPR data appear to suggest a presynaptic effect of microglial ablation.

    1. eLife assessment

      The authors propose that the asymmetric segregation of the NuRD complex in C. elegans is regulated in a V-ATPase-dependent manner, that this plays a crucial role in determining the differential expression of the apoptosis activator egl-1 and that it is therefore critical for the life/death fate decision in this species. The proposed model is interesting and the work could be important if proven correct. However, the current evidence is inadequate to support the major claims.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is an interesting, timely and informative article. The authors used publicly available data (made available by a funding agency) to examine some of the academic characteristics of individual recipients of the National Institutes of Health (NIH) K99/R00 award program over the entire history of this funding mechanism (17 years, total ~4 billion US dollars (annual investment of ~230 million USD)). The analysis focuses on the pedigree and the NIH funding portfolio of the institutions hosting the K99 awardees as postdoctoral researchers and of the institutions hiring these individuals. The authors also analyze the data by gender, by whether the R00 portion of the awards eventually gets activated, and by whether the awardees stayed/were hired as faculty at their K99 (postdoctoral) host institution or moved elsewhere. The authors further sought to examine the rates of funding for those in systematically marginalized groups by analyzing the patterns of receiving K99 awards and hiring K99 awardees at historically black colleges and universities.

      The goals and analysis are reasonable and the limitations of the data are described adequately. It is worth noting that some of the observed funding and hiring traits are in line with the Matthew effect in science (https://www.science.org/doi/10.1126/science.159.3810.56) and in science funding (https://www.pnas.org/doi/10.1073/pnas.1719557115). Overall, the article is a valuable addition to the research culture literature examining the academic funding and hiring traits in the United States. The findings can provide further insights for the leadership at funding and hiring institutions and science policy makers for individual and large-scale improvements that can benefit the scientific community.

      Thank you for these comments. We have incorporated the articles referenced on the Matthew effect into the first paragraph of the Discussion of our revised preprint.

      Reviewer #2 (Public Review):

      Early career funding success has an immense impact on later funding success and faculty persistence, as evidenced by well-documented "rich-get-richer" or "Matthew effect" phenomena in science (e.g., Bol et al. 2018, PNAS). Woitowich et al. examined publicly available data on the distribution of the National Institutes of Health's K99/R00 awards - an early career postdoc-to-faculty transition funding mechanism - and showed that although 85% of K99 awardees successfully transitioned into faculty, disparities in subsequent R01 grant obtainment emerged along three characteristics: researcher mobility, gender, and institution. Men who moved to a top-25 NIH-funded institution in their postdoc-to-faculty transition experienced the shortest median time to receiving an R01 award, 4.6 years, in contrast to the median 7.4 years for women working at less well-funded schools who remained at their postdoc institutions. This result is consistent with prior evidence of funding disparities by gender and institution type. The finding that researcher mobility has the largest effect on subsequent funding success is key and novel, and enhances previous work showing the relationship between mobility and one's access to resources, collaborators, or research objects (e.g., Sugimoto and Larivière, 2023, Equity for Women in Science (Harvard University Press)).

      These results empirically demonstrate that even after receiving a prestigious early career grant, researchers with less mobility belonging to disadvantaged groups at less-resourced institutions continue to experience barriers that delay them from receiving their next major grant. This result has important policy implications aimed at reducing funding disparities - mainly that interventions that focus solely on early career or early stage investigator funding alone will not achieve the desired outcome of improving faculty diversity.

      The authors also highlight two incredible facts: No postdoc at a historically Black college or university (HBCU) has been awarded a K99 since the program's launch. And out of all 2,847 R00 awards given thus far, only two have been made to faculty at HBCUs. Given the track record of HBCUs for improving diversity in STEM contexts, this distribution of awards is a massive oversight that demands attention.

      Through no fault of the authors, the analysis is limited to K99 awardees and does not include those who applied but did not receive the award. This limitation is solely due to the lack of data made publicly available by the NIH. If these data were available, this study would have been able to compare the trajectories of winners versus losers and could therefore potentially quantify the impact of the award itself on later funding success, much like the landmark Bol et al. (2018) paper that followed the careers of winners of an early career grant scheme in the Netherlands. Such an analysis would also provide new insights that would inform policy.

      Although data on applications versus awards for the K99/R00 mechanism are limited, there exists data for applicant race and ethnicity for the 2007-2017 period, which were made available by a Freedom of Information Act request through the now defunct Rescuing Biomedical Research Initiative: https://web.archive.org/web/20180723171128/http://rescuingbiomedicalresearch.org/blog/examining-distribution-k99r00-awards-race/. These results are not presently discussed in the paper, but are highly relevant given the discussion of K99 award impacts on the sociodemographic composition of U.S. biomedical faculty. From 2007 to 2017, the K99 award rate for white applicants was 31.0% compared to 26.7% for Asian applicants and 16.2% for Black applicants. In terms of award totals, these funding rates amount to 1,384 awards to white applicants, 610 to Asian applicants, and 25 to Black applicants for the entire 2007-2017 period. And in terms of R00 awards, or successful faculty transitions: whereas 77.0% of white K99 awardees received an R00 award, the conversion rate for Asian and Black K99 awardees was lower, at 76.1% and 60.0%, respectively. Regarding this K99-to-R00 transition rate, Woitowich et al. found no difference by gender (Table 2). These results are consistent with a growing body of literature that shows that while there have been improvements to equity in funding outcomes by gender, similar improvements for achieving racial equity are lagging.

      The conclusions are well-supported by the data, and limitations of the data and the name-gender matching algorithm are described satisfactorily.

      One aspect that the authors should expand or comment on is the change in the rate of K99 to R00 conversions. Since 2016, while the absolute number of K99 and R00 awards has been increasing, the percentage of R00 conversions appears to be decreasing, especially in 2020 and 2021. This observation is not clearly stated or shown in Figure 1 but is an important point - if the effectiveness of the K99/R00 mechanism for postdoc-to-faculty transitions has been decreasing lately, then something is undermining the purpose of this mechanism. This result bears emphasis and potentially discussion for possible reasons for why this is happening.

      Thank you for these insightful comments. We now calculate a rolling conversion rate for K99 to R00 awards, which shows that the decline in conversion from K99 to R00 is smaller than it first appeared (Fig 1B). We still see a slight decline in 2021 and 2022. However, 468 K99 awards are from 2020 or later, so they may still convert to the R00 phase. Thus it is difficult to draw conclusions about 2021/2022 yet. As more time passes, we may better be able to determine whether a significant departure from the usual conversion rate occurred in these years, presumably due to pressures from the Covid-19 pandemic. We also thank you for providing the details of the FOIA request. We have included a discussion of these data in the Discussion.

      Reviewer #3 (Public Review):

      The researchers aim to add to the literature on faculty career pathways, with particular attention to how gender disparities persist in the career and funding opportunities of researchers. The researchers also examine aspects of institutional prestige that can further amplify funding and career disparities. While some factors about individuals' pathways to faculty lines are known, including the prospects of certain K award recipients, the current study provides the only known examination of K99/R00 awardees and their pathways.

      Strengths:

      The authors establish a clear overview of the institutional locations of K99 and R00 awardees, the pathways for K99-to-R00 researchers, and the gendered and institutional patterns of such pathways. For example, there is a clear institutional hierarchy of hiring for K99/R00 researchers that echoes previous research on the rigid faculty hiring networks across fields, and a pivotal difference in the time between awards that can impact faculty careers. Moreover, there are regional clusters of hiring in parts of the US where multiple research universities are located. In addition, documenting the pathways of HBCU faculty is an important extension of the Wapman et al. study (among others from that research group) and provides a more nuanced look at the pathways of faculty beyond the oft-discussed high-status institutions. (However, this segment of the analyses needs more refinement, as discussed further below.) Also, the authors provide important caveats throughout the manuscript about the study's findings, which show careful attention to the complexity of these patterns and an effort to limit readers' misinterpretations.

      Weaknesses:

      The authors reference institutional prestige in relation to some of the findings, but no specific measure of institutional prestige is included in the analyses. If being identified as a top-25 NIH-funded institution is the proximate measure of prestige in the study, then more justification of how it relates to previous studies' measures of institutional prestige and status is needed to clarify the interpretations offered in the manuscript.

      The identification of institutional funding disparities affecting HBCUs is an important finding and highlights another way in which faculty at these institutions are under-resourced and arguably undervalued in their research contributions. However, a lingering question remains: why compare HBCUs with Harvard? What are the theoretical and/or methodological justifications for such a comparison? This comparison lends itself to reifying the status hierarchy of institutions that perpetuates the funding and career inequalities at the heart of the current manuscript. If all HBCU faculty are aggregated, then a comparable grouping is needed for comparison, not just one institution. Perhaps looking at the top 25 NIH-funded institutions could provide a clearer comparison. Related to this point is the confusing inclusion of Gallaudet in Figure 6, as it is not an officially designated HBCU. Was this institution also included in the HBCU-related calculations?

      Thank you for this comment. We agree that this comparison perpetuates the perception of the prestige hierarchy and is problematic. We now compare all institutions in the top 25 NIH funding category to all HBCUs. Thank you also for identifying our error in miscoding Gallaudet as an HBCU. We have corrected this in the current version.

      The current iteration of the manuscript misses a clear connection to the work of Robert Merton and others on cumulative advantage in science and the "Matthew effect." While aspects of this connection are noted in the manuscript, such as well-resourced institutions (those with the most NIH funding in this case) hiring each other's K99/R00 awardees, elaborating on these connections is important for readers to understand the central processes by which a rigid hierarchy of funding and career opportunities forms around these pathways. The work of Daniel Larremore, Aaron Clauset, and their colleagues, on which the authors build, has also incorporated these theoretical connections from the sociology of knowledge and science; doing the same here would provide a more interdisciplinary lens and further depth to understanding the faculty career inequalities documented in the current study.

      Reviewer #1 (Recommendations For The Authors):

      Comments to authors:

      1. For the benefit of the general reader, it would be informative to mention in the text the annual NIH investment in the K99 funding mechanism (230 awards, representing a ~230 million US dollar investment).

      Thank you for this suggestion. We have added that this is ~$25 million investment annually.

      1. It is worth noting that some of the observed funding and hiring traits resemble the Matthew effect, discussed in: The Matthew effect in science: https://www.science.org/doi/10.1126/science.159.3810.56

      The Matthew effect in science funding: https://www.pnas.org/doi/10.1073/pnas.1719557115

      It would be of value to cite these for further context for the readers.

      Thank you for this suggestion. We have included these references and briefly discussed the Matthew effect in the first paragraph of the Discussion.

      1. Figs 3, 6, and S1 are hard to read without zooming in because of their format, and they do not work well on a letter-size page, although they could work if linked to a zoomable web version. An online navigable/searchable/selectable version would make sense. When the reader zooms out, however, patterns emerge that reflect the points the authors are making (though those could be illustrated differently). These figures are really suited to online web-app visualization (such as Shiny in R).

      We agree with this comment and have used the “googleVis” package in R to put together interactive Sankey diagrams. These can be found at https://dantyrr.github.io/K99-R00-analysis/ and are referenced in the manuscript.

      1. The abstract states that 85% of awardees receive R00 awards. That figure appears to come from 198/234 (page 6), though this is not explicitly stated, and other ratios give different answers (e.g., 1 - 304/3475 = 91%), but 85% seems to be the right one. The first paragraph of the Results could be clearer. Also, in the middle of page 3 the number given is 90%, so something is inconsistent. For Figure 1A, given the methodology, it should be possible to calculate a rolling conversion rate as "R00(t) / K99(t-1)" (and a similarly calculated cumulative rate).

      Thank you for catching these errors. They arose because some R00 awardees did not have extramural K99 awards; these awardees held intramural NIH K99 awards, for which no public data are available. The correct figure is that 78% of K99 awardees transitioned to the R00 phase. We have also calculated the rolling conversion rate, which is 89% if the first 2 years of the program (when the first awardees were still within the 2-yr K99 period) and the final 2 years (when the most recent K99 awardees were still within their first 2 years of the K99 period) are excluded.
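      The rolling rate the reviewer proposes, R00(t) / K99(t-1), and its cumulative counterpart can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and yearly counts are hypothetical placeholders rather than NIH RePORTER data, and the one-year lag simply follows the reviewer's formula (it is a parameter because the K99 phase can last up to two years).

```python
# Minimal sketch of the reviewer's rolling conversion rate, R00(t) / K99(t - lag),
# plus a cumulative variant. Function names and counts are hypothetical.

def rolling_conversion(k99_by_year, r00_by_year, lag=1):
    """Return {year: R00(year) / K99(year - lag)} for years with a nonzero K99 count."""
    rates = {}
    for year, r00 in sorted(r00_by_year.items()):
        k99 = k99_by_year.get(year - lag, 0)
        if k99:  # skip years with no earlier K99 cohort to compare against
            rates[year] = r00 / k99
    return rates


def cumulative_conversion(k99_by_year, r00_by_year, lag=1):
    """Return {year: total R00 up to year / total K99 up to year - lag}."""
    rates = {}
    for year in sorted(r00_by_year):
        r00_total = sum(n for y, n in r00_by_year.items() if y <= year)
        k99_total = sum(n for y, n in k99_by_year.items() if y <= year - lag)
        if k99_total:
            rates[year] = r00_total / k99_total
    return rates


# Hypothetical yearly counts, for illustration only.
k99_counts = {2007: 100, 2008: 120, 2009: 150}
r00_counts = {2008: 80, 2009: 108, 2010: 135}

for year, rate in rolling_conversion(k99_counts, r00_counts).items():
    print(year, f"rolling {rate:.1%}")
for year, rate in cumulative_conversion(k99_counts, r00_counts).items():
    print(year, f"cumulative {rate:.1%}")
```

      With real counts, the earliest and most recent years can be dropped from the returned dictionaries before summarizing, in the spirit of the authors' exclusion of the first and final two years of the program.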

      1. Assuming that 85% is the correct number, is there any information or insight into why ~1/6 of awardees do not continue to the R00 phase? This seems high given that only two years pass; that is a lot of awardees not obtaining R00 positions.

      We are unsure why these awards do not convert. In the revised version of the manuscript, we speculate on this in the 4th paragraph of the Discussion:

      The factors that prevented the other 302 K99 awardees from 2019 and earlier from converting their K99 awards to the R00 phase are cause for concern within our greater academic community. Possible explanations include leaving the biomedical workforce, accepting tenure-track or other positions abroad, or simply not securing a tenable tenure-track offer.

      1. It appears that some K99s last only one year rather than two (e.g., see 2006 in Fig 1A, which should not appear if all 2006 awards were for 2 years). What is the typical percentage of K99s not activated for a second year, and does this account for a sizable share of the 15% not converting to R00?

      This is an interesting question. We did not originally look into this. The dataset we downloaded from NIH RePORTER included many duplicate grant entries because year 1 of the K99 was listed on one line and year 2 on another. The first step in curating the data was to delete the duplicates so that we had only one entry per person. Unfortunately, depending on how the data tables were sorted, year 1 sometimes appeared above year 2 and sometimes below it. Because none of the data we were interested in are benchmarked to the K99 start date, we removed the duplicates without regard to which year was retained. With our current dataset, we therefore cannot tell which individuals dropped out (did not convert to R00) during the first or second year of the K99. To do so, we would have to download and curate the raw data from NIH RePORTER again. We may do this in the future, but for the purpose of publishing the current manuscript we prefer to focus our efforts on other aspects of the revision.

      1. Further down page 3, the authors' statement that "men typically experience 2-3% greater funding success rates" is ambiguous, because rates are themselves percentages. Is it 2-3 percentage points greater (as in 23% vs 20%), or 2-3% greater in relative terms (as in 20.6% vs 20%)? Please clarify the language.

      Thank you for asking for this clarification. We have updated the text here to reflect that we mean “23% vs 20%”.
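      The ambiguity raised here is the difference between an absolute gap in percentage points and a relative gap in percent. A small sketch, with hypothetical function names, makes it concrete: 23% versus 20% is a 3-percentage-point gap but a 15% relative increase, whereas 20.6% versus 20% is a 0.6-point gap, which is a 3% relative increase.

```python
# Hypothetical helpers contrasting absolute (percentage-point) and relative gaps.

def percentage_point_gap(rate_a, rate_b):
    """Absolute difference between two rates (given as fractions), in percentage points."""
    return (rate_a - rate_b) * 100


def relative_gap_percent(rate_a, rate_b):
    """Difference of rate_a relative to rate_b, in percent."""
    return (rate_a - rate_b) / rate_b * 100


for a, b in [(0.23, 0.20), (0.206, 0.20)]:
    print(f"{a:.1%} vs {b:.1%}: "
          f"{percentage_point_gap(a, b):.1f} points, "
          f"{relative_gap_percent(a, b):.1f}% relative")
```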

      1. Metrics such as time to first R01 are compared internally within the study set, which yields interesting insights, but more could be done to benchmark these metrics to non-K99 scientists.

      We agree with the reviewer that this would be ideal; however, we feel that it is out of the scope of this manuscript. We may examine this in the future.

      1. In several places, the text refers to percentages that the cited figures do not show. For example (page 6): 'proportion of awardees that stayed at the same institution declined to about 20% where it has remained consistent (Fig 1B)'. Figure 1B does not show percentages; instead, the reader must work out from the raw numbers what the pattern of percentages might look like. It is fine (great, even) to provide the raw numbers, but it would be great to show the percentages as well. This occurs in multiple graphs.

      Thank you for this comment. We agree that showing the percentage would be beneficial so we have included the percentages in Figure 1 for the conversion rate. We also added a standalone figure panel for the rolling conversion rate for Figure 1. For Figure 4, we have also included a right Y-axis to better indicate the % women.

      1. Figure 4 - putting the %women on a 0-250 scale makes it difficult to see the changes in that curve. Please replot it as a separate graph with an appropriate scale (30-50%? 30-70%?)

      Thank you for this comment. We have made this edit.

      1. Figure 5: the table appears inconsistent. The Moved/Stayed HR is 1.411, suggesting that moving is better for reducing time to R01, but Woman/Man is 1.208, so one of these pairs needs to be written in the opposite order for the table to make sense (were they intended to be listed as 'better/worse'?).

      Thank you for noticing this. In the revised manuscript, we have re-run the Cox proportional hazards model using the R package “survival” and its function “coxph()”. The hazard ratios differed slightly from those obtained with GraphPad Prism; however, the R package is much more widely used than Prism for this type of analysis. We present the new data in the table in Figure 5B of the revised manuscript. We now present the “detrimental” Cox hazard ratio for each variable (i.e., 0.7095 for mobility [moved/stayed]). We also underlined, for each variable, the category that was detrimental to receiving an R01 award earlier.

      1. Figure 5's graph appears strange. All the lines have an appearance of stochasticity but are actually multiples of each other, rising exactly in sync. Are these actually modeled lines? If so, why not instead actually draw the lines based on the real data from the real groups depicted, and give the n for each group?

      Thank you for picking this up. The software we originally used to plot the graphs plotted modeled lines instead of the actual data. We have re-run the Cox proportional hazards model using the R “survival” package v3.5-5 and the coxph() and survfit() functions. The updated data are in Figure 5 of the revised manuscript.

      1. Table 1 should note that each column sums to 100%.

      This is a good suggestion. In the revised manuscript, we have added a row to the table to indicate the column total N and %.

      1. The authors discuss how the K99/R00 grant review process may have to change, but K99 awards also affect the faculty hiring ecosystem. Some faculty job ads explicitly request or indicate a preference for K99 holders, and the results described in this article show that K99 awarding is biased toward particular demographics at select wealthy institutions. Of course, collective/central action is almost always more effective than individual elective action (especially over shorter timelines). In other words, the NIH changing its granting patterns would likely work better than encouraging faculty searches to change the weight they give to K99s, because there are many searches and just one NIH. But these approaches are not mutually exclusive, and individual action can still help when central action is not taken (for example, if the NIH neither changes the K99/R00 grant review process toward more inclusive funding nor increases the number of annual K99 awards, and hence the annual budget for this award mechanism). It would be good to discuss this in the manuscript.

      Thank you for this comment and thoughtful insights. We have included additional discussion on this in the final paragraph of the discussion.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for conducting this important work. In addition to the thoughts I described in the public review (in particular, Chris Pickett's FOIA data on K99/R00 outcomes by applicant race and ethnicity), I have only a few comments for potential improvements to this paper:

      1. The comparison of K99-R00 transition rates by gender was interesting. However, I missed an analysis of K99-R00 transition rates by institution (by type, or by top-25 NIH-funded institution versus not). This analysis may be buried somewhere in the more nuanced descriptions of faculty flows from one institution type to another, but I was not able to locate it. Could the authors consider dedicating a subsection specifically to the transition rate by institution type, with a table equivalent to Table 2? This section would probably fit best somewhere before the authors dive into the nuances of self-hires and faculty flows.

      Said another way: As I was reading, I felt I was missing an answer to a simple question - are there differences in conversion rates by institution type (however you define institution type, as an MSI or non MSI, or top-25 NIH funded versus not)?

      Thank you for this suggestion. We have created the tables (Tables 3 and 4) in the revised manuscript. We also made a new figure (now Figure 5 in the revised manuscript). This was an interesting way to look at the data, and it is very clear that K99 and R00 awards are heavily concentrated within the institutions that have the highest NIH funding. We have added a paragraph to the Results in a new section entitled “K99 and R00 awards are concentrated within the highest funded institutions”.

      1. Regarding the comparison of HBCUs and Harvard: this analysis was elucidating, but I am not sure that framing it as pertaining to "systematically marginalized groups" (see the second sentence in the section "Faculty doctorates differ between Harvard and HBCUs") is appropriate. While it is true that proportionally more faculty at HBCUs are from marginalized groups, many faculty at HBCUs are also from privileged or advantaged backgrounds (e.g., white, men, educated at elite institutions). It would be more accurate to rephrase the second sentence along the lines of, "We sought to examine the rates of funding for those at historically under-funded institutions." I recommend that the authors comb the paper for any other places in the text that conflate systemic marginalization with institution type, and rephrase as needed for accuracy.

      Thank you for pointing this out. This is an extremely important point and we have removed any instances we could find where we conflate systemically marginalized groups with institution type.

      1. I strongly recommend Sugimoto and Larivière's (2023) new book, Equity for Women in Science, which has an entire section dedicated to previous work on how researcher mobility affects access to resources, collaborations, et cetera (Chapter 5 on Mobility; other chapters on Funding are also relevant, but I home in on Mobility since it is such a key result of this work). I think this chapter would provide significant food for thought and background that could strengthen the Discussion section of the paper.

      Thank you for this suggestion. We have added some discussion of mobility in the first paragraph of the Discussion.

      1. I appreciated the subsection headings that described key results (e.g., "Institutions with the most NIH funding tend to hire K99/R00 awardees from other institutions with the most funding"; "K99/R00 awardee self-hires are more common at institutions with the top NIH funding.") This paper structure made it easier for me to ensure that I was getting the intended takeaway from a figure or section. But partway through the paper, the subheadings changed to being less declarative and therefore less informative (e.g., "Gender of K99/R00 awardees"; "Factors influencing K99/R00 awardee future funding success"). It would be great to rephrase these boilerplate subsection headers to be more declarative, like earlier subsection headings. For example, maybe say "Men receive the majority of K99 awards" or "No gender difference in the rate of conversion from K99 to R00" or something to that effect, depending on what result the authors wish to emphasize.

      Thank you for this comment. This is a very good point. We have re-worded the more generic headings in the revised version.

      1. Lastly, I would like to share a question that involves an additional analysis; this work is probably out of the scope of this paper but could instead be a separate paper or product. Circling back to Chris Pickett's FOIA-ed data on K99/R00 funding outcomes by applicant race and ethnicity (https://web.archive.org/web/20180723171128/http://rescuingbiomedicalresearch.org/blog/examining-distribution-k99r00-awards-race/): given that Pickett's numbers provide incontrovertible information on the number of awards to various racial and ethnic groups, I wonder whether this information could serve as an "answer key" to (1) check the accuracy of an algorithm that assigns race based on name, applied to the awards in your analysis for the 2007-2017 period, and (2) if the results are reasonable, examine the dataset with race and ethnicity information. Some recent papers performing large-scale bibliometric analyses have applied such algorithms (e.g., see Kozlowski et al. 2022 PNAS, Intersectional inequalities in science), and I wonder whether they could be useful, or at least tested, here. Again, Pickett's data would serve as the benchmark to see whether the algorithm produces numbers consistent with the actual funding outcomes; if they are not wildly off, or are accurate for some groups but not others, there might be something here.

      This is a really insightful comment. We have discussed whether we could assign ethnicity based on an algorithm and check based on Chris Pickett’s data. We agree that it is beyond the scope of this article, but has potential for future research.

      Reviewer #3 (Recommendations For The Authors):

      -In the methods section, it would be helpful to provide an overview of the number of universities, departments, and faculty represented in the data analyzed in the study.

      Thank you for this comment. We agree with the reviewer. We have added a section to the Results discussing the distribution of different types of institutions. We also added Tables 3 and 4 and a new Figure 5 describing these. Regarding the faculty, we have described the demographics of the K99 and R00 awardees as well as we could. We do not have data on which faculty laboratories the K99 awardees were in when they received their awards; this information is not available through NIH RePORTER.

      -I would consider incorporating, or at least citing, Jeff Lockhart and colleagues' recent Nature Human Behaviour article "Name-based demographic inference and the unequal distribution of misrecognition" to provide readers with an additional resource on the likelihood of misattribution and with general cautionary notes about using gender and race/ethnicity ascription/imputation approaches and tools for research.

      Thank you for bringing this reference to our attention. We have incorporated this into the methods section describing our name-based gender determination.

      -In the next-to-last sentence of the final paragraph of the Methods section, there appears to be a typo: it should read "K99 or R00," not "K00" as currently written.

      Thank you for catching this. We have now corrected it.

      -Clarifying some of the data and measures used is necessary to limit confusion and misinterpretation of the study's findings.

      Thank you. We have significantly updated the revised manuscript and hope that it is clearer.

      -Elaborating more on the gender inequality notable in the Cox proportional hazard model would strengthen the authors' point about persistent gender inequalities within the K99/R00 funding mechanism and pathways. In its current iteration, the findings are somewhat buried by the discussion of institutional differences, but when we look at the findings and the plot associated with the model, we notice that men have more advantages than women in funding and institutional location.

      Thank you for highlighting this. This is true and we have elaborated on the gender inequality in the revised version of the manuscript.

      -Also, for the Cox proportional hazards model, I would consider including data that can further clarify the biomedical research infrastructure of institutions. For example, in the conversation about the differences between Princeton and other universities, including other Ivies, it is important to note that Princeton does not have a medical school. Moreover, some institutions neither operate nor are affiliated with a hospital. Adding data to the model that better contextualize the research infrastructure around researchers with NIH awards, beyond the size of the NIH portfolio, could shed light on other important institutional differences that undergird these inequalities.

      Thank you for this comment. We have added details about institution type; however, examining whether institutions are attached to a hospital (or are themselves hospitals, such as MGH) or include a medical school may be difficult. We would have to code these manually and then determine whether the award recipient was affiliated with a department within that entity. We believe this is a fascinating question, but it is outside the scope of the present manuscript. We will look into it for potential future publications.

      -Throughout the manuscript, "elite" and "prestigious" are used somewhat ambiguously with respect to which institutional characteristics they refer to. This is a common issue in the literature, but clarifying what these terms specifically mean in the current study, and using them consistently rather than interchangeably so as not to confuse readers, would strengthen the authors' argument.

      Thank you for this suggestion. Based on these comments and those by the other reviewers, in the revised version of the manuscript, we have limited the use of “elite” and “prestigious” to describe institutions in order not to perpetuate biases toward certain institutions.

      -In relation to the discussion at the end of the manuscript of the longer time to award for researchers who stay at the same institution, another possible explanation for the disparity is their institution's greater reliance on them for service work (e.g., hiring committees, departmental committees, supporting graduate students through mentoring and/or dissertation committee work), given their knowledge of and experience within it.

      Thank you for this suggestion. We have added 2 sentences to the discussion reflecting this possibility.

      -Engaging with how STEM professional cultures can perpetuate these funding disparities and related hiring and career outcomes could enhance the contributions of the study. In relation to STEM professional cultures, engaging with the work of Mary Blair-Loy and Erin Cech in their recent book, Misconceiving Merit, could help provide additional insights for readers.

      Thank you for these comments. We have incorporated edits to the revised manuscript reflecting the work of Erin Cech and Mary Blair-Loy.

    2. eLife assessment

      This study follows the career trajectories of the winners of an early-career funding award in the United States, and finds that researchers with greater mobility, men, and those hired at well-funded institutions experience greater subsequent funding success. Using data on K99/R00 awards from the National Institutes of Health's grants management database, the authors provide compelling evidence documenting the inequalities that shape faculty funding opportunities and career pathways, and show that these inequalities disproportionately impact women and faculty working at particular institutions, including historically black colleges and universities. Overall, the article is an important addition to the literature examining inequality in biomedical research in the United States.

    3. Reviewer #1 (Public Review):

      Summary and strengths

      This is an interesting, timely, and informative article. The authors used publicly available data (made available by a funding agency) to examine some of the academic characteristics of the individual recipients of the National Institutes of Health (NIH) K99/R00 award program over the entire history of this funding mechanism (17 years; a total of ~4 billion US dollars, or an annual investment of ~230 million USD). The analysis focuses on the pedigree and the NIH funding portfolio of the institutions hosting the K99 awardees as postdoctoral researchers and of the institutions hiring these individuals. The authors also analyze the data by gender, by whether the R00 portion of the award is eventually activated, and by whether the awardees stayed at (were hired as faculty by) their K99 (postdoctoral) host institution or moved elsewhere. The authors further sought to examine the rates of funding for those in systematically marginalized groups by analyzing the patterns of receiving K99 awards and hiring K99 awardees at historically Black colleges and universities.

      The goals and analysis are reasonable and the limitations of the data are described adequately. It is worth noting that some of the observed funding and hiring traits are in line with the Matthew effect in science (Merton, 1968: https://www.science.org/doi/10.1126/science.159.3810.56) and in science funding (Bol et al., 2018: https://www.pnas.org/doi/10.1073/pnas.1719557115). Overall, the article is a valuable addition to the research culture literature examining the academic funding and hiring traits in the United States. The findings can provide further insights for the leadership at funding and hiring institutions and science policy makers for individual and large-scale improvements that can benefit the scientific community.

      Weaknesses

      The authors have addressed my recommendations in the previous review round in a satisfactory way.

    4. Reviewer #2 (Public Review):

      Summary and strengths

      Early career funding success has an immense impact on later funding success and faculty persistence, as evidenced by well-documented "rich-get-richer" or "Matthew effect" phenomena in science (e.g., Bol et al., 2018, PNAS). In this study, the authors examined publicly available data on the distribution of the National Institutes of Health's K99/R00 awards, an early career postdoc-to-faculty transition funding mechanism, and showed that although 89% of K99 awardees successfully transitioned into faculty positions, disparities in obtaining subsequent R01 grants emerged along three characteristics: researcher mobility, gender, and institution. Men who moved to a top-25 NIH-funded institution in their postdoc-to-faculty transition experienced the shortest median time to receiving an R01 award, 4.6 years, in contrast to a median of 7.4 years for women working at less well-funded schools who remained at their postdoc institutions.

      Among the three characteristics, the finding that researcher mobility has the largest effect on subsequent funding success is key and novel. Other data supplement this finding: for example, although the total number of R00 awards has increased, most of this increase is in awards to individuals moving to different institutions. In 2010, 60% of R00 awards were activated at different institutions, compared to 80% in 2022. These findings extend previous work on the relationship between mobility and one's access to resources, collaborators, or research objects (e.g., Sugimoto and Larivière, 2023, Equity for Women in Science (Harvard University Press)).

      These results empirically demonstrate that even after receiving a prestigious early career grant, researchers with less mobility belonging to disadvantaged groups at less-resourced institutions continue to experience barriers that delay them from receiving their next major grant. This result has important policy implications aimed at reducing funding disparities - mainly that interventions that focus solely on early career or early stage investigator funding alone will not achieve the desired outcome of improving faculty diversity.

      The authors also highlight two incredible facts: No postdoc at a historically Black college or university (HBCU) has been awarded a K99 since the program's launch. And out of all 2,847 R00 awards given thus far, only two have been made to faculty at HBCUs. Given the track record of HBCUs for improving diversity in STEM contexts, this distribution of awards is a massive oversight that demands attention.

      Through no fault of the authors, the analysis is limited to K99 awardees and excludes those who applied but did not receive the award. This limitation is solely due to the lack of data made publicly available by the NIH. If these data were available, this study could have compared the trajectories of winners versus losers and therefore potentially quantified the impact of the award itself on later funding success, much like the landmark paper by Bol et al. (PNAS; 2018) that followed the careers of applicants to an early career grant scheme in the Netherlands. Such an analysis would also provide new insights to inform policy.

      Although data on applications versus awards for the K99/R00 mechanism are limited, data on applicant race and ethnicity exist for the 2007-2017 period; these were made available through a Freedom of Information Act request by the now-defunct Rescuing Biomedical Research Initiative (https://web.archive.org/web/20180723171128/http://rescuingbiomedicalresearch.org/blog/examining-distribution-k99r00-awards-race/). These results are highly relevant given the discussion of K99 award impacts on the sociodemographic composition of U.S. biomedical faculty. During the 2007-2017 period, the K99 award rate was 31% for white applicants, compared with 26.7% for Asian applicants and 16.2% for Black applicants. These rates correspond to 1,384 awards to white applicants, 610 to Asian applicants, and 25 to Black applicants. However, the work required to include these data may be beyond the scope of the study.
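      As a rough cross-check of these figures, the implied number of applications per group can be back-calculated. This is a minimal sketch under a stated assumption: that each quoted award rate is simply awards divided by applications over the same 2007-2017 period (the source does not say how the rates were computed or how resubmissions were counted). Because the published rates are rounded, the results are approximate.

```python
# Awards and award rates by applicant race, K99, 2007-2017 (figures quoted above).
AWARDS = {"white": 1384, "Asian": 610, "Black": 25}
AWARD_RATES = {"white": 0.31, "Asian": 0.267, "Black": 0.162}


def implied_applications(awards, rates):
    """Back-calculate applications as awards / award rate, rounded to whole numbers.

    Assumption: each rate is awards divided by applications for the same period.
    """
    return {group: round(awards[group] / rates[group]) for group in awards}


if __name__ == "__main__":
    print(implied_applications(AWARDS, AWARD_RATES))
```

      Under that assumption, the calculation gives roughly 4,465 applications from white applicants, 2,285 from Asian applicants, and 154 from Black applicants, which would indicate that the small number of awards to Black applicants reflects both a lower award rate and a much smaller applicant pool.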

      The conclusions are well-supported by the data, and limitations of the data and the name-gender matching algorithm are described satisfactorily.

    5. Reviewer #3 (Public Review):

      Summary<br /> The researchers aim to add to the literature on faculty career pathways, with particular attention to how gender disparities persist in researchers' career and funding opportunities. The researchers also examine aspects of institutional prestige that can further amplify funding and career disparities. While some factors about individuals' pathways to faculty lines are known, including the prospects of certain K award recipients, the current study provides the only known examination of K99/R00 awardees and their pathways.

      Strengths<br /> The authors establish a clear overview of the institutional locations of K99 and R00 awardees, the pathways of K99-to-R00 researchers, and the gendered and institutional patterns of these pathways. For example, there is a clear institutional hierarchy in the hiring of K99/R00 researchers that echoes previous research on rigid faculty hiring networks across fields, and a pivotal difference in the time between awards that can affect faculty careers. There are also regional clusters of hiring in parts of the US where multiple research universities are located. Moreover, documenting the pathways of HBCU faculty is an important extension of the study by Wapman et al. (2022: https://www.nature.com/articles/s41586-022-05222-x) and provides a more nuanced look at faculty pathways beyond the oft-discussed high-status institutions. (However, this segment of the analyses needs further refinement.) Finally, the authors provide important caveats about the study's findings throughout the manuscript, showing careful attention to the complexity of these patterns and an effort to limit misinterpretation by readers.

      Weaknesses<br /> The authors have addressed my recommendations in the previous review round in a satisfactory way.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors showed that activation of RelA and Stat3 in hepatocytes of DSS-treated mice induced CYPs and thereby produced primary bile acids, particularly CDCA, which exacerbated intestinal inflammation.

      Strengths:

      This study reveals that a RelA/Stat3-dependent gene program in the liver influences intestinal homeostasis.

      Weaknesses:

      Additional evidence would strengthen the conclusion.

      1) In Fig. 1C, the images show that phosphorylation of RelA and Stat3 was induced in only a few hepatocytes. Because the authors conclude that activation of both RelA and Stat3 induces inflammatory pathways, they should show that phosphorylation of RelA and Stat3 is induced in the same hepatocytes during DSS treatment.

      Experiments are in progress, and the data will be included in the revised manuscript: co-staining of pRela and pStat3(727) on liver sections from treated mice.

      2) In Fig. 5, the authors treated mice with CDCA intraperitoneally. In this experiment, the concentration of CDCA in the colon of CDCA-treated mice should be shown.

      Experiments are in progress, and the data will be included in the revised manuscript: supplementation of CDCA to knockout animals and measurement of CDCA in the colon of DSS-treated and untreated animals.

      Reviewer #2 (Public Review):

      Singh and colleagues employ a methodical approach to reveal the function of the transcription factors Rela and Stat3 in the regulation of the inflammatory response in the intestine.

      Strengths of the manuscript include the focus on the function of these transcription factors in hepatocytes and the discovery of their role in the systemic response to experimental colitis. While the systemic response to induced colitis is appreciated, the cellular and molecular mechanisms that drive this systemic response, especially those involving organs beyond the intestine, are an active area of research. As such, this study contributes to this conceptual advance. Additional strengths are the complementary biochemical and metabolomic approaches used to describe the activation of these transcription factors in the liver and their requirement, specifically in hepatocytes, for the production of bile acids in response to colitis.

      Some weaknesses are noted in the presentation of the data, including a lack of comprehensive representation of findings in all conditions and genotypes tested.

      These will be incorporated in the revised version.

      Reviewer #3 (Public Review):

      Summary:

      The authors try to elucidate the molecular mechanisms underlying the inter-organ crosstalk that perpetuates intestinal permeability and inflammation.

      Strengths:

      This study identifies a hepatocyte-specific rela/stat3 network as a potential therapeutic target for intestinal diseases via the gut-liver axis using both murine models and human samples.

      Weaknesses:

      1) The mechanism by which DSS administration induces the activation of the Rela and Stat3 pathways and the subsequent modification of the bile acid pathway remains unclear. As the authors state, intestinal bacteria are one candidate, and this needs to be clarified. I recommend that the authors investigate whether gut sterilization by antibiotic administration, or germ-free conditions, affects 1. the activation of the Rela and Stat3 pathway in the liver of DSS-treated WT mice and 2. the reduction of colitis in DSS-treated relaΔhepstat3Δhep mice.

      Experiments are in progress, and the data will be included in the revised manuscript: mice will receive antibiotic treatment for 2/4 weeks and will subsequently be treated with DSS, and Rela and Stat3 phosphorylation will be assessed by western blotting.

      2) It has not been shown whether DSS administration causes an increase in primary bile acids, represented by CDCA, in the colon of WT mice following activation of the Rela and Stat3 pathways, as demonstrated in Figure 6.

      We have demonstrated an increased level of CDCA in the colon of wild-type animals following DSS treatment (Figure 4B).

      3) The implications of these results for IBD treatment, especially in what ways they may lead to therapeutic intervention, need to be discussed.

      These will be incorporated in the revised version.

    1. Author Response

      We decided to address the comments of the reviewers with additional experiments and modification of the text with the aim of submitting a new version of the report.

      We would like to underline that the current study is an extension of the work published in eLife (Atze et al., 2021). For this reason, and in agreement with eLife guidelines, we did not repeat all the background information on the method used to identify PG subunit isotopologues using mass spectrometry.

      Reviewer #1 (Public Review):

      Summary:

      Liang et al. use a previously devised full isotope labeling of peptidoglycan (PG), followed by mass spectrometry, to study the kinetics of Lpp tethering to PG and the hydrolysis of this bond by YafK.

      Strengths:

      -The labeling and mass spec analysis technique works very well to discern differentially labelled Tri-KR muropeptides containing new and old Lpp and PG.

      Weaknesses:

      -Only one line of experimentation, mass spec-based analysis of labeled PG-Lpp, is used to support all conclusions in the paper. The evidence is also not sufficient to fully delineate the role of YafK.

      Our approach based on heavy isotope labelling and mass spectrometry has the power to identify and kinetically characterize the specific products of the reaction leading to the tethering of Lpp to PG and the hydrolysis of the corresponding bond. We therefore maintain that our experimentation is sufficient to obtain meaningful results without combining it with other lines of experimentation.

      -Only one mutant (YafK) is used to make the conclusion.

      The aim of the study is to determine the effect of the hydrolysis of the PG→Lpp bond on the dynamics of the tethering of Lpp to PG. Since YafK is the only enzyme catalyzing this reaction, it is appropriate to compare the wild-type strain to an isogenic yafK deletion mutant. Nonetheless, we have carefully considered this comment and will investigate the dynamics of the tethering of Lpp to PG in mutants deficient in the production of the L,D-transpeptidases responsible for tethering Lpp to PG.

      -The paper makes many 'implications' with minimal proof to support its hypothesis. Other lines of experimentation must be added to fully support these claims.

      See our answer to the first comment.

      -Time points to analyse Tri-KR isotopologues in Wt (0,10,20,40,60 min) and yafK mutant (0,15, 25, 40, 60 min) are not the same.

      The purpose of the experiments is to compare the kinetics of formation and hydrolysis of the PG→Lpp bond in the WT versus ΔyafK strains. Comparison of the kinetics is therefore possible even though the kinetics are not based on the exact same time points. Nonetheless, we will reproduce the kinetics experiment (see also answers to Reviewer 2) and use the same time points in these additional experiments.

      -Experiments to define the physiological role of YafK are also missing.

      We will investigate the effect of the yafK deletion on the formation of outer membrane vesicles.

      Reviewer #2 (Public Review):

      Summary:

      The authors of this study have sought to better understand the timing and location of the attachment of the lpp lipoprotein to the peptidoglycan in E. coli, and to determine whether YafK is the hydrolase that cleaves lpp from the peptidoglycan.

      Strengths:

      The method is relatively straightforward. The authors are able to draw some clear conclusions from their results, that lpp molecules get cleaved from the peptidoglycan and then re-attached, and that YafK is important for that cleavage.

      Weaknesses:

      However, the authors draw a few other conclusions from their data whose logic is harder to follow, or which are harder to accept on the basis of the existing data. They claim that their five-time-point kinetic data indicate that new lpp is not substantially added to lipid II before lipid II is incorporated into the peptidoglycan, and that lpp is instead attached primarily to old peptidoglycan. I believe that this conclusion comes from the comparison of Figs. 3A and 3C, where it appears that new lpp is added to old peptidoglycan a few minutes before new lpp is added to new peptidoglycan. However, the very small difference in timing, the minimal number of time points, and the complete absence of any calculated error in the data make this conclusion very tenuous. In addition, the authors conclude that lpp is not significantly attached to septal peptidoglycan. This conclusion appears to rest on the same data, but the authors do not provide a quantitative model to support it.

      The reviewer is correct in stating that we claim that Lpp is not substantially added to lipid II before incorporation of the disaccharide-pentapeptide subunit into the expanding PG network. This conclusion is based on the paucity of PG-Lpp covalent adducts containing light PG and Lpp moieties at the earliest time points. To substantiate this finding more thoroughly, we will reproduce the kinetic experiments with more early time points. The paucity of new→new PG-Lpp isotopologues also implies that Lpp might not be extensively tethered to septal peptidoglycan, since the latter is assembled from newly synthesized PG (see our previous publication Atze et al. 2021 and references therein). Quantitatively, septal synthesis accounts for roughly one third of total PG synthesis. If Lpp were tethered to septal PG in proportion to its synthesis, tethering to septal PG would therefore be expected to account for one third of the newly synthesized Lpp molecules tethered to PG. We therefore proposed that the paucity of new→new PG-Lpp isotopologues at early time points of the kinetics implies that Lpp is preferentially tethered to the side wall. This is only one of several conclusions that we reach in the present study, and we were very careful in the wording of our results.

      -This work will have a moderate impact on the field studying the connections between the OM and the peptidoglycan in E. coli. Since lpp is not widely conserved among Gram-negative bacteria, the impact across species is not clear. The authors do not discuss the impact of their work in depth.

      We respectfully disagree with this reviewer’s comment. The work reported in this article for E. coli opens the way to the analysis and comparison of the mechanisms of the tethering of proteins to PG in various bacteria. In addition, we would like to stress that the Gram-negative bacteria that produce Lpp-related proteins and tether them to the PG include other major pathogens such as Pseudomonas aeruginosa (DOI: 10.1128/spectrum.05217-22).

    1. eLife assessment

      The identification of existing and new agents for the treatment of T-cell leukemias is clearly significant to the field of cancer biology and experimental therapeutics. This manuscript identifies an important role of Cannabis-based derivatives in the treatment of T-ALL in disease-relevant cell-based and in vivo models of the disease. The work provides new mechanistic insights into how these drugs work, with convincing evidence. However, further work to define the exact molecular target of these drugs and to extend the work beyond a limited number of cell lines would strengthen the conclusions and impact of this work.

    2. Reviewer #1 (Public Review):

      This is an interesting manuscript that extends prior work from this group identifying that a chemovar of Cannabis induces apoptosis of T-ALL cells by preventing NOTCH1 cleavage. Here the authors isolate specific components of the chemovar responsible for this effect to CBD and CBDV. They identify the mechanism of action of these agents as occurring via the integrated stress response. Overall the work is well performed but there are two lingering questions that would be helpful to address as follows:

      -Exactly how CBD and CBDV result in the upregulation of the TRPV1/integrated stress response is unclear. What is the most proximal target of these agents that results in these changes?

      -Related to the above, all experiments to confirm the mechanism of action of CBD/CBDV rely on chemical agents, whose precise targets are not fully clear in some cases. Thus, some use of genetic means (such as by knockout of TRPV1, ATF4) to confirm the dependency of these pathways on drug response and NOTCH cleavage would be very helpful.

    3. Reviewer #2 (Public Review):

      Summary:<br /> The Meiri group previously showed that Notch1-activated human T-ALL cell lines are sensitive to a cannabis extract in vitro and in vivo (Ref. 32). In that article, the authors showed that Extract #12 reduced NICD expression and viability, which was partially rescued by restoring NICD expression. Here, the authors have identified three compounds of Extract #12 (CBD, 331-18A, and CBDV) that are responsible for the majority of anti-leukemic activity and NICD reduction. Using a pharmacological approach, the authors determined that Extract #12 exerted its anti-leukemic and NICD-reducing effects through the CB2 and TRPV1 receptors. To determine the mechanism, the authors performed RNA-seq and observed that Extract #12 induces ER calcium depletion and stress-associated signals -- ATF4, CHOP, and CHAC1. Since CHAC1 was previously shown to be a Notch inhibitor in neural cells, the authors assume that the cannabis compounds repress Notch S1 cleavage through CHAC1 induction. The induction of stress-associated signals, Notch repression, and anti-leukemic effects were reversed by the integrated stress response (ISR) inhibitor ISRIB. Interestingly, combining the 3 cannabinoids gave synergistic anti-leukemic effects in vitro and had growth-inhibitory effects in vivo.

      Strengths:<br /> 1. The authors show novel mechanistic insights that cannabinoids induce ER calcium release and that the subsequent integrated stress response represses activated NOTCH1 expression and kills T-ALL cells.

      2. This report adds to the evidence that phytocannabinoids can show a so-called "entourage effect" in which minor cannabinoids enhance the effect of the major cannabinoid CBD.

      3. This report dissects the main cannabinoids in the previously described Extract #12 that contribute to T-ALL killing.

      4. The manuscript is clear and generally well-written.

      5. The data are generally high quality and with adequate statistical analyses.

      6. The data generally support the authors' conclusions. The exception is the experiments related to Notch.

      7. The authors' discovery of the role of the integrated stress response might explain previous observations that SERCA inhibitors block Notch S1 cleavage and activation in T-ALL (Roti Cancer Cell 2013). The previous explanation by Roti et al was that calcium depletion causes Notch misfolding, which leads to impaired trafficking and cleavage. Perhaps this explanation is not entirely sufficient.

      Weaknesses:<br /> 1. Given the authors' previous Cancer Communications paper on the anti-leukemic effects and mechanism of Extract #12, the significance of the current manuscript is reduced.

      2. It would be important to connect the authors' findings with the wealth of literature on the role of ER calcium/stress in Notch cleavage, folding, trafficking, and activation.

      3. There is an overreliance on the data on a single cell line -- MOLT4. MOLT4 is a good initial choice as it is Notch-mutated, Notch-dependent, and representative of the most common T-ALL subtype -- TAL1. However, there is no confirmatory data in other TAL1-positive T-ALLs or interrogation of other T-ALL subtypes.

      4. Fig. 6H. The effects of the cannabinoid combination might be statistically significant but seem biologically weak.

      5. Fig. 3. Based on these data, the authors conclude that the cannabinoid combination induces CHAC1, which represses Notch S1 cleavage in T-ALL cells. The concern is that Notch signaling is highly context-dependent. CHAC1 might inhibit Notch in neural cells (Refs. 34-35), but it might not do so in a different context such as T-ALL. It would be important to show evidence that CHAC1 represses S1 cleavage in the T-ALL context. More importantly, Fig. 3H clearly shows the cannabinoid combination inducing ATF4 and CHOP protein expression, but the effects on CHAC1 protein seem insufficient to account for Notch inhibition. Perhaps something else is blocking Notch expression?

      6. Fig. 4B-C/S5D-E. These Western blots of NICD expression are consistent with the cannabinoid combination blocking Furin-mediated NOTCH1 cleavage, which is reversed by ISR inhibition. However, many mechanisms regulate NICD expression. To support their conclusion that the effects are specifically Furin-mediated, the authors should probe full-length (uncleaved) NOTCH1 in their Western blots.

      7. Fig. S4A-B. While these pharmacologic data suggest that Extract #12 reduces NICD expression through the CB2 receptor and the TRPV1 channel, the doses used are very high (50 µM). To exclude off-target effects, these data should be paired with genetic data to support the authors' conclusions.

    1. eLife assessment

      This study presents a valuable finding on how the GAP DLC1, a deactivator of the small GTPase RhoA, regulates RhoA activity globally as well as at Focal Adhesions. Using a new acute optogenetic system coupled to a RhoA activity biosensor, the authors present solid evidence that DLC1 amplifies local Rho activity at Focal Adhesions. Nevertheless, the proposed mechanism could be further supported by a deeper analysis of the data.

    2. Joint Public Review:

      Summary:

      The manuscript of Heydasch et al. addresses the spatiotemporal regulation of Rho GTPase signaling in living cells and its coupling to the mechanical state of the cell. They focus on a RhoA GAP, the Rho-specific GAP Deleted in Liver Cancer 1 (DLC1). They first show that removing DLC1, either by CRISPR KO or by siRNA-mediated downregulation, leads to increased contractility and globally elevated RhoA activity, as revealed by a FRET biosensor. This result was expected: since DLC1 deactivates RhoA, its absence should lead to increased amounts of active RhoA. To go beyond global, steady-state levels of RhoA activity, the authors developed an acute optogenetic system to study transient RhoA activity dynamics in different genetic and subcellular contexts. In WT cells, they found that pulses of activation lead to higher RhoA activity at focal adhesions (FA) than at the plasma membrane (PM), which suggests that FAs contain fewer RhoA GAPs or more RhoA, or that FAs involve positive feedback, for example through other GEFs. In DLC1 KO cells, they found that the RhoA response to pulses of optogenetic activation was increased (higher peak) both at FAs and at the PM, which could be expected since less GAP should increase the amount of active RhoA. Surprisingly, however, they observed a higher rate of RhoA deactivation in DLC1 KO cells, which is counterintuitive: less GAP should result in a slower rate of deactivation. Less GAP should also lead to a lower observed rate of RhoA activation (smaller koff) and a delayed peak. From the data, it is hard to assess these two expectations, since the initial rates (slopes right after activation) and the times to peak appear similar in WT and DLC1 KO cells.
      Further, the authors study the dynamics of DLC1 at FAs as a function of the mechanical state and nicely show a causal decrease in DLC1 enrichment at FAs upon FA reinforcement, thereby revealing a positive feedback loop in which RhoA activation is further amplified as the force exerted at FAs increases.

      Strengths:

      - Experiments are precise and well done.<br /> - Technically, the work brings original and interesting data. The use of transient optogenetic activation within focal adhesions together with a biosensor of activity is new and elegant.<br /> - The link between DLC1 and global contractility/RhoA activity is clear and convincing.<br /> - The surprisingly higher rate of RhoA deactivation in DLC1 KO cells is convincing, as well as the differences in the dynamics of RhoA between focal adhesions and plasma membrane.<br /> - The correlation between DLC1 enrichment and focal adhesion dynamics is very clear.

      Weaknesses:

      - There is no explanation for the higher rate of RhoA deactivation in DLC1 KO cells.<br /> - For the optogenetic experiments, it is not clear whether we are looking at the actual dynamics of RhoA activity or at the dynamics of the optogenetic tool itself.<br /> - There is no model to analyze transient RhoA responses; however, the quantitative nature of the data calls for one. Even a simple model with linear activation-deactivation kinetics fitted to the data would benefit the conclusions on the observed rates and absolute amounts.
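      To illustrate the kind of simple model suggested above, here is a minimal sketch, not the authors' analysis. It assumes (i) that the biosensor signal above baseline is proportional to active RhoA, (ii) that the optogenetic tool delivers a constant activation input k_act during a light pulse of length pulse_len and none afterwards (i.e., the tool's own dynamics are ignored), and (iii) first-order deactivation with rate constant k_deact: dA/dt = k_act * u(t) - k_deact * A. All function names and parameter values are hypothetical. In this model the same k_deact sets both the time constant of the rise during the pulse and the decay after it, so fitted values could be compared directly between WT and DLC1 KO cells, or between FAs and the PM.

```python
import math


def rho_response(t, k_act, k_deact, pulse_len):
    """Model biosensor signal above baseline for one light pulse starting at t = 0.

    Hypothetical linear model: dA/dt = k_act * u(t) - k_deact * A, where
    u(t) = 1 during the pulse (0 <= t <= pulse_len) and 0 afterwards.
    """
    if t < 0:
        return 0.0
    plateau = k_act / k_deact  # steady state reached if the pulse lasted forever
    if t <= pulse_len:
        # Rise toward the plateau with time constant 1 / k_deact.
        return plateau * (1.0 - math.exp(-k_deact * t))
    # After the pulse: first-order decay from the value reached at pulse end.
    peak = plateau * (1.0 - math.exp(-k_deact * pulse_len))
    return peak * math.exp(-k_deact * (t - pulse_len))


def fit_linear_kinetics(times, signal, pulse_len, k_deact_grid):
    """Least-squares fit of (k_act, k_deact) to one transient response.

    For fixed k_deact the model is linear in k_act, so k_act has a closed-form
    least-squares solution; k_deact is chosen by grid search.
    Returns (k_act, k_deact, sum_of_squared_errors).
    """
    best = None
    for k_deact in k_deact_grid:
        # Model response for k_act = 1; the full model is k_act times this basis.
        basis = [rho_response(t, 1.0, k_deact, pulse_len) for t in times]
        denom = sum(b * b for b in basis)
        if denom == 0.0:
            continue
        k_act = sum(b * y for b, y in zip(basis, signal)) / denom
        sse = sum((k_act * b - y) ** 2 for b, y in zip(basis, signal))
        if best is None or sse < best[2]:
            best = (k_act, k_deact, sse)
    return best


if __name__ == "__main__":
    # Synthetic trace with hypothetical parameters (arbitrary units, 1 s sampling).
    times = list(range(60))
    trace = [rho_response(t, 0.5, 0.1, 10) for t in times]
    grid = [i / 100 for i in range(1, 101)]
    k_act, k_deact, sse = fit_linear_kinetics(times, trace, 10, grid)
    print(f"k_act={k_act:.3f}, k_deact={k_deact:.3f}, half-life={math.log(2) / k_deact:.1f} s")
```

      The grid search over k_deact only keeps the sketch dependency-free; a standard nonlinear least-squares routine would be the usual choice for real traces, ideally with confidence intervals so that differences in k_deact between conditions can be tested.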

    1. eLife assessment

      This study presents valuable findings characterising the genomic features of E. coli isolated from neonatal meningitis from seven countries, and documents bacterial persistence and reinfection in two case studies. The genomic analyses are solid, although the inclusion of a larger number of isolates from more diverse geographies would have strengthened the generalisability of findings. The work will be of interest to people involved in the management of neonatal meningitis patients, and those studying E. coli epidemiology, diversity, and pathogenesis.

    2. Reviewer #1 (Public Review):

      Summary:<br /> This study uses whole genome sequencing to characterise the population structure and genetic diversity of a collection of 58 isolates of E. coli associated with neonatal meningitis (NMEC) from seven countries, including 52 isolates that the authors sequenced themselves and a further 6 publicly available genome sequences. Additionally, the study used sequencing to investigate three case studies of apparent relapse. The data show that in all three cases, the relapse was caused by the same NMEC strain as the initial infection. In two cases they also found evidence for gut persistence of the NMEC strain, which may act as a reservoir for persistence and reinfection in neonates. This finding is of clinical importance as it suggests that decolonisation of the gut could be helpful in preventing relapse of meningitis in NMEC patients.

      Strengths:<br /> The study presents complete genome sequences for n=18 diverse isolates, which will serve as useful references for future studies of NMEC. The genomic analyses are high quality, the population genomic analyses are comprehensive and the case study investigations are convincing.

      Weaknesses:<br /> The NMEC collection described in the study includes isolates from just seven countries. The majority (n=51/58, 88%) are from high-income countries in Europe, Australia, or North America; the rest are from Cambodia (n=7, 12%). Therefore it is not clear how well the results reflect the global diversity of NMEC, nor the populations of NMEC affecting the most populous regions.

      The virulence factors section highlights several potentially interesting genes that are present at apparently high frequency in the NMEC genomes; however, without knowing their frequency in the broader E. coli population it is hard to know the significance of this.

    3. Reviewer #2 (Public Review):

      Summary:<br /> In this work, the authors present a robust genomic dataset profiling 58 isolates of neonatal meningitis-causing E. coli (NMEC), the largest such cohort to be profiled to date. The authors provide genomic information on virulence and antibiotic resistance genomic markers, as well as serotype and capsule information. They go on to probe three cases in which infants presented with recurrent febrile infection and meningitis and provide evidence indicating that the original isolate is likely causing the second infection and that an asymptomatic reservoir exists in the gut. Accompanying these results, the authors demonstrate that gut dysbiosis coincides with the meningitis.

      Strengths:<br /> The genomics work is meticulously done, utilizing long-read sequencing.<br /> The cohort of isolates is the largest to be sampled to date.<br /> The findings are significant, illuminating the presence of a gut reservoir in infants with repeating infection.

      Weaknesses:<br /> Although the cohort of isolates is large, there is no global representation, entirely omitting Africa and the Americas. This is acknowledged by the group in the discussion, however, it would make the study much more compelling if there was global representation.

    4. Reviewer #3 (Public Review):

      Summary:<br /> In this manuscript, Schembri et al performed a molecular analysis by WGS of 52 E. coli strains identified as "causing neonatal meningitis" from several countries and isolated from 1974 to 2020. Sequence types, virulence genes content as well as antibiotic-resistant genes are depicted. In the second part, they also described three cases of relapse and analysed their respective strains as well as the microbiome of three neonates during their relapse. For one patient the same E. coli strain was found in blood and stool (this patient had no meningitis). For two patients microbiome analysis revealed a severe dysbiosis.

      Major comments:<br /> Although the authors announce in their title that they study E. coli that cause neonatal meningitis, and state in the methods that they had a collection of 52 NMEC, Supplementary Table 1 lists 29 strains (therefore most of the strains) isolated from blood and not CSF. This is a major limitation, since only strains isolated from CSF can be designated with certainty as NMEC, even if pleocytosis is observed in the CSF. A particularly troubling point is the description of patient two, with a relapse infection. As stated in the text (line 225), CSF microscopy was normal and culture was negative for this patient. It is therefore clear that a patient without meningitis has been included in this study.

      Another major limitation (not stated in the discussion) is the absence of clinical information on the neonates, especially gestational age. It is well known that the risk of infection is dramatically increased in preterm neonates due to their immature immunity. Therefore, E. coli causing infection in preterm neonates are not comparable to those causing infection in term neonates, notably in their virulence gene content. Indeed, at least eight strains did not possess a capsule; we can speculate that these neonates were preterm, but this information is lacking. The ages of the neonates are also lacking. The possible source of infection, notably urinary tract infection, is not mentioned. This may also affect VF content.

      Sequence analysis reveals the predominance of ST95 and ST1193 in this collection. The high incidence of ST95 is not surprising and has been well described previously; therefore, the concluding sentence on line 132, indicating that ST95 E. coli should exhibit specific virulence features associated with their capacity to cause NM, does not add anything. In contrast, the high incidence of ST1193 is of interest and should have been discussed in more detail. Which specific virulence factors do they harbor? Is there any hypothesis explaining their emergence in neonates? The paragraph describing the VFs only states that the ST95 strains contained significantly more VFs than the ST1193 strains, without discussing what this implies. Moreover, "significantly" is not documented: n=?, p=?<br /> The complete sequence of 18 strains is not clear. The results of Supplementary Table 2 are presented in the text but are not discussed.

      Forty-six years is a very long period for such a small number of strains, making it difficult to put forward epidemiological or evolutionary theories. The analysis of antibiotic resistance found no ESBLs. However, Ding's article (reference 34) and other authors have shown that ESBLs are emerging in E. coli neonatal infections. These strains are a major threat that should be studied; unfortunately, the authors have not had the opportunity to characterize such strains in their manuscript.

      Second part of the manuscript:<br /> The three patients who relapsed had a late neonatal infection (> 3 days) with respective ages of 6 days, 7 weeks, and 3 weeks. We do not know whether they are former preterm newborns (no term specified) or whether they have received antibiotics in the meantime.

      Patient 1: Although this patient had CSF pleocytosis, the culture was negative, which is surprising, and no explanation is provided. The diagnosis of meningitis is therefore not certain. Pleocytosis without meningitis has been previously described in neonates with severe sepsis.

      Line 215: no immunological abnormalities were identified (no details are given).

      Patient 2: This patient had a recurrence of bacteremia without meningitis (line 225: CSF microscopy was normal and culture negative!). This case should be deleted.

      Patient 3: This patient had two relapses, which is exceptional and may suggest a congenital malformation or a neurological complication such as an abscess or empyema; therefore, the "imaging studies" should be described in detail.

      The authors suggest a link between intestinal dysbiosis and relapse in three patients. However, the fecal microbiomes of patients without relapse were not analysed, so no comparison is possible. Moreover, dysbiosis after several weeks of antibiotic treatment in a patient hospitalized for a long time is not unexpected. Therefore, it is impossible to make any assumption or draw any conclusion; this part of the manuscript is purely descriptive. Finally, the authors should be more cautious when they state in line 289 "we also provide direct evidence to implicate the gut as a reservoir [...] antibiotic treatment". Indeed, maternal gut colonization with the same strain may also be a reservoir (as stated in the discussion, line 336).

      In addition, the authors do not discuss the potential role of ceftriaxone versus cefotaxime in the observed dysbiosis. Ceftriaxone may have a major impact on the microbiota because of its elimination through the digestive tract.

    1. eLife assessment

      This work reports a valuable finding on glucocorticoid signaling in male and female germ cells in mice, pointing out sexual dimorphism in transcriptomic responsiveness. The convincing evidence provided supports inert GR signaling in the female germline despite the presence of GR, and GR-mediated alternative splicing in response to dexamethasone treatment in the male germline. The work may be of interest to basic researchers and physician-scientists working on reproduction and stress-related disease conditions.

    2. Joint Public Review:

      Summary:

      Cincotta et al. set out to investigate the presence of glucocorticoid receptors in the male and female embryonic germline. They further investigate the impact of tissue-specific genetic receptor deletion and/or systemic receptor activation on fertility and RNA regulation. They are motivated by several lines of research that report intergenerational and transgenerational effects of stress and/or glucocorticoid receptor activation, and they suggest that their findings provide a mechanistic explanation for offspring phenotypes induced by parental stress hormone exposure.

      Strengths:

      - A chronological immunofluorescence assessment of GR in fetal and early-life oocyte and sperm development.
      - RNA-seq data revealing novel cell-type-specific isoforms in E15.5 oocytes, validated by qRT-PCR.
      - Two alternative approaches to knock out GR to study transcriptional outcomes in oocytes: systemic GR KO (E17.5) with low-input 3-tag seq, and germline-specific GR KO (E15.5) assessed for fetal oocyte expression via 10X single-cell sequencing and 3-cap sequencing of sorted KO versus WT oocytes. Both indicate little impact on polyadenylated RNAs.
      - Two alternative approaches to assess the effect of GR activation, in vivo (systemic) and ex vivo (ovary culture): here the RNA-seq again showed some changes in germ cells and many in the soma.
      - They exclude inhibition of oocyte-specific GR signaling via beta isoforms.
      - The perinatal male germline shows differential splicing regulation in response to systemic Dex administration; these results were supported by qPCR analysis of splicing factors.

      Weaknesses:

      - The sequencing techniques used do not capture total RNA but focus on polyadenylated transcripts (e.g., 10x); effects on non-polyadenylated transcripts are left unexplored.
      - The number of replicates in the low-input sequencing is very low, so this experiment might be underpowered. Since Dex treatment produced some (modest) changes in oocyte RNA, effects of GR depletion might only become apparent upon Dex treatment, as an interaction: GR activation in the presence of GR produces changes, and these changes are abolished upon GR depletion. Statistically, this is an interaction, and it would indicate germline-autonomous GR effects.
      - Effects in oocytes following systemic Dex might be indirect, owing to GR activation in the soma. The observed changes might be irrelevant to meiosis and are therefore deemed irrelevant in the manuscript, but they could still have subtle consequences in other respects.

      Even though ex vivo culture of ovaries shows GR translocation to the nucleus, it is not certain whether in vivo systemic administration does the same. The authors argue in their rebuttal that GR is already nuclear in fetal oocytes; hence the conclusion that fetal oocytes are resistant to GR manipulation is understandable, at least for the readouts that were considered. Yet the question arises: if GR is already nuclear (active) in the absence of additional Dex treatment, why does GR knockout not elicit any changes in the considered readouts? What are we missing?

      This work is a good reference point for researchers interested in glucocorticoid hormone signaling, fertility, and RNA splicing. It might spark further studies on germline-specific GR functions and the impact of GR activation on alternative splicing.

      The study provides a characterization of GR and some aspects of GR perturbation, and its negative findings help to rule out a range of specific roles of GR in the germline. This will help to direct research toward unexplored options, which the authors acknowledge in their discussion.

      The introduction alludes to implications for intergenerational effects via epigenetic modifications in the germline and points out additional potential indirect effects of reproductive-tissue GR signaling on the germline. Future studies might therefore focus on further exploration of epigenetic modifications and/or indirect effects.

    1. eLife assessment

      This solid study presents a useful dataset regarding chromatin remodeling by the BAF complex in the context of meiotic sex chromosome inactivation. Knockout of the BAF complex subunit ARID1A appears to cause pachynema arrest and a failure to repress sex-linked genes, the latter supported by an increase in chromatin accessibility as assessed by ATAC-seq.

    2. Reviewer #1 (Public Review):

      The work by Debashish U. Menon, Noel Murcia, and Terry Magnuson provides important knowledge about histone H3.3 dynamics involved in meiotic sex chromosome inactivation (MSCI). MSCI is unique to gametes, and failure during this process can lead to infertility. Classically, MSCI has been studied in the context of DNA damage repair pathways, and little is known about the epigenetic mechanisms underlying maintenance of the sex body as a silencing platform during meiosis. One of the major strengths of this work is the evidence provided on the role of ARID1A, a BAF subunit, in MSCI through the regulation of H3.3 occupancy in specific genic regions.

      Using RNA-seq, CUT&RUN, and ATAC-seq, the authors show that ARID1A regulates chromatin accessibility of the sex chromosomes and XY gene expression. Loss of ARID1A increases promoter accessibility of XY-linked genes, with a concomitant influx of RNA Pol II into the sex body and upregulation of XY-linked genes. This work suggests that ARID1A regulates the chromatin composition of the sex body, since in the absence of ARID1A, spermatocytes show less enrichment of H3.3 on the sex chromosomes and stable levels of the canonical histones H3.1/3.2. By overlapping CUT&RUN and ATAC-seq data, the authors show that changes in chromatin accessibility in the absence of ARID1A are associated with a redistribution of H3.3 occupancy. Chromatin that becomes accessible in mutants corresponds to increased H3.3 occupancy at the transcription start sites of ARID1A-regulated genes.

      Interestingly, ARID1A loss caused increased promoter occupancy by H3.3 in regions usually occupied by PRDM9. PRDM9 catalyzes histone H3 lysine 4 trimethylation during meiotic prophase I and positions double-strand break (DSB) hotspots. Lack of ARID1A reduces the occupancy of DMC1, a recombinase involved in DSB repair, in non-homologous regions of the sex chromosomes. These data suggest that ARID1A might indirectly influence DSB repair on the sex chromosomes by regulating the localization of H3.3. This is very interesting given the recently suggested role for ARID1A in genome instability in cancer cells. It raises the question of whether this role also extends to meiotic DSB repair on autosomes and/or how this mechanism differs between sex chromosomes and autosomes.

      The persistence of Arid1a transcripts that escape the Cre system in the Arid1a KO mouse model might complicate interpretation of the data. The phenotype of the Arid1a knockout is probably masked because many of the sequencing techniques used here are applied to a heterogeneous population of knockout and wild-type spermatocytes. In relation to this, I think that the term "pachytene arrest" might be an overstatement, since this is not the phenotype truly observed. Knockout mice produce sperm, and probably litters, although a full description of the subfertility phenotype is lacking, as is identification of the stage at which cell death occurs (e.g., by detection of apoptosis).

      It is clear from this work that ARID1A is part of the protein network that contributes to silencing of the sex chromosomes. However, it is challenging to understand the timing of the role of ARID1A in the context of the well-known DDR pathways that have been described for MSCI. Staining of chromosome spreads with an ARID1A antibody showed localization at the sex chromosomes by diplonema; however, gene expression in Arid1a knockouts was analyzed in pachytene spermatocytes. Therefore, it is not very clear how the chromatin remodeling activity of ARID1A in diplonema affects gene expression at an earlier stage. CUT&RUN showed that ARID1A is present at the sex chromatin at earlier stages, suggesting that immunofluorescence with the ARID1A antibody might not reflect the true localization of ARID1A.

    3. Reviewer #2 (Public Review):

      The authors tried to characterize the function of BAF, a member of the SWI/SNF remodeler family, in spermatogenesis. The authors focused on ARID1A, a BAF-specific putative DNA-binding subunit, based on gene expression profiles. The study has several serious issues with the data and their interpretation. The conditional deletion mouse model of ARID1A using Stra8-cre showed inefficient deletion; spermatogenesis did not appear to be severely compromised in the mutants. Based on these data, the authors claimed that meiotic arrest occurs in the mutants. This is obviously a misinterpretation. In the later parts, the authors performed next-generation analyses, including ATAC-seq and H3.3 CUT&RUN, using cells isolated from the mutant mice. However, with this inefficient deletion, most cells isolated from the mutant mice appeared not to have undergone Cre-mediated recombination. Therefore, these experiments do not support any conclusion pertinent to the Arid1a mutation. Furthermore, many of the later parts of this study focus on the analysis of H3.3 CUT&RUN. However, Fig. S7 clearly suggests that the H3.3 CUT&RUN experiment in the wild type simply failed. Thus, none of the analyses using the H3.3 CUT&RUN data can be interpreted. Overall, I found that the study does not have rigorous data and is not interpretable. If the authors wish to study the function of ARID1A in spermatogenesis, they may need to try other Cre lines to obtain more robust phenotypes, and all analyses must be redone using a mouse model with efficient deletion of ARID1A.

      In this revised manuscript, the authors did not make any effort to address my major criticisms, and I do not see any improvement. I found responses to only four points and no response to the other major and minor comments. I understand the challenge (~70% deletion efficiency in the mutants) in this study. However, the inefficient deletion of ARID1A in this mouse model does not allow any detailed quantitative analysis.

    4. Reviewer #3 (Public Review):

      In this manuscript, Magnuson and colleagues investigate the meiotic functions of ARID1A, a putative DNA-binding subunit of the SWI/SNF chromatin remodeler BAF. The authors develop a germ cell-specific conditional knockout (cKO) mouse model using Stra8-cre and observe that ARID1A-deficient cells fail to progress beyond pachytene, although, owing to the inefficiency of the Stra8-cre system, the mice retain ARID1A-expressing cells that yield sperm and allow fertility. Because ARID1A was found to accumulate at the XY body late in prophase I, the authors suspected a potential role in meiotic silencing, and by RNA-seq they observe significant misexpression of sex-linked genes that are typically silenced at pachytene. They go on to show that ARID1A is required for exclusion of RNA Pol II from the sex body and for limiting promoter accessibility at sex-linked genes, consistent with a meiotic sex chromosome inactivation (MSCI) defect in cKO mice. The authors proceed to investigate the impact of ARID1A on H3.3 deposition genome-wide. H3.3 is known to be regulated by ARID1A and is linked to silencing, and here the authors find that upon loss of ARID1A, overall H3.3 enrichment at the sex body, as measured by IF, failed to occur, but H3.3 was enriched specifically at the transcription start sites of sex-linked genes that are normally regulated by ARID1A. The results suggest that ARID1A normally prevents H3.3 accumulation at target promoters on the sex chromosomes and, based on additional data, restricts H3.3 to intergenic sites. Finally, the authors present data implicating ARID1A and H3.3 occupancy in DSB repair, finding that ARID1A cKO leads to a reduction in focus formation by DMC1, a key repair protein. Overall, the paper provides new insights into the process of MSCI from the perspective of chromatin composition and structure, and raises interesting new questions about the interplay between chromatin structure, meiotic silencing, and DNA repair.

      In general, the data are convincing. The conditional KO mouse model has some inherent limitations due to incomplete recombination and the existence of 'escaper' cells that express ARID1A and progress through meiosis normally. This reviewer feels that the authors have addressed this point thoroughly and have demonstrated clear and specific phenotypes using the best available animal model. The data demonstrate that the mutant cells fail to progress past pachytene, although it is unclear whether this specifically reflects pachytene arrest, as accumulation at other stages of prophase is also suggested by the data in Table 1. The western blot showing ARID1A expression in WT vs. cKO spermatocytes (Fig. S2) supports the cKO model but raises some questions. The blot shows many bands with lower intensity in the cKO, at molecular weights from 100 to 250 kDa. The text and accompanying figure legend provide limited information. Are the various bands with reduced expression different isoforms of ARID1A, or something else? What is the loading control 'NCL'? How was quantification done given the variation in signal across a large range of molecular weights?

      An additional weakness relates to how the authors describe the relationship between ARID1A and DNA damage response (DDR) signaling. The authors do not see defects in a few DDR markers in ARID1A cKO cells (including a low-resolution assessment of ATR), suggesting that ARID1A may not be required for meiotic DDR signaling. However, as previously noted, the data do not rule out the possibility that ARID1A acts downstream of DDR signaling, and the authors even indicate that "it is reasonable to hypothesize that DDR signaling might recruit BAF-A to the sex chromosomes." It is therefore difficult to understand why the authors continue to state that "...the mechanisms underlying ARID1A-mediated repression of the sex-linked transcription are mutually exclusive to DDR pathways regulating sex body formation" (p. 8) and that "BAF-A-mediated transcriptional repression of the sex chromosomes occurs independently of DDR signaling" (p. 16). The data provided do not justify these conclusions, as a role for DDR signaling upstream of ARID1A would mean that these mechanisms are neither mutually exclusive nor independent of one another.

      A final comment relates to the impact of ARID1A loss on DMC1 focus formation and the interesting observation of reduced sex chromosome association by DMC1. The authors additionally assess the related recombinase RAD51 and suggest that it is unaffected by ARID1A loss. However, only a single image of RAD51 staining in the cKO is provided (Fig. S11), and no associated quantitative data are provided. The data are suggestive, but it would be appropriate to add a qualifier to the conclusion regarding RAD51 in the discussion, which states that "...loss of ARID1a decreases DMC1 foci on the XY chromosomes without affecting RAD51", given that the RAD51 data provided are not rigorous. In the long term, it would also be interesting to quantitatively examine DMC1 and RAD51 focus formation on autosomes.

    1. eLife assessment

      This manuscript describes rigorous experiments that provide a wealth of virologic, respiratory physiology, and particle aerodynamic data pertaining to aerosol transmission of SARS-CoV-2 between infected Syrian hamsters. The significance of the paper is fundamental because infection is compared between alpha and delta variants, and because viral load is assessed via numerous assays (gRNA, sgRNA, TCID) and in tissues as well as the ambient environment of the cage. The strength of evidence is compelling.

    2. Reviewer #1 (Public Review):

      In the submitted manuscript, Port et al. investigated the host and viral factors influencing the airborne transmission of SARS-CoV-2 Alpha and Delta variants of concern (VOC) using a Syrian hamster model. The authors analyzed the viral load profiles of the animal respiratory tracts and air samples from cages by quantifying gRNA, sgRNA, and infectious virus titers. They also assessed the breathing patterns, exhaled aerosol aerodynamic profile, and size distribution of airborne particles after SARS-CoV-2 Alpha and Delta infections. The data showed that male sex was associated with increased viral replication and virus shedding in the air. The relationship between co-infection with VOCs and the exposure pattern/timeframe was also tested. This study appears to be an expansion of a previous report (Port et al., 2022, Nature Microbiology). The experimental designs were rigorous, and the data were solid. These results will contribute to the understanding of the roles of host and virus factors in the airborne transmission of SARS-CoV-2 VOCs.

    3. Reviewer #2 (Public Review):

      This manuscript by Port and colleagues describes rigorous experiments that provide a wealth of virologic, respiratory physiology, and particle aerodynamic data pertaining to aerosol transmission of SARS-CoV-2 between infected Syrian hamsters. The data are particularly significant because infection is compared between alpha and delta variants, and because viral load is assessed via numerous assays (gRNA, sgRNA, TCID) in tissues as well as in the ambient environment of the cage. The paper will be of interest to a broad range of scientists, including infectious disease physicians, virologists, immunologists, and potentially epidemiologists.

    1. Author Response

      eLife assessment

      The manuscript presents valuable evidence of temporal correlations during specific oscillatory activity between the prefrontal cortex, thalamic nucleus reuniens, and the hippocampus, in naturally sleeping animals. Such correlations represent solid evidence to support the notion that the thalamic nucleus reuniens participates in the hippocampal and prefrontal cortex dialogue subserving memory processes.

      Thank you for your assessment.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Basha and colleagues aim to test whether the thalamic nucleus reuniens can facilitate hippocampus/prefrontal cortex coupling during sleep. Considering the importance of sleep in memory consolidation, this study is important for understanding the functional interaction among these three key regions. This work suggests that the thalamic nucleus reuniens has a functional role in synchronizing the hippocampus and prefrontal cortex.

      Strengths:

      The authors performed recordings in naturally sleeping cats and analysed the correlations among the main oscillatory hallmarks of slow-wave sleep (slow waves, spindles, and hippocampal ripples) and the firing of reuniens neurons. They also combined these with intracellular recordings to assess reuniens-prefrontal connectivity, and with computational models of large networks, in which they determined that the coupling of oscillations is modulated by the strength of hippocampal-thalamic connections.

      Thank you for your positive evaluation.

      Weaknesses:

      The authors' main claim rests on the coupling of slow waves and spindles, which are recorded both in the prefrontal cortex and, surprisingly, in the reuniens. Slow waves and spindles are known to be generated in the cortex by cortico-thalamic mechanisms, and those recorded in the reuniens show no evidence of local generation; the reuniens is not anatomically equipped to generate such activities. Until shown otherwise, these oscillations recorded in the reuniens are most likely volume-conducted from nearby cortices. This caveat is therefore a major obstacle to analysing their correlation (in the time or frequency domain) with oscillations in other regions.

      1. We fully agree with the reviewer that the reuniens likely generates neither slow waves nor spindles. We do not make such a claim, as we clearly stated in the discussion (lines 319-324). We propose that reuniens neurons mediate different forms of activity. In the model, we introduced the MD nucleus only because without it we were unable to generate spindles. While slow waves and spindles are generated in other thalamocortical regions, REU neurons display these rhythms because of long-range projections from these regions to the REU, as shown in the model.

      2. Certainly, we cannot exclude some influence of volume conduction on the LFP recordings obtained in the REU nucleus. However, we show that spiking activity within the REU is modulated by spindles. Spike modulation cannot be explained by volume conduction but can be explained either by synaptic drive (likely the case here) or by intrinsic neuronal processes (such as the T-current).

      3. For spike identification in the REU, we used tetrode recordings. If slow waves and spindles were volume conducted, then the slow waves and spindles recorded on the different tetrode channels should have an identical shape. Following the reviewer's comment, we took these recordings and subtracted one channel from another. The difference in signal during slow waves is on the order of 0.1 mV. Considering that the distance between electrodes is on the order of 20 µm, such a voltage difference is large and can only be explained by local extracellular currents, likely due to synaptic activities originating in afferent structures.

      Finally, the choice of the animal model (cats) is not the best-suited one, as too few data, particularly anatomical data regarding reuniens connectivity, are available to support the functional results.

      1. The thalamus of most mammals (certainly primates and carnivores, including cats) contains local-circuit interneurons (about 30% of all neurons). The vast majority of studies in rodents (except in the LGN) report either the absence or an extremely low number of thalamic interneurons (e.g., Jager P, Moore G, Calpin P, et al. Dual midbrain and forebrain origins of thalamic inhibitory interneurons. eLife. 2021; 10: e59272). Therefore, studies in species other than rodents are necessary and bring new information that cannot be obtained in rodents.

      2. The cat brain is much larger than that of mice or rats; therefore, the effects of volume conduction from the cortex to the REU are much smaller, if not negligible. The distance between the REU and the closest cortical structure (ectosylvian gyrus) in cats is about 15 mm.

      3. Indeed, far fewer anatomical data are available for cats than for rodents. This is why we performed the experiments shown in Figure 1, which contains functional anatomy data. Antidromic responses show that the recorded structure projects to the stimulated structure. Orthodromic responses show that the stimulated structure projects to the recorded structure.

      Reviewer #2 (Public Review):

      Summary:

      The interplay between the medial prefrontal cortex and ventral hippocampal system is critical for many cognitive processes, including memory and its consolidation over time. A prominent idea in recent research is that this relationship is mediated at least in part by the midline nucleus reuniens, with respect to consolidation in particular. Whereas the bulk of evidence has focused on neuroanatomy and the effects of temporary or permanent lesions of the nucleus reuniens, the current work examined the electrophysiology of these three structures and how they interrelate, especially during sleep, which is anticipated to be critical for consolidation. They provide evidence from intracellular recordings of bidirectional functional connectivity among these structures. There is an emphasis on the interactions between these regions during sleep, especially slow-wave sleep. They provide evidence, in cats, that cortical slow waves precede reuniens slow waves and hippocampal sharp-wave ripples, which may reflect prefrontal control of the timing of thalamic and hippocampal events. They also find evidence that hippocampal sharp-wave ripples trigger thalamic firing and precede the onset of reuniens and medial prefrontal cortex spindles. The authors suggest that the effectiveness of bidirectional connections between the reuniens and the (ventral) CA1 is particularly strong during non-rapid eye movement sleep in the cat. This is a very interesting, complex study on a highly topical subject.

      Strengths:

      An excellent array of different electrophysiological techniques and analyses are conducted. The temporal relationships described are novel findings that suggest mechanisms behind the interactions between the key regions of interest. These may be of value for future experimental studies to test more directly their association with memory consolidation.

      We thank this reviewer for the very positive evaluation of our study.

      Weaknesses:

      Given the complexity and number of findings provided, clearer explanation(s) and an organisation that highlighted the specific value and importance of the different findings would improve the paper. Most readers may then find it easier to follow the specific relevance of key approaches and findings and their emphasis. For example, the fact that bidirectional connections exist in the model system is not new per se. How and why the specific findings add to the existing literature would have more impact if this information were addressed more directly in the written text and in the figure legends.

      Thank you for this comment. In the revised version, we will do our best to simplify the presentation and explain our findings more clearly.

    2. eLife assessment

      The manuscript presents valuable evidence of temporal correlations during specific oscillatory activity between the prefrontal cortex, thalamic nucleus reuniens, and the hippocampus, in naturally sleeping animals. Such correlations represent solid evidence to support the notion that the thalamic nucleus reuniens participates in the hippocampal and prefrontal cortex dialogue subserving memory processes.

    3. Reviewer #1 (Public Review):

      Summary:

      In this study, Basha and colleagues aim to test whether the thalamic nucleus reuniens can facilitate hippocampus/prefrontal cortex coupling during sleep. Considering the importance of sleep in memory consolidation, this study is important for understanding the functional interaction among these three key regions. This work suggests that the thalamic nucleus reuniens has a functional role in synchronizing the hippocampus and prefrontal cortex.

      Strengths:

      The authors performed recordings in naturally sleeping cats and analysed the correlations among the main oscillatory hallmarks of slow-wave sleep (slow waves, spindles, and hippocampal ripples) and the firing of reuniens neurons. They also combined these with intracellular recordings to assess reuniens-prefrontal connectivity, and with computational models of large networks, in which they determined that the coupling of oscillations is modulated by the strength of hippocampal-thalamic connections.

      Weaknesses:

      The authors' main claim rests on the coupling of slow waves and spindles, which are recorded both in the prefrontal cortex and, surprisingly, in the reuniens. Slow waves and spindles are known to be generated in the cortex by cortico-thalamic mechanisms, and those recorded in the reuniens show no evidence of local generation; the reuniens is not anatomically equipped to generate such activities. Until shown otherwise, these oscillations recorded in the reuniens are most likely volume-conducted from nearby cortices. This caveat is therefore a major obstacle to analysing their correlation (in the time or frequency domain) with oscillations in other regions.

      Finally, the choice of the animal model (cats) is not the best-suited one, as too few data, particularly anatomical data regarding reuniens connectivity, are available to support the functional results.

    4. Reviewer #2 (Public Review):

      Summary:

      The interplay between the medial prefrontal cortex and ventral hippocampal system is critical for many cognitive processes, including memory and its consolidation over time. A prominent idea in recent research is that this relationship is mediated at least in part by the midline nucleus reuniens, with respect to consolidation in particular. Whereas the bulk of evidence has focused on neuroanatomy and the effects of temporary or permanent lesions of the nucleus reuniens, the current work examined the electrophysiology of these three structures and how they interrelate, especially during sleep, which is anticipated to be critical for consolidation. They provide evidence from intracellular recordings of bidirectional functional connectivity among these structures. There is an emphasis on the interactions between these regions during sleep, especially slow-wave sleep. They provide evidence, in cats, that cortical slow waves precede reuniens slow waves and hippocampal sharp-wave ripples, which may reflect prefrontal control of the timing of thalamic and hippocampal events. They also find evidence that hippocampal sharp-wave ripples trigger thalamic firing and precede the onset of reuniens and medial prefrontal cortex spindles. The authors suggest that the effectiveness of bidirectional connections between the reuniens and the (ventral) CA1 is particularly strong during non-rapid eye movement sleep in the cat. This is a very interesting, complex study on a highly topical subject.

      Strengths:

      An excellent array of different electrophysiological techniques and analyses are conducted. The temporal relationships described are novel findings that suggest mechanisms behind the interactions between the key regions of interest. These may be of value for future experimental studies to test more directly their association with memory consolidation.

      Weaknesses:

      Given the complexity and number of findings provided, a clearer explanation and organisation that highlighted the specific value and importance of the different findings would improve the paper. Most readers would then find it easier to follow the relevance of key approaches and findings and their emphasis. For example, the existence of bidirectional connections in the model system is not new per se. The paper would have more impact if the written text and figure legends addressed more directly how and why the specific findings add to the existing literature.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Activity has effects on the development of neural circuitry during almost any step of differentiation. In particular during specific time periods of circuit development, so-called critical periods (CP), altered neural activity can induce permanent changes in network excitability. In complex neural networks, it is often difficult to pinpoint the specific network components that are permanently altered by activity, and it often remains unclear how activity is integrated during the CP to set mature network excitability. This study combines electrophysiology with pharmacological and optogenetic manipulation in the Drosophila genetic model system to pinpoint the neural substrate that is influenced by altered activity during a critical period (CP) of larval locomotor circuit development. Moreover, it is then tested whether and how different manipulations of synaptic input are integrated during the CP to tune network excitability.

      Strengths:

      Based on previous work, during the CP, network activity is increased by feeding the GABA-AR antagonist PTX. This results in permanent network activity changes, as highly convincingly assayed by a prolonged recovery period following induced seizure and by altered intersegmental locomotor network coordination. This is then used to provide two important findings: First, compelling electro- and optophysiological experiments track the site of network change down to the level of single neurons and pre- versus postsynaptic specializations. In short, increased activity during the CP increases both the magnitude of excitatory and inhibitory synaptic transmission to the aCC motoneuron, but excitation is affected more strongly. This results in altered excitation inhibition ratios. Fine electrophysiology shows that excitatory synapse strengthening occurs postsynaptically. High-quality anatomy shows that dendrite size and numbers of synaptic contacts remain unaltered. It is a major accomplishment to track the tuning of network excitability during the CP down to the physiology of specific synapses to identified neurons.

      Second, additional experiments with single neuron resolution demonstrate that during the CP different forms of activity manipulation are integrated so that opposing manipulations can rescue altered setpoints. This provides novel insight into how developing neural network excitability is tuned, and it indicates that during the CP, training can rescue the effects of hyperactivity.

      Weaknesses:

      There are no major weaknesses to the findings presented, but the molecular cause that underlies increased motoneuron postsynaptic responsiveness as well as the mechanism that integrates different forms of activity during the CP remain unknown. It is clear that addressing these experimentally is beyond the scope of this study, but some discussion about different candidates would be helpful.

      We discuss likely mechanisms that underpin the increase in postsynaptic responsiveness below (see Reviewer #1 (Recommendations For The Authors), point 2). To address possible mechanisms that integrate different forms of activity, we now include a new paragraph in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors use the tractable Drosophila embryonic/larval motor circuit to determine how manipulations of activity during a critical period (CP) modify the circuit in ways that persist into later developmental stages. Previously, this group demonstrated that manipulations to the aCC/MN-Ib neuron in embryonic stages enhance (or can rescue) susceptibility to seizures at later larval stages. Here, the authors demonstrate that following enhanced excitatory drive (by PTX feeding), the aCC neuron acquires increased sensitivity to cholinergic excitatory transmission, presumably due to increased postsynaptic receptor abundance and/or sensitivity, although this is not clarified. Although locomotion is not altered at later developmental larval stages, the authors suggest there is reduced "robustness" to induced seizures. The second part of the study then goes on to enhance inhibition during the CP in an attempt to counteract the enhanced excitation, and show that many aspects of the CP plasticity are rescued. The authors conclude that "average" E/I activity is integrated during the CP to determine the excitability of the mature locomotor network.

      Overall, this study provides compelling mechanistic insight into how a final motor output neuron changes in response to enhanced excitatory drive during a CP to change the functionality of the circuit at later mature developmental stages. The first part of this study is strong, clearly showing the changes in the aCC neuron that result from enhanced excitatory input. This includes very nice electrophysiology and imaging data that assess synaptic function and structure onto aCC neurons from pre-motor inputs resulting from PTX exposure during development. However, the later experiments in Figures 6 and 7 designed to counteract the CP plasticity are somewhat difficult to interpret. In particular, the specificity of the manipulations of the ch neuron intended to counteract the CP plasticity is unclear, given the complexities of how these changes impact the excitability of all neurons during development. It is clear that CP plasticity is largely rescued in later stages, but it is hard to know if downstream or secondary adaptations may be masking the PTX-induced plasticity normally observed. Nonetheless, this study provides an important advance in our understanding of what parameters change during CPs to calibrate network dynamics at later developmental stages.

      Reviewer #3 (Public Review):

      Summary:

      In Hunter, Coulson et al, the authors seek to expand our understanding of how neural activity during developmental critical periods might control the function of the nervous system later in life. To achieve increased excitation, the authors build on their previous results and apply picrotoxin 17-19 hours after egg-laying, which is a critical period of nervous system development. This early enhancement of excitation leads to multiple effects in third-instar larvae, including prolonged recovery from electroshock, increased synchronization of motor neuron networks, and increased AP firing frequency. Using optogenetics and whole-cell patch clamp electrophysiology, the authors elegantly show that picrotoxin-induced over-excitation leads to increased strength of excitatory inputs and not loss of inhibitory inputs. To enhance inhibition, the authors chose an approach that involved the stimulation of mechanosensory neurons; this counteracts picrotoxin-induced signs of increased excitation. This approach to enhancing inhibition requires further control experiments and validation.

      Strengths:

      • The authors confirm their previous results and show that 17-19 hours after egg laying is a critical period of nervous system development.

      • Using Ca2+/Sr2+ substitutions, the authors demonstrate that synaptic connections between A18a and aCC show increased mEPSP amplitudes. The authors show that this aCC input is what drives enhanced excitation.

      • The authors demonstrate that the effects of over-excitation attributed to picrotoxin exposure are generalizable and also occur in bss mutant flies.

      Weaknesses:

      • The authors build on their previous work and argue that the critical period (17-19h after egg-laying) is a uniquely sensitive period of development. Have the authors already demonstrated that exposure to picrotoxin at L1 or L2 (and even early L3 if experimentally possible) does not lead to changes in induced seizure at L3? This would support the authors' hypothesis of the uniqueness of the 17-19h AEL period. If this has already been established in prior publications, then this needs to be further explained. I do note that Fig 2E in Giachello and Baines (2015) shows the identification of the 17-19h window.

      This is a pertinent comment. We now have evidence that activity manipulation (in this instance by increasing temperature, which recapitulates the effect of PTX) is not effective at larval stages (L1 to L3) but remains effective between 17-19hrs AEL. This observation forms part of a separate study where we explore the role of circadian activity on embryonic and larval neuronal development. We include a brief statement to address this comment in the revision (first paragraph of Results).

      • Regarding experiments in Fig 2, authors only report changes in AP firing frequency. Can the authors also report other metrics of excitability, including measures of intrinsic excitability with and without picrotoxin exposure (including RMP, Rm)? Was a different amount of current injection needed to evoke stable 5-10 Hz firing with and without picrotoxin? In the representative figure (Fig. 2A), it appears that the baseline firing frequencies are different prior to optogenetic stimulation.

      No differences in RM, Rin or capacitance were observed due to PTX. This is now included in the revision along with an explanation that different levels of current injection were used to measure effects to excitatory vs inhibitory synaptic drive. We did not specifically monitor the amount of current required to maintain stable firing.

      • The ch-related experiments require further controls and explanation. Regarding experiments in Fig 6, what is the effect of ch neuron stimulation alone on time lag and AP frequency? Can the authors further clarify what is known about connections between aCC and ch neurons? It is difficult for this reviewer to conceptualize how enhancing ch-mediated inhibition would worsen seizures. While the cited study (Carreira-Rosario et al 2021) convincingly shows that inhibition of mechanosensory input leads to excessive spontaneous network activity, has it been shown that the converse - stimulation of ch neurons - indeed enhances network inhibition?

      • The interpretation of ch-related experiments is further complicated by the explanation in the Discussion that ch neuron stimulation depolarizes aCC neurons; this seems to undercut the authors' previous explanation that the increased E:I ratio is corrected by enhanced inhibition from ch neurons. The idea that ch neurons are placing neurons in a depolarized refractory state is not substantiated by data in the paper or citations.

      To respond to these two points combined: the reviewer is correct in stating that additional experiments will be required to fully understand the mechanism. We believe that cholinergic (excitatory) chordotonal input to aCC may be an important component for setting the rhythm of the locomotor CPG. Indeed, it may be that CPG rhythm is a key factor during the CP. Our observations suggest that optogenetic stimulation of Ch neurons alone is sufficient to induce large (~400 pA) currents that resemble endogenous spontaneous rhythmic currents (SRCs) associated with CPG activity. SRCs occur with a characteristic frequency of ~1Hz, and we have some unpublished data suggesting that it is possible to change this frequency using ch stimulation. These data therefore unify prior work (the description of a brake by Carreira-Rosario et al., 2021) with our own (the observation that ch neurons depolarize aCC). However, we do not include this speculation in the Discussion because the experiments we have conducted were pilots. They may be expanded upon and included in future work.

      • In the Discussion, the authors suggest that enhanced proprioception leading to seizures is reminiscent of neurological conditions. This seems to be an oversimplification. Connecting abnormal proprioception to seizures is quite different from connecting abnormal proprioception to disorders of coordination. This should be revised.

      Because this is peripheral to our main study, we have deleted this from the revision.

      Reviewer #1 (Recommendations For The Authors):

      1. Although the authors have to be commended for the scrutiny with which they pinpoint a site of circuit change, it cannot be excluded that other parts of the circuit also undergo adjustments in response to activity manipulation during the CP, e.g. the membrane properties of the interneurons. This is not a problem but should be discussed.

      We agree with this comment and have added the following text to the discussion……’However, we recognise that other parts of the locomotor network may also undergo change due to CP manipulation. The advantage of this system is that most of these elements are now open to specific manipulation through cell-specific genetic drivers’. (Discussion paragraph 3)

      2. It is surprising that there is no discussion of the potential molecular cause for the observed increases in postsynaptic responses to SV release from cholinergic neurons. Given that there are no differences in postsynaptic structure, puncta number etc., the subunit composition of the nAChR seems an obvious guess. What is known about the nAChR subunit composition on aCC, and when during development do the receptors/different subunits become expressed? A paragraph in the discussion on this issue would be highly relevant to the manuscript.

      Our own work (unpublished), together with a recent paper from the Littleton lab (https://www.sciencedirect.com/science/article/pii/S0896627323005810?via%3Dihub#mmc2), suggests that aCC expresses the majority, if not all, of the 7 alpha and 3 beta subunits that comprise nAChRs. The situation is further complicated by the fact that these receptors are pentameric and are composed of various subunits, the composition significantly altering channel kinetics. Less is known about expression timelines for each receptor subunit, and certainly not in aCC. We already include the following sentence in the results text……’ A change in the frequency of mini excitatory postsynaptic potentials (mEPSPs, a.k.a. minis) would suggest the adaptation is primarily presynaptic (e.g. increased probability of release), whilst a change in distribution and/or amplitude of minis is more consistent with a mechanism acting postsynaptically (e.g. increased or altered receptor subunits).’ Given that we know next to nothing about the nAChR subunit composition in aCC and how this might change due to CP manipulation, we feel it better not to speculate further. To help the reader, we include the following sentence in the discussion……’The precise mechanism contributing to increased mini amplitude remains to be determined, but a plausible scenario may involve change in cholinergic subunit composition.’ (Discussion paragraph 3)

      3. It would be important to provide the p-values for Figures 1B and C, especially because it seems that the inhibition also becomes stronger upon PTX treatment during the CP. There is no statistical testing mentioned; was no test done, or was it not significant? It is agreed that the effect size is clearly stronger for the increased excitation than for the increased inhibition, but looking at the data suggests that the effect on excitation is not much more significant than the effect on inhibition.

      The reviewer is referring to Fig 2B&C. P values have been added to both main text and to the figure legend.

      4. Associated with the point above, in the discussion (line 407 and below) the authors come back to this point and reason that it is surprising that increased excitation is not compensated for by homeostatic mechanisms. It is concluded that homeostatic compensation brings the system back to a setpoint that is defined during the critical period, but the setpoint is set higher in this case. However, an alternative explanation is that GABA administration during the critical period causes the excitation setpoint to be too high, but this is then partially counteracted in a homeostatic manner by increasing inhibition. If the p-values in Figures 2B and C are rather similar, this might even be the favorable interpretation.

      We believe the reviewer means ‘PTX administration’ and not GABA. This is an interesting idea and one we had not really considered. We address this comment by adding the following text………. ‘Alternatively, whilst the increased inhibition we observe is not statistically significant (p = 0.15), it is close and has a medium effect size (Cohen’s d = 0.78), and thus may be indicative of an attempt by the locomotor network to rebalance activity back towards a genetically pre-determined level. In this regard, it may just not have sufficient range to be able to counter the increase in excitation due to CP manipulation.’ (Discussion paragraph 5)

      5. To assess the magnitudes of A18a-mediated excitation and A31k-mediated inhibition to aCC, changes in aCC firing frequency were measured. For this, aCC was injected with current to make it fire at all. However, the current injections were chosen to cause firing at 5-10 Hz. During a crawling burst, aCC fires well above 100Hz (Kadas et al., 2017). Are the effects also visible at such firing frequencies, or at least across different firing frequencies? I am not asking for additional experiments, but maybe the data are there and can be referred to?

      Spiking in aCC occurs as burst firing, evoked by cholinergic synaptic drive, that lasts for ~300ms and achieves firing frequencies of between 50 and 100Hz (Kadas et al., 2017 and our own unpublished data). We did not test for effects on excitation or inhibition at these higher frequencies. We now make this explicit in the discussion by adding the following sentence……’The firing frequencies that we imposed (1-10Hz) are also lower than seen during fictive locomotion (Kadas et al., 2017), which shows burst firing lasting for ~300 ms and achieving spike frequencies of up to 100Hz.’ (Discussion paragraph 3)

      6. In Figure 3B some minis are marked by green arrows and others are not. Were the non-marked ones not included in the analysis, and what were the criteria for marking some and not others? This is particularly important because the cumulative distribution of minis is analyzed in Figure 3D, and this depends crucially on what qualifies as a mini and what does not.

      All minis are marked by green arrows. The events not marked are not minis. Drosophila neurons are small and have an unfavourable dendritic structure for recording minis. Thus, we carefully analyse traces by eye, taking only events that show very rapid rise times and slower, exponential decay (the typical mini shape). There are, however, other events which are most likely single/multiple channel openings, which due to filtering are rounded. We now include this same trace, greatly expanded, as Fig S1D to show how we distinguished minis from non-minis.

      7. The asynchronous release experiment under Sr2+ seems an elegant way to analyze minis upon optogenetic stimulation of an identified presynaptic cholinergic neuron. I suggest being a little more conservative with the term asynchronous release (or replacing it), which usually refers to the release of many single vesicles following AP-mediated synaptic transmission and has nicely been demonstrated at the Drosophila NMJ (Besse et al., 2007). Also, please show the trace in Figure S2A under Sr2+ at a higher pA magnification; it is really hard to see the minis there.

      We have adopted a previously published technique that, in our view, correctly uses the term ‘asynchronous release’. This is not to say that all asynchronous release occurs via the same mechanism. Indeed, the papers that report the technique we use predate Besse 2007. We also expand the trace in Fig S1A (not S2A as wrongly indicated).

      Reviewer #2 (Recommendations For The Authors):

      1. Can the authors explain what they think is the parameter of "activity" being measured in the locomotor circuit (mainly aCC) during the CP? Is the aCC neuron simply summing (perhaps through a proxy like Ca2+) total excitation/inhibition over time during the CP?

      Reviewer #1 also requests that we discuss how activity is ‘measured’ and thus we now include a dedicated paragraph in the discussion to address this concern. Whether aCC sums ‘average’ activity or perhaps is influenced by activity extremes remains uncertain. Our data is consistent with the former but further work is required to validate our conclusion. This work will be published in due course.

      Related to understanding this concept, could the authors silence activity (using Kir2.1, TNT, or BoNT) from each of the monosynaptic premotor inputs in otherwise wildtype animals and following PTX exposure, to determine how the circuit responds when each of the monosynaptic inputs is silenced? This might inform the role they play in instructing how activity is measured over time during the CP.

      This is an excellent suggestion and, indeed, we have planned such experiments. Silencing specific neurons, whilst manipulating the CP, may well result in more significant network instability due to the setting of multiple (and physiologically inappropriate) homeostatic set points. Such studies go beyond the scope of the present study and thus we prefer not to speculate at this early stage, but to wait for experimental data.

      On a related note, the authors focus on just 2 premotor inputs, presumably due to the availability of specific drivers. But do the authors know how many other inputs (other ACh, Gaba, and glutamate) onto aCC there are, and to what extent do the authors think these are changed in similar or distinct ways? Is it implied that all neurons are similarly altered by the manipulations?

      The connectome details the number and types of neurons that directly contact the aCC motoneuron (Zarin et al., 2019). In terms of cholinergic excitors, the results presented in Figure 3 suggest that most (all?) inputs are strengthened following embryonic PTX exposure. However, to conclude this would be highly speculative and thus we refrain from doing so in the manuscript. As other single-neuron driver lines become available, such experiments will hopefully be possible.

      2. If PTX treatment does indeed increase CPG synchronicity, shouldn't there be a readout of this effect on larval locomotion? While the speed of locomotion wasn't significantly impacted, perhaps another parameter was altered.

      It is quite possible that other aspects of locomotion are being altered (turning, rearing, etc), but we have not analysed for these more subtle behaviours. Indeed, although not statistically significant, there is a modest reduction in average velocity in larvae derived from PTX-exposed embryos. We see similar reductions in characterised seizure mutants which also show increased synchronicity (Streit et al., 2016).

      3. In Figure 2 and elsewhere, what is the baseline level of AP firing rate in each aCC neuron before optogenetic stimulation? Is this informative about how PTX exposure alters excitability to begin with, perhaps by changing intrinsic excitability?

      We now include this data in the relevant results section. Interestingly, following exposure to PTX, basal firing was significantly increased in A18a (excitatory premotor) but not in A31k (inhibitory premotor). This reflects our experiment in which we conclude that excitatory drive to aCC is increased relative to inhibitory synaptic drive. Thus, this measure seemingly validates our conclusion that E:I balance has been altered following activity-manipulation during the CP.

      4. Figure 3: The apparent increase in mini amplitude is very small (4.1 vs 4.5 pA); is this physiologically meaningful? Although the authors say the decrease in mini frequency is not significant in Fig. 3B after PTX, it does appear rather large, a 40% reduction (5 vs 3 Hz).

      We must be guided by statistics in drawing conclusions, but the reader can interpret our data as they wish. Minis measure quantal release and thus to appreciate how small change can, when combined over the many receptors present, influence cell physiology, one needs to compare spiking activity. We show in Fig 2 that such change is sufficient to increase the excitatory synaptic drive provided by the A18a neuron. The seemingly larger reduction in mini frequency is intriguing and may reflect additional change, but without further experiments we cannot draw firm conclusions.

      5. The clever vibration assay is a good one to induce the activation of mechanosensory neurons, but the specificity of the changes induced by this is difficult to ascertain. One possibility would be to silence the output of the ch neurons (by expression of tetanus or botulinum toxin) and still put the larvae through the same vibration during the CP to see if the rescue is lost.

      We agree that further experiments are required to fully understand the underlying mechanism(s). However, we will not be able to complete such follow-on experiments in a timely manner and thus, these must wait and form the basis of future studies.

      Minor points:

      1. Typos: there are numerous areas where a comma seems to be used inappropriately (e.g. lines 28, 69, 77, 104, 348, 365, etc). I suggest line editing the final "version of record".

      Checked and corrected.

      2. It would be of benefit to show the genotypes of the larvae in the various experimental manipulations in the relevant figure legends. This reviewer could not follow exactly how each experiment was done, as it was not always clear which driver was being used to express which transgene in what genetic background.

      Done

      Reviewer #3 (Recommendations For The Authors):

      • Please provide sample videos of electroshock-induced seizures (e.g. Fig 1B). Is it clear that the period of immobility after electroshock is a seizure (perhaps defined as hyperactivity originating from the brain)? I acknowledge the Baines group is quite skilled in this technique and perhaps there is a straightforward answer or citation to include.

      We refer the reader to Marley and Baines 2011 which contains videos of seizure activity (first paragraph of Results).

      • Seizures are generated in the brain and travel to the periphery. Do the authors think it is possible that the peripheral manipulations in this manuscript might be controlling the behavioral readout of seizures without affecting hypersynchronous activity in the brain?

      We include the following statement (in methods) to provide our best understanding for how peripheral electroshock induces seizure………. ‘Strong peripheral stimulation likely causes excessive and synchronous synaptic excitation within the CNS resulting in seizure. However, the precise mechanism of this effect remains to be determined.’ Moreover, we feel it unlikely that manipulation of Ch neurons, by vibration, would suppress the effects we observe via peripheral mechanisms. Indeed, the Ch manipulation is limited to the embryonic CP, whilst our seizure assays are recorded many days later at L3.

      • How might enhancement of inhibition lead to worsened seizures? Is the enhancement of ch-related inhibition selectively affecting inhibitory circuits, thereby leading to a net increase in excitation?

      This is a difficult point to respond to at present. Enhanced inhibition per se might similarly disturb the encoding of an appropriate homeostatic setpoint(s) thus leaving a network open to being destabilized by a strong stimulus. Indeed, we have previously shown that increased inhibition during the CP results in the same effect (seizure) as increasing excitation (Giachello and Baines, 2015). Thus, presuming activation of Ch neurons during the CP translates to increased inhibition, then worsened seizure behaviour is a predictable effect. How this is achieved remains unknown and we prefer not to speculate here.

    2. eLife assessment

      This valuable study combines electrophysiology and neuroanatomy with pharmacological and optogenetic manipulation in the Drosophila genetic model system to pinpoint the neural substrate that is influenced by altered activity during a critical period (CP) of larval locomotor circuit development. Increasing activity during the CP causes permanent network changes, manifesting in increased recovery times from seizures and altered intersegmental coordination during locomotion, thus indicating that a setpoint of network excitability is determined during the CP. Next, compelling experiments demonstrate that this is accompanied by increased excitation/inhibition ratios onto single identified motoneurons and, most importantly, that for excitability setpoint determination during the CP, excitatory and inhibitory inputs are integrated such that the effect of CP hyperexcitation is rescued by the stimulation of endogenous inhibitory inputs to the motoneurons. This provides novel insight into how developing neural network excitability is tuned and how it can be entrained during the CP.

    3. Reviewer #1 (Public Review):

      Activity has effects on the development of neural circuitry during almost any step of neuronal differentiation. In particular, during specific time periods of circuit development, so-called critical periods (CP), altered neural activity can induce permanent changes in neuronal and network excitability. In complex neural networks, it is often difficult to pinpoint the specific network components that are permanently altered by activity, and it often remains unclear how activity is integrated during the CP to set mature network excitability. This study combines electrophysiology with pharmacological and optogenetic manipulation in the Drosophila genetic model system to pinpoint the neural substrate that is influenced by altered activity during a critical period (CP) of larval locomotor circuit development. Moreover, it is then tested whether and how different manipulations of synaptic input are integrated during the CP to tune network excitability.

      Strengths: Based on previous work, during the CP, network activity is increased by feeding the GABA-AR antagonist PTX. This results in permanent network activity changes, as highly convincingly assayed by a prolonged recovery period following induced seizure and by altered intersegmental locomotor network coordination. This is then used to provide two important findings: First, compelling electro- and optophysiological as well as anatomical experiments track the site of network change down to the level of single neurons and pre- versus postsynaptic specializations. In short, increased activity during the CP increases the magnitude of both excitatory and inhibitory synaptic transmission to the aCC motoneuron, but excitation is affected more strongly. This results in altered excitation inhibition ratios. Fine electrophysiology shows that excitatory synapse strengthening occurs postsynaptically. High-quality anatomy shows that dendrite size and numbers of synaptic contacts remain unaltered. It is a major accomplishment to track the tuning of network excitability during the CP down to the physiology of specific synapses at identified neurons.

      Second, additional experiments with single neuron resolution demonstrate that during the CP different forms of activity manipulation are integrated so that opposing manipulations can rescue altered setpoints. This provides novel insight into how developing neural network excitability is tuned, and it indicates that during the CP, training can rescue the effects of hyperactivity.

      Weaknesses: There are no major weaknesses to the findings presented, but the molecular cause that underlies increased motoneuron postsynaptic responsiveness as well as the mechanism that integrates different forms of activity during the CP remain unknown. However, the discussion addresses this point adequately.

    4. Reviewer #2 (Public Review):

      SUMMARY: In this study, the authors use the tractable Drosophila embryonic/larval motor circuit to determine how manipulations of activity during a critical period (CP) modify the circuit in ways that persist into later developmental stages. Previously, this group demonstrated that manipulations of the aCC/MN-Ib neuron in embryonic stages enhance (or can rescue) susceptibility to seizures at later larval stages. Here, the authors demonstrate that following enhanced excitatory drive (by PTX feeding), the aCC neuron acquires increased sensitivity to cholinergic excitatory transmission, presumably due to increased postsynaptic receptor abundance and/or sensitivity, although this is not clarified. Although locomotion is not altered at later larval stages, the authors suggest there is reduced "robustness" to induced seizures. The second part of the study then enhances inhibition during the CP in an attempt to counteract the enhanced excitation, and shows that many aspects of the CP plasticity are rescued. The authors conclude that "average" E/I activity is integrated during the CP to determine excitability of the mature locomotor network.

      Overall, this study provides compelling mechanistic insight into how a final motor output neuron changes in response to enhanced excitatory drive during a CP to change the functionality of the circuit at later, mature developmental stages. The first part of this study is strong, clearly showing the changes in the aCC neuron that result from enhanced excitatory input. This includes very nice electrophysiology and imaging data that assess the function and structure of synapses from pre-motor inputs onto aCC neurons following PTX exposure during development. However, the later experiments in Figures 6 and 7, designed to counteract the CP plasticity, are somewhat difficult to interpret. In particular, the specificity of the manipulations of the chordotonal (ch) neurons intended to counteract the CP plasticity is unclear, given the complexities of how these changes impact the excitability of all neurons during development. It is clear that CP plasticity is largely rescued in later stages, but it is hard to know whether downstream or secondary adaptations may be masking the PTX-induced plasticity normally observed. Nonetheless, this study provides an important advance in our understanding of which parameters change during CPs to calibrate network dynamics at later developmental stages.

    5. Reviewer #3 (Public Review):

      Summary:
      In Hunter, Coulson et al., the authors seek to expand our understanding of how neural activity during developmental critical periods might control the function of the nervous system later in life. To increase excitation, the authors build on their previous results and apply picrotoxin 17-19 hours after egg laying, which is a critical period of nervous system development. This early enhancement of excitation has multiple effects in third-instar larvae, including prolonged recovery from electroshock, increased synchronization of motor neuron networks, and increased AP firing frequency. Using optogenetics and whole-cell patch-clamp electrophysiology, the authors elegantly show that picrotoxin-induced over-excitation leads to increased strength of excitatory inputs, not loss of inhibitory inputs. To enhance inhibition, the authors chose an approach that involves stimulation of mechanosensory neurons; this counteracts the picrotoxin-induced signs of increased excitation. This approach to enhancing inhibition requires further validation.

      Strengths:
      • The authors confirm their previous results and show that 17-19 hours after egg laying is a critical period of nervous system development.
      • Using Ca2+/Sr2+ substitution, the authors demonstrate that synaptic connections between A18a and aCC show increased mEPSP amplitudes. The authors show that this input to aCC drives the enhanced excitation.
      • The authors demonstrate that the effects of over-excitation attributed to picrotoxin exposure generalize and also occur in bss mutant flies.

      Weaknesses:
      • The authors build on their previous work and argue that the critical period (17-19 h after egg laying) is a uniquely sensitive period of development. Establishing the developmental window of the critical period is important for the present study, which would benefit from demonstrating that exposure to picrotoxin at L1 or L2 does not lead to changes in induced seizure at L3. This would strengthen the authors' hypothesis that the 17-19 h AEL period is critical.
      • The ch-related experiments require further controls and explanation. For the experiments in Fig 6, what is the effect of ch neuron stimulation alone on time lag and AP frequency? The authors report that related pilot experiments have been performed; the present study would be strengthened by including these data.

    1. Reviewer #2 (Public Review):

      Summary:
      A bidirectional occasion-setting design is used to examine sex differences in the contextual modulation of reward-related behaviour. It is shown that females are slower to acquire contextual control over cue-evoked reward seeking. However, once established, the contextual control over behaviour was more robust in female rats (i.e., less within-session variability and greater resistance to stress) and this was also associated with increased OFC activation.

      Strengths:
      The authors use sophisticated behavioural paradigms to study the hierarchical contextual modulation of behaviour. The behavioural controls are particularly impressive and do, to some extent, support the specificity of the conclusions. The analyses of the behavioural data are also elegant, thoughtful, and rigorous.

      Weaknesses:
      My primary concern is that the authors' claim of sex differences in context-dependent discrimination behaviour is not fully supported by their data.

      First, the basic behavioural effect does not seem to replicate across experiments. The authors first show sex differences in the percentage of time in the food port and in the discrimination ratio (Figures 1 and 2), such that males show better context-dependent discrimination than females (group ctx-dep O1). However, this difference is not observed in the baseline condition group of the next experiment, which investigates the effect of acute stress on context-gated reward seeking: "In Figure 4, we observe no difference between males versus females in group 'ctx-dep O1'."

      Second, I am not fully convinced by the authors' assertion that the results are specific to the contextual modulation process. The authors' main conclusions are derived from comparing a group trained with the differential outcome procedure (group ctx-dep O1/O2) and a group trained with the non-differential outcome procedure (group ctx-dep O1). Importantly, however, a different number of training sessions was used for ctx-dep O1/O2 and ctx-dep O1. Is it not possible that sex differences could have emerged with additional training in the ctx-dep O1/O2 group? Moreover, the authors also seem to assume that rats are not using a contextual strategy in the ctx-dep O1/O2 condition (i.e., that rats instead use distinct context-outcome associations), but what is the evidence for this? Also, the authors argue that the impact of stress is specific to the hierarchical contextual modulation of behaviour; however, inspection of Figure 4A suggests that there may also be an effect of stress on the ctx-dep O1/O2 group.

      I also had some minor issues with how the authors interpreted some of the findings. First, it is shown that recent rewards disrupt contextual control of reward seeking in male, but not female, rats. That is, in males, prior reward increased the probability of responding on subsequent non-rewarded trials but trial history had no effect in females. How do the authors reconcile this finding with the quicker acquisition and better discrimination that is observed in males? It is not evident to me how males can have difficulty inhibiting responding to non-rewarded cues following recent reward yet still show better discrimination throughout training.

      Finally, the authors argue that the contextual control over behaviour was more robust in female rats as females show less within-session variability and greater resistance to stress. What evidence is there that the restraint stress procedure causes a similar stress response in both sexes?

    2. eLife assessment

      This valuable manuscript reveals sex differences in biconditional Pavlovian learning and conditional behavior. Males learn hierarchical context-cue-outcome associations more quickly, but females show more stable and robust task performance. These sex differences are related to cellular activation in the orbitofrontal cortex. Although the evidence for most claims is solid, the claim of sex differences in context-dependent discrimination behaviour is not fully supported by the data. Nevertheless, the results will be of interest to many behavioural neuroscientists, particularly those who investigate sex-specific behaviours.

    3. Reviewer #1 (Public Review):

      Summary:
      Peterson et al. present a series of experiments in which the Pavlovian performance (i.e., time spent at a food cup/port) of male and female rats is assessed in various tasks in which context/cue/outcome relationships are altered. The authors find no sex differences in context-irrelevant tasks, nor in tasks in which the context signals that different cues will earn different outcomes. They do find sex differences, however, when a single outcome is given and contextual cues must be used to ascertain which cue will be rewarded with that outcome (Ctx-dep O1 task). Specifically, they found that males acquired the task faster, but that once the task was acquired, performance was more resilient to a stressor in female rats. Finally, they show that these sex differences are reflected in differential rates of c-fos expression in all three subregions of the rat OFC (medial, lateral, and ventral): expression is higher in females than in males, and only in the animals trained on the Ctx-dep O1 task, in which sex differences were observed.

      Strengths:
      • Well-written.
      • Experiments elegantly designed.
      • Robust statistics.
      • Behaviour is the main feature of this manuscript, rather than any flashy techniques or fashionable lab methodologies, and luckily the behaviour is done really well.
      • For the most part, I think the conclusions were well supported, although I do have slightly different interpretations from the authors in places.

      Weaknesses:
      1. With regard to the claim (page 4 of pdf) "Only Ctx-dep.O1 engages context-gated reward predictions", I think I can see what the authors are getting at: the same reward is available in each context, and the animal must use contextual information to determine which cue will be rewarded. In other words, the context has a discriminative purpose. In Ctx-dep.O1/O2, however, although the context does not serve a discriminative purpose, in the sense that one cue will always earn a unique outcome regardless of context, the fact that these cues are differentially rewarded in the different contexts means that animals may well form context-gated cue-outcome associations (e.g. CtxA-(CS1-O1), CtxnoA-(CS2-O2)). Moreover, the context is informative in this group in telling the animal which cue will be rewarded, even prior to outcome delivery, so I don't think contextual information will fade into the background of the association and lose attention in the way that, say, Mackintosh (1975) might predict. Therefore, I don't think this statement is correct.

      2. I think the results shown in Figure 1 are very interesting, and well supported by the statistics. It's so nice to see a significant interaction, as so many papers try to report these types of effects without it. However, I do wonder how specific the results are to contextual modulation. That is, should a discriminative discrete cue be used instead of each context (e.g. CS1 indicates CS2 earns O1, CS3 indicates CS4 earns O1), would female rats still be as slow to learn the discrimination?

      3. In pages 8-9 of pdf, where the biological basis of the delayed acquisition of contextual control in females is considered, I find the text written from the assumption that what is observed in the males is the default behaviour. That is, although the estrous cycle and its effects on synaptic plasticity/physiology may well account for the results, is there not a similar argument to be made for androgens in males? Perhaps androgens also alter synaptic plasticity/physiology, leading to faster acquisition, reduced performance stability, and increased susceptibility to stress in males.

      4. In addition, the OFC - which is the brain region found to have differential expression of c-fos in males and females in Figure 5 - is not explicitly discussed with regard to the biological mechanisms of differences, which seems odd.

    4. Reviewer #3 (Public Review):

      Summary:
      This manuscript reports an experiment that compared rats' acquisition and performance of a Pavlovian biconditional discrimination, in which the presence of one cue, A, signals that presentation of one CS, X, will be followed by a reinforcer and that a second CS, Y, will be nonreinforced. Periods of cue A alternated with periods of cue B, which signaled the opposite relationship: cue X is nonreinforced and cue Y is reinforced. This is a conditional discrimination problem in which the rats learned to approach the food cup in the presence of each CS conditional on the presence of the third, background cue. One comparison group received the same conditional discrimination, except that each CS was paired with a different reinforcer; this makes the problem easier to solve, as the background now primes a differential outcome. A third group received simple discrimination training (X reinforced, Y nonreinforced) in cues A and B, and the final group was trained with X and Y reinforced on half the trials (no discrimination). The results clearly showed that the other two discrimination procedures produced rapid learning in comparison to the biconditional discrimination. Rats required about three times as many 4-session blocks to acquire the biconditional discrimination as the other two discrimination groups. Within the biconditional discrimination group, female and male rats spent the same amount of time in the food cup during the rewarded CS, but females spent more time in the food cup during CS- than males. The authors interpret this as a deficit in discrimination performance in females on this task and use a measure that exaggerates the difference between CS+ and CS- responding (a discrimination ratio) to support their point. When tested after acute restraint stress, the male rats spent less time in the food cup during the reinforced CS than the female rats, but did not lose discrimination performance entirely.
      There was also some evidence of more Fos-positive cells in the orbitofrontal cortex in females, but this difference was one of degree.
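      To make the point about the discrimination ratio concrete, the sketch below assumes one common definition, ratio = CS+ time / (CS+ time + CS- time); the manuscript's exact formula is not given in this review, so treat the definition and the function name discrimination_ratio as assumptions. It shows how a difference in CS- time alone moves the ratio when CS+ time is held fixed. The numbers are illustrative, not data from the study.

```python
def discrimination_ratio(cs_plus_time: float, cs_minus_time: float) -> float:
    """Assumed definition: share of total food-cup time spent during CS+.

    1.0 means all food-cup time falls in CS+; 0.5 means no discrimination.
    """
    total = cs_plus_time + cs_minus_time
    if total <= 0:
        raise ValueError("total food-cup time must be positive")
    return cs_plus_time / total


# Illustrative values only: identical CS+ time, different CS- time.
same_cs_plus = 40.0
print(discrimination_ratio(same_cs_plus, 5.0))   # lower CS- time
print(discrimination_ratio(same_cs_plus, 10.0))  # higher CS- time
```

      Because CS- time enters the denominator, a group that differs only in CS- time can show a lower ratio even with identical CS+ responding, which is why the reviewer asks to see the raw CS+ and CS- measures alongside the ratio.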

      Overall, I think the authors were successful in documenting performance on the biconditional discrimination task. Showing that it is more difficult to perform than other discriminations is valuable and consistent with the proposal that accurate performance requires encoding of conditional information (which the authors refer to as "context"). There is evidence that female rats spend more time in the food cup during CS-, but I hesitate to agree that this is an important sex difference. There is no cost to spending more time in the food cup during CS- and they spend much less time there than during CS+. Males and females also did not differ in their CS+ responses, suggesting similar levels of learning. A number of factors could contribute to more food cup time in CS-, such as smaller body size and more locomotor activity. The number of food cup entries during CS+ and CS- was not reported here. Nevertheless, I think the manuscript will make a useful contribution to the field and hopefully lead readers to follow up on these types of tasks.

      One area for development would be to test the associative properties of the cues controlling the conditional discrimination: can they be shown to have the properties of Pavlovian occasion-setting stimuli? Such work would strengthen the justification for using the terms "context" and "occasion setter" to refer to these stimuli in this task in the way the authors do in this paper.

      Strengths:
      - Nicely designed and conducted experiment.
      - Documents performance difference by sex.

      Weaknesses:
      - Overstatement of sex differences.
      - Inconsistent, confusing, and possibly misleading use of terms to describe/imply the underlying processes contributing to performance.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We are pleased that Reviewers 1 and 3 have recommended that the revised paper be published.

      Reviewer #2

      For point A: Their preliminary 3D simulation also looks nice; although it is referenced in the discussion, it is not actually included in the manuscript. I would advise adding it, even if labeled as preliminary.

      We thank the reviewer for the positive view of our 3D results and for suggesting that we include them in the manuscript. However, these are preliminary results of our ongoing work. We have yet to establish the corresponding viscosity results quantitatively in the 3D simulations. Because the relationship between viscosity and relaxation time is not (always) linear in glass-forming systems, we hesitate to report these results for publication. We hope to report the new results as part of a separate work.

      For point B/C: I see some of the authors’ points, although not all of them made it into the main text. Some points still puzzle me. For instance, the authors mention that a single value of viscosity (from Green-Kubo) is “valid for all time scales and amplitude”. This sounds very surprising to me for a complex fluid, even at equilibrium: doesn’t it, for instance, assume linear response (hence small amplitudes)? Fast vs. slow probing of a complex medium should also matter (see refs previously mentioned). Related to this, it’s not clear how self-propulsion could not matter if one were to shear the system at a finite time scale, given past work on motility-driven unjamming and the authors’ facilitation mechanism (wouldn’t shearing at time scales larger vs. smaller than the typical time for cells to spontaneously rearrange through self-propulsion drastically change the effective complex modulus of the system?)

      There might be a slight misunderstanding between the reviewer and us when we say ‘single value of viscosity is valid for all time-scales and amplitude’. Let us explain this point more carefully. In our problem, we are studying the dynamics of a many-body system undergoing Brownian dynamics in which the fluctuation-dissipation theorem need not hold (the friction and the self-propulsion noise strength are not related via the Fluctuation-Dissipation Theorem). For us to use the concepts of linear response (which in the present study are the Green-Kubo relations for the transport coefficients in terms of time-correlation functions), we need to show that, within the simulation time, the system has reached a state that can be described by an “equilibrium” probability measure. This is precisely why we calculated the ergodicity measure, which is a way to show that the phase space has been sampled uniformly under the given Brownian dynamics. This suggests (but does not prove) that the system has attained a stationary probability measure (i.e., near equilibrium) for the value of self-propulsion used. For this value of self-propulsion, the Green-Kubo relations hold for ‘any time-scale of the simulations’, so we can perform a time average over the particle trajectories (which stands in for an average over the stationary probability measure at the self-propulsion used). If we change the amplitude of the self-propulsion, we need to compute the ergodicity measure again and show that the probability measure is stationary. If the system is ergodic under the new self-propulsion, we can again use the Green-Kubo relations. Note that we will certainly obtain a different value of viscosity under the new self-propulsion, because the shear stresses generated will differ, but the Green-Kubo relations still hold. If the system is not ergodic at the new amplitude, we cannot use the Green-Kubo relations.
      Also, a priori, one cannot say what counts as a large or small amplitude of self-propulsion, because it has to be compared with the intrinsic energy scale encoded in the energy function, which is difficult to determine without explicit calculations.

      This is what we meant when we said ‘single value of viscosity is valid for all time-scales and amplitude’. It is valid for the time-scales of the simulations at a given amplitude of self-propulsion, but only if the system is ergodic. Note that if the system is not ergodic, the results of Ref. [14] (in the main text) could be questioned on theoretical grounds, because they were analyzed using equilibrium rigidity percolation theory. Nevertheless, the authors of Ref. [14] showed that equilibrium phase transition theory works in tissues. For these reasons, we have been, just like the Reviewer, puzzled that equilibrium ideas appear to be valid in the cell system. Additional theoretical work is needed to clarify these links in tissues. Although this is not the last word, we hope this clarifies our viewpoint.
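      As a rough illustration of the Green-Kubo route described in this response, the sketch below estimates a shear viscosity from a time series of the off-diagonal stress by averaging its autocorrelation over time origins (which presupposes the stationarity the ergodicity check is meant to support) and integrating it, using the generic textbook form η = (V / kT) ∫ ⟨σxy(0) σxy(t)⟩ dt. This is not the authors' code: the function names, the cutoff max_lag, and the use of kT as the normalizing energy scale are assumptions, and how that energy scale should be chosen for an active system without a fluctuation-dissipation relation is not specified here.

```python
from typing import List, Sequence


def stress_autocorrelation(sxy: Sequence[float], max_lag: int) -> List[float]:
    """C(k) = <s(i) s(i+k)>, averaged over all time origins i, for k = 0..max_lag.

    Averaging over origins assumes a stationary process. The mean shear
    stress is assumed to be zero, so it is not subtracted.
    """
    n = len(sxy)
    acf = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        total = sum(sxy[i] * sxy[i + lag] for i in range(pairs))
        acf.append(total / pairs)
    return acf


def trapezoid(values: Sequence[float], dt: float) -> float:
    """Trapezoidal rule on a uniform grid with spacing dt."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def green_kubo_viscosity(sxy: Sequence[float], dt: float, volume: float,
                         kT: float, max_lag: int) -> float:
    """eta ~ (volume / kT) * integral of the stress autocorrelation up to max_lag * dt.

    In a 2D simulation, `volume` is the box area. Hypothetical helper for
    illustration; max_lag must be long enough for C(t) to decay.
    """
    if not 0 < max_lag < len(sxy):
        raise ValueError("max_lag must satisfy 0 < max_lag < len(sxy)")
    acf = stress_autocorrelation(sxy, max_lag)
    return (volume / kT) * trapezoid(acf, dt)
```

      For a trivially constant stress series [2.0] * 10 with dt = 0.5, volume = 1, kT = 1, and max_lag = 4, every C(k) equals 4.0 and the estimate is 8.0. Real data would need max_lag chosen where C(t) has decayed to noise, and, as the response argues, the estimate is only meaningful once stationarity at the given self-propulsion has been established.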

      For point D: I agree with the simplicity argument, although the added sentence from the discussion “Furthermore, the physics of the dynamics in glass forming materials does not change in systems with and without attractive forces” seems a bit strong given works like Lois et al., PRL, 2008 or Koeze et al., PRL, 2018, which find fundamentally different physics of jamming with or without adhesion.

      In the two cited papers, the authors only consider equilibrium transitions in systems with attraction using computer simulations. Apparently, jamming properties depend on the strength of attraction. There are no attempts to characterize the dynamics, which is the focus of our work.

      What we meant is that universal relations, such as the Vogel-Fulcher-Tammann relation, would still be valid. Of course, non-universal quantities such as the glass transition temperature Tg or fragility will change. In our case, changing the adhesion strength would change ϕS and the parameters of the VFT relation. However, our contention is that the overall finding, an increase in viscosity followed by saturation, is unlikely to change. We have added clarifying statements to the manuscript to make this clear.

    2. eLife assessment

      This fundamental study substantially advances our physical understanding of the sharp increase and saturation of the viscosity of non-confluent tissues with increasing cell density. Through the analysis of a simplified model, this study provides compelling evidence that polydispersity in cell size and the softness of cells together can lead to this phenomenon. The work will be of general interest to biologists and biophysicists working on development.

    3. Joint Public Review:

      This paper explores how minimal active matter simulations can model tissue rheology, with applications to the in vivo situation of zebrafish morphogenesis. The authors explore the idea of active noise, particle softness and size heterogeneity cooperating to give rise to surprising features of experimental tissue rheologies (in particular an increase and then a plateau in viscosity with fluid fraction). In general, the paper is interesting from a theoretical standpoint, by providing a bridge between concepts from jamming of particulate systems and experiments in developmental biology. The idea of exploring a free space picture in this context is also interesting. It will be interesting in the future to see whether and how the findings change when considering 3D tissues with less size heterogeneity or how viscosity is impacted by the time scale of measurements.

    1. Author Response

      We would like to thank the reviewers for their encouraging comments and useful feedback, which will enable us to improve the manuscript. We would like to briefly comment on some of the points they raised.

      1. We agree this is a fairly specialized pipeline that has some requirements in terms of photographic setup. We are working hard to make these requirements as minimal as possible. However, given the huge variability in camera angles, backgrounds, arrangement of brain slices, etc., making the pipeline fully automated for unconstrained photos is extremely challenging.

      2. In principle, it should be possible to extend our method to sagittal slices of the cerebellum or axial slices of the brainstem, but this would require collecting and labeling additional training data and thus remains as future work.

      3. Producing accurate surfaces with sparse photographs is a very challenging problem and also remains as future work. We have a conference article producing surfaces on MRI scans with sparse slices (https://doi.org/10.1007/978-3-031-43993-3_4) but we haven’t gotten it to work well on photographs yet.

      4. Another challenging issue that remains as future work is getting the pipeline to work well with nonlinear deformations, e.g., slices of fresh tissue. While incorporating nonlinear deformation into the model is trivial from the coding perspective, we have not been able to make it work at the level of robustness that we achieve with affine transformations. This is because the nonlinear model introduces huge ambiguity in the space of solutions: for example, if one adds identical small nonlinear deformations to every slice, the objective function barely changes.

      5. As we acknowledge in the manuscript, the validation of the reconstruction error (in mm) with synthetic data is indeed optimistic, but it is informative in the sense that it reflects the trends of the error as a function of slice thickness and its variability (“jitter”).

      6. Since we use a single central coronal slice in the direct evaluation, SAMSEG yields very high Dice scores for large structures with strong contrast (e.g., the lateral ventricles). However, Photo-SynthSeg provides better average results across the board, particularly when considering 3D analysis out of the coronal plane (see qualitative results in Figure 2 and results on volume correlations).

    2. eLife assessment

      The authors present a valuable open-source tool for three-dimensional analysis of dissected slices of human brains including 3D reconstruction and high-resolution 3D segmentation. Convincing evidence is provided based on experiments on both real and synthetic data. This tool would be useful to researchers in the neuropathology and neuroimaging field.

    3. Reviewer #1 (Public Review):

      In this paper, Gazula and co-workers present a software tool for 3D structural analysis of human brains using slabs of fixed or fresh brains. This tool will be included in FreeSurfer, a well-known neuroimaging processing software package. It can reconstruct the brain in 3D from photographs of coronally sliced brains, optionally using a surface scan as a reference. A high-resolution segmentation of 11 brain regions is produced, independent of slice thickness, interpolating information where needed. Using this method, researchers can segment all regions from the sliced brain without the need for ex vivo MRI scanning.

      The software suite is freely available and includes 3 modules. The first performs preprocessing steps, correcting pixel sizes and perspective. The second is a registration algorithm that registers a 3D surface scan obtained prior to sectioning (reference) to the multiple 2D slices. Scanning the surface is not mandatory (a probabilistic atlas can also be used as a reference), but accuracy is then lower. The third module uses machine learning to segment 11 brain structures in the reconstructed 3D volume. This module is robust to different illumination conditions, cameras, lenses, and camera settings. This algorithm ("Photo-SynthSeg") produces smooth isotropic reconstructions, even for highly anisotropic datasets (when the in-plane resolution of the photographs is much higher than the slice thickness), interpolating the information between slices.

      To verify the accuracy and reliability of the toolbox, the authors reconstructed 3 datasets, using real and synthetic data. Real data from 21 postmortem-confirmed Alzheimer's disease cases from the Massachusetts Alzheimer's Disease Research Center (MADRC) and from 24 cases from the University of Washington Alzheimer's Disease Research Center (UW-ADRC), which had been MRI scanned prior to processing, were used for testing. These cases represent a challenging real-world scenario. Additionally, 500 subjects from the Human Connectome Project were used to test the error as a continuous function of slice thickness. The segmentations were performed with the proposed new deep-learning algorithm ("Photo-SynthSeg") and compared against MRI segmentations produced by SAMSEG (an MRI segmentation algorithm), computing Dice scores for the segmentations. The methods are sound, and the analyses showed correlations above 0.8, which is good enough to allow volumetric analysis. The main strengths of the methods are the datasets used (challenging real-world and synthetic) and the statistical treatment, which showed that the pipeline is robust and can facilitate volumetric analysis derived from brain sections, and identified factors that influence the accuracy of the method (such as whether a 3D scan is used and whether slice thickness is constant).
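      For readers less familiar with the metric, the Dice score used in this comparison measures, for each structure, the overlap between two label maps as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). The sketch below is a minimal, hypothetical illustration (dice_per_label is not a FreeSurfer, SAMSEG, or Photo-SynthSeg function), assuming both segmentations have been flattened to equal-length sequences of integer labels.

```python
from typing import Dict, Hashable, Sequence


def dice_per_label(pred: Sequence[Hashable],
                   ref: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Dice = 2|A & B| / (|A| + |B|) for every label present in either map.

    Assumes voxel-wise aligned, flattened label maps of equal length.
    """
    if len(pred) != len(ref):
        raise ValueError("segmentations must have the same number of voxels")
    scores = {}
    for label in set(pred) | set(ref):
        in_pred = sum(1 for p in pred if p == label)
        in_ref = sum(1 for r in ref if r == label)
        overlap = sum(1 for p, r in zip(pred, ref) if p == label and r == label)
        # The label occurs in at least one map, so the denominator is positive.
        scores[label] = 2.0 * overlap / (in_pred + in_ref)
    return scores


# Toy example: four voxels, labels 0, 1, and 2.
print(dice_per_label([0, 1, 1, 2], [0, 1, 2, 2]))
```

      Because Dice is computed per structure, large, high-contrast structures tend to score well even when smaller structures do not, which is one reason the authors also report volume correlations.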

      Although the pipeline is very robust and can handle many situations, researchers have to keep in mind that processing must follow some basic rules for the pipeline to work properly. For instance, fiducials and scales need to be included in the photographs, and the slabs must be photographed against a contrasting background. Also, only coronal slices can be used, which can be limiting in certain situations.

      The authors achieved their aims, and the statistical analysis confirms that the machine learning algorithm performs segmentations comparable to the state-of-the-art of automated MRI segmentations.

      These methods will be particularly interesting to researchers who deal with post-mortem tissue analysis and do not have access to ex vivo MRI. Quantitative measurements of specific brain areas can be performed in different pathologies and even in normal aging. The method is highly reproducible and cost-effective, since any researcher can apply the pipeline with only minor pre-processing steps.

      The paper is very interesting and well structured, adding an important tool for fixed and fresh brain analysis. The software tool is robust and demonstrated good and consistent results in the difficult task of automated segmentation from brain slices. In the future, segmentation of histological slices could be developed and histological structures added (such as small brainstem nuclei). Support for axial and sagittal planes would also be useful to some labs.

    4. Reviewer #2 (Public Review):

      Summary:
      The authors developed a toolset, Photo-SynthSeg, for the software FreeSurfer that performs 3D reconstruction and high-resolution 3D segmentation on a stack of dissection photographs of brain tissue. The toolset consists of three modules: a pre-processing module, which corrects the dissection photographs; a registration module, which registers the corrected photographs using a 3D surface scan, ex vivo MRI or a probabilistic atlas; and a segmentation module based on U-Net. To demonstrate the performance of the tools, three experiments were conducted: a volumetric comparison of brain tissues between the AD and HC groups from MADRC, a quantitative evaluation of segmentation on UW-ADRC, and a quantitative evaluation of 3D reconstruction on digitally sliced HCP MRI data.

      Strengths:
      The quantitative evaluation of segmentation and reconstruction on synthetic and real data demonstrates the accuracy of the methodology. Also, the successful application of this toolset to two brain banks with different slice thicknesses, tissue processing, and photography settings demonstrates its robustness. The toolset also benefits from its adaptability to different 3D references, such as surface scans, ex vivo MRI, and even a probabilistic atlas, suiting the needs of different brain banks.

      Weaknesses:
      1) The current method can only accurately segment subcortical tissues. It would be of greater interest to accurately segment cortical tissues, whose morphometrics are more predictive of neuropathology. The authors also mention that they would extend the toolset to allow for cortical tissue segmentation in the future.

      2) Brain tissues are not rigid bodies, so dissected slices could be stretched or squeezed to some extent. Also, dissected slices that contain the temporal poles may consist of several disjoint pieces of tissue. Therefore, each pixel in a dissection photograph may undergo a slightly different transformation. The authors constrain all pixels in each dissection photograph to undergo the same affine transform in the reconstruction step, probably due to concerns about computational complexity. Ideally, however, dissection photographs should be transformed with non-linear warping or locally linear transformations. Alternatively, the authors could advise how to place the different parts of dissected slices when taking photographs to reduce such non-linearity.

      3) For the quantitative evaluation of the segmentation on UW-ADRC, the authors calculated 2D Dice scores on a single slice for each subject. Could the authors specify how this single slice was chosen for each subject? Was it randomly chosen or determined by landmarks? It is possible that the chosen slice lies between dissected slices, so that SAMSEG cannot segment it accurately. Also, from Figure 3, it seems that SAMSEG outperforms Photo-SynthSeg on large structures (WM, cortex, ventricles). Is there an explanation for this observation?

      4) In the third experiment (quantitative evaluation of 3D reconstruction), each digital slice underwent only random affine transformations and illumination fields. Given the non-rigidity of the brain mentioned in 2), it would be better to also deform the digital slices with random non-linear warping. As it stands, the reconstruction errors estimated here are quite optimistic.

      Overall, this is a very useful toolset that could be widely used in brain banks without MRI scanners.

    1. Author Response:

      We would like to thank the editor and the three reviewers for their time and effort taken in reviewing our manuscript and providing constructive feedback. Unfortunately, the first author of this manuscript is no longer involved in academia, and does not wish to further revise this manuscript. However, we agree with the entirety of the feedback and critiques provided by the referees, and feel these points should be taken into account when interpreting our results and conclusions.

    2. Reviewer #3 (Public Review):

      Perrodin, Verzat and Bendor describe the response of female mice to the playback of male mouse ultrasonic songs. The experiments were performed in a Y-maze-like apparatus with two acoustically separate response chambers. Sounds were presented in 4 trials, alternating strictly between the left and right branches of the Y. Cumulative dwell time in the two chambers was measured and used as an index of female preference. They first show, consistent with previous observations, that female mice spend more time near a speaker playing a male mouse song than near a speaker playing nothing. They then performed several manipulations (time reversal, syllable order randomization, phase-scrambled replacement, pure tone replacement, and 'hyper-regular' inter-syllable intervals), none of which female mice discriminated from the normal song in this assay. Finally, they show that females spent more time near normal songs than near songs with more variable inter-syllable intervals.

      The authors' approach to the problem was ethologically sensible: females were tested in proestrus and estrus, male odor was used to increase motivation, mice were handled with tube transfers to reduce stress, mice were age-matched across conditions, and experiments were conducted in the dark (active) phase. In addition, animals were habituated to handling and to the apparatus.

      The acoustics were very good. The acoustic structure of the vocal signals was well described. Specific ranges of dB SPL were reported, speaker flatness was evaluated, sound amplitude was matched between manipulated and unmanipulated songs, and playback onset timing was jittered randomly between manipulated and unmanipulated signals.

      I think it is a reasonable result. My concerns are the following:

      1) The authors use "approach" as it has been used in other publications, but what is actually measured is dwell time. Pomerantz et al. (1983) observed that female mice approached mute and singing males the same number of times (i.e., approached both at the same rate) but spent more time with the singing than with the mute male. The use of "approach" to describe dwell time was a bit confusing to me, but staying consistent with the literature is defensible. However, they also refer to the assay as a "place preference assay", which I found confusing.

      2) I am a bit worried about their method of removing side bias (29% of trials). It certainly seems like a reasonable thing to exclude mice that simply picked one side or the other, but, because the stimulus always alternated between the sides, this exclusion of mice exhibiting a side bias is also excluding, specifically, behavior that would be incorrect.

      3) Given the observation by Hammerschmidt et al. (2009) that female mice discriminated male songs in a playback assay only on the first presentation, it is important to know whether the same females were used across the different manipulations. How many conditions did each female experience? How often did a female display positive discrimination in a condition after having displayed no discrimination?

      Specific comments:

      1) For Figure 2L

      The heat map legend is labeled "Towards", indicating motion towards either the speaker playing the song or the silent speaker. However, nothing in the methods indicates that the direction of movement was ever measured. I may have missed it, but I cannot figure out how this heat map was generated and what it represents. The figure legend states: "Normalized temporal profiles of approach behaviour to mouse songs vs silence over the course of 4 sound presentation trials (x-axis, coloured bars) for each of the behavioural sessions (y-axis, each animal is one line, n = 29), calculated as in I. Sessions (lines) are ordered by the amplitude of their last element." The legend of 2I states: "I. Temporal profile of approach behaviour over the four sound presentation trials in the example session in C, calculated as the cumulative sum of time in the intact song playback (positively weighted) vs silent (negatively weighted) speaker zone." I interpret this to mean that "Towards" is an inaccurate description of what is being plotted, as there is no motion, only dwell time.
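      To make the point about Figure 2I/2L concrete, the quantity described in the 2I legend can be sketched as a running sum of signed dwell time. The per-frame zone labels ("song", "silent", anything else) and the fixed frame duration are assumptions for illustration, and the normalization used for 2L is not specified in the text, so it is omitted. The input is only the animal's location in each frame; no direction of movement enters the computation.

```python
from itertools import accumulate
from typing import List, Sequence

# Assumed zone labels: time near the song speaker counts +1, time near the
# silent speaker counts -1, and any other zone contributes nothing.
ZONE_WEIGHTS = {"song": 1.0, "silent": -1.0}


def dwell_profile(zones: Sequence[str], frame_s: float) -> List[float]:
    """Cumulative (song minus silent) dwell time, in seconds, after each frame."""
    return list(accumulate(ZONE_WEIGHTS.get(zone, 0.0) * frame_s for zone in zones))
```

      For example, `dwell_profile(["song", "song", "centre", "silent", "song"], 0.5)` returns `[0.5, 1.0, 1.0, 0.5, 1.0]`.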

      References

      Hammerschmidt, K., Radyushkin, K., Ehrenreich, H. & Fischer, J. (2009) Female mice respond to male ultrasonic 'songs' with approach behavior. Biol. Lett. 5:589-592.

      Pomerantz, S.M., Nunez, A.A. & Bean, J. (1983) Female behavior is affected by male ultrasonic vocalizations in house mouse. Physiol. Behav. 31:91-96.

    1. eLife assessment

      Urtecho et al. use genome-integrated massively parallel reporter assays to catalog and characterize promoters throughout the Escherichia coli genome. The result is a state-of-the-art atlas of promoters, coupled with information on their regulation, that is readily accessible through the website http://ecolipromoterdb.com. This compelling work provides an important resource for researchers studying bacterial transcriptional regulation.

    2. Reviewer #1 (Public Review):

      Summary:
      This paper uses a high-throughput assay of transcription levels to (i) assess the potential of large numbers of Escherichia coli genomic sequences to function as promoters, and (ii) identify regulatory sequences in some of those promoters. This is a substantial undertaking, and while much of the work supports principles of transcription and transcription regulation described by many prior studies, there is considerable value in assessing promoters on such a large scale. The identification of putative regulatory sequences in larger numbers of promoters will likely be valuable to other groups studying transcription regulation in E. coli. And the analysis of antisense promoters provides some interesting new insight that goes beyond previous anecdotal studies.

      Strengths:
      - The presentation of the work is very clear, and the conclusions are mostly well supported by the data.
      - The assays are rigorously controlled and analyzed.
      - Conclusions regarding the impact of antisense transcription on sense transcript levels provide new insight. While these data are consistent with previous anecdotal studies, to my knowledge this is the first large-scale analysis supporting a negative regulatory role for antisense transcription.
      - The putative regulatory elements mapped in the high-throughput mutagenesis experiments will be a valuable resource for the scientific community.

      Weaknesses:
      (all minor)
      - There are some parts where the authors could clarify their arguments.
      - I'm not convinced that intragenic promoters impact codon usage rather than the other way around.
      - The authors should present a more nuanced discussion of promoters that avoids making yes/no calls (i.e., characterize sequences by promoter strength rather than a binary yes/no call of being a promoter).
      - Data relating to intragenic promoters should be presented and discussed for sense and antisense promoters separately.

    3. Reviewer #2 (Public Review):

      In this work, Urtecho et al. use genome-integrated massively parallel reporter assays (MPRAs) to catalog the locations of promoters throughout the E. coli genome. Their study uses four different MPRA libraries. First, they assayed a library containing 17,635 promoter regions having transcription start sites (TSSs) previously reported by three different sources. They found that 2,760 of these regions exhibited transcription above an experimentally determined threshold. Second, they assayed a library using sheared E. coli genome fragments. This library allowed the authors to systematically identify candidate promoter regions throughout the genome, some of which had not been identified before. Additionally, by performing experiments with this library under different growth conditions, the authors were able to identify promoters with condition-dependent activity. Third, to improve the resolution at which they were able to identify transcription start sites, the authors assayed a library that tiled all candidate promoter regions identified using the genomic fragments library. Data from the tiled library allowed the authors to identify minimal promoter regions. Fourth, the authors assayed a scanning mutagenesis library in which they systematically scrambled individual 10 bp windows within 2,057 previously identified active promoters at 5 bp intervals. After validation with known promoters, this approach allowed the authors to identify novel functional elements within regulatory regions. Finally, the authors fit multiple machine learning models to their data with the goal of predicting promoter activity from DNA sequences.
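      The scanning mutagenesis design described above (10 bp windows scrambled at 5 bp intervals) can be sketched as follows. How the authors scrambled each window is not stated here; this sketch assumes a seeded random shuffle of the bases within the window, which preserves base composition and may occasionally reproduce the original window by chance.

```python
import random
from typing import List, Tuple


def scanning_scrambles(
    seq: str, window: int = 10, step: int = 5, seed: int = 0
) -> List[Tuple[int, str]]:
    """Return (window_start, variant) pairs, one per scrambled window.

    Each variant shuffles the bases inside one window of length `window`,
    with window starts spaced `step` bp apart; the rest of the sequence is
    left untouched. Shuffling is an assumed scrambling scheme.
    """
    rng = random.Random(seed)  # seeded so the variant library is reproducible
    variants = []
    for start in range(0, len(seq) - window + 1, step):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        variants.append((start, seq[:start] + "".join(block) + seq[start + window:]))
    return variants
```

      For a 150 bp region this yields 29 variants, with windows starting at positions 0, 5, ..., 140.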

      The work by Urtecho et al. provides an important resource for researchers studying bacterial transcriptional regulation. Despite decades of study, a comprehensive catalogue of E. coli promoters is still lacking. The results of Urtecho et al. provide a state-of-the-art atlas of promoters in the E. coli genome that is readily accessible through the website, http://ecolipromoterdb.com. The authors' work also provides an important demonstration of the power of genome-integrated MPRAs. Unlike many MPRA-based studies, the authors use the results of their initial MPRAs to design follow-up MPRAs, which they then carry out. Finally, the scanning mutagenesis MPRAs the authors perform provide valuable data that could lead to the discovery of novel transcription factor binding sites and other functional regulatory sequence elements.

      Below I provide two major critiques and some minor critiques of the paper. The purpose of these critiques is simply to help the authors improve the quality of the manuscript.

      Major points:
      1. Ultimately, a comprehensive atlas of E. coli promoters should include nucleotide-resolution TSS data, which is not present in the MPRA datasets reported by Urtecho et al. The authors do use some methods to narrow down the positions of TSSs, but these methods do not provide the resolution one would ideally like to see in a TSS atlas. I understand that acquiring single-nucleotide-resolution data is beyond the scope of this manuscript, but it still might make sense for the authors to discuss this limitation in the Discussion section.

      2. The authors should clarify which points in the Results section are novel conclusions or observations, and which points are simply statements that prior conclusions or observations were confirmed. This distinction can be unclear at times.

      Minor points:
      1. Line 200-203: "We conclude that inactive TSS-associated promoters lack -35 elements but may become active in growth conditions where additional transcription factors mobilize and facilitate RNAP positioning in the absence of a -35 motif." Drawing this type of mechanistic conclusion from the slight difference observed in the enrichment analysis seems too speculative to me. Also, I do not understand how the discrepancies can be explained in terms of transcription factor differences. If the previous studies from which the annotated TSSs were extracted were also performed during log phase in rich media, why would the transcription factors present be different?

      2. Line 224-226: "Active TSSs not overlapping a candidate promoter region generally exhibited weak activity, which may indicate that greater sensitivity is achieved through testing of oligo-array synthesized regions (Figure S3)." The authors should clarify this statement. In particular, it is mechanistically unclear why one library would be more sensitive than another if they contain similar sequences.

      3. Figure 2B. The authors should clarify that the heights of the arrows correspond to TSS activity as assayed by one library and that the pile-up plots represent promoter activity as assayed by a different library.

      4. Line 255-257: "We also observed an enrichment for 150 bp minimal promoter regions, although these were generally weak indicating that our resolution is limited when tiling weaker promoters." The authors should clarify whether the peak at 150 bp is an artifact of using oligos containing 150 bp tiles to construct the library. Also, the authors should clarify why there are some minimal promoters with lengths > 150 bp when the length of the tiles was 150 bp.

      5. Line 262 refers to "Supplementary Table 1", but I was not able to find this table in the supplement.

      6. Line 324-325: "We used a σ70 PWM to identify the highest-scoring σ70 motifs within intragenic promoters and determined their relative coding frames". I find the term "relative coding frame" here to be unclear; the authors should clarify what they mean.

      7. Figure 3C, D: The authors should use the same terminology in the plots and in the methods section describing them. They should also clarify how the values plotted in C and D were computed.

      8. Line 329-332: "The observed depletion of -35 motifs positioned in the +2 reading frame and -10 motifs in the +1 reading frame is likely due to the fact that the canonical sequences for these motifs would create stop codons within the protein if placed at these positions." The definition of the reading frame here is unclear. Do the authors mean that the 0 frame is defined as occurring when the hexamer exactly overlaps 2 codons, the +1 frame is when the hexamer is shifted 1 nt downstream of that position, and the +2 frame is when the hexamer is shifted 2 nt downstream of that position?

      9. Line 538-539: "We performed hyperparameter tuning for a three-layer CNN and achieved an AUPRC =0.44." The authors should explicitly describe the architecture used for the CNN, and perhaps include a diagram of this architecture. In addition, the authors should clarify the mathematical forms of the other methods tested.

      10. Line 1204-1205: "We standardized all datasets as detailed above in 'Universal Promoter Expression Quantification and Activity Thresholding'". That title does not appear before in the text. I believe the appropriate subsection is called "Standardizing Promoter Expression Quantification and Activity Thresholding".

      11. Line 1265-1266: "We include a k-mer if the absolute correlation with expression is greater than the 'random' k-mer frequency, resulting in 4800/5440 filtered k-mers." It is unclear to me which two correlations are being compared. Please clarify. For example, would this be accurate: "We include a k-mer if the absolute correlation of its frequency with expression is greater than the absolute correlation of its 'random' frequency with expression"?
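      The frame convention proposed in minor point 8 can be checked with a short sketch. Assuming that frame 0 means the hexamer starts on the first base of a codon, that frames +1 and +2 mean it is shifted 1 or 2 nt downstream (the reviewer's reading, not a confirmed definition from the manuscript), and that the motif lies on the coding strand, the sketch lists the complete codons a hexamer occupies and tests whether any is a stop codon. Codons that straddle the motif boundary also depend on flanking bases, so only codons lying entirely within the motif are checked.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def relative_frame(motif_start: int, cds_start: int) -> int:
    """Frame of a motif relative to the coding frame (0-based, same strand).

    0: the motif starts on the first base of a codon; 1 or 2: it starts
    1 or 2 nt further downstream.
    """
    return (motif_start - cds_start) % 3


def codons_within_motif(motif: str, frame: int) -> List[str]:
    """Codons lying entirely inside the motif for the given frame."""
    first = (3 - frame) % 3  # offset of the first codon boundary inside the motif
    return [motif[i:i + 3] for i in range(first, len(motif) - 2, 3)]


def creates_stop(motif: str, frame: int) -> bool:
    """True if any codon fully inside the motif is a stop codon."""
    return any(codon in STOP_CODONS for codon in codons_within_motif(motif, frame))
```

      Under this convention, the canonical -35 hexamer TTGACA contains TGA only in frame +2, and the canonical -10 hexamer TATAAT contains TAA only in frame +1, which matches the depletion pattern quoted in minor point 8.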

    4. Reviewer #3 (Public Review):

      In this revised manuscript, Urtecho et al. present an updated version of their earlier submission. They characterized thousands of promoter sequences in E. coli using a massively parallel reporter assay and built a number of computational models to classify active versus inactive promoters or to relate sequence to promoter expression/strength. As alluded to in the earlier review cycle, the amount of experimental, bioinformatic, and analytical work presented here is astounding.

      Identifying promoters and associating genomic (or promoter) sequences to promoter strength is nontrivial. Authors report challenges in achieving this grand goal even with the state-of-the-art characterization technology used here. Nevertheless, the experimental work, analytic workflow, and data resource presented here will serve as a milestone for future researchers.

    1. eLife assessment

      This paper reports the development of SCA-seq, a new method derived from PORE-C for simultaneously measuring chromatin accessibility, 3D genome structure and CpG DNA methylation. Most of the conclusions are supported by convincing data. SCA-seq has the potential to become a useful tool for the scientific community to interrogate genome structure-function relationships.

    2. Reviewer #1 (Public Review):

      In this work, Xie et al. developed SCA-seq, a multi-omic mapping method that can obtain chromatin accessibility, methylation, and 3D genome information at the same time. SCA-seq first uses M.CviPI DNA methyltransferase to treat chromatin, then performs proximity ligation followed by long-read sequencing. This method is closely related to several previously reported long-read sequencing technologies. Specifically, NanoNome, SMAC-seq, and Fiber-seq have been reported to use m6A or GpC methyltransferase accessibility to map open chromatin, or open chromatin together with CpG methylation; Pore-C and MC-3C have been reported to use long-read sequencing to map multiplex chromatin interactions, alone or together with CpG methylation. Therefore, as a combination of NanoNome/SMAC-seq/Fiber-seq and Pore-C/MC-3C, SCA-seq is a step forward. The authors tested SCA-seq in 293T cells and performed benchmark analyses testing the performance of SCA-seq in generating each data module (open chromatin and 3D genome). The QC metrics appear to be good and I am convinced that this is a valuable addition to the toolset for multi-omic long-read sequencing.

      The revised manuscript addresses most of my questions except my concern about Fig. S9. This figure concerns a theory that a chromatin region can become open through interaction with other regions, and the authors propose a mathematical model to compute such effects. I was concerned about errors in the model of Fig. S9a, and also about the lack of evidence or validation. In their responses, the authors acknowledged that they cannot provide biological evidence or validation but still chose to keep the figure and the text.

      The revised Fig. S9a now uses a symmetric genome interaction matrix, as I suggested. But Fig. S9a still has many problems. First, the diagonal of the matrix in Fig. S9a still contains many 0's, which I asked about in my previous comments without receiving an answer. The legend states that contacts were defined as 2, 0 or -2, but the revised Fig. S9a shows only 1, 0 or -1 values. Furthermore, Figs. S9b, S9c and S9d all have a newly added CTCF+/- panel, but there is no explanation of these panels in the text or figure legend. Given the many unaddressed problems, I would still suggest deleting this figure.

      In my opinion, this paper does not need Fig. S9 to support its main story. The model in this figure is independent of SCA-seq. I think it should be spun off as an independent paper if the authors can provide more convincing analyses or experiments. I understand that eLife lets authors decide what to include in their paper. If the authors insist on including Fig. S9, I strongly suggest that they at least provide an adequate explanation of all the figure panels. At this point, Fig. S9 is not solid and clearly has many errors. Readers should ignore this part.

    3. Reviewer #2 (Public Review):

      In this manuscript, Xie et al. present a new method derived from PORE-C, SCA-seq, for simultaneously measuring chromatin accessibility, 3D genome structure and CpG DNA methylation. SCA-seq provides a useful tool for the scientific community to interrogate the genome structure-function relationship.

      The revised manuscript has clarified almost all of the concerns raised in the previous round of review, though I still have two minor concerns:

      1) In Fig. 2a, no numbers are shown in the Venn diagram (although the left panel does show the numbers of the different categories, including the numbers in the right panel would be more straightforward).

      2) The authors clarified the discrepancy between Fig. S7a and Fig. S7g. However, the remaining question is: why is there a big difference between chr7 and the whole genome in the percentage of concatemers in each cardinality group?

    1. eLife assessment

      The paper addresses the important question of how numerical information is represented in the human brain. Experimental findings are interpreted as providing evidence for a sensorimotor mechanism that involves channels, each tuned to a particular numerical range. However, the logic of the channel concept as employed here, as well as the claims regarding a sensorimotor basis for these channels, is incomplete and thus requires clarification and/or modification.

    2. Reviewer #1 Public Review

      Anobile and colleagues present a manuscript detailing an account of numerosity processing that appeals to a two-channel model. Specifically, the authors propose that the perception of numerosity relies on (at least) two distinct channels for small and large numerosities, which should be evident in subjects' reports of perceived numerosity. To test this, the authors had subjects reproduce visual dot arrays of numerosities ranging from 8 to 32 dots by repeatedly pressing a response key at a pre-instructed rate (fast or slow) until the number of presses equaled the number of perceived dots. The subjects performed the task remarkably well, albeit with a general bias to overestimate the number of presented dots. Further, no difference was observed in the precision of responses across numerosities, providing evidence for a scalar system. No differences between fast and slow tapping were observed. For the behavioral analysis, the authors examined correlations between the Weber fractions for all presented numerosities. The precision at each numerosity was similar to that at neighboring numerosities, but less similar to more distant ones. The authors then conducted PCA and clustering analyses on the Weber fractions, finding that the first two components interacted with the presented numerosity, such that each was dominant at a distinct lower or upper range; the components were further well fit by a log-Gaussian model, consistent with the channel account proposed at the outset.
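      The inter-subject correlation analysis summarized above can be sketched in a few lines. This assumes the Weber fraction for each subject and target is the coefficient of variation of the reproduced numerosities (standard deviation divided by mean), a common definition that may differ in detail from the authors'; each cell of the resulting matrix is the Pearson correlation, across subjects, between the Weber fractions for two targets.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence


def weber_fraction(responses: Sequence[float]) -> float:
    """Assumed definition: SD of reproduced numerosities divided by their mean."""
    return stdev(responses) / mean(responses)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable does not vary
    return sxy / sqrt(sxx * syy)


def wf_correlation_matrix(wf: Sequence[Sequence[float]]) -> List[List[float]]:
    """wf[s][t] is the Weber fraction of subject s for target t.

    Entry [i][j] is the correlation, across subjects, between targets i and j.
    """
    columns = [[row[t] for row in wf] for t in range(len(wf[0]))]
    return [[pearson(ci, cj) for cj in columns] for ci in columns]
```

      Under the channel interpretation, entries for targets of similar numerosity (near the diagonal) would be expected to be higher than those for distant targets.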

      Overall, the authors provide compelling evidence for a two-channel system supporting numerosity processing that is instantiated in sensorimotor processes. A strength of the presented work is the principled approach the authors took to identify mechanisms, as well as the controls put in place to ensure adequate data for analysis. Some questions do remain in the data, and there are aspects of the presentation that could be adjusted.

      -The use of a diverging colormap for the correlation matrix seems unnecessary. Diverging colormaps between two opposing colors (with white in the middle) are best for results spanning positive and negative values (say, correlation values between -1 and +1), but the correlations here are all positive, so a sequential colormap should be applied. I can appreciate that the authors were trying to emphasize that a 2+ channel system would lead to lower correlations at larger ratios, but that is emphasized better in the numerical ratio line plots.

      -In Figure 1, the correlation matrices appear blurred. I am not sure whether this was intentional but suspect it was not, in which case they should look like those presented in Figure 3.

      -It's notable that the authors also collected data on a timing task to rule out a duration-based strategy in the numerosity task. If possible, it would be great to have the authors also conduct the rest of the analyses on the duration task; that is, to look at WF correlation matrices/ratios as well as PCA. There is evidence that duration processing is also distinctly sensorimotor, and it may also rely on similar channels. Evidence either for or against this would likely be of great interest.

      -For the duration task, there was no fast tapping condition. Why not? Was this to keep the overall task length short?

      -The number of subjects/trials seems a bit odd. Why did some subjects perform both tasks and others not? The authors state that targets were presented "between 25 and 30 times", but why was this number variable at all?

      -For the PCA analysis, my read of the methods and results is that this was done on all the data, across subjects. If the data were run on individual subjects and the resulting PCA components averaged, would the same results be found?

      -For the data presented in Figure 2, it would be helpful to also see individual subject data underlaid on the plots to get a sense of individual differences. For the reproduced number, these will likely be clustered together given how small the error bars are, but for the WF data it may show how consistently "flat" the data are. Indeed, in other magnitude reproduction tasks, it is not uncommon to see the WF decrease as a function of target magnitude (or even increase). It may be possible that the reason for the observed findings is that some subjects get more variable (higher WFs) with larger target numbers and others get less variable (lower WFs).

      -Regarding the two-channel model, I wonder how much the results would translate to different ranges of numerosities? For example, are the two channels supported here specific to these ranges of low and high numbers, or would there be a re-mapping to a higher range (say, 32 to 64 dots) or to a narrower range (say 16 to 32 dots). It would be helpful to know if there is any evidence for this kind of remapping.

    3. Reviewer #2 Public Review

      The authors wish to apply established psychophysical methods to the study of number. Specifically, they wish to test the hypothesis, supported by their previous work, that human sensorimotor processes are tuned to specific number ranges. In a novel set of tasks, they ask participants to tap a button N times (either fast or slow), where N varies between 8 and 32 across trials. As I understood it, they then computed the Weber fraction (WF) for each participant for each number and correlated those values across participants and numbers. They find stronger correlations for nearby numbers than for distant numbers and interpret this as evidence of sensorimotor tuning functions. Two other analyses, cluster analysis and principal component analysis (PCA), suggest that participants' performance relied on at least 2 mechanisms, one for encoding low numbers of taps (around 10) and another for encoding larger numbers (around 27).

      Strengths

      Individual differences can be a rich source of scientific insight and I applaud the authors for taking them seriously, and for exploring new avenues in the study of numerical cognition.

      Weaknesses

      Inter-subject correlation
      The experiment "is based on the idea that interindividual variability conveys information that can reveal common sensory processes (Peterzell & Kennedy, 2016)" but I struggled to understand the logic of this technique. The authors explain it most clearly when they write "Regions of high intercorrelation between neighbouring stimuli intensity can be interpreted to imply that sets of stimuli are processed by the same (shared) underlying channel. This channel, while responding relatively more to its preferred stimulus, will also be activated by neighbouring stimuli that although slightly different from the preferred intensity, are nevertheless included in the same response distribution." As I understood it, the correlations are performed "between participants, for all targets values", meaning that they measure the extent to which different participants' WFs vary together. But why is this a good measure of channels? This analysis seems to assume that if people have channels for numerical estimation, they will all have the same channels, tuned to the same numerical ranges. But this is an empirical question: individual participants could have wildly different channels, and perhaps different numbers of channels (even in the tested range). If they do, then this between-subject analysis would mask these individual differences (despite the subtitle).

      Different channels
      I had trouble understanding many of the analyses, and this may account for at least some of my confusion. That said, as I understand it, the results are meant to provide "evidence that tuned mechanisms exist in the human brain, with at least two different tunings" because of the results of the clustering analysis and PCA. However, as the authors acknowledge, "PCA aims to summarize the dataset with the minimal number of components (channels). We can therefore not exclude the possible existence of more than two (perhaps not fully independent) channels." So I believe this technique provides no more evidence for the existence of 2 channels than for the existence of 4 or 8 or 11 channels, 11 being the upper bound for a task testing 11 different numbers. If we cannot rule out that people have one channel per number, what does "channel" mean?

      Several other questions arose for me when thinking through this technique. If people did have two channels (at least in this range), why would they be so broad? Why would they be centered so near the ends of the tested range? Can such effects be explained by binning on the part of the participants, who might have categorized each number (knowingly or not) as either "small" or "large"? Whereas the experiment tested numbers 8-32, numbers are infinite - How could a small number of channels cover an infinite set? Or even the set 8-10,000? More broadly, I was unsure what advantages channels would have - that is - how in principle would having distinct channels for processing similar stimuli improve (rather than impede) discrimination abilities?

      No number perception
      I was uncertain about the analogy to studies of other continuous dimensions like spatial frequency, motion, and color. In those studies, participants view images with different spatial frequency, motion, or color - the analogy would be to see dot arrays containing different numbers of dots. Instead, here participants read written numerals (like "19"), symbols which themselves do not have any numerical properties to perceive. How does that difference change the interpretation of the effects? One disadvantage of using numerals is that they introduce a clear discontinuity: Our base-10 numerical system artificially chunks integers into decades, potentially causing category-boundary effects in people's reproductions.

      Sensorimotor
      The authors wished to test for "sensorimotor mechanisms selective to numerosity" but it's not clear what makes their effects sensorimotor (or selective to numerosity, see below). It's true they found effects using a tapping task (which like all behavior is sensorimotor), but it's not clear that this effect is specific to sensorimotor number reproduction. They might find similar effects for numerical comparison or estimation tasks. Such findings would suggest the effect may be a general feature of numerical cognition across modalities.

      Specific to numbers
      The authors argue that their effects are "number selective" but they do not provide compelling evidence for this selectivity. In principle, their main findings could be explained by the duration of tapping rather than the number of taps. They argue this is unlikely for two reasons. The first reason is that the overall pattern of results was unchanged across the fast and slow tapping conditions, but differences in duration were confounded with numerosity in both conditions, so the comparison is uninformative. (Given this, I am not sure what we stand to learn by comparing the two tapping speeds.) The second reason is that temporal reproduction was less precise in their control condition than numerical reproduction, but this logic is unclear: Participants could still use duration (or some combination of speed and duration) as a helpful cue to numerosity, even if their duration reproductions were imperfect.

      If the authors wish to test the role of duration, they might consider applying the same analytical techniques they use for numbers to their duration data. Perhaps participants show similar evidence for duration-selective channels, in the absence of number, as they do for other non-numerical domains (like spatial frequency).

      Theories of numerical cognition

      An expansive literature on numerical cognition suggests that many animals, human children, and adults across cultures have two systems for representing numerosity without counting - one that can represent the exact cardinality of sets smaller than about 4 and another that represents the approximate number of larger sets (but see Cheyette & Piantadosi, 2020). The current paper would benefit from better relating its findings to this long lineage of theories and findings in numerical approximation across cultures, ages, and species.

    1. eLife assessment

      The study by Chardon et al. is fundamental to advancing our understanding of presynaptic control of motor neuron output. Large-scale computer simulations were performed using well-established single motor neuron models to provide compelling evidence regarding the time-varying patterns of inputs that control motor neuron ensembles. The work will interest the community of motor control, motor unit physiology, neural engineering, and computational neuroscience.

    2. Reviewer #1 (Public Review):

      The study presents an extensive computational approach to identify the motor neuron input from the characteristics of single motor neuron discharge patterns during a ramp up/down contraction. This reverse engineering approach is relevant due to limitations in our ability to estimate this input experimentally. Using well-established models of single motor neurons, a (very) large number of simulations were performed that allowed identification of this relation. In this way, the results enable researchers to measure motor neuron behavior and from those results determine the underlying neural input scheme. Overall, the results are very convincing and represent an important step forward in understanding the neural strategies for controlling movement.

    1. eLife assessment

      The study used slice physiology and modeling to investigate neurotransmitter release at the cerebellar parallel fiber-to-molecular layer interneuron synapse, revealing that each docking site can accommodate up to two synaptic vesicles simultaneously. The evidence presented is convincing. These important findings validate a two-step docking model and shed light on the mechanisms underlying short-term synaptic plasticity and strategies for achieving synaptic reliability, which plays a critical role in information processing in the brain.

    2. Reviewer #1 (Public Review):

      Summary: By elevating Ca influx and inducing PTP, the authors have maximised the release probability, which under these conditions approaches one. Under such conditions, a release site can release another vesicle within a short time. By analyzing the mean, variance and covariance of release, the authors propose a release model in which each release site contains a docking site and a replacement site. They exclude the LS-TS model (Neher and Brose) based on discrepancies between that model and the data (mean and covariance).

      Strengths: The authors have made good use of minimal stimulation and modelling to examine the stochastic nature of release sites with good resolution, which cannot be done at other synapses. The overall conclusions are reasonable and convincing.

      Weaknesses: The interpretation is somewhat model-dependent, and it is unclear whether the interpretation is unique. For example, it is unclear whether heterogeneous release probability among sites, or silent sites, could explain the results. However, the authors discuss these potential caveats fairly and argue that their model is very likely the best so far.

    3. Reviewer #2 (Public Review):

      Summary:
      Silva et al. describe an experimental study conducted on cerebellar parallel fiber-to-molecular layer interneuron synapses to investigate the size of the readily releasable pool (RRP) of synaptic vesicles (SVs) per docking site in response to trains of action potentials. The study aims to determine whether there are multiple binding sites for SVs at each docking site, which could lead to a higher RRP size than previously thought.

      The researchers used this glutamatergic synapse to conduct their experiments. They employed various techniques and manipulations to enhance release probability, docking site occupancy, and synaptic depression. By counting the number of released SVs in response to action potential trains and normalizing the results based on the number of docking sites, they estimated the RRP size per docking site.

      The key findings and observations in the manuscript are as follows:

      Docking Site Occupancy and Release Probability Enhancement: The researchers used 4-aminopyridine (4-AP) and post-tetanic potentiation (PTP) protocols to enhance the release probability of docked SVs and the occupancy of docking sites, respectively.

      Synchronous and Asynchronous Release: Synchronous release refers to SVs released in response to individual action potentials, while asynchronous release involves SVs released after the initial release response due to calcium elevation. The study observed changes in the balance between synchronous and asynchronous release under different conditions, revealing the degree of filling of the RRP.

      Modeling of Release Dynamics: The researchers employed a modeling approach based on the "replacement site/docking site" (RS/DS) model, where SVs bind to a replacement site before moving to a docking site and eventually undergoing release. The model was adjusted to experimental conditions to estimate parameters like docking site occupancy and release probabilities.

      Comparison of Different Models: The study compared the RS/DS model with an alternative model known as the "loosely docked/tightly docked" (LS/TS) model. The LS/TS model assumes that a docking site can only accommodate one SV at a time, while the RS/DS model considers the possibility of accommodating multiple SVs.

      Maximum RRP Size: Through a combination of experimental results and model simulations, the study revealed that the maximum RRP size per docking site reached close to two SVs under certain conditions, supporting the idea that each docking site can accommodate multiple SVs.
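      The RS/DS idea can be made concrete with a toy discrete-time sketch. This is not the authors' fitted model. In the sketch, each site holds at most one vesicle in the docking site (DS) and at most one in the replacement site (RS). A docked vesicle fuses on each action potential with probability p_release. Between spikes, an RS vesicle moves to an empty DS with probability p_transfer, and an empty RS refills with probability p_refill. All parameter names, the fully occupied starting state and the update order are assumptions. With p_release = 1, p_transfer = 1 and no refilling, a fully occupied site releases exactly two vesicles over a train, which is the per-site ceiling the RS/DS picture implies.

```python
import random

def simulate_rs_ds(n_sites, n_aps, p_release, p_transfer, p_refill, seed=0):
    """Toy RS/DS sketch: each site holds at most one SV in the docking site (DS)
    and one in the replacement site (RS), i.e. at most two SVs per site.
    Returns the number of SVs released by each action potential."""
    rng = random.Random(seed)
    ds = [True] * n_sites  # assumed: all docking sites start occupied
    rs = [True] * n_sites  # assumed: all replacement sites start occupied
    released_per_ap = []
    for _ in range(n_aps):
        released = 0
        # Action potential: each docked SV fuses with probability p_release
        for i in range(n_sites):
            if ds[i] and rng.random() < p_release:
                ds[i] = False
                released += 1
        released_per_ap.append(released)
        # Inter-spike interval: RS -> DS transfer, then RS refilling from a reserve pool
        for i in range(n_sites):
            if not ds[i] and rs[i] and rng.random() < p_transfer:
                ds[i], rs[i] = True, False
            if not rs[i] and rng.random() < p_refill:
                rs[i] = True
    return released_per_ap

# Saturating case: two SVs per site, then depletion.
counts = simulate_rs_ds(n_sites=10, n_aps=4, p_release=1.0, p_transfer=1.0, p_refill=0.0)
```

Setting p_refill above zero lets the same function mimic recovery between spikes. Lowering p_release spreads the two vesicles per site over more spikes.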

      Strengths:
      The study is rigorously conducted and takes into consideration previous work of RRP size and SV docking site estimation. The study addresses a long-standing question in synaptic physiology.

      Weaknesses:
      It remains unclear how generalizable the findings are to other types of synapses.

    1. eLife assessment

      Identifying chromatin interactions with high sensitivity and resolution remains technically challenging using genome-wide approaches. This study presents findings using the refined MNase-based proximity ligation method called MChIP-C, which allows for the measurement of chromatin interactions at single-nucleosome resolution on a genome-wide scale. Overall, the evidence in this manuscript is solid, and the technological advances will be valuable for the study of 3D genome structure.

    2. Reviewer #1 (Public Review):

      The authors presented a new MNase-based proximity ligation method called MChIP-C, allowing for the measurement of protein-mediated chromatin interactions at single-nucleosome resolution on a genome-wide scale. With improved resolution and sensitivity, they explored the spatial connectivity of active promoters and identified potential candidates for establishing/maintaining E-P interactions. Finally, using published CRISPRi screens, they found that most functionally verified enhancers do physically interact with their cognate promoters, supporting the enhancer-promoter looping model.

      The study's experimental approach and findings are interesting. However, several issues need to be addressed.

      1. The authors state that "the lack of interaction between experimentally-validated enhancers and their cognate promoters in some studies employing C-methods has raised doubts regarding the classical promoter-enhancer looping model", so it would be valuable to see whether MChIP-C can indeed detect the E-P interactions that were not identified by C-methods in the studies they mention (Benabdallah et al., 2019; Gupta et al., 2017). I agree that they identified more E-P interactions using MChIP-C, but they should specifically show at least 2-3 such cases. This is important since it is the main conclusion the authors want to draw.

      2. The authors compared their data to those of Chen et al. (Chen et al., 2022), who used PLAC-seq with anti-H3K4me3 antibodies in K562 cells and standard Micro-C data previously reported for K562, concluding that "MChIP-C achieves superior sensitivity and resolution compared to C-methods based on standard restriction enzymes.". This is not convincing since they only compared their data to one dataset. More datasets from other cell lines should be included.

      3. The reasons for choosing Chen's data (Chen et al., 2022) and CRISPRi screens (Fulco et al., 2019; Gasperini et al., 2019) should be provided since there are so many out there.

      4. The authors identify EP300 histone acetyltransferase and the SWI/SNF remodeling complex as potential candidates for establishing and/or maintaining enhancer-promoter interactions, but not RNA polymerase II, the Mediator complex, YY1, or BRD4. More explanation is needed on this point, since these factors have previously been suggested to be associated with E-P interactions.

      5. The limitations of the method should be discussed.

    3. Reviewer #2 (Public Review):

      Summary:
      Golov et al. performed MChIP-C using an H3K4me3 antibody. The new method significantly increases the resolution of Micro-C and can detect clear interactions that were not well described by the previous HiChIP/PLAC-seq methods. Overall, the paper represents a significant technological advance that can be valuable to the 3D genomics field in the future.

      Strengths:
      1. The authors established a novel method, based on Micro-C, to profile promoter-centered genomic interactions. Such a method could be very useful for dissecting enhancer-promoter interactions, which have long been difficult to resolve with the popular Hi-C method.

      2. With the MChIP-C method, the authors are able to find new genomic interactions with promoter regions enriched in CTCF. The authors report significantly higher detection sensitivity than methods such as PLAC-seq, Micro-C, and HiChIP.

      3. The authors identified a new type of interaction between CTCF-less promoters and CTCF binding sites. This particular type of interaction could explain CTCF's function in regulating gene transcription activity, as observed in many studies. I personally think the second, stripe model of P-CTCF interaction is more likely, as a similar stripe model has previously been proposed for super-enhancers. The authors should discuss this point in more detail.

      Weaknesses:
      1. The data presentation should include the contact heat map. The current data presentation makes it hard for the readers to have a comprehensive view of pair-wise interactions between promoters and the PIR. In particular, these maps may directly give answers to the proposed model of promoter-CTCF interactions by the authors in Figure 3a.

      2. In Fig 3D, adding factors beyond CTCF seems to give only a very limited increase in power for predicting the MChIP-C signal of DHS-promoter pairs. This figure could be simplified to include fewer factors.

      3. The current method seems to produce a large fraction of unusable reads. The data-processing steps should be described in full so that others can reproduce the analysis. Ideally, the authors should release the processing pipeline as an R or Bioconda package.

    4. Reviewer #3 (Public Review):

      Summary:
      This manuscript represents a technological development, specifically a micrococcal nuclease chromatin capture approach, termed MChIP-C, to identify promoter-centered chromatin interactions at single-nucleosome resolution via a specific protein, similar to HiChIP, ChIA-PET, etc. In general, the manuscript is technically well done. Two major issues raise concerns that need to be addressed. First, it does not appear that the novel chromatin interactions identified by MChIP-C, which were missed by other approaches such as HiChIP, were validated. Such validation is central to the claim of "improved" sensitivity. Second is the question of resolution. Because the authors focus on a histone mark (H3K4me3), it is unclear whether the resolution of the assay truly exceeds that of other approaches, especially Micro-C. These two claims are not completely supported by the data provided.

      Strengths:
      1) The method appears to hold promise to improve both the sensitivity and resolution of protein-centered chromatin capture approaches.

      Weaknesses:
      1) Specific validation experiments to demonstrate the identification of previously missed novel interactions are missing.

      2) It is unclear if the resolution is really superior based on the data provided.

      3) It is unclear how much advantage the approach has, especially compared to existing approaches such as HiChIP, since sequencing depth as a variable is not adequately addressed.
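      One common way to take sequencing depth out of such comparisons is to subsample each library to the same number of read pairs before calling interactions. The sketch below shows that idea in generic form. It is not the authors' pipeline, and the helper names are hypothetical. Read pairs are simplified to two positions on a single chromosome, and these positions are binned into fixed-size windows.

```python
import random
from collections import Counter

def downsample(read_pairs, depth, seed=0):
    """Keep `depth` read pairs chosen uniformly at random without replacement."""
    if depth > len(read_pairs):
        raise ValueError("target depth exceeds library size")
    return random.Random(seed).sample(read_pairs, depth)

def bin_contacts(read_pairs, bin_size):
    """Count contacts per (lower_bin, upper_bin); positions assumed on one chromosome."""
    counts = Counter()
    for a, b in read_pairs:
        lo, hi = sorted((a // bin_size, b // bin_size))
        counts[(lo, hi)] += 1
    return counts

# For two hypothetical libraries lib_a and lib_b, match depth before comparing:
# depth = min(len(lib_a), len(lib_b))
# contacts_a = bin_contacts(downsample(lib_a, depth), bin_size=1000)
# contacts_b = bin_contacts(downsample(lib_b, depth), bin_size=1000)
```

A fixed seed keeps the subsample reproducible. Repeating the comparison over several seeds would show how much of any sensitivity gap survives at matched depth.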

    1. eLife assessment

      This manuscript is an important contribution, assessing the role of intraspecific consumer interference in maintaining diversity using a mathematical model. Consistent with long-standing ecological theory, the authors convincingly show that predator interference allows for the coexistence of multiple species on a single resource, beyond the competitive exclusion principle. The model matches observed rank-abundance curves in several natural ecosystems. However, a more detailed synthesis of relevant prior studies is needed to clarify the contribution of this manuscript in the context of existing knowledge.

    2. Reviewer #1 (Public Review):

      Summary:
      The manuscript considers a mechanistic extension of MacArthur's consumer-resource model to include chasing down food and potential encounters between the chasers (consumers) that lead to less efficient feeding in the form of negative feedback. After developing the model, a deterministic solution and two forms of stochastic solutions are presented, in agreement with each other. Finally, the model is applied to explain observed coexistence and rank-abundance data.

      Strengths:
      - The application of the theory to natural rank-abundance curves is impressive.
      - The comparison with the experiments that reject the competitive exclusion principle is promising. It would be fascinating to see whether, e.g. in insects, the specific interference dynamics could be observed and quantified, and whether they would agree with the model.
      - The results are clearly presented; the methods are adequately described; the supplement is rich with details.
      - There is much scope to build upon this expansion of the theory of consumer-resource models. This work can open up new avenues of research.

      Weaknesses:
      - I question the use of carrying capacity (Eq. 4) instead of modeling nutrient limitation directly through Monod consumption (e.g. Posfai et al., whom the authors cite). I am curious to see whether these results hold or change when Monod consumption is used.

      - Following on the previous comment, I am confused by the fact that the nutrient consumption term in Eq. 1 and how growth is modeled (Eq. 4) are not obviously compatible and would be hard to match directly to experimentally accessible quantities such as yield (nutrient to biomass conversion ratio). Ultimately, there is a conservation of mass ("flux balance"), and therefore the dynamics must obey it. I don't quite see how conservation of mass is imposed in this work.

      - These models could be better constrained by more data, in principle, thereby potential exists for a more compelling case of the relevance of this interference mechanism to natural systems.

      - The underlying frameworks, B-D and MacArthur, are not properly introduced, and as a result it is not obvious what the specific contribution of this work is relative to the existing literature. One needs to dig into the literature a bit to find out. The specific contribution exists, but it could be more clearly separated and better explained. In the process, the introduction could be expanded a bit to make the paper more accessible, by reviewing key features from the literature that are used in this manuscript.
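      The mass-balance point in Reviewer #1's second comment can be illustrated with a generic Monod batch-culture sketch. This is not the manuscript's Eq. 1 or Eq. 4. In the sketch, biomass N grows at per-capita rate mu_max * S / (Ks + S), and nutrient S is drawn down at that total growth rate divided by the yield Y. Every unit of biomass produced therefore removes 1/Y units of nutrient, so the quantity S + N/Y is conserved. The parameter values are arbitrary. The forward-Euler step assumes that dt is small enough for S never to overshoot zero.

```python
def monod_batch(S0, N0, mu_max, Ks, Y, dt=0.01, steps=2000):
    """Forward-Euler integration of a batch culture with Monod uptake.

    dN/dt = mu_max * S / (Ks + S) * N             (biomass growth)
    dS/dt = -(1 / Y) * mu_max * S / (Ks + S) * N  (nutrient drawdown)
    """
    S, N = S0, N0
    for _ in range(steps):
        growth = mu_max * S / (Ks + S) * N  # biomass produced per unit time
        N += growth * dt
        S -= growth / Y * dt  # 1/Y units of nutrient consumed per unit of biomass
    return S, N

S, N = monod_batch(S0=10.0, N0=0.1, mu_max=1.0, Ks=0.5, Y=0.5)
# S + N / Y stays at its initial value (10.0 + 0.1 / 0.5) up to rounding error.
```

Each Euler step changes N by growth * dt and S by -growth * dt / Y, so S + N/Y is conserved step by step. A carrying-capacity term has no such built-in constraint.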

    3. Reviewer #2 (Public Review):

      Summary:
      The manuscript by Kang et al. investigates how accounting for pairwise encounters (consumer-resource chasing pairs, intraspecific consumer pairs, and interspecific consumer pairs) influences the outcome of community assembly. To explore this, they present a new model that extends the classical Beddington-DeAngelis (B-D) phenomenological model by explicitly considering pairwise encounters and intraspecific interference among consumer individuals. They then compare the model with several experimental datasets.

      Strengths:
      They found that the negative feedback loop created by the intraspecific interference allows a diverse range of consumer species to coexist with only one or a few types of resources. Additionally, they showed that some patterns of their model agree with experimental data, including time-series trajectories of two small in-lab community experiments and the rank-abundance curves from several natural communities. The presented results here are interesting and present another way to explain how the community overcomes the competitive exclusion principle.

      Weaknesses:
      The authors only explore cases in which either interspecific or intraspecific interference exists. I believe they need to systematically investigate the case in which both interspecific and intraspecific interference exist. In addition, the text description, figures, and mathematical notations have to be improved to enhance the article's readability. I believe this manuscript can be improved by addressing my comments, which I describe in more detail below.

      1. In nature, it is really hard for me to believe that only interspecific interference or intraspecific interference exists. I think a hybrid between interspecific interference and intraspecific interference is very likely. What would happen if both the interspecific and intraspecific interference existed at the same time but with different encounter rates? Maybe the authors can systematically explore the hybrid between the two mechanisms by changing their encounter rates. I would appreciate it if the authors could explore this route.

      2. In the first two paragraphs of the introduction, the authors describe the competitive exclusion principle (CEP) and past attempts to overcome the CEP. Between the second and third paragraphs, I think there is a gap that needs to be filled to make the transition smoother and help readers understand the motivation. More specifically, I think the authors need to add one more paragraph dedicated to explaining why predator interference is important, how considering the mechanism of predator interference may help overcome the CEP, and whether predator interference has been investigated or under-investigated in the past. Then, building upon this more detailed introduction and motivation of predator interference, the authors may briefly introduce the classical B-D phenomenological model, the conventional results derived from it, and how they intend to extend the B-D model to consider pairwise encounters.

      3. The notations for the species abundances are not very informative. I believe some improvements can be made to make them more meaningful. For example, I think using Greek letters for consumers and English letters for resources might improve readability. Some sub-scripts are not necessary. For instance, R^(l)_0 can be simplified to g_l to denote the intrinsic growth rate of resource l. Similarly, K^(l)_0 can be simplified to K_l. Another example is R^(l)_a, which can be simplified to s_l to denote the supply rate. In addition, right now, it is hard to find all definitions across the text. I would suggest adding a separate illustrative box with all mathematical equations and explanations of symbols.

      4. What is the f_i(R^(F)) on line 131? Does it refer to the growth rate of C_i? I noticed that f_i(R^(F)) is defined in the supplementary information. But please ensure that readers can understand it even without reading the supplementary information. Otherwise, please directly refer to the supplementary information when f_i(R^(F)) occurs for the first time. Similarly, I don't think the readers can understand \Omega^\prime_i and G^\prime_i on lines 135-136.
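      Point 1 can be made concrete with a phenomenological functional response that keeps the two interference terms separate. The sketch assumes a B-D-type form, a R / (1 + a h R + c_intra C_i + c_inter sum_{j != i} C_j), with hypothetical parameter names. It is not the manuscript's derived pairwise-encounter model. Setting c_inter = 0 recovers the classical B-D response with intraspecific interference only, so scanning the ratio c_inter / c_intra is one way to explore the hybrid regime.

```python
def functional_response(R, C, i, a, h, c_intra, c_inter):
    """Per-capita feeding rate of consumer species i on a resource at density R.

    C        : list of consumer densities, one per species
    a, h     : attack rate and handling time
    c_intra  : interference from conspecifics (classical B-D term)
    c_inter  : interference from heterospecific consumers (hypothetical extension)
    """
    others = sum(C) - C[i]  # total density of all other consumer species
    return a * R / (1 + a * h * R + c_intra * C[i] + c_inter * others)

# Sweep interspecific interference at fixed intraspecific interference.
for c_inter in (0.0, 0.5, 1.0):
    print(c_inter, functional_response(1.0, [1.0, 2.0], 0, 1.0, 1.0, 1.0, c_inter))
```

With c_inter = 0, a species' feeding rate depends only on its own density, which is the self-limiting feedback that permits coexistence. As c_inter approaches c_intra, consumers no longer distinguish conspecifics from heterospecifics in this term.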

    4. Reviewer #3 (Public Review):

      Summary:
      A central question in ecology is: Why are there so many species? This question gained heightened interest after the development of influential models in theoretical ecology in the 1960s, demonstrating that under certain conditions, two consumer species cannot coexist on the same resource. Since then, several mechanisms have been shown to be capable of breaking the competitive exclusion principle (although we still lack a general understanding of the relative importance of the various mechanisms in promoting biodiversity).

      One mechanism that allows for breaking the competitive exclusion principle is predator interference. The Beddington-DeAngelis is a simple model that accounts for predator interference in the functional response of a predator. The B-D model is based on the idea that when two predators encounter one another, they waste some time engaging with one another which could otherwise be used to search for resources. While the model has been influential in theoretical ecology, it has also been criticized at times for several unusual assumptions, most critically, that predators interfere with each other regardless of whether they are already engaged in another interaction. However, there has been considerable work since then which has sought either to find sets of assumptions that lead to the B-D equation or to derive alternative equations from a more realistic set of assumptions (Ruxton et al. 1992; Cosner et al. 1999; Broom et al. 2010; Geritz and Gyllenberg 2012). This paper represents another attempt to more rigorously derive a model of predator interference by borrowing concepts from chemical reaction kinetics (the approach is similar to previous work: Ruxton et al. 1992). The main point of difference is that the model in the current manuscript allows for 'chasing pairs', where a predator and prey engage with one another to the exclusion of other interactions, a situation Ruxton et al. (1992) do not consider. While the resulting functional response is quite complex, the authors show that under certain conditions, one can get an analytical expression for the functional response of a predator as a function of predator and resource densities. They then go on to show that including intraspecific interference allows for the coexistence of multiple species on one or a few resources, and demonstrate that this result is robust to demographic stochasticity.

      Strengths:
      I appreciate the effort to rigorously derive interaction rates from models of individual behaviors. As currently applied, functional responses (FRs) are estimated by fitting equations to feeding rate data across a range of prey or predator densities. In practice, such experiments are only possible for a limited set of species. This is problematic because whether a particular FR allows stability or coexistence depends on not just its functional form, but also its parameter values. The promise of the approach taken here is that one might be able to derive the functional response parameters of a particular predator species from species traits or more readily measurable behavioral data.

      Weaknesses:
      The main weakness of this paper is that it devotes the vast majority of its length to demonstrating results that are already widely known in ecology. We have known for some time that predator interference can relax the CEP (e.g., Cantrell, R. S., Cosner, C., & Ruan, S. 2004).

      While the model presented in this paper differs from the functional form of the B-D in some cases, it would be difficult to formulate a model that includes intraspecific interference (that increases with predator density) that does not allow for coexistence under some parameter range. Thus, I find it strange that most of the main text of the paper deals with demonstrating that predator interference allows for coexistence, given that this result is already well known. A more useful contribution would focus on the extent to which the dynamics of this model differ from those of the B-D model.

      The formulation of chasing-pair engagements assumes that prey being chased by a predator are unavailable to other predators. For one, this seems inconsistent with the ecology of most predator-prey systems. In the system in which I work (coral reef fishes), prey under attack by one predator are much more likely to be attacked by other predators (whether it be a predator of the same species or otherwise). I find it challenging to think of a mechanism that would give rise to chased prey being unavailable to other predators. The authors also critique the B-D model: "However, the functional response of the B-D model involving intraspecific interference can be formally derived from the scenario involving only chasing pairs without predator interference (Wang and Liu, 2020; Huisman and De Boer, 1997) (see Eqs. S8 and S24). Therefore, the validity of applying the B-D model to break the CEP is questionable.".

      However, the way "chasing pairs" are formulated does result in predator interference, because a predator attacking prey interferes with the ability of other predators to encounter that prey. I don't follow the authors' logic that B-D isn't a valid explanation for coexistence because a model incorporating chasing-pair engagements results in the same functional form as B-D.

      More broadly, the specific functional form used to model predator interference is of secondary importance to the general insight that intraspecific interference (however it is modeled) can allow for coexistence. Mechanisms of predator interference are complex and vary substantially across species. Thus it is unlikely that any one specific functional form is generally applicable.

    1. eLife assessment

      This important work addresses an interesting question for the vertebrate olfactory community: whether mice can discriminate odorant intermittency to help them navigate the environment. The data were collected and analyzed using solid methodology; however, the paper seems to fall short in demonstrating that the animals are actually sensitive to intermittency rather than to other flow parameters. The work will be of interest to researchers working on sensory neurobiology and animal behavior.

    2. Reviewer #1 (Public Review):

      Gumaste et al. studied whether a parameter of odor plumes, intermittency, can be detected by mice, a species that relies heavily on olfaction to navigate and to search for food and mates, among other behaviors. They also ask whether the animals can extract information from this parameter to gain knowledge about the odor source. Intermittency is defined as the fraction of time an odorant is present at a sampled point within the odor plume space. Their findings can be summarized as follows: animals are capable of detecting differences in intermittency levels, suggesting that this parameter of odor plumes is important for odor-based navigation in mammals, as has been seen in other animals such as flying insects. The authors combined behavioral training with concurrent calcium imaging of olfactory receptor neurons (the input to the olfactory bulb) and mitral cells (the output of the olfactory bulb). They found that mice are able to behaviorally discriminate between odor plumes of high and low intermittency. Interestingly, they found that the responses of both input and output neurons of the olfactory bulb can encode the intermittency experienced by the animals. The methods utilized in this work are very well suited for the kind of questions that the authors are asking. The combination of behavior and imaging, as opposed to imaging only in anesthetized animals, gives the authors considerable power to interpret their data. A very relevant point is the generation of the olfactory stimuli used to test the animals. The authors go to great lengths to generate more naturalistic odorant stimuli, as opposed to the typically used square pulses. Although some issues remain to be addressed, the authors succeeded in answering the questions they set at the beginning of this work, and their conclusions are supported by their experiments.

      This work should generate interest among a relatively broad audience because the issue presented here (how the temporal structure of the odor plume affects the detection and encoding of an odorant) is novel in mouse olfactory research.

    3. Reviewer #2 (Public Review):

      The study from Gumaste et al investigates whether mice can use changes of intermittency, a temporal odor feature, to locate an odor source. First, the study tries to demonstrate that mice can discriminate between low and high intermittency and that their performance is not affected by the odor used or the frequency of odor whiffs. Then, they show that there is a correlation between glomerular responses (OSNs and mitral cells) and intermittency. Finally, they conclude that sniffing frequency impacts the behavioral discrimination of intermittency as well as its neural representation. Overall, the authors seek to demonstrate that intermittency is an odor-plume property that can inform olfactory navigation.

      The paper explored an interesting question, the use of intermittency of an odor plume as a behavioral cue, which is a new and intriguing hypothesis. However, it falls short of demonstrating that the animal is actually sensitive to intermittency rather than to other flow parameters, and it is missing some important details.

      Major concerns

      1) One of the cornerstones of this paper is showing that mice can behaviorally distinguish between different intermittency values (high or low) across a variety of stimuli and without confounds such as the number of whiffs or concentration. However, I could not find in the paper a convincing explanation of how these confounds were tested. It is clear that the authors repeat their measurements under different conditions (low or high concentration, and different whiff numbers), but it is not specified how. Do the authors mix all stimuli in the same session, so that the animals simply generalize across all the stimuli and consider only intermittency for their behavioral choices? Or do the authors run separate sessions for different parameters, for example, two separate sessions with low and high concentration? If the latter is the case, I would argue that this is not enough proof that animals generalize across concentrations, as the animals might simply use concentration as a cue and change their decision criterion in each session. Please clarify.

      2) The measure of intermittency appears to depend strongly on the threshold that is set. What is the logic behind the specific threshold chosen? Do the results hold when this threshold is varied within a reasonable range? The same questions (perhaps even more important) apply to the measure of glomerular intermittency. Unfortunately, a sensitivity analysis for both measures is missing, which makes the results hard to interpret.
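      A minimal sketch of such a sensitivity analysis, assuming intermittency is computed as the fraction of uniformly sampled time points at which the signal exceeds a fixed threshold. The synthetic trace, the helper names, and the threshold values are hypothetical and are not taken from the study.

```python
import random


def intermittency(trace, threshold):
    """Fraction of samples at which the signal exceeds `threshold`.

    Assumes a uniformly sampled concentration (or dF/F) trace, with "odor
    present" defined by a fixed threshold; the threshold is an assumption.
    """
    if not trace:
        raise ValueError("empty trace")
    return sum(1 for x in trace if x > threshold) / len(trace)


def threshold_sweep(trace, thresholds):
    """Return (threshold, intermittency) pairs across a range of thresholds."""
    return [(t, intermittency(trace, t)) for t in thresholds]


# Hypothetical synthetic plume: whiffs of varying amplitude on roughly 30% of
# samples, plus small Gaussian baseline noise.
random.seed(0)
trace = []
for _ in range(1000):
    whiff = random.random() < 0.3
    amp = random.uniform(0.2, 1.0) if whiff else 0.0
    trace.append(amp + random.gauss(0, 0.02))

for t, i in threshold_sweep(trace, [0.05, 0.1, 0.2, 0.4]):
    print(f"threshold={t:.2f} intermittency={i:.3f}")
```

      Reporting how each stimulus's high/low intermittency label changes across such a sweep would show whether the behavioral and imaging results depend on the particular threshold chosen.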

      3) The logic of choosing the decision boundary for the discrimination task is not clear: low intermittency is defined as below 0.15 and high intermittency as between 0.2 and 0.8. Do these values correspond to the natural distribution of intermittency? How were these values chosen?

      4) Only two odors were used in the whole study, and some results differed between them. With only two odors, it is very difficult to draw general conclusions about intermittency encoding in the OB.

      5) Assuming that all the above issues are resolved, one can conclude that intermittency can be perceived by an animal. The study strongly emphasizes that this feature could be used for navigation. I understand that it is extremely hard to demonstrate that this feature is actually used for navigation; however, an analysis of the relevance of this measure is missing. Even if it is used in navigation, it would most probably be used in combination with other features, so its relative importance needs to be discussed or, even better, established.

    4. Reviewer #3 (Public Review):

      In this study, Gumaste et al. aim to determine whether mice can discriminate odor intermittency and whether the olfactory bulb encodes intermittency. Using a Go/No-Go task, the study first showed that mice can be trained to discriminate odor stimuli with a low versus high intermittency value. Next, the authors demonstrated that early olfactory processing in the OSNs and mitral/tufted cells encodes intermittency. Through calcium imaging of olfactory bulb glomeruli, they obtained the glomerular response properties across intermittency and demonstrated the effects of sniff frequency on the glomerular representation of intermittency. Although the results are expected based on previous literature, they do lend support to the notion that intermittency can be used for odor-guided navigation.

      Strengths:

      The counterbalanced olfactometer used in this study keeps the air flow constant while odor concentration changes. This design is very useful for experiments in which odor delivery needs to be precisely controlled.

      In a Go/No-Go task, mice were successfully trained to discriminate CS+ versus CS- odor stimuli with high versus low intermittency values in three different stimulus types (termed naturalistic, binary naturalistic, and square wave).

      The olfactory bulb glomerular activity (from either olfactory sensory neurons or mitral/tufted cells) was monitored while mice performed the behavioral tasks, supporting the idea that intermittency coding could arise from early olfactory processing.

      Weaknesses:

      Alternative interpretations of the behavioral outcome could be better discussed. For instance, the odors delivered with high intermittency values may lead to higher odor concentrations that olfactory sensory neurons encounter in the mucus. Mice might discriminate the total amount of odors present in the mucus rather than intermittency.
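      To make this concern concrete, consider an idealized binary stimulus that takes concentration c_0 when odor is present and 0 otherwise over a trial of duration T (an assumption; the naturalistic stimuli in the study also vary in amplitude). Intermittency I and the time-integrated odor delivered are then related as follows:

```latex
I = \frac{1}{T}\int_0^{T} \mathbf{1}\!\left[c(t) > 0\right] dt,
\qquad
\int_0^{T} c(t)\,dt = c_0 \int_0^{T} \mathbf{1}\!\left[c(t) > 0\right] dt = c_0 \, I \, T
```

      Under this idealization, total delivered odor scales linearly with intermittency unless c_0 is varied across trials, so the two would need to be decorrelated for the behavioral result to isolate intermittency.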

      The conclusion that intermittency encoding is odor specific and depends on the spatial patterning/intrinsic glomerular properties is based on only the two odorants used in this study.

    1. eLife assessment

      This manuscript describes a valuable new circuit mapping and profiling technique called Multiplexed projEction neuRons retrograde barcodE (MERGEseq) that combines transcriptome and projectome data at a single-cell resolution. The authors provide solid evidence that MERGEseq can be used to identify projection targets and cell type/layer/transcriptome differences of projection neurons in the mouse prefrontal cortex, and validation experiments are rigorous. While this report is a proof-of-principle that MERGEseq is useful for circuit mapping and profiling and many potential details will influence conclusions, this technique could easily be adapted to other regions with known projection targets and adds to a growing arsenal of combinatorial circuit mapping and profiling tools.

    2. Reviewer #1 (Public Review):

      With MERGEseq, the authors sought to develop a scalable and accessible method for obtaining both projectome and transcriptome information at the single-cell level from multiple projection targets within a single animal. MERGEseq uses a retrograde rAAV2 to deliver a 15-nucleotide barcode driven by a CAG promoter, with co-expression of eGFP to enrich barcoded cells using FACS. Injection of this rAAV2 into distinct regions (each injection region distinguished by a unique barcode specific to the virus used) allows retrograde trafficking and expression of the barcodes in cells that project to the injected region. In this manuscript, rAAVs harboring 5 unique barcodes were stereotactically delivered to 5 targets in the mouse brain: dorsomedial striatum (DMS), mediodorsal thalamic nucleus (MD), basal amygdala (BLA), lateral hypothalamus (LH), and agranular insular cortex (AI). After a 6-week period to allow for viral transduction and expression, the ventromedial prefrontal cortex (vmPFC) was harvested for scRNAseq. The vmPFC scRNAseq data were validated against previously published PFC datasets, demonstrating that MERGEseq does not disrupt transcript expression and identifies the same principal cell types as annotated in previous studies. Importantly, MERGEseq enabled the identification of cell types in the vmPFC that project to distinct areas, with separation occurring largely by cell type and cortical layer. The application of stringent criteria for barcode index determination is rigorous and improves confidence that barcoded cells are correctly identified. The observation that all barcoded cells were excitatory is consistent with prior work, although it is not clear whether viral tropism contributes to this in some way. In a parallel experiment, FAC-sorted cells (vmPFC cells expressing EGFP) were isolated as a comparison. Notably, EGFP+ cells were exclusively excitatory neurons, consistent with literature showing that PFC projection neurons are excitatory.
Next, barcode analysis was combined with transcriptional identification of neuronal subtypes to define general and single-cell projection patterns, which were validated in situ for the DMS and MD using retrograde tracing in combination with RNA FISH. MERGEseq data were also used to identify transcriptional differences between neurons with dedicated and bifurcated projections. DMS+LH and DMS+MD projecting neurons had distinct transcriptional profiles, unlike cells with other targets. RNA FISH for the marker gene Pou3f1 combined with retrograde tracing from DMS+LH projecting cells demonstrated enrichment of this gene in this projection population. Finally, machine learning was used to predict projection targets from transcriptional profiles. In this dataset, 50 highly variable genes (HVGs) were optimal for predicting projection patterns, though this might vary in different circuits. Overall, the results of this manuscript are well presented and include rigorous validation for select vmPFC targets with in situ techniques. The application of unique barcodes for retro-AAV delivery is an accessible tool that other labs can implement to study other brain circuits.

      Ultimately, MERGEseq is a subtle conceptual advance over VECTORseq (retro-AAV-delivered transgenes rather than barcodes, in combination with scRNAseq) that offers greater confidence in the projectome diversity it describes. The use of a retrograde AAV inherently limits the number of projection areas that can be assessed, a weakness compared to anterograde approaches such as MAPseq/BARseq. However, BARseq demands more time and resources; further, the use of the highly toxic Sindbis virus limits the application of this technique. This manuscript builds upon previous work by using machine learning to predict projection targets. BARseq2 could be used to rigorously validate predicted projectomes and gain single-cell information regarding target neurons. Overall, MERGEseq is an accessible technique that can be used across many animal models and can serve as an important starting point for defining circuits at the single-cell level.

    3. Reviewer #2 (Public Review):

      Investigating the relationship between transcriptomic profiles, their axonal projection and collateralization patterns will help define neuronal cell types in the mammalian central nervous system. The study by Xu et al. combined multiple retrograde viruses with barcodes and single-cell RNA-sequencing (MERGE-seq) to determine the projection and collateralization patterns of transcriptomically defined ventral medial prefrontal cortex (vmPFC) projection neurons. They found a complex relationship: the same transcriptomically defined cell types project to multiple target regions, and the same target region receives input from multiple transcriptomic types of vmPFC neurons. Further, collateralization patterns of vmPFC to the five target regions they investigated are highly non-random.

      While many of the biological conclusions are not surprising given recent studies on the collateralization patterns of vmPFC neurons using single-neuron tracing and other methods that integrate transcriptomics and projections, MERGE-seq provides single-cell validation of the collateralization patterns of individual vmPFC neurons, and thus offers new and valuable information beyond what has been published. The method can also be used to study collateralization patterns of other neuron types.

      Some of the conclusions the authors draw depend on the efficiency of retrograde labeling, which was not determined. Without quantitative information on retrograde labeling efficiency, and unless such efficiency is close to 100%, these conclusions are likely misleading.
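      A back-of-the-envelope sketch of why labeling efficiency matters for collateralization calls. It assumes, as a simplification not stated in the manuscript, that each target's barcode is recovered independently with a fixed per-target efficiency (labeling times capture). A neuron that truly projects to two targets is called "bifurcated" only if both barcodes are recovered.

```python
def detection_probabilities(e1, e2):
    """How a neuron that truly projects to BOTH targets ends up classified.

    Assumes independent barcode recovery with per-target efficiencies e1, e2
    (a simplifying assumption for illustration).
    """
    return {
        "bifurcating": e1 * e2,
        "only_target_1": e1 * (1 - e2),
        "only_target_2": (1 - e1) * e2,
        "unlabeled": (1 - e1) * (1 - e2),
    }


for e in (1.0, 0.8, 0.5):
    p = detection_probabilities(e, e)
    labeled = 1 - p["unlabeled"]
    misread = (p["only_target_1"] + p["only_target_2"]) / labeled
    print(f"efficiency={e:.1f}: called bifurcating {p['bifurcating']:.2f}, "
          f"labeled cells misread as single-target {misread:.2f}")
```

      Under these assumptions, an 80% per-target efficiency already causes about one third of the labeled, truly bifurcating neurons to be recorded as single-target projectors, which would bias both "dedicated" versus "bifurcated" proportions and the transcriptional comparisons built on them.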

    4. Reviewer #3 (Public Review):

      This manuscript describes a multiplexed approach for the identification of transcriptional features of neurons projecting to specific target areas at the single-cell level. This approach, called MERGE-seq, begins with multiplexed retrograde tracing by injecting distinctly barcoded rAAV-retro viruses into different target areas. The transcriptomes and barcodes of neurons in the source area are then characterized by single-cell RNA sequencing (scRNAseq) on the 10x Genomics platform. The projection targets of barcoded neurons in the source area can be inferred by matching the detected barcodes to the barcode sequences of the rAAV-retro viruses injected into the target areas.
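      The matching step described above can be illustrated with a minimal sketch. The barcode sequences, the one-mismatch tolerance, and the minimum-read cutoff below are all hypothetical stand-ins, not the sequences or stringency criteria used in the study; only the 15-nt barcode length and the five target names come from the manuscript as described by the reviewers.

```python
from collections import Counter

# Hypothetical 15-nt barcodes, one per injected target. These are NOT the
# study's sequences; they are chosen to be far apart in Hamming distance.
BARCODES = {
    "DMS": "AAAAACCCCCGGGGG",
    "MD": "CCCCCGGGGGTTTTT",
    "BLA": "GGGGGTTTTTAAAAA",
    "LH": "TTTTTAAAAACCCCC",
    "AI": "ACGTACGTACGTACG",
}
BARCODE_LEN = 15


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_target(read, barcodes=BARCODES, max_mismatch=1):
    """Map one barcode read to a target, or None if unmatched or ambiguous."""
    if len(read) != BARCODE_LEN:
        return None
    hits = [t for t, bc in barcodes.items() if hamming(read, bc) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


def projection_targets(cell_reads, min_reads=2):
    """Call a cell's targets: those supported by at least `min_reads` reads.

    `min_reads` is a hypothetical stringency cutoff, not the study's criterion.
    """
    counts = Counter(t for t in map(assign_target, cell_reads) if t is not None)
    return {t for t, n in counts.items() if n >= min_reads}


reads = [
    "AAAAACCCCCGGGGG",  # exact DMS match
    "AAAAACCCCCGGGGT",  # DMS with one mismatch
    "TTTTTAAAAACCCCC",  # LH
    "TTTTTAAAAACCCCC",  # LH
    "GGGGGTTTTTAAAAA",  # BLA, single read only
]
print(sorted(projection_targets(reads)))  # ['DMS', 'LH']
```

      Here the single BLA read falls below the cutoff, so the cell is called a DMS+LH projector. This illustrates how the choice of mismatch tolerance and read threshold directly shapes which projection patterns are reported, which is why barcode-detection robustness matters.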

      The authors validated their approach by injecting five rAAV-retro GFP viruses, each encoding a different barcode, into five known targets of the ventromedial prefrontal cortex (vmPFC). The transcriptomes and barcodes of vmPFC neurons were then analyzed by scRNA-seq with or without enrichment of retrogradely labeled neurons based on GFP fluorescence. The authors confirmed the previously described heterogeneity of vmPFC neurons. In addition, they showed that most transcriptionally defined cell types project to multiple targets and that the five targets received projections from multiple transcriptomic types. The authors further characterized the transcriptomic features of barcoded vmPFC neurons with different projection patterns and defined Pou3f1 as a marker gene of neurons extending collateral branches to the dorsomedial striatum and lateral hypothalamus.

      Overall, the results of the manuscript are convincing: the transcriptomic vmPFC cell types defined by scRNAseq in this study appear to correlate well with previous studies, the bifurcated projection patterns inferred by barcoding are validated using dual-color retro-AAV tracing, and marker genes for projection-specific cell subclasses are validated in retrogradely labeled vmPFC using RNA FISH for marker detection.

      The concept of combining retrograde tracing and scRNAseq is not new. Previous studies have applied recombinase-expressing viruses capable of retrograde labeling, such as CAV, rabies virus, and AAV2-Retro, to retrogradely label and induce the expression of fluorescence markers in projection neurons, therefore facilitating enrichment and analysis of neurons projecting to a specific target. Multiplexed analysis can be achieved with the combination of different reporter viruses or viruses expressing different recombinases and appropriate reporter mouse lines. The advantages of MERGE-seq include that no transgenic lines are required and that it could be applied at even higher levels of multiplexity.

      However, previously existing datasets that have already profiled this region with scRNAseq have not been used to their full extent. Therefore, to place the study in proper context with prior literature, bioinformatic integration of the present scRNAseq data with prior scRNAseq datasets is needed.

      Moreover, robust detection of barcodes in neurons labeled by barcoded AAV-retro viruses remains a challenge. The authors should clearly discuss the difficulties with barcode detection in this approach, as well as potential solutions, which are important for others interested in adopting this approach.

      While this study is limited to the five known targets of vmPFC, the results suggest that MERGE-seq is a valuable tool that could be used in the future to characterize projection targets and transcriptomes of neurons in a multiplexed manner. As MERGE-seq uses AAVs to deliver barcodes, this method has the potential for application in model organisms for which transgenic lines are not available. Further improvements in experimental design and data analysis should be considered when applying MERGE-seq to poorly characterized source areas or with increased multiplexity of target areas.

      In summary, this is a valuable approach, but the authors should clearly provide the context for their study within the existing literature, transparently discuss the limitations of MERGE-seq, as well as suggest improvements for the future.