  1. Mar 2026
    1. Reviewer #2 (Public review):

      Summary:

      The manuscript by Hathaway et al. describes a set of elegant behavioral experiments designed to understand which aspects of cue-reward contingencies drive risky choice behavior. The authors developed several clever variants of the well-established rodent gambling task (also developed by this group) to understand how audiovisual cues alter learning, choice behavior, and risk. Computational and sophisticated statistical approaches were used to provide evidence that: 1) audiovisual cues drive risky choice if they are paired with rewards and decrease risk if paired only with loss, 2) pairing cues with rewards reduces learning from punishment, and 3) differences in risk taking seem to be present early on in training.

      Strengths:

      The paper is well written, the experiments are well designed, and the results are highly interesting, particularly for understanding how cues can motivate and invigorate normal and abnormal behavior.

      Comments on revisions:

      The authors have done an exceptional job at addressing my initial concerns and questions regarding the evidence to support their claims. I have no additional suggestions or concerns.

    2. Reviewer #3 (Public review):

      Summary:

      In this work, Hathaway and colleagues aim to understand how audiovisual cues at the time of outcome promote selection of risky choices. A real-life illustration of this effect is found in electronic gambling machines, which signal a win with flashing lights and jingles, encouraging the player to keep betting. More specifically, the authors ask whether the cue has to be paired exclusively with wins, or whether it can be paired with both outcomes, paired exclusively with loss outcomes, or occur randomly. To tackle this question, they employ a version of the Iowa Gambling Task adapted to rats and test the effect of different rules of cue-outcome association on the probability of selecting the riskier options; they then test the effect of prior reward devaluation on the task; finally, they optimise computational models on the early phases of the experiment to investigate potential mechanisms underlying the behavioural differences.

      Strengths:

      The experimental approach is very thorough; in particular, the choice of task variants covers a wide range of potential hypotheses. Using this approach, they find that, although rats prefer the optimal choices, there is a shift towards selecting riskier options in the variants of the task where the cue is paired with win outcomes. They analyse this population-average shift by showing that there is a concurrent increase in the number of risk-taking individuals in these tasks. They also make the novel discovery that pairing cues with loss outcomes instead reduces the tendency for risky decisions.

      The computational strategy is appropriate and in keeping with the accepted state of the art: defining a set of candidate models, optimising them, comparing them, simulating the best ones to ensure they replicate the main experimental results, then analysing parameter estimates in the different tasks to speculate about potential mechanisms.

      Weaknesses:

      While the overall computational approach is excellent, there is a missed opportunity in the computational modelling section due to the choice of models which is dependent on a preceding study by Langdon et al. (2019). Loss trials come at a double cost: firstly the lost opportunity of not having selected a winning option which is reflected straightforwardly in Q-learning by the fact that r=0, secondly a waiting period which will affect the overall reward rate. The authors combine these costs by converting the time penalty into "reward currency" using three different functions which make up the three different tested models. This means the question when comparing models is not something along the lines of "are individuals in the paired win-cue tasks more sensitive to risk? or less sensitive to time? etc." but rather "what is the best way of converting time into Q-value currency to fit the data?". Instead, the authors could have contrasted other models which explicitly track time as a separate variable (see for example "Impulsivity and risk-seeking as Bayesian inference under dopaminergic control" (Mikhael & Gershman 2021)) or give actions an extra risk bonus (as in "Nicotinic receptors in the VTA promote uncertainty seeking" (Naude et al 2016)) to better disentangle the mechanisms at play.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      When do behavioral differences emerge between the task variants? Based on the results and discussion, the cues increase the salience of either the wins or the losses, biasing behavior in favor of either risky or optimal choice. If this is the case, one might expect the cues to expedite learning, particularly in the standard and loss condition. Providing an analysis of the acquisition of the tasks may provide insight into how the cues are "teaching" decision-making and might explain how biases are formed and cemented.

      While considerable differences in decision making emerge in early sessions of training, we do not observe any evidence that cuing outcomes expedites the development of stable choice patterns. Indeed, since the outcomes are cued across all four options, there is no categorical difference in salience between optimal and risky choices. Thus, our interpretation is that cuing wins and/or losses alters the integration of this feedback into choice preference, rather than the rate of the development of choice preference. To quantitatively address this point, we have included the following analysis:

      “To quantitatively examine choice variability during training, we binned sessions 1-5 and 6-10 and analyzed variability in choice patterns across task variants. Analysis of the first five sessions of training revealed a significant shift in decision score across sessions (F(3, 502) = 31.23, p <.0001), which differed between task variants (session x task: F(16, 502) = 2.13, p = .007). Conversely, while significant differences in overall score were observed between task variants in sessions 6-10 (task: F(5, 156) = 6.81, p <.0001), there was no significant variability across sessions (session: F(3, 481) = 2.06, p = .10, task x session: F(15, 481) = 0.78, p = .71). This indicates that the variability in choice preference (and presumably, learning about outcomes) is maximized in the first five sessions, and there are no obvious differences in the rate of development of stable choice patterns between task variants.”

      Does the learning period used for the modeling impact the interpretation of the behavioral results? The authors indicate that computational modeling was done on the first five sessions and used these data to predict preferences at baseline. Based on these results, punishment learning predicts choice preference. However, these animals are not naïve to the contingencies because of the forced choice training prior to the task, which may impact behavior in these early sessions. Though punishment learning may initially predict risk preference, other parameters later in training may also predict behavior at baseline.

      The first five sessions were chosen based on a previously developed method used in Langdon et al. (2019). When choosing the number of sessions to include, there is a balance between including more data points to improve estimation of parameters while also targeting the timeframe of maximal learning. As training continues, the impact of outcomes on subsequent choice should decrease, and the learning rate would trend towards zero. This can be observed in the reduction in inter-session choice variability as training progresses, as demonstrated in the analyses above. Once learning has ceased, presumably other cognitive processes may dictate choice (for example, habitual stimulus-response associations), which would not be appropriately captured by reinforcement learning models. It would be a separate research question to determine the point at which parameters are no longer predictive, requiring a larger dataset to thoroughly assess. We acknowledge that we did not provide sufficient justification for the learning period used for the modeling. In conjunction with the analysis of early sessions outlined above, we have added the following to the text:

      “We investigated differences in the acquisition of each task variant by fitting several reinforcement learning (RL) models to early sessions. Our modeling approach closely follows methods outlined in Langdon et al. (2019), in which a much larger dataset (>100 rats per task) was used to develop the RL models applied here. Due to the comparatively small n per group in the current study, we limited our model selection to those previously validated in Langdon et al. (2019), with minor extensions. As in previous work, models were fit to valid choices from the first five sessions. As training continues, the impact of outcomes on subsequent choice should decline, and parameter values may evolve over time (e.g., decreasing learning rate). To target the period of learning during which outcomes have maximal influence over choice, and parameters likely have fixed values, we limited our analyses to the first five sessions.”

      The authors also present simulated data from the models for sessions 18-20, but according to the statistical analysis section, sessions 35-40 were used for analysis (and presumably presented in Figure 1). If the simulation is carried out in sessions 35-40, do the models fit the data?

      Based on our experience, choice patterns are well instantiated by session 20, and training only continues to 30+ sessions to achieve stability in other task variables (e.g., latencies, premature responding, etc.). That being said, the discrepancy between session numbers is confusing, so we’ve extended the simulations to match the same session numbers that were analyzed in the experimental data.

      Finally, though the n's are small, it would be interesting to see how the devaluation impacts computational metrics. These additional analyses may help to explain the nuanced effects of the cues in the task variants. 

      Unfortunately, as the devaluation experiment is only one session, there are insufficient data to run the same models. Furthermore, changes in choice are subtle and not uniform across rats, making it difficult to reliably model this effect at the individual level. A separate experiment could investigate the specific cognitive processes underlying the devaluation effect.

      Reviewer #1 (Recommendations for the authors):

      The authors do not present individual data points for behavior. Including these data points would improve the interpretability of the results. Adding significance notations to the bar graphs would also help the reader. Although the stats are provided and significant comparisons highlighted, it isn't easy to move between the table and the figure to detect significant outcomes. If this is done, the statistics tables could be moved to the supplement. Including estimates of effect size for main findings in the main text would also benefit the reader.

      We thank the reviewer for their feedback on our approach to the figures and significance reporting – we have updated the relevant figures to include individual data points. Furthermore, we’ve added significance notations for task variants that are significantly different from the uncued or standard cued tasks on the figures. We’ve also moved some statistics tables to the supplement, as suggested. 

      The authors allude to other metrics of the task (trials, omissions, etc.) but do not present these data anywhere. Including supplementary figures including individual data points and statistical analyses in the supplement is strongly encouraged.

      A supplementary figure visualizing these metrics (choice latency, trials completed, and omissions) has been added, with individual data points included. Statistical analyses are reported in the main text; no significant effects were observed in the ANOVAs for any of these metrics, so post hoc analyses were not performed.

      Figure 4 is confusing. Presenting the WAIC values for each model rather than compared to the nonlinear model would be easier to understand. It is also unclear if statistical tests were used to assess differences in model fit as no test information is provided.

      Figure 4 has been updated to increase clarity and address feedback from another reviewer. Raw WAIC values are not ideal for visualization, as the task variants have differing amounts of data and thus would be difficult to include on the same Y-axis. Instead, we present each model’s difference in WAIC relative to a basic model with no timeout penalty transform, so that all three models are visible, and the direction of model improvement is clearly indicated. Statistical tests of WAIC differences are not standard, as the numerical differences themselves indicate a better fit.
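      As an illustration of the comparison described above, the following is a minimal sketch of how a WAIC difference relative to a baseline model could be computed from pointwise log-likelihoods. It assumes the widely used definition of WAIC on the deviance scale; the function names `waic` and `delta_waic` and the toy inputs are hypothetical and are not taken from the manuscript or its code.

```python
import math
from statistics import fmean, variance


def waic(log_lik):
    """WAIC on the deviance scale (lower is better).

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s. Hypothetical helper; at least two draws are required.
    """
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        peak = max(column)
        # Log of the posterior-mean likelihood, computed stably.
        lppd += peak + math.log(fmean([math.exp(v - peak) for v in column]))
        # Posterior (sample) variance of the pointwise log-likelihood.
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic)


def delta_waic(candidate, baseline):
    """Negative values mean the candidate fits better than the baseline."""
    return waic(candidate) - waic(baseline)


# Toy example: two posterior draws, three observations per model.
baseline = [[-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0]]
candidate = [[-0.5, -0.5, -0.5], [-0.5, -0.5, -0.5]]
print(delta_waic(candidate, baseline))
```

      Because each difference is taken within a task variant, the differing amounts of data per variant affect only the scale of the difference and not its sign.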

      The authors do not provide a data availability statement.

      We thank the reviewer for calling our attention to this oversight. A data availability statement has been added. 

      Reviewer #2 (Public review):

      Additional support and evidence are needed for the claims made by the authors. Some of the statements are inconsistent with the data and/or analyses or are only weakly supportive of the claims.

      We appreciate the reviewer’s overarching concern that some claims in the original manuscript were insufficiently supported by the data or analyses. To address this, we have provided further rationale for the devaluation experiment and clarified our interpretation of those results, expanded the computational modeling analyses, and revised figures and wording to improve clarity. Below, we respond to the reviewer’s specific comments in detail.

      Reviewer #2 (Recommendations for the authors):

      Different variants of an RL model were used to understand how loss outcomes impacted choice behavior across the gambling task variants. Did the authors try different variants for rewarded outcomes? I wonder whether the loss specific RL effects are constrained to that domain or perhaps emerged because choice behavior to losses was better estimated with the different RL variants. For example, rewarded outcomes across the different choices may not scale linearly (e.g., 1, 2, 3, 4) so including a model in which Rtr is scaled by a free parameter might improve the fit for win choices.

      We agree that asymmetries in model flexibility could, in principle, contribute to the observed effects. While we are somewhat limited in our ability to develop and validate further models due to the small size of the datasets compared to the high degree of choice variability between rats, we have explored the possibility as far as the data allow by fitting a model that includes a scaling parameter for rewards in addition to punishments:

      “While we restricted our model selection to those previously validated on larger datasets, the specificity of the main finding to the punishment learning rate may be due to the greater flexibility afforded to loss scaling, rather than a true asymmetry in learning. To test this hypothesis, we fit a model featuring a scaling parameter for rewards, in addition to scaled costs:

      where mRew is a linear scaling parameter for reward size. A separate scaling parameter was used for timeout penalty duration (i.e., same as scaled cost model). Group-level parameter estimates (Figure S3) reflected similar differences in the punishment learning rate and reward learning rate as the scaled cost model (Figure S4). Furthermore, all 95% HDIs for the mRew scaling parameter included 1, indicating that at least at the group level, scaling of reward size across the P1-P4 options closely follows the actual number of earned sucrose pellets. Thus, we find no evidence that our results can be simply attributed to the increased parameterization of losing outcomes.”
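      The equation referred to in the quoted passage is not reproduced in this copy. As a rough illustration only, a delta-rule update with separate learning rates and linear scaling of both reward size and timeout duration might look like the sketch below. The functional form, the function name `update_q`, and all parameter names are assumptions made for illustration and should not be read as the authors' exact model.

```python
def update_q(q, reward_pellets, timeout_s, eta_reward, eta_punish, m_rew, m_pun):
    """One-trial update of the chosen option's Q-value (assumed form).

    eta_reward / eta_punish: reward and punishment learning rates.
    m_rew: linear scaling of reward size (the text's mRew).
    m_pun: linear scaling of timeout duration (scaled cost).
    """
    if reward_pellets > 0:
        # Win: move Q toward the scaled reward size.
        target = m_rew * reward_pellets
        return q + eta_reward * (target - q)
    # Loss: the timeout is converted into a negative value in reward units.
    target = -m_pun * timeout_s
    return q + eta_punish * (target - q)


# With m_rew = 1, the win target equals the number of pellets earned.
q_after_win = update_q(0.0, 2, 0, 0.5, 0.5, 1.0, 0.1)
q_after_loss = update_q(1.0, 0, 10, 0.5, 0.2, 1.0, 0.1)
print(q_after_win, q_after_loss)
```

      Under this assumed form, a group-level estimate of the reward scaling parameter close to 1 corresponds to win targets that track the actual pellet counts, which is the pattern the quoted passage reports.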

      Additionally, I would like to see evidence that these alternative models provide a better fit compared to a standard delta-rule updating for unrewarded choices.

      Each model is now compared directly to a standard delta-rule update model in the WAIC figure to demonstrate that the current models are a better fit for the data.

      Could the authors provide some visualization of how variation in the r, m, or b parameters impact choices and/or patterns of choices?

      We have added a figure to the supplementary section to visualize how different values for the r, m, and b parameters could alter the size of updates to Q-values on each trial across the four different options, thereby impacting subsequent choice. 

      It was challenging to understand the impact of the reported effects and interpretation of the authors at various points in the manuscript. For example, the authors state that "only rats trained on tasks without win-paired cues exhibited shifts in risk preference following reinforcer devaluation". Figure 3 however seems to indicate that rats trained on the reverse-cued task show shifts in risk preference. 

      We agree the original wording did not fully capture the nuance apparent in the figure. While not significantly different from baseline, rats in the reverse-cued experiment could have indeed updated their choice patterns and we were underpowered to detect the effect. We have updated the results section to include this point, and to more specifically outline that win-paired cues that scale with reward size lead to insensitivity to reinforcer devaluation:

      “This indicates that pairing audiovisual cues with reward induces some degree of inflexibility in risk-preferring rats. Importantly, pairing cues with losses alone does not elicit rigidity in choice. Thus, in keeping with the observed effect on overall choice patterns, pairing cues with wins has a unique impact on sensitivity to reinforcer devaluation. Although not statistically significant, visual inspection of the reverse-cued task suggests that some choice flexibility may be present, and the study may be underpowered to detect this effect. Nonetheless, win-paired cues that scale with reward size reduce flexibility in choice patterns following reinforcer devaluation.”

      It was not clear to me why the authors did a devaluation test and what was expected. Adding details regarding the motivation for specific analyses and/or experiments would improve understanding of these exciting results.

      Further explanation has been added to the results section for the devaluation test to clarify the rationale and expected results:

      “We next tested whether pairing salient audiovisual cues with outcomes on the rGT impacts flexibility in decision making when outcome values are updated. Reinforcer devaluation, in which subjects are sated on the sugar pellet reinforcer prior to task performance (presumably devaluing the outcome), is a common test of flexibility of decision making (Adams & Dickinson, 1981). We have previously employed this method to demonstrate that rats trained on the standard-cued task are insensitive to reinforcer devaluation (i.e., choice patterns do not shift despite devaluation of the sugar pellet reward; Hathaway et al., 2021).”

      Some rats in the rGT become risk takers and some do not, but whether this is an innate phenomenon or emerges with training is not known. The authors report some correlations between the RL parameters and subsequent risk scores, but this may be an artifact because the risk scores and many of the parameters differ between the experimental groups. Restricting these analyses to the rats in the standard procedure (or even conducting it in other rats that have been run in the rGT standard task) would alleviate this concern. The authors should also expand upon this result in the discussion (if it holds up) and provide graphs of this relationship in the manuscript.

      In a previous paper on which these analyses were based (Langdon et al., 2019), analyses of the relationship between RL parameter estimates and final decision score were conducted separately for rats trained on either the uncued or standard cued task, as the reviewer has suggested here. Those analyses showed that parameters controlling the learning from negative outcomes were specifically related to final score in both tasks. While we don’t have the appropriate n per group to split the analyses by task variant in the current study, we have highlighted these previous findings in the results section to address this concern:

      “In Langdon et al. (2019), analyses were conducted to test whether parameters controlling sensitivity to punishment predicted final decision score at the end of training in the uncued and standard cued task variants. These analyses showed that across both task variants, there was evidence of reduced punishment sensitivity (i.e., lower m parameter or punishment learning rate) in risky versus optimal rats. We conducted similar analyses here to examine whether parameter estimates covary with decision score at end of training. To accomplish this, we fit simple linear regression models for each parameter and assessed whether the slopes were significantly different from zero.”

      I don't see a b parameter in the nonlinear cost model, but it is presented in Figure 6 and also in the "Parameters predicting risk preference on the rGT" section. The authors either need to update the formula or clarify what the b parameter quantifies in the nonlinear model.

      We thank the reviewer for pointing out this oversight; the equation has been updated to include the b parameter.

      The risk score is very confusing as high numbers or % indicate less risk and lower (more negative numbers) indicate greater risk. I've had to reread the text multiple times to remind myself of this, so I anticipate the same will be true for other readers. Perhaps the authors can add a visual guide to their y-axis indicating more positive numbers are less risky choices.

      We acknowledge that this measure can be confusing – the calculation of this score is standard for the Iowa Gambling Task conducted in humans, on which the rGT is based, and was therefore adopted here. We’ve changed the name from “risk score” to “decision score”, along with including a visual guide to the y-axis in Figure 2, to address this point.
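      For readers unfamiliar with the measure, a minimal sketch of the score is given below. It assumes the usual convention that P1 and P2 are the optimal options and P3 and P4 the risky ones, so that positive values indicate less risky choice; the function name `decision_score` and the example counts are hypothetical.

```python
def decision_score(p1, p2, p3, p4):
    """Percent choice of optimal options minus percent choice of risky options.

    Arguments are choice counts for options P1-P4. The P1/P2 = optimal and
    P3/P4 = risky assignment is an assumption stated in the text above.
    """
    total = p1 + p2 + p3 + p4
    if total <= 0:
        raise ValueError("at least one choice is required")
    return 100.0 * ((p1 + p2) - (p3 + p4)) / total


print(decision_score(30, 50, 10, 10))  # mostly optimal choices
print(decision_score(5, 5, 45, 45))    # mostly risky choices
```

      The score ranges from -100 (exclusively risky choice) to +100 (exclusively optimal choice).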

      Negative learning rate is confusing as it almost implies that the learning was a negative value, rather than being a learning rate for negative outcomes. Please revise in the figures and in the text.

      We have updated the text and figures where appropriate from “negative learning rate” to “punishment learning rate”. We have also changed the text from “positive learning rate” to “reward learning rate” to match this terminology.

      Reviewer #3 (Public review):

      There is a very problematic statistical stratagem that involves categorising individuals as either risky or optimal based on their choice probabilities. As a measurement or outcome, this is fine, as previously highlighted in the results, but this label is then used as a factor in different ANOVAs to analyse the very same choice probabilities, which then constitutes a circular argument (individuals categorised as risky because they make more risky choices, make more risky choices...).

      Risk status was included as a factor to test whether the effects of the cue paradigms differed between risky versus optimal rats (i.e., interaction effects), not as an independent predictor of choice preference. We focus on results showing a significant task x risk status interaction, and conducted follow-up analyses separately within each group, at which point risk status was no longer included as a factor. We do not interpret main effects or choice x status interactions, which would indeed be circular for the reason noted by the reviewer.

      A second experiment was done to study the effect of devaluation on risky choices in the different tasks. The results, which are not easy to understand from Figure 3, would suggest that reward devaluation affects choices in tasks where the win-cue pairing is not present. The authors interpret this result by saying that pairing wins with cues makes the individuals insensitive to reward devaluation. Counter to this, if an individual is prone to making risky choices in a given task, this points to an already distorted sense of value, as the most rewarding strategy is to make optimal non-risky choices.

      We have included significance notations in Figure 3 and included further detail in the text to improve clarity of the findings for the devaluation test. The reviewer raises an interesting point that risk-preferring rats have a distorted sense of value, since they do not follow the optimal strategy. However, we believe that this is at least partially separable from insensitivity to devaluation, since risk-preferring rats trained on tasks that don’t feature win-paired cues still exhibit flexibility in choice. We have added the following point to the discussion to address this:

      “While risk-preferring rats exhibit some degree of distortion in reward valuation, as they do not follow the most rewarding strategy (i.e., selecting optimal options), we believe this to be at least partially separable from choice inflexibility, as risk-preferring rats on tasks that don’t feature win-paired cues remain sensitive to devaluation.”

      While the overall computational approach is excellent, I believe that the choice of computational models is poor. Loss trials come at a double cost, something the authors might want to elaborate more upon, firstly the lost opportunity of not having selected a winning option which is reflected in Q-learning by the fact that r=0, and secondly a waiting period which will affect the overall reward rate. The authors choose to combine these costs by attempting to convert the time penalty into "reward currency" using three different functions that make up the three different tested models. This is a bit of a wasted opportunity as the question when comparing models is not something like "are individuals in the paired win-cue tasks more sensitive to risk? or less sensitive to time? etc" but "what is the best way of converting time into Q-value currency to fit the data?" Instead, the authors could have contrasted other models that explicitly track time as a separate variable (see for example "Impulsivity and risk-seeking as Bayesian inference under dopaminergic control" (Mikhael & Gershman 2021)) or give actions an extra risk bonus (as in "Nicotinic receptors in the VTA promote uncertainty seeking" (Naude et al 2016)).

      We thank the reviewer for their thoughtful suggestions and agree that alternative modeling frameworks that explicitly track time or incorporate uncertainty bonuses would be highly informative for understanding the mechanisms underlying risky choice. However, the models employed here are drawn from previous work that required >100 rats per group for model development and validation. Due to the high degree of variability in decision making within the groups and the relatively small number of rats, this dataset is not well suited for substantial model innovation. Indeed, the most complex model from previous work had to be simplified to achieve model convergence. Testing models that greatly diverge from the previously validated RL models would make it difficult to determine whether poor model fit reflects a misspecified model or insufficient data.

      We’d also like to note that the driving question for this study is to investigate the impact of different cue variants on choice patterns – untangling the relationship between timing, uncertainty, and risky choice is an important and interesting question, but beyond the scope of the present work. 

      To address this limitation, we have expanded our justification of model choice in the results section to emphasize that we are applying previously developed models, with minor extensions:

      “We investigated differences in the acquisition of each task variant by fitting several reinforcement learning (RL) models to early sessions. Our modeling approach closely follows methods outlined in Langdon et al. (2019), in which a much larger dataset (>100 rats per task) was used to develop the RL models applied here. Due to the comparatively small n per group in the current study, we limited our model selection to those previously validated in Langdon et al. (2019), with minor extensions.”

      Another weakness of the computational section is that, despite simulations having been run, Figure 5 shows only the simulated risk scores and not the different choice probabilities, which would be a much more interesting metric by which to judge model validity.

      We have expanded Figure 5 to show the simulated choice of each option.

      In the last section, the authors ask whether the parameter estimates (obtained from optimisation on the early sessions) could be used to predict risk preference. While this is an interesting question to address, the authors give very little explanation as to how they establish any predictive relationship. A figure and more detailed explanation would have been warranted to support their claims.

      We have expanded this section to provide clearer detail on the methods used to conduct this analysis and added a figure. To address a point raised by another reviewer, the statistical approach has been revised to more closely align with that used in Langdon et al. (2019), and the results have been updated appropriately:

      “We next tested whether any of the subject-level parameter estimates in the nonlinear or scaled + offset model could reliably predict risk preference scores at the end of training. In Langdon et al. (2019), analyses were conducted to test whether parameters controlling sensitivity to punishment predicted final decision score at the end of training in the uncued and standard cued task variants. These analyses showed that across both task variants, there was evidence of reduced punishment sensitivity (i.e., lower m parameter or punishment learning rate) in risky versus optimal rats. We conducted similar analyses here to examine whether parameter estimates covary with decision score at end of training. To accomplish this, we fit simple linear regression models for each parameter and assessed whether the slopes were significantly different from zero.”
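      The regression step described in the quoted passage can be sketched as follows: an ordinary least-squares fit of final decision score on one subject-level parameter, returning the slope and the t statistic used to test it against zero. The function name `slope_t_statistic` and the toy data are hypothetical, and the p-value step (comparing the t statistic with a t distribution on n - 2 degrees of freedom) is left out because it needs a statistics library.

```python
import math


def slope_t_statistic(x, y):
    """Simple linear regression of y on x.

    Returns (slope, t statistic for slope = 0, residual degrees of freedom).
    Assumes at least three points and a non-perfect fit.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope.
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se, n - 2


# Toy data: a parameter estimate per rat (x) and final decision score (y).
slope, t_stat, df = slope_t_statistic([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(slope, t_stat, df)
```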

      Why were the simulated risk scores calculated for sessions 18-20 and not 35-39 as in the experimental data, and why were the models optimised only on the first sessions?

      These points were addressed in response to reviewer #1:

      Based on our experience, choice patterns are well instantiated by session 20, and training only continues to 30+ sessions to achieve stability in other task variables (e.g., latencies, premature responding, etc.). That being said, the discrepancy between session numbers is confusing, so we’ve extended the simulations to match the same session numbers that were analyzed in the experimental data.

      The first five sessions were chosen based on a previously developed method used in Langdon et al. (2019). When choosing the number of sessions to include, there is a balance between including more data points to improve estimation of parameters while also targeting the timeframe of maximal learning. As training continues, the impact of outcomes on subsequent choice should decrease, and the learning rate would trend towards zero. This can be observed in the reduction in inter-session choice variability as training progresses, as demonstrated in the analyses above. Once learning has ceased, presumably other cognitive processes may dictate choice (for example, habitual stimulus-response associations), which would not be appropriately captured by reinforcement learning models. It would be a separate research question to determine the point at which parameters are no longer predictive, requiring a larger dataset to thoroughly assess. We acknowledge that we did not provide sufficient justification for the learning period used for the modeling. In conjunction with the analysis of early sessions outlined above, we have added the following to the text:

      “We investigated differences in the acquisition of each task variant by fitting several reinforcement learning (RL) models to early sessions. Our modeling approach closely follows methods outlined in Langdon et al. (2019), in which a much larger dataset (>100 rats per task) was used to develop the RL models applied here. Due to the comparatively small n per group in the current study, we limited our model selection to those previously validated in Langdon et al. (2019), with minor extensions. As in previous work, models were fit to valid choices from the first five sessions. As training continues, the impact of outcomes on subsequent choice should decline, and parameter values may evolve over time (e.g., decreasing learning rate). To target the period of learning during which outcomes have maximal influence over choice, and parameters likely have fixed values, we limited our analyses to the first five sessions.”
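
      To make the idea of fitting an RL model to early-session choices concrete, the sketch below computes the negative log-likelihood of a choice sequence under a generic delta-rule learner with a softmax choice rule. It is an assumption-laden illustration, not one of the models from Langdon et al. (2019): the parameter names `eta_pos`, `eta_neg`, and `beta`, the use of separate learning rates for wins and losses, and the coding of outcomes as signed numbers are all hypothetical choices made here.

```python
import math

def negative_log_likelihood(choices, outcomes, eta_pos, eta_neg, beta, n_options=4):
    """Negative log-likelihood of observed choices under a delta-rule learner.

    choices  : option index chosen on each trial (0 .. n_options - 1)
    outcomes : signed outcome per trial (positive = win, otherwise loss)
    eta_pos  : learning rate applied after wins (hypothetical name)
    eta_neg  : learning rate applied after losses (hypothetical name)
    beta     : softmax inverse temperature
    """
    q = [0.0] * n_options  # option values start at zero
    nll = 0.0
    for choice, outcome in zip(choices, outcomes):
        # Softmax probability of the chosen option (shifted for numerical stability).
        scaled = [beta * v for v in q]
        top = max(scaled)
        exps = [math.exp(s - top) for s in scaled]
        nll -= math.log(exps[choice] / sum(exps))
        # Delta-rule update of the chosen option only.
        eta = eta_pos if outcome > 0 else eta_neg
        q[choice] += eta * (outcome - q[choice])
    return nll
```

      Parameter estimates would then be obtained by minimising this quantity over the trials of the chosen sessions. If the learning rate drifts towards zero later in training, fixed-parameter fits of this kind become less informative, which is the rationale given above for restricting the fit to the first five sessions.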

      Concerning the figures, could you consider replacing the bar plots with, or adding to them, the full distribution of individual data points or a violin plot, to better capture the distribution of the data? This would be particularly beneficial for the risk score in Figure 2B, which, without a distribution, suggests that all individuals are optimal, something the text states is not the case.

      Individual data points have been added to the relevant figures.

      Is this not a case of compositional data, for which ANOVA is not an appropriate method? Compositional data report the proportions of different elements in a whole (e.g., this rock is 60% silicate, 20% man-made cement, etc.). They violate normality and, above all, independence between measurements: the sum must be 100%, as in your case, where knowing the proportions of P1, P2, and P3 automatically gives P4. I leave it to you to find a potential alternative. In any case, I also had difficulty understanding the varying degrees of freedom of the different reported F statistics, which makes me worry that this has not been done properly.

      This is a fair criticism, as choice proportions across P1-P4 are not fully independent. While alternative approaches do exist, there is no widely adopted or straightforward method that has been validated for this task. Accordingly, ANOVA remains the standard analytical approach for this task, as it facilitates comparison with previous work and is readily understood by readers. As mentioned in the methods, an arcsine transformation was applied to the proportional data to mitigate issues associated with bounded measures (i.e., summing to 100%). We thank the reviewer for drawing our attention to the discrepancies in the degrees of freedom – these have now been corrected.
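
      For readers unfamiliar with the transformation mentioned above, a minimal sketch follows. It assumes the common arcsine square-root form, asin(sqrt(p)); the response does not state which variant was used, and the function names and example proportions are hypothetical.

```python
import math

def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))

def transform_profile(proportions):
    """Apply the transform to each choice proportion (e.g. P1 to P4) separately."""
    return [arcsine_sqrt(p) for p in proportions]

# Hypothetical choice profile for one rat; the raw proportions sum to 1.
profile = [0.10, 0.60, 0.20, 0.10]
transformed = transform_profile(profile)
```

      The transform stretches values near 0 and 1, which helps with the bounded scale. Because it is applied to each proportion separately, P1 to P4 remain mutually dependent after transformation.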

    1. eLife Assessment

      This study provides a useful analysis of the changes in chromatin organization and gene expression that occur during the differentiation of two cell types (anterior endoderm and prechordal plate) from a common progenitor in zebrafish, together with investigations into the molecular factors involved. Although the findings are consistent with previous work, the evidence presented appears to be incomplete and would benefit from more rigorous quantification of live imaging and Cre-Lox experiments, a stronger rationale and controls for experiments manipulating chromatin remodeling factors, and a strong justification for the explant model especially given differences between explant and whole embryo data. This work may be of interest to zebrafish developmental biologists investigating the mechanisms underlying specification.

    2. Reviewer #1 (Public review):

      Summary:

      During vertebrate gastrulation, mesendoderm cells are initially specified by morphogens (e.g. Nodal) and segregate into endoderm and mesoderm in part based on Nodal concentrations. Using zebrafish genetics, live imaging, and single-cell multi-omics, the manuscript by Cheng et al presents evidence to support a claim that anterior endoderm progenitors derive primarily from prechordal plate progenitors, with transcriptional regulators goosecoid (gsc) and ripply1 playing key roles in this cell fate determination. Such a finding would represent a significant advance in our understanding of how anterior endoderm is specified in vertebrate embryos.

      Strengths:

      Live imaging-based tracking of PP and endo reporters (Fig 2) is well executed and convincing, though a larger number of individual cell tracks will be needed. In the first round of review, only a single cell track (n=1) was quantified. Now, more tracks have been collected, but these data are still not reported clearly enough to allow their evaluation.

      Weaknesses:

      (1) While the authors have made an effort to include a gsc:CRE lineage tracing component to their study, the experimental data now presented (Figure S4E and reviewer figures) could be much stronger and more thorough. In the new panel, the authors show a single microscopy image containing both red and green fluorescent cells. The green signal, which seems to mark the PP, is presumably derived from Tg(gsc:EGFP). The red mCherry signal is presumably derived from the combined effects of a Tg(gsc:CRE) and Tg(sox17-lox-STOP-lox-mCherry), i.e., labeling the progeny of gsc+ progenitors which expressed CRE and underwent recombination to create a productive endoderm-specific Tg(sox17:mCherry) reporter. The result appears to be promising and in line with the authors' predictions. However, this result should be strengthened by performing the experiment in stable transgenic lines (not just freshly injected F0 embryos) and should be properly quantified. The authors state in the legend that "the experiment was performed on at least 3 independent replicates", but offer no further detail, explanations, or quantifications. This issue is reminiscent of concerns from the previous round of review, where live tracking data derived from examining just a single (n=1) cell were presented. These standards might be adequate for generating preliminary insights, but fall far below what we would have previously expected from an eLife publication.

      (2) I found the authors' rebuttal to my concerns about URD-trajectory derived insights and gsc/sox17 expression timing confusing. The authors claim that they get different results regarding gsc expression prevalence in the hypothetical PP/endoderm progenitor cluster when comparing scRNAseq data from embryos vs explants. Then they seem to use this difference to justify the use of the explants over the embryos - presumably because the explants enriched for the behavior that they wanted to see? They conclude that "directly using embryonic data to dissect the mechanism of fate separation between PP and anterior endoderm might not yield highly accurate results." I strongly disagree with this. I would argue that the whole-embryo dataset is likely doing a better job of cleanly separating these trajectories from each other.

      (3) My concern about the use of n=1 cell for live tracking has been partially but not fully addressed. The authors should plot data points from each individual cell in the revised Figure 2D. Instead of just saying "multiple cells", they should report the total number of cells that are actually included now (n=?), and should provide representative movies for a few additional examples.

      At present the authors' data, as presented, still only partially support their aims and conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      During vertebrate gastrulation, the mesoderm and endoderm arise from a common population of precursor cells and are specified by similar signaling events, raising questions as to how these two germ layers are distinguished. Here, Cheng and colleagues use zebrafish gastrulation as a model for mesoderm and endoderm segregation. By reanalyzing published single cell sequencing data, they identify a common progenitor population for anterior endoderm and the mesodermal prechordal plate (PP). They find that expression levels of PP genes gsc and ripply are among the earliest differences between these populations, and that their increased expression suppresses the expression of endoderm markers. Further analysis of chromatin accessibility and Ripply CUT-and-TAG is consistent with direct repression of endoderm by this PP marker. This study demonstrates roles for Gsc and Ripply in suppressing anterior endoderm fate, but this role for Gsc was already known and the effect of Ripply is limited to a small population of anterior endoderm.

      Strengths:

      Integrated single cell ATAC- and RNA-seq convincingly demonstrate changes in chromatin accessibility that may underlie segregation of mesoderm and endoderm lineages, including gsc and ripply. Identification of Ripply-occupied genomic regions augments this analysis. The genetic mutants for both genes provide strong evidence for their function in anterior mesendoderm development, although these phenotypes are subtle.

      Weaknesses:

      The use of zebrafish embryonic explants for cell fate trajectory analysis (rather than intact embryos) is not justified. Much of the work is focused on the role of Nodal in the mesoderm/endoderm fate decision, but the results largely confirm previous studies and again provide few new insights. The authors similarly confirm previous findings that FGF signaling likely plays a larger role in this fate decision, but these results are largely overlooked by the authors.

    4. Reviewer #3 (Public review):

      Summary of work:

      Cheng, Liu, Dong, et al. demonstrate that anterior endoderm cells can arise from prechordal plate progenitors, which is suggested by pseudotime reanalysis of published scRNAseq data, pseudotime analysis of new scRNAseq data generated from Nodal-stimulated explants, live imaging from sox17:DsRed and gsc:eGFP transgenics, fluorescent in situ hybridization, and a Cre/Lox system. Early fate mapping studies already suggested that progenitors at the dorsal margin give rise to both of these cell types (Warga), and live imaging from the Heisenberg lab (Sako 2016, Barone 2017) convincingly showed this previously. However, the data presented for this point are very nice and further cement this result. Though better demonstrated by previous work (Alexander 1999, Gritsman 1999, Gritsman 2000, Sako 2016, Rogers 2017, others), the manuscript presents confirmatory data that high Nodal signaling is required for both cell types. The manuscript generates new single-cell RNAseq data from Nodal-stimulated explants with increased (lft1 KO) or decreased (ndr1 KD) Nodal signaling and multi-omic ATAC+scRNAseq data from wild-type 6 hpf embryos, which can be used as a resource, though few new conclusions are drawn from it in this manuscript. Lastly, the manuscript suggests that SWI/SNF remodelers and Ripply1 may be involved in the anterior endoderm - prechordal plate decision, but these data are less convincing. The SWI/SNF remodeler experiments are unconvincing because the demonstration that these factors are differentially expressed or active between the two cell types is weak. The Ripply1 gain-of-function experiments are unconvincing because they are based on incredibly high overexpression of ripply1 (500 pg or 1000 pg) that generates a phenotype that is not in line with previously demonstrated overexpression studies (with phenotypes from 10-20x lower expression). Similarly, the cut-and-tag data seem low quality and are based on high overexpression, so may not support direct binding of ripply1 to these loci.

      During revision, the authors addressed some comments, including eliminating references to "lineage" when referring to pseudotime trajectories, eliminating conclusions drawn from locations of cells on UMAP plots, and reducing use of the term "cooperative" which may have been confusing in this context, as well as increasing the number of embryos analyzed for some experiments. The authors also point out that whole-embryo transcriptional trajectories typically do not associate endodermal cells with prechordal plate cells, despite classical evidence that they are related. This is most likely because endodermal cells arise from several different previous transcriptional states in different regions of the embryonic margin and are, as the authors point out, difficult to computationally sort into dorsal, lateral, and ventral populations. Thus, there is value in generating data to more specifically look at the relationship between dorsal mesodermal and endodermal populations. However, the decision to use an artificial Nodal-treated explant system, rather than isolating the relevant population from whole embryos (such as by dissection prior to dissociation), remains a weakness of the manuscript, since it is unclear whether endodermal specification has been altered in this system (there seem to be few endodermal cells produced and the system involves manipulating one of the signals under study in this work). Concerns about the rigor of the ripply1 and SWI/SNF experiments remain. While the authors improved peak calling in their ripply1 cut-and-tag, it is still based on massive overexpression of ripply1 that may drive binding outside of its endogenous loci.

      In the end, this study provides some additional details in the cell fate decision between the prechordal plate and anterior endoderm and generates new data that may be useful for reanalysis by other experts in the field. However, this work does not make clear how Nodal signaling, FGF signaling, and elements of the gene regulatory network (including gsc, possibly ripply1, and other factors) interact to make the decision. I suggest that this manuscript is of interest to Nodal signaling or zebrafish germ layer patterning aficionados, but may not be of interest to a broad audience. While it provides new datasets and observations, it does not weave these into a convincing story that advances our understanding of the specification of these cell types.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study provides a useful analysis of the changes in chromatin organization and gene expression that occur during the differentiation of two cell types (anterior endoderm and prechordal plate) from a common progenitor in zebrafish. Although the findings are consistent with previous work, the evidence presented in the study appears to be incomplete and would benefit from more rigorous interpretation of single-cell data, more in-depth lineage tracing, overexpression experiments with physiological levels of Ripply, and a clearer justification for using an explant system. With these modifications, this paper will be of interest to zebrafish developmental biologists investigating mechanisms underlying differentiation.

      We sincerely thank the editor and the reviewers for their valuable time and efforts. Their insightful comments were greatly appreciated and have been largely addressed in the revised manuscript. We are confident that these revisions have enhanced the overall quality and clarity of our paper.

      Reviewer #1 (Public review):

      Summary:

      During vertebrate gastrulation, mesendoderm cells are initially specified by morphogens (e.g. Nodal) and segregate into endoderm and mesoderm in part based on Nodal concentrations. Using zebrafish genetics, live imaging, and single-cell multi-omics, the manuscript by Cheng et al presents evidence to support a claim that anterior endoderm progenitors derive primarily from prechordal plate progenitors, with transcriptional regulators goosecoid (Gsc) and ripply1 playing key roles in this cell fate determination. Such a finding would represent a significant advance in our understanding of how anterior endoderm is specified in vertebrate embryos.

      We would like to thank reviewer #1 for their comments and positive feedback about our manuscript.

      Strengths:

      Live imaging-based tracking of PP and endo reporters (Figure 2) is well executed and convincing, though a larger number of individual cell tracks will be needed. Currently, only a single cell track (n=1) is provided.

      We thank the reviewer for the positive comments and the valuable suggestion. As the reviewer suggested, we repeated the live imaging analyses on Tg(gsc:EGFP;sox17:DsRed) embryos. We tracked dozens of cells during their transition from gsc-positive to sox17-positive. Furthermore, we quantified the RFP/GFP signal intensity ratio in these cells over the course of development (Please see the revised Figure 2D and Movie S4).

      Weaknesses:

      (1) The central claim of the paper - that the anterior endoderm progenitors arise directly from prechordal plate progenitors - is not adequately supported by the evidence presented. This is a claim about cell lineage, which the authors are attempting to support with data from single-cell profiling and genetic manipulations in embryos and explants. The construction of gene expression (pseudo-time) trajectories, while a modern and powerful approach for hypothesis generation, should not be used as a substitute for bona fide lineage tracing methods. If the authors' central hypothesis is correct, a CRE-based lineage tracing experiment (e.g. driving CRE using a PP marker such as Gsc) should be able to label PP progenitor cells that ultimately contribute to anterior endoderm-derived tissues. Such an experiment would also allow the authors to quantify the relative contribution of PP (vs non-PP) cells to the anterior endoderm, which is not possible to estimate from the indirect data currently provided. Note: while the present version of the manuscript does describe a sox17:CRE lineage tracing experiment, this actually goes in the opposite direction to the one that would be informative (sox17:CRE-marked descendants will be a mixture of PP-derived and non-PP derived cells, and the Gsc-based reporter does not allow for long-term tracking of the fates of these cells).

      We sincerely thank the reviewer for the professional comments and the constructive suggestions. As the reviewer indicated, utilizing the single-cell transcriptomic trajectory analyses on zebrafish embryos and the Nodal-injected explant system, along with the live imaging analyses on Tg(gsc:EGFP;sox17:DsRed) embryos, we revealed that anterior endoderm progenitors arise from prechordal plate progenitors. To further verify this observation, we conducted two sets of lineage-tracing assays. Initial evidence came from the results of co-injecting sox17:Cre and gsc:loxp-STOP-loxp-mcherry plasmids. We observed RFP-positive cells at 8 hpf, demonstrating the presence of cells that had expressed both genes. To explicitly follow the proposed lineage, we then implemented a reciprocal strategy, as suggested by the reviewer, in which we constructed and co-injected sox17:loxp-STOP-loxp-mcherry and gsc:Cre plasmids. The appearance of RFP-positive cells in the anterior dorsal region at 8 hpf provides direct evidence for a transition from gsc-positive to sox17-positive identity. These results are now included in the revised manuscript (Please see Author response image 1 and Figure S4E). However, in accordance with the reviewer's caution, we acknowledge that this does not prove this is the sole origin of anterior endoderm. Consequently, we have revised the text to clarify that our findings demonstrate that anterior endoderm can be specified from prechordal plate progenitors, without claiming that it is the only source.

      Author response image 1.

      Characterization of anterior endoderm lineage by Cre-Lox recombination system.

      (2) The authors' descriptions of gene expression patterns in the single-cell trajectory analyses do not always match the data. For example, it is stated that goosecoid expression marks progenitor cells that exist prior to a PP vs endo fate bifurcation (e.g. lines 124-130). Yet, in Figure 1C it appears that in fact goosecoid expression largely does not precede (but actually follows) the split and is predominantly expressed in cells that have already been specified into the PP branch. Likewise, most of the cells in the endo branch (or prior) appear to never express Gsc. While these trends do indeed appear to be more muddled in the explant data (Figure 1H), it still seems quite far-fetched to claim that Gsc expression is a hallmark of endoderm-PP progenitors.

      We thank the reviewer for pointing out this issue. Our initial analysis proposed that the precursors of the prechordal plate (PP) and anterior endoderm (endo) more closely resemble a PP cell fate, as their progenitor populations highly express PP marker genes, such as gsc. The gsc gene is widely recognized as a PP marker[1]. The reviewer pointed out that in our analysis, these precursor cells do not initially exhibit high gsc expression; rather, gsc expression gradually increases as PP fate is specified.

      The reason for this observation is as follows: First, for the in vivo data, we used the URD algorithm to trace back all possible progenitor cells for both the PP and anterior endo trajectory. As mentioned in the manuscript, the PP and anterior endo are relatively distant in the trajectory tree of the zebrafish embryonic data. Consequently, this approach likely included other, confounding progenitor cells that do not express gsc (like ventral epiblast, Author response image 2). However, we further investigated the expression of gsc and sox17 along these two trajectories. The conclusion remains that gsc expression is indeed higher than sox17 in the progenitor cells common to both trajectories (Author response image 2). Combined with the live imaging analysis presented in this study, which shows that gsc expression increases progressively in the PP, this supports the notion that the progenitor cells for both PP and anterior endoderm initially bias towards a PP cell fate.

      On the other hand, in our previously published work using the Nodal-injected explant system, which specifically induces anterior endo and PP, the cellular trajectory analysis also revealed that the specifications of PP and anterior endo follow very similar paths. Therefore, we proceeded to analyze the Nodal explant data. Similarly, when using URD to trace the differentiation trajectories of PP and anterior endo cells, a small number of other progenitor cells were also captured. This explains why a minority of cells do not express gsc—these are likely ventral epiblast cells (Author response image 2). However, based on the Nodal explant data, gsc is specifically highly expressed in the progenitor cells of the PP and anterior endo. Its expression remains high in the PP trajectory but gradually decreases in the endoderm trajectory (Figure 1H).

      Author response image 2.

      (A) The expression of ventral epiblast markers in PP and anterior Endo URD trajectory. (B) The expression of gsc, sox32 and sox17 in the progenitors of PP and anterior endo in embryos and Nodal explants.

      (3) The study seems to refer to "endoderm" and "anterior endoderm" somewhat interchangeably, and this is potentially problematic. Most single-cell-based analyses appearing in the study rely on global endoderm markers (sox17, sox32) which are expressed in endodermal precursors along the entire ventrolateral margin. Some of these cells are adjacent to the prechordal plate on the dorsal side of the gastrula, but many (most in fact) are quite some distance away. The microscopy-based evidence presented in Figure 2 and elsewhere, however, focuses on a small number of sox17-expressing cells that are directly adjacent to, or intermingled with, the prechordal plate. It, therefore, seems problematic for the authors to generalize potential overlaps with the PP lineage to the entire endoderm, which includes cells in ventral locations. It would be helpful if the authors could search for additional markers that might stratify and/or mark the anterior endoderm and perform their trajectory analysis specifically on these cells.

      We thank the reviewer for these comments and suggestions. We fully agree with the reviewer's point that the expression of sox32 and sox17 cannot be used to distinguish dorsal endoderm from ventral-lateral endoderm cells. However, during the gastrulation stage, all endodermal cells express sox32 and sox17, and there are currently no specific marker genes available to distinguish between them.

      After gastrulation ends, the dorsal endoderm (i.e., the anterior endoderm) begins to express pharyngeal endoderm marker genes, such as pax1b. Therefore, in the analysis of embryonic data in vivo, when studying the segregation of the anterior endoderm and PP trajectory, we specifically used the pharyngeal endoderm as the subject to trace its developmental trajectory.

      In the case of Nodal explants, Nodal specifically induces the fate of the dorsal mesendoderm, which includes both the PP and pharyngeal endoderm (anterior endoderm). Precisely for this reason, we consider the Nodal explant system as a highly suitable model for investigating the mechanisms underlying the cell fate separation between anterior endoderm and PP. Thus, in the Nodal explant data, we included all endodermal cells for downstream analysis.

      To avoid any potential confusion for readers, we have revised the term "endoderm" in the manuscript to "anterior endoderm" as suggested by the reviewer.

      (4) It is not clear that the use of the nodal explant system is allowing for rigorous assessment of endoderm specification. Why are the numbers of endoderm cells so vanishingly few in the nodal explant experiments (Figure 1H, 3H), especially when compared to the embryo itself (e.g. Figures 1C-D)? It seems difficult to perform a rigorous analysis of endoderm specification using this particular model which seems inherently more biased towards PP vs. endoderm than the embryo itself. Why not simply perform nodal pathway manipulations in embryos?

      We sincerely thank the reviewer for raising this important question. In our study of the fate separation between the PP and anterior endoderm, we initially analyzed zebrafish embryonic data. However, when reconstructing the transcriptional lineage tree using URD, we observed that these two cell trajectories were positioned relatively far apart on the tree. Yet, existing studies have shown that the anterior endoderm and PP are not only spatially adjacent but also both originate from mesendodermal progenitor cells[2-4], and they share transcriptional similarities[5]. Therefore, as the reviewer pointed out, when tracing all progenitor cells of these two trajectories using the URD algorithm, it is easy to include other cell types, such as ventral epiblast cells (Author response image 2). For this reason, we concluded that directly using embryonic data to dissect the mechanism of fate separation between PP and anterior endoderm might not yield highly accurate results.

      In contrast, our group’s previous work, published in Cell Reports, demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including anterior endoderm, PP, and notochord[5]. Thus, we considered the Nodal explant system to be a highly suitable model for investigating the mechanism of fate separation between PP and anterior endoderm. Ultimately, by analyzing both in vivo embryonic data and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitor cells—a conclusion further validated by live imaging experiments.

      Regarding the reviewer’s concern about the relatively low number of endodermal cells in the Nodal explant system, we speculate that this is because the explants predominantly induce anterior endoderm. Since endodermal cells constitute only a small proportion of cells during gastrulation, and anterior endoderm represents an even smaller subset, the absolute number is naturally limited. Nevertheless, the anterior endodermal cells captured in our Nodal explants were sufficient to support our analysis of the fate separation mechanism between anterior endoderm and PP. Finally, to further strengthen the findings from scRNA-seq analyses, we subsequently performed live imaging validation experiments using both zebrafish embryos and the explant system.

      (5) The authors should not claim that proximity in UMAP space is an indication of transcriptional similarity (lines 207-208), especially for well-separated clusters. This is a serious misrepresentation of the proper usage of the UMAP algorithm. The authors make a similar claim later on (lines 272-274).

      We would like to extend our gratitude to the reviewer for their insightful comments. We have revised the descriptions regarding UMAP throughout the manuscript as suggested (Please see the main text in revised manuscript).

      Reviewer # 1 (Recommendations For The Authors):

      - Pseudotime trajectories constructed from single-cell snapshots are not true "lineage" measurements. Authors should refrain from referring to such data as lineage data (e.g. lines 99, 100, 103, 109, 112, 127, etc). Such models should be referred to as "trajectories", "hypothetical lineages", or something else.

      We are grateful to the reviewer for this comment. Following their recommendation, we have revised the terminology from "transcriptional lineage tree" to "trajectory" across the entire manuscript (Please see main text in revised manuscript).

      - The live imaging data presented in Figure 2 (and supplemental figures) are compelling and do seem to show that some cells can switch between PP and endo states. However, the number of cells reported is still too low to be able to ascertain whether or not this is just a rare/edge-case phenomenon. Tracks for just a single cell are reported in Figure 2C-D. This is insufficient. Tracks for many more cells should be collected and reported alongside this current sole (n=1) example. The choice of time window for these live imaging experiments should also be better explained. These live imaging experiments are being performed at or after 6hpf, but authors claim in the text that "... the segregation between PP and Endo has already occurred by 6hpf." (lines 126-127). Why not perform these live imaging experiments earlier, when the initial fate decision between PP and endo is supposedly occurring?

      We sincerely appreciate the reviewer’s insightful questions and constructive feedback. In response, we have made several important revisions. First, the reviewer noted that our original manuscript tracked only a single cell and suggested increasing the number of tracked cells. Following this recommendation, we repeated the live-imaging experiments and expanded the number of tracked endodermal cells (Please see the revised Movie S4 and Figure 2D). The experimental conditions were kept identical to the previous setup, and these cells consistently exhibited a gradual transition from a gsc+ fate to a sox17+ endodermal fate. In addition, the reviewer recommended performing live imaging at an earlier time point. Accordingly, we conducted additional experiments initiating live imaging at around 5.7 hpf and observed the onset of sox17 expression in gsc+ cells at approximately 6 hpf (Movie S5), which is consistent with our single-cell transcriptomic analysis.

      - The sections devoted to lengthy descriptions of GO terms (lines 131-146, 239-254) and receptor-ligand predictions (lines 170-185) are largely speculative. Consider streamlining.

      We thank the reviewer for this comment. We have streamlined the content related to the GO analysis as suggested (Please see Lines 128-132, 157-167, 221-225).

      - The use of a "Nodal Activity Score" (lines 212-226) is clever but might actually be less informative than showing contributions from individual nodal target genes. The combining of counts data from 29 predicted nodal targets means that the contribution (or lack of contribution) from each gene becomes masked. The authors should include supplementary dot plots that break down the score across all 29 genes, allowing the reader to assess overall contributions and/or sub-clusters of gene co-expression patterns, if present.

      Thank you very much for the reviewer's positive feedback on our use of the "Nodal Activity Score" and the valuable suggestions provided. Following the recommendation, we analyzed the expression of the 29 Nodal direct targets used in our study across the WT, ndr1 knockdown (kd), and lft1 knockout (ko) groups. We found that the known axial mesoderm genes, such as chrd, tbxta, noto, and gsc, contributed significantly to the Nodal score. The newly conducted analysis has been included in the Supplementary Information (Please see Figure S7L).
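
      The reviewer's point about masking can be illustrated with a minimal sketch. It assumes the composite score is a simple mean over target genes, which is only a stand-in, since the exact scoring method is not described in this response. The gene names are taken from the reply above; the expression values and the helper names (`composite_score`, `per_gene_means`) are invented for illustration.

```python
from statistics import mean

# Hypothetical toy data: normalized expression per cell for four Nodal targets.
# Gene names come from the response; all values are invented placeholders.
cells = [
    {"group": "WT", "chrd": 2.0, "tbxta": 1.0, "noto": 0.0, "gsc": 3.0},
    {"group": "WT", "chrd": 1.0, "tbxta": 1.0, "noto": 2.0, "gsc": 1.0},
    {"group": "ndr1_kd", "chrd": 0.0, "tbxta": 1.0, "noto": 0.0, "gsc": 1.0},
    {"group": "ndr1_kd", "chrd": 1.0, "tbxta": 0.0, "noto": 0.0, "gsc": 0.0},
]
targets = ["chrd", "tbxta", "noto", "gsc"]


def composite_score(cell, genes):
    """Mean expression across target genes for one cell (assumed stand-in score)."""
    return mean(cell[g] for g in genes)


def per_gene_means(cells, genes):
    """Mean expression of each gene within each group.

    This is the per-gene breakdown that a single composite score hides.
    """
    groups = {}
    for cell in cells:
        groups.setdefault(cell["group"], []).append(cell)
    return {
        grp: {g: mean(c[g] for c in members) for g in genes}
        for grp, members in groups.items()
    }


# Composite score per group: one number per condition.
group_scores = {}
for cell in cells:
    group_scores.setdefault(cell["group"], []).append(composite_score(cell, targets))
mean_scores = {grp: mean(vals) for grp, vals in group_scores.items()}

# Per-gene breakdown: shows which genes drive the difference between groups.
breakdown = per_gene_means(cells, targets)
```

      In this toy case the composite score drops between groups, while the breakdown shows that the drop is spread unevenly across genes, which is the kind of information a per-gene dot plot conveys.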

      - The differential expression trends being reported for srcap (line 251) do not appear to be significant. Are details and P-values for these DEG tests reported somewhere in the manuscript?

      We thank the reviewer for raising this question. Based on the reviewer's comment, we performed a statistical test (Wilcoxon test) to compare the expression of srcap in PP and Endo. Our analysis revealed that while srcap expression is slightly higher in PP than in Endo, this difference is not statistically significant. The specific p-value and fold change are indicated in the revised figures (Please see Figures 4J and S7H). Based on this analysis, we revised our description to state that srcap expression is slightly higher in the PP than in the anterior endoderm.
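
      For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test using a normal approximation with tie correction. It is not the authors' analysis code; the function name `rank_sum_test` and the example values are hypothetical, and an established statistics package would normally be used in practice.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Tied values receive average ranks and the variance includes the standard
    tie correction. No continuity correction is applied.
    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Sort pooled values while remembering each value's original position.
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = avg_rank
        t = end - start + 1
        tie_term += t**3 - t
        start = end + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, p


# Hypothetical normalized expression values for one gene in two populations.
pp_values = [1.2, 0.8, 1.5, 1.1, 0.9, 1.4]
endo_values = [1.0, 0.7, 1.3, 1.2, 0.6, 1.1]
u_stat, p_value = rank_sum_test(pp_values, endo_values)
```

      A slightly higher mean in one group, as in these placeholder values, does not by itself imply a small p-value, which is why reporting both the fold change and the p-value is informative.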

      - Following the drug experiments with the drug AU15330 (lines 254-263), authors have only reported #s of endodermal cells, which seem to have increased, which the authors suggest indicates a fate switch from PP to endo. However, the authors have not reported whether the numbers of PP cells decreased or stayed the same in these embryos. This would be helpful information to include, as it is very difficult to discern quantitative trends from the images presented in Fig 4H and 4L.

      We thank the reviewer for these comments and suggestions. Following the reviewer's suggestions, we performed Imaris analysis on the HCR staining results from the DMSO (control), 1 μM AU15330-treated, and 5 μM AU15330-treated groups. Our analysis focused on the number of frzb-positive cells (PP), and the comparison revealed that treatment with AU15330 significantly reduces the PP cell number. These findings have been incorporated into the revised manuscript and supplementary information (Please see Figures S7J and S7K).

      Reviewer #2 (Public review):

      Summary:

      During vertebrate gastrulation, the mesoderm and endoderm arise from a common population of precursor cells and are specified by similar signaling events, raising questions as to how these two germ layers are distinguished. Here, Cheng and colleagues use zebrafish gastrulation as a model for mesoderm and endoderm segregation. By reanalyzing published single-cell sequencing data, they identify a common progenitor population for the anterior endoderm and the mesodermal prechordal plate (PP). They find that expression levels of PP genes Gsc and ripply are among the earliest differences between these populations and that their increased expression suppresses the expression of endoderm markers. Further analysis of chromatin accessibility and Ripply cut-and-tag is consistent with direct repression of endoderm by this PP marker. This study demonstrates the roles of Gsc and Ripply in suppressing anterior endoderm fate, but this role for Gsc was already known and the effect of Ripply is limited to a small population of anterior endoderm. The manuscript also focuses extensively on the function of Nodal in specifying and patterning the mesoderm and endoderm, a role that is already well known and to which the current analysis adds little new insight.

      We would like to thank Reviewer #2 for the constructive comments and positive feedback regarding our manuscript.

      Strengths:

      Integrated single-cell ATAC- and RNA-seq convincingly demonstrate changes in chromatin accessibility that may underlie the segregation of mesoderm and endoderm lineages, including Gsc and ripply. Identification of Ripply-occupied genomic regions augments this analysis. The genetic mutants for both genes provide strong evidence for their function in anterior mesendoderm development, although these phenotypes are subtle.

      We thank the reviewer for recognizing our work, and we greatly appreciate the constructive suggestions from the reviewer.

      Weaknesses:

      The use of zebrafish embryonic explants for cell fate trajectory analysis (rather than intact embryos) is not justified. In both transcriptomic comparisons between the two fate trajectories of interest and Ripply cut-and-tag analysis, the authors rely too heavily on gene ontology which adds little to our functional understanding. Much of the work is focused on the role of Nodal in the mesoderm/endoderm fate decision, but the results largely confirm previous studies and again provide few new insights. Some experiments were designed to test the relationship between the mesoderm and endoderm lineages and the role of epigenetic regulators therein, but these experiments were not properly controlled and therefore difficult to interpret.

      We sincerely thank the reviewer for the comments. As we previously answered, in our study of the fate differentiation between the PP and the anterior endoderm, we initially analyzed zebrafish embryonic data. However, when we used URD to reconstruct the transcriptional trajectory tree, we found that these two cell trajectories were distantly located on the tree. Existing studies have shown that the anterior endoderm and the PP are not only spatially adjacent but also both originate from mesendodermal progenitor cells and share transcriptional similarities[2-4]. Therefore, when tracing all progenitor cells of these two trajectories using the URD algorithm, it is easy to include other cell types, such as ventral mesendodermal cells (Please see Author response image 2A). Based on this, we believe that directly using embryonic data to decipher the mechanism of fate differentiation between the PP and the anterior endoderm may not yield sufficiently precise results. In contrast, our group’s previous study published in Cell Reports demonstrated that the Nodal-induced explant system can specifically enrich dorsal mesendodermal cells, including the anterior endoderm, PP, and notochord[5]. Thus, we consider the Nodal explant system as an ideal model for studying the fate differentiation mechanism between the PP and the anterior endoderm. Ultimately, through comprehensive analysis of in vivo embryonic data and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitor cells—a conclusion further validated by live imaging experiments.

      Regarding the GO analysis, we have streamlined it as suggested by the reviewers. In the revised manuscript, we analyzed the expression of specific genes contributing to key GO functions. Additionally, in the revised version, we conducted more live imaging experiments and quantitative cell assays. We designed gRNA for srcap using the CRISPR/Cas13 system to knock down srcap, and the resulting phenotype was consistent with the morpholino knockdown results. Because Western blot detection of BRG1 was unsuccessful in zebrafish, we instead confirmed that AU15330 is functional by examining sox10 expression and pigment development after treatment. We hope these additional experiments adequately address the reviewers' concerns.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the introduction, the authors state that mesendoderm segregates into mesoderm and endoderm in a Nodal-concentration dependent manner. While it is true that higher Nodal signaling levels are required for endoderm specification, A) this is also true for some mesoderm populations, and B) work from Caroline Hill's lab has shown that Nodal activity alone is not determinative of endoderm fate. Although the authors cite this work, its conclusions are not reflected in this over-simplified explanation of mesendoderm development. The authors also state that it is not clear when PP and endoderm can be distinguished transcriptionally, but this was also addressed in Economou et al, 2022, which found that they can be distinguished at 60% epiboly but not 50% epiboly.

      We sincerely thank the reviewer for raising this question and reminding us of the conclusions drawn from that excellent study. As the reviewer pointed out, Economou et al. demonstrated that Nodal signaling alone is insufficient to determine the cell fate segregation of mesendoderm[6]. However, their study primarily focused on the fate segregation of the ventral-lateral mesendoderm lineage. In contrast, we believe that the mechanisms underlying dorsal mesendoderm specification may differ.

      First, it is well-studied that in zebrafish embryos, the most dorsal mesendoderm is initially specified by the activity of the dorsal organizer. Notably, the Nodal signaling ligands ndr1 and ndr2 begin to be expressed in the dorsal organizer as early as the sphere stage[7]. In our study, through single-cell transcriptomic trajectory analysis and live imaging analysis, we observed that the cell fate segregation of the dorsal mesendoderm can be traced back to the shield stage.

      Second, the regulatory mechanisms governing dorsal mesendoderm fate differentiation may differ from those of the ventral-lateral mesendoderm. For instance, the gsc gene is exclusively expressed in the dorsal mesendoderm and is absent in the ventral-lateral mesendoderm. gsc is a critical master gene: its overexpression on the ventral side can induce a complete secondary body axis. Similarly, ripply1, identified in our study, is also expressed early and specifically in the dorsal mesendoderm. Overexpression of ripply1 on the ventral side similarly induces a secondary body axis, albeit with the absence of the forebrain[5]. In this study, we found that gsc and ripply1, acting as repressors, together inhibit the specification of dorsal (anterior) endoderm from PP progenitors.

      In summary, our study focuses on the regulatory mechanisms of fate segregation in the dorsal (anterior) mesendoderm, which differs from the mechanisms of ventral-lateral mesendoderm lineage segregation reported by Economou et al. We believe that this distinction represents a key novelty of our work.

      (2) As noted in the manuscript, Warga and Nusslein-Volhard determined long ago that PP and anterior endoderm share a common precursor. It is surprising that this close relationship is not apparent from the lineage trees in whole embryos but is apparent in lineage trees from explants. The authors speculate that the resolution of the whole embryo dataset is insufficient to detect this branch point and propose explants as the solution, but it is not clear why the explant dataset is higher resolution and/or more appropriate to address this question.

      We sincerely thank the reviewer for their thoughtful comments. As we mentioned previously, our investigation of fate differentiation between the PP and the anterior endoderm initially involved the analysis of zebrafish embryonic data. However, when we used URD to reconstruct the transcriptional trajectory tree, we observed that these two cell trajectories were located far apart. Previous elegant studies, as the reviewer mentioned, have shown that the anterior endoderm and the PP are not only spatially adjacent but also both originate from mesendodermal progenitor cells and share transcriptional similarities[2,3,8]. Consequently, when tracing all progenitor cells of these two trajectories using the URD algorithm, other cell types—such as ventral mesendodermal cells—are easily included. Based on this, we believe that directly using embryonic data to elucidate the mechanism of fate differentiation between the PP and the anterior endoderm may lack sufficient precision.

      In contrast, our group’s previous study published in Cell Reports demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including the anterior endoderm, PP, and notochord[5]. Therefore, we consider the Nodal explant system as an ideal model for studying the mechanism underlying fate differentiation between the PP and the anterior endoderm. Through comprehensive analyses of both in vivo embryonic and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitor cells—a conclusion further supported by live imaging experiments.

      (3) Much of the analysis of DEGs between the lineages of interest is focused on GO term enrichment. But this logic is circular. The endoderm lineage is defined as such because it expresses endoderm-enriched genes, therefore the finding that the endoderm lineage is enriched for endoderm-related GO terms adds no new insights.

      We thank the reviewer for these comments. As the reviewers suggested, in the revised manuscript, we indicated specific genes associated with key GO terms (Please see Figure 4B). Additionally, we have streamlined the content related to the GO analysis as suggested.

      (4) The authors describe the experiment in Figure S4 as key evidence that Gsc+ cells can give rise to endoderm, but no controls are presented. Only a few cells are shown that express mCherry upon injection of sox17:cre constructs. Is mCherry also expressed in the occasional cell injected with Gsc:lox-stop-lox-mCherry in the absence of cre? Although they report 3 independent replicates, it appears that only 2 individual embryos express mCherry. This very small number is not convincing, especially in the absence of appropriate controls.

      We thank the reviewer for raising this question. Following the reviewer's suggestion, we injected gsc:loxp-stop-loxp-mCherry into zebrafish embryos at the 1-cell stage as a control. After performing at least three independent replicates and analyzing no fewer than 100 embryos, we did not observe any mCherry-positive cells. Additionally, we co-injected gsc:loxp-stop-loxp-mCherry with sox17:cre and increased the sample size. Furthermore, we constructed plasmids of sox17:loxp-stop-loxp-mCherry and gsc:cre, and upon injection at the 1-cell stage, we observed RFP-positive cells at 8 hpf (Please see Author response image 1 and Figure S4E). Together with our live imaging data, these experiments collectively demonstrate that anterior endodermal cells can originate from PP progenitors.

      (5) The authors spend a lot of effort demonstrating that PP and anterior endoderm are Nodal dependent. First, these data (especially Figures 3E and 3I) are not very convincing, as the differences shown are very small or not apparent. Second, this is already well-known and adds nothing to our understanding of mesoderm-endoderm segregation.

      We sincerely thank the reviewer for their insightful questions. First, the reviewer mentioned that in the initial version of our manuscript, the effects of ndr1 knockdown and lefty1 knockout on Nodal signaling and cell fate—particularly prechordal plate (PP) and anterior endoderm (endo)—in Nodal-induced explants were not very pronounced. We recognize that the negative feedback mechanism between Nodal and Lefty signaling may explain why Nodal acts as a morphogen, regulating pattern formation through a Turing-like model[9]. Therefore, knocking down a Nodal ligand gene, such as ndr1 in this study, or knocking out a Nodal inhibitor, such as lft1, may only have a subtle impact on Nodal signaling[10].

      Accordingly, in this study, we performed extensive pSmad2 immunofluorescence analysis and observed that although the overall intensity of Nodal activity did not change dramatically, there was a statistically significant difference. Importantly, this subtle variation in Nodal signaling strength is precisely what we intended to capture, since PP and anterior endoderm are highly sensitive to Nodal signaling[11], and even minor differences may bias their fate segregation.

      This leads directly to the reviewer’s second concern. While numerous studies suggest that the strength of Nodal signaling influences mesendodermal fate—with high Nodal promoting endoderm and lower concentrations inducing mesoderm—most of these studies focus on ventral-lateral mesendoderm development[4,6,10]. In contrast, the mechanisms underlying dorsal mesendoderm fate specification differ, which is a key innovation of our study.

      Previous work by Bernard Thisse and colleagues demonstrated that even a slight reduction in Nodal signaling, achieved by overexpressing a Nodal inhibitor, is sufficient to cause defects in the specification of PP and endoderm[11]. This indicates that PP and endoderm require the highest levels of Nodal signaling for proper specification. Moreover, the most dorsal mesendoderm, PP and anterior endoderm are not only spatially adjacent but also share similar transcriptional states, making the regulation of their fate separation particularly challenging to study.

      Dr. C.P. Heisenberg's lab made important contributions to this issue, showing that the duration of Nodal exposure is critical for segregating PP and anterior endoderm fates: prolonged Nodal signaling promotes expression of the transcriptional repressor Gsc, which directly suppresses the key endodermal transcription factor Sox17, thereby inhibiting anterior endoderm specification[3]. They also found that tight junctions among PP cells facilitate Nodal signal propagation[8]. However, their studies revealed that Gsc mutants do not exhibit endodermal phenotypes, suggesting that additional factors or mechanisms regulate PP versus anterior endoderm fate separation[3].

      In our study, we first observed that subtle differences in Nodal concentration may bias the fate choice between PP and anterior endoderm. Given that ndr1 knockdown and lft1 knockout mildly reduce or enhance Nodal signaling, respectively, we reasoned that using these two perturbations in a Nodal-induced explant system combined with single-cell RNA sequencing could generate transcriptomic profiles under slightly reduced and enhanced Nodal signaling. This approach may help identify key decision points and transcriptional differences during PP and anterior endoderm segregation, ultimately uncovering the molecular mechanisms downstream of Nodal that govern their fate separation.

      (6) The authors claim that srcap expression differs between the 2 lineages of interest, but this is not apparent from Figure 4J-K. Experiments testing the role of SWI/SNF and srcap also require additional controls. Can srcap MO phenotypes be rescued by srcap RNA? Is there validation that SWI/SNF components are degraded upon treatment with AU15330?

      We are very grateful for the reviewer's questions. Using single-cell data from zebrafish embryos and Nodal explants, we compared the expression of srcap in the PP and anterior Endo cell populations. We found that srcap expression showed a slight increase in PP compared to anterior Endo, but the difference was not statistically significant (Please see Figures 4J and S7H). Therefore, we modified our description in the revised manuscript. However, we speculate that this slight difference might influence the distinct cell fate specification between PP and anterior endo. In the original version of the manuscript, we reported that either treatment with AU15330, an inhibitor of the SWI/SNF complex, or injection of morpholino targeting srcap (a key component of the SWI/SNF complex) enhanced anterior endo fate while reducing PP cell specification. During this round of revision, we initially attempted to follow the reviewer’s suggestion to co-inject srcap mRNA along with srcap morpholino to rescue the phenotype. However, the srcap mRNA exceeds 10,000 bp in length, and despite multiple attempts, we were unable to obtain it. Therefore, we were unable to perform the rescue experiment and instead adopted an alternative approach to validate the function of srcap. We used another knockdown approach (the CRISPR/Cas13 system) to determine whether a phenotype similar to that observed with morpholino knockdown could be achieved. Using the CRISPR/Cas13 system, we designed gRNA targeting srcap, knocked down srcap, and examined the cell specification of PP and anterior endo. We found that, consistent with our previous results, knocking down srcap clearly reduced PP cell fate while increasing anterior endo cell fate (Author response image 3). Additionally, the reviewer raised the question of whether the SWI/SNF complex is degraded after AU15330 treatment. Following the reviewer’s suggestion, we attempted to perform Western blot analysis on BRG1, one of the components of the SWI/SNF complex. However, despite multiple attempts, we were unable to detect the BRG1 protein with the antibody in zebrafish. Several studies have reported that knockdown or knockout of brg1 leads to defects in neural crest cell specification in zebrafish[12,13]. Therefore, as an alternative, we treated zebrafish embryos at the one-cell stage with 0 μM (DMSO), 1 μM, and 5 μM AU15330, and examined the expression of sox10 and pigment development at around 48 hpf. We found that treatment with 1 μM AU15330 reduced sox10 expression and pigment production, though not significantly, whereas treatment with 5 μM AU15330 significantly disrupted neural crest cell development. Thus, this experiment demonstrates that AU15330 is functional in zebrafish (Author response image 3).

      Author response image 3.

      (A) Characterization of anterior endoderm and PP cells following CRISPR-Cas13d-mediated srcap knockdown. (B) Validation of srcap mRNA expression by RT‑qPCR following CRISPR‑Cas13d knockdown. (C) RT‑qPCR shows the expression of sox10 after treatment with increasing concentrations of AU15330. (D) Morphology of zebrafish embryos at 48 hpf after treatment with increasing concentrations of AU15330.

      (7) The authors conclude from their chromatin accessibility analysis that variations in Nodal signaling are responsible for expression levels of PP and endoderm genes, but they do not consider the alternative explanation that FGF signaling is playing this role. Such a function for FGF was established by Caroline Hill's lab, and the authors also show in Figure S5G that FGF signaling is enriched between these cell populations.

      Thank you very much for raising this issue. As the reviewer pointed out, Caroline Hill's lab has conducted elegant work demonstrating that FGF signaling plays a crucial role in the separation of ventral-lateral mesendoderm cell fates[4,6]. In contrast, our study primarily focuses on the mechanisms underlying the separation of dorsal mesendoderm cell fates. However, our research also reveals that FGF signaling significantly regulates the fate separation of the dorsal mesendoderm, as inhibiting FGF signaling suppresses PP cell specification while promoting anterior Endo fate. In our previously published work, we found that Nodal signaling can directly activate the expression of FGF ligand genes[5]. Therefore, we hypothesize that Nodal signaling, acting as a master regulator, activates various downstream target genes, including FGF ligand genes. How FGF signaling regulates the cell fate separation of the dorsal mesendoderm warrants investigation in future studies.

      (8) When interpreting the results of their Ripply cut-and-run experiment, the authors again rely heavily on GO term analysis and claim that this supports a role for Ripply as a transcriptional repressor. GO term enrichment does not equal functional analysis. It would be more convincing to intersect DEGs between WT and ripply-/- embryos with Ripply-enriched loci.

      Thanks for raising this important issue and the constructive suggestion. In response to the reviewer's valid concern regarding the GO term analyses from our CUT&Tag data, we implemented a more stringent filtering strategy. We identified peaks enriched in the treatment group and applied differential analysis, selecting genes with a log<sub>2</sub>FoldChange > 3, padj < 0.05, and baseMean > 30 as high-confidence Ripply1 binding targets. A GO enrichment analysis of these genes revealed significant terms related to muscle development, consistent with Ripply1's established role in somite development, thereby validating our approach. We supplemented the related gene list in the revised manuscript. Moreover, within this refined analysis, we found that sox32 met our binding threshold, while sox17 did not. Furthermore, as suggested, we examined mespbb—a known Ripply1-repressed gene—which was present, and gsc, a Nodal target used as a negative control, which was absent. This confirms the specificity of our analysis (Figure 6 and Figure S11). Consequently, our revised analyses support a model in which Ripply1 directly binds the sox32 promoter. Given that Sox32 is a known upstream regulator of sox17, this binding provides a plausible direct mechanism for the observed regulation of sox17 expression. We have updated the figures and text accordingly. We attempted to generate ripply1<sup>-/-</sup> mutants but found that homozygous loss results in embryonic lethality.
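
      The filtering step described above can be sketched as follows. This is an illustrative sketch rather than the authors' pipeline: the thresholds and column names (log2FoldChange, padj, baseMean) are those quoted in the response, while the function name `high_confidence_targets`, the generic gene labels, and all numeric values are hypothetical placeholders.

```python
# Thresholds as quoted in the response.
MIN_LOG2FC = 3.0
MAX_PADJ = 0.05
MIN_BASEMEAN = 30.0


def high_confidence_targets(rows, min_lfc=MIN_LOG2FC, max_padj=MAX_PADJ,
                            min_base_mean=MIN_BASEMEAN):
    """Return gene names whose peaks pass all three thresholds.

    Rows with a missing adjusted p-value (None) are skipped, since
    differential-analysis outputs can leave padj undefined for some features.
    """
    kept = []
    for row in rows:
        padj = row["padj"]
        if padj is None:
            continue
        if (row["log2FoldChange"] > min_lfc
                and padj < max_padj
                and row["baseMean"] > min_base_mean):
            kept.append(row["gene"])
    return kept


# Hypothetical differential-enrichment table; labels and numbers are invented.
peaks = [
    {"gene": "gene_a", "log2FoldChange": 4.1, "padj": 0.001, "baseMean": 120.0},
    {"gene": "gene_b", "log2FoldChange": 2.5, "padj": 0.010, "baseMean": 200.0},  # fold change too low
    {"gene": "gene_c", "log2FoldChange": 5.0, "padj": 0.200, "baseMean": 80.0},   # not significant
    {"gene": "gene_d", "log2FoldChange": 3.8, "padj": 0.004, "baseMean": 12.0},   # signal too weak
    {"gene": "gene_e", "log2FoldChange": 6.2, "padj": None, "baseMean": 45.0},    # padj undefined
]
targets = high_confidence_targets(peaks)
```

      Requiring all three criteria at once means a locus is called a binding target only when enrichment is large, statistically supported, and backed by enough signal, which is what allows known targets and negative controls to serve as a specificity check.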

      (9) The way N's are reported is unconventional. N= number of embryos used in the experiment, n= number of embryos imaged. If an embryo was not imaged or analyzed in any way, it cannot be considered among the embryos in an experiment. If only 4 embryos were imaged, the N for that experiment is 4 regardless of how many embryos were stained. Authors should also report not only the number of embryos examined but also the number of independent trials performed for all experiments.

      Thank you very much for the reviewer's suggestion. As suggested, we have revised the description regarding the number of embryos and experimental replicates in the figure legends.

      (10) The authors should avoid the use of red-green color schemes in figures to ensure accessibility for color-blind readers.

      Thanks for the suggestions. We have updated the figures in our revised manuscript and adjusted the color schemes to avoid red-green combinations.

      Reviewer #3 (Public Review):

      Summary:

      Cheng, Liu, Dong, et al. demonstrate that anterior endoderm cells can arise from prechordal plate progenitors, which is suggested by pseudotime reanalysis of published scRNAseq data, pseudotime analysis of new scRNAseq data generated from Nodal-stimulated explants, live imaging from sox17:DsRed and Gsc:eGFP transgenics, fluorescent in situ hybridization, and a Cre/Lox system. Early fate mapping studies already suggested that progenitors at the dorsal margin give rise to both of these cell types (Warga) and live imaging from the Heisenberg lab (Sako 2016, Barone 2017) also pretty convincingly showed this. However, the data presented for this point are very nice, and the additional experiments in this manuscript further cement this result. Though better demonstrated by previous work (Alexander 1999, Gritsman 1999, Gritsman 2000, Sako 2016, Rogers 2017, others), the manuscript suggests that high Nodal signaling is required for both cell types, and shows preliminary data that suggests that FGF signaling may also be important in their segregation. The manuscript also presents new single-cell RNAseq data from Nodal-stimulated explants with increased (lft1 KO) or decreased (ndr1 KD) Nodal signaling and multi-omic ATAC+scRNAseq data from wild-type 6 hpf embryos but draws relatively few conclusions from these data. Lastly, the manuscript presents data that SWI/SNF remodelers and Ripply1 may be involved in the anterior endoderm - prechordal plate decision, but these data are less convincing. The SWI/SNF remodeler experiments are unconvincing because the demonstration that these factors are differentially expressed or active between the two cell types is weak. The Ripply1 gain-of-function experiments are unconvincing because they are based on incredibly high overexpression of ripply1 (500 pg or 1000 pg) that generates a phenotype that is not in line with previously demonstrated overexpression studies (with phenotypes from 10-20x lower expression). Similarly, the cut-and-tag data seems low quality and like it doesn't support direct binding of ripply1 to these loci.

      In the end, this study provides new details that are likely important in the cell fate decision between the prechordal plate and anterior endoderm; however, it is unclear how Nodal signaling, FGF signaling, and elements of the gene regulatory network (including Gsc, possibly ripply1, and other factors) interact to make the decision. I suggest that this manuscript is of most interest to Nodal signaling or zebrafish germ layer patterning aficionados. While it provides new datasets and observations, it does not weave these into a convincing story to provide a major advance in our understanding of the specification of these cell types.

      We sincerely thank the reviewer for their thorough and thoughtful assessment of our work. The reviewer acknowledged several strengths of our study, such as the use of multiple technical approaches to demonstrate that anterior endoderm differentiates from PP progenitor cells, and recognized the value of the newly added single-cell omics data. The reviewer also raised some concerns regarding the initial version of our work, including the SWI/SNF remodeler experiments and the Ripply1 gain-of-function experiment. In the revised manuscript, we have supplemented these parts with additional control experiments to better support our conclusions. We hope that our updated manuscript adequately addresses the points raised by the reviewer.

      Major issues:

      (1) UMAPs: There are several instances in the manuscript where UMAPs are used incorrectly as support for statements about how transcriptionally similar two populations are. UMAP is a stochastic, non-linear projection for visualization - distances in UMAP cannot be used to determine how transcriptionally similar or dissimilar two groups are. Making conclusions about how transcriptionally similar two populations are requires performing calculations either in the gene expression space, or in a linear dimensional reduction space (e.g. PCA, keeping in mind that this will only consider the subset of genes used as input into the PCA). Please correct or remove these instances, which include (but are not limited to):

      p.4 107-110

      p.4 112

      p.8 207-208

      p.10 273-275

      We would like to thank the reviewer for raising this question. The descriptions of UMAP have been revised throughout the manuscript in accordance with the reviewer's suggestion (Please see the main text in the revised manuscript).
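
      As a minimal sketch of what a comparison "in the gene expression space" could look like, the example below correlates the mean expression profiles of cell populations directly, without any UMAP coordinates. The choice of Pearson correlation on cluster means is an assumption made for illustration, not the method used in the manuscript; the population labels, helper names, and all values are hypothetical.

```python
import math


def mean_profile(cells):
    """Average expression per gene across cells (rows = cells, columns = genes)."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


# Hypothetical normalized expression matrices (2 cells x 3 genes per population).
pp_cells = [[5.0, 0.0, 1.0], [7.0, 0.0, 3.0]]
endo_cells = [[4.0, 1.0, 2.0], [6.0, 1.0, 2.0]]
other_cells = [[0.0, 6.0, 1.0], [0.0, 8.0, 1.0]]

pp_mean = mean_profile(pp_cells)
endo_mean = mean_profile(endo_cells)
other_mean = mean_profile(other_cells)

# Similarity is computed on expression values themselves, so it does not
# depend on the stochastic layout of a 2D embedding.
sim_pp_endo = pearson(pp_mean, endo_mean)
sim_pp_other = pearson(pp_mean, other_mean)
```

      The same comparison could be run on principal-component scores instead of raw expression, with the caveat the reviewer notes: only the genes used as PCA input would then contribute.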

      (2) Nodal and lefty manipulations: The section "Nodal-Lefty regulatory loop is needed for PP and anterior Endo fate specification" and Figure 3 do not draw any significant conclusions. This section presents a LIANA analysis to determine the signals that might be important between prechordal plate and endoderm, but despite the fact that it suggests that BMP, Nodal, FGF, and Wnt signaling might be important, the manuscript just concludes that Nodal signaling is important. Perhaps this is because the conclusion that Nodal signaling is required for the specification of these cell types has been demonstrated in zebrafish in several other studies with more convincing experiments (Alexander 1999, Gritsman 1999, Gritsman 2000, Rogers 2017, Sako 2016). While FGF has recently been demonstrated to be a key player in the stochastic decision to adopt endodermal fate in lateral endoderm (Economou 2022), the idea that FGF signaling may be a key player in the differentiation of these two cell types has strangely been relegated to the discussion and supplement. Lastly, the manuscript does not make clear the advantage of performing experiments to explore the PP-Endo decision in Nodal-stimulated explants compared to data from intact embryos. What would be learned from this and not from an embryo? Since Nodal signaling stimulates the expression of Wnts and FGFs, these data do not test Nodal signaling independent of the other pathways. It is unclear why this artificial system that has some disadvantages is used since the manuscript does not make clear any advantages that it might have had.

      We sincerely thank the reviewer for these valuable comments. As mentioned in our manuscript, a substantial number of studies have reported on the mechanisms governing the segregation of mesendoderm fate in zebrafish embryos, including the work from Dr. Hill's laboratory cited by the reviewer, which demonstrated the involvement of FGF signaling in ventral mesendoderm fate specification. By contrast, research on the regulatory mechanisms underlying anterior mesendoderm differentiation remains relatively limited. This is largely due to the challenges posed by the close physical proximity and similar transcriptional states of anterior mesendoderm cells, as well as their shared dependence on high levels of Nodal signaling for specification.

      Several studies from Dr. C.P. Heisenberg's laboratory have attempted to elucidate the fate segregation between anterior mesendoderm cells, namely the prechordal plate (PP) and anterior endoderm (endo) cells. They found that PP cells are tightly connected, facilitating the propagation of Nodal signaling[8]. Prolonged exposure to Nodal activates the expression of Gsc, which acts as a transcriptional repressor to inhibit sox17 expression, thereby suppressing endodermal fate[3]. However, they also noted that Gsc mutants do not exhibit endoderm developmental defects, suggesting the involvement of additional factors in this process.

      The reviewer inquired about our rationale for using the Nodal-injected explant system. In our investigation of the fate separation between the PP and the anterior endo, we initially analyzed zebrafish embryonic data. Using URD to reconstruct the transcriptional lineage tree, we found that these two cell types were positioned distantly from each other. However, existing literature indicates that the anterior endoderm and PP are not only spatially adjacent but also derive from common mesendodermal progenitors and exhibit transcriptional similarities[2,8]. As the reviewer noted, when tracing all progenitor cells of these two lineages using URD, it is easy to inadvertently include other cell types—such as ventral epiblast cells—which may compromise the accuracy of the analysis. We therefore concluded that directly using embryonic data to dissect the mechanism of fate separation between PP and anterior endoderm might not yield highly precise results.

      By contrast, our group’s earlier study published in Cell Reports demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including anterior endo, PP, and notochord[5]. This makes the Nodal explant system a highly suitable model for studying the fate separation between PP and anterior endo. Ultimately, by analysing in vivo embryonic data and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitors—a conclusion further supported by live imaging experiments.

      As we answered above, we first used the analyses of single-cell RNA sequencing and live imaging to demonstrate that anterior endoderm can originate from PP progenitor cells. Understanding the mechanism underlying the fate segregation between these two cell populations became a key focus of our research. We began by applying cell communication analysis to our single-cell data to identify signaling pathways that may be involved. This analysis specifically highlighted the Nodal-Lefty signaling pathway. Since Lefty acts as an inhibitor of Nodal signaling, we hypothesized that differences in Nodal signaling strength might regulate the fate of these two cell populations. By overexpressing different concentrations of Nodal mRNA and examining the fates of PP and anterior Endo cells, we confirmed this hypothesis.

      Thus, we propose that even subtle differences in Nodal signaling levels may influence anterior mesendoderm fate decisions. To test this, we generated systems with slightly reduced Nodal signaling (via ndr1 knockdown) and slightly elevated Nodal signaling (via lft1 knockout). Using these models, we precisely captured the critical stage of fate segregation between PP and anterior endo cells and identified a novel transcriptional repressor, Ripply1, which works in concert with Gsc to suppress anterior endoderm differentiation.

      (3) ripply1 mRNA injection phenotype inconsistent with previous literature: The phenotype presented in this manuscript from overexpressing ripply1 mRNA (Fig S11) is inconsistent with previous observations. This study shows a much more dramatic phenotype, suggesting that the overexpression may be to a non-physiological level that makes it difficult to interpret the gain-of-function experiments. For instance, Kawamura et al 2005 perform this experiment but do not trigger loss of head and eye structures or loss of tail structures. Similarly, Kawamura et al 2008 repeat the experiment, triggering a mildly more dramatic shortening of the tail and complete removal of the notochord, but again no disturbance of head structures as displayed here. These previous studies injected 25 - 100 pg of ripply1 mRNA with dramatic phenotypes, whereas this study uses 500 - 1000 pg. The phenotype is so much more dramatic than previously presented that it suggests that the level of ripply1 overexpression is sufficiently high that it may no longer be regulating only its endogenous targets, making the results drawn from ripply1 overexpression difficult to trust.

      We sincerely thank the reviewer for raising this question. First, we apologize for not providing a detailed description of the amount of HA-ripply1 mRNA injected in our previous manuscript. We injected 500 pg of HA-ripply1 mRNA at the 1-cell stage and allowed the embryos to develop until 6 hpf for the CUT&Tag experiment. In the supplementary materials, we included a bright-field image of an 18-hpf embryo injected with HA-ripply1 mRNA, which exhibited severe morphological abnormalities. The reviewer pointed out that the amount of ripply1 mRNA we injected might be excessive, potentially leading to non-specific gain-of-function effects. The injection dose of 500 pg was chosen based on our previous study, in which injecting 24 pg of ripply1 mRNA into one cell of zebrafish embryos at the 16–32 cell stage was sufficient to induce a secondary axis lacking the forebrain[5]. From this, we estimated that a dose of approximately 500–1000 pg would be appropriate at the 1-cell stage, so that after several rounds of cell division each cell would carry roughly 20-30 pg of mRNA at the 32-cell stage. Additionally, we conducted supplementary experiments injecting 100 pg, 250 pg, and 500 pg of ripply1 mRNA, and observed that 500 pg of ripply1 mRNA led to a dramatic suppression of endoderm formation (Author response image 4).
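      The dose estimate above can be checked with simple arithmetic. The sketch below assumes that injected mRNA is split evenly among all cells and is not degraded; both are simplifying assumptions introduced here, and the function names are hypothetical. Under these assumptions, 500-1000 pg injected at the 1-cell stage corresponds to about 15.6-31.3 pg per cell at the 32-cell stage, which brackets the 20-30 pg range quoted, and 24 pg in a single cell at the 16-32 cell stage corresponds to a whole-embryo equivalent of 384-768 pg.

```python
def per_cell_dose_pg(injected_pg, n_cells):
    """mRNA per cell if a 1-cell-stage injection is split evenly (idealised)."""
    return injected_pg / n_cells


def whole_embryo_equivalent_pg(per_cell_pg, n_cells):
    """1-cell-stage dose that would give every cell `per_cell_pg` (idealised)."""
    return per_cell_pg * n_cells


# Doses discussed in the response, partitioned at the 32-cell stage.
for dose in (500, 1000):
    print(dose, "pg at 1-cell stage ->", per_cell_dose_pg(dose, 32), "pg per cell")

# 24 pg delivered to one cell at the 16- or 32-cell stage, scaled to the embryo.
for n_cells in (16, 32):
    print("24 pg per cell x", n_cells, "cells ->",
          whole_embryo_equivalent_pg(24, n_cells), "pg")
```

      Real embryos will deviate from this idealised partitioning, so the numbers indicate only the order of magnitude.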

      Finally, our study focuses on the mechanism of cell fate segregation in the anterior mesendoderm, primarily during gastrulation. The embryos injected with ripply1 mRNA underwent normal gastrulation, and our CUT&Tag experiment was performed at 6 hpf. Therefore, we believe that the amount of ripply1 mRNA injected in this study is appropriate for addressing our research question.

      Author response image 4.

      Different concentrations of ripply1 mRNA were injected into zebrafish embryos at the one-cell stage, with RFP fluorescence labeling sox17-positive cells.

      (4) Ripply1 binding to sox17 and sox32 regulatory regions not convincing: The Cut and Tag data presented in Fig 6J-K does not seem to be high quality and does not seem to provide strong support that Ripply1 binds to the regulatory regions of these genes. The signal-to-noise ratio is very poor, and the 'binding' near sox17 that is identified seems to be even coverage over a 14 kb region, which is not consistent with site-specific recruitment of this factor, and the 'peaks' highlighted with yellow boxes do not appear to be peaks at all. To me, it seems this probably represents either: (1) overtagmentation of these samples or (2) an overexpression artifact from injection of too high a concentration of ripply1-HA mRNA. In general, Cut and Tag is only recommended for histone modifications, and Cut and Run would be recommended for transcriptional regulators like these (see Epicypher's literature). Given this and the previous point about Ripply1 overexpression, I am not convinced that Ripply1 regulates endodermal genes. The existing data could be made somewhat more convincing by showing the tracks for other genes as positive and negative controls, given that Ripply1 has known muscle targets (how does its binding look at those targets in comparison) and there should be a number of Nodal target genes that Ripply1 does not bind to that could be used as negative controls. Overall this experiment doesn't seem to be of high enough quality to drive the conclusion that Ripply1 directly binds near sox17 and sox32 and from the data presented in the manuscript looks as if it failed technically.

      We sincerely thank the reviewer for raising this question. We apologize that the binding regions of sox17 marked in our previous analysis were incorrect, and we have made the corresponding revisions in the latest version of the manuscript.

      The reviewer noted that our CUT&Tag data contain considerable noise. To address this, we further refined our data processing: we annotated all peaks enriched in the treatment group and performed differential analysis, selecting genes with log2FoldChange > 3, padj < 0.5, and baseMean > 30 as candidate targets of Ripply1 binding. Subsequent GO enrichment analysis of these genes revealed significant enrichment of muscle development-related GO terms, which is consistent with previously reported roles of Ripply1 in regulating somite development. Therefore, we believe our filtering method effectively removes a large number of noise peaks and their associated genes.

      Under these screening criteria, we found that sox32 meets the threshold, while sox17 does not. In addition, following the reviewer’s suggestion, we examined mespbb, a gene known to be repressed by Ripply1, as a positive control, and gsc, a Nodal target gene, as a negative control.
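      The three-threshold filter described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the record type, the function name, and the placeholder rows ("geneA" to "geneD") are hypothetical, and only the three cut-offs are taken from the text.

```python
from typing import NamedTuple


class PeakGene(NamedTuple):
    """One row of a differential-enrichment table (field names are assumed)."""
    gene: str
    log2_fold_change: float
    padj: float
    base_mean: float


def candidate_targets(rows, min_lfc=3.0, max_padj=0.5, min_base_mean=30.0):
    """Keep genes passing all three cut-offs quoted in the response."""
    return [
        row.gene
        for row in rows
        if row.log2_fold_change > min_lfc
        and row.padj < max_padj
        and row.base_mean > min_base_mean
    ]


# Placeholder rows with invented values, used only to exercise the filter.
rows = [
    PeakGene("geneA", 4.2, 0.01, 120.0),  # passes all three cut-offs
    PeakGene("geneB", 2.1, 0.01, 120.0),  # fold change too low
    PeakGene("geneC", 4.2, 0.80, 120.0),  # adjusted p-value too high
    PeakGene("geneD", 4.2, 0.01, 12.0),   # mean signal too low
]

selected = candidate_targets(rows)
print(selected)
```

      Exposing the cut-offs as parameters makes it easy to test how sensitive the candidate list is to each threshold.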

      Based on these new analyses, we have revised our figures and text accordingly. Our data now support the possibility that Ripply1 may directly bind to the promoter region of sox32. Since sox32 acts as a direct upstream regulator of sox17, this binding could influence sox17 expression (Figure 6 and Figure S11).

      Finally, we would like to note that studies have reported Ripply1 as a transcriptional repressor, which may function by recruiting other co-factors, such as Groucho, to form a complex[14,15]. This might explain why our CUT&Tag data detected Ripply1 binding to a broad set of genes.

      (5) "Cooperatively Gsc and ripply1 regulate": I suggest avoiding the term "cooperative," when describing the relationship between Ripply1 and Gsc regulation of PP and anterior endoderm - it evokes the concept of cooperative gene regulation, which implies that these factors interact with each other biochemically in order to bind to the DNA. This is not supported by the data in this manuscript, and is especially confusing since Ripply1 is thought to require cooperative binding with a T-box family transcription factor to direct its binding to the DNA.

      We sincerely thank the reviewer for raising this important issue. The reviewer pointed out that the term "Cooperatively" may not be entirely appropriate in the context of our study. In accordance with the reviewer's suggestion, we have replaced "Cooperatively" with "Collectively" in the relevant sections.

      (6) SWI/SNF: The differential expression of srcap doesn't seem very remarkable. The dot plots in the supplement S7H don't help - they seem to show no expression at all in the endoderm, which is clearly a distortion of the data, since from the violin plots it's obviously expressed and the dot-size scale only ranges from ~30-38%. Please add to the figure information about fold-change and p-value for the differential expression. Publicly available scRNAseq databases show srcap is expressed throughout the entire early embryo, suggesting that it would be surprising for it to have differential activity in these two cell types and thereby contribute to their separate specification during development. It seems equally possible that this just mildly influences the level of Nodal or FGF signaling, which would create this effect.

      We thank the reviewer for this question. As suggested, we performed Wilcoxon tests to compare srcap expression between the PP and Endo populations. The analysis shows that although srcap expression is moderately elevated in PP relative to Endo, this difference is not statistically significant. The corresponding p-value and fold change have now been included in the revised figure (please see Figure 4J and S7H). Although the transcriptional level of srcap shows no significant difference between PP and anterior endoderm, our subsequent experiments, in which we applied AU15330 (an inhibitor of the SWI/SNF complex) and injected a morpholino targeting srcap, a key component of the SWI/SNF complex, demonstrated that inhibition promotes anterior endoderm fate while reducing PP cell specification. Therefore, we propose that subtle differences in the SWI/SNF complex may regulate the fate specification of PP and anterior endoderm through two mechanisms. First, as mentioned in our study, these chromatin remodelers modulate the expression of master regulators such as Gsc and Ripply1, thereby influencing cell fate decisions. Second, as noted by the reviewer, these chromatin remodelers may affect the interpretation of Nodal signaling, ultimately contributing to the divergence between PP and anterior endoderm fates.
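      The comparison described above (a Wilcoxon rank-sum test plus a fold change) can be sketched with the standard library alone. This is an illustration under stated assumptions: the p-value uses a normal approximation without tie or continuity correction, the pseudocount of 1 in the fold change is an arbitrary choice, and the expression values are invented toy numbers rather than the authors' data. An established statistics package would normally be used in a real analysis.

```python
import math


def midranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no corrections)."""
    n1, n2 = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of mean expression, with a pseudocount to avoid division by zero."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


# Invented toy expression values for one gene in two populations.
pp_expr = [3.0, 2.0, 4.0, 3.5, 2.5]
endo_expr = [2.8, 1.0, 3.2, 2.2, 3.8]

u_stat, p_value = rank_sum_test(pp_expr, endo_expr)
lfc = log2_fold_change(pp_expr, endo_expr)
print(f"U={u_stat}, p={p_value:.3f}, log2FC={lfc:.3f}")
```

      In this toy case the mean is slightly higher in the first group but the rank-sum test is far from significant, which is the kind of result the response describes for srcap.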

      The multiome data seems like a valuable data set for researchers interested in this stage of zebrafish development. However, the presentation of the data doesn't make many conclusions, aside from identifying an element adjacent to ripply1 whose chromatin is open in prechordal plate cells and not endodermal cells and showing that there are a number of loci with differential accessibility between these cell types. That seems fairly expected since both cell types have several differentially expressed transcriptional regulators (for instance, ripply1 has previously been demonstrated in multiple studies to be specific to the prechordal plate during blastula stages). The manuscript implies that SWI/SNF remodeling by Srcap is responsible for the chromatin accessibility differences between these cell types, but that has not actually been tested. It seems more likely that the differences in chromatin accessibility observed are a result of transcription factors binding downstream of Nodal signaling.

      We thank the reviewer for recognizing the value of our newly generated data. Through integrative analysis of single-cell data from wild-type, ndr1 kd, and lft1 ko groups of Nodal-injected explants at 6 hours post-fertilization (hpf), we identified a critical branching point in the fate segregation of the prechordal plate (PP) and anterior endoderm (Endo), where chromatin remodelers may play a significant role. Based on this finding, we performed single-cell RNA and ATAC sequencing on zebrafish embryos at 6 hpf. Analysis of this multi-omics dataset revealed that transcriptional repressors such as Gsc, Ripply1, and Osr1 exhibit differences in both transcriptional and chromatin accessibility levels between the PP and anterior Endo. Subsequent overexpression and loss-of-function experiments further demonstrated that Gsc and Ripply1 collaboratively suppress endodermal gene expression, thereby inhibiting endodermal cell fate. Previous studies have reported that for the activation of certain Nodal downstream target genes, the pSMAD2 protein of the Nodal signaling pathway recruits chromatin remodelers to facilitate chromatin opening and promote further transcription of target genes[16]. Therefore, our data provide chromatin accessibility profiles for Gsc and Ripply1, offering a valuable resource for future investigations into their pSMAD2 binding sites.

      Minor issues:

      Figure 2 E-F: It's not clear which cells from E are quantitated in F. For instance, the dorsal forerunner cells are likely to behave very differently from other endodermal progenitors in this assay. It would be helpful to indicate which cells are analyzed in Fig F with an outline or other indicator of some kind. Or - if both DFCs and endodermal cells are included in F, to perhaps use different colors for their points to help indicate if their fluorescence changes differently.

      We thank the reviewer for this suggestion. In the revised version of the figure, we have outlined the regions containing the analyzed cells.

      Fig 3 J: Should the reference be Dubrulle et al 2015, rather than Julien et al?

      Thank you; we have corrected this reference.

      References:

      Alexander, J. & Stainier, D. Y. A molecular pathway leading to endoderm formation in zebrafish. Current biology : CB 9, 1147-1157 (1999).

      Barone, V. et al. An Effective Feedback Loop between Cell-Cell Contact Duration and Morphogen Signaling Determines Cell Fate. Dev. Cell 43, 198-211.e12 (2017).

      Economou, A. D., Guglielmi, L., East, P. & Hill, C. S. Nodal signaling establishes a competency window for stochastic cell fate switching. Dev. Cell 57, 2604-2622.e5 (2022).

      Gritsman, K. et al. The EGF-CFC protein one-eyed pinhead is essential for nodal signaling. Cell 97, 121-132 (1999).

      Gritsman, K., Talbot, W. S. & Schier, A. F. Nodal signaling patterns the organizer. Development (Cambridge, England) 127, 921-932 (2000).

      Kawamura, A. et al. Groucho-associated transcriptional repressor ripply1 is required for proper transition from the presomitic mesoderm to somites. Developmental cell 9, 735-744 (2005).

      Kawamura, A., Koshida, S. & Takada, S. Activator-to-repressor conversion of T-box transcription factors by the Ripply family of Groucho/TLE-associated mediators. Molecular and cellular biology 28, 3236-3244 (2008).

      Sako, K. et al. Optogenetic Control of Nodal Signaling Reveals a Temporal Pattern of Nodal Signaling Regulating Cell Fate Specification during Gastrulation. Cell Rep. 16, 866-877 (2016).

      Rogers, K. W. et al. Nodal patterning without Lefty inhibitory feedback is functional but fragile. eLife 6, e28785 (2017).

      Warga, R. M. & Nüsslein-Volhard, C. Origin and development of the zebrafish endoderm. Development 126, 827-838 (1999).

      References:

      (1) Steinbeisser, H., and De Robertis, E.M. (1993). Xenopus goosecoid: a gene expressed in the prechordal plate that has dorsalizing activity. C R Acad Sci III 316, 959-971.

      (2) Warga, R.M., and Nüsslein-Volhard, C. (1999). Origin and development of the zebrafish endoderm. Development (Cambridge, England) 126, 827-838. 10.1242/dev.126.4.827.

      (3) Sako, K., Pradhan, S.J., Barone, V., Inglés-Prieto, Á., Müller, P., Ruprecht, V., Čapek, D., Galande, S., Janovjak, H., and Heisenberg, C.P. (2016). Optogenetic Control of Nodal Signaling Reveals a Temporal Pattern of Nodal Signaling Regulating Cell Fate Specification during Gastrulation. Cell reports 16, 866-877. 10.1016/j.celrep.2016.06.036.

      (4) van Boxtel, A.L., Economou, A.D., Heliot, C., and Hill, C.S. (2018). Long-Range Signaling Activation and Local Inhibition Separate the Mesoderm and Endoderm Lineages. Developmental cell 44, 179-191.e175. 10.1016/j.devcel.2017.11.021.

      (5) Cheng, T., Xing, Y.Y., Liu, C., Li, Y.F., Huang, Y., Liu, X., Zhang, Y.J., Zhao, G.Q., Dong, Y., Fu, X.X., et al. (2023). Nodal coordinates the anterior-posterior patterning of germ layers and induces head formation in zebrafish explants. Cell reports 42, 112351. 10.1016/j.celrep.2023.112351.

      (6) Economou, A.D., Guglielmi, L., East, P., and Hill, C.S. (2022). Nodal signaling establishes a competency window for stochastic cell fate switching. Developmental cell 57, 2604-2622.e5. 10.1016/j.devcel.2022.11.008.

      (7) Schier, A.F., and Talbot, W.S. (2005). Molecular genetics of axis formation in zebrafish. Annual review of genetics 39, 561-613. 10.1146/annurev.genet.37.110801.143752.

      (8) Barone, V., Lang, M., Krens, S.F.G., Pradhan, S.J., Shamipour, S., Sako, K., Sikora, M., Guet, C.C., and Heisenberg, C.P. (2017). An Effective Feedback Loop between Cell-Cell Contact Duration and Morphogen Signaling Determines Cell Fate. Developmental cell 43, 198-211.e12. 10.1016/j.devcel.2017.09.014.

      (9) Müller, P., Rogers, K.W., Jordan, B.M., Lee, J.S., Robson, D., Ramanathan, S., and Schier, A.F. (2012). Differential diffusivity of Nodal and Lefty underlies a reaction-diffusion patterning system. Science (New York, N.Y.) 336, 721-724. 10.1126/science.1221920.

      (10) Rogers, K.W., Lord, N.D., Gagnon, J.A., Pauli, A., Zimmerman, S., Aksel, D.C., Reyon, D., Tsai, S.Q., Joung, J.K., and Schier, A.F. (2017). Nodal patterning without Lefty inhibitory feedback is functional but fragile. eLife 6. 10.7554/eLife.28785.

      (11) Thisse, B., Wright, C.V., and Thisse, C. (2000). Activin- and Nodal-related factors control antero-posterior patterning of the zebrafish embryo. Nature 403, 425-428. 10.1038/35000200.

      (12) Eroglu, B., Wang, G., Tu, N., Sun, X., and Mivechi, N.F. (2006). Critical role of Brg1 member of the SWI/SNF chromatin remodeling complex during neurogenesis and neural crest induction in zebrafish. Developmental dynamics : an official publication of the American Association of Anatomists 235, 2722-2735. 10.1002/dvdy.20911.

      (13) Hensley, M.R., Emran, F., Bonilla, S., Zhang, L., Zhong, W., Grosu, P., Dowling, J.E., and Leung, Y.F. (2011). Cellular expression of Smarca4 (Brg1)-regulated genes in zebrafish retinas. BMC developmental biology 11, 45. 10.1186/1471-213X-11-45.

      (14) Kawamura, A., Koshida, S., Hijikata, H., Ohbayashi, A., Kondoh, H., and Takada, S. (2005). Groucho-associated transcriptional repressor ripply1 is required for proper transition from the presomitic mesoderm to somites. Developmental cell 9, 735-744. 10.1016/j.devcel.2005.09.021.

      (15) Kawamura, A., Koshida, S., and Takada, S. (2008). Activator-to-repressor conversion of T-box transcription factors by the Ripply family of Groucho/TLE-associated mediators. Mol Cell Biol 28, 3236-3244. 10.1128/MCB.01754-07.

      (16) Ross, S., Cheung, E., Petrakis, T.G., Howell, M., Kraus, W.L., and Hill, C.S. (2006). Smads orchestrate specific histone modifications and chromatin remodeling to activate transcription. EMBO J 25, 4490-4502. 10.1038/sj.emboj.7601332.

    1. eLife Assessment

      This important study investigates how the Cdv division proteins of Metallosphaera sedula assemble on and interact with curved membranes in vitro, advancing our understanding of this reduced ESCRT-like machinery. The data provide support for sequential protein recruitment and curvature-dependent enrichment at membrane necks, based on well-controlled reconstitution assays and quantitative analysis. The work establishes a convincing experimental framework for dissecting Cdv-mediated membrane remodeling. The study will be of broad interest to evolutionary and synthetic biologists as well as membrane biophysicists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the recruitment order and assembly of the Cdv proteins during Sulfolobus acidocaldarius archaeal cell division using a bottom-up reconstitution approach. They employed liposome-binding assays, EM, and fluorescence microscopy with in vitro reconstitution in dumbbell-shaped liposomes to explore how CdvA, CdvB, and the homologues of ESCRT-III proteins (CdvB, CdvB1, and CdvB2) interact to form membrane remodeling complexes.

      The study sought to reconstitute the Cdv machinery by first analyzing their assembly as two sub-complexes: CdvA:CdvB and CdvB1:CdvB2ΔC. The authors report that CdvA binds lipid membranes only in the presence of CdvB and localizes preferentially to membrane necks. Similarly, the findings on CdvB1:CdvB2ΔC indicate that truncation of CdvB2 facilitates filament formation and enhances curvature sensitivity in interaction with CdvB1. Finally, the authors reconstitute a quaternary CdvA:CdvB:CdvB1:CdvB2 complex and demonstrate its enrichment at membrane necks. The mechanistic details of how these complexes drive membrane remodeling, particularly through subcomplex removal by the proteasome and/or CdvC, remain insufficiently addressed, and the study therefore mainly provides an experimental framework for future mechanistic investigation.

      Strengths:

      The study of machinery assembly and its involvement in membrane remodeling, particularly using bottom-up reconstituted in vitro systems, presents significant challenges. This is particularly true for systems like the ESCRT-III complex, which localizes uniquely at the lumen of membrane necks prior to scission. The use of dumbbell-shaped liposomes in this study provides a promising experimental model to investigate ESCRT-III and ESCRT-III-like protein activity at membrane necks.

      The authors present intriguing evidence regarding the sequential recruitment of ESCRT-III proteins in crenarchaea, a close relative of eukaryotes.

      Weaknesses:

      The findings of this study suggest that the hierarchical recruitment characteristic of eukaryotic systems may predate eukaryogenesis, which represents a significant and exciting contribution. However, the broader implications of these findings for membrane remodeling mechanisms remain largely unexplored. Nevertheless, this study provides a valuable experimental framework to address these questions in the future.

    3. Reviewer #2 (Public review):

      Summary:

      The Crenarchaeal Cdv division system represents a reduced form of the universal and ubiquitous ESCRT membrane reverse-topology scission machinery, and therefore a prime candidate for synthetic and reconstitution studies. The work here represents a convincing extension of previous work in the field, clarifying the order of recruitment of Cdv proteins to curved membranes.

      Strengths:

      The use of a recently developed approach to produce dumbbell-shaped liposomes (De Franceschi et al. 2022), which allowed the authors to assess recruitment of various Cdv assemblies to curved membranes or membrane necks; reconstitution of a quaternary Cdv complex at a membrane neck.

      Weaknesses:

      The initial manuscript was a bit light on quantitative detail, across the various figures - addressing this would make the paper much stronger. The authors could also include in the discussion a short paragraph on implications for our understanding of ESCRT function in other contexts and/or in archaeal evolution - for the interests of a broad audience. These issues have been addressed in the authors' revision.

    4. Reviewer #3 (Public review):

      In this revised report, De Franceschi et al. purify components of the Cdv machinery in archaeon M. sedula and probe their interactions with membrane and with one-another in vitro using two main assays - liposome flotation and fluorescent imaging of encapsulated proteins. This has the potential to add to the field by showing how the order of protein recruitment seen in cells is related to the differential capacity of individual proteins to bind membranes when alone or when combined.

      Using the flotation assay, they demonstrate that CdvA, CdvB, and CdvB1 bind liposomes. CdvB2 lacking its C-terminus is not efficiently recruited to membranes unless CdvAB or CdvB1 are present. The authors then employ a clever liposome assay that generates chained spherical liposomes connected by thin membrane necks, which allows them to accurately control the buffer composition inside and outside of the liposome. With this, they show that all four proteins accumulate in necks of dumbbell-shaped liposomes that mimic the shape of constricting necks in cell division, possibly indicating a sensing of catenoid membrane geometry. Taken altogether, these data lead them to propose that Cdv proteins are sequentially recruited to the membrane as has also been suggested by in vivo studies of ESCRT-III dependent cell division in crenarchaea.

      In their revision, the authors have addressed the vast majority of our previous concerns. The paper is much improved as a result. The Figures are improved and the authors have added appropriate controls and additional experiments, strengthening their conclusions.

      There are still some discrepancies between these results and what is known about Sulfolobus division. Since the initial submission, other work has shown that in S. acidocaldarius, CdvA is the first component to assemble a ring (in the absence of CdvB; doi.org/10.1073/pnas.2513939122) and that CdvB2 is able to bind membranes in vitro (doi.org/10.1073/pnas.2525941123). This might reflect differences between Sulfolobus and Metallosphaera, but probably should be discussed.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the recruitment order and assembly of the Cdv proteins during Sulfolobus acidocaldarius archaeal cell division using a bottom-up reconstitution approach. They employed liposome-binding assays, EM, and fluorescence microscopy with in vitro reconstitution in dumbbell-shaped liposomes to explore how CdvA, CdvB, and the homologues of ESCRT-III proteins (CdvB, CdvB1, and CdvB2) interact to form membrane remodeling complexes.

      The study sought to reconstitute the Cdv machinery by first analyzing their assembly as two subcomplexes: CdvA:CdvB and CdvB1:CdvB2ΔC. The authors report that CdvA binds lipid membranes only in the presence of CdvB and localizes preferentially to membrane necks. Similarly, the findings on CdvB1:CdvB2ΔC indicate that truncation of CdvB2 facilitates filament formation and enhances curvature sensitivity in interaction with CdvB1. Finally, while the authors reconstitute a quaternary CdvA:CdvB:CdvB1:CdvB2 complex and demonstrate its enrichment at membrane necks, the mechanistic details of how these complexes drive membrane remodeling through subcomplex removal by the proteasome and/or CdvC remain speculative.

      Although the work highlights intriguing similarities with eukaryotic ESCRT-III systems and explores unique archaeal adaptations, the conclusions drawn would benefit from stronger experimental validation and a more comprehensive mechanistic framework.

      Strengths:

      The study of machinery assembly and its involvement in membrane remodeling, particularly using bottom-up reconstituted in vitro systems, presents significant challenges. This is particularly true for systems like the ESCRT-III complex, which localizes uniquely at the lumen of membrane necks prior to scission. The use of dumbbell-shaped liposomes in this study provides a promising experimental model to investigate ESCRT-III and ESCRT-III-like protein activity at membrane necks.

      The authors present intriguing evidence regarding the sequential recruitment of ESCRT-III proteins in crenarchaea, a close relative of eukaryotes. This finding suggests that the hierarchical recruitment characteristic of eukaryotic systems may predate eukaryogenesis, which is a significant and exciting contribution. However, the broader implications of these findings for membrane remodeling mechanisms remain speculative, and the study would benefit from stronger experimental validation and expanded contextualization within the field.

      We thank the Referee for his/her appreciation of our work.

      Weaknesses:

      This manuscript presents several methodological inconsistencies and lacks key controls to validate its claims. Additionally, there is insufficient information about the number of experimental repetitions, statistical analyses, and a broader discussion of the major findings in the context of open questions in the field.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #2 (Public review):

      Summary:

      The Crenarchaeal Cdv division system represents a reduced form of the universal and ubiquitous ESCRT membrane reverse-topology scission machinery, and therefore a prime candidate for synthetic and reconstitution studies. The work here represents a solid extension of previous work in the field, clarifying the order of recruitment of Cdv proteins to curved membranes.

      Strengths:

      The use of a recently developed approach to produce dumbbell-shaped liposomes (De Franceschi et al. 2022), which allowed the authors to assess recruitment of various Cdv assemblies to curved membranes or membrane necks; reconstitution of a quaternary Cdv complex at a membrane neck.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      The manuscript is a bit light on quantitative detail, across the various figures, and several key controls are missing (CdvA, B alone to better interpret the co-polymerisation phenotypes and establish the true order of recruitment, for example) - addressing this would make the paper much stronger. The authors could also include in the discussion a short paragraph on implications for our understanding of ESCRT function in other contexts and/or in archaeal evolution, as well as a brief exploration of the possible reasons for the discrepancy between the foci observed in their liposome assays and the large rings observed in cells - to better serve the interests of a broad audience.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #3 (Public review):

      Summary:

      In this report, De Franceschi et al. purify components of the Cdv machinery from the archaeon M. sedula and probe their interactions with membrane and with one another in vitro using two main assays: liposome flotation and fluorescent imaging of encapsulated proteins. This has the potential to add to the field by showing how the order of protein recruitment seen in cells is related to the differential capacity of individual proteins to bind membranes when alone or when combined.

      Strengths:

      Using the flotation assay, they demonstrate that CdvA and CdvB bind liposomes when combined. While CdvB1 also binds liposomes under these conditions, in the flotation assay, CdvB2 lacking its C-terminus is not efficiently recruited to membranes unless CdvAB or CdvB1 are present. The authors then employ a clever liposome assay that generates chained spherical liposomes connected by thin membrane necks, which allows them to accurately control the buffer composition inside and outside of the liposome. With this, they show that all four proteins accumulate in necks of dumbbell-shaped liposomes that mimic the shape of constricting necks in cell division. Taken altogether, these data lead them to propose that Cdv proteins are sequentially recruited to the membrane as has also been suggested by in vivo studies of ESCRT-III dependent cell division in crenarchaea.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      These experiments provide a good starting point for the in vitro study of the interaction of Cdv system components with the membrane and their consecutive recruitment. However, several experimental controls are missing, which complicates the authors' ability to draw strong conclusions. Moreover, some results are inconsistent across the two main assays, which makes the findings difficult to interpret:

      (1) Missing controls.

      Various protein mixtures are assessed for their membrane-binding properties in different ways. However, it is difficult to interpret the effect of any specific protein combination, when the same experiment is not presented in a way that includes separate tests for all individual components. In this sense, the paper lacks important controls. For example, Fig 1C is missing the CdvB-only control. The authors remark that CdvB did not polymerise (data not shown) but do not comment on whether it binds membrane in their assays. In the introduction, Samson et al., 2011 is cited as a reference to show that CdvB does not bind membrane. However, here the authors are working with protein from a different organism in a different buffer, using a different membrane composition and a different assay. Given that so many variables are changing, it would be good to present how M. sedula CdvB behaves under these conditions.

      We thank the referee for raising this point. We have now added these data in Figure 1C. Indeed it turns out that CdvB from M. sedula exhibits clear membrane binding on its own in a flotation assay.

      Similarly, there is no data showing how CdvB alone or CdvA alone behave in the dumbbell liposome assay.

      Without these controls, it's impossible to say whether CdvA recruits CdvB or the other way around. The manuscript would be much stronger if such data could be added.

      We have now added these data in Figure 1E, 1F and 1G. Overall, we can confirm that CdvA binds the membrane better in the presence of CdvB (although both proteins can bind the membrane on their own). Both proteins appear to recognize the curved region of the membrane neck.

      (2) Some of the discrepancies in the data generated using different assays are not discussed.

      The authors show that CdvB2∆C binds membrane and localizes to membrane necks in the dumbbell liposome assay, but no membrane binding is detected in the flotation assay. The discrepancy between these results further highlights the need for CdvB-only and CdvA-only controls.

      We have now added these controls in Figure 1. In addition, we would like to clarify that the flotation assay and the SMS dumbbell assay serve different purposes and are not directly comparable in quantitative terms. In the flotation assay, all the protein present as input is eventually recovered and visualized. Thus, quantitative information on the fraction of the total protein bound to lipids can be inferred from this assay. The SMS assay, in contrast, provides a very different kind of information. Because of the particular protocol required to generate dumbbells (De Franceschi, 2022), the total amount of protein in the inner buffer in dumbbells is not accurately defined: protein that is not correctly reconstituted (e.g. protein that aggregates while still in the droplet phase) interferes with vesicle generation, so that dumbbells containing such aggregates are generally not formed in the first place. This renders it impossible to draw any quantitative conclusions about the proportion of the sample bound to lipids. The SMS assay is therefore not directly comparable to the flotation assay, but rather complementary to it. Indeed, the purpose of the SMS assay is to provide information about the curvature selectivity of the protein.

      (3) Validation of the liposome assay.

      The experimental setup to create dumbbell-shaped liposomes seems great and is a clever novel approach pioneered by the team. Not only can the authors manipulate liposome shape, they also state that this allows them to accurately control the species present on the inside and outside of the liposome. Interpreting the results of the liposome assay, however, depends on the geometry being correct. To make this clearer, it would seem important to include controls to prove that all the protein imaged at membrane necks lies on the inside of liposomes. In the images in SFig3 there appears to be protein outside of the liposome. It would also be helpful to present data to test whether the necks are open, as suggested in the paper, by using FRAP or some other related technique.

      We thank the Referee for his/her appreciation. The proteins are encapsulated inside the liposomes, not outside of them. While Figure S3 might give the appearance that there is some protein outside, this is actually just an imaging artifact. Author response image 1 (below) explains this: When the membrane and protein channel are shown separately, it is clear that the protein cluster that appeared to be ‘outside’ actually colocalizes with an extra small dumbbell lobe (yellow arrowhead). The protein appeared to be outside of it because (1) the protein fluorescent signal is stronger than the signal from the membrane, and (2) there is a certain time delay in the acquisition of the two channels (0.5-1 second), thus the membrane may have slightly shifted out of focus when the fluorescence was being acquired. We are confident that the protein is inside in these dumbbells because the procedure for preparing the dumbbells requires extensive emulsification by pipetting, which requires ≈ 1 minute. This time is more than sufficient for proteins with high affinity for the membrane, like ESCRT and Cdv, to bind the membrane. For an example of how fast binding under confinement can be, please see movie 2 from this paper: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.

      Moreover, in many instances, we observed that the protein is inside because, by increasing the gain in the images post-acquisition, a clear protein signal appears in the lumen (see Author response image 2).

      Author response image 1.

      Separate channels showing colocalization of protein and lipids (adapted from Figure S3). The zoom-in shows separate channels, highlighting that the CdvB2 cluster that seems to be ‘outside the dumbbell’ actually colocalizes with the small terminal lobe of the dumbbell, indicating that the protein is encapsulated within that lobe.

      Author response image 2.

      Residual protein present inside lumen of dumbbells as visualized by increasing the brightness post-acquisition.

      We are not sure what the referee means by “test whether the necks are open, as suggested in the paper”. We are confident that the lobes of dumbbells originated from a single floppy vesicle, and were therefore mutually connected with an open neck (at least at the onset of the experiment). We have performed extensive FRAP assays on dumbbells in previous papers (De Franceschi et al., ACS Nano 2022 and De Franceschi et al., Nature Nanotech 2024) which unequivocally proved that these chains of dumbbells are connected with open necks. We have now also performed a few FRAP assays with reconstituted Cdv proteins, which confirmed this point. We have added a movie of such an experiment to the manuscript (Movie 1).

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, that could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is however beyond the scope of this manuscript, as the current work merely addressed the question of hierarchical recruitment of Cdv proteins at the membrane. We plan to examine this in future work.

      (4) Quantification of results from the liposome assay.

      The paper would be strengthened by the inclusion of more quantitative data relating to the liposome assay. Firstly, only a single field of view is shown for each condition. Because of this, the reader cannot know whether this is a representative image, or an outlier? Can the authors do some quantification of the data to demonstrate this? The line scan profiles in the supplemental figures would be an example of this, but again in these Figures only a single image is analyzed.

      The images that we showed are indeed representative. The dumbbells that are generated by the SMS approach contain an “internal control”: in each dumbbell, the protein has the option of localizing at the neck or localizing elsewhere in the region of flat membrane. We see consistently that Cdv proteins have a strong preference for localizing at the neck.

      We would recommend that the authors present quantitative data to show the extent of co-localization at the necks in each case. They also need a metric to report instances in which protein is not seen at the neck, e.g. CdvB2 but not CdvB1 in Fig2I, which rules out a simple curvature preference for CdvB2 as stated in line 182.

      While the request for better quantitation is reasonable, this would require carrying out very significant new experiments at the microscope, which is rendered near-impossible since both first authors have left the lab for new positions.

      Secondly, the authors state that they see CdvB2∆C recruited to the membrane by CdvB1 (lines 184-187, Fig 2I). However, this simple conclusion is not borne out in the data. Inspecting the CdvB2∆C panels of Fig 2I, Fig3C, and Fig3D, CdvB2∆C signal can be seen at positions which don't colocalize with other proteins. The authors also observe CdvB2∆C localizing to membrane necks by itself (Fig 2E). Therefore, while CdvB1 and CdvB2∆C colocalize in the flotation assay, there is no strong evidence for CdvB2∆C recruitment by CdvB1 in dumbbells. This is further underscored by the observation that in the presented data, all Cdv proteins always appear to localize at dumbbell necks, irrespective of what other components are present inside the liposome. Although one nice control is presented (ZipA), this suggests that more work is required to be sure that the proteins are behaving properly in this assay. For example, if membrane binding surfaces of Cdv proteins are mutated, does this lead to the accumulation of proteins in the bulk of the liposome as expected?

      In the particular example of Figure 2I, it indeed appears that there are some clusters of CdvB2ΔC that do not contain CdvB1 (we indicated them in Author response image 3 by red arrowheads), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters that do contain both proteins (yellow arrowheads) are localized at necks, while those that appear to contain only CdvB2ΔC (red arrowheads) are not localized at necks. This is no coincidence. The clusters indicated by the red arrowheads do contain CdvB1. However, these clusters rapidly diffuse on the membrane plane because they are not fixed at the neck: therefore, they constantly shift in and out of focus. Because there is a time delay in the acquisition of each channel (between 0.5 and 1 second), these clusters were in focus when the CdvB2ΔC signal was being acquired, but shifted out of focus when the CdvB1 signal was being acquired. This implies that the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have an affinity for curved geometry, they preferentially and stably localize at necks. Why don’t all the clusters localize at necks then? We estimate that the simple answer is that, in this particular case, there are more clusters than there are necks, so some of the clusters must necessarily localize somewhere else.

      Author response image 3.

      Current Figure 2H, where clusters that are double-positive for both CdvB1 and CdvB2ΔC are indicated by yellow arrowheads, while clusters that apparently only contain CdvB2ΔC are indicated by red arrowheads. It is observed that all the double-positive clusters are localized at necks.

      (5) Rings.

      The authors should comment on why they never observe large Cdv rings in their experiments. In crenarchaeal cell division, CdvA and CdvB have been observed to form large rings in the middle of the 1 micron cell, before constriction. Only in the later stages of division are the ESCRTs localized to the constricting neck, at a time when CdvA is no longer present in the ring. Therefore, if the in vitro assay used by the authors really recapitulated the biology, one would expect to see large CdvAB rings in Figs 1EF. This is ignored in the model. In the proposed model of ring assembly (line 252), CdvAB ring formation is mentioned, but authors do not discuss the fact that they do not observe CdvAB rings - only foci at membrane necks. The discussion section would benefit from the authors commenting on this.

      The referee is correct: it is intriguing that we don’t see micron-sized rings for CdvA and CdvB. We do note that our EM data (Fig. S1) show that CdvA on its own can form rings of about 100-200 nm diameter, well below the diffraction limit, that could well correspond to the foci that we optically resolve in Figure 1. We have now added a brief comment on this to the manuscript on lines 256-264.

      (6) Stoichiometry

      It is not clear why 100% of the visible CdvA and 100% of the visible CdvB are shifted to the lipid fraction in 1C. Perhaps this is a matter of quantification. Can the authors comment on the stoichiometry here?

      We agree that this was unclear. Since that particular gel was stained with Coomassie, the quantitative signals might be unreliable, and hence we have repeated this experiment using fluorescently labelled proteins, which indeed show a less extreme distribution. This was also done to make the data more uniform, as requested by the referees.

      (7) Significance of quantification of MBP-tagged filaments.

      Authors use tagging and removal of MBP as a convenient, controllable system to trigger polymerisation of various Cdv proteins. However, it is unclear what is the value and significance of reporting the width and length of the short linear filaments that are formed by the MBP-tagged proteins. Presumably they are artefactual assemblies generated by the presence of the tag?

      Providing a measure of the changes induced by MBP removal, in fact, validates that this actually has an effect. But perhaps this places too much emphasis on the short filaments. We have now opted for a compromise, removing the quantification of the width and length of short filaments formed by MBP-tagged protein from the text, but keeping the supplementary figure showing their distribution as compared to the other filaments (Figure S2E, SF).

      Similarly, Figure 2C doesn't seem a useful addition to the paper.

      We removed panel 2C, and now merely report these values in the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would suggest the authors perform a deeper discussion about their findings, such as what are the evolutionary implications, how they think lipids from these archaea may affect the recruitment process,...

      Because there is no exact homology between archaeal Cdv proteins and eukaryotic ESCRT-III proteins, we do not feel our work brings new evolutionary implications beyond what we already state in the manuscript. We also did not perform experiments using archaeal lipids, thus we would rather not speculate on how they may potentially affect the recruitment of Cdv proteins.

      In general, the manuscript lacks information regarding some scale bars, number of experimental repetitions (n or N), statistical analysis when needed, information about protein concentrations used in their assays.

      We have now added this information in the manuscript.

      Below, I provide a list of comments that I think the authors should address to improve the manuscript:

      (1) Line 113-114: The authors test protein-membrane interactions using flotation assays with positively curved SUV membranes but encapsulate proteins in dumbbell-shaped liposomes with negative curvature at the connecting necks. Might the use of membranes with opposite curvatures affect the recruitment process? Since the proteins are fluorescently labeled, I suggest testing recruitment using flat giant unilamellar vesicles or supported lipid bilayers (with zero curvature) to validate their findings.

      We thank the referee for this suggestion. Please do note that we are not claiming in our paper that Cdv proteins recognize negative curvature. We merely observe that they localize at necks. The neck of a dumbbell exhibits the so-called “catenoid” geometry, which is characterized by having both positive and negative curvature.
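      The statement that a catenoid carries both signs of curvature can be made explicit with a short worked formula. The block below is a sketch for an idealized catenoid only; the neck radius $a$ and the symbols $\kappa_\theta$, $\kappa_m$, $H$ and $K$ are notation introduced here for illustration and do not appear in the manuscript, and real dumbbell necks only approximate this shape.

```latex
% Idealized catenoid neck with minimal radius a (assumed notation).
% r(z): neck radius at height z along the axis joining the two lobes.
\begin{align*}
  r(z) &= a \cosh\!\left(\frac{z}{a}\right), \\
  % Principal curvatures: around the neck (theta) and along the meridian (m).
  \kappa_\theta(z) &= \frac{1}{a \cosh^{2}(z/a)}, &
  \kappa_m(z) &= -\frac{1}{a \cosh^{2}(z/a)}, \\
  % Mean curvature vanishes everywhere; Gaussian curvature is negative.
  H &= \tfrac{1}{2}\left(\kappa_\theta + \kappa_m\right) = 0, &
  K &= \kappa_\theta \, \kappa_m = -\frac{1}{a^{2} \cosh^{4}(z/a)}.
\end{align*}
```

      At the narrowest point ($z = 0$) the two principal curvatures are $+1/a$ and $-1/a$, so a protein enriched at the neck is exposed to curvature of both signs at once, which is why localization at necks alone does not establish a preference for positive or negative curvature.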

      Regarding the SUVs, we now realize there was a mistake in the Methods section: in the flotation assay we in fact used multilamellar vesicles, not SUVs, precisely for the reason mentioned by the referee. We apologize for the oversight and have now corrected this in the Methods. Multilamellar vesicles are not characterized by a strong positive curvature as SUVs are, but we do agree that they likely don’t have negative curvature either. Because of the heterogeneous nature of the multilamellar vesicles, they provide a binding assay that is rather independent of curvature. Complementary to the flotation assay, the SMS approach was employed to reveal the curvature preference of proteins.

      Finally, we performed the experiment on large GUVs suggested by the referee using CdvB as an example, but this turned out to be inconclusive because the protein forms clusters: these clusters may be creating local curvature at the nanometer scale, which cannot be resolved by optical microscopy (Author response image 4). This is quite typical for proteins that recognize curvature (cf. for instance: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.)

      Author response image 4.

      Fluorescently labelled CdvB bound to giant unilamellar vesicle. The protein was added in the outer buffer. CdvB forms distinct clusters, which may generate a local region of high membrane curvature.

      (2) Line 138-139: How is His-ZipA binding the membrane? Wouldn't Ni²⁺-NTA lipids be required? If not, how is the binding achieved?

      Indeed, NTA-lipids were present. This is now stated both in the legend and in the methods.

      (3) In the encapsulated protein assays, why does the luminal fluorescence intensity of the encapsulated protein sometimes appear similar to the bulk fluorescence signal? Since only a small fraction of the protein assembles at membrane necks, shouldn't the luminal pool of unbound protein show higher fluorescence intensity inside the liposomes?

      We thank the referee for raising this point and giving us the opportunity to explain this. The reason is that Cdv proteins have a very high affinity for the neck, and when they cluster at the neck the fluorescence intensity of the cluster is many times higher than the background fluorescence. Because we were interested in imaging the clusters and avoiding overexposing them, we adjusted the imaging conditions accordingly, with the result that the fluorescence from both the lumen and the bulk is at a very low level.

      By choosing different imaging conditions, however, it can be actually seen that the signal inside the lumen is clearly higher than the bulk: this can be seen for instance in Author response image 2, where the brightness has been properly adjusted.

      (4) Line 184-185: In Fig. 2I, some CdvB2ΔC puncta seem independent of CdvB1 and are not localized at membrane necks. How many such puncta exist? For example, in the provided micrograph, 2 out of 5 clusters are independent of CdvB1. This proportion is significant. Could the authors quantify the prevalence of these structures and discuss why they form?

      We thank the referee for giving us the opportunity to explain this apparent discrepancy. We would like to stress the fact that CdvB2ΔC and CdvB1 form an obligate heterodimer: in all our experiments, without exception, we find that they form a strong complex when we mix the two proteins. This is true both in dumbbells and in flotation assays.

      In the particular example of Figure 2I, it indeed appears that there are some clusters of CdvB2ΔC that do not contain CdvB1 (we indicated them in Author response image 3 by red arrowheads), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters that do contain both proteins (yellow arrowheads) are localized at necks, while those that appear to contain only CdvB2ΔC (red arrowheads) are not localized at necks. This is no coincidence. The clusters indicated by the red arrowheads do contain CdvB1. However, these clusters rapidly diffuse on the membrane plane because they are not fixed at the neck: therefore, they constantly shift in and out of focus. Because there is a time delay in the acquisition of each channel (between 0.5 and 1 second), these clusters were in focus when the CdvB2ΔC signal was being acquired, but shifted out of focus when the CdvB1 signal was being acquired. This implies that the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have affinity for curved geometry, they preferentially and stably localize at necks. Why don’t all the clusters localize at necks then? We estimate that the simple answer is that, in this particular case, there are more clusters than there are necks, so some of the clusters must necessarily localize somewhere else.

      (5) Figure 1E and 1F: Why do lipids accumulate and colocalize with the proteins? How can the authors confirm lumen connectivity between vesicles? Performing FRAP assays could validate protein localization and enrichment at the lumen of the membrane necks.

      At first sight, indeed some lipid enrichment seems to be observed at the neck between lobes of dumbbells.

      This is, however, an imaging artifact due to the fact that the neck is diffraction limited. As shown in the Author response image 5, we are acquiring the membrane signal from both lobes at the neck region, and therefore the signal is roughly double, hence the apparent lipid enrichment.

      Author response image 5.

      Schematic illustrating that the neck between two lobes is smaller than the diffraction limit of optical microscopy (the size of a typical pixel is indicated by the green square). Because of this technical limitation, the fluorescence intensity of the membrane at the neck is twice that of a single membrane.
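      The factor of two described in this schematic can be reproduced with a small numerical sketch. The code below is an illustration under stated assumptions, not the authors' analysis: it models the equatorial cross-section of two lobes as two touching circles (an infinitely narrow neck), represents the membrane as evenly spaced fluorescent points, and counts how many points fall in one square pixel. The lobe radius, pixel size, and the function names `circle_points`, `count_in_pixel` and `neck_to_lobe_ratio` are all hypothetical choices made for this example.

```python
import math


def circle_points(center_x, radius, n_points):
    """Evenly spaced points on a circle centred at (center_x, 0)."""
    step = 2.0 * math.pi / n_points
    return [
        (center_x + radius * math.cos(k * step), radius * math.sin(k * step))
        for k in range(n_points)
    ]


def count_in_pixel(points, center, pixel_size):
    """Number of membrane points inside a square pixel centred at `center`."""
    half = pixel_size / 2.0
    cx, cy = center
    return sum(1 for x, y in points if abs(x - cx) < half and abs(y - cy) < half)


def neck_to_lobe_ratio(lobe_radius=5.0, pixel_size=0.1, n_points=200_000):
    """Membrane signal in the neck pixel divided by that in a single-membrane pixel.

    Assumed units are micrometres; the two lobes touch at the origin.
    """
    left_lobe = circle_points(-lobe_radius, lobe_radius, n_points)
    right_lobe = circle_points(lobe_radius, lobe_radius, n_points)
    membrane = left_lobe + right_lobe

    # Pixel centred on the neck: it collects membrane from both lobes.
    neck_signal = count_in_pixel(membrane, (0.0, 0.0), pixel_size)
    # Pixel centred on the far pole of the left lobe: a single membrane contour.
    lobe_signal = count_in_pixel(membrane, (-2.0 * lobe_radius, 0.0), pixel_size)
    return neck_signal / lobe_signal


if __name__ == "__main__":
    print(f"neck / single-membrane signal: {neck_to_lobe_ratio():.2f}")
```

      In this toy geometry the neck pixel collects membrane from both lobes, so the ratio comes out close to two even though the lipid density per unit membrane length is identical everywhere, consistent with the apparent lipid enrichment being an imaging effect.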

      The referee is correct in pointing out that these images do not prove that the lobes are connected, and that FRAP assays are the only way to prove this point. However, in previous papers we have confirmed extensively that in chains of dumbbells the lobes are connected:

      - De Franceschi N, Pezeshkian W, Fragasso A, Bruininks BMH, Tsai S, Marrink SJ, Dekker C. Synthetic Membrane Shaper for Controlled Liposome Deformation. ACS Nano. 2022 Nov 28;17(2):966–78. doi: 10.1021/acsnano.2c06125.

      - De Franceschi N, Barth R, Meindlhumer S, Fragasso A, Dekker C. Dynamin A as a one-component division machinery for synthetic cells. Nat Nanotechnol. 2024 Jan;19(1):70-76. doi: 10.1038/s41565-023-01510-3.

      Random sticking of liposomes would also generate clusters of vesicles, not linear chains. We now provide also a Movie (Movie 1) supporting this point.

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, that could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is however beyond the scope of this manuscript, as the current work merely addressed the question of hierarchical recruitment of Cdv proteins at the membrane. We plan to examine this in future work.

      (6) Why didn't the authors use the same lipid composition, particularly the same proportion of negatively charged lipids, on the SUVs of the flotation assays and on the dumbbell-shaped liposomes?

      In flotation assays, it is typical to use a relatively large proportion of negatively charged lipids, to promote protein binding. This is because the aim is to maximize membrane coverage by the protein. The SMS procedure to generate dumbbell-shaped GUVs is completely different, however. Rather than covering the membrane with protein, the idea is to reduce the amount of protein to a minimum, so that any curvature preference can be best visualized. This is e.g. routinely done in tube pulling experiments, for the same reason (See for instance Prévost C, Zhao H, Manzi J, Lemichez E, Lappalainen P, Callan-Jones A, Bassereau P. IRSp53 senses negative membrane curvature and phase separates along membrane tubules. Nat Commun. 2015 Oct 15;6:8529. doi: 10.1038/ncomms9529).

      (7) Line 117-119: The suggestion that polymer formation between CdvA and CdvB facilitates membrane recruitment is intriguing. However, fluorescence microscopy experiments could better elucidate whether there is sequential recruitment of CdvB followed by CdvA, or if these proteins form a heteropolymer composite for membrane binding. Can CdvB bind membranes independently, or does this require synergy between CdvA and CdvB?

      We thank the referee for prompting us to perform this experiment. As we now show in Figure 1C, CdvB indeed is able to bind the membrane independently of CdvA. Whether this happens sequentially or simultaneously is an interesting question, but one that is impossible to address with either the SMS or the flotation assay, because in both cases we can only observe the endpoint of the recruitment.

      We would also like to clarify one specific experimental detail. Perhaps unsurprisingly, the results from the flotation assay are dependent on the way the assay is performed. In particular, we observed that the same protein can exhibit a different binding profile depending on whether it is being loaded either at the top or at the bottom of the gradient. This can be seen in Author response image 6. This is counterintuitive, since once the equilibrium is reached, the result should only depend on the density of the sample. We performed an overnight centrifugation (> 16 hours) on a short tube (< 3 cm tall), thus equilibrium is being reached (which is corroborated by the fact that CdvB1 and CdvB2 can float to the top of the gradient within this timespan, as shown in Figure 2C, 2E, 2G). We ascribe the difference between top and bottom loading to the fact that, when the sample is loaded at the bottom, it has to be mixed with a concentrated sucrose solution, while in the case of loading from the top, this is not done.

      In literature, both loading from top and from bottom have been used:

      - Lata S, Schoehn G, Jain A, Pires R, Piehler J, Gottlinger HG, Weissenhorn W. Helical structures of ESCRT-III are disassembled by VPS4. Science. 2008 Sep 5;321(5894):1354-7. doi: 10.1126/science.1161070.

      - Moriscot C, Gribaldo S, Jault JM, Krupovic M, Arnaud J, Jamin M, Schoehn G, Forterre P, Weissenhorn W, Renesto P. Crenarchaeal CdvA forms double-helical filaments containing DNA and interacts with ESCRT-III-like CdvB. PLoS One. 2011;6(7):e21921. doi: 10.1371/journal.pone.0021921.

      - Senju Y, Lappalainen P, Zhao H. Liposome Co-sedimentation and Co-flotation Assays to Study Lipid-Protein Interactions. Methods Mol Biol. 2021;2251:195-204. doi: 10.1007/978-1-0716-1142-5_14.

      In performing the flotation assay for CdvB1 and CdvB2ΔC, or when using all 4 proteins together, we loaded the sample at the bottom, and we could detect reproducible binding to liposomes (Figures 2D, 2F, 2H, 3A). However, CdvB does not bind the membrane when loaded at the bottom. Thus, for the experiments shown in Figure 1C, we loaded the proteins at the top. This experimental setup allowed us to highlight that CdvB indeed induces a stronger interaction between CdvA and the membrane.

      Author response image 6.

      CdvB binding to multilamellar vesicles in a flotation assay. In the left panel, the sample was loaded at the top of the sucrose gradient; in the right panel it was loaded at the bottom.

      (8) Line 165-173: The authors claim that filament curvature differs between CdvB2ΔC alone and the CdvB1:CdvB2ΔC complex. Are these differences statistically significant? What is the sample size (N)? Furthermore, how do the authors confirm interactions between these proteins in the absence of membranes based solely on EM micrographs?

      We can confirm that the filaments are composed of both proteins, because the filaments have a different curvature when both proteins are present. However, as requested by Referee 3, point (7), we removed the quantification of curvature from panel 2C. We report the N number in the text.

      (9) Line 121-123: Are the authors referring to positive or negative membrane curvatures? The cited literature suggests ESCRT-III proteins either lack curvature preferences (e.g., Snf7, CHMP4B) or prefer high positive curvature (e.g., late ESCRT-III subunits). This is confusing since the authors later test recruitment to negatively curved necks.

      We do not claim that Cdv proteins prefer positive or negative curvature, because the necks present in dumbbells have a catenoid geometry, which includes both positive and negative curvature. We have now clarified this in the discussion.
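
      As a brief illustration of why a catenoid neck carries curvature of both signs, the standard result for an ideal catenoid can be written out. This is a sketch that assumes an idealized catenoid of revolution; the neck radius a and the axial coordinate z are symbols introduced here for illustration and do not come from the manuscript.

```latex
% Idealized catenoid of revolution with neck radius a (assumed symbol).
\[
  r(z) = a \cosh\!\left(\frac{z}{a}\right)
\]
% The two principal curvatures are equal in magnitude and opposite in sign
% (which one is called positive depends on the chosen normal orientation):
\[
  \kappa_{1}(z) = \frac{1}{a \cosh^{2}(z/a)}, \qquad
  \kappa_{2}(z) = -\frac{1}{a \cosh^{2}(z/a)}
\]
% Hence the mean curvature vanishes and the Gaussian curvature is negative:
\[
  H = \tfrac{1}{2}\,(\kappa_{1} + \kappa_{2}) = 0, \qquad
  K = \kappa_{1}\kappa_{2} = -\frac{1}{a^{2} \cosh^{4}(z/a)}
\]
```

      At the narrowest point of the neck (z = 0), the two principal curvatures are +1/a and -1/a, so a protein enriched there is exposed to both signs of curvature at once.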

      (10) Since the conclusions rely on the oligomeric state of the proteins, providing SEC-MALS spectra to show the protein oligomeric state right after the purification would strengthen the claims.

      While such SEC-MALS experiments may be interesting, practical implementation is not possible since both first authors have left the lab for new positions.

      (11) Line 157-160: Suppl. Fig. 2 shows only a single EM micrograph of a small filament. Could the authors provide lower magnification images showing more filaments?

      As requested by Referee 3, point (7), we have toned down the importance of these short filaments.

      Also, why are the sample sizes for filament length (N=161) and width (N=129) different?

      Protein filaments formed by Cdv tend to stick to each other side by side, so that for some filaments the width could not be accurately assessed, and accordingly those were removed from the analysis.

      (12) The introduction states that CdvA binds membranes while CdvB does not. However, the results suggest CdvB facilitates membrane binding, helping CdvA attach. This discrepancy needs further explanation.

      We thank the referee for raising this point. We have now performed additional experiments (both SMS assay and flotation assays) showing that indeed CdvB from M. sedula is (unlike CdvB from Sulfolobus) able to bind the membrane on its own (Figure 1C, 1F).

      Reviewer #2 (Recommendations for the authors):

      Best practice would be to show single fluorescence channels in grayscale or inverted grayscale, retaining pseudocolouring only for the merged multichannel image.

      We decided to retain and standardize the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. We believe this improves readability, and this was also a request from Referee 3. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      It would be great to include a quantification of liposome curvature vs focal intensity of the various Cdv components - across figures.

      Quantification of liposome curvature at the neck can be done (De Franceschi et al., Nature Nanotech. 2024). However, in practice, this requires transferring of the sample post-preparation into a new chamber in order to increase the signal-to-noise ratio of the encapsulated dye, a procedure that drastically reduces the yield of dumbbells. The very sizeable amount of work required to obtain reliable measurements, especially considering all the proteins and protein combinations used in this study, indicates that this represents a project in itself, which goes well beyond the scope of this manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) We would encourage the authors to consider including the length of the scale bar next to the scale bar in each image and not in the figure description. This would greatly aid in clarity and interpretation of figures.

      We have now written the length of the scale bar in the figures.

      (2) In a similar vein, could the authors consider labeling panels throughout the manuscript, indicating which sample is being presented? This applies mainly to the negative stain and the dumbbell fluorescence images, as having to continuously consult the figure legend hinders clarity.

      We have now labelled the EM images as requested by the referee.

      (3) Lines 254-256: would the statement hold not only for CdvB2∆C, but for all imaged proteins? They all seem to localize to membrane necks, presumably favoring membrane binding to a specific membrane topology.

      We agree with the referee, and changed the phrasing accordingly.

      (4) CdvB2∆C construct - presumably this was a truncation of helix 5 of the ESCRT-III domain? Figure 1A shows that the ESCRT-III domain spans residues 34-170 and therefore implies that all five ESCRT-III helices (which make up the ESCRT-III domain) are present in the C-terminal truncation. Could the authors clarify?

      Indeed, the truncation was done at residue 170.

      (5) Results of the liposome flotation assays are presented inconsistently across the three figures (Figs 1C, 2DFH, and 3A). This makes it more difficult than it needs to be to interpret and compare results. Could the authors consider presenting the three gels in a more similar, standardized way across the three figures?

      To improve readability, we now standardized the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      (6) From the data presented in Fig 1EF, it cannot be concluded whether CdvB and CdvA colocalize, as only one protein is labelled. Is there a technical reason for this?

      We have now repeated the same experiment by having both proteins labelled, confirming that there is co-localization at the neck (Figure 1G).

      (7) Fig 2C: is the difference between the two samples significant?

      As requested by Referee 3, we have removed Figure 2C.

      (8) Fig 2I is missing a 'merged' panel.

      We have now added the merged panel.

      (9) The fluorescence intensity plots in Supp Figs 1C and 3C would be easier to interpret if the lipid and protein signal would be plotted on the same plot (say, with normalized fluorescence intensity)

      It is not immediately obvious to us what the signal should be normalized to. What we wished to convey with these plots was that the intensity of proteins spikes at the neck region. In an attempt to improve clarity, we have now aligned the plots vertically, and highlighted the position of the neck.

      (10) CdvA should have a capital "A" in Figure 3A, panel 3.

      We have now corrected this.

      (11) The discussion doesn't comment on the need to truncate CdvB2.

      This is explained in the Results section.

    1. eLife Assessment

      This study systematically characterizes the activity patterns of a lateral supramammillary nucleus (SuM)-medial septum (MS)-hippocampus circuit across sleep-wake cycles and its role in memory consolidation. This work is fundamental because it identifies a previously unrecognized brain hub that helps coordinate how different types of memory are supported during a specific sleep state, advancing our understanding of how sleep contributes to memory organization. The work is well-designed, and the data are solid, supporting clear and significant conclusions; however, some mechanistic details and causal relationships would benefit from further clarification or additional experiments. The paper provides new insights into how distinct memory modalities could be processed by parallel, sleep-active subcortical-hippocampal circuits, which would be of general interest to a broad neuroscience audience.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors aim to define how rapid eye movement sleep supports memory consolidation by identifying the brain circuits that are selectively engaged during this sleep state. They focus on a pathway linking a hypothalamic region involved in sleep regulation to the medial septum and onward to a hippocampal subregion that is critical for social memory. By combining recordings of neural activity with sleep-state-specific circuit manipulations, the study seeks to explain how information is routed during sleep to support distinct types of memory.

      A major strength of the work is the use of state-of-the-art circuit-based approaches to link sleep dynamics to defined long-range connections and behavioral outcomes. The authors show that neurons in the lateral supramammillary region projecting to the medial septum are selectively active during rapid eye movement sleep, and that silencing this pathway during this sleep state disrupts consolidation of both social and contextual fear memories. Further dissection of downstream circuitry reveals that inhibition of the medial septum-to-hippocampal CA2 pathway during rapid eye movement sleep selectively impairs social memory. These results provide support for functional specialization within parallel pathways and suggest that this circuit acts as a hub for routing memory-related information during sleep.

      While the evidence supporting a role for this circuit in sleep-dependent memory consolidation is compelling, several important mechanistic details remain unresolved. The chemical signaling used by the neurons connecting the hypothalamus to the medial septum is not clearly defined, leaving open whether these cells release excitatory signals, inhibitory signals, or a combination of both. In addition, the medial septum contains multiple neuronal populations with distinct downstream targets, and the specific cell types receiving input from this pathway are not clearly identified. Similarly, the nature of the signals delivered from the medial septum to the hippocampus remains unclear, making it difficult to link circuit anatomy to the observed behavioral specificity. Finally, because different circuit segments are manipulated independently, the causal relationship between upstream and downstream pathways remains suggestive rather than definitive and should be discussed explicitly as a limitation or addressed experimentally.

      Overall, the authors largely achieve their aims by identifying a rapid eye movement sleep-specialized circuit that contributes to memory consolidation in a modality-specific manner. The findings are likely to have a meaningful impact on the field by advancing understanding of how sleep organizes memory through parallel neural pathways and by providing a useful framework for future studies of sleep-dependent brain state regulation. With additional clarification of circuit mechanisms or a clearer discussion of current limitations, the study would offer even greater value to the neuroscience community.

    3. Reviewer #2 (Public review):

      Summary:

      This study systematically characterizes the activity patterns of a lateral supramammillary nucleus (SuM)-medial septum (MS)-hippocampus circuit across sleep-wake cycles and its role in memory consolidation. The authors demonstrate that the lateral SuM-MS projection is specifically active during REM sleep, and that REM-selective inhibition of this circuit, and of its downstream MS-CA2 pathway, impairs the consolidation of social memory. The work is well-designed, and the data are robust in supporting clear and significant conclusions. It provides important new insights into how distinct memory modalities could be processed by parallel, sleep-active subcortical-hippocampal circuits. The manuscript is of high quality overall, with some points to address as detailed below.

      Strengths:

      (1) Novel finding: The identification of a REM-specialized subpopulation within the lateral SuM-MS pathway and its specific role in social memory consolidation via the lateral SuM-MS-CA2 projection is a significant advance. It effectively complements the previously described direct SuM-CA2 pathway and supports a model of the SuM as a "REM-hub" routing information through dedicated downstream targets.

      (2) Technical rigor: The combination of retrograde tracing, in vivo calcium imaging, single-unit identification via optrode recording, and temporally precise (REM-sleep-specific) optogenetic manipulation provides strong correlative and causal evidence.

      (3) Appropriate controls: Behavioral experiments include crucial controls for optogenetic inhibition (GtACR1 group, NREM/Wake inhibition control, mCherry control), effectively ruling out nonspecific effects of light or timing.

      Weaknesses:

      (1) Figure titles/descriptions: For clarity, the authors should consider specifying the recording method in the figure titles or legends. For instance, Figure 2: "Bulk Ca2+ activity (fiber photometry) of lateral SuM-MS projecting neurons..." and Figure 3: "Single-unit activity patterns (optrode recordings) of lateral SuM-MS projecting neurons...".

      (2) Statistical details: The use of "LSD post-hoc comparison" following ANOVA is noted. LSD is sensitive but can increase Type I error risk with multiple comparisons. Please justify its use or consider employing a more conservative post-hoc test (e.g., Tukey's or Bonferroni) for key comparisons like the social preference index in Figure 4h to bolster robustness.

      (3) Data presentation: When reporting statistical results in figure legends (e.g., Figures 2d, 3i-k), please provide the specific test statistic values (e.g., F, χ²) and exact P values where possible, rather than only significance asterisks.

      (4) Deepening mechanistic insight: The study excellently demonstrates "what" the circuit does. The discussion could be strengthened by further exploring "how" it might work. The finding that SuM-MS inhibition does not affect CA1 theta power is interesting and distinguishes it from other MS/hippocampal pathways. The suggestion of a theta-independent mechanism is plausible. Could the authors hypothesize more specifically? For example, might this circuit modulate reactivation events in the local CA2 network, neurochemical milieu (e.g., acetylcholine), or synapse-specific plasticity during REM sleep to facilitate social memory consolidation?

      (5) Implications of regional heterogeneity: The functional divergence between lateral (90% REM-active) and medial SuM-MS neurons is intriguing. A brief discussion on the potential anatomical basis (differential inputs/outputs) and functional significance (e.g., integration of specific affective or arousal signals) of this subdivision would be valuable.
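
      The concern in point (2) about uncorrected post-hoc comparisons can be illustrated with a small calculation. The sketch below is not the authors' analysis: the p-values are hypothetical placeholders, the function names are invented for illustration, and the family-wise error formula assumes independent tests, which is only an approximation for pairwise comparisons after ANOVA.

```python
from typing import List


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests,
    each run at significance level alpha without any correction."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_adjust(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply each by the number of
    comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Three pairwise comparisons, as with three experimental groups.
    print(f"Uncorrected family-wise error rate: {familywise_error_rate(0.05, 3):.4f}")

    # Hypothetical unadjusted p-values, for illustration only.
    raw_p = [0.01, 0.04, 0.5]
    for raw, adj in zip(raw_p, bonferroni_adjust(raw_p)):
        print(f"raw p = {raw:.3f} -> Bonferroni-adjusted p = {adj:.3f}")
```

      In this example, a comparison with an unadjusted p of 0.04 would pass an uncorrected threshold of 0.05 but not a Bonferroni-corrected one, which is the kind of borderline result where the choice of post-hoc test matters.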

    1. eLife Assessment

      This paper describes an important advance in a 2D in vitro neural culture system to generate mature, functional, diverse, and geometrically consistent cultures, in a 384-well format with defined dimensions and the absence of the necrotic core, which persists for up to 300 days. The well-based format and conserved geometry make it a promising tool for arrayed screening studies. The evidence is compelling and provides a method for generating consistent 3D cortical layer-like organization.

    2. Reviewer #3 (Public review):

      Summary:

      Kroeg et al. introduced a novel method for generating 3D cortical layer-like organization in hiPSC-derived models, achieving remarkably consistent topography within compact dimensions. Their approach involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells into 384-well plates, which triggers the spontaneous assembly of adherent cortical organoids comprising diverse neuronal subtypes, astrocytes, and oligodendrocyte-lineage cells.

      Strengths:

      Compared with existing brain organoid models, these adherent cortical organoids exhibit enhanced reproducibility and improved cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and investigation of brain disorder pathophysiology. Overall, this study addresses an important and timely need for advancing current brain organoid systems.

      Weaknesses:

      Highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial to emphasize the broad future potential of this new organoid system for large-scale pharmacological screening. The authors provided a substantial amount of new data during the revision process to support the reproducibility of neuronal activity. The next step would be to leverage this platform for functional screening of chemical and genetic perturbations to identify new drug candidates.

      Comments on revisions:

      Most of my previous concerns were adequately addressed through the revision.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kroeg et al. describe a novel method for 2D culture of human induced pluripotent stem cells (hiPSCs) to form cortical tissue in a multiwell format. The method claims to offer a significant advancement over existing developmental models. Their approach allows them to generate cultures with precise, reproducible dimensions and a single-rosette structure; consistent geometry; multiple neuronal and glial cell types (cellular diversity); and no necrotic core (often seen in free-floating models due to limited nutrient and oxygen diffusion). The researchers demonstrate the method's capacity for long-term culture, exceeding ten months, and show the formation of mature dendritic spines and considerable neuronal activity. The method aims to tackle multiple key problems of in vitro neural cultures: reproducibility, diversity, topological consistency, and electrophysiological activity. The authors suggest its potential in high-throughput screening and neurotoxicological studies.

      Strengths:

      The main advances in the paper seem to be: The culture developed by the authors appears to have optimal conditions for neural differentiation, lineage diversification, and long-term culture beyond 300 days. These seem to me a major strength of the paper and an important contribution to the field. The authors present solid evidence of the high cell type diversity present in their cultures. It is a major point and therefore it could be better compared to the state of the art. I commend the authors for using three different iPS lines; this is a very important part of their proof. The staining and imaging in the manuscript are of excellent quality.

      We thank the reviewer for the positive comments on the potential of our novel platform to address key problems of in vitro neural culture, highlighting the longevity and reproducibility of the method across multiple cell lines.

      Weaknesses:

      (1) The title is misleading: The presented cultures appear not to be organoids, but 2D neural cultures, with an insufficiently described intermediate EB stage. For nomenclature, see: doi: 10.1038/s41586-022-05219-6. Should the tissue develop considerable 3D depth, it would suffer from the same limited nutrient supply as 3D models - as the authors point out in their introduction.

      We appreciate the opportunity to clarify this point. We respectfully disagree that the cultures do not meet the consensus definition of an organoid. In fact, a direct quote from the seminal nomenclature paper referenced by the reviewer states: “We define organoids as in vitro-generated cellular systems that emerge by self-organization, include multiple cell types, and exhibit some cytoarchitectural and functional features reminiscent of an organ or organ region. Organoids can be generated as 3D cultures or by a combination of 3D and 2D approaches (also known as 2.5D) that can develop and mature over long periods of time (months to years).” (Pasca et al, 2022 doi: 10.1038/s41586-022-05219-6). Therefore, while many organoid types indeed have a more spherical or globular 3D shape, the term organoid also applies to semi-3D or nonglobular adherent organoids, such as renal (Czerniecki et al 2018, doi.org/10.1016/j.stem.2018.04.022) and gastrointestinal organoids (Kakni et al 2022, doi.org/10.1016/j.tibtech.2022.01.006). Accordingly, the adherent cortical organoids described in the manuscript exhibit self-organization to single radial structures consisting of multiple cell layers in the z-axis, reaching ~200 µm thickness (therefore remaining within the limits for sufficient nutrient supply), with consistent cytoarchitectural topology and electrophysiological activity, and therefore meet the consensus definition of an organoid.

      (2) The method therefore should be compared to state-of-the-art (well-based or not) 2D cultures, which seems to be somewhat overlooked in the paper, therefore making it hard to assess what the advance is that is presented by this work.

      It was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods. Compared to state-of-the-art 2D neural network cultures, adherent cortical organoids provide distinct advantages in:

      (1) Higher order self-organized structure formation, including segregation of deeper and upper cortical layers.

      (2) Longevity: adherent cortical organoids can be successfully kept in culture for at least 1 year, whereas 2D cultures typically deteriorate after 8-12 weeks.

      (3) Maturity, including the formation of dendritic mushroom spines and robust electrophysiological activity.

      (4) Cell type diversity including a more physiological ratio of inhibitory and excitatory neurons (10% GAD67+/NeuN+ neurons in adherent cortical organoids, vs 1% in 2D neural networks), and the emergence of oligodendrocyte lineage cells.

      On the other hand, limitations of adherent cortical organoids compared to 2D neural network cultures include:

      (1) Culture times for organoids are much longer than for 2D cultures and the method can therefore be more laborious and more expensive.

      (2) Whole cell patch clamping is not easily feasible in adherent cortical organoids because of the restrictive geometry of 384-well plates.

      (3) Reproducibility is prominently claimed throughout the manuscript. However, it is challenging to assess this claim based on the data presented, which mostly contain single frames of unquantified, high-resolution images. There are almost no systematic quantifications presented. The ones present (Figure S1D, Figure 4) show very large variability. However, the authors show sets of images across wells (Figure S1B, Figure S3) which hint that in some important aspects, the culture seems reproducible and robust.

      We made considerable efforts to establish quantitative metrics to assess reproducibility. We applied a quantitative scoring system of single radial structures at different time points for multiple batches of all three lines as indicated in Figure S1C. This figure represents a comprehensive dataset in which each dot represents the average of a different batch of organoids containing 10-40 organoids per batch. To emphasize this, we have adapted the graph to better reflect the breadth of the dataset. Additional quantifications are given in Figure S2 for progenitor and layer markers for Line 1 and in Figure 2 for interneurons across all three lines, showing relatively low variability. That being said, we acknowledge the reviewer’s concerns and have modified the text to reduce the emphasis of this point, pending more extensive data addressing reproducibility across an even broader range of parameters.

      (4) What is in the middle? All images show markers in cells present around the center. The center however seems to be a dense lump of cells based on DAPI staining. What is the identity of these cells? Do these cells persist throughout the protocol? Do they divide? Until when? Addressing this prominent cell population is currently lacking.

      A more comprehensive characterization of the cells in the center remains a significant challenge due to the high cell density hindering antibody penetration. However, dye-based staining methods such as DAPI and the LIVE/DEAD panel confirm a predominance of intact nuclei with very minimal cell death. The limited available data suggest that a substantial proportion of the cells in the center are proliferative neural progenitors, indicated by immunolabeling for SOX2 (Figure 2A,D; Figure S4C). Furthermore, we are currently optimizing the conditions to perform single cell / nuclear RNA sequencing to further characterize the cellular composition of the organoids.

      (5) This manuscript proposes a new method of 2D neural culture. However, the description and representation of the method are currently insufficient. (a) The results section would benefit from a clear and concise, but step-by-step overview of the protocol. The current description refers to an earlier paper and appears to skip over some key steps. This section would benefit from being completely rewritten. This is not a replacement for a clear methods section, but a section that allows readers to clearly interpret results presented later.

      We have revised the manuscript to include a more detailed step-by-step overview of the protocol.

      (b) Along the same lines, the graphical abstract should be much more detailed. It should contain the time frames and the media used at the different stages of the protocol, seeding numbers, etc.

      As suggested, we have adapted the graphical abstract to include more detail.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, van der Kroeg et al have developed a method for creating 3D cortical organoids using iPSC-derived neural progenitor cells in 384-well plates, thus scaling down the neural organoids to adherent culture and a smaller format that is amenable to high throughput cultivation. These adherent cortical organoids, measuring 3 x 3 x 0.2 mm, self-organize over eight weeks and include multiple neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.

      Strengths:

      (1) The organoids can be cultured for up to 10 months, exhibiting mature dendritic spines, axonal myelination, and robust neuronal activity.

      (2) Unlike free-floating organoids, these do not develop necrotic cores, making them ideal for high-throughput drug discovery, neurotoxicological screening, and brain disorder studies.

      (3) The method addresses the technical challenge of achieving higher-order neural complexity with reduced heterogeneity and the issue of necrosis in larger organoids. The method presents a technical advance in organoid culture.

      (4) The method has been demonstrated with multiple cell lines which is a strength.

      (5) The manuscript provides high-quality immunostaining for multiple markers.

      We appreciate the reviewer’s acknowledgement of the strengths of this novel platform as a technical advance in organoid culture that reduces heterogeneity and shows potential for higher throughput experiments.

      Weaknesses:

      (1) Direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, ie what can be done with the new method that cannot be done with standard culture and vice versa, ie what are the aspects in which new method could be inferior to the standard.

      In our opinion, it would be extremely difficult to directly compare methods. Most notably, whole brain organoids grow to large and irregular globular shapes, while adherent cortical organoids have a more standardized shape confined by the geometry of a 384-well plate. Moreover, it was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods, as addressed in response to comment 2 of Reviewer 1 above.

      (2) It would be important to further benchmark the throughput, ie what is the success rate in filling and successfully growing the organoids in the entire 384 well plate?

      Figure S1 shows the success rate of organoid formation and stability of the organoid structures over time. In addition, we have added the number of wells that were filled per plate.

      (3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs.

      Figure S1 provides the relationship between proliferation rate and seeding density, allowing estimation of seeding densities based on the proliferation rate of the NPCs. However, we appreciate the reviewer's feedback and have modified the methods to provide more detail.

      Reviewer #3 (Public review):

      Summary:

      Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.

      Strengths:

      Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems.

      We thank the reviewer for highlighting the strengths of our novel platform. We appreciate that all three reviewers agree that the adherent cortical organoids presented in this manuscript reliably demonstrate increased reproducibility and longevity. They also commend its potential for higher throughput drug discovery and neurotoxicological/phenotype screening purposes.

      Weaknesses:

      While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Mainly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.

      We appreciate the feedback and have added more detail on consistency and standardization of functional outputs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points

      (1) As the preprint is officially part of the eLife review, I have to remark that the preprint made available on bioRxiv suffers from a serious compatibility or format problem: one cannot highlight sentences as in a regular PDF, and when trying to copy-paste sentences from it, jumbled characters are copied to the clipboard.

      The updated version of the paper on bioRxiv should not suffer from these compatibility issues.

      (2) Since the paper is presenting a new method it should briefly describe how each step, including the hiPSC culture was done, the reference to an earlier publication in this case is not sufficient, and this practice is generally best to avoid for methods papers.

      Each step in the culturing process has now been described in the methods.

      (3) The EB stage is insufficiently described. The "2D - 3D - 2D" transitions should be clearly explained.

      The methods section has been rewritten and expanded to include these processes in more detail.

      (4) Is there one FACS sorting in the protocol, or multiple (additional at IPS culture)? What markers each? What is the motivation for sorting and purifying the neural progenitors? Was the culture impure? What was purity? What cell types are expected after sorting, and what is removed?

      Only one FACS sorting step is performed at the NPC stage. This was added as an improvement to our original neural network protocol (Günhanlar et al 2018) to ensure consistency over different hiPSC source cell lines that can yield variable amounts of frontal cortical patterned NPCs. Positive sorting for the neural lineage markers CD184 and CD24, and negative sorting for the mesenchymal/neural crest marker CD271 and the glial progenitor marker CD44, according to Yuan et al 2011, ensures frontal-patterned cortical NPCs, as confirmed for all batches by immunohistochemistry for SOX2, Nestin and FOXG1. We have added new text to the Methods section to clarify this more explicitly.

      (5) Seeding protocol and parameters are insufficiently described, and from what I read they are poorly defined: "Specifically, the optimal seeding density was determined by visual inspection of the organoids between 28 to 42 days after seeding a range of cell densities in the 384-well plate wells." For a new method, precise, actionable instructions are needed. I may have overlooked those elsewhere, in this case, please clarify these sections.

      The Methods section was rewritten and expanded to describe the methodology in greater detail with more actionable instructions.

      (6) The timeline in Figure 1 is not clearly delineated; I found it hard to understand which figure corresponds to which stage (e.g. FACS sorting is not mentioned in the first part of the results but it is part of Figure 1A, neural rosette formation can happen both before and after FACS sorting, simply referring to rosettes is not clear). Later parts of the manuscript clearly introduce the terms sorting and seeding in the context of this method, and how ages (days) refer to these time points.

      Figure 1 was adapted to clarify the generation of Neural Progenitor Cells (NPCs) and subsequent seeding of NPCs to generate Adherent Cortical Organoids (ACOs).

      (7) The authors define: "cortical organoid defined as a single radial structure." This is not a commonly used definition of organoids; for nomenclature, please see: doi: 10.1038/s41586-022-05219-6 (Pasca et al 2022).

      To clarify, the statement is not meant to reflect a definition of organoids in general, but rather the scoring of proper structure formation for Figure S1C. For discussion on nomenclature, see our response to point 1 of Reviewer 1 in the public review. We changed the wording to be more accurate.

      (8) In Figure S1d, the authors write: "the fraction of structurally intact cultures decreased to 50%", but looking at that graph there seems to be no notable decrease, only huge variability. The authors should quantify claims of decrease by linear regression and an R-squared value. Variation within and across cell lines seems to be large. Also, it is unclear if dots correspond to the same wells/plates; in other words: is this a longitudinal experiment? What is the overall success rate? How is success determined? Are there clear criteria?

      We agree with the reviewer that the claim on fraction of intact cultures decreasing over time to 50% is an overinterpretation due to the large variability. We changed the wording in the manuscript to: While some later batches show moderately reduced success rates compared with the earliest batches, properly formed single-structure organoids were still obtained at 40–90% success across all examined time points (Figure S1C), indicating that long-term culture is feasible albeit with variable efficiency. The data are not longitudinal as each dot represents an endpoint of a different batch of organoids, totaling 18 independent batches across the three lines. We have clarified this in the figure legend. Success was defined at the well level as the presence of a single, continuous radial structure occupying the well, without obvious fragmentation or fusion events, as assessed by LIVE/DEAD staining that also confirmed viability. Wells were scored as successful only when the radial structure showed predominantly live signal with no large necrotic areas. Wells containing multiple radial structures, fused aggregates, or predominantly dead tissue were scored as unsuccessful.

      (9) Figure s1c: the numbering to this panel should be swapped, because it is referenced after other panels in the text. The reference is confusing: "Plotting the interaction between proliferation and the amount of NPCs required to be seeded for the successful generation of adherent cortical organoids" - success is not present in this graph at all? How is that measured?

      Figures S1C and S1D have been adapted to clarify the measure of ‘successful organoid formation’.

      (a) The description of this plot is confusing: "The doubling time of the NPCs explains more than half the variation (r2 = 0.67) of the required seeding density." What else is there? I thought that this was the formula the authors suggested to determine seeding density, but it seems not. Or is "manual inspection" the determinant, and that seems to correlate with this metric?

      Even though the rate of proliferation, measured as doubling time, is the main determinant of the seeding density, it is not the only determinant of the seeding density. For instance, intrinsic differences in differentiation potential could also play a role. Therefore, NPC lines with similar doubling times might still have slightly different optimal seeding densities. We have added clarification of this conclusion to the Results section.

      (b) Seeding density is a key parameter in many in vitro differentiation and culture protocols. This importance however does not mean that this density is attributable to differences in cell proliferation rate. Alternatively, the amount of cells determines the amount of secreted molecules and cell-to-cell contacts.

      Here, when we refer to the cell density, we specifically refer to the cell density needed to generate the ACO. We show that the most important contributor to the variation in ACO formation is the proliferation, measured here as the doubling time. We agree that there are other factors involved such as the secreted molecules, cell-to-cell contacts as well as the ability of a given NPC line to differentiate into a post-mitotic cell.

      (c) Is it mentioned which cell line this experiment corresponds to?

      The data in Figure S1D is from the 3 reported cell lines, as well as 2 clones from a fourth IPS cell line. This is detailed in the Methods section of the proliferation assay.

      (d) Without a more detailed explanation, seeding density and doubling time could be independent variables.

      These two variables are highly correlated as shown in Figure S1D, but it is true that there can be other variables that account for the observed variance, as discussed above in Point 9b.

      (e) In this figure the success rate is not visible at all, so I have no idea how the authors arrive at a conclusion about success rate.

      We have adapted the figure legend to reflect which cell lines the dots in Fig. S1D represent. NPC lines can have substantial variation in proliferation rates. The figure reflects data of NPCs of 5 clones of 4 different hiPSC lines (as indicated in the Methods) with different proliferation rates. The ACO success rate (operationally defined in the same way as for the data shown in Fig. S1C) was also included.
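      The r2 value discussed under point 9a can be illustrated with a short calculation. The following is a minimal sketch, not the authors' analysis: the function name and the example numbers are hypothetical, chosen only to show how the squared Pearson correlation measures the share of variance in seeding density that is accounted for by doubling time, and how much is left for other factors.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy ** 2) / (sxx * syy)


# Hypothetical values for illustration only (not data from the manuscript).
doubling_time_h = [20.0, 24.0, 28.0, 32.0, 36.0]
seeding_density = [2000, 3500, 3000, 5000, 4500]  # cells per well

r2 = r_squared(doubling_time_h, seeding_density)
unexplained = 1.0 - r2  # share of variance left to other factors
print(f"r2 = {r2:.2f}, unexplained fraction = {unexplained:.2f}")
```

      With the r2 = 0.67 quoted in point 9a, roughly one third of the variance in required seeding density would remain attributable to factors other than doubling time, which is consistent with the response that proliferation is the main but not the only determinant.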

      (10) Figure 2: Clean spatial segregation seems to be a strength of the system and therefore I would recommend putting more of the relevant microscopy images to the main figure, which are now currently in Figure S4.

      We have adapted Figure 2 accordingly, and included additional representative cortical layering images in Figure S4.

      (11) The variability in interneuron content seems to be significant, as currently presented in the figure. However, this may be due to a spatial organization. I would first quantify in consecutive rings around the centers whether interneurons have a tendency to be enriched towards the center or the edge of the culture. Maybe this explains the variability that is currently present in Figure S5B.

      We agree that spatial organization of interneurons could, in principle, contribute to variability. In our analysis, however, images were acquired from positions selected by a random sampling grid across the entire culture, rather than from specific central or peripheral regions. Each field contained on average 130.6 ± 16.1 NeuN+ nuclei, which provided a relatively large sampling volume per position. If interneurons were strongly enriched at the center or edge, we would expect systematic differences in interneuron fraction between fields assigned to central versus peripheral grid positions. We did not observe such a pattern in our dataset, suggesting that spatial organization is not the main driver of the observed variability.

      (12) Because in previous figures it seems like there is considerable variability across individual cultures and images here are coming from separate cultures, please use different shapes of the points coming from different cultures/wells, to see if maybe there is a culture-to-culture difference that explains the variability present in the figure.

      We have added different symbols per organoid for the interneuron quantifications and moved this quantification to main Figure 2.

      (13) I believe it is currently the standard error of the mean which is displayed in the figure, which is not an appropriate representation for variability, or the reproducibility across individual data points. SEM quantifies the reproducibility of the mean, not the reproducibility of the individual data points, which matters here. Mean refers to the mean of this quantification experiment and therefore it's not a biological entity. A box plot showing the interquartile range besides the individual data points would be an accurate representation of the spread of the data.

      We agree and have adapted the data, now in Figure 5, accordingly.

      (14) Again, in general, the main figures should contain much more of the quantification, as opposed to just raw images.

      Quantifications have been added in Figure 2 for the GAD67/NeuN for all cell lines as well as a time course quantification of GAD67/NeuN for 1 of the cell lines. In Figure 4, we have added excitatory and inhibitory synaptic quantifications.

      (15) Figure 2F-I the location of the center of the rosette should be marked with a star so that the conclusion about the direction of processes can be established.

      The suggested addition of a marker at the center of each rosette was evaluated but not implemented, because it reduced rather than improved figure clarity.

      (16) Figure 3 b and c:

      High magnification images of single cells cannot show changes in cell type morphology, and one cannot conclude that these cells are present in significant numbers across time. Zoomed-out images or quantification would be necessary for such a claim. The authors already have such images, as presented in the next panels, so quantification should be possible without new experiments. I am uncertain about the T3 supplement here - do these images correspond to the same conditions?

      (a) It is unclear to me why different markers are used in the different panels, namely why NG2 is not used in any of the other images.

      NG2 was used at early developmental time points to show the presence of Oligodendrocyte Precursor Cells (OPCs). At later time points, the focus switched to MBP staining to indicate more mature oligodendrocyte lineage cells. Although NG2 and MBP are not in the same panels, the staining was performed for both antibodies at the same developmental time point (Day 119) as seen in Figure 3C and 3D.

      (b) Color coding in Figure 3G is ambiguous; the use of two blues should be avoided, and the Sub-sub panels should be individually labeled for the color code.

      We agree, and have now used different colors.

      (c) It is unclear if the presence of the T3 molecule is part of the standard procedure or if it was a side experiment to enhance the survival of oligodendrocytes. Are there no oligodendrocytes without? How does T3 affect other cell types, and the general health and differentiation of the cultures?

      Indeed, T3 is essential for oligodendrocyte formation. We did not observe obvious effects on the general health or differentiation potential of the cultures.

      (d) Is the 2 ng/ml T3 present from day one to the final day?

      Indeed, in the organoids cultured to study oligodendrocyte formation, T3 was added from Day 1. These details have now been clarified in the Methods and Results sections.

      (17) Figure 4:

      (a) Microscopy in this figure is high quality and very convincing about neural maturity.

      (b) The term "cluster" should be avoided. Unclear what it means here, but my best guess is "cells in a frame of view." Cluster is used with a different meaning in electrophysiology.

      This was adapted to ‘neurons in a field of view (FOV)’.

      (c) Panel J: I assume each row corresponds to a single cell? Could this be clarified? Are these selected cells from each frame, or all active cells are represented?

      Indeed, each row corresponds to a single cell, showing all active cells in the frame. This is now clarified in the legend.

      (d) How many Wells do these data correspond to, and in which line it was measured?

      As reported in the legend for Figure 5, these data correspond to 2 wells at Day 61 to which we have now added calcium imaging data from 3 wells from a different batch at Day 100. We have included in the legend that these recordings were from Line 1.

      (e) Panels G to I, again, the use of standard error of the mean is inappropriate and misleading: looking at the error bar one must conclude that there is minimal variation, which is the exact opposite of the conclusions, when one would look at the variability of the raw data points.

      As suggested, the graphs have been adapted as boxplots with interquartile ranges to highlight the distribution of data points.

      (f) It is unclear how many neurons and how many total actively firing neurons are present in the videos analyzed

      All neurons that were active in the field of view and showed at least one calcium event during the ~10 minute recording were included in the analysis. Using this method, we cannot comment on the proportion of neurons that were active from the total amount of neurons present, since the AAV virus we used does not transduce all neurons.

      (g) This figure shows the strength of the method in achieving neural maturity and function. There seems to be considerable activity in the neuronal cultures analyzed. To conclude how reliably the method leads to such mature cultures, one would need to measure at least a dozen wells (even if with some simpler and low-resolution method). Concluding reproducibility from one or two hand-picked examples is not possible.

      We agree with the reviewer that the number of wells used for calcium imaging analysis was limited. We are currently working on more advanced methods to increase the throughput of this analysis. However, we’ve now added another timepoint to the calcium imaging data in Figure 5 from an independent batch of 3 adherent cortical organoids, which demonstrates continued robust activity at Day 100, as well as Day 61.

      Methods:

      (1) Stem cell culture. The authors described that line 3 is grown on MEFs. Is this true for the other two lines? Furthermore, were they cultured in identical conditions?

      Line 2 and 3 were not grown on MEFs. We specifically chose different sources of NPCs to reflect the robust nature of the differentiation protocol. We have recently also adapted the protocol from Line 3 NPCs to confirm that the protocol also works starting from hiPSCs grown in feeder-free conditions in StemFlex medium, by adapting NPC differentiation according to our recent publication in Frontiers in Cellular Neuroscience (Eigenhuis et al 2023).

      (2) "NPCs were differentiated to adherent cortical organoids between passages 3 and 7 after sorting." Please clarify this sentence. I assume it refers to the first FACS sorting of the protocol, but the section is not sufficiently detailed.

      We have adapted the methods to clarify that the FACS purification step occurs at the NPC stage.

      (3) I didn't fully understand: it seems that there are two steps of FACS sorting involved, one after passage 3 and one after week 4. This should be represented in the graphical abstract of Figure 1.

      As outlined above, there is only 1 FACS sorting step at NPC stage. We have adapted this in the Methods and in the graphical abstract.

      (4) Neural differentiation: The authors write that optimal seeding density was determined by visual inspection of the organoids - this is.

      We have clarified the Methods section to better explain the process of optimizing the seeding density for each NPC line to generate the ACOs.

      (5) What does the following sentence mean: "Cells were refreshed every 2-3 days." Does it mean replacement of the complete medium? How much medium was added to the wells?

      This is a very good point that we have now clarified in the Methods, as full replenishment of media is neither feasible, nor desirable. From the total volume of 110 µl per well, 80 µl is taken out and replaced with 85 µl to compensate for evaporation.

      (6) Calcium imaging: can the authors explain the decision to move the cultures one day before imaging into BrainPhys neural differentiation medium? In 3D organoid protocols, BrainPhys is gradually introduced to avoid culture shock (very different composition), and used for multiple months to enhance neural differentiation. For recording electrophysiological activity, artificial CSF is the most common choice.

      Indeed, for whole cell recordings of 2D neural networks as performed in Günhanlar et al 2018, we used gradual transition to aCSF. For the current ACOs, we found that using BrainPhys from the start of organoid differentiation prevents structure formation, probably because of increased speed of maturation disrupting proliferation and organization of radial glia differentiation. However, by changing the media to BrainPhys just one day before recording (reflecting a gradual change as not all medium is fully replenished and easier than switching to aCSF during recording), we saw greatly improved neuronal activity.

      (7) Statistical analysis: As I pointed out before, the standard error of the mean is not an appropriate metric to represent the variability of the data. It is meant to represent the variability of the estimated average. The following thought experiment should make it clear: I measured the expression of a gene in my system. 50 times I measured 0 and 50 times I measured 100. The average is 50, but of course it is a very bad representation of the data because no data points exist with that value. Yet the standard error of the mean would be about plus or minus 5.

      We have revised Figures 5C–5D to boxplots displaying the interquartile range with all individual data points overlaid, which more accurately represents the variability in the dataset.
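      The reviewer's thought experiment in point 7 can be checked numerically. The sketch below is illustrative only: the function name is hypothetical, and the quartiles use the default method of Python's statistics module. It computes the mean, SEM, and interquartile range for 50 measurements of 0 and 50 measurements of 100.

```python
import math
import statistics


def summarize(values):
    """Return mean, SEM, and interquartile range of a sample."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation
    sem = sd / math.sqrt(len(values))      # standard error of the mean
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {"mean": mean, "sem": sem, "iqr": q3 - q1}


# The bimodal data set from the thought experiment.
values = [0] * 50 + [100] * 50
summary = summarize(values)
print(summary)
```

      The SEM comes out near 5 even though every measurement lies 50 units away from the mean, whereas the interquartile range spans the full 0 to 100 spread. This is the reason boxplots with overlaid data points convey the variability more faithfully than SEM error bars.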

      Discussion

      (1) The discussion focuses on human cortical development, however, the methods presented by the authors entail dissociation and replating through multiple stages not part of brain development. I see the approach as more valuable as a possibly reliable method that generates both diverse and mature neural cultures.

      We have revised the Discussion to avoid explicitly invoking an in vitro recapitulation of human cortical development. Nevertheless, given that the NPCs from which the organoids originate exhibit frontal cortical identity, coupled with the timely emergence of cortical neuronal markers and rudimentary cortical layering, we are increasingly confident that the development of these cultures most likely mirrors that of the frontal cortex. To further substantiate this hypothesis, single-cell RNA sequencing experiments will be conducted in the future to provide additional insights.

      (2) One of the major claims of the authors is that the method is very reproducible. However, there is almost no data on reproducibility throughout the paper. Mostly single, high magnification images are presented, which therefore represent a small region of a single well of a single batch of a single cell line. Based on the data presented it is not possible to evaluate the reproducibility of the method.

      We agree that the original version did not sufficiently document reproducibility. To address this, we have refined and expanded our presentation of reproducibility data. The previous success-rate panel (original Figure S1D) has been moved and adapted as the new Figure S1C. In this updated version, each dot still represents the endpoint success rate of an independent batch, but dot size now scales with batch size (10–40 organoids), and the legend specifies the total numbers of organoids analyzed per line (line 1: n=248; line 2: n=70; line 3: n=70). Together with the distribution of success rates between ~40–90% across multiple time points and three iPSC lines, this more detailed representation allows readers to directly assess the robustness of line-to-line and batch-to-batch performance. In addition, new time course quantifications of interneuron proportion (Figure 2G,H), synaptic marker densities (Figure 4H,I), and late-stage calcium imaging (Figure 5C,D,E) further demonstrate that key structural and functional read-outs show overlapping ranges across lines and independent differentiations, reinforcing that the method yields reproducible core phenotypes despite some biological variability.

      (3) The data presented is very promising, and it suggests that the authors derived optimal conditions for neural differentiation and neural culture diversification. I am confident that the authors can show that reproducibility, at least in a practical sense (e.g. in wells that form a culture) is high.

      Overall, this is a very promising and exciting work, that I am looking forward to reading in a mature manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) Direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, ie what can be done with the new method that cannot be done with standard culture and vice versa, ie what are the aspects in which new method could be inferior to the standard.

      We have now more clearly elaborated the differences with other methods. As addressed in our response to point 2 of Reviewer 1 in the public reviews, there are several limitations and advantages to the adherent cortical organoids model listed as follows:

      Advantages of adherent cortical organoids:

      (1) Higher order self-organized structure formation, including segregation of deeper and upper cortical layers.

      (2) Longevity: adherent cortical organoids can be successfully kept in culture for at least 1 year, whereas 2D cultures typically deteriorate after 8-12 weeks.

      (3) Maturity, including the formation of dendritic mushroom spines and robust electrophysiological activity.

      (4) Cell type diversity including a more physiological ratio of inhibitory and excitatory neurons (10% GAD67+/NeuN+ neurons in adherent cortical organoids, vs 1% in 2D neural networks), and the emergence of oligodendrocyte lineage cells.

      On the other hand, limitations of adherent cortical organoids compared to 2D neural network cultures include:

      (1) Culture times for organoids are much longer than for 2D cultures and the method can therefore be more laborious and more expensive.

      (2) Whole cell patch clamping is not easily feasible in adherent cortical organoids because of the restrictive geometry of 384-well plates.

      (2) It would be important to further benchmark the throughput, ie what is the success rate in filling and successfully growing the organoids in the entire 384 well plate?

      We have addressed this question in the current version of Fig. S1C, in which multiple batches of organoids of all three lines were scored for their success rate. The graph reflects the proportion of properly formed organoids of +/- 400 seeded wells scored at different timepoints, in which each timepoint is a different batch. As mentioned in the response to Reviewer 1, we have also added data on the number of organoids seeded per line in the figure legend.

      (3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs.

      As outlined in the response to Reviewer 1, we have clarified the Methods and Discussion sections on seeding density and proliferation rate.

      Reviewer #3 (Recommendations for the authors):

      Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells. Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems. While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Particularly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.

      (1) Considering the emergence of astrocyte markers (GFAP, S100b) and upper layer neuron marker (CUX1) around Day 60, the overall differentiation speed is significantly faster compared to other forebrain organoid protocols. Are these accelerated sequences of neurodevelopment consistent across different hiPSC lines?

      As shown in Fig. S5, astrocytes are present around Day 60 for all three lines. For comparison with other organoid protocols, an important consideration is that the timeline for these organoids starts at NPC plating, while for other protocols timing often starts from the hiPSC stage. We have clarified the timeline in the graphical abstract in Figure 1A and in the Methods.

      (2) The calcium imaging results in Figure 4G were recorded at a single time point, Day 61, a relatively early time window compared to other forebrain organoid protocols (more than 100 days, PMID: 31257131; PMID: 36120104). Are the neurons in adherent cortical organoids functionally mature enough around Day 61? How consistent is this functional activity across different cell lines and independent differentiation batches?

      As discussed above in Point 1, it is important to consider that the specified timeline starts from NPC plating. In analogy to 2D neural networks, robust neuronal activity can be observed after ~8 weeks in culture. In addition, we have now added calcium imaging data for an additional batch of organoids at Day 100 in Figure 5, which exhibit levels of neuronal activity comparable to those observed on Day 61.

      (3) Along the same line, various cell types, such as oligodendrocytes and astrocytes, are believed to influence neuronal maturation. Therefore, longitudinal studies until the late stage are necessary to observe changes in electrophysiological activity based on the degree of neuronal maturation (at least two more later time points, such as 100 days and 150 days).

      As described in the previous points, we have now included a Day 100 time point in the calcium imaging data, in addition to the recordings at Day 61 (Figure 5C-E).

      (4) The authors assert that heterogeneity among organoids has been diminished using the human adherent cortical organoids protocol. However, there is inadequate quantitative data to prove the consistency of neuronal activities between different wells. Therefore, experiments quantifying the degree of heterogeneity between organoids, such as through methods like calcium imaging, are necessary to determine if neuron activity occurs consistently across each organoid well.

      We agree with the reviewer and have added several quantitative experiments: a) we’ve added another timepoint to the calcium imaging data in Figure 5 from an independent batch of 3 adherent cortical organoids, which demonstrates continued robust activity at Day 100, as well as Day 61; b) we added synapse quantification in Figure 4; and c) we added interneuron quantification in Figure 2. We are currently also pursuing high throughput measures of activity to assess the longitudinal activity of ACOs in a larger number of wells. This way we can more definitively quantify the time-dependent variance in organoid activity.

      (5) Is this platform applicable to other functional measurements for neuronal activity, such as the MEA system? When observing the morphology of neurons formed in organoids, they appear to extend axons and dendrites in a consistent direction, suggesting a radial structure that demonstrates high reproducibility across wells. A culture system where neurons are arranged with such consistency in directionality could be highly beneficial for experiments utilizing the MEA system to assess parameters such as the speed of electrical activity transmission and stimulus-response. Therefore, there seems to be a need for a more detailed explanation of the utility of the structural characteristics of the culture system.

      The ACO platform is indeed suitable for MEA recordings. We are in the process of engineering the required geometry using HD-MEA systems through specialized inserts to generate ACOs on MEA systems.

      (6) In Figure 2E-I, the authors suggest morphological diversity of GFAP+/S100b+ astrocytes, but the imaging data presented in Figure 2F-I are based only on GFAP immunoreactivity.

      Since GFAP is also expressed in radial glial cells at this stage (Figure 2I), many fibrous astrocytes and interlaminar astrocytes are likely radial glial neural progenitor cells instead of astrocytes. It appears necessary to perform additional staining using astrocyte markers such as S100B or outer radial glia markers such as HOPX to demonstrate that the figure depicts subtype-specific morphologies of astrocytes.

      In Figure 2M, we stained for GFAP and PAX6 to mark radial glia that look different than the astrocyte morphologies we describe in Figure 2J-L. We see a large overlap in GFAP and S100B staining in Figure 2I, in which most GFAP+ cells are double positive for S100B (yellow) that is more consistent with astrocyte maturation than radial glia. Furthermore, we have not seen PAX6 staining outside the dense edges of the center of the ACO.

      (7) In Figure 4D, the axon appears to exhibit directionality. Additional explanation regarding the organization of the axon is necessary. Further research utilizing sparse staining to examine the morphology of single neurons seems warranted.

      The polarized directionality of the axons is something we indeed have also noticed. We are looking into options to further investigate this intriguing property of the ACOs.

      (8) Figure 1E-F only showed cell viability in the early stages around Day 40-50. To demonstrate the superior long-term viability of ACO culture, it appears necessary to illustrate the ratio of dead cells to live cells over the course of a time course.

      Figure S1B shows LIVE/DEAD staining for ACOs of all three lines, revealing minimal DEAD staining at Day 56. A longitudinal time course experiment was not performed; however, the line- and batch-specific quantifications over developmental timepoints in Figure S1C provide an indication of the robust long-term viability of the ACOs.

    1. eLife Assessment

      This study presents a valuable finding on the neural representation of time from two distinct egocentric and allocentric reference frames. The evidence is solid and largely supports the hypothesis, with one caveat that the task differences could impact the observed effects. The work will be of interest to cognitive neuroscientists working on the perception and memory of time.

    2. Reviewer #2 (Public review):

      Summary:

      Xu et al. used fMRI to examine the neural correlates associated with retrieving temporal information from an external compared to internal perspective ('mental time watching' vs. 'mental time travel'). Participants first learned a fictional religious ritual composed of 15 sequential events of varying durations. They were then scanned while they either (1) judged whether a target event happened in the same part of the day as a reference event (external condition); or (2) imagined themselves carrying out the reference event and judged whether the target event occurred in the past or will occur in the future (internal condition). Behavioural data suggested that the perspective manipulation was successful: RT was positively correlated with sequential distance in the external perspective task, while a negative correlation was observed between RT and sequential distance for the internal perspective task. Neurally, the two tasks activated different regions, with the external task associated with greater activity in the supplementary motor area and supramarginal gyrus, and the internal condition with greater activity in default mode network regions. Of particular interest, only a cluster in the posterior parietal cortex demonstrated a significant interaction between perspective and sequential distance, with increased activity in this region for longer sequential distances in the external task but increased activity for shorter sequential distances in the internal task. Only a main effect of sequential distance was observed in the hippocampus head, with activity being positively correlated with sequential distance in both tasks. No regions exhibited a significant interaction between perspective and duration, although there was a main effect of duration in the hippocampus body with greater activity for longer durations, which appeared to be driven by the internal perspective condition. 
On the basis of these findings, the authors suggest that the hippocampus may represent event sequences allocentrically, whereas the posterior parietal cortex may process event sequences egocentrically.

      Strengths:

      The topic of egocentric vs. allocentric processing has been relatively under-investigated with respect to time, having traditionally been studied in the domain of space. As such, the current study is timely and has the potential to be important for our understanding of how time is represented in the brain in the service of memory. The study is well thought out and the behavioural paradigm is, in my opinion, a creative approach to tackling the authors' research question. A particular strength is the implementation of an imagination phase for the participants while learning the fictional religious ritual. This moves the paradigm beyond semantic/schema learning and is probably the best approach besides asking the participants to arduously enact and learn the different events with their exact timings in person. Importantly, the behavioural data point towards successful manipulation of internal vs. external perspective in participants, which is critical for the interpretation of the fMRI data. The use of syllable length as a sanity check for RT analyses as well as neuroimaging analyses is also much appreciated.

      Suggestions:

      The authors have satisfactorily addressed my last remaining suggestion.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this fMRI study, the authors wished to assess neural mechanisms supporting flexible temporal construals. For this, human participants learned a story consisting of fifteen events. During fMRI, events were shown to them, and participants were instructed to consider the event from "an internal" or from "an external" perspective. The authors found distinct patterns of brain activity in the posterior parietal cortex (PPC) and anterior hippocampus for the internal and the external viewpoint. Specifically, activation in the posterior parietal cortex positively correlated with distance during the external-perspective task, but negatively during the internal-perspective task. The anterior hippocampus positively correlated with distance in both perspectives. The authors conclude that allocentric sequences are stored in the hippocampus, whereas egocentric sequences are supported by the parietal cortex.

      We thank the reviewer for the accurate summary of our study.

      Strengths:

      The research topic is fascinating, and very few labs in the world are asking the question of how time is represented in the human brain. Working hypotheses have been recently formulated, and the work tackles them from the perspective of construals theory.

      We appreciate the reviewer's positive and encouraging comments.

      Weaknesses:

      Although the work uses two distinct psychological tasks, the authors do not elaborate on the cognitive operationalization the tasks entail, nor the implication of the task design for the observed neural activation.

      We thank the reviewer for bringing this issue to our attention. In the revised manuscript, we have added a paragraph to the Discussion acknowledging this potential limitation of the study. Please see our response below.

      Reviewer #1 (Recommendations for the authors):

      Overall, I thank the authors for providing clear responses and much-needed detail on their original work, which enables a better understanding of their perspectives. I still have some detailed questions about the reported work, which I provide below. It could help clarify the work for a more general audience and its replicability by the community.

      We thank the reviewer for their positive evaluation of our previous revisions.

      Main general concern:

      I have one remaining core concern, which I distill as being a very different take on the usefulness of task design with neuroimaging. This concern follows from the authors' response to my original comment, which suggested possible confounds in fMRI data analysis and interpretation, as differences in task design and behavioral outcomes were not incorporated in the analytical approach.

      The authors confirmed that "there is a substantial difference between the two tasks" but argue that these differences are not relevant, seeing that "the primary goal of this study was not to directly compare these tasks to isolate a specific cognitive component." However, the authors do perform such contrasts in their analysis (e.g., p. 10: "We first directly contrasted the activity level between external- and internal-perspective tasks in the time window of...") and build inferences on brain activation from them (e.g., p. 10: "Compared with the internal-perspective task, the external-perspective task specifically activated the...").

      To clarify, my original concern was not about comparing neural activity in response to the two tasks but about the brain activity generated by two distinct tasks, which aim to reveal fundamentally distinct neural processes. The authors' response raises several concerns about the theoretical, methodological and empirical foundation of the work that are beyond the scope of a single empirical study and too long to detail here. Cognitive neuroscience relies on tasks to infer neural processes; this is the fertile and essential ground for using behavior in neuroscience to get to a mechanistic understanding of brain functions (e.g., Krakauer et al., 2017). In short, task design is fundamental because it shapes what neural processes are being investigated. Any inferences about brain activity recorded while a participant performs a task result from manipulated variables that should be under the control of the experimenter. Acknowledging that two tasks are distinct is acknowledging that different (neural) processes may govern their resolution. My initial remark was meant to highlight that, from basic signal detection theory, a same/different task and a temporal order task may not yield the same kind of basic biases and decision-making processes; these are far below and more basic than the posited sophisticated representations herein (construals, perspective taking).

      In short, the general approach is far coarser than the level of interpretational granularity being pushed forward in the paper would suggest.

      We greatly appreciate the reviewer’s comments and agree that this is a very fair point. We acknowledge that the two tasks differ in their underlying decision-making processes. In the revised manuscript, we have added a paragraph at the end of the Discussion to explicitly acknowledge this limitation and to outline possible avenues for future research (Page 23).

      “One limitation of the present study is that the external- and internal-perspective tasks differed not only in the type of perspective-taking they were intended to elicit, but also in their underlying decision-making processes. The external-perspective task explicitly required participants to compare two events with respect to external temporal landmarks and judge whether they occurred in the same or different parts of the day (i.e., a same/different judgment), whereas the internal-perspective task explicitly required participants to project themselves into a reference event and judge whether the target event occurred in the future or the past relative to that reference (i.e., a temporal-order judgment). This task design ensured that participants adopted two distinct perspectives on the event series, but at the expense of coherence in the cognitive operations required to make the two types of judgments. One alternative approach would be to more closely align the response demands of the two tasks by drawing on McTaggart’s (1908) A-series and B-series distinction: in the external-perspective task, participants could judge whether the target event occurred before or after the reference event (i.e., a before/after judgment), whereas in the internal-perspective task they could judge whether the target event occurred in the past or future relative to the reference event (i.e., a past/future judgment). Although such a design would improve coherence in the underlying decision-making processes (i.e., both are temporal-order judgments), it would reduce experimental control over the perspective-taking manipulation. For example, before/after judgments could still be made from an internal perspective. Future studies are therefore needed to determine whether findings obtained from these two task designs converge.”

      Additional clarifications:

      Intro/theory

      In this revised MS, the authors provided some clarifications of their theoretical perspective in the introduction. From my standpoint, the motivation remains insufficiently precise for a scientific report. Some theoretical aspects, such as construals or perspective taking remain evasive in relation to ego and allocentric representations. A couple of paragraphs dedicated to explaining what the authors mean precisely when using these terms would greatly help to situate the validity of the working hypothesis. In the absence of clear definitions, it remains difficult to evaluate what is being tested. For instance, what do the authors mean by "time construal"? How is a time construal the same or not as a "temporal distance" or a "temporal sequence"? This would greatly help the readership.

      Additionally, some assertions are not clearly identified or fairly attributed. For instance, the assertion that EST provides a means to spatialize time is the authors' point of view or interpretation of this work, not an original proposition of the theory. Another example is McTaggart's metaphysics on time series (in the ontology of time in physics) "echoed" in linguistics; it has effectively been proposed and popularized by L. Boroditsky. The prospective and retrospective views of time should not be attributed to Tsao et al. but to Hicks or Block in the 70's, who studied the psychology of time in humans.

      We sincerely thank the reviewer for this criticism, which prompted us to clarify the relevant concepts in our manuscript. In the revised version, we made the following three main changes to the Introduction.

      In the second paragraph of the Introduction (page 3), we clarify that event segmentation theory is independent of, but related to, the spatial construal of time hypothesis. We also clarify what we mean by time construals and explain that the two temporal components—duration and sequence—can be represented within such time construals, rather than constituting time construals themselves. These revisions were intended to prevent potential misunderstandings for the reader. In addition, we incorporated Boroditsky’s contributions relevant to this framework:

      “One solution, which might be unique to humans, is to conceptualize time in terms of space (i.e., the spatial construal of time; e.g., Clark, 1973; Traugott, 1978; Lakoff & Johnson, 1980). Within this framework, time is usually first segmented into events—the basic temporal entities that observers conceive as having a beginning and an end (Zacks & Tversky, 2001). These temporal entities are then ordered in space, such that events occurring at different times can be maintained in working memory, allowing them to be flexibly accessed from different perspectives and easily referenced during communication (e.g., Casasanto & Boroditsky, 2008; Núñez & Cooperrider, 2013; Bender & Beller, 2014; Abrahamse et al., 2014; Figure 1A). The two core temporal components—duration and sequence—can be readily represented in such time construals.”

      In the third paragraph of the Introduction (pages 3-4), we acknowledge the contributions of earlier behavioral studies on prospective and retrospective timing by citing the work suggested by the reviewer (Block & Zakay, 1997), which indicates that two distinct cognitive systems underlie timing processes. These behavioral findings converge with the conclusions of more recent neuroimaging studies:

      “Unlike prospective timing tracking the continuous passage of time, durations in time construals are event-based (Sinha & Gärdenfors, 2014): the interval boundaries are constituted by events, and the event durations reflect their span (Figure 1A). Accumulating evidence suggests that distinct cognitive systems underlie these two types of duration (e.g., Block & Zakay, 1997). The motor and attentional system—particularly the supplementary motor area—has been associated with prospective timing (e.g., Protopapa et al., 2019; Nani et al., 2019; De Kock et al., 2021; Robbe, 2023), whereas the episodic memory system—particularly the hippocampus—is considered to support the representation of duration embedded within an event sequence (e.g., Barnett et al., 2014; Thavabalasingam et al., 2018; see also the comprehensive review by Lee et al., 2020).”

      Block, R. A., & Zakay, D. (1997). Prospective and retrospective duration judgments: A meta-analytic review. Psychonomic Bulletin & Review, 4(2), 184-197.

      In the fifth paragraph of the Introduction (page 5), we added a sentence to clarify the relationship between allocentric and egocentric reference frames and perspective taking:

      “However, the neural mechanisms that enable the brain to generate distinct construals of an event sequence remain largely unknown. Valuable insights may be drawn from research in the spatial domain, which posits the existence of stable allocentric representations that are independent of viewpoint, from which variable egocentric representations corresponding to different perspectives can be generated.”

      Methods:

      While more detail is provided in the Methods, some additional detail would be helpful to enable the replication of this work. For instance,

      - The table reports a sequence of phrases with assigned durations. Are the event phrases actual sentences given to participants? If so, how were participants made aware of the duration of the events, seeing that these sentence parts do not provide time information?

      We apologize that we did not make this clear. The full text used during the reading phase of learning has already been provided in Figure 1—source data 1, which includes the information about event durations. In the revised manuscript, we now explicitly refer to this information in the Methods section (page 38): In the reading phase, participants read a narrative describing the whole ritual on a computer screen twice (Figure 1—source data 1).

      - One of my original questions was about the narrative. In the Methods section, the authors state that participants read a text. Providing the full text would be helpful, also as a sanity check for sequentiality.

      As clarified in the previous response, the texts are provided in Figure 1—source data 1, which illustrates the texts for both even- and odd-numbered participants.

      - In the imagination phase, the authors introduce proportionality between imagination and experience (p. 37). What scale was used? What motivated it?

      We thank the reviewer for bringing this issue to our attention. In this study, participants did not directly experience the events; instead, they learned the event information through narrative reading or imagination to ensure experimental control and efficiency. As clarified in the Methods section, the ratio between imagination duration and actual event duration was 30 seconds to 1 hour. In the revised manuscript, we have further explained our motivation for this design choice (page 39):

      Here, we let participants learn the event information through narrative reading or imagination. Compared to learning through actual experience, this approach prioritizes experimental control and efficiency. The timing of the events is compressed, akin to the process of retrospectively recalling our experiences, in which we mentally traverse events without requiring the actual time they originally took. However, future studies may be needed to investigate whether the encoding of events from first- and second-hand experience differs.

      Results:

      - p. 10: the interpretation of the data on chunking and boundary effects should be properly referenced to e.g. Davachi's published work.

      We thank the reviewer for highlighting Davachi’s important work on event boundaries. We have appropriately cited these studies in the revised manuscript (page 10), as reflected in the following passage: This pattern can be interpreted as a categorical effect: sequential distances within the same part of the day were perceived as shorter (i.e., a chunking effect), whereas distances spanning different parts of the day were perceived as longer (i.e., a boundary effect). Similar boundary- or chunking-related effects on event cognition have been reported in previous studies (e.g., Ezzyat & Davachi, 2011; DuBrow & Davachi, 2013; Radvansky & Zacks, 2017).

      Ezzyat, Y., & Davachi, L. (2011). What constitutes an episode in episodic memory?. Psychological Science, 22(2), 243-252.

      DuBrow, S., & Davachi, L. (2013). The influence of context boundaries on memory for the sequential order of events. Journal of Experimental Psychology: General, 142(4), 1277.

      Radvansky, G. A., & Zacks, J. M. (2017). Event boundaries in memory and cognition. Current Opinion in Behavioral Sciences, 17, 133-140.

      Reviewer #2 (Public review):

      Summary:

      Xu et al. used fMRI to examine the neural correlates associated with retrieving temporal information from an external compared to internal perspective ('mental time watching' vs. 'mental time travel'). Participants first learned a fictional religious ritual composed of 15 sequential events of varying durations. They were then scanned while they either (1) judged whether a target event happened in the same part of the day as a reference event (external condition); or (2) imagined themselves carrying out the reference event and judged whether the target event occurred in the past or will occur in the future (internal condition). Behavioural data suggested that the perspective manipulation was successful: RT was positively correlated with sequential distance in the external perspective task, while a negative correlation was observed between RT and sequential distance for the internal perspective task. Neurally, the two tasks activated different regions, with the external task associated with greater activity in the supplementary motor area and supramarginal gyrus, and the internal condition with greater activity in default mode network regions. Of particular interest, only a cluster in the posterior parietal cortex demonstrated a significant interaction between perspective and sequential distance, with increased activity in this region for longer sequential distances in the external task but increased activity for shorter sequential distances in the internal task. Only a main effect of sequential distance was observed in the hippocampus head, with activity being positively correlated with sequential distance in both tasks. No regions exhibited a significant interaction between perspective and duration, although there was a main effect of duration in the hippocampus body with greater activity for longer durations, which appeared to be driven by the internal perspective condition. 
On the basis of these findings, the authors suggest that the hippocampus may represent event sequences allocentrically, whereas the posterior parietal cortex may process event sequences egocentrically.

      We sincerely thank the reviewer for providing an accurate, comprehensive, and objective summary of our study.

      Strengths:

      The topic of egocentric vs. allocentric processing has been relatively under-investigated with respect to time, having traditionally been studied in the domain of space. As such, the current study is timely and has the potential to be important for our understanding of how time is represented in the brain in the service of memory. The study is well thought out and the behavioural paradigm is, in my opinion, a creative approach to tackling the authors' research question. A particular strength is the implementation of an imagination phase for the participants while learning the fictional religious ritual. This moves the paradigm beyond semantic/schema learning and is probably the best approach besides asking the participants to arduously enact and learn the different events with their exact timings in person. Importantly, the behavioural data point towards successful manipulation of internal vs. external perspective in participants, which is critical for the interpretation of the fMRI data. The use of syllable length as a sanity check for RT analyses as well as neuroimaging analyses is also much appreciated.

      We thank the reviewer for the positive and encouraging comments.

      Suggestions:

      The authors have done a commendable job addressing my previous comments. In particular, the additional analyses elucidating the potential contribution of boundary effects to the behavioural data, the impact of incorporating RT into the fMRI GLMs, and the differential contributions of RT and sequential distance to neural activity (i.e., in PPC) are valuable and strengthen the authors' interpretation of their findings.

      My one remaining suggestion pertains to the potential contribution of boundary effects. While the new analyses suggest that the RT findings are driven by sequential distance and duration independent of a boundary effect (i.e., Same vs. Different factor), I'm wondering whether the same applies to the neural findings? In other words, have the authors run a GLM in which the Same vs. Different factor is incorporated alongside distance and duration?

      We thank the reviewer for their positive evaluation of our previous revisions and are pleased that the additional analyses adequately address the boundary effects in the behavioral data and the RT effects in the neural data.

      With respect to boundary effects in the neural data, we followed the reviewer’s suggestion and constructed a more complex GLM that incorporated the Same/Different part of the day as an additional regressor modulating the target events. Importantly, the same PPC region continued to show an interaction effect between Task Type and Sequential Distance. We have added this important control analysis in our revised manuscript (Pages 13–14):

      “To further assess whether the observed PPC reactivation can be attributed to boundary or chunking effects introduced by the Parts of the Day, as well as other behavioral outputs, we performed an additional control analysis. Using a more complex first-level model, we included two extra regressors modulating the target events in both internal- and external-perspective tasks, alongside Sequential Distance and Duration: (1) Same/Different parts of the day (coded as 1/−1) and (2) Future/Past (coded as 1/−1). Even with these additional controls, the same PPC region remained the strongest area across the entire brain, showing an interaction effect between Task Type and Sequential Distance, although the cluster size was slightly reduced (voxel-level p < 0.001; cluster-level FWE-corrected p = 0.054).”
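      For readers less familiar with this kind of coding, the regressor scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `code_modulators` and the trial field names (`seq_distance`, `duration`, `same_part`, `is_future`) are hypothetical, the trial values are invented for the example, and any further steps a real first-level model would need (e.g., mean-centring, convolution with a haemodynamic response function) are deliberately left out.

```python
def code_modulators(trials):
    """Turn a list of trial dicts into rows of per-trial modulator values.

    Columns (in order): sequential distance, duration,
    Same/Different part of the day (1/-1), Future/Past (1/-1).
    All field names are hypothetical and used for illustration only.
    """
    rows = []
    for trial in trials:
        rows.append([
            float(trial["seq_distance"]),
            float(trial["duration"]),
            1.0 if trial["same_part"] else -1.0,   # Same = 1, Different = -1
            1.0 if trial["is_future"] else -1.0,   # Future = 1, Past = -1
        ])
    return rows


# Invented example trials, purely to show the coding.
example_trials = [
    {"seq_distance": 1, "duration": 0.5, "same_part": True, "is_future": False},
    {"seq_distance": 3, "duration": 2.0, "same_part": False, "is_future": True},
]
modulators = code_modulators(example_trials)
```

      Each row corresponds to one target event, so the two categorical factors enter the model as additional columns alongside Sequential Distance and Duration rather than replacing them.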

    1. eLife Assessment

      The study provides important mechanistic insight into the transcriptional control of γδT17 development, elegantly demonstrating how HEB and Id3 act sequentially and cooperatively to regulate γδT17 cell specification and maturation. The study provides compelling evidence that advances the understanding of E-Id protein dynamics in thymic T cell specification. The work is comprehensive, technically rigorous, and conceptually clear, and will be of interest to immunologists, developmental biologists, and those studying the molecular underpinnings of physiological outcomes.

    2. Reviewer #1 (Public review):

      The authors use flow cytometry and scRNA-seq to identify and characterize the defect in gdT17 cell development from HEB f/f, Vav-icre (HEB cKO) and Id3 germline-deficient mice. HEB cKO mice showed defects in the gdT17 program at an early stage, and failed to properly upregulate expression of Id3 along with other genes downstream of TCR signaling. Id3KO mice showed a later defect in maturation. The results together indicate HEB and Id3 act sequentially during gdT17 development. The authors further showed that HEB and TCR signaling synergize to upregulate Id3 expression in the Scid-adh DN3-like T cell line. Analysis of previously published ChIP-seq data revealed binding of HEB (and Egr2) at overlapping regulatory regions near Id3 in DN3 cells. The study provides insight into mechanisms by which HEB and Id3 act to mediate gdT17 specification and maturation. The work is well performed and clearly presented.

      Comments on revisions:

      The authors have answered all of my questions. I am strongly supportive of the revised work.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Selvaratnam et al. defines how the transcription factor HEB integrates with TCR signaling to regulate Id3 expression in the context of gdT17 maturation in the fetal thymus. Using conditional HEB ablation driven by Vav Cre, flow cytometry, scRNA-seq, and reanalysis of ChIP-seq data, the authors provide evidence for a sequential model in which HEB and TCR-induced Egr2 cooperatively upregulate Id3, enabling gdT17 maturation and limiting diversion to the ab lineages. The work provides an important mechanistic insight into how the E/ID-protein axis coordinates gd T cell specification and effector maturation.

      Strengths include:

      (1) The proposed model that HEB primes, TCR induces, and Id3 stabilizes gdT17 cells in embryonal development is elegant and consistent with the findings.

      (2) The choice of animal models and the study of a precise developmental window.

      (3) The cross-validation of flow, scRNA-seq, and ChIP-seq reanalyses strengthens the conclusions.

      (4) The study clarifies the dual role of Id3, first as an HEB-dependent maturation factor for gdT17 cells, and as a suppressor of diversion to the ab lineages.

      Comments on revisions:

      In this revised version of their manuscript the authors have effectively addressed all of my previous concerns. In its current form the study represents a significant advancement in our understanding of how the transcription factor HEB integrates with TCR signaling to regulate Id3 expression in the context of gdT17 maturation in the fetal thymus.

    4. Reviewer #3 (Public review):

      Summary:

      The authors of this manuscript have addressed a key concept in T cell development: how early thymic gd T cell subsets are specified, and which elements govern the specification of gd T17 cells versus other gd T cell subsets or ab T cell subsets. They show that the transcriptional regulator HEB/Tcf12 plays a critical role in specifying the gd T17 lineage and, intriguingly, that it upregulates the inhibitor Id3, which is later required for further gd T17 maturation.

      Strengths:

      The conclusions drawn by the authors are amply supported by a detailed analysis of various stages of T cell maturation in WT and KO mouse strains at the single cell level both phenotypically, by flow cytometry for various diagnostic surface markers, and transcriptionally, by single cell sequencing. Their conclusions are balanced and well supported by the data and citations of previous literature.

      Weaknesses:

      I actually found this work to be quite comprehensive.

      Comments on revisions:

      Nothing to add here. The authors were very thorough in their original submission, and all minor issues identified have been addressed to my satisfaction.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their enthusiasm and insightful suggestions. Our responses to specific concerns and questions are detailed below.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors use flow cytometry and scRNA-seq to identify and characterize the defect in gdT17 cell development from HEB f/f, Vav-icre (HEB cKO), and Id3 germline-deficient mice. HEB cKO mice showed defects in the gdT17 program at an early stage, and failed to properly upregulate expression of Id3 along with other genes downstream of TCR signaling. Id3KO mice showed a later defect in maturation. The results together indicate HEB and Id3 act sequentially during gdT17 development. The authors further showed that HEB and TCR signaling synergize to upregulate Id3 expression in the Scid-adh DN3-like T cell line. Analysis of previously published ChIP-seq data revealed binding of HEB (and Egr2) at overlapping regulatory regions near Id3 in DN3 cells.

      The study provides insight into mechanisms by which HEB and Id3 act to mediate gdT17 specification and maturation. The work is well performed and clearly presented. We only have minor comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Selvaratnam et al. defines how the transcription factor HEB integrates with TCR signaling to regulate Id3 expression in the context of gdT17 maturation in the fetal thymus. Using conditional HEB ablation driven by Vav Cre, flow cytometry, scRNA-seq, and reanalysis of ChIP-seq data, the authors provide evidence for a sequential model in which HEB and TCR-induced Egr2 cooperatively upregulate Id3, enabling gdT17 maturation and limiting diversion to the ab lineages. The work provides an important mechanistic insight into how the E/ID-protein axis coordinates gd T cell specification and effector maturation.

      Strengths include:

      (1) The proposed model that HEB primes, TCR induces, and Id3 stabilizes gdT17 cells in embryonal development is elegant and consistent with the findings.

      (2) The choice of animal models and the study of a precise developmental window.

      (3) The cross-validation of flow, scRNA-seq, and ChIP-seq reanalyses strengthens the conclusions.

      (4) The study clarifies the dual role of Id3, first as an HEB-dependent maturation factor for gdT17 cells, and as a suppressor of diversion to the ab lineages.

      Weaknesses:

      (1) The ChIP-seq reanalysis indicates overlapping HEB, E2A, and Egr2 peaks ~60 kb upstream of Id3. Given that the Egr2 data are not generated using the same thymocyte subsets, some form of validation should be considered for the co-binding of HEB and Egr2, potentially ChIP-qPCR in sorted gdT17 progenitors.

      We agree that this is a valid concern and continue to work on confirming the mechanism from several other angles. Validating HEB/E2A and Egr2 co-binding in gdT17 cell progenitors by ChIP-qPCR would be a very precise and definitive experiment, but it will be very challenging to perform, in part due to the low numbers of gdT17 precursors in the fetal thymus (note the y-axis scales in Fig. 1F, J). As a complementary approach, we have analyzed additional ChIP-seq data for HEB/E2A binding in Rag2<sup>-/-</sup> DN3 cells retrovirally transduced with the KN6 gdTCR cultured with stroma expressing the weak KN6 ligand T10 for 4 days. This analysis revealed that the binding of HEB/E2A on those sites persisted after weak gdTCR signaling, strengthening the likelihood that concurrent binding of HEB/E2A and Egr2 occurs during this developmental transition. We noted that HEB/E2A binding was slightly dampened in Rag2<sup>-/-</sup> DN3 + gdTCR cells relative to Rag2<sup>-/-</sup> DN3 cells, consistent with the induction of Id3 and subsequent Id3-mediated disruption of E protein binding. We also located HEB/E2A and Egr binding sites in close proximity in the two regions that shared peaks between HEB/E2A and Egr2 analyses (HE1 and HE2), in line with the potential participation of these two transcription factors in an enhanceosome binding complex.

      Furthermore, we examined the chromatin landscape of the Id3 locus by sorting WT DN3 and DN4 cells, as well as Rag2<sup>-/-</sup> DN3 cells to provide a genuine pre-selection context, and performing ATAC-seq (Figure 7–suppl 7A). Given the known ability of E2A and HEB to induce chromatin remodeling, we also examined accessibility in DN3 and DN4 cells from HEB cKO mice. Alignment of ATAC-seq and ChIP-seq peaks in the Id3 locus revealed accessibility of HE1 and HE2 in Rag2<sup>-/-</sup>, WT DN3, and WT DN4 cells. However, accessibility of HE1 and HE2 was dampened in HEB cKO cells, especially at the DN3 stage, suggesting that HEB may be involved in remodeling the Id3 locus, resulting in a poised state that enables TCR-dependent transcription factors to induce Id3 proportionally to TCR signal strength. These data are now presented as a new “Figure 7 – figure supplement 1” with corresponding Results, Discussion, and Methods updates.

      Our next story will be focused on a finer dissection of the Id3 cis-regulatory elements and their combinatorial regulation by HEB/E2A and other transcription factors, and how they relate to specific signaling pathways. For this study, we will modify the language regarding Egr2 to reflect the open questions that still remain to be addressed.

      (2) E2A expression is not affected in HEB-deficient cells, raising the question of partial compensation, a point that should be specifically discussed.

      This confounding factor is always an issue with E proteins. We have now added a section to the discussion that highlights previous literature and relates it to our findings.

      (3) All experiments are done at E18, when fetal gdT17 development predominates. The discussion could address whether these mechanisms extend to neonatal or adult gdT17 subsets.

      In our 2017 paper (PMID 29222418) we showed that HEB cKO mice have defects in the production of functional gdT17 cells in fetal and neonatal thymus and in the adult periphery (in lungs and spleen). While the adult thymus does not support the development of fully functional innate gd T cells, it does contain gdTCR+ cells that have activated the Sox-Maf-Rorc network (Yang 2023, PMID 37815917). It will be very interesting to assess the impact of HEB loss on these cells, and we are actively pursuing this goal. For now, we will add a paragraph to the discussion addressing what we know from previous work and what is yet to be learned.

      Reviewer #3 (Public review):

      Summary:

      The authors of this manuscript have addressed a key concept in T cell development: how early thymus gd T cell subsets are specified and the elements that govern the specification of gd T17 versus other gd T cell subsets or ab T cell subsets. They show that the transcriptional regulator HEB/Tcf12 plays a critical role in specifying the gd T17 lineage and, intriguingly, that it upregulates the inhibitor Id3, which is later required for further gd T17 maturation.

      Strengths:

      The conclusions drawn by the authors are amply supported by a detailed analysis of various stages of T cell maturation in WT and KO mouse strains at the single cell level, both phenotypically, by flow cytometry for various diagnostic surface markers, and transcriptionally, by single cell sequencing. Their conclusions are balanced and well supported by the data and citations of previous literature.

      Weaknesses:

      I actually found this work to be quite comprehensive. I have a few suggestions for additional analyses the authors could explore that are unrelated to the predominant conclusions of the manuscript, but I failed to find major flaws in the current work.

      I note that HEB is expressed in many hematopoietic lineages from the earliest progenitors and throughout T cell development. It is also noteworthy that abortive gamma and delta TCR rearrangements have been observed in early NK cells and ILCs, suggesting that, particularly in early thymic development, specification of these lineages may have lower fidelity. It might prove interesting to see whether their single-cell sequencing or flow data reveal changes in the frequency of these other T-cell-related lineages. Is it possible that HEB is playing a role not only in the fidelity of gdT17 cell specification, but also perhaps in the separation of T cells from NK cells and ILCs or the frequency of DN1, DN2, and DN3 cells? Perhaps their single-cell sequencing data or flow analyses could examine the frequency of these cells? That minor caveat aside, I find this to be an extremely exciting body of work.

      Excellent question, and the underlying answer is yes: loss of HEB renders the cells more open to divergence to non-T lineages, even at the DN3 stage. Although our datasets did not reveal those cells, we have examined this question previously. In our 2011 paper (Braunstein, 2011, PMID 21189289), we identified “DN1-like” cells arising from HEB-/- DN3 cells in OP9-DL1 co-cultures. These cells responded to IL-15 and IL-7 by differentiating into cytotoxic NK-like cells. We did not detect TCRb rearrangements but did not look for gdTCR rearrangements. Subsequently, multiple papers from other labs showed that ILC2 were greatly expanded in the thymus using Id-overexpression transgenic mice and HEB/E2A-double deficient mice (Miyazaki, 2023, PMID 28514688; Miyazaki, 2025, PMID 39904558; Berrett, 2019, PMID 31852728; Qian, 2019, PMID 30898894; Peng, 2020, PMID 32817168). The ILCs in these mice had TCRg rearrangements, consistent with a shared origin with WT thymic-derived ILCs. In unpublished data from our lab, we found an increase in the numbers of ILC2 but not ILC3 in HEB cKO fetal thymic organ cultures. We did not follow up on this work any further since the topic was being heavily pursued in other labs, but we remain very interested in this branchpoint and will mention the literature in the discussion.

      Joint recommendations for the authors:

      (1) Experimental validation (for mechanistic clarity)

      The ChIP-seq reanalysis indicates overlapping HEB, E2A, and Egr2 peaks ~60 kb upstream of Id3. Given that the Egr2 data are not generated using the same thymocyte subsets, some form of validation should be considered for the co-binding of HEB and Egr2, potentially ChIP-qPCR in sorted gdT17 progenitors to substantiate the proposed cooperative mechanism.

      See above; new experiments with ATAC-seq and additional ChIP-seq analysis.

      (2) Figures

      Potential inconsistencies in Figure 1H: In the legend to Figure 1H, Vg1-Vg5- cells are considered Vg6+ cells. Flow plots show a reduced Vg1-Vg5- population in HEB cKO mice, but the accompanying bar plot shows increased frequency of Vg6+ cells.

      Vg6 cells are actually considered to be Vg4-Vg5-Vg1- cells (not Vg4- Vg1- cells, which is important in the fetal context). The flow plot shows the percentage of Vg6 cells out of the Vg1-Vg4- population, whereas the bar plot shows the percentage of Vg6 cells out of all gdTCR+ cells. The ratio of Vg6 to Vg5 cells decreases within the Vg1-Vg4- population, whereas the overall percentages and numbers of Vg6 cells in all gd T cells are increased in HEB cKO mice. We have now more clearly explained this in the text and the figure legend.

      Clarify which cells produce IL-17A in Figure 1L.

      This plot is gated on all gd T cells stimulated with PMA/ionomycin; this has been added to the results and figure legend.

      In Supplementary Figure 2, legend, do the authors mean that TRGV4 was depleted? The authors write TRDV4. Please check.

      Thank you for catching this mistake; we have corrected it.

      In Figure 7, the authors showed Id3 mRNA expression. Can the expression of Id2 be included?

      That is a really interesting question, and we will follow up on it in future studies.

      If Id1 or Id4 are relevant for any of these studies, can their expression be shown in Supplementary Figure 3A? If these are minimally expressed or not expressed, this could be mentioned.

      Id1 and Id4 were not detectable in our studies; this is now stated in the results section describing expression of E proteins and Id proteins.

      (3) Discussion

      Discuss possible redundancy between HEB and E2A, as E2A expression appears unaffected in HEB-deficient cells.

      See above

      Address whether the mechanisms identified at E18 (embryonic stage) also apply to neonatal or adult γδT17 subsets.

      See above

      Expand on how HEB function may relate to other hematopoietic or early lymphoid lineages (NK/ILC, DN1-DN3 stages), based on reviewer curiosity.

      See above

      (4) Methods and terminology

      Define the terms γδTe1 and γδTe2 (e.g., early effector subsets).

      This has been defined more clearly in several sections of the text.

      Add details to the scRNA-seq methods section (average number of cells analyzed and sequencing depth per cell).

      These details have been added.

    1. eLife Assessment

      This important study examines the role of TNF in modulating energy metabolism during parasite infection. The authors perform an elegant set of studies, however the evidence supporting the major claims of the manuscript is incomplete, particularly in highlighting a direct role for GLUT1 in monocytes. This work integrates an interesting set of observations that will be of interest to the Plasmodium and pathogenesis communities with an expanded set of experiments.

    2. Reviewer #2 (Public review):

      Summary:

      The premise of the manuscript by Matteucci et al. is interesting and elaborates a mechanism via which TNFa regulates monocyte activation and metabolism to promote murine survival during Plasmodium infection. The authors show that TNF signaling (via an unknown mechanism) induces nitrite synthesis, which (via an as yet unknown mechanism) stabilizes the transcription factor HIF1a. Furthermore, HIF1a (via an unknown mechanism) increases GLUT1 expression and increases glycolysis in monocytes. The authors demonstrate that this metabolic rewiring towards increased glycolysis in a subset of monocytes is necessary for monocyte activation including cytokine secretion, and parasite control.

      Strengths:

      The authors provide elegant in vivo experiments to characterize metabolic consequences of Plasmodium infection, and isolate cell populations whose metabolic state is regulated downstream of TNFa. Furthermore, the authors tie together several interesting observations to propose an interesting model.

      Weaknesses:

      The main conclusion of this work - that "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF1a axis plays a key role in host resistance to Plasmodium infection" is unsubstantiated. The authors show that TNFa induces GLUT1 in monocytes, but never show a direct role for GLUT1 or glucose uptake in monocytes in host resistance to infection (nor the hypoglycemia phenotype they describe).

      Comments on revisions:

      The demonstration that the established TNF-iNOS-HIF-1α-glycolysis axis operates in vivo during P. chabaudi infection is valuable and relevant. However, it constitutes contextual validation and must be carefully described as such. This distinction, i.e., "what has already been shown vs. what is new", is not consistently reflected in the framing of the manuscript, raising overstatement concerns. This is particularly evident in the abstract and other conclusive statements, where mechanistic novelty is implied, even when the underlying pathways/mechanisms are already known. To improve the manuscript, all sentences that refer to already established findings should be accurately described as such.

      For example, the abstract states: "Here, we show that TNF signaling hampers physical activity, food intake, and energy expenditure while enhancing glucose uptake by the liver and spleen as well as controlling parasitemia in P. chabaudi-infected mice." In this sentence, the effects of TNF signaling on physical activity, food intake, energy expenditure, glucose metabolism and control of parasitemia are unequivocally established and therefore do not, in themselves, constitute new findings.

      Feeding behavior, not cell-intrinsic metabolism, may drive glycemic differences

      The authors propose that TNF signaling leads to GLUT1 upregulation (in inflammatory monocytes, MO-DCs, and within the liver and spleen) during Plasmodium infection, and that this results in increased glucose uptake contributing to systemic hypoglycemia. While this is an intriguing hypothesis, we urge the authors to consider an alternative explanation that, at present, is not adequately ruled out. Given that glycemia serves as a central functional readout in the manuscript, this distinction is essential to clarify.

      The observed regulation of glycemia is likely not a direct consequence of increased glucose uptake by immune cells or by tissues but may instead reflect broader differences in disease severity across genotypes. The iNOS KO, TNFR KO, and HIF-1αΔLyz2 mice likely experience a dampened inflammatory response, which would blunt infection-induced anorexia and help preserve overall metabolic homeostasis. This alternate interpretation is supported by the authors' metabolic cage data showing increased physical activity in TNFR KO mice and the elevated food intake shown in Figure 2B.

      Since anorexia and energy expenditure are tightly coupled to the inflammatory milieu, it is plausible that these behavioral and systemic differences, rather than monocyte or tissue GLUT1 expression per se, are the primary contributors to the observed glycemic patterns. To support their current interpretation, the authors should perform a pair-feeding experiment in which (at least) TNFR KO mice are restricted to the same food intake as infected WT controls. This would help disentangle whether differences in glycemia truly reflect immune-driven metabolic rewiring or are secondary to differences in caloric intake.

      The contribution of monocyte-specific glucose metabolism to host resistance remains unresolved.

      We appreciate the authors' effort to address the mechanistic role of glycolysis in host resistance using in vivo 2-deoxyglucose (2DG) treatment. However, I would like to point out that while this experiment is informative, it does not fully resolve the specific concern raised regarding the cell-intrinsic role of TNF-induced glycolysis in monocytes. 2DG acts systemically, inhibiting glycolysis across a wide range of cell types, including hepatocytes, endothelial cells, lymphocytes, and myeloid populations. Therefore, the observed increase in parasitemia following 2DG treatment may reflect the broad importance of glycolysis for host defense, or alternatively, may result from elevated circulating glucose levels induced by 2DG (PMID: 35841892), which could enhance parasite growth by increasing nutrient availability. Therefore, this experiment does not allow for a specific conclusion about the requirement for TNF-driven metabolic reprogramming in monocytes.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We have now performed new experiments, which are included in the manuscript. Our new results show that monocyte-derived dendritic cells primed in vivo during P. chabaudi infection, or in vitro with TNF, express high levels of GLUT-1 (Figures 4M, 5D, 6L). Furthermore, our new data show that mice treated with 2-DG (an inhibitor of glycolysis) are more susceptible to infection (Figures 6N, O). In addition, new results on glucose uptake by muscle and adipose tissues were added to the manuscript. Finally, figure legends were revised, densitometric analysis was performed, and other issues were addressed in the text.

      Please see below a point-by-point reply to the Reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Kely C. Matteucci et al. titled "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF-1α axis plays a key role in host resistance to Plasmodium infection" describes that TNF induces HIF-1α stabilization that increases GLUT1 expression as well as glycolytic metabolism in monocytic and splenic CD11b+ cells in P. chabaudi infected mice. Also, TNF signaling plays a crucial role in host energy metabolism, controlling parasitemia, and regulating the clinical symptoms in experimental malaria.

      This paper involves an incredible amount of work, and the authors have done an exciting study addressing the TNF-iNOS-HIF-1α axis as a critical role in host immune defense during Plasmodium infection.

      Reviewer #2 (Public Review):

      Summary:

      The premise of the manuscript by Matteucci et al. is interesting and elaborates on a mechanism via which TNFa regulates monocyte activation and metabolism to promote murine survival during Plasmodium infection. The authors show that TNF signaling (via an unknown mechanism) induces nitrite synthesis, which (via an as yet unknown mechanism) stabilizes the transcription factor HIF1a. Furthermore, HIF1a (via an unknown mechanism) increases GLUT1 expression and increases glycolysis in monocytes. The authors demonstrate that this metabolic rewiring towards increased glycolysis in a subset of monocytes is necessary for monocyte activation including cytokine secretion, and parasite control.

      Strengths:

      The authors provide elegant in vivo experiments to characterize metabolic consequences of Plasmodium infection, and isolate cell populations whose metabolic state is regulated downstream of TNFa. Furthermore, the authors tie together several interesting observations to propose an interesting model.

      Weaknesses:

      The main conclusion of this work - that "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF1a axis plays a key role in host resistance to Plasmodium infection" is unsubstantiated. The authors show that TNFa induces GLUT1 in monocytes, but never show a direct role for GLUT1 or glucose uptake in monocytes in host resistance to infection (nor the hypoglycemia phenotype they describe).

      We respectfully disagree with the Reviewer. A series of experiments shows that TNFR KO (Figures 1, 2, 4), HIF1a KO (Figure 5) and iNOS KO (Figure 6) mice have a partially impaired inflammatory response and control of parasitemia (Figures 1E, 5G and 6B).

      To further address the issue raised by the reviewer, we performed two sets of experiments. First, we show, in vitro, the impact of TNF stimulation on GLUT1 expression and glucose uptake (Figure 4M, 5D, 6L). Our results show that GLUT1 is increased after 18 hours of TNF (100 ng/mL) stimulation in MODCs from WT mice but not from iNOS KO, HIF1a KO and TNFR KO mice. Similar results were obtained with monocytic cells derived from infected mice (Figure 4L, 5C, 6K). The results support the discussion by demonstrating that TNF stimulation influences GLUT1 expression in monocytic cells. This aligns with the proposed mechanism that TNF signaling regulates HIF-1α stabilization and glycolytic metabolism via RNI. The absence of GLUT1 upregulation and glucose uptake in TNFR KO, iNOS KO and HIF-1α KO mice further reinforces the role of RNI in promoting HIF-1α stabilization, as suggested in the discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points

      None of the figure legends states whether the data are expressed as means ± standard errors of the means (SEM) or SD. Figure 1D shows no SD in the data from the uninfected group. I strongly suggest making all figure legends more precise and detailed, including an explanation of all symbols, non-standard abbreviations, error bars (standard deviation or standard error), experimental and biological replicates, the number of animals, and whether the data are representative of independent experiments.

      We apologize for the lack of details in the Figure legends. As requested, we now indicate whether we used SEM or SD, the number of mice per group, and the number of replicate experiments. We also clarified the groups that are being compared, and the statistical significance indicated by the symbols. We also standardized the symbols to asterisks only, with the number of asterisks indicating the significance.

      Figure 1. The figure legend has no information about the organ for which TNF mRNA was measured (Figure 1D). Also, regarding the TNF data, Figures 1C and 1D show that the circulating levels of TNF and the expression of TNF mRNA in the liver peaked at the same time point, and after 6h, there is no difference between infected and uninfected mice. It would be expected that the TNF mRNA expression would be detected earlier than the protein, assuming that the primary source of TNF is the liver. Is there another organ that could be the main source of blood TNF? Did the authors have a chance to measure the blood TNF levels during infection (0-8dpi), besides the measurement at different times only on day 8?

      We included in the legend of Figure 1D that mRNA was extracted from liver.

      The liver and spleen are the main reservoirs of infected erythrocytes and the main sources of cytokines during infection with the erythrocytic stage of malaria. The results presented in Figures 1C and 1D are from in vivo experiments, not a controlled cellular experiment in vitro. We therefore cannot draw conclusions about the exact timing and synchrony of TNF mRNA and protein production. We have published earlier that during P. chabaudi infection, the peaks of TNF mRNA expression and the levels of circulating TNF protein occur between midnight and 6 am (Hirako et al., 2018). Hence, those results are consistent with the results described here. In addition, this earlier study also shows that the patterns of TNF at days 6 and 8 post-infection are similar. Furthermore, in other studies, we reported that the peak of TNF production occurs between days 6 and 10 post P. chabaudi infection (Franklin et al, PNAS, 2009; Franklin et al, Microbes and Infection, 2007). This is now clarified in the text (page 05, line 132):

      “As previously demonstrated, the circulating levels of TNF and expression of TNF mRNA in the liver peaked at 6 am (end of dark cycle) at 8 dpi (Figure 1C and 1D), and has been reported to peak between days 6 and 10 post-infection, with a consistent pattern observed on days 6 and 8.”

      Figure 2. "We observed that in naïve animals, all of these parameters were similar in TNFR<sup>-/-</sup> and C57BL/6 mice (Figures 2A-D, top panels, and Figures 2E-H)." Interestingly, the respiratory exchange rate of TNFR<sup>-/-</sup> uninfected mice seems higher in TNFR<sup>-/-</sup> uninfected mice than in naïve uninfected mice, and this pattern seems to be more pronounced in TNFR<sup>-/-</sup> uninfected mice. Is there any suggestion that could explain the change in respiratory exchange rate behavior without infection in those animals?

      At the moment, we have not investigated the basis of this difference between uninfected WT and TNFR KO mice, which goes beyond the scope of this research. This is indeed an interesting observation that should be pursued in the future by our group and elsewhere. We now mention this difference when describing the results (page 06, line 155):

      “We observed that in naïve animals, all of these parameters were similar in TNFR<sup>-/-</sup> and C57BL/6 mice (Figures 2A-D, top panels and Figures 2E-H), with a slightly higher respiratory exchange rate in uninfected TNFR<sup>-/-</sup> mice. In contrast, all the evaluated parameters were decreased in infected C57BL/6 mice compared to their naïve counterparts during the light and dark cycles. When we analyzed only infected mice, the alterations in all parameters were milder in TNFR<sup>-/-</sup> compared to C57BL/6 mice (Figures 2A-D bottom panels and 2E-H).”

      Figure 3. To give an idea of the main population of non-parenchymal cells, it will be helpful to clarify briefly how non-parenchymal cells from the liver of infected or uninfected mice were isolated.

      We describe this in detail in the Materials and Methods (page 19, line 566).

      Figure 3, B, C, D, G and Figure 4K and Figure 5 A and B - Semi-quantitative data through the densitometric analysis of western blots should be included in all figures.

      Thank you for the suggestion. We have now included the densitometric analysis for all Western blot results in a supplementary figure.

      Figure 4. The authors describe, "We observed that except for Hexokinase-3, the expression of mRNAs of glycolytic enzymes (Hexokinase-1, PFKP, and PKM) was increased in C57BL/6 but not TNFR-/- 8dpi." Sometimes, it is hard to understand which groups have been compared for some data. Be precise in describing the statistical analysis between the groups. It seems that those genes were increased in "infected C57BL/6 in comparison to uninfected mice, but not TNFR-/- 8-dpi." Moreover, even though the authors include statistic symbols "ι, ιι, ιιι" in other legends, there is no explanation of the statistic symbols in the legend of Figure 4.

      As mentioned above, we improved the descriptions of all figures in the legend, and when necessary in the main text describing the results.

      Figure 5. The authors describe, "We found that GLUT1 protein and glycolysis (ECAR) was impaired, respectively, in monocytic cells and splenic CD11b+ cells from infected, as compared to uninfected HIF-1aΔLyz2 mice (Figures 5C-5E)." The GLUT-1 expression was inhibited in both cells compared to HIF-1afl/fl mice but not even close to impaired GLUT-1 expression. There is still a robust amount of GLUT-1 expression, and significantly higher when compared to cells from uninfected mice.

      We toned down our statement to “partially impaired”, indicating that other host or parasite components may also be influencing GLUT-1 expression. In fact, we have recently published that IFNγ also has an important role in regulating GLUT1 expression in MO-DCs, and this reference is mentioned in the text (page 10, line 291):

      “We found that glycolysis (ECAR) and GLUT1 expression were impaired, though partially, in monocytic and splenic CD11b+ cells from infected HIF-1aΔLyz2 mice (Figures 5C-5E) compared to infected WT mice. The level of GLUT1 expression that is still maintained is likely due to other host or parasite factors, such as IFN-γ (Ramalho 2024).”

      Figure 6. It is essential to have more information about the number of replicates in Figure 6A. There are just two dots (replicates) in the condition CD11b+ splenic cells from C57BL/6 stimulated with or without LPS (purple bars). It is essential to be precise regarding the number of experimental and biological replicates in each experiment and the statistical analysis that has been applied, including for this group. Furthermore, the authors conclude, "...these data demonstrated that RNI induces HIF-1α expression...." This conclusion needs a more careful description, since no data show that monocytic cells or splenic CD11b+ cells from iNOS-/- infected mice have decreased stabilization of HIF-1α by blotting, as shown in Figure 5A.

      As mentioned above, the number of replicates for each experiment was included in the figure legends.

      Minor Points.

      Figure 3. "Hepatocytes have an important role in glucose uptake from the circulation, and they do this primarily through GLUT2 (38), whose mRNA expression was downregulated (Figure 3A) and protein expression unchanged in response to Pc infection (Figure 4K)." I suggest moving the Figure 4K to Figure 3 to make it easy to follow the data description.

      We thank the reviewer for the suggestion. However, we chose to keep Figure 4K in Figure 4, as this panel includes data from TNF receptor deficient mice, and the analysis of TNF knockout models is first introduced and discussed in Figure 4. For clarity and consistency, we therefore maintained this panel within Figure 4.

      Line 433. Replace iNOS with iNOS-/- mice.

      iNOS has now been replaced with iNOS-/- mice.

      Reviewer #2 (Recommendations For The Authors):

      The premise of the manuscript by Matteucci et al. is interesting and elaborates on a mechanism via which TNFa regulates monocyte activation and metabolism to promote murine survival during Plasmodium infection. The authors show that TNF signaling (via an unknown mechanism) induces nitrite synthesis, which (via an as yet unknown mechanism) stabilizes the transcription factor HIF1a. Furthermore, HIF1a (via an unknown mechanism) increases GLUT1 expression and increases glycolysis in monocytes. The authors demonstrate that this metabolic rewiring towards increased glycolysis in a subset of monocytes is necessary for monocyte activation including cytokine secretion, and parasite control.

      The main goal of this work is to study the interplay of TNF/HIF1a/iNOS in pathogenesis in an experimental model of malaria. Dissecting the molecular mechanism by which TNF induces reactive nitrogen species and regulates HIF1a expression is beyond the scope of our research. Nevertheless, there is a vast literature addressing these issues. We now include in the discussion a paragraph describing the main conclusions of these previously published studies (page 12, line 363):

      "Previous studies have shown that TNF induces the production of RNI through the upregulation of iNOS via the NF-κB pathway (63, 64). TNF-mediated iNOS expression is critical for NO production, which in turn stabilizes HIF-1α by inhibiting prolyl hydroxylases (PHDs) even under normoxic conditions (58, 59). HIF-1α then upregulates the expression of glycolytic genes, including GLUT1 (22, 62).”

      Major comments

      Issues concerning novelty

      Some of the reported observations are not novel. TNFa and TNFa signaling have been demonstrated to contribute to the release of certain cytokines, and to contribute to the control of parasitemia (PMID: 10225939). TNFa has been shown to increase glucose uptake in tissues (PMID: 2589544). There is a textbook about the role of iNOS during the pathogenesis of malaria, including its association with parasite control (https://link.springer.com/chapter/10.1007/0-306-46816-6_15). Furthermore, other mechanisms controlling glycemia during Plasmodium infection have been shown (PMID: 35841892). The authors should adequately discuss other papers which have reported some of their findings.

      Thanks for the comments on previously existing literature. We are well aware of some of this earlier literature. Some of these earlier findings are mentioned in our manuscript. We emphasized these fundamental findings in the discussion, as requested (page 12, line 368):

      “TNF has been described as a critical mediator in malaria, driving cytokine release and parasitemia control (PMID: 10225939). It also enhances glucose uptake in tissues, aligning with our findings of increased glycolysis in monocytes (PMID: 2589544). The role of iNOS in malaria is well documented. IFN-γ and TNF induce the production of NO, which inhibits parasite growth but can cause tissue damage and organ dysfunction, especially in severe malaria (Mordmüller et al., 2002). Recent studies also highlight the complexity of glycemia regulation during Plasmodium infection, describing its role in modulating parasite virulence and transmission (PMID: 35841892). These studies demonstrate the critical function of TNF and iNOS in immune responses against Plasmodium, aligning with our findings of this axis and metabolic rewiring that are essential for monocyte activation and outcome of Pc infection.”

      The authors claim that "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF1a axis plays a key role in host resistance to Plasmodium infection," and contributes significantly to their effector functions (particularly parasite clearing), and the systemic drop in glycemia observed during Pc infection. Although the authors show that TNFa does result in altered metabolism and increased GLUT1 levels in a subpopulation of monocytes, the evidence that TNFa-induced glycolysis plays a key role in host resistance is correlative at best.

      This is an important question. We did show that TNFR KO mice have higher parasitemia. However, TNF is a pleiotropic cytokine with multiple roles in innate and acquired immunity. The experiment we performed that helps to address this issue is the in vivo treatment with 2DG. We found that treatment with this inhibitor of glycolysis results in an increase in parasitemia. These results are now included in Figure 6.

      When considering that the majority of monocytic populations are reduced in frequency and only a small subset (i.e., Monocyte-derived DCs) increase in frequency (Fig 3K) during Pc infection, this makes it very difficult to demonstrate that a cell population whose overall frequency reduces contributes significantly to the drop in glycemia during Pc infection. The authors should therefore include experiments that demonstrate that the inhibition of glycolysis induced by TNFa in monocytes is protective and/or contributes to a decrease in extracellular glucose. The authors could assess the impact of the loss of function of GLUT1 on activated monocytes and monocyte-derived DCs on glycemia upon TNFa stimulation.

      We agree. We focused on monocytes and the derived inflammatory monocytes and MO-DCs. In fact, the frequency of monocytes, considering the inflammatory monocytes and MO-DCs, is increased in both spleen and liver. One interesting result is that the HIF1a Lysm KO mice have impaired metabolism, attenuated hypoglycemia and increased parasitemia (Figure 5). Nevertheless, we agree that our current data do not prove that the drop in glycemia is due to the consumption of glucose by the activated monocytes, or that these are the only cells with increased glucose consumption. This is now added to the discussion (page 13, line 395):

      "Although the frequency of MO-DCs increases during infection, other cell populations may also contribute to glucose consumption. Further experiments, including the assessment of GLUT1 function in these populations, are needed to clarify their contribution to glucose consumption during infection."

      Furthermore, in the current state of the manuscript, it is unclear how activated monocyte populations take up glucose. The authors claim that glucose uptake by activated monocytes is GLUT1-dependent, however, glucose transport via GLUT1 is insulin-dependent. Since Plasmodium infection is associated with insulin resistance, and almost unquantifiable levels of insulin (PMID: 35841892), and TNFa itself induces insulin resistance (PMCID: PMC43887), it is unclear how the activated monocyte population takes up glucose. If the authors consider TNFa to be sufficient for GLUT1 induction, in vitro experiments (TNFa+monocytes) could bolster this claim (and support that GLUT1 is induced by an insulin-independent mechanism).

      There is significant evidence indicating that, in contrast to GLUT4, induction of GLUT1 in mice is independent of insulin (PMID: 9801136). In our case, it seems to be induced by the cytokines TNF and IFN-γ (this study and Ramalho et al., 2024). We have now performed experiments exposing monocytes to TNF and evaluating GLUT1 expression. The results indicate that monocytes from WT mice exposed to TNF (100 ng/mL) for 18 hours exhibited a significant increase in GLUT1 expression. This increase was comparable to the increased-GLUT1 phenotype observed in infected animals. The results of this experiment were included in the manuscript.

      Text was added to the discussion to clarify the issue of insulin dependence of GLUT1 expression (page 13, line 388):

      “GLUT1 expression is recognized as independent of insulin, in contrast to GLUT4 (PMID: 9801136). In our model, this regulation appears to be driven by pro-inflammatory cytokines, particularly TNF. Supporting this, our results show that in vitro stimulation with TNF significantly increases GLUT1 expression in monocytes, in agreement with the ex vivo phenotype observed in infected animals.”

      Alternative hypothesis which might explain their phenotypes

      Figure 2 A-H: The metabolic effects of the genetic manipulations including INOS KO, TNFR KO, and HIF-1α∆Lyz2 could be explained by lesser disease morbidity owed to a reduction of inflammatory response during infection. Under this condition, the development of anorexia will not be as profound in the knock-outs compared with wild-type littermate controls, since anorexia of infection is tightly linked to the magnitude of inflammatory response. Accordingly, infected knock-out animals can keep eating, which presumably impacts glycemia, maintenance of core body temperature, and overall energetics of infected mice. The authors should exclude this possibility.

      We considered this possibility, and the discussion now elaborates on this alternative hypothesis. We believe that these two mechanisms are not mutually exclusive (page 16, line 474):

      “Although restored physical activity, food consumption and energy expenditure in knockout mice may contribute to the observed systemic metabolic parameters by altering energy balance, these effects are not mutually exclusive with the TNF-driven, cell-intrinsic metabolic mechanisms described here.”

      Minor comments

      The authors showed increased parasitemia upon TNFR and HIF1a depletion in the LyZ2 compartment. The same was observed upon organismal INOS depletion. This raises the question of whether the TNF-HIF-INOS signaling axis is adaptive or maladaptive during Pcc infection. The authors should show host survival in mice lacking TNFR and HIF1a in the LyZ2 compartment, and in mice lacking INOS (presumably, they have these data).

      Despite the fact that the various knockout mice have increased parasitemia and signs of disease, they all survive the infection. This is now included in the Figure legends.

      Are the higher tissue glucose levels specific to the liver and the spleen or this is a more general event? Have the authors looked at other organs?

      We have now added the results of glucose uptake in the muscle and adipose tissues in Figure 2. The fact that glucose uptake is not increased in muscle and adipose tissue further suggests that the increased glucose uptake in this model is insulin independent.

      Figure 1F: All core body temperatures are within the physiological range, i.e., >36 degrees C. This makes it unclear why the authors regarded this as hypothermia. The authors should present experiments demonstrating the development of hypothermia in Figure 1F, as they claim this.

      Temperature changes in mice kept in animal houses have been an issue discussed in the field. It is clear, however, that early in the morning (end of the active period) mice enter torpor, with lower temperature and physical activity.

      In Figure 4, since the authors already suggested that extra-hepatic cells, and not the liver parenchyma, contribute to glucose uptake, the authors should clarify why they analyzed the whole liver in Figure 4, and not extra-hepatic cells. Furthermore, the authors should quantify the hepatic monocytic population in non-infected versus infected wild-type animals.

      The reason we used whole liver is that the number of non-parenchymal cells obtained from liver is limited for Western blot analysis. We thought it was important to show that expression of GLUT1 was decreased in the liver of TNFR KO mice. Nevertheless, the level of TNFR expression in different cell types in the liver was shown by flow cytometry. In addition, we performed the WB with cells extracted from the spleen, where lymphoid and myeloid cells are more abundant.

      Line 87: Phagocytizing parasitized what?

      This has been corrected in the manuscript.

      Line 111 Define RNI before being used.

      Is there a gender disparity in the TNFR KO phenotype? If yes, the authors should comment about this in their discussion.

      This has been defined and addressed in the manuscript.

      Line 192: Did the authors mean 3B??

      In 3M, please plot monocytes from uninfected animals.

      The plot of uninfected animals is now included in Figure 3M.

      Line 390 Remove the extra dash in HIF1a.

      Extra dash has been removed.

      Line 397 Define RA

      RA is now defined.

    1. eLife Assessment

      This study presents a valuable finding on how the GAP DLC1, a deactivator of the small GTPase RhoA, regulates RhoA activity globally as well as at Focal Adhesions. Using a new acute optogenetic system coupled to a RhoA activity biosensor, the authors present convincing evidence that DLC1 amplifies local Rho activity at Focal Adhesions. Thanks to modeling, they show that DLC1 is needed for a negative feedback loop that engages more RhoA deactivators upon RhoA activation, highlighting the complex regulation of RhoGTPases in space and time.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript of Heydasch et al. addresses the spatiotemporal regulation of Rho GTPase signaling in living cells and its coupling to the mechanical state of the cell. They focus on a GAP of RhoA, the Rho-specific GAP Deleted in Liver Cancer 1 (DLC1). They first show that removing DLC1 either by a CRISPR KO or by downregulation using siRNA leads to an increased contractility and globally elevated RhoA activity, as revealed by a FRET biosensor. This result was expected: since DLC1 deactivates RhoA, its absence should lead to increasing amounts of active RhoA. To go beyond global and steady levels of RhoA activity, the authors developed an acute optogenetic system to study transient RhoA activity dynamics in different genetic and subcellular contexts. In WT cells, they found that pulses of activation lead to an increased RhoA activity at focal adhesions (FA) compared to plasma membrane (PM), which suggests that FAs contain fewer RhoA GAPs, more RhoA, or that FAs involve positive feedbacks implicating other GEFs, for example. In DLC1 KO cells, they found that the RhoA response upon pulses of optogenetic activation was increased (higher peak) both at FA and PM, which could be expected since less GAP should increase the amount of active RhoA. But surprisingly, they also observed a higher rate of RhoA deactivation in DLC1 KO cells, which is counterintuitive: less GAP should result in a slower rate of deactivation. Less GAP should also lead to a lower rate of observed RhoA activation (smaller koff) and delayed peak. Using a modeling approach and control experiments (to monitor the optogenetic intrinsic dynamics), the authors propose that there is a negative feedback in WT cells between activated RhoA and the activity of its GAPs (other than DLC1). More active RhoA decreases GAP activity such that active RhoA relaxation to its basal state is relatively slow. This negative feedback would be absent in DLC1-deficient cells, explaining the relatively faster relaxation.

      This hypothesis is convincing given the data and the model, and it shows that there are compensatory mechanisms at play when DLC1 is knocked down. Further on, the authors study the dynamics of DLC1 on FAs depending on the mechanical state and nicely show a causal decrease of DLC1 enrichment at FA upon FA reinforcement, hereby probing a positive feedback where RhoA activation is further amplified as the force exerted at FA is increasing. Altogether, this work highlights the extremely fine regulation in space and time of RhoGTPases that is only revealed through acute perturbations, while at the cell scale and long time scale, complex compensatory mechanisms are at play, rendering knock-down or overexpression experiments not always straightforward to interpret (in the present case, knock-down of a deactivator leads to an increase of deactivation rate through the induced absence of other activity-dependent deactivators).

      Strengths:

      - Experiments are precise and well done.

      - Technically, the work brings original and interesting data. The use of transient optogenetic activation within focal adhesions together with a biosensor of activity is new and elegant.

      - The link between DLC1 and global contractility/RhoA activity is clear and convincing.

      - The surprising higher rate of RhoA deactivation in DLC1 KO cells is convincing, as well as the differences in the dynamics of RhoA between focal adhesions and plasma membrane.

      - The model is very helpful to support the hypothesis of the negative feedback loop.

      - The correlation between DLC1 enrichment and focal adhesion dynamics is very clear.

      Weaknesses:

      - The negative and positive feedback loops could have been dug more deeply molecularly (in particular discover what are the compensatory mechanisms at play), but this could be the purpose of future work.

      Comments on revised version:

      I thank the authors for the great improvement of their work and their detailed answers to my comments. The modeling work is great and really brings novelty to the story. It also helps a lot to have the data for the optoLARG recruitment. I suggest authors move to the Version of Record.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful reading of our manuscript and for the constructive and insightful feedback. In response, we performed several new experiments and analyses that significantly strengthen the study. First, we addressed the important question of optoLARG recruitment dynamics by generating a new cell line expressing optoLARG-mScarlet3 together with paxillin-miRFP, enabling us to directly quantify the dynamics of the optogenetic activator at focal adhesions and the plasma membrane. Second, we introduced a quantitative modeling framework to analyze RhoA activity dynamics during transient optogenetic stimulation. Using the measured optoLARG kinetics as input, we fitted activation and deactivation parameters for both WT and DLC1 KO cells, revealing a loss of negative feedback regulation in the KO condition. Together, these additions clarify the temporal relationships between optogenetic activation, RhoA signaling, and biosensor responses, and provide a more rigorous, mechanistic interpretation of our data. We rewrote large parts of the discussion section to reflect this new information.

      Below, we provide detailed, point-by-point responses to all reviewer comments.

      Recruitment dynamics optoLARG

      Reviewer #1:

      Public Review:

      For the optogenetic experiments, it is not clear if we are looking at the actual RhoA dynamics of the activity or at the dynamics of the optogenetic tool itself.

      Recommendations for the authors:

      For the transient optogenetic activations at FA and PM, it would be great to have one data set where the optoLARG is fused to a fluorescent protein, for example, mCherry, while FAs would be marked with paxillin-miRFP (by transient transfection to avoid making a new stable cell line). The dynamics of the optogenetic activator should be the same (on and off rates), but it can be possible that the activator is retained at FA for example. Such an experiment would help the understanding of the differential observed dynamics, where several timescales are involved: the dynamics of the opto tool, the dynamics of RhoA itself, and the dynamics of the biosensor.

      We agree with the reviewers: this is an essential control for this manuscript, and the cell line will be useful in future studies. We developed a new construct containing the recruitable SspB domain tagged in red (optoLARG-mScarlet3), compatible with the iLID system, and paxillin-miRFP to locate the focal adhesions. From previous experiments we know that the anchor part of the optoLARG system is distributed evenly across the cell membrane and is not affected by cytoskeletal structures like focal adhesions. As for the recruitable part of the optoLARG system, which translocates from the cytosol to the membrane upon blue light stimulation, we illuminated focal adhesion and non-focal adhesion regions and quantified optoLARG dynamics. The same scripts were used for automated stimulation and analysis as were used for the rGBD recruitment experiments. We illustrate these results in the new Suppl. Fig. S3. We found no significant difference in recruitment dynamics between focal adhesion and non-focal adhesion regions (Fig. S3B). We found that the optoLARG dynamics are fit well by an inverse exponential during recruitment under blue light stimulation and by an exponential decay after blue light stimulation (dissociation phase), consistent with the expected iLID dynamics (Fig. S3C). This experiment is described in detail at the end of the section "Optogenetic interrogation of the Rho GTPase flux in WT and DLC1 KO cells" (Lines 303-320). We then went on to use the optoLARG dynamics as input for the models describing RhoA activity dynamics (see next comment). This should help to untangle the measured RhoA dynamics from the dynamics of the optogenetic tool.
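      The two-phase optoLARG kinetics described in this response (inverse-exponential rise under illumination, exponential decay afterwards) can be written as a piecewise function. The following is only a minimal sketch: the function name, the parameter names (`k_rec`, `k_dis`, `amplitude`) and any numerical values are hypothetical and are not taken from the manuscript or its analysis scripts.

```python
import math


def optolarg_recruitment(t, t_on, t_off, k_rec, k_dis, amplitude=1.0):
    """Illustrative recruitment curve for a light-recruited construct.

    t      : time point (same units as t_on and t_off)
    t_on   : start of blue light stimulation
    t_off  : end of blue light stimulation
    k_rec  : hypothetical recruitment rate constant (1/time)
    k_dis  : hypothetical dissociation rate constant (1/time)
    """
    if t < t_on:
        # Before stimulation: no recruitment above baseline.
        return 0.0
    if t <= t_off:
        # During stimulation: inverse-exponential rise toward `amplitude`.
        return amplitude * (1.0 - math.exp(-k_rec * (t - t_on)))
    # After stimulation: exponential decay from the level reached at t_off.
    level_at_off = amplitude * (1.0 - math.exp(-k_rec * (t_off - t_on)))
    return level_at_off * math.exp(-k_dis * (t - t_off))
```

      Under these assumptions the curve is continuous at `t_off`, and the signal halves every `ln(2) / k_dis` time units after the light is switched off, which is the kind of quantity a fit to the dissociation phase would report.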

      Quantitative analysis RhoA activity dynamics

      Public Review:

      There is no model to analyze transient RhoA responses, however, the quantitative nature of the data calls for it. Even a simple model with linear activation-deactivation kinetics fitted on the data would be of benefit for the conclusions on the observed rates and absolute amounts.

      Recommendations for the authors:

      [...] for the transient optogenetic experiments, it would be great to make a simple model, or at least to fit the curves with an on rate, an off rate, and a peak value. This will clarify the conclusions drawn for the experiments. For example, the authors claim that they observe an increased Rho activation rate in DLC1 KO cells (see sections "Optogenetic interrogation of the Rho GTPase flux in WT and DLC1 KO cells" and "Discussion") but the rate is not well-defined. One can have two curves with the same activation rate but one that peaks higher (larger multiplicative prefactor) and it would resemble the presented data. This being said, the higher deactivation rate in DLC1 KO cells is evident from the data.

      We agree that a quantitative analysis and model would improve our understanding of the data. We fit the activation/deactivation kinetics and provide the values in the chapter "Optogenetic interrogation of the Rho GTPase flux in WT and DLC1 KO cells" (Lines 287-299). We then modeled the RhoA activity dynamics at focal adhesions and at the plasma membrane after transient optogenetic stimulation using a system of ODEs, using the new measurements of optoLARG kinetics as activation input. We find a close fit for the experimental data, with WT following classic Michaelis-Menten dynamics. Interestingly, when fitting the DLC1-KO data with the same model as for WT, the parameter modeling the negative feedback loop (active RhoA recruiting a GAP) is set to zero; in other words, the factor that deactivates RhoA is present at a constant concentration. We added an additional main Figure 5 describing the models and fits, and added a new Results section "Modeling indicates loss of negative RhoA autoregulation in DLC1-KO cells" (Lines 326-378), and also updated the Methods and Discussion section of the paper accordingly. We use the findings to more clearly ground the mathematical terms used to describe our results.
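      To make the structure of such a model concrete, the sketch below integrates a two-variable ODE system in which active RhoA recruits an additional deactivator. It is an assumption-laden toy, not the authors' model: it uses linear mass-action terms rather than Michaelis-Menten kinetics, explicit Euler integration, and hypothetical names and values throughout (`k_gef`, `k_gap0`, `k_fb`, `k_rel`, `light_input`). Setting `k_fb` to zero leaves only a constant deactivator, which mirrors the qualitative statement made above for the DLC1-KO fit.

```python
import math


def light_input(t, t_on=10.0, t_off=20.0, k_rec=0.5, k_dis=0.1):
    """Hypothetical activator input: rise under light, decay afterwards."""
    if t < t_on:
        return 0.0
    if t <= t_off:
        return 1.0 - math.exp(-k_rec * (t - t_on))
    level_at_off = 1.0 - math.exp(-k_rec * (t_off - t_on))
    return level_at_off * math.exp(-k_dis * (t - t_off))


def simulate(k_gef, k_gap0, k_fb, k_rel, t_end=120.0, dt=0.01):
    """Explicit-Euler integration of a toy RhoA activity model.

    rho : fraction of active RhoA (0..1)
    gap : extra deactivator recruited by active RhoA (feedback term)

    d(rho)/dt = k_gef * light_input(t) * (1 - rho) - (k_gap0 + gap) * rho
    d(gap)/dt = k_fb * rho - k_rel * gap
    """
    rho, gap = 0.0, 0.0
    trace = []
    n_steps = int(round(t_end / dt))
    for i in range(n_steps + 1):
        t = i * dt
        trace.append((t, rho, gap))
        d_rho = k_gef * light_input(t) * (1.0 - rho) - (k_gap0 + gap) * rho
        d_gap = k_fb * rho - k_rel * gap
        rho += dt * d_rho
        gap += dt * d_gap
    return trace


# Usage: compare a run with feedback to one with the feedback switched off.
with_feedback = simulate(k_gef=1.0, k_gap0=0.2, k_fb=0.5, k_rel=0.1)
without_feedback = simulate(k_gef=1.0, k_gap0=0.2, k_fb=0.0, k_rel=0.1)
```

      In practice one would fit the rate constants to the measured biosensor traces and use a proper ODE solver; the point of the sketch is only that a single parameter can switch the feedback term on or off while the rest of the model stays the same.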

      Error figure 6E

      Recommendations for the authors:

      The scheme presented in Figure 6E is not supported by the data and should be modified. In this scheme, the authors show a strongly delayed peak in control cells versus DLC1 KO cells, whereas in the data the peaks appear to be at similar time points. Similarly, the authors show a strongly decreased rate of activation, whereas the initial rates appear identical in the data.

      The delayed peak we illustrated is an error; we thank the reviewers for catching it. The decreased rate of deactivation and activation, although exaggerated in the scheme, is however present in the data (and is now quantified, see answer above). We updated the figure accordingly (now Fig. 7E in the manuscript).

      Clarification term "signaling flux"

      Recommendations for the authors:

      It would be nice to define more precisely several terms that are used throughout the manuscript. For example, could the authors define what they mean by "signaling flux"? Is it the temporal derivative of the Rho levels? Or the spatial derivative?

      We agree that this was not clear in the previous version of the manuscript. We refer to "signaling flux" as the continuous cycle of RhoA activation by GEFs and inactivation by GAPs, processes that persist even when bulk RhoA activity appears steady, as introduced by Miller & Bement (2009). We now explicitly define "signaling flux" in the abstract (Lines 20-24).

      See: Miller, Ann L., and William M. Bement. "Regulation of cytokinesis by Rho GTPase flux." Nature cell biology 11.1 (2009): 71-77. https://doi.org/10.1038/ncb1814

      Recommendations for the authors:

      Also (see above) it would be nice to define precisely what the rates are: the activation rate is in general the k_on of a reaction scheme, but it will differ from the observed rate given by a biosensor. For example, with a k_on and a k_off the observed rate toward the steady state will be given by the sum of the activation and deactivation rates. In the manuscript, the authors do not make the distinction between the activation rate and the rate of increase of the biosensor, which is confusing for the reader and for the interpretation of the data.

      We updated the results section to make this distinction clearer (Lines 288-300), and added a note explicitly highlighting the difference between biosensor signal dynamics and the underlying RhoA activation/deactivation rates (Lines 298-300). In addition, our newly introduced model helps disentangle the combined activation/deactivation rates into distinct GEF and GAP activity parameters.

      Improvements to figure 3

      Minor recommendation:

      In Figures 3 B and D, the stars (statistical differences) are not visible. It would be good to make them bigger or move them above the graphs.

      Thank you! We updated the graphics.

      Other changes

      Additional panel (Figure 5D) showing paxillin intensity does not change after weak optogenetic stimulation, to better illustrate the weak stimulation regime that does not trigger FA reinforcement (contrasting Figure 7). Additional small layout changes to Figure 5.

      Addition of authors that contributed to the revisions

    1. eLife Assessment

      This study represents an important advance in our understanding of how certain inhibitors affect the behavior of voltage gated potassium channels. Robust molecular dynamics simulation and analysis methods lead to a new proposed inhibition mechanism with strength of support being mostly convincing, though computational evidence is limited for some conformations discussed. This study has considerable significance for the fields of ion channel physiology and pharmacology and could aid in development of selective inhibitors for protein targets.

    2. Reviewer #1 (Public review):

      Summary:

      The authors were seeking to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, the authors sought to explain some of the functional differences that RY785 exhibits in electrophysiology experiments as compared to other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). The authors used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under the applied membrane voltage with and without RY785 or TEA present. They observed that while TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple non-polar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. They confirmed this mechanism using an additional set of simulations and used it to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The authors develop their own forcefield parameters for the RY785 molecule based on extensive QM based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single channel conductance. The authors have performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The authors conclude that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits K+ current. This conclusion is plausible given that RY785 makes stable contacts with multiple hydrophobic residues in the S6 helix, which they can also validate using a recently published closed-state Kv2.1 channel cryo-EM structure. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.

      Weaknesses:

      The authors, however, did not directly observe this semi-closed channel conformation and in fact acknowledge that more direct simulation evidence would require extensive enhanced-sampling simulations beyond the scope of this study. They have not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the authors quantified K+ permeation, they have not made any estimates of the ligand binding affinities or rates, which could have been potentially compared to experiment and used to validate their models.

      However, despite those relatively minor weaknesses, the conclusions of the study are convincing, and overall this is a solid study helping us to understand two distinct molecular mechanisms of the voltage-gated potassium channel Kv2.1 inhibition by TEA and RY785, respectively.

    3. Reviewer #2 (Public review):

      Summary

      In this manuscript, Zhang et al. investigate the conduction and inhibition mechanisms of the Kv2.1 channel, with a particular focus on the distinct effects of TEA and RY785 on Kv2 potassium channels. Using microsecond-scale molecular dynamics simulations, the authors characterize K⁺ ion permeation and RY785-mediated inhibition within the central pore. Their results reveal an inhibition mechanism that differs from those described for other Kv channel inhibitors.

      Strengths

      The study identifies a distinctive inhibitory mode for RY785, which binds along the channel walls in the open-state structure while still permitting a reduced level of K⁺ conduction. In addition, the authors propose a long-range allosteric coupling between RY785 binding in the central pore and changes in the structural dynamics of Kv2.1. Overall, this is a well-organized and carefully executed study, employing robust simulation and analysis methodologies. The work provides novel mechanistic insights into voltage-gated potassium channel inhibition and may offer useful guidance for future structure-based drug design efforts.

      Weaknesses:

      The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Increasing structural and functional evidence supports the existence of multiple ligand-binding modes in voltage-gated ion channels. For example, polyunsaturated fatty acids have been shown to bind to KCNQ1 at both the voltage sensor domain and the pore domain (https://doi.org/10.1085/jgp.202012850). Similarly, cannabidiol has been structurally resolved in Nav1.7 at two distinct sites, one in a fenestration and another near the IFM-binding pocket (https://doi.org/10.1038/s41467-023-39307-6). These advances illustrate that ligand effects cannot always be interpreted based solely on a single binding site identified previously.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study represents an important advance in our understanding of how certain inhibitors affect the behavior of voltage-gated potassium channels. Robust molecular dynamics simulation and analysis methods lead to a new proposed inhibition mechanism with strength of support being mostly convincing, and incomplete in some aspects. This study has considerable significance for the fields of ion channel physiology and pharmacology and could aid in development of selective inhibitors for protein targets.

      We are encouraged by this favorable assessment and thank editors and reviewers for their constructive feedback and recommendations. We trust that the revisions made to the manuscript will clarify the aspects that had been perceived to be incomplete.

      Reviewer #1 (Public review):

      Summary: 

      This study seeks to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, it sought to explain some of the functional differences that RY785 exhibits in electrophysiology experiments as compared to other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). This study used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under the applied membrane voltage with and without RY785 or TEA present. While TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple nonpolar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. This mechanism was confirmed using an additional set of simulations and used to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The study develops forcefield parameters for the RY785 molecule based on extensive QM-based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single-channel conductance. The study performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The conclusion is that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits the K+ current. This conclusion is plausible given that RY785 makes stable contact with multiple hydrophobic residues in the S6 helix. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.

      Weaknesses:

      The study, however, did not produce this semi-closed channel conformation and acknowledges that more direct simulation evidence would require extensive enhanced-sampling simulations. The study has not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the study quantified K+ permeation, it does not make any estimates of the ligand binding affinities or rates, which could have been potentially compared to the experiment and used to validate the models. 

      As stated in the original manuscript, we concur that the mechanism we propose remains hypothetical until further studies of the complete conformational cycle of the channel are conducted. The recently determined structure of a Kv2.1 channel in the closed state (Mandala and MacKinnon, PNAS 2025) presents an excellent opportunity to do so. Indeed, a cursory analysis of that structure shows that a Pro-Ile-Pro motif in helix S6 marks the position of the intracellular gate, where the pore domain constricts maximally (aside from the selectivity filter). As illustrated in Fig. 5, this motif is precisely where the benzimidazole and thiazole moieties of RY785 bind in our simulations. The mechanism we outline in Fig. 7 thus seems very plausible, in our view; that is, RY785 occludes the K+ permeation pathway before the pore domain reaches the closed conformation, explaining the observed electrophysiological effects (see Discussion). The Discussion has been revised to note the recent discovery of the aforementioned structure, its implications for the mechanism we propose, and the opportunities for further research that are now open.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Zhang et al. investigate the conductivity and inhibition mechanisms of the Kv2.1 channel, focusing on the distinct effects of TEA and RY785 on Kv2 potassium channels. The study employs microsecond-scale molecular dynamics simulations to characterize K+ ion permeation and compound binding inhibition in the central pore. 

      Strengths:

      The findings reveal a unique inhibition mechanism for RY785, which binds to the channel walls in the open structure while allowing reduced K+ flow. The study also proposes a long-range allosteric coupling between RY785 binding in the central pore and its effects on voltage-sensing domain dynamics. Overall, this well-organized paper presents a high-quality study with robust simulation and analysis methods, offering novel insights into voltage-gated ion channel inhibition that could prove valuable for future drug design efforts.

      Weaknesses:

      (1) The study neglects to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, there is potential for allosteric binding sites in the voltage-sensing domain (VSD), as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019).

      As noted in the manuscript, we designed our simulations to explore the possibility that RY785 binds within the pore domain, because TEA and RY785 are competitive and TEA is known to bind within the pore. That RY785 did in fact spontaneously and reproducibly bind within the pore was however not a predetermined outcome; if the site of interaction for the inhibitor was elsewhere in the channel, the simulation would not have shown a stable associated state, which would have prompted us to examine other possible sites, including the voltage sensors. It was also not predetermined or foreseeable a priori that the mode of interaction we observed in simulation provides a straightforward rationale for the electrophysiological effects of RY785. Based on our results, therefore, we believe that RY785 binds within the pore of Kv2. As stated by the reviewer, other allosteric modulators are known to bind instead to the sensors; to our knowledge, however, there is no precedent of a small-molecule inhibitor that simultaneously acts on the sensors and the pore domain. We therefore believe that future studies should focus on corroborating or refuting the mechanism we propose, through additional experimental and computational work; if, contrary to our claim, RY785 is found not to bind to the pore domain, it would be logical to explore other possible sites of interaction, as the reviewer suggests. The Discussion has been modified to address this point.

      (2) The study describes RY785 as a selective inhibitor of Kv2 channels and characterizes its binding residues through MD simulations. However, it is not clear whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      To clarify this question, we have included a multiple sequence alignment as Supplementary Figure 1; the revised manuscript refers to this figure in the Discussion section. The alignment reveals that the cluster of residues forming contacts with RY785 (Val409, Pro406, Ile405, Ile401, and Val398) is indeed specific to Kv2.1. Among Kv channels, Kv3.1 and Kv4.1 exhibit the greatest similarity to Kv2.1 at these positions, but they differ in a crucial substitution: Ile405 in Kv2.1 is replaced by Val. This replacement shortens the sidechain, undoubtedly reducing the magnitude of the hydrophobic interaction between inhibitor and channel (Val is approximately 6 kcal/mol, i.e. 1,000 times, more hydrophilic than Ile). Kv5.1 differs from Kv2.1 at two positions: Pro406 is replaced by His, and Val409 by Ile. The introduction of His abolishes the hydrophobic interaction at that position, and the need for hydration likely perturbs all adjacent contacts with RY785. Lastly, Kv6-Kv10 and Cav channels feature entirely different residues at these positions. Consistent with these findings, a recent study by the Sack lab (https://elifesciences.org/articles/99410) has demonstrated that Kv5, Kv6, Kv8, and Kv9 pore subunits confer resistance to RY785, while a high-throughput electrophysiological study carried out by Merck (Herrington et al., 2011) reported that RY785 shows no significant activity against Cav channels. The sequence alignment offers a simple interpretation for these experimental observations, namely that RY785 is recognized by Kv2 channels through the abovementioned hydrophobic cluster within the pore domain.

      (3) The study does not clarify the details, rationale, and ramifications of a biasing potential to dihedral angles.

      We refer the reviewer to published work, for example Stix et al, 2023 and Tan et al, 2022. We provide additional comments below.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing, yet it was not revealed whether polar groups of RY785 always interact with K+ ions.

      We detected no persistent specific interactions between RY785 and the permeant K+ ions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript describes atomistic molecular dynamics (MD) simulations of a voltage-gated potassium channel Kv2.1 using its cryo-EM structure in the open activated state and its inhibition by a classical non-specific cationic blocker tetraethylammonium (TEA) as well as a novel selective inhibitor RY785. Using multi-microsecond-long all-atom MD runs under the applied membrane voltage of 100 mV the authors were able to confirm that the channel structure represents an open conducting state with the computed single-channel conductance lower than experimental values, but still in the same order of magnitude range. They also determined that both TEA and RY785 bind in the channel pore between the cytoplasmic hydrophobic gate and narrow selectivity filter (SF) region near the extracellular side. However, while TEA directly blocks a knock-on K+ conduction by physically obstructing ion access to the SF, the mechanism of action of RY785 is different. It does not directly prevent K+ access to the SF but rather binds to multiple residues in the hydrophobic gate region, which effectively narrows a pore and drives the channel toward a semi-closed nonconductive conformation, which might be distinct from one with the deactivated voltage sensors and closed pore observed at hyperpolarized membrane potentials. However, additional studies beyond the scope of this work might be needed to fully establish this mechanism as suggested by the authors.

      The manuscript is written very well and represents a significant advance in the field of ion channel research. I do not have any major issues, which need to be addressed. However, I have several suggestions.

      For the apo-channel K+ conduction MD simulation under the applied voltage, the authors seem to observe mostly a direct or Coulomb knock-on mechanism across the SF with almost no water copermeation. This is in line with computational electrophysiology studies with dual membrane setup by B. de Groot and others but in disagreement with multiple previous studies by B. Roux and others also using applied electric field and CHARMM force fields as in the present study. I wonder why the outcomes are so different. Is it related to the Kv2.1 channel itself, a relatively small applied electric field used (corresponding to a membrane potential of 100 mV vs. 500-750 mV used in many previous simulations), ion force field (e.g., LJ parameters), or some other factors? Could weak dihedral restraints on the protein backbone and side chains contribute to this mechanism? I also wonder if the authors might have considered different initial SF ion configurations. Related to that, I wonder if the authors observed any SF distortions in their simulations including frequently observed backbone carbonyl flipping and/or dilation/contraction.

      We are aware of these discrepancies between published simulation studies, but cannot offer a satisfactory explanation, beyond speculation. The reviewer is correct that the mechanism of ion permeation we observe is comparable to that reported by de Groot, as we noted in Tan et al, 2022 and Stix et al, 2023. Neither in this nor in those previous studies did we observe any persistent distortions of the selectivity filter – but that outcome was expected by construction. The weak biasing potentials acting on the mainchain dihedral angles allow for local fluctuations but not a persistent deformation, relative to the conductive form determined experimentally.

      For MD simulations with the ligand present, I wonder if the authors can comment on the effect of the ligand especially RY785 on the pore size or more importantly size of the hydrophobic gate. The presence of the ligand itself would definitely result in a narrower pore, but I also wonder if this would also lead to a rearrangement of pore sidechain and/or backbone residues, which would lead to a narrower pore from a protein itself thus confirming the proposed mechanism of driving the channel towards a semi-closed state. It is easy to compute but I wonder if the presence of weak dihedral restraints may preclude this analysis.

      Yes, while the simulation design used in this study allows for local fluctuations in the mainchain structure and nearly unrestricted sidechain dynamics, changes in either the secondary or tertiary structure of the channel are strongly disfavored. This approach is thus sufficient to examine ligand binding or ion flow in the microsecond timescale but not channel gating. In the revised version of the Discussion, we outline a roadmap for future computational studies of that gating process, on the basis of the open-channel structure we used and the recently determined structure of the closed state.

      The authors state that RY785 does not block K+ ion, but it does significantly slow the rate of K+ ion access to the pore Scav site. Is this not a part of the mechanism for inhibition of the channel? The authors seem to focus on the primary mechanism of inhibition as the RY785 promoting channel closing, but would it not also reduce K+ current in the open state by slowing the rate of K+ entry into the cavity and selectivity filter? The authors should address this point in the text. I am also somewhat confused that in the MD simulations performed by the authors, there is still some K+ conduction with RY785 in the pore, which is not in 100% agreement with electrophysiology experiments. Does it mean that the channel in the simulations has not yet reached that semi-closed state or a reduced K+ conduction is not observed experimentally?

      The salient experimental observation is that RY785 abrogates K+ currents through Kv2 channels (Herrington et al, 2011; Marquis et al, 2022). In our view, that observation can be explained in one of two ways: either RY785 completely blocks the flow of K+ ions across the channel while the pore domain remains in the conductive, open state – like TEA does – or RY785 induces or facilitates the closing of the channel, thereby abrogating K+ flow. The fact that we observe K+ flow while RY785 is bound to the channel is therefore not in disagreement with the electrophysiological measurements, but it does rule out the first of those two possible interpretations of the existing experiments. As it happens, the second possible explanation, i.e. that RY785 facilitates the closing of the pore domain, also provides a rationale for another puzzling experimental observation, namely that RY785 shifts the voltage dependence of the currents produced by the voltage sensors as they reconfigure to open or close the intracellular gate.

      Also, I wonder if the authors considered that since there are 4 potential equivalent sites in the pore (although, overlapping) more than one RY785 might be needed to prevent K+ conduction, even though the experimental Hill coefficient of ~1 does not indicate cooperativity.

      Admittedly, our simulation design was based on the premise that only one RY785 molecule might be recognized within the pore. Based on the outcome of the simulations, we are confident that this assumption was valid, as the binding pose that we identified rules out multiple occupancy – which would be indeed consistent with a Hill coefficient of ~1.
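
      As a hedged illustration of what a Hill coefficient of ~1 implies for a dose-response curve, the sketch below evaluates the standard Hill equation for fractional inhibition. The IC50 value and the function names are hypothetical placeholders chosen for illustration; they are not taken from the manuscript or from the cited experiments.

```python
def hill_inhibition(conc, ic50, n=1.0):
    """Fraction of current inhibited at ligand concentration `conc` (Hill equation)."""
    if conc < 0 or ic50 <= 0 or n <= 0:
        raise ValueError("conc must be >= 0; ic50 and n must be > 0")
    return conc**n / (ic50**n + conc**n)


def conc_for_fraction(fraction, ic50, n=1.0):
    """Invert the Hill equation: concentration that yields the requested inhibition."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return ic50 * (fraction / (1.0 - fraction)) ** (1.0 / n)


IC50 = 1.0  # hypothetical value, arbitrary concentration units

# Concentration range needed to go from 10% to 90% inhibition.
span_n1 = conc_for_fraction(0.9, IC50, n=1.0) / conc_for_fraction(0.1, IC50, n=1.0)
span_n2 = conc_for_fraction(0.9, IC50, n=2.0) / conc_for_fraction(0.1, IC50, n=2.0)
print(f"10%-90% span: {span_n1:.0f}-fold for n=1, {span_n2:.0f}-fold for n=2")
```

      With n = 1 the 10% to 90% transition spans an 81-fold concentration range, whereas n = 2 would compress it to 9-fold; a shallow curve of the first kind is what a single, non-cooperative binding event would be expected to produce.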

      I also wonder if the authors considered estimating ligand binding affinities and/or "on" rates from their simulations to have a more direct comparison with experiments and test the accuracy of their models. There are multiple enhanced sampling techniques allowing to do that, although it can be a study on its own.

      We thank the reviewer for this suggestion, which we will consider for future studies.

      The authors also discussed that they could not study Kv2.1 deactivation in a reasonable simulation time. Indeed it is very challenging, but they should cite previous studies, e.g., the 2012 Jensen et al. paper (PMID: 22499946), on this subject. There are structures of Kv channels with the deactivated voltage-sensing domains (VSDs) available, e.g., of the EAG1 channel (PDB 8EP1), although they do not have a domain-swapped architecture. There are structural modeling approaches including AlphaFold, which can be potentially used to get a Kv2.1 structure with deactivated VSDs, and targeted MD, string method etc. can be used to study transition between different states with and without bound ligands.

      As noted, a structure of a Kv2 channel with a closed pore has now been determined experimentally. In the revised Discussion, we comment on what this structure tells us about the mechanism of inhibition we propose, and how it could be leveraged in future studies.

      The authors should be commended for doing a thorough QM-based force field parameterization of RY785. However, a validation of the developed force field parameters is lacking. In terms of QM validation, a gas-phase dipole moment can be compared in terms of direction and magnitude (it's normal to be overestimated to implicitly reflect solvent-induced polarization). If there are any experimental data available for this compound, they can be tested as well.

      We agree with the reviewer that forcefield validation is important, but to our knowledge no experimental data exist for RY785 to compare with, such as hydration free energies. We did however compare the gas-phase dipole moment computed with QM and with the MM forcefield we developed based on atomic charges optimized to reproduce QM interactions with water. The MM model yields a gas-phase dipole moment of 3.94 D, which is 20% greater than the QM dipole moment, or 3.23 D. That deviation is within the typical range for electroneutral molecules (Vanommeslaeghe et al, 2010), and as the reviewer notes, reflects the solvent-induced polarization implicit in the derivation of atomic charges. As shown in Author response image 1, the orientation of the dipole moment calculated with MM (right, blue arrow) is also in good agreement with that predicted with QM (left).
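
      The kind of magnitude-and-direction comparison described here can be sketched as follows. This is a minimal illustration for a set of fixed point charges, not the procedure actually used for RY785; the two-atom toy geometry, the charges, and the QM reference vector are hypothetical.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom expressed in Debye


def dipole_vector(charges, coords):
    """Dipole (Debye) of point charges (in e) located at coords (in Angstrom).

    The result is origin-independent only if the total charge is zero.
    """
    return tuple(
        E_ANGSTROM_TO_DEBYE * sum(q * xyz[axis] for q, xyz in zip(charges, coords))
        for axis in range(3)
    )


def magnitude(vec):
    return math.sqrt(sum(c * c for c in vec))


def angle_between_deg(u, v):
    """Angle between two vectors, in degrees."""
    cos_theta = sum(a * b for a, b in zip(u, v)) / (magnitude(u) * magnitude(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def percent_excess(mm_value, qm_value):
    """How much larger the MM value is than the QM value, in percent."""
    return 100.0 * (mm_value - qm_value) / qm_value


# Hypothetical neutral two-site model (charges in e, positions in Angstrom).
mm_dipole = dipole_vector([0.4, -0.4], [(1.0, 0.0, 0.0), (-1.0, 0.2, 0.0)])
qm_dipole = (3.2, 0.0, 0.0)  # hypothetical QM reference vector, in Debye

print(f"|mu_MM| = {magnitude(mm_dipole):.2f} D")
print(f"excess over QM = {percent_excess(magnitude(mm_dipole), magnitude(qm_dipole)):.0f}%")
print(f"angle to QM dipole = {angle_between_deg(mm_dipole, qm_dipole):.1f} deg")
```

      Reporting both the percent excess and the angle separates the expected overestimation of the magnitude from any error in orientation.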

      Author response image 1.

      (1) p. 3 "the last two helices in each subunit" -> "the last two transmembrane helices in each subunit".

      Thanks. Corrected.

      (2) p. 5 "and therefore do not cause large density variations e.g. 100-fold or greater.". I would be more specific here and indicate what are the actual variations in density or free energy encountered and how they are compared e.g. with thermal fluctuations (~kT).

      Thanks. The exact variations in K+ density had been included in the original manuscript, in Fig. 2C, but we failed to refer to this figure at this point in the description of the results. The ion density is plotted in a log scale to facilitate conversion to free-energy units. Corrected.
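
      The conversion from a log-scale density to free-energy units follows from the Boltzmann relation, ΔG = -kBT ln(ρ/ρref). A minimal sketch is given below; the temperature of 298.15 K and the choice of reference density are assumptions made for illustration, not values taken from the manuscript.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def free_energy_kT(density, reference_density):
    """Free energy relative to the reference, in units of kB*T."""
    if density <= 0 or reference_density <= 0:
        raise ValueError("densities must be positive")
    return -math.log(density / reference_density)


def free_energy_kcal(density, reference_density, temperature_k=298.15):
    """Same quantity in kcal/mol; the default temperature is an assumption."""
    return free_energy_kT(density, reference_density) * KB_KCAL_PER_MOL_K * temperature_k


# A site whose density is 100-fold lower than the reference:
dg_kt = free_energy_kT(1.0, 100.0)
dg_kcal = free_energy_kcal(1.0, 100.0)
print(f"{dg_kt:.2f} kBT = {dg_kcal:.2f} kcal/mol")
```

      On this scale a 100-fold density variation corresponds to about 4.6 kBT, or roughly 2.7 kcal/mol near room temperature, which gives a sense of how such a variation compares with thermal fluctuations.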

      (3) p. 6 Figure 1 caption "and along the perpendicular to the membrane" -> "perpendicular to the membrane normal"?. "The channel is an assembly of four distinct subunits (in colors);" -> "The channel is an assembly of four identical subunits (distinct by colors);". I would use the same protein coloring method in panels B and C as was used in panel A.

      Thanks. Corrected as needed.

      (4) p. 6 Figure 2 In panel B I would appreciate a representative complete ion permeation event trace. In panel C caption I would indicate corresponding sites "S0-S4, Scav" for each residue mentioned. I also would not use gray color for site names in the figure.

      We appreciate the suggestion, but believe the figure is clear as is. Panel B is meant to focus on the mechanism of knock-on. Panel A includes numerous complete permeation events.

      (5) p. 7 Figure 3 caption. Please indicate which atoms of residues T373 and P406 were used to define SF and gate positions. Chemical structures of both TEA and RY785 would be useful. In panels C and F channel interacting residues (if any) would be helpful to show.

      The revised caption clarifies that the positions of T373 and P406 are represented by their carbon-alpha atoms. A close-up view of the structures of TEA and RY785 is included in the Supplementary Information section.

      (6) p. 8. Figure 4 caption. Please indicate if N atoms were used for density maps in panels B and C, and which value of the density was used to show meshes. In panel A please indicate what are the units of the density shown by color maps.

      The caption has been revised to clarify these questions.

      (7) p. 9 "inside the protein" -> "inside the channel pore".

      Thanks. Corrected.

      (8) p. 10 "which lines the cavity" -> "which lines the water-filled cavity"

      We appreciate the suggestion but believe the wording is clear as is.

      (9) p.10 Fig. 5. It would be helpful to distinguish residues from different chains e.g. by different colors rather than using different colors for different residues. The S atom in RY785 is hard to recognize due to the yellow color used for C atoms. Figure 5B is very confusing. It is not clear what this plot represents. For instance, what does it mean that Pro405 has ~10 contacts in 20% of simulation snapshots? Does it mean 10 C..C/S interactions within 4.5 A? I am not sure what the value of this is. I think a bar or radar chart plot showing % of contacts with one, two, or more residues of each type would be more helpful. 

      Thanks. The revised caption ought to clarify how to interpret the plot.
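
      One possible reading of such a plot, raised in the reviewer's question, is a per-snapshot count of atom pairs within a distance cutoff. The sketch below illustrates that interpretation only; the 4.5 Å cutoff is taken from the reviewer's question, while the coordinates and function names are hypothetical and the authors' actual contact definition may differ.

```python
import math
from collections import Counter


def count_contacts(ligand_atoms, residue_atoms, cutoff=4.5):
    """Number of ligand-residue atom pairs closer than `cutoff` (Angstrom)."""
    return sum(
        1
        for a in ligand_atoms
        for b in residue_atoms
        if math.dist(a, b) <= cutoff
    )


def contact_distribution(frames, cutoff=4.5):
    """Map each contact count to the fraction of snapshots in which it occurs.

    `frames` is a list of (ligand_atoms, residue_atoms) coordinate tuples,
    one entry per simulation snapshot.
    """
    counts = Counter(count_contacts(lig, res, cutoff) for lig, res in frames)
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}


# Hypothetical two-snapshot trajectory with a two-atom ligand.
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [
    (ligand, [(3.0, 0.0, 0.0)]),   # residue atom near the ligand
    (ligand, [(10.0, 0.0, 0.0)]),  # residue atom far from the ligand
]
print(contact_distribution(frames))
```

      Under this definition, a statement such as "10 contacts in 20% of snapshots" would mean that ten atom pairs fall within the cutoff in one fifth of the frames analyzed.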

      (10) p. 12 "Due to its 2-fold molecular symmetry". TEA has a tetrahedral point group or Td symmetry. It has several two-fold rotational axes though. 

      Thanks. Corrected.

      (11) p. 12 "it prevents K+ ions in the cytoplasmic space from destabilizing the K+ ions that reside in the selectivity filter" I am not sure if this statement is entirely accurate as there might be destabilization of a multi-ion SF configuration not ions per se.

      We believe this statement is clear as is.

      (12) p. 13 Fig. 7 caption "includes non-conductive or transiently inactivated states" - I am not sure what "transiently inactivated state" is as inactivation is a specific term used in ion channel research and it does not seem to be explicitly considered in this study.

      A reference has been included in the caption for readers interested in the process of inactivation.

      (13) p. 14 "the net charge of these constructs is thus zero". This would depend on the number of basic and acidic residues in the protein. 

      Yes, it does – and as a result the construct we model has a net zero charge.

      (14) p. 14 I wonder if the protein was constrained or heavily restrained during MARTINI membrane building and equilibration procedure. Otherwise, C-alpha mapping would be problematic and clashes with lipid membrane atoms might take place as well.

      It was indeed. When a protein is simulated using the MARTINI coarse-grained forcefield, its fold must be preserved through a network of strong ‘virtual’ bonds between adjacent carbon-alpha atoms. This is standard practice so we do not believe it requires further explanation.

      (15) p. 15 PME - please spell out and provide reference.

      Corrected.

      (16) p. 15 "with a smooth switching function" - is it a special or standard switching function? Also, was it used for energy or forces? 

      The switching function brings both forces and energies to a value of zero at the cut-off value, smoothly. We refer the reviewer to the NAMD manual for further details.
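
      For readers unfamiliar with the idea, the sketch below shows a switching function of the form commonly used with CHARMM-style force fields, which scales the interaction from 1 to 0 between a switching distance and the cut-off. Whether this is the exact function used in the study is an assumption, and the 10 Å and 12 Å distances are hypothetical.

```python
def switching_factor(r, r_switch, r_cutoff):
    """Scale factor applied to the pair energy at distance r.

    Equals 1 up to r_switch, 0 beyond r_cutoff, and varies smoothly in between.
    Its derivative vanishes at both ends, so the force is continuous too.
    """
    if not 0.0 <= r_switch < r_cutoff:
        raise ValueError("require 0 <= r_switch < r_cutoff")
    if r <= r_switch:
        return 1.0
    if r >= r_cutoff:
        return 0.0
    r2, s2, c2 = r * r, r_switch * r_switch, r_cutoff * r_cutoff
    return (c2 - r2) ** 2 * (c2 + 2.0 * r2 - 3.0 * s2) / (c2 - s2) ** 3


R_SWITCH, R_CUTOFF = 10.0, 12.0  # hypothetical distances in Angstrom

for r in (9.0, 10.0, 11.0, 11.9, 12.0):
    print(f"r = {r:4.1f} A  ->  factor = {switching_factor(r, R_SWITCH, R_CUTOFF):.4f}")
```

      Because the factor carries a squared term in (r_cutoff² - r²), both the switched energy and its derivative reach zero at the cut-off, which avoids a discontinuity in the force.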

      (17) p. 15 '𝑘 = 1 𝑘B𝑇.' Please confirm that there is a factor of "1" there, which can be actually skipped if this is the case. 

      The value of k = 1 kBT is correct.

      (18) p. 15. Please cite PMID: 22001851 for the transmembrane electric field application technique.

      Corrected.
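
      As a hedged illustration of the constant-field approach, a membrane potential V is typically imposed by applying a uniform field E = V / Lz along the membrane normal, where Lz is the length of the periodic box in that direction. The sketch below performs the unit conversion to kcal/(mol Å e), a unit convention commonly used by MD engines; the 100 Å box length is hypothetical and the sign convention is left to the reader.

```python
KCAL_PER_MOL_PER_EV = 23.0605  # 1 eV per particle expressed in kcal/mol


def field_for_voltage(voltage_mv, box_length_angstrom):
    """Uniform field, in kcal/(mol Angstrom e), giving a drop of `voltage_mv`
    across a periodic box of length `box_length_angstrom`."""
    if box_length_angstrom <= 0:
        raise ValueError("box length must be positive")
    return (voltage_mv / 1000.0) * KCAL_PER_MOL_PER_EV / box_length_angstrom


BOX_LENGTH_Z = 100.0  # hypothetical box length along the membrane normal, in Angstrom

field = field_for_voltage(100.0, BOX_LENGTH_Z)
print(f"E = {field:.5f} kcal/(mol A e) for 100 mV across {BOX_LENGTH_Z:.0f} A")
```

      Note that the field depends on the full box length rather than the membrane thickness, so the same voltage requires a different field value whenever the box dimension changes.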

      (19) p. 15 "and CHARMM36m" -> "and CHARMM36m force field". 

      Corrected.

      (20) p. 16 "the four proteins subunits" -> "the four protein subunits". 

      Corrected.

      (21) p. 16. Please provide the reference for CGenFF. It's reference 49. 

      Corrected.

      Supporting Information (SI): CGenFF is misspelled in multiple figure captions in the SI. All potential energy scans indicate "angle", but some are bond angles while others are dihedral angles. Using subscripts for atom numbers is confusing and does not match the numbering scheme used in Fig. S1. So, please use the same style of numbering throughout, e.g. C46-C42-N43 (without subscripts). Please label the X and Y axes in Figures S2-S19 and S21. In Figure S22 please perform a linear regression analysis and/or compute Pearson correlation coefficients and indicate trend lines. Table S1. It would be good to compute RMS or mean unsigned errors to get an idea about accuracy. Also, please indicate if reference QM values were scaled by 1.16 for energies or offset for distances.

      The Supplementary Information has been corrected. We thank the reviewer for their detailed feedback. 
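
      The accuracy metrics requested by the reviewer can be sketched as follows. This is a minimal, dependency-free illustration; the paired QM and MM values are hypothetical and do not come from the Supplementary Information.

```python
import math


def rmse(reference, model):
    """Root-mean-square error between paired values."""
    return math.sqrt(sum((m - r) ** 2 for r, m in zip(reference, model)) / len(reference))


def mean_unsigned_error(reference, model):
    """Mean absolute deviation between paired values."""
    return sum(abs(m - r) for r, m in zip(reference, model)) / len(reference)


def linear_fit(x, y):
    """Least-squares slope, intercept, and Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Hypothetical interaction energies in kcal/mol (QM reference vs MM model).
qm_energies = [-5.1, -3.8, -6.4, -2.2, -4.7]
mm_energies = [-5.4, -3.5, -6.1, -2.6, -4.9]

slope, intercept, r = linear_fit(qm_energies, mm_energies)
print(f"RMSE = {rmse(qm_energies, mm_energies):.2f} kcal/mol")
print(f"MUE  = {mean_unsigned_error(qm_energies, mm_energies):.2f} kcal/mol")
print(f"fit: slope = {slope:.2f}, intercept = {intercept:.2f}, Pearson r = {r:.3f}")
```

      Reporting an error metric alongside the correlation coefficient is useful because a high correlation alone does not rule out a systematic offset or scaling between the two data sets.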

      Reviewer #3 (Recommendations for the authors):

      (1) The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Molecular docking and/or MD simulations could quickly test this hypothesis. If this hypothesis is not true, a comprehensive search can exclude such a possibility, which can also confirm the long-range allosteric coupling between RY785 binding in the central pore and voltage-sensing domain dynamics. 

      Please see our response above.

      (2) The authors describe RY785 as a selective inhibitor of Kv2 channels and characterize its binding residues through MD simulations. To support this claim, Figure 5 needs to include a multiple sequence alignment with other Kv channels. This would help demonstrate whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      Please see our response above.

      (3) The study applies a biasing potential to 𝜙, 𝜓, and 𝜒1 dihedral angles. Please clarify:

      (a) Is this potential solely to prevent selectivity filter collapse/degradation, as mentioned in a previous D. E. Shaw Research publication (Jensen et al., 2012)?

      Yes, that is correct.

      (b) If it applies to all amino acids, can this potential prevent other changes, such as in the voltage-sensing domain?

      Yes, that is correct.

      (c) What specific "large-scale structural changes" does this potential preclude? 

      For example, it would preclude the spontaneous degradation of the secondary or tertiary structure of the protein. We have revised the Methods section to make these points clearer. 

      (d) Given that such biasing potentials on backbone dihedral angles can decrease conformational flexibility, and considering that Kv channel permeability/conductivity could be highly sensitive to filter flexibility, what insights can you provide about the impact of the force constant k on channel conductivity?

      In previous studies based on an identical methodology (Stix et al, 2023; Tan et al, 2022), we have observed good agreement between calculated and experimental conductance values – at least as good as can be hoped for, when all approximations are considered. Based on the data presented in those studies, we have no reason to believe our methodology inhibits the permeability of the channel, which is logical as the local structural fluctuations required for K+ flow across the selectivity filter are not impaired, by definition. To the contrary, the fact that these weak biasing potentials make the conductive form of the filter the most favorable state in simulation enables a clear-cut analysis of conductance under plausible simulation conditions, both in terms of applied voltage and K+ concentration. We refer the reviewer to the abovementioned studies for further details and a discussion of this subject.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing. Given the compact nature of the central cavity when RY785 is bound, it would be valuable to investigate whether polar groups of RY785 (e.g., nitrogens from the amide, benzimidazole, and thiazole moieties) always interact with K+ ions. Characterizing these interactions could inform the design of similar compounds with differential modulation effects.

      We examined this possibility and detected no convincing interaction patterns between RY785 and K+ ions – logically, inhibitor and ions are in close proximity while residing concurrently within the pore, but we detected no evidence of specific interactions.

      Minor points:

      It is strongly recommended that the refined force field parameters for RY785 be shared as a separate supplementary file in CHARMM force field format. This addition would be valuable for the scientific community, allowing other researchers to use or compare these parameters in future studies.

      We agree entirely. Upon publication of the VOR for this article the forcefield parameters for RY785 will be made freely available for download at https://github.com/Faraldo-Gomez-Lab-atNIH/Download.

      The study uses a KCl concentration of 300 mM, which exceeds typical intracellular K+ levels. While this may be intentional to enhance K+ permeation probability, a brief justification for this choice should be included in the Methods section.

      Yes, what motivated this choice in this and in our previous studies of K+ channels was the expectation of a greater number of permeation events, for a given simulation length, and therefore greater confidence (i.e. statistical significance) in the observed ion conductance, or in the degree to which it might be inhibited by a blocker. It is worth noting that 300 mM KCl, while atypical in the intracellular environment, is often used in electrophysiological studies. The Methods section has been amended to clarify this point.

    1. eLife Assessment

      Using a transposon sequencing (TN-seq) approach, the authors identified key genetic determinants of drug tolerance in Mycobacterium abscessus. Given that M. abscessus is inherently resistant to multiple antibiotics, this valuable study makes a significant contribution by uncovering how antibiotic tolerance is linked to reactive oxygen species (ROS) in this non-tuberculous mycobacterial (NTM) species. The solid findings further strengthen the growing evidence that ROS play a central role in the mechanism of antibiotic action and tolerance in mycobacteria. However, the use of the words persistence and tolerance should follow the consensus definition given in the Balaban 2019 Nat Rev Micro paper.

    2. Reviewer #2 (Public review):

      Summary:

      The work set out to better understand the phenomenon of antibiotic persistence in mycobacteria. Three new observations are made using the pathogenic Mycobacterium abscessus as an experimental system: phenotypic tolerance involves suppression of ROS, protein synthesis inhibitors can be lethal for this bacterium, and levofloxacin lethality is unaffected by deletion of catalase, suggesting that this quinolone does not kill via ROS.

      Strengths:

      The ROS experiments are supported in three ways: measurement of ROS by a fluorescent probe, deletion of catalase increases lethality of selected antibiotics, and a hypoxia model suppresses antibiotic lethality. A variety of antibiotics are examined, and transposon mutagenesis identifies several genes involved in phenotypic tolerance, including one that encodes catalase. The methods are adequate for making these statements.

      Overall impact:

      Showing that ROS accumulation is suppressed during phenotypic tolerance, while expected, adds to the examples of the protective effects of low ROS levels. Moreover, the work, along with a few others, extends the idea of antibiotic involvement with ROS to mycobacteria. These observations help solidify the field. The work raises an important unanswered question: why are rifampicin and many protein synthesis inhibitors bacteriostatic with E. coli but bactericidal with pathogenic mycobacteria?

      Comments on revisions:

      I call attention to word choice, because it can indicate how familiar the authors are with the field. An issue that caught my attention was the use of the words persistence and tolerance, because they are not uniformly used in the generally accepted way (see Balaban 2019 Nat Rev Micro). In this consensus statement persistence refers specifically to a subpopulation and as such has survival kinetics that are distinct from those seen with tolerance, a phenomenon that refers to the entire population. I notice that the Balaban paper is not in the reference list. My suggestion is to take a look at the Balaban paper and then examine every use of the words tolerance and persistence in the manuscript to be sure that they fit the Balaban definition.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Persistence is a phenomenon by which genetically susceptible cells are able to survive exposure to high concentrations of antibiotics. This is especially a major problem when treating infections caused by slow growing mycobacteria such as M. tuberculosis and M. abscessus. Studies on the mechanisms adopted by the persisting bacteria to survive and evade antibiotic killing can potentially lead to faster and more effective treatment strategies.

      To address this, in this study, the authors have used a transposon mutagenesis based sequencing approach to identify the genetic determinants of antibiotic persistence in M. abscessus. To enrich for persisters, they employed a condition that has been reported previously to increase persister frequency (nutrient starvation) to facilitate genetic screening for this phenotype. An M. abscessus transposon library was grown in nutrient rich or nutrient depleted conditions and exposed to TIG/LZD for 6 days, following which Tnseq was carried out to identify genes involved in spontaneous (nutrient rich) or starvation-induced conditions. About 60% of the persistence hits were required in both the conditions. Pathway analysis revealed enrichment for genes involved in detoxification of nitrosative, oxidative, DNA damage and proteostasis stress. The authors then decided to validate the findings by constructing deletions of 5 different targets (pafA, katG, recR, blaR, Mab_1456c) and tested the persistence phenotype of these strains. Rather surprisingly, only 2 of the 5 hits (katG and pafA) exhibited a significant persistence defect when compared to wild type upon exposure to TIG/LZD, and this was complemented using an integrative construct. The authors then investigated the specificity of delta-katG susceptibility against different antibiotic classes and demonstrated increased killing by rifabutin. The katG phenotype was shown to be mediated through the production of oxidative stress, which was reverted when the bacterial cells were cultured under hypoxic conditions. Interestingly, when testing the role of katG in other clinical strains of Mab, the phenotype was observed only in one of the clinical strains, demonstrating that there might be alternative anti-oxidative stress defense mechanisms operating in some clinical strains.

      Strengths:

      While the role of ROS in antibiotic mediated killing of mycobacterial cells has been studied to some extent, this paper presents some new findings with regard to genetic analysis of M. abscessus susceptibility, especially against clinically used antibiotics, which makes it useful. Also, the attempts to validate their observations in clinical isolates are appreciated.

      Weaknesses:

      Amongst the 5 shortlisted candidates from the screen, only 2 showed marginal phenotypes which limits the impact of the screening approach.

      We appreciate the reviewer’s comments, but we note that 4 out of 5 genes displayed phenotypes concordant with findings of the Tn-Seq data, with katG and pafA, as well as MAB_1456c (during starvation only) and blaR (in rich media only) having decreased survival as shown in Figure 3A-D. We do agree that some of the phenotypes were more modest in a single-mutant context than in the pooled Tn-Seq screen. In addition, several mutants that had modest changes in survival also showed profound defects in resuming growth after removal of antibiotics, with the pafA mutants particularly impaired (Figure 3 - figure supplement 1).

      While the role of KatG mediated detoxification of ROS and involvement of ROS in antibiotic killing was well demonstrated, the lack of replication of this phenotype in some of the clinical isolates limits the significance of these findings.

      While the role of katG varied among strains, the antibiotic-induced accumulation of ROS was seen in all three strains (Figure 6A). This suggests that in some strains other ROS-detoxification pathways are able to compensate for the loss of katG.

      (Figure 2—figure supplements 1–3)

      Figure 1—figure supplement 1.

      Reviewer #2 (Public review):

      Summary:

      The work set out to better understand the phenomenon of antibiotic persistence in mycobacteria. Three new observations are made using the pathogenic Mycobacterium abscessus as an experimental system: phenotypic tolerance involves suppression of ROS, protein synthesis inhibitors can be lethal for this bacterium, and levofloxacin lethality is unaffected by deletion of catalase, suggesting that this quinolone does not kill via ROS.

      Strengths:

      The ROS experiments are supported in three ways: measurement of ROS by a fluorescent probe, deletion of catalase increases lethality of selected antibiotics, and a hypoxia model suppresses antibiotic lethality. A variety of antibiotics are examined, and transposon mutagenesis identifies several genes involved in phenotypic tolerance, including one that encodes catalase. The methods are adequate for making these statements.

      Weaknesses:

      The work can be improved by a more comprehensive treatment of prior work, especially comparison of E. coli work with mycobacterial studies.

      Moreover, the work still has some technical issues to fix regarding description of the methods, supplementary material, and reference formatting.

      See detailed responses below.

      Overall impact: Showing that ROS accumulation is suppressed during phenotypic tolerance, while expected, adds to the examples of the protective effects of low ROS levels. Moreover, the work, along with a few others, extends the idea of antibiotic involvement with ROS to mycobacteria. These are field-solidifying observations.

      Comments on revisions:

      The authors have moved this paper along nicely. I have a few general thoughts.

      It would be helpful to have more references to specific figures and panels listed in the text to make reading easier.

      Text modified to add more figure references.

      (1) I would suggest adding a statement about the importance of the work. From my perspective, the work shows the general nature of many statements derived from work with E. coli. This is important. The abstract says this overall, but a final sentence in the abstract would make it clear to all readers.

      We appreciate the suggestion and have added a line to the abstract.

      (2) The paper describes properties that may be peculiar to mycobacteria. If the authors agree, I would suggest some stress on the differences from E. coli. Also, I would place more stress on novel findings. This might be done in a section called Concluding Remarks. The paper by Shee 2022 AAC could be helpful in phrasing general properties.

      We have added mention of this in the discussion (lines 354-356).

      (3) Several aspects still need work to be of publication quality. Examples are the materials table and the presentation of supplementary material. Reference formatting also needs attention.

      We respond to the specific details below.

      Reviewer #3 (Public review):

      Summary:

      The manuscript demonstrates that starvation induces persister formation in M. abscessus.

      They also utilized Tn-Seq for the identification of genes involved in persistence. They identified the role of catalase-peroxidase KatG in preventing death from translation inhibitors Tigecycline and Linezolid. They further demonstrated that a combination of these translation inhibitors leads to the generation of ROS in PBS-starved cells.

      Strengths:

      The authors used high-throughput genomics-based methods for identification of genes playing a role in persistence.

      Weaknesses:

      The findings could not be validated in clinical strains.

      Comments on revisions: No more comments for the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors are strongly encouraged to check the references. There is some systematic error in the citations of references. Started to list but then they were too many.

      For example Ln 51, Ref #11 cited, should be #10. Ln 59, #18 is wrongly cited. Should be - Ln 104. Ref #27 wrongly cited.

      Ref #26 and #28 identical.

      Even in discussion section a lot of references are mis-cited.

      We very much appreciate the reviewer catching this issue with the import of our references and we have corrected this.

      Reviewer #2 (Recommendations for the authors):

      Below I have listed comments on specific issues that I hope are useful during revision.

      Line 21 population is singular

      Text modified

      Line 21 comma after antibiotic (subordinate clause)

      Text modified

      Line 25 is how singular?

      Text modified

      Impression of abstract: the work seems to confirm and therefore generalize concepts derived from studies with E. coli. If the authors agree, such a statement would be appropriate as a final sentence. I would also look for novel features to stress in the abstract.

      Line 41 this challenge is vague

      Text modified

      Line 43 comma such as (also comma at the end of the parenthetical statement). This type of comma error is common throughout the manuscript and slows reading.

      Text modified

      Line 60 paradoxically. Is this the best concept? Or is it the natural effect of evolution (assuming that mycobacteria or their ancestors were exposed to environmental antibiotics)?

      It is certainly problematic for clearing infection.

      Text not modified.

      Line 63 highlighted uncertainties ... meaning is unclear especially since you may have changed what "model" is referring to.

      Text modified

      Line 66 models.... Do you really mean systems? Models of what?

      This refers to mechanistic models. Text not modified.

      Line 67 arrest cell division. This is written as if it were true. Does the evidence point specifically to cell division or perhaps more accurately suppression of metabolism (see Ye et al 2025 mBio).

      Both have been postulated as important. Text modified to add concept of metabolism

      ... targeted by antibiotics non-essential... Do you think that antibiotics work by inactivating essential targets? That seems overly simplistic, as lethal action is more likely the metabolic response to the damage caused. By the end of the paragraph you come around to this view, but you have already misdirected the reader. The reader is not sure what to believe.

      Line 70 note that there are many inhibitors of transcription and translation that only block growth; they do not rapidly kill cells.

      There can be both direct, and indirect secondary killing mechanisms. We devote a significant portion of the Discussion section to this topic.

      Line 71 debate. There was indeed a debate, but reference 22 is not a valid citation for this. I think you mislead the reader by not accurately describing the debate. It was basically about the inability of Kim Lewis and James Imlay to reproduce the work of ref. 22. A great deal of prior work and then subsequent work showed that the challenge to ref. 22 lacked substance.

      (1) Text modified to fix an error in the citation number related to direct β-lactam-mediated lysis.

      (2) We agree that there is a great deal of data supporting antibiotic-induced ROS as important for bactericidal activity in many circumstances and do not argue otherwise. This sentence points out that over the years the paradigm for how antibiotics kill bacteria has evolved.

      Line 80. It seems you are starting a new topic here. What about beginning a new paragraph?

      The paragraph introduces mycobacteria of which Mabs is one. Text not modified.

      Line 85 delete the comma: it implies a compound sentence that is not delivered.

      Text modified.

      Line 109 screen singular

      Text modified.

      Line 156 these conditions is imprecise and vague

      Conditions were described in paragraph above in the manuscript. Text not modified.

      Fig 2 it would be helpful to more clearly define the meaning of the coordinates

      Text modified.

      Line 230 and throughout please indicate the location of the data being cited for rapid reader reference

      Text modified.

      Lines 315-323 You could use this paragraph as the first of the Discussion. Some readers prefer to read the Discussion before the results. For them, a summary at the beginning of the Discussion is useful.

      Text modified.

      Line 328 without underlying mechanism... for E. coli refer to Zeng PNAS 2022. Depending on when the final version of this paper happens, there should be a figure in a Zhao Zhu mLife paper on purA that will have been published. Since it is not yet available, it cannot be cited.

      We agree that the Zeng et al study is interesting and have added this reference to our discussion. However, these findings related to broad Crp-regulated tolerance actually underscore the point that we are making: that there are multiple factors (Crp, RelA, Lon, TisB, MazE, others) that mediate antibiotic tolerance.

      Line 339 where are the data?

      These data are in Figure 5, panels C, D. We have clarified the text to indicate that only a single agent from each of these classes was tested.

      Line 346 here you are summarizing evidence for ROS in killing mycobacteria. You should include the moxifloxacin study by Shee et al 2022 AAC.

      Reference added.

      Line 348 refer to James Collins' work with E. coli in which his lab examined agents with a variety of mechanisms. There seems to be a fundamental difference between E. coli and mycobacteria with respect to rifampicin, a strictly static agent in E. coli but clearly lethal in mycobacteria. Note that chloramphenicol is static in E. coli and blocks ROS production. What does it do in mycobacteria? A brief discussion of this difference might be relevant at line 362

      Text modified.

      Lines 364-368 Here the idea might be simply that there are two modes of killing, one that is a direct extension of class-specific damage (chromosome fragmentation with fluoroquinolones, for example, or cell lysis by beta-lactams) and a second that is a metabolic response to the antibiotic damage (ROS accumulation). The second type is not class specific. Within this context, the mycobacterial killing by rifampicin might be a class-specific extension of inhibition of transcription that does not occur in E. coli.

      Agreed, text modified to include this.

      Line 400 The Key Resource table is not of publication quality. Precision and repeatability can be improved by spelling out the name of the vendor and its location (City, Country). In the present case, use of BD is lab jargon.

      We appreciate the reviewer’s precision. However, this is actually not lab jargon. Becton, Dickinson and Company now refers to itself as BD (see https://www.bd.com/en-us), and the American Type Culture Collection now refers to itself as ATCC (see https://www.atcc.org/about-us/who-we-are).

      Line 639 It would be good to have experienced colleagues critically review the manuscript, especially for English usage. Listing those persons here adds to the credibility of the work

      Text not changed.

      References: please refer to the journal style. Here you use italic for titles and scientific names, thereby obscuring the scientific names. Normally article titles are not italic and scientific names are ALWAYS italic unless prohibited by journal style.

      Our reference format is concordant with eLife submission guidelines, and all references are reformatted by the journal at the time of final publication (see https://elifesciences.org/insideelife/a43f95ca/elife-references-yes-we-take-any-format-no-we-re-not-rekeying).

      Supplemental Material: Please refer to journal style. Normally this is a stand-alone document that includes a title page and carefully crafted figure legends. Supplemental figures would be numbered as 1, 2, ... A professional appearing Supplemental Material section shows author publication experience not obvious in other parts of the paper. The text indicated MIC determinations. I would like to see a table of MIC values.

      (1) MIC table added as Supplemental Table 5.

      (2) The Supplemental figures are submitted and named in accordance with eLife instructions. Please note that for eLife, there is not a stand-alone supplementary figure section with a title page as you are requesting, but instead the figure supplements for each figure are provided as online files linked to each figure.

    1. eLife Assessment

      This important study analyzed the impact of amino acid homorepeats on protein expression and solubility in yeast and E. coli. The authors provided convincing evidence that hydrophobic and positively charged amino acids are toxic and that counterselection during evolution reduced the occurrence of such proteotoxic protein sequences. This study will be of interest to cell biologists and biochemists, particularly those working on proteostasis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors created a metric to score the toxicity of specific amino acid homorepeats that accounts for differences in physicochemical properties. This "neutrality" score reflects how often a particular homorepeat appears in nature across the proteomes of different species. This can be used to understand known proteins and their characteristics, as well as inform on the upcoming field of protein design.

      Strengths:

      This study represents a very careful and thorough study of the amino acid homorepeats and does a remarkable job of accounting for the effects of the fluorescent protein tags.

      Weaknesses:

      The initial characterization of the neutrality score is missing a control of a known toxic homorepeat to help validate this method of characterizing amino acid homorepeats.

      The authors did achieve their aim of developing a metric by which to score the toxicity and properties of amino acid homorepeats. This can be used in the future with other common amino acid motifs that are not homorepeats and can help scientists refine computer models for rational protein design.

    3. Reviewer #2 (Public review):

      Summary

      The aim of this study was to assess which amino acid stretches are tolerated/favoured in the course of evolution, considering their physico-chemical properties, metabolic costs and proteotoxicity. To address this question, the authors expressed PolyX variants in yeast, E. coli and also referred to COS cells. The PolyX constructs were tagged with GFP or a different fluorescence reporter to assess expression levels and localization at the C-terminus with or without a cleavable linker or to study topological effects. The PolyX stretch was also embedded between two different fluorescent proteins. The authors used growth rate and expression levels as judged by fluorescence intensities to calculate the relative neutrality in comparison to GFP alone.

      They could show that harmful/beneficial effects depend on the specific amino acid (aa) and polar aa are tolerated well, whereas hydrophobic and positively charged aa are harmful to the cell. This is not surprising as hydrophobic and positively charged aa are known to be aggregation-prone. They could further show that the topology matters for some, but not all, PolyX variants. The PolyX stretch can affect the subcellular localization and aggregation propensity of the GFP it is fused to. Interestingly, overexpression of PolyG, PolyQ or PolyS was not harmful, and overexpression of PolyE was potentially even beneficial for the cell. The authors concluded their study with a theoretical analysis of the presence of aa stretches in various species and identified a high correlation between their expression in yeast and other species, suggesting that the selection of aa stretches is conserved and follows biochemical rules (trade-off between tolerance of expression levels, solubility, sub-cellular localization, and maybe metabolic costs).

      Strengths:

      The authors performed a high number of experiments and systematically assessed the expression and tolerance of 10mer stretches of 20 aa fused to GFP or other fluorophores in yeast and E. coli. This is an impressive effort.

      Weaknesses:

      (1) The analysis of expression levels of the various PolyX variants should not rely only on fluorescence intensities. The fusion of the PolyX stretch may affect the fluorescence properties (brightness, photostability) of the fluorescent partner and may or may not affect abundance. A quantitative analysis of PolyX-GFP (same applies to the other fusion constructs shown in Figure 3) is needed. Preferably by an MS-based proteomic analysis via peptide count. Western blot is less ideal as it would rely on epitope recognition of the respective antibody, and the epitope accessibility might be altered upon fusion with different PolyX stretches. In addition, the authors should analyse the PolyX stretch without an attached fluorophore as a control.

      (2) The images shown in Figure 4 are not very informative. The constructs should be subjected to FRAP to assess the solubility of the PolyX variants and Ssa1 (Hsp70). FCS could be an alternative as well.

      (3) The observation of the lack of mCherry fluorescence for PolyK and PolyP (Figure 4) can also be interpreted as an instability of the fusion protein (partial truncation and degradation) or quenching. The authors should test different fluorophores and different linker lengths between the PolyX stretch and the fluorophores. Fluorophore swapping (N/C-terminally) would also be a good control.

      (4) The study would benefit from a consideration of the large body of literature on protein aggregation and the contribution of amino acid composition. The amino acids identified here as harmful to the cell when present as 10mer stretches are known to be aggregation-prone and are also recognised by molecular chaperones, which prevent their aggregation.

      (5) The study could further benefit from ex vivo and in vitro analyses of the PolyX constructs. They could isolate the PolyX variants and study their solubility by, e.g. light scattering outside of the cellular context.

    4. Reviewer #3 (Public review):

      Summary:

      The constraints limiting the usage of especially repetitive amino acid sequences in proteins remain enigmatic. In their manuscript, Murase et al. analyse the impact of polyamino acid homorepeats (PolyX) on the expression of EGFP-variants with PolyX modifications. Introducing a new measure, relative neutrality, allows us to rate beneficial versus harmful sequences. The authors find that especially hydrophobic repeats (I, V, W, F, Y) show harmful effects on the respective proteins, enhancing their aggregation. Hydrophilic repeats (E, S, N, Q), on the other hand, show beneficial properties but suppress proteotoxic stress. Interestingly, these observations correlate with the occurrence of such PolyX in natural proteins across the proteomes of different organisms.

      Strengths:

      The manuscript seems especially valuable in the context of rational or de novo protein design. The observations on the one hand should allow for enhancing the solubility of proteins by using beneficial PolyX. On the other hand, they explain very well why some PolyX do not occur in natural proteins. The authors present a sound, broad and well-analysed dataset. The study is well designed, the manuscript is very well written, and the conclusions drawn are overall valid.

      Weaknesses:

      The whole data set relies on the definition of the newly introduced "relative neutrality" score. Besides being a well-chosen tool, this score is limited and biased as it does not directly include a measure for "solubility" but relies on "fluorescence emission" derived from the respective EGFP-fusion-proteins.

      A second major weakness is that the influence of PolyX-modifications on secondary structure is neither analysed nor discussed.

    1. eLife Assessment

      This valuable study identifies a novel neurodevelopmental syndrome caused by variants in ARID5B, supported by solid human genetic evidence from a well-characterized cohort. While the clinical data establish a clear genotype-phenotype correlation, the functional evidence regarding the proposed molecular mechanisms remains incomplete. Addressing the gaps in the functional characterization and refining the clinical assertions would significantly strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes a putative clinical association between ARID5B genetic variants and a novel neurodevelopmental syndrome characterized by global developmental delay, intellectual disability, and occasional neuroinflammatory episodes. While the identification of 29 individuals with overlapping phenotypes and the use of a CRISPR-Cas9 mouse model suggest a potential gene-disease link, the study suffers from significant methodological gaps in variant prioritization and a lack of robust mechanistic evidence to support its primary claims. Specifically, the "neuroinflammation" component is over-emphasized despite appearing in only a minor subset of the cohort, and the molecular pathogenesis remains insufficiently explored beyond initial protein localization assays.

      Strengths:

      (1) The study proposes a new clinical syndrome associated with the ARID5B gene, distinguishing it from established Coffin-Siris syndromes related to other ARID family members.

      (2) The recruitment of a relatively large cohort of 29 individuals from diverse geographical and ethnic backgrounds strengthens the initial phenotypic description.

      (3) The combination of human clinical data, in vitro localization assays, and an in vivo mouse model provides a multi-level framework for investigating the gene's function.

      (4) The identification of variants in the exceptionally long final exon of ARID5B that escape nonsense-mediated mRNA decay (NMD) offers an interesting perspective on the molecular pathology of this gene.

      Weaknesses:

      (1) The description of the genomic methodology appears limited. A more detailed explanation of the filtration and selection process for variant prioritization is essential. The authors should provide a comprehensive summary of evidence (e.g., CADD scores, allele frequencies in gnomAD, and segregation analysis) to justify the selection of the reported variants, even if they do not strictly meet all ACMG/AMP criteria.

      (2) The cohort includes several inherited variants and missense mutations that require more robust evidence of pathogenicity. For example, the presence of the variant in population databases (gnomAD) suggests the need for careful re-evaluation of its causality. A more rigorous assessment using diverse computational metrics, such as PhyloP scores and conservation analysis, is necessary to confirm the pathogenicity of the missense variants.

      It is recommended that the authors re-evaluate the cohort to ensure that only variants with strong evidence of causality are included to maintain a clear genotype-phenotype correlation.

      (3) The proposed molecular mechanism would benefit from further empirical support. The claim of NMD escape is currently supported by only a small number of cases, and a much more detailed explanation is also required for the experimental data provided.

      Although the mouse model exhibits developmental abnormalities, it does not recapitulate the other systemic features reported in humans. In addition, given that "brain development" is a central theme, the manuscript lacks detailed neuroanatomical data, histopathology, or other molecular biological (e.g., RNA-seq) evidence from brain specimens to substantiate these claims at a molecular level.

      (4) The emphasis on "neuroinflammation" in the title may be disproportionate to its observed frequency. Central nervous system inflammation was identified in only a small subset of the cohort (2 of 29 individuals).

      Without additional experimental validation, such as immunological challenges in the Arid5b mouse model, it is premature to characterize this as a hallmark feature. Additionally, the inconsistent response to immunotherapy suggests that the autoimmune component requires further investigation.

      (5) Supplementary tables require reorganization to improve clarity. The current structures make it difficult for readers to effectively analyze the data, and a more standardized format is recommended.

      (6) As the manuscript proposes a novel disease entity, a more comprehensive clinical discussion is warranted. The authors should provide a more systematic description of the core clinical features and, crucially, address the genotype-phenotype correlation. Specifically, a more detailed analysis is required to determine whether the clinical severity or the presence of specific features varies according to the location of the variant or the type of mutation. Such insights are essential for clinicians to differentiate this syndrome from other ARID-related disorders.

    3. Reviewer #2 (Public review):

      Summary:

      The authors compiled a cohort of 29 patients with various neurodevelopmental symptoms attributed to ARID5B mutations. Although the evidence is indirect, the mouse model showed that heterozygous mutant mice have mild behavioral problems. It would be interesting to see whether the mice also show craniofacial features.

      Strengths:

      The HEK293T model showed that the mutant protein is mislocalized, but did not show whether the mutation causes any changes in epigenetic status. Nevertheless, this paper delivers clear support for genotype-phenotype correlation.

      Weaknesses:

      (1) The paper would be improved by providing pedigrees of some of the patients with inherited variants.

      (2) Figure 3d could provide more species for an accurate conservation assessment.

    4. Reviewer #3 (Public review):

      Summary:

      In the present study, through international gene-matching efforts, the authors present 29 individuals with rare, heterozygous ARID5B variants and find that these individuals have a newly recognizable neurodevelopmental syndrome. A recurring clinical syndrome of developmental delay/intellectual disability, behavioral difficulties, renal malformation, and recurrent infections is described. Nineteen of these variants were confirmed to be de novo, and only one was inherited from an unaffected parent. 24/29 of these variants introduce premature termination codons in the final exon and are predicted to escape nonsense-mediated decay. The ARID5B p.Q522Ter variant was studied in a heterozygous knock-in mouse model and was found to be associated with behavioral abnormalities. The well-described genetic and phenotypic data for this cohort provide convincing clinical evidence for a novel neurodevelopmental syndrome. The functional evidence provided is preliminary, and further studies are needed to understand disease mechanisms.

      Strengths:

      (1) The authors give a good description of a novel clinical syndrome manifesting as developmental delay/intellectual disability, facial dysmorphism, and behavioral challenges.

      (2) The authors create a mouse model harboring an Arid5b(Q522*/+) variant and identify subtle behavioral changes.

      (3) Attempts are made to functionally characterize a subset of ARID5B variants in human cell lines.

      Weaknesses:

      (1) The title - "ARID5B mutations cause a neurodevelopmental syndrome with neuroinflammation episodes" - should be revised. 2/29 individuals (7%) had CNS inflammation; this does not appear to be a core feature of the disease and should not be highlighted as such. If this is going to be a feature that is highlighted, then more details are needed. MRI images of cerebellitis and/or ADEM would be helpful, as well as lumbar puncture results and supplemental information detailing the treatment course.

      (2) The abstract states that "Remarkably, 19 of 29 variants (66%) cluster within the first quarter of exon 10, are de novo, and escape nonsense-mediated mRNA decay (NMD), which we confirmed for two variants affecting seven individuals." The authors state in the Results that they "indeed found no signs of NMD". In Figure 3f, when assessing for transcript amount, there appears to be a great deal of variability. Three ARID5B variant lines are tested. Transcript amounts in two lines appear to be near control levels, but one LCL ARID5B Ile497AsnfsTer31 line appears to demonstrate significantly lower levels of transcript. The control lines also show a great deal of variability. No explanation is given for this large difference between LCL ARID5B Ile497AsnfsTer31 lines and for the variability in control lines, making these data uninterpretable. A major theme of the paper is that early truncating variants in exon 10 escape NMD and lead to the described phenotypes, so this is an important point that needs to be resolved, either by testing more patient-derived lines or knocking in these variants into cell lines.

      (3) The Arid5b(Q522*/+) mice are not sufficiently molecularly characterized. Does the variant transcript escape NMD? What happens at the protein level? Is there mislocalization of the protein?

      (4) For the HEK293T cell experiments, variants are overexpressed and compared to a control. These experiments appear to leave endogenous ARID5B intact. What might the authors expect to see if these variants were knocked in?

      (5) The functional consequences of the missense variants are not tested. The authors suggest that missense variants may be more associated with macrocephaly and possibly ASD. Are these missense variants causing loss-of-function or gain-of-function? Is there preserved protein function?

      (6) There are a number of functional assays performed, but it remains unclear if the tested variants are operating through a loss- or gain-of-function. Are truncating variants early in exon 10 leading to a partial loss-of-function? Or do they prevent the functioning of the other allele through a dominant negative mechanism? These possibilities are not directly tested.

    1. eLife Assessment

      This study presents important findings by identifying small molecules that can stabilize and refold missense-mutated VHL tumor suppressor protein, offering a potential therapeutic approach for clear cell renal cell carcinoma. The computational design approach is well-executed, but the evidence is incomplete due to insufficient demonstration that HIF2 downregulation occurs through on-target VHL rescue rather than off-target effects. Additional experiments with appropriate controls are needed to establish the specificity of the mechanism.

    2. Reviewer #1 (Public review):

      Summary:

      This is an excellent and strong paper. The authors not only show the mechanisms of action of destabilizing mutations in VHL, but notably, they also go on to computationally design and experimentally test an inhibitor that restores wild-type pVHL function, offering starting points for a new class of kidney cancer drugs. The approach that the authors take here can be used to target destabilizing mutations in repressor proteins, common in diseases, including cancer.

      Strengths:

      This paper is the culmination of an extraordinary amount of work, over years, including method development and testing by a broad range of tools and experiments. It is thorough and comprehensive. It is also well-written and easy to follow.

    3. Reviewer #2 (Public review):

      Summary:

      Inactivating VHL mutations are common in clear cell renal cell carcinoma, and about half of those mutations unfold/destabilize the protein rather than directly interfering with critical protein-protein interactions. The authors identify a compound that can stabilize/refold mutant VHL and seemingly restore its ability to downregulate its major downstream targets.

      Strengths:

      The authors use a clever combination of virtual and cell-based screens, followed by suitable biophysical and cell-based validation assays, to arrive at a VHL refolder. This compound is suboptimal from an ADME point of view, but could be a starting point for further medicinal chemistry optimization. Success would have implications for other diseases linked to similar loss-of-function mutations.

      Weaknesses:

      The authors need to tighten up the evidence that the VHL refolder is downregulating HIF2 in a strictly "on-target" manner.

      (1) In Figure 3C, the increase in VHL stability looks very modest. Taking into account the increased abundance of the VHL protein at time 0 in the presence of CP4 compared to control, it is not so clear that VHL is decaying much more slowly in the presence of CP4. I understand that the fact that the signal is low in the absence of CP4 at time 1 hour makes it hard to quantify the half-life of p30 in the absence of the drug (is a longer exposure needed?). However, perhaps the authors could try to quantify the p19 half-life.

      (2) In going from CP4 to CP4.29 the authors screened based on downregulation of HIF. This is logical but also introduces the danger of identifying chemicals that can downregulate HIF in an "off-target" manner, i.e. non-specifically. It is therefore essential to clearly show that CP4.29 downregulates steady-state levels of HIF and HIF target genes in cells with suitable (hydrophobic core) VHL mutants but not in isogenic cells lacking VHL. Another prediction is that these chemicals should be inert in cells with VHL mutations that directly abrogate HIF binding. So Figure 4E (HIF2 target genes) needs the use of the isogenic VHL-/- cells described later in the paper. And the steady-state levels of HIF2 should be measured in the isogenic cells (mutant VHL vs -/-) with or without CP4.29. Figure 4G, as it is done now, is too indirect and not very compelling. I don't understand why the "time 0" HIF2 levels aren't lower in the presence of CP4.29, and I think the half-life differences with treatment are very subtle to my eyeball densitometer (although I applaud the authors' attempt to quantify), with the exception of I180N. I agree that Figure 4F is encouraging, but hypoxia has many effects, and this experiment is not as "clean" as the VHL-/- experiments. The same applies to some of the pharmacologic agents in Figure 5.

    1. eLife Assessment

      This important study advances our understanding of developmental timing mechanisms by studying the cleavage, nuclear translocation, and oscillation of the transcription factor MYRF-1 (vertebrate MYRF) during C. elegans larval development. The evidence supporting the conclusions is solid, with elegant genome engineering experiments and state-of-the-art microscopy. The work will be of broad interest to cell and developmental biologists.

    2. Reviewer #1 (Public review):

      The current study is a follow-up to a previously published article in eLife in 2021, demonstrating that the transcription factor MYRF-1 interacts with the transmembrane protein PAN-1, which is required for the stability and targeting of MYRF-1 to the plasma membrane. There, MYRF-1 undergoes self-catalytic cleavage of its intracellular domain and translocates to the nucleus. Here, the authors analyze the activation of MYRF-1 during the larval development of C. elegans. They nicely show that MYRF-1 cleavage and nuclear translocation oscillate with larval stage transitions. They further identify two regions in MYRF-1 and PAN-1 that negatively regulate MYRF-1 cleavage and activation, and show that relief of this negative regulation causes premature lin-4 activation and overrides nutrient-responsive developmental checkpoints. The experiments are elegant and accurately support the conclusions raised. There are only minor comments and suggestions to improve the manuscript.

    3. Reviewer #2 (Public review):

      In this study, Xu et al. investigated the regulatory mechanisms controlling intramolecular cleavage of the transmembrane transcription factor MYRF-1, an important event that controls developmental progression in C. elegans.

      The authors made important advances in several aspects:

      (1) Through endogenous gene editing/tagging, further supported by western blots, the authors convincingly demonstrate the novel finding that the intramolecular cleavage and nuclear translocation of MYRF-1 is not static, but temporally controlled within each developmental stage: with nuclear translocation peaking at the late stage and then declining into lethargus/molts between developmental stages (Figure 1).

      (2) They demonstrate that this cleavage and nuclear translocation is controlled by external stimuli, namely starvation.

      (3) They reveal modes of regulation of the intramolecular cleavage, which is mildly regulated by MYRF-1's own JM domain as well as by the CCT tail of its interacting partner PAN-1.

      The conclusions of this paper are mostly well supported by data, but some aspects of the manuscript and conclusions should be clarified and extended to strengthen its findings.

      (1) The authors concluded that the intramolecular cleavage and nuclear localization of MYRF-1 were similarly temporally regulated in all tissue types. However, the data and images presented were limited to specific regions/cell types that were inconsistently chosen across developmental windows. For example, for the cleavage/nuclear translocation across L1 into lethargus (Figures 1B, E, F, G), the heads of the worms were shown, comprising mostly neurons and muscles, whereas across the rest of the larval stages, only mid-body pictures were shown, comprising mostly hypodermal and some intestinal cells. A complete coverage of all tissues across all time points would better support the authors' conclusion that this temporal regulation occurs similarly in all tissue types. Additionally, the authors should clearly indicate which tissues/cell types were used in the quantifications, as this was not done for several figure panels (including but not limited to Figure 1I and J).

      (2) Related to point 1 above, this inconsistency in tissue assessment was also true for downstream experiments (Figures 2-6; e.g., starvation, JM, and CCT regulation, etc.). Broad tissue-specific assessment for all downstream experiments would greatly enhance the strength and relevance of the findings. The current data (Figures 3, 5, 6) seem to suggest that there are tissue/cell-type differences in the regulation of MYRF-1 nuclear translocation.

      (3) Developmental progression was superficially and inconsistently assessed across the study, mainly by hypodermal (V-lineage) division patterns and worm length. Glaring omissions include the lengths of larval stages/lethargus, molting defects, and gonad development, all of which should have been examined to help identify which developmental landmarks were affected and which were not.

      (4) The phosphorylation within MYRF-1's JM domain was insufficiently investigated. Two serine phosphorylation sites were discovered through mass spectrometry experiments; however, the authors investigated only one of the serine residues (S623) without justifying the choice. Additional investigation of the other residue, as well as of both together, would strengthen the relevance of these phosphorylation events to cleavage and nuclear translocation, especially considering the minimal effect observed when mutating only the one residue.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors identified dual inhibitory mechanisms, an intrinsic juxtamembrane (JM) region and an extrinsic cytoplasmic tail (CCT) domain in the binding protein PAN-1, that suppress MYRF-1 cleavage in C. elegans. The authors showed that MYRF-1 cleavage oscillates across larval stages, peaking in mid-to-late phases and being suppressed during molts. This oscillatory pattern is consistent with MYRF-1's role in promoting transitions of larval stages, particularly in late-L1 involving lin-4 activation and DD neuron remodeling.

      Strengths:

      This work generated several knock-in strains carrying fluorescent tags and mutations in the endogenous myrf-1 and pan-1 gene loci, which will provide resources for future identification and characterization of the underlying molecular mechanisms regulating MYRF-1 cleavage inhibition.

      The results presented in the paper are solid enough to support the paper's main conclusions.

      This study is valuable for establishing MYRF-1 cleavage as a key gatekeeper of C. elegans developmental timing. Findings from C. elegans MYRF-1 may provide insight into the regulation and function of mammalian MYRF.

      Weaknesses:

      The following points should be discussed to further support the authors' model that MYRF-1 cleavage is a key gatekeeper of developmental timing.

      (1) Recent findings by the Helge Großhans and Jordan Ward groups showed that KIN-20 (CK1δ) and LIN-42 (PERIOD) are required for proper molt timing in C. elegans, and that loss of LIN-42 binding or of the phosphorylated LIN-42 tail impairs nuclear accumulation of KIN-20, resulting in arrhythmic molts (EMBO J. 44, 6368-6396, 2025). In this paper, the authors concluded that PAN-1 promotes MYRF-1 trafficking to the cell membrane, where MYRF-1 cleavage and nuclear translocation occur, and that this process oscillates with developmental molting cycles in C. elegans. It is unclear whether MYRF-1 and KIN-20 interact in the nucleus and, if so, how this interaction controls developmental timing.

      (2) Separately, it was previously shown that the let-7 primary transcript (pri-let-7) exhibits oscillating, pulse-like expression that peaks during each larval stage, rather than a steady increase, and directly correlates with developmental molting cycles. It is unclear whether the nuclear-localized MYRF-1 fragment regulates the oscillatory primary let-7 expression during larval transition (McCulloch and Rougvie, 2014; Van Wynsberghe et al., 2011).

    1. eLife Assessment

      This manuscript by Feng et al. provides valuable evidence regarding the hematopoietic differentiation of bone marrow endothelial cells in the adult mouse. Overall, the authors have addressed our main concerns. Solid data now more strongly support long-term multi-lineage reconstitution of the adult hemogenic endothelial cells. However, critical data, especially regarding the endothelial cells' hematopoietic identity and functional capacity, remain insufficient, which limits the strength of the hemogenic claim, especially the assertion that these adult hemogenic ECs generate bona fide HSCs. Additional experiments would be necessary to fully rule out alternative explanations.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Feng et al. uses mouse models to study the embryonic origins of HSPCs. Using multiple types of genetic lineage tracing, the authors aimed to identify whether BM-resident endothelial cells retain hematopoietic capacity in adult organisms. Through an important mix of various labeling methodologies (and various controls), they reach the conclusion that BM endothelial cells contribute up to 3-4% of hematopoietic cells in young mice.

      Strengths:

      The major strength of the paper lies in the combination of various labeling strategies, including multiple Cdh5-CreER transgenic lines, different CreER lines (col1a2), and different reporters (ZsGreen, mTmG), including a barcoding-type reporter (PolyLox). This makes it highly unlikely that the results are driven by a rare artifact due to one random Cre line, or one leaky reporter. The transplantation control (where the authors show no labeling of transplanted LSKs from the Cdh5 model) is also very supportive of their conclusions.

      Weaknesses:

      While the updated manuscript now provides strong evidence for Cdh5-CreER+ cells as a source of myeloid-biased hematopoiesis, the true identity of these "adult EHT stem cells", their differentiation hierarchy, the kinetics, the EHT mechanism, and the physiological relevance of this process remain unaddressed.

    3. Reviewer #2 (Public review):

      Summary:

      Feng, Jing-Xin et al. studied the hemogenic capacity of endothelial cells in the adult mouse bone marrow. Using the Cdh5-CreERT2 in vivo inducible system, they characterized a rare, transplantable subset of endothelial cells expressing hematopoietic markers. They suggested that the endothelial cells need the support of stromal cells to acquire blood-forming capacity ex vivo. These endothelial cells were transplantable, contributed to hematopoiesis with ca. 1% chimerism under a stress hematopoiesis condition (5-FU), and were recruited to the peritoneal cavity upon thioglycolate treatment. Ultimately, the authors detailed the blood lineage generation of the adult endothelial cells at single-cell resolution, suggesting a predominantly HSPC-independent blood formation by adult bone marrow endothelial cells, in addition to the discovery of Col1a2+ endothelial cells whose blood-forming potential corresponds to their high Runx1 expression.

      Comments on revised version:

      Overall, the authors have addressed our main concerns, and the revised manuscript is improved. The new data now more strongly support long-term multilineage reconstitution of the adult hemogenic ECs. However, critical data, especially regarding the ECs' hematopoietic identity and functional capacity, remain insufficient, which limits the strength of the hemogenic claim, especially the assertion that these adult hemogenic ECs generate bona fide HSCs.

      Points that are sufficiently addressed:

      (1) Potential contamination during cell sorting has been excluded: the CD45- ZsGreen+ fraction used for ex vivo culture is explicitly shown to be of high purity in Fig. 2B.

      (2) The pre-cultured ZsG+ fraction is shown to have long-term multilineage reconstituting capacity (10 months chimerism, Fig. 2J), which increases confidence that the fraction is not limited to short-lived progenitors.

      Points that are insufficiently addressed:

      (1) As noted in the "Limitation of Study", the absence of LT-HSC phenotyping and/or secondary transplantation data for ZsG+ donor cells limits confidence in concluding that the adult hemogenic BM-ECs generate HSPCs.

      (2) The lack of reconstitution kinetics from early to late time points in Fig. 2E-2J restricts interpretation of whether the donor fraction contains rapidly reconstituting transient progenitors or sustained repopulating HSCs.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Feng et al. uses mouse models to study the embryonic origins of HSPCs. Using multiple types of genetic lineage tracing, the authors aimed to identify whether BM-resident endothelial cells retain hematopoietic capacity in adult organisms. Through an important mix of various labeling methodologies (and various controls), they reach the conclusion that BM endothelial cells contribute up to 3% of hematopoietic cells in young mice.

      Strengths:

      The major strength of the paper lies in the combination of various labeling strategies, including multiple Cdh5-CreER transgenic lines, different CreER lines (col1a2), and different reporters (ZsGreen, mTmG), including a barcoding-type reporter (PolyLox). This makes it highly unlikely that the results are driven by a rare artifact due to one random Cre line or one leaky reporter. The transplantation control (where the authors show no labeling of transplanted LSKs from the Cdh5 model) is also very supportive of their conclusions.

      We appreciate the Reviewer’s consideration of the strengths of our study supporting the identification of adult endothelial to hematopoietic transition (EHT) in the mouse bone marrow.

      Weaknesses:

      We believe that the work of ruling out alternative hypotheses, though initiated, was left incomplete. We specifically think that the authors need to properly consider whether there is specific, sparse labeling of HSPCs (in their native, non-transplant, model, in young animals). Polylox experiments, though an exciting addition, are also incomplete without additional controls. Some additional killer experiments are suggested.

      Recognizing the importance of the weaknesses pointed out by the Reviewer, we provide below our response to the thoughtful recommendations rendered.

      Reviewer #1 (Recommendations for the authors):

      The main model is to label cells using Cdh5 (VE-cadherin) CreERT2 genetic tracing. Cdh5 is a typical marker of endothelial cells. The data shows that, when treating adults with tamoxifen, the model labels PBMCs after ~10 days, and the labeling kinetics plateau by day 14... The authors reach the main conclusion: that adult ECs are making hematopoietic cells.

      We agree that the main tool used in this study is to label endothelial cells (ECs) using Cdh5 (VE-Cadherin) CreERT2 genetic tracing in mice. Indeed, Cdh5 is recognized as a good marker of ECs. As a minor point, we wish to clarify that the results from treating adult Cdh5-CreERT2 mice with tamoxifen (Figure 1F) show that the ZsGreen labeling kinetics plateau by day 28 (not by day 14).

      Important controls should be shown to rule out alternative possibilities: namely, that the CreERT2 reporter is being sparsely expressed in HSPCs. Many markers, specific as they may seem to be, can show expression in non-specific lineages - particularly in the cases of BAC and PAC transgenic models, in which the transgene can be present in multiple tandem copies and subject to genome location-specific effects. As the authors remind readers, the Cdh5 gene is partly transcribed (though at low levels) in HSPCs, and even more clearly expressed in specific subpopulations such as CLPs, DCs, pDCs, B cells, etc. Some options would be to: i) check if the Cdh5-CreERT2 transgene (not endogenous Cdh5, but the BAC/PAC transgene) is expressed in LSKs (at least by qPCR), ii) verify if any CreERT2 protein levels are present in LSKs (e.g., by western blot), and iii) check if tamoxifen is labeling any HSPCs freshly after induction (e.g., flow cytometry data of ZsGreen LSKs at 24-48h post tamoxifen injection).

      We fully agree with the Reviewer that many markers, allegedly specific to a certain cell type, can show expression in other cell lineages. We also agree that excluding sparse or ectopic CreERT2 expression in hematopoietic stem and progenitor cells (HSPCs) is essential for interpreting lineage-tracing results. As suggested by the Reviewer, we have now examined if the Cdh5-CreERT2 transgene is expressed in bone marrow LSKs. To this end, we analyzed the Polylox single-cell RNAseq dataset presented in this study, containing ZsGreen+ ECs and enriched ZsGreen+ LSKs. As shown in the revised Figure S4D, CreERT2 transcripts were detected exclusively in Cdh5-expressing endothelial populations and were absent from Ptprc/CD45-expressing hematopoietic cells, except for plasmacytoid dendritic cells (pDCs; Figure S4E). These results are consistent with the RNAseq data from adult mouse bone marrow[1] showing that the Cdh5 gene is not expressed in HSPCs, CLPs, DCs, or B cells. Rather, among hematopoietic CD45+ cells, Cdh5 is only expressed in a small subset of plasmacytoid dendritic cells (pDCs), which are terminally differentiated cells. These published results are described in the text.

      To further support this conclusion, we provide additional single-cell RNAseq analyses from our unpublished dataset of LSKs isolated from Cdh5-CreERT2/ZsGreen mice and not enriched for ZsGreen expression. These new analyses were performed after integrating the single-cell data from ECs and ZsGreen+ hematopoietic cells from the Polylox dataset (current study). As shown in Author response images 1 and 2, CreERT2 expression closely matches the expression patterns of Cdh5, Pecam1, and Emcn and is not detected in Ptprc/CD45-expressing hematopoietic cells.

      Author response image 1.

      Expression of CreERT2, Cdh5, Ptprc and ZsGreen in BM cell populations enriched with ECs and hematopoietic cells. The single-cell RNAseq results are derived from ZsGreen-enriched BM ECs and ZsGreen-enriched BM hematopoietic cells from Polylox lineage-tracing experiments (data shown in Fig. 5; 37,667 ECs and 48,065 BM hematopoietic cells) and from LSKs (23,017 cells) independently isolated from tamoxifen-treated Cdh5-CreERT2/ZsGreen mice without ZsGreen enrichment (unpublished data).

      Author response image 2.

      Expression of CreERT2, Cdh5, Ptprc, Pecam1, Emcn, ZsGreen1, Col1a2, Cd19, Cd3e, Itgam (CD11b), Ly6a (Sca-1), Kit (cKit), Cd34, Cd48, Slamf1 (CD150), and Siglech in enriched BM ECs and LSKs from Cdh5-CreERT2/ZsGreen mice treated with tamoxifen 4 weeks prior to harvest (same cell source as indicated in Author response image 1).

      Additionally, we functionally tested whether hematopoietic progenitors could acquire ZsGreen labeling following tamoxifen administration using transplantation assays (Figure 4A-D). ZsGreen- LSKs (purity 99%), sorted from Cdh5-CreERT2/ZsGreen donors that had never been exposed to tamoxifen to exclude background Cre leakiness, were transplanted into lethally irradiated wild-type recipients. After stable hematopoietic reconstitution, recipients were treated with tamoxifen. If transplanted HSPCs or their progeny expressed CreERT2, tamoxifen administration would be expected to induce ZsGreen labeling. However, no ZsGreen+ hematopoietic cells were detected in these recipients, demonstrating that hematopoietic progenitors from Cdh5-CreERT2/ZsGreen mice and their descendants do not undergo tamoxifen-induced recombination.

      Together, the single-cell transcriptional and transplantation data demonstrate that CreERT2 expression and tamoxifen-induced recombination are restricted to Cdh5-expressing ECs (except for pDCs). These findings support the conclusion that ZsGreen+ hematopoietic cells arise from adult bone marrow ECs rather than from contaminating hematopoietic progenitors.

      One important missing experiment is to trace how ECs actually do this hematopoietic conversion: meaning, which populations of HSPCs are being produced by adult ECs in the first instance? LT-HSCs? ST-HSCs? MPPs? GMPs? All of the above? What are the kinetics? Differentiation is likely to follow a hierarchical path, but this is unclear at the moment.

      We agree that defining the earliest EC-derived hematopoietic cell progenitors and the kinetics by which these progenitors appear (LT-HSC vs ST-HSC/MPP vs lineage-restricted progenitors) would provide important insights into adult EHT.

      In the current genetic labeling system, a rigorous kinetic analysis of the hematopoietic cells first generated from ECs in vivo is not straightforward. Specifically, the low-level baseline ZsGreen+ reporter fluorescence in hematopoietic cells (dependent on EHT occurring prenatally, perinatally, or in young mice, or on other causes; Figure 1 A-D and Figure S1 D-I) impairs the identification of newly generated ZsGreen+ progenitors at early time points and their distinction from baseline fluorescence. A potential solution might be to introduce serial harvests across multiple time points in large mouse cohorts to capture rare transitional events with statistical significance.

      We wish to emphasize that the primary objective of this study was to establish whether adult bone marrow ECs have a hemogenic potential. Our data demonstrate adult EC-derived hematopoietic cell output that includes progenitor-containing fractions and multilineage mature progeny, under both steady-state conditions. We acknowledge that the current work does not resolve the order and kinetics of hematopoietic cell emergence following EHT. Therefore, under “Limitations of the study” we explicitly state this limitation and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.

      One warning sign is how rare the reported phenomenon is. Even when labeling almost 90% of the BM ECs, these make at most ~3% of blood (less than 1% in the transplants in Figure 4F, less than 0.5% in the col1a2 tracing in Figure 7). This means this is a very rare and/or transient phenomenon... The most major warning sign is the fast kinetics of labeling and the fast plateau. We know that: a) differentiation typically follows some hierarchy, b) in situ dynamics of blood production are slow (work by Rodewald and Höfer). Considering how fast these populations need to be replaced to reach a steady state so rapidly (as reported here, 2-4 weeks), the presumably specialized ECs would need to be steadily dividing and producing hematopoietic cells at a fast pace (as a side prediction, the adult "EHT" cluster would likely be highly Mki67+). More importantly, the ZsGreen LSKs produced by the ECs would have to undergo VERY rapid differentiation (much faster than normal LSKs) or otherwise, if 3% of them are produced by a top compartment (the BM ECs) every 4 weeks, then the labeled population would continue to grow with time. The authors could try to challenge this by testing if the ZsGreen LSKs undergo much faster differentiation kinetics or lower self-renewal (which does not seem to be the case, at least in their own transplantation data). We believe a more likely explanation is that the label is being acquired more or less non-specifically, directly across a bunch of HSPC populations.

      The Reviewer correctly notes that the population of hemogenic ECs in the adult mouse bone marrow is small and that the output of hematopoietic cells from these hemogenic ECs accounts for at most 3% of blood cells. We agree that delineating the kinetics by which hematopoietic cells are generated from adult ECs is important, as this information would provide important insights into adult EHT.

      Nonetheless, we believe that the rapid appearance and early plateau of labeled blood cells in our experiments may not derive from a sustained, high-rate generation of labeled blood cells from self-renewing top-tier hematopoietic cell compartments, such as LT-HSCs. Rather, our data are more consistent with a predominantly lineage-restricted and biased hematopoietic progenitor cell population being the source of labeled blood cells. Supporting this interpretation, longitudinal analysis of peripheral blood shows that EGFP<sup>+</sup> PBMCs are consistently enriched with myeloid cells, whereas EGFP<sup>-</sup> PBMCs are predominantly B cells (Figure 4G and H). This myeloid lineage skewing is stable over time and contrasts with what would be expected if labeling were acquired broadly and nonspecifically across the hematopoietic hierarchy. Therefore, our results are more consistent with myeloid biased progenitors being among the first populations that EHT generates.

      We acknowledge that our studies do not identify the earliest endothelial-derived hematopoietic cells produced in vivo and do not define their differentiation kinetics. Rigorously addressing these questions would require temporally resolved lineage tracing with sufficiently powered cohorts at early time points to statistically distinguish newly labeled cells from baseline reporter background. These important experiments were beyond the scope of the present study. As noted above, under “Limitations of the study” we explicitly state this limitation and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.

      Transplant experiments in Figure 4 do offer a crucial experiment in support of the main conclusion of the manuscript. These experiments show that transplanted LSKs bearing the Cdh5-CreERT2 and ZsGreen reporter cannot acquire the tamoxifen-induced label post-transplantation - suggesting that the label is coming from ECs. However, it is also possible that the LSK Cdh5-CreERT expression is partly during the transplantation process... Indeed, we know through the aging data that the labeling is less active in aged mice. In any case, this would be verified by qPCR/western-blot (comparing native vs post-transplant LSKs).

      We agree with the Reviewer that the transplant experiments in Figure 4A-D “offer a crucial experiment in support of the main conclusion of the manuscript.” The results of this experiment show that ZsGreen-negative LSKs from the Cdh5-CreERT2-ZsGreen reporter mice do not acquire tamoxifen-induced ZsGreen fluorescence post transplantation, supporting the endothelial cell origin of ZsGreen<sup>+</sup> blood cells.

      The Reviewer raises the possibility “that the LSK Cdh5-CreERT expression is partly during the transplantation process...”, and that this Cdh5-CreERT expression may occur slowly, as learned “through the aging data that the labeling is less active in aged mice.” As we show in Figure 3F, tamoxifen administration induced a similar percentage of ZsGreen<sup>+</sup> ECs in the bone marrow of Cdh5-Cre<sup>ERT2</sup>(BAC)/ZsGreen mice, whether tamoxifen was administered to 6-week-old, 16-week-old, 26-week-old or 36-week-old mice. Similar results with Cdh5-CreERT2 (BAC) mice are reported in the literature[2]. Since the mice transplanted with ZsGreen<sup>-</sup> LSKs were followed for 25 weeks after tamoxifen administration, we believe that the results in Figure 4A-D address the concern raised by the Reviewer.

      Supporting the conclusion that LSKs from the Cdh5-CreERT2-ZsGreen reporter mice do not express Cdh5-CreERT2 in a native (non-transplant) setting, we now provide transcriptomic data from non-transplanted Cdh5-CreERT2/ZsGreen mice showing that CreERT2 expression closely tracks with expression of canonical endothelial markers (Cdh5, Pecam1, Emcn) and is not detectable in Ptprc/CD45-expressing hematopoietic cells (Author response images 1 and 2). These data were obtained from non-transplanted mice treated with tamoxifen at ~12 weeks of age and analyzed four weeks later. Together, these results indicate that CreERT2 expression is endothelial-restricted in Cdh5-CreERT2-ZsGreen reporter mice.

      Figure 5 presents PolyLox experiments to challenge whether adult ECs produce hematopoietic cells through in situ barcoding. Several important details of the experiment are missing in the main text (how many cells were labeled, at which time point, how long after induction were the cells sampled, how many bones/BM-cells were used for the sample preparation, what was the sampling rate per population after sorting, how many total barcodes were detected per population, how many were discarded/kept, what was the clone-size/abundance per compartment). As presented, the authors imply that 31 out of ~200 EC barcodes are shared with hematopoietic cells... This would suggest that ~15% of endothelial cells are producing hematopoietic cells at steady state. This does not align well with the rarity of the behavior and the steady state kinetics (unless any BM EC could stochastically produce hematopoietic cells every couple of weeks, or if the clonality of the BM EC compartment would be drastically reduced during the pulse-chase overlap with mesenchymal cells. Important controls are missing, such as what would be the overlap with a population that is known to be phylogenetically unrelated (e.g., how many of these barcodes would be found by random chance at this same Pgen cut-off in a second induced mouse). Also, the Pgen value could be plotted directly to see whether the clones with more overlapping populations/cells (3HG, 127, 125, CBA) also have a higher Pgen. We posit that there are large numbers of hematopoietic clones that contribute to adult hematopoiesis (anywhere from 2,000-20,000 clones would be producing granulocytes after 16 weeks post chase), and it would be easy to find clones that overlap with granulocytes (the most abundant and easily sampled population) - HSPCs would be the more stringent metric.

      We thank the Reviewer for highlighting the need for a more detailed description of the Polylox experiments. To address this deficiency, we have compiled a document (Additional Supplementary Information file) containing all the specifics of the Polylox experimental and analytical parameters in one location. This includes: (i) the number of cells analyzed per population, (ii) the time points of induction and sample collection, (iii) the number of bones and total bone marrow cells used for preparation, (iv) the sampling rate following cell sorting, (v) the total number of detected barcodes per population, (vi) barcode filtering criteria and numbers retained or discarded, and (vii) clone-size and barcode number across cell compartments. We have updated the manuscript to refer readers to this Supplementary file.

      The Reviewer concluded from our results (Figure 5, Figure S5) that 31 out of ~200 endothelial cell (EC) barcodes are shared with hematopoietic cells (HCs), implying that ~15% of ECs produce hematopoietic cell progeny at steady state. This interpretation is inconsistent with our data showing the rare nature of adult EHT and would require either that a large fraction of bone marrow ECs can generate hematopoietic cells within short time windows, or that ECs would clonally expand rapidly during the pulse-chase period, as noted by the Reviewer. The explanation for this apparent problem is technical. Briefly, the ~200 EC barcodes recovered do not represent all barcoded ECs. During Polylox barcode library construction, a mandatory size-selection step is applied prior to PacBio sequencing, retaining fragments that are approximately 800–1500 bp in length, whereas the full Polylox cassette spans ~2800 bp. This is mainly because the PacBio sequencer requires that the library be either 800–1500 bp or over 2500 bp for optimal sequencing results. As described in the original Polylox publications[3,4], this size selection eliminates most (approximately 75%) of the longer barcodes, together with ~85% of the shorter barcodes. Thus, ECs harboring very long or very short recombined barcodes are under-represented or excluded from sequencing. As a result, the 22 true barcodes linking ECs and HCs recovered from sequencing do not indicate that ~10–15% of ECs generate hematopoietic progeny. Rather, these barcodes represent a highly selected subset of ECs with barcode configurations compatible with library recovery and sequencing. The observed EC–HC barcode sharing thus reflects qualitative lineage connectivity, not the quantitative frequency of endothelial-derived hematopoiesis at steady state.
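
      A minimal sketch of the naive ratios at issue in this exchange may help the reader follow the numbers. It uses only the counts quoted above (31 and 22 shared barcodes, ~200 recovered EC barcodes, ~75% and ~85% loss at size selection). Treating "~200" as exactly 200 and the function name `naive_sharing_fraction` are assumptions made for illustration, and the sketch does not model how size selection biases the recovered subset.

```python
# Naive barcode-sharing ratios discussed in the text above.
# The counts 31, 22 and ~200 are quoted from the text; treating ~200 as
# exactly 200 is an assumption made only for this illustration.
EC_BARCODES_RECOVERED = 200


def naive_sharing_fraction(shared_barcodes, recovered_barcodes=EC_BARCODES_RECOVERED):
    """Fraction of recovered EC barcodes that are also found in hematopoietic cells."""
    return shared_barcodes / recovered_barcodes


reviewer_fraction = naive_sharing_fraction(31)  # count cited by the Reviewer
revised_fraction = naive_sharing_fraction(22)   # count cited in the response

# Approximate fractions of barcodes that survive size selection, per the text.
retained_long = 1 - 0.75
retained_short = 1 - 0.85

print(f"naive sharing fractions: {reviewer_fraction:.1%} and {revised_fraction:.1%}")
print(f"retained after size selection: long {retained_long:.0%}, short {retained_short:.0%}")
```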

      The Reviewer correctly notes that true Polylox barcodes are shared by ECs and mesenchymal-type cells and asks that we examine whether this overlap could occur by chance alone. The Polylox filtering threshold (pGen < 1 × 10<sup>-6</sup>), which we have revised for stringency from pGen < 1 × 10<sup>-4</sup> without altering the essential results (new Figure S4 and revised Figure 5C-F), renders such overlap exceedingly unlikely. At this threshold, the expected number of random recombination events among 4,069 barcoded cells is approximately 0.004. Consequently, among the 87 mesenchymal cells identified here, fewer than 0.4 cells would be expected to share a barcode with another cell by chance alone. Thus, the probability of recovering identical barcodes across unrelated lineages due to random recombination is vanishingly small, and the observed EC–mesenchymal barcode sharing substantially exceeds random expectation.
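
      The chance-overlap figures quoted above can be reproduced with a short sketch. It assumes one plausible reading of the arithmetic, namely that the expected number of random events is the number of barcoded cells multiplied by the pGen threshold, and that the per-population expectation is that value multiplied by the number of mesenchymal cells. The variable and function names are hypothetical, and the actual Polylox statistics may be computed differently.

```python
# Values quoted in the response; the multiplication scheme is an assumed reading.
P_GEN_THRESHOLD = 1e-6       # revised filtering threshold
P_GEN_THRESHOLD_OLD = 1e-4   # previous, less stringent threshold
N_BARCODED_CELLS = 4069
N_MESENCHYMAL_CELLS = 87


def expected_random_events(n_cells, p_gen):
    """Upper-bound expectation: each cell carries a barcode with probability <= p_gen."""
    return n_cells * p_gen


expected_events = expected_random_events(N_BARCODED_CELLS, P_GEN_THRESHOLD)
expected_events_old = expected_random_events(N_BARCODED_CELLS, P_GEN_THRESHOLD_OLD)

# Expected number of mesenchymal cells sharing a barcode with another cell by chance.
expected_sharing = N_MESENCHYMAL_CELLS * expected_events

print(f"expected random events at 1e-6: {expected_events:.4f}")
print(f"expected random events at 1e-4: {expected_events_old:.4f}")
print(f"expected chance sharing among mesenchymal cells: {expected_sharing:.3f}")
```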

      Related to this observation, the Reviewer correctly notes that the endothelial and mesenchymal cell lineages are phylogenetically unrelated. However, endothelial-to-mesenchymal cell transition (EndMT), the process by which normal ECs completely or partially lose their endothelial identity and acquire expression of mesenchymal markers, is a well-established process that occurs physiologically and in disease states (Simons M, Curr Opin Physiol 2023). In the bone marrow, the occurrence of EndMT has been documented in patients with myelofibrosis, and the process affects the bone marrow microvasculature (Erba BG et al., Am J Pathol 2017). Single-cell RNAseq of non-hematopoietic bone marrow cells has shown the existence of a rare population of ECs that co-expresses endothelial cell markers (Cdh5, Kdr, Emcn and others) and mesenchymal cell markers, as shown in Figure 6E and F.

      We fully agree with the Reviewer that, given the large number of hematopoietic clones contributing to adult hematopoiesis (particularly granulocyte-producing clones), it may be relatively easy to detect barcode overlap with abundant mature populations, whereas overlap with HSPCs would represent a more stringent and informative metric of lineage relationships. The Polylox results presented here show the sharing of true barcodes between individual ECs and HSPCs.

      Reviewer #2 (Public review):

      Summary:

      Feng, Jing-Xin et al. studied the hemogenic capacity of the endothelial cells in the adult mouse bone marrow. Using Cdh5-CreERT2 in vivo inducible system, though rare, they characterized a subset of endothelial cells expressing hematopoietic markers that were transplantable. They suggested that the endothelial cells need the support of stromal cells to acquire blood-forming capacity ex vivo. These endothelial cells were transplantable and contributed to hematopoiesis with ca. 1% chimerism in a stress hematopoiesis condition (5-FU) and recruited to the peritoneal cavity upon Thioglycolate treatment. Ultimately, the authors detailed the blood lineage generation of the adult endothelial cells in a single cell fashion, suggesting a predominant HSPCs-independent blood formation by adult bone marrow endothelial cells, in addition to the discovery of Col1a2+ endothelial cells with blood-forming potential, corresponding to their high Runx1 expressing property.

      The conclusion regarding the characterization of hematopoietic-related endothelial cells in adult bone marrow is well supported by data. However, the paper would be more convincing, if the function of the endothelial cells were characterized more rigorously.

      We thank the Reviewer for the supportive comments about our study.

      (1) Ex vivo culture of CD45-VE-Cadherin+ZsGreen EC cells generated CD45+ZsGreen+ hematopoietic cells. However, given that FACS sorting can never achieve 100% purity, there is a concern that hematopoietic cells might arise from the ones that got contaminated into the culture at the time of sorting. The sorting purity and time course analysis of ex vivo culture should be shown to exclude the possibility.

      We agree that FACS sorting can never achieve 100% cell purity and that sorting purity is critical for interpreting the ex vivo culture experiments presented in our study. As requested by the Reviewer, we have now documented the purity of the sorted endothelial cell (EC) population used in the ex vivo culture experiments. The post-sort purity of CD45<sup>-</sup>VE-cadherin<sup>+</sup>ZsGreen<sup>+</sup> ECs was 96.5%; these data are now shown in the revised Figure 2B (Post Sort Purity panel). This purity level is comparable to the purity levels of sorted ECs shown in Figure S2I (94.5%).

      While we agree that a detailed time-course analysis of hematopoietic cell output from EC cultures could further strengthen the conclusion that bone marrow ECs can produce hematopoietic cells ex vivo, we wish to call attention to the additional critical control in the experiment shown in Figure 2B-D. In this experiment, we co-cultured CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells from Cdh5-CreERT2/ZsGreen mice, rather than ECs, and examined whether these hematopoietic cells could produce ZsGreen<sup>+</sup> cell progeny after 8 weeks of culture under the same conditions used in EC co-cultures (conditions not designed to support hematopoietic cells long-term). Unlike ECs, the CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells did not generate ZsGreen<sup>+</sup> hematopoietic cells by the end of the 8-week culture, indicating that the culture conditions are not permissive for the maintenance, proliferation and differentiation of hematopoietic cells. This provides strong evidence that even if a few hematopoietic cells contaminated the sorted ECs, these hematopoietic cells would not contribute to the EC-derived production of hematopoietic cells at the 8-week time point. We have revised the Results text describing Figure 2B-D accordingly.

      (2) Although it was mentioned in the text that the experimental mice survived up to 12 weeks after lethal irradiation and transplantation, the time-course kinetics of donor cell repopulation (>12 weeks) would add a precise and convincing evaluation. This would be absolutely needed as the chimerism kinetics can allow us to guess what repopulation they were (HSC versus progenitors). Moreover, data on either bone marrow chimerism assessing phenotypic LT-HSC and/or secondary transplantation would dramatically strengthen the manuscript.

      The original manuscript reported survival and engraftment up to 12 weeks post transplantation. The recipient mice have now been monitored for up to 10 months post transplantation. These extended survival and engraftment data are now included in the revised Figure 2I and J replacing the previous 10-week analyses.

      We agree with the Reviewer that the time-course kinetics of donor cell repopulation would help define adult endothelial-to-hematopoietic transition (EHT) and the hematopoietic cell types produced by adult EHT. We did not perform serial time-course sampling of peripheral blood beyond the 10-week and the 10-month time points. Given that the recipient mice were lethally irradiated and therefore had increased susceptibility to infection, we sought to minimize repeated interventions that could compromise animal health and survival. We therefore prioritized long-term survival and endpoint analysis over repeated longitudinal sampling. Nonetheless, the long-term survival (10 months) and multilineage hematopoietic cell reconstitution after lethal irradiation provide functional evidence that adult EHT produced at least some LT-HSCs.

      We acknowledge that phenotypic assessment of bone marrow LT-HSC chimerism and/or secondary transplantation would further strengthen the manuscript. We have clarified these limitations in the revised manuscript under “Limitations of the study”.

      (3) The conclusion by the authors, which says "Adult EHT is independent of pre-existing hematopoietic cell progenitors", is not fully supported by the experimental evidence provided (Figure 4 and Figure S3). More recipients with ZsGreen+ LSK must be tested.

      We agree with the Reviewer that, in most cases, a larger number of experimental data points helps to strengthen conclusions, and that having additional mice transplanted with ZsGreen-enriched LSKs would be desirable. However, we do not believe that additional mice transplanted with ZsGreen LSKs would strengthen the conclusions drawn from the experimental results shown in Figure 4D, in which we used 6 mice transplanted with ZsGreen-depleted (ZsGreen<sup>-</sup>) LSKs and 2 mice transplanted with ZsGreen-enriched (ZsGreen<sup>+</sup>) LSKs. Our conclusion that adult EHT is independent of “pre-existing hematopoietic cell progenitors” rests on the following experimental results.

      First, ZsGreen<sup>-</sup> LSKs (purity 99%) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 6). These ZsGreen<sup>-</sup> LSKs robustly reconstituted hematopoiesis, demonstrating successful engraftment. Importantly, tamoxifen administration to the recipients of ZsGreen<sup>-</sup> LSKs produced no detectable ZsGreen<sup>+</sup> cells in the blood for up to 6 months post transplantation (Figure 4D, blue line encompassing the results of the 6 mice). This result demonstrates that the transplanted ZsGreen<sup>-</sup> hematopoietic progenitors and their progeny do not acquire ZsGreen labeling in vivo following tamoxifen treatment, indicating that they lack the Cre-recombinase. This result is consistent with the endothelial specificity of Cdh5 expression.

      Second, ZsGreen<sup>+</sup> LSKs (accounting for ~50% of the LSKs) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 2). This arm of the experiment was performed in part as a technical control to confirm successful engraftment and detection of ZsGreen<sup>+</sup> hematopoietic cells in the transplant setting. Importantly, following tamoxifen administration to the two recipients of ZsGreen<sup>+</sup> LSKs (Figure 4D, two green lines reflecting these two mice), the level of ZsGreen<sup>+</sup> blood cells stabilized in each of the mice between weeks 10 and 24, showing equilibrium between the proportions of ZsGreen<sup>+</sup> and ZsGreen<sup>-</sup> cells in the blood. This indicates that pre-existing ZsGreen<sup>+</sup> LSKs are not responsible for tamoxifen-induced increases in ZsGreen<sup>+</sup> hematopoietic cells in blood.

      Together, the results from this experiment demonstrate that in the setting of transplantation, tamoxifen does not induce ZsGreen labeling of ZsGreen<sup>-</sup> hematopoietic progenitors or their progeny. This result strongly supports the conclusion that ZsGreen<sup>+</sup> hematopoietic cells arise independently of pre-existing or inducible hematopoietic progenitors. We have revised the text to clarify these experiments and to present the results in a simplified manner.

      Strengths:

      The authors used multiple methods to characterize the blood-forming capacity of the genetically and phenotypically defined endothelial cells from several reporter mouse systems. The polylox barcoding method to trace the adult bone marrow endothelial cell contribution to hematopoiesis is a strong insight to estimate the lineage contribution.

      Weaknesses:

      It is unclear what the biological significance of the blood cells de novo generated from the adult bone marrow endothelial cells is. Moreover, since the frequency is very rare (<1% bone marrow and peripheral blood CD45+), more data regarding its identity (function, morphology, and markers) are needed to clearly exclude the possibility of contamination/mosaicism of the reporter mice system used.

      We agree that the biological significance and functional roles of hematopoietic cells generated de novo from adult bone marrow ECs remain important open questions. We also agree that the output of hematopoietic cells from adult EHT is low, but rare events can be important, particularly as they pertain to stem/progenitor cell biology. Both points are described under “Limitations of the study”. The primary goal of the present study was to address the question of whether adult bone marrow ECs can undergo EHT. We believe that the combination of various transgenic mouse lines, different Cre-ER drivers, and different reporters (ZsGreen and mTmG, as well as the single-cell barcoding reporter PolyloxExpress), together with different approaches to evaluate hematopoiesis in vivo and ex vivo, makes it rather unlikely that our conclusions are driven by an artifact related to a specific leaky reporter, contamination, or problems with one of the Cre lines. The experiment in which we find no tamoxifen-induced labeling of transplanted ZsGreen<sup>-</sup> LSKs derived from the Cdh5-CreERT2/ZsGreen mice is strongly supportive of the existence of adult EHT, virtually excluding a contribution of contaminant hematopoietic cells.

      Reviewer 2 Recommendations for the authors:

      (1) There is a discrepancy in the proportion of peripheral blood composition between different reporters (mTmG and ZsGreen) (Figure 1G and Figure S1K), especially the contrasting B cell proportion between both models. The additional comments on this data should be mentioned.

      In the revised Results section, we now note that the mTmG and ZsGreen reporters show slightly different efficiencies or kinetics of labeling. These differences have previously been reported[5] and have been attributed to relative reporter leakiness, sensitivity to tamoxifen, or different kinetics of Cre recombination. As suggested, these comments have been added to the text following the description of Figure S2A.

      (2) Experimental methods concerning cell transplantation/transfer need more information, such as: a) using or not using rescue cells and how many cells are they if using, b) single or split dose of irradiation, c) when were cells transplanted following irradiation, etc. Otherwise, the data are uninterpretable.

      We have ensured that the Materials and Methods section under “Bone marrow ablation and transplantation” contains all the information requested by the Reviewer.

      (3) Some of the grouped data haven't been statistically analyzed.

      We have reviewed all data and performed appropriate statistical analyses where comparisons were made. In the revised figures and legends, all grouped datasets now include statistical tests and p-values are indicated (added to Fig. 3H and I; Figure 4G).

      (4) Some flowcytometry plot has the quantitative number, others do not. The quantitative information is absolutely needed in all flow cytometry plots.

      We have updated the flow cytometry figures to include quantitative values (percentages or absolute counts) in all relevant plots (2B (new figure, bottom left); 2C; S1G, S1H).

      (5) It is more relevant to present the Emcn/VE-Cadherin plot from gated CD45+/ZsGreen+, not the CD45-/ZsGreen+ fraction (Figure 2C), as the latter were not the EHT-derived offspring, but rather the common phenotypic endothelial cells

      As requested, we have added the suggested flow cytometry plot. The revised Figure 2C now includes an Emcn vs. VE-Cadherin plot from the gated CD45<sup>+</sup>ZsGreen<sup>+</sup> population. This complements the existing panel and confirms that the cells of interest retain endothelial cell markers after culture, while the CD45<sup>+</sup>ZsGreen<sup>+</sup> cells did not express endothelial markers. The figure legend has been updated to explain the new panel. We agree that this plot more directly highlights the phenotype of the presumed EHT-derived cells.

      (6) To show the effect of the ex vivo culture, the authors should present the absolute number of CD45+ZsGreen+ cells in the pre-/post-culture; otherwise, the data are uninterpretable (Figure 2D).

      Our interpretation of the Reviewer’s comment above (relative to the experiment shown in Figure 2B-D) is that the Reviewer would like us to provide the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells introduced into the co-culture (supplemented with unsorted BM cells, ZsGreen<sup>+</sup> hematopoietic cells or ZsGreen<sup>+</sup> ECs) and the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. Currently, the results in Figure 2D show the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. The input of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells was 2.93 × 10<sup>6</sup> on average for unsorted BM cells, 1.68 × 10<sup>6</sup> on average for ZsGreen<sup>+</sup> hematopoietic cells, and estimated at up to 100 for sorted ZsGreen<sup>+</sup> ECs.

      (7) It is confusing to see Figures 2F and 2G, which apparently show the data from the middle of the experimental procedure (Figure 2E). Those data should be labelled clearly regarding which procedures of the whole experiment protocol.

      As correctly noted by the Reviewer, Figures 2F and 2G provide data that relate to the middle of the graphical representation of the experiment shown in Figure 2E. We see how this may be confusing.

      Therefore, we have updated both the figure labeling and legend to explicitly indicate that Figure 2F and 2G provide the FACS sorting results for the cells used for transplantation. The revised legend now reads: “Representative flow cytometry plots of the non-adherent cell fraction after 8 weeks of co-culture (cells used for transplantation).”

      References

      (1) Kucinski, I., Campos, J., Barile, M., Severi, F., Bohin, N., Moreira, P.N., Allen, L., Lawson, H., Haltalli, M.L.R., Kinston, S.J., et al. (2024). A time- and single-cell-resolved model of murine bone marrow hematopoiesis. Cell Stem Cell 31, 244-259.e10. https://doi.org/10.1016/j.stem.2023.12.001.

      (2) Identification of a clonally expanding haematopoietic compartment in bone marrow. The EMBO Journal. https://link.springer.com/article/10.1038/emboj.2012.308.

      (3) Pei, W., Shang, F., Wang, X., Fanti, A.-K., Greco, A., Busch, K., Klapproth, K., Zhang, Q., Quedenau, C., Sauer, S., et al. (2020). Resolving Fates and Single-Cell Transcriptomes of Hematopoietic Stem Cell Clones by PolyloxExpress Barcoding. Cell Stem Cell 27, 383-395.e8. https://doi.org/10.1016/j.stem.2020.07.018.

      (4) Pei, W., Feyerabend, T.B., Rössler, J., Wang, X., Postrach, D., Busch, K., Rode, I., Klapproth, K., Dietlein, N., Quedenau, C., et al. (2017). Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature 548, 456–460. https://doi.org/10.1038/nature23653.

      (5) Álvarez-Aznar, A., Martínez-Corral, I., Daubel, N., Betsholtz, C., Mäkinen, T., and Gaengel, K. (2020). Tamoxifen-independent recombination of reporter genes limits lineage tracing and mosaic analysis using CreERT2 lines. Transgenic Res 29, 53–68. https://doi.org/10.1007/s11248-019-00177-8.

    1. eLife Assessment

      This study provides valuable insights into addressing the question of whether the prevalence of autoimmune disease could be driven by sex differences in the T cell receptor (TCR) repertoire, correlating with higher rates of autoimmune disease in females. The authors compared male and female TCR repertoires using bulk RNA sequencing, from sorted thymocyte subpopulations in pediatric and adult human thymuses; however, the analyses provided do not provide sufficient discrimination and incompletely support the central claims regarding sex differences in the TCR repertoire and potential autoimmune bias.

    2. Reviewer #1 (Public review):

      Summary:

      The goal of this paper was to determine whether the T cell receptor (TCR) repertoire differs between male and female humans. To address this, this group sequenced TCRs from double-positive and single-positive thymocytes in male and female humans of various ages. Such an analysis on sorted thymocyte subsets has not been performed in the past. The only comparable dataset is a pediatric thymocyte dataset where total thymocytes were sorted.

      They report on participant ages and sexes, but not on ethnicity or race, and they do not provide information about HLA typing of individuals. The experiments are heroic, yet do represent a relatively small sampling of diverse humans. They observed no differences in TCRbeta or TCRalpha usage, combinational diversity, the length of the CDR3 region, or amino acid usage in the CDR3aa region between males and females. Though they observed some TCRbeta CDR3aa sequence motifs that differed between males and females, these findings could not be replicated using an external dataset and therefore were not generalizable to the human population.

      They also compared TCRbeta sequences against those identified in the past databases using computational approaches to recognize cancer-, bacterial-, viral-, or autoimmune-antigens. They found little overlap of their sequences with these annotated sequences (depending on the individual, ranged from 0.82-3.58% of sequences). Within the sequences that were in overlap, they found that certain sequences against autoimmune or bacterial antigens were significantly over-represented in female versus male CD8 SP cells. Since no other comparable dataset is available, they could not conclude whether this is a generalizable finding in the human population.

      Strengths:

      It is a novel dataset that attempts to understand sex differences in the T cell repertoire in humans. Overall, the methodologies are sound and are the current state-of-the-art. There was an attempt to replicate their findings in cases where an appropriate dataset was available. I agree that there are no gross differences in TCR diversity between males and females. This is an important negative result.

      Weaknesses:

      Overall, the sample size is small given that it is an outbred population. This reviewer recognizes the difficulty in obtaining samples for this experiment (which were from deceased donors), and this limitation was appropriately discussed. Their analysis was limited by the current availability of other TCR sequences. These weaknesses were appropriately discussed and considered.

    3. Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues. In particular, the majority of "autoimmunity-related TCRs" considered in this study are in fact specific to type 1 diabetes (T1D). Notably, T1D incidence is higher in males, which directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. Given this conceptual inconsistency, the evidence presented does not support the authors' conclusions.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      Weaknesses:

      I thank the authors for their detailed responses to my previous comments. Several concerns were addressed satisfactorily; however, important issues remain unresolved, and a new major concern has emerged from the revised manuscript.

      Major concerns:

      (1) Autoimmune specificity is dominated by T1D, contradicting the study's premise. Newly added supplementary Table 3 shows that the authors considered only 14 autoimmune-related epitopes, of which 12 are associated with type 1 diabetes (T1D) and 2 with celiac disease (CeD). (I guess this is because identification of particular peptide autoantigens is an extremely difficult task and was only successful in T1D and CeD.) Thus conclusions of this work mostly relate to T1D. However, the incidence of T1D is higher in males than in females (e.g. doi:10.1111/j.1365-2796.2007.01896.x; doi:10.25646/11439.2). This directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. As a result, the authors' conclusions (a) cannot be generalized to autoimmune disease as a whole as the authors only considered T1D and CeD antigens and (b) are internally inconsistent with the stated objective of the study.

      (2) By contrast, CeD does show a female bias (~60/40 female/male; doi:10.1016/j.cgh.2018.11.013). However, the manuscript does not allow evaluation of how much the reported "autoimmune TCR enrichment" derives from T1D versus CeD. Despite my previous request, the authors did not provide per-donor and per-epitope distributions of autoimmune-specific TCR matches. I therefore explicitly request a table in which: each row corresponds to a specific autoimmune antigen; each column corresponds to a donor (with metadata available including sex); each cell reports the number of unique TCRs specific to that antigen in that donor. Without such data, the conclusions cannot be evaluated.
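
      The table requested in point (2) can be sketched as follows. This is only an illustration under stated assumptions: the input layout (one `(donor_id, antigen, cdr3aa)` tuple per database match), the donor IDs, the antigen labels, and the CDR3 strings are all hypothetical placeholders, not data from the manuscript.

```python
from collections import defaultdict

def antigen_by_donor_table(matches, donors):
    """Count unique TCRs per antigen (rows) and per donor (columns).

    matches: iterable of (donor_id, antigen, cdr3aa) tuples (assumed layout).
    donors:  ordered list of donor IDs that defines the columns.
    Returns {antigen: {donor_id: n_unique_cdr3}}, with zeros filled in.
    """
    seen = defaultdict(set)  # (antigen, donor_id) -> set of unique CDR3 sequences
    for donor_id, antigen, cdr3 in matches:
        seen[(antigen, donor_id)].add(cdr3)
    antigens = sorted({antigen for antigen, _ in seen})
    return {a: {d: len(seen.get((a, d), ())) for d in donors} for a in antigens}

# Hypothetical toy input; sex is kept as donor metadata next to the table.
donor_sex = {"D1": "F", "D2": "M"}
matches = [
    ("D1", "T1D_antigen_1", "CASSLGQAYEQYF"),
    ("D1", "T1D_antigen_1", "CASSLGQAYEQYF"),  # same TCR seen twice, counted once
    ("D1", "CeD_antigen_1", "CASSFRGTDTQYF"),
    ("D2", "T1D_antigen_1", "CASSPGTGGNEQFF"),
]
table = antigen_by_donor_table(matches, list(donor_sex))
```

      Keeping one row per antigen, rather than one pooled "AID" row, is what would let a reader separate the T1D and CeD contributions.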

      (3) It is scientifically inappropriate to generalize findings to "autoimmune diseases" when only T1D and CeD were analyzed. Moreover, given that T1D and CeD show opposite directions of sex bias, combining them into a single "AID" category is misleading. All analyses presented in Figure 8 and Supplementary Figure 16 should be repeated and shown separately for T1D and CeD, rather than combined.

      (4) The McPAS database contains TCRs associated with other autoimmune diseases (e.g., multiple sclerosis, rheumatoid arthritis), although the exact autoantigens in these contexts are unknown. Why didn't the authors perform the search for such TCRs? I believe disease association even without particular known antigen could still be insightful.

      (5) Misuse of the concept of polyspecificity. I appreciate the authors' reference to Don Mason's work; however, the concept of polyspecificity discussed there is fundamentally different from the authors' usage. Mason, Sewell (doi:10.1074/jbc.M111.289488), Garcia (doi:10.1016/j.cell.2014.03.047), and others demonstrated that individual TCRs can recognize multiple peptides, possibly around 1 million. But importantly these peptides are not random but share some sequence motif. This is a general feature of TCRs, i.e. 100% of TCRs are polyspecific in this sense.

      In contrast, the authors define polyspecificity as TRB sequences annotated as specific to unrelated epitopes in TCR databases such as VDJdb. These databases are well known to contain substantial numbers of false-positive annotations (see, e.g., Ton Schumacher's preprint https://www.biorxiv.org/content/10.1101/2025.04.28.651095.abstract). The authors acknowledge that, under their definition, polyspecificity has been experimentally validated for only one (!) TCR (Quiniou et al.). In the absence of robust experimental validation, use of the term "polyspecificity" in this context is misleading. I strongly recommend removing all analyses and conclusions related to polyspecificity from the manuscript unless supported by independent functional validation.

      (6) I agree that comparing specificity enrichment between sexes is meaningful. However, enrichment relative to the database composition itself is not biologically interpretable, as acknowledged by the authors in their response. I therefore recommend removing Supplementary Figure 15, which is potentially misleading.

      (7) In contrast, Supplementary Figure 16 represents the most convincing result of the study (keeping in mind that the AID group should be split into T1D and CeD, and that T1D and CeD have opposing directions of sex bias) and should be shown as a main figure, replacing Figure 8A-B, which is less convincing as it does not show the per-donor distribution.

      (8) The authors argue that applying mixed-effects modeling to Rényi entropy would require assuming a common sex effect across subsets. I do not find this assumption unreasonable. For example, if sex effects are mediated through AIRE-dependent negative selection, one would indeed expect a consistent direction of effect across subsets. The lack of statistical significance in Figure 3 may reflect limited sample size rather than true absence of the difference. Moreover, the title's phrasing "comparable TCR repertoire diversity" is vague: what is the statistical definition of "comparable"?

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study provides useful insights into addressing the question of whether the prevalence of autoimmune disease could be driven by sex differences in the T cell receptor (TCR) repertoire, correlating with higher rates of autoimmune disease in females. The authors compare male and female TCR repertoires using bulk RNA sequencing, from sorted thymocyte subpopulations in pediatric and adult human thymuses; however, the results do not provide sufficient analytical rigor and incompletely support the central claims.

      The statement in the editorial assessment that our study “does not provide sufficient analytical rigor” surprised us. TCR repertoire analysis is indeed a highly complex domain, both experimentally and computationally. We consider ourselves to be leading experts in this field and have invested a great deal of effort to ensure the rigor and reproducibility of every analytical step.

      Specifically, our group has previously benchmarked and published validated methodologies for the following areas: (i) TCR repertoire generation (Barennes et al., Nat Biotechnol 2021), (ii) repertoire analysis (Six et al., Frontiers in Immunol, 2013; Chaara et al., Frontiers in Immunol, 2018; Ritvo et al., PNAS, 2018; Mhanna et al., Diabetes, 2021; Trück et al., eLife, 2021; Quiniou et al., eLife, 2023; Mhanna et al., Cell Rep Methods, 2024; Mhanna et al., Nat Rev Primers Methods, 2024), and (iii) the curation and quality control of public TCR databases (Jouannet et al., NAR Genomics and Bioinformatics 2025). The current study applies these optimized and peer-reviewed pipelines, along with additional internal quality controls that we have implemented over the years, ensuring the highest possible analytical standards for TCR repertoire studies.

      We therefore respectfully feel that the phrase “insufficient analytical rigor” does not accurately reflect the methodological robustness of our work. This perception is also in contrast to the comment made by one of the reviewers, who explicitly noted that “overall, the methodologies appear to be sound.”

      We would therefore be grateful if, upon reviewing our detailed point-by-point responses, the editors could reconsider this statement and tone it down in the final editorial summary.

      With regard to the comment that our results “incompletely support the central claims”, we will leave it to the reader’s judgement. We believe that our work provides a robust and transparent basis for future research into TCR repertoire, autoimmunity, and women’s health.

      Reviewer 1 (Public reviews):

      Summary

      The goal of this paper was to determine whether the T cell receptor (TCR) repertoire differs between a male and a female human. To address this, this group sequenced TCRs from double-positive and single-positive thymocytes in male and female humans of various ages. Such an analysis on sorted thymocyte subsets has not been performed in the past. The only comparable dataset is a pediatric thymocyte dataset where total thymocytes were sorted.

      They report participant ages and sexes, but not ethnicity or race, and provide no information about HLA typing of individuals. Though the experiments themselves are heroic, they do represent a relatively small sampling of diverse humans. They observed no differences in TCRbeta or TCRalpha usage, combinatorial diversity, CDR3 length, or amino acid usage in the CDR3aa region between males and females. Though they observed some TCRbeta CDR3aa sequence motifs that differed between males and females, these findings could not be replicated using an external dataset and therefore were not generalizable to the human population.

      They also compared TCRbeta sequences against those identified in the past using computational approaches to recognize cancer-, bacterial-, viral-, or autoimmune-antigens. They found very little overlap of their sequences with these annotated sequences (depending on the individual, ranging from 0.82-3.58% of sequences). Within the sequences that were in overlap, they found that certain sequences against autoimmune or bacterial antigens were significantly over-represented in female versus male CD8 SP cells. Since no other comparable dataset is available, they could not conclude whether this is a finding that is generalizable to the human population.

      Strengths:

      This is a novel dataset. Overall, the methodologies appear to be sound. There was an attempt to replicate their findings in cases where an appropriate dataset was available. I agree that there are no gross differences in TCR diversity between males and females.

      We appreciate the positive feedback from the reviewer regarding these points.

      Weaknesses:

      Overall, the sample size is small given that it is an outbred population. The cleaner experiment would have been to study the impact of sex in a number of inbred MHC I/II identical mouse strains or in humans with HLA-identical backgrounds.

      We respectfully disagree with the reviewer’s statement. We firmly believe that the issue we are dealing with, namely sex-based differences in thymic TCR selection relevant to autoimmunity, should be investigated more thoroughly in the general human population than in inbred mouse models.

      While inbred mouse strains, being MHC I/II identical, eliminate the complexity of MHC variation, this comes at the cost of biological relevance. Firstly, a discrepancy in TCR generation or selection may only become apparent under specific MHC contexts, which could easily be overlooked when studying a single inbred strain. Secondly, inbred strains frequently contain fixed genetic variants that may influence thymic selection or immune regulation. This has the potential to introduce confounding effects rather than reduce them, and it does not solve the generalization issue.

      We are in full agreement that an HLA-matched human cohort would reduce inter-individual variability. However, such sampling is impossible in practice, as our thymic tissues were obtained from deceased organ donors, a collection effort that was, as the reviewer rightly noted, “heroic”. Despite these inherent limitations, the patterns we observed were consistent across multiple analytical approaches, lending robustness to our findings.

      We now explicitly acknowledge this limitation in the Discussion of the revised manuscript and explain why, despite this constraint, our study provides meaningful and biologically relevant insights into human TCR selection and sex-related immune differences.

      It is unclear whether there was consensus between the three databases they used regarding the antigens recognized by the TCR sequences. Given the very low overlap between the TCR sequences identified in these databases and their dataset, and the lack of replication, they should tone down their excitement about the CD8 T cell sequences recognizing autoimmune and bacterial antigens being over-represented in females.

      The three databases used in this study - McPAS-TCR, IEDB, and VDJdb - provide complementary and partially non-overlapping specificity landscapes. McPAS-TCR is enriched for pathology-associated TCRs, while IEDB and VDJdb contain a higher proportion of viral specificities. Combining them therefore broadens the antigenic spectrum accessible for analysis and represents the most comprehensive approach currently possible to capture the diversity of TCR–antigen annotations.

      With regard to the limited overlap between our dataset and these databases, this observation should be interpreted with caution. While the overlap may appear minimal at first glance, it is a biologically significant phenomenon. The public databases collectively contain only a minute fraction of the total universe of TCR specificities, estimated to exceed 10<sup>15</sup> to 10<sup>21</sup> possible receptors in humans. In this context, the observation of any overlap at all, particularly with coherent biological patterns such as the overrepresentation of autoimmune- and bacterial-associated TCRs in females, is noteworthy.

      We have included a short clarification in the Discussion of the revised manuscript to make this point explicit and to further temper the language describing this finding.

      The dataset could be valuable to the community.

      We thank the reviewer for highlighting the potential value of this dataset to the community. It will be made publicly available on the NCBI website. We would like to clarify that our intention has always been to make this dataset publicly available; therefore, we take back any incorrect suggestions made in the original submission.

      Reviewer #1 (Recommendations for the authors):

      I would just recommend toning down the excitement about autoimmune TCRs being overrepresented in females. Then the conclusions will be in alignment with their results.

      We thank the reviewer for this constructive recommendation. We would like to express our full support for the editorial transparency policies of eLife, which allow readers to access both the reviewers’ comments and our detailed responses, enabling them to form their own informed opinions regarding our conclusions.

      Nevertheless, we have moderated some of our wording.

      Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important, and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues that require substantial improvement. In several instances, the authors conclude that there are no sex-associated differences for specific parameters, yet inspection of the data suggests visible trends that are not properly quantified. The authors should either apply more appropriate statistical approaches to test these trends or provide stronger evidence that the observed differences are not significant. In other analyses, the authors report the differences between sexes based on a pooled analysis of TCR sequences from all the donors, which could result in differences driven by one or two single donors (e.g., having particular HLA variants) rather than reflect sex-related differences.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      We thank the reviewer for highlighting the potential value of this dataset to the community. It will be made publicly available on the NCBI website. We would like to clarify that our intention has always been to make this dataset publicly available; therefore, we take back any incorrect suggestions made in the original submission.

      Weaknesses:

      Major:

      The authors state that there is "no clear separation in PCA for both TRA and TRB across all subsets." However, Figure 2 shows a visible separation for DP thymocytes (especially TRA, and to a lesser degree TRB) and also for TRA of Tregs. This apparent structure should be acknowledged and discussed rather than dismissed.

      We thank the reviewer for this careful observation. Discussing apparent “trends” rather than statistically significant results is indeed a nuanced issue, as over-interpretation of visual patterns is usually discouraged. We agree that, within the specific context of TCR repertoire analyses, visual structures in multivariate projections such as PCA can provide useful contextual information.

      However, we have not identified a striking trend in our representation. We therefore chose to avoid overemphasizing these visual impressions in the text.

      Supplementary Figures 2-5 involve many comparisons, yet no correction for multiple testing appears to be applied. After appropriate correction, all the reported differences would likely lose significance. These analyses must be re-evaluated with proper multiple-testing correction, and apparent differences should be tested for reproducibility in an external dataset (for example, the pediatric thymus and peripheral blood repertoires later used for motif validation).
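
      As an illustration of the kind of correction requested here, the sketch below applies the Benjamini-Hochberg false discovery rate procedure to a list of p-values. The reviewer does not name a specific method, so the choice of Benjamini-Hochberg is an assumption, and the p-values are made-up numbers rather than values from Supplementary Figures 2-5.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from four comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

      With these toy numbers, three comparisons are nominally below 0.05 before correction and only one remains below 0.05 afterwards, which is the effect the reviewer anticipates.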

      As is standard in exploratory immunogenomic studies, including TCR repertoire analyses, our objective was to uncover broad biological patterns rather than to establish definitive statistical associations. In analyses that are discovery-oriented, correction for multiple testing, while essential in confirmatory contexts, is not mandatory and may even obscure meaningful trends by inflating type II error rates. Our objective was therefore to highlight consistent directional patterns across analytical layers, to guide future confirmatory work rather than to make categorical claims.

      We also note that this comment somewhat contrasts with the earlier suggestion to discuss trends that are not statistically significant.

      With regard to the proposal to verify our observations using an external dataset, we are in full agreement that independent confirmation would be beneficial. However, as reviewer 1 rightly emphasized, the generation of such datasets from sorted human thymocyte subsets is “heroic” and has rarely, if ever, been achieved. We are aware of no existing dataset that provides comparable material or analytical depth.

      The available single-cell thymic dataset (Park et al., Science 2020) includes only a few hundred sequences per donor, which is significantly less than the number of sequences in our study. This limited dataset is not adequate for cross-validation or for representing the full complexity of thymic TCR repertoires.

      As for the pediatric thymus dataset, the small number of female subjects (only three) leaves too little statistical power to evaluate sex-related differences in V/J usage.

      Finally, the peripheral blood dataset is not appropriate for validating thymic generation or selection processes, as it reflects post-thymic selection and antigen-driven remodeling, making it impossible to distinguish peripheral effects from thymic influences.

      For these reasons, none of the currently available datasets provides a sufficiently clean or powerful framework to test the reproducibility of subtle sex-associated effects on thymic TCR repertoires. Nevertheless, we fully agree that confirmation in an independent and larger cohort will be an important next step to refine these exploratory findings and assess their generalizability to a broader human population.

      Supplementary Figure 6 suggests that women consistently show higher Rényi entropies across all subsets. Although individual p-values are borderline, the consistent direction of change is notable. The authors should apply an integrated statistical test across subsets (for example, a mixed-effects model) to determine whether there is an overall significant trend toward higher diversity in females.

      We agree that Rényi entropies tend to show a consistent direction of change across subsets, with slightly higher values observed in females. In this section, our objective was to provide a descriptive overview of diversity patterns for each thymic subset. This is because these subsets are biologically distinct and therefore require individual analysis, as we previously demonstrated using the same dataset (Isacchini et al, PRX Life. 2024). Therefore, while a mixed-effects approach could in principle be applied to test for an overall trend, such an analysis would rely on the assumption of a common sex effect across heterogeneous cell types.

      It is important to note that the complete dataset has now been made publicly available, enabling interested researchers to perform additional integrative or model-based analyses to further explore these diversity trends.
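
      For readers who wish to run such analyses, the Rényi entropy of order α for clonotype frequencies p_i is H_α = log(Σ p_i^α) / (1 − α), with the Shannon entropy as the α → 1 limit. A minimal sketch follows; the use of the natural logarithm, the α values, and the toy clonotype counts are assumptions, not the settings used in the manuscript.

```python
import math

def renyi_entropy(counts, alpha):
    """Rényi entropy (natural log) of a repertoire given clonotype counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if alpha == 1:
        # Limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1 - alpha)

# Hypothetical repertoires: one even, one dominated by a single clonotype.
even = [5, 5, 5, 5]
skewed = [97, 1, 1, 1]
profile_even = [renyi_entropy(even, a) for a in (0, 1, 2)]
profile_skewed = [renyi_entropy(skewed, a) for a in (0, 1, 2)]
```

      Low α values weight all clonotypes similarly (α = 0 gives the log of richness), while higher α values emphasize dominant clonotypes, so a skewed repertoire shows a profile that drops as α increases.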

      Figures 4B and S8 clearly indicate enrichment of hydrophobic residues in female CDR3s for both TRA and TRB (excluding alanine, which is not strongly hydrophobic). Because CDR3 hydrophobicity has been linked to increased cross-reactivity and self-reactivity (see, e.g., Stadinski et al., Nat Immunol 2016), this observation is biologically meaningful and consistent with higher autoimmune susceptibility in females.

      We thank the reviewer for this insightful comment.

      As correctly noted, increased hydrophobicity at specific CDR3β positions has been linked to enhanced cross-reactivity and self-reactivity, as described by Stadinski et al. (Nat Immunol 2016), and we reference this work in the manuscript.

      In our analysis corresponding to Figure 4B (TRB), hydrophobicity was quantified at the sequence level by computing, for each unique CDR3β sequence, the overall proportion of hydrophobic amino acids across the CDR3 loop. This approach aligns with that of Lagattuta et al. (Nat Immunol 2022), whose code we adapted to accommodate longer CDR3s. This global hydrophobicity metric captures overall composition, but, by its construction, does not account for positional context, the key mechanism implicated by Stadinski et al.
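
      The sequence-level metric described above can be sketched as follows. This is not the adapted code of Lagattuta et al.; the hydrophobic residue set used here is an assumed placeholder (with alanine left out), and the CDR3 strings are toy examples.

```python
# Assumed hydrophobic set for illustration only; the manuscript's exact
# residue set is not stated in this response.
HYDROPHOBIC = frozenset("FILMVW")

def hydrophobic_fraction(cdr3):
    """Proportion of hydrophobic residues across one CDR3 amino acid sequence."""
    return sum(aa in HYDROPHOBIC for aa in cdr3) / len(cdr3)

def mean_fraction_unique(cdr3s):
    """Mean hydrophobic fraction over unique sequences (clonotype size ignored)."""
    unique = set(cdr3s)
    return sum(hydrophobic_fraction(s) for s in unique) / len(unique)

# Toy repertoire with one duplicated sequence.
repertoire = ["CASSLF", "CASSLF", "CAVF"]
score = mean_fraction_unique(repertoire)
```

      Because the score averages over the whole loop, two sequences with the same composition but different residue order receive the same value, which is why it cannot capture positional context.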

      As outlined in our original Figure 4C, the results were obtained through a position-based amino acid analysis. For each CDR3β sequence, we extracted the amino acid at every IMGT-defined CDR3 position (p104–p118) and quantified, at each position, the percentage of unique sequences containing each amino acid. Positions p109 and p110 correspond to the p6–p7 sites highlighted by Stadinski et al. as functionally relevant for self-reactivity. This analysis evaluates positional composition independently of clonotype frequency, focusing specifically on hydrophobic amino acid classes.

      Following the recommendation of the reviewer, alanine (which is only weakly hydrophobic) has been excluded from the hydrophobic residue set in the revised manuscript. With this refined definition, we observe a significant enrichment of hydrophobic amino acids at p109 in CD8 T cell repertoires from females, with similar but non-significant trends at p109 in DP and CD4 Teff cells and at p110 in CD8 cells (see new Figure 4C).
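
      A simplified sketch of this positional analysis is given below. It makes a strong assumption that should be stated plainly: only sequences of exactly 15 residues are used, so that string indices map one-to-one onto p104–p118 without the gap handling that IMGT numbering needs for shorter or longer loops. The sequences are toy examples.

```python
from collections import Counter

IMGT_POSITIONS = list(range(104, 119))  # p104 .. p118, 15 positions

def positional_usage(cdr3s):
    """Percent of unique sequences carrying each amino acid at each position.

    Simplification (assumption): only sequences whose length equals the number
    of positions are kept, so index 0 is p104 and index 14 is p118.
    """
    unique = {s for s in cdr3s if len(s) == len(IMGT_POSITIONS)}
    usage = {}
    for idx, pos in enumerate(IMGT_POSITIONS):
        counts = Counter(s[idx] for s in unique)
        usage[pos] = {aa: 100.0 * n / len(unique) for aa, n in counts.items()}
    return usage

# Toy input: one duplicate (counted once) and one short sequence (skipped).
seqs = ["CASSLGQGAYNEQFF", "CASSLGQGAYNEQFF", "CASSLLVGAYNEQFF", "CASSF"]
usage = positional_usage(seqs)
p109, p110 = usage[109], usage[110]
```

      Working on unique sequences mirrors the statement that the analysis is independent of clonotype frequency; a hydrophobic class percentage at p109 or p110 would then be the sum over the residues in the chosen hydrophobic set.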

      As outlined in the revised Methods, Results, and Discussion sections, Figure 4C focuses exclusively on positional hydrophobic amino acid usage. This was previously implicit, although it was noted in the legend and visually represented in the plots.

      The majority of "hundreds of sex-specific motifs" are probably donor-specific motifs confounded by HLA restriction. This interpretation is supported by the failure to validate motifs in external datasets (pediatric thymus, peripheral blood). The authors should restrict analysis to public motifs (shared across multiple donors) and report the number of donors contributing to each motif.

      We fully agree that donor-specific and HLA-restricted motifs represent a major potential confounder in repertoire-level comparisons. To minimize this potential bias, our analysis was explicitly restricted to public motifs, as clearly stated in the Materials and Methods section:

      “Additional filters were applied so that: (i) a motif includes public CDR3aa sequences (shared by at least two individuals); (ii) a significant enrichment is detected (Fisher’s exact test, p < 0.01); and (iii) a usage difference between groups of at least twofold (Wilcoxon test, p < 0.05).”

      Accordingly, every motif reported in the manuscript is supported by at least two independent donors, ensuring that no motif reflects an individual- or HLA-specific effect (see Supplementary Figures 10-13 [previously Supplementary Figure 9]). We have now added a more explicit mention of the number of donors contributing to each motif in the figure legend and have clarified this point in the revised Methods and Results sections to make this criterion more visible to readers.
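
      The first two filters quoted above can be sketched as follows. The layout of the 2x2 contingency table (sequences with and without the motif, in females and in males) and the function names are assumptions made for illustration; the third filter (twofold usage difference with a Wilcoxon test) is not implemented here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def passes_public_and_fisher(n_donors_with_motif, table, alpha=0.01):
    """Filters (i) and (ii): motif shared by >= 2 donors and Fisher p < alpha.

    table = (with_motif_F, without_motif_F, with_motif_M, without_motif_M),
    an assumed layout.
    """
    return n_donors_with_motif >= 2 and fisher_exact_two_sided(*table) < alpha

# Hypothetical counts for one motif.
p_small = fisher_exact_two_sided(3, 0, 0, 3)
keep = passes_public_and_fisher(3, (40, 960, 10, 990))
private = passes_public_and_fisher(1, (40, 960, 10, 990))
```

      Note that a threshold of two donors guarantees that a motif is not private to one individual, but it does not by itself show that the motif is spread evenly across donors of one sex.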

      When comparing TCRs to VDJdb or other databases, it is critical to consider HLA restriction. Only database matches corresponding to epitopes that can be presented by the donor's HLA should be counted. The authors must either perform HLA typing or explicitly discuss this limitation and how it affects their conclusions.

      We respectfully disagree with the assertion that HLA typing is necessary for the type of comparative analysis we have conducted. While it is true that HLA molecules present peptides to TCRs and thereby contribute to the tripartite interaction determining T cell activation, extensive evidence indicates that the CDR3 region, particularly CDR3β, is the dominant determinant of antigen specificity. This finding is supported by structural and computational studies (Madi et al., eLife, 2017; Huang et al., Nat. Biotech., 2020; Mayer-Blackwell et al., Methods Mol. Biol., 2022) showing that CDR3β residues are responsible for the majority of peptide contacts, whereas CDR1 and CDR2 primarily interact with the MHC framework.

      As emphasized in several recent benchmarking studies (e.g., Springer et al., Front Immunol, 2021), CDR3β sequence composition alone captures most of the information required for specificity inference. Consequently, widely used and validated computational tools such as GIANA (Zhang et al. Nat. Commun. 2021), iSMART (Zhang et al. Clin. Cancer Res. 2020), and ATMTCR (Cai et al. Front. Immunol. 2022) rely exclusively on CDR3β amino acid sequences and still achieve high predictive performance.

      Our analysis aligns with this well-established paradigm. While we agree that integrating donor HLA typing would refine epitope-level annotation and reduce potential noise, the absence of HLA data does not invalidate the comparative framework we used, which focuses on relative representation of annotated specificities across groups rather than on individual TCR–HLA–peptide triads.

      Although the age distributions of male and female donors are similar, the key question is whether HLA alleles are similarly distributed. If women in the cohort happen to carry autoimmune-associated alleles more often, this alone could explain observed repertoire differences. HLA typing and HLA comparison between sexes are therefore essential.

      To address the issue of any potential differences in HLA background, we examined the subset of adult donors for whom HLA typing information was available (HLA-A, HLA-B, HLA-DR, and HLA-DQB; n = 16). Within this subset, the distribution of HLA alleles was relatively balanced between males and females (as illustrated by the heatmap showing HLA class II expression patterns and HLA class I family grouping in Author response image 1). This analysis suggests that the sex-associated differences in the repertoire observed in our study are unlikely to be driven solely by unequal representation of autoimmune-associated HLA alleles.

      We acknowledge, however, that complete HLA information was not available for all donors, which remains a limitation of the dataset.

      Author response image 1.

      In some analyses (e.g., Figures 8C-D) data are shown per donor, while others (e.g., Fig. 8A-B) pool all sequences. This inconsistency is concerning. The apparent enrichment of autoimmune or bacterial specificities in females could be driven by one or two donors with particular HLAs. All analyses should display donor-level values, not pooled data.

      While Figures 8A–B present pooled data to summarize global trends, the corresponding donor-level analyses were provided in Supplementary Figures 15B and 16 (previously Supplementary Figures 11B and 12). In these plots, each point represents one individual. It is important to note that these donor-resolved plots do not reveal any sample-specific driver: the patterns observed in the pooled data remain consistent across donors, without any single individual accounting for the apparent enrichments. In the revised manuscript, readers are now directed to the relevant supplementary figures for further clarification.

      The reported enrichment of matches to certain specificities relative to the database composition is conceptually problematic. Because the reference database has an arbitrary distribution of epitopes, enrichment relative to it lacks biological meaning. HLA distribution in the studied patients and HLA restrictions of antigens in the database could be completely different, which could alone explain enrichment and depletions for particular specificities. Moreover, differences in Pgen distributions across epitopes can produce apparent enrichment artifacts. Exact matches typically correspond to high-Pgen "public" sequences; thus, the enrichment analysis may simply reflect variation in Pgen of specific TCRs (i.e., fraction of high-Pgen TCRs) across epitopes rather than true selection. Consequently, statements such as "We observed a significant enrichment of unique TRB CDR3aa sequences specific to self-antigens" should be removed.

      We respectfully disagree with the conclusion that our enrichment analysis lacks biological meaning. Our approach involves a direct comparison of the same set of observed TCR sequences between males and females. Consequently, any potential biases related to generation probability (Pgen), which affect all sequences equally, cannot account for the observed sex-specific differences. To summarize, because the comparison is performed on the same set of sequences, changes in the probability of generation across epitopes cannot explain the differences seen between the sexes.

      We do agree, however, that the composition of the reference databases may influence apparent enrichment patterns, as these resources contain uneven distributions of epitope categories and often incomplete information regarding HLA restriction. It should be noted that this limitation is inherent to all database-based annotation approaches, a fact which is explicitly acknowledged in the revised Discussion.

      The overrepresentation of self-specific TCRs in females is the manuscript's most interesting finding, yet it is not described in detail. The authors should list the corresponding self-antigens, indicate which autoimmune diseases they relate to, and show per-donor distributions of these matches.

      We thank the reviewer for this constructive suggestion.

      As recommended, we have expanded the description of the self-specific TCRs identified in our dataset and now provide this information in Supplementary Table 2 of the revised manuscript. Specifically, the table lists the corresponding self-antigens and the autoimmune diseases with which they are associated. In our curated database, these annotations primarily correspond to celiac disease and type 1 diabetes, which were the two autoimmune contexts explicitly defined in the manually curated reference datasets.

      For the “cancer” specificity group, we have clarified that antigen assignments were established based on (i) annotations available in the original databases (IEDB, VDJdb, McPAS-TCR) and (ii) cross-referencing with additional resources, including the Human Protein Atlas, the Cancer Antigenic Peptide Database (de Duve Institute), and the Cancer Antigen Atlas (Yi et al., iScience 2021), to ensure consistency in the classification of cancer and neoantigen specificities. Please refer to the Materials and Methods section for a full description of the procedure for this specific assignment.

      Donor-level distributions of these self-specific matches are now shown in Supplementary Figures 15B and 16 (previously Supplemental Figures 11B and 12), allowing direct visualization of inter-donor variability. Importantly, these plots confirm that the observed enrichment in females is not driven by a single individual, further supporting the robustness of the finding.

      The concept of poly-specificity is controversial. The authors should clearly explain how polyspecific TCRs were defined in this study and highlight that the experimental evidence supporting true polyspecificity is very limited (e.g., just a single TCR from Figure 5 from Quiniou et al.).

      We certainly agree (and regret) that the concept of TCR polyspecificity remains a subject of debate and is often underappreciated in the field of immunology. As Don Mason famously discussed in his seminal essay “A very high cross-reactivity is an essential feature of the TCR” (doi: 10.1016/S0167-5699(98)01299-7) published over 25 years ago, both theoretical and experimental evidence indicates that each TCR can, in principle, recognize millions of distinct peptides, albeit with variable avidity.

      Although this principle is widely accepted, it is frequently overlooked in the field of experimental immunology. In this area, anything that deviates from strict monospecificity is often disregarded as noise.

      In our own analyses of large-scale TCR repertoires, we have repeatedly observed that many CDR3 sequences are annotated with multiple specificities across different databases, often corresponding to peptides from unrelated organisms. As demonstrated in Quiniou et al. (eLife 2023), such polyreactive TCRs exhibit distinctive features, including biased physicochemical composition, and tend to be enriched in various biological contexts. In our preliminary study of such TCRs, which have the capacity to be specific for multiple viral and self epitopes, we hypothesized that they may serve as a first line of defense against pathogens and also be involved in triggering autoimmunity. We therefore consider it important to report this phenomenon rather than omit it, especially given its potential relevance to both protective immunity and autoimmunity.

      In the present study, polyspecific TCRs were defined operationally as TRB CDR3aa sequences associated with a minimum of two distinct specificity groups, corresponding either to different microbial species or to multiple antigen categories within the curated database. Therefore, our definition captures broader antigenic groupings rather than epitope-level binding events.
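
      The operational definition above can be sketched in a few lines. This is only a minimal illustration, assuming annotations are available as (CDR3, specificity group) pairs; the function name `find_polyspecific` and the toy sequences and group labels are hypothetical and not taken from the manuscript.

```python
from collections import defaultdict

def find_polyspecific(annotations, min_groups=2):
    """Return CDR3 sequences annotated with at least `min_groups`
    distinct specificity groups.

    `annotations` is an iterable of (cdr3aa, specificity_group) pairs.
    """
    groups_by_cdr3 = defaultdict(set)
    for cdr3, group in annotations:
        groups_by_cdr3[cdr3].add(group)  # a set ignores repeated annotations
    return {
        cdr3: sorted(groups)
        for cdr3, groups in groups_by_cdr3.items()
        if len(groups) >= min_groups
    }

# Toy annotations, made up for illustration only.
toy_annotations = [
    ("CASSLGQAYEQYF", "CMV"),
    ("CASSLGQAYEQYF", "self"),
    ("CASSLGQAYEQYF", "CMV"),   # same group reported by a second database
    ("CASSPGTGGNEQFF", "EBV"),
]
polyspecific = find_polyspecific(toy_annotations)
```

      Counting distinct groups rather than distinct database entries keeps a sequence that is reported several times for the same group from being labelled polyspecific.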

      We fully acknowledge that direct experimental evidence for true molecular-level polyspecificity remains limited. Indeed, as the reviewer notes, only a single TCR with multiepitope reactivity has been rigorously demonstrated to date (Quiniou et al., 2023). Consequently, our analysis does not make claims about structural promiscuity; instead, it uses database-annotated cross-reactivity as a proxy to explore broader repertoire-level patterns.

      As outlined in the Methods section, this definition has been clarified and its discussion expanded in the Discussion to explicitly address these conceptual and methodological nuances.

      Minor:

      Clarify why the Pgen model was used only for DP and CD8 subsets and not for others.

      As noted, computing Pgen values involves two steps: (i) training a generative model of V(D)J recombination using IGoR, and (ii) estimating generation probabilities with OLGA based on that model. Both steps require a significant amount of computing power, especially when applied to large repertoires across multiple subsets. For this reason, we focused the analysis on DP thymocytes, which represent the repertoire prior to thymic selection, and CD8 T cells after CD8 selection.

      The Methods section should define what a "high sequence reliability score" is and describe precisely how the "harmonized" database was constructed.

      Briefly, the annotated database used in this study was constructed in accordance with the procedure established in our previously published work (Jouannet et al., NAR Genomics and Bioinformatics, 2025). It integrates three publicly available resources, IEDB, VDJdb, and McPAS-TCR, which were collected as of October 2023. These three datasets were merged into a single harmonized compendium after extensive standardization. When entries shared identical information across databases (same V–CDR3–J for both TRA and TRB, same epitope, organism, PubMed ID, and cell subset), only one representative was kept; discrepant or incomplete entries were retained to preserve information. We then assigned a sequence reliability score, the Verified Score (VS), following the verification strategy used by IEDB. The scale ranges from 0 to 2 and reflects the concordance between calculated and curated TRA/TRB CDR3 sequences (2 = both TRA and TRB present are verified, 1.1 = only TRA verified, 1.2 = only TRB verified, 0 = no verified chain). A second score, the Antigen Identification Score (AIS), is used to rank antigen-identification methods on a scale of 0 to 5, according to the strength of the experimental evidence supporting them.

      In the present study, “high reliability” refers to sequences with a verified TRB CDR3aa chain (VS ≥ 1.2) and an AIS corresponding to in vitro stimulation of T cells with a pathogen, protein or peptide, or to pMHC X-mer sorting (> 3.2, excluding categories 4.1 and 4.2), ensuring that downstream analyses were performed on a rigorously curated and biologically trustworthy dataset. The Methods section now explicitly details these criteria.
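
      The filtering rule stated above can be written as a small predicate. This is a sketch only, assuming that VS and AIS are stored as plain numbers per entry; the tuple layout, the helper name `is_high_reliability`, and all toy sequences and score values are placeholders rather than entries from the actual database.

```python
# AIS categories excluded by the stated criterion.
EXCLUDED_AIS = {4.1, 4.2}

def is_high_reliability(vs, ais):
    """Apply the stated thresholds: VS >= 1.2 and AIS > 3.2,
    excluding AIS categories 4.1 and 4.2.

    Note that VS 1.1 (only TRA verified) fails the test, while
    VS 1.2 (only TRB verified) and VS 2 (both chains verified) pass.
    """
    return vs >= 1.2 and ais > 3.2 and ais not in EXCLUDED_AIS

# Toy entries as (cdr3aa, VS, AIS); all values are made up.
toy_entries = [
    ("CASSAAAEQYF", 2.0, 5.0),   # kept
    ("CASSBBBEQYF", 1.2, 3.3),   # kept
    ("CASSCCCEQYF", 1.1, 5.0),   # dropped: TRB chain not verified
    ("CASSDDDEQYF", 2.0, 4.1),   # dropped: excluded AIS category
    ("CASSEEEEQYF", 2.0, 3.2),   # dropped: AIS not above 3.2
    ("CASSFFFEQYF", 0.0, 5.0),   # dropped: no verified chain
]
kept = [cdr3 for cdr3, vs, ais in toy_entries if is_high_reliability(vs, ais)]
```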

      The statement "we generated 20,000 permuted mixed-sex groups" is unclear. It is not evident how this permutation corrects for individual variation or sex bias. A more appropriate approach would be to train the Pgen model separately for each individual's nonproductive sequences (if the number of sequences is large enough).

      The objective of this analysis was to determine whether the enrichment of TRBV06-5 in females was due to random grouping of individuals or whether it was attributable to sex itself. To do so, we generated all possible perfectly mixed groups of donors (i.e., groups containing an equal number of male and female donors) for the concerned thymocyte subset, and then performed 20,000 random pairwise comparisons between such mixed groups. For each comparison, we tested the TRBV06-5 usage between the two mixed groups. This procedure directly evaluates whether group composition (independent of sex) could spuriously generate differences in TRBV usage. Notably, none of these 20,000 comparisons between the two mixed groups yielded a statistically significant difference in TRBV06-5 usage. In contrast, when comparing the true male and female groups, a significant difference was identified. This demonstrates that the signal we observe is not driven by random donor grouping or individual-level variation, but is specifically associated with sex. It is important to note that this analysis, which is designed to exclude spurious group effects, is rarely performed in published repertoire studies, yet it provides an important internal control for robustness.
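
      The logic of this control can be sketched as follows. The response does not specify the statistical test applied in each comparison, so this illustration swaps in a simpler statistic, the absolute difference in mean usage between groups, and asks how often mixed-group comparisons reach the male versus female difference. The donor labels, usage values, group size, and function names are all hypothetical, and pairs of mixed groups are allowed to share donors, which is a further assumption.

```python
import itertools
import random
from statistics import mean

def mixed_groups(males, females, k):
    """All groups containing exactly k male and k female donors."""
    return [m + f
            for m in itertools.combinations(males, k)
            for f in itertools.combinations(females, k)]

def null_differences(groups, usage, n_comparisons, rng):
    """Absolute difference in mean usage for random pairs of mixed groups."""
    diffs = []
    for _ in range(n_comparisons):
        g1, g2 = rng.sample(groups, 2)  # two distinct mixed groups
        diffs.append(abs(mean(usage[d] for d in g1) - mean(usage[d] for d in g2)))
    return diffs

# Toy TRBV usage fractions per donor (made up for illustration).
usage = {"M1": 0.040, "M2": 0.042, "M3": 0.041, "M4": 0.039,
         "F1": 0.050, "F2": 0.052, "F3": 0.051, "F4": 0.049}
males = [d for d in usage if d.startswith("M")]
females = [d for d in usage if d.startswith("F")]

# Observed difference between the true sex groups.
observed = abs(mean(usage[d] for d in females) - mean(usage[d] for d in males))

groups = mixed_groups(males, females, k=2)
rng = random.Random(0)  # fixed seed for reproducibility
null = null_differences(groups, usage, 20000, rng)

# Fraction of mixed-group comparisons at least as extreme as the observed one.
fraction_as_extreme = sum(d >= observed for d in null) / len(null)
```

      In this toy setting every mixed group contains two donors of each sex, so group means stay close together and no mixed-group comparison approaches the difference between the true male and female groups.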

      Reviewer #2 (Recommendations for the authors):

      (1) Data availability "upon request" is unacceptable. All raw and processed data, as well as scripts used for analysis and figure generation, must be publicly deposited before publication.

      We would like to clarify that our intention has always been to make this dataset publicly available. It was a mistake to suggest otherwise in the original submission.

      (2) At the beginning of the Results section, include a brief description of the dataset: number of donors, sex ratio, age range, number of samples per subset, and sorting strategy. Although Figure 1 shows this, the information should also be mentioned in the main text.

      In line with the recommendation, we have now added a summary of the cohort characteristics at the beginning of the Results section. This includes the number of donors, sex ratio, age range, number of samples per subset, and the sorting strategy used. While this information was already included in Figure 1, we concur that including it directly in the main text enhances readability.

      (3) Report the number of cells and unique clonotypes analyzed per individual. Rank-frequency plots (in log-log coordinates) would be helpful.

      We have now added, for each donor and each subset, the number of cells, and additionally for each chain, the number of total and unique clonotypes analyzed. This information is provided in the revised manuscript in a new supplementary table (Supplemental Table 1).

      The requested rank-frequency plots have been integrated into the revised manuscript as Supplementary Figure 2.

      (4) For analysis in Figure 4B, the total fraction of hydrophobic amino acids should be calculated for each patient separately, and values for men and women should be compared (analogously to Figure 4C, but for the whole CDR3 and excluding alanine).

      Please note that the TRB CDR3aa composition in Figure 4B has already been quantified at the individual level. For each unique TRB CDR3aa sequence, we computed the proportion of each of the 20 amino acids across the CDR3β loop, then summarized these values per donor (mean per individual). The log2 fold change displayed in Figure 4B (and supplemental Figure 9 for TRA) is calculated from the median donor-level values for females versus males, rather than from pooled CDR3s. It is intended as a descriptive, “global” view of amino acid usage within the central CDR3 region. Hydrophobicity was not used directly in the computation, but is indicated only by bar color, based on the Kyte-Doolittle-derived IMGT classification. This provides an observational overview of amino acid composition in the central CDR3 region.
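
      The donor-level computation described above can be sketched as follows. This is a minimal illustration, assuming each donor is represented by a list of CDR3 amino acid strings and that proportions are taken over the whole string supplied; the function names and toy sequences are hypothetical, and returning `None` when a median is zero is an assumption (a pseudocount would be an alternative).

```python
import math
from collections import Counter
from statistics import mean, median

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_proportions(cdr3):
    """Proportion of each of the 20 amino acids within one CDR3 sequence."""
    counts = Counter(cdr3)
    return {aa: counts[aa] / len(cdr3) for aa in AMINO_ACIDS}

def donor_profile(cdr3_list):
    """Mean amino acid proportions over the unique CDR3s of one donor."""
    per_seq = [aa_proportions(s) for s in set(cdr3_list)]
    return {aa: mean(p[aa] for p in per_seq) for aa in AMINO_ACIDS}

def log2_fold_change(female_profiles, male_profiles, aa):
    """log2 of the ratio of median donor-level values (females over males)."""
    f = median(p[aa] for p in female_profiles)
    m = median(p[aa] for p in male_profiles)
    if f == 0 or m == 0:
        return None  # undefined ratio; handling is an assumption
    return math.log2(f / m)

# Toy repertoires, two donors per sex (sequences are made up).
female_profiles = [donor_profile(["AAGG", "AAGG"]), donor_profile(["AAGG"])]
male_profiles = [donor_profile(["AGGG"]), donor_profile(["AGGG"])]

lfc_alanine = log2_fold_change(female_profiles, male_profiles, "A")
lfc_glycine = log2_fold_change(female_profiles, male_profiles, "G")
```

      Because the fold change is computed from donor-level medians, a donor with a very large repertoire contributes one value, exactly like a donor with a small one.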

      As the mechanistic link between hydrophobicity and self-reactivity described by Stadinski et al. is explicitly position-dependent, we consider positional analyses to be the most appropriate method for formally interrogating this hypothesis, as we did in Figure 4C. Here, our primary focus was on the position-specific usage of hydrophobic amino acids at IMGT positions p109-p110. These positions correspond to the central p6-p7 positions described by Stadinski et al. For each individual, we computed the proportion of unique TRB CDR3aa sequences carrying a hydrophobic amino acid at a given position.

      Accordingly, in the revised manuscript we refined Figure 4C by excluding alanine due to its weak hydrophobic property (as recommended by the reviewer). This positional composition analysis now reveals a statistically significant increase in hydrophobic usage at p109 in female CD8 repertoires, with similar, though non-significant, trends at p109 in DP and CD4Teff and at p110 in CD8 cells. Figure 4B is therefore retained as an exploratory overview of amino acid composition usage along the CDR3 loop, while Figure 4C is used for the more specific question of hydrophobicity and potential cross-reactivity.

      The Methods section has been expanded to provide clearer descriptions of these computations, and the Results and Discussion sections corresponding to Figures 4B-C (and supplemental Figure 9) have been revised to make the rationale, implementation, and interpretation of these hydrophobicity analyses more explicit.

      (5) Figure 6 shows a trend toward higher clustering of Treg TCRs in males, which could relate to the lower incidence of autoimmunity in men. The authors could test whether specific Treg clusters are male-specific and shared among male donors.

      As shown in Figure 6, a clear trend towards higher similarity among Treg CDR3aa sequences in males is evident, as indicated by the proportion of sequences included in clusters and in the overall similarity density. However, identifying “male-specific clusters” shared across donors is not straightforward in our analytical framework.

      In our approach, for each cell subset, CDR3aa sequences were downsampled 100 times to the smallest sample size, and clustering was repeated at each iteration. Therefore, the clusters’ identities are not consistent across iterations. The clusters depend on the specific subset of sequences selected at each downsampling step, as well as on their underlying Pgen distribution. Therefore, it is not possible to reliably assess whether specific clusters are systematically “male-shared”. This is because cluster composition is a function of stochastic resampling rather than of biological structure. For this reason, a comparison of cluster identities across donors would not produce interpretable results.
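
      The repeated downsampling scheme can be illustrated with a short sketch. The clustering method is not specified in this response, so the example assumes a simple stand-in: a sequence counts as clustered if it has at least one same-length neighbour at Hamming distance 1 or less. The function names, toy sequences, and subset labels are hypothetical.

```python
import random

def hamming_le1(a, b):
    """True if a and b have equal length and differ at no more than one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1

def fraction_clustered(seqs):
    """Fraction of sequences with at least one neighbour within distance 1."""
    clustered = 0
    for i, s in enumerate(seqs):
        if any(hamming_le1(s, t) for j, t in enumerate(seqs) if j != i):
            clustered += 1
    return clustered / len(seqs)

def downsampled_fractions(repertoires, n_iter=100, seed=0):
    """Downsample every repertoire to the smallest size n_iter times
    and recompute the clustered fraction at each iteration."""
    rng = random.Random(seed)
    depth = min(len(seqs) for seqs in repertoires.values())
    return {
        name: [fraction_clustered(rng.sample(seqs, depth)) for _ in range(n_iter)]
        for name, seqs in repertoires.items()
    }

# Toy repertoires of unique CDR3s (made up for illustration).
repertoires = {
    "small": ["CASSL", "CASSV", "CATTT"],
    "large": ["CASSL", "CASSV", "CASSI", "CAWWW", "CGGGG"],
}
fractions = downsampled_fractions(repertoires)
```

      The summary statistic is comparable across iterations, but the members of each cluster depend on which sequences were drawn, which is why cluster identities cannot be tracked from one iteration to the next.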

    1. eLife Assessment

      This fundamental study investigates whether neural prediction of words can be measured through pre-activation of neural network word representations in the brain; convincing evidence is provided that neural network representations of neighboring words are correlated in natural language. This study urges future studies to carefully differentiate between neural activity that predicts the upcoming word and neural activity that encodes the current words, which contain information that can be used to predict the upcoming word. The study is of potential interest to researchers investigating language encoding in the brain or in large language models.

    2. Reviewer #1 (Public review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very interesting alternative hypothesis for claims in prior work (e.g., Goldstein et al., 2022). In contrast to claims in prior work, the current paper convincingly demonstrates that prior results can be explained by inherent stimulus dependencies in natural language, as opposed to the brain actively predicting future linguistic content.

      Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses. The work emphasizes how claims about linguistic prediction cannot be trivially drawn using encoding models in naturalistic designs.

    3. Reviewer #2 (Public review):

      Summary:

      At a high level, the reviewers demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      After author revision, I see no major weaknesses in the underlying arguments or data processing steps.

    4. Reviewer #3 (Public review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies (specifically, the auto-correlation in the stimuli) rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      • The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      • They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings-an approach used in prior work.

      • The study is based on two large MEG datasets and one ECoG dataset, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the three datasets.

      Weaknesses:

      • While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

      Comments on revisions:

      I appreciate the added analyses. This study raises an important methodological concern regarding an influential paper and will certainly have a high impact on our field.

    5. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their constructive feedback, which has helped us prepare a substantially improved manuscript. In response to concerns about the conceptual distinction between prediction and stimulus dependency, we have fundamentally restructured the paper around the notion of passive control systems. This involved rewriting the Abstract, Introduction, and large portions of the Results (~60% of text revised).

      Key changes:

      - New analyses on Goldstein et al. (2022) data. We demonstrate that our findings—including the insufficiency of proposed corrections—generalise to the original dataset (Figures S2B, S3B, S5C, S6B).

      - Clarified novel contribution. We now make explicit that prior control analyses (residualisation, bigram removal) do not address the concern, because hallmarks persist in passive systems that cannot predict.

      - Proposed criterion for future work. Pre-onset neural encoding can only count as evidence for prediction if it exceeds a passive baseline (e.g., acoustics).

      We believe the revision offers a clearer, more rigorous contribution and provides a constructive framework for evaluating claims of neural prediction.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      We thank the reviewer for this comment, which highlights that the previous version wasn’t sufficiently clear. Conceptually, the difference is critical: it is the difference between passively encoding or representing the stimulus (as, for example, a spectrogram of the stimulus would) and actively generating predictions.

      We have substantially changed the framing of the paper to put the notion of control systems centre-stage. One such control system is the speech acoustics: they encode the stimulus (and thus its dependencies) but cannot predict. When we observe the "hallmarks of prediction" in acoustics, this demonstrates the hallmarks can arise without any prediction.

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

      Different weights are estimated per time point in the time-resolved regression. This allows the model to learn how the response to words unfolds, but also to learn different stimulus dependencies at each timepoint. Fitting on every second word would reduce but not eliminate the problem. Our control system approach provides a more principled test. We have clarified the mechanism in the Introduction (lines 82-90), explaining how correlations between neighbouring words allow the regression model to predict prior neural activity without assuming pre-activation.
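
      The mechanism can be made concrete with a toy simulation. This is a sketch under strong simplifying assumptions, not the analysis from the paper: each word is reduced to a single scalar feature following an AR(1) process, the "signal" is a passive system that responds only to the word currently presented, and encoding performance is measured as a Pearson correlation (which is what a one-feature linear encoding model reduces to). All names and parameter values are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(0)
n_words, rho = 5000, 0.6

# Word feature with dependencies between neighbouring words (AR(1)).
x = [rng.gauss(0, 1)]
for _ in range(n_words - 1):
    x.append(rho * x[-1] + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))

# Passive signal: driven only by the current word, plus noise.
# It never has access to the upcoming word.
y = [xi + rng.gauss(0, 0.5) for xi in x]

# Feature of word n against the signal during word n.
post_onset = pearson(x, y)

# Feature of word n+1 against the signal during word n ("pre-onset").
pre_onset = pearson(x[1:], y[:-1])

# Control: shuffling word order removes the dependencies between neighbours.
shuffled = x[:]
rng.shuffle(shuffled)
no_dependency = pearson(shuffled[1:], y[:-1])
```

      In this toy setting the pre-onset correlation is clearly above zero even though the signal cannot predict anything, and it vanishes once the dependencies between neighbouring words are destroyed by shuffling.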

      Reviewer #2 (Public Review):

      Summary:

      At a high level, the reviewers demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

      We thank the reviewer for their assessment.

      We believe our paper does punch the hole that can be punched, which is a hole in the method. Our control demonstrates that adjusting the features (X matrix) cannot address dependencies that persist in the signal itself (Y matrix). Because the hallmarks emerge in a system that cannot predict (even after linearly removing the previous stimulus), attributing pre-onset encoding performance to neural prediction (rather than stimulus structure) is fundamentally ambiguous, and different approaches (e.g., variance partitioning) would suffer from the same ambiguity. We have reframed the manuscript to make this argument more clearly.
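
      One way such residual dependencies could arise is sketched below. This is a hypothetical toy mechanism, not the analysis from the paper: a slowly varying latent variable (standing in for shared stimulus structure) drives both a noisy scalar word feature and a passive signal. Linearly regressing the previous word's feature out of the next word's feature then leaves a correlation with the passive signal, because the previous feature is only a noisy proxy for the shared structure. All names and parameter values are assumptions.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def residualise(target, regressor):
    """Residuals of a least-squares fit of target on regressor (with intercept)."""
    n = len(target)
    mt, mr = sum(target) / n, sum(regressor) / n
    slope = (sum((r - mr) * (t - mt) for r, t in zip(regressor, target))
             / sum((r - mr) ** 2 for r in regressor))
    return [(t - mt) - slope * (r - mr) for r, t in zip(regressor, target)]

rng = random.Random(1)
n_words, rho = 5000, 0.8

# Latent stimulus structure shared across neighbouring words (AR(1)).
z = [rng.gauss(0, 1)]
for _ in range(n_words - 1):
    z.append(rho * z[-1] + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))

x = [zi + rng.gauss(0, 1) for zi in z]    # word feature (X side)
y = [zi + rng.gauss(0, 0.5) for zi in z]  # passive signal (Y side), cannot predict

next_x, prev_x, signal = x[1:], x[:-1], y[:-1]

# "Pre-onset" correlation before any correction.
raw = pearson(next_x, signal)

# Remove the previous word's feature from the next word's feature, then retest.
after_residualisation = pearson(residualise(next_x, prev_x), signal)
```

      Under these assumptions the correction shrinks the pre-onset correlation but does not remove it, which mirrors the argument that corrections applied to X leave dependencies in Y intact.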

      Reviewer #3 (Public Review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies (specifically, the auto-correlation in the stimuli) rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      We’d like to thank the reviewer for their comments on our preprint.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      We thank the reviewer for this suggestion. The Goldstein dataset was not publicly available when we conducted this research. However, we have now applied our control analyses to their stimulus material, and found that the exact same problem applies to their dataset, too.

      We have added analyses of the Goldstein et al. (2022) podcast stimulus throughout the paper. Results are shown in Figures S2B, S3B, S5C, and S6B. Critically, we observe the same pattern: both hallmarks emerge in the acoustic control system, and residualisation fails to eliminate them. This demonstrates that our findings generalise to the very dataset used to establish pre-onset encoding as evidence for neural prediction.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      We thank the reviewer for raising this point, as it reveals we failed to convey a central argument in the previous version. Goldstein et al.'s control analysis did not address the concern. We show that even after the control analyses that Goldstein et al. perform (removing bigrams, regressing out embedding dependencies) the "hallmarks of prediction" still emerge when applying the analysis to a passive control system that by definition does not predict: the speech acoustics. We now also show this in their data.

      To better convey this critical point, we have restructured the paper around the concept of "passive control systems". We now first establish that the hallmarks appear in acoustics (Figure 3), then show that residualisation fails to remove them (Figure 4). This makes explicit that any claim about "controlling for dependencies" must be validated against a system that cannot predict.

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

      We thank the reviewer for this question, and we agree that the question of whether pre-activation occurs is an important and interesting one. However, we ask a different question in our study: our goal is not to definitively establish whether the brain predicts during language processing; it is to scrutinise what counts as evidence for prediction, and to correct some highly influential claims made in the literature. The reviewer asks whether pre-activation remains "after accounting for these confounds." But the point we are trying to make is that, in this analytical framework, one cannot analytically account for these confounds: corrections to the X matrix leave dependencies in the data itself intact, as the acoustic control demonstrates.

      We do offer recommendations for future work. The passive control systems approach can serve as a benchmark: pre-onset neural encoding (or decoding) can only count as evidence for prediction if it exceeds what is observed in a passive control system like acoustics (which is not what we observe). Additionally, the field could move toward less naturalistic stimuli with tighter experimental controls, reducing the correlations that make this attribution so difficult. Developing a new definitive test is beyond the scope of our paper, but we believe applying this benchmark is a necessary first step.

      To make this clearer, we have rewritten the Discussion to explicitly state this criterion (lines 331-340) and to outline these recommendations for future work (lines 337-340). We have also added a paragraph extending our argument to decoding approaches (lines 343-354), noting that the same ambiguity applies regardless of analytical direction.

      Recommendations for Authors:

      Reviewer #1 (Recommendations for Authors):

      As per my "Weakness" point, I would appreciate engagement with the conceptual point related to the difference between prediction and stimulus correlations. Most importantly, I hope the authors will spell out more explicitly which predictions their proposal makes, and how exactly those would be present in an encoding model.

      Our proposal makes a clear prediction: if pre-onset encoding can be explained by stimulus dependencies (essentially a confound in the analysis), the same hallmarks should emerge in passive control systems that encode the stimulus but do not predict. We test this with word embeddings and speech acoustics, and both show the hallmarks despite not doing any prediction.

      Reviewer #2 (Recommendations for Authors):

      I greatly enjoyed reading the paper and only have minor quibbles. The work is overdue and will no doubt be a valuable addition to the literature to push back on over-hyped claims about the implications of pre-word predictivity in neural response. I have few issues with the methods that the paper uses, they seem sensible and in line with previous work that has investigated these questions, and I did not find typos.

      One point I would like to raise is whether or not there is a more effective solution to resolving the issues behind residualization that the paper demonstrates. The authors show that removing next-word information does not effectively resolve the problem that local relationships in the stimulus dataset pose. The challenge to me here seems to be that it is difficult to get a model to "not learn" a relationship that is learnable. I wonder if a better solution to this is to not try to get a model to exclude a set of information but instead to do some sort of variance partitioning where you train a model to predict the next-word representation from the current-word representation (as in the self-predictivity analysis) and then build an encoding model out of the predicted representation. Then, compare the pre-word-onset encoding performance of the prediction with the pre-word-onset encoding performance of the original representation. If the performance of the two models roughly matches, that would be strong evidence that most of what these models are capturing before word onset is just explainable by the stimulus dependencies, no?

      We would like to thank the reviewer for their kind words and positive appraisal!

      The proposed analysis is that if a linear proxy representation, w_hat_t – predicted linearly from w_{t-1} – yields pre-onset predictivity comparable to the actual w_t vector, this would support that the effect can be explained by stimulus dependencies. While this is an interesting alternative analysis, we would be cautious about the inverse conclusion: that if w_t outperforms the linear proxy w_hat_t, the residual variance must reflect true neural prediction.

      This is because of our control system results. We show that even when we remove the "predictable" shared variance – which is similar to computing the difference between w_t and w_hat_t – the unique information still yields pre-onset predictivity, albeit reduced, in the passive acoustics that by definition cannot predict. Therefore, instead of developing an ever-more-clever way to "correct" for the problem by adjusting the X matrix, we focus on showing that the problem lies in the stimulus itself. For the revision, we focused on reframing the problem and hope we have punched a fuller hole in the logic by breaking down the fundamental issue more clearly and showing it applies to the stimulus material of Goldstein et al. (2022) as well.

      Additionally, I would say that I was a bit confused about what was going on in the methods figures, to the point where I do not see the value in having them, but thankfully, the text was clear enough to resolve that confusion.

      We are sorry the methods illustrations were not helpful. In presentations, we have found the illustrations generally helpful for conveying the analysis, e.g. the aspect of keeping the analysis identical but simply replacing the brain data with either word vectors (current Figure 2) or acoustics (current Figure 3). In the revision we have reorganised the schematics slightly: we introduce the acoustics as a control system earlier, so that residualisation and its insufficiency can be introduced separately (Figure 4). We hope this helps.

      Reviewer #3 (Recommendations for Authors):

      (1) My major concern is the extent to which this study offers new insights beyond what was already demonstrated in Goldstein's work. First, the embedding dependency highlighted by the authors seems somewhat expected, given how these embeddings are constructed: GloVe embeddings are based on word co-occurrence statistics, and GPT embeddings are combinations of embeddings of preceding words. More importantly, Goldstein et al. addressed this issue by regressing out neighboring word embeddings. This control was effective, as also confirmed by the current manuscript, and their main results remain. Therefore, the embedding dependency appears to have been properly accounted for in the earlier study.

      Building on the previous point, I appreciate the analysis of dependencies across representational domains, which I see as the main novel contribution of this manuscript. I would encourage the authors to explore this aspect more deeply. If I understand correctly, stimulus dependencies may persist even after regressing out neighboring word embeddings due to two potential factors:

      (a) Temporal dependencies in embeddings: since the regression of neighbor words is performed at the word level rather than over time, temporal dependency may remain.

      (b) Cross-feature dependencies - specifically, correlations between embeddings and acoustic features.

      Regarding the first factor, it is not entirely clear to me whether this is a real problem—i.e., whether word-level regression fails to remove temporal dependencies. A simulation could help clarify this and support the argument. While it's not essential, it would be valuable if the authors could propose a method to address this issue, or at least outline it as a direction for future work.

      For the second point, it would be helpful for the authors to explicitly explain the potential relationship between word embeddings and acoustic features. Additionally, while correlations between features are a common problem in speech research, they are typically addressed by regressing out acoustic features early in the analysis (Gwilliams et al., 2022). It would strengthen the current findings if the authors could test whether the self-predictability persists even after controlling for neighboring embeddings and acoustic features.

      We appreciate the extensive and detailed engagement with our work, which has been very useful in highlighting key unclarities and gaps we had to address.

      We do believe our study goes well beyond what was shown by Goldstein et al., by identifying a fundamental limitation in their analysis, and showing that their purported control analyses do not in fact control for the problem. We’ll address the reviewer's sub-questions in turn.

      (i) Why this offers crucial insights beyond Goldstein et al.

      While Goldstein et al. indeed addressed embedding dependencies via residualization (or in their case projection), their conclusion relied on the assumption that any neural encoding surviving this "fix" must reflect genuine predictive pre-activation. Our study invalidates this assumption. By applying the residualization fix, we show that the "hallmarks of prediction" persist just as robustly in a passive control system that cannot predict (the speech acoustics) as in the neural data. (We also show this for bigram removal.)

      This provides a key new insight: persistent pre-onset predictivity after “correction” is not evidence that the dependency issue was solved. Instead, because the same effect persists in a system that cannot predict (acoustics), the persistence of the hallmarks cannot be attributed to prediction. It demonstrates that the standard "fix" is mathematically insufficient to remove the confound, rendering the original evidence for neural prediction fundamentally ambiguous.

      (ii) Why do dependencies/hallmarks persist after residualization?

      Residualization successfully removes the linear dependency between the current embedding (w_t) and the previous embedding (w_{t-1}) within the feature space. However, it does not (and cannot) remove the dependency from language itself, and therefore from the brain which (in some format) encodes the linguistic stimulus. Language is massively redundant. Knowing the current word tells you something about what came before – acoustically, syntactically, semantically. As long as the embedding identifies the word, the regression model will re-learn this relationship. For instance, in the case of acoustics, even when using the corrected embedding, the regression will re-learn that certain words (e.g., "Holmes") tend to follow certain acoustic patterns (e.g., the acoustics of "Sherlock"). This shows that correcting the embeddings is insufficient: the dependencies exist in language itself, and the model will re-learn them from any signal that encodes that language.
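
      The argument in (ii) can be illustrated with a toy simulation. What follows is a minimal sketch under stated assumptions, not the analysis pipeline used in the paper: the vocabulary size, the Markov-style word sequence, the random "embeddings" and "acoustics", and all function names (`simulate`, `residualise`, `heldout_corr`) are hypothetical. Each word gets a random embedding and a random acoustic vector, and each word is usually followed by a fixed successor. Residualising w_t on w_{t-1} removes every linear trace of the previous embedding, yet the residual still predicts the previous word's acoustics on held-out data.

```python
import numpy as np

# Toy illustration only: all sizes and names below are assumptions.
VOCAB, EMB_DIM, AC_DIM = 30, 16, 4

def simulate(n_words=20000, p_follow=0.9, seed=0):
    """Word sequence with strong local dependencies, plus per-word features."""
    rng = np.random.default_rng(seed)
    E = rng.standard_normal((VOCAB, EMB_DIM))   # hypothetical word embeddings
    A = rng.standard_normal((VOCAB, AC_DIM))    # hypothetical acoustic templates
    follow = rng.random(n_words) < p_follow
    random_next = rng.integers(VOCAB, size=n_words)
    words = np.empty(n_words, dtype=int)
    words[0] = random_next[0]
    for t in range(1, n_words):
        # Usually the fixed successor (like "Sherlock" -> "Holmes"), else random.
        words[t] = (words[t - 1] + 1) % VOCAB if follow[t] else random_next[t]
    acoustics = A[words] + 0.5 * rng.standard_normal((n_words, AC_DIM))
    return E[words], acoustics

def add_intercept(X):
    return np.column_stack([X, np.ones(len(X))])

def residualise(w_t, w_prev):
    """Remove from w_t everything linearly predictable from w_prev (OLS)."""
    X = add_intercept(w_prev)
    coef, *_ = np.linalg.lstsq(X, w_t, rcond=None)
    return w_t - X @ coef

def heldout_corr(X, Y, train_frac=0.8):
    """Fit a linear map X -> Y on the first part, mean correlation on the rest."""
    n_train = int(train_frac * len(X))
    Xi = add_intercept(X)
    coef, *_ = np.linalg.lstsq(Xi[:n_train], Y[:n_train], rcond=None)
    pred = Xi[n_train:] @ coef
    return float(np.mean([np.corrcoef(pred[:, j], Y[n_train:, j])[0, 1]
                          for j in range(Y.shape[1])]))

emb, acoustics = simulate()
w_t, w_prev, a_prev = emb[1:], emb[:-1], acoustics[:-1]
resid = residualise(w_t, w_prev)

# Linear dependency on the previous embedding is gone within the feature space...
cross = np.corrcoef(resid.T, w_prev.T)[:EMB_DIM, EMB_DIM:]
max_leftover = float(np.abs(cross).max())

# ...but the residual still predicts the PREVIOUS word's acoustics.
pre_onset_r = heldout_corr(resid, a_prev)

# Shuffled baseline: break the word order, predictivity should vanish.
perm = np.random.default_rng(1).permutation(len(resid))
shuffled_r = heldout_corr(resid[perm], a_prev)
```

      In this toy setting, `max_leftover` is zero up to numerical precision while `pre_onset_r` stays clearly above the shuffled baseline, because the residual still identifies the current word and the sequence statistics link that word to what preceded it.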

      (iii) Why not regress out the acoustics?

      This is also why "regressing out acoustics" (as the reviewer suggests) would miss the point. We do not claim that acoustic features leak into the neural signal or that acoustics are a specific confound to be removed. Rather, we use acoustics as a “passive baseline”: a system that encodes the stimulus but cannot predict. That the method yields "hallmarks of prediction" in this baseline demonstrates that these hallmarks are not valid evidence for prediction, regardless of what additional features one regresses out. This motivates our proposed criterion: future studies seeking evidence for neural pre-activation should not rest on finding pre-onset encoding per se, since passive systems show this too. Rather, they should demonstrate that the brain signal contains more information about the upcoming word than the passive stimulus baseline.

      As these aspects are fundamental to the interpretation of our study, we have fundamentally re-organised and re-written large parts of the paper. We hope it is much clearer now.

      (2) To better compare to Goldstein's work, the author may consider performing the same analyses using their publicly available dataset.

      This is a good suggestion. When we initially conducted this research, the Goldstein dataset was not yet publicly available. It now is, and we have applied our analyses to their stimulus material. The same problem emerges: the hallmarks of prediction appear in the acoustics of their podcast stimuli. Even after applying the control analyses, pre-onset predictivity is robust in their acoustics (indeed, in correlation terms, higher than reported for the neural data, so there is not more predictivity in the brain than in the stimulus material), confirming that the issue we identify applies to the original dataset. Results are shown in Figures S2B, S3B, S5C, and S6B.

      (3) It would also be interesting to show the predictability effect after word onsets for self-predictability analyses, for example, in Figure 2C. The predictability effect is not only reflected in pre-onset responses but also in post-onset responses, i.e., larger responses for unpredicted words. Does the stimulus dependency mirror this effect?

      Our paper focuses specifically on temporal dependencies – the capacity of the current word to predict the previous stimulus signal (e.g., previous acoustics, previous embeddings) – and how this mimics neural pre-activation. Post-onset analyses, by contrast, concern the mapping between the current word and its concurrent signal, which involves fundamentally different mechanisms (e.g., mapping fidelity, frequency effects, acoustic clarity, word length) and would require covariates describing the attributes of the word post-onset to be interpreted meaningfully. Post-onset, there can be differences between predictable and non-predictable words – e.g. sometimes unpredictable words are pronounced with more emphasis – which is why surprisal studies include a large range of covariates. However, this is not about stimulus dependencies or pre-activation, so we consider it beyond the scope of our study.

      (4) The authors might consider reporting the encoding performance for the residual word embeddings, similar to Figure S6B in Goldstein's paper. This would allow us to determine whether pre-activation persists in the MEG responses and compare its pattern with the predictability of pre-onset acoustics.

      We do report this analysis; in the revised supplement it is shown in Figure S7. We placed it in the supplement precisely because residualized embeddings are not the "fix" they appear to be: as we show, they still yield strong pre-onset predictivity in the passive acoustic baseline (Figure 4, S6), undermining their use as a control.

      (5) The series of previous pre-activation analyses produced fruitful findings, e.g., the difference between brain regions (Fig. S4, (Goldstein et al., 2022)) and the difference between listeners and speakers (Figure 2, (Zada et al., 2024)). Can these observed differences be explained by the stimulus dependency?

      We appreciate this question. Our goal is to address the general logic of using pre-onset encoding as evidence for prediction, rather than to critique every finding in specific papers, especially as it pertains to a specific author. But briefly:

      Speaker vs. Listener differences (Zada et al., 2024): Zada et al. report distinct temporal profiles: speaker encoding peaks pre-onset (planning?), whereas listener encoding peaks post-onset but shows a pre-onset "ramp." Our critique applies to interpreting this ramp as "prediction." However, this interpretation is not central to their paper, which focuses on speaker-listener coupling via shared embedding spaces. We leave the implications (which are clear enough) to the reader.

      Regional differences (Goldstein et al., 2022): Encoding timecourses do vary across electrodes, as we also observe across MEG sources (and participants). But our point is logical: because pre-onset encoding does not necessarily reflect prediction, finding a channel with stronger pre-onset encoding does not mean that channel performs “more prediction”. For instance, one subject in the Armeni dataset showed higher pre-onset than post-onset encoding (and indeed activity) overall – but it would be implausible to conclude this subject "only predicts" and does not “process” or “listen”. More likely, this reflects differences in signal-to-noise, integration windows, or source contributions. The exact sources of these morphological differences are interesting but unclear, and speculating on them is beyond our scope.

      (6) I appreciate that the authors have shared their code; however, some parts appear to be missing. For example, the script encoding_analysis.py only includes package-loading code.

      Thank you for noticing; we have updated our code repository.

      (7) What do the error bars in the figures represent - for example, in Figure 1C? How many samples were included in the significance tests? The difference between the two curves appears small, yet it is reported as significant. Additionally, Figure S1 shows large differences between subjects and between the two MEG datasets. Do the authors have any explanation for these differences?

      The shaded areas in our previous Figure 1C show 95% confidence intervals computed over the 100 MEG sources identified to be part of the bilateral language system and the 10 cross-validation splits.
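
      For concreteness, one plausible way to compute such a band is sketched below. This is an assumption about the procedure, not a description of the code behind the figure: it pools the 100 sources and 10 splits into 1,000 values per lag and uses a normal-approximation interval (mean ± 1.96 × SEM); a bootstrap would be an equally reasonable choice. The function name `mean_and_ci95` and the random placeholder scores are hypothetical and are not results.

```python
import numpy as np

def mean_and_ci95(scores):
    """Mean and normal-approximation 95% CI per lag.

    scores: array of shape (n_lags, n_sources, n_folds) holding encoding
    correlations. Sources and folds are pooled (an assumption of this sketch).
    """
    flat = scores.reshape(scores.shape[0], -1)          # (n_lags, sources*folds)
    mean = flat.mean(axis=1)
    sem = flat.std(axis=1, ddof=1) / np.sqrt(flat.shape[1])
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width

# Placeholder data only: 5 lags, 100 sources, 10 cross-validation splits.
rng = np.random.default_rng(0)
demo_scores = 0.05 + 0.02 * rng.standard_normal((5, 100, 10))
mean, lower, upper = mean_and_ci95(demo_scores)
```

      With 1,000 pooled values per lag the band becomes narrow, which is one reason two curves can look close together and still differ significantly.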

      We do not have an elaborate explanation for the differences in encoding performance across the three subjects in the few-subject dataset. Instead, we interpret these differences as a likely consequence of substantial inter-individual variability in evoked responses, even at the source level, arising from differences in cortical folding and the orientation of underlying current dipoles. We deem this a likely explanation since different electrodes in Goldstein’s ECoG data also showed very different encoding profiles.

      With respect to the multi-subject dataset, we suspect that the large differences most likely stem from two factors. First, the acoustics were purposefully manipulated by the experimenters to reduce temporal dependence. This made it harder for listeners to concentrate on the stories and may thereby have led to lower-quality neural data. Furthermore, it reduced one form of stimulus dependency, namely the acoustic temporal dependencies, which the encoding model could otherwise exploit to reach higher encoding accuracies. Second, MEG has a notoriously poor signal-to-noise ratio, and the amount of data per participant (7,745 words as opposed to 85,719 in the few-subject dataset) might not have been enough to produce reliably high encoding results.

      Finally, the current study is clear and convincing, and my suggestions are not intended to question its novelty or robustness. Rather, I believe the authors are in a strong position to address a critical question in language processing: whether pre-activation occurs. The authors have thoughtfully considered important confounds related to pre-onset responses. Adding some approaches to regressing out these confounds could be particularly helpful for determining whether a true pre-onset response remains.

      We thank the reviewer again for their constructive feedback, suggestions and questions. To clarify, however, our goal is *not* to definitively attest to whether pre-activation occurs. Our goal is simply to scrutinise a specific method to test for linguistic prediction. This method purports to be an improvement on conventional post-onset (e.g. surprisal-based) methods, as it can directly investigate effects occurring prior to word onset. We have demonstrated fundamental limitations in the underlying logic of this method. We propose passive control systems as baselines against which claims of prediction should be evaluated. Against this baseline, the current evidence does not show unequivocal support for prediction: pre-onset encoding in the brain does not exceed that in the passive control. However, we do not conclude from this that pre-activation does not exist — that would require a different study entirely. Our aim is more methodological: to establish what should count as evidence for prediction, not to settle whether prediction occurs.

      We would like to thank the reviewers and editors for their thoughtful feedback, which has been tremendously helpful in improving the paper.

    1. eLife Assessment

      The submission by Praveen et al. reports important findings describing the structure of genetic and colour variation in its native range for the globally invasive weed Lantana camara. Whilst the importance of the research question and the scale of the sampling is appreciated, the analysis, which is currently incomplete, requires further tests to support the claims made by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      Update after manuscript was revised by authors:

      The authors added a selfing experiment, showing that the wild plants are selfing and not outcrossing, which limits the genetic exchange. This supports their claims, but a link with the horticultural industry is still lacking in the study, and conclusions should still be viewed in the regional context of India rather than globally.

    3. Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using data on 19,008 genome-wide SNPs from 359 individuals in India. They found clear population structure that did not show a geographical pattern, and flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study. Coalescent-based analysis nicely supports the authors' claim that multiple introductions may have contributed to the population structure of this species.

      Weaknesses:

      The main findings of the SLiM-based simulation were that inbreeding promotes fixation of alleles and reduction in genetic diversity. These are theoretically well known, and such findings themselves are not novel, although they might have been more interesting had they been quantitatively integrated with the empirical findings in the studied species.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the editor and both reviewers for their time and thoughtful feedback on our manuscript. We have carefully addressed all the concerns raised in the responses below and incorporated the suggested revisions into the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      The reviewer raises the possibility that the observed genetic patterns may have originated through the selection of different varieties by the horticultural industry. While it is plausible that artificial selection can lead to the formation of distinct morphs, the presence of a strong structure between them in the wild populations cannot be explained just based on selection. The observed patterns in the inbreeding coefficient and heterozygosity can indeed arise from multiple factors, including past bottlenecks, selection, inbreeding, and selfing. In the wild, different flower colour variants frequently occur in close physical proximity and should, in principle, allow for cross-fertilization. Over time, this gene flow would be expected to erode any genetic structure shaped solely by past selection. However, our results show no evidence of such a breakdown in structure. Despite co-occurring in immediate proximity, the flower colour variants maintain distinct genetic identities. This suggests the presence of a barrier to gene flow, likely maintained by the species' mating system. Moreover, the presence of many of these flower colour morphs in the native range—as documented through observations on platforms like iNaturalist—suggests that these variants may have a natural origin rather than being solely products of horticultural selection.

      While it is plausible that horticultural breeding involved efforts to generate new varieties through crossing—resulting in the emergence of some of the observed morphs—even if this were the case, the dynamics of a self-fertilizing species would still lead to rapid genetic structuring. Following hybridization, just a few generations of selfing are sufficient to produce inbred lines, which can then maintain distinct genetic identities. As discussed in our manuscript, such inbred lines could be associated with specific flower colour morphs and persist through predominant self-fertilization. This mechanism provides a compelling explanation for the strong genetic structure observed among co-occurring flower colour variants in the wild.
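
      The statement that a few generations of selfing suffice to produce inbred lines follows from the standard recursion for the inbreeding coefficient under partial selfing, F_{t+1} = (s/2)(1 + F_t), where s is the selfing rate. The sketch below simply iterates this textbook recursion; the function name and the example selfing rates are illustrative assumptions, not estimates for L. camara.

```python
def inbreeding_trajectory(selfing_rate, generations, f0=0.0):
    """Iterate F_{t+1} = (s / 2) * (1 + F_t) and return [F_0, ..., F_T].

    Expected heterozygosity relative to a random-mating population is 1 - F_t.
    """
    f = f0
    trajectory = [f]
    for _ in range(generations):
        f = 0.5 * selfing_rate * (1.0 + f)
        trajectory.append(f)
    return trajectory

# Complete selfing: heterozygosity halves every generation (1 - F_t = 0.5 ** t).
full_selfing = inbreeding_trajectory(selfing_rate=1.0, generations=5)

# Partial selfing (illustrative rate): F converges to s / (2 - s).
partial_selfing = inbreeding_trajectory(selfing_rate=0.8, generations=200)
```

      Under complete selfing, only about 3% of the initial heterozygosity is expected to remain after five generations (F = 0.96875), so lines derived from a hybridization event become nearly homozygous within a handful of generations.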

      To further validate this, we conducted a bagging experiment on Lantana camara inflorescences to exclude insect-mediated cross-pollination. The results showed no significant difference in seed set between bagged and open-pollinated flowers, supporting the conclusion that L. camara is primarily self-fertilizing in India. These results are included in the revised manuscript.

      As the reviewer rightly points out, the mating system of a species plays a crucial role in shaping patterns of genetic structure. However, in many natural populations, structuring patterns are often influenced by a combination of factors such as selection, barriers to gene flow, and genetic drift. In some cases, the mating system exerts a more prominent influence at the microgeographic level, while in others, it can shape genetic structure at broader spatial scales. What is particularly interesting in our study is that the mating system appears to shape genetic structure at a subcontinental scale. Although the species has been subject to other evolutionary forces, such as a genetic bottleneck and expansion due to its invasive nature, the mating system exerts a more pronounced effect on the observed genetic patterns, resulting in a clear and consistent genetic structure across populations.

      Reviewer #1 (Recommendations for the authors):

      Lantana camara is a globally invasive plant as the authors mention in their manuscript, but this study only focuses on India. This should be reflected in the title.

      The reviewer has suggested that the title should reflect the study area. Since our sampling covers nearly all regions in India, we believe the patterns observed here are likely representative of those in other parts of the invaded range. For this reason, we would prefer to retain the current title.

      It would be helpful if the pictures of the flowers in Figure 3 were larger to more clearly see the different colors.

      As per the reviewer's suggestion, we have increased the size of the images to improve clarity.

      Figure 4 could probably be moved to supplemental material, it does not add much to the results.

      We feel it is important to reiterate that the patterns we observe in Lantana are consistent with what one would expect in any predominantly self-fertilizing species. Figure 4 acts as additional evidence for this, and we therefore believe it is important to retain it, as it effectively conveys this link.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using data from 19,008 genome-wide SNPs in 359 individuals from India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study is that the output of the modelling analysis is largely qualitative and cannot be directly compared with the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These effects are theoretically well established, and such findings are not novel in themselves, although they would become interesting if they were quantitatively integrated with the empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing the SNP data would be more interesting. For example, using ABC-based methods, the authors could infer the split time between the subpopulations identified in this study. If such a split time is older than the recorded invasion date, the result would support the scenario that multiple introductions contributed to the population structure of this species. In the current form of the manuscript, multiple introductions are implicated but not formally tested.

      Through our SLiM simulations, we aimed to demonstrate that a pattern of strong genetic structure within a location (similar to what we observed in Lantana camara) can arise under a predominantly self-fertilizing mating system. These simulations were not parameterized using species-specific data from Lantana but were intended as a conceptual demonstration of the plausibility of such patterns under selfing using SNP data. While the theoretical consequences of self-fertilisation have been widely discussed, relatively few studies have directly modelled these patterns using SNP data. Our SLiM simulations contribute to this gap and support the notion that the observed genetic structuring in Lantana may indeed result from predominant self-fertilisation. Therefore, we conducted these simulations ourselves for invasive plants to test whether the patterns we observed are consistent with expectations for a predominantly self-fertilising species.

      Additionally, as suggested by the reviewer, we have performed demographic history simulations using fastsimcoal2 to investigate the divergence among different flower colour morphs. The results have been incorporated into the revised manuscript.

      First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses.

      Applying a HWE filter is a common practice in genomic data analysis because it helps remove potential sequencing or genotyping artefacts, which can otherwise bias downstream analyses. However, we understand that HWE filtering can also remove biologically informative loci and potentially bias the analysis, especially when a stringent cutoff is used. A strict filter might retain only loci that perfectly fit Hardy–Weinberg expectations and exclude sites influenced by real evolutionary processes like selection and/or inbreeding.

      To balance this, we used a mild HWE filter, aiming to remove clear artefacts while retaining loci that may reflect genuine biological signals. Another reason for applying it is that many downstream tools, for example, admixture, assume the markers are neutral and not strongly deviating from HWE (although this assumption may not always hold). This helps in avoiding the complexity of the model.
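
      The interaction between selfing and an HWE filter can be sketched with a simple chi-square goodness-of-fit test on genotype counts at one biallelic SNP. This is a minimal illustration, not the filter used in the manuscript: the function name and the genotype counts are hypothetical, and many pipelines apply an exact test rather than the chi-square approximation shown here.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 degree of freedom) of Hardy-Weinberg proportions.

    Takes observed genotype counts for one biallelic SNP (at least one
    genotyped individual) and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: random mating versus a strong heterozygote deficit.
    print(hwe_chi_square(25, 50, 25))  # matches HWE expectations
    print(hwe_chi_square(45, 10, 45))  # deficit of heterozygotes, as under selfing
```

      In the second case the locus departs from HWE because of the heterozygote deficit that inbreeding produces, not because of a genotyping artefact, which is why the stringency of the cutoff matters in a selfing species.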

      Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate.

      We have cited the references for these values in the manuscript. However, for Lantana, many such baseline data are not available, so we used general values reported for plants, which is an accepted approach when working with understudied species. Moreover, the aim of these simulations was to develop a general understanding of how mating systems influence genetic diversity in invasive plants, rather than to parameterize the simulations specifically for Lantana.

      While we acknowledge that this simulation does not provide an exact representation of the species' evolutionary history, the goal of the simulation was not to produce precise estimates but rather to illustrate the feasibility of such strong genetic structuring resulting from self-fertilisation alone.

      Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation.

      In genetic simulations, it is often best to begin with simpler scenarios involving fewer parameters, and we followed this approach. As the reviewer rightly pointed out, selfing can influence multiple factors such as mutation and recombination rates. However, to first understand the broad effects, we chose to work with simpler scenarios where both mutation and recombination rates were kept constant.

      Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested.

      We thank the reviewer for this valuable suggestion. We have performed a MANOVA to test the association between flower colour and genetic structure. These results are incorporated in the revised manuscript.

      Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively?

      We carefully considered this and defined our criteria based on flower colour. Specifically, we named morphs according to the colour of both young and old flowers. If both stages shared the same colour, we used that colour as the name. As shown in Figure 1b, it is possible to reliably distinguish between the different flower colour morphs. While one could also measure flower colour using a photometer, we believe both approaches yield similar results.

      I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

      The flower colour changes within an inflorescence, with young flowers shifting colour after pollination. However, this trend is consistent within a plant; for example, the yellow–pink morph always changes from yellow to pink. Based on this consistency, we incorporated a naming system that considers both the colour of younger and older flowers.

      Reviewer #2 (Recommendations for the authors):

      Figure 4: Figures a and b are not the "signatures of high inbreeding", because such patterns could also simply happen due to geographical isolation. The title of the figure could be changed. Figure 4c should be presented as a histogram.

      We have incorporated this suggestion into the manuscript and revised the figure title accordingly. However, we believe that presenting Figure 4c in its current form is more informative.

      L459 "in the introduced range, Lantana is self-compatible": is it self-incompatible in the native range? If it is known, it could be mentioned in the manuscript.

      A previous study from India demonstrated that self-fertilisation is possible in Lantana, providing an additional line of evidence for our findings. However, Lantana remains poorly studied in its native range, and to the best of our knowledge, only a single study has examined its pollination biology there, which we have cited in this paper.

    1. eLife Assessment

      Winter months with short days are commonly associated with seasonal depression and hypersomnolence; the mechanisms behind this hypersomnolence, however, remain unclear. Chen and colleagues identify a genetic basis for this phenomenon in the fly Drosophila - mutations in the circadian photoreceptor cryptochrome resulted in increased sleep under short photoperiods. These findings are valuable insights into the genetic mechanisms regulating sleep under short days. The data supporting the precise neurobiological basis of these effects, however, remain incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Chen et al. identified a role for the circadian photoreceptor CRYPTOCHROME (CRY) in promoting wakefulness under short photoperiods. This research is potentially important as hypersomnolence is often seen in patients suffering from SAD during winter times. The mechanisms underlying these sleep effects are poorly known.

      Strengths:

      The authors clearly demonstrated that mutations in cry lead to elevated sleep under 4:20 Light-Dark (LD) cycles. Furthermore, using RNAi, they identified GABAergic neurons as a primary site of CRY action to promote wakefulness under short photoperiods. They then provide genetic and pharmacological evidence demonstrating that CRY acts on GABAergic transmission to modulate sleep under such conditions.

      Weaknesses:

      The authors then went on to identify the neuronal location of this CRY action on sleep. This is where this reviewer is much more circumspect about the data provided. The authors hypothesize that the l-LNvs which are known to be arousal promoting may be involved in the phenotypes they are observing. To investigate this, they undertook several imaging and genetic experiments.

      While the authors have made improvements in this resubmitted manuscript, there are still multiple concerns about the paper. I think the authors provide enough evidence suggesting that CRY plays a role in sleep under short photoperiod. The data also support that CRY acts in GABAergic neurons. However, there are still major issues with the quality of the confocal images presented throughout the paper. In many cases it appears that the images are oversaturated with poor resolution, making it hard to understand what is going on. In addition, none of the drivers used in this study are specific to the neurons the authors aim to manipulate. Therefore, the identity of the GABAergic neurons involved in this CRY-dependent sleep mechanism remains unclear. Similarly, whether l-LNvs are the target of this GABA-mediated sleep regulation under short photoperiod is not fully demonstrated. The data presented suggest this but do not prove it.

      Major concerns:

      (1) While the authors provided sleep parameters like consolidation or waking activity for some experiments, these measurements are still not shown for several experiments (for example Figures 2E, 3, 4, 5, and 6). These data are essential; these metrics must be reported for all sleep experiments.

      (2) Line 144 "We fed flies with agonists of GABA-A (THIP) and GABA-B receptor (SKF-97541) (Ki and Lim, 2019; Matsuda et al., 1996; Mezler et al., 2001). Both drugs enhance sleep in WT," The proper citation is needed here, Dissel et al., 2015 PMID:25913403. Both THIP and SKF-97541 were used in that paper.

      (3) Figure 2C and 2F: it appears that the control data is the same in both panels. That is not acceptable.

      (4) Figure 4A: With the quality of the images, it is impossible to assess whether GABA levels are increased at the l-LNvs soma.

      (5) Fig 4 S1A shows colabeling of l-LNvs and Gad1-Gal4 expressing neurons. They are almost 100% overlapping signals. This would indicate that the l-LNvs are GABAergic themselves, or that there is a problem with this experiment.

      (6) Fig 4 S1B: Again, I can see colabelling of the GFP and PDF staining, suggesting that Gad1-Gal4 expresses in l-LNvs.

      (7) Line 184: "Consistently, knocking down Rdl in the l-LNvs rescues the long sleep phenotype of cry mutants (Figure 4-figure supplement 1D)." This statement is incorrect as the driver used for this experiment, 78G01-GAL4 is not specific to the l-LNvs, so it is possible that the phenotypes observed are not coming from these neurons.

      (8) Figure 4G-K: None of these manipulations are specific to the l-LNvs. The authors describe 10H10-GAL4 and 78G01-GAL4 as l-LNvs specific tools, but this is not the case. Why not use the SS00681 Split-GAL4 line described in Liang et al., 2017 PMID: 28552314? It is possible that some of the effects reported in this manuscript are not caused by manipulating the l-LNvs.

      (9) Similarly, for the manipulation of s-LNvs, the authors cannot rule out effects that are coming from other cells, as R6-GAL4 is not specific to s-LNvs.

      (10) The staining presented in Fig 5 S1 is not very convincing. Difficult to see whether Gad1-GAL4 only expresses in the s-LNvs.

    3. Reviewer #3 (Public review):

      Summary:

      In humans, short photoperiods are associated with hypersomnolence. The mechanisms underlying these effects are, however, unknown. Chen et al. use the fly Drosophila to determine the mechanisms regulating sleep under short photoperiods. They find that mutations in the circadian photoreceptor cryptochrome (cry) increase sleep specifically under short photoperiods (e.g. 4h light : 20 h dark). They go on to show that cry is required in GABAergic neurons and that the effects of the cry mutation on sleep are mediated by alterations in GABA signalling. Further, they suggest that the relevant subset of GABAergic neurons are the well-studied small ventral lateral neurons that they suggest inhibit the arousal-promoting large ventral neurons via GABA signalling.

      Strengths:

      Genetic analysis to show that cryptochrome (but not other core clock genes) mediates the increase in sleep in short photoperiods, and circuit analysis to localise cry function to GABAergic neurons.

      Weaknesses:

      The authors have substantially revised their manuscript, and the manuscript is better for the revisions. However, the conclusion that the sLNvs are GABAergic is unfortunately still not well supported by the data. A key sticking point remains the anti-GABA immunostaining, and specific driver lines for sLNvs and lLNvs.

      The authors should tone down their conclusions to reflect the fact that their data, as presented, does not support the model that cry acts in sLNvs to modulate GABA signalling onto lLNvs and thus modulate sleep.

    4. Reviewer #4 (Public review):

      Summary:

      Short photoperiod is an important experimental manipulation in neurobiology, endocrinology, and metabolism studies. However, the molecular mechanisms by which short photoperiod gives rise to behavioral phenotypes that are seen in seasonal affective disorders remain unknown. Using the classic circadian model organism Drosophila, this study examines short photoperiod-induced hypersomnolence and identifies the circadian photoreceptor cryptochrome as a regulator of GABAergic tone within the clock neural circuit to promote wakefulness under short photoperiod conditions. The discovery has broad implications for understanding how short photoperiod modulates neural inhibition in circadian circuits in regulating sleep.

      Strengths:

      The Drosophila model provided a powerful platform to dissect the molecular mechanisms underlying short photoperiod-induced hypersomnolence. A battery of behavioral, imaging, and circuit-manipulation approaches was employed to test the novel hypothesis that the circadian photoreceptor cryptochrome modulates GABAergic tone within the clock neural circuit to promote wakefulness under short photoperiod conditions.

      Weaknesses:

      The current model proposed by the authors suggests that the small ventral lateral neurons of the Drosophila clock circuit are GABAergic; however, this remains unclear. At present, the field lacks sufficient data and validated reagents to definitively establish the GABAergic identity of these neuropeptidergic neurons.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Chen et al. identified a role for the circadian photoreceptor CRYPTOCHROME (cry) in promoting wakefulness under short photoperiods. This research is potentially important as hypersomnolence is often seen in patients suffering from SAD during winter times. The mechanisms underlying these sleep effects are poorly known.

      Strengths:

      The authors clearly demonstrated that mutations in cry lead to elevated sleep under 4:20 Light-Dark (LD) cycles. Furthermore, using RNAi, they identified GABAergic neurons as a primary site of cry action to promote wakefulness under short photoperiods. They then provide genetic and pharmacological evidence demonstrating that cry acts on GABAergic transmission to modulate sleep under such conditions.

      Weaknesses:

      The authors then went on to identify the neuronal location of this cry action on sleep. This is where this reviewer is much more circumspect about the data provided. The authors hypothesize that the l-LNvs which are known to be arousal-promoting may be involved in the phenotypes they are observing. To investigate this, they undertook several imaging and genetic experiments.

      Major concerns:

      (1) Figure 2 A-B: The authors show that knocking down cry expression in GABAergic neurons mimics the sleep increase seen in cryb mutants under short photoperiod. However, they do not provide any other sleep parameters such as sleep bout numbers, sleep bout duration, and more importantly waking activity measurements. This is an essential parameter that is needed to rule out paralysis and/or motor defects as the cause of increased "sleep". Any experiments looking at sleep need to include these parameters.

      Thank you for bringing up these points. We have now included these sleep parameters in Figure 2—figure supplement 3.

      (2) For all Figures displaying immunostaining and imaging data the resolution of the images is quite poor. This makes it difficult to assess whether the authors' conclusions are supported by the data or not.

      We apologize for the poor resolution. This is probably due to the compression of the figures in the merged PDF file. We are now uploading the figures individually and hopefully this can resolve the resolution issue.

      (3) In Figure 4-S1A it appears that the syt-GFP signal driven by Gad1-GAL4 is colabeling the l-LNvs. This would imply that the l-LNvs are GABAergic. The authors suggest that this experiment suggests that l-LNvs receive input from GABAergic neurons. I am not sure the data presented support this.

      We agree that this piece of data alone is not sufficient to demonstrate that the l-LNvs receive GABAergic inputs rather than the l-LNvs are GABAergic. However, when nlsGFP signal is driven by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), we do not observe any prominent signal in the l-LNvs (Figure 5A and B; Figure 5-figure supplement 1A). We have also co-labeled using Gad1GAL4 and PdfLexA (Figure 5-figure supplement 1B). As can be seen, Gad1GAL4-driven GFP signal is present only in the s-LNvs but not the l-LNvs. This further supports the idea that the l-LNvs are not GABAergic, and that the syt-GFP signal likely arises from GABAergic neurons projecting to the l-LNvs.

      (4) In Figure 4-S1B. The GRASP experiment is not very convincing. The resolution of the image is quite poor. In addition, the authors used Pdf-LexA to express the post t-GRASP construct in l-LNvs, but Pdf-LexA also labels the s-LNvs, so it is possible that the GRASP signal the authors observe is coming from the s-LNvs and not the l-LNvs. The authors could use a l-LNvs specific tool to do this experiment and remove any doubts. Altogether this reviewer is not convinced that the data presented supports the conclusion "All in all, these results demonstrate that GABAergic neurons project to the l-LNvs and form synaptic connections." (Line 176). In addition, the authors could have downregulated the expression of Rdl specifically in l-LNvs to support their conclusions. The data they are providing supports a role for RDL but does not prove that RDL is involved in l-LNvs.

      Thank you for these wonderful suggestions. Again we apologize for the poor resolution and hopefully by uploading the images separately we can resolve this issue. We agree that the GRASP signal could be coming from the s-LNvs and not the l-LNvs but unfortunately we are not able to find a LexA that is specifically expressed in the l-LNvs. We believe the trans-Tango data further support the idea that GABAergic neurons project to and form synaptic connections with the l-LNvs. Nonetheless, we have changed our conclusion to “All in all, these results strongly suggest that GABAergic neurons project to the l-LNvs and form synaptic connections” to be more rigorous. In addition, we have obtained R78G01GAL4 which is specifically expressed in the l-LNvs, and using this GAL4 to knock down Rdl rescues the long-sleep phenotype of cry mutants (Figure 4—figure supplement 1D).

      (5) In Figures 4 A and C: it appears that GABA is expressed in the l-LNvs. Is this correct? Can the authors clarify this? Maybe the authors could do an experiment where they co-label using Gad1-GAL4 and Pdf-LexA to clearly demonstrate that l-LNvs are not GABAergic. Also, the choice of colors could be better. It is very difficult to see what GABA is and what is PDF.

      Thank you for this wonderful suggestion. We have now co-labeled using Gad1GAL4 and PdfLexA (Figure 5-figure supplement 1B). As can be seen, Gad1GAL4-driven GFP signal is present only in the s-LNvs but not the l-LNvs. We suspect the GABA signal at the l-LNvs may arise from the GABAergic projections received by these cells. We have now changed the color of the GABA/PDF signals in these images and have reduced the intensity of the PDF signal. Hopefully, these signals will be easier to visualize in this revised version.

      (6) Figure 4G: Pdf-GAL4 expresses in both s-LNvs and l-LNvs. So, in this experiment, the authors are silencing both groups, not only the l-LNvs. Why not use a l-LNvs specific tool?

      Thank you for bringing up this important point. We have previously used c929GAL4 to express Kir2.1 and this led to lethality. We have now used two l-LNv-specific GAL4 drivers (R78G01GAL4 and R10H10GAL4) that we newly obtained to express Kir2.1 but did not observe a significant effect on sleep. Please see Author response image 1 for the results.

      Author response image 1.

      Daily sleep duration of male flies expressing Kir2.1 in l-LNvs using R78G01GAL4 (A)(n = 40, 41, 30 flies) and R10H10GAL4 (B) (n = 40, 41, 32 flies) and controls, monitored under 4L20D. One-way ANOVA with Bonferroni multiple comparison test was used to calculate the difference between experimental group and control group.

      (7) Figure 4H-I: The C929-GAL4 driver expresses in many peptidergic neurons. This makes the interpretation of these data difficult. The effects could be due to peptidergic cells being different than the l-LNvs. Why not use a more specific l-LNvs specific tool? I am also confused as to why some experiments used Pdf-GAL4 and some others used C929-GAL4 in a view to specifically manipulate l-LNvs? This is confusing since both drivers are not specific to the l-LNvs.

      Thank you for bringing up these important points. We have now used the l-LNv-specific R10H10GAL4 and the results are more or less comparable with those of c929GAL4 (Figure 4I and K), i.e. activating the l-LNvs blocks the long-sleep phenotype of cry mutants. The reason PdfGAL4 is used in 4G is that c929GAL4 leads to lethality while the l-LNv-specific GAL4 lines do not alter sleep.

      (8) Figure 5-S1B: Why does the pdf-GAL80 construct not block the sleep increase seen when reducing expression of cry in Gad1-GAL4 neurons? This suggests that there are GABAergic neurons that are not PDF expressing involved in the cry-mediated effect on sleep under short photoperiods.

      Yes, this is indeed the conclusion we draw from this result, and we commented on this in the Discussion: “Moreover, inhibiting cry RNAi expression in PDF neurons does not eliminate the long-sleep phenotype of Gad1GAL4/UAScryRNAi flies. Therefore, we suspect that cry deficiency in other GABAergic neurons is also required for the long-sleep phenotype. Given that the s-LNvs are known to express CRY and appear to be GABAergic based on our findings here, we believe that CRY acts at least in part in the s-LNvs to promote wakefulness under short photoperiod.”

      In conclusion, it is not clear that the authors demonstrated that they are looking at a cry-mediated effect on GABA in s-LNvs resulting in a modulation of the activity of the l-LNvs. Better images and more-suited genetic experiments could be used to address this.

      Thank you very much for all the comments. They are indeed quite helpful for improving our manuscript. Hopefully, with images of higher quality and the additional experiments described above, we have now provided more evidence supporting our major conclusion.

      Reviewer #2 (Public Review):

      Summary:

      The sleep patterns of animals are adaptable, with shorter sleep durations in the winter and longer sleep durations in the summer. Chen and colleagues conducted a study using Drosophila (fruit flies) and discovered that a circadian photoreceptor called cryptochrome (cry) plays a role in reducing sleep duration during day/night cycles resembling winter conditions. They also found that cry functions in specific GABAergic circadian pacemaker cells known as s-LNvs to inhibit these neurons, thereby promoting wakefulness in the animals in the winter. They also identified l-LNvs, known as arousal-promoting cells, as the downstream neurons.

      Strengths:

      Detailed mapping of the neural circuits in which cry acts to mediate the shortened sleep in winter-like day/night cycles.

      Weaknesses:

      The supporting evidence for s-LNvs being GABAergic neurons is not particularly strong. Additionally, there is a lack of direct evidence regarding changes in neural activity for s-LNvs and l-LNvs under varying day/night cycles, as well as in cry mutant flies.

      Thank you very much for all the comments. We have now expressed nlsGFP by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), and positive signals in the s-LNvs can be observed (Figure 5A and B; Figure 5-figure supplement 1A). Hopefully, this can provide some further support regarding the s-LNvs being GABAergic neurons.

      We have now examined GCaMP signals in the l- and s-LNvs of WT and cry mutants under 4L20D/12L12D. Please see Author response image 2 for the results. As can be seen, both WT and cry mutants show photoperiod-dependent changes. Interestingly, cry mutants show a more prominent reduction of GCaMP signal in the l-LNvs compared to WT under 12L12D vs. 4L20D, but the sleep duration phenotype is observed only under 4L20D. Moreover, GCaMP signal is elevated in the s-LNvs of cry mutants relative to WT under 4L20D but decreased under 12L12D. These results indicate that there are distinct mechanisms regulating sleep under short vs. normal photoperiod (with CRY being dispensable under 12L12D), and the role of CRY in modulating the activity of these neurons is also photoperiod-dependent. Further in-depth characterizations are needed to delineate these complex issues.

      Author response image 2.

      Quantification of GCaMP6m signal intensity normalized to that of tdTomato under 12L12D and 4L20D (n = 25-45 cells). Student’s t-test: compared to WT, #P < 0.05, ##P < 0.01; 12L12D vs. 4L20D, *P < 0.05, ***P < 0.001.

      Reviewer #3 (Public Review):

      Summary:

      In humans, short photoperiods are associated with hypersomnolence. The mechanisms underlying these effects are, however, unknown. Chen et al. use the fly Drosophila to determine the mechanisms regulating sleep under short photoperiods. They find that mutations in the circadian photoreceptor cryptochrome (cry) increase sleep specifically under short photoperiods (e.g. 4h light: 20 h dark). They go on to show that cry is required in GABAergic neurons. Further, they suggest that the relevant subset of GABAergic neurons are the well-studied small ventral lateral neurons that they suggest inhibit the arousal-promoting large ventral neurons via GABA signalling.

      Strengths:

      Genetic analysis to show that cryptochrome (but not other core clock genes) mediates the increase in sleep in short photoperiods, and circuit analysis to localise cry function to GABAergic neurons.

      Weaknesses:

      The authors' conclusion that the sLNvs are GABAergic is not well supported by the data. Better immunostaining experiments and perhaps more specific genetic driver lines would help with this point (details below).

      (1) The sLNvs are well known as a key component of the circadian network. The finding that they are GABAergic would, if true, be of great interest to the community. However, the data presented in support of this conclusion are not convincing. Many of the confocal images are of insufficient resolution to evaluate the paper's claims. The anti-GABA immunostaining in Fig 4 and 5 seems to have a high background, and the GRASP experiments in Fig 4 supplement 1 show low signal.

      We apologize for the poor resolution. This is probably due to the compression of the figures in the merged PDF file. We are now uploading the figures individually and hopefully this can resolve the resolution issue. Unfortunately, the GABA immunostaining does not work very well in our hands and thus the background is high. We have now adjusted the images by changing the minimum lookup table (LUT) value in the green channel to 213, which removes all pixels below 213. This can remove background without changing the gray values, so the analysis is not affected. We have modified all images the exact same way and hopefully this can improve the contrast. Furthermore, we have now expressed nlsGFP by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), and positive signals in the s-LNvs can be observed (Figure 5A and B; Figure 5-figure supplement 1A). Hopefully, this can provide some further support regarding the s-LNvs being GABAergic neurons.
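
      The effect of raising the minimum LUT value can be sketched as a display-only mapping that leaves the stored pixel values untouched. This is a hedged illustration rather than the authors' actual procedure: it assumes 8-bit data and a linear display range between the LUT minimum and 255, and the function name and pixel values are hypothetical.

```python
def apply_display_lut(pixels, lut_min, lut_max=255):
    """Map raw 8-bit pixel values to display values for a given LUT range.

    Values at or below lut_min are shown as black (0), values at or above
    lut_max as full intensity (255), and values in between are scaled
    linearly. The input list is not modified, so any quantification done
    on the raw values is unaffected.
    """
    span = lut_max - lut_min
    displayed = []
    for value in pixels:
        if value <= lut_min:
            displayed.append(0)
        elif value >= lut_max:
            displayed.append(255)
        else:
            displayed.append(255 * (value - lut_min) // span)
    return displayed


if __name__ == "__main__":
    raw = [100, 213, 234, 255]  # hypothetical green-channel gray values
    print(apply_display_lut(raw, lut_min=213))
    print(raw)  # raw gray values are unchanged
```

      Because only the display mapping changes, intensity measurements taken from the raw values would be the same before and after the adjustment, provided the same range is applied to every image being compared.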

      Transcriptomic datasets are available for the components of the circadian network (e.g. PMID 33438579, and PMID 19966839). It would be of interest to determine if transcripts for GAD or other GABA synthesis/transport components were detected in sLNvs. Further, there are also more specific driver lines for GAD, and the lLNvs, sLNVs that could be used.

      Thank you for these wonderful suggestions. Based on PMID 19966839, both the s-LNvs and l-LNvs express Gad1 and VGAT at a relatively low level, although here in our study Gad1GAL4 expression is observed only in the s-LNvs and not l-LNvs. We have commented on this in the 4th paragraph of Discussion: “One study using cell-type specific gene expression profiling demonstrates Gad1 and VGAT expression in both s-LNvs and l-LNvs, although with relatively low signal (Nagoshi et al., 2010). Here we observed that Gad1GAL4 is expressed in the s-LNvs, and their GABA intensity is reduced when we use R6GAL4 to knock down VGAT in these cells.” PMID 33438579 does not report expression of these genes in either s-LNvs or l-LNvs, likely due to insufficient sequencing depth. Furthermore, we have now used two l-LNv-specific GAL4 lines (R78G01GAL4 and R10H10GAL4) to conduct some of the experiments that we previously used c929GAL4 for, and obtained comparable results (Figure 4I and K).

      (2) The authors' model posits that in short photoperiods, cry functions to suppress GABA secretion from sLNvs thereby disinhibiting the lNVs. In Fig 4I they find that activating the lLNvs (and other peptidergic cells) by c929>NaChBac in a cryb background reduces sleep compared to activating lLNVs in a wild-type background. It's not clear how this follows from the model. A similar trend is observable in Fig 4H with TRP-mediated activation of lNVs, although it is not clear from the figure if the difference between cryb and wild-type backgrounds is significant.

      Thank you for bringing up this important point. This does appear to be counterintuitive. We suspect that in cry mutants, there is more inhibition occurring at the l-LNvs and thus the system may be particularly sensitive to their activation. Therefore, activating these neurons in the mutant background can result in a more prominent wake-promoting effect compared to that in WT.

      Recommendations for the authors:

      Our major concern centers around the claim that the sLNvs are GABAergic and secrete GABA onto the lLNVs. As it stands, this is not well supported by the data.

      The authors could substantiate these findings by using more specific driver lines for GAD / vGAT (MiMic based lines are available that should better recapitulate endogenous expression). Transcriptomic data for circadian neurons are available, and the FlyWire consortium also predicts neurotransmitter identities for specific neural circuits. These datasets could be mined for evidence to support the claim of sLNvs being GABAergic.

      Thank you for these wonderful suggestions. We have now used MiMic-based lines for Gad1 (BS52090, Mi{MIC}Gad1MI09277) and VGAT (BS23022, Mi{ET1}VGATMB01219) to knock down cry but unfortunately were not able to observe changes in sleep. Please see Author response image 3 for the results.

      Author response image 3.

      Daily sleep duration of male flies with cry knocked down in GABAergic neurons by Gad1GAL4 (A) (n = 30, 38, 50, 18, 31 flies) or VGATGAL4 (B) (n = 28, 38, 50, 18, 30 flies) monitored under 4L20D. One-way ANOVA with Bonferroni multiple comparison test: compared to UAS control, ###P < 0.001.

      Furthermore, we have now included another Gad1GAL4 line which is generated by knocking GAL4 transgene into the Gad1 locus. We are also able to observe increased sleep when using this GAL4 to knock down cry, and positive signals in the s-LNvs can be observed when using this GAL4 to drive nlsGFP (Figure 2B; Figure 5-figure supplement 1A).

      Based on PMID 19966839, both the s-LNvs and l-LNvs express Gad1 and VGAT at a relatively low level, although here in our study Gad1GAL4 expression is observed only in the s-LNvs and not l-LNvs. We have commented on this in the 4th paragraph of Discussion: “One study using cell-type specific gene expression profiling demonstrates Gad1 and VGAT expression in both s-LNvs and l-LNvs, although with relatively low signal (Nagoshi et al., 2010). Here we observed that Gad1GAL4 is expressed in the s-LNvs, and their GABA intensity is reduced when we use R6GAL4 to knock down VGAT in these cells.” FlyWire does not have a prediction for the particular circuit that we are interested in.

      Further, many of the immunostaining images have high background / low signal - so better confocal images would help, as would the use of more specific driver lines for the lNVs as it is sometimes hard to distinguish the lLNvs from sLNvs.

      We have now adjusted all images by changing the minimum lookup table (LUT) value in the green channel to 213 and that of the red channel to 279, which removes all pixels below 213 and 279, respectively. This can remove background without changing the gray values, so the analysis is not affected. We have modified all images the exact same way and hopefully this can improve the signal to noise ratio. We were not able to find a LexA line that is specifically expressed in the l-LNvs but we have found two l-LNv-specific GAL4 lines (R78G01GAL4 and R10H10GAL4). We used these lines to conduct some of the experiments that we previously used c929GAL4 for, and obtained comparable results (Figure 4I and 4K).
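
      One reading of the thresholding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `apply_min_threshold` and the pixel values are hypothetical, and "removing" a pixel is assumed to mean setting it to zero while leaving every other gray value unchanged. The imaging software and exact procedure used by the authors are not specified beyond the LUT minima of 213 (green) and 279 (red).

```python
def apply_min_threshold(channel, minimum):
    """Return a copy of a 2D channel with pixels below `minimum` set to 0.

    Pixels at or above `minimum` keep their original gray value, so
    intensity measurements on the surviving pixels are unchanged.
    """
    return [[p if p >= minimum else 0 for p in row] for row in channel]


# Hypothetical 2x3 pixel patches (gray values), for illustration only.
green = [[100, 213, 500], [212, 214, 4095]]
red = [[278, 279, 280], [0, 1000, 150]]

# Thresholds taken from the response: 213 for green, 279 for red.
green_out = apply_min_threshold(green, 213)
red_out = apply_min_threshold(red, 279)

print(green_out)
print(red_out)
```

      Applying the same function with the same thresholds to every image corresponds to the statement that all images were modified in exactly the same way.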

      Additional specific comments are in the reviews above.

      Minor points:

      (1) Line 55: CRYPTOCHROME is misspelled.

      This has been fixed.

      (2) Line 140: The authors need to provide the appropriate references for the use of THIP and SKF-97541.

      This has been added.

      (3) Line 149: there are multiple GABA-A receptors in flies, the authors should acknowledge that. What about LccH3 or Grd?

      Thank you for bringing up this important point. Here we focused only on Rdl because it is the only GABA-A receptor known to be involved in sleep regulation. We have modified our description regarding this issue: “We tested for genetic interaction between cry and Resistant to dieldrin (Rdl), a gene that encodes GABA-A receptor in flies and has previously been shown to be involved in sleep regulation.”

    1. eLife Assessment

      This important study introduces LUNA, a new autofocusing method that achieves nanoscale precision and robustly corrects focus drift during time-lapse microscopy, improving imaging under temperature shifts. The authors exploit this technical advance to investigate the bacterial cold shock response, providing solid evidence that individual cells continue to grow and divide in a highly coordinated process that cannot be observed in population-level measurements. This work offers a technical and conceptual framework for reconciling discrepancies between bulk and single-cell growth measurements, with broad relevance for cell biology and microbiology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors developed a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), to address severe focus drift, a major challenge in time-lapse microscopy. Using this method, they tackle a fundamental question in bacterial cold shock response: whether cells halt growth and division following an abrupt temperature downshift. Overall, the experimental design, modeling, and data analysis are solid and well executed. However, several points require clarification or further support to fully substantiate the authors' conclusions.

      Strengths:

      (1) The LUNA method outperforms existing autofocusing systems with nanoscale precision over a large focusing range. The focusing time is reasonable for the presented experiments, and the authors note potential improvements by using faster motors and optimized control algorithms, suggesting broad applicability. The theoretical simulations and experimental validation provide solid support for the robustness of the method.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division after an abrupt cold shock. Single-cell analyses monitoring the entire course of cold adaptation and steady-state growth reveal features that are obscured in bulk-culture studies: cells continue to grow at reduced rates with smaller cell sizes, resulting in an apparently unchanged population-level OD. The experiments are well designed and analyses are generally solid and largely support the authors' conclusions.

      (3) The authors also propose a model describing how population-level OD measurements depend on cell dry mass density, volume, and concentration. This provides a valuable conceptual contribution to the interpretation of OD-based growth measurements, which remain a gold-standard method in microbiology.

      Weaknesses:

      (1) It is unclear whether the authors' model explaining the population-level OD during acclimation is broadly applicable. Most analyses focus on a shift from 37˚C to 14˚C, where the model agrees well with experimental data. However, in the 37˚C to 12˚C experiment, OD600 decreases after cold shock (Fig. 5e), and the computed OD does not match the experimental measurements (Fig. S16a). Although the authors attribute this discrepancy to a "complicated interplay," no further explanation is provided, which limits confidence in the model's general applicability.

      (2) The manuscript proposes that cell-cycle progression becomes synchronized across the population after cold shock, but the supporting evidence is not fully convincing. If synchronization refers primarily to the uniform reduction in growth rate following cold shock, this could plausibly arise from global translation inhibition affecting all cells. However, the additional claim that "cells encountering a relatively late CSR will accelerate division to maintain synchronization" is not strongly supported by the presented data.

      (3) Several technical terms used in the method development section are not clearly defined and may be unfamiliar to a broad readership, which makes it difficult to fully understand the methodology and evaluate its performance. Examples include depth of focus, focusing precision, focusing time, focusing frequency, and drift threshold value. In addition, the reported average focusing time per location (~0.6 s) lacks sufficient context, limiting the reader's ability to assess its significance relative to existing autofocusing methods.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.
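
      The qualitative point in (3) can be illustrated with a toy calculation. This is a sketch under loud assumptions and is not the authors' scattering model: it assumes that the light scattered per cell grows faster than linearly with cell volume (an illustrative exponent of 4/3 is used), that dry mass density is constant, and that the cell numbers and volumes below are hypothetical relative values.

```python
def relative_biomass(n_cells, volume, density=1.0):
    """Total biomass, proportional to cell number x volume x density."""
    return n_cells * volume * density


def relative_od(n_cells, volume, exponent=4 / 3):
    """Toy OD: per-cell scattering assumed to scale as volume**exponent.

    The exponent is an illustrative assumption, not a fitted value.
    """
    return n_cells * volume ** exponent


# Hypothetical states, normalized to the pre-shock culture.
before = {"n_cells": 1.0, "volume": 1.0}
after = {"n_cells": 1.4, "volume": 0.75}  # more cells, each smaller

biomass_before = relative_biomass(**before)
biomass_after = relative_biomass(**after)
od_before = relative_od(**before)
od_after = relative_od(**after)

print(f"biomass: {biomass_before:.3f} -> {biomass_after:.3f}")
print(f"toy OD:  {od_before:.3f} -> {od_after:.3f}")
```

      With these made-up numbers, total biomass rises while the toy OD falls, which is the direction of the effect described above; the magnitudes carry no quantitative meaning.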

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      (2) No mutants or inhibitors were used to test and challenge the proposed model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), to address severe focus drift, a major challenge in time-lapse microscopy. Using this method, they tackle a fundamental question in bacterial cold shock response: whether cells halt growth and division following an abrupt temperature downshift. Overall, the experimental design, modeling, and data analysis are solid and well executed. However, several points require clarification or further support to fully substantiate the authors' conclusions.

      Strengths:

      (1) The LUNA method outperforms existing autofocusing systems with nanoscale precision over a large focusing range. The focusing time is reasonable for the presented experiments, and the authors note potential improvements by using faster motors and optimized control algorithms, suggesting broad applicability. The theoretical simulations and experimental validation provide solid support for the robustness of the method.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division after an abrupt cold shock. Single-cell analyses monitoring the entire course of cold adaptation and steady-state growth reveal features that are obscured in bulk-culture studies: cells continue to grow at reduced rates with smaller cell sizes, resulting in an apparently unchanged population-level OD. The experiments are well designed and analyses are generally solid and largely support the authors' conclusions.

      (3) The authors also propose a model describing how population-level OD measurements depend on cell dry mass density, volume, and concentration. This provides a valuable conceptual contribution to the interpretation of OD-based growth measurements, which remain a gold-standard method in microbiology.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) It is unclear whether the authors' model explaining the population-level OD during acclimation is broadly applicable. Most analyses focus on a shift from 37˚C to 14˚C, where the model agrees well with experimental data. However, in the 37˚C to 12˚C experiment, OD600 decreases after cold shock (Fig. 5e), and the computed OD does not match the experimental measurements (Fig. S16a). Although the authors attribute this discrepancy to a "complicated interplay," no further explanation is provided, which limits confidence in the model's general applicability.

      Thank you for this careful evaluation regarding the model generality. In the experiment with a temperature shift from 37°C to 12°C, the measured OD600 values were 0.243 at 0 hours and 0.242 at 5 hours. In comparison, our model-computed OD600 values were 0.243 at 0 hours and 0.271 at 5 hours. The absolute difference between the measured and computed values at 5 hours is therefore 0.028.

      Given the typical experimental variability in OD600 measurements and the limited linear range of the OD-to-biomass approximation (generally considered reliable below ~0.5), this deviation is quantitatively modest. We appreciate your valuable feedback and are happy to provide further clarification if needed.
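
      For readers who want to gauge the size of this deviation, the comparison can be reproduced from the four OD600 values quoted in the response. The sketch below uses only those rounded values, so its result may differ in the last digit from a figure computed on unrounded measurements; the variable names are illustrative.

```python
# OD600 values quoted in the response (37 °C to 12 °C shift), keyed by hours.
measured = {0: 0.243, 5: 0.242}
computed = {0: 0.243, 5: 0.271}

# Absolute and relative deviation of the model at the 5-hour time point.
abs_diff = abs(computed[5] - measured[5])
rel_diff = abs_diff / measured[5]

print(f"absolute difference at 5 h: {abs_diff:.3f}")
print(f"relative to measured OD600: {rel_diff:.1%}")
```

      Expressing the deviation relative to the measured value, alongside the absolute difference, may make it easier to judge against typical OD600 measurement variability.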

      (2) The manuscript proposes that cell-cycle progression becomes synchronized across the population after cold shock, but the supporting evidence is not fully convincing. If synchronization refers primarily to the uniform reduction in growth rate following cold shock, this could plausibly arise from global translation inhibition affecting all cells. However, the additional claim that "cells encountering a relatively late CSR will accelerate division to maintain synchronization" is not strongly supported by the presented data.

      We appreciate your critical reading, which has helped us identify ambiguities in our terminology and strengthen the clarity of our work. Regarding the term “synchronization”, we would like to clarify that it refers to two different scenarios: (i) the synchrony in the timing of growth rate changes after cold shock. The cells initiate the slowdown in growth almost simultaneously, suggesting a highly coordinated, non-stochastic population-level response to cold shock; (ii) the synchrony in division cycle progression.

      In the sentence you referenced “cells encountering a relatively late CSR will accelerate divisions to maintain synchronization”, we intended to describe that cells maintain consistent progression of the division cycle after cold shock, meaning that after the same number of elapsed cycles, different cells are at a similar stage in their division timing (Figure 4f, 4g, Figure S14). The term “accelerate” refers to our observation that cells which complete a given cycle later than others tend to have shorter subsequent inter-division intervals, thereby “catching up” to maintain alignment in cycle number across the population. We acknowledge that using “synchronization” in this scenario may be ambiguous, and we will replace it with more precise phrasing “progression of division cycle” to accurately convey this finding.
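
      The "catching up" behavior described here could be checked with a simple correlation between how late a cell completes one cycle and how long its next inter-division interval is. The following is a minimal sketch, assuming hypothetical division times (in minutes) for four cells; the data, cell labels, and helper name `pearson` are illustrative and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical times (min after cold shock) of two consecutive divisions.
division_times = {"a": [50, 100], "b": [55, 102], "c": [45, 98], "d": [60, 105]}

first = [t[0] for t in division_times.values()]
mean_first = sum(first) / len(first)

# Lateness: how far each cell's first division lags the population mean.
lateness = [t - mean_first for t in first]
# Length of the following inter-division interval for the same cell.
next_interval = [t[1] - t[0] for t in division_times.values()]

r = pearson(lateness, next_interval)
print(f"correlation between lateness and next interval: {r:.2f}")
```

      A negative correlation in real data would be consistent with late cells shortening their next interval; on its own it would not establish a mechanism.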

      (3) Several technical terms used in the method development section are not clearly defined and may be unfamiliar to a broad readership, which makes it difficult to fully understand the methodology and evaluate its performance. Examples include depth of focus, focusing precision, focusing time, focusing frequency, and drift threshold value. In addition, the reported average focusing time per location (~0.6 s) lacks sufficient context, limiting the reader's ability to assess its significance relative to existing autofocusing methods.

      Thank you for your valuable comments and suggestions. In response, we have added more detailed descriptions in the Methods section of the revised version.

      The reviewer noted that the reported average focusing time (~0.6 s) lacks sufficient context, which may limit readers’ ability to assess its significance relative to existing autofocusing methods. We would like to clarify that the core innovation of this work lies in the proposed theoretical framework for autofocusing, which offers advantages over existing methods in terms of focusing precision and range. While focusing time is a practically relevant performance metric, it is primarily presented here as an implementation-dependent parameter rather than a central theoretical contribution of this study. In our experimental setup, an average focusing time of 0.6 s proved sufficient for routine time-lapse imaging in microscopy, thereby demonstrating the practical usability of LUNA.

      Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      Thank you for your insightful comment regarding the comparison of LUNA with other autofocus methods.

      In our study, we primarily compared LUNA with the Nikon PFS system (as shown in Video S1) because Nikon PFS is one of the most widely used commercial autofocus systems in single-cell time-lapse imaging, and its manufacturer provides well-defined performance parameters (e.g., focusing precision within 1/3 depth-of-focus, response time <0.7 s), which facilitates a quantitative comparison. For other commercial systems, such as Olympus ZDC, Zeiss Definite Focus, Leica AFC, and ASI CRISP, the publicly available specifications are often less clearly defined, or are measured under inconsistent conditions, making a direct head-to-head comparison challenging and potentially misleading. Additionally, in our preliminary experiments, we also tested an Olympus microscope and observed severe focus drift during slow cooling processes. From a physical perspective, LUNA is specifically designed to meet the demanding requirements of single-cell experiments, including a wide focusing range and high precision, while existing commercial systems may not physically achieve the combination of range and accuracy needed for such extreme conditions.

      (2) No mutants or inhibitors were used to test and challenge the proposed model.

      We agree that such approaches would provide valuable mechanistic insights and further strengthen the validation of the model presented in this study. In the current work, our primary goal was to introduce the LUNA autofocusing method and demonstrate its capability to resolve bacterial cold shock response at the single-cell level with unprecedented precision. As such, we focused on characterizing the wild-type physiological dynamics under cold shock, which already revealed several previously unreported phenomena. We acknowledge that the use of genetic mutants or chemical inhibitors targeting specific cold shock proteins or regulatory pathways would be a logical and powerful next step to dissect the underlying molecular mechanisms and test the causality of the observed growth dynamics. We plan to address this in future work by incorporating such perturbations to further test and refine the model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      The reviewer raises a pertinent question regarding whether the observed high degree of cell synchronization represents an intrinsic biological phenomenon or an artifact induced by the microfluidic environment.

      Over the past decade, microfluidic chips, including the specific design used in our work, have become a widely accepted and powerful tool in microbial physiology research. A broad consensus has emerged within the community that the microenvironment within these microchannels does not significantly interfere with or perturb the natural physiological behavior of microorganisms (Dusny, C. & Grünberger, Curr Opin Biotechnol. 63, 26-33 (2020)). This understanding is also supported by the fact that key findings obtained with microfluidic single-cell technologies are reproducible by other methods. For example, the adder model of cell-size homeostasis in E. coli, first observed in microfluidic chips, has been repeatedly validated by different methods (Taheri-Araghi, S. et al. Curr. Biol. 25, 385-391 (2015)). Therefore, while we acknowledge the importance of considering environmental effects, we are confident that the synchronization we report reflects the genuine biological dynamics of E. coli cells.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

      We thank the reviewer for this thoughtful suggestion. In designing our experiments, we aimed to study the bacterial cold shock response at the single-cell level. A key feature of this response is that it is typically triggered only when the temperature drops below a certain threshold within a short time duration. We therefore chose to lower the temperature from 37 °C to 14 °C as rapidly as possible. This approach allowed us to leverage the unique capabilities of LUNA while also providing an opportunity to explore this biological process in greater detail.

      We agree that investigating bacterial responses across intermediate temperatures would be highly informative for understanding how temperature changes affect cellular physiology. However, this direction addresses a distinct scientific question that lies beyond the scope of the current work. We fully acknowledge its value and intend to explore it in future studies.

    1. eLife Assessment

      This valuable study introduces MPS, an open-source pipeline that addresses a significant technical bottleneck by making miniscope data analysis more accessible. Characterized by speed and a low barrier to entry, the software's performance is supported by solid evidence. This work will be of interest to miniscope users seeking a streamlined, memory-efficient, end-to-end analysis solution.

    2. Reviewer #1 (Public review):

      Summary

      The manuscript by Peden-Asarch et al. introduces MPS, a new open-source software package for processing miniscope data. The authors aim to provide a fast, end-to-end analysis pipeline tailored to miniscope users with minimal experience in coding or version control. The work addresses an important practical barrier in the field by focusing on usability and accessibility.

      Strengths

      The authors identify a clear and well-motivated need within the miniscope community. Existing pipelines for miniscope data analysis are often complex, difficult to install, and challenging to maintain. In addition, users frequently encounter technical limitations such as out-of-memory errors, reflecting the substantial computational demands of these workflows, resources that are not always available in many laboratories. MPS is presented as an attempt to alleviate these issues by offering a more streamlined, accessible, and robust processing framework.

      Weaknesses

      The authors state that "MPS is the first implementation of Constrained Non-negative Matrix Factorization (CNMF) with Nonnegative Double Singular Value Decomposition (NNDSVD) initialization." However, NNDSVD initialization is the default method in scikit-learn's NMF implementation and is also used in CaImAn. I recommend rephrasing this claim in the abstract to more accurately reflect MPS's novelty, which appears to lie in the specific combination of constrained NMF with NNDSVD initialization, rather than being the first use of NNDSVD initialization itself.

      At present, there are practical issues that limit the usability of the software. The link to the macOS installer on the documentation website is not functional. Furthermore, installation on a MacBook Pro was unsuccessful, producing the following error: "rsync(95755): error: ... Permission denied ... unexpected end of file."

      For the purposes of this review, resolving this issue would significantly improve the evaluation of the software and its accessibility to users.

      More broadly, the authors propose self-contained installers as a solution to the "package-management burden" commonly associated with scientific software. While this approach is appealing and potentially useful for novice users, current best practices in software development increasingly rely on continuous integration and continuous deployment (CI/CD) pipelines to ensure reproducibility, testing, and long-term maintenance. In this context, it has become standard for Python packages to be distributed via PyPI or Conda. Without dismissing the value of standalone installers, the overall quality and sustainability of MPS would be greatly enhanced by also supporting conventional environment-based installations.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript introduces Miniscope Processing Suite (MPS), a novel no-code GUI-based pipeline built to easily process long-duration one-photon calcium imaging data from head-mounted Miniscopes. MPS aims to address two large problems that persist despite the rapid proliferation of Miniscope use across the field. The first is the high technical barrier to using existing pipelines (e.g., CaImAn, MIN1PIPE, Minian, CaliAli) that require users to have coding skills to analyze data. The second is the severe memory limitations of these pipelines, which can prevent analysis of long-duration (multi-hour) recordings without state-of-the-art hardware. The MPS toolbox takes inspiration from what existing pipelines do well, introduces new modules such as Window Cropping, NNDSVD initialization, and Watershed-based segmentation, and improves the user experience to broaden access to calcium imaging analysis without the need for training in new coding languages. In many ways, MPS achieves this aim, and thus will be of interest to a growing, broad audience of new calcium imagers.

      There are, however, some concerns with the current manuscript and pipeline that, if addressed, would greatly improve the impact of this work. Currently, the manuscript provides insufficient evidence that MPS can generate good results efficiently on various data sets, and it is not properly benchmarked against other established packages. Additionally, considering the goal of MPS is to attract novices to attempt Miniscope analysis, better tutorials, documentation, and walkthroughs of expected vs inaccurate results should be provided so that it is clear when the user can trust the output. Otherwise, this simplified approach may end up leading new users to erroneous results.

      Strengths:

      The manuscript itself is well-organized, clear, and easy to follow. MPS is clearly designed to remove the computational barrier for entry for a broad neuroscience community to record and analyze calcium data. The development of several well-detailed algorithmic innovations merits recognition. Firstly, MPS is extremely easy to install, keep updated, and step through. Having each step save every output automatically is a well-thought-out feature that will allow users to enter back into the pipeline at any step and compare results.

      The implementation of an erroneous frame identifier and remover during preprocessing is an important new feature that is typically done offline with custom-built code. Interactive ROI cropping early in the pipeline is an efficient way to lower pixel load, and NNDSVD initialization is a new way to provide nonnegative, biologically interpretable starting spatial and temporal factors for later CNMF iterations. Parallel temporal-first update ordering cuts down dramatically on later computational load. Together, all these features, neatly packaged into a no-code GUI like the Data Explorer for manual curation, are practical additions that will benefit end users.

      Weaknesses:

      A major limitation of this manuscript is that the authors don't validate the accuracy of their source extraction using ground-truth data or any benchmark against existing pipelines. The paper uses their own analysis of processing speeds, component counts, signal-to-noise ratio improvements, and morphological characteristics of detected cells, but it needs to be reworked to include some combination of validation against manually annotated ground truth data sets, simulated data with known cell locations and activity patterns, or cross-validation with established pipelines on identical datasets. Without this kind of validation, it is impossible to truly determine whether MPS produces biologically acceptable results that help distinguish it from what is currently already available. For example, line 57 refers to the CaImAn pipeline having near-human efficiency (Figures 3-5 and Tables 1 and 2 of the CaImAn paper), but no specific examples for MPS performance benchmarks are made. Figure 15 of the Minian paper provides other examples of how to show this.

      Considering one of the main benefits of MPS is its low memory demand and ability to run on unsophisticated hardware, the authors should include a figure that shows how processing times and memory usage scale with dataset sizes (FOV, number of frames and/or neurons, sparsity of cells) and differing pipelines. Figure 8 of the CaImAn paper and Figure 18 of the Minian paper show this quite nicely. Table 1 currently references how "traditional approaches" differ methodologically from MPS innovations, but runtime comparisons on identical datasets processed through MPS, CaImAn, Minian, or CaliAli would be necessary to substantiate performance claims of MPS being "10-20X faster". Additionally, while the paper does mention the type of hardware used by the experimenters, a table with a full breakdown of components, as well as the minimum requirements for smooth processing, may be useful for reproducibility.

      The current datasets used for validating MPS are not described in the manuscript. The manuscript appears to have 28 sessions of calcium imaging, but it is unclear if this is a single cohort or even animal, or whether these data are all from the same brain region. Importantly, the generalizability of parameter choices and performance could vary for others based on brain region differences, use of alternative calcium indicators (anything other than GCaMP8f used in the paper), etc. This leads to another limitation of the paper in its current form. While MPS is aimed at eliminating the need to code, users should not be expected to blindly trust default or suggested parameter selections. Instead, users need guidance on what each modifiable parameter does to their data and how each step analysis output should be interpreted. Perhaps including a tutorial with sample test data for parameter investigation and exploration, like many other existing pipelines do, is warranted. This would also increase the transparency and reproducibility of this work.

      Currently, the documentation and FAQ website linked to MPS installation does not do an adequate job of describing parameters or their optimization. The main GitHub repository does contain better stepwise explanations, but there needs to be a centralized location for all this information. Additionally, a lack of documentation on the graphs created by each analysis step makes it hard for a true novice to interpret whether their own data is appropriately optimized for the pipeline. Greater detail on this would greatly improve the quality and impact of MPS.

    4. Author response:

      (1) Claim regarding NNDSVD initialization

      Reviewer #1:

      The authors state that "MPS is the first implementation of Constrained Non-negative Matrix Factorization (CNMF) with Nonnegative Double Singular Value Decomposition (NNDSVD) initialization." However, NNDSVD initialization is the default method in scikit-learn's NMF implementation and is also used in CaImAn. I recommend rephrasing this claim in the abstract to more accurately reflect MPS's novelty, which appears to lie in the specific combination of constrained NMF with NNDSVD initialization, rather than being the first use of NNDSVD initialization itself.

      We agree that our original phrasing was too broad. NNDSVD-family initialization is widely used in NMF implementations (e.g., scikit-learn) and is available within some pipeline components. We revised the abstract and main text to clarify our intended contribution: MPS seeds CNMF directly with NNDSVD-derived nonnegative factors as the primary initialization strategy, rather than relying on heuristic or greedy ROI-based seeding, integrated within a memory-efficient, end-to-end workflow for long-duration miniscope recordings.

      (2) Installation issue on macOS

      Reviewer #1:

      At present, there are practical issues that limit the usability of the software. The link to the macOS installer on the documentation website is not functional. Furthermore, installation on a MacBook Pro was unsuccessful, producing the following error: "rsync(95755): error: ... Permission denied ...unexpected end of file."

      We thank the reviewer for identifying the broken installer link and the macOS installation error. We fixed the macOS installer link on the documentation website and updated installation instructions to explicitly address common macOS permission-related failures (including rsync "Permission denied" errors that arise when attempting to write into protected directories without appropriate privileges). We re-tested installation on clean macOS systems and confirmed successful installation under the revised instructions.

      (3) Validation, benchmarking, and cross-pipeline comparison

      Reviewer #2:

      A major limitation of this manuscript is that the authors don't validate the accuracy of their source extraction using ground-truth data or any benchmark against existing pipelines... Without this kind of validation, it is impossible to truly determine whether MPS produces biologically acceptable results... Considering one of the main benefits of MPS is its low memory demand and ability to run on unsophisticated hardware, the authors should include a figure that shows how processing times and memory usage scale with dataset sizes and differing pipelines... runtime comparisons on identical datasets processed through MPS, CaImAn, Minian, or CaliAli would be necessary to substantiate performance claims of MPS being "10-20X faster".

      We thank the reviewers for their careful reading and for raising the question of biological validity, which we agree is central to any calcium imaging analysis tool. We would like to clarify, however, that MPS does not introduce a novel source extraction algorithm, and therefore the question of biological validity is not one that MPS alone can answer - nor should it be expected to. MPS is built on CNMF, the same mathematical framework underlying CaImAn and Minian. The contribution of MPS lies in its initialization strategy and parallelization architecture, which allow this proven framework to operate in the multi-hour recording regime.

      To address the reviewers' request for a direct qualitative comparison, we will run MPS, CaImAn, Minian, and MIN1PIPE on a representative 10-minute real recording with clearly visible neurons. The figure will show the spatial components (ROI footprints) and representative temporal traces (ΔF/F) for all four pipelines on identical data. We anticipate that the spatial layouts and temporal dynamics will be highly concordant across pipelines, demonstrating that MPS produces biologically consistent output. We believe this side-by-side comparison will provide a clear demonstration that MPS output is comparable in quality to established tools on tractable recordings.

      Regarding runtime comparison across pipelines, we will provide a table showing approximate processing times at three recording durations (5, 20, and 180 minutes). On short recordings, all pipelines are expected to complete successfully at different rates, whereas on long-duration recordings, their behavior is expected to diverge. We acknowledge that any single runtime benchmark reflects specific hardware and dataset characteristics and may not generalize to all configurations. We will therefore present these data as illustrative rather than definitive and will direct readers to the MPS documentation for guidance on hardware-specific tuning.
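
      One way such timings could be collected is sketched below in Python. This is a minimal illustration, not the authors' benchmarking procedure: `profile_call` and `stand_in_workload` are hypothetical names, and `tracemalloc` only sees allocations made through Python's memory allocator, so memory used inside native libraries may be under-reported.

```python
import time
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Run func once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises.
        tracemalloc.stop()
    return result, elapsed, peak_bytes


def stand_in_workload(n_frames):
    """Hypothetical placeholder for a pipeline run on n_frames frames."""
    return [frame * 0.5 for frame in range(n_frames)]


if __name__ == "__main__":
    # Increasing input sizes stand in for increasing recording durations.
    for n_frames in (1_000, 10_000, 100_000):
        _, seconds, peak = profile_call(stand_in_workload, n_frames)
        print(f"{n_frames:>7} frames: {seconds:.4f} s, peak {peak / 1e6:.2f} MB")
```

      Replacing the placeholder with a real processing call, and repeating each measurement several times on the same machine, would give numbers closer to what a runtime table needs.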

      (4) Dataset description and scope of generalizability

      Reviewer #2:

      The current datasets used for validating MPS are not described in the manuscript. The manuscript appears to have 28 sessions of calcium imaging, but it is unclear if this is a single cohort or even animal, or whether these data are all from the same brain region. Importantly, the generalizability of parameter choices and performance could vary for others based on brain region differences, use of alternative calcium indicators...

      We agree that the dataset description should be centralized and unambiguous. We added a dedicated Methods subsection stating that all results are based on a single, controlled experimental dataset consisting of 28 long-duration miniscope sessions acquired under consistent conditions (same brain region, calcium indicator, optical configuration, and acquisition parameters). This section explicitly specifies the number of animals, brain region, frame rate, field of view, session duration, and total data volume. We also clarified that conclusions are intended to evaluate MPS performance in this controlled long-duration setting rather than to claim universal parameter generalizability across brain regions, indicators, or optical systems.

      (5) Parameter guidance and documentation

      Reviewer #2:

      ...users should not be expected to blindly trust default or suggested parameter selections. Instead, users need guidance on what each modifiable parameter does to their data and how each step analysis output should be interpreted. Currently, the documentation and FAQ website linked to MPS installation does not do an adequate job of describing parameters or their optimization...

      We agree that users should not blindly trust default or suggested parameters. We substantially expanded and centralized documentation by adding a parameter-selection walkthrough that explains what each modifiable parameter does, how it affects intermediate and final outputs, and how diagnostic plots generated at each stage should be interpreted. Rather than prescribing dataset-specific parameter values, we explicitly framed parameter selection as an iterative, hypothesis-driven process informed by experimental factors such as calcium indicator kinetics, lens size and numerical aperture, field of view, recording duration, and expected neuronal density. We consolidated previously dispersed explanations from the GitHub repository into a single documentation site and expanded figure descriptions to guide interpretation by less experienced users. A representative sample dataset and accompanying analysis code were made publicly available at https://github.com/ariasarch/MPS_Sample_Code to support parameter exploration on tractable data.

      (6) Packaging and distribution

      Reviewer #1:

      ...current best practices in software development increasingly rely on continuous integration and continuous deployment (CI/CD) pipelines to ensure reproducibility, testing, and long-term maintenance. In this context, it has become standard for Python packages to be distributed via PyPI or Conda. Without dismissing the value of standalone installers, the overall quality and sustainability of MPS would be greatly enhanced by also supporting conventional environment-based installations.

      Regarding distribution more broadly: while our one-click installers are intended to reduce setup burden for non-programmers, we recognize the value of conventional environment-based distribution for long-term sustainability. We are exploring the feasibility of adding a standard PyPI and/or Conda installation pathway alongside the standalone installers. To ensure reproducibility across environments, all package dependencies are now explicitly version-pinned at installation time, eliminating environment drift as a source of irreproducibility.

      We would note, however, that PyPI distribution alone does not fully resolve the reproducibility challenges inherent to scientific Python software. Even with version-pinned dependencies, downstream changes in the Python interpreter itself, compiled extension modules, and platform-specific build toolchains can silently alter numerical behavior in ways that are difficult to anticipate or control. Our standalone installers address this by shipping a complete, fixed execution environment, and we believe this remains a meaningful architectural advantage for ensuring long-term reproducibility - particularly for non-developer users who may not be in a position to diagnose subtle environment-related failures. We see PyPI/Conda support and standalone installers as complementary rather than equivalent approaches, and will pursue both where feasible.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Taken altogether, the experimental evidence favors an erosion-dominated process. However, a few minor questions remain regarding the models. Why does the equal-fragmentation model predict no biomass transfer between size classes? To what extent, quantitatively, does the erosion model outperform the equal fragments model at capturing the biomass size distributions? Finally, why does the idealized erosion fail to capture the size distribution at late stages in Supplemental Figure S9 - would this discrepancy be resolved if the authors considered individual colony variances in cell adhesion (for instance, as hypothesized by the authors in lines 133-137)? I do not believe these questions curb the other results of the paper.

      Our analysis in Figure 2 considers two size classes: small colonies (l < 5) and large colonies (l ≥ 5). The equal-fragment model predicts that the fracture of a large colony gives rise to two daughter fragments with half the biovolume. For an average colony of l = 25 in diameter, this corresponds to two daughter fragments with a diameter of l = 18, which is still in the large colony class. Sequential fragmentation events would be required to transfer biomass to the small size range (l < 5). However, the nearly exponential behavior of the fragmentation frequency function (Eq. 5) implies that subsequent fragmentation events are greatly slowed down. Therefore, the equal-fragments model predicts that the biomass transfer from large to small colonies during the first five hours of the experiment is negligible. This is in sharp contrast with the erosion model, which transfers biomass to the small size class at every fragmentation event. The difference between the two fragmentation models is quantified in Figure 2D, with a negligible change in biomass size distribution for the equal-fragment model (horizontal dash-dotted line) and a strong increase of small colonies for the erosion model (curved dashed line). Hence, it is clear from Figure 2D that the erosion model outperforms the equal-fragment model by capturing the observed shift from large to small colonies. We have now described this more clearly in lines 231-233.
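
      The size-class argument can be illustrated with a short calculation. The sketch below assumes, for illustration only, that biovolume scales as the diameter raised to a power `dim`; the default `dim=3.0` (compact spheres) is an assumption made here, and the function names are hypothetical. The response quotes l = 18 for the daughter fragments, which would follow from a somewhat smaller exponent, so `dim` is left adjustable.

```python
def daughter_diameter(parent_diameter, n_fragments=2, dim=3.0):
    """Diameter of each equal fragment if biovolume scales as diameter**dim."""
    return parent_diameter * (1.0 / n_fragments) ** (1.0 / dim)


def splits_to_reach(parent_diameter, threshold, dim=3.0):
    """Count successive equal splits until the diameter drops below threshold.

    threshold must be positive, otherwise the loop would never end.
    """
    diameter, count = parent_diameter, 0
    while diameter >= threshold:
        diameter = daughter_diameter(diameter, dim=dim)
        count += 1
    return count


if __name__ == "__main__":
    first = daughter_diameter(25.0)
    print(f"first-generation daughter diameter: {first:.1f}")
    print("splits needed to fall below l = 5:", splits_to_reach(25.0, 5.0))
```

      Under the cubic assumption a single split leaves the daughters well inside the large class (l ≥ 5), and several successive splits are needed before any fragment enters the small class, which is the point made above.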

      Nevertheless, the performance of the idealized erosion model is limited at late stages (Fig. S9D). We agree with the reviewer that this limitation could potentially be overcome with the introduction of variance in cell adhesion among colonies (as we hypothesized in lines 140-142). However, this is not a trivial thing to do, as it would require additional free parameters and reduce the simplicity of the model. Therefore, we chose to restrain our model to the common assumptions of idealized fragmentation models widely used in literature (e.g. references 53-55).

      Reviewer #2 (Public review):

      Especially the introduction seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems that not shear but bacterial adhesion is the most important factor in determining colony size. The writing could have done more justice to the fact that the importance of adhesion had been described elsewhere. This being said, the same method can be used to investigate systems where shear forces are biologically more relevant.

      In this work we aimed to investigate the effects of shear forces over a wide range of values, extending beyond the regime of natural lakes into the strong mixing created by technological applications such as the bubble plumes that are applied in several lakes to suppress cyanobacterial blooms. The adhesion force between cells via, e.g., extracellular polysaccharides (EPS) plays an essential role by controlling the resistance to shear-driven erosion, which has been quantified in our model by the fitting parameters S<sub>i</sub> and q<sub>i</sub>.

      We agree with the reviewer that we have missed some literature on Microcystis colony formation via cell aggregation (i.e., cell adhesion), for which we apologize. In our new revision, we have now included several new references [30-34,36] and we now describe the findings of these earlier studies. Specifically, in the Introduction we now pay more attention to the role of cell adhesion by writing (lines 53-60):

      “In contrast, cell aggregation (sometimes also called cell adhesion) can promote a rapid increase in colony size beyond the limit set by division rates, and may explain sudden rises in colony size in late bloom periods [26, 30, 31]. Aggregation rates depend on the stickiness of the colonies, which in turn is controlled by the EPS composition, pH, and ionic composition of water [27–29]. In particular, divalent cations such as Ca2+ can bridge negatively charged functional groups in EPS and therefore increase stickiness [32–34]. It has been shown that high levels of Ca2+ enhance cell aggregation in Microcystis cultures [35]. Moreover, cell aggregation can provide a fast defense against grazing [36]. Fluid flow plays an important role in cell aggregation by regulating the collision frequency between cells or colonies [6]. In addition, fluid flow ….”

      Furthermore, in the Conclusions we added (lines 374-376):

      “A previous study on colony aggregation at high Ca2+ levels observed similar morphological differences in colony formation [35]. There, an initial fast cell aggregation produced a sparse colony structure, followed by a more compact structure of the colonies associated with cell division”

      Finally, we would like to clarify a difference in terminology between the reviewer’s comment and our work. The term cell adhesion is commonly used in microbiology to refer to adhesion of cells with a solid substrate. In our work, the adhesion mediated by EPS occurs between free-floating cells and colonies. To avoid any confusion, we chose to refer to this process as cell aggregation, in line with other literature on suspended particles.

      Reviewer #2 (Recommendations for the authors):

      The authors have expanded on the image analysis process but now report substantially different correction factors (λ2 =2.79 compared to 73.13 in the previous submission; λ3 =0.52 compared to 13.71 in the previous submission). Could the authors comment on how the analysis changed? These correction factors for N<5 appear particularly relevant for the aggregation experiments presented in Figure 3. For measurements involving only small colonies, as in Figure 3, are these correction factors still valid? In addition, does the timing of image acquisition, i.e. when the colonies are imaged, influence the correction factors applied in this study?

      The description of the calibration process was revised in our earlier revision of the manuscript to improve clarity and remove unclear definitions. In the first version, the input variable N<sub>p</sub>[i] in supplementary equation (S1) was defined as the number of features per frame. This variable is dependent on the frame dimension (2048x2048 px for large colonies, l>5, and 400x400 px for small colonies). We believe that a more suitable input is the concentration distribution, which is normalized by frame area, and therefore invariant to frame dimensions and less prone to misinterpretations. For this reason, we adjusted this definition of N<sub>p</sub>[i] in the revised version of the manuscript, so that it expresses the number of features per frame area (instead of per frame). These changes required the calibration constants, λ<sub>2</sub> and λ<sub>3</sub>, to be updated in the manuscript by a factor of (400 px/2048 px)<sup>2</sup>. This explains why these two calibration constants changed by a factor of 0.038. This rescaling of the input variable N<sub>p</sub>[i] and the calibration constants did not affect the final results of our calculations (Figures 2 and 3).
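
      The rescaling can be checked with a few lines of arithmetic. The sketch below uses only the frame sizes and calibration constants quoted in this exchange; the variable and function names are illustrative.

```python
SMALL_FRAME_PX = 400    # frame width used for small colonies
LARGE_FRAME_PX = 2048   # frame width used for large colonies

# Ratio of frame areas, i.e. the stated factor (400 px / 2048 px)^2.
AREA_SCALE = (SMALL_FRAME_PX / LARGE_FRAME_PX) ** 2

# Constants reported in the previous submission.
PREVIOUS = {"lambda_2": 73.13, "lambda_3": 13.71}


def rescale(constants, scale=AREA_SCALE):
    """Apply the frame-area scale factor to each calibration constant."""
    return {name: value * scale for name, value in constants.items()}


if __name__ == "__main__":
    print(f"scale factor: {AREA_SCALE:.3f}")
    for name, value in rescale(PREVIOUS).items():
        print(f"{name}: {PREVIOUS[name]} -> {value:.2f}")
```

      The rescaled values round to 2.79 and 0.52, matching the constants reported in the revised manuscript.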

      The authors use a moderate dissipation rate to stir the colonies, after which they allow them to sediment. How long were the particles allowed to sediment before measurements were taken? Intuitively, one might expect a greater number of colonies to be detected following sedimentation, yet the authors report only about one third of the colonies in the sedimented state. What accounts for this reduction? Furthermore, if higher shear rates are applied, do the results differ, for instance if particles are lifted further by the shear flow? Some more clarity would help other researchers to perform similar work.

      The sedimentation of particles following an initial stir was applied only for creating a reference size distribution, displayed in the supplementary Figures S8-C and D. As one intuitively would expect, a higher concentration of colonies was detected after sedimentation (Fig. S8-C and D) than during the shear flow (Fig. S8-A and B). During all other experiments in our work, the applied dissipation rate was sufficient to ensure a uniform distribution of colonies in suspension throughout the parameter range, as described in lines 461-473.

      In the caption of Figure S8 we have reported the number of colonies counted in small subsamples. These numbers are just small subsets of the total number of colonies contained in the entire volume of the cone-and-plate setup. A sub-sample with larger volume was measured during the shear flow in comparison to the sub-sample measured for the sedimented sample, leading to a larger number of counted colonies in panels A and B (N = 10776, combined) compared to panels C and D (N = 3066 and 1455, respectively).

      However, when normalized for the volume of the sub-samples, the calculated concentration of colonies is higher for panels C and D (as shown in the graphs). We understand that the earlier caption description of Figure S8 was misleading, for which we apologize. In the revised version, we have adjusted the caption to better describe the quantity:

      “Number of colonies counted during sampling …”

      Line 797 contains an unfinished edit ("Figure ADD") that should be corrected.

      The unfinished edit has been corrected in the newly revised manuscript. Thanks!

    2. Reviewer #2 (Public review):

      Summary:

      In this work, the authors investigate the role of fluid flow in shaping the colony size of a freshwater cyanobacterium Microcystis. To do so, they have created a novel assay by combining a rheometer with a bright field microscope. This allows them to exert precise shear forces on cyanobacterial cultures and field samples, and then quantify the effect of these shear forces on the colony size distribution. Shear force can affect the colony size in two ways: reducing size by fragmentation and increasing size by aggregation. They find limited aggregation at low shear rates, but high shear forces can create erosion-type fragmentation: colonies do not break in large pieces, but many small colonies are sheared off the large colonies. Overall, bacterial colonies from field samples seem to be more inert to shear than laboratory cultures, which the authors explain in terms of enhanced intercellular adhesion mediated by secreted polysaccharides.

      Strengths:

      -This study is timely, as cyanobacterial blooms are an increasing problem in freshwater lakes. They are expected to increase in frequency and severity because of rising temperatures, and it is worthwhile learning how these blooms are formed. More generally, how physical aspects such as flow and shear influence colony formation is often overlooked, at least in part because of experimental challenges. Therefore, the method developed by the authors is useful and innovative, and I expect applications beyond the presented system here.

      -A strong feature of this paper is the highly quantitative approach, combining theory with experiments, and the combination of laboratory experiments and field samples.

      Weaknesses:

      This study has no major weaknesses. Although the initial part of the introduction seems to imply that fluid flow is the predominant factor in shaping cyanobacterial colony (de)formation, the ensuing discussion is sufficiently nuanced for the reader to understand that the multicellular lifestyle of cyanobacterium Microcystis is shaped by multiple effects, that include bacterial behavior (e.g. which and how much EPS is produced), environmental variables that control cellular aggregation or adhesion and, indeed, fluid flow.

    3. eLife Assessment

      With the goal of investigating the assembly and fragmentation of cellular aggregates, this manuscript examines cyanobacterial aggregates in a laboratory setting. This quantitative investigation of the conditions and mechanisms behind aggregation is an important contribution as it yields a basic understanding of natural processes and offers potential strategies for control. The combination of computational and experimental investigations in this manuscript provides convincing support for the role of shear on aggregation and fragmentation.

    1. eLife Assessment

      There is a growing interest in understanding the individuality of animal behaviours. In this important article, the authors build and use an impressive array of high throughput phenotyping paradigms to examine the 'stability' (consistency) of behavioural characteristics in a range of contexts and over time. The results show that certain behaviours are individualistic and persist robustly across external stimuli while others are less robust to these changing parameters. The data supporting their findings is extensive and convincing. At the same time, the main analyses focus on a selected subset of the many behavioural metrics recorded, so a large fraction of the acquired data remains only lightly explored; by making these additional data available, the authors provide an invaluable resource for future work to apply alternative analytical frameworks and further mine this rich dataset.

    2. Joint Public Review:

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Comments from the editors on the latest version:

      In the latest communication, the authors were asked to (i) justify their selection of metrics (i.e. why these specific five behavioural metrics were chosen from the many recorded), (ii) discuss the variation in ICCs, and (iii) in light of this variation and the reliance on a few selected behavioural parameters, tone down the general claim so as not to overstate that individuality persists across all behaviours.

      We note that the justification for choosing the five metrics and the discussion of ICC variation are purely qualitative, and, despite the edits, the manuscript continues to frame individual behaviours as broadly stable.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We appreciate the authors' efforts in addressing the concerns raised, particularly including a variance partitioning approach to analyse their data. Detailed feedback on the revised manuscript is below, and we include a brief list of comments that we think the authors could address in the text:

      (1) Justify metric selection - Could you please include in the text an explanation for why only five behavioural metrics were highlighted out of the many you calculated?

      We have added explanations throughout the manuscript clarifying the rationale for selecting these behavioral parameters, including in lines 467ff. and 531ff. In short, the five highlighted metrics were chosen because they capture key aspects of the behavioral repertoire and, importantly, can be consistently measured across all experimental conditions. Other parameters were excluded as they were only applicable under specific contexts and thus not suitable for cross-condition comparisons.

      (2) Discuss ICC variation - We note that there is variation among the ICC scores for the different metrics you've studied. While this is expected, we ask that you acknowledge in the text that some traits show high repeatability and others low, and reflect this variation in the conclusions.

      We have added an additional paragraph in the Discussion (lines 743ff.) addressing the variation in ICC values among behavioral traits. This new section highlights that some metrics show high repeatability while others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions about individual behavioral stability across contexts.
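
      For readers unfamiliar with the statistic, the sketch below shows how one common variant, the one-way random-effects ICC (often written ICC(1,1)), can be computed from repeated measurements. The response does not state which ICC variant the authors used, so this is an illustration under that assumption; the function name and example values are hypothetical.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    scores: one list per individual, each holding the same number k of
    repeated measurements of a single behavioural metric.
    Undefined (division by zero) if every value in scores is identical.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 individuals with equal counts (>= 2) of measurements")

    grand_mean = mean(value for row in scores for value in row)
    row_means = [mean(row) for row in scores]

    # Mean squares from a one-way ANOVA with individuals as groups.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (value - m) ** 2 for row, m in zip(scores, row_means) for value in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical data: three individuals, two sessions each.
    repeatable = [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9]]
    inconsistent = [[1.0, 3.0], [2.9, 1.2], [2.0, 2.1]]
    print("repeatable metric:  ", round(icc_oneway(repeatable), 3))
    print("inconsistent metric:", round(icc_oneway(inconsistent), 3))
```

      Values near 1 indicate that differences between individuals dominate session-to-session variation, while values near or below 0 indicate little repeatability.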

      (3) Tone down general claims - Because of the above point, we recommend that you avoid overstating that individuality persists across all behaviours. Please clarify in the Abstract and main text that it applies to some traits more than others.

      We carefully reviewed the entire manuscript and revised the phrasing wherever necessary to avoid overgeneralization. Statements about individuality have been adjusted to clarify that consistent individuality can be measured more strongly in some behavioral traits than in others, both in the Abstract and throughout the main text.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings, and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
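
      The arithmetic behind these two examples can be checked with a minimal sketch. The function name is illustrative and not taken from the manuscript; only the two r values quoted above are used.

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination for a Pearson correlation r.

    Illustrative helper (not from the manuscript): squaring r gives the
    share of variance in one measurement that is linearly predictable
    from the other.
    """
    return r * r


# r values quoted in the review text above
examples = [
    ("% time walked, 23°C vs 32°C", 0.263),
    ("vector strength, Buridan vs Y-maze", -0.197),
]

for label, r in examples:
    r2 = variance_explained(r)
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.1%}")
```

      The sign of r drops out when squaring, so a weak negative correlation leaves as much variance unexplained as a weak positive one of the same magnitude.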

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
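
      One way to read the in-group rank suggestion is as a rank correlation: rank the flies within each context, then correlate the ranks (this is Spearman's rho). The sketch below is a plain-Python illustration under that assumption; the helper names and the toy values are hypothetical and do not come from the study.

```python
from statistics import mean


def ranks(values):
    """1-based ranks within one context; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = shared
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def rank_correlation(x, y):
    """Correlate within-context ranks, ignoring mean shifts and scale changes."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: same five flies, large group shift, order preserved
cool = [10, 20, 30, 40, 50]
warm = [55, 60, 70, 90, 200]
print(rank_correlation(cool, warm))
```

      Because only the order within each context enters the calculation, a large shift in the group mean or a change in spread between contexts does not by itself lower the coefficient.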

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify successful transfer of open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      Comments on revisions:

      While the incorporation of a hierarchical mixed model (HMM) appears to represent an improvement over their prior single-parameter correlation approach, it's not clear to me that this is a multivariate analysis. They write that "For each trait, we fitted a hierarchical linear mixed-effects model in Matlab (using the fitlme function) with environmental context as a fixed effect and fly identity (ID) as a random intercept... We computed the intraclass correlation coefficient (ICC) from each model as the between-fly variance divided by total variance. ICC, therefore, quantified repeatability across environmental contexts."

      Does this indicate that HMM was used in a univariate approach? Can an analysis of only five metrics of several dozen total metrics be characterized as 'holistic'?

      Within Figure 10a, some of the metrics show high ICC scores, but others do not. This suggests that the authors are overstating the overall persistence and/or consistency of behavioral individuality. It is clear from Figure S8 that a large number of metrics were calculated for each fly, but it remains unclear, at least to me, why the five metrics in Figure 10a are justified for selection. One is left wondering how rare or common is the 0.6 repeatability of % time walked among all the other behavioral metrics. It appears that a holistic analysis of this large data set remains impossible. 

      We thank the reviewer for the careful and thoughtful assessment of our work.

      We have added an additional paragraph in the Discussion (lines 743ff.) explicitly addressing the variation in ICC values among behavioral traits. This section emphasizes that while some metrics show high repeatability, others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions regarding individual behavioral stability across contexts.

      Regarding the reviewer’s concern about the analytical approach, we would like to clarify that the hierarchical linear mixed model (LMM) was applied in a univariate framework—each behavioral metric was analyzed separately to estimate its individual ICC value. This approach allows us to quantify repeatability for each trait across environmental contexts while accounting for individual identity as a random effect. Although this is not a multivariate model in the strict sense, it represents an improvement over the prior pairwise correlation approach because it explicitly partitions within- and between-individual variance.
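
      To make the variance ratio concrete, here is a minimal sketch of a per-trait ICC. It is an assumption-laden illustration, not the fitted Matlab model: it uses a two-way ANOVA estimator for a balanced design (every fly measured once in every context), with context treated as a fixed effect and fly as a random intercept. The function name and the toy numbers are hypothetical.

```python
def icc_consistency(data):
    """ICC for one trait: between-fly variance / (between-fly + residual variance).

    data[i][j] is the trait value of fly i in context j. Assumes a balanced
    table with no missing values; an ANOVA-style stand-in for a mixed model.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    fly_means = [sum(row) / k for row in data]
    ctx_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Between-fly sum of squares and residual after removing context means
    ss_fly = k * sum((m - grand) ** 2 for m in fly_means)
    ss_err = sum(
        (data[i][j] - fly_means[i] - ctx_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_fly = ss_fly / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    var_fly = max((ms_fly - ms_err) / k, 0.0)  # truncate negative estimates at 0
    total = var_fly + ms_err
    return var_fly / total if total > 0 else 0.0


# Hypothetical toy data: 3 flies x 2 contexts, pure context shift of +10
print(icc_consistency([[1, 11], [2, 12], [3, 13]]))
```

      Each trait gets its own table and its own ICC, which is why an analysis of this kind remains univariate however many traits are processed.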

      As for the selection of behavioral metrics, the five parameters highlighted (% time walked, walking speed, vector strength, angular velocity, and centrophobicity) were chosen because they represent key, biologically interpretable dimensions of locomotor and spatial behavior and, importantly, could be measured reliably across all tested conditions. Several other parameters that we routinely analyze (e.g., Linneweber et al., 2020) could not be calculated in all contexts—for instance, under darkness or when visual cues were absent—and therefore were excluded to maintain consistency across assays.

      We agree that a truly holistic multivariate comparison across all extracted parameters would be valuable; however, given the contextual limitations of some metrics, such an analysis was not feasible in the present framework. We have clarified these points in the revised manuscript to avoid potential misunderstandings.

      The authors write: "...fly individuality persists across different contexts, and individual differences shape behavior across variable environments, thereby making the underlying developmental and functional mechanisms amenable to genetic dissection." However, presumably the various behavioral features (and their variability) are governed by different brain regions, so some metrics (high ICC) would be amenable to the genetic dissection of individuality/variability, while others (low ICC) would not. It would be useful to know which are which, to define which behavioral domains express individuality, and could be targets for genetic analysis, and which do not. At the very least, the Abstract might like to acknowledge that inter-context consistency is not a major property of all or most behavioral metrics.

      We thank the reviewer for this helpful comment and agree that not all behavioral traits exhibit the same degree of inter-context consistency. We have clarified this point in the revised Abstract and ensured that it is also reflected in the main text. The Abstract now reads: 

      “We find that individuality is highly context-dependent, but even under the most extreme environmental alterations tested, consistency of behavioral individuality always persisted in at least one of the traits. Furthermore, our quantification reveals a hierarchical order of environmental features influencing individuality. We confirmed this hierarchy using a generalized linear model and a hierarchical linear mixed model. In summary, our work demonstrates that, similar to humans, fly individuality persists across different contexts (albeit worse than across time), and individual differences shape behavior across variable environments. The presence of consistency across situations in flies makes the underlying developmental and functional mechanisms amenable to genetic dissection.” 

      This revision clarifies that individuality is not uniformly expressed across all behavioral metrics, but rather in a subset of traits with higher repeatability, which are the most promising targets for future genetic analyses.

      I hold that inter-trial repeatability should rightly be called "stability" while inter-context repeatability should be called "consistency". In the current manuscript, "consistency" is used throughout the manuscript, except for the new edits, which use "stability". If the authors are going to use both terms, it would be preferable if they could explain precisely how they define and use these terms.

      We thank the reviewer for drawing attention to this inconsistency in terminology. We apologize for the oversight and have corrected it throughout the manuscript to ensure uniform usage.

      Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anticonservative and, additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for e.g. the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not change, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!
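
      For readers unfamiliar with what controlling for multiple testing involves when many pairwise correlations are tested, a minimal sketch of the Holm-Bonferroni step-down adjustment follows. This is one common choice, picked here for illustration; nothing in the review or the manuscript specifies which correction should be used, and the p-values are made up.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order.

    Illustrative helper: the smallest p-value is multiplied by m, the next
    by m - 1, and so on; adjusted values are capped at 1 and forced to be
    non-decreasing along the sorted order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values from three pairwise correlations (day 1-2, 1-3, 2-3)
print(holm_adjust([0.01, 0.04, 0.03]))
```

      A single repeatability estimate per trait, as proposed above, sidesteps much of this problem because it replaces several pairwise tests with one.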

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      I am delighted to see the authors have included hierarchical models in their analysis. I really think this strengthens the paper and their conclusions while simultaneously making it more accessible to folks that typically use these types of methods to investigate these patterns of individual behavior. It's also cool, and completely jives with my own experience measuring individual behavior in that the activity metrics show the highest repeatability compared to the more flexible behaviors (such as "exploration"). I think it's quite striking and interesting to see such moderate repeatability estimates in these behaviors across what could be very different environmental scenarios. I think this is a very strong and meaty paper with a lot of information to digest, producing, however, a very elegant and convincing take-home message: individuals are unique in their behavior even across very different environments.

      We sincerely thank the reviewer for the positive and encouraging feedback, as well as for their valuable input throughout the review process. We are very pleased that the inclusion of hierarchical models and the resulting interpretations resonated with the reviewer’s own experience and perspective.

    1. eLife Assessment

      This valuable study advances our understanding of best practices for analyzing population-level data using advanced functional alignment methods. It provides convincing evidence that demographic-specific functional templates improve functional neuroimaging studies that use hyperalignment. This study will be of interest to cognitive neuroscientists, neuroimaging methodologists, and computational researchers with an interest in the human brain.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a compelling case for the necessity of age-specific templates in functional hyperalignment. Given that the brain undergoes substantial developmental, structural, and functional changes across the lifespan, a 'one-size-fits-all' canonical template is often insufficient. This study effectively demonstrates that incorporating age-congruent features significantly enhances the performance and sensitivity of hyperalignment models. By validating these findings across two independent datasets (Cam-CAN and DLBS), the paper provides robust evidence that accounting for age-related functional organization is a critical prerequisite for accurate functional alignment in lifespan research.

      Strengths:

      (1) The authors used three metrics to evaluate performance. Across all metrics, they found that age-congruent templates outperformed age-incongruent templates, suggesting that age-specific templates can improve alignment.

      (2) These findings highlight the superiority of age-congruent templates for hyperalignment. This work underscores the importance of age-matching in cross-subject functional mapping and represents a vital step forward for the methodology.

      Weaknesses:

      (1) Participant Demographics and Group Separation:

      The study defines the 'older' cohort as 65-90 years and the 'younger' cohort as 18-45 years. While this 20-year gap (ages 46-64) effectively maximizes the contrast between groups, the results in Figure 4a suggest that the predicted individualized connectomes follow a continuous distribution. Given this continuity, could the authors provide the average median trends for Figures 2a and 2b to illustrate how the model behaves across the missing age range?

      (2) Request for Implementation:

      I have been unable to locate the source code associated with this publication. Could the authors please provide a link to the repository or clarify if the implementation is available for reproduction?

      (3) Analysis of Prediction Performance and Distribution:

      While Figures 3b and 5b clearly demonstrate that the congruent template improves correlation, Figure 4a shows a distinct shift in the scatter distribution. Could the authors provide a detailed explanation of the prediction performance metrics used? Specifically, I would like to understand how the underlying method accounts for the distribution differences observed when applying the congruent template.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Zhang and colleagues examine the role of participant selection in creating and using functional templates to improve analyses using hyperalignment. Hyperalignment aligns participants' functional MRI data to a shared functional template, analogous to the anatomical templates used to bring anatomical MRI data into a shared space (e.g., MNI152). The question of appropriate template creation is especially pressing for population-level analyses, where a large number of demographic groups (e.g., different age ranges, clinical statuses) may be included in the same analysis. These different demographic groups may have differences in their functional organization that complicate the creation of a single study-specific functional template.

      To provide an initial investigation of the potential effect of demographic-specific templates, the authors use the publicly available Cam-CAN dataset, which contains participants from 18 to 87 years of age. They define a young adult (< 45 years of age) and an older adult group (> 65 years of age) from this dataset with approximately the same number of participants. They investigate whether "age-congruent" templates (i.e. defined in the same age group they are used) improve three analyses where hyperalignment has been previously shown to boost performance: inter-subject correlation, predicting individual connectomes, and predicting individual functional responses. Using the Cam-CAN-derived older adult template, they then replicate the ISC analyses using the publicly available Dallas Lifespan Brain Study (DLBS).

      Overall, the presented results are highly suggestive that age-congruent templates consistently improve performance, though the absolute effects are small.

      Strengths:

      The use of a separate validation sample, reusing the same template calculated with Cam-CAN, highlights the potential of developing independent templates for individual demographic groups and then distributing these for wider use, analogous to the MNI templates that are widely used throughout the field of neuroimaging. This suggests that the potential impact of this framework is significant.

      Weaknesses:

      While the authors appropriately highlight the potential applications of this result (e.g., to different clinical statuses), it is not apparent how to appropriately extend this methodology to many common experimental paradigms. For example, in case-control studies (where researchers are interested in comparing clinical and non-clinical participants) the use of two different functional templates may complicate rather than ease analyses. Providing this as a potential limitation of the current template construction method, or providing recommendations to researchers interested in comparing across groups, would help to increase the impact of this work.

    1. eLife Assessment

      This important study demonstrates that Mycobacterium tuberculosis suppresses protective Th17/IL-17 responses in C57BL/6 mice via a Tbet-dependent mechanism involving the virulence factors ESX-1 and PDIM, as mutants lacking these factors induce significantly higher IL-17-producing CD4 T cells and IL-17A in the lungs compared to wild-type bacteria. The experiments are rigorous and well-designed, combining host knockouts and bacterial mutants to yield solid evidence pointing to cross-regulation between Th1 and Th17 pathways, including reduced IL-23 in draining lymph node dendritic cells. However, some of the data on IFN-γ effects or lymph node-specific mechanisms are incomplete and require deeper mechanistic insight, such as direct T cell transcription factor analysis in lymph nodes and broader host validation, to strengthen the work. Overall, the findings provide insight into how bacterial virulence factors limit Th17 induction, thereby promoting persistence, and will interest immunologists and TB researchers focused on host-pathogen balance and vaccine strategies.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript examines the factors that restrict the induction of IL-17-producing T cells during Mycobacterium tuberculosis (Mtb) infection. The authors show that neither the infectious route nor the duration of infection is responsible. But they do show that mice that lack the Th1-defining transcription factor Tbet have increased numbers of IL-17-producing T cells, a finding consistent with prior reports in the field of immunology. They also show that two highly attenuated Mtb mutants in ESX-1 and PDIM, two well-known Mtb virulence factors, do induce IL-17-producing T cells. In contrast, Mtb mutants in mmpl4 are also similarly attenuated, but do not induce IL-17-producing T cells, suggesting that this property is not simply a result of attenuation but due to specific properties of ESX-1 and PDIM-deficient mutants.

      Strengths:

      (1) It is interesting that mice infected with ESX-1 and PDIM mutants have increased induction of Th17 cells.

      (2) The data are solid and convincing throughout.

      Weaknesses:

      There are two main criticisms:

      (1) It is not clear how much the factors uncovered here are true beyond B6 mice. B6 mice, compared to humans, are known to be very Th1-skewed, and Tbet is a strong inhibitor of Th17-specific T cells. Many people make IL-17-producing T cells in response to Mtb infection.

      (2) Very few novel insights are mechanistically revealed about how Th17 induction is restricted by Mtb. Tbet induction is known to restrict Th17 development, and this is a T-cell intrinsic mechanism. In contrast, the IL-23 association revealed seems to be extrinsic to T cells and to act on T cells. How, if at all, are these factors related to each other in restricting Th17 induction? Also, the conclusion that it is not a result of attenuation is not completely convincing.

      Other points:

      (1) The authors show that mice infected with a deficiency in ESX-1 have more IL-17-producing CD4 T cells in response to stimulation with an ESAT-6 peptide pool (Figure 3B). Because ESAT-6 is encoded by ESX-1, why do mice infected with this Mtb mutant have any ESAT-6-specific T cells? Is it an incomplete knockdown?

      (2) The manuscript states, "Under the conditions where Th17s are highly induced, mice infected with either ΔESX-1 or PDIM lacking Mtb, the Il17a-/- mice had ~3-5 fold higher CFU than WT mice (Figures 3F-G). These results indicate that the induction of Th17s is not dependent on the attenuation of Mtb in general, but instead Mtb utilizes ESX-1 and PDIM to suppress the induction of a Th17 response that enhances protection against Mtb infection." I don't think the last sentence is necessarily true. I can imagine a scenario in which the induction of the Th17s is, in fact, due to the attenuation, and the Th17 induction still contributes to protection.

      (3) ESX-1, PDIM, and mmpl4 mutants all have similarly reduced CFUs in the lung, but what about the LN? The bacterial burden in the LN may be more important for regulating T-bet, IL-23, and Th17 differentiation, since the LN is where T cell priming occurs, than the CFU in the lung. Perhaps ESX-1 and PDIM mutants have reduced CFU in the LN, but mmpl4 does not. This difference in LN burdens may be the primary driver of Th17 priming, as high avidity interactions are thought to be an important driver of T-bet induction.

      (4) Do LN cDC1s express high levels of IL-12 p35 in mice infected with the mmpl4 mutant? Likewise, do LN cDC2s express low levels of IL-12 p19 (akin to those infected with WT Mtb)? If these observations for ESX-1 and PDIM mutants are mechanistically linked to the increased numbers of Th17 cells, then you would expect mice infected with mmpl4 mutants to be more like those infected with WT Mtb than those infected with ESX-1 and PDIM mutants.

      (5) ESX-1 and PDIM are very different virulence factors: a protein secretory pathway and a cell wall lipid, respectively. Mechanistically, how would mutants in these pathways give very similar outcomes regarding Th17 cells unless it was simply as an aspect of their attenuation? Perhaps mmpl4 mutants simply differ in some aspects of their attenuation, such as bacterial burdens in LNs, or their interaction with cDCs?

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors tackle an important question of why IL-17 production and TH17 responses are lower than expected during Mtb infection. The authors identify an axis of cross-regulation between TH1 and TH17 cells and provide data to support roles for Mtb virulence factors ESX1 and PDIM in promoting TH1 responses and/or suppressing TH17 responses.

      Strengths:

      The strengths include the significance of the work, the combination of host and Mtb genetic models to dissect the mechanistic basis for regulation of IL-17 production from T cells during infection, and the rigor of the experiments. There are a number of exciting findings from the work, including the cross-talk between T cell responses and the impact of ESX1 and PDIM on these responses.

      Weaknesses:

      The following conclusions and interpretations should be revisited, rephrased, and re-evaluated:

      (1) The manuscript neglects to analyze T cell responses in the dLN, which is the critical site where these responses are initiated (only DC cytokine production is measured in the dLN). The differences in the lungs could reflect trafficking of T cells to the lungs, local lung T cell responses, or durability of the T cell responses in the lungs. The authors state in the last results section that "These results indicate that the ESX-1 and PDIM virulence factors impact naïve T cell differentiation at the draining mediastinal lymph node..." but T cell responses are never measured in the dLN.

      (2) Figure 2: The authors state that "Importantly, IFN-γ deficient mice did not exhibit elevated levels of IL-17A producing CD4 T cells demonstrating that IFN-γ production is not the mechanism by which Th1 T cells limit a Th17 response during Mtb infection", but the difference is statistically significant and even more obvious in Panel B. In fact, if the Panel D y-axis were on a log scale, the Ifng-/- would likely look more like Tbet-/- than WT. Based on these data, it seems that IFNg is having an effect and should not be completely discounted. Does the deletion of Ifng affect the number of Tbet+ T cells?

      In addition, the deletion of Tbet results in an increased number of IFNg+IL-17+ double positive T cells (Figure 2B), in addition to a sizable IFNg single positive T cell population maintained in the Tbet-/- mice (10x the negative control of Ifng-/-). Is this why Tbet deletion is not as severe as Ifng deletion, because T cells are still making IFNg?

      Along these lines, the statement in the text that, "Tbet-/-Il17a-/- mice completely lacked both IFN-γ producing...." T cells is not supported by the data in Figure 2C. Tbet-/-Il17a-/- mice look to have more gamma-producing T cells than Tbet-/- mice (which is already 10x the negative control of Ifng-/- in panel 2B if one includes the gamma single positive and IFNg/IL-17 double positive).

      (3) In the Results sections describing Figures 3, 4, and 5, the authors equate IL-17 production by T cells with TH17 responses and IFNg expression with TH1, but Tbet and RORgt expression in the T cells should be measured to make conclusions about TH1 and TH17. Or the authors can rephrase their findings to specifically state the observations as IFNg or IL-17 expressing CD4+ T cells.

      (4) Conceptually, do the authors think that ESX1/PDIM promotes TH1 responses and this blocks TH17 or are ESX1/PDIM blocking TH17 responses directly, allowing for increased TH1 responses? It would be helpful to clarify the model in this regard, describe how the data supports one model or the other, and then make sure the language is consistent throughout. Can these effects on T cell responses be tested and recapitulated in vitro using infected APC and T cell co-cultures?

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Zilinskas et al seeks to understand the mechanisms underlying the ability of Mtb to suppress Th17 differentiation. As Th17 responses are needed for protective immunity against TB, this is an important topic of investigation. They use Mtb mutants that lack eccC1 (from the ESX-1 locus) and fadD28 (encoding PDIM) and implicate a Tbet-dependent pathway by which Mtb modulates Th17 differentiation. The mechanism by which ESX-1/PDIM function to impact Th17 differentiation is, however, unclear, which limits the novelty of the results.

      Strengths:

      Understanding how Mtb limits Th17 differentiation has implications for vaccine development. Comparative study of KO mice and Mtb mutants is a strength.

      Weaknesses:

      (1) The authors should acknowledge and reference key findings from the literature that have identified suppression of Th17 differentiation as an Mtb virulence mechanism, e.g., the role of the Hip1 protease and CD40 signaling (Madan-Lala JI 2014, Sia Plos Path 2017, Enriquez iScience 2022) and Khader JI 2005, showing the requirement of IL-23 for Th17 responses in vivo in a TB mouse model.

      (2) Addressing several questions related to the Tbet KO mouse experiments would strengthen the study. Do the Tbet KO mice have elevated IL-4/5/13 (which has been previously reported in non-TB studies) in addition to IL-17? The lack of Th17 cells in the IFNg KO compared to the Tbet KO may be due to a difference in timing, since only 3-week data are shown; earlier and later time points would provide better interpretation. The authors do not present any data on neutrophil infiltration in WT vs Tbet KO vs IFNg KO mice. Since IL-17 is known to be important for recruiting neutrophils to the lung, data on neutrophils are important for clarifying the mechanism for the CFU outcomes.

      (3) While IL-23 is important for sustaining IL-17 production, IL-6, TGF-b and/or IL-1β are necessary for Th17 polarization. What were the levels of these cytokines in DCs in the lung? (Figure 5). Additionally, Tbet-deficient DCs exhibit impaired activation of antigen-specific Th1 cells and have reduced IL-12 production. Given the data showing higher IL-17 levels in Tbet KO mice, the authors should provide information on the DC phenotype (IL-23, IL-6, etc.) in the Tbet KO experiments.

      (4) The mechanism by which ESX-1/PDIM function to impact Th17 differentiation is not clear. While data showing a role for ESX-1 and PDIMs in inhibiting Th17 responses is interesting, there is no insight into the potential mechanism of action. Figure 3 showing reduction in IFNg+ CD4 T cells after infection with eccC1 and fadD28 mutants suggests that this outcome is due to a lower bacterial load relative to WT Mtb at the 3-week time point. Since IFNg is known to suppress IL-17, the higher levels of Th17 cells could be due to the reduction in IFNg due to the attenuated growth of the mutants. Additionally, what was the level of Type I IFNs elicited by these mutants?

      (5) Since macrophages have been implicated in the reduced cytokines seen in the ESX-1 mutant, IL-23 and other cytokine data on lung macrophages would complement the DC data.

      (6) Figure 5. There are many fewer DCs overall in the eccC1 and fadD28 mutant groups, which could account for the increased % IL-23p19 in DCs (5D). What were the levels of IL-23 in DC1s?

    1. eLife Assessment

      This valuable manuscript demonstrates that embryonic exposure to the pesticide chlorpyrifos (CPF) impairs juvenile zebrafish social behavior and sets out to define the underlying mechanism. The authors provide solid evidence that butyrate and class I histone deacetylases are involved, as their modulation rescues the phenotype. However, claims that CPF acts through the microbiome and nitric oxide signaling remain correlative and incomplete. Additional validation would strengthen the intriguing hypotheses raised by this work.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors examine the effect of Chlorpyrifos (CPF) exposure on zebrafish social development. They expose larval zebrafish to CPF (0 - 3 dpf), and report social deficits at juvenile stages. They show that the gut microbial metabolite butyrate can rescue these social deficits, proposing that butyrate acts as a histone deacetylase (HDAC) inhibitor, given that inhibition of some HDACs can also rescue social deficits. They also show that CPF changes neuronal gene expression, and butyrate partially rescues these changes. Finally, they demonstrate changes in gut microbiome and metabolome composition, pointing to potential modulation of nitrogen metabolism pathways. They then hypothesise that NO can modulate HDAC activity and attempt to link the NO pathway to social behavior.

      Strengths:

      The authors demonstrate an interesting link between early Chlorpyrifos (CPF) exposure and later-life social deficits, such as changes in neuronal gene expression, including some autism-related genes, and provide solid evidence that butyrate and epigenetic modulation (histone deacetylase inhibition) may be involved.

      They also comprehensively characterise the microbiome and metabolome of CPF-exposed zebrafish, providing a useful resource for further investigation into its gut-brain mechanisms.

      They are cautious in framing some of their conclusions as a hypothesis and provide some suggestions for future analyses.

      Weaknesses:

      The claim that butyrate's effects on CPF-induced social deficits and neuron activity changes are mediated by histone deacetylase inhibition is lacking some additional controls and, hence, is not completely supported.

      Details on the social behavior assay performed and other potential morphological or behavioral changes were not provided.

      Claims on the mechanism of action of CPF are inconclusive. The causal role of the gut microbiome is not established, especially since gut microbial dysbiosis may also be a downstream consequence of direct effects of CPF on the host, such as changes in host gut gene expression. Evidence for the role of nitrogen metabolism is also incomplete, and the authors have not discussed or ruled out the potential alternative mechanism of reduced butyrate production due to gut microbiome changes.

    3. Reviewer #2 (Public review):

      Summary:

      This paper by Diaz et al. uses the zebrafish model to examine how early embryonic exposure to Chlorpyrifos (CPF), a widely used organophosphate pesticide, induces social behavior deficits later in life. This paper combined behavioral testing, pharmaceutical treatment, genetic manipulation, and multi-omics to test the hypothesis that early CPF increases the abundance of denitrifying bacteria, Pseudomonas, which, in turn, enhances nitric oxide production and induces selective inhibition of HDAC8 and abnormal gene expression in the brain.

      Strengths:

      (1) The observation that early embryonic CPF exposure causes behavior deficits in juvenile zebrafish is very intriguing. It is especially exciting to see that CPF-induced behavior deficits can be reversed by overnight treatment with butyrate or HDAC1 inhibitors in juvenile zebrafish. In humans, CPF exposure during pregnancy causes brain abnormalities and neurological disorders such as autism. Although it is a long way from an experimental zebrafish study to human application, the experimental effects reported in the paper are still quite thought-provoking.

      (2) The authors performed RNA sequencing experiments on control zebrafish, CPF-exposed zebrafish, and CPF-exposed zebrafish that were treated with Butyrate. The data not only showed large-scale transcriptomic changes in the juvenile zebrafish brain in response to embryonic CPF exposure but also showed that many CPF-induced genetic alterations can be alleviated by butyrate exposure later in life.

      (3) The authors also performed untargeted metabolomics on zebrafish gut and metagenomic analysis on zebrafish feces samples. The results are interesting and support the conclusion that increased intestinal nitric oxide metabolism and the abundance of denitrifying bacteria, such as Pseudomonas, are associated with CPF exposure.

      (4) The large datasets presented in the paper will be useful to other researchers interested in understanding how CPF or butyrate alters brain and gut function. It might be useful to generate new hypotheses to power other research lines.

      (5) The social preference behavior testing and experimental paradigm used in the paper may also be used by other researchers to investigate the interactions among genes, environmental factors, and brain function.

      Weaknesses:

      (1) The presented link between gut microbiome and CPF-induced behavior and genetic alteration is an association, but not causation. Although the research data align with the hypothesis, the hypothesis is not fully supported or tested by the data presented in the paper in the current state.

      (2) The authors performed several large omic studies. However, some of the presented analyses are relatively simple and incomplete. For example, the authors performed shotgun metagenomic analysis on zebrafish feces, but the paper only displays the differences in bacterial taxa. Are there any differences in bacterial genetic pathways, especially the pathways associated with microbial nitrogen metabolism? What do alpha and beta diversity look like when comparing the different experimental groups?

    1. eLife Assessment

      This work provides an important modeling-based framework for understanding the processes of temporal integration in the claustrum. These mechanisms could support a broader range of integrative brain function. However, at present, the evidence remains at least in part incomplete, primarily because of over-interpretation of the results and their connection to neurophysiology.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate how the anterior claustrum may integrate temporally separated task-relevant signals to guide behavior in a delayed escape paradigm. Because in vivo neural recordings from claustrum during this task are extremely limited - comprising single-trial data with small neuronal samples - the authors adopt a modeling-driven approach. They train recurrent neural networks (RNNs) using only behavioral data (escape latency) to reproduce task performance and then analyze the internal dynamics of the trained networks. Within these networks, they identify a subset of units whose activity exhibits persistent responses and strong correlations with behavior, which the authors label as "claustrum-like." Using dimensionality reduction, decoding, and information-theoretic analyses, they argue that these units dynamically integrate conditioned stimulus (CS) and door-opening signals via nonlinear, trajectory-based population dynamics rather than fixed-point attractor states.

      To bridge model predictions and biology, the authors complement the modeling with in vitro slice experiments demonstrating recurrent excitatory connectivity and prolonged activity in the anterior claustrum that depends on glutamatergic transmission. They further compare latent neural trajectories derived from previously published in vivo claustrum recordings to those observed in the RNN, reporting qualitative similarities. Based on these results, the authors propose that the claustrum implements temporal signal integration through recurrent excitatory circuitry and dynamic population trajectories, potentially supporting broader theories of integrative brain function.

      Strengths:

      This study addresses an important and challenging problem: how to infer population-level computation in a brain structure for which in vivo data are sparse and experimentally constrained. The authors are commendably transparent about these limitations and seek to overcome them through a principled modeling framework. The integration of behavioral modeling, RNN analysis, and slice electrophysiology is ambitious and technically sophisticated.

      Several aspects stand out as strengths. First, the behavioral RNN is carefully trained and interrogated using a rich set of modern analytical tools, including cross-temporal decoding, trajectory analysis, and partial information decomposition, providing multiple complementary views of network dynamics. Second, the slice experiments convincingly demonstrate recurrent excitatory connectivity in the anterior claustrum, lending biological plausibility to the model's reliance on recurrent dynamics. Third, the manuscript is clearly written, logically organized, and conceptually engaging, and it offers a coherent mechanistic hypothesis that could guide future large-scale recording experiments.

      Importantly, the work has significant heuristic value: rather than merely fitting data, it attempts to generate testable computational ideas about claustral function in a regime where direct empirical access is currently limited.

      Weaknesses:

      Despite these strengths, the manuscript suffers from a recurring and substantial conceptual issue: systematic over-interpretation of model-data correspondence. While the modeling results are potentially insightful, the extent to which they are presented as recapitulating real claustral neural mechanisms goes beyond what the available data can support.

      A fundamental limitation is that the RNN is trained solely on behavioral output, without being constrained by neural data at either single-unit or population levels. As a result, the internal network dynamics are underdetermined and non-unique. Many distinct internal solutions could plausibly generate identical behavior. However, the manuscript frequently treats the specific internal solution discovered in the RNN as if it were a close approximation of the actual claustrum circuit.

      This issue is compounded by the sparse nature of the in vivo data used for comparison. The GPFA-based trajectory analyses rely on pseudo-populations and single-trial recordings, yet are interpreted as evidence for robust population-level dynamics. Because neurons were not recorded simultaneously, the inferred trajectories necessarily lack true population covariance and shared trial-to-trial variability, limiting their interpretability as genuine population dynamics. Similarly, conclusions about trajectory-based versus attractor-based computation are drawn almost exclusively from model analyses and then generalized to the biological system.

      Overall, while the modeling framework is appropriate as a hypothesis-generating tool, the manuscript repeatedly crosses the line from proposing plausible mechanisms to asserting explanatory or even causal equivalence between the model and the brain. This undermines the otherwise strong contributions of the work.

      Below are several specific points that warrant further clarification or revision:

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      First, it is not clear that this task is widely recognized in the literature as a canonical inference task, in the sense of, for example, sensory preconditioning, transitive inference, or model-based inference paradigms. Rather, the observed effect (that CS animals escape faster to a neutral compartment than neutral-CS controls) can be parsimoniously interpreted in terms of generalized threat value, heightened fear/anxiety, or a bias toward avoidance/escape under elevated threat, without requiring an explicit inferential step about the specific safety of the alternative compartment. The fact that no prior training is needed is compatible with flexible generalization, but does not by itself demonstrate inference in a more formal computational sense.

      Second, the inference claim becomes central to the manuscript's conceptual framing (e.g., the idea that rsCla supports "inference-based escape"), yet the behavioral analyses presented here and in the cited prior work do not clearly rule out simpler accounts. Clarifying this distinction would help avoid overstating both the inferential nature of the behavior and the specific role of rsCla and the RNN's "claustrum-like" cluster in supporting inference per se, as opposed to more general integration of threat-related signals with an opportunity for escape.

      Overall Assessment:

      This manuscript presents an interesting and potentially valuable modeling-based framework for thinking about temporal integration in the claustrum, supported by solid slice physiology. However, in its current form, it overstates the degree to which the proposed RNN dynamics reflect actual claustral neural mechanisms. With substantial revision - especially a more cautious interpretation of model-data similarity and a clearer articulation of modeling limitations - the study could make a meaningful contribution as a hypothesis-generating work rather than a definitive mechanistic account.

    3. Reviewer #2 (Public review):

      This manuscript reports the behavior of a computational model of rat claustral neurons during the performance of a behavioral task known as the delayed escape task (in this reviewer's understanding, this behavioral task was created and implemented by this group only). These authors have argued in a prior manuscript (Han et al.) that a group of neurons located "rostral to striatum" is part of the claustrum. The group names the region the "rostral to striatum claustrum." Additionally, in the Han et al. paper, the authors argue that these cells are responsible for maintaining a signal that lasts through the delay period.

      The main findings of the current paper are:

      (1) The authors have built a model network that was trained to show firing similar to what was reported for rats in their prior paper.

      (2) The authors' analysis of model behavior is used to suggest that the model network recapitulates biological activity, including the existence of a cluster of cells mainly responsible for the delay period firing.

      (3) The authors offer evidence from patch clamp recordings for excitatory interconnections among claustral neurons that are an essential feature of the model network.

      A major value of the computational network is that "trials" of the network can be performed. In experiments on animals, only single trials can be used.

      Concerns:

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      All relevant work must be appropriately cited throughout the manuscript.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Our goal was to propose a possible computational mechanism underlying information integration in the claustrum, not to claim structural or causal equivalence between the model and the biological circuit. We acknowledge that some expressions in the original manuscript may have been interpreted as exceeding this intention, and we will revise the text to explicitly soften such statements.

      It is well established that behavior-trained RNNs can admit multiple internal solutions capable of producing the same behavioral output, and we fully agree with this point. Among the many possible solutions, we focused on networks that exhibited dynamical properties consistent with independently obtained behavioral and physiological findings. Thus, in our view, biological plausibility in this study is not grounded in structural isomorphism, but rather in whether the core population-level dynamical properties observed in the model are reproducible in actual claustral population activity.

      We also agree with the reviewer that our original qualitative comparison of GPFA-based latent trajectories did not provide sufficient quantitative support. In the revised manuscript, we have therefore added an eigenvalue-based quantitative analysis of the dimensional structure of population trajectories. This analysis does not depend on the identity of the dimensionality-reduction method itself, but instead focuses on quantifying the geometric structure of population-state trajectories as they evolve over time. Applying the same metric to both the RNN and biological claustrum data revealed consistent condition-specific differences in population dynamics.

      This quantitative addition strengthens the previous qualitative trajectory comparison and clarifies that the model implements a specific computational dynamical regime that directionally corresponds to claustral population activity. While this does not imply uniqueness of the model, we believe it suggests that the proposed computational principle represents a biologically realizable candidate mechanism.

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      We agree with the reviewer’s concern. Expressions such as “closely mimicked,” “nearly identical,” and “recapitulate” will be replaced with more moderate language.

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      As the reviewer noted, behavior-trained RNNs can yield multiple internal solutions that generate the same behavioral output, and we acknowledge this non-uniqueness. However, we do not interpret the relatively low success rate (5/100 networks) as evidence of fragility. Rather, we interpret it as suggesting that the emergence of this particular dynamical regime requires stringent structural constraints.

      The computational demands of the task—specifically, the integration of temporally separated signals—drive convergence toward networks capable of sustaining persistent activity through recurrent excitatory connectivity. Indeed, all networks exhibiting a claustrum-like cluster shared a strong recurrent excitatory structure within Cluster 1, a structural feature consistent with our slice electrophysiology findings.

      Our criterion for selecting RNNs was their ability to reproduce behavioral and physiological observations from the delayed escape experiment. Excluded RNNs may reflect alternative information-processing strategies characteristic of other brain regions or artificial logical solutions. Importantly, claustrum-like dynamics were not explicitly enforced during training; they emerged spontaneously under behavioral constraints, suggesting that this solution is not arbitrary.

      Furthermore, the computational principles derived from the RNN were quantitatively consistent with in vivo single-neuron activity. Using an eigenvalue-based metric (λ₃/Σλ), both the RNN and biological claustrum data showed effects in the same direction. Leave-one-neuron-out analyses further demonstrated that this pattern was broadly distributed across neurons in the claustrum. These convergent results suggest that the identified network captures a computational regime that is consistent with claustral population dynamics, rather than representing an arbitrary solution unrelated to the biological observations.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      We agree that the original GPFA trajectory comparison in the biological claustrum data remained qualitative and did not sufficiently establish robustness or population-level structure. We have therefore added quantitative analyses in the revised manuscript.

      Before presenting these analyses, we clarify the methodological limitations inherent in pseudopopulation and single-trial data. GPFA estimates latent trajectories from the covariance structure of the data under temporal smoothness assumptions. In a pseudopopulation, the covariance that truly simultaneous recordings would provide cannot be fully reconstructed. Although our dataset is based on single trials rather than trial-to-trial variability, we acknowledge that latent-space estimation still depends on this covariance structure.

      Therefore, the additional quantitative metric is not independent of the GPFA estimation stage; rather, it evaluates the geometric structure of single-trial latent trajectories estimated by GPFA.

      Specifically, for biological data, we reanalyzed GPFA-estimated latent trajectories in PCA space and computed an eigenvalue-based metric (λ<sub>3</sub>/Σλ). Across 20 time bins, a sliding window of 10 bins was applied. For each window, we computed the covariance matrix and extracted eigenvalues for PC1, PC2, and PC3. The third eigenvalue (λ<sub>3</sub>) was normalized by total variance (Σλ = λ<sub>1</sub> + λ<sub>2</sub> + λ<sub>3</sub>). This metric quantifies the extent to which trajectories deviate from a planar (two-dimensional) structure into a third dimension. An increase in λ<sub>3</sub>/Σλ indicates the formation of a higher-dimensional geometric structure.
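
As an illustration of the metric described above, the following is a minimal sketch rather than the authors' implementation. It assumes the trajectory is already expressed in three PCA coordinates (one row per time bin) and that the 10-bin window advances one bin at a time; the stride, the function name `third_eig_fraction`, and the handling of degenerate windows are assumptions not stated in the response.

```python
import numpy as np

def third_eig_fraction(traj, window=10):
    """Sliding-window lambda_3 / (lambda_1 + lambda_2 + lambda_3).

    traj   : array of shape (n_bins, 3), trajectory in PC1..PC3 coordinates.
    window : number of consecutive bins per window (10 in the response).
    Returns one value per window; a stride of one bin is assumed.
    """
    traj = np.asarray(traj, dtype=float)
    n_bins = traj.shape[0]
    values = []
    for start in range(n_bins - window + 1):
        segment = traj[start:start + window]
        cov = np.cov(segment, rowvar=False)           # 3 x 3 covariance
        eig = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order
        eig = np.clip(eig, 0.0, None)                 # remove tiny negatives
        total = eig.sum()
        values.append(eig[2] / total if total > 0 else np.nan)
    return np.array(values)
```

Under these assumptions, 20 bins yield 11 windows. A trajectory confined to a plane gives values near 0, and the value can never exceed 1/3, which is reached only when variance is spread equally across the three axes.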

      For RNN data, since all unit activities were simultaneously observed and sufficient trials were available, we directly applied PCA to population activity without GPFA. Mean trajectories across trials were computed, and the same λ<sub>3</sub>/Σλ metric was applied. Although the initial dimensionality-reduction steps differ, the final metric definition and computation are identical. Thus, the comparison focuses on geometric dimensional structure rather than the dimensionality-reduction method itself.

      Importantly, within the biological dataset, GPFA estimation, preprocessing, pseudopopulation construction, subsampling strategy, temporal alignment, and smoothing were applied identically across the CS and Neutral conditions. Under this common analysis framework, λ<sub>3</sub>/Σλ values were consistently higher in the CS condition than in the Neutral condition.

      For the RNN data, an identical analysis pipeline was applied across the CS+Open and Open-only conditions. In this case as well, λ<sub>3</sub>/Σλ values were significantly higher in the CS+Open condition than in the Open-only condition.

      If structural bias arose from covariance estimation or dimensionality reduction, it would be expected to affect conditions similarly within each dataset. The observation that λ<sub>3</sub>/Σλ increases selectively in the CS condition in biological data and in the CS+Open condition in the RNN therefore supports the interpretation that the effect reflects a condition-specific dynamical difference rather than an artifact of dimensionality reduction.

      To further examine whether the effect was driven by a small subset of neurons, we performed leave-one-neuron-out analyses in the biological dataset. In the CS group, most neurons contributed relatively evenly to the metric, whereas such distributed contribution was not observed in the Neutral group. This suggests that the three-dimensional structure reflects an organized population-level phenomenon rather than covariance dominated by a small number of outlier neurons.
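
A leave-one-neuron-out check of this kind could be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: plain PCA (via SVD) stands in for GPFA, the metric is computed once over the whole trajectory rather than in sliding windows, and the names `lam3_fraction` and `leave_one_neuron_out` are hypothetical.

```python
import numpy as np

def lam3_fraction(activity):
    """lambda_3 / (lambda_1 + lambda_2 + lambda_3) from PCA across neurons.

    activity : array of shape (n_neurons, n_bins); at least 3 neurons
               and 4 time bins are assumed.
    """
    X = activity.T - activity.T.mean(axis=0)  # (n_bins, n_neurons), centered
    s = np.linalg.svd(X, compute_uv=False)    # singular values, descending
    lam = s[:3] ** 2                          # proportional to PCA eigenvalues
    return lam[2] / lam.sum()

def leave_one_neuron_out(activity):
    """Return the full-population metric and the change in the metric
    when each neuron is removed in turn."""
    full = lam3_fraction(activity)
    deltas = np.array([
        lam3_fraction(np.delete(activity, i, axis=0)) - full
        for i in range(activity.shape[0])
    ])
    return full, deltas
```

If a few entries of `deltas` dwarf the rest, the metric is dominated by those neurons; a roughly even spread is what a distributed, population-level contribution would look like under this simplified scheme.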

      These results indicate that the consistent elevation of λ<sub>3</sub>/Σλ in the CS condition (biological data) and in the CS+Open condition (RNN) reflects a genuine dynamical feature rather than an artifact arising from pseudopopulation construction or dimensionality reduction.

      Taken together, the three-dimensional geometric structure observed in GPFA-based latent trajectories is unlikely to reflect random noise. The replication of the same quantitative metric in the RNN, using an independent dimensionality-reduction procedure, strengthens the correspondence between the two systems. We appreciate the reviewer’s suggestion for quantitative reinforcement, which has substantially strengthened the manuscript.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      We agree with the reviewer and will clearly indicate that references to broader theoretical interpretations are speculative. We will substantially reduce their strength and emphasis.

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      We agree with the reviewer’s concern. We will describe the delayed escape task as “a behavioral paradigm that requires integration of temporally separated task-relevant signals” and remove inference-related terminology throughout the manuscript.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s constructive and well-balanced comments. We regret that some of our wording and the scope of our introduction and discussion may not have appropriately reflected the contributions of prior studies. We will revise the manuscript accordingly to ensure that previous literature is more accurately and fairly acknowledged. In addition, we will reorganize the figures to more clearly present the hypotheses being tested and will provide additional details regarding both the modeling framework and the experimental procedures.

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      We will clarify more explicitly which data and methods originate from Han et al. (2024). In the original manuscript, Figure 1 panels A, D, E, F, and L (left) were indicated in the legend as originating from Han et al. (2024). We will further clarify this distinction in the main text. Additionally, we will briefly describe the behavioral experiments and in vivo electrophysiology performed in Han et al. in the Methods section, with appropriate citation.

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      As requested, we will provide additional details regarding model training procedures, weight matrices and their evolution during training, equations (2) and (3), the origin of constants used in the equations, and detailed methods for ChrimsonR injection (anesthesia, stereotaxic coordinates, injection parameters, and clarification of “sparse expression”).

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      We will reorganize the figures to emphasize core results and clarify that the primary goal is to test and validate the computational model.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      We will cite Orman (2015) as suggested and note that persistent activity has been observed in slices cut at specific angles, consistent with our findings.

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      We will remove wording implying “limited” prior work and appropriately acknowledge contributions from the Mathur and Citri groups.

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      Across all whole-cell recordings, optogenetic responses were observed in 38 out of 43 patched cells (~88%), suggesting that a high proportion of claustral neurons receive intra-claustral excitatory input. However, precise connectivity frequency and strength cannot be determined from the current dataset.

      As the reviewer noted, our RNN is specialized for the delayed escape task, and we do not claim direct generalization to other proposed claustral functions such as attention, salience, or sleep. The goal of this study is to computationally characterize the temporal integration mechanism observed in this specific task.

      While our model is specific to the delayed escape task, the computational principle identified here—nonlinear trajectory-based temporal integration supported by recurrent excitatory connectivity—may represent a more general mechanism for integrating temporally separated signals. However, testing such generality lies beyond the scope of the present study and will be framed as a future direction in the revised Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      (1) The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      (2) The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      (3) Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      (4) Open-source code for the pipeline and analyses enhances reproducibility.

      Weaknesses:

      (1) The linear regression approach, while useful, may not fully address collinearity among parameters (e.g., cardiac size, torso volume, heart position). Although left ventricular mass or cavity volume was selected to mitigate collinearity, other parameters (e.g., heart center coordinates) could still introduce bias.

      (2) The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

      (3) The manuscript's highly quantitative presentation may hinder readability. Simplifying technical descriptions and improving figure clarity (e.g., separating superimposed bar plots in Figures 2-4) would aid comprehension.

      (4) Given established sex differences in QTc intervals, applying the same analytical framework to explore QTc's dependence on sex and anatomy could have provided additional clinically relevant insights.

      We thank Reviewer 1 for their kind and constructive comments. While we have thoroughly addressed all specific recommendations below, in brief, we have added new analysis of the variance inflation factor in Supplementary Tables 2 and 3 to reassure readers that the chosen parameter sets exhibit low levels of collinearity, and provided more explanation for why the relative positional parameters were chosen to avoid this issue. We have added explanatory figures for all positional and orientational parameters to improve understanding of the technical details, and improved the clarity of existing figures as detailed below. We welcome the suggestion to add QT interval to the manuscript – whilst this was only available in the UK Biobank for a single lead, we have included an analysis of both QT and QTc intervals in this lead on Page 10, and added some discussion of this to the second full paragraph of Page 14.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1: “Collinearity and Regression Analysis: It would be valuable to assess the collinearity among the regressed parameters (e.g., cardiac size, torso volume, heart center positions [x, y, z], and cardiac orientation angles) and evaluate whether alternative regression methods (e.g., ridge regression) might improve robustness. Additionally, cardiac digital twinning with electrophysiological models could help isolate the exact contribution of electrophysiology while enabling sensitivity analysis. Nonlinear regression or machine learning approaches might also enhance the predictive power of the analysis.”

      We thank the reviewer for drawing attention to the important issue of collinearity in the parameter sets used in the regression analysis. To address this, we have added Supplementary Tables 2 and 3, which detail the variance inflation factors for each of the parameter sets used. Collinearity was considered in the selection of anatomical parameters, e.g. by using relative positions rather than absolute distances between landmarks, which would be more collinear. As all variance inflation factors are below 3.4, we believe that the effect of collinearity is limited. We have therefore retained our linear regression analysis, which avoids the subjective parameter selection that more complex methods require and aids interpretability. In addition, we have added an explanation to the second full paragraph on Page 6 of how we calculated the relative, rather than absolute, position of the cardiac centre, partly to avoid the collinearity that arises when using multiple absolute distances. We concur that modelling and simulation techniques are well suited to explore the electrophysiological component further. As this is beyond the scope of this work, we have addressed the role of these methods in future work in the final paragraph of Page 16.
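
For readers unfamiliar with the variance inflation factor, a minimal NumPy sketch is given here. It uses the standard definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors with an intercept. The function name and the synthetic data in the usage note are illustrative assumptions and are unrelated to the manuscript's tables.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (shape: n_samples x n_predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on all other predictors plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs
```

With independent synthetic predictors the values sit close to 1; adding a column that is a noisy copy of another drives both of their VIFs far above a threshold such as the 3.4 quoted in the response.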

      Comment 2: “Figure Clarity (Bar Plots): The superimposed bar plots in Figures 2-4 are difficult to interpret; separating the bars for each coefficient would improve readability.”

      We accept that the stacked bar plots could be improved in their clarity. Whilst plotting each anatomical parameter separately multiplies the number of plots by a factor of nine, and makes comparison between parameters more difficult, we have added clear horizontal grid lines in order to make values easier to read and interpret.

      Comment 3: “Feature Extraction Visualization: A schematic figure illustrating the steps for measuring heart positional parameters (e.g., with example annotations) would help readers better understand the feature extraction methodology.”

      We agree with the reviewer that the calculation of positional and orientational parameters is crucial to illustrate clearly. We have included additional Supplementary Figures 2 and 3 to better convey these parameters.

      Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial infarction (MI) is more common in women, and treatment is typically less aggressive. This diagnosis stems from the fact that women's ECGs commonly exhibit 12-lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      Strengths:

      This study has a strength in that it utilizes a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. This allows computation of various anatomical factors (torso volume, cavity volume, etc), and subsequent regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      Weaknesses:

      Major weaknesses stem from the fact that, while electrophysiological factors appear to play a role across many leads, both post-MI and healthy, the electrophysiological factors are not stated or discussed. The computational modeling pipeline is validated for reconstructing torso contours; however, potential registration errors stemming from ventricular-torso construction are not addressed within the context of anatomical factors, such as the tilt and rotation of the heart. This should be discussed as the paper's claims are based on these results. Further analysis and explanation are needed to understand how these sex-specific results impact the ECG-based diagnosis of MI in men and women, as stated as the primary reason for the study at the beginning of the paper. This would provide a broader impact within the clinical community. Claims about demographics do not appear to be supported within the main manuscript but are provided in the supplements. Reformatting the paper's structure is required to efficiently and effectively present and support the findings and outcomes of this work.

      We thank Reviewer 2 for their considered and detailed feedback. We greatly appreciate the invitation to elaborate on the electrophysiological factors, and we have added discussion of this matter to the second and third full paragraphs on Page 14 (extending to Page 15) and the first full paragraph on Page 15, and highlighted the role of modelling and simulation in future work in the third full paragraph of Page 16. We agree that registration errors are one reason behind remaining reconstruction errors. We feel that a strength of our study is that the large number of subjects helped to reduce the effect of this noise, and we have updated the second full paragraph of Page 16 to reflect this. We are wary of moving too many supplemental figures and tables describing demographic trends to the main manuscript for fear of diluting the specific answers to our research questions. We have however actioned the suggestions as detailed below to reformat the paper, including redressing the balance of supplemental versus main methodological sections, and thank the reviewer for their guidance in increasing our clarity.

      Reviewer #2 (Recommendations for the authors):

      (1) Please detail what "chosen to be representative of the underlying dataset" means in terms of a validation dataset.

      We thank the reviewer for addressing the lack of clarity in this matter. We have added a reference in the third full paragraph on Page 6 to Supplementary Appendix 1.1, where we have included full details of the selection criteria.

      (2) “Current guidelines ... further research [16]." The paragraph should begin with a broader statement that is relevant to the fact that the entire body of work focuses on ECG-based diagnosis differences in women, rather than LVEF through echocardiography.

      We have revised the introduction to Paragraph 3 on Page 3 to clarify our motivation for focusing on the ECG in order to shape proposals for novel ECG-based risk stratification tools.

      (3) The last paragraph of the introduction should more clearly state what was performed and how you aim to prove your hypothesis. There is no mention of the data, the regression model, or other key aspects important to the reader.

      We have added methodological details to Paragraph 5 on Page 3 in order to clarify our approach in testing our hypothesis.

      (4) An overview paragraph should be included in the Methods at the beginning.

      We thank the reviewer for this valuable suggestion – we have added an overview paragraph to the start of the methodology section on Page 5.

      (5) The computational pipeline portion of the methods should be written in full paragraphs instead of almost a bulleted list. In general, more details from the supplement should be provided in the methods.

      We thank the reviewer for raising important points concerning the balance of methodological description in the main manuscript and the supplementary materials. We have added detailed description of the reconstruction pipeline to Pages 5 and 6. We feel that the ordered format of the methods section adds to the reproducibility and transparency of our methodology.

      (6) The torso reconstruction method was already validated in Smith et al. [29]. What value does your additional validation bring to this methodology? Furthermore, how does the construction of the ventricular-torso reconstructions using the cardiac axes (not just the torso contours) influence ECG metrics?

      We apologise that this was not clear – we have clarified in Paragraph 4 on Page 5 that while Smith et al. 2022 provided a detailed validation of the contour extraction networks, it did not validate the torso reconstruction pipeline, as it only presents the reconstruction of two cases as a proof of concept. We have also expanded the second full paragraph on Page 6 to explain that the sparse (but not dense) cardiac anatomies were constructed in order to calculate the cardiac size, which we found was a key factor moderating many ECG biomarkers. We also specified that the cardiac position and orientation were necessary in order to relate these to the torso axes and positions of the ECG electrodes.

      (7) Include the details of the regression analysis in the main body of the methods for the readers. This is crucial to the claims and outcomes of the paper. Only a sentence is included in the results and one in the figure: "Each factor's contribution is calculated from the product of the regression coefficients and anatomical sex differences (Supplementary Appendix 1.5)." What specific contributions can I expect to see in the results figures? The results are filled with methodological aspects that should be in the methods.

      We thank the reviewer again for this important comment regarding the balance of the main text methodology and supplementary methodology sections. We have added detail to the statistical analysis section of the main text on Pages 7 and 8 in order for the reader to understand the following results section without consulting the supplemental methods. We have also removed these details from the results section.

      (8) What is "the remaining estimated effect of electrophysiology"? Did you do simulations on the electrophysiology, or how is this computed from the clinical data of patients? More explanation is needed, as without this, the paper is just focusing on anatomy.

      We have clarified this important point by moving the explanation of the methodology underpinning our estimation of the electrophysiological contributions using the clinical ECGs from the supplementary methods to the main manuscript, in the second full paragraph on Page 7 and continuing to Page 8. We have also specified the role of simulation studies in future work in the final paragraph on Page 16.
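
The decomposition discussed in comments (7) and (8) can be illustrated with a short sketch. It follows the quoted rule that each factor's contribution is the product of its regression coefficient and the anatomical sex difference. Treating whatever is left of the observed ECG difference as the "remaining" effect is an assumed reading of the response, and the function name and all numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def decompose_sex_difference(coefs, mean_female, mean_male, observed_diff):
    """Split an observed female-minus-male ECG difference into
    per-factor anatomical contributions and a residual.

    coefs         : regression coefficient per anatomical factor
    mean_female   : female mean of each anatomical factor
    mean_male     : male mean of each anatomical factor
    observed_diff : observed female-minus-male difference in the biomarker
    """
    coefs = np.asarray(coefs, dtype=float)
    delta = np.asarray(mean_female, dtype=float) - np.asarray(mean_male, dtype=float)
    contributions = coefs * delta                 # coefficient x sex difference
    residual = observed_diff - contributions.sum()
    return contributions, residual

# Made-up numbers for illustration only; not taken from the manuscript.
contributions, residual = decompose_sex_difference(
    coefs=[0.1, 0.5],
    mean_female=[80.0, 10.0],
    mean_male=[100.0, 12.0],
    observed_diff=-5.0,
)
```

In this toy case the two anatomical factors account for -2 and -1 of the -5 difference, leaving -2 unexplained by anatomy.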

      (9) Include an overview paragraph of the methods to create more structure.

      We thank the reviewer again for the further attention to this issue – as previously, we have added an overview paragraph to the methodology section on Page 5.

      (10) Only 19.8% of the patients were female, which is probably due to females having a more severe presentation of the disease. How does this impact, bias, or skew your results?

      This comment raises a very interesting point. The origin of this imbalance is of course multifactorial: women likely do have lower rates of MI events, owing to the cardioprotective role of estrogen and different health-promoting behaviours, and our sex imbalance reflects wider trends in MI diagnosis. However, as mentioned in Paragraph 2 on Page 3 of the text, there are more missed MI diagnoses in women, and we agree that this may lead to a more severe presentation of female MI pathophysiology. We have expanded the first full paragraph on Page 16 to specify the ECG and demographic impacts that this has on our results. We also note there that a strength of this work is that it may contribute to future adjustment of the diagnostic criteria, such that future investigations do not have this bias and clinical outcomes are improved.

      (11) A lot of extra information is provided in Tables 1 and 2. Include additional information in the supplements that is not directly relevant to your findings.

      We agree that Table 2 is supplementary, rather than critical information, and have moved it accordingly to the Supplementary Materials on Page 38. We do believe that Table 1 is central for understanding the extracted dataset.

      (12) Combine paragraphs 3 and 4 into a single paragraph. "Current guidelines..." and "T wave amplitude...". They are part of a single coherent concept.

      We have removed the paragraph break on Page 3 Paragraph 3.

      (13) Check all acronyms throughout the paper. The abbreviation for sudden cardiac death (SCD) is only used once in the same paragraph. Remove the acronym and type it out. T-wave amplitude (TWA) is introduced twice in a Figure caption and not introduced until the methods.

      Many thanks for this suggestion – we have reviewed all acronyms in the manuscript.

      (14) "Figure 1B showcases the capability of the computational pipeline to extract torso contours and reconstruct them into 3D meshes". Isn't this Figure 1A?

      We apologise that this was unclear, and have updated the sentence in the first full paragraph on Page 8 to clarify the purpose of Figure 1B.

      (15) No need to state: "Female y-axis limits have been adjusted by the difference in healthy QRS duration between sexes for ease of comparison" in the Figure 2 caption.

      We have removed this statement from all relevant captions.

      (16) The paragraph "For lead V6, 15.9% of healthy subjects..." can be combined with the previous section.

      We have removed this paragraph break on Page 9 to improve readability.

      (17) The only demographics I could find were age and BMI. State which demographics you used explicitly. This is especially true when the discussion makes claims like "Our findings suggest that corrected QRS duration taking into consideration demographics...". How did you take them into account?

      We accept that our previous description of the demographic adjustment to QRS duration in the discussion did not adequately reflect the comprehensiveness of our approach, and have adjusted the second paragraph on Page 14 to rectify this.

      (18) The results section is also almost a bulleted list that should be written and reformatted into paragraphs.

      The ordered style of our results section was designed to compare how our obtained data answers our research question differently for ECG intervals, amplitudes, and axis angles. Whilst we have adjusted paragraph breaks and moved methodological details to more appropriate sections, we have retained this stylistic choice.

      (19) The following sentence should be in the introduction: "Alterations to the polarity and amplitude of the T wave are used in the diagnosis of acute MI [42] and TWA affects proposed risk stratification tools, particularly markers of repolarization abnormalities [9, 43]."

      We thank the reviewer for this suggestion. We have included the discussion of how TWA is separately used in proposed risk stratification and current diagnostic tools in Paragraph 3 of Page 3.

    2. Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial infarction (MI) is more common in women, and treatment is typically less aggressive. This diagnosis stems from the fact that women's ECGs commonly exhibit 12-lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      This study has a strength in that it utilizes a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. The study then deploys a linear regression model to relate the level of influence of various factors to ECG-based changes. This allows computation of various anatomical factors (torso volume, cavity volume, etc), and subsequent linear regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      A major weakness is that a linear additive model may not adequately capture how anatomy and electrophysiology interact. Myocardial infarction dramatically alters both anatomy and electrophysiology in ways that are not easily separable and could be considered non-linear. As such, the electrophysiological factors in the model may still include factors that have an anatomical basis (i.e. the formation of scar) that were not accounted for during model generation. However, the technique remains useful for dissecting large factors beyond anatomy, as demonstrated in this study.

    3. Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      • The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      • The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      • Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      • Open-source code for the pipeline and analyses enhances reproducibility.

      Weaknesses:

      • The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

    4. eLife Assessment

      This important study combines electrocardiographic (ECG) and heart/torso anatomy data from subjects included in the UK Biobank to analyze sex-specific differences in relationships between those two characteristics. The study has several compelling strengths, including the development of an open-source pipeline for reconstruction and analysis of heart/torso geometry from a large cohort. Nevertheless, technical analysis of the data as presented is incomplete, specifically as it pertains to assessment of co-linearity between regressed parameters, interpretation of regression coefficients for sex and/or presence of myocardial infarction, and discussion of potential roles played by underlying electrophysiological derangements. With improvements to these aspects of the analysis, the paper would be of interest to the cardiovascular research community, especially those studying highly relevant health and treatment disparities arising from sex differences.

    5. Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      (1) The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      (2) The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      (3) Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      (4) Open-source code for the pipeline and analyses enhances reproducibility.

      Weaknesses:

      (1) The linear regression approach, while useful, may not fully address collinearity among parameters (e.g., cardiac size, torso volume, heart position). Although left ventricular mass or cavity volume was selected to mitigate collinearity, other parameters (e.g., heart center coordinates) could still introduce bias.

      (2) The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

      (3) The manuscript's highly quantitative presentation may hinder readability. Simplifying technical descriptions and improving figure clarity (e.g., separating superimposed bar plots in Figures 2-4) would aid comprehension.

      (4) Given established sex differences in QTc intervals, applying the same analytical framework to explore QTc's dependence on sex and anatomy could have provided additional clinically relevant insights.
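The collinearity check suggested in weaknesses (1) and (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the predictor names (`torso_volume`, `lv_mass`, `heart_rotation`), the outcome `qrs_duration`, and all data are synthetic and hypothetical, and the helper functions `variance_inflation_factors` and `ridge_fit` are written here for illustration only. The variance inflation factor of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients on standardized predictors (intercept unpenalized)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(p), Xs.T @ yc)

# Synthetic, hypothetical data: two strongly collinear predictors and one independent one.
rng = np.random.default_rng(0)
n = 500
torso_volume = rng.normal(size=n)
lv_mass = 0.9 * torso_volume + 0.1 * rng.normal(size=n)  # collinear with torso_volume
heart_rotation = rng.normal(size=n)                       # independent predictor
X = np.column_stack([torso_volume, lv_mass, heart_rotation])
qrs_duration = torso_volume + heart_rotation + rng.normal(size=n)  # hypothetical outcome

vifs = variance_inflation_factors(X)
beta_ols = ridge_fit(X, qrs_duration, alpha=0.0)      # alpha = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, qrs_duration, alpha=100.0)  # shrinks the coefficient vector
```

In this toy setting the two collinear predictors receive large VIFs while the independent one stays near 1, and the ridge penalty shrinks the coefficient vector relative to ordinary least squares. Which VIF threshold counts as problematic is a judgment call and is not specified in the review.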

    6. Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial ischemia (MI) is more common in women, and treatment is typically less aggressive. This diagnosis stems from the fact that women's ECGs commonly exhibit 12 lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      Strengths:

      This study has a strength in that it utilizes a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. This allows computation of various anatomical factors (torso volume, cavity volume, etc), and subsequent regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      Weaknesses:

      Major weaknesses stem from the fact that, while electrophysiological factors appear to play a role across many leads, both post-MI and healthy, the electrophysiological factors are not stated or discussed. The computational modeling pipeline is validated for reconstructing torso contours; however, potential registration errors stemming from ventricular-torso construction are not addressed within the context of anatomical factors, such as the tilt and rotation of the heart. This should be discussed as the paper's claims are based on these results. Further analysis and explanation are needed to understand how these sex-specific results impact the ECG-based diagnosis of MI in men and women, as stated as the primary reason for the study at the beginning of the paper. This would provide a broader impact within the clinical community. Claims about demographics do not appear to be supported within the main manuscript but are provided in the supplements. Reformatting the paper's structure is required to efficiently and effectively present and support the findings and outcomes of this work.

    1. eLife Assessment

      This fundamental work shows that a history of cocaine self-administration disrupts the orbitofrontal cortex's ability to encode similarities between distinct sensory stimuli that possess identical task information - hidden states. The evidence supporting these conclusions is compelling, with methods and analyses spanning self-administration, a novel 'figure 8' sequential odor task, recordings from 3,881 single units, and sophisticated firing analyses revealing complex orbitofrontal representations of task structure. These results will be of broad interest to psychologists, neuroscientists, and clinicians.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-2<sup>+</sup>; Sequence #2: 3<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-4<sup>+</sup>), forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trial/positions involving overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4), indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      - Elegant behavioral design that affords the detection of hidden-state representations.

      - Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      - The number of subjects is small, so idiosyncratic, animal-specific effects can't be fully ruled out.

      Comments on revisions:

      The authors have thoroughly addressed all of my previous comments. Congratulations on an excellent paper!

    3. Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single unit and ensemble level suggest that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. The findings are important, and the results are compelling. The introduction and discussion contextualize the experiments in the context of the literature, and explain the novelty and significance of the current findings. Specifically, the observation that cocaine self-administration impairs generalization across task sequences at the single unit level builds on previous observations of aberrant neuronal activity within the OFC in animals with a history of cocaine self-administration. These new data point to a neurophysiological mechanism that could explain why drug-seeking is so context dependent, and hard to ameliorate with therapeutic strategies that take place within a clinical setting.

      The authors clearly acknowledge the major limitations of this work, namely that the sample size is restricted due to the technical challenges of performing in vivo electrophysiology recordings combined with self-administration, and that animals of only one sex were used. Importantly, the data from all rats within each group was remarkably homogeneous, increasing confidence in the conclusions drawn.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-2<sup>+</sup>; Sequence #2: 3<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-4<sup>+</sup>) - forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trial/positions involving overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4) - indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      Elegant behavioral design that affords the detection of hidden-state representations.

      Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      The number of subjects is small - can't fully rule out idiosyncratic, animal-specific effects.

      Comments

      (1) Emergence of sequence-dependent OFC representations across learning.

      A conceptual point that would benefit from further discussion concerns the emergence of sequence-dependent OFC activity at overlapping positions (e.g., position P3, odor 1). This implies knowledge of the broader task structure. Such representations are presumably absent early in learning, before rats have learned the sequence structure. While recordings were conducted only after rats were well trained, it would be informative if the authors could comment on how they envision these representations developing over learning. For example, does sequence differentiation initially emerge as animals learn the overall task structure, followed by progressive compression once animals learn that certain states are functionally equivalent? Clarifying this learning-stage interpretation would strengthen the theoretical framing of the results.

      We agree that the emergence of sequence-dependent OFC activity at overlapping positions (e.g., P3) implies knowledge of the broader task structure and therefore must depend on learning. Although we did not record during early acquisition in the current study, we can outline a learning-stage framework consistent with both prior work and the comparative analyses included here and include it in the discussion.

      We think the development of OFC representations is a multi-stage process. Early in learning, before animals have acquired the sequential structure of the task, OFC activity is likely dominated by local sensory features and immediate reinforcement history, with little differentiation between sequences at overlapping positions. As animals learn that odors are embedded within extended sequences that have utility for predicting future outcomes, OFC representations would begin to differentiate identical sensory cues based on their sequence context, giving rise to sequence-dependent activity at positions such as P3. This stage reflects acquisition of the broader task structure and the recognition that current cues carry information about future states.

      With continued training, however, OFC representations normally undergo a further refinement: positions that differ in sensory identity but are functionally equivalent become compressed, while distinctions that are irrelevant for guiding behavior are suppressed. Evidence for this later stage comes from our over-trained control animals, in which discrimination between overlapping positions is near chance across most trial epochs, and from prior work using the same task in less-trained animals, where sequence-dependent discrimination is more strongly preserved. Thus, sequence differentiation appears to emerge during structure learning but is subsequently down-weighted as animals learn which distinctions are behaviorally irrelevant.

      Within this framework, prior cocaine exposure appears to interfere specifically with this later refinement stage. Cocaine-experienced rats exhibit OFC representations resembling those seen earlier in learning—retaining sequence-dependent discrimination at overlapping and functionally equivalent positions—despite extensive training. This suggests not a failure to acquire task structure per se, but rather an impairment in the ability to collapse across states that share common underlying causes.

      (2) Reference to the 24-odor position task

      The reference to the previously published 24-odor position task is not well integrated into the current manuscript. Given that this task has already been published and is not central to the main analyses presented here, the authors may wish to a) better motivate its relevance to the current study or b) consider removing this supplemental figure entirely to maintain focus.

      Thanks for your suggestion. We have removed this supplemental figure as suggested.

      (3) Missing behavioral comparison

      Line 117: the authors state that absolute differences between sequences differ between cocaine and sucrose groups across all three behavioral measures. However, Figure 1 includes only two corresponding comparisons (Fig. 1I-J). Please add the third measure (% correct) to Figure 1, and arrange these panels in an order consistent with Figure 1F-H (% correct, reaction time, poke latency).

      Thanks for your suggestion. We have added the corresponding panel to the figure as suggested.

      (4) Description of the TCA component

      Line 220: authors wrote that the first TCA component exhibits low amplitude at positions P1 and P4 and high amplitude at positions P2 and P3. However, Figure 3 appears to show the opposite pattern (higher magnitude at P1 and P4 and lower magnitude at P2 and P3). Please check and clarify this apparent discrepancy. Alternatively, a clearer explanation of how to interpret the temporal dynamics and scaling of this component in the figure would help readers correctly understand the result.

      Thanks for your suggestion. We appreciate this point and agree that clearer guidance on how to interpret the temporal and scaling properties of the tensor components would help readers. In the TCA framework, each component is defined by three separable factors: a neuron factor, a temporal factor, and a trial (position) factor. The temporal factor reflects the shape of the activity pattern within a trial, indicating when during the trial that component is expressed, whereas the trial factor reflects how strongly that temporal pattern is expressed at each position and across trials.

      Importantly, the absolute scaling of these factors is not independently meaningful. Because TCA components are scale-indeterminate, the magnitude of the temporal factor and the trial factor should be interpreted relative to one another within a component, not across components. Thus, a large value in the trial factor does not imply stronger neural activity per se, but rather greater expression of that component’s characteristic temporal pattern at that position or trial.

      Accordingly, when a component shows similar temporal dynamics across groups but differs in its trial factor structure—as observed here—the interpretation is that the same within-trial dynamics are being differentially recruited across task positions, rather than that the timing of neural responses has changed.

      We have added a brief discussion of this in this section of the results in the manuscript.
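The scale indeterminacy described above can be illustrated with a short numerical sketch. It assumes a single rank-1 component built from three hypothetical factors (`neuron`, `temporal`, `trial`) with arbitrary sizes and random values; it is not the authors' TCA pipeline or data. Multiplying one factor by a constant and dividing another by the same constant leaves the component unchanged, which is why only the relative profile within a factor is interpretable.

```python
import numpy as np

rng = np.random.default_rng(0)
neuron = rng.random(5)     # hypothetical neuron factor (5 units)
temporal = rng.random(20)  # hypothetical within-trial temporal factor (20 time bins)
trial = rng.random(8)      # hypothetical trial (position) factor (8 trials)

# A single TCA (CP) component is the outer product of its three factors.
component = np.einsum("i,j,k->ijk", neuron, temporal, trial)

# Rescale: inflate the temporal factor and deflate the trial factor by the same constant.
c = 4.0
rescaled = np.einsum("i,j,k->ijk", neuron, temporal * c, trial / c)

# The reconstructed component is identical, so absolute factor magnitudes carry no meaning.
same_component = np.allclose(component, rescaled)

# The normalized trial profile (relative expression across trials) is unaffected by rescaling.
profile_before = trial / np.linalg.norm(trial)
profile_after = (trial / c) / np.linalg.norm(trial / c)
same_profile = np.allclose(profile_before, profile_after)
```

For this reason, TCA implementations commonly normalize each factor and report a separate per-component weight; whether that convention was used for the figure in question is not stated here.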

      (5) Sucrose control

      Sucrose self-administration is a reasonable control for instrumental experience and reward exposure, but it means that this group also acquired an additional task involving the same reinforcer. This experience may itself influence OFC representations and could contribute to the generalization observed in control animals. A brief discussion of this possibility would help contextualize the interpretation of cocaine-related effects.

      We agree that sucrose self-administration is not a perfect neutral manipulation and that this experience could, in principle, influence OFC representations. In particular, sucrose self-administration involves instrumental responding for the same primary reinforcer used in the odor task, and thus may promote additional learning about reward predictability, action–outcome contingencies, or contextual structure that could facilitate generalization.

      Several considerations, however, suggest that the generalization observed in control animals primarily reflects learning-dependent refinement of task representations rather than a specific consequence of sucrose self-administration per se. First, the amount of sucrose administered during this phase was minimal (50 µl × 60 presses at most per session for 14 sessions) compared with the total sucrose reward obtained during task recording (100 µl × 160 trials per session for several dozen sessions). Second, all rats were extensively trained on the odor sequence task prior to any self-administration, and the key signatures of compression and generalization we report—near-chance discrimination between functionally equivalent positions—are consistent with prior studies using the same task in animals that did not undergo sucrose self-administration. Finally, comparisons to less-trained animals in earlier work show that OFC representations evolve toward greater abstraction with increasing task experience, indicating that generalization is a property of advanced learning rather than a unique outcome of sucrose exposure.

      Importantly, even if sucrose self-administration were to enhance generalization in OFC, this would not account for the primary finding that cocaine-experienced rats fail to show these signatures despite identical task training and parallel instrumental experience. Thus, the critical comparison is not between sucrose-trained animals and naive controls, but between two groups matched for self-administration experience, differing only in the pharmacological consequences of the reinforcer. Within this framework, the absence of position-general representations in cocaine-experienced rats reflects a disruption of normal learning-dependent abstraction rather than an artifact of the control condition.

      We have added a brief discussion acknowledging that sucrose self-administration may bias OFC toward abstraction, while emphasizing that cocaine exposure prevents the emergence or maintenance of these representations under otherwise comparable experiential conditions.

      (6) Acknowledge low N

      The number of rats per group is relatively low. Although the effects appear consistent across animals within each group, this sample size does not fully rule out idiosyncratic, animal-specific effects. This limitation should be explicitly acknowledged in the manuscript.

      We acknowledge that the number of animals per group is relatively small and therefore cannot fully rule out animal-specific effects. However, the key neural and behavioral signatures reported here were consistent across individual animals within each group and across multiple levels of analysis, and no outliers were observed. In addition, sample sizes of this scale are common in cocaine self-administration studies due to their technical and logistical constraints. We did not attempt to obscure this limitation and have now explicitly acknowledged it in the manuscript discussion.

      (7) Figure 3E-F: The task positions here are ordered differently (P1, P4, P2, P3) than elsewhere in the paper. Please reorder them to match the rest of the paper.

      Thank you for pointing this out. We agree that the ordering of task positions in Figures 3E–F should be consistent with the rest of the manuscript. We have reordered the positions to match the standard sequence order used elsewhere in the paper (P1, P2, P3, P4) to improve clarity and avoid confusion.

      Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single unit and ensemble level suggest that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. I have the following comments.

      While well-written, the introduction mainly summarises the experimental design and results, rather than providing a summary of relevant literature that informed the experimental design. More details regarding the published effects of cocaine self-administration on OFC firing, and on tests of behavioral flexibility across species, would ground the paper more thoroughly in the literature and explain the need for the current experiment.

      We appreciate this suggestion and have tried to expand the Introduction to more explicitly situate the study within the existing literature on cocaine-induced changes in OFC function. In particular, prior work has shown that cocaine self-administration alters OFC firing properties and disrupts behavioral flexibility across species, including impairments in reversal learning, outcome devaluation, and sensory preconditioning. We have revised the Introduction to expand this literature review and more clearly articulate how these established findings motivated our focus on OFC representations of hidden task structure and generalization.

      For Fig 1F, it is hard to see the magnitude of the group difference with the graph showing 0-100%- can the y axis be adjusted to make this difference more obvious? It looks like the cocaine-treated animals were more accurate at P3- is that right?

      The concluding section is quite brief. The authors suggest that the failure to generalize across sequences observed in the current study could explain why people who are addicted to cocaine do not use information learned e.g. in classrooms or treatment programs to curtail their drug use. They do not acknowledge the limitations of their study e.g. use of male rats exclusively, or discuss alternative explanations of their data.

      We agree that the current 0–100% scale can make small differences difficult to discern. We will adjust the y-axis to a narrower range to better highlight group differences and will note this in the figure caption. At P3, cocaine-experienced rats were indeed more accurate than controls.

      We appreciate the suggestion to expand the discussion. We have revised the concluding section to acknowledge key limitations, including the use of only male rats and the small number of subjects, and to note that alternative explanations, such as differences in motivational state or attention, could also contribute to the observed effects. These revisions provide a more balanced interpretation while retaining the focus on OFC-mediated generalization as a potential mechanism for persistent, context-specific drug-seeking.

      Is it a problem that neuronal encoding of the "positions" i.e. the specific odors was at or near chance throughout in controls? Could they be using a simpler strategy based on the fact that two successive trials are rewarded, then two successive trials are not rewarded, such that the odors are irrelevant?

      We thank the reviewer for this point. While neuronal encoding of individual positions (specific odors) in control animals was comparatively lower, this does not indicate that the rats were using a simpler strategy based solely on reward patterns. First, rats were extensively trained on the odor sequence task prior to recordings, demonstrating accurate discrimination across all positions, and their trial-by-trial behavior reflects sensitivity to specific odors rather than only reward alternation. Second, the task design—with overlapping sequences and positions that differ in reward contingency across sequences—requires tracking odor-specific context to maximize reward; a purely “two rewarded, two non-rewarded” strategy would fail at overlapping positions and would not account for the compression of functionally equivalent positions observed in the OFC. Third, in the less-trained rats shown in Figure 3C, decoding accuracy was higher than in the sucrose group, indicating that these animals still differentiated negative positions. With additional training, decoding patterns suggested improved generalization across positions. Thus, the near-chance neural selectivity in controls reflects representation of latent task states rather than external sensory cues, consistent with the idea that OFC abstracts task-relevant structure and ignores irrelevant sensory differences.

      When looking at the RT and poke latency graphs, it seems the cocaine-experienced rats were faster to respond to rewarded odors, and also faster to poke after P3. Does this mean they were more motivated by the reward?

      At present, the basis of these response-time differences remains unclear, in part because motivation is difficult to define operationally. If motivation is indexed solely by reaction time or poke latency, then the data are consistent with increased response vigor in cocaine-experienced rats. Indeed, RT and poke-latency measures indicate that cocaine-experienced rats responded more quickly on some rewarded trials, including after P3. However, overall task performance was high in both groups, suggesting that these differences cannot be attributed simply to superior learning or engagement. Faster responses may also reflect differences in deliberation or strategy, with cocaine-experienced rats relying more on rapid, stimulus-driven responding and sucrose-trained rats engaging in more careful evaluation. In addition, altered reward sensitivity or persistent effects of cocaine exposure may contribute to these behavioral differences. Thus, the faster responses observed in cocaine-experienced rats likely reflect a combination of heightened reward responsivity and altered encoding of task structure, rather than a straightforward increase in motivation alone.

      Recommendations for the authors:

      The reviewers were very positive about the manuscript and emphasized the rigor and state-of-the-art analyses. Two points that came up were the very small n (6 total, 3 per condition) and the exclusive use of males. Adding more subjects is not recommended. However, more discussion and acknowledgement of this issue is recommended. The main concern is that idiosyncratic differences between individuals (not differences in cocaine history) are responsible for the differences observed in OFC encoding.

      We acknowledge that the sample size (n = 3 per group) and use of only male rats limit generalizability and do not fully rule out idiosyncratic, individual-specific effects. However, the key neural and behavioral signatures we report were consistent across all animals within each group and across multiple analyses (single-unit, ensemble decoding, and TCA). We now explicitly note these limitations in the Discussion, emphasizing that while individual variability cannot be fully excluded, the convergence of results across multiple levels of analysis supports the interpretation that the observed differences reflect effects of prior cocaine exposure rather than idiosyncratic differences.

      Reviewer #2 (Recommendations for the authors):

      In the legend to figure 2, the authors state "Notably, rats could discriminate between the two sequences (S1 vs. S2) based solely on current sensory information at two task epochs ["Odor" at P3 and P4; black bars]. At all other task epochs, indicated by gray bars, the discrimination relied on an internal memory of events". I'm confused by this statement- how does the odor at P3 help to discriminate the sequences? Surely P1 and P4 are the times when the odor sampling indicates which sequence they are in?

      We thank the reviewer for pointing out this source of confusion. The statement in the original figure legend was imprecise, and we have removed the figure and revised the figure legends because the results in the left panel substantially overlapped with those shown in the right panel. In this task, odors at positions P1 and P4 are the only cues that directly signal sequence identity, whereas the odors presented at P2 and P3 are identical across sequences. Accordingly, discrimination observed during the “Odor” epoch at P3 does not reflect sensory differences but instead depends on the animal’s use of internal memory or sequence context to infer sequence identity.

    1. eLife Assessment

      This valuable study re-evaluates a published simulation model on the role of heterozygote advantage in shaping MHC diversity. By modifying key modeling assumptions, the author argues that the original conclusions depend on a narrow and potentially unrealistic parameter range. While the work is in principle solid, the robustness of this claim is viewed differently by the reviewers. The manuscript further proposes an alternative modeling framework in which expansion of the MHC gene family allows homozygotes to outperform heterozygotes, thereby challenging the idea that heterozygote advantage alone can account for high allelic diversity at MHC loci. The topic is highly relevant for eco-immunology and evolutionary genetics, although a clearer delineation of the model's scope would help readers assess its broader implications.

    2. Reviewer #1 (Public review):

      The manuscript "Heterozygote advantage cannot explain MHC diversity, but MHC diversity can explain heterozygote advantage" explores two topics. First, it is claimed that the conclusion recently published by Mattias Siljestam and Claus Rueffler (referred to below as [SR] for brevity), that heterozygote advantage explains MHC diversity, does not withstand even a very slight change in ecological parameters. Second, a modified model that allows expansion of the MHC gene family shows that homozygotes outperform heterozygotes. This is an important topic and could be of potential interest to the readership of eLife if the conclusions are valid and non-trivial.

      The resubmitted manuscript addresses several questions from my previous review. In particular, there is a more detailed description of how the code of Siljestam and Rueffler ([SR]) was used for the simulations and for the calculation of the factor 2.7 x 10^43 that is the key to the alleged breakdown of the numerical reasoning presented in [SR].

      Yet I think that important aspects of my critique of the first statement of the manuscript about the flaws of the [SR] model remain unanswered. I guess the discussion becomes rather general about the universality and robustness of various types of models to parameter changes. My point is that none of the models is totally universal. The model in [SR] is not phenomenological as none of the parameters or functional forms were derived empirically. Instead, it is a proof of principle demonstration that inevitably grossly simplifies the actual immune response. The choice of constants and functions used in Eqs. (1-5) is dictated by mathematical convenience and works in a limited range of parameter values. It is shown in [SR] that for 3 pathogens and reasonable "virulence" \nu, the alleles branch. These conclusions are supported by the analytically derived Adaptive Dynamics branching criteria (7), which, contrary to the statement in the cover letter ("It is clear from Fig. 4 of Siljestam and Rueffler that the branching condition is far from sufficient for high MHC diversity."), are perfectly confirmed by the simulation data shown in Fig. 4.

      The mathematical simplicity of the [SR] model generates various artifacts, such as the reduction of the "condition" by an enormous factor of 2.7 x 10^43, mentioned by the Author, and the resulting decrease in the "survival" induced by the addition of a new pathogen. This occurs at the very large value of \nu=20, whose effect is enormous due to the Gaussian form of (1), which, once again, was chosen for mathematical convenience. In reality, a new pathogen cannot reduce the "survival" by such a factor as it would wipe out any resident population. So to compensate for such an artifact, the additional factor c_max was introduced to buffer such an excess. There is no reason to fix c_max once for an arbitrary number of pathogens, because varying c_max basically reflects the observation that a well-adapted individual must have a reasonable survival probability. At the same time, there are many ways in which the numerical simulation may break down when the survival rates become of the order of 10^(-43) instead of one, so it comes as no surprise that the diversification, predicted by the adaptive dynamics, does not readily occur in the scenario with an addition or removal of the 8th pathogen with a very high virulence \nu=20.

      I have doubts that the reported breakdown of the [SR] model with fixed c_max remains observable with less extreme values of m and \nu (say, for \nu=7 and m=3 plus or minus 1 used in Fig. 3 in the manuscript).

      So I still find that the claim that "the phenomenon that leads to high diversity in the simulations of Siljestam and Rueffler depends on finely tuned parameter values" is not well substantiated.

    3. Reviewer #2 (Public review):

      Summary:

      This study addresses the population genetic underpinnings of the extraordinary diversity of genes in the MHC, which is widespread among jawed vertebrates. This topic has been widely discussed and studied, and several hypotheses have been suggested to explain this diversity. One of them is based on the idea that heterozygote genotypes have an advantage over homozygotes. While this hypothesis lost support early on, a recent study claimed that there is good support for this idea. The current study highlights an important aspect that allows us to see the results presented in the earlier published paper in a different light, strongly changing the conclusions of the earlier study, i.e., there is no support for a heterozygote advantage. This is a very important contribution to the field. Furthermore, this new study presents an alternative hypothesis to explain the maintenance of MHC diversity, which is based on the idea that gene duplications can create diversity without heterozygosity being important. This is an interesting idea, but not entirely new.

      Strengths:

      (1) A careful re-evaluation of a published model, questioning a major assumption made by a previous study.

      (2) A convincing reanalysis of a model that, in light of the reanalysis, loses all support.

      (3) A convincing suggestion for an alternative hypothesis.

      Weakness:

      (1) The title of the study is catchy, but it is explained only at the very end of the paper.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      Yet I think that important aspects of my critique of the first statement of the manuscript about the flaws of the [SR] model remain unanswered.

      I believe that I have fully addressed the points in the earlier review. The reviewer had doubted that my results were correct, attributing them to “a poor setup of the model” on my part. The reviewer stated that if I were correct about the factor of >10<sup>43</sup> change in cmax, this would “naturally break down all the estimates and conclusions made in Siljestam and Rueffler” (S&R).

      It appears that the reviewer is now convinced that my results represent a faithful analysis of the models on which S&R based their claims. The reviewer now contends that these results, including the factor of >10<sup>43</sup>, present no difficulties for the claims of S&R after all. In fact, this enormous factor of >10<sup>43</sup> is now claimed to support the conclusions of S&R by invalidating my conclusions. I respond to these new and very different arguments in what follows.

      As I stated in the first round of review, the issue is not the enormity of this factor per se, but the fact that the compensatory adjustment of cmax conceals the true effects of changes in other parameters. These effects are large; small changes to the parameter values mostly eliminate the diversity that the model is claimed to explain.

      The model in [SR] is not phenomenological as none of the parameters or functional forms were derived empirically. Instead, it is a proof of principle demonstration that inevitably grossly simplifies the actual immune response.

      The hidden sensitivity of the results of S&R to parameter values is sufficient to invalidate them as a proof of principle. The manuscript goes further and explains how the problem "is not specific to the details of the models of Siljestam and Rueffler, but is inherent in the phenomenon invoked to allow high diversity" because "any change that affects condition by as much as the difference between MHC heterozygotes and homozygotes will eliminate high equilibrium diversity". This general principle addresses all of the reviewer's points.

      In reality, a new pathogen cannot reduce the "survival" by such a factor as it would wipe out any resident population. So to compensate for such an artifact, the additional factor cmax was introduced to buffer such an excess. There is no reason to fix cmax once for an arbitrary number of pathogens, because varying cmax basically reflects the observation that a well-adapted individual must have a reasonable survival probability.

      This is not a legitimate reason for making compensatory, diversity-promoting adjustments to cmax when evaluating sensitivity to other parameters. If the number of pathogens or their virulence changes, cmax obviously does not automatically change along with it. If the population or species consequently goes extinct, then it goes extinct. If it persists, it does so with the same value of cmax.

      The possibility of extinction arguably puts a minimum value on cmax, but it does not restrict it to a range of values that conveniently leads to high MHC diversity. In the examples that I analyzed, slightly decreasing the number of pathogens or their virulence, which increases survivability, eliminates diversity. This phenomenon obviously cannot be dismissed on the grounds that survivability would be too low for the species to exist.

      S&R in effect assume that the condition of the most fit homozygote remains fixed, regardless of the number of pathogens, their virulence, and myriad other differences between species. It is this assumption that is without justification.

      At the same time, there are many ways in which the numerical simulation may break down when the survival rates become of the order of 10^(-43) instead of one

      I am not sure what is meant by “the numerical simulation may break down”. Numerical error is not a tenable explanation of the lack of diversity observed in that simulation. The outcome is exactly what is expected from purely theoretical considerations: conditions of all genotypes fall on the steep part of the curve, making the mechanism proposed by S&R largely inoperative, so a pair of alleles forming a fit heterozygote comes to predominate. The numerical simulation is actually superfluous.

      Low survival rates are completely irrelevant to the effect of decreasing the number of pathogens or their virulence, which does not lower survival rates, but does eliminate diversity.

      so it comes as no surprise that the diversification, predicted by the adaptive dynamics, does not readily occur in the scenario with an addition or removal of the 8th pathogen with a very high virulence \nu=20.

      Whether or not it is surprising, the lack of diversity is a problem for the claims of S&R, as there is no reason to expect the number of pathogens to have just the right value to produce high diversity. Furthermore, for many combinations of values of the other parameters (e.g., my v=19.5 and 20.5 examples), no number of pathogens leads to high diversity.

      Again, the general principle mentioned above makes the details that the reviewer refers to irrelevant. Nonetheless, some additional remarks are in order:

      (1) This comment ignores the fact that removal of a pathogen, or a slight decrease in “virulence”, eliminates diversity without lowering survival rates.

      (2) Small increases or decreases in v (virulence) eliminate diversity without having such large effects on condition.

      (3) In the example emphasized by the reviewer, mean survival rates are nowhere near as low as 10<sup>-43</sup>. Only homozygotes have such low fitness.

      (4) The adaptive dynamics predict the low diversity seen in the simulations, contrary to what the reviewer seems to suggest. Elimination of diversity is not an artifact of the simulation.

      (5) v=20 was chosen because it is most favorable to the model of S&R in that it yields the highest diversity. Indeed, S&R only observed realistically high diversity with the narrow gaussians that the reviewer objects to. With lower values of v, diversity is much lower, but even this meager diversity is eliminated by small changes in parameter values (see below). If narrow gaussians and large effects of pathogens somehow invalidate results, then they invalidate the high-diversity results of S&R.

      I have doubts that the reported breakdown of the [SR] model with fixed cmax remains observable with less extreme values of m and \nu (say, for \nu=7 and m=3 plus or minus 1 used in Fig. 3 in the manuscript).

      These doubts are unwarranted. With the suggested parameter values, for example, increasing or decreasing m by 1 reduces the effective number of alleles to around 1 or 2. This can easily be checked using the simulation code of S&R, as detailed in my initial response and now in a Supplementary Text. Even without this result, the general principle mentioned above tells us that considering other regions of parameter space cannot rescue the conclusions of S&R.

      So I still find that the claim that "the phenomenon that leads to high diversity in the simulations of Siljestam and Rueffler depends on finely tuned parameter values" is not well substantiated.

      What is unsubstantiated is the claim of S&R that “For a large part of the parameter space, more than 100 and up to over 200 alleles can emerge and coexist”. As my manuscript illustrates, this is an illusion created by the adjustment of one parameter to compensate for changes in others.

      The reviewer even acknowledges that “the choice of constants and functions...works in a limited range of parameter values”. Furthermore, the manuscript explains why this problem is inherent to the general phenomenon, not specific to the details of the model or parameter values.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      It appears obvious that with little or no fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc., could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth. Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number. It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [SR]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler. I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of c<sub>max</sub> conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct. The factor of 2.7∙10<sup>43</sup> was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values. A simple way to determine this number is to have the simulation code print the value to which c<sub>max</sub> is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values. I have in this way confirmed this factor using the simulation code written and used by Siljestam and Rueffler. A procedure for doing so is described in the new Supplementary Text S1. In addition, I now give a theoretical derivation of this factor in Supplementary Text S2.

      This begs the conclusion that the branching remains robust to changes in cmax that span 4 decades as well.

      That shows at most that the results are not extremely sensitive to c<sub>max</sub> or K. They are, nonetheless, exquisitely sensitive to m and v. This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in c<sub>max</sub>. It is evident from Fig. 4 of Siljestam and Rueffler that the level of diversity is not robust to these very large changes in c<sub>max</sub>, which include, as noted above, a change of over 43 orders of magnitude.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7∙10<sup>43</sup>) pertains to the Gaussian model with v=20. As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v. This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions. Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.
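
      As a rough plausibility check on the size of this factor, the sketch below assumes a simplified form of the symmetric Gaussian model: each pathogen multiplies condition by exp(-v²d²/2), where d is the distance between the allele and the pathogen, and the fittest homozygote sits at the centroid of the simplex. These functional forms and all function names are illustrative assumptions; they are not taken from the code of Siljestam and Rueffler or from Supplementary Text S2. Under these assumptions, holding the fittest homozygote's condition fixed when m changes from 7 to 8 at v=20 requires a compensatory factor of exp(v²/4) = exp(100), which is on the order of 10^43.

```python
import math


def centroid_sq_dist(m: int) -> float:
    """Squared distance from the centroid to each vertex of a regular
    simplex with m vertices and unit edge length."""
    return (m - 1) / (2 * m)


def log_condition_best_homozygote(m: int, v: float) -> float:
    """Log condition of a homozygote at the simplex centroid.

    ASSUMPTION (illustrative only): each of the m pathogens contributes
    a Gaussian factor exp(-v**2 * d**2 / 2), and the factors multiply.
    """
    return -0.5 * v**2 * m * centroid_sq_dist(m)


def compensation_factor(m_from: int, m_to: int, v: float) -> float:
    """Factor by which c_max would have to change to keep the fittest
    homozygote's condition fixed when the pathogen number changes."""
    return math.exp(
        log_condition_best_homozygote(m_from, v)
        - log_condition_best_homozygote(m_to, v)
    )


if __name__ == "__main__":
    factor = compensation_factor(7, 8, 20.0)
    print(f"log factor = {math.log(factor):.3f}, factor = {factor:.3e}")
```

      In this simplified form the log of the factor is v²/4 regardless of m, so the adjustment needed per added pathogen grows rapidly with v; this is only a consistency check under the stated assumptions, not a substitute for the derivation in the Supplementary Text.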

      ...the branching condition appears to be pretty robust with respect to reasonable changes in parameters.

      It is clear from Fig. 4 of Siljestam and Rueffler that the branching condition is far from sufficient for high MHC diversity.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SR].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable. I have addressed the reasons for this suggestion above. Furthermore, I have confirmed the main conclusion—the extreme sensitivity of the results of Siljestam and Rueffler to parameter values--using the code that they used for their simulations, indicating that my conclusions are not consequences of my having done a “poor setup of the model”. I now describe, in Supplementary Text S1, how anybody can verify my conclusions in this way.

      Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem. However, as I understand it, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c<sub>max</sub>. Rather, they describe the adjustment of c<sub>max</sub> (Appendix 7) as a “helpful” standardization that applies “without loss of generality”. Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7∙10<sup>43</sup>). In this sense there is no loss of generality, but the automatic adjustment of c<sub>max</sub> obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I have expanded the end of the Discussion in the hope of clarifying the point expressed by the title.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to suggest to the author that they provide essential details about their simulations that would justify their claims, and to communicate with Mattias Siljestam and Claus Rueffler whether claims of the lack of robustness could be confirmed.

      The models simulated were modified versions of those of Siljestam and Rueffler. Thus, only the modifications were described in my manuscript. I have added a more detailed description of how c<sub>max</sub> was set in the simulations concerned with sensitivity to parameter values. In addition, the new Supplementary Text S1, which describes confirmation of the lack of robustness using the code of Siljestam and Rueffler, should remove any doubt about this conclusion.

      Reviewer #2 (Recommendations for the authors):

      I have no further recommendations. The manuscript is well written and clear.

      Thank you.

      Reviewer #3 (Recommendations for the authors):

      (1) Since this is a full report and not just a letter to the editor, it would benefit from a bit more introduction of what the MHC actually is and what the current understanding of its evolution is. Currently, it assumes a lot of knowledge about these genes that might not be available to every reader of eLife.

      I have added some more information to the opening paragraph. I would also note that this report was submitted as a “Research Advance”, which may only need “minimal introductory material”.

      (2) Some more recent literature on MHC evolution should be added, e.g., the review by Radwan et al. 2020 TiG, a concrete case of MHC heterozygote advantage by Arora et al. 2020 MolBiolEvol, and a simulation of MHC CNV evolution by Bentkowski et al. 2019 PLOSCompBiol.

      I have cited some additional literature.

      (3) Since much of the criticism hinges on the cmax parameter, its biological meaning or role (or the lack thereof) could be discussed more.

      I am not sure what I can add to what is in the first paragraph of the Discussion.

      (4) I find it difficult to grasp how the v parameter, which is intended to define pathogen virulence, if I understand it correctly, can be used to amend the breadth of peptide presentation. Maybe this could be illustrated better.

      I have attempted to make this clearer. The parameter v actually controls the breadth of peptide detection conferred by an allele, which, if not identical to the breadth of presentation, is certainly affected by it. The basis of the “virulence” interpretation seems to be that narrower detection breadth can, according to the model, only decrease peptide detection probability, which increases the damage done by pathogens.

      (5) Please check sentences in lines 279ff on peptide detection and cost of . There seem to be words missing.

      There was an extraneous word, which I have removed. Thank you for pointing this out.

    1. eLife Assessment

      This valuable study provides evidence that locus coeruleus activity is coordinated with heart rate during sleep, confirming previous work in mice and humans, with a possible role for sleep-dependent memory consolidation. The claims are supported by solid evidence, although the underlying mechanisms and the predictive value of the correlative dataset would benefit from additional controls. This work will be of interest to neuroscientists focusing on sleep, memory, and autonomic functions.

    2. Reviewer #1 (Public review):

      Summary:

      This study examined whether infraslow fluctuations in noradrenaline and in heart rate are coupled and how they are affected by sleep transitions. The authors used the fluorescent NA biosensor GRAB-NE2m in the medial prefrontal cortex of mice to record extracellular NA while also recording EEG and EMG during sleep-wake episodes. They also analyzed previously published human data to reproduce relationships they found between sigma power and RR intervals in mice.

      Strengths:

      This is an impressive study with significant strengths, as it involves a rich set of data that includes not only observations of associations between heart rate and noradrenergic dynamics but also optogenetic manipulation of the locus coeruleus. Human data is presented to show parallels in the association between sigma power during sleep and phasic heart-rate bursts.

      Weaknesses:

      (1) Language could be clearer and more precise. As detailed below, in both the introduction and the discussion, the way the hypotheses and study objectives are described could use some revision to be more precise and accurate.

      1A) In the introduction on p. 4: The overarching question is framed as "could the peripheral autonomous systems be a read-out of the central LC-NE system and thus be a biomarker of memory consolidation and LC dysfunction?" This gives the impression that the LC function would be the main influence on peripheral autonomous systems. There are, of course, many influences on peripheral autonomous systems, so it would be advisable for the authors to be more specific here about what signal(s) in particular would be predicted to be sensitive markers of LC function.

      1B) In the discussion on p. 12: "In this study, we leveraged real-time measurements of mPFC NE levels and HR measurements from EMG recordings in mice to investigate the causal link between the two variables with high temporal resolution in freely moving sleeping mice, with similar inspection in humans." To test the causal link between mPFC NA levels and HR measures, the study would manipulate NA levels just in the mPFC and not elsewhere in the brain. However, in this study, the manipulation occurred in the LC, and so there would be broad cortical changes in NA levels. Thus, it could be that LC activity causes HR changes via a non-PFC pathway.

      (2) Comparisons with the control condition need further development.

      2A) While the authors did include a key YFP control condition, in the main text no direct statistical comparison between the closed-loop optogenetic stimulation (ChR2) condition and the YFP control condition was reported. (It was reported in Supplementary Figure 2c-d.) Instead, in the main text, the authors only reported that the effects of stimulation were significant in the closed-loop condition and not in the control. However, that is not the same as demonstrating that the two conditions significantly differed from each other, and it is the direct test that is important for the conclusions, so it seems important to include this result in the main presentation.

      2B) In addition, the authors should address the issue that the pre-stimulation NE was consistently significantly lower in the YFP condition than in the ChR2 condition (see Supplementary Figure 2c), which is a potential confound.

      2C) Direct comparison of the strengths of correlations shown in Figure 2h vs. Supplementary Figure 2f should be included. Currently, we see relatively weak correlations in both ChR2 and YFP conditions, and it is not clear if the relationships differ in the control. It seems they are still present in the control condition but weaker, which would contradict the apparently broad claim on p. 7 that "No such effects were present in the control condition" (it is not entirely clear whether this claim refers to all effects discussed in the figure or just a subset - this language should be clarified).

      2D) Did the YFP controls vs. ChR2 animals show any differences in the number of NA states that triggered stimulation in the closed-loop system? With ChR2 animals, stimulation changes NA, which could change future triggering. In YFP animals, nothing changes NA (other than natural fluctuations), so the dynamics of stimulation timing could diverge between groups in a way that complicates interpretation. Specifically, if ChR2 stimulation raises NA and prevents future threshold crossings, ChR2 animals may end up receiving fewer subsequent stimulations than YFP animals (or a different temporal clustering). If the number or pattern of stimulation differed in two groups, it would be important to have a yoked control where matched animals get the same stimulation pattern but not triggered by their own NA.

      (3) Some more discussion/explanation of the rationale for the closed-loop approach and how it influences how we should interpret the results could be useful. For instance, currently, it is not clear whether LC stimulation needs to be timed after an NA dip to yield the effects seen.

      (4) The section on heart rate decelerations is hard to follow. In particular, I was not sure how to interpret Figure 3f-j. For Figure 3f, what does the middle line represent? The laser onset or the max RR value after laser onset? What is the baseline that is used to correct the values to obtain amplitudes? If it is the whole period before the maximal RR value or the laser onset, wouldn't baseline values differ significantly across conditions and so potentially account for differences seen between conditions in the reported HR decelerations? Larger HR decelerations may be seen in conditions with higher HR simply as a regression to the mean phenomenon.

      (5) The findings regarding LC suppression could be further clarified.

      5A) Page 8: "observed a response in NE decline" - please be more precise. Did NE decline more or less?

      5B) It would be helpful to also show the correlation between NE and RR in the control (YFP) condition and whether there were any differences between YFP and Arch conditions (Figure 4e).

      5C) This sentence took me multiple readings to understand - it would be helpful to rewrite to make it clearer: "indicating that, while HR generally did not respond strongly to LC suppression, the variability in RR responses was dependent on NE changes to the suppression (Figure 4e)."

      5D) The two colors in Figure 4 are similar and hard to distinguish.

      5E) The correlations shown in Figure 4j seem to be driven by just two of the cases. Are the effects significant when outliers are removed?

      5F) Page 10: Were there any differences in memory performance between the Arch and YFP conditions?

      5G) Page 10: "We found a correlation between RR responses to LC suppression and sigma power, suggesting that a stronger HR reduction response is linked to higher spindle power." It should be noted in the text that the correlation was not specific to sigma (it was also seen for theta and beta, Figure 4i).

      (6) It is not clear which of the sigma power and RR interval findings do/do not exactly line up between the mice and humans. It could be helpful to have a table comparing them. For instance, was the finding in humans that pre-HRB sigma power was positively associated with slowing in heart rate after the HRB also seen in mice? Was there evidence in mice (as seen in the human sample) that sleep-dependent memory improvement was associated with pre-HRB sigma power?

      (7) Page 18: It is not clear if the sex of mice was balanced across controls and optogenetics groups.

    3. Reviewer #2 (Public review):

      Summary:

      The major part of this study reproduces previously published findings in both mice and humans and provides incremental analyses on these findings. In essence, the work reaffirms the presence of coordinated infraslow fluctuations in sigma power and heart rate during NREM sleep. It further confirms previous findings that coordination depends on noradrenaline-releasing neurons in the locus coeruleus. Also supporting previously published work in mice and humans, the authors describe a link between the strength of these infraslow fluctuations and memory consolidation in mice and humans.

      Strengths:

      The authors successfully replicate key previously reported phenomena across both mice and humans. Confirmatory studies and demonstrations of reproducibility are essential for progress in neuroscience. To maximize their value, such studies should clearly acknowledge their confirmatory nature and carefully situate what, in their view, are novel results, going beyond existing literature.

      Weaknesses:

      The authors' interpretation of their data needs to be revised. Many of their claims regarding the mechanistic basis of their findings and the predictive value of their correlative datasets are not supported by the available evidence.

      In the present manuscript, several citations of literature on the work they reproduce lack precision or completeness, which reduces transparency and obscures how the reported findings relate to previously established results.

    4. Author response:

      Response to reviewer 1:

      We thank reviewer 1 for their thoughtful, detailed, and constructive evaluation of our manuscript. We appreciate their recognition of the strengths of the study, particularly the integration of noradrenergic recordings, optogenetic manipulation, and cross-species analyses. We are especially grateful for the reviewer’s careful attention to clarity, experimental interpretation, and control comparisons. The comments have helped us sharpen the framing of our hypotheses, clarify causal claims, improve statistical reporting, and better explain our closed-loop approach and heart rate analyses. We have addressed each point in detail below and believe that the revisions substantially strengthen the manuscript.

      Response to reviewer 2:

      We thank reviewer 2 for their thoughtful comment regarding citation, positioning relative to prior work, and caution in mechanistic interpretation. We have made efforts to cite relevant foundational and related work throughout the manuscript, but we will of course further clarify the relationship between our findings and prior studies in the revision.

      Although prior work has demonstrated infraslow coupling between sigma activity and heart rate and established a role for the locus coeruleus (LC) in coordinating these oscillations, cardiac measures have typically been presented as secondary observations rather than as primary experimental targets. While we of course recognize all the prior efforts conducted, a central goal of the present study was to perform a targeted and highly systematic characterization of norepinephrine-mediated heart-rate dynamics during sleep, integrating infraslow relationships, sleep-wake transitions, and a range of physiological manipulations of LC activity. A major priority of ours was to link infraslow heart-rate fluctuations to the well-known very-low-frequency (VLF) component of heart rate variability (HRV). Within the clinical HRV field, VLF has remained comparatively under-characterized and mechanistically unresolved. Our findings provide a biologically grounded explanation for this component, which we believe may be informative for the broader HRV community.

      Second, a core aim of this work is to provide a translational tool: to determine whether cardiac dynamics alone can reflect the infraslow, memory-consolidating potential of sleep and thus serve as a non-invasive biomarker. Because direct LC recordings are not feasible in humans, HRV, including its VLF component, may offer a clinically accessible proxy of sleep’s memory-restorative capacity. By directly manipulating LC activity and demonstrating corresponding changes in heart-rate dynamics, we strengthen the mechanistic and translational rationale of this biomarker approach. Our findings suggest that heart-rate measures alone may provide an estimate of the infraslow memory-consolidating potential of sleep.

      In revision, we will ensure that the foundational findings underlying this manuscript are highlighted, while communicating our new findings more clearly.

    1. eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in causing the tumor microenvironment in lung cancer. Further experimental explorations of the immune-related changes may still be required. The solid findings of this study provide a foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

    2. Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in causing the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies to BRCA-driven tumors. For example, they found histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      (2) For some methods, more detailed information is needed.

      (3) There are grammar issues in the text that need to be fixed.

      (4) Some text in the figures is not labeled well.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

    4. Author response:

      eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in causing the tumor microenvironment in lung cancer. Further experimental explorations of the immune-related changes may still be required. The solid findings of this study provide a foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

      We would like to express our sincere gratitude for your thoughtful and constructive comments on our manuscript. We will carefully consider each comment from these two reviewers and will revise the manuscript accordingly. Below, we provide a point-by-point response to each comment.

      Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in causing the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies to BRCA-driven tumors. For example, they found histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      Thank you for your insightful comment. We fully agree that the specific situation of epigenetic dysregulation in LUAD needs to be explored. We believe that future investigations utilizing clinical specimens and animal models to map histone acetylation patterns and DNA methylation profiles will be crucial for identifying novel biomarkers and therapeutic targets unique to LUAD.

      (2) For some methods, more detailed information is needed.

      This is a valid point. We agree that additional details are necessary for clarity and reproducibility. We will expand these method details in the revised manuscript.

      (3) There are grammar issues in the text that need to be fixed.

      We apologize for our irregular use of grammar. In the revised manuscript, we will carefully check the grammar and make corrections.

      (4) Some text in the figures is not labeled well.

      We appreciate the reviewers' comments. We will add labels to the revised version of the figures.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

      Thank you for this excellent suggestion. In the revised manuscript, we will add immunological experiments or validation using in vivo models. In addition, we will elaborate on the limitations of our study in the Discussion section and provide reasonable explanations.

    1. eLife Assessment

      The findings of this study are important since they cover the repurposing of small molecules as metalloprotease and phospholipase inhibitors for early intervention in the treatment of bothropic envenoming in the Neotropics, and thus provide a strong rationale for the progression of these inhibitors into future preclinical and clinical evaluation for snakebite indications across various ecological zones. The strength of the evidence is solid; however, there are some weaknesses, such as a lack of translatability of the in vivo model and insufficient venom characterisation. Thus, the strength of the evidence can be enhanced by the use of a mouse model. The paper remains of interest to ophiologists, biochemists and medicinal chemists.

    2. Reviewer #1 (Public review):

      Summary:

      Small molecule therapeutics for snakebite have received a lot of attention for their potential to close the gap between bite and treatment, where antivenom is not immediately available.

      Strengths:

      There has been a lot of focus on Africa, Asia, and India, but very little work related to neotropical regions. The authors seek to begin filling this gap in the preclinical literature. The authors use well-developed methods for preclinical assessment.

      Weaknesses:

      A clearer and more focused discussion of the limitations of the overall present work would be desirable (e.g. protection vs. rescue, why marimastat over prinomastat for in vivo assays when both have been through clinical trials for other indications; real-world feasibility of nafamostat, which has a half-life of 1-2 minutes compared to camostat, which has a half-life of hours). All of this could be improved in a revision.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.

      Strengths:

      The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.

      Weaknesses:

      Several aspects of the study design and framing reduce the confidence with which readers can translate the findings beyond the specific experimental context presented. The evidence base is strongest in controlled in-vitro settings, while the bridge to real-world effectiveness remains limited, particularly for understanding performance under conditions that better reflect delayed treatment and systemic exposure. As a result, the manuscript is best interpreted as a well-organized comparative screening study with promising signals, rather than a definitive demonstration of a broadly effective, deployable intervention.

    4. Reviewer #3 (Public review):

      In this work, the authors wanted to evaluate repurposed small molecule inhibitors for the treatment of envenomation by snakes of the Bothrops genus, one of the most medically relevant in the Americas. I believe the objectives of the research were clearly achieved, and compelling evidence for the ability of these molecules to neutralize enzymatic and toxic activities of metalloproteinases and phospholipases in all the tested venoms is provided. Furthermore, the work highlights the limited efficacy of the tested serine protease inhibitor, suggesting a need for drug discovery campaigns to address toxicity caused by this protein family. The methods are well designed and performed, and the use of both in vitro and in vivo methodologies makes this a thorough and robust work.

      These results are extremely relevant, since they take us one step further to a potential orally administered snakebite treatment. The existence of such a treatment could improve the outcomes for thousands of snakebite victims worldwide. I have a few comments and questions that I hope will be useful to the authors:

      During the introduction, the authors mention that small-molecule inhibitors can neutralize the localized tissue damage via cytotoxicity of some venoms, and cite PLA2s, SVMPs and/or cytotoxic 3FTxs as the main causing agents of this pathology. I am not aware of any direct effect described by small molecule inhibitors on cytotoxic 3FTxs alone. Has this been observed at all? Or is it more likely that the small molecule inhibitors act on the enzymatic toxins only, preventing synergistic effects with 3FTxs?

      I think it would be relevant to address the effects of non-enzymatic PLA2s, such as myotoxin II, which have been described in detail within Bothrops venoms. I believe there is some evidence of Varespladib also having a neutralizing effect on the myotoxicity caused by these non-enzymatic PLA2s. I suggest adding a comment about the contribution of these toxins in the discussion or in the section where PLA2 activity of the venoms is compared. In my opinion, right now it seems like these were overlooked.

      Regarding Marimastat and the other MP inhibitors, are there any studies showing that they don't have an effect on endogenous MPs? I understand they have been approved for human use before, but is there any indication that they would not have an effect at the doses that would be required to treat envenomation?

      Regarding the quenched fluorescence substrate used for enzymatic activity. Is there a possibility that some of the SVMPs would not act on this substrate, and therefore their activity or neutralization is not observed? Would it be relevant to test other substrates, such as gelatin, collagen, or even specific clotting factors?

      Finally, could the authors comment or provide some bibliography regarding the translatability of the chicken embryo model in the context of envenomation?

    5. Author response:

      eLife Assessment

      The findings of this study are important since they cover the repurposing of small molecules as snake venom metalloprotease and phospholipase inhibitors for early intervention in the treatment of bothropic envenoming in the Neotropics, and thus provide a strong rationale for the progression of these inhibitors into future preclinical and clinical evaluation for snakebite indications across various ecological zones. The strength of the evidence is solid; however, there are some weaknesses, such as a lack of translatability of the in vivo model and insufficient venom characterisation. Thus, the strength of the evidence can be enhanced by the use of a mouse model. The paper remains of interest to ophiologists, biochemists and medicinal chemists.

      We thank the editors and reviewers for their assessment of this manuscript, and for the positive words highlighting the value of undertaking evaluation of small molecule drugs for snakebite in the neotropics. We completely agree that the next steps for this work will be to evaluate the preclinical efficacy of the identified drugs in mouse models. The comment around insufficient venom characterisation seems somewhat misplaced – the objective of this project was not to characterise the venoms used, but to evaluate the in vitro inhibition of venom toxin family activities and identify the potential utility of specific repurposed drugs as therapeutics for snakebite in the Neotropics.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Small molecule therapeutics for snakebite have received a lot of attention for their potential to close the gap between bite and treatment, where antivenom is not immediately available.

      Strengths:

      There has been a lot of focus on Africa, Asia, and India, but very little work related to neotropical regions. The authors seek to begin filling this gap in the preclinical literature. The authors use well-developed methods for preclinical assessment.

      Weaknesses:

      A clearer and more focused discussion of the limitations of the overall present work would be desirable (e.g. protection vs. rescue, why marimastat over prinomastat for in vivo assays when both have been through clinical trials for other indications; real-world feasibility of nafamostat, which has a half-life of 1-2 minutes compared to camostat, which has a half-life of hours). All of this could be improved in a revision.

      We thank the reviewer for their shared opinion of the potential value of small molecules as snakebite envenoming therapeutics and their insight on the gap in focus on the Neotropics, which this manuscript aims to address. Our work in this manuscript followed the standard practice of pre-incubating drug and venom for all in vitro studies, and used sequential (i.e. not co-incubation) administration in the egg model. In our revised manuscript, we will make these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models), which, following the in vitro characterisations presented in this paper, are the logical next step for evaluating small molecule drugs for inhibiting neotropical snake venoms.

      Although both marimastat and prinomastat are repurposed drugs that have undergone clinical evaluation for other indications, marimastat has been more extensively characterised preclinically than prinomastat for snakebite, and will soon enter Phase II clinical trial evaluation for this indication (https://www.ddw-online.com/ophirex-to-produce-snake-venom-inhibitor-for-lstm-study-40669-202602/). Marimastat also has a longer half-life in humans of 8-10 hours (Millar et al. 1998), compared to prinomastat (2-5h, Hande et al. 2004). We will more clearly highlight the rationale for selecting marimastat in the revised manuscript.

      Although we appreciate the reviewer’s point regarding the short half-life of nafamostat (which is typically given by continuous iv infusion for this reason), in the manuscript we have already stated (Line 434 to 448) that we do not recommend the progression of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate due to its low efficacy and off-target effects. We highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.

      Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.

      Strengths:

      The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.

      Weaknesses:

      Several aspects of the study design and framing reduce the confidence with which readers can translate the findings beyond the specific experimental context presented. The evidence base is strongest in controlled in-vitro settings, while the bridge to real-world effectiveness remains limited, particularly for understanding performance under conditions that better reflect delayed treatment and systemic exposure. As a result, the manuscript is best interpreted as a well-organized comparative screening study with promising signals, rather than a definitive demonstration of a broadly effective, deployable intervention.

      We appreciate the reviewer’s opinion on the thorough and logical workflow we present in this manuscript and the value this pipeline provides the field for future and parallel work. We agree with the reviewer that this provides a well-organized comparative screening study applicable to different snake species or therapeutics. In relation to the comment that this manuscript is not a ‘definitive demonstration of a broadly effective, deployable intervention’, we agree with their opinion and are happy to clarify that while the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation.

      Reviewer #3 (Public review):

      In this work, the authors wanted to evaluate repurposed small molecule inhibitors for the treatment of envenomation by snakes of the Bothrops genus; one of the most medically relevant in the Americas. I believe the objectives of the research were clearly achieved, and compelling evidence for the ability of these molecules to neutralize enzymatic and toxic activities of metalloproteinases and phospholipases in all the tested venoms is provided. Furthermore, the work highlights the limited efficacy of the tested serine protease inhibitor, suggesting a need for drug discovery campaigns to address toxicity caused by this protein family. The methods are well designed and performed, and the use of both in vitro and in vivo methodologies makes this a thorough and robust work.

      These results are extremely relevant, since they take us one step further to a potential orally administered snakebite treatment. The existence of such a treatment could improve the outcomes for thousands of snakebite victims worldwide. I have a few comments and questions that I hope will be useful to the authors:

      We thank the reviewer for their high regard for the purpose and execution of this work. Their questions will help us improve the manuscript and raise useful discussion points for the field.

      During the introduction, the authors mention that small-molecule inhibitors can neutralize the localized tissue damage via cytotoxicity of some venoms, and cite PLA2s, SVMPs and/or cytotoxic 3FTxs as the main causing agents of this pathology. I am not aware of any direct effect described by small molecule inhibitors on cytotoxic 3FTxs alone. Has this been observed at all? Or is it more likely that the small molecule inhibitors act on the enzymatic toxins only, preventing synergistic effects with 3FTxs?

      We apologise for this error on our behalf. While inhibitory molecules have been described for cytotoxic 3FTxs, these are not small molecules as alluded to in the previous version of the manuscript. We have amended this text in our revision.

      I think it would be relevant to address the effects of non-enzymatic PLA2s, such as myotoxin II, which have been described in detail within Bothrops venoms. I believe there is some evidence of Varespladib also having a neutralizing effect on the myotoxicity caused by these non-enzymatic PLA2s. I suggest adding a comment about the contribution of these toxins in the discussion or in the section where PLA2 activity of the venoms is compared. In my opinion, right now it seems like these were overlooked.

      We thank the reviewer for highlighting this point. We agree that this is highly relevant and would benefit from discussion in the revised manuscript given the nature of our assays and the non-enzymatic mechanism of action of certain Bothrops PLA<sub>2</sub>s.

      Regarding Marimastat and the other MP inhibitors, are there any studies showing that they don't have an effect on endogenous MPs? I understand they have been approved for human use before, but is there any indication that they would not have an effect at the doses that would be required to treat envenomation?

      Most matrix metalloproteinase inhibitors will act on endogenous MPs to at least some extent (with variable potency on different MMPs). Marimastat has demonstrated activity against endogenous metalloproteinases, including MMP1, which was hypothesised to cause severe joint pain when used chronically (i.e. frequent dosing over many weeks) for indications such as cancer, though this effect was reversible within 8 weeks of cessation of drug administration (Wojtowicz-Praga, 1998). Thus, long-term use of matrix metalloproteinase inhibitors can cause safety concerns. However, the anticipated duration of dosing for snakebite, which is an acute life-threatening condition, is a few days. It is therefore unlikely that prior safety concerns observed following chronic dosing in cancer studies would apply to its potential use as a snakebite field therapy.

      Regarding the quenched fluorescence substrate used for enzymatic activity. Is there a possibility that some of the SVMPs would not act on this substrate, and therefore their activity or neutralization is not observed? Would it be relevant to test other substrates, such as gelatin, collagen, or even specific clotting factors?

      It has been observed that certain SVMPs (specifically several PI SVMPs) are not active against this ES010 substrate in vitro. The substrate used in the in vitro SVMP assay is reported by the manufacturer as a substrate for a wide range of MMPs which target the extracellular matrix components mentioned by the reviewer, i.e. collagenases and gelatinases as well as matrilysins, stromelysins and elastase. This in vitro assay and the coagulation assays are complementary, together covering the main targets of SVMPs (ECM and clotting cascade) prior to haemorrhagic assessment in the egg model. Thus, we are confident that activity for the broad range of SVMP isoforms will be captured through the screening pipeline we have developed.

      Finally, could the authors comment or provide some bibliography regarding the translatability of the chicken embryo model in the context of envenomation?

      Our current model is based on an earlier egg embryo model (Sells et al. 1997, Sells et al. 1998 and Sells et al. 2000) which described good correlations (p<0.01) with the standard WHO murine preclinical envenoming model. These studies have assessed correlations for minimal haemorrhagic doses (MHDs), LD50s and ED50s in both models for a selection of viper venoms. As chicken embryos at day 6 of development have incomplete neural arcs, the model is not well suited for assessing neurotoxic effects, but can be effectively used for addressing venom-induced haemorrhage and lethality and for testing therapeutics. In addition, a more recent study (Yusuf et al. 2023) reported almost identical LD50s for the venom of Bitis arietans between the two in vivo approaches. The model is also being pursued as a preclinical testing model by an antivenom manufacturer with the focus of reducing the use of rodents in batch release testing (Verity et al. 2021). We will provide further clarification on the rationale for using the egg model, including the supportive references outlined above, in the revised manuscript.

    1. eLife Assessment

      This useful study supplements previous publications of willed attention by addressing a frontoparietal network that supports internal goal generation. The evidence is solid in analyzing two datasets collected at different independent sites, using the same willed-attention paradigm and combining fMRI and EEG. This work will interest cognitive psychologists and neuroscience researchers.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses a fundamental question in cognitive neuroscience regarding how the brain transitions from a reactive state of following external instructions to a proactive state of self-directed agency. The authors utilize an ambitious multimodal design by combining the spatial precision of fMRI with the temporal resolution of EEG across two independent datasets from the University of Florida and UC Davis. By applying multivariate pattern analysis, the work demonstrates that while both instructed and willed attention engage the Dorsal Attention Network, willed choices uniquely recruit a frontoparietal decision network including the dACC and anterior insula. Furthermore, the study shows that pre-cue alpha oscillations can predict subsequent spontaneous choices. This provides a neural link between pre-existing brain states and intentional action, representing a significant technical effort to characterize the neural scaffolding of internal goal generation.

      Strengths:

      The primary strengths of this work include the integration of fMRI and EEG, which allows the authors to bridge the gap between slow metabolic signals and fast oscillatory brain states. The use of two independent cohorts is a commendable effort to ensure the reproducibility of the willed attention effect, which is often a concern in small-sample neuroimaging studies. Additionally, the move beyond univariate activation toward information-based mapping demonstrates that the identified networks actually contain specific information about the direction of attention.

      Weaknesses:

      However, several critical weaknesses must be addressed to support the fundamental claims made in the manuscript. There are significant behavioral differences in performance between the two sites, specifically regarding the UC Davis cohort exhibiting slower reaction times and lower accuracy compared to the UF group. These discrepancies suggest potential differences in subject populations or experimental environments that are not currently accounted for in the neural models. The fMRI analysis lacks temporal precision because the use of beta series regression collapses the complex BOLD response into a single estimate per trial. This loss of temporal information obscures the evolution of the decision process and makes it difficult to distinguish whether the identified patterns represent a truly spontaneous choice or a slow-building, pre-planned strategy.

      Furthermore, the EEG decoding approach utilized the entire topography of electrodes rather than a biologically motivated posterior region of interest. Given that alpha-mediated spatial attention is traditionally localized to parieto-occipital sensors, using the full electrode set risks the inclusion of non-neural artifacts, such as microsaccades or muscle activity, which can contaminate multivariate classifiers. The introduction of the neural efficiency metric also requires further validation, as the current ratio is mathematically sensitive to small denominators in the BOLD contrast.

      Crucially, the manuscript does not address the physiological implications of recruiting additional frontoparietal networks when behavioral performance remains identical across conditions. The activation of the anterior insula and dACC is frequently associated with increased autonomic arousal and effort. If the willed condition requires more extensive neural scaffolding to reach the same behavioral output as the instructed condition, it raises the question of whether this internal decision process is accompanied by changes in arousal levels. The authors should consider whether the lack of a behavioral tax is due to a compensatory increase in arousal, which could be reflected in the EEG data or pupil diameter if recorded, and potentially also in the amplitude of BOLD activity, which is being masked by the neural efficiency metric. Without an account of how the brain balances this increased computational demand without impacting behavioral performance, the functional significance of the willed attention network remains partially obscured.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript combines fMRI and EEG investigations performed at two research sites to examine 'willed' or volitional visuospatial attention, as contrasted with more standard cued (or 'instructed') visuospatial attention. The primary findings are: 1) willed attention (vs. instructed attention) drives additional cortical circuitry across a broad fronto-parietal network; 2) the direction of willed attention, but not instructed attention, can be decoded from the pre-cue EEG data and from MVPA analysis of the trial-level fMRI data; and 3) the subjects with high EEG decoding also exhibited high neural efficiency (i.e., high decoding with low BOLD signal change) in the fMRI data. The methods and data analysis are generally sound, and these results appear solid. On the negative side, it is not made clear how the present findings extend our understanding beyond prior published work from one of the senior authors. There are also three significant concerns regarding interpretation of the findings. One has to do with the causal interpretation of the pre-cue alpha EEG signal determining the direction of willed attention. The second concern is the degree to which the present research paradigm adequately examines 'willed attention.' The third is that the MVPA analysis is not sufficiently described, and permutation testing needs to be done to validate these findings. Otherwise, this manuscript appears methodologically sound, but questions about interpretation may mute the potential impact.

      Strengths:

      The focus on willed attention attempts to move beyond some of the many limitations of standard laboratory investigations of attention.

      The shared paradigm across two modalities and two research sites demonstrates solid reproducibility, even though a few minor differences are observed across sites.

      Weaknesses:

      (1) There are concerns about this experimental paradigm carrying the banner of Willed Attention, because the application of 'Will' appears quite modest. Yes, extra brain activity is exhibited for this condition vs. its control, but do the cognitive processes isolated adequately stand in for 'Willed Attention'? Willed attention, as operationally defined here, appears to involve a simple decision process prior to the shifting of spatial attention. The cue is internally generated, but after that the rest of the attentional processes appear identical to standard externally cued visuospatial attention experiments. This self-generated cue process likely involves some sort of memory/history of the recently selected cues and then some random-ish selection between A and B. This appears very similar to asking the subject to guess whether a fair coin flip will be heads or tails on each trial. A mental 'coin flip' feels like a very weak version of 'will.' As a potential remedy, it would be helpful to discuss what other phenomena might fall within 'willed attention' and what some future studies might choose to focus on, along with some potential pitfalls (e.g., the reasons why the current study avoided more robust exemplars of will).

      (2) The manuscript is lacking a description of the decision processes used during the willed attention paradigm and is lacking evidence as to WHEN subjects made their willed decision. Both of these points are of major concern:

      (a) The authors state: "For willed attention, participants were explicitly told to avoid relying on any stereotypical strategies of generating decisions, such as always attending the same/opposite side they attended during the previous trial, as well as to avoid randomizing or equalizing their decisions to choose left or right across trials; prior studies found that decisions to explicitly randomize decisions might invoke additional working memory related processes (Spence & Frith, 1999)." Subjects were instructed NOT to apply a simple heuristic and NOT to randomize or try to equalize their decisions, but exactly HOW the subjects made their decisions is not at all clear. What options does that leave? How does this strategy avoid the working memory-related processes mentioned in the Spence & Frith, 1999 citation? The brain regions that comprise the network of interest (aka Frontoparietal Decision Network) are activated by a very broad range of visual cognitive tasks, including many working memory paradigms. The Anterior Insula and dACC nodes of the Salience Network often simply reflect task difficulty. Obviously, making a choice is more cognitively demanding than not making a choice. The present experiments do not distinguish functional roles between different regions of the Frontoparietal Decision Network. On the whole, the study does very little to isolate the cognitive processes or neural bases of willed attention beyond calling out the set of 'Usual Suspects' for visual cognition.

      (b) The finding that pre-cue EEG signals predicted the post-cue decision is intriguing. It could mean that the seemingly irrelevant and transient state of the brain causally and unconsciously biased the subject to one direction or the other. Alternatively, it could mean that the subjects utilized the pre-cue period to make their decision and hold it in case it was needed (i.e., that it was a choice trial). While 2-8 seconds of ITI variability makes sense for fMRI decoding, it is a long time for a subject to idly wait, so they might fill that time preparing for the next trial. There appears to have been a substantial amount of individual difference in the pre-cue alpha decoding, which could reflect individual differences in cognitive strategy, specifically in the use of the pre-cue period to make their decision. More efficient decision makers might have pre-decided, which might account for the neural efficiency. The experiments lack any measurement of WHEN participants made their decision. For that reason, I would ask that the authors temper their claims about the significance of the alpha decoding and its possible causality.

      (3) Did individual subjects exhibit a choice bias of location for the willed trials? If not, doesn't that raise concerns that subjects were trying to equalize their trials? If they do exhibit location biases, how does that impact the decoding? A simple decoder could learn to always just guess the biased direction for a subject and would perform > 50%. Consider the example in which an individual subject chooses 'Left' 55% of the time. A classifier that simply learns to choose 'Left' on every trial will be correct on 55% of trials. The training data would likely be sufficient to learn the direction of choice bias in each individual subject. So the classifiers could perform significantly above 50% without learning anything beyond the tendency of each subject. That is to say, 50% is not truly chance in this data set. It doesn't appear that permutation testing has been performed to empirically determine chance for an individual's data. Permutation methods, scrambling the labels 1000 or 10000 times to establish a true baseline, would be preferred over simply comparing to 50% and would address concerns about individual subject biases.

      (4) The novel contributions of this work beyond the two prior Bengson et al papers from Dr. Mangun's lab appear quite modest. The discussion would be enhanced by specifically stating how the present work advances understanding beyond the prior Bengson studies.
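
      The choice-bias concern in point (3) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 55/45 split, and the permutation count are assumptions chosen to mirror the reviewer's example. As a simplification, it shuffles labels against fixed predictions, whereas a full permutation test would retrain the classifier on each shuffled label set.

```python
import random


def majority_baseline(labels):
    """Accuracy of a 'classifier' that always predicts the most frequent label."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / len(labels)


def permutation_p_value(labels, predictions, n_perm=1000, seed=0):
    """Return (observed accuracy, permutation p-value).

    Simplified scheme (an assumption for illustration): the predictions stay
    fixed and only the labels are shuffled to build the null distribution.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = sum(p == y for p, y in zip(predictions, labels)) / n
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        acc = sum(p == y for p, y in zip(predictions, shuffled)) / n
        if acc >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (at_least_as_good + 1) / (n_perm + 1)


# Hypothetical subject who chooses 'L' on 55 of 100 willed trials.
labels = ["L"] * 55 + ["R"] * 45
always_left = ["L"] * len(labels)

accuracy, p_value = permutation_p_value(labels, always_left)
print(majority_baseline(labels), accuracy, p_value)
```

      In this toy case, the always-'Left' decoder scores 55%, yet every shuffled labelling yields the same accuracy, so the permutation p-value is 1. A comparison against a fixed 50% threshold would not reveal that.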

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript analyzes two independent datasets collected at different sites. Using the same willed-attention paradigm (instructional vs. choice cues) and combining fMRI and EEG analyses, the authors investigate how attentional direction is selected when no external instruction is provided. Their main claims are that the dorsal attention network is engaged by both cue types, whereas the choice cue additionally involves a frontoparietal decision network. Moreover, left-versus-right attentional decisions can be decoded in this decision network only on choice trials, and multichannel pre-stimulus alpha patterns predict the subsequent attentional choice. Finally, individuals with more predictive alpha patterns show greater neural efficiency in the decision network, i.e., higher decoding with lower BOLD activation.

      The question is worthwhile and the two-site design is a genuine strength. At the same time, several central inferences rely on decoding analyses for which the statistical testing and cross-validation structure are not described in enough detail to assess robustness. In addition, using a ratio-based neural-efficiency measure makes the interpretation more fragile than it needs to be. With a focused revision that tightens inference around MVPA and clarifies a few methodological points, I think the paper could become substantially more convincing.

      Strengths:

      The work extends previous willed attention studies by attempting to link pre-stimulus alpha pattern predictability to post-cue frontoparietal representations, and by testing reproducibility across two datasets. The conceptual advance beyond previous studies, e.g., Bengson et al. (2015), however, depends on how solid the decoding-based evidence is and whether alternative explanations are convincingly excluded. At present, the strength of support is limited mainly by incomplete reporting and/or controls for MVPA significance testing, as well as potential inflation of decoding estimates if folds are not independent of run structure. Concerns about statistical assessment of decoding accuracy are well documented in the literature (Combrisson & Jerbi, 2015).

      Weaknesses:

      (1) The manuscript describes the decoding pipeline for both fMRI and EEG MVPA. However, it does not clearly specify how "significantly above chance" is determined for the fMRI ROI decoding, nor how multiple comparisons across ROIs are handled, even though p-values are reported. The same issue applies to the time-resolved EEG analysis across many time points. For each decoding analysis, please specify the inferential test (e.g., permutation test within participant, group-level test on subject accuracies, binomial test, etc.) and report effect sizes with confidence intervals (e.g., Combrisson & Jerbi, 2015). Further, for EEG decoding over time, it would be preferable to control family-wise error, e.g., cluster-based permutation, rather than thresholding pointwise p-values. A standard approach here is the nonparametric cluster framework (e.g., Maris & Oostenveld, 2007).

      (2) The cross-validation approach used here is appreciated and appropriate in principle. However, random 10-fold splits across trials can inflate accuracy if training and test folds share run-specific noise, scanner drift, or autocorrelated structure. The manuscript should indicate whether folds were blocked by run or randomized across the entire session. In addition, please report the number of trials per condition after artifact rejection and after removing short ITIs for the long prestimulus epochs (−2500 ms to 0 ms) for each dataset in the section of EEG preprocessing. Similarly, please report how often participants chose left vs. right on choice trials, and whether balanced folds (or an equivalent balancing procedure) were used if needed.

      (3) Moreover, ROI definition is not sufficiently specified and independence should be clarified. The ROIs are defined based on peaks from the choice-instructed univariate contrast (Table 2) and then used for MVPA. First, are these ROIs defined as spheres around peaks or using anatomical masks? What radius or voxel count was used? This needs to be explicit. Second, I am concerned about circularity risk. Although choice-vs-instructed selection is not identical to left-vs-right decoding, ROI selection from the same dataset can still bias descriptive estimates and encourages overinterpretation if not carefully justified (Kriegeskorte et al., 2009). At minimum, the authors should explain why their selection criterion is independent of the decoded contrast under the null, and ideally provide a robustness check using either anatomical ROIs or independently defined ROIs, e.g., from prior literature or an atlas.

      (4) Using an index of neural efficiency is conceptually interesting. However, if the denominator, computed as the activation difference between choice and instructional conditions, is near zero or noisy, the ratio can become unstable. I would rather see a multivariate model that treats activation and decoding as separate dependent measures, or a latent-variable approach, than a single ratio.
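
      Two of the points above, run-blocked folds in (2) and ratio instability in (4), can be sketched briefly. This is an illustration under stated assumptions, not the manuscript's method: `leave_one_run_out` and `efficiency_ratio` are hypothetical helpers, and the ratio is assumed to be decoding accuracy divided by the choice-minus-instructed activation difference, since the exact formula is not given here.

```python
def leave_one_run_out(run_ids):
    """Yield (train_indices, test_indices), holding out one run at a time.

    Blocking folds by run keeps trials that share run-specific noise or
    scanner drift from appearing in both the training and the test set.
    """
    for held_out in sorted(set(run_ids)):
        test = [i for i, r in enumerate(run_ids) if r == held_out]
        train = [i for i, r in enumerate(run_ids) if r != held_out]
        yield train, test


def efficiency_ratio(decoding_accuracy, activation_difference):
    """Assumed form of a ratio-based efficiency index (hypothetical)."""
    return decoding_accuracy / activation_difference


# Six trials from three runs: each fold tests on exactly one run.
run_ids = [1, 1, 2, 2, 3, 3]
for train, test in leave_one_run_out(run_ids):
    print("train:", train, "test:", test)

# Same decoding accuracy, slightly different denominators.
for diff in (1.00, 0.98, 0.01, -0.01):
    print(diff, efficiency_ratio(0.60, diff))
```

      With a denominator near 1, a change of 0.02 barely moves the ratio. Near zero, the same change flips its sign and inflates its magnitude, which is the instability that motivates modelling activation and decoding as separate measures.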

    1. eLife Assessment

      This study presents a valuable examination of two measurements of physical activity (self-report and objective) in relation to widely studied structural MRI measures of the brain (hippocampal volume and BrainAGE) and cognitive function (Trail Making Test). Cross-sectional and longitudinal data were analyzed using established and validated methodology. The results convincingly suggest that brain health is more likely a cause of physical activity than an outcome of it, although limitations to the data could mask evidence of benefits to brain health. This work will be of interest to neurologists and epidemiologists studying the etiology of cognitive decline, to clinicians interested in advising patients on strategies for preserving brain health in aging, and to members of the lay public.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the relationship between physical activity (PA) and both structural (MRI) and cognitive brain health in the LIFE-Adult Study, with total baseline recruitment of 2576. Hippocampal volume, an MRI-derived BrainAGE marker, and scores from the Trail Making Test were used as outcomes, with the majority of participants measured at baseline and subsets also measured in a follow-up session. The key findings were a lack of direct association between PA and outcomes, but longitudinal evidence for a higher BrainAGE at baseline leading to lower physical capacity at follow-up. This supports a reverse-causation hypothesis in contrast to the prevailing understanding of the positive effects of physical activity on brain health.

      Strengths:

      The LIFE-Adult Study is a rich and carefully acquired dataset, with multiple follow-up time points. The statistical analyses were conducted carefully with appropriate control for confounds and multiple testing. The study design enables an important assessment for reverse causality. The authors are scrupulous in their consideration of a number of factors that could potentially bias their results, performing an age-stratified analysis, and emphasising discrepancies in PA measurements (specifically, age-reporting bias) across the dataset and other limitations.

      Weaknesses:

      This is an observational study with inconsistent measures of physical activity. Previous studies have used physical activity interventions, and might be more strongly weighted when considering evidence for these effects (specific confounders involved in interventions notwithstanding).

      The model identifying potential reverse causality is relatively limited: it seems possible, even likely, that BrainAGE reflects more general health status, which would expand the potential range of factors underlying this observation.

      The important quantitative actigraphy subset is small (n=227), as are the longitudinal subsets. Along with the discrepancy of physical activity/capacity at baseline and follow-up, and other complexities of the dataset, it is difficult to make firm conclusions. The authors point out that the actigraphy subset was quite inactive.

    3. Reviewer #2 (Public review):

      Summary:

      This population-based cohort study found no evidence that physical activity, whether self-reported or objectively measured, positively influenced brain structure (hippocampal volume or BrainAGE) or cognitive function (Trail Making Test scores). Notably, longitudinal analyses suggested the opposite temporal relationship: a higher BrainAGE at baseline predicted lower physical capacity at follow-up, which is more in line with reverse causation than with a neuroprotective effect of physical activity.

      Strengths:

      The study's statistical approach is thorough and well-documented, and the inclusion of two measurements of physical activity (self-report questionnaire and objective accelerometer data) is a strength. The longitudinal aspect also represents a strength.

      Weaknesses:

      Several aspects of the measurement timing warrant consideration. Physical activity was assessed over 7-day periods, creating a potential mismatch with (commonly less dynamic) brain outcomes examined (hippocampal volume, BrainAGE), which may reflect cumulative exposures over longer timescales. Additionally, the asynchronous measurement protocol (cognitive testing preceding accelerometry, and the MRI occurring weeks after baseline visits) may introduce time lags that attenuate associations. The observed null associations may be influenced by timing misalignment rather than reflecting the absence of consistent effects of physical activity on brain health and cognition.

      Other measurement characteristics also warrant consideration when interpreting the null findings. Physical activity was assessed using short-form self-report questionnaires and averaged accelerometer MET/day values, both of which have limited reliability. Additionally, the modest accelerometer subsample size and low/insufficient variation in activity levels observed in this cohort increase the likelihood of missing effects. These factors collectively raise the possibility that true physical activity-brain health associations may have been obscured.

      The study's conclusions regarding brain health, structure, and cognitive functioning are broad given the limited selection of outcomes examined. The analyses focus on hippocampal volume, BrainAGE (a global aging metric), and Trail Making Test performance (processing speed and executive function), while omitting other important neuroimaging markers such as cortical thickness, functional connectivity, or white matter microstructure. The null findings presented here cannot exclude positive effects of physical activity on broader constructs of brain health or cognitive functioning.

      While the authors appropriately note the use of different physical activity instruments across time points (IPAQ at baseline, VSAQ at follow-up) in the limitations section, the discussion should more explicitly address the interpretive challenges this creates. The observed association between higher baseline brain age gap and lower follow-up physical activity may reflect: (1) a true temporal relationship, (2) an artifact of switching from behavior-focused (IPAQ) to capacity-focused (VSAQ) measurement, or (3) some combination of both. This ambiguity substantially limits causal inference.

    1. eLife Assessment

      This study provides valuable contributions to establish canonical Dhh signaling as a primary mediator in the differentiation of Leydig cells and their steroidogenic capacity. Together, the experimental design using their established stem Leydig cell line alongside relevant genetically mutated models, both derived using the relevant Nile tilapia animal system, provided largely convincing evidence to support their conclusions. The work could benefit from a more rigorous dissection of current literature on this pathway that might better inform their conclusions. The work will be of broad interest to developmental biologists interested in differentiation of steroidogenic or hormone producing cells.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-month-old tilapia. The approach is innovative and provides a means to investigate DHH and each downstream component from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1 to promote an increase in Sf1 expression leading to the onset of 11-ketotestosterone synthesis heralding the differentiation of Leydig cells in the developing male tilapia.

      Strengths of the methods and results:

      - The use of Nile tilapia is important as it is an important aquaculture species, it shares the genetic pathway for sex determination of mammalian species, and molecular differentiation pathways are highly conserved.

      - The approach is rigorous and incorporates a novel TSL, clonal stem Leydig cell model that they developed that is relatively faithful in following endogenous developmental steps and can produce the appropriate steroid.

      - Tilapia are relatively amenable to CRISPR/Cas9 targeting and, with their accelerated developmental time frame, provide an excellent model system to interrogate specific signaling pathways.

      - The stepwise analysis from dhh-gli-sf1 is thoughtful and well done.

      Weaknesses of the methods and results:

      - Line 162: need to establish and verify that the PKH26-labeled TSL cells were unaffected by the dhh-/- environment. No data are provided to support the claim that they were unaffected.

      - The rescued phenotype caused by the addition of ptch2-/- to the dhh-/- model is compelling. To further define potential ptch1 contributions, it would be helpful to examine the expression level of ptch1 in the context of the ptch2-/- and ptch2-/-;dhh-/- mutant animals. Any compensatory increase in ptch1 in either case, without obvious phenotype changes, would support the dominant role for ptch2.

      - Activity of individual gli factors needs additional reconciliation. The expression profiles for both alternative gli factors should be quantified in each knockout cell line to establish redundancy and/or compensation.

      - Figure 5E: An important control is missing that includes evaluation of HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1.

      Achieved Aims:

      The authors set out to test the hypothesis that the canonical Dhh signaling pathway for Leydig cell differentiation and steroidogenic activity is mediated via ptch2 and gli1 regulation of sf1. The results are strong, but additional steps are needed to verify that redundancy/compensation is not contributing to the outcomes.

      This work is important for a better understanding of the nuanced commonalities and differences in developmental pathways across species. Specific to Leydig cell differentiation and steroidogenesis, their work with tilapia supports conservation of the canonical Dhh pathway; however, there appear to be some differences in downstream mediators compared to mouse. Specifically, they conclude that ptch2/gli1 stimulates sf1 and steroidogenesis in tilapia, whereas gli1 is dispensable in mouse. Instead, Gli3 has recently been shown to play an important role to stimulate Sf1 and support the hedgehog pathway.

    3. Author response:

      General Statements

      We thank the reviewers for their thoughtful and constructive comments on our manuscript. We have thoroughly considered all points raised and have made extensive revisions to address them. These revisions have significantly strengthened the manuscript.

      In summary, the key revisions and clarifications include:

      (1) Developmental Time-Course: To address the need for earlier phenotypic analysis, we have performed new immunofluorescence experiments at 30 days after hatching (dah). These new data (Fig. S7) pinpoint the onset of the Leydig cell differentiation defect in dhh<sup>-/-</sup> mutants, establishing ~30 dah as the critical window for Dhh action.

      (2) Role of Ptch1 and Ptch2: We have qualified our conclusions regarding receptor specificity throughout the text to accurately reflect our findings and the limitation posed by the early lethality of ptch1 mutants. The in vivo genetic evidence for Ptch2 (the rescue of dhh<sup>-/-</sup> by ptch2<sup>-/-</sup>) is emphasized, while we now explicitly state that a role for Ptch1 cannot be ruled out without future conditional knockout models.

      (3) Mechanism between Gli1 and Sf1: In direct response to the reviewers' request for stronger evidence, we have performed a new cold probe competition assay. This experiment provides dose-dependent, biochemical evidence for the specificity of Gli1 binding to the sf1 promoter (New Fig. 5E). Furthermore, we have revised the text throughout the manuscript to use more precise language (e.g., "Gli1 activates sf1 expression") and removed overstated claims of "direct" regulation.

      (4) Methodological Rigor and Controls: We have added crucial negative controls for all RNA-FISH experiments using sense probes (New Fig. S9), provided detailed quantification methods for immunofluorescence, clarified the number of biological replicates for transcriptomic analyses, and corrected statistical tests as recommended.

      (5) Clarity and Presentation: We have revised the text for clarity, expanded the description of the TSL cell line's validation in the Introduction, added missing details to figure legends and methods, and incorporated suggested key references.

      We believe that our detailed responses and the significant new data and textual revisions have fully addressed the reviewers' concerns and have substantially improved the quality and impact of our manuscript.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-month-old tilapia. The approach is innovative and provides a means to investigate DHH and each downstream component from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1 to promote an increase in Sf1 expression leading to the onset of 11-ketotestosterone synthesis heralding the differentiation of Leydig cells in the developing male tilapia.

      Major comments:

      (1) Are the key conclusions convincing?

      Most results as reported are convincing; however, some conclusions are premature as additional experiments are required to satisfy their claims. For example, the phenotype of the dhh-/- testis is convincing in that Cyp11c1 cells are missing and the addition of ptch2-/- rescues the phenotype, indicating a direct path. The link from gli to sf1, however, requires additional study to validate the direct relationship (see item 3 below).

      We thank the reviewer for the positive assessment that our principal findings are convincing. Regarding the connection between Gli1 and Sf1, we agree that additional validation was important. We have now performed new experiments and revised our text. As detailed in our response to item 3 below, we have incorporated a cold probe competition assay (new Fig. 5E) which provides dose-dependent evidence for the specificity of Gli1 binding to the sf1 promoter. Furthermore, we have toned down our conclusions in the manuscript.

      (2) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Major: Most significant premature claim is the statement that gli1 directly controls sf1 activity. Additional experiments are required to make this claim (see next statement).

      We agree with the reviewer that the claim of "direct" control was premature. We have therefore revised the manuscript accordingly. All statements claiming "direct" regulation of sf1 by Gli1 have been removed or replaced with more accurate descriptions, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1." These changes, coupled with the new functional data from the cold probe competition experiment (Fig. 5E) described in our response to item 3, now provide a robust and appropriately qualified account of our findings.

      Minor: As addressed in the discussion section, the ptch1 animals fail to survive limiting the ability to validate both ptch1 and ptch2 roles. Thus, the conclusion that only ptch2 is required should be qualified.

      We thank the reviewer for this rigorous comment. We fully acknowledge the limitation imposed by the early lethality of ptch1 mutants, which precludes a definitive in vivo assessment of its potential role in postnatal testis development. In direct response to this point, we have revised the text throughout the manuscript to more accurately reflect the strength of our conclusions. Specifically, in the Results section, we now state that “This differential receptor requirement implies that Ptch2 likely acts as the functional receptor for transducing Dhh signals in TSL cells” (lines 174–176). Furthermore, we have strengthened the Discussion by explicitly stating: “Therefore, while our findings strongly nominate Ptch2 as the principal receptor for Dhh in SLCs, a definitive exclusion of a role for Ptch1 will require future studies employing Leydig cell–specific conditional knockout models” (lines 265–268). We believe these revisions provide an appropriately qualified interpretation of our data while maintaining the compelling narrative of Ptch2's primary role.

      Major: A couple of key references are missing; please consider including:

      - Kothandapani A, Lewis SR, Noel JL, Zacharski A, Krellwitz K, Baines A, Winske S, Vezina CM, Kaftanovskaya EM, Agoulnik AI, Merton EM, Cohn MJ, Jorgensen JS. PLoS Genet. 2020 Jun 4;16(6):e1008810. doi: 10.1371/journal.pgen.1008810. eCollection 2020 Jun. PMID: 32497091

      - Park SY, Tong M, Jameson JL. Endocrinology. 2007 Aug;148(8):3704-10. doi: 10.1210/en.2006-1731. Epub 2007 May 10. PMID: 17495005

      We have included the key references: Kothandapani A, et al. (2020). PLoS Genet. and Park SY, et al. (2007). Endocrinology.

      (3) Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Additional experiments are suggested to strengthen the direct connection between gli1 and sf1:

      Major: Figure 5F shows evidence for increased sf1-luc activity upon co-transfection of OnGli1 in TSL cells. These data would be strengthened with evaluation of the same sf1 promoter that has each/both putative GLI binding sites mutated.

      We thank the reviewer for this insightful suggestion. To further strengthen the evidence for the functional connection between Gli1 and the sf1 promoter, we have performed a new cold probe competition experiment. Given the potential presence of other unpredicted Gli-binding motifs within the 5-kb sf1 promoter region and the practical constraints, we employed an alternative, robust biochemical approach. This assay used a wild-type oligonucleotide containing the canonical Gli-binding motif (GACCACCCA) as a specific competitor. As shown in the new Fig. 5E, this cold probe caused a significant, dose-dependent reduction in Gli1-induced sf1-luc activity, while a mutated control probe (TTAATTAAA) had no effect. This result provides strong evidence that Gli1-mediated transactivation of the sf1 promoter is dependent on its specific binding to this consensus motif.

      Furthermore, in response to the reviewer's comment, we have revised the manuscript text to use more precise language, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1," toning down any overstated claims of direct regulation. Together with the existing data (the original luciferase assay, the new competition experiment, and key loss-of-function/gain-of-function genetic evidence from SLC transplantation), we believe our study now provides a compelling and multi-faceted case for Gli1 being the key regulator of sf1 within this pathway. We are confident that these revisions have satisfactorily addressed the point raised.
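The response above notes that the 5-kb sf1 promoter may contain additional, unpredicted Gli-binding motifs. As a minimal sketch of how exact matches to the quoted consensus motif (GACCACCCA) could be located on both strands, the following uses only the Python standard library. The function names and the toy sequence are illustrative assumptions; the sequence is not the tilapia sf1 promoter, and exact matching would miss degenerate sites.

```python
# Canonical Gli-binding motif quoted in the author response.
GLI_MOTIF = "GACCACCCA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = GLI_MOTIF) -> list:
    """Return sorted (0-based start, strand) pairs for every exact match."""
    seq = seq.upper()
    hits = []
    # Search the forward strand for the motif and for its reverse
    # complement, which marks a site on the opposite strand.
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical toy sequence for illustration only.
toy_promoter = "TTGACCACCCAGGATCTGGGTGGTCAA"
hits = find_motif(toy_promoter)
print(hits)
```

A scan of this kind only lists candidate sites; whether any of them is functional would still need mutagenesis or competition assays such as the one described in the response.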

      Major: All 8xGli-luciferase assays should include evaluation of the mutant 8xGli-luciferase plasmid as a negative control.

      We thank the reviewer for highlighting the importance of reporter assay controls. In our study, we included the empty vector pGL4.23, which lacks any Gli-binding sites, as the fundamental negative control. As shown in Fig. 4C, this vector showed minimal background activity that was unresponsive to Dhh, confirming that the strong luciferase induction in the 8xGli-reporter is entirely dependent on functional Gli-binding sites. While a mutated 8xGli construct is one valid approach, we think that the use of an empty vector is functionally equivalent and equally rigorous for establishing specificity. We are confident that our current data unambiguously demonstrate Gli-dependent activation. For clarity, we have explicitly stated in the figure legend and methods that pGL4.23 served as the negative control.

      Minor: Figure 5D experiment that includes TSL-gli1(also 2,3) +/- OnDhh; please examine whether the absence of Gli affects expression of sf1 in each condition. In other words, provide a loss-of-function of Gli connection to regulation of sf1.

      We measured the mRNA expression levels of sf1 in TSL-WT, TSL-gli1<sup>-/-</sup>, TSL-gli2<sup>-/-</sup>, and TSL-gli3<sup>-/-</sup> cells using qRT-PCR. The results are presented in the new Supplementary Figure S8A. The results show that the loss of gli1 leads to a significant reduction in the expression of sf1. In contrast, the knockout of gli2 or gli3 had no significant effect on sf1 expression levels.

      (4) Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Given the expertise, it is not anticipated that the suggested experiments would be a significant burden to this group.

      We appreciate the reviewer's considerations. Now, we have performed the additional key experiments, which have been incorporated into the revised manuscript. We believe these new data have fully addressed the points raised.

      (5) Are the data and the methods presented in such a way that they can be reproduced?

      Most methods are adequately described or referenced to previous detailed description. There were, however, some methods that could benefit from additional details:

      Major: IF quantification data: please provide details on how the number of positive cells were quantified and presented, for example, how many cells from how many sections for each genotype were included for the analysis?

      We have added relevant information in the "Materials and Methods" section in lines 369-373: “For each biological replicate (n=5-6 fish per genotype), three non-serial, non-adjacent testis sections were analyzed. From each section, three representative fields of view were captured to ensure non-overlapping sampling. All positive cells number of Vasa, Sycp3 and Cyp11c1 was quantified by Image J Pro 1.51 software using default parameters.”
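The sampling scheme quoted above (three sections per fish, three fields per section) yields nine field counts per fish. One common way to keep the fish as the unit of replication is to average those counts per animal before comparing genotypes; the response does not state that the authors did this, so the sketch below is an assumption. The function name and all counts are hypothetical placeholders, not data from the study.

```python
from statistics import mean


def per_fish_means(counts):
    """Average positive-cell counts over all fields of each fish.

    counts maps a fish ID to a list of sections, each section being a
    list of per-field counts (3 sections x 3 fields in the quoted scheme).
    """
    return {
        fish: mean([c for section in sections for c in section])
        for fish, sections in counts.items()
    }


# Hypothetical counts for two fish, for illustration only.
example_counts = {
    "fish_1": [[4, 5, 6], [5, 5, 5], [6, 4, 5]],
    "fish_2": [[0, 1, 0], [1, 0, 1], [0, 0, 0]],
}
fish_means = per_fish_means(example_counts)
print(fish_means)
```

Each genotype then contributes one value per fish (n=5-6) to the statistical test, rather than 45-54 non-independent field counts.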

      Major: FISH: No controls are present, for example, scrambled RNA probes. Further, please clarify or address the significant presence of message in the nucleus.

      As suggested, we have now included negative control experiments using sense RNA probes for all genes (ptch1, ptch2, gli1, gli2, gli3). These controls showed no specific signal, confirming the specificity of our antisense probe hybridization. These data are now presented in the new Supplementary Figure S9.

      Major: TSL cells: TSL-onDhh, -onSf1: provide evidence for increase in expression

      We measured the mRNA expression levels of dhh in TSL-WT and TSL-OnDhh, and sf1 in TSL-WT and TSL-OnSf1 using qRT-PCR. The results are presented in the new Supplementary Figure S8B. The results show that overexpression of Dhh and Sf1 significantly increased the mRNA expression levels of dhh and sf1, respectively.

      Major: TSL + SAG cells and other treatments in general: how long were they treated before transplantation?

      We have added relevant information in the "Materials and Methods" section in lines 398-399: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before transplantation.”

      Major: Transcriptome analyses: how many replicates were used for each cell line? Please also clarify how the plot in Fig 5E was generated. It appears that all three cell lines were combined and compared to the WT line, but it is not clear how this was achieved.

      We have added relevant information in the "Materials and Methods" section in lines 445-447: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before collection. For each genotype, cells from three independent culture wells were pooled.”

      We have also added relevant information in the "Results" section in lines 198-202: “…we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions.”
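The "core set" described in the quoted Results text is, in effect, the intersection of the genes upregulated in each of the three conditions relative to wild type. A minimal sketch of that operation is shown below, assuming each condition has already been reduced to a list of upregulated gene names. The function name and the gene identifiers are hypothetical placeholders, and the authors' actual thresholds and pipeline are not given in this response.

```python
def core_upregulated(*gene_lists):
    """Return the genes present in every supplied list of upregulated genes."""
    return set.intersection(*(set(genes) for genes in gene_lists))


# Hypothetical upregulated-gene lists (placeholder names, not real results).
up_on_dhh = ["geneA", "geneB", "geneC", "geneD"]
up_on_gli1 = ["geneB", "geneC", "geneE"]
up_sag = ["geneC", "geneB", "geneF"]

core = core_upregulated(up_on_dhh, up_on_gli1, up_sag)
print(sorted(core))
```

Under this reading, each condition is compared to wild type separately and only the overlap is reported, rather than the three conditions being pooled into a single comparison.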

      (6) Are the experiments adequately replicated and statistical analysis adequate?

      Most are adequate and appropriate, some questions remain:

      - Transcriptomes-how many replicates (see above)?

      - IF quantification-how were cells identified/how many sections (see above)?

      Minor: Statistics: the methods indicate that a Student's t-test was used, but ANOVAs are also used, which is appropriate. Some data should be reevaluated via an ANOVA: Figure 4D, 4N-R; Figure 5G (no statistics indicated in the figure legend).

      We sincerely thank the reviewer for highlighting the inappropriate use of statistical tests in our original submission. We have re-analyzed the data using ANOVA-based methods for the panels the reviewer specified. We confirm that these changes do not alter the overall interpretation of our results but provide a more robust and statistically sound foundation for our conclusions. We changed “Differences were determined by two-tailed independent Student's t-test” to “Statistical significance was determined by one-way ANOVA followed by Tukey's test (C, Q-U, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (D) (*, P < 0.05; **, P < 0.01; NS, no significant difference).”

      In lines 719-721 we added “Statistical significance was determined by one-way ANOVA followed by Tukey's test (E, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (B, H) (*, P < 0.05; **, P < 0.01; NS, no significant difference).” in line 745-747.
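For readers less familiar with the distinction drawn here, a one-way ANOVA compares variation between group means with variation within groups in a single test, instead of running repeated pairwise t-tests. The sketch below computes only the F statistic and its degrees of freedom with the Python standard library. The function name and the measurements are hypothetical placeholders; the p-value and the Tukey post hoc comparisons mentioned in the revised legends would normally come from a statistics package and are not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three groups, for illustration only.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

A significant F only indicates that at least one group mean differs; the letter groupings described in the legends come from the subsequent pairwise Tukey comparisons.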

      Reviewer #1 (Significance):

      The data presented in this manuscript provides important context towards the connection between the DHH pathway, Sf1, and steroidogenesis.

      The audience would likely include developmental biologists, including those related to differentiation of any hormone producing cell type and especially those focused on steroidogenesis onset. Clinical interests will be related to sex determination and differentiation, especially related to male sex phenotype differentiation. Basic scientists will be especially interested.

      Expertise: mouse fetal testis differentiation and maturation, steroidogenesis, hedgehog, sf1. Good fit except for the animal model, but they are surprisingly similar.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this work, Zhao et al. investigated the role of the Dhh signaling pathway in the proliferation and differentiation of Leydig lineage cells in the testes of Nile tilapia, an economically important farmed fish. By generating dhh mutants, the authors showed that loss of Dhh in tilapia recapitulated mammalian phenotypes, characterized by testicular hypoplasia and androgen insufficiency. A previously established TSL line was used to rescue the deficits in dhh-/- testes, which demonstrated that Dhh regulates the differentiation of SLCs rather than their survival. By generating mutant TSL lines, the authors aimed to identify the downstream players under Dhh in tilapia. Based on the data, the authors propose that a dhh-ptch2-gli1-sf1 axis exists in Leydig cell lineage development.

      How dhh secreted from Sertoli cells affects the Leydig cells remains elusive. While previous studies have revealed the paracrine role of Sertoli cell-secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model. Unfortunately, this work is not well performed, and the conclusions are not well supported by the current data. To reach logical conclusions, more meaningful experiments should be performed, and more convincing data should be provided.

      Strength:

      The authors used genetic mutants, TSL lines, and cell transplantation techniques to address the questions. The manuscript is technically sound, and overall is well-written.

      Limitations:

      The experimental design should be optimized, and more convincing data should be provided to reach solid conclusions.

      (1) The SLCs (stem Leydig cells) used in this work. The SLC line was established from 3-month-old immature XY tilapia. The authors claimed that this line is an SLC line only because the cells express a few Leydig markers such as pdgfra and nestin. However, in my opinion, the identity of the cell line is not clear. It is suggested to perform more experiments, including flow cytometry or single-cell RNA sequencing analysis, to further characterize this line and to demonstrate that it consists of genuine SLCs equivalent to the SLCs in 3-month testes of tilapia. In the previous publication (2020), the information about the line was not well presented.

      We thank the reviewer for this comment regarding the characterization of the TSL cell line. The identity of TSL as a stem Leydig cell line was rigorously established in our previous publication (Huang et al., 2020), which provided comprehensive molecular, in vitro, and in vivo functional evidence that meets the definitive criteria for an SLC. This includes its stable expression of established SLC markers (pdgfrα, nestin, coup-tfii), its capacity to differentiate into steroidogenic cells producing 11-KT in vitro, and most critically, its ability to colonize the testicular interstitium, differentiate into Leydig cells, and restore androgen production upon transplantation in vivo.

      In direct response to the reviewer's point, we have revised the Introduction of our manuscript to provide a more detailed and clear description of the TSL line's origin and validation (lines 95-105) as “Furthermore, a stem Leydig cell line (TSL) has been established from the testis of a 3-month-old Nile tilapia. TSL expresses platelet-derived growth factor receptor α (pdgfrα), nestin, and chicken ovalbumin upstream promoter transcription factor II (coup-tfii), which are usually considered as SLC-related markers in several other species. Notably, this cell line exhibits the capacity to differentiate into 11-ketotestosterone (11-KT)-producing Leydig cells both in vitro and in vivo. When cultured in a defined induction medium, TSL cells differentiate into a steroidogenic phenotype, expressing key steroidogenic genes including star1, star2, and cyp11c1, and producing 11-KT; upon transplantation into recipient testes, TSL cells successfully colonize the interstitial compartment, activate the expression of steroidogenic genes, and restore 11-KT production”, ensuring that readers can fully appreciate its well-founded identity as an SLC model without needing to consult the original publication. We are confident that the existing body of evidence solidly supports all conclusions drawn from its use in this study.

      (2) How loss of dhh affects testicular and Leydig cell lineage development is not clearly investigated. In the current manuscript, the characterization of the dhh mutant was insufficient and lacked in-depth investigation. The authors primarily looked at testes at 90 dph, when the Leydig cell lineage was well developed. In my opinion, this time was too late. To investigate the earlier events that are affected by loss of dhh, I suggest performing experiments at earlier time points, in particular around the initiation stages of sex differentiation and Leydig cell specification/maturation.

      We thank the reviewer for this insightful comment. We agree that a thorough developmental analysis is crucial. In response to this point, we have now performed an in-depth investigation at earlier stages to precisely define the phenotype onset.

      Our revised manuscript includes new data from a developmental time-course analysis. While our initial characterization included 5, 10, and 20 dah, we have now identified 30 dah as the critical window for Leydig cell differentiation onset, which is also supported by prior work (Zheng et al.). Our new immunofluorescence data at 30 dah now clearly show that Cyp11c1-positive cells are present in wild-type testes but are entirely absent in dhh<sup>-/-</sup> mutants (Fig. S7). This finding pinpoints the initial failure of SLC differentiation.

      We have integrated this key finding into the Discussion (lines 234-239) as “To define the onset of Leydig cell differentiation, we performed a developmental time-course analysis. This revealed that Cyp11c1-positive steroidogenic cells first appear in wild-type testes at 30 dah, while being conspicuously absent in dhh<sup>-/-</sup> mutants at this same stage (Fig. S7). This clear temporal pattern establishes ~30 dah as the developmental window when SLCs initiate their differentiation program in the Nile tilapia.”

      Concurrently, our analysis of the 90 dah timepoint remains vital, as it represents a mature stage with robust spermatogenesis and a stabilized somatic niche. This allows for a comprehensive assessment of the ultimate functional consequences of the early differentiation block, including its impact on germ cell support and overall testicular architecture.

      Thus, our study now provides a complete developmental perspective: the 30 dah timepoint identifies the initiation of the Dhh-dependent defect, while the 90 dah analysis reveals the mature, functional outcomes within the intact testicular niche.

      (3) The authors claimed that there was a ptch2-gli1-sf1 axis. The conclusion was drawn largely based on data that generated from the in vitro cultured TSL line. More data from genetic mutant tilapia are required to support the conclusion.

      We thank the reviewer for the insightful comments regarding the need for robust in vivo validation. In fact, our conclusion of a Dhh-Ptch2-Gli1-Sf1 axis is supported by an integrated experimental strategy, combining key in vivo evidence with targeted in vitro analyses to build a coherent model.

      (1) Evidence for Ptch2 as the key receptor: The role of Ptch2 is supported by a pivotal in vivo genetic experiment. The observation that the dhh<sup>-/-</sup> testicular phenotype is fully rescued in dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants provides compelling genetic evidence that Ptch2 is the essential receptor for Dhh in vivo (Fig. 4E-U). We acknowledge that the early embryonic lethality of global ptch1 mutation precludes its functional analysis in postnatal testis development. Therefore, while our data strongly nominate Ptch2 as the principal receptor, we have qualified our conclusions in the revised manuscript to reflect that a role for Ptch1 cannot be definitively excluded without Leydig cell-specific conditional knockout models.

      (2) Evidence for Gli1 and its regulation of Sf1: The role of Gli1 as the key transcriptional effector was efficiently identified using our well-characterized TSL system, a valid approach for dissecting this highly conserved signaling cascade. The functional connection between Gli1 and Sf1 is supported by multiple lines of evidence: transcriptomic profiling, promoter analysis, luciferase reporter assays (including a new cold probe competition experiment), and most importantly, in vivo functional validation via SLC transplantation. The latter demonstrated that Sf1 is both necessary and sufficient for SLC differentiation within the testicular niche (Fig. 5).

      In direct response to the reviewer's points, we have thoroughly revised the manuscript text to ensure all claims are accurately stated, particularly regarding the receptor specificity and the nature of the Gli1-Sf1 regulatory relationship. We believe our study provides a solid foundation for the proposed signaling axis.

      Overall, better experimental design should be planned, including the rescue experiments. Some key information was missed. For instance, the identity of the stem Leydig cells was not clearly presented.

      We have explained it in point #1.

      Figures:

      Figure 1: The authors described the phenotypes at 90 dph. Loss of dhh led to severe phenotypes in testicular formation, as evidenced by defective formation of Vasa, a germline stem cell marker; loss of expression of cyp11c1, a leydig cell marker; and loss of sycp3, a marker of meiosis of spermatogonia.

      However, in my opinion, 90 dph was too late. To investigate the role of dhh in Leydig cell lineage, the authors are suggested to focus on earlier developmental stages when the sex differentiation and maturation of leydig cells occur. This work is actually a development biology one that investigates how dhh loss in Sertoli cells affects the development of Leydig cells. The careful characterization of earliest testicular phenotypes of dhh mutant is very important.

      We have explained it in point #2.

      Figure 2: Please clarify the logic for performing rescue experiments using 11-KT. Provided the critical role of 11-KT in the testis development and spermatogenesis, it was not unexpected that 11-KT treatment can rescue most of the cell types in testes. If dhh is absolutely required for LC lineage development maturation, adding 11-KT at 30 dph will not have an effect. Why not perform rescue experiments using Dhh protein?

      We thank the reviewer for this insightful comment, which allows us to clarify the logical progression of our experimental design, a process central to genetic discovery.

      When we first characterized the dhh<sup>-/-</sup> mutant, we observed a complex suite of phenotypes: testicular hypoplasia, arrested germ cell development, a profound deficiency of Leydig cells, and drastically low androgen levels. A primary challenge was to distinguish which defects were direct consequences of losing Dhh signaling and which were secondary effects of the overall testicular failure.

      We therefore employed a classic genetic strategy: phenotypic dissection through targeted rescue. The 11-KT rescue experiment was designed to test a foundational hypothesis: Are the severe testicular defects in dhh<sup>-/-</sup> mutants primarily a consequence of the systemic androgen deficiency? The results provided a pivotal and clear answer: while 11-KT treatment partially rescued germ cell development and testicular structure, it completely failed to restore the population of Cyp11c1-positive Leydig cells. This critical finding allowed us to dissociate the phenotypes, demonstrating that the Leydig cell defect is a primary, cell-autonomous consequence of Dhh loss, not a secondary effect of low androgen.

      This conclusion logically propelled the next phase of our research: to shift focus from systemic hormone action to the local, niche role of Dhh in regulating the Leydig lineage directly. This led directly to the TSL transplantation experiments and the mechanistic dissection of the Ptch2-Gli1-Sf1 axis within SLCs.

      Regarding the use of Dhh protein, we agree it is a complementary approach. However, producing biologically active, recombinant Hedgehog ligand is challenging due to its essential dual lipid modification, which is required for solubility and activity. Our transplantation experiments with TSL-OnDhh cells (Fig. 3) functionally demonstrate that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, thereby directly addressing the core question without the need for recombinant protein.

      Figure 3. The authors showed that in dhh-/- testes, TSL engrafted equivalently but failed to express Cyp11c1. This result was strange, which raised a question about the identity of the TSLs, as I have mentioned above. The authors claimed that the TSLs are stem Leydig cells, which I doubt. Additional data should be provided to support the statement.

      In the testicular environment, the transplanted TSLs should be able to colonize and differentiate into more mature Leydig cells. Only a small portion of the PKH26-labeled TSLs became Cyp11c1 positive after transplantation; can the authors comment on this observation?

      To address "Mutation of dhh blocks SLC differentiation", the authors should first carefully examine the TSL lineage development using the dhh mutant. Then, investigate how loss of dhh disrupts the crosstalk between Sertoli cells and Leydig cells. Why bother transplanting TSLs? Please clarify. Why not perform rescue experiments using Dhh protein at appropriate developmental stages?

      We thank the reviewer for these comments, which allow us to clarify the rationale and interpretation of our key experiments.

      (1) We have provided comprehensive evidence establishing the TSL line as a SLC line (Response to Point #1). The observation that WT TSL cells engraft but fail to differentiate in the dhh<sup>-/-</sup> testicular environment is not strange; it is, in fact, the core and most crucial finding of this experiment. It provides direct functional evidence that the dhh<sup>-/-</sup> niche lacks the essential signals required to initiate SLC differentiation, consistent with the severe deficiency of endogenous Cyp11c1<sup>+</sup> cells in these mutants (Fig. 1I-J', N).

      (2) The reviewer's concern about "only a small portion" of cells differentiating is based on a misunderstanding. Our quantitative data (Fig. 3F) show that approximately 78% of the transplanted PKH26+ TSL cells successfully differentiated into Cyp11c1<sup>+</sup> cells in WT hosts. This high efficiency robustly demonstrates the differentiation potential of TSL cells and the permissiveness of the WT niche. The near-zero differentiation rate in the dhh<sup>-/-</sup> host (Fig. 3F) starkly highlights the specific and severe defect in the mutant microenvironment.

      (3) The TSL transplantation experiment was the most direct strategy to test why Cyp11c1<sup>+</sup> cells are absent in dhh<sup>-/-</sup> testes. It allowed us to distinguish between a failure in SLC differentiation and other possibilities (e.g., cell death). The finding that functional SLCs cannot differentiate in the mutant niche logically directed our subsequent focus onto the cell-intrinsic molecular mechanism (the Ptch2-Gli1-Sf1 axis) within the Leydig lineage. While Sertoli-Leydig crosstalk is an important area, it was beyond the scope of this study aimed at defining the intrinsic differentiation pathway.

      (4) Regarding Dhh protein rescue, generating bioactive, lipid-modified recombinant Hh protein is technically challenging. Our transplantation of TSL-OnDhh cells (Fig. 3) functionally demonstrates that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, effectively addressing this question without the need for recombinant protein.

      Figure S3. “To assess whether dhh mutation affects androgen-producing cells outside Leydig cells, 11-KT levels were analyzed during early testicular development before SLCs differentiation. IF analyses revealed that no Cyp11c1 positive cells were present in the testes of XY WT fish at 5, 10, and 20 dah, indicating that SLCs had not yet differentiated at these stages (Fig. S3A-C). Tissue fluid 11-KT levels showed no significant differences between WT and dhh-/- XY fish at 5, 10, and 20 dah (Fig. S3D)”. These observations suggested that loss of dhh does not affect the specification of SLCs, but affects their differentiation into mature LCs. The appearance of Cyp11c1-positive cells should be later than 20 dah. So when is the earliest time point for formation of Cyp11c1 positive cells, and how does loss of dhh affect this? These are important questions to answer.

      We agree with the reviewer's interpretation that our data suggest dhh loss affects SLC differentiation rather than initial specification. In direct response to the need for earlier timepoints, we have now performed and included an analysis at 30 dah, which we identified as the critical window for Leydig cell differentiation onset. Our new data (Fig. S7) show that Cyp11c1+ cells are present in WT testes but are entirely absent in dhh<sup>-/-</sup> mutants at this stage. This precisely pinpoints the initiation of the phenotypic divergence and establishes ~30 dah as the developmental window when Dhh signaling is required to drive SLC differentiation. Our study therefore now provides a complete developmental perspective, from the initial failure at 30 dah to the mature functional outcomes at 90 dah.

      Figure 4. The authors generated ptch1/2 mutant TSL lines, and luciferase assay was performed, and based on the results, the authors concluded that Ptch2, but not Ptch1, is specifically required for transducing Dhh signals in TSLs. The conclusion was only based on luciferase assay using TSLs. Whether this was the case in testes at animal level is not clear. Clearly, more genetic experiments, using ptch mutants, should be performed to substantiate this.

      The authors stated “Ptch2 acts as the obligate receptor for Dhh signaling during testis development”. If ptch2 is required for TSL lineage, why did ptch2-/- testes exhibit no significant differences in testicular histology, Leydig cell (Cyp11c1+) populations, and serum 11-KT levels? This contradictory statement needs to be addressed.

      We thank the reviewer for these critical comments, which allow us to clarify the logic underlying our conclusions regarding Ptch2.

      (1) In Vivo Genetic Evidence for Ptch2: Our conclusion that Ptch2 is the primary receptor for Dhh is not based solely on the TSL luciferase assays. It is definitively supported by a key in vivo genetic experiment: the complete phenotypic rescue in the dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants (Fig. 4F-R). In genetic terms, the loss of the receptor (ptch2) suppressing the phenotype caused by the loss of the ligand (dhh) is classic evidence for a ligand-receptor relationship within a linear pathway. This in vivo evidence strongly substantiates Ptch2's role at the animal level. The early embryonic lethality of ptch1 mutants precludes a similar in vivo test for Ptch1 in postnatal testis development.

      (2) Addressing the Apparent Contradiction of the ptch2<sup>-/-</sup> Phenotype: The reviewer raises an excellent point, which stems from the fundamental biology of the Hh pathway as shown in Author response image 1. Ptch receptors are inhibitory. In the absence of ligand, Ptch suppresses pathway activity.

      Author response image 1.

      The canonical Hh signaling pathway. In the dhh<sup>-/-</sup> mutant, the pathway is suppressed due to unopposed Ptch activity, leading to a failure in SLC differentiation. In the ptch2<sup>-/-</sup> mutant, this key inhibitory brake is removed, leading to constitutive activation of the pathway. The fact that ptch2<sup>-/-</sup> testes are normal indicates that this level of pathway activation is not detrimental and, crucially, is sufficient to support wild-type levels of Leydig cell development and steroidogenesis. This lack of a phenotype in the receptor mutant, contrasted with the severe ligand mutant phenotype, is a common and expected observation in signaling pathways where the receptor acts as a tonic inhibitor.

      In summary, the normal development of ptch2<sup>-/-</sup> testes is not contradictory but is entirely consistent with its role as the inhibitory receptor for Dhh. The severe phenotype in dhh<sup>-/-</sup> mutants and its specific rescue by removing ptch2 provides compelling genetic evidence for their functional relationship. We have revised the text throughout the manuscript to ensure these conclusions are accurately stated.
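      The epistasis argument above can be sketched as a toy Boolean model. This is only an illustration of the canonical logic (Ptch represses the pathway unless ligand is present), not an analysis from the study. It assumes Ptch2 is the only receptor considered, and the function and genotype labels are hypothetical names chosen for this sketch.

```python
# Toy Boolean sketch of the canonical Hh logic described in the response.
# Assumption: Ptch2 is the only receptor modeled; all names are illustrative.

def pathway_active(dhh_present: bool, ptch2_present: bool) -> bool:
    """Return True if downstream Hh signaling is on."""
    # Ptch2 represses the pathway unless Dhh ligand is present to relieve it.
    ptch2_repressing = ptch2_present and not dhh_present
    return not ptch2_repressing


# Each genotype maps to (Dhh ligand present, Ptch2 receptor present).
GENOTYPES = {
    "WT": (True, True),
    "dhh-/-": (False, True),
    "ptch2-/-": (True, False),
    "dhh-/-;ptch2-/-": (False, False),
}


def predicted_phenotypes() -> dict:
    """Predict pathway state (True = on) for every genotype."""
    return {name: pathway_active(*alleles) for name, alleles in GENOTYPES.items()}


if __name__ == "__main__":
    for name, active in predicted_phenotypes().items():
        print(f"{name:>16}: pathway {'ON' if active else 'OFF'}")
```

      Under these assumptions, only the dhh<sup>-/-</sup> single mutant has the pathway off; removing ptch2 switches it back on regardless of ligand, which is the pattern of a receptor mutation suppressing a ligand mutation.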

      Figure 5. The authors generated gli1/2/3 mutant TSL lines, and luciferase assay was performed, and based on the results, the authors concluded that Gli1, but not Gli2/3, was specifically required for transducing Dhh signals in TSL cells. The conclusion is drawn only based on luciferase assay using TSLs. Whether this was the case in testes at animal level is not clear. Clearly, more genetic experiments should be performed to substantiate this, using the gli mutant fish.

      To identify Gli1-dependent targets in SLCs, the authors compared transcriptomes of TSLWT, Dhh-overexpressing (TSL-OnDhh), Gli1-overexpressing (TSL-OnGli1), and SAG-treated (TSL+ SAG) TSL cells. While these experiments can be used to identify dhh target genes, it is better to use gli mutant cell lines. Since the authors have generated gli1/2/3 mutants, why not use these mutant fish to identify/confirm the Gli targets?

      We thank the reviewer for these comments.

      (1) We acknowledge that Gli1 as the key transcriptional effector is primarily based on our in vitro evidence using the TSL cell line. We have revised the manuscript accordingly to ensure this is stated precisely, avoiding overstatement.

      (2) Concerning the transcriptomic analysis, the reviewer suggests using gli mutant cell lines. While this is a valid approach, our strategy of profiling pathway activation (via Dhh/Gli1 overexpression or SAG treatment) was deliberately chosen to provide a high signal-to-noise ratio for identifying genes that are upregulated during the differentiation process. Analyzing loss-of-function mutants under basal conditions can be confounded by potential compensatory mechanisms among the Gli family members, potentially masking the specific transcriptional signature of pathway activation we sought to capture.

      To clarify, we have generated gli1/2/3 mutant TSL cell lines for the functional luciferase assays, but we have not generated the corresponding gli mutant fish lines, which would represent a substantial new line of investigation.

      Reviewer #2 (Significance):

      While previous studies have revealed the paracrine role of Sertoli cell secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      The authors investigate the Dhh signaling pathway in Leydig cell differentiation in the tilapia model. They generated multiple mutant lines in different hedgehog pathway components and utilized a Leydig stem cell line to interrogate Leydig cell differentiation. Through this analysis, the authors demonstrate that Dhh regulates Leydig differentiation rather than survival. They also found that Ptch2 is the specific receptor that mediates signaling to promote Leydig differentiation and that Gli1 is the primary Gli involved. Furthermore, they show that a known regulator of Leydig cell development and function, SF1, is a downstream transcriptional target. Overall, the study identifies previously unknown information as to how Dhh signaling regulates Leydig cell development, which is necessary for testosterone production by the testis.

      Major Comments

      (1) In the RNAseq analysis, it is not clear exactly how the 33 "up-regulated" genes were identified. What was the methodology for identification of these genes? Some of the genes were down-regulated or not different in the OnGli condition and some in the OnDhh condition were not differentially expressed, as shown in Fig S8B. Therefore, it is unclear why all 33 genes are classified as upregulated "across all three conditions".

      We have clarified this methodology in the Materials and Methods section in lines 452-454: “Differentially expressed genes (DEGs) were identified for each condition (TSL-OnDhh, TSL-OnGli1, TSL+SAG) compared to TSL-WT controls using edgeR (threshold: FDR < 0.05, |log2(foldchange)| ≥ 1.5).” We have also added relevant information in the Results section in lines 198-202: “we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions (Fig. 5C, S6A).”
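      As a hedged illustration of one way to read "consistently upregulated across all three conditions", the sketch below applies the quoted thresholds to each condition and takes the strict intersection. It is not the authors' pipeline: the input format (gene mapped to log2 fold change and FDR), the function names, and the example values are all hypothetical.

```python
# Sketch: filter per-condition DE results with the quoted thresholds, then
# intersect across conditions. Input layout and data are hypothetical.

FDR_MAX = 0.05      # threshold quoted in the response: FDR < 0.05
LOG2FC_MIN = 1.5    # threshold quoted in the response: |log2FC| >= 1.5


def upregulated(results: dict) -> set:
    """results maps gene -> (log2fc, fdr); return genes passing as upregulated."""
    return {
        gene
        for gene, (log2fc, fdr) in results.items()
        if fdr < FDR_MAX and log2fc >= LOG2FC_MIN
    }


def core_upregulated(per_condition: dict) -> set:
    """Genes upregulated in every condition (strict intersection)."""
    sets = [upregulated(res) for res in per_condition.values()]
    return set.intersection(*sets) if sets else set()


# Made-up example values, each condition compared against TSL-WT.
example = {
    "TSL-OnDhh":  {"geneA": (2.1, 0.001), "geneB": (1.8, 0.01), "geneC": (0.4, 0.60)},
    "TSL-OnGli1": {"geneA": (3.0, 0.002), "geneB": (-1.7, 0.01), "geneC": (2.2, 0.03)},
    "TSL+SAG":    {"geneA": (1.6, 0.040), "geneB": (2.5, 0.001), "geneC": (1.9, 0.02)},
}

if __name__ == "__main__":
    print(sorted(core_upregulated(example)))
```

      With a strict intersection, a gene that is down-regulated or not significant in any one condition (geneB and geneC here) is excluded from the core set, which is the property the reviewer's question turns on.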

      We have also updated Fig. S8B to include a clear value and to better visualize the FPKM value levels of these 33 genes across the conditions.

      (2) In figure 4A (and possibly B), it appears that ptch RNA is in the nucleus of the cell. Why would the RNA be primarily in the nucleus? Is the RNA detection accurate? Were controls done? The methods state that sense probes were made but not how they compared to the antisense probes. This comment can also be applied to the gli FISH, particularly gli3 (Figure 5).

      This is an excellent observation. We speculate that the apparent nuclear signal may be due to strong transcriptional activity in the nucleus. To confirm the specificity of our FISH experiment, we performed FISH with sense RNA probes as negative controls for all genes (ptch1, ptch2, gli1, gli2, gli3), and no specific signals were observed (see New Fig. S9).

      Minor comments

      (1) In the introduction, please include information as to when tilapia reach sexual maturity

      We have added this information to the Introduction in line 91-92: early sexual maturity (approximately 3 months after hatching for males and 6 months after hatching for females).

      (2) When first mentioning experiments that use the PKH26 dye, please give a brief description of the dye in the text of the results. This is described in the methods but it would be helpful to have some information about what PKH26 is in the results to more easily understand the figure and experimental design.

      We have added a brief description in the Results section in line 151-152: “To dissect Leydig cell lineage impairment in dhh<sup>-/-</sup> testes, we transplanted the TSL labeled with PKH26 (a fluorescent red hydrophobic membrane dye that enables tracking of transplanted cells) into WT and dhh<sup>-/-</sup> testes (Fig. 3A).”

      (3) In the statistical analysis section of the methods, the authors state that two-tailed t-tests were performed however in the figure legends it states that ANOVA was done for some of the statistical analysis. Please clarify this.

      We have updated the Statistical Analyses section in Methods to clarify in line 472-476: “A two-tailed independent Student’s t-test was used to determine the differences between the two groups. One-way ANOVA, followed by Tukey multiple comparison, was used to determine the significance of differences in more than two groups. P < 0.05 was used as a threshold for statistically significant differences.”

      (4) Figures - in figures that have charts with the Y-axis labeled as "relative positive cells", or similar, please explain what exactly is meant by "relative". What is it relative to?

      We have revised all relevant Y-axis labels and figure legends to explicitly state the quantification method. For example, we now use: "Vasa<sup>+</sup> / DAPI<sup>+</sup> (%)", "Sycp3<sup>+</sup> / DAPI<sup>+</sup> (%)" or "Cyp11c1<sup>+</sup> / DAPI<sup>+</sup> (%)".

      (5) Figure 1: please point out the testes in panels A and B

      We have indicated the position of the testes with arrows in Figures 1A and B.

      (6) In figure 4, it would be helpful to have the WT images from S7 moved to fig 4.

      We have moved representative WT images from Fig. S7 into Fig. 4 for easier comparison with the mutant phenotypes.

      (7) Figure 4E: Are the yellow bars comparable to each other? Is there any significance to the increased luciferase with 8xGli in ptch2-/- as compared to the other genotypes?

      We thank the reviewer for this astute observation. Yes, the yellow bars are directly comparable, and the elevated basal luciferase activity of the 8xGli reporter in the ptch2<sup>-/-</sup> TSL cells is indeed significant and expected. Ptch2 normally inhibits the pathway in the absence of ligand; genetic ablation of ptch2 removes this inhibition, leading to ligand-independent, constitutive activation of the downstream signaling cascade. The observed increase in basal reporter activity in the ptch2<sup>-/-</sup> cells is a classic manifestation of this mechanism.

      The primary objective of this experiment was to test the cells' responsiveness to Dhh stimulation across genotypes. The key finding is that while wild-type and ptch1<sup>-/-</sup> cells showed a significant response to Dhh, the ptch2<sup>-/-</sup> cells, which already exhibited high basal activity, were completely unresponsive. This combination of constitutive activation and ligand insensitivity in the ptch2<sup>-/-</sup> genotype provides particularly strong genetic evidence that Ptch2 is the essential receptor mediating Dhh signal transduction in this system.

      (8) Figure 5G: please include exactly what each construct name stands for in the figure legend

      We have expanded the legend for Fig. 5G to define each construct.

      (9) Figure S8B: please include what the values in the table are (eg are these the significance values?)

      We have updated the caption for Figure S8B (now Figure S6B): “The FPKM value for each gene in each sample is indicated within the squares. The color gradient from blue to red reflects low to high expression levels per row (gene).”

      Reviewer #3 (Significance):

      Strengths and limitations:

      The genetics of the tilapia system and the availability of the tilapia Leydig stem cell lines were particular strengths of this study. The study utilizes fish genetics to genetically interrogate the Dhh signaling pathway in Leydig cell development through generation and analysis of mutant lines. The tilapia Leydig stem cell line was an integral part of this study as it allowed for genetic and chemical manipulation of Dhh signaling in undifferentiated Leydig cells and, through transplantation into testes, allowed for analysis of how Leydig cell differentiation was affected.

      Advance:

      The study makes significant advances as to how Dhh signaling instructs Leydig cell differentiation, including identification of the Ptch receptor and Gli transcription factor that function downstream of Dhh in this process. Furthermore, they identify a direct link between Dhh signaling and Sf1 expression, which is known to be important for Leydig cell function.

      Audience:

      This study will be of particular interest to reproductive biologists, endocrinologists, and developmental biologists. The study may also be of interest to researchers and physicians investigating cancers that are promoted by androgens produced by Leydig cells of the testis.

    1. eLife Assessment

      This paper presents a computational method to infer from data a key feature of affinity maturation: the relationship between the affinity of B-cell receptors and their fitness. The approach, which is based on a simple population dynamics model but inferred using AI-powered Simulation-Based Inference, is novel and valuable. It exploits recently published data on replay experiments of affinity maturation. While the method is well-argued and the validation solid, the potential impact of the study is hindered by its complex presentation, which makes it hard to assess its claims reliably.

    2. Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference. The potential impact of this work is hindered by the poor organization of the manuscript. In crucial sections, the writing style and notations are unclear and difficult to follow.

      Strengths:

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to intractable likelihood, which are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      Weaknesses:

      The text is challenging to follow. The descriptions of the model and the inference procedure are fragmented and repetitive. In the introduction and the methods section, the same information is often provided multiple times, at different levels of detail. This organization sometimes requires the reader to move back and forth between subsections (there are multiple non-specific references to "above" and "below" in the text).

      The choice of some parameter values in simulations appears arbitrary and would benefit from more extensive justification. It remains unclear how the "significant uncertainty" associated with these parameters affects the results of inference. In addition, the performance of the inference scheme on simulated data is difficult to evaluate, as the reported distributions of loss function values are not very informative.

      Finally, the discussion of the similarities and differences with an alternative approach to this inference problem, presented in Dewitt et al. (2025), is incomplete.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B-cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      (1) The authors leverage the unique data they have generated for a separate project to provide novel insights into a fundamental question.

      (2) The paper is clearly written, with accessible methods and a straightforward discussion of the limits of this model.

      (3) Code and data are publicly available and well-documented.

      Weaknesses (minor):

      (1) Lines 444-446: I think that "affinity ceiling" and "fitness ceiling" should be considered independent concepts. The former, as the authors ably explain, is a physical limitation. This wouldn't necessarily correspond to a fitness ceiling, though, as Figure 7 shows. Conversely, the model developed here would allow for a fitness ceiling even if the physical limit doesn't exist.

      (2) Lines 566-569: I would like to see this caveat fleshed out more and perhaps mentioned earlier in the paper. While relative affinity is far more important, it is not at all clear to me that absolute affinity can be totally ignored in modeling GC behavior.

      (3) One other limitation that is worth mentioning, though beyond the scope of the current work to fully address: the evolution of the repertoire is also strongly shaped by competition from circulating antibodies. (Eg: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3600904/, http://www.sciencedirect.com/science/article/pii/S1931312820303978). This is irrelevant for the replay experiment modeled here, but still an important factor in general repertoires.

    4. eLife Assessment

      This paper presents a computational method to infer from data a key feature of affinity maturation: the relationship between the affinity of B-cell receptors and their fitness. The approach, which is based on a simple population dynamics model but inferred using AI-powered Simulation-Based Inference, is novel and valuable. It exploits recently published data on replay experiments of affinity maturation. The method is well argued and presented, and the validation is compelling.

    5. Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference.

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to intractable likelihood, which are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      The revised methods section is easier to follow and better explains the approach. Inference results on simulated data are compelling and the real-data findings are compared with alternative approaches, clarifying the relationship to previous work.

    6. Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      • The authors leverage the unique data they have generated for a separate project to provide novel insights to a fundamental question.
      • The paper is clearly written, with accessible methods and straightforward discussion of the limits of this model.
      • Code and data are publicly available and well-documented.

      Weaknesses:

      • No substantial weaknesses noted.
    7. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference. The potential impact of this work is hindered by the poor organization of the manuscript. In crucial sections, the writing style and notations are unclear and difficult to follow.

      We thank the reviewer for their kind words, and have endeavored to address all of their concerns as to the structure and style of the manuscript.

      Strengths:

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to intractable likelihood, which are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      Weaknesses:

      The text is challenging to follow. The descriptions of the model and the inference procedure are fragmented and repetitive. In the introduction and the methods section, the same information is often provided multiple times, at different levels of detail.

      Thank you for pointing this out. We have rearranged the methods in order to make the presentation more linear, and to reduce duplication with the introduction.

      Specifically, we moved the affinity definition to the start, removed the redundant bullet point list, and moved the parameter value table to the end.

      This organization sometimes requires the reader to move back and forth between subsections (there are multiple non-specific references to "above" and "below" in the text).

      This is a great point; we have removed all references to "above" or "below", or replaced them with more specific citations.

      The choice of some parameter values in simulations appears arbitrary and would benefit from more extensive justification. It remains unclear how the "significant uncertainty" associated with these parameters affects the results of inference.

      We have clarified where various parameter values come from:

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities to those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      In addition, the performance of the inference scheme on simulated data is difficult to evaluate, as the reported distributions of loss function values are not very informative.

      We thought of two different interpretations for this comment, so have worked to address both.

      First, the comment could have been that the distribution of loss functions on the training sample does not appear to be informative of performance on data-like samples. This is true, and in our revision we have emphasized the distinction between the two types of simulation sample: those for training, where each simulated GC has different (sampled) parameter values; vs the "data mimic" samples where all GCs have identical parameters. Since the former have different values for each GC, we can only plot many inferred curves together on the latter. We also would like to emphasize that the inference problem for one GC will have much more uncertainty than will that for an ensemble of GCs (as in the full replay experiment).

      “After building and training our neural network, we evaluate its performance on subsets of the training sample. While this evaluation provides an important baseline and sanity check, it is important to note that the training sample differs dramatically from real data (and the “data mimic” simulation sample that mimics real data). While real data consists of 119 GCs with identical parameters and thus response functions, we need the GCs in our training sample to span the space of all plausible parameter values. This means that while we must evaluate performance on individual GCs in the training and testing samples, in real data (and data mimic simulation) we combine results from 119 curves into a central (medoid) curve. Inference on the training sample will thus appear vastly noisier than on real data and data mimic simulation, and also cannot be plotted with all true and inferred curves together.”
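
      The medoid summary described in the quoted passage can be illustrated with a minimal sketch. It assumes every inferred curve is sampled on a shared affinity grid and uses the mean absolute difference as the distance between curves; the function names and the choice of distance are assumptions made for illustration, not necessarily what the manuscript's software does.

```python
from typing import List, Sequence


def curve_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two curves sampled on the same grid."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def medoid_index(curves: List[Sequence[float]]) -> int:
    """Index of the curve with the smallest total distance to all curves."""
    totals = [sum(curve_distance(c, other) for other in curves) for c in curves]
    return min(range(len(curves)), key=totals.__getitem__)


# Hypothetical inferred curves (one per GC), invented for this example.
curves = [
    [0.0, 1.0, 2.0],
    [0.1, 1.1, 2.1],
    [0.0, 3.0, 6.0],  # an outlying curve
]
central = curves[medoid_index(curves)]
```

      Unlike a pointwise mean, the medoid is always one of the actual inferred curves, so a single outlying curve cannot distort its shape.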

      A second interpretation was that the reviewer did not have an intuitive sense of what a loss function value of, say, 1.0 actually means. To address this second interpretation, we have also added a supplement to Figure 2 with several example true and inferred response functions from the training sample, with representative loss values spanning 0.17 to 2.18. We have also added the following clarification to the caption of Figure 1-figure supplement 2:

      “The loss value is thus the fraction of the area under the true curve represented by the area between the true and inferred curves.”
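
      As a rough illustration of this definition, the sketch below computes the area between a true and an inferred curve, divided by the area under the true curve. The four-parameter sigmoid form, its parameter names, and the integration range are assumptions chosen for the example; the manuscript's exact parametrization may differ.

```python
import math
from typing import Callable, List


def sigmoid(x: float, xscale: float, xshift: float,
            yscale: float, yshift: float) -> float:
    """Hypothetical four-parameter sigmoid response function."""
    return yscale / (1.0 + math.exp(-xscale * (x - xshift))) + yshift


def trapezoid(ys: List[float], xs: List[float]) -> float:
    """Trapezoidal-rule integral of ys over xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def curve_loss(true_fn: Callable[[float], float],
               inferred_fn: Callable[[float], float],
               lo: float, hi: float, n: int = 1000) -> float:
    """Area between the two curves as a fraction of the area under the true curve.

    Assumes the true curve is positive on [lo, hi].
    """
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    true_vals = [true_fn(x) for x in xs]
    diffs = [abs(true_fn(x) - inferred_fn(x)) for x in xs]
    return trapezoid(diffs, xs) / trapezoid(true_vals, xs)


def true_curve(x: float) -> float:
    return sigmoid(x, 1.5, 0.5, 3.0, 0.2)


def doubled_curve(x: float) -> float:
    return 2.0 * true_curve(x)
```

      Under this definition, a loss of 0 means the curves coincide, and a loss of 1.0 means the area between the curves equals the area under the true curve, as happens for `doubled_curve` here.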

      Finally, the discussion of the similarities and differences with an alternative approach to this inference problem, presented in Dewitt et al. (2025), is incomplete.

      We have expanded this section of the manuscript, and added a new plot directly comparing the methods.

      “In order to compare more directly to DeWitt et al. 2025, we remade their Fig.S6D, truncating to values at which affinities are actually observed in the bulk data, and using only three of the seven timepoints (11, 20, and 70, Figure 8, left). We then simulated 25 GCs with central data mimic parameters out to 70 days. For each such GC, we found the time point with mean affinity over living cells closest to each of three specific “target” affinity values (0.1, 1.0, 2.0) corresponding to the mean affinity of the bulk data at timepoints 11, 20, and 70. We then plot the effective birth rates of all living cells vs relative affinity (subtracting mean affinity) at the resulting GC-specific timepoints for all 25 GCs together (Figure 8, right). Note that because each GC evolves at very different and time-dependent rates, we could not simply use the timepoints from the bulk data, since each GC slice from our simulation would then have very different mean affinity. The mean over GCs of these GC-specific chosen times is 10.9, 24.5, 44.4 (compared to the original bulk data time points 11, 20, 70). It is important to note that while the first two target affinities (0.1 and 1.0) are within the affinity ranges encountered in the extracted GC data, the third value (2.0) is far beyond them, and thus represents extrapolation to an affinity regime informed more by our underlying model than by the real data on which we fit it.”
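
      The time-matching step in this passage can be sketched as follows. The sketch assumes each simulated GC provides its mean affinity over living cells at a list of recorded times; the function names and the example trajectory are hypothetical and invented for illustration.

```python
from typing import Dict, Sequence


def closest_time(times: Sequence[float], mean_affinities: Sequence[float],
                 target: float) -> float:
    """Time at which the GC's mean affinity is closest to the target value."""
    i = min(range(len(times)), key=lambda k: abs(mean_affinities[k] - target))
    return times[i]


def matched_times(times: Sequence[float], mean_affinities: Sequence[float],
                  targets: Sequence[float] = (0.1, 1.0, 2.0)) -> Dict[float, float]:
    """Map each target mean affinity to this GC's best-matching time point."""
    return {t: closest_time(times, mean_affinities, t) for t in targets}


# Invented trajectory for a single simulated GC (days, mean affinity).
times = [0, 10, 20, 30, 40, 50, 60, 70]
means = [0.0, 0.12, 0.6, 0.9, 1.3, 1.7, 1.9, 2.2]
chosen = matched_times(times, means)
```

      Because each GC gets its own matched times, slices from different GCs are compared at similar mean affinity rather than at the same calendar time.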

      Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B-cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      (1) The authors leverage the unique data they have generated for a separate project to provide novel insights into a fundamental question.

      (2) The paper is clearly written, with accessible methods and a straightforward discussion of the limits of this model.

      (3) Code and data are publicly available and well documented.

      Weaknesses (minor):

      (1) Lines 444-446: I think that "affinity ceiling" and "fitness ceiling" should be considered independent concepts. The former, as the authors ably explain, is a physical limitation. This wouldn't necessarily correspond to a fitness ceiling, though, as Figure 7 shows. Conversely, the model developed here would allow for a fitness ceiling even if the physical limit doesn't exist.

      Right, whoops, good point. We've rearranged the discussion to separate the concepts, for instance:

      “While affinity and fitness ceilings are separate concepts, they are closely related. An affinity ceiling is a limit to affinity for a given antigen: there are no mutations that can improve affinity beyond this level. This would result in a truncated response function, undefined beyond the affinity ceiling. A fitness ceiling, on the other hand, is an upper asymptote on the response function. Such a ceiling would result in a limit on affinity for a germinal center reaction, since once cells are well into the upper asymptote of fitness they are no longer subject to selective pressure.”

      (2) Lines 566-569: I would like to see this caveat fleshed out more and perhaps mentioned earlier in the paper. While relative affinity is far more important, it is not at all clear to me that absolute affinity can be totally ignored in modeling GC behavior.

      This is a great point, we've added a mention of this where we introduce the replay experiment in the Methods:

      “It is important to note that this is a much lower level than typical BCR repertoires, which average roughly 5-10% nucleotide shm.”

      And expanded on the explanation in the Discussion:

      “Some aspects of behavior in the low-shm/early times regime of the extracted GC data are also potentially different to those at the higher shm levels and longer times found in typical repertoires. This is especially relevant to affinity or fitness ceilings, to which we likely have little sensitivity with the current data.”

      (3) One other limitation that is worth mentioning, though beyond the scope of the current work to fully address: the evolution of the repertoire is also strongly shaped by competition from circulating antibodies (e.g., http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3600904/, http://www.sciencedirect.com/science/article/pii/S1931312820303978). This is irrelevant for the replay experiment modeled here, but still an important factor in general repertoires.

      Yes good point, we've added these citations in a new paragraph on between-lineage competition:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because our time required is primarily determined by the total population size, since at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”
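
      The scaling argument in this passage can be made concrete with a generic "first reaction" simulation step, sketched below. This is not the authors' code: the data layout (one dictionary of event rates per cell) and the event names are assumptions for illustration. The point is only that each step touches every cell and every event type, so the cost per step grows with total population size.

```python
import random
from typing import Dict, List, Tuple


def next_event(cells: List[Dict[str, float]],
               rng: random.Random) -> Tuple[float, int, str]:
    """Return (waiting time, cell index, event type) of the earliest event.

    One exponential waiting time is drawn for every (cell, event type) pair
    and the shortest is kept, so the work per step is proportional to
    len(cells) times the number of event types.
    """
    best: Tuple[float, int, str] = (float("inf"), -1, "")
    for index, rates in enumerate(cells):
        for event, rate in rates.items():
            if rate <= 0.0:
                continue  # an event with zero rate never fires
            wait = rng.expovariate(rate)
            if wait < best[0]:
                best = (wait, index, event)
    return best


# Hypothetical population: every cell has birth, death, and mutation rates.
rng = random.Random(0)
population = [{"birth": 1.0, "death": 0.5, "mutation": 0.2} for _ in range(100)]
wait, cell_index, event = next_event(population, rng)
```

      Simulating many interacting GCs would multiply `len(cells)` accordingly, which is why competition among many GCs is described as computationally prohibitive.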

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors are encouraged to follow the suggestions of manuscript re-organization by Reviewer 1, in order to improve readability. We would also like to suggest improving the discussion of the traveling wave model to explain it in a more self-contained way. In passing, please clarify what is meant by 'steady-state' in that model. A superficial understanding would suggest that the only steady state in that model would be a homogeneous population of antibodies with maximum affinity/fitness.

      These are great suggestions. We have substantially rearranged the text according to Reviewer 1's suggestions, especially the Methods, and expanded on and rearranged the traveling wave discussion. We've also clarified throughout that the traveling wave model is assuming steady state with respect to population. In the public response to reviewer 1 above we describe these changes in more detail.

      Reviewer #1 (Recommendations for the authors):

      I suggest that the organization of the paper be reconsidered. The current methods section is long and at times repetitive, making it impossible to parse in a single reading. Moving some technical details from the main text to an appendix could improve readability. Despite the length of the methods section, many important points, such as justification of choices in model specification or values of parameters, are treated only briefly.

      We have rearranged the methods section, particularly the discussion of our model, and have more clearly justified choices of parameter values as described in the public response.

      Discussion of similarities and differences with reference to Dewitt et al. 2025 should be revised, as it's currently unclear whether the method presented here has any advantages.

      We have expanded this comparison, and emphasized the main disadvantage of the traveling wave approach: there is no way of knowing whether by abstracting away so much biological detail it misses important effects. We have also emphasized that the two approaches use different types of data (time series vs endpoint) which are typically not simultaneously available:

      “The clear advantage of the traveling wave model is its simplicity: if its high level view is accurate enough to effectively model the relevant GC dynamics, it is far more tractable. But reproducing low-level biological detail, and making high-dimensional real data comparisons (e.g. Figure 5) to iteratively improve model fidelity, are also useful, providing direct evidence that we are correctly modeling the underlying biological processes. The two approaches also utilize different types of data: we use a single time point, and thus must reconstruct evolutionary history; whereas the traveling wave requires a series of timepoints. The availability of both types of data is a unique feature of the replay experiment, and provides us with the opportunity to directly compare the approaches.”

      The results obtained from the same data should be directly compared (can the response function be directly compared to the result in Figure S6D in Dewitt et al., 2025? If yes, it should be re-plotted here and compared/superimposed with Figures 6 and 7). The text mentions the results differ, but it remains ambiguous whether the differences are significant and what their implications are.

      We've added a new Figure 8, comparing a modified version of the traveling wave Fig S6D to a new plot derived from our results using the data mimic parameters. While the two plots represent fundamentally different quantities, they do put the results of the two methods on an approximately equal footing and we see nice concordance between them in regions with significant data (they disagree substantially for larger negative affinities). We have also added emphasis to the point that the traveling wave model uses an entirely separate dataset to what we use here.

      Other comments:

      (1) l. 80: "[in] around 10 days"?

      Text rearranged so this phrase no longer appears.

      (2) l. 96: "an intrinsic rate [given by?] the response function above".

      Text rearranged so this phrase no longer appears.

      (3) Figure 1: The “specific model” part could be expanded and improved to help make sense of model parameters and the order of different processes in the population model. Example values of parameters can be plotted rather than loosely described (e.g., y_h+y_c, the upper asymptotes, can be plotted in place of the “yscale determines upper asymptotes” label).

      Great suggestion, we've changed the labels.

      (4) The cartoons in the other parts are somewhat cryptic or illegible due to small sizes.

      We have added text to the caption linking to the original figures, which are intended to appear here in schematic form only.

      “Plots from elsewhere in the manuscript are rendered in schematic form: those in “infer on data” refer to Figure 4-figure supplement 1, and those in “simulate with inferred parameters” to Figure 5.”

      (5) L. 137: It's not helpful to give numerical values before the definition of affinity. (and these numbers are repeated later).

      Good point, we've moved the affinity definition to the previous section, and removed the duplicate range information.

      (6) Table 1: A number of notations are unclear, such as “#seqs/GC” or “mutability multiplier”. The double notation for crucial parameters doesn't help. At the moment the table is introduced, the columns make little sense to the reader, and it's not well specified what dictates the choice or changes of parameter values or ranges.

      We've moved the table further down until after the parameters have been introduced, and clarified the indicated names.

      (7) l. 147: Choices of model are not justified and appear arbitrary (e.g., why death events happen at one of two rates).

      We have clarified the reasoning behind having two death rates.

      (8) l.151: “happened on the edges of developing phylogenetic tree” - ambiguous: do they accumulate at cell divisions? What is a “developing tree”?

      We have removed this ambiguous phrasing.

      (9) l.161: This paragraph is particularly dense.

      We have rearranged this section of the methods, and split up this paragraph.

      (10) l. 164: All the different response functions for different event types? Or only the one for birth, as stated before?

      Yes. This has been clarified.

      (11) l.167: Does the statement in the bracket refer to a unit?

      This has been clarified.

      (12) l. 169: Discussion of the implementation seems too detailed.

      Hopefully the rearranged description is clearer, but we worry that removing the details of events selection would leave some readers confused.

      (13) l. 186: Why describe the methods that, in the end, were not used? Similarly, a mention of a “variety of response functions” seems out of place if only one choice is used throughout the paper. eq. (2): that's m^{-1} from eq. (1). Having the two equations using the same notation is confusing.

      We've moved the mention of alternatives to the Discussion, where it is an important source of uncontrolled systematic uncertainty, and removed the extra equation.

      (14) l. 206: Unclear what “thus” refers to.

      Removed.

      (15) l.211: What does “neglecting y_h” mean?

      This has been clarified.

      (16) l. 242: Unclear what “this” refers to.

      Clarified.

      (17) l. 261: What does “model independence” refer to in this context?

      From the sigmoid model. Clarified.

      (18) l. 306: What values for which parameters? References?

      We have clarified and updated this statement - it was out of date, corresponding to the analysis before we started fitting non-sigmoid parameters.

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities to those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      (19) l. 326: "is interpreted as having" or "corresponds to"?

      Changed.

      (20) l. 340: Not sure what "encompassing" means in this context.

      Clarified.

      (21) l. 341: "We do this..." -- I think this sentence is not grammatical.

      Fixed.

      (22) l. 348: "on simulation" -- "from simulated data"?

      Indeed.

      (23) l. 351: "top rows", the figures only have one row.

      Fixed.

      (24) Figure 2: It's difficult to tell from the loss function itself whether inference on simulated data works well. Why not report the simulated and inferred response functions? The equivalent plots in Figure 5 would also be informative. Has inference been tested for different "sigmoid parameters" values?

      This is an important point that was not clear, thanks for bringing it up. We have expanded on and emphasized the differences between these samples and the reasoning behind their different evaluation choices. Briefly, we can't display true vs inferred response functions on the training samples since the curves for each GC are different -- the plot would be entirely filled in with very different response function shapes. This is why we do actual performance evaluation on the "data mimic" samples, where all GCs have the same parameters. Summary stats (like Fig 5) for the training sample are in Fig 5 Supplement 2.

      (25) l. 354: Unclear what "this" refers to.

      Removed.

      (26) l. 355: We assume the parameters are the same?

      Yes, we assume all data GCs have the same parameters. We have added emphasis of this point.

      (27) Figure 4: Is "lambda" the fitness? Should be typeset as \lambda_i?

      Our convention is to add the subscript when evaluating fitness on individual cells, but to omit it, as here, when plotting the response function as a whole.

      (28) l. 412: "[a] carrying capacity constraint".

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) In 2 places, you state that observed affinity ranged from -37 to 3, but I assume that the lower bound should be -3.7.

      The -37 was actually correct, but we had mistakenly missed updating it when we switched to the latest (current) version of the affinity model. We have updated the values, although these don't really have any effect on the model since we only infer within bounds in which we have a lot of points:

      “Affinity is 0 for the initial unmutated sequence, and ranges from -12.2 to 3.5 in observed sequences, with a mean (median) of -0.3 (0.3).”

      (2) I had to look up the Voznica paper to understand the tree encoding. It would be nice to spend another sentence or two on it here for those who aren't familiar.

      Great point, we have added the following:

      “We encode each tree with an approach similar to Lambert et al. (2023) and Thompson et al. (2024), most closely following the compact bijective ladderized vector (CBLV) approach from Voznica et al. (2022). The CBLV method first ladderizes the tree by rotating each subtree such that, roughly speaking, longer branches end up toward the left. This does not modify the tree, but rather allows iteration over nodes in a defined, repeatable way, called inorder iteration. To generate the matrix, we traverse the ladderized tree in order, calculating a distance to associate with each node. For internal nodes, this is the distance to root, whereas for leaf nodes it is the distance to the most-recently-visited internal node (Voznica et al., 2022, Fig. 2). Distances corresponding to leaf nodes are arranged in the first row of the matrix, while those from internal nodes form the second row.”
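
      The encoding steps described in this passage can be sketched for a small binary tree as below. This is a simplified illustration, not the reference CBLV implementation: the `Node` class, the ladderization rule (the subtree containing the most distant leaf goes left), and the convention that the first leaf uses its distance to the root are assumptions, and the padding to a fixed matrix size used in practice is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: branch length to its parent plus 0 or 2 children."""
    branch_length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def max_leaf_depth(node: Node, depth: float = 0.0) -> float:
    """Largest distance from the parent of `node` to any leaf below `node`."""
    depth += node.branch_length
    if not node.children:
        return depth
    return max(max_leaf_depth(child, depth) for child in node.children)


def ladderize(node: Node) -> None:
    """Reorder children in place so the deeper subtree comes first (left)."""
    for child in node.children:
        ladderize(child)
    node.children.sort(key=max_leaf_depth, reverse=True)


def encode(root: Node) -> List[List[float]]:
    """Return [leaf_row, internal_row] from an inorder walk of the ladderized tree."""
    ladderize(root)
    leaf_row: List[float] = []
    internal_row: List[float] = []
    last_internal_depth = 0.0  # before any internal node is visited, use the root

    def visit(node: Node, parent_depth: float) -> None:
        nonlocal last_internal_depth
        depth = parent_depth + node.branch_length
        if not node.children:
            # In an inorder walk of a binary tree, the internal node visited just
            # before a leaf is one of its ancestors, so the path length between
            # them is a difference of root distances.
            leaf_row.append(depth - last_internal_depth)
            return
        left, right = node.children  # assumes a strictly binary tree
        visit(left, depth)
        internal_row.append(depth)  # internal nodes store distance to root
        last_internal_depth = depth
        visit(right, depth)

    visit(root, 0.0)
    return [leaf_row, internal_row]


# Small example: root -> (leaf A: 1.0, internal X: 0.5 -> (leaf: 1.0, leaf: 2.0))
tree = Node(0.0, [Node(1.0), Node(0.5, [Node(1.0), Node(2.0)])])
matrix = encode(tree)
```

      For this example, ladderizing moves the deeper subtree and the longer leaf branch to the left, so the inorder walk visits leaf, internal node, leaf, root, leaf, and the leaf row has one more entry than the internal row.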

      (3) On line 351, you refer to the "top rows of Figure 2 and Figure 3," but each only has one row in the current version. I think it should now be "left panel".

      Fixed.

      (4) How many vertical dashed lines are in the left panel of the bottom row of Figure 7? I think it's more than one, but can't tell if it is two or three...

      Nice catch! There were actually three. We've shortened them and added a white outline to clarify overlapping lines.

      (5) Would the model be applicable to GCs with multiple naive founders of different affinities? Or would more/different parameters be needed to account for that?

      The model would be applicable, but since the time required for our simulation scales roughly with the total simulated population size, we could probably only handle competition among at most a couple of GCs. Some sort of "migration strength" parameter would be required for competition among GCs (or within one GC if we don't want to assume it's well-mixed), but that doesn't seem a terrible impediment. We've added the following:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because our time required is primarily determined by the total population size, since at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”

    1. eLife Assessment

      This valuable manuscript provides solid evidence regarding the role of alpha oscillations in sensory gain control. The authors use an attention-cuing task in an initial EEG study followed by a separate MEG replication study to demonstrate that whilst (occipital) alpha oscillations are increased when anticipating an auditory target, so is visual responsiveness as assessed with frequency tagging. The findings offer a re-interpretation of the inhibitory role of the alpha rhythm, supporting that alpha oscillations contribute to interareal communication.

    2. Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results imply that alpha modulation does not solely regulate 'gain control' in early visual areas (also referred to as alpha inhibition hypothesis), but rather orchestrates signal transmission to later stages of the processing stream.

      Comments on the first revision:

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated."

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- (i.e., visual target presented alone) has a lower accuracy and longer reaction time than visual+ (i.e., visual target presented with distractor). In fact, Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2 %; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory +: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms ; SD = 58.52; p < .001; t(25) = 5.23)."

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of MEG experiments are matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.

      Comments on the latest version:

      I am satisfied with the authors' response and do not have any additional comments.

    3. Author response:

      The following is the authors’ response to the current reviews.

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated." 

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- (i.e., visual target presented alone) has a lower accuracy and longer reaction time than visual+ (i.e., visual target presented with distractor). In fact, Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2 %; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory +: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms ; SD = 58.52; p < .001; t(25) = 5.23)." 

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of MEG experiments are matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.

      We apologize for mixing up the visual and auditory distractor cost in our rebuttal. The reviewer is right in that our two statements contradict each other.

      To clarify: In the EEG experiment, we see a significant distractor cost in accuracy for auditory distractors (see Suppl. Fig. 1A). We also see faster reaction times with auditory distractors, which may reflect intersensory facilitation. As we used the same distractors in both experiments, it can be assumed that they were distracting in both.

      In our follow-up MEG experiment, as the reviewer stated, performance in Block 2 was higher than in Block 1, even though distractors were present. In this experiment, distractor cost and learning effects are difficult to disentangle. It is possible that participants improved over time in the visual discrimination task in Block 1, as performance at the beginning was quite low. To illustrate this, we divided the trials of each condition into bins of 10 and plotted the mean accuracy in these bins over time (see Author response image 1). In Block 2, performance is more or less stable over time, varying by < 10%. In Block 1, an improvement over time can be seen for both visual and auditory trials. This is especially strong for visual trials, which span a difference of > 20%. Note that the mean performance for the 80-90 trial bin was higher than any mean performance observed in Block 2.

      Additionally, the same paradigm has been applied in previous investigations, which also found distractor costs for the auditory stimuli used here in blocked and non-blocked designs. See:

      Mazaheri, A., van Schouwenburg, M. R., Dimitrijevic, A., Denys, D., Cools, R., & Jensen, O. (2014). Region-specific modulations in oscillatory alpha activity serve to facilitate processing in the visual and auditory modalities. NeuroImage, 87, 356–362. https://doi.org/10.1016/j.neuroimage.2013.10.052

      Van Diepen, R., & Mazaheri, A. (2017). Cross-sensory modulation of alpha oscillatory activity: suppression, idling and default resource allocation. European Journal of Neuroscience, 45(11), 1431–1438. https://doi.org/10.1111/ejn.13570

      Author response image 1.

      Accuracy development over time in the MEG experiment. During Block 1, a performance increase over time can be observed for visual as well as for auditory stimuli. During Block 2, performance is stable over time. Data are presented as mean ± SEM. N = 27 (one participant was excluded from this analysis, as their trial count in at least one condition was below 90 trials).
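
      The binning behind Author response image 1 can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `binned_accuracy`, the trial vector, and the choice to drop a trailing partial bin are all assumptions made for the example.

```python
from statistics import mean

def binned_accuracy(correct, bin_size=10):
    """Mean accuracy in consecutive, non-overlapping bins of `bin_size` trials.

    `correct` holds one entry per trial in presentation order:
    1 for a correct response, 0 for an incorrect one.
    A trailing partial bin is dropped (an assumption of this sketch).
    """
    n_bins = len(correct) // bin_size
    return [
        mean(correct[i * bin_size:(i + 1) * bin_size])
        for i in range(n_bins)
    ]

# Hypothetical data: 30 trials in which performance improves over time.
trials = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1,   # bin 1: 4 of 10 correct
          1, 1, 0, 1, 0, 1, 1, 0, 1, 1,   # bin 2: 7 of 10 correct
          1, 1, 1, 1, 0, 1, 1, 1, 1, 1]   # bin 3: 9 of 10 correct
acc = binned_accuracy(trials)

# Spread between the best and worst bin, comparable to the "< 10%" and
# "> 20%" ranges quoted in the response.
acc_range = max(acc) - min(acc)
```

      Plotting `acc` per condition against bin index would give curves of the kind described in the caption; a large `acc_range` within a block points to a change over time within that block rather than a stable distractor cost.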


      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results disprove the alpha inhibition hypothesis and instead imply that alpha "regulates downstream information transfer." However, as I detail below, I do not think the presented data irrefutably disprove the alpha inhibition hypothesis. Moreover, the evidence for the alternative hypothesis of alpha as an orchestrator for downstream signal transmission is weak. Their data serve to refute only the most extreme and physiologically implausible version of the alpha inhibition hypothesis, which assumes that alpha completely disengages the entire brain area, inhibiting all neuronal activity.

      We thank the reviewer for taking the time to provide additional feedback and suggestions, and we have improved our manuscript accordingly.

      (1) Authors assign specific meanings to specific frequencies (8-12 Hz alpha, 4 Hz intermodulation frequency, 36 Hz visual tagging activity, 40 Hz auditory tagging activity), but the results show that spectral power increases in all of these frequencies towards the end of the cue-to-target interval. This result is consistent with a broadband increase, which could simply be due to additional attention required when anticipating auditory target (since behavioral performance was lower with auditory targets, we can say auditory discrimination was more difficult). To rule this out, authors will need to show a power spectral density curve with specific increases around each frequency band of interest. In addition, it would be more convincing if there was a bump in the alpha band, and distinct bumps for 4 vs 36 vs 40 Hz band.

      This is an interesting point with several aspects, which we will address separately.

      Broadband Increase vs. Frequency-Specific Effects:

      The suggestion that the observed spectral power increases may reflect a broadband effect rather than frequency-specific tagging is important. However, Supplementary Figure 11 shows no difference between expecting an auditory or visual target at 44 Hz. This demonstrates that (1) there is no uniform increase across all frequencies, and (2) the separation between our stimulation frequencies was sufficient to allow differentiation using our method.

      Task Difficulty and Performance Differences:

      The reviewer suggests that the observed effects may be due to differences in task difficulty, citing lower performance when anticipating auditory targets in the EEG study. This issue was explicitly addressed in our follow-up MEG study, where stimulus difficulty was calibrated. In the second block—used for analysis—accuracy between auditory and visual targets was matched (see Fig. 7B). The replication of our findings under these controlled conditions directly rules out task difficulty as the sole explanation. This point is clearly presented in the manuscript.

      Power Spectrum Analysis:

      The reviewer’s suggestion that our analysis lacks evidence of frequency-specific effects is addressed directly in the manuscript. While we initially used the Hilbert method to track the time course of power fluctuations, we also included spectral analyses to confirm distinct peaks at the stimulation frequencies. Specifically, when averaging over the alpha cluster, we observed a significant difference at 10 Hz between auditory and visual target expectation, with no significant differences at 36 or 40 Hz in that cluster. Conversely, in the sensor cluster showing significant 36 Hz activity, alpha power did not differ, but both the 36 Hz and 40 Hz tagging frequencies showed significant effects. These findings clearly demonstrate frequency-specific modulation and are already presented in the manuscript.
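
      The logic of separating frequency-specific peaks from a broadband increase can be illustrated with a small synthetic example. This is a sketch under stated assumptions, not the authors' pipeline: the sampling rate, amplitudes, noise level, and the helper `power_at` are hypothetical, and the signal is simulated rather than recorded. Power is evaluated at 10, 36, and 40 Hz and at a 44 Hz control frequency where nothing was injected.

```python
import cmath
import math
import random

def power_at(signal, freq_hz, fs_hz):
    """Power of `signal` at a single frequency via one discrete Fourier sum."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
              for k, x in enumerate(signal))
    return abs(acc / n) ** 2

fs = 500          # sampling rate in Hz (assumed for illustration)
n_samples = 1000  # 2 s of data, i.e. a frequency resolution of 0.5 Hz
rng = random.Random(0)

# Synthetic signal: alpha (10 Hz) plus two tagging frequencies (36 and 40 Hz)
# and a small amount of white noise. No component is injected at 44 Hz.
signal = [
    1.0 * math.sin(2 * math.pi * 10 * k / fs)
    + 0.5 * math.sin(2 * math.pi * 36 * k / fs)
    + 0.5 * math.sin(2 * math.pi * 40 * k / fs)
    + rng.gauss(0, 0.1)
    for k in range(n_samples)
]

# Power at the frequencies of interest and at the control frequency.
power = {f: power_at(signal, f, fs) for f in (10, 36, 40, 44)}
```

      In this toy case the power at 44 Hz stays at the noise floor while the injected frequencies stand out. A genuinely broadband increase would instead raise all four values together, which is the pattern the 44 Hz control is meant to rule out.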

      (2) For visual target discrimination, behavioral performance with and without the distractor is not statistically different. Moreover, the reaction time is faster with distractor. Is there any evidence that the added auditory signal was actually distracting?

      We appreciate the reviewer’s observation regarding the lack of a statistically significant difference in behavioral performance for visual target discrimination with and without the auditory distractor. While this was indeed the case in our EEG experiment, we believe the absence of an accuracy effect may be attributable to a ceiling effect, as overall visual performance approached 100%. This high baseline likely masked any subtle influence of the distractor.

      To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated.

      Regarding the faster reaction times observed in the presence of the auditory distractor, this phenomenon is consistent with prior findings on intersensory facilitation. Auditory stimuli, which are processed more rapidly than visual stimuli, can enhance response speed to visual targets—even when the auditory input is non-informative or nominally distracting (Nickerson, 1973; Diederich & Colonius, 2008; Salagovic & Leonard, 2021). Thus, while the auditory signal may facilitate motor responses, it can simultaneously impair perceptual accuracy, depending on task demands and baseline performance levels.

      Taken together, our data suggest that the auditory signal does exert a distracting influence, particularly under conditions where visual performance is not at ceiling. The dual effect—facilitated reaction time but reduced accuracy—highlights the complexity of multisensory interactions and underscores the importance of considering both behavioral and neurophysiological measures.

      (3) It is possible that alpha does suppress task-irrelevant stimuli, but only when it is distracting. In other words, perhaps alpha only suppresses distractors that are presented simultaneously with the target. Since the authors did not test this, they cannot irrefutably reject the alpha inhibition hypothesis.

      The reviewer’s claim that we did not test whether alpha suppresses distractors presented simultaneously with the target is incorrect. As stated in the manuscript and supported by our data (see point 2), auditory distractors were indeed presented concurrently with visual targets, and they were demonstrably distracting. Therefore, the scenario the reviewer suggests was not only tested—it forms a core part of our design.

      Furthermore, it was never our intention to irrefutably reject the alpha inhibition hypothesis. Rather, our aim was to revise and expand it. If our phrasing implied otherwise, we have now clarified this in the manuscript. Specifically, we propose that alpha oscillations:

      (a) Exhibit cyclic inhibitory and excitatory dynamics;

      (b) Regulate processing by modulating transfer pathways, which can result in either inhibition or facilitation depending on the network context.

      In our study, we did not observe suppression of distractor transfer, likely due to the engagement of a supramodal system that enhances both auditory and visual excitability. This interpretation is supported by prior findings (e.g., Jacoby et al., 2012), which show increased visual SSEPs under auditory task load, and by Zhigalov et al. (2020), who found no trial-by-trial correlation between alpha power and visual tagging in early visual areas, despite a general association with attention.

      Recent evidence (Clausner et al., 2024; Yang et al., 2024) further supports the notion that alpha oscillations serve multiple functional roles depending on the network involved. These roles include intra- and inter-cortical signal transmission, distractor inhibition, and enhancement of downstream processing (Scheeringa et al., 2012; Bastos et al., 2015; Zumer et al., 2014). We believe the most plausible account is that alpha oscillations support both functions, depending on context.

      To reflect this more clearly, we have updated Figure 1 to present a broader signal-transfer framework for alpha oscillations, beyond the specific scenario tested in this study.

      We have also revised several sentences in the introduction and discussion to clarify this argument:

      L35-37: Previous research gave rise to the prominent alpha inhibition hypothesis, which suggests that oscillatory activity in the alpha range (~10 Hz) plays a mechanistic role in selective attention through functional inhibition of irrelevant cortical areas (see Fig. 1; Foxe et al., 1998; Jensen & Mazaheri, 2010; Klimesch et al., 2007).

      L60-65: In contrast, we propose that functional and inhibitory effects of alpha modulation, such as distractor inhibition, are exhibited through blocking or facilitating signal transmission to higher order areas (Peylo et al., 2021; Yang et al., 2023; Zhigalov & Jensen, 2020; Zumer et al., 2014), gating feedforward or feedback communication between sensory areas (see Fig. 1; Bauer et al., 2020; Haegens et al., 2015; Uemura et al., 2021).

      L482-485: This suggests that responsiveness of the visual stream was not inhibited when attention was directed to auditory processing and was not inhibited by occipital alpha activity, which directly contradicts the proposed mechanism behind the alpha inhibition hypothesis.

      L517-519: Top-down cued changes in alpha power have now been widely viewed to play a functional role in directing attention: the processing of irrelevant information is attenuated by increasing alpha power in areas involved with processing this information (Foxe, Simpson, & Ahlfors, 1998; Hanslmayr et al., 2007; Jensen & Mazaheri, 2010).

      L566-569: As such, it is conceivable that alpha oscillations can in some cases inhibit local transmission, while in other cases, depending on network location, connectivity and demand, alpha oscillation can facilitate signal transmission. This mechanism allows to increase transmission of relevant information and to block transmission of distractors.

      (4) In the abstract and Figure 1, the authors claim an alternative function for alpha oscillations; that alpha "orchestrates signal transmission to later stages of the processing stream." In support, the authors cite their result showing that increased alpha activity originating from early visual cortex is related to enhanced visual processing in higher visual areas and association areas. This does not constitute a strong support for the alternative hypothesis. The correlation between posterior alpha power and frequency-tagged activity was not specific in any way; Fig. 10 shows that the correlation appeared on both 1) anticipating-auditory and anticipating-visual trials, 2) the visual tagged frequency and the auditory tagged activity, and 3) was not specific to the visual processing stream. Thus, the data is more parsimonious with a correlation than a causal relationship between posterior alpha and visual processing.

      Again, the reviewer raises important points, which we address in turn.

      The correlation between posterior alpha power and frequency-tagged activity was not specific, as it is present both when auditory and visual targets are expected:

      If there is a connection between posterior alpha activity and higher-order visual information transfer, then it can be expected that this relationship remains across conditions and that a higher alpha activity is accompanied by higher frequency-tagged activity, both over trials and over conditions. However, it is possible that when alpha activity is lower, such as when expecting a visual target, the signal-to-noise ratio is affected, which may lead to higher difficulty to find a correlation effect in the data when using non-invasive measurements.

      The connection between alpha activity and frequency-tagged activity appears for both auditory and visual stimuli, and the correlation is not specific to the visual processing stream:

      While we do see differences between conditions (e.g. in the EEG analysis, mostly 36 Hz correlated with alpha activity and only in one condition did 40 Hz show a correlation as well), it is true that in our MEG analysis, we found correlations both between alpha activity and 36 Hz and between alpha activity and 40 Hz.

      We acknowledge that when analysing frequency-tagged activity on a trial-by-trial basis, where removal of non-time-locked activity through averaging (which we did when testing for condition differences in Fig. 4 and 9) is not possible, there is uncertainty in the data. Baseline correction can alleviate this issue, but it cannot rule out non-specific effects. We therefore repeated the analysis with power calculated via a fast Fourier transform (FFT) instead of Hilbert power, in favour of a higher and stricter frequency resolution; because we averaged over a time period, the time domain was not relevant for this analysis. In this more conservative analysis, only 36 Hz tagged activity when expecting an auditory target correlated with early visual alpha activity.

      Additionally, we added correlation analyses between alpha activity and frequency-tagged activity within early visual areas, using the sensor cluster that showed significant condition differences in alpha activity. Here, no correlations between frequency-tagged activity and alpha activity could be found (apart from a small correlation with 40 Hz, which could not be confirmed by a median split; see Suppl. Fig. 14C). The absence of a significant correlation between early visual alpha and frequency-tagged activity has previously been described by others (Zhigalov & Jensen, 2020), and a Bayes factor below 1 also indicated that the alternative hypothesis is unlikely.

      Nonetheless, a correlation with the auditory signal is possible and could be explained in different ways. For example, it could be that very early auditory feedback in early visual cortex (see for example Brang et al., 2022) is transmitted alongside visual information to higher-order areas. Several studies have shown that alpha activity and visual as well as auditory processing are closely linked (Bauer et al., 2020; Popov et al., 2023). Inference on whether or how this link could play out in the case of this manuscript extends beyond the scope of this study.
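
      A trial-by-trial correlation with a median-split check, of the kind referred to above, can be sketched as follows. This is an illustrative example on simulated data and not the authors' code: the variable names, the effect size of 0.5, and the helper functions are assumptions, and no Bayes factor is computed here.

```python
import math
import random
from statistics import mean, median

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def median_split_difference(x, y):
    """Mean of y on trials with x above its median minus mean of y on the rest."""
    cut = median(x)
    high = [b for a, b in zip(x, y) if a > cut]
    low = [b for a, b in zip(x, y) if a <= cut]
    return mean(high) - mean(low)

# Simulated single-trial values (hypothetical): tagged power depends weakly
# on alpha power, plus independent noise.
rng = random.Random(1)
n_trials = 200
alpha = [rng.gauss(0, 1) for _ in range(n_trials)]
tagged = [0.5 * a + rng.gauss(0, 1) for a in alpha]

r = pearson_r(alpha, tagged)
split_diff = median_split_difference(alpha, tagged)
```

      When a correlation is real, `r` and `split_diff` should agree in sign. A small correlation that disappears or flips in the median split, as described for the 40 Hz case, is more likely to be driven by a few extreme trials.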

      To summarize, we believe the fact that 36 Hz activity within early visual areas does not correlate with alpha activity on a trial-by-trial basis, but that 36 Hz activity in other areas does, provides strong evidence that alpha activity affects downstream signal processing.

      We mention this analysis now in our discussion:

      L533-536: Our data provides evidence in favour of this view, as we can show that early sensory alpha activity does not covary over trials with SSEP magnitude in early visual areas, but covaries instead over trials with SSEP magnitude in higher order sensory areas (see also SUPPL. Fig. 14).

      Reviewer #1 (Recommendations for the authors):

      The evidence for the alternative hypothesis, that alpha in early sensory areas orchestrates downstream signal transmission, is not strong enough to be described up front in the abstract and Figure 1. I would leave it in the Discussion section, but advise against mentioning it in the abstract and Figure 1.

      We appreciate the reviewer’s concern regarding the inclusion of the alternative hypothesis—that alpha activity in early sensory areas orchestrates downstream signal transmission—in the abstract and Figure 1. While we agree that this interpretation is still developing, recent studies (Keitel et al., 2025; Clausner et al., 2024; Yang et al., 2024) provide growing support for this framework.

      In response, we have revised the introduction, discussion, and Figure 1 to clarify that our intention is not to outright dismiss the alpha inhibition hypothesis, but to refine and expand it in light of new data. This revision does not invalidate the prior literature on alpha timing and inhibition; rather, it proposes an updated mechanism that may better account for observed effects.

      We have, however, retained Figure 1, as it visually contextualizes the broader theoretical landscape, and we have added further analyses to strengthen our empirical support for this emerging view.

      References:

      Bastos, A. M., Litvak, V., Moran, R., Bosman, C. A., Fries, P., & Friston, K. J. (2015). A DCM study of spectral asymmetries in feedforward and feedback connections between visual areas V1 and V4 in the monkey. NeuroImage, 108, 460–475. https://doi.org/10.1016/j.neuroimage.2014.12.081

      Bauer, A. R., Debener, S., & Nobre, A. C. (2020). Synchronisation of Neural Oscillations and Cross-modal Influences. Trends in cognitive sciences, 24(6), 481–495. https://doi.org/10.1016/j.tics.2020.03.003

      Brang, D., Plass, J., Sherman, A., Stacey, W. C., Wasade, V. S., Grabowecky, M., Ahn, E., Towle, V. L., Tao, J. X., Wu, S., Issa, N. P., & Suzuki, S. (2022). Visual cortex responds to sound onset and offset during passive listening. Journal of neurophysiology, 127(6), 1547–1563. https://doi.org/10.1152/jn.00164.2021

      Clausner, T., Marques, J., Scheeringa, R., & Bonnefond, M. (2024). Feature specific neuronal oscillations in cortical layers. bioRxiv, 2024.07.31.605816. https://doi.org/10.1101/2024.07.31.605816

      Diederich, A., & Colonius, H. (2008). When a high-intensity "distractor" is better then a low-intensity one: modeling the effect of an auditory or tactile nontarget stimulus on visual saccadic reaction time. Brain research, 1242, 219–230. https://doi.org/10.1016/j.brainres.2008.05.081

      Haegens, S., Nácher, V., Luna, R., Romo, R., & Jensen, O. (2011). α-Oscillations in the monkey sensorimotor network influence discrimination performance by rhythmical inhibition of neuronal spiking. Proceedings of the National Academy of Sciences of the United States of America, 108(48), 19377–19382. https://doi.org/10.1073/pnas.1117190108

      Jacoby, O., Hall, S. E., & Mattingley, J. B. (2012). A crossmodal crossover: opposite effects of visual and auditory perceptual load on steady-state evoked potentials to irrelevant visual stimuli. NeuroImage, 61(4), 1050–1058. https://doi.org/10.1016/j.neuroimage.2012.03.040

      Keitel, A., Keitel, C., Alavash, M., Bakardjian, K., Benwell, C. S. Y., Bouton, S., Busch, N. A., Criscuolo, A., Doelling, K. B., Dugue, L., Grabot, L., Gross, J., Hanslmayr, S., Klatt, L.-I., Kluger, D. S., Learmonth, G., London, R. E., Lubinus, C., Martin, A. E., … Kotz, S. A. (2025). Brain rhythms in cognition – controversies and future directions. ArXiv. https://doi.org/10.48550/arXiv.2507.15639

      Nickerson, R. S. (1973). Intersensory facilitation of reaction time: energy summation or preparation enhancement? Psychological review, 80(6), 489–509. https://doi.org/10.1037/h0035437

      Popov, T., Gips, B., Weisz, N., & Jensen, O. (2023). Brain areas associated with visual spatial attention display topographic organization during auditory spatial attention. Cerebral cortex (New York, N.Y. : 1991), 33(7), 3478–3489. https://doi.org/10.1093/cercor/bhac285

      Salagovic, C. A., & Leonard, C. J. (2021). A nonspatial sound modulates processing of visual distractors in a flanker task. Attention, perception & psychophysics, 83(2), 800–809. https://doi.org/10.3758/s13414-020-02161-5

      Scheeringa, R., Petersson, K. M., Kleinschmidt, A., Jensen, O., & Bastiaansen, M. C. (2012). EEG α power modulation of fMRI resting-state connectivity. Brain connectivity, 2(5), 254–264. https://doi.org/10.1089/brain.2012.0088

      Spaak, E., Bonnefond, M., Maier, A., Leopold, D. A., & Jensen, O. (2012). Layer-specific entrainment of γ-band neural activity by the α rhythm in monkey visual cortex. Current biology : CB, 22(24), 2313–2318. https://doi.org/10.1016/j.cub.2012.10.020

      Yang, X., Fiebelkorn, I. C., Jensen, O., Knight, R. T., & Kastner, S. (2024). Differential neural mechanisms underlie cortical gating of visual spatial attention mediated by alpha-band oscillations. Proceedings of the National Academy of Sciences of the United States of America, 121(45), e2313304121. https://doi.org/10.1073/pnas.2313304121

      Zhigalov, A., & Jensen, O. (2020). Alpha oscillations do not implement gain control in early visual cortex but rather gating in parieto-occipital regions. Human brain mapping, 41(18), 5176–5186. https://doi.org/10.1002/hbm.25183

      Zumer, J. M., Scheeringa, R., Schoffelen, J. M., Norris, D. G., & Jensen, O. (2014). Occipital alpha activity during stimulus processing gates the information flow to object-selective cortex. PLoS biology, 12(10), e1001965. https://doi.org/10.1371/journal.pbio.1001965

    1. eLife assessment

      The ability to estimate the force of infection for Plasmodium falciparum from other more directly measurable epidemiological quantities is a useful contribution to malaria epidemiology. The authors propose a method to accomplish this using genetic data from the var genes of the Pf genome and novel applications of existing methods from queueing theory. While the simulations are sophisticated, the real-world application of the method is incomplete in its analysis and would benefit from clearer articulation of the assumptions being made. Given the lack of clarity in the methods and presentation of results, it is difficult to fully assess the performance of their proposed estimation procedure.

    2. Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections, and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and the accompanying appendix lack several important details. The equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g., A and S as the main random variables for inter-arrival times and service times; aR and bR, which are the known time-average quantities; and the squared coefficient of variation of the random variable, on which these rely and which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or just the distribution of the mean FOI taken over multiple simulations, and it also seems that they are estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g., the proportion of simulations in which the truth is captured in the 95% CI) is important. In many cases it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, Figure 1 under the mid-IRS panel), but it is not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times is varied widely; however, it is not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them. This is an essential component of the method, and it is impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear, since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length? Elsewhere the authors refer to these quantities as maximum likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood and of how fine the grid of tested mean and variance values is, since this could influence the overall quality of the estimation procedure.
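
      The quantitative assessment requested in comment (1c), the proportion of simulations in which the 95% CI captures the truth, could look like the following minimal sketch. Everything here is a hypothetical stand-in: per-host values are drawn from an exponential distribution with a known mean instead of coming from the authors' agent-based model, the estimator is a plain sample mean, and the interval is a percentile bootstrap.

```python
import random

def bootstrap_ci(sample, rng, n_boot=200, level=0.95):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    n = len(sample)
    boot_means = sorted(
        sum(rng.choices(sample, k=n)) / n for _ in range(n_boot)
    )
    k = round((1 - level) / 2 * n_boot)  # resampled means cut from each tail
    return boot_means[k], boot_means[n_boot - 1 - k]

def coverage(true_mean, n_sims=200, n_hosts=40, seed=0):
    """Fraction of simulated data sets whose CI contains `true_mean`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Hypothetical per-host values with a known true mean.
        sample = [rng.expovariate(1 / true_mean) for _ in range(n_hosts)]
        lower, upper = bootstrap_ci(sample, rng)
        hits += lower <= true_mean <= upper
    return hits / n_sims

cov = coverage(true_mean=5.0)
```

      A well-calibrated procedure would give `cov` close to the nominal 0.95. Coverage far below that, together with the sign of the average error, would make the suspected systematic underestimation explicit in a way the box plots cannot.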

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. For example, it would be useful to test a wide range of assumed infection durations and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3), it seems that this would not translate to a 4-year-old being immunologically naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population.

      b. The capacity parameter c seems to be quite important and is set at 30. However, the authors only describe trying values of 25 and 30 and claim that this does not impact FOI inference; it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why an increase in the carrying capacity will not influence the FOI inference. Absent that, this should be mentioned and discussed as a limitation.
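
      Why the assumed duration of infection matters so much can be seen from Little's law, L = λW, which holds for any stable queue: the mean number in the system (here, mean MOI) equals the rate of admitted arrivals (here, FOI) times the mean time in the system (here, mean infection duration). The sketch below uses this identity only. It is not the two-moment approximation used in the manuscript, it ignores the finite capacity c, and the MOI and duration values are hypothetical.

```python
def foi_from_littles_law(mean_moi, mean_duration_days):
    """Implied arrival rate (infections per host per year) from L = lambda * W."""
    mean_duration_years = mean_duration_days / 365.0
    return mean_moi / mean_duration_years

mean_moi = 2.0                       # hypothetical mean MOI, not from the manuscript
assumed_durations = [100, 200, 400]  # hypothetical mean infection durations in days

# Implied FOI under each assumed mean duration.
implied_foi = {d: foi_from_littles_law(mean_moi, d) for d in assumed_durations}
```

      For a fixed observed MOI, doubling the assumed mean duration halves the implied FOI. Any misspecification of the duration distribution therefore propagates directly into the FOI estimate, which is why a sensitivity analysis over a range of durations would be informative.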

    3. Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context.

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics.

      (3) The mathematical approach is simple and elegant, and thus easy to understand.

      Weaknesses:

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates.

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals.

    4. Reviewer #3 (Public Review):

      Summary:

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI.

      Strengths:

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics.

      Weaknesses:

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153) and I feel it would be appropriate to differentiate this in the discussion.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone).

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate MOI individual infections and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the author's proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and accompanying appendix is lacking several important details. Equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important to understand the method (e.g. A, S as the main random variables for inter-arrival times and service times, aR and bR which are the known time average quantities, and these also rely on the squared coefficient of variation of the random variable which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities, and to understand the assumptions of this method it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We plan to clarify the method, including all the relevant variables in our revised manuscript. The reviewer is correct in pointing out that there are more sections and equations in Choi et al., including the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation for the queue-length distribution. Since only the latter was directly utilized in our work, we included in the first version of our manuscript only material on this section and not the other. We agree with the reviewer on readers benefiting from additional information on the derivation of the exact expression for the steady-state queue-length distribution. Therefore, we will summarize the derivation of this expression in our revised manuscript. Regarding the assumptions of the method we applied, especially those for going from the exact expression to the two-moment approximation, we did describe these in the Materials and Methods of our manuscript. We recognize from this comment that the writing and organization of this information may not have been sufficiently clear. We had separated the information on this method into two parts, with the descriptive summary placed in the Materials and Methods and the equations or mathematical formula placed in the Appendix. This can make it difficult for readers to connect the two parts and remember what was introduced earlier in the Materials and Methods when reading the equations and mathematical details in the Appendix. For our revised manuscript, we plan to cover both parts in the Materials and Methods, and to provide more of the technical details in one place, which will be easier to understand and follow.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it's very difficult to follow.

      We thank the reviewer for this suggestion. We will add a diagram illustrating the connection between the queueing procedure and malaria transmission.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. proportion of simulations where the truth is captured in the 95% CI or something like this) is important; in many cases it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, Figure 1 under the mid-IRS panel). But it's not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      There appears to be some confusion on what we display in some key figures. We will clarify this further both here and in the revised text. In Figures 1, 2, and 10-14, we displayed the bootstrapped distributions including the 95% CIs. These figures do not show the distribution of the mean FOI taken over multiple simulations. We estimated mean FOI on an annual basis per host in the following sense. Both of our proposed methods require either a steady-state queue length distribution, or moments of this distribution for FOI inference. However, we only have one realization or observation for each individual host, and we do not have access to either the time-series observation of a single individual’s MOI or many realizations of a single individual’s MOI at the same sampling time. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we do have a queue length distribution at the population level for both the simulation output and the empirical data, which can be obtained by simply aggregating MOI estimates across all sampled individuals. We use this population-level queue length distribution to represent and approximate the steady-state queue length distribution at the individual level. Such representation or approximation does not consider explicitly any individual heterogeneity due to biology or transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation output is obtained from dividing the total FOI of all hosts per year by the total number of all hosts. Therefore, our estimator, combined with the demographic information on population size, is for the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year.

      We evaluated the impact of individual heterogeneity on FOI inference by introducing individual heterogeneity into the simulations. With a considerable amount of transmission heterogeneity across individuals (namely 2/3 of the population receiving more than 90% of all bites whereas the remaining 1/3 receives the rest of the bites), our two methods exhibit performance similar to that in the homogeneous transmission scenarios.

      Concerning the second point, we will add a quantitative assessment of the ability of the estimator to recover the truth across simulations and include this information in the legend of each figure. In particular, we will provide the proportion of simulations where the truth is captured by the entire bootstrap distribution, in addition to some measure of relative deviation, such as the relative difference between the true FOI value and the median of the bootstrap distribution for the estimate. This assessment will be a valuable addition, but please note that the comparisons we have provided in a graphical way do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” is here relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
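
      As an illustration only (not the authors' code), the two summaries proposed above, the proportion of simulations in which the truth falls inside the bootstrap distribution and the relative deviation of the bootstrap median from the truth, could be computed along the following lines. The function name `assess_estimator`, its arguments, and the use of a percentile interval are assumptions made for this sketch.

```python
import numpy as np

def assess_estimator(true_foi, bootstrap_samples, level=0.95):
    """Summarize how well bootstrap FOI distributions recover known true values.

    true_foi: one true FOI value per simulation (hypothetical input).
    bootstrap_samples: one 1-D array of bootstrap FOI estimates per simulation.
    """
    alpha = (1.0 - level) / 2.0
    in_ci, in_range, rel_dev = [], [], []
    for truth, boot in zip(true_foi, bootstrap_samples):
        boot = np.asarray(boot, dtype=float)
        # Percentile interval of the bootstrap distribution (an assumed choice).
        lo, hi = np.quantile(boot, [alpha, 1.0 - alpha])
        in_ci.append(lo <= truth <= hi)
        # Coverage by the entire bootstrap distribution (min to max).
        in_range.append(boot.min() <= truth <= boot.max())
        # Signed relative difference between bootstrap median and truth.
        rel_dev.append((np.median(boot) - truth) / truth)
    return {
        "ci_coverage": float(np.mean(in_ci)),
        "range_coverage": float(np.mean(in_range)),
        "median_relative_deviation": float(np.median(rel_dev)),
    }
```

      A negative `median_relative_deviation` would indicate systematic underestimation, which is the pattern the reviewer asks about.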

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times are varied widely; however, it's not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them; this is an essential component of the method and it's impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady state queue length distribution? At other places the authors refer to these quantities as Maximum Likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood, and how fine the grid of values they tested is for their mean and variance since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. We will specify the ranges for the choice of mean and variance parameters for inter-arrival times as well as the grid of values tested in the corresponding figure caption or in a separate supplementary table. We maximized the likelihood of observing the set of individual MOI estimates in a sampled population given steady queue length distributions (with these distributions based on the two-moment approximation method for different combinations of the mean and variance of inter-arrival times). We will add a section to either the Materials and Methods or the Appendix in our revised manuscript including an explicit formulation of the likelihood.

      We will add example figures on the shape of the likelihood to the Appendix. We will also test how choices of the grid of values influence the overall quality of the estimation procedure. Specifically, we will further refine the grid of values to include more points and examine whether the results of FOI inference are consistent and robust against each other.
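
      To make the idea of a grid-based likelihood concrete, here is a minimal sketch under strong simplifying assumptions. It is not the two-moment approximation used in the manuscript: as a stand-in for the steady-state queue length distribution it uses the truncated Poisson distribution of an M/G/c/c loss system, which depends only on the mean inter-arrival time and the mean duration of infection, so the grid here is one-dimensional rather than over mean and variance. All function and parameter names are hypothetical.

```python
import numpy as np
from math import lgamma

def truncated_poisson_pmf(offered_load, capacity):
    """Stand-in queue-length pmf on 0..capacity (M/G/c/c loss system).

    NOTE: an assumed substitute for the two-moment approximation.
    """
    k = np.arange(capacity + 1)
    log_w = k * np.log(offered_load) - np.array([lgamma(i + 1) for i in range(capacity + 1)])
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return w / w.sum()

def log_likelihood(moi_values, pmf):
    """Log-likelihood of independent MOI observations under a queue-length pmf."""
    counts = np.bincount(np.asarray(moi_values, dtype=int), minlength=len(pmf))
    if len(counts) > len(pmf):
        raise ValueError("an observed MOI exceeds the assumed capacity")
    observed = counts > 0
    return float(np.sum(counts[observed] * np.log(pmf[observed])))

def grid_search_mle(moi_values, mean_interarrival_grid, mean_duration, capacity=30):
    """Evaluate the log-likelihood on a grid and return the maximizing grid point."""
    grid = np.asarray(mean_interarrival_grid, dtype=float)
    loglik = np.array([
        # Offered load = arrival rate x mean duration, in consistent time units.
        log_likelihood(moi_values, truncated_poisson_pmf(mean_duration / m, capacity))
        for m in grid
    ])
    return grid[int(np.argmax(loglik))], loglik
```

      Plotting the returned `loglik` array against the grid is one way to inspect the shape of the likelihood, and rerunning with a finer grid shows whether the maximizing value is stable. In this sketch the arrival rate is taken as the reciprocal of the selected mean inter-arrival time.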

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection duration and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction, and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3) it seems that this would not translate to 4-year-old being immune naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population. 

      The reviewer is indeed correct about the difficulty of empirically measuring the duration of infection for 1-5-year-olds, and that of further testing whether these 1-5-year-olds exhibit the same distribution for duration of infection as naïve adults co-infected with syphilis. We will nevertheless continue to use the described method for duration of infection, while better acknowledging and discussing the limitations this aspect of the method introduces. We note that the infection duration from the historical clinical data we have relied on is being used in the malaria modeling community as one of the credible sources for this parameter of untreated natural infections in malaria-naïve individuals in malaria-endemic settings of Africa (e.g. in the agent-based model OpenMalaria, see 1).

      It is important to emphasize that the proposed methods apply to the MOI estimates for naïve or close to naïve patients. They are not suitable for FOI inference for the school-aged children and the adult populations of high-transmission endemic regions, since individuals in these age classes have been infected many times and their duration of infection is significantly shortened by their immunity. To reduce the degree of misspecification in infection duration and take full advantage of our proposed methods, we will emphasize in the revision the need to prioritize in future data collection and sampling efforts the subpopulation class who has received either no infection or a minimum number of infections in the past, and whose immune profile is close to that of naïve adults, for example, infants. This emphasis is aligned with the top priority of all intervention efforts in the short term, which is to monitor and protect the most vulnerable individuals from severe clinical symptoms and death.

      Also, force of infection for naïve hosts is a key basic parameter for epidemiological models of a complex infectious disease such as falciparum malaria, whether for agent-based formulations or equation-based ones. This is because force of infection for non-naïve hosts is typically a function of their immune status and the force of infection of naïve hosts. Thus, knowing the force of infection of naïve hosts can help parameterize and validate these models by reducing degrees of freedom.

      b. The evaluation of the capacity parameter c seems to be quite important and is set at 30, however, the authors only describe trying values of 25 and 30, and claim that this does not impact FOI inference, however it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why the carrying capacity increase will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation. 

      Thank you for this question. We will investigate more values of the parameter c systematically, including substantially higher ones. We note however that this quantity is the carrying capacity of the queuing system, or the maximum number of blood-stage strains that an individual human host can be co-infected with. We do have empirical evidence for the value of the latter being around 20 (2). This observed value provides a lower bound for parameter c. To account for potential under-sampling of strains, we thus tried values of 25 and 30 in the first version of our manuscript.

      In general, this parameter influences the steady-state queue length distribution based on the two-moment approximation, more specifically, the tail of this distribution when the flow of customers/infections is high. Smaller values of parameter c put a lower cap on the maximum value possible for the queue length distribution. The system is more easily “overflowed”, in which case customers (or infections) often find that there is no space available in the queuing system/individual host upon their arrival. These customers (or infections) will not increment the queue length. The parameter c has therefore a small impact for the part of the grid resulting in low flows of customers/infection, for which the system is unlikely to be overflowed. The empirical MOI distribution centers around 4 or 5 with most values well below 10, and only a small fraction of higher values between 15-20 (2). When one increases the value of c, the part of the grid generating very high flows of customers/infections results in queue length distributions with a heavy tail around large MOI values that are not supported by the empirical distribution. We therefore do not expect that substantially higher values for parameter c would change either the relative shape of the likelihood or the MLE.
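
      The qualitative argument above can be illustrated numerically. The sketch below again uses the truncated Poisson distribution of an M/G/c/c loss system as an assumed stand-in for the two-moment approximation, and compares the queue length distributions obtained with two capacities at a low and at a high offered load (arrival rate times mean duration). The function names and the specific loads are illustrative assumptions only.

```python
import numpy as np
from math import lgamma

def loss_system_pmf(offered_load, capacity):
    """Stand-in queue-length pmf on 0..capacity (truncated Poisson)."""
    k = np.arange(capacity + 1)
    log_w = k * np.log(offered_load) - np.array([lgamma(i + 1) for i in range(capacity + 1)])
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def capacity_sensitivity(offered_load, c_small, c_large):
    """Compare queue-length pmfs for two capacities at a given offered load.

    Returns the total variation distance between the two pmfs and the
    probability that the system is full under each capacity.
    """
    p_small = loss_system_pmf(offered_load, c_small)
    p_large = loss_system_pmf(offered_load, c_large)
    padded = np.zeros_like(p_large)
    padded[: c_small + 1] = p_small  # zero mass above the smaller capacity
    tv_distance = 0.5 * np.abs(padded - p_large).sum()
    return tv_distance, p_small[-1], p_large[-1]

# Low flow: the capacity is almost never reached, so c = 25 vs c = 30 barely matters.
tv_low, _, _ = capacity_sensitivity(5.0, 25, 30)
# High flow: the system is often full, and the choice of c changes the distribution.
tv_high, full_25, full_30 = capacity_sensitivity(40.0, 25, 30)
```

      Under this stand-in, the capacity only matters for grid points that imply very high flows, which is consistent with the reasoning given in the response; whether the same holds for the two-moment approximation would need to be checked directly.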

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context. 

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics. 

      (3) The mathematical approach is simple and elegant, and thus easy to understand. 

      Weaknesses: 

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates. 

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection from historical clinical data. Please see our response to reviewer 1 comment 2a.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration. 

      We thank the reviewer for pointing out a potential improvement to the work. We acknowledge that FOI is inferred from MOI, and thus is dependent on the information contained in MOI. FOI reflects risk of infection, is associated with risk of clinical episodes, and can relate local variation in malaria burden to transmission better than other proxy parameters for transmission intensity. It is possible that MOI can be as informative as FOI when one regresses the risk of clinical episodes and local variation in malaria burden with MOI. But MOI by definition is a number and not a rate parameter. FOI for naïve hosts is a key basic parameter for epidemiological models. This is because FOI of non-naïve hosts is typically a function of their immune status and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts can help parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI provides a useful step.

      Given the difficulty of measuring infection duration, estimating infection duration and FOI simultaneously appears to be an attractive alternative, as the referee pointed out. This will require however either cohort studies or more densely sampled cross-sectional surveys due to the heterogeneity in infection duration across a multiplicity of factors. These kinds of studies have not been, and will not be, widely available across geographical locations and time. This work aims to utilize more readily available data, in the form of sparsely sampled single-time-point cross-sectional surveys.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates. 

      We thank the reviewer for pointing out aspects of the work that can be further clarified. It is difficult to disentangle the effect of drug treatment on measurement, including infection status, MOI, and duration of infection. Thus, we did not attempt to address this matter explicitly in the original version of our manuscript. Instead, we considered two extreme scenarios which bound reality, well summarized by the reviewer. First, if drug treatment has had no impact on measurement, the MOI of the drug-treated 1-5-year-olds would reflect their true underlying MOI. We can then use their MOI directly for FOI inference. Second, if the drug treatment had a significant impact on measurement, i.e., if it completely changed the infection status, MOI, and duration of infection of drug-treated 1-5-year-olds, we would need to either exclude those individuals’ MOI or impute their true underlying MOI. We chose to do the latter in the original version of the manuscript. If those 1-5-year-olds had not received drug treatment, they would have had MOI values similar to those of the non-treated 1-5-year-olds. We can then impute their MOI by sampling from the MOI estimates of non-treated 1-5-year-olds.

      The reviewer is correct in pointing out that this imputation does not add additional information and can potentially deflate the variability of MOI distributions, compared to simply excluding those drug-treated 1-5-year-olds from the analysis. Thus, we can include in our revision FOI estimates with the drug-treated 1-5-year-olds excluded in the estimation.
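
      The three ways of handling drug-treated individuals discussed here (using their measured MOI, imputing it from untreated individuals, or excluding them) can be sketched as follows. This is an assumed, simplified resampling scheme for illustration, not the authors' bootstrap imputation procedure; the function name, arguments, and strategy labels are hypothetical.

```python
import numpy as np

def bootstrap_moi(moi, treated, strategy, n_boot=1000, seed=0):
    """Return bootstrap replicates of an MOI sample under one handling of treated hosts.

    strategy:
      "keep"    - use the measured MOI of treated hosts as is
      "impute"  - replace treated hosts' MOI by draws from untreated hosts
      "exclude" - drop treated hosts before resampling
    """
    rng = np.random.default_rng(seed)
    moi = np.asarray(moi)
    treated = np.asarray(treated, dtype=bool)
    untreated_moi = moi[~treated]
    replicates = []
    for _ in range(n_boot):
        if strategy == "keep":
            pool = moi
        elif strategy == "exclude":
            pool = untreated_moi
        elif strategy == "impute":
            pool = moi.copy()
            # Assumes treated and untreated hosts share one MOI distribution.
            pool[treated] = rng.choice(untreated_moi, size=int(treated.sum()), replace=True)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        # Resample hosts with replacement to form one bootstrap replicate.
        replicates.append(rng.choice(pool, size=pool.size, replace=True))
    return replicates
```

      Comparing the spread of a downstream estimate across replicates from "impute" and "exclude" is one way to see the effect the reviewer describes: imputation keeps the larger sample size without adding information, so it can make the estimate look more precise than the data warrant.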

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals. 

      We imputed the MOI values of microscopy-negative but PCR-positive 1-5-year-olds by sampling from the microscopy-positive 1-5-year-olds, effectively assuming that both have the same, or similar, MOI distributions. We did so because there is a weak relationship in our Ghana data between the parasitemia level of individual hosts and their MOI (or detected number of var genes, on the basis of which the MOI values themselves were estimated). Parasitemia levels underlie the difference in detection sensitivity of PCR and microscopy.

      We will elaborate on this matter in our revised manuscript and include information from our previous and on-going work on the weak relationship between MOI/the number of var genes detected within an individual host and their parasitemia levels. We will also discuss potential reasons or hypotheses for this pattern.

      Reviewer #3 (Public Review):

      Summary: 

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI. 

      Strengths: 

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics. 

      Weaknesses: 

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153) and I feel it would be appropriate to differentiate this in the discussion.

      We thank the reviewer for this comment, although we think there is a misunderstanding about what can and cannot be practically validated in the sense of a “true” measure of FOI that would be free from assumptions for a complex disease such as malaria. We would not want the results to be over-interpreted and will extend the discussion of what we have done to test the methods. We note that for the performance evaluation of statistical methods, the use of simulation output is quite common and often a necessary and important step. In some cases, the simulation output is generated by dynamical models, whereas in others, by purely descriptive ones. All these models make their own assumptions, which are necessarily a simplification of reality. The stochastic agent-based model (ABM) of malaria transmission utilized in this work has been shown to reproduce several important patterns observed in empirical data from high-transmission regions, including aspects of strain diversity which are not represented in simpler models.

      It is not clear to us in what sense this ABM makes a set of biological and structural assumptions that are “probably similar” to those of the queuing methods we present. We agree that the best approach is to rely on models whose structural assumptions differ from those of the method or model being tested. Our proposed methods for FOI inference based on queuing theory rely on the duration of infection distribution and the MOI distribution among sampled individuals, both of which can be direct outputs from the ABM. But these methods are agnostic about the specific mechanisms or biology underlying the regulation of duration and MOI.

      Another important point raised by this comment is what would be the “true” FOI value against which to validate our methods. Empirical MOI-FOI pairs for FOI measured directly by tracking cohort studies are still lacking. There are potential measurement errors for both MOI and FOI because the polymorphic markers typically used in different cohort studies cannot differentiate hyper-diverse antigenic strains fully and well (5). Also, these cohort studies usually start with drug treatment. Alternative approaches do not provide a measure of true FOI, in the sense of the estimation being free from assumptions. For example, one approach would be to fit epidemiological models to densely sampled/repeated cross-sectional surveys for FOI inference. In this case, no FOI is measured directly and further benchmarked against fitted FOI values. The evaluation of these models is typically based on how well they can capture other epidemiological quantities which are more easily sampled or measured, including prevalence or incidence. This is similar to what is done in this work. We selected the FOI values that maximize the likelihood of observing the given distribution of MOI estimates. Furthermore, we paired our estimated FOI value for the empirical data from Ghana with another independently measured quantity EIR (Entomological Inoculation Rate), typically used in the field as a measure of transmission intensity. We check whether the resulting FOI-EIR point is consistent with the existing set of FOI-EIR pairs and the relationship between these two quantities from previous studies. We acknowledge that as for model fitting approaches for FOI inference, our validation is also indirect for the field data.
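
      The idea of selecting the FOI value that maximizes the likelihood of the observed MOI distribution can be illustrated with a deliberately simplified sketch. It is not the authors' two-moment approximation or Little's law method. As a stand-in, it assumes an infinite-server queue with Poisson arrivals, for which the steady-state MOI is Poisson with mean FOI × mean infection duration. The function names, the grid, the MOI sample, and the mean duration of 0.5 years are all hypothetical.

```python
import math

def poisson_log_likelihood(moi_values, mean_moi):
    """Log-likelihood of MOI values under a Poisson model with the given mean."""
    return sum(
        k * math.log(mean_moi) - mean_moi - math.lgamma(k + 1)
        for k in moi_values
    )

def fit_foi(moi_values, mean_duration_years, foi_grid):
    """Grid search for the FOI (infections per host per year) with the highest likelihood.

    Stand-in assumption: MOI ~ Poisson(FOI * mean infection duration).
    """
    return max(
        foi_grid,
        key=lambda foi: poisson_log_likelihood(moi_values, foi * mean_duration_years),
    )

# Hypothetical MOI values for ten sampled hosts (0 = uninfected).
moi_sample = [0, 1, 1, 2, 3, 2, 1, 0, 4, 2]

# Candidate FOI values from 0.1 to 20.0 in steps of 0.1.
foi_grid = [i / 10 for i in range(1, 201)]

foi_hat = fit_foi(moi_sample, mean_duration_years=0.5, foi_grid=foi_grid)
print(foi_hat)
```

      Under this stand-in model the estimate is simply the sample mean MOI divided by the mean infection duration; the grid search is shown only because it mirrors the "select the value that maximizes the likelihood" step described above.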

      Prompted by the reviewer’s comment, we will discuss this matter in more detail in our revised manuscript, including clarifying further certain basic assumptions of our agent-based model, emphasizing the indirect nature of the validation with the field data and the existing constraints for such validation.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone). 

      We thank the reviewer for this comment. We will add supplementary figures for the MOI distributions generated by the queuing theory method (i.e., the two-moment approximation method) and our agent-based model in our revised manuscript.

      In the first version of our manuscript, we considered two extreme scenarios which bound the reality, instead of simply assuming that drug treatment does not impact the infection status, MOI, and duration of infection. See our response to reviewer 2 point (3). The resulting FOI estimates differ, but not substantially, across the two extreme scenarios, partially because drug-treated individuals’ MOI distribution is similar to that of non-treated individuals (the apparent lack of impact of drug treatment on MOI pointed out by the referee). We will consider adding a formal test to quantify the difference between the two MOI distributions and its significance. We will discuss which of the two extreme scenarios reality is closer to, given the result of the formal test. We will also discuss in our revision possible reasons/hypotheses underlying the impact of drug treatment on MOI from the perspective of the nature, efficiency, and duration of the drugs administered.

      Regarding the last point of the reviewer, on understanding the relationship between MOI and FOI, we are not fully clear about what was meant. We are also confused about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as the reviewer suggesting a better understanding of the relationship between MOI and FOI, either between their distributions, or the moments of their distributions, perhaps by fitting models including simple linear regression models. This approach is in principle possible, but it is not the focus of this work. It will be equally difficult to evaluate the performance of this alternative approach given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Moreover, the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should correspond to more narrow or concentrated MOI distributions, whereas more variable FOI values should correspond to more spread-out ones. We will discuss this matter in our revised manuscript.

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying. 

      We thank the reviewer for this helpful comment as it is fundamental that there is no confusion on the basic definitions. EIR, the entomological inoculation rate, is closely related to the force of infection but is not equal to it. EIR is the rate of arrival of infectious bites and is measured by focusing on the mosquito vectors that are infectious and arrive to bite a given host. Not all these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, as FOI is defined as the rate at which a host acquires infection. This definition comes from more general models for the population dynamics of infectious diseases. (For diseases simpler than malaria, with no super-infection, the typical SIR models define the force of infection as the rate at which a susceptible individual becomes infected.) For malaria, force of infection refers to the number of new blood-stage infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.

      We agree however with the referee that there could be some confusion in our definition resulting from the approach we use to estimate the MOI distribution (which provides the basis for estimating FOI). In particular, we rely on the non-existent to very low overlap of var repertoires among individuals with MOI=1, an empirical pattern we have documented extensively in previous work (see 2, 3, and 4). The method of varcoding and its Bayesian formulation rely on the assumption of negligible overlap. We note that other approaches for estimating MOI (and FOI) based on other polymorphic markers also make this assumption (reviewed in 5). Ultimately, the FOI we seek to estimate is the one defined as specified above and in both the abstract and introduction, consistent with the epidemiological literature. We will include clarification in the introduction and discussion of this point in the revision.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method. 

      We will modify the relevant sentences to use “consistent” instead of “robust”.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology. 

      We thank the reviewer for this comment. As also mentioned in the response to reviewer 1’s comments, we will reorganize and rewrite parts of the text in our revision to improve clarity.

      References and Notes

      (1)   Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2)   Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3)   Day, K. P. et al. Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. U.S.A., 114(20), 4103-4111 (2017).

      (4)   Ruybal-Pesántez, S. et al. Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep., 7(11810) (2017).

      (5)   Labbé, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19(1) (2023).

    1. eLife Assessment

      The ability to estimate the force of infection for Plasmodium falciparum from other more directly measurable epidemiological quantities would contribute to malaria epidemiology. The authors propose a method to accomplish this using genetic data from the var genes of the Pf genome and novel applications of existing methods from queueing theory. After revising the manuscript, this is a useful contribution to the field and the authors provide solid evidence to support it.

    2. Reviewer #2 (Public review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      - The use of historical clinical data is very clever in this context

      - The simulations are very sophisticated with respect to trying to capture realistic population dynamics

      - The mathematical approach is simple and elegant, and thus easy to understand

      Weaknesses:

      - The assumptions of the approach are quite strong, and the authors have made clear that applicability is constrained to individuals with immune profiles that are similar to malaria naive patients with neurosyphilis. While the historical clinical data is a unique resource and likely directionally correct, it remains somewhat dubious to use the exact estimated values as inputs to other models without extensive sensitivity analysis.

      Comments on revisions:

      The authors have adequately responded to all comments.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have adequately responded to all comments.

      We thank Reviewer 1 for their positive assessment of our previous round of revisions.

      Reviewer #2 (Public review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      - The use of historical clinical data is very clever in this context

      - The simulations are very sophisticated with respect to trying to capture realistic population dynamics

      - The mathematical approach is simple and elegant, and thus easy to understand

      Weakness:

      The assumptions of the approach are quite strong, and the authors have made clear that applicability is constrained to individuals with immune profiles that are similar to malaria naive patients with neurosyphilis. While the historical clinical data is a unique resource and likely directionally correct, it remains somewhat dubious to use the exact estimated values as inputs to other models without extensive sensitivity analysis.

      We thank reviewer 2 for their comments on our previous round of revisions. The statement here that “it remains somewhat dubious to use the exact estimated values as inputs to other models” suggests that we may not have been sufficiently clear on how infection duration is represented in our agent-based model (ABM) of malaria population dynamics. Because our analysis uses simulated outputs from the ABM to validate the performance of the two queuing-theory methods, we believe this point warrants clarification, which we provide below.

      When simulating with the ABM, we do not use empirical estimates of infection duration in immunologically naïve individuals from the historical clinical data as direct inputs. Instead, infection duration emerges from the within-host dynamics modeled in the ABM (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision). Briefly, each Plasmodium falciparum parasite carries approximately 50-60 var genes, each encoding a distinct variant surface antigen expressed during the blood stage of infection. Empirical evidence[1,2] indicates that these var genes are expressed largely sequentially. If a host has previously encountered the antigenic product of a given var gene and retains immunity to it, subject to waning at empirically estimated rates[3,4], the corresponding parasite subpopulation is rapidly cleared. Conversely, if the host is naïve to that gene, it takes approximately seven days for the immune system to mount an effective antibody response, resulting in a rapid decline or elimination of the expressed variant[5]. This seven-day timescale aligns with the duration of each successive parasitemia peak observed in Plasmodium falciparum infections[6,7], each arising primarily from the expression of a single var gene and occasionally from a small number of var genes.

      In our previous analyses, we therefore modeled an average expression duration of seven days per gene in naïve hosts. Specifically, the switching time to the next gene was drawn from an exponential distribution with a mean of seven days. Each var gene is represented as a linear combination of two epitopes (alleles), based on the empirical characterization of two hypervariable regions in the var tag region[8], and immunity is acquired against these alleles. Immunity to one allele of a given gene reduces its average expression duration by approximately half, whereas immunity to both alleles results in an immediate switch to another var gene within the infection. Consequently, the total duration of infection is proportional to the number of unseen alleles by the host across all var genes expressed during that infection (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision).
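
      The per-gene rules in this paragraph can be sketched as follows. This is a simplified illustration, not the ABM itself: the function names, the integer allele labels, and the 50-gene repertoire are hypothetical, and the sketch ignores everything else the ABM models (immune waning, acquisition of immunity during the infection, and co-infection).

```python
import random

def infection_duration(genes, immune_alleles, mean_days=7.0, rng=None):
    """Sample the total infection duration (days) for one infection.

    genes: list of (allele_a, allele_b) pairs, one pair per var gene.
    immune_alleles: set of alleles the host is already immune to.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for allele_a, allele_b in genes:
        n_known = (allele_a in immune_alleles) + (allele_b in immune_alleles)
        if n_known == 2:
            continue  # immunity to both alleles: immediate switch to the next gene
        # Immunity to one allele halves the mean expression duration.
        mean = mean_days if n_known == 0 else mean_days / 2
        total += rng.expovariate(1.0 / mean)  # exponential switching time
    return total

def expected_duration(genes, immune_alleles, mean_days=7.0):
    """Expected total duration: mean_days / 2 for every allele unseen by the host."""
    unseen = sum(
        (a not in immune_alleles) + (b not in immune_alleles) for a, b in genes
    )
    return unseen * mean_days / 2

# Hypothetical repertoire of 50 var genes with distinct alleles.
genes = [(i, i + 100) for i in range(50)]
naive_host = set()

print(expected_duration(genes, naive_host))   # expectation for a naive host
print(infection_duration(genes, naive_host))  # one stochastic realization
```

      The helper `expected_duration` makes the proportionality stated in the text explicit: under these rules each unseen allele contributes half of the mean per-gene expression duration to the expected total.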

      Prompted by the reviewer’s comments, in this revision we additionally tested mean expression durations of 7.5 and 8 days per var gene, applied in combination with an extension of the within-host rules (see the next paragraph for motivation and details). Although differences among the three mean expression durations are modest at the per-gene level, when aggregated across all var genes expressed within an individual parasite, the resulting total infection duration can differ by several months. The resulting distributions of infection duration across immunologically naïve individuals and those aged 1-5 years, together with those generated under our previous simulation settings, span a range of means and variances that extends above and below, and therefore encompasses, values comparable to those of the historical clinical data from naïve neurosyphilis patients treated with P. falciparum malaria. We have provided example supplementary figures illustrating that the distributions of infection duration from the simulated outputs overlap with, and closely resemble, the empirical distribution from the historical clinical data (Appendix 1-Figure 27-32).

      We considered the following modification of the within-host rules. In our previous ABM simulations, we had assumed that an infection would clear only once the parasite had exhausted its entire var gene repertoire, that is, after every var gene had been expressed and recognized. However, biological evidence indicates that clearance can occur earlier for several reasons, including stochastic extinction before full repertoire exhaustion. Even if some var genes remain unexpressed, an infection can terminate due to demographic stochasticity once parasite densities fall to very low levels. This decline in parasite densities may result from non-variant-specific immune mechanisms or from cross-immunity among var genes that share sequence similarity or alleles[9,10,11], both of which can substantially reduce parasite numbers. To model the possibility of termination or clearance before full repertoire exhaustion, we implemented a simple scenario in which there is a small probability of clearing the current infection while a given var gene, whether non-final or final, is being expressed. This probability is a function of the host’s pre-existing immunity to the two epitopes (alleles) of that gene, thereby capturing in a parsimonious manner the effects of cross-immunity among sequence- or allele-sharing var genes in reducing parasitemia. Specifically, it is modeled as a Bernoulli draw whose success probability equals the immunity level against the gene (0 for no immunity to either epitope, 0.5 for immunity to one epitope, and 1 for immunity to both epitopes) multiplied by a constant factor of 0.025. Thus, the probability scales with pre-existing variant-specific immunity to the gene but remains small overall, while introducing additional variance into the emergent distribution of total infection duration across hosts.
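
      The early-clearance rule can be written down directly from the numbers given above. The sketch below is illustrative only: the function names and the trial count are hypothetical, and how the draw is scheduled within the ABM (for example, once per expressed gene) is an assumption not confirmed by the text.

```python
import random

CLEARANCE_FACTOR = 0.025  # constant factor stated in the text

def clearance_probability(n_known_epitopes):
    """Success probability of the Bernoulli draw for one expressed var gene.

    Immunity level is 0, 0.5, or 1 for immunity to 0, 1, or 2 epitopes.
    """
    if n_known_epitopes not in (0, 1, 2):
        raise ValueError("a var gene is modeled with exactly two epitopes")
    return (n_known_epitopes / 2) * CLEARANCE_FACTOR

def clears_early(n_known_epitopes, rng):
    """Bernoulli draw: True if the infection clears while this gene is expressed."""
    return rng.random() < clearance_probability(n_known_epitopes)

# Empirical frequency of early clearance for a gene with both epitopes known.
rng = random.Random(1)
n_trials = 10_000
early = sum(clears_early(2, rng) for _ in range(n_trials))
print(early / n_trials)
```

      With no pre-existing immunity to either epitope the probability is exactly zero, so under this rule a fully naive host can only clear an infection by exhausting the repertoire.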

      We acknowledge that the ABM used to simulate malaria population dynamics cannot capture all mechanisms and complexities underlying within-host processes, many of which remain poorly understood. However, we emphasize that the resulting distributions of infection duration generated by the ABM span a broad range of means, variances, and shapes, including distributions that closely match those observed in the historical clinical data. Because the queuing-theory methods rely only on the mean and variance of infection duration to estimate the force of infection (FOI), these scenarios, which collectively span and encompass values comparable to the empirical ones, provide an appropriate basis for evaluating the performance of the methods using simulated outputs. We have added supplementary figures (see Appendix 1-Figure 16-22) illustrating the corresponding FOI inference results when we allow for clearance before the complete expression of the var repertoire, and the accuracy of FOI estimation remains comparable across all the scenarios examined.

      Finally, we emphasize that the application of the queuing-theory methods to the simulated outputs and to the Ghana field survey data involves two self-contained steps. For the simulations, FOI is inferred directly from the emergent distributions of infection duration generated by the ABM. For the Ghana surveys, FOI is inferred using the historical clinical data, which remains one of the few credible and widely used empirical sources for infection duration in immunologically naïve individuals[6]. By exploring different mean expression durations and within-host rules in the ABM, which generates distributions of infection duration that span and encompass those comparable to the empirical distribution, we demonstrate that the queuing-theory methods perform comparably across diverse scenarios and are well suited for application to the Ghana field surveys.

      We expanded the section on within-host dynamics in Appendix 1 to elaborate on this point (Lines 817-854).

      Reviewer #3 (Public review):

      I think the authors gave a robust but thorough response to our reviews and made some important changes to the manuscript which certainly clarify things for me.

      We thank Reviewer 3 for their positive feedback on our previous round of revisions.

      References

      (1) Zhang, X. & Deitsch, K. W. The mystery of persistent, asymptomatic Plasmodium falciparum infections. Curr. Opin. Microbiol 70, 102231 (2022).

      (2) Deitsch, K. W. & Dzikowski, R. Variant gene expression and antigenic variation by malaria parasites. Annu. Rev. Microbiol. 71, 625–641 (2017).

      (3) Collins, W. E., Skinner, J. C. & Jeffery, G. M. Studies on the persistence of malarial antibody response. American journal of epidemiology, 87(3), 592–598 (1968).

      (4) Collins, W. E., Jeffery, G. M. & Skinner, J. C. Fluorescent Antibody Studies in Human Malaria. II. Development and Persistence of Antibodies to Plasmodium falciparum. The American journal of tropical medicine and hygiene, 13, 256–260 (1964).

      (5) Gatton, M. L., & Cheng, Q. Investigating antigenic variation and other parasite-host interactions in Plasmodium falciparum infections in naïve hosts. Parasitology, 128(Pt 4), 367–376 (2004).

      (6) Maire, N., Smith, T., Ross, A., Owusu-Agyei, S., Dietz, K., & Molineaux, L. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. The American journal of tropical medicine and hygiene, 75(2 Suppl), 19–31 (2006).

      (7) Chen D. S., Barry A. E., Leliwa-Sytek A., Smith T-A., Peterson I., Brown S. M., et al. A Molecular Epidemiological Study of var Gene Diversity to Characterize the Reservoir of Plasmodium falciparum in Humans in Africa. PLoS ONE 6(2): e16629 (2011).

      (8) Larremore D. B., Clauset A., & Buckee C. O. A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes. PLoS Comput Biol 9(10): e1003268 (2013).

      (9) Holding T. & Recker M. Maintenance of phenotypic diversity within a set of virulence encoding genes of the malaria parasite Plasmodium falciparum. J. R. Soc. Interface 12, 20150848 (2015).

      (10) Crompton, P. D., Moebius, J., Portugal, S., Waisberg, M., Hart, G., Garver, L. S., Miller, L. H., Barillas-Mury, C., & Pierce, S. K. Malaria immunity in man and mosquito: insights into unsolved mysteries of a deadly infectious disease. Annual review of immunology, 32, 157–187 (2014).

      (11) Langhorne, J., Ndungu, F., Sponaas, AM. et al. Immunity to malaria: more questions than answers. Nat Immunol 9, 725–732 (2008).

    1. eLife Assessment

      This study provides valuable insights through the elucidation of the first full-length structure of the heterohexameric (MmpS4)₃-(MmpL4)₃ transporter complex from Mycobacterium tuberculosis, advancing understanding of its transport mechanism, linked to virulence and drug resistance. The structural analysis is convincing, offering a clear framework for future mechanistic studies. Major strengths include a comprehensive structural characterization of the complex, though some conclusions require further validation.

    2. Reviewer #1 (Public review):

      This manuscript adds to the recent, exciting developments in our understanding of the MmpL/S transporters from mycobacteria. This work provides solid support for the trimeric/hexameric arrangement of subunits in the complex, and reveals a possible pathway for substrate translocation.

      Overall, I think this manuscript is a solid body of work that adds to several recent studies from this team and others on the structure and mechanism of the MmpL/S transporter family, particularly MmpL4/S4. The combination of AF, disulfide engineering, and experimental structure is good, though it is a bit puzzling that the experimental structure based on disulfide stabilization of the AF prediction does not recapitulate key elements (MmpS periplasmic domain docking to MmpL, and altered CCD configuration).

      I have no major concerns about this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes the structure of the Mycobacterium tuberculosis (MmpS4)3-(MmpL4)3 heterohexameric transporter complex. The structure was obtained by cryogenic electron microscopy using an engineered construct that cross-links MmpS4 to MmpL4 via a disulfide bond. The position of the disulfide bond was determined using an Alphafold2 model of the heterohexamer. Although Alphafold2 predicts a symmetric heterohexamer, the authors found that the structure of the coiled-coil domain (CCD) is asymmetric, tilted at about 60° relative to the membrane domains, and only contains two of the three alpha helical hairpins, with the third being disordered.

      Strengths:

      The strategy of using Alphafold2 models to guide construct design for experimental structure determination is state-of-the-art, and this work provides a great example of its applications and limitations. I.e., the experimental structure does not fully recapitulate the prediction but provides unexpected results.

      The comparisons between the authors' structures and the previously published structures of the MmpL4 monomer and MmpL5 trimers strengthen the authors' findings.

      Weaknesses:

      A more detailed description of the current mechanistic hypothesis would strengthen the manuscript. The authors state that the two periplasmic domains "are expected to undergo rigid body movements that allow substrate transport through these periplasmic domains similar to the conformational changes observed in the E. coli multidrug efflux pump AcrB". A schematic of the proposed transport cycle, as a supplemental figure that shows the current hypothesis regarding transport, would be beneficial for understanding the previous structures and putting the current structure in context. Outside of "the mechanistic basis of how these conformational changes are coupled to protonation of the DY-pairs", what are the major controversies/open questions regarding the mechanism?

      The authors provide evidence that the cysteine-depleted S4L4 construct is functional, but do not show that the construct with the introduced disulfide bond #5 (D39C MmpS4 and S434C MmpL4) is also functional. Demonstrating this would allow the authors to better interpret their resulting structures.

      The analysis presented in Figure 5 and Supplementary Figure 7 seems to suggest that the authors are proposing that the CCD central cavity acts as a transport pathway for the transported substrate, but I am not sure that this hypothesis is explicitly stated. This makes the reasoning behind the analysis presented unclear. Clarity could be improved by stating that the hypothesis of direct transport of substrate through the CCD central channel is being examined using the structure prediction, and what the implications are for the structure solved with the incompletely formed CCD.

      Given that the results emphasize the flexibility of the CCD, the manuscript would be strengthened by 3D variability analysis either in cryoSPARC or using cryoDRGN (or both). This would allow the authors to better quantify the degree of motion in the CCD and how it may correlate to flexibility in other regions. Further 3D flex reconstruction in cryoSPARC may improve the map quality of the CCD.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Earp et al. reports cryoEM structures of the hexameric (MmpS4)₃-(MmpL4)₃ complex from Mycobacterium tuberculosis, which belongs to the RND family of transporters and is known to have a role in the export of siderophores and contribute to drug resistance. The experimental workflow showcased involves the design of disulfide pairs using distance constraints obtained from the AlphaFold predicted structure of the hexameric complex. One such disulfide pair was used to determine the ~3.0 Å structures. The structure reveals density for the previously unresolved coiled-coil domain (CCD), a tilted CCD arrangement, and a cavity within the periplasmic domain, which the authors assert is occupied by detergent. Comparison of this complex with the monomer structure of MmpL4 shows conformational variations interpreted to implicate different domains and conserved residues involved in proton coupling, which might be related to the transport mechanism. While the methodological aspects of the manuscript are solid, enthusiasm for the overall advance/significance is less so, with doubts about the relevance of the tilted CCD structure, considering disulfide trapping and an incomplete validation of the claim that the tilted CCD represents a stable intermediate conformation. A clear, updated transport mechanism is largely missing from the manuscript.

      Strengths:

      Beautiful structures, AF prediction-experimental validation nexus that could be fine-tuned for different systems/difficult to target complexes.

      Weaknesses:

      Physiological relevance of the tilted CCD conformation. No clear mechanistic model for the transport. While the CCD may indeed be a stable intermediate, the fact that the rest of the trimeric arrangement is unaffected does not fully rule out disulfide trapping as a factor in promoting this. The findings would be strengthened if the same tilted conformation is seen using a different set of disulfides. The significance of the detergent molecule and the new cavity observed could also be better discussed in terms of an updated transport model.

    5. Author response:

      We thank the three reviewers for their critical and in-depth assessment of our study. Below are our comments on the public reviews and our revision plans.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript adds to the recent, exciting developments in our understanding of the MmpL/S transporters from mycobacteria. This work provides solid support for the trimeric/hexameric arrangement of subunits in the complex, and reveals a possible pathway for substrate translocation. Overall, I think this manuscript is a solid body of work that adds to several recent studies from this team and others on the structure and mechanism of the MmpL/S transporter family, particularly MmpL4/S4. The combination of AF, disulfide engineering, and experimental structure is good, though it is a bit puzzling that the experimental structure based on disulfide stabilization of the AF prediction does not recapitulate key elements (MmpS periplasmic domain docking to MmpL, and altered CCD configuration).

      I have no major concerns about this manuscript.

      We thank reviewer#1 for this positive assessment of our work. The deviation of the AF prediction from the experimental structure is, in our view, not puzzling. AF does not take the physical properties of proteins into account, but predicts structures based on strong sequence alignments. It therefore does not have “knowledge” about the general flexibility of domains such as the CCD, which is also observed in the corresponding MmpL5 structures, nor does it have knowledge about preferred conformational states. Rather than “failing” to confirm the AF predictions, our cryo-EM structure revealed an unexpected tilted conformation of the CCD. As we outline in comments below, the physiological relevance of the tilted CCD is unclear. Its flexibility might be required to interact with (still elusive) outer membrane protein components to form the fully assembled efflux machinery.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the structure of the Mycobacterium tuberculosis (MmpS4)3-(MmpL4)3 hetero-hexameric transporter complex. The structure was obtained by cryogenic electron microscopy using an engineered construct that cross-links MmpS4 to MmpL4 via a disulfide bond. The position of the disulfide bond was determined using an AlphaFold2 model of the hetero-hexamer. Although AlphaFold2 predicts a symmetric hetero-hexamer, the authors found that the structure of the coiled-coil domain (CCD) is asymmetric, tilted at about 60° relative to the membrane domains, and only contains two of the three alpha helical hairpins, with the third being disordered.

      Strengths:

      The strategy of using AlphaFold2 models to guide construct design for experimental structure determination is state-of-the-art, and this work provides a great example of its applications and limitations, i.e., the experimental structure does not fully recapitulate the prediction but provides unexpected results.

      The comparisons between the authors' structures and the previously published structures of the MmpL4 monomer and MmpL5 trimers strengthen the authors' findings.

      We thank reviewer#2 for this positive assessment of our work and agree that it is interesting that the experimental structures do not fully agree with the AF predictions (see also comment to reviewer#1).

      Weaknesses:

      A more detailed description of the current mechanistic hypothesis would strengthen the manuscript. The authors state that the two periplasmic domains "are expected to undergo rigid body movements that allow substrate transport through these periplasmic domains similar to the conformational changes observed in the E. coli multidrug efflux pump AcrB". A schematic of the proposed transport cycle, as a supplemental figure that shows the current hypothesis regarding transport, would be beneficial for understanding the previous structures and putting the current structure in context. Outside of "the mechanistic basis of how these conformational changes are coupled to protonation of the DY-pairs", what are the major controversies/open questions regarding the mechanism?

      We thank the reviewer for this valuable comment. We will add a new figure with the model of the MmpL4 transport cycle based on our new data and discuss the proposed molecular transport mechanism in more detail in the main text.

      The authors provide evidence that the cysteine-depleted S4L4 construct is functional, but do not show that the construct with the introduced disulfide bond #5 (D39C MmpS4 and S434C MmpL4) is also functional. Demonstrating this would allow the authors to better interpret their resulting structures.

      In the revised version, we will include additional data to assess the functional consequences of cross-linking.

      The analysis presented in Figure 5 and Supplementary Figure 7 seems to suggest that the authors are proposing that the CCD central cavity acts as a transport pathway for the transported substrate, but I am not sure that this hypothesis is explicitly stated. This makes the reasoning behind the analysis presented unclear. Clarity could be improved by stating that the hypothesis of direct transport of substrate through the CCD central channel is being examined using the structure prediction, and what the implications are for the structure solved with the incompletely formed CCD.

      We state clearly in the discussion that the channel through the CCD seems too narrow to let large molecules like mycobactin and bedaquiline pass:

      Line 318ff: “The channel radius of the MmpL4 CCD is very narrow with a minimum of 1.1 Å according to the AlphaFold3 prediction (Fig. 5). This is much smaller than the smallest axis of a molecular model of a mycobactin molecule of ?? nm as determined from a model of iron-free mycobactin. In addition, the cryo-EM structure of MSMEG_1382 revealed a constriction in the CCD channel [21]. Even though the methionine side chains lining the channel wall are considered to be flexible (Aledo, 2019), large conformational changes of the α-helical hairpins relative to each other would be required to allow passage of molecules as large as mycobactin and bedaquiline. The AcrAB-TolC efflux machinery provides an example for such large conformational changes to enable transport of large molecules by iris-like opening and closing movements of the outer membrane channel-tunnel TolC [33]. Similar helical twisting may widen the channel of the CCD. Alternatively, it is conceivable that the substrates of MmpL4/MmpL5 are transported along the CCD surface, potentially requiring further protein partners. It is interesting to note that siderophore secretion and drug efflux by MmpL4/MmpL5 systems involves at least two additional proteins, namely the periplasmic protein Rv0455, which was shown to be essential for mycobactin efflux [34] and an outer membrane channel, whose identity remains elusive. A complete molecular understanding of the transport mechanism through the MmpL4/MmpL5 systems hence requires the identification of the missing components and structural information about their interactions.”

      The channel radius of the MmpL4 CCD is very narrow (minimum of 1.1 Å) according to the AlphaFold3 prediction (Fig. 5), and the cryo-EM structure of MSMEG_1382 revealed a further constriction in the CCD channel [21]. We therefore consider direct substrate transport through the CCD central channel to be physically implausible for molecules of the size of mycobactin and bedaquiline. Even accounting for the flexibility of the methionine side chains lining the channel wall, large conformational changes of the α-helical hairpins relative to each other would be required to accommodate such large substrates. While iris-like opening movements have been described for TolC in the AcrAB-TolC system [33], those movements widen an already substantially larger channel, and even such dramatic conformational changes would be insufficient to open a channel as narrow as that of the MmpL4 CCD to a diameter permissive for substrate passage. We instead favor a model in which substrates are transported along the outer surface of the CCD, potentially with the assistance of additional protein partners. This is consistent with the observation that MmpL4/MmpL5-mediated siderophore secretion and drug efflux involve at least two further proteins: the periplasmic protein Rv0455, shown to be essential for mycobactin efflux [34], and an as-yet-unidentified outer membrane channel. In this context, the overall flexibility of the CCD, illustrated here by the tilted, incompletely formed conformation, may reflect the conformational dynamics required for interaction with these partner proteins, rather than being directly involved in forming a transport conduit. A complete mechanistic understanding will require identification of the missing components and structural characterization of the fully assembled efflux machinery.

      We do not think that the incompletely formed CCD represents a conformation that is relevant for transport. But it is a demonstration of the overall flexibility of the CCD, which may be required to further open the channel in case the substrates are transported within the CCD tube. Further in-depth experiments will be needed to clarify this interesting question, which is beyond the scope of this paper.

      Given that the results emphasize the flexibility of the CCD, the manuscript would be strengthened by 3D variability analysis either in cryoSPARC or using cryoDRGN (or both). This would allow the authors to better quantify the degree of motion in the CCD and how it may correlate to flexibility in other regions. Further 3D flex reconstruction in cryoSPARC may improve the map quality of the CCD.

      This is a great suggestion. We will include a 3D variability analysis in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Earp et al. reports cryoEM structures of the hexameric (MmpS4)₃-(MmpL4)₃ complex from Mycobacterium tuberculosis, which belongs to the RND family of transporters and is known to have a role in the export of siderophores and to contribute to drug resistance. The experimental workflow showcased involves the design of disulfide pairs using distance constraints obtained from the AlphaFold predicted structure of the hexameric complex. One such disulfide pair was used to determine the ~3.0 Å structures. The structure reveals density for the previously unresolved coiled-coil domain (CCD), a tilted CCD arrangement, and a cavity within the periplasmic domain, which the authors assert is occupied by detergent. Comparison of this complex with the monomer structure of MmpL4 shows conformational variations interpreted to implicate different domains and conserved residues involved in proton coupling, which might be related to the transport mechanism. While the methodological aspects of the manuscript are solid, enthusiasm for the overall advance/significance is less so, with doubts about the relevance of the tilted CCD structure, considering disulfide trapping and an incomplete validation of the claim that the tilted CCD represents a stable intermediate conformation. A clear, updated transport mechanism is largely missing from the manuscript.

      We thank reviewer#3 for these useful comments, which we will address during the revision of the manuscript. In particular, we plan to include a scheme of an updated transport model.

      Strengths:

      Beautiful structures, AF prediction-experimental validation nexus that could be fine-tuned for different systems/difficult to target complexes.

      Weaknesses:

      Physiological relevance of the tilted CCD conformation. No clear mechanistic model for the transport. While the CCD may indeed be a stable intermediate, the fact that the rest of the trimeric arrangement is unaffected does not fully rule out disulfide trapping as a factor in promoting this. The findings would be strengthened if the same tilted conformation is seen using a different set of disulfides. The significance of the detergent molecule and the new cavity observed could also be better discussed in terms of an updated transport model.

      We believe that there was a misunderstanding about our interpretation of the tilted CCD. As a matter of fact, it must be a stable intermediate, otherwise no density would have been observed for it in the cryo-EM maps. Despite being a stable intermediate, it is indeed unlikely that it represents a conformational state that is relevant/required for transport. Firstly, only the upright, complete CCD can bridge the periplasm. Secondly, the structure was determined in detergent and lacks additional protein binding partners, which might stabilize the upright conformation of the CCD. It is also conceivable, as the reviewer pointed out, that disulfide cross-linking may have caused the tilt. However, as we wrote in the manuscript, we do not think that cross-linking caused this striking asymmetry of the CCD, because the three MmpL4 and MmpS4 chains are basically symmetrical in the C1-processed data (see also Figure 2E):

      Line 182 ff: “To assess whether there are asymmetries in other parts of the structure, we superimposed the individual protomers of the (MmpS4)3-(MmpL4)3 complex analyzed using C1 symmetry (Fig. 2E). Apart from the two resolved α-helical hairpins, the MmpL4 core domains and the resolved parts of MmpS4 differ by a RMSD of less than 0.6 Å and are therefore structurally identical considering the map resolution of around 3 Å. The fact that the core domains of MmpS4 and MmpL4 do not deviate between the protomers argues against the possibility that the cross-links established between them cause the (asymmetric) tilt of the CCD.”
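      The protomer comparison quoted above rests on a root-mean-square deviation (RMSD) between matched atoms. As a minimal sketch of that quantity, assuming the two coordinate sets are already superimposed and atom-matched, the calculation could look as follows. The function name `rmsd` and the coordinates are hypothetical and chosen only for illustration; they are not taken from the deposited structure or from the authors' workflow.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units (e.g. Å) between two matched, pre-superimposed
    lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Made-up coordinates: the second "protomer" is displaced by 0.6 along z.
protomer_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
protomer_b = [(0.0, 0.0, 0.6), (1.0, 0.0, 0.6)]
deviation = rmsd(protomer_a, protomer_b)
```

      A value well below the map resolution, as in the quoted passage, is what supports treating the protomer cores as structurally identical. The superposition step itself is not shown here.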

      Regarding the DDM binding site, we will indeed include an updated transport model. That said, we wish to be cautious, because we lack experimental proof that MmpL4 can in fact transport DDM.

    1. eLife Assessment

      Maloney et al. offer an important contribution to understanding the potential ecological mechanisms behind individual behavioral variation. By providing compelling theoretical and experimental data, the study bridges the gap between individual, apparently stochastic behavior with its evolutionary purpose and consequences. The work further provides a testable and generalizable model framework to explore behavioral drift in other behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloney et al. (2026) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      While the theoretical foundations of the study are compelling, the connection between the experimental data (Fig. 1) and the modeling work (Fig. 2-4) is convincing.

      Weaknesses:

      In the experimental data (Fig. 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I had identified three significant issues with the experiments that were addressed in the revision:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not fully link to the theory-driven work in changing environments. A full experimental investigation of this would be beyond the scope of the current work.

      (2) The temporal aspect of behavioral instability has been addressed in Figure 1F.

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). This issue has been further discussed in the revised manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits are regulating behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting and in fact more fundamental than showing if it is serotonin that does it or not.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      Comments on the latest version:

      The changes to the manuscript sufficiently addressed my few comments. I do not have anything else substantial to add to my review and I am comfortable with my initial assessment.

    4. Reviewer #3 (Public review):

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains in a Y maze, concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift (although with some contrasting results). The use of a different assay for this second dataset (Y maze instead of wide arena) is justified by previous observations of similar behavioral biases in these assays. Yet the conceptual link between the spectral power analysis used for the first dataset and the autoregressive model used for the second remains unclear.

      Finally, the authors use theoretical approaches to show the potential advantage of individual drift for survival in unpredictable, fluctuating environments. They demonstrate that while bet-hedging provides an advantage over timescales matching the generation time (since reproduction is required), it offers less benefit on shorter timescales, where an increased individual drift could be advantageous.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloney et al. (2024) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      While the theoretical foundations of the study are strong, the connection between the experimental data (Figure 1) and the modeling work (Figure 2-4) is less convincing.

      Weaknesses:

      In the experimental data (Figure 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I identify three significant issues with the experiments:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not properly link to the theory-driven work in changing environments. An experiment conducted in a changing environment and its effects on behavioral drift would improve the manuscript's internal consistency and clarify some points related to (3) below.

      We have added further discussion of this to the discussion section.

      (2) The temporal aspect of behavioral instability. While the analysis demonstrates behavioral instability, the temporal dynamics remain unclear. It would be helpful for the authors to clarify (based on graphs and text) whether the behavioral changes occur randomly over time or follow a pattern (e.g., initially more right turns, then more left turns). A proper temporal analysis and clearer explanations are currently missing from the manuscript.

      We have added a figure (1F) to better visualize the changes in handedness over days. We have also pointed out the connection between the power spectrum and the autoregressive model given by the Wiener-Khinchin theorem (which states that the autocorrelation function of a wide-sense stationary process and its power spectrum form a Fourier transform pair).
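      The Wiener-Khinchin relation mentioned above can be checked numerically on a finite series. The sketch below is illustrative only and is not the authors' analysis: it uses the finite-sample (circular) form of the relation, a naive discrete Fourier transform to stay dependency-free, and a hypothetical AR(1) series whose parameters (`phi=0.8`, `n=64`, the seed) are assumptions chosen for the example.

```python
import cmath
import random


def dft(values):
    """Naive discrete Fourier transform (O(n^2)), kept dependency-free."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def circular_autocorrelation(values):
    """Autocorrelation computed directly in the time domain (circular lags)."""
    n = len(values)
    return [
        sum(values[t] * values[(t + lag) % n] for t in range(n)) / n
        for lag in range(n)
    ]


def autocorrelation_from_power_spectrum(values):
    """The same quantity obtained as the inverse DFT of the power spectrum."""
    n = len(values)
    power = [abs(coefficient) ** 2 / n for coefficient in dft(values)]
    return [
        (sum(power[f] * cmath.exp(2j * cmath.pi * f * lag / n) for f in range(n)) / n).real
        for lag in range(n)
    ]


def simulate_ar1(phi, n, seed=0):
    """Hypothetical AR(1) series x[t] = phi * x[t-1] + noise (illustration only)."""
    rng = random.Random(seed)
    series = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


series = simulate_ar1(phi=0.8, n=64)
direct = circular_autocorrelation(series)
spectral = autocorrelation_from_power_spectrum(series)
# Largest disagreement between the two routes; expected to be at rounding level.
max_gap = max(abs(a - b) for a, b in zip(direct, spectral))
```

      Because the two routes agree up to floating-point error, a description of drift in terms of spectral power and one in terms of autocorrelation (and hence an autoregressive model) carry the same second-order information.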

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). In the neutral stimuli used in the experimental data, changes should either occur randomly (drift) or purposefully, as in a neutral environment, previous strategies do not yield a favorable outcome. For instance, the animal might initially employ strategy A, but if no improvement in the food situation occurs, it later adopts strategy B (learning). In changing environments, this distinction between drift and learning should be even more pronounced (e.g., if bananas are available, I prefer bananas; once they are gone, I either change my preference or face negative consequences). Alternatively, is my random choice of grapes the substrate for the learning process towards grapes in a changing environment? Further clarification is needed to resolve these potential conflicts.

      We have discussed this further in the discussion.

      Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use a time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases the fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, the authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits regulate behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting, and in fact more fundamental than showing if it is serotonin that does it or not.

      We have adjusted our wording and contextualized our claims based on previous literature.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      We have reanalyzed the behavioral data in a hierarchical model to account for batch effects. Accounting for batch effects (Fig 1G, S1G), we still observe differences between genotypes and for pharmaceutical manipulations of serotonin, though our data provide more equivocal evidence for the effects of trh<sup>n</sup> on drift.

      Reviewer #3 (Public review):

      Summary:

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift.

      Finally, the authors use theoretical approaches to identify the range of environmental conditions under which drift in individual bias supports population growth.

      Strengths:

      The model provides a clear prediction of the environmental fluctuations under which a drift in bias should be beneficial for population growth.

      The approach attempts to identify genetic and neurophysiological mechanisms underlying drift in bias.

      Weaknesses:

      Different behavioral assays are used and are differently analysed, with little discussion on how these behaviors and analyses compare to each other.

      We have added text indicating that these two behavioral responses have previously been shown to be correlated to each other and that the spectral power analysis and autoregressive model are conceptually linked.

      Some of the model assumptions should be made more explicit to better understand which aspects of the behaviors are covered.

      We have added a table in the supplemental clarifying all of the parameters of modeling for each figure.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Highlights of the Consultation Session of 3 Reviewers

      In the consultation session, the reviewers discussed as particularly important the relative contribution of genotype and variable environment. Further analyses of the replicates of the genotypes were suggested to exclude the environment as the source of difference in the extent of drift between genotypes. If the difference in the extent of drift between replicates is greater than the difference in the extent of drift between genotypes, then one cannot really say that there is a genetic control over drift and that it would evolve (which is still an interesting result, but would be less exciting for a follow-up evolution experiment). If replicates differ, testing whether the relative difference in the extent of drift between genotypes is maintained across environments would also be strong evidence that the extent of behavioral drift is a property of a genotype and not a mere result of a fluctuating/variable environment. The authors do present two behavior paradigms that can serve the purpose of comparing the relative extent of drift between genotypes across the two paradigms that they already have. The authors might consider whether experimental data could be brought closer to theory by including an experiment in a variable environment (e.g., temperature or diet changes).

      Reviewers also agreed in the consultation session that methods and definitions were somewhat cryptic, and it would be very helpful if they were more detailed. For example, linking the free walking analysis to the Ymaze and then the model1 to the model2 was not straightforward.

      We have added text to make more explicit the theoretical connection between the free-walking analysis, the Y-maze analysis, and the model. We have added text and a supplemental table to clarify the methods.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 161: The authors state in the supplement that they used DGRP strains, which are inbred and not isogenic. According to the original authors, they possess 99.3% genetic identity. The isoD1 strain has no known crossing scheme, so complete chromosome isogeneity remains questionable, especially after 12 or more years since its creation. The authors should refer to the strains as "near-isogenic" or a similar term.

      We have adjusted the language as suggested to be more accurate.

      (2) Lines 276, 338: The manuscript contains some unfinished sentences or remnants from the drafting process (e.g., "REFREF"). A thorough editorial review is recommended to eliminate such errors.

      We have cleaned up all references and made additional passes to adjust text.

      Reviewer #2 (Recommendations for the authors):

      (1) If the authors want to claim that serotonin is a regulator of drift, they should provide a negative control experiment, using equivalent perturbations of another neuromodulator and non-modulator. Alternatively, they could simply soften the claims revolving around serotonin and its putative direct role in modulating drift.

      We have softened the claims as suggested to avoid claiming our results show a specific role for serotonin.

      (2) I would suggest always using "behavioral drift" when referring to drift, especially in the context of modeling, because it can be easily confused with genetic drift and cause confusion when reading.

      We have adjusted the language throughout the manuscript per this suggestion.

      (3) It would be good to see in the methods if the 2-hour assays were always done at the same time of the fly's subjective day and when (e.g. how many hours after lights on).

      We have clarified this.

      (4) I understand that many experiments use methodology replicated from the group's previous work, but I would recommend elaborating the experimental methods section in the supplementary such that the reader can understand and reproduce the methods without having to sift through and look for them in previous papers.

      We have expanded on our discussion of the methodology in the methods section.

      Reviewer #3 (Recommendations for the authors):

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that flies exhibit an individual turning bias (when averaged over time), yet their preferences fluctuate over slow timescales. However, it's unclear why the authors chose to switch to a different assay to compare strains. In particular, it's ambiguous whether the behavioral measure in one setup is comparable to that in the other; specifically, whether a bias in one setup reflects the same type of bias in the other. The behavior is also sampled differently across setups (though the details are unclear; see comments below) and analyzed using different methods. Consequently, it remains uncertain whether the slow fluctuations observed in the arena setup are also present in the Y maze. It appears that the analysis of the Y maze data only addresses individual behavioral variance or, at most, day-to-day changes, without accounting for longer-term correlations in bias, which I understood to be the primary interest in the arena setup. Some clarification is needed here (see specific comments below).

      In Figure 2, the authors attempt to show the potential advantage of individual drift for survival in unpredictable, fluctuating environments. They demonstrate that while bet-hedging provides an advantage over timescales matching the generation time (since reproduction is required), it offers less benefit on shorter timescales, where an increased individual drift could be advantageous. This approach is well-conceived, and the findings are convincing, though the model would benefit from further clarification and additional explanation in the text.

      Here are some more specific comments:

      PART 1:

      (1) L 223 one probably cannot see a circadian peak at 24h if the data were filtered at 24h, did they look with another low pass cutoff?

      We clarified in the text that the power spectrum analysis was performed on unfiltered data.

      (2) L 243 the spread in standard deviation is said to be consistent with drifting bias, however, I do not agree with this. The variation could be stochastic but independent across days, and show no temporal correlation. As done with the circular arena, a drift should be estimated as a temporal correlation in the behavior.

      It is consistent insofar as seeing a non-zero standard deviation is a necessary condition for drift. While it does not show that there is any consistency over time, this can be inferred from the autoregressive model (as well as previous work). We have added text to make this clearer.

      (3) In the autoregressive model this temporal aspect seems to be incorporated only to the first order (from day to day). Therefore, from what I understand, the drift term is not correlated over time. This seems very different from the spectral analysis done in the circular assay, and I wonder if it fits at all the initial definition of drift. For example, is the model compatible with a fixed mean and a similar power spectrum as in Figure 1C? The text should clarify that.

      In an AR(1) process, datapoints day to day are correlated as long as ϕ ≠ 0. This can be made clear in the case of σ = 0 and ϕ = 1, where values would be perfectly correlated with each other across time. The AR(1) model and the PSD of circling can be related via the Wiener-Khinchin theorem. We have added text to make this connection clear.
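
      As an illustration only (this is not the analysis code used in the manuscript), the link between the AR(1) model and the power spectrum can be checked numerically. The sketch below assumes the standard form x[t] = ϕ·x[t−1] + ε[t] with Gaussian noise of standard deviation σ; the function names and parameter values are hypothetical. For this process the lag-k autocorrelation is ϕ^k, and by the Wiener-Khinchin theorem the power spectral density is the Fourier transform of the autocovariance, S(f) = σ² / (1 − 2ϕ·cos(2πf) + ϕ²), which concentrates power at low frequencies when ϕ is close to 1.

```python
import math
import random


def simulate_ar1(phi, sigma, n, seed=0):
    """Simulate x[t] = phi * x[t-1] + Gaussian noise (std sigma), starting at 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def ar1_psd(phi, sigma, freq):
    """Theoretical AR(1) power spectral density at freq (cycles per sample)."""
    return sigma ** 2 / (1.0 - 2.0 * phi * math.cos(2.0 * math.pi * freq) + phi ** 2)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    x = simulate_ar1(phi=0.8, sigma=1.0, n=20000)
    print("lag-1 and lag-2 autocorrelation:", autocorr(x, 1), autocorr(x, 2))
    print("PSD at low vs high frequency:", ar1_psd(0.8, 1.0, 0.01), ar1_psd(0.8, 1.0, 0.4))
```

      With ϕ = 0 the same simulation yields uncorrelated days and a flat spectrum, which is the case the reviewer contrasts with drift.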

      (4) Did serotonin have no role in turning bias? My understanding of previous work was that serotonin should affect the bet-hedging variance as well - the authors should discuss what is expected or not, especially given that the pharmacological and genetic approaches do not have the same effect on bet-hedging (Figure 1H-I).

      As the pharmacological methods were only applied after eclosion, we do not find it surprising that we do not measure differences in the initially measured distribution of handedness in that case. We do see more evidence of it in the mutations, though the trh<sup>n</sup> experiments provide a less clear effect after our adjustments to account for batch effects.

      (5) Methods: It is unclear how flies were handled across days; e.g. in Y mazes: 2h each day for how many days? In the arena flies were imaged either twice daily for 2h per session, or continuously for 24h (L138) - but which data are used where?

      We will make this clearer, but all data in Figure 1 were the continuous 24h data.

      This part of the methods is not well explained and I think it should be described in more detail.

      (6) How many flies per genotype were tested in fig 1E?

      Information was added to the caption to duplicate information in the table.

      PART 2:

      (7) In Figure 2B I do not understand the formulation N(50−ϕ: 50, σ), N(phi-et: et, σ) or in general N(x: m, s): does this mean that the variable x has normal distribution with mean m and variance s? Usually this would be written as N(x|m, s) or N(x; m, s)

      If so then: N(50−ϕ: 50, σ) = N(ϕ: 0, σ) which has mean=0 while the figure caption says "from a normal distribution centred on the long term environmental mean" - what is the long term environmental mean?

      If this is correct, and, therefore, we are just centering the mean, what about N(et-phi: et, σ)?

      Et is the environment at the time, not the mean of the environment (which is 50). We have added more detail in supplementary methods to address this.

      (8) Should ϕ vary between 1-100? And is the environmental parameter in Figure 2C also varying between 1-100? These ranges should be written somewhere.

      While implied in the sigma notation, we have added more detail in supplementary methods to explain the situation.

      (9) As far as I understand the bounding envelope in Figure 2B is necessary to contain the drift model. In Figure 1F, a bounding effect was generated by the "tendency to revert to no bias." It is unclear to me whether these two formulations are equivalent. Moreover, neither of these two models might be able to recapitulate the correlations observed in the circular arena and analyzed spectrally in Figure 1C. It would be necessary that the authors make an effort to relate these models/quantifications to one another. My understanding of Figure 1B is that there are slow fluctuations around the mean. Is the bounded drift model in 2B not returning to the same mean? And do these models generate slow fluctuations? Further explanation could help clarify these points.

      We have added additional explanation of the connection between the power spectrum and the two methods (phi and the bounding envelope) of establishing stationarity.

      (10) Expanding on the above: I thought that the definition of individuality is based on some degree of stability over days. However, both models assume drift to occur from day to day (and also the analysis of the DGRP lines assumes so). Some clarification here could help: is the initial bet-hedging variation maintained in the population? And is the mean individual bias still a thing or is it just drifting away all the time?

      The initial bet-hedging is maintained to some degree, based on the parameter of phi and the bounding envelope. We have added text to make this clearer.

      (11) In both Figures 2C and 2E the populations are always shrinking, is that correct? And if so, is it expected? Does the model allow growth in a constant environment?

      As the plotted values are the log, the optimal environments do allow growth (visible more clearly in 2D). We have added some text to make this clearer.

      (12) Growth is quantified only across 100 days (Figure 2D) but at day 100 there is not something like a steady state, how is 100 chosen? Would it make sense to check longer times to see if the system eventually takes off? And if not, why?

      (13) Related to the above: what is the growth range achieved in Figure 3A-B? Is the heatmap normalized to the same value across conditions? I think it would be important to consider the absolute range of variation of growth or at least the upper value across conditions.

      Moreover: is growth quantified at day 100? What happens at longer times? Does the temporal profile of the growth curve differ across environmental conditions? (I'm referring to a Figure as 2D).

      As we are plotting the log change, we are ultimately showing the growth rate. While a more realistic model would involve carrying capacity, we believe a simplified model showing growth or no growth captures the difference in growth rate between different strategies. We have added some text to make this clearer.

      (14) Suddenly at line 502, sexual maturity is introduced as a parameter, which was never mentioned before, called a_min in the figure legend of panel 3a, but it is unclear where this is in the model. And please also clarify if sex maturity is the same as generation time.

      Sexual maturity is the same as generation time; we have standardized terminology throughout the paper.

      (15) Regarding lines 505-508, could one simply conclude that in this model formulation, the generation time has the effect of a low pass filter on environmental fluctuation? The question is: is this filtering effect the only effect of generation time?

      While this seems to capture the high-frequency effect we see, it does not explain the shift from bet-hedging->drift we see at lower-frequency environmental fluctuations.

      (16) What reproductive rate is used for the PCA analysis? Is the variance associated with the drift so low because of choosing a fast reproductive rate? A comment in the main text would be helpful.

      We have clarified that these plots were done at 10 days.

    1. eLife Assessment

      This paper describes useful findings on the effects of isoflurane anesthesia on the visual cortical circuitry of the mouse. It provides solid evidence that the visual spatial frequency sensitivity becomes coarser (lower resolution) during anesthesia, with distinct effects described in excitatory neurons, and parvalbumin (PV) and somatostatin (SOM) positive interneurons. This study should be of interest to neuroscientists studying the mouse visual cortex and the effects of anesthesia on cortical circuitry.

    2. Reviewer #1 (Public review):

      This manuscript characterizes the effects of isoflurane on visual processing in layer 2/3 of the mouse primary visual cortex (V1). General anesthesia, including isoflurane, has been reported to modulate various neural processes, such as size tuning, direction selectivity, and spatial selectivity in V1. Using two-photon calcium imaging, the authors monitored neural responses to visual stimuli under isoflurane anaesthesia and found that spatial frequency preferences are also affected across cell types, with the magnitude and direction of these effects varying between cell types.

      The authors performed careful and rigorous comparisons of neuronal responses between the two conditions using well-chosen nonparametric statistics. At the same time, because two-photon calcium imaging can be combined with cell-type-specific labeling, the authors labelled inhibitory neurons with tdTomato, allowing them to distinguish GCaMP activities in excitatory and specific inhibitory cell classes. We also appreciated that the manuscript provides not only summary statistics but also example GCaMP traces (Figure 1), which makes it easy for readers to understand the quality of the raw data.

      We believe that the manuscript could be improved by emphasizing the following three points.

      (1) The analyses are limited to the neurons that responded to visual stimuli in both the anesthetized and awake states. According to Table S1, the proportion of visually responsive neurons that met such criteria is only 27.4% for the excitatory neurons. This raises the potential concern that the reported effects of isoflurane may not fully reflect population-level changes in visual coding. We suggest that the authors repeat the same analyses, including average tuning curves and decoding analyses, for all recorded neurons in each condition.

      (2) The manuscript would benefit from tuning curves of spatial frequency preference for individual neurons, as this would help readers assess whether the reported statistics are appropriate (Figures 2A-D). In addition, more in-depth single-neuron analyses would help distinguish between the two proposed hypotheses in Figure 5 that may not be evident from average responses alone. This is because, with the current analysis, it is not clear how the shape of the tuning curves will affect the estimation of spatial frequency preference. To address this potential concern and strengthen the interpretation of the results, we suggest:

      a) repeating the analysis at the level of individual neuronal responses, instead of average responses, and

      b) using simulated data to examine how changes in tuning-curve width could affect estimated spatial frequency preference.

      For example, using the neuronal responses in the awake condition, one could broaden the tuning curves and recompute the preferred spatial frequency, then compare the resulting distribution with that observed under anesthesia.

      (3) We believe the manuscript's overall framing is a little broader than what is directly supported by the data. In particular:

      (a) the statement "reduced sensory perception during anesthesia is linked to a degradation in spatial resolution at the cellular level" in the Abstract is an unclear and unsupported claim. We suggest removing this sentence and more directly summarizing the findings.

      (b) given the discrepancy between the effects of urethane and isoflurane as laid out in the discussion, the current title "Anesthesia Lowers Spatial Frequency Preference in the Primary Visual Cortex" appears overstated and should be revised to explicitly reflect the specific anesthetic tested: "Isoflurane Anesthesia Lowers Spatial Frequency Preference in the Primary Visual Cortex".
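
      To make the simulation suggested in point (2) concrete, a minimal sketch is given below. It is an illustration under stated assumptions, not the authors' pipeline: the stimulus spatial frequencies, the log-Gaussian tuning shape, and both estimators of preferred spatial frequency are hypothetical choices, since the manuscript's own estimator is not described here. The sketch broadens a tuning curve without moving its peak and recomputes the preferred spatial frequency in two ways.

```python
import math

# Hypothetical stimulus set in cycles/degree; the values used in the study are not given here.
SFS = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32]


def tuning_curve(peak_sf, width_octaves, sfs=SFS):
    """Log-Gaussian tuning curve sampled at the stimulus spatial frequencies."""
    return [
        math.exp(-0.5 * (math.log2(sf / peak_sf) / width_octaves) ** 2)
        for sf in sfs
    ]


def preferred_sf_argmax(responses, sfs=SFS):
    """Preferred SF taken as the stimulus that evokes the largest response."""
    best = max(range(len(sfs)), key=lambda i: responses[i])
    return sfs[best]


def preferred_sf_centroid(responses, sfs=SFS):
    """Preferred SF taken as the response-weighted mean of log2(SF)."""
    total = sum(responses)
    mean_log = sum(r * math.log2(sf) for r, sf in zip(responses, sfs)) / total
    return 2.0 ** mean_log


if __name__ == "__main__":
    narrow = tuning_curve(peak_sf=0.16, width_octaves=0.5)  # stand-in for "awake"
    broad = tuning_curve(peak_sf=0.16, width_octaves=2.0)   # same peak, broader tuning
    for name, curve in (("narrow", narrow), ("broad", broad)):
        print(name, preferred_sf_argmax(curve), preferred_sf_centroid(curve))
```

      In this toy case the peak-based estimate is unchanged by broadening, whereas the centroid-based estimate moves toward lower spatial frequencies because the sampled range is truncated above the peak. This is the kind of estimator-dependent shift that the proposed control is meant to rule out.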

    3. Reviewer #2 (Public review):

      Summary:

      The main objective of the study was to link the changes in brain state due to anesthesia to consequences on visual neural processing, particularly effects on spatial frequency tuning. This is accomplished by 2-photon imaging of excitatory and inhibitory neurons (separating PV- and SST-positive subtypes) in mouse visual cortex during full-field visual stimulation with gratings, and tracking neuronal tuning for spatial frequency before, during, and after isoflurane anesthesia. The main finding is that anesthesia induces lower spatial frequency preferences in excitatory neurons, and this leads to poorer population representations (decoding) of higher spatial frequency responses during anesthesia. A second main finding is that anesthesia impacts inhibitory neuron subtypes in distinct ways, with the most pronounced effects of anesthesia on somatostatin inhibitory neurons.

      Strengths:

      (1) A main strength of the study is that it is straightforward, and reassuringly, the results confirm multiple previous studies showing anesthesia's effects on the amplitude of cortical responses: larger and less selective responses in excitatory neurons (versus awake responses); strongly reduced responses in somatostatin inhibitory neurons (versus awake responses) (Fig. 5I-L), with fewer differences across anesthetized and awake states on response amplitude of PV neurons.

      (2) These confirmations of prior observations (on the amplitude of responses) establish good ground for the new results on spatial frequency tuning. For excitatory neurons, spatial frequency selectivity shifts to higher values in awake versus anesthetized conditions; this is because anesthesia induces larger responses to lower spatial frequencies. In somatostatin neurons, instead, wakefulness reduces the lower spatial frequency responses present in anesthesia, and dramatically increases the overall amplitude of responses at medium and higher spatial frequencies. This is consistent with prior work showing that in awake states, somatostatin neurons exert broad inhibition in V1; this study extends that finding to the tuning of spatial frequencies.

      Weaknesses:

      (1) A first weakness of the study is the lack of examination of changes to single neuron receptive field sizes and/or surround suppression across conditions, and how these may relate to the effects on spatial frequency tuning with full field gratings. There is a well-known relationship between the size of the receptive field and the resulting selectivity for spatial frequencies (i.e., large receptive fields prefer lower spatial frequency stimuli). Likewise, there are many studies showing how surround suppression / spatial integration is impacted by anesthesia (and arousal). A more detailed examination of all these related quantities on an individual neuron basis would provide a greater understanding of the factors underlying the effects on spatial frequency tuning. One could imagine that receptive field changes, and/or changes in surround suppression, influence the selectivity to full-field gratings.

      (2) A second weakness is the lack of examination/insight into the temporal dynamics of the effects. The experimental paradigm records activity across control, anesthesia, and recovery epochs in a single duration (~40 mins) session. The epochs are simply binned together ("Awake", "Anes.", "Recover"). It is not clear how the start of the anesthesia bin is defined, nor is it clear how the recovery period is defined. It is also not clear what the changes are to motor tone, brain state, etc., that are also strong influences on visual responses in mouse V1. Presumably, these onset/offset effects are similar enough across mice and sessions that they affect all the bins in the same way, but greater examination of the temporal effects in excitatory, PV, and SOM neurons could shed light on interactions driving the changes. Is there some temporal dependence of anesthesia on selectivity changes across the cell types? For example, at the onset of anesthesia, are SOM neurons losing broadband frequency responses before the excitatory neurons gain low frequency responses? Do PV neurons also show effects after the changes in SOM neurons (suggesting strong SOM -> PV inhibition)? Such analysis might shed light on the timing/causality of the effects among these 3 neuron types.

      (3) A third weakness concerns the interpretation of the low and high arousal conditions during awake states (Figure 6). It is not clear how movement (or lack of movement) impacts the high arousal epochs, nor is it clear how the low arousal condition compares to the brain state during anesthesia. For example, deep versus light anesthesia can lead to synchronized or asynchronous states, respectively, and low arousal in wakefulness can show strong low-frequency oscillations of activity, which could promote a lower excitability state than light anesthesia. Without some more detail about commonly measured brain state or body/face motion metrics, it is difficult to know what brain states are represented by the bins and how to interpret the comparisons.

      Overall, the study uses adequate methods and experimental design to demonstrate solid support for the (somewhat narrow) central finding that anesthesia lowers the spatial resolution of mouse V1 responses.

      Since this is a very well-examined topic, the findings here are not totally surprising, but they confirm and slightly extend prior findings (a good thing). As such, the study will likely have most relevance to specialists in the mouse visual system, but if the study could address some of the remaining questions discussed above, this would potentially broaden the implications of the study to general insights about the operation of cortical circuitry.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript is focused on studying the spatial frequency selectivity of individual neurons in the mouse primary visual cortex (V1) in the anesthetized and awake brain states using 2-photon calcium imaging. Although previous studies have demonstrated that anesthesia decreases both size tuning and spatial selectivity in V1 neurons, the strength of this study is its focus on characterization of the same neurons in awake and anesthetized states in combination with transgenic mouse lines selectively labeling pan-inhibitory neurons and also more specific neuronal subtypes, including parvalbumin-positive (PV+) or somatostatin-positive (SOM+) interneurons. A combination of these methodologies allows for a more in-depth mechanistic study of the properties of different types of neurons. The main findings suggest that in excitatory neurons, anesthesia leads to a shift in preferred SF and broadening of SF tuning, with no changes in orientation and direction selectivity. Downward shift in preferred SF was more pronounced in both SOM+ and PV+ interneurons.

      Strengths:

      (1) 2-photon calcium imaging with single-cell resolution.

      (2) Characterization of excitatory and two types of inhibitory neurons.

      Weaknesses:

      (1) VIP interneurons are critical to the neural circuit, and their characterization would be critical to the mechanistic understanding of this process, but is missing.

      (2) Unfortunately, the manuscript does not lead to an additional insight into the nature of this anesthesia-induced shift in SF preference.

      (3) Furthermore, it also doesn't help understand how SF preference is encoded in V1.

      (4) Finally, some critical histological controls are missing.

    5. Author response:

      Thank you for the eLife assessment and the constructive reviews. We appreciate the reviewers’ valuable insights and the time they dedicated to providing such thoughtful feedback on our manuscript. The reviewers highlighted the technical rigor of our study, specifically the tracking of individual neurons across both anesthetized and awake states using two-photon imaging. They also emphasized the importance of our cell-type-specific analysis (excitatory, PV, and SOM neurons) and noted that the study provides solid evidence for isoflurane-induced shifts in preferred spatial frequency (SF).

      Based on our team's evaluation of the reviewers' comments, we would like to outline our planned revisions.

      (1) Expanded Population and Single-Neuron Analysis

      We will re-analyze our dataset to include all neurons that were responsive under anesthesia, in the awake state, or both. This will ensure our findings accurately represent the entire population of visually responsive neurons. We will also provide examples of individual tuning curves to clarify the relationship between tuning shape and SF shifts in individual neurons.

      (2) Addressing Methodological Scope and Behavioral Metrics

      Receptive Field Size and Dynamics: While we did not utilize a stimulus set specifically designed to map receptive field (RF) sizes, we intend to examine how other functional parameters co-varied with the shift in preferred SF within each cell type. Furthermore, although characterizing the precise temporal dynamics during anesthesia onset presents technical challenges, we will attempt to analyze the time-dependence of the observed changes to provide deeper insight into the transition between states.

      Behavioral Metrics: While pupil size is a well-established proxy for brain state, we will explore the inclusion of other available behavioral parameters.

      (3) Cell-type Specificity (SOM, PV, and VIP)

      SOM vs. PV Comparison: We will perform a detailed comparison of preferred SFs between SOM and PV interneurons, including those responsive only under anesthesia or only in the awake state.

      VIP Neurons: While VIP neurons are known to play critical roles in cortical circuits, such as disinhibition, we have decided not to conduct new recordings for VIP interneurons in the present study. Based on existing literature, the proportion of visually responsive VIP cells is too low to yield statistically reliable conclusions for this specific study (de Vries et al., Nature Neuroscience 23, 138-151, 2020). Additionally, we intend to focus our analysis on inhibitory interneuron subtypes that provide direct input to pyramidal cells.

      Histology: We will provide additional histological validation.

      (4) Refined Framing

      As suggested, we will focus the manuscript strictly on isoflurane anesthesia. This includes updating the title and abstract to reflect this specificity and discussing how our results compare with other anesthetics like urethane. Furthermore, we will substantially deepen our discussion on the potential mechanisms by which anesthesia induces a downward shift in preferred spatial frequency.

      We believe these additions will significantly strengthen the manuscript.

    1. eLife Assessment

      This important study experimentally probes potential antibiotic activity against hypothetical "mirror bacteria" with reversed chirality, showing that D-enantiomers of several approved antibiotics largely lack activity against natural bacteria (as a proxy for mirror organisms) and that conjugated D-peptides can elicit strong binding antibody responses in mice when adjuvanted. The evidence is solid for these core observations but incomplete on issues of chiral purity, functional antibody assays, replicates, and pharmacodynamic readouts; the work also overreaches in extrapolations without deeper mechanistic integration or native-format validation. Overall, the work offers a cautious, relevant contribution to mirror microbiology discussions and will interest infectious disease researchers.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Evaluation of Antibiotic and Peptide Vaccine Strategies for Mirror Bacterial Infections" addresses a topic that is well established in the literature. The authors investigate the activity of enantiomeric (D-form) antibiotics against bacteria and the immunogenicity of D-form peptides, proposing that D-enantiomers are ineffective both as antibacterial agents and as vaccine candidates. While the subject matter is relevant, the concepts explored are already well known, and the manuscript offers limited novelty.

      The authors demonstrate that D-enantiomeric antibiotics lack antibacterial activity compared to their naturally occurring L-forms and that D-form peptides fail to elicit detectable immune responses. These observations are consistent with existing knowledge regarding molecular chirality in biological systems. However, the manuscript relies on a limited experimental dataset while extrapolating the findings broadly, which weakens the strength of the conclusions.

      Strengths:

      The manuscript introduces the topic of Mirror Bacterial Infections, which are likely to occur if no regulations or restrictions are put in place immediately.

      The manuscript addresses a relevant topic and has potential value, particularly in framing discussions around chirality and pathogen interactions. With a more cautious interpretation of the results, the manuscript could better justify its conceptual framework and strengthen its contribution to the field.

      Weaknesses:

      (1) Several sections of the manuscript are overly descriptive and would benefit from deeper comparative analysis and critical synthesis. In multiple instances, the discussion relies on hypothetical scenarios supported primarily by selective citations rather than robust experimental evidence. The introduction of the term "mirror microbiology" or "mirror bacteria" appears largely conceptual and is used to unify what are essentially two separate lines of investigation, enantioselective antibiotic activity and peptide chirality in immune recognition, without sufficient mechanistic integration.

      (2) To the best of this reviewer's understanding, the manuscript does not present substantial novelty. The pronounced differences in biological activity between L- and D-forms of small molecules and peptides are well documented, including their implications for antimicrobial efficacy and immune recognition. While the manuscript is written in clear and accessible language suitable for both specialists and interdisciplinary readers, novelty remains limited.

      The manuscript reiterates well-established principles of stereochemistry and biological recognition. Given the extensive existing literature demonstrating that enantiomeric antibiotics are typically inactive due to stereospecific target interactions, the failure of D-form antibiotics is expected and does not constitute a novel finding.

      (3) Critical experimental details are lacking, particularly regarding the peptide design. It is unclear whether the peptides were synthesized entirely in the D-configuration or whether only select amino acids were substituted. This distinction is essential for interpreting immunogenicity results and for comparison with prior studies.

      (4) The authors conclude that D-form peptides are poorly recognized by the immune system. However, the data presented indicate that neither the L- nor the D-form peptides tested elicited a measurable immune response. Without demonstrating immunogenicity of the corresponding L-form peptides, the conclusion that immune non-recognition is specific to the D-form is not sufficiently supported.

    3. Reviewer #2 (Public review):

      This paper by Kleinman et al. tackles an increasingly discussed biosecurity scenario, namely the possibility that "mirror bacteria" could evade key elements of host immunity and therefore demand bespoke medical countermeasures. The authors experimentally probe two such countermeasure concepts: (1) whether existing chiral antibiotics might still work against mirror bacteria (this is tested indirectly by measuring the activity of antibiotic enantiomers against natural-chirality bacteria), and (2) whether D-peptide antigens can be made immunogenic. Briefly, the authors show that enantiomers of four approved antibiotics have little to no activity in MIC assays, argue this implies the parent drugs would likely fail against mirror bacteria, report limited single-dose tolerability data for the enantiomers in mice, and show that selected bacterially derived D-peptides can elicit strong binding antibody titers when conjugated to a carrier protein and given with adjuvant.

      Overall, the study is quite interesting but constrained by the fact that D-peptide immunogens and related ideas have been explored for decades, by prior literature showing that D-enantiomeric peptides can themselves be strongly antimicrobial vs conventional bacteria, and by a number of conceptual and experimental limitations outlined below.

      (1) A blanket statement indicating that flipping chirality makes antibiotics ineffective cannot be true across all classes. Indeed, there is extensive precedent for "mirror" (D-amino-acid) peptides that retain, or even improve, antimicrobial activity against natural bacteria.

      (2) The paper's key claim ("parent antibiotics won't work on mirror bacteria") is based on the observation that the enantiomers of chloramphenicol/linezolid/tedizolid/aztreonam largely lose activity against natural bacteria. This is a reasonable proxy experiment given the absence of mirror organisms, but it remains an inference and should be described as such.

      (3) The chiral purity needs to be documented more rigorously. The methods mention structural confirmation by NMR and >95% purity by LC-MS/HPLC for enantiomeric compounds, but this is not the same as demonstrating high enantiomeric excess or excluding low-level contamination by the active parent enantiomer.

      (4) The residual activity of ent-aztreonam is quite interesting. The authors report slight activity for ent-aztreonam (MIC of 32-128 µg/mL in a subset), still far weaker than aztreonam but nonzero.

      (5) For antibiotics, MIC is a starting point, but further experiments are needed. To justify countermeasure relevance, it would help to include at least one additional pharmacodynamic readout (time-kill kinetics, post-antibiotic effect, inoculum effect, or activity in the presence of human serum).

      (6) The acute toxicity study is limited (single-dose, short follow-up, small n, one sex/strain, and no histopathology).

      (7) The Discussion leans on human equivalent dosing logic to reassure feasibility. Given the lack of PK, bioavailability, metabolism, and repeat-dose data, these comparisons risk overreach.

      (8) The readout is ELISA endpoint binding (IgG; and IgA in BALF for one antigen), which is fine for an initial immunogenicity screen. But the manuscript then drifts toward "vaccine strategy" claims without showing any antibody functionality (opsonophagocytosis, complement deposition, neutralization, blocking adhesion, and so on) or even binding to a more native-like antigen format (e.g., D-peptide displayed on particles; D-protein fragments; or any surrogate that goes beyond plate-bound peptide).

      (9) The methods report peptide conjugates containing ~10-200 EU/mL endotoxin. That is not trivial and could materially amplify immunogenicity, and should be discussed.

      (10) The authors should report how many technical/biological replicates were performed for MIC determinations and for ELISAs.
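
      Points (3) and (4) can be made concrete with a short worked calculation, offered as a sketch under stated assumptions rather than an analysis of the manuscript's data. Enantiomeric excess (ee) is defined from the mole fractions of the two enantiomers, so a sample can meet a >95% chemical-purity specification on an achiral method while still containing a small fraction f of the active parent enantiomer. The values of f and of the parent MIC below are hypothetical, and the approximation assumes the enantiomer itself is completely inactive and does not antagonize the parent drug.

```latex
\[
\mathrm{ee} = \frac{\lvert x_{\text{ent}} - x_{\text{parent}} \rvert}{x_{\text{ent}} + x_{\text{parent}}} \times 100\%,
\qquad
\mathrm{MIC}_{\text{apparent}} \approx \frac{\mathrm{MIC}_{\text{parent}}}{f},
\quad f = x_{\text{parent}}
\]

% Hypothetical example: 1% parent contamination, parent MIC of 1 ug/mL
\[
x_{\text{parent}} = 0.01,\; x_{\text{ent}} = 0.99
\;\Rightarrow\;
\mathrm{ee} = \frac{0.99 - 0.01}{1.00} \times 100\% = 98\%,
\qquad
\mathrm{MIC}_{\text{apparent}} \approx \frac{1\ \mu\mathrm{g/mL}}{0.01} = 100\ \mu\mathrm{g/mL}
\]
```

      Under these assumptions, a weak apparent MIC in the tens to hundreds of µg/mL could in principle come from trace parent enantiomer alone, which is why a chiral purity measurement is needed to interpret residual activity.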

    4. Reviewer #3 (Public review):

      Summary:

      There is a threat of mirror life bacteria, which could possibly evade immunity and cause problems for human/animal hosts. This paper evaluates enantiomeric antibiotics and vaccines as a means to understand how this could be combatted in the future.

      Strengths:

      It is valuable to collect such information, as it is not always clear how an antibiotic in its enantiomeric form would interact with a bacterium in terms of its MIC or towards toxicity. The paper is scientifically sound with regard to assays and statistical methods.

      Weaknesses:

      The beginning of the paper could be described as hyperbolic. For a paper that demonstrates that mirror-image molecules have (expected) lower MICs and toxicity, some of the claims in the beginning that they are going to cause a pandemic by evading the immune system seem to be a bit overstated. If they are mirror images, how are these bacteria going to generate virulence factors or mediate pathogenesis mechanisms? It seems like the lack of adaptation would go both ways - supported by the empirical data gathered in this manuscript. There is also the issue of only relatively simple and accessible mirror-image antibiotics being available. This is a limitation that - to their credit - the authors do discuss in the discussion section.

    1. eLife Assessment

      AIRE is well known to contribute to immune self-tolerance in the thymus by expressing auto-antigens; in this manuscript, the authors describe unexpected findings about the interaction of AIRE with AID in B cells, and its function in the immune system, thereby contributing to a fundamental understanding of the broader functions of AIRE. The strength of this manuscript is that, by employing biochemical and genetic experiments, the authors convincingly show the interaction between AIRE and AID and AIRE's subsequent function in GC responses. However, two weak points exist: first, the connection between AIRE, auto-anti IL17 Abs, and IL17-positive effector T cells remains unclear, and second, whether AIRE drives expression of auto-antigens in GC B cells, as it does in the thymus, has not been tested.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide in vivo and in vitro evidence for an interaction between AIRE and AID. This has implications for the dynamics of the germinal center response and autoimmunity related to the APS-1 disease.

      The manuscript describes an unexpected function of AIRE, which is better known for its role in regulating negative selection of T cells in the thymus. The gene has also been shown to be expressed by B cells (Immunity 2015: 26070482). The authors describe that AIRE interacts with AID and that, in its absence, B cells acquire more hypermutations and also produce auto-antibodies against IL-17. These autoantibodies have been described previously.

      Strengths:

      The study is interesting and provides some additional information about how AIRE regulates immune cell function. Several biochemical and in vivo experiments show the interaction and the function of AIRE in the regulation of AID activity in the GC response.

      Weaknesses:

      Some of the hypothetical consequences of this regulation are not investigated. This includes responses to model antigens and dynamics of the germinal center related to kinetics.

      Major Comments:

      (1) AID regulates both switch and somatic hypermutation. Switch is easier to achieve, so which of these processes does AIRE influence the most? Also, the switch is thought to occur before the B cell enters the GC. Looking at the histology, is AIRE also expressed at the early proliferative stage that has been described by Ann Haberman?

      (2) In experiments determining anti-CD40-dependent upregulation of AIRE, naïve resting B cells from mice were used. A proportion of the B cells became activated. Are these MZB or FOB cells, given that MZBs are more easily activated?

      (3) In the BM chimeric experiments in Figure 3, do the AIRE+ and AIRE- populations distribute equally among B cell subpopulations?

      (4) Furthermore, in the NP-KLH experiments, one would expect that B cells with increased affinity would leave the GC earlier and become plasma cells. Thus, the kinetics of the AIRE+ vs AIRE- B cells within the GC would be different? Also, would they maybe take over at some point, as the increased affinity would favor help from Tfh cells that are known to be limited?

      (5) Given the previous studies on AIRE's function in regulating transcription (PMID: 34518235), how does this interaction fit into this picture?

      (6) In the uracil experiments, the readout for AID to induce double-stranded breaks could be tested.

      (7) The candida experiments are a nice connection to the situation in patients. However, why is it mostly auto-antibodies against IL-17? How about other immune responses, as well as T cell-independent type I and II responses?

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Zhou et al. investigated the expression and function of AIRE in B cells in peripheral lymphoid tissues. First, they found the expression of AIRE protein in mature B cells in the follicles in human tonsils and spleens from healthy donors. Flow cytometry analyses using human samples as well as Aire-reporter mice demonstrated AIRE expression in germinal center B cells. The expression of Aire in B cells was induced by CD40 signals. Then, to investigate the impact of AIRE deficiency on B-cell function, the authors used a method of transplanting bone marrow cells from Aire-KO and WT mice into B-cell-deficient mice, comparing B-cell development and function reconstituted in the recipient mice. Their results showed that Aire-deficient B cells strongly responded to immunization with antigens, exhibiting enhanced class switching and somatic hypermutation of antibodies compared with WT B cells. The same phenomena were observed in CRISPR-edited B cell lines lacking Aire. The authors successfully utilized the Aire-deficient B cell line to demonstrate that Aire suppresses antibody class switching and somatic hypermutation via its interaction with AID. Finally, B cell transfer into B cell-deficient mice demonstrated that mice harboring Aire-deficient B cells produced high levels of autoantibodies against Th17 cytokines and exhibited reduced resistance to Candida infection. This mirrors characteristic symptoms in AIRE-deficient patients. The findings of this study not only reveal an unexpected function of AIRE in B cells but also have the potential to contribute to understanding the pathogenesis of APECED and to offering a new direction for developing therapies.

      Strengths:

      The strength of this study lies in demonstrating the expression and function of AIRE in B cells in both mice and humans. It also revealed the direct interaction between AIRE and AID, along with its binding mode (requiring CARD and NLS domains of AIRE), and showed that this interaction is crucial for AIRE function in B cells. It is also significant that the study demonstrated how B-cell-intrinsic dysfunction of AIRE leads to autoantibody production against cytokines.

      Weaknesses:

      As for loss-of-function analysis of Aire in B cells, in addition to the B cell transfer from Aire-KO mice performed in this study, generating B cell-specific Aire-deficient mice using Aire-flox mice (Dobes et al, Eur J Immunol 2018) would further reinforce the conclusions of this study. Furthermore, the relationship with Aire function in thymic B cells reported by previous studies remains unclear, posing an unresolved challenge. This study also failed to address whether Aire deficiency affects gene expression in GC B cells, in particular, whether it induces the expression of various self-antigens as reported in thymic B cells or mTECs.

    1. eLife Assessment

      This study used pupillometry to provide an objective assessment of a form of synesthesia in which people see additional color when reading numbers. It provides convincing evidence that subjective color ratings are matched by changes in pupil size that recapitulate brightness-mediated changes when exposed to the real color. The work provides a valuable contribution to the literature on both synesthetic perception and the use of pupillometry to probe perception and related psychological processes.

    2. Reviewer #1 (Public review):

      Summary:

      Knowing that small pupil-size variations accompany brightness variations (even when these are illusory), the authors asked whether pupil constrictions would accompany the synesthetic perception of a brighter color (compared with a darker one), induced by the presentation of a black-white character. This grapheme-colour synesthesia is only experienced by a few participants, sixteen of whom were enrolled in this study. The results reliably showed that a relative pupil constriction would "betray" the perception of a brighter color in these participants, while no such effect would be observed in control participants who were asked to report a color in association with each grapheme, even though they did not perceive any.

      Strengths:

      The main strength of the study lies in its combination of psychophysics (brightness ratings) and pupillometry, which allowed for showing clear-cut results.

      Weaknesses:

      Some relatively minor weaknesses concern the ancillary analyses, which tackle secondary questions and are not entirely convincing.

      (1) The linear mixed model approach is a powerful way to identify important variables, but it does not clarify whether the key factors are between-subject or between-trial variations. Some variables are inherently defined at a subject level (e.g., PA scores), others are not. I would strongly recommend an alternative visualisation of the results to examine inter-individual variability.

      (2) It is not clear why taking the first derivative of pupil size in Figure 5 would isolate the effect of arousal, eliminating those of luminance and contrast changes (in fact, one could argue for the opposite, since arousal effects are generally constant for extended periods of time while contrast effects are typically more local and transient).

      (3) It is a pity that responses to physical brightness modulations were only measured in the synesthete group, not in controls, as this would have allowed for ruling out differences in pupil reactivity across the two populations.

      (4) Another concern is with the visualisation of the pupil traces in Figure 3 (main results); these were heavily pre-processed (per-participant demeaned), losing any feature besides the effect of interest and generating the unrealistic expectation that perception of dark/bright colors generates a net dilation/constriction of the pupil - whereas perception-related modulations of pupil size are always relative and generally small compared to the numerous other effects registered in pupil size. It would be far better to see the actual profiles, preserving the unfolding of dilations and constrictions over time, especially since these are further analysed in Figures 4 and 5.
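
      The argument in point (2) can be illustrated with a toy calculation. This is a minimal sketch, not the authors' analysis: the two traces are hypothetical, unitless pupil signals, and `first_difference` is an illustrative helper name. It shows that a first derivative is zero wherever a response is sustained and non-zero wherever it changes quickly, so it tends to keep transient components rather than constant ones.

```python
def first_difference(trace, dt):
    """Approximate the first derivative of a sampled trace by finite differences."""
    return [(b - a) / dt for a, b in zip(trace, trace[1:])]

# Hypothetical traces sampled at 1-unit intervals (arbitrary units).
sustained = [0.0, 1.0, 1.0, 1.0, 1.0]  # steps up, then stays constant (plateau)
transient = [0.0, 1.0, 0.0, 0.0, 0.0]  # brief deflection that returns to baseline

d_sustained = first_difference(sustained, 1.0)
d_transient = first_difference(transient, 1.0)

# After onset the sustained trace contributes nothing to the derivative,
# while the transient trace contributes both a rise and a fall.
print(d_sustained)
print(d_transient)
```

      In this toy case the plateau vanishes from the derivative after its onset, which is the intuition behind the reviewer's remark that differentiation would, if anything, suppress long-lasting effects.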

      Impact:

      Despite these weaknesses, and especially if they are adequately addressed in the revision, this work is likely to improve our understanding of synesthesia, providing a new tool to quantify the subjective sensations; an interesting potential extension would be using pupillometry for tracking changes over time of the synesthetic experiences, opening up the possibility to evaluate the importance of learning for this peculiar experience.

    3. Reviewer #2 (Public review):

      Synesthesia is a neurological condition where stimulation of one sensory channel leads to involuntary, automatic, and consistent experience of another, unrelated percept. For example, Sir Francis Galton (1880, Nature) famously described the robust tendency of some individuals (synesthetes) to associate numerals with a distinct color. Ever since, synesthesia has continued to attract a broad interest in the cognitive neurosciences in light of its implications for the study of domains such as perception, consciousness, and brain connectivity, among others.

      Strauch, Leenaars, and Rouw measured pupil size in a group of 16 grapheme-color synesthetes and two matched control groups. The participants were presented with gray digits - that is, visual stimuli having identical physical properties in terms of brightness. Each participant subsequently rated the corresponding evoked color and brightness: unlike controls, synesthetes did so in a very consistent and reliable fashion. Accordingly, this was also shown in their pupils: despite the same objective luminance, digits associated with brighter percepts caused their pupils to constrict, and digits associated with darker percepts caused their pupils to dilate more than controls. These results highlight how crossmodal correspondences are deeply rooted in synesthetes, and put forward pupillometry as a particularly appealing biomarker for some phenomenological experience (at least those grounded in "brightness").

      Further strengths of the technique are its temporal resolution and its responsiveness to several constructs. Across several tasks, the authors show, for example, that responses to synesthetic light are somewhat slower than responses to real light (i.e., they are likely mediated), but at the same time faster than responses to mental imagery. The role of mental imagery can also be reasonably dismissed when considering the second feature of pupil size: its responsiveness to mental effort and cognitive load. The pupils tend to dilate with demanding, challenging tasks, and this was the case when control participants were asked to report the color of a digit for which they did not consistently experience a synesthetic association. The same task was, instead, seemingly effortless for synesthetes, again speaking in favor of the automaticity of number-color correspondences in their case.

      Overall, the findings by Strauch, Leenaars, and Rouw are highly significant for the field and likely to be impactful. The strength of their evidence, when accounting for the relatively small sample size and the inherent variability of both phenomenology (color perception and subjective reporting) and physiology (pupil size), is adequate and sufficiently convincing.

    4. Reviewer #3 (Public review):

      Summary:

      In the present study, the authors examined pupillary responses to uncolored stimuli (number graphemes) among number-color synesthetes and non-synesthetes. After seeing a digit, the synesthetes and active control participants were asked to indicate which color they perceived using three dimensions of hue, saturation, and lightness. The lightness values were the primary independent variable for follow-up analyses. To see how the pupil responded to psychologically "bright" and "dark" digits, the authors split the reported lightness values at the median and plotted them. The synesthetes showed a pupillary constriction to digits they perceived as bright and dilation to digits they perceived as dark. Active control participants did not show that effect. In a subsequent block, only the synesthetes were shown the colors they reported perceiving as colored discs. Their pupillary responses were similar. The authors also found that the differences in pupillary responses between light and dark perceptions (with digits) were only slightly delayed in their onset to the perception of a colored disc, and therefore, the color perception accompanying a digit is unlikely to be effortful or a retrieved association, but occurs rather automatically.
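
      The median split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the example ratings, and the pupil values are all hypothetical, and ratings equal to the median are assigned to the "dark" group as an arbitrary tie rule.

```python
from statistics import mean, median

def median_split(lightness_by_digit):
    """Label each digit 'bright' or 'dark' relative to the median reported lightness.

    Assumption: ratings exactly at the median are labelled 'dark'.
    """
    cutoff = median(lightness_by_digit.values())
    return {digit: ("bright" if value > cutoff else "dark")
            for digit, value in lightness_by_digit.items()}

def dark_minus_bright(labels, pupil_by_digit):
    """Difference in mean pupil size between 'dark' and 'bright' digits."""
    bright = [pupil_by_digit[d] for d, lab in labels.items() if lab == "bright"]
    dark = [pupil_by_digit[d] for d, lab in labels.items() if lab == "dark"]
    return mean(dark) - mean(bright)

# Hypothetical data for one participant: reported lightness (0-100)
# and baseline-corrected pupil size (arbitrary units) per digit.
ratings = {1: 10, 2: 50, 3: 90, 4: 30}
pupil = {1: 0.25, 2: -0.25, 3: -0.75, 4: 0.75}

labels = median_split(ratings)
effect = dark_minus_bright(labels, pupil)
print(labels, effect)
```

      A positive difference here means larger pupils for digits rated as dark, which is the direction of the effect reported for synesthetes.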

      Strengths:

      The authors employed a well-controlled and designed quasi-experiment comparing color-grapheme synesthetes to non-synesthetes and showed convincingly that the color perceptions accompanying graphemes alter the physical perception of brightness. They also made a reasoned attempt to rule out the possibility that color associations are occurring effortfully via retrieved associations.

      Weaknesses:

      There are some areas in which the implications of these findings could be elaborated upon. I had the following questions:

      (1) Are the pupillary responses among synesthetes, which objectively do not seem to match the degree of physical stimulation entering the retina, in any way maladaptive for eye functioning? I understand the constriction/dilation of the pupil to not only benefit visual acuity but also to protect the retina from damage. Are synesthetes at any risk of retinal damage due to over-dilation of the pupil to brighter stimuli? Or are these effects of a magnitude that is too small to matter? As reported in arbitrary units, it was hard to know how large these effects were in terms of measurable changes in dilation (e.g., millimeters).

      (2) Likewise, is the automatic synesthetic merging of two percepts something that could be learned such that natural synesthetes and "artificial" synesthetes would look similar? For example, if a group of non-synesthetic participants were to learn a color-grapheme association to automaticity, would you expect their pupillary responses to the graphemes to look similar to the synesthetes'? If so (or if not), what would this tell us about the phenomenology of synesthesia?

      (3) Do the synesthetic perceptions of digit graphemes merge in a sensible way? For example, if a synesthete sees a particular color with the digit 1, and a different color with the digit 9, what do they perceive when they see 19, 1-9, or 1 9? Is there color blending, or an altogether different color perception?

    1. eLife Assessment

      This study provides an important, comprehensive, large-scale dataset on transcription factor binding in Pseudomonas aeruginosa, along with analyses of its regulatory network, key virulence and metabolic regulators, and a pangenomic examination of transcription factors. Utilizing large-scale ChIP-seq and multi-omics integration, the research convincingly supports the hierarchical regulatory structures and offers insights into virulence mechanisms. This dataset, made available through an online database, should be an invaluable resource to the research community studying P. aeruginosa, a key pathogen responsible for hospital infections and prone to developing antibiotic resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, Huang et al. revealed the complex regulatory functions and transcription network of 172 unknown transcription factors (TFs) in Pseudomonas aeruginosa PAO1. They have built a global TF-DNA binding landscape and elucidated binding preferences and functional roles of these TFs. More specifically, the authors established a hierarchical regulatory network and identified ternary regulatory motifs and co-association modules. Since P. aeruginosa is a well-known pathogen, the authors also identified key TFs associated with virulence pathways (e.g., quorum sensing [QS], motility, biofilm formation), which could be potential drug targets for future development. The authors also explored the TF conservation and functional evolution through pan-genome and phylogenetic analyses. To facilitate searching by other researchers, the authors developed a publicly accessible database (PATF_Net) integrating ChIP-seq and HT-SELEX data.

      Strengths:

      (1) The authors performed ChIP-seq analysis of 172 TFs (nearly half of the 373 predicted TFs in P. aeruginosa) and identified 81,009 significant binding peaks, representing one of the largest TF-DNA interaction studies in the field. Also, the integration of HT-SELEX, pan-genome, and phylogenetic analyses provided multi-dimensional insights into TF conservation and function.

      (2) The authors provided an informative analytical framework for presenting the TFs, in which a hierarchical network model based on the "hierarchy index (h)" classified TFs into top, middle, and bottom levels. They identified 13 ternary regulatory motifs and co-association clusters, which deepened our understanding of complex regulatory interactions.

      (3) The PATF_Net database provides TF-target network visualization and data-sharing capabilities, offering practical utility for researchers especially for the P. aeruginosa field.
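
      For readers unfamiliar with the hierarchy index mentioned in point (2), the sketch below shows one common way such an index is computed in the regulatory-network literature. It is an assumption that the manuscript uses this form: here h = (O - I) / (O + I), where O and I are a TF's out-degree and in-degree in the TF-TF network. The cutoff of 1/3 for assigning levels, the function names, and the three TF names are hypothetical.

```python
from collections import Counter

def hierarchy_index(edges):
    """Compute h = (O - I) / (O + I) for every TF in a list of (regulator, target) edges.

    Assumed definition; O is out-degree and I is in-degree among TFs.
    """
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    nodes = set(out_deg) | set(in_deg)  # every node here has at least one edge
    return {n: (out_deg[n] - in_deg[n]) / (out_deg[n] + in_deg[n]) for n in nodes}

def classify(h, cutoff=1 / 3):
    """Assign a level from h; the cutoff is a hypothetical choice for illustration."""
    if h > cutoff:
        return "top"
    if h < -cutoff:
        return "bottom"
    return "middle"

# Hypothetical three-TF network: TF_A regulates TF_B and TF_C; TF_B regulates TF_C.
edges = [("TF_A", "TF_B"), ("TF_A", "TF_C"), ("TF_B", "TF_C")]
h = hierarchy_index(edges)
levels = {tf: classify(value) for tf, value in h.items()}
print(h, levels)
```

      With this definition, a TF that only regulates others has h = 1 and a TF that is only regulated has h = -1, which matches the intuition of top-level and bottom-level regulators.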

      Weaknesses:

      (1) There is very limited experimental validation for this study. Although 24 virulence-related master regulators (e.g., PA0815 regulating motility, biofilm, and QS) were identified, functional validation (e.g., gene knockout or phenotypic assays) is lacking, leaving some conclusions reliant on bioinformatic predictions. Another approach for validation is checking the mutations of these TFs from clinical strains of P. aeruginosa, where chronically adapted isolates often gain mutations in virulence regulators.

      (2) ChIP-seq in bacteria may suffer from low-abundance TF signals and off-target effects. The functional implications of non-promoter binding peaks (e.g., coding regions) were not discussed.

      (3) PATF_Net currently supports basic queries but lacks advanced tools (e.g., dynamic network modeling or cross-species comparisons). User experience and accessibility remain under-evaluated. But this could be improved in the future.

      Achievement of Aims and Support for Conclusions

      (1) The authors successfully mapped global P. aeruginosa TF binding sites, constructed hierarchical networks and co-association modules, and identified virulence-related TFs, fulfilling the primary objectives. The database and pan-genome analysis provide foundational resources for future studies.

      (2) The hierarchical model aligns with known virulence mechanisms (e.g., LasR and ExsA at the bottom level directly regulating virulence genes). Co-association findings (e.g., PA2417 and PA2718 co-regulating pqsH) resonate with prior studies, though experimental confirmation of synergy is needed.

      Impact on the Field and Utility of Data/Methods

      (1) This study fills critical gaps in TF functional annotation in P. aeruginosa, offering new insights into pathogenicity mechanisms (e.g., antibiotic resistance, host adaptation). The hierarchical and co-association frameworks are transferable to other pathogens, advancing comparative studies of bacterial regulatory networks.

      (2) PATF_Net enables rapid exploration of TF-target interactions, accelerating candidate regulator discovery.

      Comments on revisions:

      The authors have done a good job of revising their manuscript. The manuscript is now more concise and logical for readers.

    3. Reviewer #3 (Public review):

      Summary:

      The authors utilized ChIP-seq on strains containing tagged transcription factor (TF)-overexpression plasmids to identify binding sites for 172 transcription factors in P. aeruginosa. High-quality binding site data provides a rich resource for understanding regulation in this critical pathogen. These TFs were selected to fill gaps in prior studies measuring TF binding sites in P. aeruginosa. The authors further perform a structured analysis of the resulting transcriptional regulatory network, focusing on regulators of virulence and metabolism, in addition to performing a pangenomic analysis of the TFs. The resulting dataset has been made available through an online database. While the implemented approach to determining functional TF binding sites has limitations, the resulting dataset still has substantial value to P. aeruginosa research.

      Strengths:

      The generated TF binding site database fills an important gap in regulatory data in the key pathogen P. aeruginosa. Key analyses of this dataset presented include an analysis of TF interactions and regulators of virulence and metabolism, which should provide important context for future studies into these processes. Experimental validation has been included in the revised version. The online database containing this data is well organized and easy to access. As a data resource, this work should be of significant value to the infectious disease community.

      Weaknesses:

      Drawbacks of the study, which have been mitigated in a revised version, include 1) challenges interpreting binding site data obtained from TF overexpression due to unknown activity state of the TFs on the measured conditions (discussed by the authors), and 2) remaining challenges in the practical utilization of the TRN topological analysis.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, Huang et al. revealed the complex regulatory functions and transcription network of 172 previously uncharacterized transcription factors (TFs) in Pseudomonas aeruginosa PAO1. They built a global TF-DNA binding landscape and elucidated the binding preferences and functional roles of these TFs. More specifically, the authors established a hierarchical regulatory network and identified ternary regulatory motifs and co-association modules. Since P. aeruginosa is a well-known pathogen, the authors also identified key TFs associated with virulence pathways (e.g., quorum sensing [QS], motility, biofilm formation), which could be potential drug targets for future development. The authors further explored TF conservation and functional evolution through pan-genome and phylogenetic analyses. To facilitate searching by other researchers, the authors developed a publicly accessible database (PATF_Net) integrating ChIP-seq and HT-SELEX data.

      Strengths:

      (1) The authors performed ChIP-seq analysis of 172 TFs (nearly half of the 373 predicted TFs in P. aeruginosa) and identified 81,009 significant binding peaks, representing one of the largest TF-DNA interaction studies in the field. Also, the integration of HT-SELEX, pan-genome, and phylogenetic analyses provided multi-dimensional insights into TF conservation and function.

      (2) The authors provided an informative analytical framework for presenting the TFs, in which a hierarchical network model based on the "hierarchy index (h)" classified TFs into top, middle, and bottom levels. They identified 13 ternary regulatory motifs and co-association clusters, which deepened our understanding of complex regulatory interactions.

      (3) The PATF_Net database provides TF-target network visualization and data-sharing capabilities, offering practical utility for researchers especially for the P. aeruginosa field.

      Thank you for your positive feedback!

      Weaknesses:

      (1) There is very limited experimental validation for this study. Although 24 virulence-related master regulators (e.g., PA0815 regulating motility, biofilm, and QS) were identified, functional validation (e.g., gene knockout or phenotypic assays) is lacking, leaving some conclusions reliant on bioinformatic predictions. Another approach for validation is checking the mutations of these TFs from clinical strains of P. aeruginosa, where chronically adapted isolates often gain mutations in virulence regulators.

      Thank you for this valuable suggestion. We have performed EMSA experiments to validate the binding results and constructed mutants for further functional validation. The details can be found in Figure S5.

      (2) ChIP-seq in bacteria may suffer from low-abundance TF signals and off-target effects. The functional implications of non-promoter binding peaks (e.g., coding regions) were not discussed.

      Thank you for this insightful comment regarding ChIP-seq data quality and non-promoter binding events. While we acknowledge that completely eliminating all non-specific binding signals is technically challenging in bacterial ChIP-seq experiments, we implemented stringent quality control measures including replicates, negative controls, and FDR cutoffs to minimize false positives.

      Although the coding binding peaks represent a smaller fraction of total binding events, they are functionally significant rather than mere technical artifacts. Our previous work systematically demonstrated that bacterial TFs can bind to coding sequences and regulate gene expression through multiple mechanisms, including modulating cryptic promoter activity and antisense RNA transcription, hindering transcriptional elongation, and influencing translational efficiency[1]. We have now expanded the Discussion section to address these regulatory mechanisms.

      (3) PATF_Net currently supports basic queries but lacks advanced tools (e.g., dynamic network modeling or cross-species comparisons). User experience and accessibility remain underevaluated. But this could be improved in the future.

      Thank you for this constructive feedback on PATF_Net. We acknowledge that more advanced features would further enhance the platform’s utility. As a first step, we have implemented two new features: (1) a virulence pathway browser that allows users to explore TF binding across curated gene sets for key virulence pathways (quorum sensing, secretion systems, biofilm, motility, etc.), and (2) a target gene search function that enables rapid identification of all TFs regulating any gene of interest by locus tag query.

      Achievement of Aims and Support for Conclusions

      (1) The authors successfully mapped global P. aeruginosa TF binding sites, constructed hierarchical networks and co-association modules, and identified virulence-related TFs, fulfilling the primary objectives. The database and pan-genome analysis provide foundational resources for future studies.

      (2) The hierarchical model aligns with known virulence mechanisms (e.g., LasR and ExsA at the bottom level directly regulating virulence genes). Co-association findings (e.g., PA2417 and PA2718 co-regulating pqsH) resonate with prior studies, though experimental confirmation of synergy is needed.

      Thank you for your positive feedback! We have added experimental validation in the Results section.

      Impact on the Field and Utility of Data/Methods

      (1) This study fills critical gaps in TF functional annotation in P. aeruginosa, offering new insights into pathogenicity mechanisms (e.g., antibiotic resistance, host adaptation). The hierarchical and co-association frameworks are transferable to other pathogens, advancing comparative studies of bacterial regulatory networks.

      (2) PATF_Net enables rapid exploration of TF-target interactions, accelerating candidate regulator discovery.

      Thank you for your positive feedback!

      Reviewer #3 (Public review):

      Summary:

      The authors utilized ChIP-seq on strains containing tagged transcription factor (TF)-overexpression plasmids to identify binding sites for 172 transcription factors in P. aeruginosa. High-quality binding site data provides a rich resource for understanding regulation in this critical pathogen. These TFs were selected to fill gaps in prior studies measuring TF binding sites in P. aeruginosa. The authors further perform a structured analysis of the resulting transcriptional regulatory network, focusing on regulators of virulence and metabolism, in addition to performing a pangenomic analysis of the TFs. The resulting dataset has been made available through an online database. While the implemented approach to determining functional TF binding sites has limitations, the resulting dataset still has substantial value to P. aeruginosa research.

      Strengths:

      The generated TF binding site database fills an important gap in regulatory data in the key pathogen P. aeruginosa. Key analyses of this dataset presented include an analysis of TF interactions and regulators of virulence and metabolism, which should provide important context for future studies into these processes. The online database containing this data is well organized and easy to access. As a data resource, this work should be of significant value to the infectious disease community.

      Thank you for your positive feedback!

      Weaknesses:

      Drawbacks of the study include 1) challenges interpreting binding site data obtained from TF overexpression due to unknown activity state of the TFs on the measured conditions, 2) limited practical value of the presented TRN topological analysis, and 3) lack of independent experimental validation of the proposed master regulators of virulence and metabolism.

      We thank the reviewer for summarizing these key concerns. We acknowledge the limitations raised regarding TF overexpression, TRN topological analysis interpretation, and experimental validation. We provide detailed point-by-point responses to each of these concerns in our replies to the specific comments below, where we explain our rationale, the measures taken to address these limitations, and our plans for improvement.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Future Directions for the authors to consider for next steps:

      (1) Key TFs (e.g., PA1380, PA5428) should be validated via gene knock out experiments, fluorescent reporter assays, or animal models to confirm roles in virulence pathways.

      Thank you for this important suggestion. We agree that experimental validation is essential to confirm their regulatory roles and biological functions.

      First, we selected a subset of key TFs, including PA0167, PA1380, PA0815, and PA3094, and performed Electrophoretic Mobility Shift Assay (EMSA) experiments to validate their direct binding to target promoters. These results confirmed the ChIP-seq-identified interactions and are now included as Figure S5A-F.

      We also constructed clean deletion mutants of PA1380 and PA3094 (ΔPA1380 and ΔPA3094) and their complemented strains (ΔPA1380/p and ΔPA3094/p). We then performed RT-qPCR analysis to validate their regulatory effects on key target genes. We found that PA1380 positively regulates the expression of the cupB1 and cupB3 genes (Figure S5F). Because the CupB cluster is known to be less important than the CupA cluster in biofilm formation, we did not find a significant difference in biofilm formation between WT and ΔPA1380. Additionally, we found that PA3094 positively regulates lecA expression, as shown in Figure S5G.

      We agree that comprehensive functional validation, including animal model studies, would further strengthen the biological significance of these findings. Such experiments are currently underway in our laboratory and will be the subject of follow-up studies.

      We have revised the Results section and Method section to include these validation experiments and their implications. Please see Figure S5 and Lines 283-300.

      “To experimentally validate the regulatory interactions identified by ChIP-seq, we performed biochemical and genetic analyses on selected TFs. First, we conducted Electrophoretic Mobility Shift Assays (EMSA) for four TFs, including PA0167, PA0815, PA1380, and PA3094, using DNA fragments containing their predicted binding sites from target gene promoters. These TFs showed specific binding to their cognate DNA sequences (Figure S5A-D), confirming the direct binding of the ChIP-seq-identified interactions.

      To further validate the functional regulatory roles of these TFs, we constructed clean deletion mutants of PA1380 and PA3094 (ΔPA1380 and ΔPA3094) along with their complemented strains (ΔPA1380/p and ΔPA3094/p). RT-qPCR analysis revealed that PA1380 positively regulates the expression of cupB1 and cupB3 (Figure S5E), two genes within the CupB fimbrial cluster identified as ChIP-seq targets. Similarly, PA3094 was confirmed to positively regulate lecA expression (Figure S5F), which encodes a lectin involved in biofilm formation and host interactions[2]. Expression of these target genes was restored to wild-type (WT) levels in the complemented strains, validating the regulatory relationships predicted by ChIP-seq. These combined biochemical and genetic validations demonstrate the accuracy and biological relevance of our TF binding data.”

      (2) Non-promoter binding events (e.g., coding regions) may regulate RNA stability, warranting integration with translatomics or epigenomics data.

      Thank you for this suggestion. We have now expanded the Discussion section to address this comment. Please see Lines 478-482.

      “Our analysis revealed that TF binding events occur within coding regions, which is consistent with our previous study demonstrating that bacterial TFs possess binding capabilities for coding regions and can regulate transcription through multiple mechanisms [1]. In addition, such binding may also regulate RNA stability, warranting integration with translatomics or epigenomics data.”

      (3) Incorporate strain-specific TF data (e.g., clinical isolates) and dynamic visualization tools to broaden PATF_Net's applicability.

      Thank you for this constructive suggestion. To enhance the utility of PATF_Net, we have implemented two new features: (1) a virulence pathway browser that allows users to explore TF binding across curated gene sets for key virulence pathways (quorum sensing, secretion systems, biofilm, motility, etc.), and (2) a target gene search function that enables rapid identification of all TFs regulating any gene of interest by locus tag query. These features are now live on the database and described in the revised manuscript.

      Regarding strain-specific TF data, we agree this would be valuable for understanding regulatory diversity in clinical isolates. However, such an expansion would require ChIP-seq profiling across multiple strains. The current dataset is based on the reference strain PAO1, which serves as the foundation for most P. aeruginosa research and allows direct comparison with existing genomic and functional studies. We have added a statement in the revised manuscript acknowledging this limitation and highlighting strain-specific TF analysis as an important future direction for the field. Please see Lines 372-390.

      “The database offers multiple search modalities to facilitate data exploration: users can perform TF-centric searches to query binding sites, target genes, and regulatory networks for individual TFs, or utilize the target gene search function to identify all TFs that regulate any gene of interest by entering its locus tag. To connect regulatory data with biological function, we have implemented a virulence pathway browser that allows users to explore TF binding patterns across curated gene sets for major P. aeruginosa virulence pathways. Interactive visualization tools, including network graphs and binding profile plots, facilitate intuitive exploration of regulatory relationships. The primary purpose of PATF_Net is to store, search, and mine valuable information on P. aeruginosa TFs for researchers investigating P. aeruginosa infection. The current resource is based on the reference strain PAO1, which serves as the foundation for most P. aeruginosa molecular studies and allows direct integration with existing genomic annotations and functional data. However, P. aeruginosa exhibits substantial genomic diversity across clinical isolates, and strain-specific differences in TF binding patterns may contribute to phenotypic variation in virulence, antibiotic resistance, and host adaptation. Extension of this resource to include strain-specific regulatory maps from diverse clinical isolates would provide valuable insights into the regulatory basis and represents an important direction for future investigation.”
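
      As an illustration of the target gene search described above, a reverse lookup from a gene to the TFs that bind it can be built by inverting a TF-to-targets map. This is a minimal sketch, not the PATF_Net implementation: the function names are hypothetical, and the toy entries simply reuse TF-target pairs mentioned in this response, with gene names standing in for locus tags.

```python
from collections import defaultdict

def build_target_index(tf_targets):
    """Invert a {TF: [target genes]} map into {target gene: {TFs}}."""
    index = defaultdict(set)
    for tf, targets in tf_targets.items():
        for gene in targets:
            index[gene].add(tf)
    return index

def tfs_regulating(index, gene):
    """Return the sorted list of TFs with a recorded binding event for `gene`."""
    return sorted(index.get(gene, set()))

# Toy input (illustrative only): pairs taken from the text of this response.
tf_targets = {
    "PA1380": ["cupB1", "cupB3"],
    "PA3094": ["lecA"],
    "PA2417": ["pqsH"],
    "PA2718": ["pqsH"],
}

index = build_target_index(tf_targets)
pqsH_regulators = tfs_regulating(index, "pqsH")
```

      Building the index once keeps each query a single dictionary lookup, which suits an interactive search box; a gene with no recorded binding simply returns an empty list.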

      (4) Phylogenetic analysis highlights TF conservation in bacteria; future work could explore functional homology in other Gram-negative pathogens (e.g., E. coli).

      Thank you for this insightful suggestion. Our phylogenetic analysis revealed that P. aeruginosa TFs exhibit varying degrees of conservation across bacterial species, with some showing broad distribution across Gram-negative pathogens while others are lineage-specific.

      We agree that exploring functional homology of orthologous TFs across species would be highly valuable. Such comparative studies could address whether conserved TFs regulate similar target genes and biological processes across species, or whether regulatory networks have been rewired during evolution. For example, comparative ChIP-seq analysis of P. aeruginosa TFs and their orthologs in Klebsiella pneumoniae, or even in a Gram-positive pathogen such as Bacillus cereus, could reveal conserved regulatory modules governing universal virulence or metabolic strategies versus species-specific adaptations. This represents an important direction for future investigation and would be facilitated by the comprehensive TF binding dataset we provide here. We have expanded the Discussion section to highlight this future direction. Please see Lines 539-550.

      “While our phylogenetic analysis reveals varying degrees of TF conservation across bacterial species, the functional implications of this conservation remain to be fully explored. Many P. aeruginosa TFs have clear orthologs in both Gram-negative (e.g., Klebsiella pneumoniae) and Gram-positive pathogens (e.g., Bacillus cereus), yet whether these orthologs regulate similar target genes and biological processes is largely unknown. Future comparative ChIP-seq profiling of orthologous TFs could reveal the extent to which regulatory network architecture is conserved versus rewired during bacterial evolution, potentially identifying core regulatory modules governing universal bacterial strategies versus species-specific innovations. Such cross-species comparisons would enhance our understanding of regulatory network evolution and enable functional prediction in less well-characterized pathogens based on homology to experimentally validated P. aeruginosa regulators.”

      Reviewer #3 (Recommendations for the authors):

      Major comments

      - Limitations of the ChIP-seq approach: With overexpression plasmids as an approach to TRN elucidation, there are always a set of concerns. First, TF expression is not enough to ensure regulatory activity - metabolite effects must be such that the TF is active which requires growing the cells in activating conditions. Second, the presence of a binding event does not mean that the binding has a regulatory effect - the authors are clearly aware of this as they specify binding sites in promoter regions, which should be helpful, but they also mention the possibility of regulatory binding events in coding regions. These issues should be listed as weaknesses of the approach in the Discussion.

      Thank you for these important suggestions. We agree that these limitations should be explicitly discussed. We have now added a dedicated paragraph in the Discussion section addressing these concerns. Please see Lines 492-501.

      “However, several limitations of the ChIP-seq approach should be acknowledged. Firstly, TF overexpression ensures sufficient protein levels for ChIP-seq signal detection but does not guarantee that all TFs are in their active conformational states, as many bacterial TFs require allosteric activation by metabolites, cofactors, or post-translational modifications. The cells were grown under standard laboratory conditions, which may not activate all TFs to their maximal regulatory states, potentially leading to underestimation of condition-specific binding peaks. Secondly, while we observed TF binding at thousands of genomic sites, binding per se does not equate to functional regulation, as chromatin context, cofactor availability, and competitive binding all influence regulatory outcomes.”

      - Lack of independent validation: The study seems to lack substantial independent validation of either the functional nature of the binding sites as well as the proposed physiological regulatory role of the TFs. For example, for the 103 identified TF motifs, do any of these agree with existing motifs in motif databases that may be homologous to P. aeruginosa TFs? The authors claim to have discovered master regulators of virulence and associated core regulatory clusters - but there does not seem to be any independent validation of the proposed associations. The authors selected the TF targets to cover TFs that had not yet been characterized; however, it would have been nice to have some overlap with previous studies so that consistency and data quality could be assessed.

      Thank you for raising these critical points about validation.

      As for motif validation, we compared our motifs with existing motifs in the RegPrecise database[3] and found that the motif of PA3587 shows significant similarity to those of homologous TFs in Pseudomonadaceae. We have added the related description to the Results section. Please see Figure S3B and Lines 228-231.

      As for the validation of master regulators, we have performed EMSA experiments to validate the binding events and constructed mutants for functional validation. We have added the related content to the Results section. Please see Figure S5 and Lines 283-300.

      We have discussed the overlap between our results and previous studies in the Discussion section. Please see Lines 530-538.

      “PA0797 is known to regulate the pqs system and pyocyanin production[4]. In the present study, it was also found to bind to the pqsH promoter region and its motif was visualised. PA5428 was found to bind to the promoter regions of aceA and glcB genes[5], which was also demonstrated in our ChIP-seq results. PA4381 (CloR) was found to be associated with polymyxin resistance in a previous study[6] and to be possibly related to ROS resistance in the present study. Furthermore, PA5032 plays a putative role in biofilm regulation and also forms an operon with PA5033, an HP associated with biofilm formation[7].”

      - Uncertain value of TRN topology analysis: The relationship between ternary motifs and pathogenicity of P. aeruginosa, and why the authors argue these results motivated TF-targeting drugs (the topic of the last paragraph of the Discussion), are unclear to me. The authors allude to possible connections between pathogenicity, growth, and drug resistance, but I don't see concrete examples here of related TF interactions that clearly represent these relationships. The sections "Hierarchical networks of TFs based on pairwise interactions" and "Ternary regulatory motifs show flexible relationships among TFs in P. aeruginosa" seem to not say much in terms of results that are actionable or possible to validate. A topological graph is constructed based on observed TF-TF connections in measured binding sites - however, it's unclear if any of these connections are physiologically meaningful. Line 178 - Why would there be any connection between the structural family of TF and its location in the proposed TRN hierarchy?

      Thank you for this valuable comment on TRN topology analysis. It is hard to quantify precisely how much this resource will accelerate P. aeruginosa research or drug development, but we believe that providing this foundational network architecture has inherent value for the community because it enables hypothesis generation even before comprehensive functional validation. We would like to clarify our perspective on these findings and have added a discussion to the revised manuscript to better describe their nature and value. Please see Lines 517-528.

      “Additionally, although the TRN analysis revealed organizational patterns in the P. aeruginosa regulatory network, the functional significance of these topological features, including their specific contributions to pathogenicity, metabolic adaptation, and antibiotic resistance, remains to be experimentally determined in future work. The hierarchical structure and regulatory motifs we identified represent objective network properties derived from our binding data, but translating these structural observations into mechanistic understanding will require condition-specific functional studies, genetic validation, and phenotypic characterization. Our analysis provides a systematic framework for generating testable hypotheses rather than definitive functional conclusions. Nevertheless, these network-level organizational principles provide value to the community as a foundational reference, similar to other regulatory network maps[8] that were useful even before comprehensive validation.”

      - Identification of "master" regulators: Line 527 on virulence regulators: "We first generated gene lists associated with nine pathways" - is this not somewhat circular, i.e. using gene lists generated from (I assume) co-regulated gene sets to identify regulators of those gene lists? I can't tell from the cited reference (80), which is their own prior review article, what the original source of these gene lists was. Somewhat related to this point - Line 32: 24 "master regulators" - if there are that many, is it still considered a master regulator? Line 270: This term "master regulator" would seem to require some quantitative justification. Identifying 24 (a large number of) "master" regulators of virulence would seem to dilute the implied power of the term.

      We apologize for the lack of clarity regarding the virulence pathway gene lists. The complete gene lists for the virulence-related pathways, which were compiled from functional annotations, are now provided in our online PATF_Net database.

      Additionally, we appreciate your concern about the use of “master” regulator. The usage is based on previous studies[9,10], and the master regulator is commonly known in the development of multicellular organisms as a subset of TFs that control the expression of multiple downstream genes and govern lineage commitment or key biological processes. We employed the term "master regulator" in an analogous manner to specify a class of functionally crucial TFs that participate in a pathway or biological event by regulating multiple downstream genes statistically enriched in that pathway. In line with this definition, we identified TFs whose targets were significantly enriched in genes associated with specific virulence pathways (hypergeometric test, P < 0.05).

      We understand the concern that identifying 24 master regulators might seem to dilute the term. However, we would like to clarify that each of these 24 TFs is a "master regulator" with respect to specific virulence pathways based on statistical criteria, not necessarily a global master regulator of multiple pathways of P. aeruginosa. We have revised the Method section. Please see Lines 604-612.
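
      The enrichment criterion described above (hypergeometric test, P < 0.05) can be illustrated with a short calculation. The sketch below is not the authors' code: the function name and all counts are hypothetical, and it assumes a one-sided test, i.e. the probability of seeing at least the observed number of pathway genes among a TF's targets if those targets were drawn at random from the genome.

```python
from math import comb

def hypergeom_enrichment_p(genome_size, pathway_size, target_count, overlap):
    """One-sided hypergeometric tail P(X >= overlap).

    genome_size  -- total number of genes considered (N)
    pathway_size -- genes annotated to the pathway (K)
    target_count -- genes bound by the TF (n)
    overlap      -- TF targets that are also pathway genes (k)
    """
    upper = min(pathway_size, target_count)
    total = comb(genome_size, target_count)
    tail = sum(
        comb(pathway_size, k) * comb(genome_size - pathway_size, target_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts, for illustration only.
p_value = hypergeom_enrichment_p(
    genome_size=5000, pathway_size=50, target_count=200, overlap=8
)
is_enriched = p_value < 0.05
```

      With these made-up counts, about two pathway genes would be expected by chance, so an overlap of eight falls below the 0.05 threshold. Because the test is applied per pathway, a TF passing it is a candidate regulator of that pathway only, which matches the pathway-specific sense of "master regulator" used in the response.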

      - Line 234: "Genome-wide synergistic co-association of TFs in P. aeruginosa." This section was an interesting analysis. As I mention above, the weakness of an overexpression approach is not knowing whether the TF is active on the examined conditions. By looking at shared binding peaks across overexpression of different TFs, it should indeed be possible to glean some regulatory connections across TFs. Furthermore, the authors discuss specific examples that appear physiologically reasonable, which is appreciated.

      We thank the reviewer for this positive assessment of our co-association analysis. We agree with the limitation of the overexpression approach, which has been addressed in the Discussion section. We are pleased that the reviewer found the approach and specific examples valuable.
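
      The idea of reading regulatory connections from shared binding peaks can be sketched as a simple interval-overlap score between two TFs. This is an assumed, simplified scoring scheme rather than the co-association method used in the manuscript: the function names are hypothetical, peaks are half-open (start, end) intervals on a single replicon, and the coordinates are invented.

```python
def count_shared_peaks(peaks_a, peaks_b):
    """Count peaks in peaks_a that overlap at least one peak in peaks_b."""
    shared = 0
    for start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(b_start < end and start < b_end for b_start, b_end in peaks_b):
            shared += 1
    return shared

def co_association_score(peaks_a, peaks_b):
    """Jaccard-like score: shared peaks divided by the size of the union."""
    shared = count_shared_peaks(peaks_a, peaks_b)
    union = len(peaks_a) + len(peaks_b) - shared
    return shared / union if union else 0.0

# Invented coordinates, for illustration only.
tf_a_peaks = [(100, 200), (500, 600), (900, 1000)]
tf_b_peaks = [(150, 250), (950, 1050), (2000, 2100)]
score = co_association_score(tf_a_peaks, tf_b_peaks)
```

      The pairwise loop is quadratic in the number of peaks, which is acceptable for a sketch; a genome-wide comparison would normally sort the intervals or use an interval tree.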

      Minor comments

      - Line 35 - "high-throughput systematic evolution of ligands by exponential enrichment" - no idea what this means. Is this related to the web-based database, or why is it mentioned in the same sentence?

      We apologize for the unclear presentation. To clarify: “High-throughput systematic evolution of ligands by exponential enrichment” (HT-SELEX) is an in vitro technique for determining TF DNA-binding motifs, which our group previously applied to a subset of P. aeruginosa TFs in a prior publication[11]. In the current study, we performed ChIP-seq for 172 TFs, which represent the majority of TFs not covered by the previous HT-SELEX study. Together, these two complementary approaches (HT-SELEX for in vitro binding motifs, ChIP-seq for in vivo genomic binding sites) provide near-complete coverage of the P. aeruginosa TF repertoire. Both datasets are integrated into our PATF_Net database.

      Due to space constraints in the abstract, we could not provide detailed explanation of HT-SELEX, but we have now improved the clarity in the Introduction to better explain the relationship between our previous HT-SELEX work and the current ChIP-seq study, and why both are mentioned together in the context of the database. Please see Lines 99-105.

      - Line 193 - Only 9 auto-regulating TFs seems like a low number, given the frequency of negative auto-regulation in other organisms like E. coli. Could the authors comment on their expectations based on well-curated TRNs?

      Thank you for this comment. We agree that 9 auto-regulating TFs is lower than might be expected based on E. coli, where auto-regulation is more prevalent. This likely reflects a technical limitation of the ChIP-seq approach: our detection was limited to standard growth conditions rather than the diverse physiological states in which auto-regulation often occurs. Therefore, the 9 TFs we report represent a high-confidence subset, and the true frequency of auto-regulation in P. aeruginosa is likely higher. We have added this point to the revised manuscript. Please see Lines 193-196.

      “This number likely represents a conservative estimate, as experiments may not optimally capture auto-regulatory events that depend on native expression levels or specific physiological conditions.”

      - Line 230 - "This conservation suggests that TFs within the same cluster co-regulate similar sets of genes." - Why would clustering of TF binding site motifs need to be done to make this assessment? Couldn't the shared set of regulated genes be identified directly from the binding site data? Computing TF binding site motifs has obvious value, but I am struggling to understand the point of clustering the motifs. Is there some implied evolutionary or physiological connection here? No specific physiological roles or hypotheses are discussed in this section.

      Thank you for this important question. We agree that shared target genes can be identified directly from ChIP-seq binding data, which we also analyzed (co-association analysis). The motif clustering analysis serves a complementary and distinct purpose, providing information not directly obtainable from overlapping targets alone. Specifically, target overlap is inherently condition dependent, whereas motif clustering captures intrinsic binding specificity, which reflects the structural similarity of DBDs, evolutionary relationships, and the potential for functional redundancy or cooperativity under specific conditions. We have revised the related content in the manuscript. Please see Lines 236-242.

      “Clustering of TF binding motifs identified groups of TFs with similar intrinsic DNA-binding specificities. As expected, many clusters contained TFs from the same DBD families, reflecting evolutionary conservation and potential functional redundancy or competitive binding at shared regulatory elements. Notably, the clustering also uncovered associations between TFs from different DBD families, suggesting convergent evolution of binding specificity or novel regulatory interactions that warrant further investigation.”

      - Line 284 - should "metabolomic" be "metabolic"? I didn't see metabolomic data

      Yes, we have revised this. Please see Line 311.

      - Several of the figures are too small (e.g. Fig S4A) or complex (Fig 2A) to see clearly or glean information from.

      Thank you for this comment. We acknowledge that Figure 2A and Figure S4A contain dense information due to the comprehensive nature of the regulatory network and the large number of TFs analyzed. We believe these overview figures serve an important purpose in conveying the scale and organization of the regulatory network, while the tables (Table S6 for Fig. S4A and Table S3 for Fig. 2A) provide the granular data needed for specific inquiries. We have also made the figures available in higher resolution and increased font sizes where possible without compromising the overall layout.

      - I don't understand the organization of the "Ternary regulatory motifs" in Supplementary Data File 4 - A table of contents explaining the tabs and columns would be welcome (for this as well as other supplementary files, some of which are more straightforward than others).

      Thank you for this suggestion. We have now revised all supplementary data files to include a header and the necessary annotations in the first row. Specifically for Supplementary Data File 4, the three columns (Top, Middle, Bottom) represent the left, middle, and right nodes, respectively, in each ternary regulatory motif.

      - I would have expected genomic locations of TF binding sites would have been one of the Supplementary Tables, to increase the accessibility of the data. However, the data is made available through their website, https://jiadhuang0417.shinyapps.io/PATF_Net/, which was easy to access and download the full dataset, so this is a minor issue.

      Thank you for accessing our PA_TFNet database and for the positive feedback on data accessibility. We agree that providing genomic locations of TF binding sites is crucial. These data are fully available and downloadable through the web interface, which allows flexible searching, filtering, and batch download of binding sites. Given the large scale of this dataset, we felt that the interactive database format provides more functionality than static supplementary tables (e.g., dynamic filtering by TF, genomic region, or binding strength).

      References

      (1) Hua, C., Huang, J., Wang, T., Sun, Y., Liu, J., Huang, L. et al. Bacterial Transcription Factors Bind to Coding Regions and Regulate Internal Cryptic Promoters. mBio 13, e0164322 (2022).

      (2) Chemani, C., Imberty, A., de Bentzmann, S., Pierre, M., Wimmerová, M., Guery, B. P. et al. Role of LecA and LecB lectins in Pseudomonas aeruginosa-induced lung injury and effect of carbohydrate ligands. Infect Immun 77, 2065-2075 (2009).

      (3) Novichkov, P. S., Kazakov, A. E., Ravcheev, D. A., Leyn, S. A., Kovaleva, G. Y., Sutormin, R. A. et al. RegPrecise 3.0–a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics 14, 745 (2013).

      (4) Cui, G. Y., Zhang, Y. X., Xu, X. J., Liu, Y. Y., Li, Z., Wu, M. et al. PmiR senses 2-methylisocitrate levels to regulate bacterial virulence in Pseudomonas aeruginosa. Sci Adv 8 (2022).

      (5) Hwang, W., Yong, J. H., Min, K. B., Lee, K.-M., Pascoe, B., Sheppard, S. K. et al. Genome-wide association study of signature genetic alterations among Pseudomonas aeruginosa cystic fibrosis isolates. PLoS Pathog 17, e1009681 (2021).

      (6) Gutu, A. D., Sgambati, N., Strasbourger, P., Brannon, M. K., Jacobs, M. A., Haugen, E. et al. Polymyxin resistance of Pseudomonas aeruginosa phoQ mutants is dependent on additional two-component regulatory systems. Antimicrob Agents Chemother 57, 2204-2215 (2013).

      (7) Zhang, L., Fritsch, M., Hammond, L., Landreville, R., Slatculescu, C., Colavita, A. et al. Identification of genes involved in Pseudomonas aeruginosa biofilm-specific resistance to antibiotics. PLoS One 8, e61625 (2013).

      (8) Galan-Vasquez, E., Luna, B. & Martinez-Antonio, A. The Regulatory Network of Pseudomonas aeruginosa. Microb Inform Exp 1, 3 (2011).

      (9) Fan, L. G., Wang, T. T., Hua, C. F., Sun, W. J., Li, X. Y., Grunwald, L. et al. A compendium of DNA-binding specificities of transcription factors in Pseudomonas syringae. Nat Commun 11 (2020).

      (10) Chan, S. S.-K. & Kyba, M. What is a master regulator? Journal of stem cell research & therapy 3, 114 (2013).

      (11) Wang, T. T., Sun, W. J., Fan, L. G., Hua, C. F., Wu, N., Fan, S. R. et al. An atlas of the binding specificities of transcription factors in Pseudomonas aeruginosa directs prediction of novel regulators in virulence. eLife 10 (2021).

    1. eLife Assessment

      This useful study characterizes the evolution of medial prefrontal cortex activity during the learning of an odor-based choice task. The evidence provided is solid, providing quantification of functional classes of cells over the course of learning using the longitudinal calcium recordings in prefrontal cortex, and quantification of prefrontal sequences. However, the experimental design appears to provide limited evidence to support strong conclusions regarding the functional relevance of neural sequences. The study will be of interest to neuroscientists investigating learning and decision-making processes.

    2. Reviewer #1 (Public review):

      This study presents a useful finding about development of task representations in mouse medial prefrontal cortex using 1-photon calcium recordings in an olfactory-guided spatial memory task. A key strength of the study is the use of longitudinal recordings allowing identification of task-related activity that emerges after learning. The study also reports existence of neuronal sequences during learning and their replay at reward locations. The evidence provided is solid, providing quantification of functional classes of cells over the course of learning using the longitudinal calcium recordings in prefrontal cortex, and quantification of prefrontal sequences.

      (1) The authors continue to state that task phase selective (non-splitter) cells can be considered as "cross-condition generalization" and interpret them as "potential building blocks of schemas". However, cross-condition generalization requires demonstration of cross-condition generalization performance (CCGP) of neural decoders across task conditions, which is not shown here.

      (2) The authors note that correlations on short time scales are not similar between the sampling and reward phases, acknowledging that these two represent different behavioral states in a cued-memory task, and that the manuscript should more clearly distinguish replay from "pure sequences". However, while the last line in the abstract states that "sub-second neural sequences in the mPFC are more likely involved in behavioral outcomes rather than planning future actions", references are made throughout the manuscript to preplay/replay sequences, including results primarily for non-cued spatial memory tasks, in which there is no cued sampling phase. For example, lines 259-263 state "During odor sampling phase, no such significant replay was observed..." and "... sequence clusters showed small but significant bias to preplay in the sampling phase". If the authors want to distinguish between replay and "pure" sequences, then the terminology "replay" and "preplay" should not be used here.

      Further, large parts of the Discussion are devoted to comparison to hippocampal ripple-associated replay. Lines 355-356 in Discussion state that "the suggestion that mPFC sequences may also support planning [Tang et al., 2021] could not be confirmed by our work as sequences in the odor sampling phase were absent". It should be clarified that this is a comparison between what the authors term "pure sequences" in the sampling phase of an odor-cued task, and internally generated sequences during hippocampal ripples in a non-cued spatial memory task, so this is not a like-for-like comparison.
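
      To make the CCGP request in point (1) concrete, the following is a minimal sketch of how such a test is commonly set up: a decoder for one variable (here, task phase) is trained on trials from one condition and evaluated on trials from a different condition. Everything in it is an assumption for illustration only: the data are synthetic, the nearest-centroid decoder is a stand-in for whatever decoder one prefers, and names such as `phase_axis` and `context_shift` are hypothetical and do not come from the manuscript.

```python
import numpy as np

def ccgp(X_train, y_train, X_test, y_test):
    """Train a nearest-centroid decoder on one condition, test it on another.

    X_* : (trials, neurons) activity; y_* : binary labels (0/1).
    Returns the decoding accuracy on the held-out condition.
    """
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    w = c1 - c0                      # decoding axis between the two class centroids
    b = 0.5 * (c0 + c1) @ w          # decision boundary at the centroid midpoint
    pred = (X_test @ w > b).astype(int)
    return float((pred == y_test).mean())

# Synthetic illustration only (not data from the study).
rng = np.random.default_rng(0)
n_trials, n_neurons = 200, 30
phase_axis = rng.normal(size=n_neurons)     # hypothetical coding direction for task phase
context_shift = rng.normal(size=n_neurons)  # hypothetical offset between conditions

def simulate(shift):
    """Draw trials whose activity encodes the label along phase_axis, plus an offset."""
    y = rng.integers(0, 2, size=n_trials)
    X = np.outer(2 * y - 1, phase_axis) + shift + rng.normal(size=(n_trials, n_neurons))
    return X, y

X_a, y_a = simulate(0.0 * context_shift)    # condition A, e.g. one task context
X_b, y_b = simulate(0.5 * context_shift)    # condition B, shifted in state space
score = ccgp(X_a, y_a, X_b, y_b)            # accuracy when generalizing from A to B
```

      In this toy setting the coding direction is shared between conditions, so accuracy on the held-out condition stays high; stable firing fields alone would not guarantee this outcome in real data, which is why an explicit decoder test is informative.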

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      There are a few remaining issues:

      (1) The manuscript quantifies changes over learning in prefrontal goal-selective cells (equated to "splitter" place cells in hippocampus) and task-phase selective cells (similar to non-splitter place cells that are not goal modulated). A subset of these task cells remain stable throughout learning, and are equated to schema representations in the study. In the memory literature, schemas are generally described as relational networks of abstract and generalized information, that enable adapting to novel context and inference by enabling retrieval of related information from previous contexts. The task-phase selective cells that stay stable throughout learning clearly will have a role in organizing task representations, but to this reviewer, denoting them as forming a schema is an unwarranted interpretation. By this definition, hippocampal non-splitter place cells that emerge early in learning and are stable over days would also form a schema. Therefore, schema notation cannot just be based on stability, it requires further evidence of abstraction such as cross-condition generalization.

      We agree with the reviewer that task phase selective cells (“non-splitter cells”) alone do not fulfill the “relationality” criterion of schemas. We found only a few of them, so we cannot say anything about how they covary. We would, however, like to stress that our finding that task phase selective cells have stable firing fields when comparing learned (task) and habituation (no-task) conditions can be considered as “cross-condition generalization.” We have further specified our discussion of schemas with a particular emphasis on a potential interpretation of the generalizing task phase cells as “potential building blocks of schemas.”

      (2) The quantification of prefrontal replay sequences during reward is useful, but it is still unconvincing that the distinction between existence of sequences in the odor sampling phase and reward phase is not trivially expected based on prior literature. This is an odor-guided task, not a spatial exploration task with no cues, and it is very well-established (as noted in citations in the previous review) that during odor sampling, animals will sniff in an exploratory stage, resulting in strong beta and respiratory rhythms in prefrontal cortex. Not having LFP recordings in this task does not preclude considering prior literature that clearly shows that odor sampling results in a unique internal network state, when animals are retrieving the odor-associated goal, vastly different from a reward sampling phase. The authors argue that this is not trivial since they see some sequences during sampling, although they also argue the opposite in response to a question from Reviewer 2 about shuffling controls for sequences, that 'not' seeing these sequences in the sampling phase is an internal control. The bigger issue here is equating these sequences during sampling to replay/preplay or reactivation sequences similar to the reward phase, since the prefrontal network dynamics are engaged in odor-driven retrieval of associated goals during sampling, as has been shown in previous studies.

      We agree with the reviewer that the sampling and reward phases represent two very different behavioral states. Nevertheless, correlations on short time scales could be similar; we show that this is not the case, and we therefore do not consider this result trivial. Regarding the interpretation of sequences, we apologize that we have not been sufficiently clear in distinguishing replay from pure sequences. While we find such sequences in the sampling phase (indicative of fast temporal correlation structure beyond cofiring quantified in Figure 3), they are NOT pre/replaying any task-related information. Otherwise, our results are fully in line with the previous literature on oscillations that we included in the previous round of revisions. We added a similar explanation at multiple instances in the Results and Discussion sections.

      Reviewer #2 (Public review):

      Comments on revisions:

      Further changes are needed to improve the description of the methods and the discussion needs to be extended to contrast the results with previously published results of the group. Some control figures would also be needed to quantitatively demonstrate, across the entire dataset, that sequence detection did not identify random events as sequences, even if the detection method was designed to exclude such sequences. For example, showing that sequences are not detected in randomised data with the current method would better convince readers of the method's validity.

      We have added control quantifications from time-randomized sequences, which produce far fewer detected sequences. See response below.

      Although differences in the classification scheme relative to the Muysers et al. (2025) paper have been explained, the similarity (perhaps equivalence of results) is not sufficiently acknowledged - e.g., at the beginning of the discussion.

      We have added a paragraph at the beginning of the Discussion on how our results align with the Muysers et al. 2025 paper.

      Although the control of spurious sequences may have been built into the method, this is not sufficiently explained in the method. It is also not clear what kind of randomization was performed. Importantly, I do not see a quantification that shows that the detected sequences are significantly better than the sequence quality measure on randomized events. Or that randomized data do not lead to sequence clusters.

      In response to this question, we have added the requested shuffling control (Supplement 1B to Figure 4). In the shuffled data, the number of detected recurring sequence clusters is only about half of that in the original data. The number of bursts assigned to clusters in the shuffled data is on average only 46% of the number originally assigned, clearly indicating that the sequences detected in the non-randomized data cannot be explained without assuming stable temporal order.

      Some clusters are nevertheless still detected in randomized data. This is expected if participation of cells is heterogeneous, with some highly active cells occurring in more than half of the bursts. Random sequences then spuriously occur above chance level, representing clusters of random orderings of a few highly active cells. In line with this interpretation, we see that:

      (1) Bursts that were removed after shuffling have exactly 0 high-firing cells

      (2) Clusters derived from shuffled sequences have a less sparse contribution of high-firing cells, i.e., high-firing cells contribute to significantly more clusters in randomized data than in non-randomized data.

      The difference in the distribution of high firing cells further indicates that sequences obtained with and without randomization are of different quality.

      The spurious (false positive) clusters detected after randomization nevertheless may have a physiological meaning as they identify rate coactivation patterns that were also picked up by analysis in Figure 3.

      Also, it is still not clear how the number of clusters was established. I understand that the previously published paper may have covered these questions; these should be explained here as well.

      The Methods section states “The [cluster merging] procedure was repeated until no pair [of clusters] satisfied the merging criterion.”

      Also, the sequence similarity description is still confusing in the method; please correct this sentence "Only the l neurons active in both sequences of a pair were taken into account."

      We do not see what is wrong with this sentence. To avoid confusion, we have replaced the lower-case l with an upper-case L as the sequence length.
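
      As an illustration of the two points discussed above (similarity restricted to the L neurons shared by a pair of sequences, and a shuffling control), the following minimal sketch may help readers. It is not the authors' implementation: the use of a rank correlation as the similarity measure, the minimum of three shared neurons, and the control that permutes firing order within each burst are all assumptions made here for illustration, and the bursts are synthetic.

```python
import numpy as np

def sequence_similarity(seq_a, seq_b, min_shared=3):
    """Rank correlation of two firing orders, restricted to shared neurons.

    seq_a, seq_b : lists of neuron ids in firing order (each id at most once).
    Only the L neurons present in both sequences are used; NaN if L < min_shared.
    """
    shared = set(seq_a) & set(seq_b)
    if len(shared) < min_shared:
        return float("nan")
    order_a = [n for n in seq_a if n in shared]          # shared neurons, in seq_a order
    pos_b = {n: i for i, n in enumerate(m for m in seq_b if m in shared)}
    ranks_a = np.arange(len(order_a))                    # ranks 0..L-1 in seq_a
    ranks_b = np.array([pos_b[n] for n in order_a])      # ranks of the same neurons in seq_b
    # Without ties, the Pearson correlation of ranks equals the Spearman correlation.
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

def mean_pairwise_similarity(sequences):
    """Average similarity over all pairs of sequences, ignoring undefined pairs."""
    vals = [sequence_similarity(a, b)
            for i, a in enumerate(sequences) for b in sequences[i + 1:]]
    return float(np.nanmean(vals))

# Synthetic bursts: random subsets of 8 out of 12 cells, firing in a fixed template order.
rng = np.random.default_rng(1)
template = list(range(12))
bursts = [sorted(rng.choice(template, size=8, replace=False).tolist()) for _ in range(40)]

# Assumed control: permute the firing order within each burst, keeping which cells fire.
shuffled = [rng.permutation(b).tolist() for b in bursts]

observed = mean_pairwise_similarity(bursts)    # stable temporal order
control = mean_pairwise_similarity(shuffled)   # temporal order destroyed
```

      Because the control keeps the participating cells of each burst unchanged, any structure that survives shuffling reflects which cells fire together rather than the order in which they fire, which matches the interpretation of the residual clusters given in the response.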

      Reviewer #3 (Public review):

      One comment is that the threshold for extracting burst events (0.5 standard deviations, presumably above the mean) seems lower than what one usually sees as a threshold for population burst detection, and the authors show (in Supplementary Fig 1) that this means bursts cover ~20-40% of the data. However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

      We have added further specifications following the Reviewer’s suggestion and now mention that the threshold is permissive and “capturing large amount cofiring structure.”
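
      For readers unfamiliar with this kind of criterion, the sketch below shows how a permissive threshold translates into the fraction of time bins classified as bursts. It assumes, as the Reviewer does, that the threshold is 0.5 standard deviations above the mean of the population activity; the gamma-distributed trace is synthetic and the 3 SD comparison value is a hypothetical stricter choice, not one taken from the manuscript.

```python
import numpy as np

def burst_mask(population_rate, n_sd=0.5):
    """Mark time bins where population activity exceeds mean + n_sd * SD."""
    threshold = population_rate.mean() + n_sd * population_rate.std()
    return population_rate > threshold

# Synthetic, right-skewed population activity (illustration only).
rng = np.random.default_rng(0)
rate = rng.gamma(shape=2.0, scale=1.0, size=10_000)

permissive = burst_mask(rate, n_sd=0.5).mean()  # fraction of bins above the 0.5 SD threshold
strict = burst_mask(rate, n_sd=3.0).mean()      # fraction above a hypothetical 3 SD threshold
```

      The fraction of bins above threshold depends strongly on the shape of the activity distribution, so the coverage obtained on real recordings has to be measured rather than inferred from the threshold alone.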

    1. eLife Assessment

      The authors make an important contribution to comparative functional genomics by developing a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types. Using a single-cell RNA-seq dataset of induced pluripotent stem cells and derived embryonic bodies from four primate species: humans, orangutans, cynomolgus macaques, and rhesus macaques, the authors provide convincing evidence that cell type-specific marker genes are substantially less transferable across species than broadly expressed genes, with transferability declining as phylogenetic distance increases. This study establishes a key framework and reference dataset for comparative single-cell analyses and encourages more rigorous evaluation of marker gene transferability across species.

    2. Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen et al examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and defined into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests significant unbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa etc.

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them. Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses? Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types, and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. But some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Comments on revisions:

      I think the authors have addressed my previous comments to my satisfaction, and I thank them for the changes they have made; it's good to see that the manuscript is just as sound as it seemed the first time around.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Most importantly, in accordance with questions raised by Reviewer 1, we now include a detailed comparison of the cell type frequencies between the two examined time points as well as comparison of the pseudotimes along those lineages. This is detailed in the new section “Many cell types are shared between day 8 and day 16 EBs” and illustrated in Supplementary Figure 6c and Supplementary Figures 7-8.

      Besides this new chapter and its accompanying methods part, we mainly edited the language to clarify methods and assumptions according to the Reviewer suggestions.

      The main concern of Reviewer 2 was our use of the liftoff gene annotation. We explained our reasoning for this choice extensively in our public response to the Reviewer, but did not incorporate this into our manuscript because even though this is an important subject it is not within the main scope of our paper.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and defined into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      We thank the Reviewer for their kind assessment of our work.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant unbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa, etc.

      You are absolutely correct in pointing out that the large clonal variability in cell type composition is a challenge for our analysis. We also noted the odd behavior of the orangutan EBs, and their underrepresentation of ectoderm. There are many possible sources for these variable differentiation propensities: clone, sample origin (in this case urine) and individual. However, unfortunately for the orangutan, we have only one individual and one sample origin and thus cannot say whether this germ layer preference says something about the species or is due to our specific sample. Because of this high variability from multiple sources, obtaining enough cell types with an appreciable overlap between species was a limiting factor for our analyses. In order to be able to derive meaningful conclusions from intra-species analyses and the impact of different sources of variation on cell type propensity, we would need to sequence many more EBs with an experimental design that balances possible sources of variation. This would go beyond the scope of this study.

      Instead, here we control for intra-species variation in our analyses as much as possible: for the analysis of cell type specificity and conservation, the comparison is relative across the different degrees of specificity (Figure 3C). For the analysis of marker gene conservation, we explicitly take intra-species variation into account (Figure 4D).

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them.

      Concerning the temporal aspect, we deliberately omitted an explicit comparison of day 8 and day 16 EBs, because we felt that it was not directly relevant to our main message. Our pseudotime analysis showed that the differences between the two time points were a matter of degree rather than of kind. All major lineages were already present at day 8, and even though day 8 cells had on average earlier pseudotimes, there was a large overlap in the pseudotime distributions between the two sampling time points (Author response image 1). That is why we decided to analyse the data together.

      Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses?

      When we started the experiment, we simply did not know what to expect. We were worried that cell types at day 8 might be too transient, but longer culture can also introduce biases. That is why we wanted to look at two time points; however, as mentioned above, the differences between them are a matter of degree.

      Concerning the cell type composition: yes, day 16 EBs are more heterogeneous than day 8 EBs. Firstly, older EBs have more distinguishable cell types and hence even if all EBs had identical composition, the sampling variance would be higher given that we sampled a similar number of cells from both time points. Secondly, in order to grow EBs for a longer time, we moved them from floating to attached culture on day 8 and it is unclear how much variance is added by this extra handling step.
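The sampling-variance argument above can be illustrated with a small calculation. The following is a minimal sketch, assuming multinomial sampling from equally frequent cell types; the cell numbers and type counts are illustrative assumptions, not values from the study.

```python
import math


def relative_sd_of_proportion(n_cells: int, n_types: int) -> float:
    """Relative SD (SD / mean) of one cell type's observed proportion.

    Assumes n_cells are drawn multinomially from n_types equally frequent
    cell types (an illustrative simplification).
    """
    p = 1.0 / n_types
    # Var(observed proportion) = p * (1 - p) / n_cells, so SD / p is:
    return math.sqrt((1.0 - p) / (n_cells * p))


# Same number of sampled cells, increasing number of distinguishable types.
for n_types in (5, 10, 20):
    rel_sd = relative_sd_of_proportion(n_cells=1000, n_types=n_types)
    print(f"{n_types} types: relative SD of a proportion = {rel_sd:.3f}")
```

With the number of sampled cells held constant, the relative uncertainty of each observed proportion grows as the number of cell types increases, even if the true composition is identical across EBs.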

      Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      We did not see any differences in the marker conservation between early and late cell types, but we have too little data to say whether this carries biological meaning.

      Author response image 1.

      Pseudotime analysis for a differentiation trajectory towards neurons. Single cells were first aggregated into metacells per species using SEACells (Persad et al. 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony and a combined pseudotime was inferred with Slingshot (Street et al. 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating a differentiation towards neurons. (A) PHATE embedding colored by pseudotime (Moon et al. 2019). (B) PHATE embedding colored by celltype. (C) Pseudotime distribution across the sampling timepoints (day 8 and day 16) in different species.

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Author response image 2.

      UMAP visualization for the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

      Good point, if we understand correctly, the concern is that in our relatively broadly defined cell types, species are not well mixed and that this in turn is partly responsible for marker gene divergence. This problem is indeed difficult to address, because most approaches to evaluate this require integration across species which might lead to questionable results (see our Discussion).

      Nevertheless, we attempted an integration across all four species. To this end, we subset the cells for the 7 cell types that we found in all four species and visualized cell types and species in the UMAPs above (Author response image 2).

      We see that cardiac fibroblasts appear poorly integrated in the UMAP, but they still have very transferable marker genes across species. We quantified integration quality using the cell-specific mixing score (cms) (Lütge et al. 2021) and indeed found that the proportion of well-integrated cells is lowest for cardiac fibroblasts (Author response image 3A). At the other end of the cms spectrum, neural crest cells appear to have the best integration across species, but their marker transferability between species is worse than for cardiac fibroblasts (Supplementary Figure 9). The rank-biased overlap scores that we use for marker gene conservation, calculated per cell type, show the same trends (Author response image 3B) as the F1 scores for marker gene transferability. Hence, given our current dataset, we do not see any indication that the low marker gene conservation is a result of too broadly defined cell types.

      Author response image 3.

      (A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.
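For readers unfamiliar with rank-biased overlap, the sketch below shows one common truncated form of the score for two ranked marker lists. It is only an illustration: the persistence parameter `p`, the evaluation depth, the absence of extrapolation, and the example gene rankings are all assumptions and may differ from what was used for Author response image 3B.

```python
def rank_biased_overlap(ranking_a, ranking_b, p=0.9, depth=None):
    """Truncated rank-biased overlap of two ranked lists (no extrapolation).

    At each depth d the agreement is |top-d(A) & top-d(B)| / d, and the
    agreements are averaged with geometrically decaying weights p**(d-1).
    """
    if depth is None:
        depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    weighted_sum = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        weighted_sum += (p ** (d - 1)) * agreement
    return (1 - p) * weighted_sum


# Hypothetical marker rankings for one cell type in two species.
species_1 = ["SOX10", "FOXD3", "TFAP2A", "PAX3"]
species_2 = ["SOX10", "TFAP2A", "FOXD3", "ZIC1"]
print(rank_biased_overlap(species_1, species_2))
```

Because the weights decay with depth, disagreements near the top of the marker rankings lower the score more than disagreements further down.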

      Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species, humans, orangutans, cynomolgus macaques, and rhesus macaques, providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      We thank the reviewer for their positive and thoughtful feedback.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      As Reviewer 1 also points out, we too have observed that the annotations of non-human primate genomes often have truncated 3’UTRs. This is especially problematic for 3’ UMI transcriptome data such as the 10x dataset that we present here. To illustrate this, we compared the Liftoff annotation derived from Gencode v32, which we used throughout our manuscript, to the Ensembl gene annotation Macaca_fascicularis_6.0.111. We used transcriptomes from human and cynomolgus iPSC bulk RNA-seq (Kliesmete et al. 2024) generated with the Prime-seq protocol (Janjic et al. 2022), which is very similar to 10x in that it also uses 3’ UMIs. On average, using Liftoff produces higher counts than the Ensembl annotation (Author response image 4A). Moreover, when comparing across species, using Ensembl for the macaque leads to an asymmetry in differentially expressed genes, with apparently many more up-regulated genes in humans. In contrast, when we use the Liftoff annotation, we detect fewer DE genes, and a similar number of genes is up-regulated in macaques as in humans (Author response image 4B). We think that the additional DE genes are artifacts due to mismatched annotation between human and cynomolgus macaque. We illustrate this for the transcription factor SALL4 in Author response image 4C, D. The Ensembl annotation reports 2 transcripts, while Liftoff from Gencode v32 suggests 5 transcripts, one of which has a longer 3’UTR. This longer transcript is also supported by Nanopore data from macaque iPSCs. The truncation of the 3’UTR in this case leads to an underestimation of SALL4 expression in macaques, and hence SALL4 is detected as up-regulated in humans (DESeq2: LFC=1.34, p-adj<2e-9). In contrast, when using the Liftoff annotation, SALL4 does not appear to be DE between humans and macaques (LFC=0.33, p-adj=0.20).

      Author response image 4.

      (A) UMI counts per gene for the same cynomolgus macaque iPSC samples. On the x-axis, the GTF file from Ensembl Macaca_fascicularis_6.0.111 was used to count, and on the y-axis we used our filtered Liftoff annotation that transferred the human gene models from Gencode v32. (B) The number of DE genes between human and cynomolgus iPSCs detected with DESeq2. In Liftoff, we counted human samples using Gencode v32 and compared them to counts obtained with the Liftoff annotation of the same human gene models to macFas6. In Ensembl, we used Gencode v32 for the human and Ensembl Macaca_fascicularis_6.0.111 for the macaque. For both comparisons, we subset the genes to only contain one-to-one orthologs as annotated in biomart. Up- and down-regulation is relative to human expression. (C) Read counts for one example gene, SALL4. Here, in addition to the Liftoff and Ensembl annotations, we also used transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. (D) Gene models for SALL4 in the space of macFas6 and coverage for iPSC Prime-seq bulk RNA-sequencing.

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      We excluded those transcripts from the analysis in both species because they confound sequence-annotation differences with expression. The focus of our study is on regulatory evolution, and we knowingly omit marker differences that are due to a marker being mutated away. We will make this clearer in the text of a revised version.
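The filtering criteria quoted by the reviewer can be written as a simple predicate. This is a hedged sketch only: the `LiftedTranscript` record and its field names are hypothetical, and reading the length criterion as "more than 100 bp difference and more than a 2× length ratio" is an interpretation of the wording above, not a confirmed description of the pipeline.

```python
from dataclasses import dataclass


@dataclass
class LiftedTranscript:
    """Hypothetical summary of one transcript transferred to another genome."""
    transcript_id: str
    mapped_fraction: float    # fraction of the human transcript that mapped (0-1)
    sequence_identity: float  # identity of the mapped part (0-1)
    human_length: int         # transcript length in the human annotation (bp)
    lifted_length: int        # transcript length after transfer (bp)


def passes_filter(t: LiftedTranscript) -> bool:
    """Return True if the transcript survives the three exclusion criteria."""
    if t.mapped_fraction < 0.5:      # partial mapping
        return False
    if t.sequence_identity < 0.5:    # low sequence identity
        return False
    length_diff = abs(t.lifted_length - t.human_length)
    shorter = max(min(t.lifted_length, t.human_length), 1)  # avoid division by zero
    length_ratio = max(t.lifted_length, t.human_length) / shorter
    if length_diff > 100 and length_ratio > 2:  # excessive length difference
        return False
    return True


# Illustrative records with made-up identifiers and values.
kept = LiftedTranscript("tx_ok", 0.98, 0.95, 2000, 2050)
dropped = LiftedTranscript("tx_truncated", 0.40, 0.95, 2000, 800)
print(passes_filter(kept), passes_filter(dropped))
```

Applying the same predicate to both species of a comparison keeps the retained gene models matched, which is the point made in the response above.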

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.

      We want to stress that none of our conservation and marker gene analyses relies on cross-species integration. We only used the Harmony integrated data for visualisation in Figure 1 and the rough germ-layer check up in Supplementary Figure S3. We will add a better description in the revised version.

      References

      Janjic, Aleksandar, Lucas E. Wange, Johannes W. Bagnoli, Johanna Geuder, Phong Nguyen, Daniel Richter, Beate Vieth, et al. 2022. “Prime-Seq, Efficient and Powerful Bulk RNA Sequencing.” Genome Biology 23 (1): 88.

      Kliesmete, Zane, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2024. “Evidence for Compensatory Evolution within Pleiotropic Regulatory Elements.” Genome Research 34 (10): 1528–39.

      Lütge, Almut, Joanna Zyprych-Walczak, Urszula Brykczynska Kunzmann, Helena L. Crowell, Daniela Calini, Dheeraj Malhotra, Charlotte Soneson, and Mark D. Robinson. 2021. “CellMixS: Quantifying and Visualizing Batch Effects in Single-Cell RNA-Seq Data.” Life Science Alliance 4 (6): e202001004.

      Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2019. “Visualizing Structure and Transitions in High-Dimensional Biological Data.” Nature Biotechnology 37 (12): 1482–92.

      Persad, Sitara, Zi-Ning Choo, Christine Dien, Noor Sohail, Ignas Masilionis, Ronan Chaligné, Tal Nawy, et al. 2023. “SEACells Infers Transcriptional and Epigenomic Cellular States from Single-Cell Genomics Data.” Nature Biotechnology 41 (12): 1746–57.

      Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” BMC Genomics 19 (1): 477.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1B: the orangutan tubulin stain looks a bit unusual - just confirming that this is indeed the right image the authors want to include here.

      We agree, this unfortunately also reflects the findings from the scRNA-seq analysis in that we found hardly any cells that we would classify as proper neurons.

      (2) Typo on line 90: 'loosing' should be 'losing'.

      Fixed

      (3) Line 118: why do the authors believe that using singleR will give better results than MetaNeighbour? This certainly seems supported by the data in S4 and S5, but the reasoning is not clear.

      We think that this might depend on the signal-to-noise ratio, which is a property specific to each dataset. Here we just wanted to state that our approach seems to work better for our developmental data, but we did not test other datasets and thus cannot generalize.

      (4) Figure 2B: there are some coloured lines on the first filled black bar from the left - do they mean anything? I couldn't work it out from looking at the figure.

      Indeed, this is a bit misleading. The colors on the left represent the species identity; this was to illustrate the mixing of species for each cell type. The legend now reads: “Each line represents a cell which are colored by their species of origin on the left and by their current cell type assignment during the annotation procedure on the right.”

      (5) Figure 3: I did not understand how the seven bins of the cell type specificity metric were derived until much later - it is just the number of cell types in which a gene is expressed, yes? Might be worth making this clearer earlier in the text.

      We made this more explicit in the legend. “Boxplot of expression conservation of genes according to the number of different cell types in which a gene is expressed in humans (cell type specificity).”

      (6) It would be great to provide a bit more thorough documentation for the shiny app, so it can serve as a stand-alone resource and not require going back and forth with the paper to make sure one knows what one is doing at every point.

      Agree, this would be a good idea. We are on it.

      (7) Line 477: I think this is unclear - the authors retain over 11000 cells per species but then set the maximum number of cells in a cluster for pairwise comparison to 250... which is a lot fewer. What happens to all the other cells? This probably needs some rewriting to clarify it.

      We did this to minimize the power differences due to cell numbers and thus make the results more comparable across species. We added this explanation to the methods section for Marker gene detection.
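The capping step described above can be sketched as follows. This is an illustrative implementation under assumptions: the function name, the input mapping, and the fixed random seed are hypothetical, and only the cap of 250 cells per cluster is taken from the text.

```python
import random
from collections import defaultdict


def downsample_per_cluster(cell_to_cluster, max_cells=250, seed=0):
    """Randomly keep at most max_cells cells from each cluster.

    cell_to_cluster maps a cell barcode to its cluster label. Clusters at or
    below the cap are kept whole; larger clusters are subsampled without
    replacement so that cluster size no longer drives statistical power.
    """
    rng = random.Random(seed)
    cells_by_cluster = defaultdict(list)
    for cell, cluster in cell_to_cluster.items():
        cells_by_cluster[cluster].append(cell)

    kept = {}
    for cluster, cells in cells_by_cluster.items():
        if len(cells) > max_cells:
            cells = rng.sample(cells, max_cells)
        kept[cluster] = sorted(cells)
    return kept


# Toy example: one large and one small cluster.
toy = {f"cell_{i}": "large" for i in range(1000)}
toy.update({f"cell_{i}": "small" for i in range(1000, 1100)})
subset = downsample_per_cluster(toy)
print({cluster: len(cells) for cluster, cells in subset.items()})
```

The remaining cells are not discarded from the dataset; in this sketch they are simply left out of the pairwise comparison for that cluster.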

      Reviewer #2 (Recommendations for the authors):

      How was the clustering resolution (0.1) determined?

      This resolution was only used for the initial rough check-up of the germ layers as reported in Figure 1 and Supplementary Figure S3. We chose this resolution because it yielded roughly the same number of clusters as the number of cell types that we got from classification with the Rhodes et al. data.

    1. eLife Assessment

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. The evidence supporting the main claims is convincing, with multiple replications, validation of their techniques, and appropriate controls. The work will be of broad interest to neuroscientists interested in central mechanisms of motor learning and control, as well as thalamic physiology.

    2. Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during task vs after task), the authors report valuable findings on the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after task impairs consolidation of the learned skill is interesting.

      Revision comment:

      The revised manuscript is improved in clarity and methodological detail. An important addition is the retrograde labeling data showing a degree of anatomical segregation between CN->CL and CN->VAL pathways that strengthens their reported different functional roles. I still think that potential effects on motor execution when cerebellar nuclei are silenced during task performance may complicate interpretations specifically related to learning. However, the evidence supporting a role of the cerebellar nuclei in off-line consolidation is convincing.

      Overall, the study outlines a multifaceted role of the cerebellum in motor learning, consolidation, and execution. The demonstration that cerebellar projections to distinct forebrain structures contribute to these processes is significant.

    3. Reviewer #3 (Public review):

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that: 1) cerebellothalamic connections are important for learning motor skills, 2) cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning, 3) cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and 4) that once a skill is acquired, cerebellothalamic connections become important for online task performance. The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between on-line learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      (4) The retrograde tracing experiments (Supplementary Figure 5) demonstrate convincingly that the CN-VAL and CN-CL projections are almost entirely segregated.

      Weaknesses:

      (1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is (as they acknowledge in the Discussion) impaired rotarod performance at fixed higher speeds in Supplementary Figure 4f for CN-VAL projections, suggesting that there could be subtle changes in motor performance below the level of detection of their assays. There is also a trend in the same direction that did not pass significance for CN-CL at higher speeds, suggesting that part of the effects could be related to subtle deficits in performance.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides evidence that cerebellar projections to the thalamus are required for learning and execution of motor skills in the accelerating rotarod task. This important study adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The data presentation is generally sound, especially the main observations, with some limitations in describing the statistical methods and a lack of support for two separate cerebello-thalamic pathways, which is incomplete in supporting the overall claim.

      We completed the MS by adding a double retrograde labelling study showing that the two pathways have limited overlap and by addressing the other concerns.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript tackling the issue of whether subcircuits of the cerebellum are differentially involved in processes of motor performance, learning, or learning consolidation. The authors focus on cerebellar outputs to the ventrolateral thalamus (VL) and to the centrolateral thalamus (CL), since these thalamic nuclei project to the motor cortex and striatum respectively, and thus might be expected to participate in diverse components of motor control and learning. In mice challenged with an accelerating rotarod, the investigators reduce cerebellar output either broadly, or in projection-specific populations, with CNO targeting DREADD-expressing neurons. They first establish that there are not major control deficits with the treatment regime, finding no differences in basic locomotor behavior, grid test, and fixed-speed rotarod. This is interpreted to allow them to differentiate control from learning, and their inter-relationships. These manipulations are coupled with chronic electrophysiological recordings targeted to the cerebellar nuclei (CN) to control for the efficacy of the CNO manipulation. I found the manuscript intriguing, offering much food for thought, and am confident that it will influence further work on motor learning consolidation. The issue of motor consolidation supported by the cerebellum is timely and interesting, and the claims are novel. There are some limitations to the data presentation and claims, highlighted below, which, if amended, would improve the manuscript.

      We thank the reviewer for the positive comments and insightful critiques.

      (1) Statistical analyses: There is too little information provided about how the Deming regressions, mean points, slopes, and intercepts were compared across conditions. This is important since in the heart of the study when the effects of inactivating CL- vs VL- projecting neurons are being compared to control performance, these statistical methods become paramount. Details of these comparisons and their assumptions should be added to the Methods section. As it stands I barely see information about these tests, and only in the figure legends. I would also like the authors to describe whether there is a criterion for significance in a given correlation to be then compared to another. If I have a weak correlation for a regression model that is non-significant, I would not want to 'compare' that regression to another one since it is already a weak model. The authors should comment on the inclusion criteria for using statistics on regression models.

      We thank the reviewer for pointing out this weakness of description. The description of the Methods has thus been expanded and better justified in the “Quantification and statistical analysis” section.

      We agree with the reviewer that comparisons between Deming regressions would be fragile because these regressions are weak in the treatment groups (while they are quite robust for control groups). Such comparisons are therefore not included in the MS, although Deming regression coefficients with their confidence intervals are now provided for all groups in the statistical tables. As now more clearly explained in the Methods, the comparisons between groups are based on the distribution of residuals around the control regression lines. If we understand the reviewer’s request correctly, the control groups are all included.
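As an illustration of the logic described above, the sketch below fits a Deming regression to a control group and computes residuals of other points around that control line. It is not the authors' code: the error-variance ratio `delta=1` (orthogonal regression), the use of perpendicular residuals, and the toy numbers are all assumptions; the actual procedure is the one given in the manuscript's Methods.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x.

    delta is the assumed ratio of the y-error variance to the x-error
    variance; delta=1 gives orthogonal regression. Requires a non-zero
    covariance between x and y.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    slope = (syy - delta * sxx
             + math.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals_to_line(x, y, slope, intercept):
    """Signed perpendicular distances of points to a line (suits delta=1)."""
    norm = math.sqrt(1 + slope ** 2)
    return [(yi - (intercept + slope * xi)) / norm for xi, yi in zip(x, y)]


# Toy data: fit on the control group, then place treated animals around it.
control_x, control_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
slope, intercept = deming_fit(control_x, control_y)
treated_residuals = residuals_to_line([1.0, 2.0], [2.0, 4.5], slope, intercept)
print(slope, intercept, treated_residuals)
```

Comparing the distribution of such residuals between groups avoids relying on the weak regressions of the treatment groups themselves, which is the concern raised by the reviewer.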

      (2) The introduction makes the claim that the cerebellar feedback to the forebrain and cortex are functionally segregated. I interpreted this to mean that the cerebellar output neurons are known to project to either VL or CL exclusively (i.e. they do not collateralize). I was unaware of this knowledge and could find no support for the claim in the references provided (Proville 2014; Hintzen 2018; Bostan 2013). Either I am confused as to the authors' meaning or the claim is inaccurate. This point is broader, however, than some confusion about citation.

      The references are not cited in the context of collaterals from the DCN but for the output channels of the basal ganglia and cerebellum: “They [basal ganglia and cerebellum] send projections back to the cortex via anatomically and functionally segregated channels, which are relayed by predominantly non-overlapping thalamic regions (Bostan, Dum et al. 2013, Proville, Spolidoro et al. 2014, Hintzen, Pelzer et al. 2018).” Indeed, the thalamic compartments targeted by the basal ganglia and cerebellum are distinct, and in Proville et al. 2014, we showed some functional segregation of the cerebello-cortical projections (whisker vs orofacial ascending projections). Hintzen et al. have performed an extensive review indicating the limited overlap between cerebellar- and basal ganglia-recipient territories. The sentence has been corrected to clarify what “They” referred to.

      The study assumes that the CN-CL population and CN-VL population are distinct cells, but to my knowledge, this has not been established. It is difficult to make sense of the data if they are entirely the same populations, unless projection topography differs, but in any event, it is critical to clarify this point: are these different cell types from the nuclei? how has that been rigorously established?; is there overlap? No overlap? Etc. Results should be interpreted in light of the level of this knowledge of the anatomy in the mouse or rat.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but as the reviewer says, it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments to support our finding, using retrograde infections with two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We thus feel confident that, while some CN neurons may project to both structures, the retrograde infection strategies appear to differentially infect CN populations.

      (3) It is commendable that the authors perform electrophysiology to validate DREADD/CNO. So many investigators don't bother and I really appreciate these data. Would the authors please show the 'wash' in Figure 1a, so that we can see the recovery of the spiking hash after CNO is cleared from the system? This would provide confidence that the signal is not disappearing for reasons of electrode instability or tissue damage/ other.

      The recordings were not extended to the wash period, but examination of the firing rate before CNO on successive days did not reveal major changes in the population firing rate (this is now shown in a new Supplementary Figure 6).

      (4) I don't think that the "Learning" and "Maintenance" terminology is very helpful and in fact may sow confusion. I would recommend that the authors use a day range, "Days 1-3 vs 4-7" or similar, to refer to these epochs. The terminology chosen begs for careful validation, definitions, etc, and seems unlikely to be uniform across all animals, thus it seems more appropriate to just report it straight, defining the epochs by day. Such original terminology could still be used in the Discussion, with appropriate caveats.

      Since reference to these time windows is repeatedly used in the text we have shifted to “Early” and “Late” phase terminology.

      (5) Minor, but, on the top of page 14 in the Results, the text states, "Suggesting the presence of a 'critical period' in the consolidation of the task." I think this is a non-standard use of 'critical period' and should be removed. If kept, the authors must define what they mean specifically and provide sufficient additional analyses to support the idea. As it stands, the point will sow confusion.

      This has been corrected to: “suggesting the cerebellar contribution to the consolidation of the task is critical early in the learning process and cannot be easily reinstated later”

      Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target the motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during the task vs after the task), the authors report valuable findings of the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after a task impairs the consolidation of the learned skill is interesting.

      We thank the reviewer for the positive comments and insightful critiques below.

      Weaknesses:

      While the controls for a lack of gross motor deficit are solid, the data seem to show some motor execution deficit when cerebellar nuclei are silenced during task performance. This deficit could potentially impact learning when cerebellar nuclei are silenced during task acquisition.

      One of our key controls is the test of the treatment on the fixed-speed rotarod, which provides the closest conditions to those of the accelerating rotarod (the main difference between the protocols being the slow, steady acceleration of rod rotation in the accelerating version). Indeed, small but measurable deficits are found at the highest speed on the fixed-speed rotarod in the CN-VAL group, whereas there was no measurable effect in the CN-CL group, which nevertheless shows lower performance from the second day of learning; we believe this supports our claim that CN-CL inhibition impacted the learning process more than motor coordination. In contrast, the CN-VAL group only showed significantly lower performance on day 4, consistent with intact learning abilities. Yet, under CNO, CN-VAL mice could stay for more than a minute and a half at 20 rpm, while on average they fell from the accelerating rotarod as soon as it reached a speed of ~19 rpm (130 s). Overall, we focused our argument on the first days of learning, where the differences between the groups are more pronounced. We clarified the discussion (section “A specific impact on learning of CL-projecting CN neurons.”).

      Separately, I find the support for two separate cerebello-thalamic pathways incomplete. The data presented do not clearly show the two pathways are anatomically parallel. The difference in behavioral deficits caused by manipulating these pathways also appears subtle.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but it does not seem logically possible that the exact same population would have different effects, and these effects are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments to support this finding, using retrograde infections with two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We thus feel confident that, while some CN neurons may project to both structures, the retrograde infection strategies differentially infect CN populations.

      While we agree that after 3-4 days of learning the difference between the groups becomes elusive, we respectfully disagree with the reviewer that in the early stages these differences are negligible.

      Reviewer #3 (Public review):

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that:

      (1) Cerebellothalamic connections are important for learning motor skills

      (2) Cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning

      (3) Cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and

      (4) That once a skill is acquired, cerebellothalamic connections become important for online task performance.

      The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between online learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      We thank the reviewer for the positive comments and the insightful critiques below.

      Weaknesses:

      (1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is a trend towards impaired rotarod performance at higher speeds in Supplementary Figure 4f, suggesting that there could be subtle changes in motor performance below the level of detection of their assays.

      This is now better acknowledged in the discussion in the section “A specific impact on learning of CL-projecting CN neurons.” However, we want to underline that the strongest deficit in learning is found in animals with CN->CL inhibition, whose latency to fall saturates at about 100 s on the rotarod; this indicates that mice fall as soon as the accelerating rotarod reaches about 16 rpm. On the fixed-speed rotarod, inhibition of CN->CL neurons shows not even a trend toward a difference from control mice at 15 rpm, and the animals run for 2 minutes without falling at this speed. This makes us confident that inhibition of the CN->CL pathway interferes more with learning than with the actual locomotor function on the rotarod.

      (2) There is likely some overlap between CN neurons projecting to VAL and CL, somewhat limiting the specificity of their conclusions.

      This issue is treated in the discussion (see also the replies to reviewers 1 and 2 above). We added experiments with simultaneous retro-AAV infections in CL and VAL, and the data are presented in Supplementary Figure 5. We found that retrograde infection targeted different populations of CN neurons; although collaterals in both CL and VAL may be present for (some of) these two populations of neurons, they are likely strongly biased toward one or the other thalamic region, explaining the differential retrograde labelling in the CN. We hope these experiments will answer the reviewer's concern.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Multiple studies have reported on the effect of cerebellar nuclei (CN) manipulation on locomotion. Here the authors perform several controls and careful analysis to rule out gross motor deficits caused by DREADD-mediated CN silencing. As the authors point out in the discussion, part of the difference from prior studies could be the mild degree of inhibition here. However, it is possible that the CN inhibition here induces a subtle motor deficit and the accelerating rotarod task is challenging and more readily reveals this motor deficit, rather than a deficit in motor learning per se. Two pieces of data seem to suggest this:

      (a) under CN inhibition during the task (Figure 1i), mice could never achieve the level of performance as mice under CN inhibition after the task, even after several days of training, which suggests the CN inhibition is interfering with task performance;

      (b) in highly trained mice (after learning), applying the CN inhibition impaired performance to a similar extent as mice in Figure 1i (Figure 4).

      Can the authors rule out the possibility that CN inhibition during the task is impairing motor execution rather than motor learning?

      We do not rule out a contribution of impaired motor coordination at the highest speed (last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”). Indeed, our argument in favor of a learning deficit rests primarily on the first days (Early phase), particularly for the CN->CL CNO group (Fig 3h). A crucial control in our work is the use of the fixed-speed rotarod, where no deficit is observed. The difference between the fixed-speed and accelerating rotarod is rather minimal, since the acceleration of the rotarod is small (0.12 rpm/s for speeds up to >20 rpm).

      Interpreting the effect of treatment reversal is challenging. If the only effect of CNO were a motor deficit, the animals that learned under CNO should rapidly regain higher performance under saline, which is not observed. When switching from CNO to saline after 7 days of training, it is difficult to disentangle which part is due to a crude motor deficit (which would not show on the fixed-speed rotarod) and which part is due to an inability to resume motor learning after the task has been (mis-)consolidated.
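
      The correspondence between latency to fall and rod speed quoted in these responses (about 16 rpm at 100 s, ~19 rpm at 130 s) can be cross-checked with a short sketch. Only the 0.12 rpm/s acceleration comes from the response; the 4 rpm starting speed and the 40 rpm ceiling are assumptions made for illustration, and the function name is hypothetical.

```python
def rotarod_speed_rpm(latency_s, start_rpm=4.0, accel_rpm_per_s=0.12, max_rpm=40.0):
    """Rod speed (rpm) reached after `latency_s` seconds on an accelerating rotarod.

    accel_rpm_per_s is the value quoted in the response (0.12 rpm/s).
    start_rpm and max_rpm are ASSUMED values, not stated in the text.
    """
    speed = start_rpm + accel_rpm_per_s * latency_s
    return min(speed, max_rpm)


if __name__ == "__main__":
    for latency in (100, 130):
        print(latency, "s ->", round(rotarod_speed_rpm(latency), 1), "rpm")
```

      Under these assumed settings, a latency of 100 s corresponds to about 16 rpm and 130 s to roughly 19 to 20 rpm, in line with the figures quoted in the responses.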

      (2) The separation of the cerebellar pathways to the intralaminar thalamus (IL) and ventral thalamus (VAL) is not clear to me. It is not clear the CN neurons projecting to these nuclei are distinct. In addition, although IL projects to the striatum and VAL does not, both IL and VAL project to motor cortex. It is unclear to what extent these pathways can be separated. The argument for distinct pathways (as laid out in the discussion) is the distinct behavior deficits when manipulating these two pathways, but this difference seems subtle (point 3).

      We now clarify that the CN populations are different with the help of retrograde labelling experiments (new Suppl Fig 5). The differences between IL and VAL projections are now discussed in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.” Briefly, we argue that, despite some overlap of their targets, the profiles of the CL and VAL differ substantially.

      (3) The patterns of behavioral deficits induced by CN->CL and CN->VAL neurons appear similar in Figure 3b-c and e-f. I have difficulty seeing how these data lead to the differences in the regression fits in panels 3g-k, which seem to show distinct patterns of performance change within and across sessions. One notable difference in Figure 3b-c and e-f seems to be that CN->VAL CNO treated mice exhibit lower performance on the very first trial for most days. Somehow, this pattern is present even after the CNO treatment is switched to saline (Figure 3f). I wonder if this data point is driving the difference. One control analysis the authors could do is to exclude the 1st trial and test if the effects are preserved.

      Since the learning is cumulative and involves varying degrees of consolidation, it is indeed difficult to substantiate the difference from the average performance: performance on day 3 may be limited by slow learning with perfect consolidation or by good learning with imperfect consolidation. That is why we designed an analysis that takes into account the observed relationships between initial performance, within-session gain of performance and across-session carry-over of this gain of performance (Fig 2). This analysis focuses on the first days of learning, before the performance plateau is reached in the CNO groups. While a clear deficit in consolidation is observed with full CN inhibition, this is not the case for the CN→CL CNO groups, despite their weaker performance after 3 days, similar to that seen with full CN inhibition. In contrast, normal learning is observed in the CN→VAL CNO group during these three days. The consolidation deficit in the CN→VAL CNO group is more subtle than in the CN CNO group and is indeed largely driven by the first data point. This is consistent with the idea that CN→VAL inhibition only partially impairs consolidation (compared to full CN inhibition), leaving some “savings” that allow rapid reacquisition.

      (4) The quantification of locomotion in Figure S2 needs more information. What is linear movement? What is sigma? What is the alternation coefficient? These are not defined in the legends or the Methods as far as I can tell. Related to point 1 above, the authors should provide some analysis of the stride length and hindlimb to forelimb distance as measures of locomotion execution.

      These measures were taken from Simon J Neurosci 2004 24(8):1987-1995, which is now cited, and their description is now provided in the Methods.

      Minor:

      (5) To help readers follow the logic of experimental design, please explain why CNO was switched to saline after day 4 in Figures 1j, 3c, and f. Specifically, is the saline manipulation meant to test something as opposed to applying CNO throughout the entire course of the behavioral test?

      Since we had no difference between the groups at the end of the Early phase, we decided to test whether the skill consolidated under CNO remained available when the CNO was removed (and it indeed was). This is now more clearly stated in the Results.

      (6) I have difficulty understanding what is plotted in Figure 4b and d. The legend says the change in performance is calculated the same way as in Figure 2a, so the changes are presumably the regression slopes. But how are the regression slopes calculated for daily start (1st trial) and daily end (last trial)?

      Skill levels at the beginning and end of each daily session correspond to the values of the regression line at the abscissae of trial 1 and trial 7 (green points). This has been added to the figure legend.
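
      As an illustration of this definition, the following minimal sketch fits an ordinary least-squares line to the latencies of one session and evaluates it at the first and last trial. It assumes a simple linear fit over trial number; the function names and the example latencies are hypothetical and are not taken from the manuscript.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def session_start_end(latencies):
    """Regression-line values at the first and last trial of one session."""
    trials = list(range(1, len(latencies) + 1))  # trial numbers 1..7 for a 7-trial day
    slope, intercept = fit_line(trials, latencies)
    start = intercept + slope * trials[0]
    end = intercept + slope * trials[-1]
    return start, end


if __name__ == "__main__":
    # Hypothetical latencies to fall (s) for one mouse on one day.
    example_day = [62.0, 70.0, 81.0, 79.0, 95.0, 101.0, 110.0]
    print(session_start_end(example_day))
```

      Reading the start and end values off the fitted line, rather than using the raw first and last trials, makes the two estimates less sensitive to single-trial noise.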

      (7) Do CN-CL and CN-VAL neurons also project to other brain regions besides the thalamus? Might these pathways also contribute to learning and consolidation of the accelerating rotarod task? Please discuss.

      This is now discussed in more detail in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”

      Reviewer #3 (Recommendations for the authors):

      (1) Please check the anatomic evidence for the strict dichotomy between intralaminar (specifically central lateral nucleus) nuclei projecting to the striatum and the ventral-anterior-lateral (VAL) complex projecting to the cortex. For example, while the Chen et al paper shows that there are cerebellar-intralaminar-striatal projections, it does not exclude intralaminar-cortex projections, which have at least been demonstrated in rats. Similarly, VAL has projections to striatum (see, e.g., Smith et al, "The thalamostriatal system in normal and diseased states", Frontiers in Systems Neuroscience, 2014). It may be that some of these projections are stronger, but I don't think it's true that these pathways are as well-separated as the authors suggest. I also don't think this changes the fundamental conclusions but is important for potential mechanisms by which differential learning could occur and necessitate modification of Figure 5.

      We have toned down the interpretation of CL and VAL relaying specifically to different brain structures and mostly put forward the duality of the pathways. The connections with the cortex are now discussed at the end of the section “A specific impact on learning of CL-projecting CN neurons.”

      (2) Please provide more details on the spike sorting. By what metrics were single units declared to be well-separated? How many units were identified under each condition? What was the distribution of firing rates with and without CNO treatment? Are the units shown in panel 1f from before and after CNO as in panel E or are just 2 examples of isolated units? The units by themselves are not very helpful to the reader. Showing sample auto- and/or cross-correlograms for units recorded on the same electrode would be more helpful to show how well-isolated the units are.

      Single units were considered well-isolated based on quantitative quality metrics computed after MountainSort 4 spike sorting (Python 3.8). Units were required to have a signal-to-noise ratio (SNR) greater than 5, inter-spike interval (ISI) violations below 1%, an amplitude cutoff below 0.1, a presence ratio above 0.9, a firing rate greater than 0.1 Hz, and at least 50 detected spikes. In addition, units were assessed for temporal stability across the recording using autocorrelograms and presence over the recording, ensuring there were no prolonged periods of total inactivity. Units meeting these criteria were deemed well-separated and reliable for further analysis. This has been added to the Methods.
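
      The inclusion criteria listed above amount to a conjunction of thresholds, which can be sketched as a simple filter. This is only an illustration and not the authors' pipeline: the dictionary keys are hypothetical names, the ISI criterion is assumed to be expressed as a fraction of spikes (1% = 0.01), and the example units are invented.

```python
def is_well_isolated(unit):
    """Apply the thresholds quoted in the response to one unit's quality metrics.

    `unit` is a dict with HYPOTHETICAL key names; the ISI violation value is
    assumed to be a fraction, so the 1% limit is written as 0.01.
    """
    return (
        unit["snr"] > 5
        and unit["isi_violation_fraction"] < 0.01
        and unit["amplitude_cutoff"] < 0.1
        and unit["presence_ratio"] > 0.9
        and unit["firing_rate_hz"] > 0.1
        and unit["n_spikes"] >= 50
    )


if __name__ == "__main__":
    # Invented example units, for illustration only.
    units = [
        {"id": "u1", "snr": 8.2, "isi_violation_fraction": 0.002,
         "amplitude_cutoff": 0.03, "presence_ratio": 0.98,
         "firing_rate_hz": 35.0, "n_spikes": 42000},
        {"id": "u2", "snr": 4.1, "isi_violation_fraction": 0.002,
         "amplitude_cutoff": 0.03, "presence_ratio": 0.98,
         "firing_rate_hz": 35.0, "n_spikes": 42000},  # fails the SNR criterion
    ]
    kept = [u["id"] for u in units if is_well_isolated(u)]
    print(kept)
```

      The temporal-stability check described in the response (autocorrelograms and presence over the recording) is a separate step and is not covered by this threshold filter.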

      Cell numbers are provided with the statistics in the supplementary table for Figure 1g. Panels are from the same unit before and after CNO. Examples of auto- and cross-correlograms are provided in the new Supplementary Figure 6.

      (3) Panel 2g - "firing rate modulation" is unclear. I think the authors are showing the mean firing rate with DREADD+CNO treatment divided by the mean firing rate in the pre-CNO condition for the same group (I couldn't find that in the Methods, my apologies if I missed it)? However, firing rate modulation to me means variability in firing rate within a recording. Perhaps "relative firing rate" or "% pre-CNO firing rate" would be clearer?

      The definition has been added to the Methods and the axis label has been changed to ‘Change in FR induced by SAL/CNO’.

      (4) Figure 3f - why does consolidation appear to be impaired after the transition from CNO to saline between sessions, when in panel 1j suppressing the CN does not have a similar effect once CNO is switched to saline? Could this be driven by a small number of mice? Since a central conclusion of the paper is that CN-VAL connections are uniquely important for post-training consolidation, this discrepancy is important to explain - if the results post-saline are spurious, how do we know that the results post-CNO aren't also spurious? Panels similar to Figure 4b and d showing all the data from the last/first trial of each session I think would be convincing.

      Our results overall indicate that the overnight consolidation of the improvement in performance seems to be effective only in the early phase (as indicated in the summary Figure 5). We therefore do not believe that the saline results are spurious.

      This can indeed be seen in the control groups of Figure 1; to make it more visible, we plot in Author response image 1 the difference between trial 7 and trial 1 of the next day. An overnight drop in performance becomes visible in the late phase.

      Author response image 1.
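
      The quantity plotted in this image can be sketched as follows. The sign convention (first trial of the next day minus last trial of the current day, so that a negative value is an overnight drop) is an assumption, as the response does not state it; the function name and the example latencies are hypothetical.

```python
def overnight_changes(sessions):
    """Overnight change between consecutive days.

    `sessions` is a list of per-day latency lists (one value per trial).
    ASSUMED convention: next-day first trial minus current-day last trial,
    so a negative value means performance dropped overnight.
    """
    return [nxt[0] - prev[-1] for prev, nxt in zip(sessions, sessions[1:])]


if __name__ == "__main__":
    # Invented latencies (s) for three days of seven trials each.
    days = [
        [40, 48, 55, 60, 66, 70, 75],
        [72, 80, 84, 90, 93, 97, 100],
        [88, 95, 99, 102, 104, 107, 110],
    ]
    print(overnight_changes(days))
```

      With N days of training this yields N - 1 values per mouse, one for each night.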

      The decrement on the first trial in the first 3 days is visible for the majority of the mice. The plot requested by the reviewer is shown in Author response image 2.

      Author response image 2.

      Minor points:

      (5) In panel 1a, the solid yellow line obscures a lot of the image and I don't think adds anything.

      We assume this was referring to a line in Figure 1d, which has been removed.

      (6) Panel 2a - color selection could present problems for those with red-green color blindness.

      This has been fixed.

      (7) Supplementary Figure 3 - what are the arrows and arrowheads indicating?

      These have been removed.

      (8) In the Discussion: "Studies of cerebellar synaptic plasticity provide clearly support the involvement of cerebellum in rotarod learning..." Delete the word "provide"

      This has been fixed.

      (9) "This indicates that either the distinct functional roles of VAL-projecting or CL-projecting." The second "of" should be "or", I think.

      This has been fixed.