  1. May 2025
    1. Reviewer #1 (Public review):

      Summary:

      The paper by Tolossa et al. presents classification studies that aim to predict the anatomical location of a neuron from the statistics of its in-vivo firing pattern. They study two types of statistics (ISI distribution, PSTH) and try to predict the location at different resolutions (region, subregion, cortical layer).

      Strengths:

      This paper provides a systematic quantification of the single-neuron firing vs location relationship.

      The quality of the classification setup seems high.

      The paper uncovers that, at the single neuron level, the firing pattern of a neuron carries some information on the neuron's anatomical location, although the predictive accuracy is not high enough to rely on this relationship in most cases.

      Weaknesses:

      As the authors mention in the Discussion, it is not clear whether the observed differences in firing are epiphenomenal. If the anatomical location information is useful to the neuron, to what extent can this be inferred from the vicinity of the synaptic site, based on the neurotransmitter and neuromodulator identities? Why would the neuron need to dynamically update its prediction of the anatomical location of its pre-synaptic partner based on activity when that location is static, and if that information is genetically encoded in synaptic proteins, etc. (e.g., the type of the synaptic site)? Note that the neuron does not need to classify all possible locations to guess the location of its pre-synaptic partner because it may only receive input from a subset of locations. Ultimately, the inability to dissect whether the paper's findings point to a mechanism utilized by neurons or merely represent an epiphenomenon is the main weakness of the curious, though somewhat weak, observations described in this paper.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Tolossa et al. analyze inter-spike intervals from various freely available datasets from the Allen Institute and from a dataset from Steinmetz et al. They show that they can modestly decode between gross brain regions (Visual vs. Hippocampus vs. Thalamus), and modestly separate sub areas within brain regions (DG vs. CA1 or various visual brain areas). The core result is that a multi-layer perceptron trained on the ISI distributions can modestly classify different brain areas and perhaps in a reasonably compelling way generalize across animals. The result is interesting but the exact problem formulation still feels a tad murky to me because I am worried the null is a strawman and I'm unsure if anyone has ever argued for this null hypothesis ("the impact of anatomy on a neuron's activity is either nonexistent or unremarkable"). Given the patterns of inputs to different brain areas and the existence of different developmental origins and different cell types within these areas, I am unclear why this would be a good null hypothesis. Nevertheless, the machine learning is reasonable, and the authors demonstrate that a nonlinear population-based classifier can pull out reasonable information about the brain area and layer.

      Strengths:

      The paper is reasonably well written, and the definitions are quite well done. For example, the authors clearly explained transductive vs. inductive inference in their decoders. E.g., transductive learning allows the decoder to learn features from each animal, whereas inductive inference focuses on withheld animals and prioritizes the learning of generalizable features. The authors walk the reader through various analyses starting as simply as PCA, then finally showing a MLP trained on ISI distributions and PSTHs performs modestly well in decoding brain area. The key is ISI distributions work well in inductive settings for generalizing from one mouse to the other.
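
      The transductive versus inductive distinction described above can be sketched as two ways of splitting the same neurons into training and test sets. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the `animal_ids` list, and the 20% test fraction are all hypothetical.

```python
import random

def transductive_split(animal_ids, test_frac=0.2, seed=0):
    """Split neurons at random, so every animal can appear in both sets."""
    rng = random.Random(seed)
    idx = list(range(len(animal_ids)))
    rng.shuffle(idx)
    n_test = int(round(test_frac * len(idx)))
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def inductive_split(animal_ids, test_frac=0.2, seed=0):
    """Withhold whole animals, so no test animal is seen during training."""
    rng = random.Random(seed)
    animals = sorted(set(animal_ids))
    rng.shuffle(animals)
    n_test = max(1, int(round(test_frac * len(animals))))
    held_out = set(animals[:n_test])
    train = [i for i, a in enumerate(animal_ids) if a not in held_out]
    test = [i for i, a in enumerate(animal_ids) if a in held_out]
    return train, test

# Hypothetical data: 10 animals with 20 neurons each.
animal_ids = [f"mouse_{k}" for k in range(10) for _ in range(20)]
tr_t, te_t = transductive_split(animal_ids)
tr_i, te_i = inductive_split(animal_ids)
```

      In the inductive split, the set of animals contributing to `te_i` is disjoint from the set contributing to `tr_i`, which is what forces a decoder to rely on features that generalize across animals.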

      Weaknesses:

      As articulated in my overall summary, I still found the null hypothesis a tad underwhelming. I am not sure this is really a valid null hypothesis ("the impact of anatomy on a neuron's activity is either nonexistent or unremarkable"), although in the statistical sense it is fine. The authors took on board some of the advice from the first review and clarified the paper but there are portions that are unnecessarily verbose (e.g., "Beyond fundamental scientific insight, our findings may be of benefit in various practical applications, such as the continued development of brain-machine interfaces and neuroprosthetics"). Also, given that ISIs cannot separate between visual areas, why is the statement made that these are conserved? I still find it somewhat underwhelming that the thalamus, hippocampus, and visual cortex have different ISI distributions. Multiple researchers have reported similar things in cortex perhaps without the focus on decoding area from these ISI distributions.

      All in all, it is an interesting paper with the notion that ISI distributions can modestly predict brain area and layer. It could have some potential for a tool for neuropixels, although this needs to be developed further for this use case.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      We appreciate the reviewers' thoughtful comments and suggestions. Below, we provide point-by-point responses to the recommendations and outline the updates made to the manuscript.

      (1) Discussion, "the obvious experiment is to manipulate a neuron's anatomical embedding while leaving stimulus information intact." The epiphenomenon can arise from the placement and types of a neuron's neurotransmitters and neuromodulators, too.

      The content of vesicles released by a neuron is obviously of great importance in determining postsynaptic impact. However, we’re suggesting that (assuming vesicular content is held constant) the anatomically-relevant patterning of spiking might additionally affect the postsynaptic neuron’s integration of the presynaptic input. To avoid confusion, we updated the text accordingly: “the obvious experiment is to manipulate a neuron's anatomical embedding while minimally impacting external and internal variables, such as stimulus information and levels of neurotransmitters or neuromodulators” (Line 594 - 596).

      (2) “In all conditions, the slope of the input duration versus sensitivity line was still positive at 1,800 seconds (Fig. 3B)". This may suggest that the estimate of the calculated statistics (ISI, PSTH) is more reliable with more data, rather than (or in addition to) specific information being extracted from faraway time points. Another potential confound is the training statistics were calculated from all training data, so the test data is a better match to training data when test statistics are calculated from more data. Overall, the validity of the conclusions following this observation is not clear to me.

      This is a great point. Accordingly, we revised the text to include this possibility: “Because the training data were of similar duration, this could be explained by either of two possibilities. First, the signal is relatively short, but noisy—in this case, extended sampling will increase reliability. Second, the anatomical signal is, itself, distributed over time scales of tens to hundreds of seconds.” (Line 252 - 255).

      (3) "This further suggests that there is a latent neural code for anatomical location embedded within the spike train, a feature that could be practically applied to determining the brain region of a recording electrode without the need for post-hoc histology". The performance of the model at the subregion level, which is a typical level of desired precision in locating cells, does not seem to support such a practical application. Please clarify to avoid confusion.

      The current model should not be considered a replacement for traditional methods, such as histology. Our intention is to convey that, with the inclusion of multimodal data and additional samples, a computational approach to anatomical localization has great promise. We updated the manuscript to clarify this point: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Additionally, we directly addressed this point in our original manuscript (Discussion section: Line 498 - 505 in the current version). Furthermore, following the release of our preprint, independent efforts have adopted a multimodal strategy with qualitatively similar results (Yu et al., 2024). Other recent work expands on the idea of utilizing single-neuron features for brain region/structure characterization (Le Merre et al., 2024).

      Yu, H., Lyu, H., Xu, E. Y., Windolf, C., Lee, E. K., Yang, F., ... & Hurwitz, C. (2024). In vivo cell-type and brain region classification via multimodal contrastive learning. bioRxiv, 2024-11.

      Le Merre, P., Heining, K., Slashcheva, M., Jung, F., Moysiadou, E., Guyon, N., ... & Carlén, M. (2024). A Prefrontal Cortex Map based on Single Neuron Activity. bioRxiv, 2024-11.

      (4) "These results support the notion the meaningful computational division in murine visuocortical regions is at the level of VISp versus secondary areas.". The use of the word "meaningful" is vague and this conclusion is not well justified because it is possible that subregions serve different functional roles without having different spiking statistics.

      Precisely! It is well established that different subregions serve different functional purposes - but they do not necessitate different regional embeddings. It is important to note the difference between stimulus encoding and the embedding that we are describing. As a rough analogy, the regional embedding might be considered a language, while the stimulus is the content of the spoken words. However, to avoid vague words, we revised the sentence to “These results suggest that the computational differentiability of murine visuocortical regions is at the level of VISp versus secondary areas.” (Line 380 - 381)

      (5) Figure 3D left/right halves look similar. A measure of the effect size needs to accompany these p-values.

      We assume the reviewer is referring to Figure 3E. Although some of the violin plots in Figure 3E look similar, they are not identical. In the revision, we include effect sizes in the caption.

      (6) Figure 3A, 3F: Could uncertainty estimates be provided?

      Yes. We added uncertainty estimates to the text (Line 272 - 294) and to the caption of Figure S2, which displays confusion matrices corresponding to Figure 3A. The inclusion of similar estimates for 3F would be so unwieldy as to be a disservice to the reader—there are 240 unique combinations of stimulus parameters and structures. In the context of the larger figure, 3F serves to illustrate a relationship between stimulus, region, and the anatomical embedding.

      (7) Page 21. "semi-orthogonal". Please reword or explain if this usage is technical.

      We replaced “semi-orthogonal” with “dissociable” (Line 549).

      (8) Page 11, "This approach tested whether..." Unclear sentence. Please reword.

      We changed “This approach tested whether the MLP’s performance depended on viewing the entire ISI distribution or was enriched in a subset of patterns” to “This approach identified regions of the ISI distribution informative for classification” (Line 261).

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s comments and summary of the results. We agree that the introductory results (Figs. 1-3) are not particularly compelling when considered in isolation. They provide a baseline of comparison for the subsequent results. Our intention was to approach the problem systematically, progressing from well-established, basic methods to more advanced approaches. This allows us to clearly test a baseline and avoid analytical leaps or untested assumptions. Specifically:

      ● Figure 1 provides an evaluation of the standard dimensionality reduction methods. As expected, these methods yield minimal results, serving as a clear baseline. This is consistent, for example, with an understanding of single units as rate-varying Poisson processes.

      ● Figures 2 and 3 then build upon these results in two steps: first, spiking features common in the neuroscience literature (firing rate, coefficient of variation, etc.) are evaluated with linear supervised methods; second, more detailed spiking features, such as the ISI distribution, are evaluated with nonlinear supervised machine learning methods.

      By starting from the standpoint of the status quo, we are better able to contextualize the significance of our later findings in Figures 4–6.

      Response to Specific Points in the Summary

      (6) Separability of VISp vs. Secondary Visual Areas

      I found the entire argument about visual areas somewhat messy and unclear. The stimuli used might not drive the secondary visual areas particularly well and might necessitate task engagement.

      We appreciate your feedback that the dissection of visual cortical structures is unclear. To summarize, as shown in the bottom three rows of Figure 6, there is a notable lack of diagonality in visuocortical structures. This means that our model was unable to learn signatures to reliably predict these classes. In contrast, visuocortical layer is returned well above chance, and superstructures (primary and secondary areas) are moderately well identified, albeit still well above chance.

      Consider a thought experiment: if Charlie Gross had not shown faces to monkeys to find IT, or Newsome and others shown motion to find MT, and Zeki and others color stimuli to find V4, we would conclude that there are no differences.

      The thought experiment is misleading. The results specifically do not arise from stimulus selectivity—much of Newsome’s own work suggests that the selectivity of neurons in IT etc. is explained by little more than rate varying Poisson processes. In this case, there should be no fundamental anatomical difference in the “language” of the neurons in V4 and IT, only a difference in the inputs driving those neurons. In contrast, our work suggests that the “language” of neurons varies as a function of some anatomical divisions. In other words, in contrast to a Poisson rate code, our results predict that single neuron spike patterns might be remarkably different in MT and IT— and that this is not a function of stimulus selectivity. Notably, the anatomical (and functional) division between V1 and secondary visual areas does not appear to manifest in a different “language”, thus constituting an interesting result in and of itself.

      We regret a failure to communicate this in a tight and compelling fashion on the first submission, but hope that the revision is limpid and accessible.

      Barberini, C. L., Horwitz, G. D., & Newsome, W. T. (2001). A comparison of spiking statistics in motion sensing neurones of flies and monkeys. Motion Vision: Computational, Neural, and Ecological Constraints, 307-320.

      Bair, W., Zohary, E., & Newsome, W. T. (2001). Correlated firing in macaque visual area MT: time scales and relationship to behavior. Journal of Neuroscience, 21(5), 1676-1697.

      Similarly, why would drifting gratings be a good example of a stimulus for the hippocampus, an area thought to be involved in memory/place fields?

      The results suggest that anatomical “language” is not tied to stimuli. It is imperative to recall that neurons are highly active absent experimentally imposed stimuli, such as when an animal is at rest, when an animal is asleep, and when an animal is in the dark (relevant to visual cortices). With this in mind, also recall that, despite the lack of stimuli tailored to the hippocampus, neurons therein were still reliably separable from neurons in seven nuclei in the thalamus, 6 of which are not classically considered visual regions. Should these regions (including hippocampus) have been inert during the presentation of visual stimuli, there would have been very little separability.

      (7) Generalization across laboratories

      “[C]omparison across laboratories was somewhat underwhelming. It does okay but none of the results are particularly compelling in terms of performance.”

      Any result above chance is a rejection of the null hypothesis: that a model trained on a set of animals in Laboratory A will be ineffective in identifying brain regions when tested on recordings collected in Laboratory B (in different animals and under different experimental conditions). As an existence proof, the results suggest conserved principles (however modest) that constrain neuronal activity as a function of anatomy. That models fail to achieve high accuracy (in this context) is not surprising (given the limitations of available recordings)---that models achieve anything above chance, however, is.

      Thus, after reading the paper many times, I think part of the problem is that the study is not cohesive, and the authors need to either come up with a tool or demonstrate a scientific finding.

      We demonstrate that neuronal spike trains carry robust anatomical information. We developed an ML architecture for this and that architecture is publicly available.

      They try to split the middle and I am left somewhat perplexed about what exact scientific problem they or other researchers are solving.

      We humbly suggest that the question of a neuron’s “language” is highly important and central to an understanding of how brains work. From a computational perspective, there is no reason for a vast diversity of cell types, nor a differentiation of the rules that dictate neuronal activity in one region versus another. A Turing Complete system can be trivially constructed from a small number of simple components, such as an excitatory and inhibitory cell type. This is the basis of many machine learning tools.

      Please do not confuse stimulus specificity with the concept of a neuron’s language. Neurons in VISp might fire more in response to light, while those in auditory cortex respond to sound. This does not mean that these neurons are different - only that their inputs are. Given the lack of a literature describing our main effect—that single neuron spiking carries information about anatomical location—it is difficult to conclude that our results are either commonplace or to be expected.

      I am also unsure why the authors think some of these results are particularly important.

      See above.

      For instance, has anyone ever argued that brain areas do not have different spike patterns?

      Yes. In effect, by two avenues. The first is a lack of any argument otherwise (please do not conflate spike patterns with stimulus tuning), and the second is the preponderance of, e.g., rate codes across many functionally distinct regions and circuits.

      Is that not the premise for all systems neuroscience?

      No. The premise for all systems neuroscience (from our perspective) is that the brain is a) a collection of interacting neurons and b) the collective system of neurons gives rise to behavior, cognition, sensation, and perception. As stated above, these axiomatic first principles fundamentally do not require that neurons, as individual entities, obey different rules in different parts of the brain.

      I could see how one could argue no one has said ISIs matter but the premise that the areas are different is a fundamental part of neuroscience.

      Based on logic and the literature, we fundamentally disagree. Consider: while systems neuroscience operates on the principle that brain regions have specialized functions, there is no a priori reason to assume that these functions must be reflected in different underlying computational rules. The simplest explanation is that a single language of spiking exists across regions, with functional differences arising from processing distinct inputs rather than fundamentally different spiking rules. For example, an identical spike train in the amygdala and Layer 5 of M1 would have profoundly different functional impacts, yet the spike timing itself could be identical (even as stimulus response). Until now, evidence for region-specific spiking patterns has been lacking, and our work attempts to begin addressing this gap. There is extensive further work to be conducted in this space, and it is certain that models will improve, rules will be clarified, and mechanisms will be identified.

      Detailed major comments

      (1) Exploratory trends in spiking by region and structure across the population:

      The argument in this section is that unsupervised analyses might reveal subtle trends in the organization of spiking patterns by area. The authors show 4 plots from t-SNE and claim to see subtle organization. I have concerns. For Figure 1C, it is nearly impossible to see if a significant structure exists that differentiates regions and structures. So this leads certain readers to conclude that the authors are looking at the artifactual structure (see Chari et al. 2024) - likely to contribute to large Twitter battles. Contributing to this issue is that the hyperparameter for tSNE was incorrectly chosen. I do think that a different perplexity should be used for the visualization in order to better show the underlying structure; the current visualization just looks like a single "blob". The UMAP visualizations in the supplement make this point more clearly. I also think the authors should include a better plot with appropriate perplexity or not include this at all. The color map of subtle shades of green and yellow is hard to see as well in both Figure S1 and Figure 1.

      In response to the feedback, we replaced t-SNE/UMAP with LDA, while keeping PCA for dimensionality reduction.

      As stated in the original methods, t-SNE/UMAP hyperparameters were chosen based on the combination that led to the greatest classifiable separability of the regions/structures in the space (across a broad range of possible combinations). It just so happens that the maximally separable structure from a regions/structures perspective is the “blob”. This suggests that perhaps the predominant structure the t-SNE finds in the data is not driven by anatomy. If we selected hyperparameters in some other way that was not based specifically on regions/structures (e.g. simple visual inspection of the plots) the conformation would of course be different and not blob-like. However, we removed the t-SNE and UMAP to avoid further confusion.

      The “muddy appearance” is not an issue with the color map. As seen in Figure 1B, the chosen colors are visibly distinct. Figure 1C (previous version) appeared muddy yellow/green because of points that overlap with transparency, resulting in a mix of clearly defined classes (e.g., a yellow point on top of a blue point creating green). This overlap is a meaningful representation of the separability observed in this analysis. We also tried using 2D KDE for visualization, but it did not improve the impression of visual separability.

      We are removing p-values from the figures because they lead to the impression that we over-interpret these results quantitatively. However, we calculated p-values based on label permutation similar to the way R2 suggests (see previous methods). The conflation with the Wasserstein distances is an understandable misunderstanding. These are unrelated to p-values and used for the heatmaps in S1 only (see previous methods).
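
      As a rough illustration of what a label-permutation p-value involves, the sketch below shuffles the labels and recomputes a statistic each time. The statistic used here (`mean_gap`), the toy values, and the number of permutations are assumptions made for the example; the manuscript's methods define the actual statistic.

```python
import random

def permutation_p_value(values, labels, statistic, n_perm=1000, seed=0):
    """Fraction of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            n_extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (n_extreme + 1) / (n_perm + 1)

def mean_gap(values, labels):
    """Absolute difference between two group means (illustrative statistic)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    a, b = (sum(g) / len(g) for g in groups.values())
    return abs(a - b)

# Hypothetical feature values for neurons from two regions, "A" and "B".
values = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 3.0, 3.1, 2.9, 3.2, 3.0, 2.8]
labels = ["A"] * 6 + ["B"] * 6
p = permutation_p_value(values, labels, mean_gap)
```

      Because shuffling breaks any relationship between labels and values, the shuffled statistics approximate the null distribution against which the observed value is compared.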

      Instead of p-values, we now use the adjusted Rand index, which measures how accurately neurons within the same region are clustered together (see Line 670 - 671, Figure 1C, and Figure S1) (Hubert & Arabie 1985). This quantifies the extent to which the distribution of points in dimensionally-reduced space is shaped by region/structure.

      Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218. https://doi.org/10.1007/BF01908075
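
      For readers unfamiliar with the adjusted Rand index, a minimal pure-Python sketch of the Hubert and Arabie formula follows. How cluster assignments are obtained from the dimensionally-reduced space is not specified here, so the cluster lists below are hypothetical stand-ins.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings (assumes at least 2 items)."""
    n = len(labels_true)
    # Pairs of items that share both a true label and a predicted cluster.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:               # degenerate case: identical trivial partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Hypothetical example: six neurons from three regions.
regions = ["VIS", "VIS", "HPC", "HPC", "TH", "TH"]
perfect = [0, 0, 1, 1, 2, 2]    # clusters that match regions exactly
one_blob = [0, 0, 0, 0, 0, 0]   # a single undifferentiated cluster

ari_perfect = adjusted_rand_index(regions, perfect)
ari_blob = adjusted_rand_index(regions, one_blob)
```

      A value near 1 indicates that clusters follow regions closely, while a value near 0 indicates agreement no better than chance, as in the single "blob" case.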

      (2) Logistic classifiers:

      The results in this section are somewhat underwhelming. Accuracy is around 40% and yes above chance but I would be very surprised if someone is worried about separating visual structures from the thalamus. Such coarse brain targeting is not difficult. If the authors want to include this data, I recommend they show it as a control in the ISI distribution section. The entire argument here is that perhaps one should not use derived metrics and a nonlinear classifier on more data is better, which is essentially the thrust of the next section.

      As outlined above, our work systematically increases in model complexity. The logistic result is an intermediate model, and it returns intermediate results. This is an important stepping stone between the lack of a result based on unsupervised linear dimensionality reduction and the performance of supervised nonlinear models.

      From a purely utilitarian perspective, the argument could be framed as “one should not use derived metrics, and a nonlinear classifier on more data is better.” However, please see all of our notes above.

      (3) MLP classifiers:

      Even in this section, I was left somewhat underwhelmed that a nonlinear classifier with large amounts of data outperforms a linear classifier with small amounts of data. I found the analysis of the ISIs and which timescales are driving the classifier interesting but I think the classifier with smoothing is more interesting. So with a modest chance level decodability of different brain areas in the visual system, I found it somewhat grandiose to claim a "conserved" code for anatomy in the brain. If there is conservation, it seems to be at the level of the coarse brain organization, which in my opinion is not particularly compelling.

      The sample size used for both the linear and nonlinear classifiers is the same; however, the nonlinear classifier leverages the detailed spiking time information from ISIs. Our goal here was to systematically evaluate how classical spike metrics compare to more detailed temporal features in their ability to decode brain areas. We chose a linear classifier for spike metrics because, with fewer features, nonlinear methods like neural networks often offer very modest advantages over linear methods, less interpretability, and are prone to overfitting.
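
      To make the contrast concrete, the sketch below computes two classical derived metrics (firing rate and ISI coefficient of variation) and a binned ISI distribution from the same spike timestamps. The bin count, the log-spaced range of 1 ms to 10 s, and the simulated Poisson spike train are assumptions for illustration and may differ from the manuscript's settings.

```python
import math
import random

def isi(spike_times):
    """Inter-spike intervals (s) from sorted spike timestamps (s)."""
    return [t1 - t0 for t0, t1 in zip(spike_times, spike_times[1:])]

def summary_metrics(spike_times):
    """Two derived metrics: mean firing rate (Hz) and ISI coefficient of variation."""
    intervals = isi(spike_times)
    mean = sum(intervals) / len(intervals)
    var = sum((x - mean) ** 2 for x in intervals) / len(intervals)
    duration = spike_times[-1] - spike_times[0]
    return {"rate_hz": len(intervals) / duration, "cv": math.sqrt(var) / mean}

def isi_histogram(spike_times, n_bins=20, lo=1e-3, hi=10.0):
    """Normalized ISI histogram on log-spaced bins between lo and hi seconds."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    counts = [0] * n_bins
    for x in isi(spike_times):
        if lo <= x < hi:
            k = int((math.log10(x) - log_lo) / (log_hi - log_lo) * n_bins)
            counts[min(k, n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Simulated 5 Hz Poisson spike train (hypothetical data, not a recording).
rng = random.Random(0)
t, spikes = 0.0, []
for _ in range(2000):
    t += rng.expovariate(5.0)
    spikes.append(t)

metrics = summary_metrics(spikes)   # 2 numbers per neuron
hist = isi_histogram(spikes)        # 20 numbers per neuron
```

      The two summary numbers discard the shape of the interval distribution, whereas the histogram retains it; that extra temporal detail is what a nonlinear classifier can exploit.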

      Respectfully, we stand by our word choice. The term “conserved” is appropriate given that our results hold appreciably, i.e., statistically above chance, across animals.

      (4) Generalization section:

      The authors suggest that a classifier learned from one set of data could be used for new data. I was unsure if this was a scientific point or the fact that they could use it as a tool.

      It can be both. We are more driven by the scientific implications of a rejection of the null.

      Is the scientific argument that ISIs are similar across areas even in different tasks?

      It appears so - despite heterogeneity in the tuning of single neurons, their presynaptic inputs, and stimuli, there is identifiable information about anatomical location in the spike train.

      Why would one not learn a classifier from every piece of available data: like LFP bands, ISI distributions, and average firing rates, and use that to predict the brain area as a comparison?

      Because this would obfuscate the ability to conclude that spike trains embed information about anatomy.

      Considering all features simultaneously and adding additional data modalities—such as LFP bands and spike waveforms—has potential to improve classification accuracy at the cost of understanding the contribution of each feature. The spike train as a time series is the most fundamental component of neuronal communication. As a result, this is the only feature of neuronal activity of concern for the present investigation.

      Or is the argument that the ISIs are a conserved code for anatomy? Unfortunately, even in this section, the data are underwhelming.

      We appreciate the reviewer’s comments, but arrive at a very different conclusion. We were quite surprised to find any generalizability whatsoever.

      Moreover, for use as a tool, I think the authors need to seriously consider a control that is either waveforms from different brain areas or the local field potentials. Without that, I am struggling to understand how good this tool is. The authors said "because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc)., our studies involve only the timestamps of individual spikes from well-isolated units ". However, we are not talking about information transmission and actually trying to identify and assess brain areas from electrophysiological data.

      While we are not blind to the “tool” potential that is suggested by our work, this is not the primary motivation or content in any section of the paper. As stated clearly in the abstract, our motivation is to ask “whether individual neurons [...] embed information about their own anatomical location within their spike patterns”. We go on to say “This discovery provides new insights into the relationship between brain structure and function, with broad implications for neurodevelopment, multimodal integration, and the interpretation of large-scale neuronal recordings. Immediately, it has potential as a strategy for in-vivo electrode localization.” Crucially, the last point we make is a nod to application. Indeed, our results suggest that in-vivo electrode localization protocols may benefit from the incorporation of such a model.

      In light of the reviewer’s concerns, we have further dampened the weight of statements about our model as a consumer-ready tool.

      Example 1: The final sentence of the abstract now reads: “Computational approximations of anatomy have potential to support in-vivo electrode localization.”

      Example 2: The results section now contains the following text: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Example 3: We replaced the phrase "because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc) " with the phrase “because information is primarily encoded by the firing rate or the timing of spiking and not waveforms (etc)” (Line 116 - 118).

      (5) Discussion section:

      In the discussion, beginning with "It is reasonable to consider . . ." all the way to the penultimate paragraph, I found the argumentation here extremely hard to follow. Furthermore, the parts of the discussion here I did feel I understood, I heavily disagreed with. They state that "recordings are random in their local sampling" which is almost certainly untrue when it comes to electrophysiology which tends to oversample task-modulated excitatory neurons (https://elifesciences.org/articles/69068). I also disagree that "each neuron's connectivity is unique, and vertebrate brains lack 'identified neurons' characteristic of simple organisms". While brains are eutelic and "nameable" in only the simplest organisms (C. elegans), cell types are exceedingly stereotyped in their connectivity even in mammals and such connectivity defines their computational properties. Thus I don't find the premise the authors state in the next sentence to be undermined ("it seems unlikely that a single neuron's happenstance imprinting of its unique connectivity should generalize across stimuli and animals"). Overall, I found this subsection to rely on false premises and in my opinion it should be removed.

      At the suggestion of R2, we removed the paragraph in question. However, we would like to address some points of disagreement:

      We agree that electrophysiology, along with spike-sorting, quality metrics, and filtering of low-firing neurons, leads to oversampling of task-modulated neurons. However, when we stated that recordings are random in their local sampling, we were referring to structural (anatomical) randomness, not functional randomness. In other words, the recorded neurons were not specifically targeted (see below).

      Electrode arrays, such as Neuropixels, record from hundreds of neurons within a small volume relative to the total number of neurons and the volume of a given brain region. For instance, the paper R2 referenced includes a statement supporting this: “... assuming a 50-μm ‘listening radius’ for the probes (radius of half-cylinder around the probe where the neurons’ spike amplitude is sufficiently above noise to trigger detection) …, the average yield of 116 regular-spiking units/probe (prior to QC filtering) would imply a density of 42,000 neurons/mm³, much lower than the known density of ~90,000 neurons/mm³ for excitatory cells in mouse visual cortex….”

      If we take the estimated volume of V1 to be approximately 3 mm³, this region could theoretically be subdivided into multiple cylinders with a 100-μm diameter. While stereotaxic implantation of the probe mitigates some variability, the natural anatomical variability across individual animals introduces spatially random sampling. This was the randomness we were referring to, and thus, we disagree with the assertion that our claim is “almost certainly untrue.”
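      As a rough check of the sampling argument above, the quoted figures can be combined in a few lines. This is a back-of-envelope sketch, not a calculation taken from the manuscript or the response: the half-cylinder geometry, the 50-μm radius, the 116 units per probe and the 42,000 neurons/mm³ come from the quoted passage, the 3 mm³ volume is the authors' approximate estimate for V1, and the derived quantities (implied shank length, sampled fraction, number of non-overlapping sites) are illustrative assumptions that follow from those inputs.

```python
import math

# Figures quoted in the response (treated here as given).
LISTENING_RADIUS_MM = 0.050   # 50-um "listening radius"
UNITS_PER_PROBE = 116         # regular-spiking units per probe, before QC
IMPLIED_DENSITY = 42_000      # neurons per mm^3 implied by the quoted passage
V1_VOLUME_MM3 = 3.0           # approximate V1 volume used in the response

# Volume a single probe must be sampling for the quoted density to hold.
sampled_volume_mm3 = UNITS_PER_PROBE / IMPLIED_DENSITY

# Cross-section of a half-cylinder with the quoted radius.
half_cylinder_area_mm2 = 0.5 * math.pi * LISTENING_RADIUS_MM ** 2

# Shank length through tissue implied by that volume (derived, not quoted).
implied_length_mm = sampled_volume_mm3 / half_cylinder_area_mm2

# Share of V1 covered by one probe, and how many such volumes fit in V1.
sampled_fraction = sampled_volume_mm3 / V1_VOLUME_MM3
possible_sites = V1_VOLUME_MM3 / sampled_volume_mm3

print(f"sampled volume: {sampled_volume_mm3:.5f} mm^3")
print(f"implied shank length: {implied_length_mm:.2f} mm")
print(f"fraction of V1 sampled: {sampled_fraction:.2%}")
print(f"non-overlapping sites in V1: {possible_sites:.0f}")
```

      Under these assumptions a single probe samples well under 1% of the region, which is the sense in which small differences in placement across animals yield different local samples.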

      Additionally, each cortical pyramidal neuron is understood to have ~10,000 presynaptic partners. It is highly unlikely that these connections are entirely pre-specified, perfectly replicated within the same animal, and identical across all members of a species. Further, there is enormous diversity in the activity properties of even neighboring cells of the same type. Consider pyramidal neurons in V1. Single-neuron firing rates are log-normally distributed, there are many combinations of tuning properties (e.g., direction, orientation) that must occupy each point in retinotopic space, and there is powerful experience-dependent change in the connectivity of these cells. We suggest that it is inconceivable that any two neurons, even within a small region of V1, have identical connectivity.

      Minor Comments:

      (1) Although the description of confusion matrices is good from a didactic perspective, some of this could be moved to methods to simplify the paper.

      We thank the reviewer for the suggestion. However, given the broad readership of eLife, we gently suggest that confusion matrices are not a trivial and universally appreciated plotting format. For the purpose of accessibility, a brief and didactic 2-sentence description will make the paper far more comprehensible to many readers at little cost to experts.

      (2) Figure 3A: It is concluded in their subsequent figure that the longer the measured amount of time, the better the decoding performance. Thus it makes sense why the average PSTHs do not show significant decoding of areas or structures.

      That is a good observation. However, all features were calculated from the same duration of data, except in Figure 3B, where we tested the effect of duration. The averaged PSTH was calculated from the same length of data as the ISI distribution and binned to have the same feature length as the ISI distribution (refer to the Methods section). Therefore, we interpreted this as an indication of information degradation through averaging, rather than an effect of data length (Line 234 - 237).

      (3) Figure 3D: A Gaussian is used to fit the ISI distributions here but ISI distributions do not follow a normal distribution, they follow an inverse gamma distribution.

      We agree with the reviewer and we are familiar with the literature showing that the ISI distribution is best fitted by a gamma-family distribution (as a recent, but not the earliest, example: Li et al. 2018). However, we did not fit a Gaussian (or any distribution) to the data; we simply calculated the sample mean and variance. Reporting the sample mean and variance (or standard deviation) is not restricted to Gaussian distributions. They are broadly used metrics that simply have additional intrinsic meaning for Gaussian distributions. We used the schematic illustration in Fig 3D because mean and variance are much more familiar in the Gaussian context, but ultimately that does not affect our analyses in Fig 3 E-F. Alternatively, the alpha and beta intrinsic parameters of a gamma distribution could have been used, but they are known by a much smaller portion of neuroscientists.

      Li, M., Xie, K., Kuang, H., Liu, J., Wang, D., Fox, G. E., ... & Tsien, J. Z. (2018). Spike-timing pattern operates as gamma-distribution across cell types, regions and animal species and is essential for naturally-occurring cognitive states. bioRxiv, 145813.
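      The distinction drawn in the response to comment (3), between computing a sample mean and variance and fitting a parametric distribution, can be illustrated with a minimal sketch. The function name `isi_summary` and the toy spike train are hypothetical and are not taken from the authors' code; the gamma shape and scale shown here are simple method-of-moments estimates, included only to show how such parameters relate to the same two summary statistics.

```python
from statistics import mean, variance

def isi_summary(spike_times):
    """Summarise inter-spike intervals without assuming any distribution."""
    # Inter-spike intervals are differences between consecutive spike times.
    isis = [t1 - t0 for t0, t1 in zip(spike_times, spike_times[1:])]
    m = mean(isis)
    v = variance(isis)  # unbiased sample variance; no Gaussian fit involved
    return {
        "mean": m,
        "variance": v,
        # Method-of-moments gamma parameters (illustrative alternative only).
        "gamma_shape": m * m / v,
        "gamma_scale": v / m,
    }

# Toy spike train (seconds); the intervals are 1, 2 and 3 s.
summary = isi_summary([0.0, 1.0, 3.0, 6.0])
print(summary)
```

      The mean and variance are well defined for any interval distribution with finite moments, and the moment-matched gamma parameters are a reparameterisation of the same two numbers.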

      (4) Figure 3G: Something is wrong with this figure, as each vertical bar is supposed to represent a drifting grating onset, yet they are all at 5 Hz despite the PSTH being purportedly shown at many different frequencies from 1 to 15 Hz.

      We appreciate your attention to detail, but we are not representing the onset of individual drifting gratings in this figure. We only meant to represent the overall start/end of the drifting grating session. We did not intend to signal the temporal frequency of the drifting gratings (or the spatial frequency, orientation, or contrast).

    1. eLife Assessment

      This valuable study provides strong evidence for the development of a penetration ring during Magnaporthe oryzae infection and, supported by knockout and expression studies, shows that Ppe1 is involved in the virulence of the fungus. Although the authors demonstrated the close association of Ppe1 with the host plasma membrane, the work fell short in providing direct evidence for its role at the host-pathogen interface and the precise molecular function of the penetration ring. Therefore, the study presented strong structural and phenotypic characterization but remains incomplete regarding mechanistic insights of Ppe1.

    2. Joint Public Review:

      This study presents novel insights into the formation and characterization of a penetration ring during host infection by Magnaporthe oryzae. Based on the solid genetic evidence and localization data, the authors demonstrate the structural presence of the penetration ring and the contribution of Ppe1 to fungal virulence. Nevertheless, the mechanisms through which the penetration ring influences host-pathogen interaction, including its potential function in effector translocation, remain only partially resolved. Further work using higher-resolution imaging and functional assays will help address this knowledge gap. Overall, the findings are valuable for advancing our understanding of plant-pathogen interactions, though important mechanistic questions remain open.

    1. eLife Assessment

      This valuable paper explores the idea that transient modulations of neural gain promote switches between distinct perceptual interpretations of ambiguous stimuli. The authors provide solid evidence for this idea by pupillometry (an indirect proxy of neuromodulatory activity), fMRI, neural network modeling, and dynamical systems analyses. The highly integrative nature of this approach is rare in the field.

    2. Reviewer #1 (Public review):

      Summary:

      This paper proposes a neural mechanism underlying the perception of ambiguous images: neuromodulation changes the gain of neural circuits promoting a switch between two possible percepts. Converging evidence for this is provided by indirect measurements of neuromodulatory activity and large-scale brain dynamics which are linked by a neural network model. However, both the data analysis as well as the computational modeling are incomplete and would benefit from a more rigorous approach.

      This is a revised version of the manuscript which, in my view, is a considerable step forward compared to the original submission.

      In particular, the authors now model phasic gain changes in the RNN, based on the network's uncertainty. This is original and much closer to what is suggested by the phasic pupil responses. They also show that switching is actually a network effect because switching times depend on network configuration (Fig 2). This resolves my main comments 1 and 2 about the model.

      The mechanism, as I understand it, is different from what the authors described before in the RNN with tonic gain changes. As uncertainty increases, the network enters a regime in which the two excitatory populations start to oscillate. My intuition is that this oscillation arises from the feedback loop created by the new gain control mechanism. If my intuition is correct, I think it would be worth explaining this mechanism in the paper more explicitly.

      Comments on revisions:

      This is a second revision. I have no further comments. The authors have not answered the question that I had in the previous round (about the origin of oscillations in the RNN). I think this topic deserves to be explored in more detail but perhaps that is beyond the scope of the current paper.

    3. Reviewer #2 (Public review):

      This paper tests the hypothesis that perceptual switches during the presentation of ambiguous stimuli are accompanied by changes in neuromodulation that alter neural gain and trigger abrupt changes in brain activity. To test this hypothesis, the study combines pupillometry, artificial recurrent network (RNN) analysis and fMRI recording. In particular, the study uses methods of energy landscape analysis inspired by physics, which is particularly interesting.

      Strengths

      - The authors should be commended for combining different methods (pupillometry, RNNs, fMRI) to test their hypothesis. This combination provides a mechanistic insight into perceptual switches in the brain and artificial neural networks.
      - The study combines different viewpoints and fields of scientific literature, including neuroscience, psychology, physics, dynamical systems. In order to make this combination more accessible to the reader, the different aspects are presented in a pedagogical way to be accessible to all fields.
      - This combination of methods and viewpoints is rarely done, so it is very useful.
      - The authors introduce dynamic gain modulation in their recurrent neural network, which is novel. They devote a section of the paper to studying the dynamics, fixed points and convergence of this type of network.

      Weaknesses

      - The study may not be specific to perceptual switches. This is because the study relies on a paradigm in which participants report when they identify a switch in the item category. Therefore, it is unclear whether the effects reported in the paper are related to the perceptual switch itself, to attention, or to the detection of behaviourally relevant events. The authors are cautious and explicitly acknowledge this point in their study.
      - The demonstration of the causal role of gain modulation in perceptual switches is partial. This causality is clearly demonstrated in the simulation work with the RNN. However, it is not fully demonstrated in the pupil analysis and the fMRI analysis. One reason is that this work is correlative (which is already very informative).
      - Some effects may reflect the expectation of a perceptual switch rather than the perceptual switch itself. To mitigate this risk, the design of the fMRI task included catch trials, in which no switch occurs, to reduce the expectation of a switch. The pupil study, however, did not include such catch trials.
      - The paper uses RNN-based modelling to provide mechanistic insight into the role of gain modulation in perceptual switches. However, the RNN solves a task that differs from that performed by human participants, which may limit the explanatory value of the model. The RNN is provided with two inputs characterising the sensory evidence supporting the first and last image category in the sequence (e.g. plane and shark). In contrast, observers in the task don't know in advance the identity of the last image at the beginning of the sequence. The brain first receives sensory evidence about the image category (e.g. plane) with which the sequence begins, which is very easy to recognise, then it sees a sequence of morphed images and has to discover what the final image category will be. To discover the final image category, the brain considers several possibilities for the second image (is it a shark? a frog? a bird? etc.), rather than comparing the likelihood of just two categories. This search process among many alternatives and the perceptual switch in the task is therefore different from the competition between only two inputs in the RNN.
      - Another aspect of the motivation for the RNN model remains unclear. The authors introduce dynamic gain modulation in the RNN, but it is not clear what the added value of dynamic gain modulation is. Both static (Fig. S1) and dynamic (Fig. 2F) gain modulation lead to the predicted effect: faster switching when the gain is larger.
      - The authors are to be commended for addressing their research questions with multiple tools and approaches. There are links between the different parts of the study. The RNN and the pupil are linked by the notion of gain modulation, the RNN and the fMRI analysis are linked by the study of the energy landscape, the fMRI study and the pupil study are indirectly linked by previous work from this group showing that the peak in LC fMRI activity precedes a flattening of the energy landscape. These links are very interesting but could have been stronger and more complete.

      Comments on revisions:

      I thank the authors for their responses.

      My review presents points that the authors themselves present as weaknesses or limitations. It also includes points that cannot be addressed in a revision (e.g. causality).

      Regarding the fact that the RNN only considers two categories, whereas subjects consider more categories (because they don't know the final image), I have toned down my remark (removing "markedly" different, removing the fact that the hypothesis space is vast given that participants have some priors). I also removed the qualifier "mechanistically" different, because it can be understood in different ways. The point remains that the proposed model has 2 inputs, the corresponding network in the brain has >2 inputs (because it considers more categories than the RNN), which is different, and which is the point of my remark. I think it may limit the value of the model, but I don't think it is not "sensible".

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The mechanism, as I understand it, is different from what the authors described before in the RNN with tonic gain changes. As uncertainty increases, the network enters a regime in which the two excitatory populations start to oscillate. My intuition is that this oscillation arises from the feedback loop created by the new gain control mechanism. If my intuition is correct, I think it would be worth explaining this mechanism in the paper more explicitly.

      While interesting, this intuition is not correct. The oscillations are generated by the interaction between excitatory and inhibitory nodes in the network and occur in the model even with stationary gain. All of the plots in figure 3 exploring the dynamical regime of the network at different input x gain combinations (i.e., where the oscillatory regime is characterised) are simulations run with stationary gain.

      To ensure that this intuition is more clearly presented in the manuscript, we have edited the description in the text.

      P. 12: “Because of the large size of the network, we could not solve for the fixed points or study their stability analytically. Instead, we opted for a numerical approach and characterised the dynamical regime (i.e. the location and existence of approximate fixed-point attractors) across all combinations of (static) gain and  visited by the network.”
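      The kind of numerical characterisation described in the quoted passage can be sketched on a deliberately small toy system. The example below is not the authors' RNN and does not reproduce their oscillatory regime: it assumes a single rate unit with dynamics dx/dt = -x + tanh(gain · x), and the function names (`velocity`, `slope`, `fixed_points`, `attractors`) are hypothetical. It only illustrates the general idea of locating approximate fixed points and checking their stability at different values of a static gain.

```python
import math

def velocity(x, gain):
    """Toy one-unit rate model (an assumption): dx/dt = -x + tanh(gain * x)."""
    return -x + math.tanh(gain * x)

def slope(x, gain):
    """Derivative of the velocity with respect to x; negative means stable."""
    return -1.0 + gain * (1.0 - math.tanh(gain * x) ** 2)

def fixed_points(gain, lo=-2.0, hi=2.0, n=4000, tol=1e-9):
    """Scan a grid for sign changes of the velocity, then refine by bisection."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        fa, fb = velocity(a, gain), velocity(b, gain)
        if fa == 0.0:
            roots.append(a)          # grid point is exactly a fixed point
            continue
        if fa * fb >= 0:
            continue                 # no zero crossing inside this interval
        while b - a > tol:           # bisection on the bracketing interval
            mid = 0.5 * (a + b)
            fm = velocity(mid, gain)
            if fa * fm <= 0:
                b = mid
            else:
                a, fa = mid, fm
        roots.append(0.5 * (a + b))
    return roots

def attractors(gain):
    """Keep only the fixed points that are locally stable."""
    return [x for x in fixed_points(gain) if slope(x, gain) < 0]

for g in (0.5, 2.0):
    print(g, attractors(g))
```

      In this toy system a low static gain yields a single attractor near zero, whereas a higher gain yields two attractors separated by an unstable fixed point. For a large network the same scan-and-refine idea would have to be replaced by simulation or optimisation in many dimensions, which is why a numerical rather than analytical approach is natural there.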

      Reviewer #2 (Public review):

      - The demonstration of the causal role of gain modulation in perceptual switches is partial. This causality is clearly demonstrated in the simulation work with the RNN. However, it is not fully demonstrated in the pupil analysis and the fMRI analysis. One reason is that this work is correlative (which is already very informative). An analysis of the timing of the effect might have overcome this limitation. For example, in a previous study, the same group showed that fMRI activity in the LC region precedes changes in the energy landscape of fMRI dynamics, which is a step towards investigating causal links between gain modulation, changes in the energy landscape and perceptual switches.

      Thank you for the suggestion, which we considered in detail. Unfortunately, the temporal and spatial resolution of the fMRI data collected for this study precluded the same analyses we’ve run in previous work. However, this is an important question for future work.

      - Some effects may reflect the expectation of a perceptual switch rather than the perceptual switch itself. To mitigate this risk, the design of the fMRI task included catch trials, in which no switch occurs, to reduce the expectation of a switch. The pupil study, however, did not include such catch trials.

      We agree that this is a limitation of the current study, which we previously highlighted in the methods section.

      - The paper uses RNN-based modelling to provide mechanistic insight into the role of gain modulation in perceptual switches. However, the RNN solves a task that differs markedly from that performed by human participants, which may limit the explanatory value of the model. The RNN is provided with two inputs characterising the sensory evidence supporting the first and last image category in the sequence (e.g. plane and shark). In contrast, observers in the task were naïve as to the identity of the last image at the beginning of the sequence. The brain first receives sensory evidence about the image category (e.g. plane) with which the sequence begins, which is very easy to recognise, then it sees a sequence of morphed images and has to discover what the final image category will be. To discover the final image category, the brain has to search a vast space of possible second images (it is a shark?, a frog?, a bird?, etc.), rather than comparing the likelihood of just two categories. This search process and the perceptual switch in the task appear to be mechanistically different from the competition between two inputs in the RNN.

      We appreciate the critical analysis of the experimental paradigm but disagree with the reviewer’s conclusions for two key reasons: 1) participants had prior exposure to the images, such that they could create an expectation about what stimulus category a particular image would transition into (i.e., the image could not switch into any possible category); and 2) even if the reviewer’s concern were founded, models of K winner-take-all decision making are structured identically irrespective of whether there are 2 or K options; all that changes is the simulated reaction times, which depend linearly on K (for an example model see Hugh Wilson’s textbook Spikes, Decisions, and Actions, 1999, p.89-91). For these reasons, we maintain that the RNN is a sensible representation of the behavioural task.

      - Another aspect of the motivation for the RNN model remains unclear. The authors introduce dynamic gain modulation in the RNN, but it is not clear what the added value of dynamic gain modulation is. Both static (Fig. S1) and dynamic (Fig. 2F) gain modulation lead to the predicted effect: faster switching when the gain is larger.

      While we agree that the effect is observable with both static and dynamic gain, the stronger construct validity associated with the dynamic approach, including a stronger link with the observed pupil dynamics and a rich literature associated with modelling the behavioural consequences of surprise/uncertainty, led us to the conclusion that the dynamical approach was a better representation of our hypothesis.

      - Fig 1C: I don't see a "top grey bar" indicating significance.

      Thank you for catching this, the caption has been amended. The text was from an older version of the manuscript.

      - p. 10, reference to fig 3F seems incorrect: there is Fig 3F upper and Fig 3F lower, and nothing on Fig 3 and its legend mention the lesion of units

      This has been amended. We meant to refer to 2F.

      - In the response letter you mention a MATLAB tutorial, but I could not find it.

      This has been amended. The GitHub repository can be found at https://github.com/ShineLabUSYD/AmbiguousFigures

    1. eLife Assessment

      In this important study, Baniulyte and Wade provide convincing evidence that translation of a short ORF denoted toiL positioned upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent transcription termination. Strengths of the study include combining a genetic screen to identify 23S rRNA mutations that affect topAI expression and a creative approach to map the different locations of ribosome stalling within toiL induced by different antibiotics, with ribosome profiling and RNA structure probing by SHAPE to examine consequences of different antibiotics on toiL-mediated regulation. The work leaves unanswered how bacteria benefit by activating expression of the genes using the proposed strategy and the mechanism underlying ToiL's sensing of structurally distinct antibiotics.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, which the authors have discovered and named toiL, located in the leader region upstream of the operon. The authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, prompted in this work by some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because the presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. The model is appealing and much of the experimental data supports it. However, it remains unanswered what the true trigger of the translation arrest at toiL is and what the physiological role of the induced expression of the topAI/yjhQ/yjhP operon is.

    3. Reviewer #2 (Public review):

      Summary:

      Baniulyte and Wade describe how translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      The authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation.

      Weaknesses:

      Future experiments will be needed to better understand the physiological role of the toiL-mediated regulation and elucidate the mechanism of specific antibiotic sensing.

      The results are clearly described, and the revisions have helped to improve the presentation of the data.

    4. Reviewer #3 (Public review):

      The authors provide convincing data to support an elegant model in which ribosome stalling by ToiL promotes downstream topAI translation and prevents premature Rho-dependent transcription termination. However, the physiological consequences of activating topAI-yjhQP expression upon exposure to various ribosome-targeting antibiotics remain unresolved. The authors have satisfactorily addressed all major concerns raised by the reviewers, particularly regarding the SHAPE-seq data. Overall, this study underscores the diversity of regulatory ribosome-stalling peptides in nature, highlighting ToiL's uniqueness in sensing multiple antibiotics and offering significant insights into bacterial gene regulation coordinated by transcription and translation.

      [Editors' note: The earlier public reviews are included. No additional reviews were requested.]

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, which the authors have discovered and named toiL, located in the leader region upstream of the operon. The authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, prompted in this work by some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because the presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. The model is appealing and much of the experimental data supports it. However, it remains unanswered what the true trigger of the translation arrest at toiL is and what the physiological role of the induced expression of the topAI/yjhQ/yjhP operon is.

      Reviewer #2 (Public review):

      Summary:

      Baniulyte and Wade describe how translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      The authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation.

      Weaknesses:

      Future experiments will be needed to better understand the physiological role of the toiL-mediated regulation and elucidate the mechanism of specific antibiotic sensing.

      The results are clearly described, and the revisions have helped to improve the presentation of the data.

      Reviewer #3 (Public review):

      In this revised manuscript, the authors provide convincing data to support an elegant model in which ribosome stalling by ToiL promotes downstream topAI translation and prevents premature Rho-dependent transcription termination. However, the physiological consequences of activating topAI-yjhQP expression upon exposure to various ribosome-targeting antibiotics remain unresolved. The authors have satisfactorily addressed all major concerns raised by the reviewers, particularly regarding the SHAPE-seq data. Overall, this study underscores the diversity of regulatory ribosome-stalling peptides in nature, highlighting ToiL's uniqueness in sensing multiple antibiotics and offering significant insights into bacterial gene regulation coordinated by transcription and translation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - Showing the ribosome density profiles of topAI/yjhQP and toiL in control and tetracycline treated cells is necessary to support that ribosome arrest at toiL increases translation of topAI/yjhQP.

      Figure 7B shows ribosome density around the start of toiL. Ribosome density increases across topAI in the presence of tetracycline, but we have opted not to show this region because we cannot say whether the increase in ribosome occupancy (represented in Figure 7A) is due to an increase in translation efficiency, RNA level, or both.

      - The subinhibitory antibiotic concentrations used in the reporter assays were based on MICs reported in the literature. This is not appropriate since MICs can greatly vary between strains, antibiotic solution stocks, and experimental conditions.

      Reported MICs were used as an initial guide for selecting antibiotic concentrations to test in our reporter assays. We have added text to indicate this, and to highlight that MICs vary considerably between strains.

      - toiL sequence may have evolved to maintain base-pairing with the topAI upstream region rather than, as authors suggest in Discussion, to respond to antibiotic-mediated arrest in an amino acid sequence specific manner.

      We have chosen to frame this as speculation.

      - Authors may consider commenting on the possibility that chloramphenicol does not induce because ToiL lacks alanine residues, whose presence at specific places of a nascent protein have been shown to promote chloramphenicol action (2016 PNAS 113:12150; 2022 NSMB 29:152).

      This is a great point as none of our stalling reporters included an ORF with alanine. We now include a short paragraph in the Discussion section to raise this possibility.

      - Tetracycline was added at the "subinhibitory concentration" of 8 μg/mL for the reporter assays but at 1 μg/mL for the ribosome profiling experiments. The authors should explain the rationale for this.

      We think the reviewer is mixing up the epidemiological cut-off value of 8 μg/mL with the concentration used in experiments (0.5-1 μg/mL for reporter assays and ribosome profiling). The text was confusing, so we have added a sentence to the Methods section to indicate that epidemiological cut-off values and MICs were only a guide for selecting antibiotic concentrations to test.

      Reviewer #2 (Recommendations for the authors):

      I wish the authors had been slightly less dismissive of the reviewers' comments. At a minimum, it would be nice if the authors could be consistent about the ribosome representation throughout the manuscript;

      We apologize if our previous responses gave the impression of being dismissive. That was certainly not our intention. We greatly value the reviewers' feedback, and we appreciate the opportunity to clarify any misunderstandings. We believe the reviewer is referring to the different shape and color of the ribosome in Figures 8 and 9, and Figure 8 figure supplement 2, which we have now corrected.

    1. eLife Assessment

      This valuable work provides solid evidence that a neuronal metallothionein, GIF/MT-3, incorporates metal-persulfide clusters. A variety of well-designed assays support the authors' hypothesis, revealing that sulfane sulfur is released from MT-3. However, the sulfane sulfur content in the canonical induced MT-1 and MT-2 has not been demonstrated. Thus, the biological role of the persulfidated form is not yet clearly defined. There are caveats to the findings that limit the study, but the work will nevertheless prompt major follow-up work.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors reveal that GIF/MT-3 regulates zinc homeostasis depending on the cellular redox status. The manuscript is technically sound, and their data concretely suggest that the recombinant MTs, not only GIF/MT-3 but also canonical MTs such as MT-1 and MT-2, contain sulfane sulfur atoms for Zn binding. The scenario proposed by the authors seems reasonable to explain Zn homeostasis by the cellular redox balance.

      Strengths:

      The data presented in the manuscript solidly reveal that recombinant GIF/MT-3 contains sulfane sulfur.

      Weaknesses:

      It remains unclear whether native MTs, in particular MTs induced in vivo, contain sulfane sulfur or not.

      Comments on revisions:

      Although the authors have revealed the sulfane sulfur content in native MT-3, my question, namely, whether canonical MT-1 and MT-2 contain sulfane sulfur after induction, has been left unanswered.

      The authors argue that the biological significance of sulfane sulfur in MTs lies in its ability to contribute to metal binding affinity, provide a sensing mechanism against oxidative stress, and aid in the regulation of the protein. Due to their biological roles, induced MT-1 and MT-2 could contain sulfane sulfur in their molecules. Thus, I expect the authors to evaluate or explain the sulfane sulfur content in induced MT-1 and MT-2.

    3. Reviewer #3 (Public review):

      Summary:

      The authors were trying to show that a novel neuronal metallothionein of poorly defined function, GIF/MT3, is actually heavily persulfidated in both the Zn-bound and apo (metal-free) forms of the molecule as purified from a heterologous (bacterial) or native host. Evidence in support of this conclusion is strong, with both spectroscopic and mass spectrometry evidence strongly consistent with this general conclusion. The authors would appear to have achieved their aims.

      Strengths:

      The analytical data in support of the authors' primary conclusions are strong. The authors also provide some modeling evidence that supports the contention that MT3 (and other MTs) can readily accommodate a sulfane sulfur on each of the 20 cysteines in the Zn-bound structure, with little perturbation of the overall structure. This is not the case with Cys trisulfides, which suggests that the persulfide-metallated state is clearly positioned at lower energy relative to the immediately adjacent thiolate- or trisulfidated metal coordination complexes.

      Weaknesses:

      The biological significance of the findings is not entirely clear. On the one hand, the analytical data are solid (albeit using a protein derived from a bacterial over-expression experiment), and yes, it's true that sulfane S can protect Cys from overoxidation, but everything shown in the summary figure (Fig. 9D) can be done with Zn release from a thiol by ROS, and subsequent reduction by the Trx/TR system. In addition, it's long been known that Zn itself can protect Cys from oxidation. I view this as a minor shortcoming that will motivate follow-up studies.

      Impact:

      The impact will be high since the finding is potentially disruptive to the MT field for sure. The sulfane sulfur counting experiment (the HPE-IAM electrophile trapping experiment) may well be widely adopted by the field. Those in the metals field always knew that this was a possibility, and it will be interesting to see the extent to which metal-binding thiolates broadly incorporate sulfane sulfur into their first coordination shells.

      Comments on revisions:

      The revised manuscript is only slightly changed from the original, with the inclusion of a supplementary figure (Fig. S2) and minor changes in the text. The authors did not choose to carry out the quantitative Zn binding experiment (which I really wanted to see), but given the complexities of the experiment, I'll let it go.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public Review):

      Comments on revisions:

      Although the authors have revealed the sulfane sulfur content in native MT-3, my question, namely, whether canonical MT-1 and MT-2 contain sulfane sulfur after induction, has been left unanswered.

      The authors argue that the biological significance of sulfane sulfur in MTs lies in its ability to contribute to metal binding affinity, provide a sensing mechanism against oxidative stress, and aid in the regulation of the protein. Due to their biological roles, induced MT-1 and MT-2 could contain sulfane sulfur in their molecules. Thus, I expect the authors to evaluate or explain the sulfane sulfur content in induced MT-1 and MT-2.

      Thank you for your valuable comments. In this study, we were not able to examine the role of sulfane sulfur in the induced forms of MT-1 and MT-2. However, this topic is undoubtedly important and intriguing; therefore, we will continue to explore it in future studies.

      Reviewer #3 (Public Review):

      Comments on revisions:

      The revised manuscript is only slightly changed from the original, with the inclusion of a supplementary figure (Fig. S2) and minor changes in the text. The authors did not choose to carry out the quantitative Zn binding experiment (which I really wanted to see), but given the complexities of the experiment, I'll let it go.

      Fig. 9: the authors imply in the mechanistic "redox-switch" figure that Trx/TR can not reduce persulfide linkages. A number of groups have shown this to be the case. I recommend modifying the figure legend or text to make this clear to the reader.

      Thank you for your understanding. Regarding the "redox-switch" figure, although some groups have demonstrated the ability of Trx to reduce persulfide moieties, as you pointed out, we have addressed this discrepancy in the Discussion section as follows (lines 357-361): “In contrast, Trx has been proposed to reduce the persulfide moiety of PTP1B (37) and albumin (38, 39). A possible explanation for this discrepancy is that apo-GIF/MT-3-persulfide is rapidly changed into a different conformation that is topologically resistant to Trx reduction. In other words, Trx may exhibit substrate specificity.” Additionally, we have inserted the following sentence just before the above discussion to further clarify this point: “This suggests that the persulfide moiety in GIF/MT-3 appears to be relatively stable against Trx reduction.”

    1. eLife Assessment

      This valuable study demonstrates that the GSK-3 inhibitor AZD2858 inhibits the formation of TOPBP1 condensates and hence DNA damage responses in colorectal cancer cells. The evidence supporting the claims of the authors is convincing, although uncovering how this drug blocks bio-condensate formation would have strengthened the study. The work will be of interest to cancer researchers searching for synergistic drug combination strategies.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Laura Morano and colleagues have performed a screen to identify compounds that interfere with the formation of TopBP1 condensates. TopBP1 plays a crucial role in the DNA damage response, and specifically the activation of ATR. They found that the GSK-3b inhibitor AZD2858 reduced the formation of TopBP1 condensates and activation of ATR and its downstream target CHK1 in colorectal cancer cell lines treated with the clinically relevant irinotecan active metabolite SN-38. This inhibition of TopBP1 condensates by AZD2858 was independent of its effect on GSK-3b enzymatic activity. Mechanistically, they show that AZD2858 can thus interfere with intra-S-phase checkpoint signaling, resulting in enhanced cytostatic and cytotoxic effects of SN-38 (or SN-38 + fluorouracil, aka FOLFIRI) in vitro in colorectal carcinoma cell lines.

      Major comments from the first round of peer review:

      Overall, the work is rigorous and the main conclusions are convincing. However, they only show the effects of their combination treatments on colorectal cancer cell lines. I'm worried that blocking the formation of TopBP1 condensates will also be detrimental in non-transformed cells. Furthermore, it is somewhat disappointing that it remains unclear how AZD2858 blocks self-assembly of TopBP1 condensates, although I understand that unraveling this would be complex and somewhat out of reach for now. Here are some specific points for improvement:

      1) The authors conclude that "These data supports [sic] the feasibility of targeting condensates formed in response to DNA damage to improve chemotherapy-based cancer treatments". To support this conclusion the authors need to show that proliferating non-transformed cells (e.g. primary cell cultures or organoids) can tolerate the combination of AZD2858 + SN-38 (or FOLFIRI) better than colorectal cancer cells.

      2) Page 19 "This suggests that the combination... arrests the cell cycle before mitosis in a DNA-PKsc-dependent manner." I find the remark that this arrest would be DNA-PKcs-dependent too speculative. I suppose that the authors base this claim on reference 55 but if they want to support this claim they need to prove this by adding DNA-PKcs inhibitors to their treated cells.

      3) When discussing Figure S5B the authors claim that SN-38 + AZD2858 progressively increases the fractions of BrdU positive cells, but this is not supported by statistical analysis. The fractions are still very small, so I would like to see statistics on these data. Alternatively, the authors could take out this conclusion.

      Comments on revised version:

      I have reviewed the revised manuscript and read the rebuttal. The authors have carefully addressed my concerns. There is however one point that needs further work:

      This follows up on major point #1 in my initial review. I had asked the authors to demonstrate that FOLFIRI + AZD are less toxic to untransformed colorectal cells than to colorectal cancer cell lines.

      It is good to see that the authors took my advice and show effects of the drug treatments on the untransformed colorectal cell line CCD841. It seems to be less sensitive to AZD and FOLFIRI in the figure in the rebuttal. What surprises me is that I cannot find these new figures anywhere in the revised manuscript. Also, the data seem preliminary, because I do not see any standard errors in the graphs, and I cannot find a description of the time of drug incubation. I ask the authors to make sure that the CCD841 data are reproducible, and make sure they incorporate the data in the revised manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      In 2021 (PMID: 33503405) and 2024 (PMID: 38578830) Constantinou and colleagues published two elegant papers in which they demonstrated that the Topbp1 checkpoint adaptor protein could assemble into mesoscale phase-separated condensates that were essential to amplify activation of the PIKK, ATR, and its downstream effector kinase, Chk1, during DNA damage signalling. A key tool that made these studies possible was the use of a chimeric Topbp1 protein bearing a cryptochrome domain, Cry2, which triggered condensation of the chimeric Topbp1 protein, and thus activation of ATR and Chk1, in response to irradiation with blue light without the myriad complications associated with actually exposing cells to DNA damage.

      In this current report Morano and co-workers utilise the same optogenetic Topbp1 system to investigate a different question, namely whether Topbp1 phase-condensation can be inhibited pharmacologically to manipulate downstream ATR-Chk1 signalling. This is of interest, as the therapeutic potential of the ATR-Chk1 pathway is an area of active investigation, albeit generally using more conventional kinase inhibitor approaches.

      The starting point is a high throughput screen of 4730 existing or candidate small molecule anti-cancer drugs for compounds capable of inhibiting the condensation of the Topbp1-Cry2-mCherry reporter molecule in vivo. A surprisingly large number of putative hits (>300) were recorded, from which 131 of the most potent were selected for secondary screening using activation of Chk1 in response to DNA damage induced by SN-38, a topoisomerase inhibitor, as a surrogate marker for Topbp1 condensation. From this the 10 most potent compounds were tested for interactions with a clinically used combination of SN-38 and 5-FU (FOLFIRI) in terms of cytotoxicity in HCT116 cells. The compound that synergised most potently with FOLFIRI, the GSK3-beta inhibitor drug AZD2858, was selected for all subsequent experiments.

      AZD2858 is shown to suppress the formation of Topbp1 (endogenous) condensates in cells exposed to SN-38, and to inhibit activation of Chk1 without interfering with activation of ATM or other endpoints of damage signalling such as formation of gamma-H2AX or activation of Chk2 (generally considered to be downstream of ATM). AZD2858 therefore seems to selectively inhibit the Topbp1-ATR-Chk1 pathway without interfering with parallel branches of the DNA damage signalling system, consistent with Topbp1 condensation being the primary target. Importantly, neither siRNA depletion of GSK3-beta nor other GSK3-beta inhibitors were able to recapitulate this effect, suggesting it was a specific non-canonical effect of AZD2858 and not a consequence of GSK3-beta inhibition per se.

      To understand the basis for synergism between AZD2858 and SN-38 in terms of cell killing, the effect of AZD2858 on the replication checkpoint was assessed. This is a response, mediated via ATR-Chk1, that modulates replication origin firing and fork progression in S-phase cells under conditions of DNA damage or when replication is impeded. SN-38 treatment of HCT116 cells markedly suppresses DNA replication; however, this was partially reversed by co-treatment with AZD2858, consistent with the failure to activate ATR-Chk1 conferring a defect in replication checkpoint function.

      Figures 4 and 5 demonstrate that AZD2858 can markedly enhance the cytotoxic and cytostatic effects of SN-38 and FOLFIRI through a combination of increased apoptosis and growth arrest according to dosage and treatment conditions. Figure 6 extends this analysis to cells cultured as spheroids, sometimes considered to better represent tumor responses compared to single cell cultures.

      Significance:

      Liquid phase separation of protein complexes is increasingly recognised as a fundamental mechanism in signal transduction and other cellular processes. One recent and important example was that of Topbp1, whose condensation in response to DNA damage is required for efficient activation of the ATR-Chk1 pathway. The current study asks a related but distinct question: can protein condensation be targeted by drugs to manipulate signalling pathways which in the main rely on protein kinase cascades?

      Here, the authors identify an inhibitor of GSK3-beta as a novel inhibitor of DNA damage-induced Topbp1 condensation and thus of ATR-Chk1 signalling.

      This work will be of interest to researchers in the fields of DNA damage signalling, biophysics of protein condensation, and cancer chemotherapy.

      Comments on latest version:

      Morano et al. have revised their manuscript in response to the points raised by reviewer #3 as follows.

      1) Fig. 2E: Correcting the previously erroneous labelling of this Fig. makes it match the textual description.

      2) Figs 3A and B: The revised textual description of the flow cytometry BrdU incorporation is now precise.

      3) Fig. 3E: Removing the suspect WB images is a pragmatic decision that does not significantly affect the overall conclusions of the paper.

      4) Fig. 3D: Despite its puzzling appearance, this data is now described accurately in the text as "DSBs remained elevated after the combined treatment" rather than "increased after the combined treatment". A more convincing increase in the presumed damaged DNA band is evident in Fig. 4D when AZD2858 is combined with a much lower concentration of SN38 (1.5nM), which could mean that the concentration used in Fig. 3D (300nM) induced maximal damage that could not be further enhanced.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have extended their previous research to develop TOPBP1 as a potential drug target for colorectal cancer by inhibiting its condensation. Utilizing an optogenetic approach, they identified the small molecule AZD2858, which inhibits TOPBP1 condensation and works synergistically with first-line chemotherapy to suppress colorectal cancer cell growth. The authors investigated the mechanism and discovered that disrupting TOPBP1 assembly inhibits the ATR/Chk1 signaling pathway, leading to increased DNA damage and apoptosis, even in drug-resistant colorectal cancer cell lines.

      Comments on latest version:

      The authors have addressed most of the concerns that I raised in the first round of revision and I have no further questions. I appreciate the authors' efforts in carrying out preliminary in vivo work, although, as the authors pointed out, the compound seems not to be effective in vivo. Future work is desired to address this and to clarify the significance of the work.

    1. eLife Assessment

      This valuable study focuses on defining how the HSP70 chaperone system utilizes J-domain proteins to regulate the heat shock response-associated transcription factor HSF1. Using a combination of orthogonal techniques in yeast, this manuscript provides compelling evidence that the J-domain protein Apj1 facilitates attenuation of HSF1 transcriptional activity through a mechanism involving its dissociation from heat shock gene promoter regions. This work improves our understanding of HSF1 regulation and will be of broad interest to cell biologists interested in proteostasis, chaperone networks, and stress-responsive signaling.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors present a thorough mechanistic study of the J-domain protein Apj1 in Saccharomyces cerevisiae, establishing it as a key repressor of Hsf1 during the attenuation phase of the heat shock response (HSR). The authors integrate genetic, transcriptomic (ribosome profiling), biochemical (ChIP, Western), and imaging data to dissect how Apj1, Ydj1, and Sis1 modulate Hsf1 activity under stress and non-stress conditions. The work proposes a model where Apj1 specifically promotes displacement of Hsf1 from DNA-bound heat shock elements, linking nuclear PQC to transcriptional control.

      Strengths:

      Overall, the work is highly novel - this is the first detailed functional dissection of Apj1 in Hsf1 attenuation. It fills an important gap in our understanding of how Hsf1 activity is fine-tuned after stress induction, with implications for broader eukaryotic systems. I really appreciate the use of innovative techniques, including ribosome profiling and time-resolved localization of proteins (and tagged loci) to probe the Hsf1 mechanism. The overall proposed mechanism is compelling and clear - the discussion proposes a phased control model for Hsf1 by distinct JDPs, with Apj1 acting post-activation, while Sis1 and Ydj1 suppress basal activity.

      The manuscript is well-written and will be exciting for the proteostasis field and beyond.

    3. Reviewer #2 (Public review):

      Despite over 50 years of investigation, our understanding of how the ubiquitous heat shock response, governed by the transcription factor HSF1, was regulated was minimal. In recent years, a coordinated yet simple negative feedback circuit has been elucidated in high detail that centers on the chaperone Hsp70 as a direct-binding inhibitor of HSF1 transcriptional activation. However, roles for the obligatory Hsp70 J-domain partner co-chaperones are currently poorly understood. The present study applies several orthogonal techniques to the question and uncovers an unexpected role for the nuclear JDP Apj1 in attenuation of the heat shock response (HSR) via removal of Hsf1 from HSEs in heat shock gene promoter regions. Interestingly, Apj1 appears to play no role in initiating repression of Hsf1, as null mutants do not exhibit constitutive derepression of the HSR. This role is likely filled by the general nucleo/cytoplasmic JDP Ydj1, as previously reported. These results enhance understanding of HSR regulation and underscore the pivotal role that chaperones play in controlling pro-survival gene expression.

      Overall, the work is exceptionally well done and controlled, and the results are properly and appropriately interpreted. Several of the approaches, while powerful, are somewhat indirect (i.e., following gene expression via ribosomal profiling) but ultimately provide a compelling answer to the main question being asked. However, at the end of the day, there is really only one major finding here: Apj1 regulates Hsf1 attenuation via Hsp70. That finding is strongly supported by the experimental data but lacks the one piece of mechanistic evidence found in other recent papers - differential binding of Ssa1/2 to Hsf1 at either the N- or C-terminal binding sites.

    4. Reviewer #3 (Public review):

      Summary:

      The heat shock response (HSR) is an inducible transcriptional program that has provided paradigmatic insight into how stress cues feed information into the control of gene expression. The recent elucidation that the chaperone Hsp70 controls the DNA binding activity of the central HSR transcription factor Hsf1 by direct binding has spurred the question of how such a general chaperone obtains specificity. This study has addressed the next logical question: how J-domain proteins execute this task in budding yeast, the leading cell model for studying the HSR. While an involvement and partly overlapping function of the general class A and B J-domain proteins Ydj1 and Sis1 is indicated by the genetic analysis, a highly specific role for the class A Apj1 in displacing Hsf1 from the promoters is found, unveiling specificity in the system.

      Strengths

      The central strong point of the paper is the identification of class A J-domain protein Apj1 as a specific regulator of the attenuation of the HSR by removing Hsf1 from HSEs at the promoters. The genetic evidence and the ChIP data strongly support this claim. This identification of a specific role for a lowly expressed nuclear J-domain protein changes how the wiring of the HSR should be viewed. It also raises important questions regarding the model of chaperone titration, the concept that a chaperone with limited availability is involved in a tug of war involving competing interactions with misfolded protein substrates and regulatory interactions with Hsf1. Perhaps Apj1, with its low levels and interactions with misfolded and aggregated proteins in the nucleus, is the titrated Hsp70 (co)chaperone that determines the extent of the HSR? This would mean that Apj1 is at the nexus of the chaperone titration mechanism. Although Apj1 is not a highly conserved J domain protein among eukaryotes, the strength of the study is that it provides a conceptual framework for what may be required for chaperone titration in other eukaryotes: one or more nuclear J-domain proteins with low nuclear levels that have an affinity for Hsf1 and that can become limiting due to interactions with misfolded Hsp70 proteins. This provides a pathway for how these may be identified using, for example, ChIP-seq.

      Weaknesses

      A built-in challenge when studying the mechanism of the HSR is the general role of the Hsp70 chaperone system and its J domain proteins. Indeed, a weakness of the study is that it is unclear which of the phenotypic effects have to do with directly recruiting Hsp70 to Hsf1 dependent on a J domain protein and what instead is an indirect effect of protein misfolding caused by the mutation. This interpretation problem is clearly and appropriately dealt with in the manuscript text and in experiments, but is of such fundamental nature that it cannot easily be fully ruled out. One way forward is a reconstituted biochemical system that monitors how Hsf1 DNA binding is affected by the Hsp70 system, misfolded proteins, and the various J domain proteins. Yet this approach is clearly beyond the scope of this study.

    5. Author response:

      Reviewer 1:

      We thank the reviewer for his/her very positive comments.

      Reviewer 2:

      We thank the reviewer for his/her positive evaluation. We plan to add RNAseq data of yeast wild-type and JDP mutant strains as more direct readout for the role of Apj1 in controlling Hsf1 activity. We agree with the reviewer that our study includes one major finding: the central role of Apj1 in controlling the attenuation phase of the heat shock response. In accordance with the reviewer we consider this finding highly relevant and interesting for a broad readership. We agree that additional studies are now necessary to mechanistically dissect how the diverse JDPs support Hsp70 in controlling Hsf1 activity. We believe that such analysis should be part of an independent study but we will indicate this aspect as part of an outlook in the discussion section of a revised manuscript.

      Reviewer 3:

      We thank the reviewer for his/her suggestions. We agree that it is sometimes difficult to distinguish direct effects of JDP mutants on heat shock regulation from indirect ones, which can result from the accumulation of misfolded proteins that titrate Hsp70 capacity. We also agree that an in vitro reconstitution of Hsf1 displacement from DNA by Apj1/Hsp70 will be important, also to dissect Apj1 function mechanistically. We will add this point as outlook to the revised manuscript.

    1. eLife Assessment

      This important and creative study finds that the uplift of the Qinghai-Tibet Plateau, via its resultant monsoon system rather than solely its high elevation, has shifted avian migratory directions from a latitudinal to a longitudinal orientation. However, the main claims are incomplete and only partially supported, as the reliance on eBird data (which lacks the resolution to capture population-specific teleconnections), combined with a limited tracking dataset covering only seven species, leaves key aspects of the argument underdetermined, and the critical assumption of niche conservatism is not sufficiently foregrounded in the manuscript. More clearly communicating these limitations would significantly enhance the interpretability of the results, ensuring that the major conclusions are presented in the context of these essential caveats.

    2. Reviewer #1 (Public review):

      Strengths:

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The Methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      Weaknesses:

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal: the Results and Discussion, from line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section read as quite superficial. They provide broad overviews of broad findings, but I found it very hard to actually get a picture of the main results in their current form. For example, how the different species factor in, etc.

      The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.

    3. Reviewer #2 (Public review):

      I would like to thank the authors for the revision and the input they invested in this study.

      With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable. All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, "our study provides a novel understanding of how QTP shapes migration patterns of birds," is simply overstretching.

      The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: "we assume species' responses to environments are conservative and their evolution should not discount our findings." But I do not see that clearly stated in the main text.

      In your rebuttal, you respond to my criticism that "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" with: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations". I have the feeling that you are either playing with words or do not understand that nobody will ever be able to estimate population-specific teleconnections between breeding and wintering areas from eBird data. It is simply impossible, as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study. I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.

      My modest suggestion for you is: go into the field. Ideally, use bird radars along the plateau to document whether the birds shift direction when facing the barrier.

    4. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This important and creative study finds that the uplift of the Qinghai-Tibet Plateau, via its resultant monsoon system rather than solely its high elevation, has shifted avian migratory directions from a latitudinal to a longitudinal orientation. However, the main claims are incomplete and only partially supported, as the reliance on eBird data (which lacks the resolution to capture population-specific teleconnections), combined with a limited tracking dataset covering only seven species, leaves key aspects of the argument underdetermined, and the critical assumption of niche conservatism is not sufficiently foregrounded in the manuscript. More clearly communicating these limitations would significantly enhance the interpretability of the results, ensuring that the major conclusions are presented in the context of these essential caveats.

      We appreciate your positive comments and constructive suggestions. We fully acknowledge your concerns about clearly communicating the limitations associated with the data used and analytical assumptions. We will try to get more satellite tracking data of birds migrating across the plateau. We will carefully consider the insights that our paper can deliver and make sure the limitations of our datasets and the critical assumption of niche conservatism are clearly presented. By explicitly clarifying these caveats, we believe the transparency and interpretability of the findings will be much improved.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.

      Again, we thank the reviewer for constructive comments during review.

      Reviewer #2 (Public review):

      I would like to thank the authors for the revision and the input they invested in this study.

      We are grateful for your thoughtful feedback and enthusiasm, which will help us improve our manuscript.

      With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable.

      We understand your question about the relevance of the counterfactual approach used in our study. Our intent in using a counterfactual scenario (reconstructing migration patterns assuming pre-uplift conditions on the QTP) was to isolate the potential influence of the plateau’s geological history on current migration routes. We agree that such an approach must be used properly. In the revision, we will explicitly clarify why this counterfactual comparison is useful – namely, it provides a theoretical baseline to test how much the QTP’s uplift (and the associated monsoon system) might have redirected migration paths. We acknowledge that the counterfactual results are theoretical and will explicitly emphasise the assumptions involved (e.g. species–environment relationships hold between pre- and post- lift environments) in the main text. Nonetheless, we defend the approach as a valuable study design: it helps generate testable hypotheses about migration (for instance, that the plateau’s monsoon-driven climate, rather than just its elevation, introduces an east–west shift en route). We will also tone down the language around this analysis to avoid overstating its real-world relevance. In summary, we will clarify that the counterfactual analysis is meant to complement, not replace, empirical observations, and we will discuss its limitations so that its role is appropriately bounded in the paper.

      All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, “our study provides a novel understanding of how QTP shapes migration patterns of birds” is simply overstretching.

      Thank you for your comments. We apologise for any confusion regarding the scope of our dataset. Our main conclusions are not solely derived from seven bird species. Rather, we integrated a full list of 50 bird species that migrate across the QTP and analysed their migratory patterns with eBird data. We studied the factors influencing their choices of migratory routes with seven species that were among the few with available tracking data across the QTP. In this revision, we will clarify the role of these seven species and the rationale for their selection. Additionally, we will attempt to include more satellite tracking data to improve spatial coverage, as recommended by the reviewer and editor. Based on discussions with potential collaborators, we hope to include at least 10 more species with available tracking data.

      The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: “we assume species' responses to environments are conservative and their evolution should not discount our findings.” But I do not see that clearly stated in the main text.

      Thanks, as suggested we will clearly state the assumptions of niche conservatism in the Introduction.

      In your rebuttal, you respond to my criticism of "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" when you responded: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations" I am having a feeling that you either play with words with me or do not understand that from eBird data nobody will be ever able to estimate population-specific teleconnections between breeding and wintering areas. It is simply impossible as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study.

      We agree that inferring population-specific migratory connections (teleconnections) from eBird data is challenging and inherently limited. eBird provides occurrence records for species, but it generally cannot distinguish which breeding population an individual bird came from or exactly where it goes for winter. However, in this study we intend to infer broad-scale movement patterns (e.g. general directions and stopover regions) rather than precise one-to-one population linkages. In the revision, we will carefully rephrase those sections to make clear that our inferences are at the species level and at large spatial scales. We will also explicitly state in the Discussion that confirming population connectivity would require targeted tracking or genetic studies, and that our eBird-based analysis can only suggest plausible routes and region-to-region linkages. We will contrast migratory routes identified by using eBird data and satellite tracking for the same species to check their similarity. We argue that, even with its limits, the eBird dataset can still yield useful insights (such as identifying major flyway corridors over the QTP).

      I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.

      Thank you for recognising our efforts in the study. By integrating both satellite tracking and community-contributed data, we explored how the uplift of the QTP could shape avian migration across the area. We believe our findings provide important insights into how birds balance their responses to large-scale climate change and geological barriers, which yields the most comprehensive picture to date of how the QTP uplift shapes migratory patterns of birds. We will also acknowledge the study’s limitations to ensure that readers understand the context and constraints of our findings.

      My modest suggestion for you is: go into the field. Ideally use bird radars along the plateau to document whether the birds shift the directions when facing the barrier.

      We appreciate your suggestions to incorporate field tracking or radar studies to strengthen our results. All coauthors have years of field experience, including on the QTP and in the Arctic. For example, the tracking data of peregrine falcons (Falco peregrinus) that we will incorporate in the revision were collected during our own fieldwork in the Arctic over more than six years. We agree that more direct tracking (through GPS tagging or radar) would be an ideal way to validate migration pathways and population connectivity. In this revision, as stated above, we will try to include more species with satellite tracking data. We will also note that future studies should build on our findings by using dedicated tracking of more individual birds and radar monitoring of migration over the QTP. We will cite recent advances in these techniques and suggest that incorporating more tracking data could further test the hypotheses generated by our analyses.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      L55 "an important animal movement behaviour is.." Is there any unimportant animal movement? I mean this sentence is floppy, empty.

      We will rewrite this sentence to remove any ambiguous phrasing.

      L 152-154 This sentence is full of nonsense or your misinterpretation. First of all, the issue of inflexible initiation of migration was related to long-distance migrants only! The way you present it mixes apples and oranges (long- and short-distance migrants). It is not "owing to insufficient responses" but due to inherited patterns of when to take off, photoperiod and local conditions.

      We will remove the sentence to avoid misinterpretation.

      L 158 what is a migration circle? I do not know such a term.

      We will amend it as “annual migration cycle”, which is a more common way to describe the yearly round-trip journey between breeding and wintering grounds of birds.

      L 193 The way you present and mix capital and income breeding theory with your simulation study is quite tricky and super speculative.

      We will present this idea as an inference rather than a conclusion: “This pattern could be consistent with a ‘capital breeding’ strategy — where birds rely on energy reserves acquired before breeding — rather than an ‘income’ strategy that depends on food acquired during breeding. However, we note that this interpretation would require further study.” By adding this caution, we will make it clear that we are not asserting this link as proven fact, only suggesting it as one possible explanation. We will also double-check that the rest of the discussion around this point is framed appropriately.


      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study addresses a novel and interesting question about how the rise of the Qinghai-Tibet Plateau influenced patterns of bird migration, employing a multi-faceted approach that combines species distribution data with environmental modeling. The findings are valuable for understanding avian migration within a subfield, but the strength of evidence is incomplete due to critical methodological assumptions about historical species-environment correlations, limited tracking data, and insufficient clarity in species selection criteria. Addressing these weaknesses would significantly enhance the reliability and interpretability of the results.

      We would like to thank you and the two anonymous reviewers for your careful, thoughtful, and constructive feedback on our manuscript. These reviews made us revisit a lot of our assumptions and we believe the paper is much improved as a result. In addition to minor points, we have made three main changes to our manuscript in response to the reviews. First, we addressed the concerns about the assumptions of historical species-environment correlations from the perspectives of both theoretical and empirical evidence. Second, we discussed the benefits and limitations of using tracking data in our study and demonstrated how the findings of our study are consolidated by the results of previous studies. Third, we clarified our criteria for selecting species in terms of both eBird and tracking data.

      Below, we respond to each comment in turn. Once again, we thank you all for your feedback.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      We are appreciative of the reviewer’s careful reading of our manuscript, encouraging comments and constructive suggestions.

      Weaknesses:

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. This relates to the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section reads as quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Yes, the journal requires this format (Methods follows the Results and Discussion) for the short report article type. As suggested, in the revision we have elaborated on details of our findings, in terms of (i) shifts in the distribution of avian breeding and wintering areas under the influence of the uplift of the Qinghai-Tibet Plateau (Lines 102-116), and (ii) major factors that shape current migration patterns of birds in the plateau (Lines 118-138). We have also better referenced the approaches we used in the study.

      Reviewer #2 (Public review):

      Summary:

      The study tries to assess how the rise of the Qinghai-Tibet Plateau affected patterns of bird migration between their breeding and wintering sites. They do so by correlating the present distribution of the species with a set of environmental variables. The data on species distributions come from eBird. The main issue lies in the problematic assumption that species correlations between their current distribution and environment were about the same before the rise of the Plateau. There is no ground truthing and the study relies on Movebank data of only 7 species which are not even listed in the study. Similarly, the study does not outline the boundaries of breeding sites NE of the Plateau. Thus it is absolutely unclear potentially which breeding populations it covers.

      We are very grateful for the careful review and helpful suggestions. We have revised the manuscript carefully in response to the reviewer’s comments and believe that it is much improved as a result. Below are our point-by-point replies to the comments.

      Strengths:

      I like the approach for how you combined various environmental datasets for the modelling part.

      We appreciate the reviewer’s encouragement.

      Weaknesses:

      The major weakness of the study lies in the assumption that species correlations between their current distribution and environments found today are back-projected to the far past before the rise of the Q-T Plateau. This would mean that species responses to the environmental cues do not evolve which is clearly not true. Thus, your study is a very nice intellectual exercise of too many ifs.

      This is a valid concern. We have addressed this from both the perspectives of the theoretical design of our study and empirical evidence.

      First, we agree with the reviewer that species responses to environmental cues might vary over time. Nonetheless, the simulated environments before the uplift of the plateau serve as a counterfactual state in our study. Counterfactual is an important concept to support causation claims by comparing what happened to what would have happened in a hypothetical situation: “If event X had not occurred, event Y would not have occurred” (Lewis 1973). Recent years have seen an increasing application of the counterfactual approach to detect biodiversity change, i.e., comparing diversity between the counterfactual state and real estimates to attribute the factors causing such changes (e.g., Gonzalez et al. 2023). Whilst we do not aim to provide causal inferences for avian distributional change, using the counterfactual approach, we are able to estimate the influence of the plateau uplift by detecting the changes of avian distributions, i.e., by comparing where the birds would have been distributed without the plateau to where they are currently distributed. We regard the counterfactual environments as a powerful tool for eliminating, to the extent possible, vagueness, as opposed to a simple description of the current distributions of birds. Therefore, we assume species’ responses to environments are conservative and their evolution should not discount our findings. We have clarified this in the Introduction (Lines 81-93).

      Second, we used species distribution modelling to contrast the distributions of birds before and after the uplift of the plateau under the assumption that species tend to keep their ancestral ecological traits over time (i.e., niche conservatism). This indicates a high probability for species to distribute in similar environments wherever suitable. Particularly, considering bird distributions are more likely to be influenced by food resources and vegetation distributions (Qu et al. 2010, Li et al. 2021, Martins et al. 2024), and the available food and vegetation before the uplift can provide suitable habitats for birds (Jia et al. 2020), we believe the findings can provide valuable insights into the influence of the plateau rise on avian migratory patterns. Having said that, we acknowledge other factors, e.g., carbon dioxide concentrations (Zhang et al. 2022), can influence the simulations of environments and our prediction of avian distribution. We have clarified the assumptions and evidence we have for the modelling in Methods (Lines 362-370).

      The second major drawback lies in the way you estimate the migratory routes of particular birds. No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites. Some might overwinter in India, some populations in Africa and you will never know the teleconnections between breeding and wintering sites of particular species. The few available tracking studies (seven!) are too coarse and with limited aspects of migratory connectivity to give answer on the target questions of your study.

      We agree with the reviewer that establishing interconnections for birds is important for estimating the migration patterns of birds. We employed a dynamic model to assess their weekly distributions. Thus, we can track the movement of species every week, and capture the breeding and wintering areas for specific populations. That being said, we acknowledge that our approach can be subject to the patchy sampling of eBird data. In contrast, tracking data can provide detailed information on the movement patterns of species but are limited to small numbers of species due to the considerable costs and time needed. We aimed to adopt the tracking data to examine the influence of focal factors on avian migration patterns, but only seven species, to the best of our ability, were acquired. Moreover, similar results were found in studies that used tracking data to estimate the distribution of breeding and wintering areas of birds in the plateau (e.g., Prosser et al. 2011, Zhang et al. 2011, Zhang et al. 2014, Liu et al. 2018, Kumar et al. 2020, Wang et al. 2020, Pu and Guo 2023, Yu et al. 2024, Zhao et al. 2024). We believe the conclusions based on seven species are rigorous, but their implications could be restricted by the number of tracked species we obtained. We have better demonstrated how our findings on breeding and wintering areas of birds are reinforced by other studies reporting the locations of those areas. We have also added a separate caveat section to discuss the limitations stated above (Lines 202-215).

      Your set of species is unclear, selection criteria for the 50 species are unknown and variability in their migratory strategies is likely to affect the direction of the effects.

      In this revision, we have clarified the selection criteria for the 50 species and outlined the boundaries of the breeding areas of all birds (Lines 243-249). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted species identified as full migrants in Birdlife International (https://datazone.birdlife.org/species/spcdistPOS) from the full list. Migratory birds may follow a capital or income migratory strategy depending on how much they rely on endogenous energy reserves gained prior to reproduction. We have added discussions on how these migratory strategies might influence the effects of environment on migratory direction (Lines 183-200).

      In addition, the position of the breeding sites relative to the Q-T plate will affect the azimuths and resulting migratory flyways. So in fact, we have no idea what your estimates mean in Figure 2.

      We calculated the azimuths not only from the angles between breeding sites and wintering sites but also from the angles between the stopovers of birds. Therefore, the azimuths are influenced by the relative positions of breeding, wintering and stopover sites. This minimizes the possible errors from using breeding areas alone, such as the biases caused by the locations of breeding areas relative to the QTP, as the reviewer pointed out. We have better explained this in the Introduction, Methods and legend of Figure 2.
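
      The azimuth calculation described here can be sketched as follows. This is a minimal illustration, assuming that each azimuth is the initial great-circle bearing between two consecutive sites (breeding area, stopovers, wintering area); the response does not state the exact formula used, and the function names and coordinates below are hypothetical.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees; the result is in degrees clockwise
    from north, in the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


def route_azimuths(sites):
    """Bearings between each adjacent pair of (lat, lon) sites on a route."""
    return [initial_bearing(*a, *b) for a, b in zip(sites, sites[1:])]


# Hypothetical route: breeding area -> one stopover -> wintering area.
example_route = [(40.0, 90.0), (30.0, 90.0), (30.0, 80.0)]
example_azimuths = route_azimuths(example_route)
```

      Because every leg of the route contributes one bearing, a species with several stopovers yields several azimuths rather than a single breeding-to-wintering angle, which is the property the response relies on.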

      There is no way one can assess the performance of your statistical exercises, e.g. performances of the models.

      As suggested, we have reported the Area Under the Curve (AUC) of the Receiver Operator Characteristic (ROC) to assess the performance of the models (Table S1). AUC is a threshold-independent measurement of discrimination ability between presence and random points (Phillips et al. 2006). When the AUC value is higher than 0.75, the model is considered to be good (Elith et al. 2006). (Lines 379-383).
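
      For readers unfamiliar with the metric, the AUC described here can be sketched as below. This is a minimal illustration, assuming the model outputs a suitability score for each presence point and each random (background) point; the function name and the scores are hypothetical, and only the 0.75 threshold comes from the response.

```python
import numpy as np


def auc_presence_vs_background(presence_scores, background_scores):
    """AUC as the probability that a presence point outscores a random point.

    Computed from all (presence, background) pairs; ties count as half.
    This is equivalent to the area under the ROC curve.
    """
    p = np.asarray(presence_scores, dtype=float)
    b = np.asarray(background_scores, dtype=float)
    greater = (p[:, None] > b[None, :]).sum()
    ties = (p[:, None] == b[None, :]).sum()
    return (greater + 0.5 * ties) / (p.size * b.size)


# Hypothetical suitability scores, for illustration only.
presence = [0.9, 0.8, 0.4]
background = [0.5, 0.3, 0.2, 0.1]
auc = auc_presence_vs_background(presence, background)

# Threshold quoted in the response (AUC higher than 0.75).
model_is_good = auc > 0.75
```

      The pairwise form makes the threshold-independence explicit: no cut-off on the scores is chosen at any point, only their ranking matters.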

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The Methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. With the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section read quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Please see our responses above.

      Reviewer #2 (Recommendations for the authors):

      Methodological issues:

      Line 219 Why have you selected only 64 species and what were the selection criteria?

      We have clarified the selection criteria (Lines 243-248). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted species identified as full migrants in Birdlife International (https://datazone.birdlife.org/species/spcdistPOS) from the full list.

      Minor:

      Line 219 eBird has very uneven distribution, especially in vast areas of Russia. How can your exercise on Lines 232-238 overcome this issue?

      Yes, eBird data can be biased due to patchy sampling and variation in observers’ skills in identifying species. To address this issue, we have developed an adaptive spatio-temporal modelling package (stemflow; Chen et al. 2024) to correct the imbalanced distribution of data, and modelled observer experience to address the bias in recognising species. stemflow was developed based on a machine learning modelling framework (AdaSTEM) which leverages the spatio-temporal adjacency information of sample points to model the occurrence or abundance of species at different scales. It has been frequently used in modelling eBird data (Fink et al. 2013, Johnston et al. 2015, Fink et al. 2020) and has been proven to be efficient and advanced in multi-scale spatiotemporal data modelling. We have better explained this (Lines 251-270; Lines 307-321).

      Line 54 This sentence sounds very empty and in fact does not tell us much.

      We have adjusted this sentence to “Animal movement underpins species’ spatial distributions and ecosystem processes”.

      Line 55 Again a sentence that implies a causality of the annual cycle to make the species migrate. It does not make sense.

      We have revised this sentence as “An important animal movement behaviour is migrating between breeding and wintering grounds”.

      Line 58 How is our fascination with migratory journeys related to the present article? I think this line is empty.

      We have changed this sentence to “Those migratory journeys have intrigued a body of different approaches and indicators to describe and model migration, including migratory direction, speed, timing, distance, and staging periods”.

      Figure 1 - ABC insets are OK, but a combination of lati- and longitudinal patterns is possible, e.g. in species with conservative strategies or for whatever other reason.

      Thank you for the suggestion. We kept the ABC insets rather than combining them together as we believe this can deliver a clear structure of influence of QTP uplift under different scenarios.

      The legend to Figure 2 is not self-explanatory. Please make it clear what the response variable is and its units. The first line of the legend should read something like The influence of environmental factors on the direction of avian migration.

      Thank you. We have amended the legends of Figure 2 as suggested:

      “Figure 2. The influence of environmental factors on the direction of avian migration. Migratory directions are calculated based on the azimuths between each adjacent stopover, breeding and wintering areas for each species. We employ multivariate linear regression models under the Bayesian framework to measure the correlation between environmental factors and avian migratory directions. Wind represents the wind cost calculated by wind connectivity. Vegetation is measured by the proportion of average vegetation cover in each pixel (~1.9° in latitude by 2.5° in longitude). Temperature is the average annual temperature. Precipitation is the average yearly precipitation. All environmental layers are obtained using the Community Earth System Model. West QTP, central QTP, and East QTP denote the areas west (longitude < 73°E), central (73°E ≤ longitude < 105°E), and east (longitude ≥ 105°E) of the Qinghai-Tibet Plateau, respectively.”
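
      The two quantitative ingredients of this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the longitude thresholds are taken from the legend, while the regression uses a conjugate Gaussian prior with a known noise variance, which is only one simple way to fit a linear model "under the Bayesian framework". The function names, prior variance, noise variance and synthetic data are all hypothetical, and the sketch treats the response as an ordinary numeric variable rather than a circular one.

```python
import numpy as np


def qtp_region(lon_deg_east):
    """Assign a longitude (degrees east) to the regions named in the legend."""
    if lon_deg_east < 73.0:
        return "west"
    if lon_deg_east < 105.0:
        return "central"
    return "east"


def gaussian_posterior(X, y, prior_var=10.0, noise_var=1.0):
    """Posterior mean and covariance of linear regression coefficients.

    Assumes y = X @ beta + noise, noise ~ N(0, noise_var), and an
    independent zero-mean Gaussian prior N(0, prior_var) on each coefficient.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    precision = X.T @ X / noise_var + np.eye(X.shape[1]) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov


# Synthetic predictors standing in for wind, vegetation and temperature.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y_demo = X_demo @ beta_true
beta_mean, beta_cov = gaussian_posterior(X_demo, y_demo, prior_var=1e6)
```

      In such a model the posterior mean gives the direction and size of each factor's association with migratory direction, and the diagonal of the posterior covariance gives its uncertainty; fitting the model separately to sites assigned by `qtp_region` would yield one set of coefficients per region.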

      References

      Chen, Y., Z. Gu, and X. Zhan. 2024. stemflow: A Python Package for Adaptive Spatio-Temporal Exploratory Model. Journal of Open Source Software 9:6158.

      Elith, J., C. H. Graham, R. P. Anderson, M. Dudík, S. Ferrier, A. Guisan, R. J. Hijmans, F. Huettmann, J. R. Leathwick, A. Lehmann, J. Li, L. G. Lohmann, B. A. Loiselle, G. Manion, C. Moritz, M. Nakamura, Y. Nakazawa, J. McC. M. Overton, A. Townsend Peterson, S. J. Phillips, K. Richardson, R. Scachetti-Pereira, R. E. Schapire, J. Soberón, S. Williams, M. S. Wisz, and N. E. Zimmermann. 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29:129-151.

      Fink, D., T. Auer, A. Johnston, V. Ruiz-Gutierrez, W. M. Hochachka, and S. Kelling. 2020. Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications 30:e02056.

      Fink, D., T. Damoulas, and J. Dave. 2013. Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. Pages 1284-1290 in Proceedings of the AAAI Conference on Artificial Intelligence.

      Gonzalez, A., J. M. Chase, and M. I. O'Connor. 2023. A framework for the detection and attribution of biodiversity change. Philosophical Transactions of the Royal Society B: Biological Sciences 378.

      Jia, Y., H. Wu, S. Zhu, Q. Li, C. Zhang, Y. Yu, and A. Sun. 2020. Cenozoic aridification in Northwest China evidenced by paleovegetation evolution. Palaeogeography, Palaeoclimatology, Palaeoecology 557:109907.

      Johnston, A., D. Fink, M. D. Reynolds, W. M. Hochachka, B. L. Sullivan, N. E. Bruns, E. Hallstein, M. S. Merrifield, S. Matsumoto, and S. Kelling. 2015. Abundance models improve spatial and temporal prioritization of conservation resources. Ecological Applications 25:1749-1756.

      Kumar, N., U. Gupta, Y. V. Jhala, Q. Qureshi, A. G. Gosler, and F. Sergio. 2020. GPS-telemetry unveils the regular high-elevation crossing of the Himalayas by a migratory raptor: implications for definition of a “Central Asian Flyway”. Scientific Reports 10:15988.

      Lewis, D. 1973. Counterfactuals. Oxford: Blackwell.

      Li, S.-F., P. J. Valdes, A. Farnsworth, T. Davies-Barnard, T. Su, D. J. Lunt, R. A. Spicer, J. Liu, W.-Y.-D. Deng, J. Huang, H. Tang, A. Ridgwell, L.-L. Chen, and Z.-K. Zhou. 2021. Orographic evolution of northern Tibet shaped vegetation and plant diversity in eastern Asia. Science Advances 7:eabc7741.

      Liu, D., G. Zhang, H. Jiang, and J. Lu. 2018. Detours in long-distance migration across the Qinghai-Tibetan Plateau: individual consistency and habitat associations. PeerJ 6:e4304.

      Martins, L. P., D. B. Stouffer, P. G. Blendinger, K. Böhning-Gaese, J. M. Costa, D. M. Dehling, C. I. Donatti, C. Emer, M. Galetti, R. Heleno, Í. Menezes, J. C. Morante-Filho, M. C. Muñoz, E. L. Neuschulz, M. A. Pizo, M. Quitián, R. A. Ruggera, F. Saavedra, V. Santillán, M. Schleuning, L. P. da Silva, F. Ribeiro da Silva, J. A. Tobias, A. Traveset, M. G. R. Vollstädt, and J. M. Tylianakis. 2024. Birds optimize fruit size consumed near their geographic range limits. Science 385:331-336.

      Phillips, S. J., R. P. Anderson, and R. E. Schapire. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190:231-259.

      Prins, H. H. T., and T. Namgail. 2017. Bird migration across the Himalayas : wetland functioning amidst mountains and glaciers. Cambridge University Press, Cambridge.

      Prosser, D. J., P. Cui, J. Y. Takekawa, M. Tang, Y. Hou, B. M. Collins, B. Yan, N. J. Hill, T. Li, Y. Li, F. Lei, S. Guo, Z. Xing, Y. He, Y. Zhou, D. C. Douglas, W. M. Perry, and S. H. Newman. 2011. Wild Bird Migration across the Qinghai-Tibetan Plateau: A Transmission Route for Highly Pathogenic H5N1. Plos One 6:e17622.

      Pu, Z., and Y. Guo. 2023. Autumn migration of black-necked crane (Grus nigricollis) on the Qinghai-Tibetan and Yunnan-Guizhou plateaus. Ecology and Evolution 13:e10492.

      Qu, Y., F. Lei, R. Zhang, and X. Lu. 2010. Comparative phylogeography of five avian species: implications for Pleistocene evolutionary history in the Qinghai-Tibetan plateau. Molecular Ecology 19:338-351.

      Wang, Y., C. Mi, and Y. Guo. 2020. Satellite tracking reveals a new migration route of black-necked cranes (Grus nigricollis) in Qinghai-Tibet Plateau. PeerJ 8:e9715.

      Yu, X., G. Song, H. Wang, Q. Wei, C. Jia, and F. Lei. 2024. Migratory flyways and connectivity of Brown Headed Gulls (Chroicocephalus brunnicephalus) revealed by GPS tracking. Global Ecology and Conservation 56:e03340.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, T. Ma, L.-X. Chen, and Z. Xing. 2014. Migration routes and stopover sites of Pallas’s Gulls Larus ichthyaetus breeding at Qinghai Lake, China, determined by satellite tracking. Forktail 30:104-108.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, Z. Xing, and F.-S. Li. 2011. Migration Routes and Stop-Over Sites Determined with Satellite Tracking of Bar-Headed Geese (Anser indicus) Breeding at Qinghai Lake, China. Waterbirds 34:112-116.

      Zhang, R., D. Jiang, C. Zhang, and Z. Zhang. 2022. Distinct effects of Tibetan Plateau growth and global cooling on the eastern and central Asian climates during the Cenozoic. Global and Planetary Change 218:103969.

      Zhao, T., W. Heim, R. Nussbaumer, M. van Toor, G. Zhang, A. Andersson, J. Bäckman, Z. Liu, G. Song, M. Hellström, J. Roved, Y. Liu, S. Bensch, B. Wertheim, F. Lei, and B. Helm. 2024. Seasonal migration patterns of Siberian Rubythroat (Calliope calliope) facing the Qinghai–Tibet Plateau. Movement Ecology 12:54.

    1. eLife Assessment

      This valuable study proposes a network implementation of the "re-aiming" learning strategy, which has been hypothesized to underlie brain-computer interface learning. Combining theoretical arguments, numerical simulations, and analysis of experimental data, the authors provide convincing evidence for their hypothesis. This paper will likely be of broad interest to the systems neuroscience community.

    2. Reviewer #1 (Public review):

      Summary:

      This study considers learning with brain-computer interfaces (BCIs) in nonhuman primates, and in particular, the high speed and flexibility with which subjects learn to control these BCIs.

      The authors raise the hypothesis that such learning is based on controlling a small number of input or control variables, rather than directly adapting neural connectivity within the network of neurons that drive the BCI. Adapting a small number of input variables would circumvent the issue of credit assignment in high dimensions and allow for quick learning, potentially using cognitive strategies ("re-aiming"). Based on a computational model, the authors show that such a strategy is viable in a number of experimental settings and reproduces previous experimental observations:

      (1) Differences in learning with decoders either within or outside of the neural manifold (the space spanned by the dominant modes of neural activity).

      (2) A novel, theory-based prediction on biases in BCI learning due to the positivity of neural firing rates, which is then confirmed in data from previous experiments.

      (3) An example of "illusory credit assignment": Changes in neurons' tuning curves depending on whether these neurons are affected by changes in the BCI decoder, even though learning only happens on the level of low-dimensional control variables.

      (4) A reproduction of results from operant conditioning of individual neurons, in particular, the observation that it is difficult to change the firing rates of neurons strongly correlated before learning in different directions (up vs down).

      Taken together, these observations yield strong evidence for the plausibility that subjects use such a learning strategy, at least during short-term learning.

      Strengths:

      Text and figures are clearly structured and allow readers to understand the main concepts well. The study presents a very clear and simple model that explains a number of seemingly disparate or even contradictory observations (neuron-specific credit assignment vs. low-dimensional, cognitive control). The predicted and tested bias due to positivity of firing rates provides a neat example of how such a theory can help understand experimental results. The idea that subjects first use a small number of command variables (those sufficient in the calibration task) and later, during learning, add more variables provides a nice illustration of the idea that learning takes place on multiple time scales, potentially with different mechanisms at play. On a more detailed level, the study is a nice example of closely matching the theory to the experiment, in particular regarding the modeling of BCI perturbations.

      Weaknesses:

      Overall, I find only two minor weaknesses. First, the insights of this study are, first and foremost, of feed-forward nature, and a feed-forward network would have been enough (and the more parsimonious model) to illustrate the results. While using a recurrent neural network (RNN) shows that the results are, in general, compatible with recurrent dynamics, the specific limitations imposed by RNNs (e.g., dynamical stability, low-dimensional internal dynamics) are not the focus of this study. Indeed, the additional RNN models in the supplementary material show that under more constrained conditions for the RNN (low-dimensional dynamics), using the input control alone runs into difficulties.

      Second, explaining the quantitative differences between the model and data for shifts in tuning curves seems to take the model a bit too literally. The model serves greatly for qualitative observations. I assume, however, that many of the unconstrained aspects of the model would yield quantitatively different results.

    3. Reviewer #2 (Public review):

      Summary:

      The paper proposes a model to explain the learning that occurs in brain-computer interface (BCI) tasks when animals need to adapt to novel BCI decoders. The model consists of a network formulation of the "re-aiming" learning strategy, which assumes that BCI learning does not modify the underlying neural circuitry, but instead occurs through a reorganization of existing neural activity patterns.

      The authors formalize this in a recurrent neural network (RNN) model, driven by upstream inputs that live in a low-dimensional space.

      They show that modelling BCI learning as reorganization of these upstream inputs can explain several experimental findings, such as the difference in the ability of animals to adapt to within vs outside-manifold perturbations, biases in the decoded behaviour after within-manifold perturbations, or qualitative changes in the neural responses observed during credit assignment rotation perturbations or operant conditioning of individual neurons.

      Overall, while the idea of re-aiming as a learning strategy has previously been proposed in the literature, the authors show how it can be formalized in a network model, which allows for more direct comparisons to experimental data.

      Strengths:

      The paper is very well written. The presentation of the model is clear, and the use of vanilla RNN dynamics driven by upstream inputs that are constant in time is consistent with the broader RNN modeling literature.

      The main value of the paper lies in the fact that it proposes a network implementation for a learning strategy that had been proposed previously. The network model has a simple form, but the optimization problem is performed in the space of inputs, which requires the authors to solve a nonlinear optimization problem in that space.
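      As a rough illustration of what optimizing in the space of inputs can look like, the sketch below searches over a handful of time-constant inputs to a fixed-weight RNN so that a linear readout hits a target. It is not the authors' implementation: the network size, the tanh nonlinearity, the random weights, the names `W_rec`, `W_in`, `D`, `final_activity` and `re_aim`, and the finite-difference search are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 2  # assumed sizes: 50 neurons, 2 command variables

# Fixed (non-plastic) weights; the random values are placeholders.
W_rec = 0.5 * rng.standard_normal((N, N)) / np.sqrt(N)
W_in = rng.standard_normal((N, K))
D = rng.standard_normal((2, N)) / np.sqrt(N)  # hypothetical decoder: activity -> 2D output


def final_activity(u, steps=200, dt=0.1):
    """Euler-integrate a vanilla RNN under a time-constant input u."""
    x = np.zeros(N)
    for _ in range(steps):
        x = x + dt * (-x + W_rec @ np.tanh(x) + W_in @ u)
    return np.tanh(x)  # tanh is an assumed nonlinearity


def readout_error(u, target):
    """Squared error between the decoded output and the target."""
    return float(np.sum((D @ final_activity(u) - target) ** 2))


def re_aim(target, iters=100, step=0.5, eps=1e-4):
    """Search over the K inputs only; all weights stay fixed."""
    u = np.zeros(K)
    err = readout_error(u, target)
    for _ in range(iters):
        # Central finite-difference gradient with respect to u.
        grad = np.array([
            (readout_error(u + eps * e, target)
             - readout_error(u - eps * e, target)) / (2 * eps)
            for e in np.eye(K)
        ])
        candidate = u - step * grad
        cand_err = readout_error(candidate, target)
        if cand_err < err:   # accept only steps that reduce the error
            u, err = candidate, cand_err
        else:                # otherwise shrink the step size
            step *= 0.5
    return u, err


# A target that is reachable by construction (generated from a known input).
target = D @ final_activity(np.array([1.0, -0.5]))
u_hat, err = re_aim(target)
print("error at u = 0:", readout_error(np.zeros(K), target))
print("error after re-aiming:", err)
```

      The point of the sketch is only that the search runs over K numbers rather than over the N×N recurrent weights; any optimizer could replace the finite-difference loop.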

      While some of the results (e.g., the fact that the model can adapt to within- but not outside-manifold perturbations) are to be expected based on the model assumptions, having a network model makes it possible to draw more direct and quantitative comparisons to experiments, to investigate analytically how much the dimension of the output is constrained by the input, and to make predictions that can be tested in data.

      The authors perform such comparisons across three different experiments. The results are clearly presented, and the authors show that they hold for various RNN connectivities.

      Weaknesses:

      Although the authors mention alternative models (e.g., based on synaptic plasticity in the RNN and/or input weights) that can explain the same experimental data, they do not provide any direct comparisons to those models.

      Thus, the main argument that the authors have in favor of their model is the fact that it is more plausible because it relies on performing the optimization in a low-dimensional space. It would be nice to see more quantitative arguments for why the re-aiming strategy may be more plausible than synaptic plasticity (either by showing that it explains data better, or explaining why it may be more optimal in the context of fast learning).

      In particular, the authors model the adaptation to outside-manifold perturbations (OMPs) through a "generalized re-aiming strategy". This assumes the existence of additional command variables, which are not used in the original decoding task, but can then be exploited to adapt to these OMPs. While this model is meant to capture the fact that optimization is occurring in a low-dimensional subspace, the fact that animals take longer to adapt to OMPs suggests that WMPs and OMPs may rely on different learning mechanisms, and that synaptic plasticity may actually be a better model of adaptation to OMPs. It would be important to discuss how exactly generalized re-aiming would differ from allowing plasticity in the input weights, or in all weights in the network. Do those models make different predictions, and could they be differentiated in future experiments?

    4. Author response:

      Reviewer #1 (Public Review):

      Overall, I find only two minor weaknesses. First, the insights of this study are, first and foremost, of feed-forward nature, and a feed-forward network would have been enough (and the more parsimonious model) to illustrate the results. While using a recurrent neural network (RNN) shows that the results are, in general, compatible with recurrent dynamics, the specific limitations imposed by RNNs (e.g., dynamical stability, low-dimensional internal dynamics) are not the focus of this study. Indeed, the additional RNN models in the supplementary material show that under more constrained conditions for the RNN (low-dimensional dynamics), using the input control alone runs into difficulties.

      We thank the reviewer for raising this important point. While we agree that recurrent dynamics were not the focus of this study, we would like to point out that 1) dynamics, of some kind, are necessary to simulate the decoder fitting process and 2) recurrent neural networks (RNNs) are valuable for obtaining general insights on how biological constraints shape the reachable manifold:

      (1) To simulate the decoder fitting process, we had to simulate neural activity during the so-called “calibration task”. Some dynamics to these responses are necessary to produce a population response with dimensionality resembling what was found in experiments (10 dimensions). Moreover, dynamics are necessary to create a common direction of high variance across population responses to the calibration task stimuli (see Supplementary Figure 2a and surrounding discussion), which is necessary to reproduce the biases in readouts demonstrated in Figure 4 (as many within-manifold decoder perturbations are aligned with it; Supplementary Figure 2b).

      Because feed-forward networks lack dynamics, reproducing our results with a feed-forward network would require using an input with dynamics. Rather than making an arbitrary choice for these input dynamics, we chose to keep the input static and instead generate the dynamics with an RNN, which is in line with recent models of motor cortex.

      We agree, however, that this is an important point worth clarifying in the manuscript. In our revision we will aim to add a demonstration of how to reproduce a subset of our results with a feed-forward network and a dynamic input.

      (2) While we agree that RNNs impose certain limitations over feed-forward networks, we see these limitations as an advantage because they provide a framework for understanding the structure of the reachable manifold in terms of biological constraints. For example, our simulations in Supplementary Figure 1 show that the dimensionality of the reachable manifold is highly dependent on recurrent connectivity: inhibition-stabilized connectivity makes it higher-dimensional whereas task-specific optimized connectivity makes it lower-dimensional. Such insights are valuable to understand the broader implications and experimental predictions of the re-aiming strategy.

      Because feed-forward networks are untied from the reality of recurrent cortical circuitry, they cannot be characterized in terms of such biological constraints. For instance, as the reviewer points out, dynamical stability is not a well-defined property of feed-forward networks. Such models therefore cannot provide any insight into how the biological constraint of dynamical stability could influence the reachable manifold (which we show it does in Figure 5b). Relatedly, feed-forward networks cannot be optimized to solve complex spatiotemporal tasks like the ballistic reaching task we used for our task-optimized RNN (Supplementary Figure 1, right column), so cannot be used to understand how such behavioral constraints would influence the reachable manifold.

      We agree that these reasons for using RNNs are subtle and are left implicit in the current text. We will add a discussion point clarifying them in our revision.

      Second, explaining the quantitative differences between the model and data for shifts in tuning curves seems to take the model a bit too literally. The model serves greatly for qualitative observations. I assume, however, that many of the unconstrained aspects of the model would yield quantitatively different results.

      We completely agree: our model is best used to provide a qualitative description of the capabilities of the re-aiming strategy. We will be sure to revise our manuscript to keep such quantitative comparisons at a minimum.

      Reviewer #2 (Public Review):

      Although the authors mention alternative models (e.g., based on synaptic plasticity in the RNN and/or input weights) that can explain the same experimental data, they do not provide any direct comparisons to those models. Thus, the main argument that the authors have in favor of their model is the fact that it is more plausible because it relies on performing the optimization in a low-dimensional space. It would be nice to see more quantitative arguments for why the re-aiming strategy may be more plausible than synaptic plasticity (either by showing that it explains data better, or explaining why it may be more optimal in the context of fast learning).

      We agree this remains a limitation of our study. To contrast our re-aiming model with models of synaptic plasticity (in the input and/or recurrent weights), we have included substantial discussion of these alternative models in two sections of the manuscript:

      • Introduction, where we elaborate on the argument that synaptic plasticity requires solving an exceptionally difficult optimization problem in high dimensions

      • Discussion section “The role of synaptic plasticity in BCI learning”, where we review a number of synaptic plasticity models and experimental results they can account for

      We fully agree that more quantitative comparisons remain an important follow-up to this line of research. However, it is worth noting that there are many such models out there. Moreover, as is the case with many computational models, the results one can achieve with any given model can be highly sensitive to a number of different hyperparameters (e.g. learning rates). We therefore feel that a more rigorous comparison requires deeper study and is out of scope of this manuscript.

      In particular, the authors model the adaptation to outside-manifold perturbations (OMPs) through a "generalized re-aiming strategy". This assumes the existence of additional command variables, which are not used in the original decoding task, but can then be exploited to adapt to these OMPs. While this model is meant to capture the fact that optimization is occurring in a low-dimensional subspace, the fact that animals take longer to adapt to OMPs suggests that WMPs and OMPs may rely on different learning mechanisms, and that synaptic plasticity may actually be a better model of adaptation to OMPs. 

      We thank the reviewer for raising this question. We agree that the fact that animals take longer to adapt to OMPs suggests that the underlying learning strategy is somehow different. But the argument we try to make in this section of the paper is that it in fact does not require an entirely different mechanism. Our simulations show that the same mechanism of re-aiming can suffice to learn OMPs, but it simply requires re-aiming in the larger space of all command variables available to the motor system (rather than just the two command variables evoked by the calibration task). Because this is a much higher-dimensional search space (10-20 vs. 2 dimensions, which is a substantial difference due to the curse of dimensionality), we argue that learning should be slower, even though the mechanism (i.e. re-aiming) is the same.

      This is an important and somewhat surprising takeaway from these simulations, which we will try to bring up more explicitly and clearly in the revision.
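      A small worked example of the dimensionality argument above: if one assumes, purely for illustration, that the search were an exhaustive grid with 10 candidate values per command variable, the number of candidates grows exponentially with the number of variables. The grid search and the helper name `grid_size` are assumptions of this sketch, not a claim about how subjects or the model actually search.

```python
def grid_size(points_per_axis: int, n_dims: int) -> int:
    """Number of candidates in an exhaustive grid over n_dims command variables."""
    return points_per_axis ** n_dims


# Compare the 2-variable calibration-task space with 10- and 20-variable spaces.
for n_dims in (2, 10, 20):
    print(f"{n_dims:>2} command variables: {grid_size(10, n_dims):.1e} candidates")
```

      Under this assumption, moving from 2 to 10 command variables multiplies the number of candidates by 10^8, which conveys why the same mechanism could be much slower in the larger space.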

      It would be important to discuss how exactly generalized re-aiming would differ from allowing plasticity in the input weights, or in all weights in the network. Do those models make different predictions, and could they be differentiated in future experiments?

      They do in fact make different predictions, and we thank the reviewer for asking and pointing out the lack of discussion of this point. The key difference between these two learning mechanisms is demonstrated in Figure 5b: under generalized re-aiming, there is a fundamental limit to the set of activity patterns one can learn to produce in the brain-computer interface (BCI) learning task. This is quantified in that analysis by the asymptotic participation ratio of the reachable manifold as K increases, which indicates that there is a limited ~12-dimensional subspace that the reachable manifold can occupy. The specific orientation of this subspace is determined by the (recurrent and input) connectivity of the recurrent neural network. With synaptic plasticity in any of the weight matrices (W_rec, W_in, U), this subspace could be re-oriented in any arbitrary direction. Our theory of “generalized re-aiming” therefore predicts that the reachable manifold is 1) constrained to a low-dimensional subspace and 2) not modified when learning BCIs with outside-manifold perturbations.

      Experimentally testing this would require a within-/outside- manifold perturbation BCI learning task akin to that of Sadtler et al, but where the “intrinsic manifold” is measured from population responses evoked by every possible motor command so as to entirely contain the full reachable manifold at max K. This would require measuring motor cortical activity during naturalistic behavior under a wide range of conditions, rather than just in response to the 2D cursor movements on the screen used in the calibration task of the original study. In this case, learning outside-manifold perturbations would require re-orienting the reachable manifold, so a pure generalized re-aiming strategy would fail to learn them. Synaptic plasticity, on the other hand, would not.

      We will be sure to elaborate further on this claim in the revised manuscript.
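      For readers unfamiliar with the participation ratio mentioned in this response, the sketch below uses its standard definition: the squared sum of the covariance eigenvalues divided by the sum of their squares. We assume this is the definition meant here. The synthetic data (12 latent dimensions embedded in 100 hypothetical neurons) are chosen only to echo the ~12-dimensional figure quoted above and are not taken from the paper.

```python
import numpy as np


def participation_ratio(activity):
    """Participation ratio of a (samples x neurons) activity matrix.

    PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the covariance.
    """
    cov = np.atleast_2d(np.cov(activity, rowvar=False))
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # remove tiny negative round-off
    return float(eig.sum() ** 2 / np.sum(eig ** 2))


rng = np.random.default_rng(0)
# 12 orthonormal directions in a 100-neuron space (illustrative sizes).
basis, _ = np.linalg.qr(rng.standard_normal((100, 12)))
latent = rng.standard_normal((5000, 12))   # equal variance along each direction
activity = latent @ basis.T                # shape (5000, 100)

pr = participation_ratio(activity)
print("participation ratio:", pr)
```

      With equal variance along 12 directions the ratio comes out close to 12, and it drops toward 1 as variance concentrates in a single direction.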

    1. eLife Assessment

      This important study combines an innovative experimental approach with mathematical modeling to demonstrate that genes separated by strong topological boundaries can exhibit coordinated transcriptional bursting, providing new insights into how regulatory information is transmitted across the genome. The evidence is solid within the studied locus, but the interpretation and generality of the findings would be strengthened by additional validation using simulated data and broader application beyond a single genomic region. This work will be of interest to cell biologists and biophysicists working on transcription and chromatin.

    2. Reviewer #1 (Public review):

      In this manuscript, Kerlin et al. introduce a novel and conceptually important framework for analyzing allelic transcriptional heterogeneity using single-molecule microscopy. The authors aim to distinguish regulatory interactions occurring in cis (between genes on the same allele) from those in trans (between alleles), thereby extending classical models of transcriptional noise into the spatial and allelic domain. They apply this approach to three genes within the FOS locus in MCF7 cells, under both basal and estrogen-induced conditions, and report distinct patterns of transcriptional coordination that depend on gene proximity and chromatin insulation.

      A major strength of this work lies in its innovative methodology and the clarity with which the analytical framework is described. The authors effectively build on foundational ideas in gene expression variability and adapt them to resolve a previously underexplored question - how nearby genes on the same allele may influence each other's transcriptional activity. The imaging data are of high quality, the mathematical derivation is comprehensive, and the overall presentation is strong. The study makes a compelling argument for the value of allele-resolved analysis, highlighting that failure to account for allelic and chromatin context may lead to inaccurate or incomplete interpretations of regulatory mechanisms.

      That said, the scope of the data is currently limited to a single locus in one cell type. As such, some of the general conclusions, particularly those in the abstract and discussion, may be overstated. The evidence supports the findings within the FOS locus, but it remains unclear whether the observed patterns apply broadly across the genome. The utility and generality of the method would be significantly strengthened by additional validation.

      One specific area where the analysis could be improved is through the inclusion of randomized control comparisons. For example, the results presented in Figure 2D and analyzed in Figure 3 could be compared against randomized datasets to establish a baseline of what would be expected by chance. This would help determine the significance of the observed correlations and strengthen confidence in the model's specificity.
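      One simple form such a randomized control could take is a permutation test: shuffle one gene's per-allele measurements to break the pairing with the other gene, and compare the observed correlation to the resulting null distribution. The sketch below is a generic illustration with synthetic numbers; the function name `permutation_pvalue`, the variables `gene_a` and `gene_b`, and the choice of Pearson correlation are assumptions, not the manuscript's analysis.

```python
import numpy as np


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test for the Pearson correlation between x and y."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    # Null distribution: correlation after shuffling y, which destroys the pairing.
    null = np.array([
        np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)
    ])
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, p_value


# Synthetic example: per-allele activity of two hypothetical genes.
rng = np.random.default_rng(1)
gene_a = rng.standard_normal(500)
gene_b = gene_a + rng.standard_normal(500)  # correlated by construction
r, p = permutation_pvalue(gene_a, gene_b)
print(f"observed r = {r:.2f}, permutation p = {p:.4f}")
```

      The same shuffling can be restricted to alleles within a condition (for example, basal or induced) if the null hypothesis should preserve condition-level differences.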

      Additionally, the framework should be tested on simulated datasets with a known ground truth to evaluate the robustness of its assumptions and the reliability of its outputs. Testing the approach against existing allele-specific single-cell datasets from other studies would also help assess its generalizability. While the authors suggest the framework could be extended to transcriptomics and spatial omics, these possibilities are not explored in the current study, and future work in this direction should be clearly marked as such.

      In summary, this manuscript presents a methodologically rigorous and biologically significant advance in the study of gene regulation. The approach fills an important gap by enabling allele-resolved, locus-specific analysis of transcriptional coordination, with implications for both basic science and clinical applications. The conclusions are well supported within the studied context, but further validation - particularly through randomized data comparison, simulations, and broader application - would be valuable in assessing the broader utility of the framework.

    3. Reviewer #2 (Public review):

      Summary:

      I am not familiar with mathematical modeling of gene expression, so I will evaluate this manuscript solely from a biological point of view.

      Kerlin et al. combined single-molecule RNA FISH and mathematical modeling approaches to quantitatively characterize changes in the transcriptional dynamics of three neighboring genes at the FOS locus in response to estradiol (E2) stimulation. They showed that the neighboring JDP2 and BATF genes, located on the same side of the TAD boundary, exhibit highly coordinated bursting dynamics. While FOS and JDP2/BATF are strongly insulated (~7:1 intra-to-inter-domain contact ratio) by the TAD boundary, correlated bursting dynamics were still observed between these gene pairs, suggesting that enhancers can bypass strong insulation sites. The authors proposed that burst co-occurrence arises from the activity of ERα-bound enhancers at the locus. They also proposed that the burst size correlation between two neighboring genes located on the same side of the TAD boundary results from local spreading of histone marks.

      Strengths:

      The direct visualization of coordinated transcriptional bursting across a strong insulation site is novel. This finding was carefully analyzed using the mathematical framework developed by the authors.

      Weaknesses:

      Several models were proposed based on single-molecule RNA FISH analysis of the FOS locus, but the generality of these findings remains uncertain. The proposed models were not directly tested through follow-up experiments, leaving the authors' conclusions largely speculative.

    4. Reviewer #3 (Public review):

      Summary:

      Kerlin et al. combined single-molecule RNA FISH with oligonucleotide-based DNA FISH to directly examine the transcriptional activities of three adjacent genes at individual alleles in MCF7 cells. Importantly, they provided quantitative methods to resolve allele-specific (cis) and cell-to-cell (trans) variation and quantified the contribution of burst co-occurrence and burst size, which may help to more accurately analyze transcription coregulation. They found that transcriptional variability is largely gene-autonomous, and by disentangling burst co-occurrence and burst size after E2 induction, they proposed two distinct mechanisms of local gene regulation.

      Strengths:

      (1) Innovative Research Methods: Successfully integrates single-molecule RNA FISH with oligonucleotide-based DNA FISH to directly image the transcriptional activities of three adjacent genes at individual alleles. This enables the observation of transcriptional dynamics more precisely and provides a powerful tool for studying gene regulation.

      (2) Novel Data Analysis Approaches: Develops two new analysis methods to dissect the sources of gene activity (co)variation. One approach separates allele-extrinsic, allele-intrinsic, and gene-autonomous components, and the other quantifies the contributions of burst co-occurrence and burst size correlations. These methods help to more accurately analyze transcriptional correlations between genes and reveal potential regulatory mechanisms.
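      To give a feel for this kind of variance partitioning, the sketch below performs a simpler two-way split in the spirit of classic intrinsic/extrinsic noise analyses: variation shared by both alleles of a cell is estimated from the covariance between alleles, and allele-private variation from half the mean squared difference between them. It does not reproduce the manuscript's three-way decomposition; the function `two_way_split` and the simulated values are assumptions for illustration only.

```python
import numpy as np


def two_way_split(allele1, allele2):
    """Split variance of one gene into cell-shared and allele-private parts.

    allele1, allele2: activity of the same gene on the two alleles of each cell.
    """
    a1 = np.asarray(allele1, dtype=float)
    a2 = np.asarray(allele2, dtype=float)
    shared = np.cov(a1, a2)[0, 1]            # covariance between alleles of a cell
    private = 0.5 * np.mean((a1 - a2) ** 2)  # half the mean squared allele difference
    return shared, private


# Simulated ground truth: cell-wide variance 4, allele-private variance 1.
rng = np.random.default_rng(1)
n_cells = 20000
cell_state = rng.normal(0.0, 2.0, n_cells)
a1 = 10 + cell_state + rng.normal(0.0, 1.0, n_cells)
a2 = 10 + cell_state + rng.normal(0.0, 1.0, n_cells)

shared, private = two_way_split(a1, a2)
print(f"shared ~ {shared:.2f}, private ~ {private:.2f}")
```

      Because the simulation has a known ground truth, the same script doubles as a check that the estimators recover the variances that were put in.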

      Weaknesses:

      Biological Insights: The findings challenge the traditional view of contact insulation sites as strict regulators of gene coregulation and suggest two distinct coregulatory mechanisms influenced by local chromosome folding. However, expression activity of multiple genes is differentially correlated at the population-level or cell-level versus single-allele-level. More in-depth analysis is needed for further biological insights.

    1. eLife Assessment

      This study presents a valuable finding on the intersection between tuberculosis and diabetes and the impact on immune responses, notably T cell and myeloid cell responses. The single-cell data collected and analyzed are convincing and provide a rich dataset to develop a more detailed understanding of cellular responses during Mtb infection of diabetic mice. Some of the mechanistic claims are incomplete, as there are no experiments performed to clearly define a role for IL-16 or IL-17 in disease. Inclusion of analysis of human samples would have strengthened the conclusions in the paper for translational impact, as well as the inclusion of a DM group alone in addition to DM-TB vs TB in some of the experiments.

    2. Reviewer #1 (Public review):

      Summary:

      The authors hypothesized that the lung immune landscape in mice with diabetes and TB comorbidity is different from that of mice with DM-only or TB-only, or healthy mice. Systematically, the authors established the 'basal' lung immune landscape in DM or healthy animals before infection with Mycobacterium tuberculosis, allowing them to tease out changes in cell types with TB infection and focused subsequent studies on DM-TB and TB comparisons. The authors chose day 21 post-Mtb infection as the point of analysis since this is the peak of immune responses to Mtb infection as per an earlier study (Das et al. 2021). As expected, the authors found differences in the cellular composition of the DM mice with or without TB or TB-only mice, including reduced IFNg response, elevated Th17 cells, increased IL-16 signaling, and altered naive CD4+ and naive CD8+ T cell numbers. The authors have used a series of techniques for methodological and analytical approaches to identify potential pathways that can be targeted for therapies against DM-TB. However, the authors have failed to propose a model that could explain their observations at the time point tested, lowering enthusiasm for the conclusions of the study.

      Strengths:

      The strength of the study is the use of a validated model of mouse DM-TB and a meticulous approach to establish and define a 'baseline' lung cellular landscape in DM and healthy mice before Mtb infection. The use of an up-to-date analytical pipeline by the authors is commendable.

      The literature review is exhaustive, and the authors have put considerable effort into borrowing from other conditions where the identified genes or pathways have been implicated.

      Weaknesses:

      The key limitations of the study include:

      (1) The authors have failed to link a specific cell type or process, namely Th17 cell activation, to IL-16 signaling as the drivers of the conditions that contribute significantly to the dysregulated immune responses in DM-TB. For context, naive CD4+ and naive CD8+ T cells cannot be considered "specific cell types" because they have no determined cell fate; they could mature into any other cell type: cytotoxic T cells, Th1, or even Th17 or Tc17 cells.

      (2) Since day 21 post-Mtb infection is an earlier timepoint, the authors should have provided data on cellular composition in the experiments in Figure 7. From the work of Kornfeld and colleagues, there is delayed cell recruitment in DM-TB, but it is likely that later on, due to persistent inflammation (from chronic hyperglycemia), DM-TB mice have more or equal cell numbers in the lung. Anecdotally, the authors found differences in CFU at a later time point but not at 21 days post-infection. This fits with human studies where there is a higher prevalence of cavities in DM-TB compared to TB-only patients. The authors missed the opportunity to clarify this important point by excluding cellular data from the 56-day post-infection experiments.

      (3) The power of the study would be improved by the direct comparisons of three groups: DM vs DM-TB vs TB to recapitulate the human populations and allow the authors to address the question of 'why does DM worsen TB outcome?'. The current analysis of DM-TB vs TB is not fit for this because the TB is on a healthy background, while DM-TB is a result of two conditions that independently perturb immune homeostasis.

    3. Reviewer #2 (Public review):

      Summary:

      While immune cell distribution in tuberculosis (TB) is well documented, research on its disruption in diabetes-tuberculosis (DM-TB) comorbidity remains limited. In this study, Chaudhary et al. explore immune cell perturbations in DM-TB using single-cell RNA sequencing (scRNA-seq), providing key insights into the impaired host immune response. By elucidating the molecular mechanisms underlying immune dysfunction in DM-TB, this study addresses an important knowledge gap. The study demonstrates that diabetes impairs lung immune cell infiltration and contributes to a dampened immune response against Mycobacterium tuberculosis. Reduced Th1 and M1 macrophage populations indicate a compromised ability to mount an effective pro-inflammatory response, which is essential for TB control. The observed increase in IL-16 signaling and reduction in TNF and IFN-II responses suggest a shift toward a more immunosuppressive or dysregulated inflammatory state. The interplay between chronic inflammation, hyperglycemia, and dyslipidemia in diabetes further exacerbates immune dysfunction, reinforcing the idea that metabolic disorders significantly impact TB pathogenesis.

      Strengths:

      This well-designed study uses robust methodology and well-executed experiments, and the manuscript is well written. The use of scRNA-seq is a notable strength, offering high-resolution analysis of immune cell heterogeneity in the lung environment. Additionally, the study corroborates its findings in a long-term infection model, demonstrating that chronic M. tuberculosis (H37Rv) infection in diabetic mice leads to increased bacterial burden and worsened tissue pathology.

      Weaknesses:

      (1) The study focuses on CD3⁺ and CD11c⁺ cells but does not extensively examine other key immune players that may contribute to DM-TB pathogenesis. Given that diabetes affects multiple immune compartments, a broader immune profiling approach would provide a more comprehensive understanding.

      (2) While the study identifies increased IL-16 signaling and reduced TNF/IFN-II responses, the precise molecular mechanisms driving these changes remain unclear. Further investigation into metabolic-immune crosstalk (e.g., how hyperglycemia affects immune cell differentiation and cytokine secretion) would strengthen the mechanistic depth of the findings.

      (3) The study suggests targeting IL-16 and Th17 cells as potential therapeutic strategies; however, no experimental validation (e.g., testing IL-16 inhibitors in DM-TB models) is provided. Validating these interventions would enhance their translational relevance.

      (4) Incorporating clinical samples (e.g., PBMCs from DM-TB patients) could help bridge the gap between murine and human studies, offering more translational insights into disease mechanisms.

      Overall, this study provides valuable findings, but addressing these concerns would further strengthen its impact on understanding DM-TB immunopathogenesis.

    1. eLife Assessment

      This valuable study reports the conservation of sperm-egg envelope binding by demonstrating successful recognition of the micropyle in fish eggs by the mouse sperm. However, the evidence supporting the conclusions drawn remains incomplete. In particular, the proposed specific role of CatSper in micropyle recognition and passage is not fully demonstrated. This study will be of interest to reproductive biologists and clinicians studying the biology of fertilization and fertility.

    2. Reviewer #1 (Public review):

      Summary:

      The paper is well written and investigates the cross-species insemination of fish eggs with mouse sperm. I have a few major and minor comments.

      Strengths:

      The experiments are well executed and could provide valuable insights into the complex mechanisms of fertilization in both species. I found the information presented to be very interesting.

      Weaknesses:

      The rationale of some of the experiments is not well defined.

      Major Comments:

      (1) Figure 5: I do not understand the rationale for performing experiments using CatSper-null sperm and CD9-null oocytes. It is well established that CatSper-null sperm are unable to penetrate the zona pellucida (ZP), so the relevance of this approach is unclear.

      (2) Micropyle penetration and sperm motility: CatSper-null sperm are reportedly unable to cross the micropyle, but this could be due to their reduced motility rather than a lack of hyperactivation per se. Were these experiments conducted using capacitated or non-capacitated spermatozoa? What was the observed motility of CatSper-null sperm during these assays? Clarifying these conditions is essential to avoid drawing incorrect conclusions from the results.

      (3) Rheotaxis and micropyle navigation: Previous studies have shown that CatSper-null sperm fail to undergo rheotaxis. Could this defect be related to their inability to locate and penetrate the micropyle? Exploring a potential shared mechanism could be informative.

      (4) Lines 61-74: This paragraph omits important information regarding acrosomal exocytosis, which occurs prior to sperm-egg fusion. Including this detail would strengthen the discussion.

    3. Reviewer #2 (Public review):

      Summary:

      Garibova et al. investigated the conservation of sperm recognition and interaction with the egg envelope in two groups of distantly related animals: mammals (mouse) and fish (zebrafish). Previous work and key physiological differences between these two animal groups strongly suggest that mouse sperm would be incapable of interaction with the zebrafish egg envelope (chorion) and its constituent proteins, though homologous to the mammalian zona pellucida (ZP). Indeed, the authors showed that mouse sperm do not bind recombinant zebrafish ZP proteins nor the intact chorion. Surprisingly, however, mouse sperm are able to locate and bind to the zebrafish micropyle, a specialized canal within the chorion that serves as the egg's entry point for sperm. This study suggests that sperm attraction to the egg might be highly conserved from fish to mammals and depends on the presence of a still unknown glycosylated protein within the micropyle. The authors further demonstrate that mouse sperm are able to enter the micropyle and accumulate within the intrachorionic space, potentially through a CatSper-dependent mechanism.

      Strengths:

      The authors convincingly demonstrate that mouse sperm do not bind zebrafish ZP proteins or the chorion. Furthermore, they make the interesting observation that mouse sperm are able to locate and enter the zebrafish micropyle in an MP-dependent manner, which is quite unexpected given the large evolutionary distance between these species, the many physiological differences between mouse and zebrafish gametes, and the largely different modes of both fertilization and reproduction in these species. This may indicate that the sperm chemoattractant in the egg is conserved between mammals and fish; however, whether zebrafish sperm are attracted to mouse eggs was not tested.

      Weaknesses:

      The key weakness of this study lies in the rationale behind the overall investigation. In mammals, the zona pellucida (ZP) has been implicated in binding sperm in a taxon-specific manner, such that human sperm are incapable of binding the mouse ZP. Indeed, work by the corresponding author showed that this specificity is mediated by the N-terminal region of the ZP protein ZP2 (Avella et al., 2014). The N-termini of human and mouse ZP2 share 48% identity, which is higher than the overall identity between mouse and zebrafish ZP2, with the latter ortholog entirely lacking the N-terminal domain that is essential for sperm binding to the ZP. Given this known specificity for mouse vs. human sperm-ZP binding, it does not follow that mouse sperm would bind ZP proteins from not only a species that is much more distantly related, but also one that is not even a mammal, the zebrafish. Furthermore, the fish chorion does not play a role in sperm binding at all, while the mammalian ZP can bind sperm at any location. On the contrary, the zebrafish chorion prevents polyspermy by limiting sperm entry to the single micropyle.

      In addition, though able to provide some information regarding the broad conservation of sperm-egg interaction mechanisms, the biological relevance of these findings is difficult to describe. Fish and mammals are not only two very distinct and distantly related animal groups, but also employ opposite modes of fertilization and reproduction (external vs. internal, oviparous vs viviparous). Fish gametes interact in a very different environment compared to mammals and lack many typically mammalian features of fertilization (e.g., sperm capacitation, presence of an acrosome, interaction with the female reproductive tract), making it difficult to make any physiologically relevant claims from this study. While this study may indicate conserved mechanisms of sperm attraction to the egg, the identity of the molecular players involved is not investigated. With this knowledge, the reader is forced to question the motivation behind much of the study.

      During fertilization in fish, the sperm enters the micropyle and subsequently, the egg, as it is simultaneously activated by exposure to water. During egg activation, the chorion lifts as it separates from the egg and fills with water. This mechanism prevents supernumerary sperm from entering the egg after the successfully fertilizing sperm has bound and fused. In this study, the authors show that mouse sperm enter the micropyle and accumulate in the intrachorionic space. Whether any sperm successfully entered the egg is not addressed, and the status of egg activation is not reported. In Supplementary Videos 3-4, the egg shown has been activated for some time, as evident by the separation of yolk and cytoplasm, yet the chorion is only partially expanded (likely due to mouse IVF conditions). How multiple sperm were able to enter the micropyle but presumably not the egg is not addressed, yet this suggests that the zebrafish mechanism of blocking polyspermy (fertilization by multiple sperm) is not effective for mouse sperm or is rendered ineffective due to mouse IVF conditions. The authors do not discuss these observations in the context of either species' physiological process of fertilization, highlighting the lack of biological context in interpreting the results.

      The authors further show that the zebrafish micropyle does not trigger the acrosome reaction in mouse sperm. Whether the acrosome reacts is not correlated with a sperm's ability to cross the micropyle opening, as both acrosome-intact and acrosome-reacted sperm were observed within the intrachorionic space. While the acrosome reaction is a key event during mammalian fertilization and is required for sperm to fertilize the egg, zebrafish sperm do not contain an acrosome. Thus, these results are particularly difficult to interpret biologically, bringing into question whether this observation has biological relevance or is a byproduct of egg activation/chorion lifting that indirectly draws sperm into the chorion.

      The final experiments regarding CatSper1's role in mediating mouse sperm entry into the micropyle/chorion are not convincing. As no molecular interactions are described or perturbed, the reader cannot be sure whether the sperm's failure to enter is due to signaling via CatSper1 or whether the overall failure to undergo hyperactivation limits sperm motility such that the mutant sperm can no longer find and enter the zebrafish micropyle. Indeed, in Figure 5E, no CatSper1 mutant sperm are visible near any part of the egg, suggesting that overall motility is impaired, and this is not a phenotype specific to interactions with the micropyle.

    1. eLife Assessment

      This fundamental study explores a novel cellular mechanism underlying the degeneration of locus coeruleus neurons during chronic restraint stress. The evidence supporting the overexcitation of LC neurons after chronic stress is compelling. However, to fully support the broad implications for LC degeneration and Alzheimer's disease, the study would benefit from stronger causal integration and validation in age-relevant models.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates how chronic stress may contribute to LC dysfunction in AD by examining the mechanisms underlying NA accumulation and α2A-AR internalization. Using electrophysiological recordings and molecular analyses, the authors propose that stress-induced receptor internalization impairs autoinhibition, leading to excessive NA accumulation and increased MAO-A activity. The findings have potential implications for understanding the progression of AD-related neurodegeneration and targeting noradrenergic dysfunction as a therapeutic strategy.

      Strengths:

      (1) The study integrates electrophysiology and molecular approaches to explore the mechanistic effects of chronic stress on LC neurons.

      (2) The evidence supporting NA accumulation and α2A-AR internalization as contributing factors to LC dysfunction is novel and relevant to AD pathology.

      (3) The electrophysiological findings, particularly the loss of spike-frequency adaptation and reduction in GIRK currents, provide functional insights into stress-induced changes in LC activity.

      Weaknesses:

      (1) The manuscript's logical flow is challenging and hard to follow, and key arguments could be more clearly structured, particularly in transitions between mechanistic components.

      (2) The causality between stress-induced α2A-AR internalization and the enhanced MAO-A remains unclear. Direct experimental evidence is needed to determine whether α2A-AR internalization itself or Ca2+ drives MAO-A activation, and how they activate MAO-A should be considered.

      (3) The connection between α2A-AR internalization and increased cytosolic NA levels lacks direct quantification, which is necessary to validate the proposed mechanism.

      (4) The chronic stress model needs further validation, including measurements of stress-induced physiological changes (e.g., corticosterone levels) to rule out systemic effects that may influence LC activity. Additional behavioral assays for spatial memory impairment should also be included, as a single behavioral test is insufficient to confirm memory dysfunction.

      (5) Beyond β-arrestin binding, the role of alternative internalization pathways (e.g., phosphorylation, ubiquitination) in α2A-AR desensitization should be considered, as current evidence is insufficient to establish a purely Ca²⁺-dependent mechanism.

      (6) NA leakage for free NA accumulation is also influenced by NAT or VMAT2. Please discuss the potential role of VMAT2 in NA accumulation within the LC in AD.

      (7) Since the LC is a small brain region, proper staining is required to differentiate it from surrounding areas. Please provide a detailed explanation of the methodology used to define LC regions and how LC neurons were selected among different cell types in brain slices for whole-cell recordings.

      Impact:

      This study provides valuable insights into the impact of chronic stress on LC function and its relevance to AD pathogenesis. The proposed mechanism linking NA dysregulation and receptor internalization may have implications for developing therapeutic strategies targeting the noradrenergic system in neurodegenerative diseases. However, additional validation is needed to strengthen the mechanistic claims before the findings can be fully integrated into the field.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the mechanism by which chronic stress induces locus coeruleus (LC) neuron degeneration. The authors demonstrate that chronic stress leads to internalization of α2A-adrenergic receptors (α2A-ARs) on LC neurons, causing increased cytosolic noradrenaline (NA) accumulation and subsequent production of the neurotoxic metabolite DOPEGAL via monoamine oxidase A (MAO-A). The study suggests a mechanistic link between stress-induced α2A-AR internalization, disrupted autoinhibition, elevated NA metabolism, asparagine endopeptidase (AEP) activation, and Tau pathology relevant to Alzheimer's disease (AD). The conclusions of this paper are mostly well supported by data, but some aspects of image acquisition need to be extended.

      Strengths:

      This study clearly demonstrates the effects of chronic stimulation on the excitability of LC neurons using electrophysiological techniques. It also elucidates the role of α2-adrenergic receptor (α2-AR) internalization and the associated upstream and downstream signaling pathways of GIRK1 using a range of pharmacological agents, highlighting the innovative nature of the work.

      Additionally, the study identifies the involvement of the MAO-A-DOPEGAL-AEP pathway in this process. The topic is timely, the proposed mechanistic pathway is compelling, and the findings have translational relevance, particularly regarding therapeutic strategies targeting α2A-AR internalization in neurodegenerative diseases.

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a technically impressive data set showing that repeated excitation or restraint stress internalises somatodendritic α2A-adrenergic autoreceptors (α2A-ARs) in locus coeruleus (LC) neurons. Loss of these receptors weakens GIRK-dependent autoinhibition, raises neuronal excitability, and is accompanied by higher MAO-A, DOPEGAL, AEP, and tau N368 levels. The work combines rigorous whole-cell electrophysiology with barbadin-based trafficking assays, qPCR, Western blotting, and immunohistochemistry. The final schematic is appealing and could, in principle, explain early LC hyperactivity followed by degeneration in ageing and Alzheimer's disease.

      Strengths:

      (1) Multi-level approach - The study integrates electrophysiology, pharmacology, mRNA quantification, and protein-level analysis.

      (2) The use of barbadin to block β-arrestin/AP-2-dependent internalisation is both technically precise and mechanistically informative.

      (3) Well-executed electrophysiology.

      (4) Translational relevance - converges to a model that can be discussed by peers (scientists can only discuss models - not data!).

      Weaknesses:

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain. Below, I outline the key points that should be addressed to make the model convincing.

    5. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The manuscript's logical flow is challenging and hard to follow, and key arguments could be more clearly structured, particularly in transitions between mechanistic components.

      We will revise our manuscript to make the logical flow easier to follow, particularly in transitions between mechanistic components.

      (2) The causality between stress-induced α2A-AR internalization and the enhanced MAO-A remains unclear. Direct experimental evidence is needed to determine whether α2A-AR internalization itself or Ca<sup>2+</sup> drives MAO-A activation, and how they activate MAO-A should be considered.

      We believe that the causality between stress-induced α2A-AR internalization and the enhancement of MAO-A is clearly demonstrated by our current experiments, although our explanations could be made easier to understand, especially for those who are not experts in electrophysiology.

      Firstly, it is well established that autoinhibition in LC neurons is mediated by α2A-AR-coupled GIRK (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience). We found that spike-frequency adaptation in LC neurons was also mediated by α2A-AR-coupled GIRK-I (Fig. 1A-I), and that α2A-AR-coupled GIRK-I underwent [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown (Figs. 2, S1, S2), leading to an abolishment of spike-frequency adaptation (Fig. S4). The [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown of α2A-AR-coupled GIRK-I was prevented by barbadin (Fig. 2G-J), which prevents the internalization of G-protein-coupled receptors (GPCRs).

      Abolishment of spike-frequency adaptation itself, i.e., “increased spike activity”, can increase [Ca<sup>2+</sup>]<sub>i</sub> because [Ca<sup>2+</sup>]<sub>i</sub> is entirely dependent on spike activity, as shown by Ca<sup>2+</sup> imaging in Figure S3.

      Thus, α2A-AR internalization can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike-frequency adaptation, and a [Ca<sup>2+</sup>]<sub>i</sub> increase drives MAO-A activation as reported previously (Cao et al., 2007, BMC Neurosci). The mechanism by which Ca<sup>2+</sup> activates MAO-A is beyond the scope of the current study.

      Our study focused on the mechanism by which chronic or severe stress can cause persistent overexcitation and how this results in LC degeneration.

      (3) The connection between α2A-AR internalization and increased cytosolic NA levels lacks direct quantification, which is necessary to validate the proposed mechanism.

      Direct quantification of the relationship between α2A-AR internalization and increased cytosolic NA levels may not be possible, and may not need to be demonstrated, as explained below.

      The internalization of α2A-AR can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike-frequency adaptation, and [Ca<sup>2+</sup>]<sub>i</sub> increases can facilitate autocrine NA release (Huang et al., 2007), similar to transmitter release from nerve terminals (Kaeser & Regehr, 2014, Annu Rev Physiol).

      Autocrine-released NA must be taken up again by NAT (NA transporter), which is firmly established (Torres et al., 2003, Nat Rev Neurosci). Re-uptake of NA by NAT is the only source of intracellular NA, and NA re-uptake by NAT should increase as the internalization of the NA binding site (α2A-AR) progresses in association with [Ca<sup>2+</sup>]<sub>i</sub> increases (see page 11, lines 334-336).

      Thus, the connection between α2A-AR internalization and increased cytosolic NA levels is logically compelling. Quantifying this connection may not be possible at present (see our response to comment (2) in Reviewer #1's Recommendations for the authors) and is beyond the scope of our current study.

      (4) The chronic stress model needs further validation, including measurements of stress-induced physiological changes (e.g., corticosterone levels) to rule out systemic effects that may influence LC activity. Additional behavioral assays for spatial memory impairment should also be included, as a single behavioral test is insufficient to confirm memory dysfunction.

      It is well established that restraint stress (RS) increases corticosterone levels depending on the duration of RS (García-Iglesias et al., 2014, Neuropharmacology), although we are willing to measure corticosterone levels. In addition, numerous reports have shown increased activity of LC neurons in response to various stresses (Valentino et al., 1983; Valentino and Foote, 1988; Valentino et al., 2001; McCall et al., 2015), as described in the text (page 4, lines 96-98). Measurement of corticosterone levels may not be able to rule out systemic effects of CRS on the whole brain.

      We have already performed another behavioral test, the elevated plus maze (EPM) test.

      By combining the two tests, it may be possible to evaluate the Y-maze results more accurately by differentiating memory impairment from anxiety. However, the results of these behavioral tests, like the subsequent anxiety and memory impairment themselves, are supplementary to our current aim of elucidating the cellular mechanisms for the accumulation of cytosolic free NA. We will soften the implications regarding anxiety and memory impairment.

      (5) Beyond β-arrestin binding, the role of alternative internalization pathways (e.g., phosphorylation, ubiquitination) in α2A-AR desensitization should be considered, as current evidence is insufficient to establish a purely Ca<sup>2+</sup>-dependent mechanism.

      We can hardly agree with this comment.

      It was clearly demonstrated that repeated application of NA itself did not cause desensitization of α2A-AR (Figure S1A-D), and that the blockade of β-arrestin binding by barbadin completely suppressed the Ca<sup>2+</sup>-dependent downregulation of GIRK (Fig. 2G-K). These observations clearly rule out the possible involvement of phosphorylation or ubiquitination in the desensitization.

      In addition to the barbadin experiment, immunohistochemistry and western blotting clearly demonstrated the decrease in α2A-AR expression on the cell membrane (Fig. 3).

      The Ca<sup>2+</sup>-dependent mechanism of GIRK rundown was convincingly demonstrated by a set of voltage-clamp protocols in which Ca<sup>2+</sup> influx was differentially increased. The rundown of GIRK-I was progressively potentiated or accelerated by increasing the number of positive command pulses, each of which induces Ca<sup>2+</sup> influx (compare Figure S1E-J, Figure S2A-E and Figure S2F-K along with Fig. 2A-F). The presence or absence of Ca<sup>2+</sup> currents, and their magnitude, determined the trend of the GIRK-I rundown (Figs. 2, S1 and S2). Because the same voltage protocol hardly caused rundown when it did not induce Ca<sup>2+</sup> currents in the absence of TEA (Fig. S1F; compare with Fig. 2B), blocking Ca<sup>2+</sup> currents with nifedipine would add little.

      We believe the series of voltage-clamp protocols convincingly demonstrated the orderly involvement of [Ca<sup>2+</sup>]<sub>i</sub> in accelerating the rundown of GIRK-I.

      (6) NA leakage for free NA accumulation is also influenced by NAT or VMAT2. Please discuss the potential role of VMAT2 in NA accumulation within the LC in AD.

      We will discuss the role of VMAT2 in NA accumulation, especially when VMAT2 is impaired. Indeed, it has been demonstrated that reduced VMAT2 levels increase susceptibility to neuronal damage: VMAT2 heterozygote mice displayed increased vulnerability to MPTP, as evidenced by reductions in nigral dopamine cell counts (Takahashi et al, 1997, PNAS). Thus, if the activity of VMAT2 in LC neurons is impaired by chronic restraint stress, cytosolic NA levels in LC neurons would increase. We will add this discussion to the revised manuscript.

      (7) Since the LC is a small brain region, proper staining is required to differentiate it from surrounding areas. Please provide a detailed explanation of the methodology used to define LC regions and how LC neurons were selected among different cell types in brain slices for whole-cell recordings.

      LC neurons were identified immunohistochemically and electrophysiologically as we previously reported (see Fig. 2 in Front. Cell. Neurosci. 16:841239. doi: 10.3389/fncel.2022.841239). A delayed spiking pattern in response to depolarizing pulses (Figure S9) applied at a hyperpolarized membrane potential was commonly observed in LC neurons in many studies (Masuko et al., 1986; van den Pol et al., 2002; Wagner-Altendorf et al., 2019).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      In our study, the normalized relative value of AEP-mediated tau cleavage (Tau N368) was much higher in CRS mice than in non-stressed wild-type mice. It is not possible to compare AEP-mediated tau cleavage between our non-stressed wild-type mice and those in the previous study (Zhang et al., 2014, Nat Med), because band intensity depends largely on the exposure time and the numerical value is a normalized relative value. In view of such differences, our apparent band expression might have been intensified to detect small changes.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      GIRK rundown was almost saturated after 3-day RS and remained the same in 5-day RS mice (Fig. 4A-G), which is consistent with the downregulation of α2A-AR and GIRK1 expression by 3-day RS (Fig. 3C, F and G; Fig. 4J and K). However, we examined the protein levels of MAO-A, pro/active-AEP and Tau N368 only in 5-day RS mice, not in 3-day RS mice. This is because we considered that 3-day RS may be insufficient to induce changes in MAO-A, AEP and Tau N368, and that some period of high [Ca<sup>2+</sup>]<sub>i</sub> may be necessary to induce such changes. We will discuss this in the revised manuscript.

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      Please see our response to the comment (2).

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Thank you for your suggestion. We will revise accordingly.

    1. eLife Assessment

      This useful study describes a physical mechanism for the emergence of spiral patterns in the outer epithelial layer of the mammalian cornea independent of pre-patterning or guidance cues, using an agent-based model of self-propelled particles with alignment. The model is well constructed; however, the central premise of the manuscript, that the spiral patterning of epithelial corneal cells occurs without guidance cues, is incomplete and not fully supported. Several significant questions remain unanswered, such as the role of the corneal curvature or the importance of topological defects. Furthermore, comparisons between the model and data are qualitative at best for the moment.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Kostanjevec et al. investigates the mechanism behind spiral pattern formation in the cornea. The authors demonstrate that the spiral motion pattern on the mammalian corneal surface emerges from the interaction between the limbus position, cell division, extrusion, and collective cell migration. Using LacZ mosaic murine corneas, they reveal a tightening spiral flow pattern and show that their cell-based, in silico model accurately reproduces these patterns without global guidance cues. Additionally, they present a continuum model that extends the XYZ hypothesis to describe cell flux on the cornea, offering a quantitative explanation for tissue-scale processes on curved surfaces.

      Strengths:

      The manuscript is well-written, with a systematic approach that clearly explains experimental setups, model construction, assumptions, parameter selection, and predictions. The discussion also provides insightful perspectives on the broader implications of the results for both physics and biology.

      Weaknesses:

      The central premise of the manuscript, that the spiral patterning of epithelial corneal cells occurs without guidance cues, is not fully supported. The authors overlook the potential role of axons in guiding epithelial cells, despite clear evidence of spiral axon patterns in their own Fig. 1b. Previous literature indicates that axon patterning precedes epithelial cell patterning, suggesting that epithelial migration might be influenced by pre-existing neural structures (e.g., Leiper et al. 2002, IOVS 2013). The authors need to address this point, possibly by exploring whether axonal patterns serve as a template for epithelial cell migration, or by providing experimental evidence to rule out axon-based guidance.

      While the model is well-constructed, it currently falls short of its stated goal of elucidating the mechanisms of spiral formation. Key questions remain unanswered:

      (1) Is the curvature of the cornea necessary for spiral formation, or would a simpler disk geometry suffice?

      (2) What role do boundary conditions play?

      (3) How well do the model's predictions quantitatively match experimental data?

      The current comparisons in Fig. 4c-f lack quantitative agreement, and this discrepancy should be discussed with possible explanations.

      The authors emphasize polar alignment as a key feature of the spiral pattern based on simulation results. However, they do not provide experimental evidence for this polar alignment. The manuscript includes discussions of polar and nematic symmetries that, without supporting data, feel somewhat distracting. If direct experimental evidence for polar alignment is not available, the authors could instead quantify nematic alignment as the spiral forms. This would also allow them to explore potential crosstalk between nematic cell orientation and the polar alignment of self-propulsion, especially considering recent studies showing alternative mechanisms for vortex formation in similar systems.

    3. Reviewer #2 (Public review):

      In Kostanjevec et al., the authors study a possible mechanism for the formation of spiral patterns in the cornea. First, the authors analyze an inferred velocity field, which is deduced from images of fixed corneas, and then determine the position-dependent spiral angle of this velocity field. Next, the authors analyze two possible markers of cell polarity: the direction of the centrosome-nucleus axis and the axis of mitosis. Then the authors introduce a stochastic agent-based model of self-propelled particles with over-damped dynamics and with aligning interactions to the orientation of the nearest neighbors and to the particle's velocity. The authors claim to be able to reproduce the equal-time autocorrelation function and the velocity Fourier spectrum. Then the authors introduce the geometry of the cornea by constraining the dynamics on a spherical cap and show that their model can reproduce a typical trajectory in experiments. Next, the authors produce a phase diagram of the states at a fixed time point as a function of the spherical cap radius and the strength of the alignment coupling constant. Finally, the authors propose an interpretation of the cell fluxes based on the equation of mass conservation.
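
      For readers unfamiliar with this model class, the following is a minimal sketch of overdamped self-propelled particles with polar alignment. It is not the authors' implementation: it assumes a flat plane rather than a spherical cap, omits cell division, extrusion, and repulsion, and aligns each heading only to the headings of neighbours within a fixed radius. All names and parameter values (`v0`, `J`, `radius`, `noise`, `dt`) are hypothetical.

```python
import math
import random


def step(positions, angles, v0=1.0, J=1.0, radius=1.0, noise=0.1, dt=0.01, rng=random):
    """Advance all particles by one Euler step (illustrative sketch only).

    positions: list of (x, y) tuples; angles: list of heading angles.
    Each particle moves at speed v0 along its heading, and its heading
    relaxes towards the headings of neighbours within `radius`.
    """
    n = len(positions)
    new_angles = []
    for i in range(n):
        xi, yi = positions[i]
        torque = 0.0
        for j in range(n):
            if i == j:
                continue
            xj, yj = positions[j]
            if math.hypot(xj - xi, yj - yi) <= radius:
                # Polar alignment: zero when headings agree, restoring otherwise.
                torque += math.sin(angles[j] - angles[i])
        eta = rng.gauss(0.0, 1.0)  # rotational noise increment
        new_angles.append(angles[i] + J * torque * dt + noise * math.sqrt(dt) * eta)
    # Overdamped motion: velocity is set directly by the current heading.
    new_positions = [
        (x + v0 * math.cos(a) * dt, y + v0 * math.sin(a) * dt)
        for (x, y), a in zip(positions, angles)
    ]
    return new_positions, new_angles


def polar_order(angles):
    """Magnitude of the mean heading vector: 1 is fully aligned, near 0 is disordered."""
    n = len(angles)
    cx = sum(math.cos(a) for a in angles) / n
    cy = sum(math.sin(a) for a in angles) / n
    return math.hypot(cx, cy)
```

      In a sketch like this, the alignment strength `J` and the system size play the roles of the two axes of the phase diagram described above, and `polar_order` is one simple way to distinguish aligned from disordered states.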

    4. Author response:

      We thank the referees for finding our work well written and systematic. We are planning a revision of the manuscript based on the public review and the confidential recommendations of the referees.

      The role of axons:

      Indeed, radial axon projections appear before mature epithelial stripes in the cornea (Iannaccone et al., 2012). Our claim is, however, not that guidance cues are absent, but that global cues are unnecessary. The alignment term in our model, together with evidence that corneal epithelial cells follow contact-mediated substrate cues (Walczysko et al., 2016), shows that corneal cell migration is responsive to external forces, and the underlying patterns of axonal projections could be one of those cues.

      Experiments (Collinson et al., 2002) and simulations in this work show that a rapid spiral epithelial flow forms first, with cells migrating radially for ~2 weeks before stripes become visible. Axons seeking the path of least resistance within this moving basal layer would therefore appear radial early on. By contrast, establishing visible stripes requires an entire cohort of epithelial cells to travel from the limbus to the central cornea (Fig. 7). Extensive in-vivo studies (Song et al., 2004; Leiper et al., 2009) find no evidence that axons direct epithelial migration; if anything, epithelial flow dictates axonal trajectories.

      Geometry and boundaries:

      The spiral also forms on a flat disc, but its exact shape changes with curvature and cap angle; this variation is seen across mammals, including humans (Dua et al., 1993) and in diseases such as keratoconus. On a spherical cap the boundary winding number fixes the interior index, so ongoing limbal influx keeps the total index = 1. 

      In the revised version, we will therefore simulate a range of curvatures, cap angles, a prolate ellipsoid, and cases without limbal division, then compare with published data and disease states.

      In-vitro data and parameter fits:

      Although our dataset is limited, the inferred parameters match three independent in vitro estimates (Kostanjevec et al. 2020; Saraswathibhatla et al. 2021; Kammeraat et al. in prep.). Spatial correlations exceed those expected from persistence alone, implying some polar alignment, consistent with Saraswathibhatla et al. 2021. Slide-scanner images that we will include in the revision show that cells are neither elongated nor nematically ordered. In the revision we will detail our parameter extraction, highlight evidence for alignment, stress the substrate-based activity mechanism, and draw attention to the supplementary videos.

      Topological clarification:

      Stagnation points can be seen as topological defects because classification depends only on vector directions. Boundary conditions can remove such defects in fluids, yet two sources/sinks still interact via the same logarithmic Green’s function that governs disclinations, despite different physics. The Euler characteristic is a property of the surface; while the boundary winding number fixes the field index, it does not alter the surface’s Euler characteristic.
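
      The statement that classification depends only on vector directions can be illustrated with a short numerical sketch. The function below, a hypothetical helper that is not part of the manuscript, accumulates the rotation of a planar vector field's direction along a counter-clockwise circle and returns the resulting index. Sources, sinks, and spirals all give +1, whereas a saddle gives -1.

```python
import math


def winding_number(field, center=(0.0, 0.0), radius=1.0, samples=360):
    """Index of a planar vector field along a circle around `center`.

    field(x, y) -> (vx, vy). Only the direction of the field is used;
    its magnitude does not affect the result.
    """
    cx, cy = center
    total = 0.0
    prev = None
    for k in range(samples + 1):
        phi = 2.0 * math.pi * k / samples
        vx, vy = field(cx + radius * math.cos(phi), cy + radius * math.sin(phi))
        angle = math.atan2(vy, vx)
        if prev is not None:
            # Wrap the increment into [-pi, pi) so atan2 branch cuts are not counted as turns.
            delta = (angle - prev + math.pi) % (2.0 * math.pi) - math.pi
            total += delta
        prev = angle
    return round(total / (2.0 * math.pi))


def source(x, y):
    return (x, y)


def inward_spiral(x, y):
    # Inward radial flow combined with rotation.
    return (-x - y, x - y)


def saddle(x, y):
    return (x, -y)
```

      Evaluating the same function along the boundary of the domain gives the boundary winding number, which equals the sum of the indices of the stagnation points enclosed, provided the field does not vanish on the loop.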

      In the revision, we will add a concise primer on the differential-geometric concepts to make these points explicit.

    1. eLife Assessment

      This study demonstrates the application of END-seq, originally developed to study genome-wide DNA double-strand breaks, to telomere biology; the work packs a punch, concisely demonstrating the utility of this approach and the new insights that can be gained. The authors confirm that telomeres in telomerase-positive cells terminate with 5'-ATC in a Pot1-dependent manner, and demonstrate that this principle holds true in telomerase-negative ALT cells as well; S1-END-seq is similarly developed for telomeres, showing that ALT cells harbor several regions of ssDNA. The study is well-executed, the new insights are fundamental and compelling, and the optimized END-seq approaches will be widely utilized. The interest of the paper could be heightened by deepening the discussion of potential biases in telomere representation, the origin of the ssDNA captured in ALT cells, and the occurrence of variant telomere repeats in the cell lines studied.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript from Azeroglu et al. presents the application of END-seq to examine the sequence composition of chromosome termini, i.e., telomeres. END-seq is a powerful genome sequencing strategy developed in André Nussenzweig's lab to examine the sequences at DNA break sites. Here, END-seq is applied to explore the nucleotide sequences at telomeres and to ascertain (i) whether the terminal end sequence is conserved in cells that activate the ALT telomere elongation mechanism and (ii) whether the processes responsible for telomere end sequence regulation are conserved. With these aims clearly articulated, the authors convincingly show the power of this technique to examine telomere end-processing.

      Strengths:

      (1) The authors effectively demonstrate the application of END-seq for these purposes. They verify prior data that 5' terminal sequences of telomeres in HeLa and RPE cells end in a canonical ATC sequence motif. They verify that the same sequence is present at the 5' ends of telomeres by performing END-seq across a panel of ALT cancer cells. As in non-ALT cells, the established role of POT1, a ssDNA telomere binding protein, in coordinating the mechanism that maintains the canonical ATC motif is likewise verified. However, by performing END-seq in mouse cells lacking POT1 isoforms, POT1a and POT1b, the authors uncover that POT1b is dispensable for this process. This reveals a novel, important insight relating to the evolution of POT1 as a telomere regulatory factor.

      (2) The authors then demonstrate the utility of S1-END-seq, a variation of END-seq, to explore the purported abundance of single-stranded DNA (ssDNA) within telomeres of ALT cancer cells. Here, they demonstrate that ssDNA abundance is an intrinsic aspect of ALT telomeres and is dependent on the activity of BLM, a crucial mediator of ALT.

      Overall, the authors have effectively shown that END-seq can be applied to examine processes maintaining telomeres in normal and cancerous cells across multiple species. Using END-Seq, the authors confirm prior cell biological and sequencing data and the role of POT1 and BLM in regulating telomere termini sequences and ssDNA abundance. The study is nice and well-written, with the experimental rationale and outcomes clearly explained.

      Weaknesses:

      This reviewer finds little to argue with in this study. It is timely and highly valuable for the telomere field. One minor question would be whether the authors could expand more on the application of END-seq to examine the processive steps of the ALT mechanism. Can they speculate whether the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA the displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

    3. Reviewer #2 (Public review):

      This is a short yet very clear manuscript demonstrating that two methods (END-seq and S1-END-seq), previously developed in the Nussenzweig laboratory to study DSBs in the genome, can also be applied to the 5' ends of mammalian telomeres and the accumulation of telomeric single-stranded DNA.

      The authors first validate the applicability of END-seq using different approaches and confirm that mammalian telomeres preferentially end with an ATC 5' end through a mechanism that requires intact POT1 (POT1a in mice). They then extend their analysis to cells that maintain telomeres through the ALT mechanism and demonstrate that, in these cells as well, telomeres frequently end in an ATC 5' sequence via a POT1-dependent mechanism. Using S1-END-seq, the authors further show that ALT telomeres contain single-stranded DNA and estimate that each telomere in ALT cells harbors at least five regions of ssDNA.
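
      To make the "ATC 5' end" readout concrete, the sketch below classifies reads by the six-nucleotide register in which they begin. It is an illustration under stated assumptions, not the authors' pipeline: it assumes reads are given 5' to 3' starting at the captured end, that the C-rich strand repeat is CCCTAA, and that an ATC-5' terminus (written 3' to 5') therefore corresponds to reads beginning with CTAACC. The function names are hypothetical.

```python
from collections import Counter

C_STRAND_REPEAT = "CCCTAA"  # canonical C-rich telomeric repeat, written 5'->3'


def five_prime_register(read, repeat=C_STRAND_REPEAT):
    """Return the rotation of `repeat` that the read starts with, or None."""
    head = read[:len(repeat)].upper()
    doubled = repeat + repeat
    # All cyclic permutations of the repeat, e.g. CCCTAA, CCTAAC, CTAACC, ...
    rotations = {doubled[i:i + len(repeat)] for i in range(len(repeat))}
    return head if head in rotations else None


def register_frequencies(reads):
    """Fraction of telomeric reads starting in each register (non-telomeric reads ignored)."""
    counts = Counter(five_prime_register(r) for r in reads)
    counts.pop(None, None)  # drop reads that do not start with a telomeric register
    total = sum(counts.values())
    return {reg: n / total for reg, n in counts.items()} if total else {}
```

      Under these assumptions, a preference for the ATC-5' end would appear as an enrichment of the CTAACC register over the other five, and loss of that preference (for example after POT1 depletion) as a flatter distribution.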

      I find this work very interesting and incisive. It clearly demonstrates that END-seq can be applied with unprecedented depth and precision to the study of telomeric features such as the 5' end and ssDNA. The data are very clear and thoroughly interpreted, and the manuscript is well written. The results are carefully analyzed and effectively presented. Overall, I find this manuscript worthy of publication, as the optimized END-seq methods described here will likely be widely utilized in the telomere field.

      I only have a few minor suggestions:

      How can we be sure that all telomeres are equally represented? The authors seem to assume that END-seq captures all chromosome ends equally, but can we be certain of this? While I do not see an obvious way to resolve this experimentally, I recommend discussing this potential bias more extensively in the manuscript.

      I believe Figures 1 and 2 should be merged.

      Scale bars should be added to all microscopy figures.

    4. Reviewer #3 (Public review):

      Summary:

      A subset of cancer cells attain replicative immortality by activating the ALT mechanism of telomere maintenance, which is currently the subject of intense research due to its potential for novel targeted therapies. Key questions remain in the field, such as whether ALT telomeres adhere to the same end-protection rules as telomeres in telomerase-expressing cells, or if ALT telomeres possess unique properties that could be targeted with new, less toxic cancer therapies. Both questions, along with the approaches developed by the authors to address them, are highly relevant.

      Strengths:

      Since chromosome ends resemble one-ended DSBs, the authors hypothesized that the previously described END-SEQ protocol could be used to accurately sequence the 5' end of telomeres on the C-rich strand. As expected, most reads corresponded to the C-rich strand and, confirming a previous observation by de Lange's group, most chromosomes end with the ATC-5' sequence, a feature that was found to be dependent on POT1 and to be conserved in both human ALT cells and mouse cells. Through a complementary method, S1-END-SEQ, the authors further explored ssDNA regions at telomeres, providing new insights into the characteristics of ALT telomeres. The study is original, the experiments were well-controlled and excellently executed.

      Weaknesses:

      Overall, the discussion section lacks depth and should be expanded, and a few additional experiments should be performed to clarify the results.

      (1) The finding that the abundance of variant telomeric repeats (VTRs) within the final 30 nucleotides of the telomeric 5' ends is similar in both telomerase-expressing and ALT cells is intriguing, but the authors do not address this result. Could the authors provide more insight into this observation and suggest potential explanations? As the frequency of VTRs does not seem to be upregulated in POT1-depleted cells, what then drives the appearance of VTRs on the C-strand at the very end of telomeres? Is the CST-Polα complex responsible?

      (2) The authors also note that, in ALT cells, the frequency of VTRs in the first 30 nucleotides of the S1-END-SEQ reads is higher compared to END-SEQ, but this finding is not discussed either. Do the authors think that the presence of ssDNA regions is associated with the VTRs? Along this line, what is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      Finally, in these experiments (S1-END-SEQ or END-SEQ in TRF1-FokI), is the frequency of VTRs the same on both the C- and the G-rich strands? It is possible that the sequences are not fully complementary in regions where G4 structures form.

      (3) Based on the ratio of C-rich to G-rich reads in the S1-END-SEQ experiment, the authors estimate that ALT cells contain at least 3-5 ssDNA regions per chromosome end. While the calculation is understandable, this number could be discussed further to consider the possibility that the observed ratios (of roughly 0.5) might result from the presence of extrachromosomal DNA species, such as C-circles. The observed increase in the ratio of C-rich to G-rich reads in BLM-depleted cells supports this hypothesis, as BLM depletion suppresses C-circle formation in U2OS cells. To test this, the authors should examine the impact of POLD3 depletion on the C-rich/G-rich read ratio. Alternatively, they could separate high-molecular-weight (HMW) DNA from low-molecular-weight DNA in ALT cells and repeat the S1-END-SEQ in the HMW fraction.

      (4) What is the authors' perspective on the presence of ssDNA at ALT telomeres? Do they attribute this to replication stress? It would be helpful for the authors to repeat the S1-END-SEQ in telomerase-expressing cells with very long telomeres, such as HeLa1.3 cells, to determine if ssDNA is a specific feature of ALT cells or a result of replication stress. The increased abundance of G4 structures at telomeres in HeLa1.3 cells (as shown in J. Wong's lab) may indicate that replication stress is a factor. Similar to Wong's work, it would be valuable to compare the C-rich/G-rich read ratios in HeLa1.3 cells to those in ALT cells with similar telomeric DNA content.

      Minor Points:

      (1) The Y-axes of Figure 4 should be relabeled to account for the G-strand reads. Additionally, statistical analyses are absent in Figure 4 and Figure S3.

      (2) A careful proofreading of the manuscript is necessary.

    5. Author response:

      We thank the reviewers for their thoughtful and generous assessment of our work. Overall, the reviewers found our work to be novel and relevant. In particular, Reviewer #1 wrote, “It is timely and highly valuable for the telomere field”; Reviewer #2 stated, “Overall, I find this manuscript worthy of publication, as the optimized END-seq methods described here will likely be widely utilized in the telomere field”; and Reviewer #3 stated that “The study is original, the experiments were well-controlled and excellently executed.”

      We are extremely grateful for these comments and want to thank all the reviewers and the editors for their time and effort in reviewing our work.

      The reviewers had a number of suggestions to improve our work. We have addressed all the points as highlighted in the point-by-point responses below.

      Reviewer #1:

      One minor question would be whether the authors could expand more on the application of END-seq to examine the processive steps of the ALT mechanism. Can they speculate whether the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA the displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

      We appreciate the reviewer’s insightful questions regarding the application of our assays to investigate the nature of the ssDNA detected in ALT telomeres. Our primary aim in this study was to establish the utility of END-seq and S1-END-seq in telomere biology and to demonstrate their applicability across both ALT-positive and -negative contexts. We agree that exploring the mechanistic origins of ssDNA would be highly informative, and we anticipate that END-seq–based approaches will be well suited for such future studies. However, it remains unclear whether the resolution of S1-END-seq is sufficient to capture transient intermediates such as those generated during BIR. We have now included a brief speculative statement in the revised discussion addressing the potential nature of ssDNA at telomeres in ALT cells.

      Reviewer #2:

      How can we be sure that all telomeres are equally represented? The authors seem to assume that END-seq captures all chromosome ends equally, but can we be certain of this? While I do not see an obvious way to resolve this experimentally, I recommend discussing this potential bias more extensively in the manuscript.

      We thank the reviewer for raising this important point. END-seq and S1-END-seq are unbiased methods designed to capture either double-stranded or single-stranded DNA that can be converted into blunt-ended double-stranded DNA and ligated to a capture oligo. As such, if a subset of telomeres cannot be processed using this approach, it is possible that these telomeres may be underrepresented or lost. However, to our knowledge, there are no proposed telomeric structures that would prevent capture using this method. For example, even if a subset of telomeres possesses a 5′ overhang, it would still be captured by END-seq. Indeed, we observed the consistent presence of the 5′-ATC motif across multiple cell lines and species (human, mouse, and dog). More importantly, we detected predictable and significant changes in sequence composition when telomere ends were experimentally altered, either in vivo (via POT1 depletion) or in vitro (via T7 exonuclease treatment). Together, these findings support the robustness of the method in capturing a representative and dynamic view of telomeres across different systems.

      That said, we have now included a brief statement in the revised discussion acknowledging that we cannot fully exclude the possibility that a subset of telomeres may be missed due to unusual or uncharacterized structures.

      I believe Figures 1 and 2 should be merged.

      We appreciate the reviewer’s suggestion to merge Figures 1 and 2. However, we feel that keeping them as separate figures better preserves the logical flow of the manuscript and allows the validation of END-seq and its application to be presented with appropriate clarity and focus. We hope the reviewer agrees that this layout enhances the clarity and interpretability of the data.

      Scale bars should be added to all microscopy figures.

      We thank the reviewer for pointing this out. We have now added scale bars to all the microscopy panels in the figures and included the scale details in the figure legends.

      Reviewer #3:

      Overall, the discussion section lacks depth and should be expanded, and a few additional experiments should be performed to clarify the results.

      We thank the reviewer for the suggestions. Based on this reviewer’s comments and comments from the other reviewers, we incorporated several points into the discussion. As a result, we hope that we have added depth to our conclusions.

      (1) The finding that the abundance of variant telomeric repeats (VTRs) within the final 30 nucleotides of the telomeric 5' ends is similar in both telomerase-expressing and ALT cells is intriguing, but the authors do not address this result. Could the authors provide more insight into this observation and suggest potential explanations? As the frequency of VTRs does not seem to be upregulated in POT1-depleted cells, what then drives the appearance of VTRs on the C-strand at the very end of telomeres? Is the CST-Polα complex responsible?

      The reviewer raises a very interesting and relevant point. We are hesitant at this point to speculate on why we do not see a difference in variant repeats in ALT versus non-ALT cells, since additional data would be needed. One possibility is that variant repeats in ALT cells accumulate stochastically within telomeres but are selected against when they are present at the terminal portion of chromosome ends. However, to prove this hypothesis, we would need error-free long-read technology combined with END-seq. We feel that developing this approach would be beyond the scope of this manuscript.

      (2) The authors also note that, in ALT cells, the frequency of VTRs in the first 30 nucleotides of the S1-END-SEQ reads is higher compared to END-SEQ, but this finding is not discussed either. Do the authors think that the presence of ssDNA regions is associated with the VTRs? Along this line, what is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      Similarly to what is discussed above, short reads have the advantage of being very accurate but do not provide sufficient length to establish the relative frequency of VTRs across the whole telomere sequence. The TRF1-FokI experiment is a good suggestion, but it would still be biased toward non-variant repeats due to the TRF1-binding properties. We plan to address these questions in a future study involving long-read sequencing and END-seq capture of telomeres.

      Finally, in these experiments (S1-END-SEQ or END-SEQ in TRF1-FokI), is the frequency of VTRs the same on both the C- and the G-rich strands? It is possible that the sequences are not fully complementary in regions where G4 structures form.

      We thank the reviewer for this observation. While we do observe a higher frequency of variant telomeric repeats (VTRs) in the first 30 nucleotides of S1-END-seq reads compared to END-seq in ALT cells, we are currently unable to determine whether this difference is significant, as an appropriate control or matched normalization strategy for this comparison is lacking. Therefore, we refrain from overinterpreting the biological relevance of this observation.

      (3) Based on the ratio of C-rich to G-rich reads in the S1-END-SEQ experiment, the authors estimate that ALT cells contain at least 3-5 ssDNA regions per chromosome end. While the calculation is understandable, this number could be discussed further to consider the possibility that the observed ratios (of roughly 0.5) might result from the presence of extrachromosomal DNA species, such as C-circles. The observed increase in the ratio of C-rich to G-rich reads in BLM-depleted cells supports this hypothesis, as BLM depletion suppresses C-circle formation in U2OS cells. To test this, the authors should examine the impact of POLD3 depletion on the C-rich/G-rich read ratio. Alternatively, they could separate high-molecular-weight (HMW) DNA from low-molecular-weight DNA in ALT cells and repeat the S1-END-SEQ in the HMW fraction.

      The reviewer is absolutely correct. Our calculation did not exclude the possibility of extrachromosomal DNA as a source of telomeric ssDNA. We have now addressed this point in our discussion.

      (4) What is the authors' perspective on the presence of ssDNA at ALT telomeres? Do they attribute this to replication stress? It would be helpful for the authors to repeat the S1-END-SEQ in telomerase-expressing cells with very long telomeres, such as HeLa1.3 cells, to determine if ssDNA is a specific feature of ALT cells or a result of replication stress. The increased abundance of G4 structures at telomeres in HeLa1.3 cells (as shown in J. Wong's lab) may indicate that replication stress is a factor. Similar to Wong's work, it would be valuable to compare the C-rich/G-rich read ratios in HeLa1.3 cells to those in ALT cells with similar telomeric DNA content.

      The reviewer is correct in pointing out that we still do not know what causes ssDNA at telomeres in ALT cells. Replication stress seems the most logical explanation based on the work of many labs in the field. However, our data did not reveal any significant difference in the levels of ssDNA at telomeres in non-ALT cells based on telomere length. We used the HeLa1.2.11 cell line (now clarified in the Materials section), which is the parental line of HeLa1.3 and has similarly long telomeres (~20 kb vs. ~23 kb). Despite their long telomeres and potential for replication-associated challenges such as G-quadruplex formation, HeLa1.2.11 cells did not exhibit the elevated levels of telomeric ssDNA that we observed in ALT cells (Figure 4B). Additional experiments are needed to map the occurrence of ssDNA at telomeres in relation to progression toward ALT.

      Finally, Reviewer #3 raises a list of minor points:

      (1) The Y-axes of Figure 4 have been relabeled to account for the G-strand reads.

      (2) Statistical analyses have been added to the figures where applicable.

      (3) The manuscript has been carefully proofread to improve clarity and consistency throughout the text and figure legends.

      (4) We have revised the text to address issues related to the lack of cross-referencing between the supplementary figures and their corresponding legends.

    1. eLife Assessment

      This important study provides evidence for asymptomatic Bordetella pertussis carriage among mothers in a longitudinal cohort in Zambia, significantly advancing understanding of transmission dynamics. The evidence presented is convincing, with strengths including routine sampling irrespective of symptoms and rigorous qPCR methodology, although confirmatory diagnostics would further strengthen the claims. Overall, the study represents an influential contribution to the field of infectious disease epidemiology.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigates the role of asymptomatic pertussis carriage in transmission, particularly between mothers and their infants. The authors conducted a longitudinal cohort study of 1,315 mother-infant dyads in Lusaka, Zambia, and used qPCR-based detection of IS481 to track Bordetella pertussis transmission over time. The results suggest that minimally symptomatic or asymptomatic mothers may act as a reservoir for B. pertussis transmission to infants, challenging traditional surveillance methods that focus on symptomatic cases. The study also identified a subgroup of persistently colonized individuals in which mothers were largely asymptomatic despite sustained bacterial presence.

      The authors aimed to improve understanding of pertussis transmission dynamics in high-burden, low-resource settings, and they advocate enhanced molecular surveillance strategies to capture the full extent of pertussis infection, including infections that might otherwise go undetected.

      Strengths:

      The main strengths are the innovative study design, in particular the longitudinal approach and routine sampling rather than symptom-driven testing, which minimizes bias. The methodology is rigorous and transparent: the authors evaluated IS481 signal strength to classify pertussis detection and retested samples to assess qPCR reliability. The study also yields important epidemiological insights, and the findings challenge conventional wisdom by suggesting that pertussis transmission may frequently occur outside of symptomatic cases. The findings are also relevant to global health and policy, as they argue for incorporating molecular tools such as qPCR into pertussis surveillance in low-resource settings.

      Weaknesses:

      These include reliance on qPCR-based detection without additional validation measures such as confirmatory culture or serology. There are also potential alternative explanations for the transmission patterns observed in the study, such as shared environmental exposure or household transmission. Additionally, generalizability is limited because the study was done at a single urban site in Zambia. There is also a lack of functional immune data.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors describe the results of a longitudinal study of pertussis infection in mother/infant dyads in Lusaka, Zambia. Unlike many past studies, the authors assessed the infection status of individuals independently of whether they were symptomatic for a respiratory infection. As a result, this work represents one of the first studies specifically designed to assess asymptomatic transmission of pertussis. Using qPCR, the authors find strong evidence for the role of asymptomatic transmission from mothers to infants and also evidence for long-term bacterial carriage. This work represents an important contribution to our understanding of the global burden of pertussis. Also, it highlights the still under-appreciated role of asymptomatic transmission across many infectious diseases (including vaccine-preventable ones).

      Strengths:

      Unlike many past studies, the authors assessed the infection status of individuals independently of whether they were symptomatic for a respiratory infection. As a result, this work represents one of the first studies specifically designed to assess asymptomatic transmission of pertussis. Using qPCR, the authors find strong evidence for the role of asymptomatic transmission from mothers to infants and also evidence for long-term bacterial carriage.

      Weaknesses:

      While I am quite enthusiastic about the work, I am concerned that a number of likely relevant confounders were not discussed and that the broader implications of their findings were not well grounded in the existing literature. For example, I could not find information on the vaccination status of the mothers in the study. Given the conclusions about asymptomatic transmission and the durability of immunity, it is important to know the vaccination status of the mothers. Moreover, did the authors have other metadata on the mother/infant dyads, e.g., household size, vaccination status of household members, etc.? Given the potential implications of more widespread asymptomatic transmission associated with pertussis infection, I believe the authors should better couch their results in the context of the broader debate around asymptomatic transmission.

    1. eLife Assessment

      Cardiac Ca2+/Na+ exchange is mediated by the NCX1 antiporter, whose activity is tightly regulated. This important manuscript describes the structural basis of activation by the lipid DiC8-PIP2 and inhibition by binding of a small molecule to NCX1. These results provide convincing insights into NCX1 regulation and the structural basis of cellular Ca2+ signaling.

    2. Reviewer #1 (Public review):

      This study uses structural and functional approaches to investigate regulation of the Na/Ca exchanger NCX1 by an activator, PIP2, and an inhibitor, SEA0400. Previous functional studies suggest that both of these compounds interact with the Na-dependent inactivation process to mediate their effects.

      State-of-the-art methods are employed here, and the data are of high quality and presented very clearly. While there is merit in combining structural studies on both compounds as they relate to Na-dependent activation, in the end it is somewhat disappointing that neither is explored in further depth.

      The novel aspect of this work is the study on PIP2. Unfortunately, technical limitations precluded structural data on binding of the native PIP2, and so an unnatural short-chained analog, di-C8-PIP2, was used instead. This raises the question of whether these two molecules, which have similar but clearly distinct profiles of activation, actually share the same binding pocket and mode of action. The authors conduct a "competition" experiment, arguing that the effect of di-C8-PIP2 addition subsequent to PIP2 suggests competition for a single binding site. In this scenario, PIP2 would need to vacate the binding site prior to di-C8-PIP2 occupying it. However, the lack of an effect of washout alone suggests that PIP2 does not easily unbind. This raises the possibility (probability?) of a non-competitive effect of di-C8-PIP2 at a different site. An additional informative experiment would be to determine whether a saturating concentration of di-C8-PIP2 could prevent the full activation induced by subsequent PIP2 addition. However, the relative affinities of the two ligands might make such an experiment challenging in practice.

      In an effort to address the binding site directly, the authors mutate key residues predicted to be important in liganding the phosphorylated head group of PIP2. However, the only mutations that have a significant effect on PIP2 activation also influence the Na-dependent inactivation process independently of PIP2. While these data are consistent with altering PIP2 binding (which cannot be easily untangled from its functional effect on Na-dependent inactivation), a primary effect on Na-dependent inactivation, rather than PIP2 binding, cannot be fully ruled out. A more extensive mutagenic study, based on other regions of the di-C8-PIP2 binding site, would have given more depth to this work and might have been more revealing mechanistically.

      The SEA0400 aspect of the work does not integrate particularly well with the rest of the manuscript. This study confirms the previously reported structure and binding site for SEA0400 but provides little further information. While interesting speculation is presented regarding the connection between SEA0400 inhibition and Na-dependent inactivation, further experiments to test this idea are not included here.

      Comments on revisions:

      (1) The competition assay data for di-C8-PIP2 and PIP2 are a nice addition, but in their description in the text, the authors should be a bit more circumspect about their conclusions, based on the possibility/probability that the effect observed is actually non-competitive (as detailed above).

      (2) The authors should acknowledge the formal possibility that the functional effects of the mutations studied are a consequence of a direct effect on Na-dependent inactivation, independent of PIP2 binding.

      (3) The authors might strengthen their arguments for combining studies on PIP2 and SEA0400.

      (4) The authors could be clearer where their work on SEA0400 extends beyond the previously published observations.

    3. Reviewer #3 (Public review):

      NCXs are key Ca2+ transporters located on the plasma membrane, essential for maintaining cellular Ca2+ homeostasis and signaling. The activities of NCX are tightly regulated in response to cellular conditions, ensuring precise control of intracellular Ca2+ levels, with profound physiological implications. Building upon their recent breakthrough in determining the structure of human NCX1, the authors obtained cryo-EM structures of NCX1 in complex with its modulators, including the cellular activator PIP2 and the small molecule inhibitor SEA0400. Structural analyses revealed mechanistically informative conformational changes induced by PIP2 and elucidated the molecular basis of inhibition by SEA0400. These findings underscore the critical role of the interface between the transmembrane and cytosolic domains in NCX regulation and small molecule modulation. Overall, the results provide key insights into NCX regulation, with important implications for cellular Ca2+ homeostasis.

      Comments on revisions:

      The authors have adequately addressed my previous comments.

    1. eLife Assessment

      This study presents valuable findings with practical and theoretical implications for drug discovery, particularly in the context of repurposing cipargamin (CIP) for the treatment of Babesia spp. The evidence is solid, with the methods, data, and analyses broadly supporting the claims. The paper will be of great interest to scientists in drug discovery, computational biology, and microbiology.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors present the repurposing of cipargamin (CIP), a known drug against Plasmodium and Toxoplasma, for use against Babesia. They demonstrated the efficacy of CIP against Babesia in the nanomolar range. In silico analyses revealed the drug resistance mechanism through a single amino acid mutation at position 921 of the Babesia ATP4 gene. Overall, the conclusions drawn by the authors are well justified by the data presented. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      The authors have carried out a comprehensive study. All the experiments performed were carried out methodically and logically.

    3. Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na+ ATPase that is found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin. A 7-day treatment of cipargamin, when combined with a single dose of tafenoquine, was sufficient to eradicate Babesia microti in a mouse model of severe babesiosis caused by a lack of adaptive immunity.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection.

      Comments on revisions:

      The authors have edited the manuscript and, in doing so, have addressed all queries pertaining to experimental design. The authors have decided to keep the discussion unchanged, but have replied to this reviewer regarding comments on interpretation of some data. The reader could have benefited from the authors' explanation. Nonetheless, the manuscript in its present form describes a valuable and significant body of work.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have tried to repurpose cipargamin (CIP), a known drug against Plasmodium and Toxoplasma, for use against Babesia. They proved the efficacy of CIP on Babesia in the nanomolar range. In silico analyses revealed the drug resistance mechanism through a single amino acid mutation at amino acid position 921 on the ATP4 gene of Babesia. Overall, the conclusions drawn by the authors are well justified by their data. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      Authors have carried out a comprehensive study. All the experiments performed were carried out methodically and logically.

      We appreciate your positive feedback. Your acknowledgment reinforces our commitment to rigor and thoroughness in our research.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na+ ATPase that is found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin. A 7-day treatment of cipargamin, when combined with a single dose of tafenoquine, was sufficient to eradicate Babesia microti in a mouse model of severe babesiosis caused by lack of adaptive immunity.

      Thank you for the comments and for your time to review our manuscript.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection. In the SCID mouse model, cipargamin was tested in combination with tafenoquine but not with atovaquone and/or azithromycin, although the latter combination is often used as first-line therapy for human babesiosis caused by Babesia microti.

      Thank you for your insightful comments. We agree that using a single daily dose over 7 days is one of the limitations of the in vivo trial. Our main goals were to demonstrate cipargamin's efficacy and to understand its mechanism of action as an antibabesial agent. For future work, we plan to conduct dose-optimization studies to determine the lowest effective dose in vivo. Regarding the drug combination in the SCID mouse model, although atovaquone and/or azithromycin are frequently used as first-line therapies for human babesiosis, resistance to these traditional drugs is emerging. Given this challenge, we opted to evaluate a combination with tafenoquine as a novel partner, aiming to overcome resistance issues and improve therapeutic outcomes.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      None other than some minor grammatical mistakes.

      We have corrected the grammatical mistakes.

      Reviewer #3 (Recommendations for the authors):

      The revised manuscript is much improved. I have the following comments.

      Comment 1: Atovaquone plus azithromycin is effective against Babesia microti (Figure 1C) but not against Babesia rodhaini (Figure 1E). It would be valuable to provide a possible explanation.

      Thank you for highlighting this issue. One potential explanation is that B. microti and B. rodhaini might have intrinsic differences in drug sensitivity and susceptibility. A previous study reported that both species possess a unique linear monomeric mitochondrial genome with a dual flip-flop inversion system, which generates four distinct genome structures (Hikosaka et al., 2012). In addition, previous studies have shown that mitochondria-associated energy production is greater in B. microti than in B. rodhaini (Shikano et al., 1998). This suggests that B. microti, whose metabolism is largely driven by mitochondrial function, may be more susceptible to drugs (like atovaquone) that induce parasite death by disrupting mitochondrial targets such as cytochrome b (Wormser et al., 2010). Moreover, B. rodhaini tends to proliferate more rapidly and causes acute infections, which may outpace any drug effects. Further, the rapid proliferation of apicomplexan parasites, as is the case in Plasmodium (Salcedo-Sora et al., 2014), Theileria (Metheni et al., 2015), and B. rodhaini (Rickard, 1970; Shikano et al., 1995), has been ascribed to glycolysis as the primary energy source. This may have contributed to the reduced efficacy of atovaquone and azithromycin in B. rodhaini-infected mice in the current study. Nonetheless, we plan to explore these interspecies differences in our future work.

      Comment 2: The relapse that follows a 7-day treatment with cipargamin is transient in BALB/c mice infected with Babesia rodhaini (Figure 1E) but persistent in SCID mice infected with Babesia microti (Figure 5C). It would be valuable to provide a possible explanation.

      Thank you for your insightful comment. One possible explanation is the difference in immune status between the two mouse models. BALB/c mice have a fully functional immune system that can likely clear residual parasites following a transient relapse after cipargamin treatment. In contrast, SCID mice lack an adaptive immune response, which might allow residual B. microti parasites to persist and cause a sustained relapse. Additionally, intrinsic differences between B. rodhaini and B. microti, such as growth rate or drug susceptibility, could also play a role. We plan to explore these factors in future studies.

      Comment 3: The effect of cipargamin on parasite pH is the greatest when assessed 4 to 8 min after exposure is initiated (Figure 3E). Yet, resistance of parasites that carry a mutation in ATP4, the target of cipargamin, was assessed 20 min after cipargamin addition. At this time point, cipargamin has very little effect (Figure 3E). Accordingly, data reported in Figure 3G are of limited value.

      Thank you for your comment. The initial pH increase we see around 4 to 8 minutes likely reflects the rapid inhibition of ATP4-mediated Na⁺/H⁺ exchange by cipargamin, which quickly alkalinizes the cell. However, after the initial increase, compensatory processes, such as proton influx or metabolic acid production, gradually restore the pH, resulting in a later decline. Although assessing the pH level at 20 minutes may have recorded less dramatic changes, it still allowed us to compare the sustained differences between wild-type and mutant strains. We agree that including earlier time points for the mutants might provide further insight, and we will consider this in our future work.

      Comment 4: In Figure 3H, please report the lack of statistical significance between wild-type parasites and parasites that carry the mutation L921V.

      In Figure 3H, the ATPase activity in erythrocytes infected with wild-type parasites (6.31 ± 1.20 nmol Pi/mg protein/min) is higher than that in erythrocytes infected with parasites carrying the L921V mutation (5.11 ± 0.50 nmol Pi/mg protein/min), but the difference is not statistically significant (P = 0.095), so no asterisk was added.

      Comment 5: Tafenoquine was administered as a single 20 mg/kg dose. Please specify whether this dose is for tafenoquine succinate or tafenoquine base.

      Thank you for raising this point. In our study, the single 20 mg/kg dose refers to tafenoquine succinate. We have clarified this detail in the revised manuscript (Line 40).

      Comment 6: A single dose of 20 mg/kg tafenoquine succinate was first tested in the SCID mouse model of severe babesiosis by Mordue et al (JID 2019), not by Liu et al. (JID 2024). Please amend discussion accordingly (line 311). As correctly stated in the discussion, the single 20 mg/kg dose was not sufficient to prevent relapse of Babesia microti in the study by Mordue et al. Please provide a possible explanation for why no parasitemia was detected for 90 days in your SCID model (Figure 5C).

      Thank you for your comment. We have modified the suggested citation (Line 309). As noted by Mordue et al. (JID 2019), a single 20 mg/kg dose of tafenoquine succinate was insufficient to prevent relapse in their SCID mouse model using B. microti (ATCC 30221 Gray strain). In our study, however, no parasitemia was detected for 90 days (Figure 5C) using the B. microti Peabody mjr strain (ATCC PRA-99). Differences in the parasite strain and the timing of treatment relative to infection may have contributed to the extended suppression of parasitemia observed in our study. We plan to explore these aspects in future work.

      Comment 7: Real-time PCR was used to confirm eradication of Babesia microti infection (Figure 5D). Please specify the blood volume from which genomic DNA was extracted for each mouse. Please specify the amount of genomic DNA (i.e., not the volume) used in each reaction. Please explain how/why the cut-off was set at 35 cycles. What were the Ct values when blood was obtained from uninfected mice? For infected mice treated with cipargamin plus tafenoquine, there was no amplification. Was each reaction subjected to a maximum of 40 cycles (as suggested by Figure 5D)?

      In our qPCR assay, genomic DNA was extracted from 200 µL of blood per mouse (Line 458). In each reaction, we used 100 ng of genomic DNA (Line 464), and the thermocycling conditions were set at 40 cycles. We set the cut-off at 35 cycles based on our optimization experiments: samples with Ct values ≤ 35 consistently indicated the presence of parasite DNA, while samples without parasite DNA (distilled water and DNA from uninfected mice) had Ct values > 35 or were undetectable. Although each reaction was run for 40 cycles, for our analysis we defined samples as negative if no signal was observed by cycle 35. In mice treated with cipargamin plus tafenoquine, no signal was detected through 40 cycles, indicating the absence of parasite DNA in the samples.
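
      The cut-off rule described in this response can be illustrated with a short sketch. It is not the authors' analysis code: the function name `classify_ct` and the convention of representing a reaction with no amplification as `None` are assumptions made here for illustration, while the 35-cycle cut-off and the 40-cycle run length are taken from the response.

```python
from typing import Optional

CT_CUTOFF = 35.0  # cut-off stated in the response
MAX_CYCLES = 40   # cycles run per reaction, as stated in the response


def classify_ct(ct: Optional[float], cutoff: float = CT_CUTOFF) -> str:
    """Classify one qPCR reaction as 'positive' or 'negative'.

    ct is the cycle-threshold value, or None when no amplification was
    detected within MAX_CYCLES (hypothetical convention for this sketch).
    """
    if ct is None:
        # No signal through the full run: treated as negative.
        return "negative"
    if not 0 < ct <= MAX_CYCLES:
        raise ValueError(f"Ct must lie in (0, {MAX_CYCLES}], got {ct}")
    # Ct at or below the cut-off counts as parasite DNA detected.
    return "positive" if ct <= cutoff else "negative"


if __name__ == "__main__":
    for value in (28.4, 35.0, 37.2, None):
        print(value, classify_ct(value))
```

      Under this rule, a late signal between cycles 35 and 40 is scored the same way as no amplification at all, which is why the response distinguishes the cut-off from the total number of cycles run.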

      Comment 8: Persistence of parasite DNA in blood of tafenoquine-treated mice highlights the limitation of PCR to assess persistence of infection. That is, PCR cannot distinguish between viable parasites and non-viable (or dead) parasites. An adoptive transfer of blood to immunocompromised mice can help determine whether persistence of DNA is due to persistence of viable parasites. Because the experiment was carried out in SCID mice, no adoptive transfer is needed. Few parasites are required for a successful infection of immunocompromised mice (SCID mice included). Given that parasitemia never rose following treatment of SCID mice with a single dose of tafenoquine, it is highly likely that parasite DNA detected on day 90 post-infection in these tafenoquine-treated mice came from persistent non-viable/dead parasites.

      We appreciate your comment and acknowledge that the use of PCR has limitations in differentiating between live and dead parasites. It is possible that the residual DNA may represent a small population of dormant parasites that are not actively replicating and thus remain below the detection threshold of parasitemia. Even in highly immunocompromised SCID mice, such dormant parasites might persist without causing overt infection under our experimental conditions. An adoptive transfer experiment in SCID mice, although not strictly necessary, could validate whether the detection of low levels of DNA comes from viable parasites capable of reactivating under different circumstances. Future studies using more sensitive viability assays or adoptive transfer approaches could provide further insights into this possibility.

    1. eLife Assessment

      This study provides important insights as to how interacting brain areas produce movement during the execution of a skilled multi-directional reaching task. Using a combination of single neuron and neural population analysis, optogenetic stimulation, and computational models, the authors provide convincing evidence of an asymmetrical influence between mouse premotor and motor cortex during the execution of a well-practiced behaviour. This asymmetry can only be captured by some but not all population analysis methods, which is a key lesson to the field in and of itself. Analyzing how activity that is shared and private to these areas relates to different aspects of movements, and why different methods provide different outcomes regarding the nature of inter-area interactions would further strengthen this work.

    2. Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area, than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes of the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components in the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of the variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. 
      These results emphasize the challenges in studying interactions between brain regions with reciprocal interactions, multiple external inputs, and recurrent within-region connections.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns are convincing and add to the growing body of evidence that highlights the complexity of the interactions between multiple regions in the motor system and goes against a simple feedforward-like hierarchy.

      The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, and should be of interest to the general neuroscience community.

      Weaknesses:

      While the reciprocal interaction and similarity in neural activity across RFA and CFA is an important observation that is supported by the authors' findings, the evidence for a hierarchical interaction between the two regions appears to be weaker. The primary evidence for a hierarchical interaction comes from a causal optogenetic manipulation, carried out at the onset of the reaching movement and conducted with n = 3 in each experimental group, which shows an effect in both regions, yet the effect is greater when silencing the activity in RFA and examining the resulting change in CFA than vice versa. Analysis of the simultaneously recorded neural activity, on the other hand, reveals mostly no clear hierarchy with leading or lagging dynamics between the regions. The findings of the optogenetic manipulation might be more compelling if similar effects were observed when the same manipulation was applied at different stages of movement preparation and execution, indicating a consistent interaction that is independent of the movement phase.

      The methods used to investigate hierarchical interactions through analysis of simultaneously recorded activity yielded inconsistent results. For instance, CCA and PLS showed no clear lead-lag relationship, while DLAG provided some evidence suggesting RFA leads CFA. Overall, these methods largely failed to demonstrate a clear hierarchical interaction. Assuming a partial hierarchy exists, this inconsistency may indicate that the hierarchy is not reflected in the activity patterns or that these analytical methods are inadequate for detecting such interactions within complex neural networks that are influenced by multiple external inputs, reciprocal inter-regional connections, and dominant intra-regional recurrent activity.

      As is also argued by the authors, these inconsistent findings underscore the need for caution when interpreting results from similar analyses used to infer inter-regional interactions from neural activity patterns alone. However, the study lacks sufficient explanation for why different methods yielded different results, and more elaborate clarification is needed for the findings presented. For example, in the population-level analyses using CCA and PLS, the authors show that both techniques reveal components that are highly similar across regions and explain a substantial portion of each region's variance. Yet, shifting the activity of one region relative to the other to explore potential lead-lag relationships does not alter the results of these analyses. If the regions' activities were better aligned at some unknown true lead-lag time (or aligned at zero), one would expect a peak in alignment within the tested range, as is observed when these same analyses are applied to activity within a single region. It is thus unclear why shifting one region's activity relative to the other does not change the outcome. The interpretation of these results, therefore, remains ambiguous and would benefit from further clarification.
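
      The expectation described above, a peak in alignment at the true lead-lag time, can be illustrated on synthetic data. The following is a minimal sketch and not the authors' pipeline: the function names (`first_canonical_corr`, `lagged_cca`, `simulate_pair`), the array shapes, and the simulated signals are all assumptions made here for illustration. The canonical correlation is computed with the standard QR-plus-SVD approach using NumPy only.

```python
import numpy as np


def first_canonical_corr(X, Y):
    """Largest canonical correlation between two (time x units) arrays."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    qx, _ = np.linalg.qr(X)
    qy, _ = np.linalg.qr(Y)
    # Singular values of qx.T @ qy are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(s[0], 1.0))


def lagged_cca(X, Y, max_lag):
    """First canonical correlation for each lag in [-max_lag, max_lag].

    A positive lag pairs X[t] with Y[t + lag], so a peak at a positive
    lag means X leads Y.
    """
    T = X.shape[0]
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = X[:T - lag], Y[lag:]
        else:
            xs, ys = X[-lag:], Y[:T + lag]
        profile[lag] = first_canonical_corr(xs, ys)
    return profile


def simulate_pair(T=500, delay=5, seed=0):
    """Hypothetical data: two 'regions' driven by a shared 2-D latent,
    with region X leading region Y by `delay` samples."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal((T + delay, 2))
    X = latent[delay:] @ rng.standard_normal((2, 6)) + 0.1 * rng.standard_normal((T, 6))
    Y = latent[:T] @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((T, 5))
    return X, Y


if __name__ == "__main__":
    X, Y = simulate_pair()
    profile = lagged_cca(X, Y, max_lag=10)
    best = max(profile, key=profile.get)
    print("lag with highest canonical correlation:", best)
```

      In this toy case the shared latent is temporally uncorrelated, so the alignment profile is sharply peaked at the simulated delay. Signals with strong temporal autocorrelation, such as smooth trial-averaged firing rates, produce broader and flatter profiles in general; whether that accounts for the results discussed here is not something this sketch can establish.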

    3. Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishikawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I would detail some questions and issues on specifics of data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., relative importance of local dynamics, providing useful insights in inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in former analyses but not in the latter?

      (2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

      Comments on revisions:

      I appreciate the authors' thoughtful revisions in response to prior reviews, which I believe have substantially improved the manuscript. In particular, I found the addition of the new section "Manifestations of hierarchy in firing patterns" to be valuable, as it begins to address some of the more complex and potentially conflicting observations.

    4. Reviewer #3 (Public review):

      This study investigates how two cortical regions which are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using (1) optogenetic manipulations in one area while recording extracellularly from the other, (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and (3) network modeling. The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

      The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

      A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25 ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50 ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (Granger causality, transfer entropy, and convergent cross mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be premature to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely include this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to control of movement details. An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Further experimental investigation is needed to separate this hypothesis from other possibilities.

      Comments on revisions:

      The authors have improved the manuscript by reviewing several aspects of the text and the addition of supplemental materials. I believe these revisions have clarified some important aspects of the results.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes in the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components at the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. These results put into question whether previous and current findings that demonstrate asymmetry in the output of regions can be interpreted as evidence for asymmetrical (and thus hierarchical) inputs between regions, emphasizing the challenges in studying interactions between any brain regions.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns without a clear lead/lag interaction, are convincing and add to the growing body of evidence that highlight the complexity of the interactions between multiple regions in the motor system and go against a simple feedforward-like network and dynamics. The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds an important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, that should be of interest to the general neuroscience community.

      Weaknesses:

      While the similarity of the activity patterns across regions and lack of a clear leading/lagging interaction are interesting observations that are mostly supported by the findings presented (however, see comment below for lack of clarity in CCA/PLS analyses), the main question posed by the authors - whether there exists an endogenous hierarchical interaction between RFA and CFA - seems to be left largely open. 

      The authors note that there is currently no clear evidence of asymmetrical reciprocal influence between naturally occurring neural activity patterns of the two regions, as previous attempts have used non-natural electrical stimulation, lesions, or pharmacological inactivation. The use of acute optogenetic perturbations does not seem to be vastly different in that aspect, as it is a non-natural stimulation of inhibitory interneurons that abruptly perturbs the ongoing dynamics.

      We do believe that our optogenetic inactivation identifies a causal interaction between the endogenous activity patterns in the excitatory projection neurons, which we have largely silenced, and the downstream endogenous activity that is perturbed. The effect in the downstream region results directly from the silencing of activity in the excitatory projection neurons that mediate each region’s interaction with other regions. Here we have performed a causal intervention common in biology: a loss-of-function experiment. Such experiments generally reveal that a causal interaction of some sort is present, but often do not clarify much about the nature of the interaction, as is true in our case. By showing that a silencing of endogenous activity in one motor cortical region causes a significant change to the endogenous activity in another, we establish a causal relationship between these activity patterns. This is analogous to knocking out the gene for a transcription factor and observing causal effects on the expression of other genes that depend on it. 

      Moreover, our experiments are, to our knowledge, the first that localize a causal relationship to endogenous activity in motor cortex at a particular point during a motor behavior. Lesion and pharmacological or chemogenetic inactivation have long-lasting effects, and so their consequences on firing in other regions cannot be attributed to a short-latency influence of activity at a particular point during movement. Moreover, the involvement of motor cortex in motor learning and movement preparation/initiation complicates the interpretation of these consequences in relation to movement execution, as disturbance to processes on which execution depends can impede execution itself. Stimulation experiments generate spiking in excitatory projection neurons that is not endogenous.

      That said, we would agree that the form of the causal interaction between RFA and CFA remains unaddressed by our results. These results do not expose how the silenced activity patterns affect activity in the downstream region, just as knocking out a transcription factor gene does not expose how the transcription factor influences the expression of other genes. To show evidence for a specific type of interaction dynamics between RFA and CFA, a different sort of experiment would be necessary. See Jazayeri and Afraz, Neuron, 2017 for more on this issue.

      Furthermore, the main finding that supports a hierarchical interaction is a difference in the absolute change of firing rates as a result of the optogenetic perturbation, a finding that is based on a small number of animals (N = 3 in each experimental group), and one which may be difficult to interpret. 

      Though N = 3, we do show statistical significance. Moreover, using three replicates is not uncommon in biological experiments that require a large technical investment.

      As the authors nicely demonstrate in their neural network model, the two regions may differ in the strength of local within-region inhibitory connections. Could this theoretically also lead to a difference in the effect of the artificial light stimulation of the inhibitory interneurons on the local population of excitatory projection neurons, driving an asymmetrical effect on the downstream region? 

      We (Miri et al., Neuron, 2017) and others (Guo et al., Neuron, 2014) have shown that the effect of this inactivation on excitatory neurons in CFA is a near-complete silencing (90-95% within 20 ms). There thus is not much room for the effects on projection neurons in RFA to be much larger. We have measured these local effects in RFA as part of other work (Kristl et al., bioRxiv, 2025), verifying that the effects on RFA projection neuron firing are not larger.

      Moreover, the manipulation was performed upon the beginning of the reaching movement, while the premotor region is often hypothesized to exert its main control during movement preparation, and thus possibly show greater modulation during that movement epoch. It is not clear if the observed difference in absolute change is dependent on the chosen time of optogenetic stimulation and if this effect is a general effect that will hold if the stimulation is delivered during different movement epochs, such as during movement preparation.

      We agree that the dependence of RFA-CFA interactions on movement phase would be interesting to address in subsequent experiments. While a strong interpretation of lesion results might lead to a hypothesis that premotor influence on primary motor cortex is local to, or stronger during, movement preparation as opposed to execution, at present there is to our knowledge no empirical support from interventional experiments for this hypothesis. Moreover, existing results from analysis of activity in these two regions have produced conflicting results on the strength of interaction between these regions during preparation. Compare, for example, Bachschmid-Romano et al., eLife, 2023 to Kaufman et al., Nature Neuroscience, 2014.

      That said, this lesion interpretation would predict the same asymmetry we have observed from perturbations at the beginning of a reach - a larger effect of RFA on CFA than vice versa.

      Another finding that is not clearly interpretable is in the analysis of the population activity using CCA and PLS. The authors show that shifting the activity of one region compared to the other, in an attempt to find the optimal leading/lagging interaction, does not affect the results of these analyses. Assuming the activities of both regions are better aligned at some unknown ground-truth lead/lag time, I would expect to see a peak somewhere in the range examined, as is nicely shown when running the same analyses on a single region's activity. If the activities are indeed aligned at zero, without a clear leading/lagging interaction, but the results remain similar when shifting the activities of one region compared to the other, the interpretation of these analyses is not clear.

      Our results in this case were definitely surprising. Many share the intuition that there should be a lag at which the correlations in activity between regions may be strongest. The similarity in alignment across lags we observed might be expected if communication between regions occurs over a range of latencies as a result of dependence on a broad diversity of synaptic paths that connect neurons. In the Discussion, we offer an explanation of how to reconcile these findings with the seemingly different picture presented by DLAG.
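
      To make the reviewer's intuition concrete, the following is a minimal sketch of a lag-shifted alignment analysis. It is an illustration only and not the analysis code used in the study: the function names, the QR/SVD route to canonical correlations, and the simulated data (a hypothetical pair of populations sharing white-noise latents with a fixed 5-bin delay) are all assumptions made for the example. With a single fixed delay and temporally unstructured latents, alignment peaks sharply at the true lag; communication spread over a broad range of latencies would be expected to flatten such a peak.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two (time x units) matrices."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for each column space; the singular values of
    # their cross-product are the canonical correlations.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def lagged_alignment(X, Y, lags):
    """Mean canonical correlation after shifting Y relative to X.

    A positive lag pairs X[t] with Y[t + lag], i.e. it tests whether
    Y follows X by `lag` time bins.
    """
    T = X.shape[0]
    out = []
    for lag in lags:
        if lag >= 0:
            xs, ys = X[:T - lag], Y[lag:]
        else:
            xs, ys = X[-lag:], Y[:T + lag]
        out.append(canonical_correlations(xs, ys).mean())
    return np.array(out)

def simulate_delayed_pair(n_bins=2000, delay=5, n_latents=3, n_units=8, seed=0):
    """Hypothetical data: Y reads out the same latents as X, `delay` bins later."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_bins + delay, n_latents))
    X = Z[delay:] @ rng.standard_normal((n_latents, n_units))
    Y = Z[:n_bins] @ rng.standard_normal((n_latents, n_units))
    X += 0.5 * rng.standard_normal(X.shape)  # private noise in each population
    Y += 0.5 * rng.standard_normal(Y.shape)
    return X, Y

X, Y = simulate_delayed_pair()
lags = np.arange(-10, 11)
alignment = lagged_alignment(X, Y, lags)
best_lag = int(lags[np.argmax(alignment)])
```

      Real firing rates are temporally smooth, so any peak would be broader than in this toy case even when a single dominant lag exists.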

      Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishikawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross-mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I would detail some questions and issues on specifics of data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., the relative importance of local dynamics, providing useful insights in inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in former analyses but not in the latter?

      We certainly agree that the similarity in firing patterns is higher in trial averages than on single trials, given the variation in single-neuron firing patterns across trials. Here, we were trying to examine the similarity of activity variance that is clearly movement dependent, as trial averages are, and to use an approach aligned with those applied in the existing literature. We would also agree that there is more that can be learned about interactions from trial-by-trial analysis. It is possible that the activity components identified by DLAG as being asymmetric somehow are not reflected strongly in trial averages. In our Discussion we offer another potential explanation that is based on other differences in what is calculated by DLAG and CCA/PLS.

      We also note here that all of the firing pattern predictivity analysis we report (Figure 6) was done on single trial data, and in all cases the predictivity was symmetric. Thus, our results in aggregate are not consistent with symmetry purely being an artifact of trial averaging.
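
      The effect of trial averaging on apparent similarity can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not an analysis of the recorded data: the two simulated units, the sinusoidal movement-locked signal, and the noise level are all hypothetical. Each unit carries the same movement-locked signal plus independent trial-to-trial noise, and the correlation is computed once on concatenated single trials and once on trial averages.

```python
import numpy as np

def similarity_single_vs_averaged(a, b):
    """Correlation between two units' activity, given (trials x time) arrays.

    Returns (r_single, r_avg): the correlation over all concatenated
    single-trial samples, and the correlation between trial averages.
    """
    r_single = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    r_avg = np.corrcoef(a.mean(axis=0), b.mean(axis=0))[0, 1]
    return r_single, r_avg

def simulate_two_units(n_trials=50, n_bins=100, noise_sd=1.0, seed=0):
    """Hypothetical units sharing a movement-locked signal with private noise."""
    rng = np.random.default_rng(seed)
    signal = np.sin(np.linspace(0.0, 2.0 * np.pi, n_bins))
    a = signal + noise_sd * rng.standard_normal((n_trials, n_bins))
    b = signal + noise_sd * rng.standard_normal((n_trials, n_bins))
    return a, b

a, b = simulate_two_units()
r_single, r_avg = similarity_single_vs_averaged(a, b)
```

      Averaging shrinks the independent noise variance by roughly a factor of the trial count while leaving the shared signal intact, so `r_avg` exceeds `r_single` here. This is why symmetric results obtained on single trials are informative about whether symmetry is an averaging artifact.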

      (2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within-area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      These are all very interesting questions. Our study does not attempt to parse activity into components predictive of muscle activity and others that may reflect other functions. Distinct components of RFA and CFA activity may very well rely on distinct interactions between them.

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      The key comparison in Figure 7 is shown in 7F, where the firing rates are accounted for in calculating the across-region input strength. Equalizing the firing rates in RFA and CFA would effectively increase RFA rates. If the mean firing rates in each region were appreciably dependent on across-region inputs, we would then expect an off-setting change in the RFA→CFA weights, such that the RFA→CFA distributions in 7F would stay the same. We would also expect the CFA→RFA weights would increase, since RFA neurons would need more input. This would shift the CFA→RFA (blue) distributions up. Thus, if anything, the key difference in this panel would only get larger. 

      We also generally feel that it is a better approach to fit the actual firing rates, rather than normalizing, since normalizing the firing rates would take us further from the actual biology, not closer.

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

      As we state above, we agree that examining interactions specific to movement-related activity components could reveal interesting structure in interregional interactions. Since it remains a challenge to rigorously identify a subset of neural activity patterns specifically related to driving muscle activity, any such analysis would involve an additional assumption. It remains unclear how well the activity that decoders use for predicting muscle activity matches the activity that actually drives muscle activity in situ.

      To address this issue, which relates to one raised by Reviewer #3 below, we have added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      Reviewer #3 (Public review):

      This study investigates how two cortical regions that are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using

      (1) optogenetic manipulations in one area while recording extracellularly from the other, (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and (3) network modeling.

      The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

      The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings, and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

      A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (Granger causality, transfer entropy, and convergent cross mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be preemptive to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely isolate this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to the control of movement details. 

      An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      To clarify, we are not claiming that firing patterns in no way reflect the asymmetric functional influence that we demonstrate with optogenetic inactivation. Instead, we show that certain types of analysis that we might expect to reflect such influence, in fact, do not. Indeed, DLAG did exhibit asymmetries that matched those seen in functional influence (at least qualitatively), though other methods we applied did not.

      As we state above, we do think that there is more that can be gleaned by looking at influence specifically in terms of activity related to movement. However, if we did find that movement-related activity exhibited an asymmetry following functional influence, our results imply that the remaining activity components would exhibit an opposite asymmetry, such that the overall balance is symmetric. This would itself be surprising. We also note that the components identified by CCA and PLS do show substantial variation across reach targets, indicating that they are not only reflecting condition-invariant components. These analyses were performed on components accounting for well over 90% of the total activity variance, suggesting that both condition-dependent and condition-invariant components should be included.

      To address the concern about condition-dependent and condition-invariant components, we have added a sentence to the Results section reporting our CCA and PLS results: “Because our results here involve the vast majority of trial-averaged activity variance, we expect that they encompass both components of activity that vary for different movement conditions (condition-dependent), and those that do not (condition-invariant).” To address the general concerns about potential differences in activity components specifically related to muscle activity, we have also added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Given the focus on this idea in the discussion, further experimental investigation is warranted.

      While we do not provide experimental support for this hypothesis, the data we present also do not contradict this hypothesis. Here we used modeling as it is often used - to capture experimental results and generate hypotheses about potential explanations. We do feel that our Discussion makes clear where the hypothesis derives from and does not misrepresent the lack of experimental support. We expect readers will take our engagement with this hypothesis with the appropriate grain of salt. The imaginable experiments to support such a hypothesis would constitute another substantial study, requiring numerous controls - a whole other paper in itself.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) There are a few small text/figure caption modifications that can be made for clarity of reading:

      (2) Unclear sentence in the second paragraph of the introduction: "For example, stimulation applied in PM has been shown to alter the effects on muscles of stimulation in M1 under anesthesia, both in monkeys and rodents."

      This sentence has been rephrased for clarity: “For example, in anesthetized monkeys34 and rodents35, stimulation in PM alters the effects of stimulation in M1 on muscles.”

      (3) The first section of the results presents the optogenetic manipulation. However, the critical control that tests whether this was strictly a local manipulation that did not affect cells in the other region is introduced only much later. It may be helpful to add a comment in this section noting that such a control was performed, even if it is explained in detail later when introducing the recordings.

      We have added the following to the first Results section: “we show below that direct optogenetic effects were only seen in the targeted forelimb area and not the other.”

      (4) Figure 1D - I imagine these averages are from a single animal, but this is not stated in the figure caption.

      “For one example mouse,” has been added to the beginning of the Figure 1D legend.

      (5) Figure 2F - N=6 is not stated in the panel's caption (though it can make it clearer), while it is stated in the caption of 2H.

      “n = 6 mice” has been added to the Figure 2F legend.

      (6) There's some inconsistency with the order of RFA/CFA in the figures, sometimes RFA is presented first (e.g., Figure 1D and 1F), and sometimes CFA is presented first (e.g., panels of Figure 2).

      We do not foresee this leading to confusion.

      (7) "As expected, the majority of recorded neurons in each region exhibited an elevated average firing rate during movement as compared to periods when forelimb muscles were quiescent (Figure 2D,E; Figure S1A,B)" - Figure S1A,B show histograms of narrow vs. wide waveforms, is this the relevant figure here?

      We apologize for the cryptic reference. The waveform width histograms were referred to here because they enabled the separation of narrow- and wide-waveform cells shown in Figure 2D,E. We have added the following clause to the referenced sentence to make this explicit:  “, both for narrow-waveform, putative interneurons and wide-waveform putative pyramidal neurons.”

      (8) Figure 2I caption - "The fraction of activity variance from 150 ms before reach onset to 150 ms after it that occurs before reach onset" - this sentence is not clear.

      The Figure 2I legend has been updated to “The activity variance in the 150 ms before muscle activity onset, defined as a fraction of the total activity variance from 150 ms before to 150 ms after muscle activity onset, for each animal (circles) and the mean across animals (black bars, n = 6 mice).”
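
      As an illustrative sketch only (not the authors' code), the quantity described in this legend could be computed as below. The function name, the input layout (one list of trial-averaged rates per neuron, with bin times relative to muscle activity onset), and the choice to measure variance as squared deviations from each neuron's mean over the full 300 ms window are all assumptions made for the example.

```python
def pre_onset_variance_fraction(rates, times_ms, onset_ms=0.0, half_window_ms=150.0):
    """Fraction of activity variance that falls before onset.

    rates    : list of per-neuron lists, one trial-averaged rate per time bin
               (hypothetical layout, assumed for this sketch)
    times_ms : bin times in ms, relative to muscle activity onset
    """
    # Bins inside the full window [onset - 150 ms, onset + 150 ms)
    window = [i for i, t in enumerate(times_ms)
              if onset_ms - half_window_ms <= t < onset_ms + half_window_ms]
    # Subset of those bins that precede onset
    pre = [i for i in window if times_ms[i] < onset_ms]

    total_ss = 0.0
    pre_ss = 0.0
    for neuron in rates:
        # Assumption: deviations are taken from the neuron's mean over the full window
        mean_rate = sum(neuron[i] for i in window) / len(window)
        total_ss += sum((neuron[i] - mean_rate) ** 2 for i in window)
        pre_ss += sum((neuron[i] - mean_rate) ** 2 for i in pre)

    return pre_ss / total_ss if total_ss > 0 else float("nan")
```

      Under these assumptions, a value well below 0.5 indicates that most of the activity change in the window occurs after muscle activity onset.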

      (9) Figure 4B-G - is this showing results across the 6 animals? Not stated clearly.

      Yes - the 21 sessions we had referred to are drawn from all six mice. We have updated the legend here to make this explicit.

      (10) DLAG analysis - is there any particular reasoning behind choosing four across-region and four within-region components?

      In actuality, we completed this analysis for a broad range of component numbers and obtained similar results in all cases. Four fell in the center of our range, and so we focused the illustrations shown in the figure on this value. In general, the number of components is arbitrary. The original paper from Gokcen et al. describes a method for identifying a lower bound on the number of distinct components the method can identify. However, this method yields different results for each individual recording session. For the comparisons we performed, we needed to use the same range of values for each session.

      (11) Figure 5A seems to show 11 across-session components, it's unclear from the caption but I imagine this should show 12 (4 components times 3 sessions?)

      As we state in the Methods, any across-region latent variable with a lag that failed to converge between the boundary values of ±200 ms was removed from the analysis. In the case illustrated in this panel, the lag for one of the components failed to converge and is not shown. We have now clarified this both in the relevant Results paragraph and in the figure legend.

      (12) Figure 5B - is each marker here the average variance explained by all across/within components that were within the specified lag criteria across sessions per mouse? In other words, what does a single marker here stand for?

      We apologize for the lack of clarity here. These values reflect the average across sessions for each mouse. We have updated the legend to make this explicit.

      Reviewer #2 (Recommendations for the authors):

      As I have addressed most of my major recommendations in the public review, I will use this section to include relatively minor points for the authors to consider.

      (1) The EMG data in Figure 1C shows distinct patterns across spouts, both in the magnitude and complexity of muscle activations. It would be interesting to investigate whether these differences in muscle activity lead to behavioral variations (e.g., reaction time, reach duration) and how they relate to the relative involvement of the two areas.

      We agree that it would be interesting to examine how the interactions between areas vary as behavior varies. While the differences between reaches here are limited, we have addressed this question for two substantially different motor behaviors (reaching and climbing) in a follow-up study that was recently preprinted (Kristl et al., bioRxiv, 2025).

      (2) How do the authors account for the lingering impact of RFA inactivation on muscle activity, which persists for tens of milliseconds after laser offset? Could this effect be due to compensatory motor activity following the perturbation? A further illustration of how the raw limb trajectories and/or muscle activity are perturbed and recovered would help readers better understand the impact of motor cortical inactivation.

      To clarify the effects of inactivation on a longer timescale, we have added a new supplemental figure showing the plots from Figure 1D over a longer time window extending to 500 ms after trial onset (new Figure S1). Lingering effects do persist, at least in certain cases. In general, we find it hard to ascertain the source of optogenetic effects on longer timescales like this. On the shortest timescales, effects will be mediated by relatively direct connections between regions. However, on these longer timescales, effects could be due to broader changes in brain and behavioral state that can influence muscle activity. For example, attempts to compensate for the initial disturbance to muscle activity could cause divergence from controls on these longer timescales. Muscle tissue itself is also known to have long-timescale relaxation dynamics, and it would not be surprising if the relevant control circuits here also had long-timescale dynamics, such that we would not expect an immediate return to control when the light pulse ends. Because of this ambiguity, we generally avoid interpretation of optogenetic effects on these longer timescales.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 9: ". We measured the time at which the activity state deviated from baseline preceding reach onset," - I cannot find how this deviation was defined (neither the baseline nor the threshold).

      We have added text to the Figure 2G legend that explicitly states how the baseline and activity onset time were defined.

      (2) Given the shape of the curves in Figure 2G, the significance of this result seems susceptible to slight modifications of what defines a baseline or a deviation threshold. For example, it looks like the circle for CFA has a higher y-axis value, suggesting the baseline deviance is higher, but it is unclear why that would be from the plot. If the threshold for deviation in neural activity state were held uniform between CFA and RFA is the difference still significant across animals?

      We have repeated the analysis using the same absolute threshold for each region. We used the higher of the two thresholds from each region. The difference remains significant. This is now described in the last paragraph of the Results section for Figure 2.
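
      A minimal sketch of this kind of control, for illustration only and not the authors' implementation: each region gets its own threshold from a baseline period, the higher of the two is applied to both regions, and onset is the first post-baseline time at which a region's deviation trace exceeds it. The baseline definition (mean plus 3 SD of the trace during an early baseline window) and all function names are assumptions; the actual definitions are given in the Figure 2G legend.

```python
from statistics import mean, stdev

def region_threshold(trace, times_ms, baseline_end_ms, n_sd=3.0):
    # Assumed rule: baseline mean + n_sd * baseline SD (hypothetical choice)
    baseline = [x for x, t in zip(trace, times_ms) if t < baseline_end_ms]
    return mean(baseline) + n_sd * stdev(baseline)

def onset_time(trace, times_ms, threshold, start_ms):
    # First time at or after start_ms where the trace exceeds the threshold
    for x, t in zip(trace, times_ms):
        if t >= start_ms and x > threshold:
            return t
    return None  # the trace never crossed the threshold

def onsets_with_shared_threshold(trace_a, trace_b, times_ms, baseline_end_ms):
    # Use the higher of the two per-region thresholds for both regions
    shared = max(region_threshold(trace_a, times_ms, baseline_end_ms),
                 region_threshold(trace_b, times_ms, baseline_end_ms))
    onset_a = onset_time(trace_a, times_ms, shared, baseline_end_ms)
    onset_b = onset_time(trace_b, times_ms, shared, baseline_end_ms)
    return onset_a, onset_b, shared
```

      Applying the higher threshold to both regions is the conservative choice: it can only delay the detected onset for the region with the lower baseline variability, so a lead that survives it is not explained by the threshold difference.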

      (3) Since summed deviation of the top 3 PCs is used to show a difference in activity onset between CFA/RFA, but only a small proportion of variance is explained pre-movement (<2% in most animals), it seems relevant to understand what percentage of CFA/RFA neuron activity actually is modulated and deviates from baseline prior to movement and to show the distribution of activity onsets at the single neuron level in CFA/RFA. Can an onset difference only be observed using PCA? 

      Because many neurons have low firing rates, estimating the time at which their firing rate begins to rise near reach onset is difficult to do reliably. It is also true that not all neurons show an increase around onset - some show a decrease and others show no discernible change. Using PCs to measure onset avoids both of these problems, since they capture both increases and decreases in individual neuron firing rates and are much less noisy than individual neuron firing rates. 

      However, based on this comment, we have repeated this analysis on a single-neuron level using only neurons with relatively high average firing rates. Specifically, we analyzed neurons with mean firing rates above the 90th percentile across all sessions within an animal. Neurons whose activity never crossed threshold were excluded. Results matched those using PCs, with RFA neurons showing an earlier average activity onset time. This is now described in the last paragraph of the Results section for Figure 2.

      (4) It is stated that to study the impact of inactivation on CFA/RFA activity, only the 50 highest average firing rate neurons were used (and maybe elsewhere too, e.g., convergent cross mapping). It is unclear why this subselection is necessary. It is justified by stating that higher firing rate neurons have better firing rate estimates. This may be supportable for very low firing rate units that spike sorting tools have a hard time tracking, but I don't think this is supported by data for most of the distribution of firing rates. It therefore seems like the results might be biased by a subselection of certain high firing rate neuron populations. It would be useful to also compute and mention if the results for all neurons/neuron pairs are the same. If there is worry about low-quality units being those with low firing rates, a threshold for firing rate as used elsewhere in the paper (at least 1 spike / 2 trials) seems justified.

      The issue here is that as firing rates decrease and firing rate estimates get noisier, estimates of the change in firing rate get more variable. Here we are trying to estimate the fraction of neurons for which firing rates decreased upon inactivation of the other region. Variability in estimates of the firing rate change will bias this estimate toward 50%, since in the limit when the change estimates are entirely based on noise, we expect 50% to be decreases. As expected, when we use increasingly liberal thresholds for this analysis, the fraction of decreases trends closer to 50%. 

      As a consequence of this, we cannot easily distinguish whether higher firing rate neurons might for some reason have a greater tendency to exhibit decreases in firing compared to lower firing rate neurons. However, we see no positive reason to expect such a difference. We have added a sentence noting this caveat in interpreting our findings to the relevant paragraph of the Results.
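
      The bias described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of the recorded data: every simulated neuron has a true firing rate change of fixed magnitude, a set fraction of neurons truly decrease, and the estimated change is the true change plus Gaussian noise. The function name and all parameter values are hypothetical.

```python
import random

def estimated_fraction_decreased(n_neurons, true_fraction_decreased,
                                 effect_size, noise_sd, seed=0):
    """Fraction of neurons whose *estimated* firing rate change is negative."""
    rng = random.Random(seed)  # local generator so results are repeatable
    n_decreased = 0
    for _ in range(n_neurons):
        # True change: a decrease with the given probability, otherwise an increase
        if rng.random() < true_fraction_decreased:
            true_change = -effect_size
        else:
            true_change = effect_size
        # Noisy estimate of the change (noisier for low firing rate neurons)
        estimate = true_change + rng.gauss(0.0, noise_sd)
        if estimate < 0:
            n_decreased += 1
    return n_decreased / n_neurons

# Low noise recovers roughly the true fraction; high noise pulls it toward 0.5
low_noise = estimated_fraction_decreased(20000, 0.8, 1.0, 0.1)
high_noise = estimated_fraction_decreased(20000, 0.8, 1.0, 10.0)
print(low_noise, high_noise)
```

      In this toy setting the true fraction of decreases is 0.8 in both runs, yet the noisier estimate lands much closer to 0.5, which is the pattern expected when lower firing rate neurons are admitted to the analysis.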

      The lack of min/max axis values in Figure 3B-F makes it hard to interpret - are these neurons almost silent when near the bottom of the plot or are they still firing a substantial # of spikes?

      To aid interpretation of the relative magnitude of firing rate changes, we have added minimum firing rates for the averages depicted in Figure 3B,C,E and F to the legend. Our original thinking was that the plots in Figure 3G and H would provide an indication of the relative changes in firing.

      It would be interesting to know if the impact of optogenetic stimulation changed with exposure to the manipulation. Are all results presented only from the first X number of sessions in each animal? Or is the effect robust over time and (within the same animal) you can get the same results of optogenetic inactivation over time? This information seems critical for reproducibility.

      We have now performed brief optogenetic inactivations in several brain areas in several different behavioral paradigms, and have found that inactivation effects are stable both within and across sessions, almost surprisingly so. This includes cases where the inactivations were more frequent (every ~1.25 s on average) and more numerous (>15,000 trials per animal) than in the present manuscript. Thus we did not restrict our analysis here to the first X sessions or trials within a session. We have added additional plots as Figure S3T-AA showing the stability of optogenetic effects both within and across sessions.

      Given that it can be difficult to record from interneurons (as the proportion of putative interneurons in Figure S1 attests), the SALT analyses would be more convincing if a few recordings had been performed in the same region as optogenetic stimulation to show a "positive control" of what direct interneuron stimulation looks like. Could also use this to validate the narrow/wide waveform classification.

      We have verified that using SALT as we have in the present manuscript does detect vGAT+ interneurons directly responding to light. This is included in a recent preprint from the lab (Kristl et al., bioRxiv, 2025). We (Warriner et al., Cell Reports, 2022) and others (Guo et al., Neuron, 2014) have previously used direct ChR2 activation to validate waveform-based classification.

      Simultaneous CFA/RFA recordings during optogenetic perturbation would also allow for time courses of inhibition to be compared in RFA/CFA. Does it take 25ms to inhibit locally, and the cross-area impact is fast, or does it inactivate very fast locally and takes ~25ms to impact the other region?

      Latencies of this sort are difficult to precisely measure given the statistical limits of this sort of data, but there does appear to be some degree of delay between local and downstream effects. We do not have a statistical foundation as of yet for concluding that this is the case. It will be interesting to examine this issue more rigorously in the future.

      Given the difference in the analytical methods, the authors should share data in a relatively unprocessed format (e.g., spike times from sorted units relative to video tracking + behavioral data), along with analysis code, to allow others to investigate these differences.

      We plan to post the data and code to our lab’s GitHub site once the Version of Record is online.

    1. eLife Assessment

      This study provides valuable findings that MK2 inhibitor CMPD1 can inhibit the growth, migration and invasion of breast cancer cells both in vitro and in vivo. The evidence supporting the claims of the authors is solid, although the detailed molecular mechanism and additional animal experiments would strengthen the paper. This study will be of interest to the breast cancer field.

    2. Reviewer #1 (Public review):

      In this paper, the authors reveal that the MK2 inhibitor CMPD1 can inhibit the growth, migration and invasion of breast cancer cells both in vitro and in vivo by inducing microtubule depolymerization, preferentially at the microtubule plus-end, leading to cell division arrest, mitotic defects, and apoptotic cell death. They also showed that CMPD1 treatment upregulates genes associated with cell migration and cell death, and downregulates genes related to mitosis and chromosome segregation in breast cancer cells, suggesting a potential mechanism of CMPD1 inhibition in breast cancer. Besides, they used the combination of an MK2-specific inhibitor, MK2-IN-3, with the microtubule depolymerizer vinblastine to simultaneously disrupt both the MK2 signaling pathway and microtubule dynamics, and they claim that inhibiting the p38-MK2 pathway may help to enhance the efficacy of MTAs in the treatment of breast cancer.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores the potential of inhibiting the p38-MK2 signaling pathway to enhance the efficacy of microtubule-targeting agents (MTAs) in breast cancer treatment using a dual-target inhibitor.

      Strengths:

      The study identifies the p38-MK2 pathway as a promising target to enhance the efficacy of microtubule-targeting agents (MTAs), offering a novel therapeutic strategy for breast cancer treatment. The study also employs a wide range of techniques, especially live-cell imaging, to assess the microtubule dynamics in TNBC cells. The revised manuscript added new in vitro and in vivo evidence that further supported the conclusions.

      Comments on revisions:

      The authors have appropriately addressed all of my comments and concerns. Specifically, they performed additional in vitro experiments using MCF10A cells and p53 knockout cells to determine the IC50 of CMPD1. They also repeated the in vivo treatment experiment and evaluated the toxicity of the drug treatment in the CAL-51 model. Furthermore, they provided genetic evidence for the combination treatment. I'm satisfied with the revision and have no further major comments. Minor comment: make sure the name of the chemo drug shown in Fig. 3 is consistent.

    4. Reviewer #3 (Public review):

      Summary:

      The authors demonstrated that MK2i could enhance the therapeutic efficacy of MTAs. Using tumour xenograft and migration assays, the authors suggested that the p38-MK2 pathway may serve as a promising therapeutic target in combination with MTAs in cancer treatment.

      Strengths:

      The authors provided a potential treatment for breast cancer.

      Comments on revisions:

      A xenograft experiment should be included to evaluate the synergistic effect of MK2i and vinblastine.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, the authors reveal that the MK2 inhibitor CMPD1 can inhibit the growth, migration, and invasion of breast cancer cells both in vitro and in vivo by inducing microtubule depolymerization, preferentially at the microtubule plus-end, leading to cell division arrest, mitotic defects, and apoptotic cell death. They also showed that CMPD1 treatment upregulates genes associated with cell migration and cell death, and downregulates genes related to mitosis and chromosome segregation in breast cancer cells, suggesting a potential mechanism of CMPD1 inhibition in breast cancer. Besides, they used the combination of an MK2-specific inhibitor, MK2-IN-3, with the microtubule depolymerizer vinblastine to simultaneously disrupt both the MK2 signaling pathway and microtubule dynamics, and they claim that inhibiting the p38-MK2 pathway may help to enhance the efficacy of MTAs in the treatment of breast cancer. However, there are a few concerns, including:

      (1) What is the effect of CMPD1 on breast cancer metastasis?

      In this study, we hypothesized that the MK2 signaling pathway could synergize with microtubule-targeting agents (MTAs) to enhance anti-cancer efficacy. We utilized CMPD1 as a potent dual-function inhibitor, targeting both MK2 and microtubule dynamics. By simultaneously inhibiting these pathways, CMPD1 not only shows the therapeutic impact of MTAs, but also significantly suppresses breast cancer cell migration and invasion. Therefore, we propose that CMPD1, through its dual inhibition of MK2 activity and microtubule dynamics, may offer enhanced specificity and efficacy in preventing breast cancer metastasis and limiting tumor progression.

      (2) The mechanism is lacking as to how MK2 inhibitors enhance the efficacy of MTAs.

      Thank you for the valuable suggestion. We agree that our current findings do not fully elucidate the underlying mechanism by which MK2 inhibition synergistically enhances the efficacy of MTAs. We recognize this as an important area for further investigation and are committed to exploring the molecular interplay between MK2 signaling and microtubule dynamics in future studies. A deeper mechanistic understanding will be critical to establishing a strong rationale for the potential co-treatment of MK2 inhibitors and MTAs in clinical breast cancer therapy.

      Reviewer #2 (Public review):

      Summary:

      This study explores the potential of inhibiting the p38-MK2 signaling pathway to enhance the efficacy of microtubule-targeting agents (MTAs) in breast cancer treatment using a dual-target inhibitor.

      Strengths:

      The study identifies the p38-MK2 pathway as a promising target to enhance the efficacy of microtubule-targeting agents (MTAs), offering a novel therapeutic strategy for breast cancer treatment. In addition, the study employs a wide range of techniques, especially live-cell imaging, to assess the microtubule dynamics in TNBC cells.

      We sincerely appreciate your recognition of the significance and impact of our work.

      Weaknesses:

      The study primarily uses RPE1 cells as the control for normal cells, which may not fully capture the response of normal mammary epithelial cells. While CMPD1 is shown to be effective in suppressing tumor growth in MDA-MB-231 xenograft, the study lacks detailed toxicity data to confirm its safety profile in vivo.

      Thank you for your valuable suggestions. In the revised manuscript, we have included CMPD1 treatment in MCF10A cells, a more appropriate non-transformed control line commonly used in breast cancer research. Notably, MCF10A cells exhibited results similar to those observed in RPE1 cells, further reinforcing our conclusion that breast cancer cells display increased sensitivity to CMPD1 treatment. These new findings are presented in Figure 2-Supplement 1A-C. Additionally, we performed further xenograft experiments using CAL-51 and MDA-MB-231 cells. We collected data on tumor growth, mouse body weight, survival rates, and other relevant parameters to comprehensively assess toxicity. The newly obtained results are presented in Figure 3F-G and Figure 3-Supplement 1-3.

      Reviewer #3 (Public review):

      Summary:

      The authors demonstrated that MK2i could enhance the therapeutic efficacy of MTAs. Using tumor xenograft and migration assays, the authors suggested that the p38-MK2 pathway may serve as a promising therapeutic target in combination with MTAs in cancer treatment.

      Strengths:

      The authors provided a potential treatment for breast cancer.

      Thank you for recognizing the importance and significance of our work.

      Weaknesses:

      (1) In Figure 2, the authors used a human retinal pigment epithelial-1 (RPE1) cell line to show that breast cancer cells are more sensitive to CMPD1 treatment. MCF10A cells would be suggested here as a suitable control. Besides, to compare the sensitivity, the IC50 in different cell lines should be measured.

      In the revised manuscript, we have addressed these points by determining the IC50 values for CMPD1 in MDA-MB-231, CAL-51, MCF10A, and CAL-51 p53 knockout cells. These new results are presented in Figure 2-Supplement Figure 3.

      (2) The data of MDA-MB-231 in Figure 1D is not consistent with CAL-51 and T47D, also not consistent with the data in Figures 2B-C.

      In the revised manuscript, we have included all relevant statistical analyses in Figure 1D. In MDA-MB-231 cells, there are no statistically significant differences in mitotic duration between 1 µM and 5 µM, 5 µM and 10 µM, or 1 µM and 10 µM CMPD1 treatments. Similarly, no significant differences are observed between 1 µM and 5 µM or 5 µM and 10 µM CMPD1 treatments in CAL-51 cells, and between 5 µM and 10 µM in T-47D cells. These results suggest that mitotic duration does not exhibit a clear dose-dependent relationship within the 1–10 µM range, likely because mitotic arrest has reached a near-plateau effect at these concentrations.

      It is also important to note that the experimental conditions in Figures 1 and 2 are fundamentally different. Figure 1 investigates the effects of higher concentrations of CMPD1 (≥1 µM), which severely disrupt microtubule organization and result in robust mitotic arrest, with cells arrested in mitosis for over 8 hours. In contrast, the conditions in Figure 2 utilize much lower concentrations of CMPD1 (10–50 nM), which are insufficient to cause complete microtubule depolymerization, but are capable of inducing a subtle yet statistically significant mitotic delay, particularly in breast cancer cell lines. These lower concentrations were chosen to mimic clinically relevant intratumoral drug levels. Previous studies have reported that paclitaxel (PTX) concentrations in patient tumors approximate ~50 nM when modeled in vitro. At these physiologically relevant levels, PTX does not induce strong mitotic arrest but instead causes moderate delays that result in division errors and chromosomal instability, ultimately contributing to cancer cell death. In this study, the conditions used in Figure 2 emulate these clinically relevant concentrations for CMPD1. We found that, similar to PTX, low-dose CMPD1 induces a slight but significant mitotic delay without triggering a full mitotic arrest. Notably, unlike PTX, CMPD1 appears to exert this effect selectively in breast cancer cells, contributing to mitotic errors and potentially enhancing therapeutic efficacy through targeted chromosomal instability.

      (3) To support the authors' conclusion in Figure 5, an additional animal experiment performed by tail vein injection would be helpful.

      While current technical limitations have precluded us from conducting this suggested experiment in this study, we have performed complementary xenograft studies using CAL-51 cells treated with CMPD1. These experiments included a comprehensive toxicity analysis. Furthermore, we carried out an in vitro migration assay using CAL-51 cells under combined treatment with the MK2 inhibitor and vinblastine. These additional findings are presented in Figure 3–Supplement 1–3 and Figure 6–Supplement 3. We recognize the importance of the suggested tail vein injection approach and are actively pursuing further mechanistic studies, including this experiment, in our ongoing and future work.

      (4) Page 14, to evaluate the combination result of MK2i and vinblastine, an in vivo animal assay must be performed.

      We appreciate the reviewer’s valuable suggestion. We are actively investigating the synergistic mechanisms between the MK2 inhibitor and microtubule-targeting agents (MTAs). In future studies, we plan to extend our findings by conducting xenograft experiments to further evaluate their therapeutic potential in vivo.

      (5) The authors used RNA-seq to show some pathways affected by CMPD1. What are the key/top genes that were affected? How about the mechanism?

      In the revised manuscript, we have included the top 20 upregulated and downregulated genes identified from RNA-seq analysis using MDA-MB-231 cells. This new data is presented in Figure 6-Supplement Figure 4. Gene Ontology (GO) Biological Process (BP) pathway enrichment analysis revealed that the most significantly enriched pathways among upregulated genes are associated with cell migration, whereas the downregulated genes are primarily involved in mitosis and chromosome segregation. These transcriptional changes are consistent with the phenotypic outcomes observed in our experiments, supporting the functional relevance of CMPD1 treatment. However, further investigation will be necessary to elucidate the detailed molecular mechanisms underlying these effects.

      (6) Line 127, more experiments should be involved to support the conclusion.

      In the revised manuscript, we have addressed this point by performing additional experiments, including determination of the IC₅₀ values of CMPD1 in MDA-MB-231, CAL-51, MCF10A, and CAL-51 p53 knockout cells. We also conducted live-cell imaging analyses using MCF10A cells. These new results further reinforce our conclusion that breast cancer cells are more sensitive to CMPD1 treatment than normal breast epithelial cells, and that this sensitivity is independent of p53 status. The new data are presented in Figure 2-Supplement Figures 1 and 3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1D: As the concentration of CMPD1 increased, the mitotic duration of MDA-MB-231 cells decreased, why was that?

      Although there appears to be a slight decrease in mitotic duration with increasing concentrations of CMPD1, our quantitative analysis reveals no statistically significant differences among the 1 to 10 µM treatment groups in MDA-MB-231 cells. In the revised manuscript, we have included all relevant statistical analyses in Figure 1D for clarity. Importantly, all CMPD1-treated groups exhibit a pronounced and statistically significant prolongation of mitosis compared to the DMSO-treated control. While the average mitotic duration in control cells is approximately 30 minutes, cells exposed to 1–10 µM CMPD1 consistently display mitotic durations exceeding 8 hours, indicating a strong and sustained mitotic arrest across this concentration range.

      Reviewer #2 (Recommendations for the authors):

      (1) The rationale for using RPE1 as normal cell control instead of normal mammary epithelial cells as control is unclear. Using normal mammary epithelial cells such as MCF10A for the study is recommended.

      Thank you for this valuable suggestion. In the revised manuscript, we have included additional experiments using non-transformed mammary epithelial MCF10A cells. The new data, presented in Figure 2-Supplement Figures 1 and 3, include both IC50 measurements and live-cell imaging analyses. These results further support our conclusion that breast cancer cells are significantly more sensitive to CMPD1 treatment compared to normal mammary epithelial cells.

      (2) It is intriguing that CAL-51 cells are more sensitive to CMPD1 than MDA-MB-231 cells; examining how p53 signaling changes in these cells would be worthwhile.

      We appreciate this insightful comment. In the revised manuscript, we have measured the IC₅₀ values for both CAL-51 and CAL-51 p53 knockout (p53KO) cells. The results show no significant difference in CMPD1 sensitivity between the two, suggesting that the enhanced sensitivity of CAL-51 cells is independent of p53 status. These new findings are presented in Figure 2-Supplement Figure 3.

      (3) Figures S1A and B are not described and cited in the main text.

      We apologize for this oversight. In the revised manuscript, we have correctly cited and described Figures S1A and B (Figure 2-Supplement Figure 2 A-B in revised manuscript) in the main text.

      (4) I'm not that convinced by the conclusion made from Lines 201-204. First, Figure S2C, which is the growth of tumor volume, does not reflect the toxicity of the drug treatment. No additional data evaluating the toxicity (such as body weight change) under the regimen was shown. Second, although the tumor weight by the endpoint indicated some anti-tumor effect in the MDA-MB-231 xenograft model, the tumor volume does not show the same pattern (the dot lines do not well distinguish which group from which). I would suggest repeating the in vivo experiment using CAL-51 cells since it is more sensitive to CMPD1 according to the previous data.

      Thank you for this thoughtful and constructive feedback. In the revised manuscript, we have addressed these concerns through several additional experiments. We performed new xenograft studies using CAL-51 TNBC cells, in parallel with further toxicity-focused analyses in the MDA-MB-231 model. Consistent with previous results, CMPD1 treatment significantly suppressed tumor growth in CAL-51 xenografts (Figure 3F-G), further supporting its efficacy in a more sensitive cell line. To evaluate drug-associated toxicity, we measured body weight changes throughout the course of treatment. CMPD1-treated mice maintained a comparable weight gain to the control group, whereas mice treated with paclitaxel (PTX) showed significantly reduced body weight (Figure 3-Supplement Figure 2A). Notably, animal deaths occurred only in the PTX-treated groups in both MDA-MB-231 and CAL-51 models (Figure 3-Supplement Figure 2B). We also assessed organ toxicity, including both anatomical and functional evaluations of the kidney and liver, and observed no significant damage in CMPD1-treated mice (Figure 3-Supplement Figures 3A-B and 3D). Furthermore, white blood cell (WBC) counts remained stable in the CMPD1 group, while PTX treatment led to a significant reduction (Figure 3-Supplement Figures 3C-D). These additional data provide strong evidence for the anti-tumor efficacy and lower toxicity of CMPD1 in vivo.

      (5) While I appreciate the combination effect of treating cells with the MK2 inhibitor and vinblastine, I would consider using genetic knockdown as a complementary approach to demonstrate that inhibiting the p38-MK2 pathway synergizes with microtubule depolymerizing agents. In addition, could inhibition of the p38-MK2 pathway alone induce the cell growth inhibition observed with CMPD1 treatment?

      Thank you for these important suggestions. In the revised manuscript, we have incorporated siRNA-mediated knockdown of MK2 in combination with vinblastine treatment. This genetic approach revealed synergistic effects on mitotic index and mitotic errors, closely mirroring the phenotypes observed with pharmacological co-treatment using the MK2 inhibitor and vinblastine (Figure 6-Supplement Figure 2A-C). These results further validate the role of the p38-MK2 pathway in modulating mitotic progression in the presence of MTAs. To address whether MK2 inhibition alone is sufficient to impair cell growth, we performed validation experiments using the MK2 inhibitor at 10 µM. At this concentration, the inhibitor effectively blocked phosphorylation of Hsp27, a major downstream substrate of MK2, under H2O2-induced ROS stress conditions (Figure 6-Supplement Figure 1A-B), confirming MK2 signaling pathway inhibition. However, treatment with the MK2 inhibitor alone did not significantly affect cell proliferation, as shown by a 4-day growth curve analysis in CAL-51 cells (Figure 6-Supplement Figure 1C). These findings suggest that inhibition of the p38-MK2 pathway alone is not sufficient to suppress cancer cell growth, and that its synergistic interaction with MTAs, such as vinblastine, is essential for the observed anti-proliferative effects.

      (6) Phenotypic studies (such as anchorage-independent growth and cell migration and invasion assay) of combining MK2 inhibitor with vinblastine in TNBC cells are recommended.

      Thank you for this valuable suggestion. In the revised manuscript, we have conducted cancer cell migration assays using CAL-51 TNBC cells treated with control, MK2 inhibitor alone, vinblastine alone, or the combination of both. Our results demonstrate that the combination treatment significantly enhances the inhibition of cell migration compared to either agent alone (Figure 6-Supplement Figure 3A-C). These findings provide additional phenotypic evidence supporting the synergistic interaction between MK2 inhibition and microtubule-targeting agents in TNBC cells.

      Reviewer #3 (Recommendations for the authors):

      The authors can utilize diverse experiments to support their conclusions.

      Thank you for this important suggestion. In the revised manuscript, we have conducted a series of additional experiments to robustly support our conclusions.

      These include:

      (1) Xenograft studies using CAL-51 TNBC cells, along with comprehensive toxicity evaluations.

      (2) CMPD1 sensitivity analysis in non-transformed MCF10A mammary epithelial cells.

      (3) IC50 measurements in MDA-MB-231, CAL-51, CAL-51 p53 knockout, and MCF10A cells.

      (4) Cell migration assays assessing the combination effects of MK2 inhibitor and vinblastine.

      (5) siRNA-mediated genetic knockdown of MK2 to complement pharmacological findings.

      Collectively, these additional data sets substantially strengthen the evidence base for our conclusions and provide a more comprehensive mechanistic understanding.

    1. eLife Assessment

      This manuscript makes a valuable contribution to the field by uncovering a molecular mechanism for miRNA intracellular retention, mediated by the interaction of PCBP2, SYNCRIP, and specific miRNA motifs. Overall, the findings are convincing and advance our understanding of RNA-binding protein-mediated miRNA sorting, providing deeper insights into miRNA dynamics.

    2. Reviewer #1 (Public review):

      In this study, Marocco and colleagues perform a deep characterization of the complex molecular mechanism guiding the recognition of a particular CELLmotif previously identified in hepatocytes in another publication. Having miR-155-3p with or without this CELLmotif as the initial focus, the authors identify 21 proteins differentially binding to these two miRNA versions. From these, they decided to focus on PCBP2. They elegantly demonstrate PCBP2 binding to the miR-155-3p WT version but not to the CELLmotif-mutated version. miR-155-3p contains a hEXOmotif identified in a different report, whose recognition is largely mediated by another RNA-binding protein called SYNCRIP. Interestingly, mutation of the hEXOmotif contained in miR-155-3p did not only blunt SYNCRIP binding, but also PCBP2 binding despite the maintenance of the CELLmotif. This indicates that somehow SYNCRIP binding is a pre-requisite for PCBP2 binding. An EMSA assay confirms that SYNCRIP is necessary for PCBP2 binding to miR-155-3p, while PCBP2 is not needed for SYNCRIP binding. The authors then aim to extend these findings to other miRNAs containing both motifs. For that, they perform a small-RNA-Seq of EVs released from PCBP2-knockdown cells versus control cells, identifying a subset of miRNAs whose expression either increases or decreases. The assumption is that those miRNAs containing the PCBP2-binding CELLmotif should now be less retained in the cell and go more to extracellular vesicles, thus reflecting a higher EV expression. The specific subset of miRNAs having both the CELLmotif and hEXOmotif (9 miRNAs) whose expressions increase in EVs due to PCBP2 reduction is also affected by knocking down SYNCRIP in the sense that reduction of SYNCRIP leads to lower EV sorting. Further experiments confirm that PCBP2 and SYNCRIP bind to these 9 miRNAs and that knocking down SYNCRIP impairs their EV sorting.

      In the revised manuscript, the authors have addressed most of my concerns and questions. I believe the new experiments provide stronger support for their claims. My only remaining concern is the lack of clarity in the replicates for the EMSA experiment. The one shown in the manuscript is clear; however, the other three replicates hardly show that knocking down SYNCRIP has an effect on PCBP2 binding. Even worse is the fact that these replicates do not support at all that PCBP2 silencing has no effect on SYNCRIP binding, as the bands for those types of samples are, in most of the cases, not visible. I think the authors should repeat the EMSA experiment a couple of times.

    3. Reviewer #2 (Public review):

      Summary:

      The authors of this manuscript aimed to uncover the mechanisms behind miRNA retention within cells. They identified PCBP2 as a crucial factor in this process, revealing a novel role for RNA-binding proteins. Additionally, the study discovered that SYNCRIP is essential for PCBP2's function, demonstrating the cooperative interaction between these two proteins. This research not only sheds light on the intricate dynamics of miRNA retention but also emphasizes the importance of protein interactions in regulating miRNA behavior within cells.

      Strengths:

      This paper makes important progress in understanding how miRNAs are kept inside cells. It identifies PCBP2 as a key player in this process, showing a new role for proteins that bind RNA. The study also finds that SYNCRIP is needed for PCBP2 to work, highlighting how these proteins work together. These discoveries not only improve our knowledge of miRNA behavior but also suggest new ways to develop treatments by controlling miRNA locations to influence cell communication in diseases. The use of liver cell models and thorough experiments ensures the results are reliable and show their potential for RNA-based therapies.

      Weaknesses:

      The manuscript is well-structured and presents compelling data, but I noticed a few minor corrections that could further enhance its clarity:

      Figure References: In the response to Reviewer 1, the comment states, "It's not Panel C, it's Panel A of Figure 1"-this should be cross-checked for consistency.

      Supplementary Figure 2 is labeled as "Panel A"-please verify if additional panels (B, C, etc.) are intended.

      Western Blot Quality: The Alix WB shows some background noise. A repeat with optimized conditions (or inclusion of a cleaner replicate) would strengthen the data. Adding statistical analysis for all WBs would also reinforce robustness.

      These are relatively small refinements, and the manuscript is already in excellent shape. With these adjustments, it will be even stronger.

    1. eLife Assessment

      The manuscript by Hawes et al. provides important findings on how striatal projection neurons regulate spontaneous locomotion speed in the context of implicit motivation and distinct contextual valence. Overall, the evidence for the findings is solid, although evidence for the claim that striatonigral projections from the matrix and patches have functionally opposing roles is incomplete. This work will be of broad interest to neuroscientists in the basal ganglia, movement control, and cognition fields.

    2. Reviewer #1 (Public review):

      Summary:

      This fundamental work employed multidisciplinary approaches and conducted rigorous experiments to study how a specific subset of neurons in the dorsal striatum (i.e., "patchy" striatal neurons) modulates locomotion speed depending on the valence of the naturalistic context.

      Strengths:

      The scientific findings are novel and original and significantly advance our understanding of how the striatal circuit regulates spontaneous movement in various contexts.

      Weaknesses:

      This is extensive research involving various circuit manipulation approaches. Some of these circuit manipulations are not physiological. A balanced discussion of the technical strengths and limitations of the present work would be helpful and beneficial to the field. Minor issues in data presentation were also noted.

    3. Reviewer #2 (Public review):

      Hawes et al. investigated the role of striatal neurons in the patch compartment of the dorsal striatum. Using the Sepw1-Cre line, the authors combined a modified version of the light/dark transition box test that allows them to examine locomotor activity in different environmental valence with a variety of approaches, including cell-type-specific ablation, miniscope calcium imaging, fiber photometry, and opto-/chemogenetics. First, they found ablation of patchy striatal neurons resulted in an increase in movement vigor when mice stayed in a safe area or when they moved back from more anxiogenic to safe environments. The following miniscope imaging experiment revealed that a larger fraction of striatal patchy neurons was negatively correlated with movement speed, particularly in an anxiogenic area. Next, the authors investigated differential activity patterns of patchy neurons' axon terminals, focusing on those in GPe, GPi, and SNr, showing that the patchy axons in SNr reflect movement speed/vigor. Chemogenetic and optogenetic activation of these patchy striatal neurons suppressed the locomotor vigor, thus demonstrating their causal role in the modulation of locomotor vigor when exposed to valence differentials. Unlike the activation of striatal patches, such a suppressive effect on locomotion was absent when optogenetically activating matrix neurons by using the Calb1-Cre line, indicating distinctive roles in the control of locomotor vigor by striatal patch and matrix neurons. Together, they have concluded that nigrostriatal neurons within striatal patches negatively regulate movement vigor, dependent on behavioral contexts where motivational valence differs.

      In my view, this study will add to the important literature by demonstrating how patch (striosomal) neurons in the striatum control movement vigor. This study has applied multiple approaches to investigate their functionality in locomotor behavior, and the obtained data largely support their conclusions. Nevertheless, I have some suggestions for improvements in the manuscript and figures regarding their data interpretation, accuracy, and efficacy of data presentation.

      (1) The authors found that the activation of the striatonigral pathway in the patch compartment suppresses locomotor speed, which contradicts the canonical roles of the direct pathway. It would be great if the authors could provide mechanistic explanations in the Discussion section. One possibility is that striatal D1R patch neurons directly inhibit dopaminergic cells that regulate movement vigor (Nadal et al., Sci. Rep., 2021; Okunomiya et al., J Neurosci., 2025). Providing plausible explanations will help readers infer possible physiological processes and give them ideas for future follow-up studies.

      (2) On page 14, Line 301, the authors stated that "Cre-dependent mCheery signals were colocalized with the patch marker (MOR1) in the dorsal striatum (Fig. 1B)". But I could not find any mCherry on that panel, so please modify it.

      (3) From data shown in Figure 1, I've got the impression that mice ablated with striatal patch neurons were generally hyperactive, but this is probably not the case, as two separate experiments using LLbox and DDbox showed no difference in locomotor vigor between control and ablated mice. For the sake of better interpretation, it may be good to add a statement in Lines 365-366 that these experiments suggest the absence of hyperactive locomotion in general by ablating these specific neurons.

      (4) In Line 536, where Figure 5A was cited, the author mentioned that they used inhibitory DREADDs (AAV-DIO-hM4Di-mCherry), but I could not find associated data on Figure 5. Please cite Figure S3, accordingly.

      (5) Personally, the Figure panel labels of "Hi" and "ii" were confusing at first glance. It would be better to have alternatives.

      (6) There is a typo on Figure 4A: tdTomata → tdTomato

    4. Reviewer #3 (Public review):

      Hawes et al. combined behavioral, optical imaging, and activity manipulation techniques to investigate the role of striatal patch SPNs in locomotion regulation. Using Sepw1-Cre transgenic mice, they found that patch SPNs encode locomotion deceleration in a light-dark box procedure through optical imaging techniques. Moreover, genetic ablation of patch SPNs increased locomotion speed, while chemogenetic activation of these neurons decreased it. The authors concluded that a subtype of patch striatonigral neurons modulates locomotion speed based on external environmental cues. Below are some major concerns:

      The study concludes that patch striatonigral neurons regulate locomotion speed. However, unless I missed something, very little evidence is presented to support the idea that it is specifically striatonigral neurons, rather than striatopallidal neurons, that mediate these effects. In fact, the optogenetic experiments shown in Fig. 6 suggest otherwise. What about the behavioral effects of optogenetic stimulation of striatonigral versus striatopallidal neuron somas in Sepw1-Cre mice?

      In the abstract, the authors state that patch SPNs control speed without affecting valence. This claim seems to lack sufficient data to support it. Additionally, speed, velocity, and acceleration are very distinct qualities. It is necessary to clarify precisely what patch neurons encode and control in the current study.

      One of the major results relies on chemogenetic manipulation (Figure 5). It would be helpful to demonstrate through slice electrophysiology that hM3Dq and hM4Di indeed cause changes in the activity of dorsal striatal SPNs, as intended by the DREADD system. This would support both the positive (Gq) and negative (Gi) findings, where no effects on behavior were observed.

      Finally, could the behavioral effects observed in the current study, resulting from various manipulations of patch SPNs, be due to alterations in nigrostriatal dopamine release within the dorsal striatum?

    1. eLife Assessment

      This manuscript provides fundamental new insight into the mechanisms linking photoperiod, reproductive function, and feeding activity, using medaka, a genetic model that itself exhibits photoperiodic responses. As well as identifying key neuropeptide genes that are regulated by photoperiod and involved in regulating feeding activity, the authors establish a knockout line for agrp1 using a CRISPR-Cas9-based approach, profiting from the extensive use and development of this methodology in medaka. The combination of the RNAseq and quantitative in situ hybridization analysis with the knockout results as well as the study of ovariectomized fish provides compelling evidence implicating agrp1 in feeding regulation in response to photoperiod and reproductive status.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use the teleost medaka as an animal model to study the effect of seasonal changes in day-length on feeding behaviour and oocyte production. They report a careful analysis of how day-length affects female medakas and a thorough molecular genetic analysis of genes potentially involved in this process. They show a detailed analysis of two genes and include a mutant analysis of one gene to support their conclusions.

      Strengths:

      The authors pick their animal model well and exploit the possibilities to examine in this laboratory model the effect of a key environmental influence, namely the seasonal changes of day-length. The phenotypic changes are carefully analysed and well controlled. The mutational analysis of agrp1 using a ko-mutant provides important evidence to support the conclusions. Thus this report exceeds previous findings on the function of agrp1 and npyb as regulators of food-intake and shows how in medaka these genes are involved in regulating the organismal response to an environmental change. It thus furthers our understanding of how animals react to key exogenous stimuli for adaptation.

      Weaknesses:

      The authors are too modest when it comes to underscoring the importance of their findings. Previous animal models used to study the effect of these neuropeptides on feeding behaviour have either lost or were most likely never sensitive to seasonal changes of day-length. Considering the key importance of this parameter on many aspects of plant and animal life it could be better emphasised that a suitable animal model is at hand that permits this.

      The molecular characterization of the agrp1 ko-mutant that the authors have generated lacks some details that would help to appreciate the validity of the mutant phenotype. Additional data would help in this respect.

      Comments on revisions:

      The authors dealt adequately with the comments and suggestions of this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated the mechanisms behind breeding season-dependent feeding behavior using medaka, a well-known photoperiodic species, as a model. Through a combination of molecular, cellular, and behavioral analyses, including tests with mutants, they concluded that AgRP1 plays a central role in feeding behavior, mediated by ovarian estrogenic signals.

      Strengths:

      This study offers valuable insights into the neuroendocrine mechanisms that govern breeding season-dependent feeding behavior in medaka. The multidisciplinary approach, which includes molecular and physiological analyses, enhances the scientific contribution of the research.

      Comments on revised version:

      My concerns from the first review have been addressed. The manuscript's key points are clearly presented, and the conclusions are readily comprehensible.

    4. Reviewer #3 (Public review):

      Summary:

      Understanding the mechanisms whereby animals restrict the timing of their reproduction according to day length is a critical challenge given that many of the most relevant species for agriculture are strongly photoperiodic. However, the principal animal models capable of detailed genetic analysis do not respond to photoperiod so this has inevitably limited progress in this field. The fish model medaka occupies a uniquely powerful position since its reproduction is strictly restricted to long days and it also offers a wide range of genetic tools for exploring, in depth, various molecular and cellular control mechanisms.

      For these reasons, this manuscript by Tagui and colleagues is particularly valuable. It uses the medaka to explore links bridging photoperiod, feeding behaviour and reproduction. The authors demonstrate that in female, but not male medaka, photoperiod-induced reproduction is associated with an increase in feeding, presumably explained by the high metabolic cost of producing eggs on a daily basis during the reproductive period. Using RNAseq analysis of the brain, they reveal that the expression of the neuropeptides agrp and npy that have been previously implicated in the regulation of feeding behaviour in mice, are upregulated in the medaka brain during exposure to long photoperiod conditions. Unlike the situation in mouse, these two neuropeptides are not coexpressed in medaka neurons and food deprivation in medaka led to increases in agrp but also a decrease in npy expression. Furthermore, the situation in fish may be more complicated than in mouse due to the presence of multiple gene paralogs for each neuropeptide. Exposure to long day conditions increases agrp1 expression in medaka as the result of increases in the number of neurons expressing this neuropeptide, while the increase in npyb levels results from increased levels of expression in the same population of cells. Using ovariectomized medaka and in situ hybridization assays, the authors reveal that the regulation of agrp1 involves estrogen acting via the estrogen receptor esr2a. Finally, a loss of agrp1 function mutant is generated where the female mutants fail to show the characteristic increase in feeding associated with long day enhanced reproduction as well as yielding reduced numbers of eggs during spawning.

      Strengths:

      This manuscript provides important foundational work for future investigations aiming to elucidate the coordination of photoperiod sensing, feeding activity and reproduction function. The authors have used a combination of approaches with a genetic model that is particularly well suited to studying photoperiodic dependent physiology and behaviour. The data are clear and the results are convincing and support the main conclusions drawn. The findings are relevant not only for understanding photoperiodic responses but also provide more general insight into links between reproduction and feeding behaviour control.

      The manuscript has been further strengthened by the inclusion of additional information according to my advice: The analysis of ovariectomized female fish and juvenile fish has now been reported in terms of their feeding behaviour and so provides a complete view of the position of this feeding regulatory mechanism in the context of reproduction status. Furthermore, the discussion section has been expanded to speculate on the functional significance of linking feeding behaviour control with reproductive function. Modifications made in order to address technical concerns of the other two reviewers have also significantly strengthened the presentation of this work.

      Weaknesses:

      These have now been addressed in the revised version.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use the teleost medaka as an animal model to study the effect of seasonal changes in day-length on feeding behaviour and oocyte production. They report a careful analysis of how day-length affects female medakas and a thorough molecular genetic analysis of genes potentially involved in this process. They show a detailed analysis of two genes and include a mutant analysis of one gene to support their conclusions.

      Strengths:

      The authors pick their animal model well and exploit the possibilities to examine in this laboratory model the effect of a key environmental influence, namely the seasonal changes of day-length. The phenotypic changes are carefully analysed and well-controlled. The mutational analysis of the agrp1 by a ko-mutant provides important evidence to support the conclusions. Thus this report exceeds previous findings on the function of agrp1 and npyb as regulators of food-intake and shows how in medaka these genes are involved in regulating the organismal response to an environmental change. It thus furthers our understanding of how animals react to key exogenous stimuli for adaptation.

      Weaknesses:

      The authors are too modest when it comes to underscoring the importance of their findings. Previous animal models used to study the effect of these neuropeptides on feeding behaviour have either lost or were most likely never sensitive to seasonal changes of day length. Considering the key importance of this parameter on many aspects of plant and animal life it could be better emphasised that a suitable animal model is at hand that permits this. The molecular characterization of the agrp1 ko-mutant that the authors have generated lacks some details that would help to appreciate the validity of the mutant phenotype. Additional data would help in this respect.

      We would like to thank Reviewer #1 for the really constructive advice. In the revised manuscript, we provided more information on the molecular characterization of the agrp1 KO-mutant and emphasized the importance of our present animal model, which permits the analysis of neuropeptide effects on feeding behavior in response to seasonal changes of day length.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated the mechanisms behind breeding season-dependent feeding behavior using medaka, a well-known photoperiodic species, as a model. Through a combination of molecular, cellular, and behavioral analyses, including tests with mutants, they concluded that AgRP1 plays a central role in feeding behavior, mediated by ovarian estrogenic signals.

      Strengths:

      This study offers valuable insights into the neuroendocrine mechanisms that govern breeding season-dependent feeding behavior in medaka. The multidisciplinary approach, which includes molecular and physiological analyses, enhances the scientific contribution of the research.

      Weaknesses:

      While medaka is an appropriate model for studying seasonal breeding, the results presented are insufficient to fully support the authors' conclusions.

      Specifically, methods and data analyses are incomplete in justifying the primary claims:

      - the procedure for the food intake assay is unclear;

      - the sample size is very small;

      - the statistical analysis is not always adequate.

      Additionally, the discussion fails to consider the possible role of other hormones that may be involved in the feeding mechanism.

      We would like to thank Reviewer #2 for the helpful comments. As the reviewer suggested, we revised the paragraph describing the procedure for the food intake assay to make it much easier for the readers to understand in the revised manuscript. In Figure 1-Supplementary figure 2, RNAseq was performed to search for the candidate neuropeptides, which is why the sample size was the minimum. On the other hand, each group in the other experiments consists of n ≥ 5 samples, which is usually accepted to be an adequate sample size in various studies (cf. Kanda et al., Gen Comp Endocrinol., 2011, Spicer et al., Biol Reprod., 2017). As for the statistical analyses, we revised our manuscript so that the readers may be convinced of the validity of our statistical analyses.

      Reviewer #3 (Public review):

      Summary:

      Understanding the mechanisms whereby animals restrict the timing of their reproduction according to day length is a critical challenge given that many of the most relevant species for agriculture are strongly photoperiodic. However, the principal animal models capable of detailed genetic analysis do not respond to photoperiod so this has inevitably limited progress in this field. The fish model medaka occupies a uniquely powerful position since its reproduction is strictly restricted to long days and it also offers a wide range of genetic tools for exploring, in depth, various molecular and cellular control mechanisms.

      For these reasons, this manuscript by Tagui and colleagues is particularly valuable. It uses the medaka to explore links bridging photoperiod, feeding behaviour, and reproduction. The authors demonstrate that in female, but not male medaka, photoperiod-induced reproduction is associated with an increase in feeding, presumably explained by the high metabolic cost of producing eggs on a daily basis during the reproductive period. Using RNAseq analysis of the brain, they reveal that the expression of the neuropeptides agrp and npy that have been previously implicated in the regulation of feeding behaviour in mice are upregulated in the medaka brain during exposure to long photoperiod conditions. Unlike the situation in mice, these two neuropeptides are not co-expressed in medaka neurons, and food deprivation in medaka led to increases in agrp but also a decrease in npy expression. Furthermore, the situation in fish may be more complicated than in mice due to the presence of multiple gene paralogs for each neuropeptide. Exposure to long-day conditions increases agrp1 expression in medaka as the result of increases in the number of neurons expressing this neuropeptide, while the increase in npyb levels results from increased levels of expression in the same population of cells. Using ovariectomized medaka and in situ hybridization assays, the authors reveal that the regulation of agrp1 involves estrogen acting via the estrogen receptor esr2a. Finally, a loss of agrp1 function mutant is generated where the female mutants fail to show the characteristic increase in feeding associated with long-day enhanced reproduction as well as yielding reduced numbers of eggs during spawning.

      Strengths:

      This manuscript provides important foundational work for future investigations aiming to elucidate the coordination of photoperiod sensing, feeding activity, and reproduction function. The authors have used a combination of approaches with a genetic model that is particularly well suited to studying photoperiodic-dependent physiology and behaviour. The data are clear and the results are convincing and support the main conclusions drawn. The findings are relevant not only for understanding photoperiodic responses but also provide more general insight into links between reproduction and feeding behaviour control.

      Weaknesses:

      Some experimental models used in this study, namely ovariectomized female fish and juvenile fish have not been analysed in terms of their feeding behaviour and so do not give a complete view of the position of this feeding regulatory mechanism in the context of reproduction status. Furthermore, the scope of the discussion section should be expanded to speculate on the functional significance of linking feeding behaviour control with reproductive function.

      We would like to thank Reviewer #3 for the insightful advice. We added several pertinent sentences describing the ovariectomized female fish and juvenile fish, and our revised manuscript will give a more complete view of their feeding regulatory mechanism in the context of reproduction status. In addition, we revised the discussion section to incorporate the valuable suggestion of Reviewer #3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General: the text could profit from careful editing of errors, including adjusting the singular and plural status of nouns and verbs; examples are line 107 (noun) and line 96 (verb). Suitable text editing software is available to do this task.

      Thank you for your suggestion. We thoroughly read the entire manuscript and corrected such errors in the revised manuscript.

      As medaka is a unique genetic vertebrate model to study seasonal effects, it would be interesting to know whether the authors found novel or rather unexpected genes with a differential expression between LD and SD. It is understandable that the authors focused on agrp1 and npyb, as these have already been well studied in mammalian models although not in this context. Novel insights with genes previously not implicated in feeding regulation could underscore the unique nature of medaka as a model.

      We appreciate your kind comments, which we found really encouraging to us. Since we focused on feeding-related peptides, we did not find any novel genes that have not been reported.

      ISH is unreliable as a methodology to quantify expression levels. Yet the authors use this to compare fed and starved females to compare expression levels of agrp1. They use a temporal staining comparison and compare 90-minute and 300-minute staining reactions. However, they do not explain why they use the 90-minute staining time point and why 300 minutes of staining is the "saturation point of staining". They should provide compelling data for their claim and the selection of time points or else refrain from using these (at best) semi-quantitative ISH and provide more detailed (using serial sections) data to quantify the number of expressing cells.

      Anyhow, the quantification of mRNA expression levels may not be that significant when trying to compare different states of gene function, as translational and post-translational steps can have large effects on gene function. This should be discussed adequately.

      Thank you very much for your comments. We conducted ISH by using medaka under LD or SD, not using those under fed or starved conditions. In addition, our previous study demonstrated that the slopes of the increase in the number of cells stained by ISH are also different if there is a difference in the expression level (Mitani et al., 2010). Although we do not have quantitative data of cell numbers, we confirmed that the number of cells expressing agrp1 was saturated around 300 mins in our preliminary experiments, and therefore we terminated the chromogenic reactions at 300 mins. Based on these, we compared the cell ratio of 90 min (beginning of coloring)/300 min (saturation). However, since this analysis may not be worth discussing in detail, we moved this part to the supplementary figure as the reviewer suggested.

      The molecular characterization of the agrp1 ko mutant is a bit thin.

      Line 221: "We obtained agrp1<sup>−/−</sup> medaka, which has lots of amino acid changes in functional site for AgRP1" is a bit vague as a description for the ko-mutation. It would be really helpful if the authors could provide a scheme showing the wt protein with the relevant functional sites alongside the presumptive mutant protein.

      How did the authors verify the molecular nature of their mutation? They should use suitable antibodies and western-blot analysis (maybe reagents from Shainer et al., 2019 work in medaka); in case this is not possible they could isolate & clone the mutant transcript and use in-vitro translation systems to show that the presumptive mutant protein can actually be translated from this transcript. Another strategy could be to use a second non-allelic and (hopefully) non-complementing mutation (ko1/ko2 heterozygotes for example) to show that ko-mutation acts the way the authors presume. The authors mention agrp1 ko medaka lines (plural!) in line 520, thus they may have an additional ko allele at hand.

      Thank you very much for your comments. We explained the mutation site in Figure 6-Supplementary Figure 1 (A: DNA sequences and B: predicted amino acid sequence, of WT and mutants). In addition, we added immunohistochemistry data of WT and mutant using anti-AgRP antibody (Figure 6-Supplementary Figure 1C). While AgRP-immunoreactive signals were observed in WT, those were not in agrp1<sup>−/−</sup>. This result suggests that AgRP1 is not functional in agrp1<sup>−/−</sup>.

      Presumably, the authors analysed heterozygous agrp1<sup>+/−</sup> females and found they are as wt. If so the authors should say so.

      Yes, we analyzed food intake of agrp1<sup>+/−</sup>. We added a supplementary figure (Figure 6-Supplementary Figure 2) and a sentence in L. 233-234.

      How about agrp1<sup>−/−</sup> medaka males: do they show a discernible phenotype?

      We analyzed the phenotypes of agrp1<sup>−/−</sup> males but did not describe the results, since the present paper only focused on female-specific feeding behavior.

      agrp1<sup>−/−</sup> females show no significant sensitivity of food intake to day length (Figure 6C). Does their (reduced) oocyte production react to day length? In other words: how much of the seasonal sensitivity is left in agrp1<sup>−/−</sup> females? The authors suggest that E2 acts upstream of agrp1 and therefore some seasonality may still be left in agrp1<sup>−/−</sup> females.

      Although agrp1<sup>−/−</sup> females appear to display abnormal seasonality of food intake, agrp1<sup>−/−</sup> females in LD spawn and those in SD do not, indicating that seasonality of gonadal maturation still remains in agrp1<sup>−/−</sup> females.

      The authors show that fshb and lhb are downregulated in agrp1<sup>−/−</sup> females. Is this also the case in wt females at SD?

      Thank you very much for your comment. As described above, agrp1<sup>−/−</sup> females can spawn, which indicates that the mechanism for the downregulation of gonadotropins in agrp1<sup>−/−</sup> females may be different from that in SD females.

      Figure 1_Supplementary Figure 2: the trends are visible in B and C, however, there is quite some variance between LD1, 2, and 3; the same for SD 1, 2, and 3. Can the authors give an explanation for this?

      Since the data for LD1, 2, and 3 (SD1, 2, and 3) were obtained from different individual fish, the variance may be reasonable. We conducted expression analyses by using RNA-seq to find candidate genes that show larger differences than individual ones.

      Figure 7E: the ovaries are difficult to see and the size bar in the wt picture is missing.

      Thank you very much for your comments. We added a scale bar in the wt picture.

      509 ff: the authors do not describe what exactly the "sham operation" encompasses: were the females just anesthetised or was there an actual operation without removing the ovaries?

      The sham operation group was anesthetized, received an abdominal incision without removing the ovaries, and received skin suture by using a silk thread. We added this explanation in the Method section.

      519 ff: was the agrp1<sup>−/−</sup> ko induced in the d-rR strain to have the same genetic background as the wt fish?

      Exactly. As the reviewer pointed out, the genetic background of agrp1<sup>−/−</sup> was the same as that of WT.

      Minor points (Text edits):

      Line 42: change "when" into "where".

      Line 54: "under the fixed appropriate ambient temperature" change into "while keeping an appropriate temperature constant".

      Line 55: here it would be good to briefly explain what long-day and short-day is so that the reader has an idea about the changes required without having to scroll down to the M&M section. For example LD 14/10 light-dark cycle, SD 10/14 light-dark cycle.

      Line 88: change "measurement" into "measuring".

      Line 96: change "eats" into "eat".

      Line 107: change "female" into "females".

      We deeply appreciate the reviewer’s suggestions described above. We corrected them as the reviewer suggested (L. 42, L. 54, L. 55, L. 89, L. 96, L. 107).

      Line 144-145: the sentence "since hypothalamic npy control..." does not make sense. Please correct.

      Thank you very much for your suggestion. We corrected the sentence so that it makes sense (L. 145-146).

      Line 180 and 185: the term here should be "LD induced sexual activity" rather than maturity. Age is the main determinant of maturity whereas light (LD) determines activity, in other words SD females are sexually mature if they are post-puberty stage.

      Thank you very much for your suggestion. Since the phrase “LD-induced sexual maturity” confused the reviewer, we changed it to “substance(s) from LD-induced mature ovary” or “ovarian maturity”. Even though SD females are at post-puberty stage, their ovaries are immature and do not possess mature oocytes (L. 181).

      Line 222: the authors should include the relevant information about the females: presumably agrp1.

      In Line 226-228, we explained the phenotypes of agrp1 knockout and added information for AgRP1 protein in Figure 6-Supplementary figure 1C.

      Lines 449 ff: authors should state that the analysis was done in females, instead of just writing "medaka". This is also in line with the preceding paragraph of the M&M section.

      Thank you very much for your suggestions. We corrected the sentence as the reviewer suggested (L. 469).

      Line 305: change like other mammals -> like in mammals.

      Thank you very much for your suggestion. We corrected the sentence as the reviewer suggested (L. 320).

      Reviewer #2 (Recommendations for the authors):

      (1) The procedure of the food intake assay is not clear.

      - Habituation Period: Medaka were placed into a white cup containing 100 mL of water and allowed to habituate for 5 minutes. However, is 5 minutes sufficient to reduce stress in the fish? A stressed fish does not exhibit the same feeding behavior as an unstressed one.

      Thank you for your comment. We confirmed that 5 minutes is enough for habituation in medaka, since medaka can swim freely in a few minutes after replacement from the tank and show normal feeding behavior.

      - Feeding Protocol: Medaka were fed with 200 μL aliquots of brine shrimp-containing water. This procedure was repeated multiple times. How many times was this feeding procedure repeated? Was it 3, 10, or 100 times?

      Although there was a small variation in each trial, we usually applied tubes about 5 times or so.

      - Brine Shrimp Counting: You collected 10 mL of the breeding water to count the number of uneaten brine shrimp. Can you confirm that sampling 10% of the total volume is representative? Were any tests conducted to validate this? Given that you developed an automated tool to count the brine shrimp, why didn't you count them in all 100 mL?

      The reason for collecting 10 mL is to collect the leftover shrimp as soon as possible. Ten mins after the start of the experiment, we quickly placed a magnetic bar to stir the breeding water so that the shrimp concentration will be constant. Then we collected 10 mL aliquot from the experimental cup by using a micro pipette. In preliminary trials, we applied shrimps, the amount of which is almost the same as that applied to WT medaka in LD, to a white cup containing 100 mL water, and we divided it into 10 mL and 90 mL aliquots and separately counted the number of shrimps in each aliquot. Here, we confirmed that the variance between the numbers calculated by counting the shrimps in 10 mL aliquot and the total volume of 100 mL falls within the range of the variance of total applied shrimp. Thus, our present counting method can be considered reasonable.

      - Brine Shrimp Aliquot Measurement: You mentioned counting the number of brine shrimp in the 200 μL solution three times before and after the experiments. What does this mean? Did you use this procedure to calculate the mean number of brine shrimp in each 200 μL aliquot?

      Thank you for your comment. As the reviewer commented, to calculate the mean number of brine shrimp in each 200 µL aliquot, we counted the number of brine shrimp in the 200 µL solution three times before and after the experiments.
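      As an illustration only, the counting scheme described in the two responses above can be sketched in Python. The function names and the example counts are hypothetical, and two points are assumptions rather than statements from the authors: that the before and after aliquot counts are pooled into one mean, and that food intake is taken as shrimp applied minus the leftover estimated from the stirred 10 mL sample.

```python
from statistics import mean


def shrimp_per_aliquot(counts_before, counts_after):
    """Mean shrimp per 200 uL aliquot, pooling the counts taken
    before and after the experiment (pooling is an assumption)."""
    return mean(list(counts_before) + list(counts_after))


def estimate_leftover(count_in_sample, sample_ml=10.0, total_ml=100.0):
    """Scale the count in the stirred sample up to the whole cup,
    assuming stirring distributes the shrimp evenly."""
    return count_in_sample * (total_ml / sample_ml)


def estimate_intake(n_aliquots, per_aliquot, count_in_sample):
    """Assumed definition: shrimp applied minus estimated leftover."""
    applied = n_aliquots * per_aliquot
    return applied - estimate_leftover(count_in_sample)


# Hypothetical numbers, not data from the study.
per_aliquot = shrimp_per_aliquot([98, 102, 100], [101, 99, 100])
intake = estimate_intake(n_aliquots=5, per_aliquot=per_aliquot, count_in_sample=12)
print(per_aliquot, intake)
```

      Under this reading, any sampling error in the 10 mL count is multiplied by ten in the leftover estimate, which is why the validation against the remaining 90 mL matters.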

      - How did you normalize the food intake data? This procedure is not detailed in the methods section.

      Thank you very much for pointing it out. We normalized food intake by subtracting the amount of shrimp by the average of those in LD or WT fish. This explanation was added in the Method section (L. 439).

      (2) Sample Size. Various tests were conducted with a low number of medaka (e.g., 2 brains for RNA-seq, 8 females for ovariectomy). Are these sample sizes sufficient to draw reliable conclusions?

      In Figure 1-Supplementary figure 2, RNAseq was performed to search for the candidate neuropeptides, which is why the sample size was the minimum; we pooled two brains as one sample and used three samples per group. On the other hand, each group in the other experiments consists of n ≥ 5 samples, which is usually accepted to be an adequate sample size in various studies (cf. Kanda et al., Gen Comp Endocrinol., 2011, Spicer et al., Biol Reprod., 2017).

      (3) Statistical Analysis.

      - The authors used both parametric and non-parametric tests but did not specify how they assessed the normal distribution of the data. For example, if I understood correctly, a t-test was used to compare a small dataset (n=3). In such cases, a U-test would be more appropriate.

      Thank you for your comment. As for Figure 1 -Supplementary Figure 2C, we showed the graphs just to show you candidates. To avoid misunderstanding, we deleted statistical statements in that panel.

      - It is unclear why the Steel-Dwass test was used instead of the Kruskal-Wallis test for comparing agrp1 and npyb expressions in control, OVX, and E2-administered medaka.

      While the authors mentioned using non-parametric tests, they did not specify in which contexts or conditions they were applied.

      Thank you very much for your comment. Kruskal-Wallis test statistically shows whether or not there are differences among any of three groups. To perform multiple comparisons among the three groups, we used Steel-Dwass test.
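      The distinction drawn in this response, between an omnibus test and pairwise comparisons, can be illustrated with a small pure-Python sketch. It computes the Kruskal-Wallis H statistic (without tie correction) and, separately, a Mann-Whitney U statistic for each pair of groups. This is not the Steel-Dwass procedure: the multiplicity adjustment that such a post hoc test applies is not implemented, no p-values are computed, and the group values are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Omnibus H statistic over all groups (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def pairwise_u(named_groups):
    """Mann-Whitney U per pair: number of (a, b) pairs with a > b, ties count 0.5."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(named_groups.items(), 2):
        u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
        result[(name_a, name_b)] = u
    return result


# Hypothetical expression values, not data from the study.
expression = {
    "control": [1.0, 2.0, 3.0],
    "OVX": [4.0, 5.0, 6.0],
    "OVX+E2": [7.0, 8.0, 9.0],
}
h_stat = kruskal_wallis_h(list(expression.values()))
u_stats = pairwise_u(expression)
print(h_stat, u_stats)
```

      The single H value says only that at least one group differs; the per-pair statistics are what a multiple-comparison procedure would then evaluate against adjusted critical values.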

      - The results section lacks details on the statistical tests used, including the specific test (e.g., Z, U, or W values) and degrees of freedom.

      Thank you for your comment. As the reviewer pointed out, we added such statements in all the figure legends containing statistics.

      (4) Previous studies have shown that photoperiod treatments alter the production of various hormones in medaka (e.g., Lucon-Xiccato et al., 2022; Shimmura et al., 2017), some of which, like growth hormone (GH), have been shown to influence feeding behavior (Canosa et al., 2007).

      In your RNA-seq analysis, did you observe any changes in the expression of genes involved in other hormone synthesis pathways, such as pituitary hormones (GH and TSH), leptin, or ghrelin (e.g., see Volkoff, 2016; Blanco, 2020; Bertolucci et al., 2019)?

      Including such evidence in the discussion would provide a broader perspective on the hormonal regulation of food intake in medaka.

      We appreciate your constructive comments. Unfortunately, since we performed RNA-seq using the whole brain after removal of the pituitary, we could not check such changes in the expression of pituitary hormone-related genes. As additional information about the feeding-related hormones, leptin did not show significant difference in our RNA-seq analysis, and we could not analyze ghrelin because ghrelin has not been annotated in medaka (NCBI and ensembl).

      Reviewer #3 (Recommendations for the authors):

      There are some parts of the study that need to be developed further in order to provide a more comprehensive analysis.

      (1) In the juvenile as well as ovariectomized female fish, the authors should confirm experimentally whether day length influences feeding activity.

      Thank you very much for your suggestion. We analyzed feeding behavior of juvenile (Figure 4-Supplementary Figure 1) and OVX female (Figure 5-Supplementary Figure 1) fish. As shown in these figures, food intake in juvenile and OVX fish was not significantly different between LD and SD.

      (2) More discussion as to the relevance of increasing feeding activity to support reproductive functions such as sustained egg production would be valuable. One assumes the metabolic costs of producing eggs on a daily basis in this species would inevitably require increased food intake. Is this a reasonable prediction?

      We deeply appreciate your suggestion. We strongly agree with this argument, and we added such discussion in “Discussion” section (L. 406-408).

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We appreciate the editor’s suggestion. We added P-value in the main manuscript, where statistical analyses were performed. In addition, we described test statistics in the figure legends. We did not use df values for the statistics used in the present analyses, and therefore did not describe it in the main text.

    1. eLife Assessment

      This valuable work formulates an individual-based model to understand the evolution of division of labor in vertebrates, in particular, to examine the role of indirect versus direct fitness benefits. The evidence supporting the main conclusions is incomplete at this stage, with key details of simulation assumptions not adequately described and exploration of alternative assumptions and parameter space lacking.

    2. Reviewer #1 (Public review):

      This paper presents a computational model of the evolution of two different kinds of helping ("work," presumably denoting provisioning, and defense tasks) in a model inspired by cooperatively breeding vertebrates. The helpers in this model are a mix of previous offspring of the breeder and floaters that might have joined the group, and can either transition between the tasks as they age or not. The two types of help have differential costs: "work" reduces "dominance value," (DV), a measure of competitiveness for breeding spots, which otherwise goes up linearly with age, but defense reduces survival probability. Both eventually might preclude the helper from becoming a breeder and reproducing. How much the helpers help, and which tasks (and whether they transition or not), as well as their propensity to disperse, are all evolving quantities. The authors consider three main scenarios: one where relatedness emerges from the model, but there is no benefit to living in groups, one where there is no relatedness, but living in larger groups gives a survival benefit (group augmentation, GA), and one where both effects operate. The main claim is that evolving defensive help or division of labor requires the group augmentation; it doesn't evolve through kin selection alone in the authors' simulations.

      This is an interesting model, and there is much to like about the complexity that is built in. Individual-based simulations like this can be a valuable tool to explore the complex interaction of life history and social traits. Yet, models like this also have to take care of both being very clear on their construction and exploring how some of the ancillary but potentially consequential assumptions affect the results, including robust exploration of the parameter space. I think the current manuscript falls short in these areas, and therefore, I am not yet convinced of the results. Much of this is a matter of clearer and more complete writing: the Materials and Methods section in particular is incomplete or vague in some important junctions. However, there are also some issues with the assumptions that are described clearly.

      Below, I describe my main issues, mostly having to do with model features that are unclear, poorly motivated (as they stand), or potentially unrealistic or underexplored.

      One of the main issues I have is that there is almost no information on what happens to dispersers in the model. Line 369-67 states dispersers might join another group or remain as floaters, but gives no further information on how this is determined. Poring through the notation table also comes up empty as there is no apparent parameter affecting this consequential life history event. At some point, I convinced myself that dispersers remain floaters until they die or become breeders, but several points in the text contradict this directly (e.g., l 107). Clearly this is a hugely important model feature since it determines fitness cost and benefits of dispersal and group size (which also affects relatedness and/or fitness depending on the model). There just isn't enough information to understand this crucial component of the model, and without it, it is hard to make sense of the model output.

      Related to that, it seems to be implied (but never stated explicitly) that floaters do no work, and therefore their DV increases linearly with age (H_work in eq.2 is zero). That means any floaters that manage to stick around long enough would have higher success in competition for breeding spots relative to existing group members. How realistic is this? I think this might be driving the kin selection-only results that defense doesn't evolve without group augmentation (one of the two main ways). Any subordinates (which are mainly zero in the no GA, according to the SI tables; this assumes N=breeder+subordinates, but this isn't explicit anywhere) would be outcompeted by floaters after a short time (since they evolve high H and floaters don't), which in turn increases the benefit of dispersal, explaining why it is so high. Is this parameter regime reasonable? My understanding is that floaters usually aren't high resource-holding-potential (RHP) individuals (either because high RHP ones would get selected out of the floater population by establishing territories or because floating isn't typically a thriving strategy, given that many resources are tied to territories). In this case, the assumption seems to bias things towards the floaters and against subordinates inheriting territories. This should be explored with a higher mortality rate for floaters, a lower DV increase, or both.

      When it comes to floaters replacing dead breeders, the authors say a bit more, but again, the actual equation for the scramble competition (which only appears as "scramble context" in the notation table) is not given. Is it simply proportional to R_i/\sum_j R_j ? Or is there some other function used? What are the actual numbers of floaters per breeding territory that emerge under different parameter values? These are all very important quantities that have to be described clearly.
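
      To make the reviewer's question concrete, here is a minimal sketch of what a competition rule "proportional to R_i/\sum_j R_j" would look like. It is an illustration of the functional form the reviewer asks about, not the authors' implementation; the function names and the use of a plain list of dominance values are assumptions made for the example.

```python
import random


def breeding_probabilities(dominance_values):
    """Return each candidate's chance of winning, R_i / sum_j R_j.

    `dominance_values` is a hypothetical list of non-negative R values,
    one per candidate competing for a single vacant breeding position.
    """
    total = sum(dominance_values)
    if total <= 0:
        raise ValueError("at least one candidate needs a positive value")
    return [value / total for value in dominance_values]


def pick_breeder(dominance_values, rng):
    """Draw the index of the winning candidate with the weights above."""
    candidates = range(len(dominance_values))
    return rng.choices(candidates, weights=dominance_values, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    values = [1.0, 1.0, 2.0]  # illustrative numbers only
    print(breeding_probabilities(values))
    print(pick_breeder(values, rng))
```

      Under such a rule every candidate with a positive value has some chance of winning, which is what distinguishes it from a strict "highest value wins" contest.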

      I also think the asexual reproduction with small mutations assumption is a fairly strong one that also seems to bias the model outcomes in a particular way. I appreciate that the authors actually measured relatedness within groups (though if most groups under KS have no subordinates, that relatedness becomes a bit moot), and also eliminated it with their ingenious swapping-out-subordinates procedure. The fact remains that unless they eliminate relatedness completely, average relatedness, by design, will be very high. (Again, this is also affected by how the fate of the dispersers is determined, but clearly there isn't a lot of joining happening, just judging from mean group sizes under KS only.) This is, of course, why there is so much helping evolving (even if it's not defensive) unless they completely cut out relatedness.

      Finally, the "need for division of labor" section is also unclear, and its construction also would seem to bias things against division of labor evolving. For starters, I don't understand the rationale for the convoluted way the authors create an incentive for division of labor. Why not implement something much simpler, like a law of minimum (i.e., the total effect of helping is whatever the help amount for the lowest value task is) or more intuitively: the fecundity is simply a function of "work" help (draw Poisson number of offspring) and survival of offspring (draw binomial from the fecundity) is a function of the "defense" help. As it is, even though the authors say they require division of labor, in fact, they only make a single type of help marginally less beneficial (basically by half) if it is done more than the other. That's a fairly weak selection for division of labor, and to me it seems hard to justify. I suspect either of the alternative assumptions above would actually impose enough selection to make division of labor evolve even without group augmentation.
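
      The two alternatives suggested above can be sketched as follows. This is only an illustration under stated assumptions: the mean of the Poisson draw is taken to equal the amount of "work" help, and the survival probability is taken to be a saturating function of "defense" help (1 - exp(-defense)). Neither choice comes from the manuscript, and all function names are hypothetical.

```python
import math
import random


def law_of_minimum(work_help, defense_help):
    """Total effect of help is set by whichever task receives less help."""
    return min(work_help, defense_help)


def poisson_draw(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def surviving_offspring(work_help, defense_help, rng):
    """Fecundity depends on work help, offspring survival on defense help."""
    # Assumption: expected fecundity equals the amount of work help.
    fecundity = poisson_draw(work_help, rng)
    # Assumption: survival probability saturates with defense help.
    survival_probability = 1.0 - math.exp(-defense_help)
    # Binomial draw: each offspring survives independently.
    return sum(rng.random() < survival_probability for _ in range(fecundity))


if __name__ == "__main__":
    rng = random.Random(7)
    print(law_of_minimum(3.0, 0.5))
    print(surviving_offspring(4.0, 1.0, rng))
```

      In both variants, help of one type yields nothing when the other type is absent, which is a much stronger incentive to perform both tasks than a penalty that only halves the benefit.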

      Overall, this is an interesting model, but the simulation is not adequately described or explored to have confidence in the main conclusions yet. Better exposition and more exploration of alternative assumptions and parameter space are needed.

    3. Reviewer #2 (Public review):

      Summary:

      This paper formulates an individual-based model to understand the evolution of division of labor in vertebrates. A main conclusion of the paper is that direct fitness benefits are the primary factor causing the evolution of vertebrate division of labor, rather than indirect fitness benefits.

      Strengths:

      The paper formulates an individual-based model that is inspired by vertebrate life history. The model incorporates numerous biologically realistic details, including the possibility to evolve age polyethism where individuals switch from work to defence tasks as they age or vice versa, as well as the possibility of comparing the action of group augmentation alone with that of kin selection alone.

      Weaknesses:

      The model makes assumptions that restrict the possibility that kin selection leads to the evolution of helping. In particular, the model assumes that in the absence of group augmentation, subordinates can only help breeders but cannot help non-breeders or increase the survival of breeders, whereas with group augmentation, subordinates can help both breeders and non-breeders and increase the survival of breeders. This is unrealistic as subordinates in real organisms can help other subordinates and increase the survival of non-breeders, even in the absence of group augmentation, for instance, with targeted helping to dominants or allies. This restriction artificially limits the ability of kin selection alone to lead to the evolution of helping, and potentially to division of labor. Hence, the conclusion that group augmentation is the primary factor driving vertebrate division of labor appears forced by the imposed restrictions on kin selection. The model used is also quite particular, and so the claimed generality across vertebrates is not warranted.

      I describe some suggestions for improving the paper below, more or less in the paper's order.

      First, the introduction goes to great lengths trying to convince the reader that this model is the first in this or another way, particularly in being only for vertebrates, as illustrated in the abstract where it is stated that "we lack a theoretical framework to explore the conditions under which division of labor is likely to evolve" (line 13). However, this is a risky and unnecessary motivation. There are many models of division of labor and some of them are likely to be abstract enough to apply to vertebrates even if they are not tailored to vertebrates, so the claims for being first are not only likely to be wrong but will put many readers in an antagonistic position right from the start, which will make it harder to communicate the results. Instead of claiming to be the first or that there is a lack of theoretical frameworks for vertebrate division of labor, I think it is enough and sufficiently interesting to say that the paper formulates an individual-based model motivated by the life history of vertebrates to understand the evolution of vertebrate division of labor. You could then describe the life history properties that the model incorporates (subordinates can become reproductive, low relatedness, age polyethism, etc.) without saying this has never been done or that it is exclusive to vertebrates; indeed, the paper states that these features do not occur in eusocial insects, which is surprising as some "primitively" eusocial insects show them. So, in short, I think the introduction should be extensively revised to avoid claims of being the first and to make it focused on the question being addressed and how it is addressed. I think this could be done in 2-3 paragraphs without the rather extensive review of the literature in the current introduction.

      Second, the description of the model and results should be clarified substantially. I will give specific suggestions later, but for now, I will just say that it is unclear what the figures show. First, it is unclear what the axes in Figure 2 show, particularly for the vertical one. According to the text in the figure axis, it presumably refers to T, but T is a function of age t, so it is unclear what is being plotted. The legend explaining the triangle and circle symbols is unintelligible (lines 227-230), so again it is unclear what is being plotted; part of the reason for this unintelligibility is that the procedure that presumably underlies it (section starting on line 493) is poorly explained and not understandable (I detail why below). Second, the axes in Figure 3 are similarly unclear. The text in the vertical axis in panel A suggests this is T, however, T is a function of t and gamma_t, so something else must be being done to plot this. Similarly, in panel B, the horizontal axis is presumably R, but R is a function of t and of the helping genotype, so again some explanation is lacking. In all figures, the symbol of what is being plotted should be included.

      Third, the conclusions sound stronger than the results are. A main conclusion of the paper is that "kin selection alone is unlikely to select for the evolution of defensive tasks and division of labor in vertebrates" (lines 194-195). This conclusion is drawn from the left column in Figure 2, where only kin selection is at play, and the helping that evolves only involves work rather than defense tasks. This conclusion follows because the model assumes that without group augmentation (i.e., xn=0, the kin selection scenario), subordinates can only help breeders to reproduce but cannot help breeders or other subordinates to survive, so the only form of help that evolves is the least costly, not the most beneficial as there is no difference in the benefits given among forms of helping. This assumption is unrealistic, particularly for vertebrates where subordinates can help other group members survive even in the absence of group augmentation (e.g., with targeted help to certain group members, because of dominance hierarchies where the helping would go to the breeder, or because of alliances where the helping would go to other subordinates). I go into further details below, but in short, the model forces a narrow scope for the kin selection scenario, and then the paper concludes that kin selection alone is unlikely to be of relevance for the evolution of vertebrate division of labor. This conclusion is particular to the model used, and it is misleading to suggest that this is a general feature of such a particular model.

      Overall, I think the paper should be revised extensively to clarify its aims, model, results, and scope of its conclusions.

    4. Author response:

      We will revise the statements of novelty in the introduction by more clearly emphasizing how our model addresses gaps in the existing literature. In addition, we will clarify the description of the dispersal process. Briefly, we use the same dispersal gene β to represent the likelihood an individual will either leave or join a group, thereby quantifying both dispersal and immigration using the same parameter. Specifically, individuals with higher β are more likely to remain as floaters (i.e., disperse from their natal group to become a breeder elsewhere), whereas those with lower β are either more likely to remain in their natal group as subordinates (i.e., queue in a group for the breeding position) or join another group if they dispersed. Immigrants that join a group as a subordinate help and queue for a breeding position, as does any natal subordinate born into the group. To follow the suggestion of the referee and more fully explore the impact of competition between subordinates born in the group and subordinate immigrants, we will explore extending our model to allow dispersers to leave their natal group and join another as subordinates, by incorporating a reaction norm based on their age or rank (D = 1 / (1 + exp(β<sub>t</sub> * t – β<sub>0</sub>))). This approach will also allow individuals to adjust their dispersal strategy to their competitiveness and to avoid kin competition by remaining as a subordinate in another group.
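
      The reaction norm quoted above is a logistic function of t. A minimal sketch of its behavior follows; the function name, the reading of D as a propensity between 0 and 1, and the example parameter values are assumptions made for illustration and are not taken from the authors' code.

```python
import math


def dispersal_propensity(age, beta_t, beta_0):
    """Evaluate D = 1 / (1 + exp(beta_t * age - beta_0)).

    `beta_t` is the slope with respect to age (or rank) and `beta_0` the
    intercept; both stand in for evolving gene values in this sketch.
    Very large positive exponents would overflow math.exp, so the
    example sticks to moderate values.
    """
    return 1.0 / (1.0 + math.exp(beta_t * age - beta_0))


if __name__ == "__main__":
    # Illustrative values only: a positive slope makes D fall with age.
    for age in range(0, 11, 2):
        print(age, round(dispersal_propensity(age, beta_t=0.5, beta_0=2.0), 3))
```

      With a positive slope D decreases with age, with a negative slope it increases, and with a slope of zero it reduces to a constant set by β<sub>0</sub>, so the same expression can represent age-dependent or age-independent strategies.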

      We apologize that there was some confusion with terminology. We use the term “disperser” to describe individuals that disperse from their natal group. Dispersers can assume one of three roles: (1) they can migrate to another group as "subordinates"; (2) they can join another group as "breeders" if they successfully outcompete other candidates; or (3) they can remain as "floaters" if they fail to join a group. "Floaters" are individuals who persist in a transient state without access to a breeding territory, waiting for opportunities to join a group in an established territory. Therefore, dispersers do not work when they are floaters, but they may later help if they immigrate to a group as a subordinate. Consequently, immigrant subordinates have no inherent competitive advantage over natal subordinates (as step 2.2. “Join a group” is followed by step 3. “Help”, which occurs before step 5. “Become a breeder”). Nevertheless, floaters can potentially outcompete subordinates of the same age if they attempt to breed without first queuing as a subordinate (step 5) when subordinates are engaged in work tasks. We believe that this assumption is realistic and constitutes part of the costs associated with work tasks. However, floaters are at a disadvantage for becoming a breeder because: (1) floaters incur higher mortality than individuals within groups (eq. 3); and (2) floaters may only attempt to become breeders in some breeding cycles (versus subordinate group members, who are automatically candidates for an open breeding position in the group in each cycle). Therefore, due to their higher mortality, floaters are rarely older than individuals within groups, which heavily influences dominance value and competitiveness. Additionally, any competitive advantage that floaters might have over other subordinate group members is unlikely to drive the kin selection-only results because subordinates would preferably choose defense tasks instead of work tasks so as not to be at a competitive disadvantage compared to floaters.

      We note that reviewers also mention that floaters usually aren't high resource holding potential (RHP) individuals and, therefore, our assumptions might be unrealistic. As we explain above, floaters are not inherently at a competitive advantage in our model. In any case, empirical work in a number of species has shown that dispersers are not necessarily those of lower RHP or of lower quality. In fact, according to the ecological constraints hypothesis, one might predict that high quality individuals are the ones that disperse because only individuals in good condition (e.g., larger body size, better energy reserves) can afford the costs associated with dispersal (Cote et al., 2022). By adding a reaction norm approach to explore the role of age or rank in the revised version, we can also determine whether higher or lower quality individuals are the ones dispersing. We will address the issues of terminology and clarity of the relative competitive advantage of floaters versus subordinates, and also include more information in the Supplementary Tables (e.g., the number of floaters). As a side note, the “scramble context” we mention was an additional implementation that we decided to remove from the final manuscript, but we forgot to remove from Table 1 before submission.

      The reviewers also raised a question about asexual reproduction and relatedness more generally. As we showed in the Supplementary Tables and the section on relatedness in the SI (“Kin selection and the evolution of division of labor"), high relatedness does not appear to explain our results. In evolutionary biology generally and in game theory specifically (with the exception of models on sexual selection or sex-specific traits), asexual reproduction is often modelled because it reduces unnecessary complexity. To further study the effect of relatedness on kin structures more closely resembling those of vertebrates, however, we will create an additional “relatedness structure level”, where we will shuffle half of the philopatric offspring using the same method used to remove relatedness completely. This approach will effectively reduce relatedness structure by half and overcome the concerns with our decision to model asexual reproduction.
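
      As a rough illustration of what shuffling half of the philopatric offspring could mean in practice, the sketch below reassigns a random fraction of offspring among groups while keeping every group's size unchanged. The data layout (a list of group identifiers, one per offspring), the function name, and the permutation-based swap are assumptions made for this example; the authors' actual procedure may differ.

```python
import random


def shuffle_fraction(group_of, fraction, rng):
    """Permute the group labels of a random subset of offspring.

    `group_of[i]` is the (hypothetical) group identifier of offspring i.
    A share `fraction` of offspring is selected and their labels are
    permuted among themselves, so group sizes stay the same. Offspring
    that are not selected keep their natal group.
    """
    n = len(group_of)
    chosen = rng.sample(range(n), int(n * fraction))
    labels = [group_of[i] for i in chosen]
    rng.shuffle(labels)
    result = list(group_of)
    for index, label in zip(chosen, labels):
        result[index] = label
    return result


if __name__ == "__main__":
    rng = random.Random(3)
    natal_groups = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]  # illustrative only
    print(shuffle_fraction(natal_groups, 0.5, rng))
```

      Setting `fraction` to 1.0 corresponds to shuffling all offspring, and 0.5 to the intermediate level described above; some selected offspring may land back in their natal group by chance, so the realized reduction in relatedness is only approximately proportional to the fraction.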

      Briefly, we will elaborate on the concept of division of labor and the tasks that cooperative breeders perform. In nature, multiple tasks are often necessary to successfully rear offspring. For example, in many cooperatively breeding birds, the primary reasons that individuals fail to produce offspring are (1) starvation, which is mitigated by the feeding of offspring, and (2) nest depredation, which is countered by defensive behavior. Consequently, both types of tasks are necessary to successfully produce offspring, and focusing solely on one while neglecting the other is likely to result in lower reproductive success than if both tasks are performed by individuals within the group. We simplify this principle in the model by maximizing reproductive output when both tasks are carried out to a similar extent, allowing for some flexibility from the mean. In response to the reviewer's suggestion about making fecundity a function of work tasks and offspring survival a function of defensive tasks, these are actually equivalent in model terms, as it’s the same whether breeders produce three offspring and two die, or if they only produce one. This represents, of course, a simplification of the natural context, where breeding unsuccessfully is more costly (in terms of time and energy investment) than not breeding at all, but this approach is typically used in models of this sort.

      The scope of this paper was to study division of labor in cooperatively breeding species with fertile workers, in which help is exclusively directed towards breeders to enhance offspring production (i.e., alloparental care). Our focus is in line with previous work in most other social animals, including eusocial insects and humans, which emphasizes how division of labor maximizes group productivity. Other forms of “general” help are not considered in the paper, and such forms of help are rarely considered in cooperatively breeding vertebrates or in the division of labor literature, as they do not result in task partitioning to enhance productivity.

      How do we model help? Help provided is an interaction between H (total effort) and T (proportion of total effort invested in each type of task). We will make this definition clearer in the revised manuscript. Thank you for pointing out an error in Eq. 1. This inequality was indeed written incorrectly in the paper (but is correct in the model code); it is dominance rank instead of age (see code in Individual.cpp lines 99-119). We will correct this mistake in the revision.

      There was also a question about bounded and unbounded helping costs. The difference in costs is inherent to the nature of the different tasks (work or defense): while survival is naturally bounded, with death as the lower bound, dominance costs are potentially unbounded, as they are influenced by dynamic social contexts and potential competitors. Therefore, we believe that the model’s cost structure is not too different from that in nature.

      Thank you for your comments about the parameter landscape. It is important to point out that variations in the mutation rate do not qualitatively affect our results, as this is something we explored in previous versions of the model (not shown). Briefly, we find that variations in the mutation rates only alter the time required to reach equilibrium. Increasing the step size of mutation diminishes the strength of selection by adding stochasticity and reducing the genetic correlation between offspring and their parents. Population size could, in theory, affect our results, as small populations are more prone to extinction. Since this was not something we planned to explore in the paper directly, we specifically chose a large population size, or better said, a large number of territories (i.e. 5000) that can potentially host a large population.

      During the exploratory phase of the model development, various parameters and values were also assessed. However, the manuscript only details the ranges of values and parameters where changes in the behaviors of interest were observed, enhancing clarity and conciseness. For instance, variation in y<sub>h</sub> (the cost of help on dominance when performing “work tasks”) led to behavioral changes similar to those caused by changes in x<sub>h</sub> (the cost of help in survival when performing “defensive tasks”), as both are proportional to each other. Specifically, since an increase in defense costs raises the proportion of work relative to defense tasks, while an increase in the costs of work task has the opposite effect, only results for the variation of x<sub>h</sub> were included in the manuscript to avoid redundancy. We will make this clearer in the revision.

      Finally, following the advice from the reviewers, we will add the symbols of the variables to the figure axes, and clarify whether the values shown represent a genetic or phenotypic trait. In Figure 2, the x-axis is H and the y-axis is T. In Figure 3A, the subindex t in x-axis is incorrect; it should be subindex R (reaction norm to dominance rank instead of age), the y-axis is T. In Figure 3B, the x-axis is R, and the y-axis is T. All values of T, H and R are phenotypic expressed values (see Table 1). For instance, T values are the phenotypic expressed values from the individuals in the population according to their genetic gamma values and their current dominance rank at a given time point.

      References

      Cote, J., Dahirel, M., Schtickzelle, N., Altermatt, F., Ansart, A., Blanchet, S., Chaine, A. S., De Laender, F., De Raedt, J., & Haegeman, B. (2022). Dispersal syndromes in challenging environments: A cross‐species experiment. Ecology Letters, 25(12), 2675–2687.

    1. eLife Assessment

      This study presents an important practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics - cost-effective and scalable alternatives to conventional antibodies - into a DNA-immunoassay framework, which convincingly merges oligonucleotide-based detection with immunoassay methodologies.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents a practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics - cost-effective and scalable alternatives to conventional antibodies - into a DNA-immunoassay framework that merges oligonucleotide-based detection with immunoassay methodologies. Notably, they demonstrate that this approach facilitates a modified ELISA platform capable of simultaneously quantifying multiple target protein expression levels within a single protein mixture sample.

      Strengths:

      The hybridization chain reaction (HCR) technique was initially developed to enable the simultaneous detection of multiple mRNA expression levels within the same tissue. This method has since evolved into immuno-HCR, which extends its application to protein detection by utilizing antibodies. A key requirement of immuno-HCR is the coupling of oligonucleotides to antibodies, a process that can be challenging due to the inherent difficulties in expressing and purifying conventional antibodies.

      In this study, the authors present an innovative approach that circumvents these limitations by employing nanobody-based antibody mimetics, which recognize antibodies, instead of directly coupling oligonucleotides to conventional antibodies. This strategy facilitates oligonucleotide conjugation - designed to target the initiator hairpin oligonucleotide of HCR - through peptide ligation and click chemistry.

      Weaknesses:

      The sandwich-format technique presented in this study, which employs a nanobody that recognizes primary IgG antibodies, may have limited scalability compared to existing methods that directly couple oligonucleotides to primary antibodies. This limitation arises because the C-region types of primary antibodies are relatively restricted, meaning that the use of nanobody-based detection may constrain the number of target proteins that can be analyzed simultaneously. In contrast, the conventional approach of directly conjugating oligonucleotides to primary antibodies allows for a broader range of protein targets to be analyzed in parallel.

      Additionally, in the context of HCR-based protein detection, the number of proteins that can be analyzed simultaneously is inherently constrained by fluorescence wavelength overlap in microscopy, which limits its multiplexing capability. By comparison, direct coupling of oligonucleotides to primary antibodies can facilitate the simultaneous measurement of a significantly greater number of protein targets than the sandwich-based nanobody approach in the barcode-ELISA/NGS-based technique.

    3. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics - cost-effective and scalable alternatives to conventional antibodies - into a DNA-immunoassay framework that merges oligonucleotide-based detection with immunoassay methodologies. Notably, they demonstrate that this approach facilitates a modified ELISA platform capable of simultaneously quantifying multiple target protein expression levels within a single protein mixture sample.

      Strengths:

      The hybridization chain reaction (HCR) technique was initially developed to enable the simultaneous detection of multiple mRNA expression levels within the same tissue. This method has since evolved into immuno-HCR, which extends its application to protein detection by utilizing antibodies. A key requirement of immuno-HCR is the coupling of oligonucleotides to antibodies, a process that can be challenging due to the inherent difficulties in expressing and purifying conventional antibodies.

      In this study, the authors present an innovative approach that circumvents these limitations by employing nanobody-based antibody mimetics, which recognize antibodies, instead of directly coupling oligonucleotides to conventional antibodies. This strategy facilitates oligonucleotide conjugation - designed to target the initiator hairpin oligonucleotide of HCR - through peptide ligation and click chemistry.

      Weaknesses:

      The sandwich-format technique presented in this study, which employs a nanobody that recognizes primary IgG antibodies, may have limited scalability compared to existing methods that directly couple oligonucleotides to primary antibodies. This limitation arises because the C-region types of primary antibodies are relatively restricted, meaning that the use of nanobody-based detection may constrain the number of target proteins that can be analyzed simultaneously. In contrast, the conventional approach of directly conjugating oligonucleotides to primary antibodies allows for a broader range of protein targets to be analyzed in parallel.

      We would like to clarify that MaMBA was specifically designed to address and overcome the limitations imposed by relying on primary antibodies’ Fc types for multiplexing. MaMBA utilizes DNA oligo-conjugated nanobodies that selectively and monovalently bind to the Fc region of IgG. This key feature allows us to barcode primary IgGs targeting different antigens independently. These barcoded IgGs can then be pooled together after barcoding, effectively minimizing the potential for cross-reactivity or crossover. Therefore, IgGs barcoded using MaMBA are functionally equivalent to those barcoded via conventional direct conjugation approaches with respect to multiplexing capability.

      Additionally, in the context of HCR-based protein detection, the number of proteins that can be analyzed simultaneously is inherently constrained by fluorescence wavelength overlap in microscopy, which limits its multiplexing capability. By comparison, direct coupling of oligonucleotides to primary antibodies can facilitate the simultaneous measurement of a significantly greater number of protein targets than the sandwich-based nanobody approach in the barcode-ELISA/NGS-based technique.

      As we have responded above, MaMBA barcoding of primary IgGs that target various antigens can be conducted separately. Once barcoded, these IgGs can then be combined into a single pool. Therefore, for BLISA (i.e., the barcode-ELISA/NGS-based technique), IgGs barcoded through MaMBA offer the same multiplexing capability as those barcoded using traditional direct conjugation methods.

      In in situ protein imaging, spectral overlap can indeed limit the throughput of multiplexed HCR fluorescent imaging. There are two strategies to address this challenge. As demonstrated in this work with misHCR and misHCRn, removing the HCR amplifiers allows for multiplexed detection using a limited number of fluorescence wavelengths. This is achieved through sequential rounds of HCR amplification and imaging. Alternatively, recent computational approaches offer promising solutions for “one-shot” multiplexed imaging. These include combinatorial multiplexing (PMID: 40133518) and spectral unmixing (PMID: 35513404), which can be applied to misHCR to deconvolute overlapping spectra and increase multiplexing capacity in a single imaging acquisition.

    1. eLife Assessment

      This important study demonstrates that interferon beta stimulation induces WTAP transition from aggregates to liquid droplets, coordinating m6A modification of a subset of mRNAs that encode interferon-stimulated genes and restricting their expression. The evidence presented is solid, supported by microscopy, immunoprecipitations, m6A sequencing, and ChIP, to show that WTAP phosphorylation controls phase transition and its interaction with STAT1 and the methyltransferase complex.

    2. Reviewer #1 (Public review):

      Summary:

      This study puts forth the model that under IFN-B stimulation, liquid-phase WTAP coordinates with the transcription factor STAT1 to recruit MTC to the promoter region of interferon stimulated genes (ISGs), mediating the installation of m6A on newly synthesized ISG mRNAs. This model is supported by strong evidence that the phosphorylation state of WTAP, which depends on PPP4, is controlled by IFN-B stimulation, and that this results in interactions between WTAP, the m6A methyltransferase complex, and STAT1, a transcription factor that mediates activation of ISGs. This was demonstrated via a combination of microscopy, immunoprecipitations, m6A sequencing, and ChIP. These approaches converge on a set of experiments that nicely demonstrate that IFN-B stimulation increases the interaction between WTAP, METTL3, and STAT1, that this interaction is lost with knockdown of WTAP (even in the presence of IFN-B), and that this IFN-B stimulation also induces METTL3-ISG interactions.

      Strengths:

      The evidence for the IFN-B stimulated interaction between METTL3 and STAT1, mediated by WTAP, is quite strong. Removal of WTAP in this system seems to be sufficient to reduce these interactions and the concomitant m6A methylation of ISGs. The conclusion that the phosphorylation state of WTAP is important in this process is also quite well supported. The authors have now also provided substantial evidence that phase separation of WTAP upon interferon stimulation facilitates m6A-methylation of multiple interferon stimulated genes.

    3. Reviewer #2 (Public review):

      In this study, Cai and colleagues investigate how one component of the m6A methyltransferase complex, the WTAP protein, responds to IFNb stimulation. They find that viral infection or IFNb stimulation induces the transition of WTAP from aggregates to liquid droplets through dephosphorylation by PPP4. This process affects the m6A modification levels of ISG mRNAs and modulates their stability. In addition, the WTAP droplets interact with the transcription factor STAT1 to recruit the methyltransferase complex to ISG promoters and enhance m6A modification during transcription. The investigation dives into a previously unexplored area of how viral infection or IFNb stimulation affects m6A modification on ISGs. The observation that WTAP undergoes a phase transition is significant in our understanding of the mechanisms underlying m6A's function in immunity. However, there are still key gaps that should be addressed to fully accept the model presented.

      Major points:

      (1) More detailed analyses on the effects of WTAPsgRNA on the m6A modification of ISGs:

      a. A comprehensive summary of the ISGs, including the percentage of ISGs that are m6A-modified,

      b. The distribution of m6A modification across the ISGs, and

      c. A comparison of the m6A modification distribution in ISGs with non-ISGs.

      In addition, since the authors propose a novel mechanism where the interaction between phosphorylated STAT1 and WTAP directs the MTC to the promoter regions of ISGs to facilitate co-transcriptional m6A modification, it is critical to analyze whether the m6A modification distribution holds true in the data.

      (2) Since a key part of the model includes the cytosol-localized STAT1 protein undergoing phosphorylation to translocate to the nucleus to mediate gene expression, the authors should focus on the interaction between phosphorylated STAT1 and WTAP in Figure 4, rather than the unphosphorylated STAT1. Only phosphorylated STAT1 localizes to the nucleus, so the presence of pSTAT1 in the immunoprecipitate is critical for establishing a functional link between STAT1 activation and its interaction with WTAP.

      (3) The authors should include pSTAT1 ChIP-seq and WTAP ChIP-seq on IFNb-treated samples in Figure 5 to allow for a comprehensive and unbiased genomic analysis comparing the overlap of peaks from both ChIP-seq datasets. These results should further support their hypothesis that WTAP interacts with pSTAT1 to enhance m6A modifications on ISGs.

      Minor points:

      (1) Since IFNb is primarily known for modulating biological processes through gene transcription, it would be informative if the authors discussed the mechanism of how IFNb would induce the interaction between WTAP and PPP4.

      (2) The authors should include mCherry alone controls in Figure 1D to demonstrate that mCherry does not contribute to the phase separation of WTAP. Does mCherry have or lack a PLD?

      (3) The authors should clarify the immunoprecipitation assays in the methods. For example, the labeling in Fig. 2A suggests that antibodies against WTAP and pan-p were used for two immunoprecipitations. Is that accurate?

      (4) The authors should include quantification of overall m6A modification levels in GFPsgRNA and WTAPsgRNA cells, either by mass spectrometry (preferably) or dot blot.

      Comments on revisions:

      The authors thoroughly addressed the aforementioned points during the review process.

    4. Reviewer #3 (Public review):

      Summary:

      This study presents a valuable finding on the mechanism used by WTAP to modulate the IFN-β stimulation. It describes the phase transition of WTAP driven by IFN-β-induced dephosphorylation. The evidence supporting the claims of the authors is solid.

      Strength:

      The key finding is the revelation that WTAP undergoes phase separation during virus infection or IFN-β treatment. The authors conducted a series of precise experiments to uncover the mechanism behind WTAP phase separation and identified the regulatory role of 5 phosphorylation sites. They also succeeded in pinpointing the phosphatase involved.

    1. eLife Assessment

      TDP-43 mislocalization is a key feature of some neurodegenerative diseases, but cellular models are lacking. The authors endogenously-tagged TDP-43 with a C-terminal GFP tag in human iPSCs, followed by expression of an intrabody-NES that targeted GFP to the cytosol. They convincingly report physical mislocalization and functional depletion of TDP-43, as measured by microscopy and RNAseq. This method will be valuable to investigators studying the biological consequences of TDP-43 mislocalization and the methodology is in line with the current state-of-the-art.

    2. Reviewer #2 (Public review):

      Summary:

      TDP-43 mislocalization occurs in nearly all of ALS, roughly half of FTD, and as a co-pathology in roughly half of AD cases. Both gain of function and loss of function mechanisms associated with this mislocalization likely contribute to disease pathogenesis.

      Here, the authors describe a new method to induce TDP-43 mislocalization in cellular models. They endogenously-tagged TDP-43 with a C-terminal GFP tag in human iPSCs. They then expressed an intrabody - fused with a nuclear export signal (NES) - that targeted GFP to the cytosol. Expression of this intrabody-NES in human iPSC derived neurons induced nuclear depletion of homozygous TDP-43-GFP, caused its mislocalization to the cytosol, and at least in some cells appeared to cause cytosolic aggregates. This mislocalization was accompanied by induction of cryptic exons in well characterized transcripts known to be regulated by TDP-43, a hallmark of functional TDP-43 loss and consistent with pathological nuclear TDP-43 depletion. Interestingly, in heterozygous TDP-43-GFP neurons, expression of intrabody-NES appeared to also induce the mislocalization of untagged TDP-43 in roughly half of the neurons, suggesting that this system can also be used to study effects on untagged endogenous TDP-43 as well as TDP-43-GFP fusion protein.

      Strengths:

      A clearer understanding of how TDP-43 mislocalization alters cellular function, as well as pathways that mitigate clearance of TDP-43 aggregates, is critical. But modeling TDP-43 mislocalization in disease-relevant cellular systems has proven to be challenging. High levels of overexpression of TDP-43 lacking an NES can drive endogenous TDP-43 mislocalization, but such overexpression has direct and artificial consequences on certain cellular features (e.g. altered exon skipping) not seen in diseased patients. Toxic small molecules such as MG132 and arsenite can induce TDP-43 mislocalization, but co-induce myriad additional cellular dysfunctions unrelated to TDP-43 or ALS. TDP-43 binding oligonucleotides can cause cytosolic mislocalization as well. Each system has pros and cons, and additional ways to induce TDP-43 mislocalization would be useful for the field. The method described in this manuscript could provide researchers with a powerful way to study the combined biology of cytosolic TDP-43 mislocalization and nuclear TDP-43 depletion, with additional temporal control that is lacking in current methods. Indeed, the authors see some evidence of differences in RNA splicing caused by pure TDP-43 depletion versus their induced mislocalization model. Finally, their method may be especially useful in determining how TDP-43 aggregates are cleared by cells, potentially revealing new biological pathways that could be therapeutically targeted.

      Weaknesses:

      The method and supporting data have some limitations.

      • Tagging of TDP-43 with a bulky GFP tag may alter its normal physiological functions, for example, phase separation properties and functions within complex ribonucleoprotein complexes. The authors show that normal splicing function of GFP-TDP-43 is maintained, suggesting that physiology is largely preserved, but other functions and properties of TDP-43 that were not directly tested could be altered.

      • Potential differences in splicing and micro RNAs between TDP-43 knockdown and TDP-43 mislocalization are potentially interesting. However, different patterns of dysregulated RNA splicing can occur at different levels of TDP-knockdown and can differ in different batches of experiments, thus it is difficult to assess whether the changes observed in this paper are due to mislocalization per se, or rather just reflect differences in nuclear TDP-43 abundance or batch effects.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Nuclear depletion and cytoplasmic mislocalization/aggregation of the DNA and RNA binding protein TDP-43 are pathological hallmarks of multiple neurodegenerative diseases. Prior work has demonstrated that depletion of TDP-43 from the nucleus leads to alterations in transcription and splicing. Conversely, cytoplasmic mislocalization/aggregation can contribute to toxicity by impairing mRNA transport and translation as well as miRNA dysregulation. However, to date, models of TDP-43 proteinopathy rely on artificial knockdown- or overexpression-based systems to evaluate either nuclear loss or cytoplasmic gain of function events independently. Few model systems authentically reproduce both nuclear depletion and cytoplasmic mislocalization/aggregation events. In this manuscript, the authors generate novel iPSC-based reagents to manipulate the localization of endogenous TDP-43. This is a valuable resource for the field to study pathological consequences of TDP-43 proteinopathy in a more endogenous and authentic setting. However, in the current manuscript, there are a number of weaknesses that should be addressed to further validate the ability of this model to replicate human disease pathology and demonstrate utility for future studies.

      Strengths:

      The primary strength of this paper is the development of a novel in vitro tool.

      Weaknesses:

      There are a number of weaknesses detailed below that should be addressed to thoroughly validate these new reagents as more authentic models of TDP-43 proteinopathy and demonstrate their utility for future investigations.

      (1) The authors should include images of their engineered TDP-43-GFP iPSC line to demonstrate TDP-43 localization without the addition of any nanobodies (perhaps immediately prior to addition of nanobodies). Additionally, it is unclear whether simply adding a GFP tag to endogenous TDP-43 impacts its normal function (nuclear-cytoplasmic shuttling, regulation of transcription and splicing, mRNA transport etc).

      We have included images of the untransduced day 20 MNs derived from the engineered TDP43-GFP iPSC lines and the unedited line (Supplementary Fig. 1B).

      We acknowledge the reviewer’s concern about the potential impact of the GFP tag on TDP43's normal function. To address this, we have validated the functionality of TDP43 by assessing the inclusion of cryptic exons in highly sensitive targets such as UNC13A and STMN2, both of which are known to be directly regulated by TDP43.

      We compared MNs derived from the unedited parent line with the TDP43-GFP MNs prior to nanobody addition. As measured by qPCR, cryptic exon inclusion in UNC13A and STMN2 was not observed in the unedited or edited TDP43-GFP MNs (Supplementary Fig. 1C), confirming that the tagging does not induce splicing defects by itself. Cryptic exon inclusion in UNC13A and STMN2 was only observed in TDP43-GFP MNs expressing the NES nanobody (Supplementary Fig. 2D). These findings were further supported by our next-generation sequencing data, which also showed that cryptic exon inclusion was specific to the TDP43 mislocalization condition (Supplementary Fig. 3 and 4).

      Thus, we have strong evidence that the GFP-tagged TDP43 behaves similarly to the wild-type protein and does not interfere with its function in our model.

      (2) Can the authors explain why there is a significant discrepancy in time points selected for nanobody transduction and immunostaining or cell lysis throughout Figure 1 and 2? This makes interpretation and overall assessment of the model challenging.

      For the phenotypic data shown in Fig. 1, we added the AAVs at day 18 or 20 and analyzed the cells at day 40. For the phosphorylated TDP43 western blot (revised Fig. 3D), cells were treated with doxycycline at day 20 to induce nanobody expression and samples were harvested at day 40. Thus, cells were harvested 20 to 22 days after adding the nanobodies. The onset of transgene expression when using AAVs in neurons typically displays slow kinetics. We observed TDP43 mislocalization in less than 50% of the neurons at 7 days post-transduction, and mislocalization peaked at 10-12 days after addition of the nanobodies, when more than 80% of the cells displayed TDP43 mislocalization. Hence, we do not believe that a two-day difference significantly alters the interpretation of the data.

      The decision to harvest neurons at day 30 for the qPCR data was taken to investigate whether the splicing changes seen at day 40 from the transcriptomics analysis can be detected well before the phenotypes observed at day 40.

      (3) The authors should further characterize their TDP-43 puncta. TDP-43 immunostaining is typically punctate so it is unclear if the puncta observed are physiologic or pathologic based on the analyses carried out in the current version of this manuscript. Additionally, do these puncta co-localize with stress granule markers or RNA transport granule markers? Are these puncta phosphorylated (which may be more reminiscent of end-stage pathologic observations in humans)?

      We have tried immunostaining neurons for phosphorylated TDP43. However, our immunostaining attempts were unsuccessful. Depending on the antibody, we either saw no signal (antibody from Cosmo Bio, TIP-PTD-M01A) or even the control neurons displayed detectable phosphorylation within the nucleus (antibody from Proteintech, 22309-1-AP). Consequently, we performed western blot analysis using an antibody from Cosmo Bio (TIP-PTD-M01A) that clearly shows hyperphosphorylation of TDP43 in whole cell lysates (Fig. 3D, E). Hence, we have referred to these structures as puncta and not aggregates (Page 4).

      To assess co-localization of the puncta with stress granules, we immunostained for the stress granule marker G3BP1. This was done in MNs that were treated with sodium arsenite (SA) or PBS as a control. In the PBS treated control MN cultures, TDP43 mislocalization alone did not induce stress granule formation. G3BP1+ stress granules were only observed following SA stress (0.5 mM, 60 minutes). Further, only a subset of TDP43 puncta overlapped with these stress granules (Supplementary Fig. 7) (Page 6).

      (4) The authors should include multiple time points in their evaluation of TDP-43 loss of function events and aggregation. Does loss of function get worse over time? Is there a time course by which RNA misprocessing events emerge or does everything happen all at once? Does aggregation get worse over time? Do these neurons die at any point as a result of TDP-43 proteinopathy?

      We agree that a time course to analyze TDP43 mislocalization and its consequences would be ideal. However, the mislocalization of TDP43 across neurons is not a coordinated process. At each given time instance, neurons display varying levels of TDP43 mislocalization. Answering the questions raised by the reviewer would require tracking individual neurons in real time in a controlled environment over weeks. Unfortunately, we currently do not have the hardware to run these experiments. However, we do observe increased levels of cleaved caspase 3 in MNs expressing the NES nanobody, indicating that these neurons indeed undergo apoptosis by day 40 (Fig.1).

      We have, however, analyzed changes in splicing using qPCR for 12 genes over a time course starting as early as 4 hours after inducing mislocalization. We detect time-dependent cryptic splicing events in all genes as early as 8 hours after doxycycline addition, coinciding with the appearance of TDP43 mislocalization (Fig. 4A, B).

      (5) Can the authors please comment on whether or not their model is "tunable"? In real human disease, not every neuron displays complete nuclear depletion of TDP-43. Instead there is often a gradient of neurons with differing magnitudes of nuclear TDP-43 loss. Additionally, very few neurons (5-10%) harbor cytoplasmic TDP-43 aggregates at end-stage disease. These are all important considerations when developing a novel authentic and endogenous model of TDP-43 proteinopathy which the current manuscript fails to address.

      As shown in Fig. 1, the neurons expressing the NES-nanobody display a wide range of mislocalization as assessed by the % of nuclear TDP43 present. By titrating the amount of AAVs added to the culture, the model can be tuned to achieve a wide gradient of TDP43 mislocalization.

      We calculated the size and percentage of neurons displaying TDP43 puncta. The size and the number of aggregates varies across the neurons that display TDP43 mislocalization. Around 50% of the neurons displayed small (1  um<sup>2</sup>) puncta while large puncta (> 5  um<sup>2</sup>) were observed in <10% of the cells, similar to observations in patient tissue (Fig. 1F).

      Reviewer #2 (Public Review):

      Summary:

      TDP-43 mislocalization occurs in nearly all of ALS, roughly half of FTD, and as a co-pathology in roughly half of AD cases. Both gain-of-function and loss-of-function mechanisms associated with this mislocalization likely contribute to disease pathogenesis.

      Here, the authors describe a new method to induce TDP-43 mislocalization in cellular models. They endogenously tagged TDP-43 with a C-terminal GFP tag in human iPSCs. They then expressed an intrabody - fused with a nuclear export signal (NES) - that targeted GFP to the cytosol. Expression of this intrabody-NES in human iPSC-derived neurons induced nuclear depletion of homozygous TDP-43-GFP, caused its mislocalization to the cytosol, and at least in some cells appeared to cause cytosolic aggregates. This mislocalization was accompanied by induction of cryptic exons in well characterized transcripts known to be regulated by TDP-43, a hallmark of functional TDP-43 loss and consistent with pathological nuclear TDP-43 depletion. Interestingly, in heterozygous TDP-43-GFP neurons, expression of intrabody-NES appeared to also induce the mislocalization of untagged TDP-43 in roughly half of the neurons, suggesting that this system can also be used to study effects on untagged endogenous TDP-43 as well as TDP-43-GFP fusion protein.

      Strengths:

      A clearer understanding of how TDP-43 mislocalization alters cellular function, as well as pathways that mitigate clearance of TDP-43 aggregates, is critical. But modeling TDP-43 mislocalization in disease-relevant cellular systems has proven to be challenging. High levels of overexpression of TDP-43 lacking an NES can drive endogenous TDP-43 mislocalization, but such overexpression has direct and artificial consequences on certain cellular features (e.g. altered exon skipping) not seen in diseased patients. Toxic small molecules such as MG132 and arsenite can induce TDP-43 mislocalization, but co-induce myriad additional cellular dysfunctions unrelated to TDP-43 or ALS. TDP-43 binding oligonucleotides can cause cytosolic mislocalization as well. Each system has pros and cons, and additional ways to induce TDP-43 mislocalization would be useful for the field. The method described in this manuscript could provide researchers with a powerful way to study the combined biology of cytosolic TDP-43 mislocalization and nuclear TDP-43 depletion, with additional temporal control that is lacking in current methods. Indeed, the authors see some evidence of differences in RNA splicing caused by pure TDP-43 depletion versus their induced mislocalization model. Finally, their method may be especially useful in determining how TDP-43 aggregates are cleared by cells, potentially revealing new biological pathways that could be therapeutically targeted.

      Weaknesses:

      The method and supporting data have limitations, outlined below, and in their current form the findings are rather preliminary.

      (1) Tagging of TDP-43 with a bulky GFP tag may alter its normal physiological functions, for example phase separation properties and functions within complex ribonucleoprotein complexes. In addition, alternative isoforms of TDP-43 (e.g. "short" TDP-43) would not be GFP tagged and therefore these species would not be directly manipulatable or visualizable with the tools currently employed in the manuscript.

      With reference to our answer above, we have confirmed using qPCR and RNA-seq analysis that adding a GFP tag to the C-terminus of TDP43 does not result in an appreciable loss of functionality. We do not observe any cryptic exon inclusion in STMN2 and UNC13A. Cryptic exon inclusion in these genes, especially STMN2, has been recognized as a very sensitive indicator of TDP43 loss of function (Supplementary Fig 1C, Supplementary 2D, Fig. 3, Fig.4)

      We acknowledge that truncated alternatively spliced versions of TDP43 will lose the GFP-tag and cannot be manipulated with our system. Since our GFP tag is positioned on the C-terminus, our system cannot manipulate these truncated fragments as the tag is lost in these isoforms. But these isoforms, if present, should be detectable using the Proteintech antibody against total TDP43, which recognizes N-terminal TDP43 epitopes. However, western blot analysis, even 20 days after inducing TDP43 mislocalization, showed no truncated fragments. This suggests that TDP43 mislocalization alone is insufficient to generate significant levels of truncated isoforms. We have added this section to the Limitations paragraph (page 9).

      (2) The data regarding potential mislocalization of endogenous TDP-43 in the heterozygous TDP-43-GFP lines is especially intriguing and important, yet very little characterization was done. Does untagged TDP-43 co-aggregate with the tagged TDP-43? Is localization of TDP-43 immunostaining the same as the GFP signal in these cells?

      The purpose of the heterozygous experiments was to see whether mislocalized TDP43 could potentially trap the untagged TDP43. If this were not the case, we would have seen a maximum of 50% of the TDP43 signal mislocalized to the cytoplasm. The fact that a sizeable proportion of cells had significantly higher levels of TDP43 loss from the nucleus indicates that mislocalized TDP43 can indeed trap the untagged protein fraction. We used GFP immunostaining to identify the tagged TDP43 while an antibody against the endogenous TDP43 protein was used to detect total TDP43 levels. In the cells that show near complete loss of nuclear TDP43, the total TDP43 signal coincides with the GFP (tagged TDP43) signal. We are unable to distinguish the untagged fraction selectively as we do not have an antibody that can detect this directly.

      But we agree with the reviewer that these observations need further detailed follow-up that we are unable to provide currently. Hence, we have removed this figure from the manuscript.

      (3) The experiments in which dox was used to induce the nanobody-NES, then dox withdrawn to study potential longer-lasting or self-perpetuating inductions of aggregation are potentially interesting. However, the nanobody was only measured at the RNA level. We know that protein half lives can be very long in neurons, and therefore residual nanobody could be present at these delayed time points. The key measurement to make would be at the protein level of the nanobody if any conclusions are to be made from this experiment.

      The reviewer has highlighted an important point. To address this issue, we tagged the nanobodies with a V5 tag that allowed us to directly measure nanobody levels within cells. After Dox withdrawal, we indeed observed significant expression of the nanobody within cells even after two weeks of Dox withdrawal. Extending the time point to three weeks allowed complete loss of the nanobody in most neurons. However, in contrast to our observations at two weeks, this was accompanied by a reversal of TDP43 mislocalization in these neurons at three weeks (Fig. 5).

      Surprisingly, in less than 10% of the neurons, we observed >80% of the total TDP43 still mislocalized to the cytoplasm, despite nearly undetectable levels of the nanobody. Super-resolution microscopy further revealed persistent cytoplasmic TDP43 in these neurons that did not overlap with residual nanobody signal. This suggests that in these neurons, the nanobody was no longer required to maintain TDP43 mislocalization (Fig. 5, page 7)

      (4) Potential differences in splicing and microRNAs between TDP-43 knockdown and TDP-43 mislocalization are potentially interesting. However, different patterns of dysregulated RNA splicing can occur at different levels of TDP-knockdown, thus it is difficult to assess whether the changes observed in this paper are due to mislocalization per se, or rather just reflect differences in nuclear TDP-43 abundance.

      This is a fair point. It is possible that microRNA dysregulation might require a greater loss of nuclear TDP43 and may be more resilient to TDP43 loss as compared to splicing. We have acknowledged this in the discussion section (page 9).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be helpful to include nuclear vs cytoplasmic ratios of TDP-43 instead of simply "% nuclear TDP-43"

      We have used % nuclear TDP43 as these values have biologically meaningful upper and lower bounds, which makes it easier to compare across experiments. We found that using a ratio of nuclear vs cytoplasmic TDP43 intensities displayed higher variability and a wider range.

      We have re-labelled the y-axis as “% Nuclear TD43 / soma TDP43” to make our quantification clearer. The conversion from % nuclear TDP43 to N/C is straightforward. If the % nuclear TDP43 is X, then the N/C ratio can be calculated as X / (100-X). For example, a % nuclear TDP43 of 80% would amount to an N/C ratio of 80/20 = 4.
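
      The conversion described above can be sketched in a few lines of Python. This is only an illustration of the stated formula X / (100 - X); the function names are hypothetical and are not taken from the authors' analysis code. It also shows why the ratio has a wider range than the percentage: the N/C ratio grows without bound as the cytoplasmic share approaches zero.

```python
import math


def nuclear_percent_to_nc_ratio(nuclear_percent: float) -> float:
    """Convert % nuclear TDP43 (share of total soma signal) to an N/C ratio.

    Hypothetical helper: implements X / (100 - X) as stated in the response.
    """
    if not 0.0 <= nuclear_percent <= 100.0:
        raise ValueError("nuclear_percent must lie between 0 and 100")
    cytoplasmic_percent = 100.0 - nuclear_percent
    if cytoplasmic_percent == 0.0:
        # Fully nuclear signal: the ratio is unbounded.
        return math.inf
    return nuclear_percent / cytoplasmic_percent


def nc_ratio_to_nuclear_percent(nc_ratio: float) -> float:
    """Inverse conversion: N/C ratio back to % nuclear TDP43."""
    if nc_ratio < 0.0:
        raise ValueError("nc_ratio must be non-negative")
    if math.isinf(nc_ratio):
        return 100.0
    return 100.0 * nc_ratio / (1.0 + nc_ratio)


if __name__ == "__main__":
    # Worked example from the text: 80% nuclear corresponds to 80/20.
    print(nuclear_percent_to_nc_ratio(80.0))
```

      The percentage is bounded between 0 and 100, whereas the ratio runs from 0 to infinity, which is consistent with the wider range reported for the ratio.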

      (2) The axis descriptions in Figure 1D are very unclear. While this is described better in the figure legend, it would be beneficial to have a more descriptive y-axis title in the figure (which may mean increasing the number of graphs).

      Axis descriptions and figures changed as recommended.

      (3) In Figure 1, the time points at which iPSNs were transduced with nanobody and/or fixed for immunostaining are somewhat inconsistent across all panels. This hinders interpretation of the figure as a whole. The authors should use the same transduction and immunostaining time points for consistency or demonstrate that the same phenotype is observed regardless of transduction and immunostaining day as long as the time in between (time of nanobody expression) is consistent. Subsequently, in Figure 2, a different set of time points is used.

      Please see our response in the public comments above

      (4) In Figure 1, please show individual data points for each independent differentiation to demonstrate the level of reproducibility from batch to batch.

      Data points have been shown per replicate (Supplementary Fig. 2)

      We have refined our approach for phenotypic analysis to improve consistency across different clones. Previously, we set thresholds on % nuclear TDP43 to distinguish MNs with nuclear versus mislocalized TDP43. This was done by ranking all cells based on % nuclear TDP43 and applying quantile-based thresholds, designating the top 25% as control and the bottom 25% as mislocalized, which ensured an equal number of cells per category. However, we observed significant variability in thresholds across clones. For instance, the E8 clone had thresholds of 96% and 29%, while the E5 clone had 93% and 40%.

      To address this, we reanalysed the data using a standardized three-bin approach:

      (1) Control: MNs expressing the control nanobody.

      (2) Low-Moderate Mislocalization: MNs expressing the NES nanobody with > 40% nuclear TDP43.

      (3) Severe Mislocalization: MNs expressing the NES nanobody with < 40% nuclear TDP43.

      This approach ensured a more reliable comparison of TDP43 mislocalization effects across experiments. The conclusions remain the same.
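
      The three-bin assignment above can be expressed as a short Python sketch. This is an assumption-laden illustration rather than the authors' code: the function and label names are hypothetical, and because the response does not say how a cell with exactly 40% nuclear TDP43 is treated, the sketch places it in the low-moderate bin.

```python
# Fixed cut-off on % nuclear TDP43, as stated in the response.
SEVERE_THRESHOLD_PERCENT = 40.0


def assign_mislocalization_bin(nanobody: str, nuclear_percent: float) -> str:
    """Assign a motor neuron to one of the three bins described above.

    nanobody: "control" or "NES" (hypothetical labels for the two nanobodies).
    nuclear_percent: % of soma TDP43 signal found in the nucleus (0-100).
    """
    if not 0.0 <= nuclear_percent <= 100.0:
        raise ValueError("nuclear_percent must lie between 0 and 100")
    if nanobody == "control":
        # Bin 1: all MNs expressing the control nanobody.
        return "control"
    if nanobody != "NES":
        raise ValueError(f"unknown nanobody label: {nanobody!r}")
    if nuclear_percent < SEVERE_THRESHOLD_PERCENT:
        # Bin 3: NES nanobody with < 40% nuclear TDP43.
        return "severe"
    # Bin 2: NES nanobody with > 40% nuclear TDP43.
    # Assumption: a value of exactly 40% also falls here.
    return "low_moderate"


if __name__ == "__main__":
    for label, pct in [("control", 95.0), ("NES", 70.0), ("NES", 20.0)]:
        print(label, pct, assign_mislocalization_bin(label, pct))
```

      Unlike the earlier quantile-based thresholds, the cut-off here is the same for every clone, so bin membership does not depend on the distribution of cells within a given clone.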

      (5) In Figure 2, please show individual data points.

      Data points for all the qPCR analyses in the paper have been included as a supplementary text file.

      (6) In Figure 3, please show individual data points.

      Data points for the western blot data have been included as a supplementary data file.

      All other comments are within the public review.

      Reviewer #2 (Recommendations For The Authors):

      (1) In general, more robust quantification of many of the described phenotypes is necessary. In particular, no apparent quantification of cytosolic mislocalization was performed in Figure 1, or quantification of mislocalization of Figure 3F. It is unclear in the western blot in Fig 1G if TDP-43 signal was normalized to total protein, and of note it seems that expression of the intrabody-NES reduced total proteins in the western blots that were shown. No quantification or measurement of the insoluble material was done or shown.

      We have quantified cytosolic mislocalization of TDP43 (Fig. 1C). The y-axis indicates the total TDP43 signal observed in the nucleus as a percentage of the total signal observed in the soma (including the nucleus). This value has the advantage of ranging from 100% (perfectly nuclear) to 0% (complete nuclear loss). The boxplots indicate that expression of the NES-nanobody results in a range of cytosolic mislocalization with a median value around 40% of the TDP43 remaining in the nucleus.

      Western blot data in previous Fig. 1G was normalized to alpha-tubulin. We were unable to get a good signal for the insoluble fraction. From the alpha-tubulin alone, it cannot be concluded that NES-nanobody results in a decrease in total protein levels. In the revised western blot for phosphorylated TDP43 (Fig. 3D, E), we have quantified total and phosphorylated TDP43. Here, we observe a six-fold increase in the levels of phosphorylated TDP43 without a significant change in total TDP43 protein levels.

      To avoid potential mis-interpretation of our results, we have now removed the previous Fig. 1G.

      (2) Additional images of nearly all microscopy data at higher magnifications would be required to better evaluate TDP-43 localization. Ideally including images for each channel in addition to merged images, and especially for key figures such as Figure 1B, 3B, 3F.

      Better images have been provided.

      (3) No control images were shown for Figure 1F and 3F. It is unclear what the bright punctate spots of cytoplasmic TDP-43 GFP signal represent. Are these true aggregates? If so, additional characterization would be required before such conclusions can be made, beyond the relatively superficial western blot analysis that was done in Figure 1.

      Control images have now been provided (Figure 1E). As we mentioned above, immunostaining analysis to characterize whether the aggregates are phosphorylated failed to provide a clear signal. However, we have now confirmed that the mislocalized TDP43 is indeed hyper-phosphorylated (Figure 3D, E). We have acknowledged this in the main text, and have referred to these as puncta reminiscent of aggregates (Page 4, Page 6).

    1. eLife Assessment

      This paper reports on an important study that aims to move beyond current experimental approaches in speech production by (1) investigating speech in the context of a fully interactive task and (2) employing advanced methodology to record intracranial brain activity. Together these allow for examination of the unfolding temporal dynamics of brain-behaviour relationships during interactive speech. This approach and the analyses presented in support of the authors' claims provide convincing evidence.

    2. Reviewer #1 (Public review):

      Summary:

      This paper reports an intracranial SEEG study of speech coordination, where participants synchronize their speech output with a virtual partner that is designed to vary its synchronization behavior. This allows the authors to identify electrodes throughout the left hemisphere of the brain that have activity (both power and phase) that correlates with the degree of synchronization behavior. They find that high-frequency activity in secondary auditory cortex (superior temporal gyrus) is correlated to synchronization, in contrast to primary auditory regions. Furthermore, activity in inferior frontal gyrus shows a significant phase-amplitude coupling relationship that is interpreted as compensation for deviation from synchronized behavior with the virtual partner.

      Strengths:

      (1) The development of a virtual partner model trained for each individual participant, which can dynamically vary its synchronization to the participant's behavior in real time, is novel and exciting.

      (2) Understanding real-time temporal coordination for behaviors like speech is a critical and understudied area.

      (3) The use of SEEG provides the spatial and temporal resolution necessary to address the complex dynamics associated with the behavior.

      (4) The paper provides some results that suggest a role for regions like IFG and STG in the dynamic temporal coordination of behavior both within an individual speaker and across speakers performing a coordination task.

      Weaknesses:

      (1) The main weakness of the paper is that the results are presented in a largely descriptive and vague manner. For instance, while the interpretation about predictive coding and error correction is interesting, it is not clear how the experimental design or analyses specifically support such a model, or how they differentiate that model from the alternatives. It's possible that some greater specificity could be achieved by a more detailed examination of this rich dataset, for example by characterizing the specific phase relationships (e.g., positive vs negative lags) in areas that show correlations with synchronization behavior. However, as written, it is difficult to understand what these results tell us about how coordination behavior arises.

      (2) In the results section, there's a general lack of quantification. While some of the statistics reported in the figures are helpful, there are also claims that are stated without any statistical test. For example, in the paragraph starting on line 342, it is claimed that there is an inverse relationship between rho-value and frequency band, "possibly due to the reversed desynchronization/synchronization process in low and high frequency bands". Based on Figure 3, the first part of this statement appears to be true qualitatively, but is not quantified, and is therefore impossible to assess in relation to the second part of the claim. Similarly, the next paragraph on line 348 describes optimal clustering, but statistics of the clustering algorithm and silhouette metric are not provided. More importantly, it's not entirely clear what is being clustered - is the point to identify activity patterns that are similar within/across brain regions? Or to interpret the meaning of the specific patterns? If the latter, this is not explained or explored in the paper.

      (3) Given the design of the stimuli, it would be useful to know more about how coordination relates to specific speech units. The authors focus on the syllabic level, which is understandable. But as far as the results relate to speech planning (an explicit point in the paper), the claims could be strengthened by determining whether the coordination signal (whether error correction or otherwise) is specifically timed to e.g., the consonant vs the vowel. If the mechanism is a phase reset, does it tend to occur on one part of the syllable?

      (4) In the discussion the results are related to a previously described speech-induced suppression effect. However, it's not clear what the current results have to do with SIS, since the speaker's own voice is present and predictable from the forward model on every trial. Statements such as "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice" are highly speculative and apparently not supported by the data.

      (5) There are some seemingly arbitrary decisions made in the design and analysis that, while likely justified, need to be explained. For example, how were the cutoffs for moderate coupling vs phase-shifted coupling (k ~0.09) determined? This is noted as "rather weak" (line 212), but it's not clear where this comes from. Similarly, the ROI-based analyses are only done on regions "recorded in at least 7 patients" - how was this number chosen? How many electrodes total does this correspond to? Is there heterogeneity within each ROI?

      Comments on revisions:

      The authors have generally responded to the critiques from the first round of review, and have provided additional details that help readers to understand what was done.

      In my opinion, the paper still suffers from a lack of clarity about the interpretation, which is partly due to the fact that the results themselves are not straightforward. For example, the heterogeneity across individual electrodes that is obvious from Fig 3 makes it hard to justify the ROI-based approach. And even the electrode clustering, while more data-driven, does not substantially help the fact that the effects appear to be less spatially-organized than the authors may want to claim.

      I recognize the value of introducing this new mutual adaptation paradigm, which is the main strength of the paper. However, the conclusions that can be drawn from the data presented here seem incomplete at best.

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates the neural underpinnings of an interactive speech task requiring verbal coordination with another speaker. To achieve this, the authors recorded intracranial brain activity from the left (and to a lesser extent, the right) hemisphere in a group of drug-resistant epilepsy patients while they synchronised their speech with a 'virtual partner'. Crucially, the authors were able to manipulate the degree of success of this synchronisation by programming the virtual partner to either actively synchronise or desynchronise their speech with the participant, or else to not vary its speech in response to the participant (making the synchronisation task purely one-way). Using such a paradigm, the authors identified different brain regions that were either more sensitive to the speech of the virtual partner (primary auditory cortex), or more sensitive to the degree of verbal coordination (i.e. synchronisation success) with the virtual partner (left secondary auditory cortex and bilateral IFG). Such sensitivity was measured by (1) calculating the correlation between the index of verbal coordination and mean power within a range of frequency bands across trials, and (2) calculating the phase-amplitude coupling between the behavioural and brain signals within single trials (using the power of high-frequency neural activity only). Overall, the findings help to elucidate some of the brain areas involved in interactive speaking behaviours, particularly highlighting high-frequency activity of the bilateral IFG as a potential candidate supporting verbal coordination.

      Strengths:

      This study provides the field with a convincing demonstration of how to investigate speaking behaviours in more complex situations that share many features with real-world speaking contexts e.g. simultaneous engagement of speech perception and production processes, the presence of an interlocutor and the need for inter-speaker coordination. The findings thus go beyond previous work that has typically studied solo speech production in isolation, and represent a significant advance in our understanding of speech as a social and communicative behaviour. It is further an impressive feat to develop a paradigm in which the degree of cooperativity of the synchronisation partner can be so tightly controlled; in this way, this study combines the benefits of using pre-recorded stimuli (namely, the high degree of experimental control) with the benefits of using a live synchronisation partner (allowing the task to be truly two-way interactive, an important criticism of other work using pre-recorded stimuli). A further key strength of the study lies in its employment of stereotactic EEG to measure brain responses with both high temporal and spatial resolution, an ideal method for studying the unfolding relationship between neural processing and this dynamic coordination behaviour.

      Weaknesses:

      One limitation of the current study is the relatively sparse coverage of the right hemisphere by the implanted electrodes (91 electrodes in the right compared to 145 in the left). Of course, electrode location is solely clinically motivated, and so the authors did not have control over this. In a previous version of this article, the authors therefore chose not to include data from the right hemisphere in reported analyses. However, after highlighting previous literature suggesting that the right hemisphere likely has high relevance to verbal coordination behaviours such as those under investigation here, the authors have now added analyses of the right hemisphere data to the results. These confirm an involvement of the right hemisphere in this task, largely replicating left hemisphere results. Some hemispheric differences were found in responses within the STG; however, interpretation should be tempered by an awareness of the relatively sparse coverage of the right hemisphere meaning that some regions have very few electrodes, resulting in reduced statistical power.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public Review):

      Summary:

      This paper reports an intracranial SEEG study of speech coordination, where participants synchronize their speech output with a virtual partner that is designed to vary its synchronization behavior. This allows the authors to identify electrodes throughout the left hemisphere of the brain that have activity (both power and phase) that correlates with the degree of synchronization behavior. They find that high-frequency activity in the secondary auditory cortex (superior temporal gyrus) is correlated to synchronization, in contrast to primary auditory regions. Furthermore, activity in the inferior frontal gyrus shows a significant phase-amplitude coupling relationship that is interpreted as compensation for deviation from synchronized behavior with the virtual partner.

      Strengths:

      (1) The development of a virtual partner model trained for each individual participant, which can dynamically vary its synchronization to the participant's behavior in real-time, is novel and exciting.

      (2) Understanding real-time temporal coordination for behaviors like speech is a critical and understudied area.

      (3) The use of SEEG provides the spatial and temporal resolution necessary to address the complex dynamics associated with the behavior.

      (4) The paper provides some results that suggest a role for regions like IFG and STG in the dynamic temporal coordination of behavior both within an individual speaker and across speakers performing a coordination task.

      We thank the Reviewer for their positive comments on our manuscript.

      Weaknesses:

      (1) The main weakness of the paper is that the results are presented in a largely descriptive and vague manner. For instance, while the interpretation of predictive coding and error correction is interesting, it is not clear how the experimental design or analyses specifically support such a model, or how they differentiate that model from the alternatives. It's possible that some greater specificity could be achieved by a more detailed examination of this rich dataset, for example by characterizing the specific phase relationships (e.g., positive vs negative lags) in areas that show correlations with synchronization behavior. However, as written, it is difficult to understand what these results tell us about how coordination behavior arises.

      We understand the reviewer’s comment. It is true that this work, being the first in the field to use real-time adaptive synchronous speech together with intracerebral neural data, is descriptive, and we hope it will pave the way for further studies. We have now added more statistical analyses (see point 2) to go beyond a descriptive approach, and we have rewritten the Discussion to clarify how this work can help to disentangle different models of language interaction. Most importantly, we have also run new analyses that take the specific phase relationship into account, as suggested.

      We already had an analysis using instantaneous phase difference in the phase-amplitude coupling approach, that bridges phase of behaviour to neural responses (amplitude in the high-frequency range). However, this analysis, as the reviewer noted, does not distinguish between positive and negative lags, but rather uses the continuous fluctuations of coordinative behaviour. Following the reviewer’s suggestion, we have now run a new analysis estimating the average delay (between virtual partner speech and patient speech) in each trial, using a cross-correlation approach. This gives a distribution of delays across trials that can then be “binned” as positive or negative. We have thus rerun the phase-amplitude coupling analyses on positive and negative trials separately, to assess whether the phase amplitude relationship depends upon the anticipatory (negative lags) or compensatory (positive lags) behaviour. Our new analysis (now in the supplementary, see figure below) does not reveal significant differences between positive and negative lags. This lack of difference, although not easy to interpret, is nonetheless interesting because it seems to show that the IFG does not have a stronger coupling for anticipatory trials. Rather the IFG seems to be strongly involved in adjusting behaviour, minimizing the error, independently of whether this is early or late.

      We have updated the “Coupling behavioural and neurophysiological data” section in Materials and methods as follows:  

      “In the third approach, we assessed whether the phase-amplitude relationship (or coupling) depends upon the anticipatory (negative delays) or compensatory (positive delays) behaviour between the VP and the patients’ speech. We computed the average delay in each trial using a cross-correlation approach on speech signals (between patient and VP) with the MATLAB function xcorr. A median split (patient-specific; average median split = 0 ms, average sd = 24 ms) was applied to conserve a sufficient amount of data, classifying trials below the median as “anticipatory behaviour” and trials above the median as “compensatory behaviour”. Then we conducted the phase-amplitude coupling analyses on positive and negative trials separately.”
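      The delay estimate and median split described in this passage can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names `best_lag` and `median_split` are hypothetical, the cross-correlation is a plain unnormalised sum of products, and assigning trials that fall exactly on the median to the "compensatory" group is an assumption, since the response does not say how ties were handled.

```python
from statistics import median


def best_lag(reference, signal, max_lag):
    """Return the lag (in samples) at which `signal` best matches `reference`.

    A positive lag means `signal` trails `reference`; a negative lag means it leads.
    """
    def score(lag):
        # Unnormalised cross-correlation: sum of products over overlapping samples.
        return sum(
            reference[i] * signal[i + lag]
            for i in range(len(reference))
            if 0 <= i + lag < len(signal)
        )

    return max(range(-max_lag, max_lag + 1), key=score)


def median_split(delays):
    """Label each trial by its delay relative to the median delay.

    Assumption: trials exactly at the median are labelled "compensatory".
    """
    cut = median(delays)
    return ["anticipatory" if d < cut else "compensatory" for d in delays]
```

      With the virtual partner as `reference` and the patient as `signal`, a negative lag corresponds to the patient leading (anticipatory) and a positive lag to the patient trailing (compensatory), matching the sign convention in the quoted text.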

      We also added a paragraph on this finding in the Discussion:

      “Our results highlight the involvement of the inferior frontal gyrus (IFG) bilaterally, in particular the BA44 region, in speech coordination. First, trials with weak verbal coordination (VCI) are accompanied by more prominent high-frequency activity (HFa, Fig.4; Fig.S4). Second, when considering the within-trial time-resolved dynamics, the phase-amplitude coupling (PAC) reveals a tight relation between the low-frequency behavioural dynamics (phase) and the modulation of high-frequency neural activity (amplitude, Fig.5B; Fig.S5). This relation is strongest when considering the phase adjustments rather than the phase of speech of the VP per se: larger deviations in verbal coordination are accompanied by an increase in HFa. Additionally, we tested for potential effects of different asynchronies (i.e., temporal delay) between the participant's speech and that of the virtual partner but found no significant differences (Fig.S6). Although the absence of a delay effect does not allow us to draw conclusions about the sensitivity of BA44 to the absolute timing of the partner’s speech, its neural dynamics are linked to the ongoing process of resolving phase deviations and maintaining synchrony.”
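      For readers unfamiliar with phase-amplitude coupling, the idea of relating a slow behavioural phase to a fast neural amplitude can be sketched with the mean vector length, one common PAC estimator. Whether the authors used this estimator is not stated here, so the sketch below is an assumption for illustration only; `mean_vector_length` is a hypothetical helper, and the raw value scales with amplitude, so in practice it is usually compared against surrogate data.

```python
import cmath
import math


def mean_vector_length(phases, amplitudes):
    """Mean vector length: magnitude of the amplitude-weighted average phase vector.

    Near zero when amplitude is unrelated to phase; larger when high amplitudes
    cluster around a preferred phase.
    """
    total = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes))
    return abs(total) / len(phases)


# Toy data: one full cycle of a slow phase signal, sampled uniformly.
phases = [2 * math.pi * i / 1000 for i in range(1000)]
flat = [1.0] * len(phases)                       # amplitude independent of phase
modulated = [1.0 + math.cos(p) for p in phases]  # amplitude peaks at phase 0
```

      In this toy case the flat amplitude yields essentially no coupling, whereas the modulated amplitude yields a clearly non-zero value.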

      (2) In the results section, there's a general lack of quantification. While some of the statistics reported in the figures are helpful, there are also claims that are stated without any statistical test. For example, in the paragraph starting on line 342, it is claimed that there is an inverse relationship between rho-value and frequency band, "possibly due to the reversed desynchronization/synchronization process in low and high frequency bands". Based on Figure 3, the first part of this statement appears to be true qualitatively, but is not quantified, and is therefore impossible to assess in relation to the second part of the claim. Similarly, the next paragraph on line 348 describes optimal clustering, but statistics of the clustering algorithm and silhouette metric are not provided. More importantly, it's not entirely clear what is being clustered - is the point to identify activity patterns that are similar within/across brain regions? Or to interpret the meaning of the specific patterns? If the latter, this is not explained or explored in the paper.

      The reviewer is right. We have now added statistical analyses showing that:

      (1) the ratio between synchronization and desynchronization evolves across frequencies (as often reported in the literature).

      (2) the sign of rho values also evolves across frequencies.

      (3) the clustering does indeed differ when taking into account behaviour. We have also clarified the use of clustering and the reasoning behind it.

      We have updated the Materials and methods section as follows:

      “The statistical difference between spatial clustering in global effect and brain-behaviour correlation was estimated with a linear model using the R function lm (stats package); post-hoc comparisons were corrected for multiple comparisons using the Tukey test (lsmeans R package; Lenth, 2016). The statistical difference between clustering in global effect and behaviour correlation across the number of clusters was estimated using permutation tests (N=1000) by computing the silhouette score difference between the two conditions.”

      We have updated the Results section as follows:

      (1) “This modulation between synchronization and desynchronization across frequencies was significant (F(5) = 6.42, p < .001; estimated with a linear model using the R function lm).”

      (2) “The first observation is a gradual transition in the direction of correlations as we move up frequency bands, from positive correlations at low frequencies to negative ones at high frequencies (F(5) = 2.68, p = .02). This effect, present in both hemispheres, mimics the reversed desynchronization/synchronization process in low and high frequency bands reported above.”

      (3) “Importantly, compared to the global activity (task vs rest, Fig 3A), the neural spatial profile of the behaviour-related activity (Fig 3B) is more clustered, in the left hemisphere. Indeed, silhouette scores are systematically higher for behaviour-related activity compared to global activity, indicating greater clustering consistency across frequency bands (t(106) = 7.79, p < .001, see Figure S3). Moreover, silhouette scores are maximal, in particular for HFa, for five clusters (p < .001), located in the IFG BA44, the IPL BA 40 and the STG BA 41/42 and BA22 (see Figure S3).”
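      The permutation test on silhouette-score differences mentioned in the updated Methods can be sketched as follows. This is a generic label-shuffling test on the difference of means, written under the assumption that each condition contributes a list of silhouette scores; the authors' exact permutation scheme is not described in the response, and `permutation_test` is a hypothetical name.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference of means between two groups.

    Returns the observed difference (mean(a) - mean(b)) and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the condition labels by shuffling the pooled scores.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

      With 1000 permutations and the add-one correction, the smallest p-value this sketch can return is 1/1001, which is consistent with reporting p < .001 only when no shuffled difference reaches the observed one.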

      (3) Given the design of the stimuli, it would be useful to know more about how coordination relates to specific speech units. The authors focus on the syllabic level, which is understandable. But as far as the results relate to speech planning (an explicit point in the paper), the claims could be strengthened by determining whether the coordination signal (whether error correction or otherwise) is specifically timed to e.g., the consonant vs the vowel. If the mechanism is a phase reset, does it tend to occur on one part of the syllable?

      Thank you for this thoughtful feedback. We agree that the relationship between speech coordination and specific speech units, such as consonants versus vowels, is an intriguing question. However, in our study, both interlocutors (the participant and the virtual partner) are adapting their speech production in real-time. This interactive coordination makes it difficult to isolate neural signatures corresponding to precise segments like consonants or vowels, as the adjustments occur in a continuous and dynamic context.

      The VP's ability to adapt depends on its sensitivity to spectral cues, such as the transition from one phonetic element to another. This is likely influenced by the type of articulation, with certain transitions being more salient (e.g., between a stop consonant like "p" and a vowel like "a") and others being less distinct (e.g., between nasal consonants like "m" and a vowel). Thus, the VP’s spectral adaptation tends to occur at these transitions, which are more prominent in some cases than in others.

      For the participants, previous studies have shown a greater sensitivity during the production of stressed vowels (Oschkinat & Hoole, 2022; Li & Lancia, 2024), which may reflect a heightened attentional or motor adjustment to stressed syllables.

      Here, we did not specifically address the question of coordination at the level of individual linguistic units. Moreover, even if we attempted to focus on this level, it would be challenging to relate neural dynamics directly to specific speech segments. The question of how synchronization at the level of individual linguistic units might relate to neural data is complex. The lack of clear, unit-specific predictions makes it difficult to parse out distinct neural signatures tied to individual segments, particularly when both interlocutors are continuously adjusting their speech in relation to one another.

      Therefore, while we recognize the potential importance of examining synchronization at the level of individual phonetic elements, the design of our task and the nature of the coordination in this interactive context (real-time bidirectional adaptation) led us to focus more broadly on the overall dynamics of speech synchronization at the syllabic level, rather than on specific linguistic units.

      We now state at the end of the Discussion section:

      “It is worth noting that the influence of specific speech units, such as consonants versus vowels, on speech coordination remains to be explored. In non-interactive contexts, participants show greater sensitivity during the production of stressed vowels, possibly reflecting heightened attentional or motor adjustments (Oschkinat & Hoole, 2022; Li & Lancia, 2024). In this study, the VP’s adaptation relies on sensitivity to spectral cues, particularly phonetic transitions, with some (e.g., formant transitions) being more salient than others. However, how these effects manifest in an interactive setting remains an open question, as both interlocutors continuously adjust their speech in real time. Future studies could investigate whether coordination signals, such as phase resets, preferentially align with specific parts of the syllable.”

      References cited:

      – Oschkinat, M., & Hoole, P. (2022). Reactive feedback control and adaptation to perturbed speech timing in stressed and unstressed syllables. Journal of Phonetics, 91, 101133.

      – Li, J., & Lancia, L. (2024). A multimodal approach to study the nature of coordinative patterns underlying speech rhythm. In Proc. Interspeech, 397-401.

      (4) In the discussion the results are related to a previously-described speech-induced suppression effect. However, it's not clear what the current results have to do with SIS, since the speaker's own voice is present and predictable from the forward model on every trial. Statements such as "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice" are highly speculative and apparently not supported by the data.

      We thank the reviewer for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised Discussion section, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context". Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised Discussion also incorporates findings by Ozker et al. (2022, 2024), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of behavioural synchrony increases. This result is reminiscent of findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally-generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2024, 2022), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection. In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020). Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      References cited:

      – Franken, M. K., Hartsuiker, R. J., Johansson, P., Hall, L., & Lind, A. (2021). Speaking With an Alien Voice: Flexible Sense of Agency During Vocal Production. Journal of Experimental Psychology-Human perception and performance, 47(4), 479-494. https://doi.org/10.1037/xhp0000799

      – Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in human neuroscience, 5, 82.

      – Lind, A., Hall, L., Breidegard, B., Balkenius, C., & Johansson, P. (2014). Speakers' acceptance of real-time speech exchange indicates that we use auditory feedback to specify the meaning of what we say. Psychological Science, 25(6), 1198-1205. https://doi.org/10.1177/0956797614529797

      – Meekings, S., & Scott, S. K. (2021). Error in the Superior Temporal Gyrus? A Systematic Review and Activation Likelihood Estimation Meta-Analysis of Speech Production Studies. Journal of Cognitive Neuroscience, 33(3), 422-444. https://doi.org/10.1162/jocn_a_01661

      – Niziolek, C. A., Nagarajan, S. S., & Houde, J. F. (2013). What does motor efference copy represent? Evidence from speech production. Journal of Neuroscience, 33, 16110–16116.

      – Ozker, M., Doyle, W., Devinsky, O., & Flinker, A. (2022). A cortical network processes auditory error signals during human speech production to maintain fluency. PLoS Biology, 20.

      – Ozker, M., Yu, L., Dugan, P., Doyle, W., Friedman, D., Devinsky, O., & Flinker, A. (2024). Speech-induced suppression and vocal feedback sensitivity in human cortex. eLife, 13, RP94198. https://doi.org/10.7554/eLife.94198

      – Zheng, Z. Z., MacDonald, E. N., Munhall, K. G., & Johnsrude, I. S. (2011). Perceiving a Stranger's Voice as Being One's Own: A 'Rubber Voice' Illusion? PLOS ONE, 6(4), e18655.

      (5) There are some seemingly arbitrary decisions made in the design and analysis that, while likely justified, need to be explained. For example, how were the cutoffs for moderate coupling vs phase-shifted coupling (k ~0.09) determined? This is noted as "rather weak" (line 212), but it's not clear where this comes from. Similarly, the ROI-based analyses are only done on regions "recorded in at least 7 patients" - how was this number chosen? How many electrodes total does this correspond to? Is there heterogeneity within each ROI?

      The reviewer is correct; we apologize for this missing information. We now specify that the coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, while keeping the phase-shifted coupling at a rather implicit level.

      Concerning the definition of coupling as weak, one should consider that, in the Kuramoto model, the strength of coupling (k) is relative to the spread of the natural frequencies (Δω) in the system. In our study, the natural frequencies of syllables range approximately from 2 Hz to 10 Hz, resulting in a frequency spread of Δω = 8 Hz. For coupling to strongly synchronize oscillators across such a wide range, k must be comparable to or exceed Δω. Since k = 0.1 is far smaller than Δω, it is classified as weak coupling.
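      The argument that coupling strength must be judged against the frequency spread can be illustrated with the textbook case of two symmetrically coupled Kuramoto oscillators, whose phase difference φ obeys dφ/dt = Δω − 2k sin φ and can only lock when |Δω| ≤ 2k. The sketch below is a simplification and not the virtual partner's actual model; it assumes that k and Δω are expressed in the same units, as the comparison in the response does, and `phase_drift` is a hypothetical helper using simple Euler integration.

```python
import math


def phase_drift(delta_omega, k, duration=50.0, dt=0.001):
    """Integrate dphi/dt = delta_omega - 2*k*sin(phi) from phi = 0.

    Returns the accumulated phase difference after `duration` time units.
    A bounded value indicates phase locking; a steadily growing one indicates drift.
    """
    phi = 0.0
    for _ in range(int(duration / dt)):
        phi += (delta_omega - 2.0 * k * math.sin(phi)) * dt
    return phi


locked = phase_drift(delta_omega=0.1, k=0.1)    # spread within 2k: settles at a fixed offset
drifting = phase_drift(delta_omega=8.0, k=0.1)  # spread far above 2k: phases keep slipping
```

      In the first case the phase difference settles near the fixed point where sin φ = Δω / (2k); in the second the oscillators never lock, which is the sense in which k = 0.1 is weak relative to a spread of 8.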

      We have now modified the Materials and methods section as follows:

      “More precisely, for a third of the trials the VP had a neutral behaviour (close to zero coupling: k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = -0.09). And for the last third of the trials the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”

      Regarding the criterion of including regions recorded in at least 7 patients, our goal was to balance data completeness with statistical power. Given our total sample of 16 patients, this threshold ensures that each included region is represented in at least ~44% of the cohort, reducing the likelihood of spurious findings due to extremely small sample sizes. This choice also aligns with common neurophysiological analysis practices, where a minimum number of subjects (at least 2 in extreme cases) is required to achieve meaningful interindividual comparisons while avoiding excessive data exclusion. Additionally, this threshold maintains a reasonable tradeoff between maximizing patient inclusion and ensuring that statistical tests remain robust.

      We have now added more information in the Results section “Spectral profiles in the language network are nuanced by behaviour” on this point as follows:

      “To balance data completeness and statistical power, we included only brain regions recorded in at least 7 patients (~44% of the cohort) for the left hemisphere and at least 5 patients for the right hemisphere (~31% of the cohort), ensuring sufficient representation while minimizing biases due to sparse data.”
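
      The inclusion rule quoted above can be sketched as a simple filter. The data layout (one `(patient_id, region)` pair per contact), the function name and the toy data are hypothetical and serve only to show the counting logic and the percentage arithmetic.

```python
from collections import defaultdict

def regions_meeting_threshold(recordings, min_patients):
    """Return regions recorded in at least `min_patients` distinct patients.

    `recordings` is an iterable of (patient_id, region) pairs, one per contact.
    """
    patients_per_region = defaultdict(set)
    for patient_id, region in recordings:
        patients_per_region[region].add(patient_id)
    return sorted(
        region
        for region, patients in patients_per_region.items()
        if len(patients) >= min_patients
    )

# Toy data (not from the study): STG in 8 patients, IFG in 7, MFG in 3.
recordings = (
    [(p, "STG") for p in range(8)]
    + [(p, "IFG") for p in range(7)]
    + [(p, "MFG") for p in range(3)]
)
kept_left = regions_meeting_threshold(recordings, min_patients=7)

# Share of a 16-patient cohort implied by each threshold.
left_share = round(100 * 7 / 16)   # 43.75, reported as ~44%
right_share = round(100 * 5 / 16)  # 31.25, reported as ~31%
```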

      Reviewer #2 (Public Review):

      Summary:

      This paper investigates the neural underpinnings of an interactive speech task requiring verbal coordination with another speaker. To achieve this, the authors recorded intracranial brain activity from the left hemisphere in a group of drug-resistant epilepsy patients while they synchronised their speech with a 'virtual partner'. Crucially, the authors were able to manipulate the degree of success of this synchronisation by programming the virtual partner to either actively synchronise or desynchronise their speech with the participant, or else to not vary its speech in response to the participant (making the synchronisation task purely one-way). Using such a paradigm, the authors identified different brain regions that were either more sensitive to the speech of the virtual partner (primary auditory cortex), or more sensitive to the degree of verbal coordination (i.e. synchronisation success) with the virtual partner (secondary auditory cortex and IFG). Such sensitivity was measured by (1) calculating the correlation between the index of verbal coordination and mean power within a range of frequency bands across trials, and (2) calculating the phase-amplitude coupling between the behavioural and brain signals within single trials (using the power of high-frequency neural activity only). Overall, the findings help to elucidate some of the left hemisphere brain areas involved in interactive speaking behaviours, particularly highlighting the high-frequency activity of the IFG as a potential candidate supporting verbal coordination.

      Strengths:

      This study provides the field with a convincing demonstration of how to investigate speaking behaviours in more complex situations that share many features with real-world speaking contexts e.g. simultaneous engagement of speech perception and production processes, the presence of an interlocutor, and the need for inter-speaker coordination. The findings thus go beyond previous work that has typically studied solo speech production in isolation, and represent a significant advance in our understanding of speech as a social and communicative behaviour. It is further an impressive feat to develop a paradigm in which the degree of cooperativity of the synchronisation partner can be so tightly controlled; in this way, this study combines the benefits of using pre-recorded stimuli (namely, the high degree of experimental control) with the benefits of using a live synchronisation partner (allowing the task to be truly two-way interactive, an important criticism of other work using pre-recorded stimuli). A further key strength of the study lies in its employment of stereotactic EEG to measure brain responses with both high temporal and spatial resolution, an ideal method for studying the unfolding relationship between neural processing and this dynamic coordination behaviour.

      We sincerely appreciate the Reviewer's thoughtful and positive feedback on our manuscript.

      Weaknesses:

      One major limitation of the current study is the lack of coverage of the right hemisphere by the implanted electrodes. Of course, electrode location is solely clinically motivated, and so the authors did not have control over this. However, this means that the current study neglects the potentially important role of the right hemisphere in this task. The right hemisphere has previously been proposed to support feedback control for speech (likely a core process engaged by synchronous speech), as opposed to the left hemisphere which has been argued to underlie feedforward control (Tourville & Guenther, 2011). Indeed, a previous fMRI study of synchronous speech reported the engagement of a network of right hemisphere regions, including STG, IPL, IFG, and the temporal pole (Jasmin et al., 2016). Further, the release from speech-induced suppression during a synchronous speech reported by Jasmin et al. was found in the right temporal pole, which may explain the discrepancy with the current finding of reduced leftward high-frequency activity with increasing verbal coordination (suggesting instead increased speech-induced suppression for successful synchronisation). The findings should therefore be interpreted with the caveat that they are limited to the left hemisphere, and are thus likely missing an important aspect of the neural processing underpinning verbal coordination behaviour.

      We have now included, in the supplementary materials, data from the right hemisphere, although the coverage is a bit sparse (Figures S2, S4, S5, see our responses in the ‘Recommendation for the authors’ section, below). We have also revised the Discussion section to add the putative role of right temporal regions (see below as well).

      A further limitation of this study is that its findings are purely correlational in nature; that is, the results tell us how neural activity correlates with behaviour, but not whether it is instrumental in that behaviour. Elucidating the latter would require some form of intervention such as electrode stimulation, to disrupt activity in a brain area and measure the resulting effect on behaviour. Any claims therefore as to the specific role of brain areas in verbal coordination (e.g. the role of the IFG in supporting online coordinative adjustments to achieve synchronisation) are therefore speculative.

      We appreciate the reviewer’s observation regarding the correlational nature of our findings and agree that this is a common limitation of neuroimaging studies. While elucidating causal relationships would indeed require intervention techniques such as electrical stimulation, our study leverages the unique advantages of intracerebral recordings, offering the best available spatial and temporal resolution alongside a high signal-to-noise ratio. These attributes ensure that our data accurately reflect neural activity and its temporal dynamics, providing a robust foundation for understanding the relationship between neural processes and behaviour. Therefore, while causal claims are beyond the scope of this study, the precision of our methodology allows us to make well-supported observations about the neural correlates of synchronous speech tasks.

      Recommendations for the authors:

      Reviewing Editor Comment:

      After joint consultation, we are seeing the potential for the report to be strengthened and the evidence here to be deemed ultimately at least 'solid': to us (editors and reviewers) it seems that this would require both (1) clarifying/acknowledging the limitations of not having right hemisphere data, and (2) running some of the additional analyses the reviewers suggest, which should allow for richer examination of the data e.g. phase relationships in areas that correlate with synchronisation.

      We have now added data on the right hemisphere (RH) that we did not previously report due to a rather sparse sampling of the RH. These results are now reported in the Results section as well as in the Supplementary section, where we put all right hemisphere figures for all analyses (Figure S2, S4, S5). We have also run additional analyses digging into the phase relationship in areas that correlate with synchronisation (Figure S6). These additional analyses allowed us to improve the Discussion section as well.

      Reviewer #1 (Recommendations For The Authors):

      In some sections, the writing is a bit unclear, with both typos and vague statements that could be fixed with careful proofreading.

      We thank the reviewer for pointing out areas where the writing could be improved. We carefully proofread the manuscript to address typos and clarify any vague statements. Specific sections identified as unclear have been rephrased for better precision and readability.

      In Figure 1, the colors repeat, making it impossible to tell patients apart.

      We have now updated Figure 1 colormap to avoid redundancy and added the right hemisphere.

      Line 132: "16 unilateral implantations (9 left, 7 bilateral implantations)". Should this say 7 right hemisphere? If so, the following sentence stating that there was "insufficient cover [sic] of the right hemisphere" is unclear, since the number of patients between LH and RH is similar.

      The confusion was due to the fact that the lateralization refers to the presence/absence of electrodes in the Heschl’s gyrus (left : H’ ; right : H) exclusively.

      We have thus changed this section as follows:

      “16 patients (7 women, mean age 29.8 y, range 17 - 50 y) with pharmacoresistant epilepsy took part in the study. They were included if their implantation map covered at least partially the Heschl's gyrus and had sufficiently intact diction to support relatively sustained language production.” The relevant part (previously line 132) now states:

      “Sixteen patients with a total of 236 electrodes (145 in the left hemisphere) and 2395 contacts (1459 in the left hemisphere, see Figure 1). While this gives a rather sparse coverage of the right hemisphere, we decided, due to the rarity of this type of data, to report results for both hemispheres, with figures for the left hemisphere in the main text and figures for the right hemisphere in the supplementary section.”

      Reviewer #2 (Recommendations For The Authors):

      (1) To address the concern regarding the absence of data from the right hemisphere, I would advise the authors to directly acknowledge this limitation in their Discussion section, citing relevant work suggesting that the right hemisphere has an important role to play in this task (e.g. Jasmin et al., 2016). You should also make this clear in your abstract e.g. you could rewrite the sentence in line 40 to be: "Then, we recorded the intracranial brain activity of the left hemisphere in 16 patients with drug-resistant epilepsy...".

      We are grateful to the reviewer for this comment that incited us to look into the right hemisphere data. We have now included results in the right hemisphere, although the coverage is a bit sparse. We have also revised the Discussion section to add the putative role of right temporal regions. Interestingly, our results show, as suggested by the reviewer, a clear involvement of the RH in this task.

      First, the full brain analyses show a very similar implication of the RH as compared to the LH (see Figure below). We have now added in the Results section:

      “As expected, the whole language network is strongly involved, including both dorsal and ventral pathways (Fig 3A). More precisely, in the left temporal lobe the superior, middle and inferior temporal gyri, in the left parietal lobe the inferior parietal lobule (IPL) and in the left frontal lobe the inferior frontal gyrus (IFG) and the middle frontal gyrus (MFG). Similar results are observed in the right hemisphere, neural responses being present across all six frequency bands with medium to large modulation in activity compared to baseline (Figure S2A) in the same regions. Desynchronizations are present in the theta, alpha and beta bands while the low gamma and HFa bands show power increases.”

      Compared to the left hemisphere, assessing brain-behaviour correlations in the right hemisphere does not provide the same statistical power, because some anatomical regions have very few electrodes. Nonetheless, we observe a strong correlation in the right IFG, similar to the one we previously reported in the left hemisphere, and we now report in the Results section:

      “The decrease in HFa along the dorsal pathway is replicated in the right hemisphere (Figure S4). However, while both the right STG BA41/42 and STG BA22 present a power increase (compared to baseline), with a stronger increase for the STG BA41/42, neither shows a significant correlation with verbal coordination (t(45)=-1.65, p=.1 ; t(8)=-0.67, p=.5 ; Student’s T test, FDR correction). By contrast, results in the right IFG BA44 are similar to the one observed in the left hemisphere with a significant power increase associated with a negative brain-behaviour correlation (t(17) = -3.11, p = .01 ; Student’s T test, FDR correction).”
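
      The statistics above are reported with "FDR correction", but the text does not name the procedure. As a hedged illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for three regions of interest.
raw = [0.01, 0.04, 0.03]
adjusted = benjamini_hochberg(raw)
```

      Each adjusted value is compared against the chosen false discovery rate (for example 0.05) to decide which regions remain significant.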

      Interestingly, the phase-amplitude coupling analysis yields very similar results in both hemispheres (exception made for BA22). We have thus updated the Results section as follows:

      “Notably, when comparing – within the regions of interest previously described – the PAC with the virtual partner speech and the PAC with the phase difference, the coupling relationship changes when moving along the dorsal pathway: a stronger coupling in the auditory regions with the speech input, no difference between speech and coordination dynamics in the IPL and a stronger coupling for the coordinative dynamics compared to speech signal in the IFG (Figure 5B ). When looking at the right hemisphere, we observe the same changes in the coupling relationship when moving along the dorsal pathway, except that no difference between speech and coordination dynamics is present in the right secondary auditory regions (STG BA22; Figure S5).”
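
      For readers unfamiliar with phase-amplitude coupling, the sketch below shows one common estimator, the mean vector length, applied to a behavioural phase series and a neural amplitude series. The text does not state which PAC estimator the authors used, so this choice, the function name and the synthetic signals are assumptions for illustration.

```python
import cmath
import math

def mean_vector_length(phases, amplitudes):
    """Mean vector length: |mean(amplitude * exp(i * phase))|.

    `phases` are in radians (e.g. the phase of a behavioural signal) and
    `amplitudes` are non-negative (e.g. the high-frequency power envelope).
    """
    total = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes))
    return abs(total) / len(phases)

# Synthetic example: phases evenly covering one cycle.
n = 1000
phases = [2 * math.pi * i / n for i in range(n)]

# Amplitude that peaks at phase 0 (coupled) versus a flat amplitude (uncoupled).
coupled = mean_vector_length(phases, [1 + math.cos(p) for p in phases])
uncoupled = mean_vector_length(phases, [1.0] * n)
```

      The raw value scales with amplitude, so in practice it is usually normalised or compared against surrogate data before regions or conditions are contrasted.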

      We also included in the Discussion section the right hemisphere results also mentioning previous work of Guenther and the one of Jasmin. On the section “Left secondary auditory regions are more sensitive to coordinative behaviour” one can read:

      “Furthermore, the absence of correlation in the right STG BA22 (Figure S4) seems at first glance to challenge influential speech production models (e.g. Guenther & Hickok, 2016) that propose that the right hemisphere is involved in feedback control. However, one needs to consider that the task at stake heavily relied upon temporal mismatches and adjustments. In this context, the left-lateralized sensitivity to verbal coordination is reminiscent of the works of Floegel and colleagues (2020, 2023) suggesting that both hemispheres are involved depending on the type of error: the right auditory association cortex monitoring preferentially spectral speech features and the left auditory association cortex monitoring preferentially temporal speech features. Nonetheless, the right temporal pole seems to be sensitive to speech coordinative behaviour, confirming previous findings using fMRI (Jasmin et al., 2016) and thus showing that the right hemisphere has an important role to play in this type of tasks (e.g. Jasmin et al., 2016).”

      References cited:

      – Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      – Floegel, M., Kasper, J., Perrier, P., & Kell, C. A. (2023). How the conception of control influences our understanding of actions. Nature Reviews Neuroscience, 24(5), 313-329.

      – Guenther, F. H., & Hickok, G. (2016). Neural models of motor speech control. In Neurobiology of language (pp. 725-740). Academic Press.

      (2) When discussing previous work on alignment during synchronous speech, you may wish to include a recently published paper by Bradshaw et al (2024); this manipulated the acoustics of the accompanist's voice during a synchronous speech task to show interactions between speech motor adaptation and phonetic convergence/alignment.

      We thank the reviewer for pointing to this recent and interesting paper. We added the article as a reference as follows:

      “Furthermore, synchronous speech favors the emergence of alignment phenomena, for instance of the fundamental frequency or the syllable onset (Assaneo et al., 2019 ; Bradshaw & McGettigan, 2021 ; Bradshaw et al., 2023; Bradshaw et al., 2024).”

      (3) Line 80: "Synchronous speech resembles to a certain extent to delayed auditory feedback tasks"- I think you mean "altered auditory feedback tasks" here.

      In the case of synchronous speech it is more about timing than altered speech signals, that is why the comparison is done with delayed and not altered auditory feedback. Nonetheless, we understand the Reviewer’s point and we have now changed the sentence as follows:

      “Synchronous speech resembles to a certain extent to delayed/altered auditory feedback tasks”

      (4) When discussing superior temporal responses during such altered feedback tasks, you may also want to cite a review paper by Meekings and Scott (2021).

      We thank the reviewer for this suggestion, indeed this was a big oversight!

      The paper is now quoted in the introduction as follows:

      “Previous studies have revealed increased responses in the superior temporal regions compared to normal feedback conditions (Hirano et al., 1997 ; Hashimoto & Sakai, 2003 ; Takaso et al., 2010 ; Ozker et al., 2022 ; Floegel et al., 2020 ; see Meekings & Scott, 2021 for a review of error-monitoring and feedback control in the STG during speech production).”

      Furthermore, we updated the discussion part concerning the speaker-induced suppression phenomenon (see below our response to the point 10).

      (5) Line 125: "The parameters and sound adjustment were set using an external low-latency sound card (RME Babyface Pro Fs)". Can you please report the total feedback loop latency in your set-up? Or at the least cite the following paper which reports low latencies with this audio device.

      Kim, K. S., Wang, H., & Max, L. (2020). It's About Time: Minimizing Hardware and Software Latencies in Speech Research With Real-Time Auditory Feedback. Journal of Speech, Language, and Hearing Research, 63(8), 2522-2534. https://doi.org/10.1044/2020_JSLHR-19-00419

      We now report the total feedback loop latency (~5ms) and also cite the relevant paper (Kim et al., 2020).

      (6) Line 127 "A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli." What do you mean here by an 'optimal balance'? Was the participant's own voice always louder than the VP stimuli? Can you report roughly what you consider to be a comfortable volume in dB?

      This point was indeed unclear. We have now changed the text as follows:

      “A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli. The aim of this procedure was that the patient would subjectively perceive their voice and the VP-voice in equal measure. VP voice was delivered at approximately 70dB.”

      (7) Relatedly, did you use any noise masking to mask the air-conducted feedback from their own voice (which would have been slightly out of phase with the feedback through the headphones, depending on your latency)?

      Considering the low-latency condition allowed with the sound card (RME Babyface Pro Fs), we did not use noise masking to mask the air-conducted feedback from the self-voice of the patients.

      (8) Line 141: "four short sentences were pre-recorded by a woman and a man." Did all participants synchronise with both the man and woman or was the VP gender matched to that of the participant/patient?

      We thank the reviewer for pointing out this important missing detail. We have now changed the text as follows:

      “Four stimuli corresponding to four short sentences were pre-recorded by both a female and a male speaker. This allowed to adapt to the natural gender differences in fundamental frequency (i.e. so that the VP gender matched that of the patients). All stimuli were normalised in amplitude.”

      (9) Can you clarify what instructions participants were given regarding the VP? That is, were they told that this was a recording or a real live speaker? Were they naïve to the manipulation of the VP's coupling to the participant?

      We have now added this information to the task description as follows:

      “Participants, comfortably seated in a medical chair, were instructed that they would perform a real-time interactive synchronous speech task with an artificial agent (Virtual Partner, henceforth VP, see next section) that can modulate and adapt to the participant’s speech in real time.”

      “The third step was the actual experiment. This was identical to the training but consisted of 24 trials (14s long, speech rate ~3Hz, yielding ~1000 syllables). Importantly, the VP varied its coupling behaviour to the participant. More precisely, for a third of the sequences the VP had a neutral behaviour (close to zero coupling : k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = - 0.09). And for the last third of the sequences the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”  

      (10) The paragraph from line 438 entitled "Secondary auditory regions are more sensitive to coordinative behaviour" includes an interesting discussion of the relation of the current findings to the phenomenon of speech-induced suppression (SIS). However, the authors appear to equate the observed decrease in high-frequency activity as speech coordination increases with the phenomenon of SIS (in lines 456-457), which is quite a speculative leap. I would encourage the authors to temper this discussion by referring to SIS as a potentially related phenomenon, with a need for more experimental work to determine if this is indeed the same phenomenon as the decreases in high-frequency power observed here. I believe that the authors are arguing here for an interpretation of SIS as reflecting internal modelling of sensory input regardless of whether this is self-generated or other-generated; if this is indeed the case, I would ask the authors to be more explicit here that these ideas are not a standard part of the traditional account of SIS, which only includes internal modelling of self-produced sensory feedback.

      As stated in the public review, we thank both reviewers for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised discussion, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context." Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised discussion also incorporates findings by Ozker et al. (2024, 2022), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of synchrony increases. This result aligns with findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2024, 2022), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection.

      In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020).”

      (11) Within this section, you also speculate in line 460 that "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice." I would recommend citing studies on the 'rubber voice' effect to back up this claim (e.g. Franken et al., 2021; Lind et al., 2014; Zheng et al., 2011).

      We are grateful to the Reviewer for this interesting suggestion. Directly following the previous comment, the section now states:

      “Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      (12) As noted in my public review, since your methods are correlational, you need to be careful about inferring the causal role of any brain areas in supporting a specific aspect of functioning e.g. line 501-504: "By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that reflects the amount of error in the internal computation to reach optimal coordination, which indicates that this region optimises the predictive and coordinative behaviour required by the task." I would argue that the latter part of this sentence is a conclusion that, although consistent with, goes beyond the current data in this study, and thus needs tempering.

      We agree with the Reviewer and changed the sentence as follows:

      “By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that could possibly reflect the amount of error in the internal computation to reach optimal coordination. This indicates that this region could have an implication in the optimisation of the predictive and coordinative behaviour required by the task.”

    1. eLife Assessment

      This study reveals the important role of upstream open reading frames (uORFs) in limiting the translational variability of downstream coding sequences. Through a combination of computational simulations, comparative analyses of translation efficiency across different developmental stages in two closely related Drosophila species, and manipulative, experimental validation of translation buffering by an uORF for a gene, the authors provide convincing evidence supporting their conclusions. This work will be of broad interest to molecular biologists and geneticists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out to explore the role of upstream open reading frames (uORFs) in stabilizing protein levels during Drosophila development and evolution. By utilizing a modified ICIER model for ribosome translation simulations and conducting experimental validations in Drosophila species, the study investigates how uORFs buffer translational variability of downstream coding sequences. The findings reveal that uORFs significantly reduce translational variability, which contributes to gene expression stability across different biological contexts and evolutionary timeframes.

      Strengths:

      (1) The study introduces a sophisticated adaptation of the ICIER model, enabling detailed simulation of ribosomal traffic and its implications for translation efficiency.

      (2) The integration of computational predictions with empirical data through knockout experiments and translatome analysis in Drosophila provides a compelling validation of the model's predictions.

      (3) By demonstrating the evolutionary conservation of uORFs' buffering effects, the study provides insights that are likely applicable to a wide range of eukaryotes.

      Weaknesses:

      (1) Although the study is technically sound, it does not clearly articulate the mechanisms through which uORFs buffer translational variability. A clearer hypothesis detailing the potential molecular interactions or regulatory pathways by which uORFs influence translational stability would enhance the comprehension and impact of the findings.

      (2) The study could be further improved by a discussion regarding the evolutionary selection of uORFs. Specifically, it would be beneficial to explore whether uORFs are favored evolutionarily primarily for their role in reducing translation efficiency or for their capability to stabilize translation variability. Such a discussion would provide deeper insights into the evolutionary dynamics and functional significance of uORFs in genetic regulation.

      Comments on revisions:

      The authors have adequately addressed my previous concerns.

    3. Reviewer #2 (Public review):

      uORFs, short open reading frames located in the 5' UTR, are pervasive in genomes. However, their roles in maintaining protein abundance are not clear. In this study, the authors propose that uORFs act as a "molecular dam", limiting the fluctuation of the translation of downstream coding sequences. First, they performed in silico simulations using an improved ICIER model, and demonstrated that uORF translation reduces CDS translational variability, with buffering capacity increasing in proportion to uORF efficiency, length, and number. Next, they analysed the translatome between two related Drosophila species, revealing that genes with uORFs exhibit smaller fluctuations in translation between the two species and across different developmental stages within the same species. Moreover, they identified that bicoid, a critical gene for Drosophila development, contains a uORF with substantial changes in translation efficiency. Deleting this uORF in Drosophila melanogaster significantly affected its gene expression, hatching rates, and survival under stress conditions. Lastly, by leveraging public Ribo-seq data, the authors showed that the buffering effect of uORFs is also evident between primates and within human populations. Collectively, the study significantly advances our understanding of how uORFs regulate the translation of downstream coding sequences at the genome-wide scale, as well as during development and evolution. It would be particularly interesting to explore whether similar buffering functions are conserved in other organisms, and whether their regulatory effects could be harnessed for practical applications, such as improving crop traits or benefiting human health.

      Comments on revisions:

      The authors have fully addressed all of my concerns, and the revisions have substantially improved the manuscript. I have no further comments.

    1. eLife Assessment

      These are valuable findings for those interested in how neural signals reflect auditory speech streams, and in understanding the roles of prediction, attention, and eye movements in this tracking. However, the evidence as it stands is incomplete. Further analyses are needed to clarify how the observed results relate to the relevant theoretical claims.

    2. Reviewer #1 (Public review):

      Summary:

      This study aimed to replicate two previous findings that showed (1) a link between prediction tendencies and neural speech tracking, and (2) that eye movements track speech. The main findings were replicated, which supports the robustness of these results. The authors also investigated interactions between prediction tendencies and ocular speech tracking, but the data did not reveal clear relationships. The authors propose a framework that integrates the findings of the study and describes how eye movements and prediction tendencies shape perception.

      Strengths:

      This is a well-written paper that addresses interesting research questions, bringing together two subfields that are usually studied in separation: auditory speech and eye movements. The authors aimed at replicating findings from two of their previous studies, which was overall successful and speaks for the robustness of the findings. The overall approach is convincing, methods and analyses appear to be thorough, and results are compelling.

      Weaknesses:

      Eye movement behavior could have been presented in more detail, and the authors could have attempted to understand whether there is a particular component in eye movement behavior (e.g., blinks, microsaccades) that drives the observed effects.

    3. Reviewer #2 (Public review):

      Summary

      Schubert et al. recorded MEG and eye tracking activity while participants were listening to stories in single-speaker or multi-speaker speech. In a separate task, MEG was recorded while the same participants were listening to four types of pure tones in either structured (75% predictable) or random (25%) sequences. The MEG data from this task was used to quantify individual 'prediction tendency': the amount by which the neural signal is modulated by whether or not a repeated tone was (un)predictable, given the context. In a replication of earlier work, this prediction tendency was found to correlate with 'neural speech tracking' during the main task. Neural speech tracking is quantified as the multivariate relationship between MEG activity and speech amplitude envelope. Prediction tendency did not correlate with 'ocular speech tracking' during the main task. Neural speech tracking was further modulated by local semantic violations in the speech material and by whether or not a distracting speaker was present. The authors suggest that part of the neural speech tracking is mediated by ocular speech tracking. Story comprehension was negatively related with ocular speech tracking.

      Strengths

      This is an ambitious study, and the authors' attempt to integrate the many reported findings related to prediction and attention in one framework is laudable. The data acquisition and analyses appear to be done with great attention to methodological detail. Furthermore, the experimental paradigm used is more naturalistic than was previously done in similar setups (i.e., stories instead of sentences).

      Weaknesses

      While the analysis pipeline is outlined in much detail, some analysis choices appear ad-hoc and could have been more uniform and/or better motivated (other than: this is what was done before).

    4. Reviewer #3 (Public review):

      I thank the authors for their extensive revision of this paper, and I found some elements greatly improved.

      In particular, the authors do embrace a somewhat more speculative tone in the current version, which I think is fitting for this work, as the data seem (to me) to be not fully conclusive. The data set collected here is clearly valuable and unique (and I would encourage the authors to make it publicly available!), however, my overall impression is that the specific analyses reported here might not fully

      Despite the revised description of methods, results and figures, I still have trouble understanding many of the results and the authors' conclusive interpretation of them. These are my main reservations:

      (1) Regarding "individual prediction tendency" - thank you for adding clarifying methodological details and showing the data in a new Figure (#2). Honestly, however, I still can't say that I fully understand the result. For example, why is there also a significant response in the random condition? And how do you interpret the interesting time-course (with a peak ~200ms prior to the stimulus, and a reduction over time from there)?

      Also (I may have missed this, but..) what neural data was used to train the classifier and derive the "prediction tendency" index? Was it just the broadband neural response? Is there a way to know which sensors contributed to this metric (e.g., are they predominantly auditory? Frontal?)? And is there a way to establish the statistical significance of this metric (e.g., how good the decoder actually was in predicting behavioral sensitivity)? I don't see any statistics in the results section describing the individual prediction tendency.

      (2) Regarding the TRF analysis - Thanks for clarifying the approach used to obtain 2-second long "segments" of speech tracking. This is an interesting approach, though I think quite new(?), and for me it raises a whole new set of questions, as well as a need for additional controls and data that I would have liked to see, to be convinced that the results are significant. I will elaborate:

      - Do I understand correctly that you segment the real and predicted neural response into 2-second long segments and then calculate the Pearson's correlation between them to assess the goodness of the model? This is very unclear, since in the methods section you state only that "the same" analysis was performed as for the full data - but what exactly? Clearly, values will be very different when using such short segments. I feel that additional details are still required (and perhaps data shown) to fully understand the "semantic violation" analysis of TRFs.

      - I would like to reiterate my previous comment regarding the use of permutation tests to verify the validity of the TRF-based measures derived. This would be especially important when using new approaches (such as the segmentation used here). The authors argue that this is not needed since this was not done in their previously published study. However, this sounds a bit like a "two wrongs make a right" argument... why not just do it, and let us know that this 2-second segmentation approach allows estimating reliable speech tracking?

      - Following up on my previous comment on defining "clusters" as at least two neighboring channels (Figure 3) - the fact that this is a default in Fieldtrip is by no means sufficient justification! This seems quite liberal to me, especially given the many comparisons performed. Here too, permutations can help to determine the necessary data-driven threshold for corrections. This is of course critical for interpreting the results shown in Figures 3E&G that are critical "take home messages" of the paper - i.e., that the prediction-index from the first part of the experiment is related to speech tracking in the second part of the experiment. To my eyes, this does not look extremely convincing, but perhaps the authors can show more conclusive data to support this (e.g., scatter plots of the betas across participants?).

      - A similar point can be made for the effect of semantic violations (though here the scalp-level result is somewhat more clustered). The authors point out that the semantic effect is a "replication" of their result reported in Schubert et al. 2023, but if I am not mistaken the results there were somewhat different (as was the manipulation). It would be nice to explicitly discuss the similarity/difference between these effects.

      (3) Regarding the ocular-TRFs -

      - Maybe this is just me, but I believe that effects that are robust should be clearly visible in the data, without the need for fancy "black-box" statistical models. In the case of the ocular TRFs, it is hard for me to see how these time-courses are not just noise (and, again, a permutation test would have helped to convince me..). The inconsistent results for horizontal and vertical eye-movements vis-à-vis the experimental conditions (single vs. multi-speaker conditions) don't help either, despite the authors' argument that these are "independent" - but why should this be the case, especially if there is nothing really to look at in this task?

      - I remain sceptical about the mediation portion of the analysis as well... But perhaps replications from other groups or making the data public will help shed further light on this in the future.
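
      As an illustration of the kind of permutation check requested in points (2) and (3), the following is a minimal sketch and not the authors' pipeline. It assumes the measured and model-predicted responses are available as plain lists of samples, and that a null distribution can be built by shuffling which predicted segment is paired with which measured segment. The function names, the segment length in samples, and the choice of null are all assumptions made for illustration; other nulls (for example, circularly shifting the stimulus) are equally plausible.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_segments(signal, seg_len):
    """Cut a signal into consecutive, non-overlapping segments of seg_len samples."""
    n_seg = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_seg)]


def mean_segment_r(actual_segs, predicted_segs):
    """Average the per-segment correlation between measured and predicted data."""
    rs = [pearson(a, p) for a, p in zip(actual_segs, predicted_segs)]
    return sum(rs) / len(rs)


def permutation_test(actual, predicted, seg_len, n_perm=500, seed=0):
    """Compare the observed mean segment correlation with a shuffled-pairing null."""
    rng = random.Random(seed)
    a_segs = split_segments(actual, seg_len)
    p_segs = split_segments(predicted, seg_len)
    observed = mean_segment_r(a_segs, p_segs)
    exceed = 0
    for _ in range(n_perm):
        shuffled = p_segs[:]
        rng.shuffle(shuffled)  # break the true pairing of segments
        if mean_segment_r(a_segs, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value
```

      If the observed segment-wise correlation does not clearly exceed such a null, the 2-second tracking estimates would be hard to distinguish from noise.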

      Minor

      - Thanks for adding information about the creation of semantic-violation stimuli. Since the violations and lexical-controls were taken from different audio recordings, it would have been nice to verify that differences between neural responses cannot be attributed to differences in articulations (e.g., by comparing their spectro-temporal properties).

    1. eLife Assessment

      The authors' analyses describe a novel mechanism by which a retrotransposon-derived LTR may be involved in genomic imprinting and demonstrate imprinting of the ZDBF2 locus in rabbits and Rhesus macaques using allele-specific expression analysis. This imprinting of the ZDBF2 locus correlates with transcription of GPR1-AS orthologs. The accompanying genomic analysis is very well executed, allowing for the conclusions reached in the manuscript. The revisions made at the request of the reviewers in this important manuscript strengthen the evidence from the genomic analyses, and as a result, the evidence is now convincing and will be informative to the genomics and developmental biology communities.

    2. Reviewer #1 (Public review):

      Summary:

      The study tests the conservation of imprinting of the ZDBF2 locus across mammals. ZDBF2 is known to be imprinted in mouse, human and rat. The locus has a unique mechanism of imprinting: although imprinting is conferred by a germline DMR methylated in oocytes, the DMR is upstream of ZDBF2 (at GPR1) and monoallelic methylation of the gDMR does not persist beyond early developmental stages. Instead, a lncRNA (GPR1-AS, also known as Liz in mouse) initiating at the gDMR is expressed transiently in embryos and sets up a secondary DMR (by mechanisms not fully elucidated) that then confers monoallelic expression of ZDBF2 in somatic tissues.

      In this study, the authors first interrogate existing placental RNA-seq datasets from multiple mammalian species, and detect GPR1-AS1 candidate transcripts in human, baboon, macaque and mouse, but not in about a dozen other mammals. Because of the varying depth, quality and nature of these RNA-seq libraries, the ability to definitively detect the GPR1-AS1 lncRNA is not guaranteed; therefore, they generate their own deep, directional RNA-seq data from tissues/embryos from five species, finding evidence of GPR1-AS in rabbit and chimpanzee, but not in bovine, pig or opossum. From these surveys, the authors conclude that the lncRNA is present only in Euarchontoglires mammals. To test the association between GPR1-AS and ZDBF2 imprinting, they perform RT-PCR and sequencing in tissue from wallabies and cattle, finding biallelic expression of ZDBF2 in these species that also lack a detected GPR1-AS transcript. From inspection of the genomic location of the GPR1-AS first exon, the authors identify an overlap with a solo LTR of the MER21C retrotransposon family in those species in which the lncRNA is observed, except for some rodents, including mouse. However, they do detect a degree of homology (46%) to the MER21C consensus at the first exon of Liz in mouse. Finally, the authors explore public RNA-seq datasets to show that GPR1-AS is expressed transiently during human preimplantation development, an expression dynamic that would be consistent with the induction of monoallelic methylation of a somatic DMR at ZDBF2 and consequent monoallelic expression.

      Strengths:

      The analysis uncovers a novel mechanism by which a retrotransposon-derived LTR may be involved in genomic imprinting.

      The genomic analysis is very well executed.

      The authors provide new directional and deeply-sequenced RNA-seq datasets from placenta or trophectoderm of five mammalian species and marsupial embryos, which will be of value to the community.

      Weaknesses:

      Although the genomic analysis is very strong, the study remains entirely correlative. All of the data are descriptive, and much of the analysis is performed on RNA-seq and other datasets from the public domain; a small amount of primary data is generated by the authors.

      Evidence that the residual LTR in mouse is functionally relevant for Liz lncRNA expression is lacking.

      Comments on revision:

      The authors have responded very constructively to all points raised by me and the other reviewers. For example, the authors have gone to further, extensive efforts in seeking to identify an LTR at the mouse Liz locus - which is not found - but additional multiple genome alignments provide evidence for sequence conservation consistent with retention of a functional relic of the MER21C in rodent genomes. Moreover, they demonstrate the promoter activity of this mouse sequence region in transfections. They have also demonstrated imprinted expression of ZDBF2 in two additional species - rabbit and rhesus macaque - consistent with their model.

    3. Reviewer #2 (Public review):

      Summary:

      This work concerns the evolution of ZDBF2 imprinting in mammalian species via initiation of GPR1 antisense (AS) transcription from a lineage-specific long-terminal repeat (LTR) retrotransposon. It extends previous work describing the mechanism of ZDBF2 imprinting in mice and humans by demonstrating conservation of GPR1-AS transcripts in rabbits and non-human primates. By identifying the origin of GPR1-AS transcription as the LTR MER21C, the authors claim to account for how imprinting evolved in these species but not in those lacking the MER21C insertion. This illustrates the principle of LTR co-option as a means of evolving new gene regulatory mechanisms, specifically to achieve parent-of-origin allele specific expression (imprinting). Examples of this phenomenon have been described previously, but usually involve initiation of transcription during gametogenesis rather than post-fertilization, as in this work. The findings of this paper are therefore relevant to biologists studying imprinted genes or interested more generally in the evolution of gene regulatory mechanisms.

      Strengths:

      (1) The authors convincingly demonstrate the existence of GPR1-AS orthologs in specific mammalian lineages using high quality RNA-seq libraries collected from diverse mammalian species.

      (2) The authors demonstrate imprinting of the ZDBF2 locus in rabbits and Rhesus macaques using allele-specific expression analysis. The transcription of GPR1-AS orthologs therefore correlates with imprinting of the ZDBF2 locus.

      Weaknesses:

      (1) Experimental evidence directly linking GPR1-AS transcription to ZDBF2 imprinting in rabbits and non-human primates is lacking. Consideration should be given to the challenges associated with studying non-model species and manipulating repeat sequences. Further, this mechanism is established in humans and mice, so the authors' model is arguably sufficiently supported merely by the existence of GPR1-AS orthologs in other mammalian lineages.

    4. Reviewer #3 (Public review):

      Kobayashi et al. identify MER21C as a common promoter of GPR1-AS/Liz in Euarchontoglires, which establishes a somatic DMR that controls ZDBF2 imprinting. In mice, MER21C appears to have diverged significantly from its primate counterparts and is no longer annotated as such.

      The authors used high-quality cross-species RNA-seq data to characterise GPR1-AS-like transcripts, which included generating new data in five different species. The association between MER21C/B elements and the promoter of GPR1-AS in most species is clear and convincing. The expression pattern of MER21C/B elements overall further supports their role in enabling correct temporal expression of GPR1-AS during embryonic development.

      In the revised version of the manuscript the authors provided additional support for the common evolutionary origin of the GPR1-AS/Liz promoter between primates and rodents. They also showed a more extensive concordance between the presence of GPR1-AS-like transcripts and ZDBF2 imprinting.

      Altogether, these findings robustly support the conclusions of the paper, shedding light on the events underlying the evolution of imprinting at the ZDBF2 locus.

    5. Author response:

      The following is the authors’ response to the original reviews

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      Recommendations  Analysis: 

      (1) Given that a MER21B/C LTR was not immediately identified at the start site of the Liz lncRNA in the mouse, and its match is only 46%, this raises the question of whether an analogous LTR would be identified at the homologous location in other species on deeper analysis. The authors need to argue that what has been conserved in the LTR alone in mouse is the essential element conferring the ability to initiate transcription of Liz. A transient reporter assay might be sufficient to do this. 

      We believe that the 46% identity between the first exon of mouse Liz and the consensus sequence of MER21C is so weak that its traces as MER21C are too attenuated to be detected by standard in silico analyses, such as homology searches. For instance, when pairwise alignments are performed between the first exon of mouse Liz and the consensus sequences of solo-LTRs other than MER21C, MER21C does not emerge as the most similar sequence (Figure 5 – figure supplement 1). This is in stark contrast to similar analyses involving the first exon of human and rabbit GPR1-AS (which overlaps with MER21C), where MER21C is identified as the most similar sequence. [pages: 26, 31-32]

      The positions of these LTRs were initially annotated using RepeatMasker. To ensure robust analysis, we performed additional searches with RepeatMasker under more sensitive conditions, adjusting search engines (e.g., RMblast to HMMER or Cross-match) and sensitivity settings. Nevertheless, MER21C or closely related LTRs were still undetectable in mouse, rat, and hamster (Figure 4 – figure supplement 1). However, a multiple genome alignment generated by Cactus/UCSC revealed a syntenic region corresponding to the first exon of human GPR1-AS, overlapping with LTR21C, in the genomes of mice, as well as rats and hamsters (Figure 4 – figure supplement 2). Although RepeatMasker did not annotate MER21C at the GPR1 locus in these species, homologous regions were observed across all selected Euarchontoglires. Due to the limitations of the Cactus alignment track in delineating precise homologous boundaries across species, extracting sequences for evolutionary tree construction was not feasible. Nevertheless, these findings support the hypothesis that the first exon of GPR1-AS (Liz in mice) originated from a MER21C insertion in the common ancestor of Euarchontoglires. [pages: 21, 24-25]

      A combination of traditional annotation of repetitive elements using RepeatMasker and the reconstruction of ancestral genomes through multiple genome alignment can reveal highly degenerated LTR relics. This approach is likely to point to significant future directions for research. This point is further elaborated in the discussion section. [page 42]

      Furthermore, in response to the reviewer's suggestion, we investigated the promoter activity of the GPR1-AS and Liz first exons, which are hypothesized to have originated from the same MER21C insertion. Using a dual reporter assay, we demonstrated that the first exon of mouse Liz exhibits promoter activity in a human cell line comparable to that of the human GPR1-AS promoter. Thus, despite the relatively low sequence similarity between the Liz first exon and the MER21C consensus sequence (46% as determined by pairwise alignment, Figure 5 – figure supplement 2), the promoter activity remains functionally conserved. We further discuss the potential functional motifs within the putative MER21C LTR-derived sequences in Figure 4B-D. Taken together, these findings suggest that despite a high level of degeneracy of the promoter region in rodents, including mice, the most parsimonious explanation for the origin of this regulatory element in rodents is the presence of the same LTR relic detectable in humans/primates, which is essential for robust transcription initiation of Liz and GPR1-AS, respectively. [pages: 27, 32]
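
      For readers unfamiliar with how a figure such as the 46% identity is obtained, the following minimal sketch computes percent identity from two already-aligned sequences. It is an illustration only: the alignment tool and the exact identity definition used by the authors are not stated here, and the function name, the gap character "-", and the choice to count gap columns in the denominator are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two gapped, equal-length aligned sequences.

    Columns where both sequences carry a gap are ignored; columns with a gap
    in only one sequence count as mismatches (one of several conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared
```

      Because the result depends on the denominator convention and on the alignment parameters, identity values near 46% from different tools are not strictly comparable.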

      (2) Imprinting will depend on an initiating mechanism in the germline, in addition to events in the embryo that induce the secondary DMR at ZDBF2. The authors should therefore examine as far as possible the presence of a gDMR in the species with/without GPR1-AS1 and ZDBF2 imprinting. Whole-genome bisulphite sequencing data from oocytes and sperm should exist for some of the relevant species (e.g., pig, cow: Ivanova et al. 2020 PMID: 32393379; Lu et al. 2012 PMID: 34818044). 

      As the reviewer noted, the presence of a gDMR is essential for the establishment of imprinting. Following another reviewer's suggestion, we have now demonstrated that the ZDBF2 gene in rhesus monkeys is also subject to imprinting (see Figure 3C-D). We also acquired whole genome bisulfite sequencing data for rhesus monkey sperm and oocytes, identified DMRs between them, and discovered an oocyte-specifically methylated gDMR in the first exon of GPR1-AS (which overlaps with MER21C) (Figure 3 – figure supplement 1A). This finding is consistent with observations in humans and mice. Conversely, we obtained similar sequencing data for porcine and bovine sperm and oocytes and conducted the same analysis (Figure 3 – figure supplement 1A,B). However, we did not detect any oocyte-specific methylated gDMRs in the GPR1 intragenic region (where GPR1-AS is transcribed from an intron of GPR1) in these species of the Laurasiatheria superorder. These results support the hypothesis that ZDBF2 is not imprinted in lineages outside the Euarchontoglires, the superorder which includes both rodents and primates. We have included these important DMR results as a supplement to Figure 3. [pages 16-21]

      Presentation: 

      (1) The first section of the Introduction would benefit from the inclusion of some additional general references on genomic imprinting. 

      We have added two review articles, Tucci et al. (2019) and Kobayashi (2021), as references in the first section of the Introduction. [page 5]

      (2) Introduction statement: "....nearly 200 imprinted genes have been identified in mice and humans. However, less than half of these genes overlapped in both species." This was the conclusion of one study (Tucci et al. 2016), so it would be better to provide a caveat to the statement "However, one comparative analysis suggested that fewer than half of these genes overlapped in both species". 

      The point being that the actual number of imprinted genes is still a matter of debate (see Edwards et al. 2023 PMID: 36916665), and the extent of overlap will depend on the strength of the evidence for each gene in the human and mouse imprinted gene lists. So, it is very difficult to put an accurate figure on the extent of overlap - but the authors' point is valid that there are species- or lineage-specific imprinted genes. 

      We have revised this sentence following reviewer #1's suggestion. [page 5]

      (3) Introduction statement: "The establishment of species-specific imprinting.....can be driven by various evolutionary events, including.....differences in the function of DNA methyltransferases". I am not aware that this has been described as an evolutionary event causing species-specific imprinting - without supporting evidence, I recommend removing this suggestion. 

      We thank the reviewer for this comment and realize that we should have been more explicit here. We were referring to DNMT3C, a rodent-specific member of the DNMT3 family, which is responsible for the paternal methylation imprinting of Rasgrf1 in mice (Barau et al., Science, 2016), in association with the piRNA pathway and targeting of a specific retrotransposon within the DMR (Watanabe et al. Science, 2011). The Rasgrf1 gene is imprinted in mice but not considered imprinted in humans (though some conflicting data exist). While it is likely that the emergence of DNMT3C was a pre-requisite to the establishment of Rasgrf1 imprinting in evolutionary terms, clear evidence is lacking. Following the reviewer’s suggestion, we have removed the phrase "differences in the function of DNA methyltransferases" from the text. However, we have reintroduced this point in the Introduction section as a potential mechanism that may contribute to the establishment of species-specific imprinted genes, alongside the roles of ZNF445 and ZFP57, which regulate the maintenance of imprinting with partially divided roles between humans and mice. [page 6]

      (4) It would be very useful for readers to have a schema of the Gpr1/Zdbf2 locus that indicates the locations of the germline and somatic DMRs and their relationship to the Liz transcript. 

      (5) There is a summary figure amongst the Supplementary Figures (Suppl. Fig. 7) - it would be beneficial to readers to have this summary figure in the main text rather than the supplement. 

      Following reviewer #1’s suggestion, we have moved the regulatory system schema at the Gpr1/Zdbf2 locus, originally shown in Supplementary Figure 7, to the main text as Figure 7. In addition, in response to comment 4, we have revised the figure to explicitly depict the relationship between the Liz transcript and the establishment of the somatic DMR (sDMR), enhancing the clarity of the regulatory interactions at this locus. [page 38]

      (6) With a focus of the study on LTRs as cis-regulatory elements having been co-opted in genomic imprinting mechanisms - whether in the female germline (as in Bogutz et al. 2019) or in the current study as an activating element post-fertilisation - it is a real omission that the authors do not refer to the role of tissue-specific LTRs as the candidate regulatory elements in non-canonical imprinting (see Hanna et al. 2019 PMID: 31665063). Please include in Introduction and/or Discussion. 

      We added a sentence explaining canonical and non-canonical imprinting and the cases where LTRs act as regulatory elements in non-canonical imprinting, with reference to the study of Hanna et al., as suggested. [page 6]

      (7) Discussion statement: "Two paternally expressed imprinted genes, PEG10/SIRH1 and PEG11/RTL1/SIRH2 have been identified in mammals. They encode GAG-POL proteins of sushi-ichi LTR retrotransposons and are essential for mammalian placenta formation and maintenance." 

      These sentences should be combined: "Two paternally expressed imprinted genes, PEG10/SIRH1, and PEG11/RTL1/SIRH2, that encode GAG-POL proteins of sushi-ichi LTR retrotransposons have been identified in mammals and are essential for mammalian placenta formation and maintenance." 

      We have revised this sentence according to reviewer #1's suggestion. [page 41]

      Reviewer #2 (Recommendations For The Authors): 

      When showing assembled GPR1-AS transcripts via genome browser tracks, it would be valuable to add normalized counts of reads mapping to each strand, in order to more convincingly demonstrate the existence of these transcripts. I ask for this because in my experience Stringtie will assemble transcripts that are only marginally supported by reads. 

      In response to Reviewer #2's suggestion, FPKM and TPM values for all StringTie-predicted GPR1-AS-like transcripts are now included in Figure 6. Each of these transcripts has a TPM value greater than 1, supporting their validity. [pages: 35]
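
      StringTie reports FPKM and TPM itself; purely to make the two units concrete, the sketch below shows their standard definitions computed from raw counts and transcript lengths. The function names and any input numbers are hypothetical and are not taken from the manuscript.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then scale to 1e6."""
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]
```

      TPM values always sum to one million within a sample, so a threshold such as TPM > 1 is relative to the other transcripts quantified in the same library.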

      Reviewer #3 (Recommendations For The Authors): 

      (1) The tree in Figure 5A is one of the main arguments supporting the divergence of the mouse Liz promoter from a common MER21C element, but this contains only a handful of species, making it difficult to appreciate the full extent of its evolution. Presumably its faster mutation rate in mouse would also be supported by other closely related rodents, which would solidify the conclusion that the Liz promoter is derived from an ancient MER21C insertion. So my suggestion is to expand this tree substantially to other species, comparing sequences syntenic to the GPR1-AS/Liz promoter. 

      (2) It may also be worth trying different TE/LTR annotation tools and/or running Repeatmasker with different parameters, to see if an MER21C element is detected in mouse using a more sensitive approach. 

      In response to this suggestion, we performed computational analyses with RepeatMasker under various settings (e.g., switching search engines from RMblast to HMMER or Crossmatch, adjusting speed/sensitivity settings from default to slow). Despite these modifications, a MER21C element was not detected near the mouse Liz promoter. However, a multiple genome alignment track generated by Cactus/UCSC revealed a syntenic region, corresponding to the first exon of human GPR1-AS, which overlaps with LTR21C, also present in the genomes of mouse, rat, and hamster (Figure 4 – figure supplement 1). While RepeatMasker did not identify MER21C at the GPR1 locus in these species, homologous regions were observed across all selected Euarchontoglires. Although the Cactus alignment track does not delineate the exact boundaries of homologous regions across species (relative to humans) and thus precludes extracting each homologous sequence to construct an evolutionary tree, these findings support the hypothesis that the first exon of GPR1-AS (referred to as Liz in mice) originated from an ancient MER21C insertion in the common ancestor of Euarchontoglires. [pages: 21, 24-25]

      (3) According to Dfam, MER21C is not common to all eutherians, but specific to Boroeutheria, whilst MER21B is presumably specific to Euarchontoglires. To clarify MER21C/B evolution, it would be useful to show the number of elements present in select species from each group (including an outgroup). 

      (7) In Figure 4 it is hard to distinguish between red and purple. 

      Initially, we referenced Repbase (e.g., MER21C: Origin/Eutheria), but, as Reviewer #3 noted, Dfam should be the primary reference. We have now included the copy numbers of MER21C and MER21B for each genome in Figure 4, providing a clearer understanding of their evolutionary appearance (MER21C appears specific to Boroeutheria, while MER21B is specific to Euarchontoglires). Additionally, we adjusted the MER21B position color from purple to dark purple to improve visibility. Furthermore, we have also underlined the copy number of MER21C or MER21B located within the GPR1 region in each species. For example, in the Treeshrew genome, the LTR overlapping with GPR1-AS is annotated as MER21B, so we underlined the copy number of MER21B (2,305). These changes now clearly indicate whether homologous sequences to the first exon of GPR1-AS are annotated as MER21C or MER21B in each genome. [page 22]

      (4) Could the imprinting status of ZDBF2 not be determined in chimpanzees and rabbits? Or is it already known? Either way, a clarification would be useful to further support the concordance between GPR1-AS-like transcripts and ZDBF2 imprinting.

      The imprinting status of ZDBF2 had not previously been reported in chimpanzees, rhesus macaques, or rabbits, where GPR1-AS-like transcripts were identified. Therefore, we conducted allele-specific expression analysis of ZDBF2 using blood samples from rhesus macaques and rabbits. As expected, paternal-allele-specific expression of ZDBF2 was observed in both species, consistent with findings in humans and mice. These results have been added to Figure 3. Although we did not analyze the imprinting status in chimpanzees, we believe the existing data sufficiently support our hypothesis. [pages: 16, 19-20]

      (5) The authors briefly discuss the role of KRAB-ZFPs in controlling TE expression. An interesting addition would be to analyse the expression of the main KRAB-ZFP that binds to MER21C (ZFP789, according to data from PMID 28273063). This could be linked to the temporal control of MER21C expression. 

      In response to Reviewer #3's suggestion, we focused on the expression pattern of ZNF789 (noted by the reviewer as ZFP789), the primary KRAB-ZFP known to bind MER21C, as identified by Didier Trono’s group (PMID 28273063). Strikingly, our analysis reveals that ZNF789 is specifically downregulated at the 4-cell stage, which aligns with the timing of MER21C reactivation. While it remains to be determined whether this downregulation directly influences MER21C reactivation or the initiation of GPR1-AS expression, this finding is significant and consistent with our model. We have incorporated this information in Figure 5 – figure supplement 3. [pages: 33]

      (6) The sentence "Liz directs DNA methylation at the somatic DMR, which competes with ZDBF2 to repress the paternal allele" (introduction) was confusing to me. 

      This sentence has been revised to be more accurate, as follows: Liz transcription counteracts the H3K27me3-mediated repression of Zdbf2 by promoting the deposition of antagonistic DNA methylation at the secondary DMR. [page 7]

      (8) In Figure 5 I take it that 'consensus motif' refers to ELF1/2? Maybe change the legend. 

      To clarify potential confusion around the term 'consensus motif,' which may have been mistaken for 'consensus MER21C' (the consensus sequence of MER21C-LTR from the Dfam database), we have revised the figure legend. We now refer to the motif as the "common motif," indicating the sequence common to all MER21C-derived sequences and overlapping with the first exon of GPR1-AS. [page 29]

    1. eLife Assessment

      The ExA-SPIM methodology developed here and characterized and supported by convincing evidence is an important development for the field of light sheet microscopy as the new technology provides an impressive field of view making it possible to image the entire expanded mouse brain at cellular and subcellular resolution.

    2. Reviewer #1 (Public Review):

      Summary:

      Glaser et al present ExA-SPIM, a light-sheet microscope platform with large volumetric coverage (Field of view 85mm^2, working distance 35mm ), designed to image expanded mouse brains in their entirety. The authors also present an expansion method optimized for whole mouse brains, and an acquisition software suite. The microscope is employed in imaging an expanded mouse brain, the macaque motor cortex and human brain slices of white matter.

      This is impressive work, and represents a leap over existing light-sheet microscopes. As an example, it offers a ~ fivefold higher resolution than mesoSPIM (https://mesospim.org/), a popular platform for imaging large cleared samples. Thus while this work is rooted in optical engineering, it manifests a huge step forward and has the potential to become an important tool in the neurosciences.

      Strengths:

      -ExA-SPIM features an exceptional combination of field of view, working distance, resolution and throughput.

      -An expanded mouse brain can be acquired with only 15 tiles, lowering the burden on computational stitching. That the brain does not need to be mechanically sectioned is also seen as an important capability.

      -The image data is compelling, and tracing of neurons has been performed. This demonstrates the potential of the microscope platform.

      Review of the revised manuscript:

      The authors have carefully addressed my previous concerns and suggestions.

    3. Reviewer #2 (Public Review):

      In this manuscript, Glaser et al. describe a new selective plane illumination microscope designed to image a large field of view that is optimized for expanded and cleared tissue samples. For the most part, the microscope design follows a standard formula that is common among many systems (e.g. Keller PJ et al Science 2008, Pitrone PG et al. Nature Methods 2013, Dean KM et al. Biophys J 2015, and Voigt FF et al. Nature Methods 2019). The primary conceptual and technical novelty is to use a detection objective from the metrology industry that has a large field of view and a large area camera. The authors characterize the system resolution, field curvature, and chromatic focal shift by measuring fluorescent beads in a hydrogel and then show example images of expanded samples from mouse, macaque, and human brain tissue.

      Glaser et al. have responded to the reviewer comments by removing some of the overstated claims from the prior manuscript and editing portions of the manuscript text to enhance the clarity. Although the manuscript would be stronger if the authors had been able to provide data that justified the original high-impact claims from the initial publication (e.g. that the images could be used for robust and automated neuronal tracing across large volumes), the amended manuscript text now more closely matches the supporting data. As with the initial submission, I believe that the microscope design and characterization is a useful contribution to the field and the data are quite stunning.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Glaser et al present ExA-SPIM, a light-sheet microscope platform with large volumetric coverage (Field of view 85mm^2, working distance 35mm), designed to image expanded mouse brains in their entirety. The authors also present an expansion method optimized for whole mouse brains and an acquisition software suite. The microscope is employed in imaging an expanded mouse brain, the macaque motor cortex, and human brain slices of white matter. 

      This is impressive work and represents a leap over existing light-sheet microscopes. As an example, it offers a fivefold higher resolution than mesoSPIM (https://mesospim.org/), a popular platform for imaging large cleared samples. Thus while this work is rooted in optical engineering, it manifests a huge step forward and has the potential to become an important tool in the neurosciences. 

      Strengths: 

      - ExA-SPIM features an exceptional combination of field of view, working distance, resolution, and throughput. 

      - An expanded mouse brain can be acquired with only 15 tiles, lowering the burden on computational stitching. That the brain does not need to be mechanically sectioned is also seen as an important capability. 

      - The image data is compelling, and tracing of neurons has been performed. This demonstrates the potential of the microscope platform. 

      Weaknesses: 

      - There is a general question about the scaling laws of lenses, and expansion microscopy, which in my opinion remained unanswered: In the context of whole brain imaging, a larger expansion factor requires a microscope system with larger volumetric coverage, which in turn will have lower resolution (Figure 1B). So what is optimal? Could one alternatively image a cleared (non-expanded) brain with a high-resolution ASLM system (Chakraborty, Tonmoy, Nature Methods 2019, potentially upgraded with custom objectives) and get a similar effective resolution as the authors get with expansion? This is not meant to diminish the achievement, but it was unclear if the gains in resolution from the expansion factor are traded off by the scaling laws of current optical systems. 

      Paraphrasing the reviewer: Expanding the tissue requires imaging larger volumes and allows lower optical resolution. What has been gained?

      The answer to the reviewer’s question is nuanced and contains four parts. 

      First, optical engineering requirements are more forgiving for lenses with lower resolution. Lower resolution lenses can have much larger fields of view (in real terms: the number of resolvable elements, proportional to ‘etendue’) and much longer working distances. In other words, it is currently more feasible to engineer lower resolution lenses with larger volumetric coverage, even when accounting for the expansion factor. 

      Second, these lenses are also much better corrected compared to higher resolution (NA) lenses. They have a flat field of view, negligible pincushion distortions, and constant resolution across the field of view. We are not aware of comparable performance for high NA objectives, even when correcting for expansion.

      Third, although clearing and expansion render tissues ‘transparent’, there still exist refractive index inhomogeneities which deteriorate image quality, especially at larger imaging depths. These effects are more severe for higher optical resolutions (NA), because the rays entering the objective at higher angles have longer paths in the tissue and will see more aberrations. For lower NA systems, such as ExA-SPIM, the differences in paths between the extreme and axial rays are relatively small and image formation is less sensitive to aberrations. 

      Fourth, aberrations are proportional to the index of refraction inhomogeneities (dn/dx). Since the index of refraction is roughly proportional to density, scattering and aberration of light decrease as M^3, where M is the expansion factor. In contrast, the imaging path length through the tissue only increases as M. This produces a huge win for imaging larger samples with lower resolutions. 
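      The scaling argument in the fourth point can be sketched numerically. The snippet below is an illustration only: the function name is hypothetical, and multiplying the two factors to get a net "accumulated" term is a naive first-order assumption that the response does not state explicitly. It takes the quoted scalings at face value (inhomogeneity falling as M^3, path length growing as M).

```python
def relative_aberration(expansion_factor: float) -> dict:
    """Relative scaling of aberration terms versus unexpanded tissue (M = 1).

    Assumptions (illustrative, not from the manuscript):
    - inhomogeneity strength scales as 1 / M**3, as quoted in the response
    - imaging path length scales as M, as quoted in the response
    - accumulated aberration is modelled naively as their product
    """
    m = float(expansion_factor)
    inhomogeneity = 1.0 / m**3
    path_length = m
    return {
        "inhomogeneity": inhomogeneity,
        "path_length": path_length,
        "accumulated": inhomogeneity * path_length,
    }


if __name__ == "__main__":
    # Example expansion factors chosen for illustration only.
    for m in (1.0, 2.0, 3.0, 4.0):
        print(m, relative_aberration(m))
```

      Under this simple model the accumulated term falls as 1/M^2, so the longer path through expanded tissue is more than offset by the weaker inhomogeneities.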

      To our knowledge there are no convincing demonstrations in the literature of diffraction-limited ASLM imaging at a depth of 1 cm in cleared mouse brain tissue, which would be equivalent to the ExA-SPIM imaging results presented in this manuscript.  

      In the discussion of the revised manuscript we discuss these factors in more depth. 

      - It was unclear if 300 nm lateral and 800 nm axial resolution is enough for many questions in neuroscience. Segmenting spines, distinguishing pre- and postsynaptic densities, or tracing densely labeled neurons might be challenging. A discussion about the necessary resolution levels in neuroscience would be appreciated. 

      We have previously shown good results in tracing the thinnest (100 nm thick) axons over cm scales with 1.5 um axial resolution. It is the contrast (SNR) that matters, and the ExA-SPIM contrast exceeds the block-face 2-photon contrast, not to mention imaging speed (> 10x).  

      Indeed, for some questions, like distinguishing fluorescence in pre- and postsynaptic structures, higher resolutions will be required (0.2 um isotropic; Rah et al Frontiers Neurosci, 2013). This could be achieved with higher expansion factors.

      This is not within the intended scope of the current manuscript. As mentioned in the discussion section, we are working towards ExA-SPIM-based concepts to achieve better resolution through the design and fabrication of a customized imaging lens that maintains a high volumetric coverage with increased numerical aperture.  

      - Would it be possible to characterize the aberrations that might be still present after whole brain expansion? One approach could be to image small fluorescent nanospheres behind the expanded brain and recover the pupil function via phase retrieval. But even full width half maximum (FWHM) measurements of the nanospheres' images would give some idea of the magnitude of the aberrations. 

      We have now included a supplementary figure highlighting images of small axon segments within distal regions of the brain.  

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, Glaser et al. describe a new selective plane illumination microscope designed to image a large field of view that is optimized for expanded and cleared tissue samples. For the most part, the microscope design follows a standard formula that is common among many systems (e.g. Keller PJ et al Science 2008, Pitrone PG et al. Nature Methods 2013, Dean KM et al. Biophys J 2015, and Voigt FF et al. Nature Methods 2019). The primary conceptual and technical novelty is to use a detection objective from the metrology industry that has a large field of view and a large area camera. The authors characterize the system resolution, field curvature, and chromatic focal shift by measuring fluorescent beads in a hydrogel and then show example images of expanded samples from mouse, macaque, and human brain tissue. 

      Strengths: 

      I commend the authors for making all of the documentation, models, and acquisition software openly accessible and believe that this will help assist others who would like to replicate the instrument. I anticipate that the protocols for imaging large expanded tissues (such as an entire mouse brain) will also be useful to the community. 

      Weaknesses: 

      The characterization of the instrument needs to be improved to validate the claims. If the manuscript claims that the instrument allows for robust automated neuronal tracing, then this should be included in the data. 

      The reviewer raises a valid concern. Our assertion that the resolution and contrast are sufficient for robust automated neuronal tracing is overstated based on the data in the paper. We are hard at work on automated tracing of datasets from the ExA-SPIM microscope. We have demonstrated full reconstruction of axonal arbors encompassing >20 cm of axonal length. But including these methods and results is out of the scope of the current manuscript. 

      The claims of robust automated neuronal tracing have been appropriately modified.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Smaller questions to the authors: 

      - Would a multi-directional illumination and detection architecture help? Was there a particular reason the authors did not go that route?

      Despite the clarity of the expanded tissue, and the lower numerical aperture of the ExA-SPIM microscope, image quality still degrades slightly towards the distal regions of the brain relative to both the excitation and detection objective. Therefore, multi-directional illumination and detection would be advantageous. Since the initial submission of the manuscript, we have undertaken re-designing the optics and mechanics of the system. This includes provisions for multi-directional illumination and detection. However, this new design is beyond the scope of this manuscript. We now mention this in L254-255 of the Discussion section.

      - Why did the authors not use the same objective for illumination and detection, which would allow isotropic resolution in ASLM? 

      The current implementation of ASLM requires an infinity corrected objective (i.e. conjugating the axial sweeping mechanism to the back focal plane). This is not possible due to the finite conjugate design of the ExA-SPIM detection lens.

      More fundamentally, pushing the excitation NA higher would result in a shorter light sheet Rayleigh length, which would require a smaller detection slit (shorter exposure time, lower signal to noise ratio). For our purposes an excitation NA of 0.1 is an excellent compromise between axial resolution, signal to noise ratio, and imaging speed. 

      For other potentially brighter biological structures, it may be possible to design a custom infinity corrected objective that enables ASLM with NA > 0.1.
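      The trade-off between excitation NA and light-sheet Rayleigh length described above can be illustrated with the standard paraxial Gaussian-beam relations. This is a minimal sketch, not the authors' design calculation: the function name is hypothetical, the medium index of 1.33 is an assumed value for a water-rich expanded gel, and a real light sheet may deviate from an ideal Gaussian beam.

```python
import math


def gaussian_sheet(wavelength_um: float, na: float, n_medium: float = 1.33):
    """Return (waist diameter 2*w0, Rayleigh range z_R), both in micrometres.

    Paraxial Gaussian-beam approximation:
      w0  = lambda0 / (pi * NA)
      z_R = pi * w0**2 * n / lambda0
    wavelength_um is the vacuum wavelength; n_medium is an assumed value.
    """
    w0 = wavelength_um / (math.pi * na)
    z_r = math.pi * w0**2 * n_medium / wavelength_um
    return 2.0 * w0, z_r


if __name__ == "__main__":
    # 561 nm excitation as stated in the response; the NA values above 0.10
    # are hypothetical comparisons.
    for na in (0.10, 0.20, 0.30):
        thickness, rayleigh = gaussian_sheet(0.561, na)
        print(f"NA={na:.2f}  2*w0={thickness:.2f} um  z_R={rayleigh:.2f} um")
```

      Because the Rayleigh range scales as 1/NA^2 while the waist scales only as 1/NA, doubling the excitation NA halves the sheet thickness but shortens the in-focus length fourfold, which is the compromise the response describes.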

      - Have the authors made any attempt to characterize distortions of the brain tissue that can occur due to expansion? 

      We have not systematically characterized the distortions of the brain tissue pre and post expansion. Imaged mouse brain volumes are registered to the Allen CCF regardless of whether or not the tissue was expanded. It is beyond the scope of this manuscript to include these results and processing methods, but we have confirmed that the ExA-SPIM mouse brain volumes contain only modest deformation that is easily accounted for during registration to the Allen CCF. 

      - The authors state that a custom lens with NA 0.5-0.6 can be designed, featuring similar specifications. Is there a practical design? Wouldn't such a lens be more prone to field curvature? 

      This custom lens has already been designed and is currently being fabricated. The lens maintains a similar space bandwidth product as the current lens (increased numerical aperture but over a proportionally smaller field of view). Over the designed field of view, field curvature is <1 µm. However, including additional discussion or results of this customized lens is beyond the scope of this manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      System characterization: 

      - Please state what wavelength was used for the resolution measurements in Figure 2.

      An excitation wavelength of 561 nm was used. This has been added to the manuscript text.

      - The manuscript highlights that a key advance for the microscope is the ability to image over a very large 13 mm diameter field of view. Can the authors clarify why they chose to characterize resolution over an 8 mm diameter field rather than the full area? 

      The 13 mm diameter field of view refers to the diagonal of the 10.6 x 8.0 mm field of view. The results presented in Figure 1c are with respect to the horizontal x direction and vertical y direction. A note indicating that the 13 mm is with respect to the diagonal of the rectangular imaging field has been added to the manuscript text. The results were presented in this way to present the axial and lateral resolution as a function of y (the axial sweeping direction).
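      The relationship between the rectangular field and the quoted 13 mm figure can be checked with a few lines of arithmetic. The sketch below uses only the 10.6 x 8.0 mm dimensions given in the response; the function names are illustrative.

```python
import math


def fov_diagonal_mm(width_mm: float, height_mm: float) -> float:
    """Diagonal of a rectangular field of view, in millimetres."""
    return math.hypot(width_mm, height_mm)


def fov_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular field of view, in square millimetres."""
    return width_mm * height_mm


if __name__ == "__main__":
    width_mm, height_mm = 10.6, 8.0  # dimensions stated in the response
    print(f"diagonal = {fov_diagonal_mm(width_mm, height_mm):.2f} mm")
    print(f"area     = {fov_area_mm2(width_mm, height_mm):.1f} mm^2")
```

      The diagonal comes out at roughly 13.3 mm and the area at roughly 84.8 mm^2, consistent with the "13 mm" and "85 mm^2" figures quoted in the reviews.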

      - The resolution estimates seem lower than I would expect for a 0.30 NA lens (which should be closer to ~850 nm for 515 nm emission). Could the authors clarify the discrepancy? Is this predicted by the Zemax model and due to using the lens in immersion media, related to sampling size on the camera, or something else? It would be helpful if the authors could overlay the expected diffraction-limited performance together with the plots in Figure 2C. 

      As mentioned previously, the resolution measurements were performed with 561 nm excitation and an emission bandpass of ~573 – 616 nm (595 nm average). Based on this we would expect the full width half maximum resolution to be ~975 nm. The resolution is in fact limited by sampling on the camera. The 3.76 µm pixel size, combined with the 5.0X magnification, results in a sampling of 752 nm. Based on the Nyquist criterion, the resolution is limited to ~1.5 µm. We have added clarifying statements to the text.
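      The sampling argument can be reproduced directly from the numbers in the response. The following is a small sketch with illustrative function names: it converts the camera pixel size to object space, applies the two-samples-per-resolvable-period Nyquist rule, and takes the larger of the optical and sampling limits as the effective resolution. The 975 nm optical figure is copied from the response rather than recomputed.

```python
def object_space_sampling_nm(pixel_size_um: float, magnification: float) -> float:
    """Pixel size projected into the sample, in nanometres."""
    return pixel_size_um * 1000.0 / magnification


def nyquist_limit_nm(sampling_nm: float) -> float:
    """Smallest resolvable period for a given sampling interval (two samples)."""
    return 2.0 * sampling_nm


def effective_resolution_nm(optical_fwhm_nm: float, sampling_nm: float) -> float:
    """Resolution is bounded by whichever limit is coarser."""
    return max(optical_fwhm_nm, nyquist_limit_nm(sampling_nm))


if __name__ == "__main__":
    sampling = object_space_sampling_nm(3.76, 5.0)  # values from the response
    print(f"sampling      = {sampling:.0f} nm")
    print(f"Nyquist limit = {nyquist_limit_nm(sampling):.0f} nm")
    print(f"effective     = {effective_resolution_nm(975.0, sampling):.0f} nm")
```

      With these inputs the sampling limit (about 1.5 µm) is coarser than the expected optical FWHM, so the camera, not the lens, sets the measured resolution.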

      - I'm confused about the characterization of light sheet thickness and how it relates to the measured detection field curvature. The authors state that they "deliver a light sheet with NA = 0.10 which has a width of 12.5 mm (FWHM)." If we estimate that light fills the 0.10 NA, it should have a beam waist (2wo) of ~3 microns (assuming Gaussian beam approximations). Although field curvature is described as "minimal" in the text, it is still ~10-15 microns at the edge of the field for the emission bands for GFP and RFP proteins. Given that this is 5X larger than the light sheet thickness, how do the authors deal with this? 

      The generated light sheet is flat, with a thickness of ~ 3 µm. This flat light sheet will be captured in focus over the depth of focus of the detection objective. The stated field curvature is within 2.5X the depth of focus of the detection lens, which is equivalent to the “Plan” specification of standard microscope objectives.

      - In Figure 2E, it would be helpful if the authors could list the exposure times as well as the total voxels/second for the two-camera comparison. It's also worth noting that the Sony chip used in the VP151MX camera was released last year whereas the Orca Flash V3 chosen for comparison is over a decade old now. I'm confused as to why the authors chose this camera for comparison when they appear to have a more recent Orca BT-Fusion that they show in a picture in the supplement (indicated as Figure S2 in the text, but I believe this is a typo and should be Figure S3). 

      This is a useful addition, and we have added exposure times to the plot. We have also added a note that the Orca Flash V3 is an older generation sCMOS camera and that newer variants exist, including the Orca BT-Fusion. The BT-Fusion has a read noise of 1.0 e- rms versus 1.6 e- rms, and a peak quantum efficiency of ~95% vs. 85%. Based on the discussion in Supplementary Note S1, we do not expect that these differences in specifications would dramatically change the data presented in the plot. In addition, the typo in Figure S2 has been corrected to Figure S3.

      - In Table S1, the authors note that they only compare their work to prior modalities that are capable of providing <= 1 micron resolution. I'm a bit confused by this choice given that Figure 2 seems to show the resolution of ExA-SPIM as ~1.5 microns at 4 mm off center (1/2 their stated radial field of view). It also excludes a comparison with the mesoSPIM project which at least to me seems to be the most relevant prior to this manuscript. This system is designed for imaging large cleared tissues like the ones shown here. While the original publication in 2019 had a substantially lower lateral resolution, a newer variant, Nikita et al bioRxiv (which is cited in general terms in this manuscript, but not explicitly discussed) also provides 1.5-micron lateral resolution over a comparable field of view. 

      We have updated the table to include the benchtop mesoSPIM from Nikita et al., Nature Communications, 2024. Based on this published version of the manuscript, the lateral resolution is 1.5 µm and axial resolution is 3.3 µm. Assuming the Iris 15 camera sensor, with the stated 2.5 fps, the volumetric rate (megavoxels/sec) is 37.41.
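      The quoted volumetric rate follows from pixels per frame multiplied by frame rate. The sketch below reproduces that arithmetic; the 5056 x 2960 pixel sensor format for the Iris 15 is an assumption taken from the camera's published specification, not from the response, and the function name is illustrative.

```python
# Assumed Iris 15 sensor format (not stated in the response).
IRIS15_WIDTH_PX = 5056
IRIS15_HEIGHT_PX = 2960


def megavoxels_per_second(width_px: int, height_px: int, frames_per_second: float) -> float:
    """Voxel throughput when each camera frame contributes one image plane."""
    return width_px * height_px * frames_per_second / 1e6


if __name__ == "__main__":
    rate = megavoxels_per_second(IRIS15_WIDTH_PX, IRIS15_HEIGHT_PX, 2.5)
    print(f"{rate:.2f} megavoxels/sec")
```

      Under that assumed sensor format and the stated 2.5 fps, the result agrees with the 37.41 megavoxels/sec reported in the updated table.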

      - The authors state that, "We systematically evaluated dehydration agents, including methanol, ethanol, and tetrahydrofuran (THF), followed by delipidation with commonly used protocols on 1 mm thick brain slices. Slices were expanded and examined for clarity under a macroscope." It would be useful to include some data from this evaluation in the manuscript to make it clear how the authors arrived at their final protocol. 

      Additional details on the expansion protocol may be included in another manuscript.

      General comments: 

      There is a tendency in the manuscript to use negative qualitative terms when describing prior work and positive qualitative terms when describing the work here. Examples include: 

      - "Throughput is limited in part by cumbersome and error-prone microscopy methods". While I agree that performing single neuron reconstructions at a large scale is a difficult challenge, the terms cumbersome and error-prone are qualitative and lacking objective metrics.

      We have revised this statement to be more precise, stating that throughput is limited in part by the speed and image quality of existing microscopy methods.

      - The resolution of the system is described in several places as "near-isotropic" whereas prior methods were described as "highly anisotropic". I agree that the ~1:3 lateral to axial ratio here is more isotropic than the 1:6 ratio of the other cited publications. However, I'm not sure I'd consider 3-fold worse axial resolution than lateral to be considered "near" isotropic.

      We agree that the term near-isotropic is ambiguous. We have modified the text accordingly, removing the term near-isotropic and where appropriate stating that the resolution is more isotropic than that of other cited publications.

      - In the manuscript, the authors describe the photobleaching in their imaging conditions as "negligible". Figure S5 seems to show a loss of 60% fluorescence after 2000 exposures (which in the caption is described as "modest"). I'd suggest removing these qualitative terms and just stating the values.

      We agree and have changed the text accordingly.

      - The results section for Figure 5 is titled "Tracing axons in human neocortex and white matter". Although this section states "larger axons (>1 um) are well separated... allowing for robust automated and manual tracing" there is no data for any tracing in the manuscript. Although I agree that the images are visually impressive, I'm not sure that this claim is backed by data.

      We have now removed the text in this section referring to automated and manual tracing.

    1. eLife Assessment

      This comprehensive study presents important findings that delineate how specific dopaminergic neurons (DANs) instruct aversive learning in Drosophila larvae exposed to high salt through an integration of behavioral experiments, imaging, and connectomic analysis. The work reveals how a numerically minimal circuit achieves remarkable functional complexity, with redundancies and synergies within the DL1 cluster that challenge our understanding of how few neurons generate learning behaviors. By establishing a framework for sensory-driven learning pathways, the study makes a compelling and substantial contribution to understanding associative conditioning while demonstrating conservation of learning mechanisms across Drosophila developmental stages.

    2. Reviewer #1 (Public review):

      In this paper Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines for individual neurons, the authors show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron has only a partial phenotype. The authors use calcium imaging to show that the DAN-g1 is not the only DAN responding to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role for the associative memory. DAN-f1, which does not respond to salt, is able to lead to the formation of a memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, when silenced together with DAN-g1, it enhances the memory deficit of DAN-g1. Overall, this work brings evidence of a complex interaction between DL1 DANs in both the encoding of salt signals and their teaching role in associative learning, with none of them being individually necessary and sufficient for both functions.

      Overall, the manuscript contributes interesting results that are useful to understand the organization and function of the dopaminergic system. The behavioral role of the specific DANs is assessed using specific driver lines, which allow their function to be tested individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association to it. Proper genetic controls are carried out throughout the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      In this work the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act partially redundantly, and that single cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt and was not seen with the other aversive teaching stimuli tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs, this represents a very comprehensive study linking the structural, functional and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows the cellular substrates and pathways of aversive learning to be defined down to the single cell level in a system without much redundancy. It therefore sets the basis for even more sophisticated experiments and, together with the neat connectome analysis, opens the possibility of unraveling different sensory processing pathways within the DL1 cluster and their integration with the higher order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and balanced, putting their data in the appropriate context. The authors also implemented neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      Previous comments were fully addressed by the authors.

    4. Reviewer #3 (Public review):

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. But the authors go beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, is necessary for salt learning when either ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for two of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines for individual neurons, the authors show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron has only a partial phenotype. The authors use calcium imaging to show that the DAN-g1 is not the only DAN responding to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role for the associative memory. DAN-f1, which does not respond to salt, is able to lead to the formation of a memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, when silenced together with DAN-g1, it enhances the memory deficit of DAN-g1. Overall, this work brings evidence of a complex interaction between DL1 DANs in both the encoding of salt signals and their teaching role in associative learning, with none of them being individually necessary and sufficient for both functions.

      Strengths:

      Overall, the manuscript contributes interesting results that are useful for understanding the organization and function of the dopaminergic system. The behavioral role of the specific DANs is assessed using specific driver lines, which allow their function to be tested individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association to it. Proper genetic controls are carried out throughout the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, but the authors discuss these differences appropriately. In general, the optogenetic approach is more appropriate as developmental compensations are not of major interest for the question investigated.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set is necessary in behavioral assays (with a partial phenotype). No manipulation completely abolishes the salt-odor association, leaving important open questions on the identity of the neural circuits involved in this behavior.

      The EM data analysis reveals a non-trivial organization of sensory inputs into DANs, but it is difficult to extrapolate a link to the functional data presented in the paper.

      We would like to once again thank Reviewer 1 for the positive assessment of our work and for the valuable suggestions provided on the first revision of the manuscript. In this second revision, we have addressed the linguistic issues and most of the minor comments as recommended. We now hope that the current version of our manuscript meets the reviewer’s expectations both in terms of language and content.

      Reviewer #2 (Public review):

      Summary:

      In this work the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act partially redundant, and that single cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt but no other aversive teaching stimuli tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs this represents a very comprehensive study linking the structural, functional and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows the cellular substrates and pathways of aversive learning to be defined down to the single cell level in a system without much redundancy. It therefore sets the basis to conduct even more sophisticated experiments and, together with the neat connectome analysis, opens the possibility to unravel different sensory processing pathways within the DL1 cluster and integration with the higher order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and balanced, putting their data in the appropriate context. The authors also implemented neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      Previous comments were fully addressed by the authors.

      We sincerely thank Reviewer 2 for the positive evaluation of our work. We are glad that our responses in the first revision addressed the previous concerns and appreciate the reviewer’s constructive feedback once again.

      Reviewer #3 (Public review):

      Summary:

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. But the authors go beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, is necessary for salt learning when either ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for two of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

      We would also like to thank Reviewer 3 for the positive assessment of our work. Many of the constructive comments provided were incorporated into the first revision, contributing significantly to the improved clarity and overall quality of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are some minor comments (and some semantics that could be addressed to improve the manuscript)

      Title: is the title correct given that c1 and d1 do not really signal punishment?

      We think the title is correct and would like to keep it as it is.

      L72 striatum misspelled

      We have corrected the error.

      L74 constitute instead of provide?

      We made the suggested modification in the text.

      L129: "But can these four individual DANs also process other sensory modalities?" Other than what? What was used before?

      We have made the required change, which now allows us to contrast somatosensory and chemosensory information.

      L172: (Please refer to the discussion regarding the partial reduction of the memory); would be more natural to explain shortly here, or add a sentence before this parenthesis that point to the effect

      We made the requested change in the manuscript and added a short sentence before the parenthesis.

      L182: "DL1 neurons convey a dopaminergic aversive teaching signal" you cannot make this statement from just TH-GAL4!

      We agree; that is why we have completely revised the sentence, restricted it further, and now also refer to additional published larval and adult data.

      L264: "possible redundancy among" I don't think you are testing a redundancy here, it is more likely a developmental compensation.

      We made the requested change in the sentence and added a potential developmental compensation as an interpretation of our results.

      L296: "to determine if the activation of individual DL1 DANs signals aspects of the natural high salt punishment," - how can the optogenetic activation tell something about aspects of the natural salt punishment? I understand the fact that salt is present, but still I find it inaccurate

      Our approach is based on the framework established by Bertram Gerber and colleagues over the past two decades in larval Drosophila research. According to this logic, memory recall is dependent on the specific properties of the test context, particularly the type and concentration of the stimulus presented on the test plate. Aversive memory retrieval occurs only when the test conditions closely match those of the training stimulus. Consequently, the larva's behavior on the test plate serves as an indicator of the memory content being recalled. We therefore adhere to this established methodology (Gerber & Hendel, 2006; Schleyer et al., 2011; Schleyer et al., 2015).

      L307 "DAN-f1 and DAN-g1 encode aspects of the natural aversive high salt teaching" you cannot conclude that given that f1 does not even respond to salt. I understand the logic of the salt during test, but I think it is still a stretched interpretation

      We agree and thus have deleted the sentence.

      L310 "Individual DL1 DANs are acutely necessary" this is too general, it seems that only one is

      We have changed the title and now clearly state that this is only one DAN of the DL1 cluster.

      Reviewer #2 (Recommendations for the authors):

      In Fig.6 the text flow could be optimized as the authors first mention Fig. 6E,F before they follow up with Fig. 6A-D.

      Thanks for bringing this up – we changed it in the revised version of the manuscript. Now 6A-D is mentioned first.

      In Fig.6 the finding that optogenetic inactivation but not ablation of DAN-g1 slightly but significantly reduces aversive salt learning suggests that there is an individual contribution of this DAN in this paradigm. The authors emphasize redundancy of DL1 DANs although the effect size seems comparable between DAN-g1 and DAN-f1,g1 silencing.

      In response to this concern and the one of reviewer 2, we have revised the section title and removed the final sentence of the section before to avoid placing emphasis on the potential redundancy of DL1 DANs within this results section.

      Reviewer #3 (Recommendations for the authors):

      The authors replied to each issue I raised, and revised their manuscript accordingly. In particular, regarding my major concern (the sufficiency of the neurons for salt-"specific" memories), I think the authors found a good solution.

      I have no further comments.

      We sincerely thank the reviewer for the positive feedback on our revision. We are pleased that the revised manuscript meets the expectations and appreciate the time and effort invested in the review process.

    1. eLife Assessment

      This study provides an important understanding of the contribution of different striatal subregions, the anterior Dorsal Lateral Striatum (aDLS) and the posterior Ventrolateral Striatum (pVLS), to auditory discrimination learning. The authors have included robust behavior combined with multiple observational and perturbation techniques. The data provided are convincing of the relevance of task-related activity in these two subregions during learning.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Setogawa et al. employ an auditory discrimination task in freely moving rats, coupled with small animal imaging, electrophysiological recordings, and pharmacological inhibition/lesioning experiments to better understand the role of two striatal subregions: the anterior Dorsal Lateral Striatum (aDLS) and the posterior Ventrolateral Striatum (pVLS), during auditory discrimination learning. Attempting to better understand the contribution of different striatal subregions to sensory discrimination learning strikes me as a highly relevant and timely question, and the data presented in this study are certainly of major interest to the field. The authors have set up a robust behavioral task, systematically tackled the question about a striatal role in learning with multiple observational and manipulative techniques. Additionally, the structured approach the authors take by using neuroimaging to inform their pharmacological manipulation experiments and electrophysiological recordings is a strength.

      Comments on revisions:

      The authors have addressed some concerns raised in the initial review, but some remain. In particular, it is still unclear what conclusions can be drawn about task-related activity from scans that are performed 30 minutes after the behavioral task. I continue to think that a reorganization/analysis of the data according to event type would be useful and easier to interpret across the two brain areas, but the authors chose not to do this. Finally, switching the cue-response association, I am convinced, would help to strengthen this study.

    3. Reviewer #2 (Public review):

      The study by Setogawa et al. aims to understand the role that different striatal subregions belonging to parallel brain circuits have in associative learning and discrimination learning (S-O-R and S-R tasks). Strengths of the study are the use of multiple methodologies to measure and manipulate brain activity in rats, from microPET imaging to excitotoxic lesions and multielectrode recordings across anterior dorsolateral (aDLS), posterior ventral lateral (pVLS) and dorsomedial (DMS) striatum.

      The main conclusions are that the aDLS promotes stimulus-response association and suppresses response-outcome associations. The pVLS is engaged in the formation and maintenance of the stimulus-response association. A lot of work has been done and there are some interesting findings; however, the manuscript can be improved by clarifying the presentation and reasoning. The inclusion of important controls will enhance the rigor of the data interpretation and conclusions.

      Comments on revisions:

      The authors have made important revisions to the manuscript and it has improved in clarity. They also added several figures in the rebuttal letter to answer questions by the reviewers. I would ask that these figures are also made public as part of the authors' response or if not, included in the manuscript.

    1. eLife Assessment

      This study presents a valuable meta-analysis of two independent genome-wide association studies (GWASs) elucidating the role of plasma proteins as biomarkers for improving early detection of prostate cancer (PCa). The evidence supporting novel protein biomarkers of PCa risk is solid, although exploration of how these markers may also be shared with other prostate diseases would have strengthened the study. The work will be of interest to the field for elucidating novel variants of prostate cancer risk.

    2. Reviewer #1 (Public review):

      Summary:

      In Causal associations between plasma proteins and prostate cancer: a Proteome-Wide Mendelian Randomization, the authors present a manuscript which seeks to identify novel markers for prostate cancer through analysis of large biobank-based datasets and to extend this analysis to potential therapeutic targets for drugs. This is an area that is already extensively researched, but remains important, due to the high burden and mortality of prostate cancer globally.

      Strengths:

      The main strengths of the manuscript are the identification and use of large biobank data assets, which provide large numbers of cases and controls, essential for achieving statistical power. The databases used (deCODE, FinnGen, and the UK Biobank) allow for robust numbers of cases and controls. The analytical method chosen, Mendelian Randomization, is appropriate to the problem. Another strength is the integration of multi-omic datasets, here using protein data as well as GWAS sources to integrate genomic and proteomic data.

      Weaknesses:

      The main weaknesses of the manuscript relate to the following areas:

      (1) The failure of the study to analyse the data in the context of other closely related conditions such as benign prostatic hyperplasia (BPH) or lower urinary tract symptoms (LUTS), which have some pathways and biomarkers in common, such as inflammatory pathways (including complement) and specific markers such as KLK3. As a consequence, it is not possible for readers to know whether the findings are specific to prostate cancer or whether they are generic to prostate dysfunction. Given the prevalence of prostate dysfunction (half of men reaching their sixth decade), the potential for false positives and overtreatment from non-specific biomarkers is a major problem, resulting in the evidence presented in this manuscript being weak. Other researchers have addressed this issue using the same data sources as presented here, for example, in this paper, looking at BPH in the UK Biobank population. https://www.nature.com/articles/s41467-018-06920-9

      (2) There is no discussion of Gleason scores with regard to either biomarkers or therapies, and a general lack of discussion around indolent disease as compared with more aggressive variants. These are crucial issues with regard to the triage and identification of genomically aggressive localized prostate cancers. See, for example, the work set out in: https://doi.org/10.1038/nature20788 .

      (3) An additional issue is that the field of PCa research is fast-moving. The manuscript cites ~80 references, but too few of these are from recent studies, and many important and relevant papers are not included. The manuscript would be much stronger if it compared and contrasted its findings with more recent studies of PCa biomarkers and targets, especially those concerned with multi-omics and those including BPH.

      (4) The Methods section provides no information on how the Controls were selected. There is no Table providing cohort data to allow the reader to know whether there were differences in age, BMI, ethnic grouping, social status or deprivation, or smoking status, between the Cases and Controls. These types of data are generally recorded in Biobank data, so this sort of analysis should be possible, or if not, the authors' inability to construct an appropriately matched set of Controls should be discussed as a Limitation.

      Assessing impact:

      Because of the weaknesses of the approach identified above, without further additions to the manuscript, the likely impact of the work on the field is minimal. There is no significant utility of the methods and data to the community, because the data are pre-existing and are not newly introduced to the community in this work, and Mendelian randomization is a well-described approach in common use, and therefore, the assets and methods described in the manuscript are not novel. With regard to the authors achieving their aims, without assessing specificity and without setting their findings in the context of the latest literature, the authors (and readers) cannot know or assess whether the biomarkers identified or the druggable targets will be useful in the clinic.

      In conclusion, adding additional context and analysis to the manuscript would both help readers interpret and understand the work and would also greatly enhance its significance. For example, the UK Biobank includes data on men with BPH / LUTS, as analysed in this paper, for example, https://doi.org/10.1038/s41467-018-06920-9. By extending this analysis to identify which biomarkers and druggable targets are specific to PCa, and which are generic to prostate dysfunction, the authors would substantially reduce the risks of diagnostic false positives. This would help to manage the risks of inappropriate treatment or overtreatment.

    3. Reviewer #2 (Public review):

      This is potentially interesting work, but the analyses are attempted in a rather scattergun way, with little evident critical thought. The structure of the work (Results before Methods) can work in some manuscripts, but it is not ideal here. The authors discuss results before we know anything about the underlying data that the results come from. It gives the impression that the authors regard data as a resource to be exploited, without really caring where the data comes from. The methods can provide meaningful insights if correctly used, but while I don't have reasons to doubt that the analyses were conducted correctly, findings are presented with little discussion or interpretation. No follow-up analyses are performed.

      In summary, there are likely some gems here, but the whole manuscript is essentially the output from an analytic pipeline.

      Taking the researchers' aims in turn:

      (1) Meta-GWAS - while combining two datasets together can provide additional insights, the contribution of this analysis above existing GWAS is not clear. The PRACTICAL consortium has already reported the GWAS of 70% of these data. What additional value does this analysis provide? (Likely some, but it's not clear from the text.) Also, the presentation of results is unclear - authors state that only 5 gene regions contained variants at p < 5×10⁻⁸, but Figure 1 shows dozens of hits above 5×10⁻⁸. Also, the red line in Figure 1 (supposedly at 5×10⁻⁸) is misplaced.

      (2) Cross-phenotype analysis. It is not really clear what this analysis is, or why it is done. What is the iCPAGdb? A database? A statistical method? Why would we want to know cross-phenotype associations? What even are these? It seems that the authors have taken data from an online resource and have written a paragraph based on this existing data with little added value.

      (3) PW-MR. I can see the value of this work, but many details are unclear. Was this a two-sample MR using PRACTICAL + FinnGen data for the outcome? How many variants were used in key analyses? Again, the description of results is sparse and gives little added value.

      (4) Colocalization - seems clear to me.

      (5) Additional post-GWAS analyses (pathway + druggability) - again, the analyses seem to be performed appropriately, although little additional insight other than the reporting of output from the methods.

      Minor points:

      (6) The stated motivation for this work is "early detection". But causality isn't necessary for early detection. If the authors are interested in early detection, other analysis approaches are more appropriate.

      (7) The authors state "193 proteins were associated with PCa risk", but they are looking at MR results - these analyses test for disease associations of genetically-predicted levels of proteins, not proteins themselves.
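
      To make point (7) concrete: a common single-variant MR estimator, the Wald ratio, divides the variant-outcome association by the variant-protein association, so the resulting estimate refers to genetically predicted protein levels rather than measured protein levels. The following is a minimal sketch under that assumption. The function name and the input numbers are hypothetical and are not taken from the manuscript, and the estimator actually used by the authors may differ.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant Wald ratio MR estimate (illustrative sketch).

    beta_exposure: variant effect on the protein level (pQTL effect).
    beta_outcome:  variant effect on disease risk (log odds ratio).
    se_outcome:    standard error of beta_outcome.

    Returns (estimate, standard error, two-sided p-value).
    """
    if beta_exposure == 0:
        raise ValueError("variant has no effect on the exposure")
    # Effect on the outcome per unit of genetically predicted exposure.
    estimate = beta_outcome / beta_exposure
    # First-order approximation: ignores uncertainty in beta_exposure.
    se = se_outcome / abs(beta_exposure)
    z = estimate / se
    # Two-sided p-value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Hypothetical inputs, for illustration only.
estimate, se, p_value = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
print(estimate, se, p_value)
```

      Because the numerator and denominator are both variant-level associations, a significant result here supports a statement about genetically predicted protein levels, which is the distinction the reviewer draws.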

      Strengths:

      The data and methods used are state-of-the-art.

      Weaknesses:

      The reader will have to provide their own translational insight.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In Causal associations between plasma proteins and prostate cancer: a Proteome-Wide Mendelian Randomization, the authors present a manuscript which seeks to identify novel markers for prostate cancer through analysis of large biobank-based datasets and to extend this analysis to potential therapeutic targets for drugs. This is an area that is already extensively researched, but remains important, due to the high burden and mortality of prostate cancer globally.

      Strengths:

      The main strengths of the manuscript are the identification and use of large biobank data assets, which provide large numbers of cases and controls, essential for achieving statistical power. The databases used (deCODE, FinnGen, and the UK Biobank) allow for robust numbers of cases and controls. The analytical method chosen, Mendelian Randomization, is appropriate to the problem. Another strength is the integration of multi-omic datasets, here using protein data as well as GWAS sources to integrate genomic and proteomic data.

      Thank you for your positive feedback regarding the overall quality of our work. We greatly appreciate the time and effort you took in reviewing our manuscript.

      Weaknesses:

      The main weaknesses of the manuscript relate to the following areas:

      (1) The failure of the study to analyse the data in the context of other closely related conditions such as benign prostatic hyperplasia (BPH) or lower urinary tract symptoms (LUTS), which have some pathways and biomarkers in common, such as inflammatory pathways (including complement) and specific markers such as KLK3. As a consequence, it is not possible for readers to know whether the findings are specific to prostate cancer or whether they are generic to prostate dysfunction. Given the prevalence of prostate dysfunction (half of men reaching their sixth decade), the potential for false positives and overtreatment from non-specific biomarkers is a major problem, resulting in the evidence presented in this manuscript being weak. Other researchers have addressed this issue using the same data sources as presented here, for example, in this paper, looking at BPH in the UK Biobank population. https://www.nature.com/articles/s41467-018-06920-9

      Thank you for your valuable comment. We fully agree that biomarker development must prioritize specificity to avoid overtreatment. Our study is a foundational step toward identifying potential therapeutic targets or complementary biomarkers for prostate cancer (PCa), not a direct endorsement of these proteins for standalone clinical diagnosis. Mendelian randomization (MR) analysis strengthens causal inference by design, and we further ensured robustness through sensitivity analyses (e.g. MR-Egger regression for pleiotropy, Bonferroni correction for multiple testing). These methods distinguish true causal effects from nonspecific associations. Importantly, while PSA’s lack of specificity is widely recognized, its role in reducing PCa mortality underscores the value of biomarker-driven screening. Our findings align with the need to integrate multiple markers (e.g. combining a novel protein with PSA) to improve diagnostic precision. Translating these causal insights into clinical tools remains challenging but represents a necessary next step, and we emphasize that this work provides a rigorous starting point for future validation studies.
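
      As a minimal illustration of the Bonferroni step mentioned above (a sketch only, not the authors' pipeline; the function name and the p-values are hypothetical), the per-test threshold is the family-wise alpha divided by the number of tests:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that pass a Bonferroni-corrected threshold.

    Each p-value is compared against alpha divided by the number
    of tests, which controls the family-wise error rate at alpha.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values for three proteins, for illustration only.
flags = bonferroni_significant([0.001, 0.02, 0.04], alpha=0.05)
print(flags)
```

      The correction addresses multiple testing only; it does not by itself speak to whether an association is specific to PCa rather than to other prostate conditions.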

      (2) There is no discussion of Gleason scores with regard to either biomarkers or therapies, and a general lack of discussion around indolent disease as compared with more aggressive variants. These are crucial issues with regard to the triage and identification of genomically aggressive localized prostate cancers. See, for example, the work set out in: https://doi.org/10.1038/nature20788

      Thank you for pointing this out. We acknowledge that our original analysis did not directly address this critical issue due to a key data limitation: the publicly available GWAS summary statistics for PCa (from openGWAS and FinnGen) do not provide genetic associations stratified by phenotypic severity or molecular subtypes. This limitation precluded MR analysis of proteins specifically linked to aggressive disease. To partially bridge this gap, we integrate evidence from recent studies in the revised Discussion section to explore the relevance of potential biomarkers to aggressive PCa.

      (3) An additional issue is that the field of PCa research is fast-moving. The manuscript cites ~80 references, but too few of these are from recent studies, and many important and relevant papers are not included. The manuscript would be much stronger if it compared and contrasted its findings with more recent studies of PCa biomarkers and targets, especially those concerned with multi-omics and those including BPH.

      Thank you for your professional comments. We have rigorously updated the manuscript to include more recent publications and we systematically compare and contrast our findings with these recent studies in the revised Discussion section.

      (4) The Methods section provides no information on how the Controls were selected. There is no Table providing cohort data to allow the reader to know whether there were differences in age, BMI, ethnic grouping, social status or deprivation, or smoking status, between the Cases and Controls. These types of data are generally recorded in Biobank data, so this sort of analysis should be possible, or if not, the authors' inability to construct an appropriately matched set of Controls should be discussed as a Limitation.

      We thank the reviewer for raising this important methodological concern. We have expanded the Limitations section to state it.

      Reviewer #2 (Public review):

      This is potentially interesting work, but the analyses are attempted in a rather scattergun way, with little evident critical thought. The structure of the work (Results before Methods) can work in some manuscripts, but it is not ideal here. The authors discuss results before we know anything about the underlying data that the results come from. It gives the impression that the authors regard data as a resource to be exploited, without really caring where the data comes from. The methods can provide meaningful insights if correctly used, but while I don't have reasons to doubt that the analyses were conducted correctly, findings are presented with little discussion or interpretation. No follow-up analyses are performed.

      In summary, there are likely some gems here, but the whole manuscript is essentially the output from an analytic pipeline.

      We thank the reviewer for the thoughtful evaluation of our work.

      Taking the researchers' aims in turn:

      (1) Meta-GWAS - while combining two datasets together can provide additional insights, the contribution of this analysis above existing GWAS is not clear. The PRACTICAL consortium has already reported the GWAS of 70% of these data. What additional value does this analysis provide? (Likely some, but it's not clear from the text.) Also, the presentation of results is unclear - authors state that only 5 gene regions contained variants at p < 5×10<sup>-8</sup>, but Figure 1 shows dozens of hits above 5×10<sup>-8</sup>. Also, the red line in Figure 1 (supposedly at 5×10<sup>-8</sup>) is misplaced.

      Thank you very much for your feedback. Although the PRACTICAL consortium constituted the majority of PCa GWAS data, our meta-analysis integrating FinnGen data enhanced statistical power, enabling more robust detection of low-frequency variants. Moreover, FinnGen's Finnish ancestry (genetic isolate) helps distinguish population-specific effects. The presentation of results showed the top 5 gene regions that contained variants at p < 5×10<sup>-8</sup>. We apologize for not noticing that the red line was not displayed correctly in the original figures included in the manuscript. We have updated it in the revised manuscript.

      (2) Cross-phenotype analysis. It is not really clear what this analysis is, or why it is done. What is the iCPAGdb? A database? A statistical method? Why would we want to know cross-phenotype associations? What even are these? It seems that the authors have taken data from an online resource and have written a paragraph based on this existing data with little added value.

      We thank you for raising this issue. The iCPAGdb (interactive Cross-Phenotype Analysis of GWAS database) is an integrative platform that systematically identifies cross-phenotype associations and evaluates genetic pleiotropy by leveraging LD-proxy associations from the NHGRI-EBI GWAS Catalog. The pathogenesis and progression of prostate cancer constitute a complex pathophysiological continuum characterized by dynamic multisystem interactions, extending beyond singular molecular pathway dysregulation to encompass coordinated disruptions across endocrine regulation, immune microenvironment remodeling, and metabolic reprogramming. Cross-phenotype analysis is therefore indispensable for discriminating primary pathogenic drivers from secondary compensatory responses, ultimately informing the development of precision therapeutic strategies.

      (3) PW-MR. I can see the value of this work, but many details are unclear. Was this a two-sample MR using PRACTICAL + FinnGen data for the outcome? How many variants were used in key analyses? Again, the description of results is sparse and gives little added value.

      We thank you for raising this issue. Two-sample MR refers to an analytical design where genetic instruments for the exposure (plasma proteins) and genetic associations with the outcome (PCa) are derived from non-overlapping populations. This ensures complete sample independence between exposure and outcome datasets to avoid confounding biases, regardless of whether the outcome data originate from single or multiple cohorts. The meta-analysis of PRACTICAL and FinnGen GWAS generates 27,210 quality-controlled variants (p < 5×10<sup>-8</sup>, MAF ≥ 1%, LD-clumped r<sup>2</sup> < 0.1) used in key analyses.
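
      For illustration only, the two-sample design described above can be sketched with an inverse-variance weighted (IVW) estimator, a common way to combine per-variant Wald ratios. The response does not state which estimator was used, so IVW is an assumption here, as are the function name `ivw_estimate` and the three hypothetical instruments; exposure and outcome effects are taken to come from non-overlapping samples.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    Each variant contributes a Wald ratio (outcome effect / exposure effect),
    weighted by the inverse of its first-order variance.
    """
    ratios = [bo / be for be, bo in zip(beta_exposure, beta_outcome)]
    ratio_ses = [so / abs(be) for be, so in zip(beta_exposure, se_outcome)]
    weights = [1.0 / se**2 for se in ratio_ses]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    return estimate, math.sqrt(1.0 / total_weight)


# Hypothetical summary statistics for three independent instruments
# (illustrative values only, not taken from the manuscript).
beta_exposure = [0.10, 0.20, 0.40]   # variant effect on protein level
beta_outcome = [0.05, 0.10, 0.20]    # variant effect on log-odds of PCa
se_outcome = [0.01, 0.01, 0.01]      # standard errors of the outcome effects

estimate, std_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(f"IVW estimate = {estimate:.3f}, SE = {std_error:.3f}")
```

      In this toy input every Wald ratio is the same, so the pooled estimate equals that common ratio; variants with stronger exposure effects simply receive more weight.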

      (4) Colocalization - seems clear to me.

      (5) Additional post-GWAS analyses (pathway + druggability) - again, the analyses seem to be performed appropriately, although little additional insight other than the reporting of output from the methods.

      The post-MR druggability and pathway analyses serve two primary scientific purposes: (1) therapeutic prioritization - systematically evaluating which MR-identified proteins represent tractable drug targets (either through existing FDA-approved agents or compounds in clinical development) with direct relevance to cancer or PCa management, and (2) mechanistic hypothesis generation - mapping these candidate proteins to coherent biological pathways to guide future functional validation studies investigating their causal roles in prostate carcinogenesis.

      Minor points:

      (6) The stated motivation for this work is "early detection". But causality isn't necessary for early detection. If the authors are interested in early detection, other analysis approaches are more appropriate.

      We appreciate your insightful feedback. While early detection is one motivation for this work, our primary goal extends to identifying causally implicated proteins that may serve as intervention targets for PCa prevention or therapy. Establishing causality is critical for distinguishing biomarkers that drive disease pathogenesis from those that are secondary to disease progression, as the former hold greater specificity for early detection and prioritization of therapeutic targets. While we acknowledge that validation for early detection may require additional methodologies, MR analysis provides a foundational step by prioritizing candidate proteins with causal links to disease. This approach ensures that downstream efforts focus on biomarkers and targets with the greatest potential to alter disease trajectories, rather than merely correlative markers.

      (7) The authors state "193 proteins were associated with PCa risk", but they are looking at MR results - these analyses test for disease associations of genetically-predicted levels of proteins, not proteins themselves.

      In MR, the exposure of interest is the lifelong effect of genetically predicted protein levels. This approach is designed to infer causality while avoiding confounding and reverse causation, as genetic variants are fixed at conception and unaffected by disease processes. When we state “193 proteins were associated with PCa risk,” we specifically refer to proteins whose genetically predicted levels (based on instrument SNPs from protein QTLs) show causal links to PCa. Importantly, MR does not measure the direct association between observed protein concentrations and disease. Instead, it estimates the lifelong causal effect of protein levels predicted by genetics. This distinction is critical for disentangling cause from consequence. For example, a protein elevated due to tumor progression would not be identified as causal in MR if its genetic predictors are unrelated to PCa risk.

      We acknowledge that clinical translation requires further validation of these proteins in observational studies measuring actual protein levels. However, MR provides a robust first step by prioritizing candidates with causal roles, thereby reducing the risk of investing in biomarkers confounded by disease processes.

    1. eLife Assessment

      This study presents a valuable meta-analysis that highlights low and highly variable breast cancer survival rates across Africa, emphasizing the pressing need for public health in Africa. The evidence supporting the claims of the authors is solid, although a clarification of the crude 5-year survival rates would have strengthened the study. The work will be of interest to scientists working in the field of public health and breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      This meta-analysis synthesized data from 79 studies across 22 African countries, encompassing over 27,000 breast cancer patients, to estimate 5-year survival rates. The pooled survival rate was 48%, with substantial regional variation, ranging from 64% in Northern Africa to 32% in Western Africa. Survival outcomes were associated with socioeconomic indicators such as education level, Human Development Index (HDI), and Socio-demographic Index (SDI). Although no significant differences in survival were observed between sexes, non-Black Africans had better outcomes. Despite global advances in cancer care, breast cancer survival in Africa has largely stagnated since the early 2010s, underscoring the need for improved healthcare infrastructure, early detection, and equitable access to treatment.

      Strengths:

      The study has several strengths. It features a comprehensive literature search, adherence to the PRISMA reporting guideline, and prospective registration on PROSPERO, including documentation of protocol deviations. The authors employed rigorous meta-analytic techniques, including subgroup analyses and meta-regression, allowing for a nuanced investigation of potential effect modifiers.

      Weaknesses:

      Analyses of crude 5-year survival rates are inherently difficult to interpret, particularly in the absence of key clinical variables such as stage at diagnosis or whether cancers were detected through screening programs. This omission raises concerns about lead time bias, where earlier diagnosis (e.g., via screening) may falsely appear to improve survival without affecting actual mortality. The higher survival seen in North Africa, for example, may reflect earlier diagnosis rather than improved prognosis or care quality. In this context, the age of the study population is another important aspect.
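
      The lead time concern can be illustrated with a small worked example, given here as a sketch under stated assumptions: the cohort is hypothetical, ages at death are held fixed, and "screening" only moves each diagnosis two years earlier. The helper name `five_year_survival` is introduced for illustration.

```python
def five_year_survival(diagnosis_ages, death_ages):
    """Fraction of patients still alive five years after diagnosis."""
    alive = sum(
        1 for dx, death in zip(diagnosis_ages, death_ages) if death - dx >= 5
    )
    return alive / len(diagnosis_ages)


# Hypothetical cohort: age at death is identical in both scenarios.
death_ages = [58, 60, 62, 64, 66]

# Scenario A: diagnosis at symptom onset.
clinical_dx = [56, 57, 58, 58, 59]

# Scenario B: screening detects the same cancers two years earlier,
# with no change in when any patient dies.
screen_dx = [age - 2 for age in clinical_dx]

survival_clinical = five_year_survival(clinical_dx, death_ages)
survival_screened = five_year_survival(screen_dx, death_ages)
print(survival_clinical, survival_screened)
```

      Crude 5-year survival differs between the two scenarios even though no death is postponed, which is why stage at diagnosis and mode of detection matter when comparing regions.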

      Relatedly, the representativeness of the included study populations is unclear. The data sources for individual studies - whether from national cancer registries or single tertiary hospitals - are not systematically reported. This distinction is crucial, as survival outcomes differ significantly between population-based and hospital-based cohorts. Without this contextual information, the generalizability of the findings is limited.

      The meta-regression analyses further raise concerns. The authors use study-level covariates (e.g., national HDI, average years of schooling) to explain variation in survival, yet they do not acknowledge the risk of ecological bias. Inferring individual-level effects from aggregated data is methodologically flawed, and the authors' causal interpretation of these associations is inappropriate, especially given the potential for confounding by unmeasured variables at both the individual and study levels.

      The assessment of publication bias is similarly problematic. While funnel plot asymmetry and a significant Egger's test are interpreted as evidence of bias, such methods are unreliable in meta-analyses of observational studies. Smaller studies may differ meaningfully from larger ones, not due to selective reporting, but because they may recruit patients from specialized tertiary centers where outcomes are poorer. The observed relationship between study size and survival may therefore reflect true differences in patient populations, not publication bias.

      Despite claiming to search for gray literature via Google Scholar, no such studies appear in the PRISMA flowchart. This is a missed opportunity. Gray literature - especially reports from cancer registries - could have enhanced the quality and completeness of the data. While cancer registration systems are not available in all African countries, several do exist, and the authors should have made greater efforts to incorporate routine surveillance data where available. Mortality data from vital statistics systems, available in some countries, could also have provided useful context or validation.

      The study's approach to quality assessment is limited. The scoring tool, adapted from Ssentongo et al., conflates completeness of reporting with risk of bias and fails to address key domains such as study population representativeness, selection bias, and lead time bias. Rather than calculating an overall quality score, the authors should have used a structured tool that evaluates risk of bias across defined domains, such as ROBINS-I, ROBINS-E, or tools developed for prevalence studies (e.g., Tonia et al., BMJ Mental Health, 2023). Cochrane guidance and the textbook by Egger, Higgins, and Davey Smith (DOI:10.1002/9781119099369) provide valuable resources for this purpose.

      The cumulative meta-analysis is not particularly informative, considering the massive heterogeneity in survival rates. It would be more meaningful to stratify the analysis by calendar period. In general, with such important heterogeneity, the calculation of an overall estimate does not add much.

      The authors spend quite some time discussing differences in survival between men and women and between the current and the 2018 estimates, despite the fact that the survival rates are similar, with widely overlapping confidence intervals. The use of a Z-test in this context is inappropriate as it ignores the heterogeneity between studies.
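
      To make the point about heterogeneity concrete, the sketch below contrasts a fixed-effect pooled standard error with a DerSimonian-Laird random-effects one. It is an illustration under stated assumptions, not the manuscript's analysis: the three study estimates and variances are hypothetical, and the function names are introduced here.

```python
import math


def fixed_effect(estimates, variances):
    """Inverse-variance pooled estimate and SE, assuming no heterogeneity."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, SE, and tau^2 (DerSimonian-Laird)."""
    k = len(estimates)
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_fe, _ = fixed_effect(estimates, variances)
    # Cochran's Q: weighted dispersion of study estimates around the pooled value.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, estimates))
    c = total - sum(w**2 for w in weights) / total
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / re_total
    return pooled, math.sqrt(1.0 / re_total), tau2


# Hypothetical subgroup: three equally precise studies with divergent results.
estimates = [0.30, 0.50, 0.70]
variances = [0.01, 0.01, 0.01]

_, se_fixed = fixed_effect(estimates, variances)
pooled_re, se_random, tau2 = dersimonian_laird(estimates, variances)
print(f"fixed SE = {se_fixed:.3f}, random SE = {se_random:.3f}, tau^2 = {tau2:.3f}")
```

      A Z-test built on the fixed-effect standard error treats the subgroup estimate as far more precise than the between-study spread justifies; the random-effects standard error is wider because it includes the estimated between-study variance.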

      Minor point:

      The terms retrospective and prospective are not particularly helpful - every longitudinal study of survival is retrospective. What the authors mean is whether or not the data were collected within a study designed to address this question, or whether existing data were used that were collected for another purpose. See also DOI: 10.1136/bmj.302.6771.249.

    3. Reviewer #2 (Public review):

      Summary:

      The study provides an updated literature review and meta-analysis for the 5-year survival estimates in breast cancer patients across continental Africa. The findings reveal substantial disparities between regions and other factors, highlighting the disadvantaged areas in Africa and the urgent need to address these inequities across the continent.

      Strengths:

      The main strengths of this study include:

      (1) the thorough literature search with an increasing number of included studies that enhances result reliability;

      (2) standard and appropriate statistical methods with clear reporting;

      (3) a comprehensive discussion.

      Overall, the paper is well-structured, clearly presented, and provides useful insights.

      Weaknesses:

      However, I have a few concerns that I would like the authors to address.

      (1) The conclusion "A country-wise comparison with 2018 estimates suggests a declining survival tendency, with WHO AFRO countries reporting the poorest estimates among other WHO regions." appears to have been drawn from the comparisons across both different regions and different time periods, which is incorrect! As shown in Figure 8, survival in Africa has increased from below 30% (WHO AFRO 2017) to around 50% (AFRICA 2024, presumably the current study). Section 3.5 is confusing and headed in the wrong direction. The key message in Figure 8 is that the current survival estimate in Africa is still lower than that of other WHO regions from a few years ago, highlighting the urgent need to improve survival in Africa.

      (2) The previous review by Ssentongo et al. classified countries into North Africa and sub-Saharan Africa (SSA), regions divided by the Sahara Desert. This classification is not only geographically based, but also accounts for the significant differences in ethnicity, health systems, and socioeconomic factors. North Africa (especially Egypt, Tunisia, Morocco) has better cancer registries, earlier detection, more treatment access, and therefore better survival outcomes (as shown in Figure 2). SSA tends to have worse outcomes, due to later-stage diagnosis, limited pathology, and access barriers. Given that the survival in women with breast cancer is among the lowest for several SSA countries, the study would benefit from an additional comparison between pooled estimates of North Africa and SSA, and comparisons with previous pooled estimates.

      (3) The authors classified studies under the female group. Females constituted at least 80% of the sample population, and subgroup analysis revealed only a marginal discrepancy in survival rates between the two sexes. However, most of the breast cancer patients and related studies consist predominantly of females. Given the non-negligible differences in various aspects between females and males, sensitivity analyses restricted to studies among females (as in Figure 2-3) would be informative for future research in breast cancer patients.

      (4) Stage at diagnosis and treatment are the strongest prognostic factors for breast cancer survival. Though data regarding these variables are not available for all studies, and it's complicated to compare or pool the results from different studies (as mentioned in the limitation), could the authors perform the regression analyses regarding early vs. late stages, and the percentage of treatment received? These two factors are too significant to overlook in studies on breast cancer survival.

      (5) The authors reported that studies published before 2019 had a higher survival than those conducted from 2019 onwards, which could be misleading and requires further explanation. As the authors noted, "the year of publication may not be a reliable measure of the effect in question"; a better approach would be to use the year of inclusion, i.e., the year the studies were conducted.

      (6) Northern and Western Africa both have the highest incidence of breast cancer in Africa, yet their 5-year survival estimates differ substantially. However, the authors have discussed the survival disparities without considering their similarly higher incidence rates. Could this disparity reflect different contributing factors, with the higher incidence rate in Northern Africa resulting from better screening programs (leading to more detections), while that in Western Africa stems from other epidemiological factors despite lower screening participation? Though the incidence rate is not the primary focus of this study, briefly exploring this dichotomy would enhance the discussion and provide valuable insights for readers.

    4. Author response:

      We thank the reviewing editors, senior editors, and reviewers for their time, efforts, and constructive feedback. We believe the points raised are addressable and we would like to proceed with a revised submission for further review. Specifically, we plan the following revisions:

      Editor’s Comments

      We will clarify study definitions to ensure the meaning of "5-year crude overall survival time" is explicit for readers.

      Reviewer 1 Comments

      - Clarify and supplement the work with detailed sources of study origin (cancer registries or single-center cohorts).

      - Conduct a multi-level hierarchical meta-analysis to address concerns of ecological fallacy in interpreting results.

      - Perform an ecological sensitivity analysis and clarify findings regarding small study effects.

      - Expand the search base significantly by including African local databases; preliminary searches have identified over 50 potentially eligible doctoral theses, dissertations, local journal articles, and gray literature, potentially adding data from five or more additional countries.

      Reviewer 2 Comments

      - Conduct subgroup analyses by sex and assess the influence of the percentage of males in mixed cohorts.

      - Enhance the limited meta-analysis and provide supplementary full forest plots for all analyses.

      - Clarify phrasing in sections identified by the reviewer.

      Additional Planned Clarifications and Analyses

      - Elucidate the role of cumulative meta-analysis in mitigating lead-time bias.

      - Include supplementary cumulative meta-analysis based on the year of investigation (instead of publication year).

      - Perform subgroup analyses by clinical staging, TNM grading, and treatment modalities where data from ≥10 studies is available.

      - Expand discussion on the merits of quality assessment versus risk of bias evaluation in large scale epidemiological and observational studies, in line with other studies of this scale.

      - Condense the comparison with 2018 estimates, as per reviewer suggestions.

      Clarification Regarding SSA vs. AU Classification

      We do not intend to compare survival between "Sub-Saharan Africa" (SSA) and North Africa, as this binary classification is historically rooted and does not reflect current African Union (AU) administrative or policy groupings. Our regional analyses will adhere to the AU’s contemporary regional framework to better reflect political, cultural, and healthcare system realities.

      On Registry Data

      We will clarify that we will not extract raw registry data, as such data is typically unprocessed and does not provide 5-year overall survival metrics. Extracting raw, individual-level data from registries or vital statistics systems therefore falls outside the methodological scope of a meta-analysis. Meta-analyses are designed to synthesize published survival estimates or those available from reports where survival analyses have already been conducted. Utilizing raw surveillance data would require primary data processing and survival analysis, effectively creating new data rather than synthesizing existing results. This would represent a distinct study design, such as a pooled analysis or original cohort study, rather than a meta-analysis. Where registry reports present summary survival estimates (e.g., 5-year overall survival) in a format compatible with meta-analysis, we will certainly include them.

      All planned additional analyses will depend on data quality, consistency, and feasibility for pooling using state-of-the-art statistical techniques. Where pooling is not possible, we will transparently report limitations.

    1. eLife Assessment

      This valuable study provides a conceptual advance in our understanding of how membrane geometry modulates the balance between specific and non-specific molecular interactions, reversing multiphase morphologies in postsynaptic protein assemblies. Using a mesoscale simulation framework grounded in experimental binding affinities, the authors successfully recapitulate key experimental observations in both solution and membrane-associated systems, providing novel mechanistic insight into how spatial constraints regulate postsynaptic condensate organization. While the evidence supporting the conclusions is largely solid, a few aspects of the analysis and model proposed remain incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses mesoscale simulations to investigate how membrane geometry regulates the multiphase organization of postsynaptic condensates. It reveals that dimensionality shifts the balance between specific and non-specific interactions, thereby reversing domain morphology observed in vitro versus in vivo.

      Strengths:

      The model is grounded in experimental binding affinities, reproduces key experimental observations in 3D and 2D contexts, and offers mechanistic insight into how geometry and molecular features drive phase behavior.

      Weaknesses:

      The model omits other synaptic components that may influence domain organization and does not extensively explore parameter sensitivity or broader physiological variability.

    3. Reviewer #2 (Public review):

      This is a timely and insightful study aiming to explore the general physical principles for the sub-compartmentalization--or lack thereof--in the phase separation processes underlying the assembly of postsynaptic densities (PSDs), especially the markedly different organizations in three-dimensional (3D) droplets on one hand and the two-dimensional (2D) condensates associated with a cellular membrane on the other. Simulation of a highly simplified model (one bead per protein domain) is carefully executed. Based on a thorough consideration of various control cases, the main conclusion regarding the trade-off between repulsive excluded volume interactions and attractive interactions among protein domains in determining the structures of 3D vs 2D model PSD condensates is quite convincing. The results in this manuscript are novel; however, as it stands, there is substantial room for improvement in the presentation of the background and the findings of this work. In particular, (i) conceptual connections with prior works should be better discussed, (ii) essential details of the model should be clarified, and (iii) the generality and limitations of the authors' approach should be better delineated. Specifically, the following items should be addressed (with the additional references mentioned below cited and discussed):

      (1) Excluded volume effects are referred to throughout the text by various terms and descriptions such as "repulsive force according to the volume" (e.g., in the Introduction), "nonspecific volume interaction", and "volume effects". This is somewhat curious and not conducive to clarity, because these terms have, or connote, alternate meanings (e.g., in biomolecular modeling, repulsive interactions usually refer to those with longer spatial ranges, such as that between like charges). It will be much clearer if the authors simply refer to excluded volume interactions as excluded volume interactions (or effects).

      (2) Regarding the impact of excluded volume effects on subcompartmentalization of condensates ("multiple phases" in the authors' terminology), it has been demonstrated by both coarse-grained molecular dynamics and field-theoretic simulations that excluded volume is conducive to demixing of molecular species in condensates [Pal et al., Phys Rev E 103:042406 (2021); see especially Figures 4-5 of this reference]. This prior work bears directly on the authors' observation. Its relationship with the present work should be discussed.

      (3) In the present model setup, activation of the CaMKII kinase affects only its binding to GluN2Bc. This approach is reasonable and leads to model predictions that are essentially consistent with the experiment. More broadly, however, do the authors expect activation of the CaMKII kinase to lead to phosphorylation of some of the molecular species involved with PSDs? This may be of interest since biomolecular condensates are known to be modulated by phosphorylation [Kim et al., Science 365:825-829 (2019); Lin et al., eLife 13:RP100284 (2025)].

      (4) The forcefield for confinement of AMPAR/TARP and NMDAR/GluN2Bc to 2D should be specified in the main text. Have the authors explored the sensitivity of their 2D findings on the strength of this confinement?

      (5) Some of the labels in Figure 1 are confusing. In Figure 1A, the structure labeled as AMPAR has the same shape as the structure labeled as TARP in Figure 1B, but TARP is labeled as one of the smaller structures (like small legs) in the lower part of AMPAR in Figure 1A. Does the TARP in Figure 1B correspond to the small structures in the lower part of AMPAR? If so, this should be specified (and better indicated graphically), and in that case, it would be better not to use the same structural drawing for the overall structure and a substructure. The same issue is seen for NMDAR in Figure 1A and GluN2Bc in Figure 1B.

      (6) In addition to clarifying Figure 1, the authors should clarify the usage of AMPAR vs TARP and NMDAR vs GluN2Bc in other parts of the text as well.

      (7) The physics of the authors' model will be much clearer if they provide an easily accessible graphical description of the relative interaction strengths between different domain-representing spheres (beads) in their model. For this purpose, a representation similar to that given by Feric et al., Cell 165:1686-1697 (2016) (especially Figure 6B in this reference) of the pairwise interactions among the beads in the authors' model should be provided as an additional main-text figure. Different interaction schemes corresponding to inactive and activated CaMKII should be given. In this way, the general principles (beyond the PSD system) governing 3D vs 2D multiple-component condensate organization can be made much more apparent.

      (8) Can the authors' rationalization of the observed difference between 3D and 2D model PSD condensates be captured by an intuitive appreciation of the restriction on favorable interactions by steric hindrance and the reduction in interaction cooperativity in 2D vs 3D?

      (9) In the authors' model, the propensity to form 2D condensates is quite weak. Is this prediction consistent with the experiment? Real PSDs do form 2D condensates around synapses.

      (10) More theoretical context should be provided in the Introduction and/or Discussion by drawing connections to pertinent prior works on physical determinants of co-mixing and de-mixing in multiple-component condensates (e.g., amino acid sequence), such as Lin et al., New J Phys 19:115003 (2017) and Lin et al., Biochemistry 57:2499-2508 (2018).

      (11) In the discussion of the physiological/neurological significance of PSD in the Introduction and/or Discussion, for general interest it is useful to point to a recently studied possible connection between the hydrostatic pressure-induced dissolution of model PSD and high-pressure neurological syndrome [Lin et al., Chem Eur J 26:11024-11031 (2020)].

      (12) It is more accurate to use "perpendicular to the membrane" rather than "vertical" in the caption for Figure 3E and other such descriptions of the orientation of the CaMKII hexagonal plane in the text.

    4. Reviewer #3 (Public review):

      Summary:

      In this work, Yamada, Brandani, and Takada have developed a mesoscopic model of the interacting proteins in the postsynaptic density. They have performed simulations, based on this model and using the software ReaDDy, to study the phase separation in this system in 2D (on the membrane) and 3D (in the bulk). They have carefully investigated the reasons behind different morphologies observed in each case, and have looked at differences in valency, specific/non-specific interactions, and interfacial tension.

      Strengths:

      The simulation model is developed very carefully, with strong reliance on binding valency and geometry, experimentally measured affinities, and physical considerations like the hydrodynamic radii. The presented analyses are also thorough, and great effort has been put into investigating different scenarios that might explain the observed effects.

      Weaknesses:

      The biggest weakness of the study, in my opinion, has to do with a lack of more in-depth physical insight about phase separation. For example, the authors express surprise about similar interactions between components resulting in different phase separation in 2D and 3D. This is not surprising at all, as in 3D, higher coordination numbers and more available volume translate to lower free energy, which easily explains phase separation. The role of entropy is also significantly missing from the analyses. When interaction strengths are small, entropic effects play major roles.

      In the introduction, the authors present an oversimplified view of associative and segregative phase transitions based on the attractive and repulsive interactions, and I'm afraid that this view, in which all the observed morphologies should have clear pairwise enthalpic explanations, diffuses throughout the analysis. Meanwhile, I believe the authors correctly identify some relevant effects, where they consider specific/non-specific interactions, or when they investigate the reduced valency of CaMKII in the 2D system.

      Also, I sense some haste in comparing the findings with experimental observations. For example, the authors mention that "For the current four component PSD system, the product of concentrations of each molecule in the dilute phase is in good agreement with that of the experimental concentrations (Table S2)." But the data used here are from the dilute phase, which is the remnant of a system prepared at very high concentrations and allowed to phase separate. The errors reported in Table S2 already cast doubt on this comparison. Likewise, while the 2D system is prepared by confining the particles to the vicinity of the membrane, the different diffusive behavior in the membrane, in contrast to the bulk (i.e., the Saffman-Delbrück model), is not considered. This makes it difficult to interpret the results of a coupled 2D/3D system and compare them to the actual system.
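
      To give a sense of scale for the diffusion point, the following minimal sketch compares the Stokes-Einstein estimate for a sphere in bulk solvent with the Saffman-Delbrück estimate for a cylindrical inclusion in a membrane. All parameter values (membrane viscosity, membrane thickness, particle radii, temperature) are illustrative assumptions and are not taken from the manuscript or from the authors' ReaDDy model.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def stokes_einstein(radius_m, temperature_k=300.0, viscosity_pa_s=1.0e-3):
    """Bulk (3D) diffusion coefficient of a sphere, in m^2/s."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def saffman_delbruck(radius_m, temperature_k=300.0,
                     membrane_viscosity_pa_s=0.1,    # assumed value
                     membrane_thickness_m=4.0e-9,    # assumed value
                     solvent_viscosity_pa_s=1.0e-3):
    """Lateral diffusion coefficient of a membrane inclusion, in m^2/s.

    Only valid when the radius is much smaller than
    membrane_viscosity * thickness / solvent_viscosity.
    """
    surface_viscosity = membrane_viscosity_pa_s * membrane_thickness_m
    log_term = math.log(
        surface_viscosity / (solvent_viscosity_pa_s * radius_m)
    ) - EULER_GAMMA
    return K_B * temperature_k * log_term / (4.0 * math.pi * surface_viscosity)


if __name__ == "__main__":
    for radius_nm in (2.0, 4.0):
        radius_m = radius_nm * 1e-9
        d_bulk = stokes_einstein(radius_m) * 1e12      # um^2/s
        d_memb = saffman_delbruck(radius_m) * 1e12     # um^2/s
        print(f"a = {radius_nm} nm: bulk {d_bulk:.1f}, membrane {d_memb:.2f} um^2/s")
```

      Under these assumed values, the membrane estimate is much smaller than the bulk one and depends only logarithmically on particle size, whereas the bulk estimate scales as the inverse radius.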

    5. Author response:

      We thank all the reviewers for their thoughtful comments on our submitted manuscript.

      The main points made by all three reviewers were: to discuss the synaptic components omitted from the model and explore parameter sensitivity and broader physiological variability; to provide deeper physical insights into phase separation; and to clarify terminology and provide better presentation and context in relation to previous studies.

      We fully agree with the first point, suggesting that parameter sensitivity and broader physiological variability should be explored. Our model omits scaffold proteins such as GKAP, Shank and Homer, which are present at the bottom of the PSD hierarchy. In addition, there are many other interactions in PSDs whose affinity is altered by phosphorylation, and the phase separation state of the condensate is likely to be affected by ionic concentration and other environmental factors. We will include a more detailed discussion of these environmental factors and of the limitations of our study in the Discussion section. Furthermore, regarding the sensitivity of the parameters, the reviewer is right that the membrane potential parameter is important, since it directly regulates the difference between the 3D and 2D systems. We plan to verify this by changing the strength of the membrane potential and rerunning the simulations to see how much it affects the morphology of the condensates.

      The second point is that we should provide deeper physical insight into phase separation in different dimensions. It would not be straightforward to directly estimate the entropy of the system due to the nature of the model. However, as pointed out, the difference in phase behavior can be elucidated through various simplified theories such as the lattice model. In this context, the reduced coordination number in 2D systems compared to 3D systems, and the decreased pseudo-attractive force due to the depletion effect, can offer rationalizations. We would like to add some theoretical discussion of these aspects with equations.
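
      The coordination-number argument can be illustrated with the textbook mean-field (regular solution) lattice model, in which a symmetric binary mixture demixes when chi = z * delta_eps / kT exceeds 2, where z is the number of nearest neighbours. The sketch below is only a generic illustration under that assumption: the exchange energy of 0.4 kT, the square (z = 4) and simple cubic (z = 6) lattices, and the function names are hypothetical and are not derived from the authors' model.

```python
import math

CHI_CRITICAL = 2.0  # mean-field demixing threshold for a symmetric binary mixture


def chi_parameter(coordination_number, exchange_energy_over_kt):
    """Interaction parameter chi = z * delta_eps / kT."""
    return coordination_number * exchange_energy_over_kt


def mixing_free_energy(phi, chi):
    """Free energy of mixing per site, in units of kT (0 < phi < 1)."""
    entropy_term = phi * math.log(phi) + (1.0 - phi) * math.log(1.0 - phi)
    return entropy_term + chi * phi * (1.0 - phi)


def curvature_at_half(chi):
    """Second derivative of the free energy at phi = 0.5, equal to 4 - 2 * chi.

    A negative value means the mixed state is unstable.
    """
    return 4.0 - 2.0 * chi


def demixes(coordination_number, exchange_energy_over_kt):
    """True if the mean-field criterion predicts phase separation."""
    return chi_parameter(coordination_number, exchange_energy_over_kt) > CHI_CRITICAL


if __name__ == "__main__":
    exchange_energy = 0.4  # in units of kT, illustrative value
    for name, z in (("2D square", 4), ("3D simple cubic", 6)):
        chi = chi_parameter(z, exchange_energy)
        print(name, "chi =", round(chi, 2), "demixes:", demixes(z, exchange_energy))
```

      With the same pairwise exchange energy, the 3D lattice crosses the demixing threshold while the 2D lattice does not, because each site has fewer neighbours in 2D. Mean-field theory neglects fluctuations, which matter more in 2D, so this should be read as a qualitative guide only.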

      Third, we will clarify terminology and provide better explanation in relation to previous studies. In some parts of the manuscript, such as the descriptions of receptor-containing complexes, the terminology was inconsistent and figure annotations were missing. We will improve the wording and visualization in the text for further clarity and add relevant references, as suggested by the reviewers.

      As also suggested, the simulation and analysis scripts, together with the initial structure obtained, will be deposited on Zenodo or GitHub.

    1. eLife Assessment

      This work presents an important technical advancement with the release of MorphoNet 2.0, a user-friendly, standalone platform for 3D+T segmentation and analysis in biological imaging. The authors provide convincing evidence of the tool's capabilities through illustrative use cases, though broader validation against current state-of-the-art tools would strengthen its position. The software's accessibility and versatility make it a resource that will be of value for the bioimaging community, particularly in specialized subfields.

    2. Reviewer #1 (Public review):

      The authors present a substantial improvement to their existing tool, MorphoNet, intended to facilitate assessment of 3D+t cell segmentation and tracking results, and curation of high-quality analysis for scientific discovery and data sharing. These tools are provided through a user-friendly GUI, making them accessible to biologists who are not experienced coders. Further, the authors have re-developed this tool to be a locally installed piece of software instead of a web interface, making the analysis and rendering of large 3D+t datasets more computationally efficient. The authors evidence the value of this tool with a series of use cases, in which they apply different features of the software to existing datasets and show the improvement to the segmentation and tracking achieved.

      While the computational tools packaged in this software are familiar to readers (e.g., cellpose), the novel contribution of this work is the focus on error correction. The MorphoNet 2.0 software helps users identify where their candidate segmentation and/or tracking may be incorrect. The authors then provide existing tools in a single user-friendly package, lowering the threshold of skill required for users to get maximal value from these existing tools. To help users apply these tools effectively, the authors introduce a number of unsupervised quality metrics that can be applied to a segmentation candidate to identify masks and regions where the segmentation results are noticeably different from the majority of the image.
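
      As a generic illustration of what flagging masks that differ "from the majority of the image" can look like, the sketch below applies a robust z-score (median and median absolute deviation) to a per-object metric such as volume. This is an assumed approach for illustration only and is not MorphoNet's implementation; the function names, the 3.5 threshold, and the example volumes are hypothetical.

```python
from statistics import median


def robust_z_scores(values):
    """Robust z-scores based on the median and the median absolute deviation."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All values (or most of them) are identical: nothing can be ranked.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so the score is comparable to a Gaussian z-score.
    return [0.6745 * (v - center) / mad for v in values]


def flag_outliers(metric_by_label, threshold=3.5):
    """Return the labels whose metric deviates strongly from the majority."""
    labels = sorted(metric_by_label)
    scores = robust_z_scores([metric_by_label[label] for label in labels])
    return [label for label, score in zip(labels, scores) if abs(score) > threshold]


if __name__ == "__main__":
    # Hypothetical object volumes in voxels, keyed by mask label.
    volumes = {1: 510, 2: 495, 3: 502, 4: 488, 5: 1020, 6: 505, 7: 120}
    print(flag_outliers(volumes))  # labels 5 and 7 are far from the majority
```

      A flagged mask is only a candidate for inspection: a very large or very small object may be a segmentation error or a genuine but rare phenotype, which is why such flags need human or ground-truth confirmation.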

      This work is valuable to researchers who are working with cell microscopy data that requires high-quality segmentation and tracking, particularly if their data are 3D time-lapse and thus challenging to segment and assess. The MorphoNet 2.0 tool that the authors present is intended to make the iterative process of segmentation, quality assessment, and re-processing easier and more streamlined, combining commonly used tools into a single user interface.

      One of the key contributions of the work is the unsupervised metrics that MorphoNet 2.0 offers for segmentation quality assessment. These metrics are used in the use cases to identify low-quality instances of segmentation in the provided datasets, so that they can be improved with plugins directly in MorphoNet 2.0. However, not enough consideration is given to demonstrating that optimizing these metrics leads to an improvement in segmentation quality. For example, in Use Case 1, the authors report their metrics of interest (Intensity offset, Intensity border variation, and Nuclei volume) for the uncurated silver truth, the partially curated and fully curated datasets, but this does not evidence an improvement in the results. Additional plotting of the distribution of these metrics on the Gold Truth data could help confirm that the distribution of these metrics now better matches the expected distribution.

      Similarly, in Use Case 2, visual inspection leads us to believe that the segmentation generated by the Cellpose + Deli pipeline (shown in Figure 4d) is an improvement, but a direct comparison of agreement between segmented masks and masks in the published data (where the segmentations overlap) would further evidence this.

      We would appreciate the authors addressing the risk of decreasing the quality of the segmentations by applying circular logic with their tool; MorphoNet 2.0 uses unsupervised metrics to identify masks that do not fit the typical distribution. A model such as StarDist can be trained on the "good" masks to generate more masks that match the most common type. This leads to a more homogeneous segmentation quality, without consideration for whether these metrics actually optimize the segmentation.

      In Use case 5, the authors include details that the errors were corrected by "264 MorphoNet plugin actions ... in 8 hours actions [sic]". The work would benefit from explaining whether this is 8 hours of human work, trying plugins and iteratively improving, or 8 hours of compute time to apply the selected plugins.

    3. Reviewer #2 (Public review):

      Summary:

      This article presents MorphoNet 2.0, a software designed to visualise and curate segmentations of 3D and 3D+t data. The authors demonstrate its capabilities on five published datasets, showcasing how even small segmentation errors can be automatically detected, easily assessed, and corrected by the user. This allows for more reliable ground truths, which will in turn be very valuable for analysis and for training deep learning models. MorphoNet 2.0 offers intuitive 3D inspection and functionalities accessible to a non-coding audience, thereby broadening its impact.

      Strengths:

      The work proposed in this article is expected to be of great interest to the community by enabling easy visualisation and correction of complex 3D(+t) datasets. Moreover, the article is clear and well written, making MorphoNet more likely to be used. The goals are clearly defined, addressing an undeniable need in the bioimage analysis community. The authors use a diverse range of datasets, successfully demonstrating the versatility of the software.

      We would also like to highlight the great effort that was made to clearly explain which computer configurations are necessary to run the different datasets and how to find the appropriate documentation according to the user's needs. The authors have clearly thought carefully about these two important problems and came up with very satisfactory solutions.

      Weaknesses:

      There is still one concern: the quantification of the improvement of the segmentations in the use cases and, therefore, the quantification of the potential impact of the software. While it appears hard to quantify the quality of the correction, the proposed work would be significantly improved if such metrics could be provided.

      The authors show some distributions of metrics before and after segmentations to highlight the changes. This is a great start, but there seem to be two shortcomings: first, the comparison and interpretation of the different distributions does not appear to be trivial. It is therefore difficult to judge the quality of the improvement from these. Maybe an explanation in the text of how to interpret the differences between the distributions could help. A second shortcoming is that the before/after metrics displayed are the metrics used to guide the correction, so, by design, the scores will improve, but does that accurately represent the improvement of the segmentation? It seems to be the case, but it would be nice to maybe have a better assessment of the improvement of the quality.

    4. Reviewer #3 (Public review):

      Summary:

      A very thorough technical report of a new standalone, open-source software for microscopy image processing and analysis (MorphoNet 2.0), with a particular emphasis on automated segmentation and its curation to obtain accurate results even with very complex 3D stacks, including timelapse experiments.

      Strengths:

      The authors did a good job of explaining the advantages of MorphoNet 2.0, as compared to its previous web-based version and to other software with similar capabilities. What I found particularly useful for appreciating these claimed advantages are the five examples used to illustrate the power of the software (based on a combination of Python scripting and the 3D game engine Unity). These examples, from published research, are very varied in both types of information and image quality, and all have their complexities, making them inherently difficult to segment. I strongly recommend that readers carefully watch the accompanying videos, which show (although not thoroughly) how the software is actually used in these examples.

      Weaknesses:

      Being a technical article, the only possible comments are on how methods are presented, which is generally adequate, as mentioned above. In this regard, and in spite of the presented examples (chosen by the authors, who clearly gave them deep thought before showing them), the only way in which the presented software will prove valuable is through its use by as many researchers as possible. This is not a weakness per se, of course, but just what is usual in this sort of report. Hence, I encourage readers to download the software and take the time to test it on their own data (which I will also do myself).

      In conclusion, I believe that this report is fundamental because it will be the major way of initially promoting the use of MorphoNet 2.0 among its target audience. The software itself holds the promise of being very impactful for the microscopists' community.

    5. Author response:

      eLife Assessment

      This work presents an important technical advancement with the release of MorphoNet 2.0, a user-friendly, standalone platform for 3D+T segmentation and analysis in biological imaging. The authors provide convincing evidence of the tool's capabilities through illustrative use cases, though broader validation against current state-of-the-art tools would strengthen its position. The software's accessibility and versatility make it a resource that will be of value for the bioimaging community, particularly in specialized subfields.

      We would like to thank the editors and reviewers for their careful and constructive evaluation of our manuscript “MorphoNet 2.0: An innovative approach for qualitative assessment and segmentation curation of large-scale 3D time-lapse imaging datasets”. We are grateful for the positive assessment of MorphoNet 2.0 as a valuable and accessible tool for the bioimaging community, and for the recognition of its technical advancements, particularly in the context of complex 3D+t segmentation tasks.

      The reviewers have highlighted several important points that we will address in the revised manuscript. These include:

      - The need for a clearer demonstration that improvements in unsupervised quality metrics correspond to actual improvements in segmentation quality. In response, we will provide comparisons with gold standard annotations where available and clarify how to interpret metric distributions.
      - The potential risk of circular logic when using unsupervised metrics to guide model training. We now explicitly discuss this limitation and emphasize the importance of external validation and expert input.
      - The value of comparing MorphoNet 2.0 to other tools such as FIJI and napari. We will include a comparative table to help readers understand MorphoNet’s positioning and complementarity.
      - The importance of clearer documentation and terminology. We will overhaul the help pages, standardize plugin naming, and add a glossary-style table to the manuscript.
      - Suggestions for future developments, such as mesh export and interoperability with napari, which we will explore for the revision.

      We appreciate the detailed feedback on both scientific and editorial aspects, including corrections to figures and text, and we will integrate all suggested revisions to improve the manuscript’s clarity and impact. We are confident that these changes will strengthen the manuscript and enhance the utility of MorphoNet 2.0 for the community.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present a substantial improvement to their existing tool, MorphoNet, intended to facilitate assessment of 3D+t cell segmentation and tracking results, and curation of high-quality analysis for scientific discovery and data sharing. These tools are provided through a user-friendly GUI, making them accessible to biologists who are not experienced coders. Further, the authors have re-developed this tool to be a locally installed piece of software instead of a web interface, making the analysis and rendering of large 3D+t datasets more computationally efficient. The authors evidence the value of this tool with a series of use cases, in which they apply different features of the software to existing datasets and show the improvement to the segmentation and tracking achieved.

      While the computational tools packaged in this software are familiar to readers (e.g., cellpose), the novel contribution of this work is the focus on error correction. The MorphoNet 2.0 software helps users identify where their candidate segmentation and/or tracking may be incorrect. The authors then provide existing tools in a single user-friendly package, lowering the threshold of skill required for users to get maximal value from these existing tools. To help users apply these tools effectively, the authors introduce a number of unsupervised quality metrics that can be applied to a segmentation candidate to identify masks and regions where the segmentation results are noticeably different from the majority of the image.

      This work is valuable to researchers who are working with cell microscopy data that requires high-quality segmentation and tracking, particularly if their data are 3D time-lapse and thus challenging to segment and assess. The MorphoNet 2.0 tool that the authors present is intended to make the iterative process of segmentation, quality assessment, and re-processing easier and more streamlined, combining commonly used tools into a single user interface.

      We sincerely thank the reviewer for their thorough and encouraging evaluation of our work. We are grateful that they highlighted both the technical improvements of MorphoNet 2.0 and its potential impact for the broader community working with complex 3D+t microscopy datasets. We particularly appreciate the recognition of our efforts to make advanced segmentation and tracking tools accessible to non-expert users through a user-friendly and locally installable interface, and for pointing out the importance of error detection and correction in the iterative analysis workflow. The reviewer’s appreciation of the value of integrating unsupervised quality metrics to support this process is especially meaningful to us, as this was a central motivation behind the development of MorphoNet 2.0. We hope the tool will indeed facilitate more rigorous and reproducible analyses, and we are encouraged by the reviewer’s positive assessment of its utility for the community.

      One of the key contributions of the work is the unsupervised metrics that MorphoNet 2.0 offers for segmentation quality assessment. These metrics are used in the use cases to identify low-quality instances of segmentation in the provided datasets, so that they can be improved with plugins directly in MorphoNet 2.0. However, not enough consideration is given to demonstrating that optimizing these metrics leads to an improvement in segmentation quality. For example, in Use Case 1, the authors report their metrics of interest (Intensity offset, Intensity border variation, and Nuclei volume) for the uncurated silver truth, the partially curated and fully curated datasets, but this does not evidence an improvement in the results. Additional plotting of the distribution of these metrics on the Gold Truth data could help confirm that the distribution of these metrics now better matches the expected distribution.

      Similarly, in Use Case 2, visual inspection leads us to believe that the segmentation generated by the Cellpose + Deli pipeline (shown in Figure 4d) is an improvement, but a direct comparison of agreement between segmented masks and masks in the published data (where the segmentations overlap) would further evidence this.

      We agree that demonstrating the correlation between metric optimization and real segmentation improvement is essential. We will add new analysis comparing the distributions of the unsupervised metrics with the gold truth data before and after curation. Additionally, we will provide overlap scores where ground truth annotations are available, confirming the improvement. We will also explicitly discuss the limitation of relying solely on unsupervised metrics without complementary validation.
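
      One common way to compute such overlap scores is the intersection over union (IoU) between each ground-truth object and its best-matching segmented object. The sketch below is a minimal, assumed illustration on flattened label images (one integer label per voxel, 0 as background); it is not the scoring procedure of the manuscript, and the function names and toy arrays are hypothetical.

```python
from collections import Counter


def best_match_iou(predicted, reference, background=0):
    """Map each reference label to the IoU of its best-overlapping predicted label.

    Both inputs are flat sequences of integer labels of equal length.
    """
    if len(predicted) != len(reference):
        raise ValueError("label images must have the same number of voxels")
    pred_sizes = Counter(p for p in predicted if p != background)
    ref_sizes = Counter(r for r in reference if r != background)
    overlaps = Counter(
        (r, p)
        for p, r in zip(predicted, reference)
        if p != background and r != background
    )
    scores = {label: 0.0 for label in ref_sizes}
    for (r, p), intersection in overlaps.items():
        union = ref_sizes[r] + pred_sizes[p] - intersection
        scores[r] = max(scores[r], intersection / union)
    return scores


def mean_iou(predicted, reference, background=0):
    """Average best-match IoU over all reference objects."""
    scores = best_match_iou(predicted, reference, background)
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    reference = [0, 1, 1, 1, 1, 2, 2, 2, 2, 0]   # two ground-truth objects
    before = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]      # the two objects are merged
    after = [0, 1, 1, 1, 2, 2, 2, 2, 2, 0]       # split after curation
    print(mean_iou(before, reference), mean_iou(after, reference))
```

      Because the score is computed against an independent reference, it does not share the bias of the unsupervised metrics that guided the correction.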

      We would appreciate the authors addressing the risk of decreasing the quality of the segmentations by applying circular logic with their tool; MorphoNet 2.0 uses unsupervised metrics to identify masks that do not fit the typical distribution. A model such as StarDist can be trained on the "good" masks to generate more masks that match the most common type. This leads to a more homogeneous segmentation quality, without consideration for whether these metrics actually optimize the segmentation.

      We thank the reviewer for this important and insightful comment. It raises a crucial point regarding the risk of circular logic in our segmentation pipeline. Indeed, relying on unsupervised metrics to select “good” masks and using them to train a model like StarDist could lead to reinforcing a particular distribution of shapes or sizes, potentially filtering out biologically relevant variability. This homogenization may improve consistency with the chosen metrics, but not necessarily with the true underlying structures.

      We fully agree that this is a key limitation to be aware of. We will revise the manuscript to explicitly discuss this risk, emphasizing that while our approach may help improve segmentation quality according to specific criteria, it should be complemented with biological validation and, when possible, expert input to ensure that important but rare phenotypes are not excluded.

      In Use case 5, the authors include details that the errors were corrected by "264 MorphoNet plugin actions ... in 8 hours actions [sic]". The work would benefit from explaining whether this is 8 hours of human work, trying plugins and iteratively improving, or 8 hours of compute time to apply the selected plugins.

      We will clarify that the “8 hours” refer to human interaction time, including exploration, testing, and iterative correction using plugins.

      Reviewer #2 (Public review):

      Summary:

      This article presents MorphoNet 2.0, a software designed to visualise and curate segmentations of 3D and 3D+t data. The authors demonstrate its capabilities on five published datasets, showcasing how even small segmentation errors can be automatically detected, easily assessed, and corrected by the user. This allows for more reliable ground truths, which will in turn be very valuable for analysis and for training deep learning models. MorphoNet 2.0 offers intuitive 3D inspection and functionalities accessible to a non-coding audience, thereby broadening its impact.

      Strengths:

      The work proposed in this article is expected to be of great interest to the community by enabling easy visualisation and correction of complex 3D(+t) datasets. Moreover, the article is clear and well written, making MorphoNet more likely to be used. The goals are clearly defined, addressing an undeniable need in the bioimage analysis community. The authors use a diverse range of datasets, successfully demonstrating the versatility of the software.

      We would also like to highlight the great effort that was made to clearly explain which computer configurations are necessary to run the different datasets and how to find the appropriate documentation according to the user's needs. The authors have clearly thought carefully about these two important problems and came up with very satisfactory solutions.

      We would like to sincerely thank the reviewer for their positive and thoughtful feedback. We are especially grateful that they acknowledged the clarity of the manuscript and the potential value of MorphoNet 2.0 for the community, particularly in facilitating the visualization and correction of complex 3D(+t) datasets. We also appreciate the reviewer’s recognition of our efforts to provide detailed guidance on hardware requirements and access to documentation—two aspects we consider crucial to ensuring the tool is both usable and widely adopted. Their comments are very encouraging and reinforce our commitment to making MorphoNet 2.0 as accessible and practical as possible for a broad range of users in the bioimage analysis community.

      Weaknesses:

      There is still one concern: the quantification of the improvement of the segmentations in the use cases and, therefore, the quantification of the potential impact of the software. While it appears hard to quantify the quality of the correction, the proposed work would be significantly improved if such metrics could be provided.

      The authors show some distributions of metrics before and after segmentations to highlight the changes. This is a great start, but there seem to be two shortcomings: first, the comparison and interpretation of the different distributions does not appear to be trivial. It is therefore difficult to judge the quality of the improvement from these. Maybe an explanation in the text of how to interpret the differences between the distributions could help. A second shortcoming is that the before/after metrics displayed are the metrics used to guide the correction, so, by design, the scores will improve, but does that accurately represent the improvement of the segmentation? It seems to be the case, but it would be nice to maybe have a better assessment of the improvement of the quality.

      We thank the reviewer for this constructive and important comment. We fully agree that assessing the true quality improvement of segmentation after correction is a central and challenging issue. While we initially focused on changes in the unsupervised quality metrics to illustrate the effect of the correction, we acknowledge that interpreting these distributions may not be straightforward, and that relying solely on the metrics used to guide the correction introduces an inherent bias in the evaluation.

      To address the first point, we will revise the manuscript to provide clearer guidance on how to interpret the changes in metric distributions before and after correction, with additional examples to make this interpretation more intuitive.
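
      One possible way to summarise how far a metric distribution lies from a reference distribution is the two-sample Kolmogorov-Smirnov distance, the largest gap between the two empirical cumulative distributions (0 for identical samples, 1 for fully separated ones). The sketch below is an assumed illustration, not the analysis planned by the authors; the function name and the toy metric values are hypothetical.

```python
def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance between two sets of metric values."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


if __name__ == "__main__":
    # Hypothetical per-object metric values.
    gold = [1, 2, 3, 4, 5]
    before_curation = [1, 2, 3, 9, 10]
    after_curation = [1, 2, 3, 4, 6]
    print(ks_distance(before_curation, gold), ks_distance(after_curation, gold))
```

      A smaller distance to the gold-truth distribution after curation would indicate that the curated objects look more like the reference objects for that metric, although it says nothing about whether individual objects are correctly matched.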

      Regarding the second point, we agree that using independent, external validation is necessary to confirm that the segmentation has genuinely improved. To this end, we will include additional assessments using complementary evaluation strategies on selected datasets where ground truth is accessible, to compare pre- and post-correction segmentations with an independent reference. These results reinforce the idea that the corrections guided by unsupervised metrics generally lead to more accurate segmentations, but we also emphasize their limitations and the need for biological validation in real-world cases.

      Reviewer #3 (Public review):

      Summary:

      A very thorough technical report of a new standalone, open-source software for microscopy image processing and analysis (MorphoNet 2.0), with a particular emphasis on automated segmentation and its curation to obtain accurate results even with very complex 3D stacks, including timelapse experiments.

      Strengths:

      The authors did a good job of explaining the advantages of MorphoNet 2.0, as compared to its previous web-based version and to other software with similar capabilities. What I found particularly useful for appreciating these claimed advantages are the five examples used to illustrate the power of the software (based on a combination of Python scripting and the 3D game engine Unity). These examples, from published research, are very varied in both types of information and image quality, and all have their complexities, making them inherently difficult to segment. I strongly recommend that readers carefully watch the accompanying videos, which show (although not thoroughly) how the software is actually used in these examples.

      We sincerely thank the reviewer for their thoughtful and encouraging feedback. We are particularly pleased that the reviewer appreciated the comparative analysis of MorphoNet 2.0 with both its earlier version and existing tools, as well as the relevance of the five diverse and complex use cases we selected. Demonstrating the software’s versatility and robustness across a variety of challenging datasets was a key goal of this work, and we are glad that this aspect came through clearly. We also appreciate the reviewer’s recommendation to watch the accompanying videos, which we designed to provide a practical sense of how the tool is used in real-world scenarios. Their positive assessment is highly motivating and reinforces the value of combining scripting flexibility with an interactive 3D interface.

      Weaknesses:

      Being a technical article, the only possible comments are on how methods are presented, which is generally adequate, as mentioned above. In this regard, and in spite of the presented examples (chosen by the authors, who clearly gave them deep thought before showing them), the only way in which the presented software will prove valuable is through its use by as many researchers as possible. This is not a weakness per se, of course, but just what is usual in this sort of report. Hence, I encourage readers to download the software and take the time to test it on their own data (which I will also do myself).

      We fully agree that the true value of MorphoNet 2.0 will be demonstrated through its practical use by a wide range of researchers working with complex 3D and 3D+t datasets. In this regard, we will improve the user documentation and provide a set of example datasets to help new users quickly familiarize themselves with the platform. We are also committed to maintaining and updating MorphoNet 2.0 based on user feedback to further support its usability and impact.

      In conclusion, I believe that this report is fundamental because it will be the major way of initially promoting the use of MorphoNet 2.0 by its target audience. The software itself holds the promise of being very impactful for the microscopists' community.

    1. eLife Assessment

      Thick multicellular plant samples provide unique challenges when it comes to cryo-preservation, which has resulted in limited successful examples for structural studies using in situ cryo-electron tomography. To address this deficiency, this important study describes procedures for high-pressure-freezing, focused ion-beam milling, and cryo-electron tomography imaging of certain plant types. The results described in the paper provide solid evidence for the usefulness of the methods described, although some reservations remain about the applicability of the methods to a wider range of plant cell types.

    2. Reviewer #1 (Public review):

      Summary:

      This in situ cryo-ET workflow of selected plant structures provides several detailed strategies using plunge-freezing and the HPF waffle method and lift-out for notoriously difficult samples (compared to cell culture, yeast, and algae, which are far more prevalent in the literature).

      Strengths:

      A very difficult challenge whereby the authors demonstrate successful vitrification of selected plants/structures using waffle and lift-out approaches for cryo-ET. Because there are relatively few examples of multi-cellular plant cryo-ET in the literature, it is important for the scientific community to be motivated and to have demonstrated strategies showing that it is achievable. This manuscript has a number of very helpful graphics and videos to guide researchers interested in undertaking this work, which should help shorten the learning curve of admittedly tedious and complex workflows. This is a slow and tedious process, but you have to start somewhere, and I applaud the authors for sharing their experiences with others, which I expect will help other early adopters come up to speed sooner.

      Weaknesses:

      While important, the specific specimens and cell types that were successful in this approach (perhaps other plant specimens and tissues were tried unsuccessfully and thus not reported) did not demonstrate that the approach is broadly applicable to other, much more prevalent, interesting, and intensively studied areas of plant biology and plant structures (some mentioned in more detail below).

      This manuscript is essentially a protocol paper and, in its paragraph form, even with great graphics, will definitely be difficult to follow and reproduce for a non-expert. Also, considering the use of 3 different FIB-SEM platforms and 2 different cryo-FLM platforms, I wonder if a master graphic of the full workflow(s) could be prepared as a supplementary document that walks through the major steps and points to the individual figures at the critical steps to make it more accessible to the broader readership.

      Multiple times in the manuscript, important workflow details seemed to point to and be dependent on two "unpublished" manuscripts:

      (1) Lines 583, 755, 790, 847-848 (Poge et al., will soon be published as a protocol).

      (2) Lines 140, 695, 716 (Capitanio et al., will soon be described in a manuscript).

      It is not clear if/when these would be publicly available. It may be important to wait until these papers can be included in published form.

    3. Reviewer #2 (Public review):

      Summary:

      Poge et al. present a workflow for studying plant tissue by combining high-pressure freezing, cryo-fluorescence microscopy, FIB milling, and cryo-electron tomography (cryo-ET). They tested various plant tissues, including Physcomitrium patens, Arabidopsis thaliana, and Limonium bicolor. The authors successfully produce thin lamellae suitable for cryo-ET studies. Using sub-tomogram averaging, they determined the Rubisco structure at subnanometer resolution, demonstrating the potential of this workflow for plant tissue studies.

      Strengths:

      This manuscript is likely the first to systematically apply FIB milling and cryo-ET to plant tissue samples. It provides a detailed methodological description, which is not only valuable for plant tissue studies but also adaptable to a broader range of biological tissue samples. The study compares the plunge freezing method with a high-pressure freezing method, demonstrating that high-pressure freezing can vitrify thick tissues while preserving their native state. Additionally, the authors explore two methods for plant tissue sample preparation, the "waffle" method and in-carrier high-pressure freezing combined with the "lift-out" approach. The "waffle" method is suitable for samples less than 25 µm thick, while the in-carrier high-pressure freezing method can process samples up to 100 µm thick.

      Weaknesses:

      The described workflow is very complicated and requires special expertise. The success rate of this workflow is not very high, particularly for high-pressure freezing and lift-out technology. Further improvements are needed for automation and increasing throughput.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to improve cryo-TEM workflows for plant cells. The authors present details on high-pressure-freezing protocols to vitrify, ion-mill, and image certain plant cell types.

      Strengths:

      Clear step-by-step outline on how to preserve and image cryo samples derived from plants.

      Weaknesses:

      A general current weakness of cryo-TEM is the problem of vitrifying cells that are embedded in tissues. The vast majority of cells in the plant body are currently not accessible to this technology. This is not a weakness of this specific manuscript but a general problem.

      The manuscript is well organized and well written, and the discussion covers practically all questions I had while reading the results section. I only have a few comments, all of which I consider minor.

    1. eLife Assessment

      This study provides a valuable extension of credibility-based learning research by showing how feedback reliability can distort reward-learning biases in a disinformation-like bandit task. Although the paradigm is well controlled and the computational modelling rigorous, the evidential support is incomplete: key claims about learning from 50 %-credible feedback and heightened positivity bias at low credibility hinge on a single dataset, specific parameter definitions, and modelling assumptions not fully validated across studies. Clearer reporting of the discovery-study null result, behavioural tests of positivity bias, and standard information-criterion model comparisons are needed to solidify the conclusions and enhance generalizability.

    2. Reviewer #1 (Public review):

      This is a well-designed and very interesting study examining the impact of imprecise feedback on outcomes in decision-making. I think this is an important addition to the literature, and the results here, which provide a computational account of several decision-making biases, are insightful and interesting.

      I do not believe I have substantive concerns related to the actual results presented; my concerns are more related to the framing of some of the work. My main concern is regarding the assertion that the results prove that non-normative and non-Bayesian learning is taking place. I agree with the authors that their results demonstrate that people will make decisions in ways that demonstrate deviations from what would be optimal for maximizing reward in their task under a strict application of Bayes' rule. I also agree that they have built reinforcement learning models that do a good job of accounting for the observed behavior. However, the Bayesian models included are rather simple, per the authors' descriptions, applications of Bayes' rule with either fixed or learned credibility for the feedback agents. In contrast, several versions of the RL models are used, each modified to account for different possible biases. However, more complex Bayes-based models exist, notably active inference, but also the hierarchical Gaussian filter. These formalisms are able to accommodate more complex behavior, such as affect and habits, which might make them more competitive with RL models. I think it is entirely fair to say that these results demonstrate deviations from an idealized and strict Bayesian context; however, the equivalence here of Bayesian and normative is, I think, misleading or at least requires better justification/explanation. This is because a great deal of work has been done to show that Bayes optimal models can generate behavior or other outcomes that are clearly not optimal to an observer within a given context (consider hallucinations for example) but which make sense in the context of how the model is constructed as well as the priors and desired states the model is given.

      As such, I would recommend that the language be adjusted to carefully define what is meant by normative and Bayesian and to recognize that work that is clearly Bayesian could potentially still be competitive with RL models if implemented to model this task. An even better approach would be to directly use one of these more complex modelling approaches, such as active inference, as the comparator to the RL models, though I would understand if the authors would want this to be a subject for future work.

      Abstract:

      The abstract is lacking in some detail about the experiments done, but this may be a limitation of the required word count. If word count is not an issue, I would recommend adding details of the experiments done and the results.

      One comment is that there is an appeal to normative learning patterns, but this suggests that learning patterns have a fixed optimal nature, which may not be true in cases where the purpose of the learning (e.g. to confirm the feeling of safety of being in an in-group) may not be about learning accurately to maximize reward. This can be accommodated in a Bayesian framework by modelling priors and desired outcomes. As such, the central premise that biased learning is inherently non-normative or non-Bayesian, I think, would require more justification. This is true in the introduction as well.

      Introduction:

      As noted above, the conceptualization of Bayesian learning as equivalent to normative learning, I think, requires further justification. Bayesian belief updating can be biased and non-optimal from an observer perspective, while being optimal within the agent doing the updating if the priors/desired outcomes are set up to advantage these "non-optimal" modes of decision making.

      Results:

      I wonder why the agent was presented before the choice, since the agent is only relevant to the feedback after the choice is made. I wonder if that might have induced any false association between the agent identity and the choice itself. This is by no means a critical point, but it would be interesting to get the authors' thoughts.

      The finding that positive feedback increases learning is one that has been shown before and depends on valence, as the authors note. They expanded their reinforcement learning model to include valence, but they did not modify the Bayesian model in a similar manner. This lack of a valence or recency effect might also explain the failure of the Bayesian models in the preceding section, where the contrast effect is discussed. It is not unreasonable to imagine that if humans do employ Bayesian reasoning that this reasoning system has had parameters tuned based on the real world, where recency of information does matter; affect has also been shown to be incorporable into Bayesian information processing (see the work by Hesp on affective charge and the large body of work by Ryan Smith). It may be that the Bayesian models chosen here require further complexity to capture the situation, just like some of the biases required updates to the RL models. This complexity, rather than being arbitrary, may be well justified by decision-making in the real world.

      The methods mention several symptom scales; it would be interesting to have the results of these and any interesting correlations noted. It is possible that some of the individual variability here could be related to these symptoms, which could introduce precision parameter changes in a Bayesian context and things like reward sensitivity changes in an RL context.

      Discussion:

      (For discussion, not a specific comment on this paper): One wonders also about participants' beliefs about the experiment or the intent of the experimenters. I have often had participants tell me they were trying to "figure out" a task or find patterns even when this was not part of the experiment. This is not specific to this paper, but it may be relevant in the future to try and model participant beliefs about the experiment especially in the context of disinformation, when they might be primed to try and "figure things out".

      As a general comment, in the active inference literature, there has been discussion of state-dependent actions, or "habits", which are learned in order to help agents more rapidly make decisions, based on previous learning. It is also possible that what is being observed is that these habits are at play, and that they represent the cognitive biases. This is likely especially true given, as the authors note, the high cognitive load of the task. It is true that this would mean that full-force Bayesian inference is not being used in each trial, or in each experience an agent might have in the world, but this is likely adaptive on the longer timescale of things, considering resource requirements. I think in this case you could argue that we have a departure from "normative" learning, but that is not necessarily a departure from any possible Bayesian framework, since these biases could potentially be modified by the agent or eschewed in favor of more expensive full-on Bayesian learning when warranted.

      Indeed, in their discussion on the strategy of amplifying credible news sources to drown out low-credibility sources, the authors hint at the possibility of longer-term strategies that may produce optimal outcomes in some contexts, but which were not necessarily appropriate to this task. As such, the performance on this task, and the consideration of true departure from Bayesian processing, should be considered in this wider context.

      Another thing to consider is that Bayesian inference is occurring, but that priors present going in produce the biases, or these biases arise from another source, for example, factoring in epistemic value over rewards when the actual reward is not large. This again would be covered under an active inference approach, depending on how the priors are tuned. Indeed, given the benefit of social cohesion in an evolutionary perspective, some of these "biases" may be the result of adaptation. For example, it might be better to amplify people's good qualities and minimize their bad qualities in order to make it easier to interact with them; this entails a cost (in this case, not adequately learning from feedback and potentially losing out sometimes), but may fulfill a greater imperative (improved cooperation on things that matter). Given the right priors/desired states, this could still be a Bayes-optimal inference at a social level and, as such, may be ingrained as a habit that requires effort to break at the individual level during a task such as this.

      The authors note that this task does not relate to "emotional engagement" or "deep, identity-related issues". While I agree that this is likely mostly true, it is also possible that just being told one is being lied to might elicit an emotional response that could bias responses, even if this is a weak response.

    3. Reviewer #2 (Public review):

      This valuable paper studies the problem of learning from feedback given by sources of varying credibility. The solid combination of experiment and computational modeling helps to pin down properties of learning, although some ambiguity remains in the interpretation of results.

      Summary:

      This paper studies the problem of learning from feedback given by sources of varying credibility. Two bandit-style experiments are conducted in which feedback is provided with uncertainty, but from known sources. Bayesian benchmarks are provided to assess normative facets of learning, and alternative credit assignment models are fit for comparison. Some aspects of normativity appear, in addition to deviations such as asymmetric updating from positive and negative outcomes.

      Strengths:

      The paper tackles an important topic, with a relatively clean cognitive perspective. The construction of the experiment enables the use of computational modeling. This helps to pinpoint quantitatively the properties of learning and formally evaluate their impact and importance. The analyses are generally sensible, and parameter recovery analyses help to provide some confidence in the model estimation and comparison.

      Weaknesses:

      (1) The approach in the paper overlaps somewhat with various papers, such as Diaconescu et al. (2014) and Schulz et al. (forthcoming), which also consider the Bayesian problem of learning and applying source credibility, in terms of theory and experiment. The authors should discuss how these papers are complementary, to better provide an integrative picture for readers.

      Diaconescu, A. O., Mathys, C., Weber, L. A., Daunizeau, J., Kasper, L., Lomakina, E. I., ... & Stephan, K. E. (2014). Inferring the intentions of others by hierarchical Bayesian learning. PLoS Computational Biology, 10(9), e1003810.

      Schulz, L., Schulz, E., Bhui, R., & Dayan, P. Mechanisms of Mistrust: A Bayesian Account of Misinformation Learning. https://doi.org/10.31234/osf.io/8egxh

      (2) It isn't completely clear what the "cross-fitting" procedure accomplishes. Can this be discussed further?

      (3) The Credibility-CA model seems to fit the same as the free-credibility Bayesian model in the first experiment and barely better in the second experiment. Why not use a more standard model comparison metric like the Bayesian Information Criterion (BIC)? Even if there are advantages to the bootstrap method (which should be described if so), the BIC would help for comparability between papers.

      (4) As suggested in the discussion, the updating based on random feedback could be due to the interleaving of trials. If one is used to learning from the source on most trials, the occasional random trial may be hard to resist updating from. The exact interleaving structure should also be clarified (I assume different sources were shown for each bandit pair). This would also relate to work on RL and working memory: Collins, A. G., & Frank, M. J. (2012). How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience, 35(7), 1024-1035.

      (5) Why does the choice-repetition regression include "only trials for which the last same-pair trial featured the 3-star agent and in which the context trial featured a different bandit pair"? This could be stated more plainly.

      (6) Why apply the "Truth-CA" model and not the Bayesian variant that it was motivated by?

      (7) "Overall, the results from this study support the exact same conclusions (See SI section 1.2) but with one difference. In the discovery study, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3)" - this seems like a very salient difference, when the paper reports the feedback effect as a primary finding of interest, though I understand there remains a valence-based difference.

      (8) "Participants were instructed that this feedback would be 'a lie' 50% of the time but were not explicitly told that this meant it was random and should therefore be disregarded." - I agree that this is a possible explanation for updating from the random source. It is a meaningful caveat.

      (9) "Future studies should investigate conditions that enhance an ability to discard disinformation, such as providing explicit instructions to ignore misleading feedback, manipulations that increase the time available for evaluating information, or interventions that strengthen source memory." - there is work on some of this in the misinformation literature that should be cited, such as the "continued influence effect". For example: Johnson, H. M., & Seifert, C. M. (1994). Sources of the continued influence effect: When misinformation in memory affects later inferences. Journal of experimental psychology: Learning, memory, and cognition, 20(6), 1420.

      (10) Are the authors arguing that choice-confirmation bias may be at play? Work on choice-confirmation bias generally includes counterfactual feedback, which is not present here.
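
      The BIC comparison suggested in point (3) can be sketched as follows. This is a minimal illustration, assuming hypothetical log-likelihoods, parameter counts, and trial numbers; none of these values come from the paper, and the model names are placeholders.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical values for illustration only (not taken from the paper).
n_trials = 400
models = {
    "credibility_ca": {"log_lik": -210.0, "k": 5},
    "free_credibility_bayes": {"log_lik": -214.0, "k": 3},
}

scores = {name: bic(m["log_lik"], m["k"], n_trials) for name, m in models.items()}
best = min(scores, key=scores.get)  # model with the lowest BIC

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: BIC = {score:.2f}")
```

      With these made-up numbers, the model with the better raw fit loses once its two extra parameters are penalized, which is the kind of trade-off a shared metric makes comparable across papers.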

    4. Reviewer #3 (Public review):

      Summary

      This paper investigates how disinformation affects reward learning processes in the context of a two-armed bandit task, where feedback is provided by agents with varying reliability (with lying probability explicitly instructed). They find that people learn more from credible sources, but also deviate systematically from optimal Bayesian learning: They learned from uninformative random feedback, learned more from positive feedback, and updated too quickly from fully credible feedback (especially following low-credibility feedback). Overall, this study highlights how misinformation could distort basic reward learning processes, without appeal to higher-order social constructs like identity.
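
      Why feedback from a 50%-credible source is uninformative under Bayes' rule can be sketched as follows. This is a simplified illustration, assuming binary outcomes and an agent that reports the true outcome with probability equal to its credibility and the opposite outcome otherwise; the function name and the example priors are hypothetical and are not the authors' model.

```python
def posterior_true_reward(prior: float, credibility: float, reported_reward: bool) -> float:
    """Posterior probability that the true outcome was a reward, given the agent's report.

    Assumes 0 < prior < 1, so the evidence term below is never zero.
    """
    # Likelihood of the report under each possible true outcome.
    like_if_reward = credibility if reported_reward else 1.0 - credibility
    like_if_no_reward = 1.0 - credibility if reported_reward else credibility
    evidence = like_if_reward * prior + like_if_no_reward * (1.0 - prior)
    return like_if_reward * prior / evidence

# Hypothetical prior of 0.6 that the chosen bandit paid out on this trial.
for cred in (0.5, 0.75, 1.0):
    post = posterior_true_reward(0.6, cred, reported_reward=True)
    print(f"credibility {cred:.2f}: posterior = {post:.3f}")
```

      At a credibility of 0.5 the posterior equals the prior, so an ideal observer would not update at all; any learning from that source is a deviation from this benchmark.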

      Strengths

      (1) The experimental design is simple and well-controlled; in particular, it isolates basic learning processes by abstracting away from social context.

      (2) Modeling and statistics meet or exceed the standards of rigor.

      (3) Limitations are acknowledged where appropriate, especially those regarding external validity.

      (4) The comparison model, Bayes with biased credibility estimates, is strong; deviations are much more compelling than e.g., a purely optimal model.

      (5) The conclusions are interesting, in particular the finding that positivity bias is stronger when learning from less reliable feedback (although I am somewhat uncertain about the validity of this conclusion).

      Weaknesses

      (1) Absolute or relative positivity bias?

      In my view, the biggest weakness in the paper is that the conclusion of greater positivity bias for lower credible feedback (Figure 5) hinges on the specific way in which positivity bias is defined. Specifically, we only see the effect when normalizing the difference in sensitivity to positive vs. negative feedback by the sum. I appreciate that the authors present both and add the caveat whenever they mention the conclusion (with the crucial exception of the abstract). However, what we really need here is an argument that the relative definition is the *right* way to define asymmetry....

      Unfortunately, my intuition is that the absolute difference is a better measure. I understand that the relative version is common in the RL literature; however previous studies have used standard TD models, whereas the current model updates based on the raw reward. The role of the CA parameter is thus importantly different from a traditional learning rate - in particular, it's more like a logistic regression coefficient (as described below) because it scales the feedback but *not* the decay. Under this interpretation, a difference in positivity bias across credibility conditions corresponds to a three-way interaction between the exponentially weighted sum of previous feedback of a given type (e.g., positive from the 75% credible agent), feedback positivity, and condition (dummy coded). This interaction corresponds to the non-normalized, absolute difference.

      Importantly, I'm not terribly confident in this argument, but it does suggest that we need a compelling argument for the relative definition.
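
      The difference between the two definitions can be made concrete with a small sketch. The CA values below are hypothetical and chosen only to show that the absolute and relative indices can rank conditions in opposite orders; they are not estimates from the paper.

```python
def absolute_bias(ca_pos: float, ca_neg: float) -> float:
    """Absolute positivity bias: difference in sensitivity to positive vs. negative feedback."""
    return ca_pos - ca_neg

def relative_bias(ca_pos: float, ca_neg: float) -> float:
    """Relative positivity bias: the same difference normalized by the sum.

    Undefined when ca_pos + ca_neg == 0, and hard to interpret if ca_neg is negative.
    """
    return (ca_pos - ca_neg) / (ca_pos + ca_neg)

# Hypothetical credit-assignment parameters (CA+, CA-) per credibility condition.
conditions = {
    "low_credibility": (0.3, 0.1),
    "high_credibility": (1.0, 0.7),
}

for name, (ca_pos, ca_neg) in conditions.items():
    print(
        f"{name}: absolute = {absolute_bias(ca_pos, ca_neg):.3f}, "
        f"relative = {relative_bias(ca_pos, ca_neg):.3f}"
    )
```

      Here the relative index is larger in the low-credibility condition while the absolute index is larger in the high-credibility condition, so the direction of the conclusion depends on which definition is adopted.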

      (2) Positivity bias or perseveration?

      A key challenge in interpreting many of the results is dissociating perseveration from other learning biases. In particular, a positivity bias (Figure 5) and perseveration will both predict a stronger correlation between positive feedback and future choice. Crucially, the authors do include a perseveration term, so one would hope that perseveration effects have been controlled for and that the CA parameters reflect true positivity biases. However, with finite data, we cannot be sure that the variance will be correctly allocated to each parameter (cf. collinearity in regressions). The fact that CA- is fit to be negative for many participants (a pattern shown more strongly in the discovery study) is suggestive that this might be happening. A priori, the idea that you would ever increase your value estimate after negative feedback is highly implausible, which suggests that the parameter might be capturing variance other than what it is intended to capture.

      The best way to resolve this uncertainty would involve running a new study in which feedback was sometimes provided in the absence of a choice - this would isolate positivity bias. Short of that, perhaps one could fit a version of the Bayesian model that also includes perseveration. If the authors can show that this model cannot capture the pattern in Figure 5, that would be fairly convincing.

      (3) Veracity detection or positivity bias?

      The "True feedback elicits greater learning" effect (Figure 6) may be simply a re-description of the positivity bias shown in Figure 5. This figure shows that people have higher CA for trials where the feedback was in fact accurate. But, assuming that people tend to choose more rewarding options, true-feedback cases will tend to also be positive-feedback cases. Accordingly, a positivity bias would yield this effect, even if people are not at all sensitive to trial-level feedback veracity. Of course, the reverse logic also applies, such that the "positivity bias" could actually reflect discounting of feedback that is less likely to be true. This idea has been proposed before as an explanation for confirmation bias (see Pilgrim et al, 2024 https://doi.org/10.1016/j.cognition.2023.105693 and much previous work cited therein). The authors should discuss the ambiguity between the "positivity bias" and "true feedback" effects within the context of this literature....

      The authors get close to this in the discussion, but they characterize their results as differing from the predictions of rational models, the opposite of my intuition. They write:

      Alternative "informational" (motivation-independent) accounts of positivity and confirmation bias predict a contrasting trend (i.e., reduced bias in low- and medium credibility conditions) because in these contexts it is more ambiguous whether feedback confirms one's choice or outcome expectations, as compared to a full-credibility condition.

      I don't follow the reasoning here at all. It seems to me that the possibility for bias will increase with ambiguity (or perhaps will be maximal at intermediate levels). In the extreme case, when feedback is fully reliable, it is impossible to rationally discount it (illustrated in Figure 6A). The authors should clarify their argument or revise their conclusion here.
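
      The confound raised at the start of this point (true feedback tends to be positive feedback whenever the chosen option is usually rewarded) can be illustrated with a short calculation. This is a sketch under simplifying assumptions: binary outcomes, an agent that flips the true outcome when it lies, and lying that is independent of the outcome. The reward probability used is hypothetical.

```python
def p_positive_given_veracity(p_reward: float, credibility: float) -> tuple[float, float]:
    """Return (P(positive feedback | feedback true), P(positive feedback | feedback false)).

    Assumes 0 < credibility < 1 so that both true and false feedback can occur.
    """
    # Joint probabilities of (veracity, reported valence).
    true_pos = credibility * p_reward                 # truthful report of a reward
    true_neg = credibility * (1.0 - p_reward)         # truthful report of no reward
    false_pos = (1.0 - credibility) * (1.0 - p_reward)  # lie about a non-reward
    false_neg = (1.0 - credibility) * p_reward          # lie about a reward
    return true_pos / (true_pos + true_neg), false_pos / (false_pos + false_neg)

# Hypothetical case: the chosen bandit pays out 70% of the time, agent is 75% credible.
given_true, given_false = p_positive_given_veracity(p_reward=0.7, credibility=0.75)
print(f"P(positive | true feedback)  = {given_true:.2f}")
print(f"P(positive | false feedback) = {given_false:.2f}")
```

      Under these assumptions the two conditional probabilities are simply the reward probability and its complement, so any tendency to choose the better option makes feedback veracity and feedback valence correlated.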

      (4) Disinformation or less information?

      Zooming out, from a computational/functional perspective, the reliability of feedback is very similar to reward stochasticity (the difference is that reward stochasticity decreases the importance/value of learning in addition to its difficulty). I imagine that many of the effects reported here would be reproduced in that setting. To my surprise, I couldn't quickly find a study asking that precise question, but if the authors know of such work, it would be very useful to draw comparisons. To put a finer point on it, this study does not isolate which (if any) of these effects are specific to *disinformation*, rather than simply *less information*. I don't think the authors need to rigorously address this in the current study, but it would be a helpful discussion point.

      (5) Over-reliance on analyzing model parameters

      Most of the results rely on interpreting model parameters, specifically, the "credit assignment" (CA) parameter. Exacerbating this, many key conclusions rest on a comparison of the CA parameters fit to human data vs. those fit to simulations from a Bayesian model. I've never seen anything like this, and the authors don't justify or even motivate this analysis choice. As a general rule, analyses of model parameters are less convincing than behavioral results because they inevitably depend on arbitrary modeling assumptions that cannot be fully supported. I imagine that most or even all of the results presented here would have behavioral analogues. The paper would benefit greatly from the inclusion of such results. It would also be helpful to provide a description of the model in the main text that makes it very clear what exactly the CA parameter is capturing (see next point).

      (6) RL or regression?

      I was initially very confused by the "RL" model because it doesn't update based on the TD error. Consequently, the "Q values" can go beyond the range of possible reward (SI Figure 5). These values are therefore *not* Q values, which are defined as expectations of future reward ("action values"). Instead, they reflect choice propensities, which are sometimes notated $h$ in the RL literature. This misuse of notation is unfortunately quite common in psychology, so I won't ask the authors to change the variable. However, they should clarify when introducing the model that the Q values are not action values in the technical sense. If there is precedent for this update rule, it should be cited.

      Although the change is subtle, it suggests a very different interpretation of the model.

      Specifically, I think the "RL model" is better understood as a sophisticated logistic regression, rather than a model of value learning. Ignoring the decay term, the CA term is simply the change in log odds of repeating the just-taken action in future trials (the change is negated for negative feedback). The PERS term is the same, but ignoring feedback. The decay captures that the effect of each trial on future choices diminishes with time. Importantly, however, we can re-parameterize the model such that the choice at each trial is a logistic regression where the independent variables are an exponentially decaying sum of feedback of each type (e.g., positive-cred50, positive-cred75, ... negative-cred100). The CA parameters are simply coefficients in this logistic regression.

      Critically, this is not meant to "deflate" the model. Instead, it clarifies that the CA parameter is actually not such an assumption-laden model estimate. It is really quite similar to a regression coefficient, something that is usually considered "model agnostic". It also recasts the non-standard "cross-fitting" approach as a very standard comparison of regression coefficients for model simulations vs. human data. Finally, using different CA parameters for true vs false feedback is no longer a strange and implausible model assumption; it's just another (perfectly valid) regression. This may be a personal thing, but after adopting this view, I found all the results much easier to understand.
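
      The re-parameterization described above can be sketched as follows. This is a simplified two-option illustration, assuming an update of the form h <- (1 - decay) * h + CA * signed feedback with no prediction-error term, a single propensity h read as the log odds of choosing option A, and no perseveration term. The coefficient values, the decay rate, and the event sequence are hypothetical, and the update rule is an assumption based on the description in this review rather than the authors' exact specification.

```python
import math

# Hypothetical coefficients for illustration only (not estimates from the paper).
CA = {"pos_cred50": 0.2, "neg_cred50": 0.1, "pos_cred100": 1.0, "neg_cred100": 0.8}
DECAY = 0.1

# Each event is (feedback type, sign). sign = +1 if the feedback favours option A
# (positive feedback after choosing A, or negative feedback after choosing B), else -1.
events = [("pos_cred100", +1), ("neg_cred50", -1), ("pos_cred50", +1), ("neg_cred100", +1)]

def recursive_propensity(events):
    """Trial-by-trial update: decay the old propensity, then add scaled feedback."""
    h = 0.0
    for kind, sign in events:
        h = (1.0 - DECAY) * h + CA[kind] * sign
    return h

def regression_propensity(events):
    """Same quantity as a linear predictor: CA coefficients times decayed feedback sums."""
    n = len(events)
    x = {kind: 0.0 for kind in CA}  # one regressor per feedback type
    for t, (kind, sign) in enumerate(events):
        x[kind] += (1.0 - DECAY) ** (n - 1 - t) * sign
    return sum(CA[kind] * x[kind] for kind in CA)

def p_choose_a(h: float) -> float:
    """Logistic link from log odds to choice probability."""
    return 1.0 / (1.0 + math.exp(-h))

h_rec = recursive_propensity(events)
h_reg = regression_propensity(events)
print(f"recursive = {h_rec:.6f}, regression = {h_reg:.6f}, P(choose A) = {p_choose_a(h_rec):.3f}")
```

      The two functions return the same value, which is the sense in which the CA parameters act as logistic regression coefficients. Under this assumed rule, a run of identical feedback drives h toward CA / decay rather than toward the reward magnitude, which is consistent with the observation that the fitted values can leave the range of possible rewards.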

    1. eLife Assessment

      This important study of the inhibitory complex amacrines (CAM) in the vertical lobe of Octopus vulgaris delivers a solid standard for the structural characterization of an anatomical region likely to be key for memory processing in this unconventional but complex organism, as well as a helpful classification of CAM subtypes. This work will be of broad relevance to the fields of memory and evolutionary neuroscience.

    2. Reviewer #1 (Public review):

      The authors identified five complex amacrine cell (CAM) subtypes based on their morphology and synaptic connectivity. It's suggested that the differences in structure may be directly correlated with different functional roles. The authors also describe synaptic compartmentalization in the SFL tract relating to three types of CAM input regions, again implying a specialized role for these cells. The authors also identified neural progenitor cells, which suggests that the octopus's vertical lobe can undergo neurogenesis throughout its life.

      The work presented here is valuable and convincing. Below are some suggestions the authors may wish to incorporate:

      a) Quantitative measurements to define the CAM subtypes

      I think the categorization of the CAMs into five subtypes is convincing; however, I wonder how easily these categories could be identified by other researchers. Would it be possible for the authors to include additional quantitative measurements of these cell types to make their categorization less qualitative and more quantitative? For example, density, volume, and orientation of their dendritic fields?

      b) The definition of the neuritic backbone is included in the methods, but I found the term confusing when I first encountered it in the results, so I would suggest adding the definition to the results too.

      c) The authors wrote, 'Note that given the pronounced difference in diameters between the neuritic backbones (208.27 +/-87.95 nm) and axons (121.55 +/- 21.28 nm)'. What figure is this in?

      d) I am slightly confused about how the authors decided on the specific cubes to reflect the different synaptic compartments in the SFL tract. Is this organisation arranged/repeated vertically or horizontally throughout the SFL tract? The location of the cubes looks to me to be chosen at random, so more information here would be helpful.

      e) In Figure 2, could the authors plot the number of synapses per cube to make the result clearer, so that cube 1 has the lowest synaptic density and cube 2 has the highest?

      f) SAMs are ACh and excitatory

      The authors refer to SAMs as excitatory cholinergic. They should provide more detailed explanations/citations to back up this claim. Could SAMs be synthesizing any other neurotransmitters? Could there be a subpopulation of inhibitory SAMs?

      g) CAMs are GABA and inhibitory

      The 5 subtypes of CAMs described here have never been directly confirmed to be GABAergic. Could CAMs be synthesizing any other neurotransmitters? Could a subpopulation of CAMs be excitatory? I believe the authors should make this clearer to readers when referring to CAMs, perhaps by saying, 'hypothesized to be inhibitory neurons', or 'putative inhibitory neurons'.

      h) Fast neurotransmitters and neuromodulators

      The authors refer to neuromodulatory connections in their summary in Figure 4; however, cephalopod receptors have yet to be extensively functionally characterized, so the role different molecules play as neurotransmitters or neuromodulators is not yet known. For example, many invertebrates are known to have functional diversity in their receptors: C. elegans has both excitatory and inhibitory receptors for a range of neurotransmitters; anionic ACh- and glutamate-gated channels and cationic peptide-gated channels have also been identified in some molluscs. The authors should therefore be cautious in speculating about how a particular transmitter/modulator acts in the octopus brain.

      i) In the methods, the authors refer to "an adult Octopus", what age and size was it? I also know this is Octopus vulgaris, but it would be good to specify it here.

      j) A general comment about all figures. All panels should have a letter associated with them to make it easier to refer to them in the text. For example, in Figure 4, please also add letters to the main schematic, the CAM subtypes, and the VL wiring diagram. In addition, D and E are missing boxes on the main schematic. It's also not immediately obvious that A-E are zooms of the larger schematic; perhaps this could be made clearer with colours or arrows. Please also add names to the CAM subtypes.

      k) Typo: 'Additionally, the unique characteristics of LTP in the octopus VL, such as its reliance on a NO-dependent mechanism, independent of de novo protein synthesis, persistent activation of (Turchetti-Maia et al., 2018).'

    3. Reviewer #2 (Public review):

      Summary:

      The paper examines the diversity of complex amacrine neurons in the vertical lobe of the adult octopus brain, a structure involved in learning and memory. The work builds on a recent paper by the authors that described the connectivity of the much larger population of simple amacrine (SAM) interneurons from the same pioneering EM volume.

      Strengths:

      While the EM volume only provides a snapshot of a tiny fraction of an adult octopus' brain, the authors can make specific conclusions and formulate precise hypotheses about neuron function, synaptic pathways, and developmental trajectories. One example is the reconstruction of a putative maturation sequence for the SAM neuronal lineage, based on the correlation of soma position and the number of synapses, uncovering a plausible developmental sequence of cell morphologies, with interesting parallels to vertebrate neurogenesis.

      Weaknesses:

      The weakness of the study is that it is examining a relatively small volume (260 × 390 × 27 µm), and several neurons are only incompletely reconstructed. It also remains unclear approximately how many neurons remain to be reconstructed from this volume.

      To improve the presentation, the authors should consider showing videos with the volumetric reconstructions of the different types with their partners/synapses and their relation to the SFL tract and SAMs. Such videos would help the reader to appreciate the morphological differences between the cell types. The authors could also consider carrying out further morphological analyses to strengthen their cell-type classification, including Sholl analysis, radial density of input and output synapses, the number of branch nodes, and similar measures.

    4. Reviewer #3 (Public review):

      (1) The authors described "the excitatory glutamatergic SFL axons and cholinergic SAM inputs". However, compelling evidence of their transmitter specificity was neither provided nor discussed in the context of the study.

      (2) The specific inference of inhibitory or excitatory synapses based on EM or other studies must be detailed and elaborated.

      (3) Different local microcircuits (submodules) referred to in the text should be better described and more specifically defined.

      (4) I would recommend incorporating a more detailed description of synapses and, especially, synaptic vesicles, clarifying their diversity and similarity across cell subtypes. Are there any differences between cholinergic and glutamatergic synaptic vesicles, postsynaptic densities, or other features? It would be good, if possible, to explicitly clarify: How many vesicles are there per synapse of each type? How many synapses per neuron of each type? How many inputs and outputs per given neuron?

      (5) The authors discuss retrograde messengers like NO. Is there any identifiable morphological type of neuron(s) or synapses that might be nitrergic?

      (6) It would be good to provide separate illustrations showing the detailed organization of any glial cell or different types of glial cells identified in this study. The authors mainly discuss glial processes but refer to "recognized glial types, such as radial glia and astrocyte-like glia" without specific illustrations, which can be deciphered from their EM data. What is the vesicular organization within the different types of glial cells?

      (7) The authors also discuss "supervising inputs of inhibitory (pain) and neuromodulatory (supervising) signals", without any details. It would be important to provide these details in the discussion. Specifically, I suggest incorporating comments about differences/similarities of transmitters and morphology between pain and modulatory pathways/signaling/circuits.

    5. Reviewer #4 (Public review):

      Summary:

      The authors present a follow-up to their initial publication of a volume EM reconstruction of a part of the Octopus vulgaris vertical lobe (VL) (Bidel, Meirovitch et al., eLife 2023). In their previous study, they presented a swath of novel observations pertaining to the neuron types making up the VL and their synaptic connectivity. Here, the authors present an extension of those findings in which they (1) demonstrate that the Complex Amacrine cells (CAMs), which they identified previously, can be grouped into at least 5 distinct subclasses; (2) show that there appear to be distinct compartments in the SFL tract that contain specific synapse types; and (3) present morphological evidence that there may be a neurogenic niche in the VL. The findings are intriguing, advance our understanding of memory circuitry in octopus and across the phylogenetic tree, and open new avenues for deeper investigation.

      Strengths:

      A deeper dissection of the morphologies of CAMs and their distinct complements of synapse types is valuable. The identification of multiple categories of CAMs makes it clearer how the very simple SFL-to-SAM connectivity is likely enriched by a population of diverse interneurons.

      The observation that synapse types may be compartmentalized in the superior frontal lobe tract is an intriguing one, and invites more extensive segmentation and future anatomical studies to further characterize the precise architecture of these compartments.

      Finally, the evidence of the possibility of a neurogenic niche in the VL is exciting as it suggests that ongoing neurogenesis may be a common feature of memory circuitry, perhaps contributing to keeping the representation space of the circuit flexible and adequately sparse.

      Weaknesses:

      A key weakness is the reconstruction and grouping of the CAMs:

      (1) CAMs are relatively few in number compared to SAMs, and as such, only 53 are reconstructed in this study. Of those 53 cells, 18 were not classified into one of the 5 categories the authors designate, raising the question of how robust those categories are.

      (2) Related to (1), in Figure 1B, the proportions given in the bar graph are given cumulatively across the entire population of each category. The proportions should be presented as means within each category to adequately capture the variability of the small sample sizes.

      (3) While the xy dimensions of the serial section EM volume are adequate to capture relatively whole cells and neuronal arbors, the volume is only 27 µm thick. Thus, many neurite branches are likely truncated in the z-dimension. This may have contributed to ~1/3 of CAMs eluding categorization. However, it is hard to estimate the effect this may have had without knowing the extent of the truncation. It may be worth the authors' time to count the proportion of CAM neurites that are cut off at the edges of the volume.

      (4) The authors state that CAMs appear to have axons and dendrites based on neurite widths. This is an interesting finding, given that amacrine cells are generally thought to possess only one type of neurite, which both sends and receives synaptic potentials, and therefore deserves more attention. Is the distribution of neurite widths indeed bimodal? Can the axons and dendrites be differentiated by examining the presence and absence of synaptic vesicle pools, respectively?
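
      To make point (2) concrete, here is a small sketch of why pooled proportions and per-cell means can diverge when a category contains few cells. The counts are invented for illustration and the function names are hypothetical; nothing here reproduces the data in Figure 1B.

```python
from statistics import mean, stdev


def pooled_proportion(cells):
    """Proportion computed cumulatively across all cells in a category.

    cells: list of (synapses_of_interest, total_synapses) per cell.
    """
    return sum(k for k, _ in cells) / sum(n for _, n in cells)


def per_cell_proportions(cells):
    """Proportion computed separately for each cell."""
    return [k / n for k, n in cells]


# Invented counts: one large cell dominates the pooled estimate.
category = [(90, 100), (1, 10), (2, 10)]

pooled = pooled_proportion(category)           # dominated by the first cell
per_cell = per_cell_proportions(category)
per_cell_mean = mean(per_cell)                 # each cell weighted equally
per_cell_sd = stdev(per_cell)                  # exposes cell-to-cell variability
```

      With these toy counts the pooled value is 0.775 while the per-cell mean is 0.4, and only the per-cell summary carries a spread that can be shown as an error bar.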

      In Figure 2, the compartmentalization of synapse types is intriguing; however, due to the 3D nature of the data, it is difficult to appreciate clearly from the panels presented. This is particularly true for the suggestion that glia may be forming a barrier around these compartments. This could be rectified by providing Neuroglancer links for these specific reconstructions (neurites, synapses, and glia).

      Lastly, although the identification of a putative neurogenic niche is tantalizing, morphological data alone is only an initial hint. Although the chances are slim, it would be more convincing if the authors could identify any actively dividing cells in the proposed niche. More likely, further work, for instance, immunofluorescence, which the lab has previously shown to be viable in octopus, will be needed to add weight to the claim.

    1. eLife Assessment

      This is an important study introducing a stimulus-computable model of multisensory perception that extends an existing framework to accept raw, stimulus-level inputs (i.e., image- and soundscape-computable). The author demonstrates how low-level correlation detection can drive both illusions and cue integration, and the model bridges diverse stimuli, behaviors, and species. The model and evidence provided are deemed generally convincing and of broad applicability, potentially impacting areas across neuroscience, psychology, and computational cognitive science. There are, however, certain aspects of the work considered incomplete, particularly as they relate to explaining details pertinent to model fitting.

    2. Reviewer #1 (Public review):

      Summary:

      Parise presents another instantiation of the Multisensory Correlation Detector model that can now accept stimulus-level inputs. This is a valuable development as it removes researcher involvement in the characterization/labeling of features and allows analysis of complex stimuli with a high degree of nuance that was previously unconsidered (i.e., spatial/spectral distributions across time). The author demonstrates the power of the model by fitting data from dozens of previous experiments, including multiple species, tasks, behavioral modalities, and pharmacological interventions.

      Strengths:

      One of the model's biggest strengths, in my opinion, is its ability to extract complex spatiotemporal co-relationships from multisensory stimuli. These relationships have typically been manually computed or assigned based on stimulus condition and often distilled to a single dimension or even a single number (e.g., "-50 ms asynchrony"). Thus, many models of multisensory integration depend heavily on human preprocessing of stimuli, and these models miss out on complex dynamics of stimuli; the lead modality distribution apparent in Figures 3b and c is provocative. I can imagine the model revealing interesting characteristics of the facial distribution of correlation during continuous audiovisual speech that have up to this point been largely described as "present" and almost solely focused on the lip area.

      Another aspect that makes the MCD stand out among other models is the biological inspiration and generalizability across domains. The model was developed to describe a separate process - motion perception - and in a much simpler organism - Drosophila. It could then describe a very basic neural computation that has been conserved across phylogeny (which is further demonstrated in the ability to predict rat, primate, and human data) and brain area. This aspect makes the model likely able to account for much more than what has already been demonstrated with only a few tweaks akin to the modifications described in this and previous articles from Parise.

      What allows this potential is that, as Parise and colleagues have demonstrated in those papers since our (re)introduction of the model in 2016, the MCD model is modular - both in its ability to interface with different inputs/outputs and its ability to chain MCD units in a way that can analyze spatial, spectral, or any other arbitrary dimension of a stimulus. This fact leaves wide open the possibilities for types of data, stimuli, and tasks a simplistic, neurally inspired model can account for.

      And so it's unsurprising (but impressive!) that Parise has demonstrated the model's ability here to account for such a wide range of empirical data from numerous tasks (synchrony/temporal order judgement, localization, detection, etc.) and behavior types (manual/saccade responses, gaze, etc.) using only the stimulus and a few free parameters. This ability is another of the model's main strengths that I think deserves some emphasis: it represents a kind of validation of those experiments, especially in the context of cross-experiment predictions (but see some criticism of that below).

      Finally, what is perhaps most impressive to me is that the MCD (and the accompanying decision model) does all this with very few (sometimes zero) free parameters. This highlights the utility of the model and the plausibility of its underlying architecture, but also helps to prevent extreme overfitting if fit correctly (but see a related concern below).

      Weaknesses:

      There is an insufficient level of detail in the methods about model fitting. As a result, it's unclear what data the models were fitted and validated on. Were models fit individually or on average group data? Each condition separately? Is the model predictive of unseen data? Was the model cross-validated? Relatedly, the manuscript mentions a randomization test, but the shuffled data produces model responses that are still highly correlated to behavior despite shuffling. Could it be that any stimulus that varies in AV onset asynchrony can produce a psychometric curve that matches any other task with asynchrony judgements baked into the task? Does this mean all SJ or TOJ tasks produce correlated psychometric curves? Or more generally, is Pearson's correlation insensitive to subtle changes here, considering psychometric curves are typically sigmoidal? Curves can be non-overlapping and still highly correlated if one is, for example, scaled differently. Would an error term such as mean-squared or root mean-squared error be more sensitive to subtle changes in psychometric curves? Alternatively, perhaps if the models aren't cross-validated, the high correlation values are due to overfitting?
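
      The concern about Pearson's correlation can be illustrated with a short sketch. It assumes two hypothetical sigmoidal psychometric curves, one of which is a vertically compressed copy of the other; the asynchrony values and slope are arbitrary and not taken from the manuscript. Because Pearson's r is invariant to affine rescaling, the curves correlate almost perfectly even though they disagree substantially in absolute terms, which an error measure such as RMSE picks up.

```python
import math


def logistic(x, slope):
    """Sigmoidal psychometric curve centred on zero asynchrony."""
    return 1.0 / (1.0 + math.exp(-x / slope))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical asynchronies in ms (assumed values).
soas = list(range(-300, 301, 50))
curve_a = [logistic(s, 50.0) for s in soas]
# Same shape, but compressed into the range 0.25 to 0.75.
curve_b = [0.25 + 0.5 * p for p in curve_a]

r = pearson(curve_a, curve_b)
error = rmse(curve_a, curve_b)
```

      Here r is essentially 1 while the RMSE is roughly 0.2 on a 0 to 1 response scale, so reporting both would make the goodness of fit easier to judge.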

      While the model boasts incredible versatility across tasks and stimulus configurations, fitting behavioral data well doesn't mean we've captured the underlying neural processes, and thus, we need to be careful when interpreting results. For example, the model produces temporal parameters fitting rat behavior that are 4x faster than when fitting human data. This difference in slope and a difference at the tails were interpreted as differences in perceptual sensitivity related to general processing speeds of the rat, presumably related to brain/body size differences. While rats no doubt have these differences in neural processing speed/integration windows, it seems reasonable that a lot of the differences in human and rat psychometric functions could be explained by the (over)training and motivation of rats to perform on every trial for a reward - increasing attention/sensitivity (slope) - and a tendency to make mistakes (compression evident at the tails). Was there an attempt to fit these data with a lapse parameter built into the decisional model as was done in Equation 21? Likewise, the fitted parameters for the pharmacological manipulations during the SJ task indicated differences in the decisional (but not the perceptual) process and the article makes the claim that "all pharmacologically-induced changes in audiovisual time perception" can be attributed to decisional processes "with no need to postulate changes in low-level temporal processing." However, those papers discuss actual sensory effects of pharmacological manipulation, with one specifically reporting changes to response timing. Moreover, and again contrary to the conclusions drawn from model fits to those data, both papers also report a change in psychometric slope/JND in the TOJ task after pharmacological manipulation, which would presumably be reflected in changes to the perceptual (but not the decisional) parameters.
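
      For reference, a lapse parameter is commonly added to a psychometric function as sketched below. This is a generic textbook formulation with a symmetric lapse rate and a cumulative Gaussian; it is an assumption for illustration and may differ from the form used in Equation 21 of the manuscript. The parameter names and values are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative Gaussian, written with the standard-library error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def psychometric(x, mu, sigma, lapse=0.0):
    """Probability of one response as a function of stimulus level x.

    On a fraction `lapse` of trials the observer is assumed to guess at
    random between the two responses, which compresses both tails.
    """
    return lapse / 2.0 + (1.0 - lapse) * normal_cdf(x, mu, sigma)


# Assumed values: sigma controls the slope, lapse controls the tails.
low_tail = psychometric(-400.0, mu=0.0, sigma=80.0, lapse=0.1)
midpoint = psychometric(0.0, mu=0.0, sigma=80.0, lapse=0.1)
high_tail = psychometric(400.0, mu=0.0, sigma=80.0, lapse=0.1)
```

      Separating the two parameters in this way lets a steep slope and compressed tails be estimated independently, rather than forcing the perceptual parameters to absorb lapses.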

      The case for the utility of a stimulus-computable model is convincing (as I mentioned above), but its framing as mission-critical for understanding multisensory perception is overstated, I think. The line for what is "stimulus computable" is arbitrary and doesn't seem to be followed in the paper. A strict definition might realistically require inputs to be, e.g., the patterns of light and sound waves available to our eyes and ears, while an even stricter definition might (unrealistically) require those stimuli to be physically present and transduced by the model. A reasonable looser definition might allow an abstract and low-dimensional representation of the stimulus, such as the stimulus envelope (which was used in the paper), to be an input. Ultimately, some preprocessing of a stimulus does not necessarily confound interpretations about (multi)sensory perception. And on the flip side, the stimulus-computable aspect doesn't necessarily give the model supreme insight into perception. For example, the MCD model was "confused" by the stimuli used in our 2018 paper (Nidiffer et al., 2018; Parise & Ernst, 2025). In each of our stimuli, the onset and offset drove strong AV temporal correlations across all stimulus conditions (including catch trials), but were irrelevant to participants performing an amplitude modulation detection task. The to-be-detected amplitude modulations, set at individual thresholds, were not a salient aspect of the physical stimulus, and thus only marginally affected stimulus correlations. The model was, of course, able to fit our data by "ignoring" the on/offsets (i.e., requiring human intervention), again highlighting that the model is tapping into a very basic and ubiquitous computational principle of (multi)sensory perception. But it does reveal a limitation of such a stimulus-computable model: that it is (so far) strictly bottom-up.

      The manuscript rightly chooses to focus a lot of the work on speech, fitting the MCD model to predict behavioral responses to speech. The range of findings from AV speech experiments that the MCD can account for is very convincing. Given the provided context that speech is "often claimed to be processed via dedicated mechanisms in the brain," a statement claiming a "first end-to-end account of multisensory perception," and findings that the MCD model can account for speech behaviors, it seems the reader is meant to infer that energetic correlation detection is a complete account of speech perception. I think this conclusion misses some facets of AV speech perception, such as integration of higher-order, non-redundant/correlated speech features (Campbell, 2008) and also the existence of top-down and predictive processing that aren't (yet!) explained by MCD. For example, one important benefit of AV speech is interactions on linguistic processes - how complementary sensitivity to articulatory features in the auditory and visual systems (Summerfield, 1987) allows constraint of linguistic processes (Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      References

      Campbell, R. (2008). The processing of audio-visual speech: empirical and neural bases. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1493), 1001-1010. https://doi.org/10.1098/rstb.2007.2155

      Nidiffer, A. R., Diederich, A., Ramachandran, R., & Wallace, M. T. (2018). Multisensory perception reflects individual differences in processing temporal correlations. Scientific Reports, 8(1), 1-15. https://doi.org/10.1038/s41598-018-32673-y

      Parise, C. V., & Ernst, M. O. (2025). Multisensory integration operates on correlated input from unimodal transient channels. eLife, 12. https://doi.org/10.7554/ELIFE.90841

      Peelle, J. E., & Sommers, M. S. (2015). Prediction and constraint in audiovisual speech perception. Cortex, 68, 169-181. https://doi.org/10.1016/j.cortex.2015.03.006

      Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by Eye: The Psychology of Lip-Reading (pp. 3-51). Lawrence Erlbaum Associates.

      Tye-Murray, N., Sommers, M., & Spehar, B. (2007). Auditory and Visual Lexical Neighborhoods in Audiovisual Speech Perception. Trends in Amplification, 11(4), 233-241. https://doi.org/10.1177/1084713807307409

    3. Reviewer #2 (Public review):

      Summary:

      Building on previous models of multisensory integration (including their earlier correlation-detection framework used for non-spatial signals), the author introduces a population-level Multisensory Correlation Detector (MCD) that processes raw auditory and visual data. Crucially, it does not rely on abstracted parameters, as is common in normative Bayesian models, but rather works directly on the stimulus itself (i.e., individual pixels and audio samples). By systematically testing the model against a range of experiments spanning human, monkey, and rat data, the author shows that the MCD population approach robustly predicts perception and behavior across species with a relatively small (0-4) number of free parameters.

      Strengths:

      (1) Unlike prior Bayesian models that used simplified or parameterized inputs, the model here is explicitly computable from full natural stimuli. This resolves a key gap in understanding how the brain might extract "time offsets" or "disparities" from continuously changing audio-visual streams.

      (2) The same population MCD architecture captures a remarkable range of multisensory phenomena, from classical illusions (McGurk, ventriloquism) and synchrony judgments, to attentional/gaze behavior driven by audio-visual salience. This generality strongly supports the idea that a single low-level computation (correlation detection) can underlie many distinct multisensory effects.

      (3) By tuning model parameters to different temporal rhythms (e.g., faster in rodents, slower in humans), the MCD explains cross-species perceptual data without reconfiguring the underlying architecture.

      Weaknesses:

      (1) The authors show how a correlation-based model can account for the various multisensory integration effects observed in previous studies. However, a comparison of how the two accounts differ would shed light on whether the correlation model is an implementation of the Bayesian computations (different levels in Marr's hierarchy) or makes testable predictions that can distinguish between the two frameworks. For example, the Bayesian model predicts that the variance of the cue-combined estimate is half the harmonic mean of the unimodal variances, and therefore lower than either of them. So, how the MCD framework predicts this reduced uncertainty could be one potential difference (or similarity) to the Bayesian model.

      (2) The authors show a good match for cue combination involving 2 cues. While Bayesian accounts provide a direct extension to more cues (also seen empirically, e.g., in Hecht et al. 2008), a discussion of how the MCD model extends to more cues would benefit the readers.
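
      The Bayesian prediction referred to in points (1) and (2) can be written down in a few lines. This is a sketch of standard reliability-weighted (maximum-likelihood) cue combination for independent Gaussian cues, not of the MCD model; the function name and the numbers are hypothetical.

```python
def combine_cues(estimates, variances):
    """Reliability-weighted combination of independent Gaussian cues.

    Each cue is weighted by its precision (1 / variance). The combined
    variance is the reciprocal of the summed precisions, so it is never
    larger than the smallest unimodal variance. Works for any number of cues.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    weights = [p / total_precision for p in precisions]
    combined_estimate = sum(w * e for w, e in zip(weights, estimates))
    combined_variance = 1.0 / total_precision
    return combined_estimate, combined_variance


# Two equally reliable cues: variance halves (4.0 -> 2.0).
two_cue_estimate, two_cue_variance = combine_cues([0.0, 10.0], [4.0, 4.0])

# Three cues: the same rule extends directly.
three_cue_estimate, three_cue_variance = combine_cues([0.0, 10.0, 10.0], [1.0, 2.0, 2.0])
```

      A comparable statement of how uncertainty shrinks in the MCD population response, for two cues and for more than two, would make the relationship between the two frameworks testable.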

      Likely Impact and Usefulness:

      The work offers a compelling unification of multiple multisensory tasks, including temporal order judgments, illusions, Bayesian causal inference, and overt visual attention, under a single, fully stimulus-driven framework. Its success with natural stimuli should interest computational neuroscientists, systems neuroscientists, and machine learning scientists. This paper thus makes an important contribution to the field by moving beyond minimalistic lab stimuli, illustrating how raw audio and video can be integrated using elementary correlation analyses.

    1. eLife Assessment

      The study conducted by Hurtado et al. offers important insights and solid evidence regarding the prediction of drug combinations for cancer treatment. By leveraging disease-specific drug response profiles and single-cell transcriptional signatures, this research not only demonstrates a novel and effective approach to identifying potential drug synergies but also enhances our understanding of the underlying mechanisms of drug response prediction.

    2. Reviewer #1 (Public review):

      Summary:

      Identifying drugs that target specific disease phenotypes remains a persistent challenge. Many current methods are only applicable to well-characterized small molecules, such as those with known structures. In contrast, methods based on transcriptional responses offer broader applicability because they do not require prior information about small molecules. Additionally, they can be rapidly applied to new small molecules. One of the most promising strategies involves the use of "drug response signatures": specific sets of genes whose differential expression can serve as markers for the response to a small molecule. By comparing drug response signatures with expression profiles characteristic of a disease, it is possible to identify drugs that modulate the disease profile, indicating a potential therapeutic connection.
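
      The general idea of comparing a drug response signature with a disease profile can be sketched as follows. This is a generic reversal score based on Pearson correlation, offered only as an illustration of the strategy; it is not the retriever algorithm, and the gene names, compound names, and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_by_reversal(disease_profile, drug_signatures):
    """Rank compounds from most to least anti-correlated with the disease.

    disease_profile: dict gene -> log fold change (disease vs. healthy).
    drug_signatures: dict compound -> dict gene -> log fold change
                     (treated vs. control), covering the same genes.
    """
    genes = sorted(disease_profile)
    disease = [disease_profile[g] for g in genes]
    scores = {
        compound: pearson(disease, [signature[g] for g in genes])
        for compound, signature in drug_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Invented toy values: drug_x opposes the disease profile, drug_y mimics it.
disease_profile = {"GENE_A": 2.0, "GENE_B": -1.5, "GENE_C": 0.5, "GENE_D": -1.0}
drug_signatures = {
    "drug_x": {"GENE_A": -1.8, "GENE_B": 1.2, "GENE_C": -0.4, "GENE_D": 0.9},
    "drug_y": {"GENE_A": 1.5, "GENE_B": -1.0, "GENE_C": 0.6, "GENE_D": -0.7},
    "drug_z": {"GENE_A": 0.1, "GENE_B": 0.2, "GENE_C": -0.1, "GENE_D": 0.0},
}

ranking = rank_by_reversal(disease_profile, drug_signatures)
```

      In this toy example the compound whose signature most strongly opposes the disease profile is ranked first, which is the sense in which a signature can "reverse" a disease profile.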

      This study aims to prioritize potential drug candidates and to forecast novel drug combinations that may be effective in treating triple-negative breast cancer (TNBC). Large consortia, such as the LINCS-L1000 project, offer transcriptional signatures across various time points after exposing numerous cell lines to hundreds of compounds at different concentrations. While this data is highly valuable, its direct applicability to pathophysiological contexts is constrained by the challenges in extracting consistent drug response profiles from these extensive datasets. The authors use their method to create drug response profiles for three different TNBC cell lines from LINCS.

      To create a more precise, cancer-specific disease profile, the authors highlight the use of single-cell RNA sequencing (scRNA-seq) data. They focus on TNBC epithelial cells collected from 26 diseased individuals compared to epithelial cells collected from 10 healthy volunteers. The authors are further leveraging drug response data to develop inhibitor combinations.

      Strengths:

      The authors of this study contribute to an ongoing effort to develop automated, robust approaches that leverage gene expression similarities across various cell lines and different treatment regimens, aiming to predict drug response signatures more accurately. There remains a gap in computational methods for inferring drug responses at the cell subpopulation level, which the authors are trying to address.

      Weaknesses:

      The major deficiencies in this revised manuscript are a lack of benchmarking against established methods, clarification of method limitations, and experimental validation.

      (1) The manuscript still lacks a direct comparison between the retriever tool and well-established methods. How does it perform compared to metaLINCS? Evaluating its performance relative to existing approaches is essential to demonstrate its added value and robustness.

      (2) The study remains limited by the absence of experimental validation. Are there supporting data from biological models or clinical trials? Figure 5F is important as this is the validation of the identified compounds in three cell lines. In the previous review, it was noted that the identified drugs had only a modest effect on cell viability. Furthermore, the efficacy of QL-XII-47 and GSK-690693 was found to be cell-line specific, showing activity against BT20 (the cell line used for LINCS transcriptional signature generation) but not against CAL120 and DU4475, which were not included in the signature derivation process. This raises concerns about the tool's ability to predict effective drugs. Additionally, the combination may have an effect because the drugs were tested at high concentrations. How does this effect compare in non-TNBC or normal immortalized breast cell lines? Finally, the DU4475 data were not reproducible, and the experiment must be repeated to ensure reliable comparisons.

      (3) A previous review requested a discussion on the limitations of the retriever tool, but the authors instead focused on the well-documented constraints of the LINCS dataset. Clearly defining the limitations of retriever will be critical for evaluating its potential applications and reliability.

      (4) The description of the database that the authors used should be corrected. Two examples are below:

      "The LINCS-L1000 project published transcriptional profiles of several cell lines." Exploring LINCS metadata will help to introduce the reader to this impressive catalog.

      "The portal then returns a ranked list of compounds that are likely to have an inverse effect on disease-associated gene expression levels". When selecting small molecules for use in the LINCS-L1000 platform, no link was established between the compounds and disease-associated gene expression levels.

      (5) Fig. 3 presents data on differentially expressed genes. However, without indicating whether these genes are up- or downregulated, it is difficult to assess their relevance to TNBC phenotypes and cancer burden.

      Additionally, presenting the new Biological Process Gene Ontology analysis in a format similar to Fig. 3C would be beneficial. The statement that these processes are closely related to cancer deregulation is somewhat vague. Instead, the findings may be discussed in relation to each enriched pathway, specifically in the context of TNBC biology and available treatments.

    3. Reviewer #2 (Public review):

      Summary:

      In their study, Osorio and colleagues present 'retriever,' an innovative computational tool designed to extract disease-specific transcriptional drug response profiles from the LINCS-L1000 project. This tool has been effectively applied to TNBC, leveraging single-cell RNA sequencing data to predict drug combinations that may effectively target the disease. The public review highlights the significant integration of extensive pharmacological data with high-resolution transcriptomic information, which enhances the potential for personalized therapeutic applications.

      Strengths:

      A key finding of the study is the prediction and validation of the drug combination QL-XII-47 and GSK-690693 for the treatment of TNBC. The methodology employed is robust, with a clear pathway from data analysis to experimental confirmation.

      Comments on revisions:

      I commend the authors for their thorough and thoughtful revisions, which have significantly strengthened the manuscript. The expanded discussion on the limitations of the LINCS-L1000 dataset and the inherent challenges of imputation techniques provides critical context for interpreting the tool's predictive accuracy. The addition of clinical implications, including strategies for integrating retriever into clinical trial design and its broader applicability to other diseases, enhances the translational relevance of the work. Addressing drug resistance mechanisms in the context of combination therapy further underscores the biological rationale for the approach.

      The transparency regarding computational requirements and ethical considerations (particularly data privacy, bias mitigation, and model validation) demonstrates a responsible and forward-thinking approach to computational biology. These additions not only improve the manuscript's rigor but also set a precedent for ethical practices in personalized medicine research.

      With these revisions, the authors have effectively addressed prior concerns and elevated the impact of their work. The manuscript now presents a compelling case for the retriever as a valuable tool in precision oncology.

    1. eLife Assessment

      This manuscript presents a valuable minimal model of habituation which is quantified by information-theoretic measures. The results here could be of use in interpreting habituation behavior in a range of biological systems. The evidence presented is solid, using simulations of the minimal model to recapitulate several hallmarks of habituation.

    2. Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponding to an optimization strategy that maximizes the mutual information between signal and readout in the steady-state and minimizes dissipation in the system also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation.

      The authors' simplified model serves as a good starting point for understanding habituation in different biological contexts as the model is simple enough to allow for some analytic understanding but at the same time exhibits most basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear:

      (1) How general is their finding that the optimal Pareto front coincides with the region of maximal information gain? For instance, what happens if the signal H_st (H_max) isn't very strong? Does it matter that in this case, H_st only has a minor influence on delta Q_R? In the binary switching case, what happens if H_max is rather different from H_st (and not just 20% off)? Or in a case where the adapted value corresponds to the average of H_max and H_min?

      (2) The comparison to experimental data isn't very convincing. For instance, is PCA performed simultaneously on both the experimental data set and on the model or separately? What are the units of the PCs in Fig. 6(b,c)? Given that the model parameters are chosen so that the activity decrease in the model is similar to the one in the data (i.e., that they show similar habituation in terms of the readout), isn't it expected that the dynamics in the PC1/2 space look very similar?

    3. Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment.

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be the favorable combination.

      Comments on the revision:

      The authors have adequately addressed the points I raised during the initial review. The text has been clarified at multiple instances, and the treatment of energy expenditure is now more rigorous. The manuscript is much improved both in terms of readability and scientific content.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary: 

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, addressing from both dynamical and information-theoretic perspectives how habituation can be realized. The authors identify that negative feedback provided with a slow storage mechanism is sufficient to explain habituation.

      Strengths: 

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      We thank the reviewer for highlighting the strength of our work and for their comments, which we believe have been instrumental in significantly improving our work and its scope. Below, we address all their concerns.

      Weaknesses: 

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation does not only refer to a decrease in the responsiveness upon repeated stimulation but as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery. 

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model. 

      We are really grateful to the reviewer for pointing out these aspects of habituation that we overlooked in the previous version of our manuscript. Indeed, our model is able to capture most of these 10 observed behaviors, specifically: 1) habituation; 2) spontaneous recovery; 3) potentiation of habituation; 4) frequency sensitivity; 5) intensity sensitivity; 6) subliminal accumulation. Here, we are following the same terminology employed in Eckert et al., Current Biology 34, 5646–5658 (2024), the paper highlighted by the reviewer. We have dedicated a section of the revised version of the manuscript to these hallmarks, substantiating the validity of our framework as a minimal model to have habituation. We remark that these are the sole hallmarks that can be discussed by considering one single external stimulus and that can be identified without ambiguity in a biochemical context. This observation is again in line with Eckert et al., Current Biology 34, 5646–5658 (2024).

      In the revised version, we employ the same strategy of the aforementioned work to determine when the system can be considered “habituated”. Indeed, we introduce a response threshold that is now discussed in the manuscript. We also included a note in the discussions stating that, since any biochemical model will eventually reach a steady state, subliminal accumulation, for example, can only be seen with the use of a threshold. The introduction of different storage mechanisms, ideally more detailed at a molecular level, can shed light on this conceptual gap. This is an interesting direction of research.
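
      As an illustration of the threshold criterion described above, the following minimal Python sketch marks a system as habituated at the first stimulus whose peak response has dropped by at least a chosen fraction relative to the first peak. The function name `first_habituated_stimulus`, the relative-decrement definition, and all numerical values are hypothetical; they are not taken from the manuscript or from Eckert et al.

```python
def first_habituated_stimulus(peaks, threshold):
    """Return the 1-based index of the first stimulus whose peak response
    has decreased by at least `threshold` (a fraction) relative to the
    first peak, or None if the threshold is never crossed.

    Hypothetical criterion, for illustration only.
    """
    if not peaks:
        return None
    first = peaks[0]
    for i, peak in enumerate(peaks, start=1):
        decrement = (first - peak) / first
        if decrement >= threshold:
            return i
    return None


# Illustrative peak responses to six identical stimuli (made-up numbers).
peaks = [1.00, 0.90, 0.84, 0.81, 0.80, 0.80]
print(first_habituated_stimulus(peaks, threshold=0.15))
```

      With such a criterion, a response that only approaches its time-periodic steady state asymptotically can still be assigned a finite number of stimuli after which it counts as habituated.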

      Furthermore, the habituated response in steady-state is approximately 20% less than the initial response, which seems to be achieved already after 3-4 pulses; the subsequent change in response amplitude seems to be negligible, although the authors state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization and is there a parameter range where more pronounced habituation responses can be observed?

      The reviewer is correct, but this is solely a consequence of the specific set of parameters we selected. We made this choice solely for visualization purposes in the previous version. In the revised version, in the section discussing the hallmarks of habituation, we also show other parameter choices for which the response decrement is more pronounced. Moreover, we remark that the contour plot of \Delta⟨U⟩ clearly shows that the decrement can largely exceed the 20% threshold presented in the previous version.

      In the revised version, also in light of the works highlighted by the reviewer, we decided to move the focus of the manuscript to the information-theoretic advantage of habituation. As such, we modified several parts of the main text. Also, in the region of optimal information gain, habituation is at an intermediate level. For this reason, we decided to keep the same parameter choice as the previous version in Figure 2.

      We stated that the time-periodic steady-state is reached “after a large number of stimuli” from a mathematical perspective. However, by using a habituation threshold, as done in Eckert et al., Current Biology 34, 5646–5658 (2024), we can state that the system is habituated after a few stimuli for each set of parameters. This aspect is highlighted in the revised version of the manuscript (see also the point above).

      The same is true for the information content (Figure 2f) - already at the first pulse, I_{U,H} ~ 0.7 and only negligibly increases afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time and the system extracts from these predictions about its responses. In the model presented by the authors, it seems the system already carries information about the environment which hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and it is very hard to clarify from the presented results, whether the proposed model can actually explain basic features of habituation, as mentioned above.

      As for the response decrement of the readout, we can certainly choose a set of parameters for which the information gain is higher. In the revised version, we also report the information at the first stimulation and when the system is habituated to give a better idea of the range of these quantities. At any rate, as the referee correctly points out, it is difficult to give an intuitive interpretation of the information in our minimal model.

      It is also important to remark that, since the readout population and the receptor both undergo fast dynamics (with appropriate timescales as discussed in the text), we are not observing the transient gain of information associated with the first stimulus. As such, the mutual information presents a discontinuous behavior that resembles the dynamics of the readout, thereby starting at a non-zero value already at the first stimulus.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

      We thank the reviewer for pointing out these relevant references. In the revised version, we highlighted that we discuss the information-theoretic aspects of habituation, while the aforementioned references focus on the dynamics of this phenomenon.

      Reviewer #1 (Recommendations for the authors):

      I would also like to note here the simplification of the proposed biological model - in particular, that the receptor can be in an active/passive state, as well as proposing the NF-kB signaling module as a possible molecular realization. Generally, a large number of cell surface receptors including RTKs or GPCRs have much more complex dynamics including autocatalytic activation that generally leads to bistability, and NF-kB has been demonstrated to have oscillatory and even chaotic dynamics (works of Savas Tay, Mogens Jensen and others). Considering this, the authors should at least discuss under which conditions TNF-alpha signaling could potentially serve as a molecular realisation for habituation.

      We thank the reviewer for bringing this to our attention. In the previous version, we reported the TNF signaling network only to show a similar coarse-grained modular structure. However, following a suggestion of reviewer #2, we decided to change Figure 1 to include a simplified molecular scheme of chemotaxis rather than TNF signaling, to avoid any source of confusion about this issue.

      Also, a minor point: Figures 2d-e are cited before 2a-c. 

      We apologize for the oversight. The structure of the Figures and their order is now significantly different, and they are now cited in the correct order. 

      Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation. 

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained:

      We thank the reviewer for deeming our work interesting and for considering it a solid starting point for understanding habituation in biological systems.

      (1) It is unclear what the finding of maximal information gain for intermediate habituation strength means for biological systems. Why is information gain as defined in the paper a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? Or, in other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time series mutual information between the stimulus and readout?

      This is a delicate aspect to discuss and we thank the referee for the comment. In the revised version, we report information gain, initial and final information, highlighting that both gain and final information are higher in regions where habituation is present. They have qualitatively similar behavior and highlight a clear information-theoretic advantage of this dynamical phenomenon. An important point is that, to determine the optimal Pareto front, we consider a prolonged stimulus and its associated steady-state information. Therefore, from the optimization point of view, there is no notion of “information gain” or “final information”, which are intrinsically dynamical quantities. As a result, the fact that the optimal curve lies in the region of optimal information gain is not expected a priori and hints at the potentially crucial role of this feature. In the revised version, we elucidate this aspect with several additional analyses.

      We would like to add that, from a naive perspective, while the first stimulation will necessarily trigger a certain (non-zero) mutual information, multiple observations of the same stimulus have to reflect into accumulated information that consequently drives the onset of observed dynamical behaviors, such as habituation.
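
      To make the quantities discussed above concrete, the following minimal Python sketch computes the mutual information between a signal H and a readout U from a joint distribution, and then the information gain as the difference between the mutual information after habituation and at the first stimulus, following the definition summarized by the reviewer. The binary variables and the two joint distributions are made-up illustrative assumptions; they are not the distributions of the model in the manuscript.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a joint distribution given as a
    nested list joint[h][u] = P(H = h, U = u)."""
    p_h = [sum(row) for row in joint]
    p_u = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for h, row in enumerate(joint):
        for u, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log(p / (p_h[h] * p_u[u]))
    return mi


# Made-up joint distributions of a binary signal H and a binary readout U,
# at the first stimulus and once the system is habituated.
joint_first = [[0.30, 0.20], [0.20, 0.30]]
joint_habituated = [[0.40, 0.10], [0.10, 0.40]]

i_first = mutual_information(joint_first)
i_final = mutual_information(joint_habituated)
information_gain = i_final - i_first
print(i_first, i_final, information_gain)
```

      In this toy example the initial information is already non-zero, and the gain is positive because the habituated joint distribution is more strongly correlated.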

      (2) The model is very similar to (or a simplification of) previous models for adaptation in living systems, e.g., for adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      We apologize for having missed this point. Our choice has been motivated by the fact that we wanted to avoid confusion between the usual definition of (perfect) adaptation and habituation. However, we now believe that this is not the case for the revised manuscript, and we now include chemotaxis as an example in Figure 1.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      We thank the reviewer for the suggestion. We agree that a priori, there is no reason to choose \delta Q_R or a function of the internal energy flux J_int (that, in the revised version, we are using in place of \dot\Sigma_int following the suggestion of reviewer #3). The rationale was to minimize \delta Q_R since this dissipation is unavoidable and stems from the presence of the storage inhibiting the receptor through the internal pathway. Indeed, considering the existence of two different pathways implementing sensing and feedback, the presence of any input will result in a dissipation produced by the receptor. This energy consumption is reflected in \delta Q_R.

      In the revised version, we now include in the optimization principle two energy contributions (see Eq. (14) of the revised manuscript): \delta Q_R and E_int, which is the energy consumption associated with the driven storage production per unit energy. All Figures have been updated accordingly. The results remain similar, as \delta Q_R still represents the main contribution, especially at high \beta.

      Furthermore, in the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the signal needs to be strong enough for the system to distinguish it from the intrinsic thermal noise (controlled by \beta). We also show that if the system is able to tune the inhibition strength \kappa, the Pareto frontiers at different ⟨H⟩ collapse into a single curve. This shows that, although the values of, e.g., the mutual information, depend on ⟨H⟩, the qualitative behavior of the system in this regime is effectively independent of it. We also added more details about this in the Supplementary Information.
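
      For readers less familiar with Pareto optimization, the following minimal Python sketch extracts the non-dominated set from a list of candidate (mutual information, dissipation) pairs, where information is to be maximized and dissipation minimized. It assumes a single scalar dissipation per candidate, whereas the revised manuscript combines \delta Q_R and E_int; the function name `pareto_front` and the candidate values are hypothetical and do not correspond to any parameter set in the manuscript.

```python
def pareto_front(points):
    """Return the non-dominated points among (information, dissipation)
    pairs, where information is maximized and dissipation is minimized."""
    front = []
    for i, (info_i, diss_i) in enumerate(points):
        dominated = False
        for j, (info_j, diss_j) in enumerate(points):
            if j == i:
                continue
            # Point j dominates point i if it is no worse in both
            # objectives and strictly better in at least one.
            no_worse = info_j >= info_i and diss_j <= diss_i
            strictly_better = info_j > info_i or diss_j < diss_i
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append((info_i, diss_i))
    return sorted(front)


# Made-up candidates, e.g. one per hypothetical (beta, sigma) pair.
candidates = [(0.2, 0.1), (0.5, 0.3), (0.4, 0.5), (0.8, 0.9), (0.7, 0.4)]
print(pareto_front(candidates))
```

      Here the candidate (0.4, 0.5) is discarded because (0.5, 0.3) carries more information at lower dissipation; the remaining points trade one objective against the other.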

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels? 

      The agreement between data and model is not surprising - we agree on this - since the data exhibit habituation. However, we believe that the fact that our minimal model is able to capture the features of a complex neural system just by looking at the PCs, without any explicit biological details, is non-trivial. We also stress that the 1st PC only reflects the feature that captures most of the variance of the data and, as such, it is difficult to have a-priori expectations on what it should represent. In the case of the data generated from the model, most of the variance of the activity comes from the switching signal, and similar considerations can be made for the looming stimulations in the data. We updated the manuscript to clarify this point.

      Reviewer #2 (Recommendations for the authors):

      (1) The abstract makes it sound like a new finding is that habituation is due to a slow, negative feedback mechanism. But, as mentioned in the introduction, this is a well-known fact. 

      We agree with the reviewer. We have revised the abstract.

      (2) Figure 2c: Why does the range of Delta Delta I_f include negative values if the corresponding region is shaded (right-tilted stripes)?

      The negative values in the range are those attained in the shaded region with right-tilted stripes. We decided to include them in the colorbar for clarity, since Delta Delta I_f is also plotted in the region where it attains negative values.

      (3) What does the Pareto front look like if the optimization is done for input statistics given by ⟨H⟩_min? 

      In the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the strength of the signal is crucial for the system to discriminate input and thermal noise (see also the answers above).

      In particular, in Figure 4 we explicitly compare the results of the Pareto optimization (which is done with a static input of a given statistics) with the dynamics of the model for different values of ⟨H⟩ in two scenarios, i.e., adaptive and non-adaptive inhibition strength (see answers above for details).

      We also remark that ⟨H⟩_min represents the background signal that the system is not trying to capture, which is why we never used it for optimization.

      (4) From the main text, it is rather difficult to understand how the comparison to the experimental data was performed. How was the PCA done exactly? What are the "features" of the evoked neural response? 

      The PCA on data is performed starting from the single-neuron calcium dynamics. To perform a fair comparison, we reconstruct similar but extremely simplified dynamics using our model, as explained in Methods, and perform the PCA on the analogous simulated data. We added a comment on this in the revised version. While these components capture most of the variance in the data, their specific interpretation is usually out of reach and we believe that it lies beyond the scope of this theoretical work. We also remark that the model does not contain all these biological details - a strong aspect in our opinion - and, as such, it cannot capture specific biological features.

      Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment. 

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination. 

      We thank the reviewer for this positive assessment of our work and its generality.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed: 

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      We thank the reviewer for raising this point. In the revised version, we have changed the abstract to reflect the reviewer’s points and the new structure and results of the manuscript.

      (2) Several clarifications are needed on the treatment of energy dissipation. 

      -   When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      We apologize to the reviewer for this typo. Indeed, \sigma sets the energy scale of feedback and, as such, it appears in the energetic driving given by the feedback on the receptor, i.e., in Eq. (1) together with \kappa. This typo has been corrected in the revised manuscript, and all subsequent equations are consistent.

      -   I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on ⟨H⟩, however, is not fully clear. If the environment were static and the memory block was absent, the term with ⟨H⟩ would still contribute to dissipation. What would be the nature of this dissipation?

      In the spirit of building a paradigmatic minimal model with a thermodynamic meaning, we considered H to act as an external thermodynamic driving. Since this driving acts on a different pathway with respect to the one affected by the storage, the receptor is driven out of equilibrium by its presence.

      By eliminating the memory block, we would also be necessarily eliminating the presence of the pathway associated with the storage effect (“internal pathway” in the manuscript), since its presence is solely due to the existence of a storage population. Therefore, in this case, the receptor would be a 2-state, 1-pathway system and, as such, it would always satisfy an effective detailed balance. As a consequence, the definition of \delta Q_R reported in the manuscript would not hold anymore and the receptor would not exhibit any dissipation. Thus, in a static environment and without a memory block, no receptor dissipation would be present. We would also like to stress that our choice to model two different pathways has been motivated by the observation that the negative feedback acts along a different pathway in several biochemical and biological examples. We made some changes to the model description in the revised version and we hope that this aspect has been clarified.
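
      The argument above can be checked numerically with a generic two-state calculation. The following minimal Python sketch computes the steady-state entropy production rate of a two-state system (passive and active) connected by one or more pathways, using the standard flux-times-affinity expression. This is a textbook formula and not the manuscript's definition of \delta Q_R; the function name `receptor_dissipation` and the rate values are hypothetical, and all rates are assumed to be strictly positive.

```python
import math


def receptor_dissipation(pathways):
    """Steady-state entropy production rate (units of k_B per unit time)
    of a two-state system (passive <-> active).

    `pathways` is a list of (k_on, k_off) pairs, one per pathway, where
    k_on is the passive -> active rate and k_off the active -> passive rate.
    """
    k_on_total = sum(k_on for k_on, _ in pathways)
    k_off_total = sum(k_off for _, k_off in pathways)
    p_active = k_on_total / (k_on_total + k_off_total)
    p_passive = 1.0 - p_active

    rate = 0.0
    for k_on, k_off in pathways:
        forward = p_passive * k_on   # probability flux passive -> active
        backward = p_active * k_off  # probability flux active -> passive
        rate += (forward - backward) * math.log(forward / backward)
    return rate


# One pathway: detailed balance holds, so the dissipation vanishes.
print(receptor_dissipation([(2.0, 1.0)]))
# Two pathways with different rate ratios: a net cycle flux appears.
print(receptor_dissipation([(2.0, 1.0), (1.0, 3.0)]))
```

      With a single pathway the forward and backward fluxes balance exactly at steady state, whereas two pathways with incompatible rate ratios sustain a net cycle flux and hence a strictly positive dissipation.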

      -   Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate?

      We agree with the referee that the reverse reaction we considered is not the microscopic reverse of the storage production. In the case of a fast readout population, we employed a coarse-grained view to compute this entropy production. To be more precise, we gladly welcomed the referee’s suggestion in the revised version and modified the manuscript accordingly. As suggested, we now employ the energy flux associated with the storage production to estimate the internal dissipation (see new Fig. 3). 

      In the revised version, we also use this quantity in the optimization procedure in combination with δQ_R (see new Fig. 4) to have a complete characterization of the system’s energy consumption. The conclusions are qualitatively identical to before, but we believe that now they are more solid from a theoretical perspective. For this important advance in the robustness and quality of our work, we are profoundly grateful to the referee.

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics? 

      The initial stimulus is indeed stochastic with an average constant in time and mimics the background (small) signal. We apply the (strong) stimulation when the system has already reached a stationary state with respect to the background. As can be seen in Fig. 2 of the revised version, the model response depends on the pre-stimulus level, since it sets the storage concentration before the stimulation arrives and, as such, the subsequent habituation dynamics. This dependence is important from a dynamical perspective. The information-theoretic picture has been developed, as noted above, by letting the system relax before the first stimulus. This eliminates this arbitrary dependence and provides a clearer idea of the functional advantages of habituation. Moreover, the optimization procedure is performed in a completely different setting, with no pre-stimulus at all, since we only have one prolonged stimulation. We hope that the revised version is clearer on all these points.

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

      We apologize for the lack of clarity and we thank the reviewer for spotting this issue. In Figure 4 (now Figure 5 in the revised manuscript) Δ⟨S⟩ is not exactly zero, but equal to 0.15% at the final point. It appeared as 0% in the plot due to an unwanted rounding in the plotting function that we missed. This has been fixed in the revised version, thank you.
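
      The kind of rounding artifact described here is easy to reproduce. The snippet below is only a sketch: the plotting function actually used is not described in the response, so the helper `percent_label` and its `decimals` parameter are hypothetical.

```python
def percent_label(value_percent, decimals=0):
    """Format a percentage for an axis or tick label.

    Hypothetical helper: with decimals=0, small non-zero values
    such as 0.15 are rounded and displayed as "0%".
    """
    return f"{value_percent:.{decimals}f}%"


rounded_label = percent_label(0.15)        # label with no decimal places
precise_label = percent_label(0.15, 2)     # label with two decimal places
```

      Showing enough decimal places in the label keeps a small but finite Δ⟨S⟩ distinguishable from an exact zero.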

      Reviewer #3 (Recommendations for the authors):

      (1) Page 2 | "Figure 1b-e" should be "Figure 1b-d" since there is no panel (e) in Figure 1. 

      (2) Figure 1a | In the top schematic, the symbol "k" is used, while in the rest of the text, the proportionality constant is denoted by κ. 

      We thank the reviewer for pointing this out. Figure 1 has been revised and the panels are now consistent. The proportionality constant (the inhibition strength) has also been fixed.

      (3) Figure 1a | I find the upper part of the schematic for Storage hard to perceive. I understand the lower part stands for the degradation reaction for storage molecules. The upper part stands for the synthesis reaction catalyzed by the readout population. I think the bolded upper arrow would explain it sufficiently well; the left/right arrows, together with the crossed green circle make that part of the figure confusing. Consider simplifying. 

      We decided to remove the left/right arrows, as suggested by the reviewer, as we agree that they were unnecessarily complicating the schematic. We hope that the revised version will be easier to understand.

      (4) Page 3 | It would be helpful to state what the temporal statistics of the input signal p_H(h,t) are, i.e. ⟨h(t) h(t')⟩. Looking at the example trajectory in Figure 1a, consecutive signal values do not seem correlated.

      We agree with the reviewer that this is an important detail and worth mentioning. We now explicitly state that consecutive values are not correlated, for simplicity. 

      (5) Figure 2 | I believe the label "EXTERNAL INPUT" refers to the *average* external input, not one specific realization (similar to panels (d) and (e) that report on average metrics). I suggest you indicate this in the label, or, what may be even better, add one particular realization of the stochastic input to the same graph.

      We thank the reviewer for spotting this. We now write that what we show is the average external signal. We prefer this solution rather than showing a realization of the stochastic input, since it is more consistent with the rest of the plots, where we always show average quantities. We also note that Figure 2 is now Figure 3 in the revised manuscript.

      (6) Figure 2d | The expression of Δ⟨U⟩ is the negative of the definition in Eq. (5). It should be corrected.

      In the revised version, both the definitions in Figure 2 (now Figure 3) and in the text (now Eq. (11)) are consistent.

      (7) Figure 3(d-e) caption | "where ⟨U⟩ starts to be significantly smaller than zero." There, it should be Δ⟨U⟩ instead of ⟨U⟩. 

      Thanks again, we corrected this typo.

    1. eLife Assessment

      This study presents an important set of new tools to facilitate Cre or Dre-mediated recombination in mice. The characterization of these new tools was done using solid and validated methodology. The work convincingly demonstrates the efficient gene knockout capability of these models and will progress the field.

    2. Reviewer #1 (Public review):

      This is a simple and potentially valuable approach to reduce Cre leak in amplified systems designed to improve CreER use across alleles. The revised work is improved with a direct comparison to the Benedito iSure-Cre line, providing some practical guidance for investigators. The authors do not address the issue of Cre toxicity or mosaic efficiency with low Tamoxifen use.

      The major improvement in my mind is the inclusion of Supp Fig 7 where the authors compare their loxCre to iSureCre. The discussion is somewhat improved, but still fails to discuss significant issues such as Cre toxicity in detail. As noted by most reviewers, without a biological question, the paper is entirely a technical description of a couple of new tools. Whether and to what extent journals such as eLife should publish every new technical innovation without rigorous functional comparison to prior tools is an important question raised by this study. There is already a plethora of available techniques, most of which look better on paper than they function in mice.

      However, I do feel that these tools will be of potential use to the field.

    3. Reviewer #2 (Public review):

      This work presents new genetic tools for enhanced Cre-mediated gene deletion and genetic lineage tracing. The authors optimise and generate mouse models that convert temporally controlled CreER or DreER activity to constitutive Cre expression, coupled with the expression of tdT reporter for the visualizing and tracing of gene-deleted cells. This was achieved by inserting a stop cassette into the coding region of Cre, splitting it into N- and C-terminal segments. Removal of the stop cassette by Cre-lox or Dre-rox recombination results in the generation of modified Cre that is shown to exhibit similar activity to native Cre. The authors further demonstrate efficient gene knockout in cells marked by the reporter using these tools, including intersectional genetic targeting of pericentral hepatocytes.

      The new models offer several important advantages. They enable tightly controlled and highly effective genetic deletion of even alleles that are difficult to recombine. By coupling Cre expression to reporter expression, these models reliably report Cre-expressing i.e. gene-targeted cells and circumvent false positives that can complicate analyses in genetic mutants relying on separate reporter alleles. Moreover, the combinatorial use of Dre/Cre permits intersectional genetic targeting, allowing for more precise fate mapping.

      The study and the new models also have limitations. The demonstration of efficient deletion of multiple floxed alleles in a mosaic fashion, a scenario where the lines would demonstrate their full potential compared to already existing models, has not been tested in the current study. Mosaic genetics is increasingly recognized as a key methodology for assessing cell-autonomous gene functions. The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. In addition, as discussed by the authors, a limitation of this line is the constitutive expression of Cre, which is associated with toxicity in some cases.

      Comments on revisions: I have no further comments.

    4. Reviewer #3 (Public review):

      Shi et al describe a new set of tools to facilitate Cre or Dre-recombinase-mediated recombination in mice. The strategies are not completely novel but have been pursued previously by the lab, which is world-leading in this field, and by others. The authors report a new version of the iSuRe-Cre approach, which was originally developed by Rui Benedito's group in Spain. Shi et al describe that their approach shows reduced leakiness compared to the iSuRe-Cre line. Furthermore, a new R26-roxCre-tdT mouse line was established after extensive testing, which enables efficient expression of the Cre recombinase after activation of the Dre recombinase. The authors carefully evaluated efficiency and leakiness of the new line and demonstrated the applicability by marking peri-central hepatocytes in an intersectional genetics approach. The paper represents the result of enormous, carefully executed efforts. Although I would have preferred to see a study which uses the wonderful new tools to address a major biological question, carefully conducted technical studies have an enormous value for the scientific community, clearly justifying publication.

      The new mouse lines generated in this study will enhance the precision of genetic manipulation in distinct cell types and greatly facilitate future work in numerous laboratories. The authors expertly eradicated weaknesses from initial submissions. Remaining open questions regarding potential toxicity of expressing multiple recombinases and fluorescent reporters were convincingly answered.

    1. eLife Assessment

      This analysis of the formation of the oral-aboral body axis in cnidarians, the sister group of bilaterians, is a significant and fundamental contribution to the field of Wnt signalling and planar cell polarity, particularly to our understanding of gradient formation, non-canonical Wnt signalling and Wnt-Frizzled interactions in cnidarians. The evidence supporting the conclusions is compelling and has the potential to contribute to a deeper understanding of the origin and evolution of Wnt signalling in cnidarians and metazoans in general. These findings, which are presented in a thoughtful and scholarly manner, will be of broad interest to developmental and evolutionary biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This noteworthy paper examines the role of planar cell polarity and Wnt signalling in body axis formation of the hydrozoan Clytia. In contrast to the freshwater polyp Hydra or the sea anemone Nematostella, Clytia represents a cnidarian model system with a complete life cycle (planula larva-polyp-medusa). In this species, classical experiments have demonstrated that a global polarity is established from the oral end of the embryos (Freeman, 1981). Prior research has demonstrated that Wnt3 plays a role in the formation of the oral organiser in Clytia and other cnidarians, acting in an autocatalytic feedback-loop with β-catenin. However, the question of whether and to what extent an oral-aboral gradient of Wnt activity is established remained unanswered. This gradient is thought to control both tissue differentiation and tissue polarity. The planar cell polarity (PCP) pathway has been linked to this polarity, although it is generally considered to be β-catenin independent.

      Comments on major strengths and weaknesses:

      The paper presents beautiful and solid experiments that clarify the role of canonical Wnt signalling and PCP core factors in coordinating planar cell polarity of Clytia. The authors have conducted a series of sophisticated experiments utilising morpholinos, mRNA microinjections and immunofluorescent visualisation of PCP. The objective of these experiments was to address the function of Wnt3, β-catenin and PCP core proteins in the coordination of the global polarity of Clytia embryos. The authors conclude that PCP plays a role in regulating polarity along the oral-aboral axis of embryos and larvae. This offers a conceivable explanation for how polarity information is established and distributed globally during Clytia embryogenesis, with implications for our understanding of axis formation in cnidarians and the evolution of Wnt signalling in general. While the experiments are well-designed and executed, there are some criticisms, questions or suggestions that should be addressed.

      (i) Wnt3 cue and global PCP. PCP has been described in detail in a previous paper on Clytia (Momose et al, 2012): its orientation along the oral-aboral body axis (ciliary basal body positioning studies), and its function in directional polarity during gastrulation (Stbm-, Fz1-, and Dsh-MO experiments). I wonder if this part could be shortened. What is new, however, are the knockdown and Wnt3-mRNA rescue experiments, which provide a deeper insight into the link between Wnt3 function in the blastopore organiser as a source or cue for axis formation. These experiments demonstrate that the Wnt3 knockdown induces defects equivalent to PCP factor knockdown, but can be rescued by Wnt3-mRNA injection, even at a distance of 200 µm away from the Wnt-positive area. The experimental set-up of these new molecular experiments follows in important aspects those of Freeman's experiments of 1981 (who in turn was motivated to re-examine Teissier's work of 1931/1933 ...). Freeman did not use the term "global polarity" but the concept of an axis-inducing source and a long-range tissue polarity can be traced back to both researchers.

      (ii) PCP propagation and β-catenin. The central but unanswered question in this study focuses on the interaction between Wnt3 and PCP and the propagation of PCP. Wnt3 has been described in cnidarians but also in vertebrates and insects as a canonical Wnt interacting with β-catenin in an autocatalytic loop. The surprising result of this study is that the action of Wnt3 on PCP orientation is not inhibited in the presence of a dominant-negative form of CheTCF (dnTCF), ruling out a potential function of β-catenin in PCP. This was supported by studies with constitutively active β-catenin (CA-β-cat) mRNA, which was unable to restore either PCP coordination or elongation of Wnt3-depleted embryos but did restore β-catenin-dependent gastrulation. Based on these data, the authors conclude that Wnt3 has two independent roles: Wnt/β-catenin activation and initial PCP orientation (two-step model for PCP formation). However, the molecular basis for the interaction of Wnt3 with the PCP machinery and how the specificity of Wnt3 for both pathways is regulated at the level of Wnt-receiving cells (Fz-Dsh) remain unresolved. Also, with respect to PCP propagation, there is no answer regarding the underlying mechanisms. The authors found that PCP components are expressed in the mid-blastula stage, but without any further indication of how the signal might be propagated, e.g., by a wavefront of local cell alignment. Here, it is necessary to address the underlying possible cellular interactions more explicitly.

      (iii) The proposed two-step model for PCP formation has important evolutionary implications in that it excludes the current alternate model according to which a long-range Wnt3-gradient orients PCP ("Wnt/β-catenin-first"). Nevertheless, the initial PCP orientation by Wnt3 - as proposed in the two-step model - is not explained at all on the molecular level. Another possible, but less well discussed and studied option for linking Wnt3 with PCP action could be a role of other Wnt pathways. The authors convincingly show that Wnt3 is the most highly expressed Wnt in Clytia at all stages of development (Fig. S1). However, Wnt7 is also more highly expressed, which makes it a candidate for signal transduction from canonical Wnts to PCP Wnts. An involvement of Wnt7 in PCP regulation has been described in vertebrates (http://dx.doi.org/10.1016/j.celrep.2013.12.026). This would challenge the entire discussion and speculation on the evolutionary implications according to which PCP Wnt signaling comes first ("PCP-first scenario") and canonical Wnt signaling later in metazoan evolution.

      (iv) The discussion, including Figure 6, is strongly biased towards the traditional evolutionary scenario postulating a choanozoan-sponge ancestry of metazoans. Chromosome-linkage data of pre-metazoans and metazoans (Schulz et al., 2023; https://doi.org/10.1038/s41586-023-05936-6) now indicate a radically different scenario according to which ctenophores represent the ancestral form and are sister to sponges, cnidarians and bilaterians (the Ctenophora-sister hypothesis). This also has implications for the evolution of Wnt signalling, as discussed in the recent Nature Genetics Review by Holzem et al. (2024) (https://doi.org/10.1038/s41576-024-00699-w). Furthermore, it calls into question the hypothesis of a filter-feeding multicellular gastrula-like ancestor as proposed by Haeckel (Maegele et al., 2023). These papers have not yet been referenced, but they would provide a more robust discussion.

      General appraisal:

      The authors have carefully addressed all important points raised in this review. Aims and results support their conclusions.

      Impact of the work, utility of methods and data:

      As mentioned above, this will have a major impact on our understanding of the role of Wnt signalling in gradient formation, particularly the role of non-canonical Wnt signalling. It will also be important to better understand the role of Wnt-Frizzled interactions in these basal organisms, as cnidarians have a smaller repertoire of Frizzled receptors compared to the relatively complete repertoire of Wnt subfamilies. This may imply that Wnt3 is active in both canonical and PCP signalling.

      Additional context:

      With regard to the question of the evolution of the body plan and Wnt signalling, it would be helpful and important for readers unfamiliar with cnidarians to know that the Hydrozoa/Medusozoa, to which Clytia belongs, are an "evolutionary derived group" within the Cnidaria, as opposed to the Anthozoa (e.g. sea anemone Nematostella). Hydrozoans possess planula larvae that are devoid of a mouth and any form of feeding mechanism, relying instead on the yolk of a fertilised egg for sustenance. The substantial divergence between the Anthozoa and Medusozoa was accompanied by significant gene reductions within the Medusozoa, which likely exerts an influence on the evolution of Wnt signalling in this group as well. This should not detract from the value of the work, but may help to put it in perspective.

    3. Reviewer #2 (Public review):

      Summary:

      Canonical Wnt signaling has previously been shown to be responsible for correct patterning of the oral-aboral axis as well as germ layer formation in several cnidarians. The post-gastrula stage, the planula larva, is not only elongated but also has a specific swimming direction due to the decentralized cellular positioning and slanted anchoring of the cilia. This, in turn, is in most other animals the result of a Wnt-planar cell polarity pathway. This paper by Uveira et al investigates the role of Wnt3 signaling in serving as a local cue for the PCP pathway, which then is responsible for the orientation of the cilia and elongation of the planula larva of the hydrozoan Clytia hemisphaerica. Wnt3 was shown before to activate the canonical pathway via β-catenin and to act as an axial organizer. The authors provide compelling evidence for this somewhat unusual direct link between the pathways through the same signaling molecule, Wnt3. In conclusion, they propose a two-step model: 1) local orientation by Wnt3 secretion 2) global propagation by the PCP pathway over the whole embryo.

      Strengths:

      In a series of elegant and also seemingly sophisticated experiments, they show that Wnt3 activates the PCP pathway directly, as it happens in the absence of canonical Wnt signaling (e.g. through co-expression of dnTCF). Conversely, constitutively active β-catenin was not able to rescue PCP coordination upon Wnt3 depletion, yet restored gastrulation. This uncouples the effect of Wnt3 on axis specification and morphogenetic movements from the elongation via PCP. Through transplantation of single blastomeres providing a local source of Wnt3, they also demonstrate the reorganization of cellular polarity immediately adjacent to the Wnt3 expressing cell patch. These transplantation experiments also uncover that mechanical cues can trigger the polarization, suggesting a mechanotransduction or direct influence on subcellular structures, e.g. actin fiber orientation.

      This is a beautiful and elegant study addressing an important question. The results have significant implications also for our understanding of the evolutionary origin of axis formation and the link of these two ancient pathways, which in most animals are controlled by distinct Wnt ligands and Frizzled receptors. The quality of the data is stunning and the paper is written in a clear and succinct manner. This paper has the potential to become a widely cited milestone paper.

      Weaknesses:

      I cannot detect any major weaknesses. The work only raises a few more follow-up questions, which the authors are invited to comment on.

      I acknowledge the revisions made by the authors. Some open questions remain that need to be addressed in future work, and I accept the limitations of this study, as argued by the authors. Besides the elegant and high-quality experiments, I also appreciate the thoughtful and inspiring discussion.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Wnt3 cue and global PCP. PCP has been described in detail in a previous paper on Clytia (Momose et al, 2012): its orientation along the oral-aboral body axis (ciliary basal body positioning studies), and its function in directional polarity during gastrulation (Stbm-, Fz1-, and Dsh-MO experiments). I wonder if this part could be shortened. What is new, however, are the knockdown and Wnt3-mRNA rescue experiments, which provide a deeper insight into the link between Wnt3 function in the blastopore organiser as a source or cue for axis formation. These experiments demonstrate that the Wnt3 knockdown induces defects equivalent to PCP factor knockdown, but can be rescued by Wnt3-mRNA injection, even at a distance of 200 µm away from the Wnt-positive area. The experimental set-up of these new molecular experiments follows in important aspects those of Freeman's experiments of 1981 (who in turn was motivated to re-examine Teissier's work of 1931/1933 ...). Freeman did not use the term "global polarity" but the concept of an axis-inducing source and a long-range tissue polarity can be traced back to both researchers.

      We appreciate the reviewer’s insightful comments on evolutionary biology and cnidarian developmental biology.

      Concerning the presentation of the basic PCP structure of Clytia embryo epidermal cells, we prefer to retain this section unless there is a strict limit on manuscript length. These experiments provide background information necessary to establish the biological system for the readers. The structures of cells, notably cell adhesion, cilia, and the cytoskeleton, are essential components of this system.

      We have restored sentences concerning the historical contributions of Freeman and Teissier from a previous version of the manuscript.

      Freeman’s work offered two key insights. The first is the concept that cell polarity spreads and self-organizes over the distances revealed by the tissue orientation of aggregate embryonic cells (Freeman, 1981 https://doi.org/10.1007/BF00867804), which was termed “global polarity” in a review by Primus and Freeman (2004 https://doi.org/10.1002/bies.20031). This concept closely resembles the modern understanding of PCP coordination mechanisms mediated by core PCP interactions. Remarkably, Freeman proposed this idea in the early 1980s, at the same time as the first characterization of PCP mutants in Drosophila (Gubb and Garcia-Bellido 1982). The second is the role of egg polarity in defining the axis. Freeman demonstrated through a series of sophisticated experiments that the position of the first cleavage furrow predicts the oral-aboral axis. This was a milestone for the studies of cnidarian body axis development.

      However, some of Freeman’s interpretations were misleading. In the 1981 paper, he stated:

      "Polarity

      Other work that I have done has established that the anterior-posterior axis of the planula is set up at the time of the first cleavage; the site where cleavage is initiated specifies the posterior pole of this axis (Freeman 1980). The experiment reported here in which embryos were cut into halves and each half regulated to form a normal planula with the same polarity properties as the embryo it is from provides evidence that these polarity properties are remarkably stable at all developmental stages tested ranging from 4 cell to postgastrula embryos. "

      Freeman hypothesised that cell polarity at the 2- or 4-cell stage, referred to as the “polarity of first cell cleavage,” is directly inherited as the global polarity observed in later developmental stages.

      In the review by Primus and Freeman (2004), two hypotheses were introduced: (1) maternally localised factors, such as mRNA, determine the axis, and (2) the cell polarity of cleavage furrow formation is inherited by later stages and determines the axis. Freeman described these two hypotheses as mutually exclusive. However, we now know that cell polarity at early cleavage stages does not directly contribute to global polarity/PCP. Instead, Wnt/β-catenin signaling is regionally activated by maternally localised mRNAs distributed along the egg polarity (Momose, 2007; Momose, 2008), which maintain Wnt3 localisation and direct morphological axis patterning. The study presented in this article unifies these hypotheses.

      On the second point, as the reviewer noted, Freeman indeed revisited the work of Georges Teissier (Teissier, 1931), who conducted similar experiments on Amphisbetia embryos. It was Teissier who first described how the egg polarity is preserved in later stages and defines the axis. Teissier, however, carefully avoided asserting continuity between egg and blastula polarities, allowing for the possibility of “rétablissement” (re-establishment). As Teissier stated:

      "…On constate, en second lieu, que la polarité de l’œuf se conserve dans chacun de se fragment et que le maintien ou le rétablissement de cette polarité sont indispensables à un développement normal. Un fragment d’œuf ou de morula n’a aucune partie ni aucun blastomère qui soit rigoureusement déterminé comme endoderme, mais possède, par contre, un pôle antérieur et un pôle postérieur bien définis.…

      Mais cette proposition, qui ne semble pourtant guère dépasser la simple constatation des faits, soulève de grave difficulté. Elle donne en effet à la polarité, propriété encore bien mystérieuse, un rôle morphogénétique de premier ordre et implique des conséquences trop importantes pour qu’on puisse l’accepter sans un très sérieux examen.

      Comme je ne pense pas que les questions relatives à la nature des localisation germinales, à l’existence et au fonctionnement des organisateurs de l’œuf des Cœlentérés, puissant, dans l’état actuel de nos connaissances, être discutées utilement, je ne veux voir dans la proposition précédente qu’une façons commode et tout provisoire de systématiser les faits."

      English translation:

      “We note also that the polarity of the egg is preserved in each fragment and that the maintenance or re-establishment of this polarity is essential for normal development. A fragment of egg or morula has no part or blastomere that is rigorously determined as endoderm, but has, on the other hand, a well-defined anterior and posterior pole....

      But this proposition, which hardly seems to go beyond the simple observation of facts, raises serious difficulties. It gives polarity, still a mysterious property, a morphogenetic role of the first order, and implies consequences too important to be accepted without very serious examination.

      As I do not believe that questions concerning the nature of germinal localisation, or the existence and functioning of the egg organisers in Coelenterates, can, in the present state of our knowledge, be usefully discussed, I prefer only to see in the foregoing proposition a convenient and very provisional way of systematising the facts.”

      Teissier, G. (1931). Étude Expérimentale du Développement de Quelques Hydraires. Ann. Sc. Nat. Zool 14, 5–59.

      Teissier's interpretation and caution were reasonable.

      Our work connects recent molecular research on axis specification mechanisms in cnidarians with the classic experimental studies of Freeman and Teissier. We believe it is essential to present and acknowledge their conceptual contributions. We have updated the Discussion to include these points.

      (2) PCP propagation and β-catenin. The central but unanswered question in this study focuses on the interaction between Wnt3 and PCP and the propagation of PCP. Wnt3 has been described in cnidarians but also in vertebrates and insects as a canonical Wnt interacting with β-catenin in an autocatalytic loop. The surprising result of this study is that the action of Wnt3 on PCP orientation is not inhibited in the presence of a dominant-negative form of CheTCF (dnTCF), ruling out a potential function of β-catenin in PCP. This was supported by studies with constitutively active β-catenin (CA-β-cat) mRNA, which was unable to restore either PCP coordination or elongation of Wnt3-depleted embryos but did restore β-catenin-dependent gastrulation. Based on these data, the authors conclude that Wnt3 has two independent roles: Wnt/β-catenin activation and initial PCP orientation (two-step model for PCP formation). However, the molecular basis for the interaction of Wnt3 with the PCP machinery and how the specificity of Wnt3 for both pathways is regulated at the level of Wnt-receiving cells (Fz-Dsh) remain unresolved. Also, with respect to PCP propagation, there is no answer regarding the underlying mechanisms. The authors found that PCP components are expressed in the mid-blastula stage, but without any further indication of how the signal might be propagated, e.g., by a wavefront of local cell alignment. Here, it is necessary to address the underlying possible cellular interactions more explicitly.

      The question of how Wnt3 interacts with the core PCP complex remains open for future investigation. An obvious hypothesis is that one of the Frizzled receptors binds Wnt3 ligands. For additional details, please refer to the response to Reviewer 2’s comment. Regarding other non-classic Wnt receptors, studies in the developing mouse limb have demonstrated that a Wnt5a gradient controls PCP polarisation via ROR receptors and graded Strabismus phosphorylation (Gao et al., 2011, https://doi.org/10.1016/j.devcel.2011.01.001). However, in this context, the Wnt5a gradient influences the frequency of polarised cells rather than PCP orientation. In Clytia, we performed gene knockdown experiments targeting ROR and RYK receptors using Morpholinos but did not observe any effect on axial patterning, suggesting that these receptors are unlikely to be involved in Wnt3 interaction.

      Concerning PCP propagation mechanisms, these are well-characterized in both Drosophila and vertebrates and conserved across taxa. The localised Fz-Fmi complex at the apical cortex of a cell interacts with the oppositely localised Stbm-Fmi complex in neighbouring cells, enabling coordination of PCP between directly adjacent cells. This interaction provides a comprehensive explanation for PCP propagation mechanisms. In Drosophila, the “domineering non-autonomy” effect is a well-documented phenomenon where PCP orientation autonomously propagates from core PCP mutant mosaic patches. Overall, PCP propagation is a conserved and robust mechanism across metazoans.

      (3) The proposed two-step model for PCP formation has important evolutionary implications in that it excludes the current alternate model according to which a long-range Wnt3-gradient orients PCP ("Wnt/β-catenin-first"). Nevertheless, the initial PCP orientation by Wnt3 - as proposed in the two-step model - is not explained at all on the molecular level. Another possible, but less well-discussed and studied option for linking Wnt3 with PCP action could be the role of other Wnt pathways. The authors convincingly show that Wnt3 is the most highly expressed Wnt in Clytia at all stages of development (Figure S1). However, Wnt7 is also more highly expressed, which makes it a candidate for signal transduction from canonical Wnts to PCP Wnts. An involvement of Wnt7 in PCP regulation has been described in vertebrates (http://dx.doi.org/10.1016/j.celrep.2013.12.026). This would challenge the entire discussion and speculation on the evolutionary implications according to which PCP Wnt signaling comes first ("PCP-first scenario") and canonical Wnt signaling later in metazoan evolution.

      First of all, we apologise that the expression profile of Wnt7 originally provided in Figure S1 was incorrect; Wnt7 is not expressed in the embryonic stage. The error came from the accession number XLOC_034538 being assigned to two transcripts, Wnt7 and Ataxin10, in the published genome assembly. Once the expression profile is revised in this light, the data are consistent with the in situ hybridisation data published in Momose et al. (2012, https://doi.org/10.1242/dev.084251). Wnt3 is the only Wnt ligand detectable between egg and gastrula stages. We appreciate the reviewer highlighting this issue and have corrected Figure S1.

      If we understand correctly, the reviewer raises the possibility that Wnt3's downstream canonical Wnt/β-catenin pathway activates the expression of other Wnt genes, which in turn orient the PCP. Indeed, we showed that the expression of Wnt1 (previously called WntX2), Wnt2 (WntX1A), Wnt5 and Wnt6 (Wnt9) all becomes undetectable at the planula stage following Wnt3-MO injection (Momose et al., 2012). So, it is a reasonable concern.

      This possibility can be excluded because the canonical pathway activation by CA-β-cat does not restore PCP in Wnt3-MO-injected embryos and Wnt3 can orient PCP without Wnt/β-catenin pathway activity in the presence of dominant negative TCF (dnTCF). Concerning Wnt1b and Wnt11b, these transcripts are maternally stored and even more abundant than Wnt3. However, we can conclude that these do not have any role in axis patterning based on the complete axis loss in Wnt3-MO morphants.

      Lastly, it should of course be remembered that the chronological order of characters appearing in a developmental process does not necessarily reflect their appearance in evolution from ancestral to modern.

      (4) The discussion, including Figure 6, is strongly biased towards the traditional evolutionary scenario postulating a choanozoan-sponge ancestry of metazoans. Chromosome-linkage data of pre-metazoans and metazoans (Schulz et al., 2023; https://doi.org/10.1038/s41586-023-05936-6) now indicate a radically different scenario according to which ctenophores represent the ancestral form and are sister to sponges, cnidarians and bilaterians (the Ctenophora-sister hypothesis). This has also implications for the evolution of Wnt signalling, as discussed in the recent Nature Genetics Review by Holzem et al. (2024) (https://doi.org/10.1038/s41576-024-00699-w). Furthermore, it calls into question the hypothesis of a filter-feeding multicellular gastrula-like ancestor as proposed by Haeckel (Maegele et al., 2023). These papers have not yet been referenced, but they would provide a more robust discussion.

      I overlooked the excellent work of Holzem and colleagues. I appreciate this suggestion. The work, unfortunately, focusses mainly on the Wnt/β-catenin pathway. The PCP pathway consists of not only core PCP (Fmi, Stbm, Pk, Dgo, Fz and Dsh) but many other components, such as Rho GTPase, which are all dealt with as "PCP" in this review. While the full set of core PCP is present only in the phylum Cnidaria and bilaterians, Pk and Dgo are present in choanoflagellates and Rho GTPase or ROCK are present even in Fungi (Lapébie et al., 2011, https://doi.org/10.1002/bies.201100023). Holzem et al. described PCP as absent in ctenophores, likely based on the lack of Fmi/Stbm, while claiming its presence in fungi based on Rho GTPase and ROCK. This led to their argument that the Wnt/β-catenin pathway is more ancestral, supported by the absence of PCP components in ctenophores alongside the ctenophore-sister hypothesis.

      This likely reflects the limited attention given to PCP in the metazoan evolutionary biology community. Our work sheds light on the importance of PCP regulation in metazoan evolution. In the revised Discussion, we emphasise this point together with the importance of cell biology studies in basal metazoans and compare them based on functional studies.

      The observation of Aiptasia’s predatory “gastrula-like” larvae is indeed fascinating. Understanding how early metazoan ancestors obtained nutrients is a key to uncovering the origins of metazoans. However, the relevance of this work to metazoan evolution remains unclear. Predatory nutrient uptake is common among cnidarians, and the findings of Maegele et al. could suggest that the predatory gastrula-like state is ancestral, with the symbiotic state being derived, within Cnidaria, but they do not notably support this for metazoans as a whole. Also, it has to be clarified how predation is defined. Fundamentally, there is little distinction between filter-feeding and predatory feeding regarding heterotrophy; both feeding types require digestive machinery. If active feeding behaviour is the essence of predation, this would be better addressed as an evolutionary neurobiology or neuroscience question. Another mystery is what the metazoan ancestors took as food if they were predatory; non-predatory metazoans would have had to be present before them to serve as food.

      Overall, Maegele’s work seems premature to be incorporated into the metazoan evo-devo discussion. In either case, the standard approach would involve comparative studies across taxa. It will be interesting to see follow-up works on comparative and functional genomics of predatory/digestive machinery within the phylum Cnidaria and across metazoans, including sponges and ctenophores.

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s expertise and recommendations regarding Wnt and PCP signalling. It would be our great pleasure if our work is seen and referenced by the cell biology community using model animals.

      (1) According to the 2-step model, one would expect that there is a temporal gradient in the spreading of the PCP from oral to aboral. Is there any indication for this?

      The best indication of a spatial and temporal gradient of PCP establishment observed so far is at the blastula stage (Fig.2B). PCP gradually becomes coordinated starting at 9 hpf, when PCP is slightly better organised close to the Wnt3-positive area (oral) compared to distal (aboral) areas. We did live imaging with tagged Poc1 to track the positions of centrioles in each cell (Fig. 2E), but this did not provide any further information about the spreading of the PCP. We hypothesise that there is a delay between PCP polarisation—established through the subcellular localisation of core PCP components—and its structural manifestation as ciliary positioning and orientation. This delay likely varies between cells, preventing the formation of a precise spatial PCP wave. We hope in the future to address this temporal aspect by live-imaging of core PCP proteins labelled with fluorescent proteins.

      (2) PCP is likely to be an all-or-nothing effect, while axial patterning is dose-dependent. Is there a critical dose of Wnt3 level required to kick off the PCP pathway?

      We agree that the PCP phenotype is all-or-nothing. Although we did not perform a quantitative test, we have not seen any intermediate phenotypes in Wnt3-rescue experiments. In our experimental condition (100 ng/µl mRNA), the Wnt3 mRNA injection into a blastomere consistently restores the body axis (via PCP) of Wnt3-MO injected embryos. No axis restoration was observed at 1 ng/µl. At 10 ng/µl, some embryos showed a restored elongated axis, while others showed no axis. The volume of injection is not precisely controllable and can easily vary two-fold, so we assume the limit is somewhere around 10 ng/µl. This contrasts with endoderm rescue via Wnt/β-catenin activation by GSK-3β inhibitors (alsterpaullone) or the constitutively active β-catenin (CA-β-cat), which occurs in a dose-dependent manner (e.g. Supplementary Figure S2).

      (3) The key question left unaddressed is whether Wnt3 signals through one or two different Frizzled receptors? Which Frizzled receptors are candidates for this? Could they be knocked down to see which pathway (or both) is affected?

      How Wnt3 orientates the PCP system is an extremely interesting question that needs to be answered, and we plan to address this in the future. In Clytia, four Frizzled genes have been identified in the genome: CheFz1 (vertebrate counterpart of Fz1, 2, 3, 6 and 7), CheFz2 (Fz5 and 8), CheFz3 (Fz9/10) and CheFz4 (Fz4). Knockdown of CheFz1, hereafter called Fz1, by Morpholino showed a PCP phenotype (Momose 2012, supplementary data). For a long time, we have suspected that the most likely candidate for PCP mediation is CheFz1. The Wnt3-rescue experiment in a CheFz1-blocked background (similar experiment to Figure 3E, F) could potentially have answered this question. No PCP orientation would be expected even near the Wnt3-mRNA injected area if CheFz1 was the Wnt3 receptor for PCP orientation. Unfortunately, no reliable PCP phenotype was observed in this experiment, so it was not included in the manuscript. We initially thought this was due to incomplete suppression of CheFz1 mRNA translation by the Morpholino when used at sub-toxic doses. But we now favour the alternative explanation that Fz1 does not mediate the Wnt3 signal responsible for initiating PCP orientation. We have previously shown that Fz1 is required for the Wnt/β-catenin pathway (indicated by nuclear β-catenin localisation; Momose 2007), which is then required to maintain Wnt3 expression. We cannot rule out that the PCP phenotype obtained previously following Fz1 knockdown (supplementary data in Momose 2012) is an indirect effect of Wnt3 downregulation.

      In future work, we plan to test the PCP involvement of the other Clytia Frizzleds, notably CheFz2 and CheFz4, which are not present as maternal mRNAs but are zygotically expressed in the early gastrula stage. CheFz3 is unlikely to be a candidate because it is aborally localised and acts as a negative receptor for the Wnt/β-catenin pathway (Momose 2007). Lastly, in unpublished experiments, no axial phenotype was obtained with ROR and RYK knockdown by Morpholino (T. Momose unpublished). 

      Based on these considerations, our current working hypothesis is that Wnt3 somehow stabilises or activates one of the Frizzled receptors acting as a core PCP protein in a polarised manner, likely at the oral side of each cell (Stbm is localised at the aboral side), which breaks the PCP symmetry and is propagated across the body axis.

      A few lines have been added to the discussion regarding this point.

      (4) Is there also PCP within the Wnt3 expressing domain? In other words, (and linked to question 2), does PCP require a certain concentration of Wnt3 or a gradient of Wnt3 in order to provide an orientation?

      In the context of a simple Wnt3-MO rescue experiment, PCP is coordinated within the Wnt3-positive area. But this could be because PCP can propagate in both orientations, so it does not answer the question. In the Wnt3-rescue experiments in Fmi-MO and Stbm-MO embryos, PCP seemed better oriented close to the boundary between Wnt3-positive and -negative areas, in particular outside the Wnt3-positive area, and rather uncoordinated deep in the middle of the Wnt3-RNA positive area.

      If Wnt3 expression is uniform across an embryo, as achieved by Wnt3-mRNA injection into the egg, the axis will be lost entirely (Momose 2008). We interpret these observations as indicating that Wnt3 expression "contrasts" (or steep gradients) act as the PCP orientation cue, rather than Wnt3 acting in a permissive manner.

      In normal development, mRNA expression detected by in situ hybridisation has a slight gradient, but we do not have any information about the endogenous protein distribution.

      We greatly appreciate the reviewer’s insightful comments. A few sentences addressing points (2) and (4) have been added. The graphical models in Figures 4 and 6A have been updated. While these are relatively minor changes to the manuscript, they significantly impact future perspectives.

      Minor comments:

      (1) Labeling in some of the figures is too small and not legible, e.g. Figures 4E-H. Please check and improve.

      Agreed. Some labelling was way too small (2.5 points). This has been corrected. The minimum font size is now 6-point for most labelling in the revised Figures. 

      (2) Page 13: ...and allow us to novel scenarios for PCP-driven axis symmetry breaking... seems to lack the verb "propose"

      Corrected.

    1. eLife Assessment

      This paper provides a compelling and rigorous quantitative analysis of the turnover and maintenance of CD4+ tissue-resident memory T cell clones, in the skin and the lamina propria. It provides a fundamental advance in our understanding of CD4 T cell regulation. Interestingly, in both tissues, maintenance involves an influx from progenitors on the time scale of months. The evidence that is based on fate mapping and mathematical inference is strong, although open questions on the interpretation of the Ki67-based fate mapping remain.

    2. Reviewer #1 (Public review):

      Summary:

      Compelling and clearly described work that combines two elegant cell fate reporter strains with mathematical modelling to describe the kinetics of CD4+ TRM in mice. The aim is to investigate the cell dynamics underlying maintenance of CD4+TRM.

      The main conclusions are that 1) CD4+ TRM are not intrinsically long-lived 2) even clonal half lives are short: 1 month for TRM in skin, even shorter (12 days) for TRM in lamina propria 3) TRM are maintained by self-renewal and circulating precursors.

      Strengths:

      (1) Very clearly and succinctly written. Though in some places too succinctly! See suggestions below for areas I think could benefit from more detail.

      (2) Powerful combination of mouse strains and modelling to address questions that are hard to answer with other approaches.

      (3) The modelling of different modes of recruitment (quiescent, neutral, division linked) is extremely interesting and often neglected (for simpler neutral recruitment).

      Comments on revised version: This reviewer is satisfied with the author responses and the changes made in the manuscript.

    3. Reviewer #2 (Public review):

      This manuscript addresses a fundamental problem of immunology - the persistence mechanisms of tissue-resident memory T cells (TRMs). It introduces a novel quantitative methodology, combining the in vivo tracing of T cell cohorts with rigorous mathematical modeling and inference. Interestingly, the authors show that immigration plays a key role for maintaining CD4+ TRM populations in both skin and lamina propria (LP), with LP TRMs being more dependent on immigration than skin TRMs. This is an original and potentially impactful manuscript.

      Comments on revised version: This reviewer is satisfied with the author responses and the changes made in the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Compelling and clearly described work that combines two elegant cell fate reporter strains with mathematical modelling to describe the kinetics of CD4+ TRM in mice. The aim is to investigate the cell dynamics underlying the maintenance of CD4+TRM.

      The main conclusions are that:

      (1) CD4+ TRM are not intrinsically long-lived.

      (2) Even clonal half-lives are short: 1 month for TRM in skin, and even shorter (12 days) for TRM in lamina propria.

      (3) TRM are maintained by self-renewal and circulating precursors.

      Strengths:

      (1) Very clearly and succinctly written. Though in some places too succinctly! See suggestions below for areas I think could benefit from more detail.

      (2) Powerful combination of mouse strains and modelling to address questions that are hard to answer with other approaches.

      (3) The modelling of different modes of recruitment (quiescent, neutral, division linked) is extremely interesting and often neglected (for simpler neutral recruitment).

      Weaknesses/scope for improvement:

      (1) The authors use the same data set that they later fit for generating their priors. This double use of the same dataset always makes me a bit squeamish as I worry it could lead to an underestimate of errors on the parameters. Could the authors show plots of their priors and posteriors to check that the priors are not overly-influential? Also, how do differences in priors ultimately influence the degree of support a model gets (if at all)? Could differences in priors lead to one model gaining more support than another?

      We now show the priors and posteriors overlaid in Figure S2. The posteriors lie well within the priors, giving us confidence that the priors are not overly influential.

      (2) The authors state (line 81) that cells were "identified as tissue-localised by virtue of their protection from short-term in vivo labelling (Methods; Fig. S1B)". I would like to see more information on this. How short is short term? How long after labelling do cells need to remain unlabelled in order to be designated tissue-localised (presumably label will get to tissue pretty quickly -within hours?). Can the authors provide citations to defend the assumption that all label-negative cells are tissue-localised (no false negatives)?

      And conversely that no label-positive cells can be found in the tissue (no false positives)? I couldn't actually find the relevant section in the methods and Figure S1B didn't contain this information.

      We did describe the in vivo labeling in the first section of Methods (it was for 3 mins before sacrifice). The two aims of Fig S1B were to show the gating strategy (label-positive and negatives from tissue samples were clearly separated) and to address the false-positive issue. Less than 3% of cells in our tissue samples were positive; therefore, at most 3% of truly tissue-resident cells acquired the i.v. label, and likely less. Excluding those (as we did) therefore makes little difference to our analyses in terms of cell numbers. False negative rates are expected to be extremely low; labeling within circulating cells is typically >99% (see refs in Methods).

      (3) Are the target and precursor populations from the same mice? If so is there any way to reflect the between-individual variation in the precursor population (not captured by the simple empirical fit)? I am thinking particularly of the skin and LP CD4+CD69- populations where the fraction of cells that are mTOM+ (and to a lesser extent YFP+) spans virtually the whole range. Would it be nice to capture this information in downstream predictions if possible?

      This is a great point. We do indeed isolate all populations from each mouse. We are very aware of the advantages of using this grouping of information to reduce within-mouse uncertainty – we employ this as often as we can. The issue here was that the label content within the tissue (target) at any time depends on the entire trajectory of the label frequency in the precursor, in that mouse, up to that point. We can’t identify this curve for each animal individually – so we are obliged to use a population average.

      To mitigate this lack of pairing we do take a very conservative approach and fit this empirical function describing the trajectories of YFP and mTom in precursors at the same time as the label kinetics in the target; that is, we account for uncertainty in label influx in our fits and parameter estimates.

      Another issue is that to be sure that we are performing model selection appropriately, we only use the distribution of the likelihood on the target observations when comparing support for different precursors with LOO-IC. If we had been able to pair the precursor and target data in some way, the two would then be entangled and model comparison across precursors would not be possible.

      We’ve added some of this to the discussion.

      (4) In Figure 3, estimates of kinetics for cells in LP appear to be more dependent on the input model (quiescent/neutral/division-linked) than the same parameters in the skin. Can the authors explain intuitively why this is the case?

      This is a nice observation and it has a fairly straightforward explanation. As we pointed out in the paper, estimated rates of self renewal become more sensitive to the mode of recruitment the greater the rate of influx. If immigrants are quiescent, all Ki67 in the tissue has to be explained by self renewal. If all new immigrants are Ki67 high, the estimate of the rate of self renewal within the tissue will be lower. Across the board, the estimated rates of influx into gut were consistently higher than those in skin, and so the sensitivity of parameters to the mode of recruitment was much more obvious at that site.

      The importance of this trade-off for the division linked model can also be seen when you look at the neutral and quiescent models; they give similar parameter estimates because the Ki67 levels within all precursor populations were all less than 25% and so those two modes of recruitment are difficult to distinguish.
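
      The trade-off described above can be made concrete with a toy steady-state calculation. This is a minimal sketch and not the model fitted in the paper: it assumes a constant population in which every division yields two Ki67-high cells, Ki67 is lost at a hypothetical rate `beta` (1/3.5 per day here), immigrants arrive at per-cell rate `theta` with a fraction `f_in` already Ki67-high, and loss balances influx plus division. All names and numbers are illustrative assumptions.

```python
# Toy steady-state Ki67 balance (illustrative assumption, not the fitted model).
# theta: influx rate per resident cell per day
# rho:   self-renewal (division) rate per day
# beta:  rate of loss of Ki67 expression after division (1/3.5 per day assumed)
# f_in:  fraction of immigrants that arrive Ki67-high


def self_renewal_rate(k_obs, theta, f_in, beta=1 / 3.5):
    """Division rate needed to explain an observed Ki67-high fraction k_obs.

    Derived from the steady state of
        dH/dt = theta*f_in*N + rho*(2*L + H) - (beta + delta)*H,
    with N = H + L constant, so that delta = theta + rho.
    """
    return (k_obs * (beta + theta) - theta * f_in) / (2 * (1 - k_obs))


def recruitment_sensitivity(k_obs, theta, beta=1 / 3.5):
    """Gap in inferred rho between quiescent (f_in=0) and division-linked (f_in=1)."""
    quiescent = self_renewal_rate(k_obs, theta, 0.0, beta)
    division_linked = self_renewal_rate(k_obs, theta, 1.0, beta)
    return quiescent - division_linked


if __name__ == "__main__":
    # Hypothetical influx rates standing in for a low-influx and a high-influx site.
    for label, theta in [("low influx", 0.01), ("high influx", 0.05)]:
        print(label, round(recruitment_sensitivity(0.2, theta), 5))
```

      Under these assumptions the gap between the two extreme recruitment modes equals theta / (2 * (1 - k_obs)), so it grows in proportion to the influx rate, which is the qualitative behaviour described for gut versus skin.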

      (5) Can the authors include plots of the model fits to data associated with the different strengths of support shown in Figure 4? That is, I would like to know what a difference in the strength of say 0.43 compared with 0.3 looks like in "real terms". I feel strongly that this is important. Are all the fits fantastic, and some marginally better than others? Are they all dreadful and some are just less dreadful? Or are there meaningful differences?

      This is another good point (and from the author recommendations list, is your most important concern).

      We find that a fairly common issue is that models that are clearly distinguished by information criteria or LRTs can often give visually quite similar fits. Our experience is that this is partly due to the fact that models are usually fit on transformed scales (e.g. log for cell counts, logit for fractions) to normalise residuals, and this uncertainty is compressed when one looks at fits on the observed scale (e.g. linear). Another issue in our case is that for each model (precursor, target, and mode of recruitment) we fit 6 time courses simultaneously. Visual comparisons of fits of different models can then be a little difficult or misleading; apparently small differences in each fitted timecourse can add up to quite significant changes in the combined likelihood. We added this to the Discussion.
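
      The two effects described above can be illustrated numerically. The following is a sketch with made-up numbers, assuming Gaussian residuals on the logit scale with a fixed standard deviation; the residual sizes, the number of points per time course and the value of sigma are hypothetical and not taken from our fits.

```python
import math


def logit(p):
    """Map a fraction in (0, 1) to the unbounded logit scale."""
    return math.log(p / (1 - p))


def gaussian_loglik(residuals, sigma):
    """Log-likelihood of independent Gaussian residuals with s.d. sigma."""
    n = len(residuals)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum(r * r for r in residuals) / (2 * sigma ** 2))


# The same 0.01 gap in a labelled fraction is larger on the logit scale
# near the boundary than near 0.5.
gap_near_zero = logit(0.03) - logit(0.02)
gap_mid_range = logit(0.51) - logit(0.50)

# Hypothetical numbers: two models whose residuals differ only slightly
# in each of six time courses of ten observations each.
N_COURSES, N_POINTS, SIGMA = 6, 10, 0.3
per_course_a = gaussian_loglik([0.30] * N_POINTS, SIGMA)
per_course_b = gaussian_loglik([0.33] * N_POINTS, SIGMA)
delta_per_course = per_course_a - per_course_b
delta_combined = N_COURSES * delta_per_course

if __name__ == "__main__":
    print(gap_near_zero, gap_mid_range)
    print(delta_per_course, delta_combined)
```

      In this example a difference of about one log-likelihood unit per time course, which would be hard to see by eye, accumulates to about six units across the six time courses fitted together.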

      The number of models is combinatorial (Fig. 4) so showing them all seems a bit cumbersome. But now in the supporting information (Fig. S3), for each target we show the best, second best, and the worst model fits overlaid, to give a sense of the dynamic range of the models we considered. As you will now see, visual differences among the most strongly supported models were not huge (but refer to our point just above). Measures of out-of-sample prediction error (LOO-IC) discriminated between these models reasonably well, though (weights shown in Fig. 4).

      It’s also worth mentioning here that we have substantially greater confidence in the identity of the precursors than in the precise modes of recruitment - you can see this clearly in the groupings of weights in Figure 4A. We did comment on this in the text but now emphasise it more.

      (6) Figure 4 left me unclear about exactly which combinations of precursors and targets were considered. Figure 3 implies there are 5 precursors but in Figure 4A at most 4 are considered. Also, Figure 4B suggests skin CD69- were considered a target. This doesn't seem to be specified anywhere.

      Thanks for pointing this out. When we were considering CD4+ EM in bulk as target, this population includes CD69- cells; in those fits, therefore, we couldn't use CD69- as a precursor. We now clarify this in the caption. Thanks also for the observation about Figure 4B; we didn’t consider CD69- cells as a target, so we’ve also made that clearer.

      Reviewer #2 (Public review):

      This manuscript addresses a fundamental problem of immunology - the persistence mechanisms of tissue-resident memory T cells (TRMs). It introduces a novel quantitative methodology, combining the in vivo tracing of T-cell cohorts with rigorous mathematical modeling and inference. Interestingly, the authors show that immigration plays a key role in maintaining CD4+ TRM populations in both skin and lamina propria (LP), with LP TRMs being more dependent on immigration than skin TRMs. This is an original and potentially impactful manuscript. However, several aspects were not clear and would benefit from being explained better or worked out in more detail.

      (1) The key observations are as follows:

      a) When heritably labeling cells due to CD4 expression, CD4+ TRM labeling frequency declines with time. This implies that CD4+ TRMs are ultimately replenished from a source not labeled, hence not expressing CD4. Most likely, this would be DN thymocytes.

      That’s correct.

      b) After labeling by Ki67 expression, labeled CD4+ TRMs also decline - This is what Figure 1B suggests. Hence they would be replaced by a source that was not in the cell cycle at the time of labeling. However, is this really borne out by the experimental data (Figure 2C, middle row)? Please clarify.

      (2) For potential source populations (Figure 2D): Please discuss these data critically. For example, CD4+ CD69- cells in skin and LP start with a much lower initial labeling frequency than the respective TRM populations. Could the former then be precursors of the latter?

      A similar question applies to LN YFP+ cells. Moreover, is the increase in YFP labeling in naïve T cells a result of their production from proliferative thymocytes? How well does the quantitative interpretation of YFP labeling kinetics in a target population work when populations upstream show opposite trends (e.g., naïve T cells increasing in YFP+ frequency but memory cells in effect decreasing, as, at the time of labeling, non-activated = non-proliferative T cells (and hence YFP-) might later become activated and contribute to memory)?

      These are good (and related) points. We've added some text to the discussion, paragraphs 2 and 3; we reproduce it here, slightly expanded.

      Fig 1B was a schematic but did faithfully reflect the impact of any waning of YFP in precursor on its kinetics in the targets. However, in our experiments, as you noted, the kinetics of YFP in most of the precursor populations were quite flat. This was due in part to memory subsets being sustained by the increasing levels of YFP within naïve cells from the cohort of thymocytes labeled during treatment. There is also likely some residual permanent labeling of lymphocyte progenitor populations. We discussed this in Lukas Front Imm 2023. (The latter is not a problem; all that matters for our analysis is that we generate a reasonable empirical description of the label kinetics in naive cells, however it arises.) YFP is therefore not cleanly washed out in the periphery; and so for models with circulating memory as the tissue precursor, the flatness of their YFP curves leads to rather flat curves in the tissues.

      The mTom labelling was more informative as it was clearly diluted out of all peripheral populations by mTom-negative descendants of thymically-derived cells, as you point out in (a).

      Regarding (2), re: interpreting the initial levels of labels in precursors and targets. The important point here is that YFP and mTom were induced quickly in all populations we studied; therefore our inferences regarding precursors and targets aren’t informed by the initial levels of label in each. (Imagine a slow precursor feeding a rapidly dividing target; YFP levels in the former would start lower than those in the latter.) The causal issue that we think you’re referring to would matter if one expects the targets to begin with no label at all; for instance, in our busulfan chimeric mouse model (e.g. Hogan PNAS 2015) new, thymically derived ‘labelled’ (donor) cells progressively infiltrate replete ‘unlabelled’ (host) populations. In that case, one can immediately reject certain differentiation pathways by looking at the sequence of accrual of donor cells in different subsets.

      The trends in YFP and mTom frequencies after treatment do matter for pathway inference, though, because precursor kinetics must leave an imprint on the target. For the case you mentioned, with opposite trends in label kinetics, such models would be unlikely to be supported strongly; indeed, we never saw strong support for naïve cells (strongly increasing YFP) as a direct precursor of TRM (fairly flat).

      We’ve added a condensed version of this to the Discussion.

      (3) Please add a measure of variation (e.g., suitable credible intervals) to the "best fits" (solid lines in Figure 2).

      Added.

      (4) Could the authors better explain the motivation for basing their model comparisons on the Leave-OneOut (LOO) cross-validation method? Why not use Bayesian evidence instead?

      Bayes factors are very sensitive to priors and are either computationally unstable if calculated with importance sampling methods, or very expensive to calculate if one uses the more stable bridge sampling method. (We also note that fitting just a single model here takes a substantial amount of time.) Further, Bayes factors can be unreliable unless one of the models is close to the 'true' data-generating model; though our models seem to work well, we can be sure that none of them is the true one! For us, a more tractable and real-world selection criterion is based on the usefulness of a model, for which predictive performance is a reasonable proxy. In this case the mean out-of-sample prediction error (which LOO-IC reflects) is a well-established and valid means of ascribing support to different models.
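      The idea of ranking models by out-of-sample predictive performance can be illustrated with a minimal sketch. The example below is not the analysis used in the study: it assumes toy data, two hypothetical candidate models (a flat mean and a linear trend), exact refitting for each held-out point, and a plug-in Gaussian noise estimate rather than a full posterior. All function and variable names are illustrative.

```python
import numpy as np


def gaussian_logpdf(y, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2


def loo_elpd(x, y, degree):
    """Exact leave-one-out log predictive density for a polynomial fit.

    Each point is held out in turn, the model is refit on the rest,
    and the held-out point is scored under the refit model.
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coeffs = np.polyfit(x[keep], y[keep], degree)
        resid = y[keep] - np.polyval(coeffs, x[keep])
        sigma = np.sqrt(np.mean(resid**2))  # plug-in noise estimate (assumption)
        total += gaussian_logpdf(y[i], np.polyval(coeffs, x[i]), sigma)
    return total


# Toy data with a linear trend (illustrative only, not from the study).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + rng.normal(0.0, 0.1, size=x.size)

elpd_flat = loo_elpd(x, y, degree=0)
elpd_linear = loo_elpd(x, y, degree=1)

# LOO-IC is on the deviance scale: lower values indicate better
# expected out-of-sample prediction.
looic_flat = -2 * elpd_flat
looic_linear = -2 * elpd_linear
print(f"LOO-IC flat: {looic_flat:.1f}, linear: {looic_linear:.1f}")
```

      In a Bayesian workflow the refitting step is typically replaced by an approximation computed from a single posterior fit, which is what makes the criterion tractable when each fit is slow.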

    1. eLife Assessment

      This important study addresses an essential morphogenetic process, epithelial fusion, by identifying the transcription factor Hamlet as a potential master regulator. Using a combination of genetic, cell biological, and omics approaches, including a comprehensive RNAi screen and high-quality imaging, the authors provide compelling evidence for Hamlet's role in coordinating cell fate and differentiation. The findings are robust and of broad interest to developmental biologists and geneticists.

    2. Reviewer #1 (Public review):

      Summary:

      Wang et al. identify Hamlet, a PR-containing transcription factor, as a master regulator of reproductive development in Drosophila. Specifically, Hamlet is required for the fusion between the gonad and genital disc, which is necessary for development of the continuous testis and seminal vesicle tissue essential for fertility. To show this, the authors generate novel Hamlet null mutants by CRISPR/Cas9 gene editing and characterize the morphological, physiological, and gene expression changes of the mutants using immunofluorescence, RNA-seq, cut-tag, and in-situ analysis. Thus, Hamlet is discovered to regulate a unique expression program, which includes Wnt2 and Tl, that is necessary for testis development and fertility.

      Strengths:

      This is a rigorous and comprehensive study that identifies the Hamlet dependent gene expression program mediating reproductive development in Drosophila. The Hamlet transcription targets are further characterized by Gal4/UAS-RNAi confirming their role in reproductive development. Finally, the study points to a role for Wnt2 and Tl as well as other Hamlet transcriptionally regulated genes in epithelial tissue fusion.

      Weaknesses:

      None noted.

    3. Reviewer #2 (Public review):

      Strengths:

      Wang and colleagues successfully uncovered an important function of the Drosophila PRDM16/PRDM3 homolog Hamlet (Ham) - a PR domain containing transcription factor with known roles in the nervous system in Drosophila. To do so, they generated and analyzed new mutants lacking the PR domain, and also employed diverse preexisting tools. In doing so, they made a fascinating discovery: They found that PR-domain containing isoforms of ham are crucial in the intriguing development of the fly genital tract. Wang and colleagues found three distinct roles of Ham: (1) Specifying the position of the testis terminal epithelium within the testis, (2) allowing normal shaping and growth of the anlagen of the seminal vesicles and paragonia and (3) enabling the crucial epithelial fusion between the seminal vesicle and the testis terminal epithelium. The mutant blocks fusion even if the parts are positioned correctly. The last finding is especially important, as there are few models allowing one to dissect the molecular underpinnings of heterotypic epithelial fusion in development. Their data suggest that they found a master regulator of this collective cell behavior. Further, they identified some of the cell biological players downstream of Ham, like for example E-Cadherin and Crumbs. In a holistic approach, they performed RNAseq and intersected them with the CUT&TAG-method, to find a comprehensive list of downstream factors directly regulated by Ham. Their function in the fusion process was validated by a tissue-specific RNAi screen. Meticulously, Wang and colleagues performed multiplexed in situ hybridization and analyzed different mutants, to gain a first understanding of the most important downstream-pathways they characterized - which are Wnt2 and Toll.

      This study pioneers a completely new system. It is a model for exploring a process crucial in morphogenesis across animal species, yet not well-understood. Wang and colleagues not only identified a crucial regulator of heterotypic epithelial fusion but took on the considerable effort of meticulously pinning down functionally important downstream effectors by using many state-of-the-art methods. This is especially impressive, as dissection of pupal genital discs before epithelial fusion is a time-consuming and difficult task. This promising work will be the foundation future studies build on, to further elucidate how this epithelial fusion works, for example on a cell biological and biomechanical level.

      Weaknesses:

      The developing testis-genital disc system has many moving parts. Myotube migration was previously shown to be crucial for testis shape. This means, that there is the potential of non-tissue autonomous defects upon knockdown of genes in the genital disc or the terminal epithelium, affecting myotube behavior which in turn affects epithelial fusion, as myotubes might create the first "bridge" bringing the two epithelia together. Nevertheless, this is outside the scope of this work and could be addressed in the future.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      Wang et al. identify Hamlet, a PR-containing transcription factor, as a master regulator of reproductive development in Drosophila. Specifically, the fusion between the gonad and genital disc is necessary for the development of continuous testes and seminal vesicle tissue essential for fertility. To do this, the authors generate novel Hamlet null mutants by CRISPR/Cas9 gene editing and characterize the morphological, physiological, and gene expression changes of the mutants using immunofluorescence, RNA-seq, cut-tag, and in-situ analysis. Thus, Hamlet is discovered to regulate a unique expression program, which includes Wnt2 and Tl, that is necessary for testis development and fertility. 

      Strengths: 

      This is a rigorous and comprehensive study that identifies the Hamlet-dependent gene expression program mediating reproductive development in Drosophila. The Hamlet transcription targets are further characterized by Gal4/UAS-RNAi confirming their role in reproductive development. Finally, the study points to a role for Wnt2 and Tl as well as other Hamlet transcriptionally regulated genes in epithelial tissue fusion. 

      We appreciate that the reviewer thinks our study is rigorous.

      Weaknesses: 

      The image resolution and presentation of figures is a major issue in this study. As a nonexpert, it is nearly impossible to see the morphological changes as described in the results. Quantification of all cell biological phenotypes is also lacking therefore reducing the impact of this study to those familiar with tissue fusion events in Drosophila development. 

      In the revised version, we have improved the image presentation and resolution. For all the images with more than two channels, we included single-channel images, changed the green color to lime and the red to magenta, and highlighted the testis (TE) and seminal vesicles to make morphological changes more visible.

      We had quantification for marker gene expression in the original version, and have now also included quantification for cell biological phenotypes, which generally show 100% penetrance.

      Reviewer #2 (Public review): 

      Strengths: 

      Wang and colleagues successfully uncovered an important function of the Drosophila PRDM16/PRDM3 homolog Hamlet (Ham) - a PR domain-containing transcription factor with known roles in the nervous system in Drosophila. To do so, they generated and analyzed new mutants lacking the PR domain, and also employed diverse preexisting tools. In doing so, they made a fascinating discovery: They found that PR-domain containing isoforms of ham are crucial in the intriguing development of the fly genital tract. Wang and colleagues found three distinct roles of Ham: (1) specifying the position of the testis terminal epithelium within the testis, (2) allowing normal shaping and growth of the anlagen of the seminal vesicles and paragonia and (3) enabling the crucial epithelial fusion between the seminal vesicle and the testis terminal epithelium. The mutant blocks fusion even if the parts are positioned correctly. The last finding is especially important, as there are few models allowing one to dissect the molecular underpinnings of heterotypic epithelial fusion in development. Their data suggest that they found a master regulator of this collective cell behavior. Further, they identified some of the cell biological players downstream of Ham, like for example E-Cadherin and Crumbs. In a holistic approach, they performed RNAseq and intersected them with the CUT&TAG-method, to find a comprehensive list of downstream factors directly regulated by Ham. Their function in the fusion process was validated by a tissue-specific RNAi screen. Meticulously, Wang and colleagues performed multiplexed in situ hybridization and analyzed different mutants, to gain a first understanding of the most important downstream pathways they characterized, which are Wnt2 and Toll. 

      This study pioneers a completely new system. It is a model for exploring a process crucial in morphogenesis across animal species, yet not well understood. Wang and colleagues not only identified a crucial regulator of heterotypic epithelial fusion but took on the considerable effort of meticulously pinning down functionally important downstream effectors by using many state-of-the-art methods. This is especially impressive, as the dissection of pupal genital discs before epithelial fusion is a time-consuming and difficult task. This promising work will be the foundation future studies build on, to further elucidate how this epithelial fusion works, for example on a cell biological and biomechanical level. 

      We appreciate that the reviewer thinks our study is original and important.

      Weaknesses: 

      The developing testis-genital disc system has many moving parts. Myotube migration was previously shown to be crucial for testis shape. This means, that there is the potential of non-tissue autonomous defects upon knockdown of genes in the genital disc or the terminal epithelium, affecting myotube behavior which in turn affects fusion, as myotubes might create the first "bridge" bringing the epithelia together. The authors clearly showed that their driver tools do not cause expression in myoblasts/myotubes, but this does not exclude non-tissue autonomous defects in their RNAi screen. Nevertheless, this is outside the scope of this work. 

      We thank the reviewer for considering non-tissue autonomous defects upon gene knockdown. The driver, hamRSGal4, drives reporter gene expression mainly in the RS epithelia, but we did observe weak expression of the reporter in the myoblasts before they differentiate into myotubes. Thus, we could not rule out a non-tissue autonomous effect in the RNAi screen. We have therefore included a statement in the Results: “Given that the hamRSGal4 driver is highly expressed in the TE and SV epithelia, we expect highly effective knockdown occurs only in these epithelial cells. However, hamRSGal4 also drives weak expression in the myoblasts before they differentiated into myotubes (Supplementary Fig. 5B), which may result in a non-tissue autonomous effect when knocking down the candidate genes expressed in myoblasts.”

      However, one point that could be addressed in this study: the RNAseq and CUT&TAG experiments would profit from adding principal component analyses, elucidating similarities and differences of the diverse biological and technical replicates. 

      Thanks for the suggestion. We have now included the PCA analyses in supplementary figure 6A-B and the corresponding description in the text. The PCA graphs validated the consistency between biological replicates of the RNA-seq samples. The Cut&Tag graphs confirm the consistency between the two biological replicates from the GFP samples, but show a higher variability between the w1118 replicates. Importantly, we only considered the overlapping peaks pulled down by the GFP antibody from the ham_GFP genotype and by the Ham antibody from the wildtype (w1118) sample as true Ham binding sites.
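      As a rough illustration of how a PCA can be used to check replicate consistency, the sketch below projects samples onto their top principal components. It is not the authors' pipeline: the count matrix is simulated toy data, the sample grouping (three control and three mutant replicates) is hypothetical, and the log2(count + 1) transform is an assumed normalization.

```python
import numpy as np


def pca_scores(counts, n_components=2):
    """Project samples (rows) onto the top principal components.

    Counts are log2-transformed with a pseudocount and centered per gene
    before a singular value decomposition.
    """
    logged = np.log2(counts + 1.0)
    centered = logged - logged.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Simulated toy counts: 6 samples x 50 genes (illustrative only).
rng = np.random.default_rng(1)
means = np.full((6, 50), 100.0)
means[3:, :10] = 1000.0  # hypothetical mutant replicates differ in 10 genes
counts = rng.poisson(means).astype(float)

scores, explained = pca_scores(counts)
for label, (pc1, pc2) in zip(["ctrl"] * 3 + ["mut"] * 3, scores):
    print(f"{label}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
print("variance explained:", np.round(explained, 3))
```

      Replicates of the same condition that sit close together along the leading components, with conditions separated from each other, are the pattern one would expect when replicates are consistent.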

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      Major Concern: 

      (1) The image resolution and presentation of figures (Figures 2, 5, 6, and 7) is a major issue in this study. As a non-expert, it is nearly impossible to see the morphological changes as described in the results. Images need to be captured at higher resolution and zoomed in with arrows denoting changes as described. Individual channels, particularly for intensity measurement need to be shown in black and white in addition to merged images. Images also need pseudo-colored for color-blind individuals (i.e. no red-green staining). 

      The images were captured at a high resolution, but somehow the resolution was dramatically reduced in the BioRxiv PDF. We tried to overcome this by directly submitting the PDF in the eLife submission system. In the revised version, we have included single-channel images and changed the green and red colors to lime and magenta for color blindness. We also highlighted the testis (TE) and seminal vesicle structures in the images to make morphological changes more visible.

      (2) The penetrance of morphological changes observed in RT development is also unclear and needs to be rigorously quantified for data in Figures 2, 5, and 7. 

      We have now included quantification for cell biological phenotypes, which generally show 100% penetrance. The percentage penetrance and the number of animals used are indicated in each corresponding image.

      Reviewer #2 (Recommendations for the authors): 

      Major Points 

      (1) Lines 193- 220 I would strongly suggest pointing out the obvious shape defects of the testes visible in Figure 2A ("Spheres" instead of "Spirals"). These are probably a direct consequence of a lack in the epithelial connection that myotubes require to migrate onto the testis (in a normal way) as depicted in the cartoons, allowing the testis to adopt a spiral shape through myotube-sculpting (Bischoff et al., 2021), further confirming the authors' findings! 

      Good point. In the revised text, we have added more description of the testis shape defects and pointed out a potential contribution from compromised myotube migration.   

      (2) Line 216: "Often separated from each other". Here it would be important to mention how often. If the authors cannot quantify that from existing data, I suggest carrying it out in adult/pharate adult genital tracts (if there is no strong survivor bias due to the lethality of stronger affected animals), as this is much easier than timing prepupae. This should be a quick and easy experiment. 

      Because it is hard to tell whether the separation of the SV and TE was caused by developmental defects or sometimes could be due to technical issues (bad dissection), we have changed the description to, “control animals always showed connected TE and SV, whereas ham mutant TE and SV tissues were either separated from each other, or appeared contacted but with the epithelial tubes being discontinuous (Fig. 2B).” Additionally, we quantified the disconnection phenotype, which shows 100% penetrance in 18 mutant animals. This quantification is now included in the figure.

      (3) Lines 289-305, Figure 3. I could only find how many replicates were analyzed in the RNAseq/CUT&Tag experiments in the Material & Methods section. I would add that at least in the figure legends, and perhaps even in the main text. Most importantly, I would add a Principal Component Analysis (one for RNAseq and one for the CUT&TAG experiment), to demonstrate the similarity of biological replicates (3x RNaseq, 4x Cut&Tag) but also of the technical replicates (RNAseq: wt & wt/dg, ham/ham & ham/df, GD & TE; CUT&TAG: Antibody & GFP-Antibody, TG&TE...). This should be very easy with the existing data, and clearly demonstrate similarities & differences in the different types of replicates and conditions. 

      Principal component analysis and its description are now added to Supplementary Fig 6 and the main text, respectively.

      (4) Line 321; Supplementary Table 1: In the table, I cannot find which genes are down- or upregulated - something that I think is very important. I would add that, and remove the "color" column, which does not add any useful information. 

      In Supplementary table 1, the first sheet includes upregulated genes while the second sheet includes downregulated genes. We removed the column “color” as suggested.  

      (5) Line 409: SCRINSHOT was carried out with candidate genes from the screen. One gene I could not find in that list was the potential microtubule-actin crosslinker shot. If shot knockdown caused a phenotype, then I would clearly mention and show it. If not, I would mention why a shot is important, nonetheless. 

      shot is one of the candidate target genes selected from our RNA-seq and Cut&Tag data. However, in the RNAi screen, knocking down shot with the available RNAi lines did not cause any obvious phenotype. This could be due to inefficient RNAi knockdown or redundancy with other factors. We nevertheless wanted to examine the shot expression pattern in the developing RS, given the important role of shot in epithelial fusion (Lee S., 2002). Using SCRINSHOT, we could detect epithelial-specific expression of shot, implying its potential function in this context. We have now revised the text to clarify this point.

      Minor points 

      (1) Cartoons in Figure 1: The cartoons look like they were inspired by the cartoon from Kozopas et al., 1998 Fig. 10 or Rothenbusch-Fender et al., 2016 Fig 1. I think the manuscript would greatly profit from better cartoons, that are closer to what the tissue really looks like (see Figure 1H, 2G), to allow people to understand the somewhat complicated architecture. The anlagen of the seminal vesicles/paragonia looks like a butterfly with a high columnar epithelium with a visible separation between paragonia/seminal vesicles (upper/lower "wing" of the "butterfly"). Descriptions like "unseparated" paragonia/seminal vesicle anlagen, would be much easier to understand if the cartoons would for example reflect this separation. It would even be better to add cartoons of the phenotypic classes too, and to put them right next to the micrographs. (Another nitpick with the cartoons: pigment cells are drastically larger and fewer in number (See: Bischoff et al., 2021 Figure 1E & MovieM1).) 

      Thanks for the suggestion. We have updated Figure 1 by adding additional illustrations showing the accessory gland and seminal vesicle structures in the pupal stage and changing the size of pigment cells.

      (2) Line 95-121 I would also briefly introduce PR domains, here. 

      We have added a brief description of the PR domains.

      (3) Line 152, 158, 160, 162. When first reading it, I was a bit confused by the usage of the word sensory organ. I would at least introduce that bristles are also known as external mechanosensory organs. 

      We have now revised the description to “mechano-sensory organ”.

      eg. Line 184, 194, and many more. Most times, the authors call testis muscle precursors "myoblasts". This is correct sometimes, but only when referring to the stage before myoblast-fusion, which takes place directly before epithelial fusion (28 h APF). Post-myoblast-fusion (eg. during migration onto the testis), these cells should be called myotubes or nascent myotubes, as the fly muscle community defined the term myoblast as the single-nuclei precursors to myotubes.

      We have now revised the description accordingly.  

      (4) Line 217/Figure 2B. It looks like there is a myotube bridge between the testis and the genital disc. I would point that out if it's true. If the authors have a larger z-stack of this connection, I suggest creating an MIP, and checking if there are little clusters of two/three/four nuclei packed together. This would clearly show that the cells in between are indeed myotubes (granted that loss of ham does not introduce myoblast-fusion-defects). 

      We do not have a Z-stack of this connection, and thus cannot confirm whether the cells in this image are myotubes. However, we found that myotubes can migrate onto the testis and form the muscular sheet in the ham mutant despite reduced myotube density. At the junction there are myotubes, suggesting that loss of ham does not introduce myoblast-fusion defects. These results are now included in the revised manuscript, supplementary Fig. 5 C-D.

      (5) Line 231/Supplementary Fig. 3C-G: I would add to the cartoons, where the different markers are expressed. 

      We have added marker gene expression in the cartoons.

      (6) Line 239. I don't see what Figure 1A/1H refers to, here. I would perhaps just remove it. 

      Yes, we have removed it.

      (7) Line 232. I would rephrase the beginning of the sentence to: Our data suggest Ham to be... 

      Yes, we have revised it.

      (8) Line 248-250/Figure 2F. Clonal analyses are great, but I think single channels should be shown in black and white. Also, a version without the white dashed line should be shown, to clearly see the differences between wt and ham-mutant cells. 

      Single-channel images for the green and red channels are now presented in the Supplementary Figures. This particular one is in Supplementary Figure 3B.

      (9) Line 490. The Toll-9 phenotype was identified on the sterility effect/lack-of-sperm phenotype alone, and it was deduced, that this suggests connection defects. By showing the right focus plane in Fig S8B (lower right), it should be easy to directly show whether there is a connection defect or not. Also, one would expect clearer testis-shaping defects, like in ham-mutants, as a loss of connection should also affect myotube migration to shape the testis. This is just a minor point, as it only affects supplementary data with no larger impact on the overall findings, even if Toll-9 is shown not to have a defect, after all.

      We find that scoring defects at the junction site at the adult stage is difficult and may not always be accurate. Instead, we score the presence of sperm in the SV, which indirectly but firmly suggests successful connection between the TE and SV. We have now included a quantification graph, showing the penetrance of the phenotype, in the new Supplementary Fig.14C. There were indeed morphological defects of the TE in Toll-9 RNAi animals. We have now included the image and quantification in the new Supplementary Fig.14B.

    1. eLife Assessment

      This important study reports on a basis for neurabin-mediated specification of substrate choice by protein phosphatase-1. The data from the comprehensive approach using structural, biochemical, and computational methods are compelling. This paper is broadly relevant to those investigating various cellular signaling cascades that entail phosphorylation as the main mechanism.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Treisman and colleagues address the question of how protein phosphatase 1 (PP1) regulatory subunits (or PP1-interacting proteins (PIPs)) confer specificity on the PP1 catalytic subunit, which by itself possesses little substrate specificity. In prior work the authors showed that the PIP Phactrs confers specificity by remodelling a hydrophobic groove immediately adjacent to the PP1 catalytic site through residues within the RVxF-ΦΦ-R-W string of Phactrs. Specifically, the residues proximal to and including the 'W' of the RVxF-ΦΦ-R-W string remodel the hydrophobic groove. Other residues of the RVxF-ΦΦ-R-W string (i.e. the RVxF-ΦΦ-R) are not involved in this remodelling.

      The authors suggest that the RVxF-ΦΦ-R-W string is a conserved feature of many PIPs including PNUTS, Neurabin/spinophilin and R15A. However, from a sequence and structural perspective only the RVxF-ΦΦ-R is conserved. The W is not conserved in most, and in the R15A structure (PDB:7NZM) the Trp side chain points away from the hydrophobic channel, although this could be a questionable interpretation due to model building into the low-resolution cryo-EM map (4 Å).

      In this paper the authors convincingly show that Neurabin confers substrate specificity through interactions of its PDZ domain with the PDZ domain-binding motif (PBM) of 4E-BP. They show the PBM motif is required for Neurabin to increase PP1 activity towards 4E-BP and a synthetic peptide modelled on 4E-BP and also a synthetic peptide based on IRSp53 with a PBM added. The PBM of 4E-BP1 confers high affinity binding to the Neurabin PDZ domain. A crystal structure of a PP1-4E-BP1 fusion with Neurabin shows that the PBM of 4E-BP interacts with the PDZ domain of Neurabin. No interactions of 4E-BP and the catalytic site of PP1 are observed. Cell biology work showed that Neurabin-PP1 regulates the TOR signalling pathway by dephosphorylating 4E-BPs.

      Strengths:

      This work demonstrates convincingly, using a variety of cell biology, proteomics, biophysics and structural biology approaches, that the PP1 interacting protein Neurabin confers specificity on PP1 through an interaction of its PDZ domain with a PDZ-binding motif of 4E-BP1 proteins. Remodelling of the hydrophobic groove of the PP1 catalytic subunit is not involved in Neurabin-dependent substrate specificity, in contrast to how Phactrs confers specificity on PP1. The active site of the Neurabin/PP1 complex does not recognise residues in the vicinity of the phospho-residue, thus allowing for multiple phospho-sites on 4E-BP to be dephosphorylated by Neurabin/PP1. This contrasts with substrate specificity conferred by the Phactrs PIP that confers specificity of Phactrs/PP1 towards its substrates in a sequence-specific context by remodelling the hydrophobic groove immediately adjacent to the catalytic site. The structural and biochemical insights are used to explore the role of Neurabin/PP1 in dephosphorylating 4E-BPs in vivo, showing that Neurabin/PP1 regulates the TOR signalling pathway, specifically mTORC1-dependent translational control.

      Weaknesses:

      The only weakness is the suggestion that a conserved RVxF-ΦΦ-R-W string exists in PIPs. The 'W' is not conserved in sequence and 3-dimensions in most of the PIPs discussed in this manuscript. The lack of conservation of the W would be consistent with the finding based on multiple PP1-PIP structures that apart from Phactrs, no other PIP appears to remodel the PP1 hydrophobic channel.

      Comments on revisions:

      The authors have addressed my comments.

      One aspect of the manuscript and response to reviewers is misleading regarding the statement: 'Like many PIPs, they interact with PP1 using the previously defined "RVxF", "ΦΦ", and "R" motifs (Choy et al, 2014).' This statement, and similar in the authors' response, implies that Choy et al discovered the "RVxF" and "ΦΦ" motifs. The Choy et al, 2014 paper reports the discovery of the "R" motif. The "RVxF" and "ΦΦ" motifs were discovered and reported in earlier papers not cited in the authors' manuscript. Perhaps the authors can correct this.

    3. Reviewer #2 (Public review):

      This manuscript explores the molecular mechanisms that are involved in substrate recognition by the PP1 phosphatase. The authors previously showed that the PP1 interacting protein (PIP), Phactr1, conferred substrate specificity by remodelling the PP1 hydrophobic substrate groove. In this work, the authors aimed to understand the key determinant of how other PIPs, Neurabin and Spinophilin, mediate substrate recognition.

      The authors generated a few PP1-PIP fusion constructs, undertook TMT phosphoproteomics and validated their method using PP1-Phactr1/2/3/4 fusion constructs. Using this method, the authors identified phosphorylation sites controlled by PP1-Neurabin and focussed their work on 4E-BP1, thereby linking PP1-Neurabin to mTORC1 signalling. Upon validating that PP1-Neurabin dephosphorylates 4E-BP1, they determined that 4E-BP1 PBM binds to the PDZ domain of Neurabin with an affinity that was more than 30-fold greater than that of other substrates. PP1-Neurabin dephosphorylated 4E-BP1WT and IRSp53WT with a catalytic efficiency much greater than PP1 alone. However, PP1-Neurabin bound to 4E-BP1 and IRSp53 mutants lacking the Neurabin PDZ domain with a catalytic efficiency lower than that observed with 4E-BP1WT. These results indicate the involvement of the PDZ domain in facilitating substrate recruitment by PP1-Neurabin. Interestingly, PP1-Phactr1 dephosphorylation of 4E-BP1 phenocopies PP1 alone, while PP1-Phactr1 dephosphorylates IRSp53 to a much higher extent than PP1 alone. These results highlight the importance of the PDZ domain and also shed light on how different PP1-PIP holoenzymes mediate substrate recognition using distinct mechanisms. The authors also show that the remodelling of the hydrophobic PP1 substrate groove, which is essential for substrate recognition by PP1-Phactr1, was not required by PP1-Neurabin. Additionally, the authors resolved the structure of a PP1-4E-BP1 fusion with the PDZ-containing C-terminal of Neurabin and observed that the Neurabin/PP1-4E-BP1 complex structure was oriented at 21° to that in the unliganded Spinophilin/PP1 complex (resolved by Ragusa et al., 2010) owing to a slight bend in the C-terminal section that connects it to the RVxF-ΦΦ-R-W string. Since no interaction was observed with the remodelled PP1-Neurabin hydrophobic groove, the authors utilised AlphaFold3 to further answer this. They observed a high confidence of interaction between the groove and phosphorylated substrate and a low confidence of interaction between the groove and unphosphorylated substrate, thereby suggesting that the hydrophobic groove remodelling is not involved in PP1-Neurabin recognition and dephosphorylation of 4E-BP1.

      In this work, the authors provide novel insights into how Neurabin depends on the interaction between its PDZ domain and PBM domains of potential substrates to mediate its recruitment by PP1. Additionally, they uncover a novel PP1-Neurabin substrate, 4E-BP1. They systematically employ phosphoproteomics, biochemical and structural methods to investigate substrate specificity in a robust fashion. Furthermore, the authors also compare the interactions between PP1-Neurabin and 4E-BP1 and IRSp53 (PP1-Phactr1 substrate) with those of PP1-Phactr1, to showcase the specificity of the mode of action employed by these complexes in mediating substrate specificity. The authors employ an innovative PP1-PIP fusion strategy previously explored by Oberoi et al., 2016 and the authors themselves in Fedoryshchak et al., 2020. This method allows for a more controlled investigation of the interactions between PP1-PIPs and their substrates. Furthermore, the authors have substantially characterised the importance of the PDZ domain using their fusion constructs; however, I believe that a further exploration into either structural or AlphaFold3 modelling of PBM domain substrate mutants, or a Neurabin PDZ-domain mutant, might further strengthen this claim. Overall, the paper makes a substantial contribution to understanding substrate recognition and specificity in PP1-PIP complexes. The study's innovative methods, biological relevance, and mechanistic insights are strengths, but whether this mechanism occurs in a physiological context is unclear.

    4. Reviewer #3 (Public review):

      Protein Phosphatase 1 (PP1), a vital member of the PPP superfamily, drives most cellular serine/threonine dephosphorylation. Despite PP1's low intrinsic sequence preference, its substrate specificity is finely tuned by over 200 PP1-interacting proteins (PIPs), which employ short linear motifs (SLIMs) to bind specific PP1 surface regions. By targeting PP1 to cellular sites, modifying substrate grooves, or altering surface electrostatics, PIPs influence substrate specificity. Although many PIP-PP1-substrate interactions remain uncharacterized, the Phactr family of PIPs uniquely imposes sequence specificity at dephosphorylation sites through a conserved "RVxF-ΦΦ-R-W" motif. In Phactr1-PP1, this motif forms a hydrophobic pocket that favors substrates with hydrophobic residues at +4/+5 in acidic contexts (the "LLD motif"), a specificity that endures even in PP1-Phactr1 fusions. Neurabin/Spinophilin remodel PP1's hydrophobic groove in distinct ways, creating unique holoenzyme surfaces, though the impact on substrate specificity remains underexplored. This study investigates Neurabin/Spinophilin specificity via PDZ domain-driven interactions, showing that Neurabin/PP1 specificity is governed more by PDZ domain interactions than by substrate sequence, unlike Phactr1/PP1.

      A significant strength of this work is the use of PP1-PIP fusion proteins to effectively model intact PP1•PIP holoenzymes by replicating the interactions that remodel the PP1 interface and confer site-specific substrate specificity. When combined with proteomic analyses to assess phospho-site depletion in mammalian cells, these fusions offer critical insights into holoenzyme specificity, revealing new candidate substrates for Neurabin and Spinophilin. The studies present compelling evidence that the PDZ domain of PP1-Neurabin directs its specificity, with the remodeled PP1 hydrophobic groove interactions having minimal impact. This mechanism is supported by structural analysis of the PP1-4E-BP1 substrate fusion bound to a Neurabin construct, highlighting the 4E-BP1/PDZ interaction. This work delivers crucial insights into PP1-PIP holoenzyme function, combining biochemical, proteomic, and structural approaches. It validates the PP1-PIP fusion protein model as a powerful tool, suggesting it may extend to studying additional holoenzymes. While an extremely useful model, it must be considered unlikely the PP1-PIP fusions fully recapitulate the specificity and regulation of the holoenzyme.

    5. Author response:

      The following is the authors’ response to the original reviews

      Response to the public reviews:

      We are very pleased to see these positive reviews of our preprint.

      Reviewers 1 and 3 raise issues around PIP-PP1 interactions.

      (1) Role of the “RVxF-ΦΦ-R-W string”

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs), and Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the trajectory of the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs across the PP1 surface encompasses not only the RVxF-ΦΦ-R trio, but also additional sequences C-terminal to it (Chen et al, eLife, 2015). This extended trajectory is maintained in the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134.

      The extended “RVxF-ΦΦ-R-W” interaction brings sequences C-terminal to the “W” SLiM into the vicinity of the hydrophobic groove that adjoins the PP1 catalytic centre. In the Phactr1/PP1 complex, these sequences remodel the groove, generating a novel pocket that facilitates sequence-specific substrate recognition.

      This raises the possibility that sequences C-terminal to the extended “RVxF-ΦΦ-R-W string” in the other complexes also confer sequence-specific substrate recognition, and our study aims to test this hypothesis. Indeed, the hydrophobic groove structures of the Neurabin/Spinophilin/PP1 and Phactr1/PP1 complexes differ significantly (Ragusa et al, 2010; see Fedoryshchak et al 2020, Fig2 FigSupp1).

      (2) Orientation of the W side chain

      Reviewer 1 points out that in the substrate-bound PP1/PPP1R15A/Actin/eIF2 pre-dephosphorylation complex the W sidechain is inverted with respect to its orientation in the PP1-PPP1R15B complex (Yan et al, NSMB 2021). The authors proposed that this may reflect the role of actin in assembly of the quaternary complex. This does not necessarily invalidate the notion that sequences C-terminal to the “W” motif might play a role in actin-independent substrate recognition, and we therefore consider our inclusion of the R15A/B fusions in our analysis to be reasonable.

      (3) Conservation of W

      The motif ‘W’ does not mandate tryptophan - Phactrs and PPP1R15A/B indeed have W at this position but Neurabin/spinophilin contain VDP, which makes similar interactions. Similarly the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In our revision, we will present comparisons of the differentially remodelled/modified PP1 hydrophobic groove in the various complexes, and discuss the different orientations of the tryptophan in the previously published PPP1R15A/PP1 and PPP1R15B/PP1 structures. We will also address the other issues raised by the referees.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments and suggestions for revisions

      (1) The authors do not provide strong evidence that the interaction of the 'W' of the RVxF- øø -R-W string with the hydrophobic groove of PP1 is conserved in PIPs. Whereas the RVxF motif is well conserved and validated since its discovery in 1997, as are the øø - (an extension of the RVxF motif), and the 'R', the Trp residue in the RVxF-øø-R-W string is not conserved.

      We did not mean to imply that the W motif is conserved amongst all PIPs.

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs). Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs share a trajectory across the PP1 surface that encompasses not only the RVxF-ΦΦ-R SLIMs, but also additional sequences C-terminal to the R SLIM (Chen et al, eLife, 2015). This trajectory is also shared by the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on this structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134 (See Fedoryshchak et al, 2020, Figure 1 figure supplement 2).

      Introduction, paragraph 2 is rewritten to make this clearer.

      The sequence and positions of W differ in amino acid type and position relative to the RVxF-øø-R string.

      The motif ‘W’ does not mandate tryptophan; it is our name for a common structurally aligned motif: although the Phactrs and PPP1R15A/B indeed have W at this position, Neurabin and spinophilin contain VDP, which nevertheless makes similar interactions. Similarly, the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In the Discussion the authors state that the hydrophobic groove of PP1 is remodelled by Neurabin. However, details of this are not described or shown in the manuscript.

      The shared trajectory determined by the RVxF-øø-R-W string brings the sequences C-terminal to the W SLIM into the vicinity of the PP1 hydrophobic groove. In the Phactr1/PP1 holoenzyme this generates a novel pocket required for substrate recognition (Fedoryshchak et al, 2020). These observations raised the possibility that sequences C-terminal to the “W” motif in the other RVxF-øø-R-W PIPs also play a role in substrate recognition.

      Introduction paragraph 3 now cites a new Figure 1-S2, which shows how the hydrophobic groove is remodelled in the various different PIP/PP1 complexes. A revised Figure 1A now indicates the hydrophobic residues defining the hydrophobic groove by grey shading.

      (2) To add to the confidence of the structure, the authors should include a 2Fo-Fc simulated annealing omit map, perhaps showing the R and W interactions of the RVxF-øø-R-W string.

      This is now included as new Figure 6 Figure supplement 1. Note that in Neurabin, the W motif is VDP, where the valine and proline sidechains interact similarly to the tryptophan (see also new Figure 1-S2G,H).

      We also add a new supplementary Figure 6-S1 comparing our PBM-liganded Neurabin PDZ domain with the previously published unliganded structure (Ragusa et al 2010).

      (3) Page 16. The authors state that spinophilin remodels the PP1 hydrophobic groove differently from Phactrs. Arguably spinophilin does not remodel the PP1 hydrophobic groove at all. There are no contacts between spinophilin and the PP1 hydrophobic groove in the spinophilin-PP1 structure, correlating with the absence of 'W' in the RVxF-øø-R-W string in spinophilin.

      The VDP sequence corresponding to the W motif in spinophilin and neurabin makes analogous contacts to those made by the W in Phactr1 (see Fedoryshchak et al 2020).

      Remodelling is meant in the sense of altering the structure of the major groove by bringing new sequences into its vicinity rather than necessarily directly interacting with it. The spinophilin/PP1 and Phactr/PP1 hydrophobic grooves are compared in new Figure 1-S2 (see also Fedoryshchak et al 2020, Figure 2 figure supplement 1).

      (4) Page 8. For the cell-based/proteomics-dephosphorylation assay in Figure 2, it isn't clear why there were no dephosphorylation sites detected for the PPP1R15A/B-PP1 fusion (except PPP6R1 S531 for PPP1R15B). One might have expected a correlation with PP1 alone. Does this imply that PPP1R15A/B are inhibiting PP1 catalytic activity? Was the activity tested in vitro?

      The R15A/B data are compared to average abundance of all the phosphosites in the dataset, including those of PP1.

      We have not tested for a general inhibitory effect of R15A/B on PP1 activity. Many PIPs including R15A/B do occlude one or more of the PP1 substrate grooves and therefore generally act as inhibitors of PP1 activity against some potential substrates, while enhancing activities against others.

      Other points 

      (4) Figure S1: Colour sequence similarities/identities.

      Done

      (6) Figures: Structure figures lacked labels:

      Figure 1A, label PP1, Phactrs etc.

      Done

      Figure 6, label PP1, Neurabin, previous Neurabin structure (Fig. 6C), hydrophobic groove, PDZ domain, etc.

      Done

      (7) Statistical analysis. p values should be shown for data in:

      Figure 5.

      To avoid cluttering the Figure, a new sheet, “statistical significance” has been added to Supplementary Table 3, summarizing the analysis.

      Figure 1.

      Figure amended (now figure 1-S1).

      (8) Some inconsistency with labels, eg '34-WT' used in Fig. 5C, whereas '34A-WT' (better) in Methods.

      Now changed to 34A etc where used.

      (9) Page 6. PPP1R9A/B is not shown in Figure 1A and Figure S1A.

      PPP1R9A/B are Neurabin and spinophilin - now clarified in Introduction paragraph 2, Results paragraph 1, Discussion paragraph 1.

      (10) Page 7: lines 4, 'site' not 'side'.

      Done

      (11) Page 9: DTL and CAMSAP3 were found to be dephosphorylated in the PP1-Neurabin/spinophilin screen. Are these PDZ-binding proteins?

      Neither DTL nor CAMSAP3 contains C-terminal hydrophobic residues characteristic of classical PBMs. Sentence added in Discussion, paragraph 5.

      (12) Page 12 and Figure 5 and S5: The synthetic p4E-BP1 and IRSp53WT peptides with PBM should be given more specific names to indicate the presence of the PBM.

      We have renamed 4E-BP1<sup>WT</sup> and IRSp53<sup>WT</sup> to 4E-BP1<sup>PBM</sup> and IRSp53<sup>PBM</sup> respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides.

      Text, Figure 5, and Figure S5 all revised accordingly.

      (13) Give PDB code for spinophilin-PP1 complex coordinates shown in Figure 6C.

      PDB codes for the various PIP/PP1 complexes now given in new Figure 1-S2 and revised Figure 6C.

      Reviewer #2 (Recommendations for the authors):

      The work undertaken by the authors is extensive and robust, however, I believe that some improvement in the writing and some detailed explanation of certain results sections would help with the presentation of the work and clarity for the readers.

      (1) The introduction should contain more information about the interaction between PP1 and Neurabin, given that this is the focus of the paper. This would give the reader the necessary background required to follow the paper.

      Introduction paragraph 2 revised to describe the different SLIMs in more detail. New Figure 1-S2 shows detail of the different remodelled hydrophobic grooves in the various PIP/PP1 complexes.

      (2) More information on PP1-IRSp53L460A has to be added before discussing results in S1B.

      Sentence explaining that IRSp53 L460 docks with the remodelled PP1 hydrophobic groove in the Phactr1/PP1 holoenzyme added in Results paragraph 2.

      (3) Page 6: "as expected, the +5 residue L460A mutation, which impairs dephosphorylation by the intact Phactr1/PP1 holoenzyme, impaired sensitivity to all the fusions, indicating that they recognise phosphorylated IRSp53 in a similar way (Figure S1B)". Statistics between IRSp53 and IRSp53L460A across PP1-PIPs need to be conducted before concluding the above. From the graph and the images, the impairment to dephosphorylation is not convincing.

      For each of the four PP1-Phactr fusions, the IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide (p<0.05 for each fusion).

      Since the proteomics studies in Figure 2 show that the substrate specificity of the four PP1-Phactr fusions is virtually identical, we combined the data for the four different fusions. The IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide in this analysis (p < 0.0001). This result is shown in revised Figure S1B and legend.
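
      The pooling step described above can be sketched as follows. This is a minimal illustration only: the response does not state which statistical test was used, so the Monte Carlo permutation test here is an assumption, and the reactivity values, the dictionary layout and the function names (`pool`, `permutation_p_value`) are hypothetical placeholders rather than the authors' data or code.

```python
import random
from statistics import mean

def pool(groups):
    """Concatenate the replicate values of all fusions into one list."""
    return [value for values in groups.values() for value in values]

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of means."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    combined = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(combined)          # randomly reassign the group labels
        perm_a = combined[:len(a)]
        perm_b = combined[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical reactivities (arbitrary units), three replicates per fusion.
wild_type = {
    "PP1-Phactr1": [1.00, 0.95, 1.05],
    "PP1-Phactr2": [0.90, 0.98, 1.02],
    "PP1-Phactr3": [1.10, 1.00, 0.97],
    "PP1-Phactr4": [0.93, 1.04, 0.99],
}
l460a = {
    "PP1-Phactr1": [0.50, 0.45, 0.55],
    "PP1-Phactr2": [0.60, 0.52, 0.48],
    "PP1-Phactr3": [0.40, 0.58, 0.51],
    "PP1-Phactr4": [0.47, 0.53, 0.44],
}

pooled_wt = pool(wild_type)
pooled_mut = pool(l460a)
p_value = permutation_p_value(pooled_wt, pooled_mut)
print(f"pooled n = {len(pooled_wt)} vs {len(pooled_mut)}, p = {p_value:.4g}")
```

      Pooling is only reasonable when the groups being merged behave alike, which is the argument the authors make from the Figure 2 proteomics; with 10,000 permutations the smallest p value this sketch can report is about 0.0001.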

      (4) mCherry-4E-BP1(118+A), in which an additional C-terminal alanine should still allow TOS-mediated phosphorylation, but prevent PDZ interaction. Does 4EBP1 (118+A) actually prevent interaction between PP1-Neurabin? This interaction needs to be validated, especially since spinophilin was shown to bind to multiple regions of PP1.

      It is not clear what the referee is asking for here. The biochemical analysis in Figure 4C shows that the C-terminus of 4E-BP1 constitutes a classical PBM. The X-ray crystallography in Figure 6 confirms this, demonstrating H-bond interactions between the 4E-BP1 C-terminal carboxylate and main chain amides of L514, G515 and I516.

      We consider the possibility that the 4E-BP1(118+A) mutant inhibits the activity of PP1-Neurabin via a mechanism other than directly blocking the 4E-BP1/PDZ interaction to be unlikely for the following reasons:

      (1) Addition of a C-terminal alanine will disrupt the PBM interaction because the extra residue sterically blocks access to the PBM-binding groove. This is the most parsimonious explanation, and is based on our solid structural and biochemical evidence that the 4E-BP1 C-terminus is a classical PBM.

      (2) AlphaFold3 modelling predicts Neurabin PDZ / 4E-BP1 PBM interaction with high confidence (shown in Figure 6-S2E), but it does not predict any PDZ interaction with 4E-BP1(118+A). Note added in Figure 6-S2 legend.

      (3) Recognition of the 4E-BP1(118+A) mutation without loss of binding affinity would require that the mutant be capable of binding in a manner formally equivalent to recognition of an “internal” PDZ-binding peptide. Recognition of such “internal peptides” is dependent on their adopting a specifically constrained conformation, which typically requires reorganisation of the PDZ carboxylate-binding GLGF loop. Such “internal site” recognition typically involves more than one residue C-terminal to the conventional PDZ “0” position (see Penkert et al NSMB 2004, doi:10.1038/nsmb839; Gee et al JBC 1998, DOI: 10.1074/jbc.273.34.21980; Hillier et al 1999, Science PMID: 10221915).

      (5) It is nice to see that the various PP1-Phactr fusions have around 60% substrate overlap between them. Would it be possible to compare these results with previously published mass spec data of Phactr1XXX from the group? There is mention of some substrates being picked up, but a comparison much like in Figure 2E would be more informative about the extent to which the described method captures relevant information.

      This is difficult to do directly as the PP1-Phactr fusion data are from human cells while that in Fedoryshchak et al 2020 is from mouse.

      However, manual curation shows that of the 28 top hits seen in our previous analysis of Phactr1XXX in NIH3T3 cells, 18 were also detectable in the HEK293 system; of these, 13 were also detected as PP1-Phactr fusion hits. Data summarised in new Figure 2-S1C. Text amended in Results, “Proteomic analysis...”, paragraph 2.

      (6) Figure 3D Why are the levels of pT70, pT37/46 and total protein in vector controls much lower as compared to 0nM Tet in PP1-Neurabin conditions? It is also weird that given total protein is so low, why are the pS65/101 levels high compared to the rest?

      We think it likely these phenomena reflect low-level expression of PP1-Neurabin in uninduced cells. Now noted in Figure 3D legend; basal PP1-Neurabin expression shown in new Figure 3-S1C. This alters the relative levels of the different species detected by the total 4E-BP1 antibody in favour of the faster migrating forms, which are less phosphorylated than the slower ones, and the total amount increases about 2-fold (Figure 3D, compare 0nM Tet lanes).

      The altered pS65/101-pT70 ratio is also likely to reflect the leaky PP1-Neurabin expression, since the relative intensities of the various phosphorylated species are dependent on both the relative rates of phosphorylation and dephosphorylation. Expression of a phosphatase would therefore be expected to differentially affect the phosphorylation levels of different sites according to their reactivity.

      (7) Figure 3E: Does inhibiting mTORC further reduce translation when PP1-Neurabin is expressed? If this is the case, this might suggest that they might not necessarily be mTORC inhibitors?

      We have not done this experiment. Since Rapamycin cannot be guaranteed to completely block 4E-BP1 phosphorylation, and PP1-Neurabin cannot be guaranteed to completely dephosphorylate 4E-BP1, any further reduction upon their combination would be hard to interpret.

      (8) Substrate interactions with the remodelled PP1 hydrophobic groove do not affect PP1-Neurabin specificity. Is there evidence that PP1-Neurabin remodels the hydrophobic groove? Is it not possible that Neurabin does not remodel the PP1 groove to begin with and hence there is no effect observed with the various mutants? If this is not the case, it should be explained in a bit more detail.

      Comparison of the Neurabin/PP1 and Phactr1/PP1 structures shows that the hydrophobic groove is remodelled differently in the two complexes. Now shown in new Figure 1-S2B,C,G.

      (9) Figure 5B has a lot of interesting information, which I believe has not been discussed at all in the results section.

      To help interpretation of the enzymology in Figure 5 we have renamed 4E-BP1WT and IRSp53WT to 4E-BP1PBM and IRSp53PBM respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides. Text in Results, “PDZ domain interaction…”, paragraph 1, and Figures 5 and S5 revised accordingly.

      Why does the 4E-BP1Mut affect catalytic efficiency of PP1 alone when compared with WT, while no difference is observed with IRSp53WT and mutant?

      We do not understand the basis for the differential reactivity of 4E-BP1PBM and 4E-BP1MUT with PP1 alone; we suspect that it reflects the hydrophobicity change resulting from the MDI -> SGS substitution. However, this is unlikely to be biologically significant as PP1 is sequestered in PIP-PP1 complexes.

      Importantly, the two PP1 fusion proteins behave consistently in this assay – the presence of the intact PBM increases reactivity with PP1-Neurabin, but has no effect on dephosphorylation by PP1-Phactr1.

      Why does PP1 alone not have a difference between IRSp53WT and mutant, while PP1-Neurabin does have a difference?

      This is due to the presence of the PBM in IRSp53WT (now renamed IRSp53PBM), which increases affinity for PP1-Neurabin, but not PP1 alone. Likewise, PP1-Phactr1, which does not possess a PDZ domain, is also unaffected by the integrity of the PBM.

      (7) “Strikingly, alanine substitutions at +1 and +2 in 4E-BP1WT increased catalytic efficiency by both fusions, perhaps reflecting changes at the catalytic site itself (Figure 5E, Figure S5E)”. This could be expanded upon, because this suggests a mechanism that makes the substrate refractory to PDZ/hydrophobic groove remodelling?

      We favour the idea that this reflects a requirement to balance dephosphorylation rates between the multiple 4E-BP1 phosphorylation sites, especially if multiple rounds of dephosphorylation occur for each PBM—PDZ interaction. Additional sentences added in Discussion paragraph 7.

      (8) Typographical errors and minor comments:

      a) PIPs can target PP1 to specific subcellular locations, and control substrate specificity through autonomous substrate-binding domains, occupation or extension of the substrate grooves, or modification of PP1 surface electrostatics.

      b) Phosphophorylation side site abundances within triplicate samples from the same cell line were comparable between replicates (Figure 2B).

      c) While the alanine substitutions had little effect, conversion of +4 to +6 to the IRSp534E-BP1 sequence LLD increased catalytic efficiency some 20-fold (Figure 5C, Figure S5C). 

      d) Figure 3E labels are not clear. The graph can be widened to make the labels of the conditions clearer.

      All corrected

      Reviewer #3 (Recommendations for the authors):

      This was a very well-written manuscript.

      However, I was looking for a summary mechanistic figure or cartoon to help me navigate the results.

      I noted a few typos in the text.

      New summary Figure 5-S2 added, cited in results, and discussed in Discussion paragraph 6,7.

    1. eLife assessment

      This study offers a useful discussion of the well-accepted abundance-occupancy relationship in macroecology. While using the large eBird dataset to revisit the theme is interesting, multiple unresolved confounding factors exist, leaving the results inadequate to overturn the repeatedly confirmed abundance-occupancy relationship.

    2. Reviewer #1 (Public review):

      Summary:

      This article presents an analysis that challenges established abundance-occupancy relationships (AORs) by utilizing the largest known bird observation database. The analysis yields contentious outcomes, raising the question of whether these findings could potentially refute AORs.

      Strengths:

      The study employed an extensive aggregation of datasets to date to scrutinize the abundance-occupancy relationships (AORs).

      Weaknesses:

      The authors should thoroughly address the correlation between checklist data and global range data, ensuring that the foundational assumptions and potential confounding factors are explicitly examined and articulated within the study's context.

      In the revision, the authors have refined their findings to birds and provided additional clarifications and discussion. However, the primary concerns raised by reviewers remain inadequately addressed. My main concern continues to be whether testing AOR at a global scale is meaningful given the numerous confounding factors involved. With the current data and analytical approach, these confounders appear inseparable. The study would be significantly strengthened if the authors identified specific conditions under which AORs are valid.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This article presents a meta-analysis that challenges established abundance-occupancy relationships (AORs) by utilizing the largest known bird observation database. The analysis yields contentious outcomes, raising the question of whether these findings could potentially refute AORs.

      We thank the Reviewer for their positive comments.

      Strengths:

      The study employed an extensive aggregation of datasets to date to scrutinize the abundance-occupancy relationships (AORs).

      We thank the Reviewer for their positive comments.

      Weaknesses:

      While the dataset employed in this research holds promise, a rigorous justification of the core assumptions underpinning the analytical framework is inadequate. The authors should thoroughly address the correlation between checklist data and global range data, ensuring that the foundational assumptions and potential confounding factors are explicitly examined and articulated within the study's context.

      We thank the Reviewer for these comments. We agree that more justification and transparency is needed of the core assumptions that form the foundation of our methods. In our revised version, we have taken the following steps to achieve this:

      - Altered the title to be more explicit about the core assumptions, which now reads: “Local-scale relative abundance is decoupled from global range size”

      - We have added more details on why and how we treat global range size as a measure of ‘occupancy.’

      - We have added a section that discusses the limitations of using eBird relative abundance

      Reviewer #2 (Public Review):

      Summary:

      The goal is to ask if common species when studied across their range tend to have larger ranges in total. To do this the authors examined a very large citizen science database which gives estimates of numbers, and correlated that with the total range size, available from Birdlife. The average correlation is positive but close to zero, and the distribution around zero is also narrow, leading to the conclusion that, even if applicable in some cases, there is no evidence for consistent trends in one or other direction.

      We thank the Reviewer for these comments.

      Strengths:

      The study raises a dormant question, with a large dataset.

      We thank the Reviewer for these comments. We intended to take a longstanding question and attempt to apply novel datasets that were not available mere decades ago. While we do not imply that we have ‘solved’ the question, we hope this work highlights the potential for further interrogation using these large datasets.

      Weaknesses:

      This study combines information from across the whole world, with many different habitats, taxa, and observations, which surely leads to a quite heterogeneous collection.

      We agree that there is a heterogeneous collection of data across many habitats, taxa, and observations. However, rather than as a weakness, we see this as a significant strength. Our work assumes we are averaging over this variability to assess for a large-scale pattern in the relationship - something that was potentially a limitation of previous work, as these large datasets were often focused on particular contexts (e.g., much work focused solely on the UK), which we believe could limit some of the generalizability of the previous work. However, the reviewer makes a fair point in regard to the heterogeneity of data collection. We have now added some text in the discussion which is explicit about this - see the new section named “Potential limitations of current work and future work: although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, synthesizing observations of potentially heterogeneous locations, context and quality”.

      First, scale. Many of the earlier analyses were within smaller areas, and for example, ranges are not obviously bounded by a physical barrier. I assume this study is only looking at breeding ranges; that should be stated, as 40% of all bird species migrate, and winter limitation of populations is important. Also are abundances only breeding abundances or are they measured through the year? Are alien distributions removed?

      Second, consider various reasons why abundance and range size may be correlated (sometimes positively and sometimes negatively) at large scales. Combining studies across such a large diversity of ecological situations seems to create many possibilities to miss interesting patterns. For example:

      (1) Islands are small and often show density release.

      See comment below.

      (2) North temperate regions have large ranges (Rapoport's rule) and higher population sizes than the tropics.

      See comment below.

      (3) Body size correlates with global range size (I am unsure if this has recently been tested but is present in older papers) and with density. For example, cosmopolitan species (barn owl, osprey, peregrine) are relatively large and relatively rare.

      See comment below.

      (4) In the consideration of alien species, it certainly looks to me as if the law is followed, with pigeon, starling, and sparrow both common and widely distributed. I guess one needs to make some sort of statement about anthropogenic influences, given the dramatic changes in both populations and environments over the past 50 years.

      See comment below. We also added a sentence in the methods that highlighted we did not remove alien ranges and provided reasons why. Still, we do acknowledge the dramatic changes in populations and environments over the past 50 years (see the new section “Potential limitations of current work and future work”).

      (5) Wing shape correlates with ecological niche and range size (e.g. White, American Naturalist). Aerial foraging species with pointed wings are likely to be easily detected, and several have large ranges reflecting dispersal (e.g. barn swallow).

      We agree that all of the points above are interesting data explorations. As said above, our main purpose was to highlight the potential for further interrogation using these large datasets. However, we have added some additional text in the discussion that explicitly mentions/encourages these additional data explorations. We hope people will pick up on the potential for these data and explore them further.

      Third, biases. I am not conversant with ebird methodology, but the number appearing on checklists seems a very poor estimate of local abundance. As noted in the paper, common species may be underestimated in their abundance. Flocking species must generate large numbers, skulking species few. The survey is often likely to be in areas favorable to some species and not others. The alternative approach in the paper comes from an earlier study, based on ebird but then creating densities within grids and surely comes with similar issues.

      We agree that if we were interested in the absolute abundance of a given species, the local number on an eBird checklist would be a poor representation. However, our study aims not to estimate absolute abundance but to examine relative abundance among species on each checklist. By focusing on relative abundance, we leverage eBird data's strengths in detecting the presence and frequency of species across diverse locations and times, thereby capturing community composition trends that can provide meaningful insights despite individual checklist biases. This approach allows us to assess the comparative prominence of species in the community as reported by the observer, providing a consistent metric of relative abundance. Despite detectability biases, the structure of eBird checklists reflects the observer’s encounter rates with each species under similar conditions, offering a valuable snapshot of relative species composition across sites and times. The key to our assumption is that these biases discussed are not directional and, therefore, random throughout the sampling process, which would translate to no ‘real’ bias in our effect size of interest.
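      The distinction between absolute and relative abundance can be sketched in a few lines. This is only an illustration under stated assumptions: the checklist, species names, counts, and range sizes are made up, and the log transformation and Pearson correlation are assumptions rather than a description of the analysis in the manuscript.

```python
import math


def relative_abundance(counts):
    """Convert raw counts from one checklist into proportions summing to 1."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical checklist: made-up species, counts, and range sizes (km^2).
checklist = {"species_a": 40, "species_b": 10, "species_c": 5, "species_d": 25}
range_km2 = {"species_a": 2.0e6, "species_b": 8.0e6,
             "species_c": 5.0e5, "species_d": 1.2e7}

rel = relative_abundance(checklist)
species = sorted(checklist)
log_abundance = [math.log(rel[s]) for s in species]
log_range = [math.log(range_km2[s]) for s in species]

# One correlation per checklist (assumed effect size for illustration).
r_checklist = pearson(log_abundance, log_range)

# Scaling every count by the same factor leaves relative abundance unchanged.
tripled = relative_abundance({s: 3 * n for s, n in checklist.items()})
```

      In this sketch, an observer effect that multiplies every count on a checklist by the same factor does not alter the relative abundances, whereas species-specific detectability differences would.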

      Range biases are also present. Notably, tropical mountain-occupying species have range sizes overestimated because holes in the range are not generally accounted for (Ocampo-Peñuela et al., Nature Communications). These species are often quite rare, too.

      We thank the reviewer for pointing to this issue and reference. We included a discussion of these biases in our limitations section and reference Ocampo-Peñuela et al. to emphasize the need for improved spatial resolution in range data for more accurate AOR assessments: “More precise range-size estimates would also improve the accuracy of AOR assessments, since species range data are often overestimated due to the failure to capture gaps in actual distributions.”

      Fourth, random error. Random error in ebird assessments is likely to be large, with differences among observers, seasons, days, and weather (e.g. Callaghan et al. 2021, PNAS). Range sizes also come with many errors, which is why occupancy is usually seen as the more appropriate measure.

      If we consider both range and abundance measurements to be subject to random error in any one species list, then the removal of all these errors will surely increase the correlation for that list (the covariance shouldn't change but the variances will decrease). I think (but am not sure) that this will affect the mean correlation because more of the positive correlations appear 'real' given the overall mean is positive. It will definitely affect the variance of the correlations; the low variance is one of the main points in the paper. A high variance would point to the operation of multiple mechanisms, some perhaps producing negative correlations (Blackburn et al. 2006).
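      The reviewer's argument corresponds to the classical attenuation formula for correlations. The sketch below assumes measurement errors that are independent of the true values and of each other; the function name and the example variances are hypothetical.

```python
import math


def attenuated_r(r_true, var_true_x, var_err_x, var_true_y, var_err_y):
    """Observed correlation when independent random error is added to x and y.

    The covariance is unchanged by the error, but each observed variance is
    inflated, so the correlation shrinks by the square root of the product
    of the two reliabilities.
    """
    reliability_x = var_true_x / (var_true_x + var_err_x)
    reliability_y = var_true_y / (var_true_y + var_err_y)
    return r_true * math.sqrt(reliability_x * reliability_y)


# Hypothetical case: error variance equal to true variance for both measures.
r_observed = attenuated_r(0.5, 1.0, 1.0, 1.0, 1.0)
# With no error, the observed and true correlations coincide.
r_no_error = attenuated_r(0.5, 1.0, 0.0, 1.0, 0.0)
```

      With reliabilities of 0.5 for both variables, a true correlation of 0.5 would be observed as 0.25 under these assumptions.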

      We agree that random errors can affect estimates, but, as we wrote above, random errors, regardless of their magnitude, would not bias estimates. After accounting for sampling error (one component of random error), little variance is left to be explained, as we have shown in the MS. This suggests that much of the random error was captured as sampling error. And this is where meta-analysis really shines.
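      How a meta-analysis separates sampling error from remaining heterogeneity can be sketched with an inverse-variance weighted mean, Cochran's Q, and I². This is a deliberately simplified fixed-effect illustration, not the multilevel model used in the manuscript, and the effect sizes and variances are made up.

```python
def heterogeneity_summary(effects, variances):
    """Inverse-variance weighted mean, Cochran's Q, and I^2 (as a proportion)."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q measures how much the effects scatter relative to their sampling error.
    q = sum(w * (e - mean) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation not attributable to sampling error.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return mean, q, i_squared


# Hypothetical effect sizes whose scatter is no larger than sampling error.
mean_effect, q_stat, i_squared = heterogeneity_summary(
    [0.1, -0.1, 0.0], [0.1, 0.1, 0.1]
)
```

      When Q does not exceed its degrees of freedom, as in this made-up example, I² is zero: the spread among effect sizes is no larger than sampling error alone would produce.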

      On P.80 it is stated: "Specifically, we can quantify how AOR will change in relation to increases in species richness and sampling duration, both of which are predicted to reduce the magnitude of AORs" I haven't checked the references that make this statement, but intuitively the opposite is expected? More species and longer durations should both increase the accuracy of the estimate, so removing them introduces more error? Perhaps dividing by an uncertain estimate introduces more error anyway. At any rate, the authors should explain the quoted statement in this paper.

      It would be of considerable interest to look at the extreme negative and extreme positive correlations: do they make any biological sense?

      Extremely high correlations would not make any biological sense if these observations were based on large sample sizes. However, as shown in Figure 2, all extreme correlations come from small sample sizes (i.e., low precision), as sampling theory predicts (indeed, our Fig 2 is a textbook example of the funnel shape). Therefore, we do not need to invoke any biological explanations here.
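      The funnel shape can be made concrete with the usual approximation for the sampling variance of Fisher's z, 1/(n - 3). The sketch assumes correlations are analysed on the Fisher z scale, which this response does not state explicitly; the checklist sizes are hypothetical.

```python
import math


def z_sampling_variance(n_species):
    """Approximate sampling variance of Fisher's z for n paired observations."""
    if n_species <= 3:
        raise ValueError("need more than 3 species per checklist")
    return 1.0 / (n_species - 3)


def null_interval_for_r(n_species, z_crit=1.96):
    """Range of r expected about 95% of the time if the true correlation is 0."""
    half_width = z_crit * math.sqrt(z_sampling_variance(n_species))
    # Back-transform from the z scale to the correlation scale.
    return math.tanh(-half_width), math.tanh(half_width)


# Hypothetical checklist sizes, from species-poor to species-rich.
intervals = {n: null_interval_for_r(n) for n in (5, 10, 50, 103)}
```

      Under these assumptions, a five-species checklist can yield correlations beyond ±0.8 by chance alone, whereas a checklist of about 100 species is confined to roughly ±0.2.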

      Discussion:

      I can see how publication bias can affect meta-analyses (addressed in the Gaston et al. 2006 paper) but less easily see how confirmation bias can. It seems to me that some of the points made above must explain the difference between this study and Blackburn et al. 2006's strong result.

      We agree. We have now extended our explanation of why confirmation bias could result in positive AORs. We also point out that confirmation bias is a very common phenomenon, for which we cited relevant references in the original MS. The only way to avoid confirmation bias is to conduct a study blind, but this is often not possible in ecological work.

      “Meta-research on behavioural ecology identified 79 studies on nestmate recognition, 23 of which were conducted blind. Non-blind studies confirmed a hypothesis of no aggression towards nestmates nearly three times more often. It is possible that confirmation bias was at play in earlier AOR studies.”

      Certainly, AOR really does seem to be present in at least some cases (e.g. British breeding birds) and a discussion of individual cases would be valuable. Previous studies have also noted that there are at least some negative and some non-significant associations, and understanding the underlying causes is of great interest (e.g. Kotiaho et al. Biology Letters).

      We agree. And yes, we pointed these out in our introduction.

      Reviewer #3 (Public Review):

      Summary:

      This paper claims to overturn the longstanding abundance occupancy relationship.

      Strengths:

      (1) The above would be important if true.

      (2) The dataset is large.

      We have clarified this point by changing the title to emphasize that we do not suggest overturning AORs entirely but instead provide a refined view of the relationship at a global scale. Our results suggest a weaker and more context-dependent AOR than previously documented. We hope our revised title and additional clarifications in the text convey our intent to contribute to a more nuanced understanding rather than a whole overturning of the AOR framework.

      Weaknesses:

      (1) The authors are not really measuring the abundance-occupancy relationship (AOR). They are measuring abundance-range size. The AOR typically measures patches in a metapopulation, i.e. at a local scale. Range size is not an interchangeable notion with local occupancy.

      We have refined this in our revision to be more explicitly focused on global range size. However, we note that the classic paper by Bock and Ricklefs (1983, Am Nat) also refers to global (a species’ entire) range size in the context of the AOR. Importantly, Bock and Ricklefs pointed out the importance of using species’ entire ranges; otherwise, the arbitrary geographical ranges used in some AOR studies produce sampling artifacts that create positive AORs. So we highlight that our work is well in line with previous work, allowing us to question the longstanding macroecological work. One of the issues with the AOR has been how to define occupancy; global range size provides a relatively unambiguous measure, which is why we used it.

      (2) Ebird is a poor dataset for this. The sampling unit is non-standard. So abundance can at best be estimated by controlling for sampling effort. Comparisons across space are also likely to be highly heterogenous. They also threw out checklists in which abundances were too high to be estimated (reported as "X"). As evidence of the biases in using eBird for this pattern, the North American Breeding Bird Survey, a very similar taxonomic and geographic scope but with a consistent sampling protocol across space does show clear support for the AOR.

      Yes, we agree the sampling unit is non-standard. However, this is a significant strength in that it samples across much heterogeneity (as discussed in response to Reviewer 2, above). We were interested in relative abundance and not direct absolute abundance per se, which is accurate, especially since we did control for sampling effort.

      We appreciate the reviewer’s attention to our data selection criteria. We excluded checklists containing ‘X’ entries to minimize biases in our abundance estimates. The 'X' notation is often used for the most common species, reflecting the observer's identification of presence without specifying a count. This approach was chosen to avoid disproportionately inflating presence data for these abundant species, which could distort the relative abundance calculations in our analysis. By excluding such checklists, we aimed to retain consistency and ensure that local abundance estimates were representative across all species on each checklist. We have revised our manuscript to clarify this methodological choice and hope this explanation addresses the reviewer’s concern. We modified our text in the methods to make the entries ‘X’ clearer (see the Method section).
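      The exclusion rule described above amounts to a simple filter. In this sketch the checklist structure (a dictionary with an "id" and a "counts" mapping) and the example records are hypothetical; only the use of ‘X’ as a presence-only marker comes from the text.

```python
def keep_countable_checklists(checklists):
    """Drop every checklist that contains at least one presence-only 'X' entry."""
    return [
        checklist
        for checklist in checklists
        if all(str(count).upper() != "X" for count in checklist["counts"].values())
    ]


# Hypothetical checklists; "X" marks presence without a numeric count.
checklists = [
    {"id": "c1", "counts": {"species_a": 3, "species_b": 12}},
    {"id": "c2", "counts": {"species_a": "X", "species_b": 4}},
    {"id": "c3", "counts": {"species_c": 1}},
]

kept = keep_countable_checklists(checklists)
kept_ids = [checklist["id"] for checklist in kept]
```

      Dropping the whole checklist, rather than only the ‘X’ rows, keeps the relative abundances of the remaining species comparable within each retained checklist.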

      (3) In general, I wonder if a pattern demonstrated in thousands of data sets can be overturned by findings in one data set. It may be a big dataset but any biases in the dataset are repeated across all of those observations.

      Overturning a major conclusion requires careful work. This paper did not rise to this level.

      We appreciate the reviewer’s caution regarding broad conclusions based on a single dataset, even one as large as eBird. Our intention was not to definitively overturn the abundance-occupancy relationship (AOR) but to re-evaluate it with the most extensive and globally representative dataset currently available. We recognise that potential biases in citizen science data, such as observer variation, may influence our findings, and we have taken steps to address these in our methodology and limitations sections. We see this work as a contribution to an ongoing discourse, suggesting that AOR may be less universally consistent than previously believed, mainly when tested with large-scale citizen science data. We hope this study will encourage additional research that tests AORs using other expansive datasets and approaches, further refining our understanding of this classic macroecological relationship. However, we have left our broad message about instigating credible revolution and also re-examining ecological laws.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The investigation focuses solely on interspecific relationships among birds; thus, the extrapolation of these conclusions to broader ecological contexts requires further validation.

      We have now added this point to our new section: “Although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, so we hope this work serves as a foundation for further investigations that utilize such comprehensive datasets.”

      (2) The rationale for combining data from eBird - a platform predominantly representing individual observations from urban North America - with the more globally comprehensive BirdLife International database needs to be substantiated. The potential underrepresentation of global abundance in the eBird checklist data could introduce a sampling bias, undermining the foundational premises of AORs.

      We agree that eBird sampling coverage is limited, but this should not bias our results. In statistical terms, bias is directional; error that is not directional becomes statistical noise, which makes the signal harder to detect. In fact, our meta-analyses adjust for what statisticians call sampling bias, and this is a strength of meta-analysis.

      (3) In the full mixed-effect model, checklist duration and sampling variance (inversely proportional to sample size N) are treated as fixed effects. However, these variables are likely to be negatively correlated, which could introduce multicollinearity, inflating standard errors and diminishing the statistical significance of other factors, such as the intercept. This calls into question the interpretation of insignificance in the results.

      Multicollinearity is mainly an issue with small sample sizes. For example, with small datasets, correlations of 0.5 could be an issue, and such an issue would usually show up as a large SE. We do not have such an issue with ~17 million data points. Please refer to this paper:

      Freckleton, Robert P. "Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error." Behavioral Ecology and Sociobiology 65 (2011): 91-101.
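      The interplay between collinearity and sample size can be illustrated with the variance inflation factor (VIF). The sketch assumes an ordinary regression with two predictors, where the standard error of a slope scales approximately with sqrt(VIF / n); it is not the mixed-effects model from the manuscript, and the function names are hypothetical.

```python
import math


def vif(r):
    """Variance inflation factor for two predictors with correlation r."""
    return 1.0 / (1.0 - r ** 2)


def relative_se(r, n):
    """Slope standard error up to a constant factor: sqrt(VIF / n)."""
    return math.sqrt(vif(r) / n)


# Same predictor correlation, very different sample sizes.
se_small = relative_se(0.5, 100)
se_large = relative_se(0.5, 17_000_000)
```

      A predictor correlation of 0.5 inflates the sampling variance by a factor of 4/3 regardless of n, but under these assumptions the resulting standard error still shrinks in proportion to 1/sqrt(n) as the dataset grows.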

      (4) The observed low heterogeneity may stem from discrepancies in sampling for abundance versus occupancy, compounded by uncertainties in reporting behavior.

      If we assume everybody underreports common species or overreports rare species, this could happen. However, such an assumption is unlikely. If some people report accurately (but not others), we should see high heterogeneity, which we do not observe. We touched upon this point in our original MS.

      (5) The contribution and implementation of phylogenetic comparative analysis remain ambiguous and were not sufficiently clarified within the study.

      We have added more explanation for the global abundance analysis:

      “To statistically test whether there was an effect of abundance and occupancy at the macro-scale, we used phylogenetic comparative analysis. This analysis also addresses the issue of positive interspecific AORs potentially arising from not accounting for phylogenetic relatedness among species examined.”

      (6) The use of large N checklists could skew the perceived rarity or commonality of species, potentially diminishing the positive correlation observed in AORs. A consistent observer effect could lead to a near-zero effect with high precision.

      Regardless of the number of species (N) in a checklist (see Fig 2), correlations are distributed around zero. This means there is nothing special about large N checklists.

      (7) The study should acknowledge and discuss any discrepancies or deviations from previous literature or expected outcomes.

      We felt we had already done this, as we discussed the previous meta-analysis and what we expected from this meta-analysis. Nevertheless, we have added some relevant sentences in the new version of the MS.

      In addition to these major points, there are several minor concerns:

      (1) Figure 2B lacks discussion, and the metric for the number of observations is not clarified. Furthermore, the labeling of the y-axis appears to be incorrect.

      Thank you very much for pointing out this shortcoming. Now, the y-axis label has been fixed and we mention 2B in the main text.

      (2) The study should provide a clear, mathematical expression of the multilevel random effect models for greater transparency.

      Many thanks for this point, and now we have added relevant mathematical expressions in Table S6.

      (3) On Line 260, the term "number of species" should be refined to "number of species in a checklist," ideally represented by a formula for precision.

      This ambiguity has been mended as suggested.

      Please provide the data and R code linked to the outputs.

      The referee must have missed the link (https://github.com/itchyshin/AORs) in our original MS. In addition to our GitHub repository link, we now have added a link to our Zenodo repository (https://doi.org/10.5281/zenodo.14019900).

      Reviewer #3 (Recommendations For The Authors):

      The authors cite Rabinowitz's 7 forms of rarity paper as a suggestion that previous findings also break the AOR. In fact empirical studies of the 7 forms of rarity typically find that all three forms of rareness vs commonness are heavily correlated (e.g. Yu & Dobson 2000).

      We thank the reviewer for drawing attention to Yu & Dobson (2000) and similar studies that find positive correlations among the axes of rarity. Reviewer 3 is correct that Rabinowitz’s (1981) framework does not require that local abundance and geographic range size be uncorrelated for every species; instead, it highlights conceptual scenarios where a species may be common locally yet have a restricted distribution (or vice versa).

      Empirical analyses such as Yu & Dobson (2000) show that, on average, these axes can be correlated, which may align with conventional AOR findings in some taxonomic groups. However, Rabinowitz’s key insight was that exceptions do occur, and these exceptions demonstrate that strong positive AORs may not be universally applicable. Our results do not claim that Rabinowitz’s framework “breaks” the AOR outright; instead, we use it to underscore that local abundance can, in principle, be “decoupled” from global occupancy. Whether the correlation found by Yu & Dobson (2000) implies a positive AOR requires a detailed simulation study, which is an interesting avenue for future research.

      Thus, citing Rabinowitz serves to highlight the potential heterogeneity and complexity of abundance–occupancy relationships rather than to refute every positive correlation reported in the literature. Our findings suggest that when examined at large spatiotemporal scales (with unbiased sampling), the overall AOR signal may be less robust than traditionally believed. This is consistent with Rabinowitz’s view that local abundance and global range can vary along independent axes. We have now added:

      “Although studies using her framework found positive correlations between species range and local abundance.”

    1. eLife Assessment

      This important theoretical study shows that active hexatic topological defects in epithelia play a crucial role in enabling collective cell flows. While the use of coarse-grained hydrodynamic models to describe cell-scale behavior has limitations, the study provides solid evidence supporting its claims. These findings will interest both biophysicists studying collective cell behaviors and biologists investigating epithelial flows during development.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the physical mechanisms underlying cell intercalation, which then enables collective cell flows in confluent epithelia. The authors show that T1 transitions (the topological transitions responsible for cell intercalation) correspond to the unbinding of groups of hexatic topological defects. Defect unbinding, and hence cell intercalation and collective cell flows, are possible when active stresses in the tissue are extensile. This result helps to rationalize the observation that many epithelial cell layers have been found to exhibit extensile active nematic behavior.

      Strengths:

      The authors obtain their results based on a combination of active hexanematic hydrodynamics and a multiphase field (MPF) model for epithelial layers, whose connection is a strength of the paper. With the hydrodynamic approach, the authors find the active flow fields produced around hexatic topological defects, which can drive defect unbinding. Using the MPF simulations, the authors show that T1 transitions tend to localize close to hexatic topological defects.

      Weaknesses:

      Citations are sometimes not comprehensive. Cases of contractile behavior found in collective cell flows, which would seemingly contradict some of the authors' conclusions, are not discussed.

      I encourage the authors to address the comments and questions below.

      (1) In Equation 1, what do the authors mean by the cluster's size \ell? How is this quantity defined? The calculations in the Methods suggest that \ell indicates the distance between the p-atic defects and the center of the T1 cell cluster, but this is not clearly defined.

      (2) The multiphase field model was developed and reviewed already, before the Loewe et al. 2020 paper that the authors cite. Earlier papers include Camley et al. PNAS 2014, Palmieri et al. Sci. Rep. 2015, Mueller et al. PRL 2019, and Peyret et al. Biophys. J. 2019, as reviewed in Alert and Trepat. Annu. Rev. Condens. Matter Phys. 2020.

      (3) At what time lag is the mean-squared displacement in Figure 3f calculated? How does the choice of a lag time affect these data and the resulting conclusions?

      (4) The authors argue that their results provide an explanation for the extensile behavior of cell layers. However, there are also examples of contractile behavior, such as in Duclos et al., Nat. Phys., 2017 and in Pérez-González et al., Nat. Phys., 2019. In both cases, collective cell flows were observed, which in principle require cell intercalations. How would these observations be rationalized with the theory proposed in this paper? Can these experiments and the theory be reconciled?