  1. Oct 2025
    1. Author Response

      Response to Reviewer 1:

      Summary of what the author was trying to achieve: In this study, the author aimed to develop a method for estimating neuronal-type connectivity from transcriptomic gene expression data, specifically from mouse retinal neurons. They sought to develop an interpretable model that could be used to characterize the underlying genetic mechanisms of circuit assembly and connectivity.

      Strengths: The proposed bilinear model draws inspiration from commonly implemented recommendation systems in the field of machine learning. The author presents the model clearly and addresses critical statistical limitations that may weaken the validity of the model, such as multicollinearity and outliers. The author presents two formulations of the model for separate scenarios in which varying levels of data resolution are available. The author effectively references key work in the field when establishing assumptions that affect the underlying model and subsequent results. For example, the correspondence between gene expression cell types and connectivity cell types from different references is clearly outlined in Tables 1-3. The model training and validation are sufficient and yield a relatively high correlation with the ground truth connectivity matrix. Seemingly valid biological assumptions are made throughout; however, some assumptions may reduce resolution (such as averaging over cell types), thus missing potentially important single-cell gene expression interactions.

      Thank you for acknowledging the strengths of this work. The assumption to average gene expression data across individual cells within a given cell type was made in response to the inherent limitations of, for example, the mouse retina dataset, where connectivity and gene expression are not profiled jointly at the level of individual cells (the second scenario in our paper). This approach was a necessary compromise to enable analysis at the cell-type level. However, in datasets where individual cell-level connectivity and gene expression data are matched, such as the C. elegans dataset referenced below, our model can be applied at single-cell resolution (the first scenario in our paper), offering a more detailed understanding of the genetic underpinnings of neuronal connectivity.
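
      As a minimal sketch of this averaging step, assuming a cell-by-gene expression matrix with one cell-type label per cell, the NumPy code below collapses cells into a type-by-gene matrix of mean expression. The function name `average_by_type`, the type labels and the toy numbers are hypothetical and are not taken from the paper's code or data.

```python
import numpy as np

def average_by_type(expr, labels):
    """Collapse a cell-by-gene matrix into a type-by-gene matrix of means.

    expr   : (n_cells, n_genes) array of expression values
    labels : length-n_cells sequence of cell-type labels
    Returns (types, means), where means[t] averages all cells of types[t].
    """
    expr = np.asarray(expr, dtype=float)
    types, idx = np.unique(np.asarray(labels), return_inverse=True)
    sums = np.zeros((len(types), expr.shape[1]))
    np.add.at(sums, idx, expr)                  # add each cell's row to its type's row
    counts = np.bincount(idx, minlength=len(types))
    return types, sums / counts[:, None]        # per-type mean expression

# Toy example: 4 cells, 3 genes, 2 hypothetical types
expr = [[1, 0, 2],
        [3, 0, 4],
        [0, 5, 1],
        [0, 7, 3]]
labels = ["BC1", "BC1", "RGC_A", "RGC_A"]
types, X = average_by_type(expr, labels)
print(types)
print(X)  # rows: BC1 -> [2, 0, 3], RGC_A -> [0, 6, 2]
```

      Averaging trades single-cell detail for a matrix X whose rows line up with the cell types used in the connectivity data.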

      Weaknesses: The main results of the study could benefit from replication in another dataset beyond mouse retinal neurons, to validate the proposed method. Dimensionality reduction significantly reduces the resolution of the model and the PCA methodology employed is largely non-deterministic. This may reduce the resolution and reproducibility of the model. It may be worth exploring how the PCA methodology of the model may affect results when replicating. Figure 5, 'Gene signatures associated with the two latent dimensions', lacks some readability, and related results could be outlined more clearly in the results section. There should be more discussion of the weaknesses of the results, e.g. quantification of which connectivity motifs were not captured and which gene signatures might have been missed.

      I value the suggestion to validate the proposed method in another dataset. In response, I found that the C. elegans dataset in the references the reviewer suggests below is a good candidate for this purpose, and I plan to explore this dataset and incorporate the findings into the revised manuscript. I understand the concerns regarding the PCA methodology and its potential impact on the model’s resolution and reproducibility; in response, I will explore alternative methods, such as regularization techniques, to address these issues. Additionally, I agree that improving the clarity and readability of Figure 5, and including a more comprehensive discussion of the model’s limitations, would significantly strengthen the manuscript.

      The main weakness is the lack of comparison against other similar methods, e.g. methods presented in Barabási, Dániel L., and Albert-László Barabási. "A genetic model of the connectome." Neuron 105.3 (2020): 435-445. Kovács, István A., Dániel L. Barabási, and Albert-László Barabási. "Uncovering the genetic blueprint of the C. elegans nervous system." Proceedings of the National Academy of Sciences 117.52 (2020): 33570-33577. Taylor, Seth R., et al. "Molecular topography of an entire nervous system." Cell 184.16 (2021): 4329-4347.

      Thank you for highlighting the importance of comparing our model with others, particularly those mentioned in your comments. After reviewing these papers, I find that our bilinear model aligns closely with the methods described, especially in [1, 2]. To see this, let’s start with Equation 1 in Kovács et al. [2]:

      $$B = X O X^{\top}$$

      In this equation, B represents the connectivity matrix, while X denotes the gene expression patterns of individual neurons in C. elegans. O is the genetic rule operator governing synapse formation, linking connectivity with individual neuronal expression patterns. Notably, Barabási and Barabási [1] explore a specific application of this framework, focusing on the form of O when B consists of biclique motifs in the C. elegans neural network.

      To identify the operator O, the authors sought to minimize the squared residual error:

      $$\min_{O}\ \left\lVert B - X O X^{\top} \right\rVert_F^{2}$$

      with regularization on O.

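      Adopting the notation of our bilinear model paper, and using Z for the connectivity matrix (since B denotes the postsynaptic transformation in our model), the above becomes:

      $$\min_{O}\ \left\lVert Z - X O X^{\top} \right\rVert_F^{2}$$
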
      Returning to the bilinear model formulation, the optimization problem for the C. elegans dataset, where the connectivity and gene expression of individual neurons are both accessible, takes the form:

      $$\min_{A,\,B}\ \left\lVert Z - X A B^{\top} Y^{\top} \right\rVert_F^{2}$$

      where we treat each neuron as a distinct neuronal type. In addition, we extend X and Y to cover the entire set of neurons in C. elegans, so that $X = Y \in \mathbb{R}^{n \times p}$, where n is the total number of neurons and p the number of genes. Accordingly, the optimization problem becomes:

      $$\min_{A,\,B}\ \left\lVert Z - X A B^{\top} X^{\top} \right\rVert_F^{2}$$

      Comparing this objective with that of Kovács et al. shows that our approach corresponds to setting $O = AB^{\top}$, which amounts to a low-rank decomposition of the genetic rule operator O (its rank is at most the number of latent dimensions). Beyond mathematical convenience, this decomposition offers several substantial benefits, reminiscent of those of collaborative filtering in recommendation systems:

      • Computational Efficiency: The primary advantage of this approach is computational efficiency. Solving for $O \in \mathbb{R}^{p \times p}$ requires determining $p^2$ entries, whereas solving for $A \in \mathbb{R}^{p \times d}$ and $B \in \mathbb{R}^{p \times d}$ requires only $2pd$ entries, where p is the number of genes and d the number of latent dimensions. Assuming there exists a lower-dimensional latent space ($d \ll p$) that captures the essential variability in connectivity, resolving A and B is markedly more efficient than resolving O. In addition, from a computational system design perspective, inferring the connectivity of a neuron allows caching the latent embeddings of presynaptic neurons, XA, or postsynaptic neurons, XB, with a space complexity of $\mathcal{O}(nd)$. This is far more space-efficient than caching XO or $OX^{\top}$, which has a space complexity of $\mathcal{O}(np)$, a difference that matters when dealing with large numbers of neurons, such as those in the entire mouse brain. The bilinear approach thus simplifies the optimization problem and reduces the computational load, making the model more scalable and faster to run on large datasets.

      • Interpretability: Separating A (presynaptic features) from B (postsynaptic features) clarifies the distinct roles of pre- and postsynaptic neurons in forming a connection. By projecting pre- and postsynaptic neurons into a shared latent space through XA and YB, one can identify meaningful representations along each axis, as exemplified by the different motifs in the mouse retina dataset. Because A and B are linear, each gene’s contribution to a latent dimension can be evaluated directly. This interpretability, which offers insights into the genetic factors influencing synaptic connections, goes beyond what O alone could provide.

      • Flexibility and Adaptability: The bilinear model is also adaptable. Much like collaborative filtering, which can handle very different user and item features, our bilinear model can be applied to synaptic partners whose genetic data come from different sources. One potential application is deciphering the genetic correlates of long-range projectomic rules, where pre- and postsynaptic neurons are processed and sequenced separately, or where the postsynaptic targets are brain regions whose genetic information is acquired through bulk sequencing. This flexibility also allows the model to be adjusted or extended to incorporate other biological factors, such as proteomics, broadening its utility for research into the determinants of neuronal connectivity.
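
      The equivalence $O = AB^{\top}$ and the parameter and caching counts above can be checked numerically. The NumPy sketch below uses random matrices with hypothetical sizes (n = 50 neurons, p = 200 genes, d = 2 latent dimensions), and the helper name `compare_forms` is ours; it illustrates the algebra only and does not fit either model to data.

```python
import numpy as np

def compare_forms(n=50, p=200, d=2, seed=0):
    """Check that the bilinear model equals the operator model with O = A @ B.T.

    n, p, d are hypothetical numbers of neurons, genes and latent dimensions.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((n, p))             # expression, one row per neuron (X = Y)
    A = rng.standard_normal((p, d))    # presynaptic gene-to-latent weights
    B = rng.standard_normal((p, d))    # postsynaptic gene-to-latent weights
    O = A @ B.T                        # implied genetic rule operator, rank <= d

    pre_emb, post_emb = X @ A, X @ B   # cacheable latent embeddings, each (n, d)
    Z_bilinear = pre_emb @ post_emb.T  # connectivity predicted in latent space
    Z_operator = X @ O @ X.T           # the same prediction in operator form

    return {
        "same_prediction": bool(np.allclose(Z_bilinear, Z_operator)),
        "params_operator": O.size,            # p**2 entries
        "params_bilinear": A.size + B.size,   # 2*p*d entries
        "rank_O": int(np.linalg.matrix_rank(O)),
        "cache_bilinear": pre_emb.shape,      # (n, d)
        "cache_operator": (X @ O).shape,      # (n, p)
    }

print(compare_forms())
```

      With d much smaller than p, the factorized form stores far fewer numbers while producing identical predictions, which is the efficiency argument in the first bullet.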

      In the study by Taylor et al. [3], the authors introduced a generalization of differential gene expression (DGE) analysis, called network DGE (nDGE), to identify genetic determinants of synaptic connections. It identifies genes co-expressed across pairs of connected neurons, compared with pairs without a connection.

      As the authors acknowledge in the Methods section of their paper, nDGE can only examine single genes co-expressed at synaptic terminals: "While the nDGE technique introduced here is a generalization of standard DGE, interrogating the contribution of pairs of genes in the formation and maintenance of synapses between pairs of neurons, nDGE can only account for a single co-expressed gene in either of the two synaptic terminals (pre/post)."

      In contrast, the bilinear model seeks linear combinations of gene expression in both pre- and postsynaptic neurons, with different weights for the pre- and postsynaptic sides. It therefore goes beyond examining individual co-expressed genes: it can capture not only homogeneous interactions (the same gene in both partners) but also complex, heterogeneous genetic interactions that are pivotal in synaptic connectivity.
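
      To make the contrast concrete, the bilinear score for a presynaptic neuron i and a postsynaptic neuron j can be expanded over gene pairs. This is an illustrative derivation assuming both partners are profiled over the same p genes (as in the C. elegans setting with X = Y), with $\mathbf{x}_i$ and $\mathbf{y}_j$ the corresponding rows of X and Y; it is our reading, not a formula from Taylor et al.

```latex
\hat{z}_{ij} = \mathbf{x}_i A B^{\top} \mathbf{y}_j^{\top}
  = \sum_{g=1}^{p}\sum_{h=1}^{p} x_{ig}\,(AB^{\top})_{gh}\,y_{jh}
  = \underbrace{\sum_{g=1}^{p} x_{ig}\,(AB^{\top})_{gg}\,y_{jg}}_{\text{same gene in both partners}}
  + \underbrace{\sum_{g \neq h} x_{ig}\,(AB^{\top})_{gh}\,y_{jh}}_{\text{different genes in the two partners}}
```

      A single-gene co-expression test looks only at products such as $x_{ig}\,y_{jg}$, i.e. the diagonal terms; the off-diagonal terms, each weighted by $(AB^{\top})_{gh}$, are where heterogeneous pre/post gene pairs enter.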

      Appraisal of whether the author achieved their aims, and whether results support their conclusions: The author achieved their aims by recapitulating key connectivity motifs from single-cell gene expression data in the mouse retina. Furthermore, the model setup allowed for insight into gene signatures and interactions, but could have benefited from a deeper evaluation of the accuracy of these signatures. The author claims the method sets a new benchmark for single-cell transcriptomic analysis of synaptic connections. This should be more rigorously proven. (I’m not sure I can speak on the novelty of the method.)

      I value your appraisal. In response, additional validation of the bilinear model on a second dataset will be undertaken.

      Discussion of the likely impact of the work on the field, and the utility of methods and data to the community: This study provides an understandable bilinear model for decoding the genetic programming of neuronal type connectivity. The proposed model leaves the door open for further testing and comparison with alternative linear and/or non-linear models, such as neural network-based models. In addition to more complex models, this model can be built on to include higher resolution data such as more gene expression dimensions, different types of connectivity measures, and additional omics data.

      Thank you for your positive assessment of the potential impact of the study.

      Response to Reviewer 2:

      Summary: In this study, Mu Qiao employs a bilinear modeling approach, commonly utilized in recommendation systems, to explore the intricate neural connections between different pre- and post-synaptic neuronal types. This approach involves projecting single-cell transcriptomic datasets of pre- and post-synaptic neuronal types into a latent space through transformation matrices. Subsequently, the cross-correlation between these projected latent spaces is employed to estimate neuronal connectivity. To facilitate the model training, connectomic data is used to estimate the ground-truth connectivity map. This work introduces a promising model for the exploration of neuronal connectivity and its associated molecular determinants. However, it is important to note that the current model has only been tested with Bipolar Cell and Retinal Ganglion Cell data, and its applicability in more general neuronal connectivity scenarios remains to be demonstrated.

      Strengths: This study introduces a succinct yet promising computational model for investigating connections between neuronal types. The model, while straightforward, effectively integrates single-cell transcriptomic and connectomic data to produce a reasonably accurate connectivity map, particularly within the context of retinal connectivity. Furthermore, it successfully recapitulates connectivity patterns and helps uncover the genetic factors that underlie these connections.

      Thank you for your positive assessment of the paper.

      Weaknesses:

      1. The study lacks experimental validation of the model’s prediction results.

      Thank you for pointing out the importance of experimental validation. I acknowledge that the current version of the study is focused on the development and validation of the computational model, using the datasets presently available to us. Moving forward, I plan to collaborate with experimental neurobiologists. These collaborations are aimed at validating our model’s predictions, including the delta-protocadherins mentioned in the paper. However, considering the extensive time and resources required for conducting and interpreting experimental results, I believe it is more pragmatic to present a comprehensive experimental study, including the design and execution of experiments informed by the model’s predictions, in a separate follow-up paper. I intend to include a paragraph in the discussion of this paper outlining the future direction for experimental validation.

      2. The model’s applicability in other neuronal connectivity settings has not been thoroughly explored.

      I recognize the importance of assessing the model across different neuronal systems. In response to similar feedback from Reviewer 1, I am keen to extend the study to include the C. elegans dataset mentioned earlier. The results of applying our bilinear model to this second dataset will be incorporated into the revised manuscript.

      3. The proposed method relies on the availability of neuronal connectomic data for model training, which may be limited or absent in certain brain connectivity settings.

      The concern regarding the dependency of our model on the availability of connectomic data is valid. While complete connectomes are available for organisms like C. elegans and Drosophila, and efforts are underway to map the connectome of the entire mouse brain, such data may not always be accessible for all research contexts. Recognizing this limitation, part of the ongoing research is to explore ways to adapt our model to the available data, such as projectomic data. Furthermore, our bilinear model is compatible with trans-synaptic virus-based sequencing techniques [4, 5], allowing us to leverage data from these experimental approaches to uncover the genetic underpinnings of neuronal connectivity. These initiatives are crucial steps towards broadening the applicability of our model, ensuring its relevance and usefulness in brain connectivity studies where detailed connectomic data may not be readily available.

      References

      [1] Dániel L. Barabási and Albert-László Barabási. A genetic model of the connectome. Neuron, 105(3):435–445, 2020.

      [2] István A. Kovács, Dániel L. Barabási, and Albert-László Barabási. Uncovering the genetic blueprint of the C. elegans nervous system. Proceedings of the National Academy of Sciences, 117(52):33570–33577, 2020.

      [3] Seth R. Taylor, Gabriel Santpere, Alexis Weinreb, Alec Barrett, Molly B. Reilly, Chuan Xu, Erdem Varol, Panos Oikonomou, Lori Glenwinkel, Rebecca McWhirter, Abigail Poff, Manasa Basavaraju, Ibnul Rafi, Eviatar Yemini, Steven J. Cook, Alexander Abrams, Berta Vidal, Cyril Cros, Saeed Tavazoie, Nenad Sestan, Marc Hammarlund, Oliver Hobert, and David M. Miller 3rd. Molecular topography of an entire nervous system. Cell, 184(16):4329–4347, 2021.

      [4] Nicole Y. Tsai, Fei Wang, Kenichi Toma, Chen Yin, Jun Takatoh, Emily L. Pai, Kongyan Wu, Angela C. Matcham, Luping Yin, Eric J. Dang, Denise K. Marciano, John L. Rubenstein, Fan Wang, Erik M. Ullian, and Xin Duan. Trans-seq maps a selective mammalian retinotectal synapse instructed by nephronectin. Nat Neurosci, 25(5):659–674, May 2022.

      [5] Aixin Zhang, Lei Jin, Shenqin Yao, Makoto Matsuyama, Cindy van Velthoven, Heather Sullivan, Na Sun, Manolis Kellis, Bosiljka Tasic, Ian R. Wickersham, and Xiaoyin Chen. Rabies virus-based barcoded neuroanatomy resolved by single-cell RNA and in situ sequencing. bioRxiv, 2023.

    1. Author response:

      Reviewer #1:

      The only minor weakness that I found is the assumption of independence of bacterial species, which is expressed as the well-stirred approximation. One could imagine that bacterial species might cooperate, leading to non-uniform distributions that are real. How can such situations be distinguished? I believe that this method can be extended to determine whether this is the case before the application. For example, if the bacterial species are independent of each other and one can use binomial distributions, then the Fano factor would be proportional to the overall relative fraction of bacterial species. Maybe a simple test can be added to check this before applying REPOP. However, I believe that this is a minor issue.

      This is an interesting point raised by the reviewer.

      First, we need to clarify an important point: we do not make a well-stirred assumption. Samples can be drawn and plated from any region of space, however small, and that region’s population can be quantified using our method. Stirring occurs only after a sample is collected, in order to dilute its contents and spread the solution homogeneously over the plate.

      As such, learning multiple independent species is possible and is not affected by the dilution (the “well-stirred” assumption). In the revised manuscript, we will make clear that this assumption concerns only the dilution process. Any correlation between species arises in the initial sample and is retained in the plating. Given a sample, the dilution itself produces independent binomial draws from the population at the point in space where the sample was harvested. REPOP is designed to recover the true underlying heterogeneity in species abundance (even from limited data) by leveraging a Bayesian framework that remains valid regardless of whether species are independent or correlated.
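
      The claim that dilution preserves correlations already present in a sample can be illustrated with a small simulation. This is a hedged NumPy sketch, not the synthetic example the authors plan to add: it assumes a shared lognormal "load" factor to make two species correlated within each sample, and models dilution as independent binomial thinning with keep probability 1/D. All parameter values and the function name `simulate_plating` are hypothetical.

```python
import numpy as np

def simulate_plating(n_samples=2000, dilution_factor=100, seed=1):
    """Correlated two-species samples followed by independent binomial dilution.

    All parameters are hypothetical. A shared lognormal 'load' per sample makes
    species 1 and 2 correlated *before* any dilution happens.
    """
    rng = np.random.default_rng(seed)
    load = rng.lognormal(mean=0.0, sigma=0.5, size=n_samples)  # shared per-sample factor
    n1 = rng.poisson(2e4 * load)   # species 1 population in each sample
    n2 = rng.poisson(1e4 * load)   # species 2 population in each sample

    # Dilution (assumed): each cell independently reaches the plate with prob. 1/D
    p_keep = 1.0 / dilution_factor
    k1 = rng.binomial(n1, p_keep)  # colony counts, species 1
    k2 = rng.binomial(n2, p_keep)  # colony counts, species 2

    corr_sample = np.corrcoef(n1, n2)[0, 1]
    corr_plate = np.corrcoef(k1, k2)[0, 1]
    return corr_sample, corr_plate

corr_sample, corr_plate = simulate_plating()
print(f"correlation in samples: {corr_sample:.3f}, on plates: {corr_plate:.3f}")
```

      Under these settings the correlation between plated counts remains strongly positive, only slightly attenuated by binomial sampling noise: the thinning step neither creates nor removes the shared structure.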

      If the method is applied to multiple species as is, REPOP can recover the marginal distribution of each species, either from separate plates if the species are selectively cultured, or from the same plate if their colonies are sufficiently distinct. To demonstrate this, we will add to the manuscript a synthetic example with two species whose populations within a sample are correlated.

      However, learning the joint distribution, and thereby capturing correlations between species within samples, would require extending the method. At present, in Eq. 5 we sum the likelihood over all values of n, using a data-driven cutoff (twice the naïvely estimated count times the dilution factor). Extending this to two species would require summing over all pairs (n1, n2) and, to retain the generality of the method, memory that scales quadratically with this cutoff. For this reason, although we will comment on this extension in the next version of the manuscript, we will not implement it as part of REPOP.
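
      To make the scaling argument concrete, here is a hedged NumPy sketch of a truncated marginal likelihood for one species, assuming a binomial dilution likelihood Binomial(k | n, 1/D) and a user-supplied prior over n. It is not REPOP's Eq. 5 or its API; all function names are hypothetical. The last helper counts how many population values must be summed over for one species versus a joint (n1, n2) grid, which is where the quadratic memory growth comes from.

```python
import numpy as np

def log_factorials(n_max):
    """log(n!) for n = 0..n_max, via a cumulative sum of logs."""
    return np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, n_max + 1)))))

def truncated_marginal_likelihood(k, dilution_factor, prior):
    """sum_n Binomial(k | n, 1/D) * prior[n] for n = 0..len(prior)-1.

    `prior` is a hypothetical probability vector over the undiluted population n;
    its length sets the truncation cutoff. Assumes dilution_factor > 1.
    """
    prior = np.asarray(prior, dtype=float)
    n = np.arange(len(prior))
    valid = n >= k                      # Binomial(k | n) is zero when n < k
    if not valid.any():
        return 0.0
    p = 1.0 / dilution_factor
    lf = log_factorials(len(prior) - 1)
    nv = n[valid]
    log_pmf = (lf[nv] - lf[k] - lf[nv - k]
               + k * np.log(p) + (nv - k) * np.log1p(-p))
    return float(np.sum(np.exp(log_pmf) * prior[valid]))

def cutoff(k, dilution_factor):
    """Twice the naive estimate k * D (k = 0 treated as 1: our assumption)."""
    return 2 * max(k, 1) * dilution_factor

def grid_sizes(k1, k2, dilution_factor):
    """Values summed over for one species vs. for the joint (n1, n2) grid."""
    n1_max = cutoff(k1, dilution_factor)
    n2_max = cutoff(k2, dilution_factor)
    return n1_max + 1, (n1_max + 1) * (n2_max + 1)

# Example with a flat (hypothetical) prior up to the cutoff
k, D = 20, 100
N = cutoff(k, D)
flat = np.full(N + 1, 1.0 / (N + 1))
print(truncated_marginal_likelihood(k, D, flat))
print(grid_sizes(20, 30, D))
```

      The single-species sum touches a few thousand values, while the joint grid touches tens of millions, which illustrates why a joint extension is costly.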

      Reviewer #2:

      A more thorough discussion of when and by how much estimated microbial population abundance distributions differ from the ground truth would be helpful in determining the best practices for applying this method. Not only would this allow researchers to understand the sampling effort necessary to achieve the results presented here, but it would also contextualize the experimental results presented in the paper. Particularly, there is a disconnect between the discussion of the large sample sizes necessary to achieve accurate multimodal distribution estimates and the small sample sizes used in both experiments.

      That is a great suggestion from the reviewer. To address it, we will expand Appendix B, which currently presents the relative error between the means for the experimental results in Fig. 3, to also include a comparable evaluation for the synthetic data example in Fig. 2.

      Specifically, for each example, we will report (1) the relative error in the estimated means (as already done for Fig. 3), and (2) the Kullback-Leibler (KL) divergence between the reconstructed and ground truth distributions. These metrics will be shown as a function of the size of the dataset, enabling a direct assessment of how the sampling effort affects the precision of the inference.
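
      The two metrics can be computed as follows for distributions discretized on a shared support. This is a minimal NumPy sketch; the function names are hypothetical, and the KL direction, KL(ground truth ‖ reconstruction), is our choice since the text does not specify one.

```python
import numpy as np

def relative_error_of_mean(p_est, p_true, support):
    """|mean_est - mean_true| / |mean_true| for distributions on a shared support."""
    support = np.asarray(support, dtype=float)
    m_est = np.dot(support, np.asarray(p_est, dtype=float))
    m_true = np.dot(support, np.asarray(p_true, dtype=float))
    return float(abs(m_est - m_true) / abs(m_true))

def kl_divergence(p_true, p_est, eps=1e-12):
    """KL(p_true || p_est) in nats; eps guards against log(0) in the estimate."""
    p_true = np.asarray(p_true, dtype=float)
    p_est = np.clip(np.asarray(p_est, dtype=float), eps, None)
    mask = p_true > 0                   # terms with p_true = 0 contribute nothing
    return float(np.sum(p_true[mask] * np.log(p_true[mask] / p_est[mask])))

support = [0, 1]
truth = [0.5, 0.5]
estimate = [0.25, 0.75]
print(relative_error_of_mean(estimate, truth, support))  # 0.5
print(kl_divergence(truth, estimate))                    # 0.5 * ln(4/3), about 0.1438
```

      Reporting both values as a function of dataset size separates bias in the mean from errors in the shape of the distribution, such as a missed mode.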

      That said, we emphasize that by explicitly modeling the dilution process within a Bayesian framework, REPOP extracts the mathematically optimal amount of information from each individual sample, regardless of sample size. Our strategy therefore yields better inference from fewer measurements, which is particularly important in applications such as plate counting, where data acquisition is labor-intensive.

      Reviewer #3:

      While the study is promising, there are a few areas where the paper could be strengthened to increase its impact and usability. First, the extent to which dilution and plating introduce noise is not fully explored. Could this noise significantly affect experimental conclusions? And under what conditions does it matter most? Does it depend on experimental design or specific parameter values? Clarifying this would help readers appreciate when and why REPOP should be used.

      We agree with the reviewer that this is an important point. We will expand Appendix B to include a quantitative analysis of the simulated data (Fig. 2), reporting both the relative error and the KL divergence as a function of dataset size. This is the same analysis described in our response to Reviewer #2, and it clarifies when REPOP offers the greatest benefit.

      In addition, we will expand the discussion of why modeling dilution noise becomes essential when learning population dynamics. In particular, we will emphasize the role of Model 3, which is especially relevant when working with multiple plates and approaching the asymptotic regime, an aspect alluded to in Fig. 3 but not fully explored.

      Second, more practical details about the tool itself would be very helpful. Simply stating that it is available on GitHub may not be enough. Readers will want to know what programming language it uses, what the input data should look like, and ideally, see a step-by-step diagram of the workflow. Packaging the tool as an easy-to-use resource, perhaps even submitting it to CRAN or including example scripts, would go a long way, especially since microbiologists tend to favor user-friendly, recipe-like solutions.

      We will update the introduction to emphasize that REPOP is written in Python (PyTorch), is installable via pip, and is designed for ease of use. We are also expanding the tutorials to include clearer guidance on data formatting and common workflows. Author response image 1 will be added to the revised manuscript to better illustrate the full application process.

      Author response image 1.

      Third, it would be great to see the method tested on existing datasets, such as those from Nic Vega and Jeff Gore (2017), which explore how colonization frequency impacts abundance fluctuation distributions. Even if the general conclusions remain unchanged, showing that REPOP can better match observed patterns would strengthen the paper’s real-world relevance.

      That is a great suggestion from the reviewer. We will demonstrate the application of REPOP to datasets such as that of Vega and Gore (Ref. 27 in the manuscript), as well as other publicly available datasets, in the revised version.

      Lastly, it would be helpful for the authors to briefly discuss the limitations of their method, as no approach is without its constraints. Acknowledging these would provide a more balanced and transparent perspective.

      We agree with the reviewer. A new subsection will explicitly address the assumptions of our method, and therefore its limitations, including assumptions about species classification, the computational cost of joint inference, and the dependence on accurate dilution modeling. This discussion will synthesize points raised throughout our responses to all reviewers.

    1. Author response:

      Reviewer #1 (Public review):

      Strengths:

      The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigating the specific relationship between pu.1 and tp53. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.

      We agree with the reviewer’s assessment of the significance of the relationship between PU.1 and TP53. A previous study by Tschan et al. (1) showed that PU.1 attenuates the transcriptional activity of the p53 tumor suppressor family by binding directly to the DNA-binding and/or oligomerization domains of the p53/p73 proteins. We will discuss this point in the revised manuscript and cite this paper accordingly. Moreover, to further investigate the interaction between Pu.1 and Tp53 in zebrafish, we intend to perform a comprehensive analysis of the tp53 promoter region using bioinformatic prediction tools. This analysis aims to identify potential Pu.1 binding sites, providing insight into whether Pu.1 directly regulates the tp53 promoter in zebrafish.
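
      The simplest form of such a binding-site scan is an exact-match search for a short consensus on both DNA strands. This hedged Python sketch only illustrates the idea: the default motif "GAGGAA" (the purine-rich PU-box core often cited for PU.1) is our assumption, not a result from the manuscript, and dedicated tools instead score position weight matrices (for example from the JASPAR database). No zebrafish tp53 promoter sequence is included; the test string is made up, and the function names are hypothetical.

```python
import re

# Complement table for uppercase DNA (N stays N)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif_sites(seq, motif="GAGGAA"):
    """Exact-match motif hits on both strands as (0-based forward position, strand).

    The default motif is an illustrative assumption, not a validated PU.1 model.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    # Zero-width lookahead so that overlapping hits are all reported
    hits = [(m.start(), "+") for m in re.finditer(f"(?={motif})", seq)]
    if rc != motif:  # avoid double-counting reverse-palindromic motifs
        hits += [(m.start(), "-") for m in re.finditer(f"(?={rc})", seq)]
    return sorted(hits)

print(find_motif_sites("ttGAGGAACCTTCCTCaa"))  # [(2, '+'), (10, '-')]
```

      A real analysis would also weigh mismatches and conservation, which a fixed string cannot capture.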

      Reviewer #2 (Public review):

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data were analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl34b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).

      In the revised manuscript, we will elaborate on the methodological details of the RNA-seq analysis. Because it is technically challenging to unambiguously distinguish microglia from dendritic cells (DCs) in brain cell suspensions, we isolated 3-5 cells per pool and quantified the relative expression of the microglia-specific marker ccl34b.1, normalized to the DC-specific marker ccl19a.1. This approach aimed to reduce DC contamination in downstream analyses. Across all experimental groups subjected to RNA-seq analysis, the ccl34b.1/ccl19a.1 expression ratios exceeded 5, confirming microglia as the dominant cell population. Nonetheless, residual DC contamination in the RNA-seq data cannot be entirely ruled out, and we will explicitly acknowledge this technical constraint in the revised manuscript to ensure methodological transparency.

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/Δ839 mutants treated with 4-OHT should be added to show that this effect indeed depends on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants, as there seems to be a difference in the requirement for pu.1 at embryonic and adult stages.

      We apologize for omitting data on conditional pu.1 knockout alone in embryos, which may have caused ambiguity. We would like to clarify that conditional pu.1 knockout alone at the embryonic stage does not induce microglial death (Author response image 1). Microglial death occurs only when Pu.1 is disrupted in the spi-b mutant background, in both embryonic and adult brains. The blebbing morphology of some microglia after conditional pu.1 knockout in adult spi-b mutants indicates that microglia undergo apoptosis at both the embryonic (Figure S4) and adult (Author response image 2) stages. The reviewer’s concern likely arises from the distinct outcomes of global pu.1 knockout (Figure 2) versus conditional pu.1 ablation: global knockout eliminates microglia during early development because of Pu.1’s essential role in myeloid lineage specification. We plan to include this clarification in the revised manuscript.

      Author response image 1.

      Conditional depletion of Pu.1 in embryonic microglia had no effect on their short-term survival. (A) Schematic of 4-OHT treatment of pu.1^KI/WT Tg(coro1a:CreER) and pu.1^KI/Δ839 Tg(coro1a:CreER) embryos. (B) Representative images of DsRed+ microglia in pu.1^KI/WT and pu.1^KI/Δ839 at 5 dpf. (C) Quantification of DsRed+ microglia in pu.1^KI/WT and pu.1^KI/Δ839 at 3 dpf and 5 dpf. Values represent means ± SD; n.s., P > 0.05.

      Author response image 2. Simultaneous inactivation of Pu.1 and Spi-b leads to microglial death in adult zebrafish. (A) Experimental setup for pu.1 conditional knockout in adult spi-b^Δ232/Δ232 mutants. (B) Representative images of midbrain cross sections of adult pu.1^KI/+;spi-b^Δ232/Δ232;Tg(coro1a:CreER) and pu.1^KI/WT;spi-b^Δ232/Δ232;Tg(coro1a:CreER) fish at 2 dpi. The white arrow indicates microglia with blebbing morphology.

      (3) The number of microglia after pu.1 knockout in zebrafish only showed a significant decrease 3 months after 4-OHT injection, whereas microglia were almost completely depleted 7 days after injection in mice. This major difference is not discussed in the paper.

      We propose that zebrafish Pu.1 and Spi-b function cooperatively to regulate microglial maintenance, analogous to the role of PU.1 alone in mice. This cooperative mechanism likely explains the observed difference in microglial depletion kinetics between zebrafish and mice following pu.1 conditional knockout. Specifically, the compensatory activity of Spi-b in zebrafish may buffer the immediate loss of Pu.1, whereas in mice, the absence of SPI-B expression in microglia eliminates this redundancy, resulting in rapid microglial depletion. Furthermore, during evolution, SPI-B appears to have acquired lineage-specific roles and is no longer expressed in microglia. We will expand on this evolutionary divergence and its implications for microglial regulation in the revised manuscript.

      (4) Data are represented as mean ± SEM. Instead of SEM, the standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. The figure legends should also state whether SEM or SD is shown.

      We plan to represent our data as mean ± SD in the revised manuscript.
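      For reference, the two summaries differ only by a factor of √n: SEM = SD/√n, so SEM shrinks as the sample size grows, whereas SD describes the spread of the measurements themselves. A minimal Python sketch using only the standard library; the counts below are hypothetical and not taken from the study.

```python
import math
import statistics


def mean_sd_sem(values):
    """Return (mean, sample SD, SEM) for a list of measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    sem = sd / math.sqrt(n)        # SEM = SD / sqrt(n); shrinks as n grows
    return mean, sd, sem


# Hypothetical microglia counts, one per animal (illustrative only, not study data)
counts = [42, 38, 45, 40, 35, 44]
m, sd, sem = mean_sd_sem(counts)
print(f"mean ± SD  = {m:.1f} ± {sd:.1f}")
print(f"mean ± SEM = {m:.1f} ± {sem:.1f}")
```

      With six animals the SEM bar is already less than half the SD bar, which is why SD better conveys variability when individual points are not plotted.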

      Reference:

      (1) Tschan MP, Reddy VA, Ress A, Arvidsson G, Fey MF, Torbett BE. PU.1 binding to the p53 family of tumor suppressors impairs their transcriptional activity. Oncogene. 2008 May 29;27(24):3489-93.

    1. Author response:

      eLife assessment

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete, as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decisions, which would require further controls (e.g., behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control.

      We thank the editors and reviewers for the constructive comments. In light of the questions raised by the reviewers, we plan to perform additional analyses in our revision, particularly to address issues related to single-cell scalability and the effects of motivation and movement. We believe these additional data will greatly improve the rigor and clarity of our study. We are grateful for the eLife review process.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis of incorrect trials suggests that the shorter, non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity.

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis of the activation profiles used to separate the cells into three groups was effective, and the functional dissociation between beginning, end, and duration cells was revealing. The analysis of the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals.

      We thank the reviewer for the positive comments.

      Weaknesses: 

      It is not clear whether the animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons, which is probably not enough to obtain robust neural sequences for the trial-by-trial analysis and the correct versus incorrect trial analysis.

      We thank the reviewer for the comment. We would like to mention that the 18 cells plotted in Supplementary figure 1 were only from the duration cell category. To improve the clarity of our results, we are going to provide information regarding the number of cells from each rat in our revision. In general, we imaged more than 50 cells from each rat. We would also like to point to the data from individual trials in Supplementary figure 1B showing robust sequentiality.

      In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis on the speed, and the geometry of neural population trajectories for correct and incorrect trials.

      We thank the reviewer for the suggestions. We are going to conduct the analysis as the reviewer recommended. We agree with the reviewer that better presentation of the neural activity will be helpful for the readers.

      In Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single-cell activity, or to compute the center of mass, to test the type III errors.

      We would like to mention that the prediction errors plotted in this graph were calculated from two types of trials. The correct trials tended to show positive time estimation errors, while the incorrect trials showed negative time estimation errors. We believe that the polarity switch between these two trial types suggests a possible use of this neural mechanism to time the rats' actions.

      In addition, we are going to perform the analysis suggested by the reviewer in our revision. We agree that different ways of analyzing the data would provide better characterization of the scaling effect.
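      One way to read the reviewer's center-of-mass suggestion, sketched in Python with NumPy under stated assumptions: for each duration cell, compute the activity-weighted mean time within the nose poke and divide it by that trial's duration. If activity rescales with the produced interval, the scaled center of mass should stay roughly constant across durations while the absolute value shifts. The frame interval, the Gaussian-shaped synthetic trace, and all function names are hypothetical illustrations, not the authors' analysis pipeline.

```python
import numpy as np


def center_of_mass(trace, times):
    """Activity-weighted mean time of a calcium trace (negative values clipped to 0)."""
    weights = np.clip(np.asarray(trace, dtype=float), 0.0, None)
    return float(np.sum(weights * times) / np.sum(weights))


def scaled_center_of_mass(trace, duration, dt):
    """Center of mass expressed as a fraction of the trial's nose-poke duration."""
    times = np.arange(len(trace)) * dt
    return center_of_mass(trace, times) / duration


def synthetic_trace(duration, dt, peak_fraction=0.4, width_fraction=0.1):
    """Hypothetical duration cell whose Gaussian bump scales with trial length."""
    times = np.arange(int(round(duration / dt)) + 1) * dt
    center, width = peak_fraction * duration, width_fraction * duration
    return np.exp(-0.5 * ((times - center) / width) ** 2)


dt = 0.1  # assumed frame interval in seconds (hypothetical 10 Hz imaging)
for duration in (1.5, 3.0):
    trace = synthetic_trace(duration, dt)
    absolute = center_of_mass(trace, np.arange(len(trace)) * dt)
    relative = scaled_center_of_mass(trace, duration, dt)
    print(f"duration {duration:.1f} s: COM {absolute:.2f} s, scaled COM {relative:.2f}")
```

      In this idealized case the absolute center of mass doubles with the trial duration while the scaled value stays near 0.4; applied to real trials, a drift in the scaled value between correct and incorrect trials would indicate imperfect scaling.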

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions." 

      We agree with the reviewer and we have mentioned this caveat in our original manuscript. We are going to rephrase the sentence as the reviewer suggested during our revision.

      Reviewer #2 (Public Review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nosepoking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions.

      This study aligns with a wide range of studies in systems neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain their rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not, however, intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that, as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading a neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do).
      While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue, which casts doubt on most of the conclusions of the study, there are also several major statistical issues.

      Main Concerns 

      (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to obtain a drop of water delivered a few centimeters below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task mainly probes the animals' ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require precise time estimation because there appear to be no penalties when nosepokes are too short or too long. It is therefore unclear whether the variation in nosepoke duration reflects motivational changes rather than changes in time estimation. That this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data fail to support the authors' claim about what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check whether water is available).

      We would like to respond to the reviewer’s comments 1, 2, and 4 together, since they all focus on the same issue. We thank the reviewer for the very thoughtful comments and for sharing his detailed reasoning from a recently published review (Robbe, 2023). Much of the discussion goes beyond the scope of this study, and we agree that whether there is an explicit representation of time (an internal clock) in the brain is a difficult question to answer, particularly using animal behavior. In fact, even with fully conscious humans and elaborate task designs, we think it is still questionable whether the neural substrate of “timing” can be clearly dissociated from that of “motor” processes. In the end, it may well be that, as the reviewer cited from Bergson’s article, the experience of time cannot be measured.

      Studying the neural representation of any internal state may suffer from the same ambiguity. With all due respect, however, we would like to limit our response to the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist: (1) duration cells may represent fidgeting or orofacial movements, and (2) duration cells may represent the motivation or motion plan of the rats. To test the first alternative interpretation, we will perform a more comprehensive analysis of the behavioral data for all limbs and visible body parts of the rat during the nose poke and analyze its periodicity across trials, although orofacial movements may not be visible to us.

      Regarding the second alternative interpretation, we think our data in the original Figure 4G argue against it. In this graph, we plotted the decoding error of time using the duration cells’ activity against the actual duration of the trials. If the sequential activity of duration cells represented only motivation, the errors should be distributed evenly across trial durations or be linearly modulated by trial duration. The unimodal distribution we observed (Figure 4G; see Author response image 1 below for a re-plot without signs) suggests that the scaling factor of the sequential activity carries information related to time. The fact that this unimodal distribution is centered at the time threshold of the task provides strong evidence for the active use of the scaling factor for time estimation. To further test the relationship to motivation, we will measure the interval from exiting the nose poke to the start of licking the water reward as an independent measure of motivation on each trial. We will analyze and report whether this measure correlates with the nose-poke durations in our data in the revision.

      Author response image 1.

      Furthermore, whether the scaling sequential activity we report represents behavioral timing or true time estimation, we believe the reviewer would agree that these activities correlate with the animal’s nose-poke durations, and a previous study has shown that PFC silencing disrupts the mouse’s timing behavior (PMID: 24367075). The main surprising finding of the paper is that these duration cells differ from the start and end cells in their coding stability. Thus, future studies dissecting the anatomical microcircuit of these duration cells may provide further clues as to whether they receive inputs from thirst- or reward-related brain regions. This may help partially resolve the “time” vs. “motor” debate the reviewer mentioned.

      (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like robots that initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the recordings reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the end of the movement from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose poke.

      (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for each rat, as there is likely interindividual variability in the impulsivity of the animals.

      We thank the reviewer for the comment, yet we are not quite sure what specifically was asked. There is undoubtedly variance among individual animals. One of the core reasons for statistical comparison is to compare the group difference with the variance due to sampling. It appears that the reviewer would like us to conduct our analysis for each rat individually. We will conduct and report analyses for individual rats in Figure 1C, Figure 2C, G, K, and Figure 4F in our revised manuscript.

      (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke. 

      (5) The authors should mention upfront in the main text (Results section) the temporal resolution allowed by their Ca2+ probe and discuss whether it is fast enough with regard to the behavioral dynamics occurring in the task.

      We thank the reviewer for the suggestion. We originally mentioned the caveat of calcium imaging in the interpretation of our results and will expand on it during our revision. In terms of behavioral dynamics (the start and end of the nose poke in this case), we think calcium imaging provides sufficient kinetics. However, the more refined dynamics related to the reproducibility of the sequential activity or the precise representation of individual cells on the scaled duration may benefit from improved time resolution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please refer explicitly to the three types of cells in the abstract. 

      We will modify the abstract as suggested during revision.

      (2) Please refer to the work of Betancourt et al., 2023 Cell Reports, where a trial-by-trial analysis of the correlation between neural trajectory dynamics in MPC and timing behavior is reported. The same paper reports the stability of neural sequences across task parameters.

      We will cite and discuss this study in our revised paper.

      (3) Please state the number of studied animals at the beginning of the results section. 

      We will provide this information as requested. The number of animals was also plotted in Figure 1D for each analysis.

      (4) Why do the middle and right panels of Figure 2E show duration cells?

      Figure 2E was intended to show examples of duration cells’ activity. We included examples of cells that peak at different points in the scaled duration. We believe these multiple examples give readers a straightforward impression of these cells’ activity patterns.

      (5) Which behavioral sessions of Figure 1B were analyzed further?

      We will label the analyzed sessions in Figure 1B during our revision.

      (6) In Figure 3A-C please increase the time before the beginning of the trial in order to visualize properly the activation patterns of the start cells. 

      We thank the reviewer for the suggestion and will modify the figure accordingly during revision.

      (7) Please state what the behavioral and functional effects of ablating the cortical tissue above the mPFC could be.

      We thank the reviewer for the question. In our experience, mice with lenses implanted in the mPFC showed no observable difference from mice without surgery in task acquisition or in the distribution of nose-poke durations. Although we cannot rule out effects on other cognitive processes, the mice appeared to be intact within the scope of our task. We will provide these behavioral data during our revision.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      SUFU modulates Sonic hedgehog (SHH) signaling and is frequently mutated in the B-subtype of SHH-driven medulloblastoma. The B-subtype occurs mostly in infants, is often metastatic, and lacks specific treatment. Yabut et al. found that Fgf5 was highly expressed in the B-subtype of SHH-driven medulloblastoma by examining a published microarray expression dataset. They then investigated how Fgf5 functions in the cerebellum of mice with embryonic Sufu loss of function. This loss was induced using the hGFAP-cre transgene, which is expressed in multiple cell types in the developing cerebellum, including granule neuron precursors (GNPs) derived from the rhombic lip. By measuring the area of Pax6+ cells in the external granule cell layer (EGL) of Sufu-cKO mice at postnatal day 0, they found that Pax6+ cells occupy a larger area in the posterior lobe adjacent to the secondary fissure, which is poorly defined. They show that Fgf5 RNA and phosphoErk1/2 immunostaining are also higher in the same disrupted region. Some of the phosphoErk1/2+ cells are proliferative in the Sufu-cKO. Western blot analysis of Gli proteins that modulate SHH signaling found reduced expression and absence of Gli1 activity in the region of cerebellar dysgenesis in Sufu-cKO mice, suggesting that GNP expansion in this region is independent of SHH signaling. Amazingly, intraventricular injection of the FGFR1-2 antagonist AZD4547 from P0 to P4, followed by histological examination at P7, restored cytoarchitecture in the cerebella of Sufu-cKO mice. This is further supported by immunostaining for NeuN, which labels mature, non-dividing neurons in the internal granule cell layer, and for KI67, which marks dividing cells and is primarily found in the EGL. Treatment began at a timepoint when cerebellar cytoarchitecture had been shown to be disrupted, and the cerebellum was indistinguishable from control following treatment. Figure 3 presents the most convincing and exciting data in this manuscript.

      Sufu-cKO mice do not readily develop cerebellar tumors. The authors detected phosphorylated H2AX immunostaining, which labels double-strand breaks, in some cells in the EGL in regions of cerebellar dysgenesis in the Sufu-cKO, as well as cleaved Caspase 3, a marker of apoptosis. P53 protein, which acts downstream of the double-strand break pathway, was reduced in the Sufu-cKO cerebellum. Genetically removing p53 from the Sufu-cKO cerebellum resulted in cerebellar tumors in 2-month-old mice. The Sufu;p53-dKO cerebella at P0 lacked clear foliation and the secondary fissure, even more so than the Sufu-cKO. Fgf5 RNA and signaling (pERK1/2) were also expressed ectopically.

      The conclusions of the paper are largely supported by the data, but some data analyses need to be clarified and extended.

      (1) The rationale for examining Fgf5 in medulloblastoma is not sufficiently convincing. The authors previously reported that Fgf15 was upregulated in neocortical progenitors of mice with conditional loss of Sufu (PMID: 32737167). In Figure 1, the authors report that FGF5 expression is higher in SHH-type medulloblastoma, especially the beta and gamma subtypes mostly found in infants. These data were derived from a genome-wide dataset and are shown without correction for multiple testing, including other Fgfs. Showing the expression of other Fgfs with FDR correction would better substantiate their choice; alternatively, moving this figure later in the manuscript as support for their mouse investigations would be more convincing.

      To assess FGF5 (ENSG00000138675) expression in MB tissues, we used GEO2R (Barrett et al., 2013) to analyze published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with a significance cut-off of 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM).
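      The two multiple-comparison corrections named above can be sketched in plain Python. This is an illustrative reimplementation of the standard Benjamini-Hochberg step-up and Holm-Šídák step-down formulas, not the code used by GEO2R/limma or Prism; the raw p-values are hypothetical and would in practice come from the moderated or post-hoc tests themselves, which are not shown.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def holm_sidak(pvalues):
    """Holm-Sidak step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # Step down from the smallest p-value; the k-th smallest (0-based) is
    # Sidak-corrected as one of the m - k comparisons still remaining.
    for k, i in enumerate(order):
        running_max = max(running_max, 1.0 - (1.0 - pvalues[i]) ** (m - k))
        adjusted[i] = min(running_max, 1.0)
    return adjusted


# Hypothetical raw p-values for four pairwise comparisons (illustrative only)
raw = [0.01, 0.04, 0.03, 0.2]
print("BH:        ", [round(q, 4) for q in benjamini_hochberg(raw)])
print("Holm-Sidak:", [round(q, 4) for q in holm_sidak(raw)])
```

      BH controls the false discovery rate and suits genome-wide screens such as the GEO2R table, whereas Holm-Šídák controls the family-wise error rate across the handful of subgroup comparisons behind the violin plots.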

      Author response image 1.

      Comparative expression of the FGF ligands FGF5, FGF10, FGF12, and FGF19 across all MB subgroups. FGF12 expression is not significantly different, while FGF5, FGF10, and FGF19 show distinct upregulation in the MBSHH subgroup (MBWNT n=70, MBSHH n=224, MBGR3 n=143, MBGR4 n=326).

      Expression of the 21 known FGF ligands was also analyzed. Many FGFs did not exhibit differential expression in MBSHH compared to other MB subgroups, as seen for FGF12 in Figure 1. FGF5, FGF10, and FGF19 (the human orthologue of mouse FGF15) all showed specific upregulation in MBSHH compared to other MB subgroups (Author response image 1), supporting our previous observation that FGF15 is a downstream target of SHH signaling (Yabut et al., 2020), as the reviewer pointed out. However, further stratification of MBSHH patient data revealed that only FGF5 was specifically upregulated in infants with MBSHH (MBSHHb and MBSHHg; Author response image 2), indicating a more prominent role for FGF5 in the developing cerebellum and as a driver of MBSHH tumorigenesis in this dynamic environment.

      Author response image 2.

      Comparative expression of FGF5, FGF10, and FGF19 in different MBSHH subtypes. FGF5 shows relative mRNA levels above 6 in 81% of MBSHH infant patient tumors (n=80 MBSHHb and MBSHHg tumors), compared with 35% of MBSHHa (n=65) and 0% of MBSHHd (n=75) tumors.

      (2) The Sufu-cKO cerebellum lacks a clear anchor point at the secondary fissure and foliation is disrupted in the central and posterior lobes. It would be helpful for the authors to review Sudarov & Joyner (PMID: 18053187) for nomenclature specific to the developing cerebellum.

      The reviewers are correct that cerebellar foliation is severely disrupted in the central and posterior lobes, as defined by Sudarov and Joyner (Neural Development 2007). This nomenclature can be used to describe the regions referred to in this manuscript.

      (3) The metrics used to quantify cerebellar perimeter and immunostaining are not sufficiently described. It is unclear whether the individual points in the bar graph represent a single section from independent mice, or multiple sections from the same mice. For example, in Figures 2B-D. This also applies to Figure 3C-D.

      All quantifications were performed on 2-3 cerebellar sections (20 µm) from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) for each mouse. Figure 2B shows data points from n=4 mice per genotype, Figure 2C from n=3 mice per genotype, and Figure 2D from n=6 mice per genotype. Figure 3C-D show data from n=3 mice per genotype.
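      The averaging scheme described above, in which section-level counts are first averaged within each mouse so that each mouse contributes a single point, can be sketched as follows. Mouse IDs and counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(section_counts):
    """Collapse section-level counts to one mean per animal.

    section_counts: iterable of (animal_id, count) pairs, one per section.
    Returns {animal_id: mean count across that animal's sections}.
    """
    by_animal = defaultdict(list)
    for animal_id, count in section_counts:
        by_animal[animal_id].append(count)
    return {animal_id: mean(counts) for animal_id, counts in by_animal.items()}


# Hypothetical counts: 2-3 sections per mouse, three mice of one genotype
sections = [
    ("mouse_1", 31), ("mouse_1", 35),
    ("mouse_2", 28), ("mouse_2", 30), ("mouse_2", 32),
    ("mouse_3", 40), ("mouse_3", 38),
]
points = per_animal_means(sections)  # one bar-graph point per mouse (n = 3)
print(points)
```

      Statistics across genotypes would then be run on these per-mouse values, so n is the number of mice rather than the number of sections.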

      (4) The data on Fgf5 RNA expression presented in Figure 2E are not sufficiently convincing. The perimeter and cytoarchitecture of the cerebellum are difficult to see and the higher magnification shown in 2F should be indicated in 2E.

      The lack of foliation in the Sufu-cKO cerebellum is clear, particularly when visualizing the perimeter via DAPI labeling (Figure 2E). The expression area of FGF5 is also visibly larger, given that all images in Figure 2E are presented at the same scale (scale bars = 500 µm).

      (5) The data presented in Figure 3 are not sufficiently convincing. The number of cells double positive for pErk and KI67 (Figure 3B) are difficult to see and appear to be few, suggesting the quantification may be unreliable.

      We used KI67 expression as a molecular marker to define the regions quantified in both WT and Sufu-cKO sections. Quantification of labeled cells was performed on images obtained by confocal microscopy, enabling imaging of 1-2 µm optical slices, since Ki67 and pERK might not localize to the same cellular compartments. We relied on continuous DAPI nuclear staining to distinguish individual cells in each optical slice and to assess the colocalization of Ki67 and pERK. All quantifications were performed on 2-3 cerebellar sections (20 µm) from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) for each mouse.

      (6) The data presented in Figure 4F-J would be more convincing with quantification. The Sufu;p53-dKO appears to have a thickened EGL across the entire vermis perimeter, and very little foliation, relative to control and single cKO cerebella. This is a more widespread effect than the more localized foliation disruption in the Sufu-cKO. 

      We agree with the reviewers that quantification of these phenotypes would provide a solid measure of the defects. However, the phenotypes of the Sufu;p53-dKO cerebellum are so profound that they require in-depth characterization, which will be the focus of future studies.

      (7) Figure 5 does not convincingly summarize the results. Blue and purple cells in sagittal cartoon are not defined. Which cells express Fgf5 (or other Fgfs) has not been determined. The yellow cells are not defined in relation to the initial cartoon on the left.

      The revised manuscript will address this confusion by clearly labeling the cells and their roles in the schematic diagram.

      Reviewer #2 (Public Review):

      Summary:

      Mutations in SUFU are implicated in SHH medulloblastoma (MB). SUFU modulates Shh signaling in a context-dependent manner, making its role in MB pathology complex and not fully understood. This study reports that elevated FGF5 levels are associated with a specific subtype of SHH MB, particularly in pediatric cases. The authors demonstrate that Sufu deletion in a mouse model leads to abnormal proliferation of granule cell precursors (GCPs) at the secondary fissure (region B), correlating with increased Fgf5 expression. Notably, pharmacological inhibition of FGFR restores normal cerebellar development in Sufu mutant mice.

      Strengths:

      The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper.

      Weaknesses:

      The study appears incomplete despite the potential significance of these findings. The current paper does not fully establish the causal relationship between Fgf5 and abnormal cerebellar development, nor does it clarify its connection to SUFU-related MB. Some conclusions seem overstated, and the central question of whether FGFR inhibition can prevent tumor formation remains untested.

      Reviewer #3 (Public Review):

      Summary:

      The interaction between FGF signaling and SHH-mediated GNP expansion in MB, particularly in the context of Sufu LoF, has only begun to be understood. The manuscript by Yabut et al. establishes a connection between ectopic FGF5 expression and GNP over-expansion in a late-stage embryonic Sufu LoF model. The data link a region-specific interaction between aberrant FGF5 signaling and the SHH subtype of medulloblastoma. New data from Yabut et al. suggest that ectopic FGF5 expression correlates with GNP expansion near the secondary fissure in Sufu LoF cerebella. Furthermore, pharmacological blockade of FGF signaling inhibits GNP proliferation. Interestingly, the data indicate that the timing of conditional Sufu deletion (E13.5 using the hGFAP-Cre line) results in different outcomes compared to later deletion (using the Math1-cre line, Jiwani et al., 2020). This study provides significant insights into the molecular mechanisms driving GNP expansion in SHH subgroup MB, particularly in the context of Sufu LoF. It highlights the potential of targeting FGF5 signaling as a therapeutic strategy. Additionally, the research offers a model for better understanding MB subtypes and developing targeted treatments.

      Strengths:

      One notable strength of this study is the extraction and analysis of ectopic FGF5 expression from a subset of MB patient tumor samples. This translational aspect of the study enhances its relevance to human disease. By correlating findings from mouse models with patient data, the authors strengthen the validity of their conclusions and highlight the potential clinical implications of targeting FGF5 in MB therapy.

      The data convincingly show that FGFR signaling activation drives GNP proliferation in Sufu conditional knockout models. This finding is supported by robust experimental evidence, including pharmacological blockade of FGF signaling, which effectively inhibits GNP proliferation. The clear demonstration of a functional link between FGFR signaling and GNP expansion underscores the potential of FGFR as a therapeutic target in SHH subgroup medulloblastoma.

      Previous studies have demonstrated the inhibitory effect of FGF2 on tumor cell proliferation in certain MB types, such as the ptc mutant (Fogarty et al., 2006; Yaguchi et al., 2009). The findings in this manuscript provide additional support for multiple roles of FGF signaling in cerebellar patterning and development.

      Weaknesses:

      In the GEO dataset analysis, where FGF5 expression is extracted, the reporting of the P-value lacks detail on the statistical methods used, such as whether an ANOVA or t-test was employed. Providing comprehensive statistical methodologies is crucial for assessing the rigor and reproducibility of the results. The absence of this information raises concerns about the robustness of the statistical analysis.

      The revised manuscript will include the following detailed explanation of the statistical analyses of the GEO dataset:

      We obtained expression values of FGF5 (ENSG00000138675) using GEO2R (Barrett et al., 2013), which directly analyzes published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in a GEO series using the original submitter-supplied processed data tables. We entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values computed with GEO2R's built-in limma statistical test, with a significance cut-off of 0.05. The resulting data were then exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak's multiple comparisons test, single pooled variance. P ≤ 0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM). Sample sizes were:

      Author response table 1.
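      The response names two different multiple-testing corrections: the Benjamini & Hochberg FDR adjustment reported by GEO2R/limma, and the Holm-Sidak step-down adjustment applied to the post-hoc comparisons. The sketch below shows how each adjustment is computed from a list of raw p-values. It is illustrative only: the p-values are hypothetical, and it is not a reimplementation of GEO2R or of Prism's internals.

```python
"""Sketch of Benjamini-Hochberg FDR and Holm-Sidak step-down adjustments.
The raw p-values used below are hypothetical, not the GSE85217 results."""


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def holm_sidak(pvals):
    """Return Holm-Sidak step-down adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # The k-th smallest p-value (0-based k) is Sidak-corrected for m - k tests;
    # the running maximum keeps adjusted values non-decreasing.
    for k, i in enumerate(order):
        sidak = 1.0 - (1.0 - pvals[i]) ** (m - k)
        running_max = max(running_max, sidak)
        adjusted[i] = min(running_max, 1.0)
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.2]  # hypothetical pairwise-comparison p-values
    for name, adj in (("BH", benjamini_hochberg(raw)), ("Holm-Sidak", holm_sidak(raw))):
        flags = ["significant" if p <= 0.05 else "n.s." for p in adj]
        print(name, [round(p, 4) for p in adj], flags)
```

      The two procedures answer different questions. BH controls the expected share of false discoveries across many genes, while Holm-Sidak controls the family-wise error rate across a small set of pairwise group comparisons.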

      Another concern is related to the controls used in the study. Cre recombinase induces double-strand DNA breaks within the loxP sites, and the control mice did not carry the Cre transgene (as stated in the Method section), while Sufu-cKO mice did. This discrepancy necessitates an additional control group to evaluate the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling. Including this control would strengthen the validity of the findings by ensuring that observed effects are not artifacts of Cre recombinase activity.

      The breeding scheme we used to generate homozygous SUFU conditional mutants does not generate pups carrying only hGFAP-Cre. Thus, we are unable to compare γH2AX expression in littermates that do not carry loxP sites. The reviewer is correct in pointing out that Cre recombinase activity could induce double-strand breaks on its own. However, any double-strand breaks induced by hGFAP-Cre are unlikely to be sufficient to cause the phenotypes we observed in homozygous mutant (Sufu-cKO) mice, because the cerebella of mice carrying heterozygous SUFU mutations (hGFAP-Cre;Sufu-fl/+) do not display the profound cerebellar phenotypes observed in Sufu-cKO mice. We cannot rule out, however, subtler abnormalities that would require further analyses to detect.

      Although the use of the hGFAP-Cre line allows genetic access at the late embryonic stage, it also targets multiple cell types, including both GNPs and cerebellar glial cells. However, the authors focus primarily on GNPs without fully addressing the potential contributions of neuron-glia interactions. This oversight could limit the understanding of the broader cellular context in which FGF signaling influences tumor development.

      The reviewer is correct that hGFAP-Cre also targets other cell types, such as cerebellar glial cells, which are generated after Cre expression has begun. It is possible that cerebellar glial cell development is also compromised in Sufu-cKO mice, which may disrupt neuron-glia interactions either through or independently of FGF signaling. In-depth studies are required to determine how loss of SUFU specifically affects the development of cerebellar glial cells and their interactions in the developing cerebellum.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by McKim et al. seeks to provide a comprehensive description of the connectivity of neurosecretory cells (NSCs) using a high-resolution electron microscopy dataset of the fly brain and several single-cell RNA-seq transcriptomic datasets from the brain and peripheral tissues of the fly. They use connectomic analyses to identify discrete functional subgroups of NSCs and describe both the broad architecture of the synaptic inputs to these subgroups and some of the specific inputs, including those from chemosensory pathways. They then demonstrate that NSCs have very few traditional presynapses, consistent with their known function of paracrine neuropeptide release. Acknowledging that EM datasets can't account for paracrine release, the authors use several scRNA-seq datasets to explore signaling between NSCs and characterize widespread patterns of neuropeptide receptor expression across the brain and several body tissues. The thoroughness of this study allows it to largely achieve its goal and provides a useful resource for anyone studying neurohormonal signaling.

      Strengths:

      The strengths of this study are the thorough nature of the approach and the integration of several large-scale datasets to address shortcomings of individual datasets. The study also acknowledges the limitations that are inherent to studying hormonal signaling and provides interpretations within the context of these limitations.

      Weaknesses:

      Overall, the framing of this paper needs to be shifted from statements of what was done to what was found. Each subsection, and the narrative within each, is framed on topics such as "synaptic output pathways from NSC" when there are clear and impactful findings such as "NSCs have sparse synaptic output". Framing the manuscript in this way allows the reader to identify broad takeaways that are applicable to other model systems. Otherwise, the manuscript risks being encyclopedic in nature. An overall synthesis of the results would help provide the larger context within which this study falls.

      We agree with the reviewer and will replace all the subsection titles as suggested.

      The cartoon schematic in Figure 5A (which is adapted from a 2020 review) has an error. This schematic depicts uniglomerular projection neurons of the antennal lobe projecting directly to the lateral horn (without synapsing in the mushroom bodies) and multiglomerular projection neurons projecting to the mushroom bodies and then lateral horn. This should be reversed (uniglomerular PNs synapse in the calyx and then further project to the LH and multiglomerular PNs project along the mlACT directly to the LH) and is nicely depicted in a Strutz et al 2014 publication in eLife.

      We thank the reviewer for spotting this error. We will modify the schematic as suggested.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive description of the neurosecretory network in the adult Drosophila brain. They sought to assign and verify the types of 80 neurosecretory cells (NSCs) found in the publicly available FlyWire female brain connectome. They then describe the organization of synaptic inputs and outputs across NSC types and outline circuits by which olfaction may regulate NSCs, and by which Corazon-producing NSCs may regulate flight behavior. Leveraging existing transcriptomic data, they also describe the hormone and receptor expressions in the NSCs and suggest putative paracrine signaling between NSCs. Taken together, these analyses provide a framework for future experiments, which may demonstrate whether and how NSCs, and the circuits to which they belong, may shape physiological function or animal behavior.

      Strengths:

      This study uses the FlyWire female brain connectome (Dorkenwald et al. 2023) to assign putative cell types to the 80 neurosecretory cells (NSCs) based on clustering of synaptic connectivity and morphological features. The authors then verify type assignments for selected populations by matching cluster sizes to anatomical localization and cell counts using immunohistochemistry of neuropeptide expression and markers with known co-expression.

      The authors compare their findings to previous work describing the synaptic connectivity of the neurosecretory network in larval Drosophila (Huckesfeld et al., 2021) and find some differences between these developmental stages. Table 1 makes direct comparison between adults and larvae possible, as does the authors' choice to adopt similar (or equivalent) analyses and data visualizations in the present paper's figures.

      The authors extract core themes in NSC synaptic connectivity that speak to their function: different NSC types are downstream of shared presynaptic outputs, suggesting the possibility of joint or coordinated activation, depending on upstream activity. NSCs receive some but not all modalities of sensory input. NSCs have more synaptic inputs than outputs, suggesting they predominantly influence neuronal and whole-body physiology through paracrine and endocrine signaling.

      The authors outline synaptic pathways by which olfactory inputs may influence NSC activity and by which Corazon-releasing NSCs may regulate flight. These analyses provide a basis for future experiments, which may demonstrate whether and how such circuits shape physiological function or animal behavior.

      The authors extract expression patterns of neuropeptides and receptors across NSC cell types from existing transcriptomic data (Davie et al., 2018) and present the hypothesis that NSCs could be interconnected via paracrine signaling. The authors also catalog hormone receptor expression across tissues, drawing from the Fly Cell Atlas (Li et al., 2022).

      Weaknesses:

      The clustering of NSCs by their presynaptic inputs and morphological features, along with corroboration with their anatomical locations, distinguished some, but not all cell types. The authors attempt to distinguish cell types using additional methodologies: immunohistochemistry (Figure 2), retrograde trans-synaptic labeling, and characterization of dense core vesicle characteristics in the FlyWire dataset (Figure 1, Supplement 1). However, these corroborating experiments often lacked experimental replicates, were not rigorously quantified, and/or were presented as singular images from individual animals or even individual cells of interest. The assignments of DH44 and DMS types remain particularly unconvincing.

      We thank the reviewer for this comment. We would like to clarify that the images presented in Figure 2 and Figure 1 Supplement 1 are representative images based on at least 5 independent samples. We will clarify this in the figure caption and methods. The electron micrographs showing dense core vesicle (DCV) characteristics (Figure 1 Supplement 1E-G) are also representative images based on examination of multiple neurons. However, we agree with the reviewer that a rigorous quantification would be useful to showcase the differences between DCVs from NSC subtypes. Therefore, we have now performed a quantitative analysis of the DCVs in putative m-NSC<sup>DH44</sup> (n=6), putative m-NSC<sup>DMS</sup> (n=6) and descending neurons (n=4) known to express DMS. For consistency, we examined the cross section of each cell where the diameter of the nucleus was largest. We quantified the mean gray value of at least 50 DCVs per cell. Our analysis shows that the mean gray values of putative m-NSC<sup>DMS</sup> and DMS descending neurons are not significantly different, whereas the mean gray values of m-NSC<sup>DH44</sup> are significantly larger. This analysis is in agreement with our initial conclusion.

      Author response image 1.

      The authors present connectivity diagrams for visualization of putative paracrine signaling between NSCs based on their peptide and receptor expression patterns. These transcriptomic data alone are inadequate for drawing these conclusions, and these connectivity diagrams are untested hypotheses rather than results. The authors do discuss this in the Discussion section.

      We fully agree with the reviewer and will further elaborate on the limitations of our approach in the revised manuscript. However, there is a very high likelihood that a given NSC subtype can signal to another NSC subtype using a neuropeptide if the corresponding receptor is expressed in the target NSC. This is because all NSC axons are part of the same nerve bundle (nervi corporis cardiaci), which exits the brain, and the axons of different NSCs form release sites that are extremely close to each other. Neuropeptides from these release sites can easily diffuse via the hemolymph to peripheral tissues (e.g., fat body and ovaries) that lie much farther from these release sites than neighboring NSCs do. We believe that neuropeptide receptors are expressed in NSCs near these release sites, where they can receive inputs not just from adjacent NSCs but also from other sources such as gut enteroendocrine cells. Hence, neuropeptide diffusion is not a limiting factor for paracrine signaling between NSCs, and receptor expression is a good indicator of putative paracrine signaling.

      Reviewer #3 (Public review):

      Summary:

      The manuscript presents an ambitious and comprehensive synaptic connectome of neurosecretory cells (NSC) in the Drosophila brain, which highlights the neural circuits underlying hormonal regulation of physiology and behaviour. The authors use EM-based connectomics, retrograde tracing, and previously characterised single-cell transcriptomic data. The goal was to map the inputs to and outputs from NSCs, revealing novel interactions between sensory, motor, and neurosecretory systems. The results are of great value for the field of neuroendocrinology, with implications for understanding how hormonal signals integrate with brain function to coordinate physiology.

      The manuscript is well-written and provides novel insights into the neurosecretory connectome in the adult Drosophila brain. Some additional behavioural experiments would significantly strengthen the conclusions.

      Strengths:

      (1) Rigorous anatomical analysis

      (2) Novel insights on the wiring logic of the neurosecretory cells.

      Weaknesses:

      (1) Functional validation of findings would greatly improve the manuscript.

      We agree with this reviewer that assessing the functional output from NSCs would improve the manuscript. Given that we currently lack genetic tools to measure hormone levels and that behaviors and physiology are modulated by NSCs on slow timescales, it is difficult to assess the immediate functional impact of the sensory inputs to NSC using approaches such as optogenetics. However, since l-NSC<sup>CRZ</sup> are the only known cell type that provide output to descending neurons, we will functionally test this output pathway using different behavioral assays recommended by this reviewer.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present exciting new experimental data on the antigenic recognition of 78 H3N2 strains (from the beginning of the 2023 Northern Hemisphere season) against a set of 150 serum samples. The authors compare protection profiles of individual sera and find that the antigenic effect of amino acid substitutions at specific sites depends on the immune class of the sera, differentiating between children and adults. Person-to-person heterogeneity in the measured titers is strong, specifically in the group of children's sera. The authors find that the fraction of sera with low titers correlates with the inferred growth rate using maximum likelihood regression (MLR), a correlation that does not hold for pooled sera. The authors then measure the protection profile of the sera against historical vaccine strains and find that it can be explained by birth cohort for children. Finally, the authors present data comparing pre- and post- vaccination protection profiles for 39 (USA) and 8 (Australia) adults. The data shows a cohort-specific vaccination effect as measured by the average titer increase, and also a virus-specific vaccination effect for the historical vaccine strains. The generated data is shared by the authors and they also note that these methods can be applied to inform the bi-annual vaccine composition meetings, which could be highly valuable.

      Thanks for this nice summary of our paper.

      The following points could be addressed in a revision:

      (1) The authors conclude that much of the person-to-person and strain-to-strain variation seems idiosyncratic to individual sera rather than age groups. This point is not yet fully convincing. While the mean titer of an individual may be idiosyncratic to the individual sera, the strain-to-strain variation still reveals some patterns that are consistent across individuals (the authors note the effects of substitutions at sites 145 and 275/276). A more detailed analysis, removing the individual-specific mean titer, could still show shared patterns in groups of individuals that are not necessarily defined by the birth cohort.

      As the reviewer suggests, we normalized the titers for all sera to the geometric mean titer for each individual in the US-based pre-vaccination adults and children, using only the 2023-circulating viral strains. We then faceted these normalized titers by the same age groups we used in Figure 6, and the resulting plot is shown below. Although there are differences among virus strains (some are better neutralized than others), there are no obvious age group-specific patterns (e.g., the trends in the two facets are similar). To us this suggests that, at least for these relatively closely related recent H3N2 strains, the strain-to-strain variation does not obviously segregate by age group. It is possible (we think likely) that more obvious age group-specific trends would emerge if we looked at a larger swath of viral strains covering a longer time range (e.g., over decades of influenza evolution). We plan to add the new plots shown below to a supplemental figure in the revised manuscript.

      Author response image 1.

      Author response image 2.
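      The normalization described above removes each serum's overall neutralizing potency so that only its strain-to-strain pattern remains. A minimal sketch of that step, followed by averaging log2 normalized titers per strain within an age group, is below. The serum names, titers, and age-group labels are hypothetical, and taking the geometric mean over all strains tested with that serum is an assumption about how the per-individual mean was computed.

```python
import math
from collections import defaultdict
from statistics import fmean


def normalize_to_gmt(titers_by_serum):
    """Divide each serum's titers by that serum's geometric mean titer (GMT)."""
    normalized = {}
    for serum, titers in titers_by_serum.items():
        gmt = math.exp(fmean(math.log(t) for t in titers.values()))
        normalized[serum] = {strain: t / gmt for strain, t in titers.items()}
    return normalized


def mean_log2_by_group(normalized, age_group):
    """Average log2(normalized titer) per strain within each age group."""
    values = defaultdict(lambda: defaultdict(list))
    for serum, ratios in normalized.items():
        for strain, ratio in ratios.items():
            values[age_group[serum]][strain].append(math.log2(ratio))
    return {
        group: {strain: fmean(v) for strain, v in by_strain.items()}
        for group, by_strain in values.items()
    }


if __name__ == "__main__":
    # Hypothetical neutralization titers (serum -> strain -> titer).
    titers = {
        "child_1": {"strainA": 40, "strainB": 160},
        "child_2": {"strainA": 80, "strainB": 320},
        "adult_1": {"strainA": 640, "strainB": 160},
    }
    groups = {"child_1": "children", "child_2": "children", "adult_1": "adults"}
    print(mean_log2_by_group(normalize_to_gmt(titers), groups))
```

      In this toy example, the two children differ twofold in absolute titer but share the same strain-to-strain profile, so after normalization they contribute identical values. This is why residual group-level differences can be read as shared patterns rather than differences in overall potency.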

      (2) The authors show that the fraction of sera with a titer below 138 correlates strongly with the inferred growth rate using MLR. However, the authors also note that there exists a strong correlation between the MLR growth rate and the number of HA1 mutations. This analysis does not yet show that the titers provide substantially more information about the evolutionary success. The actual relation between the measured titers and fitness is certainly more subtle than suggested by the correlation plot in Figure 5. For example, the clades A/Massachusetts and A/Sydney both have a positive fitness at the beginning of 2023, but A/Massachusetts has substantially higher relative fitness than A/Sydney. The growth inference in Figure 5b does not appear to map that difference, and the antigenic data would give the opposite ranking. Similarly, the clades A/Massachusetts and A/Ontario have both positive relative fitness, as correctly identified by the antigenic ranking, but at quite different times (i.e., in different contexts of competing clades). Other clades, like A/St. Petersburg are assigned high growth and high escape but remain at low frequency throughout. Some mention of these effects not mapped by the analysis may be appropriate.

      Thanks for the nice summary of our findings in Figure 5. However, the reviewer is misreading the growth charts when they say that A/Massachusetts/18/2022 has a substantially higher fitness than A/Sydney/332/2023. Figure 5a shows the frequency trajectory of different variants over time. While A/Massachusetts/18/2022 reaches a higher frequency than A/Sydney/332/2023, the trajectory is similar and the reason that A/Massachusetts/18/2022 reached a higher max frequency is that it started at a higher frequency at the beginning of 2023. The MLR growth rate estimates differ from the maximum absolute frequency reached: instead, they reflect how rapidly each strain grows relative to others. In fact, A/Massachusetts/18/2022 and A/Sydney/332/2023 have similar growth rates, as shown in Supplementary Figure 6b. Similarly, A/Saint-Petersburg/RII-166/2023 starts at a low initial frequency but then grows even as A/Massachusetts/18/2022 and A/Sydney/332/2023 are declining, and so has a higher growth rate than both of those. In the revised manuscript, we will clarify how viral growth rates are estimated from frequency trajectories, and how growth rate differs from max frequency.
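      The distinction between growth rate and maximum frequency can be made concrete with a toy model. The sketch below assumes that the MLR treats each variant's frequency as proportional to its initial frequency times exp(growth rate × time), normalized across variants (a multinomial logistic form). All variant names, starting frequencies, and rates are hypothetical and are not fits to the data in Figure 5.

```python
import math


def mlr_frequencies(init_freqs, growth_rates, t):
    """Variant frequencies at time t when f_v(t) is proportional to f_v(0) * exp(r_v * t)."""
    weights = {v: init_freqs[v] * math.exp(growth_rates[v] * t) for v in init_freqs}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def max_frequency(init_freqs, growth_rates, times):
    """Highest frequency each variant reaches over the given time points."""
    peaks = {v: 0.0 for v in init_freqs}
    for t in times:
        for v, f in mlr_frequencies(init_freqs, growth_rates, t).items():
            peaks[v] = max(peaks[v], f)
    return peaks


if __name__ == "__main__":
    # Hypothetical variants: A and B share a growth rate but start at different
    # frequencies; C starts rare but grows faster; "other" is the reference (rate 0).
    init = {"A": 0.30, "B": 0.10, "C": 0.01, "other": 0.59}
    rates = {"A": 0.5, "B": 0.5, "C": 1.0, "other": 0.0}
    times = [0.5 * k for k in range(41)]  # 0 to 20 time units
    print(max_frequency(init, rates, times))
```

      Because A and B have the same growth rate, their frequency ratio stays fixed. A peaks higher only because it started higher. C keeps rising while A and B decline, mirroring the distinction drawn above between growth rate and maximum frequency.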

      (3) For the protection profile against the vaccine strains, the authors find for the adult cohort that the highest titer is always against the oldest vaccine strain tested, which is A/Texas/50/2012. However, the adult sera do not show an increase in titer towards older strains, but only a peak at A/Texas. Therefore, it could be that this is a virus-specific effect, rather than a property of the protection profile. Could the authors test with one older vaccine virus (A/Perth/16/2009?) whether this really can be a general property?

      We are interested in studying immune imprinting more thoroughly using sequencing-based neutralization assays, but we note that the adults in the cohorts we studied would have been imprinted with much older strains than included in this library. As this paper focuses on the relative fitness of contemporary strains with minor secondary points regarding imprinting, these experiments are beyond the scope of this study. We’re excited for future work (from our group or others) to explore these points by making a new virus library with strains from multiple decades of influenza evolution.

      Reviewer #2 (Public review):

      This is an excellent paper. The ability to measure the immune response to multiple viruses in parallel is a major advancement for the field, which will be relevant across pathogens (assuming the assay can be appropriately adapted). I only have a few comments, focused on maximising the information provided by the sera.

      Thanks very much!

      Firstly, one of the major findings is that there is wide heterogeneity in responses across individuals. However, we could expect that individuals' responses should be at least correlated across the viruses considered, especially when individuals are of a similar age. It would be interesting to quantify the correlation in responses as a function of the difference in ages between pairs of individuals. I am also left wondering what the potential drivers of the differences in responses are, with age being presumably key. It would be interesting to explore individual factors associated with responses to specific viruses (beyond simply comparing adults versus children).

      We’re excited by this idea! We plan to include these analyses in our revised pre-print.

      Relatedly, is the phylogenetic distance between pairs of viruses associated with similarity in responses?

      As above, we like this idea and our revised pre-print will include this analysis.

      Figure 5C is also a really interesting result. Being able to predict growth rates from serum titers is fascinating. As touched upon in the discussion, I suspect it depends strongly on how representative the sera are of the population (e.g., if only elderly individuals provided sera, the result would differ from one where only children provided samples). It may be interesting to compare different hypotheses, for example whether a population-weighted titer, in which each individual's titer is weighted by the number of individuals of that age in the population, is even better correlated with fitness. Alternatively, perhaps only the titers in younger individuals are most relevant to fitness.

      We’re very interested in these analyses, but suggest they may be better explored in subsequent works that could sample more children, teenagers and adults across age groups. Our sera set, as the reviewer suggests, may be under-powered to perform the proposed analysis on subsetted age groups of our larger age cohorts.

      In Figure 6, the authors lump together individuals within 10-year age categories; however, this potentially throws away the nuances of what is happening at individual ages, especially for the children, where the measured viruses cross different groups. I realise the numbers are small and the viruses only come from a small number of years; however, it may be preferable to order all the individuals by age (y-axis) and the viral responses in ascending order (x-axis) and plot the response as a heatmap. As currently plotted, it is difficult to compare across panels.

      This is a good suggestion, and a revised pre-print will include heatmaps of the different cohorts, ordered by ages of individuals.

      Reviewer #3 (Public review):

      The authors use high-throughput neutralisation data to explore how different summary statistics for population immune responses relate to strain success, as measured by growth rate during the 2023 season. The question of how serological measurements relate to epidemic growth is an important one, and I thought the authors present a thoughtful analysis tackling this question, with some clear figures. In particular, they found that stratifying the population based on the magnitude of their antibody titres correlates more with strain growth than using measurements derived from pooled serum data. However, there are some areas where I thought the work could be more strongly motivated and linked together. In particular, how the vaccine responses in US and Australia in Figures 6-7 relate to the earlier analysis around growth rates, and what we would expect the relationship between growth rate and population immunity to be based on epidemic theory.

      Thank you for this nice summary. This reviewer also notes that the text related to Figures 6 and 7 is secondary to the main story presented in Figures 3-5. Our main motivation for including Figures 6 and 7 was to demonstrate the wide-ranging applications of sequencing-based neutralization data, and this can certainly be clarified with minor text revisions.

    1. Author Response

      Public Reviews

      We thank both reviewers for taking the time and effort to think critically about our paper and point out areas where it can be improved. In this document, we do our best to clarify any misunderstandings with the hope that further consideration about the strengths and weaknesses of our approach will be possible. Our responses are in bold.

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation.

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures. Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study.

      The environments we study represent 12 different concentrations or combinations of two drugs, radicicol and fluconazole. Our hope is that this large dataset (774 mutants x 12 environments) will be useful, both to scientists who are generally interested in the genetic and phenotypic underpinnings of adaptation, and to scientists specifically interested in the evolution of drug resistance.

      Weaknesses:

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements.

      This is a misunderstanding that we will work to clarify in the revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons.

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we will explicitly state that these 21,000 isolated lineages do not all represent unique, adaptive lineages. In figure 2 and all associated text, we will change the word “lineages” to “isolates,” where relevant.

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype, and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 μg/ml vs. 8 μg/ml flu) and those that are undetectable in evolutions lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Note that 774 adaptive lineages is more than most previous studies have examined. Thus, we feel that “our work – showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 161 - 162).”

      As the authors briefly remark, this will bias their dataset toward lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance.

      The word “briefly” feels a bit unfair because we discuss this bias on 3 separate occasions (on lines 146 - 147, 260 - 264, and in more detail on 706 - 714). We even walk through an example of a class of mutants that our study misses. We say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we will add more text to the first mention of these missing mutants (lines 146 - 147) so that the implications are more immediately made apparent.

      While we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs, and they set an inclusion threshold that may capture only a small fraction of phenotypic space, namely lineages with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications for the authors' general conclusions regarding the evolution of trade-offs.

      We discussed these implications in some detail in the 16 lines mentioned above (146 - 147, 260 - 264, 706 - 714). To add to this discussion, we will also add the following sentence to the end of the paragraph on lines 697 - 714: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”.

      We will also add a new paragraph that discusses these implications earlier in our manuscript. This paragraph will highlight the strengths of our method (e.g., that we “catch” classes of mutants that are often overlooked) while being transparent about the weaknesses of our approach (e.g., that we “miss” mutants with strong tradeoffs).

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations.

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult.

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system).

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay.

      We understand how the reviewer came to this misunderstanding and will adjust our revised manuscript accordingly. Previous work has demonstrated that, in this particular evolution platform, most of the mutations actually occur during the transformation that introduces the DNA barcodes (PMID25731169). In other words, these mutations do not accumulate during the 40 generations of evolution, they are already there. So the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach.

      This concern, and all subsequent concerns, seem to be driven by either (a) general concerns about the noisiness of fitness measurements obtained from large-scale barcode fitness assays or (b) general concerns about whether the clusters obtained from our dimensional reduction approach capture this noise as opposed to biologically meaningful differences.

      We will respond to each concern point-by-point, but want to start by generally stating that (a) our particular large-scale barcode fitness assay has several features that diminish noise, and (b) we devote 4 figures and 200 lines of text to demonstrating that these clusters capture biologically meaningful differences between mutants (and not noise).

      In terms of this specific concern, we performed an analysis of noise in the submitted manuscript: Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing noise (Figure S7 panel B). But we agree with the reviewer that this analysis alone is not sufficient to conclude that the clusters distinguish groups of mutants with unique fitness tradeoffs.
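
      The coverage filter discussed above can be sketched as follows. This is a minimal illustration rather than our analysis code: the data layout (`counts[lineage][condition]` as a list of per-timepoint read counts), the function names, and the reading of the threshold as reads summed over timepoints within each condition are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical layout: counts[lineage][condition] -> read counts, one per timepoint.
Counts = Dict[str, Dict[str, List[int]]]


def passes_coverage(per_condition: Dict[str, List[int]], threshold: int) -> bool:
    """True if, in every condition, reads summed over all timepoints reach `threshold`.

    Assumption: the threshold applies to the per-condition total, not to each timepoint.
    """
    return all(sum(reads) >= threshold for reads in per_condition.values())


def filter_lineages(counts: Counts, threshold: int = 500) -> List[str]:
    """Return the sorted IDs of lineages that pass the coverage threshold in all conditions."""
    return sorted(
        lineage
        for lineage, per_condition in counts.items()
        if passes_coverage(per_condition, threshold)
    )


if __name__ == "__main__":
    toy = {
        "bc1": {"no_drug": [300, 250], "hi_flu": [200, 400]},  # totals 550 and 600
        "bc2": {"no_drug": [100, 120], "hi_flu": [900, 800]},  # total 220 in no_drug
    }
    print(filter_lineages(toy, 500))   # ['bc1']
    print(filter_lineages(toy, 5000))  # []
```

      Because raising the threshold from 500 to 5,000 reads can only remove lineages, never add them, the higher-coverage set is necessarily a subset of the lineages that pass at 500 reads.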

      Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages.

      To evaluate the strength of the clustering, we performed numerous analyses including whole genome sequencing, growth experiments, reclustering, and tracing the evolutionary origins of each cluster (Figures 5 - 8). All of these analyses suggested that our clusters capture groups of mutants that have different fitness tradeoffs. We will adjust our revised manuscript to make clear that we do not rely on the results of a clustering algorithm alone to draw conclusions about phenotypic convergence.

      We are also grateful to the reviewer for helping us realize that, as written, our manuscript is not clear with regard to how we perform clustering. We are not using UMAP to decide which mutant belongs to which cluster. Recent work highlights the importance of using an independent clustering method (PMID37590228). Although this recent work addresses the challenge of clustering much higher dimensional data than we survey here, we did indeed use an independent clustering method (Gaussian mixture model). In other words, we use UMAP for visualization but not clustering. We also confirm our clustering results using a second independent method (hierarchical clustering; Figure S8). And in our revised manuscript, we will confirm them with a third method (PCA, see below). We will adjust the main text and the methods section to make these choices clearer.

      This might paint a misleading picture where clusters appear well separate and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted.

      The salient question is whether the clusters are so “fuzzy” that they are not meaningful. That interpretation seems unreasonable. Our clusters group mutants with similar genotypes, evolutionary histories, and fitness tradeoffs (Figures 5 - 8). Clustering mutants with similar behaviors is important and useful. It improves phenotypic prediction by revealing which mutants are likely to have at least some phenotypic effects in common. And it also suggests that the phenotypic space is constrained, at least to some degree, which previous work suggests is helpful in predicting evolution (PMID33263280, PMID37437111, PMID22282810, PMID25806684).

      (4) The authors make the decision to use UMAP and a Gaussian mixture model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the axes do not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components.

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent some intuitive phenotype, like resistance to fluconazole.

      Moreover, we see many non-linearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. These types of nonlinearities can make the axes in PCA very difficult to interpret, plus these nonlinearities can be missed by PCA, thus we prefer other clustering methods.
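
      To make the additivity point concrete, here is a minimal sketch of the simplest additive null model: the prediction that fitness in a double-drug environment equals the sum of fitness in the two single-drug environments. The function names, the tolerance, and the example values are hypothetical and chosen only for illustration; a lineage flagged here is one whose double-drug behavior cannot be inferred from its single-drug behavior.

```python
def additive_deviation(w_single_a: float, w_single_b: float, w_double: float) -> float:
    """Observed double-drug fitness minus the additive prediction (w_a + w_b)."""
    return w_double - (w_single_a + w_single_b)


def is_nonadditive(w_single_a: float, w_single_b: float, w_double: float, tol: float = 0.1) -> bool:
    """Flag a lineage whose double-drug fitness departs from the additive null by more than `tol`."""
    return abs(additive_deviation(w_single_a, w_single_b, w_double)) > tol


if __name__ == "__main__":
    # Hypothetical lineage: fit in each single drug, but not in the combination.
    print(round(additive_deviation(0.8, 0.5, 0.2), 3))  # -1.1
    print(is_nonadditive(0.8, 0.5, 0.2))                # True
```

      Other null models are possible (for example, subtracting no-drug fitness from each term before summing); the plain sum is used here only because it matches the comparison described in the text.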

      We will adjust our revised manuscript to explain these reasons why we chose UMAP and GMM over PCA.

      Also, we will include PCA in the supplement of our revised manuscript. Please find below PC1 vs PC2, with points colored according to the cluster assignment in figure 4 (i.e. using a gaussian mixture model). It appears the clusters are largely preserved.

      Author response image 1.

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages.

      We worry that the idea stems from a priori notions of what the important dimensions should be. It also seems like this would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole.

      Also, we believe the reviewer meant “fitness profile” and not “fitness landscape”. A fitness landscape imagines a walk where every “step” is a mutation. Most lineages in barcoded evolution experiments possess only a single adaptive mutation. A single-step walk is not enough to build a landscape, though others are expanding barcoded evolution experiments beyond the first step (PMID34465770, PMID31723263), so maybe one day this will be possible.

      Third, the choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered.

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. We thank the reviewer for pointing out this gap in our writing. We will adjust our revised manuscript to explain that we ultimately chose to describe 6 clusters that we were able to validate with follow-up experiments. In figures 5, 6, 7, and 8, we use external information to validate the clusters that we report in figure 4. And in lines 697 – 714, we explain that there may be additional clusters beyond those we tease apart in this study.
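
      For reference, the Bayesian information criterion has the standard form below, where $\hat{L}_K$ is the maximized likelihood of a $K$-component model, $n$ is the number of lineages, and $p(K)$ is the number of free parameters. For a Gaussian mixture with full covariance matrices in $d$ dimensions, $p(K)$ counts $K-1$ mixing weights, $Kd$ mean coordinates, and $Kd(d+1)/2$ covariance entries. The covariance type and the value of $d$ are assumptions of this sketch, not a description of our exact configuration. Because the penalty grows with every added component while likelihood gains shrink, successive values of $K$ can yield similarly small improvements, as in Figure S6A, so BIC alone may not single out one $K$.

```latex
\mathrm{BIC}(K) = p(K)\,\ln n - 2\ln \hat{L}_K,
\qquad
p(K) = (K-1) + Kd + \frac{K\,d(d+1)}{2}
```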

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase and not decrease the optimal number of clusters. Additional criteria to BIC could have been used to help choose the optimal number of clusters or even to decide whether Gaussian mixture modeling is appropriate for this dataset.

      Our reasoning is as follows: if our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise.

      More generally, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays.

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously (PMID37237236).
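
      The difference between two-timepoint and multi-timepoint estimates can be sketched as below. This is a simplified illustration, not the inference used in the Levy/Blundell pipeline: it takes the least-squares slope of log barcode frequency against generation as a per-generation fitness proxy, and it omits the correction for population mean fitness and any handling of zero counts. Function names and toy numbers are hypothetical.

```python
import math
from typing import List, Sequence


def log_frequencies(counts: Sequence[int], totals: Sequence[int]) -> List[float]:
    """Natural log of barcode frequency at each timepoint (counts must be positive)."""
    if any(c <= 0 for c in counts):
        raise ValueError("zero counts need a pseudocount, which this sketch omits")
    return [math.log(c / t) for c, t in zip(counts, totals)]


def two_point_slope(gens: Sequence[float], counts: Sequence[int], totals: Sequence[int]) -> float:
    """Change in log frequency per generation using only the first and last timepoints."""
    ys = log_frequencies(counts, totals)
    return (ys[-1] - ys[0]) / (gens[-1] - gens[0])


def regression_slope(gens: Sequence[float], counts: Sequence[int], totals: Sequence[int]) -> float:
    """Ordinary least-squares slope of log frequency on generation, using every timepoint."""
    ys = log_frequencies(counts, totals)
    n = len(gens)
    mean_x = sum(gens) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in gens)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gens, ys))
    return sxy / sxx


if __name__ == "__main__":
    gens = [0, 8, 16, 24]
    totals = [1_000_000] * 4
    # Toy lineage growing at ~0.05 per generation; the last count is inflated by sampling noise.
    noisy = [1000, 1492, 2226, 4980]
    print(two_point_slope(gens, noisy, totals))
    print(regression_slope(gens, noisy, totals))
```

      If every timepoint carries independent sampling noise of equal variance, the regression slope has lower variance than the endpoint difference (both are linear unbiased estimators, and least squares is the minimum-variance one). The toy data show the special case of a single noisy final sample, where the regression estimate stays closer to the underlying rate.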

      The main assay we use to measure fitness has been previously validated (PMID27594428). No subsequent study using this assay validates using the methods suggested by the reviewer (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203).

      More to the point, bar-seq has been used, without the reviewer’s suggested validation, to demonstrate that the way some mutants’ fitness changes across environments is different from other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate.

      For all of these reasons, we are hesitant to re-validate bar-seq itself as a way to infer fitness; it already appears to be an accepted standard in our field.

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors.

      We don’t agree that fitness measurements obtained from this bar-seq assay generally require validation. But we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways, in particular, in that they have different fitness tradeoffs. We have four figures (5 - 8) and 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. Happily, one of these figures (Fig 7) includes growth curves, which are exactly the type of validation experiment asked for by the reviewer.

      Below, we walk through the different types of validation experiments that are present in our original manuscript, and additional validation experiments that we plan to include in the revised version. We are hopeful that these validation experiments are sufficient, or at the very least, that this list empowers reviewers to point out where more work is needed.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S10).

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. Indeed they often do (see pie charts in Figures 6, 7, 8). This method also provides evidence supporting each cluster’s differing fitness tradeoffs.

      For example, mutants in cluster 5 appear to have a tradeoff in a double drug condition (described above). They rarely originate from that evolution condition, unlike mutants in nearby cluster 4 (see Figure 7).

      (3) Mutants from each cluster often fall into different genes: In our original manuscript, we sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6).

      (4) Mutants from each cluster have behaviors previously observed in the literature: In our original manuscript, we compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 457 - 462). Previous work suggests that some mutations to PDR have different tradeoffs than others, which is what we see (lines 540 - 542). IRA1 mutants were previously observed to have high fitness in our “no drug” condition, and are found in the cluster that has the highest fitness in the “no drug” condition (lines 642 - 646). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 652 - 657).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods: In our original manuscript, we performed various different reclustering and/or normalization approaches on our data (Fig 6, S5, S7, S8, S9). The clusters of mutants that we observe in figure 4 do not change substantially when we recluster the data. We will add PCA (see above) to these analyses in our revised manuscript.

      (6) We will include additional data showing that mutants in different clusters have different evolutionary origins: Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole (see Fig 4E and Fig 5C). In our revised manuscript, we will show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see Author response image 2, panel A below). No other cluster’s evolutionary history shows this pattern (figures 6, 7, and 8).

      (7) We will include additional data showing that mutants in different clusters have different growth curves: Cluster 1 lineages are unique in that their fitness advantage is specific to low flu and trades off in higher concentrations of fluconazole. We obtained growth curves for three cluster 1 mutants (2 SUR1 mutants and 1 UPC2 mutant). We compared them to growth curves for three PDR mutants (from clusters 2 and 3). Cluster 1 mutants appear to have the highest growth rates and reach the highest carrying capacity in low fluconazole (see red and green lines in Author response image 2 panel B below). But the cluster 1 mutants are negatively affected by higher concentrations of fluconazole, much more so than the mutants from clusters 2 and 3 (see Author response image 2 panel C below). This is consistent with the different fitness tradeoffs we observe for each cluster (figures 4 and 5). We will include a more detailed version of this analysis and the figures below in our revised manuscript.

      Author response image 2.

      Validation experiments demonstrate that cluster 1 mutants have uniquely high fitness in only the lowest concentration of fluconazole. (A) The mutant lineages in cluster 1 were largely sampled from evolution experiments performed in low flu. This is not true of other clusters (see pie charts in main manuscript). (B) In low flu (4 𝜇g/ml), Cluster 1 lineages (red/UPC2 and green/SUR1) grow faster and achieve higher density than lineages from clusters 2 and 3 (blue/PDR). This is consistent with barseq measurements demonstrating that cluster 1 mutants have the highest fitness in low flu. (C) Cluster 1 lineages are sensitive to increasing flu concentrations (SUR1 and UPC2 mutants, middle and rightmost graphs). This is apparent in that the gray (8 𝜇g/ml flu) and light blue (32 𝜇g/ml flu) growth curves rise more slowly and reach lower density than the dark blue curves (4 𝜇g/ml flu). But this is not the case for the PDR mutants from clusters 2 and 3 (leftmost graph). These observations are consistent with the bar-seq fitness data presented in the main manuscript (Fig 4E).

      With all of these validation efforts combined, we are hopeful that the reviewer is now more convinced that our clusters capture groups of mutants with different fitness tradeoffs (as opposed to noise). We want to conclude by saying that we are grateful to the reviewer for making us think deeply about areas where we can include additional validation efforts as well as areas where we can make our manuscript clearer.

      Reviewer #2 (Public Review):

      Summary:

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping.

      Strengths:

      There are multiple strengths to this paper, the first of which is how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates that are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory).

      We are very grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.

      Weaknesses:

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one!

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think:

      We think that phrasing the “jump” as a question might help lay readers get from point A to point B. So, in the introduction of our revised manuscript, we will add a paragraph roughly similar to this one: “If two groups of drug-resistant mutants have different fitness tradeoffs, does it mean that they provide resistance through different underlying mechanisms? Alternatively, it could mean that both provide drug resistance via the same mechanism, but some mutations come with a cost that others don’t pay. However, another way to phrase this alternative is to say that both groups of mutants affect fitness through different suites of mechanisms that are only partially overlapping. And so, by identifying groups of mutants with different fitness tradeoffs, we argue that we will be uncovering sets of mutations that impact fitness through different underlying mechanisms. The ability to do so would be useful for genotype-phenotype mapping endeavors.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm.

      In our revised manuscript, we will carefully review all citations. The issue may stem from our attempt to reach two different groups of scientists. We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). Though the 3 papers the reviewer mentions on lines 132 - 133 all pertain to yeast, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should apply broadly, beyond yeast. Similarly, the reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the genotype-phenotype-fitness map should apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So we cited papers from across the tree of life to support this sentence.

      On the other hand, because we study drug resistant mutations, we also hope that our work is of use to scientists studying the evolution of resistance. We agree with the reviewer that in this regard, some of our findings may be especially pertinent to the evolution of resistance to antifungal drugs. We will consider this when reviewing the citations in our revised manuscript and add some text to clarify these points.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae).

      In the revised manuscript, we will make clear that we study S. cerevisiae.

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly?

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance. Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper.

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections?

      Perhaps because our background lies in general study of the genotype-phenotype map, we did not want to make bold assertions about how our work might apply to pathogenic yeasts. But we see how this could be helpful and will add some discussion points about this. Specifically, we will discuss which of the genes and mutants we observe are also found in Candida. We will also investigate whether our observation that low fluconazole represents a seemingly unique challenge, not just a milder version of high fluconazole, has any corollary in the Candida literature.

    1. Author Response

      Reviewer 1 (Public Review):

      1. With respect to the predictions, the authors propose that the subjects, depending on their linguistic background and the length of the tone in a trial, can put forward one or two predictions. The first is a short-term prediction based on the statistics of the previous stimuli and identical for both groups (i.e. short tones are expected after long tones and vice versa). The second is a long-term prediction based on their linguistic background. According to the authors, after a short tone, Basque speakers will predict the beginning of a new phrasal chunk, and Spanish speakers will predict it after a long tone.

      In this way, when a short tone is omitted, Basque speakers would experience the violation of only one prediction (i.e. the short-term prediction), but Spanish speakers will experience the violation of two predictions (i.e. the short-term and long-term predictions), resulting in a higher amplitude MMN. The opposite would occur when a long tone is omitted. So, to recap, the authors propose that subjects will predict the alternation of tone durations (short-term predictions) and the beginning of new phrasal chunks (long-term predictions).

      The problem with this is that subjects are also likely to predict the completion of the current phrasal chunk. In speech, phrases are seldom left incomplete. In Spanish, it is very unlikely to hear a function-word that is not followed by a content-word (and the opposite happens in Basque). On the contrary, after the completion of a phrasal chunk, a speaker might stop talking and a silence might follow, instead of the beginning of a new phrasal chunk.

      Considering that the completion of a phrasal chunk is more likely than the beginning of a new one, the prior endowed to the participants by their linguistic background should make us expect a pattern of results actually opposite to the one reported here.

      Response: We acknowledge the plausibility of the hypothesis advanced by Reviewer #1. We would like to further clarify the rationale that led us to predict that the hypothesized long-term predictions should manifest at the onset of (and not within) a “phrasal chunk”. The hypothesis does not directly concern the probability of a short event to follow a long one (or the other way around), which to our knowledge has not been systematically quantified in previous cross-linguistic studies. Rather, it concerns how the auditory system forms higher-level auditory chunks based on the rhythmic properties of the native language, which is what the previous behavioral studies on perceptual grouping have addressed (e.g., Iversen 2008; Molnar et al. 2014; Molnar et al. 2016). When presented with sequences of two tones alternating in duration, Spanish speakers typically report perceiving the auditory stream as a repetition of short-long chunks separated by a pause, while speakers of Basque usually report the opposite long-short grouping bias. These results suggest that the auditory system performs a chunking operation by grouping pairs of tones into compressed, higher-level auditory units (often perceived as a single event). The way two constituent tones are combined depends on linguistic experience. Based on this background, we hypothesized the presence of (i) a short-term system that merely encodes a repetition of alternations rule and predicts transitions from one constituent tone to the other (a → b → a → b, etc.); (ii) a long-term system that encodes a repetition of concatenated alternations rule and predicts transitions from one high-level unit to the other (ab → ab, etc.). Under this view, we expect predictions based on the long-term system to be stronger at the onset of (rather than within) high-level units and therefore omissions of the first constituent tone to elicit larger responses than omissions of the second constituent tone.

      In other words, the omission of the onset tone would reflect the omission of the whole chunk. On the other hand, the omission of the internal tone would be better handled by the short-term system, involved in processing the low-level structure of our sequences.

      A similar concern was also raised by Reviewer #2. We will include the views proposed by Reviewers #1 and #2 in the updated version of the manuscript.

      1. The authors report an interaction effect that modulates the amplitude of the omission response, but caveats make the interpretation of this effect somewhat uncertain. The authors report a widespread omission response, which resembles the classical mismatch response (in MEG) with strong activations in sensors over temporal regions. Instead, the interaction found is circumscribed to four sensors that do not overlap with the peaks of activation of the omission response.

      Response: We appreciate that all three reviewers agreed on the robustness of the data analysis pipeline. The approach employed to identify the presence of an interaction effect was indeed conservative: it used a non-parametric cluster-based test on combined-gradiometer data, made no a priori assumptions regarding the location of the effect, and used small cluster thresholds (cfg.clusteralpha = 0.05) to enhance the likelihood of detecting highly localized clusters with large effect sizes. This approach led to the identification of the cluster illustrated in Figure 2c, where the interaction effect is evident. The fact that this interaction effect arises in a relatively small cluster of sensors does not alter its statistical robustness. The partial overlap of the cluster with the activation peaks might simply reflect the fact that distinct sources contribute to the generation of the omission MMN, as demonstrated in numerous prior studies (e.g., Zhang et al., 2018; Ross & Hamm, 2020).
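      To make the logic of a cluster-based permutation test concrete, here is a simplified sketch, not the FieldTrip pipeline used in the study. Assumptions to note: clusters are formed along a single dimension (e.g. time at one sensor) rather than over sensor neighbourhoods; the cluster-forming threshold `thresh` is given directly as a t value, whereas FieldTrip's cfg.clusteralpha specifies it as a p-value; and a group × omission-type interaction is tested as a between-group difference on within-subject difference scores (onset minus internal omission) by permuting group labels. All function names are hypothetical.

```python
import numpy as np

def tstat_ind(a, b):
    """Pooled-variance two-sample t statistic, computed column by column."""
    na, nb = a.shape[0], b.shape[0]
    va, vb = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))

def cluster_masses(t, thresh):
    """Sum of |t| over contiguous supra-threshold runs (thresh > 0),
    collecting positive and negative clusters separately."""
    masses = []
    for sign in (1.0, -1.0):
        run = 0.0
        for val in sign * t:
            if val > thresh:
                run += val
            elif run > 0.0:
                masses.append(run)
                run = 0.0
        if run > 0.0:
            masses.append(run)
    return masses

def cluster_perm_test(a, b, thresh=2.0, n_perm=1000, seed=0):
    """a, b: (subjects x points) difference scores for the two groups.
    Each observed cluster is compared with the null distribution of the
    largest cluster mass obtained after shuffling group labels."""
    rng = np.random.default_rng(seed)
    observed = cluster_masses(tstat_ind(a, b), thresh)
    pooled, na = np.vstack([a, b]), a.shape[0]
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        perm = cluster_masses(tstat_ind(pooled[idx[:na]], pooled[idx[na:]]), thresh)
        null[i] = max(perm, default=0.0)
    pvals = [(np.sum(null >= m) + 1) / (n_perm + 1) for m in observed]
    return observed, pvals
```

      Because each cluster is compared against the maximum cluster mass under permutation, the family-wise error rate is controlled across all points, whatever the size of the cluster.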

      Furthermore, the boxplot in Figure 2E suggests that part of the interaction effect might be due to the presence of two outliers (if removed, the effect is no longer significant). Overall, it is possible that the reported interaction is driven by a main effect of omission type which the authors report, and find consistently only in the Basque group (showing a higher amplitude omission response for long tones than for short tones). Because of these points, it is difficult to interpret this interaction as a modulation of the omission response.

      Response: The two participants mentioned by Reviewer #1, despite being somewhat distant from the rest of the group, are not outliers according to Tukey’s standard rule. As shown in Author response image 1 below, no participant fell outside the upper (Q3 + 1.5 × IQR) or lower (Q1 - 1.5 × IQR) whiskers of the boxplot.

      Author response image 1.

      The presence of a main effect of omission type does not impact the interpretation of the interaction, especially considering that these effects emerge over distinct clusters of channels.

      The code to generate Author response image 1 and the corresponding statistics have been added to the script “analysis_interaction_data.R” in the OSF folder (https://osf.io/6jep8/).
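      For readers without R, the outlier criterion can be illustrated with a short Python sketch. This is not the authors' "analysis_interaction_data.R" script; the function name and the example scores are hypothetical. Quartile conventions differ slightly between packages (NumPy's default linear interpolation vs. the hinges used by some boxplot routines), so borderline points may be classified differently.

```python
import numpy as np

def tukey_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # NumPy default: linear interpolation
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (x < lower) | (x > upper)

# Hypothetical scores, for illustration only (not the study's data).
scores = [0.1, 0.3, 0.2, 0.4, 0.25, 0.35, 0.15, 1.9]
print(tukey_outliers(scores))
```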

      It should also be noted that in the source analysis, the interaction only showed a trend in the left auditory cortex, but in its current version the manuscript does not report the statistics of such a trend.

      Response: Our interpretation of the results of the present study is mainly driven by the effect observed in the sensor-level data, which is statistically robust. The source modeling analyses (in non-invasive electrophysiology) provide a possible model of the candidate brain sources driving the effect observed at the sensor level. The source showing the interaction effect in our study is the left auditory cortex. More details and statistics will be provided in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      1. Despite the evidence provided on neural responses, the main conclusion of the study reflects a known behavioral effect on rhythmic sequence perceptual organization driven by linguistic background (Molnar et al. 2016, particularly). Also, the authors themselves provide a good review of the literature that evidences the influence of long-term priors in neural responses related to predictive activity. Thus, in my opinion, the strength of the statements the authors make on the novelty of the findings may be a bit far-fetched in some instances.

      Response: We will consider the suggestion of Reviewer #2 for the new version of the manuscript. Overall, we believe that the novelty of the current study lies in bridging findings from two research fields, basic auditory neuroscience and cross-linguistic research, to provide evidence for a predictive coding model in the auditory system that uses long-term priors to make perceptual inferences.

      1. Albeit the paradigm is well designed, I fail to see the grounding of the hypotheses laid by the authors as framed under the predictive coding perspective. The study assumes that responses to an omission at the beginning of a perceptual rhythmic pattern will be stronger than at the end. I feel this is unjustified. If anything, omission responses should be larger when the gap occurs at the end of the pattern, as that would be where stronger expectations are placed: if in my language a short sound occurs after a long one, and I perceptually group tone sequences of alternating tone duration accordingly, when I hear a short sound I will expect a long one following; but after a long one, I don't necessarily need to expect a short one, as something else might occur.

      Response: A similar point was advanced by Reviewer #1. We tried to clarify our hypothesis (see above). We will consider including this interpretation in the updated version of the manuscript.

      1. In this regard, it is my opinion that what is reflected in the data may be better accounted for (or at least, additionally) by a different neural response to an omission depending on the phase of an underlying attentional rhythm (in terms of Large and Jones rhythmic attention theory, for instance) and putative underlying entrained oscillatory neural activity (in terms of Lakatos' studies, for instance). Certainly, the fact that the aligned phase may differ depending on linguistic background is very interesting and would reflect the known behavioral effect.

      Response: We thank the reviewer for this comment, which is indeed very pertinent. Below are some comments highlighting our thoughts on this.

      1) We will explore in more detail the possibility that the aligned phase may differ depending on linguistic background, which is indeed very interesting. However, we believe that even if a phase modulation by language experience is found, it would not negate the possibility that the group differences in the MMN are driven by different long-term predictions. Rather, since the hypothesized phase differences would be driven by long-term linguistic experience, phase entrainment may reflect a mechanism through which long-term predictions are carried. On this point, we agree with the Reviewer that “this view would not change the impact of the results but add depth to their interpretation”.

      2) Related to the point above: although evoked responses and oscillations are often considered distinct electrophysiological phenomena, current evidence suggests that they are interconnected (e.g., Studenova et al., 2023). In our view, the hypotheses that the MMN reflects differences in phase alignment and that it reflects long-term prediction errors are not mutually exclusive.

      3) Despite the plausibility of the view proposed by Reviewer #2, many studies in the auditory neuroscience literature consider the MMN an index of prediction error (e.g., Bendixen et al., 2012; Heilbron and Chait, 2018). There are good reasons to believe that in our study, too, the MMN reflects, at least in part, an error response.

      In the updated version of the manuscript, we will include a paragraph discussing the possibility that the reported group differences in the omission MMN might be partially accounted for by differences in neural entrainment to the rhythmic sound sequences.

      Reviewer #3 (Public Review):

      The main weaknesses are the strength of the effects and generalisability. The sample size is also relatively small by today's standards, with N=20 in each group. Furthermore, the crucial effects are mostly in the .01 < P < .05 range, such as the crucial interaction (P = .03). It would be nice to see it replicated in the future, with more participants and other languages. It would also have been nice to see behavioural data that could be correlated with neural data to better understand the real-world consequences of the effect.

      Response: We appreciate the positive feedback from Reviewer #3. Concerning the weakness highlighted: we agree with Reviewer #3 that it would be nice to see this study replicated in the future with larger sample sizes and a behavioral counterpart. Overall, we hope this work will lead to more studies using cross-linguistic/cultural comparisons to assess the effect of experience on neural processing. In the context of the present study, we believe that the lack of behavioral data does not undermine the main findings, given the careful selection of the participants and the well-known robustness of the perceptual grouping effect (e.g., Iversen et al., 2008; Yoshida et al., 2010; Molnar et al., 2014; Molnar et al., 2016). As highlighted by Reviewer #2, having Spanish- and Basque-dominant “speakers as a sample equates that in Molnar et al. (2016), and thus overcomes the lack of direct behavioral evidence for a difference in rhythmic grouping across linguistic groups. Molnar et al. (2016)'s evidence on the behavioral effect is compelling, and the evidence on neural signatures provided by the present study aligns with it.”

      References

      1. Bendixen, A., SanMiguel, I., & Schröger, E. (2012). Early electrophysiological indicators for predictive processing in audition: a review. International Journal of Psychophysiology, 83(2), 120-131.

      2. Heilbron, M., & Chait, M. (2018). Great expectations: is there evidence for predictive coding in auditory cortex?. Neuroscience, 389, 54-73.

      3. Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. The Journal of the Acoustical Society of America, 124(4), 2263-2271.

      4. Molnar, M., Lallier, M., & Carreiras, M. (2014). The amount of language exposure determines nonlinguistic tone grouping biases in infants from a bilingual environment. Language Learning, 64(s2), 45-64.

      5. Molnar, M., Carreiras, M., & Gervain, J. (2016). Language dominance shapes non-linguistic rhythmic grouping in bilinguals. Cognition, 152, 150-159.

      6. Ross, J. M., & Hamm, J. P. (2020). Cortical microcircuit mechanisms of mismatch negativity and its underlying subcomponents. Frontiers in Neural Circuits, 14, 13.

      7. Simon, J., Balla, V., & Winkler, I. (2019). Temporal boundary of auditory event formation: An electrophysiological marker. International Journal of Psychophysiology, 140, 53-61.

      8. Studenova, A. A., Forster, C., Engemann, D. A., Hensch, T., Sander, C., Mauche, N., ... & Nikulin, V. V. (2023). Event-related modulation of alpha rhythm explains the auditory P300 evoked response in EEG. bioRxiv, 2023-02.

      9. Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition, 115(2), 356-361.

      10. Zhang, Y., Yan, F., Wang, L., Wang, Y., Wang, C., Wang, Q., & Huang, L. (2018). Cortical areas associated with mismatch negativity: A connectivity study using propofol anesthesia. Frontiers in Human Neuroscience, 12, 392.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study presents a new Bayesian approach to estimate importation probabilities of malaria, combining epidemiological data, travel history, and genetic data through pairwise IBD estimates. Importation is an important factor challenging malaria elimination, especially in low-transmission settings. This paper focuses on Magude and Matutuine, two districts in southern Mozambique with very low malaria transmission. The results show isolation-by-distance in Mozambique, with genetic relatedness decreasing at distances larger than 100 km, no spatial correlation for distances between 10 and 100 km, and, again, a strong spatial correlation at distances smaller than 10 km. They report high genetic relatedness between Matutuine and Inhambane, higher than between Matutuine and Magude. Inhambane is the main source of importation in Matutuine, accounting for 63.5% of imported cases. Magude, on the other hand, shows lower importation and travel rates than Matutuine, as it is a rural area with less mobility. Additionally, they report higher levels of importation and travel in the dry season, when transmission is lower. No association with importation was found for occupation, sex, and other factors. These data have practical implications for public health strategies aiming for malaria elimination, for example, testing and treating travelers from Matutuine in the dry season.

      Strengths:

      The strength of this study lies in the combination of different sources of data - epidemiological, travel, and genetic data - to estimate importation probabilities, and the statistical analyses.

      Weaknesses:

      The authors recognize the limitations related to sample size and the biases of travel reports.

      Thank you for your review and consideration. As mentioned, we state the limitations related to sample sizes and travel reports in the manuscript. We plan to continue this study with new prospective data to address these limitations.

      Reviewer #2 (Public review):

      Summary:

      Based on a detailed dataset, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired.

      Strengths:

      The proposed Bayesian approach for case classification is simple, well justified, and allows the integration of parasite genomics, travel history, and epidemiological data. The work is well-written, very organized, and brings important contributions both to malaria control efforts in Mozambique and to the scientific community. Understanding the origin of cases is essential for designing more effective control measures and elimination strategies.

      Weakness:

      While the authors aim to classify cases as imported or locally acquired, the work lacks a quantification of the contribution of each case type to overall transmission.

      The Bayesian rationale is sound and well justified; however, the formulation appears to present an inconsistency that is replicated in both the main text and the Supplementary Material.

      In fact, one of the questions that remains unanswered is the overall contribution of importation events to transmission in the areas. While the Bayesian classifier does not quantify this, our future analysis will focus on combining outbreak detection, genetic clustering and importation classification to quantify the contribution of imported cases to outbreak resurgence and to the overall transmission.

      Thank you for pointing out the inconsistency in the final formula. In fact, the final formula corresponds to P(I<sub>A</sub> | G) instead of P(I<sub>A</sub>), so:

      instead of

      We will correct this error in a new version of the manuscript.
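      The corrected formula itself is not reproduced in this response, but the distinction at stake is between the prior P(I<sub>A</sub>) and the posterior P(I<sub>A</sub> | G) obtained after conditioning on the genetic data G. The sketch below illustrates that step with Bayes' rule for a binary hypothesis (imported vs. locally acquired); it is not the manuscript's classifier, and the function and argument names are hypothetical. It assumes the prior already summarises travel and epidemiological information and that the two genetic likelihoods are supplied as inputs.

```python
def posterior_importation(prior_import, lik_g_if_imported, lik_g_if_local):
    """Bayes' rule for two hypotheses (hypothetical helper).

    prior_import      : P(I_A), assumed to come from travel/epidemiological data
    lik_g_if_imported : P(G | imported), likelihood of the genetic data
    lik_g_if_local    : P(G | local)
    Returns P(I_A | G).
    """
    if not 0.0 <= prior_import <= 1.0:
        raise ValueError("prior_import must be a probability")
    numerator = lik_g_if_imported * prior_import
    evidence = numerator + lik_g_if_local * (1.0 - prior_import)  # P(G)
    if evidence == 0.0:
        raise ValueError("the genetic data have zero probability under both hypotheses")
    return numerator / evidence

# Hypothetical numbers: a 30% prior, with genetic data 4x more likely if imported.
print(posterior_importation(0.3, 0.8, 0.2))
```

      When the genetic data are equally likely under both hypotheses, the posterior equals the prior, which is a quick check that the two quantities are not being conflated.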

      Reviewer #3 (Public review):

      The authors present an important approach to identify imported P. falciparum malaria cases, combining genetic and epidemiological/travel data. This tool has the potential to be expanded to other contexts. The data was analyzed using convincing methods, including a novel statistical model; although some recognized limitations can be improved. This study will be of interest to researchers in public health and infectious diseases.

      Strengths:

      The study has several strengths, mainly the development of a novel Bayesian model that integrates genomic, epidemiological, and travel data to estimate importation probabilities. The results provide insights into malaria transmission dynamics, particularly by identifying importation sources and differences in importation rates in Mozambique. Finally, the findings are relevant because they suggest interventions focusing on the traveler population to support malaria elimination efforts.

      Weaknesses:

      The study also has some limitations. The sample collection was not representative of some provinces, and not all samples had sufficient metadata for risk factor analysis, which can also be affected by travel recall bias. Additionally, the authors used a proxy for transmission intensity and assumed some conditions for the genetic variable when calculating the importation probability for specific scenarios. The weaknesses were assessed by the authors.

      We acknowledge the limitations raised by the reviewer and plan to address them as follows. We will repeat the study with our data collected in 2023, which this time include a good representation of all the provinces of Mozambique; completeness of the metadata collection was ensured by implementing a new protocol in January 2023. Regarding the proxy for transmission intensity, we will refine the model by integrating monthly estimates of malaria incidence from the DHIS2 data (previously calibrated to address testing and reporting rates), also taking into account the date of the reported cases in the analysis.

    1. Author Response

      We are grateful to the editors for considering our manuscript and facilitating the peer review process. Importantly, we would like to express our gratitude to reviewers for their constructive comments. Given eLife’s publishing format, we provide an initial author response now, which will be followed by a revised manuscript in the near future. Please find our responses below.

      eLife Assessment

      This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors’ claims is solid, although the inclusion of 1) more diverse candidate computational models, 2) more systematic analysis of the temporal regularity effects on the model fit, and 3) tests on clinical samples would have strengthened the study. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.

      Thank you very much again for considering the manuscript and judging it as a valuable contribution to understanding mechanisms of pain perception. We recognise the above-mentioned points of improvement and elaborate on them in the initial response to the reviewers.

      Reviewer 1

      Reviewer Comment 1.1 — Selection of candidate computational models: While the paper juxtaposes the simple model-free RL model against a Kalman Filter model in the context of pain perception, the rationale behind this choice remains ambiguous. It prompts the question: could other RL-based models, such as model-based RL or hierarchical RL, offer additional insights? A more detailed explanation of their computational model selection would provide greater clarity and depth to the study.

      Thank you for this point. Our models were selected a priori, following the modelling strategy of Jepma et al. (2018), and hence considered the same set of core models to allow a clear extension of the analysis to our non-cue paradigm. The key question for us was whether expectations were used to weight the behavioural estimates, so our main interest was to compare expectation-weighted vs. non-expectation-weighted models.

      Model-based and hierarchical RL are very broad terms that can refer to many different models, and we are not sure which specific models the reviewer is referring to. Our Bayesian models are generative models, i.e. they learn the generative statistics of the environment (which is characterised by inherent stochasticity and volatility) and hence carry out model-based analyses of the stimulus dynamics. In our case, this was done hierarchically and combined with a simple RL rule.

      Reviewer Comment 1.2 — Effects of varying levels of volatility and stochasticity: The study commendably integrates varying levels of volatility and stochasticity into its experimental design. However, the depth of analysis concerning the effects of these variables on model fit appears shallow. A looming concern is whether the superior performance of the expectation-weighted Kalman Filter model might be a natural outcome of the experimental design. While the non-significant difference between eKF and eRL for the high stochasticity condition somewhat alleviates this concern, it raises another query: Would a more granular analysis of volatility and stochasticity effects reveal fine-grained model fit patterns?

      We are sorry that the reviewer finds “the depth of analysis concerning the effects of these variables on model fit” shallow. We are not sure which analysis the reviewer has in mind when suggesting a “more granular analysis of volatility and stochasticity effects” to “reveal fine-grained model fit patterns”. Therefore, we find it difficult to improve our manuscript in this regard. We are happy to add analyses to our paper, but we would be grateful for some specific pointers. We have already provided:

      • Analysis of model-naive performance across different levels of stochasticity and volatility (section 2.3, figure 3, supplementary information section 1.1 and tables S1-2)

      • Model fitting for each stochasticity/volatility condition (section 2.4.1, figure 4, supplementary table S5)

      • Group-level and individual-level differences of each model parameter across stochasticity/volatility conditions (supplementary information section 7, figures S4-S5).

      • Effect of confidence on scaling factor for each stochasticity/volatility condition (figure 5)

      Reviewer Comment 1.3 — Rating instruction: According to Fig. 1A, participants were prompted to rate their responses to the question, “How much pain DID you just feel?” and to specify their confidence level regarding their pain. It is difficult for me to understand the meaning of confidence in this context, given that they were asked to report their subjective feelings. It might have been better to query participants about perceived stimulus intensity levels. This perspective is seemingly echoed in lines 100-101, “the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli.”

      Thank you for raising this question, which allows us to clarify our paradigm. On half of the trials, participants were asked to report the perceived intensity of the previous stimulus; on the remaining trials, participants were asked to predict the intensity of the next stimulus. Therefore, we did query “participants about perceived stimulus intensity levels”, as described at lines 49-55 and 296-303 and depicted in figure 1.

      The confidence refers to the level of confidence that participants have regarding their rating, i.e. how sure they are. It is collected in addition to their perceived stimulus intensity, and such confidence ratings have been used in a large body of previous studies across sensory modalities.

      Reviewer Comment 1.4 — Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain. Notably, their initial preprint seemed to encompass data from a clinical sample (https://www.medrxiv.org/content/10.1101/2023.03.23.23287656v1), which, for reasons unexplained, has been omitted in the current version. Clarification on this discrepancy would be instrumental in discerning the true relevance of the study’s findings to clinical pain scenarios.

      The preprint that the Reviewer is referring to was an older version of the manuscript in which we combined two different experiments, which were initially born as separate studies: the one that we submitted to eLife (done in the lab, with noxious stimuli in healthy participants) and an online study with a different statistical learning paradigm (without noxious stimuli, in chronic back pain participants). Unfortunately, the paradigms were different and not directly comparable. Indeed, following submission to a different journal, the manuscript was criticised for this reason. We therefore split the paper in two, and submitted the first study to eLife. We are now planning to perform the same lab-based experiment with noxious stimuli on chronic back pain participants. Progress on this front has been slowed down by the fact that I (Flavia Mancini) am on maternity leave, but it remains top priority once back to work.

      Reviewer Comment 1.5 — Paper organization: The paper’s organization appears a little bit weird, possibly due to the removal of significant content from their initial preprint. Sections 2.1-2.2 and 2.4 seem more suitable for the Methods section, while 2.3 and 2.4.1 are the only parts that present results. In addition, enhancing clarity through graphical diagrams, especially for the experimental design and computational models, would be quite beneficial. A reference point could be Fig. 1 and Fig. 5 from Jepma et al. (2018), which similarly explored RL and KF models.

      Thank you for these suggestions. We will consider restructuring the paper in the revised version.

      Reviewer 2

      Reviewer Comment 2.1 — This is a highly interesting and novel finding with potential implications for the understanding and treatment of chronic pain where pain regulation is deficient. The paradigm is clear, the analysis is state-of-the-art, the results are convincing, and the interpretation is adequate.

      Thank you very much for these positive comments.

      Reviewer 3

      We are really grateful for the reviewer’s insightful comments and for the useful guidance regarding our methodology. We also thank the reviewer for highlighting the strengths of our manuscript. Below we respond to each weakness mentioned in the review report.

      Reviewer Comment 3.1 — In Figure 1, panel C, the authors illustrate the stimulation intensity, perceived intensity, and prediction intensity on the same scale, facilitating a more direct comparison. It appears that the stimulation intensity has been mathematically transformed to fit a scale from 0 to 100, aligning it with the intensity ratings corresponding to either past or future stimuli. Given that the pain threshold is specifically marked at 50 on this scale, one could logically infer that all ratings falling below this value should be deemed non-painful. However, I find myself uncertain about this interpretation, especially in relation to the term “arbitrary units” used in the figure. I would greatly appreciate clarification on how to accurately interpret these units, as well as an explanation of the relationship between these values and the definition of pain threshold in this experiment.

      Indeed, as detailed in Methods section 4.3, the stimulation intensity was originally transformed from the 1-13 scale to the 0-100 scale to match the scales in the participant response screens. Following the method used to establish the pain threshold, we set stimulus intensity 7 on the original 1-13 scale as the threshold. However, during the rating part of the experiment, several participants never or very rarely selected a value above 50 (their individually defined pain threshold), despite having previously indicated, during the pain threshold procedure, the moment at which a stimulus becomes painful. As a result, for these participants both the rescaled intensity values and the perception ratings, on the same 0-100 scale of arbitrary units, never go above the pain threshold. Please see all participant ratings and inputs in the Figure below. We agree that it would be more illustrative to re-plot Figure 1 with a different exemplary participant whose ratings go above the pain threshold, perhaps with the input intensity on the 1-13 scale on an additional right-hand y-axis. We will add this in the revised version and will also highlight the point above.

      Importantly, while values below 50 are deemed non-painful by participants, the thermal stimulation still activates C-fibres involved in nociception, and we would argue that the modelling framework and analysis still applies in this case.
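      As a minimal sketch of the rescaling described above, assuming a simple linear mapping of the 1-13 levels onto 0-100 (the exact transform is the one given in Methods section 4.3; the function name is hypothetical), the threshold level 7 lands exactly at 50:

```python
def to_0_100(level, lo=1, hi=13):
    """Hypothetical linear rescaling of an intensity level from lo..hi to 0..100."""
    if not lo <= level <= hi:
        raise ValueError(f"level must lie in [{lo}, {hi}]")
    return (level - lo) / (hi - lo) * 100.0

print(to_0_100(7))  # pain-threshold level on the original scale
```

      Under this assumption each step on the original scale corresponds to 100/12 ≈ 8.3 arbitrary units, so a participant who never rates above 50 is reporting intensities at or below the level they identified as the pain threshold.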

      Reviewer Comment 3.2 — The method of generating fluctuations in stimulation temperatures, along with the handling of perceptual uncertainty in modelling, requires further elucidation. The current models appear to presume that participants perceive each stimulus accurately, introducing noise only at the response stage. This assumption may fail to capture the inherent uncertainty in the perception of each stimulus intensity, especially when differences in consecutive temperatures are as minimal as 1°C.

      We agree with the reviewer that there are multiple sources of uncertainty involved in rating the intensity of thermal stimuli, including perceptual uncertainty. To include an account of inaccurate perception, one would have to consider the different sources that contribute to it, of which there may be many. In our approach, we consider one, which is captured in the expectation-weighted models and most clearly exemplified in the expectation-weighted Kalman-Filter model (eKF). The model assumes that participants’ perception of the input is an imperfect indicator of the true level of pain. In this case, perception is corrupted by the expectation participants hold about the upcoming stimuli. The extent of this effect is partly governed by a subjective level of noise ϵ, which may also subsume other sources of uncertainty beyond the expectation effect. Moreover, the response noise ξ could subsume any other unexplained sources of noise.
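      To illustrate how a percept can be pulled toward the expectation, here is a generic scalar Kalman-filter sketch. It is not the eKF as specified in the manuscript: the random-walk assumption, the process-noise variance `q` (standing in for volatility), and all names are hypothetical; `eps` and `xi` only play roles analogous to the noise terms ϵ and ξ described above.

```python
import numpy as np

def simulate_expectation_weighted_ratings(inputs, eps=5.0, q=1.0, xi=2.0,
                                          mu0=50.0, var0=100.0, seed=0):
    """Hypothetical scalar Kalman-filter sketch (not the authors' eKF).

    inputs : stimulus intensities on the 0-100 scale
    eps    : SD of perceptual noise (analogous to the subjective noise ϵ)
    q      : process-noise variance, an assumed stand-in for volatility
    xi     : SD of response noise (analogous to ξ)
    """
    rng = np.random.default_rng(seed)
    mu, var = mu0, var0                        # current expectation and its uncertainty
    ratings = []
    for x in inputs:
        var_pred = var + q                     # uncertainty grows between trials
        k = var_pred / (var_pred + eps ** 2)   # Kalman gain: weight on the input
        percept = mu + k * (x - mu)            # percept is pulled toward the expectation
        ratings.append(percept + rng.normal(0.0, xi))  # add response noise
        mu, var = percept, (1.0 - k) * var_pred        # posterior becomes next prior
    return np.array(ratings)
```

      With small perceptual noise the gain approaches 1 and ratings track the inputs; as `eps` grows, the gain shrinks and the percept sits closer to the running expectation.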

      Author response image 1.

      Stimulus intensity transformation

      Reviewer Comment 3.3 — A key conclusion drawn is that eKF is a better model than eRL. However, a closer examination of the results reveals that the two models behave very similarly, and it is not clear that they can be readily distinguished based on model recovery and model comparison results.

      While the eKF appears to rank higher than the eRL in terms of LOOIC and sigma effects, we do not wish to make sweeping statements regarding the significance of differences between eRL and eKF, but merely point to the trend in the data. We shall make this clearer in the revised version of the manuscript. However, the most important result is that the models involving expectation weighting arguably capture the data better.

      Reviewer Comment 3.4 — Regarding model recovery, the distinction between the eKF and eRL models seems blurred. When the simulation is based on the eKF, there is no ability to distinguish whether either eKF or eRL is better. When the simulation is based on the eRL, the eRL appears to be the best model, but the difference with eKF is small. This raises a few more questions. What is the range of the parameters used for the simulations?

      We agree that the distinction between eKF and eRL in the model recovery is not that clean-cut, which may in turn point to the similarity between the two models. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values.
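      The sampling step described above can be sketched as follows. This is an illustration, not the authors' code: it assumes the group means and standard deviations are defined on an unconstrained scale, with a logistic transform for parameters bounded in (0, 1) (such as a learning rate) and an exponential transform for positive parameters (such as noise SDs). The parameter names and the function are hypothetical. Reporting the resulting ranges in this way is one route to the information requested in Comment 3.7.

```python
import numpy as np

def sample_individual_params(group_means, group_sds, n_subjects,
                             unit_interval=(), positive=(), seed=0):
    """Draw per-subject parameters from group-level normal distributions
    (hypothetical helper; transforms are an assumption, see lead-in)."""
    rng = np.random.default_rng(seed)
    params = {}
    for name, mean in group_means.items():
        raw = rng.normal(mean, group_sds[name], size=n_subjects)
        if name in unit_interval:
            params[name] = 1.0 / (1.0 + np.exp(-raw))  # logistic -> (0, 1)
        elif name in positive:
            params[name] = np.exp(raw)                 # exp -> (0, inf)
        else:
            params[name] = raw
    return params

# Hypothetical group-level values, for illustration only.
p = sample_individual_params({"alpha": 0.0, "eps": 1.0}, {"alpha": 1.0, "eps": 0.5},
                             n_subjects=200, unit_interval=("alpha",), positive=("eps",))
```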

      Reviewer Comment 3.5 — Is it possible that either eRL or eKF are best when different parameters are simulated? Additionally, increasing the number of simulations to at least 100 could provide more convincing model recovery results.

      This is possible, but it would require further investigation, comparing fits across different bins/ranges of parameters to see whether either model has a consistent advantage in each bin. We will consider adding this analysis, and we will run an additional 50 simulations to paint a more convincing picture.
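      The binned comparison could look roughly like the following minimal sketch. It assumes that each simulated dataset comes with the ground-truth value of one parameter and the LOOIC of both models refitted to that dataset (lower LOOIC is better); the function name, bin edges and data layout are hypothetical.

```python
import bisect
from collections import Counter, defaultdict


def wins_by_bin(true_values, looic_ekf, looic_erl, edges):
    """Count, per parameter bin, how often each model has the lower (better) LOOIC.

    true_values: simulated (ground-truth) value of one parameter per dataset.
    looic_ekf, looic_erl: LOOIC of each model fitted to the same simulated datasets.
    edges: sorted bin boundaries; bin i holds values with edges[i-1] <= v < edges[i].
    """
    tallies = defaultdict(Counter)
    for value, ekf, erl in zip(true_values, looic_ekf, looic_erl):
        bin_index = bisect.bisect_right(edges, value)
        if ekf < erl:
            winner = "eKF"
        elif erl < ekf:
            winner = "eRL"
        else:
            winner = "tie"
        tallies[bin_index][winner] += 1
    return {k: dict(v) for k, v in sorted(tallies.items())}
```

A consistent advantage would show up as one model winning most datasets within a given bin, rather than only on aggregate.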

      Reviewer Comment 3.6 — Regarding model comparison, the authors reported that “the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity, this difference was short of significance against the eRL model.” This interpretation is based on a significance test that hinges on the ratio between the ELPD and the surrounding standard error (SE). Unfortunately, there’s no agreed-upon threshold of SEs that determines significance, but a general guideline is to consider “several SEs,” with a higher number typically viewed as more robust. However, the text lacks clarity regarding the specific number of SEs applied in this test. At a cursory glance, it appears that the authors may have employed 2 SEs in their interpretation, while only depicting 1 SE in Figure 4.

      Indeed, we used a 2-sigma effect as the threshold; however, we recognise that there is no agreed-upon threshold, and in the revision we will make this, and our interpretation of the trend in the data, clearer.
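      To make the criterion concrete, here is a minimal sketch of the check: the difference in expected log predictive density (ELPD) between two models is divided by the standard error of that difference, and the ratio is compared with a chosen multiple of SEs (2 by default, matching the threshold described above). Because LOOIC = −2 × ELPD, the ratio is the same on either scale. The toy pointwise values and function names are illustrative assumptions; in practice the pointwise ELPD values would come from a PSIS-LOO computation.

```python
import math
from statistics import variance


def elpd_difference(pointwise_a, pointwise_b):
    """Return (elpd_diff, se_diff) for model A minus model B.

    pointwise_a, pointwise_b: per-observation ELPD contributions computed on the
    same observations. The SE uses the usual sqrt(n * var(diff)).
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("both models must be scored on the same observations")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    n = len(diffs)
    return sum(diffs), math.sqrt(n * variance(diffs))


def exceeds_threshold(elpd_diff, se_diff, k=2.0):
    """True if the absolute ELPD difference exceeds k standard errors."""
    return abs(elpd_diff) > k * se_diff


# Toy pointwise values for illustration only (not data from the study).
diff, se = elpd_difference([-1.0, -1.2, -0.8, -1.1], [-1.3, -1.2, -1.0, -1.4])
print(f"ELPD diff = {diff:.2f}, SE = {se:.2f}, ratio = {diff / se:.1f}")
print("beyond 2 SE:", exceeds_threshold(diff, se))
```

Changing `k` makes the chosen threshold explicit, which is the point the Reviewer raises.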

      Reviewer Comment 3.7 — With respect to parameter recovery, a few additional details could be included for completeness. Specifically, while the range of the learning rate is understandably confined between 0 and 1, the range of other simulated parameters, particularly those without clear boundaries, remains ambiguous. Including scatter plots with the simulated parameters on the x-axis and the recovered parameters on the y-axis would effectively convey this missing information. Furthermore, it would be beneficial for the authors to clarify whether the same priors were used for both the modelling results presented in the main paper and the parameter recovery presented in the supplementary material.

      Thank you for this comment and for the suggestions. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated from the participant data to sample individual parameter values. The priors on the group- and individual-level parameters in the recovery analysis were the same as in the fitting procedure. We will include the requested scatter plots in the next iteration of the manuscript.
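      The sampling step can be sketched as follows, assuming that each parameter is drawn from a normal distribution on an unconstrained scale (with SD equal to the square root of the group variance) and then transformed to its support: an inverse-logit for the learning rate, bounded in (0, 1), and an exponential for a positive noise parameter. The parameter names, group-level values and transforms are hypothetical placeholders; the manuscript's own parameterisation may differ. In a recovery analysis, each sampled parameter set would then be used to simulate a dataset that is refitted, and the simulated and recovered values plotted against each other.

```python
import math
import random


def inv_logit(x):
    """Map a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical parameters and transforms from the unconstrained scale to each support.
TRANSFORMS = {
    "learning_rate": inv_logit,  # bounded in (0, 1)
    "response_noise": math.exp,  # strictly positive
}


def sample_participants(group_mu, group_sd, n_participants, seed=0):
    """Draw individual parameters from group-level normals on the unconstrained scale."""
    rng = random.Random(seed)
    participants = []
    for _ in range(n_participants):
        params = {
            name: transform(rng.gauss(group_mu[name], group_sd[name]))
            for name, transform in TRANSFORMS.items()
        }
        participants.append(params)
    return participants


# Illustrative group-level values, not estimates from the study.
simulated = sample_participants(
    group_mu={"learning_rate": -0.5, "response_noise": 0.0},
    group_sd={"learning_rate": 1.0, "response_noise": 0.5},
    n_participants=3,
)
```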

      Reviewer Comment 3.8 — While the reliance on R-hat values for convergence in model fitting is standard, a more comprehensive assessment could include estimates of the effective sample size (bulk ESS and/or tail ESS) and the Energy Bayesian Fraction of Missing Information (E-BFMI), to show efficient sampling across the distribution. Consideration of divergences, if any, would further enhance the reliability of the results.

      Thank you very much for this suggestion; we will aim to include these measures in the revised version.
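      For reference, a minimal sketch of the E-BFMI computation for a single chain: the mean squared change in Hamiltonian energy between successive iterations divided by the variance of the energy. The 0.3 cut-off is a commonly used rule of thumb rather than a value taken from the manuscript; in practice, ArviZ's `arviz.bfmi` and `arviz.ess` compute these diagnostics directly from the sampler output.

```python
def e_bfmi(energy):
    """Energy Bayesian fraction of missing information for one chain."""
    n = len(energy)
    if n < 2:
        raise ValueError("need at least two iterations")
    mean_energy = sum(energy) / n
    # Mean squared change in energy between successive iterations.
    mean_sq_jump = sum((energy[i] - energy[i - 1]) ** 2 for i in range(1, n)) / (n - 1)
    # Marginal (population) variance of the energy.
    var_energy = sum((e - mean_energy) ** 2 for e in energy) / n
    if var_energy == 0:
        raise ValueError("energy is constant; E-BFMI is undefined")
    return mean_sq_jump / var_energy


def flag_low_e_bfmi(chains, threshold=0.3):
    """Indices of chains whose E-BFMI falls below the threshold."""
    return [i for i, chain in enumerate(chains) if e_bfmi(chain) < threshold]
```

A chain whose energy drifts slowly relative to its overall spread yields a low E-BFMI, signalling that the sampler explores the energy distribution inefficiently.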

      Reviewer Comment 3.9 — The authors write: “Going beyond conditioning paradigms based in cuing of pain outcomes, our findings offer a more accurate description of endogenous pain regulation.” Unfortunately, this statement isn’t substantiated by the results. The authors did not engage in a direct comparison between conditioning and sequence-based paradigms. Moreover, even if such a comparison had been made, it remains unclear what would constitute the gold standard for quantifying “endogenous pain regulation.”

      This is a valid point: we do not compare paradigms in our study, and we will remove this statement in the revised version.

    1. Author response:

      Reviewer #1 (Public Review):  

      Weaknesses:  

      The weakness of this study lies in the fact that many of the genomic datasets originated from novel methods that were not validated with orthogonal approaches, such as DNA-FISH. Therefore, the detailed correlations described in this work are based on methodologies whose efficacy is not clearly established. Specifically, the authors utilized two modified protocols of TSA-seq for the detection of NADs (MKI67IP TSA-seq) and LADs (LMNB1-TSA-seq). Although these methods have been described in a bioRxiv manuscript by Kumar et al., they have not yet been published. Moreover, and surprisingly, the work of Kumar et al. is not cited in the current manuscript, despite its use of all TSA-seq data for NADs and LADs across the four cell lines. Moreover, Kumar et al. did not provide any DNA-FISH validation for their methods. Therefore, the interesting correlations described in this work are not based on robust technologies.

      An attempt to validate the data was made for SON-TSA-seq of human foreskin fibroblasts (HFF) using multiplexed FISH data from IMR90 fibroblasts (from the lung) by the Zhuang lab (Su et al., 2020). However, the comparability of these datasets is questionable. It might have been more reasonable for the authors to conduct their analyses in IMR90 cells, thereby allowing them to utilize MERFISH data for validating the TSA-seq method and also for mapping NADs and LADs. 

      We disagree with the statement that the TSA-seq approach and data have not been validated by orthogonal approaches, and with the conclusion that the TSA-seq approach is not robust, as summarized here and detailed below in “Specific Comments”. TSA-seq is robust because it depends only on the original immunostaining specificity provided by the primary and secondary antibodies plus the diffusion properties of the tyramide free radical. TSA-seq has been extensively validated by microscopy and by the orthogonal genomic measurements provided by LMNB1 DamID and NAD-seq. This includes: a) the initial validation by FISH of both nuclear speckle (to an accuracy of ~50 nm) and nuclear lamina TSA-seq, and the cross-validation of nuclear lamina TSA-seq with lamin B1 DamID, in a first publication (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108); b) the further validation of SON TSA-seq by FISH in a second publication (Zhang et al, Genome Research 2021, doi:10.1101/gr.266239.120); c) the cross-validation of nucleolar TSA-seq using NAD-seq, and the light-microscopy validation of predictions, made from nuclear speckle, nucleolar, and pericentric heterochromatin TSA-seq, of differences in the relative distributions of centromeres, nuclear speckles, and nucleoli, in the Kumar et al, bioRxiv preprint (which is in a last revision stage involving additional formatting for the journal requirements) doi:https://doi.org/10.1101/2023.10.29.564613; d) the extensive validation of nuclear speckle, LMNB1, and nucleolar TSA-seq generated in HFF human fibroblasts using published light microscopy distance measurements of hundreds of probes generated by multiplexed immuno-FISH MERFISH data (Su et al, Cell 2020, https://doi.org/10.1016/j.cell.2020.07.032), as we described for nucleolar TSA-seq in the Kumar et al, bioRxiv preprint and to some extent for LMNB1 and SON TSA-seq in the current manuscript version (see Specific Comments with attached Author response image 2).

      Reviewer 1 raised concerns regarding this FISH validation, given that the HFF TSA-seq and DamID data were compared to IMR90 MERFISH measurements. The Su et al, Cell 2020 MERFISH paper came out well after the 4D Nucleome Consortium settled on HFF as one of the two main “Tier 1” cell lines. We reasoned that the nuclear genome organization in a second fibroblast cell line would be sufficiently similar to justify using IMR90 FISH data as a proxy in our analysis of our HFF data. Indeed, there is a high correlation between the HFF TSA-seq and distances measured by MERFISH to the nuclear lamina, nucleoli, and nuclear speckles (Author response image 1). Moreover, the HFF SON TSA-seq versus MERFISH scatterplot is very similar to the corresponding scatterplot for published IMR90 SON TSA-seq data (Alexander et al, Mol Cell 2021, doi.org/10.1016/j.molcel.2021.03.006). We acknowledge that the validation provided by the IMR90 MERFISH is limited by the degree to which genome organization relative to nuclear locales is similar in IMR90 and HFF fibroblasts. However, the correlation between measured microscopic distances from the nuclear lamina, nucleoli, and nuclear speckles and TSA-seq scores is already quite high. We anticipate that the conclusions drawn from such comparisons are solid and will only become stronger with future comparisons within the same cell line.

      Author response image 1.

      Scatterplots showing the correlation between TSA-seq and MERFISH microscopic distances. Top: IMR90 SON TSA-seq (from Alexander et al, Mol Cell 2021) (left) and HFF SON TSA-seq (right) (x-axis) versus distance to nuclear speckles (y-axis). Bottom: HFF Lamin B1 TSA-seq (x-axis) versus distance to nuclear lamina (y-axis) (left) and HFF MKI67IP (nucleolar) TSA-seq (x-axis) versus distance to nucleolus (y-axis) (right).

      In our revision, we will add justification for the use of IMR90 fibroblasts as a proxy for HFF fibroblasts through a comparison of available data sets.

      Reviewer #2 (Public Review):  

      Weaknesses:  

      The experiments are largely descriptive, and it is difficult to draw many cause-and-effect relationships. Similarly, the paper would be very much strengthened if the authors provided additional summary statements and interpretation of their results (especially for those not as familiar with 3D genome organization). The study would benefit from a clear and specific hypothesis.

      We acknowledge that this study was hypothesis-generating rather than hypothesis-testing in its goal. This research was funded through the NIH 4D-Nucleome Consortium, whose initial goal was the development, benchmarking, and validation of new genomic technologies. Our Center focused on mapping the genome relative to different nuclear locales and on correlating this intranuclear positioning of the genome with function, specifically gene expression and DNA replication timing. By its very nature, this project has taken a discovery-driven rather than a hypothesis-driven scientific approach. Our fundamental question was whether we could gain new insights into nuclear genome organization by integrating genomic and microscopic measurements of chromosome positioning relative to multiple nuclear compartments/bodies and correlating them with functional assays such as RNA-seq and Repli-seq.

      Indeed, as described in this manuscript, this study resulted in multiple new insights into nuclear genome organization as summarized in our last main figure.  We believe our work and conclusions will be of general interest to scientists working in the fields of 3D genome organization and nuclear cell biology.  We anticipate that each of these new insights will prompt future hypothesis-driven science focused on specific questions and the testing of cause-and-effect relationships. 

      Given the extensive scope of this manuscript, we were limited in the extent that we could describe and summarize the background, data, analysis, and significance for every new insight. In our editing to reach the eLife recommended word count, we removed some of the explanations and summaries that we had originally included. 

      As suggested by Reviewer 2, in our revision we will add back additional summary and interpretation statements to help readers unfamiliar with 3D genome organization.

      Specific Comments in response to Reviewer 1:

      (1)  We disagree with the comment that TSA-seq has not been cross-validated by other orthogonal genomic methods. In the first TSA-seq paper (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108), we showed a good correlation between the identification of iLADs and LADs by nuclear lamin and nuclear speckle TSA-seq and by the orthogonal genomic method of lamin B1 DamID; this correlation is reproduced in this manuscript using our new TSA-seq 2.0 protocol. Similarly, in the Kumar et al, bioRxiv preprint (doi:https://doi.org/10.1101/2023.10.29.564613), we showed general agreement between the identification of NADs by nucleolar TSA-seq and by the orthogonal genomic method of NAD-seq. (We expect this preprint to be in press soon; it is now undergoing a last revision involving only reformatting for journal requirements.) We also showed a high correlation between Hi-C compartments and subcompartments and TSA-seq in the Chen et al, JCB 2018 paper. Specifically, there is an excellent correlation between the A1 Hi-C subcompartment and Speckle Associated Domains as detected by nuclear speckle TSA-seq. The A2 Hi-C subcompartment correlated well with iLAD regions with intermediate nuclear speckle TSA-seq scores, and the B2 and B3 Hi-C subcompartments with LADs detected by both LMNB TSA-seq and LMNB1 DamID. More generally, Hi-C A and B compartment identity correlated well with predictions of iLADs versus LADs from nuclear speckle and nuclear lamina TSA-seq.

      (2)  In the Chen et al, JCB 2018 paper we also qualitatively and quantitatively validated TSA-seq using FISH.  Qualitatively, we showed that both nuclear speckle and nuclear lamin TSA-seq correlated well with distances to nuclear speckles versus the nuclear lamina, respectively, measured by immuno-FISH.

      Quantitatively, we showed that SON TSA-seq could be used to estimate the microscopic mean distance to nuclear speckles with mean and median residuals of ~50 nm.  First, we used light microscopy to show that the spreading of tyramide-biotin signal from a point-source of TSA staining fits well with the exponential decay predicted theoretically by reaction-diffusion equations assuming a steady rate of tyramide-biotin free radical generation by the HRP enzyme and a constant probability throughout the nucleus of free-radical quenching (through reaction with protein tyrosine residues and nucleic acids).  Second, we used the exponential decay constant measured by light microscopy together with FISH measurements of mean speckle distance for several genomic regions to fit an exponential function and to predict distance to nuclear speckles genome-wide directly from SON TSA-seq sequencing reads.  Third, we used this approach to test the predictions against a new set of FISH measurements, demonstrating an accuracy of these predictions of ~50 nm.
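      One simple way to carry out the second and third steps can be sketched as follows, under simplifying assumptions that are ours rather than the published procedure: the SON TSA-seq enrichment ratio S (on a linear, not log, scale) is modelled as S = A·exp(−d/λ), where d is the mean distance to speckles, λ is the 1/e decay length fixed from the light-microscopy measurement (a half-signal distance h corresponds to λ = h / ln 2), and only the amplitude A is fitted to the FISH calibration loci. Inverting the fitted curve gives a distance estimate per genomic bin, and residuals on held-out FISH loci measure accuracy. Function names and numbers are hypothetical.

```python
import math


def fit_log_amplitude(fish_distances_nm, enrichment, decay_length_nm):
    """Least-squares estimate of ln(A) in ln S = ln A - d / lambda, with lambda fixed."""
    terms = [math.log(s) + d / decay_length_nm
             for d, s in zip(fish_distances_nm, enrichment)]
    return sum(terms) / len(terms)


def predict_distance(enrichment, log_amplitude, decay_length_nm):
    """Invert the exponential model to estimate mean distance (nm) from enrichment."""
    return decay_length_nm * (log_amplitude - math.log(enrichment))


def residuals(measured_distances_nm, enrichment, log_amplitude, decay_length_nm):
    """Predicted minus FISH-measured distance for held-out loci."""
    return [predict_distance(s, log_amplitude, decay_length_nm) - d
            for d, s in zip(measured_distances_nm, enrichment)]


# Synthetic, noise-free illustration: lambda = 500 nm, A = 4.
lam = 500.0
calib_d = [200.0, 400.0, 800.0]
calib_s = [4.0 * math.exp(-d / lam) for d in calib_d]
log_a = fit_log_amplitude(calib_d, calib_s, lam)
```

With real data the residuals would be non-zero; their mean and median magnitude play the role of the accuracy figure quoted above.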

      (3)  The importance of the quantitative validation by immuno-FISH of using TSA-seq to estimate mean distance to nuclear speckles is that it demonstrates the robustness of the TSA-seq approach. Specifically, it shows how the TSA-seq signal is predicted to depend only on the specificity of the primary and secondary antibody staining and the diffusion properties of the tyramide-biotin free radicals produced by the HRP enzyme. This is fundamentally different from molecular proximity assays such as DamID, ChIP-seq, and Cut and Run/Tag, which depend strongly on the antibodies and on the choice of marker proteins because they rely on molecular proximity for labeling and/or pulldown of DNA.

      This robustness leads to specific predictions. First, it predicts that antibodies against different marker proteins of the same nuclear compartment will produce similar TSA-seq signals. This is because the exponential decay constant (distance at which the signal drops by one half) for the spreading of the TSA signal is in the range of several hundred nm, as measured by light microscopy for several TSA staining conditions. Indeed, we showed in the Chen et al, JCB 2018 paper that antibodies against two different nuclear speckle proteins produced very similar TSA-seq signals, while antibodies against LMNB versus LMNA also produced very similar TSA-seq signals. Similarly, we showed in the Kumar et al preprint that antibodies against four different nucleolar proteins produced similar TSA-seq signals, with the highest correlation coefficients observed between the signals from antibodies against the two GC nucleolar marker proteins and between the signals from antibodies against the two FC/DFC nucleolar marker proteins.

      Author response image 2.

      Comparison of TSA-seq data from different cell lines versus IMR90 MERFISH.  The observed correlation between SON (nuclear speckle) TSA-seq versus MERFISH is nearly as high for TSA-seq data from HFF as it is for TSA-seq data from the IMR90 cell line (Alexander et al, Mol Cell 2021) in which the MERFISH was performed. The correlations for SON, LMNB1 (nuclear lamina) and MKI67IP (nucleolus) versus MERFISH are highest for HFF TSA-seq data as compared to TSA-seq data from other cell lines (H1, K562, HCT116).  Comparison of measured distances to nuclear locale (y-axis) versus TSA-seq scores (x-axis) from different cell lines labeled in red. Left to right: SON, LMNB1, and MKI67IP.  Top to bottom: SON TSA-seq versus MERFISH for two TSA-seq replicates; TSA-seq from HFF, H1, K562, and HCT116 versus MERFISH.

      Second, it predicts that the quantitative relationship between TSA-seq signal and mean distance from a nuclear compartment will depend on the convolution of the predicted exponential decay of spreading of the TSA signal produced by a point source with the more complicated staining distribution of nuclear compartments such as the nuclear lamina or nucleoli.  We successfully used this concept to explain the differences emerging between LMNB1 DamID and TSA-seq signals for flat nuclei and to recognize the polarized distribution of different LADs over the nuclear periphery.

      (4)  After our genomic data production and during our data analysis, a valuable resource from the Zhuang lab was published, using MERFISH to visualize hundreds of genomic loci in IMR90 cells. We acknowledge that the much more extensive validation of TSA-seq by the multiplexed immuno-FISH MERFISH data depends on the degree to which nuclear genome organization is similar between IMR90 and HFF fibroblasts. However, the correlation between distances to nuclear speckles, nucleoli, and the nuclear lamina measured in IMR90 fibroblasts and the nuclear speckle, nucleolar, and nuclear lamina TSA-seq measured in HFF fibroblasts is already striking (see Author response image 1). For SON TSA-seq, the MERFISH versus HFF TSA-seq correlation is close to what we observe using published IMR90 SON TSA-seq data (correlation coefficients of 0.89 for IMR90 TSA-seq versus 0.86 for HFF TSA-seq). Moreover, this correlation is highest using TSA-seq data from HFF cells as compared to the three other cell lines (see Author response image 2). We believe these correlations can be considered a lower bound on the actual correlations between FISH distances and TSA-seq that we would have observed had we performed both assays on the same cell line.

      (5)  Currently, we still require tens of millions of cells to perform each TSA-seq assay. This requires significant expansion of cells and a resulting increase in the passage number of the IMR90 cells before we can perform TSA-seq. During this expansion, we observe a noticeable slowing of IMR90 cell growth, as expected for secondary cell lines approaching the Hayflick limit. We do not yet know to what degree nuclear organization relative to nuclear locales may change as a function of cell cycle composition (i.e., the percentage of cycling versus quiescent cells) and cell age. Thus, even if we performed TSA-seq on IMR90 cells, we would be comparing MERFISH from lower passages, with a higher percentage of actively proliferating cells, with TSA-seq from higher passages, with a higher percentage of quiescent cells.

      We are currently working on a new TSA-seq protocol that will work with thousands of cells. We believe it is a better investment of time and resources to wait until this new protocol is optimized before repeating TSA-seq in IMR90 cells for a better comparison with multiplexed FISH data.

      Specific Comments in response to Reviewer 2:

      (1)  As we acknowledge in our Response summary, we were limited in the degree to which we could follow up our findings with experiments designed to test specific hypotheses generated by our data. However, we do want to point out that our comparison of wild-type K562 cells with the LMNA/LBR double knockout was designed to test the long-standing model that nuclear lamina association of genomic loci contributes to gene silencing. This experiment was motivated by our surprising result that gene expression differences between cell lines correlated strongly with differences in positioning relative to nuclear speckles rather than the nuclear lamina. Despite documenting in these double knockout cells a decreased nuclear lamina association of most LADs, and an increased nuclear lamina association of the “p-w-v” fiLADs identified in this manuscript, we saw no significant change in gene expression in any of these regions as compared to wild-type K562 cells. Meanwhile, distances to nuclear speckles as measured by TSA-seq remained nearly constant.

      We would argue that this represents a specific example in which new insights generated by our genomics comparison of cell lines led to a clear and specific hypothesis and the experimental testing of this hypothesis.

      In response to Reviewer 2, we are modifying the text to make this clearer. Specifically, we will explicitly describe how we tested the hypothesis that distance to the nuclear lamina is correlated with, but not causally linked to, gene expression, by using a double knockout (DKO) of LMNA and LBR to change distances relative to the nuclear lamina and then testing the effect on gene expression.

    1. Author response:

      We thank the reviewers for their thorough reading and thoughtful feedback. Below, we provisionally address each of the concerns raised in the public reviews, and outline our planned revision that aims to further clarify and strengthen the manuscript.

      In our response, we clarify our conceptualization of elasticity as a dimension of controllability, formalize it within an information-theoretic framework, and demonstrate that controllability and its elasticity are partially dissociable. Furthermore, we provide clarifications and additional modeling results showing that our experimental design and modeling approach are well suited to dissociating elasticity inference from more general learning processes, and are not inherently biased toward overestimating elasticity. Finally, we clarify the advantages and disadvantages of our canonical correlation analysis (CCA) approach for identifying latent relationships between multidimensional data sets, and provide additional analyses that strengthen the link between elasticity estimation biases and a specific psychopathology profile.

      Reviewer 1:

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform the understanding of control across domains, which is a topic of great importance.

      We thank the reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusions.

      An overarching concern is that this paper is framed as addressing resource investments across domains that include time, money, and effort, and the introductory examples focus heavily on effort-based resources (e.g., exercising, studying, practicing). The experiments, though, focus entirely on the equivalent of monetary resources - participants make discrete actions based on the number of points they want to use on a given turn. While the same ideas might generalize to decisions about other kinds of resources (e.g., if participants were having to invest the effort to reach a goal), this seems like the kind of speculation that would be better reserved for the Discussion section rather than using effort investment as a means of introducing a new concept (elasticity of control) that the paper will go on to test.

      We thank the reviewer for pointing out a lack of clarity regarding the kinds of resources tested in the present experiment. Investing additional resources in the form of extra tickets did not only require participants to pay more money. It also required them to invest additional time (each additional ticket meant making another attempt to board the vehicle, extending the duration of the trial) and attentional effort (every attempt required precisely timing a spacebar press as the vehicle crossed the screen). Given this involvement of money, time, and effort resources, we believe it would be imprecise to present the study as concerning monetary resources in particular. That said, we agree with the Reviewer that results might differ depending on which resource type the experiment emphasizes or the participant weighs most heavily. Thus, in our revision of the manuscript, we will make sure to clarify the kinds of resources the experiment involved, and highlight the open question of whether inferences concerning the elasticity of control generalize across different resource domains.

      Setting aside the framing of the core concepts, my understanding of the task is that it effectively captures people's estimates of the likelihood of achieving their goal (Pr(success)) conditional on a given investment of resources. The ground truth across the different environments varies such that this function is sometimes flat (low controllability), sometimes increases linearly (elastic controllability), and sometimes increases as a step function (inelastic controllability). If this is accurate, then it raises two questions.

      First, on the modeling front, I wonder if a suitable alternative to the current model would be to assume that the participants are simply considering different continuous functions like these and, within a Bayesian framework, evaluating the probabilistic evidence for each function based on each trial's outcome. This would give participants an estimate of the marginal increase in Pr(success) for each ticket, and they could then weigh the expected value of that ticket choice (Pr(success)*150 points) against the marginal increase in point cost for each ticket. This should yield similar predictions for optimal performance (e.g., opt-out for lower controllability environments, i.e., flatter functions), and the continuous nature of this form of function approximation also has the benefit of enabling tests of generalization to predict changes in behavior if there was, for instance, changes in available tickets for purchase (e.g., up to 4 or 5) or changes in ticket prices. Such a model would of course also maintain a critical role for priors based on one's experience within the task as well as over longer timescales, and could be meaningfully interpreted as such (e.g., priors related to the likelihood of success/failure and whether one's actions influence these). It could also potentially reduce the complexity of the model by replacing controllability-specific parameters with multiple candidate functions (presumably learned through past experience, and/or tuned by experience in this task environment), each of which is being updated simultaneously.

      Second, if the reframing above is apt (regardless of the best model for implementing it), it seems like the taxonomy being offered by the authors risks a form of "jangle fallacy," in particular by positing distinct constructs (controllability and elasticity) for processes that ultimately comprise aspects of the same process (estimation of the relationship between investment and outcome likelihood). Which of these two frames is used doesn't bear on the rigor of the approach or the strength of the findings, but it does bear on how readers will digest and draw inferences from this work. It is ultimately up to the authors which of these they choose to favor, but I think the paper would benefit from some discussion of a common-process alternative, at least to prevent too strong of inferences about separate processes/modes that may not exist. I personally think the approach and findings in this paper would also be easier to digest under a common-construct approach rather than forcing new terminology but, again, I defer to the authors on this.

      We thank the reviewer for suggesting this interesting alternative modeling approach. We agree that a Bayesian framework evaluating different continuous functions could offer advantages, particularly in its ability to generalize to other ticket quantities and prices. We will attempt to implement this as an alternative model and compare it with the current model.  
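      A minimal sketch of the alternative the Reviewer describes, under hypothetical assumptions: a small set of candidate functions mapping the number of tickets (1 to 3) to Pr(success) (flat, linear and step), a prior over them, Bayesian updating after each trial's success or failure, and a choice rule that compares the posterior-expected value of each ticket count (Pr(success) × 150 points) with its cost. The candidate probabilities, the prior and the ticket costs are illustrative placeholders rather than task parameters, and opting out is treated as worth 0 points.

```python
# Hypothetical candidate functions: Pr(success) when buying 1, 2 or 3 tickets.
CANDIDATES = {
    "flat": [0.1, 0.1, 0.1],     # low controllability: tickets barely matter
    "linear": [0.3, 0.55, 0.8],  # elastic: each extra ticket helps
    "step": [0.8, 0.8, 0.8],     # inelastic: the first ticket gives the full benefit
}
REWARD = 150                 # points for reaching the goal
TICKET_COSTS = [20, 40, 60]  # hypothetical total cost of 1, 2 or 3 tickets


def update(posterior, tickets, success):
    """Bayes' rule over candidate functions after observing one trial's outcome."""
    unnormalised = {}
    for name, weight in posterior.items():
        p = CANDIDATES[name][tickets - 1]
        unnormalised[name] = weight * (p if success else 1.0 - p)
    total = sum(unnormalised.values())
    return {name: w / total for name, w in unnormalised.items()}


def expected_values(posterior):
    """Expected net points for opting out (index 0) or buying 1..3 tickets."""
    values = [0.0]
    for k, cost in enumerate(TICKET_COSTS):
        p_success = sum(w * CANDIDATES[name][k] for name, w in posterior.items())
        values.append(p_success * REWARD - cost)
    return values


def best_choice(posterior):
    """Option with the highest expected value: 0 = opt out, n = buy n tickets."""
    values = expected_values(posterior)
    return max(range(len(values)), key=values.__getitem__)


uniform = {name: 1.0 / len(CANDIDATES) for name in CANDIDATES}
```

Because the candidate set is explicit, extending the lists to more tickets or changing `TICKET_COSTS` yields predictions for new prices or quantities, which is the generalization advantage the Reviewer highlights. Note that the posterior weight on sloped functions still acts as a dedicated elasticity variable.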

      We also acknowledge the importance of avoiding a potential "jangle fallacy". We entirely agree with the Reviewer that elasticity and controllability inferences are not distinct processes. Specifically, we view resource elasticity as a dimension of controllability, hence the name of our ‘elastic controllability’ model. In response to this and other Reviewers’ comments, we now offer a formal definition of elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources the agent is able and willing to invest (see further details in response to Reviewer 3 below).  

      With respect to how this conceptualization is expressed in the modelling, we note that our model's representation of maximum controllability and its elasticity via different variables is analogous to representing a distribution by separate mean and variance parameters. Ultimately, even the model suggested by the Reviewer would need a dedicated variable representing elasticity, such as the probability of sloped controllability functions. A single-process account thus allows different aspects of this process to be biased differently (e.g., one can have an accurate estimate of the mean of a distribution but overestimate its variance). Therefore, our characterization of distinct elasticity and controllability biases (or, more accurately, ‘elasticity of controllability bias’ and ‘maximum controllability bias’) is consistent with a common-construct account.

      That said, given the Reviewer’s comments, we believe that some of the terminology we used may have been misleading. In our planned revision, we will modify the text to clarify that we view elasticity as a dimension of controllability that can only be estimated in conjunction with controllability. 

      Reviewer 2:

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Interestingly, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals some important findings about how people consider components of controllability.

      We appreciate the Reviewer's positive assessment of our findings and computational approach to dissociating elasticity and overall controllability.

      The primary weakness of this research is that it is not entirely clear what is meant by "elastic" and "inelastic" and how these constructs differ from existing considerations of various factors/calculations that contribute to perceptions of and decisions about controllability. I think this weakness is primarily an issue of framing, where it's not clear whether elasticity is, in fact, theoretically dissociable from controllability. Instead, it seems that the elements that make up "elasticity" are simply some of the many calculations that contribute to controllability. In other words, an "elastic" environment is inherently more controllable than an "inelastic" one, since both environments might have the same level of predictability, but in an "elastic" environment, one can also partake in additional actions to have additional control over achieving the goal (i.e., expend effort, money, time).

      We thank the reviewer for highlighting the lack of clarity in our concept of elasticity. We first clarify that elasticity cannot be entirely dissociated from controllability because it is a dimension of controllability. If no controllability is afforded, then there cannot be elasticity or inelasticity. This is why in describing the experimental environments, we only label high-controllability, but not low-controllability, environments as ‘elastic’ or ‘inelastic’. For further details on this conceptualization of elasticity, and a planned revision of the text, see our response above to Reviewer 1. 

      Second, we now clarify that controllability can also be computed without knowing the amount of resources the agent is able and willing to invest, for instance by assuming infinite resources available or a particular distribution of resource availabilities. However, knowing the agent’s available resources often reduces uncertainty concerning controllability. This reduction in uncertainty is what we define as elasticity. Since any action requires some resources, this means that no controllable environment is entirely inelastic if we also consider agents that do not have enough resources to commit any action. However, even in this case environments can differ in the degree to which they are elastic. For further details on this formal definition, see our response to Reviewer 3 below. We will make these necessary clarifications in the revised manuscript. 

      Importantly, whether an environment is more or less elastic does not determine whether it is more or less controllable. In particular, environments can be more controllable yet less elastic. This is true even if we allow that investing different levels of resources (i.e., purchasing 0, 1, 2, or 3 tickets) constitutes different actions, in conjunction with participants’ vehicle choices. Below, we show this using two existing definitions of controllability.

      Definition 1, reward-based controllability<sup>1</sup>: If control is defined as the fraction of available reward that is controllably achievable, and we assume all participants are in principle willing and able to invest 3 tickets, controllability can be computed in the present task as:

      where P(S' = goal ∣ 𝑆, 𝐴, 𝐶) is the probability of reaching the treasure from present state 𝑆 when taking action A and investing C resources in executing the action. In any of the task environments, the probability of reaching the goal is maximized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that leads to the goal (𝐴 = correct vehicle). Conversely, the probability of reaching the goal is minimized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that does not lead to the goal (𝐴 = wrong vehicle). This calculation is thus entirely independent of elasticity, since it only considers what would be achieved by maximal resource investment, whereas elasticity consists of the reduction in controllability that would arise if the maximal available 𝐶 is reduced. Consequently, any environment where the maximum available control is higher yet varies less with resource investment would be more controllable and less elastic.
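      To make this concrete, the short Python sketch below encodes the goal probabilities of the two example environments described under Definition 2 and computes a reward-based controllability score χ. It assumes that χ is the highest minus the lowest goal probability over the vehicle-ticket combinations an agent can afford. This is an illustrative assumption consistent with the maximization/minimization argument above, not the manuscript's exact formula. The names `INELASTIC`, `ELASTIC`, and `reward_controllability` are hypothetical.

```python
# Goal probabilities per (tickets, vehicle) combination, copied from the two
# example environments in the text; 0 tickets means walking (20% goal rate).
INELASTIC = {
    (0, "walk"): 0.2,
    (1, "correct"): 1.0, (1, "wrong"): 0.0,
    (2, "correct"): 1.0, (2, "wrong"): 0.0,
    (3, "correct"): 1.0, (3, "wrong"): 0.0,
}
ELASTIC = {
    (0, "walk"): 0.2,
    (1, "correct"): 0.2, (1, "wrong"): 0.2,
    (2, "correct"): 0.6, (2, "wrong"): 0.1,
    (3, "correct"): 1.0, (3, "wrong"): 0.0,
}


def reward_controllability(table: dict, max_tickets: int = 3) -> float:
    """Assumed chi: best minus worst goal probability among affordable options."""
    affordable = [p for (tickets, _), p in table.items() if tickets <= max_tickets]
    return max(affordable) - min(affordable)


if __name__ == "__main__":
    # chi for agents able to afford 0, 1, 2, or 3 tickets.
    for name, table in [("inelastic", INELASTIC), ("elastic", ELASTIC)]:
        print(name, [round(reward_controllability(table, c), 2) for c in range(4)])
```

      Running it prints `inelastic [0.0, 1.0, 1.0, 1.0]` and `elastic [0.0, 0.0, 0.5, 1.0]`. With 3 tickets affordable, both environments reach χ = 1.0, so this measure alone does not separate them. They differ only in how χ falls as fewer tickets are affordable, which is what elasticity describes.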

      Note that if we also account for ticket costs in calculating reward, this will only reduce the fraction of achievable reward and thus the calculated control in elastic environments.   

      Definition 2, information-theoretic controllability<sup>2</sup>: Here controllability is defined as the reduction in outcome entropy due to knowing which action is taken:

      I(S'; A, C | S) = H(S'|S) - H(S'|S, A, C)

      where H(S'|S) is the conditional entropy of the distribution of outcomes S' given the present state 𝑆, and H(S'|S, A, C) is the conditional entropy of the outcome given the present state, action, and resource investment. 

      To compare controllability, we consider two environments with the same maximum control:

      • Inelastic environment: If the correct vehicle is chosen, there is a 100% chance of reaching the goal state with 1, 2, or 3 tickets. Thus, out of 7 possible action-resource investment combinations, three deterministically lead to the goal state (≥1 tickets and correct vehicle choice), three never lead to it (≥1 tickets and wrong vehicle choice), and one (0 tickets) leads to it 20% of the time (since walking leads to the treasure on 20% of trials).

      • Elastic environment: If the correct vehicle is chosen, the probability of boarding it is 0% with 1 ticket, 50% with 2 tickets, and 100% with 3 tickets. Thus, out of 7 possible action-resource investment combinations, one deterministically leads to the goal state (3 tickets and correct vehicle choice), one never leads to it (3 tickets and wrong vehicle choice), one leads to it 60% of the time (2 tickets and correct vehicle choice: 50% boarding + 50% × 20% when failing to board), one leads to it 10% of the time (2 tickets and wrong vehicle choice), and three lead to it 20% of the time (0-1 tickets).
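      The seven goal probabilities in each environment follow from one rule. Boarding the chosen vehicle leads to its destination, and failing to board means walking, which reaches the treasure on 20% of trials. The minimal Python sketch below applies that rule. It assumes the wrong vehicle has the same boarding probability as the correct one, as the 10% figure for 2 tickets implies. `goal_probability` and `outcome_table` are hypothetical names.

```python
WALK_TO_GOAL = 0.2  # walking reaches the treasure on 20% of trials


def goal_probability(p_board: float, correct_vehicle: bool) -> float:
    """P(goal) for one vehicle choice, given the probability of boarding it."""
    reached_by_vehicle = p_board if correct_vehicle else 0.0  # wrong vehicle never reaches it
    reached_by_walking = (1.0 - p_board) * WALK_TO_GOAL       # failed boarding -> walk
    return reached_by_vehicle + reached_by_walking


def outcome_table(board_by_tickets: dict) -> dict:
    """Map each of the 7 (tickets, vehicle) combinations to P(goal)."""
    table = {(0, "walk"): WALK_TO_GOAL}
    for tickets, p_board in board_by_tickets.items():
        table[(tickets, "correct")] = goal_probability(p_board, True)
        table[(tickets, "wrong")] = goal_probability(p_board, False)
    return table


# Boarding probabilities by number of tickets, as stated for each environment
# (boarding is assumed to be certain with 1-3 tickets in the inelastic one).
INELASTIC = outcome_table({1: 1.0, 2: 1.0, 3: 1.0})
ELASTIC = outcome_table({1: 0.0, 2: 0.5, 3: 1.0})
```

      Averaging the seven values in each table gives the marginal P(goal) used in Step 1: 3.2/7 ≈ .46 and 2.3/7 ≈ .33.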

      Here we assume a uniform prior over actions, which renders the information-theoretic definition of controllability equal to another definition termed ‘instrumental divergence’<sup>3,4</sup>. We note that changing the uniform prior assumption would change the results for the two environments, but that would not change the general conclusion that there can be environments that are more controllable yet less elastic.

      Step 1: Calculating H(S'|S)

      For the inelastic environment:

      P(goal) = (3 × 100% + 3 × 0% + 1 × 20%)/7 = .46; P(non-goal) = .54

      H(S'|S) = – [.46 × log<sub>2</sub>(.46) + .54 × log<sub>2</sub>(.54)] = 1 bit

      For the elastic environment:

      P(goal) = (1 × 100% + 1 × 0% + 1 × 60% + 1 × 10% + 3 × 20%)/7 = .33; P(non-goal) = .67

      H(S'|S) = – [.33 × log<sub>2</sub>(.33) + .67 × log<sub>2</sub>(.67)] = .91 bits

      Step 2: Calculating H(S'|S, A, C)

      Inelastic environment: Six action-resource investment combinations have deterministic outcomes entailing zero entropy, whereas investing 0 tickets has a probabilistic outcome (20%). The entropy for 0 tickets is: H(S'|C = 0) = – [.2 × log<sub>2</sub>(.2) + .8 × log<sub>2</sub>(.8)] = .72 bits. Since this action-resource investment combination is chosen with probability 1/7, the total conditional entropy is approximately .10 bits.

      Elastic environment: Two action-resource investment combinations have deterministic outcomes (3 tickets with the correct or wrong vehicle), whereas the other five have probabilistic outcomes:

      2 tickets and correct vehicle (60% success):

      H(S'|A = correct, C = 2) = – [.6 × log<sub>2</sub>(.6) + .4 × log<sub>2</sub>(.4)] = .97 bits

      2 tickets and wrong vehicle (10% success):

      H(S'|A = wrong, C = 2) = – [.1 × log<sub>2</sub>(.1) + .9 × log<sub>2</sub>(.9)] = .47 bits

      0-1 tickets (20% success):

      H(S'|C = 0-1) = – [.2 × log<sub>2</sub>(.2) + .8 × log<sub>2</sub>(.8)] = .72 bits

      Thus the total conditional entropy of the elastic environment is: H(S'|S, A, C) = (1/7) × .97 + (1/7) × .47 + (3/7) × .72 = .52 bits

      Step 3: Calculating I(S'; A, C | S)

      Inelastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = 1 – 0.1 = .9 bits 

      Elastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = .91 – .52 = .39 bits

      Thus, the inelastic environment offers higher information-theoretic controllability (.9 bits) compared to the elastic environment (.39 bits). 
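      Steps 1 to 3 can be reproduced in a few lines of Python. This is a sketch that assumes a uniform prior over the seven action-resource investment combinations, as stated above, and a binary goal/non-goal outcome. The function names are illustrative. Because it carries unrounded intermediate values, it returns about 0.89 and 0.40 bits, which match the hand calculation up to rounding.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (bits) of a goal/non-goal outcome with P(goal) = p."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def controllability_bits(goal_probs: list) -> float:
    """I(S'; A, C | S) with a uniform prior over action-resource combinations."""
    n = len(goal_probs)
    h_outcome = binary_entropy(sum(goal_probs) / n)                   # Step 1: H(S'|S)
    h_given_action = sum(binary_entropy(p) for p in goal_probs) / n  # Step 2: H(S'|S, A, C)
    return h_outcome - h_given_action                                # Step 3


# P(goal) for the 7 combinations: 0 tickets, then 1-3 tickets x (correct, wrong) vehicle.
INELASTIC = [0.2, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
ELASTIC = [0.2, 0.2, 0.2, 0.6, 0.1, 1.0, 0.0]

if __name__ == "__main__":
    print(round(controllability_bits(INELASTIC), 2))  # 0.89
    print(round(controllability_bits(ELASTIC), 2))    # 0.4
```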

      Of note, even if each combination of cost and goal reaching is defined as a distinct outcome, then information-theoretic controllability is higher for the inelastic (2.81 bits) than for the elastic (2.30 bits) environment. 
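      The same calculation extends to a joint outcome of cost and goal reaching. The Python sketch below assumes that the cost is simply the number of tickets bought (0 to 3), so the outcome has eight possible values and the cost is fully determined by the chosen combination. Under that assumption, it recovers approximately the 2.81 and 2.30 bits quoted above. The tables and function name are illustrative.

```python
import math
from collections import defaultdict


def entropy(probs) -> float:
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def joint_outcome_controllability(table: dict) -> float:
    """I(outcome; A, C | S) with outcome = (tickets spent, goal reached)."""
    n = len(table)  # uniform prior over (tickets, vehicle) combinations
    marginal = defaultdict(float)
    h_conditional = 0.0
    for (tickets, _vehicle), p_goal in table.items():
        marginal[(tickets, True)] += p_goal / n
        marginal[(tickets, False)] += (1.0 - p_goal) / n
        # Given the combination, the cost is known; only goal reaching is uncertain.
        h_conditional += entropy([p_goal, 1.0 - p_goal]) / n
    return entropy(marginal.values()) - h_conditional


INELASTIC = {(0, "walk"): 0.2, (1, "correct"): 1.0, (1, "wrong"): 0.0,
             (2, "correct"): 1.0, (2, "wrong"): 0.0, (3, "correct"): 1.0, (3, "wrong"): 0.0}
ELASTIC = {(0, "walk"): 0.2, (1, "correct"): 0.2, (1, "wrong"): 0.2,
           (2, "correct"): 0.6, (2, "wrong"): 0.1, (3, "correct"): 1.0, (3, "wrong"): 0.0}
```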

      In sum, for both definitions of controllability, we see that environments can be more elastic yet less controllable. We will amend the manuscript to clarify this distinction between controllability and its elasticity.

      Reviewer 3:

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes, correctly I think, that the intuitive notion of "control" is multi-dimensional, capturing multiple aspects of the relationship between action and outcome. In particular, the authors propose that one such dimension is the degree to which outcomes depend on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally propose that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea thus has the potential to change how we think about mental disorders in a substantial way, and could even help us better understand how healthy people navigate challenging decision-making problems.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      We appreciate the Reviewer's thoughtful engagement with our research and recognition of the potential significance of distinguishing between different dimensions of control in understanding psychopathology. We believe that all the Reviewer’s comments can be addressed with clarifications or additional analyses, as detailed below.  

      Starting with theory, the elasticity idea does not truly "extend" the standard control model in the way the authors suggest. The reason is that effort is simply one dimension of action. Thus, the proposed model ultimately grounds out in how strongly our outcomes depend on our actions (as in the standard model). Contrary to the authors' claims, the elasticity of control is still a fixed property of the environment. Consistent with this, the computational model proposed here is a learning model of this fixed environmental property. The idea is still valuable, however, because it identifies a key dimension of action (namely, effort) that is particularly relevant to the notion of perceived control. Expressing the elasticity idea in this way might support a more general theoretical formulation of the idea that could be applied in other contexts. See Huys & Dayan (2009), Zorowitz, Momennejad, & Daw (2018), and Gagne & Dayan (2022) for examples of generalizable formulations of perceived control.

      We thank the Reviewer for the suggestion that we formalize our concept of elasticity to resource investment, which we agree is a dimension of action. We first note that we have not argued against the claim that elasticity is a fixed property of the environment. We surmise the Reviewer might have misread our statement that “controllability is not a fixed property of the environment”. The latter statement is motivated by the observation that controllability is often higher for agents that can invest more resources (e.g., a richer person can buy more things). We will clarify this in our revision of the manuscript.

      To formalize elasticity, we build on Huys & Dayan’s definition of controllability(1) as the fraction of reward that is controllably achievable, 𝜒 (though using information-theoretic definitions(2,3) would work as well). To the extent that this fraction depends on the amount of resources the agent is able and willing to invest (max 𝐶), this formulation can be probabilistically computed without information about the particular agent involved, specifically, by assuming a certain distribution of agents with different amounts of available resources. This would result in a probability distribution over 𝜒. Elasticity can thus be defined as the amount of information obtained about controllability due to knowing the amount of resources available to the agent: I(𝜒; max 𝐶). We will add this formal definition to the manuscript.  
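      As a worked illustration of I(𝜒; max 𝐶), the Python sketch below uses the two example environments described in the response to Reviewer 2. It makes two assumptions that are illustrative rather than taken from the manuscript. First, 𝜒 is computed as the best minus the worst goal probability among the options an agent can afford. Second, the agents' maximal ticket budgets (0 to 3) are uniformly distributed. Because 𝜒 is then a deterministic function of max 𝐶, the mutual information reduces to the entropy of 𝜒.

```python
import math
from collections import Counter

INELASTIC = {(0, "walk"): 0.2, (1, "correct"): 1.0, (1, "wrong"): 0.0,
             (2, "correct"): 1.0, (2, "wrong"): 0.0, (3, "correct"): 1.0, (3, "wrong"): 0.0}
ELASTIC = {(0, "walk"): 0.2, (1, "correct"): 0.2, (1, "wrong"): 0.2,
           (2, "correct"): 0.6, (2, "wrong"): 0.1, (3, "correct"): 1.0, (3, "wrong"): 0.0}


def chi(table: dict, max_tickets: int) -> float:
    """Assumed chi: best minus worst goal probability among affordable options."""
    affordable = [p for (tickets, _), p in table.items() if tickets <= max_tickets]
    return max(affordable) - min(affordable)


def elasticity_bits(table: dict, budget_dist: dict) -> float:
    """I(chi; max C); chi is deterministic given max C, so this equals H(chi)."""
    chi_dist = Counter()
    for max_c, weight in budget_dist.items():
        chi_dist[round(chi(table, max_c), 9)] += weight  # round to merge float ties
    return -sum(w * math.log2(w) for w in chi_dist.values() if w > 0)


UNIFORM_BUDGETS = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}  # hypothetical agent population
```

      Under these assumptions, the elastic environment carries 1.5 bits of elasticity and the inelastic one about 0.81 bits. The inelastic value is non-zero only because agents with no tickets have no control, in line with the point that no controllable environment is entirely inelastic once such agents are considered.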

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology. Starting with claim 1, there are three sub-claims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not supported. Starting with 1B, the experiment cannot support the claim that people represent or track elasticity because the effort is the only dimension over which participants can engage in any meaningful decision-making (the other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies). Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort. More concretely, any model that captures the fact that you are more likely to succeed in two attempts than one will produce the observed behavior. The null models do not make this basic assumption and thus do not provide a useful comparison.

      We appreciate the reviewer's critical analysis of our claims regarding elasticity inference, which, as detailed below, has led to an important new analysis that strengthens the study’s conclusions. However, we respectfully disagree with two of the Reviewer’s arguments. First, resource investment was not the only meaningful decision dimension in our task, since participants also needed to choose the correct vehicle to get to the right destination. That this was not trivial is evidenced by our exclusion of over 8% of participants who made incorrect vehicle choices more than 10% of the time. Included participants also occasionally erred in this choice (mean error rate = 3%, range [0-10%]).

      Second, the experimental task cannot be solved well by a model that simply tracks how outcomes depend on effort because 20% of the time participants reached the treasure despite failing to board their vehicle of choice. In such cases, reward outcomes and control were decoupled. Participants could identify when this was the case by observing the starting location, which was revealed together with the outcome (since depending on the starting location, the treasure location was automatically reached by walking). To determine whether participants distinguished between control-related and non-control-related reward, we have now fitted a variant of our model to the data that allows learning from each of these kinds of outcomes by means of a different free parameter. The results show that participants learned considerably more from control-related outcomes. They were thus not merely tracking outcomes, but specifically inferred when outcomes can be attributed to control. We will include this new analysis in the revised manuscript.
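      Learning separately from control-related and non-control-related outcomes can be illustrated with a weighted Beta-count learner. This is a hypothetical sketch, not the fitted model. The class name, the weights `w_control` and `w_noncontrol`, and the Beta(1, 1) starting counts are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class AttributionLearner:
    """Hypothetical learner of P(goal | chosen vehicle) that weights evidence by source."""
    w_control: float        # weight on outcomes attributable to boarding or failing to board
    w_noncontrol: float     # weight on outcomes reached by walking, decoupled from control
    successes: float = 1.0  # Beta(1, 1) prior pseudo-counts
    failures: float = 1.0

    def update(self, reached_goal: bool, control_related: bool) -> None:
        """Add one outcome, scaled by the weight for its source."""
        weight = self.w_control if control_related else self.w_noncontrol
        if reached_goal:
            self.successes += weight
        else:
            self.failures += weight

    @property
    def estimate(self) -> float:
        """Posterior mean of the goal probability under the weighted counts."""
        return self.successes / (self.successes + self.failures)
```

      Setting `w_noncontrol` below `w_control` mimics the reported pattern of learning more from control-related outcomes. With `w_noncontrol = 0`, lucky walks to the treasure leave the estimate untouched.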

      Controllability inference by itself, however, still does not suffice to explain the observed behavior. This is shown by our ‘controllability’ model, which learns to invest more resources to improve control, yet still fails to capture key features of participants’ behavior, as detailed in the manuscript. This means that explaining participants’ behavior requires a model that not only infers controllability (beyond merely outcome probability) but also assumes a priori that increased effort could enhance control. Building this a priori assumption into the model amounts to embedding within it an understanding of elasticity – the idea that control over the environment may be increased by greater resource investment.

      That being said, we acknowledge the value in considering alternative computational formulations of adaptation to elasticity. Thus, in our revision of the manuscript, we will add a discussion concerning possible alternative models.  

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      We thank the reviewer for highlighting this point. We agree that our experimental design does not test whether people infer elasticity spontaneously. Our research question was whether people can distinguish between elastic and inelastic controllability. The results strongly support that they can, and this does have potential implications for behavior outside of the experimental task. Specifically, to the extent that people are aware that in some contexts additional resource investment improves control, whereas in other contexts it does not, our results indicate that they would be able to distinguish between these two kinds of contexts through trial-and-error learning. That said, we agree that investigating whether and how people spontaneously infer elasticity is an interesting direction for future work. We will clarify the scope of the present conclusions in the revised manuscript.

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct. However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency and the elasticity bias---this result is consistent with any possible relationship (even a negative one). The fact that the direct relationship between these two variables is not shown or reported leads me to infer that they do not have a significant or strong relationship in the data.

      We agree that CCA is not designed to reveal the relationship between any two variables. However, the advantage of this analysis is that it pulls together information from multiple variables. Doing so does not treat psychopathology as unidimensional. Rather, it seeks a particular dimension that most strongly correlates with different aspects of task performance. This is especially useful for multidimensional psychopathology data because such data are often dominated by strong correlations between dimensions, whereas the research seeks to explain the distinctions between the dimensions. Similar considerations hold for the multidimensional task parameters, which although less correlated, may still jointly predict the relevant psychopathological profile better than each parameter does in isolation. Thus, the CCA enabled us to identify a general relationship between task performance and psychopathology that accounts for different symptom measures and aspects of controllability inference. 

      Using CCA can thus reveal relationships that do not readily show up in two-variable analyses. Indeed, the direct correlation between Sense of Agency (SOA) and elasticity bias was not significant – a result that, for completeness, we will now report in the supplementary materials along with all other direct correlations. We note, however, that the CCA analysis was preregistered and its results were replicated. Furthermore, an auxiliary analysis specifically confirmed the contributions of both elasticity bias (Figure 6D, bottom plot) and, although not reported in the original paper, of the Sense of Agency score (SOA; p = .03, permutation test) to the observed canonical correlation. Participants scoring higher on the psychopathology profile also overinvested resources in inelastic environments but did not futilely invest in uncontrollable environments (Figure 6A), providing external validation to the conclusion that the CCA captured meaningful variance specific to elasticity inference. The results thus enable us to safely conclude that differences in elasticity inferences are significantly associated with a profile of control-related psychopathology to which SOA contributed significantly.

      Finally, whereas interpretation of individual CCA loadings that were not specifically tested remains speculative, we note that the pattern of loadings largely replicated across the initial and replication studies (see Figure 6B), and aligns with prior findings. For instance, the positive loadings of SOA and OCD match prior suggestions that a lower sense of control leads to greater compensatory effort(7), whereas the negative loading for depression scores matches prior work showing reduced resource investment in depression(5-6).

      We will revise the text to better clarify the advantages and disadvantages of our analytical approach, and the conclusions that can and cannot be drawn from it.

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences in elasticity inference. As the authors clearly acknowledge, the task was designed "to be especially sensitive to overestimation of elasticity" (line 287). A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias. When we further consider that elasticity inference is the only meaningful learning/decision-making problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      We apologize for our imprecise statement that the task was ‘especially sensitive to overestimation of elasticity’, which justifiably led to Reviewer’s concern that slower elasticity learning can be mistaken for elasticity bias. To make sure this was not the case, we made use of the fact that our computational model explicitly separates bias direction (λ) from the rate of learning through two distinct parameters, which initialize the prior concentration and mean of the model’s initial beliefs concerning elasticity (see Methods pg. 22). The higher the concentration of the initial beliefs (𝜖), the slower the learning. Parameter recovery tests confirmed that our task enables acceptable recovery of both the bias λ<sub>elasticity</sub> (r=.81) and the concentration 𝝐<sub>elasticity</sub> (r=.59) parameters. And importantly, the level of confusion between the parameters was low (confusion of 0.15 for 𝝐<sub>elasticity</sub>→ λ<sub>elasticity</sub> and 0.04 for λ<sub>elasticity</sub>→ 𝝐<sub>elasticity</sub>). This result confirms that our task enables dissociating elasticity biases from the rate of elasticity learning. 
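      The separation between bias and learning rate can be illustrated with a Beta prior parameterised by its mean and concentration. The mapping used below, with pseudo-counts a = λε and b = (1 − λ)ε, is an assumption made for illustration; the Methods describe the model's actual parameterisation. The sketch shows why a higher concentration ε slows learning without changing the direction of the prior bias λ. Function names are hypothetical.

```python
def prior_pseudocounts(bias: float, concentration: float) -> tuple:
    """Assumed Beta pseudo-counts with mean `bias` (lambda) and total `concentration` (epsilon)."""
    return bias * concentration, (1.0 - bias) * concentration


def posterior_mean(bias: float, concentration: float,
                   elastic_obs: float, inelastic_obs: float) -> float:
    """Posterior mean belief in elasticity after counting evidence for and against it."""
    a, b = prior_pseudocounts(bias, concentration)
    return (a + elastic_obs) / (a + b + elastic_obs + inelastic_obs)


if __name__ == "__main__":
    # Same biased prior (lambda = 0.8), then 10 observations that all indicate inelasticity.
    for eps in (2.0, 20.0):
        print(eps, round(posterior_mean(0.8, eps, 0, 10), 3))  # 2.0 -> 0.133, 20.0 -> 0.533
```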

      Moreover, to validate that the minimal level of confusion existing between bias and the rate of learning did not drive our psychopathology results, we re-ran the CCA while separating concentration from bias parameters. The results (Author response image 1) demonstrate that differences in learning rate (𝜖) had virtually no contribution to our CCA results, whereas the contribution of the pure bias (𝜆) was preserved. 

      We will incorporate these clarifications and additional analysis in our revised manuscript.

      Author response image 1.

      Showing that a model parameter correlates with the data it was fit to does not provide any new information, and cannot support claims like "a prior assumption that control is likely available was reflected in a futile investment of resources in uncontrollable environments." To make that claim, one must collect independent measures of the assumption and the investment.

      We apologize if this and related statements seemed to be describing independent findings. They were merely meant to describe the relationship between model parameters and model-independent measures of task performance. It is inaccurate, though, to say that they provide no new information, since results could have been otherwise. For instance, instead of a higher controllability bias primarily associating with futile investment of resources in uncontrollable environments, it could have been primarily associated with more appropriate investment of resources in high-controllability environments. Additionally, we believe these analyses are of value to readers who seek to understand the role of different parameters in the model. In our planned revision, we will clarify that the relevant analyses are merely descriptive.

      Did participants always make two attempts when purchasing tickets? This seems to violate the intuitive model, in which you would sometimes succeed on the first jump. If so, why was this choice made? Relatedly, it is not clear to me after a close reading how the outcome of each trial was actually determined.

      We thank the reviewer for highlighting the need to clarify these aspects of the task in the revised manuscript. 

      When participants purchased two extra tickets, they attempted both jumps, and were never informed about whether either of them succeeded. Instead, after choosing a vehicle and attempting both jumps, participants were notified where they arrived at. This outcome was determined based on the cumulative probability of either of the two jumps succeeding. Success meant that participants arrived at where their chosen vehicle goes, whereas failure meant they walked to the nearest location (as determined by where they started from). 
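      The cumulative probability of either jump succeeding is one minus the probability that both fail. The Python sketch below assumes the jumps succeed independently and uses hypothetical per-jump probabilities, since the per-jump values for each planet are not given here. The function names are illustrative.

```python
import random


def boarding_probability(jump_probs) -> float:
    """P(at least one jump succeeds), assuming independent jumps."""
    p_all_fail = 1.0
    for p in jump_probs:
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail


def trial_outcome(jump_probs, rng: random.Random) -> str:
    """Participants only learn where they arrived, not which jump succeeded."""
    if rng.random() < boarding_probability(jump_probs):
        return "vehicle destination"
    return "walked to nearest location"


if __name__ == "__main__":
    rng = random.Random(0)
    outcomes = [trial_outcome([0.5, 0.5], rng) for _ in range(10_000)]
    print(boarding_probability([0.5, 0.5]))  # 0.75 for two hypothetical 50% jumps
    print(outcomes.count("vehicle destination") / len(outcomes))
```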

      Though it is unintuitive to attempt a second jump before seeing whether the first succeeded, this design choice served two key objectives. First, that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, that the task could potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome, for instance, preparing for an exam or a job interview.

      It should be noted that the model is heuristically defined and does not reflect Bayesian updating. In particular, it overestimates control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). I wonder if the forced three-ticket trials in the task might be historically related to this modeling choice.

      We apologize for not making this clear, but in fact losing with less than 3 tickets does reduce the model’s estimate of available control. It does so by increasing the elasticity estimates (a<sub>elastic≥1</sub>, a<sub>elastic2</sub> parameters), signifying that more tickets are needed to obtain the maximum available level of control, thereby reducing the average controllability estimate across ticket investment options.

      It would be interesting to further develop the model such that losing with less than 3 tickets would also impact inferences concerning the maximum available control, depending on present beliefs concerning elasticity, but the forced three-ticket purchases already expose participants to the maximum available control, and thus, the present data may not be best suited to test such a model. These trials were implemented to minimize individual differences concerning inferences of maximum available control, thereby focusing differences on elasticity inferences. We will discuss the Reviewer’s suggestion for a potentially more accurate model in the revised manuscript. 

      References

      (1) Huys, Q. J. M., & Dayan, P. (2009). A Bayesian formulation of behavioral control. Cognition, 113(3), 314–328.

      (2) Ligneul, R. (2021). Prediction or causation? Towards a redefinition of task controllability. Trends in Cognitive Sciences, 25(6), 431–433.

      (3) Mistry, P., & Liljeholm, M. (2016). Instrumental divergence and the value of control. Scientific Reports, 6, 36295.

      (4) Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151.

      (5) Cohen, R. M., Weingartner, H., Smallberg, S. A., Pickar, D., & Murphy, D. L. (1982). Effort and cognition in depression. Archives of General Psychiatry, 39(5), 593–597. https://doi.org/10.1001/archpsyc.1982.04290050061012

      (6) Bi, R., Dong, W., Zheng, Z., Li, S., & Zhang, D. (2022). Altered motivation of effortful decision-making for self and others in subthreshold depression. Depression and Anxiety, 39(8-9), 633–645. https://doi.org/10.1002/da.23267

      (7) Tapal, A., Oren, E., Dar, R., & Eitam, B. (2017). The Sense of Agency Scale: A measure of consciously perceived control over one's mind, body, and the immediate environment. Frontiers in Psychology, 8, 1552.

    1. Author response: 

      We thank the reviewers for their feedback on our paper. We have taken all their comments into account in revising the manuscript. We provide a point-by-point response to their comments, below.

      Reviewer #1:

      Major comments:

      The manuscript is clearly written with a level of detail that allows others to reproduce the imaging and cell-tracking pipeline. Of the 22 movies recorded, one was used for cell tracking. One movie seems sufficient for the second part of the manuscript, as this manuscript presents a proof-of-principle pipeline for an imaging experiment followed by cell tracking and molecular characterisation of the cells by HCR. In addition, cell tracking in a 5-10 day time-lapse movie is an enormous time commitment.

      My only major comment is regarding "Suppl_data_5_spineless_tracking". The image file does not load.

      It looks like the wrong file is linked to the mastodon dataset. The "Current BDV dataset path" is set to "Beryl_data_files/BLB mosaic cut movie-02.xml", but this file does not exist in the folder. Please link it to the correct file.

      We have corrected the file path in the updated version of Suppl. Data 5.

      Minor comments:

      The authors state that their imaging settings aim to reduce photo damage. Do they see cell death in the regenerating legs? Is the cell death induced by the light exposure or can they tell if the same cells die between the movies? That is, do they observe cell death in the same phases of regeneration and/or in the same regions of the regenerating legs?

      Yes, we observe cell death during Parhyale leg regeneration. We have added the following sentence to explain this in the revised manuscript: "During the course of regeneration some cells undergo apoptosis (reported in Alwes et al., 2016). Using the H2B-mRFPruby marker, apoptotic cells appear as bright pyknotic nuclei that break up and become engulfed by circulating phagocytes (see bright specks in Figure 2F)."

      We now also document apoptosis in regenerated legs that have not been subjected to live imaging in a new supplementary figure (Suppl. Figure 3), and we refer to these observations as follows: "While some cell death might be caused by photodamage, apoptosis can also be observed in similar numbers in regenerating legs that have not been subjected to live imaging (Suppl. Figure 3)."

      Based on 22 movies, the authors divide the regeneration process into three phases and they describe that the timing of leg regeneration varies between individuals. Are the phases proportionally the same length between regenerating legs or do the authors find differences between fast/slow regenerating legs? If there is a difference in the proportions, why might this be?

      Both early and late phases contribute to variation in the speed of regeneration, but there is no clear relationship between the relative duration of each phase and the speed of regeneration. We now present graphs supporting these points in a new supplementary figure (Suppl. Figure 2).  

      To clarify this point, we have added the following sentence in the manuscript: "We find that the overall speed of leg regeneration is determined largely by variation in the speed of the early (wound closure) phase of regeneration, and to a lesser extent by variation in later phases when leg morphogenesis takes place (Suppl. Figure 2 A,B). There is no clear relationship between the relative duration of each phase and the speed of regeneration (Suppl. Figure 2 A',B')."

      Based on their initial cell tracing experiment, could the authors elaborate more on what kind of biological information can be extracted from the cell lineages, apart from determining which is the progenitor of a cell? What does it tell us about the cell population in the tissue? Is there indication of multi- or pluripotent stem cells? What does it say about the type of regeneration that is taking place in terms of epimorphosis and morphallaxis, the old concepts of regeneration?

      In the first paragraph of Future Directions we describe briefly the kind of biological information that could be gained by applying our live imaging approach with appropriate cell-type markers (see below). We do not comment further, as we do not currently have this information at hand. Regarding the concepts of epimorphosis and morphallaxis, as we explain in Alwes et al. 2016, these terms describe two extreme conditions that do not capture what we observe during Parhyale leg regeneration. Our current work does not bring new insights on this topic.

      Page 5. The authors mention the possibility of identifying the cell ID based on transcriptomic profiling data. Can they suggest how many and which cell types they expect to find in the last stage based on their transcriptomic data?

      We have added this sentence: "Using single-nucleus transcriptional profiling, we have identified approximately 15 transcriptionally-distinct cell types in adult Parhyale legs (Almazán et al., 2022), including epidermis, muscle, neurons, hemocytes, and a number of still unidentified cell types."

      Page 6. Correction: "..molecular and other makers.." should be "..molecular and other markers.."

      Corrected

      Page 8. The HCR in situ protocol probably has another important advantage over the conventional in situ protocol, which is not mentioned in this study. The hybridisation step in HCR is performed at a lower temperature (37˚C) than in conventional in situ hybridisation (65˚C, Rehm et al., 2009). In other organisms, a high hybridisation temperature affects the overall tissue morphology and cell location (tissue shrinkage). A lower hybridisation temperature has less impact on the tissue and makes manual cell alignment between the live imaging movie and the fixed HCR in situ stained specimen easier and more reliable. If this is also the case in Parhyale, the authors must mention it.

      This may be correct, but all our specimens were treated at 37˚C, so we cannot assess whether hybridisation temperature affects morphological preservation in our specimens.

      Page 9. The authors should include more information on the spineless study. What gene is spineless? What do the cell lineages tell us about the spineless progenitors, apart from them being spread in the tissue at the time of amputation? Do spineless progenitors proliferate during regeneration? Do any spineless-expressing cells share a common progenitor cell?

      We now point out that spineless encodes a transcription factor. We provide a summary of the lineages generating spineless-expressing cells in Suppl. Figure 6, and we explain that "These epidermal progenitors undergo 0, 1 or 2 cell divisions, and generate mostly spineless-expressing cells (Suppl. Figure 5)."

      Page 10. Regarding the imaging temperature, the Materials and Methods state "... a temperature control chamber set to 26 or 27˚C..."; however, in Suppl. Data 1, 26˚C and 29˚C are indicated as imaging temperatures. Which is correct?

      We corrected the Methods by adding "with the exception of dataset li51, imaged at 29°C"

      Page 10. Regarding the imaging step size, the Materials and Methods state "...step size of 1-2.46 µm..."; however, Suppl. Data 1 indicate a step size between 1.24 - 2.48 µm. Which is correct?

      We corrected the Methods.

      Page 11. Correct "...as the highest resolution data..." to "...at the highest resolution data..."

      The original text is correct ("standardised to the same dimensions as the highest resolution data").

      Page 11. Indicate which supplementary data set is referred to: "Using Mastodon, we generated ground truth annotations on the original image dataset, consisting of 278 cell tracks, including 13,888 spots and 13,610 links across 55 time points (see Supplementary Data)."

      Corrected

      p. 15. Indicate which supplementary data set is referred to: "In this study we used HCR probes for the Parhyale orthologues of futsch (MSTRG.441), nompA (MSTRG.6903) and spineless (MSTRG.197), ordered from Molecular Instruments (20 oligonucleotides per probe set). The transcript sequences targeted by each probe set are given in the Supplementary Data."

      Corrected

      Figure 3. Suggestion to the overview schematics: The authors might consider adding "molting" as the end point of the red bar (representing differentiation).

      The time of molting is not known in the majority of these datasets, because the specimens were fixed and stained prior to molting. We added the relevant information in the figure legend: "Datasets li-13 and li-16 were recorded until the molt; the other recordings were stopped before molting."

      Figure 4B': Please indicate that the nuclei signal is DAPI.

      Corrected

      Supplementary figure 1A. Word is missing in the figure legend: ...the image also shows weak…

      Corrected

      Supplementary Figure 2: Please indicate the autofluorescence in the granular cells. Does it correspond to the yellow cells?

      Corrected

      Video legend for video 1 and 2. Please correct "H2B-mREFruby" to "H2B-mRFPruby".

      Corrected

      Reviewer #2:

      Major comments:

      MC 1. Given that most of the technical advances necessary to achieve the work described in this manuscript have been published previously, it would be helpful for the authors to more clearly identify the primary novelty of this manuscript. The abstract and introduction to the manuscript focus heavily on the technical details of imaging and analysis optimization and some additional summary of the implications of these advances should be included here to aid the reader.

      This paper describes a technical advance. While previous work (Alwes et al. 2016) established some key elements of our live imaging approach, we were not at that time able to record the entire time course of leg regeneration (the longest recordings were 3.5 days long). Here we present a method for imaging the entire course of leg regeneration (up to 10 days of imaging), optimised to reduce photodamage and to improve cell tracking. We also develop a method of in situ staining in cuticularised adult legs (an important technical breakthrough in this experimental system), which we combine with live imaging to determine the fate of tracked cells. We have revised the abstract and introduction of the paper to point out these novelties, in relation to our previous publications.

      In the abstract we explain: "Building on previous work that allowed us to image different parts of the process of leg regeneration in the crustacean Parhyale hawaiensis, we present here a method for live imaging that captures the entire process of leg regeneration, spanning up to 10 days, at cellular resolution. Our method includes (1) mounting and long-term live imaging of regenerating legs under conditions that yield high spatial and temporal resolution but minimise photodamage, (2) fixing and in situ staining of the regenerated legs that were imaged, to identify cell fates, and (3) computer-assisted cell tracking to determine the cell lineages and progenitors of identified cells. The method is optimised to limit light exposure while maximising tracking efficiency."

      The introduction includes the following text: "Our first systematic study using this approach presented continuous live imaging over periods of 2-3 days, capturing key events of leg regeneration such as wound closure, cell proliferation and morphogenesis of regenerating legs with single-cell resolution (Alwes et al., 2016). Here, we extend this work by developing a method for imaging the entire course of leg regeneration, optimised to reduce photodamage and to improve cell tracking. We also develop a method of in situ staining of gene expression in cuticularised adult legs, which we combine with live imaging to determine the fate of tracked cells."

      MC 2. The description of the regeneration time course is nicely detailed but also very qualitative. A major advantage of continuous recording and automated cell tracking in the manner presented in this manuscript would be to enable deeper quantitative characterization of cellular and tissue dynamics during regeneration. Rather than providing movies and manually annotated timelines, some characterization of the dynamics of the regeneration process (the heterogeneity in this is very very interesting, but not analyzed at all) and correlating them against cellular behaviors would dramatically increase the impact of the work and leverage the advances presented here. For example, do migration rates differ between replicates? Division rates? Division synchrony? Migration orientation? This seems to be an incredibly rich dataset that would be fascinating to explore in greater detail, which seems to me to be the primary advance presented in this manuscript. I can appreciate that the authors may want to segregate some biological findings from the method, but I believe some nominal effort highlighting the quantitative nature of what this method enables would strengthen the impact of the paper and be useful for the reader. Selecting a small number of simple metrics (e.g. division frequency, average cell migration speed) and plotting them alongside the qualitative phases of the regeneration timeline that have already been generated would be a fairly modest investment of effort using tools that already exist in the Mastodon interface; I would roughly estimate it at an hour or two per dataset. I believe that this effort would be well worth it and would better highlight a major strength of the approach.

      The primary goal of this work was to establish a robust method for continuous long-term live imaging of regeneration, but we do appreciate that a more quantitative analysis would add value to the data we are presenting. We tried to address this request in three steps:

      First, we examined whether clear temporal patterns in cell division, cell movements or other cellular features can be observed in an accurately tracked dataset (li13-t4, tracked in Sugawara et al. 2022). To test this we used the feature extraction functions now available on the Mastodon platform (see link). We could discern a meaningful temporal pattern for cell divisions (see below); the other features showed no interpretable pattern of variation.

      Second, we asked whether we could use automated cell tracking to analyse the patterns of cell division in all our datasets. Using an Elephant deep learning model trained on the tracks of the li13-t4 dataset, we performed automated cell tracking in the same dataset, and compared the pattern of cell divisions from the automated cell track predictions with those coming from manually validated cell tracks. We observed that the automated tracks gave very imprecise results, with a high background of false positives obscuring the real temporal pattern (see images below, with validated data on the left, automated tracking on the right). These results show that the automated cell tracking is not accurate enough to provide a meaningful picture on the pattern of cell divisions.

      Third, we tried to improve the accuracy of detection of dividing cells by additional training of Elephant models on each dataset (to lower the rate of false positives), followed by manual proofreading. Given how labour intensive this is, we could only apply this approach to 4 additional datasets. The results of this analysis are presented in Figure 4.

      Author response image 1.
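A coarse version of the manual-versus-automated comparison of division timing can be sketched from a track graph. The sketch assumes tracks are available as a mapping from spot id to time point plus a list of (source, target) links; a division is taken to be a spot with two or more outgoing links. These structures and function names are hypothetical and are not the Mastodon API or file format.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def divisions_per_timepoint(
    spot_time: Dict[int, int],
    links: Iterable[Tuple[int, int]],
    n_timepoints: int,
) -> List[int]:
    """Count division events at each time point.

    spot_time maps a spot id to its time point; links are (source, target)
    pairs pointing forward in time. A spot with two or more outgoing links
    is treated as a mother cell dividing at its own time point.
    """
    out_degree = Counter(source for source, _ in links)
    counts = [0] * n_timepoints
    for spot_id, degree in out_degree.items():
        if degree >= 2:
            counts[spot_time[spot_id]] += 1
    return counts


def excess_and_missed(validated: List[int], predicted: List[int]) -> Tuple[int, int]:
    """Summed per-time-point surplus (likely false positives) and deficit (likely misses)."""
    excess = sum(max(p - v, 0) for v, p in zip(validated, predicted))
    missed = sum(max(v - p, 0) for v, p in zip(validated, predicted))
    return excess, missed


# Hypothetical toy lineage: spot 1 divides into spots 2 and 3 at time point 0.
spot_time = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
links = [(1, 2), (1, 3), (2, 4), (3, 5)]
print(divisions_per_timepoint(spot_time, links, n_timepoints=3))  # [1, 0, 0]
```

Because surpluses and deficits are only compared per time point, a false positive and a missed division in the same frame cancel out; matching individual division events would be needed for a stricter comparison.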

      MC 3. The authors describe the challenges faced by their described approach:

      Using this mode of semi-automated and manual cell tracking, we find that most cells in the upper slices of our image stacks (top 30 microns) can be tracked with a high degree of confidence. A smaller proportion of cell lineages are trackable in the deeper layers.

      Given that the authors quantify this in Table 1, it would aid the reader to provide metrics in the manuscript text at this point. Furthermore, the metrics provided in Table 1 appear to be for overall performance, but the text indicates that performance is heavily depth-dependent. Segregating the performance metrics further, for example providing DET, TRA, precision and recall for superficial layers only and for the overall dataset, would help support these arguments and better highlight the performance that a potential adopter of the method might expect.

      In the revised manuscript we have added data on the tracking performance of Elephant in relation to imaging depth in Suppl. Figure 3. These data confirm our original statement (which was based on manual tracking) that nuclei are more challenging to track in deeper layers.

      We point to these new results in two parts of the paper, as follows: "A smaller proportion of cells are trackable in the deeper layers (see Suppl. Figure 3)", and "Our results, summarised in Table 1A, show that the detection of nuclei can be enhanced by doubling the z resolution at the expense of xy resolution and image quality. This improvement is particularly evident in the deeper layers of the imaging stacks, which are usually the most challenging to track (Suppl. Figure 3)."
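Depth-stratified detection metrics of the kind requested can be sketched as follows. This assumes detections and ground-truth nuclei are given as (z, y, x) centroids in µm, with z measured from the top of the stack, and counts a detection as correct if it lies within a fixed distance of an unmatched ground-truth nucleus. The greedy matching rule and the 3 µm threshold are illustrative assumptions; this is not the Cell Tracking Challenge DET/TRA definition used in Table 1.

```python
import numpy as np


def count_matches(gt: np.ndarray, pred: np.ndarray, max_dist: float) -> int:
    """Greedy one-to-one matching of centroids, closest pairs first."""
    if len(gt) == 0 or len(pred) == 0:
        return 0
    dist = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    pairs = np.argwhere(dist <= max_dist)
    order = np.argsort(dist[pairs[:, 0], pairs[:, 1]], kind="stable")
    used_gt, used_pred = set(), set()
    for i, j in pairs[order].tolist():
        if i not in used_gt and j not in used_pred:
            used_gt.add(i)
            used_pred.add(j)
    return len(used_gt)


def precision_recall_up_to_depth(gt, pred, max_depth_um, max_dist_um=3.0):
    """Detection precision and recall for nuclei whose depth (column 0, µm) <= max_depth_um."""
    gt_sel = gt[gt[:, 0] <= max_depth_um]
    pred_sel = pred[pred[:, 0] <= max_depth_um]
    tp = count_matches(gt_sel, pred_sel, max_dist_um)
    precision = tp / len(pred_sel) if len(pred_sel) else float("nan")
    recall = tp / len(gt_sel) if len(gt_sel) else float("nan")
    return precision, recall


# Hypothetical centroids (z, y, x) in µm.
gt = np.array([[5.0, 0.0, 0.0], [10.0, 10.0, 10.0], [40.0, 0.0, 0.0]])
pred = np.array([[5.0, 1.0, 0.0], [40.0, 20.0, 0.0]])
print(precision_recall_up_to_depth(gt, pred, 30.0))    # top 30 µm only
print(precision_recall_up_to_depth(gt, pred, np.inf))  # whole stack
```

Filtering ground truth and predictions by the same depth cut-off can split true pairs that straddle the boundary; a small margin on the prediction side would reduce that effect.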

      MC 4. Performance characterization in Table 1 appears to derive from a single dataset that is then subsampled and processed in different ways to assess the impact of these changes on cell tracking and detection performance. While this is a suitable strategy for this type of optimization it leaves open the question of performance consistency across datasets. I fully recognize that this type of quantification can be onerous and time consuming, but some attempt to assess performance variability across datasets would be valuable. Manual curation over a short time window over a random sampling of the acquired data would be sufficient to assess this.

      We think that similar trade-offs will apply to all our datasets because tracking performance is constrained by the same features, which are intrinsic to our system; e.g. by the crowding of nuclei in relation to axial resolution, or the speed of mitosis in relation to the temporal resolution of imaging. We therefore do not see a clear rationale for repeating this analysis. On a practical level, our existing image datasets could not be subsampled to generate the various conditions tested in Table 1, so proving this point experimentally would require generating new recordings, and tracking these to generate ground truth data. This would require months of additional work.

      A second, related question is whether Elephant would perform equally well in detecting and tracking nuclei across different datasets. This point has been addressed in the Sugawara et al. 2022 paper, where the performance of Elephant was tested on diverse fluorescence datasets.

      Reviewer #3:

      Major comments:

      • The authors should clearly specify what are the key technical improvements compared to their previous studies (Alwes et al. 2016, Elife; Konstantinides & Averof 2014, Science). There, the approaches for mounting, imaging, and cell tracking are already introduced, and the imaging is reported to run for up to 7 days in some cases.

      In Konstantinides and Averof (2014) we did not present any live imaging at cellular resolution. In Alwes et al. (2016) we described key elements of our live imaging approach, but we were never able to record the entire time course of leg regeneration. The longest recordings in that work were 3.5 days long.

      We have revised the abstract and introduction to clarify the novelty of this work, in relation to our previous publications. Please see our response to comment MC1 of reviewer 2.

      • While the authors mention testing the effect of imaging parameters (such as scanning speed and line averaging) on the imaging/tracking outcome, very little or no information is provided on how this was done beyond the parameters that they finally arrived at.

      Scan speed and averaging parameters were determined by measuring contrast and signal-to-noise ratios in images captured over a range of settings. We have now added these data in Supplementary Figure 1.
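One simple way to compare acquisition settings is sketched below, assuming that a nuclear (signal) mask and a background mask are available for each test image. The definitions are common conventions chosen for illustration (background-subtracted mean signal over background standard deviation for SNR, and a Michelson-style ratio of mean signal and mean background for contrast); the exact measures used for Supplementary Figure 1 may differ.

```python
import numpy as np


def snr_and_contrast(image: np.ndarray, signal_mask: np.ndarray, background_mask: np.ndarray):
    """Return (SNR, contrast) for one image given boolean signal and background masks.

    SNR: (mean signal - mean background) / background standard deviation.
    Contrast: (mean signal - mean background) / (mean signal + mean background).
    """
    signal = image[signal_mask].astype(float)
    background = image[background_mask].astype(float)
    s, b = signal.mean(), background.mean()
    snr = (s - b) / background.std()
    contrast = (s - b) / (s + b)
    return snr, contrast


# Toy 2x2 image: top row is "nucleus", bottom row is background.
image = np.array([[10, 10], [2, 4]])
signal_mask = np.array([[True, True], [False, False]])
snr, contrast = snr_and_contrast(image, signal_mask, ~signal_mask)
print(snr, contrast)
```

In practice these values would be computed for images captured at each scan speed and averaging setting and compared against the corresponding acquisition time or light dose.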

      • The authors claim that, using the acquired live imaging data across entire regeneration time course, they are now able to confirm and extend their description of leg regeneration. However, many claims about the order and timing of various cellular events during regeneration are supported only by references to individual snapshots in figures or supplementary movies. Presenting a more quantitative description of cellular processes during regeneration from the acquired data would significantly enhance the manuscript and showcase the usefulness of the improved workflow.

      The events we describe can be easily observed in the maximum projections, available in Suppl. Data 2. Regarding the quantitative analysis, please see our response to comment MC2 of reviewer 2.  

      • Table 1 summarizes the performance of cell tracking using simulated datasets of different quality. However only averages and/or maxima are given for the different metrics, which makes it difficult to evaluate the associated conclusions. In some cases, only 1 or 2 test runs were performed.

      The metrics extracted from each of the three replicates, per dataset, are now included in Suppl. Data 4.

      We consistently used 3 replicates to measure tracking performance with each of the datasets. The "replicates" column label in Table 1 referred to the number of scans that were averaged to generate the image, not to the replicates used for estimating the tracking performance. To avoid confusion, we changed that label to "averaging".

      • OPTIONAL: An imaging approach that allows using the current mounting strategy but could help with some of the tradeoffs is using a spinning-disk confocal microscope instead of a laser scanning one. If the authors have such a system available, it could be interesting to compare it with their current scanning confocal setup.

      Preliminary experiments that we carried out several years ago on a spinning disk confocal (with a 20x objective and the CSU-W1 spinning disk) were not very encouraging, and we therefore did not pursue this approach further. The main problem was bad image quality in deeper tissue layers.

      Minor comments:

      • The presented imaging protocol was optimized for one laser wavelength only (561 nm) - this should be mentioned when discussing the technical limitations since animals tend to react differently to different wavelengths. Same settings might thus not be applicable for imaging a different fluorescent protein.

      In the second paragraph of the Results section, we explain that we perform the imaging at long wavelengths in order to minimise photodamage. It should be clear to the readers that changing the excitation wavelength will have an impact for long-term live imaging.

      • For transferability, it would be useful if the intensity of laser illumination was measured and given in the Methods, instead of just a relative intensity setting from the imaging software. Similarly, more details of the imaging system should be provided where appropriate (e.g., detector specifications).

      We have now measured the intensity of the laser illumination and added this information in the Methods: "Laser power was typically set to 0.3% to 0.8%, which yields 0.51 to 1.37 µW at 561 nm (measured with a ThorLabs Microscope Slide Power Sensor, #S170C)."

      Regarding the imaging system and the detector, we provide all the information that is available to us on the microscope's technical sheets.

      • The versions of analysis scripts associated with the manuscript should be uploaded to an online repository that permanently preserves the respective version.

      The scripts are now available on GitHub and online repositories. The relevant links are included in the revised manuscript.

    1. Reviewer #2 (Public Review):

      Summary:

      The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that controls for differences in amino acid usage and GC% across species. Using their new metric, the authors find a previously unobserved negative correlation between the overall adaptiveness of codon usage and body size across 118 vertebrates. As body size is negatively correlated with effective population size and thus the general strength of natural selection, the negative correlation between CAIS and body size is expected. The authors argue this was previously unobserved due to failures of other popular metrics such as Codon Adaptation Index (CAI) and the Effective Number of Codons (ENC) to adequately control for differences in amino acid usage and GC content across species. Most surprisingly, the authors also find a positive relationship between CAIS and the overall "disorderedness" of a species protein domains. As some of these results are unexpected, which is acknowledged by the authors, I think it would be particularly beneficial to work with some simulated datasets. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection when the mutation bias changes across species.

      Strengths:

      (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance (see Cope et al. Biochimica et Biophysica Acta - Biomembranes 2018 for a clear example of this).
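To make the point about CAI being a relative, within-amino-acid measure concrete, here is a minimal sketch of the standard Sharp and Li definition: each codon's relative adaptiveness w is its count in a reference gene set divided by the count of the most used synonymous codon, and CAI is the geometric mean of w over a gene's codons. The two-amino-acid codon table and the reference counts are toy assumptions, and the 0.5 pseudocount for unobserved codons is a common convention rather than anything taken from the manuscript.

```python
import math
from collections import Counter
from typing import Dict, List

# Toy table of synonymous codons (assumption: two amino acids only).
SYNONYMS: Dict[str, List[str]] = {
    "K": ["AAA", "AAG"],
    "Q": ["CAA", "CAG"],
}


def relative_adaptiveness(reference_codons: List[str]) -> Dict[str, float]:
    """w_i = count_i / max count among synonyms, from a reference gene set.

    Unobserved codons get a count of 0.5 so that log(w) stays finite.
    """
    counts = Counter(reference_codons)
    w: Dict[str, float] = {}
    for codons in SYNONYMS.values():
        adjusted = {c: counts[c] if counts[c] > 0 else 0.5 for c in codons}
        max_count = max(adjusted.values())
        for c in codons:
            w[c] = adjusted[c] / max_count
    return w


def cai(gene_codons: List[str], w: Dict[str, float]) -> float:
    """Geometric mean of w over the gene's codons that appear in the table."""
    logs = [math.log(w[c]) for c in gene_codons if c in w]
    return math.exp(sum(logs) / len(logs))


reference = ["AAG"] * 3 + ["AAA"] + ["CAG"] * 2 + ["CAA"] * 2
w = relative_adaptiveness(reference)
# Both toy genes use only A-ending codons, yet their CAI differs
# because the two amino acids have different spreads of w.
print(cai(["AAA", "AAA"], w), cai(["CAA", "CAA"], w))
```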

      (2) The authors present numerous analyses using both ENC and mean CAI as a comparison to CAIS, helping give a sense of how CAIS corrects for some of the issues with these other metrics. I also enjoyed that they examined the previously unobserved relationship between codon usage bias and body size, which has bugged me ever since I saw Kessler and Dean 2014. The result comparing protein disorder to CAIS was particularly interesting and unexpected.

      (3) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences.

      Weaknesses:

      (1) The main weakness of this work is that it lacks simulated data to confirm that CAIS works as expected. This would be particularly useful for assessing the relationship between CAIS and the overall effect of protein structure disorder, which the authors acknowledge is an unexpected result. I think simulations could also allow the authors to assess how their metric performs in situations where mutation bias and natural selection act in the same direction vs. opposite directions. Additionally, although I appreciate their comparisons to ENC and mean CAI, a comparison to other popular codon metrics for calculating the overall adaptiveness of a genome (e.g. dos Reis et al.'s statistic, which is a function of the tRNA Adaptation Index (tAI) and ENC) may be more appropriate. Even if the results are similar to those of dos Reis et al.'s statistic, CAIS has a notable advantage in that it doesn't require identifying tRNA gene copy numbers or abundances, which I think are generally less readily available than genomic GC% and protein-coding sequences.

      The authors mention the selection-mutation-drift equilibrium model, which underlies the basic ideas of this work (e.g. a higher effective population size \(N_e\) results in stronger selection on codon usage), but a more in-depth framing of CAIS in terms of this model is not given. I think this could be valuable, particularly in addressing the question "are we really estimating what we think we're estimating?"

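      Let's take a closer look at the formulation for RSCUS. From here on, subscripts will be used only to denote the codon, and it will be assumed that we are considering a single species:

      \[ \text{RSCUS}_i = \frac{F_i}{E_i} \]

      where \(F_i\) is the observed frequency of codon \(i\) among its synonymous codons and \(E_i\) is its expected frequency given the species' GC content.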
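A GC-based expectation of the kind RSCUS divides by can be sketched for any synonymous family. This sketch assumes that G and C each occur with probability GC/2 and A and T with (1 - GC)/2, independently at each codon position, and renormalises within the amino acid; it is an illustrative reading of how such an expectation can be built, not the CAIS authors' code, which may estimate GC content or the expectation differently.

```python
from typing import Dict, List


def expected_codon_freqs(codons: List[str], gc: float) -> Dict[str, float]:
    """Expected frequency of each synonymous codon from GC content alone.

    Assumes G and C each occur with probability gc/2 and A and T with (1-gc)/2,
    independently at every position, then renormalises within the family.
    """
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    raw = {c: p[c[0]] * p[c[1]] * p[c[2]] for c in codons}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


def rscus(observed_counts: Dict[str, int], gc: float) -> Dict[str, float]:
    """Observed synonymous frequency divided by the GC-based expectation."""
    total = sum(observed_counts.values())
    expected = expected_codon_freqs(list(observed_counts), gc)
    return {c: (n / total) / expected[c] for c, n in observed_counts.items()}


# Lysine (AAA/AAG) in a hypothetical species with 40% GC.
print(expected_codon_freqs(["AAA", "AAG"], 0.4))
print(rscus({"AAA": 5, "AAG": 5}, 0.4))
```

For a two-codon NNA/NNG family this expectation reduces to GC for NNG and 1 - GC for NNA, which is the case used in the two-codon argument.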

      I think what the authors are attempting to do is "divide out" the effects of mutation bias (as given by the GC-based expected frequency \(E_i\)), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. MBE 2015, which says that the expected frequency of codon \(i\) at selection-mutation-drift equilibrium in gene \(g\) for an amino acid with \(n_a\) synonymous codons is

      \[ p_{i,g} = \frac{\exp(-\Delta M_i - \Delta\eta_i \phi_g)}{\sum_{j=1}^{n_a} \exp(-\Delta M_j - \Delta\eta_j \phi_g)} \qquad (1) \]

      where \(\Delta M_i\) is the mutation bias, \(\Delta\eta_i\) is the strength of selection scaled by the strength of drift, and \(\phi_g\) is the gene expression level of gene \(g\). In this case, \(\Delta M_i\) and \(\Delta\eta_i\) reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which \(\Delta M = \Delta\eta = 0\). Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), \(p_{i,g}\) could be considered the expected observed frequency of codon \(i\) in gene \(g\).

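      Let's re-write the expected frequency \(E_i\) in the form of Gilchrist et al., such that it is a function of the mutation bias \(\Delta M\). For simplicity, we will consider just the two-codon case (codons NNA and NNG) and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the terms \(E_{NNG}\) and \(E_{NNA}\) can be written as

      \[ E_{NNG} = \frac{\mu_{A \to G}}{\mu_{A \to G} + \mu_{G \to A}} \qquad (2) \]

      \[ E_{NNA} = \frac{\mu_{G \to A}}{\mu_{A \to G} + \mu_{G \to A}} \qquad (3) \]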

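      where \(\mu_{x \to y}\) is the mutation rate from nucleotide \(x\) to nucleotide \(y\). As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias is

      \[ \Delta M_{NNA} = \ln \frac{\mu_{A \to G}}{\mu_{G \to A}} \qquad (4) \]

      This can be expressed in terms of the equilibrium GC content by recognizing that, at equilibrium, the expected frequency of NNG equals the GC content, so that

      \[ \frac{\mu_{A \to G}}{\mu_{G \to A}} = \frac{\text{GC}}{1 - \text{GC}} \qquad (5) \]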

      As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon at an amino acid becomes just a Bernoulli process.

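      If we do this, then the expected frequencies of NNG and NNA, written in terms of the mutation bias \(\Delta M_{NNA}\) of NNA relative to NNG, are

      \[ E_{NNG} = \text{GC} = \frac{1}{1 + e^{-\Delta M_{NNA}}}, \qquad E_{NNA} = 1 - \text{GC} = \frac{e^{-\Delta M_{NNA}}}{1 + e^{-\Delta M_{NNA}}} \qquad (6) \]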

      Recall that in the Gilchrist et al. framework, the reference codon has \(\Delta M = \Delta\eta = 0\). Thus, we have recovered the Gilchrist et al. model from the formulation of the GC-based expected frequency under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for \(\Delta\eta\) in equation (1).

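      We can then calculate the expected RSCUS using equation (1) (writing \(p_{i,g}\) for the observed frequency) and equation (6) for the two-codon case. For simplicity, assume we are only considering a gene of average expression (defined as \(\phi_g = 1\)). Assume in this case that NNG is the reference codon (\(\Delta M_{NNG} = \Delta\eta_{NNG} = 0\)). Then, for codon NNA,

      \[ E[\text{RSCUS}_{NNA}] = \frac{p_{NNA,g}}{1 - \text{GC}} = \frac{e^{-\Delta\eta_{NNA}}}{\text{GC} + (1 - \text{GC})\, e^{-\Delta\eta_{NNA}}} \]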

      This shows that the expected value of RSCUS for a two-codon amino acid increases as the strength of selection increases, which is desired. Note that \(\Delta\eta\) in Gilchrist et al. is formulated in terms of selection against a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If \(\Delta\eta_{NNA} = 0\) (i.e. selection does not favor either codon), then \(E[\text{RSCUS}_{NNA}] = 1\). Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if \(\Delta\eta\) (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content ranging around 0.41, so I suspect their results are okay.
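The dependence of expected RSCUS on mutation bias can be checked numerically. This is a minimal sketch, assuming the Gilchrist et al. form for the observed frequency (proportional to exp(-ΔM - Δη φ), with NNG as the reference codon) and a mutation-only expectation of 1 - GC for NNA; the function name and the example values of GC and Δη are hypothetical.

```python
import math


def expected_rscus_nna(gc: float, delta_eta: float, phi: float = 1.0) -> float:
    """Expected RSCUS of codon NNA for a two-codon (NNA/NNG) amino acid.

    Observed frequency: Gilchrist et al. form with NNG as the reference codon
    (Delta M = Delta eta = 0 for NNG). Expected frequency under mutation
    alone: 1 - GC.
    """
    # Mutation bias implied by equilibrium GC content: exp(-Delta M) = (1 - GC) / GC.
    delta_m = math.log(gc / (1.0 - gc))
    # Two-codon case: the reference codon NNG has weight exp(0) = 1.
    weight_nna = math.exp(-delta_m - delta_eta * phi)
    p_nna = weight_nna / (1.0 + weight_nna)
    return p_nna / (1.0 - gc)


# Same selection on codon usage (Delta eta = -0.5, favouring NNA), different GC content.
for gc in (0.41, 0.60):
    print(gc, round(expected_rscus_nna(gc, delta_eta=-0.5), 3))
```

With no selection the function returns 1 regardless of GC, while for a fixed nonzero Δη the value shifts with GC, which is the caution raised above.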

      Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids.

      Another minor weakness of this work is that although the method is generally applicable to any species with an annotated genome and the code is publicly available, the code itself contains hard-coded values for GC% and amino acid frequencies across the 118 vertebrates. The lack of a more flexible tool may make it difficult for less computationally-experienced researchers to take advantage of this method.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The image analysis pipeline is tested by analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole-mount sample preparation using an optimal refractive-index-matched mounting medium, opposing dual-side imaging with two-photon microscopy for enhanced laser penetration, dual-view registration, and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used, and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.

      In the image analysis pipeline, different pre-treatments are applied depending on the analysis to be performed (for nuclear segmentation, contrast enhancement and normalisation; for quantitative analysis of gene expression, corrections for optical artifacts inducing signal intensity variations). StarDist3D was used for nuclear segmentation. The study analyses gastruloid nuclear density, patterns of cell division, morphology, deformation, and gene expression.
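Contrast normalisation before StarDist inference is often done by percentile scaling. A minimal sketch follows; the default percentiles of 1 and 99.8 are a common choice in StarDist examples and are an assumption here, not values reported by the authors.

```python
import numpy as np


def percentile_normalize(img: np.ndarray, pmin: float = 1.0, pmax: float = 99.8,
                         eps: float = 1e-20) -> np.ndarray:
    """Rescale intensities so the pmin and pmax percentiles map to 0 and 1.

    Values outside that range are not clipped, so bright outliers stay
    above 1 rather than saturating.
    """
    lo, hi = np.percentile(img, [pmin, pmax])
    return (img.astype(np.float32) - lo) / (hi - lo + eps)


# Toy "volume" with a linear intensity ramp from 0 to 100.
volume = np.arange(101, dtype=np.uint16).reshape(1, 1, 101)
normalized = percentile_normalize(volume, pmin=0.0, pmax=100.0)
print(normalized[0, 0, 50])
```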

      Strengths:

      The methods developed are sound, well described, and well-validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research, and would be of interest to the wider scientific community.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      A recommendation should be added on when or under which conditions to use this pipeline.

      We thank the reviewer for this valuable feedback, which will be addressed in the revision. In general, the pipeline is applicable to any tissue, but it is particularly useful for large and dense 3D samples—such as organoids, embryos, explants, spheroids, or tumors—that are typically composed of multiple cell layers and have a thickness greater than 50 µm.

      The processing and analysis pipeline is compatible with any type of 3D imaging data (e.g. confocal, two-photon, light-sheet, live or fixed).

      - Spectral unmixing to remove signal cross-talk between multiple fluorescent targets is typically more relevant in two-photon imaging, due to the broader excitation spectra of fluorophores compared to single-photon imaging. In confocal or light-sheet microscopy, alternating excitation wavelengths often circumvents the need for unmixing. Spectral decomposition performs even better with true spectral detectors; however, these are usually not non-descanned detectors, which are more appropriate for deep-tissue imaging. Our approach demonstrates that simultaneous, cross-talk-free four-color two-photon imaging can be achieved in dense 3D specimens with four non-descanned detectors and co-excitation by just two laser lines. Depending on the dispersion in optically dense samples, depth-dependent apparent emission spectra need to be considered.

      - Nuclei segmentation using our trained StarDist3D model is applicable to any system under two conditions: (1) the nuclei exhibit a star-convex shape, as required by the StarDist architecture, and (2) the image resolution is sufficient in XYZ to allow resampling. The exact sampling required is object- and system-dependent, but the goal is to achieve nearly isotropic objects with diameters of approximately 15 pixels while maintaining image quality. In practice, images containing objects that are natively close to or larger than 15 pixels in diameter should segment well after resampling. Conversely, images with objects that are significantly smaller along one or more dimensions will require careful inspection of the segmentation results.

      - Normalization is broadly applicable to multicolor data when at least one channel is expected to be ubiquitously expressed within its domain. Wavelength-dependent correction requires experimental calibration using a ubiquitous signal at each wavelength. Importantly, this calibration only needs to be performed once for a given set of experimental conditions (e.g., fluorophores, tissue type, mounting medium).

      - Multi-scale analysis of gene expression and morphometrics is applicable to any 3D multicolor image. This includes both the 3D visualization tools (Napari plugins) and the various analytical plots (e.g., correlation plots, radial analysis). Multi-scale analysis can be performed even with imperfect segmentation, as long as segmentation errors tend to cancel out when averaged locally at the relevant spatial scale. However, systematic errors—such as segmentation uncertainty along the Z-axis due to strong anisotropy—may accumulate and introduce bias in downstream analyses. Caution is advised when analyzing hollow structures (e.g., curved epithelial monolayers with large cavities), as the pipeline was developed primarily for 3D bulk tissues, and appropriate masking of cavities would be needed.
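The point that zero-mean segmentation errors cancel out under local averaging, while systematic errors do not, can be illustrated with a minimal sketch. It assumes per-nucleus measurements (e.g. nuclear volume) located at nuclear centroids and uses a Gaussian-weighted neighborhood average at scale `sigma`; the function `local_average` and the synthetic data are hypothetical and are not taken from the tapenade package.

```python
import numpy as np


def local_average(points, values, sigma):
    """Gaussian-weighted average of per-nucleus values around each nucleus.

    points: (N, 3) centroids in physical units (e.g. um).
    values: (N,) per-nucleus measurements (e.g. volume, intensity).
    sigma:  averaging scale, in the same units as `points`.

    Dense O(N^2) version for illustration only; whole gastruloids would
    need a neighbor search restricted to a few sigma.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma**2))
    return (w @ values) / w.sum(axis=1)


rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 100.0, size=(400, 3))
truth = np.ones(400)
random_err = truth + rng.normal(0.0, 0.3, size=400)  # zero-mean per-nucleus errors
biased_err = random_err + 0.2                         # plus a systematic offset

for sigma in (5.0, 10.0, 20.0):
    rand_avg = local_average(pts, random_err, sigma)
    bias_avg = local_average(pts, biased_err, sigma)
    print(f"sigma={sigma:>4}: spread {rand_avg.std():.3f}, "
          f"mean bias {bias_avg.mean() - 1.0:+.3f}")
```

As the averaging scale grows, the spread from random errors shrinks, whereas the 0.2 offset survives at every scale, which is the kind of bias (e.g. from strong Z anisotropy) that the caution above refers to.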

      Reviewer #2 (Public review):

      Summary:

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques to image whole-mount immunostained gastruloids. This approach enables the acquisition of comprehensive 3D images that capture both tissue-scale and single-cell level information.

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, achieving an F1 score of 85 ± 3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.
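For readers less familiar with the metric, an F1 score at a 50% IoU threshold can be computed as in the sketch below: ground-truth and predicted objects are matched one-to-one, a match counts as a true positive (TP) if its IoU reaches the threshold, and F1 = 2TP / (2TP + FP + FN). This is a generic implementation assuming integer label images (0 = background), not the authors' evaluation code; the same functions work on 2D annotation slices and on 3D volumes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(gt, pred):
    """IoU between every ground-truth label and every predicted label."""
    gt_ids = np.unique(gt[gt > 0])
    pred_ids = np.unique(pred[pred > 0])
    iou = np.zeros((len(gt_ids), len(pred_ids)))
    for i, g in enumerate(gt_ids):
        g_mask = gt == g
        for j, p in enumerate(pred_ids):
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            if inter:
                iou[i, j] = inter / np.logical_or(g_mask, p_mask).sum()
    return iou


def f1_at_iou(gt, pred, thresh=0.5):
    """F1 score after one-to-one matching of objects that maximizes total IoU."""
    iou = iou_matrix(gt, pred)
    if iou.size == 0:
        return 0.0
    rows, cols = linear_sum_assignment(-iou)  # negate: the solver minimizes cost
    tp = int((iou[rows, cols] >= thresh).sum())
    fp = iou.shape[1] - tp  # predictions without a good match
    fn = iou.shape[0] - tp  # ground-truth objects that were missed
    return 2 * tp / (2 * tp + fp + fn)


# Toy 2D example: one perfect match, one partial match (IoU 6/9), one spurious object.
gt = np.zeros((10, 10), dtype=int)
gt[1:4, 1:4] = 1
gt[6:9, 6:9] = 2
pred = np.zeros_like(gt)
pred[1:4, 1:4] = 5
pred[6:9, 6:8] = 7
pred[0:2, 7:9] = 9
print(f1_at_iou(gt, pred))
```

With two true positives, one false positive and no false negatives, the toy example scores 0.8.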

      All computational tools developed in this study are released as open-source, Python-based software.

      Strengths:

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven Napari platform, facilitating interactive exploration and analysis.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The computational module appears promising. However, the analysis pipeline has not been validated on datasets beyond those generated by the authors, making it difficult to assess its general applicability.

      We agree that applying our analysis pipeline to published datasets—particularly those acquired with different imaging systems—would be valuable. However, only a few high-resolution datasets of large organoid samples are publicly available, and most of these either lack multiple fluorescence channels or represent 3D hollow structures. Our computational pipeline consists of several independent modules: spectral filtering, dual-view registration, local contrast enhancement, 3D nuclei segmentation, image normalization based on a ubiquitous marker, and multiscale analysis of gene expression and morphometrics.

      Spectral filtering has already been applied in other systems (e.g. [7] and [8]), but is here extended to account for imaging-depth-dependent apparent emission spectra of the different fluorophores. Our pipeline provides Python code to run spectral filtering on multichannel images. To apply the spectral filtering algorithm used here, the spectral pattern of each fluorophore must be calibrated as a function of imaging depth; these patterns depend on the specific emission windows and detector settings of the microscope.
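To illustrate how a depth-dependent calibration enters the computation, the sketch below substitutes plain linear unmixing (per-pixel least squares) for the authors' spectral filtering algorithm; it is not their implementation. It assumes reference spectra measured at a few calibration depths and linearly interpolates them to each z-slice. Clipping negative abundances at zero is a crude stand-in for a proper non-negative solver. The array layout and the function name `unmix_stack` are hypothetical.

```python
import numpy as np


def unmix_stack(stack, depths_um, calib_depths_um, calib_spectra):
    """Linear unmixing with depth-dependent reference spectra.

    stack:           (Z, C, Y, X) intensities in C detection channels.
    depths_um:       (Z,) imaging depth of each slice.
    calib_depths_um: (D,) increasing depths at which spectra were calibrated.
    calib_spectra:   (D, C, F) contribution of each of F fluorophores to each
                     channel, measured at each calibration depth.
    Returns (Z, F, Y, X) estimated fluorophore abundances.
    """
    n_z, n_c, n_y, n_x = stack.shape
    n_f = calib_spectra.shape[2]
    out = np.empty((n_z, n_f, n_y, n_x))
    for z in range(n_z):
        # Mixing matrix at this depth, interpolated entry by entry between calibrations.
        mix = np.empty((n_c, n_f))
        for c in range(n_c):
            for f in range(n_f):
                mix[c, f] = np.interp(depths_um[z], calib_depths_um, calib_spectra[:, c, f])
        pixels = stack[z].reshape(n_c, -1)  # one column per pixel
        abundances, *_ = np.linalg.lstsq(mix, pixels, rcond=None)
        # Negative abundances are unphysical; clipping is a simple approximation.
        out[z] = np.clip(abundances, 0.0, None).reshape(n_f, n_y, n_x)
    return out
```

With four detectors and four fluorophores, `calib_spectra` is a stack of 4 × 4 matrices, one per calibration depth; the least-squares solve needs at least as many detection channels as fluorophores.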

      Image normalization using a wavelength-dependent correction also requires calibration on a given imaging setup to measure the difference in signal decay among the different fluorophore species. To our knowledge, the calibration procedures for spectral filtering and for our image-normalization approach have not been performed previously in 3D samples, which is why validation on published datasets is not readily possible. Nevertheless, they are described in detail in the Methods section, and the code used (from the calibration measurements to the corrected images) is available open-source at the Zenodo link in the manuscript.
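The calibration idea can be sketched under a simplifying assumption that is ours, not necessarily the authors': the mean signal of a ubiquitous stain in each channel decays with depth as a single exponential, I(z) = I0 exp(-z/ℓ). Fitting ℓ per wavelength then yields a depth-dependent factor that corrects a target/reference intensity ratio for the difference in decay. Function names and the calibration numbers below are hypothetical.

```python
import numpy as np


def fit_attenuation_length(depths_um, mean_intensity):
    """Fit I(z) = I0 * exp(-z / ell) by linear regression on log-intensity.

    Returns (ell, I0). Assumes a single-exponential decay and positive intensities.
    """
    slope, intercept = np.polyfit(depths_um, np.log(mean_intensity), 1)
    return -1.0 / slope, np.exp(intercept)


def ratio_correction(depths_um, ell_target, ell_reference):
    """Multiplicative factor undoing the differential decay in a target/reference ratio.

    Under the model, target/reference ~ exp(-z * (1/ell_t - 1/ell_r)),
    so the correction is the inverse of that exponential.
    """
    z = np.asarray(depths_um, dtype=float)
    return np.exp(z * (1.0 / ell_target - 1.0 / ell_reference))


# Hypothetical calibration profiles: mean intensity of a ubiquitous stain per depth.
z = np.linspace(0.0, 200.0, 21)
ref_profile = 1000.0 * np.exp(-z / 120.0)   # reference channel
target_profile = 800.0 * np.exp(-z / 80.0)  # channel assumed to decay faster

ell_r, _ = fit_attenuation_length(z, ref_profile)
ell_t, _ = fit_attenuation_length(z, target_profile)
corrected_ratio = (target_profile / ref_profile) * ratio_correction(z, ell_t, ell_r)
print(ell_r, ell_t, corrected_ratio[[0, -1]])
```

On these noise-free profiles the fit recovers the two attenuation lengths and the corrected ratio is flat with depth; as the text notes, such a calibration would only need to be repeated when fluorophores, tissue type, or mounting medium change.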

      Dual-view registration, local contrast enhancement, and multiscale analysis of gene expression and morphometrics are not limited to organoid data or our specific imaging modalities. If we identify suitable datasets to validate these modules, we will include them in the revised manuscript.

      To evaluate our 3D nuclei segmentation model, we plan to test it on diverse systems, including gastruloids stained with the nuclear marker Draq5 from Moos et al. [1]; breast cancer spheroids; primary ductal adenocarcinoma organoids; human colon organoids and HCT116 monolayers from Ong et al. [2]; and zebrafish tissues imaged by confocal microscopy from Li et al. [3]. These datasets were acquired using either light-sheet or confocal microscopy, with varying imaging parameters (e.g., objective lens, pixel size, staining method).

      Preliminary results are promising (see Author response image 1). We will provide quantitative comparisons of our model’s performance on these datasets, using annotations or reference predictions provided by the original authors where available.

      Author response image 1.

      Qualitative comparison of our custom StarDist3D segmentation strategy on diverse published 3D nuclei datasets. For simplicity, we show one slice in the XY plane. (a) Gastruloid stained with the nuclear marker DRAQ5, imaged with an open-top dual-view and dual-illumination LSM [1]. (b) Breast cancer spheroid [2]. (c) Primary pancreatic ductal adenocarcinoma organoid imaged with confocal microscopy [2]. (d) Human colon organoid imaged with a laser scanning confocal microscope [2]. (e) HCT116 cell monolayer imaged with a laser scanning confocal microscope [2]. (f) Fixed zebrafish embryo stained for nuclei and imaged with a Zeiss LSM 880 confocal microscope [3].

      Besides, the nuclei segmentation component lacks benchmarking against existing methods.

      We agree with the reviewer that a benchmark against existing segmentation methods would be very useful. We tried different pre-trained models:

      - Cellpose, which we tested in a previous paper ([4]) and which performed poorly compared to our trained StarDist3D model.

      - DeepStar3D ([2]) is only available in the software 3DCellScope. We could not benchmark the model on our data, because the free and accessible version of the software is limited to small datasets. An image of a single whole-mount gastruloid with one channel and dimensions (347, 467, 477) was too large to be processed (see screenshot below). The segmentation model could not be extracted from the source code and tested externally because the trained DeepStar3D weights are encrypted.

      Author response image 2.

      Screenshot of the 3DCellScope software. We could not perform 3D nuclei segmentation of a whole-mount gastruloid because the image was too large to be processed.

      - AnyStar ([5]), a model trained with the StarDist3D architecture, did not perform well on our data because of the heterogeneous staining. Basic pre-processing such as median and Gaussian filtering did not improve the results and led to incorrect segmentation of touching nuclei. AnyStar was shown to segment colon organoids well in Ong et al., 2025 ([2]), but there the nuclei were more homogeneously stained. Our Hoechst staining displays bright chromatin spots that are incorrectly labeled as individual nuclei.

      - Cellos ([6]), another model trained with the StarDist3D architecture, also performed poorly. The objects used for training and validation are sparse and not touching, so the predicted segmentation has many false negatives even when the probability threshold is lowered to detect more objects. Additionally, the network was trained with an anisotropy of (9,1,1), based on images with low z resolution, so it performed poorly on nearly isotropic images. Adapting our images to the network's anisotropy results in an imprecise segmentation that cannot be used to measure 3D nuclear deformations.
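The resampling condition stated earlier for our StarDist3D model (nearly isotropic nuclei about 15 pixels in diameter) translates into per-axis zoom factors computed from the voxel size and a typical nuclear diameter, as in the sketch below; the same arithmetic shows how far an image must be distorted to match a network trained at a different anisotropy. The helper names and the example voxel size and diameter are hypothetical; `scipy.ndimage.zoom` performs the interpolation.

```python
import numpy as np
from scipy import ndimage


def zoom_factors(voxel_size_um, nucleus_diameter_um, target_px=15.0):
    """Per-axis zoom so that a nucleus of the given physical diameter spans
    about `target_px` voxels along every axis (nearly isotropic objects)."""
    voxel = np.asarray(voxel_size_um, dtype=float)
    diameter_px = nucleus_diameter_um / voxel  # current diameter in voxels, per axis
    return target_px / diameter_px


def resample_for_segmentation(image, voxel_size_um, nucleus_diameter_um, target_px=15.0):
    """Resample a (Z, Y, X) image so nuclei reach the target size in voxels."""
    factors = zoom_factors(voxel_size_um, nucleus_diameter_um, target_px)
    # Linear interpolation (order=1) avoids the overshoot of higher-order splines.
    return ndimage.zoom(image, factors, order=1), factors


# Hypothetical acquisition: 2 um z-steps, 0.5 um xy pixels, nuclei about 8 um across.
rng = np.random.default_rng(0)
img = rng.random((20, 64, 64)).astype(np.float32)
resampled, factors = resample_for_segmentation(img, (2.0, 0.5, 0.5), 8.0)
print(factors, resampled.shape)
```

In this example the factors are 3.75 along z and 0.9375 in XY: the nuclei are natively far smaller than the target along z, which is the situation where segmentation results require careful inspection.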

      We tried both Cellos and AnyStar predictions on a gastruloid image from Fig. S2 of our main manuscript. Author response image 3 displays the results qualitatively compared to our trained model Stardist-tapenade. For the revision of the paper, we will perform a comprehensive benchmark of these state-of-the-art routines, including quantitative assessment of the performance.

      Author response image 3.

      Qualitative comparison of two published segmentation models versus our model. For simplicity, we show one slice in the XY plane. Segmentations are displayed with their contours only. (Top left) Gastruloid stained with Hoechst, image extracted from Fig. S2 of our manuscript. (Top right) Same image overlaid with the prediction from the Cellos model, showing many false negatives. (Bottom left) Same image overlaid with the prediction from our Stardist-tapenade model. (Bottom right) Same image overlaid with the prediction from the AnyStar model; false positives are indicated with red arrows.

      Appraisal:

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim is largely achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.

      Impact and utility:

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community.

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings, such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as the radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks, and multiple napari plugins.

      Strengths

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges, including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference are very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done, with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot), and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      We thank the reviewer for his positive feedback and appreciation of our work.

      Weaknesses

      I don't see any major weaknesses, and I would only have two issues that I think should be addressed in a revision:

      (1) The demonstration notebooks lack accompanying sample datasets, preventing users from running them immediately and limiting the pipeline's accessibility. I would suggest including a (selective) demo dataset that can be used to run the notebooks (e.g. for spectral unmixing) and/or providing easily accessible demo input data for the napari plugins (I saw that there is some sample data for the processing plugin, so maybe this could already be used for the notebooks?).

      We thank the reviewer for this relevant suggestion. The 7 notebooks were updated to automatically download sample test data, so the different parts of the pipeline can now be run immediately: https://github.com/GuignardLab/tapenade/tree/chekcs_on_notebooks/src/tapenade/notebooks

      (2) The results for the morphometric analysis (Figure 4) seem to be only shown in lateral (xy) views without the corresponding axial (z) views. I would suggest adding this to the figure and showing the density/strain/angle distributions for those axial views as well.

      We agree with the reviewer that a morphometric analysis based on the axial views would be informative and plan to perform this analysis for the revision.

      (1) Moos, F., Suppinger, S., de Medeiros, G., Oost, K.C., Boni, A., Rémy, C., Weevers, S.L., Tsiairis, C., Strnad, P. and Liberali, P., 2024. Open-top multisample dual-view light-sheet microscope for live imaging of large multicellular systems. Nature Methods, 21(5), pp.798-803.

      (2) Ong, H.T., Karatas, E., Poquillon, T., Grenci, G., Furlan, A., Dilasser, F., Mohamad Raffi, S.B., Blanc, D., Drimaracci, E., Mikec, D. and Galisot, G., 2025. Digitalized organoids: integrated pipeline for high-speed 3D analysis of organoid structures using multilevel segmentation and cellular topology. Nature Methods, 22(6), pp.1343-1354.

      (3) Li, L., Wu, L., Chen, A., Delp, E.J. and Umulis, D.M., 2023. 3D nuclei segmentation for multi-cellular quantification of zebrafish embryos using NISNet3D. Electronic Imaging, 35, pp.1-9.

      (4) Vanaret, J., Dupuis, V., Lenne, P. F., Richard, F., Tlili, S., & Roudot, P. (2023). A detector-independent quality score for cell segmentation without ground truth in 3D live fluorescence microscopy. IEEE Journal of Selected Topics in Quantum Electronics, 29(4: Biophotonics), 1-12.

      (5) Dey, N., Abulnaga, M., Billot, B., Turk, E. A., Grant, E., Dalca, A. V., & Golland, P. (2024). AnyStar: Domain randomized universal star-convex 3D instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 7593-7603).

      (6) Mukashyaka, P., Kumar, P., Mellert, D. J., Nicholas, S., Noorbakhsh, J., Brugiolo, M., ... & Chuang, J. H. (2023). High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology with Cellos. Nature Communications, 14(1), 8406.

      (7) Rakhymzhan, A., Leben, R., Zimmermann, H., Günther, R., Mex, P., Reismann, D., ... & Niesner, R. A. (2017). Synergistic strategy for multicolor two-photon microscopy: application to the analysis of germinal center reactions in vivo. Scientific reports, 7(1), 7101.

      (8) Dunsing, V., Petrich, A., & Chiantia, S. (2021). Multicolor fluorescence fluctuation spectroscopy in living cells via spectral detection. Elife, 10, e69687.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths:

      The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses:

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines the confidence of the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.

      The main concern, also mentioned by other reviewers, is the lack of mutational data and functional studies on the identified binding sites. Two other structures of TRAP transporters have been determined, one from Haemophilus influenzae (Hi) and the other from Photobacterium profundum (Pp). We will refer to the references in this paper as [1], Peter et al. as [2], and Davies et al. as [3]. The table below lists all the mutations made in the Neu5Ac binding site, including direct polar interactions between Neu5Ac and the side chains, as well as the newly identified metal sites.

      The Fusobacterium nucleatum (Fn) SiaQM whose structure we report shares significant sequence identity with the previously reported Hi protein. When we superimpose the Pp and Fn structures, we observe that nearly all the residues that bind to the Neu5Ac and the third metal site are conserved. This suggests that mutagenesis and functional studies from other research can be related to the structure presented in our work.

      The table below shows that all three residues that directly interact with Neu5Ac have been tested by site-directed mutagenesis for their role in Neu5Ac transport. Both D521 and S300 are critical for transport, while S345 is not. We do not believe that a mutation of D521A in Fn, followed by transport studies, will provide any new information.

      However, Peter et al. have mutated only one of the 5 residues near the newly identified metal binding site, which resulted in no transport. The rest of the residues have not been functionally tested. We propose to mutate these residues to Ala, express and purify the proteins, and then carry out transport assays on those that show expression. We will include this information in the revised manuscript.

      Reviewer #2 (Public Review):

      In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T), this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough, as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site, but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein where one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm that the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens, and now provides a framework for the design of inhibitors of these systems.

      The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.

      The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.

      Please see the comments above.

      Reviewer #3 (Public Review):

      The manuscript by Goyal et al. reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.

      We appreciate your feedback and will make the necessary modifications to the manuscript incorporating most of the suggestions. We will submit the revised version once the experiments are completed. We are also working on improving the quality of the figures and have made several attempts to enhance the resolution using CryoSPARC or RELION, but without success. We will continue to explore newer methods in an effort to achieve higher resolution and to model more lipids, particularly in the binding pocket.

    1. Reviewer #3 (Public Review):

      The manuscript presents an intriguing explanation for why grid cell firing fields do {\em not} lie on a lattice whose axes are aligned with the walls of a square arena. This observation, by itself, merits the manuscript's dissemination to the journal's audience.

      The presentation is quirky (but keep the quirkiness!).

      But let me recast the problem presented by the authors as one of combinatorics. Given repeating, spatially separated firing fields across cells, one obtains temporal sequences of grid cells firing. Label these cells by integers from $[n]$. Any two cells firing in succession should uniquely identify one of six directions (from the hexagonal lattice) in which the agent is currently moving.

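      Now, take the symmetric group $\Sigma$ of cyclic permutations on $n$ elements. We ask whether there are cyclic permutations of $[n]$ such that no two elements that are adjacent in the cyclic order $(1,2,\ldots,n)$ remain adjacent in the permutation.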

      So, for instance, $(4,2,3,1)$ would not be counted as a valid permutation of $(1,2,3,4)$, as $(2,3)$ and $(1,4)$ are adjacent.

      Furthermore, given $[n]$, are there two distinct cyclic permutations such that {\em no} adjacencies are preserved when considering any pair of permutations (among the triple of the original ordered sequence and the two permutations)? In other words, if we consider the permutation required to take the first permutation into the second, that permutation should not preserve any adjacencies.

      {\bf Key question}: is there any difference between the solution to the combinatorics problem sketched above and the result in the manuscript? Specifically, the text argues that for $n=7$ there is only {\em one} solution.

      Ideally, one would strive to obtain a closed-form solution for the number of such permutations as a function of $n$.
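Before looking for a closed form, the counts can be explored numerically for small $n$ by brute force, as in the sketch below. It encodes each cyclic ordering by its set of unordered adjacent pairs, so rotations and reflections of the same ordering are counted once; whether reflections should count as distinct is our assumption, not something the review specifies. The enumeration is exponential in $n$ and is only meant to check small cases, such as the manuscript's claim for $n=7$.

```python
from itertools import combinations, permutations


def cycle_edges(order):
    """Unordered adjacent pairs of a cyclic ordering (last element wraps to first)."""
    n = len(order)
    return frozenset(frozenset((order[i], order[(i + 1) % n])) for i in range(n))


def valid_cycles(n):
    """Cyclic orderings of 1..n sharing no adjacency with (1, 2, ..., n).

    Each cycle is stored once, as its edge set, so rotations and
    reflections of the same ordering are not double-counted.
    """
    original = cycle_edges(tuple(range(1, n + 1)))
    found = set()
    # Fixing element 1 in the first position removes rotations.
    for rest in permutations(range(2, n + 1)):
        edges = cycle_edges((1,) + rest)
        if edges.isdisjoint(original):
            found.add(edges)
    return found


def valid_pairs(n):
    """Unordered pairs of valid cycles that also share no adjacency with each other."""
    cycles = list(valid_cycles(n))
    return [(a, b) for a, b in combinations(cycles, 2) if a.isdisjoint(b)]


for n in range(3, 8):
    print(n, len(valid_cycles(n)), len(valid_pairs(n)))
```

The review's own example is reproduced by `cycle_edges((4, 2, 3, 1))`, which shares the pairs $\{2,3\}$ and $\{1,4\}$ with the original ordering.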

    1. Author Response

      Joint Public Review

      Strengths

      Overall, the idea that the PAG interacts with the BLA via the midline thalamus during a predator vs. foraging test is new and quite interesting. The authors have used appropriate tools to address their questions. The major impact in the field would be to add evidence to claims that the BLA can be downstream of the dPAG to evoke defensive behaviors. The study also adds to a body of evidence that the PAG mediates primal fear responses.

      Weaknesses

      (Anatomical concerns)

      1) The authors claim that the recordings were performed in the dorsal PAG (dPAG), but the histological images in Fig. 1B and Supplementary S2 for example show the tip of the electrode in a different subregion of PAG (ventral/lateral). They should perform a more careful histological analysis of the recording sites and explain the histological inclusion and exclusion criteria. Diagrams showing the sites of all PAG and BLA recordings, as well as all fiber optics, would be helpful.

      The PAG is composed of dorsomedial (dm), dorsolateral (dl), lateral (l), and ventrolateral (vl) columns that extend along the rostro-caudal axis of the aqueduct. The term “dorsal PAG” (dPAG) generally encompasses dmPAG, dlPAG, and lPAG, as substantiated by tract-tracing, neurochemical, and immunohistochemical techniques (e.g., Bandler et al., 1991; Bandler & Keay, 1996; Carrive, 1993). As Bandler and Shipley (1994) summarized, “These findings suggest that what has been traditionally called the 'dorsal PAG' (a collective term for regions dorsal and lateral to the aqueduct), consists of three anatomically distinct longitudinal columns: dorsomedial and lateral columns…and a dorsolateral column…" Similarly, Schenberg et al. (2005) clarified in their review that, “According to this parcellation...the defensive behaviors (freezing, flight or fight) and aversion-related responses (switchoff behavior) were ascribed to the DMPAG, DLPAG, and LPAG (usually named the ‘dorsal’ PAG).” In our study, all recordings were conducted within the dPAG. Also, Figures 1B and S2 in our manuscript correspond to the -6.04 mm template from Paxinos & Watson’s atlas (1998), which is shown in the left panel in Author response image 1 and is considerably anterior to the location where the vlPAG emerges, as shown in the right panel. In our revised manuscript, we will provide a detailed definition of the dPAG, inclusive of dmPAG, dlPAG, and lPAG, and support this with the referenced literature.

      Author response image 1.

      2) Prior studies investigating the role of BLA neurons during a foraging vs. robot test similar to the one used in this study should also be cited and discussed (e.g., Amir et al., 2019; Amir et al., 2015). These two studies demonstrated that most neurons in the basal portion of the BLA exhibit inhibitory activity during foraging behavior and only a small fraction of neurons (~4%) display excitatory activity in response to the robot (in contrast to the 25% reported in the present study). A very accurate histological analysis of BLA recording sites should be performed to clarify whether distinct subregions of the BLA encode foraging and predator-related information, as previously shown in the two described studies.

      In the revised manuscript, we will discuss papers by Amir et al. (2015) and Amir et al. (2019) that utilized a similar 'approach food-avoid predator' paradigm. These studies found a correlation between the neuronal activities in the basolateral amygdala (BL) and the velocity of animal movement during foraging, regardless of the presence or absence of predators. Specifically, the majority of BL neurons were inhibited in both conditions, with only 4.5% being responsive to predators. Consequently, Amir et al. posited that amygdala activity predominantly aligns with behavioral output such as foraging, rather than with responses to threats.

      In contrast, our body of work (Kim et al., 2018; Kong et al., 2021; the present study) reveals that the majority of neurons in the BA/BLA displayed distinct responses in pre-robot and robot sessions. Kong et al. (2021) discussed in depth several factors that may account for this discrepancy, given that both Amir et al. and our research used similar behavioral paradigms. Differences in apparatus features, experimental procedures, and data analysis methodologies (refer to Amir et al., 2019) could be contributing to the conflicting results and interpretations concerning the significance of amygdalar neuronal activities.

      Additionally, our studies uniquely monitored the same set of amygdalar neurons during pre-robot and robot sessions, affording us the opportunity for a direct comparison of neuronal activities under different threat conditions.

      Another salient difference lies in the foraging success rates, which were markedly higher in Amir et al. (~80%) than in our studies (<3-4%). We hypothesize that there may be an inverse relationship between the pellet procurement rate and the intensity of fear. The high foraging success rate in Amir et al., which correlates with subdued amygdalar activity, stands in contrast to our findings of heightened amygdalar activity associated with a lower foraging success rate. Supporting this notion, optogenetically induced amygdalar activity led naïve rats to abandon foraging and escape to the nest (Kong et al., 2021; the present study).

      3) An important claim of this study is that the PAG sends predator-related signals to the BLA via the PVT (Fig. 4). The authors stated that PVT neurons labeled by intra-BLA injection of the retrograde tracer CTB were activated by the predator, but a proper immunohistochemical quantification with a control group was not provided to support this claim. To provide better support for their claim, the authors should quantify the double-labeled PVT neurons (cFos plus CTB positive neurons) during the robot test.

      As recommended, we will include a revised Fig. 4 in the manuscript to present the quantification of neurons that are double-labeled with c-Fos and CTB in the PVT. This updated figure will provide a more rigorous analysis and visual representation of the data.

      4) The AAV anterograde tracer deposit spread to a large part of the PAG, including the dorsolateral and lateral PAG and supraoculomotor regions (Fig. 4B). Is the projection to the PVT from the dPAG or from other regions of the PAG?

      As previously addressed in response to Comment #1, the dPAG comprises the dmPAG, dlPAG, and lPAG. In the revised manuscript, we will acknowledge the diffusion of the AAV to the adjacent deep gray layer of the superior colliculus. Additionally, we are considering conducting more restricted AAV injections into the dPAG to verify terminal expressions in the PVT.

      (Concerns about the strength of the evidence supporting a role for the PVT)

      5) The authors conclude in the discussion section that the dPAG-amygdala pathway is involved in generating antipredatory defensive behavior. However, the current results are entirely based on correlational analyses of neural firing rate and there is no direct demonstration that the PAG provides information about the robot to the BLA. Therefore, the authors should tone down their interpretation or provide more evidence to support it by performing experiments applying inhibitory tools in the dPAG > PVT > BLA pathway and examining the impact on behavior and downstream neural firing.

      As suggested, we will moderate the assertions about the functional implications of the PVT, based on the data from anterograde and retrograde tracers, to present a more measured interpretation in the manuscript.

      (Other concerns)

      6) One of the main findings of this study is the observation that BLA neurons that are responsive to PAG photostimulation are preferentially recruited during the foraging vs. robot test (Fig. 3). However, the experimental design used to address this question is problematic because the laser photostimulation of PAG neurons preceded the foraging vs. robot test. Prior photoactivation of PAG may have caused indirect short-term synaptic plasticity in BLA cells, which would favor the response of these cells to the robot. Please see Oishi et al., 2019 (PMID: 30621738), which demonstrated that 10 trains of 20 Hz photoactivation (300 pulses each) were sufficient to induce LTP in brain slices.

      After approximately eight photostimulation trials of the dPAG, with 40 pulses each, the animals entered a post-photostimulation testing phase (referred to as "Post"; Fig. 3C), lasting 10-15 minutes over an average of eight trials before robot testing. Although the PAG does not directly project to the BLA, the remote possibility of trans-synaptic plasticity in the BLA cannot be completely excluded and will be acknowledged. Additionally, it is noteworthy that Oishi et al.'s (2019) study applied a total of 3,000 pulses (i.e., 10 15-s trains of 20-Hz pulses) and investigated CA3-CA3 synaptic plasticity, as opposed to a total of 320 pulses (i.e., 8 2-s trains of 20-Hz pulses) in our study.
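      The pulse totals compared above follow from trains × seconds per train × stimulation frequency. A quick arithmetic check, taking the approximate eight trials as exactly eight (an assumption; the function name is illustrative only):

```python
def total_pulses(n_trains: int, train_s: float, freq_hz: float) -> int:
    """Total light pulses = number of trains x seconds per train x pulses per second."""
    return round(n_trains * train_s * freq_hz)


oishi_2019 = total_pulses(10, 15, 20)  # 10 trains of 15 s at 20 Hz (300 pulses each)
present = total_pulses(8, 2, 20)       # ~8 trains of 2 s at 20 Hz (40 pulses each)
print(oishi_2019, present, oishi_2019 / present)  # 3000 320 9.375
```

      Under this assumption the earlier protocol delivered roughly nine times as many pulses in total.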

      7) The authors should perform a longitudinal analysis of the behavioral responses of the rats across the trials to clarify whether the animals habituate to the robot or not. In Figure 1E, it appears that PAG neurons fire less across the trials, which could be associated with behavioral habituation to the predator robot. If that is the case, the activity of many other PAG and BLA neurons will also most likely vary according to the trial number, which would impact the current interpretation of the results.

      In Figure 1E, the y-axis represents the Z scores of individual dPAG neurons, instead of representing repeated tests of the same neuron across multiple trials. The raster plot in Figure 1F clearly depicts that the same dPAG neurons consistently display heightened neural activity in response to the approaching robot across successive trials.

      8) In Figure 1, it is unclear why the authors compared the activity of neurons that respond to the robot activation against the activity of the neurons during the retrieval of the food pellets in the pre-robot and post-robot sessions. The best comparison would be aligning the cells that were responsive to the activation of the robot with the moment at which the animals run back to the nest after consuming the pellets during the pre-robot or post-robot sessions. This would enable the authors to demonstrate that the PAG responses are directly associated with the expression of escape behavior in the presence of the robot rather than with the onset of goal-directed movement toward the nest during the pre- and post-robot sessions. A graphic showing the correlation between PAG firing rate and escape response would also be informative.

      Figure 1E compares the dPAG neural activity when animals enter a designated pellet zone (time-stamped by camera tracking) during both pre-robot and post-robot trials to the dPAG neural activity when entering the robot trigger zone (time-stamped by robot activation). We wish to clarify that rats carry the large (0.5 g) pellet back to the nest for consumption rather than consume it in the open arena before returning to the nest.

      In our study, we aimed to investigate the direct response of dPAG neurons to the looming predator and explore the communication between dPAG and BLA in relation to antipredatory defensive responses. To build upon our previous research that suggests a potential role of dPAG in conveying such responses to the BLA (Kim et al., 2013) and the immediate firing of BLA neurons in response to predatory threats (Kim et al., 2018; Kong et al., 2021), we chose to narrow our testing window to a short latency period (< 500 ms) following robot activations. This specific time window allowed us to focus on the initial stages of the threat stimulus processing and minimize potential confounding factors such as the presence of residual firing activity triggered by the robot during the animals’ escape or any activity changes induced by the animals' behavior.

      Furthermore, Figure S1C clearly demonstrates that (i) increased activity of dPAG robot cells preceded the animals’ actual turning and fleeing behavior toward the nest, as indicated by the peak values of movement speed (dark yellow), and (ii) the presence of pellets did not affect activity changes of the robot cells during pre- and post-robot sessions. These observations suggest that the heightened activity of dPAG robot cells was not due to movement changes or pellet motivation.

      Lastly, as stated in the original manuscript, the vast majority of robot cells (90.9%) did not show significant correlations between movement speed and firing rates, lending further support to the interpretation that the dPAG activity observed was not merely a reflection of movement changes.

      References

      Bandler, R., Carrive, P., & Depaulis, A. (1991). Emerging principles of organization of the midbrain periaqueductal gray matter. In The midbrain periaqueductal gray matter: Functional, anatomical, and neurochemical organization (pp. 1-8).

      Bandler, R., & Keay, K. A. (1996). Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression. Progress in Brain Research, 107, 285-300.

      Bandler, R., & Shipley, M. T. (1994). Columnar organization in the midbrain periaqueductal gray: Modules for emotional expression? Trends in Neurosciences, 17(9), 379-389.

      Carrive, P. (1993). The periaqueductal gray and defensive behavior: Functional representation and neuronal organization. Behavioural Brain Research, 58(1-2), 27-47.

      Oishi, N., Nomoto, M., Ohkawa, N., Saitoh, Y., Sano, Y., Tsujimura, S., ... & Inokuchi, K. (2019). Artificial association of memory events by optogenetic stimulation of hippocampal CA3 cell ensembles. Molecular Brain, 12, 1-10.

      Paxinos, G., & Watson, C. (1998). The Rat Brain in Stereotaxic Coordinates. Academic Press, San Diego.

      Schenberg, L. C., Póvoa, R. M. F., Costa, A. L. P., Caldellas, A. V., Tufik, S., & Bittencourt, A. S. (2005). Functional specializations within the tectum defense systems of the rat. Neuroscience & Biobehavioral Reviews, 29(8), 1279-1298.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work by Ding et al. uses agent-based simulations to explore the role of the structure of molecular motor myosin filaments in force generation in cytoskeletal structures. The focus of the study is on disordered actin bundles, which can occur in the cell cytoskeleton and have also been investigated with in vitro purified protein experiments.

      Strengths:

      The key finding is that cooperative effects between multiple myosin filaments can enhance both total force and the efficiency of force generation (force per myosin). These trends were possible to obtain only because the detailed structure of the motor filaments with multiple heads is represented in the model.

      We appreciate your comments about the strength of our study.

      Weaknesses:

      It is not clearly described what scientific/biological questions about cellular force production the work answers. There should be more discussion of how their simulation results compare with existing experiments or can be tested in future experiments.

      Thank you for the comment. First, our study explains why non-muscle myosin II in stress fibers shows focal rather than uniform distributions: if the myosin filaments stay close to each other, they can generate much larger forces in the stress fibers via the cooperative overlap. Our study also predicts a difference between bipolar structures (found in skeletal muscle myosins and non-muscle myosins) and side polar structures (found in smooth muscle myosins) in terms of the likelihood of the cooperative overlap. As shown below, myosin filaments with the bipolar structure can add up their forces better than those with the side polar structure when their overlap level is the same. We will add discussion about these points in the revised manuscript.

      Author response image 1.

      As the reviewer noticed, our results were briefly compared with prior observations in Ref. 4 (Thoresen et al., Biophys J, 2013) where different myosin isoforms were used for in vitro actin bundles. We will add more quantitative comparisons between the in vitro study and our results.

      In addition, at the end of the conclusion section, we suggested future experiments that can be used for verifying our results. In particular, experiments with synthetic myosin filaments with tunable geometry seem to be suitable for verifying our computational predictions and observations.

      The model assumptions and scientific context need to be described better.

      We apologize for the insufficient description of the model. We will revise those parts to better explain the model assumptions and scientific context.

      The network contractility seems to be a mere appendix to the bundle contractility which is presented in much more detail.

      We included some cases run with the two-dimensional network in this study to demonstrate the generality of our conclusions. We included only minimal preliminary results here because we are currently working on a follow-up study with network structures. We hope that the reviewer will understand our intention and situation.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors use a mechanical model to investigate how the geometry and deformations of myosin II filaments influence their force generation. They introduce a force generation efficiency that is defined as the ratio of the total generated force and the maximal force that the motors can generate. By changing the architecture of the myosin II filaments, they study the force generation efficiency in different systems: two filaments, a disorganized bundle, and a 2D network. In the simple two-filament systems, they found that in the presence of actin cross-linking proteins motors cannot add up their force because of steric hindrances. In the disorganized bundle, the authors identified a critical overlap of motors for cooperative force generation. This overlap is also influenced by the arrangement of the motor on the filaments and influenced by the length of the bare zone between the motor heads.

      Strengths:

      The strength of the study is the identification of organizational principles in myosin II filaments that influence force generation. It provides a complementary mechanistic perspective on the operation of these motor filaments. The force generation efficiency and the cooperative overlap number are quantitative ways to characterize the force generation of molecular motors in clusters and between filaments. These quantities and their conceptual implications are most likely also applicable in other systems.

      Thank you for the comments about the strength of our study.

      Weaknesses:

      The detailed model that the authors present relies on over 20 numerical parameters that are listed in the supplement. Because of this vast amount of parameters, it is not clear how general the findings are. On the other hand, it was not obvious how specific the model is to myosin II, meaning how well it can describe experimental findings or make measurable predictions. The model seems to be quantitative, but the interpretation and connection to real experiments are rather qualitative in my point of view.

      As the reviewer mentioned, all agent-based computational models for simulating the actin cytoskeleton inevitably involve a large number of parameters. Some of the parameter values are not well known, so we have carefully tuned our parameter values by comparing our results with experimental observations in our previous studies since 2009.

      We were aware of the importance of rigorously representing the unbinding and walking rates of myosin motors, so we implemented in our model the parallel cluster model, which predicts those rates while taking into account the mechanochemical rates of myosin II. Thus, we are confident that our motors represent myosin II.

      In our manuscript, our results were compared with prior observations in Ref. 4 (Thoresen et al., Biophys J, 2013) several times. In particular, larger force generation with more myosin heads per thick filament was consistent between the experiment and our simulations.

      Our study can make various predictions. First, our study explains why non-muscle myosin II in stress fibers shows focal rather than uniform distributions: if the myosin filaments stay close to each other, they can generate much larger forces in the stress fibers via the cooperative overlap. Our study also predicts a difference between bipolar structures (found in skeletal muscle myosins and non-muscle myosins) and side polar structures (found in smooth muscle myosins) in terms of the likelihood of the cooperative overlap. As shown in Author response image 1, myosin filaments with the bipolar structure can add up their forces better than those with the side polar structure when their overlap level is the same. We will add discussion about these points in the revised manuscript.


      It was often difficult for me to follow which parameters were changed and to which numerical values parameters were set when inspecting the curves shown in the figures. The manuscript could be more specific by explicitly giving numbers. For example, in the caption for Figure 6, instead of saying "is varied by changing the number of motor arms, the bare zone length, the spacing between motor arms", the authors could be more specific and give the ranges: "is varied by changing the number of motor arms from ... to ..., the bare zone length from ... to ..., and the spacing between motor arms from ... to ...".

      This lack of specificity is also reflected in the text: "We ran simulations with a variation in either L_sp or L_bz". What is the range of this variation? "When L_M was similar": similar to what? "despite different N_M": what are the different values for N_M? These are only a few examples showing that the text could be far more specific and quantitative instead of relying on qualitative descriptions.

      We appreciate the comment. We will specify the range of the variation in each parameter in the revised manuscript.

      In the text, after equation (2) the authors discuss assumptions about the binding of the motor to the actin filament. I think these model-related assumptions and explanations should be discussed not in the results section but rather in the "model overview" section.

      Thank you for pointing this out. We will reorganize the text in the revised manuscript.

      The lines with different colors in Figure 2A are not explained. What systems and parameters do they represent?

      The different colors in Fig. 2A distinguish 20 cases. We will explain the colors in the figure caption in the revised manuscript.

    1. Author response:

      We thank the reviewers for their support of this work and insightful recommendations for how to improve it. We have provided specific responses to each reviewer comment below. To summarize how we intend to address the requested revisions:

      Many of the reviewers’ comments requested additional technical or quality details about the DMS libraries or assays (e.g., number of cells tested, number of sequencing reads, assay replication, assay sensitivity, library balance), and we provide additional information and analyses that we can incorporate into the relevant portions of the text, supplementary tables, and supplementary figures to address these questions.

      Some comments asked to clarify nomenclature/wording or provide additional labels to images, and we will make these changes as requested.

      A few questions would require additional experimental data to address. Where experiments have already been performed, we will incorporate those results or cite relevant work previously reported in the literature.

      Reviewer 1:

      Summary

      Howard et al. performed deep mutational scanning on the MC4R gene, using a reporter assay to investigate two distinct downstream pathways across multiple experimental conditions. They validated their findings with ClinVar data and previous studies. Additionally, they provided insights into the application of DMS results for personalized drug therapy and differential ligand responses across variant types.

      Strengths

      They captured over 99% of variants with robust signals and investigated subtle functionalities, such as pathway-specific activities and interactions with different ligands, by refining both the experimental design and analytical methods.

      Weaknesses

      While the study generated informative results, it lacks a detailed explanation regarding the input library, replicate correlation, and sequencing depth for a given number of cells.

      Additionally, there are several questions that it would be helpful for authors to clarify.

      (1) It would be helpful to clarify the information regarding the quality of the input library and experimental replicates. Are variants evenly represented in the library? Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct? Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Are variants evenly represented in the library?

      We strive to achieve as evenly balanced a library as possible at every stage of the DMS process (e.g., from initial cloning in E. coli through integration into human cells). Below is a representative plot showing the number of barcodes per amino acid variant at each position in a given ~60-amino-acid subregion of MC4R, which highlights how evenly variants are represented at the E. coli cloning stage.

      Author response image 1.

      We also make similar measurements after the library is integrated into HEK293T cell lines, and see similarly even coverage across all variants, as shown in the plot below.

      Author response image 2.
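      The evenness shown in these coverage plots can also be summarized with a single number. A minimal sketch, assuming a simple metric chosen here for illustration (the fraction of variants whose barcode count lies within two-fold of the median, plus the fraction with no barcodes); this is not necessarily the metric behind the plots, and the variant labels are hypothetical:

```python
from statistics import median


def library_evenness(barcodes_per_variant: dict[str, int], fold: float = 2.0) -> dict[str, float]:
    """Summarize how evenly variants are represented in a barcoded library."""
    counts = list(barcodes_per_variant.values())
    med = median(counts)
    # Variants whose barcode count lies within `fold`-fold of the median.
    within = sum(1 for c in counts if med / fold <= c <= med * fold)
    # Variants with no barcodes at all (dropouts).
    missing = sum(1 for c in counts if c == 0)
    return {
        "median": med,
        "frac_within_fold": within / len(counts),
        "frac_missing": missing / len(counts),
    }


# Hypothetical barcode counts for five variants.
print(library_evenness({"A": 50, "B": 60, "C": 20, "D": 0, "E": 55}))
# {'median': 50, 'frac_within_fold': 0.6, 'frac_missing': 0.2}
```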

      Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct?

      We agree that long-read sequencing would be an excellent way to confirm that our constructs contain a single intended variant. However, we elected to use an alternative method (outlined in more detail in Jones et al. 2020) that leverages multiple layers of validation. First, the oligo chip-synthesized portions of the protein containing the variants are cloned into a sequence-verified plasmid backbone, which greatly decreases the chances of spuriously generating a mutation in a different portion of the protein. We then sequence both the oligo portion and the random barcode using overlapping paired-end reads during barcode mapping to avoid sequencing errors and to help detect DNA synthesis errors. At this stage, we computationally reject any constructs that have more than one variant. Given this, the vast majority of remaining unintended variants would come from somatic mutations introduced by the E. coli cloning or replication process, which should be low frequency. We have used our in-house full-plasmid sequencing method, OCTOPUS, to spot-check this for several other DMS libraries generated with the same cloning methods, and found variants in the plasmid backbone in only ~1% of plasmids in these libraries. Our statistical model also helps correct for this by accounting for barcode-specific variation. Finally, we believe this provides further motivation for having multiple barcodes per variant, which dilutes the effect of any unintended additional variants.
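      The rejection step during barcode mapping can be illustrated with a simplified sketch. It is not the pipeline from Jones et al. 2020: it assumes each paired-end read pair has already been merged into a (barcode, oligo sequence) pair, that intended variants are single-codon substitutions, and that a barcode must map to one sequence in at least 90% of its reads. The function names, purity threshold, and label format are hypothetical.

```python
from collections import Counter, defaultdict


def differing_codons(seq: str, wt: str) -> list[int]:
    """0-based indices of codons where `seq` differs from the wild-type oligo."""
    return [i // 3 for i in range(0, len(wt), 3) if seq[i:i + 3] != wt[i:i + 3]]


def map_barcodes(reads, wt: str, min_purity: float = 0.9) -> dict[str, str]:
    """reads: iterable of (barcode, oligo_sequence) pairs from merged paired-end reads.

    Returns barcode -> variant label, keeping only barcodes that map consistently
    to one sequence carrying zero (WT) or exactly one codon change."""
    by_barcode: dict[str, Counter] = defaultdict(Counter)
    for barcode, seq in reads:
        by_barcode[barcode][seq] += 1

    mapping = {}
    for barcode, seqs in by_barcode.items():
        seq, n = seqs.most_common(1)[0]
        if n / sum(seqs.values()) < min_purity:
            continue  # barcode linked to conflicting sequences
        if len(seq) != len(wt):
            continue  # indel or truncated read: not an intended substitution
        codons = differing_codons(seq, wt)
        if len(codons) > 1:
            continue  # more than one variant in the construct: reject
        if not codons:
            mapping[barcode] = "WT"
        else:
            k = codons[0]
            mapping[barcode] = f"codon{k}:{seq[3 * k:3 * k + 3]}"
    return mapping


reads = [("AAAA", "ATGGCCAAA"), ("AAAA", "ATGGCCAAA"),
         ("CCCC", "ATGGCGAAA"),                          # one codon changed
         ("GGGG", "TTGGCGAAA"),                          # two codons changed
         ("TTTT", "ATGGCCAAA"), ("TTTT", "ATGGCCAAT")]   # inconsistent barcode
print(map_barcodes(reads, "ATGGCCAAA"))  # {'AAAA': 'WT', 'CCCC': 'codon1:GCG'}
```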

      Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Certainly! In general, the Gs reporter had higher correlation between replicates than the Gq system (r ~ 0.5 vs r ~ 0.4). The plots below show two representative correlations of barcode read counts at the RNA-seq stage between the low α-MSH conditions. One important advantage of our statistical model is that it is able to leverage information from barcodes regardless of the number of replicates in which they appear.

      Author response image 3.
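      A replicate correlation like the ones plotted can be computed from per-barcode read counts roughly as follows. This sketch assumes a log10 transform with a pseudocount of 1 and restricts the comparison to barcodes observed in both replicates; both choices are assumptions made for illustration, not a description of how the reported r values were obtained.

```python
import math


def replicate_correlation(rep1: dict[str, int], rep2: dict[str, int],
                          pseudocount: float = 1.0) -> float:
    """Pearson r of log10(count + pseudocount) over barcodes seen in both replicates."""
    shared = sorted(rep1.keys() & rep2.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared barcodes")
    x = [math.log10(rep1[b] + pseudocount) for b in shared]
    y = [math.log10(rep2[b] + pseudocount) for b in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

      Working on the log scale keeps a handful of very high-count barcodes from dominating the correlation; barcodes missing from one replicate are simply excluded here rather than modeled.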

      Since the functional readout of variants is conducted through RNA sequencing, it seems crucial to sequence a sufficient number of cells with adequate sequencing saturation. Could the authors clarify the coverage depth used for each RNA-seq experiment and how this depth was determined? Additionally, how many cells were sequenced in each experiment?

      This will be addressed by incorporating the following details into the manuscript:

      We seeded 17 million cells per replicate at the start of each assay and, with ~1.5-fold expansion over the course of the assay, harvested ~25.5 million cells per replicate for RNA extraction and sequencing. We found this sufficient to achieve at least ~30-60x cellular coverage per amino acid variant.

      Total mapped reads per replicate at RNA-seq stage

      - Gs/CRE: 9.1-18.2 million mapped reads, median=12.3

      - Gq/UAS: 8.6-24.1 million mapped reads, median=14.5

      - Gs/CRE+Chaperone: 6.4-9.5 million mapped reads, median=7.5

      Reads per barcode distribution

      - Median read counts of 8, 10, and 6 reads per sample per barcode for Gs/CRE, Gq/UAS, and Gs/CRE+Chaperone assays, respectively.

      Barcodes per variant distribution

      - As reported, the median number of barcodes per variant across samples (the “median of medians”) is 56 for Gs/CRE and 28 for Gq/UAS

      - Additionally, it is 44 for Gs/CRE+Chaperone

      It appears that the frequencies of individual RNA-seq barcode variants were used as a proxy for MC4R activity. Would it be important to also normalize for heterogeneity in RNA-seq coverage across different cells in the experiment? Variability in cell representation (i.e., the distribution of variants across cells) could lead to misinterpretation of variant effects. For example, suppose barcode_a1 represents variant A and barcode_b1 represents variant B. If the RNA-seq results show 6 reads for barcode_a1 and 7 reads for barcode_b1, it might initially appear that both variants have similar effect sizes. However, if these reads correspond to 6 separate cells each containing 1 copy of barcode_a1, and only 1 cell containing 7 copies of barcode_b1, the interpretation changes significantly. Additionally, if certain variants occupy a larger proportion of the cell population, they are more likely to be overrepresented in RNA sequencing.

      We account for this heterogeneity in several ways. First, as shown above (Response to Reviewer 1, Question 1), we aim to have even representation of variants within our libraries. Second, we use compositional control conditions, such as forskolin or unstimulated conditions, to obtain treatment-independent measurements of barcode abundance and, consequently, of mutant-vs-WT effects that are due to compositional rather than biological variability. We expect that variability observed under these controls is due to subtle effects of molecular cloning, gene expression, and stochasticity. Using these controls, we observe that mutant-vs-WT effects are generally close to zero in these normalization conditions (e.g., in untreated Gq, see Supplementary Figure 3) compared with drug-treated conditions. For example, premature stops behave similarly to WT in normalization conditions. This indicates that mutant abundance is relatively homogeneous. Where there are barcode-dependent effects on abundance, we can use information from these conditions to normalize that effect. Finally, our mixed-effect model accounts for barcode-specific deviations from the expected mutant effect (e.g., a “high count” barcode consistently being high relative to the mean).
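      A simplified illustration of how a compositional control condition cancels barcode abundance differences: for each barcode, compare its library-size-normalized count under drug treatment with its count under the control condition, then express each variant relative to WT. The per-barcode log-ratio with a median summary is a stand-in assumed here for illustration, not the mixed-effect model used in the study; the function and argument names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def variant_effects(barcode_variant: dict[str, str], treated: dict[str, int],
                    control: dict[str, int], pseudocount: float = 0.5) -> dict[str, float]:
    """barcode_variant: barcode -> variant label ("WT" for wild type; at least one required).
    treated / control: barcode -> RNA read count under drug vs. compositional control."""
    t_total = sum(treated.values())
    c_total = sum(control.values())
    ratios = defaultdict(list)
    for bc, var in barcode_variant.items():
        t = (treated.get(bc, 0) + pseudocount) / t_total
        c = (control.get(bc, 0) + pseudocount) / c_total
        # Per-barcode log2 enrichment; a barcode that is abundant in both
        # conditions (e.g., carried by many cells) gets no advantage.
        ratios[var].append(math.log2(t / c))
    wt = median(ratios["WT"])
    return {var: median(r) - wt for var, r in ratios.items() if var != "WT"}


# Hypothetical data: variant A's two barcodes differ 2-fold in abundance
# but both show a 4-fold drop relative to WT under treatment.
bv = {"w1": "WT", "w2": "WT", "a1": "A", "a2": "A", "b1": "B"}
treated = {"w1": 100, "w2": 200, "a1": 25, "a2": 50, "b1": 100}
control = {"w1": 100, "w2": 200, "a1": 100, "a2": 200, "b1": 100}
print(variant_effects(bv, treated, control, pseudocount=0))
```

      In this toy input, variant A comes out at approximately -2 (log2 of a four-fold loss) and B at approximately 0, even though A's barcodes are present at different abundances.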

      Although the assay system appears to effectively represent MC4R functionality at the molecular level, we are curious about the potential disparity between the DMS score system and physiological relevance. How do variants reported in gnomAD distribute within the DMS scoring system?

      Figure 2D shows DMS scores (variant effect on Gs signaling) relative to human population frequency for all MC4R variants reported in gnomAD as of January 8, 2024.

      To measure Gq signaling, the authors used the GAL4-VPR relay system. Is there additional experimental data to support that this relay system accurately represents Gq signaling?

      The full Gq reporter uses an NFAT response element from the IL-2 promoter to regulate the expression of the GAL4-VPR relay. In this system, the activation of Gq signaling results in the activation of the NFAT response element, and this signal is then amplified by the GAL4-VPR relay. The NFAT response element has been previously well-validated to respond to the activation of Gq signaling (e.g., PMID: 8631834). We will add this reference to the text to further support the use of the Gq assay.

      Identifying the variants responsive to the corrector was impressive. However, we are curious about how the authors confirmed that the restoration of MC4R activity was due to the correction of the MC4R protein itself. Is there a possibility that the observed effect could be influenced by other factors affected by the corrector? When the corrector was applied to the cells, were any expected or unexpected differential gene expression changes observed?

      While we do not directly measure whether Ipsen-17 has effects on other signaling processes, previous work has shown that Ipsen-17 treatment does not indirectly alter signaling kinetics such as receptor internalization (Wang et al., 2014). Furthermore, our analysis methods inherently account for this by normalizing variant effects to WT signaling levels. Any observed rescue of a given variant therefore means that the variant is specifically more responsive to Ipsen-17 than WT, and the fact that different variants exhibit different levels of rescue is reassuring that the mechanism is on target to MC4R. Lastly, Ipsen-17 is known to be an antagonist of alpha-MSH activity and is thought to bind directly to the same site on MC4R (Wang et al., 2014).
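      The WT-normalization argument can be made concrete as a difference of differences on log-scale scores. A minimal sketch, assuming variant and WT signaling scores under vehicle and under Ipsen-17 are available on the same log scale; the function name and numbers are hypothetical, and this is a simplification rather than the full statistical model:

```python
def corrector_rescue(mut_vehicle: float, wt_vehicle: float,
                     mut_corrector: float, wt_corrector: float) -> float:
    """Difference of differences on log-scale signaling scores.

    Positive values mean the variant gains more from the corrector than WT does;
    0 means any change is shared equally by WT and the variant."""
    return (mut_corrector - wt_corrector) - (mut_vehicle - wt_vehicle)


# Hypothetical scores: the corrector lifts WT by 0.5 but the variant by 2.0.
print(corrector_rescue(-3.0, 0.0, -1.0, 0.5))  # 1.5
```

      Any effect of the corrector that shifts WT and variant signaling by the same amount cancels out of this quantity, which is why variant-specific rescue is the informative signal.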

      As mentioned in the introduction, gain-of-function (GoF) variants are known to be protective against obesity. It would be interesting to see further studies on the observed GoF variants. Do the authors have any plans for additional research on these variants?

      We agree this would be an excellent line of inquiry, but due to changes in company priorities we unfortunately do not have any plans for additional research on these variants.

      Reviewer 2:

      Overview

      In this manuscript, the authors use deep mutational scanning to assess the effect of ~6,600 protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. Reasoning that current deep mutational scanning approaches are insufficiently precise for some drug development applications, they focus on articulating new, more precise approaches. These approaches, which include a new statistical model and an innovative reporter assay, enable them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor with high precision and statistical rigor.

      They use the resulting data for a variety of purposes, including probing the relationship between MC4R's sequence and structure, analyzing the effect of clinically important variants, identifying variants that disrupt downstream MC4R signaling via one but not both pathways, identifying loss-of-function variants that are amenable to a corrector drug, and exploring how deep mutational scanning data could guide small molecule drug optimization.

      Strengths

      The analysis and statistical framework developed by the authors represent a significant advance. In particular, the study makes use of barcode-level internally replicated measurements to more accurately estimate measurement noise.

      The framework allows variant effects to be compared across experimental conditions, a task that is currently hard to do with rigor. Thus, this framework will be applicable to a large number of existing and future deep mutational scanning experiments.

      The authors refine their existing barcode transcription-based assay for GPCR signaling and develop a clever new "relay" reporter system to boost signaling in a particular pathway. They show that these reporters can be used to measure both gain-of-function and loss-of-function effects, which many deep mutational scanning approaches cannot do.

      The use of systematic approaches to integrate and then interrogate high-dimensional deep mutational scanning data is a big strength. For example, the authors applied PCA to the variant effect results from reporters for two different MC4R signaling pathways and were able to discover variants that biased signaling through one or the other pathway. This approach paves the way for analyses of higher dimensional deep mutational scans.

      The authors use the deep mutational scanning data they collect to map how different variants affect the ability of small molecule agonists to activate MC4R signaling. This is an exciting idea, because developing small-molecule protein-targeting therapeutics is difficult, and this manuscript suggests a new way to map small-molecule-protein interactions.

      Weaknesses

      The authors derive insights into the relationship between MC4R signaling through different pathways and its structure. While these make sense based on what is already known, the manuscript would be stronger if some of these insights were validated using methods other than deep mutational scanning.

      Likewise, the authors use their data to identify positions where variants disrupt MC4R activation by one small molecule agonist but not another. They hypothesize these effects point to positions that are more or less important for the binding of different small molecule agonists. The manuscript would be stronger if some of these insights were explored further.

      Impact

      In this manuscript, the authors present new methods, including a statistical framework for analyzing deep mutational scanning data that will have a broad impact. They also generate MC4R variant effect data that is of interest to the GPCR community.

    1. Author response:

      Reviewer 1:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language. This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants (and humans more broadly) can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects produce changes in power/PLV at different frequencies, for instance, using duplets for the voice dimension and triplets for the linguistic dimension (or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR, given the 1/f decrease in the amplitude of background EEG activity. This would require comparisons between two distinct groups, counterbalancing chunk size and linguistic versus non-linguistic dimension. In the test phase, words for one dimension would have been part-words for the other dimension. As we are measuring differences and not preferences, interpreting the results would also have been difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      Note: The reviewer’s summary contains a typo: the syllabic rate is 4 Hz (not 2 Hz), and the word rate is 2 Hz (not 4 Hz).

      Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct in that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, at the same time, it could also be a lot of other things. On a functional level, I can't follow the authors' argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a triplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search, a process that could be triggered by a linguistic structure but would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes. We will revise and tone down the corresponding part of the discussion to clarify that it is just a possible interpretation of the results.

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to a higher generalizability, it also introduces a second dimension for one feature that is not present for this other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.  

      We first quantified the transition probability matrices during the structured stream of Experiment 2, considering that there are only two types of voices: Female and Male.

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F) = P(F|M) = 0.66, while P(M|M) = P(F|F) = 0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might occasionally have occurred because some drops happened to fall at a 2 Hz frequency. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP = 0.66). Therefore, the ERP difference between Words and Part-words in List B might be attributed to gender alternation.

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.
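      For illustration, the gender-level computation described above can be sketched in Python, assuming forward transitional probabilities P(B|A) = count(A followed by B) / count(A followed by anything). The voice labels (f1 to m3) and the toy stream are hypothetical placeholders, not the experimental lists; only the "three female and three male voices" design comes from the text.

```python
from collections import Counter

# Hypothetical voice labels: three female and three male voices, as in the
# design described above; the identifiers themselves are placeholders.
VOICE_GENDER = {"f1": "F", "f2": "F", "f3": "F", "m1": "M", "m2": "M", "m3": "M"}


def transition_probabilities(seq):
    """Forward TPs: P(b | a) = count(a followed by b) / count(a followed by anything)."""
    pair_counts = Counter(zip(seq, seq[1:]))
    pred_counts = Counter(seq[:-1])  # every token that has a successor
    return {(a, b): n / pred_counts[a] for (a, b), n in pair_counts.items()}


def gender_tps(voice_stream, voice_gender=VOICE_GENDER):
    """Collapse each voice to its gender, then compute TPs over F/M tokens."""
    return transition_probabilities([voice_gender[v] for v in voice_stream])


# Toy stream (not an experimental list): its genders are F M M F F M.
toy = ["f1", "m1", "m2", "f2", "f3", "m3"]
for (a, b), p in sorted(gender_tps(toy).items()):
    print(f"P({b}|{a}) = {p:.2f}")
```

      Applied to a full stream, a profile in which every gender TP equals 0.5 (as described for List A) offers no boundary cue, whereas unequal alternation and repetition TPs (as in List B) can produce occasional drops.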

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.  

      Author response image 2.

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B). W = Words; P = Part-words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.

      Author response image 3.

      Neural entrainment at 2 Hz during the structured stream of Experiment 2 for Lists A and B.

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive, given that this is a purely auditory experiment with sleeping neonates)?

      Neural entrainment might be considered a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity, with a source located at the electrical zero in a single-dipole model, i.e., approximately in the superior temporal region (Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e., the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain sits higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Author response image 4 for the location of electrodes in an infant head model).

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably corresponds to the positive pole of the auditory ERP.

      Author response image 4.

      Locations of the international 10–20 sensors on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and the projection of each electrode to the cortex. Computed across 16 infants (from Kabdebon et al., NeuroImage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.

      Reviewer 3:

      (1) While it's true that voice is not essential for language (e.g., sign languages are implemented over gestures; voices can be used to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus, I'm not sure we can really consider this study a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, since voices are a linguistic dimension, at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.  

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in auditory non-linguistic (strings of tones, music) and visual domains, as well as in other animal species. I'm not sure that this theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation). We will revise the discussion section to clarify this theoretical framework.

      (3) Lines 341-345: Statistical learning is an evolutionarily ancient learning mechanism, but I do not think that the present results show it. This is a study on human neonates and adults; there are no other animal species involved, so I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and on what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it. We will revise this section to tone down our claims.  

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by the yellow voice and "tu" by the red voice. If this is correct, then I recommend that the authors rephrase this section to make it clearer.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. Similar to Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. For example, the “green” voice was always followed by the “red” voice, but the two voices uttered varying syllables.

      We will revise the methods section to clarify these important points.

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.

      We will rephrase this sentence in the manuscript to make it clearer.
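      To make the two scenarios concrete, the following Python sketch builds a hypothetical Experiment-1-style stream and compares TPs computed over syllables alone with TPs computed over combined syllable-voice items. Only "pe" and "tu" come from the text; the other syllables, the voice labels, and the rule that the same word is never presented twice in a row are assumptions for illustration. The "no voice repeated consecutively" constraint follows the methods description above.

```python
import random
from collections import Counter

# Hypothetical duplet lexicon (6 syllables, 3 words); only "pe" and "tu"
# appear in the text, the remaining syllables are placeholders.
WORDS = [("pe", "tu"), ("ki", "bo"), ("la", "mi")]
VOICES = ["v1", "v2", "v3", "v4", "v5", "v6"]  # six voices, labels hypothetical


def transition_probabilities(seq):
    """Forward TPs: P(b | a) = count(a followed by b) / count(a followed by anything)."""
    pair_counts = Counter(zip(seq, seq[1:]))
    pred_counts = Counter(seq[:-1])
    return {(a, b): n / pred_counts[a] for (a, b), n in pair_counts.items()}


def make_experiment1_stream(n_words, seed=0):
    """Structured syllables, random voices with no voice repeated consecutively.

    Assumption (not from the text): the same word never occurs twice in a row.
    """
    rng = random.Random(seed)
    stream, prev_word, prev_voice = [], None, None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != prev_word])
        for syllable in word:
            voice = rng.choice([v for v in VOICES if v != prev_voice])
            stream.append((syllable, voice))
            prev_voice = voice
        prev_word = word
    return stream


stream = make_experiment1_stream(3000)
syllable_tps = transition_probabilities([s for s, _ in stream])
item_tps = transition_probabilities(stream)  # each (syllable, voice) pair is one item

print(len(set(stream)))            # number of distinct syllable-voice items
print(syllable_tps[("pe", "tu")])  # within-word TP when voices are ignored
print(max(item_tps.values()))      # largest TP between combined items
```

      Ignoring voice recovers a deterministic within-word syllable transition, whereas treating each of the 36 syllable-voice pairs as a separate item spreads the same transitions over many weaker links, so no combined-item transition is deterministic.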

    1. eLife Assessment

      In their study, Neiswender et al. provide important insights into how BicD2 variants linked to spinal muscular atrophy alter dynein activity and cargo specificity. The authors present convincing evidence that disease-associated mutations lead to interactome changes, supported by additional validation of the BicD2/HOPS complex and discussion of their functional implications. This well-executed study offers invaluable datasets and a strong foundation for future exploration of disease mechanisms.

    2. Reviewer #1 (Public review):

      In this work, Neiswender and colleagues test the hypothesis that mutations in BicD2 that are associated with SMALED alter BicD2-cargo interactions. To do this, they first establish the WT BicD2 cargo interactome (using a proximity-dependent biotin ligase screen with Turbo-ID on the BicD2 C-terminus). In addition to known cargo interactors, they also identified many proteins in the HOPS complex. Interestingly, they find that the HOPS complex may interact with BicD2 in a different manner than other known cargos. The authors also show that while BicD2 is required for HOPS complex localization, on average, depletion of BicD2 from HeLa and Cos7 cells causes HOPS and lysosome mislocalization that is consistent with Kinesin-1 trafficking defects, rather than dynein. The authors also use proximity biotin ligase approaches to define the cargo interactome of three BicD2 variants associated with SMALED. One variant (R747C) has the most altered cargo interactome. The authors highlight one protein, in particular, GRAMD1A, that is only found in the R747C dataset and mislocalizes specifically when R747C is expressed.

      The work in this manuscript is of a very high quality and contributes important findings to the field.

      Comments on revisions:

      The authors did a great job addressing the points I brought up!

    3. Reviewer #2 (Public review):

      Neiswender et al. investigated the interactomes between wild-type BICD2 and BICD2 mutants that are associated with Spinal Muscular Atrophy with Lower Extremity Predominance (SMALED2). Although BICD2 has previously been implicated in SMALED2, it is unclear how mutations in BICD2 may contribute to disease symptoms. In this study, the authors characterize the interactome of wild-type BICD2 and identify potential new cargos including the HOPS complex. The authors then chose three SMALED2-associated BICD2 mutants and compared each mutant interactome to that of wild-type BICD2. Each mutant had a change in the interactome, with the most drastic being BICD2_R747C, a mutation in the cargo binding domain of BICD2. This mutant displayed less interaction with a potential new BICD2 cargo, the HOPS complex. Additionally, it displayed more interaction with an ER protein, GRAMD1A.

      The data in the paper is generally strong but the major conclusions of this paper need more evidence to be better supported.

      (1) The authors use cells that have been engineered to express the different BICD2 constructs. As shown in Figure 4B, the authors see wide expression of BICD2_WT throughout the cell. However, WT BICD2 usually localizes to the TGN. This widespread localization introduces some uncertainty about the interactome data. The authors should either try to verify the interaction data (specifically with the HOPS complex and GRAMD1A) by immunoprecipitating endogenous BICD2 or by repeating their interactome experiment in Figure 1 using BICD2 knockout cells that express the BICD2_WT construct. This should also be done to verify the immunoprecipitation and microscopy data shown in Figure 7.

      (2) The authors conclude that cargo transport defects resulting from BICD2 mutations may contribute to SMALED2 symptoms. However, the authors are unable to determine if BICD2 directly binds to the potential new cargo, the HOPS complex. To address this, the authors could purify full-length WT BICD2 and perform in vitro experiments. Furthermore, the authors were unable to identify the minimal region of BICD2 needed for HOPS interaction. The authors could expand on the experiment attempted with the extended BICD2 C-terminal using a deltaCC1 construct, which could also be used for in vitro experiments.

      (3) Again, the authors conclude that BICD2 mutants cause cargo transport defects that are likely to lead to SMALED2 symptoms. This would be better supported if the authors are able to find a protein relevant to SMALED2 and examine if/how its localization is changed under expression of the BICD2 mutants. The authors currently use the HOPS complex and GRAMD1A as indicators of cargo transport defects, but it is unclear if these are relevant to SMALED2 symptoms.

      Comments on revisions:

      The investigators did a good job in responding to our initial concerns (see below). We appreciate that they used siRNA to address our first comment because they do not have a BICD2 KO cell line. We appreciated that they added a new section in the Discussion to address the limitations of the study.

      In regards to our first comment about the BICD2 WT construct localization, since they use KD to validate the interaction between their BICD2 WT construct and VPS41, it would be nice to see localization of this construct under the KD condition. However, the binding they presented in Sup. Fig 1B does look convincing, so this may not be necessary.

      Overall, I believe this revision has satisfied our previous concerns.

    4. Reviewer #3 (Public review):

      Summary:

      BicD2 is a motor adapter protein that facilitates cellular transport pathways, which are impacted by human disease mutations of BicD2 causing spinal muscular atrophy with lower extremity predominance (SMALED2). The authors provide evidence that some of these mutations result in interactome changes, which may be the underlying cause of the disease. This is supported by proximity biotin ligation screens, immunoprecipitation and cell biology assays. The authors identify several novel BicD2 interactions such as the HOPS complex that participates in the fusion of late endosomes and autophagosomes with lysosomes, which could have important functions. Three BicD2 disease mutants studied had changes in the interactome, which could be an underlying cause for SMALED2. The study extends our understanding of the BicD2 interactome under physiological conditions, as well as of the changes of cellular transport pathways that result in SMALED2. It will be of great interest for the BicD2 and dynein fields.

      Strengths:

      Extensive interactomes are presented for both WT BicD2 as well as the disease mutants, which will be valuable for the community. The HOPS complex was identified as a novel interactor of BicD2, which is important for fusion of late endosomes and lysosomes, which is of interest, since some of the BicD2 disease mutations result in Golgi-fragmentation phenotypes. The interaction with the HOPS complex is affected by the R747C mutation, which also results in a gain of function interaction with GRAMD1A.

      Weaknesses:

      The manuscript should be strengthened by further evidence of the BicD2/HOPS complex interaction and the functional implications for spinal muscular atrophy by changes in the interactome through mutations. Which functional implications does the loss of the BicD2/HOPS complex interaction and the gain of function interaction with GRAMD1A have in the context of the R747C mutant?

      Major points:

      (1) In the biotin proximity ligation assay, a large number of targets were identified, but it is not clear why only the HOPS complex was chosen for further verification. Immunoprecipitation was used for target verification, but due to the very high number of targets identified in the screen, and the fact that the HOPS complex is a membrane protein that could potentially be immunoprecipitated along with lysosomes or dynein, additional experiments to verify the interaction of BicD2 with the HOPS complex (reconstitution of a complex in vitro, GST-pull down of a complex from cell extracts or other approaches) are needed to strengthen the manuscript.

      (2) In the biotin proximity ligation assay, a large number of BicD2 interactions were identified that are distinct between the mutant and the WT, but it was not clear why GRAMD1A in particular was chosen as a gain of function interaction, and what the functional role of a BicD2/GRAMD1A interaction may be. A Western blot shows a strengthened interaction with the R747C mutant, but GRAMD1A also interacts with WT BicD2.

      (3) Furthermore, functional implications of changed interactions with HOPS and GRAMD1A in the R747C mutant are unclear. Additional experiments are needed to establish the functional implication of the loss of the BicD2/HOPS interaction in the BicD2/R747C mutant. For the GRAMD1A gain of function interaction, according to the authors a significant amount of the protein localized with BicD2/R747C at the centrosomal region. This changed localization is not very clear from the presented images (no centrosomal or other markers were used, and the changed localization could also be an effect of dynein hyperactivation in the mutant). Furthermore, the functional implication of a changed localization of GRAMD1A is unclear from the presented data.

      Comments on revisions:

      After a major revision, the manuscript is much improved. Additional evidence for the HOPS complex/BicD2 interaction was provided (the interaction was identified in multiple independent screens), and while the authors unfortunately were not able to confirm a direct interaction between BicD2 and the HOPS complex, additional caveats were added in the Results section, which clearly state these limitations. The authors also included a very nice discussion of potential physiological roles of the GRAMD1A mislocalization in the disease mutant, which could potentially affect cholesterol transport and homeostasis. Limitations of the presented approaches were clearly described as caveats.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) I was surprised at the effect of BicD2 knockdown on LAMP (and VPS41) localization. This result suggests that, in HeLa and Cos7 cells, BicD2 regulation of kinesin-1 (rather than dynein) is the primary driver of lysosome localization. The KIF5B-knockout rescue of the BicD2 overexpression phenotype was a very powerful result that supports this conclusion. Have the authors looked at other cargos, e.g., Golgi or centrosomes in G2? Can the authors include more discussion about what this result means, or how they imagine the interactions of dynein and kinesin-1 with BicD2 are regulated? 

      We have performed this experiment as requested by the reviewer. The BICD2 siRNA also resulted in Golgi fragmentation and localization defects of the centrosome in cells that are in G2 phase of the cell cycle (Supplemental Fig. 2E-H).

      We have also added discussion of how BICD2 might couple cargos to motors of opposite polarity (lines 440-447). Interestingly, the lysosome motility defect we observe upon BICD2 knockdown is similar to the RAB6A trafficking phenotype. In both cases, there is a sharp reduction in the number of motile particles rather than a reversal in the direction of motility. This suggests that both motors contribute to the steady-state distribution of these cargoes.

      (2) Have the authors examined if the SMALED mutants show diminished or increased binding to KIF5B? While the authors are correct that the mutations could hyperactivate dynein because they reduce BicD2 autoinhibition, it is possible that the SMALED mutants hyperactivate dynein because they no longer bind kinesin. This would be particularly interesting, given the complex relationship between BicD2 regulation of dynein and kinesin that the authors show in Figure 3. 

      Thank you for this suggestion. We had not considered this. We have added this experiment in the revised manuscript (Supplemental Fig. 3H, I). We find that the interaction between wild-type BICD2 and KIF5B is only slightly above the control. This is consistent with published findings that indicate that although the isolated CC2 domain of BICD2 is able to interact with KIF5B, the binding is lower for the full-length protein. This is most likely due to the intramolecular interaction between the N and C-termini of BICD2 partially blocking the binding site. Interestingly, however, all three mutants display a reduced interaction with KIF5B, with the reduction being most severe for the cargo domain binding mutants. Thus, as we discuss in the revised manuscript, dynein hyperactivity likely results from increased binding to dynein and a concurrent reduction in binding to KIF5B.

      (3) What is already known about the protein GRAMD1A? Did the authors choose to focus on GRAMD1A because it was the only novel interaction found in the SMALED mutant interactomes, or was this protein interesting for a different reason? Does the known function of GRAMD1A explain the potential dysfunction of cells expressing BICD2_R747C or patients who have this mutation? More discussion of this protein and why the authors focused on it would really strengthen the manuscript. 

      We chose to focus on GRAMD1A for a few reasons. In our proteomic analysis, the protein with the strongest gain-of-function interaction with BICD2_R747C was Plastin. However, we were not able to validate this result using at least one antibody against Plastin. In addition, we had previously performed a proteomic screen using a BICD2_R747A (arginine to alanine) mutation and had compared the interactome of this mutant to that of the wild-type protein. Plastin was not recovered in that screen, but the top hit was GRAMD1A. Because we isolated GRAMD1A as a gain-of-function interaction in two separate screens, we believed the result merited follow-up studies. 

      GRAMD1A (as well as its paralogs GRAMD1B and C) functions in non-vesicular transport of accessible cholesterol from the plasma membrane to the ER. We have added further discussion of GRAMD1A (lines 484-495). Although we observe relocalization of GRAMD1A in mutant-expressing cells, we do not know whether this is sufficient to cause cholesterol transport defects. There are several routes for cholesterol uptake, and the GRAMD1A pathway represents just one of these routes. 

      Reviewer #2 (Public review):

      (1) The authors use cells that have been engineered to express the different BICD2 constructs. As shown in Figure 4B, the authors see wide expression of BICD2_WT throughout the cell. However, WT BICD2 usually localizes to the TGN. This widespread localization introduces some uncertainty about the interactome data. The authors should either try to verify the interaction data (specifically with the HOPS complex and GRAMD1A) by immunoprecipitating endogenous BICD2 or by repeating their interactome experiment in Figure 1 using BICD2 knockout cells that express the BICD2_WT construct. This should also be done to verify the immunoprecipitation and microscopy data shown in Figure 7. 

      The localization of our exogenous BICD2-mNeon constructs is similar to what others have seen using GFP tagged versions of the protein (for example Peeters et al., 2013). In addition, in the experiment shown in the initial version of the paper, we were focusing on the centrosomal localization of BICD2. However, our BICD2-mNeon construct is also observed at the Golgi, in addition to its localization throughout the cell (Supplemental Fig. 3C). 

      We attempted to perform a co-immunoprecipitation experiment using endogenous proteins, as suggested by the reviewer. A rabbit polyclonal antibody was able to co-immunoprecipitate RANBP2 with BICD2. However, the heavy and light chains of the antibody complex comigrated with the VPS41 band and were abundantly detected by the secondary antibody used in the western blot. Thus, we could not determine whether VPS41 was present in the co-immunoprecipitate. We also attempted the experiment using a mouse monoclonal antibody against BICD2. However, this antibody failed in the immunoprecipitation experiment, and we could detect neither RANBP2 (a validated cargo) nor VPS41. Although the VPS41 antibody used in the paper works for western blot, it does not recognize the native protein. Thus, despite our best efforts, we are not able to draw a valid conclusion from these co-IP experiments.

      Performing the entire experiment in a BICD2 KO cell line is beyond the scope of this revision. A BICD2 KO cell line does not exist, and it would take several months to generate such a knockout in the Flp-In HEK cells used in this manuscript. However, we have validated the interaction between BICD2 and VPS41 in cells depleted of endogenous BICD2 (Supplemental Fig. 1B). The transgenic constructs contain silent mutations that make them refractory to BICD2 siRNA-1. Thus, endogenous BICD2 is depleted by the siRNA treatment, but wild-type and mutant BICD2_TurboID are not. We used a similar approach to demonstrate the gain-of-function interaction between BICD2_R747C and GRAMD1A in cells depleted of endogenous BICD2 (Supplemental Fig. 5A).

      (2) The authors conclude that cargo transport defects resulting from BICD2 mutations may contribute to SMALED2 symptoms. However, the authors are unable to determine if BICD2 directly binds to the potential new cargo, the HOPS complex. To address this, the authors could purify full-length WT BICD2 and perform in vitro experiments. Furthermore, the authors were unable to identify the minimal region of BICD2 needed for HOPS interaction. The authors could expand on the experiment attempted with the extended BICD2 C-terminal using a deltaCC1 construct, which could also be used for in vitro experiments. 

      We have not been successful in purifying full length BICD2 in bacteria, perhaps due to solubility issues. However, we have added several experiments to further examine the nature of the BICD2-HOPS complex interaction.

      We have performed the experiment as requested. We find that BICD2_delCC1 is able to bind VPS41, but not as efficiently as the full length protein. However, unlike the CC3 cargo binding construct, the BICD2_delCC1 construct also displays reduced binding to RANBP2 (Supplemental Fig. 1D). We attribute this defect to either the intramolecular BICD2 interaction blocking cargo binding or potentially to a folding defect in the BICD2_delCC1 construct. Thus, although we performed this experiment as suggested by the reviewer, we are not able to make a solid conclusion.

      Based on the fact that VPS41 was the most abundantly detected HOPS component in the BICD2 interactome, we hypothesized that it was the point of direct contact between BICD2 and the HOPS complex. However, contrary to our hypothesis, depletion of VPS41 did not compromise the association between BICD2 and VPS16 and VPS18 (Supplemental Fig. 1E). Thus, we conclude that there are multiple points of contact between BICD2 and the HOPS complex, with BICD2 perhaps recognizing a common motif or domain present in these proteins.

      We next attempted to map the interaction site using AlphaFold2-Multimer. This platform predicted a high-confidence interaction between BICD2 and RAB6A (consistent with published results), but it did not yield a high-confidence prediction for the BICD2-HOPS complex interaction.

      Ultimately, although we added several new experiments, we were not able to determine the minimal region for binding, or whether the interaction is direct or indirect. These caveats are clearly stated in the revised manuscript. Regardless of whether the interaction is direct or indirect, however, it is noteworthy that the association between BICD2 and the HOPS complex is reduced by the R747C SMALED2 mutation.

      (3) Again, the authors conclude that BICD2 mutants cause cargo transport defects that are likely to lead to SMALED2 symptoms. This would be better supported if the authors are able to find a protein relevant to SMALED2 and examine if/how its localization is changed under expression of the BICD2 mutants. The authors currently use the HOPS complex and GRAMD1A as indicators of cargo transport defects, but it is unclear if these are relevant to SMALED2 symptoms. 

      This point was addressed in the general discussion. SMALED2 is complex (an autosomal dominant disorder with variable phenotypic severity and, in many instances, adult onset), which makes it very hard to model in a cell line. One reason we focused on the HOPS complex, and on VPS41 in particular, is that mutations in VPS41 are associated with spinocerebellar ataxia, a neurodevelopmental disorder. However, we cannot conclude whether the reduction or loss of the BICD2-HOPS interaction causes disease symptoms. Nor can we currently conclude whether the mistargeting of GRAMD1A causes disease symptoms. We have discussed these caveats in the revised manuscript and have included a section in the discussion that specifically lists the limitations of our study (lines 511-530).

      With that said, we can conclude that mutations in the cargo binding domain of BICD2 result in dynein hyperactivity, altered BICD2 localization in hippocampal neurons, and reduced neurite growth. Because we observe interactome changes in HEK cells, it is plausible that interactome changes also occur in motor neurons. However, even without interactome changes, hyperactivation of dynein alone can cause cargo trafficking defects. For example, the same cargos can become excessively localized to the soma rather than the axon. As noted previously, a thorough examination of these points will require genetically engineered motor neurons and is beyond the scope of the current study.

      Reviewer #3 (Public review):

      Strengths: 

      Extensive interactomes are presented for both WT BicD2 as well as the disease mutants, which will be valuable for the community. The HOPS complex was identified as a novel interactor of BicD2, which is important for fusion of late endosomes and lysosomes, which is of interest, since some of the BicD2 disease mutations result in Golgi-fragmentation phenotypes. The interaction with the HOPS complex is affected by the R747C mutation, which also results in a gain-of-function interaction with GRAMD1A. 

      Weaknesses: 

      The manuscript should be strengthened by further evidence of the BicD2/HOPS complex interaction and the functional implications for spinal muscular atrophy by changes in the interactome through mutations. Which functional implications does the loss of the BicD2/HOPS complex interaction and the gain of function interaction with GRAMD1A have in the context of the R747C mutant? 

      (1) In the biotin proximity ligation assay, a large number of targets were identified, but it is not clear why only the HOPS complex was chosen for further verification. Immunoprecipitation was used for target verification, but due to the very high number of targets identified in the screen, and the fact that the HOPS complex is a membrane protein that could potentially be immunoprecipitated along with lysosomes or dynein, additional experiments to verify the interaction of BicD2 with the HOPS complex (reconstitution of a complex in vitro, GST-pull down of a complex from cell extracts or other approaches) are needed to strengthen the manuscript. 

      As discussed for reviewer 2 (point 2), we have added several experiments to better characterize the BICD2-HOPS complex interaction.

      We chose to focus on the HOPS complex for a few reasons. The list of interactions that displayed a >2 fold enrichment vs control was actually not that large (66 proteins). Within this list, we identified 4 out of 6 HOPS components and VPS41 was the 5th most enriched protein in the BICD2 interactome (RANBP2 by contrast was #16 on this list). Furthermore, the BICD2_R747C mutation resulted in greatly reduced interaction of BICD2 with the HOPS complex, whereas its interaction with dynein was increased. These results indicate that these proteins are not simply immunoprecipitating with the BICD2/dynein complex. Apart from the HOPS complex, lysosomal proteins were not present in the interactome, making it unlikely that they were identified due to non-specific interactions between BICD2 and co-precipitating lysosomes.

      (2) In the biotin proximity ligation assay, a large number of BicD2 interactions were identified that are distinct between the mutant and the WT. It was not clear why GRAMD1A in particular was chosen as a gain-of-function interaction, or what the functional role of a BicD2/GRAMD1A interaction may be. A Western blot shows a strengthened interaction with the R747C mutant, but GRAMD1A also interacts with WT BicD2. 

      Please see the discussion of GRAMD1A above (reviewer 1, point 3). GRAMD1A comes down non-specifically with the binding control as well as with BICD2_wt. We therefore conclude that wild-type BICD2 does not interact with GRAMD1A above background levels (Fig. 7, compare the control lane with BICD2-wt).

      (3) Furthermore, the functional implications of changed interactions with HOPS and GRAMD1A in the R747C mutant are unclear. Additional experiments are needed to establish the functional implication of the loss of the BicD2/HOPS interaction in the BicD2/R747C mutant. For the GRAMD1A gain of function interaction, according to the authors, a significant amount of the protein localized with BicD2/R747C at the centrosomal region. This changed localization is not very clear from the presented images (no centrosomal or other markers were used, and the changed localization could also be an effect of dynein hyperactivation in the mutant). Furthermore, the functional implication of a changed localization of GRAMD1A is unclear from the presented data. 

      We have performed the experiment as requested by the reviewer. The re-localized GRAMD1A localizes adjacent to Pericentrin, a centrosomal marker (Supplemental Fig. 5B-F). GRAMD1A and BICD2 appear to co-localize in a ring around the Pericentrin marked centrosome.

      The re-localization of GRAMD1A to the centrosomal area by BICD2_R747C appears to be unique to this mutant and not simply a consequence of dynein hyperactivity. The other two mutants tested, BICD2_N188T and BICD2_R694C, also hyperactivate dynein. However, they do not cause the dramatic re-localization of GRAMD1A that we observe with the BICD2_R747C mutant. We conclude that this altered localization results from a gain-of-function interaction with BICD2_R747C combined with dynein hyperactivity.

      Reviewer #1 (Recommendations for the authors): 

      Please add a discussion of how the authors calculated the cell body enrichment shown in 5E. Is this the ratio of BicD2 intensity in the cell body to that in the axon? Did the authors normalize for potential differences in BicD2 variant expression? 

      Yes, it is a ratio of the intensity between the cell body and axon. This is described in the Methods section under quantification (lines 725-728). We attempted to image cells expressing similar amounts of protein.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The paper would benefit from an explanation of why the authors chose to follow up on the HOPS complex out of all proteins identified in the interactome experiment. 

      This discussion has been included in the revised manuscript.  

      (2) In panel B of Supplementary Figure 1, RFP mTurbo has a significant amount of non-specific binding to VPS18. The authors note that in the initial interactome experiment, there was a twofold enrichment of this protein in BICD2 pulldown versus control. Do the authors have a co-IP that has a similar enrichment?

      VPS18 occasionally comes down non-specifically with our RFP-TurboID control. However, the interaction is specific, because very little VPS18 comes down with the BICD2 construct lacking the cargo binding domain (Fig. 2B). An additional example of the VPS18 binding result is shown in Supplemental Fig. 1E.

      (3) In Figure 2B, there seems to be less Vps18 in the input for BICD2 delCC3-mTrbo. Do the authors have a blot where there is equal input across all conditions? This may increase the slight signal seen in the pulldown.

      The blot shown in Supplemental Fig. 1C has equivalent load for VPS18 across all lanes. Minimal binding of VPS18 is observed with the BICD2_delCC3 sample.

      (4) In Figure 3, can the authors show representative images of GFP-VPS-41 and LAMP1 localization that are at the same magnification? It currently looks as if the localization pattern differs between the two under control siRNA. Alternatively, the authors should show colocalization of the two, as the authors note both are localized to late endosomes/lysosomes. 

      We have provided additional images at the same magnification (Supplemental Fig. 2I-K). Co-localization of GFP-VPS41 and LAMP1 is not possible because both are detected with rabbit polyclonal antibodies (against GFP and LAMP1, respectively). However, published studies have shown that a subset of V5-tagged VPS41 vesicles are positive for LAMP1. We have cited this study.

      (5) In Supplementary Figure 2, the authors should show the knockdown efficiency of both BICD2 siRNAs. In panel B, the VPS41 staining appears less perinuclear than with BICD2 siRNA 1. Is this because of knockdown efficiency? 

      We have included this data (Supplemental Fig. 2B). Both siRNAs are capable of depleting BICD2. However, we do see slightly more effective knock down with siRNA-1.

      (6) The data in Figure 4A would be more striking with quantification. 

      Quantifications have been provided (Supplemental Fig. 3A, B). In a one-way ANOVA, BICD2_R747C is the only mutant that shows a significant change. Because of variability in the binding experiment, the changes for the other two mutants were not statistically significant. However, the additional assays provided (centrosomal enrichment of BICD2 and peroxisome tethering) clearly demonstrate that the R694C mutant also causes dynein hyperactivation. Notably, the analysis by Huynh et al., 2017 also showed increased binding between BICD2 disease mutants and dynein. However, because of binding variability, their results were not statistically significant.

      (7) Can the authors explain how centrosome enrichment is calculated in Figure 4F? The intensity of colocalization with the centrosome between mutant constructs visually does not look significantly different. Is this a ratio of centrosome localization to cell body localization? 

      We apologize for this omission. This has been added to the quantification section of the Methods (lines 721-723). Yes, it is a ratio of mean signal at the centrosome vs mean signal in the rest of the cell.

      (8) The current input blot in Supplementary Figure 4A shows increasing amounts of importin beta across the lanes. Do the authors have a blot of panel A in which the input level of importin beta is the same between constructs? Does this change the level of importin beta that is pulled down?

      Another replicate of this experiment has been shown. We have retained the original experiment as well (Supplemental Figs. 4A, B).

      Reviewer #3 (Recommendations for the authors): 

      Minor points: 

      (1) In the .pdf version of the supplemental tables, the text is often cropped. It is recommended to delete the .pdf versions and just retain the Excel versions of the tables. 

      We are not sure why this occurred. Excel files were provided. In addition, the raw data from the mass spectrometry experiments will also be included with the final version of the manuscript.

      (2) Line 367: For transport of Rab6, kinesin-1 is the dominant motor, but dynein is still active and engaging in a tug of war (Serra Marquez et al 2022). 

      Thank you. We have revised our text to include this discussion. In this regard, LAMP1 vesicles behave similarly: loss of BICD2 results in a greater number of stationary vesicles rather than vesicles that are excessively targeted towards the microtubule minus end.

      (3) Line 371: BicD2 is required for the transport of RanBP2 from annulate lamellae to nuclear pore complexes.

      Thank you. We have modified our text. 

      (4) Yi et al., 2023 have previously shown changed interactions of the BicD2/R747C mutant, such as decreased binding to Nup358 and increased binding to Nesprin-2, as well as functional implications for the associated brain developmental pathways, which should be acknowledged.

      We apologize for leaving this out. In the original version of the manuscript, we were attempting to keep the discussion more concise. We have added a discussion of these findings in the revised manuscript (lines 496-507).

    1. Author response:

      Reviewer #1 (Public review):

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observations that over-expression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find this is by metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thus reducing amino acid production and thus reducing protein synthesis.

      Strengths:

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk.

      Weaknesses:

      However, questions on how these two pathways crosstalk, and how this interfaces with canonical Myc and mTORC machinery remain. There are also questions related to how this protein:carbohydrate crosstalk interfaces with lipid biosynthesis. Addressing these will increase the overall impact of the study.

      We thank the reviewer for recognizing the significance of our work and for providing constructive feedback. Our findings indicate that elevated pyruvate transport into mitochondria acts independently of canonical pathways, such as mTORC1 or Myc signaling, to regulate cell size. To investigate these pathways, we utilized immunofluorescence with well-validated surrogate measures (p-S6 and p-4EBP1) in clonal analyses of MPC expression, as well as RNA-seq analyses in whole fat body tissues expressing MPC. These methods revealed hyperactivation of mTORC1 and Myc signaling in fat body cells expressing MPC in Drosophila, which are dramatically smaller than control cells. One explanation of these seemingly contradictory observations could be an excess of nutrients that activate mTORC1 or Myc pathways. However, our data is inconsistent with a nutrient surplus that could explain this hyperactivation. Instead, we observed reduced amino acid abundance upon MPC expression, which is very surprising given the observed hyperactivation of mTORC1. This led us to hypothesize the existence of a feedback mechanism that senses inappropriate reductions in cell size and activates signaling pathways to promote cell growth. The best characterized “sizer” pathway for mammalian cells is the CycD/CDK4 complex which has been well studied in the context of cell size regulation of the cell cycle (PMID 10970848, 34022133). However, the mechanisms that sense cell size in post-mitotic cells, such as fat body cells and hepatocytes, remain poorly understood. Investigating the hypothesized size-sensing mechanisms at play here is a fascinating direction for future research.

      For the current study, we conducted epistatic analyses with mTOR pathway members by overexpressing PI3K and knocking down the TORC1 inhibitor Tuberous Sclerosis Complex 1 (Tsc1). These manipulations increased the size of control fat body cells but not those over-expressing the MPC (Supplementary Fig. 3c, 3d). Regarding Myc, its overexpression increased the size of both control and MPC+ clones (Supplementary Fig. 3e), but Myc knockdown had no additional effect on cell size in MPC+ clones (Supplementary Fig. 3f). These results suggest that neither mTORC1, PI3K, nor Myc are epistatic to the cell size effects of MPC expression. Consequently, we shifted our focus to metabolic mechanisms regulating biomass production and cell size.

      When analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. TAG abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTOR promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance, rather than lipids, is likely to play a larger causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Public review):

      In this manuscript, the authors leverage multiple cellular models, including the Drosophila fat body and cultured hepatocytes, to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage, when cells cease proliferation and initiate a period of cell growth, the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and its use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and the provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis, but more broadly raises the possibility that metabolic tradeoffs determine cell size control in a variety of contexts.

      We thank the reviewer for their thoughtful recognition of our efforts, and we are honored by the enthusiasm the reviewer expressed for the findings and the significance of our research. We share the reviewer’s opinion that our work might help to unravel metabolic mechanisms that regulate biomass gain independent of the well-known signaling pathways.

      Reviewer #3 (Public review):

      Summary:

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth.

      Strengths:

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

      Weaknesses:

      In general, the strengths of this paper outweigh its weaknesses. However, some areas of inconsistency and rigor deserve further attention.

      Thank you for reviewing our manuscript and offering constructive feedback. We appreciate your recognition of the significance of our work and your acknowledgment of the compelling evidence we have presented. We will carefully revise the manuscript in line with the reviewers' recommendations.

      The authors comment that MPC overrides hormonal controls on gluconeogenesis and cell size (Discussion, paragraph 3). Such a claim cannot be made for mammalian experiments that are conducted with immortalized cell lines or primary hepatocytes.

      We appreciate the reviewer’s insightful comment. Pyruvate is a primary substrate for gluconeogenesis, and our findings suggest that increased pyruvate transport into mitochondria increases the NADH-to-NAD+ ratio and thereby elevates gluconeogenesis. Notably, we did not observe any changes in the expression of key glucagon targets, such as PC, PEPCK2, and G6PC, suggesting that the glucagon response is not activated upon MPC expression. With the statement the reviewer references, we intended to highlight that excess pyruvate import into mitochondria drives gluconeogenesis independently of hormonal and physiological regulation.

      It seems the reviewer might also have been expressing the sentiment that our in vitro models may not fully reflect the in vivo situation, and we completely agree. Moving forward, we plan to perform similar analyses in mammalian models to test the in vivo relevance of this mechanism. For now, we will refine the language in the manuscript to clarify this point.

      Nuclear size looks to be decreased in fat body cells with elevated MPC levels, consistent with reduced endoreplication, a process that drives growth in these cells. However, acute, ex vivo EdU labeling and measures of tissue DNA content are equivalent in wild-type and MPC+ fat body cells. This is surprising - how do the authors interpret these apparently contradictory phenotypes?

      We thank the reviewer for raising this important issue. The size of the nucleus is regulated by DNA content and various factors, including the physical properties of DNA, chromatin condensation, the nuclear lamina, and other structural components (PMID 32997613). Cytoplasmic and cellular volume also impact nuclear size, as extensively documented during development (PMID 17998401, PMID 32473090).

      In MPC-expressing cells, it is plausible that the reduced cellular volume impacts chromatin condensation or the nuclear lamina in a way that slightly decreases nuclear size without altering DNA content. Specifically, in our whole fat body experiments using CG-Gal4 (as shown in Supplementary Figure 2a-c), we noted that after 12 hours of MPC expression, cell size was significantly reduced (Supplementary Figure 2c and Author response image 1A). However, the reduction in nuclear size became significant only after 36 hours of MPC expression (Author response image 1B), suggesting that the reduction in cell size is a more acute response to MPC expression, followed only later by effects on nuclear size.

      In clonal analyses, this relationship was further clarified. MPC-expressing cells with a size greater than 1000 µm² displayed nuclear sizes comparable to control cells, whereas those with a drastic reduction in cell size (less than 1000 µm²) exhibited smaller nuclei (Author response image 1C and D). These observations collectively suggest that changes in nuclear size are more likely to be downstream rather than upstream of cell size reduction. Given that DNA content remains unaffected, we focused on investigating the rate of protein synthesis. Our findings suggest that protein synthesis might play a causal role in regulating cell size, thereby reinforcing the connection between cellular and nuclear size in this context.

      Author response image 1.

      Cell Size vs. Nuclear Size in MPC-Expressing Fat Body Cells. A. Cell size comparison between control (blue, ay-GFP) and MPC+ (red, ay-MPC) fat body cells over time, measured in hours after MPC expression induction. B. Nuclear area measurements from the same fat body cells in ay-GFP and ay-MPC groups. C. Scatter plot of nuclear area vs. cell area for control (ay-GFP) cells, including the corresponding R² value. D. Scatter plot of nuclear area vs. cell area for MPC-expressing (ay-MPC) cells, with the respective R² value.

      This image highlights the relationship between nuclear and cell size in MPC-expressing fat body cells, emphasizing the distinct cellular responses observed following MPC induction.

      In Figure 4d, oxygen consumption rates are measured in control cells and those over-expressing MPC. Values are normalized to protein levels, but protein is reduced in MPC+ cells. Is oxygen consumption changed by MPC expression on a per-cell basis?

      As described in the manuscript, MPC-expressing cells are smaller. In this context, we felt it was most appropriate to normalize oxygen consumption rates (OCR) to cellular mass to enable an accurate interpretation of metabolic activity. We therefore normalized OCR to protein content to account for variations in cell size and (probably) mitochondrial mass.

      Trehalose is the main circulating sugar in Drosophila and should be measured in addition to hemolymph glucose. Additionally, the units in Figure 4h should be related to hemolymph volume - it is not clear that they are.

      We appreciate this valuable suggestion. In the revised manuscript, we will quantify trehalose abundance in circulation and within fat bodies. As described in the Methods section, following the approach outlined in Ugrankar-Banerjee et al., 2023, we bled 10 larvae (either control or MPC-expressing) using forceps onto parafilm. From this, 2 microliters of hemolymph were collected for glucose measurement. We will apply this methodology to include the trehalose measurements as part of our updated analysis.

      Measurements of NADH/NAD ratios in conditions where these are manipulated genetically and pharmacologically (Figure 5) would strengthen the findings of the paper. Along the same lines, expression of manipulated genes - whether by RT-qPCR or Western blotting - would be helpful to assess the degree of knockdown/knockout in a cell population (for example, Got2 manipulations in Figures 6 and S8).

      We appreciate this suggestion, which will provide additional rigor to our study. We have already quantified NADH/NAD+ ratios in HepG2 cells under UK5099 treatment and NMN and Asp supplementation, as presented in Figure 6k. As suggested, we will quantify Got2 expression in the Got2 manipulations shown in Figure 6j using RT-qPCR and validate the corresponding data in Supplementary Figure 8f by western blot analysis.

      Additionally, we will assess the efficiency of pcb, pdha, dlat, pepck2, and Got2 manipulations used to modulate the expression of these genes. These validations will ensure the robustness of our findings and strengthen the conclusions of our study.

    1. Author response:

      Reviewer #1:

      Weaknesses:

      (1) The crystal structure of HsIFT172c reveals a single globular domain formed by the last three TPR repeats and C-terminal residues of IFT172. However, the authors subdivide this globular domain into TPR, linker, and U-box-like regions that they treat as separate entities throughout the manuscript. This is potentially misleading as the U-box surface that is proposed to bind ubiquitin or E2 is not surface accessible but instead interacts with the TPR motifs. They justify this approach by speculating that the presented IFT172c structure represents an autoinhibited state and that the U-box-like domain can become accessible following phosphorylation. However, additional evidence supporting the proposed autoinhibited state and the potential accessibility of the U-box surface following phosphorylation is needed, as it is not tested or supported by the current data.

      We thank the reviewer for this comment. IFT172C contains a TPR region and a U-box-like region, which are admittedly tightly bound to each other. While it is possible that this region exists and functions as a single domain, we chose to classify these regions as two different domains for the following reasons.

      (1) The TPR and U-box-like regions belong to two different structural classes.

      (2) The TPR region is linked to the U-box-like region via a long linker, which seems poised to regulate the relative movement between these regions.

      (3) Many ciliopathy mutations map to the interface between the TPR and U-box-like regions, hinting at a regulatory mechanism governed by this interface.

      (2) While in vitro ubiquitination of IFT172 has been demonstrated, in vivo evidence of this process is necessary to support its physiological relevance.

      We thank the reviewer for this comment. We are currently working on identifying the substrates of IFT172 to reveal the physiological relevance of its ubiquitination activity.

      (3) The authors describe IFT172 as being autoubiquitinated. However, the identified E2 enzymes UBCH5A and UBCH5B can both function in E3-independent ubiquitination (as pointed out by the authors) and mediate ubiquitin chain formation in an E3-independent manner in vitro (see ubiquitin chain ladder formation in Figure 3A). In addition, point mutation of known E3-binding sites in UBCH5A or TPR/U-box interface residues in IFT172 has no effect on the mono-ubiquitination of IFT172c1. Together, these data suggest that IFT172 is an E3-independent substrate of UBCH5A in vitro. The authors should state this possibility more clearly and avoid terminology such as "autoubiquitination" as it implies that IFT172 is an E3 ligase, which is misleading. Similarly, statements on page 10 and elsewhere are not supported by the data (e.g. "the low in vitro ubiquitination activity exhibited by IFT172" and "ubiquitin conjugation occurring on HsIFT172C1 in the presence of UBCH5A, possibly in coordination with the IFT172 U-box domain").

      We acknowledge this possibility and will tone down our statements about the autoubiquitination activity of IFT172 in the revised manuscript.

      (4) Related to the above point, the conclusion on page 11, that mono-ubiquitination of IFT172 is U-box-independent while polyubiquitination of IFT172 is U-box-dependent appears implausible. The authors should consider that UBCH5A is known to form free ubiquitin chains in vitro and structural rearrangements in F1715A/C1725R variants could render additional ubiquitination sites or the monoubiquitinated form of IFT172 inaccessible/unfavorable for further processing by UBCH5A.

      We acknowledge this possibility and will tone down our statements about the autoubiquitination activity of IFT172 in the conclusion on page 11.

      (5) Identification of the specific ubiquitination site(s) within IFT172 would be valuable as it would allow targeted mutation to determine whether the ubiquitination of IFT172 is physiologically relevant. Ubiquitination of the C1 but not the C2 or C3 constructs suggests that the ubiquitination site is located in TPRs ranging from residues 969-1470. Could this region of TPR repeats (lacking the IFT172C3 part) suffice as a substrate for UBCH5A in ubiquitination assays?

      We thank the reviewer for raising this important point about ubiquitination site identification. While not included in our manuscript, we did perform mass spectrometry analysis of ubiquitination sites using wild-type IFT172 and several mutants (P1725A, C1727R, and F1715A). As shown in Author response image 1, we detected multiple ubiquitination sites across these constructs. The wild-type protein showed ubiquitination at positions K1022, K1237, K1271, and K1551, while the mutants displayed slightly different patterns of modification. However, the MS intensity signals for these ubiquitinated peptides were relatively low compared to unmodified peptides, making it difficult to draw strong conclusions about site specificity or physiological relevance.

      Author response image 1.

      These results align with the reviewer's suggestion that ubiquitination occurs within the TPR-containing region. However, given the technical limitations of the MS analysis and the potential for E3-independent ubiquitination by UBCH5A, we have taken a conservative approach in interpreting these findings.

      (6) The discrepancy between the molecular weight shifts observed in anti-ubiquitin Western blots and Coomassie-stained gels is noteworthy. The authors show the appearance of a mono-ubiquitinated protein of ~108 kDa in anti-ubiquitin Western blots. However, this molecular weight shift is not observed for total IFT172 in the corresponding Coomassie-stained gels (Figures 3B, D, F). Surprisingly, this MW shift is visible in an anti-His Western blot of a ubiquitination assay (Fig 3C). Together, this raises the concern that only a small fraction of IFT172 is being modified with ubiquitin. Quantification of the percentage of ubiquitinated IFT172 in the in vitro experiments could provide helpful context.

      We acknowledge in the manuscript (page 16) that the conjugation of ubiquitin to IFT172C is weak. Future experiments to identify potential substrates and their implications in ciliary regulation will provide further context for our in vitro ubiquitination experiments.

      (7) The authors propose that IFT172 binds ubiquitin and demonstrate that GST-tagged HsIFT172C2 or HsIFT172C3 can pull down tetra-ubiquitin chains. However, ubiquitin is known to be "sticky" and to have a tendency for weak, nonspecific interactions with exposed hydrophobic surfaces. Given that only a small proportion of the ubiquitin chains bind in the pull-down, specific point mutations that identify the ubiquitin-binding site are required to convincingly show the ubiquitin binding of IFT172.

      (8) The authors generated structure-guided mutations based on the predicted Ub-interface and on the TPR/U-box interface and used these for the ubiquitination assays in Fig 3. These same mutations could provide valuable insights into ubiquitin binding assays as they may disrupt or enhance ubiquitin binding (by relieving "autoinhibition"), respectively. Surprisingly, two of these sites are highlighted in the predicted ubiquitin-binding interface (F1715, I1688; Figure 4E) but not analyzed in the accompanying ubiquitin-binding assays in Figure 4.

      We agree that these mutations could provide insights into ubiquitin binding by IFT172. We are currently pursuing further mutagenesis studies of the IFT172-Ub interface based on the AF model. However, we have evaluated the ubiquitin-binding activity of the F1715A mutant using similar pull-downs, which showed no significant impact of the mutation on the ubiquitin-binding activity of IFT172. We have yet to evaluate the impact of alternative amino acid substitutions at these positions. The I1688 mutants we cloned could not be expressed in soluble form and thus could not be tested in ubiquitination or ubiquitin-binding assays.

      (9) If IFT172 is a ubiquitin-binding protein, it might be expected that the pull-down experiments in Figure S1 would identify ubiquitin, ubiquitinated proteins, or E2 enzymes. These were not observed, raising doubt that IFT172 is a ubiquitin-binding protein.

      It is likely that IFT172 binds ubiquitin only with low affinity, as indicated by our in vitro pull-downs and the AF interface. In our pull-down experiment with Chlamydomonas flagella extracts, we used extensive washes to remove non-specific interactors. This might also have excluded weak but bona fide interactors of IFT172. Additionally, we did not use ubiquitination-preserving reagents such as NEM in our pull-down buffers, which exposed cellular ubiquitinated proteins to DUB-mediated deubiquitination and may have further prevented their identification in our pull-down/MS experiment.

      (10) The cell-based experiments demonstrate that the U-box-like region is important for the stability of IFT172 but does not demonstrate that the effect on the TGFb pathway is due to the loss of ubiquitin-binding or ubiquitination activity of IFT172.

      We acknowledge that our current data cannot distinguish whether the TGFβ pathway defects arise from general protein instability or from specific loss of ubiquitin-related functions. Our experiments demonstrate that the U-box-like region is required for both IFT172 stability and proper TGFβ signaling, but we agree that establishing a direct mechanistic link between these phenomena would require additional evidence. We will revise our discussion to more clearly acknowledge this limitation in our current understanding of the relationship between IFT172's U-box region and TGFβ pathway regulation.

      (11) The challenges in experimentally validating the interaction between IFT172 and the UBX-domain-containing protein are understandable. Alternative approaches, such as using single domains from the UBX protein, implementing solubilizing tags, or disrupting the predicted binding interface in Chlamydomonas flagella pull-downs, could be considered. In this context, the conclusion on page 7 that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a direct IFT172 interactor" is incorrect as a prediction of an interaction interface with AF-M does not validate a direct interaction per se.

      We agree with the reviewer that our AlphaFold-Multimer (AF-M) predictions alone do not constitute experimental validation of a direct interaction. We appreciate the reviewer's understanding of the technical challenges in validating this interaction experimentally. We will revise our text to state more precisely that "The uncharacterized UBX-domain-containing protein was predicted by AF-M to be a potential direct IFT172 interactor" and will discuss the AF-M predictions as computational evidence that suggests, but does not prove, a direct interaction. This more accurately reflects the current state of our understanding of this potential interaction.

      Reviewer #3:

      Weaknesses:

      (1) Interaction studies were carried out by pulldown experiments, which identified more IFT172 interaction partners. Whether these interactions can be seen in living cells remains to be elucidated in subsequent studies.

      We agree with the reviewer that validation of protein-protein interactions in living cells provides important physiological context. While our pulldown experiments have identified several promising interaction partners and the AF-M predictions provide computational support for these interactions, we acknowledge that demonstrating these interactions in vivo would strengthen our findings. However, we believe our current biochemical and structural analyses provide valuable insights into the molecular basis of IFT172's interactions, laying important groundwork for future cell-based studies.

      (2) The cell culture-based experiments in the IFT172 mutants are exciting and show that the U-box domain is important for protein stability and point towards involvement of the U-box domain in cellular signaling processes. However, the characterization of the generated cell lines falls behind the very rigorous analysis of other aspects of this work.

      We thank the reviewer for noting that the characterization of our cell lines could be more rigorous. In the revised manuscript, we will provide additional characterization of the cell lines, including detailed sequencing information and validation data for the IFT172 mutants. This will bring the documentation of our cell-based experiments up to the same standard as other aspects of our work.

    1. Author response:

      We thank the reviewers for their help and their suggestions to make this manuscript more rigorous. We would like to post provisional author responses when eLife publishes the reviewed preprint; more detailed responses will accompany the revised manuscript.

      • There are questions about choices made in the computational approach (architecture and type of generative model, training set).

      We will train a new generator model based on the current GAN architecture, but with ‘hybrid’ AMP/AVP training sets (Reviewers 1 and 3). This will allow us to directly compare the performance of the two generators. Based on our preliminary data, providing the GAN with more AVP sequences during training helped the designed peptides pass the AVP filter, at the cost of reducing the average AMPredictor scores. The new generator also increased the diversity of designed sequences.

      We also varied the detailed architecture of our deep learning models, including fully-connected graph edge encodings and different versions of ESM (e.g. esm1b_t33_650M_UR50S, esm2_t48_15B_UR50D; Reviewer 2). In the revised manuscript, we will report the effects of these modifications, which suggest that the overall GCN and GAN constructs are suitable for a lightweight sequence-label model, as demonstrated in Author response tables 1 and 2. For the generator, our results suggest that with our approach we may have reached a plateau for GAN sampling (Author response table 3).

      Author response table 1.

      Results of AMPredictor with different graph edge encodings

      Author response table 2.

      Results of AMPredictor with different ESM versions

      Author response table 3.

      Evaluation of generated sequences with different sampling numbers

      • There is an important concern about the small number of antimicrobial peptides tested, compared to other studies, and the origin of antiviral activities.

      We will address this concern by increasing the number of peptides tested in antimicrobial and antiviral experiments. As reported in the current version of our manuscript, the first generation of the GAN generated 128 unique designs, and the top 2% (3 designs) was tested experimentally. The second generation of the GAN will produce ~1024 designs (1-2 weeks), and the top 2% (~20 new sequences) will be tested. We are proceeding with peptide synthesis (2-3 weeks) and MIC measurement (1 week). The total number of tested sequences will reach 20-30. We will focus on sequences with low similarity (< 30%) to any known AMPs, thus expanding the universe of functional peptides. We estimate that these new data will be collected within 6 weeks.
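      The top-2% selection step described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the exact pipeline: it assumes each generated design already carries a single predicted activity score (for example, an AMPredictor score), and the function name select_top_fraction and its rounding rule are hypothetical. With Python's round(), 2% of 128 designs gives 3 and 2% of 1024 designs gives 20, consistent with the counts above.

```python
def select_top_fraction(scores: dict[str, float], fraction: float = 0.02) -> list[str]:
    """Return the IDs of the top `fraction` of designs, highest predicted score first.

    Hypothetical helper: assumes `scores` maps a design ID to one predicted
    activity score (e.g. from a predictor such as AMPredictor).
    """
    # Number of designs to keep, rounded to the nearest integer; at least one.
    n_keep = max(1, round(fraction * len(scores)))
    # Rank design IDs by their score in descending order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    # Made-up scores for illustration only: 128 designs, score increasing with index.
    first_gen = {f"design_{i}": i / 128 for i in range(128)}
    print(select_top_fraction(first_gen))  # ['design_127', 'design_126', 'design_125']

    # A pool of 1024 designs yields 20 candidates at the same 2% cutoff.
    second_gen = {str(i): float(i) for i in range(1024)}
    print(len(select_top_fraction(second_gen)))  # 20
```

      In practice, a novelty filter (such as the < 30% similarity criterion to known AMPs) could be applied before or after ranking; that ordering is a design choice not specified here.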

    1. Author response:

      Reviewer #1 (Public Review):

      (1) Figure 3: it is unclear what the efficiency of Msi2 deletion by shRNA is. Could you demonstrate it by at least two independent methods (qPCR, Western, or IHC)? Please quantitate the data.

      In Figure 3, we did not delete Msi2 via shRNA. Instead, we utilized a genetic model in which the Msi2 gene was disrupted via gene trap mutagenesis. We have also used this model in previous publications to define the impact of Msi2 loss in other systems¹.

      (2) In Figure 4, similarly, it is unclear whether Msi2 depletion was effective and what the shRNA efficiency is. Please test this by at least two independent methods (qPCR, Western, or IHC) and quantitate the data.

      Using qPCR, we demonstrated that the efficiency of Msi2 depletion was ~83% in our in vitro and in vivo experiments (Figures 4A and 4C, respectively), and we verified the knockdown by bulk RNA-seq analysis. The shRNA hairpin used was previously validated and published by our lab².

      (3) The reason for the impairment of cell growth demonstrated in Figs 3 and 4 is not clear: is it apoptosis? Necrosis? Cell cycle defects? Autophagy? Senescence? Please probe 2-3 possibilities and provide the data.

      The basis of the cell growth impairment after Msi2 deletion/knockdown in this paper is certainly an important question, and future experiments will be performed to better delineate it. In previous publications, loss of Msi2 in leukemia cells was shown to inhibit growth by arresting cell cycle progression through increased expression of p21³. Further, loss of Msi2 was also shown to promote apoptosis in part by upregulating Bax³. These data suggest that Msi2 can act via multiple distinct mechanisms, including by promoting cell cycle progression and blocking apoptosis. While these specific genes were not detectably changed after loss of Msi2 in lung cancer cells, other genes in these and other pathways will be important to study in the future.

      (4) Since Musashi-1 is a Musashi-2 paralogue that could compensate for Musashi-2 loss, please test Msi1 expression levels in matching Fig 3 and Fig 4 sections (in cells/ tumors with Msi2 deletion and in KP cells with Msi2 shRNA). One method could suffice here.

      In our RNA-seq of cells following Msi2 knockdown, Msi1 expression was undetectable. The TPM values for Msi1 in control and knockdown cells were less than 0.01, suggesting that it did not compensate for the loss of Msi2.

      (5) It is not entirely clear why RNA-seq (as opposed to proteomics) was used to investigate downstream Msi2 targets, since Msi2 is primarily a translational rather than a transcriptional regulator. RNA effects in Fig 5J are quite modest, about 2-fold. It would be useful (if antibodies are available) to test the four targets in Fig 5J by Western blot, to see any impact of Musashi-2 depletion on those target protein levels. Indeed, several papers, including Kudinov et al PNAS, PMID: 27274057, Makhov P et al PMID: 33723247 and PMID: 37173995, used proteomics/RIP approaches and found direct Musashi-2 targets in lung cancer, including EGFR and others.

      Previously published work from our lab showed that expression of Msi2 in the context of myeloid leukemia¹ can repress not only NUMB protein (as has been previously demonstrated in the nervous system) but also Numb RNA. This indicated that, as an RNA-binding protein, Msi2 can also bind and destabilize direct binding targets such as Numb; this was the reason for pursuing transcriptomic analysis. However, as the reviewer suggests, proteomic studies are certainly very important for developing a complete picture of the impact of Musashi and for determining which targets are controlled by Msi2 at the protein level.

      Reviewer #2 (Public Review):

      (1) It will be interesting to determine whether Msi2+ cells are a relatively stable subset or rather the Msi2+ cells in lung is a dynamic concept that is transient or interconvertible. This is relevant to the interpretation of what Msi2 positivity really means.

      In previous unpublished work from our lab, we have found that Msi2+ cells from a GFP reporter KPf/fC mouse readily become GFP negative (Msi2-), but the inverse is not true. Specifically, when Msi2+ KPf/fC pancreatic cells were transplanted into the flanks of NSG mice, they formed tumors in all recipients; these tumors contained both GFP+ and GFP- cells (over 80%), recapitulating the original heterogeneity and suggesting that GFP+ cells can give rise to both GFP+ and GFP- cells (Lytle and Reya, unpublished observations). In contrast, only a small subset of mice transplanted with GFP- cells formed tumors. One of the rare tumors derived from GFP- cells was isolated and found to contain largely GFP- cells, with ~0.1% GFP+ cells. This low frequency of GFP+ cells could come from contaminating cells, or may suggest that GFP- cells retain some ability to switch on Msi under selective pressure and that, although they pose a lower risk of driving tumorigenesis than Msi+ cells, they may nonetheless bear latent potential to become higher risk. These data may offer a possible model for projecting the potential of Msi2+ cells in the lung, but this needs to be studied further in this tissue.

      (2) Does Kras mutation and/or p53 loss upregulate Msi2? This point and the point above are related to whether Msi2+ cells are truly more susceptible to tumorigenesis, as the authors suggested.

      In unpublished work from our lab, we have found that Kras mutation upregulates Msi2 over baseline and that subsequent p53 loss upregulates Msi2 further in pancreatic cells (Lytle and Reya, unpublished results); it is therefore possible that the same is true for the lung. Specifically, we have observed that Msi2 expression increased from normal acinar cells to Kras-mutated acinar cells (e.g., pancreatic intraepithelial neoplasia, PanIN).

      To address whether Msi2+ cells are more susceptible to tumorigenesis, we have recently published data showing that stabilization of the oncogenic MYC protein in lung Msi2+ cells drives the formation of small-cell lung cancer in a new inducible Msi2-CreERT2; CAG-LSL-MycT58A mouse model (Msi2-Myc)⁴. More importantly, these data provide the first evidence that normal Msi2+ cells are primed and highly sensitive to MYC-driven transformation across many organs, not just the lung⁴.

      (3) The KO of Msi2 reducing tumor number and burden in the lung cancer initiation model is interesting. However, there are two alternative interpretations. First, it is possible that the Msi2 KO mice (without Kras activation and p53 loss) have reduced total lung cell numbers or an altered percentage of stem cells. There is currently only one sentence citing data not shown on line 125, commenting that there is no difference in BASC and AT2 cell populations. It would be helpful if such data were shown and the effect of KO on overall lung mass or cellularity clarified. Second, the phenotype may also be due to a difference in the efficiency of Cre on Kras and p53 in the Msi2 WT and KO mice.

      We isolated the lungs of three Msi2 WT and three Msi2 KO mice and used immunofluorescence to stain for CC10 (BASC) and SPC (AT2) to determine whether these cell populations were reduced after Msi2 loss alone. Representative images show that Msi2 KO mice did not have reduced numbers of BASC or AT2 cells.

      Author response image 1.

      (4) All shRNA experiments (for both Msi2 KD and the KD of candidate genes) utilized a single shRNA. This approach cannot exclude off-target effects of the shRNA.

      The shRNA hairpin used for Msi2 was previously validated and published by our lab². Additionally, in this work we developed and used an Msi2 genetic knockout mouse model, which validates our shRNA knockdown data showing the specific impact of Msi2 on lung tumor growth.

      (5) The technical details of the PDX experiment (Figure 4F) are not fully explained.

      Due to space considerations, we were unable to include the specifics in the legend, but the details are in the Methods section (Flank Transplant Assays). In brief, 500,000 cells/well were plated in a 6-well plate coated with Matrigel, and 83,000 cells/well were plated in a 24-well plate coated with Matrigel for subsequent determination of transduction efficiency via FACS. 24 hours after transduction, media from the cells was collected and placed on ice. 1 mL of 2 mg/mL collagenase/dispase was then added to the well and incubated for 45 minutes at 37ºC to dissociate the remaining cells from Matrigel, followed by subsequent washes. Cells were pelleted by centrifugation, and an equivalent number of shControl and shMsi2 transduced cells were resuspended in full media, mixed at a 1:1 ratio with growth factor reduced Matrigel at a final volume of 100 μL, and transplanted subcutaneously into the flanks of NSG recipient mice.

      Reviewer #3 (Public Review):

      - In Figure 1, characterization of Msi2 expression in the normal mouse lung was carried out by using a Msi2-GFP Knock-in reporter and analyzed by flow cytometry followed by cytospins and immunostaining. Additional characterization of Msi2 expression by co-immunostaining with well-known markers of airway and alveolar cell types in intact lung tissue will strengthen the existing data and provide more specific information about Msi2 expression and abundancy in relevant cell types. It will be also interesting to know whether Msi2 is expressed or not in other abundant lung cell types such as ciliated and AT1 cells.

      We performed co-staining of Msi2 and CC10 as well as Msi2 and SPC in Figure 1C. In the future we can include additional markers as well as markers for airway and other alveolar cell types.

      - While this set of experiments provides strong evidence that Msi2 is required for tumor progression and growth in lung adenocarcinoma, it is unclear whether normal Msi2+ lung cells are more responsive to transformation or whether Msi2 is upregulated early during the process of tumorigenesis. Future lineage tracing experiments using Msi2-CreER and mouse models of chemically induced lung carcinogenesis will provide additional data that will fully support this claim.

      Recently, we published data showing that Msi2 is expressed in Clara cells at the bronchoalveolar junction in the lung of our new Msi2-CreERT2 knock-in mouse model (4). Furthermore, stabilization of the oncogenic MYC protein in these specific cells to model Myc amplification was sufficient to drive the formation of small-cell lung cancer (4). These data demonstrate that Msi2+ cells are more responsive to transformation after Myc stabilization.

      - In Figure 4F, patient-derived xenograft (PDX) assays were conducted with samples from only 2 patients, and the percentage of cells infected by shRNA-Msi2 is low in both PDX (30% and 10% for patients 1 and 2, respectively). It is surprising that Msi2 downregulation in a small percentage of tumor cells has such a dramatic effect on tumor growth and expansion. Confirmation of this finding with additional patient samples would suggest an important non-cell-autonomous role for Msi2 in lung adenocarcinoma.

      In the future, we hope to collect more patient samples to further validate the data presented for the first 2 patients shown here. We are not certain of the reason behind the large impact of Msi2 inhibition. However, because cancer stem cells drive the formation of the rest of the tumor and also drive the stromal microenvironment, it is possible that when Msi2 is depleted, the Msi2-deficient cells no longer form tumors and also lose the ability to build the stromal microenvironment. This possibility needs to be tested in future experiments.

      References

      (1) Ito, T., Kwon, H. Y., Zimdahl, B., Congdon, K. L., Blum, J., Lento, W. E., Zhao, C., Lagoo, A., Gerrard, G., Foroni, L., Goldman, J., Goh, H., Kim, S. H., Kim, D. W., Chuah, C., Oehler, V. G., Radich, J. P., Jordan, C. T., & Reya, T. Regulation of myeloid leukaemia by the cell-fate determinant Musashi. Nature 466, 765–768 (2010).

      (2) Fox, R. G., Lytle, N. K., Jaquish, D. V., Park, F. D., Ito, T., Bajaj, J., Koechlein, C. S., Zimdahl, B., Yano, M., Kopp, J. L., Kritzik, M., Sicklick, J. K., Sander, M., Grandgenett, P. M., Hollingsworth, M. A., Shibata, S., Pizzo, D., Valasek, M. A., Sasik, R., Scadeng, M., Okano, H., Kim, Y., MacLeod, A. R., Lowy, A. M., & Reya, T. Image-based detection and targeting of therapy resistance in pancreatic adenocarcinoma. Nature 534, 407–411 (2016).

      (3) Zhang, H., Tan, S., Wang, J., Chen, S., Quan, J., Xian, J., Zhang, Ss., He, J., & Zhang, L. Musashi2 modulates K562 leukemic cell proliferation and apoptosis involving the MAPK pathway. Exp Cell Res 320, 119–127 (2014).

      (4) Rajbhandari, N., Hamilton, M., Quintero, C.M., Ferguson, L.P., Fox, R., Schürch, C.M., Wang, J., Nakamura, M., Lytle, N.K., McDermott, M., Diaz, E., Pettit, H., Kritzik, M., Han, H., Cridebring, D., Wen, K.W., Tsai, S., Goggins, M.G., Lowy, A.M., Wechsler-Reya, R.J., Von Hoff, D.D., Newman, A.M., & Reya, T. Single-cell mapping identifies MSI+ cells as a common origin for diverse subtypes of pancreatic cancer. Cancer Cell 41(11):1989-2005.e9 (2023).

    1. Author Response

      Reviewer #1 (Public Review):

      1) “It is unclear whether new in vivo experiments were conducted for this study”.

      All in vivo experiments shown were conducted independently by new researchers in the lab, using the original fly stocks. This will be more clearly stated in the revised supplement. The aim of repeating the experiments was to directly compare the consequences of impaired N- and C-terminal shedding side-by-side in two Hh-dependent developmental systems.

      2) “A critical shortcoming of the study is that experiments showing Shh secretion/export do not include a Shh(-) control condition. Without demonstration that the bands analyzed are specific for Shh(+) conditions, these experiments cannot be appropriately evaluated”.

      C9C5 antibody reactivity and specificity are shown below, and this control will be added to the revised manuscript. We established the C9C5 immunoblotting protocol, and generated the blot shown in Author Response Image 1, before any of the experiments in the manuscript were started. The immunoblot clearly shows Shh specificity similar to that of the R&D AF464 anti-Shh antibodies that were previously used in the lab. The immunoblot also shows that both antibodies detect the same Shh signals in media, that C9C5 is more sensitive, and that AF464 and C9C5 detect 5E1-IP'd dual-lipidated and monolipidated soluble Shh equally well. Also note that, in our hands, C9C5 is highly specific: this antibody detects N-truncated C25S;Δ26-35Shh of increased electrophoretic mobility, but does not produce nonspecific signals above or below, even if the blot is strongly overexposed (as shown here). Specific Shh detection by C9C5 is also discussed in our response to the editor's comments below.

      Cells were transfected with constructs encoding full-length C25SShh or truncated C25S;Δ26-35Shh, and proteins in serum-containing media were 5E1 immunoprecipitated or concentrated by heparin-sepharose pulldown. Dual-lipidated R&D 8908-SH was dissolved in the same medium and subjected to the same 5E1 immunoprecipitation or heparin pulldown. The blot was incubated with antibody AF464 and (after stripping) with antibody C9C5. Immunoblot analysis revealed high specificity of both antibodies and also revealed poor interactions of dual-lipidated 8908-SH with highly charged heparin.

      3) “A stably expressing Shh/Hhat cell line would reduce condition to condition and experiment to experiment variability”.

      We fully agree with this reviewer and therefore attempted to establish stable Hhat-expressing cell lines several years ago. However, stable Hhat expression eliminated transfected cells after several passages, or cells gradually ceased to express Hhat, which prevented us from establishing a stable line despite several attempts using different strategies. For this reason, we established transient co-expression of Shh/Hhat from the same mRNA to at least eliminate variability between relative Shh/Hhat expression levels and to ensure complete Shh palmitoylation in our assays.

      4) “Unusual normalization strategies are used for many experiments, and quantification/statistical analyses are missing for several experiments”.

      This comment refers to data shown in Fig. 3 (here, no quantification of Scube2 function in Disp-/- cells had been conducted) and to qPCR data shown in Fig. 4 (here, Shh and C25AShh were compared only indirectly via dual-lipidated R&D 8908-SH, but not directly in a side-by-side experiment, and Shh variants with an N-terminal alanine or a serine were directly compared). We agree with the reviewer and are therefore currently repeating the qPCR assays and quantifying the blots to eliminate these technical shortcomings from the final manuscript.

      5) “The study provides a modest advance in the understanding of the complex issue of Shh membrane extraction”

      Our investigation identified unexpected links between Disp as a furin-activated Hh exporter, sheddase-mediated Shh release, Scube2-mediated Shh release and lipoprotein-mediated Hh transport. These are established release modes, but no direct connections between them had previously been shown, and these links increase their relevance. We also identified a previously unknown N-processed Shh variant attached to lipoproteins and show that Disp/Scube2 function absolutely requires lipoproteins. Therefore, although we agree that our findings are confirmatory for the above modes, they also provide new mechanistic insight and challenge the currently dominant model of Disp-mediated hand-over of dual-lipidated Hh to Scube2 chaperones (this model does not predict a role for lipoprotein particles, but does predict a role for both Shh lipids in signaling; for a recent discussion, see PMID 36932157). Our findings suggest an answer to the intensely debated question of whether Disp/Ptch extract cholesterol from the outer or inner plasma membrane leaflet, and suggest that N-palmitate is dispensable for signaling of lipoprotein-associated Shh to Ptch receptors. Finally, we note that previous in vivo studies in flies often relied on Hh overexpression in the fat body, raising questions about their physiological relevance. Our in vivo analyses of Hh function in wing and eye discs are more physiologically relevant and can explain the previously reported presence of non-lipidated bioactive Hh in disc tissue (PMID: 23554573).

      Reviewer #2 (Public Review):

      1) “However, the results concerning the roles of lipoproteins and Shh lipid modifications are largely confirmatory of previous results, and molecular identity/physiological relevance of the newly identified Shh variant remain unclear”.

      Regarding the confirmatory aspects of our work, please also refer to our response to Reviewer 1. In addition, our unbiased experimental approach was designed to challenge the model of Shh shedding by testing whether established Shh release regulators affect it (i.e., support it) or not. As described in our work, we found that Disp, Scube2 and lipoproteins all contribute to increased shedding (which is new), that Disp function depends on the presence of lipoproteins (also new), and that lipoproteins modify the outcome of Shh shedding (dual Shh shedding versus N-shedding and lipoprotein association), which is also new.

      Regarding physiological relevance, our finding that artificially generated monolipidated variants (C25SShh and ShhN) solubilize in an uncontrolled manner from producing cells can explain the previously observed, highly variable gain-of-function or loss-of-function phenotypes upon their overexpression in vivo (1-5). Our data are also supported by the observed presence of variably lipidated Shh/Hh variants in vivo (6), and by the in vivo observation that complete removal of Scube activity in zebrafish embryos phenocopies a complete loss of Hh function that is bypassed by increased ligand expression, and even results in wild-type-like ectopic Shh target gene expression (7). These in vivo observations are compatible with our data but are incompatible with proposed alternative models of Scube-mediated dual-lipidated Shh extraction and continued Shh/Scube association to allow for morphogen transport.

      2) “Thus, it would be important to demonstrate key findings in cells that secrete Shh endogenously”.

      Experimental data shown in Fig. S8B demonstrates that en-controlled expression of sheddase-resistant Hh variants blocks endogenous Hh function in the same wing disc compartment. To our knowledge, this assay is the most physiologically relevant test of the mechanism of Disp-mediated Hh release. Still, we have now started to analyze Hh from Drosophila disc tissue biochemically and hope that we can include our findings in the final manuscript.

      3) “The authors could use an orthogonal approach, optimally a demonstration of physical interaction, or at least fractionation by a different parameter”.

      We agree with this reviewer's assessment and are currently in the process of establishing co-IP and density gradient conditions to test physical HDL/Shh interactions. The results will be included in the final version of record.

    1. Author Response

      eLife assessment

      This study presents potentially valuable results on glutamine-rich motifs in relation to protein expression and alternative genetic codes. The authors' interpretation of the results is so far only supported by incomplete evidence, due to a lack of acknowledgment of alternative explanations, missing controls and statistical analysis, and writing that is unclear to non-experts in the field. These shortcomings could be at least partially overcome by additional experiments, thorough rewriting, or both.

      We thank both the Reviewing Editor and Senior Editor for handling this manuscript and will submit our revised manuscript after the reviewed preprint is published by eLife.  

      Reviewer #1 (Public Review):

      Summary

      This work contains 3 sections. The first section describes how protein domains with SQ motifs can increase the abundance of a lacZ reporter in yeast. The authors call this phenomenon autonomous protein expression-enhancing activity, and this finding is well supported. The authors show evidence that this increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance, and that this phenomenon is not affected by mutants in translational quality control. It was not completely clear whether the increased protein abundance is due to increased translation or to increased protein stability.

      In section 2, the authors performed mutagenesis of three N-terminal domains to study how protein sequence changes protein stability and enzymatic activity of the fusions. These data are very interesting, but this section needs more interpretation. It is not clear if the effect is due to the number of S/T/Q/N amino acids or due to the number of phosphorylation sites.

      In section 3, the authors undertake an extensive computational analysis of amino acid runs in 27 species. Many aspects of this section are fascinating to an expert reader. They identify regions with poly-X tracts. These data were not normalized correctly: I think that a null expectation for how often poly-X tracts occur should be built for each species based on the underlying prevalence of amino acids in that species. As a result, I believe that the claim is not well supported by the data.

      Strengths

      This work is about an interesting topic and contains stimulating bioinformatics analysis. The first two sections, where the authors investigate how S/T/Q/N abundance modulates protein expression level, are well supported by the data. The bioinformatics analysis of Q abundance in ciliate proteomes is fascinating. There are some ciliates that have repurposed stop codons to code for Q. The authors find that in these proteomes, Q-runs are greatly expanded. They offer interesting speculations on how this expansion might impact protein function.

      Weakness

      At this time, the manuscript is disorganized and difficult to read. An expert in the field, who will not be distracted by the disorganization, will find some very interesting results included. In particular, the order of the introduction does not match the rest of the paper.

      In the first and second sections, where the authors investigate how S/T/Q/N abundance modulates protein expression levels, it is unclear if the effect is due to the number of phosphorylation sites or the number of S/T/Q/N residues.

      There are three reasons why the number of phosphorylation sites in the Q-rich motifs is not relevant to their autonomous protein expression-enhancing (PEE) activities:

      First, we have reported previously that phosphorylation-defective Rad51-NTD (Rad51-3SA) and wild-type Rad51-NTD exhibit similar autonomous PEE activity. Mec1/Tel1-dependent phosphorylation of Rad51-NTD antagonizes the proteasomal degradation pathway, increasing the half-life of Rad51 from ∼30 min to ≥180 min (Ref 27; Woo, T. T. et al. 2020).

      1. T. T. Woo, C. N. Chuang, M. Higashide, A. Shinohara, T. F. Wang, Dual roles of yeast Rad51 N-terminal domain in repairing DNA double-strand breaks. Nucleic Acids Res 48, 8474-8489 (2020).

      Second, in our preprint manuscript, we have also shown that phosphorylation-defective Rad53-SCD1 (Rad53-SCD1-5STA) exhibits autonomous PEE activity similar to that of wild-type Rad53-SCD1 (Figure 2D, Figure 4A and Figure 4C).

      Third, as revealed by the results of our preprint manuscript (Figure 4), it is the percentages, and not the numbers, of S/T/Q/N residues that are correlated with the PEE activities of Q-rich motifs.
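      The distinction between the number and the percentage of S/T/Q/N residues can be made concrete with a minimal Python sketch. The fragments below are hypothetical toy sequences chosen only to show that two regions can share the same S/T/Q/N count while differing in S/T/Q/N percentage; they are not the Rad51, Rad53 or Sup35 domains analyzed in Figure 4, and the helper name stqn_composition is illustrative.

```python
# Residues counted as S/T/Q/N in this sketch.
STQN = frozenset("STQN")


def stqn_composition(seq: str) -> tuple[int, float]:
    """Return (number of S/T/Q/N residues, their percentage of sequence length)."""
    seq = seq.upper()
    count = sum(1 for aa in seq if aa in STQN)
    percent = 100.0 * count / len(seq) if seq else 0.0
    return count, percent


# Hypothetical fragments: same S/T/Q/N count, different lengths.
compact = "SQTQNQSQ"            # 8 residues, all S/T/Q/N
diluted = compact + "G" * 24    # the same 8 S/T/Q/N residues spread over 32 residues

for label, fragment in [("compact", compact), ("diluted", diluted)]:
    count, percent = stqn_composition(fragment)
    print(f"{label}: {count} S/T/Q/N residues, {percent:.1f}%")
```

      Both fragments carry the same count, so only the percentage separates them; this is the quantity the third point correlates with PEE activity.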

      The authors also do not discuss if the N-end rule for protein stability applies to the lacZ reporter or the fusion proteins.

      The autonomous PEE function of S/T/Q-rich NTDs is unlikely to be relevant to the N-end rule. The N-end rule links the in vivo half-life of a protein to the identity of its N-terminal residue. In S. cerevisiae, the N-end rule operates as part of the ubiquitin system and comprises two pathways. First, the Arg/N-end rule pathway, involving a single N-terminal amidohydrolase, Nta1, mediates deamidation of N-terminal asparagine (N) and glutamine (Q) into aspartate (D) and glutamate (E), which in turn are arginylated by a single R-transferase, Ate1, generating the Arg/N degron. N-terminal R and other primary degrons are recognized by a single N-recognin, Ubr1, in concert with the ubiquitin-conjugating enzyme Ubc2/Rad6. Ubr1 can also recognize several other N-terminal residues, including lysine (K), histidine (H), phenylalanine (F), tryptophan (W), leucine (L) and isoleucine (I) (Bachmair, A. et al. 1986; Tasaki, T. et al. 2012; Varshavsky, A. 2019). Second, the Ac/N-end rule pathway targets proteins containing N-terminally acetylated (Ac) residues. Prior to acetylation, the initial methionine (M) is catalytically removed by Met-aminopeptidases (MetAPs), unless the residue at position 2 is non-permissive (too large) for MetAPs. If a retained N-terminal M, or otherwise a valine (V), cysteine (C), alanine (A), serine (S) or threonine (T) residue, is followed by residues that allow N-terminal acetylation, the proteins containing these Ac/N degrons are targeted for ubiquitylation and proteasome-mediated degradation by the Doa10 E3 ligase (Hwang, C. S. et al. 2010).

      A. Bachmair, D. Finley, A. Varshavsky, In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186 (1986).

      T. Tasaki, S. M. Sriram, K. S. Park, Y. T. Kwon, The N-end rule pathway. Annu Rev Biochem 81, 261-289 (2012).

      A. Varshavsky, N-degron and C-degron pathways of protein degradation. Proc Natl Acad Sci 116, 358-366 (2019).

      C. S. Hwang, A. Shemorry, D. Auerbach, A. Varshavsky, The N-end rule pathway is mediated by a complex of the RING-type Ubr1 and HECT-type Ufd4 ubiquitin ligases. Nat Cell Biol 12, 1177-1185 (2010).

      The PEE activities of these S/T/Q-rich domains are unlikely to arise from counteracting the N-end rule for two reasons. First, the first two amino acid residues of Rad51-NTD, Hop1-SCD, Rad53-SCD1, Sup35-PND, Rad51-ΔN, and LacZ-NVH are MS, ME, ME, MS, ME, and MI, respectively, where M is methionine, S is serine, E is glutamic acid and I is isoleucine. Second, Sml1-NTD behaves similarly to these N-terminal fusion tags, despite its methionine and glutamine (MQ) amino acid signature at the N-terminus.
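      The residue classes described above can be summarized as a deliberately simplified Python classifier. It is a sketch, not a predictor of protein half-life: the residue sets are taken from the N-end rule description in this response, while the MetAP rule (initiator Met is removed only when residue 2 is G, A, S, C, T, P or V) is an added assumption based on the standard textbook rule. Whether acetylation actually occurs is not modeled, and the function names are hypothetical.

```python
# Residue classes taken from the N-end rule description above.
ARG_N_PRIMARY = set("RKHFWLI")    # recognized directly by Ubr1
ARG_N_SECONDARY = set("DE")       # arginylated by Ate1
ARG_N_TERTIARY = set("NQ")        # deamidated by Nta1, then arginylated
AC_N_CANDIDATES = set("MVCAST")   # potential Ac/N degrons if N-terminally acetylated

# Assumption: standard MetAP rule; Met is removed only when residue 2 is small.
METAP_PERMISSIVE = set("GASCTPV")


def exposed_n_terminus(seq: str) -> str:
    """Return the residue expected to be N-terminal after MetAP processing."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    if len(seq) > 1 and seq[0] == "M" and seq[1] in METAP_PERMISSIVE:
        return seq[1]
    return seq[0]


def n_end_class(seq: str) -> tuple[str, str]:
    """Assign the exposed N-terminal residue to one of the classes named above."""
    residue = exposed_n_terminus(seq)
    if residue in ARG_N_PRIMARY:
        return residue, "Arg/N primary"
    if residue in ARG_N_SECONDARY:
        return residue, "Arg/N secondary"
    if residue in ARG_N_TERTIARY:
        return residue, "Arg/N tertiary"
    if residue in AC_N_CANDIDATES:
        return residue, "Ac/N candidate"
    return residue, "not in listed classes"


# First two residues of some of the N-terminal fusion tags listed above.
for name, start in [("Rad51-NTD", "MS"), ("Hop1-SCD", "ME"),
                    ("LacZ-NVH", "MI"), ("Sml1-NTD", "MQ")]:
    print(name, n_end_class(start))
```

      Under these simplified rules, all four start motifs fall into the same Ac/N-candidate class, because MS exposes S after Met removal while ME, MI and MQ retain Met.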

      The most interesting part of the paper is an exploration of S/T/Q/N-rich regions and other repetitive AA runs in 27 proteomes, particularly ciliates. However, this analysis is missing a critical control that makes it nearly impossible to evaluate the importance of the findings. The authors find the abundance of different amino acid runs in various proteomes. They also report the background abundance of each amino acid. They do not use this background abundance to normalize the runs of amino acids to create a null expectation from each proteome. For example, it has been clear for some time (Ruff, 2017; Ruff et al., 2016) that Drosophila contains a very high background of Q's in the proteome and it is necessary to control for this background abundance when finding runs of Q's.
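      One way to build the per-proteome null expectation requested here is sketched below in Python. It assumes a composition-only null model in which every position is drawn independently with the residue's overall frequency in that proteome; this is an illustration of the reviewer's suggestion, not the analysis the authors performed, and the function name is hypothetical.

```python
def pure_run_enrichment(proteome: list[str], residue: str, k: int) -> tuple[int, float]:
    """Compare observed and expected counts of length-k windows made only of `residue`.

    Null model (assumption): every position is drawn independently with the
    residue's overall frequency p in this proteome, so a given window is pure
    with probability p**k. Overlapping windows are counted, so a run of
    length L contributes L - k + 1 windows.
    """
    residue = residue.upper()
    seqs = [s.upper() for s in proteome]
    total = sum(len(s) for s in seqs)
    if total == 0:
        raise ValueError("empty proteome")
    p = sum(s.count(residue) for s in seqs) / total  # background frequency of residue

    target = residue * k
    observed = 0
    expected = 0.0
    for s in seqs:
        n_windows = max(len(s) - k + 1, 0)
        expected += n_windows * p ** k
        observed += sum(1 for i in range(n_windows) if s[i:i + k] == target)
    return observed, expected


# Hypothetical two-protein "proteome"; real use would loop over each species' proteome.
obs, exp = pure_run_enrichment(["QQQQAAAA", "AAAAAAAA"], "Q", 4)
print(f"observed={obs}, expected={exp:.4f}, observed/expected={obs / exp:.1f}")
```

      Because the expectation is scaled by each species' own Q frequency, a Q-rich proteome such as that of Drosophila would need proportionally more runs to score as enriched.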

      We apologize for not adequately explaining, in our preprint manuscript, the issue underlying this reviewer's concern. In the second paragraph of page 14, we cite six references to highlight that SCDs are overrepresented in yeast and human proteins involved in several biological processes (32, 74), and that polyX prevalence differs among species (43, 75-77).

      1. Cheung HC, San Lucas FA, Hicks S, Chang K, Bertuch AA, Ribes-Zamora A. An S/T-Q cluster domain census unveils new putative targets under Tel1/Mec1 control. BMC Genomics. 2012;13:664.

      2. Mier P, Elena-Real C, Urbanek A, Bernado P, Andrade-Navarro MA. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J. 2020;18:306-13.

      3. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      4. Kuspa A, Loomis WF. The genome of Dictyostelium discoideum. Methods Mol Biol. 2006;346:15-30.

      5. Davies HM, Nofal SD, McLaughlin EJ, Osborne AR. Repetitive sequences in malaria parasite proteins. FEMS Microbiol Rev. 2017;41(6):923-40.

      6. Mier P, Alanis-Lobato G, Andrade-Navarro MA. Context characterization of amino acid homorepeats using evolution, position, and order. Proteins. 2017;85(4):709-19.

      We will cite the two references by Kiersten M. Ruff in our revised manuscript.

      K. M. Ruff and R. V. Pappu (2015) Multiscale simulation provides mechanistic insights into the effects of sequence contexts of early-stage polyglutamine-mediated aggregation. Biophysical Journal 108, 495a.

      K. M. Ruff, J. B. Warner, A. Posey and P. S. Tan (2017) Polyglutamine length dependent structural properties and phase behavior of huntingtin exon1. Biophysical Journal 112, 511a.

      The authors could easily address this problem with the data and analysis they have already collected. However, at this time, without this normalization, I am hesitant to trust the lists of proteins with long runs of amino acid and the ensuing GO enrichment analysis.

      Ruff KM. 2017. Washington University in St.

      Ruff KM, Holehouse AS, Richardson MGO, Pappu RV. 2016. Proteomic and Biophysical Analysis of Polar Tracts. Biophys J 110:556a.

      We thank Reviewer #1 for this helpful suggestion and now address this issue by means of a different approach described below.

      Based on a previous study (43; Mier et al. 2020), we applied seven different thresholds to identify both short and long, as well as pure and impure, polyX strings in 20 different representative near-complete proteomes: 4X (4/4), 5X (4/5-5/5), 6X (4/6-6/6), 7X (4/7-7/7), 8-10X (≥50% X), 11-20X (≥50% X) and ≥21X (≥50% X).

      To normalize the runs of amino acids and create a null expectation for each proteome, we calculated, for each species, the ratio of the overall number of X residues in each of the seven types of polyX motif to the number of X residues in the entire proteome. The results for four different polyX motifs are shown below: polyQ (Author response image 1), polyN (Author response image 2), polyS (Author response image 3) and polyT (Author response image 4).
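      The thresholding and normalization steps above can be illustrated with a simplified Python sketch. It reads the normalization as "X residues inside qualifying polyX windows divided by all X residues in the proteome", and it uses a plain sliding window with a minimum X count; both are assumptions, and the published definitions (43; Mier et al. 2020) may handle overlapping and long impure regions differently. Only the four short thresholds are encoded; the helper names and the toy proteome are hypothetical.

```python
# (window length, minimum number of X) for the four short thresholds quoted above;
# the longer categories would use windows of 8-10, 11-20 or >=21 residues with >=50% X.
SHORT_THRESHOLDS = {"4X": (4, 4), "5X": (5, 4), "6X": (6, 4), "7X": (7, 4)}


def polyx_content(proteome: list[str], residue: str,
                  window: int, min_count: int) -> tuple[float, float]:
    """Return (X usage frequency, fraction of all X residues lying inside polyX windows).

    A window qualifies if it contains at least `min_count` copies of `residue`;
    every X inside a qualifying window is counted once, even if windows overlap.
    """
    residue = residue.upper()
    total_residues = 0
    total_x = 0
    x_in_motifs = 0
    for seq in proteome:
        seq = seq.upper()
        total_residues += len(seq)
        total_x += seq.count(residue)
        covered = [False] * len(seq)
        for start in range(len(seq) - window + 1):
            segment = seq[start:start + window]
            if segment.count(residue) >= min_count:
                for offset, aa in enumerate(segment):
                    if aa == residue:
                        covered[start + offset] = True
        x_in_motifs += sum(covered)
    usage = total_x / total_residues if total_residues else 0.0
    fraction = x_in_motifs / total_x if total_x else 0.0
    return usage, fraction


# Hypothetical toy proteome; real use would load each species' near-complete proteome.
toy = ["MQQAQQSTA", "MAQAAAQNS"]
for label, (w, m) in SHORT_THRESHOLDS.items():
    usage, fraction = polyx_content(toy, "Q", w, m)
    print(f"{label}: Q usage={usage:.3f}, Q in polyQ={fraction:.2f}")
```

      Reporting the usage frequency next to the in-motif fraction makes it possible to see, species by species, whether more polyX content simply tracks a higher overall X frequency.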

      Author response image 1.

      Q contents in 7 different types of polyQ motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 2.

      N contents in 7 different types of polyN motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 3.

      S contents in 7 different types of polyS motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 4.

      T contents in 7 different types of polyT motifs in 20 near-complete proteomes. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      The results summarized in these four new figures support the conclusion that polyX prevalence differs among species and that the overall X content of polyX motifs often, but not always, correlates with the X usage frequency in entire proteomes (43; Mier et al. 2020).

      Most importantly, our results reveal that, compared to Stentor coeruleus or several non-ciliate eukaryotic organisms (e.g., Plasmodium falciparum, Caenorhabditis elegans, Danio rerio, Mus musculus and Homo sapiens), the five ciliates with reassigned TAAQ and TAGQ codons not only have higher Q usage frequencies, but also more polyQ motifs in their proteomes (Author response image 1). In contrast, polyQ motifs are also prevalent in Candida albicans, Candida tropicalis, Dictyostelium discoideum, Chlamydomonas reinhardtii, Drosophila melanogaster and Aedes aegypti, even though the Q usage frequencies in their entire proteomes are not significantly higher than those of other eukaryotes (Author response image 1). Due to their higher N usage frequencies, Dictyostelium discoideum, Plasmodium falciparum and Pseudocohnilembus persalinus have more polyN motifs than the other 23 eukaryotes we examined here (Author response image 2). Generally speaking, all 26 eukaryotes we assessed have similar S usage frequencies and percentages of S contents in polyS motifs (Author response image 3). Among these 26 eukaryotes, Dictyostelium discoideum possesses many more polyT motifs, even though its T usage frequency is similar to that of the other 25 eukaryotes (Author response image 4).

      In conclusion, these new normalized results confirm that the reassignment of stop codons to Q indeed results in both higher Q usage frequencies and more polyQ motifs in ciliates.  

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to understand the connection between protein sequence and function in disordered regions enriched in polar amino acids (specifically Q, N, S and T). While the authors suggest that specific motifs facilitate protein-enhancing activities, their findings are correlative, and the evidence is incomplete. Similarly, the authors propose that the re-assignment of stop codons to glutamine-encoding codons underlies the greater use of glutamine in a subset of ciliates, but again, the conclusions here are, at best, correlative. The authors perform extensive bioinformatic analysis, with detailed (albeit somewhat ad hoc) discussion of a number of proteins. Overall, the results presented here are interesting, but are unable to exclude competing hypotheses.

      Strengths:

      Following up on previous work, the authors wish to uncover a mechanism associated with poly-Q and SCD motifs explaining proposed protein expression-enhancing activities. They note that these motifs often occur in IDRs and hypothesize that structural plasticity could be capitalized upon as a mechanism of diversification in evolution. To investigate this further, they employ bioinformatics to investigate the sequence features of the proteomes of 27 eukaryotes. They deepen their exploration of sequence space, uncovering sub-phylum-specific features associated with species in which a stop-codon substitution has occurred. The authors propose that this stop-codon substitution underlies an expansion of poly-Q repeats and increased glutamine distribution.

      Weaknesses:

      The preprint provides extensive, detailed, and entirely unnecessary background information throughout, hampering reading and making it difficult to understand the ideas being proposed. The introduction provides a large amount of detailed background that appears entirely irrelevant to the paper. In many places, detailed discussions of specific proteins that are likely of interest to the authors occur without context, which does not enhance the paper for the reader.

      The paper uses many unnecessary, new, or redefined acronyms which makes reading difficult. As examples:

      (1) Prion forming domains (PFDs). Do the authors mean prion-like domains (PLDs), an established term with an empirical definition from the PLAAC algorithm? If yes, they should say this. If not, they must define what a prion-forming domain is formally.

      The N-terminal domain (1-123 amino acids) of S. cerevisiae Sup35 was already referred to as a “prion forming domain (PFD)” in 2006 (Tuite, M. F. 2006). Since then, PFD has also been employed as an acronym in other yeast prion papers (Cox, B.S. et al. 2007; Toombs, T. et al. 2011).

      M. F. Tuite, Yeast prions and their prion forming domain. Cell 27, 397-407 (2005).

      B. S. Cox, L. Byrne, M. F. Tuite, Protein Stability. Prion 1, 170-178 (2007).

      J. A. Toombs, N. M. Liss, K. R. Cobble, Z. Ben-Musa, E. D. Ross, [PSI+] maintenance is dependent on the composition, not primary sequence, of the oligopeptide repeat domain. PLoS One 6, e21953 (2011).

      (2) SCD is already an acronym in the IDP field (meaning sequence charge decoration) - the authors should avoid this as their chosen acronym for Serine(S) / threonine (T)-glutamine (Q) cluster domains. Moreover, do we really need another acronym here (we do not).

      SCD was first used in 2005 as an acronym for the Serine (S)/threonine (T)-glutamine (Q) cluster domain in the DNA damage checkpoint field (Traven, A. and Heierhorst, J. 2005). Almost a decade later, SCD became an acronym for “sequence charge decoration” (Sawle, L. et al. 2015; Firman, T. et al. 2018).

      A. Traven and J. Heierhorst, SQ/TQ cluster domains: concentrated ATM/ATR kinase phosphorylation site regions in DNA-damage-response proteins. Bioessays 27, 397-407 (2005).

      L. Sawle and K. Ghosh, A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J. Chem. Phys. 143, 085101 (2015).

      T. Firman and K. Ghosh, Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins. J. Chem. Phys. 148, 123305 (2018).

      (3) Protein expression-enhancing (PEE) - just say expression-enhancing, there is no need for an acronym here.

      Thank you. Since we have shown that addition of Q-rich motifs to LacZ affects protein expression rather than transcription, we think it is better to use the “PEE” acronym.

      The results suggest autonomous protein expression-enhancing activities of regions of multiple proteins containing Q-rich and SCD motifs. Their definition of expression-enhancing activities is vague, and the evidence they provide to support the claim is weak. While their previous work may support their claim with more evidence, it should be explained in more detail. The assay they chose is a fusion reporter measuring beta-galactosidase activity and tracking expression levels. With the data presented, they have shown that they can drive the expression of their reporters and that beta-gal remains active, in addition to the increase in expression of the fusion reporter during the stress response. They have not detailed what their control and mock treatments are, which makes complete understanding of their experimental approach difficult. Furthermore, the nuclear localization signal on the tag could be influencing the degradation kinetics or sequestering the reporter, leading to its accumulation and the appearance of enhanced expression. Their evidence refuting ubiquitin-mediated degradation does not have a convincing control.

      Based on the experimental results, the authors then go on to perform bioinformatic analysis of SCD proteins and polyX proteins. Unfortunately, there is no clear hypothesis for what is being tested; there is a vague sense of investigating polyX/SCD regions, but I did not find the connection between the first and second sections compelling (especially given that polar-rich regions have been shown to engage in many different functions). As such, this bioinformatic analysis largely presents as many lists of percentages without any meaningful interpretation. The bioinformatics analysis lacks any kind of rigorous statistical tests, making it difficult to evaluate the conclusions drawn. The methods section is severely lacking. Specifically, many of the methods require the reader to read many other papers. While referencing prior work is, of course, important, the authors should ensure the methods in this paper provide the details needed to allow a reader to evaluate the work being presented. As it stands, this is not the case.

      Thank you. As described in detail below, we have now performed rigorous statistical testing using the GOfuncR package.

      Overall, my major concern with this work relates to the two central claims the authors make in this paper (as per the Discussion). The first claim is that Q-rich motifs enhance protein expression. The implication here is that Q-rich motif IDRs are special, but this is not tested. As such, the authors cannot exclude the competing hypothesis ("N-terminal disordered regions enhance expression").

      In fact, “N-terminal disordered regions enhance expression” exactly summarizes our hypothesis.

      On pages 12-13 and Figure 4 of our preprint manuscript, we explained our hypothesis in the paragraph entitled “The relationship between PEE function, amino acid contents, and structural flexibility”.

      The authors also do not explore the possibility that this effect is partly or entirely driven by mRNA-level effects (see Verma et al., Nat Commun 2019).

      As pointed out by the first reviewer, we show evidence that the increase in protein abundance and enzymatic activity is not due to changes in plasmid copy number or mRNA abundance (Figure 2), and that this phenomenon is not affected by translational quality control mutants (Figure 3).

      As such, while these observations are interesting, they feel preliminary and, in my opinion, cannot be used to draw firm conclusions on how N-terminal IDR sequence features influence protein expression. This does not mean the authors are necessarily wrong, but I do not believe strong conclusions can be drawn from the data presented here.

      The second claim is that re-assignment of stop codons to Q increases proteome-wide Q usage. I was unable to understand what result led the authors to this conclusion.

      My reading of the results is that a subset of ciliates has re-assigned UAA and UAG from stop codons to Q. Those ciliates have more polyQ-containing proteins. However, they also have more polyN-containing proteins and more proteins enriched in S/T-Q clusters. Surely, if this were a stop-codon-dependent effect, we would see an enhancement ONLY in Q-richness, not a corresponding enhancement in all polar-rich IDR frequencies? It seems the better working hypothesis is that free-floating ciliate proteomes are enriched in polar amino acids compared with those of sessile ciliates.

      Thank you. These comments are not supported by the results in Figure 1.

      Regardless, the absence of any kind of statistical analysis makes it hard to draw strong conclusions here.

      We apologize for not explaining more clearly the results of Tables 5-7 in our preprint manuscript.

      To address both reviewers' concerns about our GO enrichment analysis, we have now performed rigorous statistical testing for SCD and polyQ protein overrepresentation using the GOfuncR package (https://bioconductor.org/packages/release/bioc/html/GOfuncR.html). GOfuncR is an R package that conducts standard candidate-versus-background enrichment analysis by means of the hypergeometric test. We then adjusted the raw p-values to control the family-wise error rate (FWER). The same method had been applied to GO enrichment analysis of the human genome (Huttenhower, C., et al. 2009).

      Huttenhower, C., Haley, E. M., Hibbs, M. A., Dumeaux, V., Barrett, D. R., Coller, H. A., and Troyanskaya, O. G. Exploring the human genome with functional maps. Genome Research 19, 1093-1106 (2009).
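      The candidate-versus-background hypergeometric test mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not GOfuncR's implementation: it computes the one-sided probability of observing at least k annotated genes among n candidate genes drawn from a background of N genes, K of which carry a given GO term. It uses a simple Bonferroni correction as a stand-in for FWER control; as we understand it, GOfuncR instead derives the FWER from permuted random gene sets. The function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes; K: background genes annotated to the GO term;
    n: candidate genes (e.g. SCD- or polyQ-containing proteins);
    k: candidate genes annotated to the GO term.
    """
    # Sum the probability mass from the observed overlap k up to the
    # largest overlap that is possible given K and n.
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni adjustment: a simple, conservative way to control the FWER."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical toy input: both candidates fall in a 3-gene GO term
# within a 10-gene background.
raw = [hypergeom_upper_tail(k=2, n=2, K=3, N=10)]
adjusted = bonferroni(raw)
```

      Bonferroni is chosen here only because it is short and self-contained; a permutation-based FWER, as used by GOfuncR, is typically less conservative when GO terms are correlated.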

      The results presented in Author response image 5 and Author response image 6 support our hypothesis that Q-rich motifs prevail in proteins involved in specialized biological processes, including Saccharomyces cerevisiae RNA-mediated transposition, Candida albicans filamentous growth, peptidyl-glutamic acid modification in ciliates with reassigned stop codons (TAAQ and TAGQ), Tetrahymena thermophila xylan catabolism, Dictyostelium discoideum sexual reproduction, Plasmodium falciparum infection, as well as the nervous systems of Drosophila melanogaster, Mus musculus, and Homo sapiens (74). In contrast, peptidyl-glutamic acid modification and microtubule-based movement are not overrepresented with Q-rich proteins in Stentor coeruleus, a ciliate with standard stop codons.

      1. Cara L, Baitemirova M, Follis J, Larios-Sanz M, Ribes-Zamora A. The ATM- and ATR-related SCD domain is over-represented in proteins involved in nervous system development. Sci Rep. 2016;6:19050.

      Author response image 5.

      Selection of biological processes with overrepresented SCD-containing proteins in different eukaryotes. The percentages and numbers of SCD-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

      Author response image 6.

      Selection of biological processes with overrepresented polyQ-containing proteins in different eukaryotes. The percentages and numbers of polyQ-containing proteins in our search that belong to each indicated Gene Ontology (GO) group are shown. GOfuncR (Huttenhower, C., et al. 2009) was applied for GO enrichment and statistical analysis. The p values adjusted according to the family-wise error rate (FWER) are shown. The five ciliates with reassigned stop codons (TAAQ and TAGQ) are indicated in red. Stentor coeruleus, a ciliate with standard stop codons, is indicated in green.

    1. Author Response

      Reviewer #1 (Public Review):

      Wang et al. present an interesting body of work focused on the effects of high altitude and hypoxia on erythropoiesis, resulting in erythrocytosis. This work focuses specifically on the spleen, identifying splenic macrophages as central cells in this effect. This is logical, since these cells are involved in erythrophagocytosis and iron recycling. The results suggest that hypoxia induces splenomegaly with a decreased number of splenic macrophages. There is also evidence that ferroptosis is induced in these macrophages, leading to cell destruction. Finally, the data suggest that ferroptosis in splenic red pulp macrophages causes the decrease in RBC clearance, resulting in erythrocytosis, i.e., a lengthening of the RBC lifespan. However, there are many issues with the presented results, and the somewhat superficial data mean that the conclusions are overstated and there is decreased confidence that the hypotheses and observed results are directly causally related to hypoxia.

      Major points:

      1) The spleen is a relatively poorly understood organ, but what is known about its role in erythropoiesis, especially in mice, is that it functions both to clear and to generate RBCs. The latter process is termed extramedullary hematopoiesis and can occur in other bones beyond the pelvis, liver, and spleen. In mice, the spleen is the main organ of extramedullary erythropoiesis. The finding of transiently decreased spleen size prior to splenomegaly under hypoxic conditions is interesting but not well developed in the manuscript. This is a shortcoming, as this is an opportunity to evaluate the immediate effect of hypoxia separately from its more chronic effect. Based on spleen size alone, no conclusions can be drawn about what happens in the spleen in response to hypoxia.

      Thank you for your insightful comments and questions. The spleen is instrumental in both immune response and the clearance of erythrocytes, as well as serving as a significant reservoir of blood in the body. This organ, characterized by its high perfusion rate and pliability, constricts under conditions of intense stress, such as during peak physical exertion, the diving reflex, or protracted periods of apnea. This contraction can trigger an immediate release of red blood cells (RBCs) into the bloodstream in instances of substantial blood loss or significant reduction of RBCs. Moreover, elevated oxygen consumption rates in certain animal species can be partially attributed to splenic contractions, which augment hematocrit levels and the overall volume of circulating blood, thereby enhancing venous return and oxygen delivery (Dane et al. J Appl Physiol, 2006, 101:289-97; Longhurst et al. Am J Physiol, 1986, 251: H502-9). In our investigation, we noted a significant contraction of the spleen following exposure to hypoxia for a period of one day. We hypothesized that the body, under such conditions, is incapable of generating sufficient RBCs promptly enough to facilitate enhanced oxygen delivery. Consequently, the spleen reacts by releasing its stored RBCs through splenic constriction, leading to a measurable reduction in spleen size.

      However, we agree with you that further investigation is required to fully understand the implications of these changes. Considering the comments, we propose to extend our research by incorporating more detailed examinations of spleen morphology and function during hypoxia, including the potential impact on extramedullary hematopoiesis. We anticipate that such an expanded analysis would not only help elucidate the initial response to hypoxia but also provide insights into the more chronic effects of this condition on spleen function and erythropoiesis.

      2) Monocyte repopulation of tissue resident macrophages is a minor component of the process being described and it is surprising that monocytes in the bone marrow and spleen are also decreased. Can the authors conjecture why this is happening? Typically, the expectation would be that a decrease in tissue resident macrophages would be accompanied by an increase in monocyte migration into the organ in a compensatory manner.

      We appreciate your insightful query regarding the observed decrease in monocytes in the bone marrow and spleen, particularly considering the typical compensatory increase in monocyte migration into organs following a decrease in tissue resident macrophages.

      The observed decrease in monocytes within the bone marrow is likely attributable to the fact that monocytes and precursor cells for red blood cells (RBCs) both originate from the same hematopoietic stem cells within the bone marrow. It is well established that exposure to hypobaric hypoxia (HH) induces erythroid differentiation specifically within the bone marrow, originating from these hematopoietic stem cells. As such, we postulate that the differentiation into monocytes is reduced under hypoxic conditions, which may subsequently cause a decrease in migration to the spleen.

      Furthermore, we hypothesize that an increased migration of monocytes to other tissues under HH exposure may also contribute to the decreased migration to the spleen. The liver, which partially contributes to the clearance of RBCs, may play a role in this process. Our investigations to date have indeed identified an increased monocyte migration to the liver. We were pleased to discover an elevation in CSF1 expression in the liver following HH exposure for both 7 and 14 days. This finding was corroborated through flow cytometry, which confirmed an increase in monocyte migration to the liver.

      Consequently, we propose that under HH conditions, the liver requires an increased influx of monocytes, which in turn leads to a decrease in monocyte migration to the spleen. However, it is important to note that these findings will be discussed more comprehensively in our forthcoming publication, and as such, the data pertaining to these results have not been included in the current manuscript.

      3) Figure 3 does not definitively provide evidence that cell death is specifically occurring in splenic macrophages and the fraction of Cd11b+ cells is not changed in NN vs HH. Furthermore, the IHC of F4/80 in Fig 3U is not definitive as cells can express F4/80 more or less brightly and no negative/positive controls are shown for this panel.

      We appreciate your insightful comments and critiques regarding Figure 3. We acknowledge that the figure, as presented, does not definitively demonstrate that cell death is specifically occurring in splenic macrophages. While it is challenging to determine definitively from Figure 3D-F alone that cell death occurs in macrophages, our single-cell analysis provides strong evidence that it does. We initially observed cell death within the spleen under hypobaric hypoxia (HH) conditions, and to discern the precise cell type involved, we conducted single-cell analyses. Regrettably, we did not articulate this clearly in our preliminary manuscript. In the revised version, we have changed the order of Figure 3A-C and Figure 3D-F for better clarity. In addition, we observed a significant decrease in the fraction of F4/80hiCD11bhi macrophages under HH conditions compared to NN. To make the changes in CD86 and CD206 more evident, we have transformed these scatter plots into histograms in our revised manuscript.

      Considering the limitations of F4/80 as a conclusive macrophage identifier, we also present immunohistochemical (IHC) analyses of heme oxygenase-1 (HO-1). As a macrophage marker, particularly for cells involved in iron metabolism, HO-1 adds confidence to macrophage identification. Both F4/80 and HO-1 staining showed that positively stained cells were localized primarily within the splenic red pulp. Following exposure to hypobaric hypoxia (HH), the expression of both F4/80 and HO-1 decreased. This decrease implies that HH reduces the macrophage population and impairs iron metabolism. In the revised manuscript, we have improved the clarity of Figure 3U to show positive staining, with an emphasis on HO-1 staining, which is predominantly observed in the red pulp.

      4) The phagocytic function of splenic red pulp macrophages relative to infection cannot be used directly to understand erythrophagocytosis. The standard approach is to use opsonized RBCs in vitro. Furthermore, RBC survival is a standard method to assess erythrophagocytosis function. In this method, biotin is injected directly via the tail vein, and small blood samples are collected to measure the clearance of biotinylated RBCs by flow cytometry; kits are available to accomplish this. Because the method is standard, Fig 4D is not necessary, and Fig 4E needs to be performed only in blood, by sampling mice repeatedly and comparing the rate of biotin decline in HH with NN (not comparing 7 d with 14 d).

      We appreciate your insightful comments and suggestions. We concur that the phagocytic function of splenic red pulp macrophages in the context of infection may not be directly translatable to understanding erythrophagocytosis. Given our assessment that the use of cy5.5-labeled E.coli alone may not be sufficient to accurately evaluate the phagocytic function of macrophages, we extended our study to include the use of NHS-biotin-labeled RBCs to assess phagocytic capabilities. While the presence of biotin-labeled RBCs in the blood could provide an indication of RBC clearance, this measure does not exclusively reflect the spleen's role in the process, as it fails to account for the clearance activities of other organs.

      Consequently, we propose that the remaining biotin-labeled RBCs in the spleen may provide a more direct representation of the organ's function in RBC clearance and sequestration. Our observations of diminished erythrophagocytosis at both 7 and 14 days following exposure to HH guided our subsequent efforts to quantify biotin-labeled RBCs in both the circulatory system and spleen. These measurements were conducted during the 7 to 14-day span following the confirmation of impaired erythrophagocytosis. Comparative evaluation of RBC clearance rates under NN and HH conditions provided further evidence supporting our preliminary observations, with the data revealing a decrease in the RBC clearance rate in the context of HH conditions. In response to feedback from other reviewers, we have elected to exclude the phagocytic results and the diagram of the erythrocyte labeling assay. These amendments will be incorporated into the revised manuscript. The reviewers' constructive feedback has played a crucial role in refining the methodological precision and coherence of our investigation.
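      The comparison of biotin decline rates between HH and NN mice suggested by the reviewer can be illustrated with a short Python sketch. It assumes, for illustration only, that the percentage of biotinylated RBCs in serial blood samples from one mouse declines approximately linearly over the sampling window, so that the clearance rate can be summarized as the least-squares slope in percentage points per day. The function name and the inputs are hypothetical and are not data from this study.

```python
def clearance_slope(days: list[float], pct_labeled: list[float]) -> float:
    """Ordinary least-squares slope of % biotinylated RBCs versus time (days).

    Assumes an approximately linear decline over the sampling window;
    a more negative slope means faster removal of labeled RBCs from blood.
    """
    if len(days) != len(pct_labeled) or len(days) < 2:
        raise ValueError("need at least two paired time points")
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(pct_labeled) / n
    # Sum of squared deviations of time, and cross-deviations of time and signal.
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, pct_labeled))
    return sxy / sxx
```

      Computing one slope per mouse and then comparing the NN and HH groups with a standard two-group test would implement the repeated-sampling design described by the reviewer; if the decline were clearly non-linear, a different model of the time course would be needed.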

      5) It is unclear whether Tuftsin has a specific effect on phagocytosis of RBCs without other potential confounding effects. Furthermore, quantifying iron in red pulp splenic macrophages requires alternative readily available more quantitative methods (e.g. sorted red pulp macrophages non-heme iron concentration).

      We appreciate your comments and questions regarding the potential effect of Tuftsin on the phagocytosis of RBCs and the quantification of iron in red pulp splenic macrophages. Regarding the role of Tuftsin, we concur that the literature directly associating Tuftsin with erythrophagocytosis is scant. The work of Gino Roberto Corazza et al. does suggest a link between Tuftsin and general phagocytic capacity, but it does not specifically address erythrophagocytosis (Am J Gastroenterol, 1999;94:391-397). We agree that further investigations are required to elucidate the potential confounding effects and to ascertain whether Tuftsin has a specific impact on the phagocytosis of RBCs. Concerning the quantification of iron in red pulp splenic macrophages, we acknowledge your suggestion to employ readily available and more quantitative methods. We have incorporated additional Fe2+ staining in the spleen at two time points: 7 and 14 days subsequent to HH exposure (refer to the following Figure). The resultant data reveal an escalated deposition of Fe2+ within the red pulp, as evidenced in Figures 5 (panels L and M) and Figure 7 (panels L and M).

      6) In Fig 5, PBMCs are not thought to represent splenic macrophages and although of some interest, does not contribute significantly to the conclusions regarding splenic macrophages at the heart of the current work. The data is also in the wrong direction, namely providing evidence that PBMCs are relatively iron poor which is not consistent with ferroptosis which would increase cellular iron.

      We appreciate your insightful critique regarding Figure 5 and the interpretation of our data on peripheral blood mononuclear cells (PBMCs) in relation to splenic macrophages. We understand that PBMCs do not directly represent splenic macrophages, and we agree that any conclusions drawn from PBMCs must be considered with caution when discussing the behavior of splenic macrophages.

      The primary rationale for incorporating PBMCs into our study was to investigate the potential correspondence between their gene expression changes and those observed in the spleen after HH exposure. This was posited as a working hypothesis for further exploration rather than a conclusive statement. The gene expression changes in PBMCs were congruent with those in the spleen, demonstrating an iron-deficiency phenotype, ostensibly due to the mobilization of intracellular iron for hemoglobin synthesis. Thus, it is plausible that NCOA4 facilitates iron mobilization by degrading ferritin and releasing its stored iron.

      It remains ambiguous whether ferroptosis was initiated in the PBMCs during our study. Ferroptosis primarily occurs as a response to an increase in Fe2+ rather than an overall increase in intracellular iron. Our preliminary proposition was that relative changes in gene expression in PBMCs could potentially mirror corresponding changes in protein expression in the spleen, thereby potentially indicating alterations in iron processing capacity post-HH exposure. However, we fully acknowledge that this is a conjecture requiring further empirical substantiation or clinical validation.

      7) Tfr1 increase is typically correlated with cellular iron deficiency while ferroptosis consistent with iron loading. The direction of the changes in multiple elements relevant to iron trafficking is somewhat confusing and without additional evidence, there is little confidence that the authors have reached the correct conclusion. Furthermore, the results here are analyses of total spleen samples rather than specific cells in the spleen.

      We appreciate your astute comments and agree that the observed increase in transferrin receptor (TfR) expression, typically associated with cellular iron deficiency, appears contradictory to the expected iron-loading state associated with ferroptosis. We understand that this apparent contradiction might engender some uncertainty about our conclusions.

      In our investigation, we evaluated total spleen samples as opposed to distinct cell types within the spleen, a factor that could have contributed to the seemingly discordant findings. An important consideration is the presence of immature RBCs in the spleen, particularly within the hematopoietic islands, where these immature RBCs cluster around nurse macrophages. These immature RBCs contain abundant TfR, which is needed for iron uptake and hemoglobin synthesis. These cells, which are difficult to eliminate by perfusion, might have contributed to the observed upregulation of TfR expression, especially after HH exposure. Our further research revealed that TfR expression in macrophages diminished under hypoxic conditions, suggesting that the elevated TfR expression in tissue samples may originate predominantly from other cell types, especially immature RBCs (refer to subsequent Figure).

      Reviewer #2 (Public Review):

      The authors aimed at elucidating the development of high altitude polycythemia which affects mice and men staying in the hypoxic atmosphere at high altitude (hypobaric hypoxia; HH). HH causes increased erythropoietin production which stimulates the production of red blood cells. The authors hypothesize that increased production is only partially responsible for exaggerated red blood cell production, i.e. polycythemia, but that decreased erythrophagocytosis in the spleen contributes to high red blood cells counts.

      The main strength of the study is the use of a mouse model exposed to HH in a hypobaric chamber. However, not all of the reported results are convincing: some effects are small, and it is doubtful that they account for the overall increase in red blood cells claimed by the authors. Moreover, direct proof of reduced erythrophagocytosis is compromised by a strong spontaneous loss of labelled red blood cells, although effects on labelled E. coli phagocytosis are shown. The discussion addresses some of the unexpected results, such as the reduced expression of HO-1 under hypoxia, but because of the above-mentioned limitations, much of the discussion remains hypothetical.

      Thank you for your valuable feedback and insight. We appreciate the recognition of the strength of our study model, the exposure of mice to hypobaric hypoxia (HH) in a hypobaric animal chamber. We also understand your concerns about the smaller effects and their potential impact on the overall increase in red blood cells (RBCs), as well as the apparent reduced erythrophagocytosis due to the loss of labelled RBCs.

      Erythrocytosis under HH has predominantly been attributed to amplified RBC production. The focus of our research was to highlight the potential acceleration of hypoxia-associated polycythemia (HAPC) as a result of compromised erythrophagocytosis. Given the spontaneous loss of labelled RBCs in vivo, we assessed RBC clearance at 7 and 14 days of HH exposure and compared the clearance rate over the 7-to-14-day interval, that is, after impaired erythrophagocytosis had been clearly established at both time points. This approach was designed to offset the effect of spontaneous label loss, which occurs under both NN and HH conditions. Correspondingly, the results from blood and spleen analyses corroborated a decline in the RBC clearance rate under HH compared with NN conditions.

      Apart from the E. coli phagocytosis and the labeled RBCs experiment (this part of the results was removed in the revision), the injection of Tuftsin further substantiated the impairment of erythrophagocytosis in the HH spleen, as evidenced by the observed decrease in iron within the red pulp of the spleen post-perfusion. Furthermore, to validate our findings, we incorporated RBCs staining in splenic cells at 7 and 14 days of HH exposure, which provided concrete confirmation of impaired erythrophagocytosis (new Figure 4E).

      As for the reduced expression of heme oxygenase-1 (HO-1) under hypoxia, we agree that this was an unexpected result, and we are in the process of further exploring the underlying mechanisms. It is possible that there are other regulatory pathways at play that are yet to be identified. However, we believe that by offering possible interpretations of our data and potential directions for future research, we contribute to the ongoing scientific discourse in this area.

      Reviewer #3 (Public Review):

      The manuscript by Yang et al. investigates in mice how hypobaric hypoxia can modify the RBC clearance function of the spleen, a concept that is of interest. Based on their data, the authors propose a model in which hypoxia increases cellular iron levels, possibly in RPMs, leading to ferroptosis and downregulating their erythrophagocytic capacity. However, most of the data are generated on total splenocytes or total spleen, and the conclusions are not always supported by the presented data. The authors' model could be questioned in light of the paper by Youssef et al. (which the authors cite, but in an unclear context) showing that ferroptosis in RPMs can be mediated by augmented erythrophagocytosis. As such, the loss of RPMs in vivo, which is indeed clear in the histological sections shown (and is a strong and interesting finding), may not be caused directly by hypoxia but by enhanced RBC clearance. This possibility should be taken into account.

      Thank you for your insightful comments and constructive feedback. Youssef et al. (2018) found that increased erythrophagocytosis of stressed red blood cells (RBCs) triggers ferroptosis in splenic red pulp macrophages (RPMs) in a mouse model of transfusion. This increase in erythrophagocytosis was evident five hours after RBC injection. In contrast, our study showed decreased erythrophagocytosis in the spleen after both 7 and 14 days of HH exposure.

      Typically, macrophages exhibit enhanced phagocytic capacity immediately after stress or stimulation. However, the observation time points in our study were considerably later (7 and 14 days). It remains uncertain whether phagocytic capacity was amplified during the acute phase of HH exposure (particularly within the first day, given that splenic contraction during one day of HH releases stored RBCs into the bloodstream), and whether this initial response could precipitate ferroptosis and, subsequently, diminished erythrophagocytosis at 7 or 14 days of continued HH exposure.

      Major points:

      1) The authors present data from total splenocytes and then relate the obtained data to RPMs, which are quantitatively a minor population in the spleen. For example, labile iron is increased in splenocytes upon HH, but the manuscript does not show that this occurs in the red pulp or in RPMs. The authors also measure gene/protein expression changes in the total spleen and connect them to changes in macrophages, as indicated in the model figure (Fig. 7). Changes in HO-1 and ferritin (L and H) levels could be attributed to the drop in RPMs in the spleen. Are any of these changes preserved cell-intrinsically in cultured macrophages? This should be shown to support the model (this also relates to lines 487-88, where the authors again speculate that hypoxia decreases HO-1, which was not demonstrated). At the current stage, for example, we do not know whether the labile iron increase in cultured cells and in the spleen in vivo upon hypoxia is the same phenomenon, or why labile iron is increased. To improve the manuscript, the authors should study RPMs specifically.

      We are grateful for these perceptive remarks. In our initial manuscript, we did not evaluate labile iron within the red pulp or red pulp macrophages (RPMs). To address this, we used Lillie staining, following the protocol described by Liu et al. (Chemosphere, 2021, 264(Pt 1):128413), to detect Fe2+ in these regions. The results were consistent with our previous Western blot and flow cytometry findings in the spleen, corroborating an increase in labile iron specifically within the red pulp.

      However, we acknowledge that additional experiments are needed to further validate these findings. We also examined the expression of heme oxygenase-1 (HO-1) and iron-related proteins, including transferrin receptor (TfR), ferroportin (Fpn), ferritin (Ft), and nuclear receptor coactivator 4 (NCOA4), in primary macrophages subjected to 1% hypoxia, with and without hemoglobin treatment. The expression of ferroptosis-related proteins was consistent with the in vivo results; however, the expression of iron-related proteins differed between in vitro and in vivo conditions. This suggests that the increases in labile iron in cultured cells and in the spleen in vivo upon hypoxia are not identical phenomena. However, the precise mechanism remains elusive.

      In our study, we observed a decrease in HO-1 protein expression following 7 and 14 days of HH exposure, as shown in Figure 3U, 5A, and S1A. This finding contradicts previous research that identified HO-1 as a hypoxia-inducible factor (HIF) target under hypoxic conditions (P J Lee et al., 1997). Our discussion, therefore, addressed the potential discrepancy in HO-1 expression under HH. According to our findings, HO-1 regulation under HH appears to be predominantly influenced by macrophage numbers and the RBCs to be processed in the spleen or macrophages, rather than by hypoxia alone.

      It is challenging to discern whether the increased labile iron observed in vitro accurately reflects the in vivo phenomenon, as replicating in vitro the iron requirements for RBC production induced by HH is inherently difficult. However, by integrating our in vivo and in vitro studies, we determined that the elevated Fe2+ levels were not dependent on HO-1 protein expression, as HO-1 levels increased in vitro but decreased in vivo under hypoxia/HH exposure.

      2) The paper uses flow cytometry, but the way this method was applied is suboptimal: no gating strategies are shown, there is no indication of whether single events were gated or how cell viability was assessed, and the parent populations are not specified when % of cells is shown on the graphs. How could RBCs in the spleen be analyzed without dedicated cell surface markers? A drop in splenic RPMs is presented as the key finding of the manuscript, but Fig. 3M shows (suboptimal) gating for monocytes, not RPMs. RPMs are typically F4/80-high, CD11b-low (again, no gating strategy is shown for RPMs). Also, the authors used single-cell RNA-seq to detect a drop in splenic macrophages upon HH, but they do not indicate in Fig. A-C which cluster of cells relates to macrophages. Cell clusters are not identified in these panels; hence, the data are not interpretable.

      Thank you for your comments and constructive critique regarding our flow cytometry methodology and presentation. We understand the need for greater transparency and detailed explanation of our procedures, and we acknowledge that the lack of gating strategies and other pertinent information in our initial manuscript may have affected the clarity of our findings.

      In our initial report, we described the decline in migrated macrophages (F4/80hiCD11bhi), including the expression of M1 and M2 markers in these cells, as illustrated in Figure 3, but did not specifically address changes in red pulp macrophages (RPMs). In our earlier data, CD11b- and CD11blo cells were difficult to distinguish. We therefore repeated the experiments to identify F4/80hiCD11blo cells, and the results of this reanalysis are now included in the revised manuscript (Figure 3M). However, single-cell analyses in vivo may identify more accurately the specific cell types that decrease after HH exposure.
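      To make explicit the kind of gating hierarchy the reviewer asks for, the minimal Python sketch below applies sequential gates (singlets, then live cells, then F4/80-high CD11b-low cells) and reports each population as a percentage of its immediate parent. The channel names, the FSC-A/FSC-H singlet criterion, the viability-dye convention, and every threshold are hypothetical placeholders, not the settings used in this study; in practice, gates are drawn on compensated data in cytometry software.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One cytometry event; field names and units are hypothetical."""
    fsc_a: float      # forward scatter area
    fsc_h: float      # forward scatter height
    viability: float  # viability-dye signal (assumed: high = dead)
    f480: float       # F4/80 signal
    cd11b: float      # CD11b signal


def pct_of_parent(child, parent):
    """Percentage of the parent gate retained by the child gate."""
    return 100.0 * len(child) / len(parent) if parent else 0.0


def gate_rpm(events, singlet_tol=0.15, dead_cut=1000.0,
             f480_hi=5000.0, cd11b_lo=2000.0):
    """Sequential gates: singlets -> live -> F4/80-high CD11b-low.

    All thresholds are illustrative assumptions, not study settings.
    """
    # Singlets: FSC-A/FSC-H close to 1; doublets have an inflated area.
    singlets = [e for e in events
                if e.fsc_h > 0 and abs(e.fsc_a / e.fsc_h - 1.0) <= singlet_tol]
    # Live cells: low viability-dye signal.
    live = [e for e in singlets if e.viability < dead_cut]
    # Candidate RPMs: F4/80-high and CD11b-low among live singlets.
    rpm = [e for e in live if e.f480 >= f480_hi and e.cd11b < cd11b_lo]
    return {
        "singlets": (len(singlets), pct_of_parent(singlets, events)),
        "live": (len(live), pct_of_parent(live, singlets)),
        "F4/80hi CD11blo": (len(rpm), pct_of_parent(rpm, live)),
    }


# Hypothetical events: an RPM-like cell, a doublet, a dead cell, a CD11b-high cell.
example_events = [
    Event(50000, 50000, 200, 8000, 1000),
    Event(100000, 50000, 200, 8000, 1000),
    Event(50000, 48000, 5000, 8000, 1000),
    Event(50000, 52000, 300, 6000, 9000),
]
summary = gate_rpm(example_events)
for gate_name, (n, pct) in summary.items():
    print(f"{gate_name}: {n} events ({pct:.1f}% of parent)")
```

      Reporting each population relative to its immediate parent is what lets a reader reconstruct the denominator behind any "% of cells" axis.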

      Furthermore, we confirmed the reduction in red pulp (Figure 4J), given that iron processing occurs primarily within the red pulp. In Figure 3, our initial objective was merely to illustrate the reduction in total splenic macrophages following HH exposure.

      To further clarify the identity of the various cell types, we conducted a single-cell analysis. Clusters 0, 1, 3, 4, 14, 18, and 29 represented B cells; clusters 2, 10, 12, and 28, T cells; clusters 15 and 22, NK cells; clusters 5, 11, 13, and 19, NKT cells; clusters 6, 9, and 24, cycling cells; clusters 17 and 26, plasma cells; clusters 21 and 23, neutrophils; cluster 30, erythrocytes; and clusters 7, 8, 16, 20, 24, and 27, dendritic cells (DCs) and macrophages (Figure 3E).
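      A cluster-to-label table like this one is easy to encode and check programmatically. The Python sketch below transcribes the assignments as listed, inverts them, and reports any cluster assigned to more than one label or to none. It assumes the clustering produced IDs 0 to 30 (30 being the highest ID listed); the helper names are illustrative, not part of any package.

```python
from collections import defaultdict

# Cluster assignments transcribed from the text (Figure 3E).
ANNOTATION = {
    "B cells": [0, 1, 3, 4, 14, 18, 29],
    "T cells": [2, 10, 12, 28],
    "NK cells": [15, 22],
    "NKT cells": [5, 11, 13, 19],
    "cycling cells": [6, 9, 24],
    "plasma cells": [17, 26],
    "neutrophils": [21, 23],
    "erythrocytes": [30],
    "DCs and macrophages": [7, 8, 16, 20, 24, 27],
}


def labels_by_cluster(annotation):
    """Invert label -> clusters into cluster -> list of labels."""
    inverted = defaultdict(list)
    for label, clusters in annotation.items():
        for cluster in clusters:
            inverted[cluster].append(label)
    return dict(inverted)


def check_annotation(annotation, n_clusters):
    """Return (clusters with more than one label, cluster IDs with no label)."""
    inverted = labels_by_cluster(annotation)
    ambiguous = {c: labels for c, labels in inverted.items() if len(labels) > 1}
    unlabeled = sorted(set(range(n_clusters)) - set(inverted))
    return ambiguous, unlabeled


ambiguous, unlabeled = check_annotation(ANNOTATION, n_clusters=31)
print("ambiguous:", ambiguous)
print("unlabeled:", unlabeled)
```

      Run on the lists as written, the check reports cluster 24 under two labels and cluster 25 under none.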

      3) The authors draw conclusions that are not supported by the data. Some examples: a) they cannot exclude, e.g., a compensatory involvement of the liver in RBC clearance (the differences between HH sham and HH splenectomy are mild in Fig. 2E, F, and G).

      Thank you for your insightful comments and for pointing out the potential involvement of other organs, such as the liver, in the RBC clearance under HH conditions. We concur with your observation that the differences between the HH sham and HH splenectomy conditions in Fig. 2 E, F, and G are modest. This could indeed suggest a compensatory role of other organs in RBC clearance when splenectomy is performed. Our intent, however, was to underscore the primary role of the spleen in this process under HH exposure.

      In fact, after our initial investigations, we conducted a more extensive study of the role of the liver in RBC clearance under HH conditions. Our findings, illustrated in the figures submitted with this response, support a compensatory role for the liver: we observed an increase in macrophage numbers and phagocytic activity in the liver under HH conditions. Although the differences in RBC count between the HH sham and HH splenectomy conditions may seem minor, it is essential to consider the unit of measurement, ×10¹²/ml. At this scale, even a small numerical difference can represent a substantial biological difference.

      b) Splenomegaly is typically caused by increased extramedullary erythropoiesis, not RBC retention. Why do the authors favor the second possibility? Relatedly, why do the authors conclude that the data in Fig. 4G,H support the model of RBC retention? A significant drop in splenic RBCs (poorly gated) was observed at 7 days between the NN and HH groups, which could actually indicate increased RBC clearance capacity, i.e., less retention.

      Previous investigations have predominantly suggested that splenic enlargement under hypoxic conditions stems from extramedullary hematopoiesis in the spleen. Nevertheless, a 1994 study from the General Hospital of the Xizang Military Region, based on the dissection of spleens from 12 patients with high-altitude polycythemia (HAPC), reported substantial dilation and congestion of the splenic sinuses (Zou Xunda et al., Southwest Defense Medicine, 1994;5:294-296). Moreover, a recent study indicated that extramedullary erythropoiesis peaks between days 3 and 7 (Wang H et al., 2021).

      Considering these findings, the present study postulates that hypoxia-induced inhibition of erythrophagocytosis may lead to RBC retention. However, we acknowledge that the manuscript in its current preprint form does not offer conclusive evidence for this hypothesis. To bridge this gap, we conducted further experiments in which the spleen was perfused and total cells were collected after HH exposure. These cells were smeared onto slides and subjected to Wright staining. The results show a clear increase in RBC deformation and retention in the spleen after 7 and 14 days of HH exposure. This finding supports our initial hypothesis and offers a new perspective on splenic responses under hypoxic conditions.

      c) lines 452-54: there is no data for decreased phagocytosis in vivo, especially in the context of erythrophagocytosis. This should be done with stressed RBCs transfusion assays, very good examples, like from Youssef et al. or Threul et al. are available in the literature.

      Thank you. In their seminal work, Youssef and colleagues demonstrated that transfusion of stressed RBCs triggers erythrophagocytosis and subsequently induces ferroptosis in red pulp macrophages (RPMs) within five hours. Given these observations, this model may be of limited use for evaluating phagocytosis by splenic macrophages or RPMs under HH conditions, as HH has already induced erythropoiesis in vivo. In addition, it is unclear whether the membrane characteristics of stressed RBCs are similar to those of HH-induced RBCs, which matters because membrane characteristics are an important signal for phagocytosis in vivo. Consequently, with current knowledge we could not discern whether changes in phagocytosis are instigated by the presence of stressed RBCs or by HH-induced changes in macrophages in vivo. Nonetheless, we appreciate the potential value of this approach: distinguishing the effects of stressed RBCs from those of HH on macrophage phagocytosis is an intriguing line of inquiry that could yield significant insights into the underlying mechanisms, and we intend to investigate it in future studies.

      d) Line 475 - ferritinophagy was not shown in response to hypoxia by the manuscript, especially that NCOA4 is decreased, at least in the total spleen.

      Research published in eLife in 2015 established that ferritinophagy mediated by nuclear receptor coactivator 4 (NCOA4) is indispensable for erythropoiesis, and that this process is modulated by iron-dependent proteolysis mediated by HECT and RLD domain containing E3 ubiquitin protein ligase 2 (HERC2) (Joseph D Mancias et al., eLife. 2015; 4: e10308). NCOA4 plays a critical role in delivering ferritin (Ft) to the lysosome, where both NCOA4 and Ft undergo coordinated degradation.

      In our study, we provide evidence that HH exposure stimulates erythropoiesis (Figure 1). We propose that this, in turn, could promote NCOA4-mediated ferritinophagy, resulting in the decrease in NCOA4 protein levels observed after HH exposure. We will perform further experiments to test this. This interpretation aligns with the established link between ferritinophagy and erythropoiesis and adds a new dimension to the understanding of cellular responses to hypoxic conditions.

      4) In a few cases, the authors show only representative dot plots or histograms, without quantification for n>1. In Fig. 4B the authors write about a significant decrease (although with n=1 no statistics could be applied here; of note, it is not clear what kind of samples were analyzed). Another example is Fig. 6I. In this case, it is even more important, as the data conflict with the cited article and with a new one (PMCID: PMC9908853) showing that hypoxia stimulates efferocytosis. Sometimes the manuscript claims that changes are observed although they are not visible in the representative figures (e.g., for M1 and M2 macrophages in Fig. 3M).

      We recognize that our initial presentation of Figure 4B lacked precision, as it did not include the corresponding statistical graph. Although our results showed a significant reduction in the ability to phagocytose E. coli, in line with the recommendations of other reviewers we have removed the E. coli phagocytosis results from this revision, as they primarily reflect immune function. Regarding PMC9908853, which reported that metabolic adaptation facilitates enhanced macrophage efferocytosis in limited-oxygen environments, the macrophages investigated in that study were derived from ER-Hoxb8 macrophage progenitors following removal of β-estradiol. It is therefore questionable whether these cultured macrophages are comparable to primary macrophages freshly isolated from the spleen after HH exposure. The characteristics and functions of these two macrophage sources may differ, and this distinction requires further investigation.

      5) There are several unclear issues in methodology:

      • What is the purity of primary RPMs in the culture? RPMs are quantitatively poorly represented in splenocyte single-cell suspensions. This reviewer is quite skeptical that the processing of splenocytes from approximately 1 mm³ of tissue was sufficient to establish primary RPM cultures. The authors should prove that the cultured cells were indeed RPMs, not monocyte-derived macrophages or other splenic macrophage subtypes.

      Thank you for your thoughtful comments and questions. First, we apologize for not making this clear in the original manuscript. The purity of the primary RPMs in our culture was approximately 40%, as identified by F4/80hiCD11blo markers using flow cytometry. We recognize that RPMs are typically underrepresented in splenocyte single-cell suspensions, and the concern you raise about potential contamination by other cell types is valid.

      We apologize for any ambiguity in the methodological description that may have led to misunderstanding. In fact, the entire spleen is typically used for splenic macrophage culture. The size of the spleen varies with the species and age of the animal, but in mice it is commonly about 1 cm long. The spleen is cut into small fragments of approximately 1 mm³ each to aid enzymatic digestion; the procedure therefore does not rely on a single 1 mm³ tissue fragment for RPM culture. Although the isolation and culture of splenic macrophages can be challenging, our method has been optimized to increase the yield of this cell population.

      • (around line 183) In the description of flow cytometry, there are several missing issues. In 1) it is unclear which type of samples were analyzed. In 2) it is not clear how splenocyte cell suspension was prepared.

      1) Whole blood was collected from the mice into an anticoagulant tube and set aside for subsequent thiazole orange (TO) staining. 2) Splenic tissue was harvested from the mice and processed into a single-cell suspension using a 40 μm filter. Erythrocytes in the sample were then lysed and removed, and the remaining cells were resuspended in phosphate-buffered saline (PBS) for subsequent analyses.

      We have meticulously revised these methodological details in the corresponding section of the manuscript to ensure clarity and precision.

      • In line 192: what does it mean: 'This step can be omitted from cell samples'?

      Intracellular ferrous iron (Fe2+) content and lipid peroxidation levels were quantified as follows. Splenic tissue was first processed into a single-cell suspension, followed by RBC lysis; this step is unnecessary for isolated cell samples. Next, 1 × 10⁶ cells were incubated with 100 μL of BioTracker Far-red Labile Fe2+ Dye (1 mM, Sigma, SCT037, USA) for 1 hour, or with C11-Bodipy 581/591 (10 μM, Thermo Fisher, D3861, USA) for 30 minutes. After incubation, cells were washed twice with PBS. Flow cytometric analysis was then performed, using the FL6 channel (638 nm/660 nm) to determine intracellular Fe2+ content and the FL1 channel (488 nm/525 nm) to quantify lipid peroxidation.
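      Flow cytometry readouts such as these are commonly summarized as the median fluorescence intensity (MFI) of the gated population and expressed relative to a control group. The Python sketch below shows that calculation for the FL6 (Fe2+ dye) channel; the use of the median, the normoxia reference group, and the intensity values are assumptions for illustration, not data from this study.

```python
from statistics import median


def mfi(intensities):
    """Median fluorescence intensity of one gated sample."""
    return median(intensities)


def fold_change_vs_control(sample, control):
    """Ratio of sample MFI to control MFI (1.0 means no change)."""
    return mfi(sample) / mfi(control)


# Hypothetical FL6 intensities (arbitrary units) for gated cells.
normoxia_fl6 = [100.0, 120.0, 140.0]
hh_fl6 = [200.0, 240.0, 300.0]
print(fold_change_vs_control(hh_fl6, normoxia_fl6))  # 2.0
```

      The median is preferred here over the mean because fluorescence distributions typically have long right tails that would otherwise dominate the summary.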

      • 'TO method' is not commonly used anymore and hence it was unclear to this Reviewer. Reticulocytes should be analyzed with proper gating, using cell surface markers.

      We appreciate your observation regarding the method we used to analyze reticulocytes. We agree that gating with cell surface markers is a more modern and accurate approach. However, because reticulocyte identification is not the central focus of our investigation, we used TO staining owing to its simplicity and the reliability of its results. We followed the protocol described in Sci Rep, 2018, 8(1):12793, chosen for its established use and demonstrated efficacy in reticulocyte identification.

      • The description of 'phagocytosis of E. coli and RBCs' in the Methods section is unclear and incomplete. The Results section suggests that, for the biotinylated RBCs, phagocytosis (or retention?) of RBCs was quantified in vivo upon transfusion. However, the Methods section suggests an in vitro/ex vivo approach. It is unclear what was actually performed and how. If RBC transfusion was done, it should be properly described. Of note, biotinylation of RBCs is typically done in vivo only, as the first step of an RBC lifespan assay. Such an assay is missing from the manuscript. Also, it is not clear whether the detection of biotinylated RBCs was performed in permeabilized cells (this would be required).

      Thank you for the comments. In our initial methodology, we used Cy5.5-labeled Escherichia coli to probe phagocytic function, while recognizing that this may not be the most suitable model for detecting phagocytosis in this context (following the recommendations of other reviewers, we have removed the E. coli phagocytosis results from this revision, as they predominantly reflect immune function). Our fundamental aim was to determine whether HH compromises the erythrophagocytic capacity of splenic macrophages. To this end, we analyzed the clearance of biotinylated RBCs in both the bloodstream and the spleen to assess phagocytic function in vivo.

      In the present study, instead of transfusing biotinylated RBCs into mice, we injected N-hydroxysuccinimide (NHS)-biotin into the bloodstream. NHS-biotin labels cell membranes in vivo, and the labeled cells can be detected with streptavidin-fluorescein isothiocyanate (FITC) after they are isolated from the blood or spleen. Biotin-labeled RBCs were detectable in both the blood and the spleen for up to 21 days after NHS-biotin injection.

      Finally, we used flow cytometry to analyze NHS-biotin-labeled RBCs in the blood and spleen. This method detects live cells and is not applicable to permeabilized cells. We believe this approach better aligns with our investigative goals and offers a more robust evaluation of erythrophagocytic function under hypoxic conditions.
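      Because NHS-biotin labels the circulating RBC cohort at a single time point, the decline of the biotin-positive fraction over time can be summarized as a clearance rate. The Python sketch below fits a first-order (exponential) decay by least squares on the log scale and returns a half-life. The exponential model is an assumption (a linear fit is a common alternative for RBC survival curves), and the time points and percentages are synthetic values, not measurements from this study.

```python
import math


def fit_exponential_clearance(days, pct_labeled):
    """Fit ln(pct) = ln(p0) - k * t by ordinary least squares.

    Returns (k, half_life): the clearance rate per day and ln(2)/k in days.
    Assumes first-order loss of labeled cells and all pct values > 0.
    """
    n = len(days)
    log_pct = [math.log(p) for p in pct_labeled]
    mean_t = sum(days) / n
    mean_y = sum(log_pct) / n
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, log_pct))
    s_tt = sum((t - mean_t) ** 2 for t in days)
    k = -s_ty / s_tt
    return k, math.log(2) / k


# Synthetic example: 90% labeled at day 0, half-life of 14 days.
sample_days = [0, 7, 14, 21]
synthetic_pct = [90.0 * math.exp(-math.log(2) / 14 * t) for t in sample_days]
rate, half_life = fit_exponential_clearance(sample_days, synthetic_pct)
print(f"k = {rate:.4f} per day, half-life = {half_life:.1f} days")
```

      Comparing k (or the half-life) between NN and HH animals could condense the blood and spleen time courses into one clearance parameter per animal.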

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al. examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and differentiated into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al. ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.
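      One common way to score how well markers transfer is to classify cells of a second species using markers defined in the first, and then compare the predicted labels with the reference annotation using the per-cell-type F1 score (the harmonic mean of precision and recall). The Python sketch below computes that score; the labels are hypothetical, and the classification step that produces the predictions is assumed rather than shown.

```python
def f1_for_label(true_labels, predicted_labels, label):
    """F1 score for one cell type, treating it as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical: reference annotation in species B vs labels predicted
# from species-A marker genes.
reference = ["neuron", "neuron", "fibroblast", "fibroblast"]
predicted = ["neuron", "fibroblast", "fibroblast", "fibroblast"]
for cell_type in ("neuron", "fibroblast"):
    print(cell_type, round(f1_for_label(reference, predicted, cell_type), 3))
```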

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation are challenging problems in scRNA-seq studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies because, despite general agreement on the identity of many cell types, different methods for identifying marker genes return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and the exploration of different methods to identify marker genes, also suggest additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      We thank the Reviewer for their kind assessment of our work.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by testing different ways of generating EBs and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm, whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant imbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite the inclusion of three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of intra-species patterns of variability, especially for the taxa with multiple biological replicates, and of how they affect the number of cell types detected across taxa.

      You are absolutely correct in pointing out that the large clonal variability in cell type composition is a challenge for our analysis. We also noted the odd behavior of the orangutan EBs, and their underrepresentation of ectoderm. There are many possible sources for these variable differentiation propensities: clone, sample origin (in this case urine) and individual. However, unfortunately for the orangutan, we have only one individual and one sample origin and thus cannot say whether this germ layer preference says something about the species or is due to our specific sample.

      Because of this high variability from multiple sources, obtaining enough cell types with appreciable overlap between species was a limiting factor for our analyses. To derive meaningful conclusions from intra-species analyses and about the impact of different sources of variation on cell type propensity, we would need to sequence many more EBs with an experimental design that balances the possible sources of variation. This would go beyond the scope of this study.

      Instead, here we control for intra-species variation in our analyses as much as possible: for the analysis of cell type specificity and conservation, the comparison is made relative to the different degrees of specificity (Figure 3C). For the analysis of marker gene conservation, we explicitly take intra-species variation into account (Figure 4D).

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them.

      Concerning the temporal aspect, we deliberately did not include an explicit comparison of day 8 and day 16 EBs, because we felt that it was not directly relevant to our main message. Our pseudotime analysis showed that the differences between the two time points were a matter of degree rather than of quality. All major lineages were already present at day 8, and even though day 8 cells had on average earlier pseudotimes, the pseudotime distributions of the two sampling time points overlapped considerably (Author response image 1). We therefore decided to analyse the data together.
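      The degree of overlap between two pseudotime distributions can be quantified with a histogram overlap coefficient: the sum, over shared bins, of the smaller of the two normalized bin frequencies (1 for identical distributions, 0 for disjoint ones). The Python sketch below is one simple way to compute it; the bin count and the example values are assumptions, and this is not necessarily how the overlap in Author response image 1 was assessed.

```python
def histogram_overlap(a, b, n_bins=20):
    """Overlap coefficient of two samples on a shared binning of their range."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    def normalized_hist(values):
        counts = [0] * n_bins
        for v in values:
            index = min(int((v - lo) / width), n_bins - 1)  # put the maximum in the last bin
            counts[index] += 1
        return [c / len(values) for c in counts]

    ha, hb = normalized_hist(a), normalized_hist(b)
    return sum(min(p, q) for p, q in zip(ha, hb))


# Hypothetical pseudotime values for cells sampled at two time points.
day8 = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55]
day16 = [0.20, 0.35, 0.45, 0.60, 0.70, 0.90]
print(round(histogram_overlap(day8, day16, n_bins=5), 3))
```

      The coefficient depends on the bin count, so it is most informative when the same binning is used for every comparison.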

      Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses?

      When we started the experiment, we simply did not know what to expect. We were worried that cell types at day 8 might be too transient, but longer culture can also introduce biases. That is why we wanted to look at two time points, however as mentioned above the differences are in degree.

      Concerning the cell type composition: yes, day 16 EBs are more heterogeneous than day 8 EBs. Firstly, older EBs have more distinguishable cell types and hence even if all EBs had identical composition, the sampling variance would be higher given that we sampled a similar number of cells from both time points. Secondly, in order to grow EBs for a longer time, we moved them from floating to attached culture on day 8 and it is unclear how much variance is added by this extra handling step.
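      The first argument can be made explicit under a simple multinomial sampling model (an assumption: n cells are drawn independently from an EB with fixed cell type proportions p_i, and X_i is the number of sampled cells of type i). For a fixed n, the relative sampling error of each proportion grows as the proportions shrink, i.e., as the number of distinguishable cell types k increases:

```latex
\begin{aligned}
\hat{p}_i &= \frac{X_i}{n}, \qquad
\operatorname{Var}(\hat{p}_i) = \frac{p_i(1-p_i)}{n}, \qquad
\operatorname{CV}(\hat{p}_i) = \frac{\sqrt{\operatorname{Var}(\hat{p}_i)}}{p_i} = \sqrt{\frac{1-p_i}{n\,p_i}}, \\
p_i &= \frac{1}{k} \;\Longrightarrow\; \operatorname{CV}(\hat{p}_i) = \sqrt{\frac{k-1}{n}}.
\end{aligned}
```

      For example, with n = 1000 cells, five equally frequent types give a coefficient of variation of about 0.063 per type, whereas twenty give about 0.138, so compositions look more variable even without any biological difference.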

      Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      We did not see any differences in the marker conservation between early and late cell types, but we have too little data to say whether this carries biological meaning.

      Author response image 1.

      Pseudotime analysis for a differentiation trajectory towards neurons. Single cells were first aggregated into metacells per species using SEACells (Persad et al. 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony and a combined pseudotime was inferred with Slingshot (Street et al. 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating a differentiation towards neurons. (A) PHATE embedding colored by pseudotime (Moon et al. 2019). (B) PHATE embedding colored by celltype. (C) Pseudotime distribution across the sampling timepoints (day 8 and day 16) in different species.

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However, some of the clusters they use are somewhat broad, so it is worth asking whether the lack of specificity exhibited by some marker genes, which drives their conclusions, is caused by inter-species heterogeneity within a given cluster.

      Author response image 2.

      UMAP visualization for the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

      Good point. If we understand correctly, the concern is that in our relatively broadly defined cell types, species are not well mixed, and that this in turn is partly responsible for marker gene divergence. This problem is indeed difficult to address, because most approaches to evaluate it require integration across species, which might lead to questionable results (see our Discussion).

      Nevertheless, we attempted an integration across all four species. To this end, we subset the cells for the 7 cell types that we found in all four species and visualized cell types and species in the UMAPs above (Author response image 2).

      We see that cardiac fibroblasts appear poorly integrated in the UMAP, yet they have highly transferable marker genes across species. We quantified integration quality using the cell-specific mixing score (cms) (Lütge et al. 2021) and indeed found that the proportion of well-integrated cells is lowest for cardiac fibroblasts (Author response image 3A). At the other end of the cms spectrum, neural crest cells appear to have the best integration across species, but their marker transferability between species is worse than that of cardiac fibroblasts (Supplementary Figure 9). The rank-biased overlap scores that we use to measure marker gene conservation, calculated per cell type, show the same trends (Author response image 3B) as the F1 scores for marker gene transferability. Hence, given our current dataset, we see no indication that the low marker gene conservation results from too broadly defined cell types.
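      For readers unfamiliar with rank-biased overlap (RBO; Webber et al. 2010): it compares two ranked lists, here marker genes ranked in two species, by averaging their prefix agreement A_d = |S[1:d] ∩ T[1:d]| / d with geometrically decreasing weights p^(d-1), so that agreement at the top of the rankings counts most. The Python sketch below implements the extrapolated form for two rankings of unique items truncated to a common depth k, RBO_ext = A_k p^k + (1 - p) Σ_{d=1..k} p^(d-1) A_d. The persistence parameter p = 0.9 and the gene names are assumptions, and the implementation used for Author response image 3B may differ.

```python
def rbo_ext(s, t, p=0.9):
    """Extrapolated rank-biased overlap of two rankings of unique items.

    Both lists are truncated to their common length k. Returns a value in
    [0, 1]: 1 for identical rankings, 0 for rankings with no shared items.
    """
    k = min(len(s), len(t))
    if k == 0:
        return 0.0
    seen_s, seen_t = set(), set()
    overlap = 0          # size of the intersection of the two depth-d prefixes
    weighted_sum = 0.0   # sum over d of p**(d-1) * A_d
    agreement = 0.0      # A_d at the current depth
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            # a may match an earlier item of t; b may match an earlier item of s.
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        agreement = overlap / d
        weighted_sum += p ** (d - 1) * agreement
    return (1 - p) * weighted_sum + agreement * p ** k


# Hypothetical marker rankings for one cell type in two species.
species_1 = ["GENE_A", "GENE_B", "GENE_C"]
species_2 = ["GENE_A", "GENE_C", "GENE_B"]
print(round(rbo_ext(species_1, species_2), 3))
```

      Lowering p shifts the weight further towards the top-ranked markers, which is the main design choice when applying RBO to marker lists.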

      Author response image 3.

      (A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.

      Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species (humans, orangutans, cynomolgus macaques, and rhesus macaques), providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      We thank the reviewer for their positive and thoughtful feedback.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      As Reviewer 1 also points out, we have observed that annotations of non-human primate genomes often have truncated 3’UTRs. This is especially problematic for 3’ UMI transcriptome data such as the 10x dataset presented here. To illustrate this, we compared the Liftoff annotation derived from Gencode v32, which we used throughout our manuscript, with the Ensembl gene annotation Macaca_fascicularis_6.0.111. We used bulk RNA-seq transcriptomes from human and cynomolgus iPSCs (Kliesmete et al. 2024) generated with the Prime-seq protocol (Janjic et al. 2022), which, like 10x, uses 3’ UMIs. On average, the Liftoff annotation yields higher counts than the Ensembl annotation (Author response image 4A). Moreover, when comparing across species, using Ensembl for the macaque leads to an asymmetry in differentially expressed (DE) genes, with apparently many more genes up-regulated in humans. In contrast, with the Liftoff annotation we detect fewer DE genes, and a similar number of genes is up-regulated in macaques as in humans (Author response image 4B). We think that the excess DE genes are artifacts of mismatched annotations between human and cynomolgus macaque. We illustrate this for the transcription factor SALL4 in Author response image 4C,D. The Ensembl annotation reports 2 transcripts, whereas Liftoff from Gencode v32 suggests 5 transcripts, one of which has a longer 3’UTR. This longer transcript is also supported by Nanopore data from macaque iPSCs. In this case, the truncated 3’UTR leads to underestimation of SALL4 expression in macaques, so that SALL4 is detected as up-regulated in humans (DESeq2: LFC = 1.34, p-adj < 2e-9). In contrast, with the Liftoff annotation, SALL4 does not appear to be DE between humans and macaques (LFC = 0.33, p-adj = 0.20).
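      The effect of a truncated 3’UTR on 3’-tag counting can be seen in a toy example: reads from 3’ UMI protocols pile up near the polyadenylation site, so a gene model that ends too early fails to capture them. The Python sketch below counts reads whose positions fall inside the exons of two alternative models of the same gene. Coordinates, read positions, and model names are hypothetical and single-stranded for simplicity; real pipelines assign reads with dedicated counting tools.

```python
def count_reads(exons, read_positions):
    """Count reads whose (1-based) position lies in any [start, end] exon."""
    return sum(
        1 for pos in read_positions
        if any(start <= pos <= end for start, end in exons)
    )


# Hypothetical gene on the plus strand: identical exons, different 3' ends.
truncated_model = [(100, 300), (800, 1500)]   # 3'UTR ends at 1500
extended_model = [(100, 300), (800, 2500)]    # 3'UTR extends to 2500
reads_3prime = [1200, 1400, 1800, 2100, 2400]  # reads near the true 3' end

n_truncated = count_reads(truncated_model, reads_3prime)
n_extended = count_reads(extended_model, reads_3prime)
print(n_truncated, n_extended)  # 2 5
```

      If only one species' annotation is truncated in this way, the lost reads appear as an apparent expression difference between species.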

      Author response image 4. 

      (A) UMI counts per gene for the same cynomolgus macaque iPSC samples. The x-axis shows counts obtained with the Ensembl Macaca_fascicularis_6.0.111 GTF file, and the y-axis counts obtained with our filtered Liftoff annotation, which transferred the human gene models from Gencode v32. (B) Number of DE genes between human and cynomolgus iPSCs detected with DESeq2. For Liftoff, human samples were counted with Gencode v32 and compared to the Liftoff annotation of the same human gene models on macFas6. For Ensembl, we used Gencode v32 for human and Ensembl Macaca_fascicularis_6.0.111 for macaque. For both comparisons, genes were subset to one-to-one orthologues as annotated in biomart. Up- and down-regulation are relative to human expression. (C) Read counts for one example gene, SALL4, using the Liftoff and Ensembl annotations as well as transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. (D) Gene models for SALL4 in macFas6 coordinates, with coverage from iPSC Prime-seq bulk RNA sequencing.

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      We excluded these transcripts from the analysis in both species because, for them, sequence and annotation differences are confounded with expression differences. Our study focuses on regulatory evolution, and we knowingly omit marker differences that arise because a marker has been mutated away; we will make this clearer in the text of the revised version.
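The three exclusion criteria quoted by the reviewer can be written as a single predicate. This is a hedged sketch for illustration only, not the authors' pipeline: the function name `keep_transcript` and its arguments are hypothetical, coverage and identity are assumed to be fractions between 0 and 1 reported per lifted-over transcript, and the length criterion is read as excluding a transcript only when both the absolute difference exceeds 100 bp and the longer model is more than twice as long as the shorter one.

```python
def keep_transcript(coverage, identity, human_len, macaque_len):
    """Return True if a lifted-over transcript passes all three filters.

    Hypothetical helper illustrating the thresholds quoted in the review:
    coverage and identity are fractions in [0, 1]; lengths are in bp.
    """
    if coverage < 0.5:  # partial mapping: less than 50% of the model transferred
        return False
    if identity < 0.5:  # low sequence identity
        return False
    if min(human_len, macaque_len) <= 0:
        return False  # a zero-length model cannot be compared meaningfully
    diff = abs(human_len - macaque_len)
    ratio = max(human_len, macaque_len) / min(human_len, macaque_len)
    # Excessive length difference: exclude only if BOTH conditions hold.
    if diff > 100 and ratio > 2:
        return False
    return True
```

Under this reading, a 40 bp versus 90 bp pair is kept (the absolute difference is small), and so is a 1,000 bp versus 1,150 bp pair (the ratio is small); only transcripts that differ strongly in both senses are dropped.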

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.

      We want to stress that none of our conservation and marker gene analyses relies on cross-species integration. We used the Harmony-integrated data only for visualisation in Figure 1 and for the rough germ-layer check in Supplementary Figure S3. We will add a better description in the revised version.

      References

      Janjic, Aleksandar, Lucas E. Wange, Johannes W. Bagnoli, Johanna Geuder, Phong Nguyen, Daniel Richter, Beate Vieth, et al. 2022. “Prime-Seq, Efficient and Powerful Bulk RNA Sequencing.” Genome Biology 23 (1): 88.

      Kliesmete, Zane, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2024. “Evidence for Compensatory Evolution within Pleiotropic Regulatory Elements.” Genome Research 34 (10): 1528–39.

      Lütge, Almut, Joanna Zyprych-Walczak, Urszula Brykczynska Kunzmann, Helena L. Crowell, Daniela Calini, Dheeraj Malhotra, Charlotte Soneson, and Mark D. Robinson. 2021. “CellMixS: Quantifying and Visualizing Batch Effects in Single-Cell RNA-Seq Data.” Life Science Alliance 4 (6): e202001004.

      Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2019. “Visualizing Structure and Transitions in High-Dimensional Biological Data.” Nature Biotechnology 37 (12): 1482–92.

      Persad, Sitara, Zi-Ning Choo, Christine Dien, Noor Sohail, Ignas Masilionis, Ronan Chaligné, Tal Nawy, et al. 2023. “SEACells Infers Transcriptional and Epigenomic Cellular States from Single-Cell Genomics Data.” Nature Biotechnology 41 (12): 1746–57.

      Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” BMC Genomics 19 (1): 477.

    1. Author Response

      We would like to thank the senior editor, the reviewing editor and all the reviewers for taking the time to review our manuscript and for appreciating our study. We are pleased that all of you found strengths in our work and provided comments to strengthen it further. We sincerely appreciate the valuable comments and suggestions, which we believe will help us further improve the quality of our work.

      Reviewer 1

      The manuscript by Dubey et al. examines the function of the acetyltransferase Tip60. The authors show that (auto)acetylation of a lysine residue in Tip60 is important for its nuclear localization and liquid-liquid phase separation (LLPS). The main observations are: (i) Tip60 is localized to the nucleus, where it typically forms punctate foci. (ii) An intrinsically disordered region (IDR) within Tip60 is critical for the normal distribution of Tip60. (iii) Within the IDR, the authors show that a lysine residue (K187), which is auto-acetylated, is critical. Mutation of that lysine residue to a non-acetylatable arginine abolishes the behavior. (iv) Biochemical experiments show that the formation of the punctate foci may be consistent with LLPS.

      On balance, this is an interesting study that describes the role of acetylation of Tip60 in controlling its biochemical behavior as well as its localization and function in cells. The authors mention in their Discussion section other examples showing that acetylation can change the behavior of proteins with respect to LLPS; depending on the specific context, acetylation can promote (as here for Tip60) or impair LLPS.

      Strengths:

      The experiments are largely convincing and appear to be well executed.

      Weaknesses:

      The main concern I have is that all in vivo (i.e. in cells) experiments are done with overexpression in Cos-1 cells, in the presence of the endogenous protein. No attempt is made to use e.g. cells that would be KO for Tip60 in order to have a cleaner system or to look at the endogenous protein. It would be reassuring to know that what the authors observe with highly overexpressed proteins also takes place with endogenous proteins.

      Response: The main reason for performing these experiments in an overexpression system was to generate different point and deletion mutants of TIP60 and analyse their effects on its properties and functions. To validate our observations from the overexpression system, we also examined the localization pattern of endogenous TIP60 by IFA, and the results show a nuclear foci pattern similar to that observed with overexpressed TIP60 protein (Figure 4A). However, we understand the reviewer's concern and will repeat some of the overexpression experiments under knockdown of endogenous TIP60, using siRNA or shRNA against its 3’ UTR.

      Also, it is not clear how often the experiments have been repeated and additional quantifications (e.g. of western blots) would be useful.

      Response: The experiments were performed as independent biological replicates (n=3), as stated in the figure legends. Regarding the suggestion to quantify the Western blots, we note that for blots that require quantitative estimation (such as Figures 2F and 6H), graphs showing the quantified values with p-values are already included. As suggested, we will additionally quantify Figure 6D and add it to the revised version.

      In addition, regarding the LLPS description (Figure 1), it would be important to show the wetting behaviour and the temperature-dependent reversibility of the droplet formation.

      Response: We appreciate the suggestion, and we will perform these assays and include the results in the revised version.

      In Fig 3C the mutant (K187R) Tip60 is cytoplasmic, but still appears to form foci. Is this still reflecting phase separation, or some form of aggregation?

      Response: The TIP60 (K187R) mutant remains cytosolic with a homogeneous distribution, as shown in Figure 2E. When co-expressed with TIP60 partners such as PXR or p53, this mutant protein also remains homogeneously distributed in the cytosol. However, when co-expressed with wild-type TIP60 protein, this mutant, although still cytosolic, also shows a foci-like pattern at the nuclear periphery, which we believe could be accumulated aggregates.

      Reviewer 2

      The manuscript "Autoacetylation-mediated phase separation of TIP60 is critical for its functions" by Dubey S. et al reported that the acetyltransferase TIP60 undergoes phase separation in vitro and in cell nuclei. The intrinsically disordered region (IDR) of TIP60, particularly K187 within the IDR, is critical for phase separation and nuclear import. The authors showed that K187 is autoacetylated, which is important for TIP60 nuclear localization and activity on histone H4. The authors did several experiments to examine the function of K187R mutants, including chromatin binding, oligomerization, phase separation, and nuclear foci formation. However, the physiological relevance of these experiments is not clear since TIP60 K187R mutants do not get into nuclei. The authors also functionally tested the cancer-derived R188P mutant, which mimics K187R in nuclear localization, disruption of wound healing, and DNA damage repair. However, similar to K187R, the R188P mutant is also deficient in nuclear import, and therefore, its defects cannot be directly attributed to the disruption of the phase separation property of TIP60. The main deficiency of the manuscript is the lack of support for the conclusion that "autoacetylation-mediated phase separation of TIP60 is critical for its functions".

      This study offers some intriguing observations. However, the evidence supporting the primary conclusion, specifically regarding the necessity of the intrinsically disordered region (IDR) and K187ac of TIP60 for its phase separation and function in cells, lacks sufficient support and warrants more scrutiny. Additionally, certain aspects of the experimental design are perplexing and lack controls to exclude alternative interpretations. The manuscript can benefit from additional editing and proofreading to improve clarity.

      Response: We understand the point raised by the reviewer; however, we would like to draw the reviewer's attention to the data in which we clearly demonstrate that acetylation of lysine 187 within the IDR of TIP60 is required for its phase separation (Figure 2J). We would also like to draw the reviewer's attention to other TIP60 mutants within the IDR (R177H, R188H, K189R), which all enter the nucleus and form phase-separated foci. The cancer-associated R188P mutation behaves similarly to K187R because it also hampers TIP60 acetylation at the adjacent K187 residue. Our in vitro and in cellulo results clearly demonstrate that autoacetylation of TIP60 at K187 within its IDR is critical for multiple functions, including its translocation into the nucleus, its protein-protein interactions and its oligomerization, which are prerequisites for phase separation of TIP60.

      There are two putative NLS sequences (NLS #1 from aa145; NLS #2 from aa184) in TIP60, both of which are within the IDR. Deletion of the whole IDR is therefore expected to abolish the nuclear localization of TIP60. Since K187 is within NLS #2, the cytoplasmic localization of the IDR and K187R mutants may not be related to the ability of TIP60 to phase separate.

      Response: We do not dispute the presence of putative NLSs within the IDR of TIP60; however, our results with different mutations within the IDR (K76, K80, K148, K150, R177, R178, R188, K189) clearly demonstrate that only acetylation of K187 is critical for shuttling TIP60 into the nucleus, while all other mutants of residues located within these putative NLS regions showed no impact on TIP60's nuclear shuttling. As mentioned in our Discussion, autoacetylation of TIP60 at K187 may induce local structural changes in its IDR that are critical for translocating TIP60 into the nucleus, where it undergoes the phase separation critical for its functions. A previous example of this kind is acetylation of a lysine within the NLS of TyrRS by PCAF, which promotes its nuclear localization (Cao X et al 2017, PNAS). The IDR (which also contains K187) is important for phase separation once the protein has entered the nucleus. This could be a cellular mechanism that prevents unwarranted action of TIP60 until it enters the nucleus and phase separates on chromatin at appropriate locations.

      The chromatin-binding activity of TIP60 depends on HAT activity, but not phase-separation (Fig 1I), (Fig 2B). How do the authors reconcile the fact that the K187R mutant is able to bind to chromatin with lower activity than the HAT mutant (Fig 2F, 2I)?

      Response: K187 acetylation is required for TIP60's nuclear translocation but is not critical for chromatin binding. When the soluble fraction is prepared in the fractionation experiment, the nuclear membrane is disrupted, so the TIP60 (K187R) mutant is no longer hindered from accessing chromatin and can load onto it (although less efficiently than the wild-type protein). Efficient chromatin binding requires autoacetylation of other lysine residues in TIP60, which might be hampered by reduced catalytic activity or be insufficient to counterbalance HDAC activity inside the nucleus. For K187R, the reduced autoacetylation we capture reflects the protein while it is in the cytosol. During fractionation, once this mutant has access to chromatin, it might autoacetylate other lysine residues critical for chromatin loading (note that the catalytic domain is intact in this mutant). This is evident from the hyper-autoacetylation of the wild-type protein compared with the K187R and HAT mutant proteins. We note that phase separation occurs only after efficient chromatin loading of TIP60; this is why, under in cellulo conditions, both K187R (which cannot enter the nucleus) and the HAT mutant (which enters the nucleus but fails to bind chromatin efficiently) fail to form phase-separated nuclear punctate foci.

      The DIC images of phase separation in Fig 2I need to be improved. The image for K187R showed the irregular shape of the condensates, which suggests particles in solution or on the slide. The authors may need to use fluorescent-tagged TIP60 in the in vitro LLPS experiments.

      Response: We believe this comment refers to Figure 2J. The irregularly shaped condensates observed for TIP60 K187R are unique to the mutant protein and are not caused by particles on the slide. We would like to draw the reviewer's attention to Supplementary Figure S2A, where DIC images of wild-type TIP60 protein tested under different protein and PEG8000 conditions are completely clear whenever the protein did not form phase-separated droplets, ruling out the possibility of particles in the solution or on the slides.

      The authors mentioned that the HAT mutant of TIP60 does not phase separate, which needs to be included.

      Response: We have already added the image of RFP-TIP60 (HAT mutant) in supplementary Fig S4A (panel 2) in the manuscript.

      Related to Point 3, the HAT mutant that doesn't form punctate foci by itself can incorporate into WT TIP60 (Fig 5A). An in vitro LLPS assay for WT, HAT, and K187R mutants with or without acetylation should be included. WT and mutant TIP60 can be labelled with GFP and RFP, respectively.

      Response: We would like to draw the reviewer's attention to our co-expression experiments in Figure 5, where the wild-type protein (in both tagged and untagged conditions) is able to phase separate and form punctate foci with the co-expressed HAT mutant protein (which has depleted autoacetylation capacity). We believe these in cellulo experiments already answer the questions that the reviewer suggests addressing with in vitro experiments.

      Fig 3A and 3B showed that neither K187 mutant nor HAT mutant could oligomerize. If both experiments were conducted in the absence of in vitro acetylation, how do the authors reconcile these results?

      Response: We thank the reviewer for highlighting our oversight in omitting the mention of acetyl-CoA here. To induce acetylation under in vitro conditions, we added 10 µM acetyl-CoA to the reactions depicted in Figures 3A and 3B. The acetyl-CoA information for Figure 3B is already included in the description of the GST pull-down assay (Materials and Methods). We will add the same information to the description of the oligomerization assay in the Materials and Methods of the revised manuscript.

      In Fig 4, the colocalization images showed little overlap between TIP60 and the nuclear speckle (NS) marker SC35, indicating that the majority of TIP60 localized to nuclear structures other than NS. Have the authors tried to perturb the NS by depleting the NS scaffold protein and examining TIP60 foci formation? Do PXR and TP53 localize to NS?

      Response: Under normal conditions, the majority of TIP60 is not localized in nuclear speckles (NS), so we believe that perturbing NS would not have a significant effect on TIP60 foci formation. Interestingly, a recent study from Shelley Berger's group (Alexander KA et al Mol Cell. 2021 15;81(8):1666-1681) showed that p53 localizes to NS to regulate a subset of its target genes. We mention this in our Discussion. No information is available about the localization of PXR in NS.

      Were TIP60 substrates, H4 (or NCP), PXR, TP53, present in TIP60 condensates in vitro? It's interesting to see both PXR and TP53 had homogenous nuclear signals when expressed together with K187R, R188P (Fig 6E, 6G), or HAT (Suppl Fig S4A) mutants. Are PXR or TP53 nuclear foci dependent on their acetylation by TIP60? This can and should be tested.

      Response: Both p53 and PXR are known to be acetylated by TIP60. In the case of PXR, TIP60 acetylates PXR at lysine 170, and this TIP60-mediated acetylation of PXR at K170 is important for the formation of TIP60-PXR foci, which we now know are formed by phase separation (Bakshi K et al Sci Rep. 2017 Jun 16;7(1):3635).

      Since R188P mutant, like K187R, does not get into the nuclei, it is not suitable to use this mutant to examine the functional relevance of phase separation for TIP60. The authors need to find another mutant in IDR that retains nuclear localization and overall HAT activity but specifically disrupts phase separation. Otherwise, the conclusion needs to be restated. All cancer-derived mutants need to be tested for LLPS in vitro.

      Response: We appreciate the reviewer's point here, but it is important to note that the objective of these experiments is to understand the impact of K187R (critical for multiple aspects of TIP60, including phase separation) and R188P (a naturally occurring cancer-associated mutation that behaves similarly to K187R) on TIP60's activities, in order to determine their functional relevance. The reviewer's suggestion to identify an IDR mutant that fails to phase separate but retains nuclear localization and catalytic activity can be pursued in future studies.

      For all cellular experiments, it is not mentioned whether endogenous TIP60 was removed and absent in the cell lines used in this study. It's important to clarify this point because the localization and function of mutant TIP60 are affected by WT TIP60 (Fig 5).

      Response: Endogenous TIP60 was present in the in cellulo experiments; however, as suggested by Reviewer 1, we will perform some of the in cellulo experiments under knockdown of endogenous TIP60 to validate our findings.

      It is troubling that H4 peptide is used for in vitro HAT assay since TIP60 has much higher activity on nucleosomes and its preferred substrates include H2A.

      Response: The purpose of using the H4 peptide in the HAT assay is to determine the impact of the mutations on TIP60's catalytic activity. As H4 is one of the major histone substrates of TIP60, we believe it satisfies the objective of these experiments.

      Reviewer 3

      This study presents results arguing that the mammalian acetyltransferase Tip60/KAT5 auto-acetylates itself on one specific lysine residue before the MYST domain, which in turn favors not only nuclear localization but also condensate formation on chromatin through LLPS. The authors further argue that this modification is responsible for the bulk of Tip60 autoacetylation and acetyltransferase activity towards histone H4. Finally, they suggest that it is required for association with transcription factors and in vivo function in gene regulation and DNA damage response.

      These are very wide and important claims and, while some results are interesting and intriguing, there is not really close to enough work performed/data presented to support them. In addition, some results are redundant between them, lack consistency in the mutants analyzed, and show contradiction between them. The most important shortcoming of the study is the fact that every single experiment in cells was done in over-expressed conditions, from transiently transfected cells. It is well known that these conditions can lead to non-specific mass effects, cellular localization not reflecting native conditions, and disruption of native interactome. On that topic, it is quite striking that the authors completely ignore the fact that Tip60 is exclusively found as part of a stable large multi-subunit complex in vivo, with more than 15 different proteins. Thus, arguing for a single residue acetylation regulating condensate formation and most Tip60 functions while ignoring native conditions (and the fact that Tip60 cannot function outside its native complex) does not allow me to support this study.

      Response: We appreciate the reviewer's point here, but it is important to note that the main purpose of using an overexpression system in this study is to analyse the effects of the different point and deletion mutations we generated on TIP60. We overexpressed proteins with different tags (GFP or RFP) or without tags (Figure 3C, Figure 5) to confirm that the behaviour of the protein is not perturbed by the presence of tags. For validation, we also examined the localization of endogenous TIP60 protein, which shows the same localization behaviour as the overexpressed protein. We would like to point out that there are several reports in the literature in which similar overexpression systems are used to determine the functions of TIP60 and its mutants. In addition, the nuclear foci pattern observed for TIP60 in our studies has also been reported by several other groups.

      Sun, Y., et al. (2005) A role for the Tip60 histone acetyltransferase in the acetylation and activation of ATM. Proc Natl Acad Sci U S A, 102(37), 13182–13187.

      Kim, C.-H., et al. (2015) The chromodomain-containing histone acetyltransferase TIP60 acts as a code reader, recognizing the epigenetic codes for initiating transcription. Bioscience, Biotechnology, and Biochemistry, 79(4), 532–538.

      Wee, C. L., et al. (2014) Nuclear Arc interacts with the histone acetyltransferase Tip60 to modify H4K12 acetylation. eNeuro, 1(1). doi: 10.1523/ENEURO.0019-14.2014.

      However, as a precaution and as also suggested by the other reviewers, we will perform some of these overexpression experiments in the absence of endogenous TIP60 by using 3’ UTR-specific siRNA/shRNA.

      We thank the reviewer for the comment on the multi-subunit complex, and we will expand our study by determining the interaction of some of the complex subunits with wild-type TIP60 (which forms nuclear condensates), the TIP60 HAT mutant (which enters the nucleus but does not form condensates) and TIP60 K187R (which neither enters the nucleus nor forms condensates). We will include the results of these experiments in the revised manuscript.

      • It is known that over-expression after transient transfection can lead to non-specific acetylation of lysines on the proteins, likely in part to protect from proteasome-mediated degradation. It is not clear whether the Kac sites targeted in the experiments are based on published/public data. In that sense, it is surprising that the K327R mutant does not behave like a HAT-dead mutant (which is what exactly?) or the K187R mutant as this site needs to be auto-acetylated to free the catalytic pocket, so essential for acetyltransferase activity like in all MYST-family HATs. In addition, the effect of K187R on the total acetyl-lysine signal of Tip60 is very surprising as this site does not seem to be a dominant one in public databases.

      Response: We chose the autoacetylation sites based on previously published studies in which LC-MS/MS and in vitro acetylation assays were used to identify autoacetylation sites in TIP60, including K187. We mention this in the manuscript and cite the references (1. Yang, C., et al (2012). Function of the active site lysine autoacetylation in Tip60 catalysis. PloS one 7, e32886. 10.1371/journal.pone.0032886. 2. Yi, J., et al (2014). Regulation of histone acetyltransferase TIP60 function by histone deacetylase 3. The Journal of biological chemistry 289, 33878–33886. 10.1074/jbc.M114.575266.). We would like to emphasize that both of these studies identified K187 as an autoacetylation site in TIP60. Since the TIP60 HAT mutant (with significantly reduced catalytic activity) can also enter the nucleus, it is not surprising that the K327R mutant can also enter the nucleus.

      • As the physiological relevance of the results is not clear, the mutants need to be analyzed at the native level of expression to study real functional effects on transcription and localization (ChIP/IF). It is not clear what the claim that Tip60 forms nuclear foci/punctate signals at physiological levels is based on. This is certainly debated, in part because of the poor choice of antibodies available for IF analysis. In that sense, it is not clear which Ab is used in the Westerns. Endogenous Tip60 is known to be expressed in multiple isoforms from splice variants, the most dominant one being isoform 2 (PLIP), which lacks a big part (aa96-147) of the so-called IDR domain presented in the study. Does this major isoform behave the same?

      Response: The TIP60 antibody used in this study is from Santa Cruz (Cat. No. sc-166323). This antibody is widely used for TIP60 detection by several methods and has been cited in numerous publications. The catalogue number will be given in the manuscript. Regarding isoforms, three isoforms of TIP60 are known, among which isoform 2 is the most highly expressed and is the one used in our study. Isoforms 1 and 2 have IDRs of the same length (150 amino acids), while isoform 3 has an IDR of 97 amino acids. Interestingly, K187 is present in all isoforms (as already mentioned in the manuscript), and the region missing in isoform 3 (amino acids 96-147) has a lower propensity for disorder (marked with a blue circle). This clearly shows that all isoforms of TIP60 have the tendency to phase separate.

      Author response image 1.

      • It is extremely strange to show that the K187R mutant fails to get in the nuclei by cell imaging but remains chromatin-bound by fractionation... If K187 is auto-acetylated and required to enter the nucleus, why would a HAT-dead mutant not behave the same?

      Response: We would like to point out that neither the HAT mutant nor the K187R mutant is completely catalytically dead. As our data show, both mutants retain catalytic activity, although at significantly decreased levels. We believe that K187 acetylation is critical for TIP60 to enter the nucleus and that, once TIP60 has shuttled into the nucleus, autoacetylation of other sites is required for its efficient chromatin binding. In the fractionation assay, the nuclear membrane is dissolved while the soluble fraction is prepared, so nothing prevents the K187R mutant from accessing chromatin. The HAT mutant, in contrast, can acetylate the K187 site and is thus able to enter the nucleus; however, its residual catalytic activity is either unable to autoacetylate the other residues required for efficient chromatin binding or unable to counter the activity of HDACs deacetylating TIP60.

      • If K187 acetylation is key to Tip60 function, it would be most logical (and classical) to test a K187Q acetyl-mimic substitution. In that sense, what happens with the R188Q mutant? That all goes back to the fact that this cluster of basic residues looks quite like an NLS.

      Response: As suggested, we will generate an acetylation-mimicking mutant at the K187 site and examine it. The results will be added to the revised manuscript.

      • The effect of the mutant on the TIP60 complex itself needs to be analyzed, e.g. for associated subunits like p400, ING3, TRRAP, Brd8...

      Response: As suggested, we will examine the effect of the mutations on the TIP60 complex.

    1. Author Response:

      Reviewer #1:

      Summary:

      This research study utilizes a realistic motoneuron model to explore the potential to trace back the appropriate levels of excitation, inhibition, and neuromodulation in the firing patterns of motoneurons observed in in-vitro and in-vivo experiments in mammals. The research employs high-performance computing power to achieve its objectives. The work introduces a new framework that enhances understanding of the neural inputs to motoneuron pools, thereby opening up new avenues for hypothesis testing research.

      Strengths: The significance of the study holds relevance for all neuroscientists. Motoneurons are a unique class of neurons with known distribution of outputs for a wide range of voluntary and involuntary motor commands, and their physiological function is precisely understood. More importantly, they can be recorded in-vivo using minimally invasive methods, and they are directly impacted by many neurodegenerative diseases at the spinal cord level. The computational framework developed in this research offers the potential to reverse engineer the synaptic input distribution when assessing motor unit activity in humans, which holds particular importance. Overall, the strength of the findings focuses on providing a novel framework for studying and understanding the inputs that govern motoneuron behavior, with broad applications in neuroscience and potential implications for understanding neurodegenerative diseases. It highlights the significance of the study for various research domains, making it valuable to the scientific community.

      Weaknesses: The exact levels of inhibition, excitation, and neuromodulatory inputs to neural networks are unknown. Therefore the work is based on fine-tuned measures that are indirectly based on experimental results. However, obtaining such physiological information is challenging and currently impossible. From a computational perspective it is a challenge that in theory can be solved. Thus, although we have no ground-truth evidence, this framework can provide compelling evidence for all hypothesis testing research and potentially solve this physiological problem with the use of computers.

      We agree with the reviewer. This work was intended to determine the feasibility of reverse engineering motor unit firing patterns using neuron models with a high degree of realism. Given that the results support this feasibility, our model and technique can serve both to construct new hypotheses and to test them.

      Reviewer #2:

      The study presents an extensive computational approach to identify the motor neuron input from the characteristics of single motor neuron discharge patterns during a ramp up/down contraction. This reverse engineering approach is relevant due to limitations in our ability to estimate this input experimentally. Using well-established models of single motor neurons, a (very) large number of simulations were performed that allowed identification of this relation. In this way, the results enable researchers to measure motor neuron behavior and from those results determine the underlying neural input scheme. Overall, the results are very convincing and represent an important step forward in understanding the neural strategies for controlling movement.

      Nevertheless, I would suggest that the authors consider the following recommendations to strengthen the message further. First, I believe that the relation between individual motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties can be illustrated more clearly. Although this is explained in the text, I believe that this is not optimally supported by figures. Figure 6 to some extent shows this, but Figures 8 and 9 as well as Table 1 show primarily the goodness of fit rather than the actual fit.

      We agree with the reviewer that showing the relationship between the motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties would be a valuable addition to the manuscript. Because the regression models are multidimensional (7 inputs and 3 outputs), it is difficult to show this relationship in a static image. We therefore chose to show the goodness of fit, even though it is more abstract and less intuitive. We added a supplemental diagram to Figure 8 to show the structure of the fitted reverse-engineered model (see Figure 8D).

      Author response image 1: Figure 8. Residual plots showing the goodness of fit of the different predicted values: (A) Inhibition, (B) Neuromodulation and (C) excitatory Weight Ratio. The summary plots are for the models showing the highest R² results in Table 1. The predicted values are calculated using the features extracted from the firing rates (see Figure 7, section Machine learning inference of motor pool characteristics and Regression using motoneuron outputs to predict input organization). Diagram (D) shows the multidimensionality of the RE models (see Model fits) which have 7 feature inputs (see Feature Extraction) predicting 3 outputs (Inhibition, Neuromodulation and Weight Ratio).

      Second, I would have expected the discussion to have addressed specifically the question of which of the two primary schemes (push-pull, balanced) is the most prevalent. This is the main research question of the study, but it is to some degree left unanswered. Now that the authors have identified the relation between the characteristics of motor neuron behaviors (which has been reported in many previous studies), why not exploit this finding by summarizing the results of previous studies (at least a few representative ones) and discuss the most likely underlying input scheme? Is there a consistent trend towards one of the schemes, or are both strategies commonly used?

      We agree with the reviewer that our discussion should have addressed which of the two primary schemes – push-pull or balanced – is the most prevalent. At first glance, the upper right of Figure 6 looks the most realistic when compared to real data. We would thus expect the push-pull scheme to dominate for the given task. We added a brief section (Push-Pull vs Balance Motor Command) in the discussion to address the reviewer’s comments. This section is not exhaustive but frames the debate using relevant literature. We are also now preparing to apply these techniques to real data.

      In addition, it seems striking to me that highly non-linear excitation profiles are necessary to obtain a linear CST ramp in many model configurations. Although somewhat speculative, one may expect that an approximately linear relation is desired for robust and intuitive motor control. It seems to me that humans generally have a good ability to accurately grade the magnitude of the motor output, which implies that either a non-linear relation has been learnt (complex task), or that the central nervous system can generally rely on a somewhat linear relation between the neural drive to the muscle and the output (simpler task).

      We agree with the reviewer, and we were surprised by these results. Our motoneuron pool is equipped with persistent inward currents (PICs) which are nonlinear. Therefore, for the motoneuron to produce a linear output the central nervous system would have to incorporate these nonlinearities into its commands.

      Following this reasoning, it could also be interesting to report for which input scheme the excitation profile is most linear. I understand that this is not the primary aim of the study, but it may be an interesting way to elaborate on the finding that in many cases non-linear excitation profiles were needed to produce the linear ramp.

      This is a very interesting point. The most realistic firing patterns – with respect to human data – are found in the parameter regions in the upper right in Figure 6, which in fact produce the most nonlinear input (see push-pull pattern in Figure 4C). However, in future studies we hope to separate the total motor command illustrated here into descending and feedback commands. This may result in a more linear descending drive.
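      One simple way to act on the reviewer's suggestion would be to score each excitation profile by the R² of a straight-line fit over the ascending limb of the ramp, where 1.0 means perfectly linear. The sketch below is hypothetical. The two profiles are synthetic examples and not model outputs, and the metric is an illustrative choice rather than one used in the paper.

```python
def linearity_r2(t, e):
    """R^2 of the least-squares straight line through (t, e); 1.0 means perfectly linear."""
    n = len(t)
    mt, me = sum(t) / n, sum(e) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    sxy = sum((ti - mt) * (ei - me) for ti, ei in zip(t, e))
    slope = sxy / sxx
    intercept = me - slope * mt
    ss_res = sum((ei - (intercept + slope * ti)) ** 2 for ti, ei in zip(t, e))
    ss_tot = sum((ei - me) ** 2 for ei in e)
    return 1.0 - ss_res / ss_tot

# Hypothetical excitation profiles over the ascending limb (time normalized to 0..1).
t = [i / 100 for i in range(101)]
linear_profile = [0.2 + 0.8 * ti for ti in t]
curved_profile = [ti ** 3 for ti in t]

for name, profile in (("linear", linear_profile), ("curved", curved_profile)):
    print(name, round(linearity_r2(t, profile), 3))
```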

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It will be interesting to monitor the levels of another MIM insertase namely, OXA1. This will help to understand whether some of the observed changes in levels of OXPHOS subunits are related to alterations in the amounts of this insertase.

      OXA1 was not detected in the untargeted mass spectrometry analysis, most likely because it is a polytopic membrane protein, spanning the membrane five times (1,2). Consequently, we measured OXA1 levels by immunoblotting, comparing patient fibroblasts to healthy control (HC) fibroblasts. No significant change in OXA1 steady state levels was observed.

      See the results below. These results will be added and discussed in the revised manuscript.

      Author response image 1.

      (2) Figure 3: How do the authors explain that although TIMM17 and TIMM23 were found to be significantly reduced by Western analysis they were not detected as such by the Mass Spec. method?

      The untargeted mass spectrometry in the current study failed to detect TIMM17 in both patient fibroblasts and mouse neurons. TIMM23 was detected only in mouse neurons, where it showed a decrease that was not significant. This is most likely because TIMM17 and TIMM23 are both polytopic membrane proteins, spanning the membrane four times, which makes it difficult to extract them in quantities suitable for MS detection (2,3).

      (3) How do the authors explain the higher levels of some proteins in the TIMM50 mutated cells?

      The levels of fully functional TIM23 complex are decreased in patients' fibroblasts. Therefore, the increased steady state level of some TIM23 substrate proteins can only be explained by events that occur outside the mitochondria. These could include increased transcription, translation or post-translational modifications, all of which may raise their steady state level despite the decrease in the steady state level of the import complex.

      (4) Can the authors elaborate on why mutated cells are impaired in their ability to switch their energetic emphasis to glycolysis when needed?

      Cellular regulation of the metabolic switch to glycolysis occurs via two known pathways: 1) activation of AMP-activated protein kinase (AMPK) by increased levels of AMP/ADP (4); 2) inhibition of pyruvate dehydrogenase (PDH) complexes by pyruvate dehydrogenase kinases (PDK) (5). Therefore, changes in the steady state levels of any of these regulators could push the cells towards anaerobic energy production when needed. In our model systems, we did not observe changes in any of the AMPK, PDH or PDK subunits that were detected in our untargeted mass spectrometry analysis (see volcano plots below; no PDK subunits were detected in patient fibroblasts). Although this does not directly explain why the cells have an impaired ability to switch their energetic emphasis, it may explain why the switch did not occur de facto.

      Author response image 2.

      Reviewer #2 (Public Review):

      (1) The authors claim in the abstract, the introduction, and the discussion that TIMM50 and the TIM23 translocase might not be relevant for mitochondrial protein import in mammals. This is misleading and certainly wrong!!!

      Indeed, it was not our intention to claim that the TIM23 complex might not be relevant. We have now rewritten the relevant parts to convey the correct message:

      Abstract – 

      Line 25 - “Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its putative substrates, suggesting that even low levels of a functional TIM23 complex are sufficient to maintain the majority of the complex-dependent mitochondrial proteome.”

      Introduction – 

      Line 87 - “Surprisingly, functional and physiological analysis points to the possibility that low levels of TIM23 complex core subunits (TIMM50, TIMM17 and TIMM23) are sufficient for maintaining steady-state levels of most presequence-containing proteins. However, the reduced TIM23CORE component levels do affect some critical mitochondrial properties and neuronal activity.”

      Discussion – 

      Line 339 – “…surprising, as normal TIM23 complex levels are suggested to be indispensable for the translocation of presequence-containing mitochondrial proteins…”

      Line 344 – “…it is possible that unlike what occurs in yeast, normal levels of mammalian TIMM50 and TIM23 complex are mainly essential for maintaining the steady state levels of intricate complexes/assemblies.”

      Line 396 – “In summary, our results suggest that even low levels of TIMM50 and TIM23CORE components suffice to maintain the majority of the mitochondrial matrix and inner membrane proteome. Nevertheless, reductions in TIMM50 levels led to a decrease of many OXPHOS and MRP complex subunits, which indicates that normal TIMM50 levels might be mainly essential for maintaining the steady state levels and assembly of intricate complex proteins.”

      (1) Homberg B, Rehling P, Cruz-Zaragoza LD. The multifaceted mitochondrial OXA insertase. Trends Cell Biol. 2023;33(9):765–72. 

      (2) Carroll J, Altman MC, Fearnley IM, Walker JE. Identification of membrane proteins by tandem mass spectrometry of protein ions. Proc Natl Acad Sci U S A. 2007;104(36):14330–5.

      (3) Dekker PJT, Keil P, Rassow J, Maarse AC, Pfanner N, Meijer M. Identification of MIM23, a putative component of the protein import machinery of the mitochondrial inner membrane. FEBS Lett. 1993;330(1):66–70. 

      (4) Trefts E, Shaw RJ. AMPK: restoring metabolic homeostasis over space and time. Mol Cell. 2021;81(18):3677–90. https://doi.org/10.1016/j.molcel.2021.08.015

      (5) Zhang S, Hulver MW, McMillan RP, Cline MA, Gilbert ER. The pivotal role of pyruvate dehydrogenase kinases in metabolic flexibility. Nutr Metab. 2014;11(1):1–9.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript accordingly to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 61-64):

      While collective movement has been extensively studied in various species, including insect swarming, fish schooling, and bird murmuration (Pitcher, Partridge and Wardle, 1976; Partridge, 1982; Strandburg-Peshkin et al., 2013; Pearce et al., 2014; Rosenthal, Twomey, Hartnett, Wu, Couzin, et al., 2015; Bastien and Romanczuk, 2020; Davidson et al., 2021; Aidan, Bleichman and Ayali, 2024), as well as in swarm robotics agents performing tasks such as coordinated navigation and maze-solving (Faria Dias et al., 2021; Youssefi and Rouhani, 2021; Cheraghi, Shahzad and Graffi, 2022), most studies have focused on movement algorithms, often assuming full detection of neighbors (Parrish and Edelstein-Keshet, 1999; Couzin et al., 2002, 2005; Sumpter et al., 2008; Nagy et al., 2010; Bialek et al., 2012; Gautrais et al., 2012; Attanasi et al., 2014). Some models have incorporated limited interaction rules where individuals respond to one or a few neighbors due to sensory constraints (Bode, Franks and Wood, 2011; Jhawar et al., 2020). However, fewer studies explicitly examine how sensory interference, occlusion, and noise shape decision-making in collective systems (Rosenthal et al., 2015).

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference'). This is confusing, as it is not clear whether the authors refer to interference in the physics/acoustics sense, or broadly as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      · Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      · Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      · Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 87-94 and 329-330). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      · Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      · Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the “Call Level” parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. This is now explicitly referenced in the Discussion.

      We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections.
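      The call rates quoted above follow directly from the inter-pulse intervals (rate = 1000 / IPI in ms). The tested call levels can be converted to sound pressure using the standard 20 µPa reference for dB SPL in air. The small sketch below performs both conversions using only the values stated in this response. The dictionary layout and function names are illustrative.

```python
# Inter-pulse intervals quoted in this response (see Table 2 of the manuscript).
IPI_MS = {"search": 100.0, "end of approach": 35.0, "buzz": 5.0}
# Call levels tested (Table 1, "Call Level").
CALL_LEVELS_DB_SPL = [100, 110, 120, 130]
P_REF_PA = 20e-6  # standard reference pressure for dB SPL in air

def call_rate_hz(ipi_ms):
    """Calls per second for an inter-pulse interval given in milliseconds."""
    return 1000.0 / ipi_ms

def pressure_pa(level_db_spl):
    """RMS sound pressure in pascals for a level in dB SPL (re 20 uPa)."""
    return P_REF_PA * 10 ** (level_db_spl / 20)

for phase, ipi in IPI_MS.items():
    print(f"{phase}: IPI {ipi:g} ms -> {call_rate_hz(ipi):.1f} calls/s")
for level in CALL_LEVELS_DB_SPL:
    print(f"{level} dB SPL -> {pressure_pa(level):.1f} Pa")
```

      Each 10 dB step multiplies the pressure amplitude by about 3.16. The tested 100 to 130 dB SPL range therefore spans roughly a 32-fold change in pressure.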

      Reviewer #2 (Public review):

      We are grateful for the reviewer’s insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract away features that would complicate the model without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me.

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model, the head is aligned with the body, therefore the direction of the echolocation beam is the same as the direction of the flight.

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 460-465.
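      For readers unfamiliar with the piston model, the sketch below evaluates the standard far-field directivity of a baffled circular piston, 2·J1(ka·sinθ)/(ka·sinθ), where k = 2πf/c and a is the emitter radius. It only illustrates why higher call frequencies give a narrower beam. The 3 mm radius and the two frequencies are hypothetical values, and the manuscript's exact parameterization may differ. J1 is computed from Bessel's integral, so the example needs only the standard library.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at about 20 degrees C

def bessel_j1(x, n=400):
    """J1(x) from Bessel's integral, J1(x) = (1/pi) * int_0^pi cos(tau - x sin tau) dtau,
    evaluated with composite Simpson's rule (n must be even)."""
    h = math.pi / n
    total = 0.0
    for i in range(n + 1):
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        tau = i * h
        total += w * math.cos(tau - x * math.sin(tau))
    return total * h / 3 / math.pi

def piston_directivity(freq_hz, theta_rad, radius_m):
    """Far-field pressure directivity of a baffled circular piston: 2*J1(x)/x, x = k*a*sin(theta)."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND
    x = k * radius_m * math.sin(theta_rad)
    if abs(x) < 1e-12:
        return 1.0  # on-axis limit
    return 2 * bessel_j1(x) / x

MOUTH_RADIUS = 0.003  # m, hypothetical emitter radius for illustration only
for f in (40e3, 80e3):
    d = piston_directivity(f, math.radians(30), MOUTH_RADIUS)
    print(f"{f / 1e3:.0f} kHz at 30 deg: {20 * math.log10(abs(d)):.1f} dB re on-axis")
```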

      If so, what is the difference between phi_target and phi_tx in the model equations?

      represents the angle between the bat and the reflected object (target).

      the angle [rad], between the masking bat and target (from the transmitter’s perspective)

      refers to the angle between the transmitting conspecific and the receiving focal bat, from the transmitter’s point of view.

      represents the angle between the receiving bat and the transmitting bat, from the receiver’s point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 467-468). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      Author response image 1.

      What is a bat's response to colliding with a conspecific (rather than a wall)?

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldstein et al., 2024). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during a collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between the bats exceeds the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance; we therefore did not model post-collision dynamics.
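      The collision rule described above (slow down and keep turning until the separation exceeds 0.4 m) can be sketched as a per-time-step update. Only the 0.4 m threshold comes from this response. The deceleration factor, turn step, minimum speed, and the choice to turn away from the side the neighbour is on are hypothetical placeholders, not the model's actual values.

```python
import math
from dataclasses import dataclass

COLLISION_DISTANCE_M = 0.4          # threshold stated in the response
DECEL_FACTOR = 0.8                  # hypothetical per-step speed reduction
TURN_STEP_RAD = math.radians(15)    # hypothetical per-step heading change
MIN_SPEED = 1.0                     # hypothetical speed floor, m/s

@dataclass
class Bat:
    x: float
    y: float
    heading: float  # rad
    speed: float    # m/s

def collision_step(bat, other, dt):
    """One update: while closer than the threshold, slow down and turn away.
    Returns False once the bats are separated (normal flight rules would then apply)."""
    dx, dy = other.x - bat.x, other.y - bat.y
    if math.hypot(dx, dy) >= COLLISION_DISTANCE_M:
        return False
    bat.speed = max(MIN_SPEED, bat.speed * DECEL_FACTOR)
    # Cross product of heading and offset: positive means the neighbour is on the left.
    side = math.cos(bat.heading) * dy - math.sin(bat.heading) * dx
    bat.heading += -TURN_STEP_RAD if side > 0 else TURN_STEP_RAD
    bat.x += bat.speed * math.cos(bat.heading) * dt
    bat.y += bat.speed * math.sin(bat.heading) * dt
    return True

# Usage: a focal bat starts 0.3 m from a (stationary, for simplicity) neighbour.
focal = Bat(0.0, 0.0, 0.0, 5.0)
neighbour = Bat(0.3, 0.05, math.pi, 0.0)
steps = 0
while collision_step(focal, neighbour, dt=0.01) and steps < 1000:
    steps += 1
print(steps, round(focal.speed, 2))
```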

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both?

      The number of repetitions for each scenario is detailed in Table 1, but we have now also included it in a more prominent location in the text for clarity. Specifically, we now state (Lines 274-275):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials.

      We have clarified this in the revised text (Lines 534-535, Statistical Analysis).
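      The sketch below makes the reviewer's distinction concrete. It first computes the number of simulated individuals per scenario from the repetition counts above (bats × repetitions). It then contrasts a standard error pooled over all individuals, as described in this response, with one computed over independent trial means. The per-bat outcomes are hypothetical. Because bats in the same trial interact, the two estimates can differ.

```python
import math
from statistics import mean, stdev

# Repetitions per scenario, as stated in the revised text (Lines 274-275).
REPETITIONS = {1: 240, 2: 120, 5: 48, 10: 24, 20: 12, 40: 12, 100: 6}
individuals = {n_bats: n_bats * reps for n_bats, reps in REPETITIONS.items()}
print(individuals)  # {1: 240, 2: 240, 5: 240, 10: 240, 20: 240, 40: 480, 100: 600}

def standard_error(values):
    return stdev(values) / math.sqrt(len(values))

# Hypothetical per-bat exit outcomes (1 = exited), grouped by simulation trial.
trials = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
]
pooled = [x for trial in trials for x in trial]
se_pooled = standard_error(pooled)                               # across all individuals
se_between = standard_error([mean(trial) for trial in trials])   # across trial means
print(f"pooled: n={len(pooled)}, SE={se_pooled:.3f}")
print(f"between trials: n={len(trials)}, SE={se_between:.3f}")
```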

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer’s thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that, in general, our model simplifies some of the bats’ abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This line of reasoning recurs in several of the answers below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m²), Pipistrellus kuhlii and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation?

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 430-447).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect?

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats operate in the approach phase nearly all the time. This is consistent with natural cave emergence, where bats navigate through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma microphyllum), we also have empirical recordings of individuals flying under similar conditions (Goldstein et al., 2024). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities.

      We actually used logarithmic frequency-modulated (FM) chirps, generated using the MATLAB function chirp(t, f0, t1, f1, 'logarithmic') from the Signal Processing Toolbox. This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted it in the revised version (see Lines 447-449 in Methods).
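      For readers working outside MATLAB, a logarithmic FM sweep of the kind described above can be sketched in numpy. The function below uses the same frequency law as the logarithmic option of MATLAB's chirp, f(t) = f0·(f1/f0)^(t/t1), with the phase obtained by integrating f(t). SciPy's scipy.signal.chirp with method='logarithmic' offers the same sweep. The sampling rate, duration, and 80-to-40 kHz band are hypothetical, not the measured parameters of either species.

```python
import numpy as np

def log_chirp(t, f0, t1, f1):
    """Logarithmic FM sweep with instantaneous frequency f(t) = f0 * (f1/f0)**(t/t1).
    The phase is 2*pi times the integral of f(t); down-sweeps (f1 < f0) work too."""
    ratio = f1 / f0
    phase = 2 * np.pi * f0 * t1 / np.log(ratio) * (ratio ** (t / t1) - 1.0)
    return np.cos(phase)

def inst_freq(t, f0, t1, f1):
    """Instantaneous frequency (Hz) of the sweep at time t."""
    return f0 * (f1 / f0) ** (t / t1)

FS = 500_000                  # Hz, hypothetical sampling rate
DURATION = 0.003              # s, hypothetical call duration
F_START, F_END = 80e3, 40e3   # Hz, hypothetical down-sweep
t = np.arange(round(FS * DURATION)) / FS
call = log_chirp(t, F_START, DURATION, F_END)
print(call.shape, inst_freq(DURATION / 2, F_START, DURATION, F_END))
```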

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant’s method using a filter bank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.
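      The filter-bank (Saillant) detector is not reproduced here. As a simpler, swapped-in illustration of how bandwidth enters range estimation, the sketch below uses a plain matched filter (cross-correlation with the emitted call) to recover echo range from delay. It also computes the textbook range-resolution estimate c/(2B). The call parameters, noise level, target range, and 40 kHz bandwidth are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
FS = 500_000             # Hz, hypothetical sampling rate

def make_call(f0=80e3, f1=40e3, dur=0.002):
    """Hypothetical logarithmic FM call used only as the matched-filter template."""
    t = np.arange(round(FS * dur)) / FS
    r = f1 / f0
    return np.cos(2 * np.pi * f0 * dur / np.log(r) * (r ** (t / dur) - 1.0))

def estimate_range(received, template):
    """Matched filter: cross-correlate with the emitted call; the peak lag is the echo delay."""
    xc = np.correlate(received, template, mode="valid")
    lag = int(np.argmax(np.abs(xc)))
    return SPEED_OF_SOUND * (lag / FS) / 2  # two-way travel time -> one-way range

rng = np.random.default_rng(1)
call = make_call()
true_range = 1.5  # m, hypothetical target distance
delay_samples = round(2 * true_range / SPEED_OF_SOUND * FS)
received = np.zeros(delay_samples + len(call) + 2000)
received[delay_samples:delay_samples + len(call)] += 0.1 * call  # attenuated echo
received += 0.05 * rng.normal(size=received.size)                # background noise

est = estimate_range(received, call)
range_resolution = SPEED_OF_SOUND / (2 * 40e3)  # c / (2B) for a 40 kHz bandwidth
print(f"estimated range {est:.3f} m, nominal resolution {range_resolution * 1000:.1f} mm")
```

      Because the resolution scales as 1/B, a call with ten times the bandwidth would, by this estimate, resolve targets ten times closer together.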

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are ~1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely reduce jamming overall rather than exacerbate it, in line with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in localization errors of 2-4 cm (i.e., 200-400 µs) (Boonman, Parsons and Jones, 2003).

      We have now explicitly highlighted this in the revised version (see Lines 468-470).
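      The order of magnitude of the Doppler shift can be checked with the classical formula for a moving source and a moving receiver in still air. The call frequency and closing speeds below are hypothetical values chosen for illustration, not parameters taken from the model.

```python
SPEED_OF_SOUND = 343.0  # m/s

def doppler_received_freq(f_emit, v_receiver_toward, v_source_toward, c=SPEED_OF_SOUND):
    """Classical Doppler shift for a moving source and receiver in still air.
    Velocities are components along the line joining them, positive when moving toward the other."""
    return f_emit * (c + v_receiver_toward) / (c - v_source_toward)

f = 50e3   # Hz, hypothetical call frequency
v = 3.5    # m/s, hypothetical closing speed of each bat
shift = doppler_received_freq(f, v, v) - f
print(f"shift for two bats closing at {v} m/s each: {shift:.0f} Hz")
```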

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation.

      We fully agree that the natural variations in call design between the phases contribute significantly to interference reduction (see our discussion in a previous paper in Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming.

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3 dB.

      The reviewer is correct. Indeed, integration over multiple calls improves signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
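      The 3 dB-per-doubling figure can be illustrated with a minimal sketch that assumes the ideal case: the echo is perfectly aligned across calls, and the noise is independent from call to call. The function names are hypothetical. The integration actually used in the model (described in the Methods) need not be simple signal averaging, so this is an idealized illustration only.

```python
import math
import random


def ideal_gain_db(n_obs: int) -> float:
    """SNR gain (dB) from averaging n_obs aligned observations with independent noise.

    Averaging keeps the signal and divides the noise variance by n_obs, so the
    power SNR grows by n_obs, i.e. 10*log10(n_obs) dB (about 3 dB per doubling).
    """
    return 10.0 * math.log10(n_obs)


def simulated_gain_db(n_obs: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check with unit-variance Gaussian noise and a constant signal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        mean_noise = sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
        total += mean_noise * mean_noise
    noise_var = total / trials  # variance of the averaged noise (its true mean is 0)
    # Single-observation noise variance is 1, so the gain is 10*log10(1 / noise_var).
    return -10.0 * math.log10(noise_var)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(f"N={n:2d}: ideal {ideal_gain_db(n):5.2f} dB, simulated {simulated_gain_db(n):5.2f} dB")
```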

      They also claim, although it is almost an afterthought, that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1 s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10 m. This distance is large compared to the 0.4 m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem?

      As described in the Methods section, the bat’s collision avoidance response does not rely solely on the integration process. Instead, the model uses real-time echoes from the most recent calls, independently of the integration process, for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.
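      A minimal sketch of the two-pathway logic described above might look as follows. The avoidance distances (1.5 m from walls, 0.4 m from other bats) are those given in this response. The Detection record, the fixed evasive turn and the function names are hypothetical and are not taken from the authors' code.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Avoidance distances quoted in this response; everything else here is hypothetical.
WALL_AVOID_M = 1.5
BAT_AVOID_M = 0.4
EVASIVE_TURN_RAD = 0.5  # hypothetical fixed evasive turn


@dataclass
class Detection:
    distance_m: float     # range to the reflector, from the most recent call
    bearing_rad: float    # direction relative to heading (positive = to the left)
    is_conspecific: bool  # True if the echo is attributed to another bat


def reactive_turn(latest: List[Detection]) -> Optional[float]:
    """Turn away (+1 left, -1 right) from the nearest echo inside its avoidance
    distance; None if nothing is that close. Uses only the latest call, so this
    pathway has no integration lag."""
    threats = [d for d in latest
               if d.distance_m < (BAT_AVOID_M if d.is_conspecific else WALL_AVOID_M)]
    if not threats:
        return None
    nearest = min(threats, key=lambda d: d.distance_m)
    return -1.0 if nearest.bearing_rad > 0 else 1.0


def heading_change(latest: List[Detection], planned_heading_rad: float,
                   current_heading_rad: float) -> float:
    """Reactive avoidance overrides the slower plan built from integrated echoes."""
    turn = reactive_turn(latest)
    if turn is not None:
        return turn * EVASIVE_TURN_RAD
    delta = planned_heading_rad - current_heading_rad
    return math.atan2(math.sin(delta), math.cos(delta))  # wrap to (-pi, pi]
```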

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat’s surroundings.
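      The frame change described above amounts to a rotation plus a translation. The sketch below assumes a 2D world, with the bat heading measured counter-clockwise from the x-axis and the echo bearing measured relative to that heading. These conventions and the function name are illustrative assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def to_allocentric(bat_xy: Point, heading_rad: float,
                   echo_range_m: float, echo_bearing_rad: float) -> Point:
    """Map an echo at (range, bearing) relative to the bat into world x-y.

    heading_rad: bat heading, counter-clockwise from the world x-axis.
    echo_bearing_rad: echo direction, counter-clockwise from the bat's heading.
    """
    angle = heading_rad + echo_bearing_rad
    x, y = bat_xy
    return (x + echo_range_m * math.cos(angle), y + echo_range_m * math.sin(angle))


if __name__ == "__main__":
    # The same (hypothetical) wall point at (2, 0), seen from two positions along a flight path
    print(to_allocentric((0.0, 0.0), 0.0, 2.0, 0.0))  # (2.0, 0.0)
    print(to_allocentric((1.0, 1.0), 0.0, math.sqrt(2), -math.pi / 4))
```

A static wall point maps to (nearly) the same world coordinates from successive positions, whereas echoes from moving conspecifics do not. This is why detections stored in this frame can be clustered across calls.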

      See lines 518-523 in the revised version.

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is no explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically they should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach.

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      · Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlapping calls and echoes and making jamming more severe. As described in the text, the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m, as observed in Myotis grisescens and Tadarida brasiliensis (Fujioka et al., 2021; Sabol and Hudson, 1995; Betke et al., 2008; Gillam et al., 2010).

      · Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.
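      The density argument can be made concrete with the textbook nearest-neighbour distances for randomly (Poisson) placed points: 1/(2√λ) in a plane of density λ, and Γ(4/3)(4πρ/3)^(-1/3) in a volume of density ρ. The sketch below compares 100 bats on a hypothetical 100 m² floor with the same bats spread through a 3 m tall volume above it. It ignores edge effects and the spacing produced by avoidance behaviour, so the absolute values are not comparable to the 0.27 m reported for the simulation. Only the direction of the effect is meant to carry over.

```python
import math


def mean_nn_distance_2d(n_bats: int, area_m2: float) -> float:
    """Expected nearest-neighbour distance for points scattered at random in a plane."""
    density = n_bats / area_m2  # bats per m^2
    return 1.0 / (2.0 * math.sqrt(density))


def mean_nn_distance_3d(n_bats: int, volume_m3: float) -> float:
    """Expected nearest-neighbour distance for points scattered at random in a volume."""
    density = n_bats / volume_m3  # bats per m^3
    return math.gamma(4.0 / 3.0) * (4.0 * math.pi * density / 3.0) ** (-1.0 / 3.0)


if __name__ == "__main__":
    n, floor_area, height = 100, 100.0, 3.0  # hypothetical arena, not the model's
    print(f"2D: {mean_nn_distance_2d(n, floor_area):.2f} m")
    print(f"3D: {mean_nn_distance_3d(n, floor_area * height):.2f} m")
```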

      To address the reviewer’s concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods, lines 407-412).

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem.

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler and Kalko, 2001; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chiu, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study (see Results, Figure 4: The impact of confusion on performance, and lines 345-355 in the Discussion).

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines XX in the manuscript for further discussion.
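      The integration algorithm is not spelled out in this reply, so the following is only a hypothetical sketch of temporal integration with outlier rejection. It keeps allocentric detections from the last few calls and retains a detection only if detections from enough other calls fall close to it. The window length, radius and support threshold are illustrative assumptions, not the model's values.

```python
import math
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, float]


class EchoIntegrator:
    """Keep world-frame detections from the last `window` calls and drop outliers.

    A detection is kept only if detections from at least `min_support` other calls
    lie within `radius_m` of it. Static walls reappear at about the same x-y on
    successive calls; echoes from moving conspecifics tend not to.
    """

    def __init__(self, window: int = 5, radius_m: float = 0.1, min_support: int = 2):
        self.calls: Deque[List[Point]] = deque(maxlen=window)  # oldest call drops out
        self.radius_m = radius_m
        self.min_support = min_support

    def add_call(self, detections: List[Point]) -> None:
        self.calls.append(list(detections))

    def _support(self, p: Point, call_index: int) -> int:
        """Number of other calls with at least one detection near p."""
        count = 0
        for j, call in enumerate(self.calls):
            if j != call_index and any(math.dist(p, q) <= self.radius_m for q in call):
                count += 1
        return count

    def stable_points(self) -> List[Point]:
        return [p for i, call in enumerate(self.calls) for p in call
                if self._support(p, i) >= self.min_support]
```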

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to ensure coherent flight trajectories while maintaining a reasonable collision rate. These distances provide a balance between maneuverability and stability, preventing erratic flight patterns while still enabling effective obstacle avoidance. In the revised paper, we have added supplementary figures illustrating the effect of model parameters on performance, specifically focusing on the avoidance distance.

      The 15-second exit limit was determined as described in the text (Lines 403-404): “A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed.” In other words, it allowed each bat to circle the ‘cave’ twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer (measuring the time taken for a certain percentage of bats to exit) is also valid. However, in our model, some outlier bats fail to exit and continue flying for many minutes. Such simulations would lead to excessive simulation times, making it difficult to generate repetitions while teaching us little; these cases usually resulted from a bat slightly missing the opening (see Video S1). Our chosen approach keeps runtimes practical while still capturing the relevant performance metrics.
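      The difference between the two performance measures can be sketched on hypothetical exit times, with bats that never exit recorded as infinite. An X% exit-time criterion is undefined (infinite) whenever more than (100 - X)% of bats fail to exit. An exit probability within a fixed window, by contrast, is always finite. The data and function names below are invented for illustration.

```python
import math
from typing import List


def exit_probability(exit_times_s: List[float], limit_s: float = 15.0) -> float:
    """Fraction of bats that exited within limit_s (bats that never exit are math.inf)."""
    return sum(1 for t in exit_times_s if t <= limit_s) / len(exit_times_s)


def time_to_fraction(exit_times_s: List[float], fraction: float) -> float:
    """Earliest time by which `fraction` of all bats had exited; math.inf if never reached."""
    ordered = sorted(exit_times_s)
    needed = math.ceil(fraction * len(ordered))  # number of bats that must be out
    return 0.0 if needed == 0 else ordered[needed - 1]


if __name__ == "__main__":
    # Hypothetical run with 10 bats; two never find the opening.
    times = [4.0, 5.5, 6.0, 7.2, 8.1, 9.0, 12.5, 16.0, math.inf, math.inf]
    print(exit_probability(times))       # 0.7
    print(time_to_fraction(times, 0.5))  # 8.1
    print(time_to_fraction(times, 0.9))  # inf
```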

      What is the empirical justification for the 1-10 calls used for integration?

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions? Does it include masking, no masking, or which species?

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15 calls), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chiu, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss et al., 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window; we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation affect navigation performance. Indeed, the results showed that performance levels off for integration windows of 5-10 calls (Figure 3A).

      Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking.

      We have revised the text to clarify these details (see line 466).

      References:

      Aidan, Y., Bleichman, I. and Ayali, A. (2024) ‘Pausing to swarm: locust intermittent motion is instrumental for swarming-related visual processing’, Biology letters, 20(2), p. 20230468. Available at: https://doi.org/10.1098/rsbl.2023.0468.

      Attanasi, A. et al. (2014) ‘Collective Behaviour without Collective Order in Wild Swarms of Midges’, PLoS Computational Biology, 10(7), p. e1003697. Available at: https://doi.org/10.1371/journal.pcbi.1003697.

      Bastien, R. and Romanczuk, P. (2020) ‘A model of collective behavior based purely on vision’, Science Advances, 6(6). Available at: https://doi.org/10.1126/sciadv.aay0792.

      Beetz, M.J. and Hechavarría, J.C. (2022) ‘Neural Processing of Naturalistic Echolocation Signals in Bats’, Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/fncir.2022.899370.

      Betke, M. et al. (2008) ‘Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated’, Journal of Mammalogy, 89(1), pp. 18–24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Bialek, W. et al. (2012) ‘Statistical mechanics for natural flocks of birds’, Proceedings of the National Academy of Sciences, 109(13), pp. 4786–4791. Available at: https://doi.org/10.1073/PNAS.1118633109.

      Bode, N.W.F., Franks, D.W. and Wood, A.J. (2011) ‘Limited interactions in flocks: Relating model simulations to empirical data’, Journal of the Royal Society Interface, 8(55), pp. 301–304. Available at: https://doi.org/10.1098/RSIF.2010.0397.

      Boerma, D.B. et al. (2019) ‘Wings as inertial appendages: How bats recover from aerial stumbles’, Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/jeb.204255.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) ‘The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses’, The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) ‘Identifying Bats Using Computerized Analysis and Artificial Neural Networks’, North American Symposium on Bat Research, 9.

      Cheraghi, A.R., Shahzad, S. and Graffi, K. (2022) ‘Past, Present, and Future of Swarm Robotics’, in Lecture Notes in Networks and Systems. Available at: https://doi.org/10.1007/978-3-030-82199-9_13.

      Chiu, C., Xian, W. and Moss, C.F. (2009) ‘Adaptive echolocation behavior in bats for the analysis of auditory scenes’, Journal of Experimental Biology, 212(9), pp. 1392–1404. Available at: https://doi.org/10.1242/jeb.027045.

      Couzin, I.D. et al. (2002) ‘Collective Memory and Spatial Sorting in Animal Groups’, Journal of Theoretical Biology, 218(1), pp. 1–11. Available at: https://doi.org/10.1006/jtbi.2002.3065.

      Couzin, I.D. et al. (2005) ‘Effective leadership and decision-making in animal groups on the move’, Nature, 433(7025), pp. 513–516. Available at: https://doi.org/10.1038/nature03236.

      Davidson, J.D. et al. (2021) ‘Collective detection based on visual information in animal groups’, Journal of the Royal Society Interface, 18(180), p. 20210142. Available at: https://doi.org/10.1098/rsif.2021.0142.

      Faria Dias, P.G. et al. (2021) ‘Swarm robotics: A perspective on the latest reviewed concepts and applications’, Sensors. Available at: https://doi.org/10.3390/s21062062.

      Fujioka, E. et al. (2021) ‘Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence’, Journal of Robotics and Mechatronics, 33(3), pp. 556–563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gautrais, J. et al. (2012) ‘Deciphering Interactions in Moving Animal Groups’, PLOS Computational Biology, 8(9), p. e1002678. Available at: https://doi.org/10.1371/JOURNAL.PCBI.1002678.

      Gillam, E.H. et al. (2010) ‘Echolocation behavior of Brazilian free-tailed bats during dense emergence flights’, Journal of Mammalogy, 91(4), pp. 967–975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldstein, A. et al. (2024) ‘Collective Sensing – On-Board Recordings Reveal How Bats Maneuver Under Severe Acoustic Interference’, Under Review, pp. 1–25.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) ‘The echolocation of flying insects by bats’, Animal Behaviour, 8(3–4).

      Hagino, T. et al. (2007) ‘Adaptive SONAR sounds by echolocating bats’, International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647–651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) ‘Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field’, The Journal of the Acoustical Society of America, 124(2), pp. EL51–EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) ‘Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats’, Current Biology. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) ‘Intensity and directionality of bat echolocation signals’, Frontiers in Physiology, 4, p. 89. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) ‘Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit’, Proceedings of the National Academy of Sciences of the United States of America, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Jhawar, J. et al. (2020) ‘Noise-induced schooling of fish’, Nature Physics, 16(4), pp. 488–493. Available at: https://doi.org/10.1038/s41567-020-0787-y.

      Kalko, E.K.V. (1995) ‘Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)’, Animal Behaviour, 50(4), pp. 861–880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) ‘Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus fuscus (Chiroptera: Vespertilionidae)’, Journal of Mammalogy, 82(2), pp. 339–351. Available at: https://doi.org/10.1644/1545-1542(2001)082<0339:iagvie>2.0.co;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) ‘Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls’, Ethology, 114(5), pp. 469–478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kothari, N.B. et al. (2014) ‘Timing matters: Sonar call groups facilitate target localization in bats’, Frontiers in Physiology, 5, p. 168. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Moss, C.F. and Surlykke, A. (2010) ‘Probing the natural scene by echolocation in bats’, Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Nagy, M. et al. (2010) ‘Hierarchical group dynamics in pigeon flocks’, Nature, 464(7290), pp. 890–893. Available at: https://doi.org/10.1038/nature08891.

      Neretti, N. et al. (2003) ‘Time-frequency model for echo-delay resolution in wideband biosonar’, The Journal of the Acoustical Society of America, 113(4), pp. 2137–2145. Available at: https://doi.org/10.1121/1.1554693.

      Parrish, J.K. and Edelstein-Keshet, L. (1999) ‘Complexity, Pattern, and Evolutionary Trade-Offs in Animal Aggregation’, Science, 284(5411), pp. 99–101. Available at: https://doi.org/10.1126/SCIENCE.284.5411.99.

      Partridge, B.L. (1982) ‘The Structure and Function of Fish Schools’, Scientific American, 246(6), pp. 114–123. Available at: https://doi.org/10.2307/24966618.

      Pearce, D.J.G. et al. (2014) ‘Role of projection in the control of bird flocks’, Proceedings of the National Academy of Sciences of the United States of America, 111(29), pp. 10422–10426. Available at: https://doi.org/10.1073/pnas.1402202111.

      Pitcher, T.J., Partridge, B.L. and Wardle, C.S. (1976) ‘A blind fish can school’, Science, 194(4268), pp. 963–965. Available at: https://doi.org/10.1126/science.982056.

      Rosenthal, S.B., Twomey, C.R., Hartnett, A.T., Wu, H.S. and Couzin, I.D. (2015) ‘Revealing the hidden networks of interaction in mobile animal groups allows prediction of complex behavioral contagion’, Proceedings of the National Academy of Sciences of the United States of America, 112(15), pp. 4690–4695. Available at: https://doi.org/10.1073/pnas.1420068112.

      Roy, S. et al. (2019) ‘Extracting interactions between flying bat pairs using model-free methods’, Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) ‘Technique using thermal infrared-imaging for estimating populations of gray bats’, Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) ‘A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver’, The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) ‘Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion’, Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229–29238. Available at: https://doi.org/10.1073/pnas.2011719117.

      Sanderson, M.I. et al. (2003) ‘Evaluation of an auditory model for echo delay accuracy in wideband biosonar’, The Journal of the Acoustical Society of America, 114(3), pp. 1648–1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) ‘Echolocation by insect-eating bats’, BioScience, 51(7). Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) ‘The echolocation and hunting behavior of the bat, Pipistrellus kuhli’, Journal of Comparative Physiology A, 161(2), pp. 267–274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. and Kick, S.A. (1983) ‘Interception of Flying Insects by Bats’, Neuroethology and Behavioral Physiology, pp. 267–279. Available at: https://doi.org/10.1007/978-3-642-69271-0_20.

      Strandburg-Peshkin, A. et al. (2013) ‘Visual sensory networks and effective information transfer in animal groups’, Current Biology. Available at: https://doi.org/10.1016/j.cub.2013.07.059.

      Sumpter, D.J.T. et al. (2008) ‘Consensus Decision Making by Fish’, Current Biology, 18(22), pp. 1773–1777. Available at: https://doi.org/10.1016/J.CUB.2008.09.064.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) ‘Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus’, Journal of Experimental Biology, 212(7), pp. 1011–1020. Available at: https://doi.org/10.1242/JEB.024620.

      Theriault, D.H. et al. (no date) ‘Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight’, cs-web.bu.edu [Preprint]. Available at: https://cs-web.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) ‘What the bat’s voice tells the bat’s brain’, Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491–8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) ‘Modeling bat prey capture in echolocating bats: The feasibility of reactive pursuit’, Journal of Theoretical Biology, 456, pp. 305–314.

      Youssefi, K.A.R. and Rouhani, M. (2021) ‘Swarm intelligence based robotic search in unknown maze-like environments’, Expert Systems with Applications, 178. Available at: https://doi.org/10.1016/j.eswa.2021.114907.

      Yovel, Y. et al. (2009) ‘The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls’, PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) ‘Bat Navigation’, The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333–345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.

    1. Author response:

      We thank the reviewers for their thorough evaluation and constructive feedback on our manuscript.

      We think that their valuable suggestions will strengthen the manuscript and help us clarify several important points.

      All reviewers acknowledged the importance of our theoretical results and network classification in making pattern formation analysis a more tractable problem. At the same time, they have also raised a number of important concerns that we shall carefully consider.

      A. A major clarification that the reviewers found important concerns the definition of non-trivial pattern transformations and its generalization to higher dimensions. In this regard, the reviewers’ comments are:

      Reviewer #1:

      (on non-trivial pattern transformations):

      (3) All modelling is confined to one spatial dimension, and the very definition of a "non-trivial" transformation is framed in terms of peak positions along a line, which clearly must be reformulated for higher dimensions. It's well-known that diffusions in 1, 2, and 3 dimensions are also dramatically different, so the relevance of the three-class taxonomy to real multicellular tissues remains unclear, or at least should be explained in more detail.

      Reviewer #2:

      (on non-trivial pattern transformations):

      (5) The definition of non-trivial pattern formation is provided only in the Supplementary Information, despite its central importance for interpreting the main results. It would significantly improve clarity if this definition were included and explained in the main text. Additionally, it remains unclear how the definition is consistently applied across the different initial conditions. In particular, the authors should clarify how slope-based measures are determined for both the random noise and sharp peak/step function initial states. Furthermore, the authors do not specify how the sign function is evaluated at zero. If the standard mathematical definition sgn(0)=0 is used, then even a simple widening of a peak could fulfill the criterion for nontrivial pattern transformation.

      We agree with Reviewer #2 that including a more detailed definition of non-trivial pattern transformation in the main text would enhance the clarity of the paper. The one-dimensional (1D) definition currently provided in the Supplementary Information was chosen because all computations presented therein involve exclusively one-dimensional patterns. However, we acknowledge that this definition, as originally stated, did not generalize unambiguously to higher dimensions. Therefore, in the revised version of the manuscript, we will incorporate an expanded definition applicable to higher-dimensional cases.

      This general definition of a non-trivial pattern transformation should make no reference to the sign of spatial derivatives of either the initial or resulting patterns. Specifically, a pattern transformation is considered non-trivial if it satisfies the following criteria:

      - It is heterogeneous: The resulting pattern is heterogeneous in space.

      - It is rearranging: The arrangement of critical points (i.e., peaks, valleys and saddle points in a gene product concentration) along the domain in the resulting pattern of a gene product differs from the arrangement of critical points in its initial pattern. This includes the emergence of new critical points, the disappearance of existing ones, or the spatial displacement of critical points from one location to another.

      - It is non-replicating: The spatial arrangement of critical points in the pattern of one gene product must differ from that of any other upstream gene product.

      However, both types of initial pattern we use are spatially discontinuous functions: in homogeneous initial patterns, the white noise is discontinuous by definition; and in the spike and spike+homogeneous initial patterns, we use sharp spikes defined by the rectangular function, which is discontinuous at the spike boundaries. Therefore, the definition above must be supplemented with the following two ad hoc assumptions:

      - Homogeneous initial patterns do not comprise any critical point. White noise in this type of initial patterns represents small thermodynamic fluctuations around the steady state and, for the purpose of pattern transformation, this is equivalent to a constant concentration along the domain.

      - Spike and spike+homogeneous initial patterns each contain a single critical point located at the center of the spike. The sharp spikes, modeled using the rectangular function, serve as a theoretical idealization to facilitate mathematical analysis. Once diffusion begins to act, these sharp boundaries are smoothed into differentiable gradients, maintaining a unique critical point at the center of the initial spike, which is the most relevant information for pattern transformation.
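      A minimal 1D sketch of the three criteria is shown below. It assumes patterns sampled on a regular grid, critical points restricted to strict interior maxima and minima (saddle points only exist in 2D and above), and a positional tolerance of one grid step. The initial critical points are passed in explicitly, so the ad hoc assumptions above can be applied: none for homogeneous noise, one at the spike centre for spike patterns. The function names are hypothetical, not taken from the authors' code.

```python
import math
from typing import List, Sequence, Tuple

CriticalPoint = Tuple[int, str]  # (grid index, "max" or "min")


def critical_points(values: Sequence[float], tol: float = 1e-9) -> List[CriticalPoint]:
    """Strict interior maxima/minima of a pattern sampled on a regular 1D grid.

    Flat plateaus are not reported, and saddle points do not exist in 1D.
    """
    points: List[CriticalPoint] = []
    for i in range(1, len(values) - 1):
        left = values[i] - values[i - 1]
        right = values[i + 1] - values[i]
        if left > tol and right < -tol:
            points.append((i, "max"))
        elif left < -tol and right > tol:
            points.append((i, "min"))
    return points


def same_arrangement(a: List[CriticalPoint], b: List[CriticalPoint], pos_tol: int = 1) -> bool:
    """True if both lists have the same critical points, in order, up to pos_tol grid steps."""
    return len(a) == len(b) and all(
        ka == kb and abs(ia - ib) <= pos_tol for (ia, ka), (ib, kb) in zip(a, b))


def is_nontrivial(initial_cps: List[CriticalPoint], result: Sequence[float],
                  upstream_results: List[Sequence[float]], tol: float = 1e-9) -> bool:
    """Check the three criteria: heterogeneous, rearranging and non-replicating."""
    result_cps = critical_points(result, tol)
    heterogeneous = max(result) - min(result) > tol
    rearranging = not same_arrangement(result_cps, initial_cps)
    non_replicating = all(
        not same_arrangement(result_cps, critical_points(u, tol)) for u in upstream_results)
    return heterogeneous and rearranging and non_replicating


if __name__ == "__main__":
    n = 101
    widened = [math.exp(-((i - 50) / 10.0) ** 2) for i in range(n)]  # peak widened in place
    periodic = [math.cos(2 * math.pi * i / 20) for i in range(n)]    # periodic pattern
    print(is_nontrivial([(50, "max")], widened, []))  # spike start, peak stays at 50
    print(is_nontrivial([], periodic, []))            # homogeneous start
```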

      Finally, it is worth recalling that our gene network classification is fundamentally based on an analysis of the dispersion relation associated with the gene network, and the construction of this dispersion relation is independent of the spatial dimensionality of the domain (i.e., it does not require assuming any specific number of dimensions). Thus, the gene networks that can lead to pattern transformation are the same in 1D, 2D and 3D. Placing the description of this dispersion relation in the SI may have made the article harder to follow; it will therefore be moved to the main text in the revised version of the article. As for the resulting patterns, the broad description we provide also applies to any number of dimensions: these would be periodic, non-periodic as in the amplified-noise patterns, or non-periodic as in the hierarchic networks. For the latter, note that, except for boundary effects that we discuss later, the spike initial condition is radially symmetric, and thus the patterns resulting from it will also be radially symmetric. We will make this point more explicit in the revised version of the article, especially since, as suggested, this important portion of the Supplementary Information will be incorporated into the main text.
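      The dimension-independence of the dispersion relation can be illustrated with the standard linear stability analysis of a two-gene reaction-diffusion system. The Jacobian and diffusion values below are hypothetical, chosen so that the homogeneous state is stable at k = 0 but unstable over a band of wavenumbers. The authors' own formulation in the SI may differ in detail.

```python
import cmath
from typing import Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def growth_rate(jac: Matrix2, diff: Tuple[float, float], k_squared: float) -> float:
    """Largest real part of the eigenvalues of J - k^2 D for a two-gene linear system.

    Perturbations ~ exp(lambda t + i k.x) of dg/dt = J g + D laplacian(g) grow at the
    eigenvalues of J - |k|^2 D, so only |k|^2 enters, whatever the dimension.
    """
    (a, b), (c, d) = jac
    a -= k_squared * diff[0]
    d -= k_squared * diff[1]
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(((trace + root) / 2).real, ((trace - root) / 2).real)


def growth_rate_vec(jac: Matrix2, diff: Tuple[float, float], k_vec: Sequence[float]) -> float:
    """Same relation for a wavevector in any number of dimensions."""
    return growth_rate(jac, diff, sum(k * k for k in k_vec))


# Hypothetical activator-inhibitor example: stable at k = 0, unstable for a band of k.
J: Matrix2 = ((1.0, -1.0), (2.0, -1.5))
D = (1.0, 10.0)

if __name__ == "__main__":
    for k2 in (0.0, 0.05, 0.2, 0.4, 1.0):
        print(f"|k|^2 = {k2:4.2f}: max Re(lambda) = {growth_rate(J, D, k2):+.3f}")
```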

      Reviewer #2 suggests that, under our definition of non-trivial pattern transformation, the simple widening of a concentration peak would constitute a non-trivial pattern transformation. This is not the case, as the examples in the figures already show, since widening does not change the position of the critical point. The situation would differ if a wide and completely flat concentration peak (i.e., a plateau) formed; as we will explain in the revised version, this is not possible because of requirement (R5).
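      The claims that diffusion smooths a rectangular spike into a peak with a single maximum at its centre, and that widening alone leaves that position unchanged, can be checked with a short explicit finite-difference sketch. Grid size, spike width, diffusion coefficient and time step are arbitrary illustrative choices. The scheme is stable for d·dt/dx² ≤ 0.5, and the boundaries are zero-flux.

```python
from typing import List


def diffuse(values: List[float], d: float = 1.0, dx: float = 1.0,
            dt: float = 0.2, steps: int = 200) -> List[float]:
    """Explicit finite-difference diffusion on a 1D grid with zero-flux boundaries."""
    r = d * dt / dx ** 2
    if r > 0.5:
        raise ValueError("unstable step: need d * dt / dx**2 <= 0.5")
    u = list(values)
    n = len(u)
    for _ in range(steps):
        new = u[:]
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]  # mirror the edge value: no flux out
            right = u[i + 1] if i < n - 1 else u[i]
            new[i] = u[i] + r * (left - 2.0 * u[i] + right)
        u = new
    return u


def strict_maxima(u: List[float]) -> List[int]:
    """Interior grid points strictly higher than both neighbours."""
    return [i for i in range(1, len(u) - 1) if u[i] > u[i - 1] and u[i] > u[i + 1]]


if __name__ == "__main__":
    n = 101
    spike = [1.0 if 45 <= i <= 55 else 0.0 for i in range(n)]  # rectangular spike centred at 50
    print(strict_maxima(spike))  # [] : the flat top has no strict maximum
    smoothed = diffuse(spike)
    print(strict_maxima(smoothed), round(max(smoothed), 3))
```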

      We think that this clarification of the definition of non-trivial pattern transformation will also help clarify the next point (B below) since it would make it clearer that this article does not intend to explain which specific resulting pattern would arise from any given gene network.

      B. The main concern among these relates to the validity of our linearization of the model equations and the extension of the results obtained for the linear system to the fully nonlinear system. In this regard, the reviewers’ comments are:

      Reviewer #1:

      (on linearization):

      (2) A central step in the model formulation is the linearisation of the reaction term around a homogeneous steady state; higher-order kinetics, including ubiquitous bimolecular sinks such as A + B → AB, are simply collapsed into the Jacobian without any stated amplitude bound on the perturbations. Because the manuscript never analyses how far this assumption can be relaxed, the robustness of the three-class taxonomy under realistic nonlinear reactions or large spike amplitudes remains uncertain.
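      The reviewer's point can be illustrated with the simplest case, a mass-action sink A + B → AB with rate k·a·b. Its linearization around a steady state (a0, b0) drops exactly the term k·δa·δb, so the error grows with the square of the perturbation amplitude. The concentrations and amplitudes below are hypothetical. This is a generic example, not a term from the authors' model.

```python
def mass_action_sink(a: float, b: float, k: float = 1.0) -> float:
    """Rate of the bimolecular sink A + B -> AB under mass action."""
    return k * a * b


def linearized_sink(a0: float, b0: float, da: float, db: float, k: float = 1.0) -> float:
    """First-order Taylor expansion of k*a*b around the steady state (a0, b0)."""
    return k * (a0 * b0 + b0 * da + a0 * db)


if __name__ == "__main__":
    a0 = b0 = 1.0  # hypothetical steady-state concentrations
    for eps in (0.01, 0.1, 0.5, 1.0):  # perturbation size relative to the steady state
        exact = mass_action_sink(a0 + eps, b0 + eps)
        approx = linearized_sink(a0, b0, eps, eps)
        # The neglected term is exactly k * da * db.
        print(f"relative perturbation {eps:4.2f}: relative error {(exact - approx) / exact:.2%}")
```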

      Reviewer #2:

      (on linearization):

      (2) Most of the proofs presented in the Supplementary Information rely on linearized versions of the governing equations, and it remains unclear how these results extend to the fully nonlinear system. We are concerned that the generality of the conclusions drawn from the linear analysis may be overstated in the main text. For example, in Section S3, the authors introduce the concept of dynamic equivalence of transitive chains (Proposition S3.1) and intracellular transitive M-branching (Proposition S3.2), which pertains to the system's steady-state behavior. However, the proof is based solely on the linearized equations, without additional justification for why the result should hold in the presence of nonlinearities. Moreover, the linearized system is used to analyze the response to a "spike initial pattern of arbitrary height C" (SI Chapter S5.1), yet it is not clear how conclusions derived from the linear regime can be valid for large perturbations, where nonlinear effects are expected to play a significant role. We encourage the authors to clarify the assumptions under which the linearized analysis remains valid and to discuss the potential limitations of applying these results to the nonlinear regime.

      In this article, we address two main questions: first, which gene network topologies can give rise to non-trivial pattern transformations; and second, which broad types of resulting patterns these gene network topologies can give rise to. Thus, we do not intend to explain which exact resulting patterns would arise from any given gene network (i.e., a gene network topology with specific functions and interaction strengths or weights), a question for which nonlinearities do indeed matter.

      For most known gene regulatory networks, available empirical information is typically limited to the nature of gene product regulations (i.e., whether they act as activators or inhibitors), while details about the specific functional form of these regulations are rare. For instance, given two gene products, i and j, the network may indicate that i acts as an activator of j, implying that the concentration of j increases with that of i. However, this increase could follow a variety of functional forms: it may be quadratic, cubic, or any other function $f_j(g_i)$. As we explain in the description of our model, we restrict our study to functions with a monotonicity constraint: higher concentrations of i lead to increased production of j (i.e., $\partial f_j / \partial g_i > 0$). In other words, a given gene interaction is always either inhibitory or activatory; it never changes sign. This monotonicity constraint corresponds to requirement (R5) in our main text. This requirement is based on the biologically plausible idea that the complexity of gene regulation in development stems more from the topology of gene networks than from the complexity of the function by which one gene product regulates another (i.e., we use simple monotonic functions).

      Question 1: The dispersion relation, currently explained in the SI, is key to understanding our answer to this question. From the reviewers’ comments, it is clear that including it in the main text of an upcoming version of the article would improve readability, especially for question 1.

      In brief, any pattern transformation requires the initial pattern to change. The trigger of such a change is a change in the concentration of some gene product, conceptualized either as a noise fluctuation (in the homogeneous initial pattern) or as a regulated change at a specific point (in the spike initial pattern). Mathematically, both can be treated as perturbations, and, for pattern transformation to be possible, such a perturbation must grow so that the initial pattern becomes unstable and can change into another resulting pattern.

      If the perturbation is small, one can use the standard linear perturbation analysis in S6.2 of our Supplementary Information. In other words, the linear analysis is enough to ascertain whether a small perturbation would grow. A gene network in which this does not happen would be unable to lead to pattern transformation, whatever the nonlinear part of f(g). In that sense, the linear approximation provides a necessary condition that any gene network must fulfill to lead to pattern transformation.
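
      The necessary condition above can be checked numerically from the dispersion relation. The sketch below is a minimal illustration, assuming the linearized system ∂g/∂t = J g + D ∂²g/∂x² with a diagonal diffusion matrix D; the function name `growth_rates` and the two-gene Jacobian are hypothetical examples, not taken from the manuscript. Substituting g ∝ exp(λt + ikx) gives λ(k) as the eigenvalues of J − k²D, and a small perturbation can only grow if max Re λ(k) > 0 for some k.

```python
import numpy as np

def growth_rates(J, D, ks):
    """Largest real part of the eigenvalues of J - k^2 D for each wavenumber k.

    J: (n, n) Jacobian of the reaction term at the homogeneous steady state.
    D: (n,) diffusion coefficients (0 for non-diffusing gene products).
    """
    Dmat = np.diag(D)
    return np.array([np.linalg.eigvals(J - k**2 * Dmat).real.max() for k in ks])

# Hypothetical two-gene activator-inhibitor network (illustrative values only).
J = np.array([[1.0, -1.0],
              [2.0, -1.5]])
D = np.array([1.0, 10.0])
ks = np.linspace(0.0, 2.0, 401)

rates = growth_rates(J, D, ks)
print("stable to homogeneous perturbations (k = 0):", rates[0] < 0)
print("some finite wavenumber grows:", rates.max() > 0)
print("fastest-growing wavenumber:", ks[rates.argmax()])
```

      If `rates.max()` were negative for every k, no small perturbation could grow, whatever nonlinear terms are added; a positive value only shows that the necessary condition is met.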

      However, the linear analysis does not ascertain whether a specific gene network will actually lead to pattern transformation (i.e., the condition is not sufficient). This, as well as the shape of the specific resulting pattern, may also depend on the nonlinear parts. As we discuss, based on the dispersion relation and other complementary arguments throughout the article, we can also gain some insight into the possible patterns from the linear approximation alone (question 2). These arguments hold thanks to the imposition of requirements (R1-R5) on the function f(g), which prevent pathological behaviors stemming from the nonlinear part of the equation.

      The amplitude bound on perturbations mentioned by Reviewer #1 is addressed by requirements (R2) and (R4). Although the solution to the linear system predicts unbounded growth of unstable eigenmodes, we assume functions f(g) whose nonlinear terms eventually halt this growth, thereby ensuring the boundedness of solutions imposed by (R4). This assumption on the nonlinear part is precisely requirement (R2) on f(g) in the main text.
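
      A one-gene toy example shows how a nonlinear term can halt the exponential growth predicted by the linear analysis. This is a minimal sketch, assuming a hypothetical self-activating gene with dg/dt = a g − g³ (not a function from the manuscript): the linear term alone predicts growth as exp(at), whereas the cubic term bounds the solution at g = √a.

```python
import numpy as np

def simulate(a, g0, dt=1e-3, steps=30_000, cubic=True):
    """Forward-Euler integration of dg/dt = a*g - g**3 (or a*g if cubic=False)."""
    g = g0
    for _ in range(steps):
        dg = a * g - (g**3 if cubic else 0.0)
        g = g + dt * dg
    return g

a, g0 = 0.5, 1e-3
linear_only = simulate(a, g0, cubic=False)  # keeps growing like g0 * exp(a * t)
saturated = simulate(a, g0)                 # levels off near sqrt(a)
print(f"linear only: {linear_only:.4g}, with cubic term: {saturated:.4f}, sqrt(a): {np.sqrt(a):.4f}")
```

      The initial growth phase is identical in both runs; only the nonlinear term decides where growth stops, which is the role requirement (R2) assigns to f(g).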

      The transitive chains and branchings in Section S3 of the Supplementary Information mentioned by Reviewer #2 are topological properties of gene networks and therefore influence only the linear part of the reaction-diffusion equations. This is why the proofs in that section are based on the linearized equations. We agree that clarifying this point in the text, as suggested by the reviewer, would improve the reader’s understanding of the section.

      Regarding Reviewer #2’s concerns about large perturbations, we acknowledge that the phrase “arbitrary height” may be confusing. For homogeneous initial conditions, these perturbations are assumed to be small because they are actually molecular noise (otherwise the initial condition could not be considered homogeneous in the classical sense of developmental biology models). For spike initial conditions in hierarchic networks, the perturbation is not necessarily small. For the analysis provided in the SI, we indeed assume that the perturbations are small enough for the linear approximation to be valid. Notice, however, that since these networks require an intracellular self-activating loop upstream of the first extracellular signal, the effective perturbation would rapidly grow to a value determined by that loop.

      In general, the height of the initial spike does not affect whether hierarchic networks can lead to non-trivial pattern transformation. By definition, these networks require the secretion of an extracellular signal from the cells in the spike (otherwise no change in gene product concentrations can occur over space). Also by definition, this signal is not produced by any other cells, so its concentration is governed by diffusion from the spike and by its production in the spike cells. Thus, whatever the initial height of the spike and whatever the nonlinearities in f(g), the signal’s concentration would decrease with distance from the spike. As explained in the main text, this would lead to non-trivial pattern transformations if other general conditions are met. In short, the height of the initial perturbation can affect which specific pattern transformation arises from a specific gene network, but not which gene network topologies can lead to pattern transformation. This will be stated more clearly in an upcoming version of the article.

      C. In the following, we respond to the remaining concerns raised by the reviewers:

      Reviewer #1:

      (1) The Results section is difficult to follow. Key logical steps and network configurations are described shortly in prose, which constantly require the reader to address either SI or other parts of the text (see numerous links on the requirements R1-R5 listed at the beginning of the paper) to gain minimal understanding. As a result, a scientifically literate but non-specialist reader may struggle to grasp the argument with a reasonable time invested.

      We acknowledge that the current version of the main text may not be as clear as we intended. Initially, we believed that placing the more technical mathematical passages in the Supplementary Information would make the main text more accessible to readers. However, we agree with the reviewer that including some of these computations in the main text could improve clarity. We also believe that adding a summary table outlining all the model’s requirements would further contribute to that goal.

      Reviewer #2:

      (1) We have serious concerns regarding the validity of the simulation results presented in the manuscript. Rather than simulating the full nonlinear system described by Equation (1), the authors base their results on a truncated expansion (Equation S.8.2) that captures only the time evolution of small deviations around a spatially homogeneous steady state. However, it remains unclear how this reduced system is derived from the full equations specifically, which terms are retained or neglected and why- and how the expansion of the nonlinear function can be steady-state independent, as claimed. Additionally, in simulations involving the spike plus homogeneous initial condition, it is not evident -or, where equations are provided, it is not correct- that the assumed global homogeneous background actually corresponds to a steady state of the full dynamics. We elaborate on these concerns in the following:

      We believe there has been a misunderstanding regarding the presentation of the model equations (S8.2) used throughout our simulations. Accordingly, we agree that this relevant section of the Supplementary Information should be rewritten in a revised version of the manuscript to clarify this issue. Below, we address all the concerns raised by the reviewer.

      Equation (S8.2) represents the full nonlinear system described in Equation (1). While we recognize that the model may oversimplify real biological processes, its purpose is to illustrate our general statements about pattern formation rather than to capture any specific or detailed mechanism. In this context, model (S8.2) offers three key advantages for our goals: it allows rapid manipulation of gene network topology simply by modifying the matrix J, making it ideal for illustrating pattern formation across different network classes; it accommodates gene networks of arbitrary size (unlike other models, such as the classical Gierer-Meinhardt model, which are limited to two-element Turing or noise-amplifying networks); and, owing to the simplicity of its nonlinear terms, it involves relatively few free parameters, facilitating the fine-tuning needed to identify parameter regions where non-trivial pattern transformations occur.

      Indeed, we find that the ability of model (S8.2) to illustrate our results despite having such simple nonlinear terms (bearing in mind that at least some nonlinearity is always necessary for self-organization) strongly supports the claim that the capacity of a gene network to produce pattern transformations is fully determined by the linear part of Equation (1). In this sense, nonlinear terms primarily influence the precise parameter values at which these transformations occur and contribute to shaping specific features of the resulting patterns.

      Model (S8.2) has been successfully employed in pattern formation studies elsewhere in the literature; accordingly, we provide relevant bibliographic references to support its widespread use.

      We believe the misunderstanding arises from our explanation of the biological interpretation of the model. As noted in the accompanying bibliography, the model is based on a general reaction-diffusion mechanism assuming the existence of a steady state. However, this conceptual reaction-diffusion framework is not the same as our Equation (1); rather, it was introduced by the original proponents of the model in the seminal paper cited in our text. In this context, Equation (S8.2) describes small concentration perturbations around that steady state, where the variables represent deviations in concentration relative to the general steady state.

      The aforementioned general steady state corresponds to the trivial equilibrium point g≡0 in equations (S8.2). Consequently, all our simulations based on model (S8.2) start from this steady state, to which we add white noise to generate homogeneous initial patterns or a sharp spike for the two types of spike initial patterns.
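
      The simulation protocol described above (start at g≡0 and add white noise) can be sketched as follows. This is a minimal illustration, assuming that model (S8.2) consists of linear terms J g plus a cubic degradation term for each gene product (the form described by Reviewer #2), i.e. ∂g/∂t = J g − g³ + D ∂²g/∂x². The two-gene Jacobian, diffusion coefficients, periodic 1D domain and explicit Euler scheme are illustrative assumptions, not the manuscript's parameters. Changing the network topology only requires editing J.

```python
import numpy as np

def simulate_s82_sketch(J, D, n_cells=100, dx=1.0, dt=0.01, steps=20_000, noise=0.01, seed=0):
    """Explicit-Euler integration of dg/dt = J g - g**3 + D d2g/dx2 on a periodic 1D domain.

    g holds deviations from the general steady state, so g == 0 is the reference state.
    """
    rng = np.random.default_rng(seed)
    g = noise * rng.standard_normal((J.shape[0], n_cells))  # g = 0 plus white noise
    Dcol = np.asarray(D, dtype=float)[:, None]               # one diffusion coefficient per gene
    for _ in range(steps):
        lap = (np.roll(g, 1, axis=1) - 2 * g + np.roll(g, -1, axis=1)) / dx**2
        g = g + dt * (J @ g - g**3 + Dcol * lap)
    return g

# Hypothetical activator-inhibitor (Turing-type) topology; dt is below dx**2 / (2 * max(D)).
J = np.array([[1.0, -1.0],
              [2.0, -1.5]])
D = [1.0, 10.0]
g = simulate_s82_sketch(J, D)
print("spatial std of gene 1 (initial noise was 0.01):", g[0].std())
```

      In figures, such a result would then be superimposed on the non-zero steady state of the general system, as described in the text.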

      It is also worth noting that Equations (S8.2) represent a non-dimensional model.

      It is assumed that the homogeneous steady states are given by g_i=0 and g_i=c_i, where 1/c_i = \mu_i or \hat{\mu}_i, independently of the specific network structure. However, the basis for this assumption is unclear, especially since some of the functions do not satisfy this condition -for example, f5 as defined below Eq. S8.10.5. Moreover, if g_i=c_i does not correspond to a true steady state, then the time evolution of deviations from this state is not correctly described by Eq. S8.2, as the zeroth-order terms do not vanish in that case.

      From the explanations above, it is important to distinguish two scales in the process: the scale of small perturbations, where equations (S8.2) apply; and the global scale, where the conceptual general reaction-diffusion system operates. Since the specific form of this general system does not affect equations (S8.2), we assume that it follows any of the models cited in the text, which yield a non-zero steady state.

      In this sense, Equation (S8.2) represents small concentration deviations from such a global system, and g(t, x) is a relative concentration: g≡0 represents the steady state, g>0 corresponds to concentrations above it, and g<0 to concentrations below it.

      As previously mentioned, simulations are performed using Equations (S8.2) on the basis of the equilibrium point g≡0. The result of these simulations is then superimposed on the non-zero steady state and presented in the figures along the article.

      Using the full model instead of the simplified Equations (S8.2) may result in slightly different resulting patterns, but it does not affect the gene network’s ability to produce pattern transformations, nor does it alter the main structural properties of the patterns—for example, the periodic nature of patterns generated by Turing networks.

      Additionally, the equations used contain only linear terms and a cubic degradation term for each species g_i, while neglecting all quadratic terms and cubic terms involving cross-species interactions (i≠j). An explanation for this selective truncation is not provided, and without knowledge of the full equation (f), it is impossible to assess whether this expansion is mathematically justified. If, as suggested in the Supplementary Information, the linear and cubic terms are derived from f, then at the very least, the Jacobian matrix should depend on the background steady-state concentration. However, the equations for the small deviation around a steady state (including the Jacobian matrix) used in the simulations appear to be independent of the particular steady state concentration.

      The Jacobian of Equation (S8.2) is independent of g because g represents a small perturbation around a steady state of a general reaction-diffusion system. Consequently, the matrix J corresponds to the Jacobian of the general system evaluated at that steady state. Evaluating the Jacobian of equations (S8.2) at the equilibrium point g≡0 -which represents the general steady state- recovers the matrix J.
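
      This point can be checked numerically. The sketch below assumes, hypothetically, the reaction term f(g) = J g − g³ (linear part plus cubic degradation) and estimates its Jacobian by central finite differences: at g ≡ 0 the estimate coincides with J, whereas at any other point it picks up the −3 diag(g²) contribution of the cubic term.

```python
import numpy as np

J = np.array([[1.0, -1.0],
              [2.0, -1.5]])  # hypothetical network Jacobian

def reaction(g):
    """Hypothetical reaction term: linear interactions plus cubic degradation."""
    return J @ g - g**3

def numerical_jacobian(f, g, h=1e-6):
    """Central-difference estimate of the Jacobian of f at the point g."""
    n = g.size
    jac = np.empty((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        jac[:, j] = (f(g + step) - f(g - step)) / (2 * h)
    return jac

point = np.array([0.5, 0.2])
jac_at_zero = numerical_jacobian(reaction, np.zeros(2))
jac_elsewhere = numerical_jacobian(reaction, point)
print(np.allclose(jac_at_zero, J))                              # True
print(np.allclose(jac_elsewhere, J - 3 * np.diag(point**2)))    # True
```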

      This is why we believe that the differences observed between the spike-only initial condition and the spike superimposed on a homogeneous background are not due to the initial conditions themselves, but rather result from a modified reaction scheme introduced through a questionable cutoff.

      "In simulations with spike initial patterns, the reference value g≡0 represents an actual concentration of 0 and therefore, we must add to (S8.2) a Heaviside function Φ acting of f (i.e., Φ(f(g))=f(g) if f(g)>0, Φ(f(g))=0 if f(g)≤0) to prevent the existence of negative concentrations for any gene product (i.e., g_i<0 for some i)." (SI chapter S8).

      This cutoff alters the dynamics (no inhibition) and introduces a different reaction scheme between the two simulations. The need for this correction may itself reflect either a problem in the original equations (which should fulfill the necessary conditions and prevent negative concentrations (R4 in main text)) or the inappropriateness of using an expanded approximation which assumes independence on the steady state concentration. It is already questionable if the linearized equations with a cubic degradation term are valid for the spike initial conditions (with different background concentration values), as the amplitude of this perturbation seems rather large.

      For homogeneous and spike+homogeneous initial conditions, we interpret equations (S8.2) as small perturbations around a non-zero steady state of a general reaction-diffusion system. For spike-only initial conditions, that steady state is zero. As mentioned before, g≡0 then represents this zero-concentration steady state, g>0 corresponds to positive concentrations of the general system, and g<0 would represent unfeasible negative concentrations of the general system. Therefore, the use of a cutoff function to handle such initial conditions is justified. Moreover, this cutoff function is the same as the one employed in the reference general system cited in our paper.

      We acknowledge that the cutoff influences the simulations and accounts for the differences observed between spike and spike+homogeneous initial conditions. However, this distinction reflects what occurs in real biological systems, which is precisely why we differentiate these two types of initial states. For instance, the emergence of a periodic pattern in a noise-amplifying network depends critically on the formation of regions with concentrations below the steady state near the initial spike. Such regions can form in spike-plus-homogeneous initial patterns but not in spike-only initial patterns, where concentrations below the steady state would correspond to biologically unfeasible negative values.

      Lastly, we note that under the current simulation scheme, it is not possible to meaningfully assess criteria RH2a and RH2b, as they rely on nonlinear interactions that are absent from the implemented dynamics.

      It is explicitly stated in the relevant subsections of Section S7 in the Supplementary Information that, for the simulations involving RH2a and RH2b, the function f(g) in equation (S8.2) is modified by adding an ad hoc quadratic term to enable the assessment of these criteria.

      (3) Several statements in the main text are presented without accompanying proof or sufficient explanation, which makes it difficult to assess their validity. In some cases, the lack of justification raises serious doubts about whether the claims are generally true. Examples are:

      "For the purpose of clarity we will explain our results as if these cells have a simple arrangement in space (e.g., a 1D line or a 2D square lattice) but, as we will discuss, our results shall apply with the same logic to any distribution of cells in space." (Main text l.145-l.148).

      We believe that the confusion in this statement arises from the ambiguous use of the phrase “our results”. We will revise the text to provide a more precise description. Specifically, by “our results,” we refer to the conclusion that it is possible to determine whether a gene network leads to nontrivial pattern transformations based solely on its topology. This conclusion is independent of the dimensionality of space, as none of our arguments rely on assumptions specific to spatial dimensions. While one-dimensional examples are used for clarity and illustration, the underlying reasoning applies generally. In an improved version of the article, we will clarify this point explicitly and move relevant arguments from the Supplementary Information into the main text.

      Critically, our classification of gene networks is ultimately based on an argument concerning the dispersion relation associated with the network, and the construction of this dispersion relation is independent of the spatial dimensionality of the domain. In this sense, the networks identified in the text as capable of producing pattern transformations will be able to generate non-trivial pattern transformations in any spatial domain and in any number of dimensions. While the specific parameter values that permit such transformations may vary depending on the geometry and dimensionality of the domain, the existence of at least one such parameter set remains unaffected.
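
      A short derivation makes this dimension independence explicit. It assumes the linearized form of Equation (1), ∂g/∂t = J g + D∇²g on a domain in ℝ^d, with J the Jacobian at the homogeneous steady state and D the diagonal matrix of diffusion coefficients (notation assumed here for illustration):

```latex
\mathbf g(t,\mathbf x) = \hat{\mathbf g}\, e^{\lambda t + i\,\mathbf k\cdot\mathbf x},
\qquad
\nabla^2 e^{i\,\mathbf k\cdot\mathbf x} = -|\mathbf k|^2\, e^{i\,\mathbf k\cdot\mathbf x}
\quad\Longrightarrow\quad
\lambda\,\hat{\mathbf g} = \bigl(J - |\mathbf k|^2 D\bigr)\hat{\mathbf g},
\qquad
\det\bigl(J - |\mathbf k|^2 D - \lambda I\bigr) = 0 .
```

      Because the position x ∈ ℝ^d enters only through |k|², the same curve λ(|k|) applies for d = 1, 2 or 3; the geometry and boundary conditions of the domain only determine which wavenumbers are admissible.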

      The geometry of the domain can influence the specific form of the resulting patterns, but it does not alter the broader class of patterns (e.g., periodic patterns, peaks emerging around a spike, etc.) that a given gene network topology can produce. One such geometric influence, commonly observed in simulations, involves boundary effects. For example, structures such as peaks or rings forming near the boundaries may appear higher, broader, or spatially shifted compared to those arising in the central regions of the domain. However, we think a pattern consisting of a periodic train of peaks where only those near the boundary are slightly different can still be classified as a periodic pattern.

      "For any non-trivial pattern transformation (as long as it is symmetric around the initial spike), there exists an H gene network capable of producing it from a spike initial pattern." (Main text l.366f).

      A justification for this statement is provided shortly after the claim, although we acknowledge that the current explanation is somewhat cumbersome and would benefit from a clearer presentation in a revised version of the main text.

      A more detailed justification is provided in the Supplementary Information, based on three key ideas. First, any pattern (provided it is symmetric with respect to the initial spike) can be described as an arrangement of peaks with varying heights and spatial positions along a one-dimensional domain. Second, there exists a simple gene network—the diamond network—that, through parameter tuning, can produce two peaks of arbitrary height and symmetric position relative to the initial spike. Third, by placing multiple diamond networks positively upstream of a common gene product, that gene product can express peaks at each location where the upstream diamond networks induce them. Under mild additional conditions, this mechanism allows the formation of essentially any symmetric pattern. These mild conditions, along with a detailed analysis of the diamond network’s ability to generate peaks with controllable height and position, are discussed in the Supplementary Information.

      "In 2D there are no peaks but concentric rings of high gene product concentration centered around the spike, while in 3D there are concentric spherical shells." (Main text l. 447ff).

      This result pertains specifically to pattern transformations arising from spike initial patterns. As defined in the text, spike initial patterns are radially symmetric. Since diffusion preserves radial symmetry, pattern transformations from spike initial patterns in two or three dimensions reduce to effectively one-dimensional transformations along each radial direction. In this framework, each pair of concentration peaks symmetric with respect to the spike in one dimension corresponds to a ring surrounding the spike in two dimensions, and each ring in two dimensions becomes a hollow spherical shell around the spike in three dimensions.
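
      The reduction to a one-dimensional problem can be written explicitly. Assuming a radially symmetric solution centered on the spike at a position x_0 in d dimensions (notation assumed here), the Laplacian depends only on the radius r:

```latex
g_i(t,\mathbf x) = g_i(t,r),\quad r = |\mathbf x - \mathbf x_0|,
\qquad
\frac{\partial g_i}{\partial t}
= f_i(\mathbf g) + D_i\left(\frac{\partial^2 g_i}{\partial r^2} + \frac{d-1}{r}\,\frac{\partial g_i}{\partial r}\right).
```

      The only difference from the 1D case is the geometric term ((d−1)/r) ∂g_i/∂r, which may shift or rescale the peaks but leaves a single radial coordinate; a peak at radius r* therefore becomes a ring for d = 2 and a spherical shell for d = 3.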

      We agree that including a brief section in the Supplementary Information to clarify these subtleties would be helpful for readers to better understand the generalization of certain patterns to higher dimensions.

      (4) The study identifies one-signal networks and examines how combinations of these structures can give rise to minimal pattern-forming subnetworks. However, the analysis of the combinations of these minimal pattern-forming subnetworks remains relatively brief, and the manuscript does not explore how the results might change if the subnetworks were combined in upstream and downstream configurations. In our view, it is not evident that all possible gene regulatory networks can be fully characterized by these categories, nor that the resulting patterns can be reliably predicted. Rather, the approach appears more suited to identifying which known subnetworks are present within a larger network, without necessarily capturing the full dynamics of more complex configurations.

      We acknowledge that our explanation regarding the combination of sub-networks was relatively brief, and we intend to address this in a revised version. Our argument that combining sub-networks does not produce qualitatively new types of pattern transformations, beyond those already described, is based on the dispersion relation. Although this relation was only detailed in the Supplementary Information, it is central to our argument and will therefore be moved to the main text. Below, we provide an outline of this argument:

      Our study identifies two distinct behaviors of the principal branch of the dispersion relation at large wavenumbers. Based on this, gene networks capable of pattern formation can be classified into two categories: networks of the first kind, where the real part of the principal branch diverges to $-\infty$ as the wavenumber increases; and networks of the second kind, where the real part of the principal branch converges to a positive finite value for large wavenumbers. Naturally, this argument applies to any gene network, irrespective of which, or how many, sub-networks are used to build it.
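
      The two large-wavenumber behaviors can be illustrated numerically. The sketch below uses two hypothetical two-gene networks with parameters chosen only for illustration (not taken from the manuscript): a Turing-type network in which both gene products diffuse, and a noise-amplifying-type network in which the self-activating gene product does not diffuse. In both, the homogeneous state is stable at k = 0.

```python
import numpy as np

def principal_branch(J, D, k):
    """Largest real part of the eigenvalues of J - k^2 diag(D)."""
    return np.linalg.eigvals(J - k**2 * np.diag(D)).real.max()

networks = {
    # Both products diffuse: the principal branch diverges to -infinity.
    "Turing-type": (np.array([[1.0, -1.0], [2.0, -1.5]]), [1.0, 10.0]),
    # Self-activating product 1 does not diffuse: the branch tends to J[0, 0] = 1 > 0.
    "noise-amplifying-type": (np.array([[1.0, -2.0], [2.0, -3.0]]), [0.0, 1.0]),
}

for name, (J, D) in networks.items():
    values = [principal_branch(J, D, k) for k in (0.0, 10.0, 100.0)]
    print(name, ["%.3f" % v for v in values])
```

      Because diffusion coefficients are non-negative, the principal branch can never diverge to +∞; its limit is set by the non-diffusing part of the network.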

      Any gene regulatory network capable of pattern formation falls into one of these two categories. We identified that networks of the first kind contain at least one Turing sub-network, whereas networks of the second kind include either an H sub-network or a noise-amplifying sub-network. In this way, the primary objective of our study, namely a topological classification of gene regulatory networks capable of pattern formation, is fulfilled. It is important to note that while the dispersion relation provides broad information about the possible resulting patterns a gene network topology can produce (e.g., periodic versus noisy), it does not specify the exact patterns that emerge for each particular set of parameter values.

      Finally, regarding the shape of the resulting patterns, Figure S10 in the Supplementary Information exemplifies the notion that the behavior of combined networks can be understood as a combination of the individual behaviors of each constituent sub-network (note that the contribution of each type of sub-network in the resulting pattern is readily distinguishable). Consequently, we focus our detailed analysis on the patterning properties of the fundamental classes.

      (6) The manuscript lacks a clear and detailed explanation of the underlying model and its assumptions. In particular, it is not well-defined what constitutes a "cell" in the context of the model, nor is it justified why spatial features of cells -such as their size or boundaries- can be neglected. Furthermore, the concept of the extracellular space in the one-dimensional model remains ambiguous, making it unclear which gene products are assumed to diffuse.

      The size of cells is ignored in our model because we assume that cells are small enough relative to the total size of the domain for the space-continuous reaction-diffusion equation (equation (1) in the main text) to hold. Conceptually, the cells in our model can be understood as the pieces of an even partition of the domain into small subdomains, each surrounding a position x. This is, in any case, the standard procedure in most reaction-diffusion models of pattern formation in embryonic development.

      For extracellular signals, we assume that g(t, x) corresponds to the concentration of the signal in the extracellular space surrounding the cell located at position x. The extracellular space is any fluid medium in which Fick's laws apply and, therefore, the Fickian diffusion term in equation (1) is valid.

      For intracellular gene products, we assume that g(t, x) corresponds to the concentration of that gene product within the cell at position x (if the gene product is, for example, a transcription factor) or on its surface (if it is a membrane-bound receptor). Once collapsed into the continuous equations, there is no difference between being strictly within the cell or on its boundary. The only important fact is that these gene products cannot diffuse.

      Regarding cell boundaries, let us consider an extracellular signal s that regulates a transcription factor i within cells (in our model, i is an intracellular gene product). Such regulation must be mediated by a membrane-bound receptor, which corresponds to an intracellular gene product j. In terms of the gene regulatory network, this is the chain s → j → i. The cell boundary effects mentioned by the reviewer would be encapsulated in the specific functional form of the regulation function f(g), but they have no effect on the actual topology of the network. Consequently, they are outside the scope of this study: as mentioned before, considering different nonlinear terms for f(g) will affect the parameter range for which a gene network is capable of producing non-trivial pattern transformations, but not its overall ability to produce them (i.e., the existence of at least one choice of model parameters for which such transformations take place).

      Finally, we would like to once again express our sincere gratitude to all reviewers for their insightful and constructive feedback. We are confident that the thorough peer review process will significantly enhance both the clarity and depth of our work. We greatly value the detailed comments provided and will carefully incorporate them in the preparation of a revised manuscript, which we intend to submit in the coming months.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.
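
      The acceptability heuristic mentioned above (at least 50% precision among the top 50 predicted contacts) can be computed as follows. This is a minimal sketch, assuming a predicted inter-protein contact-score matrix of shape (L_A, L_B) and a binary ground-truth contact map of the same shape; the function names and the toy data are hypothetical, not PLMGraph-Inter's code.

```python
import numpy as np

def top_k_precision(pred, true_contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true contacts."""
    top = np.argsort(pred.ravel())[::-1][:k]   # flat indices of the k highest scores
    return true_contacts.ravel()[top].mean()

def is_acceptable(pred, true_contacts, k=50, threshold=0.5):
    """Heuristic: at least 50% precision among the top 50 predicted contacts."""
    return top_k_precision(pred, true_contacts, k) >= threshold

# Toy example: a 30 x 40 inter-chain map with 60 true contacts.
rng = np.random.default_rng(0)
true_map = np.zeros((30, 40), dtype=int)
true_map.flat[rng.choice(true_map.size, size=60, replace=False)] = 1

good_pred = true_map + 0.1 * rng.random(true_map.shape)  # true contacts always score highest
random_pred = rng.random(true_map.shape)                  # uninformative scores
print(top_k_precision(good_pred, true_map))  # 1.0
print(is_acceptable(random_pred, true_map))  # expected False (base rate is 60/1200)
```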

      We thank the reviewer for recognizing the strengths of our work!

      Weaknesses:

      My biggest issue with this work is the evaluations made using bound monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.

      We thank the reviewer for the suggestion! We evaluated PLMGraph-Inter with predicted monomers and analyzed the results in detail (see the “Impact of the monomeric structure quality on contact prediction” section and Figure 3). To mimic real cases, we even deliberately degraded the performance of AF2 by using reduced MSAs (see the 2nd paragraph of the “Impact of the monomeric structure quality on contact prediction” section). Some of these results are currently in the Supplementary Information of the manuscript (Table S2). In the revision, we will move them to the main text to emphasize the performance of PLMGraph-Inter with predicted monomers.

      In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, not Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.

      We thank the reviewer for the suggestion! Yes, the performance of PLMGraph-Inter drops when predicted monomers are used as inputs. However, it is difficult to say which is the fairer comparison, Figure 6 or Figure S2, since AFM also searched for monomer templates during prediction (see the third paragraph of 7. Supplementary Information: 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full). When we checked our AFM runs, we found that 99% of the targets in our study (including all targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) used at least 20 templates in their predictions, and 87.8% of the targets used the native templates. We will provide the confidence values of the AFM predictions in the revision.

      Besides, in cases where any experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.

      We thank the reviewer for the suggestion! We would like to point out that AFM also searched for monomer templates during prediction (see the third paragraph of 7. Supplementary Information: 7.1 Data in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full). When we checked our AFM runs, we found that 99% of the targets in our study (including all targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) used at least 20 templates in their predictions, and 87.8% of the targets used the native template.

      It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?

      We thank the reviewer for the suggestion! The biggest challenge in objectively evaluating AFM is that, as far as we know, AFM does not release the PDB IDs of its training set or of the “Recent-PDB-Multimers” dataset. “Benchmark 2” includes only 17 heterodimers, and this number would decrease further after removing targets redundant to our training set. It is difficult to draw conclusions from such a small number of targets. In the revision, we will analyze the performance of AFM on targets released after the cutoff date of the AFM training set, although this cannot completely remove the redundancy between the training and test sets of AFM.

      It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.

      We thank the reviewer for reminding us of the new version of AFM. The only difference between AFM v3 and v2 is the cutoff date of the training set. Our test set would overlap more with the training set of AFM v3, which is one reason we think AFM v2 is more appropriate for this comparison.

      Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.

      We agree with the reviewer that testing whether the model can keep its performance on targets with no templates (i.e. non-redundant in structure) is important. We will perform the analysis in the revision.

      Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.

      Using the protein geometric graph to integrate multiple protein language models is the main idea of PLMGraph-Inter. Compared with our previous work (DRN-1D2D_Inter), we consider the construction of the geometric graph to be a major contribution of this work. To emphasize the efficacy of this geometric graph, we chose the “geometry-only” model as the base model. We will clarify this further in the revision.

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially those driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) still leave room for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for model training.

      The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.

      We thank the reviewer for recognizing the significance of our work! We will revise the manuscript carefully to address the reviewer’s concerns.

      1. The sequence identity cutoff to remove redundancies between training and test set was set to 40%, which is a bit high to remove test examples having homology to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancies between training and test set examples. To make their results more solid, the authors should have curated test examples with lower sequence identity cutoffs, or have provided the performance changes against sequence identities to the closest training examples.

      We thank the reviewer for the valuable suggestion! Using different thresholds to reduce the redundancy between the test set and the training set is a very good suggestion, and we will perform this analysis in the revision. In the current version of the manuscript, we used 40% sequence identity as the cutoff because many previous studies used it (e.g. the Recent-PDB-Multimers dataset used in AlphaFold-Multimer (see: 7.8 Datasets in the AlphaFold-Multimer paper); the work of DSCRIPT: https://www.cell.com/action/showPdf?pii=S2405-4712%2821%2900333-1 (see: the PPI dataset paragraph in the METHODS DETAILS section of the STAR METHODS)). One reason for using a relatively higher threshold in PPI studies is that PPIs are generally not as conserved as protein monomers.

      We performed a preliminary analysis using different thresholds to remove redundancy when preparing this provisional response letter:

      Author response table 1.

      Table 1. The performance of PLMGraph-Inter on the HomoPDB and HeteroPDB test sets using native structures (AlphaFold2-predicted structures).

      Method:

      To remove redundancy, we clustered 11,096 sequences from the training set and the test sets (HomoPDB, HeteroPDB) using MMseqs2 at different sequence identity thresholds (40%, 30%, 20%, 10%); because the lowest cutoff supported by CD-HIT is 40%, we switched to MMseqs2. Each sequence was then uniquely labeled by the cluster it belongs to (e.g. cluster 0, cluster 1, …), so that each PPI could be labeled with a pair of clusters (e.g. cluster 0–cluster 1). PPIs belonging to the same cluster pair were considered redundant (note: cluster n–cluster m and cluster m–cluster n were considered the same pair). For each PPI in the test set, if its cluster pair also contained a PPI from the training set, we removed that PPI from the test set.
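      The cluster-pair filtering step can be sketched as follows. The sketch assumes clustering has already been run (for example with MMseqs2 at the chosen identity threshold) and that its output has been loaded as a mapping from sequence ID to cluster ID; all names and toy IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

PPI = Tuple[str, str]  # a pair of sequence IDs (chain A, chain B)


def cluster_pair(ppi: PPI, cluster_of: Dict[str, int]) -> Tuple[int, int]:
    """Label a PPI by its unordered pair of clusters (n-m and m-n count as the same pair)."""
    a, b = cluster_of[ppi[0]], cluster_of[ppi[1]]
    return (a, b) if a <= b else (b, a)


def remove_redundant_tests(train: Iterable[PPI], test: Iterable[PPI],
                           cluster_of: Dict[str, int]) -> List[PPI]:
    """Keep only the test PPIs whose cluster pair never occurs among training PPIs."""
    train_pairs = {cluster_pair(p, cluster_of) for p in train}
    return [p for p in test if cluster_pair(p, cluster_of) not in train_pairs]


# Hypothetical clustering output, e.g. parsed from an MMseqs2 cluster table.
cluster_of = {"A": 0, "B": 1, "C": 0, "D": 2, "E": 1}
train = [("A", "B")]
test = [("E", "C"), ("A", "D"), ("D", "D")]
print(remove_redundant_tests(train, test, cluster_of))  # [('A', 'D'), ('D', 'D')]
```

      Sorting the two cluster IDs before comparison is what makes "cluster n–cluster m" and "cluster m–cluster n" count as the same pair.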

      We will perform more detailed analyses in the revised manuscript.

      2. Figures with head-to-head comparison scatter plots are hard to interpret because too many different methods are combined into a single plot with multiple colors. It would be better to provide individual head-to-head scatter plots as supplementary figures, not in the main figure.

      We thank the reviewer for the suggestion! We will include the individual head-to-head scatter plots as supplementary figures in the revision.

      3. The authors claim that PLMGraph-Inter is complementary to AlphaFold-multimer as it shows better precision for the cases where AlphaFold-multimer fails. To strengthen the point, the qualities of predicted complex structures via protein-protein docking with predicted contacts as restraints should have been compared to those of AlphaFold-multimer structures.

      We thank the reviewer for the suggestion! We will add this comparison in the revision.

      4. It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).

      We thank the reviewer for the suggestion! We will perform such analysis in the revision.

    1. Author response:

      eLife Assessment 

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for performance improvement, would strengthen the manuscript.

      We appreciate the Editorial assessment of our paper’s strengths and novelty. We have implemented additional control analyses showing that neither task-related eye movements nor increasing overlap of finger movements during learning accounts for our findings, namely that contextualized neural representations in a network of bilateral frontoparietal brain regions actively contribute to skill learning. Importantly, we carried out additional analyses showing that contextualization develops predominantly during rest intervals.

      Public Reviews:

      We thank the Reviewers for their comments and suggestions, prompting new analyses and additions that strengthened our report.

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain-machine interfaces: one can now decode individual elements within a sequence with high precision, but these representations are not static; they develop over the course of learning.

      Strengths: The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most, of the gains in behaviour (ie speed of finger movements) occur in these so-called micro-offline rest periods. The authors use a range of new decoding techniques and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

      We have previously shown that neural replay of MEG activity representing the practiced skill correlated with micro-offline gains during rest intervals of early learning,1 consistent with the recent report that hippocampal ripples during these offline periods predict human motor sequence learning2. However, decoding accuracy in our earlier work1 needed improvement. Here, we report a strategy to improve decoding accuracy that could benefit future studies of neural replay or BCI using MEG.

      Weaknesses: 

      There are a few concerns which the authors may well be able to resolve. These are not weaknesses as such, but factors that would be helpful to address as these concern potential contributions to the results that one would like to rule out. Regarding the decoding results shown in Figure 2 etc, a concern is that within individual frequency bands, the highest accuracy seems to be within frequencies that match the rate of keypresses. This is a general concern when relating movement to brain activity, so is not specific to decoding as done here. As far as reported, there was no specific restraint to the arm or shoulder, and even then it is conceivable that small head movements would correlate highly with the vigor of individual finger movements. This concern is supported by the highest contribution in decoding accuracy being in middle frontal regions - midline structures that would be specifically sensitive to movement artefacts and don't seem to come to mind as key structures for very simple sequential keypress tasks such as this - and the overall pattern is remarkably symmetrical (despite being a unimanual finger task) and spatially broad. This issue may well be matching the time course of learning, as the vigor and speed of finger presses will also influence the degree to which the arm/shoulder and head move. This is not to say that useful information is contained within either of the frequencies or broadband data. But it raises the question of whether a lot is dominated by movement "artefacts" and one may get a more specific answer if removing any such contributions. 

      Reviewer #1 expresses concern that the combination of the low-frequency narrow-band decoder results, and the bilateral middle frontal regions displaying the highest average intra-parcel decoding performance across subjects is suggestive that the decoding results could be driven by head movement or other artefacts.

      Head movement artefacts are highly unlikely to contribute meaningfully to our results for the following reasons. First, in addition to ICA denoising, all “recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements” (see Methods). Second, the response pad was positioned in a manner that minimized wrist, arm or more proximal body movements during the task. Third, while head position was not monitored online for this study, the head was restrained using an inflatable air bladder, and head position was assessed at the beginning and at the end of each recording. Head movement did not exceed 5 mm between the beginning and end of each scan for all participants included in the study. Fourth, we agree that despite the steps taken above, it is possible that minor head movements could still contribute to some remaining variance in the MEG data in our study. The Reviewer states a concern that “it is conceivable that small head movements would correlate highly with the vigor of individual finger movements”. However, for any such correlations to meaningfully impact decoding performance, such head movements would need to: (A) be consistent and pervasive throughout the recording (which might not be the case if the head movements were related to movement vigor and vigor changed over time); and (B) systematically vary between different finger movements, and also between the same finger movement performed at different sequence locations (see 5-class decoding performance in Figure 4B). It is extremely unlikely that any head movement artefacts meet all these conditions.

      Given the task design, a much more likely confound in our estimation would be the contribution of eye movement artefacts to the decoder performance (an issue appropriately raised by Reviewer #3 in the comments below). Remember from Figure 1A in the manuscript that an asterisk marks the current position in the sequence and is updated at each keypress. Since participants make very few performance errors, the position of the asterisk on the display is highly correlated with the keypress being made in the sequence. Thus, it is possible that if participants are attending to the visual feedback provided on the display, they may move their eyes in a way that is systematically related to the task.  Since we did record eye movements simultaneously with the MEG recordings (EyeLink 1000 Plus; Fs = 600 Hz), we were able to perform a control analysis to address this question. For each keypress event during trials in which no errors occurred (which is the same time-point that the asterisk position is updated), we extracted three features related to eye movements: 1) the gaze position at the time of asterisk position update (or keyDown event), 2) the gaze position 150ms later, and 3) the peak velocity of the eye movement between the two positions. We then constructed a classifier from these features with the aim of predicting the location of the asterisk (ordinal positions 1-5) on the display. As shown in the confusion matrix below (Author response image 1), the classifier failed to perform above chance levels (Overall cross-validated accuracy = 0.21817):

      Author response image 1.

      Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.2181). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e. – behavioral artefacts).
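      The logic of this control (do eye-movement features predict the asterisk position, five classes with chance at 0.2, better than chance under 5-fold cross-validation?) can be sketched as below. The response does not say which classifier was used; the nearest-centroid classifier, the synthetic features and every name here are assumptions chosen only to illustrate the procedure, and the synthetic numbers are not the study's data.

```python
import random
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (three eye-movement features, asterisk position 1-5)


def nearest_centroid_fit(train: Sequence[Sample]) -> Dict[int, List[float]]:
    """Average the feature vectors of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def nearest_centroid_predict(centroids: Dict[int, List[float]], x: List[float]) -> int:
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def cross_validated_accuracy(data: List[Sample], k: int = 5, seed: int = 0) -> List[float]:
    """Shuffle, split into k folds, and return the test accuracy of each fold."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        centroids = nearest_centroid_fit(train)
        correct = sum(nearest_centroid_predict(centroids, x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return accuracies


# Synthetic data in which the features carry no information about position,
# so accuracy should sit near the 5-class chance level of 0.2.
rng = random.Random(42)
data = [([rng.gauss(0, 1) for _ in range(3)], rng.randint(1, 5)) for _ in range(5000)]
fold_acc = cross_validated_accuracy(data)
print([round(a, 3) for a in fold_acc], round(fmean(fold_acc), 3))
```

      Because the labels in this toy data are drawn independently of the features, any classifier's expected accuracy is 0.2; a real analysis would compare observed accuracy against that chance level (or a permutation distribution).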

      In fact, inspection of the eye position data revealed that a majority of participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. A similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user. The minimal participant engagement with the visual task display observed in this study highlights another important point – that the behavior in explicit sequence learning motor tasks is highly generative in nature rather than reactive to stimulus cues as in the serial reaction time task (SRTT).  This is a crucial difference that must be carefully considered when designing investigations and comparing findings across studies.

      We observed that initial keypress decoding accuracy was predominantly driven by contralateral primary sensorimotor cortex in the initial practice trials before transitioning to bilateral frontoparietal regions by trial 11 or 12 as performance gains plateaued. The contribution of contralateral primary sensorimotor areas to early skill learning has been extensively reported in humans and non-human animals.1,3-5 Similarly, the increased involvement of bilateral frontal and parietal regions in decoding during early skill learning with the non-dominant hand is well known. Enhanced bilateral activation in both frontal and parietal cortex during skill learning has been extensively reported6-11, and appears to be even more prominent during early fine motor skill learning with the non-dominant hand12,13. The frontal regions identified in these studies are known to play crucial roles in executive control14, motor planning15, and working memory6,8,16-18 processes, while the same parietal regions are known to integrate multimodal sensory feedback and support visuomotor transformations6,8,16-18, in addition to working memory19. Thus, it is not surprising that these regions increasingly contribute to decoding as subjects internalize the sequential task. We now include a statement reflecting these considerations in the revised Discussion.

      A somewhat related point is this: when combining voxel and parcel space, a concern is whether a degree of circularity may have contributed to the improved accuracy of the combined data, because it seems to use the same MEG signals twice - the voxels most contributing are also those contributing most to a parcel being identified as relevant, as parcels reflect the average of voxels within a boundary. In this context, I struggled to understand the explanation given, ie that the improved accuracy of the hybrid model may be due to "lower spatially resolved whole-brain and higher spatially resolved regional activity patterns".

      We strongly disagree with the Reviewer’s assertion that the construction of the hybrid-space decoder is circular. To clarify, the base feature set for the hybrid-space decoder constructed for all participants includes whole-brain spatial patterns of MEG source activity averaged within parcels. As stated in the manuscript, these 148 inter-parcel features reflect “lower spatially resolved whole-brain activity patterns” or global brain dynamics. We then independently test how well spatial patterns of MEG source activity for all voxels distributed within individual parcels can decode keypress actions. Again, these intra-parcel spatial patterns, intended to capture “higher spatially resolved regional brain activity patterns”, are tested completely independently of one another and of the weighting of individual inter-parcel features. These intra-parcel features could, for example, provide additional information about muscle activation patterns or the task environment. These intra-parcel voxels (approximately 1150 on average; the total number varies between subjects) are then combined with the 148 inter-parcel features to construct the final hybrid-space decoder. In fact, this varied spatial filter approach shares some similarities with the construction of convolutional neural networks (CNNs) used to perform object recognition in image classification applications. One could also view this hybrid-space decoding approach as a spatial analogue of common time-frequency analyses such as theta-gamma phase amplitude coupling (PAC), which combine information from two or more narrow-band spectral features derived from the same time-series data.
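      As a rough illustration of how such a hybrid-space feature vector could be assembled for one keypress epoch, the sketch below concatenates all parcel-averaged (inter-parcel) features with the unaveraged voxel (intra-parcel) features of a chosen subset of parcels. The data layout (parcel name mapped to that parcel's voxel values), the parcel names and the `exclude_overlap` flag are hypothetical; how top parcels are ranked in the study is not reproduced here.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def hybrid_features(voxels_by_parcel: Dict[str, List[float]],
                    top_parcels: Sequence[str],
                    exclude_overlap: bool = False) -> List[float]:
    """Concatenate parcel-averaged (inter-parcel) features with voxel-level
    (intra-parcel) features of selected parcels, for one keypress epoch.

    exclude_overlap=True drops the averaged features of the selected parcels,
    mimicking the HybridAlt control; False keeps them (HybridOrig-style).
    """
    selected = set(top_parcels)
    parcels = sorted(voxels_by_parcel)  # fixed order keeps feature positions stable
    inter = [fmean(voxels_by_parcel[p]) for p in parcels
             if not (exclude_overlap and p in selected)]
    # Voxel patterns of the selected parcels are kept unaveraged.
    intra = [v for p in top_parcels for v in voxels_by_parcel[p]]
    return inter + intra


# Hypothetical three-parcel brain; only parcel "p3" contributes voxel-level features.
voxels = {"p1": [1.0, 3.0], "p2": [2.0, 2.0, 2.0], "p3": [0.0, 4.0]}
print(hybrid_features(voxels, ["p3"]))                        # [2.0, 2.0, 2.0, 0.0, 4.0]
print(hybrid_features(voxels, ["p3"], exclude_overlap=True))  # [2.0, 2.0, 0.0, 4.0]
```

      The toy parcel "p3" shows why averaging discards information: its mean (2.0) is identical to the other parcels' means, while its voxel pattern [0.0, 4.0] is distinct.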

      We directly tested this hypothesis – that spatially overlapping intra- and inter-parcel features carry different information – by constructing an alternative hybrid-space decoder (HybridAlt) that excluded average inter-parcel features which spatially overlapped with intra-parcel voxel features, and comparing its performance to the decoder used in the manuscript (HybridOrig). The prediction was that if the overlapping parcels contained similar information to the more spatially resolved voxel patterns, then removing the parcel features (n=8) from the decoding analysis should not impact performance. In fact, although those parcel features made up less than 1% of the overall input feature space, removing them resulted in a significant drop in overall performance of more than 2 percentage points (78.15% ± SD 7.03% for HybridOrig vs. 75.49% ± SD 7.17% for HybridAlt; Wilcoxon signed rank test, z = 3.7410, p = 1.8326e-04) (Author response image 2).

      Author response image 2.

      Comparison of decoding performances with two different hybrid approaches. HybridAlt: Intra-parcel voxel-space features of top ranked parcels and inter-parcel features of remaining parcels. HybridOrig: Voxel-space features of top ranked parcels and whole-brain parcel-space features (i.e. – the version used in the manuscript). Dots represent decoding accuracy for individual subjects. Dashed lines indicate the trend in performance change across participants. Note that HybridOrig (the approach used in our manuscript) significantly outperforms the HybridAlt approach, indicating that the excluded parcel features provide unique information compared with the spatially overlapping intra-parcel voxel patterns.
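      The paired comparison reported above uses a Wilcoxon signed-rank test. A minimal stdlib sketch of that test with the large-sample normal approximation is shown below. It drops zero differences, gives tied absolute differences average ranks, and applies no tie or continuity correction, so statistical packages that apply such corrections (or exact p-values for small samples) may report somewhat different z and p; it is not claimed to reproduce the values in the text. The example inputs are toy numbers, not study data.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank_z(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired Wilcoxon signed-rank test using the large-sample normal approximation.

    Returns (z, two-sided p). Zero differences are dropped and tied |differences|
    receive average ranks; no tie or continuity correction is applied to the variance.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of positive ranks
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Toy paired accuracies (first decoder higher for every "subject").
print(wilcoxon_signed_rank_z([5, 6, 7, 8, 9, 10], [4] * 6))
```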

      Firstly, there will be a relatively high degree of spatial contiguity among voxels because of the nature of the signal measured, i.e. nearby individual voxels are unlikely to be independent. Secondly, the voxel data gives a somewhat misleading sense of precision; the inversion can be set up to give an estimate for each voxel, but there will not just be dependence among adjacent voxels, but also substantial variation in the sensitivity and confidence with which activity can be projected to different parts of the brain. Midline and deeper structures come to mind, where the inversion will be more problematic than for regions along the dorsal convexity of the brain, and a concern is that in those midline structures, the highest decoding accuracy is seen. 

      We definitely agree with the Reviewer that some inter-parcel features representing neighboring (or spatially contiguous) voxels are likely to be correlated. This has been well documented in the MEG literature20,21 and is a particularly important confound to address in functional or effective connectivity analyses (not performed in the present study). In the present analysis, any correlation between adjacent voxels presents a multicollinearity problem, which effectively reduces the dimensionality of the input feature space. However, as long as there are multiple groups of correlated voxels within each parcel (i.e. - the effective dimensionality is still greater than 1), the intra-parcel spatial patterns could still meaningfully contribute to the decoder performance. Two specific results support this assertion.
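      One common way to quantify the "effective dimensionality" invoked above is the participation ratio of the voxel covariance spectrum, PR = (Σλ)² / Σλ², which is 1 for a rank-one covariance (all voxels carry the same signal) and N when the covariance of N voxels is proportional to the identity. The response does not state that this measure was used; the sketch below is only an illustration on hypothetical data, and it avoids an eigendecomposition by using trace(C) = Σλ and Σᵢⱼ Cᵢⱼ² = trace(C²) = Σλ² for a symmetric C.

```python
from statistics import fmean
from typing import List


def covariance(samples: List[List[float]]) -> List[List[float]]:
    """Sample covariance matrix (voxels x voxels) from rows of observations."""
    n, d = len(samples), len(samples[0])
    means = [fmean(row[j] for row in samples) for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def participation_ratio(cov: List[List[float]]) -> float:
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues) of a symmetric matrix."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    sum_sq = sum(v * v for row in cov for v in row)  # equals trace(C @ C) for symmetric C
    return trace ** 2 / sum_sq


# Hypothetical parcel with six voxels forming two perfectly correlated groups.
s1, s2 = [1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]  # uncorrelated source signals
two_groups = [[a] * 3 + [b] * 3 for a, b in zip(s1, s2)]
one_group = [[a] * 6 for a in s1]
print(participation_ratio(covariance(two_groups)))  # close to 2: two independent groups
print(participation_ratio(covariance(one_group)))   # close to 1: all voxels redundant
```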

      First, we obtained higher decoding accuracy with voxel-space features [74.51% (± SD 7.34%)] than with parcel-space features [68.77% (± SD 7.6%)] (Figure 3B), indicating that individual voxels carry more information for decoding keypresses than averaged voxel-space features, or parcel-space features. Second, individual voxels within a parcel showed varying feature importance scores in decoding keypresses (Author response image 3). This finding supports the Reviewer’s assertion that neighboring voxels express similar information, but also shows that the correlated voxels form mini-subclusters that are much smaller spatially than the parcel they reside in.

      Author response image 3.

      Feature importance score of individual voxels in decoding keypresses: MRMR was used to rank the individual voxel-space features in decoding keypresses, and the min-max normalized MRMR score was mapped onto a structural brain surface. Note that individual voxels within a parcel contributed differently to decoding.
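      To illustrate the kind of ranking behind this figure, the sketch below implements a greedy minimum-redundancy maximum-relevance (MRMR) ranking followed by min-max normalization. For simplicity it uses the F-statistic minus mean absolute correlation variant of MRMR; the exact MRMR scoring used for the figure is not specified here (it may, for instance, be mutual-information based), so scores from this sketch should not be expected to match. All function names and toy data are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def f_statistic(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic of a feature across class labels (relevance)."""
    groups: Dict[int, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        return 0.0  # F is undefined without at least two classes and spare samples
    grand = fmean(values)
    means = {y: fmean(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        return math.inf
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def abs_corr(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute Pearson correlation between two features (redundancy)."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def mrmr_rank(features: Dict[str, List[float]],
              labels: Sequence[int]) -> Tuple[List[str], Dict[str, float]]:
    """Greedy MRMR: relevance minus mean |correlation| with already selected features."""
    relevance = {f: f_statistic(v, labels) for f, v in features.items()}
    selected: List[str] = []
    scores: Dict[str, float] = {}

    def criterion(f: str) -> float:
        if not selected:
            return relevance[f]
        return relevance[f] - fmean(abs_corr(features[f], features[s]) for s in selected)

    remaining = sorted(features)
    while remaining:
        best = max(remaining, key=criterion)  # ties resolved by name order
        scores[best] = criterion(best)        # score recorded at selection time
        selected.append(best)
        remaining.remove(best)
    return selected, scores


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] before mapping them to a brain surface."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {f: (s - lo) / span if span > 0 else 0.0 for f, s in scores.items()}


# Toy "voxels": v1 separates the two classes, v2 duplicates v1, v3 is noise-like.
labels = [0, 0, 0, 1, 1, 1]
features = {"v1": [0.0, 0.1, 0.0, 1.0, 1.1, 1.0],
            "v2": [0.0, 0.1, 0.0, 1.0, 1.1, 1.0],
            "v3": [0.5, 0.2, 0.9, 0.4, 0.8, 0.1]}
order, scores = mrmr_rank(features, labels)
print(order, min_max(scores))
```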


      Some of these concerns could be addressed by recording head movement (with enough precision) to regress out these contributions. The authors state that head movement was monitored with 3 fiducials, and their time courses ought to provide a way to deal with this issue. The ICA procedure may not have sufficiently dealt with removing movement-related problems, but one could eg relate individual components that were identified to the keypresses as another means for checking. An alternative could be to focus on frequency ranges above the movement frequencies. The accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment. 

      We have already addressed the issue of movement-related artefacts in the first response above. With respect to a focus on frequency ranges above movement frequencies, the Reviewer states the “accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment”. First, it is important to note that cortical delta-band oscillations measured with local field potentials (LFPs) in macaques are known to contain important information related to end-effector kinematics22,23, muscle activation patterns24 and temporal sequencing25 during skilled reaching and grasping actions. Thus, there is a substantial body of evidence that low-frequency neural oscillatory activity in this range contains important information about the skill learning behavior investigated in the present study. Second, our own data show (as the Reviewer also points out) that significant information related to the skill learning behavior is also present in higher frequency bands (see Figure 2A and Figure 3—figure supplement 1). As we pointed out in our earlier response to questions about the hybrid space decoder architecture (see above), it is likely that different, yet complementary, information is encoded across different temporal frequencies (just as it is encoded across different spatial frequencies). Again, this interpretation is supported by our data, as the highest performing classifiers in all cases (when holding all parameters constant) were always constructed from broadband input MEG data (Figure 2A and Figure 3—figure supplement 1).

      One question concerns the interpretation of the results shown in Figure 4. They imply that during the course of learning, entirely different brain networks underpin the behaviour. Not only that, but they also include regions that would seem rather unexpected to be key nodes for learning and expressing relatively simple finger sequences, such as here. What then is the biological plausibility of these results? The authors seem to circumnavigate this issue by moving into a distance metric that captures the (neural network) changes over the course of learning, but the discussion seems detached from which regions are actually involved; or they offer a rather broad discussion of the anatomical regions identified here, e.g., in the context of LFOs, where they merely refer to "frontoparietal regions".

      The Reviewer notes the shift in brain networks driving keypress decoding performance between trials 1, 11 and 36 as shown in Figure 4A. The Reviewer questions whether these substantial shifts in brain network states underpinning the skill are biologically plausible, as well as the likelihood that bilateral superior and middle frontal and parietal cortex are important nodes within these networks.

      First, previous fMRI work in humans performing a similar sequence learning task showed that flexibility in brain network composition (i.e. – changes in brain region members displaying coordinated activity) is up-regulated in novel learning environments and explains differences in learning rates across individuals26.  This work supports our interpretation of the present study data, that brain networks engaged in sequential motor skills rapidly reconfigure during early learning.

      Second, frontoparietal network activity is known to support motor memory encoding during early learning27,28. For example, reactivation events in the posterior parietal29 and medial prefrontal30,31 cortex (MPFC) have been temporally linked to hippocampal replay, and are posited to support memory consolidation across several memory domains32, including motor sequence learning1,33,34. Further, synchronized interactions between MPFC and hippocampus are more prominent during early learning than at later stages27,35,36, perhaps reflecting “redistribution of hippocampal memories to MPFC”27. MPFC contributes to very early memory formation by learning associations between contexts, locations, events and adaptive responses during rapid learning37. Consistent with this, coupling between hippocampus and MPFC has been shown during, and importantly immediately following (i.e., during rest), initial memory encoding38,39. Importantly, MPFC activity during initial memory encoding predicts subsequent recall40. Thus, the spatial map required to encode a motor sequence memory may be “built under the supervision of the prefrontal cortex”28, a region also engaged in the development of an abstract representation of the sequence41. In more abstract terms, the prefrontal, premotor and parietal cortices support novice performance “by deploying attentional and control processes” required during early learning42-44. The dorsolateral prefrontal cortex (DLPFC) specifically is thought to engage in goal selection and sequence monitoring during early skill practice45, all consistent with the schema model of declarative memory in which prefrontal cortices play an important role in encoding46,47. Thus, several prefrontal and frontoparietal regions contributing to long-term learning48 are also engaged in early stages of encoding. Altogether, there is strong biological support for the involvement of bilateral prefrontal and frontoparietal regions in decoding during early skill learning.
We now address this issue in the revised manuscript.

      If I understand correctly, the offline neural representation analysis is in essence the comparison of the last keypress vs the first keypress of the next sequence. In that sense, the activity during offline rest periods is actually not considered. This makes the nomenclature somewhat confusing. It matches the behavioural analysis (where, with only keypresses available, it cannot be done any other way), but here the authors actually do have recordings of brain activity during offline rest. So at the very least, calling it offline neural representation is misleading to this reviewer, because what is compared is activity during the last and during the next keypress, not activity during offline periods. It also seems a missed opportunity: the authors argue that most of the relevant learning occurs during offline rest periods, yet there is no attempt to actually test whether activity during this period can be useful for the questions at hand here.

      We agree with the Reviewer that our previous “offline neural representation” nomenclature could be misinterpreted. In the revised manuscript we refer to this difference as the “offline neural representational change”. Please note that our previous work did link offline neural activity (i.e. – 16-22 Hz beta power and neural replay density during inter-practice rest periods) to observed micro-offline gains49.

      Reviewer #2 (Public review): 

      Summary 

      Dash et al. asked whether and how the neural representation of individual finger movements is "contextualized" within a trained sequence during the very early period of sequential skill learning by using decoding of MEG signal. Specifically, they assessed whether/how the same finger presses (index finger presses) embedded in the different ordinal positions of a practiced sequence (4-1-3-2-4; here, the numbers 1 through 4 correspond to the little through the index fingers of the non-dominant left hand) change their representation (MEG feature). They did this by computing either the decoding accuracy of the index finger at the ordinal positions 1 vs. 5 (index_OP1 vs index_OP5) or pattern distance between index_OP1 vs. index_OP5 at each training trial and found that both the decoding accuracy and the pattern distance progressively increase over the course of learning trials. More interestingly, they also computed the pattern distance for index_OP5 for the last execution of a practice trial vs. index_OP1 for the first execution in the next practice trial (i.e., across the rest period). This "offline" distance was significantly larger than the "online" distance, which was computed within practice trials, and predicted micro-offline skill gain. Based on these results, the authors conclude that the differentiation of representation for the identical movement embedded in different positions of a sequential skill ("contextualization") primarily occurs during early skill learning, especially during rest, consistent with the recent theory of "micro-offline learning" proposed by the authors' group. I think this is an important and timely topic for the field of motor learning and beyond.

      Strengths

      The specific strengths of the current work are as follows. First, the use of temporally rich neural information (MEG signal) has a large advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Second, through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. As claimed by the authors, this is one of the strengths of the paper (but see my comments). Third, although some potential refinement might be needed, comparing "online" and "offline" pattern distance is a neat idea. 

      Weaknesses 

      Along with the strengths I raised above, the paper has some weaknesses. First, the pursuit of high decoding accuracy, especially the choice of time points and window length (i.e., a 200 msec window starting at 0 msec from key press onset), casts a shadow on the interpretation of the main result. Currently, it is unclear whether the decoding results simply reflect behavioral change or true underlying neural change. As shown in the behavioral data, the key press speed already reached 3-4 presses per second at around the end of the early learning period (11th trial), which means inter-press intervals become as short as 250-330 msec. Thus, in more than 60% of the training-period data, the time window for MEG feature extraction (200 msec) spans roughly 60-80% of the inter-press interval. Considering that the preparation/cueing of subsequent presses starts ahead of the actual press (e.g., Kornysheva et al., 2019) and/or potential online planning (e.g., Ariani and Diedrichsen, 2019), the decoder likely captured information about these future presses as well as the signal related to the current key press, independent of the formation of a genuine sequential representation (e.g., "contextualization" of individual presses). This may also explain the gradual increase in decoding accuracy or pattern distance between index_OP1 and index_OP5 (Figure 4C and 5A), which co-occurred with performance improvement, as shorter inter-press intervals are more favorable for dissociating the two index finger presses followed by different finger presses. The compromised decoding accuracies for the control sequences can be explained by similar logic. Therefore, more careful consideration and elaborated discussion seem necessary when trying to both achieve high-performance decoding and assess early skill learning, as it can impact all the subsequent analyses.
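
      To make the window-overlap arithmetic concrete, the short Python sketch below computes what fraction of one inter-press interval a 200 ms window starting at keypress onset spans at the 3 and 4 presses per second quoted above. It only restates numbers given in the text; the function name is hypothetical.

```python
# Fraction of one inter-press interval (IPI) spanned by a fixed decoding window.
# The 200 ms window and the 3-4 presses/s rates come from the text above;
# the helper name is illustrative only.

WINDOW_MS = 200.0  # feature-extraction window starting at keypress onset


def window_coverage(presses_per_second: float, window_ms: float = WINDOW_MS) -> float:
    """Return the fraction of one inter-press interval covered by the window.

    Values >= 1.0 would mean the window reaches the next keypress.
    """
    ipi_ms = 1000.0 / presses_per_second
    return window_ms / ipi_ms


for rate in (3.0, 4.0):
    ipi = 1000.0 / rate
    print(f"{rate:.0f} presses/s -> IPI {ipi:.0f} ms, window covers {window_coverage(rate):.0%}")
# prints:
# 3 presses/s -> IPI 333 ms, window covers 60%
# 4 presses/s -> IPI 250 ms, window covers 80%
```

      At the faster end of the range the window covers most of the interval, which is what makes overlap with the next press a plausible concern.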

      The Reviewer raises the possibility that (given the windowing parameters used in the present study) an increase in “contextualization” with learning could simply reflect faster typing speeds as opposed to an actual change in the underlying neural representation. The issue can essentially be framed as a mixing problem. As correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.
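
      The logic of this misclassification check can be sketched in Python. When 4-1-3-2-4 is typed back to back, each finger is adjacent in the sequence to exactly two of the three other fingers, so if errors were spread evenly across the wrong classes the share landing on sequence neighbours would be 2/3; transition-driven mixing would push it above that. The helper names and the toy matrix in the test are hypothetical and are not the study's data or code.

```python
import numpy as np


def sequence_neighbours(sequence):
    """Return {finger: set of other fingers adjacent to it} for a sequence that
    is typed back to back (so the last press also precedes the first)."""
    pairs = list(zip(sequence, sequence[1:])) + [(sequence[-1], sequence[0])]
    neighbours = {f: set() for f in sequence}
    for a, b in pairs:
        if a != b:  # ignore repeats of the same finger (e.g. the 4-4 boundary)
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def neighbour_error_share(confusion, labels, neighbours):
    """For each true class, the fraction of its misclassifications that fall on
    sequence neighbours. Rows = true class, columns = predicted class."""
    confusion = np.asarray(confusion, dtype=float)
    index = {lab: k for k, lab in enumerate(labels)}
    shares = {}
    for lab in labels:
        i = index[lab]
        errors = confusion[i].sum() - confusion[i, i]
        near = sum(confusion[i, index[n]] for n in neighbours[lab])
        shares[lab] = near / errors if errors > 0 else float("nan")
    return shares
```

      For the trained sequence, `sequence_neighbours([4, 1, 3, 2, 4])` pairs the index finger (4) with 1 and 2 but not 3, matching the example in the response.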

      Moreover, if the representation distance is largely driven by this mixing effect, it is also possible that the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      We also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
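
      A minimal sketch of this kind of control, assuming ordinary least squares on within-subject z-scored variables: the three predictor columns would hold the 4-1, 2-4 and 4-4 transition times and the response the representation distance. The helper names are hypothetical, and the exact model specification used in the study may differ; `f_stat` is the overall model F statistic.

```python
import numpy as np


def zscore_within(values, subjects):
    """z-score `values` (1-D, or 2-D with one column per variable) separately
    for each subject. `subjects` must be a NumPy array of subject IDs."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for s in np.unique(subjects):
        m = subjects == s
        mu = values[m].mean(axis=0)
        sd = values[m].std(axis=0, ddof=1)
        out[m] = (values[m] - mu) / sd
    return out


def ols_adjusted_r2(X, y):
    """Fit y ~ 1 + X by least squares; return (coefficients, adjusted R^2, F)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return beta, adj_r2, f_stat
```

      Z-scoring within subject removes between-subject offsets in both typing speed and distance, so the fit reflects within-subject covariation only.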

      Overall, we do strongly agree with the Reviewer that the naturalistic, self-paced, generative task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of trade-offs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4—figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25 ms and located up to 100 ms prior to the keyDown event strongly support the feasibility of such an approach.

      Related to the above point, testing only one particular sequence (4-1-3-2-4), aside from the control ones, limits the generalizability of the finding. This also may have contributed to the extremely high decoding accuracy reported in the current study. 

      The Reviewer raises a question about the generalizability of the decoder accuracy reported in our study. Fortunately, a comparison between decoder performances on Day 1 and Day 2 datasets does provide some insight into this issue. As the Reviewer points out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—figure supplement 3). We observed a reduction in overall accuracy to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and to 79.44% when they performed several previously unpracticed sequences. Both changes in accuracy are important with regard to the generalizability of our findings. First, 87.11% accuracy for the trained sequence data on Day 2 (a reduction of only 3.36 percentage points) indicates that the hybrid-space decoder performance is robust over multiple MEG sessions, and thus robust to variations in SNR across the MEG sensor array caused by small differences in head position between scans. This indicates a substantial advantage over sensor-space decoding approaches. Furthermore, when tested on data from unpracticed sequences, overall performance dropped by an additional 7.67 percentage points. This difference reflects the performance bias of the classifier for the trained sequence, possibly caused by high-order sequence structure being incorporated into the feature weights. In the future, it will be important to understand in more detail how random or repeated keypress sequence training data impacts overall decoder performance and generalization.
We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue.

      In terms of clinical BCI, one potential application of the study claimed by the authors, it is not clear that the specific time window chosen in the current study (up to 200 msec from key press onset) is really useful. In most cases, clinical BCI would target neural signals with no overt movement execution due to patients' inability to move (e.g., Hochberg et al., 2012). Given the time window, the surprisingly high performance of the current decoder may result from sensory feedback and/or planning of subsequent movement, which may not always be available in the clinical BCI context. Of course, the decoding accuracy is still much higher than chance even when using signal before the key press (as shown in Figure 4 Supplement 2), but it is not immediately clear to me how the authors relate their high decoding accuracy based on post-movement signal to clinical BCI settings.

      The Reviewer questions the relevance of the specific window parameters used in the present study for clinical BCI applications, particularly for paretic patients who are unable to produce finger movements or for whom afferent sensory feedback is no longer intact. We strongly agree with the Reviewer that any intended clinical application must carefully consider these specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context.

      One of the important and fascinating claims of the current study is that the "contextualization" of individual finger movements in a trained sequence specifically occurs during short rest periods in very early skill learning, echoing the recent theory of micro-offline learning proposed by the authors' group. Here, I think two points need to be clarified. First, the concept of "contextualization" is kept somewhat blurry throughout the text. It is only in the later part of the Discussion (around line #330 on page 13) that some potential mechanism for the "contextualization" is provided as "what-and-where" binding. Still, it is unclear what "contextualization" actually is in the current data, as the MEG signal analyzed is extracted from 0-200 msec after the keypress. If one thinks something is contextualizing an action, that contextualization should come earlier than the action itself.

      The Reviewer requests that we: 1) more clearly define our use of the term “contextualization” and 2) provide the rationale for assessing it over a 200ms window aligned to the keyDown event. This choice of window parameters means that the MEG activity used in our analysis was coincident with, rather than preceding, the actual keypresses.  We define contextualization as the differentiation of representation for the identical movement embedded in different positions of a sequential skill. That is, representations of individual action elements progressively incorporate information about their relationship to the overall sequence structure as the skill is learned. We agree with the Reviewer that this can be appropriately interpreted as “what-and-where” binding. We now incorporate this definition in the Introduction of the revised manuscript as requested.

      The window parameters for optimizing accurate decoding of individual finger movements were determined using a grid search of the parameter space (a sliding window of variable width between 25 and 350 ms in 25 ms increments, with its onset variably aligned from 0 to +100 ms in 10 ms increments relative to the keyDown event). This approach generated 140 different temporal windows for each keypress for each participant, with the final parameter selection determined by comparing the resulting performance across decoders. Importantly, the decision to optimize for decoding accuracy placed an emphasis on keypress representations characterized by the most consistent and robust features shared across subjects, which in turn maximizes statistical power in detecting common learning-related changes. In this case, the optimal window encompassed a 200 ms epoch aligned to the keyDown event (t0 = 0 ms). We then asked if the representations (i.e. – spatial patterns of combined parcel- and voxel-space activity) of the same digit at two different sequence positions changed with practice within this optimal decoding window. Of course, our findings do not rule out the possibility that contextualization can also be found before or even after this time window, as we did not directly address this issue in the present study. Ongoing work in our lab, as pointed out above, is investigating contextualization within different time windows tailored specifically for assessing sequence skill action planning, execution, evaluation and memory processes.
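
      The window search can be sketched as an exhaustive grid over (width, onset) pairs. The ranges below are read from the text and include both endpoints, which enumerates 14 × 11 = 154 candidate windows; how the endpoints were handled to arrive at the 140 windows stated above is not specified, so treat the ranges as assumptions. `score_fn` is a hypothetical stand-in for the per-window decoder evaluation (e.g. cross-validated accuracy).

```python
from itertools import product

# Assumed grid: widths 25-350 ms in 25 ms steps, onsets 0 to +100 ms in 10 ms
# steps relative to the keyDown event (both ends inclusive).
WIDTHS_MS = range(25, 351, 25)   # 25, 50, ..., 350  (14 widths)
OFFSETS_MS = range(0, 101, 10)   # 0, 10, ..., 100   (11 onsets)


def candidate_windows(widths=WIDTHS_MS, offsets=OFFSETS_MS):
    """Yield (start_ms, end_ms) windows relative to the keyDown event."""
    for width, offset in product(widths, offsets):
        yield offset, offset + width


def best_window(score_fn, windows):
    """Return the window with the highest score. `score_fn` is a hypothetical
    callable mapping (start_ms, end_ms) to a decoding score."""
    return max(windows, key=score_fn)
```

      In practice `score_fn` would re-extract features for each window and re-run the decoder, so the grid size directly sets the computational cost.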

      The second point is that the result provided by the authors is not yet convincing enough to support the claim that "contextualization" occurs during rest. In the original analysis, the authors presented the statistical significance regarding the correlation between the "offline" pattern differentiation and micro-offline skill gain (Figure 5. Supplement 1), as well as the larger "offline" distance than "online" distance (Figure 5B). However, this analysis looks like regressing two variables (monotonically) increasing as a function of the trial. Although some information in this analysis, such as what the independent/dependent variables were or how individual subjects were treated, was missing in the Methods, getting a statistically significant slope seems unsurprising in such a situation. Also, curiously, the same quantitative evidence was not provided for its "online" counterpart, and the authors only briefly mentioned in the text that there was no significant correlation between them. It may be true looking at the data in Figure 5A as the online representation distance looks less monotonically changing, but the classification accuracy presented in Figure 4C, which should reflect similar representational distance, shows a more monotonic increase up to the 11th trial. Further, the ways the "online" and "offline" representation distance was estimated seem to make them not directly comparable. While the "online" distance was computed using all the correct press data within each 10 sec of execution, the "offline" distance is basically computed by only two presses (i.e., the last index_OP5 vs. the first index_OP1 separated by 10 sec of rest). Theoretically, the distance between the neural activity patterns for temporally closer events tends to be closer than that between the patterns for temporally far-apart events. It would be fairer to use the distance between the first index_OP1 vs. the last index_OP5 within an execution period for "online" distance, as well. 

      The Reviewer suggests that the current data is not convincing enough to show that contextualization occurs during rest and raises two important concerns: 1) the relationship between online contextualization and micro-online gains is not shown, and 2) the online distance was calculated differently from its offline counterpart (i.e. - instead of calculating the distance between last IndexOP5 and first IndexOP1 from a single trial, the distance was calculated for each sequence within a trial and then averaged).

      We addressed the first concern by performing individual subject correlations between 1) contextualization changes during rest intervals and micro-offline gains; 2) contextualization changes during practice trials and micro-online gains, and 3) contextualization changes during practice trials and micro-offline gains (Author response image 4). We then statistically compared the resulting correlation coefficient distributions and found that within-subject correlations for contextualization changes during rest intervals and micro-offline gains were significantly higher than those for online contextualization and micro-online gains (t = 3.2827, p = 0.0015) and for online contextualization and micro-offline gains (t = 3.7021, p = 5.3013 × 10⁻⁴). These results are consistent with our interpretation that micro-offline gains are supported by contextualization changes during the inter-practice rest period.
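
      The comparison described above can be sketched as follows: compute one Pearson correlation per subject for each pairing, then compare two sets of coefficients with a paired t statistic. Applying Fisher's z-transform before the test is a common choice but an assumption here, since the text does not state which test variant was used; p-values (e.g. via `scipy.stats.ttest_rel`) are omitted to keep the sketch dependency-free. All names are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D sequences."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))


def per_subject_r(xs, ys):
    """Correlation computed separately for each subject.
    xs, ys: one 1-D sequence (e.g. trial-wise values) per subject."""
    return np.array([pearson_r(x, y) for x, y in zip(xs, ys)])


def paired_t(r_a, r_b, fisher=True):
    """Paired t statistic comparing two sets of per-subject correlations.
    With fisher=True, z = arctanh(r) is applied first (an assumption)."""
    r_a, r_b = np.asarray(r_a, float), np.asarray(r_b, float)
    a, b = (np.arctanh(r_a), np.arctanh(r_b)) if fisher else (r_a, r_b)
    d = a - b
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))
```

      Correlating within each subject first keeps between-subject differences in learning rate from inflating the association.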

      Author response image 4.

      Distribution of individual subject correlation coefficients between contextualization changes occurring during practice or rest and micro-online and micro-offline performance gains. Note that the correlation distributions were significantly higher for the relationship between contextualization changes during rest and micro-offline gains than for contextualization changes during practice and either micro-online or micro-offline gains.

      With respect to the second concern highlighted above, we agree with the Reviewer that one limitation of the analysis comparing online versus offline changes in contextualization, as presented in the reviewed manuscript, is that it does not eliminate the possibility that any differences could simply be explained by the passage of time (which is smaller for the online analysis than for the offline analysis). The Reviewer suggests an approach that addresses this issue, which we have now carried out. When quantifying online changes in contextualization from the first IndexOP1 to the last IndexOP5 keypress in the same trial, we observed no learning-related trend (Author response image 5, right panel). Importantly, offline distances were significantly larger than online distances regardless of the measurement approach, and neither predicted online learning (Author response image 6).
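
      The three distance measures can be sketched as follows, treating each keypress representation as a feature vector. Euclidean distance is an assumption (the text does not specify the metric here), and the function names are hypothetical; inputs are the IndexOP1 and IndexOP5 patterns of correct sequences, in temporal order within a trial.

```python
import numpy as np


def distance(a, b):
    """Euclidean distance between two feature patterns (metric is an assumption)."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))


def online_within_sequence(op1_patterns, op5_patterns):
    """Mean OP1-OP5 distance over the correct sequences of one trial
    (the measure used in the reviewed manuscript)."""
    return float(np.mean([distance(a, b) for a, b in zip(op1_patterns, op5_patterns)]))


def online_first_to_last(op1_patterns, op5_patterns):
    """Distance from the trial's first OP1 press to its last OP5 press,
    spanning the practice period in the way the offline measure spans rest."""
    return distance(op1_patterns[0], op5_patterns[-1])


def offline(prev_trial_op5_patterns, next_trial_op1_patterns):
    """Distance from the last OP5 press of one trial to the first OP1 press
    of the next trial, i.e. across the intervening rest period."""
    return distance(prev_trial_op5_patterns[-1], next_trial_op1_patterns[0])
```

      The first-to-last variant is the time-matched comparison suggested by the Reviewer: both it and the offline measure compare two single presses separated by a comparable interval.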

      Author response image 5.

      Trial-by-trial trend of offline (left panel) and online (middle and right panels) changes in contextualization. Offline changes in contextualization were assessed by calculating the distance between neural representations for the last IndexOP5 keypress in the previous trial and the first IndexOP1 keypress in the present trial. Two different approaches were used to characterize online contextualization changes. The analysis included in the reviewed manuscript (middle panel) calculated the distance between IndexOP1 and IndexOP5 for each correct sequence, which was then averaged across the trial. This approach is limited by the lack of control for the passage of time when making online versus offline comparisons. Thus, the second approach controlled for the passage of time by calculating the distance between the representations associated with the first IndexOP1 keypress and the last IndexOP5 keypress within the same trial. Note that while the first approach showed an increasing trend in online contextualization with practice, the second approach did not.

      Author response image 6.

      Relationship between online contextualization and online learning is shown for both within-sequence (left; note that this is the online contextualization measure used in the reviewed manuscript) and across-sequence (right) distance calculation. There was no significant relationship between online learning and online contextualization regardless of the measurement approach.

      A related concern regarding the control analysis, where individual values for max speed and the degree of online contextualization were compared (Figure 5 Supplement 3), is whether the individual difference is meaningful. If I understood correctly, the optimization of the decoding process (temporal window, feature inclusion/reduction, decoder, etc.) was performed for individual participants, and the same feature extraction was also employed for the analysis of representation distance (i.e., contextualization). If this is the case, the distances are individually differently calculated and they may need to be normalized relative to some stable reference (e.g., 1 vs. 4 or average distance within the control sequence presses) before comparison across the individuals. 

      The Reviewer makes a good point here. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript.
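
      A minimal sketch of such a normalization, assuming a simple ratio to a within-participant reference distance (for example, the mean distance between different fingers, as the Reviewer suggests). The exact procedure used in the revision is not described here; the ratio form and the function name are assumptions, and a difference or z-score would serve the same purpose.

```python
import numpy as np


def normalized_contextualization(d_index_op1_op5, reference_distances):
    """Express a participant's IndexOP1-vs-IndexOP5 distance(s) relative to a
    stable within-participant reference distance (ratio form is an assumption)."""
    ref = float(np.mean(reference_distances))
    if ref <= 0:
        raise ValueError("reference distance must be positive")
    return np.asarray(d_index_op1_op5, float) / ref
```

      Because each participant's feature space is optimized separately, dividing by a reference measured in the same space removes participant-specific scale before scores are compared across individuals.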

      Reviewer #3 (Public review): 

      Summary: 

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A clear strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of the concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers (though the manuscript reveals little about the comparison of the latter). 

      We appreciate the Reviewer’s comments regarding the paper’s strengths.

      A simple control analysis based on shuffled class labels could lend further support to this complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). Furthermore, currently, the manuscript does not explain the huge drop in decoding accuracies for the voxel-space decoding (Figure 3B). Finally, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - what do the authors refer to when they talk about the sign of the "average source", line 477?). 

      The Reviewer recommends that we: 1) conduct an additional control analysis on classifier performance using shuffled class labels, 2) provide a more detailed explanation regarding the drop in decoding accuracies for the voxel-space decoding following LDA dimensionality reduction (see Fig 3B), and 3) provide additional details on how problems related to dipole solution orientations were addressed in the present study.  

      In relation to the first point, we have now implemented a random shuffling approach as a control for the classification analyses. The results of this analysis indicated that the chance level accuracy was 22.12% (± SD 9.1%) for individual keypress decoding (4-class classification), and 18.41% (± SD 7.4%) for individual sequence item decoding (5-class classification), irrespective of the input feature set or the type of decoder used. Thus, the decoding accuracy observed with the final model was substantially higher than these chance levels.  
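
      A label-shuffling control of this kind can be sketched as follows. A nearest-centroid classifier with k-fold cross-validation stands in for the study's decoders purely to keep the sketch small; the classifier, fold count and number of permutations are assumptions, not the study's settings. Each permutation destroys the link between features and labels, so the resulting accuracies estimate the empirical chance level, which can differ from the theoretical 1/k.

```python
import numpy as np


def nearest_centroid_cv_accuracy(X, y, n_folds=5, rng=None):
    """k-fold cross-validated accuracy of a nearest-centroid classifier
    (a stand-in for the study's decoders)."""
    rng = np.random.default_rng(rng)
    order = rng.permutation(len(y))
    folds = np.array_split(order, n_folds)
    correct = 0
    for test in folds:
        train = np.setdiff1d(order, test)
        classes = np.unique(y[train])
        centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
        # squared Euclidean distance from each test sample to each class centroid
        dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        correct += int(np.sum(classes[dists.argmin(axis=1)] == y[test]))
    return correct / len(y)


def shuffled_label_chance(X, y, n_permutations=100, seed=0):
    """Empirical chance level: mean and SD of CV accuracy after shuffling labels."""
    rng = np.random.default_rng(seed)
    accs = [nearest_centroid_cv_accuracy(X, rng.permutation(y), rng=rng)
            for _ in range(n_permutations)]
    return float(np.mean(accs)), float(np.std(accs))
```

      The same shuffling loop applies unchanged to any decoder: only the call inside the list comprehension needs replacing.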

      Second, please note that the dimensionality of the voxel-space feature set is very high (15,684 features). LDA maps the input features onto a much lower-dimensional space (number of classes minus 1, e.g., 3 dimensions for 4-class keypress decoding). Given the very high dimensionality of the voxel-space input features, the resulting mapping exhibits reduced accuracy. Note, however, that voxel-space decoder performance improves when alternative dimensionality reduction techniques are used (see Figure 3—figure supplement 3).
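      To illustrate why LDA yields at most (number of classes minus 1) discriminant axes regardless of input dimensionality, the sketch below builds the within- and between-class scatter matrices on synthetic data with many more features than trials. The function name `lda_projection` and the shrinkage term are assumptions made for illustration; shrinkage is one common way to keep the within-class scatter invertible when features outnumber trials, and is not necessarily what the manuscript's decoder does.

```python
import numpy as np


def lda_projection(X, y, shrinkage=0.1):
    """Return LDA discriminant axes and the rank of the between-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    for c in classes:
        xc = X[y == c]
        mc = xc.mean(axis=0)
        s_w += (xc - mc).T @ (xc - mc)                  # within-class scatter
        s_b += len(xc) * np.outer(mc - mu, mc - mu)     # between-class scatter
    # Shrink S_w toward a scaled identity so it stays invertible when d >= n_trials.
    s_w = (1 - shrinkage) * s_w + shrinkage * np.trace(s_w) / d * np.eye(d)
    evals, evecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(evals.real)[::-1]
    n_axes = len(classes) - 1  # S_b has rank <= K - 1, so at most K - 1 useful axes
    return evecs[:, order[:n_axes]].real, np.linalg.matrix_rank(s_b)


# Synthetic example: 4 classes, 80 trials, 200 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 20)
X = rng.normal(size=(80, 200))
axes, rank_sb = lda_projection(X, y)
projected = X @ axes
```

With 4 classes the between-class scatter has rank 3, so all 200 features collapse onto 3 axes. Under the assumptions of this sketch, those axes are estimated from a heavily regularized covariance when features far outnumber trials, which is one plausible contributor to reduced accuracy in very high-dimensional feature spaces.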

      The decoders constructed in the present study assess the average spatial patterns across time (as defined by the windowing procedure) in the input feature space.  We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis.
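      Regarding the Reviewer's question about space versus time: in sign-flipping schemes of this kind, the sign of each source is typically determined from its spatial orientation alone and then applied to that source's entire time course before averaging. The sketch below is a hypothetical `mean_flip_parcel` helper in the spirit of MNE-Python's "mean_flip" label mode; it is not the manuscript's implementation, and it assumes unit dipole orientations and source time courses for a single parcel.

```python
import numpy as np


def mean_flip_parcel(timecourses, normals):
    """Average a parcel's source time courses after orientation-based sign flips.

    timecourses: (n_sources, n_times); normals: (n_sources, 3) unit orientations.
    The sign is chosen from space (orientation) only and applied to all time points.
    """
    # Dominant orientation of the parcel: first right singular vector of the normals.
    _, _, vt = np.linalg.svd(normals, full_matrices=False)
    flip = np.sign(normals @ vt[0])
    flip[flip == 0] = 1.0
    # The SVD sign is arbitrary; fix it so that most sources keep their original sign.
    if np.sum(flip < 0) > np.sum(flip > 0):
        flip = -flip
    return (flip[:, None] * timecourses).mean(axis=0)


# Three dipoles carrying the same signal, one with reversed orientation.
signal = np.sin(np.linspace(0, 2 * np.pi, 50))
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
timecourses = np.stack([signal, signal, -signal])
flipped = mean_flip_parcel(timecourses, normals)
naive = timecourses.mean(axis=0)
```

Without the flip, dipoles with opposing orientations that carry the same signal partially cancel in a plain average (here the naive mean shrinks to one third of the signal).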

      Weaknesses: 

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption. 

      We thank the Reviewer for giving us the opportunity to address these issues in detail (see below).

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions [50]. In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - Supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the key press, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
Currently, the manuscript provides no evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context. 

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - Figure Supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - Figure Supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for). 

      The issues raised by Reviewer #3 here are similar to two issues raised by Reviewer #2 above, and we agree that both must be carefully considered in any evaluation of our findings.

      As both Reviewers pointed out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and to 79.44% when they performed several previously unpracticed sequences. This 7.67% difference in classification performance on the Day 2 data could reflect a bias of the classifier toward the trained sequence, possibly caused by mixed information from temporally close keypresses being incorporated into the feature weights.

      Along these same lines, both Reviewers also raise the possibility that an increase in “ordinal coding/contextualization” with learning could simply reflect an increase in this mixing effect caused by faster typing speeds as opposed to an actual change in the underlying neural representation. The basic idea is that as correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.
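      The confusion-matrix argument above can also be quantified. The sketch below (hypothetical helper `neighbour_error_fraction`, with counts assumed to be arranged as true key by decoded key) compares the share of misclassifications that fall on sequence neighbours in 4-1-3-2-4 against the share expected if errors were spread evenly across the other keys. Because every key has exactly two distinct neighbours in this sequence (the 4-4 boundary transition involves the same key), the even-spread expectation is 2/3; a decoder driven by mixing of adjacent keypresses would push the observed share above it.

```python
import numpy as np

SEQUENCE = [4, 1, 3, 2, 4]  # trained sequence (key labels)
KEYS = [1, 2, 3, 4]         # assumed row/column order of the confusion matrix


def neighbour_pairs(seq):
    """Unordered pairs of distinct keys that are adjacent somewhere in the sequence."""
    return {frozenset(p) for p in zip(seq[:-1], seq[1:]) if p[0] != p[1]}


def neighbour_error_fraction(conf, seq=SEQUENCE, keys=KEYS):
    """Observed vs. even-spread share of errors landing on sequence neighbours.

    conf[i, j] counts trials of true key keys[i] decoded as keys[j].
    """
    conf = np.asarray(conf, dtype=float)
    pairs = neighbour_pairs(seq)
    off_diag = ~np.eye(len(keys), dtype=bool)
    is_nb = np.array([[frozenset((a, b)) in pairs for b in keys] for a in keys])
    observed = conf[is_nb & off_diag].sum() / conf[off_diag].sum()
    expected = (is_nb & off_diag).sum() / off_diag.sum()
    return observed, expected


# Example: errors spread evenly over the three wrong keys.
conf = np.full((4, 4), 5.0)
np.fill_diagonal(conf, 90.0)
observed, expected = neighbour_error_fraction(conf)
```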

      Following this logic, it is also possible that, if ordinal coding is largely driven by this mixing effect, the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses increasingly overlapped in time. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      As noted in the reply to Reviewer #2 above, we also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the alternative explanation put forward by the Reviewer is not supported by our data (adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
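      A minimal sketch of this type of control regression is shown below, assuming one row per correct sequence with columns for the three transition times and a subject identifier. `zscore_within` and `ols_fit` are hypothetical helpers written for illustration using ordinary least squares via `numpy.linalg.lstsq`; the adjusted R² and overall F statistic follow the standard formulas for p predictors and n observations.

```python
import numpy as np


def zscore_within(values, subject):
    """z-score each column separately within each subject (ddof=1)."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for s in np.unique(subject):
        m = subject == s
        v = values[m]
        out[m] = (v - v.mean(axis=0)) / v.std(axis=0, ddof=1)
    return out


def ols_fit(X, y):
    """OLS with intercept; returns coefficients, adjusted R^2 and overall F."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)   # penalizes extra predictors
    f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))     # df = (p, n - p - 1)
    return beta, adj_r2, f_stat


# Synthetic example: 3 subjects x 40 correct sequences, 3 transition-time predictors.
rng = np.random.default_rng(0)
subject = np.repeat(np.arange(3), 40)
transitions = rng.normal(size=(120, 3))  # stand-ins for 4-1, 2-4 and 4-4 times
distance = rng.normal(size=120)          # "distance" score unrelated to transitions
_, adj_r2, f_stat = ols_fit(zscore_within(transitions, subject),
                            zscore_within(distance, subject))
```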

      Finally, the Reviewer hints that one way to address this issue would be to compare MEG responses before and after learning for sequences typed at a fixed speed. However, given that the speed-accuracy trade-off should improve with learning, a comparison between unlearned and learned skill states would dictate that the skill be evaluated at a very low fixed speed. Essentially, such a design presents the problem that the post-training test is evaluating the representation in the unlearned behavioral state that is not representative of the acquired skill. Thus, this approach would not address our experimental question: “do neural representations of the same action performed at different locations within a skill sequence contextually differentiate or remain stable as learning evolves”.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023). 

      The Reviewer argues that the last finger movement of a trial and the first finger movement of the next trial are performed in different circumstances and contexts. This is an important point and one we tend to agree with. For this task, the first sequence in a practice trial (which is pre-planned offline) is performed in a somewhat different context from the sequence iterations that follow, which involve temporally overlapping planning, execution and evaluation processes. The Reviewer is particularly concerned about a difference in the temporal mixing effect raised above between the first and last keypresses performed in a trial. However, in contrast to the Reviewer’s stated argument above, findings from Kornysheva et al. (2019) showed that neural representations of individual actions are competitively queued during the pre-planning period in a manner that reflects the ordinal structure of the learned sequence. Thus, mixing effects are likely still present for the first keypress in a trial. Also note that we now present new control analyses in multiple responses above confirming that hypothetical mixing effects between adjacent keypresses do not explain our reported contextualization finding. A statement addressing these possibilities raised by the Reviewer has been added to the Discussion in the revised manuscript.

      In relation to pre-planning, ongoing MEG work in our lab is investigating contextualization within different time windows tailored specifically for assessing how sequence skill action planning evolves with learning.

      Given these differences in the physical context and associated mental processes, it is not surprising that "offline differentiation", as defined here, is more pronounced than "online differentiation". For the latter, the authors compared movements that were better matched regarding the presence of consistent preceding and subsequent keypresses (online differentiation was defined as the mean difference between all first vs. last index finger movements during practice).  It is unclear why the authors did not follow a similar definition for "online differentiation" as for "micro-online gains" (and, indeed, a definition that is more consistent with their definition of "offline differentiation"), i.e., the difference between the first index finger movement of the first correct sequence during practice, and the last index finger of the last correct sequence. While these two movements are, again, not matched for the presence of neighbouring keypresses (see the argument above), this mismatch would at least be the same across "offline differentiation" and "online differentiation", so they would be more comparable. 

      This is the same point made earlier by Reviewer #2, and we agree with this assessment. As stated in the response to Reviewer #2 above, we have now carried out quantification of online contextualization using this approach and included it in the revised manuscript. We thank the Reviewer for this suggestion.

      A further complication in interpreting the results regarding "contextualization" stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen, irrespective of whether the keypress was correct or incorrect. As a result, incorrect (e.g., additional, or missing) keypresses could shift the phase of the visual feedback string (of asterisks) relative to the ordinal position of the current movement in the sequence (e.g., the fifth movement in the sequence could coincide with the presentation of any asterisk in the string, from the first to the fifth). Given that more incorrect keypresses are expected at the start of the experiment, compared to later stages, the consistency in visual feedback position, relative to the ordinal position of the movement in the sequence, increased across the experiment. A better differentiation between the first and the fifth movement with learning could, therefore, simply reflect better decoding of the more consistent visual feedback, based either on the feedback-induced brain response, or feedback-induced eye movements (the study did not include eye tracking). It is not clear why the authors introduced this complicated visual feedback in their task, besides consistency with their previous studies.

      We strongly agree with the Reviewer that eye movements related to task engagement are important to rule out as a potential driver of the decoding accuracy or contextualization effect. We address this issue above in response to a question raised by Reviewer #1 about the impact of movement related artefacts in general on our findings.

      First, the assumption the Reviewer makes here about the distribution of errors in this task is incorrect. On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, which were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct-to-incorrect keypresses remained stable over the training session. Thus, the Reviewer’s assumptions that errors are relatively more frequent in early trials, and that this produces a systematic trend in the phase shift between the visual display updates (i.e., a change in asterisk position above the displayed sequence) and the keypress performed, are not substantiated by the data. To the contrary, the asterisk position on the display and the keypress being executed remained highly correlated over the entire training session. We now include a statement about the frequency and distribution of errors in the revised manuscript.

      Given this high correlation, we firmly agree with the Reviewer that the issue of eye movement-related artefacts is still an important one to address. Fortunately, we did collect eye movement data during the MEG recordings and were therefore able to investigate this. As detailed in the response to Reviewer #1 above, we found that gaze positions and eye-movement velocity time-locked to visual display updates (i.e., a change in asterisk position above the displayed sequence) did not reflect the asterisk location above chance levels (overall cross-validated accuracy = 0.21817; see Author response image 1). Furthermore, an inspection of the eye position data revealed that a majority of participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. As pointed out above, a similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user. Notably, the minimal participant engagement with the visual task display observed in this study highlights an important difference between behavior observed during explicit sequence learning motor tasks (which is highly generative in nature) and reactive responses to stimulus cues in a serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when comparing findings across studies. All elements pertaining to this new control analysis are now included in the revised manuscript.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, it would be more informative to correlate trial-by-trial changes in each of the two variables. This would address the question of whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - are performance changes (micro-offline gains) less pronounced across rest periods for which the change in "contextualization" is relatively low? Furthermore, is the relationship between micro-offline gains and "offline differentiation" significantly stronger than the relationship between micro-offline gains and "online differentiation"? 

      In response to a similar issue raised above by Reviewer #2, we now include new analyses comparing correlation magnitudes between (1) “online differentiation” vs micro-online gains, (2) “online differentiation” vs micro-offline gains and (3) “offline differentiation” vs micro-offline gains (see Author response images 4, 5 and 6 above). These new analyses and results have been added to the revised manuscript. Once again, we thank both Reviewers for this suggestion.
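      One way to test whether micro-offline gains relate more strongly to offline than to online differentiation is a paired bootstrap of the difference between the two correlations; because both correlations share the same gains variable, they are not independent and should be resampled jointly. The sketch below assumes one value per unit of analysis (e.g., per subject or per rest period) for each measure; the helper names are hypothetical, and this is not necessarily the test reported in the revised manuscript.

```python
import numpy as np


def pearson(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))


def bootstrap_corr_difference(gains, diff_a, diff_b, n_boot=2000, seed=0):
    """Observed r(gains, diff_a) - r(gains, diff_b) and a 95% percentile CI.

    Units are resampled jointly, preserving the dependence between the two correlations.
    """
    gains, diff_a, diff_b = (np.asarray(v, dtype=float) for v in (gains, diff_a, diff_b))
    rng = np.random.default_rng(seed)
    n = len(gains)
    observed = pearson(gains, diff_a) - pearson(gains, diff_b)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boots[i] = pearson(gains[idx], diff_a[idx]) - pearson(gains[idx], diff_b[idx])
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return observed, (lo, hi)


# Synthetic example: gains related to "offline" but not to "online" differentiation.
rng = np.random.default_rng(3)
gains = rng.normal(size=60)
offline_diff = gains + 0.5 * rng.normal(size=60)
online_diff = rng.normal(size=60)
observed, ci = bootstrap_corr_difference(gains, offline_diff, online_diff, n_boot=1000)
```

A confidence interval that excludes zero would indicate that the two correlations differ in strength.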

      The authors follow the assumption that micro-offline gains reflect offline learning.

      This statement is incorrect. The original Bönstrup et al. (2019) paper [49] clearly states that micro-offline gains must be carefully interpreted based upon the behavioral context within which they are observed, and lays out the conditions under which one can have confidence that micro-offline gains reflect offline learning. In fact, the excellent meta-analysis of Pan & Rickard (2015) [51], which re-interprets the benefits of sleep in overnight skill consolidation from a “reactive inhibition” perspective, was a crucial resource in the experimental design of our initial study [49], as well as in all our subsequent work. Pan & Rickard stated:

      “Empirically, reactive inhibition refers to performance worsening that can accumulate during a period of continuous training (Hull, 1943). It tends to dissipate, at least in part, when brief breaks are inserted between blocks of training. If there are multiple performance-break cycles over a training session, as in the motor sequence literature, performance can exhibit a scalloped effect, worsening during each uninterrupted performance block but improving across blocks [52,53]. Rickard, Cai, Rieth, Jones, and Ard (2008) and Brawn, Fenn, Nusbaum, and Margoliash (2010) [52,53] demonstrated highly robust scalloped reactive inhibition effects using the commonly employed 30 s–30 s performance break cycle, as shown for Rickard et al.’s (2008) massed practice sleep group in Figure 2. The scalloped effect is evident for that group after the first few 30 s blocks of each session. The absence of the scalloped effect during the first few blocks of training in the massed group suggests that rapid learning during that period masks any reactive inhibition effect.”

      Crucially, Pan & Rickard [51] made several concrete recommendations for reducing the impact of the reactive inhibition confound on offline learning studies. One of these recommendations was to reduce practice times to 10 s (most prior sequence learning studies up until that point had employed 30-s-long practice trials). They stated:

      “The traditional design involving 30 s-30 s performance break cycles should be abandoned given the evidence that it results in a reactive inhibition confound, and alternative designs with reduced performance duration per block used instead [51]. One promising possibility is to switch to 10 s performance durations for each performance-break cycle instead [51]. That design appears sufficient to eliminate at least the majority of the reactive inhibition effect [52,53].”

      We mindfully incorporated recommendations from Pan and Rickard [51] into our own study designs, including 1) utilizing 10 s practice trials and 2) constraining our analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur), which precede the emergence of the “scalloped” performance dynamics that are strongly linked to reactive inhibition effects.
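      The two constraints above can be sketched as follows, assuming per-trial tapping speed measured at the start and end of each 10 s practice period. Following the usual definition, micro-online gains are the change from the start to the end of a practice period, and micro-offline gains are the change from the end of one practice period to the start of the next. The 95% cut-off helper `early_learning_cutoff` is a hypothetical operationalization of the early learning window, not necessarily the exact criterion used in our studies.

```python
import numpy as np


def early_learning_cutoff(trial_perf, frac=0.95):
    """Number of initial trials needed to reach `frac` of the total performance gain."""
    perf = np.asarray(trial_perf, dtype=float)
    gain = perf - perf[0]
    return int(np.argmax(gain >= frac * gain.max())) + 1


def micro_gains(start_speed, end_speed, n_trials):
    """Micro-online and micro-offline gains over the first `n_trials` trials."""
    start = np.asarray(start_speed, dtype=float)[:n_trials]
    end = np.asarray(end_speed, dtype=float)[:n_trials]
    online = end - start            # change within each practice period
    offline = start[1:] - end[:-1]  # change across each inter-practice rest
    return online, offline


# Synthetic per-trial data (keypresses/s); values are illustrative only.
perf = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0])
start = np.array([0.9, 1.7, 2.8, 3.7, 4.8, 5.1, 5.2])
end = np.array([1.2, 2.1, 3.0, 4.1, 5.0, 4.8, 4.7])
k = early_learning_cutoff(perf)
online, offline = micro_gains(start, end, k)
```

By construction, the summed micro-online and micro-offline gains over the window equal the total change in speed across it.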

      However, there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.

      We strongly disagree with the Reviewer’s assertion that “there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.” The initial Bönstrup et al. (2019) report [49] was followed up by a large online crowd-sourcing study (Bönstrup et al., 2020) [54]. This second (and much larger) study provided several additional important findings supporting our interpretation of micro-offline gains in cases where the important behavioral conditions clarified above were met (see Author response image 7 below for further details on these conditions).

      Author response image 7.

      Micro-offline gains observed in learning and non-learning contexts are attributed to different underlying causes. (A) Micro-offline and online changes relative to overall trial-by-trial learning. This figure is based on data from Bönstrup et al. (2019) [49]. During early learning, micro-offline gains (red bars) closely track trial-by-trial performance gains (green line with open circle markers), with minimal contribution from micro-online gains (blue bars). The stated conclusion in Bönstrup et al. (2019) is that micro-offline gains reflect rapid memory consolidation only during this Early Learning stage (see also [54]). After early learning (at about practice trial 11), skill plateaus. This plateau period is characterized by a striking emergence of coupled (and relatively stable) micro-online drops and micro-offline increases. Bönstrup et al. (2019), as well as others in the literature [55–57], argue that micro-offline gains during the plateau period likely reflect recovery from inhibitory performance factors such as reactive inhibition or fatigue, and thus must be excluded from analyses relating micro-offline gains to skill learning. The Non-repeating groups in Experiments 3 and 4 from Das et al. (2024) suffer from a lack of consideration of these known confounds.

      Evidence documented in that paper [54] showed that micro-offline gains during early skill learning were: 1) replicable and generalized to subjects learning the task in their daily living environment (n = 389); 2) equivalent when practice period duration was significantly shortened, thus confirming that they are not a result of recovery from performance fatigue (n = 118); 3) reduced (along with learning rates) by retroactive interference applied immediately after each practice period relative to interference applied after the passage of time (n = 373), indicating stabilization of the motor memory at a microscale of several seconds consistent with rapid consolidation; and 4) not modified by random termination of the practice periods, ruling out a contribution of predictive motor slowing (n = 71) [54]. Altogether, our findings were strongly consistent with the interpretation that micro-offline gains reflect memory consolidation supporting early skill learning. This is precisely the portion of the learning curve Pan and Rickard [51] refer to when they state “…rapid learning during that period masks any reactive inhibition effect”.

      This interpretation is further supported by brain imaging evidence linking known memory-related networks and consolidation mechanisms to micro-offline gains. First, we reported that the density of fast hippocampo-neocortical skill memory replay events increases approximately three-fold during early learning inter-practice rest periods, with this density explaining differences in the magnitude of micro-offline gains across subjects [1]. Second, Jacobacci et al. (2020) independently reproduced our original behavioral findings and reported BOLD fMRI changes in the hippocampus and precuneus (regions also identified in our MEG study [1]) linked to micro-offline gains during early skill learning [33]. These functional changes were coupled with rapid alterations in brain microstructure on the order of minutes, suggesting that the same network that operates during rest periods of early learning undergoes structural plasticity over several minutes following practice [58]. Third, even more recently, Chen et al. (2024) provided direct evidence from intracranial EEG in humans linking sharp-wave ripple events (which are known markers of neural replay [59]) in the hippocampus (80–120 Hz in humans) with micro-offline gains during early skill learning. The authors report that the strong increase in ripple rates tracked learning behavior, both across blocks and across participants, and conclude that hippocampal ripples during resting offline periods contribute to motor sequence learning [2].

      Thus, there is actually now substantial evidence in the literature directly supporting the assertion “that micro-offline gains really result from offline learning”. On the contrary, according to Gupta & Rickard (2024), “…the mechanism underlying RI [reactive inhibition] is not well established” after over 80 years of investigation [60], possibly because “reactive inhibition” is a categorical description of behavioral effects that likely result from several heterogeneous processes with very different underlying mechanisms.

      On the contrary, recent evidence questions this interpretation (Gupta & Rickard, npj Sci Learn 2022; Gupta & Rickard, Sci Rep 2024; Das et al., bioRxiv 2024). Instead, there is evidence that micro-offline gains are transient performance benefits that emerge when participants train with breaks, compared to participants who train without breaks, however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). 

      It is important to point out that the recent work of Gupta & Rickard (2022, 2024) [55] does not present any data that directly oppose our finding that early skill learning [49] is expressed as micro-offline gains during rest breaks. These studies are essentially an extension of the Rickard et al. (2008) paper, which employed a massed (30 s practice followed by 30 s breaks) vs. spaced (10 s practice followed by 10 s breaks) design to assess whether recovery from reactive inhibition effects could account for performance gains measured after several minutes or hours. Gupta & Rickard (2022) added two additional groups (30 s practice/10 s break, and 10 s practice/10 s break as used in the work from our group). The primary aim of the study was to assess whether changes in performance when retested 5 minutes after skill training had ended (training consisted of 12 practice trials for the massed groups and 36 practice trials for the spaced groups) more likely reflected memory consolidation effects or recovery from reactive inhibition effects. The Gupta & Rickard (2024) follow-up paper employed a similar design, with the primary difference being that participants performed a fixed number of sequences on each trial as opposed to trials lasting a fixed duration. This was done to facilitate fitting a quantitative statistical model to the data. To reiterate, neither study included any analysis of micro-online or micro-offline gains, and neither included any comparison focused on skill gains during early learning. Instead, Gupta & Rickard (2022) reported evidence for reactive inhibition effects for all groups over much longer training periods. Again, we reported the same finding for trials following the early learning period in our original Bönstrup et al. (2019) paper [49] (Author response image 7). 
Also, please note that we reported in this paper that cumulative micro-offline gains over early learning did not correlate with overnight offline consolidation measured 24 hours later49 (see the Results section and further elaboration in the Discussion). Thus, while the composition of our data is supportive of a short-term memory consolidation process operating over several seconds during early learning, it likely differs from those involved over longer training times and offline periods, as assessed by Gupta & Rickard (2022).

      In their recent preprint, Das et al. (2024)61 make the strong claim that “micro-offline gains during early learning do not reflect offline learning”, a claim that is not supported by their own data. The authors hypothesize that if “micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”. To test this hypothesis, the study uses a between-subjects spaced vs. massed practice design inspired by the reactive inhibition work of Rickard and others. Crucially, the design incorporates only a small fraction of the training used in other investigations of early skill learning1,33,49,54,57,58,62. A direct comparison between the practice schedules of the spaced and massed groups in Das et al. and the training schedule that all participants experienced in the original Bönstrup et al. (2019) paper highlights this issue as well as several others (Author response image 8):

      Author response image 8.

      (A) Comparison of the Das et al. Spaced and Massed group training session designs with the training session design from the original Bönstrup et al. (2019)49 paper. Similar to the approach taken by Das et al., all practice is visualized as 10-second practice trials with a variable number (either 0, 1 or 30) of 10-second-long inter-practice rest intervals to allow direct comparison between designs. The two key takeaways from this comparison are that (1) the intervention differences (i.e., practice schedules) between the Massed and Spaced groups in the Das et al. report are extremely small (less than 12% of the overall session schedule) and (2) the overall amount of practice is much smaller than in the design from the original Bönstrup report49 (which has been used in several subsequent studies). (B) Group-level learning curve data from Bönstrup et al. (2019)49 are used to estimate the performance range accounted for by the periods equivalent to Test 1, Training 1 and Test 2 in Das et al. (2024). Note that the intervention in the Das et al. study is limited to a period covering less than 50% of the overall learning range.

      First, participants in the original Bönstrup et al. study49 experienced 157.14% more practice time and 46.97% less inter-practice rest time than the Spaced group in the Das et al. study (Author response image 8). Thus, the overall amounts of practice and rest differ substantially between studies, with much less training for participants in Das et al.

      Second, and perhaps most importantly, the actual intervention (i.e., the difference in practice schedule between the Spaced and Massed groups) employed by Das et al. covers a very small fraction of the overall training session. Practice schedule segments that are identical for the Spaced and Massed groups are indicated by the red shaded area in Author response image 8. Please note that these identical segments cover 94.84% of the Massed group training schedule and 88.01% of the Spaced group training schedule (which has 60 seconds of additional rest). This means that the actual interventions cover only about 5% (Massed) and just under 12% (Spaced) of the total training session, which minimizes any chance of observing a difference between groups.

      Also note that the very beginning of the practice schedule (during which, as Author response image 8B shows, substantial learning is known to occur) is labeled Test 1 in the Das et al. study. Test 1 encompasses the first 20 seconds of practice (alternatively viewed as the first two 10-second-long practice trials with no inter-practice rest). It is immediately followed by the Training 1 intervention, which consists of only three 10-second-long practice trials (with 10-second inter-practice rest for the Spaced group and no inter-practice rest for the Massed group). Author response image 8 also shows that, because there is no inter-practice rest after the third Training practice trial for the Spaced group, this third trial (in both Training 1 and Training 2) is actually part of a practice schedule segment shared by both groups (Massed and Spaced), reducing the magnitude of the intervention even further.

      Moreover, we know from the original Bönstrup et al. (2019) paper49 that 46.57% of the overall group-level performance gains in that study occurred between trials 2 and 5. Thus, Das et al. limit their designed intervention to a period covering less than half of the early learning range discussed in the literature, which, again, minimizes any chance of observing an effect.

      This issue is amplified even further at Training 2, since skill learned before the long 5-minute break is retained, further constraining the performance range over these three trials. A related issue pertains to the trials labeled by Das et al. as Test 1 (trials 1-2) and Test 2 (trials 6-7). Again, we know from the original Bönstrup et al. paper49 that 18.06% and 14.43% (32.49% total) of the overall group-level performance gains occurred during the trials corresponding to Das et al. Test 1 and Test 2, respectively. In other words, Das et al. averaged skill performance over 20 seconds of practice at two time-points where dramatic skill improvements occur. Pan & Rickard (2015)51 previously showed that such averaging can inject artefacts into analyses of performance gains.
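      The percentages above express how much of a learning curve's total gain falls inside a given trial window. As an illustration only, the Python sketch below shows one simple way such shares could be computed; it is not the procedure used in Bönstrup et al. (2019). It assumes a hypothetical input `perf`, where `perf[k]` is group-level performance at the end of trial k and `perf[0]` is a pre-practice baseline. The helper names `share_of_gain` and `window_mean_vs_end` are likewise illustrative.

```python
from statistics import mean


def share_of_gain(perf: list[float], first: int, last: int) -> float:
    """Percent of the overall gain accrued over trials first..last (1-based, inclusive).

    Assumes perf[k] is performance at the end of trial k and perf[0] is a
    pre-practice baseline, so the gain within the window is
    perf[last] - perf[first - 1] and the overall gain is perf[-1] - perf[0].
    """
    if not 1 <= first <= last < len(perf):
        raise ValueError("trial window out of range")
    total = perf[-1] - perf[0]
    if total == 0:
        raise ValueError("curve shows no overall gain to normalize by")
    return 100.0 * (perf[last] - perf[first - 1]) / total


def window_mean_vs_end(perf: list[float], first: int, last: int) -> tuple[float, float]:
    """Mean performance over trials first..last versus the value at the window's end.

    On a steeply rising curve the window mean lags the end value, one simple
    way in which averaging over a test block can understate the level reached.
    """
    if not 1 <= first <= last < len(perf):
        raise ValueError("trial window out of range")
    return mean(perf[first:last + 1]), perf[last]


if __name__ == "__main__":
    # Toy curve for illustration only; these are not data from any study.
    toy = [0.0, 2.0, 4.0, 5.0, 6.0, 7.0]
    print(share_of_gain(toy, 1, 2), share_of_gain(toy, 3, 5))
    print(window_mean_vs_end(toy, 1, 2))
```

      Under this definition, shares of non-overlapping windows come from disjoint segments of the same curve and therefore add, which is how two window shares such as 18.06% and 14.43% combine into a 32.49% total.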

      Furthermore, the structure of the Test in the Das et al. study appears to have an interference effect on Spaced group performance after the training intervention. This makes sense if one considers that the Spaced group is now required to perform the task in a Massed practice environment (i.e., two 10-second-long practice trials merged into one long trial), further blurring the true intervention effects. This effect is observable in Figure 1C,E of their preprint. Specifically, while the Massed group continues to show an increase in performance during the test relative to the last 10 seconds of practice during training, the Spaced group displays a marked decrease. This decrease stands in stark contrast to the monotonic increases observed for both groups at all other time-points.

      Interestingly, when statistical comparisons between the groups are made at the time-points when the intervention is present (as opposed to after it has been removed), the stated hypothesis, “If micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”, is confirmed.

      The data presented by Gupta and Rickard (2022, 2024) and Das et al. (2024) are in many ways more confirmatory than contradictory of the constraints employed by our group and others with respect to experimental design, analysis and interpretation of study findings. Still, they highlight a limitation of the current micro-online/offline framework, which was originally intended to be applied only to early skill learning over spaced practice schedules, when reactive inhibition effects are minimized49. Extrapolating this framework to post-plateau performance periods, longer timespans, or non-learning situations (e.g., the Non-repeating groups from Experiments 3 & 4 in Das et al. (2024)), when reactive inhibition plays a more substantive role, is not warranted. Ultimately, it will be important to develop new paradigms that allow one to independently estimate the different coincident or antagonistic features (e.g., memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

      References

      (1) Buch, E. R., Claudino, L., Quentin, R., Bonstrup, M. & Cohen, L. G. Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep 35, 109193 (2021). https://doi.org/10.1016/j.celrep.2021.109193

      (2) Chen, P.-C., Stritzelberger, J., Walther, K., Hamer, H. & Staresina, B. P. Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv 2024.10.06.614680 (2024). https://doi.org/10.1101/2024.10.06.614680

      (3) Classen, J., Liepert, J., Wise, S. P., Hallett, M. & Cohen, L. G. Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol 79, 1117-1123 (1998).

      (4) Karni, A. et al. Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature 377, 155-158 (1995). https://doi.org/10.1038/377155a0

      (5) Kleim, J. A., Barbay, S. & Nudo, R. J. Functional reorganization of the rat motor cortex following motor skill learning. J Neurophysiol 80, 3321-3325 (1998).

      (6) Shadmehr, R. & Holcomb, H. H. Neural correlates of motor memory consolidation. Science 277, 821-824 (1997).

      (7) Doyon, J. et al. Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci U S A 99, 1017-1022 (2002).

      (8) Toni, I., Ramnani, N., Josephs, O., Ashburner, J. & Passingham, R. E. Learning arbitrary visuomotor associations: temporal dynamic of brain activity. Neuroimage 14, 1048-1057 (2001).

      (9) Grafton, S. T. et al. Functional anatomy of human procedural learning determined with regional cerebral blood flow and PET. J Neurosci 12, 2542-2548 (1992).

      (10) Kennerley, S. W., Sakai, K. & Rushworth, M. F. Organization of action sequences and the role of the pre-SMA. J Neurophysiol 91, 978-993 (2004). https://doi.org/10.1152/jn.00651.2003

      (11) Hardwick, R. M., Rottschy, C., Miall, R. C. & Eickhoff, S. B. A quantitative meta-analysis and review of motor learning in the human brain. Neuroimage 67, 283-297 (2013). https://doi.org/10.1016/j.neuroimage.2012.11.020

      (12) Sawamura, D. et al. Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Sci Rep 9, 20397 (2019). https://doi.org/10.1038/s41598-019-56956-0

      (13) Lee, S. H., Jin, S. H. & An, J. The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep 9, 14066 (2019). https://doi.org/10.1038/s41598-019-50644-9

      (14) Battaglia-Mayer, A. & Caminiti, R. Corticocortical Systems Underlying High-Order Motor Control. J Neurosci 39, 4404-4421 (2019). https://doi.org/10.1523/JNEUROSCI.2094-18.2019

      (15) Toni, I., Thoenissen, D. & Zilles, K. Movement preparation and motor intention. Neuroimage 14, S110-117 (2001). https://doi.org/10.1006/nimg.2001.0841

      (16) Wolpert, D. M., Goodbody, S. J. & Husain, M. Maintaining internal representations: the role of the human superior parietal lobe. Nat Neurosci 1, 529-533 (1998). https://doi.org/10.1038/2245

      (17) Andersen, R. A. & Buneo, C. A. Intentional maps in posterior parietal cortex. Annu Rev Neurosci 25, 189-220 (2002). https://doi.org/10.1146/annurev.neuro.25.112701.142922

      (18) Buneo, C. A. & Andersen, R. A. The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia 44, 2594-2606 (2006). https://doi.org/10.1016/j.neuropsychologia.2005.10.011

      (19) Grover, S., Wen, W., Viswanathan, V., Gill, C. T. & Reinhart, R. M. G. Long-lasting, dissociable improvements in working memory and long-term memory in older adults with repetitive neuromodulation. Nat Neurosci 25, 1237-1246 (2022). https://doi.org/10.1038/s41593-022-01132-3

      (20) Colclough, G. L. et al. How reliable are MEG resting-state connectivity metrics? Neuroimage 138, 284-293 (2016). https://doi.org/10.1016/j.neuroimage.2016.05.070

      (21) Colclough, G. L., Brookes, M. J., Smith, S. M. & Woolrich, M. W. A symmetric multivariate leakage correction for MEG connectomes. Neuroimage 117, 439-448 (2015). https://doi.org/10.1016/j.neuroimage.2015.03.071

      (22) Mollazadeh, M. et al. Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. J Neurosci 31, 15531-15543 (2011). https://doi.org/10.1523/JNEUROSCI.2999-11.2011

      (23) Bansal, A. K., Vargas-Irwin, C. E., Truccolo, W. & Donoghue, J. P. Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. J Neurophysiol 105, 1603-1619 (2011). https://doi.org/10.1152/jn.00532.2010

      (24) Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E. & Slutzky, M. W. Local field potentials allow accurate decoding of muscle activity. J Neurophysiol 108, 18-24 (2012). https://doi.org/10.1152/jn.00832.2011

      (25) Churchland, M. M. et al. Neural population dynamics during reaching. Nature 487, 51-56 (2012). https://doi.org/10.1038/nature11129

      (26) Bassett, D. S. et al. Dynamic reconfiguration of human brain networks during learning. Proc Natl Acad Sci U S A 108, 7641-7646 (2011). https://doi.org/10.1073/pnas.1018985108

      (27) Albouy, G., King, B. R., Maquet, P. & Doyon, J. Hippocampus and striatum: dynamics and interaction during acquisition and sleep-related motor sequence memory consolidation. Hippocampus 23, 985-1004 (2013). https://doi.org/10.1002/hipo.22183

      (28) Albouy, G. et al. Neural correlates of performance variability during motor sequence acquisition. Neuroimage 60, 324-331 (2012). https://doi.org/10.1016/j.neuroimage.2011.12.049

      (29) Qin, Y. L., McNaughton, B. L., Skaggs, W. E. & Barnes, C. A. Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos Trans R Soc Lond B Biol Sci 352, 1525-1533 (1997). https://doi.org/10.1098/rstb.1997.0139

      (30) Euston, D. R., Tatsuno, M. & McNaughton, B. L. Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science 318, 1147-1150 (2007). https://doi.org/10.1126/science.1148979

      (31) Molle, M. & Born, J. Hippocampus whispering in deep sleep to prefrontal cortex--for good memories? Neuron 61, 496-498 (2009). https://doi.org/10.1016/j.neuron.2009.02.002

      (32) Frankland, P. W. & Bontempi, B. The organization of recent and remote memories. Nat Rev Neurosci 6, 119-130 (2005). https://doi.org/10.1038/nrn1607

      (33) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A 117, 23898-23903 (2020). https://doi.org/10.1073/pnas.2009576117

      (34) Albouy, G. et al. Maintaining vs. enhancing motor sequence memories: respective roles of striatal and hippocampal systems. Neuroimage 108, 423-434 (2015). https://doi.org/10.1016/j.neuroimage.2014.12.049

      (35) Gais, S. et al. Sleep transforms the cerebral trace of declarative memories. Proc Natl Acad Sci U S A 104, 18778-18783 (2007). https://doi.org/10.1073/pnas.0705454104

      (36) Sterpenich, V. et al. Sleep promotes the neural reorganization of remote emotional memory. J Neurosci 29, 5143-5152 (2009). https://doi.org/10.1523/JNEUROSCI.0561-09.2009

      (37) Euston, D. R., Gruber, A. J. & McNaughton, B. L. The role of medial prefrontal cortex in memory and decision making. Neuron 76, 1057-1070 (2012). https://doi.org/10.1016/j.neuron.2012.12.002

      (38) van Kesteren, M. T., Fernandez, G., Norris, D. G. & Hermans, E. J. Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proc Natl Acad Sci U S A 107, 7550-7555 (2010). https://doi.org/10.1073/pnas.0914892107

      (39) van Kesteren, M. T., Ruiter, D. J., Fernandez, G. & Henson, R. N. How schema and novelty augment memory formation. Trends Neurosci 35, 211-219 (2012). https://doi.org/10.1016/j.tins.2012.02.001

      (40) Wagner, A. D. et al. Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. Science 281, 1188-1191 (1998).

      (41) Ashe, J., Lungu, O. V., Basford, A. T. & Lu, X. Cortical control of motor sequences. Curr Opin Neurobiol 16, 213-221 (2006).

      (42) Hikosaka, O., Nakamura, K., Sakai, K. & Nakahara, H. Central mechanisms of motor skill learning. Curr Opin Neurobiol 12, 217-222 (2002).

      (43) Penhune, V. B. & Steele, C. J. Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav Brain Res 226, 579-591 (2012). https://doi.org/10.1016/j.bbr.2011.09.044

      (44) Doyon, J. et al. Contributions of the basal ganglia and functionally related brain structures to motor learning. Behav Brain Res 199, 61-75 (2009). https://doi.org/10.1016/j.bbr.2008.11.012

      (45) Schendan, H. E., Searl, M. M., Melrose, R. J. & Stern, C. E. An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron 37, 1013-1025 (2003). https://doi.org/10.1016/s0896-6273(03)00123-5

      (46) Morris, R. G. M. Elements of a neurobiological theory of hippocampal function: the role of synaptic plasticity, synaptic tagging and schemas. Eur J Neurosci 23, 2829-2846 (2006). https://doi.org/10.1111/j.1460-9568.2006.04888.x

      (47) Tse, D. et al. Schemas and memory consolidation. Science 316, 76-82 (2007). https://doi.org/10.1126/science.1135935

      (48) Berlot, E., Popp, N. J. & Diedrichsen, J. A critical re-evaluation of fMRI signatures of motor sequence learning. Elife 9 (2020). https://doi.org/10.7554/eLife.55241

      (49) Bonstrup, M. et al. A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol 29, 1346-1351.e4 (2019). https://doi.org/10.1016/j.cub.2019.02.049

      (50) Kornysheva, K. et al. Neural Competitive Queuing of Ordinal Structure Underlies Skilled Sequential Action. Neuron 101, 1166-1180.e3 (2019). https://doi.org/10.1016/j.neuron.2019.01.018

      (51) Pan, S. C. & Rickard, T. C. Sleep and motor learning: Is there room for consolidation? Psychol Bull 141, 812-834 (2015). https://doi.org/10.1037/bul0000009

      (52) Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J. & Ard, M. C. Sleep does not enhance motor sequence learning. J Exp Psychol Learn Mem Cogn 34, 834-842 (2008). https://doi.org/10.1037/0278-7393.34.4.834

      (53) Brawn, T. P., Fenn, K. M., Nusbaum, H. C. & Margoliash, D. Consolidating the effects of waking and sleep on motor-sequence learning. J Neurosci 30, 13977-13982 (2010). https://doi.org/10.1523/JNEUROSCI.3295-10.2010

      (54) Bonstrup, M., Iturrate, I., Hebart, M. N., Censor, N. & Cohen, L. G. Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Sci Learn 5, 7 (2020). https://doi.org/10.1038/s41539-020-0066-9

      (55) Gupta, M. W. & Rickard, T. C. Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Sci Learn 7, 25 (2022). https://doi.org/10.1038/s41539-022-00140-z

      (56) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A 117, 23898-23903 (2020).

      (57) Brooks, E., Wallis, S., Hendrikse, J. & Coxon, J. Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Sci Learn 9, 23 (2024). https://doi.org/10.1038/s41539-024-00238-6

      (58) Deleglise, A. et al. Human motor sequence learning drives transient changes in network topology and hippocampal connectivity early during memory consolidation. Cereb Cortex 33, 6120-6131 (2023). https://doi.org/10.1093/cercor/bhac489

      (59) Buzsaki, G. Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus 25, 1073-1188 (2015). https://doi.org/10.1002/hipo.22488

      (60) Gupta, M. W. & Rickard, T. C. Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep 14, 4661 (2024). https://doi.org/10.1038/s41598-024-52726-9

      (61) Das, A., Karagiorgis, A., Diedrichsen, J., Stenner, M.-P. & Azanon, E. “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv 2024.07.11.602795 (2024). https://doi.org/10.1101/2024.07.11.602795

      (62) Mylonas, D. et al. Maintenance of Procedural Motor Memory across Brief Rest Periods Requires the Hippocampus. J Neurosci 44 (2024). https://doi.org/10.1523/JNEUROSCI.1839-23.2024

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins.

      Strengths:

      (1) The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented.

      (2) This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

      (3) The paper is clearly written.

      We are grateful for the reviewer's kind comments on our manuscript. However, we would like to clarify a possible misunderstanding in the summary of our study. Specifically, our results did not include an analysis of "ancient versus recent folds". Our analysis concerned "coenzyme age" rather than "protein fold age" and focused mainly on interactions with early vs. late amino acids in protein sequences. While structural propensities of the coenzyme binding sites were also analyzed, no distinction between ancient and recent folds was made; this topic was only commented on in the discussion, based on previous work by others.

      Weaknesses:

      (1) The conclusions might not be as strong as presented. First of all, while ancient amino acids interact less frequently in late with a given coenzyme, maybe this just reflects the fact that proteins that evolved later might be using residues that have a more favorable binding free energy.

      We would like to point out that our dataset of coenzyme-binding proteins made no distinction between proteins that evolved early or late. The aim of our analysis was purely to observe trends in the age of amino acids vs. the age of coenzymes. While no direct inference about early life can be made from this, as all the proteins are from extant life (as highlighted in the discussion of our work), our goal was to look for intrinsic propensities of early vs. late amino acids to bind the different coenzyme entities. Indeed, very early interactions would be smeared by eons of evolutionary history (perhaps also towards more favourable binding free energy, as also pointed out by the reviewer). Nevertheless, significant trends were observed across the PDB dataset, pointing to different propensities and mechanistic properties of the binding events. Rather than to a specific evolutionary past, our data therefore point to a “capacity” of the early amino acids to bind certain coenzymes, and we believe that this is the major (and standing) conclusion of our work, along with the properties of such interactions. In our revised version, we will carefully go through all the conclusions and make sure that this message stands out, but we are confident that the following concluding sentences, copied from the abstract and the discussion of our manuscript, fully comply with our data:

      “These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution”

      “While no direct inferences about distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.”

      “This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      We would also like to add that proteins that evolved later might not always bind with more favourable free energy. Musil et al., 2021 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294521/) showed, using the haloalkane dehalogenase DhaA as an example, that ancestral sequence reconstruction is a powerful tool for designing proteins that are more stable and also more active. Ancestral sequence reconstruction relies on inferring ancient states of protein families to suggest mutations that yield proteins more stable than currently existing ones. Their study did not specifically explore ligand-protein interactions, but showed that ancestral states often have more favourable properties than modern proteins.

      (2) What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than the inferred important role of coenzymes.

      We appreciate the reviewer’s comment regarding other small molecules, which we assume refers mainly to metal ions (i.e., inorganic cofactors). We completely agree with the reviewer that such interactions are of utmost importance to the origins of life. We intentionally left them out of our study, as they have already been studied by others (e.g., Bromberg et al., 2022; reviewed in Frenkel-Pinter et al., 2020) and by us (Fried et al., 2022). For example, it is noteworthy that prebiotically relevant metal binding sites (e.g., of Mg2+) are enriched in early amino acids such as Asp and Glu, whereas binding sites of more recent metals (e.g., Cu and Zn) are enriched in the late amino acids His and Cys (Fried et al., 2022). At the same time, comparable analyses of amino acid-coenzyme trends were not available.

      Nevertheless, we also examined the involvement of metal ions in the coenzyme binding sites here, which pointed to their greater involvement with the Ancient coenzymes. In the revised version of the manuscript, we will be happy to expand the discussion of studies concerning inorganic cofactors.

      (3) Perhaps the conclusions just reflect the types of active sites that evolved first and nothing more.

      We partly agree with the reviewer on this point, but we disagree that it is a weakness of our study, and we disagree with the “nothing more” notion. Understanding the properties of the earliest binding sites is key to bridging the gap between prebiotic chemistry and biochemistry. The potential of peptides that preceded ribosomal synthesis (and the evolution of the full amino acid alphabet) to act together with prebiotically plausible coenzymes addresses exactly this gap, which is currently not understood.

      Reviewer #2 (Public Review):

      I enjoyed reading this paper and appreciate the careful analysis performed by the investigators examining whether 'ancient' cofactors are preferentially bound by the first-available amino acids, and whether later 'LUCA' cofactors are bound by the late-arriving amino acids. I've always found this question fascinating as there is a contradiction in inorganic metal-protein complexes (not what is focused on here). Metal coordination of Fe, Ni heavily relies on softer ligands like His and Cys - which are by most models latecomer amino acids. There are no traces of thiols or imidazoles in meteorites - although work by Dworkin has indicated that could very well be due to acid degradation during extraction. Chris Dupont (PNAS 2005) showed that metal speciation in the early earth (such as proposed by Anbar and prior RJP Williams) matched the purported order of fold emergence.

      As such, cofactor-protein interactions as a driving force for evolution has always made sense to me and I admittedly read this paper biased in its favor. But to make sure, I started to play around with the data that the authors kindly and importantly shared in the supplementary files. Here's what I found:

      Point 1: The correlation between abundance of amino acids and protein age is dominated by glycine. There is a small, but visible difference in old vs new amino acid fractional abundance between Ancient and LUCA proteins (Figure 3, Supplementary Table 3). However, the bias is not evenly distributed among the amino acids - which Figure 4A shows but is hard to digest as presented. So instead I used the spreadsheet in Supplement 3 to calculate the fractional difference FDaa = F(old aa)-F(new aa). As expected from Figure 3, the mean FD for Ancient is greater than the mean FD for LUCA. But when you look at the same table for each amino acid FDcofactor = F(ancient cofactor) - F(LUCA cofactor), you now see that the bias is not evenly distributed between older and newer amino acids at all. In fact, most of the difference can be explained by glycine (FDcofactor = 3.8) and the rest by also including tryptophan (FDcofactor = -3.8). If you remove these two amino acids from the analysis, the trend seen in Figure 3 all but disappears.

      Troubling - so you might argue that Gly is the oldest of the old and Trp is the newest of the new so the argument still stands. Unfortunately, Gly is a lot of things - flexible, small, polar - so what is the real correlation, age, or chemistry? This leads to point 2.

      We truly acknowledge the effort that the reviewer made in re-examining our data and appreciate the thoughtful, deeper analysis. We agree that these points deserve further discussion. As the reviewer invited, we repeated the analysis on the whole dataset. First, we would like to point out that the reviewer was most probably referring to Supplementary Fig. 2 (and not 3, which concerns protein folds). While the difference between Ancient and LUCA coenzyme binding is indeed most pronounced for Gly and Trp, we could not confirm that the trend disappears when those two amino acids are removed from the analysis: the remaining early and late amino acids still yield FDcofactor values of 3.2 and -3.2, respectively, as shown in Author response table 1 below. The main additional contributors to this effect are Asp (FD of 2.1) and Ser (FD of 1.8) among the early amino acids, and Arg (FD of -2.6) and Cys (FD of -1.7) among the late amino acids. Hence, while we agree with the reviewer that Gly and Trp (the oldest and the youngest) contribute the most to this effect, we disagree that the trend reduces to these two amino acids.
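      The fractional-difference comparison discussed in Point 1 can be written down compactly. The Python sketch below is a hypothetical illustration, not our analysis code: it assumes a mapping `freq[cls][aa]` giving the percentage abundance of amino acid `aa` among binding residues for coenzyme class `cls` (e.g., "Ancient", "LUCA", "Post-LUCA"), and it takes the set of early amino acids as a parameter rather than fixing a particular classification. The function names are illustrative.

```python
def fd_per_aa(freq: dict[str, dict[str, float]], cls_a: str, cls_b: str) -> dict[str, float]:
    """FD_cofactor per amino acid: F(aa | cls_a) - F(aa | cls_b)."""
    a, b = freq[cls_a], freq[cls_b]
    # Only amino acids present in both classes are compared.
    return {aa: a[aa] - b[aa] for aa in a.keys() & b.keys()}


def group_totals(
    fd: dict[str, float],
    early: set[str],
    exclude: frozenset[str] = frozenset(),
) -> tuple[float, float]:
    """Sum FD over early and late amino acids, leaving out any in `exclude`.

    Every amino acid not in `early` is treated as late.
    """
    kept = {aa: v for aa, v in fd.items() if aa not in exclude}
    early_sum = sum(v for aa, v in kept.items() if aa in early)
    late_sum = sum(v for aa, v in kept.items() if aa not in early)
    return early_sum, late_sum
```

      If each class's percentages sum to 100 over the same set of amino acids, the per-amino-acid FD values sum to zero, so the early and late totals come out equal and opposite. Passing, for example, `exclude=frozenset({"Gly", "Trp"})` keeps that symmetry only if the excluded values themselves cancel, which makes paired totals such as 3.2 and -3.2 a quick sanity check.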

      In addition, the most recent coenzyme age category (Post-LUCA) was not included in the reviewer’s analysis. Relative to Ancient coenzymes, the difference between F(old) and F(new) is even more pronounced for Post-LUCA than for LUCA coenzymes (Author response table 2), and it depends much less on Trp. In this comparison, Asp, Ser, Leu, Phe, and Arg dominate the observed effect (Author response table 1). This further supports our disagreement with the reviewer’s point. Nevertheless, we remain grateful for this discussion and will happily include this additional analysis in the Supplementary Material of our revised manuscript.

      Author response table 1.

      Amino acid fractional difference of all coenzymes at residue level

      Author response table 2.

      Amino acid fractional difference of all coenzymes

      Point 2 - The correlation is dominated by phosphate.

      In the ancient cofactor list, all but 4 comprise at least one phosphate (SAM, tetrahydrofolic acid, biopterin, and heme). Except for SAM, the rest have very low Gly abundance. The overall high Gly abundance in the ancient enzymes is due to the chemical property of glycine that can occupy the right-hand side of the Ramachandran plot. This allows it to make the alternating αL/αR conformation of the P-loop forming Milner-White's anionic nest. If you remove phosphate binding folds from the analysis the trend in Figure 3 vanishes.

      Likewise, Trp is an important functional residue for binding quinones and tuning their redox potential. The LUCA cofactor set is dominated by quinones and their derivatives, which likely drives up the new amino acid score for this class of cofactors.

      Once again, we thank the reviewer for raising this point. The role of Gly in the anionic nests proposed by Milner-White and Russell, as well as the role of Trp in quinone binding, are important points that we would be happy to highlight further in the discussion of the revised manuscript.

      Nevertheless, we disagree that the trends reduce only to the phosphate-containing coenzymes and, importantly, that “the trend in Figure 3 vanishes” upon their removal. Tables III and IV (below) show the data for coenzymes excluding those with a phosphate moiety, and the trend in Fig. 3 remains, albeit less pronounced.

      Author response table 3.

      Amino acid fractional difference of non-phosphate containing coenzymes

      Author response table 4.

      Amino acid fractional difference of non-phosphate containing coenzymes at residue level

      In summary, while I still believe the premise that cofactors drove the shape of peptides and of the folds that came from them, and that Rossmann folds are ancient phosphate-binding proteins, this analysis does not really add anything new to ideas already stated by Tawfik/Longo, Milner-White/Russell, and many others.

      I did this analysis ad hoc on a slice of the data the authors provided and could easily have missed something, so I encourage the authors to check my work. If it holds up, it should be noted that negative results can often be as informative as strong positive ones. I think the signal here is too weak to detect above the noise with the current approach.

      We are grateful to the reviewer for encouraging us to take a further look at our data. While we hope that the analysis of the whole dataset (Tables I-IV) will change the reviewer's view of our work, we would still like to comment on the questioned novelty of our results. In fact, the extraordinary work by Tawfik/Longo and Milner-White/Russell (cited multiple times in our manuscript) was one of the motivations for this study. We take this opportunity to quote the part of our discussion that specifically highlights the relevance of their studies and points out the contribution of our work with respect to theirs.

      “While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class than in the others. This supports an earlier hypothesis that the functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of main-chain atoms rather than sidechains (Milner-White and Russell 2011). Longo et al. recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during the earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that, unlike the evolutionarily younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone. Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not necessarily be needed for the sovereignty of coenzyme-peptide interplay.”

      Unlike any previous work, our study involves all the major coenzymes (not just the phosphate-containing ones) and is based on their evolutionary age, as well as the age of amino acids. It is the first PDB-wide systematic evolutionary analysis of coenzyme-amino acid binding. Besides confirming some earlier theoretical assertions (such as the role of backbone interactions in early peptide-coenzyme evolution) and observations (such as the occurrence of the ancient phosphate-containing coenzymes in the oldest protein folds), it uncovers substantial new knowledge. For example: (i) enrichment of early amino acids in the binding of ancient coenzymes vs. enrichment of late amino acids in the binding of LUCA and Post-LUCA coenzymes, (ii) the trends in secondary structure content of the binding sites of coenzymes of different temporalities, (iii) increased involvement of metal ions in ancient coenzyme binding events, and (iv) the capacity of early amino acids alone to bind ancient coenzymes. In our humble opinion, all of these points make important contributions toward closing the peptide-coenzyme knowledge gap discussed in a number of previous studies.

    1. Author response:

      eLife assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are very few human models that allow the role of early experience in neural circuit development to be assessed. While research on permanent congenital blindness reveals how the developing brain responds and adapts to an atypical situation (blindness), research on sight restoration addresses whether and how atypical development can be remediated once typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on permanent congenital blindness. We therefore assessed sight-recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ from that of normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for purely behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight-recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a question for future work with individuals who had developmental cataracts is whether later visual deprivation causes similar effects. Note that even if visual deprivation later in life caused similar effects, the current results would not be invalidated; rather, they are essential for interpreting future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far-reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first studies to relate MRS neurotransmitter concentrations to parameters of EEG aperiodic activity. Since current research uses aperiodic activity as a correlate of the E/I ratio, and in part of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what aperiodic neurophysiological activity might measure.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of visual deprivation during the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry in two conditions ('eyes open', 'eyes closed'), as well as resting-state and visually evoked EEG activity, between ten congenital cataract patients with recovered sight (CC) and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations: the primary location of interest in the visual cortex, and a control location in the frontal cortex. The control voxel tests the spatial specificity of any effects, because the single-voxel MRS method samples only a single location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx and no difference between the functional conditions 'eyes closed' and 'eyes open'. They found a group effect on the Glx/GABA+ ratio and no similar effect in the control voxel. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report a difference in E/I ratio and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would initially have caused a higher E/I ratio. Although intriguing, the evidence in support of this view is not strong. Amongst the limitations are the low sample size, the lack of a critical control cohort (for example, CC patients without recovered sight, who could provide evidence for a higher E/I ratio), and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than in Germany. Indeed, of the original 11 CC and 11 SC participants tested, one participant from each group had to be rejected because their data had been corrupted, resulting in 10 participants per group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 1.5 years.

      We safeguarded the validity of our results in two ways. First, we assessed not only MRS but also EEG measures of the E/I ratio. The latter allowed us to link our results to a larger population of CC individuals; that is, we replicated in our sub-group the results of a larger group of 38 individuals (Ossandón et al., 2023).

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). Accordingly, we started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late-blind individuals, and then in individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than in the visual cortex has been repeatedly observed in the MRS literature, attributable to magnetic field distortions (Juchem & de Graaf, 2017) resulting from the proximity of the region to the sinuses (for a recent example, see Rideaux et al., 2022). Nevertheless, we chose the frontal control region rather than a parietal voxel, given potential neurochemical changes due to blindness in multisensory regions of the parietal cortex. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have also used the frontal cortex as a control region (Pitchaimuthu et al., 2017; Rideaux et al., 2022).

      In the present study, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 ± 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs for fit errors have been proposed. However, MRS studies in special populations (I/E ratio assessed in narcolepsy (Gao et al., 2024), GABA concentration assessed in Autism Spectrum Disorder (Maier et al., 2022)) have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study should have been of sufficient quality to uncover group differences.

      In the revised manuscript, we will add the recently published MRS quality assessment form to the supplementary materials. Additionally, we will emphasize our a priori prediction of group differences for the visual cortex voxel, but not for the frontal cortex voxel.

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).

      In the revised manuscript, we will more clearly indicate that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject.
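      The aperiodic intercept and slope discussed above can be illustrated with a minimal sketch. Assuming the aperiodic component follows a power law, P(f) ≈ 10^b · f^(−χ), it becomes a straight line in log-log coordinates, so its intercept b and slope −χ can be estimated with a least-squares line fit over a chosen band (1-20 Hz here). This is a simplified illustration, not the pipeline used in the study: dedicated spectral parameterization tools additionally model oscillatory peaks (e.g. alpha), which would otherwise bias such a plain fit. The helper name `fit_aperiodic` and the synthetic spectra are hypothetical.

```python
import numpy as np


def fit_aperiodic(freqs, power, f_lo=1.0, f_hi=20.0):
    """Hypothetical helper: estimate aperiodic intercept and slope.

    Fits log10(power) = intercept + slope * log10(freq) by ordinary least
    squares within [f_lo, f_hi] Hz. A steeper (more negative) slope means
    power falls off faster with frequency. Oscillatory peaks are NOT modeled.
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    band = (freqs >= f_lo) & (freqs <= f_hi) & (freqs > 0)
    # np.polyfit returns coefficients highest degree first: [slope, intercept]
    slope, intercept = np.polyfit(np.log10(freqs[band]), np.log10(power[band]), 1)
    return intercept, slope


# Synthetic, noise-free 1/f^chi spectra (illustrative values, not study data)
freqs = np.arange(0.5, 40.5, 0.5)
flatter = 10 ** 1.0 * freqs ** -1.5
steeper = 10 ** 1.2 * freqs ** -2.0

for label, spectrum in [("flatter", flatter), ("steeper", steeper)]:
    b, s = fit_aperiodic(freqs, spectrum)
    print(f"{label}: intercept={b:.2f}, slope={s:.2f}")
```

      Because log10(1) = 0, the fitted intercept equals the modeled log power at 1 Hz, so its absolute value depends on how the spectrum is scaled; comparisons of intercepts between groups therefore assume identical preprocessing.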

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced visual impairment of varying duration and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are neither experimentally controlled for in the correlations nor discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g., the extant literature on congenital permanent blindness, including anophthalmic individuals; Coullon et al., 2015; Weaver et al., 2013), some uncertainty remains as to whether the (in our case, remaining) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In the case of people with reversed congenital cataracts, at least the retina is considered intact, as they would otherwise not receive cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, the link between GABA concentration and, for example, interocular suppression in strabismus has so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction depending on the amblyopic vs. non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), supporting the view that the differences in the Glx/GABA+ ratio we observed were driven by congenital deprivation, and not by amblyopia-associated differences in visual acuity or eye movements.

      In the revised manuscript, we will discuss the inclusion criteria in more detail, and the aforementioned reasons why our data remains interpretable.

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we will clearly indicate that the exploratory correlation analyses are reported to put forth hypotheses for future studies.
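      As one example of how a family of exploratory correlations could be corrected for multiple comparisons, the Holm-Bonferroni step-down procedure is sketched below. It controls the family-wise error rate and is uniformly less conservative than plain Bonferroni; whether it, or a false-discovery-rate method, suits the present analyses is a judgment call. The function name `holm_adjust` and the p-values are hypothetical and do not come from the study.

```python
from typing import List


def holm_adjust(pvals: List[float]) -> List[float]:
    """Hypothetical helper: Holm-Bonferroni adjusted p-values, in input order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps adjusted values monotone, and values are capped at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values from four exploratory correlations
raw = [0.012, 0.030, 0.200, 0.004]
adj = holm_adjust(raw)
for p, q in zip(raw, adj):
    print(f"raw p={p:.3f} -> Holm-adjusted p={q:.3f}")
```

      An adjusted value below the chosen alpha (e.g. 0.05) indicates that the correlation survives correction; in this toy example, only the two smallest raw p-values would.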

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, this correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we will add the linear regressions below, which include age as a covariate, for the relationship between aperiodic intercept and Glx concentration in the CC group.

      a. A linear regression was conducted within the CC group to predict the aperiodic intercept during visual stimulation from age and visual cortex Glx concentration. The model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.82, F(2,7) = 16.1, p = 0.0024. Note that the coefficient for age was not significant, β = 0.007, t(7) = 0.82, p = 0.439. The regression coefficients and their respective statistics are presented in Author response table 1.

      Author response table 1.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Visual Stimulation) in the CC group

      b. A linear regression was conducted within the CC group to predict the aperiodic intercept during eye opening at rest from age and visual cortex Glx concentration. The model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.842, F(2,7) = 18.6, p = 0.00159. Note that the coefficient for age was not significant, β = −0.005, t(7) = −0.90, p = 0.400. The regression coefficients and their respective statistics are presented in Author response table 2.

      Author response table 2.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Eyes Open) in the CC group

      c. Given that the Glx coefficient is significant in both models and age does not significantly predict either outcome, we conclude that Glx predicts the aperiodic intercept independently of age.
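      The covariate analysis in points a and b can be illustrated with a minimal ordinary-least-squares sketch. It regresses the aperiodic intercept on age and Glx jointly and reports R², the overall F statistic with (k, n − k − 1) degrees of freedom, and a t statistic per coefficient with n − k − 1 degrees of freedom. With n = 10 participants and k = 2 predictors, these are 2 and 7 degrees of freedom for the overall test and 7 for each coefficient, matching the degrees of freedom reported above. As a consistency check, converting R² = 0.82 with k = 2 and n = 10 gives an overall statistic of about 15.9, in line with the reported 16.1 once rounding of R² is taken into account. The function names and the toy data are hypothetical; this is not the authors' analysis code.

```python
import numpy as np


def f_from_r2(r2, k, n):
    """Overall F statistic of an OLS model with k predictors plus an intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def ols_with_covariate(y, predictors):
    """Hypothetical helper: fit y = b0 + b1*x1 + ... + bk*xk by OLS.

    Returns (coefficients, t statistics, R^2, F, (df_model, df_resid)).
    Coefficient order: intercept first, then predictors in the given order.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(x, dtype=float) for x in predictors])
    n, p = X.shape
    k = p - 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    df_resid = n - p                              # equals n - k - 1
    sigma2 = ss_res / df_resid                    # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the estimates
    t_stats = beta / np.sqrt(np.diag(cov_beta))
    return beta, t_stats, r2, f_from_r2(r2, k, n), (k, df_resid)


# Hypothetical toy data for 10 participants (not the study's measurements)
rng = np.random.default_rng(0)
age = rng.uniform(10, 40, size=10)
glx = rng.uniform(5, 10, size=10)
aperiodic_intercept = 0.5 + 0.0 * age + 0.3 * glx + rng.normal(0, 0.05, size=10)

beta, t_stats, r2, f_stat, df = ols_with_covariate(aperiodic_intercept, [age, glx])
print(f"R^2={r2:.3f}, F{df}={f_stat:.1f}")
print(f"age: beta={beta[1]:.4f}, t({df[1]})={t_stats[1]:.2f}")
print(f"Glx: beta={beta[2]:.4f}, t({df[1]})={t_stats[2]:.2f}")
print(f"F implied by R^2=0.82, k=2, n=10: {f_from_r2(0.82, 2, 10):.1f}")
```

      Entering age and Glx in the same model means each coefficient is estimated while holding the other predictor fixed, which is what allows a statement about Glx "independently of age"; p-values would follow from the t and F distributions with the degrees of freedom shown.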

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we will improve the phrasing. We consider the correlation analyses exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we will add this analysis to the supplementary material.

      Author response image 1.

      Aperiodic intercept (top) and slope (bottom) for congenital cataract-reversal (CC, red) and age-matched normally sighted control (SC, blue) individuals. Distributions of these parameters are displayed as violin plots for three conditions: at rest with eyes closed (EC), at rest with eyes open (EO), and during visual stimulation (LU). Aperiodic parameters were calculated across electrodes Fp1 and Fp2. Solid black lines indicate mean values, dotted black lines indicate median values. Coloured lines connect values of individual participants across conditions.

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      The data from all electrodes has been analyzed and published in other studies as well (Pant et al., 2023; Ossandón et al., 2023).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested an increased excitation/inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least comparing patients who did and did not undergo the sight-recovery surgery (as is common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio observed in the CC group is a consequence of sight restoration. However, longitudinal neuroimaging studies are challenging, particularly in research conducted outside of major developed countries and dedicated neuroimaging research facilities. Crucially, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been trivial. Thus, cross-sectional studies are a necessary first step to justify and better tailor longitudinal studies.

      (2.2) MR spectroscopy shows a reduced Glx/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated and is not corroborated by other observations. The difference between patients and controls only emerges for the Glx/GABA ratio; there is no accompanying difference in either the Glx or the GABA concentrations. There is an attempt to relate the MRS data to acuity measurements and electrophysiological indices, but the exploratory correlational analyses do not help to build a coherent picture. A modest correlation between Glx/GABA and visual impairment is reported, but this is specific to the patient group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest Glx/GABA ratio values for the sighted controls, which is the opposite of what is found). There is also a strong correlation between Glx concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, and only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature.

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) and inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the Glx/GABA+ ratio, rather than the individual neurotransmitter levels alone, is indicative of E/I balance. One motivation for assessing the ratio rather than absolute concentrations is that, according to the underlying E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, as shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of a lower E/I ratio rests not only on the Glx/GABA+ ratio but also on the steeper EEG aperiodic slope (1-20 Hz).

      As stated in the discussion section and in response 1.4, we did not expect to find a lower Glx/GABA+ ratio in CC individuals. We discuss possible reasons for the direction of the correlations with visual acuity and with the aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity as implying that the patients with the strongest (compensatory) balancing of the consequences of congenital blindness (hyperexcitation) in response to visual stimulation are those who recover best. Note that the sighted control group was selected for their "normal" vision. Thus, clinical visual acuity measures are not expected to vary sufficiently, nor to have the resolution, to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients with highly variable visual outcomes, and was therefore well suited to investigating such correlations.

      The same holds for the correlation between Glx and the aperiodic intercept. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase in spiking activity during visual stimulation, as indirectly suggested by "old" non-human primate work on visual deprivation (Hyvärinen et al., 1981), might drive a correlation that is not observed in healthy populations.

      In the revised manuscript, we will more clearly indicate in the discussion that these are possible post-hoc interpretations. We argue that, given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such. The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we will detail the advantages and disadvantages of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern, as it may not provide sufficient power to detect subtle effects and/or may overestimate significant effects, which then tend not to generalize to new data. This is one of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our discussion, and make clear that small sample sizes were due to the nature of investigations in special populations. It is worth noting that our EEG results fully align with those of a larger sample of CC individuals (Ossandón et al., 2023), providing us confidence about their validity and reproducibility. Moreover, our MRS results and correlations of those with EEG parameters were spatially specific to occipital cortex measures, as predicted.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, the methods section does not make clear that this analysis was conducted only in the patient group (which the reviewer inferred from the plots), nor does it explain why. I would highly recommend removing these correlation analyses.

      We marked the correlation analyses as exploratory; note that we do not base most of our discussion on the results of these analyses. As indicated by Reviewer 1, reporting them allows for deriving more precise hypotheses for future studies. It should be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal data sets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, are quite a lot.  Sample sizes in prior neuroimaging studies in transient blindness have most often ranged from n = 1 to n = 10. They nevertheless provided valuable direction for future research, and integration of results across multiple studies provides scientific insights.  

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.

      - Statistical analyses for the MRS: The authors should consider additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (a 2x2 ANOVA design) is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or the neurochemicals. Further, the authors did not provide any information on the alpha level or on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, and the duration of visual deprivation (and sex?), these should be included as covariates, as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements), and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      In the revised manuscript, we will add permutation analyses for our outcomes, as well as multiple regression models investigating whether the variance in visual history might have driven these results. Note that in the supplementary materials (S6, S7), we have reported the correlations between visual history metrics and MRS/EEG outcomes.
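
      As a rough illustration of the kind of permutation analysis mentioned above, the sketch below compares two groups on a single outcome (for example, the Glx/GABA+ ratio for one voxel and one condition) by repeatedly shuffling the group labels. It is a generic label-permutation test on the absolute difference in group means, written with the Python standard library only; the function name, the number of permutations, and the random seed are illustrative assumptions, not the analysis that will appear in the revised manuscript.

```python
import random


def permutation_test_mean_diff(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided label-permutation test on the difference in group means.

    Returns the observed absolute mean difference and a p-value computed
    as (count + 1) / (n_perm + 1), so the estimate is never exactly zero.
    """
    if len(group_a) < 1 or len(group_b) < 1:
        raise ValueError("both groups need at least one observation")
    rng = random.Random(seed)  # fixed seed for reproducibility
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed - 1e-12:  # small tolerance for float round-off
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

      With small groups the number of distinct label assignments is modest (for example, C(20, 10) = 184,756 for two groups of ten), so an exact test that enumerates every assignment is also feasible; random shuffling is used here only to keep the sketch short.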

      As specified in the methods section, the alpha level for the ANOVA models was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008 (0.05/6), after Bonferroni correction for six multiple comparisons, as also specified in the methods. Note that the corrected p-values are reported multiplied by 6, because most readers assume an alpha level of 0.05 (see the response regarding large p-values).
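
      The two ways of stating the Bonferroni correction are equivalent. Comparing each raw p-value with 0.05/6 gives the same decisions as multiplying it by 6 (conventionally capped at 1) and comparing the result with 0.05. A minimal sketch follows. The function name is hypothetical, and the p-values are made up purely to exercise the function.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m = len(p_values) tests.

    Returns the per-test alpha (alpha / m), the adjusted p-values
    (m * p, capped at 1), and the reject decisions at the family-wise alpha.
    """
    m = len(p_values)
    per_test_alpha = alpha / m
    adjusted = [min(1.0, m * p) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return per_test_alpha, adjusted, reject


# Hypothetical raw p-values for six exploratory correlations (illustration only).
raw = [0.001, 0.01, 0.2, 0.004, 0.5, 0.03]
per_test_alpha, adjusted, reject = bonferroni(raw)
# Same decisions as comparing the raw p-values against alpha / m (about 0.0083).
assert reject == [p <= per_test_alpha for p in raw]
```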

      We used a control group matched for age and sex. Moreover, the controls were recruited and tested in the same institutes, using the same setup. We feel that we followed the gold standards for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted separately for the eyes open and eyes closed conditions. The ANOVA on the EEG metrics was 2x3 because it had group (CC, SC) and condition (eyes open (EO), eyes closed (EC), and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the methods and figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.
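
      The arithmetic behind the Figure 4 values can be made concrete. For a Pearson correlation, the usual test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below computes the two-sided p-value for r = -0.42 and n = 10 using only the standard library. It integrates the t-distribution density with Simpson's rule, a choice made here only to avoid a SciPy dependency, and then applies the six-fold Bonferroni adjustment. The uncorrected value comes out at about 0.23, close to the reviewer's estimate. Multiplied by 6, it exceeds 1 before capping, which is consistent with the reported p > 0.999. The function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_from_r(r, n, steps=2000):
    """Two-sided p-value for a Pearson correlation r from n paired observations."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    h = t / steps
    # Simpson's rule for the area under the t density between 0 and |t|.
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return 1 - 2 * central  # probability mass in both tails beyond |t|


p_raw = two_sided_p_from_r(-0.42, 10)
p_bonferroni = min(1.0, 6 * p_raw)  # six exploratory comparisons, capped at 1
print(f"uncorrected p = {p_raw:.3f}, Bonferroni-adjusted p = {p_bonferroni:.3f}")
```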

      Additionally, to check all statistical analyses, we put the manuscript through an independent Statistics Check (Nuijten & Polanin, 2020) (https://michelenuijten.shinyapps.io/statcheck-web/) and will upload the consistency report with the revised supplementary material.

      - Figure 2c. Eyes closed condition: The highest score of the Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work of Gao et al., multiple papers that we also cite used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we will cite those studies not already included in the introduction.

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As is standard in the literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched in age and sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kΩ for all subjects. There is no evidence suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological and MRS measures could be accounted for by a difference in, e.g., skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretation of the lower frequency range, in order to speculate as to why the two aperiodic fits might go in differing directions. We will clarify our interpretation in the revised manuscript. Note that Ossandón et al. reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and for oscillatory activity in the gamma range. We will refer to these findings in the revised manuscript.

      - The authors further wrote: "We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018)." This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies allow only indirect inferences about what actually influences the 1/f slope in EEG measurements.

      Note that Muthukumaraswamy & Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in addition to monkey ECoG, and that Medel et al. (2020; now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent EEG studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG. We will make clearer in the introduction of the revised manuscript that this metric is indirect.

      While a full understanding of aperiodic activity remains to be achieved, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (which remains a problem even if a low-pass filter below the line-noise frequency was applied).

      As pointed out in the methods and Figure 1, we only analyzed data from two channels, O1 and O2, neither of which were rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023).

      In both published works, we did not consider frequency ranges above 40 Hz, to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, which rules out line noise contamination. The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used the cleanline.m function to remove line noise before filtering, and the group differences remained stable. We will report this analysis in the supplementary material of the revised manuscript. Further, both groups were measured in the same lab, making line noise a highly unlikely account of the observed group effects. Finally, the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120 μV criterion? This should be reported specifically for EO & EC and for controls and patients.

      The mean percentages of 1-second segments rejected for each resting-state condition are given below. The mean percentages of 6.25-second segments rejected in each group for the visual stimulation condition are also included. These values will be added to the revised manuscript:

      Author response table 3.
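
      A minimal sketch of how such rejection percentages can be computed follows. It assumes that the 120 μV criterion is applied as an absolute-amplitude threshold on every sample of a segment. The exact form of the criterion (for example, absolute versus peak-to-peak amplitude) is an assumption here and should follow the methods section, and the function name is hypothetical.

```python
def percent_rejected(segments, threshold_uv=120.0):
    """Percentage of segments in which any sample exceeds +/- threshold_uv.

    segments: list of segments, each a list of amplitudes in microvolts.
    Assumes an absolute-amplitude criterion (not peak-to-peak).
    """
    if not segments:
        raise ValueError("no segments supplied")
    n_bad = sum(1 for seg in segments if max(abs(v) for v in seg) > threshold_uv)
    return 100.0 * n_bad / len(segments)
```

      The function can be run separately for each participant and condition, and the results averaged within group (CC, SC), to obtain per-group mean percentages for each condition.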

      - The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which varied in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which the stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; Vanrullen & MacDonald, 2012); the analysis requires the EEG data to have the same sampling rate as the presented luminance sequence. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).
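
      The Nyquist concern can be made concrete. At a sampling rate of 60 Hz, the Nyquist frequency is 30 Hz. Any activity above 30 Hz that reaches the resampling step therefore folds back into lower frequencies unless an anti-aliasing filter removes it first. The sketch below uses only the standard library, and its 50 Hz example assumes 50 Hz mains. It shows that a 50 Hz sinusoid sampled at 60 Hz produces exactly the same samples as a 10 Hz sinusoid, which is why the anti-aliasing step during downsampling matters for analyses in the 1-20 Hz range. The function name is hypothetical.

```python
import math


def aliased_frequency(f_hz, fs_hz):
    """Frequency (between 0 and fs/2) at which a sinusoid of f_hz appears
    after sampling at fs_hz without any anti-aliasing filter."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


fs = 60.0
nyquist = fs / 2  # highest frequency representable at this sampling rate

# One second of a 50 Hz cosine and of a 10 Hz cosine, both sampled at 60 Hz.
n_samples = int(fs)
line_noise = [math.cos(2 * math.pi * 50 * k / fs) for k in range(n_samples)]
alias_10hz = [math.cos(2 * math.pi * 10 * k / fs) for k in range(n_samples)]
max_diff = max(abs(a - b) for a, b in zip(line_noise, alias_10hz))
```

      By the same rule, unfiltered content at 40 Hz would appear at 20 Hz, the upper edge of the fitted range.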

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This will be explicitly stated in the revised manuscript.
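
      Because the baseline window is the whole epoch, the baseline removal described above amounts to demeaning each epoch. A minimal sketch follows. The function names are hypothetical, and the sample counts in the usage example assume a 60 Hz sampling rate for both the resting-state and the stimulation epochs, which is an assumption made for illustration.

```python
def samples_per_epoch(duration_s, fs_hz):
    """Number of samples in an epoch; raises if it is not a whole number."""
    n = duration_s * fs_hz
    if abs(n - round(n)) > 1e-9:
        raise ValueError("epoch duration is not an integer number of samples")
    return int(round(n))


def demean_epoch(epoch):
    """Subtract the mean over the entire epoch from every sample."""
    baseline = sum(epoch) / len(epoch)
    return [v - baseline for v in epoch]
```

      For example, at 60 Hz a 1-second resting-state epoch holds 60 samples and a 6.25-second stimulation epoch holds 375 samples. Each epoch is demeaned over all of its samples.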

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values.  Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox in both the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range, as well as of lower alpha power in CC vs SC individuals, remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.
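
      The fitting procedure described in this response can be sketched as ordinary least squares on log frequency and log power. It consists of a straight-line fit to the 1-20 Hz spectrum in log-log space with 8-14 Hz excluded, followed by subtraction of the fit to obtain the corrected spectrum. This is a generic illustration, not the group's pipeline. The base-10 logarithms, the function names, and the R² fit-quality measure (one possible metric) are assumptions. The synthetic spectrum is a 1/f^1.5 power law with an added narrow 10 Hz peak, so the fitted slope recovers -1.5 only when the alpha band is excluded.

```python
import math


def fit_aperiodic(freqs, power, f_lo=1.0, f_hi=20.0, exclude=(8.0, 14.0)):
    """Fit log10(power) = intercept + slope * log10(freq) by least squares.

    Only frequencies in [f_lo, f_hi] outside the closed `exclude` band are used.
    Returns (slope, intercept, r_squared) for the fitted points.
    """
    xs, ys = [], []
    for f, p in zip(freqs, power):
        if f_lo <= f <= f_hi and not (exclude[0] <= f <= exclude[1]):
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    if len(xs) < 2:
        raise ValueError("need at least two frequencies to fit a line")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


def flatten_spectrum(freqs, power, slope, intercept):
    """Log10 power minus the fitted aperiodic line (the corrected spectrum)."""
    return [math.log10(p) - (intercept + slope * math.log10(f))
            for f, p in zip(freqs, power)]


# Synthetic spectrum: 10 * f^-1.5 with a narrow multiplicative peak at 10 Hz.
freqs = [1.0 + 0.5 * i for i in range(39)]  # 1.0 to 20.0 Hz in 0.5 Hz steps
power = [10.0 * f ** -1.5 * (1 + 4 * math.exp(-0.5 * ((f - 10.0) / 0.5) ** 2))
         for f in freqs]
slope, intercept, r2 = fit_aperiodic(freqs, power)
slope_all, _, _ = fit_aperiodic(freqs, power, exclude=(0.0, 0.0))  # alpha kept
```

      Passing `exclude=(0.0, 0.0)` keeps the alpha band. This allows the fits with and without the exclusion mentioned above to be compared directly. In this toy example, keeping the peak flattens the fitted slope.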

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group. We will add the fit quality metrics and show individual subjects’ fits in the revised manuscript.

      (3.6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of gamma-aminobutyric acid (GABA) measurements can vary significantly depending on acquisition and modeling parameters. Did the authors address these challenges?

      We took care of data quality while acquiring MRS data by ensuring appropriate voxel placement and linewidth prior to scanning. Acquisition as well as modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey, which was released in 2020 and uses linear combination modeling to fit the peak as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited spectral analysis toolbox at the time, and still is widely used. In the revised manuscript, we will include a supplementary section reanalyzing the main findings with Osprey.

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at supplementary Figure 5 (which would better be plotted as ICC or Bland-Altman plots), the within-subject stability does not seem to be great compared to the between-subject variability. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript, we will remove the statement regarding stability and add the Bland-Altman plot.
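
      For reference, a Bland-Altman analysis summarizes test-retest agreement with two quantities. The first is the bias, the mean difference between sessions. The second is the 95% limits of agreement, bias ± 1.96 times the standard deviation of the differences. Each subject's difference is usually plotted against the mean of their two sessions. A minimal sketch follows. The function name is hypothetical, and the toy numbers in the check are illustrative, not the control-group data.

```python
import statistics


def bland_altman(session1, session2):
    """Return per-subject means, differences, bias, and 95% limits of agreement.

    Differences are session1 - session2; limits are bias +/- 1.96 * SD,
    using the sample standard deviation of the differences.
    """
    if len(session1) != len(session2) or len(session1) < 2:
        raise ValueError("need two equal-length sessions with at least 2 subjects")
    means = [(a + b) / 2 for a, b in zip(session1, session2)]
    diffs = [a - b for a, b in zip(session1, session2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```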

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, as the results of the group differences in the Glx/GABA ratio seem to be rather driven by a decreased Glx concentration in CC rather than an increased GABA (see Figure 2).

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we will rephrase this sentence accordingly. 

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or only with great caution) for several reasons (see also the problem with influences on the aperiodic intercept and the small sample size). This is a result of the exploratory analyses correlating every EEG parameter with every MRS parameter, and it requires well-powered replication before any interpretation can be provided. Furthermore and importantly: why should this be specific to CC patients, but not found in the SC control group?

      We indicate clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the discussion section as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revision, we will check that speculations are clearly marked and typos are removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      We disagree with this statement and point to the convergent evidence from both MRS and neurophysiological measures. The latter correspond to results observed in a larger sample of CC individuals (Ossandón et al., 2023).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study, which might justify future longitudinal work. To advance the field, we put forward new testable hypotheses at the end of the manuscript.

      In the revised manuscript we will add “might imply” to better indicate the hypothetical character of this idea.

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population , but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification "We nevertheless think that our results are valid. Our findings neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." does not provide any validation or justification of small samples. Furthermore, the current data set is a subset of an earlier published paper by the same authors: "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we will add a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al. We will adapt the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicates that of the 28 additional subjects of Ossandón et al. (2023).

      References

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & Graaf, R. A. de. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. Journal of Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Medel, V., Irani, M., Ossandón, T., & Boncompte, G. (2020). Complexity and 1/f slope jointly reflect cortical states across different E/I balances. bioRxiv, 2020.09.15.298497. https://doi.org/10.1101/2020.09.15.298497

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ−aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      Vanrullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Vreeswijk, C. V., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

    1. Author response:

      Reviewer #1 (Public review):

      (1) Legionella effectors are often activated by binding to eukaryote-specific host factors, including actin. The authors should test the following: a) whether Lfat1 can fatty acylate small G-proteins in vitro; b) whether this activity is dependent on actin binding; and c) whether expression of the Y240A mutant in mammalian cells affects the fatty acylation of Rac3 (Figure 6B), or other small G-proteins.

      We were unable to express and purify full-length recombinant Lfat1 and therefore could not test fatty acylation of small GTPases in vitro. However, when overexpressed in cells, the Y240A mutant retained the ability to fatty acylate Rac3 and another small GTPase, RheB (see Author response image 1 below). We speculate that under infection conditions, actin binding may be required for the fatty acylation of certain GTPases, given the small amount of effector protein secreted into the host cell.

      Author response image 1.

      (2) It should be demonstrated that lysine residues on small G-proteins are indeed targeted by Lfat1. Ideally, the functional consequences of these modifications should also be investigated. For example, does fatty acylation of G-proteins affect GTPase activity or binding to downstream effectors?

      We mutated K178 of RheB and showed that this mutation abolished its fatty acylation by Lfat1 (see Author response image 2 below). We were not able to test whether fatty acylation by Lfat1 affects downstream effector binding.

      Author response image 2.

      (3) Line 138: Can the authors clarify whether the Lfat1 ABD induces bundling of F-actin filaments or promotes actin oligomerization? Does the Lfat1 ABD form multimers that bring multiple filaments together? If Lfat1 induces actin oligomerization, this effect should be experimentally tested and reported. Additionally, the impact of Lfat1 binding on actin filament stability should be assessed. This is particularly important given the proposed use of the ABD as an actin probe.

      The ABD domain does not form oligomers, as shown by its gel filtration profile. However, we do observe F-actin bundling in our in vitro F-actin polymerization experiments when both actin and the ABD are at high concentrations (data not shown). At low ABD concentrations, there is no aggregation or bundling of F-actin.

      (4) Line 180: I think it's too premature to refer to the interaction as having "high specificity and affinity." We really don't know what else it's binding to.

      We have reworded the sentence to remove "high specificity and affinity."

      (5) The authors should reconsider the color scheme used in the structural figures, particularly in Figures 2D and S4.

      We are not sure which aspect of the color scheme in the structural figures the reviewer is referring to.

      (6) In Figure 3E, the WT curve fits the data poorly, possibly because the actin concentration exceeds the Kd of the interaction. It might fit better to a quadratic.

      We have performed quadratic fitting and replaced Figure 3E.
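      For readers unfamiliar with the quadratic fit mentioned in point (6): when the concentrations used are comparable to or exceed the Kd, the free concentration of the titrated partner can no longer be approximated by its total concentration, and a simple hyperbola misfits the data. The sketch below is a minimal illustration, not the analysis used for Figure 3E. It assumes 1:1 binding, a fixed total concentration of the labelled partner (p_total), and a signal proportional to the fraction bound; all concentrations, the synthetic data, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def make_quadratic_model(p_total):
    """Return f(actin, kd, bmax) for 1:1 binding with ligand depletion.

    p_total is the fixed total concentration of the labelled partner
    (hypothetical); actin is the titrated total concentration.
    """
    def model(actin, kd, bmax):
        s = p_total + actin + kd
        # Exact root of the binding quadratic for the complex concentration;
        # the discriminant equals (p_total - actin)**2 + kd**2 + 2*kd*(p_total + actin),
        # which is non-negative whenever kd >= 0.
        complex_conc = (s - np.sqrt(s**2 - 4.0 * p_total * actin)) / 2.0
        return bmax * complex_conc / p_total
    return model


def hyperbolic_model(actin, kd, bmax):
    """Simple hyperbola; only valid when free actin is close to total actin."""
    return bmax * actin / (kd + actin)


# Synthetic, noise-free data (NOT the authors' measurements).
p_total = 1.0                   # µM, hypothetical
true_kd, true_bmax = 0.2, 1.0   # µM, hypothetical; Kd below p_total
actin = np.array([0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0])
quad = make_quadratic_model(p_total)
signal = quad(actin, true_kd, true_bmax)

bounds = ([0.0, 0.0], [np.inf, np.inf])  # keep kd and bmax non-negative
(kd_quad, bmax_quad), _ = curve_fit(quad, actin, signal, p0=[0.5, 1.0], bounds=bounds)
(kd_hyp, bmax_hyp), _ = curve_fit(hyperbolic_model, actin, signal, p0=[0.5, 1.0], bounds=bounds)

print(f"quadratic fit:  Kd = {kd_quad:.3f} µM, Bmax = {bmax_quad:.3f}")
print(f"hyperbolic fit: Kd = {kd_hyp:.3f} µM, Bmax = {bmax_hyp:.3f}")
```

      On these synthetic data the quadratic model recovers the Kd used to generate them, whereas the hyperbola returns a larger apparent Kd, illustrating why a quadratic fit can be preferable in this concentration regime.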

      (7) The authors propose that the individual helices of the Lfat1 ABD could be expressed on separate proteins and used to target multi-component biological complexes to F-actin by genetically fusing each component to a split alpha-helix. This is an intriguing idea, but it should be tested as a proof of concept to support its feasibility and potential utility.

      It is a good suggestion. We plan to thoroughly test the feasibility of this idea as one of our future directions.

      (8) The plot in Figure S2D appears cropped on the X-axis or was generated from a ~2× binned map rather than the deposited one (pixel size ~0.83 Å, plot suggests ~1.6 Å). The reported pixel size is inconsistent between the Methods and Table 1; please clarify whether 0.83 Å refers to super-resolution.

      Yes, 0.83 Å refers to the super-resolution pixel size. We have updated the cryo-EM table accordingly.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The authors should use biochemical reactions to analyze the KFAT activity of Lfat1 on one or two small GTPases shown to be modified by this effector in cellulo. Such reactions may allow them to determine the role of actin binding in its biochemical activity. This notion is particularly relevant in light of recent studies showing that actin is a co-factor for the activity of LnaB and Ceg14 (PMID: 39009586; PMID: 38776962; PMID: 40394005). In addition, the study should be discussed in the context of these recent findings on the role of actin in the activity of L. pneumophila effectors.

      Our new data show that actin binding does not affect Lfat1 enzymatic activity (see the figure in our response to Reviewer #1). We have added these data to the paper as Figure S7 and revised the Discussion accordingly by adding the following paragraph:

      “The discovery of Lfat1 as an F-actin–binding lysine fatty acyl transferase raised the intriguing question of whether its enzymatic activity depends on F-actin binding. Recent studies have shown that other Legionella effectors, such as LnaB and Ceg14, use actin as a co-factor to regulate their activities. For instance, LnaB binds monomeric G-actin to enhance its phosphoryl-AMPylase activity toward phosphorylated residues, resulting in unique ADPylation modifications in host proteins (Fu et al, 2024; Wang et al, 2024). Similarly, Ceg14 is activated by host actin to convert ATP and dATP into adenosine and deoxyadenosine monophosphate, thereby modulating ATP levels in L. pneumophila–infected cells (He et al, 2025). However, this does not appear to be the case for Lfat1. We found that Lfat1 mutants defective in F-actin binding retained the ability to modify host small GTPases when expressed in cells (Figure S7). These findings suggest that, rather than serving as a co-factor, F-actin may serve to localize Lfat1 via its actin-binding domain (ABD), thereby confining its activity to regions enriched in F-actin and enabling spatial specificity in the modification of host targets.”

      (2) The development of the Lfat1 ABD domain as an F-actin probe is a nice extension of the biochemical and structural experiments. The authors need to compare the new probe with commonly used probes, such as Lifeact, for labeling of the actin cytoskeleton.

      We fully agree with the reviewer’s insightful suggestion. However, a direct comparison of the Lfat1 ABD domain with commonly used actin probes such as Lifeact, as well as evaluation of the split α-helix probe (as suggested by Reviewer #1), would require extensive and technically demanding experiments. These are important directions that we plan to pursue in future studies.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study reveals that TRPV1 signaling plays a key role in tympanic membrane (TM) healing by promoting macrophage recruitment and angiogenesis. Using a mouse TM perforation model, researchers found that blood-derived macrophages accumulated near the wound, driving angiogenesis and repair. TRPV1-expressing nerve fibers triggered neuroinflammatory responses, facilitating macrophage recruitment. Genetic Trpv1 mutation reduced macrophage infiltration, angiogenesis, and delayed healing. These findings suggest that targeting TRPV1 or stimulating sensory nerve fibers could enhance TM repair, improve blood flow, and prevent infections. This offers new therapeutic strategies for TM perforations and otitis media in clinical settings. This is an excellent and high-quality study that provides valuable insights into the mechanisms underlying TM wound healing.

      Strengths:

      The work is particularly important for elucidating the cellular and molecular processes involved in TM repair. However, there are several concerns about the current version.

      We sincerely thank Reviewer #1 for their time and effort in evaluating and improving our study. Below, we are pleased to address the Reviewer's concerns point by point.

      Weaknesses:

      Major concerns

      (1) The method of administration will be a critical factor when considering potential therapeutic strategies to promote TM healing. It would be beneficial if the authors could discuss possible delivery methods, such as topical application, transtympanic injection, or systemic administration, and their respective advantages and limitations for targeting TRPV1 signaling. For example, Dr. Kanemaru and his colleagues have proposed the use of Trafermin and Spongel to regenerate the eardrum.

      We are grateful to the reviewer for raising this important point. While the present study primarily focuses on the mechanistic role of TRPV1 in TM repair, we agree that the mode of therapeutic delivery will be pivotal in translating these findings into clinical practice. In response, we will expand the discussion to explore possible delivery methods—such as topical application, transtympanic injection, and systemic routes—along with their respective benefits and challenges. We will also cite the work by Dr. Kanemaru and colleagues as an example of how local delivery systems may facilitate TM regeneration.

      (2) The authors appear to have used surface imaging techniques to observe the TM. However, the TM consists of three distinct layers: the epithelial layer, the fibrous middle layer, and the inner mucosal layer. The authors should clarify whether the proposed mechanism involving TRPV1-mediated macrophage recruitment and angiogenesis is limited to the epithelial layer or if it extends to the deeper layers of the TM.

      We apologize for any confusion caused by our previous description. In our study, we used Z-stack confocal imaging to capture the full thickness of the TM, as illustrated in Author response image 1 (reconstructed from the acquired Z-sections). This approach encompassed all three layers of the TM. Each sample was imaged using a 10X objective on an Olympus fluorescence microscope. Given the conical shape and size of the TM, we imaged it in four quadrants, acquiring approximately 30 optical sections (3 µm step) per quadrant. Each image stack was projected and exported using FV10ASW 4.2 Viewer, and the projections were then stitched together in Photoshop. The resulting Z-stack projections enabled us to visualize the distribution of macrophages, angiogenesis, and the localization of nerve fibers throughout the TM. We will include this detailed methodology in our revision to avoid any confusion.

      Author response image 1.

      Representative confocal images showing one quadrant of the TM collected from a CSF1R^EGFP bone marrow-transplanted mouse at day 7 post-perforation. (A-B) 3D-rendered views from different angles reveal the close spatial relationship between CSF1R^EGFP cells (green) and blood vessels (red) within the TM. (C) Cross-sectional view highlights the depth-wise distribution of CSF1R^EGFP cells (green) and blood vessels (red) across the layered TM architecture. All images were processed using Imaris Viewer x64 (version 10.2.0).

      Minor concerns

      In Figure 8, the schematic illustration presents a coronal section of the TM. However, based on the data provided in the manuscript, it is unclear whether the authors directly obtained coronal images in their study. To enhance the clarity and impact of the schematic, it would be helpful to include representative images of coronal sections showing macrophage infiltration, angiogenesis, and nerve fiber distribution in the TM.

      As noted above, we utilized Z-stack confocal imaging to capture the full thickness of the TM, enabling us to visualize structures across all three layers. This approach ensured that all layers were included in our analysis. Due to the thin and curved nature of the TM, traditional cross-sectional imaging often struggles to clearly depict the spatial relationships between macrophages, blood vessels, and nerve fibers, especially at low magnification as shown in Author response image 2. In response to the reviewer's suggestion, we will include representative coronal images in the revised manuscript to better illustrate the distribution of these structures at higher magnification.

      Author response image 2.

      Confocal images of eardrum cross-sections collected at day 1 (A), 3 (B), and 7 (C) post-perforation, illustrating the wound healing process.

      Reviewer #2 (Public review):

      Summary:

      This study examines the role of TRPV1 signaling in the recruitment of monocyte-derived macrophages and the promotion of angiogenesis during tympanic membrane (TM) wound healing. The authors use a combination of genetic mouse models, macrophage depletion, and transcriptomic approaches to suggest that neuronal TRPV1 activity contributes to macrophage-driven vascular responses necessary for tissue repair.

      Strengths:

      (1) The topic of neuroimmune interactions in tissue regeneration is of interest and underexplored in the context of the TM, which presents a unique model due to its anatomical features.

      (2) The use of reporter mice and bone marrow chimeras allows for some dissection of immune cell origin.

      (3) The authors incorporate transcriptomic data to contextualize inflammatory and angiogenic processes during wound healing.

      We sincerely thank Reviewer #2 for their time and effort in improving our study and recognizing its strengths. Below, we are pleased to address the reviewer's concerns point by point.

      Weaknesses:

      (1) The primary claims of the manuscript are not convincingly supported by the evidence presented. Most of the data are correlative in nature, and no direct mechanistic experiments are included to establish causality between TRPV1 signaling and macrophage recruitment or function.

      We appreciate Reviewer #2's perspective on the lack of molecular mechanisms linking TRPV1 signaling and macrophages. However, our data demonstrate that Trpv1 mutation significantly affects macrophage recruitment and angiogenesis. This initial study focuses primarily on how sensory nerve fibers are involved in eardrum immunity and wound healing, a phenomenon that has not been clearly reported in the literature before. We agree that further research is necessary to explore this topic in greater depth.

      (2) Functional validation of key molecular players (such as Tac1 or Spp1) is lacking, and their roles are inferred primarily from gene expression data rather than experimentally tested.

      Although we have identified Tac1 and Spp1 signaling as potentially important for TM wound healing for the first time, we agree with the Reviewer that the underlying molecular mechanisms were not explored in this study. We have not yet tested the downstream signaling pathways, but we plan to investigate them in a series of future studies. Building on these initial findings, we will continue to explore these signals and their potential clinical applications.

      (3) The reuse of publicly available scRNA-seq data is not sufficiently integrated or extended to yield new biological insights, and it remains largely descriptive.

      We thank Reviewer #2 for highlighting this point. Leveraging publicly available scRNA-seq datasets and established analysis pipelines saves considerable time and resources; for example, when our lab recently collected macrophages from the eardrums of postnatal day 15 (P15) mice, each trial required 20 eardrums from 10 animals to obtain a sufficient number of cells. It also allows researchers to build on previous work and focus on new biological questions without repeating experiments. A prior study by Dr. Tward and his team used scRNA-seq data to make initial discoveries related to eardrum wound healing, focusing primarily on epithelial cells rather than macrophages. We are building on their raw data to uncover new biological insights regarding macrophages. Although we have not yet experimentally tested the signals identified here, we believe these analyses will be valuable to our peers.

      (4) The macrophage depletion model (CX3CR1CreER; iDTR) lacks specificity, and possible off-target or systemic effects are not addressed.

      We agree with Reviewer #2. Although the macrophage depletion model used in our study is a standard, widely used model that has been adopted by many other laboratories (Shi et al., 2018), any macrophage depletion model may have potential issues, including off-target or systemic effects. We will discuss this in our revision.

      (5) Several interpretations of the data appear overstated, particularly regarding the necessity of TRPV1 for monocyte recruitment and wound healing.

      We thank the reviewer for pointing this out. We will revise the overstated interpretations in the manuscript accordingly.

      (6) Overall, the study appears to apply known concepts - namely, TRPV1-mediated neurogenic inflammation and macrophage-driven angiogenesis - to a new anatomical site without providing new mechanistic insight or advancing the field substantially.

      Although our study may not seem highly innovative at first glance, it reveals for the first time a previously unknown role of the TRPV1 pain signaling pathway in promoting eardrum healing. This healing process includes the recruitment of monocyte-derived macrophages and the formation of new blood vessels (angiogenesis). While this process has been documented in other organs, most research on macrophage-driven angiogenesis has been conducted using in vitro models, with very few studies demonstrating it in vivo. Our findings could lead to new translational opportunities, especially considering that tympanic membrane perforation, together with damage-induced otitis media and conductive hearing loss, is a common clinical problem affecting millions of people worldwide. Targeting TRPV1 signaling could enhance tympanic membrane immunity, improve blood circulation, promote the repair of damaged tympanic membranes, and ultimately prevent middle ear infections, an idea that has not been previously proposed.

      Overall:

      While the study addresses an interesting topic, the current version does not provide sufficiently strong or novel evidence to support its major conclusions. Additional mechanistic experiments and more rigorous validation would be necessary to substantiate the proposed model and clarify the relevance of the findings beyond this specific tissue context.

      We greatly thank the two reviewers for their helpful critiques to improve our study. We especially thank the Section Editors for their insightful and constructive comments on this initial study.

      References:

      Shi, J., L. Hua, D. Harmer, P. Li and G. Ren (2018). "Cre Driver Mice Targeting Macrophages." Methods Mol Biol 1784: 263-275.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slowdown implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produces an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), and the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      We tested whether changing the ratio between the state and control weight matrices can generate the observed effect. As shown in Author response images 1 and 2, changing the cost function cannot simultaneously produce reduced peak velocity/acceleration and an earlier timing of these peaks, whereas a change in mass estimation can. In other words, mass underestimation alone can explain the two key findings: reduced amplitude and advanced timing. We cannot exclude a change in the cost function on top of the mass underestimation, but Occam's razor favors the simpler explanation, i.e., underestimation of body mass. We will include our exploration of possible changes in the cost function in the revision (in the Supplemental Materials).

      Author response image 1.

      Simulation using an altered cost function with α = 3.0. Panels A, B, and E show simulated position, velocity, and acceleration profiles, respectively, for the three movement directions. Solid lines correspond to pre- and post-exposure conditions, while dashed lines represent the in-flight condition. Panels C and D display the peak velocity and its timing across the three phases (Pre, In, Post), and Panels F and G show the corresponding peak acceleration and its timing. Note that varying the cost function, while reducing peak velocity/acceleration, erroneously predicts a delayed timing of peak velocity/acceleration.

      Author response image 2.

      Simulation results using a cost function with α = 0.3. The format is the same as in Author response image 1. Note that this ten-fold decrease in α reproduces the advanced timing of peak velocity/acceleration (i.e., a shorter time to peak) but erroneously predicts increased peak velocity/acceleration.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      Our paper does not aim to quantitatively reproduce human reaching movements in microgravity. We will make this clearer in the revision.

      (1) The model is a simplification of the actual situation. For example, it simulates an ideal point mass (the effective mass) moving without friction and without Coriolis and centripetal torques, whereas in reality participants moved their finger across a touch screen. The two-link arm model assumes planar movements, but our participants moved their hand on a tabletop without any vertical support constraining the movement to 2D.

      (2) Our study merely uses well-established (though simplified) models to qualitatively predict the overall behavioral patterns expected if mass underestimation is at play. For this purpose, the results agree well with the models' qualitative predictions: the key kinematic features (peak velocity and acceleration) follow the same rank order across movement directions as predicted.

      (3) Using model simulations to qualitatively predict human behavioral patterns is common practice in motor control; prominent examples include the papers on optimal feedback control (Todorov, 2004, 2005) and movement vigor (Shadmehr et al., 2016). In fact, our model was inspired by the model in the latter paper.

      Citations:

      Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907–915.

      Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation, 17(5), 1084–1108.

      Shadmehr, R., Huang, H. J., & Ahmed, A. A. (2016). A Representation of Effort in Decision-Making and Motor Control. Current Biology, 26(14), 1929–1934.

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      We are happy to include example speed and acceleration trajectories. The detailed trajectories of one example participant are shown below and will be included in the revision. The reduced and advanced velocity/acceleration peaks are visible in typical trials.

      Author response image 3.

      Hand speed profiles (upper panels), hand acceleration profiles (middle panels) and speed profiles of the primary submovements (lower panels) towards different directions from an example participant.

      (2) The presentation and justification of the model require substantial improvement; the reason for their presence in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      Response: In brief, our simulations show that Coriolis and centripetal forces, despite some directional anisotropy, have only small effects on the predicted kinematics (see our responses to Reviewer 2). We will move the model description into the main text, with more justification for using a simple model.

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

      Response: Indeed, the proportion of trials with submovements increases only slightly, but the more important change is that the IPI (the inter-peak interval between submovements) increases at the same time. Moreover, it is the IPI that significantly predicts the increase in movement duration in our linear mixed model. We will highlight this in the revision to avoid confusion.
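
      As an illustration of the inter-peak interval, the following is a minimal Python sketch that finds local maxima of a sampled speed profile and takes the time between the first two as the IPI. It is a simplified stand-in, not the submovement decomposition used in the paper; the 10% peak threshold, the function names and the synthetic two-bump trial are all hypothetical.

```python
import numpy as np

def speed_peaks(speed, min_height_frac=0.1):
    """Indices of local maxima of a sampled speed profile that exceed a
    fraction of the global maximum (the threshold is a hypothetical choice)."""
    speed = np.asarray(speed, dtype=float)
    threshold = min_height_frac * speed.max()
    # An interior sample is a peak if it rises above its left neighbour
    # and is not exceeded by its right neighbour.
    interior = (speed[1:-1] > speed[:-2]) & (speed[1:-1] >= speed[2:])
    candidates = np.nonzero(interior)[0] + 1
    return candidates[speed[candidates] >= threshold]

def inter_peak_interval(t, speed, min_height_frac=0.1):
    """Time between the first two speed peaks, or None for single-peaked trials."""
    peaks = speed_peaks(speed, min_height_frac)
    if len(peaks) < 2:
        return None
    return float(t[peaks[1]] - t[peaks[0]])

# Synthetic trial: a primary movement plus a smaller corrective submovement.
t = np.arange(0.0, 0.6, 0.001)
speed = (1.0 * np.exp(-0.5 * ((t - 0.15) / 0.05) ** 2)
         + 0.3 * np.exp(-0.5 * ((t - 0.35) / 0.05) ** 2))
print(inter_peak_interval(t, speed))
```

      A trial with a single speed peak returns None, so the proportion of trials with submovements and the IPI of those trials can be computed from the same function.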

      Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether it gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs. deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during flight with a mass underestimation. The experimental data generally show the expected changes, but for the 45° condition, no changes are observed during flight compared with the pre- and post-flight phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a nearly single-joint movement, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements at 90° and 135° could indicate that participants had more difficulty coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the authors' view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (whether or not caused specifically by a mass underestimation)?

      We agree that the effect of mass underestimation is smaller in the 45° direction than in the other two directions, possibly because this direction relies on a single joint (elbow) rather than two joints (elbow and shoulder). Moreover, movement correction using one joint is probably easier (as also suggested by another reviewer); we will discuss this possibility further in the revision. However, our model simplification (excluding Coriolis and centripetal torques) does not affect our main conclusions. First, a simple simulation showed that, under the current optimal hand trajectory, incorporating Coriolis and centripetal torques has only a limited impact on the resulting joint torques (see simulations in Author response image 4). One reason is that our movements were smaller than those of Hollerbach & Flash. In addition, we applied an optimal feedback control model to a more realistic 2-joint arm configuration. Despite its simplicity, this model produced a speed profile consistent with our current predictions and made similar predictions regarding the effects of mass underestimation (Author response image 5). In the revision, we will add muscle dynamics to this 2-joint arm model to improve the simulation further, but the message will be the same: including or excluding Coriolis and centripetal torques does not change the theoretical predictions about mass underestimation. Second, as the reviewer correctly pointed out, mass (and its underestimation) also affects these two torque terms, so the predicted effect on kinematic measures changes little even with the full model.

      Author response image 4.

      Joint angles and joint torques of the shoulder and elbow for simulated trajectories in different directions. A. Shoulder (green) and elbow (blue) angles over time for the 45° movement direction. B. Components of joint interaction torques at the shoulder. Solid line: net torque at the shoulder; dotted line: shoulder inertial torque; dashed line: shoulder Coriolis and centripetal torque. C. Same as B for the elbow joint. D–F. Torque components across the full 360° range of movement directions, beyond the three tested directions (45°, 90°, and 135°). D. Net torque. E. Inertial torque. F. Combined Coriolis and centripetal torque. Note that the polar plot of Coriolis/centripetal torques (F) has a scale two orders of magnitude smaller than that of the inertial torque in our simulation. All torques were simulated with the optimal movement duration, squared, and integrated over each trajectory.

      Author response image 5.

      Comparison between simulation results from the full model including Coriolis/centripetal torques (left) and the simplified model (right). Position profiles (top) and the corresponding speed profiles (bottom) are shown. Solid lines are for normal mass estimation and dashed lines for mass underestimation in microgravity. The three colors represent the three movement directions (dark red: 45°, red: 90°, yellow: 135°). The full model uses a 2-link arm without realistic muscle dynamics (to be included in the formal revision); thus, its speed profile is not smooth. Importantly, the full model predicts the same effect of mass underestimation, i.e., reduced peak velocity/acceleration and an advance in their timing.
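
      To make the split between inertial and Coriolis/centripetal torques concrete, here is a minimal Python sketch of the standard planar two-link arm dynamics, tau = M(q) qdd + h(q, qd). The segment parameters, the joint state and the function names are hypothetical and are not those behind Author response images 4 and 5; the sketch only shows how the two components are computed. Because h is quadratic in joint velocity while the inertial term is linear in joint acceleration, halving the amplitude of a movement of fixed duration roughly quarters the velocity-dependent torques but only halves the inertial ones, which is one way to see why smaller movements reduce the relative contribution of Coriolis/centripetal terms.

```python
import numpy as np

# Hypothetical segment parameters (rough adult-arm magnitudes, not the study's values).
L1, L2 = 0.30, 0.33      # upper-arm and forearm(+hand) lengths (m)
M1, M2 = 1.9, 1.6        # segment masses (kg)
LC1, LC2 = 0.15, 0.16    # joint-to-centre-of-mass distances (m)
I1, I2 = 0.015, 0.02     # moments of inertia about each segment's centre of mass (kg m^2)

def inertia_matrix(q):
    """Joint-space inertia matrix M(q); q = (shoulder angle, relative elbow angle) in rad."""
    c2 = np.cos(q[1])
    m11 = I1 + I2 + M1 * LC1**2 + M2 * (L1**2 + LC2**2 + 2 * L1 * LC2 * c2)
    m12 = I2 + M2 * (LC2**2 + L1 * LC2 * c2)
    m22 = I2 + M2 * LC2**2
    return np.array([[m11, m12], [m12, m22]])

def coriolis_centripetal(q, qd):
    """Velocity-dependent joint torques h(q, qd) in N m."""
    h = M2 * L1 * LC2 * np.sin(q[1])
    return np.array([
        -h * (2 * qd[0] * qd[1] + qd[1] ** 2),  # shoulder
        h * qd[0] ** 2,                         # elbow
    ])

def joint_torques(q, qd, qdd):
    """Return the inertial part M(q) qdd and the velocity-dependent part h(q, qd)."""
    return inertia_matrix(q) @ qdd, coriolis_centripetal(q, qd)

# Illustrative state: elbow flexed 90 deg, moderate joint speeds and accelerations.
q = np.array([np.pi / 4, np.pi / 2])
qd = np.array([1.0, 1.5])       # rad/s
qdd = np.array([10.0, 15.0])    # rad/s^2
inertial, vel_dep = joint_torques(q, qd, qdd)
print("inertial:", inertial, "Coriolis/centripetal:", vel_dep)
```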

      b) Additionally, since the taikonauts were tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could neuromuscular deconditioning, combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques), provide an alternative explanation for the observed results (slowing down + more submovements)?

      Response: Neuromuscular deconditioning is indeed an effect of spaceflight and microgravity; thank you for raising it, as we omitted a discussion of its possible contribution in the initial submission. However, muscle weakness is smaller for upper-limb muscles than for postural and lower-limb muscles (Tesch et al., 2005). Handgrip strength decreases by 5% to 15% after several months (Moosavi et al., 2021), and atrophy of shoulder and elbow muscles, though not directly measured, was estimated to be minimal (Shen et al., 2017). Muscle weakness is unlikely to play a major role here, since our reaching task involves small movements (~12 cm) with joint torques on the order of ~2 N·m. Coriolis/centripetal torques do not affect the putative mass effect (as shown in the simulations above). The reviewer suggests that poor coordination in microgravity might contribute to the slowing down and the additional submovements. However, "poor coordination" is an umbrella term for any motor control problem and could be invoked to explain any microgravity effect; the feedforward control changes caused by mass underestimation can also be viewed as poor coordination. If we restrict the term to the coordination of the two joints or of Coriolis/centripetal torques, we should expect some changes in trajectory curvature in microgravity. However, further analysis of our reaching trajectories revealed no sign of increased curvature in our large collection of reaching movements. We probably have the largest dataset of reaching movements collected in microgravity thus far: 12 taikonauts each performed about 480 to 840 reaching trials during their spaceflight. We therefore believe the probability of a Type II error is low. We will include descriptive statistics of these new analyses in the revision.

      Citations:

      Tesch, P. A., Berg, H. E., Bring, D., Evans, H. J., & LeBlanc, A. D. (2005). Effects of 17-day spaceflight on knee extensor muscle function and size. European Journal of Applied Physiology, 93(4), 463–468.

      Moosavi, D., Wolovsky, D., Depompeis, A., Uher, D., Lennington, D., Bodden, R., & Garber, C. E. (2021). The effects of spaceflight microgravity on the musculoskeletal system of humans and animals, with an emphasis on exercise as a countermeasure: A systematic scoping review. Physiological Research, 70(2), 119.

      Shen, H., Lim, C., Schwartz, A. G., Andreev-Andrievskiy, A., Deymier, A. C., & Thomopoulos, S. (2017). Effects of spaceflight on the muscles of the murine shoulder. The FASEB Journal, 31(12), 5466.
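
      The response does not specify which curvature measure was used. One common choice, sketched here in Python as an assumption, is the maximum perpendicular deviation of the hand path from the straight start-to-end line, normalized by the length of that line; the function name and the example paths are hypothetical.

```python
import numpy as np

def max_perpendicular_deviation_ratio(xy):
    """Maximum perpendicular distance of a 2-D path from the straight line joining
    its first and last samples, divided by the length of that line (0 = straight)."""
    xy = np.asarray(xy, dtype=float)
    start, end = xy[0], xy[-1]
    chord = end - start
    chord_len = np.hypot(chord[0], chord[1])
    rel = xy - start
    # |cross(chord, rel)| / |chord| is each sample's distance from the chord line.
    dist = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / chord_len
    return float(dist.max() / chord_len)

# A 12 cm reach with a small sideways bulge (4 mm at mid-path).
x = np.linspace(0.0, 0.12, 101)
curved = np.column_stack([x, 0.004 * np.sin(np.pi * x / 0.12)])
straight = np.column_stack([x, np.zeros_like(x)])
print(max_perpendicular_deviation_ratio(curved), max_perpendicular_deviation_ratio(straight))
```

      Comparing the distribution of such a ratio across pre-, in- and post-flight sessions is one way to check for the curvature increase that impaired inter-joint coordination would be expected to produce.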

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?

      We appreciate these very helpful suggestions about our model presentation. Indeed, our initial submission did not give detailed model descriptions in the main text, due to text limits for early submissions. We actually used a finite-horizon framework throughout, with a pre-specified duration derived from the utility model. In the revision, we will make that point clear, and we will also revise the Methods section to explicitly distinguish feedforward vs. feedback components, clarify the use of mass underestimation in both utility and control models, and update the equations accordingly.
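
      Reviewer 2's question of computing the feedback gain with the underestimated mass while simulating the plant with the real mass can be made concrete with a small sketch. The following is a minimal, hypothetical Python illustration, not the authors' model: a 1-D point mass, a finite-horizon discrete-time LQR with only a terminal state cost and an effort cost, gains computed from an assumed internal mass, and the resulting feedback law applied to a plant with the true mass. The function names, the 10 ms step, the 35-step horizon (about 350 ms), the 1.5 kg effective mass, the 20% underestimation and the cost weights are all assumptions. Scaling the terminal weights relative to the effort weight is one way to mimic a change in cost ratio, but how that maps onto the authors' α is not specified here. The sketch has no sensorimotor noise and no utility-based choice of duration, so it should not be expected to reproduce the paper's timing predictions; the "feedforward" plan is implicit in the precomputed gain sequence.

```python
import numpy as np

def lqr_gains(mass_model, dt, n_steps, q_terminal, r_effort):
    """Finite-horizon LQR gains L_k (u_k = -L_k x_k) for a 1-D point mass whose
    mass is assumed to be `mass_model`; state x = [position error, velocity]."""
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.0], [dt / mass_model]])
    S = np.array(q_terminal, dtype=float)      # cost-to-go at the final step
    gains = [None] * n_steps
    for k in reversed(range(n_steps)):
        # Backward Riccati recursion with no running state cost, only effort.
        L = np.linalg.solve(r_effort + B.T @ S @ B, B.T @ S @ A)
        S = A.T @ S @ (A - B @ L)
        gains[k] = L
    return gains

def simulate(mass_true, gains, dt, distance):
    """Apply the precomputed feedback law to a plant with the *true* mass."""
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.0], [dt / mass_true]])
    x = np.array([[-distance], [0.0]])          # start short of the target, at rest
    states, forces = [x.ravel().copy()], []
    for L in gains:
        u = -(L @ x)
        x = A @ x + B @ u
        states.append(x.ravel().copy())
        forces.append(u.item())
    return np.array(states), np.array(forces)

DT, N_STEPS, DISTANCE = 0.01, 35, 0.12          # 10 ms steps, ~350 ms, 12 cm reach
M_TRUE = 1.5                                    # hypothetical effective mass (kg)
Q_T = np.diag([1e6, 1e4])                       # terminal weights: position error, velocity
R = 1e-3                                        # effort weight

gains_true = lqr_gains(M_TRUE, DT, N_STEPS, Q_T, R)
gains_under = lqr_gains(0.8 * M_TRUE, DT, N_STEPS, Q_T, R)   # assumed 20 % underestimation
states_true, forces_true = simulate(M_TRUE, gains_true, DT, DISTANCE)
states_under, forces_under = simulate(M_TRUE, gains_under, DT, DISTANCE)
print("first-force ratio:", forces_under[0] / forces_true[0])
```

      With the underestimated mass, the first force command is smaller than the one computed with the true mass, so the initial acceleration falls short of what the controller expects, and later feedback has to make up the difference.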

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instruction to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may, however, wonder whether the same results would hold for slower self-paced movements (is their speed also reduced compared with performance on Earth?). Moreover, a few other important questions might need to be addressed for completeness: how did the authors ensure that the astronauts remembered this instruction during the flight? (Could the control group have moved faster because they remembered the instruction better?) Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      Thank you for highlighting the brevity of the movements in our experiment. Our intention in emphasizing fast movements was to rigorously test whether movement is indeed slowed in microgravity. The observed prolonged movement duration clearly shows that microgravity affects movement duration even when people are pushed to move fast. The second reason for using fast movements is to highlight that feedforward control is affected in microgravity: mass underestimation affects feedforward control in the first place, whereas slow movements would inevitably involve online corrections that might obscure the effect of mass underestimation. Note that movement slowing is observed not only in our speed-emphasized reaching task but also in whole-arm pointing in other astronaut studies (Berger et al., 1997; Sangals et al., 1999), which are cited in our paper. We therefore believe these findings are generalizable.

      Regarding the consistency of instructions: all our experiments on the Tiangong space station were monitored in real time by experimenters in the Control Center in Beijing. The task instructions were presented on the initial display of the data acquisition application, and ample reading time was allowed. In fact, all pre-, in-, and post-flight test sessions were administered by the same group of experimenters with the same instructions. It is common for astronauts to serve as participants and experimenters at the same time, and they were well trained for this dual role on the ground. We also ran multiple pre-flight test sessions to familiarize them with the task. All these measures were in place to obtain high-quality data. We will include these experimental details and the rationale for emphasizing fast movements in the revision.

      Citations:

      Berger, M., Mescheriakov, S., Molokanova, E., Lechner-Steinleitner, S., Seguer, N., & Kozlovskaya, I. (1997). Pointing arm movements in short- and long-term spaceflights. Aviation, Space, and Environmental Medicine, 68(9), 781–787.

      Sangals, J., Heuer, H., Manzey, D., & Lorenz, B. (1999). Changed visuomotor transformations during and after prolonged microgravity. Experimental Brain Research, 129(3), 378–390.

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns within a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. In particular, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

      We believe the differences between our study and Gaveau et al.'s study cannot simply be attributed to single-joint versus multi-joint movements. One of the most salient differences is that their adaptation concerns incorporating microgravity into control to minimize effort, whereas ours concerns correctly perceiving body mass. We will elaborate on possible reasons for the lack of learning in light of this previous study.

      We can elaborate on "sensory bias" and "fundamental constraint of the sensorimotor system". If an inertial change is perceived (like an extra weight attached to the forearm, as in previous motor adaptation studies), people can adapt their reaching within tens of trials. In this case, sensory cues are veridical, as they correctly signal the inertial perturbation. In microgravity, however, the reduced gravitational pull and the altered proprioceptive inputs constantly inform the controller that the body mass is smaller than its actual magnitude. In other words, sensory cues in space are misleading for estimating body mass, and the resulting sensory bias prevents the sensorimotor system from adapting correctly. Our statement was too brief in the initial submission; we will expand it in the revision.
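
      The argument above can be illustrated with a deliberately simple, hypothetical error-driven learning rule (not a model from the paper): on each trial the internal mass estimate moves a fixed fraction of the way toward the mass indicated by the sensory cues. With veridical cues the estimate converges to the true mass; if the cues themselves are biased, as argued for microgravity, practice drives the estimate to the biased value instead, so the error relative to the true mass never disappears. The learning rate, the number of trials, the normalized masses and the 0.8 bias are illustrative assumptions.

```python
def adapt_mass_estimate(initial_estimate, sensed_mass, learning_rate=0.2, n_trials=100):
    """Hypothetical trial-by-trial update of an internal mass estimate toward the
    mass signalled by the senses; returns the estimate after every trial."""
    estimate = initial_estimate
    history = [estimate]
    for _ in range(n_trials):
        # Error-driven step toward the sensed (not necessarily true) mass.
        estimate += learning_rate * (sensed_mass - estimate)
        history.append(estimate)
    return history

# Earth, added forearm load: true mass rises from 1.0 to 1.2 and the senses report 1.2.
earth = adapt_mass_estimate(initial_estimate=1.0, sensed_mass=1.2)
# Microgravity (assumption): true mass stays 1.0, but the cues keep reporting 0.8.
space = adapt_mass_estimate(initial_estimate=1.0, sensed_mass=0.8)
print(round(earth[-1], 3), round(space[-1], 3))
```

      In this toy rule, more trials cannot remove the bias because the error signal is computed against the sensed mass rather than the true one.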

      Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90º midway between predictions for 45º and 135º. The effective mass at 90º appears to be much closer to that of 45º than to that of 135º (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90º and 135º are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45º.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90º than for 135º, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90º and 135º as between 90º and 45º? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not for acceleration? These inconsistencies weaken the authors' conclusions and should be addressed.

      Indeed, the model predicts an almost equal separation between 45° and 90° and between 90° and 135°, while the data indicate that the spacing between 45° and 90° is much smaller than between 90° and 135°. We do not regard this divergence as evidence undermining our main conclusion, because 1) the model is a simplification of the actual situation; for example, it simulates an ideal point mass (the effective mass) moving without friction and without Coriolis and centripetal torques; and 2) our study does not make quantitative predictions of all the key kinematic measures, which would require model fitting and parameter estimation. Instead, our study uses well-established (though simplified) models to qualitatively predict the overall behavioral pattern we would observe. For this purpose, our results are well in line with our expectations: although we did not find equal spacing between direction conditions, we do confirm that the key kinematic properties (Figures 2 and 3, as questioned by the reviewer) follow the same rank order of directions as predicted.

      We thank the reviewer for pointing out the apparent discrepancy between model simulation and observed data. We will elaborate on the reasons behind the discrepancy in the revision.
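
      For readers who want to see how direction-dependent effective mass arises, here is a minimal Python sketch assuming a planar two-link arm: the apparent mass at the hand along a unit direction u is 1/(u^T J M^-1 J^T u), where J is the hand Jacobian and M the joint-space inertia matrix (the standard operational-space result). The segment parameters, the posture, the direction convention and the function names are hypothetical; they are not the parameters behind Figure S1, so the printed values should not be compared with it quantitatively, and which direction is heaviest depends on the posture chosen.

```python
import numpy as np

# Hypothetical planar arm parameters (illustrative adult magnitudes, not the study's values).
L1, L2 = 0.30, 0.33      # upper-arm and forearm(+hand) lengths (m)
M1, M2 = 1.9, 1.6        # segment masses (kg)
LC1, LC2 = 0.15, 0.16    # joint-to-centre-of-mass distances (m)
I1, I2 = 0.015, 0.02     # moments of inertia about each segment's centre of mass (kg m^2)

def inertia_matrix(q):
    """Joint-space inertia matrix of the two-link arm; q = (shoulder, relative elbow) in rad."""
    c2 = np.cos(q[1])
    m11 = I1 + I2 + M1 * LC1**2 + M2 * (L1**2 + LC2**2 + 2 * L1 * LC2 * c2)
    m12 = I2 + M2 * (LC2**2 + L1 * LC2 * c2)
    m22 = I2 + M2 * LC2**2
    return np.array([[m11, m12], [m12, m22]])

def hand_jacobian(q):
    """Jacobian of the hand position (x, y) with respect to the joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [L1 * c1 + L2 * c12, L2 * c12]])

def effective_mass(q, direction_deg):
    """Apparent mass at the hand along a direction measured from the base x-axis."""
    J = hand_jacobian(q)
    mobility = J @ np.linalg.solve(inertia_matrix(q), J.T)   # inverse endpoint inertia
    theta = np.deg2rad(direction_deg)
    u = np.array([np.cos(theta), np.sin(theta)])
    return 1.0 / (u @ mobility @ u)

# Shoulder at 45 deg, elbow at 90 deg; the direction labels are in this hypothetical
# base frame and need not match the study's 45/90/135 deg workspace convention.
posture = np.array([np.deg2rad(45.0), np.deg2rad(90.0)])
for d in (45, 90, 135):
    print(d, round(effective_mass(posture, d), 3))
```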

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement in response to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the second-order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504–511.

      We agree that muscle properties, tonic excitation level, and proprioception-mediated reflexes all contribute to reaching control. Fisk et al. (1993) indeed showed that arm movement kinematics change, possibly owing to lower muscle tone and/or damping. However, reduced muscle damping and reduced spindle activity are more likely to affect feedback-based movements. In Fisk et al.'s study, participants performed continuous arm movements with their eyes closed, so their movements relied largely on proprioceptive control. Our major findings concern feedforward control, i.e., the reduced and "advanced" peak velocity/acceleration in discrete, ballistic reaching movements. Note that peak acceleration occurs as early as approximately 90-100 ms into the movement, clearly showing that feedforward control is affected, an effect different from Fisk et al.'s findings. It is unlikely that people "advanced" their peak velocity/acceleration because they anticipated needing more corrective movements later. Thus, underestimation of body mass remains the most plausible explanation.

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      We agree that friction might play a role here, but normal interaction with a touch screen typically involves friction of only 0.1 to 0.5 N (e.g., Ayyildiz et al., 2018), and we believe its directional variation is smaller still, below 0.1 N. Both are very small compared with the force used to accelerate the arm during the reaching movement (10 to 15 N). Thus, friction anisotropy is unlikely to explain our data.
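      The magnitude argument above can be checked with simple arithmetic. The sketch below uses only the force ranges quoted in this response; pairing the largest friction with the smallest driving force gives the worst case.

```python
# Bound the friction-to-driving-force ratio using the ranges quoted in the response.
FRICTION_N = (0.1, 0.5)         # typical finger-touchscreen friction (Ayyildiz et al., 2018)
DRIVING_FORCE_N = (10.0, 15.0)  # force used to accelerate the arm during reaching


def friction_ratio_bounds(friction, driving):
    """Return (best case, worst case) friction as a fraction of the driving force."""
    best = min(friction) / max(driving)   # least friction against the largest force
    worst = max(friction) / min(driving)  # most friction against the smallest force
    return best, worst


best, worst = friction_ratio_bounds(FRICTION_N, DRIVING_FORCE_N)
print(f"friction is {best:.1%} to {worst:.1%} of the driving force")
```

      Under these bounds, friction is at most about 5% of the driving force, and a direction-dependent component below 0.1 N would be at most about 1%.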

      Citation: Ayyildiz M, Scaraggi M, Sirin O, Basdogan C, Persson BNJ. Contact mechanics between the human finger and a touchscreen under electroadhesion. Proc Natl Acad Sci U S A. 2018 Dec 11;115(50):12668-12673.

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      Body stabilization is always a challenge for human movement studies in space. We minimized its potential confounding effects by using left-hand grasping and foot straps for postural support throughout the experiment. We would argue that shoulder instability is an unlikely explanation because unexpected shoulder motion should not affect the feedforward (early) part of the ballistic reaching movement: the reduced peak acceleration and its early peak were observed at about 90-100 ms after movement initiation. This effect occurs too early to be explained by shoulder instability.

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study from Zhu and colleagues, a clear role for MED26 in mouse and human erythropoiesis is demonstrated and mapped to amino acids 88-480 of the human protein. The authors also show the unique expression of MED26 in later-stage erythropoiesis and propose transcriptional pausing and condensate formation mechanisms for MED26's role in promoting erythropoiesis. Despite the authors' introductory claim that many questions regarding Pol II pausing in mammalian development remain unanswered, the importance of transcriptional pausing in erythropoiesis has already been demonstrated (Martell-Smart et al. 2023, PMID: 37586368, which the authors notably did not cite in this manuscript). Here, the novelty and strength of this study lie in MED26 and its unique expression kinetics during erythroid development.

      Strengths:

      The widespread characterization of kinetics of mediator complex component expression throughout the erythropoietic timeline is excellent and shows the interesting divergence of MED26 expression pattern from many other mediator complex components. The genetic evidence in conditional knockout mice for erythropoiesis requiring MED26 is outstanding. These are completely new models from the investigators and are an impressive amount of work to have both EpoR-driven deletion and inducible deletion. The effect on red cell number is strong in both. The genetic over-expression experiments are also quite impressive, especially the investigators' structure-function mapping in primary cells. Overall the data is quite convincing regarding the genetic requirement for MED26. The authors should be commended for demonstrating this in multiple rigorous ways.

      Thank you for your positive feedback.

      Weaknesses:

      (1) The authors state that MED26 was nominated for study based on RNA-seq analysis of a prior published dataset. They do not however display any of that RNA-seq analysis with regards to Mediator complex subunits. While they do a good job showing protein-level analysis during erythropoiesis for several subunits, the RNA-seq analysis would allow them to show the developmental expression dynamics of all subunit members.

      Thank you for this helpful suggestion. While we did not originally nominate MED26 based on RNA-seq analysis, we have analyzed the transcript levels of Mediator complex subunits in our RNA-seq data across different stages of erythroid differentiation (Author response image 1). The results indicate that most Mediator subunits, including MED26, display decreased RNA expression over the course of differentiation, with the exception of MED25, as reported previously (Pope et al., Mol Cell Biol 2013. PMID: 23459945).

      Notably, our study is based on initial observations at the protein level, where we found that, unlike most other Mediator subunits that are downregulated during erythropoiesis, MED26 remains relatively abundant. Protein expression levels more directly reflect the combined influences of transcription, translation and degradation processes within cells, and are likely more closely related to biological functions in this context. It is possible that post-transcriptional regulation (such as m6A-mediated improvement of translational efficiency) or post-translational modifications (like escape from ubiquitination) could contribute to the sustained levels of MED26 protein, and this will be an interesting direction for future investigation.

      Author response image 1.

      Relative RNA expression of Mediator complex subunits during erythropoiesis in human CD34+ erythroid cultures. Different differentiation stages from HSPCs to late erythroblasts were identified using CD71 and CD235a markers, progressing sequentially as CD71-CD235a-, CD71+CD235a-, CD71+CD235a+, and CD71-CD235a+. Expression levels were presented as TPM (transcripts per million).

      (2) The authors use an EpoR Cre for red cell-specific MED26 deletion. However, other studies have now shown that the EpoR Cre can also lead to recombination in the macrophage lineage, which clouds some of the in vivo conclusions for erythroid specificity. That being said, the in vitro erythropoiesis experiments here are convincing that there is a major erythroid-intrinsic effect.

      Thank you for this insightful comment. We recognize that EpoR-Cre can drive recombination in both erythroid and macrophage lineages (Zhang et al., Blood 2021, PMID: 34098576). However, EpoR-Cre remains the most widely used Cre for studying erythroid lineage effects in the hematopoietic community. Numerous studies have employed EpoR-Cre for erythroid-specific gene knockout models (Pang et al., Mol Cell Biol 2021, PMID: 22566683; Santana-Codina et al., Haematologica 2019, PMID: 30630985; Xu et al., Science 2013, PMID: 21998251).

      While a GYPA (CD235a)-Cre model with erythroid specificity has recently been developed (https://www.sciencedirect.com/science/article/pii/S0006497121029074), it has not yet been officially published. We look forward to utilizing the GYPA-Cre model for future studies. As you noted, our in vivo mouse model and primary human CD34+ erythroid differentiation system both demonstrate that MED26 is essential for erythropoiesis, suggesting that the regulatory effects of MED26 in our study are predominantly erythroid-intrinsic.

      (3) The donor chimerism assessment of mice transplanted with MED26 knockout cells is a bit troubling. First, there are no staining controls shown and the full gating strategy is not shown. Furthermore, the authors use the CD45.1/CD45.2 system to differentiate between donor and recipient cells in erythroblasts. However, CD45 is not expressed from the CD235a+ stage of erythropoiesis onwards, so it is unclear how the authors are detecting essentially zero CD45-negative cells in the erythroblast compartment. This is quite odd and raises questions about the results. That being said, the red cell indices in the mice are the much more convincing data.

      Thank you for your careful and thorough feedback. We have now included negative staining controls (Author response image 2A, top). We agree that CD45 is typically not expressed in erythroid precursors during normal development. Prior studies have characterized the BFU-E and CFU-E stages as c-Kit+CD45+Ter119−CD71low and c-Kit+CD45−Ter119−CD71high cells, respectively, in fetal liver (Katiyar et al., Cells 2023, PMID: 37174702).

      However, our observations indicate that erythroid surface markers differ during hematopoietic reconstitution following bone marrow transplantation. We found that nearly all nucleated erythroid progenitors/precursors (Ter119+Hoechst+) express CD45 after hematopoietic reconstitution (Author response image 2A, bottom).

      To validate our assay, we first mixed CD45.1 and CD45.2 mouse total bone marrow cells at a 2:1 ratio. We then isolated nucleated erythroid progenitors/precursors (Ter119+Hoechst+) by FACS and sequenced the CD45 gene locus by targeted next-generation sequencing. The resulting CD45 allele distribution matched our initial mixing ratio, confirming the accuracy of our approach (Author response image 2B).
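      The logic of this validation can be sketched as a comparison between the CD45.2 allele fraction expected from the pre-mix described in the figure legend (1/3 CD45.2, 2/3 CD45.1) and the fraction observed in targeted reads. This is a minimal sketch under stated assumptions: donors are homozygous, each cell contributes reads in proportion to its share of the mix, and the normal-approximation z-score is our illustrative choice of test. The read counts are hypothetical placeholders, not data from the study.

```python
import math


def allele_fraction_z(cd45_2_reads, cd45_1_reads, expected_cd45_2=1 / 3):
    """Compare the observed CD45.2 read fraction with the fraction expected from mixing.

    Assumes homozygous donors and reads proportional to each population's share
    of the mix. Returns (observed fraction, z-score under a normal approximation
    to the binomial).
    """
    n = cd45_2_reads + cd45_1_reads
    observed = cd45_2_reads / n
    se = math.sqrt(expected_cd45_2 * (1 - expected_cd45_2) / n)
    return observed, (observed - expected_cd45_2) / se


# Hypothetical read counts, for illustration only.
obs, z = allele_fraction_z(cd45_2_reads=3300, cd45_1_reads=6700)
print(f"observed CD45.2 fraction = {obs:.3f}, z = {z:.2f}")
```

      A |z| well below 2 would indicate agreement with the mixing ratio at roughly the 5% level.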

      Moreover, a recent study supports that reconstituted erythroid progenitors can indeed be distinguished by CD45 expression following bone marrow transplantation (He et al., Nature Aging 2024, PMID: 38632351. Extended Data Fig. 8). 

      In conclusion, our data indicate that newly formed erythroid progenitors/precursors post-transplant express CD45, enabling us to identify nucleated erythroid progenitors/precursors by Ter119+Hoechst+ and determine their origin using CD45.1 and CD45.2 markers.

      Author response image 2.

      Representative flow cytometry gating strategy of erythroid chimerism following mouse bone marrow transplantation. A. Gating strategy used in the erythroid chimerism assay. B. Targeted sequencing result of Ter119+Hoechst+ cells isolated by FACS. The cell sample was pre-mixed with 1/3 CD45.2 and 2/3 CD45.1 bone marrow cells. Ptprc is the gene locus for CD45.

      (4) The authors make heavy use of defining "erythroid gene" sets and "non-erythroid gene" sets, but it is unclear what those lists of genes actually are. This makes it hard to assess any claims made about erythroid and non-erythroid genes.

      Thank you for this helpful suggestion. We defined "erythroid genes" and "non-erythroid genes" based on RNA-seq data from Ludwig et al. (Cell Reports 2019. PMID: 31189107. Figure 2 and Table S1). Genes downregulated from stages k1 to k5 are classified as “non-erythroid genes,” while genes upregulated from stages k6 to k7 are classified as “erythroid genes.” We will add this description in the revised manuscript.
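      A minimal sketch of this gene-set definition, assuming each gene in Ludwig et al.'s Table S1 carries a label from k1 to k7; the input layout and the gene names below are hypothetical and only illustrate the rule.

```python
NON_ERYTHROID_LABELS = {"k1", "k2", "k3", "k4", "k5"}  # downregulated during differentiation
ERYTHROID_LABELS = {"k6", "k7"}                        # upregulated during differentiation


def split_gene_sets(gene_to_label):
    """Partition genes into erythroid and non-erythroid sets by their k-label."""
    erythroid, non_erythroid = set(), set()
    for gene, label in gene_to_label.items():
        if label in ERYTHROID_LABELS:
            erythroid.add(gene)
        elif label in NON_ERYTHROID_LABELS:
            non_erythroid.add(gene)
        # genes with any other label are left out of both sets
    return erythroid, non_erythroid


# Hypothetical gene labels, for illustration only.
ery, non_ery = split_gene_sets({"GENE_A": "k7", "GENE_B": "k2", "GENE_C": "k6"})
print(sorted(ery), sorted(non_ery))
```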

      (5) Overall the data regarding condensate formation is difficult to interpret and is the weakest part of this paper. It is also unclear how studies of in vitro condensate formation or studies in 293T or K562 cells can truly relate to highly specialized erythroid biology. This does not detract from the major findings regarding genetic requirements of MED26 in erythropoiesis.

      Thank you for the rigorous feedback. Assessing the condensate properties of MED26 protein in primary CD34+ erythroid cells or mouse models is indeed challenging. As is common in many condensate studies, we used in vitro assays and cellular assays in HEK293T and K562 cells to examine the biophysical properties (Figure S7), condensate formation capacity (Figure 5C and Figure S7C), key phase-separation regions of MED26 protein (Figure S6), and recruitment of pausing factors (Figure 6A-B) in live cells. We then conducted functional assays to demonstrate that the phase-separation region of MED26 can promote erythroid differentiation similarly to the full-length protein in the CD34+ system and K562 cells (Figure 5A). Specifically, overexpressing the MED26 phase-separation domain accelerates erythropoiesis in primary human erythroid culture, while deleting the Intrinsically Disordered Region (IDR) impairs MED26’s ability to form condensates and recruit PAF1 in K562 cells.

      In summary, we used HEK293T cells to study the biochemical and biophysical properties of MED26, and the primary CD34+ differentiation system to examine its developmental roles. Our findings support the conclusion that MED26-associated condensate formation promotes erythropoiesis.

      (6) For many figures, there are some panels where conclusions are drawn, but no statistical quantification of whether a difference is significant or not.

      Thank you for your thorough feedback. We have checked all figures for statistical quantification and added the relevant statistical analysis methods to the corresponding figure legends (Figure 2L and Figure S4C) to clarify the significance of the observed differences. The updated information will be incorporated into the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhu et al describes a novel role for MED26, a subunit of the Mediator complex, in erythroid development. The authors have discovered that MED26 promotes transcriptional pausing of RNA Pol II, by recruiting pausing-related factors.

      Strengths:

      This is a well-executed study. The authors have employed a range of cutting-edge and appropriate techniques to generate their data, including: CUT&Tag to profile chromatin changes and mediator complex distribution; nuclear run-on sequencing (PRO-seq) to study Pol II dynamics; knockout mice to determine the phenotype of MED26 perturbation in vivo; an ex vivo erythroid differentiation system to perform additional, important, biochemical and perturbation experiments; immunoprecipitation mass spectrometry (IP-MS); and the "optoDroplet" assay to study phase-separation and molecular condensates.

      This is a real highlight of the study. The authors have managed to generate a comprehensive picture by employing these multiple techniques. In doing so, they have also managed to provide greater molecular insight into the workings of the Mediator complex, a multi-protein complex that plays an important role in a range of biological contexts. The insights the authors have uncovered for different subunits in erythropoiesis will very likely have ramifications in many other settings, in both healthy biology and disease contexts.

      Thank you for your thoughtful summary and encouraging feedback.

      Weaknesses:

      There are almost no discernible weaknesses in the techniques used, nor the interpretation of the data. The IP-MS data was generated in HEK293 cells when it could have been performed in the human CD34+ HSPC system that they employed to generate a number of the other data. This would have been a more natural setting and would have enabled a more like-for-like comparison with the other data.

      Thank you for your positive feedback and insightful suggestions. We will perform validation of the immunoprecipitation results in CD34+ derived erythroid cells to further confirm our findings.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to explore whether other subunits besides MED1 exert specific functions during the process of terminal erythropoiesis with global gene repression, and finally they demonstrated that MED26-enriched condensates drive erythropoiesis through modulating transcription pausing.

      Strengths:

      Through both in vitro and in vivo models, the authors showed that while MED1 and MED26 co-occupy a plethora of genes important for cell survival and proliferation at the HSPC stage, MED26 preferentially marks erythroid genes and recruits pausing-related factors for cell fate specification. Gradually, MED26 becomes the dominant factor in shaping the composition of transcription condensates and transforms the chromatin towards a repressive yet permissive state, achieving global transcription repression in erythropoiesis.

      Thank you for your positive summary and feedback.

      Weaknesses:

      In the in vitro model, the author only used CD34+ cell-derived erythropoiesis as the validation, which is relatively simple, and more in vitro erythropoiesis models need to be used to strengthen the conclusion.

      Thank you for your thoughtful suggestions. We have shown that MED26 promotes erythropoiesis using the primary human CD34+ differentiation system (Figure 2 K-M and Figure S4) and have demonstrated its essential role in erythropoiesis through multiple mouse models (Figure 2A-G and Figure S1-3). Together, these in vitro and in vivo results support our conclusion that MED26 regulates erythropoiesis. However, we are open to further validating our findings with additional in vitro erythropoiesis models, such as iPSC or HUDEP erythroid differentiation systems.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] Genes expressed in the same direction in lowland individuals facing hypoxia (the plastic state) as in the colonised state are defined as adaptive, while genes with the opposite expression pattern were labelled as maladaptive, using the assumption that the colonised state must represent the result of natural selection. Furthermore, genes could be classified as representing reversion plasticity when the expression pattern differed between the plastic and colonised states and as reinforcement when they were in the same direction (for example, more highly expressed in both the plastic state and the colonised state than in the ancestral state). They found that more genes had a plastic expression pattern that was labelled as maladaptive than adaptive. Therefore, some of the genes have an expression pattern in accordance with what would be predicted based on the plasticity-first hypothesis, while others do not.

      Thank you for a precise summary of our work. We appreciate the very encouraging comments recognizing the value of our work. We have addressed concerns from the reviewer in greater detail below.

      Q1. As pointed out by the authors themselves, the fact that temperature was not included as a variable, which would make the experimental design much more complex, misses the opportunity to more accurately reflect the environmental conditions that the colonizer individuals face at high altitude. Also pointed out by the authors, the acclimation experiment in hypoxia lasted 4 weeks. It is possible that longer-term effects on gene expression would be identifiable in lowland individuals exposed to hypoxia over a longer time scale. Furthermore, a sample size of 3 or 4 individuals per group depending on the tissue for wild individuals may miss some of the natural variation present in these populations. The stated n = 7 for the plastic stage and n = 14 for the ancestral and colonized stages refers to the total number of tissue samples and not the number of individuals, according to Supplementary Table 1.

      We share the reviewer's concerns. These limitations arise partly because it is quite challenging to bring wild birds into captivity for hypoxia acclimation experiments; it took considerable effort to keep lowland sparrows under hypoxic conditions for a month. We have recognized the same limitations that the reviewer pointed out and discuss them in the study, i.e., considering the hypoxic condition alone, the short acclimation period, etc. Regarding sample sizes, we collected cardiac muscle from nine individuals (three individuals per stage) and flight muscle from 12 individuals (four individuals per stage). We have clarified this in Supplementary Table 1.

      Q2. Finally, I could not find a statement indicating that the lowland individuals placed in hypoxia (plastic stage) were from the same population as the lowland individuals for which transcriptomic data was already available, used as the "ancestral state" group (which themselves seem to come from 3 populations, Qinhuangdao, Beijing, and Tianjin, according to Supplementary Table 2), nor whether they were sampled at the same time of year (pre-reproduction, during breeding, after, or if they were juveniles, proportion of males or females, etc.). These two aspects could affect gene expression (through neutral or adaptive genetic variation among lowland populations that can affect gene expression, through environmental effects other than hypoxia that differ among these populations' environments, or through sex or age). This could potentially also affect the FST analysis done by the authors, which they use to claim that strong selective pressure acted on the expression level of some of the genes in the colonised group.

      The reviewer asked how the individual tree sparrows used in the transcriptomic analyses were collected. The individuals used for the hypoxia acclimation experiment and those representing the ancestral lowland population were collected from the same locality (Beijing) in the same season (i.e., pre-breeding). All were adults weighing approximately 18 g. We have clarified this in Supplementary Table S1 and the Methods. We did not distinguish males from females (both sexes look similar), assuming that both sexes respond similarly to hypoxia acclimation in their cardiac and flight muscle gene expression.

      Supplementary Table 2 lists the individuals used for sequence analyses. These individuals were used only for sequence comparisons, not for the transcriptomic analyses. The population genetic structure analyzed in a previously published study showed no clear genetic divergence within the lowland population (i.e., individuals collected from Beijing, Tianjin and Qinhuangdao) or the highland population (i.e., Gangcha and Qinghai Lake). In addition, there was no clear genetic divergence between the highland and lowland populations (Qu et al. 2020).

      Author response image 1.

      Population genetic structure of the Eurasian Tree Sparrow (Passer montanus), generated using FRAPPE. The colors in each column represent the contribution from each subcluster (Qu et al. 2020). Yellow, highland population; blue, lowland population.

      Q4. Impact of the work: There has been work showing that populations adapted to high altitude environments show changes in their hypoxia response that differ from the short-term acclimation response of lowland populations of the same species. For example, in humans, see Erzurum et al. 2007 and Peng et al. 2017, where they show that the hypoxia response cascade, which starts with the gene HIF (Hypoxia-Inducible Factor) and includes the EPO gene, which codes for erythropoietin, which in turn activates the production of red blood cells, is LESS activated in high altitude individuals compared to the activation level in lowland individuals (which gives it its name). The present work adds to this body of knowledge, showing that the short-term response to hypoxia and the long-term one can affect different pathways and that acclimation/plasticity does not always predict what physiological traits will evolve in populations that colonize these environments over many generations and additional selection pressures (UV exposure, temperature, nutrient availability). Altogether, this work provides new information on the evolution of reaction norms of genes associated with the physiological response to one of the main environmental variables that affects almost all animals, oxygen availability. It also provides an interesting model system to study this type of question further in a natural population of homeotherms.

      Erzurum, S. C., S. Ghosh, A. J. Janocha, W. Xu, S. Bauer, N. S. Bryan, J. Tejero, et al. 2007. Higher blood flow and circulating NO products offset high-altitude hypoxia among Tibetans. Proceedings of the National Academy of Sciences 104(45): 17593-17598.

      Peng, Y., C. Cui, Y. He, Ouzhuluobu, H. Zhang, D. Yang, Q. Zhang, Bianbazhuoma, L. Yang, Y. He, et al. 2017. Down-regulation of EPAS1 transcription and genetic adaptation of Tibetans to high-altitude hypoxia. Molecular Biology and Evolution 34: 818-830.

      Thank you for highlighting the potential novelty of our work in the context of the broader field. We found it very interesting to discuss our results (from a bird species) together with similar findings from humans. In the revised manuscript, we discuss the short-term acclimation response and long-term adaptive evolution to high-elevation environments, as well as how our work informs the relative roles of short-term plasticity and long-term adaptation. We appreciate the two important studies pointed out by the reviewer and now cite them in the revised manuscript.

      Reviewer #2 (Public Review):

      This is a well-written paper using gene expression in tree sparrow as model traits to distinguish between genetic effects that either reinforce or reverse initial plastic response to environmental changes. Tree sparrow tissues (cardiac and flight muscle) collected in lowland populations subject to hypoxia treatment were profiled for gene expression and compared with previously collected data in 1) highland birds; 2) lowland birds under normal condition to test for differences in directions of changes between initial plastic response and subsequent colonized response. The question is an important and interesting one but I have several major concerns on experimental design and interpretations.

      Thank you for a precise summary of our work and constructive comments to improve this study. We have addressed your concerns in greater detail below.

      Q1. The datasets consist of two sources of data: the hypoxia-treated birds collected in the current study, and highland and lowland birds in their respective native environments from a previous study. This creates complete confounding between the hypoxia treatment and experimental batches, such that it is impossible to draw any conclusions. The sample size is also relatively small: correlations among tens of thousands of genes were computed from merely 12 or 9 samples.

      We appreciate the critical comments from the reviewer. The reviewer raised the concerns about the batch effect from birds collected from the previous study and this study. There is an important detail we didn’t describe in the previous version. All tissues from hypoxia acclimated birds and highland and lowland birds have been collected at the same time (i.e., Qu et al. 2020). RNA library construction and sequencing of these samples were also conducted at the same time, although only the transcriptomic data of lowland and highland tree sparrows were included in Qu et al. (2020). The data from acclimated birds have not been published before.

      In the revised manuscript, we also compared log-transformed transcripts per million (TPM) across all genes and identified the most conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) for the flight and cardiac muscles, respectively (Hao et al. 2023). We compared the median expression levels of these conserved genes and found no difference among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P > 0.05). As these results suggest little batch effect in the transcriptomic data, we used TPM values to calculate gene expression level and intensity. This methodological detail has been clarified in the Methods, and we provide a new supplementary figure (Figure S5) showing the comparative results.

      Author response image 2.

      The median expression levels of the conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) did not differ among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P > 0.05).
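      The conserved-gene filter described above can be sketched as follows. This is a minimal illustration under assumptions: the input is a hypothetical gene-by-sample TPM table, the coefficient of variation is computed on untransformed TPM (the response mentions log-transformed TPM, so the exact scale is an assumption), "average TPM ≥ 1" is read as the mean across samples, and the group comparison only computes medians, leaving the statistical test out.

```python
import statistics


def conserved_genes(tpm, max_cv=0.3, min_mean=1.0):
    """Return genes whose TPM is stable across samples.

    tpm maps gene -> list of TPM values (one per sample). Keeps genes with
    coefficient of variation (SD / mean) <= max_cv and mean TPM >= min_mean.
    """
    keep = []
    for gene, values in tpm.items():
        mean = statistics.fmean(values)
        if mean < min_mean:
            continue  # too lowly expressed
        if statistics.stdev(values) / mean <= max_cv:
            keep.append(gene)
    return keep


def group_medians(tpm, genes, groups):
    """For each group, the median over genes of each gene's mean TPM in that group."""
    return {
        name: statistics.median(statistics.fmean(tpm[g][i] for i in idx) for g in genes)
        for name, idx in groups.items()
    }


# Hypothetical TPM values for six samples, two per group, for illustration only.
tpm = {
    "g1": [10, 11, 9, 10, 10, 11],            # stable, kept
    "g2": [5, 50, 1, 20, 3, 40],              # highly variable, dropped
    "g3": [0.2, 0.3, 0.2, 0.3, 0.2, 0.3],     # lowly expressed, dropped
}
groups = {"lowland": [0, 1], "hypoxia": [2, 3], "highland": [4, 5]}
kept = conserved_genes(tpm)
print(kept, group_medians(tpm, kept, groups))
```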

      The reviewer also raised the issue of sample size. We certainly would have liked to include more individuals, but this was not possible due to the logistical difficulty of keeping wild birds in a common-garden experiment for a long time. We have acknowledged this in the manuscript. To mitigate this, we tested the hypothesis of plasticity followed by genetic change using two different tissues (cardiac and flight muscles) and two different gene sets (a co-expressed gene set and a muscle-associated gene set). As all these analyses show similar results, they indicate that the main conclusion drawn from this study is robust.

      Q2. Genes are classified into two classes (reversion and reinforcement) based on arbitrarily chosen thresholds. More "reversion" genes are found and this was taken as evidence reversal is more prominent. However, a trivial explanation is that genes must be expressed within a certain range and those plastic changes simply have more space to reverse direction rather than having any biological reason to do so.

      Thank you for the critical comments. The reviewer raises two points, which we address separately. The first concerns the arbitrarily chosen thresholds. In our manuscript, we used a range of thresholds, i.e., 50%, 100%, 150% and 200% change relative to the gene expression levels of the ancestral lowland tree sparrow, to detect genes with reinforcement and reversion plasticity. With this design we wanted to explore the magnitude of gene expression plasticity (i.e., Ho & Zhang 2018), and whether the strength of selection (i.e., genetic variation) changes with the magnitude of gene expression plasticity (i.e., Campbell-Staton et al. 2021).
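      The threshold-based classification can be sketched in the spirit of Ho & Zhang (2018): the plastic change is PC = L_p - L_o and the genetic change is GC = L_c - L_p, where L_o, L_p and L_c are expression levels in the ancestral, plastic (hypoxia-acclimated) and colonised states. Applying the threshold to |PC| relative to L_o is our assumption about how the 50% to 200% cut-offs are used; the function is illustrative, not the authors' code.

```python
def classify_plasticity(l_o, l_p, l_c, threshold=0.5):
    """Label a gene as 'reinforcement', 'reversion', or None.

    l_o, l_p, l_c: expression in the ancestral, plastic and colonised states.
    threshold: minimum plastic change as a fraction of l_o (0.5 = 50%).
    """
    pc = l_p - l_o  # plastic change
    gc = l_c - l_p  # genetic (evolved) change
    if l_o <= 0 or abs(pc) / l_o < threshold or gc == 0:
        return None  # plastic change too small to classify, or no genetic change
    # Same sign: evolution pushes further in the plastic direction.
    return "reinforcement" if pc * gc > 0 else "reversion"


for t in (0.5, 1.0, 1.5, 2.0):
    print(t, classify_plasticity(l_o=10.0, l_p=22.0, l_c=14.0, threshold=t))
```

      Raising the threshold removes genes with smaller plastic changes from the comparison rather than relabelling them, which is why the counts shrink as the cut-off increases.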

      As the reviewer pointed out, this threshold selection is arbitrary. We have therefore implemented two other categorization schemes to test the robustness of the observed unequal proportions of genes with reinforcement and reversion plasticity. Specifically, we used a parametric bootstrap procedure as described in Ho & Zhang (2019), which aims to identify genes whose classification reflects genuine differences rather than random sampling error. The bootstrap results show that genes exhibiting reversion plasticity significantly outnumber those exhibiting reinforcement plasticity, suggesting that our inference of an excess of genes with reversion plasticity is robust to random sampling error. We have added these analyses to the revised manuscript and provide the results in Figures 2d and 3d.

      Author response image 3.

      Figure 2a (left) and Figure 2b (right). Frequencies of genes with reinforcement and reversion plasticity (>50%) and their subsets that acquire strong support in the parametric bootstrap analyses (≥ 950/1000).
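      A parametric bootstrap of this kind can be sketched as below: for each gene, group means are redrawn from normal distributions parameterised by the observed mean, standard deviation and sample size, the gene is reclassified, and it counts as well supported if the original label recurs in at least 950 of 1,000 replicates. The normal sampling model of the means and all input values are assumptions for illustration; Ho & Zhang (2019) should be consulted for the exact procedure.

```python
import random


def classify(l_o, l_p, l_c):
    """'reinforcement' if plastic and genetic changes share a sign, else 'reversion'."""
    pc, gc = l_p - l_o, l_c - l_p
    return "reinforcement" if pc * gc > 0 else "reversion"


def bootstrap_support(stats, n_boot=1000, seed=0):
    """Count bootstrap replicates that reproduce the observed label.

    stats: dict state -> (mean, sd, n) for states 'o', 'p', 'c'
    (ancestral, plastic, colonised). Each replicate redraws every
    group mean from Normal(mean, sd / sqrt(n)).
    """
    rng = random.Random(seed)
    observed = classify(*(stats[s][0] for s in "opc"))
    hits = 0
    for _ in range(n_boot):
        draws = [rng.gauss(m, sd / n ** 0.5) for m, sd, n in (stats[s] for s in "opc")]
        hits += classify(*draws) == observed
    return observed, hits


# Hypothetical summary statistics for one gene, for illustration only.
label, hits = bootstrap_support({"o": (10.0, 1.0, 4), "p": (20.0, 1.0, 4), "c": (12.0, 1.0, 4)})
print(label, hits, hits >= 950)
```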

      In addition, we adopted a binning scheme (i.e., 20%, 40% and 60% bins along the spectrum of reinforcement/reversion plasticity). Analyses based on these different categorization schemes revealed similar results, supporting the robustness of our inference of an excess of genes with reversion plasticity. These results are provided in Supplementary Figures S2 and S4.

      Author response image 4.

      Figure S2 (A) and Figure S4 (B). Frequencies of genes with reinforcement and reversion plasticity in the flight and cardiac muscles. (A) For genes identified by WGCNA, all comparisons show more genes with reversion plasticity than with reinforcement plasticity in both the flight and cardiac muscles. (B) For genes associated with muscle phenotypes, all comparisons show more genes with reversion plasticity than with reinforcement plasticity in the flight muscle, while more than 50% of comparisons support an excess of genes with reversion plasticity in the cardiac muscle. Two-tailed binomial test; NS, non-significant; *, P < 0.05; **, P < 0.01; ***, P < 0.001.
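      The two-tailed binomial test named in this legend compares the number of reversion genes with the number of reinforcement genes against a null of equal proportions (p = 0.5). A minimal exact implementation in pure Python is sketched here; the function names and the symbol mapping are illustrative, and the counts the authors actually tested are not reproduced.

```python
from math import comb


def binom_two_sided_half(k, n):
    """Exact two-tailed binomial test of k 'reversion' genes out of n classified genes, p = 0.5.

    Because the null distribution is symmetric when p = 0.5, the two-sided
    P value is twice the upper tail beyond the more extreme count, capped at 1.
    """
    tail_start = max(k, n - k)
    upper_tail = sum(comb(n, i) for i in range(tail_start, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


def significance_label(p):
    """Map a P value to the symbols used in the legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


# Hypothetical counts: 15 reversion vs 5 reinforcement genes
p = binom_two_sided_half(15, 20)
print(round(p, 4), significance_label(p))  # 0.0414 *
```

      For production analyses, a library routine such as SciPy's binomial test would normally be preferred; the hand-written version only makes the logic of the test explicit.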

      The second issue raised by the reviewer is that plastic changes may simply have more room to reverse direction, without any biological reason for doing so. While the cause of the excess of genes whose expression levels are reversed, rather than reinforced, at later stages remains contentious, a growing number of studies show that early-stage gene expression plasticity may be maladaptive in novel environments that species have recently colonized (e.g., lizards, Campbell-Staton et al. 2021; Escherichia coli, yeast, guppies, chickens and babblers, Ho and Zhang 2018; Ho et al. 2020; Kuo et al. 2023). Our comparisons based on the two gene sets associated with muscle phenotypes corroborate these previous studies and show that initial gene expression plasticity may be nonadaptive in the novel environment (e.g., Ghalambor et al. 2015; Ho & Zhang 2018; Ho et al. 2020; Kuo et al. 2023; Campbell-Staton et al. 2021).

      Q3. The correlation between plastic change and evolved divergence is an artifact due to the definitions of adaptive versus maladaptive changes. For example, the definition of adaptive changes requires that plastic change and evolved divergence are in the same direction (Figure 3a), so the positive correlation was a result of this selection (Figure 3d).

      The reviewer notes that the correlation between plastic change and evolved divergence (e.g., Figure 3d) is an artifact of how adaptive and maladaptive changes are defined. We agree that the correlation analysis is circular, because whether plasticity is classified as adaptive or maladaptive depends on whether the direction of the plastic change matches or opposes that of the colonizing tree sparrows. We have therefore removed the previous Figure 3d-e and the related text from the revised manuscript. We have also revised Figure 3a to clarify the schematic framework.
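      A minimal simulation makes the circularity concrete. Assuming, purely for illustration, that plastic and evolved changes are independent standard-normal draws with no true relationship, selecting only the genes whose two changes share a sign (the "adaptive" definition) yields a strong positive correlation, and the opposite-sign ("maladaptive") subset yields a negative one. The simulated data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes = 10_000

# Hypothetical null: plastic and evolved changes are independent,
# so there is no real relationship between them.
plastic = rng.normal(size=n_genes)
evolved = rng.normal(size=n_genes)

r_all = np.corrcoef(plastic, evolved)[0, 1]

# Select genes by the sign agreement used to define "adaptive" plasticity.
same_sign = plastic * evolved > 0
r_selected = np.corrcoef(plastic[same_sign], evolved[same_sign])[0, 1]
r_opposite = np.corrcoef(plastic[~same_sign], evolved[~same_sign])[0, 1]

print(f"all genes: r = {r_all:.2f}")
print(f"same-sign ('adaptive') subset: r = {r_selected:.2f}")
print(f"opposite-sign ('maladaptive') subset: r = {r_opposite:.2f}")
```

      Under this null, the same-sign subset has an expected correlation of 2/π (about 0.64), even though the two variables are unrelated, so a positive correlation within that subset carries no biological information.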

    1. eLife Assessment

      This study presents a fundamental discovery of how cerebellar climbing fibers modulate plastic changes in the somatosensory cortex by identifying both the responsible cortical circuit and the anatomical pathways. The evidence supporting the conclusions is convincing and well supported by modern neuroscience methodologies. Overall, this work represents a significant contribution that will be of broad interest to neuroscientists, especially those studying the long-distance cerebellar influence on non-motor brain functions.

    2. Reviewer #1 (Public review):

      Summary:

      Silbaugh, Koster, and Hansel investigated how the cerebellar climbing fiber (CF) signals influence neuronal activity and plasticity in mouse primary somatosensory (S1) cortex. They found that optogenetic activation of CFs in the cerebellum modulates responses of cortical neurons to whisker stimulation in a cell-type-specific manner and suppresses potentiation of layer 2/3 pyramidal neurons induced by repeated whisker stimulation. This suppression of plasticity by CF activation is mediated through modulation of VIP- and SST-positive interneurons. Using transsynaptic tracing and chemogenetic approaches, the authors identified a pathway from the cerebellum through the zona incerta and the thalamic posterior medial (POm) nucleus to the S1 cortex, which underlies this functional modulation.

      Strengths:

      This study employed a combination of modern neuroscientific techniques, including two-photon imaging, opto- and chemo-genetic approaches, and transsynaptic tracing. The experiments were thoroughly conducted, and the results were clearly and systematically described. The interplay between the cerebellum and other brain regions - and its functional implications - is one of the major topics in this field. This study provides solid evidence for an instructive role of the cerebellum in experience-dependent plasticity in the S1 cortex.

      Weaknesses:

      There may be some methodological limitations, and the physiological relevance of the CF-induced plasticity modulation in the S1 cortex remains unclear. In particular, it has not been elucidated how CF activity influences the firing patterns of downstream neurons along the pathway to the S1 cortex during stimulation.

      (1) Optogenetic stimulation may have activated a large population of CFs synchronously, potentially leading to strong suppression followed by massive activation in numerous cerebellar nuclear (CN) neurons. Given that there is no quantitative estimation of the stimulated area or number of activated CFs, observed effects are difficult to interpret directly. The authors should at least provide the basic stimulation parameters (coordinates of stim location, power density, spot size, estimated number of Purkinje cells included, etc.).

      (2) There are CF collaterals directly innervating CN (PMID: 10982464). Therefore, antidromic spikes induced by optogenetic stimulation may directly activate CN neurons. On the other hand, a previous study reported that CN neurons exhibit only weak responses to CF collateral inputs (PMID: 27047344). The authors should discuss these possibilities and the potential influence of CF collaterals on the interpretation of the results.

      (3) The rationale behind the plasticity induction protocol for RWS+CF (50 ms light pulses at 1 Hz during 5 min of RWS, with a 45 ms delay relative to the onset of whisker stimulation) is unclear.

      a) The authors state that 1 Hz was chosen to match the spontaneous CF firing rate (line 107); however, they also introduced a delay to mimic the CF response to whisker stimulation (line 108). This is confusing and requires clarification: was the protocol designed to reproduce spontaneous or sensory-evoked CF activity?

      b) Was the timing of the light pulses constant or random? Given the stochastic nature of CF firing, randomly timed light pulses with an average rate of 1 Hz would be more physiologically relevant. At the very least, the authors should clearly explain how the stimulation timing was implemented.
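      To illustrate the distinction drawn in (b), the sketch below contrasts a fixed-interval pulse schedule with randomly timed pulses of the same mean rate. The 5-min duration, 1 Hz rate, 50 ms pulse width and 45 ms delay come from the text; the assumption that whisker-stimulus onsets occur at t = k/rate (so that the delay applies to every pulse), the dead-time construction that keeps random 50 ms pulses from overlapping, and the function names are hypothetical, not the authors' protocol. Note that randomizing pulse times would decouple the light from whisker onsets unless the whisker stimuli were randomized as well.

```python
import numpy as np


def periodic_onsets(duration_s=300.0, rate_hz=1.0, delay_s=0.045):
    """Fixed-interval light-pulse onsets (seconds).

    Assumes, hypothetically, whisker-stimulus onsets at t = k / rate_hz, so
    each light pulse starts `delay_s` after a whisker stimulus.
    """
    return np.arange(delay_s, duration_s, 1.0 / rate_hz)


def random_onsets_with_dead_time(duration_s=300.0, rate_hz=1.0, pulse_s=0.050, seed=0):
    """Randomly timed onsets with the same mean rate.

    Each inter-onset interval is one pulse width (a dead time, so pulses never
    overlap) plus an exponential wait; the mean interval is therefore 1 / rate_hz.
    """
    rng = np.random.default_rng(seed)
    mean_wait = 1.0 / rate_hz - pulse_s
    onsets = []
    t = rng.exponential(mean_wait)
    while t < duration_s:
        onsets.append(t)
        t += pulse_s + rng.exponential(mean_wait)
    return np.array(onsets)


fixed = periodic_onsets()
random_times = random_onsets_with_dead_time()
print(len(fixed), len(random_times))  # 300 fixed pulses; roughly 300 random ones
```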

      (4) CF activation modulates inhibitory interneurons in the S1 cortex (Figure 2): responses of interneurons in S1 to whisker stimulation were enhanced upon CF coactivation (Figure 2C), and these neurons were predominantly SST- and PV-positive interneurons (Figure 2H, I). In contrast, VIP-positive neurons were suppressed only in the late time window of 650-850 ms (Figure 2G). If the authors' hypothesis (that the activity of VIP neurons regulates SST- and PV-neuron activity during RWS+CF) is correct, then the activity of SST- and PV-neurons should also be increased during this late time window. The authors should clarify whether such temporal dynamics were observed or could be inferred from their data.

      (5) Transsynaptic tracing from CN nicely identified zona incerta (ZI) neurons and their axon terminals in both POm and S1 (Figure 6 and Figure S7).

      a) Which part of the CN (medial, interposed, or lateral) is involved in this pathway is unclear.

      b) Were the electrophysiological properties of these ZI neurons consistent with those of PV neurons?

      c) There appears to be a considerable number of axons of these ZI neurons projecting to the S1 cortex (Figure S7C). Would it be possible to estimate the relative density of axons projecting to the POm versus those projecting to S1? In addition, the authors should discuss the potential functional role of this direct pathway from the ZI to the S1 cortex.

    3. Reviewer #2 (Public review):

      Summary:

      The authors examined the long-distance influence of climbing fiber (CF) signaling on the somatosensory cortex using whisker stimulation. They also examined CF signaling using two-photon imaging and mapped projections from the cerebellum to the somatosensory cortex using transsynaptic tracing. As a final manipulation, they used chemogenetics to perturb parvalbumin-positive neurons in the zona incerta and recorded from climbing fibers.

      Strengths:

      There are several strengths to this paper. The recordings were carefully performed, and AAVs used were selective and specific for the cell types and pathways being analyzed. In addition, the authors used multiple approaches that support climbing fiber pathways to distal regions of the brain. This work will impact the field and describes nice methods to target difficult-to-reach brain regions, such as the inferior olive.

      Weaknesses:

      There are some details in the methods that could be explained further. The discussion was very short and could connect the findings in a broader way.

    4. Reviewer #3 (Public review):

      Summary:

      The authors developed an interesting novel paradigm to probe the effects of cerebellar climbing fiber activation on short-term adaptation of somatosensory neocortical activity during repetitive whisker stimulation (RWS). Normally, RWS potentiated whisker responses in pyramidal cells and weakly suppressed them in interneurons, with effects lasting for at least 1 h. Optogenetic climbing fiber activation in Crus II during RWS reduced or inverted these adaptive changes. This effect was generally mimicked or blocked by chemogenetic SST or VIP activation/suppression, as predicted from their "sign" in the circuit.

      Strengths:

      The central finding about CF modulation of S1 response adaptation is interesting, important, and convincing, and provides a jumping-off point for the field to start to think carefully about cerebellar modulation of neocortical plasticity.

      Weaknesses:

      The SST and VIP results appeared slightly weaker statistically, but I do not personally think this detracts from the importance of the initial finding (if there are multiple underlying mechanisms, modulating one may reproduce only a fraction of the effect size). I found the suggestion that zona incerta may be responsible for the cerebellar effects on S1 to be a more speculative result (it is not so easy with existing technology to effectively modulate this type of polysynaptic pathway), but this may be an interesting topic for the authors to follow up on in more detail in the future.

    1. eLife Assessment

      This valuable manuscript presents solid data identifying a surprising glia-exclusive function for betapix in vascular integrity and angiogenesis. The manuscript also describes the optimisation of a modified CRISPR-based Zwitch approach to generate conditional knockouts in zebrafish.

    2. Reviewer #1 (Public review):

      The manuscript by Chiu et al describes the modification of the Zwitch strategy to efficiently generate conditional knockouts of zebrafish betapix. They leverage this system to identify a surprising glia-exclusive function of betapix in mediating vascular integrity and angiogenesis. Betapix has been previously associated with vascular integrity and angiogenesis in zebrafish, and betapix function in glia has also been proposed. However, this study is the first to identify a role for glial betapix in vascular stability and angiogenesis.

      The study derives its strength from the modified CRISPR-based Zwitch approach to identify the specific role of glial betapix (and not neuronal, mural or endothelial). Using RNA in situ hybridisation and analysis of scRNA-Seq data, they also identify delayed maturation of neurons and glia and implicate a reduction in stathmin levels in the glial knockouts in mediating vascular homeostasis and angiogenesis. The study also implicates a betapix-zfhx3/4-vegfa axis in mediating cerebral angiogenesis.

      There is both technical (the generation of conditional KOs) and knowledge-related (the exclusive role of glial betapix in vascular stability/angiogenesis) novelty in this work that is going to benefit the community significantly.

      However, the study has the following major weaknesses:

      (1) The lack of glia-specific rescue of betapix in the global KOs/mutants prevents the study from making a compelling case for the unexpected glial-specific function in vascular development and stability.

      (2) Given the known splice-isoform specific function of betapix in haemorrhaging (Liu et al, 2007), at least an expression profile of the isoforms in glia at the relevant timepoints would have further underscored betapix function.

      (3) Direct evidence of the status of endothelial cell proliferation/survival deficits, if any, in the glial betapix KOs would have provided a key mechanistic handle. It becomes all the more relevant as Liu et al, 2012 have demonstrated reduced proliferation of endothelial cells in bbh fish and linked it to deficits in angiogenesis.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Chiu et al describes the modification of the Zwitch strategy to efficiently generate conditional knockouts of zebrafish betapix. They leverage this system to identify a surprising glia-exclusive function of betapix in mediating vascular integrity and angiogenesis. Betapix has been previously associated with vascular integrity and angiogenesis in zebrafish, and betapix function in glia has also been proposed. However, this study is the first to identify a role for glial betapix in vascular stability and angiogenesis.

      The study derives its strength from the modified CRISPR-based Zwitch approach to identify the specific role of glial betapix (and not neuronal, mural, or endothelial). Using RNA in situ hybridization and analysis of scRNA-Seq data, they also identify delayed maturation of neurons and glia and implicate a reduction in stathmin levels in the glial knockouts in mediating vascular homeostasis and angiogenesis. The study also implicates a betapix-zfhx3/4-vegfa axis in mediating cerebral angiogenesis.

      There is both technical (the generation of conditional KOs) and knowledge-related (the exclusive role of glial betapix in vascular stability/angiogenesis) novelty in this work that is going to benefit the community significantly.

      While the text is well written, it often elides details of experiments and relies on implicit understanding on the part of the reader. Similarly, the figure legends are laconic and often fail to provide all the relevant details.

      We thank the reviewer for their overall support of our manuscript. We have revised the manuscript text and figure legends to include all relevant details wherever possible.

      Specific comments:

      (1) While the evidence from cKO's implicating glial betapix in vascular stability/angiogenesis is exciting, glia-specific rescue of betapix in the global KOs/mutants (like those performed for stathmin) would be necessary to make a water-tight case for glial betapix.

      We fully agree with the reviewer that it would be ideal to examine glia-specific rescue of betaPix in its global KOs. However, optimal transient expression of betaPix is difficult to achieve by injecting a gfap:betaPix plasmid, and establishing a stable gfap:betaPix transgenic line to rescue the mutant phenotypes takes considerable time. We would like to pursue this line of research in the future.

      (2) Splice variants of betapix have been shown to have differential roles in haemorrhaging (Liu, 2007). What are the major glial isoforms, and are there specific splice variants in the glial that contribute to the phenotypes described?

      We agree that it would be important to determine whether specific splice variants in glia contribute to the betaPix mutant phenotypes. Previous studies have shown that betaPix isoform a is ubiquitously expressed across various tissues, whereas isoforms b, c and d are predominantly expressed in the nervous system. In mice, expression of the betaPix-d isoform is essential for neurite outgrowth and migration. We have not directly assessed glia-specific betaPix isoforms in the nervous system, and our current data cannot rule out the involvement of a specific isoform in glial responses. The Zwitch cassette of betaPix resides in intron 5, so Cre activation disrupts all transcripts. Nevertheless, we recognize the value of identifying glial betaPix isoforms and their direct downstream targets. Further studies to dissect their roles in cerebral vascular development and disease are part of our future plans.

      (3) Liu et al, 2012 demonstrated reduced proliferation of endothelial cells in bbh fish and linked it to deficits in angiogenesis. Are there proliferation/survival defects in endothelial cells in the glial KOs?

      We thank the reviewer for highlighting endothelial cell phenotypes in betaPix mutants. We recognize that endothelial cells might be directly linked to the angiogenesis defects in the mutants. We assessed and quantified endothelial migration by measuring the length of developing central arteries, but we did not examine endothelial cell proliferation/survival in glial KOs. In our scRNA-seq analysis, the proportion of endothelial cells was reduced upon betaPix deficiency, suggesting that endothelial cell proliferation/survival might be decreased in mutants. In this endothelial cell cluster, we found a disrupted transcriptional landscape in a set of angiogenesis-associated genes (Figure 6M). While these analyses highlight an altered angiogenic transcriptome in endothelial cells of betaPix knockouts, we acknowledge that our study does not directly address proliferation/survival phenotypes in endothelial cells, which warrants future investigation of the role of betaPix in regulating glia-endothelial cell interactions.

      Reviewer #2 (Public review):

      Summary:

      Using a genetic model of beta-pix conditional trap, the authors are able to regulate the spatio-temporal depletion of beta-pix, a gene with an established role in maintaining vascular integrity (shown elsewhere). This study provides strong in vivo evidence that glial beta-pix is essential to the development of the blood-brain barrier and maintaining vascular integrity. Using genetic and biochemical approaches, the authors show that PAK1 and Stathmins are in the same signaling axis as beta-pix, and act downstream to it, potentially regulating cytoskeletal remodeling and controlling glial migration. How exactly the glial-specific (beta-pix driven-) signaling influences angiogenesis or vascular integrity is not clear.

      Strengths:

      (1) Developing a conditional gene-trap genetic model that allows tracking of a knock-in reporter driven by the endogenous promoter and also allows genes to be knocked down. This genetic model enabled the authors to address the relevant scientific questions they were interested in, i.e., a) tracking expression of the beta-pix gene and b) deleting the beta-pix gene in a cell-specific manner.

      (2) The study reveals the glial-specific role of beta-pix, which was unknown earlier. This opens up avenues for further research. (For instance, how do such (multiple) cell-specific signaling converge onto endothelial cells which build the central artery and maintain the blood-brain barriers?)

      We thank the reviewer for their overall support of our work.

      Weaknesses:

      Major:

      (1) The study clearly establishes a role of beta-pix in glial cells, which regulates the length of the central artery and keeps the hemorrhages under control. Nevertheless, it is not clear how this is accomplished.

      (a) Is this phenotype (hemorrhage) a result of the direct interaction of glial cells and the adjacent endothelial cells? If direct, is the communication established through junctions or through secreted molecules?

      We thank the reviewer for this critical question. We attempted to address this issue by live imaging with light-sheet confocal microscopy but could not achieve subcellular resolution. We therefore do not have data to resolve this critical issue, which warrants future investigation.

      (b) The authors do not exclude the possibility that the effects observed on endothelial cells (quantified as length of central artery) could be secondary to the phenotype observed with deletion of glial beta-pix. For instance, can glial beta-pix regulate angiogenic factors secreted by peri-vascular cells, which consequently regulate the length of the central artery or vascular integrity?

      We thank the reviewer for this critical point. Although we found major defects in endothelial cell migration, quantified by central artery length, we cannot rule out the participation of signals from other perivascular cells. We fully agree that it will be important to address cell-type-specific relationships mediated by angiogenic factors. Of note, degradation of the extracellular matrix and focal adhesions is critical for the hemorrhagic phenotypes of bbh mutants. In a previously published study from our group, we found that suppressing the globally induced MEK/ERK/MMP9 signaling in bbh mutants significantly decreases hemorrhages. Accordingly, we have edited a paragraph in the Discussion section on pages 24-25. We plan to continue investigating whether complex interactions in the perivascular space contribute to the disruption of vascular integrity, as well as the crosstalk among different cell types during vascular development in these mutants. We believe that our model of glia-specific betaPix function will guide follow-up studies of these cellular interactions.

      (c) The pictorial summary of the findings (Figure 7) does not include Zfhx or Vegfa. The data do not provide clarity on how these molecules contribute (directly or indirectly) to endothelial cell integrity. Vegfaa is expressed in the central artery, but the expression of the receptor in these endothelial cells is not shown. Similarly, all other experimental analyses for Zfhx and Vegfa expression were performed in glial cells. More experimental evidence is necessary to show the regulation of angiogenesis (of endothelial cells) by glial beta-pix. Is the Vegfaa receptor present on central arteries, and how does glial depletion of beta-pix affect its expression or response of central artery endothelial cells (both pertaining to angiogenesis and vascular integrity).

      We thank the reviewer for pointing out this critical issue. We have revised the pictorial summary in Figure 7 to include Zfhx and Vegfa. The key receptors of the VEGF-A ligand are VEGFR-1 and VEGFR-2. In zebrafish, expression of Vegfr-2, also known as kdrl, is well documented in endothelial cells, including those of the hindbrain central arteries. We fully agree that it would be of great value to assess changes in the kdrl expression pattern after betaPix deficiency in vivo. Future investigations are warranted to address how VEGFA-VEGFR2 signaling in endothelial cells is altered in betaPix mutants.

      (2) Microtubule stabilization via glial beta-pix, claimed in Figure 5M, is unclear. Magnified images for h-betapix OE and h-stmn-1 glial cells are absent. Is this migration regulated by beta-pix through its GEF activity for Cdc42/Rac?

      We have revised Figure 5M to include magnified images for the h-betaPIX and h-STMN1 overexpression groups. It has been shown that a positive feedback loop of microtubule regulation, consisting of Rac1-Pak1-Stathmin, operates at the cell edge (Zeitz and Kierfeld, 2014 Biophys J.). Previous studies have shown that betaPix activates Rac1 through its GEF activity and also regulates Pak1 activity via direct binding. As reported by Kwon et al., the betaPix-d isoform promotes neurite outgrowth via PAK-dependent inactivation of Stathmin1. In this work, we did not assess the binding of betaPix to Rac1 or Pak1. Nevertheless, our rescue experiments with IPA-3 suggest that betaPix deficiency impairs migration through Pak1 signaling.

      (3) Hemorrhages are caused by compromised vascular integrity, which was not measured (either qualitatively or quantitatively) throughout the manuscript. The authors do measure the length of the central artery in several gene deletion models (2I, 3C, 5F/J, 6G/K), which is indicative of artery growth/angiogenesis. How (if at all) defects in angiogenesis are an indication of hemorrhage should be explained or established. Do these angiogenic growth defects translate into junctional defects at later developmental time points? Formation and maintenance of endothelial cell junctions within the hemorrhaging arteries should be assessed in fish in which beta-pix is deleted from astrocytes.

      We appreciate the reviewer’s point and agree that this is a key aspect we need to clarify. To address junctional defects in our model, we re-examined the scRNA-seq data and found mild downregulation of the junction protein claudin-5a (cldn5a) in the transcriptome of the endothelial cluster (Author response image 1). We agree in principle that single-cell RNA sequencing findings should be validated by immunostaining. While we did not measure junctional defects directly in this work, we have previously reported comparable expression of the tight junction protein zonula occludens-1 (ZO1) between siblings and bbh mutants (Yang et al., 2017 Dis Model Mech). In zebrafish, a functional blood-brain barrier (BBB) is identified only after 3 dpf, and the absence of a mature BBB before then may reflect the immature barrier signature at that developmental stage. The hemorrhage phenotype occurs around 40 hpf, and hematomas are almost completely resorbed at later stages, as most mutants recover and survive to adulthood. Thus, future studies are needed to characterize junctions at the cellular and molecular levels in betaPix mutants at later developmental stages.

      Author response image 1.

      Violin plots showing cdh5, cldn5a, cldn5b and oclna expression levels in endothelial sub-cluster. ctrl, control siblings; ko, betaPix knockouts (CRISPR mutants); 1d or 2d, 1 or 2 days post fertilization.

      (4) More information is required about the quality control steps for 10X sequencing (Figure 4, number of cells, reads, etc.). What steps were taken to validate the data quality? The EC groups, 1 and 2-days post-KO are not visible in 4C. One appreciates that the progenitor group is affected the most 2 days post-KO. But since the effects are expected to be on the endothelial cell group as well (which is shown in in vivo data), an extensive analysis should be done on the EC group (like markers for junctional integrity, angiogenesis, mesenchymal interaction, etc.). Are Stathmins limited to glial cells? Are there indicators for angiogenic responses in endothelial cells?

      We thank the reviewer for these critical suggestions. Detailed descriptions of the quality control steps for 10X sequencing are now provided in the Materials and Methods section. We validated data quality through multiple steps, including verifying the number of viable cells used in the experiment, assessing the peak shapes and fragment sizes of the scRNA-seq libraries, confirming sufficient cell counts and sequencing reads for the analyses, and applying stringent filters to exclude low-quality cells. Stathmin expression is shown in violin plots in Figure 4E, and stmn1a, stmn1b and stmn4l expression in UMAP plots in Figure S6C. These genes are not restricted to glial cells but are expressed more widely across zebrafish tissues. We note that, despite their small number, endothelial cells are shown in Figure 4C as a brown cluster. The proportions of endothelial cells in each of the four samples are shown in Figure S6B; they are significantly reduced in betaPix knockouts at 2 dpf, following a trend similar to that of glial progenitors. In addition, gene ontology analysis identified a set of downregulated angiogenic genes in the endothelial cluster (Figure 6M). We realize that our interpretation of endothelial cell phenotypes was not sufficiently clear and have added sentences to the manuscript text on pages 16-17. As noted above, future studies are needed to address how glial betaPix regulates endothelial cell and BBB function.

      Reviewing Editor Comments:

      comments on your manuscript. Addressing comments 1-3 from Reviewer 1 and comment 1 and its subparts from Reviewer 2 (major weaknesses) will significantly improve the manuscript by reinforcing the cell autonomous requirement of betaPix and also gain mechanistic insights. In addition, extensive proofreading and editing of the text, as well as changes to the figure, figure legends, and the discussion as indicated by both reviewers, will improve the readability and clarity of this manuscript.

      We thank the Reviewing Editor for their support of this manuscript. As noted above, we have sought to address the reviewers’ comments using the data obtained in this work and have outlined our plans for future investigations. We have also extensively proofread and edited the manuscript text and figure legends to improve readability and clarity.

      Reviewer #1 (Recommendations for the authors):

      (1) The Discussion is written like an introduction with very little engagement with the data generated in the manuscript. The role of betapix-Pak-stathmin and betapix-zfhx3/4-vegfaa is barely discussed and contextualised vis-à-vis the current knowledge in the field.

      We appreciate the reviewer’s critical comments regarding the Discussion section. We have revised the manuscript text on pages 20-23 to discuss the roles of the betapix-Pak-stathmin and betapix-zfhx3/4-vegfaa axes in the context of the findings of this work.

      (2) Line 145: "light sheet microscopy" - explain that this was only for experiments involving fluorescence. Currently, it reads as if the data presented in Figures 1D and E are also obtained via light sheet microscopy. E.g., the paragraph starting on line 139 does not say what line was imaged (and what it labels) to reach the conclusions reached. This detail is not there even in the associated figure legend. Similarly, line 153 discusses radial glia, but there is no indication that these were labelled using Tg (GFAP:GFP) except in the figure annotation. There are various instances of such omissions throughout the text, and they should be remedied to indicate what each line is and what it labels, at least in the first instance.

      We thank the reviewer for these thoughtful points. In the revised version, we have added statements of the objectives and methodologies to the text on pages 8-9. We hope that the revised manuscript presents the data more clearly by specifying the methods and materials used in this work.

      (3) Figure 1E legend: What is the haemorrhage percentage? Is it the number of embryos per experiment showing hemorrhage? Indicate in the text. In the right panel, what is the number of embryos used? Please ensure all numbers (number of embryos, experiments, etc) used to plot any data in the set of figures in the entire manuscript are clearly indicated.

      We thank the reviewer for the suggestion. In this revised version, we have added more detail to the figures and figure legends throughout the manuscript, including the numbers of embryos used.

      (4) The Discussion section suddenly introduces the blood-brain barrier and extensively discusses it. However, while cerebral haemorrhage can disrupt the BBB and exacerbate the effects of the haemorrhage, this manuscript does not suggest that a weakened BBB is the cause of haemorrhages in betapix mutants. More likely, betapix stabilises and maintains vascular integrity, and loss of this function causes haemorrhaging and subsequent disruption of the BBB. The glial function noted in this study is likely to be distinct from the glial function in BBB development and maintenance. The authors do not show any direct evidence for the latter. These should be shortened, and only relevant aspects facilitating contextualisation of data generated in this manuscript should be retained.

      We have now revised the Discussion section to shorten the material on the blood-brain barrier and have added statements following the suggestions of both reviewers. We hope that the revisions provide a more relevant and balanced discussion.

      (5) Is the scratch assay in Figure 5 controlled for differences in cell proliferation among the different manipulations?

      We plated the same number of cells and cultured them under the same conditions. Before conducting the scratch assay, we replaced the medium with serum-free culture medium to reduce the effect of differences in cell proliferation among the manipulation groups.

      (6) In the glioblastoma experiments involving betapix KD, does stathmin RNA/protein decrease? What about Ser 16 phosphorylation (as shown for neurons in Kwon et al, 2020)?

      STMN1 RNA was down-regulated by betaPIX deficiency, and this was rescued by betaPIX overexpression in glial cells (Author response image 2). These results are similar to those from the in vivo analysis (Figure 5A, 5B and S7A). We agree with the reviewer that it would have been ideal to examine Ser 16 phosphorylation of Stathmin in our models. However, we believe that our data establish that Stathmin functions downstream of betaPix.

      Author response image 2.

      qRT-PCR analysis showing that betaPIX over-expression (betaPix OE) rescued STMN1 expression in U251 cells with siRNA knockdown of betaPIX (betaPix KD). Data are presented as mean ± SEM; one-way ANOVA with Dunnett's test; individual P values are shown in the figure.

      (7) How was the rescue of betapix in glioblastoma cells with siRNA-mediated betapix knockdown performed? Is this by betapix-resistant cDNA? Further, no information about isoforms of betapix (both for siRNA-mediated KD and rescue) or stathmin is provided.

      Similar to our Zwitch method, which disrupts all betaPix transcripts in vivo, the siRNAs against human betaPIX were designed to target a region conserved across all transcripts in the glioblastoma cell lines. The human betaPIX used for rescue was obtained from the U251 cDNA library, so ideally all isoforms enriched in this glioblastoma cell line would be isolated. These missing details are now provided in the Materials and Methods section, page 26.

      (8) It is unclear what the authors' thoughts are on the decrease in stathmin observed and the functional outcome of this decrease. The Discussion could benefit from this.

      We thank the reviewer. We have now added a new paragraph to the Discussion section (pages 21-22) addressing how the down-regulated expression of Stathmins may be associated with the functional outcomes of this decrease.

      (9) Zfhx4 mRNA injection is performed on bbh and betapixKO (is this a global or glial KO?) and found to rescue haemorrhaging. While vegfaa mRNA increases, it is formally possible that the rescue is not due to the increase in vegfaa (or that vegfaa is sufficient). Injection of vegfaa mRNA could address this issue.

      Zfhx4 mRNA injection was performed on bbh mutants and global betapix knockouts (CRISPR mutants). To avoid confusion, we have now included a sentence stating that global knockout mutants were used for this rescue experiment. Regarding the second point, we acknowledge that this study cannot definitively prove the necessity of increased vegfaa levels in the rescue experiment. However, our data establish Zfhx3/4 as novel downstream effectors of betaPix in cerebral vessel development, and these effects might be partly linked to angiogenic responses regulated by Zfhx3/4. In this revised version, we propose more cautiously that Vegfaa signals act downstream of the betaPix-Zfhx3/4 axis, and we highlight as a limitation of our manuscript that the sufficiency of Vegfaa was not fully investigated (Discussion section, page 24). We intend to pursue more extensive analysis in our follow-up studies.

      (10) A significant part of the manuscript looks at angiogenesis/vascularisation, however, the title of the paper only reflects vessel integrity (which can be distinct from angiogenesis).

      Thanks. We have now changed the title to: Glial betaPix is essential for blood vessel development in the zebrafish brain

      (11) Line 366: The BBB abbreviation is used without indicating the full form. Perhaps this can be introduced in the preceding sentence.

      We have now edited the following sentence: “The maturation hallmark of central nervous system (CNS) vasculature is acquisition of blood brain barrier (BBB) properties, establishing a stable environment ...” in lines 386-387, Discussion section.

      (12) Line 371: "rupture" and not "rapture".

      We thank the reviewer for pointing out the spelling error, and have now made this correction. 

      (13) Line 416: "is enriched" instead of "enriches"?

      We have now edited as: “...end feet that is enriched with aquaporin-4 ...” in line 411, page 19. 

      (14) The sentence in lines 121-123 should be simplified.

      We have now revised this sentence as the following: “A previous work has shown that bubblehead (bbh<sup>fn40a</sup>) mutant has a global reduction in betaPix transcripts, and bbh<sup>m292</sup> mutant has a hypomorphic mutation in betaPix, thus establishing that betaPix is responsible for bubblehead mutant phenotypes [10]”. 

      (15) No mention in the text of what o-dianisidine labels.

      We have now edited the following sentence: “By using o-dianisidine staining to label hemoglobins, we found severe brain hemorrhages ...” in lines 131-133.

      (16) Line 165: Sentence requires improvement. Perhaps "Vascularisation of the central arteries in the zebrafish hindbrain ...".

      We have now edited this sentence as: “Vascularisation of the central arteries in the zebrafish hindbrain starts at 29 hpf.” in this revised version (line 176). 

      (17) Line 184: Why is "hematopoiesis" mentioned? The genesis of blood cells is not tested anywhere in the manuscript.

      Thanks. We have now edited this statement as: “IPA-3 treatment had no effect on hemorrhage induction in betaPix<sup>ct/ct</sup> control siblings.”

      (18) Line 222-223: Improve "increasing trends". Perhaps "increased relative proportions". Clarify "progenitors" means neuronal and glial progenitors.

      We have now edited this statement: “we found that most neuronal clusters increased relative proportions ...” in this revised version.

      (19) Line 232-233: "arrow indicates" - perhaps "indicated by the arrow"? Also, the arrow indicating gfap needs to be mentioned in the Figure S6A legend. Cannot understand what is meant by "as of its enriched gfap".

      We have now edited in the text as: “Figure S6A, indicated by the arrow”, and added “Box area and arrow highlighting gfap expressions.” in Figure S6 legend. To avoid confusion, we have revised "as of its enriched gfap" sentence as the following: “We next focused on the progenitor cluster owing to the enriched gfap expression and the significantly reduced numbers of cells in this cluster by betaPix deficiency.”

      (20) Line 239 - 240: While the sentence says "... revealed three major categories:", well, more than 3 are mentioned subsequently.

      To avoid possible confusion in the text, we have now removed the sub-category examples and presented the data as: “three major categories: epigenetic remodeling, microtubule organizations and neurotransmitter secretion/transportation (Figure 4D).” 

      (21) Line 252: Stathmins negatively regulate microtubule stability. Why are they referred to as "microtubule polymerization genes stathmins"?

      We are thankful to the reviewer for pointing out this error, and we have now made correction in the text as “microtubule-destabilizing protein Stathmins”.

      (22) Line 262-265: The citation used to indicate concurrence with mouse data is disingenuous. That study did not show a reduction in stathmin levels upon betapix loss. Rather, it showed an increase in Ser16 phosphorylation on stathmin, which reduces stathmin's microtubule destabilising function. Please elaborate on the difference between the two studies.

      We completely agree with the reviewer that, in the cited article, increased Ser16 phosphorylation of stathmin reduces its microtubule-destabilising function. While that study did not show a reduction in Stathmin levels, other studies have shown that transcriptional downregulation of Stathmins is associated with impaired neuronal and glial development. We have now revised the Discussion section by adding a new paragraph that addresses the disrupted homeostasis of Stathmins in these previous studies and its possible association with our data. We hope that these changes clarify this issue.

      (23) Line 310: While ZFHX3 levels are reduced in betapix mutants and KD in glioblastomas, were ZFHX3 and 4 up- or downregulated in the scRNA-Seq data?

      Thank you for this critical point. Indeed, our results showed that ZFHX3 and ZFHX4 were down-regulated in the glial progenitor cluster of betaPix knockouts in the scRNA-Seq data (Figure S8A), as well as in FACS-sorted glial cells (Figure S8B).

      (24) Line 317: "... betaPix acts upstream to Zfhx3/4-VEGFA signaling in regulating angiogenesis ...". While this is established later, the data at the time of this sentence does not warrant this claim.

      We agree with the reviewer’s statement and restated this sentence in the following way: “Zfhx3/4 might act as downstream effector of betaPix.”

      Reviewer #2 (Recommendations for the authors):

      (1) The images shown in 2E/H, 3B, 6F/J can use a schematic that helps readers to understand what to expect or look for. Splitting up the channels may also help in visualizing the vasculature clearly.

      We thank the reviewer for these suggestions. In this revised version, we have included schematic diagrams in the figures and added more detailed statements to the legends.

      (2) Many times, arrows are pointing to structures (2E/H, 3B), but are not explained clearly (neither in the text nor in the legends). In 3B, the arrow is pointing to a negative space.

      (3) Legends are minimalistic and do not provide much information. The reader is left to interpret the data on their own.

      We apologize for not explaining the figures in sufficient detail. In this revised version, we have added more detailed statements to the figure legends and adjusted the arrows in all figures.

      (4) The text needs heavy proofreading. For example:

      (a) Line 208- the title does not seem appropriate since the following text does not discuss Stathmins at all, which comes later.

      We agree with the reviewer’s statement and restated the title in the following way: “Single-cell transcriptome profiling reveals that gfap-positive progenitors were affected in betaPix knockouts.”

      (b) There is no mention of Figure 7 throughout the text.

      (c) Figure 7 does not include Zfhx or Vegfaa.

      We thank the reviewer for pointing out these errors. We have now revised Figure 7 and refer to it in the corresponding paragraphs of the Discussion section.

      (5) The discussion seems incoherent in its current state.

      We have now revised the Discussion section according to the suggestions from both reviewers. We hope these revisions adequately address your concerns.

      (6) Please include some of the following points, if possible, in the discussion.

      (a) How is GEF activity of Rac/Cdc42 expected to be affected in beta-pix KO fishes?

      (b) What are the possible different ways the angiogenic pathways merge onto endothelial cells? Or do the authors imagine this process to be entirely driven by glial cells (directly)?

      We thank the reviewer for these invaluable suggestions. We have now revised the Discussion section and hope that these changes provide a better and more balanced discussion. Because we have no data directly addressing how the GEF activity of Rac/Cdc42 might be affected in betaPix mutants, and only very limited data showing how glial betaPix regulates cerebral endothelial cells and BBB function, we have focused the Discussion on the CRISPR-induced KI and cKO technologies, glial betaPix function and brain hemorrhage, and the putative role of betaPix-Zfhx3/4-VEGF function in central artery development.

      References:

      Daub, H., Gevaert, K., Vandekerckhove, J., Sobel, A., and Hall, A. (2001). Rac/Cdc42 and p65PAK regulate the microtubule-destabilizing protein stathmin through phosphorylation at serine 16. J Biol Chem 276, 1677-1680. doi:10.1074/jbc.C000635200.

      Kim, S., Park, H., Kang, J., Choi, S., Sadra, A., and Huh, S.O. (2024). β-PIX-d, a Member of the ARHGEF7 Guanine Nucleotide Exchange Factor Family, Activates Rac1 and Induces Neuritogenesis in Primary Cortical Neurons. Exp Neurobiol 33(5), 215-224. doi:10.5607/en24026.

      Kwon, Y., Jeon, Y.W., Kwon, M., Cho, Y., Park, D., and Shin, J.E. (2020). βPix-d promotes tubulin acetylation and neurite outgrowth through a PAK/Stathmin1 signaling pathway. PLoS One 15(4), e0230814. doi:10.1371/journal.pone.0230814. Published correction: PLoS One 15(5), e0233327. doi:10.1371/journal.pone.0233327.

      Kwon, Y., Lee, S.J., Shin, Y.K., Choi, J.S., Park, D., and Shin, J.E. (2025). Loss of neuronal βPix isoforms impairs neuronal morphology in the hippocampus and causes behavioral defects. Anim Cells Syst (Seoul) 29(1), 57-71. doi:10.1080/19768354.2024.2448999.

      Wittmann, T., Bokoch, G.M., and Waterman-Storer, C.M. (2004). Regulation of microtubule destabilizing activity of Op18/stathmin downstream of Rac1. J Biol Chem 279, 6196-6203. doi:10.1074/jbc.M307261200.

      Zeitz, M., and Kierfeld, J. (2014). Feedback mechanism for microtubule length regulation by stathmin gradients. Biophys J 107, 2860-2871. doi:10.1016/j.bpj.2014.10.056.

    1. eLife Assessment

      This paper addresses the significant question of quantifying epistasis patterns, which affect the predictability of evolution, by reanalyzing a recently published combinatorial deep mutational scan experiment. The findings are that epistasis is fluid, i.e. strongly background dependent, but that fitness effects of mutations are predictable based on the wild-type phenotype. However, these potentially interesting claims are inadequately supported by the analysis, because measurement noise is not accounted for, arbitrary cutoffs are used, and global nonlinearities are not sufficiently considered. If the results continue to hold after these major improvements in the analysis, they should be of interest to all biologists working in the field of fitness landscapes.

    2. Reviewer #1 (Public review):

      This paper describes a number of patterns of epistasis in a large fitness landscape dataset recently published by Papkou et al. The paper is motivated by an important goal in the field of evolutionary biology to understand the statistical structure of epistasis in protein fitness landscapes, and it capitalizes on the unique opportunities presented by this new dataset to address this problem.

      The paper reports some interesting, previously unobserved patterns that may have implications for our understanding of fitness landscapes and protein evolution. In particular, Figure 5 is very intriguing. However, I have two major concerns detailed below. First, I found the paper rather descriptive (it makes little attempt to gain deeper insights into the origins of the observed patterns) and unfocused (it reports what appears to be a disjointed collection of various statistics without a clear narrative). Second, I have concerns about the statistical rigor of the work.

      (1) I think Figures 5 and 7 are the main, most interesting, and novel results of the paper. However, I don't think that the statement "Only a small fraction of mutations exhibit global epistasis" accurately describes what we see in Figure 5. To me, the most striking feature of this figure is that the effects of most mutations at all sites appear to be a mixture of three patterns. The most interesting pattern noted by the authors is of course the "strong" global epistasis, i.e., when the effect of a mutation is highly negatively correlated with the fitness of the background genotype. The second pattern is a "weak" global epistasis, where the correlation with background fitness is much weaker or non-existent. The third pattern is the vertically spread-out cluster at low-fitness backgrounds, i.e., a mutation has a wide range of mostly positive effects that are clearly not correlated with fitness. What is very interesting to me is that all background genotypes fall into these three groups with respect to almost every mutation, but the proportions of the three groups are different for different mutations. In contrast to the authors' statement, it seems to me that almost all mutations display strong global epistasis in at least a subset of backgrounds. A clear example is C>A mutation at site 3.

      1a. I think the authors ought to try to dissect these patterns and investigate them separately rather than lumping them all together and declaring that global epistasis is rare. For example, I would like to know whether those backgrounds in which mutations exhibit strong global epistasis are the same for all mutations or whether they are mutation- or perhaps position-specific. Both answers could be potentially very interesting, either pointing to some specific site-site interactions or, alternatively, suggesting that the statistical patterns are conserved despite variation in the underlying interactions.

      1b. Another rather remarkable feature of this plot is that the slopes of the strong global epistasis patterns seem to be very similar across mutations. Is this the case? Is there anything special about this slope? For example, does this slope simply reflect the fact that a given mutation becomes essentially lethal (i.e., produces the same minimal fitness) in a certain set of background genotypes?

      1c. Finally, how consistent are these patterns with some null expectations? Specifically, would one expect the same distribution of global epistasis slopes on an uncorrelated landscape? Are the pivot points unusually clustered relative to an expectation on an uncorrelated landscape?
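      One simple null for question 1c can be sketched in a few lines. On a fully uncorrelated (House-of-Cards) landscape, where each genotype's fitness is an independent draw, the effect of a mutation, f(b + m) - f(b), regressed on background fitness f(b) has an expected slope of -1 purely through regression to the mean, so a steep negative slope is not by itself evidence of global epistasis. The sketch assumes Gaussian fitness values and no measurement noise; the function names are illustrative, and this is not the analysis used in the paper.

```python
import random


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def null_global_epistasis_slope(n_backgrounds=20000, seed=1):
    """Slope of mutation effect on background fitness for an uncorrelated landscape.

    Background fitness f(b) and mutant fitness f(b + m) are independent draws
    from the same Gaussian distribution (a House-of-Cards null).
    """
    rng = random.Random(seed)
    background = [rng.gauss(0.0, 1.0) for _ in range(n_backgrounds)]
    mutant = [rng.gauss(0.0, 1.0) for _ in range(n_backgrounds)]
    effect = [f_m - f_b for f_m, f_b in zip(mutant, background)]
    return regression_slope(background, effect)


# The estimate should lie close to -1 even though there is no epistatic structure.
print(null_global_epistasis_slope())
```

      Comparing the slopes and pivot points observed in Figure 5 with this kind of baseline (or with one that shuffles fitness values across genotypes) would indicate how much of the apparent global epistasis exceeds what regression to the mean alone produces.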

      1d. The shapes of the DFE shown in Figure 7 are also quite interesting, particularly the bimodal nature of the DFE in high-fitness (HF) backgrounds. I think this bimodality must be a reflection of the clustering of mutation-background combinations mentioned above. I think the authors ought to draw this connection explicitly. Do all HF backgrounds have a bimodal DFE? What mutations occupy the "moving" peak?

      1e. In several figures, the authors compare the patterns for HF and low-fitness (LF) genotypes. In some cases, there are some stark differences between these two groups, most notably in the shape of the DFE (Figure 7B, C). But there is no discussion about what could underlie these differences. Why are the statistics of epistasis different for HF and LF genotypes? Can the authors at least speculate about possible reasons? Why do HF and LF genotypes have qualitatively different DFEs? I actually don't quite understand why the transition between bimodal DFE in Figure 7B and unimodal DFE in Figure 7C is so abrupt. Is there something biologically special about the threshold that separates LF and HF genotypes? My understanding was that this was just a statistical cutoff. Perhaps the authors can plot the DFEs for all backgrounds on the same plot and just draw a line that separates HF and LF backgrounds so that the reader can better see whether the DFE shape changes gradually or abruptly.

      1f. The analysis of the synonymous mutations is also interesting. However, I think a few additional analyses are necessary to clarify what is happening here. I would like to know the extent to which synonymous mutations are more often neutral compared to non-synonymous ones. Then, do synonymous pairs interact in the same way as non-synonymous pairs (i.e., plot Figure 1 for synonymous pairs)? Do synonymous or non-synonymous mutations that are neutral exhibit less epistasis than non-neutral ones? Finally, do non-synonymous mutations alter epistasis among other mutations more often than synonymous mutations do? What about synonymous-neutral versus synonymous-non-neutral mutations? Basically, I'd like to understand the extent to which a mutation that is neutral in a given background is more or less likely to alter epistasis between other mutations than a non-neutral mutation in the same background.

      (2) I have two related methodological concerns. First, in several analyses, the authors employ thresholds that appear to be arbitrary. Second, I did not see any account of measurement errors. For example, the authors chose the 0.05 threshold to distinguish between epistasis and no epistasis, but this particular threshold is not justified. Another example: whether the product s12 × (s1 + s2) is greater or smaller than zero for any given mutation is uncertain because of measurement errors. Presumably, how each pair of mutations is classified should depend on the precision with which the fitness of mutants is measured. These thresholds could well be different across mutants. We know, for example, that low-fitness mutants typically have noisier fitness estimates than high-fitness mutants. I think the authors should use a statistically rigorous procedure to categorize mutations and their epistatic interactions. I think it is very important to address this issue. I became very concerned about it when I saw on LL 383-388 that synonymous stop codon mutations appear to modulate epistasis among other mutations. This seems very strange to me and makes me quite worried that this is a result of noise in LF genotypes.

    3. Reviewer #2 (Public review):

      Significance:

      This paper reanalyzes an experimental fitness landscape generated by Papkou et al., who assayed the fitness of all possible combinations of 4 nucleotide states at 9 sites in the E. coli DHFR gene, which confers antibiotic resistance. The 9 nucleotide sites make up 3 amino acid sites in the protein, of which one was shown to be the primary determinant of fitness by Papkou et al. This paper sought to assess whether pairwise epistatic interactions differ among genetic backgrounds at other sites and whether there are major patterns in any such differences. The authors use a "double mutant cycle" approach to quantify pairwise epistasis, where the epistatic interaction between two mutations is the difference between the measured fitness of the double mutant and its predicted fitness in the absence of epistasis (which equals the sum of the individual effects of each mutation observed in the single mutants relative to the reference genotype). The paper claims that epistasis is "fluid," because pairwise epistatic effects often differ depending on the genetic state at the other sites. It also claims that this fluidity is "binary," because pairwise effects depend strongly on the state at nucleotide positions 5 and 6 but weakly on those at other sites. Finally, the authors compare the distribution of fitness effects (DFE) of single mutations for starting genotypes with similar fitness and find that, despite the apparent "fluidity" of interactions, this distribution is well predicted by the fitness of the starting genotype.
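      The double-mutant-cycle calculation described above can be written as a short function. This is a minimal sketch on an additive fitness scale; the fitness values below are hypothetical and were chosen only to show how the same pair of mutations can yield different epistasis estimates in two backgrounds, which is what the paper calls "fluidity".

```python
def pairwise_epistasis(f_ref, f_1, f_2, f_12):
    """Double-mutant-cycle epistasis: observed double mutant minus additive prediction."""
    effect_1 = f_1 - f_ref                       # single-mutant effect of mutation 1
    effect_2 = f_2 - f_ref                       # single-mutant effect of mutation 2
    predicted_12 = f_ref + effect_1 + effect_2   # expected fitness without epistasis
    return f_12 - predicted_12


# Hypothetical fitness values for the same two mutations in two backgrounds.
eps_a = pairwise_epistasis(f_ref=1.0, f_1=0.75, f_2=0.875, f_12=0.5)
eps_b = pairwise_epistasis(f_ref=0.5, f_1=0.375, f_2=0.4375, f_12=0.3125)

print(eps_a)  # -0.125: negative epistasis in background A
print(eps_b)  # 0.0: additive in background B
```

      Without an estimate of measurement error for each of the eight genotypes involved, the difference between eps_a and eps_b cannot be distinguished from noise, which is the concern raised under Validity below.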

      The paper addresses an important question for genetics and evolution: how complex and unpredictable are the effects and interactions among mutations in a protein? Epistasis can make the phenotype hard to predict from the genotype and also affect the evolutionary navigability of a genotype landscape. Whether pairwise epistatic interactions depend on genetic background - that is, whether there are important high-order interactions -- is important because interactions of order greater than pairwise would make phenotypes especially idiosyncratic and difficult to predict from the genotype (or by extrapolating from experimentally measured phenotypes of genotypes randomly sampled from the huge space of possible genotypes). Another interesting question is the sparsity of such high-order interactions: if they exist but mostly depend on a small number of identifiable sequence sites in the background, then this would drastically reduce the complexity and idiosyncrasy relative to a landscape on which "fluidity" involves interactions among groups of all sites in the protein. A number of papers in the recent literature have addressed the topics of high-order epistasis and sparsity and have come to conflicting conclusions. This paper contributes to that body of literature with a case study of one published experimental dataset of high quality. The findings are therefore potentially significant if convincingly supported.

      Validity:

      In my judgment, the major conclusions of this paper are not well supported by the data. There are three major problems with the analysis.

      (1) Lack of statistical tests. The authors conclude that pairwise interactions differ among backgrounds, but no statistical analysis is provided to establish that the observed differences are statistically significant, rather than being attributable to error and noise in the assay measurements. It has been established previously that the methods the authors use to estimate high-order interactions can result in inflated inferences of epistasis because of the propagation of measurement noise (see PMID 31527666 and 39261454). Error propagation can be extreme because first-order mutation effects are calculated as the difference between the measured phenotype of a single-mutant variant and the reference genotype; pairwise effects are then calculated as the difference between the measured phenotype of a double mutant and the sum of the differences described above for the single mutants. This paper claims fluidity when this latter difference itself differs when assessed in two different backgrounds. At each step of these calculations, measurement noise propagates. Because no statistical analysis is provided to evaluate whether these observed differences are greater than expected because of propagated error, the paper has not convincingly established or quantified "fluidity" in epistatic effects.
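      The error-propagation argument can be checked with a small simulation. Assuming independent Gaussian measurement errors with the same standard deviation sigma for all four genotypes (an assumption; real errors are typically larger for low-fitness genotypes), the estimate f_12 - f_1 - f_2 + f_ref has a standard error of 2 * sigma even when the true interaction is exactly zero, and the difference between two such estimates from non-overlapping backgrounds has a standard error of 2 * sqrt(2) * sigma. The value of sigma and the true fitnesses below are hypothetical.

```python
import math
import random
import statistics


def epistasis_se(se_ref, se_1, se_2, se_12):
    """Standard error of f_12 - f_1 - f_2 + f_ref, assuming independent errors."""
    return math.sqrt(se_ref**2 + se_1**2 + se_2**2 + se_12**2)


def simulate_noise_only_epistasis(sigma=0.02, n=20000, seed=0):
    """Epistasis estimates for a truly additive pair measured with Gaussian noise."""
    rng = random.Random(seed)
    # Additive truth: 0.625 = 1.0 - 0.125 - 0.25, so the true epistasis is 0.
    true = {"ref": 1.0, "1": 0.875, "2": 0.75, "12": 0.625}
    estimates = []
    for _ in range(n):
        measured = {k: v + rng.gauss(0.0, sigma) for k, v in true.items()}
        estimates.append(measured["12"] - measured["1"] - measured["2"] + measured["ref"])
    return estimates


sigma = 0.02
est = simulate_noise_only_epistasis(sigma)
print(statistics.mean(est))                        # near 0: no true epistasis
print(statistics.stdev(est))                       # near the analytical value below
print(epistasis_se(sigma, sigma, sigma, sigma))    # 2 * sigma = 0.04
```

      Even this idealized case produces apparent interactions of several hundredths from noise alone, comparable in size to the 0.05 cutoff discussed next.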

      (2) Arbitrary cutoffs. Many of the analyses involve assigning pairwise interactions to discrete categories, based on the magnitude and direction of the difference between the predicted and observed phenotypes for a pairwise mutant. For example, the authors classify a pairwise interaction as positive if the apparent deviation of phenotype from prediction is >0.05, negative if the deviation is <-0.05, and absent if the deviation is between these cutoffs. Fluidity is diagnosed when the category for a pairwise interaction differs among backgrounds. These cutoffs are essentially arbitrary, and the effects are assigned to categories without assessing statistical significance. For example, an interaction of 0.06 in one background and 0.04 in another would be classified as fluid, but it is very plausible that such a difference would arise due to error alone. The frequency of epistatic interactions in each category as claimed in the paper, as well as the extent of fluidity across backgrounds, could therefore be systematically overestimated or underestimated, affecting the major conclusions of the study.
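      The 0.06 versus 0.04 example can be made concrete. This minimal sketch contrasts the fixed ±0.05 classification with a noise-aware comparison. The standard error of 0.04 per epistasis estimate is hypothetical (for instance, 2 * sigma with sigma = 0.02 per genotype), and the two-sided z-test assumes independent Gaussian errors; it is one simple option, not the procedure used in the paper.

```python
import math

CUTOFF = 0.05  # fixed threshold described in the review


def category(eps, cutoff=CUTOFF):
    """Three-way call with a fixed cutoff: +1 positive, -1 negative, 0 no epistasis."""
    if eps > cutoff:
        return 1
    if eps < -cutoff:
        return -1
    return 0


def fluid_by_cutoff(eps_a, eps_b):
    """'Fluid' if the cutoff-based category differs between two backgrounds."""
    return category(eps_a) != category(eps_b)


def fluid_by_z_test(eps_a, se_a, eps_b, se_b, z_crit=1.96):
    """Two-sided z-test on the difference, assuming independent Gaussian errors."""
    z = (eps_a - eps_b) / math.sqrt(se_a**2 + se_b**2)
    return abs(z) > z_crit


print(fluid_by_cutoff(0.06, 0.04))              # True: categories 1 and 0 differ
print(fluid_by_z_test(0.06, 0.04, 0.04, 0.04))  # False: |z| is about 0.35
```

      With thousands of pairs and backgrounds, a per-comparison test like this would also need a multiple-testing correction, such as false discovery rate control.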

      (3) Global nonlinearities. The analyses do not consider the fact that apparent fluidity could be attributable to the fact that fitness measurements are bounded by a minimum (the fitness of cells carrying proteins in which DHFR is essentially nonfunctional) and a maximum (the fitness of cells in which some biological factor other than DHFR function is limiting for fitness). The data are clearly bounded; the original Papkou et al. paper states that 93% of genotypes are at the low-fitness limit at which deleterious effects no longer influence fitness. Because of this bounding, mutations that are strongly deleterious to DHFR function will therefore have an apparently smaller effect when introduced in combination with other deleterious mutations, leading to apparent epistatic interactions; moreover, these apparent interactions will have different magnitudes if they are introduced into backgrounds that themselves differ in DHFR function/fitness, leading to apparent "fluidity" of these interactions. This is a well-established issue in the literature (see PMIDs 30037990, 28100592, 39261454). It is therefore important to adjust for these global nonlinearities before assessing interactions, but the authors have not done this.
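      The bounding argument can be illustrated with a toy model. In this sketch, mutations act additively on a hypothetical latent phenotype, and the measured fitness is that phenotype clamped between a floor and a ceiling. The clamp is a deliberately crude stand-in for the sigmoid-like nonlinearities usually fitted in global-epistasis models, and the floor, ceiling, and effect sizes are all hypothetical.

```python
def observed_fitness(latent, floor=0.0, ceiling=1.0):
    """Clamp an additive latent phenotype to the measurable fitness range."""
    return min(max(latent, floor), ceiling)


def mutation_effect(latent_bg, d):
    """Observed effect of a mutation with latent effect d in a given background."""
    return observed_fitness(latent_bg + d) - observed_fitness(latent_bg)


def apparent_epistasis(latent_bg, d1, d2):
    """Double-mutant-cycle epistasis on the observed (bounded) scale,
    although d1 and d2 combine purely additively on the latent scale."""
    f_ref = observed_fitness(latent_bg)
    f_1 = observed_fitness(latent_bg + d1)
    f_2 = observed_fitness(latent_bg + d2)
    f_12 = observed_fitness(latent_bg + d1 + d2)
    return f_12 - f_1 - f_2 + f_ref


# Two deleterious mutations, each with latent effect -0.625, in three backgrounds.
for bg in (1.0, 0.75, 0.25):
    print(bg, mutation_effect(bg, -0.625), apparent_epistasis(bg, -0.625, -0.625))
# 1.0 -0.625 0.25
# 0.75 -0.625 0.5
# 0.25 -0.25 0.25
```

      With no interaction at all on the latent scale, the observed single-mutation effect shrinks near the floor and the apparent pairwise epistasis varies with background, i.e. it looks "fluid".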

      This global nonlinearity could explain much of the fluidity claimed in this paper. It could explain the observation that epistasis does not seem to depend as much on genetic background for low-fitness backgrounds, and that the latter is constant (Figure 2B and 2C): these patterns would arise simply because the effects of deleterious mutations are all epistatically masked in backgrounds that are already near the fitness minimum. It would also explain the observations in Figure 7. For background genotypes with relatively high fitness, there are two distinct peaks of fitness effects, which likely correspond to neutral mutations and to deleterious mutations that bring fitness to the lower bound of measurement; as the fitness of the background declines, the deleterious mutations have a smaller effect, so the two peaks draw closer to each other, and in the lowest-fitness backgrounds they collapse into a single unimodal distribution in which all mutations are approximately neutral (with the distribution reflecting only noise).

      Global nonlinearity could also explain the apparent "binary" nature of epistasis. Sites 4 and 5 change the second amino acid, and the Papkou paper shows that only 3 amino acid states (C, D, and E) are compatible with function; all others abolish function and yield lower-bound fitness, while mutations at other sites have much weaker effects. The apparent binary nature of epistasis in Figure 5 reflects these effects combined with the nonlinearity of the fitness assay. Most mutations are close to neutral irrespective of the fitness of the background into which they are introduced: these are the "non-epistatic" mutations in the binary scheme. The mutations at sites 4 and 5 that abolish one of the beneficial mutations, however, have a strong background dependence: they are very deleterious when introduced into a high-fitness background, but their impact shrinks as they are introduced into backgrounds with progressively lower fitness.

      The apparent "binary" nature of global epistasis is likely a simple artifact of bounding and the bimodal distribution of functional effects: neutral mutations are insensitive to background, while the magnitude of the fitness effect of deleterious mutations declines with background fitness because they are masked by the lower bound. The authors state that "global epistasis often does not hold." This is not established. A more plausible conclusion is that global epistasis imposed by the phenotype limits affects all mutations, but in a nonlinear fashion.
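      The masking argument in point (3) can be illustrated with a deliberately simple toy model. The sketch below is an illustrative assumption, not the analysis of Papkou et al. or of the authors: an additive latent score is clamped to a hypothetical measurable range [0, 1], and all parameter values are made up (chosen to be exact in binary floating point). Because the latent model is strictly additive, every apparent epistatic effect it produces comes from the bounds alone.

```python
def observed_fitness(latent, f_min=0.0, f_max=1.0):
    """Clamp an additive latent score to a hypothetical measurable range [f_min, f_max]."""
    return min(f_max, max(f_min, latent))


def observed_effect(background, effect):
    """Fitness effect of one mutation, measured on the bounded scale."""
    return observed_fitness(background + effect) - observed_fitness(background)


def apparent_epistasis(background, effect_a, effect_b):
    """Pairwise epistasis on the bounded scale: f_ab - f_a - f_b + f_0.

    The latent model is additive, so any non-zero value is produced by the bounds.
    """
    f_0 = observed_fitness(background)
    f_a = observed_fitness(background + effect_a)
    f_b = observed_fitness(background + effect_b)
    f_ab = observed_fitness(background + effect_a + effect_b)
    return f_ab - f_a - f_b + f_0


# The same latent deleterious effect (-0.75) looks smaller as background fitness falls.
print([observed_effect(bg, -0.75) for bg in (1.0, 0.5, 0.25)])  # [-0.75, -0.5, -0.25]

# Two additive deleterious mutations: no epistasis away from the floor, positive epistasis at it.
print(apparent_epistasis(0.75, -0.25, -0.25))  # 0.0
print(apparent_epistasis(1.0, -0.75, -0.75))   # 0.5
```

      Adjusting for such a nonlinearity typically means fitting a monotone mapping from a latent additive score to measured fitness before estimating interactions; the hard clamp here is only the simplest such mapping.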

      In conclusion, most of the major claims in the paper could be artifactual. Much of the claimed pairwise epistasis could be caused by measurement noise, the use of arbitrary cutoffs, and the lack of adjustment for global nonlinearity. Much of the fluidity or higher-order epistasis could be attributable to the same issues. And the apparently binary nature of global epistasis is also the expected result of this nonlinearity.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have studied a previously published large dataset on the fitness landscape of a 9 base-pair region of the folA gene. The objective of the paper is to understand various aspects of epistasis in this system, which the authors have achieved through detailed and computationally expensive exploration of the landscape. The authors describe epistasis in this system as "fluid", meaning that it depends sensitively on the genetic background, thereby reducing the predictability of evolution at the genetic level. However, the study also finds two robust patterns. The first is the existence of a "pivot point" for a majority of mutations, which is a fixed growth rate at which the effect of mutations switches from beneficial to deleterious (consistent with a previous study on the topic). The second is the observation that the distribution of fitness effects (DFE) of mutations is predicted quite well by the fitness of the genotype, especially for high-fitness genotypes. While the work does not offer a synthesis of the multitude of reported results, the information provided here raises interesting questions for future studies in this field.

      Strengths:

      A major strength of the study is its detailed and multifaceted approach, which has helped the authors tease out a number of interesting epistatic properties. The study makes a timely contribution by focusing on topical issues like the prevalence of global epistasis, the existence of pivot points, and the dependence of DFE on the background genotype and its fitness. The methodology is presented in a largely transparent manner, which makes it easy to interpret and evaluate the results.

      The authors have classified pairwise epistasis into six types and found that the type of epistasis changes depending on background mutations. Switches happen more frequently for mutations at functionally important sites. Interestingly, the authors find that even synonymous mutations in stop codons can alter the epistatic interaction between mutations in other codons. Consistent with these observations of "fluidity", the study reports limited instances of global epistasis (which predicts a simple linear relationship between the size of a mutational effect and the fitness of the genetic background in which it occurs). Overall, the work presents some evidence for the genetic context-dependent nature of epistasis in this system.
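      Global epistasis in this sense is usually tested by regressing each mutation's effect on the fitness of its background and asking how much variance the line explains (R²). The sketch below is illustrative only: the background fitness values, the shared noise pattern and the two slopes are invented, and it is not the authors' pipeline. It shows that two mutations following the same kind of linear trend can land on opposite sides of an R² cutoff such as 0.4 simply because one has larger effects relative to measurement noise.

```python
def ols_fit(xs, ys):
    """Ordinary least-squares slope and R^2 (squared Pearson r) of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy * sxy / (sxx * syy)


# Hypothetical background fitness values and one shared measurement-noise pattern.
background = [0.1 * i for i in range(10)]
noise = [0.03, -0.02, 0.04, -0.05, 0.01, 0.02, -0.04, 0.05, -0.03, 0.01]


def simulated_effects(true_slope):
    """Mutational effects that decline linearly with background fitness, plus identical noise."""
    return [true_slope * x + e for x, e in zip(background, noise)]


for label, true_slope in (("large-effect site", -1.0), ("small-effect site", -0.05)):
    slope, r2 = ols_fit(background, simulated_effects(true_slope))
    print(f"{label}: fitted slope {slope:.3f}, R^2 {r2:.2f}, passes 0.4 cutoff: {r2 > 0.4}")
```

      In this toy example the small-effect mutation still has a negative fitted slope; only the strength of its signal relative to the noise differs.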

      Weaknesses:

      Despite the wealth of information provided by the study, there are some shortcomings of the paper which must be mentioned.

      (1) In the Significance Statement, the authors say that the "fluid" nature of epistasis is a previously unknown property. This is not accurate. What the authors describe as "fluidity" is essentially the prevalence of certain forms of higher-order epistasis (i.e., epistasis beyond pairwise mutational interactions). The existence of higher-order epistasis is a well-known feature of many landscapes. For example, in an early work (Szendro et al., J. Stat. Mech., 2013), a significant degree of higher-order epistasis was reported for a number of empirical fitness landscapes. Likewise, Weinreich et al. (Curr. Opin. Genet. Dev., 2013) analysed several fitness landscapes and found that higher-order epistatic terms were on average larger than the pairwise term in nearly all cases. They further showed that ignoring higher-order epistasis leads to a significant overestimate of accessible evolutionary paths. The literature on higher-order epistasis has grown substantially since these early works. Any future version of the present preprint would benefit from a more thorough contextual discussion of the literature on higher-order epistasis.

      (2) In the paper, the term 'sign epistasis' is used in a way that is different from its well-established meaning. (Pairwise) sign epistasis, in its standard usage, is said to occur when the effect of a mutation switches from beneficial to deleterious (or vice versa) when a mutation occurs at a different locus. The authors require a stronger condition, namely that the sum of the individual effects of two mutations should have the opposite sign from their joint effect. This is a sufficient condition for sign epistasis, but not a necessary one. The property studied by the authors is important in its own right, but it is not equivalent to sign epistasis.

      (3) The authors have looked for global epistasis in all 108 (9x12) mutations, out of which only 16 showed a correlation of R^2 > 0.4. 14 out of these 16 mutations were in the functionally important nucleotide positions. Based on this, the authors conclude that global epistasis is rare in this landscape, and further, that mutations in this landscape can be classified into one of two binary states - those that exhibit global epistasis (a small minority) and those that do not (the majority). I suspect, however, that a biologically significant binary classification based on these data may be premature. Unsurprisingly, mutational effects are stronger at the functional sites as seen in Figure 5 and Figure 2, which means that even if global epistasis is present for all mutations, a statistical signal will be more easily detected for the functionally important sites. Indeed, the authors show that the means of DFEs decrease linearly with background fitness, which hints at the possibility that a weak global epistatic effect may be present (though hard to detect) in the individual mutations. Given the high importance of the phenomenon of global epistasis, it pays to be cautious in interpreting these results.

      (4) The study reports that synonymous mutations frequently change the nature of epistasis between mutations in other codons. However, it is unclear whether this should be surprising, because, as the authors have already noted, synonymous mutations can have an impact on cellular functions. The reader may wonder if the synonymous mutations that cause changes in epistatic interactions in a certain background also tend to be non-neutral in that background. Unfortunately, the fitness effect of synonymous mutations has not been reported in the paper.

      (5) The authors find that DFEs of high-fitness genotypes tend to depend only on fitness and not on genetic composition. This is an intriguing observation, but unfortunately, the authors do not provide any possible explanation or connect it to theoretical literature. I am reminded of work by (Agarwala and Fisher, Theor. Popul. Biol., 2019) as well as (Reddy and Desai, eLife, 2023) where conditions under which the DFE depends only on the fitness have been derived. Any discussion of possible connections to these works could be a useful addition.
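      Point (2) turns on two different criteria. The sketch below encodes both for a two-locus, four-genotype landscape; the function names and fitness values are hypothetical, chosen so that the standard definition flags sign epistasis while the summed-versus-joint condition does not, which shows that the two criteria are not interchangeable.

```python
def standard_sign_epistasis(f00, f10, f01, f11):
    """Standard (pairwise) sign epistasis: a mutation's effect flips sign
    depending on the allele at the other locus."""
    a_alone, a_with_b = f10 - f00, f11 - f01
    b_alone, b_with_a = f01 - f00, f11 - f10
    return a_alone * a_with_b < 0 or b_alone * b_with_a < 0


def summed_vs_joint_flip(f00, f10, f01, f11):
    """Condition described in point (2): the sum of the single-mutant effects
    and the double-mutant effect have opposite signs."""
    s_a, s_b, s_ab = f10 - f00, f01 - f00, f11 - f00
    return (s_a + s_b) * s_ab < 0


# Hypothetical fitness values: mutation A is beneficial alone (+0.1)
# but deleterious on the B background (0.15 - 0.2 = -0.05).
landscape = (0.0, 0.1, 0.2, 0.15)
print(standard_sign_epistasis(*landscape))  # True
print(summed_vs_joint_flip(*landscape))     # False
```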

    5. Author response:

      Thank you for sharing a detailed review of our manuscript titled, Variations and predictability of epistasis on an intragenic fitness landscape. We have now carefully gone through the reviewers’ and the editor’s comments and have the following preliminary responses.

      (1) Measurement noise in the folA fitness landscape. All three reviewers and the editors raise the important matter of incorporating measurement noise in the fitness landscape. The paper by Papkou and coworkers measures fitness across the landscape in six independent repeats. They show that the fitness data are highly correlated across repeats, and use the weighted mean of the repeats to report their results. They do not study how measurement noise influences their findings. The results by Papkou and coworkers were our starting point, and hence we built on the landscape properties reported in their study. Accordingly, our analyses also use the same mean of the six independent measurements.

      The main result of the work by Papkou and coworkers is that the largest subgraph in the landscape has 514 fitness peaks.

      We revisit this result by quantifying how measurement noise changes this number. With noise taken into account, the subgraph contains only 127 statistically significant peaks. We define a sequence as a peak when its fitness is greater than that of all its one-mutation neighbours with p < 0.05. This shows that, as pointed out in the reviews, incorporating measurement noise significantly changes how we view the landscape, a facet not included in Papkou et al. or in the current version of our manuscript.

      Without accounting for measurement noise, the entire landscape has 4055 peaks. When measurement noise is included in the analysis, this number falls to 137, of which 136 are high-fitness (functional) backgrounds.

      In the revised version of our manuscript, we will incorporate measurement noise in our analysis. Through this, we will also address the concern regarding the use of an arbitrary cut-off to study “fluid” epistasis. However, we note that arbitrary cut-offs to define DFEs have been recently used (Sane et al., PNAS, 2023).

      We also note that previous work with large scale landscapes (Wu et al, eLife, 2016) also reported a fitness landscape with a single experiment, with no repeats. 

      (2) Global nonlinearities and higher-order epistasis leading to fluid epistasis. Attempts at building models of higher-order epistasis from empirical data have largely been confined to landscapes of limited size. For example, Sailer & Harms, Genetics, 2017 propose models of higher-order epistasis from seven empirical data sets, each with fewer than 100 data points. Another recent attempt (Park et al., Nat Comm, 2024) proposes rules for protein structure-function relationships using 20 fitness landscapes. In that study, only one landscape that used fitness as a phenotype had ~160,000 data points (of which only 42% were included in the analysis). All other data sets that used fitness as a phenotype contained fewer than 10,000 data points. While these statistical proposals of how higher-order epistasis operates exist, none of them relies on a large-scale, exhaustive network like the one proposed by Papkou and coworkers.

      In the edited manuscript, we will replace our arbitrary cut-off with results of statistical tests carried out based on measurement noise. 

      Global non-linearities shape evolutionary responses. We would like to emphasize that the goal of this work is to study and understand how these global non-linearities give rise to patterns on a large fitness landscape, by presenting the combined contribution of these fundamental factors in shaping statistical patterns.

      While we understand that we may not have sufficiently explained the effects of global non-linearities on our results, we do not agree with the reviewer’s conclusion that our results are artifacts of these non-linearities. We will expand on the role of these nonlinearities on the patterns that we observe (like, fitness being bounded, as pointed out by reviewer 2, or differential impact of a mutation in functional vs. non-functional variants).

      We also speculate that changing our arbitrary cut-off (selection coefficient of 0.05) to measurement noise will not alter our results qualitatively. 

      The question we address in our work is, therefore, how the nature of epistasis changes with genetic background across a large, exhaustive landscape. The nature of epistasis between two mutations is analysed in all 4⁷ (16,384) backgrounds. The causes of a change in epistasis will be context-dependent, depending on the precise nature of the two mutations and the background. For instance, a certain background might simply introduce a stop codon in the sequence. Notwithstanding these precise, local mechanistic explanations, we seek to answer how epistasis changes statistically across sequences. Investigating statistical patterns that explain switches in the nature of epistasis in deep, exhaustive landscapes is a long-term goal of this research.

      (3) Last, in our revised manuscript, we will address the reviewers’ other minor comments on the various aspects of the manuscript.
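      Response (1) defines a peak as a sequence whose fitness exceeds that of all its one-mutation neighbours with p < 0.05, but does not name the test. The sketch below shows one way such a rule could be coded, under stated assumptions: each genotype is summarised by a mean and a standard error across replicates, and each neighbour comparison uses a one-sided z-test (normal approximation). The toy one-nucleotide landscape and its numbers are hypothetical. Requiring significance against every neighbour can only remove peaks, never add them, which is consistent with the direction of the change from 514 to 127 peaks reported in that response.

```python
from math import sqrt
from statistics import NormalDist

ALPHABET = "ACGT"


def neighbours(seq):
    """Yield every sequence one nucleotide substitution away from seq."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def one_sided_p(mean_a, se_a, mean_b, se_b):
    """P-value for 'a is fitter than b' under a normal (z-test) approximation (an assumption)."""
    z = (mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)
    return 1.0 - NormalDist().cdf(z)


def is_peak(seq, landscape, alpha=None):
    """Peak if fitter than every one-step neighbour.

    With alpha=None this is the noise-free definition; otherwise each
    neighbour comparison must also reach p < alpha.
    """
    mean, se = landscape[seq]
    for nb in neighbours(seq):
        nb_mean, nb_se = landscape[nb]
        if mean <= nb_mean:
            return False
        if alpha is not None and one_sided_p(mean, se, nb_mean, nb_se) >= alpha:
            return False
    return True


# Toy one-nucleotide landscape: genotype -> (mean fitness, standard error); values are made up.
landscape = {"A": (1.00, 0.02), "C": (0.50, 0.02), "G": (0.95, 0.05), "T": (0.20, 0.02)}
print(is_peak("A", landscape))              # True: A is the fittest genotype
print(is_peak("A", landscape, alpha=0.05))  # False: A vs G is not significant
```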

    1. eLife Assessment

      This valuable study introduces the peptidisc-TPP approach as a promising solution to challenges in membrane proteomics, enabling thermal proteome profiling in a detergent-free system. The concept is innovative and holds significant potential, and the demonstration of its utility and validation is solid. The method presents a strong foundation for broader applications in identifying physiologically and pharmacologically relevant membrane protein-ligand interactions.

    2. Reviewer #1 (Public review):

      Summary:

      The idea is appealing, but the authors have not sufficiently demonstrated the utility of this approach.

      Strengths:

      Novelty of the approach, potential implications for discovering novel interactions

      Comments on revisions:

      The authors have adequately addressed most of my concerns in this improved version of the manuscript

    3. Reviewer #2 (Public review):

      Summary:

      The membrane mimetic thermal proteome profiling (MM-TPP) presented by Jandu et al. promises a useful way to minimize the interference of detergents in efficient mass spectrometry analysis of membrane proteins. Thermal proteome profiling is a mass spectrometric method that measures binding of a drug to different proteins in a cell lysate by monitoring thermal stabilization of the proteins because of the interaction with the ligands that are being studied. This method has been underexplored for membrane proteome because of the inefficient mass spectrometric detection of membrane proteins and because of the interference from detergents that are used often for membrane protein solubilization.

      Strengths:

      In this report the binding of ligands to membrane protein targets has been monitored in crude membrane lysates or tissue homogenates exalting the efficacy of the method to detect both intended and off-target binding events in a complex physiologically relevant sample setting. The manuscript is lucidly written and the data presented seems clear. Kudos to the authors. This methodology shows immense potential for identifying membrane protein binders (small-molecule or protein) in a near-native environment, and as a result promises to be a great tool for drug discovery campaigns.

      Weaknesses:

      While this is a solid report and a promising tool for analyzing membrane protein drug interactions in a detergent-free environment, it is crucial to bear in mind that the process of reconstitution begins with detergent solubilization of the proteome and does not completely circumvent structural perturbations invoked by detergents.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The idea is appealing, but the authors have not sufficiently demonstrated the utility of this approach.

      Strengths: 

      Novelty of the approach, potential implications for discovering novel interactions

      Weaknesses:

      The Duong group introduced their highly elegant peptidisc approach several years ago. In this present work, they combine it with thermal proteome profiling (TPP) and attempt to demonstrate the utility of this combination for identifying novel membrane protein-ligand interactions.

      While I find this idea intriguing, and the approach potentially useful, I do not feel that the authors had sufficiently demonstrated the utility of this approach. My main concern is that no novel interactions are identified and validated. For the presentation of any new methodology, I think this is quite necessary. In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.

      We thank the reviewer for their thoughtful comments. In this revision, we have experimentally addressed the reviewer’s concerns in three ways:

      (1) To demonstrate the utility of our MM-TPP method over the detergent-based TPP workflow (termed DB-TPP), we performed a side-by-side comparison using ATP–VO₄ at 51 °C (Figure 3B and Figure 4A). From the DB-TPP dataset, 7.4% of all identified proteins were annotated as ATP-binding, while 6.4% of proteins differentially stabilized were annotated as ATP-binding. In contrast, in the MM-TPP dataset, 9.3% of all identified proteins were annotated as ATP-binding proteins, while 17% of proteins differentially stabilized were annotated as ATP-binding. The lack of enrichment in the detergent-based approach indicates that the observed differences are likely stochastic, rather than a result of specific ATP–VO₄-mediated stabilization as found with MM-TPP. For instance, several key proteins—BCS1, P2RY6, SLC27A2, ABCB1, ABCC2, and ABCC9— found differentially stabilized using the MM-TPP method showed no such pattern in the DB-TPP dataset. This divergence strongly supports the specificity and utility of our Peptidisc approach. 

      (2) To demonstrate that MM-TPP can resolve not only the broader effects of ATP–VO₄ but also specific ligand–protein interactions, we employed 2-methylthio-ADP (2-MeS-ADP), a selective agonist of the P2RY12 receptor [PMID: 24784220]. In that case, we observed clear thermal stabilization of P2RY12, with a more than 6-fold increase in stability at both 51 °C and 57 °C (–log₁₀ p > 5.97; Figure 4B and Figure S4). Notably, no other protein, including the structurally related but non-responsive P2RY6 receptor, showed a comparable stabilization fold change at these temperatures.

      (3) To further probe the reproducibility of the method, we performed an independent MM-TPP evaluation with ATP–VO₄ at 51 °C using data-independent acquisition (DIA), in contrast to the data-dependent acquisition (DDA) approach used in the initial study (Figure S5). Overall, 7.8% of all identified proteins were annotated as ATP-binding, and as before, this proportion increased to 17% among proteins with log₂ fold changes greater than 0.5. Specifically, BCS1 and SLC27A2 exhibited strong stabilization (log₂ fold change > 1), while P2RY6, ABCB11, ABCC2, and ABCG2 showed moderate stabilization (log₂ fold changes between 0.5 and 1), and, consistent with previous results, P2RX4 was destabilized, with a log₂ fold change below –1. These findings support the consistency and reproducibility of the method across distinct data acquisition methods.
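      One way to read the percentages in points (1) and (3) is as an enrichment ratio: the share of ATP-binding annotations among the proteins that changed divided by their share among all identified proteins, so that a value near or below 1 means no enrichment. The ratio is an illustrative framing, not a statistic reported in this response; only the percentages come from the text.

```python
def enrichment_ratio(pct_among_hits, pct_among_all):
    """Fold enrichment of an annotation among hits relative to all identified proteins."""
    return pct_among_hits / pct_among_all


# (% ATP-binding among proteins that changed, % ATP-binding among all identified proteins)
reported = {
    "DB-TPP": (6.4, 7.4),
    "MM-TPP (DDA)": (17.0, 9.3),
    "MM-TPP (DIA)": (17.0, 7.8),
}

for workflow, (hits_pct, all_pct) in reported.items():
    print(f"{workflow}: {enrichment_ratio(hits_pct, all_pct):.2f}x")
```

      A significance test for enrichment (for example a hypergeometric test) would need the underlying protein counts rather than percentages, so none is attempted here.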

      My main concern is that no novel interactions are identified and validated. For the presentation of any new methodology, I think this is quite necessary.  

      The primary objective of our study is to establish and benchmark the MM-TPP workflow using known targets, rather than to discover novel ligand–protein interactions. Identifying new binders requires extensive screening and downstream validations, which we believe is beyond the scope of this methodological report. Instead, our study highlights the sensitivity and reliability of the MM-TPP approach by demonstrating consistent and reproducible results with well-characterized interactions.

      We respectfully disagree with the notion that introducing a new methodology must necessarily include the discovery of novel interactions. For instance, Martinez Molina et al. [PMID: 23828940] introduced the cellular thermal shift assay (CETSA) by validating established targets such as MetAP2 with TNP-470 and CDK2 with AZD-5438, without identifying novel protein–ligand pairs. Similarly, Kalxdorf et al. [PMID: 33398190] published their cell-surface thermal proteome profiling (CS-TPP) using Ouabain to stabilize the Na⁺/K⁺-ATPase pump in K562 cells, and SB431542 to stabilize its canonical target JAG1. In fact, when these methods revealed additional stabilizations, these were not validated but instead interpreted through reasoning grounded in the literature. For instance, they attributed the SB431542-induced stabilization of MCT1 to its reported role in cell migration and tumor invasiveness, and explained that SLC1A2 stabilization is related to the disruption of Na⁺/K⁺-ATPase activity by Ouabain. In the same way, our interpretation of the ATP-VO₄–mediated stabilization of Mao-B is supported by AlphaFold 3 predictions rather than by direct orthogonal assays, which are beyond the scope of our methodological presentation.

      Collectively, the influential studies cited above have set methodological precedents by prioritizing validation and proof-of-concept over merely finding uncharacterized binders. In the same spirit, our work is centred on establishing MM-TPP as a robust platform for probing membrane protein–ligand interactions in a water-soluble format. The discovery of novel binders remains an exciting future direction—one that will build upon the methodological foundation laid by the present study.

      In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.

      We deliberately began this study with our model protein, MsbA, examined under both native and overexpressed conditions, to establish agreement between MM-TPP (Figure 2D) and biochemical stability assays (Figure 2A). This validation provided the foundation to confidently extend MM-TPP to the mouse organ proteome. To demonstrate the validity of our workflow, we used ATP-VO₄ because it has expected targets.

      We note that orthogonal validation often requires overproduction and purification of the candidate proteins, including suitable antibodies, which is a true challenge for membrane proteins. Here, we demonstrate that MM-TPP can detect ligand-induced thermal shifts directly in native membrane preparations, without requiring protein overproduction or purification. We also emphasize several influential studies in TPP, including Martinez Molina et al. (PMID: 23828940) and Fang et al. (PMID: 34188175), which focused primarily on establishing and benchmarking the methodology, rather than on extensive orthogonal validation. In the same spirit, our study prioritizes methodological development, and accordingly, several orthogonal validations are now included in this revision.

      [...] and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.

      To clarify, all analyses of ligand-induced stabilization or destabilization were carried out using LFQ values. The sole exception is Figure 2B, where we used iBAQ values to depict the relative abundance of proteins within a single sample, to show MsbA's relative level within the E. coli peptidisc library.

      Respectfully, we disagree with the assertion that we are “quantifying rather small differences in abundances using either iBAQ or LFQ.” We were able to clearly distinguish between stabilizations driven by specific ligands binding to their targets and those caused by non-specific ligands with broader activity. This is further confirmed by comparing 2-MeS-ADP, a selective ligand for P2RY12, with ATP-VO₄, a highly promiscuous ligand, and AMP-PNP, which exhibits intermediate breadth. When tested in triplicate at 51 °C, 2-MeS-ADP significantly altered the thermal stability of 27 proteins, AMP-PNP 44 proteins, and ATP-VO₄ 230 proteins, consistent with the expectation that broader ligands stabilize more proteins nonspecifically. Importantly, 2-MeS-ADP produced markedly stronger stabilization of its intended target, P2RY12 (–log₁₀p = 9.32), than the top stabilized proteins for ATP–VO₄ (DNAJB3, –log₁₀p = 5.87) or AMP-PNP (FTH1, –log₁₀p = 5.34). Moreover, 2-MeS-ADP did not significantly stabilize proteins that were consistently stabilized by the broad ligands, such as SLC27A2, which was strongly stabilized by both ATP-VO₄ and AMP-PNP (–log₁₀p > 2.5). Together, these findings demonstrate that MM-TPP can robustly distinguish between broad-spectrum and target-specific ligands, with selective ligands inducing stronger and more physiologically meaningful stabilization at their intended targets compared to promiscuous ligands.

      Finally, we emphasize that our findings are not marginal, but meet quantitative and statistical rigor consistent with best practices in proteomics. We apply dual thresholds combining effect size (|log₂FC| ≥ 1, i.e., at least a two-fold change) with statistical significance (FDR-adjusted p ≤ 0.05)—criteria commonly used in proteomics methodology studies (e.g., PMID: 24942700, 38724498). Moreover, the stabilization and destabilization events we report are reproducible across biological replicates (n = 3), consistent across adjacent temperatures for most targets, and technically robust across acquisition modes (DDA vs. DIA). Taken together, these results reflect statistically valid and biologically meaningful effects, fully aligned with standards set by prior published proteomics studies.
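      The dual-threshold rule can be written as a short filter. The response states that p-values were FDR-adjusted but does not name the procedure; the sketch below assumes the Benjamini-Hochberg step-up method, and the four proteins and their values are hypothetical, not data from the study.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def dual_threshold_hits(log2_fc, p_values, min_abs_log2_fc=1.0, max_q=0.05):
    """Flag proteins with |log2 FC| >= min_abs_log2_fc AND BH-adjusted p <= max_q."""
    q_values = benjamini_hochberg(p_values)
    return [abs(fc) >= min_abs_log2_fc and q <= max_q for fc, q in zip(log2_fc, q_values)]


# Hypothetical values for four proteins (not data from the study).
log2_fc = [1.4, -1.2, 0.6, 2.0]
p_values = [0.001, 0.02, 0.0005, 0.3]
print(dual_threshold_hits(log2_fc, p_values))  # [True, True, False, False]
```

      The third protein is highly significant but falls below the two-fold effect-size threshold, and the fourth has a large fold change but is not significant; requiring both criteria excludes each kind of marginal call.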

      Furthermore, the reported changes in abundances are solely based on iBAQ or LFQ analysis. This must be supported by a more quantitative approach such as SILAC or labeled peptides. In summary, I think this story requires a stronger and broader demonstration of the ability of peptidisc-TPP to identify novel physiologically/pharmacologically relevant interactions.

      With respect to labeling strategies, we deliberately avoided using TMT due to concerns about both cost and potential data quality issues. Some recent studies have documented the drawbacks of TMT in contexts directly relevant to our work. For example, a benchmarking study of LiP-MS workflows showed that although TMT increased proteome depth and reduced technical variance, it was less accurate in identifying true drug–protein interactions and produced weaker dose–response correlations compared with label-free DIA approaches [PMID: 40089063]. More broadly, technical reviews have highlighted that isobaric tagging is intrinsically prone to ratio compression and reporter-ion interference due to co-isolation and co-fragmentation of peptides, which flatten measured fold changes and obscure biologically meaningful differences [PMID: 22580419, 22036744]. In terms of SILAC, the technique requires metabolic incorporation of heavy amino acids, which is feasible in cultured cells but not in physiologically relevant tissues such as the liver used here. SILAC mouse models exist, but they are expensive and time-consuming [PMID: 18662549, 21909926]. We are not a mouse lab, and introducing liver SILAC labeling into our workflow is beyond the scope of these revisions. We also note that several hallmark TPP studies have been successfully carried out using label-free quantification [PMID: 25278616, 26379230, 33398190, 23828940], establishing this as an accepted and widely applied approach in the field.
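      The ratio-compression mechanism mentioned above can be seen with simple arithmetic. The sketch assumes, purely for illustration, that co-isolated peptides add the same interfering reporter signal to both channels; real interference varies by channel and by spectrum, and the signal values are hypothetical.

```python
import math


def observed_ratio(signal_a, signal_b, interference):
    """Reporter-ion ratio when co-isolated peptides add equal signal to both channels (toy model)."""
    return (signal_a + interference) / (signal_b + interference)


true_a, true_b = 4.0, 1.0  # hypothetical 4-fold difference (log2 FC = 2)
for interference in (0.0, 1.0, 4.0):
    ratio = observed_ratio(true_a, true_b, interference)
    print(f"interference {interference}: ratio {ratio:.2f}, log2 FC {math.log2(ratio):.2f}")
```

      With enough interference, a true four-fold change is compressed to a log₂ fold change below 1, so under a |log₂FC| ≥ 1 cutoff it would no longer be called.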

      To further support our conclusions, we added controls showing that detergent solubilization of mouse liver membranes followed by SP4 cleanup fails to detect ATP-VO₄–mediated stabilization of ATP-binding proteins, underscoring the necessity of Peptidisc reconstitution for capturing ligand-induced thermal stabilization. We also present new data demonstrating selective stabilization of the P2Y12 receptor by its agonist 2-MeS-ADP, providing orthogonal, receptor-specific validation within the MM-TPP framework. Finally, an orthogonal DIA acquisition on separate replicates confirmed robust ATP-vanadate stabilization of ATP-binding proteins, including BCS1l and SLC27A2. Together, these additions reinforce that the observed stabilizations are genuine, physiologically relevant ligand–protein interactions and highlight the unique advantage of the Peptidisc-based workflow in capturing such events.

      Cited Reference:

      24784220: Zhang J, Zhang K, Gao ZG, et al. Agonist-bound structure of the human P2Y₁₂ receptor. Nature. 2014;509(7498):119-122. doi:10.1038/nature13288.

      23828940: Martinez Molina D, Jafari R, Ignatushchenko M, et al. Monitoring drug target engagement in cells and tissues using the cellular thermal shift assay. Science. 2013;341(6141):84-87. doi:10.1126/science.1233606.

      33398190: Kalxdorf M, Günthner I, Becher I, et al. Cell surface thermal proteome profiling tracks perturbations and drug targets on the plasma membrane. Nat Methods. 2021;18(1):84-91. doi:10.1038/s41592-020-01022-1.

      34188175: Fang S, Kirk PDW, Bantscheff M, Lilley KS, Crook OM. A Bayesian semi-parametric model for thermal proteome profiling. Commun Biol. 2021;4(1):810. doi:10.1038/s42003-021-02306-8.

      24942700: Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics. 2014;13(9):2513-2526. doi:10.1074/mcp.M113.031591.

      38724498: Peng H, Wang H, Kong W, Li J, Goh WWB. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat Commun. 2024;15(1):3922. doi:10.1038/s41467-024-47899-w.

      40089063: Koudelka T, Bassot C, Piazza I. Benchmarking of quantitative proteomics workflows for limited proteolysis mass spectrometry. Mol Cell Proteomics. 2025;24(4):100945. doi:10.1016/j.mcpro.2025.100945.

      22580419: Christoforou AL, Lilley KS. Isobaric tagging approaches in quantitative proteomics: the ups and downs. Anal Bioanal Chem. 2012;404(4):1029-1037. doi:10.1007/s00216-012-6012-9. 

      22036744: Christoforou AL, Lilley KS. Isobaric tagging approaches in quantitative proteomics: the ups and downs. Anal Bioanal Chem. 2012;404(4):1029-1037. doi:10.1007/s00216-012-6012-9. 

      18662549: Krüger M, Moser M, Ussar S, et al. SILAC mouse for quantitative proteomics uncovers kindlin-3 as an essential factor for red blood cell function. Cell. 2008;134(2):353-364. doi:10.1016/j.cell.2008.05.033.

      21909926: Zanivan S, Krueger M, Mann M. In vivo quantitative proteomics: the SILAC mouse. Methods Mol Biol. 2012;757:435-450. doi:10.1007/978-1-61779-166-6_25. 

      25278616: Kalxdorf M, Becher I, Savitski MM, et al. Temperature-dependent cellular protein stability enables high-precision proteomics profiling. Nat Methods. 2015;12(12):1147-1150. doi:10.1038/nmeth.3651.

      26379230: Savitski MM, Reinhard FBM, Franken H, et al. Tracking cancer drugs in living cells by thermal profiling of the proteome. Science. 2014;346(6205):1255784. doi:10.1126/science.1255784.

      33452728: Leuenberger P, Ganscha S, Kahraman A, et al. Cell-wide analysis of protein thermal unfolding reveals determinants of thermostability. Science. 2017;355(6327):eaai7825. doi:10.1126/science.aai7825.

      23066101: Savitski MM, Zinn N, Faelth-Savitski M, et al. Quantitative thermal proteome profiling reveals ligand interactions and thermal stability changes in cells. Nat Methods. 2013;10(12):1094-1096. doi:10.1038/nmeth.2766.  

      30858367: Piazza I, Kochanowski K, Cappelletti V, et al. A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes. Nat Commun. 2019;10(1):1216. doi:10.1038/s41467-019-09199-0.

      Reviewer #2 (Public Review):

      Summary:

      The membrane mimetic thermal proteome profiling (MM-TPP) method presented by Jandu et al. appears to be a useful way to minimize detergent interference in the mass spectrometry analysis of membrane proteins. Thermal proteome profiling is a mass spectrometric method that measures the binding of a drug to different proteins in a cell lysate by monitoring the thermal stabilization of those proteins upon interaction with the ligands under study. This method has been underexplored for the membrane proteome because membrane proteins are detected inefficiently by mass spectrometry and because the detergents often used to solubilize them interfere with the analysis.

      Strengths:

      In this report, the binding of ligands to membrane protein targets has been monitored in crude membrane lysates or tissue homogenates, highlighting the ability of the method to detect both intended and off-target binding events in a complex, physiologically relevant sample setting.

      The manuscript is lucidly written and the data presented seem clear. The only minor grammatical error I found was that the 'P' in the word peptidisc is not capitalized at the beginning of the Methods section "MM-TPP profiling on membrane proteomes". The clear writing made it easy to understand and evaluate what has been presented. Kudos to the authors.

      Weaknesses:

      While this is a solid report and a promising tool for analyzing membrane protein drug interactions, addressing some of the minor caveats listed below could make it much more impactful.

      The authors claim that MM-TPP is done by "completely circumventing structural perturbations invoked by detergents[1]". This may not be entirely accurate, because before reconstitution of the membrane proteins in peptidisc, the membrane fractions are solubilized in 1% DDM. The solubilization and subsequent centrifugation step lasts at least 45 min. It is unlikely that all the structural perturbations caused by DDM to various membrane proteins and their transient interactions are completely reversed or rescued by peptidisc reconstitution.

      We thank the reviewer for this insightful comment. In response, we have revised the sentence and expanded the discussion to clarify that the Peptidisc provides a complementary approach to detergent-based preparations for studying membrane proteins, preserving native lipid–protein interactions and stabilization effects that may be diminished in detergent.

      To further address the structural perturbations invoked by detergents, and as already detailed in our response to Reviewer 1, we have compared the thermal profile of the Peptidisc library with that of mouse liver membranes solubilized in 1% DDM, after incubation with ATP–VO₄ at 51 °C (Figure 4A). The detergent extract revealed random patterns of stabilization and destabilization, with only 6.4% of differentially stabilized proteins being ATP-binding, comparable to the 7.4% observed in the background. In contrast, in the Peptidisc library, 17% of differentially stabilized proteins were ATP-binding, compared with 9.3% in the background. Thus, while Peptidisc reconstitution does not fully avoid initial detergent exposure, these findings underscore the importance of implementing Peptidisc in the TPP workflow when dealing with membrane proteins.
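
The percentages above can be read as a fold enrichment: the ATP-binding fraction among differentially stabilized proteins divided by the ATP-binding fraction in the background. The sketch below is illustrative arithmetic on the reported percentages only; it is not part of the authors' pipeline and says nothing about statistical significance.

```python
def fold_enrichment(frac_in_hits: float, frac_in_background: float) -> float:
    """Ratio of the ATP-binding fraction among differentially stabilized
    proteins to the ATP-binding fraction among all quantified proteins."""
    return frac_in_hits / frac_in_background

# Percentages quoted in the response (hits vs. background)
ddm = fold_enrichment(0.064, 0.074)
peptidisc = fold_enrichment(0.17, 0.093)
print(f"DDM: {ddm:.2f}x, Peptidisc: {peptidisc:.2f}x")
```

A ratio below 1 (DDM) means ATP-binding proteins are not over-represented among the hits, whereas a ratio near 1.8 (Peptidisc) indicates they are.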

      In the introduction, the authors make statements such as "..it is widely acknowledged that even mild detergents can disrupt protein structures and activities, leading to challenges in accurately identifying drug targets.." and "[peptidisc] libraries are instrumental in capturing and stabilizing IMPs in their functional states while preserving their interactomes and lipid allosteric modulators...". These need to be rephrased, as numerous studies have shown that, even with membrane proteins suspended in micelles, robust ligand-binding assays and binding-kinetics measurements can be performed, leading to physiologically relevant conclusions and the identification of protein-protein and protein-ligand interactions.

      We thank the reviewer for this valuable feedback and fully agree with the point raised. In response, we have revised the Introduction and conclusion to moderate the language concerning the limitations of detergent use. We now explicitly acknowledge that numerous studies have successfully used detergent micelles for ligand-binding assays and kinetic analyses, yielding physiologically relevant insights into both protein–protein and protein–ligand interactions [e.g., PMID: 22004748, 26440106, 31776188].

      At the same time, we clarify that the Peptidisc method offers a complementary advantage, particularly in the context of thermal proteome profiling (TPP), which involves mass spectrometry workflows that are incompatible with detergents. In this setting, Peptidiscs facilitate the detection of ligand-binding events that may be more difficult to observe in detergent micelles.

      We have reframed our discussion accordingly to present Peptidiscs not as a replacement for detergent-based methods, but rather as a complementary tool that broadens the available methodological landscape for studying membrane protein interactions.

      If the method involves detergent solubilization, for example using 1% DDM, it is a bit disingenuous to argue that 'interactomes and lipid allosteric modulators' characterized by low-affinity interactions will remain intact or can be rescued upon detergent removal. The authors should discuss this or at least highlight the primary caveat of the peptidisc method of membrane protein reconstitution: it begins with detergent solubilization of the proteome and therefore does not completely circumvent the structural perturbations invoked by detergents.

      We would like to clarify that, in our current workflow, ligand incubation occurs after reconstitution into Peptidiscs. As such, the method is designed to circumvent the negative effects of detergent during the critical steps involving low-affinity interactions.

      That said, we fully acknowledge that Peptidisc reconstitution begins with detergent solubilization (e.g., 1% DDM), and we have revised the conclusion to explicitly state this important caveat. As the reviewer correctly points out, this initial step may introduce some structural perturbations or result in the loss of weakly associated lipid modulators.

      However, reconstitution into Peptidiscs rapidly restores a detergent-free environment for membrane proteins, which has been shown in our previous studies [PMID: 38577106, 38232390, 31736482, 31364989] to mitigate these effects. Specifically, we have demonstrated that time-limited DDM exposure, followed by Peptidisc reconstitution, minimizes membrane protein delipidation, enhances thermal stability, retains functionality, and preserves multi-protein assemblies.

      It would also be important to test detergents that are even milder than 1% DDM and ones that are harsher than 1% DDM, to show that this method of reconstitution can indeed rescue the perturbations to membrane protein structure and interactions caused by detergents during the solubilization step.

      We selected 1% DDM based on our previous work [PMID: 37295717, 39313981, 38232390], where it consistently enabled robust and reproducible solubilization for Peptidisc reconstitution. We agree that comparing milder detergents (e.g., LMNG) and harsher ones (e.g., SDC) would provide valuable insights into how detergent strength influences structural perturbations, and how effectively these can be mitigated by Peptidisc reconstitution. Preliminary data (not shown) from mouse liver membranes indicate broadly similar proteomic profiles following solubilization with DDM, LMNG, and SDC, although potential differences in functional activity or ligand binding remain to be investigated.

      Based on the methods provided, the final amount of detergent in the peptidisc membrane protein library appears to be 0.008%, which is ~150 µM. Depending on the NaCl concentration, the CMC of DDM could be between 120 and 170 µM.
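
The reviewer's conversion can be checked with simple arithmetic. This sketch assumes the 0.008% figure is weight/volume and uses the molecular weight of n-dodecyl-β-D-maltoside (C₂₄H₄₆O₁₁, 510.62 g/mol); the 120 to 170 µM window is the CMC range quoted by the reviewer, not an independently measured value.

```python
DDM_MW_G_PER_MOL = 510.62  # n-dodecyl-beta-D-maltoside, C24H46O11

def percent_wv_to_micromolar(percent_wv: float, mw_g_per_mol: float) -> float:
    """Convert a % (w/v) concentration to micromolar (assumes w/v, not w/w)."""
    grams_per_litre = percent_wv * 10.0  # 1% w/v = 1 g per 100 mL = 10 g/L
    return grams_per_litre / mw_g_per_mol * 1e6

ddm_um = percent_wv_to_micromolar(0.008, DDM_MW_G_PER_MOL)
cmc_low_um, cmc_high_um = 120.0, 170.0  # CMC range quoted by the reviewer
in_range = cmc_low_um <= ddm_um <= cmc_high_um
print(f"0.008% DDM ~ {ddm_um:.0f} uM; within quoted CMC range: {in_range}")
```

The result (about 157 µM) falls inside the quoted CMC window, which is why residual DDM at this level could still form micelles.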

      While we cannot entirely rule out the presence of residual DDM (0.008%) in the raw library, its free concentration may be lower than initially estimated. This is related to the formation of mixed micelles with the amphipathic peptide scaffold, which is supplied in excess during reconstitution. These mixed micelles are subsequently removed during the ultrafiltration step. Furthermore, in related work using His-tagged Peptidiscs [PMID: 32364744], we purified the library by nickel-affinity chromatography following a 5× dilution into a detergent-free buffer. Although this purification step reduced the number of soluble proteins, the same membrane proteins were retained, suggesting that any residual detergent does not significantly interfere with Peptidisc reconstitution. Supporting this, our MM-TPP assays on purified libraries (data not shown) consistently demonstrated stabilization of ATP-binding proteins (e.g., SLC27A2, DNAJB3), indicating that the observed ligand–protein interactions result from successful incorporation into Peptidiscs.

      Perhaps, to completely circumvent the perturbations from detergents, other methods of detergent-free solubilization, such as SMA polymers and SMALP reconstitution, could be explored. A comparison of peptidisc reconstitution with such detergent-free extraction strategies could lend more strength to the presented method.

      We agree that detergent-free methods such as SMA polymers hold promise for membrane protein solubilization. However, in preliminary single-replicate experiments using SMA2000 at 51 °C in the presence of ATP–VO₄ (data not shown), we observed broad, non-specific stabilization effects. Of the 2,287 quantified proteins, 9.3% were annotated as ATP-binding, and only 9.9% of the 101 proteins showing a log₂ fold change >1 or <–1 were ATP-binding, indicating no meaningful enrichment. Given this lack of specificity and the limited dataset, we chose not to pursue further SMA experiments and have not included them here. That said, in a recent study (https://doi.org/10.1101/2025.08.25.672181), we directly compared Peptidisc, SMA, and nanodiscs for liver membrane proteome profiling. In that work, Peptidisc outperformed both SMA and nanodiscs in detecting membrane protein dysregulation between healthy and diseased liver. By extension, we expect Peptidisc to offer superior sensitivity and specificity for detecting ligand-induced stabilization events, such as those observed here with ATP–vanadate.
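
Whether 9.9% versus 9.3% constitutes enrichment can be framed as a one-sided hypergeometric test. This sketch is illustrative only: the counts are approximations reconstructed by rounding the reported percentages (about 213 ATP-binding proteins among 2,287 quantified, and about 10 among the 101 hits), and the choice of test is an assumption rather than the analysis stated in the response.

```python
from math import comb

def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)

# Approximate counts reconstructed by rounding the reported percentages
quantified = 2287
atp_binding = round(0.093 * quantified)  # ~213 ATP-binding proteins overall
hits = 101                               # proteins with |log2 FC| > 1
atp_hits = round(0.099 * hits)           # ~10 ATP-binding proteins among hits

expected = hits * atp_binding / quantified
p_value = hypergeom_upper_tail(atp_hits, quantified, atp_binding, hits)
print(f"expected {expected:.1f}, observed {atp_hits}, P(X >= observed) = {p_value:.2f}")
```

With roughly 9.4 ATP-binding hits expected by chance and 10 observed, the upper-tail probability is large, consistent with the statement that there is no meaningful enrichment.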

      Cross-verification of the identified interactions, and subsequent stabilization or destabilizations, should be demonstrated by other in vitro methods of thermal stability and ligand binding analysis using purified protein to support the efficacy of the MM-TPP method. An example cross-verification using SDS-PAGE, of the well-studied MsbA, is shown in Figure 2. In a similar fashion, other discussed targets such as, BCS1L, P2RX4, DgkA, Mao-B, and some un-annotated IMPs shown in supplementary figure 3 that display substantial stabilization or destabilization should be cross-verified.

      We appreciate this suggestion and note that a similar point was raised in R1’s comment “In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundances using either iBAQ or LFQ.” We have developed a detailed response to R1 on this matter, which equally applies here. 

      Cited Reference:

      35616533: Young JW, Wason IS, Zhao Z, et al. Development of a Method Combining Peptidiscs and Proteomics to Identify, Stabilize, and Purify a Detergent-Sensitive Membrane Protein Assembly. J Proteome Res. 2022;21(7):1748-1758. doi:10.1021/acs.jproteome.2c00129. PMID: 35616533.

      31364989: Carlson ML, Stacey RG, Young JW, et al. Profiling the Escherichia coli membrane protein interactome captured in Peptidisc libraries. Elife. 2019;8:e46615. doi:10.7554/eLife.46615. 

      22004748: O'Malley MA, Helgeson ME, Wagner NJ, Robinson AS. Toward rational design of protein detergent complexes: determinants of mixed micelles that are critical for the in vitro stabilization of a G-protein coupled receptor. Biophys J. 2011;101(8):1938-1948. doi:10.1016/j.bpj.2011.09.018.

      26440106: Allison TM, Reading E, Liko I, Baldwin AJ, Laganowsky A, Robinson CV. Quantifying the stabilizing effects of protein-ligand interactions in the gas phase. Nat Commun. 2015;6:8551. doi:10.1038/ncomms9551.

      31776188: Beckner RL, Zoubak L, Hines KG, Gawrisch K, Yeliseev AA. Probing thermostability of detergent-solubilized CB2 receptor by parallel G protein-activation and ligand-binding assays. J Biol Chem. 2020;295(1):181-190. doi:10.1074/jbc.RA119.010696.

      38577106: Jandu RS, Yu H, Zhao Z, Le HT, Kim S, Huan T, Duong van Hoa F. Capture of endogenous lipids in peptidiscs and effect on protein stability and activity. iScience. 2024;27(4):109382. doi:10.1016/j.isci.2024.109382.

      38232390: Antony F, Brough Z, Zhao Z, Duong van Hoa F. Capture of the Mouse Organ Membrane Proteome Specificity in Peptidisc Libraries. J Proteome Res. 2024;23(2):857-867. doi:10.1021/acs.jproteome.3c00825.

      31736482: Saville JW, Troman LA, Duong Van Hoa F. PeptiQuick, a one-step incorporation of membrane proteins into biotinylated peptidiscs for streamlined protein binding assays. J Vis Exp. 2019;(153). doi:10.3791/60661. 

      37295717: Zhao Z, Khurana A, Antony F, et al. A Peptidisc-Based Survey of the Plasma Membrane Proteome of a Mammalian Cell. Mol Cell Proteomics. 2023;22(8):100588. doi:10.1016/j.mcpro.2023.100588. 

      39313981: Antony F, Brough Z, Orangi M, Al-Seragi M, Aoki H, Babu M, Duong van Hoa F. Sensitive Profiling of Mouse Liver Membrane Proteome Dysregulation Following a High-Fat and Alcohol Diet Treatment. Proteomics. 2024;24(23-24):e202300599. doi:10.1002/pmic.202300599. 

      32364744: Young JW, Wason IS, Zhao Z, Rattray DG, Foster LJ, Duong Van Hoa F. His-Tagged Peptidiscs Enable Affinity Purification of the Membrane Proteome for Downstream Mass Spectrometry Analysis. J Proteome Res. 2020;19(7):2553-2562. doi:10.1021/acs.jproteome.0c00022.

      32591519: The M, Käll L. Focus on the spectra that matter by clustering of quantification data in shotgun proteomics. Nat Commun. 2020;11(1):3234. doi:10.1038/s41467-020-17037-3. 

      33188197: Kurzawa N, Becher I, Sridharan S, et al. A computational method for detection of ligand-binding proteins from dose range thermal proteome profiles. Nat Commun. 2020;11(1):5783. doi:10.1038/s41467-020-19529-8.

      26524241: Reinhard FBM, Eberhard D, Werner T, et al. Thermal proteome profiling monitors ligand interactions with cellular membrane proteins. Nat Methods. 2015;12(12):1129-1131. doi:10.1038/nmeth.3652. 

      23828940: Martinez Molina D, Jafari R, Ignatushchenko M, et al. Monitoring drug target engagement in cells and tissues using the cellular thermal shift assay. Science. 2013;341(6141):84-87. doi:10.1126/science.1233606. 

      32133759: Mateus A, Kurzawa N, Becher I, et al. Thermal proteome profiling for interrogating protein interactions. Mol Syst Biol. 2020;16(3):e9232. doi:10.15252/msb.20199232. 

      14755328: Dorsam RT, Kunapuli SP. Central role of the P2Y12 receptor in platelet activation. J Clin Invest. 2004;113(3):340-345. doi:10.1172/JCI20986. 

      Reviewer #1 (Recommendations for the authors):

      “The authors use iBAQ or LFQ to compare across samples. This inconsistency is puzzling. As far as I know, LFQ should always be used when comparing across samples”

      As mentioned above, we use iBAQ only in Fig. 2B to illustrate within-sample relative abundance; all comparative analyses elsewhere use LFQ. We have updated the Fig. 2B legend to state this explicitly.

      We used iBAQ in Fig. 2B because it provides a measure of protein abundance within a sample, normalizing the summed peptide intensities by the number of theoretically observable peptides. This normalization facilitates comparisons between proteins within the same sample, giving a clearer picture of their relative molar proportions [PMID: 33452728]. LFQ, by contrast, is optimized for comparing the same protein across different samples. It achieves this by performing delayed normalization to reduce run-to-run variability and by applying maximal peptide ratio extraction, which integrates pairwise peptide intensity ratios across all samples to build a consistent protein-level quantification matrix [PMID: 24942700]. These features make LFQ more robust to missing values and technical variation, enabling accurate detection of relative abundance changes in the same protein across experimental conditions. This distinction is well supported in the proteomics literature: Smits et al. [PMID: 23066101] used iBAQ specifically to determine the relative abundance of proteins within one sample, whereas LFQ was applied for comparative analyses between conditions.
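
The iBAQ normalization described above can be sketched in a few lines. This is a simplified illustration, not MaxQuant's implementation: it assumes a plain trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and a 6 to 30 residue window for "theoretically observable" peptides. LFQ (MaxLFQ) involves cross-run normalization and pairwise ratio extraction and is not sketched here.

```python
import re

def tryptic_peptides(sequence: str, min_len: int = 6, max_len: int = 30) -> list[str]:
    """Simplified in-silico trypsin digest: cleave after K/R unless followed
    by P, no missed cleavages, keep peptides inside the length window."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]

def ibaq(summed_peptide_intensity: float, sequence: str) -> float:
    """Summed peptide intensity divided by the number of theoretically
    observable peptides for the protein sequence."""
    n_observable = len(tryptic_peptides(sequence))
    if n_observable == 0:
        raise ValueError("no theoretically observable peptides")
    return summed_peptide_intensity / n_observable

toy = "AAAAAAK" + "CCCCCCCR" + "DDDDDDDK"  # hypothetical sequence
print(tryptic_peptides(toy))  # ['AAAAAAK', 'CCCCCCCR', 'DDDDDDDK']
print(ibaq(3.0e6, toy))       # 1000000.0
```

Because the divisor depends only on the sequence, iBAQ values are comparable between proteins within one run, which is the within-sample use described for Fig. 2B.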

      “[Regarding Figure 2A] Why does the control also contain ATP-vanadate? Also, I am not aware of a commercially available chemical "ATP-VO4". I assume this is a mistake”

      The control condition in Figure 2A was mislabeled, and the figure has been corrected to remove this discrepancy. In our experiments, ATP and orthovanadate (VO₄) were added together, and for simplicity this was annotated as “ATP-VO₄.”

      “[Regarding Figure 2B] What is the fold change in MsbA iBAQ values? It seems that the differences are quite small, and as such require a more quantitative approach than iBAQ (e.g SILAC or some other internal standard). In addition, what information does this panel add relative to 2C”

      The figure has been updated to clarify that the values shown are log₂-transformed iBAQ intensities. Figures 2B and 2C are complementary: Figure 2B shows that, in the control sample, MsbA’s peptide abundance decreases at increasing temperatures (51, 56, and 61 °C) relative to the bulk of the remaining proteins. Figure 2C shows the specific thermal profiles of MsbA in control and ATP–vanadate conditions. To make this clearer, we have added a sentence to the Results section explaining the specific role of Figure 2B.

      Together, these panels indicate that the method can identify ligand-induced stabilization even for proteins whose abundance decreases faster than the bulk during the TPP assay. We have provided the rationale for not using SILAC or TMT labeling in our public response.

      “[Regarding Figure 2C] Although not mentioned in the legend, I assume this is iBAQ quantification, which as mentioned above isn't accurate enough for such small differences. In addition, I find this data confusing: why is MsbA more stable at the lower temperatures in the absence of ATP-vanadate? The smoothed-line representation is misleading, certainly given the low number of data points”

      The data presented represent LFQ values for MsbA, and we have updated the figure legend to clearly indicate this. Additionally, as suggested, we have removed the smoothing line to more accurately reflect the data. Regarding the reviewer’s concern about stability at lower temperatures, we note that MsbA exhibits comparable abundance at 38 °C and 46 °C under both conditions, with overlapping error bars. We therefore interpret these data as indicating no significant difference in stability at the lower temperatures, with ligand-dependent stabilization becoming apparent only at elevated temperatures. We do not exclude the possibility that MsbA stability at these temperatures is affected by the conformational dynamics of this ABC transporter upon ATP binding and hydrolysis.

      “[Regarding Figure 3A] is this raw LFQ data? Why did the authors suddenly change from iBAQ to LFQ? I find this inconsistency puzzling”

      To clarify, all analyses of protein stabilization or destabilization presented in the manuscript are based on LFQ values. The only instance where iBAQ was used is Figure 2B, where it served to illustrate the relative peptide abundance of MsbA within the same sample. We have revised the figure legends and text to make this distinction explicit and ensure consistency in presentation.

      “[Regarding Figure 3B] The non-specific ATP-dependent stabilization increases the likelihood of false positive hits. This limitation is not mentioned by the authors. I think it is important to show other small molecules, in addition to ATP. The authors suggest that their approach is highly relevant for drug screening. Therefore, a good choice is to test an effect of a known stabilizing drug (eg VX-809 and CFTR)”

      We thank the reviewer for this suggestion. As noted in the manuscript (Results and Discussion sections), ATP is a natural hydrotrope and is therefore expected to induce broad, non-specific stabilization effects, a phenomenon also observed in previous proteome-wide studies, which demonstrated ATP’s widespread influence on cytosolic protein solubility and thermal stability (PMID: 30858367). To demonstrate that MM-TPP can resolve specific ligand–protein interactions beyond these global ATP effects, we tested 2-methylthio-ADP (2-MeS-ADP), a selective agonist of P2RY12 (PMID: 14755328). In these experiments, we observed robust and reproducible stabilization of P2RY12 at both 51 °C and 57 °C, with no consistent stabilization of unrelated proteins across temperatures. This provides direct evidence that our workflow can distinguish specific from non-specific ligand-induced effects. We selected 2-MeS-ADP for its structural stability and its higher affinity for the receptor compared with ADP, allowing us to extend our existing workflow while testing a receptor-specific interaction. We agree that extending this approach to clinically relevant small-molecule drugs, such as VX-809 with CFTR, would further underscore the pharmacological potential of MM-TPP, and we have now noted this as an important avenue for future studies.

      “X axis of Figure 3B: Log 2 fold difference of what? iBAQ? LFQ? Similar ambiguity regarding the Y axis of 3E. What peptide? And why the constant changes in estimating abundances?”

      We thank the reviewer for pointing out these inaccuracies in the figure annotations. As mentioned above, all analyses (except Figure 2B) are based on LFQ values. We have revised the figure legends and text to make this clear.

      In Figure 3E, “peptide intensity” refers to log2 LFQ peptide intensities derived from the BCS1L protein, as indicated in the figure caption. 

      “The authors suggest that P2RY6 and P2RY12 are stabilized by ADP, the hydrolysis product of ATP. Currently, the support for this suggestion is highly indirect. To support this claim, the authors need to directly show the effect of ADP. In reference to the alpha fold results shown in Figure 4D, the authors state that "Collectively, these data highlight the ability of MM-TPP to detect the side effects of parent compounds, an important consideration for drug development". To support this claim, it is necessary to show that Mao-B is indeed best stabilized with ADP or AMP, rather than ATP.”

      In this revision, we chose not to test ADP directly, as it is a broadly binding, relatively weak ligand that would likely stabilize many proteins without revealing clear target-specific effects. Since we had already evaluated ATP-VO₄, a similarly broad, non-specific ligand, additional testing with ADP would provide limited additional insight. Instead, we prioritized 2-methylthio-ADP, a selective agonist of P2RY12, to more effectively demonstrate the specificity of MM-TPP. With this ligand, we observed clear and reproducible stabilization of P2RY12, underscoring the ability of MM-TPP to resolve receptor–ligand interactions beyond ATP’s broad hydrotropic effects. Importantly, and as expected, we did not observe stabilization of the related purinergic receptor P2RY6, further supporting the specificity of the observed effect.

      We have also revised the AlphaFold-related statement in Figure 4D to adopt a more cautious tone: “Collectively, these data suggest that MM-TPP may detect potential side effects of parent compounds, an important consideration for drug development.” In this context, we use AlphaFold not as a validation tool, but rather as a structural aid to help rationalize why certain off-target proteins (e.g., ATP with Mao-B) exhibit stabilization.

      Reviewer #2 (Recommendations for the authors):

      “In the main text, it will be useful to include the unique peptides table of at least the targets discussed in the manuscript. For example, in presence of AMP-PNP at 51 °C P2RY6 shows 4-6 peptides in all n=3 positive & negative ionization modes. But, for P2RY12 only 1-3 peptides were observed. Depending on the sequence length and the relative abundance in the cell of a protein of interest, the number of peptides observed could vary a lot per protein. Given the unique peptide abundance reported in the supplementary file, for various proteins in different conditions, it appears the threshold of observation of two unique peptides for a protein to be analyzed seems less stringent.”

      By applying a filter requiring at least two unique peptides in at least one replicate, we exclude, on average, 15–20% of the total identified proteins. We consider this a reasonable level of stringency that balances confidence in protein identification with the retention of relevant data. This threshold was selected because it aligns with established LC-MS/MS data analysis practices (PMID: 32591519, 33188197, 26524241), and we have included these references in the Methods section to justify our approach. We have included in this revision a Supplemental Table 2 showing the unique peptide counts for proteins highlighted in this study.  
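
The filtering rule stated above (at least two unique peptides in at least one replicate) reduces to a one-line predicate. The protein names and counts below are hypothetical placeholders, not values from Supplemental Table 2.

```python
def passes_unique_peptide_filter(unique_counts_per_replicate, min_unique=2):
    """Keep a protein if at least one replicate has >= min_unique unique peptides."""
    return any(count >= min_unique for count in unique_counts_per_replicate)

# Hypothetical unique-peptide counts per replicate, for illustration only
proteins = {
    "protein_A": [1, 3, 2],
    "protein_B": [1, 1, 0],
    "protein_C": [0, 0, 2],
}
kept = [name for name, counts in proteins.items() if passes_unique_peptide_filter(counts)]
excluded_fraction = 1 - len(kept) / len(proteins)
print(kept)  # ['protein_A', 'protein_C']
```

Note that "in at least one replicate" is more permissive than requiring the threshold in every replicate, which is the point of the reviewer's stringency concern.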

      “It appears that the time of heat treatment for peptidisc library subjected to MM-TPP profiling was chosen as 3 min based on the results presented in Supplementary Figure 1A, especially the loss of MsbA observed in 1% DDM after 3 min heat perturbation. However, when reconstituted in peptidisc there seems to be no loss in MsbA even after 12 mins at 45 °C. So, perhaps a longer heat treatment would be a more efficient perturbation.”

      Previous studies indicate that heat exposure of 3–5 minutes is optimal for visualizing protein denaturation (PMID: 23828940, 32133759). We have added a statement to the Results section to justify our choice of heat exposure. Although MsbA remains stable at 45 °C for extended periods, higher temperatures allow for more effective perturbation to reveal destabilization. Supplementary Figure 1A specifically illustrates MsbA instability in detergent environments.

      “Some of the stabilized temperatures listed in Table 1 are a bit confusing. For example, ABCC3 and ABCG2. In the case of ABCC3 stabilization was observed at 51 °C and 60 °C, but 56 °C is not mentioned. In the same way, 51 °C is not mentioned for ABCG2. You would expect protein to be stabilized at 56 °C if it is stabilized at both 51 °C and 60 °C. So, it is unclear if the stabilizations were not monitored for these proteins at the missing temperatures in the table or if no peptides could be recorded at these temperatures as in the case of P2RX4 at 60 °C in Figure 4C.”

      Both scenarios are represented in our data. For some proteins, like ABCG2, sufficient peptide coverage was achieved, but no stabilization was observed at intermediate temperatures (e.g., 56 °C), likely because the perturbation was not strong enough to reveal an effect. In other cases, such as ABCC3 at 56 °C or P2RX4 at 60 °C, the proteins were not detected due to insufficient peptide identifications at those temperatures, which explains their omission from the table. 

      “In Figure 4C, it is perplexing to note that despite n = 3 there were no peptide fragments detected for P2RX4 at 60 °C in presence of ATP-VO4, but they were detected in presence of AMP-PNP. It will be useful to learn the authors’ explanation for this, especially because both of these ligands destabilize P2RX4. In Figure 4B, it would have been great to see the effect of ADP too, to corroborate the theory that ATP metabolites could impact the thermal stability.”

      In Figure 4C, the absence of P2RX4 peptide detection at 60 °C with ATP–VO₄ mirrors variability observed in the corresponding control (n = 6). Specifically, neither the control nor ATP–VO₄ produced unique peptides for P2RX4 at 60 °C in that replicate, whereas peptides were detected at 60 °C in other replicates for both the control and AMP-PNP, and at 64 °C for ATP–VO₄, the controls, and AMP-PNP. Such missing values are a natural feature of MS-based proteomics and can arise from multiple technical factors, including inconsistent heating, incomplete digestion, stochastic MS injection, or interference from Peptidisc peptides. We therefore interpret the absence of peptides in this replicate as a technical artifact rather than evidence against protein destabilization. Importantly, the overall dataset consistently shows that both ATP–VO₄ and AMP-PNP destabilize P2RX4, supporting their characterization as broad, non-specific ligands with off-target effects.

      Because ATP and ADP belong to the same class of broadly binding, non-specific ligands, additional testing with ADP would not provide meaningful mechanistic insight. Instead, we chose to test 2-methylthio-ADP, a selective P2RY12 agonist. This experiment revealed robust, reproducible stabilization of P2RY12, without consistent effects on unrelated proteins at 51 °C and 57 °C, thereby demonstrating the ability of MM-TPP to detect specific receptor–ligand interactions.

      Finally, we note that P2RX4 is not a primary target of ATP–VO₄ or AMP-PNP. Consequently, the observed destabilization of P2RX4 is expected to be less pronounced than the strong, physiologically consistent stabilization of ABC transporters by ATP–VO₄, as shown in Figure 3D, where the majority of ABC transporters are thermally stabilized across all tested temperatures.

      “As per Figure 4, P2Y receptors P2RY6 and P2RY12 both showed great thermal stability in presence of ATP-VO4 despite their preference for ADP. The authors argue this could be because of ATP metabolism, and binding of the resultant ADP to the P2RY6. If P2RX4 prefers ATP and not the metabolized product ADP that apparently is available, ideally you should not see a change in stability. A stark destabilization would indicate interaction of some sorts. P2X receptors are activated by ATP and are not naturally activated by AMP-PNP. So, destabilization of P2RX4 upon binding to ATP that can activate P2X receptors is conceivable. However, destabilization both in presence of ATP-VO4 and AMP-PNP is unclear. It is perhaps useful to test effect of ADP using this method, and maybe even compare some antagonists such as TNP-ATP.”

      In this study, we did not directly test ADP, as we had already demonstrated that MM-TPP detects stabilization by broad-binding ligands such as ATP–VO₄. Instead, we focused on a more selective ligand, 2-MeS-ADP, a specific agonist of P2RY12 [PMID: 14755328]. Here, we observed robust and reproducible stabilization of P2RY12 at 51 °C and 57 °C, while P2RY6 showed no significant changes, and no other proteins were consistently stabilized (Figure 4B, S4). This confirms that MM-TPP can distinguish specific ligand–receptor interactions from broader ATP-induced effects. To further explore the assay’s nuance and sensitivity, testing additional nucleotide ligands—including antagonists like TNP-ATP or ATPγS—would provide valuable insights, and we have identified this as an important future direction.

    1. eLife Assessment

      This valuable study reports the physiological function of a putative transmembrane UDP-N-acetylglucosamine transporter called SLC35G3 in spermatogenesis. The conclusion that SLC35G3 is a new and essential factor for male fertility in mice and probably in humans is supported by convincing data. This study will be of interest to reproductive biologists and physicians working on male infertility.

    2. Reviewer #2 (Public review):

      Summary:

      This study characterized the function of SLC35G3, a putative transmembrane UDP-N-acetylglucosamine transporter, in spermatogenesis. They showed that SLC35G3 is testis-specific and expressed in round spermatids. Slc35g3-null males were sterile, but females were fertile. Slc35g3-null males produced a normal sperm count, but sperm showed subtle head morphology abnormalities. Sperm from Slc35g3-null males have defects in uterotubal junction passage, ZP binding, and oocyte fusion. Loss of SLC35G3 causes abnormal processing and glycosylation of a number of sperm proteins in the testis and sperm. They demonstrated that SLC35G3 functions as a UDP-GlcNAc transporter in cell lines. Two human SLC35G3 variants impaired its transporter activity, implicating these variants in human infertility.

      Strengths:

      This study is thorough. The mutant phenotype is strong and interesting. The major conclusions are supported by the data. This study demonstrated SLC35G3 as a new and essential factor for male fertility in mice, which is likely conserved in humans.

      Weaknesses:

      Some data interpretations needed to be revised. These have been adequately addressed in the revised manuscript.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the present manuscript, Mashiko and colleagues describe a novel phenotype associated with deficient SLC35G3, a testis-specific sugar transporter that is important in glycosylation of key proteins in sperm function. The study characterizes a knockout mouse for this gene and the multifaceted male infertility that ensues. The manuscript is well-written and describes novel physiology through a broad set of appropriate assays.

      Strengths:

      Robust analysis with detailed functional and molecular assays

      Weaknesses:

      (1) The abstract references reported mutations in human SLC35G3, but this is not discussed or correlated to the murine findings to a sufficient degree in the manuscript. The HEK293T experiments are reasonable and add value, but a more detailed discussion of the clinical phenotype of the known mutations in this gene and whether they are recapitulated in this study (or not) would be beneficial.

      Because no patients carrying SLC35G3 mutations have been identified, our experiments were designed to assess the transporter activity of the variants found in humans.

      (2) Can the authors expand on how this mutation causes such a wide array of phenotypic defects? I am surprised there is a morphological defect, a fertilization defect, and a transit defect. Do the authors believe all of these are present in humans as well?

      Thank you for your comment. Many glycoprotein-coding genes that influence sperm head morphology, fertilization, and sperm transit have been identified in knockout mouse studies, and most of these are conserved in humans. Therefore, we believe that glycan modification by SLC35G3 is also involved in the regulation of human sperm.

      Reviewer #2 (Public review):

      Summary:

      This study characterized the function of SLC35G3, a putative transmembrane UDP-N-acetylglucosamine transporter, in spermatogenesis. They showed that SLC35G3 is testis-specific and expressed in round spermatids. Slc35g3-null males were sterile, but females were fertile. Slc35g3-null males produced a normal sperm count, but sperm showed subtle head morphology abnormalities. Sperm from Slc35g3-null males have defects in uterotubal junction passage, ZP binding, and oocyte fusion. Loss of SLC35G3 causes abnormal processing and glycosylation of a number of sperm proteins in the testis and sperm. They demonstrated that SLC35G3 functions as a UDP-GlcNAc transporter in cell lines. Two human SLC35G3 variants impaired its transporter activity, implicating these variants in human infertility.

      Strengths:

      This study is thorough. The mutant phenotype is strong and interesting. The major conclusions are supported by the data. This study demonstrated SLC35G3 as a new and essential factor for male fertility in mice, which is likely conserved in humans.

      Weaknesses:

      Some data interpretations need to be revised.

      Thank you for your comments. We have revised the interpretations accordingly.

      Reviewer #1 (Recommendations for the authors):

      (1) The introduction could be structured more efficiently. Much of what is discussed in the first paragraph appears to be redundant to the second paragraph (or perhaps unrelated to the present manuscript).

      In the Introduction, we described the process of glycoprotein formation: 1) quality control of nascent glycoproteins in the ER and its importance for sperm fertilizing ability, 2) glycan maturation in the Golgi apparatus and its importance for sperm fertilizing ability, and 3) the supply of nucleotide sugars that underlies these processes.

      We would like to retain this structure in the revised manuscript and appreciate your understanding.

      (2) Given the significant difference in morphology between murine and human sperm, can the authors comment on whether these findings are directly translatable to humans?

      Thank you for your comment. There are significant differences in sperm morphology between mice and humans, but many glycoprotein-coding genes that influence sperm head morphology have been identified in knockout mouse studies, and most of these are conserved in humans. Therefore, we believe that glycan modification by SLC35G3 is also involved in the regulation of human sperm head morphology. Observing sperm samples from individuals with SLC35G3 mutations is the most direct approach to verify this point and is considered an important goal for future research. The following text has been added to clarify the point:

      New Line 338; While these proteins are also found in humans, it is still too early to infer the importance of SLC35G3 in the morphogenesis of human sperm heads. Observing sperm samples from individuals with SLC35G3 mutations would be the most direct approach to address this, and we consider it an important objective for future studies.

      (3) Line 194 - while the inability to pass the UTJ may indeed be a component of this infertility phenotype, I would argue that a complete lack of ability to fertilize (even with IVF but not ICSI) suggests that the primary defect is elsewhere. This statement should be removed, and the topic of these two separate mechanisms should be compared/contrasted in the discussion.

      We agree that this is an overstatement and have revised it as follows:

      New line 187; Thus, the defective UTJ migration is one of the primary causes of Slc35g3-/- male infertility. 

      We believe the current statement in the discussion can stay as it is. 

      Line 379; We reaffirmed that glycosylation-related genes specific to the testis play a crucial role in the synthesis, quality control, and function of glycoproteins on sperm, which are essential for male fertility through their interactions with eggs and the female reproductive system.

      (4) Did the authors consider performing TEM to assess the sperm ultrastructure and the acrosome?

      Because morphological abnormalities were already evident by light microscopy, TEM was not performed in this study. In the future, we plan to use immuno-TEM against affected and non-affected glycoproteins once suitable antibodies become available.

      (5) I would argue that Figure 3 should not be labeled as "essential", given the differences in sperm head morphology between mice and humans, the relatively modest difference between the groups on PCA, and, more broadly, the relatively poor correlation between morphology and human male infertility. While globozoospermia is clearly an exception, the data in this figure may not translate to human sperm and/or may not be clinically relevant even if it does.

      Indeed, other KO mice whose spermatozoa show similar morphological features are known to have reduced litter sizes but are not completely infertile. As discussed in line 1, this head shape is not essential for fertilization. Reviewer 2 also pointed out that the phrase "Slc35g3 is essential for sperm head formation" is too strong; therefore, we have revised the Figure 3 title to "Slc35g3 is involved in the regulation of sperm head morphology."

      (6) Have the authors generated Slc35b4 KO mice?

      No, we did not. Because Slc35b4 is expressed throughout the body, a conventional knockout may affect other organs or developmental processes. To investigate its role specifically in the testis, it will be necessary to generate a conditional knockout (cKO) model. As this requires considerable cost, time, and labor, we would like to leave it for future investigation.

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 122-123: "it is prominently expressed in the testis, beginning 21 days postpartum (Figure 1B), suggesting expression from the secondary spermatocyte stage to the round spermatid stage in mice." Day 21 indicates the first appearance of round spermatids, but not secondary spermatocytes. Please change to the following: ...suggesting that its expression begins in round spermatids in mice.

      We agree with your comment and have revised the text accordingly (New line 114).

      (2) Figure 1E: What germ cells are they? The type of germ cells needs to be labelled on the image. Double staining with a germ cell marker would be helpful to distinguish germ cells from testicular somatic cells.

      Thank you for your comment. We have replaced Figure 1E.

      To distinguish germ cells from testicular somatic cells, we used the germ cell marker TRA98 antibody. Furthermore, based on the nuclear and GM130 staining patterns, we consider that the labeled structure is the Golgi apparatus of round spermatids.

      (3) Figure 2C: The most abundant WB band is between 20 and 25 kD and is non-specific. Does the arrow point to the expected SLC35G3 band? There are two minor bands above the main non-specific band. Are both bands specific to SLC35G3? Given the strong non-specific band on WB, how specific is the immunofluorescence signal produced by this antibody? These need to be explained and discussed.

      The arrow points to the band at the expected size (35 kDa).

      We suspected that these non-specific bands could be due to blood contamination, so we repeated the analysis using testicular germ cells and confirmed that the non-specific bands disappeared in the subsequent Western blot. The specificity of the immunofluorescence signal is supported by its complete absence in the KO, as shown in the Supplementary Figures. We have decided to include this improved dataset. Thank you for your comment, which helped us improve the data.

      Author response image 1.

      (4) Line 184: "Slc35g3-/--derived sperm have defects in ZP binding and oolemma fusion ability, but genomic integrity is intact." Producing viable offspring does not necessarily mean that genomic integrity is intact. Suggestion: Slc35g3-/--derived sperm have defects in ZP binding and oolemma fusion ability but produce viable offspring. Likewise, the Figure S9 caption also needs to be changed.

      Thank you for your constructive comment. We have revised the text as you suggested.

      (5) Figure 3. "Slc35g3 is essential for sperm head formation". This statement is too strong. It is not essential for sperm head formation. The sperm head is still formed, but shows subtle deformation.

      Thank you for your suggestion. We have changed the title as follows:

      Figure 3: "Slc35g3 is involved in the regulation of sperm head morphology."

      (6) Lines 204-205: Figure 6B: "Interestingly, some bands of sperm acrosome-associated 1 (SPACA1; 26) disappeared in Slc35g3-/- testis lysates." I don't see the absence of SPACA1 bands in -/- testis. This needs to be clearly labeled with arrows. On the contrary, the bands are stronger in Slc35g3-/- testis lysates.

      Thank you for your comment. After carefully considering your comments, we concluded that using "disappeared" is indeed inappropriate. We would like to revise the sentence as follows: New line 197; "Interestingly, SPACA1 (Sperm Acrosome Associated 1; 26) exhibited a subtle difference in banding pattern in the Slc35g3-/- testis lysate."

    1. eLife Assessment

      This study reports important negative results, showing that genetically removing the RNA-binding protein PTBP1 in astrocytes is insufficient to convert them into neurons, thereby challenging previous claims in the field. It also offers a compelling analysis of PTBP1's role in regulating astrocyte-specific splicing. The evidence is strong, as the experiments are technically sound, carefully controlled, and supported by both imaging and transcriptomic analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated, using fluorescence imaging and bulk and single-cell RNA-sequencing, whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to the broad readership of eLife.

      Original weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However, assuming that the coverage plots are CPM-normalized, the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

      Point 1 has been successfully addressed in the revision by providing relevant references/discussion. Points 2-4 were addressed by including additional data/analyses.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated whether the deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments, including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field.

      To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      My concerns in the previous review have been addressed satisfactorily.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated, using fluorescence imaging and bulk and single-cell RNA-sequencing, whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to a broad readership.

      Weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However, assuming that the coverage plots are CPM-normalized, the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      We thank the reviewer for raising this important point. Indeed, the deletion of exon 2 introduces a frameshift that is predicted to disrupt the PTBP1 open reading frame and trigger nonsense-mediated decay (NMD). While our CPM-normalized coverage plots (Figure 4D) and gene-level expression analysis (Figure 6A) suggest that PTBP1 mRNA levels remain largely unchanged in cKO astrocytes, we acknowledge that this observation is counterintuitive and merits further clarification.

      We suspect that brain tissue dissociation and FACS for bulk or single-cell RNA-seq may enrich for nuclear material and thus dilute the NMD signal, because NMD occurs in the cytoplasm. Alternatively, the transcripts, like those of other genes, may escape NMD through unknown mechanisms. Although a frameshift is a strong indicator of NMD, it does not guarantee that NMD will occur in every case. (lines 346-353)

      Regarding the validation of PTBP1 protein depletion in cKO astrocytes by Western blotting, we acknowledge that orthogonal approaches to confirm PTBP1 elimination would address uncertainty about the effect of exon 2 deletion on PTBP1 expression. However, the low yield of cKO astrocytes via FACS makes it very difficult to obtain sufficient material for immunoblotting detection of PTBP1 depletion: on average, 3-5 adult animals per genotype (with three different alleles) are needed for each biological replicate. The manuscript contains PTBP1 immunofluorescence staining of brain sections demonstrating PTBP1 deletion (Figures 1-2, Figure 3-figure supplement 1). Our characterization of this Ptbp1 deletion allele in other contexts shows the loss of full-length PTBP1 protein in ESCs by Western blotting (PMID: 30496473). Furthermore, germline homozygous mutant mice do not survive beyond embryonic day 6, supporting that it is a loss-of-function allele.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      We thank the reviewer for raising this important question. Although we are using an astrocyte-specific PTBP1 knockout (KO) mouse model, which is designed to delete PTBP1 in all astrocytes throughout the mouse brain, and although we have systematically verified PTBP1 elimination in different brain regions (cortex and striatum) at multiple time points (from 4 to 12 weeks after tamoxifen administration), we agree that it remains necessary and important to demonstrate whether the observed lack of astrocyte-to-neuron conversion is indeed associated with sufficient PTBP1 depletion.

      We have analyzed PTBP1 expression in the substantia nigra, as we did in the cortex and striatum, and added a new figure (Figure 3-figure supplement 1) to show the results. In cKO samples, tdT+ cells lack PTBP1 immunostaining, and there is no overlap between NeuN+ and tdT+ signals. These results show effective PTBP1 depletion in the substantia nigra, similar to that observed in the cortex and striatum. (lines 221-224)

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      We thank the reviewer for this insightful comment. We agree that assessing the positional distribution of CU-rich motifs between PTBP1-activated and PTBP1-repressed exons would provide valuable insight into the position-specific regulatory mechanisms of PTBP1. In response, we have performed separate motif enrichment analyses for PTBP1-activated and PTBP1-repressed exons and examined whether their positional patterns differ (Figure 4–figure supplement 2).

      Our analysis revealed that CU-rich motifs were significantly enriched in the upstream introns of both activated and repressed exons following PTBP1 loss, with higher enrichment observed for repressed exons (Enrichment ratio = 2.14, q = 9.00×10⁻⁵) than for activated exons (Enrichment ratio = 1.72, q = 7.75×10⁻⁵) (Figure 4–figure supplement 2B–C). In contrast, no CU-rich motifs were found downstream of activated exons (Figure 4–figure supplement 2D), while a weak, non-significant enrichment was observed downstream of repressed exons (Enrichment ratio = 1.21, q = 0.225; Figure 4–figure supplement 2E). These results do not fully agree with some earlier PTBP1 CLIP studies showing differential PTBP1 binding at repressed vs. activated exons, but are more in line with the Black Lab study (PMID: 24499931) showing that PTBP1 binds upstream introns of both repressed and activated exons. In either case, PTBP1 affects a diverse set of alternative exons and likely involves diverse context-dependent binding patterns (lines 244-257).

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

      We would like to express our gratitude for the thoughtful feedback. We agree that transcriptome-wide differences in gene expression between astrocytes and developing neurons could confound the interpretation of splicing differences. To address this concern, we have incorporated publicly available RNA-seq datasets from studies in which astrocytes are reprogrammed into neurons using proneural transcription factors, Ngn2 or PmutNgn2 (PMID: 38956165).

      The results of principal component analysis (PCA) for splicing profiles revealed that the in vivo splicing profiles from this study and the in vitro splicing profiles from PMID 38956165 are well separated on PC1 and PC2. While Ngn2/PmutNgn2-induced neurons and control astrocytes started to show distinction on PC3 (and to some degree on PC4), Ptbp1 cKO samples remained tightly grouped with control astrocytes and showed no directional shift toward the neuronal cluster (Figure 5–figure supplement 2B). These findings further support the conclusion that PTBP1 depletion in mature astrocytes does not induce a neuronal-like splicing program, even when compared against neurons derived from the astrocyte lineage (lines 306-318).

      The pairwise correlation analysis of percent spliced in (PSI) values among Ptbp1 cKO astrocytes, control astrocytes, and induced neurons confirmed that Ptbp1 cKO astrocytes are highly similar to control astrocytes (ρ = 0.81) and clearly distinct from induced neurons (ρ = 0.62) (Figure 5–figure supplement 2C), reinforcing the notion that PTBP1 loss alone is insufficient to drive a neuronal-like splicing transition (lines 319-336).

      Consistent with the splicing analysis, PCA of gene expression profiles showed that control and Ptbp1 cKO astrocytes clustered tightly together, with no directional shift toward the neuronal cluster, whereas Ngn2/PmutNgn2-induced neurons and control astrocytes were distributed across a broader range (Figure 6–figure supplement 1A–B). Correlation analysis further supported this result, with a strong similarity between Ptbp1 cKO and control astrocytes (ρ = 0.97) and low similarity between Ptbp1 cKO astrocytes and induced neurons (ρ = 0.27) (Figure 6–figure supplement 1C). These findings indicate that, even with PTBP1 loss, cKO astrocytes retain a transcriptional profile very distinct from that of neurons, underscoring that Ptbp1 deficiency alone does not induce astrocyte-to-neuron reprogramming at the transcriptomic level (lines 366-373).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated whether the deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments, including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field.

      To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      Weaknesses:

      The Discussion in this manuscript is short and can be expanded. Can the authors speculate on what led to the contradictory results in the published studies? The current study, in combination with the study published in Cell in 2021 by Wang and colleagues, suggests that the observed discrepancy is not caused by the difference between knockdown and knockout. Is it possible that other glial cell types are responsible for the transition? If so, what cells? Oligodendrocytes?

      We are grateful for the reviewer’s careful reading and valuable suggestions. We have expanded the Discussion to address which glial cell types might be responsible for the reported neuronal transition (lines 441-461).

      Reviewer #1 (Recommendations for the authors):

      (1) Throughout the text and figures, it is customary to write loxP with a capital "P".

      We have capitalized “P” in loxP throughout the text and figures.

      (2) It would be helpful to indicate the brain regions analyzed above the images in Figure 1B-C, Figure 2A-B, Figure 1 - Supplement 3, and Figure 2 - Supplement 2, as was done in Figure 1 - Supplement 1.

      The labels indicating brain regions of corresponding images have been added to the figures. 

      (3) The arrowheads in Figure 1C, Figure 2B, Figure 3, and several supplemental panels are nearly equilateral triangles, making their direction difficult to discern. Consider using a more slender or indented design (e.g., ➤).

      We have replaced triangular arrowheads with indented arrowheads in the figures. 

      (4) Lines 181-209: This section should be revised, given that the striatum is not a midbrain structure.

      We have revised this section to reflect our analysis of the striatum as a brain region of the nigrostriatal pathway rather than a midbrain structure. 

      Reviewer #2 (Recommendations for the authors):

      In Supplemental Figure 1, the two open triangles are almost indistinguishable. It would be better if the colors of these open triangles were changed so that it is easier to tell what's what. There is not enough contrast between white and yellow.

      We have changed the open triangle arrowheads to solid yellow and violet arrowheads to improve contrast between labels.

    1. eLife Assessment

      This computational study examines how neurons in the songbird premotor nucleus HVC might generate the precise, sparse burst sequences that drive adult song. The findings would be useful for understanding how intrinsic conductances and HVC microcircuitry may produce neural sequences, but the work is incomplete because of arbitrary network assumptions, insufficient consideration of biological details such as how silent gaps in song sequences are represented, and failure to incorporate interactions with auditory and brainstem inputs. As a result, the study offers only a modest conceptual advance over prior models.

    2. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use numerical simulations to try to understand better a major experimental discovery in songbird neuroscience from 2002 by Richard Hahnloser and collaborators. The 2002 paper found that a certain class of projection neurons in the premotor nucleus HVC of adult male zebra finch songbirds, the neurons that project to another premotor nucleus RA, fired sparsely (once per song motif) and precisely (to about 1 ms accuracy) during singing.

      The experimental discovery is important to understand since it initially suggested that the sparsely firing RA-projecting neurons acted as a simple clock that was localized to HVC and that controlled all details of the temporal hierarchy of singing: notes, syllables, gaps, and motifs. Later experiments suggested that the initial interpretation might be incomplete: that the temporal structure of adult male zebra finch songs instead emerged in a more complicated and distributed way, still not well understood, from the interaction of HVC with multiple other nuclei, including auditory and brainstem areas. So at least two major questions remain unanswered more than two decades after the 2002 experiment: What is the neurobiological mechanism that produces the sparse precise bursting: is it a local circuit in HVC or is it some combination of external input to HVC and local circuitry? And how is the sparse precise bursting in HVC related to a songbird's vocalizations?

      The authors only investigate part of the first question, whether the mechanism for sparse precise bursts is local to HVC. They do so indirectly, by using conductance-based Hodgkin-Huxley-like equations to simulate the spiking dynamics of a simplified network that includes three known major classes of HVC neurons and such that all neurons within a class are assumed to be identical. A strength of the calculations is that the authors include known biophysically deduced details of the different conductances of the three major classes of HVC neurons, and they take into account what is known, based on sparse paired recordings in slices, about how the three classes connect to one another. One weakness of the paper is that the authors make arbitrary and not-well-motivated assumptions about the network geometry, and they do not use the flexibility of their simulations to study how their results depend on their network assumptions. A second weakness is that they ignore many known experimental details such as projections into HVC from other nuclei, dendritic computations (the somas and dendrites are treated by the authors as point-like isopotential objects), the role of neuromodulators, and known heterogeneity of the interneurons. These weaknesses make it difficult for readers to know the relevance of the simulations for experiments and for advancing theoretical understanding.

      Strengths:

      The authors use conductance-based Hodgkin-Huxley-like equations to simulate spiking activity in a network of neurons intended to model more accurately songbird nucleus HVC of adult male zebra finches. Spiking models are much closer to experiments than models based on firing rates or on 2-state neurons.

      The authors include information deduced from modeling experimental current-clamp data such as the types and properties of conductances. They also take into account how neurons in one class connect to neurons in other classes via excitatory or inhibitory synapses, based on sparse paired recordings in slices by other researchers.

      The authors obtain some new results of modest interest such as how changes in the maximum conductances of four key channels (e.g., A-type K+ currents or Ca-dependent K+ currents) influence the structure and propagation of bursts, while simultaneously being able to mimic accurately current-clamp voltage measurements.

      Weaknesses:

      One weakness of this paper is the lack of a clearly stated, interesting, and relevant scientific question to try to answer. The authors do not discuss adequately in their introduction which questions about HVC neural dynamics and its role in producing vocalizations recent experimental and theoretical work has failed to explain. The authors also do not discuss adequately why they chose the approach of their paper and how their results address some of these questions.

      For example, the authors need to explain in more detail how their calculations relate to the works of Daou et al, J. Neurophys. 2013 (which already fitted spiking models to neuronal data and identified certain conductances), to Jin et al J. Comput. Neurosci. 2007 (which already discussed how to get bursts using some experimental details), and to the rather similar paper by E. Armstrong and H. Abarbanel, J. Neurophys 2016, which already postulated and studied sequences of microcircuits in HVC. This last paper is not even cited by the authors.

      The authors' main achievement is to show that simulations of a certain simplified and idealized network of spiking neurons, that includes some experimental details but ignores many others, can match some experimental results like current-clamp-derived voltage time series for the three classes of HVC neurons (although this was already reported in earlier work by Daou and collaborators in 2013), and simultaneously the robust propagation of bursts with properties similar to those observed in experiments. The authors also present results about how certain neuronal details and burst propagation change when certain key maximum conductances are varied.

      But these are weak conclusions for two reasons. First, the authors did not do enough calculations to allow the reader to understand how many parameters were needed to obtain these fits and whether simpler circuits, say with fewer parameters and simpler network topology, could do just as well. Second, many previous researchers have demonstrated robust burst propagation in a variety of feed-forward models. So what is new and important about the authors' results compared to the previous computational papers?

      Also missing is a discussion, or at least an acknowledgement, of the fact that not all of the fine experimental details of undershoots, latencies, spike structure, spike accommodation, etc may be relevant for understanding vocalization. While it is nice to know that some model can match these experimental details and produce realistic bursts, that does not mean that all of these details are relevant for the function of producing precise vocalizations. Scientific insights in biology often require exploring which of the many observed details can be ignored, and especially identifying the few that are essential for answering some questions. As one example, if HVC-X neurons are completely removed from the authors' model, does one still get robust and reasonable burst propagation of HVC-RA neurons? While part of nucleus HVC acts as a premotor circuit that drives nucleus RA, part of HVC is also related to learning. It is not clear that HVC-X neurons, which carry out some unknown calculation and transmit information to area X in a learning pathway, are relevant for burst production and propagation of HVC-RA neurons, and so relevant for vocalization. Simulations provide a convenient and direct way to explore questions of this kind.

      One key question to answer is whether the bursting of HVC-RA projection neurons is based on a mechanism local to HVC or is some combination of external driving (say from auditory nuclei) and local circuitry. The authors do not contribute to answering this question because they ignore external driving and assume that the mechanism is some kind of intrinsic feed-forward circuit, which they put in by hand in a rather arbitrary and poorly justified way, by assuming the existence of small microcircuits consisting of a few HVC-RA, HVC-X, and HVC-I neurons that somehow correspond to "sub-syllabic segments". To my knowledge, experiments do not suggest the existence of such microcircuits nor does theory suggest the need for such microcircuits.

      Another weakness of this paper is an unsatisfactory discussion of how the model was obtained, validated, and simulated. The authors should state as clearly as possible, in one location such as an appendix, what is the total number of independent parameters for the entire network and how parameter values were deduced from data or assigned by hand. With enough parameters and variables, many details can be fit arbitrarily accurately so researchers have to be careful to avoid overfitting. If parameter values were obtained by fitting to data, the authors should state clearly what was the fitting algorithm (some iterative nonlinear method, whose results can depend on the initial choice of parameters), what was the error function used for fitting (sum of least squares?), and what data were used for the fitting.

      The authors should also state clearly what is the dynamical state of the network, the vector of quantities that evolve over time. (What is the dimension of that vector, which is also the number of ordinary differential equations that have to be integrated?) The authors do not mention what initial state was used to start the numerical integrations, whether transient dynamics were observed and what were their properties, or how the results depend on the choice of initial state. The authors do not discuss how they determined that their model was programmed correctly (it is difficult to avoid typing errors when writing several pages or more of a code in any language) or how they determined the accuracy of the numerical integration method beyond fitting to experimental data, say by varying the time step size over some range or by comparing two different integration algorithms.
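      One way to carry out the step-size check suggested above is to integrate a problem with a known answer at several step sizes and confirm that the error shrinks at the rate expected for the method. For forward Euler, the error should roughly halve each time dt is halved. The following is a minimal Python sketch under that assumption; the passive-membrane equation and all parameter values are hypothetical stand-ins, not the authors' HVC model or integrator.

```python
import math

# Hypothetical toy problem (not the HVC model): a passive membrane relaxing
# toward V_INF with time constant TAU, dV/dt = (V_INF - V) / TAU, which has a
# closed-form solution to compare against.
TAU = 10.0      # ms
V_INF = -60.0   # mV
V0 = -70.0      # mV
T_END = 50.0    # ms
STEP_SIZES = [0.1, 0.05, 0.025]  # ms, each half of the previous one


def euler_passive(dt):
    """Integrate the toy membrane equation with forward Euler up to T_END."""
    v = V0
    for _ in range(round(T_END / dt)):
        v += dt * (V_INF - v) / TAU
    return v


def exact_passive():
    """Closed-form solution of the toy equation at T_END."""
    return V_INF + (V0 - V_INF) * math.exp(-T_END / TAU)


def convergence_ratios(step_sizes):
    """Absolute errors at each dt and the ratios between successive halvings."""
    errs = [abs(euler_passive(dt) - exact_passive()) for dt in step_sizes]
    return errs, [errs[i] / errs[i + 1] for i in range(len(errs) - 1)]


errors, ratios = convergence_ratios(STEP_SIZES)
for dt, err in zip(STEP_SIZES, errors):
    print(f"dt = {dt:<6} |error| = {err:.2e} mV")
print("error ratios:", [round(r, 2) for r in ratios])
```

      For a model without a closed-form solution, the same idea applies by comparing each run against a much finer reference run, or against a second integration algorithm, rather than against an exact answer.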

      Also disappointing is that the authors do not make any predictions to test, except rather weak ones such as that varying a maximum conductance sufficiently (which might be possible by using dynamic clamps) might cause burst propagation to stop or change its properties. Based on their results, the authors do not make suggestions for further experiments or calculations, but they should.

      Comments on revised version:

      The second version, unfortunately, did not address most of the substantive comments so that, while some parts of the discussion were expanded, most of the serious scientific weaknesses mentioned in the first round of review remain. The revised preprint is not a substantive improvement over the first.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper presents a model for sequence generation in the zebra finch HVC, which adheres to cellular properties measured experimentally. However, the model is fine-tuned and exhibits limited robustness to noise inherent in the inhibitory interneurons within the HVC, as well as to fluctuations in connectivity between neurons. Although the proposed microcircuits are introduced as units for sub-syllabic segments (SSS), the backbone of the network remains a feedforward chain of HVC_RA neurons, similar to previous models.

      Strengths:

      The model incorporates all three of the major types of HVC neurons. The ion channels used and their kinetics are based on experimental measurements. The connection patterns of the neurons are also constrained by the experiments.

      Weaknesses:

      The model is described as consisting of micro-circuits corresponding to SSS. This presentation gives the impression that the model's structure is distinct from previous models, which connected HVC_RA neurons in feedforward chain networks (Jin et al 2007, Li & Greenside, 2006; Long et al 2010; Egger et al 2020). However, the authors implement single HVC_RA neurons into chain networks within each micro-circuit and then connect the end of the chain to the start of the chain in the subsequent micro-circuit. Thus, the HVC_RA neuron in their model forms a single-neuron chain. This structure is essentially a simplified version of earlier models.

      In the model of the paper, the chain network drives the HVC_I and HVC_X neurons. The role of the micro-circuits is more significant in organizing the connections: specifically, from HVC_RA neurons to HVC_I neurons, and from HVC_I neurons to both HVC_X and HVC_RA neurons.

      We thank Reviewer 1 for their thoughtful comments.

      While the reviewer is correct that the propagation of sequential activity in this model is primarily carried by HVC<sub>RA</sub> neurons in a feed-forward manner, we need to emphasize that this is true only if there is no intrinsic or synaptic perturbation to the HVC network. For example, we showed in Figures 10 and 12 how altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC<sub>RA</sub> neurons are the key drivers that carry the chain forward, the interplay between excitation and inhibition in our network, as well as the intrinsic parameters of all classes of HVC neurons, are equally important in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics. Moreover, all existing models of premotor sequence generation in HVC either assume a distributed model (Elmaleh et al., 2021), in which local HVC circuitry is not sufficient to advance the sequence but instead depends on moment-to-moment feedback through Uva (Hamaguchi et al., 2016), or rely on intrinsic connections within HVC to propagate sequential activity. In the latter case, some models assume that HVC is composed of multiple discrete subnetworks that encode individual song elements (Glaze & Troyer, 2013; Long & Fee, 2008; Wang et al., 2008) but lack the local connectivity to link the subnetworks, while other models assume that HVC may have sufficient information in its intrinsic connections to form a single continuous network sequence (Long et al. 2010). The HVC model we present extends the concept of a feedforward network by incorporating additional neuronal classes that influence the propagation of activity (interneurons and HVC<sub>X</sub> neurons). We have shown that any disturbance of the intrinsic or synaptic conductances of these latter neurons will disrupt activity in the circuit even when HVC<sub>RA</sub> neuron properties are maintained.

      Regarding the similarities between our model and earlier models, several aspects of our model distinguish it from prior work. In short, while several models of how sequence is generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015), all of these models rely on intrinsic HVC circuitry to propagate sequential activity, on extrinsic feedback to advance the sequence, or on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties. We tuned the intrinsic and synaptic properties based on the traces collected by Daou et al. (2013) and Mooney and Prather (2005), as shown in Figure 3. The three classes of model neurons incorporated into our network, as well as the synaptic currents that connect them, are based on Hodgkin-Huxley formalisms containing ion channels and synaptic currents that have been pharmacologically identified. This is an advancement over prior models that primarily focused on the role of synaptic interactions or external inputs. The model is based on a feedforward chain of microcircuits that encode the different sub-syllabic segments and that interact with each other through structured feedback inhibition, defining an ordered sequence of cell firing.

      Moreover, while several models highlight the critical role of inhibitory interneurons in shaping the timing and propagation of bursts of activity in HVC<sub>RA</sub> neurons, our work offers a detailed and comprehensive model that helps explain the critical role played by inhibition in shaping song dynamics and ensuring sequence propagation.

      How useful is this concept of micro-circuits? HVC neurons fire continuously even during the silent gaps. There are no SSS during these silent gaps.

      Regarding the concern about the usefulness of the 'microcircuit' concept in our study, we appreciate the comment and are glad to clarify its relevance in our network. While we acknowledge that HVC<sub>RA</sub> neurons interconnect microcircuits, our model's dynamics are still best described within the framework of microcircuitry, particularly because of the firing behavior of HVC<sub>X</sub> neurons and interneurons. Here, we use the term microcircuit in a functional sense rather than to denote rigid, isolated spatial divisions (Cannon et al. 2015), and we now make this clear on page 21. A microcircuit in our model reflects the local rules that govern the interaction between all HVC neuron classes within the broader network and that are essential for proper activity propagation. For example, HVC<sub>INT</sub> neurons belonging to any microcircuit burst densely and at times other than the moments when the corresponding encoded SSS is being “sung”. What makes a particular interneuron belong to one microcircuit rather than another is simply that it cannot inhibit the HVC<sub>RA</sub> neurons housed in its own microcircuit. In particular, if an HVC<sub>INT</sub> neuron inhibited HVC<sub>RA</sub> neurons in the same microcircuit, some of the HVC<sub>RA</sub> bursts in that microcircuit might be silenced by the dense and strong HVC<sub>INT</sub> inhibition, breaking the chain of activity. Similarly, HVC<sub>X</sub> neurons were assigned to microcircuits for the following reason: if an HVC<sub>X</sub> neuron belonging to microcircuit i sends excitatory input to an HVC<sub>INT</sub> neuron in microcircuit j, and that interneuron happens to target an HVC<sub>RA</sub> neuron in microcircuit i, then the propagation of sequential activity will halt, producing the same scenario described above for HVC<sub>INT</sub> neurons inhibiting HVC<sub>RA</sub> neurons in the same microcircuit.

      We agree that no sub-syllabic segments are described during the silent gaps, and we thank the reviewer for pointing this out. Although silent gaps are integral to the overall process of song production, we have not elaborated on them in this model because we lack a clear, biophysically grounded representation of the gaps themselves at the level of HVC. Our primary focus has been on modeling the active, syllable-producing phases of the song, where the HVC network’s sequential dynamics are critical. However, one can think of silent gaps as being encoded by mechanisms similar to those that encode SSSs, where each gap is encoded by similar microcircuits comprising the three classes of HVC neurons (let’s call them GAPs rather than SSSs) that are active only during the silent gaps. In this case, sequential activity propagates through the GAPs from the last SSS of the previous syllable to the first SSS of the subsequent syllable. This is now described more clearly on page 22 of the manuscript.

      A significant issue of the current model is that the HVC_RA to HVC_RA connections require fine-tuning, with the network functioning only within a narrow range of g_AMPA (Figure 2B). Similarly, the connections from HVC_I neurons to HVC_RA neurons also require fine-tuning. This sensitivity arises because the somatic properties of HVC_RA neurons are insufficient to produce the stereotypical bursts of spikes observed in recordings from singing birds, as demonstrated in previous studies (Jin et al 2007; Long et al 2010). In these previous works, to address this limitation, a dendritic spike mechanism was introduced to generate an intrinsic bursting capability, which is absent in the somatic compartment of HVC_RA neurons. This dendritic mechanism significantly enhances the robustness of the chain network, eliminating the need to fine-tune any synaptic conductances, including those from HVC_I neurons (Long et al 2010). Why is it important that the model should NOT be sensitive to the connection strengths?

      We thank the reviewer for the comment. While mathematical models of highly complex nonlinear biological processes can only approximate biological reality, the current network is the first sufficiently biologically realistic network model of HVC that explains sequence propagation. We did not include dendritic processes in our network, although they would make the dynamics more realistic, for several reasons. 1) The ion channels we integrated into the somatic compartment are known pharmacologically (Daou et al. 2013), but the intrinsic properties of the dendritic compartments of HVC neurons, and the complement of ion channels expressed there, are unknown. 2) We are able to generate realistic bursting in HVC<sub>RA</sub> neurons despite using a single compartment, and the main emphasis of this network is on the interactions between excitation and inhibition, the effects of ion channels in modulating sequence propagation, etc. 3) The network model already incorporates thousands of ODEs that govern the dynamics of each HVC neuron, so we did not want to add further complexity to the network, especially since the biophysical properties of the dendritic compartments are unknown.

      Therefore, our present focus is on somatic dynamics and the interaction between HVC<sub>RA</sub> and HVC<sub>INT</sub> neurons, but we acknowledge the importance of these processes in enhancing network resiliency. Although we agree that adding dendritic processes improves robustness, we still think that somatic processes alone can offer insightful information on the sequential dynamics of the HVC network. While the network should be robust across a wide range of parameters, it is also essential that certain parameters are designed to filter out weaker signals, ensuring that only reliable, precise patterns of activity propagate. Hence, we specifically chose to make the HVC<sub>RA</sub>-to-HVC<sub>RA</sub> excitatory connections more sensitive (narrow range of values) such that only strong, precise and meaningful stimuli can propagate through the network representing the high stereotypy and precision seen in song production.

      First, the firing of HVC_I neurons is highly noisy and unreliable. HVC_I neurons fire spontaneous, random spikes under baseline conditions. During singing, their spike timing is imprecise and can vary significantly from trial to trial, with spikes appearing or disappearing across different trials. As a result, their inputs to HVC_RA neurons are inherently noisy. If the model relies on precisely tuned inputs from HVC_I neurons, the natural fluctuations in HVC_I firing would render the model non-functional. The authors should incorporate noisy HVC_I neurons into their model to evaluate whether this noise would render the model non-functional.

      We acknowledge that under both baseline and singing conditions, interneurons fire in an extremely noisy and imprecise manner, although they exhibit time-locked episodes in their activity (Hahnloser et al 2002, Kozhevnikov and Fee 2007). To mimic the biological variability of these neurons, our model does, in fact, include a stochastic current that reflects the intrinsic noise and random variations in interneuron firing observed in vivo (we highlight this in the Methods). To make sure the network is resilient to this randomness in interneuron firing, we introduced a stochastic input current of the form I<sub>noise</sub>(t) = σ·ξ(t), where ξ(t) is Gaussian white noise with zero mean and unit variance, and σ is the noise amplitude. This stochastic drive was applied to every model neuron, and it mimics the fluctuations in synaptic input arising from random presynaptic activity and background noise. For values of σ within 1-5% of the mean synaptic conductance, the stochastic current has no effect on network propagation. For larger values of σ, the desired network activity was disrupted or halted. We now discuss this on page 22 of the manuscript.
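      For readers reproducing this kind of noise test, the following sketch shows one standard way to discretize a white-noise current σ·ξ(t). With the Euler-Maruyama scheme, each time step receives a Wiener increment σ·√dt·N(0, 1) rather than σ·dt·N(0, 1), so the noise does not vanish as dt shrinks. The sketch uses a hypothetical single passive compartment with made-up parameters (g, C, in arbitrary units), not the authors' Hodgkin-Huxley neurons. It checks the simulated voltage fluctuations against the closed-form stationary standard deviation σ/√(2gC) of the resulting Ornstein-Uhlenbeck process.

```python
import math
import random


def simulate_noisy_membrane(sigma, g=0.1, c=1.0, dt=0.01, n_steps=500_000, seed=1):
    """Euler-Maruyama integration of C dx/dt = -g x + sigma * xi(t).

    x is the deviation of the membrane potential from rest (hypothetical
    passive compartment, arbitrary units). Returns the sample standard
    deviation of x over the run.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = 0.0  # start at the resting potential
    total = 0.0
    total_sq = 0.0
    for _ in range(n_steps):
        # The white-noise current contributes a Wiener increment sigma * sqrt(dt) * N(0, 1).
        dw = sqrt_dt * rng.gauss(0.0, 1.0)
        x += (-g * x * dt + sigma * dw) / c
        total += x
        total_sq += x * x
    mean = total / n_steps
    return math.sqrt(total_sq / n_steps - mean * mean)


def stationary_sd(sigma, g=0.1, c=1.0):
    """Closed-form stationary SD of the same Ornstein-Uhlenbeck process."""
    return sigma / math.sqrt(2.0 * g * c)


sigma = 0.5
print(f"simulated SD = {simulate_noisy_membrane(sigma):.3f}, "
      f"predicted SD = {stationary_sd(sigma):.3f}")
```

      Because the noise enters through √dt, changing the time step should leave the simulated SD essentially unchanged; scaling the increment by dt instead would make the effective noise amplitude depend on the integration step.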

      Second, Kosche et al. (2015) demonstrated that reducing inhibition by suppressing HVC_I neuron activity makes HVC_RA firing less sparse but does not compromise the temporal precision of the bursts. In this experiment, the local application of gabazine should have severely disrupted HVC_I activity. However, it did not affect the timing precision of HVC_RA neuron firing, emphasizing the robustness of the HVC timing circuit. This robustness is inconsistent with the predictions of the current model, which depends on finely tuned inputs and should, therefore, be vulnerable to such disruptions.

      We thank the reviewer for the comment. The differences between the Kosche et al. (2015) findings and the predictions of our model arise from differences in the aspect of HVC function we are modeling. Our model is more sensitive to inhibition, which is a designed mechanism for achieving precise song patterning. This is a modeling simplification we adopted to capture specific characteristics of HVC function. Hence, the Kosche et al. (2015) findings do not invalidate the approach of our model but highlight that HVC likely operates with several redundant mechanisms that together ensure temporal precision.

      Third, the reliance on fine-tuning of HVC_RA connections becomes problematic if the model is scaled up to include groups of HVC_RA neurons forming a chain network, rather than the single HVC_RA neurons used in the current work. With groups of HVC_RA neurons, the summation of presynaptic inputs to each HVC_RA neuron would need to be precisely maintained for the model to function. However, experimental evidence shows that the HVC circuit remains functional despite perturbations, such as a few degrees of cooling, micro-lesions, or turnover of HVC_RA neurons. Such robustness cannot be accounted for by a model that depends on finely tuned connections, as seen in the current implementation.

      As stated previously, our model of individual HVC<sub>RA</sub> neurons is a reductive model that focuses on understanding the mechanisms that govern sequential neural activity. We agree that scaling the model to include many HVC<sub>RA</sub> neurons poses challenges, specifically concerning the summation of presynaptic inputs. However, our model can still be adapted to a larger network without requiring the level of fine-tuning currently needed. In fact, the current fine-tuning of synaptic connections in the model reflects fundamental network mechanisms rather than a limitation when scaling to a larger network. In addition, one important feature of this neural network is redundancy: even if some neurons or synaptic connections are impaired, other neurons or pathways can compensate for these changes, allowing activity propagation to remain intact.

      The authors examined how altering the channel properties of neurons affects the activity in their model. While this approach is valid, many of the observed effects may stem from the delicate balancing required in their model for proper function. In the current model, HVC_X neurons burst as a result of rebound activity driven by the I_H current. Rebound bursts mediated by the I_H current typically require a highly hyperpolarized membrane potential. However, this mechanism would fail if the reversal potential of inhibition is higher than the required level of hyperpolarization. Furthermore, Mooney (2000) demonstrated that depolarizing the membrane potential of HVC_X neurons did not prevent bursts of these neurons during forward playback of the bird's own song, suggesting that these bursts (at least under anesthesia, which may be a different state altogether) are not necessarily caused by rebound activity. This discrepancy should be addressed or considered in the model.

      In our HVC network model, one goal for HVC<sub>X</sub> neurons is to generate bursts in that neuronal population. Since HVC<sub>X</sub> neurons in our model receive only inhibitory inputs from interneurons, we rely on inhibition followed by rebound bursts orchestrated by the I<sub>H</sub> and I<sub>CaT</sub> currents to achieve this goal. The interplay between the T-type Ca<sup>++</sup> current and the H current in our model is fundamental to generating these bursts and is sufficient to produce the desired behavior in the network. Because of this interplay, we do not need strong inhibition to generate rebound bursts: the T-type Ca<sup>++</sup> conductance can be made stronger, leading to robust rebound bursting even when inhibition is relatively weak. This is now highlighted on page 42 in the revised version.

      Some figures contain direct copies of figures from published papers. It is perhaps a better practice to replace them with schematics if possible.

      We deliberately kept the results of Mooney and Prather (2005) as published in order to compare them with our model simulations and highlight the degree of resemblance. We believe that schematics of the Mooney and Prather (2005) results would not have the same impact; similarly, a schematic of the Hahnloser et al. (2002) results would add little. However, if the reviewer still believes that we should do so, we are happy to make this change.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use numerical simulations to try to understand better a major experimental discovery in songbird neuroscience from 2002 by Richard Hahnloser and collaborators. The 2002 paper found that a certain class of projection neurons in the premotor nucleus HVC of adult male zebra finch songbirds, the neurons that project to another premotor nucleus RA, fired sparsely (once per song motif) and precisely (to about 1 ms accuracy) during singing.

      The experimental discovery is important to understand since it initially suggested that the sparsely firing RA-projecting neurons acted as a simple clock that was localized to HVC and that controlled all details of the temporal hierarchy of singing: notes, syllables, gaps, and motifs. Later experiments suggested that the initial interpretation might be incomplete: that the temporal structure of adult male zebra finch songs instead emerged in a more complicated and distributed way, still not well understood, from the interaction of HVC with multiple other nuclei, including auditory and brainstem areas. So at least two major questions remain unanswered more than two decades after the 2002 experiment: What is the neurobiological mechanism that produces the sparse precise bursting: is it a local circuit in HVC or is it some combination of external input to HVC and local circuitry? And how is the sparse precise bursting in HVC related to a songbird's vocalizations? The authors only investigate part of the first question, whether the mechanism for sparse precise bursts is local to HVC. They do so indirectly, by using conductance-based Hodgkin-Huxley-like equations to simulate the spiking dynamics of a simplified network that includes three known major classes of HVC neurons and such that all neurons within a class are assumed to be identical. A strength of the calculations is that the authors include known biophysically deduced details of the different conductances of the three major classes of HVC neurons, and they take into account what is known, based on sparse paired recordings in slices, about how the three classes connect to one another. One weakness of the paper is that the authors make arbitrary and not well-motivated assumptions about the network geometry, and they do not use the flexibility of their simulations to study how their results depend on their network assumptions. 
A second weakness is that they ignore many known experimental details such as projections into HVC from other nuclei, dendritic computations (the somas and dendrites are treated by the authors as point-like isopotential objects), the role of neuromodulators, and known heterogeneity of the interneurons. These weaknesses make it difficult for readers to know the relevance of the simulations for experiments and for advancing theoretical understanding.

      Strengths:

      The authors use conductance-based Hodgkin-Huxley-like equations to simulate spiking activity in a network of neurons intended to model more accurately songbird nucleus HVC of adult male zebra finches. Spiking models are much closer to experiments than models based on firing rates or on 2-state neurons.

      The authors include information deduced from modeling experimental current-clamp data such as the types and properties of conductances. They also take into account how neurons in one class connect to neurons in other classes via excitatory or inhibitory synapses, based on sparse paired recordings in slices by other researchers. The authors obtain some new results of modest interest such as how changes in the maximum conductances of four key channels (e.g., A-type K+ currents or Ca-dependent K+ currents) influence the structure and propagation of bursts, while simultaneously being able to mimic accurately current-clamp voltage measurements.

      Weaknesses:

      One weakness of this paper is the lack of a clearly stated, interesting, and relevant scientific question to try to answer. In the introduction, the authors do not discuss adequately which questions recent experimental and theoretical work have failed to explain adequately, concerning HVC neural dynamics and its role in producing vocalizations. The authors do not discuss adequately why they chose the approach of their paper and how their results address some of these questions.

      For example, the authors need to explain in more detail how their calculations relate to the works of Daou et al, J. Neurophys. 2013 (which already fitted spiking models to neuronal data and identified certain conductances), to Jin et al J. Comput. Neurosci. 2007 (which already discussed how to get bursts using some experimental details), and to the rather similar paper by E. Armstrong and H. Abarbanel, J. Neurophys 2016, which already postulated and studied sequences of microcircuits in HVC. This last paper is not even cited by the authors.

      We thank the reviewer for this valuable comment, and we agree that we did not sufficiently clarify, throughout the paper, the utility of our model or how it advances our understanding of HVC dynamics and circuitry. To that end, we revised several places in the manuscript and made sure to cite the mentioned papers and highlight their relevance to our work.

      In short, and as mentioned to Reviewer 1, while several models of how sequence is generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015; Jin et al., 2007), all the models proposed either rely on intrinsic HVC circuitry to propagate sequential activity, rely on extrinsic feedback to advance the sequence or rely on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties. 

      Our model does not challenge any existing hypothesis; rather, it is a distillation of the various models that have been proposed for the HVC network. This is covered in detail in the Discussion. We believe that the network model we developed provides a step forward in describing the biophysics of HVC circuitry and may shed new light on certain dynamics in the mammalian brain, particularly in the motor cortex and hippocampus, where precisely timed sequential activity is crucial. We suggest that temporally precise sequential activity may be a manifestation of neural networks composed of chains of microcircuits, each containing pools of excitatory and inhibitory neurons, with local interplay among neurons of the same microcircuit and global interplay across microcircuits, and with structured inhibition and intrinsic properties synchronizing the neuronal pools and stabilizing timing within a firing sequence.

      The authors' main achievement is to show that simulations of a certain simplified and idealized network of spiking neurons, which includes some experimental details but ignores many others, match some experimental results like current-clamp-derived voltage time series for the three classes of HVC neurons (although this was already reported in earlier work by Daou and collaborators in 2013), and simultaneously the robust propagation of bursts with properties similar to those observed in experiments. The authors also present results about how certain neuronal details and burst propagation change when certain key maximum conductances are varied. However, these are weak conclusions for two reasons. First, the authors did not do enough calculations to allow the reader to understand how many parameters were needed to obtain these fits and whether simpler circuits, say with fewer parameters and simpler network topology, could do just as well. Second, many previous researchers have demonstrated robust burst propagation in a variety of feed-forward models. So what is new and important about the authors' results compared to the previous computational papers?

      A major novelty of our work is the integration of experimental data with detailed network models. While earlier work established robust burst propagation, our model uses realistic ion channel kinetics and feedback inhibition not only to reproduce experimentally observed neural activity patterns but also to suggest possible mechanisms for song sequence production in as biophysically detailed a way as possible. This aspect distinguishes our work from other feed-forward models, and we cover it in detail in the Discussion. However, the reviewer is right that the calculations conducted for the fits need to be described in more detail; we will make sure to highlight this in the Methods and throughout the manuscript.

      Also missing is a discussion, or at least an acknowledgment, of the fact that not all of the fine experimental details of undershoots, latencies, spike structure, spike accommodation, etc. may be relevant for understanding vocalization. While it is nice to know that some models can match these experimental details and produce realistic bursts, that does not mean that all of these details are relevant for the function of producing precise vocalizations. Scientific insights in biology often require exploring which of the many observed details can be ignored and especially identifying the few that are essential for answering some questions. As one example, if HVC-X neurons are completely removed from the authors' model, does one still get robust and reasonable burst propagation of HVC-RA neurons? While part of the nucleus HVC acts as a premotor circuit that drives the nucleus RA, part of HVC is also related to learning. It is not clear that HVC-X neurons, which carry out some unknown calculation and transmit information to area X in a learning pathway, are relevant for burst production and propagation of HVC-RA neurons, and so relevant for vocalization. Simulations provide a convenient and direct way to explore questions of this kind.

      One key question to answer is whether the bursting of HVC-RA projection neurons is based on a mechanism local to HVC or is some combination of external driving (say from auditory nuclei) and local circuitry. The authors do not contribute to answering this question because they ignore external driving and assume that the mechanism is some kind of intrinsic feed-forward circuit, which they put in by hand in a rather arbitrary and poorly justified way, by assuming the existence of small microcircuits consisting of a few HVC-RA, HVC-X, and HVC-I neurons that somehow correspond to "sub-syllabic segments". To my knowledge, experiments do not suggest the existence of such microcircuits nor does theory suggest the need for such microcircuits. 

      Recent results showed a tight correlation between the intrinsic properties of neurons and features of song (Daou and Margoliash 2020, Medina and Margoliash 2024): adult birds that exhibit similar songs tend to have similar intrinsic properties. While this is relevant, we acknowledge that not all details may be necessary for every aspect of vocalization, and future models could be simplified to concentrate on core dynamics, excluding certain features while still providing insights into the primary mechanisms.

      Regarding whether HVC<sub>X</sub> neurons are relevant for burst propagation: our model includes these neurons as part of the network for completeness, and the reviewer is correct that the propagation of sequential activity in this model is carried primarily by HVC<sub>RA</sub> neurons in a feed-forward manner, but only when the HVC network is unperturbed. For example, we have shown that altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC<sub>RA</sub> neurons are the key drivers that carry the chain forward, the interplay between excitation and inhibition in our network, as well as the intrinsic parameters of all classes of HVC neurons, are equally important in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics.

      We agree with the reviewer, however, that a potential drawback of our model is its sole focus on local excitatory connectivity within HVC (Kornfeld et al., 2017; Long et al., 2010), whereas HVC neurons also receive afferent excitatory connections (Akutagawa & Konishi, 2010; Nottebohm et al., 1982) that play significant roles in their local dynamics. For example, the excitatory inputs that HVC neurons receive from nucleus Uvaeformis may be crucial in initiating (Andalman et al., 2011; Danish et al., 2017; Galvis et al., 2018) or sustaining (Hamaguchi et al., 2016) sequential activity. While we acknowledge this limitation, our main contribution in this work is biophysical insight into how patterned activity in HVC is largely shaped by the intrinsic properties of individual neurons and by synaptic properties, with excitation and inhibition playing a major role in enabling neurons to generate their characteristic bursts during singing. This holds irrespective of whether an external drive is injected into the microcircuits. We elaborate on this further in the Discussion of the revised version.

      Another weakness of this paper is an unsatisfactory discussion of how the model was obtained, validated, and simulated. The authors should state as clearly as possible, in one location such as an appendix, what is the total number of independent parameters for the entire network and how parameter values were deduced from data or assigned by hand. With enough parameters and variables, many details can be fit arbitrarily accurately so researchers have to be careful to avoid overfitting. If parameter values were obtained by fitting to data, the authors should state clearly what the fitting algorithm was (some iterative nonlinear method, whose results can depend on the initial choice of parameters), what the error function used for fitting (sum of least squares?) was, and what data were used for the fitting.

      The authors should also state clearly the dynamical state of the network, the vector of quantities that evolve over time. (What is the dimension of that vector, which is also the number of ordinary differential equations that have to be integrated?) The authors do not mention what initial state was used to start the numerical integrations, whether transient dynamics were observed and what were their properties, or how the results depended on the choice of the initial state. The authors do not discuss how they determined that their model was programmed correctly (it is difficult to avoid typing errors when writing several pages or more of code in any language) or how they determined the accuracy of the numerical integration method beyond fitting to experimental data, say by varying the time step size over some range or by comparing two different integration algorithms.

      We thank the reviewer again. The fitting process in our model occurred only at the first stage, where the synaptic parameters were fit to the results of Mooney and Prather as well as Kosche et al. No data were shared; we examined the figures in those papers, checked the amplitudes of the elicited currents, the magnitudes of DC-evoked excitations, etc., and replicated these in our model. While this is suboptimal, we considered it a better starting point than simply taking synaptic current equations from the literature for other types of neurons (neither HVC neurons nor even songbird neurons) and integrating them into our network model. The number of ODEs that govern the dynamics of each model neuron is listed on page 10 of the manuscript as well as in the Appendix. Moreover, we describe the details of this fitting process in the revised version.
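      The reviewer's suggestion of checking integration accuracy by varying the time step can be illustrated with a minimal sketch. It assumes a hypothetical passive membrane, τ dV/dt = −(V − E_L) + R·I, integrated with forward Euler and compared with its exact solution; the function names and parameter values are illustrative assumptions, not taken from the authors' model or code. For a first-order method, halving the step should roughly halve the error.

```python
import math


def passive_membrane_euler(dt_ms, t_end_ms=50.0, tau_ms=10.0, e_leak_mv=-70.0,
                           r_in_mohm=100.0, i_inj_na=0.1):
    """Forward-Euler integration of tau*dV/dt = -(V - E_L) + R*I (hypothetical passive cell)."""
    n_steps = round(t_end_ms / dt_ms)
    v = e_leak_mv
    for _ in range(n_steps):
        v += dt_ms * (-(v - e_leak_mv) + r_in_mohm * i_inj_na) / tau_ms
    return v


def passive_membrane_exact(t_end_ms=50.0, tau_ms=10.0, e_leak_mv=-70.0,
                           r_in_mohm=100.0, i_inj_na=0.1):
    """Closed-form step response of the same passive membrane."""
    return e_leak_mv + r_in_mohm * i_inj_na * (1.0 - math.exp(-t_end_ms / tau_ms))


def step_size_errors(dts_ms):
    """Absolute error at t_end for each time step, relative to the exact solution."""
    exact = passive_membrane_exact()
    return [abs(passive_membrane_euler(dt) - exact) for dt in dts_ms]


# Halve the step twice; for a first-order method each error ratio should be close to 2.
errors = step_size_errors([0.1, 0.05, 0.025])
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors, ratios)
```

      For a full network, where no exact solution exists, the same idea applies by comparing each run against one at a much smaller step, or against a second integration algorithm.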

      Also disappointing is that the authors do not make any predictions to test, except rather weak ones such as that varying a maximum conductance sufficiently (which might be possible by using dynamic clamps) might cause burst propagation to stop or change its properties. Based on their results, the authors do not make suggestions for further experiments or calculations, but they should.

      We agree that making experimentally testable predictions is crucial for advancing the model. Our predictions include testing whether eliminating a class of neurons, such as HVC<sub>X</sub> neurons, disrupts activity propagation, which can be done through targeted neuron ablation. This can also be tested by preventing rebound bursting in HVC<sub>X</sub> neurons through pharmacological blockade of I<sub>H</sub> channels. Other predictions involve downregulating certain ion channels (pharmacologically, with channel blockers) and testing which currents are fundamental for song production; our results suggest many such tests, e.g., of the SK current, the T-type Ca<sup>2+</sup> current, and the A-type K<sup>+</sup> current. We incorporated these into the Discussion of the revised manuscript to better demonstrate the model's applicability and to guide future research.

      Main issues:

      (1) Parameters are overly fine-tuned and often do not match known biology to generate chains. This fine-tuning does not reveal fundamental insights.

      (1a) Specific conductances (e.g. AMPA) are finely tweaked to generate bursts, in part due to a lack of a dendritic mechanism for burst generation. A dendritic mechanism likely reflects the true biology of HVC neurons.

      We acknowledge that the model does not include active dendritic processes; this was a deliberate simplification. Our present approach is intended to focus on somatic mechanisms in order to identify the minimal conditions required for stable sequential propagation. HVC<sub>RA</sub> neurons are known to possess thin, spiny dendrites that can contribute to burst initiation and shaping. Future models that include such nonlinear dendritic mechanisms would likely reduce the need for fine-tuning specific conductances at the soma and would consequently better match the known biology of HVC<sub>RA</sub> neurons.

      In text: “While our simplified, somatically driven architecture enables better exploration of mechanisms for sequence propagation, future extensions of the model will incorporate dendritic compartments to more accurately reflect the intrinsic bursting mechanisms observed in HVC<sub>RA</sub> neurons.”

      (1b) In this paper, microcircuits are simulated and then concatenated to make the HVC chain, resulting in no representations during silent gaps. This is out of touch with the known HVC function. There is no anatomical nor functional evidence for microcircuits of the kind discussed in this paper or in the earlier and rather similar paper by Eve Armstrong and Henry Abarbanel (J. Neurophys. 2016). One can write a large number of papers in which one makes arbitrary unconstrained guesses of network structure in HVC and, unless they reveal some novel principle or surprising detail, they are all going to be weak.

      Although the model is composed of sequentially activated microcircuits, the gaps between each microcircuit’s output do not represent complete silence in the network. During these periods, other neurons, such as those in other microcircuits, may still exhibit bursting activity. Thus, what may appear as a 'silent gap' from the perspective of a given microcircuit is, in fact, part of the ongoing background dynamics of the larger HVC network. We fully acknowledge the reviewer's point that there is no direct anatomical or physiological evidence for microcircuits with this structure in HVC. Our intention was not to propose that such a physical structure exists, but to use it as a computational simplification that makes precise sequential bursting feasible given the biologically realistic neuronal dynamics used. Hence, our use of 'microcircuits' refers to a modeling construct rather than a structural hypothesis. Even though the network topology is hypothetical, we believe that the proposed temporal structure allows us to generate specific predictions for future work about burst timing and neuronal connections.

      (1c) HVC interneuron discharge in the authors' model is overly precise and does not address the observation that these neurons can exhibit noisy discharge. Real HVC interneurons are noisy. This issue is critical: all reviewers strongly recommend that, at a minimum, the authors incorporate HVC-I noise into their model in a revision.

      We agree that capturing the variability in interneuron bursting is critical for biological realism. In our model, HVC interneurons receive a stochastic background current that introduces variability into their firing patterns, as observed in vivo. This variability is evident in our simulations and produces more biologically realistic dynamics while maintaining sequence propagation. We clarify this implementation in the Methods section.
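      The response does not specify the form of the stochastic background current. One common choice is an Ornstein-Uhlenbeck current, sketched below with the Euler-Maruyama method; this form, the function name, and all values (mean 0.1 nA, standard deviation 0.05 nA, correlation time 5 ms) are hypothetical assumptions for illustration, not the authors' implementation.

```python
import math
import random
import statistics


def ou_background_current(t_end_ms, dt_ms, mean_na, sigma_na, tau_ms, seed=0):
    """Euler-Maruyama samples of a hypothetical Ornstein-Uhlenbeck current (nA).

    dI = (mean - I) / tau * dt + sigma * sqrt(2 / tau) * dW,
    so the stationary standard deviation of I is sigma.
    """
    rng = random.Random(seed)           # seeded so runs are reproducible
    n_steps = round(t_end_ms / dt_ms)
    decay = dt_ms / tau_ms              # relaxation toward the mean per step
    noise_scale = sigma_na * math.sqrt(2.0 * dt_ms / tau_ms)
    current = mean_na                   # start at the mean to skip a transient
    trace = []
    for _ in range(n_steps):
        current += (mean_na - current) * decay + noise_scale * rng.gauss(0.0, 1.0)
        trace.append(current)
    return trace


# 10 s of background current sampled every 0.05 ms (illustrative values only).
trace = ou_background_current(t_end_ms=10_000.0, dt_ms=0.05,
                              mean_na=0.1, sigma_na=0.05, tau_ms=5.0)
print(statistics.fmean(trace), statistics.pstdev(trace))
```

      In a network simulation, each interneuron would receive its own independently seeded trace, added to its membrane current at every time step; the correlation time controls how quickly the fluctuations change relative to the burst dynamics.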

      (1d) Address the finding that Kosche et al show that even with reduced inhibition, HVCra neuronal timing is preserved; it is the burst pattern that is affected.

      The differences between the Kosche et al. (2015) findings and the predictions of our model arise from differences in the aspect of HVC function we are modeling. Our model is more sensitive to inhibition, which is a designed mechanism for achieving precise song patterning. This is a modeling simplification we adopted to capture specific characteristics of HVC function. 

      We acknowledged this point in the discussion: “While findings of Kosche et al. (2015) emphasize the robustness of the HVC timing circuit to inhibition, our model is more sensitive to inhibition, highlighting that HVC likely operates with several, redundant mechanisms that overall ensure temporal precision.”

      (1e) The real HVC is robust to microlesions, cooling, and HVCra neuron turnover. The model in this paper relies on precise HVCra connectivity and is not robust.

      Although our model is grounded in the biologically observed behavior of HVC neurons in vivo, we don’t claim that it fully captures the resilience seen in the HVC network. Instead, we see this as a simplified framework that helps us explore the basic principles of sequential activity. In the future, adding features like recurrent excitation, synaptic plasticity, or homeostatic mechanisms could make the model more robust.

      (1f) There is unclear motivation for Ih-driven HVCx bursting, given past findings from the Mooney group.

      Daou et al. (2013) observed that the sag seen in HVC<sub>X</sub> and HVC<sub>INT</sub> neurons in response to hyperpolarizing current pulses (Dutar et al. 1998; Kubota and Saito 1991; Kubota and Taniguchi 1998) was completely abolished after application of the drug ZD 7288 in all neurons tested, indicating that the sag in these HVC neurons is due to the hyperpolarization-activated inward current (I<sub>h</sub>). In addition, the sag and the rebound in these two neuron groups were larger for larger hyperpolarizing current pulses.

      (1g) The initial conditions of the network and its activity under those conditions, as well as the possible reliance on external inputs, are not defined.

      In our model, network activity is initiated by a brief, stochastic excitatory input to a small group of HVC<sub>RA</sub> neurons in one microcircuit. This drive represents a simplified version of external input from upstream brain regions known to project to HVC, such as Nif and Uva. Modeling the activity of these upstream regions and their influence on HVC dynamics is the subject of ongoing work that will be published in the future.

      (1h) It has been known from the time of Hodgkin and Huxley how to include temperature dependences for neuronal dynamics so another suggestion is for the authors to add such dependences for the three classes of neurons and see if their simulation causes burst frequencies to speed up or slow down as T is varied.

      We added this as a limitation in the Discussion: “Our model was run at a fixed physiological temperature, but it has been known since Hodgkin and Huxley that both ion channel activity and synaptic dynamics change with temperature. In future work, adding temperature scaling (e.g., Q10 factors) could help us explore how burst timing and sequence speed change with temperature, and whether neural activity in HVC preserves its precision under different physiological conditions.”
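      Q10 scaling of this kind is usually written as a rate factor φ = Q10^((T − T_ref)/10) that speeds up kinetics, so gating time constants are divided by φ. A minimal sketch, assuming a hypothetical Q10 of 3 and a reference temperature of 30 °C (neither value comes from the model):

```python
def q10_factor(temp_c: float, ref_temp_c: float, q10: float) -> float:
    """Rate multiplier for a process with temperature coefficient q10."""
    return q10 ** ((temp_c - ref_temp_c) / 10.0)


def scale_time_constant(tau_ref_ms: float, temp_c: float,
                        ref_temp_c: float, q10: float) -> float:
    """Faster kinetics at higher temperature: time constants shrink by the Q10 factor."""
    return tau_ref_ms / q10_factor(temp_c, ref_temp_c, q10)


# Hypothetical gating time constant of 9 ms at 30 degC with Q10 = 3,
# evaluated 10 degC below, at, and 10 degC above the reference temperature.
for temp in (20.0, 30.0, 40.0):
    print(temp, scale_time_constant(9.0, temp, 30.0, 3.0))
```

      In practice each channel and synapse type may need its own Q10, so a single factor applied to the whole network is itself a simplifying assumption.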

      (2) The scope of the paper and its objectives must be clearly defined. Defining the scope and providing caveats for what is not considered will help the reader contextualize this study with other work.

      (2a) The paper does not consider the role of external inputs to HVC, which are very likely important for the capacity of the HVC chain to tile the entire song, including silent gaps.

      Afferent input to HVC, particularly from nuclei such as Uva and Nif, is critical in shaping the timing and initiation of HVC sequences throughout the song, including silent intervals. In fact, external inputs are likely involved in more than just triggering sequences; they may also influence the continuity of activity across motifs. However, in this study we chose to focus on the intrinsic dynamics of HVC as a step toward understanding the internal mechanisms required for generating temporally precise sequences, and for this reason we used a simplified external input only to initiate activity in the chain.

      (2b) The paper does not consider important dendritic mechanisms that almost certainly facilitate the all-or-none bursting behavior of HVC projection neurons. The authors need to mention and discuss that current-clamped neuronal response - in which an electrode is inserted into the soma and then a constant current-step is applied - bypasses dendritic structure and dendritic processing and so is an incomplete way to characterize a neuron's properties. In particular, claiming to fit current-clamp data accurately and then claiming that one now has a biophysically accurate network model, as the authors do, is greatly misleading.

      As addressed in 1a, we do not suggest that our model is a fully accurate biophysical representation of the HVC network. Instead, we see it as a simplified framework that helps reveal how much of HVC’s sequential activity can be explained by somatic properties and synaptic interactions alone. Additional biological mechanisms, such as dendritic processing, are likely to play an important role and should be explored in future work.

      (2c) The introduction does not provide a clear motivation for the paper - what hypotheses are being tested? What is at stake in the model outcomes? It is not inherently informative to take a known biological representation and fine-tune a limited model to replicate that representation.

      We explicitly added the hypotheses to the revised introduction.

      (2d) There have been several published modeling efforts applied to the HVC chain (Seung, Fee, Long, Greenside, Jin, Margoliash, Abarbanel). These and others need to be introduced adequately, and it needs to be crystal clear what, if anything, the present study is adding to the canon.

      Several influential models have explored how HVC might generate sequences, ranging from synfire chains to recurrent dynamics or externally driven sequences (e.g., Seung, Fee, Long, Greenside, Jin, Abarbanel, and others), but these models did not capture the detailed dynamics observed in vivo. Our aim was to bridge a gap in the modeling literature by exploring how far biophysically grounded intrinsic properties and experimentally supported synaptic connections local to HVC can, on their own, produce temporally precise sequences. Our simulations show that these mechanisms are sufficient to generate such sequences, although missing components (such as dendritic mechanisms or external inputs) might be needed to fully capture the complexity and robustness of HVC function.

      (2e) The authors mention learning prominently in the abstract, summary, and introduction but this paper has nothing to do with learning. Most or all mentions of learning should be deleted since they are misleading.

      We appreciate the reviewer’s observation. Our intent in referencing learning was not to suggest that our model directly simulates learning processes, but rather to place HVC function within the broader context of song learning and production, where temporal sequencing plays a fundamental role. Nevertheless, repeated references to learning may be misleading, given that our current model does not incorporate plasticity, synaptic modification, or developmental changes. Hence, we have carefully revised the manuscript to rephrase mentions of learning except where directly relevant.

      (3) Using the model for hypothesis generation and prediction of experimental results.

      (3a) The utility of a model is to provide conceptual insight into how or why the real HVC functions as it does, or to predict outcomes in yet-to-be conducted experiments to help motivate future studies. This paper does not adequately achieve these goals.

      We revised the Discussion to better emphasize the model's potential contributions and to point out experiments that could validate or challenge its predictions. These include dynamic clamp or ion channel blockers targeting A-type K<sup>+</sup> currents in HVC<sub>RA</sub> neurons to assess their impact on burst precision, optogenetic disruption of inhibitory interneurons to observe changes in burst timing and sequence propagation, and pharmacological modulation of I<sub>h</sub> or I<sub>CaT</sub> in HVC<sub>X</sub> neurons and interneurons.

      (3b) Additionally, it can be interesting to conduct an experiment on an existing model; for example, what happens to the HVCra chain in your model if you delete the HVCx neurons? What happens if you block NMDA receptors? Such an approach in a modeling paper can help motivate hypotheses and endow the paper with a sense of purpose.

      We agree that running targeted simulations on our computational model, such as removing an HVC neuron population or blocking a synaptic receptor, can be a powerful way to generate new ideas and guide future experiments. While we did not include these specific tests in the current study, the model is well suited to this kind of exploration. For instance, removing interneurons could help us better understand their role in shaping the timing of HVC<sub>RA</sub> bursts. These are promising directions, and we now highlight them in the Discussion as a way the model could be used to guide experiments.

      (4) Changes to the paper's organization may improve clarity.

      (4a) Nearly all equations should be moved to an Appendix so that the main part of the paper can focus on the science: assumptions made, details of simulations, conclusions obtained, and their significance. The authors present many equations without discussion which weakens the paper.

      Equations moved to appendix.

      (4b) There are many grammatical errors, e.g., verbs do not match the subject in terms of being single or plural. The authors need to run their manuscript through a grammar checker.

      Done.

      (4c) Many of the figures are poorly designed and should be substantially modified. For example, Figure 1B uses too many colors, making it hard to grasp what is being plotted, and the colors are not needed. Figures 1C and 1D are entire figures taken from other papers, and there is no way a reader will be able to see or appreciate all the details when this figure is published on a single page. Figure 2 uses nearly identical colors for its dots; different symbols would avoid the need for color. Figure 5 fills an entire page, but most of it conveys no information: there is no need to show the same details for all 120 neurons, and the top third of the figure would suffice. The same applies to Figure 7, which includes a lot of unnecessary information. In Figure 10, the bottom time series of spikes should be replaced with a time series of rates, since no useful information can be extracted from it in its current form.

      Adjusted as requested. 

      (4d) Table 1 is long and largely uninteresting, and should be moved to an appendix.

      Table 1 moved to appendix.

      (4e) Many sentences are not carefully written, which greatly weakens the paper. As one typical example, the first sentence in the Discussion section reads "In this study, we have designed a neural network model that describes [sic] zebra finch song production in the HVC." This is inaccurate: the model does not describe song production, it only explores some properties of one nucleus involved in song production. One or a few sentences like this would be acceptable, but there are so many sentences of this kind that the reader loses faith in the authors.

      Thank you for raising this point. We revised the manuscript to improve the precision of the writing and replaced the first sentence of the Discussion with: "In this study, we developed a biophysically realistic neural network model to explore how intrinsic neuronal properties and local connectivity within the songbird nucleus HVC may support the generation of temporally precise activity sequences associated with zebra finch song."

    1. eLife Assessment

      This is a valuable analysis of STORM data that characterizes the clustering of active zones in retinogeniculate terminals across ages and in the absence of retinal waves. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling. The latest revision has tempered the causal claims made in previous versions. The result provides solid structural support for the hypotheses regarding how activity influences the clustering of these synapses.

    2. Joint Public Review:

      Summary:

      The authors previously published a study of RGC boutons in the dLGN in developing wild-type mice and developing mutant mice with disrupted spontaneous activity. In the current manuscript, they have broken down their analysis of RGC boutons according to the number of Homer/Bassoon puncta associated with each vGluT2 cluster.

      The authors find that, in the first postnatal week, RGC boutons with multiple active zones (mAZs) are about a third as common as boutons with a single active zone (sAZ). The size of the vGluT2 cluster associated with each bouton was proportional to the number of active zones present in each bouton. Within the authors' ability to estimate these values (n=3 per group, 95% of results expected to be within ~2.5 standard deviations), these results are consistent across groups: 1) dominant eye vs. non-dominant eye, 2) wild-type mice vs. mice with activity blocked, and 3) ages P2, P4, and P8. The authors also found that mAZs and sAZs have roughly the same number (about 1.5) of sAZs clustered around them (within 1.5 µm).

      Across multiple versions of this paper, there has been much discussion with the reviewers about how to interpret these findings. Based on a large number of tests for statistical significance, the authors interpreted the presence of a statistically significant difference as evidence that "Eye-specific active zone clustering underlies synaptic competition in the developing visual system" (title of the previous version of the manuscript). The reviewers have focused on the small effect sizes as indicating that the small differences observed are not informative regarding this biological question. The authors have now tempered this interpretation.

      Strengths:

      The source dataset is high-resolution data showing the colocalization of multiple synaptic proteins across development. Added to these data is labeling that distinguishes axons from the right eye from axons from the left eye. The first-order analysis of these data, showing changes in synapse density and in the occurrence of multi-active zone synapses, provides useful information about the development of an important model for activity-dependent synaptic remodeling.

      Reviewing Editor's comment on the latest revision (without sending the paper back to the individual reviewers):

      In their latest revision, the authors have moderated earlier causal claims, incorporated additional statistical controls, and largely maintained their original interpretation of the data. While these changes address some prior concerns, the underlying issues remain. The previous review emphasized that the reported effect sizes were small and therefore hard to link to biological relevance. The authors argue that the effect sizes are large. Given the lack of a biological argument for this effect size, this point is really semantic. We would like to point out that the effect size measurement the authors used is likely a standard effect size calculation (the difference between groups divided by the standard deviation of the groups). With only three experiments and irregular variance, it is likely that their estimates of standard deviation, and therefore of effect size, are unreliable. Overall, the revisions improve presentation but do not substantively resolve the difficulty, raised earlier, of drawing strong conclusions from the data set.
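To make the point about unreliable standard deviations concrete, the short simulation below is a hypothetical illustration, not the authors' analysis. It repeatedly draws experiments from two normal populations whose true standardized difference (Cohen's d) is 1 and records the sample d of each experiment. It assumes equal-variance normal data and an unpaired two-group design, which is a simplification of the paired design used in the study; all function names are our own.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD (equal group sizes assumed)."""
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def simulate_d_estimates(true_d, n, trials, seed=0):
    """Simulate `trials` experiments with n animals per group drawn from
    N(true_d, 1) and N(0, 1); return the sample d of each experiment."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        a = [rng.gauss(true_d, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(cohens_d(a, b))
    return estimates


def central_interval(values, lo=0.05, hi=0.95):
    """Empirical 5th and 95th percentiles of a list of numbers."""
    s = sorted(values)
    return s[int(lo * (len(s) - 1))], s[int(hi * (len(s) - 1))]


if __name__ == "__main__":
    for n in (3, 10, 30):
        low, high = central_interval(simulate_d_estimates(true_d=1.0, n=n, trials=20000))
        print(f"n={n:>2}: 90% of sample d values fall in [{low:.2f}, {high:.2f}]")
```

Under these assumptions, the spread of sample d values with three animals per group is several times wider than with thirty, so a single estimate above 2 is compatible with a much smaller true effect.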

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary

      The authors previously published a study of RGC boutons in the dLGN in developing wild-type mice and developing mutant mice with disrupted spontaneous activity. In the current manuscript, they have broken down their analysis of RGC boutons according to the number of Homer/Bassoon puncta associated with each vGluT2 cluster.

      The authors find that, in the first postnatal week, RGC boutons with multiple active zones (mAZs) are about a third as common as boutons with a single active zone (sAZ). The size of the vGluT2 cluster associated with each bouton was proportional to the number of active zones present in each bouton. Within the authors' ability to estimate these values (n=3 per group, 95% of results expected to be within ~2.5 standard deviations), these results are consistent across groups: 1) dominant eye vs. non-dominant eye, 2) wild-type mice vs. mice with activity blocked, and 3) ages P2, P4, and P8. The authors also found that mAZs and sAZs have roughly the same number (about 1.5) of sAZs clustered around them (within 1.5 µm).

      However, the authors do not interpret this consistency between groups as evidence that active zone clustering is not a specific marker or driver of activity dependent synaptic segregation. Rather, the authors perform a large number of tests for statistical significance and cite the presence or absence of statistical significance as evidence that "Eye-specific active zone clustering underlies synaptic competition in the developing visual system (title)". I don't believe this conclusion is supported by the evidence.

      We have revised the title to be descriptive: "Eye-specific differences in active zone addition during synaptic competition in the developing visual system." While our correlative approach does not establish direct causality, our findings provide important structural evidence that complements existing functional studies of activity-dependent synaptic refinement. We have carefully revised the text throughout to avoid causal language, focusing instead on the developmental patterns we observe.

      Strengths

      The source dataset is high-resolution data showing the colocalization of multiple synaptic proteins across development. Added to these data is labeling that distinguishes axons from the right eye from axons from the left eye. The first-order analysis of these data, showing changes in synapse density and in the occurrence of multi-active zone synapses, provides useful information about the development of an important model for activity-dependent synaptic remodeling.

      Weaknesses

      In my previous review I argued that it was not possible to determine, from their analysis, whether the differences they were reporting between groups were important to the biology of the system. The authors have made some changes to their statistics (paired t-tests) and use some less derived measures of clustering. However, they still fail to present a meaningfully quantitative argument that the observed group differences are important. The authors base most of their claims on small differences between groups. There are two big problems with this practice. First, the differences between groups appear too small to be biologically important. Second, the differences between groups that are used as evidence for how the biology works are generally smaller than the precision of the authors' sampling. That is, the differences are as likely to be false positives as true positives.

      (1) Effect size. The title claims: "Eye-specific active zone clustering underlies synaptic competition in the developing visual system". Such a claim might be supported if the authors found that mAZs are only found in dominant-eye RGCs and that eye-specific segregation doesn't begin until some threshold of mAZ frequency is reached. Instead, the behavior of mAZs is roughly the same across all conditions. For example, the clear trend in Figure 4C and D is that measures of clustering between mAZ and sAZ are as similar as could reasonably be expected by the experimental design. However, some of the comparisons of very similar values produced p-values < 0.05. The authors use this fact to argue that the negligible differences between mAZ and sAZs explain the development of the dramatic differences in the distribution of ipsilateral and contralateral RGCs.

      We have changed the title to avoid implying a causal relationship between clustering and eye-specific segregation. Our key findings in Figures 4C and 4D demonstrate effect sizes >2.0 with high statistical power (Supplemental Table S2). While the absolute magnitude of differences is modest (5-7%), these high effect sizes combined with low inter-animal variability demonstrate consistent, reproducible biological phenomena. During development, small differences during critical periods can have profound downstream consequences for synaptic refinement outcomes.

      We acknowledge that significance in Figure 4 arises due to low variance between biological replicates rather than large mean differences. We have revised the text to describe these as "slight" differences and that "WT mice show a tendency toward forming more synapses near mAZ inputs," reflecting appropriate caution in our interpretation while maintaining the statistical robustness of our findings.

      (2) Sample size. Performing a large number of significance tests and comparing p-values is not hypothesis testing and is not descriptive science. At best, with large sample sizes and controls for multiple tests, this approach could be considered exploratory. With n=3 for each group, many comparisons of many derived measures, among many groups, and no control for multiple testing, this approach constitutes a random result generator.

      The authors argue that n=3 is a large sample size for the type of high resolution / large volume data being used. It is true that many electron microscopy studies with n=1 are used to reveal the patterns of organization that are possible within an individual. However, such studies cannot control individual variation and are, therefore, not appropriate for identifying subtle differences between groups.

      In response to previous critiques along these lines, the authors argue they have dealt with this issue by limiting their analysis to within-individual paired comparisons. There are several problems with their thinking in this approach. The main problem is that they did not change the logic of their arguments, only which direction they pointed the t-tests. Instead of claiming that two groups are different because p < 0.05, they say that two groups are different because one produced p < 0.05 and the other produced p > 0.05. These arguments are not statistically valid or biologically meaningful.

      We have implemented rigorous statistical controls, applying false discovery rate (FDR) correction using the Benjamini-Hochberg method (α = 0.05) within each experimental condition (age × genotype combination). This correction strategy treats each condition as addressing a distinct experimental question: “What synaptic properties differ between left eye and right eye inputs in this specific developmental stage and genotype?” The approach appropriately controls for multiple testing while preserving power to detect biologically meaningful differences. We applied FDR correction separately to the ~20-34 measurements (varying by age and genotype) within each of the six experimental conditions, resulting in condition-specific adjusted p-values reported in updated Supplemental Table S2. This correction confirmed the robustness of our key findings. We do not base conclusions solely on comparing p-values across conditions. Our interpretations focus on effect sizes, confidence intervals, and consistent patterns within each condition, with statistical significance providing supporting evidence rather than the primary basis for biological conclusions.
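For readers less familiar with the correction named above, the Benjamini-Hochberg step-up procedure can be sketched in a few lines. This is a generic, standard-library-only illustration with function names of our own choosing, not the authors' analysis code, and the p-values in the usage example are invented. Applying the correction "within each experimental condition" simply means calling the function once per condition's list of p-values.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order.

    Step-up procedure: rank the p-values in ascending order, scale the
    i-th smallest by m / i, then enforce monotonicity from the largest
    rank downward and cap the result at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from rank m down to rank 1
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def reject_at(p_values, alpha=0.05):
    """Flag hypotheses whose BH-adjusted p-value is at most alpha."""
    return [q <= alpha for q in benjamini_hochberg(p_values)]


# Hypothetical p-values for two conditions (illustrative only, not the study's data).
conditions = {
    "P4_WT": [0.001, 0.008, 0.020, 0.035, 0.200],
    "P4_b2KO": [0.004, 0.050, 0.300],
}
for name, pvals in conditions.items():
    print(name, [round(q, 4) for q in benjamini_hochberg(pvals)], reject_at(pvals))
```

Correcting per condition, rather than across all conditions at once, keeps each family of tests small, which is why it preserves more power; the trade-off is that the false discovery rate is controlled within each condition, not across the study as a whole.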

      To the best of my understanding, the results are consistent with the following model:

      RGCs form mAZs at large boutons (known)

      About a quarter of week-one RGC boutons are mAZs (new observation)

      Vesicle clustering is proportional to active zone number (~new observation)

      RGC synapse density increases during the first postnatal week (known)

      Blocking activity reduces synapse density (known)

      Contralateral-eye RGCs form more and larger synapses in the lateral dLGN (known)

      While mAZ formation is known in adult and juvenile dLGN, the formation of mAZ boutons during eye-specific competition represents new information with important functional implications. Synapses with multiple release sites should be stronger than single-active-zone synapses, suggesting a structural correlate for competitive advantage during refinement.

      We demonstrate distinct developmental patterns for sAZ versus mAZ contacts during the first postnatal week. Multi-active zone density favors the dominant eye, while single active-zone synapse density from the competing eye increases from P2-P4 to match dominant-eye levels. This reveals that newly formed synapses from the competing eye predominantly contain single release sites, marking P4-P8 as a critical window for understanding molecular mechanisms driving synaptic elimination.

      Our results show that altered retinal activity patterns (β2KO mice) reduce synapse density during eye-specific competition. We relied on β2 knockout mice, which retain retinal waves and spontaneous spike activity but with disrupted patterns and output levels compared to controls. We make no claims about complete activity blockade. Previous studies using different activity manipulations (epibatidine, TTX) have examined terminal morphology, but effects on synapse density during competition remain largely unknown. Achieving complete retinal activity blockade is technically challenging, making it of interest to revisit the role of activity using more precise manipulations to control spike output and relative timing.

      With n=3 and effect sizes smaller than 1 standard deviation, a statistically significant result is about as likely to be a false positive as a true positive.

      A true-positive statistically significant result is not, by itself, evidence of a meaningful deviation from a biological model.
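Whether a significant result is "about as likely to be a false positive as a true positive" depends on statistical power and on what fraction of the tested differences are real. The minimal simulation below estimates both quantities under assumptions of our own that only approximate the design discussed here: normally distributed within-animal differences, a two-sided paired t-test on three animals, α = 0.05, and a true effect of one standard deviation of the differences. The prior fractions of real effects are illustrative assumptions, not estimates from this study, and the function names are hypothetical.

```python
import math
import random
import statistics

N_PAIRS = 3  # three animals, each contributing one within-animal difference
T_CRIT = 4.302652729911275  # two-sided 5% critical value of Student's t with N_PAIRS - 1 = 2 df


def paired_t_significant(diffs):
    """Two-sided paired t-test on within-animal differences at alpha = 0.05."""
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return abs(statistics.mean(diffs) / se) > T_CRIT


def rejection_rate(effect_size, trials=20000, seed=1):
    """Fraction of simulated experiments reaching significance when the true
    mean difference equals `effect_size` standard deviations of the differences."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        diffs = [rng.gauss(effect_size, 1.0) for _ in range(N_PAIRS)]
        hits += paired_t_significant(diffs)
    return hits / trials


def positive_predictive_value(power, alpha, prior_true):
    """Probability that a significant result reflects a real effect, given an
    assumed fraction `prior_true` of tested differences that are real."""
    true_pos = power * prior_true
    false_pos = alpha * (1.0 - prior_true)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    alpha_hat = rejection_rate(0.0)
    power = rejection_rate(1.0)
    print(f"false-positive rate ~ {alpha_hat:.3f}; power at d = 1 ~ {power:.3f}")
    for prior in (0.1, 0.2, 0.5):
        ppv = positive_predictive_value(power, 0.05, prior)
        print(f"assumed prior {prior:.1f}: P(real | significant) ~ {ppv:.2f}")
```

Under these assumptions the simulated power for a one-standard-deviation effect is well under one half, so the share of significant results that are false positives depends strongly on how many of the tested differences are real to begin with.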

      Our conclusions are based on results with effect sizes substantially larger than 1. Key findings demonstrate effect sizes exceeding 2.0. These large effect sizes, combined with rigorous FDR correction and low inter-animal variability, provide evidence against false positive results. During critical developmental periods, consistent structural differences, even those modest in absolute magnitude, can reflect important regulatory mechanisms that influence refinement outcomes. All statistical results, effect sizes, and power analyses are reported in Supplementary Table S2, with confidence intervals in Supplementary Table S3. We have revised the text in several places where small differences are presented to reflect appropriate caution in our interpretation.

      Providing plots that show the number of active zones present in boutons across these various conditions is useful. However, I could find no compelling deviation from the above default predictions that would influence how I see the role of mAZs in activity dependent eye-specific segregation.

      Below are critiques of most of the claims of the manuscript.

      Claim (abstract): individual retinogeniculate boutons begin forming multiple nearby presynaptic active zones during the first postnatal week.

      Confirmed by data.

      Claim (abstract): the dominant-eye forms more numerous mAZ contacts,

      Misleading: The dominant eye (by definition) forms more contacts than the non-dominant eye. That includes mAZ contacts.

      While the dominant eye forms more total contacts, the pattern depends critically on contact type and developmental stage. The dominant eye forms more mAZ contacts across all ages (Figures 2 and S1). However, for sAZ contacts, the two eyes form similar numbers at P4, with the non-dominant eye showing increased sAZ formation during this critical period. This differential pattern by synapse type represents an important aspect of how synaptic competition unfolds structurally.

      Claim (abstract): At the height of competition, the non-dominant-eye projection adds many single active zone (sAZ) synapses

      Weak: While the individual observation is strong, it is a surprising deviation based on a single n=3 experiment in a study that performed twelve such experiments (three ages, mutant/wild-type, sAZ/mAZ).

      The difference in eye-specific sAZ formation at P2 and P8 had effect sizes of ~5.3 and ~2.7, respectively (after FDR correction, the difference remained significant at P2 and trending at P8). At P4, no effect was observed by paired t-test, and the 5/95% confidence interval ranged from -0.021 to 0.008 synapses/µm<sup>3</sup>. The consistency of this pattern across P2 and P8, combined with the large effect sizes, supports the reliability of this developmental finding. We report all effect sizes and power analyses in Supplemental Table S2, and confidence intervals in Supplemental Table S3.

      Claim (abstract): Together, these findings reveal eye-specific differences in release site addition during synaptic competition in circuits essential for visual perception and behavior.

      False: This claim is unambiguously false. The above findings, even if true, do not argue for any functional significance to active zone clustering.

      Our phrasing “circuits essential for visual perception and behavior” referred to the general importance of binocular organization in the retinogeniculate system for visual processing and we did not intend to claim direct functional significance of our structural data. For clarity we have deleted the latter part of this sentence. In lines 35-37, the abstract now reads “Together, these findings reveal eye-specific differences in release site addition that correlate with axonal refinement outcomes during retinogeniculate refinement.”

      Claim (line 84): "At the peak of synaptic competition midway through the first postnatal week, the non-dominant-eye formed numerous sAZ inputs, equalizing the global synapse density between the two eyes"

      Weak: At one of twelve measures (age, bouton type, genotype) performed with 3 mice each, one density measure was about twice as high as expected.

      The difference in eye-specific sAZ formation at P2 and P8 had effect sizes of ~5.3 and ~2.7, respectively (after FDR correction, the difference remained significant at P2 and trending at P8). At P4, no effect was observed by paired t-test, and the 5/95% confidence interval ranged from -0.021 to 0.008 synapses/µm<sup>3</sup>. The consistency of this pattern across P2 and P8, combined with the large effect sizes, supports the reliability of this developmental finding. We report all effect sizes and power analyses in Supplemental Table S2, and confidence intervals in Supplemental Table S3.

      Claim (line 172): "In WT mice, both mAZ (Fig. 3A, left) and sAZ (Fig. 3B, left) inputs showed significant eye-specific volume differences at each age."

      Questionable: There appears to be a trend, but the size and consistency is unclear.

      Claim (line 175): "the median VGluT2 cluster volume in dominant-eye mAZ inputs was 3.72 fold larger than that of non-dominant-eye inputs (Fig. 3A, left)."

      Cherry picking. Twelve differences were measured, each with n = 3. The biggest difference of the group was cited. No analysis is provided of the range of uncertainty about this measure (2.5 standard deviations), either as an individual sample or as one of twelve comparisons.

      Claim (line 174): "In the middle of eye-specific competition at P4 in WT mice, the median VGluT2 cluster volume in dominant-eye mAZ inputs was 3.72 fold larger than that of non-dominant-eye inputs (Fig. 3A, left). In contrast, β2KO mice showed a smaller 1.1 fold difference at the same age (Fig. 3A, right panel). For sAZ synapses at P4, the magnitudes of eye-specific differences in VGluT2 volume were smaller: 1.35-fold in WT (Fig. 3B, left) and 0.41-fold in β2KO mice (Fig. 3B, right). Thus, both mAZ and sAZ input size favors the dominant eye, with larger eye-specific differences seen in WT mice (see Table S3)."

      No way to judge the reliability of the analysis and trivial conclusion: To analyze effect size the authors choose the median value of three measures (whatever the middle value is). They then make four comparisons at the time point where they observed the biggest difference in favor of their hypothesis. There is no way to determine how much we should trust these numbers besides spending time with the mislabeled scatter plots. The authors then claim that this analysis provides evidence that there is a difference in vGluT2 cluster volume between dominant and non-dominant RGCs and that that difference is activity dependent. The conclusion that dominant axons have bigger boutons and that mutants that lack the property that would drive segregation would show less of a difference is very consistent with the literature. Moreover, there is no context provided about what 1.35 or 1.1 fold difference means for the biology of the system.

      We focused on P4 for biological reasons rather than post-hoc selection. P4 represents the established peak of synaptic competition when eye-specific synapse densities are globally equivalent. This is a timepoint consistently highlighted throughout our manuscript and supported by previous literature. We have modified our presentation from fold changes to measured eye-specific differences in volume (mean ± standard error) and added confidence intervals in Supplemental Table S3. The effect sizes for eye-specific differences in VGluT2 volume at P4 are robust: ~2.3 and ~1.5 for mAZ and sAZ measurements in WT mice, and ~2.5 and ~1.8 in β2KO mice, with all analyses well-powered (Supplemental Table S2).

      We were unable to identify any mislabeled scatter plots and believe all figures are correctly labeled. While dominant-eye advantage in bouton size is consistent with previous literature, our study provides the first detailed analysis of how this develops specifically during the critical period of competition, with distinct patterns for single versus multi-active zone contacts. Our data show that dominant-eye inputs have larger vesicle pools that scale with active zone number. While this suggests enhanced transmission capacity, we make no direct physiological claims based on structural data alone.

      Claim (189): "This shows that vesicle docking at release sites favors the dominant-eye as we previously reported but is similar for like eye type inputs regardless of AZ number."

      Contradicts core claim of manuscript: Consistent with previous literature, there is an activity dependent relative increase in vGlut2 clustering of dominant eye RGCs. The new information is that that activity dependence is more or less the same in sAZ and mAZ. The only plausible alternative is that vGlut2 scaling only increases in mAZ which would be consistent with the claims of their paper. That is not what they found. To the extent that the analysis presented in this manuscript tests a hypothesis, this is it. The claim of the title has been refuted by figure 3.

      We report the volume of docked vesicle signal (VGluT2) nearby each active zone, finding this is greater for dominant-eye synapses. Within each eye-specific synapse population, vesicle signal per active zone is similar regardless of whether these are part of single- or multi-active zone contacts. This is consistent with a modular program of active zone assembly and maintenance: core molecular programs facilitate docking at each AZ similarly regardless of how many AZs are nearby. 

      This finding does not contradict our main conclusions but rather provides insight into how synaptic advantages are structured. The dominant eye's advantage may arise in part from forming more multi-AZ contacts (which have proportionally more docked vesicles) rather than from enhanced vesicle loading per individual active zone. This organization may reflect how developmental competition operates through contact number and active zone addition rather than fundamental changes to individual release site properties.

      We have changed the title to be descriptive rather than mechanistic.

      Claim (line 235): "For the non-dominant eye projection, however, clustered mAZ inputs outnumbered clustered sAZ inputs at P4 (Fig. 4C, bottom left panel), the age when this eye adds sAZ synapses (Fig. 2C)."

      Misleading: The overwhelming trend across 24 comparisons is that the sAZ clustering looks like mAZ clustering. That is the objective and unambiguous result. Among these 24 underpowered tests (n=3), there were a few p-values < 0.05. The authors base their interpretation of cell behavior on these crossings.

      In Figures 4C and 4D we report significant results with high effect sizes (effect sizes all greater than 2; see Supplemental Table S2). The mean differences are modest (5-7%) and significance arises due to low variance between biological replicates. We acknowledge that clustering patterns are generally similar between mAZ and sAZ inputs across most conditions. We have revised the text to describe these as “slight” differences and that “WT mice show a tendency toward forming more synapses near mAZ inputs”, reflecting appropriate caution in our interpretation while noting the statistical consistency of these patterns.

      Claim (line 328): "The failure to add synapses reduced synaptic clustering and more inputs formed in isolation in the mutants compared to controls."

      Trivially true: Density was lower in mutant.

      We have rewritten the sentence for clarity: “The failure to add synapses could explain the observation that synaptic clustering was reduced and more inputs formed in isolation in the mutants compared to controls.”

      Claim (line 332): "While our findings support a role for spontaneous retinal activity in presynaptic release site addition and clustering..."

      Not meaningfully supported by evidence: I could not find meaningful differences between WT and mutant beside the already known dramatic difference in synapse density.

      We have changed the sentence to avoid overinterpreting the results. The new sentence in lines 415-417 reads: “While our results highlight developmental changes in presynaptic release site addition and clustering, activity-dependent postsynaptic mechanisms also influence input refinement at later stages.”

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhang and Speer examine changes in the spatial organization of synaptic proteins during eye-specific segregation, a developmental period when axons from the two eyes initially mingle and gradually segregate into eye-specific regions of the dorsal lateral geniculate. The authors use STORM microscopy and immunostain presynaptic (VGluT2, Bassoon) and postsynaptic (Homer) proteins to identify synaptic release sites. Activity-dependent changes of this spatial organization are identified by comparing β2KO mice to WT mice. They describe two types of synapses based on Bassoon clustering: the multiple active zone (mAZ) synapse and the single active zone (sAZ) synapse. In this revision, the authors have added EM data to support the idea that mAZ synapses represent boutons with multiple release sites. They have also reanalyzed their data set with different statistical approaches.

      Strengths:

      The data presented are of good quality and provide an unprecedented high-resolution view of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance over previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises.

      Weaknesses:

      While the interpretation of this data set is much more grounded in this second revised submission, some of the authors' conclusions/statements still lack convincing supporting evidence. In particular, the data does not support the title: "Eye-specific active zone clustering underlies synaptic competition in the developing visual system". The data show that there are fewer synapses made for both contra- and ipsi- inputs in the β2KO mice-- this fact alone can account for the differences in clustering. There is no evidence linking clustering to synaptic competition. Moreover, the findings of differences in AZ# or distance between AZs that the authors report are quite small and it is not clear whether they are functionally meaningful.

      We thank the reviewer for their helpful suggestions that improved the manuscript in this revision. We have changed the title to remove the reference to “clustering” and to avoid implying any causal relationships. The new title is descriptive: “Eye-specific differences in active zone addition during synaptic competition in the developing visual system”.

      To further address the reviewer's comments, we have removed the remaining references to activity-dependent effects on synaptic development (line 36, line 96, line 415). We have also modified the text in lines 411-413 to state that “The failure to add synapses could explain the observation that synaptic clustering was reduced and more inputs formed in isolation in the mutants compared to controls.”

      We have also updated our presentation of results for Figure 4 to ensure that we do not causally link clustering to synaptic competition. In Figures 4C and 4D we report significant results with high effect sizes (effect sizes all greater than 2; see Supplemental Table S2). The mean differences are modest (5-7%) and significance arises due to low variance between biological replicates. We acknowledge that clustering patterns are generally similar between mAZ and sAZ inputs across most conditions. We have revised the text to describe these as “slight” differences and that “WT mice show a tendency toward forming more synapses near mAZ inputs”, reflecting appropriate caution in our interpretation while noting the statistical consistency of these patterns.

      Reviewer #3 (Public review):

      This study is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports, 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label active zones with the resolution to count them, and anti-Homer to identify postsynaptic densities. Their previous study compared the detailed synaptic structure across the development of synapses made by contra-projecting vs. ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new detailed analysis of the same data set in which they classify synapses into "multi-active zone" vs. "single-active zone" synapses and assess the number and spacing of these synapses. The authors use these measurements to draw conclusions about the role of retinal waves in the generation of same-eye synaptic clusters. The authors interpret these results as providing insight into how neural activity drives synapse maturation; however, the strength of their conclusions is not directly tested by their analysis.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate the eye of origin is what makes this data set unique over previous structural work. The addition of example images from the EM dataset provides confidence in their categorization scheme.

      Weaknesses:

      Though the descriptions of single vs. multi-active zone synapses are important and represent a significant advance, the authors continue to make unsupported conclusions regarding the biological processes driving these changes. Although this revision includes additional information about the populations tested and the tests conducted, the authors do not address the issue raised by previous reviews. Specifically, they provide no assessment of what effect size represents a biologically meaningful result. For example, a more appropriate title would be "The distribution of eye-specific single vs multi-active zone synapses is altered in mice with reduced spontaneous activity" rather than one concluding that this difference in clustering is somehow related to synaptic competition. Of course, the authors are free to speculate, but many of the conclusions of the paper are not supported by their results.

      We appreciate the reviewer’s helpful critique. We have changed the title to be descriptive and avoid implying causal relationships. 

      We have applied false discovery rate (FDR) correction using the Benjamini-Hochberg method with α = 0.05 within each experimental condition (age × genotype combination). The FDR correction treats each condition as addressing a distinct experimental question: 'What synaptic properties differ between left eye and right eye inputs in this specific developmental stage and genotype?'

      This correction strategy is appropriate because: 1) we focus our statistical comparisons within each age/genotype; 2) each age-genotype combination represents a separate biological context where different synaptic properties between eye-of-origin may be relevant; and 3) this approach controls for multiple testing within each experimental question while maintaining statistical power to detect meaningful biological differences.

      We applied FDR correction separately to the ~20-34 measurements (varying with age and genotype) within each of the six experimental conditions (P2-WT, P2-ß2, P4-WT, P4-ß2, P8-WT, P8-ß2), resulting in condition-specific adjusted p-values. These are reported in the updated Supplemental Table S2. Figures have also been updated to reflect the FDR-adjusted values. Selected between-genotype comparisons are presented descriptively using 5/95% confidence intervals. This correction confirmed the robustness of our key findings.
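      A minimal sketch of Benjamini-Hochberg adjustment applied separately within each condition, as described above, is given below in Python. It is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, and the raw p-values for each condition are assumed to be supplied as a plain list keyed by a condition label such as "P2-WT".

```python
from typing import Dict, List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Hypothetical helper for illustration only.
    """
    m = len(pvals)
    if m == 0:
        return []
    # Indices of the p-values sorted in ascending order (rank 1 = smallest).
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = min(1.0, pvals[idx] * m / rank)
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


def adjust_within_conditions(raw: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply BH separately to each condition's own family of tests.

    Each key (e.g. an age x genotype label) is treated as an independent
    family, so the number of tests m differs between conditions.
    """
    return {cond: benjamini_hochberg(p) for cond, p in raw.items()}
```

      Because each condition is corrected on its own, a given raw p-value can receive a different adjusted value in a condition with ~20 tests than in one with ~34; this is the trade-off the response describes between controlling multiple testing within each question and retaining statistical power.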

      With regard to the biological significance of effect sizes, our key findings demonstrate effect sizes >2.0, indicating robust effects. During critical developmental periods, consistent structural differences, even those modest in absolute magnitude, can reflect important regulatory mechanisms that influence refinement outcomes. The differences in synaptic organization we observe occur during the first postnatal week when eye-specific competition is active, suggesting these patterns may be relevant to understanding how structural advantages emerge during synaptic refinement.

      Reviewer #1 (Recommendations for the authors):

      I have tried to understand the analysis and biology of this manuscript as best I can. I believe the analytical approach taken is not reliable and I have explained why in my public comments. I don't believe this manuscript is unique in taking this approach. I have recently published a paper on how common this approach is and why it doesn't work. I don't want to give the impression that the problem with the analysis was that it was not computationally sophisticated enough or that you did not jump through a specific statistical hoop. If I strip out the arguments that depend on misinterpretations of p-values and -instead- look at the scatterplots, I come up with a very different view of the data than what is described in the paper.

      The information in the plots could be translated into a rigorous statistical analysis of estimated differences between groups given the uncertainties of the experimental design. I don't really think that analysis would be useful. I think it would have been enough to publish the plots and report your estimates of the number of active zones in RGCs during development. I don't see evidence of an additional effect.

      We appreciate the reviewer’s helpful comments throughout the review process. Mean active zone numbers per mAZ contact are presented in Figure S2D/E. We look forward to further technical and computational advances that will help us increase our data acquisition throughput and sample sizes when designing future studies. 

      Reviewer #2 (Recommendations for the authors):

      The authors should modify the title and other text to be more consistent with the data. There is no evidence that active zone clustering has any direct relationship to synaptic competition.

      We appreciate the reviewer’s helpful suggestions to ensure appropriate language around causal effects. We have modified the title to accurately reflect the results: "Eye-specific differences in active zone addition during synaptic competition in the developing visual system." We have revised the text in the abstract, introduction, and results section for Figure 4 to be consistent with the data and not imply causality of synapse clustering on segregation phenotypes.

      Reviewer #3 (Recommendations for the authors):

      Change the title.

      We appreciate the reviewer’s feedback throughout the review process. We have modified the title to accurately reflect the results: "Eye-specific differences in active zone addition during synaptic competition in the developing visual system."

    1. eLife Assessment

      This important work advances our understanding of NMDAR diversity in the brain by providing evidence into the subunit arrangement, architecture, and activation mechanism of GluN1-N2-N3A tri-NMDAR. However, the evidence supporting the conclusions provides incomplete proof for the presence and functional properties of this NMDA receptor subtype. The work will be of broad interest to neuroscientists and biophysicists.

    2. Reviewer #1 (Public review):

      Summary:

      The previous evidence for NMDARs containing N1, N2, and N3 subunits (t-NMDARs) was weak. All previous results could be explained by mixtures of di-heteromeric receptors. The authors here set out to identify t-NMDARs both in vitro and in the brain.

      Strengths:

      The single-channel recording is quite convincing because the authors could reproduce previous results in their system, but could also then add new observations. It is quite hard (if not impossible) to obtain the N1-N2A-N3A result at 100 µM Glu/Gly from a mixture, because the N1-N2A diheteromer has such a high open probability. Therefore, any idea that this might be, in fact, two receptors (GluN1-N2A and GluN1-N3A) is trivially falsified. The authors might prefer to make this argument based on the reduction of open probability, which cannot be achieved from a mixture masquerading as a single channel.

      With regard to crosslinker usage in brain tissue, these are very impressive attempts, which I applaud. The fluorescence images of the brain sections look convincing. But the bands corresponding to N2-N3 crosslinked subunits from neurons or the brain are faint. I would want more information to be convinced that these faint bands come from GluN2-N3 dimers.

      Weaknesses:

      In the first part of the paper, where the CryoEM structure is determined, it is not really clear to me to what extent Fab binding might bias the position of the ATDs (and even the arrangement of each subunit within the whole complex). Then, much later at the end of the results, there is a structural analysis that claims to be integrative (Figure 7) but does not obviously rely on any data other than the structures, although it does mention this point about the Fabs. The results could be rearranged to make these points clearer.

      I have my biggest doubts about the crosslinking of native receptors. For the biochemistry from neurons or brain tissue, this is a very ambitious idea that has been hard to execute over the past 15-20 years. The authors use AzF for the obvious reason that this was done before in NMDARs. The constructs that have been assembled are neat. But AzF is a really bad crosslinker. The authors attribute the weak bands to subunit mobility, but the minor abundance is more likely due to the strong constraints on AzF crosslinking and its unsuitable photochemistry in general (very easily activated with room light, for example).

      There is no information at all given about the wavelength, intensity, duration of UV exposure, and how, for example, the right exposure was determined. How were the samples protected in between?

    3. Reviewer #2 (Public review):

      Summary:

      The authors purified and solved by cryo-EM a structure of tri-heteromeric GluN1/GluN2A/GluN3A NMDA receptors, whose existence has long been contentious. Using patch-clamp electrophysiology on GluN1/GluN2/GluN3A NMDARs reconstituted into liposomes, they characterized the function of this NMDAR subtype. Finally, thanks to site-targeted crosslinking using unnatural amino acid incorporation, they show that the GluN2A subunit can crosslink with the GluN3A subunit in a cellular context, both in recombinant systems (HEK cells) and neuronal cultures and in vivo.

      Strengths:

      The NMDAR GluN3 subunit is a glycine-binding subunit that was long thought to assemble into GluN1/GluN2/GluN3 tri-heteromeric receptors during development, acting as a brake for synaptic development. However, several studies based on single subunit counting (Ulbrich et al., PNAS 2008) and ex vivo/in vivo electrophysiology have challenged the existence of these tri-heteromers (see Bossi, Pizzamiglio et al., Trends Neurosci. 2023). A large part of the controversy stems from the difficulty in isolating the tri-heteromeric population from their di-heteromeric counterparts, which led to a lack of knowledge on the biophysical and pharmacological properties of putative GluN1/GluN2/GluN3 receptors. To counteract this problem, the authors used a two-step purification method - first with a strep-tag attached to the GluN3 subunit, then with a His tag attached to the GluN2 subunit - to isolate GluN1/GluN2/GluN3 tri-heteromers from GluN1/GluN2A and GluN1/GluN3 di-heteromers, and they did observe these entities in Western blot and FSEC. They solved a cryo-EM structure of this NMDAR subtype using specific Fabs to identify the GluN1 and GluN2A subunits, showing an asymmetrical, splayed architecture. Then, they reconstituted the purified receptors in lipid vesicles to perform single-channel electrophysiological recordings. Finally, in order to validate the tri-heteromeric arrangement in a cellular system, they performed photocrosslinking experiments between the GluN2A and GluN3 subunits. For this purpose, a photoactivatable unnatural amino acid (AzF) was incorporated at the bottom of the GluN2A NTD, a region embedded within the receptor complex that is predicted to be in close proximity to the GluN3 subunit.

      This is an elegant approach to validate the existence of GluN1/GluN2/GluN3 tri-hets, since at the chosen AzF incorporation position, crosslinking between GluN2A and GluN3 is more likely to reflect interaction of subunits within the same receptor complex than between two receptors. They show crosslinking between GluN2A and GluN3 in the presence of AzF and UV light, but not if UV light or AzF were not provided, suggesting that GluN2A and GluN3 can indeed be incorporated in the same complex. In a further attempt to demonstrate the physiological relevance of these tri-heteromers, they performed the same crosslinking experiments in cultured neurons and even native brain samples. While unnatural amino acid incorporation is now a well-established technique in vitro, such an approach is very difficult to implement in vivo. The technical effort put into the validation of the presence of these tri-heteromers in vivo should thus be commended.

      Overall, all the strategies used by this paper to prove the existence of GluN1/GluN2/GluN3 tri-heteromers, and investigate their structure and function, are well-thought-out and very elegant. But the current data do not fully support the conclusions of the paper.

      Weaknesses:

      All the experiments aiming at proving the existence of GluN1/GluN2/GluN3 tri-heteromers rely on the purification of these receptors from whole cell extracts. There is therefore no proof that these receptors are expressed at the membrane and are functional. This is a limitation that has been overlooked and should be discussed in the manuscript. In addition, in the current manuscript state, each demonstration suffers from caveats that do not allow for a firm conclusion about the existence and the properties of this receptor subtype.

      (1) In Cryo-EM images of GluN1/GluN2A/GluN3A receptors, the GluN3 subunit is identified as the subunit having no Fab bound to it. How can the authors be sure that this is indeed the GluN3A subunit and not a GluN2A subunit that has not bound the Fab? Does the GluN3A subunit carry features that would allow distinguishing it independently of Fab binding? In addition, it is surprising that the authors did not incubate the tri-heteromers with a Fab against GluN3A, since Extended Figure 3 shows that such a Fab is available.

      (2) The evidence that the single-channel recordings reflect the activity of GluN1/GluN2/GluN3 tri-heteromers is not convincing. Indeed, currents from liposomes containing these tri-heteromers have two conductance levels that correspond to the conductances of the corresponding di-heteromers. There is therefore a need for additional proof that the measured currents do not reflect a mixture of currents from N1/2A di-heteromers on one side, and N1/3A di-heteromers on the other side. What is the purity of the N1/3A sample? Indeed, given the high open probability and high conductance of N1/2A di-heteromers, even a small fraction of them could significantly contribute to the single-channel currents. Additionally, although the authors show no current induced by 3 µM glycine alone on proteoliposomes with the N1/2A/3A prep (no stats provided, though), given the sharp dependence of N1/3A currents on glycine concentration, this control alone cannot rule out the presence of contaminant N1/3A di-hets in the preparation.

      Finally, pharmacological characterization of these tri-heteromers is lacking. In vivo, the presence of GluN1/GluN2/GluN3 tri-heteromers was inferred from recordings of NMDARs activated by glutamate but with low magnesium sensitivity. What is the effect of magnesium on N1/2A/3A currents? Does APV, the classical NMDAR antagonist acting at the glutamate site, inhibit the tri-heteromers? What is the effect of CGP-78608, which inhibits GluN1/GluN2 NMDARs but potentiates GluN1/GluN3 NMDARs? Such pharmacological characterization is critical to validate that the measured currents are indeed carried by a tri-heteromeric population, and would also be very important to identify such tri-heteromers in native tissues.

      (3) Validation of GluN1/GluN2/GluN3 tri-heteromer expression by photocrosslinking: The mixture of constructs used (full-length or CTD-truncated, with or without tags) is confusing, and it is difficult to track the correct molecular weight of the different constructs. In Figure 6, the band corresponding to a putative GluN3/GluN2A dimer is very weak. In addition, given the differences in molecular weights between the GluN2 subunits and GluN3, we would expect the band corresponding to a GluN2A/GluN2B dimer to migrate differently from the GluN2A/GluN3 dimer, but all high molecular weight bands seem to be at the same level in the blot. Finally, in the source data, the blots display additional bands that the authors dismissed without justification. In short, better clarification of the constructs and more careful interpretation of the blots are necessary to support the conclusions claimed by the authors.

    1. eLife Assessment

      This important study sought to investigate the role that early childhood malaria exposure plays in the development of antibody responses to unrelated pathogens and vaccine-derived antigens in Kenyan children. In this natural experiment, the authors compare antibody levels among children who have been exposed to different levels of malaria transmission by using protein microarray technology. Although the findings are of importance, the evidence remains incomplete, and the analysis would benefit from a more in-depth evaluation of potential confounders. With the appropriate analysis, the findings will be of great interest for global health, immunology, and vaccine development.

    2. Reviewer #1 (Public review):

      Summary:

      The study shows that childhood malaria can weaken the antibody response to other vaccines and infections. This suggests that early exposure to P. falciparum may have a long-lasting effect on immunity, with implications for vaccine efficacy in endemic areas.

      Strengths:

      This study stands out for its longitudinal design, the use of robust immunological techniques, and the comparison between areas with different levels of malaria exposure. Its findings reveal that early malaria can weaken the response to childhood vaccines, with important implications for public health in endemic regions.

      Weaknesses:

      One of the study's main limitations is the lack of functional data confirming the clinical impact of the low antibody levels. In addition, although multiple immune responses were measured, other important components, such as cellular immunity, were not assessed. Finally, the results may not be generalizable to other regions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated whether early-life malaria exposure has long-term effects on immune responses to unrelated antigens. They leveraged a natural experiment in coastal Kenya where two adjacent communities (Junju and Ngerenya) experienced divergent malaria transmission patterns after 2004. Using 15 years of longitudinal data from 123 children with weekly malaria surveillance and annual serological sampling, they measured antibody responses to multiple pathogens using a protein microarray technology and ELISA.

      Strengths:

      (1) Extensive longitudinal data collection with weekly malaria surveillance, enabling precise exposure classification.

      (2) Use of a natural experiment design that allows for causal inference about malaria's immunological effects.

      (3) Broad panel of antigens tested, demonstrating generalized rather than antigen-specific effects.

      (4) Within-cohort analysis in Ngerenya controls for geographic and environmental factors.

      (5) Validation of key findings using both serologic microarray and ELISA.

      (6) Important public health implications for vaccine strategies in malaria-endemic regions.

      Weaknesses:

      (1) Lack of participants' characteristics (socio-economic, nutritional, physical).

      (2) Somewhat limited sample size (longitudinal analysis of 123 children total), with further subdivision reducing statistical power for some analyses.

      (3) Potential confounding by unmeasured socioeconomic, nutritional, or environmental factors between communities.

      (4) Lack of ability to determine the direction of the associations found between malaria exposure and other IgG levels to unrelated pathogens.

      (5) Despite good longitudinal data, the main analysis was conducted as a cross-sectional analysis at age 10 for many comparisons, which limits the understanding of temporal dynamics.

      (6) Statistical analysis is limited to univariable comparisons without consideration for confounders or adjusting for multiple comparisons.

      (7) No mechanistic understanding of how early malaria exposure creates lasting immunosuppression.

      (8) No understanding of the clinical implications of the reduced IgG levels observed in the area with high malaria exposure.

      Assessment of Claims:

      The data appear to support the authors' primary claims, but the strength of the evidence is limited, and the results should be interpreted with caution. Together with the currently available evidence of P. falciparum's impact on the host's immune function, this natural experiment design provides further evidence for a relationship between early malaria exposure and reduced antibody responses. The within-Ngerenya analysis controls for geographic factors and thus enhances the quality of the evidence; however, it still fails to account for the physical, nutritional, and socio-economic factors that may have driven the observed changes. Additionally, the mechanism underlying this effect remains unclear, and the clinical significance of reduced antibody levels is not established.

      Impact and Utility:

      This work has fundamental implications for understanding vaccine effectiveness in malaria-endemic regions and may contribute to informing vaccination strategies. The findings, if strengthened, would suggest that children in areas of high malaria transmission may require modified immunization approaches. The dataset provides a valuable resource for future studies of malaria's immunological legacy.

      Context:

      This study builds on prior work showing acute immunosuppressive effects of malaria but uniquely attempts to demonstrate the durability of these effects years after exposure. The natural experiment design addresses limitations of previous observational studies by providing a more controlled comparison.

    1. eLife Assessment

      This important work combines theoretical analysis with precise experimental perturbation to demonstrate that the Wnt signaling pathway is characterized by anti-resonance, or a suppression of pathway output at intermediate activation frequencies. The authors identify an anti-resonance behavior, with compelling evidence from optogenetic stimulation in multiple cell types, alongside modeling results that corroborate the phenomenon. While the demonstration of this phenomenon has yet to be extended to fully physiological situations, its clear existence within optogenetically stimulated systems shows that it is likely a significant factor that contributes to the behavior of this central signaling pathway.

    2. Reviewer #1 (Public review):

      Summary:

      This report demonstrates that the gene expression output of the Wnt pathway, when controlled precisely by a synthetic light-based input, depends substantially on the frequency of stimulation. The particular frequency-dependent trend that is observed - anti-resonance, a suppression of target gene expression at intermediate frequencies given a constant duty cycle - is a novel aspect that has not been clearly shown before for this or other signaling pathways. The paper provides both clear experimental evidence of the phenomenon with engineered cellular systems and a model-based analysis of how the pairing of rate constants in pathway activation/deactivation could result in such a trend.

      Strengths:

      This report couples in vitro experimental data with an abstracted mathematical model. Both of these approaches appear to be technically sound and to provide consistent and strong support for the main conclusion. The experimental data are particularly clear, and the demonstration that Brachyury expression is subject to anti-resonance in ESCs is particularly compelling. The modeling approach is reasonably scaled for the system at the level of detail that is needed in this case, and the hidden variable analysis provides some insight into how the anti-resonance works.

      Weaknesses:

      (1) The anti-resonance phenomenon has not been demonstrated using physiological Wnt ligands; however, I view this as only a minor weakness for an initial report of the phenomenon. The potential significance of the phenomenon for Wnt outweighs the amount of effort it would take to carry the demonstration further - testing different frequencies/duty cycles at the level of ligand stimulus using microfluidics could get quite involved, and would likely take quite some time. Adding some more discussion about how the time scales of ligand-receptor binding could play into the reduced model would further ameliorate this issue.

      (2) While the model is fully consistent with the data, it has not been validated using experimental manipulations to establish that the mechanisms of the cell system and the model are the same. There may be some ways to make such modifications, for example, using a proteasome inhibitor. An alternative would be to more explicitly mention the need to validate the model's mechanism with experiments.

      (3) I think the manuscript misses an opportunity to discuss the potential of the phenomenon in other pathways. The hedgehog pathway, for example, involves GSK3-mediated partial proteolysis of a transcription factor, which could conceivably be subject to similar behaviors, and there are certainly other examples as well.

      (4) Some aspects of the modeling and hidden variable analysis are not optimally presented in the main text, although when considered together with the Supplemental Data, there are no significant deficiencies.

    3. Reviewer #2 (Public review):

      Summary:

      By combining optogenetics with theoretical modelling, the authors identify an anti-resonance behavior in the Wnt signaling pathway. This behavior is manifested as a minimal response at a certain stimulation frequency. Using an abstracted hidden variable model, the authors explain their findings by a competition of timescales. Furthermore, they experimentally show that this anti-resonance influences the cell fate decision involved in human gastrulation.

      Strengths:

      (1) This interdisciplinary study combines precise optogenetic manipulation with advanced modelling.

      (2) The results are directly tested in two different systems: HEK293T cells and H9 human embryonic stem cells.

      (3) The model is implemented based on previous literature and has two levels of detail: i) a detailed biochemical model and ii) an abstract model with a hidden parameter.

      Weaknesses:

      (1) While the experiments provide both single-cell data and population data, the model only considers population data.

      (2) Although the model captures the experimental data for TopFlash very well, the beta-Cat curves (Figure 2B) are only described qualitatively. This discrepancy is not discussed.

      Overall Assessment:

      The authors convincingly identified an anti-resonance behavior in a signaling pathway that is involved in cell fate decisions. The focus on a dynamic signal and the identification of such a behavior is important. I believe that the model approach of abstracting a complicated pathway with a hidden variable is an important tool to obtain an intuitive understanding of complicated dependencies in biology. Such a combination of precise optogenetic manipulation with effective models will provide a new perspective on causal dependencies in signaling pathways and should not be limited only to the system that the authors study.

    1. eLife Assessment

      This fundamental study presents a new method for longitudinally tracking cells in two-photon imaging data that addresses the specific challenges of imaging neurons in the developing cortex. It provides compelling evidence demonstrating reliable longitudinal identification of neurons across the second postnatal week in mice. The study should be of interest to development neuroscientists engaged in population-level recordings using two-photon imaging.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents a compelling and innovative approach that combines Track2p neuronal tracking with advanced analytical methods to investigate early postnatal brain development. The work provides a powerful framework for exploring complex developmental processes such as the emergence of sensory representations, cognitive functions, and activity-dependent circuit formation. By enabling the tracking of the same neurons over extended developmental periods, this methodology sets the stage for mechanistic insights that were previously inaccessible.

      Strengths:

      (1) Innovative Methodology:

      The integration of Track2p with longitudinal calcium imaging offers a unique capability to follow individual neurons across critical developmental windows.

      (2) High Conceptual Impact:

      The manuscript outlines a clear path for using this approach to study foundational developmental questions, such as how early neuronal activity shapes later functional properties and network assembly.

      (3) Future Experimental Potential:

      The authors convincingly argue for the feasibility of extending this tracking into adulthood and combining it with targeted manipulations, which could significantly advance our understanding of causality in developmental processes.

      (4) Broad Applicability:

      The proposed framework can be adapted to a wide range of experimental designs and questions, making it a valuable resource for the field.

      Weaknesses:

      None major. The manuscript is conceptually strong and methodologically sound. Future studies will need to address potential technical limitations of long-term tracking, but this does not detract from the current work's significance and clarity of vision.

      Comments on revisions:

      I have no further requests. I think this is an excellent manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Majnik and colleagues introduces "Track2p", a new tool designed to track neurons across imaging sessions of two-photon calcium imaging in developing mice. The method addresses the challenge of tracking cells in the growing brain of developing mice. The authors showed that "Track2p" successfully tracks hundreds of neurons in the barrel cortex across multiple days during the second postnatal week. This enabled identification of the emergence of behavioral state modulation and desynchronization of spontaneous network activity around postnatal day 11.

      Strengths

      The authors have satisfactorily addressed the majority of our questions and comments, and the revisions substantially improve the manuscript. The expansion of Track2p to accept general NumPy array inputs makes the tool more accessible to researchers using different analysis pipelines. While the absence of benchmarking standards remains a limitation across the field, the release of the ground-truth dataset is an important step forward that will allow other researchers to evaluate and compare algorithms.

      Minor point

      (1) The authors tested the robustness of the algorithm across non-consecutive days. As expected, performance drops significantly under these conditions. We agree that this limitation reflects biological constraints due to brain growth rather than shortcomings of the algorithm itself. This is relevant for researchers planning to use Track2p for longitudinal imaging or benchmarking new algorithms, and we recommend including some of this information in the Supplementary Information along with a brief discussion.

      Comments on revisions:

      We acknowledge the extended documentation for using Track2p and converting between Suite2p outputs and NumPy arrays. This addition is of great utility. We would also suggest further expanding the documentation for the NumPy array implementation, as we ran into some errors when testing this feature using NumPy arrays generated from deltaF traces, TIFF FOVs, and Cellpose masks.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Majnik et al. developed a computational algorithm to track individual developing interneurons in the rodent cortex at postnatal stages. Considerable development in cortical networks takes place during the first postnatal weeks; however, tools to study them longitudinally at a single-cell level are scarce. This paper provides a valuable approach to study both single-cell dynamics across days and state-driven network changes. The authors used Gad67Cre mice together with virally introduced TdTom to track interneurons based on their anatomical location in the FOV and AAVSynGCaMP8m to follow their activity across the second postnatal week, a period during which the cortex is known to undergo marked decorrelation in spontaneous activity. Using Track2P, the authors show the feasibility of tracking populations of neurons in the same mice, capturing with their analysis previously described developmental decorrelation and uncovering stable representations of neuronal activity, coincident with the onset of spontaneous active movement. The quality of the imaging data is compelling, and the computational analysis is thorough, providing a widely applicable tool for the analysis of emerging neuronal activity in the cortex. Below are some points for the authors to consider.

      Major points

      The authors use a viral approach to label cortical interneurons. It is unclear how Track2P will perform in dense networks of excitatory cells using GCaMP transgenic mice.

      The authors used 20 neurons to generate a ground truth dataset. The rationale for this sample size is unclear. Figure 1 indicates the capability to track ~728 neurons. A larger ground truth dataset will increase the robustness of the conclusions.

      It is unclear how movement was scored in the analysis shown in Fig. 5A. Was the time that the mouse spent moving scored after visual inspection of the videos? Were whisker and muscle twitches scored as movement, or was movement quantified as the amount of time during which the treadmill was displaced?

      The rationale for binning the data analysis in early P11 is unclear. As the authors acknowledged, it is likely that the decoder captured active states from P11 onwards. Because active whisking begins around P14, it is unlikely to drive this change in network dynamics at P11. Does pupil dilation in the pups change during locomotor and resting states? Does the arousal state of the pups abruptly change at P11?

      Comments on revisions:

      The authors have carefully addressed all my comments. This is an interesting paper.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for very enthusiastic and supportive comments on our manuscript. 

      Summary:

      This manuscript presents a compelling and innovative approach that combines Track2p neuronal tracking with advanced analytical methods to investigate early postnatal brain development. The work provides a powerful framework for exploring complex developmental processes such as the emergence of sensory representations, cognitive functions, and activity-dependent circuit formation. By enabling the tracking of the same neurons over extended developmental periods, this methodology sets the stage for mechanistic insights that were previously inaccessible.

      Strengths:

      (1) Innovative Methodology:

      The integration of Track2p with longitudinal calcium imaging offers a unique capability to follow individual neurons across critical developmental windows.

      (2) High Conceptual Impact:

      The manuscript outlines a clear path for using this approach to study foundational developmental questions, such as how early neuronal activity shapes later functional properties and network assembly.

      (3) Future Experimental Potential:

      The authors convincingly argue for the feasibility of extending this tracking into adulthood and combining it with targeted manipulations, which could significantly advance our understanding of causality in developmental processes.

      (4) Broad Applicability:

      The proposed framework can be adapted to a wide range of experimental designs and questions, making it a valuable resource for the field.

      Weaknesses:

      No major weaknesses were identified by this reviewer. The manuscript is conceptually strong and methodologically sound. Future studies will need to address potential technical limitations of long-term tracking, but this does not detract from the current work's significance and clarity of vision.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Majnik and colleagues introduces "Track2p", a new tool designed to track neurons across imaging sessions of two-photon calcium imaging in developing mice. The method addresses the challenge of tracking cells in the growing brain of developing mice. The authors showed that "Track2p" successfully tracks hundreds of neurons in the barrel cortex across multiple days during the second postnatal week. This enabled the identification of the emergence of behavioral state modulation and desynchronization of spontaneous network activity around postnatal day 11.

      Strengths:

      The manuscript is well written, and the analysis pipeline is clearly described. Moreover, the dataset used for validation is of high quality, considering the technical challenges associated with longitudinal two-photon recordings in mouse pups. The authors provide a convincing comparison of both manual annotation and "CellReg" to demonstrate the tracking performance of "Track2p". Applying this tracking algorithm, Majnik and colleagues characterized hallmark developmental changes in spontaneous network activity, highlighting the impact of longitudinal imaging approaches in developmental neuroscience. Additionally, the code is available on GitHub, along with helpful documentation, which will facilitate accessibility and usability by other researchers.

      Weaknesses:

      (1) The main critique of the "Track2p" package is that, in its current implementation, it is dependent on the outputs of "Suite2p". This limits adoption by researchers who use alternative pipelines or custom code. One potential solution would be to generalize the accepted inputs beyond the fixed format of "Suite2p", for instance, by accepting NumPy arrays (e.g., ROIs, deltaF/F traces, images, etc.) from files generated by other software. Otherwise, the tool may remain more of a useful add-on to "Suite2p" (see https://github.com/MouseLand/suite2p/issues/933) rather than a fully standalone tool.

      We thank the reviewer for this excellent suggestion. 

      We have now implemented this feature: Track2p is now compatible with ‘raw’ NumPy arrays for each of the three input types. For more information, please see the updated documentation: https://track2p.github.io/run_inputs_and_parameters.html#raw-npy-arrays. We have also tested this feature with a custom segmentation and trace-extraction pipeline that uses Cellpose for segmentation.
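      For readers converting outputs from other pipelines, a minimal sketch of one preparatory step: turning a Cellpose-style label image (0 = background, 1..N = ROI ids) into a stack of per-ROI boolean masks with NumPy. The array layout `(n_rois, height, width)`, the function name and the file name in the commented-out `np.save` call are assumptions for illustration; the format Track2p actually expects is specified in the documentation linked above.

```python
import numpy as np


def labels_to_mask_stack(label_img: np.ndarray) -> np.ndarray:
    """Convert a 2D integer label image (0 = background, 1..N = ROI ids)
    into an (n_rois, H, W) boolean stack with one mask per ROI."""
    roi_ids = np.unique(label_img)
    roi_ids = roi_ids[roi_ids != 0]  # drop the background label
    # Broadcast (1, H, W) against (n_rois, 1, 1): one boolean plane per ROI id
    return label_img[None, :, :] == roi_ids[:, None, None]


# Toy 4x4 label image containing two ROIs
labels = np.array([[0, 1, 1, 0],
                   [0, 1, 0, 0],
                   [2, 0, 0, 0],
                   [2, 2, 0, 0]])
masks = labels_to_mask_stack(labels)
print(masks.shape)             # (2, 4, 4)
print(masks.sum(axis=(1, 2)))  # [3 3]
# np.save("masks.npy", masks)  # hypothetical file name; see the Track2p docs
```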

      (2) Further benchmarking would strengthen the validation of "Track2p", particularly against "CaImAn" (Giovannucci et al., eLife, 2019), which is widely used in the field and implements a distinct registration approach.

      The reviewer suggested further benchmarking of Track2p. Ideally, we would benchmark Track2p against the current state-of-the-art method. However, the field currently lacks consensus on which algorithm performs best, with multiple methods available, including CaImAn, SCOUT (Johnston et al. 2022), ROICaT (Nguyen et al. 2023), ROIMatchPub (recommended by the Suite2p documentation and recently used by Hasegawa et al. 2024), and custom pipelines such as those described by Sun et al. 2025. The absence of systematic benchmarking studies, particularly for custom tracking pipelines, makes it impossible to identify the current state of the art for comparison with Track2p. While comparing Track2p against all available methods would provide a comprehensive evaluation, such an analysis falls beyond the scope of this paper.

      We selected CellReg for our primary comparison because it has been validated under similar experimental conditions, specifically two-photon calcium imaging in the developing hippocampus between P17 and P25 (Wang et al. 2024), making it the most relevant benchmark for our developmental neocortex dataset.

      That said, to support further benchmarking in mouse neocortex (P8-P14), we will publicly release our ground truth tracking dataset.

      (3) The authors might also consider evaluating performance using non-consecutive recordings (e.g., alternate days or only three time points across the week) to demonstrate utility in other experimental designs.

      Thank you for the suggestion. We performed a similar analysis prior to submission but decided against including it in the final manuscript, to keep the evaluation brief and to avoid confusing the reader with too many different evaluation methods. We have included the results in Author response images 1 and 2 below.

      To evaluate performance in experimental designs with longer intervals between recordings (>1 day), we additionally evaluated tracking from P8 to each subsequent day while omitting the intermediate days (e.g., P8 to P9, P8 to P10 … P8 to P14). The performance for the three mice from the manuscript is shown below:

      Author response image 1.

      As expected, performance drops significantly as the time between the two recordings increases (to effectively zero for 2 out of 3 mice). This could also explain why CellReg struggles to track cells across all days: it takes P8 as a reference and attempts to register all subsequent days to that time point before matching, instead of performing registration and matching in consecutive pairs of recordings (P8-P9, P9-P10 … P13-P14) as we do.

      Finally, for one of the three mice, we performed an additional test asking how adding a single intermediate recording day might rescue P8-P14 tracking performance. This addresses the reviewer's comment directly: if only three recording days are possible, which additional day gives the best tracking performance?

      Author response image 2.

      As can be seen from the plot, adding the P10 or P11 recording shows the most significant improvement to the tracking performance, however the performance is still significantly lower than when including all days (see Fig. 4). This test suggests that including a day that is slightly skewed to earlier ages might improve the performance more than simply choosing the middle day between the two extremes. This would also be consistent with the qualitative observation that the FOV seems to show more drastic day-to-day changes at earlier ages in our recording conditions.
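      The consecutive-pair strategy described above (P8-P9, P9-P10 … P13-P14) can be illustrated with a minimal sketch: given one ROI-to-ROI match dictionary per consecutive pair, a multi-day track is kept only if it is unbroken across all pairs. The dictionary-based data structure and the function name `chain_matches` are hypothetical and are not taken from the Track2p code.

```python
def chain_matches(pair_matches, n_rois_day0):
    """Chain consecutive pairwise matches into multi-day tracks.

    pair_matches[d] maps an ROI index on day d to its matched ROI index on day d+1.
    Returns one track (list of ROI indices, one per day) for every day-0 ROI
    that is matched in every consecutive pair.
    """
    tracks = []
    for roi in range(n_rois_day0):
        track = [roi]
        for matches in pair_matches:
            nxt = matches.get(track[-1])
            if nxt is None:  # the chain breaks: this cell is lost
                break
            track.append(nxt)
        else:  # no break: the cell was matched on every day
            tracks.append(track)
    return tracks


# Three consecutive pairs (four days); only day-0 ROI 0 survives all of them
pairs = [{0: 2, 1: 0}, {2: 1, 0: 3}, {1: 0}]
print(chain_matches(pairs, n_rois_day0=3))  # [[0, 2, 1, 0]]
```

      In this sketch, a single missed match in any pair ends the track, so the number of complete tracks can only decrease as days are added.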

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Majnik et al. developed a computational algorithm to track individual developing interneurons in the rodent cortex at postnatal stages. Considerable development in cortical networks takes place during the first postnatal weeks; however, tools to study them longitudinally at a single-cell level are scarce. This paper provides a valuable approach to study both single-cell dynamics across days and state-driven network changes. The authors used Gad67Cre mice together with virally introduced TdTom to track interneurons based on their anatomical location in the FOV and AAVSynGCaMP8m to follow their activity across the second postnatal week, a period during which the cortex is known to undergo marked decorrelation in spontaneous activity. Using Track2P, the authors show the feasibility of tracking populations of neurons in the same mice, capturing with their analysis previously described developmental decorrelation and uncovering stable representations of neuronal activity, coincident with the onset of spontaneous active movement. The quality of the imaging data is compelling, and the computational analysis is thorough, providing a widely applicable tool for the analysis of emerging neuronal activity in the cortex. Below are some points for the authors to consider.

      We thank the reviewer for a constructive and positive evaluation of our manuscript.

      Major points:

      (1) The authors used 20 neurons to generate a ground truth dataset. The rationale for this sample size is unclear. Figure 1 indicates the capability to track ~728 neurons. A larger ground truth data set will increase the robustness of the conclusions.

      We think this was a misunderstanding of our ground truth dataset analysis, which included 192 and not 20 neurons. As explained in the Methods section, since manually tracking all cells would require a prohibitive amount of time, we generated sparse manual annotations, tracking only a subset of all cells from the first recording day onwards. To do this, we took the first recording (s0), defined a grid of 64 equidistant points over the FOV and, for each point, identified the closest ROI in terms of Euclidean distance from the median pixel of the ROI (see Fig. S3A). We then manually tracked these 64 ROIs across subsequent days. Only neurons that were detected and tracked across all sessions were taken into account and referred to as our ground truth dataset (‘GT’ in Fig. 4). This was done for 3 mice, hence 3 × 64 = 192 neurons, and not 20, were used to generate our GT dataset.
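      The seed-selection step can be sketched in a few lines of NumPy, assuming the 64 equidistant points form an 8 × 8 grid placed at bin centres and that each ROI is summarised by its median pixel as (y, x). The grid placement and the function name are assumptions; note also that, in this sketch, the same ROI can be nearest to two grid points.

```python
import numpy as np


def grid_seed_rois(roi_medians, fov_shape, n_per_side=8):
    """Return, for each point of an n_per_side x n_per_side grid of
    equidistant points over the FOV, the index of the nearest ROI.

    roi_medians: (n_rois, 2) array of (y, x) median-pixel coordinates.
    fov_shape:   (height, width) of the FOV in pixels.
    """
    h, w = fov_shape
    # Grid points at the centres of n_per_side equal bins along each axis
    ys = (np.arange(n_per_side) + 0.5) * h / n_per_side
    xs = (np.arange(n_per_side) + 0.5) * w / n_per_side
    grid = np.array([(y, x) for y in ys for x in xs])  # (n_per_side**2, 2)
    # Euclidean distance from every grid point to every ROI median pixel
    dist = np.linalg.norm(grid[:, None, :] - roi_medians[None, :, :], axis=-1)
    return dist.argmin(axis=1)


# Four ROIs near the four quadrant centres of a 512 x 512 FOV, 2 x 2 grid
medians = np.array([[100., 100.], [100., 400.], [400., 100.], [400., 400.]])
print(grid_seed_rois(medians, (512, 512), n_per_side=2))  # [0 1 2 3]

# With the default 8 x 8 grid, 64 seeds are returned (possibly with repeats)
rng = np.random.default_rng(0)
seeds = grid_seed_rois(rng.uniform(0, 512, size=(300, 2)), (512, 512))
print(len(seeds))  # 64
```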

      (2) It is unclear how movement was scored in the analysis shown in Figure 5A. Was the time that the mouse spent moving scored after visual inspection of the videos? Were whisker and muscle twitches scored as movement, or was movement quantified as the amount of time during which the treadmill was displaced?

      Movement was scored using a ‘motion energy’ metric, as in Stringer et al. 2019 (V1) and Inácio et al. 2025 (S1). For each pair of consecutive frames of the videography recording, this metric computes the sum of the squared pixelwise differences between the two images. We have modified the main text and Methods to clarify this and avoid confusion.

      Since this metric quantifies global movement, it is inherently biased toward whole-body movements, which cause larger changes in pixel values across the whole FOV of the camera. Slight twitches of a single limb or the whisker pad thus contribute much less to this metric, since these are usually small displacements in a small region of the camera FOV. Additionally, comparing neural activity across all time points (using correlation or R²) also favours longer-lasting movements (such as wake movements or prolonged periods of high arousal), since each time point is weighted equally.

      As suggested in the Discussion, it would be interesting in future analyses to examine the link between twitches and neural activity, but this would likely require extensive manual scoring. Movements could then be treated not as a continuous variable across all time points but as discrete events, for example using peri-movement time histograms for different types of movements at different ages. This is, however, outside the scope of this study.
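      The motion energy metric as defined in this response (sum of squared pixelwise differences between consecutive video frames) can be written as a short NumPy function. This is a minimal sketch of that definition, not the exact code used in the manuscript or in the cited studies; the frame array layout `(n_frames, height, width)` is an assumption.

```python
import numpy as np


def motion_energy(frames: np.ndarray) -> np.ndarray:
    """frames: (n_frames, H, W) video. Returns an (n_frames - 1,) array whose
    entry t is the sum over pixels of (frame[t+1] - frame[t]) ** 2."""
    f = frames.astype(np.float64)  # avoid wrap-around when frames are uint8
    diff = np.diff(f, axis=0)      # frame-to-frame pixelwise differences
    return (diff ** 2).sum(axis=(1, 2))


frames = np.zeros((3, 2, 2), dtype=np.uint8)
frames[2, 0, 0] = 10           # one pixel changes between frames 1 and 2
print(motion_energy(frames))   # 0 for frames 0->1, 100 for frames 1->2
```

      Because each pixel's change is squared and then summed over the whole frame, large, spatially extensive changes such as whole-body movements dominate the signal, whereas small, localized twitches contribute little.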

      (3) The rationale for binning the data analysis in early P11 is unclear. As the authors acknowledged, it is likely that the decoder captured active states from P11 onwards. Because active whisking begins around P14, it is unlikely to drive this change in network dynamics at P11. Does pupil dilation in the pups change during locomotor and resting states? Does the arousal state of the pups abruptly change at P11?

      We agree that P11 does not coincide with any change in mouse behavior that we have been able to capture. However, the arousal state of mice does change around postnatal day 11. This period marks a transition from immature, fragmented states to more organized and regulated sleep-wake patterns, along with an increasing influence of neuromodulatory and sensory systems. These changes have recently been reviewed in Wu et al. 2024 (see also Martini et al. 2021). In addition, in the developing somatosensory system before P11, wake-related movements (reafference) are actively gated and blocked by the external cuneate nucleus (ECN; Tiriac et al. 2016 and the excellent recent work from the Blumberg lab). This gating prevents sensory feedback from wake movements from reaching the cortex, ensuring that only sleep-related twitches drive neural responses. Around P11, however, this gating mechanism abruptly lifts, enabling sensory signals from wake movements to influence cortical processing, a dramatic developmental shift (Wu et al. 2024).

      Reviewer #1 (Recommendations for the authors):

      This manuscript represents a significant advancement in the field of developmental neuroscience, offering a powerful and elegant framework for longitudinal cellular tracking using the Track2p method combined with robust analytical approaches. The authors convincingly demonstrate that this integrated methodology provides an invaluable template for investigating complex developmental processes, including the emergence of sensory representations and higher cognitive functions.

      A major strength of this work is its emphasis on the power of longitudinal imaging to illuminate activity-dependent development. By tracking the same neurons over time, the authors open up new possibilities to uncover how early activity patterns shape later functional outcomes and the organization of neuronal assemblies, insights that would be inaccessible using conventional cross-sectional designs.

      Importantly, the manuscript highlights the potential for this approach to be extended even further, enabling continuous tracking into adulthood and thus offering an unprecedented window into long-term developmental trajectories. The authors also underscore the exciting opportunity to incorporate targeted perturbation experiments, allowing researchers to causally link early circuit dynamics to later outcomes.

      Given the increasing recognition that early postnatal alterations can underlie the etiology of various neurodevelopmental disorders, this work is especially timely. The methods and perspectives presented here are poised to catalyze a new generation of developmental studies that can reveal mechanistic underpinnings of both typical and atypical brain development.

      In summary, this is a technically impressive and conceptually forward-looking study that sets the stage for transformative advances in developmental neuroscience.

      Thank you for the thoughtful feedback; it is greatly appreciated!

      Reviewer #2 (Recommendations for the authors):

      Minor points:

      (1) Figure 1. Consider merging or moving to Supplemental, as its rationale is well described in the text.

      We would like to retain the current figure as we believe it provides an effective visual illustration of our rationale that will capture readers' attention and could serve as a valuable reference for others seeking to justify longitudinal tracking of the developing brain. We hope the reviewer will understand our decision.

      (2) Some axis labels and panels are difficult to read due to small font sizes (e.g. smaller panels in Figures 5-7).

      Modified, thanks 

      (3) Supplementary Figures. The order of appearance in the main text is occasionally inconsistent.

      This was modified, thanks

      (4) Line 132. Add a reference to the registration toolbox used (elastix). A brief description of the affine transformation would also be helpful, either here or in the Methods section (p. 27).

      We have added a reference to Ntatsis et al. 2023 and described the affine transformation in the main text (lines 133-135):

      Firstly, we estimate the spatial transformation between s0 and s1 using affine image registration (i.e. allowing shifting, rotation, scaling and shearing, see Fig. 2B, the transformation is denoted as T).
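      To make the four components of an affine transformation concrete, the sketch below composes scaling, shearing, rotation and translation into a single 3 × 3 homogeneous matrix and applies it to ROI coordinates. It only illustrates what T can represent; in the manuscript T is estimated with elastix, and the parameterisation and composition order used here are assumptions.

```python
import numpy as np


def affine_matrix(tx, ty, theta, sx, sy, shear):
    """3x3 homogeneous 2D affine matrix applying scale, then shear,
    then rotation (theta in radians), then translation (tx, ty)."""
    S = np.diag([sx, sy, 1.0])
    H = np.array([[1.0, shear, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T = np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])
    return T @ R @ H @ S


def apply_affine(A, points):
    """Map an (N, 2) array of (x, y) points through the affine matrix A."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    return (homog @ A.T)[:, :2]


# Scale by 2, then shift by (5, -3): (1, 1) -> (2, 2) -> (7, -1)
A = affine_matrix(tx=5, ty=-3, theta=0.0, sx=2, sy=2, shear=0.0)
print(apply_affine(A, np.array([[1.0, 1.0]])))
```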

      (5) Lines 147-151. If this method is adapted from another work, please cite the source.

      Computing the intersection over union of two ROIs for tracking is a widely established and intuitive method used across numerous studies, representing standard practice rather than requiring specific citation. We have however included the reference to the paper describing the algorithm we use to solve the linear sum assignment problem used for matching neurons across a pair of consecutive days (Crouse 2016).
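      For concreteness, a minimal sketch of IoU-based matching between two consecutive sessions, assuming the ROI masks of the later session have already been mapped into the frame of the earlier one with T. It uses SciPy's `linear_sum_assignment`, whose documentation cites Crouse 2016, to find the one-to-one assignment that maximizes total IoU. The `iou_threshold` value of 0.3 is a hypothetical choice for illustration, not a parameter taken from Track2p.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(masks_a, masks_b):
    """masks_a: (Na, H, W) bool, masks_b: (Nb, H, W) bool -> (Na, Nb) IoU values."""
    a = masks_a.reshape(len(masks_a), -1).astype(np.float64)
    b = masks_b.reshape(len(masks_b), -1).astype(np.float64)
    inter = a @ b.T  # number of overlapping pixels for every pair
    union = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - inter
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


def match_rois(masks_a, masks_b, iou_threshold=0.3):
    """One-to-one matching maximizing total IoU; weak pairs are discarded."""
    iou = iou_matrix(masks_a, masks_b)
    rows, cols = linear_sum_assignment(iou, maximize=True)
    keep = iou[rows, cols] >= iou_threshold
    return list(zip(rows[keep].tolist(), cols[keep].tolist()))


# Day A: two 2x2 ROIs. Day B: the same cells listed in the opposite order,
# with the first cell only partially overlapping (IoU = 0.5).
a = np.zeros((2, 4, 4), dtype=bool)
a[0, 0:2, 0:2] = True
a[1, 2:4, 2:4] = True
b = np.zeros((2, 4, 4), dtype=bool)
b[0, 2:4, 2:4] = True
b[1, 0:2, 0:1] = True
print(match_rois(a, b))  # [(0, 1), (1, 0)]
```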

      (6) Line 218. "classical" or automatic?

      We meant “classical” in the sense of widely used. 

      (7) Lines 220-231. Did the authors find significant variability of successfully tracked neurons across mice? While the data for successfully tracked cells is reported (Figure 5B), the proportions are not. Could differences in neuron dropout across days and mice affect the analysis of neuronal activity statistics?

      We thank the reviewer for raising this important point. We computed the fraction of successfully tracked cells in our dataset and found substantial variability:

      Cells detected on day 0: [607, 1849, 2190, 1988, 1316, 2138] 

      Proportion successfully tracked: [0.47, 0.20, 0.36, 0.37, 0.41, 0.19]

      Notably, the number of cells detected on the first day varies considerably (607–2190 cells). There appears to be a trend whereby datasets with fewer initially detected cells show higher tracking success rates, potentially because only highly active cells are identified in these cases.

      To draw more definitive conclusions about the proportion of active cells and tracking dropout rates, we would require activity-independent cell detection methods (such as Cellpose applied to isosbestic 830 nm fluorescence, or ideally a pan-neuronal marker in a separate channel, e.g., tdTomato). We have incorporated the tracking success proportions into the revised manuscript.
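      To make the trend described above easier to inspect, the reported numbers can be combined directly: multiplying the detected counts by the tracked proportions gives approximate (rounded) numbers of tracked cells per mouse, and a Spearman rank correlation summarizes the relation between the number of detected cells and the tracking success rate. With only six mice this is a descriptive summary, not a significance test; the helper `ranks` is hypothetical and assumes no tied values, which holds for these data.

```python
import numpy as np

detected = np.array([607, 1849, 2190, 1988, 1316, 2138])       # cells on day 0
prop_tracked = np.array([0.47, 0.20, 0.36, 0.37, 0.41, 0.19])  # fraction tracked

# Approximate number of tracked cells per mouse (rounded products)
approx_tracked = np.rint(detected * prop_tracked).astype(int)
print(approx_tracked)  # [285 370 788 736 540 406]


def ranks(x):
    """Ranks 1..n of the values in x (assumes no ties)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r


# Spearman's rho equals the Pearson correlation of the ranks
rho = np.corrcoef(ranks(detected), ranks(prop_tracked))[0, 1]
print(round(rho, 2))  # -0.71
```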

      (8) Line 260. Please briefly explain, here or in the Methods, the rationale for using data from only 3 mice (rather than all 6) for evaluating tracking performance.

      We used three mice for this analysis due to the labor-intensive nature of manually annotating 64 ROIs across several days. Given the time constraints of this manual process, we determined that three subjects would provide adequate data to reliably assess tracking performance.

      (9) Line 277. Consider clarifying or rephrasing the phrase "across progressively shorter time intervals"? Do you mean across consecutive days?

      This has been rephrased as follows: 

      Additionally, to assess tracking performance over time, we quantified the proportion of reconstructed ground truth tracks over progressively longer time intervals (first two days, first three days, etc.; ‘Prop. correct’ in Fig. 4C-F; see Methods). This allowed us to understand how tracking accuracy depends on the number of successive sessions, as well as at which time points the algorithm might fail to track cells.
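      A minimal sketch of one way to compute such a ‘proportion correct’ curve, assuming each ground truth track and each algorithm track is stored as a row of ROI indices (one column per session, -1 once a track is lost), and that a ground truth track counts as reconstructed over the first k sessions only if the algorithm's track starting from the same day-0 ROI agrees on all k sessions. The exact criterion is given in the manuscript's Methods; this definition and the function name are assumptions.

```python
import numpy as np


def prop_correct(gt_tracks, pred_tracks):
    """gt_tracks:   (n_gt, n_days) ROI indices of manually tracked cells.
    pred_tracks: (n_pred, n_days) ROI indices from the algorithm, -1 where lost.
    Returns {k: fraction of GT tracks reproduced exactly over the first k days}."""
    by_start = {row[0]: row for row in pred_tracks}  # index tracks by day-0 ROI
    n_days = gt_tracks.shape[1]
    out = {}
    for k in range(2, n_days + 1):
        hits = 0
        for gt in gt_tracks:
            pred = by_start.get(gt[0])
            if pred is not None and np.array_equal(pred[:k], gt[:k]):
                hits += 1
        out[k] = hits / len(gt_tracks)
    return out


gt = np.array([[0, 1, 2], [1, 3, 4]])
pred = np.array([[0, 1, 2], [1, 3, -1]])  # second cell lost on the last day
print(prop_correct(gt, pred))  # {2: 1.0, 3: 0.5}
```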

      (10) Line 306. "we also provide additional resources and documentation". Please add a reference or link.

      Done, thanks

      Track2p  

      (11) Lines 342-344. Specify that the raster plots refer to one example mouse, not the entire sample.

      Done, thanks.

      (12) Lines 996-1002. Please confirm whether only successfully tracked neurons were used to compute the Pearson correlations between all pairs.

      Yes. Only successfully tracked neurons were used, since this analysis cannot be computed for pairs of neurons that were not tracked.
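      For illustration, a minimal sketch of this computation for one session: restrict the activity matrix to the tracked ROIs, compute the Pearson correlation matrix with `np.corrcoef`, and average the unique pairs above the diagonal. The trace layout `(n_rois, n_timepoints)`, the function name and the use of the mean as a summary are assumptions, not necessarily the exact statistic reported in the manuscript.

```python
import numpy as np


def mean_pairwise_corr(traces, tracked_idx):
    """traces: (n_rois, n_timepoints) activity from one session.
    tracked_idx: indices of ROIs belonging to complete multi-day tracks.
    Returns the mean Pearson correlation over all unique pairs of tracked ROIs."""
    c = np.corrcoef(traces[tracked_idx])  # (n_tracked, n_tracked) matrix
    iu = np.triu_indices_from(c, k=1)     # each pair once, diagonal excluded
    return c[iu].mean()


traces = np.array([[0., 1., 2., 3.],
                   [0., 2., 4., 6.],   # perfectly correlated with ROI 0
                   [3., 2., 1., 0.],   # perfectly anti-correlated with ROIs 0 and 1
                   [5., 1., 4., 2.]])  # not tracked: excluded from the average
print(mean_pairwise_corr(traces, [0, 1, 2]))  # mean of (1, -1, -1), i.e. about -0.333
```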

      (13) Line 1003. Add a reference to scikit-learn.

      Reference was added to: 

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. 

      (14) Typos. Correct spacing between numeric values and units.

      We did not find many spacing errors between numerical values and unit symbols (degrees and percent signs should not be spaced, right?).

      Reviewer #3 (Recommendations for the authors):

      The font size in many of the figures is too small. For example, it is difficult to follow individual ROIs in Figure S3.

      Figure font size has been increased, thanks. In Figure S3 there might have been a misunderstanding, since the three FOV images do not correspond to the FOV of the same mouse across three days but rather to the first recording for each of the three mice used in evaluation (the ROIs can thus not be followed across images since they correspond to a different mouse). To avoid confusion we have labelled each of the FOV images with the corresponding mouse identifier (same as in Fig. 4 and 5).

    1. eLife Assessment

      This is a valuable study that explores the role of the conserved transcription factor POU4-2 in the maintenance, regeneration, and function of planarian mechanosensory neurons. The authors present convincing evidence provided by gene expression and functional studies to demonstrate that POU4-2 is required for the maintenance and regeneration of mechanosensory neurons and mechanosensory function in planarians. Furthermore, the authors identify conserved genes associated with human auditory and rheosensory neurons as potential targets of this transcription factor.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore the role of the conserved transcription factor POU4-2 in planarian maintenance and regeneration of mechanosensory neurons. The authors explore the role of this transcription factor and identify potential targets of this transcription factor. Importantly, many genes discovered in this work are deeply conserved, with roles in mechanosensation and hearing, indicating that planarians may be a useful model with which to study the roles of these key molecules. This work is important within the field of regenerative neurobiology, but also impactful for those studying the evolution of the machinery that is important for human hearing.

      Strengths:

      The paper is rigorous and thorough, with convincing support for the conclusions of the work.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of the transcription factor Smed-pou4-2 in the maintenance, regeneration and function of mechanosensory neurons in the freshwater planarian Schmidtea mediterranea. First, they characterize the expression of pou4-2 in mechanosensory neurons during both homeostasis and regeneration, and examine how its expression is affected by the knockdown of soxB1-2, a previously identified transcription factor essential for the maintenance and regeneration of these neurons. Second, the authors assess whether pou4-2 is functionally required for the maintenance and regeneration of mechanosensory neurons.

      Strengths:

      The study provides some new insights into the regulatory role of pou4-2 in the differentiation, maintenance, and regeneration of ciliated mechanosensory neurons in planarians.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors explore the role of the conserved transcription factor POU4-2 in planarian maintenance and regeneration of mechanosensory neurons. The authors explore the role of this transcription factor and identify potential targets of this transcription factor. Importantly, many genes discovered in this work are deeply conserved, with roles in mechanosensation and hearing, indicating that planarians may be a useful model with which to study the roles of these key molecules. This work is important within the field of regenerative neurobiology, but also impactful for those studying the evolution of the machinery that is important for human hearing. 

      Strengths: 

      The paper is rigorous and thorough, with convincing support for the conclusions of the work. 

      Weaknesses: 

      Weaknesses are relatively minor and could be addressed with additional experiments or changes in writing.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, the authors investigate the role of the transcription factor Smed-pou4-2 in the maintenance, regeneration, and function of mechanosensory neurons in the freshwater planarian Schmidtea mediterranea. First, they characterize the expression of pou4-2 in mechanosensory neurons during both homeostasis and regeneration, and examine how its expression is affected by the knockdown of soxB1-2, a previously identified transcription factor essential for the maintenance and regeneration of these neurons. Second, the authors assess whether pou4-2 is functionally required for the maintenance and regeneration of mechanosensory neurons. 

      Strengths: 

      The study provides some new insights into the regulatory role of pou4-2 in the differentiation, maintenance, and regeneration of ciliated mechanosensory neurons in planarians. 

      Weaknesses: 

      The overall scope is relatively limited. The manuscript lacks clear organization, and many of the conclusions would benefit from additional experiments and more rigorous quantification to enhance their strength and impact. 

      Reviewing Editor Comments: 

      (1) Quantification of pou4-2+ cells that do or do not express hmcn-1-L and/or pkd1L-2 is a common suggestion among reviewers. It is recognized that Ross et al. (2018) showed that pkd1L-2 and hmcn-1-L expression is detected in separate cells by double FISH, and the analysis presented in Supplementary Figure S3 is helpful in showing that some cells expressing pou4-2 (magenta) are not labeled by the combined signal of the pkd1L-2 and hmcn-1-L riboprobes (green). However, I am not sure that we can conclude that pkd1L-2 and hmcn-1-L are effectively detected when the riboprobes are combined. Therefore, quantification of labeled cells as proposed by Reviewers 1 and 2 would help.

      Combining riboprobes is a standard approach in the field, and we chose this method as a direct way to determine which cells lack expression of both genes. We agree that providing the raw quantification data would be helpful for readers, and we have included these data in Supplementary File S7; the file contains the quantification for the dFISH experiment shown in Supplementary Figure 3.

      (2) It may be helpful to comment on changes (or lack of changes) in atoh gene RNA levels in the RNA-seq analyses of pou4-2(RNAi) animals. As mentioned by one of the reviewers, in situs that don't show signal are inconclusive in this regard. 

      We fully agree with both reviewers. Two of the planarian atonal homologs are difficult to detect and produce background signal, as we previously reported in Cowles et al. Development (2013). We performed the reciprocal RNAi/in situ experiments out of curiosity, given the reported role of atonal in the pou4 cascade in other organisms. However, these exploratory experiments lacked a strong rationale for inclusion, particularly because pou4-2 and the atonal homologs do not share expression patterns and show neither co-expression nor differential expression in our RNA-seq dataset. Therefore, we decided to omit the atonal in situs following pou4-2 RNAi. We retained the experiments showing that knockdown of the atonal genes does not robustly affect the mechanosensory neuron pattern, as expected. We thank the reviewing editor and reviewers for pinpointing this concern. We agree that additional experiments, such as qPCR, would be needed to address it; however, while these experiments could be informative, we reasoned that they are unlikely to substantially alter the key conclusions of this study.

      (3) There seem to be typos at the bottom of page 10 and the top of page 11 when referring to Figure 4B (it should be 5B instead): "While mechanosensory neuronal patterned expression of Eph1 was downregulated after pou4-2 and soxB1-2 inhibition, low expression in the brain branches of the ventral cephalic ganglia persisted (Figure 4B)." 

      Thank you! We have fixed those.

      (4) Typo (page 13; kernel?): "...to test to what extent the Pou4 gene regulatory kernel is conserved among these widely divergent animals." 

      Regulatory kernels are defined as the minimal sets of interacting genes that drive developmental processes and are the core circuits within a gene regulatory network, but we recognize that this might not be as well known, so we have changed the term to “network” for clarity.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors indicate that they are interested in finding out whether POU4-2 is important in the creation of mechanosensory neurons in adulthood as well as in embryogenesis (in other words, whether the mechanism is "reused during adult tissue maintenance and regeneration"). The manuscript clearly shows that planarian POU4-2 is important in adult neurogenesis in planarians, but there is no evidence presented to show that this is a recapitulation of embryogenesis. Is pou4-2 expressed in the planarian embryo? This might be possible to examine by ISH or through the evaluation of sequencing data that already exist in the literature. 

      We agree that these statements should be precise. We have clarified where we compare the role of Pou4 in sensory system development in other organisms with its role in the adult planarian. Thanks for suggesting this idea: we examined pou4-2 expression using the existing database of embryonic gene expression. We performed BLAST in Planosphere (Davies et al., 2017) to cross-reference our clone matching dd_Smed_v6_30562_0_1, which is identical to SMED30002016. The embryonic expression of SMED30002016 indicates that this gene is expressed at the stages expected from prior knowledge of the timing of organ development in Schmidtea mediterranea (a positive trend begins at Stage 5, with a marked increase by Stage 6 that remains comparable to the asexual expression levels shown). We thank the reviewer for pointing out this oversight. We have incorporated this result into the paper as a Supplementary Figure and note in the discussion that we can only speculate that pou4-2 plays a role in the embryo similar to the one we detect in adult asexual worms.

      (2) Can it be determined whether the punctate pou4-2+ cells outside of the stripes are progenitors or other neural cell types? Are there pou4-2+ neurons that are not mechanosensory cell types? Could there be other roles for POU4-2 in the neurogenesis of other cell types? It might help to show percentages of overlap in Figure 4A and discuss whether the two populations add up to 100% of cells. 

      These are good questions that arise in part from other statements that need clarification in the text (pointed out by Reviewer 2). We think some of the dorsal pou4-2+ cells might represent progenitor cells undergoing terminal differentiation (see Supplementary Figure 4). We attempted BrdU pulse-chase experiments but could not consistently detect pou4-2 at sufficient levels with our protocol. In response to this helpful comment, we have included this question as a future direction in the revised Discussion. Finally, we have edited our description of the expression pattern. We already pointed out that there are other cells on the ventral side that are not affected when soxB1-2 is knocked down. We attempted to resolve the potential identity of those cells using existing scRNA-seq data in collaboration with colleagues, but their low abundance made it difficult to distinguish other populations. While we acknowledge this interesting possibility, we have chosen to focus this report on the role of pou4-2 downstream of soxB1-2, as this represents the best-supported aspect of the dataset and was positively highlighted by both the reviewer and the editor.

      (3) The authors discuss many genes from their analysis that play conserved roles in mechanosensation and hearing. Were there any conserved genes that came up in the analysis of pou4-2(RNAi) planarians that have not yet been studied in human hearing and neurodevelopment? I am wondering the extent to which planarians could be used as a discovery system for mechanosensory neuron function and development, and discussion of this point might increase the impact of this paper or provide critical rationale for expanding work on planarian mechanosensation. 

      We agree that planarians could be used to identify conserved genes with roles in mechanosensation and have included this point in the Discussion. In this study, we have focused on demonstrating the conservation of gene regulation. While this study was initially based on a graduate thesis project, we have since generated a more comprehensive dataset from isolated heads, which we are currently analyzing. We have emphasized this in the revised Discussion.

      Minor: 

      (1) For Figure 6E, the authors could consider showing data along a negative axis to indicate a decrease in length in response to vibration and to more clearly show that this decrease doesn't occur as strongly after pou4-2(RNAi). 

      We displayed this behavior as the percent change, as this is a standard way to represent these data. Because the percent change is expressed as a positive value, we plot the data as positive values.

      (2) The authors should consider quantifying the decrease of pou4-2 mRNA after atonal(RNAi) conditions, either by RT-qPCR or cell quantification. Visually, the signal in the stripes after atoh8-2(RNAi) seems lower, particularly in the tail. The punctate pattern outside the stripes may also be decreased after atoh8-1(RNAi). But quantification might strengthen the argument. 

      We agree with the reviewer and acknowledge that we should have been more cautious in interpreting these results. Those two genes are difficult to detect and did not show specific patterns in Cowles et al. (2013). The reviewer is correct that additional experiments are necessary before reaching conclusions, but, as discussed earlier, we do not think new experiments would provide insights relevant to the major conclusions. These experiments were exploratory in nature and tangential to our main conclusions, especially in the absence of reciprocal evidence (e.g., shared expression patterns, co-expression, or differential expression in our RNA-seq data). Therefore, we decided to eliminate the atonal in situs following pou4-2 RNAi.

      Reviewer #2 (Recommendations for the authors): 

      A. Expression of pou4-2 in ciliated mechanosensory neurons: 

      (1) The conclusion that pou4-2 is expressed in ciliated mechanosensory neurons is primarily based on co-expression analysis using a published single-cell dataset. Although the authors later show that a subset of pou4-2 cells also express pkd1L-2 (Figure 4A), a known marker of ciliated mechanosensory neurons, this finding is not properly quantified. I recommend moving Figure 4A to earlier in the manuscript (e.g., to Figure 2) and expanding the analysis to include additional known markers of this cell type. Proper quantification of the extent of co-localization is necessary to support the claim robustly. 

      As noted by the reviewer, there is substantive evidence from our lab and other reports. King et al. also showed pou4-2 and pkd1L-2 ‘regulation’ in their scRNA-seq data, and this function is conserved in the acoel Hofstenia miamia (Hulett et al., PNAS 2024). Our analysis shows convincing co-localization by scRNA-seq and expression of soxB1-2 and neural markers in the respective populations. Furthermore, we included co-localization of pou4-2 with mechanosensory genes using fluorescence in situ hybridization (Figure 3B, Supplementary Figure 4, and Supplementary File S7). We are confident the data conclusively show that pou4-2 regulates pkd1L-2 expression in a subset of mechanosensory neurons. Given the strength of the existing observations and previously published data, we believe that additional staining experiments are not essential to support this conclusion. 

      (2) There appears to be a conceptual inconsistency in the interpretation of pou4-2 expression dynamics. On one hand, the authors suggest that delayed pou4-2 expression indicates a role in late-stage differentiation (p.6). On the other hand, they propose that pou4-2 may be expressed in undifferentiated progenitors to initiate downstream transcriptional programs (p.8). These interpretations should be reconciled. Additionally, claims regarding pou4-2 expression in progenitor populations should be supported by co-localization with established stem cell or progenitor markers, rather than inferred from signal intensity alone. 

      This is an excellent point, and we agree with the reviewer that this section requires editing. As described in our response to Reviewer 1, we attempted BrdU pulse-chase experiments but could not consistently detect pou4-2 at sufficient levels with our protocol. Furthermore, we could not obtain strong signals in double-labeling experiments combining pou4-2 in situs with piwi-1 in situs or PIWI-1 antibodies. We will include those experiments as a future direction and amend our conclusions accordingly.

      (3) The expression pattern shown in Figure 1B raises questions about the precise anatomical localization of pou4-2 cells. It is unclear whether these cells reside in the subepidermal plexus or the deeper submuscular plexus, which represent distinct neuronal layers (Ross et al., 2017). The observed signals near the ventral nerve cords could suggest submuscular localization. To clarify this, higher-resolution imaging and co-staining with region-specific neural markers are recommended. 

      In Ross et al. (2018), we showed that pkd1L-2+ cells are located submuscularly. Because pkd1L-2+ cells express pou4-2, the pou4-2+ cells are located in the same layer. Based on the co-expression data, including co-expression with PKD genes, we are confident that these cells are submuscular.

      B. The functional requirements of pou4-2 in the maintenance of mechanosensory neurons: 

      (1) To evaluate the functional role of pou4-2 in maintaining mechanosensory neurons, the authors performed whole-animal RNA-seq on pou4-2(RNAi) and control animals, identifying a significant downregulation of genes associated with mechanosensory neuron expression. However, the presentation of these findings is fragmented across Figures 3, 4, and 5. I recommend consolidating the RNA-seq results (Figure 3) and the subsequent validation of downregulated genes (Figures 4 and 5) into a single, cohesive figure. This would improve the logical flow and clarity of the manuscript. 

      As suggested by the reviewer, we have combined Figures 3 and 4 (new Figure 3), which we believe improves the flow. We decided to keep Figure 5 (new Figure 4) as a standalone figure because it focuses on the characterization of new genes, revealed by RNA-seq and scRNA-seq data mining, that were not previously reported in Ross et al. 2018 and 2024.

      (2) In pou4-2(RNAi) animals, pkd1L-2 expression appears to be entirely lost, while hmcn-1-L shows faint expression in scattered peripheral regions. The authors suggest that an extended RNAi treatment might be necessary to fully eliminate hmcn-1-L expression. However, an alternative explanation is that pou4-2 is not essential for maintaining all hmcn-1-L cells, particularly if pou4-2 expression does not fully overlap with that of hmcn-1-L. This possibility should be acknowledged and discussed. 

      We agree and have acknowledged this point in the revised text.

      (3) On page 9, the section title claims that "Smed-pou4-2 regulates genes involved in ciliated cell structure organization, cell adhesion, and nervous system development." While some differentially expressed genes are indeed annotated with these functions based on homology, the manuscript does not provide experimental evidence supporting their roles in these biological processes in planarians. The title should be revised to avoid overstatement, and the limitations of extrapolating a function solely from gene annotation should be acknowledged. 

      Excellent point. We have edited the text to indicate that these genes are annotated with, or have been implicated in, these functions.

      (4) The cilia staining presented in Figure 6B to support the claim that pou4-2 is required for ciliated cell structure organization is unconvincing. Improved imaging and more targeted analysis (e.g., co-labeling with mechanosensory markers) are needed to support this conclusion. 

      We have addressed this concern by adjusting the language to be more precise and indicate that the stereotypical banded pattern is disrupted with decreased cilia labeling along the dorsal ciliated stripe. Indeed, our conclusion overstated the observations made with the staining and imaging resolution. Thank you.

      C. The functional requirements of pou4-2 in the regeneration of mechanosensory neurons: 

      To evaluate the role of pou4-2 in the regeneration of mechanosensory neurons, the authors performed amputations on pou4-2(RNAi) and control(RNAi) animals and assessed the expression of mechanosensory markers (pkd1L-2, hmcn-1-L) alongside a functional assay. However, the results shown in Figure 4B indicate the presence of numerous pkd1L-2 and hmcn-1-L cells in the blastema of pou4-2(RNAi) animals. This observation raises the possibility that pou4-2 may not be essential for the regeneration of these mechanosensory neurons. The authors should address this alternative interpretation. 

      Our interpretation is that there were very few cells expressing the markers compared to controls. The pattern was predominantly lost, which is consistent with other experiments shown in the paper. However, we have added the additional caveat suggested by the reviewer.

      Minor points: 

      (1) On p.8, the authors wrote "every 12 hours post-irradiation". However, this is not consistent with the figure, which only shows 0, 3, 4, 4.5, 5, and 5.5 dpi. 

      We corrected this. Thank you for catching the mistake!

      (2) On p.12, the authors wrote "Analysis of pou4-2 RNAi data revealed differentially expressed genes with known roles in mechanosensory functions, such as loxhd-1, cdh23, and myo7a. Mutations in these genes can cause a loss of mechanosensation/transduction". This is misleading because, to my knowledge, the role of these genes in planarians is unknown. If the authors meant other model systems, they should clearly state this in the text and include proper references. 

      The reviewer is correct that we are referencing findings from other organisms. We have clarified this point in the revised text. The appropriate references were included and cited in the first version.

      (3) On p.7, the authors wrote, "conversely, the expression of atonal genes was unaffected in pou4-2 RNAi-treated regenerates (Supplementary Figure S2B)". However, it is unclear whether the Atoh8-1 and Atoh8-2 signals are real, as the quality of the in situ results is too low to distinguish between real signals and background noise/non-specific staining. 

      This valid concern was addressed in our response to Reviewer 1. We have adjusted the figure and the text accordingly.

      (4) On p.6 the authors wrote "pinpointed time points wherein the pou4-2 transcripts were robustly downregulated". However, the current version of the manuscript does not provide data explaining why Pou4-2 transcripts are robustly downregulated on day 12. 

      Yes, we determined the appropriate time points using qPCR for all sample extractions. As an example, see Author response image 1, which shows qPCR validation at day 12 indicating that pou4-2 and pkd1L-2 are downregulated.

      Author response image 1.

      In this graph, samples labeled “G” represent four biological replicates of gfp(RNAi) control animals, and samples labeled “P” represent four biological replicates of pou4-2(RNAi) animals at day 12 of the RNAi protocol.

      (5) On p.13, the authors wrote "collecting RNA from how animals." Is this a typo? 

      Thanks for catching the typo. It should read “whole” animals. We have corrected this.

      (6) On p.14, the authors wrote "but the expression patterns of planarian atonal genes indicated that they represent completely different cell populations from pou4-2-regulated mechanosensory neurons". However, this is unclear from the images, as the in situ stainings of Atoh8-1 and Atoh8-2 may have failed. 

      We agree. We have edited accordingly.

    1. eLife Assessment

      This valuable manuscript presents an open-source and low-cost acoustic system for quantifying biting and chewing in mice. The approach is carefully validated against human observers, demonstrating strong methodological reliability and enabling high-resolution analysis of feeding microstructure. The tool has broad relevance for studies of appetite circuits and pharmacological interventions. A significant contribution is the identification of previously unrecognized "meal-related" neurons in the lateral hypothalamus, providing novel biological insight into food consumption. While the support for the methodological advances is compelling and robust, some circuit-level conclusions are preliminary or incomplete, relying on small pilot samples and manual classification, and should be interpreted with caution. This paper will be of interest to those interested in ingestive behavior and/or hypothalamus.

    2. Reviewer #1 (Public review):

      This is an interesting and valuable paper by Gil-Lievana, Arroyo et al. that presents an open-source method (the "Crunchometer") for quantifying biting and chewing behavior in mice using audio detection. The work addresses an important and unmet need in the field: quantitative measures of feeding behavior with solid foods, since most prior approaches have been limited to liquids. The authors make a clear and compelling case for why this problem is important, and I fully agree with their motivation.

      The system is carefully validated against human-scored video data and is shown to be at least as accurate, and in some cases more accurate, than human observers. This is a major strength of the study. I also particularly appreciate the demonstration of the technology in the context of LHA circuitry, which nicely illustrates its utility and importance for mechanistic studies of feeding. I also appreciate the ability to readily time-lock neural data to individual crunches. Overall, the manuscript is well-executed and represents a useful contribution to the field.

      The comments I have are largely minor and should be straightforward to address:

      (1) The authors should report sample sizes for all mouse cohorts, either alongside the statistics or in the figure legends for mean data.

      (2) Clarification is needed as to whether crunch detection fidelity is influenced by the hardness or softness of the food. The focus here is on standard pellets, with some additional high-fat pellet data, but it would be useful to know how generalizable the method is across different textures.

      (3) The authors should comment on how susceptible the Crunchometer is to background noise. For example, how well does it perform in the presence of white noise, experimenter movement, or other task-related sounds?

      (4) Chemogenetic activation of LHA GABAergic neurons is used. DREADD-based activation may strongly drive these neurons in a way that is not directly comparable to optogenetic or more physiological manipulations. While I do not think additional experiments are required, it would strengthen the discussion to briefly acknowledge this limitation.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript introduces the Crunchometer, a low-cost, open-source acoustic platform for monitoring the microstructure of solid food intake in mice. The Crunchometer is designed to overcome the limitations of existing methods for studying feeding behavior in rodents. The goal was to provide a tool that could precisely capture the microstructure of solid food intake, something often overlooked in favor of liquid-based assays, while being affordable, scalable, and compatible with neural recording techniques. By doing so, the authors aimed to enable detailed analysis of how physiological states, drugs, and specific neural circuits shape naturalistic feeding behaviors.

      Strengths:

      The study's strengths lie in its clear innovation, methodological rigor in validation against human annotation, and demonstration of broad utility across behavioral and neuroscience paradigms. The approach addresses a significant methodological gap in the field by moving beyond liquid-based feeding assays and provides an accessible tool for precisely dissecting ingestive behavior. The system is validated across multiple contexts, including physiological state (fed vs. fasted), pharmacological manipulation (semaglutide), and circuit-level interventions (chemogenetic activation of LH neurons), and is further shown to integrate seamlessly with both electrophysiology and calcium imaging.

      (1) Introduces a low-cost, open-source acoustic tool for measuring solid food intake, filling a critical gap left by expensive and proprietary systems.

      (2) Makes the method easily adoptable across labs with detailed setup instructions and shared benchmark datasets.

      (3) Provides high temporal precision for detecting bite events compared to human observers.

      (4) Successfully distinguishes feeding microstructure (bites, bouts, IBIs, gnawing vs. consumption) with greater objectivity than manual annotation.

      (5) Demonstrates compatibility with electrophysiology and calcium imaging, enabling fine-scale alignment of neural activity with feeding behavior.

      (6) Effectively discriminates between fed vs. fasted states, validating physiological sensitivity.

      (7) Captures the pharmacological effects of semaglutide, although this is really just reduced feeding and associated readouts (bouts, latency, etc.).

      (8) Has potential to distinguish consummatory vs. non-consummatory behaviors (e.g., food spillage, gnawing); however, the current SVM model struggles to separate biting from gnawing due to similar acoustic profiles, and manual validation is still required.

      (9) Provides potential for closed-loop experiments.

      Weaknesses:

      Several limitations temper the strength of the conclusions: the supervised classifier still requires manual correction for gnawing, generalizability across different setups is limited, and the neuroscience findings, particularly calcium imaging of GABAergic and glutamatergic neurons, are based on small pilot samples. These issues do not undermine the value of the tool, but mean that the neural circuit findings should be interpreted as preliminary.

      (1) Some neuroscience findings (calcium imaging of GABAergic vs. glutamatergic neurons) are based on small pilot samples (n=2 mice per condition), limiting generalizability.

      (2) Chemogenetic and pharmacological experiments used small cohorts, raising statistical power concerns.

      (3) Correlation with actual food intake is modest and sometimes less accurate than human observers.

      (4) Sensitive to hoarding behavior, which can reduce detection accuracy and requires manual correction for misclassifications (e.g., tail movements, non-food noises). However, these limitations are discussed and not ignored.

      Conclusion:

      Overall, this is an exciting and impactful methodological advance that will likely be widely adopted in the field. I recommend minor revisions to clarify the limits of classifier generalizability, better contextualize the small-sample neuroscience findings as pilot data, and discuss future directions (e.g., real-time closed-loop applications).

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript provides detailed information on the construction of open-source systems to monitor ingestive behavior with low-cost equipment. Overall, this is a welcome addition to the arsenal of equipment that could be used to make measurements. The authors show interesting applications with data that reveal important neurophysiological properties of neurons in the lateral hypothalamus. The identification of previously unknown "meal-related" neurons in the LH highlights the utility of the device and is a novel insight that should spark further investigation on the LH. This manuscript and videos provide a wealth of useful information that should be a must-read for anyone in the ingestive behavior or hypothalamus fields.

      A scholarly introduction to the history and utility of the various ways feeding is measured in rodents is provided. One point: the microstructure of eating solid food has been studied extensively (for one of many studies, see https://doi.org/10.1371/journal.pone.0246569). However, I agree that the Crunchometer will allow more people to access recordings during food intake and to temporally lock consummatory behavior to neural activity.

      Questions on results:

      (1) It is unclear why 10% sucrose solution was used as a liquid instead of water, given that the study is focusing on the solid food source.

      (2) It is unclear how essential human verification is in the pipeline; the results for Figure 1 repeatedly describe this verification as essential. Is it dispensable once the ML algorithms have been trained?

      (3) The ability to extrapolate food quantity consumed is limited, with high variability. This limitation does not undercut the utility of the crunchometer, but should be highlighted as one of the parameters that are not suitable for this system. This limitation should be added to the limitations section.

      (4) The ability to discriminate between gnawing and consummatory behavior is a strength (Figure 5), and these findings are important. However, it is unclear what can be made of mice that have 'gnawing' behavior in the fasted state (like in Figure 3). It seems they would need to be eliminated from the analysis with this tool?

      (5) Why is there a post-semaglutide fed group and not a fasted group in Figure 4? It seems both would have been interesting, as one could expect an effect on feeding even 24h after semaglutide treatment. This would help parse the preference better because the animals eat such a small amount on semaglutide, that it is hard to compare to the fasted condition with saline treatment.

      (6) The identification of 'meal-related' neurons in the LH is another strength of the manuscript. Although there is currently insufficient data, could similar recordings be used to give a neurophysiological definition of a 'meal' duration/size? Typically, these were somewhat arbitrarily defined behaviorally. Having a neural correlate to a 'meal' would be a powerful tool for understanding how meals are involved in overall caloric intake.

      (7) The conclusion in the title of Figure 8 is premature, given the pilot nature and small number of neurons and mice sampled.

      Conclusion:

      Overall, this report on the Crunchometer is well done and provides a valuable tool for all who study food intake and the behaviors around food intake. Clarification or answers to the points above will only further the utility and understanding of the tool for the research community. I am excited to see the future utility of this tool in emerging research.

    1. eLife Assessment

      This paper is an important overview of the currently published literature on low-intensity focused ultrasound stimulation (TUS) in humans, providing a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects. The overall synthesis is convincing. The database proposed by the paper has the potential to become a key community resource if carefully curated and developed.

    2. Reviewer #1 (Public review):

      This paper is a relevant overview of the currently published literature on low-intensity focused ultrasound stimulation (TUS) in humans, with a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects.

      The pool of papers to draw from is small, which is not surprising given the nascent technology. It seems, nevertheless, relevant to summarise the current field in the way done here, not least to mitigate and prevent some of the mistakes that other non-invasive brain stimulation techniques have suffered from, most notably the theory- and data-free permutation of the parameter space.

      A database summarising the literature and allowing for quantitative assessment of these studies is a key contribution of the paper. If curated well, it can become a valuable community resource.

      Comments on revisions:

      The paper is much improved. There remain a few caveats the authors may want to address.

      I'm not going to dwell on this if the authors don't agree, but I remain critical about the inclusion of TPS in the discussion. It's comparing apples and oranges, and unless the authors have a personal interest in TPS, it remains puzzling why it is included in the first place. As per my previous review, the literature on TPS, and especially the main example cited, has been highly criticised, including by national patient and medical associations. A mere disclaimer that more work is needed isn't enough, in this reviewer's opinion; I simply don't understand why the authors go out on a limb here when the rest of the paper is done so well and thoroughly.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This paper is a relevant overview of the currently published literature on low-intensity focused ultrasound stimulation (TUS) in humans, with a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects.

      The pool of papers to draw from is small, which is not surprising given the nascent technology. It seems nevertheless relevant to summarize the current field in the way done here, not least to mitigate and prevent some of the mistakes that other non-invasive brain stimulation techniques have suffered from, most notably the theory- and data-free permutation of the parameter space.

      The meta-analysis concludes that there are, at best, weak trends toward specific parameters predicting the direction of the stimulation effects. The data have been incorporated into an open database that will ideally continue to be populated by the community and thereby become a helpful resource as the field moves forward.

      Strengths:

      The current state of human TUS is concisely and well summarized. The methods of the meta-analysis are appropriate. The database is a valuable resource.

      We thank the reviewer for their positive assessment of the revised manuscript and the potential importance of the resource to the TUS community. 

      Suggestions:

      The paper remains lengthy and somewhat unfocused, to the detriment of readability. One can understand that the authors wish to include as much information as possible, but this reviewer is sceptical that this will aid the use of the databank or help broaden the readership. For one, there is a good chunk of repetition throughout. The intro also oscillates somewhat between TMS, tDCS and TUS. While the former two help to contextualize the issue, this doesn't seem necessary. In the section on clinical applications of TUS and its possible outcomes, there's an imbalance of content across examples. That's in part because of differences in the knowledge base, but some sections could probably be shortened, e.g., stroke. In any case, the authors may want to consider whether it is worth making some additional effort to prune the paper.

      We thank the reviewer for these suggestions. We have checked for redundancy and made the clinical review section more balanced, although some sections cover more TUS studies than others, so some imbalance is unavoidable. For example, we have condensed the “Stroke and neuroprotection in brain injury” section (lines 624-647). This helps to improve the clarity and readability of the manuscript.

      The terms and concepts of enhancement and suppression (E/S) warrant clearer definition and usage. In most cases, the authors refer to E/S of neural activity. Perhaps using terms such as "neural enhancement" would help distinguish these from, e.g., behavioural or clinical effects. Crucially, how one maps onto the other is not clear. But in any case, a clear statement that the changes outlined on lines 277ff do not

      We thank the reviewer for this point and agree that it is important to distinguish neural E/S, as we had intended, from behavioral effects. At first mention and in several other places, we have added ‘neural’ before enhancement/suppression. See also lines 276-279: “Probable net neural enhancement versus suppression was characterised as follows. Note that our use of the terms enhancement and suppression refers exclusively to the increase or decrease of neural activity, respectively, as measured by neurophysiological methods (EEG-ERPs, BOLD fMRI, etc.), and does not imply equivalent changes in behavioural responses.”

      Please see also lines 108-116.

      Re tb-TUS (lines 382ff), it is worth acknowledging here that independent replication is very limited (e.g., Bao et al., 2024; Fong et al., bioRxiv 2024) and seems to indicate rather different effects.

      We have updated this section by referencing Bao et al. and Fong et al. as examples of the limited independent replication of tbTUS results. Please see lines 392-396: “However, independent replication of these findings remains limited. For example, Bao et al. found reduced motor cortex excitability (measured as decreased TMS-MEP amplitude in M1) that lasted up to 30 minutes post-sonication (Bao et al., 2024), whereas Fong et al. reported no significant difference in M1 excitability between tbTUS and sham conditions (Fong et al., 2024).”

      The comparison with TPS is troublesome. For one, that original study was incredibly poorly controlled and designed. Cherry-picking individual (badly conducted) proof-of-principle studies doesn't seem a great way to go about this, as one can find a match for any desired use or outcome. Moreover, other than the concept of "pulsed" stimulation, it is not clear why that original study would motivate the use of TUS in the way the authors propose; the two types of stimulation act in very different ways (if TPS "acts" at all). But surely the cited TPS study does not "demonstrate the capability for TUS for pre-operative cognitive mapping". As an aside, why the authors feel the need to state the "potential for TPS... to enhance cognitive function" is unclear, but it is certainly a non sequitur. This reviewer feels quite strongly that simplistic analogies such as the one here are unnecessary and misleading, and don't reflect the thoughtful discussion in the rest of the paper. In the other clinical examples, the authors build their suggestions on other TUS studies, which seems more sensible.

      This is an excellent point, and we have removed that statement, replacing it with: “However, studies of TPS effects remain highly limited and would require further investigation and comparison with the effects of other TUS protocols.” Please see lines 561-562. We thank the reviewer for the supportive comments on the rest of the review.

    1. eLife Assessment

      This important study addresses a topic that is frequently discussed in the literature but is under-assessed, namely correlations among genome size, repeat content, and pathogenicity in fungi. Contrary to previous assertions, the authors found that repeat content is not associated with pathogenicity. Rather, pathogenic lifestyle was found to be better explained by the number of protein-coding genes, with other genomic features associated with insect association status. The results are considered solid, although there remain concerns about potential biases stemming from the underlying data quality of the analyzed genomes.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Lifestyles shape genome size and gene content in fungal pathogens" by Fijarczyk et al. presents a comprehensive analysis of a large dataset of fungal genomes to investigate which genomic features correlate with pathogenicity and insect associations. The authors focus on a single class of fungi, due to its diversity of lifestyles and the availability of genomes. They analyze a set of 12 genomic features for correlations with either pathogenicity or insect association and find that, contrary to previous assertions, repeat content is not associated with pathogenicity. They discover that the number of protein-coding genes, including the total size of non-repetitive DNA, does correlate with pathogenicity. However, distinct features are associated with insect association. This work represents an important contribution to the attempts to understand which features of genomic architecture impact the evolution of pathogenicity in fungi.

      Strengths:

      The statistical methods appear to be properly employed and analyses thoroughly conducted. The size of the dataset is impressive and likely makes the conclusions robust. The manuscript is well written and the information, while dense, is generally presented in a clear manner.

      Weaknesses:

      My main concerns all involve the genomic data, how they were annotated, and the biases this could impart to the downstream analyses. The three main features I'm concerned with are sequencing technology, gene annotation, and repeat annotation. The authors have done an excellent investigation into these issues, but these show concerning trends, and my concerns are not as assuaged as the authors' appear to be.

      The collection of genomes is diverse and includes assemblies generated from multiple sequencing technologies, including both short- and long-read technologies. From the number of scaffolds, it is clear that the quality of the assemblies varies dramatically, even within the long- and short-read categories. This is going to impact many of the values important for this study, as the authors show.

      I have considerable worries that the gene annotation methods could impart biases that significantly affect the main conclusions. Only 5 reference training sets were used for the Sordariomycetes, and these are unequally distributed across the phylogeny. Augustus obviously performed less than ideally, as the authors observe in their extended analysis. While the authors are not concerned about phylogenetic distance from the training species, given the prevailing trends, I am not as convinced. In Figure S12, the Augustus features appear to have considerably more variation in values for the H2 set and possibly the Microascales. It is unclear how this would affect the conclusions in this study.

      Unfortunately, the genomes available from NCBI will vary greatly in the quality of their repeat masking. While some will have been masked using custom libraries generated with software like RepeatModeler, others will probably have been masked with public databases like Repbase. As public databases are again biased towards certain species (Fusarium is well represented in Repbase, for example), this could have significant impacts on estimating repeat content. Additionally, even custom libraries can be problematic, as some software (like RepeatModeler) will include multicopy host genes, leading to bona fide genes being masked if proper filtering is not employed. A more consistent repeat masking pipeline would add to the robustness of the conclusions. The authors show that there is a significant bias in their set.

      To a lesser degree, I wonder what impact the use of representative genomes for a species has on the analyses. Some species vary greatly in genome size, repeat content, and architecture among strains. I understand that it is difficult to address in this type of analysis, but it could be discussed.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on the genomic correlates of the transition to the pathogenic lifestyle in Sordariomycetes. The pathogenic lifestyle was found to be better explained by the number of genes, and in particular effectors and tRNAs, but this was modulated by the type of interacting host (insect or not insect) and the ability to be vectored by insects.

      Strengths:

      The main strengths of this study lie in (i) the size of the dataset, and the potentially high number of lifestyle transitions in Sordariomycetes, (ii) the quality of the analyses and the quality of the presentation of the results, (iii) the importance of the authors' findings.

      Weaknesses:

      The weakness is a common issue in most comparative genomics studies in fungi, but it remains important and valid to highlight it. Defining lifestyles is complex because many fungi go through different lifestyles during their life cycles (for instance, symbiotic phases interspersed with saprotrophic phases). In many fungi, the lifestyle referenced in the literature is merely the sampling substrate (such as wood or dung), which does not necessarily mean that this substrate is a key part of the life cycle. The authors discuss this issue, but they do not eliminate the underlying uncertainties.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Lifestyles shape genome size and gene content in fungal pathogens" by Fijarczyk et al. presents a comprehensive analysis of a large dataset of fungal genomes to investigate which genomic features correlate with pathogenicity and insect associations. The authors focus on a single class of fungi, due to its diversity of lifestyles and the availability of genomes. They analyze a set of 12 genomic features for correlations with either pathogenicity or insect association and find that, contrary to previous assertions, repeat content is not associated with pathogenicity. They discover that the number of protein-coding genes, including the total size of non-repetitive DNA, does correlate with pathogenicity. However, distinct features are associated with insect association. This work represents an important contribution to the attempts to understand which features of genomic architecture impact the evolution of pathogenicity in fungi.

      Strengths:

      The statistical methods appear to be properly employed and analyses thoroughly conducted. The manuscript is well written and the information, while dense, is generally presented in a clear manner.

      Weaknesses:

      My main concerns all involve the genomic data, how they were annotated, and the biases this could impart to the downstream analyses. The three main features I'm concerned with are sequencing technology, gene annotation, and repeat annotation.

      We thank the reviewer for all the comments. We are aware that the genome assemblies are of heterogeneous quality since they come from many sources. The goal of this study was to make the best use of the existing assemblies, with the assumption that noise introduced by the heterogeneity of sequencing methods would be outweighed by the robustness of evolutionary trends and the breadth and number of analyzed assemblies. Therefore, at worst, we would expect a decrease in the power to detect existing trends. It is important to note that the only way to confidently remove all potential biases would be to sequence and analyze all species in the same way; this would require a separate study and is beyond the scope of the work presented here. Nevertheless, some biases could distort the results, e.g., if they affect fungal lifestyles differently. We therefore explored the impact of sequencing technology and of the gene and repeat annotation approach among genomes of different fungal lifestyles. Details are described in the Supplementary Results and below. Overall, even though the assembly size and the annotations conducted with Augustus can sometimes vary compared to annotations from other resources, such as JGI Mycocosm, we do not observe a bias associated with fungal lifestyles. Comparison of the Augustus annotations with the JGI Mycocosm dataset revealed variation in gene-related features that reflects biological differences rather than issues with annotation.

      The collection of genomes is diverse and includes assemblies generated from multiple sequencing technologies including both short- and long-read technologies. Not only has the impact of the sequencing method not been evaluated, but the technology is not even listed in Table S1. From the number of scaffolds it is clear that the quality of the assemblies varies dramatically. This is going to impact many of the values important for this study, including genome size, repeat content, and gene number.

      We have now added the sequencing technology to Table S1, as reported in NCBI. We evaluated the impact of long-read (Nanopore, PacBio, Sanger) vs. short-read assemblies in the Supplementary Results. In short, the proportions of different lifestyles (pathogenic vs. non-pathogenic, IA vs. non-IA) were the same for short- and long-read assemblies. Indeed, long-read assemblies were longer and had a higher fraction of repeats and fewer genes on average, but the differences between pathogenic and non-pathogenic (or IA and non-IA) species were in the same direction for the two sequencing technologies and in line with our results. There were some discrepancies, e.g., mean intron length was longer for pathogens with long-read assemblies but slightly shorter on average for short-read assemblies (and, to a lesser extent, GC content and pseudo-tRNA count), which could explain the weaker or mixed results in our study for these features.

      Additionally, since some filtering was employed for small contigs, this could also bias the results.

      The reason for setting a lower contig length threshold was that assemblies submitted to NCBI have varying minimum contig lengths. This is because assemblers do not output contigs shorter than a certain length, and this threshold can be set by the user. Setting a common minimum contig length was meant to remove this variation, knowing that any length cut-off will have a larger effect on short-read-based assemblies than on long-read-based assemblies. Notably, genome assemblies of the corresponding species in JGI Mycocosm have a minimum contig length of 865 bp, not much lower than in our dataset. Importantly, in response to a comment from a previous reviewer, repeat content was recalculated on raw assembly lengths instead of on filtered assembly lengths.

      I have considerable worries that the gene annotation methods could impart biases that significantly affect the main conclusions. Only 5 reference training sets were used for the Sordariomycetes, and these are unequally distributed across the phylogeny. Augustus obviously performed less than ideally, as the authors reported that it under-annotated the genomes by 10%. I suspect it will have performed worse with increasing phylogenetic distance from the reference genomes. None of the species used for training were insect-associated, except for those generated by the authors for this study. As this feature was used to split the data, it could impact the results. Some major results rely explicitly on having good gene annotations, like exon length, adding to these concerns. Looking manually at Ophiostoma in Table S1, there does seem to be a general trend that the genomes annotated with Magnaporthe grisea have shorter exons than those annotated with H294. I also wonder if many of the trends evident in Figure 5 are also the result of these biases. For example, clades H1 and G each contain a species used in training and show an increase in gene number.

      We have applied 6 different reference training sets (instead of one) precisely to address the problem of increasing phylogenetic distance of annotated species. To further investigate the impact of the species chosen for training, we plotted five gene features (number of genes, number of introns, intron length, exon length, fraction of genes with introns) as a function of branch length distance from the species (or genus) used as a training set for annotation. We don’t see systematic biases across different training sets. However, trends are very clear for clades annotated with fusarium. This set of species includes Hypocreales and Microascales, which is indeed unfortunate, since Microascales is an IA group and at the same time the most distant from the Fusarium genus in this set. To clarify whether this trend reflects annotation bias or a biological trend, we compared gene annotations with those of Mycocosm for Hypocreales Fusarium species, Hypocreales non-Fusarium species, and Microascales, and we observe exactly the same trends in all gene features.

      Similarly, among species that were annotated with magnaporthe_grisea, Ophiostomatales (another IA group) are among the most distant from the training set species. Here, however, another order, Diaporthales, is similarly distant, yet the two orders display different feature ranges. In terms of exon length, the top two species in this training set belong to Ophiostoma, and they reach exon lengths similar to those of the Ophiostoma species annotated using H294 as a training set. In summary, it is possible that the choice of training species has some effect on feature values; however, in this dataset, these biases are likely outweighed by biological differences among lifestyles and clades.

      Unfortunately, the genomes available from NCBI will vary greatly in the quality of their repeat masking. While some will have been masked using custom libraries generated with software like RepeatModeler, others will probably have been masked with public databases like Repbase. As public databases are again biased towards certain species (Fusarium is well represented in Repbase, for example), this could have significant impacts on estimating repeat content. Additionally, even custom libraries can be problematic, as some software (like RepeatModeler) will include multicopy host genes, leading to bona fide genes being masked if proper filtering is not employed. A more consistent repeat masking pipeline would add to the robustness of the conclusions.

      We searched JGI Mycocosm for the same species and were able to retrieve 58 genome assemblies of matching species, 19 of which belong to the same strain as in our dataset. Overall, we found no differences in genome assembly length. Interestingly, repeat content was slightly higher for NCBI genome assemblies than for JGI Mycocosm assemblies, perhaps due to masking of host multicopy genes, as the reviewer mentioned. By comparing pathogenic and non-pathogenic species for the same 19 strains, we observe that JGI Mycocosm annotates fewer repeats in pathogenic species than our annotations do (but trends are similar when taking into account all 58 matching species). Given the small number of samples, it is hard to draw any strong conclusions; however, the differences that we see are consistent with our general results showing no (or a negative) correlation of repeat content with pathogenicity.

      To a lesser degree, I wonder what impact the use of representative genomes for a species has on the analyses. Some species vary greatly in genome size, repeat content, and architecture among strains. I understand that it is difficult to address in this type of analysis, but it could be discussed.

      In our case, the use of protein sequences could underestimate divergence between closely related strains of the same species. We also excluded additional strains of the same species to avoid overrepresentation of closely related strains with similar lifestyle traits. We agree that some changes in genome architecture can occur very rapidly, even at the species level, though analyzing the emergence of, e.g., pathogenicity at the population level would require a somewhat different approach that accounts for population-level processes.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on the genomic correlates of the transition to the pathogenic lifestyle in Sordariomycetes. The pathogenic lifestyle was found to be better explained by the number of genes, and in particular effectors and tRNAs, but this was modulated by the type of interacting host (insect or not insect) and the ability to be vectored by insects.

      Strengths:

      The main strength of this study lies in the size of the dataset, and the potentially high number of lifestyle transitions in Sordariomycetes.

      Weaknesses:

      The main strength of the study is not the clarity of the conclusions.

      (1) This is due firstly to the presentation of the hypotheses. The introduction is poorly structured and contradictory in some places. It is also incomplete since, for example, fungus-insect associations are not mentioned in the introduction even though they are explicitly considered in the analyses.

      We thank the reviewer for pointing this out. We strove to address all of the reviewer's comments and suggestions to clarify the message and remove the contradictions. We also added information about why we included the insect-association trait in our analysis.

      (2) The lack of clarity also stems from certain biases that are challenging to control in microbial comparative genomics. Indeed, defining lifestyles is complicated because many fungi exhibit different lifestyles throughout their life cycles (for instance, symbiotic phases interspersed with saprotrophic phases). In numerous fungi, the lifestyle referenced in the literature is merely the sampling substrate (such as wood or dung), which doesn't mean that this substrate is a crucial aspect of the life cycle. This issue is discussed by the authors, but they do not eliminate the underlying uncertainties.

      We agree with the reviewer that lack of certainty in the lifestyle or range of possible lifestyles of studied species is a weakness in this analysis. We are limited by the information available in the literature. We hope that our study will increase interest in collecting such data in the future.

      Reviewer #3 (Public review):

      Summary:

      This important study combines comparative genomics with other validation methods to identify the factors that mediate genome size evolution in Sordariomycetes fungi and their relationship with lifestyle. The study provides insights into genome architecture traits in this Ascomycete group, finding that, rather than transposons, the size of their genomes is often influenced by gene gain and loss. With an excellent dataset and robust statistical support, this work contributes valuable insights into genome size evolution in Sordariomycetes, a topic of interest to both the biological and bioinformatics communities.

      Strengths:

      This study is complete and well-structured.

      Bioinformatics analysis is always backed by good sampling and statistical methods. Also, the graphic part is intuitive and complementary to the text.

      Weaknesses:

      The work is great in general; I just had issues with the interpretation of Figure 1B.

      I struggled a bit to find the correspondence between this sentence: "Most genomic features were correlated with genome size and with each other, with the strongest positive correlation observed between the size of the assembly excluding repeats and the number of genes (Figure 1B)." and Figure 1B. Perhaps highlighting the key p values in the figure could help.

      We thank the reviewer for pointing out this sentence. Perhaps the misunderstanding comes from the fact that in this sentence one variable is missing. The correct version should be “Most genomic features were correlated with genome size and with each other, with the strongest positive correlation observed between the genome size, the genome size excluding repeats and the number of genes (Figure 1B)”. Also, the variable names now correspond better to those shown on the figure.

      Reviewer #1 (Recommendations for the authors):

      The authors have clearly done a lot of good work, and I think this study is worthwhile. I understand that my concerns about the underlying data could necessitate rerunning the entire analysis with better gene models, but there may be another option. JGI has a fairly standard pipeline for gene and repeat annotation. Their gene predictions are based on RNA data from the sequenced strain and should be quite good in general. One could either compare the annotations from this manuscript to those in mycocosm for genomes that are identical and see if there are systematic biases, or rerun some analyses on a subset of genomes from mycocosm. Indeed, it's possible that the large dataset used here compensates for the above concerns, but without some attempt to evaluate these issues, it's difficult to have confidence in the results.

      We greatly appreciate the positive reception of our manuscript. Following the reviewer’s comments, we compared our gene annotations with those of JGI Mycocosm, even though only 58 species matched and only 19 of them were from the same strain. This dataset is not representative of Sordariomycetes diversity (most species come from one clade) and therefore does not necessarily reflect the results we obtained in this study. Note that the reason for not choosing JGI Mycocosm in the first place was the poor representation of insect-associated species, which we found key in this study. In general, we found that assembly lengths were nearly identical, the number of genes was higher, and the repeat content was lower for the JGI Mycocosm dataset. When comparing different lifestyles (in particular pathogens vs. non-pathogens), we found the same differences for our and JGI Mycocosm annotations, with one exception being the repeat content. In the small subset (19 same-strain assemblies), our dataset showed the same level of repeats between the two lifestyles, whereas JGI Mycocosm showed lower repeat content for pathogens (but notably, for all 58 species, the trend was the same for our and JGI Mycocosm annotations). None of these observations conflict with our results, where we find no or a negative association of repeat content with pathogenicity.

      The figures are very information-dense. While I accept that this is somewhat of a necessity for presenting this type of study, if the authors could summarize the important information in easier-to-interpret plots, that could help improve readability.

      We put a lot of effort into presenting these complicated results in as approachable a manner as possible. Given that other reviewers find them intuitive, we decided to keep most of them as they are. For additional clarity, we added a supplementary figure showing distributions of genomic traits across lifestyles. Moreover, in Figure 5, we added a phylogenetic tree showing the positions of selected clades, as well as a scatterplot showing the distributions of mean genome size and number of genes for those clades. If the reviewer has any specific suggestions on what to improve and in which figure, we’re happy to consider them.

      Reviewer #2 (Recommendations for the authors):

      I have no major comments on the analyses, which have already been extensively revised. My major criticism is the presentation of the background, which is far from sufficient to fully understand the importance or relevance of the results presented.

      Lines are not numbered, unfortunately, which will not help the reading of my review.

      (1) The introduction could better present the background and hypotheses:

      (a) After reading the introduction, I still didn't have a clear understanding of the specific 'genome features' the study focuses on. The introduction fails to clearly outline the current knowledge about the genetic basis of the pathogenic lifestyle: What is known, what remains unknown, what constitutes a correlation, and what has been demonstrated? This lack of clarity makes reading difficult.

      We thank the reviewer for pointing this out. We have now included in the introduction a list of genomic traits we focus on. We also tried to be more precise about demonstrated pathogenic traits and other correlated traits in the introduction. 

      (b) Page 3. « Various features of the genome have been implicated in the evolution of the pathogenic lifestyle. » The cited studies did not genuinely link genome features to lifestyle, so the authors can't use « implicated in » - correlation does not imply causation.

      This sentence also somehow contradicts the one at the end of the paragraph: « we still have limited knowledge of which genomic features are specific to pathogenic lifestyle ».

      We thank the reviewer for this comment. We added the phrase “correlated with or implicated in” and changed the last sentence of the paragraph to “Yet we still have limited knowledge of how important and frequent different genomic processes are in the evolution of pathogenicity across phylogenetically distinct groups of fungi and whether we can use genomic signatures left by some of these processes as predictors of pathogenic state.”

      (c) Page 3: « Fungal pathogen genomes, and in particular fungal plant pathogen genomes have been often linked to large sizes with expansions of TEs, and a unique presence of a compartmentalized genome with fast and slow evolving regions or chromosomes » Do the authors really need to say « often »? Do they really know how often?

      We removed “often”.

      (d) « Such accessory genomic compartments were shown to facilitate the fast evolution of effectors (Dong, Raffaele, and Kamoun 2015) ». The cited paper doesn't « show » that genomic compartments facilitate the fast evolution of effectors. It's just an observation that there might be a correlation. It's an opinion piece, not a research manuscript.

      We changed the sentence to “Such accessory genomic compartments could facilitate the fast evolution of effectors”.

      (e) « even though such architecture can facilitate pathogen evolution, it is currently recognized more as a side effect of a species evolutionary history rather than a pathogenicity related trait ». This sentence somehow contradicts the following one: « Such accessory genomic compartments were shown to facilitate the fast evolution of effectors ».

      Here we wanted to point out that, even though accessory genome compartments and TE expansions can facilitate pathogen evolution, the origin of such architecture is not linked to pathogenicity. We reformulated the sentence to “Even though such architecture can facilitate pathogen evolution, it is currently recognized that its origin is more likely a side effect of a species evolutionary history rather than being caused by pathogenicity”.

      (f) « As the number of genes is strongly correlated with fungal genome size (Stajich 2017), such expansions could be a major contributor to fungal genome size. » This sentence suggests that pathogens might have bigger genomes because they have more effectors. This is contradictory to the sentence right after « At the end of the spectrum are the endoparasites Microsporidia, which have among the smallest known fungal genomes ».

      The authors state that pathogens have bigger genomes and then they take an example of a pathogen that has a minimal genome. I know it's probably because they lost genes following the transition to endoparasitism and not related to their capacity to cause disease. I just want to point out that their writing could be more precise. I invite the authors to think of young scholars who are new to the field of fungal evolutionary genomics.

      We thank the reviewer for prompting us to clarify the text. We rewrote this short extract as follows “Notably, not all pathogenic species experience genome or gene expansions, or show compartmentalized genome architecture. While gene family expansions are important for some pathogens, the contrary can be observed in others, such as Microsporidia. Due to transition to obligatory intracellular lifestyle these fungi show signatures of strong genome contractions and reduced gene repertoire (Katinka et al. 2001) without compromising their ability to induce disease in the host. This raises questions about universal genomic mechanisms of transition to pathogenic state.”

      (g) I find it strange that the authors neither cite nor present the major results of two other studies that use the same type of approach and ask the same type of question in Sordariomycetes, although without focusing on pathogenicity:

      Hensen et al.: https://pubmed.ncbi.nlm.nih.gov/37820761/

      Shen et al.: https://pubmed.ncbi.nlm.nih.gov/33148650/

      We thank the reviewer for pointing out this omission. We have now added more information to the introduction to highlight the importance of the phylogenetic context in studying genome evolution, as demonstrated by these studies. The following passage was added to the introduction: “Other phylogenomic studies investigating a wide range of Ascomycete species, while not explicitly focusing on the neutral evolution hypothesis, have found strong phylogenetic signals in genome evolution, reflected in distinct genome characteristics (e.g., genome size, gene number, intron number, repeat content) across lineages or families (Shen et al. 2020; Hensen et al. 2023). Variation in genome size has been shown to correlate with the activity of the repeat-induced point mutation (RIP) mechanism (Hensen et al. 2023; Badet and Croll 2025), by which repeated DNA is targeted and mutated. RIP can potentially lead to a slower rate of emergence of new genes via duplication (Galagan et al. 2003), and hinder TE proliferation limiting genome size expansion (Badet and Croll 2025). Variation in genome dynamics across lineages has also been suggested to result from environmental context and lifestyle strategies (Shen et al. 2020), with Saccharomycotina yeast fungi showing reductive genome evolution and Pezizomycotina filamentous fungi exhibiting frequent gene family expansions. Given the strong impact of phylogenetic membership, demographic history (Ne) and host-specific adaptations of pathogens on their genomes, we reasoned that further examination of genomic sequences in groups of species with various lifestyles can generate predictions regarding the architecture of pathogenic genomes.”

      (h) Genome defense mechanisms against repeated elements, such as RIP, are not mentioned while they could have a major impact on genome size (Hensen et al cited above; Badet and Croll https://www.biorxiv.org/content/10.1101/2025.01.10.632494v1.full).

      This citation has been added to the text quoted above.

      (i) Should the reader assume that the genome features to be examined are those mentioned in the first paragraph or those in the penultimate one?

      In the last paragraph of the introduction we included the complete list of investigated genomic traits.

      (j) The insect-associated lifestyle is mentioned only in the research questions on page 4, but not earlier in the introduction. Why should we care about insect-associated fungi?

      We apologize for this omission. We added a sentence explaining how neutral evolution hypotheses can explain patterns of genome evolution in endoparasites and in species with specialized vectors (traits present in insect-associated species), and a sentence in the last paragraph explaining that this is why we selected this trait for analysis.

      (2) Why use concatenation to infer phylogeny?

      (a) Kapli et al. https://pubmed.ncbi.nlm.nih.gov/32424311/ « Analyses of both simulated and empirical data suggest that full likelihood methods are superior to the approximate coalescent methods and to concatenation »

      (b) It also seems that a homogeneous model was used rather than a partitioned model, although the latter is more powerful. Why?

      We thank the reviewer for the comment. When we reconstructed the phylogenetic tree, we were not aware of this publication, and we followed common practices from the literature for phylogenetic tree reconstruction, even though these are no longer regarded as optimal. In fact, in the first round of submission, we included concatenation and a multispecies coalescent method based on 1000 BUSCO sequences, as well as a partitioned concatenation analysis of 250 BUSCO sequences. All three methods produced similar topologies. Since the results were concordant, we chose to omit these analyses from the manuscript to streamline the presentation and focus on the most important results.

      (3) Other comments:

      Is there a table listing lifestyles?

      Yes, lifestyles (pathogenicity and insect-association) are listed in Supplementary Table S1. 

      (4) Summary:

      (a) « seemingly similar pathogens »: meaning unclear; on what basis are they similar? Why « seemingly »?

      We removed “seemingly” from the sentence.

      (b) Page 4: what's the difference between genome feature and genome trait?

      There is no difference. We apologize for the confusion. We changed “feature” to “trait” whenever it refers to the specific 13 genomic traits analyzed in this study.

      (c) Page 22: Braker, not Breaker

      Corrected.

      What do the authors mean when they write that genes were predicted with Augustus and Braker? Do they mean that the two sets of gene models were combined? Gene counts are based on Augustus (page 24): why not Braker?

      We only meant that gene annotation was performed using the Braker pipeline, which uses a particular version of Augustus. We have corrected the sentence.

      (d) Figure 2B and 2C:

      'Undetermined sign' or 'Positive/Negative' would be better than « YES »; otherwise, it is impossible to understand the figure without reading the legend.

      We changed “YES” to “UNDETERMINED SIGN” as suggested by the reviewer.

    1. eLife Assessment

      This valuable study uses a sophisticated array of techniques to investigate the mechanisms through which the chordotonal receptors in the locust ear (Müller's organ) sense auditory signals. Ultrastructural reconstruction of the sensory organ provides convincing evidence of the organization of the scolopidial structure that wraps the sensory neuron cilium. However, the recordings of sound-evoked motion and electrophysiological activity from the chordotonal sensory neurons provide incomplete evidence for the proposed axial stretch model of mechanotransduction.

    2. Reviewer #1 (Public review):

      Chaiyasitdhi et al. set out to investigate the detailed ultrastructure of the scolopidia in the locust Müller's organ, the geometry of the forces delivered to these scolopidia during natural stimulation, and the direction of forces that are most effective at eliciting transduction currents. To study the ultrastructure, they used the FIB-SEM technique; to study the geometry of natural stimulation, they used OCT vibrometry and high-speed light microscopy; and to study transduction currents, they used patch clamp physiology.

      Strengths:

      I believe that the ultrastructural description of the locust scolopidium is excellent and the first of its kind in any insect system. In particular, the finding of the bend in the dendritic cilium and the position of the ciliary dilation are interesting, and it would be interesting to see whether these are common features within the huge diversity of insect chordotonal organs.

      I believe the use of OCT to measure organ movements is a significant strength of this paper; however, using ex vivo preparations undermines any conclusions drawn about the system's in vivo mechanics.

      The choice of Group III scolopidia is also good. Research on the mechanics of locust tympana has shown that travelling waves are formed on the tympanum and waves of different frequencies show highest amplitudes at different positions on the tympanum, and therefore also on different groups of scolopidia within the Müller's organ (Windmill et al, 2005; 2008, and Malkin et al, 2013). The lowest frequency modal waves (F0) observed by Windmill et al 2008 were at about 4.4 kHz, which are slightly higher than the ~3 kHz frequencies studied in this paper but do show large deflections where these group III scolopidia attach at the styliform body (Windmill et al, 2005).

      This should be mentioned in the paper since the electrophysiology justification to use group III neurons is less convincing, given that Jacobs et al 1999 clearly point out that group III neurons are very variable and some of them are tuned much higher to 10 kHz, and others even higher to 20-30 kHz.

      Weaknesses:

      Specifically, it is understandable that the authors decided to use excised ears for the light microscopy, where Müller's organ would not be accessible in situ. However, it is very likely that excision will change the system's mechanics, especially since any tension or support to Müller's organ will be ablated. OCT enables in vivo measurements in fully undissected systems (Mhatre et al, Biorxiv, 2021) or in systems with minimal dissection where the mechanics have not been compromised (Vavakou et al, 2021). The choice to entirely dissect out the membrane is difficult to understand here.

      My main concern with this paper, however, is the use of light microscopy very close to the Nyquist limit to study scolopidial motion, and the fact that the OCT data contradict and do not match the light microscopy data.

      The light microscopy data is collected at ~8 kHz, and hence the Nyquist limit is ~4 kHz. It is possible to measure frequencies reliably this close to the limit, but the amplitude of motion is quite likely to be underestimated, given that the technique only provides 2 sample points per cycle at 4 kHz and approximately 2.66 sample points at 3 kHz. At that temporal resolution, the samples are much more likely to miss the peak of the wave than not, and therefore, amplitudes will be misestimated. A much more reasonable sample rate for amplitude estimation is generally about 10 samples per cycle. I do not believe the data from the microscopy is reliable for what the authors wish to use them for.
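      To make this sampling argument concrete, the sketch below estimates how far a per-cycle peak-picking estimate falls below the true amplitude of a unit sinusoid sampled at a fixed frame rate. It is illustrative only: the ~8.6 kHz frame rate is taken from the author response, while the peak-picking estimator, the random stimulus phase and the function name `peak_pick_bias` are assumptions for this example, not the analysis used in the manuscript.

```python
import numpy as np


def peak_pick_bias(f_signal_hz: int, fs_hz: int, n_cycles: int = 200, seed: int = 0):
    """Return (mean, min) of per-cycle peak-picked amplitude / true amplitude.

    Hypothetical estimator: each stimulus cycle's amplitude is taken as the
    largest absolute sample falling inside that cycle. Frequencies are whole
    Hz so that cycle boundaries can be computed exactly with integers.
    """
    rng = np.random.default_rng(seed)
    phase0 = rng.uniform(0.0, 2.0 * np.pi)    # unknown stimulus phase
    n = -(-n_cycles * fs_hz // f_signal_hz)   # ceil division: samples spanning n_cycles
    k = np.arange(n)
    x = np.sin(2.0 * np.pi * f_signal_hz * k / fs_hz + phase0)  # true amplitude = 1
    cycle = (k * f_signal_hz) // fs_hz        # integer cycle index of each sample
    peaks = np.array([np.abs(x[cycle == c]).max() for c in range(n_cycles)])
    return float(peaks.mean()), float(peaks.min())


fs = 8600  # ~8.6 kHz frame rate quoted in the author response
for f in (1000, 3000, 4000):
    mean_ratio, worst_ratio = peak_pick_bias(f, fs)
    print(f"{f} Hz: {fs / f:.2f} samples/cycle, "
          f"mean {mean_ratio:.3f}, worst cycle {worst_ratio:.3f}")
```

      Comparing 1, 3 and 4 kHz shows how the typical (mean) and worst-case (minimum) per-cycle estimates change as the number of samples per cycle approaches two. Estimators that fit a sinusoid of known frequency across many cycles are not covered by this sketch.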

      Using the light microscopy data, the authors claim that the strains experienced by the group III scolopidia at 3 kHz are greater along the AP axis than the ML axis (Figure 4). However, this is contradicted by the OCT data, which show very low strain along the AP axis (black traces) at and around 3 kHz (Figure 3c and extended data Figure 2f) and show some movement along the ML axis (red traces, same figures). The phase at low amplitudes of motion cannot be considered very reliable either, and hence phase variations at these frequencies in the OCT cannot be considered reliable indicators of AP motion; hence, I'm unclear whether the vector difference in the OCT is a reliable indicator of movement.

      The OCT data are significantly more reliable as they are acquired at an appropriate sampling rate of 90 kHz. The authors do not mention what microphone they use to monitor or calibrate their sound field and phase measurements in OCT, but I presume this was done since it is the norm. Thus, the OCT data show that the movement within the Müller's organ is complex, probably traces an ellipse at some frequencies as observed in bushcrickets (Vavakou et al, 2021) and also thought to be the case in tree crickets based on the known attachment points of the TO (Mhatre et al, 2021). The OCT data shows relatively low AP motion at frequencies near 3 kHz, and higher ML motion, which contradicts the less reliable light microscopy data. Given that the locust membrane shows peaks in motion at ~4.5 kHz, ~11 kHz, and also at ~20 kHz (Windmill et al, 2008), I am surprised that the authors limited their OCT experiments and analyses to 5 kHz.

      In summary, I am not convinced by the authors' conclusion that group III scolopidia receive significantly higher stimulation along the AP axis in their native configuration, particularly as it is unclear whether they were studied in the appropriate force regime, which excision may have altered.

      In the scolopidial patch clamp data, the authors study transduction currents in response to steady state stimulation along the AP axis and the ML axis. The responses to steady state and periodic forces may well be different, and the authors do not offer us a way to clearly relate the two and therefore, to interpret the data.

      In addition, both stimulation types, along the AP axis and along the ML axis, elicit clear transduction responses. Responses to stimulation along the AP axis might be slightly larger, but there is over 40% variation around the mean in one case (pull: 26.22 ± 10.99 pA) and close to 80% variation in the other (push: 10.96 ± 8.59 pA). Moreover, these data come from a very large displacement (2000 nm), far above native displacement levels, which are in the 1-10 nm range.
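      Reading "variation around the mean" as the ratio of the reported spread to the mean (a coefficient of variation), and assuming the ± values are standard deviations (the text does not say whether they are SD or SEM), the two percentages follow from:

```latex
\mathrm{CV}_{\text{pull}} = \frac{10.99\ \text{pA}}{26.22\ \text{pA}} \approx 0.42\ (\approx 42\%),
\qquad
\mathrm{CV}_{\text{push}} = \frac{8.59\ \text{pA}}{10.96\ \text{pA}} \approx 0.78\ (\approx 78\%)
```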

      The fold change from sample to sample is not reported, and even the overall change is small. The statistical analyses of these data are not clearly reported, and I don't see the results of the overall ANOVA in the Results section. I also find the dip in the reported transduction currents between 10 and 100 nm quite odd (Figure 5j-m) and would like to know how the authors interpret this behaviour. It seems to me that these currents increase continuously and linearly above ~50-100 nm and that the data below that range are within the noise. Thus, the transduction currents observed in the relevant displacement range (1-10 nm) may not actually be reliable. How were these small displacements achieved, and how closely were the actual levels monitored? Is it possible to reliably deliver 1-10 nm displacements using a micromanipulator?

      What is clear, despite the difficulty in interpreting these data, is that both AP and ML stimulation evoke transduction currents, and their relative differences are small. Additionally, within the excised Müller's organ itself, the scolopidia are stimulated along both axes. Thus, in my opinion, it is not possible to say that axial stretch along the cilium is 'the key mechanical input that activates mechano-electrical transduction'.

    3. Reviewer #2 (Public review):

      Summary of strengths and weaknesses:

      Using several techniques (FIB-SEM, OCT, high-speed light microscopy, and electrophysiology), Chaiyasitdhi et al. provide evidence that chordotonal receptors in the locust ear (Müller's organ) sense the stretch of the scolopale cell, primarily of its cilium. Careful measurements certainly show cell stretch, albeit with some inconsistencies regarding best frequencies and amplitudes. The weakest argument concerns the electrophysiological recordings, because the authors do not show directly that the stimulus stretches the cells. If this latter point can be clarified, then our confidence that ciliary stretch is the proximal stimulus for mechanotransduction will be increased. This conclusion will not come as a surprise for workers in the field, as the chordotonal organ is known as a stretch-receptor organ (e.g., Wikipedia). But it is a useful contribution to the field and allows the authors to suggest transduction mechanisms whereby ciliary stretch is transduced into channel opening.

    4. Reviewer #3 (Public review):

      Summary:

      The paper 'A stretching mechanism evokes mechano-electrical transduction in auditory chordotonal neurons' by Chaiyasitdhi et al. presents a study that aims to address the mechanical model for scolopidia in Schistocerca gregaria Müller's organ, the basic mechanosensory units in insect chordotonal organs. The authors combine high-resolution ultrastructural analysis (FIB-SEM), sound-evoked motion tracking (OCT and high-speed light microscopy), and electrophysiological recordings of transduction currents during direct mechanical stimulation of individual scolopidia. They conclude that axial stretching along the ciliary axis is an adequate mechanical stimulus for activating mechanotransduction channels.

      Strengths/Highlights:

      (1) The 3D FIB-SEM reconstruction provides high resolution of scolopidial architecture, including the newly described "scolopale lid" and the full extent of the cilium.

      (2) High-speed microscopy clearly demonstrates axial stretch as the dominant motion component in the auditory receptors, which confirms a long-standing question of what the actual motion of a stretch receptor is upon auditory stimulation.

      (3) Patch-clamp recordings directly link mechanical stretch to transduction currents, a major advance over previous indirect models.

      Weaknesses/Limitations:

      (1) The text is conceptually unclear or written in an unclear manner in some places, for example, when using the proposed model to explain the sensitivity of Nanchung-Inactive in the discussion.

      (2) The proposed mechanistic models (direct-stretch, stretch-compression, stretch-deformation, stretch-tilt) are compelling but remain speculative without direct molecular or biophysical validation. For example, examining whether the organ is pre-stretched and identifying the mechanical components of cells (tissues), such as the extracellular matrix and cytoskeleton, would help establish the mechanical model and strengthen the conclusion.

      (3) To some extent, the weaknesses of the paper are part of its strengths and vice versa. For example, the direct push/pull and up/down stimulations are a great experimental advance to approach an answer to the question of how the underlying cellular components are deformed and how the underlying ion channels are forced. However, as the authors clearly state, neither of their stimulations can limit all forces to only one direction, and both orthogonal forces evoke responses in the neurons. The question of which of the two orthogonal forces 'causes' the response cannot be answered with these experiments and has not been answered by this manuscript. But the study has brought the field a considerable step closer to answering the question. The answer, however, might be that both longitudinal ('stretch') and perpendicular ('compression') forces act together to open the ion channels and that both dendritic extension via stretch and bending can provide forces for ion channel gating. The current paper has identified major components (longitudinal stretch components) for the neurons they analysed, but these will surely have been chosen according to their accessibility, and as such, the variety of mechanical responses in Müller's organ might be greater. In light of these considerations, the authors might acknowledge such uncertainties more clearly in their paper. The paper is an impressive methodological progress and breakthrough, but it simply does not "demonstrate that axial stretch along the cilium is the adequate stimulus or the key mechanical input that activates mechano-electrical transduction" as the authors write at the start of their discussion. They do show that axial stretch dominates for the neurons they looked at, which is important information. 
The same applies to the end of the discussion: The authors write, "This relative motion within the organ then drives an axial stretch of the scolopidium, which in turn evokes the mechano-electrical transduction current." Reading the manuscript, the certainty and display of confidence are not substantiated by the data provided. But they are also not necessary. The study has paved the road to answer these questions. Instead, the authors are encouraged to make suggestions on how the remaining uncertainties could be removed (and what experiments or model might be used).

    5. Author response:

      Reviewer #1 (Public review):

      Chaiyasitdhi et al. set out to investigate the detailed ultrastructure of the scolopidia in the locust Müller's organ, the geometry of the forces delivered to these scolopidia during natural stimulation, and the direction of forces that are most effective at eliciting transduction currents. To study the ultrastructure, they used the FIB-SEM technique; to study the geometry of natural stimulation, they used OCT vibrometry and high-speed light microscopy; and to study transduction currents, they used patch clamp physiology.

      Strengths:

      I believe that the ultrastructural description of the locust scolopidium is excellent and the first of its kind in any insect system. In particular, the finding of the bend in the dendritic cilium and the position of the ciliary dilation are interesting, and it would be interesting to see whether these are common features within the huge diversity of insect chordotonal organs.

      Thank you very much for your comments. We indeed plan to extend our approach to study and understand diverse chordotonal organs in insects and crustaceans.

      I believe the use of OCT to measure organ movements is a significant strength of this paper; however, using ex vivo preparations undermines any conclusions drawn about the system's in vivo mechanics.

      Having re-read the manuscript, we realise that we did not explicitly describe our ex vivo preparation of Müller’s organ, including key references showing that its physiological function is largely retained. We have now revised this detail in the Methods section:

      “We used an excised locust ear preparation for all experiments, following a previously described dissection protocol [9]. In short, the tympanum, with Müller’s organ attached, was left intact and suspended within its cuticular rim. The cuticular rim of the tympanum was fixed into a hole in a preparation dish, which allowed Müller’s organ to be submerged in extracellular saline whilst the outside of the tympanum remained dry and could be stimulated with airborne sound. This ex vivo preparation of Müller’s organ retained frequency tuning (Warren & Matheson, 2018), electrophysiological function similar to that of freshly dissected Müller’s organs (Hill, 1983a, 1983b; Michelsen, 1968, ‘Frequency discrimination in the locust ear by means of four groups of receptor cells’), and amplitude coding (Warren & Matheson, 2018). Since Müller’s organ is backed by an air-filled trachea in vivo, the addition of saline solution in the ex vivo preparation decreased its displacements ~100-fold due to damping (Warren et al., 2020).”

      And in the last section of the introduction:

      “Here, we combined FIB-SEM to resolve the 3D ultrastructure of a scolopidium, OCT and high-speed microscopy to examine sound-evoked motion at both the organ and individual scolopidium levels, and direct mechanical stimulation of the scolopale cap, where the ciliary tip is anchored, whilst simultaneously recording transduction currents. For the physiological experiments, Müller’s organ and the tympanum were excised from the locust. This ex vivo preparation of Müller’s organ retained frequency tuning, amplitude coding and electrophysiological function, and it also permitted the enzymatic isolation of individual scolopidia whilst recording transduction currents (Warren & Matheson, 2018).”

      To further clarify physiological differences between the in vivo and ex vivo operation of the tympanum and Müller’s organ, we will perform an additional experiment for the revised manuscript by quantifying the changes in the sound-evoked tonotopic travelling wave of the tympanum using Laser Doppler Vibrometry (LDV). This result will be added to the Supplementary Text.

      The choice of Group III scolopidia is also good. Research on the mechanics of locust tympana has shown that travelling waves are formed on the tympanum and waves of different frequencies show highest amplitudes at different positions on the tympanum, and therefore also on different groups of scolopidia within the Müller's organ (Windmill et al, 2005; 2008, and Malkin et al, 2013). The lowest frequency modal waves (F0) observed by Windmill et al 2008 were at about 4.4 kHz, which are slightly higher than the ~3 kHz frequencies studied in this paper but do show large deflections where these group III scolopidia attach at the styliform body (Windmill et al, 2005).

      Thank you very much. We accept that the frequencies studied in this manuscript were lower than that of the lowest modal wave observed by Windmill et al., 2008. Other authors, as summarised by Jacobs et al. 1999, found broad tuning from 3.4-3.74 kHz (Michelsen et al., 1971) and 2-3.5 kHz (Halex et al., 1988). We settled on the tuning previously measured for Group-III neurons in the same kind of preparation as used in this manuscript, which was broadly around 3 kHz (Warren & Matheson, 2018).

      This should be mentioned in the paper since the electrophysiology justification to use group III neurons is less convincing, given that Jacobs et al 1999 clearly point out that group III neurons are very variable and some of them are tuned much higher to 10 kHz, and others even higher to 20-30 kHz.

      Looking at Fig. 7 of Jacobs et al., 1999, we indeed see that the four Group-III neurons recorded in that study are broadly tuned to 3-4 kHz. These tuning curves often show additional threshold dips at higher frequencies, but with thresholds at least 20 dB higher. We settled on the most sensitive frequency that we previously measured, which also overlaps with the most sensitive frequencies reported in several other studies.

      Weaknesses:

      Specifically, it is understandable that the authors decided to use excised ears for the light microscopy, where Müller's organ would not be accessible in situ. However, it is very likely that excision will change the system's mechanics, especially since any tension or support to Müller's organ will be ablated.

      We completely understand this criticism. We have now added descriptions to the Methods and Introduction (as detailed above). In short, the tympanum was left intact, suspended on the cuticle. Müller’s organ retains all measured physiological properties: frequency tuning, amplitude coding and electrophysiological function. To further investigate whether this excised preparation is representative of in vivo conditions, we plan to measure tympanal mechanics, such as the travelling wave, as part of the revisions.

      OCT enables in vivo measurements in fully undissected systems (Mhatre et al, Biorxiv, 2021) or in systems with minimal dissection where the mechanics have not been compromised (Vavakou et al, 2021). The choice to entirely dissect out the membrane is difficult to understand here.

      The pioneering OCT work by Mhatre et al. (bioRxiv, 2021) and Vavakou et al. (2021) set a new standard for in vivo measurements in the field. We also fully agree with Reviewer #1’s view that OCT is best performed on Müller’s organ in vivo, and we attempted in vivo OCT imaging of Müller’s organ for several months. Although the OCT beam penetrates the tympanum, it does not penetrate the tracheal air sac that surrounds Müller’s organ, so OCT cannot be used in vivo. Please also see our previous comment regarding the intact physiological operation of Müller’s organ in the ex vivo preparation.

      My main concern with this paper, however, is the use of light microscopy very close to the Nyquist limit to study scolopidial motion, and the fact that the OCT data contradict and do not match the light microscopy data. The light microscopy data is collected at ~8 kHz, and hence the Nyquist limit is ~4 kHz. It is possible to measure frequencies reliably this close to the limit, but the amplitude of motion is quite likely to be underestimated, given that the technique only provides 2 sample points per cycle at 4 kHz and approximately 2.66 sample points at 3 kHz. At that temporal resolution, the samples are much more likely to miss the peak of the wave than not, and therefore, amplitudes will be mis-estimated. A much more reasonable sample rate for amplitude estimation is generally about 10 samples per cycle. I do not believe the data from the microscopy is reliable for what the authors wish to use them for.

      We understand your concern that the sound-evoked motion of the scolopidium was studied with light microscopy near the Nyquist limit (our average sampling rate was 8.6 ± 0.3 kHz, giving a Nyquist limit of 4.3 kHz). We also agree that the amplitude of the motion could be underestimated at frequencies closer to this limit. However, we find that this systematic error does not change the key observation from our direct light microscopy: that axial stretch of the scolopidium occurs around 3 kHz.

      To address this concern, we plan to study scolopidial motion in Group 1 auditory neurons, which are tuned to lower frequencies (0.5-1.5 kHz). These new data will give us more data points per cycle (up to ~8.6 data points per cycle at 1 kHz). We will consider adding this result to the revised Fig. 4 or its extended data.
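      The sampling argument in this exchange can be made concrete with a short calculation. The Python sketch below computes samples per cycle (sampling rate divided by tone frequency) and a worst-case bound on how far the largest sample in one cycle can fall below the true amplitude: consecutive samples are 2πf/fs radians apart, so the sample nearest the crest can be up to half that step away from it. The function names are illustrative, and the bound assumes amplitude is read from the largest sample of a single cycle; neither the reviewer nor the response states how amplitudes were extracted, and fitting a sinusoid of known frequency over many cycles would behave differently.

```python
import math


def samples_per_cycle(fs_hz: float, f_hz: float) -> float:
    """Number of samples taken during one period of a tone at f_hz."""
    return fs_hz / f_hz


def worst_case_peak_fraction(fs_hz: float, f_hz: float) -> float:
    """Lowest fraction of the true amplitude that the largest sample in one
    cycle can report, assuming amplitude is taken from that largest sample.

    Consecutive samples are 2*pi*f/fs radians apart, so the crest is never
    more than half a phase step from its nearest sample; the worst case is a
    crest falling exactly midway between two samples.
    """
    if not 0 < f_hz <= fs_hz / 2:
        raise ValueError("tone must lie between 0 Hz and the Nyquist frequency")
    half_step = math.pi * f_hz / fs_hz
    return math.cos(half_step)


fs = 8600.0  # mean camera sampling rate reported in the response (8.6 kHz)
for f in (1000.0, 3000.0, 4300.0):
    print(f"{f / 1000:.1f} kHz: {samples_per_cycle(fs, f):.2f} samples/cycle, "
          f"worst-case peak >= {worst_case_peak_fraction(fs, f):.2f} x true amplitude")
```

      At 8.6 kHz this bound is about 0.93 of the true amplitude at 1 kHz (8.6 samples per cycle) but about 0.46 at 3 kHz (about 2.9 samples per cycle), and it reaches zero at the 4.3 kHz Nyquist frequency, where both samples in a cycle can land on zero crossings.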

      Regarding the sampling rate, we did try to achieve higher rates (> 10 kHz); however, our camera imposes a technical limitation and a trade-off with other key parameters, such as the size of the region of interest (ROI) and magnification. To increase the sampling rate, we would have to reduce either the magnification, losing the spatial resolution required to quantify scolopidial motion, or the ROI, which would then no longer cover the whole scolopidial motion. A sampling rate of 8.6 ± 0.3 kHz was the best we could achieve.

      Using the light microscopy data, the authors claim that the strains experienced by the group III scolopidia at 3 kHz are greater along the AP axis than the ML axis (Figure 4). However, this is contradicted by the OCT data, which show very low strain along the AP axis (black traces) at and around 3 kHz (Figure 3c and extended data Figure 2f) and show some movement along the ML axis (red traces, same figures). The phase at low amplitudes of motion cannot be considered very reliable either, and hence phase variations at these frequencies in the OCT cannot be considered reliable indicators of AP motion; hence, it is unclear to me whether the vector difference in the OCT data is a reliable indicator of movement.

      This is our fault for not clearly explaining the orientation of the light microscopy measurements, which led to the reviewer’s concern about a contradiction between the OCT and light microscopy data. Our OCT measurements were made along the Antero-Posterior (AP) and Mesio-Lateral (ML) axes, while the axial stretch of the scolopidium occurs along the Dorso-Ventral (DV) axis. We recognise that the anatomical references in this manuscript can be confusing, and we tried to show the orientation of the scolopidium relative to Müller’s organ in Fig. 3b. To further clarify the orientation of our observations, we will add anatomical references to Fig. 4a and Fig. 5a in the revised manuscript.

      As stated in our Results section (Lines 165-167):

      “Notably, we could not resolve the Group-III scolopidia along the ventro-dorsal axis—which runs parallel to the dendrite—as the OCT beam was obstructed by either the cuticle or the elevated process”

      We did try to perform OCT measurements along the VD axis, but we could not resolve the scolopidial region along the scolopidial or ciliary axes because the OCT beam could not pass through the thick cuticle at the edge of the tympanic membrane or the elevated process. For this reason, we can neither confirm agreement nor rule out a contradiction between the OCT and light microscopy data, since they measure motion along different axes. We plan to address this accessibility issue in a separate work using OCT measurements in combination with mirrors.

      The OCT data are significantly more reliable as they are acquired at an appropriate sampling rate of 90 kHz. The authors do not mention what microphone they use to monitor or calibrate their sound field and phase measurements in OCT, but I presume this was done since it is the norm.

      We use a condenser microphone (MK301, Microtech) and a measuring amplifier (Type 2610, Brüel & Kjær) for calibration. The microphone itself was calibrated beforehand using a Brüel & Kjær Type 4231 sound calibrator.

      Thus, the OCT data show that the movement within the Müller's organ is complex, probably traces an ellipse at some frequencies as observed in bushcrickets (Vavakou et al, 2021) and also thought to be the case in tree crickets based on the known attachment points of the tympanal organ (Mhatre et al, 2021). The OCT data show relatively low AP motion at frequencies near 3 kHz, and higher ML motion, which contradicts the less reliable light microscopy data. Given that the locust membrane shows peaks in motion at ~4.5 kHz, ~11 kHz, and also at ~20 kHz (Windmill et al, 2008), I am surprised that the authors limited their OCT experiments and analyses to 5 kHz.

      We found that immediately above 5 kHz the displacements fell to undetectable magnitudes. We accept that there may be other modes of vibration at higher frequencies (>10 kHz, based on Jacobs et al., 1999) that we could have detected with OCT. However, we focused our analysis on Group-III neurons at their best frequency and at frequencies that we could cross-compare between our high-speed imaging system and OCT.

      In summary for this section, I am not convinced of the conclusion drawn by the authors that group III scolopidia receive significantly higher stimulation along the AP axis in their native configuration, if indeed they were studied in the appropriate force regime (altered due to excision).

      Again, we accept that we did not clearly display the anatomical references of the scolopidial and ciliary axes in Fig. 4 and Fig. 5, nor describe in sufficient detail that our ex vivo preparation largely retains its physiological properties. We will address the measurement errors near the Nyquist limit and provide additional data from Group 1 scolopidia, for which we can obtain more data points per cycle.

      In the scolopidial patch clamp data, the authors study transduction currents in response to steady state stimulation along the AP axis and the ML axis. The responses to steady state and periodic forces may well be different, and the authors do not offer us a way to clearly relate the two and therefore, to interpret the data.

      We will revise Fig. 5a to clarify that the push-pull stimulation was applied along the Dorso-Ventral (DV) axis and the up-down stimulation along the Antero-Posterior (AP) axis. We agree that steady-state and periodic forces may well act very differently. However, valuable insight can be gained by displacing mechanical systems outside their normal physiological frequency range (e.g., the transformative work on vertebrate hair bundle mechanics, Howard & Hudspeth, 1988). For the same reason, we believe that artificial stimulation of the scolopidium gives us new and crucial information for understanding scolopidial mechanics. Our main finding, that stretch is the dominant stimulus, should therefore hold for periodic motion, or at least provide strong support that stretch is the dominant stimulus there as well.

      In addition, both stimulation types, along the AP and ML axes, elicit clear transduction responses. Stimulation along the AP axis might evoke slightly larger currents, but there is over 40% variation around the mean in one case (pull: 26.22 ± 10.99 pA) and close to 80% variation in the other (push: 10.96 ± 8.59 pA). These data also come from a very large displacement (2000 nm) compared with native displacement levels, which are in the 1-10 nm range.
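      The variation figures quoted above are coefficients of variation (standard deviation divided by the mean). A minimal Python check of that arithmetic, assuming the ± values are standard deviations (the text does not say whether they are SD or SEM; if they were SEM, the spread between cells would be larger still):

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Relative spread of a measurement: standard deviation / mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return sd / mean


# Transduction currents (pA) quoted above for the two stimulus directions.
pull_cv = coefficient_of_variation(26.22, 10.99)
push_cv = coefficient_of_variation(10.96, 8.59)
print(f"pull: {pull_cv:.0%}, push: {push_cv:.0%}")  # pull: 42%, push: 78%
```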

      In this experiment, we wished to establish the upper limit (and plateau region) of the displacement-transduction current relationship. However, even at 2000 nm we did not observe a plateau. We therefore believe that the strain on the scolopidium is still within its operating range even though the imposed displacement is not. We attribute this discrepancy to the base of the scolopidium not being fixed: the imposed displacement is not equivalent to the strain on the cilium but is taken up by a combination of pulling and stretching along the length of the dendrite. The force, however, remains along that particular axis, supporting our main finding.

      Another important consideration is that the cilium is surrounded by the scolopale wall. It is assumed that the scolopale wall is far stiffer than the cilium and will therefore limit the amount of ciliary strain.
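      The argument that an unfixed base and a stiff scolopale wall both reduce ciliary strain can be illustrated with linear springs. This is a hypothetical sketch, not the authors' model: the dendrite is treated as a spring in series with the scolopidium, and the cilium and scolopale wall as springs in parallel within it. Springs in series carry the same force, so the imposed displacement divides in inverse proportion to stiffness, and the scolopidium takes up the fraction k_d / (k_d + k_c + k_w). The function names, stiffnesses and lengths are placeholders in arbitrary units, not measured values.

```python
def scolopidial_share(imposed_nm: float, k_dendrite: float,
                      k_cilium: float, k_wall: float) -> float:
    """Displacement (nm) taken up by the scolopidium in a hypothetical model.

    The cilium and scolopale wall act in parallel, so their stiffnesses add;
    that unit sits in series with the dendrite, so both carry the same force
    and the imposed displacement splits in inverse proportion to stiffness.
    """
    k_scolopidium = k_cilium + k_wall
    return imposed_nm * k_dendrite / (k_dendrite + k_scolopidium)


def ciliary_strain(imposed_nm: float, k_dendrite: float, k_cilium: float,
                   k_wall: float, cilium_length_nm: float) -> float:
    """Fractional extension of the cilium, which (in parallel with the wall)
    extends by the same amount as the scolopidium as a whole."""
    share = scolopidial_share(imposed_nm, k_dendrite, k_cilium, k_wall)
    return share / cilium_length_nm


# Placeholder values (arbitrary units), for illustration only.
for k_wall in (1.0, 10.0):
    share = scolopidial_share(2000.0, k_dendrite=1.0, k_cilium=1.0, k_wall=k_wall)
    print(f"k_wall={k_wall}: {share:.0f} nm of 2000 nm reaches the scolopidium")
```

      In this toy model a stiffer wall lowers the share of the imposed displacement that reaches the scolopidium, and hence the ciliary strain, while a compliant dendrite absorbs much of a large step; this is one way a 2000 nm step could still leave the cilium within its operating range.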

      The factor change from sample to sample is not reported and is small even overall. The statistical analyses of these data are not clearly reported, and I don't see the results of the overall ANOVA in the results section.

      We reported the statistical analyses in the Fig. 5 Source Data. We will also add tables displaying these statistics to the supplementary text of the revised manuscript.

      I also find the dip in the reported transduction currents between 10 and 100 nm quite odd (Figure 5 j-m) and would like to know what the authors' interpretation of this behaviour is. It seems to me that those currents increase continuously linearly after ~50-100 nm and that the data below that range are in the noise. Thus, the transduction currents observed at the relevant displacement range (1-10 nm) may not actually be reliable. How were these small displacements achieved, and how closely were the actual levels monitored? Is it possible to reliably deliver 1-10 nm displacements using a micromanipulator?

      One interpretation is that the cilium has both sensitive and insensitive mechanically gated ion channels, a possibility also supported by Effertz et al., 2012. We will add a sentence to the discussion highlighting this interpretation. We will also provide our calibration of displacement versus the voltage delivered to the piezo in the Supplementary Text.

      What is clear, despite the difficulty in interpreting this data, is that both AP and ML stimulation evoke transduction currents, and their relative differences are small. Additionally, in Müller's organ itself, in the excised organ, the scolopidia are stimulated along both axes. Thus, in my opinion, it is not possible to say that axial stretch along the cilium is 'the key mechanical input that activates mechano-electrical transduction'.

      We confirm that the scolopidia are displaced along both axes. We also note that displacements of the scolopidium limited to the up-down axis will also strain the scolopidium along the push-pull axis. However, we tried to disentangle this complex motion by limiting the displacements to one axis during recordings of the transduction current. We found that displacement along the scolopidial axis generated the largest transduction currents. Despite the large variation, our statistical analysis confirmed a significant difference, as stated in the Results section (Lines 283-286):

      “Additionally, the transduction current evoked by pull from the resting position was larger than displacement upward, 12.17 ± 5.37 pA (N = 11, n = 11) (Tukey's procedure, p = 1.75e-03, t = -3.83) or downward 7.28 ± 9.76 pA (N = 11, n = 11) (Tukey's procedure, p = 5.10e-06, t = -4.53).”

      The large variation arises because the discrete depolarisations (random depolarisations of unknown function and a common feature of chordotonal neurons recorded so far) have a similar magnitude to the transduction current produced by the step displacements. We will highlight these discrete depolarisations in Figure 4d and mention them in the results.

      Reviewer #2 (Public review):

      Summary of strengths and weaknesses:

      Using several techniques (FIB-SEM, OCT, high-speed light microscopy, and electrophysiology), Chaiyasitdhi et al. provide evidence that chordotonal receptors in the locust ear (Müller's organ) sense the stretch of the scolopale cell, primarily of its cilium. Careful measurements certainly show cell stretch, albeit with some inconsistencies regarding best frequencies and amplitudes.

      Thank you very much for acknowledging the strengths of our study. Regarding the inconsistencies in best frequencies and amplitudes, we believe that this concern largely arises from our failure to clearly display the anatomical references of the scolopidial and ciliary axes in Fig. 4 and Fig. 5. As addressed in our response to Reviewer #1, we will add the anatomical references and revise the text to clarify the orientation of our measurements.

      The weakest argument concerns the electrophysiological recordings, because the authors do not show directly that the stimulus stretches the cells. If this latter point can be clarified, then our confidence that ciliary stretch is the proximal stimulus for mechanotransduction will be increased.

      We agree that the displacement does not solely stretch the scolopidium. However, the force is still constrained to act along the push-pull axis. For this reason, we overestimate the displacement required to open the MET channels, but we stand by our conclusion that stretch is the dominant stimulus. In future work, we aim to devise a technique to mechanically clamp the base of the scolopidium and measure the more physiologically relevant current-strain relationship.

      This conclusion will not come as a surprise for workers in the field, as the chordotonal organ is known as a stretch-receptor organ (e.g., Wikipedia). But it is a useful contribution to the field and allows the authors to suggest transduction mechanisms whereby ciliary stretch is transduced into channel opening.

      One of the goals of this manuscript is to highlight the lack of direct evidence for the stretch-sensitivity of chordotonal organs, which has been assumed from their structure. More importantly, accepting that chordotonal organs are stretch-sensitive does not explain how these organs work. For instance, one candidate for the MET channel, NompC, has been shown to be sensitive to compression (Wang et al., 2021). We find that a preconceived “stretch-sensitive” mechanism, without an appreciation of scolopidium mechanics, cannot explain how NompC can be opened in chordotonal organs.

      P. E. Howse wrote in his work ‘The Fine Structure and Functional Organisation of Chordotonal Organs’ (Symp. Zool. Soc. Lon., No. 23, 1968):

      “There is, however, a common tendency to refer to chordotonal organs in which scolopidia are contained in a connective tissue strand as “stretch receptor”. This is unfortunate in two senses, for firstly the implied function may not have been proved and secondly even if the organ responds to stretch the scolopidia may not.” He then cited pioneering work on the chordotonal organs of the hermit crab by R. C. Taylor (Comp. Biochem. Physiol. 1966) showing that the scolopidia may flex when the connective strand is stretched.

      This work represents the first effort to investigate the problematic assumption of the stretch-sensitivity of scolopidia since it was highlighted 57 years ago.

      Reviewer #3 (Public review):

      Summary:

      The paper 'A stretching mechanism evokes mechano-electrical transduction in auditory chordotonal neurons' by Chaiyasitdhi et al. presents a study that aims to address the mechanical model for scolopidia in Schistocerca gregaria Müller's organ, the basic mechanosensory units in insect chordotonal organs. The authors combine high-resolution ultrastructural analysis (FIB-SEM), sound-evoked motion tracking (OCT and high-speed light microscopy), and electrophysiological recordings of transduction currents during direct mechanical stimulation of individual scolopidia. They conclude that axial stretching along the ciliary axis is an adequate mechanical stimulus for activating mechanotransduction channels.

      Strengths/Highlights:

      (1) The 3D FIB-SEM reconstruction provides high resolution of scolopidial architecture, including the newly described "scolopale lid" and the full extent of the cilium.

      (2) High-speed microscopy clearly demonstrates axial stretch as the dominant motion component in the auditory receptors, which answers a long-standing question about what the actual motion of a stretch receptor is upon auditory stimulation.

      (3) Patch-clamp recordings directly link mechanical stretch to transduction currents, a major advance over previous indirect models.

      Weaknesses/Limitations:

      (1) The text is conceptually unclear or written in an unclear manner in some places, for example, when using the proposed model to explain the sensitivity of Nanchung-Inactive in the discussion.

      We will rephrase and clarify the context of our findings for the Nanchung-Inactive mechanism of MET in the introduction and the discussion. We will also refine and simplify unclear text throughout.

      (2) The proposed mechanistic models (direct-stretch, stretch-compression, stretch-deformation, stretch-tilt) are compelling but remain speculative without direct molecular or biophysical validation. For example, examining whether the organ is pre-stretched and identifying the mechanical components of cells (tissues), such as the extracellular matrix and cytoskeleton, would help establish the mechanical model and strengthen the conclusion.

      We agree that our four proposed hypotheses are speculative. We have, however, narrowed them down from at least ten previous hypotheses (Field and Matheson, 1998). This narrower set will allow us, and hopefully the field, to test them and more rapidly advance our understanding of how scolopidia work. We will add a section to the discussion on the best way to test these four hypotheses experimentally (e.g., pushing directly onto the cap should elicit sensitive responses under the cap-compression hypothesis).

      (3) To some extent, the weaknesses of the paper are part of its strengths and vice versa. For example, the direct push/pull and up/down stimulations are a great experimental advance to approach an answer to the question of how the underlying cellular components are deformed and how the underlying ion channels are forced. However, as the authors clearly state, neither of their stimulations can limit all forces to only one direction, and both orthogonal forces evoke responses in the neurons. The question of which of the two orthogonal forces 'causes' the response cannot be answered with these experiments and has not been answered by this manuscript. But the study has brought the field a considerable step closer to answering the question. The answer, however, might be that both longitudinal ('stretch') and perpendicular ('compression') forces act together to open the ion channels and that both dendritic extension via stretch and bending can provide forces for ion channel gating.

      Thank you very much for your acknowledgement of our experimental advances. We agree that this study cannot identify and localise the forces on the cilium, as it is enclosed in the scolopidial unit. As explained above, we plan to address this question in future work by improving and expanding our experimental techniques, including modelling, to study scolopidial mechanics through patch-clamp recording combined with direct manipulation of individual scolopidia.

      The current paper has identified major components (longitudinal stretch components) for the neurons they analysed, but these will surely have been chosen according to their accessibility, and as such, the variety of mechanical responses in Müller's organ might be greater. In light of these considerations, the authors might acknowledge such uncertainties more clearly in their paper.

      Our high-speed and OCT imaging confirm complex multi-dimensional displacements (and presumably forces) acting on the scolopidium. We agree that our mechanical stimulation cannot recapitulate such complex motions. In future work, we aim to extend our mechanical stimulation to three axes and to include pivoting about the axis of the scolopidial cap.

      The paper is an impressive methodological progress and breakthrough, but it simply does not "demonstrate that axial stretch along the cilium is the adequate stimulus or the key mechanical input that activates mechano-electrical transduction" as the authors write at the start of their discussion.

      We will rephrase to clarify that stretching along the “scolopidial axis”, not “along the ciliary axis”, is the adequate stimulus. We cannot yet verify how this translates to forces acting on the cilium, hence the four speculative hypotheses. We will rewrite the discussion to make clear that our account of forces and displacements at the level of the cilium is an interpretation.

      They do show that axial stretch dominates for the neurons they looked at, which is important information. The same applies to the end of the discussion: The authors write, "This relative motion within the organ then drives an axial stretch of the scolopidium, which in turn evokes the mechano-electrical transduction current." Reading the manuscript, the certainty and display of confidence are not substantiated by the data provided. But they are also not necessary. The study has paved the road to answer these questions. Instead, the authors are encouraged to make suggestions on how the remaining uncertainties could be removed (and what experiments or model might be used).

      We will moderate our conclusion in the discussion, but we are confident that our experimental repeats and statistical tests support our conclusion that stretching of the scolopidium evokes the largest transduction current responses (although not at the level of the cilium). As mentioned previously, we will include a section in the discussion on the best way to test the hypotheses arising from this work.

    1. eLife Assessment

      This study provides new and interesting findings that SCoR2 acts as a denitrosylase to control cardioprotective metabolic reprogramming and prevent injury following ischemia/reperfusion. The compelling evidence is supported by a novel multi-omics approach, but questions remain regarding the stability and human relevance of BDH1 as well as the sufficiency of SCoR2. Overall, the work will be of interest to cardiovascular researchers and provides valuable information to the field, though some mechanistic aspects require further clarification.

    2. Reviewer #1 (Public review):

      Summary:

      This study shows a novel role for SCoR2 in regulating metabolic pathways in the heart to prevent injury following ischemia/reperfusion. It combines a new multi-omics method to determine SCoR2 mediated metabolic pathways in the heart. This paper would be of interest to cardiovascular researchers working on cardioprotective strategies following ischemic injury in the heart.

      Strengths:

      (1) Use of SCoR2KO mice subjected to I/R injury.

      (2) Identification of multiple metabolic pathways in the heart by a novel multi-omics approach.

      Comments on revisions:

      The authors have addressed all concerns raised in the previous round of review. Substantial modifications have been made in response to those concerns. There are no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript addresses the gap in knowledge related to the cardiac function of the S-denitrosylase SNO-CoA Reductase 2 (SCoR2; product of the Akr1a1 gene). Genetic variants in SCoR2 have been linked to cardiovascular disease, yet its exact role in the heart remains unclear. This paper demonstrates that mice deficient in SCoR2 show significant protection in a myocardial infarction (MI) model. SCoR2 influenced ketolytic energy production, antioxidant levels, and polyol balance through the S-nitrosylation of crucial metabolic regulators.

      Strengths:

      Addresses a well-defined gap in knowledge related to the cardiac function of SNO-CoA Reductase 2. Besides the in-depth case for this specific player, the manuscript sheds more light on the links between S-nitrosylation and metabolic reprogramming in the heart.

      Rigorous proof of requirement through the combination of gene knockout and in vivo myocardial ischemia/reperfusion

      Identification of precise Cys residue for SNO-modification of BDH1 as SCoR2 target in cardiac ketolysis

      Weaknesses:

      The experiments with BDH1 stability were performed in mutant 293 cells. Was there a difference in BDH1 stability in myocardial tissue or primary cardiomyocytes from SCoR2-null vs -WT mice? Same question extends to PKM2.

      In the absence of tracing experiments, the cross-sectional changes in ketolysis, glycolysis or polyol intermediates presented in Figures 4 and 5 are suggestive at best. This needs to be stressed while describing and interpreting these results.

      The findings from human samples with ischemic and non-ischemic cardiomyopathy do not seem immediately or linearly in line with each other and with the model proposed from the KO mice. While the correlation holds up in the non-ischemic cardiomyopathy (increased SNO-BDH1, SNO-PKM2 with decreased SCoR2 expression), how do the Authors explain the decreased SNO-BDH1 with preserved SCoR2 expression in ischemic cardiomyopathy? This seems counterintuitive as activation of ketolysis is a quite established myocardial response to the ischemic stress. It may help the overall message clarity to focus the human data part on only NICM patients.

      (Partially linked to the point above.) An important proof that is lacking at present is the proof of sufficiency for SCoR2 in S-nitrosylation of targets and cardiac remodeling. Does SCoR2 overexpression in the heart or isolated cardiomyocytes reduce S-nitrosylation of BDH1 and other targets, undermining heart function at baseline or under stress?

      Comments on revisions:

      Some of my points have been addressed. However, the points related to 1) BDH1 stability effect in cardiomyocytes; 2) human relevance of SNO-BDH1; 3) SCoR2 sufficiency remain unclear. That said, this manuscript will provide useful information to the field as such.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript demonstrates that mice lacking the denitrosylase enzyme SCoR2/AKR1A1 demonstrate a robust cardioprotection resulting from reprogramming of multiple metabolic pathways, revealing widespread, coordinated metabolic regulation by SCoR2.

      Strengths:

      The extensive experimental evidence provided and the use of the knockout model.

      Weaknesses:

      No direct evidence for the underlying mechanism.

      The mouse model used is not a tissue-specific knock-out.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study shows a novel role for SCoR2 in regulating metabolic pathways in the heart to prevent injury following ischemia/reperfusion. It combines a new multi-omics method to determine SCoR2 mediated metabolic pathways in the heart. This paper would be of interest to cardiovascular researchers working on cardioprotective strategies following ischemic injury in the heart. 

      Strengths:

      (1) Use of SCoR2KO mice subjected to I/R injury. 

      (2) Identification of multiple metabolic pathways in the heart by a novel multi-omics approach.

      We thank the Reviewer for the positive review of our manuscript.

      Weaknesses:

      (1) Use of global SCoR2KO mice is a limitation, since the effects in the heart may reflect the global loss of SCoR2. 

      (2) Lack of a cell type specific effect. 

      We agree that global KOs limit the cell type-specific mechanistic conclusions that can be drawn. Global knockouts are nonetheless informative in their own right and serve to identify phenotypes worthy of further study.

      Reviewer #2 (Public review):

      Summary: 

      This manuscript addresses the gap in knowledge related to the cardiac function of the S-denitrosylase SNO-CoA Reductase 2 (SCoR2; product of the Akr1a1 gene). Genetic variants in SCoR2 have been linked to cardiovascular disease, yet their exact role in the heart remains unclear. This paper demonstrates that mice deficient in SCoR2 show significant protection in a myocardial infarction (MI) model. SCoR2 influenced ketolytic energy production, antioxidant levels, and polyol balance through the S-nitrosylation of crucial metabolic regulators. 

      Strengths: 

      (1) Addresses a well-defined gap in knowledge related to the cardiac function of SNO-CoA Reductase 2. Besides the in-depth case for this specific player, the manuscript sheds more light on the links between S-nitrosylation and metabolic reprogramming in the heart.

      (2) Rigorous proof of requirement through the combination of gene knockout and in vivo myocardial ischemia/reperfusion. 

      (3) Identification of precise Cys residue for SNO-modification of BDH1 as SCoR2 target in cardiac ketolysis 

      We thank the Reviewer for their kind words.

      Weaknesses: 

      (1) The experiments with BDH1 stability were performed in mutant 293 cells. Was there a difference in BDH1 stability in myocardial tissue or primary cardiomyocytes from SCoR2-null vs -WT mice? The same question extends to PKM2. 

      We have not assessed BDH1 stability directly in cardiomyocytes. However, S-nitrosylation increased BDH1 stability in HEK293 cells, and BDH1 expression was increased in (injured) hearts of SCoR2KO mice, together with increased SNO-BDH1. 

      For PKM2, there is a wealth of published evidence from us and others that S-nitrosylation does not regulate protein stability but rather inhibits tetramerization required for full activity.  

      (2) In the absence of tracing experiments, the cross-sectional changes in ketolysis, glycolysis, or polyol intermediates presented in Figures 4 and 5 are suggestive at best. This needs to be stressed while describing and interpreting these results. 

      We now acknowledge this limitation in the ‘Limitations’ section of the manuscript and in edits made to the text. 

      (3) The findings from human samples with ischemic and non-ischemic cardiomyopathy do not seem immediately or linearly in line with each other and with the model proposed from the KO mice. While the correlation holds up in the non-ischemic cardiomyopathy (increased SNO-BDH1, SNO-PKM2 with decreased SCoR2 expression), how do the authors explain the decreased SNO-BDH1 with preserved SCoR2 expression in ischemic cardiomyopathy? This seems counterintuitive as activation of ketolysis is a quite established myocardial response to ischemic stress. It may help the overall message clarity to focus the human data part on only NICM patients. 

      We find it interesting and important that SNO-BDH1 is readily detected in human heart tissue and its level is correlated to disease state. Our findings suggest conservation of this mechanism in human heart failure. However, we caution against drawing further conclusions related to NICM or ICM. Our animal model (based on a single time point) cannot faithfully recapitulate patients with chronic heart disease or differences between NICM and ICM. 

      (4) This is partially linked to the point above. An important proof that is lacking at present is the proof of sufficiency for SCoR2 in S-nitrosylation of targets and cardiac remodeling. Does SCoR2 overexpression in the heart or isolated cardiomyocytes reduce S-nitrosylation of BDH1 and other targets, undermining heart function at baseline or under stress? 

      The Reviewer proposes to test the effect of SCoR2 overexpression on cardioprotection. This is an interesting experiment for future study, with the following caveats. First, it presupposes that native expression of SCoR2 is insufficient to control basal steady-state S-nitrosylation of BDH1 and PKM2 (this does not seem to be the case). Second, overexpressed SCoR2 may be mislocalized within cells or associated with unnatural targets. Thank you.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript demonstrates that mice lacking the denitrosylase enzyme SCoR2/AKR1A1 demonstrate a robust cardioprotection resulting from reprogramming of multiple metabolic pathways, revealing widespread, coordinated metabolic regulation by SCoR2. 

      Strengths: 

      (1) The extensive experimental evidence. 

      (2) The use of the knockout model. 

      We thank the Reviewer for identifying strengths in our work.

      Weaknesses: 

      (1) The connection of direct evidence for the mechanism. 

      We believe we have identified a novel mechanism for cardioprotection entailing coordinate reprogramming of multiple metabolic pathways and suggesting a widescale role for SCoR2 in metabolic regulation. This is the key message we convey. While genetic dissection of individual pathways may be worthwhile, these investigations will have their own limitations. 

      (2) The mouse model used is not tissue-specific. 

      Please see our response to Reviewer 1, above. 

      Reviewer #1 (Recommendations for the authors):

      In the study titled "The denitrosylase SCoR2 controls cardioprotective metabolic reprogramming", Grimmett ZW et al. describe a role for SNO-CoA Reductase 2 (SCoR2) in promoting cardioprotection via metabolic reprogramming in the heart after I/R injury. The authors show that loss of SCoR2 coordinates multiple metabolic pathways to limit infarct size. Overall, the hypothesis is interesting; however, there are some limitations, as described below: 

      (1) It is unclear whether the SCoR2 knockout mice are global or cardiomyocyte-specific. 

      We apologize for any confusion. These are global SCoR2<sup>-/-</sup> mice. This is now stated in the Results when first identifying the strain, as well as in the Methods.  

      (2) Can the authors clarify how divergent metabolic pathways such as ketone oxidation, glycolysis, the PPP, and polyol metabolism work downstream of SCoR2 to confer cardioprotection in mice subjected to I/R? 

      The metabolic pathways of ketone oxidation, glycolysis, PPP and polyols appear to converge to support ischemic cardioprotection in SCoR2<sup>-/-</sup> mice, as depicted in the model shown in Fig. 5L. Subsequent to SNO-PKM2 blockade of flux through glycolysis (detailed in this manuscript and in Zhou et al, 2019, PMID: 30487609, as well as by others), substrates of ketolysis and glycolysis are funneled into the PPP, producing the antioxidant NADPH and energy precursor phosphocreatine, which are well-known to be cardioprotective. This occurs more readily in SCoR2<sup>-/-</sup> mice due to elevated SNO-BDH1 (detailed in this manuscript). 

      Polyols, thought to be products of the PPP carbohydrate intermediates arabinose, ribulose, xylulose (among others), have recently been shown to be harmful to cardiovascular health in humans. These polyols are uniformly downregulated in SCoR2<sup>-/-</sup> mice. We suggest this is likely the result of S-nitrosylation of SCoR2-substrate enzymes that form polyols (SCoR2/Akr1a1 is unable to directly reduce carbohydrates to their corresponding polyols). Regulation of endogenous polyol production in humans is a new concept and the mechanisms whereby these compounds increase risk of cardiac events are a subject of active investigation. This is detailed in the final paragraph of both the Results and Discussion sections, and in Fig. 5L. 

      (3) The only functional outcomes of SCoR2 loss assessed are echocardiography and measurements of apoptosis. However, it would be important to determine whether the cardioprotective effect persists. Cardiac function appears to have been recorded 24 hours post-injury, and whether the benefit remains at later time points, such as 2 or 4 weeks, is not shown. Without these time points, loss of SCoR2 can only be said to produce an acute improvement in function. 

      Loss of SCoR2 reduced post-MI mortality at 4 hr; cardiac functional changes (plus troponin, LDH, and apoptosis) were studied in surviving animals at 24 hr post-MI. Cardiac response to acute injury and to chronic injury (weeks post-MI) are not the same metabolically. This is well elucidated in the literature and exemplified by the role of PKM2, which is protective in the chronic response to MI (28 days post-MI; PMID: 32078387), but implicated in injury at shorter timepoints post-MI (PMID: 33288902, 28964797). All that said, functional changes at 2-4 weeks will be important to determine in the future, as the Reviewer indicates. 

      Reviewer #2 (Recommendations for the authors): 

      (1) The last paragraph of the Results section should be divided into the statement related to Table S2 in the Results section, and the rest of the paragraph should be put somewhere in the Discussion. 

      Thank you for this suggestion, which we have taken. 

      (2) The number of mice alive/dead should be reported in the histogram in Figure 1G. 

      Done.

      (3) A concise Graphical Abstract will be useful to grasp the overall logic and message of the manuscript from the beginning. 

      We thank you for this suggestion and have added a graphical abstract to the manuscript.

      Reviewer #3 (Recommendations for the authors): 

      I would suggest having more evidence on the effect of metabolic reprogramming on which cell type. The use of a global knockout is a major limitation, and probably some in vitro experiments with shRNA knockdown in endothelial cells and fibroblasts would provide more insights. 

      The reviewer suggests one direction for future study. We identify a novel mechanism for cardioprotection entailing coordinate reprogramming of multiple metabolic pathways and suggesting a widescale role for SCoR2 in metabolic regulation. This is the message we wish to convey. The role of cardiomyocytes vs contributing cell types is a thoughtful direction for future study. Thank you. 

      Editor's additional comment:

      The editors wish to highlight a critical issue concerning the characterization of the SCoR2−/− mice employed in this study. 

      In the Methods section (page 20), the manuscript states that "SCoR2+/− mice were made by Deltagen, Inc. as described previously (33)." However, reference 33 does not describe SCoR2−/− mice; instead, it refers to other genetically modified strains, including Akr1a1+/−, eNOS−/−, and PKM2−/− mice, with no mention of a SCoR2-targeted model. 

      The editors fully acknowledge that the authors may be using the term "SCoR2" as a functional synonym for Akr1a1, based on its described role as a mammalian homologue of yeast SCoR. If this is the case, such equivalence should be explicitly stated in the manuscript to prevent potential confusion. Moreover, considering that the genetic deletion of Akr1a1 (i.e., SCoR2) underlies the key mechanistic findings presented, it is essential that the manuscript include a clear and comprehensive description of the generation and validation of the mouse model used. 

      We therefore ask the authors to (1) clarify the nomenclature and relationship between "SCoR2" and Akr1a1, and (2) provide full details on the generation of the knockout mice, including the targeting strategy and the genotyping procedures. This information is necessary not only to ensure transparency and reproducibility but also to allow readers to fully appreciate the biological relevance of the findings.

      Thank you for identifying this inconsistency. We have adjusted the manuscript text accordingly to clearly state that SCoR2 is a functional name for the product of the Akr1a1 gene and that these SCoR2<sup>-/-</sup> mice are the same as Akr1a1<sup>-/-</sup> mice described in Ref 33. We have augmented the Methods text to describe the generation and genotyping of these SCoR2/Akr1a1 knockout mice.

    1. eLife Assessment

      Using high-throughput small-molecule screening, this study discloses novel modulators of the mitochondrial transcription factor A (TFAM), a key regulator of mitochondrial function. Reviewers viewed the targeting of TFAM as innovative and the study's conclusions as potentially important (especially the effects on inflammation). However, the lack of evidence for a direct effect of the compounds on TFAM activity weakens the paper's key conclusion and renders the study incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors identify small-molecule compounds modulating the stability of the mitochondrial transcription factor A (TFAM) using a high-throughput CETSA screen and subsequent secondary assays. The identified compounds increased the protein levels of TFAM without affecting its RNA levels and led to an increase in mtDNA levels. As a read-out for dose-dependent action of the identified compounds, the authors investigated cGAS-STING and ISG activation in cellular inflammation models in the presence or absence of their compounds. The addition of TFAM modulators led to a decrease in cGAS-STING/ISG activation and decreased mtDNA release. Furthermore, beneficial effects could be determined in models of mtDNA disease (rescue of ATP rates), sclerotic fibroblasts (decreased fibrosis), and regulatory T cells (decreased activation of effector T cells). The study thus proposes novel first-in-class regulators of TFAM as a therapeutic option in conditions of mitochondrial dysfunction.

      Strengths:

      The authors identified TFAM as a promising target in conditions of mitochondrial dysfunction, as it is a key regulator of mitochondrial function, serving both as a transcription and packaging factor of mtDNA. Importantly, TFAM is a key regulator of mtDNA copy number, and a moderate increase in TFAM/mtDNA levels has been shown to be beneficial in a number of pathological conditions. Furthermore, mtDNA release leading to activation of inflammatory responses has been linked to a variety of pathological conditions in the last decade. Thus, the identification of small molecule modulators of TFAM that have the potential to increase mtDNA copy number and decrease inflammatory signaling is of great importance. Furthermore, the authors highlight potential applications in the field of mitochondrial disease, fibrosis, and autoimmune disease.

      Weaknesses:

      The central weakness of the study is the fact that the authors propose compounds as modulators or even activators of TFAM without sufficiently proving a direct effect on TFAM itself. There are no data indicating a direct effect on TFAM activity (e.g., mtDNA transcription, replication, packaging), and it is not sufficiently ruled out that other proteins (e.g., LONP1) mediate the effect. Additionally, important information on the performed screen is not provided. Thus, the data presented is currently incomplete to support the described findings. Furthermore, the introduction and discussion are lacking key references.

    3. Reviewer #2 (Public review):

      Summary:

      The present paper aims to identify small molecules that could affect mitochondrial DNA (mtDNA) stability, limiting cytosolic mtDNA abundance and activation of interferon signaling. The authors developed a high-throughput screen incorporating HiBiT technology to identify compounds affecting the content of mitochondrial transcription factor A (TFAM), a protein known to impact mtDNA stability. Cells were subsequently exposed to the hit compounds to investigate their impact on TNFα-stimulated interferon signaling, a process activated by cytosolic mtDNA. Compound 2, an arylsulfonamide analog, was highlighted as a possible TFAM activator and emphasized as a small molecule that could stabilize mtDNA and prevent stress-induced interferon signaling.

      Strengths:

      Identifying compounds that positively affect mitochondrial biology has diverse implications. The combination of high-throughput screening and assay development to connect identified compounds with cellular interferon signalling events is a strength of the current approach, and the authors should be commended for identifying compounds that broadly impact interferon signalling. The authors have incorporated diverse measurements, including TFAM content, mtDNA content, interferon signaling, and ATP content, as well as verified the necessity of TFAM in mediating the beneficial effects of the emphasized small molecule (Compound 2).

      Weaknesses:

      (1) While the identified compound clearly works through TFAM, Compound 2 was identified as an arylsulfonamide, which would be expected to affect voltage-gated sodium channels (e.g. PMID: 31316182). Alterations in cellular sodium content and membrane polarization could affect metabolism to indirectly influence mtDNA and TFAM content. It remains unclear if this compound directly or indirectly affects TFAM content, especially as the authors have utilized various cancer cell lines, which could have aberrant sodium channels.

      (2) TFAM is nuclear encoded - if this compound directly functions to 'activate TFAM', why/how would TFAM content increase independent of nuclear transcription?

      (3) While a listed strength is the incorporation of diverse readouts, this is also a weakness, as there is a lack of consistency between approaches. For instance, data is not provided to show compound 2 increases TFAM or mtDNA content following TNFα stimulation, and extrapolating between cell lines may not be appropriate. The authors are encouraged to directly report TFAM and mtDNA for target compounds 2 and 15 to support their data reported in Figure 2. Ideally, the authors would also report for compound 1 as a control.

      (4) While the authors indicate that compound 11 displayed the strongest effect on ISRE activity, it does not appear to be identified in Figure 1B as a compound affecting TFAM content. Can the authors label the individual compounds in Figure 1B to better highlight the relationship between each compound and TFAM content?

      (5) The authors suggest Compound 2 increases cellular ATP - but they are encouraged to normalize luminescence to cellular protein and OXPHOS content to better interpret this data. Additionally, the authors are encouraged to report cellular ATP content following TNFα stimulation/stress (the key emphasis of the present data) and test compound 11, which the authors have implicated as a more sensitive compound.

      The discussion is really a perspective, theorizing the diverse implications of small molecule activation of TFAM. The authors are encouraged to provide a balanced discussion, including a critical evaluation of their own work, including an acknowledgement that evidence is not provided that Compound 2 directly activates TFAM or decreases mtDNA cytosolic leakage.

    1. eLife Assessment

      This study presents a useful inventory of genes that are up- and down-regulated in the mouse small intestine (duodenum and ileum) during the first postnatal month; the data were collected and analyzed using solid and validated methodology and can be used as a starting point for additional validation of specific markers and for follow-up functional studies. Some aspects of the study were incomplete, with claims being only partially supported by the data, and it is suggested that additional validation be performed. The authors attempted to correlate gene expression changes with periods of high and low NEC susceptibility, but these correlations are speculative and not supported by functional follow-up studies. Links between gene expression changes and NEC susceptibility would be more appropriately addressed in the Discussion section and should be tempered in the Results section.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors aimed to clarify the transcriptional changes across murine postnatal small intestinal development (0 days to 1 month) in both the duodenum and ileum, a period that shows morphological similarity to the human fetal intestine at 20-30 weeks of gestation. This is an especially critical stage in human intestinal development, as necrotizing enterocolitis (NEC) usually manifests during these stages.

      Strengths:

      The authors assessed numerous timepoints between 0 days and 1 month in the postnatal mouse duodenum and ileum using bulk RNA transcriptomics of bulk-isolated tissues. Cellular deconvolution, based on relative marker expression, was used to clarify immune cell proportions in the bulk RNA sequencing data. They confirmed some transcriptional targets in vivo, primarily in mouse via qRT-PCR and immunohistochemistry, but also in human fetal tissues and isolated organoids, and these validation data are of decent quality.

      Weaknesses:

      The overall weakness of this study, as mentioned by the authors themselves, is that the bulk transcriptomic data generated for the study were isolated from non-fractionated bulk intestinal tissue. This makes it difficult to interpret much of this data regarding cellular fractions found across developmental time. It is difficult to rationalize the approach here, as even isolation protocols of epithelial-only or mesenchyme-only tissues for bulk RNA sequencing are well established. The authors address some of these concerns using cellular deconvolution for immune cell populations, which I think might be helpful if they expanded this analysis to other cell types (mesenchyme, endothelium, glia). However, I would assume that bulk isolations across developmental time are going to be influenced primarily by the bulk of tissue-type found at each time point - primarily epithelium. But this is also confirmed by the immune transcripts becoming more apparent later in their time series, as this system becomes more established during weaning. This study might also be strengthened by comparison with data that is publicly available for early fetal stage development in humans. Comparisons between the duodenum and ileum could be strengthened by what we already know from adult data, from both epithelial- and mesenchyme-isolated fractions. The rationale of using the postnatal mouse as a comparison to NEC is also a little unclear- perhaps some of the developmental processes are similar, however, the environments are completely different. For example, even in early postnatal mouse development, you would find microbial activity and milk.

    3. Reviewer #2 (Public review):

      Summary:

      This work presents a valuable resource by generating a comprehensive bulk RNA sequencing catalogue of gene expression in the mouse duodenum and ileum during the first postnatal month. The central findings of this work are based on an analysis of this dataset. Specifically, the authors characterized molecular shifts that occur as the intestine matures from an immature to an adult-like state, investigating both temporal changes and regional differences between the proximal and distal small intestine. A key objective was to identify gene expression patterns relevant to understanding the region-specific susceptibility and resistance to necrotizing enterocolitis (NEC) observed in humans during the postnatal period. They also sought to validate key findings through complementary methods and to provide comparative context with human intestinal samples. This study will provide a solid reference dataset for the community of researchers studying postnatal gastrointestinal development and diseases that arise during these stages. However, the study lacks functional validation of the interpretations.

      Strengths:

      (1) The inclusion of numerous time points (day 0 through 4 weeks) and comparative analyses throughout the first postnatal month.

      (2) Validation of key interpretations of RNA-seq data by other methods.

      (3) Linking mouse postnatal development to human premature infant development, enhancing its clinical relevance, particularly for NEC research. The inclusion of human intestinal biopsy and organoid data for comparison further strengthens this link.

      (4) The investigation covers a wide array of developmental gene categories with known significance, including epithelial differentiation markers (e.g., Vil1, Muc2, Lyz1), intestinal stem cell markers (e.g., Lgr5, Olfm4, Ascl2), mesenchymal markers (e.g., Pdgfra, Vim), Wnt signaling components (e.g., Wnt3, Wnt5a, Ctnnb1), and various immune genes (e.g., defensins, T cell, B cell, ILC, macrophage markers).

      Weaknesses:

      (1) The primary limitation is that there is no functional validation. The study primarily focuses on the interpretation of RNA expression. This is a common limitation of transcriptomic "atlas" studies, but the functional and mechanistic relevance of these interpretations remains to be determined.

      (2) The data are derived from bulk RNA-Seq of full-thickness intestinal tissue. While this approach helps capture rare cell types and both epithelial and mesenchymal components simultaneously, it does not provide cell-type-specific gene expression profiles, which might obscure important nuances. Future investigations using single-cell sequencing would be a logical follow-up.

      (3) The day 4 samples were omitted due to quality issues, which might have led to missing some dynamic changes, especially given that some ISC genes show dynamic changes around day 6.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses bulk mRNA sequencing to profile transcriptional changes in intestinal cells during the early postnatal period in mice - a developmental window that has received relatively little attention despite its importance. This developmental stage is particularly significant because it parallels late gestation in humans, a time when premature infants are highly vulnerable to necrotizing enterocolitis (NEC). By sampling closely spaced timepoints from birth through postnatal week four, the authors generate a resource that helps define transcriptional trajectories during this phase. Although the primary focus is on murine tissue, the authors also present limited data from human fetal intestinal biopsy samples and organoids. In addition, they discuss potential links between observed gene expression changes and factors that may contribute to NEC.

      Strengths:

      The close temporal sampling in mice offers a detailed view of dynamic transcriptional changes across the first four weeks after birth. The authors leverage these close timepoints to perform hierarchical clustering to define relationships between developmental stages. This is a useful approach, as it highlights when transcriptional states shift most dramatically and allows for functional predictions about classes of genes that vary over time. This high-level analysis provides an effective entry point into the dataset and will be useful for future investigations. The inclusion of human fetal intestinal samples, although limited, is especially notable given the scarcity of data from late fetal timepoints. The authors are generally careful in their presentation of results, acknowledging the limitations of their approach and avoiding over-interpretation. As they note, this dataset is intended as a foundation for their lab and others, with secondary approaches required to more fully explore the biological questions raised.

      Weaknesses:

      One limitation of the study is the use of bulk mRNA sequencing to draw conclusions about individual cell types. It has been documented that few genes are expressed exclusively in a single cell type. For instance, markers such as Lgr5 and Olfm4 are enriched in intestinal stem cells (ISCs), but they are also expressed at lower levels in other lineages and in differentiating cells. Using these markers as proxies for specific cell populations lowers confidence in the conclusions, particularly without complementary validation to confirm cell type-specific dynamics.

      Validation of the sequencing data was itself limited, relying primarily on qPCR, which measures expression in the same modality (mRNA) rather than providing orthogonal support. It is unclear how the authors selected the subset of genes for validation; many key genes highlighted in the sequencing data were not assessed. Moreover, the regional differences reported in Lgr5, Olfm4, and Ascl2, which appear much higher in proximal samples than in distal ones, were not recapitulated by qPCR validation of Olfm4, and this discrepancy was not addressed. Resolving such inconsistencies will be important for interpreting the dataset.

      The basis for linking particular gene sets to NEC susceptibility rests largely on their spatial restriction to the distal intestine and their temporal regulation between early (day 0-14) and later (weeks 3-4) developmental stages. While this is a reasonable approach for generating hypotheses, the correlations have limited interpretive power without experimental validation, which is not provided here. Many factors beyond NEC may drive regional and temporal differences in intestinal development.

      Finally, the contribution of human fetal biopsy samples is minimal. The central figure presenting these data (Figure 4A) shows immunofluorescence for LGR5, a single stem cell marker. The staining at day 35 is not convincing, and the conclusions that can be drawn are limited to confirming the localization of LGR5-positive cells to crypts as early as 26 weeks.

    1. eLife Assessment

      This valuable study examined the roles of the posterior parietal cortex in rats performing an auditory change-detection decision task. It provided solid evidence for two subpopulations with opposing modulation patterns during decision formation and for a correspondence between neural and behavioral measures of the short timescale used for evidence evaluation.

    2. Joint Public Review:

      In this study, the authors sought to characterize the relationship between the timescales of evidence integration in an auditory change detection task and neural activity dynamics in the rat posterior parietal cortex (PPC), an area that has been implicated in the accumulation of sensory evidence. Using state-of-the-art Neuropixels recordings, they identified two subpopulations of neurons whose firing rates were positively and negatively modulated by auditory clicks. The timescale of the click-related response was similar to the behaviorally measured timescale for evidence evaluation. The click-related response of positively modulated neurons also depended on when the clicks were presented, which the authors hypothesized to reflect a time-dependent gain change that implements an urgency signal. Using muscimol injections to inactivate the PPC, they showed that PPC inactivation affected the rats' choices and reaction times.

      There are several strengths of this study, including:

      (1) Compelling evidence for short temporal integration in behavioral and neural data for this task.

      (2) Well-executed and interpretable comparisons of psychophysical reverse correlation with single-trial, click-triggered neuronal analyses to relate behavior and neural activity.

      (3) Inactivation experiments to test for causality.

      (4) Characterization of neural subpopulations that allows for complex relationships between a brain region and behavior.

      (5) Experimental evidence for an interesting way to use sensory gain change to implement urgency signals.

      There are also some concerns, including:

      (1) The work could be better contextualized. From a normative Bayesian perspective, the observed adaptation of timescales and gain aligns closely with optimal strategies for change detection in noisy streams: placing greater weight on recent sensory samples and lowering evidence requirements as decision urgency grows. However, the manuscript could go further in explicitly connecting the experimental findings to normative models, such as leaky accumulator or dynamic belief-updating frameworks. This would strengthen the broader impact of the work by making clear how the observed PPC dynamics instantiate computationally optimal strategies.

      (2) It is unclear how the rats are performing the task, both in terms of the quality of performance (they only show hit rates, but the rats also seem to have high false alarm rates), and in terms of the underlying strategy that they seem to be using.

      (3) A major conceptual weakness lies in the claim that PPC "dynamically modulates evidence evaluation in a time-adaptive manner to suit the behavioral demands of a free-response change detection task." To support this claim, it would require direct comparison of neural activity between two task demands, either in two tasks or in one task with manipulations that promote the adoption of different timescales.

      (4) Some analyses of neural data are lacking or seem incomplete, without considering alternative interpretations.

      (5) The muscimol inactivation results did not provide a clear interpretation about the link between PPC activity and decision performance.

    1. eLife Assessment

      This study presents valuable findings regarding a rare mode of reproduction called hybridogenesis in a species pair of frogs. While parts of the study provide solid support for the claim of hybridogenesis, other parts are incomplete, with certain claims being only partially supported, as alternative modes of reproduction cannot be fully ruled out.

    2. Reviewer #1 (Public review):

      Summary:

      (1) Introduction: Hybridogenesis involves one genome being clonally transmitted while the other is replaced by backcrossing. It results in high heterozygosity and balanced ancestry proportions in hybrids. Distinguishing it from other hybrid systems requires a combination of nuclear, mitochondrial, and population-genetic evidence. Hybridogenesis has been identified in only a few taxa (e.g., some fish, frogs, and stick insects), but no new cases have been reported in over a decade. Advances in high-throughput sequencing now allow for the detection of high individual heterozygosity, which can indicate hybridization, but it is difficult to distinguish hybridogenesis from other similar asexual systems based solely on genome-wide data. To differentiate these systems, researchers look at several key indicators: the presence of pure-species offspring from hybrids (possible only in hybridogenesis); the sex ratio (males are present in hybridogenetic systems); nuclear and mitochondrial haplotype sharing with co-distributed parental species; and geographic distribution patterns, especially the lack of both parental species in hybrid populations.

      (2) What the authors were trying to achieve: The paper studies two Quasipaa frogs, Q. robertingeri (narrowly endemic) and Q. boulengeri (widespread), which are morphologically similar and found sympatrically in parts of China. Preliminary RAD-seq data revealed bimodal heterozygosity in Q. boulengeri samples. Some individuals had extremely high heterozygosity, consistent across loci and suggestive of F1 hybrids. These high-heterozygosity individuals carried one haplotype from each species. The study investigates the high heterozygosity observed in Quasipaa frogs, particularly in individuals morphologically resembling Q. boulengeri but genetically appearing to be F1 hybrids with Q. robertingeri. The goal is to determine whether these patterns are consistent with hybridogenesis rather than other atypical reproductive modes. The authors also suggest the hypothesis that hybridogenesis could enable range expansion of an endemic species through hybridization with a widespread relative.

      (3) Methods: A total of 107 individuals from 53 localities were collected for the study. This sample included 58 sexed adults (27 males and 31 females), with the remaining samples consisting mostly of tadpoles. Of these individuals, 31 had previously determined karyotypes. DNA was extracted and sequenced. Individual heterozygosity and ancestry were estimated using bioinformatics tools. F1 hybrids were compared to one of the parental species to examine patterns of fixed heterozygous loci. Mitochondrial DNA sequences were also extracted from the sequencing data, and phylogenetic trees were constructed.

      (4) Results: Two groups of individuals were detected based on heterozygosity. One group exhibited high heterozygosity and consisted of F1 hybrids, while the other showed low heterozygosity, representing pure-species types. The F1 hybrids showed approximately equal ancestry from Q. robertingeri and Q. boulengeri, consistently maintaining a high proportion of heterozygous loci at around 16.7%. In contrast, pure individuals had much lower heterozygosity, approximately 2.9%. F1 hybrids were found at 21 different sites and included both males and females. The presence of numerous fixed heterozygous loci in F1 hybrids confirmed their hybrid origin, and these loci were absent in pure Q. boulengeri samples. F1 individuals typically carried one haplotype from each parental species. There was minimal haplotype sharing between the two pure species, but extensive sharing was observed between F1 hybrids and co-occurring pure-species individuals. In fact, F1 types shared haplotypes with local Q. boulengeri in over 90% of cases, which supports the occurrence of local backcrossing and parental contribution. In terms of mitochondrial DNA, F1 hybrids possessed mitochondrial haplotypes that clustered with Q. boulengeri and often shared these haplotypes directly. Genetic structure and phylogenetic analyses revealed three distinct genetic clusters corresponding to F1 hybrids, Q. boulengeri, and Q. robertingeri, with the F1 hybrids positioned intermediate between the two pure species. Neighbor-joining trees and TreeMix analyses confirmed a strong separation between pure-species types, with F1 hybrids clustering alongside local Q. boulengeri subpopulations, indicating local formation of hybrids.

      (5) Discussion: In summary, the study reveals hybridogenesis (a reproductive system in which hybrids clonally transmit one parental genome) in Quasipaa boulengeri and Q. robertingeri. Hybrids show high genetic heterozygosity and coexist with the parental species, ruling out other reproductive modes such as parthenogenesis or kleptogenesis. The evidence suggests that hybridogenesis enables Q. robertingeri genomes to appear far outside their normal range, possibly aiding range expansion. Chromosomal abnormalities are linked to the hybrids, supporting clonal genome transmission. The genetic divergence between the parental species fits patterns seen in other hybridogenetic systems, highlighting a unique, understudied case in East Asia.

      Strengths:

      Overall, the authors carefully interpret their genetic data to support hybridogenesis as the reproductive mode in this system and propose that this mechanism may aid range expansion. They also appropriately acknowledge the need for further cytogenetic and ecological studies, demonstrating scientific caution. In summary, the discussion reasonably follows from the results, offering cautious interpretation where necessary.

      Weaknesses:

      Direct reproductive or cytological evidence is still lacking. While alternative reproductive modes are discussed and mostly ruled out logically, some require further empirical testing. The authors maintain a cautious interpretation, appropriately suggesting further research. Some outstanding questions remain.

      (1) The elevated heterozygosity and presence of fixed heterozygous loci in hybrids compared to parental species strongly indicate hybridogenesis. However, alternative explanations such as repeated F1 hybridization or some form of balanced polymorphism, while less likely, are not fully excluded.

      (2) The coexistence of hybrids and parental species, along with high nuclear and mitochondrial haplotype sharing between hybrids and Q. boulengeri, argues against reproductive modes like parthenogenesis, gynogenesis, or kleptogenesis. However, the assumption that hybrid sterility or multiple local hybrid origins are unlikely could be challenged if undetected local variation or cryptic reproductive strategies exist.

      (3) The presence of Q. robertingeri nuclear genomes far outside their known geographic range, genetically linked to nearby populations, fits a model of hybridogenesis-mediated dispersal. Although the authors dismiss human-mediated or accidental transport as explanations, these scenarios are not necessarily unlikely.

    3. Reviewer #2 (Public review):

      This study describes F1 hybrid frog lineages that use an "unusual" form of reproduction, perhaps hybridogenesis. Identifying such species is important for understanding the biodiversity of reproduction in animals, and animals that do not reproduce via "canonical" sex can be useful model systems in ecology and evolution. The conclusions of the study are based on reduced-representation sequencing (RAD-seq with a de novo assembly of loci) of 107 wild-caught individuals from 53 localities (plus 4 outgroup individuals), including 27 males, 31 females, and 49 juveniles of unknown sex. Conclusive inferences of unusual forms of reproduction typically require breeding studies and parent-offspring genotype comparisons, but such information is not available (and perhaps impossible to generate) for the focal frog lineages.

      (1) Conclusion 1: there are two pure species and F1 hybrids

      The authors infer that there are two lineages, RR and BB (corresponding to two named species), and F1 interspecific hybrids, RB. This inference is based on the results presented in Figure 1 (PCA, admixture, and heterozygosity analyses) as well as analyses of fixed SNP differences between R and B. I think that this conclusion is well supported; my only comment on this part is that it would be useful to have the admixture plots and cross-validation for the 107 samples with other k values (not only k=2) as a supplemental figure. The plots in supplemental file S1 are for the subset of 55 individuals inferred to be BB only.

      (2) Conclusion 2: F1 hybrids most likely reproduce via hybridogenesis

      This conclusion is based on the sex ratio of hybrids and on haplotype sharing between species and lineages at different ~150-bp loci. Parthenogenesis (including sperm-dependent parthenogenesis) is unlikely to generate males, yet the sexed F1 hybrid individuals include 18 females and 10 males, which prompts the exclusion of parthenogenesis in the present paper. Specific haplotype-sharing patterns are also discussed in the study and used as further support, but these arguments (and the related main and supplementary figures) are difficult to read and interpret. To clarify the arguments related to haplotype sharing and haplotype diversity, I suggest that the authors phase the R and B haplotypes from all their hybrids by using their pure (RR and BB) individuals as references. The concatenated lineage-specific haplotypes can then be used to reconstruct a single phylogenetic tree for all loci (easier to visualize and interpret than the separate haplotype networks for the loci). The authors can then draw cartoon phylogenies showing the expected pattern of haplotype clustering and diversity under different reproductive modes, and discuss their observed phylogenies in this light. Similarly, the migration weights (represented in Figure 4) can then also be computed for separate haplotypes in the hybrids.

      However, independently of the outcome of the phasing, it is important to note that there is no a priori reason why all F1 hybrid individuals would reproduce via the same reproductive mode. Notably, work by Barbara Mantovani and Valerio Scali on stick insects has shown that different F1 hybrid lineages involving the same parental species reproduce via hybridogenesis or parthenogenesis. I don't see how the presented data can allow excluding that some F1 hybrid frogs are parthenogenetic while others are hybridogenetic for example.

      (3) Conclusion 3: Crosses between hybridogenetic RB males and hybridogenetic RB females gave rise to a new population of RR individuals outside of the RR species range (this new population would correspond to location 30 from Figure 1).

      It is not entirely clear to me which data this conclusion is based on; I believe it is the combination of the known range of species R (location 30 being outside of it) and the relatively low heterozygosity of RR individuals at location 30.

      However, as the authors point out, the study focuses on an understudied geographic range. Isolated or rare populations of the R species may easily have been overlooked in the past, especially since the R and B species are morphologically difficult to distinguish. Furthermore, an isolated, perhaps vestigial population may well be inbred or have low diversity. It seems most appropriate to discuss different (equally likely) scenarios for the RR population at location 30 rather than implying a hybridogenetic origin of RR individuals. I would also choose a title that does not directly imply this scenario but reflects the solid (not speculative) findings of the study.

    4. Reviewer #3 (Public review):

      Summary:

      This work reports a new case of hybridogenetic reproduction in the frog genus Quasipaa. Only one other example of this peculiar reproductive mode is known in amphibians, and fewer than a dozen across the tree of life. Interestingly, a population of one of the parental species (Q. robertingeri) was found away from the core of its distribution, within the distribution of the hybridogens. This range expansion might have been mediated by hybridogenesis, whereby two copies of the same parental genome came together again after many generations of hybridogenesis.

      Strengths:

      Evidence for hybridogenesis is solid. The state of the art would be to genotype parents and offspring, but other known alternative scenarios have been considered carefully and can be ruled out convincingly. In addition, the authors are very careful in their phrasing and made sure to never overinterpret their data.

      The explicit predictions under different reproductive modes (and Table 1) are a useful resource for future studies and could inspire new findings of unusual reproductive modes in other taxa.

      The sampling is very impressive, with over 50 populations sampled across a very large area.

      The comparison of p-distances between pairs of species involved in hybridogenesis is interesting.

      Weaknesses:

      The current phylogenetic reconstruction with the F1s does not make it possible to infer the number of origins of hybridogenesis, nor whether the population of Q. robertingeri found far from the core of the species' distribution indeed derives from hybridogenesis. This is because some of the signal is driven by the Q. boulengeri haplome, which is replaced every generation and therefore does not reflect the evolutionary history of the lineage.

      All known reproductive modes except hybridogenesis can be excluded, but without genotyping parents and offspring, it is impossible to rule out another, yet undescribed reproductive mode.

    1. eLife Assessment

      This study provides valuable insights into the influence of sex on bile acid metabolism and the risk of hepatocellular carcinoma (HCC). The data supporting inter-relationships between sex, bile acids, and HCC in mice are convincing, although this is a largely descriptive study. Future studies are needed to understand the interaction of sex hormones, bile acids, and chronic liver diseases and cancer at a mechanistic level. Also, there is not enough evidence to determine the clinical significance of the findings, given the differences in bile acid composition between mice and humans.

    2. Reviewer #1 (Public review):

      Liver cancer shows a higher incidence in males than in females, for incompletely understood reasons. This study used a mouse model that lacks bile acid feedback mechanisms (FXR/SHP DKO mice) to study how dysregulated bile acid homeostasis and high circulating bile acid levels may underlie the sex-dependent prevalence and prognosis of HCC. By transcriptomic analysis comparing male and female mice, unique sets of gene signatures were identified and correlated with HCC outcomes in human patients. The study showed that ovariectomy increased HCC incidence in female FXR/SHP DKO mice that were otherwise resistant to age-dependent HCC development, and that removing bile acids by blocking intestinal bile acid absorption reduced HCC progression in FXR/SHP DKO mice. Based on these findings, the authors suggest that sex-dependent bile acid metabolism may play a role in the male-dominant HCC incidence, and that reducing bile acid levels and signaling may be beneficial in HCC treatment.

      This study includes many strengths:

      1. Chronic liver diseases often precede the development of liver and bile duct cancer. Advanced chronic liver diseases are often associated with dysregulated bile acid homeostasis and cholestasis. This study takes advantage of a unique FXR/SHP DKO model, which develops high organ bile acid exposure and spontaneous, age-dependent HCC in males but not females, to identify unique HCC-associated gene signatures. The study showed that the unique gene signature in female DKO mice, which had lower HCC incidence, also correlated with lower-grade HCC and better survival in human HCC patients.

      2. The study also suggests that differentially regulated bile acid signaling or a sex-dependent response to altered bile acids may contribute to sex-dependent susceptibility to HCC development and/or progression.

      3. Sex-dependent differences in bile acid-mediated pathology clearly exist but are still not fully understood at the mechanistic level. Female mice have been shown to be more sensitive to bile acid toxicity in a few cholestasis models, whereas this study showed a male dominance of bile acid promotion of HCC. This study used ovariectomy to demonstrate that female hormones are possible underlying factors. Future studies are needed to understand the interaction of sex hormones, bile acids, and chronic liver diseases and cancer.

    1. eLife Assessment

      This is an important study of critical period plasticity, focused on temperature manipulations, and how different parts of the Drosophila larval motor circuit adapt or maladapt. The work convincingly demonstrates that components of the motor network respond in distinct ways to the heat shock, and the combination of functional, structural, and electrophysiological approaches makes the study of significant interest. The work points to central interneurons as primary drivers of maladaptive changes, while motoneurons and neuromuscular junctions show compensatory or homeostatic adjustments. The study is methodologically rigorous, contributing significant insights into critical period biology using a tractable invertebrate model.

    2. Reviewer #1 (Public review):

      Summary:

      The authors examine the impact of heat stress during an embryonic critical period (CP) in Drosophila, focusing on the larval locomotor network. They show that elevated temperature increases neuronal activity and, when applied during the CP, results in long-term instability of the network, which manifests in prolonged seizure recovery times. At the neuromuscular junction, substantial structural changes occur, including terminal overgrowth and altered receptor composition, yet synaptic transmission remains preserved due to homeostatic regulation. Motoneurons display reduced excitability but receive increased synaptic input from premotor interneurons. These findings suggest that maladaptive instability originates within the central circuitry rather than at the neuromuscular junction, where changes seem to be homeostatically compensated. The study concludes that different network components exhibit distinct and hierarchical responses to CP perturbations, with premotor interneurons setting the tone for downstream adjustments in motoneurons.

      Strengths:

      The work takes advantage of the unique accessibility of the Drosophila system. A major strength of the study is the integration of structural, physiological, and behavioral analyses, which allows the authors to draw a comprehensive picture of how CP perturbations shape the locomotor network. The choice of an ecologically relevant stimulus (heat stress) is particularly convincing, as it links experimental manipulations more closely to natural environmental conditions. The experiments are carefully designed, and the results are robust and consistent with previous findings in the field, while also extending them in new directions.

      Weaknesses:

      The study leaves some uncertainty regarding the experimental design and interpretation. The change from short to prolonged heat shock manipulations raises the possibility that the observed effects may not be confined to the critical period alone; this could be experimentally addressed or simply rephrased in the text. In addition, the maladaptive (seizure recovery) and adaptive/homeostatic phenotypes are not always clearly distinguished or highlighted, which makes it harder to appreciate how the different levels of network plasticity fit together into a single mechanistic framework.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a thoughtful and well-executed study of critical period plasticity in the Drosophila larval motor circuit. The authors examined how transient heat (32 °C) during the embryonic stage altered network properties, showing that A27h premotor interneurons increase excitatory drive onto motoneurons, which respond with reduced excitability. At the NMJ, synaptic terminals expand and GluRIIA distribution shifts, yet synaptic transmission remains largely unaffected. Despite these local compensations, the treated larvae display slower crawling and prolonged recovery from seizures, indicating that the network is functionally compromised.

      Strengths:

      (1) One of the major strengths of this study is the elegant dissection of a defined circuit, tracking changes from premotor interneurons through motoneurons to the NMJ. The multimodal approach provides a comprehensive view of how connected elements respond to CP perturbations.

      (2) An interesting finding is that NMJ morphology changes dramatically without corresponding deficits in synaptic transmission, challenging the common assumption that larger boutons necessarily indicate stronger synapses.

      (3) Another intriguing result is that even with two layers of homeostatic compensation, locomotor behavior is still impaired, highlighting the limits of compensation and underscoring the critical role of CP timing.

      (4) Beyond these scientific insights, the study benefits from a well-defined, tractable system and simple experimental manipulations, which together make the results highly interpretable and reproducible.

      Weaknesses:

      There are a few areas where the manuscript could be strengthened.

      (1) Although A27h premotor neurons are well characterized, the claim that they are the causal driver of downstream changes would be strengthened by additional experiments or a clearer discussion of the temporal hierarchy.

      (2) While 32 °C heat stress is presented as ecologically relevant, it produces maladaptive behavioral outcomes, raising questions about the ecological and mechanistic interpretation of the model. In particular, most experiments, with the exception of Figure 1, used prolonged (24 h) heat treatments, which could introduce developmental effects beyond the CP itself. Comparing shorter and longer heat exposures would help clarify the specificity of the CP response.

      (3) While there are schematics for experimental procedures, a circuit diagram tracing information flow and indicating where structural and functional changes occur would help readers better understand the findings.

      (4) Finally, the main paradox of the study, that robust homeostatic compensations occur yet behavior remains impaired, could be explored in more depth in the Discussion.

    4. Reviewer #3 (Public review):

      Summary:

      During development, neural circuits undergo brief windows of heightened neuronal plasticity (e.g., critical periods) that are thought to set the lifelong functional properties of the underlying circuits. These authors, along with others in the Drosophila community, previously characterized a critical period in late fly embryonic development, during which alterations to neuronal activity affect late-stage larval crawling behavior. In the current study, the authors use an ethologically relevant activation paradigm (increased temperature) to boost motor activity during embryogenesis, followed by a series of electrophysiology- and imaging-based experiments to explore how three distinct levels of the circuit remodel in response to increased embryonic motor activity. Specifically, they find that each level of the circuit responds differently, with increased excitatory drive from excitatory premotor neurons, reduced excitability in motor neurons, and no physiological changes at the NMJ despite dramatic morphological differences. Together, these data suggest that early-life experience in the motor neuron drives compensatory changes at each level of the circuit to stabilize overall network output.

      Strengths:

      The study was well-written, and the data presented were clear and an important contribution to the field.

      Weaknesses:

      The sample sizes, and what they referred to, were unclear throughout the distinct studies. In the legends, the authors should clearly state N = X for each experiment, and if N refers to an NMJ, for example, instead of an individual animal, they should state N = X NMJs from N = Y animals. This will help readers better understand the statistical impact of the study.

    1. eLife Assessment

      This study provides important evidence that negative affect is associated with slower cognitive processing in daily life, with findings replicated across three independent samples and supported by rigorous statistical analyses. The strength of evidence is convincing, though reliance on a proxy measure of processing speed limits the completeness of the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the relationship between affective shifts and cognitive performance in a daily-life setting.

      Strengths:

      The evidence provided is compelling: the findings are conceptually replicated in three samples of adequate size, the data are analyzed with statistical rigor, and the methods go beyond the current state of the art in applied research. For example, the authors use two-step multilevel vector autoregressive models adapted to allow the inclusion of covariates, with contemporaneous effects corrected for temporal relations and background covariates. In addition, the authors use beautiful visualizations to convey the different samples used (Figure 1) and intuitive, rich figures to convey their results.

      In summary, the authors were able to convincingly show that higher negative affect is linked to slower cognitive processing speed, with results supporting their conclusions.

      Weaknesses:

      I have one major concern. Although a check for careless responding was conducted on the basis of long reaction times, I wonder whether any other sanity checks for careless responding were performed. For example, a lack of variability in EMA items over a run of subsequent occasions (e.g., 15) is often seen as an indicator of careless responding, especially when using VAS items. In line 693, it is stated, "We added a small amount of random noise, ranging from -0.1 to +0.1, to each EMA time series to allow models to converge when EMA time series showed minimal variance over time", which I understand, but this lack of variability could also be caused by participants no longer taking the study seriously. For datasets 1 and 2, this might be more difficult to assess (due to the limited response values), but perhaps the authors can get an indication of this in dataset 3?

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Fittipaldi et al. assessed whether cognitive processing speed, as operationalized by the Digital Questionnaire Response Time (DQRT), and affect (both positive and negative) are related in contemporaneous and temporal ways, both between and within subjects. At the between-person level, they found a positive relationship between DQRT and negative affect, and the opposite for positive affect. This was similar at the within-subject contemporaneous level.

      The authors further test Granger causality in the dynamics, for both Affect -> DQRT and DQRT -> Affect. They find that affect at t-1 is associated with DQRT in the same manner as in the other models (positively for negative affect, and negatively for positive affect). Interestingly, DQRT -> Affect was largely non-significant for most affect items.

      This study adds important information on the associations between affect and cognitive measures outside the lab, showcasing a methodological approach to translate laboratory research to new contexts.

      Strengths:

      Overall, this study has a strong methodological approach, which is commendable. The use of three independent samples with different affective measures is a good way to showcase the validity of the findings. The multi-level modelling approach is also done thoroughly and appropriately within the context of MLVAR modelling. The findings are also well visualized, making it easy to follow along with the interconnected and potentially confusing analyses.

      Weaknesses:

      The authors use the DQRT as a measure of cognitive processing, although it is not fully validated or substantiated as such. The authors do address this as a limitation, but I believe it warrants a much broader discussion, as the construct being assessed may not be the construct intended by the authors. This makes it difficult to ascertain whether the conclusion drawn (that affect impacts cognitive function) is valid. I would rather frame the finding as an association between affect and response times, which could reflect many different things, such as careless responding or other mechanisms.

    1. eLife Assessment

      This important work develops C. elegans as a model organism for studying effort-based discounting by asking the worms to choose between patches of easy- and hard-to-digest bacteria. The authors provide convincing evidence that the nematodes discount by effort. They also provide solid evidence of the involvement of dopamine in the food preference and that the finding is not restricted to lab-acclimated strains.

    2. Reviewer #1 (Public review):

      Summary:

      Millet et al. show that C. elegans systematically prefers easy-to-eat bacteria but will switch its choice when harder-to-eat bacteria are offered at higher densities, producing indifference points that fit standard economic discounting models. Detailed kinetic analysis reveals that this bias arises from unchanged patch-entry rates but significantly elevated exit rates on effortful food, and dop-3 mutants lose the preference altogether, implicating dopamine in effort sensitivity. These findings extend effort-discounting behavior to a simple nematode, pushing the phylogenetic boundary of economic cost-benefit decision-making.

      Strengths:

      Extends the well-characterized concept of effort discounting into C. elegans, setting a new phylogenetic boundary and opening invertebrate genetics to economic-behavior studies.

      Elegant use of cephalexin-elongated bacteria to manipulate "effort" without altering nutritional or olfactory cues, yielding clear preference reversals and reproducible indifference points.

      Application of standard discounting models to predict novel indifference points is both rigorous and quantitatively satisfying, reinforcing the interpretation of worm behavior in economic terms.

      The three-state patch model cleanly separates entry and exit dynamics, showing that increased leaving rates, rather than altered re-entry, drive choice biases.

      Demonstrates that _dop-3_ mutants lose normal effort discounting, firmly tying monoaminergic signaling to this behavior and paralleling vertebrate findings.

      Demonstration of discounting in wild strain (solid evidence).

      Weaknesses:

      Only _dop-3_ shows an effect, whereas _cat-2_/_dat-1_ do not, leaving the broader role of dopamine synthesis and reuptake ambiguous.

      With only five wild isolates tested, and only one showing clear evidence of preference for the easy-to-eat bacteria, it is hard to conclude that effort discounting is not a lab-strain artifact, or how broadly it varies in natural populations.

    3. Reviewer #2 (Public review):

      Summary:

      Here, Millet et al. adapted a T-maze paradigm for use in C. elegans to understand whether nematodes exhibit effort-discounting behaviors comparable to those of other species. C. elegans worms were reliably sensitive to how effortful the food was to consume, allowing standard economic models of decision-making to be applied to their behavior. The authors then demonstrated the necessity of dopamine signaling for this behavior, identifying dop-3 mutants in particular as insensitive to effort. Together, this work establishes a new model system for the study of discounting behavior in cost-benefit decision-making.

      Strengths:

      The question is well-motivated and the approach taken here is novel; it is uncommon for worms to undergo such behavioural procedures (although this lab has previously been integral to pushing the extent of the complexity of behaviours studied in C. elegans). The authors are careful in their approach to altering and testing the properties of the elongated bacteria. Similarly, they go to some effort to understand what exactly is driving behavioural choices in this context, both through application of simple standard models of effort discounting and a kinetic analysis of patch leaving. The comparisons to various dopamine mutants further extends the translational potential of their findings. I also appreciate the comparison to natural isolate strains as the question of whether this behaviour may be driven by some sort of strain-specific adaptation to the environment is not regularly addressed in mammalian counterparts to this work.

      Weaknesses:

      The authors have now addressed concerns about whether the mechanisms underlying the choice behavior here are generalizable to other organisms. Specifically, their work speaks to foraging-inspired effort discounting paradigms in rodents and humans in which the decision is whether to stay or leave a given resource, rather than to simultaneous decision-making across two options in a T-maze.

      The dopamine results are interesting but still difficult to interpret. As the authors discuss, the lack of an effect in the cat-2 and dat-1 mutants is surprising given the effect in the dop-3 mutants. Understanding what exactly the role of dop-3 is here therefore requires further study.

    4. Reviewer #3 (Public review):

      Summary:

      The authors establish a behavioral task to explore effort discounting in C. elegans. By using bacterial food that takes longer to consume, the authors show that for equivalent effort, as measured by pumping rate, animals obtain less food, as measured by fat deposition.

      The authors formalize the task by applying a neuroeconomic decision-making model that includes value, effort, and discounting. They use this model, together with a population-level two-choice T-maze, to estimate how strongly C. elegans discounts food based on ingestion effort.

      They then analyze the behavioral dynamics of individual animals transitioning between on-food and off-food states. Harder to ingest bacteria led to increased food patch leaving.

      Finally, they examined a set of mutants defective in different aspects of dopamine signaling, as dopamine plays a key role in discounting in vertebrates and regulates certain aspects of C. elegans foraging.

      In their response to the first set of reviews, the authors take care to ensure their task is analogous to at least some of those used in mammals and make changes to the text to better clarify some of their conclusions. My view is unchanged: this is an interesting paper, for both methodological and scientific reasons, that brings an important theoretical framework to bear on C. elegans foraging behavior. While I think the mutant results are somewhat unsatisfying, they are not the principal contribution of the work.

      Strengths:

      The behavioral experiments and neuroeconomic analysis framework are compelling and interesting and make a significant contribution to the field. While these foraging behaviors have been extensively studied, few include clearly articulated theoretical models to be tested.

      Demonstrating that C. elegans effort discounting fits model predictions and has stable indifference points is important for establishing these tasks as a model for decision making.

      Weaknesses:

      The dopamine experiments are harder to interpret. The authors point out the perplexing lack of an effect of dat-1 and cat-2. dop-3 leads to general indifference. I am not sure this is the expected result if the argument is a parallel functional role to discounting in vertebrates. dop-3 causes a range of locomotor phenotypes and may affect feeding (reduced fat storage), and thus there may be a general defect in the ability to perform the task rather than anything specific to discounting.

      That said, some of the other DA mutants also have locomotor defects and do not differ from N2. But there is no clear result here; my concern is that global mutants in such a critical pathway exhibit such pleiotropy that it's difficult to conclude there is a clear and specific role for DA in effort discounting. This would require more targeted or cell-specific approaches. The authors state these experiments are outside the scope of the current study, and that at minimum their results implicate dopamine signaling in some form. I tend to agree but still think the locomotion defects of DA mutants complicate this question.

      Meanwhile, there are other pathways known to affect responses to food and patch-leaving decisions: 5HT, PDF, tyramine, etc. In their response, the authors state they focused on dopamine because of its role in discounting behavior in mammals.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Summary: 

      Here, Millet et al. consider whether the nematode C. elegans 'discounts' the value of reward due to effort in a manner similar to that shown in other species, including rodents and humans. They designed a T-maze effort choice paradigm inspired by previous literature, but manipulated how effortful the food is to consume. C. elegans worms were sensitive to this novel manipulation, exhibiting effort-discounting-like behaviour that could be shaped by varying the density of food at each alternative in order to calculate an indifference point. This discounting-like behaviour was related to worms' rates of patch leaving, which differed between the low- and high-effort patches in isolation. The authors also found a potential relationship to dopamine signalling, and that this discounting behaviour was not specific to lab-based strains of C. elegans.

      Strengths: 

      The question is well-motivated, and the approach taken here is novel. The authors are careful in their approach to altering and testing the properties of the effortful, elongated bacteria. Similarly, they go to some effort to understand what exactly is driving behavioural choices in this context, both through the application of simple standard models of effort discounting and a kinetic analysis of patch leaving. The comparisons to various dopamine mutants further extend the translational potential of their findings. I also appreciate the comparison to natural isolate strains, as the question of whether this behaviour may be driven by some sort of strain-specific adaptation to the environment is not regularly addressed in mammalian counterparts. The manuscript is well-written, and the figures are clear and comprehensible. 

      Weaknesses: 

      Discounting is typically defined as the alteration of a subjective value by effort (or time, risk, etc.), which is then used to guide future decision-making. By adapting the standard T-maze task for C. elegans as a patch-leaving paradigm, the authors observe behaviour strongly consistent with discounting models, but this behaviour is likely driven by a different process, in particular by an online estimate of the type of food in the current patch, which then influences patch-leaving dynamics (Figure 3). This is fundamentally different from decision-making strategies relating to effort that have been described in the rodent and human literatures. 

      We agree that in our study worms are likely making an on-line estimate of food quality in the current patch, but we wish to point out that rodents and humans also use on-line estimates in some significant effort-discounting paradigms. With respect to rodents, we call attention to effort discounting studies involving the widely used progressive ratio task (references in Discussion). In this task, animals can either lever-press for a preferred food or consume a less preferred food that is freely available nearby. However, the number of lever presses required to obtain preferred food increases as a function of the cumulative number of lever presses until the effort-cost of obtaining preferred food becomes too high and the animal switches to a freely available food. In essence, the lever and the freely available food are patches and the animal decides whether or not to leave the “lever” patch. It seems inescapable that the progressive ratio task involves an on-line assessment of the cost/benefit relationship associated with lever pressing. With respect to humans, one highly cited study (reference in Discussion) presented participants with a series of virtual apple trees. They could see how many apples are in the current tree and how much effort (squeezing a handgrip) is required to gather them. Their task was to decide whether or not to gather apples from that tree based on the perceived cost and benefit. Thus, on-line estimation is a common strategy used by animals and humans as shown in the effort discounting literature. We now make this point in the Discussion section titled A model of effort-discounting like behavior.

      Similarly, the calculation of indifference points at the group instead of at the individual level also suggests a different underlying process and limits the translational potential of their findings. The authors do not discuss the implications of these differences or why they chose not to attempt a more analogous trial-based experiment.  

      It is not clear to us why changing the read-out, from the individual level to the population level, necessarily suggests that a different biological mechanism is at work. In our view, there is one mechanism and it can be seen from different perspectives (e.g., individual vs. population). Furthermore, the analogous trial-based experiment, as we understand it, would be to record behavior one worm at a time in the T-maze. This design is not practical because it entails recording a large number of single worms in the T-maze for 60 min each. 

      In the case of both the dopamine and natural isolate experiments, the data are very noisy despite large (relative to other C. elegans experiments) sample sizes. In the dopamine experiment, disruption of dop-1, dop-2, and cat-2 had no statistically significant effect. There do not appear to be any corrections for multiple comparisons, and the single significant comparison, for dop-3, had a small effect size. 

      An ANOVA followed by a Dunnett test was used to test differences between groups in Fig. 4 and 5. The Dunnett test is a multiple-comparison procedure that compares each experimental group with a single control group. It controls the family-wise type I error rate while maintaining statistical power, so it does not require further correction for multiple comparisons. We have clarified the use of the Dunnett test in the statistical table. The effect size for dop-3 is 0.5 (Cohen's d), which is typically interpreted as a medium, not small, effect size (e.g., Cohen, Psychological Bulletin, 1992, Vol. 112, No. 1, 155-159). 
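      For readers less familiar with the metric, the sketch below shows one standard way to compute Cohen's d (difference of means divided by the pooled sample standard deviation) and map it onto Cohen's conventional benchmarks of 0.2 (small), 0.5 (medium), and 0.8 (large). It is an illustration only; whether the study used the pooled-SD variant is an assumption, and the inputs in the checks are toy numbers, not data from the paper.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (n - 1 denominators)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance of group A
    var_b = statistics.variance(group_b)  # sample variance of group B
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def cohen_label(d):
    """Conventional benchmarks from Cohen (1992) applied to |d|."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"
```

      With these benchmarks, the reported d = 0.5 for dop-3 falls at the lower bound of the "medium" category.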

      More detailed behavioural analyses on both these and the wild isolate strains, for example by applying their kinetic analysis, would likely give greater insight as to what is driving these inconsistent effects. 

      More detailed behavioral analysis could reveal why we observe a difference in effort discounting in some strains and not others. However, it is not obvious what type of behavioral analysis would be needed to differentiate between pleiotropic effects of the mutations/natural isolates and more specific effects on effort discounting. A simple kinetic analysis in particular may not be enough to reveal relevant differences between mutants/natural isolates. For this reason, we think that such experiments may be better suited for future follow up studies.

      Reviewer #2 (Public Reviews):

      Summary: 

      Millet et al. show that C. elegans systematically prefers easy-to-eat bacteria but will switch its choice when harder-to-eat bacteria are offered at higher densities, producing indifference points that fit standard economic discounting models. Detailed kinetic analysis reveals that this bias arises from unchanged patch-entry rates but significantly elevated exit rates on effortful food, and dop-3 mutants lose the preference altogether, implicating dopamine in effort sensitivity. These findings extend effort-discounting behavior to a simple nematode, pushing the phylogenetic boundary of economic cost-benefit decision-making. 

      Strengths: 

      (1) Extends the well-characterized concept of effort discounting into C. elegans, setting a new phylogenetic boundary and opening invertebrate genetics to economic-behavior studies. 

      (2) Elegant use of cephalexin-elongated bacteria to manipulate "effort" without altering nutritional or olfactory cues, yielding clear preference reversals and reproducible indifference points. 

      (3) Application of standard discounting models to predict novel indifference points is both rigorous and quantitatively satisfying, reinforcing the interpretation of worm behavior in economic terms. 

      (4) The three-state patch model cleanly separates entry and exit dynamics, showing that increased leaving rates, rather than altered re-entry, drive choice biases. 

      (5) Investigates the role of dopamine in this behavior to try to establish shared mechanisms with vertebrates. 

      (6) Demonstration of discounting in wild strains (solid evidence). 

      Weaknesses: 

      (1) The kinetic model omits rich trajectory details, such as turning angles or hazard functions, that could distinguish a bona fide roaming transition from other exit behaviors. 

      The overarching goal of the present paper was to develop a simple model for effort discounting in a small, genetically tractable organism. Accordingly, we focused on quantitative assays that are easy to implement and analyze. The patch-leaving assay, with its associated kinetic analysis, is one such assay. To keep things simple in this assay, we counted the number of transitions between the three states shown in Fig. 3A. We chose not to analyze the data in terms of turning angles or hazard functions because the metrics we developed seemed sufficient. Finally, we note that there are new modeling data showing that the presumptive transitions into the roaming state can be explained in terms of a one-state stochastic model in which there is no discrete roaming state (eLife. 2025 Jul 30;14:RP104972. doi: 10.7554/eLife.104972. PMID: 40736321).
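      To make the transition-counting approach concrete, here is a minimal sketch of how per-state exit rates can be estimated from a discretized state sequence. It assumes a continuous-time Markov model with constant rates, for which the maximum-likelihood rate from state i to state j is the number of i-to-j transitions divided by the total time spent in i. The state labels ("patch_a", "off", "patch_b") and the fixed frame interval dt are hypothetical; the three states of Fig. 3A are not named here, and this is not presented as the authors' pipeline.

```python
from collections import Counter


def transition_rates(states, dt):
    """Estimate constant transition rates (per unit time) between discrete states.

    states: per-frame state labels sampled every dt time units.
    Returns {(from_state, to_state): rate}, using N_ij / T_i, where each
    frame-to-frame interval of length dt is credited to the earlier state.
    """
    transition_counts = Counter()
    time_in_state = Counter()
    for current, nxt in zip(states, states[1:]):
        time_in_state[current] += dt
        if nxt != current:
            transition_counts[(current, nxt)] += 1
    return {
        (i, j): n / time_in_state[i]
        for (i, j), n in transition_counts.items()
    }
```

      In this framing, the finding that worms leave effortful patches more often corresponds to a larger estimated exit rate from the effortful-patch state, while entry rates (from the off-food state into each patch) stay similar.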

      (2) Only dop-3 shows an effect, and the statistical validity of this result is questionable. It is not clear if the authors corrected for multiple comparisons, and the effect size is quite small and noisy, given the large number of worms tested. Other mutants do not show effects. Given these two concerns, the role of dopamine in C. elegans effort discounting was unconvincing. 

      An ANOVA followed by a Dunnett test was used to test statistical significance in Figures 4 and 5 (see above for a discussion of these tests). We believe this approach is rigorous, and the use of these tests is statistically valid. We note that the effect size for this comparison was medium (Cohen's d = 0.5).

      (3) With only five wild isolates tested (and variable data quality), it's hard to conclude that effort discounting isn't a lab-strain artifact or how broadly it varies in natural populations. 

      The fact that four of the five natural isolates tested display levels of effort discounting similar to N2 (only one natural isolate does not display effort discounting) argues against effort discounting being a laboratory adaptation. We have nevertheless weakened the claim regarding natural isolates. We now say that effort discounting-like behavior may not be an adaptation to the laboratory environment.

      (4) Detailed analysis of behavior beyond preference indices would strengthen the dopamine link and the claim of effort discounting in wild strains. 

      Going beyond preference in the behavioral analysis might or might not reveal new phenotypes that strengthen the link with dopamine. At present, however, we think such experiments are beyond the scope of the paper.

      (5) A few mechanistic statements (e.g., tying satiety exclusively to nutrient signals) would benefit from explicit citations or brief clarifications for non-worm specialists. 

      We are unable to identify a mechanistic statement tying satiety to nutrient signals in our manuscript.

      Reviewer #3 (Public Reviews):

      Summary: 

      The authors establish a behavioral task to explore effort discounting in C. elegans. By using bacterial food that takes longer to consume, the authors show that, for equivalent effort, as measured by pumping rate, worms obtain less food, as measured by fat deposition. The authors formalize the task by applying a formal neuroeconomic decision-making model that includes value, effort, and discounting. They use this to estimate the discounting that C. elegans applies based on ingestion effort by using a population-level 2-choice T-maze. They then analyze the behavioral dynamics of individual animals transitioning between on-food and off-food states. Harder-to-ingest bacteria led to increased food-patch leaving. Finally, they examined a set of mutants defective in different aspects of dopamine signaling, as dopamine plays a key role in discounting in vertebrates and regulates certain aspects of C. elegans foraging. 

      Strengths: 

      The behavioral experiments and neuroeconomic analysis framework are compelling, interesting, and make a significant contribution to the field. While these foraging behaviors have been extensively studied, few include clearly articulated theoretical models to be tested. 

      Demonstrating that C. elegans effort discounting fits model predictions and has stable indifference points is important for establishing these tasks as a model for decision making. 

      Weaknesses: 

      The dopamine experiments are harder to interpret. The authors point out the perplexing lack of an effect of dat-1 and cat-2. dop-3 leads to general indifference. I am not sure this is the expected result if the argument is a parallel functional role to discounting in vertebrates. dop-3 causes a range of locomotor phenotypes and may affect feeding (reduced fat storage), and thus, there may be a general defect in the ability to perform the task rather than anything specific to discounting.

      That said, some of the other DA mutants also have locomotor defects and do not differ from N2. But there is no clear result here: my concern is that global mutants in such a critical pathway exhibit such pleiotropy that it's difficult to conclude there is a clear and specific role for DA in effort discounting. This would require more targeted or cell-specific approaches. 

      We agree with the reviewer that the results of the dopamine experiments are puzzling, and that a better understanding of the role of dopamine in effort discounting will require more sensitive assays and different experimental approaches (e.g., cell-specific rescues). However, as the reviewer notes, all the mutations tested have some pleiotropic effects, yet only dop-3 displays a defect in effort discounting. In our opinion, this points to a specific role of dop-3 in effort discounting in C. elegans. This point is now made in the Discussion in the section titled Role of dopamine signaling in effort discounting-like behavior.

      Meanwhile, there are other pathways known to affect responses to food and patch leaving decisions: serotonin, pigment-dispersing factor, tyramine, etc. The paper would have benefited from a clarification about why these were not considered as promising candidates to test (in addition to or instead of dopamine). 

      We focused on DA because of its well-established effect on effort discounting in rodents.

      Testing other pathways is a goal for future research.

      Reviewer #1 (Recommendations for the authors):

      The current results are more a reframing of data gathered from a patch-leaving paradigm, described in the form of economic choice modelling in which discounting is one possible explanation. A more parsimonious explanation is that worms estimate some rate of reward in real time and leave the patch at some threshold, consistent with canonical foraging models, previous experiments in C. elegans, and the authors' own data (Figure 3). Therefore, I am wary about some of the claims made in this manuscript, such as 'decision-making strategies based on effort-cost trade-offs are evolutionarily conserved'. 

      These points are now addressed in the Discussion in a revised section titled A model of effort-discounting like behavior. (i) We now call attention to the fact that our T-maze assay is a patch-leaving foraging paradigm. (ii) We now propose a revised model in which “worms make an on-line assessment of food value in the current patch which in turn alters patch-leaving dynamics, increasing the exit rates from cephalexin-treated patches as shown in Figure 3.” (iii) We now provide evidence from the rodent and human literature that the strategy of on-line assessment of reward value may be evolutionarily conserved in the case of a class of effort discounting tasks whose solution requires on-line assessments. 

      If the reason the authors chose to do a patch-leaving style task rather than a traditional T-maze is that C. elegans is unable to retain the sort of information necessary to make such simultaneous decisions (e.g., if pre-training on the two options isn't possible), then this in itself suggests that the mechanisms underlying these decisions in worms and mammals are unlikely to be the same. I mention this because I would like to suggest to the authors an alternative interpretation: that patch foraging is actually 'the' canonical computation that translates across species. This would, in fact, be nicely consistent with some other recent modelling work in humans, e.g., https://www.biorxiv.org/content/10.1101/2025.05.06.652482v1

      Please see the previous response.

      Reviewer #2 (Recommendations for the authors):

      Can you provide a picture of the regular and CEPH bacteria? 

      Done (see Figure 1––figure supplement 1).

      Reviewer #3 (Recommendations for the authors):

      I would recommend testing representative mutants in other pathways in the choice task. If possible, more targeted experiments with dop-3, including either cell-specific KOs or rescues, would very much strengthen this aspect of the paper. 

      While valuable, these experiments are out of scope for the present study.

    1. eLife Assessment

      This important study combines behavioural psychophysics with image-computable models to contrast a view-selective model of face recognition with a view-tolerant process. Although diagnostic orientations vary with viewpoint (horizontal for frontal, vertical for profile), human recognition remains consistently tuned to horizontal information, aligning with the view-tolerant model's predictions. The evidence for view-invariant recognition is solid, though testing more plausible model variants and considering generalisability to more naturalistic face stimuli would strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.
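      Because the distinction between the two ideal observers hinges on which templates the observer is allowed to compare, a minimal sketch of a cross-correlation observer may help. The assumptions here are illustrative only: zero-lag Pearson correlation as the similarity measure, additive Gaussian pixel noise to keep performance off ceiling, and a template library keyed by (identity, yaw). The manuscript's exact noise model and correlation procedure are not described in this review, so none of the names or parameters below should be read as the authors' implementation.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation (Pearson r) of two images at zero lag."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def add_noise(image, sd, rng):
    """Additive Gaussian pixel noise (assumed noise model)."""
    return image + rng.normal(0.0, sd, size=image.shape)


def identify(probe, library, probe_yaw, view_tolerant):
    """Return the identity whose template best correlates with the probe.

    library: dict mapping (identity, yaw) -> image array.
    view_tolerant=False: only templates at the probe's yaw (view-selective).
    view_tolerant=True: only templates at other yaws (same-view match excluded).
    """
    best_id, best_r = None, -np.inf
    for (identity, yaw), template in library.items():
        same_view = yaw == probe_yaw
        if same_view == view_tolerant:
            continue  # skip templates this observer is not allowed to use
        r = ncc(probe, template)
        if r > best_r:
            best_id, best_r = identity, r
    return best_id
```

      Orientation tuning would then be read out by repeating this identification on orientation-filtered probes and tabulating accuracy per filter orientation and yaw; that outer loop is omitted here.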

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes, and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function, favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I will start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare the target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints, and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. Again, this is sort of interesting and the very different behavior of the model is neat to discuss, but it doesn't seem easy to align with any theoretical perspective on face recognition. My thinking here is that it might be useful to consider an additional alternate model that doesn't specifically exclude the best-matching viewpoint, but perhaps condenses appearance across views into something like a prototype. I could even see an argument for something like the yaw-averages presented earlier in the manuscript as the basis for such a model, but this might be too much of a stretch. Overall, what I'd like to see is some kind of alternate model that incorporates the existence of the best-match viewpoint somehow, but without the explicit exemplar structure of the view-specific model.

      Besides this larger issue, I would also like to see some more details about the nature of the cross-correlation that is the basis for this model comparison. I mostly think I get what is happening, but I think the authors could expand more on the nature of their noise model to make more explicit what is happening before these cross-correlations are taken. I infer that there is a noise-addition step to get them off the ceiling, but I felt that I had to read between the lines a bit to determine this.

      Another thing that I think is worth considering and commenting on is the stimuli themselves and the extent to which this may limit the outcomes of their behavioral task. The use of the 3D laser-scanned faces has some obvious advantages, but also (I think) removes the possibility for pigmentation to contribute to recognition, removes the contribution of varying illumination and expression to appearance variability, and perhaps presents observers with more homogeneous faces than one typically has to worry about. I don't think these negate the current results, but I'd like the authors to expand on their discussion of these factors, particularly pigmentation. Naively, surface color and texture seem like they could offer diagnostic cues to identity that don't rely so critically on horizontal orientations, so removing these may mean that horizontal bias is particularly evident when face shape is the critical cue for recognition.

    3. Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 degrees), but other viewpoints had biases that were slightly off horizontal (e.g., right profile: 80 degrees, left profile: 100 degrees). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

    1. eLife Assessment

      This important study uses a combination of behavioral and molecular techniques to identify neuromodulators that influence blood-feeding behavior in the disease vector, Anopheles stephensi. Through a combination of gene expression analysis and RNA knockdown, the authors identify neuropeptides RYamide and sNPF as candidate regulators for blood-feeding, demonstrate behavioral changes upon co-knockdown, and anatomically characterize their expression patterns. While the evidence for behavioral characterization and expression mapping is solid, the evidence supporting a direct causal role for these neuropeptides in promoting host-seeking remains unproven.

    2. Reviewer #1 (Public review):

      Summary:

      Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of the crucially understudied vector, but then uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and Rya modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating, and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies, which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals, which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood-feeding stages of the mosquito's life cycle to identify a list of 9 candidates that have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think about how many crucial details could potentially have been missed if this had not been the approach. The attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      Weaknesses:

      There are a few elements of data visualizations and methodological reporting that I found confusing on a first few read-throughs. Figure 1F, for example, was initially confusing as it made it seem as though there were multiple 2-choice assays for each of the conditions. I would recommend removing the "X" marker from the x-axis to indicate the mosquitoes did not feed from either nectar, blood, or neither in order to make it clear that there was one assay in which mosquitoes had access to both food sources, and the data quantify if they took both meals, one meal, or no meals.

      I would also like to know more about how the authors achieved tissue-specific knockdown for RNAi experiments. I think this is an intriguing methodology, but I could not figure out from the methods why injections either had whole-body or abdomen-specific knockdown.

      I also found some interpretations of the transcriptomic data to be overly broad for what transcriptomes can actually tell us about the organism's state. For example, the authors mention, "Interestingly, we found that after a blood meal, glucose is neither spent nor stored, and that the female brain goes into a state of metabolic 'sugar rest', while actively processing proteins (Figure S2B, S3)".

      This would require a physiological measurement to actually know. It certainly suggests that there are changes in carbohydrate metabolism, but there are too many alternative interpretations to make this broad claim from transcriptomic data alone.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed, and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools, including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      Strengths and/or weaknesses:

      Overall, the manuscript was well-written; however, the authors should review carefully, as some sections would benefit from restructuring to improve clarity. Some statements need to be rectified as they are factually inaccurate.

      Below are specific concerns and clarifications needed in the opinion of this reviewer:

      (1) What does "central brains" refer to in abstract and in other sections of the manuscript (including methods and results)? This term is ambiguous, and the authors should more clearly define what specific components of the central nervous system was/were used in their study.

      (2) The abstract states that two neuropeptides, sNPF and RYamide, are working together, but no evidence for the latter is summarized in this section.

      (3) Figure 1

      Panel A: This should include mating events in the reproductive cycle to demonstrate differences in the feeding behavior of Ae. aegypti.

      Panel F: In treatments where insects were not provided either blood or sugar, how is it that some females and males had fed? Also, it is unclear why the y-axis label is % fed when the caption indicates this is a choice assay. Also, it is interesting that sugar-starved females did not increase sugar intake. Is there any explanation for this (was it expected)?

      (4) Figure 3

      In the neurotranscriptome analysis of the (central) brain involving the two types of comparisons, can the authors clarify what "excluded in males" refers to? Does this imply that only genes not expressed in males were considered in the analysis? If so, what about co-expressed genes that have a specific function in female feeding behaviour?

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well. Relatedly, blood-feeding is decreased when both neuropeptide transcripts are targeted compared to uninjected (panel C) but not compared to dsGFP injected (panel E). Why is this the case if authors showed earlier in this figure (panel B) that dsGFP does not impact blood feeding? In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data, and how do transcript levels of RYa and sNPF compare in the brain versus the abdomen (the presentation of data doesn't make this relationship clear).

      (6) As an overall comment, the figure captions are far too long and include redundant text presented in the methods and results sections.

      (7) Criteria used for identifying neuropeptides promoting blood-feeding: statement that reads "all neuropeptides, since these are known to regulate feeding behaviours". This is not accurate since not all neuropeptides govern feeding behaviors, while certainly a subset do play a role.

      (8) In the section beginning with "Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels...", the authors state that there was no change in blood-feeding and later state the opposite. The wording should be clarified as it is unclear.

      (9) Just before the conclusions section, the statement that "neuropeptide receptors are often ligand-promiscuous" is unjustified. Indeed, many studies have shown in heterologous systems that high concentrations of structurally related peptides, which are not physiologically relevant, might cross-react and activate a receptor belonging to a different peptide family; however, the natural ligand is often many times more potent (in most cases, orders of magnitude) than structurally related peptides. This is certainly the case for various RYamide and sNPF receptors characterized in various insect species.

      (10) Methods

      In the dsRNA-mediated gene knockdown section, the authors could more clearly describe how much dsRNA was injected per target. At the moment, the reader must carry out calculations based on the concentrations provided and the injected volume range provided later in this section.

      It is also unclear how tissue-specific knockdown was achieved by performing injection on different days/times. The authors need to explain, support, and justify how temporal differences in injection lead to changes in tissue-specific expression. Does the blood-brain barrier limit knockdown in the brain instead, while leaving expression in the peripheral organs susceptible? For example, in Figure 4, the data support that knockdown in the head/brain is only effective in unfed animals compared to uninjected animals, while there is no evidence of knockdown in the brain relative to dsGFP-injected animals. Comparatively, the data appear to show stronger evidence of abdominal knockdown, mostly for the RYa transcript (>90%) but also significantly for the sNPF transcript (>60%).

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated (for example, with peptide injection or overexpression experiments).

      (2) The proposed model regarding central versus peripheral (gut) peptide action is inconsistently presented and lacks strong experimental support.

      (3) Some conclusions appear premature based on the current data and would benefit from additional functional validation.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of the crucially understudied vector, but then uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating, and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies, which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals, which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood-feeding stages of the mosquito's life cycle and arrive at a list of 9 candidates that have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      We appreciate the reviewer’s detailed summary of our work. We thank them for their positive comments and agree with them on the shortcomings of our approach.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think about how many crucial details could potentially have been missed if this had not been the approach. The attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      We really appreciate that the reviewer has recognised the attention to detail we have tried to bring to this work, thank you!

      Weaknesses:

      There are a few elements of data visualizations and methodological reporting that I found confusing on my first few read-throughs. Figure 1F, for example, was initially confusing as it made it seem as though there were multiple 2-choice assays for each of the conditions. I would recommend removing the "X" marker from the x-axis to indicate the mosquitoes did not feed from either nectar, blood, or neither in order to make it clear that there was one assay in which mosquitoes had access to both food sources, and the data quantify if they took both meals, one meal, or no meals.

      We thank the reviewer for flagging the schematic in figure 1F. As suggested, we have removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose in the assay. For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data, as it does not capture the variability in the data.

      I would also like to know more about how the authors achieved tissue-specific knockdown for RNAi experiments. I think this is an intriguing methodology, but I could not figure out from the methods why injections either had whole-body or abdomen-specific knockdown.

      The tissue-specific knockdown (abdomen only or abdomen+head) emerged from initial standardisations where we were unable to achieve knockdown in the head unless we used higher concentrations of dsRNA and did the injections in older females. We realised that this gave us the opportunity to isolate the neuronal contribution of these neuropeptides to the phenotype produced. Further optimisations revealed that injecting dsRNA into 0-10h old females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 4-day-old females resulted in knockdowns in both tissues. Moreover, head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts.

      We had mentioned the knockdown conditions (time of injection and the amount of dsRNA injected) for tissue-specific knockdowns in the methods, but realise now that this did not explain them well enough. We have now edited it to state our methodology more clearly (see lines 932-948).

      I also found some interpretations of the transcriptomic data to be overly broad for what transcriptomes can actually tell us about the organism's state. For example, the authors mention, "Interestingly, we found that after a blood meal, glucose is neither spent nor stored, and that the female brain goes into a state of metabolic 'sugar rest', while actively processing proteins (Figure S2B, S3)".

      This would require a physiological measurement to actually know. It certainly suggests that there are changes in carbohydrate metabolism, but there are too many alternative interpretations to make this broad claim from transcriptomic data alone.

      We thank the reviewer for pointing this out and agree with them. We have now edited our statement to read:

      “Instead, our data suggests altered carbohydrate metabolism after a blood meal, with the female brain potentially entering a state of metabolic 'sugar rest' while actively processing proteins (Figure S2B, S3). However, physiological measurements of carbohydrate and protein metabolism will be required to confirm whether glucose is indeed neither spent nor stored during this period.” See lines 271-277.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed, and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools, including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      Strengths and/or weaknesses:

      Overall, the manuscript was well-written; however, the authors should review carefully, as some sections would benefit from restructuring to improve clarity. Some statements need to be rectified as they are factually inaccurate.

      Below are specific concerns and clarifications needed in the opinion of this reviewer:

      (1) What does "central brains" refer to in abstract and in other sections of the manuscript (including methods and results)? This term is ambiguous, and the authors should more clearly define what specific components of the central nervous system was/were used in their study.

      Central brain, or midbrain, is a commonly used term to refer to brain structures/neuropils without the optic lobes (For example: https://www.nature.com/articles/s41586-024-07686-5). In this study we have focused our analysis on the central brain circuits involved in modulating blood-feeding behaviour and have therefore excluded the optic lobes. As optic lobes account for nearly half of all the neurons in the mosquito brain (https://pmc.ncbi.nlm.nih.gov/articles/PMC8121336/), including them would have disproportionately skewed our transcriptomic data toward visual processing pathways.

      We have indicated this in figure 3A and in the methods (see lines 800-801, 812). We have now also clarified it in the results section for neuro-transcriptomics to avoid confusion (see lines 236-237).

      (2) The abstract states that two neuropeptides, sNPF and RYamide are working together, but no evidence is summarized for the latter in this section.

      We thank the reviewer for pointing this out. We have now added a statement “This occurs in the context of the action of RYa in the brain” to the end of the abstract, for a complete summary of our proposed model.

      (3) Figure 1

      Panel A: This should include mating events in the reproductive cycle to demonstrate differences in the feeding behavior of Ae. aegypti.

      Our data suggest that mating can occur at any time between eclosion and oviposition in An. stephensi and between eclosion and blood feeding in Ae. aegypti. Adding these into (already busy) 1A, would cloud the purpose of the schematic, which is to indicate the time points used in the behavioural assays and transcriptomics.

      Panel F: In treatments where insects were not provided either blood or sugar, how is it that some females and males had fed? Also, it is unclear why the y-axis label is % fed when the caption indicates this is a choice assay. Also, it is interesting that sugar-starved females did not increase sugar intake. Is there any explanation for this (was it expected)?

      We apologise for the confusion. The experiment is indeed a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. The x-axis indicates the choice made by the mosquitoes, not the choice provided in the assay, and the y-axis indicates the percentage of males or females that made each particular choice. We have now removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      In this assay, we scored females only for the presence or absence of each meal type (blood or sugar) and are therefore unable to comment on whether sugar-starved females consumed more sugar than sugar-sated females. However, when sugar-starved, a higher proportion of females consumed both blood and sugar, while fewer fed on blood alone.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data as it does not capture the variability in the data.

      (4) Figure 3

      In the neurotranscriptome analysis of the (central) brain involving the two types of comparisons, can the authors clarify what "excluded in males" refers to? Does this imply that only genes not expressed in males were considered in the analysis? If so, what about co-expressed genes that have a specific function in female feeding behaviour?

      This is indeed correct. We reasoned that since blood feeding is exclusive to females, we should focus our analysis on genes that were specifically upregulated in them. As the reviewer points out, it is very likely that genes commonly upregulated in males and females may also promote blood feeding and we will miss out on any such candidates based on our selection criteria.

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results test as well.

      Perhaps we do not understand the reviewer’s point or there has been a misunderstanding. In figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      Relatedly, blood-feeding is decreased when both neuropeptide transcripts are targeted compared to uninjected (panel C) but not compared to dsGFP injected (panel E). Why is this the case if authors showed earlier in this figure (panel B) that dsGFP does not impact blood feeding?

      We realise this concern stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens. 4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomens. We have now added a schematic in the plots to make this clearer.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,…

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide, its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.
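      The delta-delta Ct calculation referred to above can be sketched as follows, to show why the control carries no variation: each sample's target Ct is normalised to a reference gene (ΔCt) and then to the control group's ΔCt (ΔΔCt), so the control compared against itself evaluates to 2^0 = 1 by construction. This is a minimal sketch assuming the standard 2^(-ΔΔCt) formulation; the function names and Ct values are hypothetical, and the reference gene used in the study is not specified here.

```python
def delta_delta_ct_fold_change(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene in a sample versus control (2^-ddCt).

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    delta_ct_sample = ct_target - ct_ref              # normalise sample to reference gene
    delta_ct_control = ct_target_ctrl - ct_ref_ctrl   # same normalisation for the control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


def percent_knockdown(fold_change):
    """Knockdown expressed as percent reduction relative to control."""
    return (1.0 - fold_change) * 100.0


if __name__ == "__main__":
    ctrl = (24.0, 18.0)  # hypothetical control Cts: (target gene, reference gene)
    # The control compared with itself always yields a fold change of exactly 1.
    print(delta_delta_ct_fold_change(*ctrl, *ctrl))
    # A knockdown sample whose target Ct is one cycle later than control.
    fc = delta_delta_ct_fold_change(25.0, 18.0, *ctrl)
    print(fc, percent_knockdown(fc))
```

      Under this formulation, a reported reduction such as "40% reduced mRNA levels" corresponds to a relative expression of 0.6 against the control baseline of 1.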

      …and how do transcript levels of RYa and sNPF compare in the brain versus the abdomen (the presentation of data doesn't make this relationship clear).

      The reviewer is correct in pointing out that we have not clarified this relationship in our current presentation. While we have not performed absolute mRNA quantifications, we extracted relative mRNA levels from qPCR data of 96h old unmanipulated control females. We observed that both sNPF and RYa transcripts are expressed at much lower levels in the abdomens, as compared to those in the heads, as shown in the graphs inserted below.

      Author response image 1.

      (6) As an overall comment, the figure captions are far too long and include redundant text presented in the methods and results sections.

      We thank the reviewer for flagging this and have now edited the legends to remove redundancy.

      (7) Criteria used for identifying neuropeptides promoting blood-feeding: statement that reads "all neuropeptides, since these are known to regulate feeding behaviours". This is not accurate since not all neuropeptides govern feeding behaviors, while certainly a subset do play a role.

      We agree with the reviewer that not all neuropeptides regulate feeding behaviours. Our statement refers to the screening approach we used: in our shortlist of candidates, we chose to validate all neuropeptides.

      (8) In the section beginning with "Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels...", the authors state that there was no change in blood-feeding and later state the opposite. The wording should be clarified as it is unclear.

      Thank you for pointing this out. We were referring to an unchanged proportion of the blood fed females. We have now edited the text to the following:

      “Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels in the heads but the proportion of females that took blood meals remained unchanged”. See lines 338-340.

      (9) Just before the conclusions section, the statement that "neuropeptide receptors are often ligand promiscuous" is unjustified. Indeed, many studies have shown in heterologous systems that high concentrations of structurally related peptides, which are not physiologically relevant, might cross-react and activate a receptor belonging to a different peptide family; however, the natural ligand is often many times more potent (in most cases, orders of magnitude) than structurally related peptides. This is certainly the case for various RYamide and sNPF receptors characterized in various insect species.

      We agree with the reviewer and apologise for the mistake. We have now removed the statement.

      (10) Methods

      In the dsRNA-mediated gene knockdown section, the authors could more clearly describe how much dsRNA was injected per target. At the moment, the reader must carry out calculations based on the concentrations provided and the injected volume range provided later in this section.

      We have now edited the section to reflect the amount of dsRNA injected per target. Please see lines 921-931.
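      The per-target amount the reviewer asks for follows from multiplying the dsRNA stock concentration by the injected volume. A minimal sketch with hypothetical numbers (the study's actual concentrations and volumes are given in its methods and are not reproduced here); the only subtlety is the factor of 1000 converting nanolitres to microlitres.

```python
NL_PER_UL = 1000.0  # 1 microlitre = 1000 nanolitres


def dsrna_amount_ng(concentration_ng_per_ul: float, volume_nl: float) -> float:
    """Mass of dsRNA delivered (ng) for a stock concentration and injected volume."""
    return concentration_ng_per_ul * volume_nl / NL_PER_UL


def dose_range_ng(concentration_ng_per_ul: float, volume_min_nl: float, volume_max_nl: float):
    """Lower and upper dose bounds when the injected volume is reported as a range."""
    return (dsrna_amount_ng(concentration_ng_per_ul, volume_min_nl),
            dsrna_amount_ng(concentration_ng_per_ul, volume_max_nl))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(dose_range_ng(3000.0, 150.0, 250.0))
```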

      It is also unclear how tissue-specific knockdown was achieved by performing injection on different days/times. The authors need to explain/support, and justify how temporal differences in injection lead to changes in tissue-specific expression. Does the blood-brain barrier limit knockdown in the brain instead, while leaving expression in the peripheral organs susceptible?

      To achieve tissue-specific knockdowns of sNPF and RYa, we optimised both the time of injection as well as the dsRNA concentration to be injected. Injecting dsRNA into 0-10h old females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 96h old females resulted in knockdowns in both tissues. Head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts, reflecting the lower baseline expression of sNPF in abdomens compared to heads and the age-dependent increase in head expression (as confirmed by qPCR). It is possible that the blood-brain barrier also limits the dsRNA entering the brain, thereby requiring higher amounts to be injected for head knockdowns.

      We have now edited this section to state our methodology more clearly (see lines 932-948).

      For example, in Figure 4, the data support that knockdown in the head/brain is only effective in unfed animals compared to uninjected animals, while there is no evidence of knockdown in the brain relative to dsGFP-injected animals. Comparatively, the data appear to show stronger evidence of abdominal knockdown, mostly for the RYa transcript (>90%) but also significantly for the sNPF transcript (>60%).

      As we explained earlier, this concern likely stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens. 4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomen. We have now added a schematic in the plots to make this clearer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated (for example, with peptide injection or overexpression experiments).

      Demonstrating sufficiency would require injecting sNPF peptide or its agonist. To date, no small-molecule agonists (or antagonists) that selectively mimic sNPF or RYa neuropeptides have been identified in insects. An NPY analogue, TM30335, has been reported to activate the Aedes aegypti NPY-like receptor 7 (NPYLR7; Duvall et al., 2019), which is also activated by sNPF peptides at higher doses (Liesch et al., 2013). Unfortunately, the compound is no longer available because its manufacturer, 7TM Pharma, has ceased operations. Synthesising the peptides is a possibility that we will explore in the future.

      (2) The proposed model regarding central versus peripheral (gut) peptide action is inconsistently presented and lacks strong experimental support.

      The best way to address this would be to conduct tissue-specific manipulations, but the tools for these are not available in this species. Our approach of achieving head-plus-abdomen and abdomen-only knockdown was the closest we could get to tissue specificity, and it allowed us to confirm that knockdown in the head was necessary for the phenotype. However, as the reviewer points out, this did not allow us to rule out any involvement of the abdomen. We address this point in lines 364-371.

      (3) Some conclusions appear premature based on the current data and would benefit from additional functional validation.

      The most definitive way of demonstrating the necessity of sNPF and RYa in blood feeding would be to generate mutant lines. While we are pursuing these experiments, they lie beyond the scope of a revision. In their absence, we relied on dsRNA-mediated knockdown of the genes. We note that, despite only partial knockdown, mosquitoes do display defects in blood-feeding behaviour without any effect on sugar feeding. We think this reflects the importance of sNPF in promoting blood feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I found this manuscript to be well prepared; the figures are visually great and were clearly carefully thought out and curated, and the research is impactful. It was a wonderful read from start to finish. I have the following recommendations:

      Thank you very much, we are very pleased to hear that you enjoyed reading our manuscript!

      (1) For future manuscripts, it would make things significantly easier on the reviewer side to submit a format that uses line numbers.

      We sincerely apologise for the oversight. We have now incorporated line numbers in the revised manuscript.

      (2) There are a few statements in the text that I think may need clarification or might be outside the bounds of what was actually studied here. For example, in the introduction "However, mating is dispensable in Anophelines even under conditions of nutritional satiety". I am uncertain what is meant by this statement - please clarify.

      We apologise for the lack of clarity in the statement and have now deleted it since we felt it was not necessary.

      (3) Typo/Grammatical minutiae:

      a) A small idiosyncrasy in the use of hyphens in compound words should also be fixed throughout. Typically, you don't hyphenate when the words are used as a noun, e.g. "Age affects blood feeding." However, you would hyphenate when the two words are used as a compound adjective, e.g. "Age affects blood-feeding behavior". The list below may not be exhaustive, but here are some examples where hyphens need to be either removed or added:

      "Nutritional state also influences other internal state outputs on blood-feeding": blood-feeding -> blood feeding

      "... the modulation of blood-feeding": blood-feeding -> blood feeding

      "For example, whether virgin females take blood-meals...": blood-meals -> blood meals

      ".... how internal and external cues shape meal-choice"-> meal choice

      "blood-meal" is often used throughout the text, but is correctly "blood meal" in the figures.

      There are many more examples throughout.

      We apologise for these errors and appreciate the reviewer’s keen eye. We have now fixed them throughout the manuscript.

      b) Figure 1 Caption has a typo: "co-housed males were accessed for sugar-feeding" should be "co-housed males were assessed for sugar feeding"

      We apologise for the typo and thank the reviewer for spotting it. We have now corrected this.

      c) It would be helpful in some other figure captions to more clearly label which statement is relevant to which part of the text. For example, in Figure 4's caption.

      "C,D. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head (C). Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR (D)."

      I found that re-referencing C and D at the end of their statements made it look as though C precedes the "Relative mRNA expression" statement, and on a first read-through I thought the figure captions were backwards. I'd recommend reformatting here and throughout so that each figure letter consistently precedes only its relevant caption information, e.g.:

      "C. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head. D. Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR."

      We have now edited the legends as suggested.

      Reviewer #2 (Recommendations for the authors):

      Separately from the clarifications and limitations listed above, the authors could strengthen their study and the conclusions drawn if they could rescue the behavioural phenotype observed following knockdown of sNPF and RYamide. This could be achieved by injecting either sNPF or RYa peptide, independently or combined, following knockdown to validate the role of these peptides in promoting blood feeding in An. stephensi. Additionally, the apparent (but unclear) regionalized (or tissue-specific) knockdown of sNPF and RYamide transcripts could be visualized and verified by HCR in situ hybridization in knockdown animals (or by immunohistochemistry using antibodies specific for these two neuropeptides).

      In follow-up work, we are generating mutants and peptides for these candidates and are planning to conduct exactly the experiments the reviewer suggests.

      Reviewer #3 (Recommendations for the authors):

      The loss-of-function data suggest necessity but not sufficiency. Synthetic peptide injection in non-host seeking (blood-fed mated or juvenile) mosquitoes would provide direct evidence for peptide-induced behavioral activation. The lack of these experiments weakens the central claim of the paper that these neuropeptides directly promote blood feeding.

      As noted above, we plan to synthesise the peptide to test rescue in a mutant background and sufficiency.

      Some of the claims about knockdown efficiency and interpretation are conflicting; the authors dismiss Hairy and Prp as candidates due to 30-35% knockdown, yet base major conclusions on sNPF and RYamide knockdowns with comparable efficiencies (25-40%). This inconsistency should be addressed, or the justification for different thresholds should be clearly stated.

      We have not defined any specific knockdown efficiency thresholds in the manuscript, as these can vary considerably between genes, and in some cases even modest reductions can be sufficient to produce detectable phenotypes. For example, knockdown efficiencies as low as about 25-40% gave us observable phenotypes for sNPF and RYa RNAi (Figure S9B-G).

      No such phenotypes were observed for Hairy (30%) or Prp (35%) knockdowns. Either these genes are not involved in blood feeding, or the knockdown was not sufficient for these specific genes to induce phenotypes. We cannot distinguish between these scenarios.

      The observation that knockdown animals take smaller blood meals is interesting and could reflect a downstream effect of altered host-seeking or an independent physiological change. The relationship between meal size and host-seeking behavior should be clarified.

      We agree with the reviewer that the reduced meal size observed in sNPF and RYa knockdown animals could result from their inability to seek a host or due to an independent effect on blood meal intake. Unfortunately, we did not measure host-seeking in these animals. We plan to distinguish between these possibilities using mutants in future work.

      Several figures are difficult to interpret due to cluttered labeling and poorly distinguishable color schemes. Simplifying these and improving contrast (especially for co-housed vs. virgin conditions) would enhance readability.

      We regret that the reviewer found the figures difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition). Wherever mated females were used, we have now appended “(m)” to the annotations and consistently depicted these females with striped abdomens in all the schematics. We believe these changes will improve clarity and readability.

      The manuscript does not clearly justify the use of whole-brain RNA sequencing to identify peptides involved in metabolic or peripheral processes. Given that anticipatory feeding signals are often peripheral, the logic for brain transcriptomics should be explained.

      The reviewer is correct in pointing out that feeding signals could also emerge from peripheral tissues. Signals from these tissues – in response to both changing nutritional and reproductive states – are then integrated by the central brain to modulate feeding choices. For example, in Drosophila, increased protein intake is mediated by central brain circuitry, including circuits in the SEZ and central complex (Munch et al., 2022; Liu et al., 2017; Goldschmidt et al., 2023). In the context of mating, male-derived sex peptide further increases protein feeding by acting on dedicated central brain circuitry (Walker et al., 2015). We therefore focused on the central brain for our studies.

      The proposed model suggests brain-derived peptides initiate feeding, while gut peptides provide feedback. However, gut-specific knockdowns had no effect, undermining this hypothesis. Conversely, the authors also suggest abdominal involvement based on RNAi results. These contradictions need to be resolved into a consistent model.

      We thank the reviewer for raising this point and recognise their concern. Our reasons for invoking an involvement of the gut were two-fold:

      (1) We find increased sNPF transcript expression in the entero-endocrine cells of the midgut in blood-hungry females, which returns to baseline after a blood meal (Fig. 4L, M).

      (2) While the abdomen-only knockdowns did not affect blood feeding, every effective head knockdown that affected blood feeding also abolished abdominal transcript levels (Fig. S9C, F). (Achieving a head-only reduction proved impossible because (i) systemic dsRNA delivery inevitably reaches the abdomen and (ii) abdominal expression of both peptides is low, leaving little dynamic range for selective manipulation.) Consequently, we can only conclude the following: 1) that brain expression is required for the behaviour, 2) that we cannot exclude a contributory role for gut-derived sNPF. We have discussed this in lines 364-371.

      The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      We agree that functional validation of the receptors would strengthen the evidence for sNPF- and RYa-mediated control of blood feeding in An. stephensi. We selected these receptors based on sequence homology. It remains possible that sNPF neuropeptides activate more than one receptor, each modulating a distinct circuit, as shown for Drosophila Tachykinin (https://pmc.ncbi.nlm.nih.gov/articles/PMC10184743/). This would require a systematic characterisation and knockdown of each receptor to confirm its role. We are planning these experiments in the future.

      The authors compared the percentage changes in sugar-fed and blood-fed animals under sugar-sated or sugar-starved conditions. Figure 1F should reflect what was discussed in the results.

      Perhaps this concern stems from our representation of the data in Figure 1F? We have now edited the x-axis and revised its label from “choice of food” to “choice made” to better reflect which food the mosquitoes chose to take.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data because it does not capture the variability in the data.

      Minor issues:

      (1) The authors used mosquitoes with belly stripes to indicate mated females. To be consistent, the post-oviposition females should also have belly stripes.

      We thank the reviewer for pointing this out. We have now edited all the figures as suggested.

      (2) In the first paragraph on the right column of the second page, the authors state, "Since females took blood-meals regardless of their prior sugar-feeding status and only sugar-feeding was selectively suppressed by prior sugar access." Just because the well-fed animals ate less than the starved animals does not mean their feeding behavior was suppressed.

      Perhaps there has been a misunderstanding in the experimental setup of figure 1F, probably stemming from our data representation. The experiment is a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. We scored females only for the presence or absence of each meal type (blood or sugar) and did not quantify the amount consumed.

      (3) The figure legend for Figure 1A and the naming convention for different experimental groups are difficult to follow. A simplified or consistently abbreviated scheme would help readers navigate the figures and text.

      We regret that the reviewer found the figure difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition).

      (4) In the last paragraph of the Y-maze olfactory assay for host-seeking behaviour in An. stephensi in Methods, the authors state, "When testing blood-fed females, aged-matched sugar-fed females (bloodhungry) were included as positive controls where ever possible, with satisfactory results." The authors should explicitly describe what the criteria are for "satisfactory results".

      We apologise for the lack of clarity. We have now edited the statement to read:

      “When testing blood-fed females, age-matched sugar-fed females (blood-hungry) were included wherever possible as positive controls. These females consistently showed attraction to host cues, as expected.” See lines 786-790.

      (5) In the first paragraph of the dsRNA-mediated gene knockdown section in Methods, dsRNA against GFP is used as a negative control for the injection itself, but not for the potential off-target effect.

      We agree with the reviewer that dsGFP injections act as controls only for injection-related behavioural changes, and not for off-target effects of RNAi. We have now corrected the statement. See lines 919-920.

      To control for off-target effects, we could have designed multiple dsRNAs targeting different parts of a given gene. We regret not including these controls for potential off-target effects of dsRNAs injected.

      (6) Reference numbers 48, 89, and 90 are not complete citations.

      We thank the reviewer for spotting these. We have now corrected these citations.

    1. eLife Assessment

      This manuscript investigates inter-hemispheric interactions in the olfactory system of Xenopus tadpoles. Using a combination of electrophysiology, pharmacology, imaging, and uncaging, the study shows that transection of the contralateral nerve leads to larger odor responses in the unmanipulated hemisphere and implicates dopamine signaling in this process. The study uses a rich and sophisticated array of tools to investigate olfactory coding and uncovers valuable mechanisms of signaling. However, the data are incomplete, and a few of the conclusions are not well supported; the interpretation should be adjusted with some caveats, or additional experiments should be done to support these conclusions.

    2. Reviewer #1 (Public review):

      In this study, the authors investigate LFP responses to methionine in the olfactory system of the Xenopus tadpole. They show that this response is local to the glomerular layer, arises ipsilaterally, and is blocked by pharmacological blockade of AMPA and NMDA receptors, with little modulation during blockade of GABA-A receptors. They then show that this response is transiently enlarged following transection of the contralateral olfactory nerve, but not the optic lobe nerve. ROS levels, a marker of inflammation, were not affected by contralateral nerve transection, and LFP expansion was not affected by pharmacological blockade of ROS production. Imaging biased towards presynaptic terminals suggests that the enlargement of the LFP has a presynaptic component. A D2 antagonist increases the LFP size and variability in intact tadpoles, while a GABA-B antagonist does not. On this basis, the authors conclude that the increase driven by contralateral nerve transection is due to DA signaling.

      Overall, I found the array of techniques and approaches applied in this study to be creatively and effectively employed. However, several of the conclusions made in the Discussion are too strong, given the evidence presented. For example, the authors state that "The observed potentiation was not related to inflammatory mediators associated to injury, because it was caused by a release of the inhibition made by D2 dopamine receptor present in OSN axon terminals." This statement is too strong: the authors have shown that D2 receptors are sufficient to cause an increase in LFP, but not that they are required for the potentiation evoked by nerve transection. The right experiment here would be to remove the D2 receptors prior to transection and show that the potentiation is then abolished. In addition, the authors have not shown any data localizing D2 receptors to OSN axon terminals.

      Similarly, the authors state, "the onset of LFP changes detected in glomeruli is determined by glutamate release from OSNs." Again, the authors have shown that blockade of AMPA/NMDA receptors decreases the LFP, and that uncaging of glutamate can evoke small negative deflections, but not that the intact signal arises from glutamate release from OSNs. The conclusions about the in vivo contribution of this contralateral pathway are also rather speculative. Acute silencing of one hemisphere would likely provide more insight into the moment-to-moment contributions of bilateral signals to those recorded in one hemisphere.

    3. Author response:

      Thank you for your time and for considering our manuscript as a Reviewed Preprint. We also would like to thank Reviewer 1 for their evaluation of our manuscript.

      Here, we present a provisional response to reviewer comments, and following their suggestions we will make an effort to: (i) increase evidence for the role of dopamine in olfactory glomeruli and (ii) delineate the circuit mediating the observed potentiation. Next, we briefly describe the experiments that are in progress or will be performed to improve our paper.

      We will carry out immunostaining for tyrosine hydroxylase to confirm that dopamine can be released in the genetically labelled glomerulus. There is a lack of good commercial antibodies for Xenopus (we have already tried one, PA1-4679 from Thermo Fisher Scientific, and it did not work), but we will look for alternatives. In a previous set of experiments, we attempted to measure dopamine release in the glomerular layer by electroporating olfactory sensory neurons or olfactory bulb neurons with the dopamine sensors dLight1.1 (Addgene #111053) or dLight1.3 (Addgene #111056). In our hands, fluorescence signals were extremely weak and barely detectable. Similar results were obtained after electroporating the tectum or the rhombencephalon. We propose to repeat these experiments using a more sensitive sensor such as GRAB_DA2m. Other approaches, such as single-cell transcriptomics of olfactory sensory neurons, might be considered to confirm the expression of D2 receptors.

      We agree with the reviewer that we should obtain more lines of evidence in support of presynaptic inhibition mediated by D2 receptors. To gain insight into the bilateral circuit mediating the observed potentiation of glomerular responses, we are currently investigating the role of dorsolateral pallium neurons. In Xenopus tadpoles, the lateral pallium plays a role analogous to that of the olfactory cortex in amniotes. Preliminary observations show that neurons in this pallial region respond to ipsilateral stimulation of the olfactory epithelium and that, when this region is damaged, glomerular output in the contralateral hemisphere is potentiated. We aim to complete this set of experiments and include it in the paper, as we believe it clarifies the circuitry involved.

    1. eLife Assessment

      This valuable developmental study provides intriguing but incomplete evidence suggesting that, relative to adults, the enhancement of instrumental learning by Pavlovian bias is most pronounced in adolescence, while reward-induced memory enhancements are strongest in childhood. Although the authors tackle a key aspect of learning and motivation with rigorous experimental methods and sophisticated modeling techniques, there are substantial concerns about the absence of relevant analyses, the lack of accord between model-based and exploratory analyses, and the lack of an explanation for how the results cohere with inconsistent findings in the literature.

    2. Reviewer #1 (Public review):

      In this study, the authors aim to elucidate both how Pavlovian biases affect instrumental learning from childhood to adulthood, as well as how reward outcomes during learning influence incidental memory. While prior work has investigated both of these questions, findings have been mixed. The authors aim to contribute additional evidence to clarify the nature of developmental changes in these processes. Through a well-validated affective learning task and a large age-continuous sample of participants, the authors reveal that adolescents outperform children and adults when Pavlovian biases and instrumental learning are aligned, but that learning performance does not vary by age when they are misaligned. They also show that younger participants show greater memory sensitivity for images presented alongside rewards.

      The manuscript has notable strengths. The task was carefully designed and modified with a clever, developmentally appropriate cover story, and the large sample size (N = 174) means their study was better powered than many comparable developmental learning studies. The addition of the memory measure adds a novel component to the design. The authors transparently report their somewhat confusing findings.

      The manuscript also has weaknesses, which I describe in detail below.

      It was not entirely clear to me what central question the researchers aimed to address. They note that prior studies using a very similar learning task design have reported inconsistent findings, but they do not propose a reason for why these inconsistent findings may emerge nor do they test a plausible cause of them (in contrast, for example, Raab et al. 2024 explicitly tested the idea that developmental changes in inferences about controllability may explain age-related change in Pavlovian influences on learning). While the authors test a sample of participants that is very large compared to many developmental studies of reinforcement learning, this sample is much smaller than two prior developmental studies that have used the same learning task (and which the authors cite - Betts et al., 2020; Moutoussis et al., 2018). Thus, the overall goal seems to be to add an additional ~170 subjects of data to the existing literature, which isn't problematic per se, but doesn't do much to advance our theoretical understanding of learning across development. They happen to find a pattern of results that differs from all three prior studies, and it is not clear how to interpret this.

      Along those lines, the authors extend prior work by adding a memory manipulation to the task, in which trial-unique images were presented alongside reward outcomes. It was not clear to me whether the authors see the learning and memory questions as fundamentally connected or as two separate research questions that this paradigm allows them to address. The manuscript would potentially be more impactful if the authors integrated their discussion of these two ideas more. Did they have any a priori hypotheses about how Pavlovian biases may affect the encoding of incidentally presented images? Could heightened reward sensitivity explain both changes in learning and changes in memory? It was also not clear to me why the authors hypothesized that younger participants would demonstrate the greatest effects of reward on memory, when most of the introduction seems to suggest they might hypothesize an adolescent peak in both learning and memory.

      As stated above, while the task methods seemed sound, some of the analytic decisions are potentially problematic and/or require greater justification for the results of the study to be interpretable.

      Firstly, it is problematic not to include random participant slopes in the regression models. Not accounting for individual variation in the effects of interest may inflate Type I error rates. I would suggest that the authors start with the maximal model, or apply to the random effects the same model selection procedure they used to select the fixed effects.

      Secondly, the central learning finding - that adolescents demonstrate enhanced learning in Pavlovian-congruent conditions only - is interesting, but it is unclear why this is the case or how much should be made of this finding. The authors show that adolescents outperform others in the Pavlovian-congruent conditions but not the Pavlovian-incongruent conditions. However, this conclusion is made by analyzing the two conditions separately; they do not directly compare the strength of the adolescent peak across these conditions, which would be needed to draw this strong conclusion. Given that no prior study using the same learning design has found this, the authors should ensure that their evidence for it is strong before drawing firm conclusions.

      It was also not clear to me whether any of the RL models that the authors fit could potentially explain this pattern. Presumably, they need an algorithmic mechanism in which the Pavlovian bias is enhanced when it is rewarded. This seems potentially feasible to implement and could help explain the condition-specific performance boosts.

      I also have major concerns about the computational model-fitting results. While the authors seemingly follow a sound approach, the majority of the fitted lapse rates (Figure S10) are near 1. This suggests that for most participants, the best-fitting model is one in which choices are random. This may be why the authors do not observe age-related change in model parameters: for these subjects, the other parameter values are essentially meaningless since they contribute to the learned value estimate, which gets multiplied by a near-0 weight in the choice function. It is important that the authors clarify what is going on here. Is it the case that most of these subjects truly choose at random? It does seem from Figure 2A that there is extensive variability in performance. It might be helpful if the authors re-analyze their data, excluding participants who show no evidence of learning or of reward-seeking behavior. Alternatively, are there other biases that are not being accounted for (e.g., choice perseveration) that may contribute to the high lapse rates?
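      To make the lapse-rate concern concrete, here is a minimal sketch assuming a common lapse formulation; the manuscript's exact choice rule is not reproduced here. With probability 1 - lapse the agent follows a logistic softmax over its Go and No-Go action weights, and with probability lapse it responds at random. The function name and values are hypothetical. As the lapse rate approaches 1, the value-based term is scaled toward zero, so P(Go) collapses to 0.5 regardless of the learned values, which is why the other parameters become unidentifiable.

```python
import math


def choice_prob_go(q_go: float, q_nogo: float, lapse: float) -> float:
    """P(Go) under an assumed softmax-plus-lapse choice rule (hypothetical)."""
    # Value-based component: logistic softmax over the two action weights.
    softmax_go = 1.0 / (1.0 + math.exp(-(q_go - q_nogo)))
    # Mix with uniform random responding; (1 - lapse) weights the learned values.
    return (1.0 - lapse) * softmax_go + lapse * 0.5


for lapse in (0.1, 0.5, 0.95):
    # Compare a large value difference with no value difference.
    strong = choice_prob_go(3.0, 0.0, lapse)
    none = choice_prob_go(0.0, 0.0, lapse)
    print(f"lapse={lapse}: P(Go|strong)={strong:.3f}, P(Go|none)={none:.3f}")
```

      With a lapse rate near 1, even a large value difference barely moves P(Go) above chance, so gain/loss sensitivity, learning rate, and Pavlovian bias have little influence on the likelihood.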

      Parameter recovery also looks poor, particularly for gain and loss sensitivity, the lapse rate, and the Pavlovian bias, several of which are parameters of interest. As noted above, this may be because many of the simulations were conducted with lapse rates sampled from the empirical distribution. It would be helpful for the authors to (a) plot parameter recoverability separately for high and low lapse rates and (b) report the recoverability correlation for each parameter separately.

      Finally, many of the analytic decisions made regarding the memory analyses were confusing and merit further justification.

      (1) First, it seems as though the authors only analyze memory data from trials where participants "could gain a reward". Does this mean only half of the memory trials were included in the analyses? What about memory as a function of whether participants made a "correct" response? Or a correct x reward interaction effect?

      (2) The RPE analysis overcomes this issue by including all trials, but the trial-wise RPEs are potentially not informative given the lapse rate issue described above.

      (3) The authors exclude correct guesses but include incorrect guesses. Is this common practice in the memory literature? It seems like this could introduce some bias into the results, especially if there are age-related changes in meta-memory.

      (4) Participants provided a continuum of confidence ratings, but the authors computed d' by discretizing memory into 'correct' or 'incorrect'. A more sensitive approach could compute memory ROC curves taking into account the full confidence data (e.g., Brady et al., 2020).

      (5) The learning and memory tradeoff idea is interesting, but it was not clear to me what variables went into that regression model.

    3. Reviewer #2 (Public review):

      The authors of this study set out to investigate whether adolescents demonstrate enhanced instrumental learning compared to children and adults, particularly when their natural instincts align with the actions required in a learning task, using the Affective Go/No-Go Task. Their aim was to explore how motivational drives, such as sensitivity to rewards versus avoiding losses, and the congruence between automatic responses to cues and deliberate actions (termed Pavlovian-congruency) influence learning across development, while also examining incidental memory enhancements tied to positive outcomes. Additionally, they sought to uncover the cognitive mechanisms underlying these age-related differences through behavioral analyses and reinforcement learning models.

      The study's major strengths lie in its rigorous methodological approach and comprehensive analysis. The use of mixed-effects logistic regression and beta-binomial regression models, with careful comparison of nested models to identify the best fit (e.g., a significant ΔBIC of 19), provides a robust framework for assessing age-related effects on learning accuracy. The task design, which separates action (pressing a key or holding back) from outcome type (earning money or avoiding a loss) across four door cues, effectively isolates these factors, allowing the authors to highlight adolescent-specific advantages in Pavlovian-congruent conditions (e.g., Go to Win and No-Go to Avoid Loss), supported by significant quadratic age interactions (p < .001). The inclusion of reaction time data and a behavioral metric of Pavlovian bias further strengthens the evidence, showing adolescents' faster responses and greater reliance on instinctual cues in congruent scenarios. The exploration of incidental memory, with a clear reward memory bias in younger participants (p < .001), adds a valuable dimension, suggesting a learning-memory trade-off that enriches the study's scope. However, weaknesses include minor inconsistencies, such as the reinforcement learning model's Pavlovian bias parameter not reflecting an adolescent enhancement despite behavioral evidence, and a weak correlation between learning and memory accuracy (r = -.17), which may indicate incomplete integration of these processes.

      The authors largely achieved their aims, with the results providing convincing support for their conclusion that Pavlovian-congruency boosts instrumental learning in adolescence. The significant quadratic age effects on overall learning accuracy (p = .001) and in congruent conditions (e.g., p = .01 for Go to Win), alongside faster reaction times in these scenarios, convincingly demonstrate an adolescent peak in performance. While the reinforcement learning model's lack of an adolescent-specific Pavlovian bias parameter introduces a slight caveat, the behavioral and statistical evidence collectively align with the hypothesis, suggesting that adolescents leverage their natural instincts more effectively when these align with task demands. The incidental memory findings, showing younger participants' enhanced recall for reward-paired images, partially support the secondary aim, though the trade-off with learning accuracy warrants further exploration.
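      The adolescent-peak argument rests on the quadratic age term. Below is a minimal sketch that assumes a model of the form logit(accuracy) = β0 + β1·age + β2·age². The fitted curve has a maximum only when β2 < 0, and that maximum sits at age = −β1 / (2β2). The function name and the coefficients in the usage line are hypothetical and are not values from the study. If age was centered or scaled before fitting, the result is in those units and must be transformed back.

```python
def quadratic_peak(b_age: float, b_age2: float) -> float:
    """Age at which b0 + b_age*age + b_age2*age**2 reaches its maximum.

    A maximum exists only when the quadratic coefficient is negative.
    """
    if b_age2 >= 0:
        raise ValueError("no interior maximum: quadratic coefficient must be negative")
    # Set the derivative b_age + 2*b_age2*age to zero and solve for age.
    return -b_age / (2.0 * b_age2)


# Hypothetical coefficients on a raw-years age scale.
peak_age = quadratic_peak(3.2, -0.1)  # about 16
```

      A significant negative β2 establishes curvature. Checking that the implied peak falls inside the sampled age range is a useful complement to that significance test.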

      This work is likely to have an important impact on the field, offering valuable insights into developmental differences in learning and memory that could influence educational practices and psychological interventions tailored to adolescents. The methods, particularly the task's orthogonal design and probabilistic feedback, are useful to the community for studying motivation and cognition across ages, while the detailed regression analyses and reinforcement learning approach provide a solid foundation for future replication and extension. The data, including trial-by-trial accuracy and memory performance, are openly shareable, enhancing their utility for researchers exploring similar questions, though refining the model-parameter alignment could strengthen its broader applicability.

    4. Author response:

      We thank both reviewers for their thoughtful and constructive comments. To address this feedback, we plan to do the following:

      Questions/Hypotheses: We will clarify the study’s motivation, central questions, and our hypotheses, with a particular focus on the integration across learning and memory.

      Methods: To improve clarity and transparency, we will expand the Methods section and modify relevant figures to provide more explanation of the task, our decisions regarding data analysis approaches, and how they address our questions and hypotheses.

      Learning Behavioral Analysis: As suggested by reviewers, we will fit and compare mixed-effects models with the maximal random effects structure for the within-subject variables and their interactions. We may simplify this structure as the data justify (i.e., if we encounter convergence problems or the random effects explain minimal variance). In the revision, we will also directly compare the adolescent peaks in performance across the conditions to support our conclusion that adolescents outperform people of other ages in the Pavlovian-congruent conditions.

      Computational Modeling: We appreciate the reviewers’ close attention to the computational modeling methods, as it identified a small error in the reporting of the formulas we implemented. Specifically, the preprint’s softmax function had an error and should be printed as:

      This correct parameterization can be seen at line 48 of the public repository accompanying Huys (2018). As such, rather than indicating random choices, lapse-rate estimates close to one represent expected goal-directed behavior. That said, we acknowledge that parameter recovery indicated potential identifiability issues for some parameters, especially those with extreme values. We appreciate the reviewer’s suggestion to examine “learners” separately from “non-learners,” as has been done in prior work with adults (Cavanagh et al., 2013; Guitart-Masip et al., 2012). In this revision, we will investigate whether behavioral differences between learners and non-learners, among other potential explanations, account for the relatively poor parameter recovery. We will also explain more about why we selected these RL models, including how the Pavlovian policy works and why it adequately captures participants’ behavior.
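      The corrected equation itself does not appear in this text, so the following is a hedged illustration only. The sketch assumes a two-action softmax over the action weights W(go, s) and W(no-go, s), mixed with uniform responding. In this assumed form, ξ weights the softmax and (1 − ξ) weights a 50/50 choice, so ξ close to one means choices follow the learned weights. That matches the authors' description. Under the opposite convention, where ξ weights the noise term, the same estimate would instead mean near-random responding, which is why the printed form matters. The function name is hypothetical.

```python
import math


def p_go(w_go: float, w_nogo: float, xi: float) -> float:
    """Probability of a Go response under a softmax mixed with uniform noise.

    ASSUMED parameterization (not taken from the authors' code):
    p(go) = xi * softmax(W)[go] + (1 - xi) / 2
    """
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    # A two-action softmax reduces to a logistic function of the weight difference.
    softmax_go = 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))
    return xi * softmax_go + (1.0 - xi) / 2.0
```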

      Memory Behavioral Analysis: At the reviewers’ suggestion, we will expand our analysis of the learning-memory trade-off to fully explore this possible explanation. We will also explore the additional analyses that the reviewers suggested (e.g., ROC curves accounting for confidence ratings, analysis of correct vs. incorrect responses).

      We are confident that these revisions will strengthen the work, and we are grateful to the reviewers for their thorough, insightful feedback. In the coming revision, we will provide a detailed point-by-point response to all comments and questions.

      References

      Cavanagh, J. F., Eisenberg, I., Guitart-Masip, M., Huys, Q., & Frank, M. J. (2013). Frontal Theta Overrides Pavlovian Learning Biases. The Journal of Neuroscience, 33(19), 8541–8548. https://doi.org/10.1523/JNEUROSCI.5754-12.2013

      Guitart-Masip, M., Huys, Q. J. M., Fuentemilla, L., Dayan, P., Duzel, E., & Dolan, R. J. (2012). Go and no-go learning in reward and punishment: Interactions between affect and effect. NeuroImage, 62(1), 154–166. https://doi.org/10.1016/j.neuroimage.2012.04.024

      Huys, Q. J. M. (2018). Bayesian Approaches to Learning and Decision-Making. In Computational Psychiatry (pp. 247–271). Elsevier. https://doi.org/10.1016/B978-0-12-809825-7.00010-9

    1. eLife Assessment

      This study provides a valuable contribution to understanding the functional and molecular organization of the medial nucleus accumbens shell in feeding. Using in vivo imaging, optogenetics, and genetic engineering, the authors present solid evidence for a rostro-caudal gradient in D1-SPN activity that refines earlier pharmacological models. The identification of Stard5 and Peg10 as molecular markers and the creation of a Stard5-Flp line represent meaningful advances for future circuit-specific studies. While stronger integration of molecular and functional results and additional analyses of other Stard5-expressing cell types (e.g., D2-SPNs, interneurons) would enhance completeness, the overall methodological rigor and convergence of findings make this a well-executed and informative study. This will be of interest to those interested in brain circuits, reward, emotion, and feeding behavior.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines how different parts of the brain's reward system regulate eating behavior. The authors focus on the medial shell of the nucleus accumbens, a region known to influence pleasure and motivation. They find that nerve cells in the front (rostral) portion of this region are inhibited during eating, and when artificially activated, they reduce food intake. In contrast, similar cells at the back (caudal) are excited during eating but do not suppress feeding. The team also identifies a molecular marker, Stard5, that selectively labels the rostral hotspot and enables new genetic tools to study it. These findings clarify how specific circuits in the brain control hedonic feeding, providing new entry points to understand and potentially treat conditions such as overeating and obesity.

      Strengths:

      (1) Conceptual advance: The work convincingly establishes a rostro-caudal gradient within the medNAcSh, clarifying earlier pharmacological studies with modern circuit-level and genetic approaches.

      (2) Methodological rigor: The combination of fiber photometry, optogenetics, CRISPR-Cas9 genetic engineering, histology, FISH, scRNA-seq, and novel mouse genetics adds robustness, with complementary approaches converging on the central claim.

      (3) Innovation: The generation of a Stard5-Flp line is a valuable resource that will enable precise interrogation of the rostral hotspot in future studies.

      (4) Specificity of findings: The dissociation between appetitive and aversive conditions strengthens the interpretation that the observed gradient is restricted to feeding.

      Weaknesses and points for clarification

      (1) Role of D2-SPNs: Since D1 and D2 pathways often play opposing roles in feeding, testing or discussing D2-SPN contributions would provide an important control and context. Because Stard5 is reported to be expressed in both D1- and D2-MSNs, this seems at odds with an exclusive role for D1R-MSNs in authorizing food intake.

      (2) Behavioral analyses:

      a) In Figure 2, group differences in consumption appear uneven; additional analyses (e.g., lick counts across blocks and session totals) would strengthen interpretation.

      b) The design and contribution of aversive assays to the main conclusions remain somewhat unclear and could be better justified.

      c) The scope of behavior is mainly limited to consumption; testing related domains (motivation, reward valuation, and extinction) could broaden the significance.

      (3) Molecular profiling:

      a) Stard5 expression is present in both D1- and D2-SPNs; comparisons to bulk calcium signals and quantification of percentages across rostral and caudal cells would be helpful. The authors should establish whether these cells also express SerpinB2, an established marker of LH-projecting neurons.

      b) Verification of the Stard5-2A-Flp line (specificity, overlap with immunomarkers) should be documented more thoroughly.

      c) The molecular analysis is restricted to a small set of genes; broader spatial transcriptomics could uncover additional candidate markers. See also above.

    3. Reviewer #2 (Public review):

      Summary:

      Marinescu et al. combine in vivo imaging with circuit-specific optogenetic manipulation to characterize the anatomic heterogeneity of the medial nucleus accumbens shell in the control of food intake. They demonstrate that the inhibitory influence of dopamine D1 receptor-expressing neurons of the medial shell on food intake decreases along a rostro-caudal gradient, while both rostral and caudal subpopulations similarly control aversion. They then identify Stard5 and Peg10 as molecular markers of the rostral and caudal subregions, respectively. Through the development of a new mouse line expressing the flippase under the promoter of Stard5, they demonstrate that Stard5-positive neurons recapitulate the activity of D1-positive neurons of the rostral shell in response to food consumption and aversive stimuli.

      Strengths:

      This study brings important findings for the anatomical and functional characterization of the brain reward system and its implications in physiological and pathological feeding behavior. It is a well-designed study, technically sound, with clear and reliable effects. The generation of the new Stard5-Flp line will be a valuable tool for further investigations. The paper is very well written, and the discussion is very interesting, addresses limitations of the findings, and proposes relevant future directions.

      Weaknesses:

      At this stage, identification and characterization of the activity of Stard5-positive neurons is a bit disconnected from the rest of the paper, as this population encompasses both D1- and D2-positive neurons as well as interneurons. While they display a similar response pattern as D1-neurons, it remains to be determined whether their manipulation would result in comparable behavioral outcomes.

    1. eLife Assessment

      This study presents a valuable in-depth comparison of statistical methods for the analysis of ecological time series data, and shows that different analyses can generate different conclusions, emphasizing the importance of carefully choosing methods and of reporting methodological details. The evidence supporting the claims, based on simulated data for a two-species ecosystem, is solid, although testing on more complex datasets could be of further benefit. This paper should be of broad interest to researchers in ecology.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript investigates methods for the analysis of time series data, in particular ecological time series. Such data can be analyzed using a myriad of approaches, with choices being made in both the statistical test performed and the generation of artificial datasets for comparison. The simulated data is for a two-species ecosystem. The main finding is that the rates of false positives and negatives strongly depend on the choices made during analysis, and that no one methodology is an optimal choice for all contexts. A few different scenarios were analyzed, including analysis with a time lag and communities with different species ratios.

      Strengths:

      The paper sets up a clear problem to motivate the study. The writing is easy to follow, given the dense subject matter. A broad range of approaches was compared for both statistical tests and surrogate data generation. The appendix will be helpful for readers, especially those readers hoping to implement these findings into their own work. The topic of the manuscript should be of interest to many readers, and the authors have put in extra effort to make the writing as clear as possible.

      Weaknesses:

      The main conclusions are rather unsatisfying: "use more than one method of analysis", "be more transparent in how testing is done", and there is a "need for humility when drawing scientific conclusions". In fact, the findings are not instructions for how to analyze data, but instead highlight the extreme dependence of the interpretation of results on choices made during analysis. The conclusions reached in this study would be of interest to a specialized subset of researchers focused on the biostatistics of ecological data. Ending the article with a few specific recommendations for how to apply these conclusions to a broad range of datasets would increase the impact of the work.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript tackles an important and often neglected aspect of time-series analysis in ecology - the multitude of "small" methodological choices that can alter outcomes. The findings are solid, though they may be limited in terms of generalizability, due to the simple use case tested.

      Strengths:

      (1) Comprehensive Methodological Benchmarking:

      The study systematically evaluates 30 test variants (5 correlation statistics × 6 surrogate methods), which is commendable and provides a broad view of methodological behavior.
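      The sketch below makes the "statistic × surrogate" design concrete by pairing one statistic (Pearson correlation) with one surrogate method (random circular shifts of one series). Both choices and the helper names are illustrative assumptions, not the manuscript's exact implementations. Which series gets shifted (the "template") is the kind of choice the reviewers note can change false-positive rates in directional tests.

```python
import numpy as np


def circular_shift_surrogates(y, n_surr, rng, min_shift=1):
    """Surrogates of y made by rotating it by random non-zero offsets."""
    n = len(y)
    # Offsets drawn from [min_shift, n - min_shift]; the upper bound of integers() is exclusive.
    shifts = rng.integers(min_shift, n - min_shift + 1, size=n_surr)
    return np.stack([np.roll(y, s) for s in shifts])


def surrogate_test(x, y, n_surr=999, seed=0):
    """One-sided surrogate test for positive Pearson correlation between x and y.

    Here y is the template: only y is shifted, and x is left intact.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(x, s)[0, 1]
                     for s in circular_shift_surrogates(y, n_surr, rng)])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (n_surr + 1)
    return observed, p_value
```

      Circular shifts preserve each series' autocorrelation. Shuffling individual time points destroys it, which tends to make the null too narrow and inflate false positives for autocorrelated ecological series.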

      (2) Important Practical Recommendations:

      The manuscript provides valuable real-world guidance, such as the superiority of tailored lags over fixed lags, the risks of using shuffling-based nulls, and the importance of selecting appropriate surrogate templates for directional tests.

      (3) Novel Insights into System Dependence:

      A key contribution is the demonstration that test results can vary dramatically with system state (e.g., initial conditions or abundance asymmetries), even when interaction parameters remain constant. This highlights a real-world issue for ecological inference.

      (4) Clarification of Surrogate Template Effects:

      The study uncovers a rarely discussed but critical issue: that the choice of which variable to surrogate in directional tests (e.g., convergent cross mapping) can drastically affect false-positive rates.

      (5) Lag Selection Analysis:

      The comparison of lag selection methods is a valuable addition, offering a clear takeaway that fixed-lag strategies can severely inflate false positives and that tailored-lag approaches are preferred.

      (6) Transparency and Reproducibility Focus:

      The authors advocate for full methodological transparency, encouraging researchers to report all analytical choices and test multiple methods.

      Weaknesses / Areas for Improvement:

      (1) Limited Model Generality:

      The study relies solely on two-species systems and two types of competitive dynamics. This limits the ecological realism and generalizability of the findings. It's unclear how well the results would transfer to more complex ecosystems or interaction types (e.g., predator-prey, mutualism, or chaotic systems).

      (2) Method Description Clarity:

      Some method descriptions are too terse, and table references are mislabeled (e.g., Table 1 vs. Table 2 confusion). This reduces reproducibility and clarity for readers unfamiliar with the specific tests.

      (3) Insufficient Discussion of Broader Applicability:

      While the pairwise test setup justifies two-species models, the authors should more explicitly address whether the observed test sensitivities (e.g., effect of system state, template choice) are expected to hold in multi-species or networked settings.

      (4) Lack of Practical Summary:

      The paper offers great insights, but currently spreads recommendations throughout the text. A dedicated section or table summarizing "Best Practices" would increase accessibility and application by practitioners.

      (5) No Real-World Validation:

      The work is based entirely on simulation. Including or referencing an empirical case study would help illustrate how these methodological choices play out in actual ecological datasets.

    1. eLife Assessment

      This important work employed a recent, functional muscle network analysis for evaluating rehabilitation outcomes in post-stroke patients. While the research direction is relevant and suggests the need for further investigation, the strength of evidence supporting the claims is incomplete. Muscle interactions can serve as biomarkers, but improvements in function are not directly demonstrated, and the method's robustness is not benchmarked against existing approaches.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses an important clinical challenge by proposing muscle network analysis as a tool to evaluate rehabilitation outcomes. The research direction is relevant, and the findings suggest further research. The strength of evidence supporting the claims is, however, limited: the improvements in function are not directly demonstrated, the robustness of the method is not benchmarked against already published approaches, and key terminology is not clearly defined, which reduces the clarity and impact of the work.

      Comments:

      There are several aspects of the current work that require clarification and improvement, both from a methodological and a conceptual standpoint.

      First, the actual improvements associated with the rehabilitation protocol remain unclear. While the authors report certain quantitative metrics, the study lacks more direct evidence of functional gains. Typically, rehabilitation interventions are strengthened by complementary material (e.g., videos or case examples) that clearly demonstrate improvements in activities of daily living. Including such evidence would make the findings more compelling.

      Second, the claim that the proposed muscle network analysis is robust is not sufficiently substantiated. The method is introduced without adequate reference to, or comparison with, the extensive literature that has proposed alternative metrics. It is also not evident whether a simpler analysis (e.g., EMG amplitude) might produce similar results. To highlight the added value of the proposed method, it would be important to benchmark it against established approaches. This would help clarify its specific advantages and potential applications. Moreover, several studies have shown very good outcomes when using AI and latent manifold analyses in patients with neural lesions. Interpreting the latent space appears even easier than interpreting muscle networks, as the manifolds provide a simple encoding-decoding representation of what the patient can still perform and what they can no longer do.

      Third, the terminology used throughout the manuscript is sometimes ambiguous. A key example is the distinction made between "functional" and "redundant" synergies. The abstract states: "Notably, we identified a shift from redundancy to synergy in muscle coordination as a hallmark of effective rehabilitation - a transformation supported by a more precise quantification of treatment outcomes."

      However, in motor control research, redundancy is not typically seen as maladaptive. Rather, it is a fundamental property of the CNS, allowing the same motor task to be achieved through different patterns of muscle activity (e.g., alternative motor unit recruitment strategies). This redundancy provides flexibility and robustness, particularly under fatiguing conditions, where new synergies often emerge. Several studies have emphasized this adaptive role of redundancy. Thus, if the authors intend to use "redundancy" differently, it is essential to define the term explicitly and justify its use to avoid misinterpretation.

    3. Reviewer #2 (Public review):

      Summary:

      This study analyzes muscle interactions in post-stroke patients undergoing rehabilitation, using information-theoretic and network analysis tools applied to sEMG signals with task performance measurements. The authors identified patterns of muscle interaction that correlate well with therapeutic measures and could potentially be used to stratify patients and better evaluate the effectiveness of rehabilitation.

      However, I found that the Methods and Materials section, as it stands, lacks sufficient detail and clarity for me to fully understand and evaluate the quality of the method. Below, I outline my main points of concern, which I hope the authors will address in a revision to improve the quality of the Methods section. I would also like to note that the methods appear to be largely based on a previous paper by the authors (O'Reilly & Delis, 2024), but I was unable to resolve my questions after consulting that work.

      I understand the general procedure of the method to be: (1) defining a connectivity matrix, (2) refining that matrix using network analysis methods, and (3) applying a lower-dimensional decomposition to the refined matrix, which defines the sub-component of muscle interaction. However, there are a few steps not fully explained in the text.

      (1) The muscle network is defined as the connectivity matrix A. Is each entry in A defined by the co-information? Is this quantity estimated for each time point of the sEMG signal and task variable? Given that there are only 10 repetitions of the measurement for each task, I do not fully understand how this is sufficient for estimating a quantity involving mutual information.
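      For three variables, co-information can be written as I(X;Y;Z) = I(X;Y) − I(X;Y|Z), which expands into a sum of joint entropies. The sketch below uses plug-in entropy estimates on discrete data. It follows the common sign convention in which positive values indicate net redundancy and negative values indicate net synergy. sEMG amplitudes are continuous, so a real analysis would need a different estimator (for example, binning or a Gaussian-copula approach), and nothing here is taken from the authors' code. Plug-in estimates are biased when samples are few, which is why the number of repetitions behind each estimate matters.

```python
import math
from collections import Counter


def entropy(*cols):
    """Plug-in (maximum-likelihood) joint entropy, in bits, of discrete columns."""
    joint = list(zip(*cols))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def co_information(x, y, z):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z), written in terms of joint entropies."""
    return (entropy(x) + entropy(y) + entropy(z)
            - entropy(x, y) - entropy(x, z) - entropy(y, z)
            + entropy(x, y, z))
```

      Two classic cases illustrate the sign. When Z = X XOR Y for independent bits, co-information is −1 bit (pure synergy). Three identical bits give +1 bit (pure redundancy).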

      In the previous paper (O'Reilly & Delis, 2024), the authors initially defined the co-information (Equation 1.3) but then referred to mutual information (MI) in the subsequent text, which I found confusing. In addition, while the matrix A is symmetrical, it should not be orthogonal (the authors wrote AᵀA = I) unless some additional constraint was imposed?
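      The reviewer's point about orthogonality can be checked numerically, because symmetry (A = Aᵀ) does not imply AᵀA = I. The small matrix below holds hypothetical symmetric interaction weights, not data from the study.

```python
import numpy as np

# Hypothetical symmetric interaction weights between three muscles.
A = np.array([[0.0, 0.4, 0.1],
              [0.4, 0.0, 0.3],
              [0.1, 0.3, 0.0]])

is_symmetric = np.allclose(A, A.T)                 # True
is_orthogonal = np.allclose(A.T @ A, np.eye(3))    # False: diagonal of A.T @ A is far from 1
```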

      (2) The authors should clarify what the following statement means: "Where a muscle interaction was determined to be net redundant/synergistic, their corresponding network edge in the other muscle network was set to zero."
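      The authors should confirm or correct the following reading, which is offered here only as an assumption. Under this reading, a signed co-information matrix is split into two non-negative networks, so that each muscle pair contributes an edge to either the redundancy network or the synergy network, never both. A minimal sketch of that reading:

```python
import numpy as np


def split_networks(coinfo):
    """Split a signed co-information matrix into redundancy and synergy networks.

    ASSUMED sign convention: positive = net redundant, negative = net synergistic.
    Each edge is kept in exactly one network; its counterpart is set to zero.
    """
    coinfo = np.asarray(coinfo, dtype=float)
    redundant = np.where(coinfo > 0, coinfo, 0.0)
    synergistic = np.where(coinfo < 0, -coinfo, 0.0)
    return redundant, synergistic
```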

      (3) It should be clarified what the 'm' values are in Equation 1.1. Are these the co-information values after the sparsification and applying the Louvain algorithm to the matrix 'A'? Furthermore, since each task will yield a different co-information value, how is the information from different tasks (r) being combined here?

      (4) In general, I recommend improving the clarity of the Methods section, particularly by being more precise in defining the quantities that are being calculated. For example, the adjacency matrix should be defined clearly using co-information at the beginning, and explain how it is changed/used throughout the rest of the section.

      (5) In the previous paper (O'Reilly & Delis, 2024), the authors applied a tensor decomposition to the interaction matrix and extracted both the spatial and temporal factors. In the current work, the authors simply concatenated the temporal signals and only chose to extract the spatial mode instead. The authors should clarify this choice.

    1. eLife Assessment

      The authors collected time-course RNA-seq data from four tree species in natural environments and analyzed seasonal patterns of gene expression. This fundamental study substantially advances our understanding of how seasonal environments shape gene expression. The evolutionary effects of seasonal environments on gene expression are rarely studied at this scale and the dataset is extensive. The evidence supporting the conclusions is compelling, with caveats and limitations clearly described. The work will be of broad interest to colleagues studying evolution and gene expression.

    2. Reviewer #2 (Public review):

      This study investigates how seasonal environments shape the evolution of gene expression by analyzing two-year time-series transcriptomes from leaves and buds of four Fagaceae tree species. The revised manuscript incorporates additional data and analyses that directly address earlier concerns about sampling design and environmental variation, thereby strengthening the robustness of the conclusions.

      The major strengths of this work are the scale and quality of the dataset, the integration of genome assemblies with time-series transcriptomics, and the careful analyses showing that winter bud expression is strongly conserved across species. The additional samples and re-analyses demonstrate convincingly that these results are not artifacts of sampling period or site differences. The study also links gene expression dynamics to phenological observations and frames its findings in relation to broader evolutionary concepts such as phenological synchrony and the developmental hourglass model.

      Remaining limitations include the absence of direct mechanistic analyses of cis-regulatory and chromatin-level processes, the relatively coarse resolution of phenological trait measurements, and the weak association between seasonal expression divergence and sequence divergence. Importantly, these limitations are now explicitly acknowledged in the revised Discussion and framed as directions for future research.

      Overall, the authors have substantially achieved their aims. This revised version represents a robust and convincing contribution that provides valuable data resources and conceptual insights into how seasonal environments constrain and shape gene expression. It will be of interest not only to evolutionary biologists and plant scientists, but also to researchers considering the broader role of environmental cycles in gene regulatory evolution.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      The authors performed genome assemblies for two Fagaceae species and collected transcriptome data from four natural tree species every month over two years. They identified seasonal gene expression patterns and further analyzed species-specific differences.

      Strengths:

      The study of gene expression patterns in natural environments, as opposed to controlled chambers, is gaining increasing attention. The authors collected RNA-seq data monthly for two years from four tree species and analyzed seasonal expression patterns. The data are novel. The authors could revise the manuscript to emphasize seasonal expression patterns in three species (with one additional species having more limited data). Furthermore, the chromosome-scale genome assemblies for the two Fagaceae species represent valuable resources, although the authors did not cite existing assemblies from closely related species.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Comment: The study design has a fundamental flaw regarding the evaluation of genetic or evolutionary effects. As a basic principle in biology, phenotypes, including gene expression levels, are influenced by genetics, environmental factors, and their interaction. This principle is well-established in quantitative genetics.

      In this study, the four species were sampled from three different sites (see Materials and Methods, lines 543-546), and additionally, two species were sampled from 2019-2021, while the other two were sampled from 2021-2023 (see Figure S2). This critical detail should be clearly described in the Results and Materials and Methods. Due to these variations in sampling sites and periods, environmental conditions are not uniform across species.

      Even in studies conducted in natural environments, there are ways to design experiments that allow genetic effects to be evaluated. For example, by studying co-occurring species, or through transplant experiments, or in common gardens. To illustrate the issue, imagine an experiment where clones of a single species were sampled from three sites and two time periods, similar to the current design. RNA-seq analysis would likely detect differences that could qualitatively resemble those reported in this manuscript.

      One example is in line 197, where genus-specific expression patterns are mentioned. While it may be true that the authors' conclusions (e.g., winter synchronization, phylogenetic constraints) reflect real biological trends, these conclusions are also predictable even without empirical data, and the current dataset does not provide quantitative support.

      If the authors can present a valid method to disentangle genetic and environmental effects from their dataset, that would significantly strengthen the manuscript. However, I do not believe the current study design is suitable for this purpose.

      Unless these issues are addressed, the use of the term "evolution" is inappropriate in this context. The title should be revised, and the result sections starting from "Peak months distribution..." should be either removed or fundamentally revised. The entire Discussion section, which is based on evolutionary interpretation, should be deleted in its current form.

      If the authors still wish to explore genetic or evolutionary analyses, the pair of L. edulis and L. glaber, which were sampled at the same site and over the same period, might be used to analyze "seasonal gene expression divergence in relation to sequence divergence." Nevertheless, the manuscript would benefit from focusing on seasonal expression patterns without framing the study in evolutionary terms.

      We sincerely thank the reviewer for the detailed and thoughtful comments. We fully recognize the importance of carefully distinguishing genetic and environmental contributions in transcriptomic studies, particularly when addressing evolutionary questions. The reviewer identified two major concerns regarding our study design: (1) the use of different monitoring periods across species, and (2) the use of samples collected from different study sites. We addressed both concerns with additional analyses using 112 new samples and now present new evidence that supports the robustness of our conclusions.

      (1) Monitoring period variation does not bias our conclusions

      To address concerns about the differing monitoring periods, we added new RNA-seq data (42 bud and 42 leaf samples for L. glaber, and 14 bud and 14 leaf samples for L. edulis) collected from November 2021 to November 2022, enabling direct comparison across species within a consistent timeframe. Hierarchical clustering of this expanded dataset (Fig. S6) yielded results consistent with our original findings: winter-collected samples cluster together regardless of species identity. This strongly supports our conclusion that the seasonal synchrony observed in winter is not an artifact of the monitoring period and demonstrates the robustness of our conclusions across datasets.
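      The clustering check can be illustrated with a minimal sketch. It assumes samples are clustered on 1 − Pearson correlation across genes with average linkage, a common choice but not necessarily the authors'. The data are synthetic: each sample is a shared seasonal profile plus a smaller species-specific offset and noise. The species labels are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_samples(expr, n_clusters=2):
    """Cluster samples (rows) by 1 - Pearson correlation using average linkage."""
    Z = linkage(expr, method="average", metric="correlation")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


rng = np.random.default_rng(0)
n_genes = 200
season = {"winter": rng.normal(size=n_genes), "summer": rng.normal(size=n_genes)}
species = {sp: rng.normal(scale=0.3, size=n_genes) for sp in ("sp_A", "sp_B")}

# Two replicate samples per (season, species) combination.
meta = [(s, sp) for s in ("winter", "summer") for sp in ("sp_A", "sp_B") for _ in range(2)]
expr = np.array([season[s] + species[sp] + rng.normal(scale=0.1, size=n_genes)
                 for s, sp in meta])
clusters = cluster_samples(expr)
```

      In this synthetic setup the seasonal signal dominates, so the two-cluster cut groups samples by season rather than by species, which is the pattern the response describes for winter samples. Real data would also need normalization and gene filtering, and both are omitted here.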

      (2) Site variation is limited and does not confound our findings

      Although the study included three sites, two of them (Imajuku and Ito Campus) are only 7.3 km apart, share nearly identical temperature profiles (see Fig. S2), and are located at the edge of similar evergreen broadleaf forests. Only Q. acuta was sampled from a higher-altitude, cooler site. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      (3) Justification for our approach in natural systems

      We agree with the reviewer that experimental approaches such as common gardens, reciprocal transplants, and the use of co-occurring species are valuable for disentangling genetic and environmental effects. In fact, we have previously implemented such designs in studies using the perennial herb Arabidopsis halleri (Komoto et al., 2022, https://doi.org/10.1111/pce.14716) and clonal Someiyoshino cherry trees (Miyawaki-Kuwakado et al., 2024, https://doi.org/10.1002/ppp3.10548) to examine environmental effects on gene expression. However, extending these approaches to long-lived tree species in diverse natural ecosystems poses significant logistical and biological challenges. In this study, we addressed this limitation by including three co-occurring species at the same site, which allowed us to evaluate interspecific differences under comparable environmental conditions. Importantly, even when we limited our analyses to these co-occurring species, the results remained consistent, indicating that the observed variation in transcriptomic profiles cannot be attributed to environmental factors alone and likely reflects underlying genetic influences.

      Accordingly, we added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the manuscript to clarify the limitations and strengths of our design, to tone down the evolutionary claims where appropriate, and to more explicitly define the scope of our conclusions in light of the data. We hope that these efforts sufficiently address the reviewer’s concerns and strengthen the manuscript.

      To better support the seasonal expression analysis, the early RNA-seq analysis sections should be strengthened. There is little discussion of biological replicate variation or variation among branches of the same individual. These could be important factors to analyze. In line 137, the mapping rate for two species is mentioned, but the rates for each species should be clearly reported. One RNA-seq dataset is based on a species different from the reference genome, so a lower mapping rate is expected. While this likely does not hinder downstream analysis, quantification is important.

      We thank Reviewer 1 for this helpful comment. To evaluate the variation among biological replicates, we compared the expression level of each gene across different individuals. We observed high correlations between each pair of individuals (Q. glauca, n = 3: average correlation coefficient r = 0.947; Q. acuta, n = 3: r = 0.948; L. glaber, n = 3: r = 0.948). This result suggests that seasonal gene expression patterns are highly synchronized across individuals within the same species. We now mention this point in the Results section of the revised manuscript. We also calculated the mean mapping rate for each species. As the reviewer expected, the mapping rate was slightly lower in Q. acuta (88.6 ± 2.3%) and L. glaber (84.3 ± 5.4%), whose RNA-seq data were mapped to the reference genomes of related but different species, than in Q. glauca (92.6 ± 2.2%) and L. edulis (89.3 ± 2.7%). However, we minimized the impact of these differences on downstream analysis. These details have been included in the revised main text.
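      The per-species replicate values above are averages of pairwise correlations among individuals. The following Python sketch illustrates that summary only; it is not the authors' pipeline. It assumes (not stated in the response) that each individual contributes one expression value per gene, that genes appear in the same order for every individual, and that any transformation or filtering has already been applied. The function names are hypothetical. With three individuals there are three pairs, and the reported r is their mean.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("both sequences must vary")
    return cov / math.sqrt(var_a * var_b)


def mean_pairwise_r(profiles):
    """Average Pearson r over all pairs of individuals.

    `profiles` maps a (hypothetical) individual ID to its expression values,
    listed gene by gene in the same order for every individual.
    """
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        raise ValueError("need at least two individuals")
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)
```

      For example, `mean_pairwise_r({"tree1": [...], "tree2": [...], "tree3": [...]})` averages the three pairwise coefficients. How samples from different time points are pooled before this step is not specified in the response, so that choice is left to the caller.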

      In Figures 2A and 2B, clustering is used to support several points discussed in the Results section (e.g., lines 175-177). However, clustering is primarily a visualization method or a hypothesis-generating tool; it cannot serve as a statistical test. Stronger conclusions would require further statistical testing.

      We thank the reviewer for the helpful comment. As the reviewer notes, hierarchical clustering (Fig. 2A) is primarily a visualization and hypothesis-generating method. To assess the biological relevance of the clusters identified, we conducted a Mann-Whitney U test or the Steel-Dwass test to evaluate whether the environmental temperatures at the time of sample collection differed significantly among the clusters. This analysis (Fig. 2B) revealed statistically significant differences in temperature for cluster B3 (p < 0.01), indicating that the gene expression clusters are associated with seasonal thermal variation. These results support the interpretation that the clusters reflect coordinated transcriptional responses to environmental temperature. We revised the Results section to clarify this point.

      The quality of the genome assemblies appears adequate, but related assemblies should be cited and discussed. Several assemblies of Fagaceae species already exist, including Quercus mongolica (Ai et al., Mol Ecol Res, 2022), Q. gilva (Front Plant Sci, 2022), and Fagus sylvatica (GigaScience, 2018), among others. Is there any novelty here? Can you compare your results with these existing assemblies?

      We agree that genome assemblies of Fagaceae species are becoming increasingly available. However, our study does not aim to emphasize the novelty of the genome assemblies per se. Rather, with the increasing availability of chromosome-level genomes, we regard genome assembly as a necessary foundation for more advanced analyses. The main objective of our study is to investigate how each gene is expressed in response to seasonal environmental changes, and to link genome information with seasonal transcriptomic dynamics. To address the reviewer’s comment in line with this objective, we added a discussion on the syntenic structure of eight genome assemblies spanning four genera within the Fagaceae, including a species from the genus Fagus (Ikezaki et al. 2025, https://doi.org/10.1101/2025.07.31.667835). This addition helps to position our work more clearly within the context of existing genomic resources.

      Most importantly, Figure 1B-D shows synteny between the two genera but also indicates homology between different chromosomes. Does this suggest paleopolyploidy or another novel feature? These chromosome connections should be interpreted in the main text, even if they could be methodological artifacts.

      A previous study on genome size variation in Fagaceae suggested that, given the consistent ploidy level across the family, genome expansion likely occurred through relatively small segmental duplications rather than whole-genome duplications. Because Figure 1B-D supports this view, we now cite this study in the revised manuscript (Chen et al., 2014, https://doi.org/10.1007/s11295-014-0736-y).

      In both the Results and Materials and Methods sections, descriptions of genome and RNA-seq data are unclear. In line 128, a paragraph on genome assembly suddenly introduces expression levels. RNA-seq data should be described before this. Similarly, in line 238, the sentence "we assembled high-quality reference genomes" seems disconnected from the surrounding discussion of expression studies. In line 632, Illumina short-read DNA sequencing is mentioned, but it's unclear how these data were used.

      We relocated the explanation regarding the expression levels of single-copy and multi-copy genes to the section titled “Seasonal gene expression dynamics.” Additionally, we clarified in the Materials and Methods section that short-read sequencing data were used for both genome size estimation and phylogenetic reconstruction.

      Reviewer #2 (Public review):

      Summary:

      This study explores how gene expression evolves in response to seasonal environments, using four evergreen Fagaceae species growing in similar habitats in Japan. By combining chromosome-scale genome assemblies with a two-year RNA-seq time series in leaves and buds, the authors identify seasonal rhythms in gene expression and examine both conserved and divergent patterns. A central result is that winter bud expression is highly conserved across species, likely due to shared physiological demands under cold conditions. One of the intriguing implications of this study is that seasonal cycles might play a role similar to ontogenetic stages in animals. The authors touch on this by comparing their findings to the developmental hourglass model, and indeed, the recurrence of phenological states such as winter dormancy may act as a cyclic form of developmental canalization, shaping expression evolution in a way analogous to embryogenesis in animals.

      Strengths:

      (1) The evolutionary effects of seasonal environments on gene expression are rarely studied at this scale. This paper fills that gap.

      (2) The dataset is extensive, covering two years, two tissues, and four tree species, and is well suited to the questions being asked.

      (3) Transcriptome clustering across species (Figure 2) shows strong grouping by season and tissue rather than species, suggesting that the authors effectively controlled for technical confounders such as batch effects and mapping bias.

      (4) The idea that winter imposes a shared constraint on gene expression, especially in buds, is well argued and supported by the data.

      (5) The discussion links the findings to known concepts like phenological synchrony and the developmental hourglass model, which helps frame the results.

      We are grateful to the reviewer for the detailed and thoughtful review of our manuscript.

      Weaknesses:

      (1) While the hierarchical clustering shown in Figure 2A largely supports separation by tissue type and season, one issue worth noting is that some leaf samples appear to cluster closely with bud samples. The authors do not comment on this pattern, which raises questions about possible biological overlap between tissues during certain seasonal transitions or technical artifacts such as sample contamination. Clarifying this point would improve confidence in the interpretation of tissue-specific seasonal expression patterns.

      The leaf samples that clustered with bud samples are newly flushed leaves, collected in April for Q. glauca, May for Q. acuta, May and June for L. edulis, and August and September for L. glaber. To clarify this point, we marked these newly flushed leaf samples with asterisks in the revised figure (Fig. 2A).

      (2) While the study provides compelling evidence of conserved and divergent seasonal gene expression, it does not directly examine the role of cis-regulatory elements or chromatin-level regulatory architecture. Including regulatory genomic or epigenomic data would considerably strengthen the mechanistic understanding of expression divergence.

      We thank the reviewer for this insightful comment. As noted in the Discussion section, we hypothesize that such genome-wide seasonal expression patterns, and their divergence across species, are likely mediated by cis-regulatory elements and chromatin-level mechanisms. While a direct investigation of regulatory architecture was beyond the scope of the present study, we fully agree that incorporating regulatory genomic and epigenomic data would significantly deepen the mechanistic understanding of expression divergence. In this regard, we are currently working to identify putative cis-regulatory elements in non-coding regions and are collecting epigenetic data from the same tree species using ChIP-seq. We believe the current study provides a foundation for these future investigations into the regulatory basis of seasonal transcriptome variation. We made a minor revision to the Discussion to note that an important future direction is to investigate the evolution of non-coding sequences that regulate gene expression in response to seasonal environmental changes.

      (3) The manuscript includes a thoughtful analysis of flowering-related genes and seasonal GO enrichment (e.g., Figure 3C-D), providing an initial link between gene expression timing and phenological functions. However, the analysis remains largely gene-centric, and the study does not incorporate direct measurements of phenological traits (e.g., flowering or bud break dates). As a result, the connection between molecular divergence and phenotypic variation, while suggestive, remains indirect.

      We would like to note that phenological traits were observed in the field on a monthly basis throughout the sampling period, and the phenological data were plotted together with molecular phenology (e.g. Fig. 2A, C; Fig. 3C, D). Although the temporal resolution is limited, these observations captured species-specific differences in key phenological events such as leaf flushing and flowering times. We revised the manuscript to clarify this point.

      (4) Although species were sampled from similar habitats, one species (Q. acuta) was collected at a higher elevation, and factors such as microclimate or local photoperiod conditions could influence expression patterns. These potential confounding variables are not fully accounted for, and their effects should be more thoroughly discussed or controlled in future analyses.

      We fully agree with the reviewer that local environmental conditions, including microclimate and photoperiod differences, could potentially influence gene expression patterns. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were qualitatively similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      We believe these additional analyses help to decouple the effects of environment and genetics, and support our conclusion that both seasonal synchrony and phylogenetic constraints play key roles in shaping transcriptome dynamics. We added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the text accordingly to clarify this point and to acknowledge the potential impact of site-specific environmental variation.

      (5) Statistical and Interpretive Concerns Regarding Δφ and dN/dS Correlation (Figures 5E and 5F):

      a) Statistical Inappropriateness: Δφ is a discrete ordinal variable (likely 1-11), making it unsuitable for Pearson correlation, which assumes continuous, normally distributed variables. This undermines the statistical validity of the analysis.

      We thank the reviewer for the insightful comment. We would like to clarify that the analysis presented in Figures 5E and 5F was based on linear regression, not Pearson’s correlation. Although Δφ is a discrete variable, it takes values from 0 to 6 in 0.5 increments, resulting in 13 levels. We treated it as a quasi-continuous variable for the purposes of linear regression analysis. This approach is commonly adopted in practice when a discrete variable has sufficient resolution and ordering to approximate continuity. To enhance clarity, we revised the manuscript to explicitly state that linear regression was used, and we now report the regression coefficient and associated p-value to support the interpretation of the observed trend.
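      As an illustration only, the Python sketch below shows what treating Δφ as quasi-continuous means in practice: its 13 possible values (0 to 6 in steps of 0.5) enter an ordinary least-squares fit as ordinary numbers. The function names, and the choice to return a t statistic for the slope rather than a p-value (which would require a t-distribution with n − 2 degrees of freedom from a statistics library), are assumptions of this sketch, not the authors' code. Returning r² alongside the slope also makes the reviewer's point visible: with genome-wide sample sizes, a slope can be statistically significant even when r² is close to zero.

```python
import math


def delta_phi_levels(max_value=6.0, step=0.5):
    """Possible values of a discrete predictor from 0 to max_value in `step` increments."""
    n_levels = int(round(max_value / step)) + 1
    return [i * step for i in range(n_levels)]


def ols_fit(x, y):
    """Simple linear regression y = intercept + slope * x, with x treated as quasi-continuous.

    Returns (intercept, slope, r_squared, t_slope); t_slope is the t statistic
    for the slope, to be compared against a t-distribution with len(x) - 2 df.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    # Residual sum of squares; clamp tiny negative values caused by rounding.
    rss = max(syy - slope * sxy, 0.0)
    if rss == 0.0:
        # Perfect fit: the standard error is zero, so |t| is unbounded.
        t_slope = math.copysign(math.inf, slope)
    else:
        se_slope = math.sqrt(rss / (n - 2) / sxx)
        t_slope = slope / se_slope
    return intercept, slope, r_squared, t_slope


if __name__ == "__main__":
    levels = delta_phi_levels()
    print(len(levels), levels[0], levels[-1])  # 13 0.0 6.0
```

      In a real analysis, x would hold each gene's Δφ and y its dN/dS value; a weak association shows up as a small r² even when t is large.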

      b) Biological Interpretability: Even with the substantial statistical power afforded by genome-wide analysis, the observed correlations are extremely weak. This suggests that the relationship, if any, between temporal divergence in expression and protein-coding evolution is negligible.

      Taken together, these issues weaken the case for any biologically meaningful association between Δφ and dN/dS. I recommend either omitting these panels or clearly reframing them as exploratory and statistically limited observations.

      We agree with the reviewer’s comment. While we retained the original panels, we reframed our interpretation to emphasize that, despite statistical significance, the observed correlation is very weak—suggesting that coding region variation is unlikely to be the primary driver of seasonal gene expression patterns. Accordingly, we revised the “Relating seasonal gene expression divergence to sequence divergence” section in the Results, as well as the relevant part of the Discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Sentences around lines 250-251 are incomplete and need revision.

      We thank the reviewer for pointing this out. We revised the sentences in the subsection “Peak month distribution of rhythmic genes and intra-genus and inter-genera comparison” in the Results section to ensure clarity and completeness. In addition, to improve the interpretability of the peak month distribution, we added arrows to indicate the major peaks in the circular histograms shown in Fig. 3C and 3D.

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 1E-G, the term Copy number or Copy number variation could be misleading, as it is commonly associated with inter-individual gene copy number variation in a population. Since the analysis here refers to orthology relationships rather than population-level variation, a more precise term, such as orthogroup classification, may be preferable.

      We thank the reviewer for this helpful suggestion. We agree that the term “copy number” could be misleading in this context. Accordingly, we updated the labeling in Fig. 1 to reflect the more precise term “orthogroup classification.”

      (2) In Figure 3A, the x-axis label Period (month) may be misleading, as it could be mistaken for calendar months rather than referring to the periodicity of gene expression cycles. A more explicit label, such as Expression periodicity (months), might improve clarity for the reader.

      We thank the reviewer for this valuable suggestion. In the original version of Fig. 3A, we used the label “Period (month),” which could indeed be misinterpreted as referring to calendar months. To clarify that this axis represents the length of gene expression cycles, we revised the label to “Period length (months).” This change also aligns with the terminology used throughout the manuscript, where “Period” refers specifically to cycle length, and “Periodicity” denotes the presence or absence of rhythmic expression.

      Other minor revisions

      We also made minor revisions for the reference list and the grant number details, and included the accession numbers for all DNA and RNA sequence data deposited in the DNA Data Bank of Japan (DDBJ) in the Data deposition and code availability section, in addition to the BioProject ID.

    1. eLife Assessment

      The authors used comprehensive approaches to identify Gyc76C as an ITPa receptor in Drosophila. They revealed that ITPa acts via Gyc76C in the renal tubules and fat body to modulate osmotic and metabolic homeostasis. The designed experiments, data, and analyses convincingly support the main claims. The findings are important to help us better understand how ITP signaling contributes to systemic homeostasis regulation.

    2. Reviewer #1 (Public review):

      Summary:

      In Drosophila melanogaster, ITP has functions in feeding, drinking, metabolism, excretion, and circadian rhythm. In the current study, the authors characterized and compared the expression of all three ITP isoforms (ITPa and ITPL1&2) in the CNS and peripheral tissues of Drosophila. An important finding is that they functionally characterized and identified Gyc76C as an ITPa receptor in Drosophila using both in vitro and in vivo approaches. In vitro, the authors nicely confirmed that the inhibitory function of recombinant Drosophila ITPa on MT secretion is Gyc76C-dependent (knockdown of Gyc76C specifically in two types of cells abolished the anti-diuretic action of Drosophila ITPa on renal tubules). They also confirmed that ITPa activates Gyc76C in a heterologous system. The authors used a combination of multiple approaches to investigate the roles of ITPa and Gyc76C on osmotic and metabolic homeostasis modulation in vivo. They revealed that ITPa signaling to renal tubules and fat body modulates osmotic and metabolic homeostasis via Gyc76C.

      Furthermore, they tried to identify the upstream and downstream of ITP neurons in the nervous system by using connectomics and single-cell transcriptomic analysis. I found this interesting manuscript to be well-written and described. The findings in this study are valuable to help understand how ITP signals work on systemic homeostasis regulation. Both anatomical and single-cell transcriptome analysis here should be useful to many in the field.

      Strengths:

      The question that this study tries to address (what the receptors of ITPa in Drosophila are) is important. The authors ruled out the Bombyx ITPa receptor orthologs as potential candidates. They identified a novel ITP receptor by using phylogenetic and anatomical analyses, and both in vitro and in vivo approaches.

      The authors exhibited detailed anatomical data of both ITP isoforms and Gyc76C (in the main and supplementary figures), which helped audiences understand the expression of the neurons studied in the manuscript.

      They also performed connectomic and single-cell transcriptomic analyses to study the synaptic and peptidergic connectivity of ITP-expressing neurons. This provided more information for better understanding and further study of systemic homeostasis modulation.

      Comments on revisions:

      In the revised manuscript, the authors addressed all my concerns.

      There is one more suggestion: The scale bar for fly and ovary images should be included in Figures 9, 10, and 12.

    3. Reviewer #2 (Public review):

      The physiology and behaviour of animals are regulated by a huge variety of neuropeptide signalling systems. In this paper, the authors focus on the neuropeptide ion transport peptide (ITP), which was first identified and named on account of its effects on the locust hindgut (Audsley et al. 1992). Using Drosophila as an experimental model, the authors have mapped the expression of three different isoforms of ITP, all of which are encoded by the same gene.

      The authors then investigated candidate receptors for isoforms of ITP. Firstly, Drosophila orthologs of G-protein coupled receptors (GPCRs) that have been reported to act as receptors for ITPa or ITPL in the insect Bombyx mori were investigated. Importantly, the authors report that ITPa does not act as a ligand for the GPCRs TkR99D and PK2-R1. Therefore, the authors investigated other putative receptors for ITPs. Informed by a previously reported finding that ITP-type peptides cause an increase in cGMP levels in cells/tissues (Dircksen, 2009; Nagai et al., 2014), the authors investigated guanylyl cyclases as candidate receptors for ITPs. In particular, the authors suggest that Gyc76C may act as an ITP receptor in Drosophila. Evidence that Gyc76C may be involved in mediating effects of ITP in Bombyx was first reported by Nagai et al. (2014), and here the authors present further evidence, based on a proposed concordance in the phylogenetic distribution of ITP-type neuropeptides and Gyc76C and on the experimental demonstration that ITPa causes dose-dependent stimulation of cGMP production in HEK cells expressing Gyc76C. Having performed detailed mapping of the expression of Gyc76C in Drosophila, the authors then investigated if Gyc76C knockdown affects the bioactivity of ITPa in Drosophila. The inhibitory effect of ITPa on leucokinin- and diuretic hormone-31-stimulated fluid secretion from Malpighian tubules was found to be abolished when expression of Gyc76C was knocked down in stellate cells and principal cells, respectively.

      Having investigated the proposed mechanism of ITPa signalling in Drosophila, the authors then investigate its physiological roles at a systemic level. The authors present evidence that ITPa is released during desiccation and accordingly overexpression of ITPa increases survival when animals are subjected to desiccation. Furthermore, knockdown of Gyc76C in stellate or principal cells of Malpighian tubules decreases survival when animals are subjected to desiccation. The authors also explore the relevance of the observed phenotypes to potential in vivo actions of ITPa, and analyse publicly available connectomic and single-cell transcriptomic data to identify putative inputs and outputs of ITPa-expressing neurons.

      Strengths of this paper.

      (1) The main strengths of this paper are:

      i) the detailed analysis of the expression and actions of ITP and the phenotypic consequences of over-expression of ITPa in Drosophila.

      ii) the detailed analysis of the expression of Gyc76C and the phenotypic consequences of knockdown of Gyc76C expression in Drosophila.

      iii) the experimental demonstration that ITPa causes dose-dependent stimulation of cGMP production in HEK cells expressing Gyc76C, providing biochemical evidence that the effects of ITPa in Drosophila are, at least in part, mediated by Gyc76C.

      (2) Furthermore, the paper is generally well written and the figures are of good quality.

      Weaknesses of this paper.

      A weakness of this paper is the phylogenetic analysis to investigate if there is correspondence in the phylogenetic distribution of ITP-type and Gyc76C-type genes/proteins. Unfortunately, the evidence presented is rather limited in scope. Essentially, the authors report that they only found ITP-type and Gyc76C-type genes/proteins in protostomes, but not in deuterostomes. What is needed is a more fine-grained analysis at the species level within the protostomes. However, I recognise that such a detailed analysis may extend beyond the scope of this paper, which is already rich in data.

    4. Reviewer #3 (Public review):

      Summary:

      The goal of this paper is to characterize an anti-diuretic signaling system in insects using Drosophila melanogaster as a model. Specifically, the authors wished to characterize a role for ion transport peptide (ITP) and its isoforms in regulating diverse aspects of physiology and metabolism. The authors combined genetic and comparative genomic approaches with classical physiological techniques and biochemical assays to provide a comprehensive analysis of ITP and its role in regulating fluid balance and metabolic homeostasis in Drosophila. The authors further characterized a previously unrecognized role for Gyc76C as a receptor for ITPa, an amidated isoform of ITP, and in mediating the effects of ITPa on fluid balance and metabolism. The evidence presented in favor of this model is very strong as it combines multiple approaches and employs ideal controls. Taken together, these findings represent an important contribution to the field of insect neuropeptides and neurohormones and have strong relevance for other animals. The authors have addressed all weaknesses raised in my previous review.

    5. Author Response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      The scale bar for fly and ovary images should be included in Figures 9, 10, and 12.

      We agree with this comment and apologize for the oversight. We have now modified Figures 9, 10, and 12 to include the scale bars for the ovary images. The fly images were acquired using a stereo microscope where scale bar calculation was not possible. However, all images were acquired at the same magnification for consistency.

      Reviewer #2 (Public review):

      A weakness of this paper is the phylogenetic analysis to investigate if there is correspondence in the phylogenetic distribution of ITP-type and Gyc76C-type genes/proteins. Unfortunately, the evidence presented is rather limited in scope. Essentially, the authors report that they only found ITP-type and Gyc76C-type genes/proteins in protostomes, but not in deuterostomes. What is needed is a more fine-grained analysis at the species level within the protostomes. However, I recognise that such a detailed analysis may extend beyond the scope of this paper, which is already rich in data.

      We thank the reviewer for their comment and the suggestion to perform a fine-grained species-level comparison of ITP and Gyc76C genes across protostomes. We are unsure of the utility of this analysis for the present study, given that we have now shown that ITPa can activate Gyc76C using both an ex vivo and a heterologous assay, the latter being the gold standard in GPCR and guanylate cyclase discovery (see Huang et al. 2025, https://doi.org/10.1073/pnas.2420966122; Beets et al. 2023, https://doi.org/10.1016/j.celrep.2023.113058; Chang et al. 2009, https://doi.org/10.1073/pnas.0812593106).

      Additionally, the absence of a gene from a genome/proteome is hard to prove, especially when many/most of the protostomian datasets are not as high-quality as those of model systems (e.g. Drosophila melanogaster and Caenorhabditis elegans). Moreover, based on previous findings in Bombyx mori (Nagai et al. 2014, https://doi.org/10.1074/jbc.m114.590646 and Nagai et al. 2016, https://doi.org/10.1371/journal.pone.0156501) and Drosophila (Xu et al. 2023, https://doi.org/10.1038/s41586-023-06833-8 and our study), it is evident that different products of the ITP gene (ITPa and ITPL) could signal via different receptor types depending on the species. Hence, we would need to explore the presence of several genes (ITP, tachykinin, pyrokinin, tachykinin receptor, pyrokinin receptor, CG30340 orphan receptor and Gyc76C) to fully understand which components of these diverse signaling systems are present in a given species and to decipher the potential for cross-talk.

      While this species-level comparison will certainly be useful in the context of ITP-Gyc76C evolution, it will not alter the conclusions of the present study – ITPa acts via Gyc76C in Drosophila. We therefore agree with the reviewer that these analyses are beyond the scope of this paper.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):  

      Summary:  

      In Drosophila melanogaster, ITP has functions in feeding, drinking, metabolism, excretion, and circadian rhythm. In the current study, the authors characterized and compared the expression of all three ITP isoforms (ITPa and ITPL1&2) in the CNS and peripheral tissues of Drosophila. An important finding is that they functionally characterized and identified Gyc76C as an ITPa receptor in Drosophila using both in vitro and in vivo approaches. In vitro, the authors nicely confirmed that the inhibitory function of recombinant Drosophila ITPa on MT secretion is Gyc76C-dependent (knockdown of Gyc76C specifically in two types of cells abolished the anti-diuretic action of Drosophila ITPa on renal tubules). They also used a combination of multiple approaches to investigate the roles of ITPa and Gyc76C on osmotic and metabolic homeostasis modulation in vivo. They revealed that ITPa signaling to renal tubules and fat body modulates osmotic and metabolic homeostasis via Gyc76C.

      Furthermore, they tried to identify the upstream and downstream partners of ITP neurons in the nervous system by using connectomics and single-cell transcriptomic analysis. I found this interesting manuscript to be well-written and described. The findings in this study are valuable to help understand how ITP signals work on systemic homeostasis regulation. Both anatomical and single-cell transcriptome analysis here should be useful to many in the field. 

      We thank this reviewer for the positive and thorough assessment of our manuscript.  

      Strengths:  

      The question that this study tries to address (what the receptors of ITPa are in Drosophila) is important. The authors ruled out the Bombyx ITPa receptor orthologs as potential candidates. They identified a novel ITP receptor by using phylogenetic, anatomical analysis, and both in vitro and in vivo approaches. 

      The authors exhibited detailed anatomical data of both ITP isoforms and Gyc76C (in the main and supplementary figures), which helped audiences understand the expression of the neurons studied in the manuscript.  

      They also performed connectomic and single-cell transcriptomic analyses to study the synaptic and peptidergic connectivity of ITP-expressing neurons. This provided more information for better understanding and further study on systemic homeostasis modulation.  

      Weaknesses:  

      In the discussion section, the authors raised the limitations of the current study, which I mostly agree with, such as the lack of verification of direct binding between ITPa and Gyc76C, even though they provided different data to support that ITPa-Gyc76C signaling pathway regulates systemic homeostasis in adult flies. 

      We now provide evidence of Gyc76C activation by ITPa in a heterologous system (new Figure 7 and Figure 7 Supplement 1).

      Reviewer #2 (Public Review):  

      Summary:  

      The physiology and behaviour of animals are regulated by a huge variety of neuropeptide signalling systems. In this paper, the authors focus on the neuropeptide ion transport peptide (ITP), which was first identified and named on account of its effects on the locust hindgut (Audsley et al. 1992). Using Drosophila as an experimental model, the authors have mapped the expression of three different isoforms of ITP (Figures 1, S1, and S2), all of which are encoded by the same gene.  

      The authors then investigated candidate receptors for isoforms of ITP. Firstly, Drosophila orthologs of G-protein coupled receptors (GPCRs) that have been reported to act as receptors for ITPa or ITPL in the insect Bombyx mori were investigated. Importantly, the authors report that ITPa does not act as a ligand for the GPCRs TkR99D and PK2-R1 (Figure S3). Therefore, the authors investigated other putative receptors for ITPs. Informed by a previously reported finding that ITP-type peptides cause an increase in cGMP levels in cells/tissues (Dircksen, 2009, Nagai et al., 2014), the authors investigated guanylyl cyclases as candidate receptors for ITPs. In particular, the authors suggest that Gyc76C may act as an ITP receptor in Drosophila.  

      Evidence that Gyc76C may be involved in mediating effects of ITP in Bombyx was first reported by Nagai et al. (2014) and here the authors present further evidence, based on a proposed concordance in the phylogenetic distribution of ITP-type neuropeptides and Gyc76C (Figure 2). Having performed detailed mapping of the expression of Gyc76C in Drosophila (Figures 3, S4, S5, S6), the authors then investigated if Gyc76C knockdown affects the bioactivity of ITPa in Drosophila. The inhibitory effect of ITPa on leucokinin- and diuretic hormone-31-stimulated fluid secretion from Malpighian tubules was found to be abolished when expression of Gyc76C was knocked down in stellate cells and principal cells, respectively (Figure 4). However, as discussed below, this does not provide proof that Gyc76C directly mediates the effect of ITPa by acting as its receptor. The effect of Gyc76C knockdown on the action of ITPa could be an indirect consequence of an alteration in cGMP signalling.  

      Having investigated the proposed mechanism of ITPa in Drosophila, the authors then investigated its physiological roles at a systemic level. In Figure 5 the authors present evidence that ITPa is released during desiccation and accordingly, overexpression of ITPa increases survival when animals are subjected to desiccation. Furthermore, knockdown of Gyc76C in stellate or principal cells of Malpighian tubules decreases survival when animals are subject to desiccation. However, whilst this is correlative, it does not prove that Gyc76C mediates the effects of ITPa. The authors investigated the effects of knockdown of Gyc76C in stellate or principal cells of Malpighian tubules on i) survival when animals are subject to salt stress and ii) time taken to recover from chill coma. It is not clear, however, why animals overexpressing ITPa were not also tested for i) survival when subject to salt stress and ii) time taken to recover from chill coma. In Figures 6 and S8, the authors show the effects of Gyc76C knockdown in the female fat body on metabolism, feeding-associated behaviours and locomotor activity, which are interesting. Furthermore, the relevance of the phenotypes observed to potential in vivo actions of ITPa is explored in Figure 7. The authors conclude that "increased ITPa signaling results in phenotypes that largely mirror those seen following Gyc76C knockdown in the fat body, providing further support that ITPa mediates its effects via Gyc76C." Use of the term "largely mirror" seems inappropriate here because there are opposing effects, e.g. decreased starvation resistance in Figure 6A versus increased starvation resistance in Figure 7A. Furthermore, as discussed above, the results of these experiments do not prove that the effects of ITPa are mediated by Gyc76C because the effects reported here could be correlative, rather than causative. 

      We thank this reviewer for an extremely thorough and fair assessment of our manuscript. 

      We have now performed salt stress tolerance and chill coma recovery assays using flies over-expressing ITPa (new Figure 10 Supplement 1).

      We agree that the use of the term “largely mirrors” to describe the effects of ITPa overexpression and Gyc76C knockdown is not appropriate and have changed this sentence. We also agree that the experiments did not provide direct evidence that the effects of ITPa are mediated by Gyc76C. To address this, we now provide evidence of Gyc76C activation by ITPa in a heterologous system (new Figure 7 and Figure 7 Supplement 1).

      Lastly, in Figures 8, S9, and S10 the authors analyse publicly available connectomic data and single-cell transcriptomic data to identify putative inputs and outputs of ITPa-expressing neurons. These data are a valuable addition to our knowledge of ITPa-expressing neurons, but they do not address the core hypothesis of this paper, namely that Gyc76C acts as an ITPa receptor.  

      The goal of our study was to comprehensively characterize an anti-diuretic system in Drosophila. Hence, in addition to identifying the receptor via which ITPa exerts its effects, we also wanted to understand how ITPa-producing neurons are regulated. Connectomic and single-cell transcriptomic analyses are highly appropriate for this purpose. We have now updated the connectomic analyses using an improved connectome dataset that was released during the revision of this manuscript. Our new analysis shows that l-NSC<sup>ITP</sup> are connected to other endocrine cells that produce other homeostatic hormones (new Figure 13F). We also identify a pathway through which other ITP-producing neurons (LNd<sup>ITP</sup>) receive hygrosensory inputs to regulate water-seeking behavior (new Figure 13E). Moreover, we now include results showing that ITPa-producing neurons (l-NSC<sup>ITP</sup>) are active (new Figure 8A and B) and release ITPa under desiccation. Together with other analyses, these data provide a comprehensive view of when, what, and how ITPa regulates systemic homeostasis.  

      Strengths:  

      (1) The main strengths of this paper are i) the detailed analysis of the expression and actions of ITP and the phenotypic consequences of overexpression of ITPa in Drosophila. ii). the detailed analysis of the expression of Gyc76C and the phenotypic consequences of knockdown of Gyc76C expression in Drosophila.  

      (2) Furthermore, the paper is generally well-written and the figures are of good quality. 

      We thank this reviewer for highlighting the strengths of this manuscript.

      Weaknesses:  

      (1) The main weakness of this paper is that the data obtained do not prove that Gyc76C acts as a receptor for ITPa. Therefore, the following statement in the abstract is premature: "Using a phylogenetic-driven approach and the ex vivo secretion assay, we identified and functionally characterized Gyc76C, a membrane guanylate cyclase, as an elusive Drosophila ITPa receptor." Further experimental studies are needed to determine if Gyc76C acts as a receptor for ITPa. In the section of the paper headed "Limitations of the study", the authors recognise this weakness. They state "While our phylogenetic analysis, anatomical mapping, and ex vivo and in vivo functional studies all indicate that Gyc76C functions as an ITPa receptor in Drosophila, we were unable to verify that ITPa directly binds to Gyc76C. This was largely due to the lack of a robust and sensitive reporter system to monitor mGC activation." It is not clear what the authors mean by "the lack of a robust and sensitive reporter system to monitor mGC activation". The discovery of mGCs as receptors for ANP in mammals was dependent on the use of assays that measure GC activity in cells (e.g. by measuring cGMP levels in cells). Furthermore, more recently cGMP reporters have been developed. The use of such assays is needed here to investigate directly whether Gyc76C acts as a receptor for ITPa. In summary, insufficient evidence has been obtained to conclude that Gyc76C acts as a receptor for ITPa. Therefore, I think there are two ways forward, either:  

      (a) The authors obtain additional biochemical evidence that ITPa is a ligand for Gyc76C.  

      or  

      (b) The authors substantially revise the conclusions of the paper (in the title, abstract, and throughout the paper) to state that Gyc76C MAY act as a receptor for ITPa, but that additional experiments are needed to prove this. 

      We thank the reviewer for this comment and agree with the two options they propose. We had previously tried a different cGMP reporter (Promega GloSensor cGMP assay) to monitor activation of Gyc76C by ITPa in a heterologous system. Unfortunately, we were not successful in monitoring Gyc76C activation by ITPa. We have now used another cGMP sensor, Green cGull, to show that ITPa can indeed activate Gyc76C heterologously expressed in HEK cells (new Figure 7 and Figure 7 Supplement 1). However, we still cannot rule out the possibility that ITPa acts on additional receptors in vivo. This is based on our ex vivo Malpighian tubule assays (new Figure 6E and F). ITPa inhibits DH31- and LK-stimulated secretion, and we show that this effect is abolished following Gyc76C knockdown specifically in principal and stellate cells, respectively. Interestingly, application of ITPa alone can stimulate secretion when Gyc76C is knocked down in principal cells (new Figure 6E). This could be explained by: 1) the presence of another receptor for ITPa which results in diuretic actions and/or 2) low Gyc76C signaling activity (RNAi-based knockdown lowers signaling but does not abolish it completely), which could alter other intracellular messenger pathways that promote secretion. We have added text to indicate the possibility of other ITPa receptors. Nonetheless, our conclusions are supported by the heterologous assay results, which indicate that ITPa can activate Gyc76C. Therefore, we do not alter the title. 

      (2) The authors state in the abstract that a phylogenetic-driven approach led to their identification of Gyc76C as a candidate receptor for ITPa. However, there are weaknesses in this claim. Firstly, the hypothesis that Gyc76C may be involved in mediating effects of ITPa was first proposed ten years ago by Nagai et al. 2014, so this surely was the primary basis for investigating this protein. Nevertheless, investigating if there is correspondence in the phylogenetic distribution of ITP-type and Gyc76C-type genes/proteins is a valuable approach to addressing this issue. Unfortunately, the evidence presented is rather limited in scope. Essentially, the authors report that they only found ITP-type and Gyc76C-type genes/proteins in protostomes, but not in deuterostomes. What is needed is a more fine-grained analysis at the species level within the protostomes. Thus, are there protostome species in which both ITP-type and Gyc76C-type genes/proteins have been lost? Furthermore, are there any protostome species in which an ITP-type gene is present but a Gyc76C-type gene is absent, or vice versa? If there are protostome species in which an ITP-type gene is present but a Gyc76C-type gene is absent or vice versa, this would argue against Gyc76C being a receptor for ITPa. In this regard, it is noteworthy that in Figure 2A there are two ITP-type precursors in C. elegans, but there are no Gyc76C-type proteins shown in the tree in Figure 2B. Thus, what is needed is a more detailed analysis of protostomes to investigate if there really is correspondence in the phylogenetic distribution of Gyc76C-type and ITP-type genes at the species level. 

      We thank the reviewer for this comment. While the previous study by Nagai et al had implicated Gyc76C in the ITP signaling pathway, how they narrowed down Gyc76C as a candidate was not reported. Therefore, our unbiased phylogenetic approach was necessary to ensure that we identified all suitable candidate receptors. Indeed, our phylogenetic analysis also identified Gyc32E as another candidate ITP receptor. However, we did not pursue this receptor further as our expression data (new Figure 4 Supplement 2) indicated that Gyc32E is not expressed in osmoregulatory tissues and therefore likely does not mediate the osmotic effects of ITPa. 

      We also appreciate the suggestion to perform a more detailed phylogenetic analysis for the peptide and receptor. We did not include C. elegans receptors in the phylogenetic analysis because they tend to be highly evolved and routinely cause long-branch attraction (see: Guerra and Zandawala 2024: https://doi.org/10.1093/gbe/evad108). We (specifically the senior author) have previously excluded C. elegans receptors in the phylogenetic analysis of GnRH and Corazonin receptors for similar reasons (see: Tian and Zandawala et al. 2016: 10.1038/srep28788). 

      Unfortunately, absence of a gene in a genome is hard to prove, especially when the genome is not as high-quality as those of model systems (e.g. Drosophila and mice). Moreover, given the concern of this reviewer that our physiological and behavioral data on ITPa and Gyc76C only provide correlative evidence, we decided against performing additional phylogenetic analysis, which would also provide only correlative evidence. Our only goal with this analysis was to identify a candidate ITPa receptor. Since we have now functionally characterized this receptor using a heterologous system, we feel that the current phylogenetic analysis has successfully served its purpose.  

      (3) The manuscript would benefit from a more comprehensive overview and discussion of published literature on Gyc76C in Drosophila, both as a basis for this study and for interpretation of the findings of this study.  

      We thank the reviewer for this comment. We have now included a broader discussion of Gyc76C based on published literature.  

      Reviewer #3 (Public Review):  

      Summary:  

      The goal of this paper is to characterize an anti-diuretic signaling system in insects using Drosophila melanogaster as a model. Specifically, the authors wished to characterize a role of ion transport peptide (ITP) and its isoforms in regulating diverse aspects of physiology and metabolism. The authors combined genetic and comparative genomic approaches with classical physiological techniques and biochemical assays to provide a comprehensive analysis of ITP and its role in regulating fluid balance and metabolic homeostasis in Drosophila. The authors further characterized a previously unrecognized role for Gyc76C as a receptor for ITPa, an amidated isoform of ITP, and in mediating the effects of ITPa on fluid balance and metabolism. The evidence presented in favor of this model is very strong as it combines multiple approaches and employs ideal controls. Taken together, these findings represent an important contribution to the field of insect neuropeptides and neurohormones and have strong relevance for other animals. 

      We thank this reviewer for the positive and thorough assessment of our manuscript.

      Strengths:  

      Many approaches are used to support their model. Experiments were well-controlled, used appropriate statistical analyses, and were interpreted properly and without exaggeration.  

      Weaknesses:  

      No major weaknesses were identified by this reviewer. More evidence to support their model would be gained by using a loss-of-function approach with ITPa, and by providing more direct evidence that Gyc76C is the receptor that mediates the effects of ITPa on fat metabolism. However, these weaknesses do not detract from the overall quality of the evidence presented in this manuscript, which is very strong.  

      We agree with this reviewer regarding the need to provide additional evidence using a loss-of-function approach with ITPa. We now characterize the phenotypes following knockdown of ITP in ITP-producing cells (new Figure 9). Our results are in agreement with phenotypes observed following Gyc76C knockdown, lending further support that ITPa mediates its effects via Gyc76C. Unfortunately, we are not able to provide evidence that ITPa acts on Gyc76C in the fat body using the assay suggested by this reviewer (explained in detail below). Instead, we now provide direct evidence of Gyc76C activation by ITPa in a heterologous system (new Figure 7 and Figure 7 Supplement 1).

      Reviewer #1 (Recommendations For The Authors):  

      Here, I have several extra concerns about the work as below:  

      (1) The authors confirmed the function of ITPa in regulating both osmotic and metabolic homeostasis by specifically overexpressing ITPa driven by ITP-RC-Gal4 in adult flies (Figures 5 and 7). Have the authors ever tried to knock down ITP in ITP-RC-Gal4 neurons? What was the phenotype? Especially regarding the impact on metabolic homeostasis, does knocking down ITP in ITP neurons mimic the phenotypes of Gyc76C fat body knockdown flies? 

      We thank the reviewer for this suggestion. We now characterize the phenotypes following knockdown of ITP using ITP-RC-Gal4 (new Figure 9). Our results are in agreement with phenotypes observed following Gyc76C knockdown, lending further support that ITPa mediates its effects via Gyc76C.

      The authors mentioned that the existing ITP RNAi lines target all three isoforms. It would be interesting if the authors could overexpress ITPa in ITP-RC-Gal4>ITP-RNAi flies and confirm whether any phenotypes induced by ITP knockdown could be rescued. This would further confirm the role of ITPa in homeostasis regulation.  

      We thank the reviewer for this suggestion. Unfortunately, this experiment is not straightforward because knockdown with ITP RNAi does not completely abolish ITP expression (see Figure 9A). Hence, the rescue experiment needs to be ideally performed in an ITP mutant background. However, ITP mutation leads to developmental lethality (unpublished observation) so we cannot generate all the flies necessary for this experiment. Therefore, we cannot perform the rescue experiments at this time. In future studies, we hope to perform knockdown of specific ITP isoforms using the transgenes generated here (Xu et al 2023: 10.1038/s41586-023-06833-8).   

      (2) In Figures 5A and B, the authors nicely show the increased release of ITPa under desiccation by quantifying the ITPa immunolabelling intensity in different neuronal populations. It may be induced by the increased neuronal activity of ITPa neurons under the desiccated condition. Have the authors confirmed whether the activity of ITPa-expressing neurons is impacted by desiccation?  

      The TRIC system may be able to detect the different activity of those neurons before and after desiccation. This may further explain the reduced ITPa peptide levels during desiccation.  

      We thank the reviewer for this suggestion. We have now monitored the activity of ITPa-expressing neurons using the CaLexA system (Masuyama et al 2012: 10.3109/01677063.2011.642910). Our results indicate that ITPa neurons are indeed active under desiccation (new Figure 8A and B). These results are also in agreement with ITPa immunolabelling showing increased peptide release during desiccation (new Figure 8C and D). Together, these results show that ITPa neurons are activated and release ITPa under desiccation.  

      (3) What about the intensity of ITPa immunolabelling in other ITPa-positive neurons (e.g., VNC) under desiccation? If there is no change in other ITPa neurons, it will be a good control. 

      We thank the reviewer for this suggestion. Unfortunately, ITPa immunostaining in VNC neurons is extremely weak, preventing accurate quantification of ITPa levels under different conditions. We did hypothesize that ITPa immunolabelling in clock neurons (5<sup>th</sup>-LN<sub>v</sub> and LN<sub>d</sub><sup>ITP</sup>) would not change depending on the osmotic state of the animal. However, our results (Figure 8C and D) indicate that ITPa from these neurons is also released under desiccation. Interestingly, LNd<sup>ITP</sup>, which also coexpress Neuropeptide F (NPF), have recently been implicated in water seeking during thirst (Ramirez et al, 2025: 10.1101/2025.07.03.662850). Our new connectomic-driven analysis shows that these neurons can receive thermo/hygrosensory inputs (new Figure 13E). Hence, it is conceivable that other ITPa-expressing neurons also release ITPa during thirst/desiccation.

      (4) Adult-specific overexpression of ITPa in ITP neurons does produce significant phenotypes compared to controls in both osmotic and metabolic homeostasis-related assays. It would be helpful if the authors could show how much ITPa mRNA levels are increased in the heads of flies with ITPa overexpression (with or without desiccation and starvation). 

      We thank the reviewer for this suggestion. We have now included immunohistochemical evidence showing increase in ITPa peptide levels in flies with ITPa overexpression (new Figure 10A). We feel that this is a better indicator of ITPa signaling level instead of ITPa mRNA levels.   

      (5) Another question concerns the bloated abdomens of ITPa-overexpressing flies. Are the bloated abdomens of ITPa OE female flies (Figure 5E) due to increased ovary size (Figure 7G)? Have the authors also detected similar bloated abdomens in male flies with ITPa overexpression, given that both male and female flies show increased release of ITPa during desiccation?  

      We thank the reviewer for this comment. The bloated abdomen phenotype seen in females can be attributed to increased water content since we see a similar phenotype in males (see Author response image 1 below).

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):  

      (1) Page 1 - change "Homeostasis is obtained by" to "Homeostasis is achieved by".  

      Changed

      (2) Page 1 - change "Physiological responses" to "Physiological processes". 

      Changed

      (3) Page 2 - Change "Recently, ITPL2 was also shown to mediate anti-diuretic effects via the tachykinin receptor" to "Recently, ITPL2 was also shown to exert anti-diuretic effects via the tachykinin receptor". 

      Changed

      (4) Page 9 - "(C) Adult-specific overexpression of ITPa using ITP-RC-GAL4TS (ITP-RC-T2A-GAL4 combined with temperature-sensitive tubulin-GAL80) increases desiccation" Unless I am misunderstanding Fig 5C, I think what is shown is that overexpression of ITPa prolongs survival during a period of desiccation. I am not sure what the authors mean by "increases desiccation". In the text (page 9) the authors state "ITPa overexpression improves desiccation tolerance", which is a much clearer statement than what is in the figure legend. 

      We thank the reviewer for identifying this oversight. We have now changed the caption to “increases desiccation tolerance”.  

      (5) Page 11 - The authors conclude that "increased ITPa signaling results in phenotypes that largely mirror those seen following Gyc76C knockdown in the fat body, providing further support that ITPa mediates its effects via Gyc76C." Use of the term "largely mirror" seems inappropriate here because there are opposing effects- e.g. decreased starvation resistance in Figure 6A versus increased starvation resistance in Figure 7A.  

      Perhaps there is a misunderstanding of what is meant by "mirroring" - it means the same, not the opposite. 

      We thank the reviewer for this comment. We agree that the use of the term “largely mirrors” to describe the effects of ITPa overexpression and Gyc76C knockdown is not appropriate and have changed this sentence as follows: “Taken together, the phenotypes seen following Gyc76C knockdown in the fat body largely mirror those seen following ITP knockdown in ITP-RC neurons, providing further support that ITPa mediates its effects via Gyc76C.”

      (6) Page 12 - There appear to be words missing between "neurons during desiccation, as well as their downstream" and "the recently completed FlyWire adult brain connectome" 

      We thank the reviewer for highlighting this mistake. We have changed the sentence as follows: “Having characterized the functions of ITP signaling to the renal tubules and the fat body, we wanted to identify the factors and mechanisms regulating the activity of ITP neurons during desiccation, as well as their downstream neuronal pathways. To address this, we took advantage of the recently completed FlyWire adult brain connectome (Dorkenwald et al., 2024, Schlegel et al., 2024) to identify pre- and post-synaptic partners of ITP neurons.”

      (7) Page 15 - "can release up to a staggering 8 neuropeptides" - I suggest that the word "staggering" is removed. The notion that individual neurons release many neuropeptides is now widely recognised (both in vertebrates and invertebrates) based on analysis of single-cell transcriptomic data. 

      Removed staggering.

      (8) Page 16 - "(Farwa and Jean-Paul, 2024)" - this citation needs to be added to the reference list and I think it needs to be changed to "Sajadi and Paluzzi, 2024". 

      We thank the reviewer for highlighting this oversight. The correct citation has now been added.

      (9) It is noteworthy that, based on a PubMed search, there are at least thirteen published papers that report on Gyc76C in Drosophila (PMIDs: 34988396, 32063902, 27642749, 26440503, 24284209, 23862019, 23213443, 21893139, 21350862, 16341244, 15485853, 15282266, 7706258). However, none of these papers are discussed/cited by the authors. This is surprising because the authors' hypothesis that Gyc76C acts as a receptor for ITPa surely needs to be evaluated and discussed with reference to all the published insights into the developmental/physiological roles of this protein. 

      We thank the reviewer for this comment. Some of the references mentioned above (21350862, 16341244, 15485853) mainly report on soluble guanylyl cyclases and not membrane guanylyl cyclases like Gyc76C. Based on other studies on Gyc76C and its role in immunity and development, we have now expanded the discussion on additional roles of ITPa.

      Reviewer #3 (Recommendations For The Authors):  

      I have only a few comments that will help the authors strengthen a couple of aspects of their model.  

      (1) The case for Gyc76C as a receptor for ITPa in regulating fluid homeostasis is clear, given the experiments the authors carried out where they applied ITPa to tubules and showed that the effects of ITPa on tubule secretion were blocked if Gyc76C was absent in tubules. This approach, or something similar, should be used to provide conclusive proof that ITPa's metabolic effects on the fat body go through Gyc76C.  

      At present (unless I missed it) the authors only show that gain of ITPa has the opposite phenotype to fat body-specific loss of Gyc76C. While this would be the expected result if ITPa/Gyc76C is a ligand-receptor pair, it is not quite sufficient to conclusively demonstrate that Gyc76C is definitely the fat body receptor. Ex vivo experiments such as soaking the adult fat body carcasses with and without Gyc76C in ITPa and monitoring fat content via Nile Red could be one way to address this lack of direct evidence. The authors could also make text changes to explicitly mention this lack of conclusive evidence and suggest it as a future direction.

      We thank the reviewer for this comment. We have now conclusively demonstrated that Gyc76C is activated by ITPa in a heterologous assay (new Figure 7 and Figure 7 Supplement 1). With this evidence, we can confidently claim that ITPa can mediate its actions via Gyc76C in various tissues including the Malpighian tubules and fat body. Nonetheless, we liked the suggestion by this reviewer to perform the ex vivo assay and test the effect of ITPa on the fat body. Unfortunately, it is challenging to do this because increased ITPa signaling (chronically using ITPa overexpression) results in increased lipid accumulation in the fat body in vivo. Therefore, we would likely not see the effect of ITPa addition in an ex vivo fat body preparation since lipogenesis will not occur in the absence of glucose. However, ITPa could counteract the effects of other lipolytic factors such as adipokinetic hormone (AKH). To test this hypothesis, we monitored fat content in the fat body incubated with and without AKH (see Author response image 2 below showing representative images from this experiment). Since we did not observe any differences in fat levels between these two conditions, we were unable to test the effects of ITPa on AKH-activity using this assay.

      Author response image 2.

      (2) I did not see any loss of function data for ITPa - is this possible? If so this would strengthen the case for a 1:1 relationship between loss of ligand and loss of receptor. Alternatively, the authors could suggest this as an important future direction. 

      We agree with this reviewer regarding the need to provide additional evidence using a loss-of-function approach with ITPa. We have now characterized the phenotypes following knockdown of ITP in ITP-producing cells (new Figure 9). Our results are in agreement with phenotypes observed following Gyc76C knockdown, lending further support that ITPa mediates its effects via Gyc76C.

      (3) For clarity, please include the sex of all animals in the figure legend. Even though the methods say 'females used unless otherwise indicated' it is still better for the reader to know within the figure legend what sex is displayed. 

      We thank the reviewer for this suggestion and have now included sex of the animals in the figure legends.  

      (4) Please state whether females are mated or not, as this is relevant for taste preferences and food intake. 

      We apologize for this oversight. We used mated females for all experiments. This has now been included in the methods.  

      (5) More discussion comparing the metabolic effects of ITP in this study with those reported in past studies (Galikova 2018, 2022) would help readers appreciate any similarities and/or differences between this study and past work.

      We thank the reviewer for this suggestion. Unfortunately, it is difficult to directly compare our phenotypes with the metabolic effects of ITP reported in Galikova and Klepsatel 2022, because the previous study used a ubiquitous driver (Da-GAL4) to manipulate ITP levels. Ectopically overexpressing ITPa in non-ITP-producing cells can result in non-physiological phenotypes. This is evident in their metabolic measurements, where both global overexpression and knockdown of ITP result in reduced glycogen and fat levels and reduced starvation tolerance. Moreover, the ITP-RC-GAL4 driver used in our study to overexpress and knock down ITPa is more specific than the Da-GAL4 used previously, which would also target other ITP-expressing cells (e.g. ITP-RD-producing cells). Since ITP is broadly expressed across the animal, it is difficult to parse out the phenotypes of ITPa and other isoforms using manipulations performed with Da-GAL4. We have mentioned this limitation in the results for ITP knockdown as follows: “A previous study employing ubiquitous ITP knockdown and overexpression suggests that Drosophila ITP also regulates feeding and metabolic homeostasis (Galikova and Klepsatel, 2022) in addition to osmotic homeostasis (Galikova et al., 2018). However, given the nature of the genetic manipulations (ectopic ITPa overexpression and knockdown of ITP in all tissues) utilized in those studies, it is difficult to parse the effects of ITP signaling from ITPa-producing neurons.”

    1. eLife Assessment

      This study provides convincing evidence that homologous recombination can occur in telophase-arrested cells, independently of the cohesin subunits Smc1-3. These findings are valuable as they prompt investigation of the role of cohesin re-association with chromatin in allelic inter-sister repair by homologous recombination.

    2. Reviewer #1 (Public review):

      Summary

      The cohesin complex is essential for maintaining sister chromatid cohesion from S phase until anaphase. Beyond this canonical role, it is also recruited to double-strand breaks (DSBs), supporting both local and global post-replicative cohesion, a phenomenon first reported in 2004. In a previous study, Ayra-Plasencia et al. demonstrated that in telophase, DSBs can be repaired by homologous recombination (HR) through re-coalescence of sister chromatids (Ayra-Plasencia & Machín, 2019). In the present work, the authors provide further insights into DSB repair in late mitosis, showing that:

      Scc1 is reloaded and reconstituted on chromatin together with Smc1.

      HR occurs with high efficiency.

      HR-driven MAT switching can occur in an Smc3-independent manner.

      Strengths

      The authors take full advantage of the yeast model system, employing the HO endonuclease to generate a single, site-specific DSB at the MAT locus on chromosome III. Combined with careful cell synchronization, this setup allows them to monitor HR-mediated repair events specifically in G2/M and late mitosis. Their demonstration that full-length Scc1 can be recovered upon DSB induction is compelling. Most importantly, the finding that efficient HR can take place during M phase is significant, as HR has long been thought to be largely inhibited at this stage of the cell cycle.

      Weaknesses

      While the authors provide evidence for Scc1 recovery and efficient HR in late mitosis, some critical points need to be clarified to improve the impact and interpretability of the study.

    3. Reviewer #2 (Public review):

      Cohesin drives inter-sister repair of DNA breaks by homologous recombination (HR) in G2/M. Cohesion is lost at the metaphase-to-anaphase transition upon cleavage of the Scc1 subunit of cohesin by Esp1, raising the question of whether and how break repair by HR could occur in late mitosis (late-M).

      Here the authors investigate the behavior of cohesin in cells arrested in telophase and experiencing either a DNA break at the mating-type locus on chr. III (a specialized recombination process required for mating-type switching) or random DNA breaks induced by the drug phleomycin.

      The revised version of the manuscript now convincingly establishes three facts:

      - The cohesin subunit Scc1 can re-associate with chromatin and the other Smc1-3 subunits upon formation of an unrepairable DSB at MAT in telophase.
      - HR can occur in telophase-arrested cells.
      - Cohesin (and a fortiori cohesin that re-associated with chromatin) plays no role in non-allelic HR in telophase in the specific context of MAT switching.

      Unfortunately, the role of cohesin re-association with chromatin in allelic inter-sister repair by HR is not addressed. In the absence of such evidence, the main claims of the paper that make up the title (cohesin re-association and HR repair) appear disconnected. Even though the very last sentence of the abstract corrects the false impression, given by the title and the rest of the abstract, that cohesin reconstitution has something to do with efficient HR in late mitosis, I think a general rewriting of the abstract and a different title would better remove any ambiguity about the conclusions of the paper.

    4. Author response:

      The following is the authors’ response to the original reviews

      We would like to thank the reviewers for taking the time to thoroughly review our work. We have considered their suggestions carefully and tried our best to respond to them point by point. Based on their recommendations, two major issues came forward: (1) the strength of our claims about the involvement of cohesin in HR-driven repair in late mitosis; and (2) the underlying mechanism that reconstitutes cohesin in late mitosis after DNA damage. In this revision, we focused on the former and left the latter out (although it is discussed). We consider the question of how cohesin returns in late mitosis after DNA damage to be important and worthy of further research, but it is beyond the scope of this study (as is the putative role of condensin). Thus, we have focused on buttressing our main claims, as otherwise pointed out by the reviewers. What have we done to strengthen the role of cohesin in late mitotic DSB repair?

      (1) We have biologically replicated and quantified the reappearance of Scc1 after DSB generation (new Figure 1e). We have also quantified changes for the other core subunits (new Figure 1c-e).

      (2) We now show that the newly synthesized Scc1 serves to assemble back the cohesin complex (new Figure 2a and S1).

      (3) We have performed chromatin fractionation and show that cohesin binding to chromatin increases after the HO-induced DSB (new Figure 2b and S2).

      (4) We have performed ChIP assays and show that, despite the increase in the chromatin-bound fraction, the HOcs DSB does not recruit new cohesin to the locus (new Figure 2c and S3).

      (5) A key assertion in the preprint version was that depleting cohesin using the auxin degron system impairs HR-driven MAT switching. This claim was based on a direct comparison of cultures treated or not with auxin (-/+ IAA). However, during the revision process, we realized that auxin treatment itself could interfere with MAT switching. Firstly, we noticed a diminished HOcs cutting efficiency by HO in +IAA cultures (Figure S6). Secondly, the apparently dramatic delay in gene conversion to MATα could actually be related to other undesirable effects of IAA downstream in the repair process. Thus, we decided to repeat this experiment with strains that differ in their response to auxin, so that we could compare all strains in the presence of auxin. We compared four isogenic strains: SMC3; SMC3-aid*; SMC3 + OsTIR1; and SMC3-aid* + OsTIR1. As a result, we can now show that cohesin depletion does not affect MAT switching (see new Figure 4b-d).

      (6) We recently reported a negative chemical interaction between auxin and phleomycin. Auxin appears to diminish the ability of phleomycin to generate DSBs (Comm Biol 2025, doi: 10.1038/s42003-025-08416-x; see Figures S14 and S15 in that paper). While the underlying nature of this interaction is unknown to us (we are working on it), this led us to omit the coalescence assay included in the preprint version (old Figure 4c), as the diminished coalescence upon IAA addition is actually due to this effect rather than to cohesin depletion. This is also in agreement with the new data we include in the revised version, in which we observed only minor changes in cohesin reconstitution and chromatin binding after phleomycin (Figure 2a,b; S1 and S2).

      (7) In addition to addressing these reviewers’ requests, we have better characterized MAT switching in late mitosis by incorporating the kinetics of rad9Δ (deficient in the DNA damage checkpoint), yku70Δ (deficient in non-homologous end joining) and mre11Δ (deficient in DSB end tethering). The effect of rad52Δ (deficient in HR) has been described elsewhere (our iScience 2024, 10.1016/j.isci.2024.110250).

      As a result of these new experiments, new figure panels have been added in the main figures and as supplementary figures. To make room for these panels in the main figures and keep the short report format, the following changes have been made: (i) old figures and new panels have been combined into four main figures, (ii) some panels from the old figures have been moved to supplementary figures, and (iii) some panels have been reordered for the sake of simplicity and fluidity in the main text.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The cohesin complex maintains sister chromatid cohesion from S phase to anaphase. Beyond that, DSBs trigger cohesin recruitment and post-replicative cohesion both at damage sites and globally, as originally reported in 2004. In their recent study, Ayra-Plasencia et al. reported that in telophase, DSBs are repaired via HR with re-coalesced sister chromatids (Ayra-Plasencia & Machín, 2019). In this study, they show that HR occurs in an Smc3-dependent way in late mitosis.

      Strengths:

      The authors take great advantage of the yeast system: they follow the processing and repair of a single DSB generated by the HO endonuclease, which cuts the MAT locus on chromosome III. In combination with cell synchronization, they detect HR repair during G2/M or late mitosis and show that the cohesin subunit SMC3 is critical for this repair. Beyond that, full-length Scc1 protein can be recovered upon DSBs.

      Weaknesses:

      These new results basically support their proposal, although with very limited mechanistic progress, especially compared with their recent work.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "Cohesin still drives homologous recombination repair of DNA double-strand breaks in late mitosis" by Ayra-Plasencia et al. investigates the regulation of HR repair in conditional cdc15 mutants, which arrest the cell cycle in late anaphase/telophase. Using a non-competitive MAT switching system of S. cerevisiae, they show that a DSB in telophase-arrested cells elicits a delayed DNA damage checkpoint response and resection. Using a degron allele of SMC3, they show that MATa-to-alpha switching requires cohesin in this context. The presence of a DSB in telophase-arrested cells leads to an increase in the kleisin subunit Scc1 and, in a subset of cells, a partial rejoining of sister chromatids after they have separated.

      Strengths:

      The experiments presented are well-controlled. The induction systems are clean and well thought-out.

      Weaknesses:

      The manuscript is very preliminary, and I have reservations about its physiological relevance. I also have reservations regarding the usage of MAT to make the point that inter-sister repair can occur in late mitosis.

      Regarding these two weaknesses:

      - Physiological relevance: This is something we already addressed in our previous research work (Nat Commun. 2019; 10(1):2862. doi: 10.1038/s41467-019-10742-8), and which was further discussed in a follow-up theoretical paper (Bioessays. 2020;42(7):e2000021. doi: 10.1002/bies.202000021). In summary, this is physiologically relevant because a DSB in anaphase activates a late-mitotic checkpoint so that the DSB can be repaired before cytokinesis. The fact that anaphase is quick and only a minor fraction of cells in an asynchronous population get a DSB at this cell cycle stage does not preclude its importance, since a single mis-repaired DSB in hundreds of cells is enough to mutate a population in a health- or evolution-relevant way.

      - MAT system in late mitosis: It was not our intention to use the MAT switching assay to state that inter-sister repair can occur in late-M. The purpose was to address whether HR was fully functional in this non-G2/M, non-G1 stage. Having said that, it is very challenging to design a strategy based on a sequence-specific DSB to tackle inter-sister repair in late-M, because any endonuclease-generated DSB is going to cut both sisters. This is something we also discussed in depth in our previous works (Nat Commun & Bioessays).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) Smc3 degradation affects Rad53 activation upon DSBs, and this may directly lead to HR repair deficiency. Smc3 also could be phosphorylated by ATM and functions in DNA damage checkpoint activation, these alternative possibilities should also be tested before addressing the bona fide role of Smc3 in this context.

      Our previous data already suggested that Rad53 hyperphosphorylation still occurs after Smc3 degradation (Figure S6). Regardless, the question of whether the DNA damage checkpoint (DDC) may play a distinct role in MAT switching has been addressed in this revision by comparing RAD9 versus rad9Δ. Rad9 is a mediator in the DDC required for the activation of Rad53. We have seen that MAT switching in rad9Δ is as efficient as in RAD9 (new Figure S5d-f).

      On the other hand, our new results, in which we have compared four different strains with all auxin system combinations in the presence of auxin, show that cohesin depletion does not affect MAT switching. Previously, we compared minus versus plus auxin and noticed diminished HO cutting efficiency. Thus, we repeated this experiment with four isogenic strains (SMC3; SMC3-aid*; SMC3 + OsTIR1; and SMC3-aid* + OsTIR1) that differ in their response to auxin and ability to degrade cohesin, so that we could compare all strains in the presence of auxin. As a result, we can now affirm that cohesin depletion does not affect MAT switching (see new Figure 4b-d). Therefore, HR appears efficient after cohesin depletion.

      (2) The requirement of cohesin subunit Smc3 and "coincidently" recovery of Scc1 are not sufficient to claim they act as a cohesin complex in this scenario. CoIP in the chromatin fraction after DSBs to prove the cohesin complex formation is recommended. If they act as a complex, are cohesin loader Scc2/4 required?

      We have constructed an SMC3-HA SCC1-myc strain. We have purified the chromatin-bound fraction and performed co-IP. We have found that Smc1-acSmc3-Scc1 forms a complex after Scc1 returns, and that at least a fraction of this complex binds to chromatin in our HO model of DSBs in late anaphase (the cdc15-2 arrest). This is now shown in the new Figures 2a,b and S1,S2.

      As for the requirement of Scc2/4, we consider that the mechanisms underlying how Scc1 comes back, how a new cohesin complex is reassembled, and how it can partly bind to the chromatin in late anaphase are beyond the scope of this study and worth pursuing in a follow-up story.

      (3) Figure 3b. acetylated SMC3 was prominently detected in the absence of DSBs. During the cohesion cycle, the cohesin was released from chromatin in a separase-dependent manner at the anaphase onset. Released Smc3 was deacetylated by Hos1 subsequently. In principle, the acSMC3 level could be very low in late mitosis.

      In that figure (now renumbered as Fig S6), we did detect acetylated Smc3 for the remnant Smc3 still found in late mitosis; however, a direct comparison between the acetylated and non-acetylated pools was not performed and would require more sophisticated approaches. Note that each blot was exposed until the band was detected, and that signal intensity is antibody-specific. The presence of an acSmc3 pool in the cdc15-2 arrest is now further confirmed by the new blots in Figures 2a, S1 and S2b.

      On the other hand, previous time course experiments from G1 and G2/M releases point out that Smc3 deacetylation is incomplete in anaphase, with up to 30% of acetylated Smc3 remaining (Beckouët et al, 2010 doi:10.1016/j.molcel.2010.08.008). This is consistent with the presence of acSmc3 in the cdc15-2 arrest.   

      (4) Did the author examine the acSMC3 levels returning after DSB, as Scc1's levels? If so, how about the Eco1's protein level? Chromatin fractionation could be conducted to check the chromatin-bound SMC3, acSMC3/Eco1, SCC1, SCC1 phosphorylation, and SMC1. These results will tell us whether cohesin functions in DSB repair in late M in a cohesion state.

      As stated above, we have now determined that cohesin depletion does not affect HR-driven MAT switching. As for the other questions, yes, we have performed both an assessment of acSmc3 in the pull down and chromatin fractionation, before and after DSBs (new Figures 2a, S1 and S2b). Interestingly, we have noticed a difference between the HO-generated and the phle-generated DSBs. It appears that the former leads to a better reconstituted Smc1-acSmc3-Scc1 complex and more chromatin-bound cohesin. The overall acSmc3 levels do not appear to significantly change in the whole cell extracts, although there could be further posttranslational modifications in telophase (see the changes in intensity between the two acSmc3 bands in Figure S1).

      The role of Eco1 has not been directly addressed but is discussed. The main point here is that Eco1 levels may be low after G2/M (e.g., Lyons and Morgan, 2011), but there is still a significant acSmc3 pool in anaphase, as Hos1 does not deacetylate all Smc3 (Beckouët et al., 2010).

      (5) Figure 4a, the return of full-length Scc1 is based on a single experiment. What's the mechanism? Inhibition of cleavage or re-expression? How about its mRNA levels?

      We have repeated the full-length Scc1 experiment two more times. Now, an expression graph is included as a new Figure 1e. The two other subunits, Smc1 and Smc3, have been assessed as well, with no major changes in abundance (new Figure 1c and d).

      We feel that the exact molecular mechanism of how Scc1 returns is beyond the scope of this study, but we discuss that the DDC may either inactivate separase or protect Scc1 against it. Indeed, there is literature that supports both mechanisms (e.g., Heidinger-Pauli et al., 2008 doi:10.1016/j.molcel.2008.06.005; Yam et al., 2020 doi:10.1093/nar/gkaa355).   

      Minor points:

      (6) FACS data should be shown for all cell synchronization experiments.

      From our own previous work, FACS profiles add little to late-M experiments. To properly confirm late-M, microscopy is a must. FACS cannot differentiate between G2/M (metaphase-like), anaphase, telophase and the ensuing G1 (as cdc15-2 cells do not immediately split apart after re-entering G1). In all experiments, Tel samples (late-M cdc15-2 arrest) were characterized by >95% large-budded binucleated cells.

      (7) Figure 1d: a loading control for Rad53-P is missing. The "Arrest" samples should be loaded again on the right to confirm that the shift of Rad53 is not due to "smiling gels".

      It is true that the blot on the right has a right-handed smile; however, the presence of the Rad53/Rad53-P pair is very clear. Because there is not a full shift from Rad53 to Rad53-P, the concern of misidentifying Rad53-P as a result of a blot smile is unfounded.

      (8) Figure 1c: after the HO cut, resected DNA at the 726 bp site reaches a plateau at about 4 hrs, while it still increases at the 5.6 kb site. Thus, it is difficult to conclude that "The time to reach half of the maximum possible resection (t1/2) was ~1 h at 0.7 Kb and ~2.5 h at 5.7 Kb from the DSB, respectively".

      We assumed that both loci reach a plateau at 0.8 (which is consistent with other studies), so t1/2 was calculated as the time at which the resected fraction reached 0.4.

      (9) Figure 2b and 2c are wrongly labeled.

      We have fixed this (now Fig. 3d and e).

      (10) Figure 2d, Double check and make sure the quantitative data reflects the representative result. E.g. in Figure 2b (in fact should be 2c). For instance, in Figure 2b, the MATα signals seem to remain stable from 60' to 180', but they keep increasing in Figure 2d. In Yamaguchi & James E. Haber's paper, the signals and changes of MATa and MATα over time are way stronger compared to this study.

      We have double-checked this. It is true that the sum of the MATa, MATα and cut HOcs bands throughout the assay does not reach the intensity seen for MATa before HO induction (Tel), but MATα and HOcs signals cannot be compared on the basis of equimolarity, as all band signals are probe-specific (the best indication of this can be seen in the signal comparison between MATα and MAT distal at Tel). Alternatively, some resected HOcs may remain unrepaired.

      As for the referred example (now Figure 3e), note that the values are double normalized to ACT1 and MATα (Tel), and the ACT1 band gets fainter after 60’. This explains the increase in the MATα quantification in spite of what is apparently seen in the blot.

      (11) Typos and fonts: e.g. lines 111-112; line 76 "his link".

      We have fixed this. Thanks.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      (1) Physiological relevance. The authors show that HR can happen in the anaphase to telophase interval, yet does it outside of an hours-long artificial arrest upon inactivation of Cdc15? It is this reviewer's understanding that the duration of the anaphase to telophase transition is short, in the order of minutes. In fact, break signaling and resection are delayed by ~1 hour (Fig. 1), which suggests that cells avoid dealing with the damage and engaging in HR in the anaphase-telophase interval. Is there any described physiological context or checkpoint that blocks this transition for extended periods, that would make any of the findings in this paper relevant?

      This concern about the physiological relevance was addressed in our previous study (Nat Commun. 2019; 10(1):2862. doi: 10.1038/s41467-019-10742-8). In that paper’s Figure 1, we showed that G1 re-entry after a cdc15-2 release was delayed by several hours when DSBs had been previously generated at the cdc15-2 arrest. We also showed that such a delay depended on Rad9 (i.e., the DNA damage checkpoint). In addition, synchronized (not arrested) cells transiting through anaphase responded to DSB generation by slowing anaphase transition while partly regressing chromosome segregation (Figure S7 in that paper).

      (2) Methodological caveats. It is unclear why the authors chose to study DSB repair in the context of MATa-to-alpha switching (which uses an ectopic donor on the other chromosome arm) as a model for inter-sister repair. It creates a disconnect in the claims of the paper, which means to study inter-sister repair. Studying the kinetics of DSB repair by cytology following low-dose irradiation or radiomimetic drugs would have been a better option. Phleomycin is used in Fig. 4, but the repair kinetics (e.g. Rad52 foci) are not studied.

      The MAT switching assay was used here to address how functional HR was in late-M compared to G2/M (metaphase-like). Then, it was employed to check how cohesin depletion hampers HR in late-M. Even though we already discussed this in depth previously (Nat Commun. 2019; 10(1):2862. doi: 10.1038/s41467-019-10742-8; Bioessays. 2020;42(7):e2000021. doi: 10.1002/bies.202000021), it is worth recapitulating the methodological challenges that the study of inter-sister repair faces in late-M: (i) endonuclease-based DSBs are going to generate two DSBs, one per sister chromatid; (ii) the use of a homologous chromosome without the cutting site as a template is pointless, because a sister of the homolog is always going to co-segregate with the broken chromatid, and the same caveat applies to any other ectopic sequence. In this context, MATa with the HML ectopic intrachromosomal sequence is as valid as any other option, with the advantage that it is a very well-known system.

      On the other hand, most of the reviewer’s concerns about inter-sister repair by cytology and the role of Rad52 were addressed in our previous paper (Nat Commun). Note that our new results on the role of cohesin in MAT switching show that this HR-mediated DSB repair does not depend on cohesin (new Figure 4b-d).

      (3) Preliminary work. The requirement of cohesin for MAT switching in cdc15 mutants would have warranted several additional experiments. Indeed, cohesin has been shown to regulate homology search in multiple ways upon DNA damage checkpoint-induced metaphase arrest (see Piazza et al. Nat Cell Biol 2021 (10.1038/s41556-021-00783-x), not cited in the current manuscript). Consequently, is the effect of cohesin observed in the MAT system specific to telophase or is it true in other cell-cycle phases? What is the mechanism behind this requirement (one may expect it not to depend on the sister since the HML donor is available within the damaged chromatid)? Does cohesin re-accumulate around the DSB site or genome-wide? How does Esp1 activity decay from anaphase onset? Is cohesin required for the horseshoe folding of chr. III involved in MATa-to-alpha switching? Furthermore, condensin is involved in MATa-specific switching (Li et al. PLoS Genet 2019, 10.1371/journal.pgen.1008339), and condensin remains active on chromatin in cdc15-arrested cells, as shown on chr. XII (Lazar-Stefanita et al. EMBO J. 2017 10.15252/embj.201797342), which calls for determining the contribution of condensin to the recoil of the right chr. XII arm (Fig 4c) and to MAT switching.

      There are several points here:

      - Is the effect of cohesin observed in the MAT system specific to telophase or is it true in other cell-cycle phases?

      Our new results show that cohesin depletion does not affect MAT switching when four different strains with all auxin system combinations are compared in the presence of auxin. Previously, when we compared minus versus plus auxin, we noticed diminished HO cutting efficiency. Therefore, we repeated the experiment using four isogenic strains (SMC3, SMC3-aid*, SMC3 + OsTIR1, and SMC3-aid* + OsTIR1), which differ in their response to auxin and ability to degrade cohesin. This allowed us to compare all strains in the presence of auxin. As a result, we can now confirm that cohesin depletion does not affect MAT switching (see the new Figures 4b–d). Therefore, HR appears efficient after cohesin depletion. In agreement, the new ChIPs we have performed do not detect an increment in local cohesin after the HO DSB in telophase (but it does in cells arrested in G2/M).

      - What is the mechanism behind this requirement (one may expect it not to depend on the sister since the HML donor is available within the damaged chromatid)?

      As just said, we have changed our previous conclusion on cohesin and MAT switching. It was an effect of auxin addition rather than cohesin depletion.

      - Does cohesin re-accumulate around the DSB site or genome-wide?

      We have performed ChIP around the HOcs. We have found that it does accumulate in G2/M after HO induction, but it does not in telophase (new Figures 2c and S3). As for the global binding of cohesin, our chromatin fractionation data suggest there is ~2-fold increase in Smc1-Smc3, which also binds to the newly formed Scc1, rendering an overall increase in the chromatin-bound canonical complex (new Figures 2b and S2). Altogether, this suggests a genome-wide binding but with little role in the repair of HO DSBs.

      - How does the Esp1 activity decay from anaphase onset?

      We have not checked this here but it is an interesting question for a follow-up story.

      - Is cohesin required for the horseshoe folding of chr. III involved in MATa-to-alpha switching?

      Probably not in view of our new data in Figures 2c and 4b-d. The Piazza papers are cited and discussed.

      - Contribution of condensin in the recoil of the right ch.XII arm (Fig 4c) and on MAT switching.

      The role of condensin, which takes over some cohesin functions in late-M as the reviewer reminds us, is indeed worth studying. However, we feel this deserves a separate, focused study. We do discuss, though, that condensin loading onto the arms in anaphase may prevent Smc1-Smc3 from loading after DSBs.

      Other points:

      (4) Is the retrograde behavior in Fig. 4c dependent on recombination?

      No, this is something we addressed in our previous paper (see Figure 4 in Nat Commun. 2019; 10(1):2862. doi: 10.1038/s41467-019-10742-8).

      (5) Fig 3c: add a scheme of the system.

      A scheme was already shown in the old Figure 2a (note that the old Fig 3c is now Fig S6).

      (6) Fig 3b: annotate as in Fig 2b.

      We have fixed this (now the referred figures are S6a and 3d, respectively).

      (7) Authors used IAA concentrations 4- to 8-fold higher than commonly used. Given the solubility of IAA in DMSO (the most commonly used solvent), it is likely that authors treated their cells with >2% DMSO. This is expected to have broad transcriptional and physiological effects on yeast. A comparison of +IAA samples with a mock (DMSO) treatment would be more appropriate than a lack of treatment.

      The IAA stock solution was 500 mM in DMSO, so the final DMSO concentration for an 8 mM IAA solution was 1.6% (v/v). Although the stock concentration was high and some precipitation was observed during preparation, we always heated, sonicated, and vigorously vortexed the stock tube before adding IAA to the cultures. Thus, we kept the uncertainty in the final IAA concentration to a minimum.

    1. eLife Assessment

      This study provides important insights into bacterial genome evolution by analyzing single-cell genome sequences of cyanobacteria from Yellowstone hot springs. Using compelling evidence, the authors demonstrate that both homologous recombination within species and frequent hybridization across species are major drivers of genome diversification. Despite the challenges that are inherent to sparse and fragmented single-cell data, the analyses are thorough, carefully controlled, and supported by multiple complementary approaches, making the conclusions highly robust. This work represents a significant advance in our understanding of microbial evolution in natural environments.

    2. Reviewer #1 (Public review):

      Summary:

      What are the overarching principles by which prokaryotic genomes evolve? This fundamental question motivates the investigations in this excellent piece of work. While it is still very common in this field to simply assume that prokaryotic genome evolution can be described by a standard model from mathematical population genetics, and fit the genomic data to such a model, a smaller group of researchers rightly insists that we should not have such preconceived ideas and instead try to carefully look at what the genomic data tell us about how prokaryotic genomes evolve. This is the approach taken by the authors of this work. Lacking a tight theoretical framework, the challenge of such approaches is to devise analysis methods that are robust to all our uncertainties about what the underlying evolutionary dynamics might be.

      The authors here focus on a collection of ~300 single-cell genomes from a relatively well-isolated habitat with a relatively simple species composition, i.e. cyanobacteria living in hot springs in Yellowstone National Park. They convincingly demonstrate that the relative simplicity of this habitat increases our ability to interpret what the genomic data tells us about the evolutionary dynamics.

      Using a very thorough and multi-faceted analysis of these data, the authors convincingly show that there are three main species of Synechococcus cyanobacteria living in this habitat, and that apart from very frequent recombination within each species (which is in line with insights from other recent studies) there is also a remarkably frequent occurrence of hybridization events between the different species, and with as yet unidentified other genomes. Moreover, these hybridization events drive much of the diversity within each species. The authors also show convincing evidence that many of these hybridization events are not neutral but are driven by natural selection.

      Strengths:

      The great strength of this paper is that, by not making any preconceived assumptions about what the evolutionary dynamics is expected to look like, but instead devising careful analysis methods to tease apart what the data tells us about what has happened in the evolution in these genomes, highly novel and unexpected results are obtained, i.e. the major role of hybridization across the 3 main species living in this habitat.

      The analysis is very thorough and reading the detailed descriptions in the appendices it is clear that these authors took a lot of care in using these methods and avoiding the pitfalls that unfortunately affect many other studies in this research area.

      The picture of the evolutionary dynamics of these three Synechococcus species that emerges from this analysis is quite novel and surprising. I think this study is a major stepping stone toward development of more realistic quantitative theories of genome evolution in prokaryotes.

      The analysis methods that the authors employ are also partially quite novel and will no doubt be very valuable for analysis of many other datasets.

      Weaknesses:

      The main text is tight and concise, but this sort of hides the very large amount of careful complementary analyses that went into the conclusions presented in the main text. The appendices are quite well written but they are substantial, so the paper is not an easy read. However, I do not really think the authors can be faulted for this. The topic is complex and a lot of care is required to make sure conclusions are valid.

      A very interesting observation is that a lot of hybridization events (i.e. about half) originate from species other than the alpha, beta, and gamma Synechococcus species from which the genomes that are analyzed here derive. For this to occur, these other species must presumably also be living in the same habitat and must be relatively abundant. But if they are, why are they not being captured by the sampling? I did not see a clear explanation for this very common occurrence of hybridization events from outside of these Synechococcus species. The authors raise the possibility that these other species used to live in these hot springs but are now extinct or that they occur in other pools. I guess this is possible but I still find it puzzling and wonder if these donors could have been filtered out at some step of the experimental and/or analysis procedures.

    3. Reviewer #2 (Public review):

      Summary.

      Birzu et al. describe two sympatric hot-spring cyanobacterial species ("alpha" and "beta") and infer recombination across the genome, including inter-species recombination events (hybridization) based on single-cell genome sequencing. The evidence for hybridization is strong and the authors took care to control for artefacts such as contamination during sequencing library preparation. Despite hybridization, the species remain genetically distinct from each other. The authors also present evidence for selective sweeps of genes across both species - a phenomenon which is widely observed for antibiotic resistance genes in pathogens, but rarely documented in environmental bacteria.

      Strengths.

      This manuscript describes some of the most thorough and convincing evidence to date of recombination happening within and between co-habitating bacteria in nature. Their single-cell sequencing approach allows them to sample the genetic diversity from two dominant species. Although single-cell genome sequences are incomplete, they contain much more information about genetic linkage than typical short-read shotgun metagenomes, enabling a reliable analysis of recombination. The authors also go to great lengths to quality-filter the single-cell sequencing data and to exclude contamination and read mismapping as major drivers of the signal of recombination. This is a fascinating dataset with intricate analyses showing the great extent of between-species hybridization that is possible in nature.

      Weaknesses.

      This revised version is much improved, with a much clearer flow and organisation within both the main text and supplement. The remaining weaknesses that I note below are certainly not critical, but are simply useful context for the reader to keep in mind.

      My main concern is that the evidence for selection on the hybridized genes is incomplete and statements about the 'overwhelming evidence for the crucial role played by selection' (lines 334-5) are a bit overstated. What fraction of the hybridization events were driven by positive selection? The breakdown of hard (15%) vs soft (85%) sweeps is given, out of 153 (as a side note, it is not clear if this is 153 genes or events, troughs, etc.). But how many of the hybridization events (or genes) have evidence for a selective sweep relative to those that do not? I recognize that this may be a hard question to answer, because it may be statistically easier to identify a hybridization event that rises to high frequency due to positive selection than a neutral event that remains rare. Even a rough estimate would be useful; would it be something like 153 out of the number of core genes tested (~700)?

      Regardless, I think that Figure 6 (A and B) could benefit from comparison to a neutral model that includes hybridization but no selection, to see whether a similar pattern (notably, higher synonymous diversity in alpha troughs compared to the backbone) could arise due to hybridization alone without selection.

      An implicit assumption in microbiology is often that cross-species recombination events are driven by selection. The authors recognize that "diversity troughs resulted from selective sweeps [...] likely overcame mechanistic barriers to recombination, genetic incompatibilities, and ecological differences" (lines 335-7) and thus would not be retained unless they had some strong adaptive value to offset these costs. There are surprisingly few tests of the hypothesis that cross-species recombination events tend to be driven by selection. An analysis of Streptococcus spp. genomes showed that between-species recombination events tended to be accompanied by positive selection, whereas most within-species events were not (Shapiro et al. Trends in Microbiology 2009; reanalysis of data from Lefebure & Stanhope, Genome Biology 2007). There are probably other examples out there, but the authors could highlight that they provide rare data to support a common expectation.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      What are the overarching principles by which prokaryotic genomes evolve? This fundamental question motivates the investigations in this excellent piece of work. While it is still very common in this field to simply assume that prokaryotic genome evolution can be described by a standard model from mathematical population genetics, and fit the genomic data to such a model, a smaller group of researchers rightly insists that we should not have such preconceived ideas and instead try to carefully look at what the genomic data tell us about how prokaryotic genomes evolve. This is the approach taken by the authors of this work. Lacking a tight theoretical framework, the challenge of such approaches is to devise analysis methods that are robust to all our uncertainties about what the underlying evolutionary dynamics might be.

      The authors here focus on a collection of ~300 single-cell genomes from a relatively well-isolated habitat with relatively simple species composition, i.e. cyanobacteria living in hot springs in Yellowstone National Park, and convincingly demonstrate that the relative simplicity of this habitat increases our ability to interpret what the genomic data tells us about the evolutionary dynamics.

      Using a very thorough and multi-faceted analysis of these data, the authors convincingly show that there are three main species of Synechococcus cyanobacteria living in this habitat, and that apart from very frequent recombination within each species (which is in line with insights from other recent studies) there is also a remarkably frequent occurrence of hybridization events between the different species, and with as yet unidentified other genomes. Moreover, these hybridization events drive much of the diversity within each species. The authors also show convincing evidence that these hybridization events are not neutral but are driven by natural selection.

      Strengths:

      The great strength of this paper is that, by not making any preconceived assumptions about what the evolutionary dynamics is expected to look like, but instead devising careful analysis methods to tease apart what the data tells us about what has happened in the evolution in these genomes, highly novel and unexpected results are obtained, i.e. the major role of hybridization across the 3 main species living in this habitat.

      The analysis is very thorough and reading the detailed supplementary material it is clear that these authors took a lot of care in devising these methods and avoiding the pitfalls that unfortunately affect many other studies in this research area.

      The picture of the evolutionary dynamics of these three Synechococcus species that emerges from this analysis is highly novel and surprising. I think this study is a major stepping stone toward the development of more realistic quantitative theories of genome evolution in prokaryotes.

      The analysis methods that the authors employ are also partially novel and will no doubt be very valuable for analysis of many other datasets.

      We thank the reviewer for their appreciation of our work.

      Weaknesses:

      I feel the main weakness of this paper is that the presentation is structured such that it is extremely difficult to read. I feel readers have essentially no chance to understand the main text without first fully reading the 50-page supplement with methods and 31 supplementary figures. I think this will unfortunately strongly narrow the audience for this paper and below in the recommendations for the authors I make some suggestions as to how this might be improved.

      A very interesting observation is that a lot of hybridization events (i.e. about half) originate from species other than the alpha, beta, and gamma Synechococcus species from which the genomes that are analyzed here derive. For this to occur, these other species must presumably also be living in the same habitat and must be relatively abundant. But if they are, why are they not being captured by the sampling? I did not see a clear explanation for this very common occurrence of hybridization events from outside of these Synechococcus species. The authors raise the possibility that these other species used to live in these hot springs but are now extinct. I'm not sure how plausible this is and wonder if there would be some way to find support for this in the data (e.g., that one does not observe recent events of import from one of these unknown other species). This was one major finding that I believe went without a clear interpretation.

      We agree with the reviewer that the extent of hybridization with other species is surprising. While we do feel that our metagenome data provide convincing evidence that “X” species are not present in MS or OS, we cannot currently rule out the presence of X in other springs. In the revision we explicitly mention the alternative hypothesis (Lines 239-242).

      The core entities in the paper are groups of orthologous genes that show clear evidence of hybridization. It is thus very frustrating that exactly the methods for identifying and classifying these hybridization events were really difficult to understand (sections I and V of the supplement). Even after several readings, I was unsure of exactly how orthogroups were classified, i.e. what the difference between M and X clusters is, what a 'simple hybrid' corresponds to (as opposed to complex hybrids?), what precisely the definitions of singlet and non-singlet hybrids are, etcetera. It also seems that some numbers reported in the main text do not match what is shown in the supplement. For example, the main text talks about "around 80 genes with more than three clusters (SM, Sec. V; fig. S17).", but there is no group with around 80 genes shown in Fig S17! And similarly, it says "We found several dozen (100 in α and 84 in β) simple hybrid loci" and I also cannot match those numbers to what is shown in the supplement. I am convinced that what the authors did probably made sense. But as a reader, it is frustrating that when one tries to understand the results in detail, it is very difficult to understand what exactly is going on. I mention this example in detail because the hybrid classification is the core of this paper, but I had similar problems in other sections.

      We thank the reviewer for pointing out these issues with our original presentation. In the revision, we have redone most of the analysis to simplify the methods and check the consistency of the results. We did not find any qualitative differences in our results after reanalysis, but some of the numbers for different hybridization patterns have changed. The most notable difference is an increase in the number of alpha-gamma simple hybrids and a corresponding decrease in mixed-species clusters (now labeled mosaic hybrids). These transfers are difficult to assign because we only have access to a single gamma genome. We have added a short explanation of this point in Lines 219-222.

      To improve the presentation, we significantly expanded the “Results” section to better explain our analysis and the different steps we take. We included two additional figures (Figs. 3 and 4) that illustrate the different types of hybrids and the heterogeneity in the diversity of alpha which is discussed in the main text and is important for interpreting our results. We also included two additional figures (Figs. 2 and 6) that were previously in the Appendix but were mentioned in the main text. We believe these changes should address most of the issues raised by the reviewer and hopefully make the manuscript easier to read.

      Although I generally was quite convinced by the methods and it was clear that the authors were doing a very thorough job, there were some instances where I did not understand the analysis. For example, the way orthogroups were built is very much along the lines used by many in the field (i.e. orthoMCL on the graph of pairwise matchings, building phylogenies of connected components of the graph, splitting the phylogenies along long branches). But then to subdivide orthogroups into clusters of different species, the authors did not use the phylogenetic tree already built but instead used an ad hoc pairwise hierarchical average linkage clustering algorithm.

      The reviewer is correct that there is an unexplained discrepancy between the clustering methods we used at different steps in our pipeline. We followed previous work by using phylogenetic distances for the initial clustering of orthogroups. On these scales we expect hybridization to play a minor role and phylogenetic distances to correlate reasonably well with evolutionary divergence. However, because of the extensive hybridization we observed, the use of phylogenetic models for species clustering is more difficult to justify. We therefore chose to simply use pairwise nucleotide distances, which make fewer assumptions about the underlying evolutionary processes and should be more robust. We have briefly explained our reasoning and the details of our clustering method in the revision (Lines 182-190).
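To illustrate the kind of distance-based step described here, the sketch below clusters the sequences of one orthogroup by average linkage on pairwise nucleotide distances. It is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes gapped, pre-aligned sequences, uses a plain p-distance (fraction of differing sites, skipping gap columns) as the "pairwise nucleotide distance", and stops merging at a fixed cutoff whose value here is purely hypothetical. The function names are also hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gap ('-') columns."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("sequences share no aligned, non-gap sites")
    return diffs / compared


def average_linkage_clusters(seqs: dict[str, str], cutoff: float) -> list[set[str]]:
    """UPGMA-style agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance, stopping once that distance exceeds `cutoff`."""
    dist = {frozenset((i, j)): p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}
    clusters = [{name} for name in seqs]

    def mean_dist(c1: set[str], c2: set[str]) -> float:
        return sum(dist[frozenset((i, j))] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: mean_dist(clusters[ij[0]], clusters[ij[1]]))
        if mean_dist(clusters[i], clusters[j]) > cutoff:
            break  # closest remaining pair is too divergent to join
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy orthogroup: two pairs of sequences that differ by 10% within pairs and 30-40% between.
seqs = {
    "a1": "ACGTACGTAC", "a2": "ACGTACGTAA",
    "b1": "TCGAACCTAC", "b2": "TCGAACCTAA",
}
print(average_linkage_clusters(seqs, cutoff=0.2))
```

With a cutoff of 0.2 the toy orthogroup splits into {a1, a2} and {b1, b2}, because the mean distance between the two groups is 0.35. Unlike a phylogenetic model, this step assumes nothing about tree structure, which matches the rationale given above for preferring plain pairwise distances when hybridization is extensive.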

      Reviewer #2 (Public Review):

      Summary:

      Birzu et al. describe two sympatric hot-spring cyanobacterial species ("alpha" and "beta") and infer recombination across the genome, including inter-species recombination events (hybridization) based on single-cell genome sequencing. The evidence for hybridization is strong and the authors took care to control for artefacts such as contamination during sequencing library preparation. Despite hybridization, the species remain genetically distinct from each other. The authors also present evidence for selective sweeps of genes across both species - a phenomenon which is widely observed for antibiotic resistance genes in pathogens, but rarely documented in environmental bacteria.

      Strengths:

      This manuscript describes some of the most thorough and convincing evidence to date of recombination happening within and between cohabitating bacteria in nature. Their single-cell sequencing approach allows them to sample the genetic diversity from two dominant species. Although single-cell genome sequences are incomplete, they contain much more information about genetic linkage than typical short-read shotgun metagenomes, enabling a reliable analysis of recombination. The authors also go to great lengths to quality-filter the single-cell sequencing data and to exclude contamination and read mismapping as major drivers of the signal of recombination.

      We thank the reviewer for their appreciation of our work.

      Weaknesses:

      Despite the very thorough and extensive analyses, many of the methods are bespoke and rely on reasonable but often arbitrary cutoffs (e.g. for defining gene sequence clusters etc.). Much of this is warranted, given the unique challenges of working with single-cell genome sequences, which are often quite fragmented and incomplete (30-70% of the genome covered). I think the challenges of working with this single-cell data should be addressed up-front in the main text, which would help justify the choices made for the analysis.

      We have significantly expanded the “Results” section to better justify and explain the choices we made during our analysis. We hope these changes address the reviewer’s concerns and make the manuscript more accessible to readers.

      The conclusions could also be strengthened by an analysis restricted to only a subset of the highest quality (>70% complete) genomes. Even if this results in a much smaller sample size, it could enable more standard phylogenetic methods to be applied, which could give meaningful support to the conclusions even if applied to just ~10 genomes or so from each species. By building phylogenetic trees, recombination events could be supported using bootstraps, which would add confidence to the gene sequence clustering-based analyses which rely on arbitrary cutoffs without explicit measures of support.

      It seems to us that the reviewer’s suggestion presupposes that the recombination events we find can be described as discrete events on an asexual phylogeny, similar to how rare mutations are treated in standard phylogenetic inference. Popular tools, such as ClonalFrame and its offshoots, have attempted to identify individual recombination events starting from these assumptions. But the main conclusion of both our linkage and SNP block analysis is that the ClonalFrame assumptions do not hold for our data. Under a clonal frame, the SNP blocks we observe should be perfectly linked, similar to mutations on an asexual tree. But our results in Fig. 7D show the opposite. Part of the issue may have been that in our original presentation, we only briefly discussed the results of our linkage analysis and referred readers to the Appendix for more details. To fix this issue we have added an extra figure (Fig. 2), showing a rapid decrease in linkage in both species and that at long distances the linkage values are essentially identical to the unlinked case, similar to sexual populations. We hope that this change will help clarify this point.
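The contrast between a clonal frame and an unlinked population can be made concrete with a standard linkage statistic. Under a clonal frame, two SNPs arising on the same branch stay perfectly associated (r² = 1), whereas frequent recombination pushes r² toward the value obtained when genomes are shuffled relative to one another. The sketch below assumes biallelic sites coded 0/1, uses None for genomes that do not cover a site (single-cell genomes are incomplete), and builds the "unlinked" baseline by permutation; the authors' exact statistic and null model may differ, and all names are hypothetical.

```python
import random
from typing import Optional, Sequence

Site = Sequence[Optional[int]]  # allele per genome: 0, 1, or None if the genome lacks coverage


def covered_pairs(site_a: Site, site_b: Site) -> list[tuple[int, int]]:
    """Keep only genomes that cover both sites."""
    return [(x, y) for x, y in zip(site_a, site_b) if x is not None and y is not None]


def r_squared(pairs: list[tuple[int, int]]) -> float:
    """Standard r^2 linkage disequilibrium between two biallelic sites coded 0/1."""
    n = len(pairs)
    p_a = sum(x for x, _ in pairs) / n
    p_b = sum(y for _, y in pairs) / n
    p_ab = sum(x * y for x, y in pairs) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic among covered genomes")
    return d * d / denom


def linkage_vs_unlinked(site_a: Site, site_b: Site,
                        n_perm: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Return (observed r^2, mean r^2 after shuffling site_b across genomes).
    Shuffling destroys any association between the sites while keeping allele
    frequencies fixed, which gives an 'unlinked' baseline."""
    pairs = covered_pairs(site_a, site_b)
    if not pairs:
        raise ValueError("no genome covers both sites")
    observed = r_squared(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(ys)
        total += r_squared(list(zip(xs, ys)))
    return observed, total / n_perm


# Two perfectly linked SNPs (as expected under a clonal frame); the last genome misses site A.
site_a = [0, 0, 0, 0, 1, 1, 1, 1, None]
site_b = [0, 0, 0, 0, 1, 1, 1, 1, 0]
print(linkage_vs_unlinked(site_a, site_b))
```

Here the observed r² is 1.0, while the shuffled baseline averages close to 1/(n − 1) = 1/7 for the eight covered genomes. Observed values that fall to this baseline at long distances are what a recombining, rather than clonal, population would produce.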

      The manuscript closes with a cartoon (Figure 4) which outlines the broad evolutionary scenario supported by the data and analysis. I agree with the overall picture, but I do think that some of the temporal ordering of events, especially the timing of recombination events could be better supported by data. In particular, is there evidence that inter-species recombination events are increasing or decreasing over time? Are they currently at steady-state? This would help clarify whether a newly arrived species into the caldera experiences an initial burst of accepting DNA from already-present species (perhaps involving locally adaptive alleles), or whether recombination events are relatively constant over time.

      The reviewer raises some very interesting questions about the dynamics of recombination in the population, which we hope to pursue in future work. We have added this as an open question in the Discussion (Lines 365-382).

      These questions could be answered by counting recombination events that occur deeper or more recently in a phylogenetic tree.

      The reviewer here seems to presuppose that recombination is rare enough that a phylogenetic tree can reliably be inferred, which is contrary to our linkage analysis (see the response to an earlier comment). Perhaps the reviewer missed this point in our original manuscript since it was discussed primarily in the Appendix. See also our response to a previous comment by the reviewer.

      The cartoon also shows a 'purple' species that is initially present, then donates some DNA to the 'blue' species before going extinct. In this model, 'purple' DNA should also be donated to the more recently arrived 'orange' species, in proportion to its frequency in the 'blue' genome. This is a relatively subtle detail, but it could be tested in the real data, and this may actually help discern the order of the inferred recombination events.

      We have included an extra figure in the main text (Fig. 6) that addresses the question of timing of events. A quantitative test of our cartoon model along the lines the reviewer suggested would certainly be worthwhile and we hope to do that in future work.  

      The abstract also makes a bold claim that is not well-supported by the data: "This widespread mixing is contrary to the prevailing view that ecological barriers can maintain cohesive bacterial species..." In fact, the two species are cohesive in the sense that they are identifiable based on clustering of genome-wide genetic diversity (as shown in Fig 1A). I agree that the mixing is 'widespread' in the sense that it occurs across the genome (as shown in Figure 2A) but it is clearly not sufficient to erode species boundaries. So I believe the data is consistent with a Biological Species Concept (sensu Bobay & Ochman, Genome Biology & Evolution 2017) that remains 'fuzzy' - such that there are still inter-species recombination events, just not sufficient to erode the cohesion of genomic clusters. Therefore, I think the data supports the emerging picture of most bacteria abiding by some version of a BSC, and is not particularly 'contrary' to the prevailing view.

      We have revised the phrase mentioned by the reviewer to “prevent genetic mixture between bacterial species,” which more accurately represents our conclusions. 

      The final Results paragraph begins by posing a question about epistatic interactions, but fails to provide a definitive answer to the extent of epistasis in these genomes. Quantifying epistatic effects in bacterial genomes is certainly of interest, but might be beyond the scope of this paper. This could be a Discussion point rather than an underdeveloped section of the Results.

      We agree with the reviewer that an exhaustive analysis of epistasis in the population is beyond the scope of the manuscript. Our original intention was to determine whether the SNP blocks we discovered showed evidence of strong linkage, as might be expected if only a small number of strains are present in the population. In light of the previous comments by the reviewer regarding the consistency with the clonal frame hypothesis, we believe this is especially relevant for our results. Moreover, the results we found, especially for the beta population, were quite conclusive: SNP block linkages in beta are indistinguishable from an unlinked model. To avoid misdirecting the reader about the significance of our results, we have revised the relevant paragraph (Lines 316-319).

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Although I am entirely convinced of the validity of the results, methodology, and interpretations presented in this work, I must say I found the paper very hard to read. And I think I am really quite familiar with these kinds of approaches. I fear that for people other than experts on these kinds of comparative genomic analyses, this paper will be almost impossible to read. With the aim of expanding the audience for this compelling work, I think the authors might want to consider ways to improve the presentation.

      At the end of a long project, the obtained results typically form a web of mutual interconnections and dependencies and one of the key challenges in presenting the results in a paper is having to untangle this web of connected results and analysis into a linear ordered narrative so that, at any point in the narrative, understanding the next point only depends on previous points in the narrative. I frankly feel that this paper fails at this.

      The paper reads to me as if one author put together the supplement by essentially writing a report of all the analyses that were done together with supplementary figures summarizing all those analyses, and that another author then wrote the main text by using the materials in the supplement almost in the way a cook uses ingredients for a dish. Almost every other sentence in the main text refers to results in the (31!) supplementary figures and can only be understood by reading the appropriate corresponding sections in the supplementary materials. I found it essentially impossible to read the main text without having first read the entire 50-page supplement.

      I think the paper could be hugely improved by trying to restructure the presentation so as to make it more linear. The main text can be expanded to include a summary of the crucial methods and analysis results from the supplement needed to understand the narrative in the main text. For example, as it currently stands it is really challenging to understand what is shown in figures 2 and 3 of the main text without having to first read a very substantial part of the supplement. Figure 3, even after having read the relevant sections in the supplement, took me quite a while to understand and almost felt like a puzzle to decipher. Rethinking which parts of the supplement are really necessary would also help. Finally, it would also help if the terminology was kept as simple, transparent, and consistent as possible.

      I understand that my suggestion to thoroughly reorganize the presentation may feel like a big hassle, but I am afraid that in its current form, these important results are essentially rendered inaccessible to all but a small group of experts in this area. This paper deserves a wider readership.

      We thank the reviewer for these valuable suggestions. In the revision, we have significantly expanded and restructured the “Results” section to make the presentation more linear, as the reviewer suggested (see our reply to the public comment by the reviewer for details). We hope these changes will make the manuscript easier to read.

      Reviewer #2 (Recommendations For The Authors):

      I found this paper challenging to follow since the main text was so condensed and the supplementary material so extensive. Given that eLife does not impose strong limits on the length of the main text, I suggest moving some key sections from the supplement into the main text to make it easier for the reader to follow rather than flipping back and forth. Adding to the confusion, supplementary figures were referenced out of order in the main text (e.g. S23 is referenced before S1). Please check the numbering and ensure figures are mentioned in the main text in the correct order.

      We thank the reviewer for their feedback on the presentation of the results. In response to similar comments from Reviewer #1, we have significantly expanded and restructured the “Results” section to make it easier to read (see also our responses to Reviewer #1).

      Page 2: The term 'coevolution' is typically reserved for two species that mutually impose selective pressures on one another (e.g. predator-prey interactions; see Janzen, Evolution 1980). In the context of these two cyanobacterial species, it's not clear that this is the case so I would simply refer to them as 'cohabitating' or being sympatric in the same environment.

      It is true that the term “coevolution” has become associated with predator-prey interactions, as the reviewer said. However, we feel that in our case “coevolution” fairly accurately describes the continual hybridization over long time scales we observe. We have therefore chosen to keep the term.

      Page 3: The authors mention that the gamma SAG is ~70% complete, which turns out to be quite high. It would be useful to mention early in the Results the mean/median completeness across SAGs, and how this leads to some challenges in analysing the data. Some of the material from the Supplement could be moved into the Results here.

      We have added a short note on the completeness in the Results (Lines 153-154). We have also added an extra figure in Appendix 1 with the completeness of all the SAGs for interested readers.

      I was left puzzled by the sentence: "Alternatively, high rates of recombination could generate different genotypes within each genome cluster that are adapted to different temperatures, with the relative frequencies of each cluster being only a correlated and not a causal driver of temperature adaptation." This is suggesting that individual genes or alleles, rather than entire genomes, could be adapted to temperature. But figure 1B seems to imply that the entire genome is adapted to different temperatures. Anyway, this does not seem to be a key point and could probably be removed (or clarified if the authors deem this an important point, which I failed to understand).

      We have revised this section to clarify the alternative hypothesis mentioned by the reviewer (Lines 100-103).

      Page 4. 'Several dozen' hybrid genes were found, but please also specify how many genes were tested. In general, it would be good to briefly outline the sample size (SAGs or genes) considered for each analysis.

      We have added the total numbers of genes we analyzed at each step of our analysis.

      'Mosaic hybrid loci' are mentioned alongside the issue of poor alignment. Presumably, the mosaic hybrid loci are first filtered to remove the poor alignments? This should be specified, and please mention how many loci are retained before/after this filter.

      We thank the reviewer for highlighting this important point. In the revision, we have implemented a more aggressive filtering of genes with poor alignments. We have added an extra paragraph to Appendix 1 (step 5 in the pipeline analysis) briefly explaining the issue.

      Page 5. "By contrast, the diversity of mosaic loci was typical of other loci within beta, suggesting most of the beta genome has undergone hybridization." Please point to the data (figure) to support this statement.

      We have restructured our discussion of the different hybrid loci so this comment is no longer relevant. In case the reviewer is interested, the synonymous diversity within beta was 0.047, while in mosaic hybrids it was 0.064.

      Page 6. "The largest diversity trough contained 28 genes." Since this trough is discussed in detail and seems to be of interest, it would be nice to illustrate it, perhaps as an inset in Figure 2 or as a separate figure. If I understood correctly, this trough includes genes (in a nitrogen-fixation pathway) that are present in all genomes, but are exchanged by homologous recombination. So I don't think it's correct to say that the "ancestors acquired the ability to fix nitrogen." Rather, the different alleles of these same genes were present in the ancestor. So perhaps there was a selective sweep involving alleles in this region that provided adaptation to local nitrogen sources or concentrations, but not a gain of new genes. Perhaps I misunderstood, in which case clarification would be appreciated.

      The reviewer raises an interesting possibility. We agree that it is in principle possible that the ancestor contained the nitrogen fixation genes and that the selective sweep simply replaced the ancestral alleles. In this particular case, however, gene order provides additional evidence that the entire pathway was acquired at roughly the same time. The gene order between alpha and beta is almost entirely different, with only a few segments containing more than 2-3 genes in the same order, as shown by Bhaya et al. 2007 and confirmed by additional unpublished analysis of the SAGs. One of the few exceptions is the nitrogen fixation pathway, which has essentially the same gene order over more than 20 kbp. Thus, if the ancestor of both alpha and beta had contained the nitrogen-fixation pathway, we would expect these genes to be scattered across the genome. We have revised the sentences in question to clarify this point (Lines 260-271).

      Page 6. Last paragraph on epistasis references Fig 3C, but I believe it should be Fig 3D.

      Fixed.

      Page 7. Figure 3 legend. "Note that alpha-2 is identical to gamma here." I believe it should be beta, not gamma.

      The reviewer is correct. We have fixed this error.

      Page 8. What is the evidence for "at least six independent colonizers"? I could not find the data supporting this claim.

      The statement mentioned by the reviewer was based on the maximum number of species clusters we identified in different core genes. However, during the revision, we found that only a handful of genes contained five or more clusters. We did find several tens of genes with four clusters. In addition, Rosen et al. (2018) also found additional 16S clusters at low frequency in the same springs. Based on these results we conservatively estimate that at least four independent strains colonized the caldera, but the number could be much greater. We have revised the text in question accordingly (Lines 336-339) and added Fig. 2 in Appendix 1 to support the conclusion.

      Page 9. Line 200: "acting to homogenize the population." It should be specified that the population is only homogenized at these introgressed loci, not genome-wide. Otherwise, the genome-wide species clusters seen in Fig 1 would not be maintained.

      It is true that the selective sweeps that lead to diversity troughs only homogenize the introgressed loci. But other hybrid segments could also rise to high frequency in the population during the sweep through hitchhiking. The fact that we observe SNP blocks generated through secondary recombination events of introgressed segments throughout the genome supports this view. While we do not yet fully understand the dynamics of this process, we feel that the current evidence supports the statement that mixing is occurring throughout the genome and not just at a few loci, so we have kept the original statement.

      The final sentence (lines 221-222) is vague and uninformative. On the one hand, "investigating whether hybridization plays a major role" is what the current manuscript has already done, depending on what is meant by 'major' (how much of the genome? Or whether there are ecological implications?). It is also not clear what is meant by a 'predictive theory' and 'possible evolutionary scenarios'. This should be elaborated upon; otherwise, this sentence could be cut.

      We thank the reviewer for their feedback. One possible source of confusion could be that in this sentence we were referring to detecting hybridization in other communities. We have changed “these communities” to “other communities” to make this clearer.

      Supplement.

      Broadly speaking, I appreciate the thorough and careful analysis of the single cell data. On the other hand, it is hard to evaluate whether these custom analyses are doing what is intended in many cases. Would it be possible to consider an analysis using more established methods, e.g. taking a subset of genomes with 'good' completeness and using Panaroo to find the core and accessory genome, then ClonalFrameML or Gubbins to infer a phylogeny and recombination events? Such analyses could probably be applied to a subset of the sample with relatively complete genomes. I don't want to suggest an overly time-consuming analysis, but the authors could consider what would be feasible.

      We have added a comparison between our analysis and those from two other methods, including ClonalFrameML, as suggested by the reviewer. One important point that we feel might have been lost in the first version is that our linkage results imply that recombination is not rare enough to be mapped onto an asexual tree, as assumed by ClonalFrameML. Note that this is not simply a technical limitation arising from incomplete coverage but is instead a consequence of the evolutionary dynamics of the population. Consistent with this, we found several inconsistencies in how recombination events were assigned by ClonalFrameML. We have summarized these conclusions in Appendix 7 of the revised manuscript.

      Page 8. Line 190. What is meant by 'minimal compositional bias'?

      We mean that the sample is not biased towards strains that grow in the lab. We have revised the sentence to clarify.

      Page 25. Figure S14 is not referenced in the text.

      We have added part of this figure to the main text since it illustrates one of our main results, namely that sites at long genomic distances are essentially unlinked.

      Page 26. The 'unlinked controls' (line 530) are very useful, but it would be even more informative to see if these controls also show the same decline in linkage with distance in the genome as observed in the real data. In particular, it would be good to know if the observed rapid decline in linkage with distance in the low-diversity regions is also observed in controls. Currently, it is unclear if this observation might be due to higher uncertainty in inferring linkage in low-diversity regions, which by definition have less polymorphism to include in the linkage calculation.

      We thank the reviewer for the suggestion. After further consideration, we have decided to remove the subsection on linkage decrease in the low-diversity regions. We feel such detailed quantitative analysis would be better suited for a more technical paper, which we hope to do at a later time.

      Page 26. There are some sections with missing identifiers (Sec ??).

      Fixed.

      Page 27. The information about the typical breadth of SAG coverage (~30%) would be better to include earlier in the Supplement, and also mentioned in the main text so the reader can more easily understand the nature of the dataset.

      We have added an extra figure with the SAG coverages to Appendix 1.

      Page 29. Any sensitivity analysis around the S = 0.9 value? Even if arbitrary, could the authors justify why they think this value is reasonable?

      We have significantly revised this section in response to earlier comments by one of the reviewers, and we hope this clarifies the details of our methods for interested readers. To answer the reviewer’s specific question, we chose this heuristic after examining the fraction of cells of each species in different species clusters. For the clusters assigned to alpha and beta, we found a sharp peak near one, and a cutoff of 0.9 captured most clusters while still being high enough to be inconsistent with a mixed cluster.
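      The S = 0.9 rule and the sensitivity check the reviewer asks about can be sketched as below. This is a minimal illustration that assumes, hypothetically, that each species cluster is available as a non-empty list of per-cell species labels. `assign_cluster` and `sensitivity` are illustrative names rather than functions from the authors' pipeline, and the toy clusters are invented.

```python
from collections import Counter


def assign_cluster(species_labels, s_cutoff=0.9):
    """Return the majority species of a cluster if its cell fraction reaches
    s_cutoff; otherwise return "mixed". Assumes a non-empty list of labels."""
    counts = Counter(species_labels)
    species, n = counts.most_common(1)[0]
    if n / len(species_labels) >= s_cutoff:
        return species
    return "mixed"


def sensitivity(clusters, cutoffs):
    """For each candidate cutoff, count the clusters assigned to a single species."""
    return {s: sum(assign_cluster(c, s) != "mixed" for c in clusters) for s in cutoffs}


# Hypothetical clusters with majority fractions 0.95, 1.0 and 0.6
toy_clusters = [
    ["alpha"] * 19 + ["beta"],
    ["beta"] * 10,
    ["alpha"] * 6 + ["beta"] * 4,
]
print(sensitivity(toy_clusters, [0.8, 0.9, 0.99]))  # {0.8: 2, 0.9: 2, 0.99: 1}
```

      If the majority fractions of real clusters peak sharply near one, the counts returned by such a sweep would be expected to change little over a range of cutoffs below one.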

      Page 30. I could not see where Fig. S17 was mentioned in the text. Also, how are 'simple hybrid genes' defined?

      We have removed this figure in the revision. Definitions of the different types of hybrid genes have been added to the main text in response to a comment from the other reviewer.

      Page 36. It is hard to tell what reference the divergence is 'high' relative to. Would it be possible to include the expected value (from ref. 12) in the plot, or at least mention it explicitly in the text?

      We have added the mean synonymous and non-synonymous divergences between alpha and beta to the figures for reference.

      Page 38. Line 770 "would be comparable to that of beta." This is not necessarily the case since beta could have a different time to its most recent common ancestor. It could have a different time to the last bottleneck or selective sweep, etc.

      We thank the reviewer for pointing out this misleading statement. Our point here was that in the first scenario the TMRCA of alpha and beta would be similar since the diversity in the high-diversity alpha genes is similar to beta. We have clarified this statement in the revision.

      Page 39. Line 793. The use of the term 'genomic backbone' implies the presence of a clonal frame, which is not what the data seems to support. Perhaps another term such as 'genetic diversity' would more appropriately capture the intended meaning here.

      We agree with the reviewer that the low-diversity regions may not be asexual. We used “genomic backbone” to distinguish it from the “clonal frame,” which is usually taken to mean that the backbone is asexual. We have added a note in the revision to clarify this point.

      Page 39. Lines 802-805. I found this explanation hard to follow. Could the logic be clarified?

      We simply meant that although the beta distribution is unimodal, it is not consistent with a simple Poisson distribution, unlike in alpha. We have added an extra sentence to clarify this.

    1. eLife Assessment

      This valuable study uses tools of population and functional genomics to examine long non-coding RNAs (lncRNAs) in the context of human evolution. Analyses of computationally predicted human-specific lncRNAs and their genomic targets lead to the development of hypotheses regarding the potential roles of these genetic elements in human biology. The conclusions regarding evolutionary acceleration and adaptation, however, only incompletely take data and literature on human/chimpanzee genetics and functional genomics into account.

    2. Reviewer #2 (Public review):

      In this valuable manuscript, Lin et al attempt to examine the role of long non-coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are incomplete and at times inadequate, the results nonetheless point towards a possible contribution of long non-coding RNAs to shaping humans, and suggest clear directions for future, more rigorous study.

      Comments on revisions:

      I thank the authors for their revision and changes in response to previous rounds of comments. As before, I appreciate the changes made in response to my comments, and I think everyone is approaching this in the spirit of arriving at the best possible manuscript, but we still have some deep disagreements on the nature of the relevant statistical approach and defining adequate controls. I highlight a couple of places that I think are particularly relevant, but note that given the authors disagree with my interpretation, they should feel free to not respond!

      (1) On the subject of the 0.034 threshold, I had previously stated: "I do not agree with the rationale for this claim, and do not agree that it supports the cutoff of 0.034 used below."

      In their reply to me, the authors state: "What we need is a gene number, which (a) indicates genes that effectively differentiate humans from chimpanzees, (b) can be used to set a DBS sequence distance cutoff. Since this study is the first to systematically examine DBSs in humans and chimpanzees, we must estimate this gene number based on studies that identify differentially expressed genes in humans and chimpanzees. We choose Song et al. 2021 (Song et al. Genetic studies of human-chimpanzee divergence using stem cell fusions. PNAS 2021), which identified 5984 differentially expressed genes, including 4377 genes whose differential expression is due to trans-acting differences between humans and chimpanzees. To the best of our knowledge, this is the only published data on trans-acting differences between humans and chimpanzees, and most HS lncRNAs and their DBSs/targets have trans-acting relationships (see Supplementary Table 2). Based on these numbers, we chose a DBS sequence distance cutoff of 0.034, which corresponds to 4248 genes (the top 20%), slightly fewer than 4377."

      I have some notes here. First, Agoglia et al, Nature, 2021, also examined the nature of cis vs trans regulatory differences between human and chimps using a very similar setup to Song et al; their Supplementary Table 4 enables the discovery of genes with cis vs trans effects, although admittedly this is less straightforward than with the Song et al data. Second, I can't actually tell how the 4377 number is arrived at. From Song et al, "Of 4,671 genes with regulatory changes between human-only and chimpanzee-only iPSC lines, 44.4% (2,073 genes) were regulated primarily in cis, 31.4% (1,465 genes) were regulated primarily in trans, and the remaining 1,133 genes were regulated both in cis and in trans (Fig. 2C). This final category was further broken down into a cis+trans category (cis- and trans-regulatory changes acting in the same direction) and a cis-trans category (cis- and trans-regulatory changes acting in opposite directions)." Even combining trans-only and cis&trans genes gives only 2,598 genes with evidence for some trans regulation. I cannot find 4,377 in the main text of the Song et al paper.
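      The reviewer's tally can be checked directly from the counts quoted from Song et al. The snippet below only restates those published numbers; it is not a reanalysis of their data.

```python
# Gene counts quoted above from Song et al. (2021)
cis_only = 2073
trans_only = 1465
cis_and_trans = 1133  # covers both the cis+trans and cis-trans subcategories

total_regulatory = cis_only + trans_only + cis_and_trans
any_trans = trans_only + cis_and_trans  # genes with at least some trans regulation

print(total_regulatory)  # 4671
print(any_trans)         # 2598
print(round(100 * cis_only / total_regulatory, 1))    # 44.4
print(round(100 * trans_only / total_regulatory, 1))  # 31.4
```

      Neither total equals the 4,377 figure used to set the cutoff.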

      Elsewhere in their response, the authors respond to my comment that 0.034 is an arbitrary threshold by repeating the analyses using a cutoff of 0.035. I appreciate the sentiment here, but I would not expect this to make any great difference, given how similar those numbers are! A better approach, and what I had in mind when I mentioned this, would be to test multiple thresholds, ranging from, eg, 0.05 to 0.01 at some well-defined step size.
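      The multi-threshold check suggested here could be organized along the lines below. This is a minimal sketch that assumes the per-gene DBS sequence distances are available as a flat list and that genes are selected when their distance is at or above the cutoff. The distances are invented for illustration. In practice, the gene set enrichment analysis would be rerun on the gene set selected at each cutoff.

```python
def genes_passing(distances, cutoff):
    """Number of genes whose DBS sequence distance is at or above the cutoff."""
    return sum(d >= cutoff for d in distances)


def sweep(distances, start=0.01, stop=0.05, step=0.005):
    """Gene counts for evenly spaced cutoffs from start to stop (both inclusive)."""
    n_steps = round((stop - start) / step)
    # Rounding keeps floating-point error from producing keys like 0.015000000000000001
    cutoffs = [round(start + i * step, 6) for i in range(n_steps + 1)]
    return {c: genes_passing(distances, c) for c in cutoffs}


# Hypothetical per-gene distances, for illustration only
toy_distances = [0.004, 0.012, 0.020, 0.034, 0.034, 0.041, 0.052]
for cutoff, n_genes in sweep(toy_distances).items():
    print(f"cutoff={cutoff:.3f}  genes={n_genes}")
```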

      (2) The authors have introduced a new TFBS section as a control for their lncRNAs. This is welcome, though again I would ask for caution when interpreting results. For instance, in their reply to me the authors state: "The number of HS TFs and HS lncRNAs (5 vs 66) alone lends strong evidence suggesting that HS lncRNAs have contributed more significantly to human evolution than HS TFs (note that 5 is the union of three intersections between <many2zero + one2zero> and the three <human TF list>)."

      But this assumes the denominator is the same! There are 35899 lncRNAs according to the current GENCODE build; 66/35899 = 0.0018, so 0.18% of lncRNAs are HS. The authors compare this to 5 TFs. There are 19433 protein coding genes in the current GENCODE build, which naively (5/19433) gives a big depletion (0.026%) relative to the lncRNA number. However, this assumes all protein coding genes are TFs, which is not the case. A quick search suggests that ~2000 protein coding genes are TFs (see, eg, https://pubmed.ncbi.nlm.nih.gov/34755879/), which gives an enrichment (although I doubt it is a statistically significant one!) of HS TFs over HS lncRNAs (5/2000 = 0.0025). Hence my emphasis on needing to be sure the controls are robust and valid throughout!
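      The denominator argument can be made explicit with the counts given in this comment. The ~2000 TF figure is the approximate number cited above, so the last ratio is only indicative. Whether any difference is significant would need a formal test (for example, Fisher's exact test), which is not performed here.

```python
def pct(k, n):
    """Percentage of n items that fall in a subset of size k."""
    return 100 * k / n


# Counts quoted in the comment above
hs_lnc, all_lnc = 66, 35899   # HS lncRNAs; lncRNAs in the GENCODE build cited
hs_tf = 5                     # HS TFs reported by the authors
all_protein_coding = 19433    # protein-coding genes in the GENCODE build cited
approx_tfs = 2000             # approximate number of human TFs

lnc_rate = pct(hs_lnc, all_lnc)                  # about 0.18%
tf_rate_naive = pct(hs_tf, all_protein_coding)   # about 0.026%
tf_rate = pct(hs_tf, approx_tfs)                 # 0.25%

print(f"HS lncRNAs / all lncRNAs:     {lnc_rate:.2f}%")
print(f"HS TFs / all protein-coding:  {tf_rate_naive:.3f}%")
print(f"HS TFs / approx. all TFs:     {tf_rate:.2f}%")
```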

      (3) In my original review I said: line 187: "Notably, 97.81% of the 105141 strong DBSs have counterparts in chimpanzees, suggesting that these DBSs are similar to HARs in evolution and have undergone human-specific evolution." I do not see any support for the inference here. Identifying HARs and acceleration relies on a far more thorough methodology than what's being presented here. Even generously, pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee.

      In their reply to me, the authors state: Here, we actually made an analogy but not an inference; therefore, we used such words as "suggesting" and "similar" instead of using more confirmatory words. We have revised the latter half sentence, saying "raising the possibility that these sequences have evolved considerably during human evolution".

      Is the aim here to draw attention to the ~2.2% of DBSs that do not have a counterpart? If so, it would be better to rewrite the sentence to emphasise those rather than the ones that are shared between the two species. I do appreciate the revised wording, though.

      (4) Finally, Line 408: "Ensembl-annotated transcripts (release 79)" Release 79 is dated to March 2015, which is quite a few releases and genome builds ago. Is this a typo? Both the human and the chimpanzee genome have been significantly improved since then!

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      In this valuable manuscript, Lin et al attempt to examine the role of long non-coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are incomplete and at times inadequate, the results nonetheless point towards a possible contribution of long non-coding RNAs to shaping humans, and suggest clear directions for future, more rigorous study.

      Comments on revisions:

      I thank the authors for their revision and changes in response to previous rounds of comments. As it had been nearly two years since I last saw the manuscript, I reread the full text to familiarise myself again with the findings presented. While I appreciate the changes made and think they have strengthened the manuscript, I still find parts of it a bit too speculative or hyperbolic. In particular, I think claims of evolutionary acceleration and adaptation require more careful integration with existing human/chimpanzee genetics and functional genomics literature.

      We sincerely thank the reviewer for their great patience and valuable comments, which have helped us further improve the manuscript. Before responding to the comments point by point, we provide a summary here.

      (1) On parameters and cutoffs.

      Parameters and cutoffs influence data analysis. The large number of Supplementary Notes, Supplementary Figures, and Supplementary Tables reflects the close attention we paid to the influence of parameters and the robustness of analyses. Specifically, here we explain the DBS sequence distance cutoff of 0.034, which determines the top 20% of genes that most differentiate humans from chimpanzees and influences the gene set enrichment analysis (Figure 2). As described in the revised manuscript, we estimated this cutoff based on Song et al., checked its plausibility against Prufer et al. (Song et al. 2021; Prufer et al. 2017), and assessed its influence by examining slightly different cutoff values (e.g., 0.035).

      (2) Analyses of HS TFs and HS TF DBSs.

      It is desirable to compare the contributions of HS lncRNAs and HS TFs to human evolution. Identifying HS TFs is challenging because different institutions (e.g., NCBI and Ensembl) annotate orthologous genes using different criteria, and because multiple human TF lists have been published by different research groups. Recently, Kirilenko et al. identified orthologous genes in hundreds of placental mammals and birds and organized different types of genes into pairwise-comparison datasets (e.g., hg38-panTro6) using humans and mice as references (Kirilenko et al. Integrating gene annotation with orthology inference at scale. Science 2023). We identified HS TFs based on (a) the many2zero and one2zero gene lists in the “hg38-panTro6” dataset and (b) three human TF lists reported by two studies (Bahram et al. 2015; Lambert et al. 2018) and used in the SCENIC package. The numbers of HS TFs and HS lncRNAs (5 vs 66) alone lend strong evidence suggesting that HS lncRNAs have contributed more significantly to human evolution than HS TFs (note that 5 is the union of three intersections between <many2zero + one2zero> and the three <human TF list>).

      TF DBS (i.e., TFBS) prediction is also challenging because TFBSs are very short (mostly about 10 bp) and TF-DNA binding involves many cofactors (Bianchi et al. Zincore, an atypical coregulator, binds zinc finger transcription factors to control gene expression. Science 2025). We used two TF DBS prediction programs to predict HS TF DBSs: the well-established FIMO program (whose results have been incorporated into the JASPAR database) (Rauluseviciute et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. NAR 2023) and the recently reported CellOracle program (Kamimoto et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 2023). We then performed downstream analyses and obtained two major results. First, on average (per base), fewer selection signals are detected in HS TF DBSs (although caution is needed because TF DBSs are very short). Second, HS TFs and HS lncRNAs contribute to human evolution in quite different ways (Supplementary Figs. 25 and 26).
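      The statement above that 5 is the union of three intersections corresponds to the set logic sketched below. The gene identifiers are placeholders, and `hs_tfs` is a hypothetical helper rather than the authors' code.

```python
def hs_tfs(many2zero, one2zero, tf_lists):
    """Union, over several TF lists, of the intersection between each list and
    the human-specific genes (many2zero plus one2zero)."""
    hs_genes = set(many2zero) | set(one2zero)
    result = set()
    for tfs in tf_lists:
        result |= hs_genes & set(tfs)
    return result


# Placeholder gene IDs, for illustration only
many2zero = {"G1", "G2"}
one2zero = {"G3", "G4"}
tf_lists = [{"G1", "T9"}, {"G3"}, {"G1", "G5"}]
print(sorted(hs_tfs(many2zero, one2zero, tf_lists)))  # ['G1', 'G3']
```

      Because the union of the intersections equals the intersection of the HS genes with the union of the TF lists, a TF that appears in several lists is counted only once.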

      (3) On whether genes with more transcripts may appear as spurious targets of HS lncRNAs.

      Now, the results for HS TF DBSs allow us to address whether genes with more transcripts may appear as spurious targets of HS lncRNAs. We note that (a) we predicted HS lncRNA DBSs and HS TF DBSs in the same promoter regions upstream of the same 179128 Ensembl-annotated transcripts (release 79), and (b) we used the same GTEx transcript expression matrices in the analyses of HS TF DBSs and HS lncRNA DBSs (the GTEx database includes both gene expression matrices and transcript expression matrices; the latter include multiple transcripts per gene). Thus, the analyses of HS TF DBSs provide an effective control for whether genes with more transcripts may appear as spurious targets of HS lncRNAs and, consequently, cause the high percentages of HS lncRNA-target transcript pairs that show correlated expression in the brain (Figure 3). We find that the percentages of HS TF-target transcript pairs that show correlated expression are also high in the brain, but the overall profile across GTEx tissues is significantly different from that of HS lncRNA DBSs (Figure 3A; Supplementary Figure 25). Moreover, the difference between HS lncRNA DBSs and HS TF DBSs is more apparent in the distribution of significantly changed DBSs across GTEx tissues (Figure 3B; Supplementary Figure 26). Together, these results suggest that the brain-enriched distribution of co-expressed HS lncRNA-target transcript pairs arises from HS lncRNA-mediated transcriptional regulation rather than from differences in transcript number.

      (4) Additional notes on HS TFs and HS TF DBSs.

      First, the “many2zero” and “one2zero” gene lists in the “hg38-panTro6” dataset of Kirilenko et al. provide the most up-to-date, but not the most complete, data on human-specific genes, because “hg38-panTro6” is a pairwise comparison. The Ensembl database also annotates orthologous genes but lacks pairwise comparisons such as “hg38-panTro6”. Therefore, not all HS genes based on “hg38-panTro6” agree with the orthologous gene annotations in the Ensembl database. Second, if HS genes were identified based on both Ensembl and Kirilenko et al., there would be fewer HS TFs.

      (5) On speculative or hyperbolic claims.

      First, the title “Human-specific lncRNAs contributed critically to human evolution by distinctly regulating gene expression” is now further supported by the HS TF DBS analyses. Second, we have carefully revised the entire manuscript to make it more readable, accurate, logically sound, and biologically acceptable. Third, in this revision we have avoided speculative or hyperbolic claims in the results, interpretations, and discussion as far as possible. This includes toning down statements and claims, for example, using “reshape” instead of “rewire” and “suggest” instead of “indicate”. Since the revisions are pervasive, we do not mark all of them, except those directly relevant to the reviewer’s comments.

      (1) Line 155: "About 5% of genes have significant sequence differences in humans and chimpanzees," This statement needs a citation, and a definition of what is meant by 'significant', especially as multiple lines below instead mention how it's not clear how many differences matter, or which of them, etc.

      Different studies give different estimates, from 1.24% (Ebersberger et al. Genomewide Comparison of DNA Sequences between Humans and Chimpanzees. Am J Hum Genet. 2002) to 5% (Britten RJ. Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels. PNAS 2002). The 5% for significant gene sequence differences arises when considering a broader range of genetic variations, particularly insertions and deletions of genetic material (indels). To provide more accurate information, we have replaced this simple statement with a more comprehensive one and cited the above two papers.

      (2) line 187: "Notably, 97.81% of the 105141 strong DBSs have counterparts in chimpanzees, suggesting that these DBSs are similar to HARs in evolution and have undergone human-specific evolution." I do not see any support for the inference here. Identifying HARs and acceleration relies on a far more thorough methodology than what's being presented here. Even generously, pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee.

      Here, we actually made an analogy but not an inference; therefore, we used such words as “suggesting” and “similar” instead of using more confirmatory words. We have revised the latter half sentence, saying “raising the possibility that these sequences have evolved considerably during human evolution”.

      (3) line 210: "Based on a recent study that identified 5,984 genes differentially expressed between human-only and chimpanzee-only iPSC lines (Song et al., 2021), we estimated that the top 20% (4248) genes in chimpanzees may well characterize the human-chimpanzee differences". I do not agree with the rationale for this claim, and do not agree that it supports the cutoff of 0.034 used below. I also find that my previous concerns with the very disparate numbers of results across the three archaics have not been suitably addressed.

      (1) Indeed, “we estimated that the top 20% (4248) genes in chimpanzees may well characterize the human-chimpanzee differences” is an improper claim; we made this mistake because of imprecise wording.

      (2) What we need is a gene number, which (a) indicates genes that effectively differentiate humans from chimpanzees, (b) can be used to set a DBS sequence distance cutoff. Since this study is the first to systematically examine DBSs in humans and chimpanzees, we must estimate this gene number based on studies that identify differentially expressed genes in humans and chimpanzees. We choose Song et al. 2021 (Song et al. Genetic studies of human-chimpanzee divergence using stem cell fusions. PNAS 2021), which identified 5984 differentially expressed genes, including 4377 genes whose differential expression is due to trans-acting differences between humans and chimpanzees. To the best of our knowledge, this is the only published data on trans-acting differences between humans and chimpanzees, and most HS lncRNAs and their DBSs/targets have trans-acting relationships (see Supplementary Table 2). Based on these numbers, we chose a DBS sequence distance cutoff of 0.034, which corresponds to 4248 genes (the top 20%), slightly fewer than 4377.

      (3) If we chose a DBS sequence distance cutoff of 0.033 or 0.035, slightly more or fewer genes would be selected, raising the question of whether this would significantly influence the downstream gene set enrichment analysis (Figure 2). We found that 91 genes have a DBS sequence distance of 0.034. Thus, with a cutoff of 0.035, 4248 - 91 = 4157 genes would be selected, and the influence on the gene set enrichment analysis was very limited.

      (4) On the disparate numbers of results across the three archaics. Figure 1A is based on Figure 2 in Prufer et al. 2017. At first glance, our Figure 1A indicates that the Altai Neanderthal is older than the Denisovan (in kya), which would make our result (“identified 1256, 2514, and 134 genes in Altai Neanderthals, Denisovans, and Vindija Neanderthals”) appear unreasonable. However, Prufer et al. (2017) reported that “It has been suggested that Denisovans received gene flow from a hominin lineage that diverged prior to the common ancestor of modern humans, Neandertals, and Denisovans……In agreement with these studies, we find that the Denisovan genome carries fewer derived alleles that are fixed in Africans, and thus tend to be older, than the Altai Neandertal genome”. This observation by Prufer et al. explains our result that more genes with large DBS sequence distances were identified in Denisovans than in Altai Neanderthals. Of course, the numbers 1256, 2514, and 134 depend on the cutoff of 0.034. With a cutoff of 0.035, these numbers change slightly, but their relationship remains (i.e., more genes in Denisovans). We examined multiple cutoff values and found that more genes in Denisovans than in Altai Neanderthals have large DBS sequence distances.
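      The cutoff selection described in points (2) and (3) above can be sketched as follows: pick a cutoff so that the number of genes at or above it falls just under a target such as 4377, and observe how genes tied at a single distance value enter or leave the selection together. This assumes genes are selected when their DBS sequence distance is at or above the cutoff. The helper name and toy distances are hypothetical.

```python
def cutoff_for_target(distances, target):
    """Return the smallest observed distance value c such that the number of genes
    with distance >= c does not exceed target, together with that gene count.

    Many genes can share one distance value, so the count may fall somewhat
    below the target instead of matching it exactly.
    """
    for c in sorted(set(distances)):
        n = sum(d >= c for d in distances)
        if n <= target:
            return c, n
    raise ValueError("target is smaller than the number of genes at the largest distance")


toy_distances = [0.010, 0.020, 0.034, 0.034, 0.034, 0.050]
print(cutoff_for_target(toy_distances, 4))  # (0.034, 4)
print(cutoff_for_target(toy_distances, 3))  # (0.05, 1)
```

      In the second call, the three genes tied at 0.034 are excluded together. This mirrors how moving the cutoff from 0.034 to 0.035 removes all 91 genes at 0.034 at once.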

      (4) I also think that there is still too much of a tendency to assume that adaptive evolutionary change is the only driving force behind the observed results. As I've stated before, I do not doubt that lncRNAs contribute in some way to evolutionary divergence between these species, as do other gene regulatory mechanisms; however, the manuscript leans heavily on it being the sole, or primary, force, and that requires much stronger supporting evidence. Examples include, but are not limited to:

      (1) Indeed, the observed results are also shaped by other genomic elements and mechanisms (although it is hardly feasible to identify and differentiate them in a single study), and we do not assume that adaptive evolutionary change is the only driving force. We have revised the text carefully to avoid giving readers the impression that we hold this assumption.

      (2) Comparing HS lncRNAs to HS TFs is critical, and we have done this.

      (5) line 230: "These results reveal when and how HS lncRNA-mediated epigenetic regulation influences human evolution." This statement is too speculative.

      We have toned down the statement, which now reads “These results provide valuable insights into when and how HS lncRNA-mediated epigenetic regulation impacts human evolution”.

      Line 268: "yet the overall results agree well with features of human evolution." What does this mean? This section is too short and unclear.

      (1) First, the sentence “Selection signals in YRI may be underestimated due to fewer samples and smaller sample sizes (than CEU and CHB), yet the overall results agree well with features of human evolution” has been deleted, because the CEU, CHB, and YRI sample sizes are comparable (100, 99, and 97, respectively).

      (2) Now the sentence has been changed to “These results agree well with findings reported in previous studies, including that fewer selection signals are detected in YRI (Sabeti et al., 2007; Voight et al., 2006)”.

      (3) On “This section is too short and unclear”: to keep the manuscript readable, we use short sections rather than long ones. This section makes two points: (a) our finding that more selection signals were detected in CEU and CHB than in YRI agrees with well-established findings (Voight et al. A Map of Recent Positive Selection in the Human Genome. PLoS Biology 2006; Sabeti et al. Genome-wide detection and characterization of positive selection in human populations. Nature 2007), and (b) in many DBSs, selection signals were detected by multiple tests.

      Line 325: "and form 198876 HS lncRNA-DBS pairs with target transcripts in all tissues." This has not been shown in this paper - sequence based analyses simply identify the “potential” to form pairs.

      This section describes transcriptomic analysis using the GTEx data. Indeed, target transcripts of HS lncRNAs are predicted by sequence-based analysis, and a predicted target is not necessarily regulated by the HS lncRNA in a given tissue. Here, “pair” means an HS lncRNA-target transcript pair whose expression shows a significant Pearson correlation in a GTEx tissue. We do not equate correlation with regulation; rather, we identified HS lncRNA-mediated transcriptional regulation based on both the DBS-targeting relationship and the correlation relationship.

      Line 423: "Our analyses of these lncRNAs, DBSs, and target genes, including their evolution and interaction, indicate that HS lncRNAs have greatly promoted human evolution by distinctly rewiring gene expression." I do not agree that this conclusion is supported by the findings presented - this would require significant additional evidence in the form of orthogonal datasets.

      (1) As mentioned above, we have used “reshape” to replace “rewire” and used “suggest” to replace “indicate”. In addition, we have substantially revised the Discussion, in which this sentence is replaced by “our results suggest that HS lncRNAs have greatly reshaped (or even rewired) gene expression in humans”.

      (2) Multiple citations have been added, including Voight et al. 2006 (Voight et al. A Map of Recent Positive Selection in the Human Genome. PLoS Biology 2006) and Sabeti et al. 2007 (Sabeti et al. Genome-wide detection and characterization of positive selection in human populations. Nature 2007).

      (3) We have analyzed HS TF DBSs, and the obtained results also support the critical contribution of HS lncRNAs.

      I also return briefly to some of my comments before, in particular on the confounding effects of gene length and transcript/isoform number. In their rebuttal the authors argued that there was no need to control for this, but this does in fact matter. A gene with 10 transcripts that differ in the 5' end has 10 times as many chances of having a DBS than a gene with only 1 transcript, or a gene with 10 transcripts but a single annotated TSS. When the analyses are then performed at the gene level, without taking into account the number of transcripts, this could introduce a bias towards genes with more annotated isoforms. Similarly, line 246 focuses on genes with "SNP numbers in CEU, CHB, YRI are 5 times larger than the average." Is this controlled for length of the DBS? All else being equal a longer DBS will have more SNPs than a shorter one. It is therefore not surprising that the same genes that were highlighted above as having 'strong' DBS, where strength is impacted by length, show up here too.

      (1) In gene set enrichment analysis (Figure 2, which is a gene-level analysis), when determining genes differentiating humans from chimpanzees based on DBS sequence distance, if a gene has multiple transcripts/DBSs, we choose the DBS with the largest distance. That is, the input to g:Profiler is a non-redundant gene list.

      (2) In GTEx data analysis (Figure 3, which is a transcriptome-level analysis), the analyses of HS TF DBSs using the GTEx data suggest that differences in DBS/transcript numbers among genes are unlikely to cause confounding effects. As explained above, we predicted HS TF DBSs in the same promoter regions of 179128 Ensembl-annotated transcripts (release 79), yet the resulting Supplementary Figures 25 and 26 are distinctly different from Figure 3AB.

      (3) In evolutionary analysis, a gene with 10 DBSs has a higher chance of having selection signals than a gene with 1 DBS. This is biologically plausible, because many conserved genes have novel transcripts whose expression is species-, tissue-, or developmental stage-specific, and the DBSs upstream of these novel transcripts may differ from those upstream of conserved transcripts.

      (4) “line 246 focuses on genes with "SNP numbers in CEU, CHB, YRI are 5 times larger than the average." Is this controlled for the length of the DBS?”: This was a limitation. We have now computed SNP numbers per base and used the new table to replace the old Supplementary Table 8. After examining the new table, we find that the major results of the SNP analysis remain.

      (5) On “Is this controlled for length of the DBS? All else being equal a longer DBS will have more SNPs than a shorter one”: we do not think there is reason to control for DBS length, and what “all else being equal” means here matters. First, DBS sequences have specific features, and a long DBS carries these features more strongly than a short one; a long DBS is therefore less likely to arise by chance in the genome and less likely to be wrongly predicted. In other words, longer DBSs are less likely to be false positives (as we explained, the chance that a DBS of 147 bp, the mean DBS length, is wrongly predicted is extremely low, p<8.2e-19 to 1.5e-48). Second, a difference in length suggests a difference in binding affinity, which in turn influences the regulation of the specific transcripts and the analysis of GTEx data. Third, it cannot be excluded that some SNPs are selection signals (detecting selection signals is challenging, and many cannot be detected by statistical tests; see Grossman et al. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science 2010).

      (6) On “It is therefore not surprising that the same genes that were highlighted above as having 'strong' DBS, where strength is impacted by length” - Indeed, strength is influenced by length, see the above response.
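      The two adjustments described in points (1) and (4), collapsing transcript-level DBSs to one DBS per gene before gene set enrichment and expressing SNP counts per base, can be sketched as follows. This is a minimal, hypothetical illustration: the `DBS` record, its field names, the `fold` parameter, and the toy values are assumptions and do not reproduce the authors' code or Supplementary Table 8.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DBS:
    gene: str        # gene the transcript belongs to
    transcript: str  # transcript whose promoter region contains this DBS
    length_bp: int   # DBS length in base pairs
    distance: float  # human-chimpanzee DBS sequence distance
    snp_count: int   # SNPs falling inside the DBS in one population


def gene_level_max_distance(dbss: List[DBS]) -> Dict[str, DBS]:
    """Keep one DBS per gene (largest distance), giving a non-redundant gene list."""
    best: Dict[str, DBS] = {}
    for d in dbss:
        if d.gene not in best or d.distance > best[d.gene].distance:
            best[d.gene] = d
    return best


def snp_density(d: DBS) -> float:
    """SNPs per base, so a DBS is not ranked higher merely because it is longer."""
    return d.snp_count / d.length_bp


def flag_high_density(dbss: List[DBS], fold: float) -> List[str]:
    """Transcripts whose SNP density exceeds `fold` times the mean density."""
    mean = sum(snp_density(d) for d in dbss) / len(dbss)
    return [d.transcript for d in dbss if snp_density(d) > fold * mean]


# Toy records for illustration only.
dbss = [
    DBS("G1", "T1", 100, 0.02, 3),
    DBS("G1", "T2", 200, 0.05, 4),
    DBS("G2", "T3", 150, 0.01, 6),
]
print(sorted(g.transcript for g in gene_level_max_distance(dbss).values()))
print([round(snp_density(d), 3) for d in dbss])
print(flag_high_density(dbss, fold=1.2))
```

      In the toy data, T2 has more SNPs than T1 (4 vs. 3) but a lower density (0.02 vs. 0.03 per base), which is the length effect the reviewer raised. The `fold` argument mirrors the "5 times larger than the average" criterion; a smaller value is used here only because three toy records cannot exceed five times their mean.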

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Finally, figure 1 panels D and F are not legible - the font is tiny! There's also a typo in panel A, where "Homo Sapien" should be "Homo sapiens".

      (1) “Homo sapien” is changed to “Homo sapiens”.

      (2) Even at double the font size, the text in these panels would still be too small, and inserting a very large panel D would make Figure 1 unwieldy; converting Figure 1D into an independent figure is unnecessary. Panels 1D and 1F are illustrative: the full Fig. 1D is Supplementary Figure 6, and the full Fig. 1F is Figure 3. We have revised the Fig. 1 legend to explain this.

    1. eLife Assessment

      This valuable study is a comprehensive investigation into the regulatory mechanisms and regional distribution of enteroendocrine cell subtypes in the Drosophila midgut, significantly advancing the understanding of how WNT and BMP gradients contribute to EE diversity. The methodological foundation and robust genetic evidence are solid in supporting the key roles of compartment boundary signals, particularly WNT and BMP, in specifying EE subtypes and division modes. However, there is a lack of full mechanistic insight regarding Notch pathway involvement, incomplete quantification of phenotype data, and insufficient global pattern analysis, which detracts from fully supporting some proposed models. Overall, the study provides a platform for future work but would benefit from stronger data integration and expanded mechanistic exploration.

    2. Reviewer #1 (Public review):

      This valuable study explores the regulatory mechanisms underlying the regional distribution of enteroendocrine cell subtypes in the Drosophila midgut. The regional distribution of EE cell subtypes is carefully documented, and the data convincingly show that each EE cell subtype has a unique spatial pattern. The study aims at determining how the spatial distribution of EE cell subtypes is established and maintained, and explores the roles of three pathways: Notch, WNT, and BMP. The data show evidence that Notch signaling regulates the subtype specificity, being necessary for the specification of Type II, but not Type I and III EE cell subtype specification. The immunofluorescence data in Figure 3 are convincing, but the analysis is incomplete due to a lack of quantification. How Notch signaling activity relates to the emergence of the regional EE cell patterns remains unclear.

      As WNT and BMP are known as morphogens, the study explores their expression patterns and their roles in establishing and maintaining the subtype identities. The observed patterns of WNT and BMP are consistent with earlier studies. Manipulation of WNT and BMP pathway activities in intestinal stem cells is shown to have some region-specific effects on specific EE cell subtypes. The overall conclusion that both WNT and BMP have local effects on EE cell subtypes is based on solid evidence. However, the study falls short in achieving its main objective, i.e., to explain the regional subtype patterns by the action of WNT and BMP gradients. Despite displaying a large volume of phenotypic data in Figures 4-7, the study remains incomplete in providing sufficient evidence to support the models shown in Figures 7 M and N. The main challenge is that the reader is provided with a large volume of individual data fragments of selected regions (e.g., Figures 4 and 5) or images of whole midgut without proper quantification (Figure 7). There is not sufficient effort made to display the data in a way that allows observing changes in the global patterns of EE cell subtypes throughout the midgut and compare these patterns with the observed WNT and BMP gradients.

    3. Reviewer #2 (Public review):

      Summary:

      By labeling the three major enteroendocrine cell markers (AstC, Tk, and CCHa2), the authors systematically investigated the distribution of distinct EE subtypes along the Drosophila midgut, as well as their emergence via symmetric and asymmetric divisions of enteroendocrine progenitor cells. Moreover, they dissected the molecular mechanisms underlying regional patterning by modulating Wnt and BMP signaling pathways, revealing that these compartment boundary signals play key roles in regulating EE subtype diversity.

      Strengths:

      This work establishes a solid methodological and conceptual foundation for future studies on how stem cells acquire positional identity and modulate region-specific behaviors.

      Weaknesses:

      Given that the transcriptional profiles of intestinal stem cells across different regions are highly similar, it is reasonable to hypothesize that the behavior of ISCs and enteroendocrine precursor cells may be regulated non-autonomously, potentially through interactions with enterocytes, which exhibit more distinct region-specific characteristics.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to elucidate the mechanisms underlying the regional patterning of enteroendocrine cell (EE) subtypes along the Drosophila midgut. Through detailed immunohistochemical mapping and genetic perturbation of Notch, WNT, and BMP signaling pathways, they sought to determine how extrinsic morphogen gradients and intrinsic stem cell identity contribute to EE diversity.

      Strengths:

      A major strength of this work is the meticulous regional analysis of EE pairs and the use of multiple genetic tools to manipulate signaling pathways in a spatiotemporally controlled manner. The data robustly demonstrate that WNT and BMP signaling gradients play key roles in specifying EE subtypes and division modes across different gut regions.

      Weaknesses:

      However, the study does not fully explore the mechanistic basis for the region-specific dependence on Notch signaling. Additionally, while the authors propose that symmetric divisions occur in R1a and R4b, the observed heterogeneity in CCHa2 expression within AstC+ pairs in R4b suggests that asymmetric mechanisms may still be at play, possibly involving apical-basal polarity as previously reported.

      Appraisal of achievements:

      The authors successfully achieve their aims by providing a compelling model in which intercalated WNT and BMP gradients regulate EE subtype specification and EEP division modes. The genetic data strongly support the conclusion that these pathways are central to establishing regional EE diversity during pupal development.

    5. Author response:

      We would like to express our gratitude to all three reviewers for their time and valuable feedback on the manuscript. Below, we provide our point-by-point responses to their comments. Additionally, we summarize here the experiments we plan to conduct in accordance with the reviewers' suggestions:

      Revision plan 1. To further explore the role of Notch signaling in determining the regional EE pattern.

      Our observation of EE subtype changes in Notch mutant clones revealed that Notch is required for the specification of Type II EEs, but whether it promotes the generation of Type III EEs is not clear. In this revision, we will complete the quantification of Type I and Type III EEs in Notch mutant clones to determine whether Notch signaling participates in specifying these two EE subtypes. Further, we will attempt to combine Notch mutants with different manipulations of the WNT and BMP gradients to investigate their interplay.

      Revision plan 2. To characterize the global pattern of WNT and BMP gradients along the whole gut.

      WNT and BMP gradient levels vary across gut regions, both under normal conditions and after genetic manipulation, leading to different EE subtype compositions. To further support our model, we will document the changes in WNT and BMP gradients along the whole gut after genetic manipulation and semi-quantify their levels to correlate them with EE subtype compositions. We will also measure gradient levels at different time points during the pupal stage to understand how regional identity is established during development.

      Revision plan 3. To investigate the involvement of apical-basal polarity in the determination of regional EE diversity.

      Although we have demonstrated that WNT and BMP gradients orchestrate regional EE identity, some observations, such as the asymmetric expression of CCHa2 in EE pairs from R4b, cannot be fully explained by their roles. A potential mechanism is apical-basal polarity, which has been reported to determine the cell fate of ISC progeny at the pupal stage. As a preliminary test, we will specifically knock down or overexpress key apical-basal polarity genes in ISCs or EEs to determine whether they are involved.

      Please find our detailed point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public review):

      This valuable study explores the regulatory mechanisms underlying the regional distribution of enteroendocrine cell subtypes in the Drosophila midgut. The regional distribution of EE cell subtypes is carefully documented, and the data convincingly show that each EE cell subtype has a unique spatial pattern. The study aims at determining how the spatial distribution of EE cell subtypes is established and maintained, and explores the roles of three pathways: Notch, WNT, and BMP. The data show evidence that Notch signaling regulates the subtype specificity, being necessary for the specification of Type II, but not Type I and III EE cell subtype specification. The immunofluorescence data in Figure 3 are convincing, but the analysis is incomplete due to a lack of quantification. How Notch signaling activity relates to the emergence of the regional EE cell patterns remains unclear.

      Indeed, the role of Notch signaling in regional EE determination was not fully characterized in this work. Because Notch activation is required for enterocyte differentiation, Notch or Delta mutants rapidly accumulate ISCs and EEs, which makes it challenging to dissect how EE subtypes are generated. We will complete the quantification of Type I and Type III EEs in Notch mutant clones from different gut regions to determine whether Notch influences the specification of these two EE subtypes. Additionally, unlike the WNT and BMP gradients, Notch signaling acts only locally and does not change significantly along the whole gut, including the Type II EE-enriched R1a and the Type I EE-enriched R4b; this implies that Notch signaling may be overridden by the specific combination of WNT and BMP gradients. To test this hypothesis, we will combine Notch mutants with activation or inhibition of WNT and BMP signaling from the pupal stage onward and examine whether regional EE identity is altered, especially in the R1a and R4b regions.

      As WNT and BMP are known as morphogens, the study explores their expression patterns and their roles in establishing and maintaining the subtype identities. The observed patterns of WNT and BMP are consistent with earlier studies. Manipulation of WNT and BMP pathway activities in intestinal stem cells is shown to have some region-specific effects on specific EE cell subtypes. The overall conclusion that both WNT and BMP have local effects on EE cell subtypes is based on solid evidence. However, the study falls short in achieving its main objective, i.e., to explain the regional subtype patterns by the action of WNT and BMP gradients. Despite displaying a large volume of phenotypic data in Figures 4-7, the study remains incomplete in providing sufficient evidence to support the models shown in Figures 7 M and N. The main challenge is that the reader is provided with a large volume of individual data fragments of selected regions (e.g., Figures 4 and 5) or images of whole midgut without proper quantification (Figure 7). There is not sufficient effort made to display the data in a way that allows observing changes in the global patterns of EE cell subtypes throughout the midgut and compare these patterns with the observed WNT and BMP gradients.

      Because WNT and BMP gradients vary along the whole gut, manipulating these two pathways cannot align their activation levels across gut regions. This forced us to analyze the changes in each region separately, which made it challenging to provide a comprehensive global overview. We will add a comprehensive profile of WNT and BMP activity under manipulation of these two signaling pathways, correlate it with the changes in EE identity, and attempt a semi-quantitative interpretation to further support the model in Figure 7M and 7N.

      Reviewer #2 (Public review):

      Summary:

      By labeling the three major enteroendocrine cell markers (AstC, Tk, and CCHa2), the authors systematically investigated the distribution of distinct EE subtypes along the Drosophila midgut, as well as their emergence via symmetric and asymmetric divisions of enteroendocrine progenitor cells. Moreover, they dissected the molecular mechanisms underlying regional patterning by modulating Wnt and BMP signaling pathways, revealing that these compartment boundary signals play key roles in regulating EE subtype diversity.

      Strengths:

      This work establishes a solid methodological and conceptual foundation for future studies on how stem cells acquire positional identity and modulate region-specific behaviors.

      Weaknesses:

      Given that the transcriptional profiles of intestinal stem cells across different regions are highly similar, it is reasonable to hypothesize that the behavior of ISCs and enteroendocrine precursor cells may be regulated non-autonomously, potentially through interactions with enterocytes, which exhibit more distinct region-specific characteristics.

      This is a complicated point. The Drosophila adult midgut is established by pISCs (pupal ISCs), which arise from AMPs (adult midgut progenitors) in the larval midgut. AMPs are encased by PCs (peripheral cells) to form islands scattered throughout the entire larval midgut by the mid-L3 stage (Mathur D. et al. Science. 2010). After pupariation, the larval midgut is delaminated to become the yellow body and finally the meconium in the pupal midgut. Simultaneously, PCs break down and die, allowing AMPs to give rise to the presumptive adult epithelium (generating enterocyte precursors) and to specify ISCs in the adult midgut (Jiang H, Edgar BA. Development. 2009; Micchelli CA. et al. Gene Expr Patterns. 2011). During the pupal stage, pISCs only proliferate to generate new ISCs and EE lineages, whereas adult enterocytes start to appear after eclosion (Takashima S. et al. Dev Biol. 2011). This rules out the possibility that interaction with enterocytes regulates regional ISC identity during the pupal stage.

      However, whether AMPs already acquire regional identity during the larval stage, and whether pISCs interact with enterocyte precursors at the pupal stage, is not clear. Our study revealed that pISCs can be influenced by WNT and BMP gradients to acquire regional identity and thereby establish regional EE diversity. Changes in the WNT and BMP gradients during metamorphosis will be added in the revision. Moreover, WNT and BMP ligands are provided by muscles, adult enterocytes, and even other surrounding tissues to regulate regional ISC identity, which indicates that non-autonomous mechanisms do exist.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to elucidate the mechanisms underlying the regional patterning of enteroendocrine cell (EE) subtypes along the Drosophila midgut. Through detailed immunohistochemical mapping and genetic perturbation of Notch, WNT, and BMP signaling pathways, they sought to determine how extrinsic morphogen gradients and intrinsic stem cell identity contribute to EE diversity.

      Strengths:

      A major strength of this work is the meticulous regional analysis of EE pairs and the use of multiple genetic tools to manipulate signaling pathways in a spatiotemporally controlled manner. The data robustly demonstrate that WNT and BMP signaling gradients play key roles in specifying EE subtypes and division modes across different gut regions.

      Weaknesses:

      However, the study does not fully explore the mechanistic basis for the region-specific dependence on Notch signaling. Additionally, while the authors propose that symmetric divisions occur in R1a and R4b, the observed heterogeneity in CCHa2 expression within AstC+ pairs in R4b suggests that asymmetric mechanisms may still be at play, possibly involving apical-basal polarity as previously reported.

      As previously mentioned, we acknowledge that the role of Notch signaling in regional EE determination requires further exploration. We will add the quantification of Type I and Type III EEs in Figure 3 and Figure S4, and further combine Notch mutants with activation or inhibition of WNT and BMP signaling to test whether these pathways interact, especially in R1a and R4b.

      Apical-basal polarity has been reported to determine the precise segregation of Pros to control ISC number and cell fate at the pupal stage (Wu S. et al. Cell Rep. 2023). Regional EE generation is completed during this time and may be affected by mechanisms beyond the Notch, WNT, and BMP pathways. Apical-basal polarity is therefore a plausible mechanism for inducing asymmetric cell division in R4b, and we will test it experimentally.

      Appraisal of achievements:

      The authors successfully achieve their aims by providing a compelling model in which intercalated WNT and BMP gradients regulate EE subtype specification and EEP division modes. The genetic data strongly support the conclusion that these pathways are central to establishing regional EE diversity during pupal development.

    1. eLife Assessment

      This valuable study addresses the effects of selection on aggression on fitness and life-history trade-offs in Drosophila melanogaster. However, the evidence presented is incomplete and does not support the claims proposed in the study of increased survival of highly aggressive males at the expense of reproductive success and shorter mating duration. The main limitation of the study is the choice to use males from only one aggressive Drosophila line in combination with CantonS females, which does not allow disambiguation between non-aggression-related factors, such as hybrid vigor, and aggression-related factors influencing mating and lifespan.

    2. Reviewer #1 (Public review):

      Summary:

      This study asks how selection for male aggressiveness affects life-history and reproductive fitness traits in Drosophila melanogaster males.

      Strengths:

      Multiple comprehensive assays are used to address the question.

      Weaknesses:

      (1) The flies used for comparisons are inadequate. Behavioral assays compare Bully males mated to non-coevolved Cs females with Cs males mated to coevolved Cs females.

      (2) Lifespan analysis is done on male progeny of Cs females mated to either genetically more distant Bully males or co-evolved Cs males; the longer lifespan and better performance of the former are interpreted as a trade-off with aggressiveness, rather than by the simpler explanation of hybrid vigor.

      (3) Differences in CHCs between Bully and Cs males and Cs females mated to those males are not shown to cause differences in measured behavioral outcomes.

    3. Reviewer #2 (Public review):

      Summary:

      The authors compare "Bully" lines, selected for male aggression, to Canton-S controls and find that Bully males have lower mating success, shorter mating durations, and remate sooner. Chemical analyses show Bully males have distinct cuticular hydrocarbons (CHC) signatures and transfer markedly less cVA to females, offering a plausible mechanistic link to weaker mate-guarding.

      Paradoxically, Bully males live longer and remain fertile at older ages when CS males no longer mate, indicating a shift in the reproduction-survival trade-off in aggression-selected populations.

      Importantly, the work sheds light on proximate mechanisms, demonstrating that shifts in CHCs and pheromone transfer co-occur with changes in fitness traits, thus offering new entry points for understanding life-history evolution.

      Strengths:

      The manuscript's strengths lie in its comprehensive and integrative approach framed within an evolutionary context. By combining behavioral assays, chemical profiling, and lifespan measurements, the authors reveal a coherent pattern linking aggression selection to life-history trade-offs. The direct quantification of cVA in female reproductive tracts after mating provides a particularly compelling mechanistic correlate, strengthening the link between behavior and chemical signaling. Findings on altered 5-T and 5-P levels further highlight how chemical communication shapes mating and mate-guarding strategies. Analytical approaches are largely rigorous, and the results provide valuable insights into the pleiotropic effects of selection on socially relevant traits. The study will be of interest to Drosophila biologists working on sexual selection, behavioral evolution, and aging.

      Weaknesses:

      The weaknesses are primarily conceptual rather than procedural. The generality of the findings is uncertain, as selection appears to be represented by only one (and a second closely related) Bully line, limiting conclusions about selection responses versus line-specific drift or founder effects. The causal link between aggression selection and increased longevity is not established: the data show a correlated shift but do not identify mechanisms underlying lifespan extension. In several places, the manuscript uses causal language (e.g., that selection 'influences' longevity or mating strategy) where association would be more accurate; this should be toned down to avoid overstatement. Ecological relevance is also not addressed, since laboratory conditions may bias the balance between costs and benefits of aggression compared with variable natural environments. Addressing these points would strengthen both the impact and clarity of the study.

    4. Author response:

      eLife Assessment

      This valuable study addresses the effects of selection on aggression on fitness and life-history trade-offs in Drosophila melanogaster. However, the evidence presented is incomplete and does not support the claims proposed in the study of increased survival of highly aggressive males at the expense of reproductive success and shorter mating duration. The main limitation of the study is the choice to use males from only one aggressive Drosophila line in combination with CantonS females, which does not allow disambiguation between non-aggression-related factors, such as hybrid vigor, and aggression-related factors influencing mating and lifespan.

      We would like to clarify the points raised in the eLife assessment.

      The report states that we relied on a single line of hyper-aggressive males tested with CantonS females, and implies that Bully and Cs have not co-evolved. This is a misunderstanding: Bully flies were derived from the Cs population, so Bully and Cs have co-evolved. In addition to the Bully A line presented in the main figures of the manuscript, we replicated several of our findings with a second independent selected line, Bully B. Results from courtship assays with Bully A and Bully B pairs (selected males and females) were presented in Figure S1. We apologize for not having made this more explicit in the original manuscript, and we will correct it. These experiments should alleviate the reviewers' concerns: they demonstrate that our conclusions are supported by two independent hyper-aggressive lines, and they include assays with selected male and female flies.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study asks how selection for male aggressiveness affects life-history and reproductive fitness traits in Drosophila melanogaster males.

      Strengths:

      Multiple comprehensive assays are used to address the question.

      We thank the reviewer for recognizing these strengths.

      Weaknesses:

      (1) The flies used for comparisons are inadequate. Behavioral assays compare Bully males mated to non-coevolved Cs females with Cs males mated to coevolved Cs females.

      We thank the reviewer for this comment, which made us realize that we had not sufficiently highlighted some of our experiments. The Bully lines used in our work were derived from Canton-S flies and thus did co-evolve with Cs. As originally described by Penn et al. (2010), highly aggressive “Bully” lines were generated through selective breeding from Canton-S males that consistently won aggressive encounters. After 34–37 generations, stable Bully lines were established. Thus, 1) Bully and Cs flies have co-evolved, and 2) the selection applied was male-specific. Independent selection replicates produced distinct lines, including Bully A and Bully B. Previous studies characterized only Bully A (Penn et al., 2010; Chowdhury et al., 2017), but our work includes both Bully A and Bully B (Fig. S1).

      The rationale for pairing Bully or Cs males with Cs females (with which both male types co-evolved) follows the approach used by Dierick et al. (2006), who investigated how male-specific selection for aggression affected courtship and mating behaviors by testing males with standard Canton-S females. This design allows us to isolate the effects of male genotype and behavior on courtship and mating outcomes, avoiding confounding effects from changes in female behavior.

      We initially compared selected Bully pairs (Bully males × Bully females) (Fig. S1) with Cs pairs and observed similarly shortened mating durations in both Bully × Bully and Bully × Cs matings (Fig. S1, Fig. 1F and G). Thus, the reduction in mating duration arises specifically from Bully males. We therefore chose to use Cs females as a standard background to assess the consequences of male-specific selection for aggression on reproductive behaviors.

      (2) Lifespan analysis is done on male progeny of Cs females mated to either genetically more distant Bully males or co-evolved Cs males; the longer lifespan and performance of the former are interpreted as a trade-off with aggressiveness, rather than as a simple consequence of hybrid vigor.

      We appreciate this comment, which again stems from a poor explanation on our part of the origin of the Bully line in the original manuscript. The Bully flies were derived from the same original population as the Cs line. Hybrid vigor typically arises when crossing individuals from distinct populations, which is not the case here, as both Bully and Cs come from the same population.

      To further support our conclusions, we conducted additional experiments using progeny from within-line crosses (Bully males × Bully females), and the results revealed the same phenotype: the progeny of these flies also exhibited significantly longer lifespans than the progeny of Cs males × Cs females. This finding argues against hybrid vigor as the main explanation for the observed phenotype: both the Bully and the Cs crosses are within-line crosses that result in inbreeding, yet the Bully cross still yields longer-lived progeny. We will include these additional longevity data (currently not included in the manuscript) to strengthen our results and reinforce our interpretation.

      (3) Differences in CHCs between Bully and Cs males and Cs females mated to those males are not shown to cause differences in measured behavioral outcomes.

      We thank the reviewer for raising this important point regarding causality. One way to establish a causal link between the differences in CHCs observed in Bully and Cs flies and the corresponding behavioral outcomes would be to experimentally manipulate CHC profiles. For instance, one could perfume oenocyte-less males with the compounds found in higher abundance in Bully flies, then perform behavioral assays to assess causality. We agree that such experiments would be highly informative in determining the functional roles of specific CHCs elevated in Bully males. However, this approach is technically challenging, as the perfuming technique must be optimized to transfer precise amounts of each compound. For example, the method can be used to perfume flies with graded amounts of a compound to assess dose–response behavioral effects, but exactly matching the natural concentrations found in individuals, especially given inter-individual variability, remains difficult.

      We considered conducting such experiments during our study but did not pursue them for these technical reasons. Nevertheless, we can include a statement in the Discussion acknowledging this as an important future direction to test the causal relationship between CHC variation and behavior.

      Reviewer #2 (Public review):

      Summary:

      The authors compare "Bully" lines, selected for male aggression, to Canton-S controls and find that Bully males have lower mating success, shorter mating durations, and remate sooner. Chemical analyses show Bully males have distinct cuticular hydrocarbon (CHC) signatures and transfer markedly less cVA to females, offering a plausible mechanistic link to weaker mate-guarding.

      Paradoxically, Bully males live longer and remain fertile at older ages when Cs males no longer mate, indicating a shift in the reproduction-survival trade-off in aggression-selected populations.

      Importantly, the work sheds light on proximate mechanisms, demonstrating that shifts in CHCs and pheromone transfer co-occur with changes in fitness traits, thus offering new entry points for understanding life-history evolution.

      We thank the reviewer for this positive summary of our work.

      Strengths:

      The manuscript's strengths lie in its comprehensive and integrative approach framed within an evolutionary context. By combining behavioral assays, chemical profiling, and lifespan measurements, the authors reveal a coherent pattern linking aggression selection to life-history trade-offs. The direct quantification of cVA in female reproductive tracts after mating provides a particularly compelling mechanistic correlate, strengthening the link between behavior and chemical signaling. Findings on altered 5-T and 5-P levels further highlight how chemical communication shapes mating and mate-guarding strategies. Analytical approaches are largely rigorous, and the results provide valuable insights into the pleiotropic effects of selection on socially relevant traits. The study will be of interest to Drosophila biologists working on sexual selection, behavioral evolution, and aging.

      We thank the reviewer for recognizing the integrative design and mechanistic contributions of our study.

      Weaknesses:

      The weaknesses are primarily conceptual rather than procedural. The generality of the findings is uncertain, as selection appears to be represented by only one (and a second closely related) Bully line, limiting conclusions about selection responses versus line-specific drift or founder effects. The causal link between aggression selection and increased longevity is not established: the data show a correlated shift but do not identify mechanisms underlying lifespan extension. In several places, the manuscript uses causal language (e.g., that selection 'influences' longevity or mating strategy) where association would be more accurate; this should be toned down to avoid overstatement. Ecological relevance is also not addressed, since laboratory conditions may bias the balance between costs and benefits of aggression compared with variable natural environments. Addressing these points would strengthen both the impact and clarity of the study.

      (1) Generality of findings and potential line effects

      We agree that our results presented in the main figures of the manuscript relied mainly on one Bully line (Bully A). To address potential line-specific effects, we replicated key courtship experiments with another independent line, Bully B, selected in parallel from the same Canton-S stock but through distinct selection replicates. The results obtained from Bully B closely matched those from Bully A, suggesting that the observed phenotypes are consistent consequences of aggression selection rather than random drift or founder effects.

      (2) Causality versus correlation

      We concur that some sentences in the manuscript could overstate causal interpretations. We will revise the text to clearly distinguish correlation from causation and to avoid implying direct causal relationships where data only support association.

      (3) Ecological relevance

      We appreciate this point. Our experiments were performed under controlled laboratory conditions, which may not fully capture the ecological contexts shaping the costs and benefits of aggression. We will acknowledge this limitation and expand the Discussion to consider how environmental variability could modulate the fitness trade-offs associated with aggression in natural populations.

      We thank both reviewers for their constructive feedback, which will help us strengthen the rigor and clarity of the manuscript. We believe that the additional results and revisions will satisfactorily address their concerns.

    1. eLife Assessment

      This valuable study examines how mammals descend effectively and securely along vertical substrates. The conclusions from comparative analyses based on behavioral data and morphological measurements collected from 21 species across a wide range of taxa are convincing, making the work of interest to all biologists studying animal locomotion.

    2. Reviewer #1 (Public review):

      Summary:

      This unique study reports original and extensive behavioral data collected by the authors on 21 living mammal taxa in zoo conditions (primates, tree shrew, rodents, carnivorans, and marsupials) on how descent along a vertical substrate can be done effectively and securely using gait variables. Ten morphological variables reflecting head size and limb proportions are examined in relationship to vertical descent strategies and then applied to reconstruct modes of vertical descent in fossil mammals.

      Strengths:

      This is a broad and data-rich comparative study, which requires a good understanding of the mammal groups being compared and how they are interrelated, the kinematic variables that underlie the locomotion used by the animals during vertical descent, and the morphological variables that are associated with vertical descent styles. Thankfully, the study presents data in a cogent way with clear hypotheses at the beginning, followed by results and a discussion that addresses each of those hypotheses using the relevant behavioral and morphological variables, always keeping in mind the relationships of the mammal groups under investigation. As pointed out in the study, there is a clear phylogenetic signal associated with vertical descent style. Strepsirrhine primates much prefer descending tail first, platyrrhine primates descend sideways when given a choice, whereas all other mammals (with the exception of the raccoon) descend head first. Not surprisingly, all mammals descending a vertical substrate do so in a more deliberate way, by reducing speed, and by keeping the limbs in contact for a longer period (i.e., higher duty factors).

      Weaknesses:

      The different gait patterns used by mammals during vertical descent are a bit more difficult to interpret. It is somewhat paradoxical that asymmetrical gaits such as bounds, half bounds, and gallops are more common during descent since they are associated with higher speeds and lower duty factors. Also, the arguments about the limb support polygons provided by DSDC vs. LSDC gaits apply for horizontal substrates, but perhaps not as much for vertical substrates.

      The importance of body mass cannot be overemphasized, as it affects all aspects of an animal's biology. In this case, larger mammals with larger heads avoid descending head-first. Variation in trunk/tail and limb proportions also covaries with different vertical descent strategies. For example, a lower intermembral index is associated with tail-first descent. That said, the authors are quick to acknowledge that the five lemur species in their sample are driving this correlation. There is a wide range of intermembral indices among primates, and this simple ratio of forelimb to hindlimb length has vital functional implications for locomotion: primates with relatively long hindlimbs tend to emphasize leaping, primates with more even limb proportions are typically pronograde quadrupeds, and primates with relatively long forelimbs tend to emphasize suspensory locomotion and brachiation. Equally important is the fact that the intermembral index has been shown to increase with body mass in many primate families as a way to keep functional equivalence for (ascending) climbing behavior (see Jungers, 1985). Therefore, the manner in which a primate descends a vertical substrate may just be a by-product of limb proportions that evolved for different locomotor purposes. Clearly, more vertical descent data across a wider array of primate intermembral indices would clarify these relationships. Similarly, vertical descent data for other primate groups with longer tails, such as arboreal cercopithecoids, and particularly atelines with very long and prehensile tails, should provide more insight into the relationship between longer tail length and tail-first descent observed in the five lemurs. The relatively longer hallux of lemurs correlates with tail-first descent, whereas the more evenly grasping autopods of platyrrhines allow all four limbs to be used for sideways descent. In that context, the pygmy loris offers a striking contrast. Here is a small primate equipped with four pincer-like, highly grasping autopods and a tail reduced to a short stub. Interestingly, this primate is unique within the sample in showing the strongest preference for head-first descent, just like other non-primate mammals. Again, a wider sample of primates should go a long way in clarifying the morphological and behavioral relationships reported in this study.

      Reconstruction of the ancient lifestyles, including preferred locomotor behaviors, is a formidable task that requires careful documentation of strong form-function relationships from extant species that can be used as analogs to infer behavior in extinct species. The fossil record offers challenges of its own, as complete and undistorted skulls and postcranial skeletons are rare occurrences. When more complete remains are available, the entire evidence should be considered to reconstruct the adaptive profile of a fossil species rather than a single ("magic") trait.

    3. Reviewer #2 (Public review):

      Summary:

      This paper contains kinematic analyses of a large comparative sample of small to medium-sized arboreal mammals (n = 21 species) traveling on near-vertical arboreal supports of varying diameter. This data is paired with morphological measures from the extant sample to reconstruct potential behaviors in a selection of fossil euarchontaglires. This research is valuable to anyone working in mammal locomotion and primate evolution.

      Strengths:

      The experimental data collection methods align with best research practices in this field and are presented with enough detail to allow for reproducibility of the study as well as comparison with similar datasets. The four predictions in the introduction are well aligned with the design of the study to allow for hypothesis testing. Behaviors are well described and documented, and Figure 1 does an excellent job in conveying the variety of locomotor behaviors observed in this sample. I think the authors took an interesting and unique angle by considering the influence of encephalization quotient on descent and the experience of forward pitch in animals with very large heads.

      Weaknesses:

      The authors acknowledge the challenges that are inherent with working with captive animals in enclosures and how that might influence observed behaviors compared to these species' wild counterparts. The number of individuals per species in this sample is low; however, this is consistent with the majority of experimental papers in this area of research because of the difficulties in attaining larger sample sizes.

      Figure 2 is difficult to interpret because of the large amount of information it is trying to convey.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This unique study reports original and extensive behavioral data collected by the authors on 21 living mammal taxa in zoo conditions (primates, tree shrew, rodents, carnivorans, and marsupials) on how descent along a vertical substrate can be done effectively and securely using gait variables. Ten morphological variables reflecting head size and limb proportions are examined in relationship to vertical descent strategies and then applied to reconstruct modes of vertical descent in fossil mammals.

      Strengths:

      This is a broad and data-rich comparative study, which requires a good understanding of the mammal groups being compared and how they are interrelated, the kinematic variables that underlie the locomotion used by the animals during vertical descent, and the morphological variables that are associated with vertical descent styles. Thankfully, the study presents data in a cogent way with clear hypotheses at the beginning, followed by results and a discussion that addresses each of those hypotheses using the relevant behavioral and morphological variables, always keeping in mind the relationships of the mammal groups under investigation. As pointed out in the study, there is a clear phylogenetic signal associated with vertical descent style. Strepsirrhine primates much prefer descending tail first, platyrrhine primates descend sideways when given a choice, whereas all other mammals (with the exception of the raccoon) descend head first. Not surprisingly, all mammals descending a vertical substrate do so in a more deliberate way, by reducing speed, and by keeping the limbs in contact for a longer period (i.e., higher duty factors).

      Weaknesses:

      The different gait patterns used by mammals during vertical descent are a bit more difficult to interpret. It is somewhat paradoxical that asymmetrical gaits such as bounds, half bounds, and gallops are more common during descent since they are associated with higher speeds and lower duty factors. Also, the arguments about the limb support polygons provided by DSDC vs. LSDC gaits apply for horizontal substrates, but perhaps not as much for vertical substrates.

      We analyzed gait patterns using methods commonly found in the literature and discussed our results accordingly. However, the analysis of limb support polygons was indeed developed specifically for locomotion on horizontal supports and may not be applicable to vertical locomotion, which is in fact a type of locomotion shared by all arboreal species. In the future, it would be interesting to consider new methods for analyzing vertical gaits.

      The importance of body mass cannot be overemphasized, as it affects all aspects of an animal's biology. In this case, larger mammals with larger heads avoid descending head-first. Variation in trunk/tail and limb proportions also covaries with different vertical descent strategies. For example, a lower intermembral index is associated with tail-first descent. That said, the authors are quick to acknowledge that the five lemur species in their sample are driving this correlation. There is a wide range of intermembral indices among primates, and this simple ratio of forelimb to hindlimb length has vital functional implications for locomotion: primates with relatively long hindlimbs tend to emphasize leaping, primates with more even limb proportions are typically pronograde quadrupeds, and primates with relatively long forelimbs tend to emphasize suspensory locomotion and brachiation. Equally important is the fact that the intermembral index has been shown to increase with body mass in many primate families as a way to keep functional equivalence for (ascending) climbing behavior (see Jungers, 1985). Therefore, the manner in which a primate descends a vertical substrate may just be a by-product of limb proportions that evolved for different locomotor purposes. Clearly, more vertical descent data across a wider array of primate intermembral indices would clarify these relationships. Similarly, vertical descent data for other primate groups with longer tails, such as arboreal cercopithecoids, and particularly atelines with very long and prehensile tails, should provide more insight into the relationship between longer tail length and tail-first descent observed in the five lemurs. The relatively longer hallux of lemurs correlates with tail-first descent, whereas the more evenly grasping autopods of platyrrhines allow all four limbs to be used for sideways descent. In that context, the pygmy loris offers a striking contrast. Here is a small primate equipped with four pincer-like, highly grasping autopods and a tail reduced to a short stub. Interestingly, this primate is unique within the sample in showing the strongest preference for head-first descent, just like other non-primate mammals. Again, a wider sample of primates should go a long way in clarifying the morphological and behavioral relationships reported in this study.

      We agree with this statement. In the future, we plan to study other species, particularly large-bodied ones with varied intermembral indices.

      Reconstruction of the ancient lifestyles, including preferred locomotor behaviors, is a formidable task that requires careful documentation of strong form-function relationships from extant species that can be used as analogs to infer behavior in extinct species. The fossil record offers challenges of its own, as complete and undistorted skulls and postcranial skeletons are rare occurrences. When more complete remains are available, the entire evidence should be considered to reconstruct the adaptive profile of a fossil species rather than a single ("magic") trait.

      We completely agree with this, and we would like to emphasize that our intention here was simply to conduct a modest inference test, whose purpose is to provide food for thought for future studies and whose results should be considered in light of a comprehensive evolutionary model.

      Reviewer #2 (Public review):

      Summary:

      This paper contains kinematic analyses of a large comparative sample of small to medium-sized arboreal mammals (n = 21 species) traveling on near-vertical arboreal supports of varying diameter. This data is paired with morphological measures from the extant sample to reconstruct potential behaviors in a selection of fossil euarchontaglires. This research is valuable to anyone working in mammal locomotion and primate evolution.

      Strengths:

      The experimental data collection methods align with best research practices in this field and are presented with enough detail to allow for reproducibility of the study as well as comparison with similar datasets. The four predictions in the introduction are well aligned with the design of the study to allow for hypothesis testing. Behaviors are well described and documented, and Figure 1 does an excellent job in conveying the variety of locomotor behaviors observed in this sample. I think the authors took an interesting and unique angle by considering the influence of encephalization quotient on descent and the experience of forward pitch in animals with very large heads.

      Weaknesses:

      The authors acknowledge the challenges that are inherent with working with captive animals in enclosures and how that might influence observed behaviors compared to these species' wild counterparts. The number of individuals per species in this sample is low; however, this is consistent with the majority of experimental papers in this area of research because of the difficulties in attaining larger sample sizes.

      Yes, that is indeed the main cost/benefit trade-off of this type of study. Working with captive animals allows for large comparative studies, but it carries the risk that locomotor behavior differs from that of individuals in the natural environment, and it yields few individuals per species in the dataset. That is why we plan, and encourage colleagues, to conduct studies in the natural environment to compare with these results. However, such studies are very time-consuming and require focusing on a single species at a time, which limits the comparative aspect.

      Figure 2 is difficult to interpret because of the large amount of information it is trying to convey.

      We agree that this figure is dense. One possible solution would be to group species by phylogenetic group to reduce the amount of information, as we did with Fig. 3 for the gait dataset. However, we believe that this would be unfortunate in the case of speed and duty factor, because we would have to provide the complete figure in the SI anyway, as the species-level information is valuable. We therefore prefer to keep this comprehensive figure here; we will enlarge the data points to improve their visibility and provide the figure at a sufficiently high resolution to allow zooming in on the details.