  1. May 2024
    1. eLife assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets make this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expected hitting time, (ii) sampling of random walks in a directed acyclic k-NN graph where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used to elucidate the biology of T cell development.

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithm's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      - tviblindi also has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

    3. Reviewer #2 (Public Review):

      Summary: In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.
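
      The first two steps summarized above can be illustrated with a minimal sketch. It is not the tviblindi implementation: the function names are hypothetical, the k-NN graph is built densely with NumPy, and pseudotime is assumed here to be the expected number of steps a simple random walk needs to first reach the origin from each cell (the manuscript's exact definition may differ). The graph is assumed to be connected, and the persistent-homology clustering step is not covered.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetrized 0/1 k-NN adjacency from Euclidean distances (dense, small data only)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a cell is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]         # indices of the k nearest cells
    A = np.zeros(d.shape)
    A[np.repeat(np.arange(len(X)), k), nn.ravel()] = 1.0
    return np.maximum(A, A.T)                 # keep an edge if either end chose it

def hitting_time_pseudotime(A, origin):
    """Expected steps for a simple random walk from each node to first reach `origin`.

    Assumption: this stands in for the paper's pseudotime; the graph must be connected.
    """
    P = A / A.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    rest = np.array([i for i in range(len(A)) if i != origin])
    Q = P[np.ix_(rest, rest)]                 # transitions among non-origin nodes
    h = np.zeros(len(A))
    # Solve h = 1 + Q h on the non-origin nodes; h[origin] stays 0.
    h[rest] = np.linalg.solve(np.eye(len(rest)) - Q, np.ones(len(rest)))
    return h

def sample_walk(A, pseudotime, origin, rng):
    """Random walk that only follows edges towards strictly larger pseudotime."""
    walk = [origin]
    while True:
        cur = walk[-1]
        nxt = np.flatnonzero((A[cur] > 0) & (pseudotime > pseudotime[cur]))
        if nxt.size == 0:                     # terminal node of the oriented graph
            return walk
        walk.append(int(rng.choice(nxt)))
```

      Because every step strictly increases pseudotime, the oriented graph is acyclic and each walk terminates. Calling `sample_walk` repeatedly with a `numpy.random.default_rng()` generator yields the collection of walks that a later step would group into trajectories.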

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large-scale single-cell data. This enables the authors to make the method more user-friendly and interactive, allowing real-time user queries of the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, because it allows for user bias, which may lead to overfitting to a specific data set.

      The main weakness of the method is the lack of benchmarking on real data and of comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms; the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. These include methods that specifically use persistent homology to discover trajectories (Rizvi et al., Nat Biotech 2017) and methods that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank, Lange et al., Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data, and comparison to other methods than the many trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work, especially due to its inability to handle the noise typical of real-world data (as was even demonstrated in the small amount of application to real-world data provided).

      Beyond the general lack of benchmarking, there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias-driven than data-driven. So a rigorous and quantitative discussion of the robustness of the method, as well as of how to ensure data-driven inference and avoid overfitting, would be useful.

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data, where there is less of an issue of dropouts and the proteins may be chosen to be largely independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as the high degree of noise (e.g. ambient RNA), introduces a very large degree of noise, so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason; otherwise one is learning noise rather than signal. It would be great to have a discussion on the feasibility of the method as is for such noisy data and to provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

    4. Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantage of the method more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline" subsection).

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      (2) Regarding the discussion of Figure 4, where the trajectory goes through the apoptotic stage and reconnects back to the canonical trajectory with counterintuitive directionality: it can be a checkpoint, as the authors interpret using their expert knowledge, or maybe a false discovery of the tool. Maybe the authors can consider running other algorithms on those cells to see which tracks they identify and whether the directionality matches that of tviblindi.

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

    1. Reviewer #3 (Public Review):

      Summary:

      This work investigates the computational consequences of assemblies containing both excitatory and inhibitory neurons (E/I assembly) in a model with parameters constrained by experimental data from the telencephalic area Dp of zebrafish. The authors show how this precise E/I balance shapes the geometry of neuronal dynamics in comparison to unstructured networks and networks with more global inhibitory balance. Specifically, E/I assemblies lead to the activity being locally restricted onto manifolds - a dynamical structure in between high-dimensional representations in unstructured networks and discrete attractors in networks with global inhibitory balance. Furthermore, E/I assemblies lead to smoother representations of mixtures of stimuli while those stimuli can still be reliably classified, and allow for more robust learning of additional stimuli.

      Strengths:

      Since experimental studies do suggest that E/I balance is very precise and E/I assemblies exist, it is important to study the consequences of those connectivity structures on network dynamics. The authors convincingly show that E/I assemblies lead to different geometries of stimulus representation compared to unstructured networks and networks with global inhibition. This finding might open the door for future studies for exploring the functional advantage of these locally defined manifolds, and how other network properties allow to shape those manifolds.

      The authors also make sure that their spiking model is well-constrained by experimental data from the zebrafish pDp. Both spontaneous and odor stimulus triggered spiking activity is within the range of experimental measurements. But the model is also general enough to be potentially applied to findings in other animal models and brain regions.

      Weaknesses:

      I find the point about pattern completion a bit confusing. In Fig. 3 the authors argue that only the Scaled I network can lead to pattern completion for morphed inputs since the output correlations are higher than the input correlations. For me, this sounds less like the network can perform pattern completion but it can nonlinearly increase the output correlations. Furthermore, in Suppl. Fig. 3 the authors show that activating half the assembly does lead to pattern completion in the sense that also non-activated assembly cells become highly active and that this pattern completion can be seen for Scaled I, Tuned E+I, and Tuned I networks. These two results seem a bit contradictory to me and require further clarification, and the authors might want to clarify how exactly they define pattern completion.

      The authors argue that Tuned E+I networks have several advantages over Scaled I networks. While I agree with the authors that in some cases adding this localized E/I balance is beneficial, I believe that a more rigorous comparison between Tuned E+I networks and Scaled I networks is needed: quantification of variance (Fig. 4G) and angle distributions (Fig. 4H) should also be shown for the Scaled I network. Similarly in Fig. 5, what is the Mahalanobis distance for Scaled I networks and how well can the Scaled I network be classified compared to the Tuned E+I network? I suspect that the Scaled I network will actually be better at classifying odors compared to the E+I network. The authors might want to speculate about the benefit of having networks with both sources of inhibition (local and global) and hence being able to switch between locally defined manifolds and discrete attractor states.

      At a few points in the manuscript, the authors use statements without actually providing evidence in terms of a Figure. Often the authors themselves acknowledge this, by adding the term "not shown" to the end of the sentence. I believe it will be helpful to the reader to be provided with figures or panels in support of the statements.

    2. eLife assessment

      This important study introduces a biologically constrained model of the telencephalic area of adult zebrafish to highlight the significance of precisely balanced memory networks in olfactory processing. The authors convincingly show that their model performs better in multiple situations (e.g. in terms of network stability and shaping the geometry of representations), compared to traditional attractor networks and persistent activity. However, the study lacks a mechanistic understanding of the results in terms of parameter sensitivity analysis. The work supports recent studies reporting functional E/I subnetworks in several sensory cortices, and will be of interest to both theoretical and experimental neuroscientists studying network dynamics based on structured excitatory and inhibitory interactions.

    3. Reviewer #1 (Public Review):

      Summary:

      Meissner-Bernard et al. present a biologically constrained model of the telencephalic area of adult zebrafish, an area homologous to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing.

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, like in the head direction circuits in the flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitation-inhibitory subnetworks.

      Strengths:

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models.

      (1) The authors have done a good job of gathering the current experimental information to inform a biologically constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation.

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      Weaknesses:

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model.

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), it is expected to provide a more analytical account of why specific networks show the reported results. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? Mechanistic account and analysis of these results are missing in the current version of the paper.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all of them. For instance, it is not clear what the justification for the choice of synaptic time scales is (with the excitatory synaptic time constant being larger than the inhibitory one: tau_syn_I = 10 ms, tau_syn_E = 30 ms). How would the results change if these - and other unconstrained - parameters were varied? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning.

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors conducted a comparative analysis of four networks, varying in the presence of excitatory assemblies and the architecture of inhibitory cell assembly connectivity. They found that co-tuned E-I assemblies provide network stability and a continuous representation of input patterns (on locally constrained manifolds), contrasting with networks with global inhibition that result in attractor networks.

      Strengths:

      The findings presented in this paper are very interesting and cutting-edge. The manuscript effectively conveys the message and presents a creative way to represent high-dimensional inputs and network responses. Particularly, the result regarding the projection of input patterns onto local manifolds and continuous representation of input/memory is very intriguing and novel. Both computational and experimental neuroscientists would find value in reading the paper.

      Weaknesses:

      Intuitively, classification (decodability) in discrete attractor networks is much better than in networks that have continuous representations. This could also be shown in Figure 5B, along with the performance of the random and tuned E-I networks. The latter networks have the advantage of providing network stability compared to the Scaled I network, but at the cost of reduced network salience and, therefore, reduced input decodability. The authors may consider designing a decoder to quantify and compare the classification performance of all four networks.

      Networks featuring E/I assemblies could potentially represent multistable attractors by exploring the parameter space for their reciprocal connectivity and connectivity with the rest of the network. However, for co-tuned E-I networks, the scope for achieving multistability is relatively constrained compared to networks employing global or lateral inhibition between assemblies. It would be good if the authors mentioned this in the discussion. Also, the fact that reciprocal inhibition increases network stability has been shown before and should be cited in the statements addressing network stability (e.g., some of the citations in the manuscript, including Rost et al. 2018, Lagzi & Fairhall 2022, and Vogels et al. 2011 have shown this).

      Providing raster plots of the pDp network for familiar and novel inputs would help with understanding the claims regarding continuous versus discrete representation of inputs, allowing readers to visualize the activity patterns of the four different networks. (similar to Figure 1B).

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript focuses on the role of the deubiquitinating enzyme USP-50/USP8 in endosome maturation. The authors aimed to clarify how this enzyme drives the conversion of early endosomes into late endosomes. Overall, they did achieve their aims in shedding light on the precise mechanisms by which USP-50/USP8 regulates endosome maturation. The results support their conclusions that USP-50 acts by disassociating RABX-5 from early endosomes to deactivate RAB-5 and by recruiting SAND-1/Mon1 to activate RAB-7. This work is commendable and will have a significant impact on the field. The methods and data presented here will be useful to the community in advancing our understanding of endosome maturation and identifying potential therapeutic targets for diseases related to endosomal dysfunction. It is worth noting that further investigation is required to fully understand the complexities of endosome maturation. However, the findings presented in this manuscript provide a solid foundation for future studies.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major strengths of this work lie in the well-designed experiments used to examine the effects of USP-50 loss. The authors employed confocal imaging to obtain a picture of the aftermath of the USP-50 loss. Their findings indicated enlarged early endosomes and MVB-like structures in cells deficient in USP-50/USP8.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      Specifically, there is a need for further investigation to accurately characterize the anomalous structures detected in the usp-50 mutant. Also, the correlation between the presence of these abnormal structures and ESCRT-0 is yet to be addressed, and the current working model needs to be revised to prevent any confusion between enlarged early endosomes and MVBs.

      Excellent suggestions. The EM imaging indeed revealed an increase in enlarged cellular vesicles containing various contents in usp-50 mutants. However, the detailed molecular features of these vesicles remain unclear. Therefore, we plan to utilize ESCRT components for double staining with early or late endosome markers. This will enable us to accurately characterize the anomalous structures detected in the usp-50 mutants.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors study how the deubiquitinase USP8 regulates endosome maturation in C. elegans and mammalian cells. The authors have isolated USP8 mutant alleles in C. elegans and used multiple in vivo reporter lines to demonstrate the impact of USP8 loss-of-function on endosome morphology and maturation. They show that in USP8 mutant cells, the early endosomes and MVB-like structures are enlarged while the late endosomes and lysosomal compartments are reduced. They elucidate that USP8 interacts with Rabx5, a guanine nucleotide exchange factor (GEF) for Rab5, and show that USP8 likely targets a specific lysine residue of Rabx5 to dissociate it from early endosomes. They also find that the localization of USP8 to early endosomes is disrupted in Rabx5 mutant cells. They observe that in both Rabx5 and USP8 mutant cells, the Rab7 GEF SAND-1 puncta, which likely represent late endosomes, are diminished, although Rabex5 is accumulated in USP8 mutant cells. The authors provide evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells. Based on their observations they propose that USP8 dissociates Rabex5 from early endosomes and enhances the recruitment of SAND-1 to promote endosome maturation.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major highlights of this study include the direct visualization of endosome dynamics in a living multi-cellular organism, C. elegans. The high-quality images provide clear in vivo evidence to support the main conclusions. The authors have generated valuable resources to study mechanisms involved in endosome dynamics regulation in both the worm and mammalian cells, which would benefit many members of the cell biology community. The work identifies a fascinating link between USP8 and the Rab5 guanine nucleotide exchange factor Rabx5, which expands the targets and modes of action of USP8. The findings make a solid contribution toward the understanding of how endosomal trafficking is controlled.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      -The authors utilized multiple fluorescent protein reporters, including those generated by themselves, to label endosomal vesicles. Although these are routine and powerful tools for studying endosomal trafficking, these results cannot tell whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion.

      Good suggestion. Indeed, to test whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion as fluorescent protein reporters, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion.

      -The authors clearly demonstrated a link between USP8 and Rabx5, and they showed that cells deficient in both factors displayed similar defects in late endosomes/lysosomes. However, the authors didn't confirm whether and/or to which extent USP8 regulates endosome maturation through Rabx5. Additional genetic and molecular evidence might be required to better support their working model.

      Excellent point. We plan to conduct additional genetic analyses, including the construction of double mutants between usp-50 and various rabex-5 mutations, to further elucidate the extent to which USP8 regulates endosome maturation via Rabex5.

      Reviewer #3 (Public Review):

      Summary:

      The authors were trying to elucidate the role of USP8 in the endocytic pathway. Using C. elegans epithelial cells as a model, they observed that when USP8 function is lost, the cells have a decreased number and size of lysosomes. Since USP8 was already known to be a protein linked to ESCRT components, they looked into what role USP8 might play in connecting lysosomes and multivesicular bodies (MVB). They observed fewer ESCRT-associated vesicles but an increased number of "abnormal" enlarged vesicles when USP8 function was lost. At this specific point, it's not clear what the objective of the authors was. What would have been their hypothesis addressing whether the reduced lysosomal structures in USP8 (-) animals were linked to MVB formation? Then they observed that the abnormally enlarged vesicles, marked by the PI3P biosensor YFP-2xFYVE, are bigger but in the same number in USP8 (-) compared to wild-type animals, suggesting homotypic fusion. They confirmed this result by knocking down USP8 in a human cell line, and they observed enlarged vesicles marked by YFP-2xFYVE as well. At this point, there is quite an important issue. The use of YFP-2xFYVE to detect early endosomes requires the transfection of the cells, which has already been demonstrated to produce differences in the distribution, number, and size of PI3P-positive vesicles (doi.org/10.1080/15548627.2017.1341465). The enlarged vesicles marked by YFP-2xFYVE would not necessarily be due to the loss of USP8. In any case, it appears relatively clear that USP8 localizes to early endosomes, and the authors claim that this localization is mediated by Rabex-5 (or Rabx-5). They finally propose that USP8 dissociates Rabx-5 from early endosomes, facilitating endosome maturation.

      Weaknesses:

      The weaknesses of this study are, on one side, that the results are almost exclusively dependent on the overexpression of fusion proteins. While useful in the field, this strategy does not represent the optimal way to dissect a cell biology issue. On the other side, the way the authors construct the rationale for each approach is somewhat difficult to follow. Finally, the use of two models, C. elegans and a mammalian cell line, which would strengthen the observations, contributes to the difficulty in reading the manuscript.

      The findings are useful but do not clearly support the idea that USP8 mediates Rab5-Rab7 exchange and endosome maturation. In contrast, they appear to be incomplete and open new questions regarding the complexity of this process and the precise role of USP8 within it.

      We thank this reviewer for the insightful comments. Fluorescence-fused proteins serve as potent tools for visualizing subcellular organelles both in vivo and in live settings. Specifically, in epidermal cells of worms, the tissue-specific expression of these fused proteins is indispensable for studying organelle dynamics within living organisms. This approach is necessitated by the inherent limitations of endogenously tagged proteins, whose fluorescence signals are often weak and unsuitable for live imaging or genetic screening purposes. Acknowledging concerns raised by the reviewer regarding potential alterations in organelle morphology due to overexpression of certain fused proteins, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion. Specifically, we discovered that the recruitment of USP-50/USP8 to early endosomes depends on Rabex5. However, instead of stabilizing Rabex5, the recruitment of USP-50/USP8 leads to its dissociation from endosomes, concomitantly facilitating the recruitment of the Rab7 GEF SAND-1/Mon1. In cells with loss-of-function mutations in usp-50/usp8, we observed enhanced RABX-5/Rabex5 signaling and mis-localization of SAND-1/Mon1 proteins from endosomes. Consequently, this disruption impairs endolysosomal trafficking, resulting in the accumulation of enlarged vesicles containing various intraluminal contents and rudimentary lysosomal structures.

      Through an unbiased genetic screen, verified by cultured mammalian cell studies, we observed that loss-of-function mutations in usp-50/usp8 result in diminished lysosome/late endosomes. To elucidate the underlying mechanisms, we investigated the formation of multivesicular bodies (MVBs), a process tightly linked to USP8 function. Extensive electron microscopy (EM) analysis indicated that MVB-like structures are largely intact in usp-50 mutant cells, suggesting that USP8/USP-50 likely regulate lysosome formation through alternative pathways in addition to their roles in MVB formation and ESCRT component function. USP8 is known to regulate the endocytic trafficking and stability of numerous transmembrane proteins. Interestingly, loss-of-function mutations in usp8 often lead to the enlargement of early endosomes, yet the mechanisms underlying this phenomenon remain unclear. Given that lysosomes receive and degrade materials generated by endocytic pathways, we hypothesized that the abnormally enlarged MVB-like vesicular structures observed in usp-50 or usp8 mutant cells correspond to the enlarged vesicles coated by early endosome markers. Indeed, in the absence of usp8/usp-50, the endosomal Rab5 signal is enhanced, while early endosomes are significantly enlarged. Given that Rab5 guanine nucleotide exchange factor (GEF), Rabex5, is essential for Rab5 activation, we further investigated its dynamics. Additional analyses conducted in both worm hypodermal cells and cultured mammalian cells revealed an increase of endosomal Rabex5 in response to usp8/usp-50 loss-of-function. Live imaging studies further demonstrated active recruitment of USP8 to newly formed Rab5-positive vesicles, aligning spatiotemporally with Rabex5 regulation. Through systematic exploration of putative USP-50 binding partners on early endosomes, we identified its interaction with Rabex5. 
      Comprehensive genetics and biochemistry experiments demonstrated that USP8 acts through K323 site de-ubiquitination to dissociate Rabex5 from early endosomes and promotes the recruitment of the Rab7 GEF SAND-1/Mon1. In summary, our study began with an unbiased genetic screen and subsequent examination of established theories, leading to the formulation of our own hypothesis. Through multifaceted approaches, we unveiled a novel function of USP8 in early-to-late endosome conversion.

    1. Reviewer #2 (Public Review):

      Summary:

      This manuscript reports interesting findings about the navigational behavior of mice. The authors have dissected this behavior in various components using a sophisticated behavioral maze and statistical analysis of the data.

      Strengths:

      The results are solid and they support the main conclusions, which will be of considerable value to many scientists.

      Weaknesses:

      Figure 1: In some trials the mice seem to be doing thigmotaxis, walking along the perimeter of the maze. This is perhaps due to the fear of the open arena. But these paths along the perimeter would significantly influence all metrics of navigation, e.g. the distance or time to reward. Perhaps an analysis can be done that treats such behavior separately and then factors it out from the paths that are away from the perimeter.

      Figure 1c: the color axis seems unusual. Red colors indicate less frequently visited regions (less than 25%) and white corresponds to more frequently visited places (>25%)? Why use such a binary measure instead of a graded map as commonly done?

      Some figures use linear scale and others use logarithmic scale. Is there a scientific justification? For example, average latency is on a log scale and average speed is on a linear scale, but both quantify the same behavior. The y-axis in panel 1-I is much wider than the data. Is there a reason for this? Or can the authors zoom into the y-axis so that the reader can discern any pattern?

      1F shows no significant reduction in distance to reward. Does that mean there is no improvement with experience and all the improvement in the latency is due to increasing running speed with experience?

      Figure 3: The distance traveled was reduced by nearly 10-fold and speed increased by about 3-fold. So, the time to reach the reward should decrease by only 3-fold (t=d/v) but that too reduced by 10-fold. How does one reconcile the 3-fold difference between the expected and observed values?

      Figure 4: The reader is confused about the use of a binary color scheme here for the checking behavior: gray for a large amount of checking, and pink for small. But, there is a large ellipse that is gray and there are smaller circles that are also gray, but these two gray areas mean very different things as far as the reader can tell. Is that so? Why not show the entire graded colormap of checking probability instead of such a seemingly arbitrary binary depiction? 

      Figure 4C: What would explain the large amount of checking behavior at the perimeter? Does that occur predominantly during thigmotaxis? 

      Was there a correlation between the amount of time spent by the animals in a part of the maze and the amount of reward checking? Previous studies have shown that the two behaviors are often positively correlated, e.g. reference 20 in the manuscript.  How does this fit with the path integration hypothesis? 

      "Scratches and odor trails were eliminated by washing and rotating the maze floor between trials." Can one eliminate scratches by just washing the maze floor? Rotation of the maze floor between trials can make these cues unreliable or variable but will not eliminate them. Ditto for odor cues.

      "Possible odor gradient cues were eliminated by experiments where such gradients were prevented with vacuum fans (Fig. S6E)" What tests were done to ensure that these were *eliminated* versus just diminished? 

      "Probe trials of fully trained mice resulted in trajectories and initial hole checking identical to that of regular trials thereby demonstrating that local odor cues are not essential for spatial learning." As far as the reader can tell, probe trials only eliminated the food odor cues but did not eliminate all other odors. If so, this conclusion can be modified accordingly.

      The interpretation of direction selectivity is a bit tricky. At different places in this manuscript, this is interpreted as a path integration signal that encodes goal location, including the Consync cells. However, studies (e.g. Acharya et al. 2016) show that direction selectivity in virtual reality is comparable to that during natural mazes, despite large differences in vestibular cues and spatial selectivity. How would one reconcile these observations with the path integration interpretation?

      The manuscript would be improved if the speculations about place cells, grid cells, BTSP, etc. were pared down. I could easily imagine the outcome of these speculations to go the other way and some claims are not supported by data. "We note that the cited experiments were done with virtual movement constrained to 1D and in the presence of landmarks. It remains to be shown whether similar results are obtained in our unconstrained 2D maze and with only self-motion cues available." There are many studies that have measured the evolution of place cells in non-virtual mazes, look up papers from the 1990s. Reference 43 reports such results in a 2D virtual maze.

    2. eLife assessment

      This important work presents a creative and thoughtful analysis of mouse foraging behavior and its dependence on body reference frame-based vs world reference frame-based cues. It convincingly demonstrates that a robust map capable of supporting taking novel shortcuts is learned based primarily on self-motion cues from a known starting location and this can be done in contexts where there is little reliance on distal visual landmarks; this may be a unique finding outside of the human literature. The discussion is rich with ideas about the role of the hippocampus in supporting the behavior that should be interesting to test in future analyses of brain recordings as mice perform the tasks considered by the study.

    3. Reviewer #1 (Public Review):

      Assessment:

      This important work advances our understanding of navigation and path integration in mammals by using a clever behavioral paradigm. The paper provides compelling evidence that mice are able to create and use a cognitive map to find "short cuts" in an environment, using only the location of rewards relative to the point of entry to the environment and path integration, and need not rely on visual landmarks.

      Summary:

      The authors have designed a novel experimental apparatus called the 'Hidden Food Maze (HFM)' and a beautiful suite of behavioral experiments using this apparatus to investigate the interplay between allothetic and idiothetic cues in navigation. The results presented provide a clear demonstration of the central claim of the paper, namely that mice only need a fixed start location and path integration to develop a cognitive map. The experiments and analyses conducted to test the main claim of the paper -- that the animals have formed a cognitive map -- are conclusive. While I think the results are quite interesting and sound, one issue that needs to be addressed is the framing of how landmarks are used (or not), as discussed below, although I believe this will be a straightforward issue for the authors to address.

      Strengths:

      The 90-degree rotationally symmetric design and use of 4 distal landmarks and 4 quadrants with their corresponding rotationally equivalent locations (REL) lends itself to teasing apart the influence of path integration and landmark-based navigation in a clever way. The authors use a really complete set of experiments and associated controls to show that mice can use a start location and path integration to develop a cognitive map and generate shortcut routes to new locations.

      Weaknesses:

      I have two comments. The second comment is perhaps major and would require rephrasing multiple sentences/paragraphs throughout the paper.

      (1) The data clearly indicate that in the hidden food maze (HFM) task mice did not use external visual "cue cards" to navigate, as this is clearly shown in the errors mice make when they start trials from a different start location when trained in the static entrance condition. The absence of visual landmark-guided behavior is indeed surprising, given the previous literature showing the use of distal landmarks to navigate and neural correlates of visual landmarks in hippocampal formation. While the authors briefly mention that the mice might not be using distal landmarks because of their pretraining procedure - I think it is worth highlighting this point (about the importance of landmark stability and citing relevant papers) and elaborating on it in greater detail. It is very likely that mice do not use the distal visual landmarks in this task because the pretraining of animals leads to them not identifying them as stable landmarks. For example, if they thought that each time they were introduced to the arena, it was "through the same door", then the landmarks would appear to be in arbitrary locations compared to the last time. In the same way, we as humans wouldn't use clouds or the location of people or other animate objects as trusted navigational beacons. In addition, the animals are introduced to the environment without any extra-maze landmarks that could help them resolve this ambiguity. Previous work (and what we see in our dome experiments) has shown that in environments with 'unreliable' landmarks, place cells are not controlled by landmarks - https://www.sciencedirect.com/science/article/pii/S0028390898000537, https://pubmed.ncbi.nlm.nih.gov/7891125/. This makes it likely that the absence of these distal visual landmarks when the animal first entered the maze ensured that the animal does not 'trust' these visual features as landmarks.

      (2) I don't agree with the statement that 'Exogenous cues are not required for learning the food location'. There are many cues that the animal is likely using to help reduce errors in path integration. For example, the start location of the rat could act as a landmark/exogenous cue in the sense of partially correcting path integration errors. The maze has four identical entrances (90-degree rotationally symmetric). Despite this, it is entirely plausible that the animal can correct path integration errors by identifying the correct start entrance for a given trial, and indeed the distance/bearing to the others would also help triangulate one's location. Further, the overall arena geometry could help reduce PI error. For example, with a food source learned to be "near the middle" of the arena, the animal would surely not estimate the position to be near the far wall (and an interesting follow-on experiment would be to have two different-sized, but otherwise nearly identical arenas). As the rat travels away from the start location, small path integration errors are bound to accumulate; these errors could be at least partially corrected based on entrance and distal wall locations. If this process of periodically checking the location of the entrance to correct path integration errors is done every few seconds, path integration would be aided 'exogenously' to build a cognitive map. The original claim of the paper still stands, i.e. mice can learn the location of a hidden food site when their starting point in the environment remains constant across trials. However, I would advise rewording portions of the paper, including the discussion throughout the paper that states claims such as "Exogenous cues are not required for learning the food location", to account for the possibility that the start location and the overall arena geometry could be used as helpful exogenous cues to correct for path integration errors.

    4. Reviewer #3 (Public Review):

      Summary:

      How is it that animals find learned food locations in their daily life? Do they use landmarks to home in on these learned locations or do they learn a path based on self-motion (turn left, take ten steps forward, turn right, etc.)? This study carefully examines this question in a well-designed behavioral apparatus. A key finding is that to support the observed behavior in the hidden food arena, mice appear to not use the distal cues that are present in the environment for performing this task. Removal of such cues did not change the learning rate, for example. In a clever analysis of whether the resulting cognitive map based on self-motion cues could allow a mouse to take a shortcut, it was found that indeed it could. The work nicely shows the evolution of the rodent's learning of the task, and the role of active sensing in the targeted reduction of uncertainty of food location proximal to its expected location.

      Strengths:

      A convincing demonstration that mice can synthesize a cognitive map for the finding of a static reward using body frame-based cues. This shows that the uncertainty of the final target location is resolved by an active sensing process of probing holes proximal to the expected location. Showing that changing the position of entry into the arena rotates the anticipated location of the reward in a manner consistent with failure to use distal cues.

      Weaknesses:

      The task is low stakes, and thus the failure to use distal cues at most costs the animal a delay in finding the food; this delay is likely unimportant to the animal. Thus, it is unclear whether this result would generalize to a situation where the animal may be under some time pressure, urgency due to food (or water) restriction, or due to predatory threat. In such cases, the use of distal cues to make locating the reward robust to changing start locations may be more likely to be observed.

    5. Author response:

      We would like to thank all the reviewers and editors for their thoughtful and detailed comments, critiques and suggestions. We will revise our manuscript in accordance with all the points raised by the reviewers. Here we summarize some of the main points that we intend to address in our revised manuscript.

      The reviewers noted that we were not sufficiently careful in identifying possible exogenous cues that the mice might be using to locate the food and that we did not consider why such cues might be ineffective. As the reviewers point out, the mice may be ignoring the visual landmarks (and floor scratches) because they are not reliable cues and their relation to the food varies with the entrance the mice have used. In particular, a reviewer refers to papers that show that “in environments with 'unreliable' landmarks, place cells are not controlled by landmarks”. These papers were known to the authors but failed to make the final cut of our extensive discussion. This important point will be thoroughly addressed.

      Another critical point was that the mice were often doing thigmotaxis. The literature on thigmotaxis was known to us and we will now directly refer to this point. We do note that the final average start to food trajectory (TEV) is directly to the food. In other words, the thigmotaxic trajectories and “towards the center” trajectories effectively average out.

      There was a very cogent point about the difficulty of totally eliminating odor cues that we will now address. Finally, based on studies using a virtual reality environment, one reviewer questioned the use of “path integration” as a signal that encodes goal location. The relevance of path integration to spatial learning and performance is a very difficult issue that, to our knowledge, has never been entirely settled in the vast spatial learning literature. We do not think that our data can “settle” this issue but will try to at least be explicit re the complexity of the path integration hypothesis as it applies to both our own data and the virtual reality literature. In particular, we will discuss the potential roles of optic flow versus proprioceptive and vestibular inputs to a putative path integration mechanism.

      Finally, the reviewers raised many important technical points re statistics reporting and how the figures are presented. In our revision, we will completely comply with all these helpful critiques.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments provide compelling evidence that conditional deletion of Vglut2 in noradrenergic neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. This study provides an important contribution to our understanding of how noradrenergic neurons regulate respiratory homeostasis in conscious adult mice.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments show that conditional deletion of Vglut2 in NA neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. Their observations challenge the importance of glutamatergic signaling from Vglut2 expressing NA neurons in normal respiratory homeostasis in conscious adult mice.

      Strengths:

      The comprehensive Vglut1, Vglut2, and Vglut3 co-expression profiles in the central noradrenergic system and the combined measurements of breathing and oxygen consumption are two major strengths of this study. Observations from these experiments provide previously undescribed insights into (1) expression patterns for subtypes of the vesicular glutamate transporter protein in the noradrenergic system and (2) the dispensable nature of Vglut2-dependent glutamate signaling from noradrenergic neurons to breathing responses to physiologically relevant gas challenges in adult conscious mice.

      Weaknesses:

      Although the cellular expression profiles for the vesicular glutamate transporters are provided, the study fails to document that glutamatergic-based signaling originating from noradrenergic neurons is evident at the cellular level under normal, hypoxic, and/or hypercapnic conditions. This limits the reader's understanding of why conditional Vglut2 knockdown is dispensable for breathing under the conditions tested.

      We thank the reviewers for their positive evaluation of our work. First, we would like to highlight that multiple studies have provided anatomical evidence of innervation of multiple cardio-respiratory nuclei by Vglut2+ noradrenergic fibers. Thus, the anatomical substrates are present for noradrenergic-based Vglut2 signaling to either play a direct role in breathing control or, upon perturbation, to indirectly affect breathing through disrupted metabolic or cardiovascular control. We have included supplemental table 1 that summarizes central noradrenergic Vglut2+ innervations of respiratory and autonomic nuclei. Additionally, ultrastructural evidence shows asymmetric synaptic contacts assuming glutamatergic transmission between C1 neurons and LC, A1, A2 and the dorsal motor nucleus of the vagus (DMV) (Milner et al., 1989; Abbott et al., 2012; Holloway et al., 2013; DePuy et al., 2013).

      Functionally, electrophysiological evidence showed that photostimulating C1 neurons activates LC, A1, A2 noradrenergic neurons monosynaptically by releasing glutamate (Holloway et al., 2013; DePuy et al., 2013) and that optogenetic stimulation of LC neurons excites the downstream parabrachial nucleus (PBN) neurons by releasing glutamate. Thus, at least the glutamatergic signaling from C1 and LC noradrenergic neurons (two noradrenergic nuclei that have been shown to play a role in breathing control) is evident at the cellular level under normal conditions. Other evidence, highlighted in our manuscript, is more circumstantial.

      Reviewer #2 (Public Review):

      The authors characterized the recombinase-based cumulative fate maps for vesicular glutamate transporters (Vglut1, Vglut2 and Vglut3) expression and compared those maps to their real-time expression profiles in central NA neurons by RNA in situ hybridization in adult mice. The authors have revealed a new and intriguing expression pattern for Vglut2, along with an entirely uncharted co-expression domain for Vglut3 within central noradrenergic neurons. Interestingly, and in contrast to previous studies, the authors demonstrated that glutamatergic signaling in central noradrenergic neurons does not exert any influence on breathing and metabolic control either under normoxic/normocapnic conditions or after chemoreflex stimulation. Also, they showed for the first time the Vglut3-expressing NA population in C2/A2 nuclei. In addition, they were also able to demonstrate Vglut2 expression in anterior NA populations, such as LC neurons, by using more refined techniques, unlike previous studies.

      A major strength of the study is the use of a set of techniques to investigate the participation of NA-based glutamatergic signaling in breathing and metabolic control. The authors provided a full characterization of the recombinase-based cumulative fate maps for Vglut transporters. They performed real-time mRNA expression of Vglut transporters in central NA neurons of adult mice. Further, they evaluated the effect of knocking down Vglut2 expression in NA neurons using DBH-Cre; Vglut2cKO mice on breathing and metabolic control in unanesthetized mice. Finally, they injected the AAV virus containing Cre-dependent Td tomato into LC of v-Glut2 Cre mice to verify the VGlut2 expression in LC-NA neurons. A very positive aspect of the article is that the authors combined ventilation with metabolic measurements. This integration holds particular significance, especially when delving into the exploration of respiratory chemosensitivity. Furthermore, the sample size of the experiments is excellent.

      Despite the clear strengths of the paper, some weaknesses exist. It is not clear in the manuscript if the experiments were performed in males and females and if the data were combined. I believe that the study would have benefited from a more comprehensive analysis exploring the sex specific differences. The reason I think this is particularly relevant is the developmental disorders mentioned by the authors, such as SIDS and Rett syndrome, which could potentially arise from disruptions in central noradrenergic (NA) function, exhibit varying degrees of sex predominance. Moreover, some of the noradrenergic cell groups are sexually dimorphic. For instance, female Wistar rats exhibit a larger LC size and more LC-NA neurons than male subjects (Pinos et al., 2001; Garcia-Falgueras et al., 2005). More recently, a detailed transcriptional profiling investigation has unveiled the identities of over 3,000 genes in the LC. This revelation has highlighted significant sexual dimorphisms, with more than 100 genes exhibiting differential expression within LC-NA neurons at the transcript level. Furthermore, this investigation has convincingly showcased that these distinct gene expression patterns have the capacity to elicit disparate behavioral responses between sexes (Mulvey et al., 2018). Therefore, the authors should compare the fate maps, Vglut transporters in males and females, at least considering LC-NA neurons. Even in the absence of identified sex differences, this information retains significant importance.

      All experiments contained both males and females as described in the original submission. In our analysis of breathing and metabolism, sex was included in the analysis and no significant phenotypic difference was observed. For the fate map and in situ experiments, we did not see obvious differences in the expression patterns in the three glutamate transporters between females and males, though the group size is small. Though all the anatomical and phenotypic data in this manuscript are presented as combined graphs, we have differentially labeled our data points by sex. The reviewer does raise important questions regarding possible sexual dimorphisms in the central noradrenergic system and whether such dimorphisms may extend to glutamate transporter co-expression. Our thorough interrogation of respiratory-metabolic parameters fails to reveal any sex specific differences in control or experimental mice. Thus, it is unclear if any of the previously described and cited dimorphisms are functionally relevant in this setting. Given the large differences in the real time expression and cumulative fate maps of Vglut2, a worthwhile interrogation of differential glutamate transporter expression would be best served by longitudinal studies with large group sizes across age as it is not clear what underlies the dynamic VGlut2 expression changes. Such changes may at times be greater in males and other times in females, driven by experience or physiological challenges etc., but resulting in averaged cumulative fatemaps that are similar between sexes. Such a longitudinal quantitative study of real-time and fatemapped cell populations across the central NA system would be of a scale that is beyond the scope of this report, especially when no phenotypic changes have been observed in our respiratory data.

      An important point well raised by the authors is that although suggestive, these experiments do not definitively rule out that NA-Vglut2 based glutamatergic signaling has a role in breathing control. Subsequent experiments will be necessary to validate this hypothesis.

      As noted, we discuss that we only address requirement, not sufficiency, of NA Vglut2 in breathing. Functional sufficiency experiments usually involve increasing the relevant output. However, these experiments can lead to non-specific, pleiotropic effects that would be difficult to disambiguate, even if done with high cellular specificity. Viral or genetic overexpression of Vglut2 in NA neurons may be a feasible approach. Conditional ablation of TH or DBH with concurrent chemo or optogenetic stimulation may also be informative. These approaches would require significant investments in mouse model generation and suffer additional experimental limitations.

      An improvement could be made in terms of measuring body temperature. Opting for implanted sensors over rectal probes would circumvent the need to open the chamber, thereby preventing alterations in gas composition during respiratory measurements. Further, what happens to body temperature phenotype in these animals under different gas exposures? These data should be included in the Tables.

      While surgical implantation of sensors would provide a more direct assessment of temperature, it requires components that were not available at the time of the study and addresses a question (temperature changes during a time course of gas exposure) that goes beyond the scope of the current work focused on respiratory response. As we have done for prior experiments (Martinez et al., 2019; Ray et al., 2011), the body temperature was measured immediately before and after measuring breathing only. Our flow-through system using inline gas sensors (AEI P-61B CO2 sensor and AEI N-22M O2 sensor) ensures that gas challenges were constant and consistent across all measurements. Any disruption in gas composition would have been noted by our software analysis system, Breathe Easy, and the data rejected. We did not observe any such perturbations.

      Is it plausible that another neurotransmitter within NA neurons might be released in higher amounts in DBH-Cre; Vglut2 cKO mice to compensate for the deficiency in glutamate and prevent changes in ventilation?

      We agree that compensation is always a possibility at the synaptic, cellular, and circuit levels that may involve a variety of transcriptional, translational, cellular, and circuit mechanisms (i.e., synaptic strength). This could be interrogated by combining multiple conditional alleles and recombinase drivers for various transmitters and receptors, but would, in our experience, take multiple years for the requisite breeding to be completed.

      Continuing along the same line of inquiry is there a possibility that Vglut2 cKO from NA neurons not only eliminates glutamate release but also reduces NA release? A similar mechanism was previously found in VGLUT2 cKO from DA neurons in previous studies (Alsio et al., 2011; Fortin et al., 2012; Hnasko et al., 2010). Additionally, does glutamate play a role in the vesicular loading of NA? Therefore, could the lack of effect on breathing be explained by the lack of noradrenaline and not glutamate?

      These are all excellent points, but prior studies suggest that reductions in NA signaling would itself have an apparent effect (Zanella et al., 2006; Kuo et al., 2016). Although several studies showed that LC and C1 NA neurons co-release noradrenaline and glutamate, no direct evidence yet makes clear that glutamate facilitates NA release or vice versa. However, it would be of great interest to test in the future whether reduced or absent NA compensated for loss of glutamate. We do fully acknowledge in the manuscript that any number of compensatory events could be at play in these findings.

      Reviewer #3 (Public Review):

      Summary:

      The authors, Y Chang and colleagues, have performed elegant studies in transgenic mouse models that were designed to examine glutamatergic transmission in noradrenergic neurons, with a focus on respiratory regulation. They generated 3 different transgenic lines, in which a red fluorophore was expressed in dopamine-β-hydroxylase (DBH)-expressing neurons (noradrenergic and adrenergic neurons) that did not express a vesicular glutamate transporter (Vglut) and a green fluorophore in DBH neurons that did express one of either Vglut1, Vglut2 or Vglut3.

      Further experiments generated a transgenic mouse with knockout of Vglut2 in DBH neurons. The authors used plethysmography to measure respiratory parameters in conscious, unrestrained mice in response to various challenges.

      Strengths:

      The distribution of the Vglut expression is broadly in agreement with other studies, but with the addition of some novel Vglut3 expression. Validation of the transgenic results, using in situ hybridization histochemistry to examine mRNA expression, revealed potential modulation of Vglut2 expression during phases of development. This dataset is comprehensive, well-presented and very useful.

      In the physiological studies the authors observed that neither baseline respiratory parameters, nor respiratory responses to hypercapnia (5, 7, 10% CO2) or hypoxia (10% O2) were different between knockout mice and littermate controls. The studies are well-designed and comprehensive. They provide observations that are supportive of previous reports using similar methodology.

      Weaknesses:

      In relation to the expression of Vglut2, the authors conclude that modulation of expression occurs, such that in adulthood there are differences in expression patterns in some (nor)adrenergic cell groups. Altered sensitivity is provided as an explanation for different results between studies examining mRNA expression. These are likely explanations; however, the conclusion would really be definitive with inclusion of a conditional cre expressing mouse. Given the effort taken to generate this dataset, it seems to me that taking that extra step would be of value for the overall understanding of glutamatergic expression in these catecholaminergic neurons.

      The seemingly dynamic Vglut2 expression pattern across the NA system is intriguing. As noted in our comments to reviewer 2, a robust age-dependent interrogation would require a large-magnitude study. The reviewer correctly points out that a temporally controlled recombinase fate mapping experiment would offer greater insight into the dynamic expression of Vglut2. We strongly agree with that idea and did work to develop a Vglut2-CreER targeted allele that, despite our many other successes in mouse genetic engineering (Lusk et al., 2022; Sun and Ray, 2016), did not succeed on the first attempt. We aim to complete the line in the near future so that we may better understand the Vglut2 expression pattern in central noradrenergic neurons in a time-specific and sex-specific manner.

      The respiratory physiology is very convincing and provides clear support for the view that Vglut2 is not required for modulation of the respiratory parameters measured and the reflex responses tested. It is stated that this is surprising. However, comparison with the data from Abbott et al., Eur J Neurosci (2014) in which the same transgenic approach was used, shows that they also observed no change in baseline breathing frequency. Differences were observed with strong, coordinated optogenetic stimulation, but, as discussed in this manuscript, it is not clear what physiological function this is relevant to. It just shows that some C1 neurons can use glutamate as a signaling molecule. Further, Holloway et al., Eur J Neurosci (2015), using the same transgenic mouse approach, showed that the respiratory response to optogenetic activation of Phox2 expressing neurons is not altered in DBH-Vglut2 KO mice. The conclusion seems to be that some C1 neuron effects are reliant upon glutamatergic transmission (C1DMV for example), and some not.

      We agree that activation of C1 neurons may be sufficient to modulate breathing when artificially stimulated and that such stimulation relies on glutamatergic transmission for its effect. This is why we find our results surprising and important in clarifying for the field that glutamatergic signaling in noradrenergic cells is dispensable for breathing and hypoxic and hypercapnic responses under physiological conditions.

      Further contrast is made in this manuscript to the work of Malheiros-Lima and colleagues (eLife 2020) who showed that the activation of abdominal expiratory nerve activity in response to peripheral chemoreceptor activation with cyanide was dependent upon C1 neurons and could be attenuated by blockade of glutamate receptors in the pFRG - i.e. the supposition that glutamate release from C1 neurons was responsible for the function. However, it is interesting to observe that diaphragm EMG responses to hypercapnia (10% CO2) or cyanide, and the expiratory activation to hypercapnia, were not affected by the glutamate receptor blockade. Thus, a very specific response is affected and one that was not measured in the current study.

      As we mention above, we do not dispute that glutamate signaling can be manipulated to create a response in non-physiological conditions – we suggest that framing the interpretation around the glutamatergic role in a model that better matches physiological conditions should inform our interpretation. Furthermore, we do include an examination of expiratory flow – which was not impacted by loss of glutamatergic activity in NA neurons – which would be likely to have been impacted if abdominal expiratory nerve activity was modified.

      These previously published observations are consistent with the current study, which provides a more comprehensive analysis of glutamatergic contributions to respiratory physiology. A more nuanced discussion of the data and acknowledgement of the differences, which are not actually at odds, would improve the paper and place the information within a more comprehensive model.

      Thank you for the comments. As noted in the original and extended discussion, we respectfully disagree with the perspective that our results align with prior results.

      Recommendations for the authors:

      The three reviewers believe this is an important study. They have numerous suggestions for improvement of the manuscript (outlined below), but no new experiments are required. The Editor requests some nomenclature changes as indicated in attachment 1.

      Reviewer #1 (Recommendations For The Authors):

      Abstract/Introduction: Although the need for this study is obvious, it is important that the authors explicitly communicate to the reader the working hypothesis they held before the start of the work. In the current form, it is unclear whether the authors aimed to test the hypothesis that glutamatergic signaling from noradrenergic neurons is important to breathing or the hypothesis that it is not important to breathing. If it is the latter (not important), then the study (as it relates to the breathing measurements) is poorly justified and designed, as additional orthogonal approaches (e.g., actual measurements of glutamatergic signaling at the cellular level) are almost requisite. If the authors' hypothesis was originally based on existing literature suggesting that glutamatergic signaling from noradrenergic neurons is important to breathing, then the experimental design is appropriate.

      Thank you for the suggestion. The working hypothesis has been added in the abstract (line 24-25) and the introduction (line 92-94), making clear that we initially hypothesized that glutamatergic signaling from noradrenergic neurons is important in breathing.

      Results: While the steady-state measurements for breathing metrics are clearly important in defining how glutamatergic signaling may contribute to pulmonary function, glutamatergic signaling may have a greater role in the dynamics of breathing patterns (i.e., regularity of the breathing rhythms). Such traits can be described using SD1 and SD2 from Poincaré maps and/or entropy measurements. Such an analysis should be performed.

      Thank you for the suggestion. The dynamic patterns of respiratory rate (Vf), tidal volume (VT), minute ventilation (VE), inspiratory duration (TI), expiratory duration (TE), breath cycle duration (TTOT), inspiratory flow rate (VT/TI), and expiratory flow rate (VT/TE) have been shown as Poincaré plots and quantified and tested using the SD1 and SD2 statistics in the supplemental figures of Figures 4-7.
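
      As a hedged illustration of the Poincaré descriptors referred to above, the sketch below computes SD1 and SD2 from a breath-by-breath series using the commonly used definitions (SD1 is the dispersion perpendicular to the line of identity, SD2 the dispersion along it). The function name `poincare_sd`, the use of the sample variance, and the example values are assumptions made for illustration; this is not the analysis code used in the manuscript.

```python
import math
from statistics import variance


def poincare_sd(series):
    """Return (SD1, SD2) for a sequence of breath-by-breath values.

    Each point of the Poincare plot is (x_n, x_{n+1}). Rotating the cloud
    by 45 degrees gives one axis across the line of identity (SD1) and one
    axis along it (SD2).
    """
    if len(series) < 3:
        raise ValueError("need at least three breaths")
    current = series[:-1]
    following = series[1:]
    root2 = math.sqrt(2)
    # Coordinate perpendicular to the line of identity (short-term variability)
    across = [(b - a) / root2 for a, b in zip(current, following)]
    # Coordinate along the line of identity (longer-term variability)
    along = [(b + a) / root2 for a, b in zip(current, following)]
    sd1 = math.sqrt(variance(across))
    sd2 = math.sqrt(variance(along))
    return sd1, sd2


# Hypothetical breath cycle durations (s), for illustration only
ttot = [0.30, 0.32, 0.29, 0.31, 0.33, 0.30]
sd1, sd2 = poincare_sd(ttot)
```

      The same function could in principle be applied to any of the listed metrics (for example VT or TI); a larger SD1 relative to SD2 would indicate more breath-to-breath irregularity.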

      Results: Analyses of Inspiratory time (Ti) and flow rate (i.e., Tidal Volume / Ti) should be assessed and included.

      Thank you for the suggestion. Inspiratory duration (Ti), expiratory duration (TE), breath cycle duration (TTOT), inspiratory flow rate (VT/Ti), and expiratory flow rate (VT/TE) have been included in the Figures 4-7.
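
      For readers less familiar with these variables, the following minimal sketch shows how the listed timing and flow metrics relate to one another for a single breath, using the standard definitions (TTOT = TI + TE, Vf = 60 / TTOT, VE = VT × Vf). The `Breath` class, the function name, and the units are assumptions for illustration; any body-weight normalization applied in the manuscript is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Breath:
    ti: float  # inspiratory duration, seconds
    te: float  # expiratory duration, seconds
    vt: float  # tidal volume, mL (assumed unit)


def breath_metrics(breath):
    """Derive per-breath timing and flow metrics from TI, TE and VT."""
    ttot = breath.ti + breath.te   # breath cycle duration, s
    vf = 60.0 / ttot               # respiratory rate, breaths/min
    return {
        "TTOT": ttot,
        "Vf": vf,
        "VE": breath.vt * vf,          # minute ventilation, mL/min
        "VT/TI": breath.vt / breath.ti,  # inspiratory flow rate, mL/s
        "VT/TE": breath.vt / breath.te,  # expiratory flow rate, mL/s
    }


# Hypothetical values, for illustration only
example = breath_metrics(Breath(ti=0.125, te=0.25, vt=0.25))
```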

      Results/Methods: If similar analytical approaches were used in the current study as to that in Lusk et al. 2022, it appears that data was discontinuously sampled, rejecting periods of movement and only including periods of quiescent breathing. Were the periods of quiescent breathing different? Information should be provided to describe the total sampling duration included.

      For room air, the entire gas condition was used for data analysis. For hypercapnia (5% CO2, 7% CO2, 10% CO2), only the last 5 minutes of the gas challenge period was used for data analysis. For hypoxia (10% O2), we analyzed the breathing trace of three 5-minute epochs following initiation of the gas exposure separately, e.g., epoch 1 = 5-10min, epoch 2 = 10-15min, and epoch 3 = 15-20min. All breaths included as quiescent breathing were analyzed in the aggregate for each group and experimental condition, we did not compare individual periods of quiescent breathing within or across an animal(s)/group(s)/experimental condition(s). We have added the details in the Materials and Methods (line 637-642).
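
      The analysis windows described above can be sketched as follows. This is an illustrative outline only: the function names, the half-open interval convention, and the example timestamps are assumptions, and the 20-minute challenge length is taken from the protocol description elsewhere in this response rather than from the analysis software itself.

```python
# Epochs (minutes after gas onset) analyzed separately for hypoxia
HYPOXIA_EPOCHS_MIN = [(5, 10), (10, 15), (15, 20)]
# Assumed challenge duration in minutes
CHALLENGE_MIN = 20


def hypercapnia_window(challenge_min=CHALLENGE_MIN, last_min=5):
    """Return the (start, end) window covering the last minutes of a challenge."""
    return (challenge_min - last_min, challenge_min)


def select(breath_times_min, window):
    """Keep breaths whose onset time falls inside the half-open window [start, end)."""
    start, end = window
    return [t for t in breath_times_min if start <= t < end]


def hypoxia_epochs(breath_times_min):
    """Split breaths into the three 5-minute hypoxia epochs."""
    return [select(breath_times_min, w) for w in HYPOXIA_EPOCHS_MIN]


# Hypothetical breath onset times (minutes after gas onset)
times = [1, 6, 11, 16, 19.5]
late_hypercapnia = select(times, hypercapnia_window())
per_epoch = hypoxia_epochs(times)
```

      Breaths within each window would then be pooled per group and condition, as described above, rather than compared between individual quiescent periods.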

      Results: As mice were conscious in this study, were sniff periods (transient periods of fast breathing, i.e.,>8Hz) included in the analysis?

      No, only regular quiescent breathing periods were included in the analysis.

      Discussion: The authors need to discuss the limitations of their findings.

      • How should the reader interpret the findings? Concluding that glutamatergic signaling is dispensable implies that it occurs in room air, hypoxia, and hypercapnia.

      We have edited our discussion for clarity to highlight our conclusions that Vglut2-based glutamatergic signaling from noradrenergic neurons is ultimately dispensable for baseline breathing and the hypercapnic and hypoxic chemoreflexes in unanesthetized and unrestrained mice.

      • Assuming that glutamatergic signaling is active during the conditions tested, then the authors should discuss what may be the potential compensations.

      We have provided additional discussion surrounding potential compensatory events that may have taken place and could result in the unchanged phenotype in the experimental group.

      • The authors need to discuss how age and state of consciousness may play a role in their findings. The current discussion gives the impression that their findings are broadly applicable in all cases, but the lack of differences in this study may not hold true under different conditions.

      The study was done in adult (6–8-week-old) unanesthetized and unrestrained mice. In the discussion (line 472-474), we highlight that in our unpublished results, loss of NA-expressed Vglut2 does not change the survival curve in P7 neonate mice undergoing repeated bouts of autoresuscitation until death. Thus, we believe that Vglut2-based glutamatergic signaling in central NA neurons is dispensable for baseline breathing and the hypercapnic and hypoxic chemoreflexes in unanesthetized and unrestrained mice across different ages. Otherwise, we do not imply that we have interrogated any other aspects of breathing in our discussion.

      Methods: Further description of the analysis window for the respiratory metrics should be provided. Were breath values for each condition taken throughout the entire condition? This is particularly important for hypoxia, where the stereotypical respiratory response is biphasic.

      For room air, the entire gas condition was used for data analysis. For hypercapnia (5% CO2, 7% CO2, 10% CO2), only the last 5 min of the gas challenge period was used for data analysis. For hypoxia (10% O2), we analyzed the breathing trace of three 5 min time periods separately (5-10 min, 10-15 min, and 15-20 min of the hypoxic challenge). As noted in our original manuscript, we graph and assess three 5 min epochs during hypoxic exposure to capture the dynamic nature of the hypoxic ventilatory response. We have added the details in the Materials and Methods (line 637-642).

      Methods: How was consciousness determined?

      The conscious mice mentioned in the manuscript refer to the mice without anesthesia. We have replaced “awake” and “conscious” with “unanesthetized” in the text.

      Reviewer #2 (Recommendations For The Authors):

      Since no EEG/EMG recording was performed it would be more appropriate to remove "awake" and "conscious" throughout the manuscript and include the term "unanesthetized".

      Thank you for the suggestion. “Awake” and “conscious” have been replaced by “unanesthetized” in the text.

      Line 545: Why 32C? Isn't this temperature too high for animals?

      30-32°C is the thermoneutral zone for mice. It is the range of ambient temperature where mice can maintain a stable core temperature with their minimal metabolic rate (Gordon, 1985). Whole-body plethysmography uses the barometric technique to detect pressure oscillations caused by changes in temperature and humidity with each breathing act when an animal sits in a sealed chamber (Mortola et al., 2013). Thus, maintaining the chamber temperature near the thermoneutral zone during the plethysmography assay is required to maintain constancy in respiratory and metabolic parameters from trial to trial as well as to maintain linearity of ventilatory pressure changes due to humidification, rarefaction, and thermal expansion and contraction during inspiration and expiration (Ray et al., 2011). The chamber temperature that has been used for adult plethysmography has been set across a range 30-34°C (Hodges et al., 2008; Ray et al., 2011; Hennessy et al., 2017). We use 32°C in this manuscript which is consistent with previously published literature from other groups and our own work (Sun et al., 2017; Lusk et al., 2022).

      I would include the units of the physiological variables in the tables.

      Thank you for the suggestion. The units of the physiological variables have been added in all the tables.

      Reviewer #3 (Recommendations For The Authors):

      Why is the C3 group not considered in this study?

      The C3 adrenergic group, best characterized in rat, is only seen in rodents but not in many other species including primates (including human) (Kitahama et al., 1994). Thus, the C3 group is not the focus of this study where we aim to discuss if glutamate derived from noradrenergic neurons could be the potential therapeutic target of human respiratory disorders. The C3 adrenergic group is typically described as a population containing only about 30 neurons. We have added the fate map data and the adult expression pattern for the three vesicular glutamate transporters for the C3 group in the figure 1 and 2 supplements for reference.

      Sub CD/CV does not appear to be defined in the manuscript.

      Thank you for the point. The definition of sub CD/CV has been added in the text (line 126).

      The data on line 131-133 is interesting but could be described more effectively and clearly.

      Thank you for the suggestion. The text has been modified accordingly.

      The end of the paragraph at lines 140 onwards is rather repeated in the paragraph that starts at line 146.

      The repeated text has been removed accordingly.

      Whilst anterior and posterior are correct anatomical terms, for a quadruped, rostral and caudal are more widely used, particularly in the brainstem field. Is there a particular reason for using anterior/posterior?

      We followed the anatomical terminology in Robertson et al. (2013), where anterior/posterior is used to describe C2/A2 and C1/A1.

      On the protocol lines included in Figures 4-7 it would be worth adding the test day. This seems a little strange. Why wait up to one week after the habituation to perform the stimulation? How many mice were left for each day between habituation and experimentation, and does this timing affect responses? Do mice forget the habituation after a period?

      Thank you for the point. We have added the test day for plethysmography in figures 4-7. After the 5 days of habituation, we began the plethysmography recordings on the sixth day. A maximum of 6 mice can be assayed for plethysmography per day due to the limited number of barometric flow through plethysmography and metabolic measurement systems we have. Thus, all animals were finished with plethysmography “within” one week of the last day of habituation. This protocol is consistent with our previous published work (Martinez et al., 2019; Lusk et al., 2022; Lusk et al., 2023). For the experiments in this manuscript, mice were assayed within 3 days after habituation. As noted in our methods and figures, each mouse is given as much as 40 mins to acclimate to the chamber (determined by directly observed quiet breathing) before data acquisition. We have no reason or evidence that indicates testing order and thus timing was a factor. The detailed explanation for the plethysmography protocol has been added in the material and methods section (line 606-625).

      Please state clearly that each mouse is only exposed to one gas mixture (what I interpret is the case), or could one mouse be exposed to several different stimuli?

      Each mouse is only exposed to one gas challenge (5% CO2, 7% CO2, 10% CO2, or 10% O2) in a testing period. Testing periods for an individual mouse were separated by 24 h to allow for a full recovery. The protocol is to put the mouse under room air for 45 min, switch to one gas challenge for 20 min, and switch back to room air for 20 min.

      With apologies if I missed this, but did each of the respiratory stimuli produce a statistically significant response in the control mice? For example, the response to 10%O2?

      Yes, each respiratory stimulus, including 5/7/10% CO2 and 10% O2, produced a statistically significant response in both mutant and control mice. We have labeled the statistical significance in Figures 4-7. Thank you for pointing this out.

      Line 312: Optogenetic stimulation induced an increase from 130 to 180 breaths per min (Abbott et al., EJN 2014). It is surprising that this is called "modest". Baseline respiratory frequency was presented.

      Thank you for the point. The word “modest” has been removed and the discussion has been changed accordingly (line 355-360).

      Line 338: This discussion is not sufficiently nuanced. It is the increased Dia amplitude (to KCN only, not 10% CO2) and the stimulation of active expiration, to both stimuli, that is blocked by kyn in pFRG. There is no effect on breathing frequency. The current study would not detect such differences in active expiration.

      Thank you for the suggestion. The discussion has been modified accordingly (line 382-388).

    2. Reviewer #4 (Public Review):

      Summary:

      Although previous research suggested that noradrenergic glutamatergic signaling could influence respiratory control, the work performed by Chang and colleagues reveals that Vglut2, a marker of excitatory glutamatergic neurons, is dynamically and widely expressed throughout the central noradrenergic system but is not crucial for baseline breathing or for the hypercapnic and hypoxic ventilatory responses. The central point that will make a significant change in the field is how NA-glutamate transmission may influence breathing control and the dysfunction of NA neurons in respiratory disorders.

      Strengths:

      There are several strengths such as the comprehensive analysis of Vglut1, Vglut2, and Vglut3 expression in the central noradrenergic system and the combined measurements of breathing parameters in conscious unrestrained mice.

      Other considerations:

      These results strongly suggest that glutamate may not be necessary for modulating breathing under normal conditions or even when faced with high levels of carbon dioxide (hypercapnia) or low oxygen levels (hypoxia). This finding is unexpected, considering many studies have underscored glutamate's vital role in respiratory regulation, more so than catecholamines. This leads us to question the significance of catecholamines in controlling respiration. Moreover, if glutamate is not essential for this function, we need to explore its role in other physiological processes such as sympathetic nerve activity (SNA), thermoregulation, and sensory physiology.

    3. eLife assessment

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments provide compelling evidence that conditional deletion of vesicular glutamate transporters from noradrenergic neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. This study provides an important contribution to our understanding of how noradrenergic neurons regulate respiratory homeostasis in conscious adult mice.

    4. Reviewer #1 (Public Review):

      Summary:

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments show that conditional deletion of Vglut2 in NA neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. Their observations challenge the importance of glutamatergic signaling from Vglut2 expressing NA neurons in normal respiratory homeostasis in conscious adult mice.

      Strengths:

      The comprehensive Vglut1, Vglut2, and Vglut3 co-expression profiles in the central noradrenergic system and the combined measurements of breathing and oxygen consumption are two major strengths of this study. Observations from these experiments provide previously undescribed insights into (1) expression patterns for subtypes of the vesicular glutamate transporter protein in the noradrenergic system and (2) the dispensable nature of Vglut2-dependent glutamate signaling from noradrenergic neurons to breathing responses to physiologically relevant gas challenges in adult conscious mice.

      Weaknesses:

      Although the cellular expression profiles for the vesicular glutamate transporters are provided, the study does not document that glutamatergic-based signaling originating from noradrenergic neurons is evident at the cellular level under normal, hypoxic, and/or hypercapnic conditions. The authors effectively recognize this issue and appropriately discuss their findings in this context.

    5. Reviewer #2 (Public Review):

      The authors characterized the recombinase-based cumulative fate maps for vesicular glutamate transporter (Vglut1, Vglut2 and Vglut3) expression and compared those maps to their real-time expression profiles in central NA neurons by RNA in situ hybridization in adult mice. The authors have revealed a new and intriguing expression pattern for Vglut2, along with an entirely uncharted co-expression domain for Vglut3 within central noradrenergic neurons. Interestingly, and in contrast to previous studies, the authors demonstrated that glutamatergic signaling in central noradrenergic neurons does not exert any influence on breathing and metabolic control either under normoxic/normocapnic conditions or after chemoreflex stimulation. Also, they showed for the first time the Vglut3-expressing NA population in C2/A2 nuclei. In addition, they were also able to demonstrate Vglut2 expression in anterior NA populations, such as LC neurons, by using more refined techniques, unlike previous studies.

      A major strength of the study is the use of a set of techniques to investigate the participation of NA-based glutamatergic signaling in breathing and metabolic control. The authors provided a full characterization of the recombinase-based cumulative fate maps for Vglut transporters. They performed real-time mRNA expression of Vglut transporters in central NA neurons of adult mice. Further, they evaluated the effect of knocking down Vglut2 expression in NA neurons, using DBH-Cre; Vglut2 cKO mice, on breathing and metabolic control in unanesthetized mice. Finally, they injected an AAV containing Cre-dependent tdTomato into the LC of Vglut2-Cre mice to verify Vglut2 expression in LC-NA neurons. A very positive aspect of the article is that the authors combined ventilation with metabolic measurements. This integration holds particular significance, especially when delving into the exploration of respiratory chemosensitivity. Furthermore, the sample size of the experiments is excellent.

      Despite the clear strengths of the paper, some weaknesses exist. It is not clear in the manuscript if the experiments were performed in males and females and if the data were combined. I believe that the study would have benefited from a more comprehensive analysis exploring sex-specific differences. The reason I think this is particularly relevant is that the developmental disorders mentioned by the authors, such as SIDS and Rett syndrome, which could potentially arise from disruptions in central noradrenergic (NA) function, exhibit varying degrees of sex predominance. Moreover, some of the noradrenergic cell groups are sexually dimorphic. For instance, female Wistar rats exhibit a larger LC size and more LC-NA neurons than male subjects (Pinos et al., 2001; Garcia-Falgueras et al., 2005). More recently, a detailed transcriptional profiling investigation has unveiled the identities of over 3,000 genes in the LC. This revelation has highlighted significant sexual dimorphisms, with more than 100 genes exhibiting differential expression within LC-NA neurons at the transcript level. Furthermore, this investigation has convincingly showcased that these distinct gene expression patterns have the capacity to elicit disparate behavioral responses between sexes (Mulvey et al., 2018). Therefore, the authors should compare the fate maps and Vglut transporter expression in males and females, at least considering LC-NA neurons. Even in the absence of identified sex differences, this information retains significant importance.

      An important point well raised by the authors is that although suggestive, these experiments do not definitively rule out that NA-Vglut2 based glutamatergic signaling has a role in breathing control. Subsequent experiments will be necessary to validate this hypothesis.

      An improvement could be made in terms of measuring body temperature. Opting for implanted sensors over rectal probes would circumvent the need to open the chamber, thereby preventing alterations in gas composition during respiratory measurements. Further, what happens to body temperature phenotype in these animals under different gas exposures? These data should be included in the Tables.

      Is it plausible that another neurotransmitter within NA neurons might be released in higher amounts in DBH-Cre; Vglut2 cKO mice to compensate for the deficiency in glutamate and prevent changes in ventilation?

      Continuing along the same line of inquiry is there a possibility that Vglut2 cKO from NA neurons not only eliminates glutamate release but also reduces NA release? A similar mechanism was previously found in VGLUT2 cKO from DA neurons in previous studies (Alsio et al., 2011; Fortin et al., 2012; Hnasko et al., 2010). Additionally, does glutamate play a role in the vesicular loading of NA? Therefore, could the lack of effect on breathing be explained by the lack of noradrenaline and not glutamate?

    1. eLife assessment

      This important study by Yogesh and Keller provides a set of results describing the response properties of cholinergic input and its functional impacts in the mouse visual cortex. They found that cholinergic inputs are elevated by locomotion in a binary manner regardless of locomotor speeds, and activation of cholinergic input differently modulated the activity of Layer 2/3 and Layer 5 visual cortex neurons induced by bottom-up (visual stimuli) and top-down (visuomotor mismatch) inputs. The experiments are cutting-edge and well-executed, and the results are convincing.

    2. Reviewer #1 (Public Review):

      The paper submitted by Yogesh and Keller explores the role of cholinergic input from the basal forebrain (BF) in the mouse primary visual cortex (V1). The study aims to understand the signals conveyed by BF cholinergic axons in the visual cortex, their impact on neurons in different cortical layers, and their computational significance in cortical visual processing. The authors employed two-photon calcium imaging to directly monitor cholinergic input from BF axons expressing GCaMP6 in mice running through a virtual corridor, revealing a strong correlation between BF axonal activity and locomotion. This persistent activation during locomotion suggests that BF input provides a binary locomotion state signal. To elucidate the impact of cholinergic input on cortical activity, the authors conducted optogenetic and chemogenetic manipulations, with a specific focus on L2/3 and L5 neurons. They found that cholinergic input modulates the responses of L5 neurons to visual stimuli and visuomotor mismatch, while not significantly affecting L2/3 neurons. Moreover, the study demonstrates that BF cholinergic input leads to decorrelation in the activity patterns of L2/3 and L5 neurons.

      This topic has garnered significant attention in the field, drawing the interest of many researchers actively investigating the role of BF cholinergic input in cortical activity and sensory processing. The experiments and analyses were thoughtfully designed and conducted with rigorous standards, providing evidence of layer-specific differences in the impact of cholinergic input on neuronal responses to bottom-up (visual stimuli) and top-down inputs (visuomotor mismatch).

    3. Reviewer #2 (Public Review):

      The manuscript investigates the function of basal forebrain cholinergic axons in mouse primary visual cortex (V1) during locomotion using two-photon calcium imaging in head-fixed mice. Cholinergic modulation has previously been proposed to mediate the effects of locomotion on V1 responses. The manuscript concludes that the activity of basal forebrain cholinergic axons in visual cortex provides a signal which is more correlated with binary locomotion state than locomotion velocity of the animal and finds no evidence for modulation of cholinergic axons by locomotion velocity. Cholinergic axons did not seem to respond to grating stimuli or visuomotor prediction error. Optogenetic stimulation of these axons increased the amplitude of responses to visual stimuli and decreased the response latency of layer 5 excitatory neurons, but not layer 2/3 neurons. Moreover, optogenetic or chemogenetic stimulation of cholinergic inputs reduced pairwise correlation of neuronal responses. These results provide insight into the role of cholinergic modulation to visual cortex and demonstrate that it affects different layers of visual cortex in a distinct manner. The experiments are well executed and the data appear to be of high quality. However, further analyses may be required to fully support some of the study's conclusions. Specifically, the analyses of the effects of locomotion and stimulation of cholinergic inputs present grand averages of responses across all neurons, and therefore may mask heterogeneity across layer 2/3 and layer 5 neurons.

    1. eLife assessment

      This important study has practical and theoretical implications for understanding rhythm perception and production in human cognition. The evidence for individual frequency preferences and a deterioration in frequency adaptation with age is convincing. These findings will inform existing models of rhythm perception and production, and the reported effects of age may have clinical implications.

    2. Reviewer #2 (Public Review):

      Summary:

      The current work describes a set of behavioral tasks to explore individual differences in the preferred perceptual and motor rhythms. Results show a consistent individual preference for a given perceptual and motor frequency across tasks and, while these were correlated, the latter is slower than the former one. Additionally, the adaptation accuracy to rate changes is proportional to the amount of rate variation and, crucially, the amount of adaptation decreases with age.

      Strengths:

      Experiments are carefully designed to measure individual preferred motor and perceptual tempo. Furthermore, the experimental design is validated by testing the consistency across tasks and test-retest, which makes the introduced paradigm a useful tool for future research.
      The obtained data is rigorously analyzed using a diverse set of tools, each adapted to the specificities of the different research questions and tasks.
      This study identifies several relevant behavioral features: (i) each individual shows a preferred and reliable motor and perceptual tempo and, while both are related, the motor is consistently slower than the pure perceptual one; (ii) the presence of hysteresis in the adaptation to rate variations; and (iii) the decrement of this adaptation with age. All these observations are valuable for the auditory-motor integration field of research, and they could potentially inform existing biophysical models to increase their descriptive power.

      Weaknesses:

      To get a better understanding of the mechanisms underlying the behavioral observations, it would have been useful to compare the observed pattern of results with simulations done with existing biophysical models. However, this point is addressed if the current study is read along with this other publication of the same research group: Kaya, E., & Henry, M. J. (2024, February 5). Modeling rhythm perception and temporal adaptation: top-down influences on a gradually decaying oscillator. https://doi.org/10.31234/osf.io/q9uvr

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      Summary:

      The authors study age-related changes in the excitability and firing properties of sympathetic neurons, which they ascribe to age-related changes in the expression of KCNQ (Kv7, "M-type") K+ currents in rodent sympathetic neurons, whose regulation by GPCRs has been most thoroughly studied for over 40 years.

      Strengths:

      The strengths include the rigor of the current-clamp and voltage-clamp experiments and the lovely, crisp presentation of the data. The separation of neurons into tonic, phasic and adapting classes is also interesting, and informative. The ability to successfully isolate and dissociate peripheral ganglia from such older animals is also quite rare and commendable! There is much useful detail here.

      Thank you for recognizing the effort we put into presenting the data and analyzing the neuronal populations. I also believe the ability to isolate neurons from old animals is worth communicating to the scientific community.

      Weaknesses:

      Where the manuscript becomes less compelling is in the rapamycin section, which does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe a signaling mechanism to it that is supported by data. Thus, this latter part rather undermines the overall impact and central advance of the manuscript. The problem is exacerbated by the controversial and anecdotal nature of the entire mTor/aging field, some of whose findings have very unfortunately had to be recently retracted.

      I would strongly recommend to the authors that they end the manuscript with their analysis of the role of M current/KCNQ channels in the numerous age-related changes in sympathetic neuron function that they elegantly report, and save the rapamycin, and possible mTor action, for a separate line of inquiry that the authors could develop in a more thorough and scholarly way.

      Whereas the description of the data are very nice and useful, the manuscript does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe changes in signaling mechanisms, such as that of M1 mAChRs to the phenomena that is supported by data.

      I appreciate the new comment. We had agreed that our rapamycin experiments did not allow us to ascribe the mechanism to the signaling pathway of mTOR. The new comment mentions M1 mAChR signaling as another potential signaling mechanism. Our work centered on determining whether aging altered the function of sympathetic motor neurons and defining the mechanism. We presented evidence showing that the mechanism is a reduction of the M-current. We did not attempt to identify the signaling mechanism linking aging to a reduction in M-current. Therefore, we agree with the reviewer that we do not provide further details on the mechanism and that this remains an open question. However, I find it harsh to say that “the effect is more of an epiphenomenon of unclear insight”. How could we possibly test that the effect of aging on the excitability of these neurons only arises as a secondary effect or that it is not causal? How could we test for sufficiency and necessity of aging? How could we modify the state of aging to test for causality? We would have to reverse aging and show that the effect on the excitability is gone. And that is exactly what we tried to do with the rapamycin experiment.

      Reviewer #1 (Recommendations For The Authors):

      (1) The significance values greater than p < 0.05 do not add anything and distract focus from the results that are meaningful. Fig. 5 is a good example. What does p = 0.7 mean? Or p = 0.6? Does this help the reader with useful information?

      I thank Reviewer 1 for raising this question. We have attempted different versions of how we report p values, as we want to make sure to address rigor and transparency in reporting data. As corresponding author, I favor reporting p values for all statistical comparisons. To help the reader identify what we considered statistically significant, we color coded the p values, with red for p-value<0.05 and black for p-value>0.05. As a reader, seeing a p-value=0.7 allows me to know that the authors performed an analysis comparing these conditions and found the means not to be different. Not presenting the p-value makes me wonder whether the authors even analyzed those groups. In other words, I value the ability to analyze the data seeing all p-values more than not being distracted by non-significant p-values. This is just my preference.

      (2) Fig. 1 is not informative and should be removed.

      I thank Reviewer 1 for the suggestion. In previous drafts of the manuscript, this figure was included only as a panel. However, we decided it was better to guide the reader into the scope of our work. This is part of our scientific style and, therefore, we prefer to keep the figure.

      (3) The emphasis on a particular muscarinic agonist favored by many ion channel physiologists, oxotremorine, is not meaningful (lines 192, 198). The important point is stimulation of muscarinic AChRs, which physiologically are stimulated by acetylcholine. The particular muscarinic agonist used is unimportant. Unless mandated by eLife, "cholinergic type 1 muscarinic receptors" are usually referred to as M1 mAChRs, or even better is "Gq-coupled M1 mAChRs." I don't think that Kruse and Whitten, 2021 were the first to demonstrate the increase in excitability of sympathetic neurons from stimulation of M1 mAChRs. Please try and cite in a more scholarly fashion.

      A) I have modified lines 192 and 198, removing mention of oxotremorine.

      B) I have modified the nomenclature used to refer to cholinergic type 1 muscarinic receptors.

      C) I cited references on the role of M current in sympathetic motor neuron excitability. I also removed the reference (Kruse and Whitten, 2021) that referred only to the temporal correlation between the decrease of KCNQ current and excitability.

      (4) The authors may want to use the term "M current" (after defining it) as the current produced by KCNQ2&3-containing channels in sympathetic neurons, and reserve "KCNQ" or "Kv7" currents as those made by cloned KCNQ/Kv7 channels in heterologous systems. A reason for this is to exclude currents from KCNQ1-containing channels, which most definitely do not contribute to the "KCNQ" current in these cells. I am not mandating this, but rather suggesting it to conform with the literature.

      Thank you for the suggestion. I have modified the text to use the term M current. I maintain the use of KCNQ only when referring to KCNQ channel, such as in the section describing the abundance of KCNQ2.

      (5) The section in the text on "Aging reduces KCNQ current" is confusing. Can the authors describe their results and their interpretation more directly?

      I am not sure I understand the request. I assumed points 5 and 6 are related and decided to answer under point 6.

      (6) Please explain the meaning of the increase in KCNQ2 abundance with age in Fig. 6G. How is this increase in KCNQ2 expression consistent with an increase in excitability? The explanation of "The decrease in KCNQ current and the increase in the abundance of KCNQ2 protein suggest a potential compensatory mechanism that occurs during aging, which we are actively investigating in an independent study." is rather odd, considering that the entire thesis of this paper is that changes in excitability and firing properties are underlain by changes in KCNQ2/3 channel expression/density. Suddenly, is this not the case?? What about KCNQ3? It would be very enlightening if the authors would just quantify the ratio of KCNQ2:KCNQ3 subunits in M-type channels in young and old mice using simple TEA dose/response curves (see Shapiro et al., JNS, 2000; Selyanko et al., J. Physiol., Hadley et al., Br. J. Pharm., 2001 and a great many more). It is also surprising that the authors did not assess or probe for differences in mAChR-induced suppression of M current between SCG neurons of young and old mice. This would seem to be a fundamental experiment in this line of inquiry.

      A. Please explain the meaning of the increase in KCNQ2 abundance with age in Fig. 6G. How is this increase in KCNQ2 expression consistent with an increase in excitability? The explanation of "The decrease in KCNQ current and the increase in the abundance of KCNQ2 protein suggest a potential compensatory mechanism that occurs during aging, which we are actively investigating in an independent study." is rather odd, considering that the entire thesis of this paper is that changes in excitability and firing properties are underlain by changes in KCNQ2/3 channel expression/density. Suddenly, is this not the case?? Our interpretation is that the decrease in M current is not caused by a decrease in the abundance of KCNQ2 channels. We do not claim that changes in excitability are underlain by a reduction in the expression or density of KCNQ2 channels. On the contrary, our working hypothesis is that the reduction in M current is caused by changes in traffic, degradation, posttranslational modifications, or cofactors for KCNQ2 or KCNQ3 channels. We have modified the description in the results section to clarify this concept.

      B. What about KCNQ3? Unfortunately, we did not find an antibody to detect KCNQ3 channels. I have added a sentence to state this.

      C. KCNQ2:KCNQ3 subunits in M-type channels in young and old mice using simple TEA dose/response curves. This is a great idea. Thank you for the suggestion. Is this a necessary experiment for the acceptance of this manuscript?

      D. It is also surprising that the authors did not assess or probe for differences in mAChR-induced suppression of M current between SCG neurons of young and old mice. This would seem to be a fundamental experiment in this line of inquiry. Reviewer 1 is correct. We did not assess for differences in the suppression of M current by mAChR activation. We do not see the connection of this experiment with the scope of the current investigation.

      (7) Why do the authors use linopirdine instead of XE-991? Both are dirty drugs hardly specific to KCNQ channels at 25 uM concentrations, but linopirdine less so. The Methods section lists the source of XE991 used in the study, not linopirdine. Is there an error?

      A. Why do the authors use linopirdine instead of XE-991? After validation of KCNQ2/3 inhibition by Linopirdine, we found the effect on membrane potential recordings to be reproducible. Linopirdine has also been reported to be reversible. We wanted to assess reversibility on the excitability of young neurons. We did not find the effect to be reversible. We performed experiments applying XE-991 while recording the membrane potential. XE-991 did not show a clear effect. I was not surprised by this. It is very likely that the pharmacological inhibition of one channel leads to the activation of other channel types. This is highlighted in the work by Kimm, Khaliq, and Bean, 2015. “Further experiments revealed that inhibiting either BK or Kv2 alone leads to recruitment of additional current through the other channel type during the action potential as a consequence of changes in spike shape.” In fact, it was quite remarkable that the aged and young phenotypes were mimicked by targeting KCNQ pharmacologically.

      B. Both are dirty drugs hardly specific to KCNQ channels at 25 uM concentrations, but linopirdine less so. I have added a sentence to point out that linopirdine is less potent than XE-991. It reads: “We want to point out that linopirdine is less potent than XE-991 and that it has been reported to activate TRPV1 channels (Neacsu and Babes, 2010). Despite this limitation, the application of linopirdine to young sympathetic motor neurons led to depolarization and firing of action potentials.”

      C. The Methods section lists the source of XE991 used in the study, not linopirdine. Is there an error? Thank you for pointing this out. I have added information for both retigabine and linopirdine in the Methods section; both were missing.

      (8) Can the authors use a more scientific explanation of RTG action than "activating KCNQ channels?" For instance, RTG induces both a negative shift in the voltage-dependence of activation and a voltage-independent increase in the open probability, both of which differ in detail between KCNQ2 and KCNQ3 subunits. The authors are free to use these exact words. Thus, the degree of "activation" is very dependent upon voltage at any voltages negative to the saturating voltages for channel activation.

      I have modified the text to reflect your suggestion.
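
      The voltage dependence described in point (8) can be illustrated with a minimal sketch. It assumes a single Boltzmann activation curve and models only the negative shift in half-activation voltage, not the voltage-independent increase in open probability. The half-activation voltages, the 20 mV shift, and the slope factor are hypothetical values chosen for illustration; they are not measurements from the manuscript.

```python
import math


def boltzmann_open_fraction(v_mv, v_half_mv, slope_mv):
    """Fraction of maximal activation at voltage v_mv for a Boltzmann curve."""
    return 1.0 / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))


# Hypothetical illustration values (assumptions, not data from the study).
V_HALF_CONTROL = -30.0  # mV, assumed half-activation voltage without drug
V_HALF_SHIFTED = -50.0  # mV, assumed 20 mV negative shift with drug
SLOPE = 10.0            # mV, assumed slope factor


def fold_increase(v_mv):
    """Ratio of shifted to control activation at a given voltage."""
    control = boltzmann_open_fraction(v_mv, V_HALF_CONTROL, SLOPE)
    shifted = boltzmann_open_fraction(v_mv, V_HALF_SHIFTED, SLOPE)
    return shifted / control


if __name__ == "__main__":
    for v in (-70.0, -50.0, -30.0, 0.0, 30.0):
        print(f"{v:6.1f} mV: fold increase = {fold_increase(v):.2f}")
```

      Under these assumptions the fold increase is largest at negative voltages and approaches 1 near saturating voltages, which is the sense in which the degree of "activation" depends on voltage.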

      (9) Methods: did the authors really use "poly-l-lysine-coated coverslips?" Almost all investigators use poly-D-lysine as a coating for mammalian tissue-culture cells and more substantial coatings such as poly-D-lysine + laminin or rat-tail collagen for peripheral neurons, to allow firm attachment to the coverslip.

      That is correct. We used poly-L-lysine-coated coverslips. Sympathetic motor neurons do not adhere to poly-D-lysine.

      (10) As a suggestion, sampling M-type/KCNQ/Kv7 current at 2 kHz is not advised, as this is far faster than the gating kinetics of the channels. Were the signals filtered?

      That is correct. Currents were sampled at 2 kHz, and data were low-pass filtered at 3 kHz. Our conditions are not far from what is reported by others. Some sample at 10 kHz and even 50 kHz. Others do not report the sampling frequency.

      Reviewer #2:

      Weaknesses:

      None, the revised version of the manuscript has addressed all my concerns.

      I am glad we were able to satisfy previous concerns.

      Reviewer #3:

      The main weakness is that this study is a descriptive tabulation of changes in the electrophysiology of neurons in culture, and the effects shown are correlative rather than establishing causality.

      Allow me to clarify our previous responses and determine how this aligns with your concerns. In the previous revision, Reviewer 3 wrote: “It is difficult to know from the data presented whether the changes in KCNQ channels are in fact directly responsible for the observed changes in membrane excitability,” and suggested the “use of blockers and activators to provide greater relevance.” I assumed these comments were the main concern and that doing such experiments was enough to satisfy the criticism. It is discouraging to see that our experiments did not resolve the reviewer's concern that the findings are correlative.

      If Reviewer 3 is referring to establishing causality between aging and a reduction in M current, I would like to emphasize that such an endeavor is complicated, as there is no clear experiment to solve that issue. Our best attempt was to reverse aging with rapamycin, but the recommendation was to remove those experiments.

      … but the specifics of the effects and relevance to intact preparations are unclear. Additional experiments in slice cultures would provide greater significance on the potential relevance of the findings for intact preparations.

      I apologize for missing this point in the previous revision. The proposed experiments would require an upright microscope coupled to an electrophysiology rig. Unfortunately, I do not have the equipment to do these experiments.

      Summary of recommendations from the three reviewers:

      Please make corrections as suggested by reviewer 1 to improve the manuscript. Specifically, reviewer 1 suggests making changes to p values in Figure 5,

      It is not clear what the suggested changes are. The comment from Reviewer 1 says: The significance values greater than p < 0.05 do not add anything and distract focus from the results that are meaningful. If the suggested change is to remove p values > 0.05, I have explained my rationale for keeping those values. If the Journal has a specific format on how to report p-values, I will be happy to make appropriate changes.

      and the importance of citing original scholarly works related to effects of increase in excitability of sympathetic neurons by M1 receptors, and the terminology for M currents and KCNQ currents. These changes will improve the manuscript and are strongly recommended.

      I cited original papers on that area, and changed the terminology for M current. I kept KCNQ when referring to the channel protein or abundance.

      The section dealing with Aging Reduces KCNQ currents seems to contain a lot of extraneous information especially in the last part of the long paragraph and this section should be rewritten for improved clarity… and - the implications or lack thereof - of the correlation of KCNQ with AP firing rates.

      A. I removed extraneous information in that section. It now reads: Previous work by our group and others demonstrated that cholinergic stimulation leads to a decrease in M current and increases the excitability of sympathetic motor neurons at young ages \cite{RN67,RN68,RN69,RN71, RN72, RN73, RN74, RN75}. The molecular determinants of the M current are channels formed by KCNQ2 and KCNQ3 in these neurons \cite{RN76, RN77, RN70}. Thus, Figure 6A shows a voltage response (measured in current-clamp mode) and a consecutive M current recording (measured in voltage-clamp mode) in the same neuron upon stimulation of cholinergic type 1 muscarinic receptors. It illustrates the temporal correlation between the decrease of M current with the increase in excitability and firing of APs upon activation with oxotremorine. This strong dependence led us to hypothesize that aging decreases M current, leading to a depolarized RMP and hyperexcitability (Figure 6B). For these experiments, we measured the RMP and evoked activity using perforated patch, followed by the amplitude of M current using a whole-cell voltage clamp in the same cell. We also measured the membrane capacitance as a proxy for cell size. Interestingly, M current density was smaller by 29% in middle age (7.5 ± 0.7 pA/pF) and by 55% in old (4.8 ± 0.7 pA/pF) compared to young (10.6 ± 1.5 pA/pF) neurons (Figure 6C-D). The average capacitance was similar in young (30.8 ± 2.2 pF), middle-aged (27.4 ± 1.2 pF), and old (28.8 ± 2.3 pF) neurons (Figure 6E), suggesting that aging is not associated with changes in cell size of sympathetic motor neurons, and supporting the hypothesis that aging alters the levels of M current. Next, we tested the effect on the abundance of the channels mediating M current. Contrary to our expectation, we observed that KCNQ2 protein levels were 1.5 ± 0.1-fold higher in old compared to young neurons (Figure 6F-G). Unfortunately, we did not find an antibody to consistently detect KCNQ3 channels. We concluded that the decrease in M current is not caused by a decrease in the abundance of KCNQ2 protein.
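
      As a quick arithmetic check, the percentage reductions quoted in the revised paragraph follow from the mean current densities it reports. The sketch below uses only those mean values (10.6, 7.5, and 4.8 pA/pF); the function name is illustrative, and the calculation ignores the reported error terms.

```python
def percent_reduction(reference, value):
    """Percentage by which `value` is smaller than `reference`."""
    return 100.0 * (reference - value) / reference


# Mean M current densities (pA/pF) as quoted in the response text.
YOUNG = 10.6
MIDDLE_AGE = 7.5
OLD = 4.8

if __name__ == "__main__":
    print(f"middle age vs young: {percent_reduction(YOUNG, MIDDLE_AGE):.1f}%")
    print(f"old vs young:        {percent_reduction(YOUNG, OLD):.1f}%")
```

      Rounded to whole numbers, the two results match the 29% and 55% stated in the text.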

      B. and - the implications or lack thereof - of the correlation of KCNQ with AP firing rates. I am not sure I understand the request regarding the section on the correlation of KCNQ with AP firing rate. I divided the long paragraph.

      The apparent lack of correlation between KCNQ current and KCNQ2 protein needs to be better explained. This is a central part of the study and this result undercuts the premise of the paper.

      Indeed, total KCNQ2 protein abundance increases while M current decreases. We do not claim in our work that changes in excitability are caused by a reduction in the expression or density of KCNQ2 channels. On the contrary, our current working hypothesis is that the reduction in M current is caused by changes in traffic, degradation, posttranslational modifications, or cofactors for KCNQ2 or KCNQ3 channels. I have modified the description in the results section and discussion to clarify this concept.

      Additionally, the poor specificity of linopirdine for KCNQ should be pointed out in the limitations.

      I pointed out this limitation. It reads: We want to point out that linopirdine is less potent than XE-991 and that it has been reported to activate TRPV1 channels (Neacsu and Babes, 2010). Despite this limitation, the application of linopirdine to young sympathetic motor neurons led to depolarization and firing of action potentials.

      Finally, the editor notes that the author response should not contain ambiguities in what was addressed in the revision. In the original summary of consolidated revisions that were requested, one clearly and separately stated point (point 4) was that experiments in slice cultures should be strongly considered to extend the significance of the work to an intact brain preparation. The author response letter seems to imply that this was done, but this is not the case. The author response seems to have combined this point with another separate point (point 3) about using KCNQ drugs, and imply that all concerns were addressed. Authors should be clear about what revisions were in fact addressed.

      As corresponding author, and the person directly responsible for the document provided as the reply to the reviewers, I apologize for my mistake. After reviewing this comment, I realized I did not respond to the Major points in the section of the Recommendations for the authors from Reviewer 3. I missed that entire section. My previous responses addressed the Public review of Reviewer 3. When doing so, I did not separate the sentences, omitting the request to perform the experiment in slices.


      The following is the authors’ response to the original reviews.

      Reviewer #1

      Summary:

      The authors study age-related changes in the excitability and firing properties of sympathetic neurons, which they ascribe to age-related changes in the expression of KCNQ (Kv7, "M-type") K+ currents in rodent sympathetic neurons, whose regulation by GPCRs has been most thoroughly studied for over 40 years. The authors suggest the ingestion of rapamycin may partially reverse the age-related decrease in M-channel expression. With the rapamycin part included, it is unclear how this work will impact the field of age-related neuronal dysfunction, as the mechanistic information is not strong.

      Strengths:

      The strengths include the rigor of the current-clamp and voltage-clamp experiments, the lovely, crisp presentation of the data, and the expert statistics. The separation of neurons into tonic, phasic, and adapting classes is also interesting, and informative. The writing is also elegant, and crisp. The above is especially true of the manuscript up until the part dealing with the effects of rapamycin, which becomes less compelling.

      We appreciate the thoughtful comments and constructive feedback to improve the impact of the manuscript.

      Weaknesses:

      Where the manuscript becomes less compelling is in the rapamycin section, which does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe a signaling mechanism to it that is supported by data. Thus, this latter part rather undermines the overall impact and central advance of the manuscript. The problem is exacerbated by the controversial and anecdotal nature of the entire mTor/aging field, some of whose findings have very unfortunately had to be recently retracted.

      I would strongly recommend to the authors that they end the manuscript with their analysis of the role of M current/KCNQ channels in the numerous age-related changes in sympathetic neuron function that they elegantly report, and save the rapamycin, and possible mTor action, for a separate line of inquiry that the authors could develop in a more thorough and scholarly way.

      We agree with the reviewer in that we cannot ascribe a signaling mechanism to the reversibility observed with rapamycin. Therefore, we are following the recommendation of the reviewer and have removed the rapamycin section.

      We want to emphasize that, in the aging field, any advancement in the knowledge of how drugs such as rapamycin reverse age-associated phenotypes is of crucial importance. These drugs, commonly referred to as aging interventions, include rapamycin, calorie restriction, elamipretide, and metformin. We could have used any of these interventions. And yet, the cellular and molecular mechanisms for each one of these anti-aging drugs are unknown.

      We want to note that, although the nature of the mTOR field is controversial, the effect of rapamycin in extending lifespan and improving health is not. At least these authors have not been able to find retracted papers on that subject or notices from the NIA alerting on this issue. We kindly request the reviewer to provide the references related to rapamycin that were retracted so we can evaluate how that affects the rigor of the premise for our future work.

      As authors, we also find it important to note that we are confident of our observations regarding the effect of rapamycin, and that we are not removing this section because we are retracting our claims. We will use these data to continue our research of the mechanism behind the effect of aging on sympathetic motor neurons.

      Reviewer #2:

      Summary:

      This research shows compelling and detailed evidence showing that aging influences intrinsic membrane properties of peripheral sympathetic motor neurons such that they become more excitable. Furthermore, the authors present convincing evidence that the oral administration of the anti-aging drug Rapamycin partially reversed hyperexcitability in aged neurons. This study also investigates the molecular mechanisms underlying age-associated hyperexcitability in mouse sympathetic motor neurons. In that regard, the authors found an age-associated reduction of an outward current having properties similar to KCNQ2/Q3 potassium current. They suggested a reduction of KCNQ2/Q3 current density in aged neurons as a potential mechanism behind their overactivity.

      Strengths:

      Detailed and rigorous analysis of electrical responses of peripheral sympathetic motor neurons using electrophysiology (perforated patch and whole-cell recordings). Most of the conclusions of this paper are well supported by the data.

      We thank the reviewer for valuing our effort to present a detailed and rigorous analysis.

      Weaknesses:

      (1) The identity of the age-associated reduced current as KCNQ2/Q3 is not corroborated by pharmacology (blocking the current with the specific blocker XE-991).

      We have performed experiments using blockers of KCNQ channels. See responses below.

      (2) The manuscript does not include a direct test of the reduction of KCNQ current as the mechanism behind age-induced hyperexcitability.

      Thank you for raising this point. We have performed experiments blocking KCNQ channels with linopirdine in young neurons and found that the pharmacological reduction of KCNQ current was enough to depolarize the cell and, in some cases, elicit the firing of action potentials. We present the results in a new figure. We also added the description in the Results section.

      Reviewer #3:

      This is a descriptive study of membrane excitability and Na+ and K+ current amplitudes of sympathetic motor neurons in culture. The main findings of the study are that neurons isolated from aged animals show increased membrane excitability manifested as increased firing rates in response to electrical stimulation and changes in related membrane properties including depolarized resting membrane potential, increased rheobase, and spontaneous firing. By contrast, neuron cultures from young mice show little to no spontaneous firing and relatively low firing rates in response to current injection. These changes in excitability correlate with significant reductions in the magnitude of KCNQ currents in aged neurons compared to young neurons. Treating cultures with the immunosuppressive drug, rapamycin, which has known antiaging effects in model animals appears to reverse the firing rates in aged neurons and enhance KCNQ current. The authors conclude that aging promotes hyperexcitability of sympathetic motor neurons.

      The electrophysiological cataloging of the neuronal properties is generally well done, and the experiments are performed using perforated patch recordings which preserve the internal constituents of neurons, providing confidence that the effects seen are not due to washout of regulators from the cells.

      The main weakness is that this study is a descriptive tabulation of changes in the electrophysiology of neurons in culture, and the effects shown are correlative rather than establishing causality. It is difficult to know from the data presented whether the changes in KCNQ channels are in fact directly responsible for the observed changes in membrane excitability.

      We appreciate the constructive criticism. In an attempt to assess whether changes in KCNQ are in fact directly responsible for the changes in membrane excitability, we have performed experiments blocking KCNQ channels with Linopirdine in young neurons and found that the pharmacological reduction of KCNQ current was enough to depolarize the cell and, in some cases, elicit the firing of action potentials. Conversely, we activated KCNQ channels in old neurons with retigabine and found that the pharmacological activation was enough to hyperpolarize the membrane potential and stop the firing of action potentials. This effect was reversible. These two experiments provide solid evidence for our statement that age-associated reduction of KCNQ activity is responsible for the hyperexcited state in sympathetic motor neurons. We present the results in a new figure (Figure 8). We also added the description in the Results section.

      Furthermore, a notable omission seems to be the analysis of Ca2+ currents which have been widely linked to alterations in membrane properties in aging.

      We thank the reviewer for the comment. We did omit to include data on our studies of calcium currents. We agree that the study of the effect of calcium currents is relevant as it can influence the afterhyperpolarization. Furthermore, we believe that potential effects on calcium currents need to be studied in relation to other physiological processes that depend on calcium, including excitation-transcription coupling, calcium handling, and neurotransmitter release. Adding this information to this manuscript would only contribute to the tabulation of effects that we observe in sympathetic motor neurons with aging. As our main goal was to determine the ion channels responsible for the hyperexcited state, voltage-gated calcium channels or other calcium sources could have reflected a more indirect mechanism as compared to changes in sodium or potassium currents. We will continue our investigation on calcium currents and report our observations in the future, but for now, we have decided to leave it out of this work.

      As well, additional experiments in slice cultures would provide greater significance on the potential relevance of the findings for intact preparations. Finally, experiments using KCNQ blockers and activators could provide greater relevance that the observed changes in KCNQ are indeed connected to changes in membrane excitability.

      We are happy to report that we have performed these experiments and that the results strengthen the conclusion that changes in KCNQ are connected to changes in membrane excitability.

      Recommendations for the authors:

      We recommend the following essential revisions summarized from the reviews:

      (1) Is the change in KCNQ current responsible for the altered membrane excitability? What happens to membrane excitability when KCNQ is partially blocked (see reviewer 2 comment below)? Conversely, what happens to the excitability of aged neurons if KCNQ is activated (e.g., with retigabine)? (see reviewer 3 comment below). Results of these important experiments are needed to support the argument that KCNQ underlies the alterations in firing and membrane excitability.

      We have responded to this point. Thank you for the suggested experiments. In summary, the new experiments show that blocking KCNQ channels in young neurons leads to depolarization and, in some cases, the firing of action potentials. Conversely, the activation of KCNQ channels in aged neurons leads to hyperpolarization and a cessation of firing. We have added a new figure and reported the results in the Results section.

      (2) Rapamycin experiments are underdeveloped and weak. These should be further developed by examining the effects of KCNQ blockers to see if their effects on membrane excitability are reversed. Also, see comment 2 from reviewer 1.

      We have followed the recommendation by reviewer 1 and removed the section on rapamycin.

      (3) The study should examine voltage-gated calcium currents to determine potential changes in these currents with aging. See reviewer 3 comments.

      We thank the reviewer for the comment. We performed preliminary experiments and found that aging impacts calcium currents. However, we omitted to include the data. In our opinion, the changes in calcium currents are outside the scope of this work, as the changes could be related to physiological processes that go beyond the control of firing. Effects on calcium currents need to be studied in relation to other physiological processes that depend on calcium, including excitation-transcription coupling, calcium handling, and neurotransmitter release. The study of the relationship between changes in calcium currents and those physiological processes would require multiple experiments and detailed analysis. We will continue our investigation on calcium currents and report our observations in the future, but for now, we have decided to leave it out of this work.

      We have also made the suggested edits to the Figures and Legends.

      (2) In Fig.4 panel H, Y-axis must be # AP at 100 pA.

      We corrected the axis in Figure 4H.

      (3) In Legend Fig. 5, the number of cells for each subpopulation (n) needs to be corrected. In plots F-I, n= 9, 7, and 3 seem to be the number of adapting cells for 12-, 64- and 115w-old, respectively, instead of the number of single, phasic, and old cells for 12-week-old mice. A similar correction seems to be needed for 64-week-old and 115-week-old.

      We corrected the n number in Figure 5.

      (4) In Figure 6 panel C, it would be helpful for a reader to align the voltage protocol depicted with the current shown.

      We have aligned the voltage protocol to the current traces.

      (5) In the legend of Figure 7, the description of panel A ends with "Magnitude of voltage step to elicit each trace is shown in black", however in panel A there is no voltage depiction. In the description of panel D, "N = X animals, n=x cells" must be corrected.

      We have modified the legend to clarify. It now reads: “Text at the right of each current trace corresponds to the voltage used to elicit that current.”

      New Figure 8

      Author response image 1.

      Pharmacological inhibition and activation of KCNQ channels mimic the age-dependent phenotype. A. Membrane potential recordings from two young neurons treated with 25 μM linopirdine during the time illustrated by the light gray box. No holding current was applied. B. Left: Summary of the resting membrane potential measured before (light orange) and after (dark orange) the application of linopirdine. Right: Summary of the depolarization produced by linopirdine calculated by subtracting the post-drug voltage from the pre-drug voltage (V). Data points are from N = 2 animals, n = 8 cells, 14-week-old mice. C. Membrane potential recordings from two aged neurons treated with 10 μM retigabine during the time illustrated by the light gray box. No holding current was applied. D. Left: Summary of the resting membrane potential measured before (light purple) and after (dark purple) the application of retigabine. Right: Summary of the hyperpolarization produced by retigabine calculated by subtracting the post-drug voltage from the pre-drug voltage (V). Data points are from N = 2 animals, n = 7 cells, 120-week-old mice. P-values are shown at the top of the graphs.

    2. Reviewer #3 (Public Review):

      This study described changes in membrane excitability and Na+ and K+ current amplitudes of sympathetic motor neurons in culture. The findings indicate that neurons isolated from aged animals show increased membrane excitability manifested as increased firing rates in response to electrical stimulation and changes in related membrane properties including depolarized resting membrane potential, increased rheobase, and spontaneous firing. By contrast, neuron cultures from young mice show little to no spontaneous firing and relatively low firing rates in response to current injection. These changes in excitability correlate with reductions in the magnitude of KCNQ currents in neurons cultured from aged mice compared to neurons cultured from young mice. The authors conclude that aging promotes hyperexcitability of sympathetic motor neurons through changes in KCNQ channels.

      The electrophysiological cataloging of the neuronal properties is well done, and the experiments are performed using perforated patch recordings which preserve the internal constituents of neurons, providing confidence that the effects seen are not due to washout of regulators from the cells. The main weakness is that this study is a descriptive tabulation of changes in the electrophysiology of neurons in culture, and the effects shown are correlative rather than establishing causality. Pharmacological support is provided indicating that blockade or enhancement of KCNQ reverses the changes in excitability, but the specifics of the effects and relevance to intact preparations are unclear. Additional experiments in slice cultures would provide greater significance on the potential relevance of the findings for intact preparations.

    3. eLife assessment

      This study presents valuable observations indicating that the excitability of cultured sympathetic motor neurons increases in neurons cultured from aged mice, and is inversely correlated with the amplitude of KCNQ currents. The alterations in membrane excitability are relevant for aging-related changes in neuronal membrane properties. While the study documents interesting changes in membrane excitability in cultured neurons with aging, the mechanisms underlying these changes are not clear and physiological relevance of the results to the intact circuits is incomplete.

    4. Reviewer #1 (Public Review):

      Summary:

      The authors study age-related changes in the excitability and firing properties of sympathetic neurons, which they ascribe to age-related changes in the expression of KCNQ (Kv7, "M-type") K+ currents in rodent sympathetic neurons, whose regulation by GPCRs has been most thoroughly studied for over 40 years.

      Strengths:

      The strengths include the rigor of the current-clamp and voltage-clamp experiments and the lovely, crisp presentation of the data. The separation of neurons into tonic, phasic and adapting classes is also interesting and informative. The ability to successfully isolate and dissociate peripheral ganglia from such older animals is also quite rare and commendable! There is much useful detail here.

      Weaknesses:

      Whereas the description of the data is very nice and useful, the manuscript does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe changes in signaling mechanisms, such as that of M1 mAChRs, to the phenomenon in a way that is supported by data.

    5. Reviewer #2 (Public Review):

      Summary:

      This research shows compelling and detailed evidence showing that aging influences intrinsic membrane properties of peripheral sympathetic motor neurons, which become hyperexcitable. The authors found that sympathetic motor neurons from old mice exhibit increased firing rates (spontaneous and evoked), more depolarized membrane resting potential, and increased rheobase. Furthermore, the study investigates cellular mechanisms underlying age-associated hyperexcitability and shows solid evidence supporting that a decreased activity of KCNQ channels during aging is a major contributor to the increased excitability of sympathetic old neurons. All conclusions of this paper are well supported by the data.

      Strengths:

      Detailed and rigorous analysis of electrical responses of peripheral sympathetic motor neurons using electrophysiology (perforated patch and whole-cell recordings). The study identifies a decrease in KCNQ current as a cellular mechanism behind age-induced hyperexcitability in sympathetic motor neurons.

      Weaknesses:

      None, the revised version of the manuscript has addressed all my concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important paper, Blin and colleagues develop a high-throughput behavioral assay to test spontaneous swimming and olfactory preference in individual Mexican cavefish larvae. The authors present compelling evidence that the surface and cave morphs of the fish show different olfactory preferences and odor sensitivities and that individual fish show substantial variability in their spontaneous activity that is relevant for olfactory behaviour. The paper will be of interest to neurobiologists working on the evolution of behaviour, olfaction, and the individuality of behaviour.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors posed a research question about how an animal integrates sensory information to optimize its behavioral outputs and how this process evolved. Their data (behavioral output analysis with detailed categories in response to the different odors in different concentrations by comparing surface and cave populations and their hybrid) partially answer this tough question. They built a new low-disturbance system to answer the question. They also found that the personality of individual fish is a good predictor of behavioral outputs against odor response. They concluded that cavefish evolved to specialize their response to alanine and histidine while surface fish are more general responders, which was supported by their data.

      Strengths:

      With their new system, the authors could generate clearer results without mechanical disturbances. The authors characterize multiple measurements to score the odor response behaviors, and also brought a new personality analysis. Their conclusion that cavefish evolved as a specialist to sense alanine and histidine among 6 tested amino acids was well supported by their data.

      Weaknesses:

      The authors posed a big research question: How do animals evolve the processes of sensory integration to optimize their behavioral outputs? I personally feel that, to answer the questions about how sensory integration generates proper (evolved) behavior, the authors at least need to show the ecological relevance of their response. For the alanine/histidine preference in cavefish, they need data for the alanine and other amino acid concentrations in the local cave water and compare them with those of surface water.

      We agree with the reviewer. This is why, in the Discussion section, we had written: “…Such significant variations in odor preferences or value may be adaptive and relate to the differences in the environmental and ecological conditions in which these different animals live. However, the reason why Pachón cavefish have become “alanine specialists” remains a mystery and prompts analysis of the chemical ecology of their natural habitat. Of note, we have not found an odor that would be repulsive for Astyanax so far, and this may relate to their opportunist, omnivorous and detritivore regime (Espinasa et al., 2017; Marandel et al., 2020).” This is also why we currently develop field work projects aimed at clarifying this question. However, such experiments and analyses are challenging, practically and technically. We hope we can reach some conclusions in the future.

      To complete the discussion we have also added an important hypothesis: “Alternatively, specialization for alanine may not need to be specific for an olfactory cue present only, or frequently, or in high amounts in caves. Bat guano for example, which is probably the main source of food in the Pachón cave, must contain many amino acids. Enhanced recognition of one of them - in the present case alanine but evolution may have randomly acted for enhanced recognition of another amino acid – should suffice to confer cavefish with augmented sensitivity to their main source of nutriment.”

      Also, as for "personality matters", I read that personality explains a large variation in surface fish. Also, thigmotaxis or wall-following cavefish individuals appear to respond better to odorants than circling and random-swimming cavefish individuals. However, I failed to understand the authors' point about what percentage of the odorant-response variation is explained (PVE) by personality. The association (= correlation) the authors presented was good to show, but showing a proper PVE or the effect size of personality in predicting the behavioral outputs is important to conclude that "personality matters"; otherwise, the conclusion is not well supported.

      From the above, I recommend the authors reconsider the title and also their research questions. At this moment, I feel that the authors' conclusions and their research questions are a little too exaggerated, with insufficient supporting evidence.

      Thank you for this interesting suggestion, which we have fully taken into consideration. We have therefore now calculated and plotted PVE (the percentage of variation explained on the olfactory score) as a function of swimming speed or as a function of swimming pattern. The results are shown in modified Figure 8 of our revised ms and they suggest that the personality (here, swimming patterns or swimming speed) indeed predicts the olfactory response skills. Therefore, we would like to keep our title as we provide support for the fact that “personality matters”.
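      The response does not state how PVE was computed, so the following is only a minimal sketch under stated assumptions: for a continuous predictor such as swimming speed, PVE is taken as the R² of a simple linear regression, and for a categorical predictor such as swimming pattern, it is taken as eta-squared (between-group sum of squares over total sum of squares). The function names, the pattern labels and the numbers are hypothetical and are not the authors' data or code.

```python
def pve_continuous(x, y):
    """PVE (%) of y by a continuous predictor x: R^2 of a simple linear fit.

    For one predictor, R^2 equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 100.0 * sxy ** 2 / (sxx * syy)


def pve_categorical(labels, y):
    """PVE (%) of y by a categorical predictor: eta-squared."""
    grand_mean = sum(y) / len(y)
    groups = {}
    for label, value in zip(labels, y):
        groups.setdefault(label, []).append(value)
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    ss_total = sum((value - grand_mean) ** 2 for value in y)
    return 100.0 * ss_between / ss_total


# Hypothetical example: olfactory scores for fish with two swimming patterns.
patterns = ["WF", "WF", "R", "R"]   # illustrative labels only
scores = [1.0, 1.0, 3.0, 3.0]
print(pve_categorical(patterns, scores))
```

      With either definition, a value near 0% would mean the baseline behavior tells us little about the olfactory score, whereas a value near 100% would mean it accounts for almost all of the variation.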

      Also, for the statistical method, Fisher's exact test is not appropriate for the compositional data (such as Figure 2B). The authors may quickly check it at https://en.wikipedia.org/wiki/Compositional_data or https://www.annualreviews.org/doi/pdf/10.1146/annurev-statistics-042720-124436.

      The authors may want to use centered log transformation or other appropriate transformations (an R package could be: https://doi.org/10.1016/j.cageo.2006.11.017). Depending on the statistical test used, the authors' conclusion may not be supported.

      Actually, in most cases, the distributions are so different (as seen by the completely different colors in the distribution graphs) that there is little doubt that swimming behaviors are indeed different between surface and cavefish, or between ‘before’ and ‘after’ odor stimulation. However, it is true that Fisher’s exact test is not fully appropriate because the data can be considered compositional. For this kind of data, centered log transformation has been suggested. However, our dataset contains many zeros, and this is a case that log transformations have difficulty handling.
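      The difficulty with zeros can be illustrated with a minimal sketch of the centered log-ratio (CLR) transform, which takes the logarithm of each part relative to the geometric mean of the composition. The function name and the example proportions are hypothetical; they only show why a composition containing a zero cannot be transformed without first replacing or otherwise handling that zero.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs.

    Equivalent to log(part / geometric_mean). Undefined if any part is zero,
    because log(0) diverges.
    """
    if any(part <= 0 for part in composition):
        raise ValueError("CLR is undefined when a part is zero (log(0))")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical proportions of three swimming patterns for one group of fish.
print(clr([0.2, 0.3, 0.5]))      # works: the CLR values sum to zero

try:
    clr([0.5, 0.5, 0.0])         # one pattern never observed
except ValueError as err:
    print(err)
```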

      To help us deal with our data, the reviewer proposed to consider the paper by Greenacre (2021) (https://www.annualreviews.org/doi/pdf/10.1146/annurev-statistics-042720-124436). In his paper, Greenacre clearly wrote: "Zeros in compositional data are the Achilles heel of the logratio approach (LRA)."

      Therefore, we have now tested our data using CA (Correspondence Analysis), which can deal with tables containing many zeros and is a trustworthy alternative to LRA (Cook-Thibeau, 2021; Greenacre, 2011).

      The results of CA analysis are shown in Supplemental figure 8 and they fully confirm the difference in baseline swimming patterns between morphs as well as changes (or absence of changes) in behavioral patterns after odor stimulation suggested by the colored bar plots in main figures, with confidence ellipses overlapping or not overlapping, depending on cases. Therefore, the CA method fully confirms and even strengthens our initial interpretations.

      Finally, we have kept our initial graphical representation in the ms (color-coded bar plots; the complete color code is now given in Suppl. Fig7), and CA results are shown in Suppl. Figure 8 and added in text.
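      As a rough illustration of why CA tolerates zero cells, the sketch below computes only the first step of a correspondence analysis: the matrix of standardized residuals of a contingency table and its total inertia (the chi-square statistic divided by the grand total). A full CA would then apply a singular value decomposition to this matrix to obtain the row and column coordinates. That step is omitted here, and the function name and the counts are hypothetical rather than taken from the authors' analysis.

```python
import math


def ca_standardized_residuals(table):
    """Standardized residuals and total inertia of a contingency table.

    Requires every row and column total to be non-zero; individual cells
    may be zero, since no logarithm is taken.
    """
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    row_mass = [sum(row) / total for row in table]
    col_mass = [sum(row[j] for row in table) / total for j in range(n_cols)]
    residuals = [
        [
            (table[i][j] / total - row_mass[i] * col_mass[j])
            / math.sqrt(row_mass[i] * col_mass[j])
            for j in range(n_cols)
        ]
        for i in range(len(table))
    ]
    inertia = sum(value ** 2 for row in residuals for value in row)
    return residuals, inertia


# Hypothetical counts: rows are two groups of fish, columns two swimming patterns.
counts = [[10, 0],
          [0, 10]]
residuals, inertia = ca_standardized_residuals(counts)
print(residuals, inertia)
```

      In this toy table the two groups never share a pattern, so the inertia takes its maximum for a 2 x 2 table; a table whose rows have identical profiles would give an inertia of zero.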

      Reviewer #2 (Public Review):

      In their submitted manuscript, Blin et al. describe differences in the olfactory-driven behaviors of river-dwelling surface forms and cave-dwelling blind forms of the Mexican tetra, Astyanax mexicanus. They provide a dataset of unprecedented detail, that compares not only the behaviors of the two morphs but also that of a significant number of F2 hybrids, therefore also demonstrating that many of the differences observed between the two populations have a clear (and probably relatively simple) genetic underpinning.

      To complete the monumental task of behaviorally testing 425 six-week-old Astyanax larvae, the authors created a setup that allows for the simultaneous behavioral monitoring of multiple larvae and the infusion of different odorants without introducing physical perturbations into the system, thus biasing the responses of cavefish that are particularly fine-tuned for this sensory modality. During the optimization of their protocol, the authors also found that for cave-dwelling forms one hour of habituation was insufficient and a full 24 hours were necessary to allow them to revert to their natural behavior. It is also noteworthy that this extremely large dataset can help us see that population averages of different morphs can mask quite significant variations in individual behaviors.

      Testing with different amino-acids (applied as relevant food-related odorant cues) shows that cavefish are alanine- and histidine-specialists, while surface fish elicit the strongest behavioral responses to cysteine. It is interesting that the two forms also react differently after odor detection: while cave-dwelling fish decrease their locomotory activity, surface fish increase it. These differences are probably related to different foraging strategies used by the two populations, although, as the observations were made in the dark, it would be also interesting to see if surface fish elicit the same changes in light as well.

      Thank you for these nice comments.

      Further work will be needed to pinpoint the exact nature of the genetic changes that underlie the differences between the two forms. Such experimental work will also reveal how natural selection acted on existing behavioral variations already present in the SF population.

      Yes. Searching for genetic underpinnings of the sensory-driven behavioral differences is our current endeavor through a QTL study and we should be able to report it in the near future.

      It will be equally interesting, however, to understand what lies behind the large individual variation of behaviors observed in both surface and cave populations. Are these differences purely genetic, or do environmental cues perhaps also contribute to their development? Does stochasticity provided by the developmental process also have a role in this? Answering these questions will reveal if the evolvability of Astyanax behavior was an important factor in the repeated successful colonization of underground caves.

      Yes. We will also address (at least partially) most of these questions in our current QTL study.

      Reviewer #3 (Public Review):

      Summary:

      The paper explores chemosensory behaviour in surface and cave morphs and F2 hybrids in the Mexican cavefish Astyanax mexicanus. The authors develop a new behavioural assay for the long-term imaging of individual fish in a parallel high-throughput setup. The authors first demonstrate that the different morphs show different basal exploratory swimming patterns and that these patterns are stable for individual fish. Next, the authors test the attraction of fish to various concentrations of alanine and other amino acids. They find that the cave morph is a lot more sensitive to chemicals and shows directional chemotaxis along a diffusion gradient of amino acids. For surface fish, although they can detect the chemicals, they do not show marked chemotaxis behaviour and have an overall lower sensitivity. These differences have been reported previously but the authors report longer-term observations on many individual fish of both morphs and their F2 hybrids. The data also indicate that the observed behavior is a quantitative genetic trait. The approach presented will allow the mapping of genes' contribution to these traits. The work will be of general interest to behavioural neuroscientists and those interested in olfactory behaviours and the individual variability in behavioural patterns.

      Strengths:

      A particular strength of this paper is the development of a new and improved setup for the behavioural imaging of individual fish for extended periods and under chemosensory stimulation. The authors show that cavefish need up to 24 h of habituation to display a behavioural pattern that is consistent and unlikely to be due to the stressed state of the animals. The setup also uses relatively large tanks that allow the build-up of chemical gradients that are apparently present for at least 30 min.

      The paper is well written, and the presentation of the data and the analyses are clear and to a high standard.

      Thank you for these nice comments.

      Weaknesses:

      One point that would benefit from some clarification or additional experiments is the diffusion of chemicals within the behavioural chamber. The behavioural data suggest that the chemical gradient is stable for up to 30 min, which is quite surprising. It would be great if the authors could quantify e.g. by the use of a dye the diffusion and stability of chemical gradients.

      OK. We had tested the diffusion of dyes in our previous setup and we also did in the present one (not shown). We think that, due to differences of molecular weight and hydrophobicity between the tested dyes and the amino acid molecules we are using, their diffusion does not constitute a proper read-out of actual amino acid diffusion. We anticipate that amino acid diffusion is extremely complex in the test box, possibly with odor plumes diffusing and evolving in non-gradient patterns, in the 3 dimensions of the box, and potentially further modified by the fish swimming through it, the flow coming from the opposite water injection side and the borders of the box. This is the reason why we have designed the assay with contrasting “odor side” and “water control side”. Moreover, our question here is not to determine the exact concentration of amino acid to which the fish respond, but to compare the responses in cavefish, surface fish and F2 hybrids. Finally and importantly, we have performed dose/response experiments whereby varying concentrations have been presented for 3 of the 6 amino acids tested, and these experiments clearly show a difference in the threshold of response of the different morphs.

      The paper starts with a statement that reflects a simplified input-output (sensory-motor) view of the organisation of nervous systems. "Their brains perceive the external world via their sensory systems, compute information and generate appropriate behavioral outputs." The authors' data also clearly show that this is a biased perspective. There is a lot of spontaneous organised activity even in fish that are not exposed to sensory stimulation. This sentence should be reworded, e.g. "The nervous system generates autonomous activity that is modified by sensory systems to adapt the behavioural pattern to the external world." or something along these lines.

      Done

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In addition to my comments in the "weakness" section above, here are my other comments.

      How many times fish were repeatedly assayed and what the order was (alanine followed by cysteine, etc.) is not clear (Pg 24, Materials and Methods). I am concerned that fish may memorize prior experience, which could improve or worsen their response to higher concentrations of alanine, etc. Please clarify this point.

      Many fish were tested in different conditions on consecutive days, indeed. Most often, control experiments (eg, water/nothing; water/water; nothing/nothing) were followed by odor testing. In such cases, there is no risk that fish memorize prior experience and that such previous experience interferes with response to odor. In other instances, fish were tested with a low concentration of one amino acid, followed by a high concentration of another amino acid, which is also on the safe side. Of note, on consecutive days, the odors were always perfused on alternate sides of the test box, to avoid possibility of spatial memory. Finally, in the few cases where increasing concentrations of the same amino acids were perfused consecutively, 1) they were perfused on alternate sides, 2) if the fish does not detect a low concentration below threshold / does not respond, then prior experience should not interfere for responding to higher concentrations, and 3) we have evidence (unpublished, current studies) that when a fish is given increasing concentrations of the same amino acid above detection threshold, then the behavioral response is stable and reproducible (eg does not decrease or increase).

      Minor points:

      Thigmotaxis and wall following.

      Classically, thigmotaxis and wall following are treated as the same (Sharma et al., 2009; https://pubmed.ncbi.nlm.nih.gov/19093125/), but the authors distinguish them as thigmotaxis along the X-axis and the Y-axis because fish repeatedly swam back and forth along the x-axis wall or the y-axis wall. I understand the authors' reason for distinguishing WF and T, but please present them with more explanation (what the differences between them are) in the introduction and result sections.

      Done

      Pg5 "genetic architecture" in the introduction.

      "Genetic architecture" analysis needs a more genomic survey, such as GWAS, QTL mapping, and Hi-C. Phenotype differences in F2 generation can be stated as "genetic factor(s)" "genetic component(s)", etc. please revise.

      Done

      Pg10 At the serine treatment, the authors concluded that "...suggesting that their detection threshold for serine is lower than for alanine." I believe that the 'threshold for serine is higher' according to the authors' data. Their threshold-related statement is correct in Pg21 "as SF olfactory concentration detection threshold are higher than CF,..." So the statement on page 10 is just a mistake, I think. Please revise.

      Done (mistake indeed)

      Pg11 After explaining Fig5, the statement "In sum, the responses of the different fish types to different concentrations of different amino acids were diverse and may reflect complex, case-by-case, behavioral outputs" does not convey any information. Please revise.

      OK. Done : “In sum, the different fish types show diverse responses to different concentrations of different amino acids.”

      For the personality analysis (Fig 7)

      The index value needs more explanation. I read the materials and methods three times but am still confused. From the equation, the index does not seem to exceed 1.0, unless the "before score" was a negative value, and the "after score" value was positive. I could not get why the authors set a score of 1.5 as the threshold for the cumulative score of these different behavior index values (= individual score). Please provide more description. Currently, I am skeptical about this index value in Fig 7.

      Done, in results and methods.

      Pg15 the discussion section

      Please discuss well the difference between the authors' finding (cavefish respond 10^-4M for position and surface fish responded 10^-4 for thig-Y; Fig 4AB), and those in Hinaux et al. 2016 (cavefish responded 10^-10M alanine but surface fish responded 10^-5M or higher). It seems that surface fish could respond to the low conc of alanine as cavefish do, which is opposed to the finding in Hinaux 2016.

      The increase in NbrtY at population level for surface fish with 10^-4 M alanine (~10^-6 M in box) was most probably due to only a few individuals. Contrary to cavefish, all other parameters were unchanged in surface fish for this concentration. Moreover, at individual level, only 3.2% of surface fish had significant olfactory scores (to be compared to 81.3% for cavefish). Thus, we think that globally this result does not contradict our previous findings in Hinaux et al (2016), and solely represents the natural, unexplained variations inherent to the analysis of complex animal behaviors – even when we attempt to use the highest standards of controlled conditions.

      Of note, in the revised version, we have now included a full dose/response analysis for alanine concentrations ranging from 10^-2 M to 10^-10 M, on cavefish. Alanine 10^-5 M has significant effects (now shown in Suppl Fig2 and indicated in text; a column has been added for 10^-5 M in Summary Table 1). Lower concentrations have milder effects (described in text) but confirm the very low detection threshold of cavefish for this amino acid.
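
      As a rough illustration of the arithmetic behind these concentrations: the response above gives an injected alanine concentration of 10^-4 M as roughly 10^-6 M in the box, which implies a dilution of about 100-fold. The sketch below applies that same factor to the injected range of 10^-2 M to 10^-10 M. The fixed 100-fold factor, the one-step-per-decade series, and the function names are assumptions made for illustration only; the actual concentration experienced by a fish depends on the diffusion gradient in the box and is not a single value.

```python
import math

# Assumption: a uniform ~100-fold dilution between the injected solution and
# the test box, inferred from "10^-4 M alanine (~10^-6 M in box)".
DILUTION_FACTOR = 100.0


def in_box_concentration(injected_molar, dilution_factor=DILUTION_FACTOR):
    """Approximate bulk concentration (M) after dilution in the box."""
    return injected_molar / dilution_factor


def injected_series(start_exp=-2, stop_exp=-10):
    """Injected concentrations (M), one per decade, from 10^start_exp down to 10^stop_exp."""
    return [10.0 ** e for e in range(start_exp, stop_exp - 1, -1)]


if __name__ == "__main__":
    for c in injected_series():
        exponent = round(math.log10(c))
        print(f"injected 1e{exponent:+d} M -> ~{in_box_concentration(c):.0e} M in box")
```

      Under this assumption, the lowest injected concentration tested (10^-10 M) would correspond to a bulk concentration on the order of 10^-12 M in the box.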

      Pg19, "In sum, CF foraging strategy has evolved in response to the serious challenge of finding food in the dark"

      My point is the same as explained in the 'weakness' section above: how this behavior is effective in the cave life, if they conclude so? Please explain or revise this statement.

      The present manuscript reports on experiments performed in “artificial” and controlled laboratory conditions. We are fully aware that these conditions are probably distantly related to conditions encountered in the wild. Note that we had written in original version (page 20) “…for 6-week old juveniles in a rectangular box - but the link may be more elusive when considering a fish swimming in a natural, complex environment.” As the reviewer may know, we also perform field studies in a more ethological approach of animal behaviors, thus we may be able to discuss this point more accurately in the future.

      Pg20 "To our knowledge, this is the first time individual variations are taken into consideration in Astyanax behavioral studies."

      This is wrong. Please see Fernandes et al., 2022. (https://pubmed.ncbi.nlm.nih.gov/36575431/).

      OK. The sentence is wrong if taken in its absolute sense, i.e., considering inter-individual variations of a given parameter (e.g., number of neuromasts per individual or number of approaches to vibrating rod in Fernandes et al, 2022). In this same sense, Astyanax QTL studies on behaviors in the past also took into account variations among F2 individuals. Here, we wanted to stress that personality was taken into consideration. The sentence has been changed: “To our knowledge, this is the first time individual temperament is taken into consideration in Astyanax behavioral studies.”

      Figure 2B and others.

      The order of categories (R, R-TX, etc) should match in all columns (SF, F2, and CF). Currently, the category orders seem random or the larger ratio categories at the bottom, which is quite difficult to compare between SF, F2, and CF. Also, the writings in Fig 2A (times, Y-axis labels, etc), and the bargraphs' writings are quite difficult to read in Fig 2B, Fig 3B 4H, 5GN, 6EFG. Also, no need to show fish ID in Fig 2C in the current way, but identify the fish data points of the fish in Fig 2D (SF#40, CF#65, and F2#26) in Fig 2C if the authors want to show fish ID numbers in the boxplots. Fish ID numbers in other boxplot figures are recommended to be removed too.

      We have thought a lot on how to best represent the distributions of swimming patterns in graphs such as Fig 2B and others. The difficulty is due to the existence of many combinations (33 possibilities in total, see new Suppl Fig7), which are never the same in different plots/conditions because individual tested fish are different. We decided that the best way was to represent, from bottom to top, the most used to the least used swimming patterns, and to use a color code that matches at best the different combinations. It was impossible to give the full color code on each figure, therefore it was simplified, and we believe that the results are well conveyed on the graphs. We would like to keep it as it is. To respond (partially) to the reviewer’s concern, we have now added a full color code description in a new Supplemental Figure 7 (associated to Methods).

      Size of lettering has been modified in all pattern graphs like Fig2A. Thanks for the suggestion, it reads better now.

      Finally, we would like to keep the fish ID numbers because this contributes to conveying the message of the paper, that individuality matters.

      Raw data files were not easy to read in Excel or LibreOffice. Please convert them into the csv format to support the rigor in the authors' conclusion.

      We do not understand this request. Our very large dataset must be analysed with R, not Excel, for statistics, plotting and pattern analysis. However, raw data files can be opened in Excel with format conversion.

      Reviewer #2 (Recommendations For The Authors):

      I think most of the experimental procedures (with few exceptions, see below) are well-defined and nicely described, so the majority of my suggestions will be related to the visualization of the data. I think the authors have done a great job in presenting this complex dataset, but there are still some smaller tweaks that could be used to increase the legibility of the presented data.

      First and perhaps foremost, a better definition of the swimming pattern subsets is needed. I have no problem understanding the main behavioral types, but whereas the color codes for these suggest that there is continuous variance within each pattern, it is not clear (at least to me), what particular aspect(s) of the behaviors vary. Also, whereas the sidebars/legends suggest a continuum within these behaviors, the bar charts themselves clearly present binned data. I did not find a detailed description of how the binning was done. As this has been - according the Methods section - a manual process, more clarity about the details of the binning would be welcome. I would also suggest using binned color codes for the legends as well.

      Done, in Results and Methods. We hope it is now clear that there is no “continuum”, rather multiple combinations of discrete swimming patterns. The gradient aspect in color code in figures has been removed to avoid the idea of continuum. According to the chosen color code, WF is in red, R in blue, T in yellow and C in green. Then, combinations are represented by colors in between, for example, R+WF is purple. We have now added a full color code description for the swimming patterns and their combinations in a new Supplemental Figure 7 (associated to Methods).

      Also, to better explain the definition of the swimming patterns and the graphical representation, it now reads (in Methods):

      “The determination of baseline swimming patterns and swimming patterns after odor injection was performed manually based on graphical representations such as in Figure 2A or Figure 3A. Four distinctive baseline behaviors clearly emerged: random swim (R; defined as haphazard swimming with no clear pattern, covering entirely or partly the surface of the arena), wall following (WF; defined as the fish continuously following along the 4 sides of the box and turning around it, in a clockwise or counterclockwise fashion), large or small circles (C; self explanatory), and thigmotactism (T, along the X- or the Y-axis of the box; defined as the fish swimming back and forth along one of the 4 sides of the box). On graphical representations of swimming pattern distributions, we used the following color code: R in blue, WF in red, C in green, T in yellow. Of note, many fish swam according to combination(s) of these four elementary swimming patterns (see descriptions in the legends of Supplemental figures, showing many examples). To fully represent the diversity and the combinations of swimming patterns used by individual fish, we used an additional color code derived from the “basic” color code described above and where, for example R+WF is purple. The complete combinatorial color code is shown in Suppl. Fig7.”

      It would be also easier to comprehend the stacked bar charts, presenting the particular swimming patterns in each population, if the order of different swimming patterns was the same for all the plots (e.g. the frequency of WF always presented at the bottom, R on the top, and C and T in the middle). This would bring consistency and would highlight existing differences between SF, CF, and F2s. Furthermore, such a change would also make it much easier to see (and compare) shifts in behaviors.

      We have thought a lot on how to best represent the distributions of swimming patterns in graphs such as Fig 2B and others. The difficulty is due to the existence of many combinations, which are never the same in different plots/conditions because the individual fish tested are different. We decided to keep it as it currently stands, because we think re-doing all the graphs and figures would not significantly improve the representation. In fact, we think that the differences between morphs (dominant blue in SF, dominant red in CF) and between conditions (bar charts next to each other) are easy to interpret at first glance in the vast majority of cases. Moreover, they are now completed by CA analyses (Suppl Figure 8).

      While the color coding of the timeline in the "3D" plots presented for individual animals is a nice feature, at the moment it is slightly confusing, as the authors use the same color palette as for the stacked bar charts, representing the proportionality of the particular swimming patterns. As the y-axis is already representing "time" here, the color coding is not even really necessary. If the authors would like to use a color scheme for aesthetic reasons, I would suggest using another palette, such as "grey" or "viridis".

      We would like to keep the graphical aspect of our figures as they are, for aesthetic reasons. To avoid confusion with stacked bar chart color code, we have added a sentence in Methods and in the legend of Figure 2, where the colors first appear:

      “The complete combinatorial color code is shown in Suppl. Figure 7. Of note, in all figures, the swimming pattern color code does not relate whatsoever with the time color code used in the 2D plus time representation of swimming tracks such as in Figure 2A”.

      I would also suggest changing the boxplots to violin-plots. Figure 7 clearly shows bimodality for F2 scores (something, as the authors themselves note, not entirely surprising given the probably polygenic nature of the trait), but looking at SF and CF scores I think there are also clear hints for non-normal distributions. If non-normal distribution of traits is the norm, violin-plots would capture the variance in the data in a more digestible way. (The existence of differently behaving cohorts within the population of both SF and CF forms would also help to highlight the large pre-existing variance, something that was probably exploited by natural selection as well, as mentioned briefly in the Discussion by the authors, too.)

      The bimodal distribution of scores shown by F2s in Figure 7B is indeed probably due to the polygenic nature of the trait. However, such distribution is rather the exception than the norm. Moreover, the boxplot representations we have used throughout figures include all the individual points, and outliers can be identified as they have the fish ID number next to them. This allows the reader to grasp the variance of the data. Again, redoing all graphs and figures would constitute a lot of work, for little gain in terms of conveying the results. Therefore, we choose not to change the boxplots for violin plots.

      The summary data of individual scores in Table 1B shows some intriguing patterns, that warrant a bit further discussion, in my opinion. For example, we can see opposite trends in scores of SF and CF forms with increasing alanine concentration. Is there an easy explanation for this? Also, in the case of serine, the CF scores do not seem to respond in a dose-dependent manner and puzzlingly at 10^(-3)M serine concentration F2 scores are above those of both grandparental populations.

      That is true. However, we have no simple explanation for this. To begin responding to this question, we have now performed full dose/response experiments for alanine (concentrations tested from 10^-2 M to 10^-10 M on cavefish; these confirm that CF are bona fide “alanine specialists”) and for serine (10^-2 M to 10^-4 M tested on both morphs; these confirm that both morphs respond well to this amino acid). These complementary results are now included in text and figures (partially) and in the summary table 1.

      If anything is known about this, I would also welcome some discussion on how thigmotactic behavior, a marker of stress in SF, could have evolved to become the normal behavior of CF forms, with lower cortisol levels and, therefore lower anxiety.

      We actually think thigmotactism is a marker of stress in both morphs. See Pierre et al, JEB 2020, Figure S3A: in both SF and CF thigmotaxis behavior decreases after long habituation times. In our hands, the only difference between the two morphs is that surface fish (at 5 months of age) express stress by thigmotactism but also freezing and rapid erratic movements, while cavefish have a more restricted stress repertoire.

      This is why in the present paper we have carefully made the distinction between thigmotactism (= possible stress readout) and wall following (= exploratory behavior). Our finding that WF and large circles confer better olfactory response scores to cavefish is in strong support of the different nature of these two swimming patterns. Then, why is swimming along the 4 walls of a tank fundamentally different from swimming along one wall? The question is open, although the number of changes of direction is probably an important parameter: in WF the fish always swims forward in the same direction, while in T the fish constantly changes direction when reaching the corner of the tank – which is similar to erratic swim in stressed surface fish.

      Finally two smaller suggestions:

      • When referring to multiple panels on the same figure it would be better to format the reference as "Figure 4D-G" instead of "Figure 4DEFG";

      Done

      • On page 4, where the introduction reads as "although adults have a similar olfactory rosette with 20-25 lamellae", in my opinion, it would be better to state that "while adults of the two forms have a similar olfactory rosette with 20-25 lamellae".

      Done

      Reviewer #3 (Recommendations For The Authors):

      Consider moving Figure 3 to be a supplement of Figure 4. This figure shows a water control and therefore best supplements the alanine experiment.

      We would like to keep this figure as a main figure: we consider it very important to establish the validity of our behavioral setup at the beginning of the ms, and to establish that in all the following figures we are recording bona fide olfactory responses.

      "sensory changes in mecano-sensory and gustatory systems " - mechano-sensory.

      Done

      Figure 2 legend: "(3) the right track is the 3D plus time (color-coded)" - shouldn't it be 2D plus time or 3D (x,y, time).

      True! Thanks for noting this, corrected.

      Figure 4 legend "E, Change in swimming patterns" should be H.

      Done

      "suggesting that their detection threshold for serine is lower than for alanine" - higher?

      Done

      In the behavioural plots, I assume that the "mean position" value represents the mean position along the X-axis of the chamber - this should be clarified and the axis label updated accordingly.

      That is correct and has been updated in Methods and Figures and legends.

      "speed, back and forth trips in X and Y, position and pattern changes (see Methods; Figure 7A)." - here it would be helpful to add an explanation like "to define an olfactory score for individual fish."

      This has been changed in Results and more detailed explanations on score calculations are now given in Methods.

      "possess enhanced mecanosensory lateral line" - mechanosensory.

      Done

    2. eLife assessment

      In this important paper, Blin and colleagues develop a high-throughput behavioral assay to test spontaneous swimming and olfactory preference in individual Mexican cavefish larvae. The authors present compelling evidence that the surface and cave morphs of the fish show different olfactory preferences and odor sensitivities and that individual fish show substantial variability in their spontaneous activity that is relevant for olfactory behaviour. The paper will be of interest to neurobiologists working on the evolution of behaviour, olfaction, and the individuality of behaviour.

    3. Joint Public Review:

      Summary:

      The paper explores chemosensory behaviour in surface and cave morphs and F2 hybrids in the Mexican cave fish Astyanax mexicanus. The authors develop a new behavioural assay for the long-term imaging of individual fish in a parallel high-throughput setup. The authors first demonstrate that the different morphs show different basal exploratory swimming patterns and that these patterns are stable for individual fish. Next, the authors test the attraction of fish to various concentrations of alanine and other amino acids. They find that the cave morph is a lot more sensitive to chemicals and shows directional chemotaxis along a diffusion gradient of amino acids. Surface fish, although they can detect the chemicals, do not show marked chemotaxis behaviour and have an overall lower sensitivity. These differences have been reported previously but the authors report longer-term observations on many individual fish of both morphs and their F2 hybrids. The data also indicate that the observed behaviour is a quantitative genetic trait. The approach presented will allow the mapping of genes contributing to these traits. The work will be of general interest to behavioural neuroscientists and those interested in olfactory behaviours and the individual variability in behavioural patterns.

      Strengths:

      The authors provide a large dataset of swimming behaviour for surface fish and cave fish and also their F2 hybrids, demonstrating large differences in chemosensory behaviour and indicating that this is a quantitative genetic trait.

      One strength of the paper is the development of a new and improved setup for the behavioural imaging of individual fish for extended periods and under chemosensory stimulation. The authors show that cave fish need up to 24 h of habituation to display a behavioural pattern that is consistent and unlikely to be due to the stressed state of the animals. The setup also uses relatively large tanks that allow the build-up of chemical gradients.

      With their new system, the authors could generate cleaner results without mechanical disturbances. The authors characterize multiple measurements to score the odour response behaviours and also developed a new personality analysis. Their conclusion that cave fish evolved as a specialist to sense alanine and histidine among 6 tested amino acids was well supported by their data.

      Weaknesses:

      Further work will be needed to pinpoint the nature of the genetic changes and neurobiological mechanisms that underlie the differences between the two forms and the large individual variation of behaviours.

      The authors did not measure the concentrations of alanine and other amino acids in the local cave water and surface water.

    1. Author response:

      Reviewer #1 (Public Review):

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. Why is no eGFP readout given in Figure 1C as for WT HSV? The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); so the story seems incomplete.

      Thank you for your comment. (1) In Figure 1c, "HSV-wt" refers to the virus rescued from pBAC-GFP-HSV (as mentioned in the “Method” section), which itself carries GFP. Therefore, detecting GFP cannot distinguish between HSV infection and HIV reactivation. Hence, we assessed the reactivation effect by measuring the mRNA levels of the HIV LTR. (2) Our data indicate that overexpression of ICP34.5 inhibits the reactivation of the HIV latent reservoir, but the magnitude of this effect is not equivalent to the activation observed with ICP34.5-deleted HSV-1. There are two possible reasons. One is that the lentiviral vector used to overexpress ICP34.5 integrates randomly into the genome of the J-Lat cell line, which may itself activate latent HIV to some extent. The other is that ICP34.5 mainly inhibits HIV reactivation through modulation of the host NF-κB or HSF1 pathways, while PMA, TNF-a, and ICP34.5-deleted HSV-1 can reactivate latent HIV by other mechanisms that have yet to be determined. Together, these factors would leave only a small net inhibitory effect. We will further discuss this issue in the revised version. Thank you.

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your comment. We plan to conduct several experiments to demonstrate a reduction in HSV-1 replication after ICP34.5 deletion: (1) Detect the growth curve of HSV-1 deleted with ICP34.5 in Vero cells. The virus growth curve of HSV-1 with deleted ICP34.5 may be lower than that of wild-type HSV-1, which could demonstrate a reduction in HSV-1 replication after ICP34.5 deletion. (2) Detect the level of inflammatory factors in tumor cells after infection with HSV-1 deleted with ICP34.5.

      We believe that the effect is specific, as we previously tested poxviruses and adenoviruses and found no activation of the latent reservoir. We consider the activation observed with HSV-1 virus and HSV-1 with deleted ICP34.5 to be specific. We will supplement relevant data in the revised version.

      In addition, we will provide the corresponding RNA-seq data to assess its effect on cellular genes.

      (3) The primate groups are too small and the results too variable to make averages. In Figure 5, the group with ART and saline has two slow rebounders. It is not correct to average those with a single quick rebounder. Here the interpretation is NOT supported by the data.

      We agree with you that this is a pilot study with a limited number of rhesus macaques. There were only 3 monkeys per group in this study, but our results were encouraging. Although the number of macaques was relatively limited, these nine macaques were distributed very carefully based on age, sex, weight and genotype. All SIV-infected macaques used in this study had a long history of SIV infection and had received several courses of ART therapy, which mimics treatment of chronic HIV-1 infection in humans. These macaques had been infected with SIVmac239 for more than 5 years, and highly pathogenic SIV-infected macaques have been well validated as a stringent model to recapitulate HIV-1 pathogenesis and persistence during ART therapy in humans. Indeed, in our rhesus model, ART treatment effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation, virus rapidly rebounded, which is very similar to what is seen in ART-treated HIV patients. In our next projects, we will expand the number of animals and then proceed to preclinical and clinical studies. Thank you for your understanding.

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      We will provide more data about the safety assessment of HSV-1 vector in SIV-infected macaques, and also further discuss the potential of inflammatory HSV vector in PLWH in the revised manuscript.

      Reviewer #2 (Public Review):

      (1) While the mechanism of ICP34.5 interaction and modulation of the NF-kB and HSF1 pathways are shown, this only proves ICP34.5 interactions but does not give away the mechanism of how the HSV-deltaICP-34.5 vector purges HIV-1 latency. What other components of the vector are required for latency reversal? Perhaps serial deletion experiments of the other ORFs in the HSV-deltaICP-34.5 vector might be revealing.

      We agree with your suggestion. In fact, we are currently further exploring some viral genes of HSV-1 that play a role in activation. We have found that the ICP0 gene of HSV-1 virus can activate HIV, and the specific mechanism is under investigation.

      (2) The efficacy of the HSV vaccine vectors was evaluated in Rhesus Macaque model animals. Animals were chronically infected with SIV (a parent of HIV), treated with ART, challenged with bi-functional HSV vaccine or controls, and discontinued treatment, and the resulting virus burden and immune responses were monitored. The animals showed SIV Gag and Env-specific immune responses, and delayed virus rebound (however rebound is still there), and below-detection viral DNA copies. What would make a more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility of obtaining such a result is not clearly demonstrated.

      Thank you for your suggestion. We plan to conduct IPDA experiments to further supplement data on the overall reduction in circulating latent cell numbers in animals.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimens taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect a higher level of viral loads to be released in response to the vaccine in question.

      We plan to use primary cells for related experiments to further validate the results of the cell experiments.

      (4) How do the authors imagine neutralizing HIV-1 envelope epitopes by a similar strategy? A discussion of this point may also help.

      Thank you for your comments. In fact, our study adopts the "shock and kill" strategy, with a focus on the "kill" aspect leaning towards T-cell therapy. Although the vaccine in the paper also utilizes Env antigen, we believe these antibodies are insufficient for neutralizing the mutated SIV virus. We strongly agree with your suggestion that in HIV/AIDS treatment, effective T-cell killing combined with broad-spectrum neutralizing antibodies would be more effective. This aligns with our findings, as our treatment has partially delayed viral rebound but with a relatively short duration of suppression. This may indicate insufficient killing activity. In future research, we will further consider the role of broad-spectrum neutralizing antibodies. Our revised manuscript will elaborate on this in the discussion section.

      (5) I thought the empty HSV-vector control also elicited somewhat delayed kinetics in virus rebound and neutralization, can the authors comment on why this is the case?

      We agree with you that the HSV-1 empty vector does exhibit a somewhat delayed rebound. The reason is that our treatment simultaneously utilizes both the HSV vector vaccine and ART therapy. Although the empty HSV vector cannot elicit an SIV-specific CTL response, it effectively activates the latent SIV reservoirs, and these activated virions can then be partially killed by ART. Therefore, even without carrying antigens, a slight delay may be achieved.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NFkappaB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. Why is no eGFP readout given in Figure 1C as for WT HSV? The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); so the story seems incomplete.

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      (3) The primate groups are too small and the results too variable to make averages. In Figure 5, the group with ART and saline has two slow rebounders. It is not correct to average those with a single quick rebounder. Here the interpretation is NOT supported by the data.

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

    3. eLife assessment

      In this useful study, the authors tested a novel approach to eradicate the HIV reservoir by constructing a herpes simplex virus (HSV)-based therapeutic vaccine. The approach was tested in experimental infections of chronically SIV-infected, antiretroviral therapy (ART)-treated macaques with extent of rebound after ART interruption as a measure of the size of the HIV reservoir. While mean viremia at rebound was lower in the HSV vaccine-treated group, the evidence presented appears to be incomplete because the group size was small and the viral load at rebound was highly variable.

    4. Reviewer #2 (Public Review):

      Summary:

      In this article, Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based on HSV-deltaICP-34.5's ability to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-lat T-cell lines with an HSV-deltaICP-34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of the ICP34.5 ORF in these cells can arrest latent HIV-1 genomes from transcription, even in the presence of latency reversal agents. ICP34.5 can co-IP with and de-phosphorylate IKKa/b to block its interaction with the NF-kB transcription factor. Additionally, ICP34.5 can interact with HSF1, which was identified by mass-spec. Thus, the authors propose that the latency reversal effect of HSV-deltaICP-34.5 in co-infected JLat cells is due to modulatory effects on the IKKa/b-NF-kB and PP1-HSF-1 pathways.

      Next, the authors cleverly construct a bifunctional HSV-based vector with deleted ICP34.5 and 47 ORFs to purge latency and avoid immunological refluxes, and additionally, expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RM), further elicit the potential for their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1- and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed, however, there are some questions I wish the authors had explored to get answers to, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, with the potential to expand to clinical studies. The work was well-written, and sections were clearly discussed.

      Strengths:

      The work is well designed, rationale explained, and written very clearly for lay readers.

      Claims are adequately supported by evidence and well-designed experiments including controls.

      Weaknesses:

      (1) While the mechanism of ICP34.5 interaction and modulation of the NF-kB and HSF1 pathways are shown, this only proves ICP34.5 interactions but does not give away the mechanism of how the HSV-deltaICP-34.5 vector purges HIV-1 latency. What other components of the vector are required for latency reversal? Perhaps serial deletion experiments of the other ORFs in the HSV-deltaICP-34.5 vector might be revealing.

      (2) The efficacy of the HSV vaccine vectors was evaluated in Rhesus Macaque model animals. Animals were chronically infected with SIV (a parent of HIV), treated with ART, challenged with bi-functional HSV vaccine or controls, and discontinued treatment, and the resulting virus burden and immune responses were monitored. The animals showed SIV Gag and Env-specific immune responses, and delayed virus rebound (however rebound is still there), and below-detection viral DNA copies. What would make a more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility of obtaining such a result is not clearly demonstrated.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimens taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect a higher level of viral loads to be released in response to the vaccine in question.

      (4) How do the authors imagine neutralizing HIV-1 envelope epitopes by a similar strategy? A discussion of this point may also help.

      (5) I thought the empty HSV-vector control also elicited somewhat delayed kinetics in virus rebound and neutralization, can the authors comment on why this is the case?

    1. Author response:

      We would like to thank the eLife Editors and Reviewers for their positive assessment and constructive comments, and for the opportunity to revise our manuscript. We greatly appreciate the Reviewers’ recommendations and believe that they will further improve our manuscript.

      In revising the manuscript, our primary focus will be enhancing the clarity surrounding testing procedures and addressing corrections for multiple comparisons. Additionally, we intend to offer more explicit information about the statistical tests employed, along with the details about the number of models/comparisons for each test. We will also include an extended discussion on potential limitations of the dopaminergic receptor mapping methods used, addressing the Reviewers’ comments relating to the quality of PET imaging with different dopaminergic tracers in mesiotemporal regions such as the hippocampus. While the code used for connectopic mapping is publicly available through the ConGrads toolbox, we will provide the additional code we have used for data processing and analysis, visualization of hippocampal gradients, and the cortical projections. The data used in the current study is not publicly available due to ethical considerations concerning data sharing, but can be shared upon reasonable request from the senior author. Additional plans include clarifying and discussing which findings were successfully replicated, and addressing Reviewers’ suggestions for using other openly available cohorts for replication, and implementing alternative coordinate systems to quantify connectivity change along gradients.

    2. eLife assessment

      This fundamental work demonstrates the importance of considering overlapping modes of functional organization (i.e. gradients) in the hippocampus, showing associations with aging, dopaminergic receptor distribution and episodic memory. The evidence supporting the conclusions is solid, although some clarifications about testing procedures and a discussion of the limitations of the dopaminergic receptor mapping techniques employed should be provided along with analysis code. The work will be of broad interest to basic and clinical neuroscientists.

    3. Reviewer #1 (Public Review):

      The authors studied how hippocampal connectivity gradients change across the lifespan, and how these relate to memory function and neurotransmitter distributions. They observed that older age was associated with less distinct transitions, and they observed an association between gradient de-differentiation and cognitive decline.

      This is overall an innovative and interesting study to assess gradient alterations across the lifespan and its associations to cognition.

      The paper is well-written, and the methods appear sound and thoughtful. There are several strengths, including the inclusion of two independent cohorts, the use of gradient mapping and alignment techniques, and an overall sound statistical and analysis framework. There are several areas for potential improvements in the paper, and these are listed below:

      (1) The reported D1 associations appear a bit post-hoc in the current work and I was unclear why the authors specifically focussed on dopamine here, as other transmitter systems are similarly present at the level of the hippocampus and implicated in aging.

      Moreover, the authors may be aware that multiple PET tracers are somewhat challenged in the mesiotemporal region. Is this the case for the D1 receptor as well? The hippocampus is a small and complex structure, and PET more of a low res technique so one would want to highlight and discuss the limitations of the correlations with PET maps here and/or evaluate whether the analysis adds necessary findings to the study.

      From my (perhaps somewhat biased) perspective, it might be valuable to instead or in addition look at measures of hippocampal microstructure and how these relate to the functional aging effects. This could be done, if available, using data from the same subjects (eg based on quantitative MRI contrasts and/or structural MRI) and/or using contextualization findings as implemented in eg hippomaps.readthedocs.io

      (2) Can the authors clarify why they did not replicate based on cohorts that are more widely used in the community and open access, such as CamCAN and/or HCP-Aging? It might connect their results with other studies if an attempt was made to also show that findings persist in either of these repositories.

      (3) The authors applied TSM and related these parameters to topographic changes in the gradients. I was wondering whether and how such an approach controls for autocorrelation present in both the PET map and gradients. Could the authors clarify?

      (4) The TSM approach quantifies the gradients in terms of x/y/z direction in a cartesian coordinate system. Wouldn't a shape intrinsic coordinate system in the hippocampus also be interesting, and perhaps even be more efficient to look at here (see eg DeKraker 2022 eLife or Paquola et al 2020 eLife)?

    4. Reviewer #2 (Public Review):

      Summary:

      This paper derives the first three functional gradients in the left and right hippocampus across two datasets. These gradient maps are then compared to dopamine receptor maps obtained with PET, associated with age, and linked to memory. Results reveal links between dopamine maps and gradient 2, age with gradients 1 and 2, and memory performance.

      Strengths:

      This paper investigates how hippocampal gradients relate to aging, memory, and dopamine receptors, which are interesting and important questions. A strength of the paper is that some of the findings were replicated in a separate sample.

      Weaknesses

      The paper would benefit from added clarification on the number of models/comparisons for each test. Furthermore, it would be helpful to clarify whether or not multiple comparison correction was performed and - if so - what type or - if not - to provide a justification. The manuscript would furthermore benefit from code sharing and clarifying which results did/did not replicate.
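      To illustrate why the type of multiple comparison correction matters, the following is a minimal sketch contrasting a Bonferroni correction with the Benjamini-Hochberg false discovery rate procedure. It is not the authors' analysis: the function names and the p-values are hypothetical and chosen only to show that the two procedures can reach different decisions on the same set of tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p-value is <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank does not exceed max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four tests (illustrative only).
example_p = [0.01, 0.02, 0.03, 0.5]
print(bonferroni(example_p))
print(benjamini_hochberg(example_p))
```

      With these illustrative values, the Bonferroni threshold retains only the smallest p-value, whereas the step-up procedure retains the first three, which is why stating the correction used and the number of tests per family is informative.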

    5. Reviewer #3 (Public Review):

      Summary:

      In this study, the authors analyzed the complex functional organization of the hippocampus using two separate adult lifespan datasets. They investigated how individual variations in the detailed connectivity patterns within the hippocampus relate to behavioral and molecular traits. The findings confirm three overlapping hippocampal gradients and reveal that each is linked to established functional patterns in the cortex, the arrangement of dopamine receptors within the hippocampus, and differences in memory abilities among individuals. By employing multivariate data analysis techniques, they identified older adults who display a hippocampal gradient pattern resembling that of younger individuals and exhibit better memory performance compared to their age-matched peers. This underscores the behavioral importance of maintaining a specific functional organization within the hippocampus as people age.

      Strengths:

      The evidence supporting the conclusions is overall compelling, based on a unique dataset, rich set of carefully unpacked results, and an in-depth data analysis. Possible confounds are carefully considered and ruled out.

      Weaknesses:

      No major weaknesses. The transparency of the statistical analyses could be improved by explicitly (1) stating what tests and corrections (if any) were performed, and (2) justifying the elected statistical approaches. Further, some of the findings related to the DA markers are borderline statistically significant and therefore perhaps less compelling but they line up nicely with results obtained using experimental animals and I expect the small effect sizes to be largely related to the quality and specificity of the PET data rather than the derived functional connectivity gradients.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "comparative transcriptomics reveal a novel tardigrade specific DNA binding protein induced in response to ionizing radiation" aims to provide insights into the mediators and mechanisms underlying tardigrade radiation tolerance. The authors start by assessing the effect of ionizing radiation (IR) on the tardigrade lab species, H. exemplaris, as well as the ability of this organism to recover from this stress - specifically, they look at DNA double and single-strand breaks. They go on to characterize the response of H. exemplaris and two other tardigrade species to IR at the transcriptomic level. Excitingly, the authors identify a novel gene/protein called TDR1 (tardigrade DNA damage response protein 1). They carefully assess the induction of expression/enrichment of this gene/protein using a combination of transcriptomics and biochemistry - even going so far as to use a translational inhibitor to confirm the de novo production of this protein. TDR1 binds DNA in vitro and co-localizes with DNA in tardigrades.

      Reverse genetics in tardigrades is difficult, thus the authors use a heterologous system (human cells) in which to express TDR1. They find that, when transiently expressed, TDR1 helps improve human cell resistance to IR.

      This work is a masterclass in integrative biology incorporating a holistic set of approaches spanning next-gen sequencing, organismal biology, biochemistry, and cell biology. I find very little to critique in their experimental approaches.

      Strengths:

      (1) Use of trans/interdisciplinary approaches ('omics, molecular biology, biochemistry, organismal biology)

      (2) Careful probing of TDR1 expression/enrichment

      (3) Identification of a completely novel protein seemingly involved in tardigrade radio-tolerance.

      (4) Use of multiple, diverse tardigrade species for 'omics comparison.

      Weaknesses:

      (1) No reverse genetics in tardigrades - all insights into TDR1 function from heterologous cell culture system.

      (2) Weak discussion of Dsup's role in preventing DNA damage in light of DNA damage levels measured in this manuscript.

      (3) Missing sequence data which is essential for making a complete review of the work.

      Overall, I find this to be one of the more compelling papers on tardigrade stress-tolerance I have read. I believe there are points still that the authors should address, but I think the editor would do well to give the authors a chance to address these points as I find this manuscript highly insightful and novel.

      We thank the reviewer for his comments.

      We agree that it will be important to further investigate the role of Dsup in radio-tolerance. We briefly mentioned this point in the discussion (p14). Our findings show that tardigrades undergo DNA damage at levels roughly similar to radio-sensitive organisms and therefore support a major role for DNA repair in the maintenance of genome integrity after exposure to IR. Nevertheless, we believe that more precise quantification of DNA damage may still reveal a contribution of genome protection to radio-tolerance of tardigrades compared to radio-sensitive organisms. Dsup loss of function experiments in tardigrades would clearly be the best way to assess this possibility. In the absence of experiments directly addressing the function of Dsup, we prefer to refrain from drawing any firm conclusion on prevention of DNA damage by Dsup and thus to keep a more open position. In any case, as discussed in the text, we note that Dsup has only been reported in Hypsibioidea and other molecular players, such as TDR1, are likely involved in radio-tolerance in other tardigrade species.

      The sequence data can be accessed at the NCBI SRA database with Bioproject ID PRJNA997229.

      Reviewer #3 (Public Review):

      Summary:

      This paper describes transcriptomes from three tardigrade species with or without treatment with ionizing radiation (IR). The authors show that IR produces numerous single-strand and double-strand breaks as expected and that these are substantially repaired within 4-8 hours. Treatment with IR induces strong upregulation of transcripts from numerous DNA repair proteins including Dsup specific to the Hypsibioidea superfamily. Transcripts from the newly described protein TDR1 with homologs in both Hypsibioidea and Macrobiotoidea superfamilies are also strongly upregulated. They show that TDR1 transcription produces newly translated TDR1 protein, which can bind DNA and co-localizes with DNA in the nucleus. At higher concentrations, TDR1 appears to form aggregates with DNA, which might be relevant to a possible function in DNA damage repair. When introduced into human U2OS cells treated with bleomycin, TDR1 reduces the number of double-strand breaks as detected by gamma-H2A.X spots. This paper will be of interest to the DNA repair field and to radiobiologists.

      Strengths:

      The paper is well-written and provides solid evidence of the upregulation of DNA repair enzymes after irradiation of tardigrades, as well as upregulation of the TDR1 protein. The reduction of gamma-H2A.X spots in U2OS cells after expression of TDR1 supports a role in DNA damage.

      Weaknesses:

      Genetic tools are still being developed in tardigrades, so there is no mutant phenotype to support a DNA repair function for TDR1, but this may be available soon.

      We thank the reviewer for his comments.

      Reviewer #4 (Public Review):

      The manuscript brings convincing results regarding genes involved in the radio-resistance of tardigrades. It is nicely written and the authors used different techniques to study these genes. There are sometimes problems with the structure of the manuscript but these could be easily solved. According to me, there are also some points which should be clarified in the result sections. The discussion section is clear but could be more detailed, although some results were actually discussed in the results section. I wish that the authors would go deeper in the comparison with other IR-resistant eucaryotes. Overall, this is a very nice study and of interest to researchers studying molecular mechanisms of ionizing radiation resistance.

      I have two small suggestions regarding the content of the study itself.

      (1) I think the study would benefit from the analyses of a gene tree (if feasible) in order to verify if TDR1 is indeed tardigrade-specific.

      (2) It would be appreciated to indicate the expression level of the different genes discussed in the study, using, for example, transcripts per million (TPM).

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      We thank the reviewer for his comments.

      (1) To identify TDR1 homologous sequences in non-tardigrade species, we conducted extensive homology searches using multiple homology-based approaches (Blastp and Diamond against the NCBI non-redundant protein sequences (nr) database and hmmsearch against the EBI reference proteomes), which failed to identify TDR1 homologs in non-tardigrade ecdysozoans, thus strongly supporting that TDR1 is indeed tardigrade-specific.

      To be clearer in the manuscript, we now state the absence of hits for TDR1 in non-tardigrade ecdysozoans. Given the absence of homologs in non-tardigrade species, it is not possible to make a gene tree with non-tardigrade species.

      (2) To further document expression levels (which were already available from the Tables in the initial submission), we added MAplots (representing log2foldchange and logNormalized read counts) in the supplementary materials (Supp Figure 3 and Supp Figure 8). These additional figures clearly document that the DNA repair genes discussed in the main text and TDR1 are highly expressed genes after IR and after Bleomycin treatment.
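      For readers unfamiliar with MA plots, the following is a minimal sketch of how the two plotted quantities can be computed for each gene. It assumes the generic definition (M is the log2 fold change between conditions, A is the mean of the log2 normalized counts) and a pseudocount of 1 to avoid taking the logarithm of zero. The function name and the counts are hypothetical, and this is not necessarily how the values in the supplementary figures were derived.

```python
import math


def ma_values(control_counts, treated_counts, pseudocount=1.0):
    """Return one (M, A) pair per gene.

    M: log2 fold change, treated versus control.
    A: mean of the two log2 normalized counts (average expression level).
    """
    points = []
    for control, treated in zip(control_counts, treated_counts):
        log_control = math.log2(control + pseudocount)
        log_treated = math.log2(treated + pseudocount)
        m_value = log_treated - log_control
        a_value = (log_treated + log_control) / 2
        points.append((m_value, a_value))
    return points


# Hypothetical normalized counts for two genes (illustrative only).
control = [3, 7]
treated = [15, 7]
for m_value, a_value in ma_values(control, treated):
    print(f"M = {m_value:.2f}, A = {a_value:.2f}")
```

      Plotting M against A shows at a glance whether a strongly induced gene is also abundantly expressed, which is the point raised about expression levels.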

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      We thank the reviewer for his comments.

      (1) It has always seemed strange to me that tardigrades accumulate just as much DNA damage as any other organism when irradiated and yet their Dsup protein is supposed to shield and protect their DNA from damage. Perhaps this is an appropriate time for this idea to be reconsidered, given that Dsup was NOT induced by IR in this study and the authors found that their animals incurred just as much damage as other biological systems. While Dsup is clearly not the focus of this manuscript, it is the protein most associated with tardigrade radio-tolerance and I would argue this new paper would call into question previous conclusions made about Dsup.

      We agree that it will be important to further investigate the role of Dsup in radio-tolerance. We briefly mentioned this point in the discussion (p14). Our findings show that tardigrades undergo DNA damage at levels roughly similar to radio-sensitive organisms and therefore support a major role for DNA repair in the maintenance of genome integrity after exposure to IR. Nevertheless, we believe that more precise quantification of DNA damage may still reveal a contribution of genome protection to radio-tolerance of tardigrades compared to radio-sensitive organisms. Dsup loss of function experiments in tardigrades would clearly be the best way to assess this possibility. In the absence of experiments directly addressing the function of Dsup, we prefer to refrain from drawing any firm conclusion on prevention of DNA damage by Dsup and thus to keep a more open position. In any case, as discussed in the text, we note that Dsup has only been reported in Hypsibioidea and other molecular players, such as TDR1, are likely involved in radio-tolerance in other tardigrade species.

      (2) While reverse genetics are difficult in tardigrades, they are not impossible, and RNAi can be used to good effect in these animals. In fact several authors on this manuscript have used RNAi to examine the necessity of genes in tardigrade stress tolerance in the past. Was an attempt made to RNAi TDR1? If not, why? With the large amount of work that the authors put into showing the sufficiency of TDR1 for increasing radiotolerance in cell culture, one would think looking at necessity in tardigrades would be of great interest. If RNAi was performed, what were the results? Even a negative result here is informative since a protein can be sufficient but not necessary for a function - if this were the case it would mean tardigrades have some redundant mechanism(s) for surviving radiation exposure beyond TDR1.

      We attempted RNAi experiments targeting TDR1 or a mix of DNA repair genes (including XRCC5) and examined the response to a bleomycin treatment of 2 weeks. Unfortunately, we could not distinguish any difference between uninjected animals and animals injected with TDR1 dsRNAs or with the mix of DNA repair gene dsRNAs. We concluded that bleomycin treatment, which we used because it is much easier to perform than irradiation, was perhaps not the best way to assay a potential impact of RNAi on survival, since it required long-term treatment for several days during which the effect of RNAi may have waned. Another attempt was therefore made, injecting TDR1 or control GFP dsRNAs and exposing animals to a 2000 Gy IR treatment. We noticed that viability was lower after injection with GFP dsRNAs than with TDR1 dsRNAs (likely due to problems we had with the injection needle during injections). The next day, animals were irradiated, and after 24 h we observed that animals injected with GFP dsRNAs exhibited higher lethality rates than animals injected with TDR1 dsRNAs or uninjected animals. We found that this set of experiments was not conclusive. Our current experimental setup makes it difficult to distinguish lethality due to injections from lethality due to potentially decreased resistance to IR. In particular, many key controls are difficult to perform (for example, we could not confirm the efficiency of target gene knockdown, which is very challenging given the low amount of biological material available and the poor expression of these genes without irradiation). From a practical point of view, performing these experiments is thus very challenging. We nevertheless agree that, in future work, further experimentation is needed to examine the impact of RNAi knock-down of TDR1 or of other genes, such as DNA repair genes or Dsup, on tardigrade DNA repair and survival after IR. Gene knock-out with CRISPR-Cas9 is a very promising alternative to RNAi, given that studies in mutant lines will eliminate the confounding effect of lethality due to injections.

      (3) Regarding the U2OS experiments. I have several questions/points of clarification:

      a. Were survival/proliferation levels tested or only H2AX foci? I think that showing decreased H2AX foci (fewer double-stranded breaks) correlates with higher survival rates would be important.

      In the experiments reported in Figure 6, cells were transiently transfected with expression vectors and we did not examine the impact on survival rates. U2OS cells are resistant to high doses of Bleomycin and testing survival would require longer exposure at much higher concentrations (Buscemi et al, 2014, PMID: 25486478). In order to try and better address an impact on cell survival, we therefore generated populations of cells stably expressing the candidate tardigrade proteins fused to GFP. Despite trying different experiment conditions for treatment with Bleomycin, we could not detect a reproducibly significant benefit on cell survival for any of the tardigrade proteins tested, including RvDsup which was used as a positive control (since it was previously reported to improve cell survival in response to X-rays). One possibility is that the analysis should be performed in clones and not in populations of cells with heterogeneous expression levels of the tardigrade protein tested. For example, expression levels of the tardigrade protein needed to reduce the number of phospho-H2AX foci in response to DNA damage may interfere with cell division. We note that in the original Dsup paper, the benefit of RvDsup on cell survival was reported in specific transgenic clones. Experiments in different biological systems have also started to document toxic effects of RvDsup expression, illustrating the challenge, when performing experiments in heterologous systems, to achieve suitable expression levels of the tested protein. Trying to perform such a finer analysis, in our opinion, would go beyond the scope of our manuscript and will be best addressed in future studies. We are therefore careful in the text not to make any claim on the benefit of TDR1 expression on cell survival in response to Bleomycin in human cultured cells.

      (b) From the methods I am a bit confused as to how the images were treated/foci quantified. With the automatic segmentation and foci identification, is this done through the entire Z-series or a single layer? If the latter then I am not sure the results are meaningful, since we do not know how many foci might be present in other layers of the nuclei analyzed. If the former, please clarify this in the method since it is a very important consideration.

      We acquired images throughout the entire Z-series and edited the text to make this clearer. We now write: “Z-stacks were maximum projected and analyzed with Zen Blue software (v2.3)...”. To limit the time needed for image analysis, we generated an artificial image by projecting the entire Z-series into a single image and counted foci in that single maximum projection image. Although there are potential drawbacks, such as counting only one focus when two foci are superposed along the Z axis, this approach overcomes the limitations of quantification from a single layer. We further ensured statistical robustness of the analysis by performing quantification from several independent fields of the labelled cells and several independent biological replicates (n>=3 as now specified in the legend of figure 6a).
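      The trade-off described in this response can be made concrete with a toy example. The sketch below is not the Zen Blue workflow: it represents a Z-stack as nested Python lists, treats each bright voxel as one focus, and uses hypothetical function names and intensity values. It only illustrates that a maximum projection keeps foci from every layer but merges two foci that share the same x/y position.

```python
def max_project(stack):
    """Collapse stack[z][y][x] into image[y][x] by taking the maximum over z."""
    depth = len(stack)
    height = len(stack[0])
    width = len(stack[0][0])
    return [
        [max(stack[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


def count_bright_voxels(stack, threshold):
    """Count voxels at or above threshold across the whole 3D stack."""
    return sum(1 for layer in stack for row in layer for v in row if v >= threshold)


def count_bright_pixels(image, threshold):
    """Count pixels at or above threshold in a 2D image."""
    return sum(1 for row in image for v in row if v >= threshold)


# Toy stack: three layers of 2 x 3 pixels with three single-voxel "foci".
# Two foci sit at the same (y, x) position in layers 0 and 2.
toy_stack = [
    [[9, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 9]],
    [[9, 0, 0], [0, 0, 0]],
]
projection = max_project(toy_stack)
print(count_bright_voxels(toy_stack, threshold=5))   # foci in the full stack
print(count_bright_pixels(projection, threshold=5))  # foci after projection
```

      In this toy case a single layer would show at most one focus, the projection shows two, and the full stack contains three, so the projection undercounts only where foci overlap along Z.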

      (c) RvDsup reduced levels of H2AX foci in these experiments; however, HeDsup was not found to be enriched in the transcriptomic analysis performed here. Was there a reason HeDsup was not used in the cell-based experiments? One could argue that RvDsup is from a different species of tardigrade, but it is a bit concerning that an ortholog of a protein found NOT to be induced by radiation exposure seems to perform as well as (if not better than) some versions of TDR1.

      RvDsup is the protein initially shown to increase survival of human HEK293 cells treated with X-rays and reduce the number of phospho-H2AX foci induced: it was therefore used as a positive control in our experiments. The sequence of HeDsup is only poorly similar to RvDsup (with 26% identity) and activity of HeDsup in cultured cells has not been reported before. We therefore believe that HeDsup is not well suited to provide a positive control for the experiments performed in our manuscript.

      (d) From the methods, it seems that cells were treated with Bleomycin and then immediately fixed without any sort of recovery time. In this short timeframe, the presence of TDR1 appears to be enough to deal with a substantial amount of double-stranded breaks (as evidenced by the reduced number of H2AX foci). Does this make sense? How quickly could one expect DNA repair machinery to make significant progress in resolving damaged DNA? This response seems much faster than what was observed in tardigrades. Perhaps the authors could comment on this.

      Kinetic studies in human cells show extremely rapid repair of DNA double-strand breaks. Sensing of DNA double-strand breaks by PARP proteins takes place within seconds after irradiation by IR (Pandey and Black, 2021, PMID: 33674152). NHEJ is then observed to take place by formation of 53BP1 foci within 15 minutes (Schultz et al, 2000, PMID: 11134068). The number of phospho-H2AX and 53BP1 foci peaks at 30 minutes and starts declining thereafter, showing that at a significant number of sites, DNA repair is proceeding very rapidly (by NHEJ). Although we are not aware of any studies of DNA repair kinetics in U2OS cells after addition of Bleomycin, DNA damage must be instantaneous and further take place during exposure to the drug in parallel to DNA repair, which would be expected to have kinetics similar to those observed after irradiation with IR.

      In our experiments, several mechanisms may be involved in reducing the number of phospho-H2AX foci induced by Bleomycin, such as DNA protection (for Dsup expression) or stimulation of DNA repair (for RNF146 expression). For TDR1, the molecular mechanism involved remains to be determined. Given our finding that TDR1 can form aggregates with DNA, an additional possibility is that clustering of phospho-H2AX foci is induced.

      (4) I could not find the sequences of the TDR1 proteins studied here. I did find the cDNA sequence of HeTDR1 in the final supplementary file, but not the other TDR1 orthologs. In the place where it appeared the TDR1 sequences from other tardigrades should be there were very short segments of the HETDR1 sequence. All sequences of proteins used in this study should be easily accessible to the reader and reviewers as it is not possible to review this work without accessing the sequences.

      Our apologies for the inappropriate documentation of TDR1 sequences in the original manuscript. As requested, we have now included the TDR1 sequences in the Supplementary Table 4.

      (5) Likewise, the RNA sequence data is said to be deposited in NCBI under PRJNA997229, but I do not find this available on NCBI.

      The RNA sequence data was deposited in NCBI under the indicated reference before submission of the manuscript. The data has now been released and is fully available on NCBI.

      (6) A few typographical errors: e.g., Page 10 - sentence 4 has two periods ". ." or page 14 which has an open parenthesis that is not closed.

      These typos have been corrected in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      We thank the reviewer for his comments.

      In Figure 4C, what fraction of the 50 genes upregulated in all species and treatments are DNA repair genes? Is there any other notable commonality between these 50 genes? The bulk of upregulated genes are specific to a species and to treatment with IR or bleomycin. What fraction of DNA repair genes are specific to a species or treatment?

      The results in Figure 4C on the 50 putative orthologous genes upregulated in all species and treatments are further detailed in Supp Figure 10. The legend to Supp Figure 10 now provides the requested information: 14/50 genes are DNA repair genes and the other notable commonality is that 21/50 are “stress response genes”. We did not further break down the analysis to evaluate the fraction of DNA repair genes specific to a species or treatment. It will be interesting to gather data in more species to shed light on the evolutionary history of DNA repair gene regulation in response to IR.

      How does the suite of upregulated tardigrade DNA repair proteins after IR or bleomycin compare with DNA or repair proteins upregulated under similar treatments in human cells? Are they quantitatively or qualitatively different, or both?

      There is a great wealth of studies documenting genes differentially expressed in human cells in response to IR (e.g. Borras-Fresneda et al, 2016, PMID: 27245205; Rieger and Chu, 2004, PMID: 15356296; Budworth et al, 2012, PMID: 23144912; Rashi-Elkeles et al, 2011, PMID: 21795128; Jen and Cheung, 2003, PMID: 12915489...). Upregulation of DNA repair and cell cycle genes is commonly found. However, the number of DNA repair genes induced is always very limited and the fold stimulation very modest compared to the massive upregulation observed in tardigrades.

      On page 14, please explain the acronym BER. Do the authors mean Base Excision Repair? Or something else?

      As assumed by the reviewer, the acronym BER stands for Base Excision Repair. The acronym has been removed from the main text and replaced by the full name.

      Reviewer #4 (Recommendations For The Authors):

      We thank the reviewer for his comments.

      Abstract:

      The abstract is fine. What was hard to grasp at the beginning is why the TDR1 gene was named that way. It should be clearer that this study decided to further focus on that gene, one of the most overexpressed genes after IR, with an unknown function. Then maybe introduce that it was found to be unique to tardigrades and to interact with DNA. Therefore, it was named TDR1.

      Introduction:

      The introduction has been modified according to the suggestions of Reviewer#4 below. One of the suggested references, Nicolas et al 2023 from the Van Doninck lab, was published while our manuscript was under review and cannot be considered as background information for our study.

      1st paragraph:

      The study is on tardigrades, I found it strange that the first paragraph is on D. radiodurans. I think it is fine to mention what is known in bacteria and eucaryotes but we should already know what will be the main topic in the first paragraph of the introduction. Some details about D. radiodurans seem less important and distracting from the main topic (3D conformation).

      2nd paragraph:

      When mentioning radio-resistant eucaryotes the authors do not mention the larvae of the anhydrobiotic insect Polypedilum vanderplanki. Stating that the mechanisms of resistance are poorly characterized should perhaps be nuanced. There are some recent studies on D. radiodurans (Ujaoney et al., 2017), the insect P. vanderplanki (Ryabova et al., 2017), tardigrades (Kamilari et al., 2019), and rotifers (Nicolas et al., 2023, Moris et al., 2023). Perhaps these papers are worth citing to indicate that, even if mechanisms are not elucidated yet, recent studies suggest some actors involved in their resistance. The sentence stating that DNA repair rather than DNA protection plays a predominant role in the radio-resistance of bdelloid rotifers should also be nuanced. Indeed, many chaperones and antioxidants were mentioned to play a role in the radio-resistance of bdelloid rotifers (Moris et al., 2023). The authors mentioned the reference Hespeels et al., 2023 which is not found in their list of references, I am not sure which paper they refer to. The last sentence of the second paragraph does not mean much. I am not sure what the authors want to state with this. Perhaps they should specify if they mean that the function of many other genes overexpressed after IR remains unknown.

      Still, in the second paragraph, the authors focus on rotifers. They also do not mention what is known in the insect P. vanderplanki, which should be added. They still do not mention tardigrades. I think it is nice to first start with eucaryotes and then focus on tardigrades but as I mentioned before it would help to understand the aim of the paper if the first paragraph mentioned briefly the tardigrades and then could go into detail in the third paragraph.

      3rd paragraph:

      The sentence starting "with over 1400 species" best to remove from it "but they can differ in their resistance" and start the next sentence with that.

      4th paragraph:

      Very clear, we finally understand what is the focus of the manuscript.

      5th paragraph:

      Very clear. The authors should mention the names of the three studied species. Here, A. antarcticus is missing. The sentence "Further analyses in H. exemplaris... showed that TDR1 protein is present and upregulated". The authors should mention in which conditions the protein is upregulated. In that paragraph the authors mention phospho-H2AX: it might be good to introduce its functions before in the introduction (it is mentioned in the second sentence of the results: best to move it to the introduction).

      Results:

      There are a few sentences in this section which rather discuss the results than describe them. I think the manuscript might gain in quality if these interpretations of the results are moved into the discussion section. That would make the result section more concise and the discussion enriched.

      For instance, I suggest to move these sentences into the discussion:

      • "the finding of persistent DSBs in gonads at 72h.... likely explains...".

      • "suggesting that (i) DNA synthesis..."

      • " Phospho-H2AX....also suggested"

      • "Moreover, expression of TDR1-GFP..., supporting the potential role of TDR1 proteins..."

      • "our results suggest that RNF146 upregulation could contribute..."

      • "AMNP gene g12777 was shown to increase...Based on our results, it is possible that..."

      The interpretations mentioned above were always introduced cautiously (-"suggesting that (i) DNA synthesis..." ; -" Phospho-H2AX....also suggested" ; -"Moreover, expression of TDR1-GFP..., supporting the potential role of TDR1 proteins..." ; -"our results suggest that RNF146 upregulation could contribute..." ). These cautious interpretations were usually important in deciding next steps of the work. We therefore believe it is important to mention these interpretations in the results section to clearly expose the milestones marking the progression of the study.

      For some results, they were directly discussed in the results section for the sake of concision (for example -"the finding of persistent DSBs in gonads at 72h.... likely explains..."; -"AMNP gene g12777 was shown to increase...Based on our results, it is possible that..." ) since, in our opinion, there was no need to mention them again in the main discussion.

      Some other parts could be good to be moved into the introduction:

      • "Previous studies have indicated that irradiation with IR increases expression of Rad51,..." none of the actors involved in DNA repair are mentioned in the introduction. Also, change resistant into resistance

      • "A. antarcticus ..., known for its resistant to high doses of UV....

      We have moved these parts to the introduction as recommended.

      It was in O. areolatus.... that the first demonstration..."

      This piece of information is somewhat anecdotal. We chose to keep it here in the results section. This information on the radio-resistance of the species P. areolatus is only relevant at this specific step of the study because it encouraged us to consider that P. fairbanksi, which we isolated fortuitously, would be a good model species for studying radio-resistance of tardigrades.

      Here are some additional comments/suggestions on the result section:

      1st section

      • Remove the Gross et al., 2018 from the sentence "using confocal microscopy", it looks otherwise that these results are from their study, not yours.

      We have changed the text to make it clear that this is indeed a finding of Gross et al which was previously made in non-irradiated tardigrades. We replicated this finding, which showed that the protocol was working appropriately, and that we could use this control result for comparison with irradiated animals. We apologize for this confusion.

      The text now states: “Using confocal microscopy, we could detect DNA synthesis in replicating intestinal cells of control animals, as previously shown by (Gross et al. 2018).”

      2nd section

      • It is confusing what has been found induced by IR and/or by Bleomycin.

      • I think it might help if the authors first present what is induced after IR, then write if it is similar after Bleomycin. Especially since they start to do it in the first paragraph of that section. However, they only mention TDR1 in the second paragraph dedicated to Bleomycin treatment which is confusing as it is also overexpressed after IR. It is also not clear if RNF146 is also induced by Bleomycin.

      As recommended, the text presents first what is induced after IR and then what is induced by Bleomycin in the following paragraph. When reporting results with Bleomycin, we have provided a global assessment of what is common to both treatments in Supp Figure 3 and in Supp Table 3. In this figure, we also specifically highlighted several key genes of DNA repair induced by both treatments. These are also mentioned in the text (p8) to illustrate the point that many key DNA repair genes are common to both treatments. We have now added RNF146 to that list as recommended.

      • Regarding TDR1, it is not clear when introduced in the text as "promising candidate" why it is the case. It is clear in the figures but perhaps the authors should explain why they chose these genes for further analyses: high log2foldchange and expression level for instance. Regarding that last comment, it would be interesting to have an idea about the expression level of the genes with high log2foldchange. In Figures 2, 3, and 4 the pvalue and log2foldchange are represented but not the expression level (ideally Transcript per Millions). These values would give an additional idea on the importance of that gene. While looking at the figures, it is unclear why you did not further characterize other genes with high log2foldchange (some with even hints of their function): the mentioned RNF146, macroH2A1 (not even mentioned in the results), some genes unannotated in the figures with likely unknown functions,

      When selecting genes of interest, we did indeed take into account high expression levels. To more clearly document expression levels (which were already available from the Tables), we added MAplots (representing log2foldchange and logNormalized read counts) in the supplementary materials (Supp Figure 3 and Supp Figure 8).

      • It is also unclear at that stage why you named it "Tardigrade DNA damage response protein", as it is characterized as DNA repair/damage proteins by specific GO id or is it based on your downstream analyses, I think it might be worth to quickly mention the reason of that name.

      The name illustrates two points which were already characteristic at this point in time of the study i.e. 1) it is a tardigrade specific protein and 2) it is induced in response to DNA damage.

      • Regarding the BLAST analyses the protein was searched in C. elegans, D. melanogaster and H. sapiens. Why only these three species? What were the e-value thresholds used for these analyses? As mentioned in the main comment, it would be worth searching species phylogenetically close to tardigrades to verify whether it is indeed tardigrade-specific. Did you try to make a gene tree, after looking for a conserved domain (using hmmersearch)?

      As indicated in the methods section, the “Tardigrade-specific" annotation was determined by absence of hits after high-throughput alignment (with diamond using the --ultra-sensitive option) on the NCBI nr database and absence of hits after blast search on C. elegans, D. melanogaster and H. sapiens proteomes as a complementary criterion (the latter blast search was primarily performed to enrich for functional annotations). Based on these criteria, TDR1 was annotated as “Tardigrade-specific”. As stated in the text, we also searched for TDR1-related sequences with 1) blastp (which is more sensitive than diamond) on the NCBI nr database and 2) HMMER on Reference Proteomes, and no hits were found among non-tardigrade ecdysozoan organisms, confirming that TDR1 is specific to tardigrades. For the blastp search, for example, there were five hits in non-ecdysozoan organisms (two cephalochordates, one mollusc and two echinoderms). The blastp and HMMER results are now included in the revised supplementary material (Supp Table 5). These very few hits in species phylogenetically distant from tardigrades cannot be taken to support the existence of TDR1 genes outside tardigrades.

      To be clearer in the manuscript, we now state the absence of hits for TDR1 in non-tardigrade ecdysozoans. Given the absence of homologs in non-tardigrade species, it is not possible to make a gene tree with non-tardigrade species.
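
      The reasoning above (hits are tallied by taxonomic group, and only hits in close relatives would argue against tardigrade specificity) can be sketched in Python. This is an illustrative sketch only, not the pipeline used in the study: the function names and the example hit list are hypothetical, and the only details taken from the response are the five distant hits (two cephalochordates, one mollusc, two echinoderms). The number of tardigrade hits in the example is invented for illustration.

```python
from collections import Counter

# Phyla conventionally placed in Ecdysozoa (moulting animals).
ECDYSOZOAN_PHYLA = {
    "Tardigrada", "Arthropoda", "Onychophora", "Nematoda",
    "Nematomorpha", "Priapulida", "Kinorhyncha", "Loricifera",
}

GROUPS = ("tardigrade", "other_ecdysozoan", "non_ecdysozoan")


def classify_phylum(phylum):
    """Assign the phylum of a search hit to one of three coarse groups."""
    if phylum == "Tardigrada":
        return "tardigrade"
    if phylum in ECDYSOZOAN_PHYLA:
        return "other_ecdysozoan"
    return "non_ecdysozoan"


def summarize_hits(hit_phyla):
    """Count hits per group; groups without hits are reported as 0."""
    counts = Counter(classify_phylum(p) for p in hit_phyla)
    return {group: counts.get(group, 0) for group in GROUPS}


def lacks_close_relatives(hit_phyla):
    """True when no hit falls in a non-tardigrade ecdysozoan phylum."""
    return summarize_hits(hit_phyla)["other_ecdysozoan"] == 0


# Hypothetical hit list: cephalochordates are chordates; the two
# tardigrade entries are placeholders, not reported results.
example_hits = [
    "Tardigrada", "Tardigrada",
    "Chordata", "Chordata", "Mollusca",
    "Echinodermata", "Echinodermata",
]

print(summarize_hits(example_hits))
print(lacks_close_relatives(example_hits))
```

      In this toy example the five distant hits are counted but do not change the outcome, because the criterion only asks whether any non-tardigrade ecdysozoan is hit.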

      • Page 9: "Proteins extracts from H. exemplaris... at 4h and 24h..." I think this sentence can be removed as this is mentioned again 2 paragraphs after: "...we conducted an unbiased proteome analysis... at 4h..." The log2foldchange threshold mentioned for the proteomic analyses is 0.3: why this threshold, was it chosen randomly?

      This threshold is commonly used when considering log2foldchange with the technology used in our study, an isobaric multiplexed quantitative proteomic strategy which is known to compress ratios (Hogrebe et al. 2018).

      • Page 10:

      It would be good for more clarity to indicate at the beginning of the new section which species were investigated after IR or Bleomycin treatment.

      TDR1 homologs in the other tardigrade species were identified based on what? Best reciprocal hit?

      As indicated in the methods section of the manuscript, we searched for homologs in other tardigrade species by BLAST. A best reciprocal hit approach was not performed to try to determine which homologs might be orthologs. In particular, most TDR1 homologs identified are known from transcriptome assemblies and high-contiguity genome assemblies are needed to more confidently identify orthology (using synteny). The results of the BLASTP search are now provided as supplementary material (Supp Table 5).
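
      For readers unfamiliar with the best reciprocal hit criterion raised by the reviewer, a minimal Python sketch follows. It was not performed in this study and is shown only to make the criterion concrete: the function names, the sequence identifiers and the bit scores are all hypothetical, and a real analysis would start from actual BLAST output and handle ties and e-value cut-offs explicitly.

```python
def best_hits(scores):
    """Return the top-scoring subject for each query.

    `scores` maps (query, subject) pairs to a similarity score such as
    a BLAST bit score. On ties, the first pair encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores for two species, A and B.
a_vs_b = {("A1", "B1"): 200.0, ("A1", "B2"): 150.0, ("A2", "B2"): 90.0}
b_vs_a = {("B1", "A1"): 198.0, ("B2", "A1"): 140.0, ("B2", "A2"): 95.0}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

      Here A2's best hit is B2, but B2's best hit is A1, so only the A1/B1 pair is retained. This illustrates why the criterion is stricter than a one-directional homology search.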

      Preliminary experiments indicated that A. antarcticus and P. fairbanksi survived exposure to 1000 Gy: is there a supplementary graph showing this?

      We have corrected the text to avoid any confusion. We have not rigorously examined the dose-dependent survival of P. fairbanksi in response to irradiation. Text was changed to: “We found by visual inspection of animals after IR that A. antarcticus and P. fairbanksi readily survived exposure to 1000 Gy.”

      • Page 11:

      "A set of 50 genes was upregulated in the three species": please be precise if only after IR.

      Done

      These genes cannot be the same as they are from different species. Did the author mean that they are coding for similar proteins? It might be good to give some more details even if the supplementary figure is mentioned.

      Obviously, these genes are putative orthologs. We have changed the text to:

      “a set of 50 putative orthologous genes was upregulated in response to IR in all three species”

      Discussion:

      • General comment: the discussion is focused mainly on TDR1, it would be nice to also discuss the other results: DNA repair genes, RNF146.

      A whole paragraph is devoted to discussion of results on DNA repair genes and RNF146. We have extended that discussion following the suggestion of the reviewer. In particular, we have explicitly mentioned the apparent paradox that XRCC5 and XRCC6, which are among the most highly stimulated genes at the mRNA level, only display modest upregulation at the protein level. Although further studies would be needed to examine the mechanisms involved, we propose that upregulation of RNF146, whose human homolog has been shown to drive degradation of PARylated XRCC5 and XRCC6 proteins in response to IR (Kang et al. 2011), may be responsible for higher degradation rates and may thus counterbalance increased levels of protein synthesis.

      • Pulse field electrophoresis would be nice to be performed. It has been used to assess DSBs in bdelloid rotifers, is it possible in tardigrades?

      As stated in the discussion, we believe that it would be challenging to perform pulse field electrophoresis in tardigrades. However, if possible, these experiments would certainly bring invaluable information to complement our analysis of DNA damage induced by IR.

      • "By comparative transcriptomics": please rephrase that sentence.

      • Proteins acting early in DNA repair: I am not sure I understand this sentence. Actors as ligases act not at the beginning of the repair pathways.

      Well noted. We have removed ligases from the list.

      • It is confusing that the authors mention NHEJ and double-strand break repair pathways as different pathways. There are 2 main pathways to repair DSBs: NHEJ and HR. It would be nice to add a reference to the sentence "PARP proteins act as sensors of DNA damage etc."

      A typo in the sentence gave rise to the misleading suggestion that NHEJ is not a double strand repair pathway. It has been corrected.

      A reference has been added for PARP proteins.

      • It would be nice if the authors can explain deeper their suggestion that degradation of DNA repair actors is essential for tardigrade IR resistance.

      We have expanded this part of the discussion and hope that it is clearer.

      “For XRCC5 and XRCC6, our study established, by two independent methods, proteomics and Western blot analyses, that the stimulation at the protein level could be much more modest (6- and 20-fold at most; Supp Figure 6) than at the RNA level (420- and 90-fold respectively). This finding suggests that the abundance of DNA repair proteins does not simply increase massively to quantitatively match high numbers of DNA damages. Interestingly, in response to IR, the RNF146 ubiquitin ligase was also found to be strongly upregulated. RNF146 was previously shown to interact with PARylated XRCC5 and XRCC6 and to target them for degradation by the ubiquitin-proteasome system (Kang et al. 2011). To explain the lower fold stimulation of XRCC5 and XRCC6 at the protein level, it is therefore tempting to speculate that XRCC5 and XRCC6 protein levels (and perhaps that of other scaffolding complexes of DNA repair as well) are regulated by a dynamic balance of synthesis, promoted by gene overexpression, and degradation, made possible by RNF146 upregulation. Consistent with this hypothesis, we found that, similar to human RNF146 (Kang et al. 2011), He-RNF146 expression in human cells reduced the number of phospho-H2AX foci detected in response to Bleomycin (Figure 6).”

      • Page 15: Please add a reference for the sentence "Functional analysis of promoter sequences in transgenic tardigrades etc."

      The reference has been added to fix this omission.

      Material and Methods:

      Small comments:

      • 40 μm mesh: space missing

      • 100 μm mesh: space missing

      • (for Bleomycin)): parenthesis missing

      • remove "as indicated in the text"

      • The investigated time points after radiation need to be clearly stated in the method section. It is also unclear in the IR and Bleomycin section which tardigrades were treated with what. Not all were treated with Bleomycin.

      The small comments above have been fixed in the revised version of the manuscript.

      • Page 21: please specify the coverage of the RNA sequencing

      Statistics on mapping of RNAseq reads are now provided in Supp Table 10.

      • Page 22: Was any read trimming performed? Anything about the quality check of the reads?

      Trimming was conducted using Trimmomatic (v0.39) and quality check using FastQC (v. ?). This information has been added to the Methods section.

      • Were the analyses confirmed by a second approach: for instance, edgeR? DESeq2 and edgeR do not always have the same results. For more robust analyses it is advised to use both.

      Differential transcriptome analyses were conducted with DESeq2 only. The robustness of our identification of differentially expressed genes in response to IR stems from performing comparative analyses in three different species, rather than from using two bioinformatics pipelines in a single species. We also note that benchmarking reported in the initial DESeq2 paper showed that identification of differentially expressed genes with large log fold changes (which, as reported in our manuscript, is characteristic of many DNA repair genes in response to IR) is very consistent between DESeq2 and edgeR.

      Figures:

      • Figure 2: Legend vertical dotted line does not indicate log2foldchange value of 4 in all panels: it would be good to indicate for panels a and c as well.

      Figure 2 has been improved following the suggestions of the reviewer. Dotted lines now show a log2foldchange value of 2 in all panels (i.e. a Fold Change of 4 as mentioned in the main text).
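
      The correspondence between the two scales is simply fold change = 2^(log2foldchange). A minimal Python sketch of the conversion is given below; the function names are illustrative and are not taken from the analysis code.

```python
import math


def fold_change_to_log2(fold_change):
    """Convert a linear fold change (> 0) to a log2 fold change."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold_change)


def log2_to_fold_change(log2_fold_change):
    """Convert a log2 fold change back to a linear fold change."""
    return 2.0 ** log2_fold_change


# The dotted line in Figure 2: log2foldchange of 2 is a fold change of 4.
print(log2_to_fold_change(2))
print(fold_change_to_log2(4))

# The proteomics threshold of 0.3 on the log2 scale, in linear terms.
print(round(log2_to_fold_change(0.3), 2))
```

      On the same scale, the log2foldchange threshold of 0.3 used for the proteomic analysis corresponds to a fold change of about 1.23.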

      • Figure 2C: There are a few points with high log2foldchange which are not annotated: was it because nothing was found in the blast research? If yes, it would be good to indicate their functions. If not, it would be good to mention in the discussion that there are some genes with still unknown functions which might play an important role in the resistance of tardigrades to IR.

      The few points which are not annotated in Figure 2c can now be found in Supp Table 3. Some of them have no hit in Blast search; others, such as BV898_09662 or BV898_07145, have hits on DNA repair genes (RBBP8/CtIP and XRCC6 respectively) but are not annotated as such by eggNOG in KEGG pathways.

      • Figure 4C: Why not have included the response of P. fairbanksi to bleomycin? I guess it was not done, but it is unclear in the results and methods sections.

      The response of P. fairbanksi to bleomycin was not assessed because we did not obtain enough animals to run the study. The methods section has been modified to specify this point.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript aims to provide insights into the mediators and mechanisms underlying tardigrade radiation tolerance. The authors start by assessing the effect of ionizing radiation (IR) on the tardigrade lab species, H. exemplaris, as well as the ability of this organism to recover from this stress - specifically they look at DNA double and single strand breaks. They go on to characterize the response of H. exemplaris and two other tardigrade species to IR at the transcriptomic level. Excitingly, the authors identify a novel gene/protein called TDR1 (tardigrade DNA damage response protein 1). They carefully assess the induction of expression/enrichment of this gene/protein using a combination of transcriptomics and biochemistry - even going so far as to use a translational inhibitor to confirm the de novo production of this protein. TDR1 binds DNA in vitro and co-localizes with DNA in tardigrades.

      Reverse genetics in tardigrades is difficult, thus the authors use a heterologous system (human cells) in which to express TDR1. They find that, when transiently expressed, TDR1 helps improve human cell resistance to IR.

      This work is a masterclass in integrative biology incorporating a holistic set of approaches spanning next-gen sequencing, organismal biology, biochemistry, and cell biology. I think the importance of the findings is suitable and honestly, I find very little to critique in their experimental approaches.

      Overall, I find this to be one of the more compelling papers on tardigrade stress-tolerance I have read.

    3. eLife assessment

      This study offers valuable insight into the remarkable resistance of tardigrades to ionizing radiation by showing that radiation treatment induces a suite of DNA repair proteins and by identifying a strongly induced tardigrade-specific DNA-binding protein that can reduce the number of double-strand breaks in human U2OS cells. The evidence of upregulation of repair proteins is convincing, and the case for a role of the newly identified protein in repair can be strengthened as genetic tools for tardigrades become better developed. The results will interest the fields of DNA repair and radiobiology as well as tardigrade biologists.

    4. Reviewer #3 (Public Review):

      Summary:

      This paper describes transcriptomes from three tardigrade species with or without treatment with ionizing radiation (IR). The authors show that IR produces numerous single-strand and double-strand breaks as expected and that these are substantially repaired within 4-8 hours. Treatment with IR induces strong upregulation of transcripts from numerous DNA repair proteins, and from the newly described protein TDR1 with homologs in both Hypsibioidea and Macrobiotoidea superfamilies. The authors show that TDR1 transcription produces newly translated TDR1 protein, which can bind DNA and co-localizes with DNA in the nucleus. At higher concentrations TDR1 appears to form aggregates with DNA, which might be relevant to a possible function in DNA damage repair. When introduced into human U2OS cells treated with the radiomimetic drug bleomycin, TDR1 reduces the number of double-strand breaks as detected by gamma H2AX spots. This paper will be of interest to the DNA repair field and to radiobiologists.

      Strengths:

      The paper is well-written and provides solid evidence of the upregulation of DNA repair enzymes after irradiation of tardigrades, as well as upregulation of the TDR1 protein. The reduction of gamma-H2A.X spots in U2OS cells after expression of TDR1 supports a role in the DNA damage response.

      Weaknesses:

      Genetic tools are still being developed in tardigrades, so there is no mutant phenotype to support a DNA repair function for TDR1, but this may be available soon.

    5. Reviewer #4 (Public Review):

      In this study, Anoud et al. show convincing results of genes involved in the radio-resistance of tardigrades. With transcriptomics, they found many genes involved in DNA repair pathways to be overexpressed after ionizing radiation. In addition, they found RNF146 coding for a ubiquitin ligase, and genes of the AMNP family. Finally, they more deeply characterized one upregulated gene that they named TDR1 (Tardigrade DNA damage Response 1) which seems specific to tardigrades. With proteomics they verified these results. They show that TDR1 binds DNA in vitro and co-localizes with DNA in tardigrades. Because of the difficulties of carrying out reverse genetics in tardigrades, the authors showed in vitro that expression of TDR1 in human cells led to a reduced number of phospho-H2AX foci (indicating DNA damage) when treated with Bleomycin. Based on these results, the authors suggested that TDR1 interacts with DNA and might regulate chromosomal organization and favor DNA repair.

      Strengths:

      The paper provides solid evidence of the upregulation of DNA repair enzymes after irradiation of tardigrades, as well as upregulation of the TDR1 protein.

      The reduction of gamma-H2A.X spots in U2OS cells after expression of TDR1 supports a role in the DNA damage response.

      The shown interaction of TDR1 with DNA.

      Weaknesses:

      No reverse genetics to support a DNA repair function for TDR1, even if I recognize that these remain difficult to carry out in tardigrades.

      No pulse field electrophoresis gels to show DNA damages in tardigrades, which remain apparently challenging to perform in tardigrades.

      After revision, the manuscript gained in structure, and in precision.

      Overall, the manuscript provides valuable and convincing results contributing to our knowledge of tardigrade radio resistance. While reverse genetics remain difficult to carry out in tardigrades, the authors used the alternative approach of investigating TDR1 function in vitro in human cells.

      This study illustrates integrative biology as it combines a set of different methodologies including next-generation sequencing, transcriptomic and proteomic analyses, immunohistochemistry, immunolabelling, in vitro assays and SEM. In my opinion, the quality and importance of the results make it of interest to the fields of DNA repair, radiobiology, and radio resistance.

    1. Author response:

      Reviewer #1 (Public Review):

      This study makes a substantial contribution to our understanding of the molecular evolutionary dynamics of microbial genomes by proposing a model that incorporates relatively frequent adaptive reversion mutations. In many ways, this makes sense from my own experience with evolutionary genomic data of microbes, where reversions are surprisingly familiar as evidence of the immense power of selection in large populations.

      One criticism is the reliance on one major data set of B. fragilis to test fits of these models, but this is relatively minor in my opinion and can be caveated by discussion of other relevant datasets for parallel investigation.

      We analyze data from 10 species of the Bacteroidales family, and we compare it to a dataset of Bacteroides fragilis. We have now added a reference to a recent manuscript from our group showing phenotypic alteration by reversion of a stop codon and further breaking of the same pathway through stop codons in other genes in Burkholderia dolosa on page 9, and have added a new analysis of codon usage in support of the reversion model on page 14.

      We have chosen not to analyze other species as there are no large data sets with rigorous and evenly-applied quality control across scales. We anticipate the reversion model would be able to fit the data in these cases. We now note that this work remains to be done in the discussion.

      Another point is that this problem isn't as new as the manuscript indicates, see for example https://journals.asm.org/doi/10.1128/aem.02002-20 .

      Loo et al put forward an explanation similar to the purifying model proposed by Rocha et al, which we refute here. Quoting from Loo et al: “Our results confirm the observation that nonsynonymous SNPs are relatively elevated under shorter time periods and that purifying selection is more apparent over longer periods or during transmission.” While there is some linguistic similarity between the weak purifying model and our model of strong local adaptation and strong adaptive reversion, we believe that the dynamical and predictive implications suggested by the reversion model are an important conceptual leap and correction to the literature. We now cite Loo et al and additional works cited therein. We have updated the abstract, introduction, and discussion to further emphasize the distinction of the reversion model from previous models: namely the implication of the reversion model that long-time scale dN/dS hides dynamics.

      Nonetheless, the paper succeeds by both developing theory and offering concrete parameters to illustrate the magnitudes of the problems that distinguish competing ideas, for example, the risk of mutational load posed in the absence of frequent back mutation.

      Reviewer #2 (Public Review):

      This manuscript asks how different forms of selection affect the patterns of genetic diversity in microbial populations. One popular metric used to infer signatures of selection is dN/dS, the ratio of nonsynonymous to synonymous distances between two genomes. Previous observations across many bacterial species have found dN/dS decreases with dS, which is a proxy for the divergence time. The most common interpretation of this pattern was proposed by Rocha et al. (2006), who suggested the excess in nonsynonymous mutations on short divergence times represent transient deleterious mutations that have not yet been purged by selection.

      In this study, the authors propose an alternative model based on the population structure of human gut bacteria, in which dN is dominated by selective sweeps of SNPs that revert previous mutations within local populations. The authors argue that contrary to standard population genetics models, which are based on the population dynamics of large eukaryotes, the large populations in the human gut mean that reversions may be quite common and may have a large impact on evolutionary dynamics. They show that such a model can fit the decrease of dN/dS in time at least as well as the purifying selection model.

      Strengths

      The main strength of the manuscript is to show that adaptive sweeps in gut microbial populations can lead to small dN/dS. While previous work has shown that using dN/dS to infer the strength of selection within a population is problematic (see Kryazhimskiy and Plotkin, 2008, cited in the paper) the particular mechanism proposed by the authors is new to my knowledge. In addition, despite the known caveats, dN/dS values are still routinely reported in studies of microbial evolution, and so their interpretation should be of considerable interest to the community.

      The authors provide compelling justification for the importance of adaptive reversions and make a good case that these need to be carefully considered by future studies of microbial evolution. The authors show that their model can fit the data as well as the standard model based on purifying selection and the parameters they infer appear to be plausible given known data. More generally, I found the discussion on the implications of traditional population genetics models in the context of human gut bacteria to be a valuable contribution of the paper.

      Thank you for the kind words and appreciation of the manuscript.

      Weaknesses

      The authors argue that the purifying selection model would predict a gradual loss in fitness via Muller's ratchet. This is true if recombination is ignored, but this assumption is inconsistent with the data from Garud, et al. (2019) cited in the manuscript, who showed a significant linkage decrease in the bacteria also used in this study.

      We now investigate the effect of recombination on the purifying selection model on page 8 and in Supplementary Figure S6. In short, we show that reasonable levels of recombination (obtained from literature r/m values) cannot rescue the purifying selection model from Muller’s ratchet when s is so low and the influx of new deleterious mutations is so high. We thank the reviewers for prompting this improvement.

      I also found that the data analysis part of the paper added little new to what was previously known. Most of the data comes directly from the Garud et al. study and the analysis is very similar as well. Even if other appropriate data may not currently be available, I feel that more could be done to test specific predictions of the model with more careful analysis.

      In addition to new analyses regarding recombination and compensatory mutations using the Garud et al data set, we have now added two new analyses, both using Bacteroides fragilis. First, we show that de novo mutations in the Zhao & Lieberman et al dataset include an enrichment of premature stop codons (page 9). Second, we show that genes expected to be under fluctuating selection in B. fragilis display a significant closeness to stop codons, consistent with recent stop codons and reversions. We thank the reviewer for prompting the improvement.

      Finally, I found the description of the underlying assumptions of the model and the theoretical results difficult to understand. I could not, for example, relate the fitting parameters nloci and Tadapt to the simulations after reading the main text and the supplement. In addition, it was not clear to me if simulations involved actual hosts or how the changes in selection coefficients for different sites was implemented. Note that these are not simply issues of exposition since the specific implementation of the model could conceivably lead to different results. For example, if the environmental change is due to the colonization of a different host, it would presumably affect the selection coefficients at many sites at once and lead to clonal interference. Related to this point, it was also not clear that the weak mutation strong selection assumption is consistent with the microscopic parameters of the model. The authors also mention that "superspreading" may somehow make a difference to the probability of maintaining the least loaded class in the purifying selection model, but what they mean by this was not adequately explained.

      We apologize for leaving the specifics of the implementation out of the paper and accessible only through the GitHub page, and have corrected this. We have added a new section in the methods further detailing the reversion model and the specifics of how nloci and Tadapt (now tau_switch as of the edits) are implemented in the code.

      The possibility for clonal interference is indeed included in the simulation. Switching is not correlated with transmissions in our main figure simulations (Figure 4a). When we run simulations in which transmission and selection are correlated, the results remain essentially the same, barring higher variance at lower divergences (new Figure S10). We have now clarified these points in the results, and have also better clarified the selection only at transmission model in the main results.

      Reviewer #3 (Public Review):

      The diversity of bacterial species in the human gut microbiome is widely known, but the extensive diversity within each species is far less appreciated. Strains found in individuals on opposite sides of the globe can differ by as little as handfuls of mutations, while strains found in an individual's gut, or in the same household, might have a common ancestor tens of thousands of years ago. What are the evolutionary, ecological, and transmission dynamics that established and maintain this diversity?

      The time, T, since the common ancestor of two strains, can be directly inferred by comparing their core genomes and finding the fraction of synonymous (non-amino acid changing) sites at which they differ: dS. With the per-site per-generation mutation rate, μ, and the mean generation times roughly known, this directly yields T (albeit with substantial uncertainty of the generation time.) A traditional way to probe the extent to which selection plays a role is to study pairs of strains and compare the fraction of non-synonymous (amino acid or stop-codon changing) sites, dN, at which the strains differ with their dS. Small dN/dS, as found between distantly related strains, is attributed to purifying selection against deleterious mutations dominating over mutations that have driven adaptive evolution. Large dN/dS as found in laboratory evolution experiments, is caused by beneficial mutations that quickly arise in large bacterial populations, and, with substantial selective advantages, per generation, can rise to high abundance fast enough that very few synonymous mutations arise in the lineages that take over the population.
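      The relationship described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes synonymous sites are neutral and that differences accumulate along both lineages, so that dS ≈ 2·μ·T. The function names and the numbers in the usage note are hypothetical.

```python
def divergence_time_generations(dS, mu):
    """Estimate generations since the common ancestor of two strains.

    Assumption (not taken from the manuscript): synonymous sites are neutral
    and both lineages accumulate mutations, so dS is approximately 2 * mu * T.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if dS < 0:
        raise ValueError("dS must be non-negative")
    return dS / (2.0 * mu)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of per-site nonsynonymous to per-site synonymous divergence."""
    if syn_diffs == 0:
        raise ValueError("dS is zero; dN/dS is undefined for this pair")
    dN = nonsyn_diffs / nonsyn_sites
    dS = syn_diffs / syn_sites
    return dN / dS
```

      For example, with an illustrative μ of 1e-9 per site per generation, divergence_time_generations(1e-4, 1e-9) gives about 5e4 generations; converting this to years requires the generation time, which the review notes is substantially uncertain.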

      A number of studies (including by Lieberman's group) have analyzed large numbers of strains of various dominant human gut species and studied how dN/dS varies. Although between closely related strains the variations are large -- often much larger than attributable to just statistical variations -- a systematic trend from dN/dS around unity or larger for close relatives to dN/dS ~ 0.1 for more distant relatives has been found in enough species that it is natural to conjecture a general explanation.

      The conventional explanation is that, for close relatives, the effects of selection over the time since they diverged has not yet purged weakly deleterious mutations that arose by chance -- roughly mutations with sT<1 -- while since the common ancestor of more distantly related strains, there is plenty of time for most of those that arose to have been purged.

      Torrillo and Lieberman have carried out an in-depth -- sophisticated and quantitative -- analysis of models of some of the evolutionary processes that shape the dependence of dN/dS on dS -- and hence on their divergence time, T. They first review the purifying selection model and show that -- even ignoring its inability to explain dN/dS > 1 for many closely related pairs -- the model has major problems explaining the crossover from dN/dS somewhat less than unity to much smaller values as dS goes through -- on a logarithmic scale -- the 10^-4 range. The first problem, already seen in the infinite-population-size deterministic model, is that a very large fraction of non-synonymous mutations would have to have deleterious s's in the 10^-5 per generation range to fit the data (and a small fraction effectively neutral). As the s's are naturally expected (at least in the absence of quantitative analysis to the contrary) to be spread out over a wide range on a logarithmic scale of s, this seems implausible. But the authors go further and analyze the effects of fluctuations that occur even in the very large populations: ~ >10^12 bacteria per species in one gut, and 10^10 human guts globally. They show that Muller's ratchet -- the gradual accumulation of weakly deleterious mutations that are not purged by selection - leads to a mutational meltdown with the parameters needed to fit the purifying selection model. In particular, with N_e the "effective population size" that roughly parametrizes the magnitude of stochastic birth-death and transition fluctuations, and U the total mutation rate to such deleterious mutations this occurs for U/s > log(sN_e) which they show would obtain with the fitted parameters.
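      The meltdown criterion quoted above, U/s > log(sN_e), can be checked numerically. The sketch below is an illustration only: it assumes the logarithm is natural, and the function name and the example values are hypothetical rather than the fitted parameters from the manuscript.

```python
import math


def ratchet_expected(U, s, N_e):
    """Return True when U / s exceeds ln(s * N_e).

    U   : total mutation rate to weakly deleterious mutations (per genome per generation)
    s   : selective disadvantage per mutation (per generation)
    N_e : effective population size
    Assumption: 'log' in the quoted criterion is the natural logarithm.
    """
    if U < 0 or s <= 0 or N_e <= 0:
        raise ValueError("U must be non-negative; s and N_e must be positive")
    return U / s > math.log(s * N_e)
```

      With illustrative values U = 1e-3, s = 1e-5 and N_e = 1e12, U/s is 100 while ln(sN_e) is about 16, so the condition holds; with U = 1e-5, s = 1e-3 and N_e = 1e9 it does not.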

      Torrillo and Lieberman promise an alternate model: that there are a modest number of "loci" at which conditionally beneficial mutations can occur that are beneficial in some individual guts (or other environmental conditions) at some times, but deleterious in other (or the same) gut at other times. With the ancestors of a pair of strains having passed through one too many individuals and transmissions, it is possible for a beneficial mutation to occur and rise in the population, only later to be reverted by the beneficial inverse mutation. With tens of loci at which this can occur, they show that this process could explain the drop of dN/dS from short times -- in which very few such mutations have occurred -- to very long times by which most have flipped back and forth so that a random pair of strains will have the same nucleotide at such sites with 50% probability. Their qualitative analysis of a minimally simple model of this process shows that the bacterial populations are plenty big enough for such specific mutations to occur many times in each individual's gut, and with modest beneficials, to take over. With a few of these conditionally beneficial mutations or reversions occurring during an individual's lifetime, they get a reasonably quantitative agreement with the dN/dS vs dS data with very few parameters. A key assumption of their model is that genetically exact reversion mutations are far more likely to take over a gut population -- and spread -- than compensatory mutations which have a similar phenotypic-reversion effect: a mutation that is reverted does not show up in dN, while one that is compensated by another shows up as a two-mutation difference after the environment has changed twice.
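      The saturation described above (a random pair of strains matching at such sites with 50% probability) can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the authors' simulation: each locus is treated as an independent symmetric two-state process that flips at rate 1/tau_flip along each lineage, and the names expected_differing_loci and tau_flip are hypothetical.

```python
import math


def expected_differing_loci(T, n_loci, tau_flip):
    """Expected number of reversible loci at which two strains differ.

    T        : time since the common ancestor (same units as tau_flip)
    n_loci   : number of loci under alternating selection
    tau_flip : mean time between flips at one locus along one lineage
    Assumption: flips are independent and symmetric, so over the total
    path length 2T the probability of differing is 0.5 * (1 - exp(-4T / tau_flip)).
    """
    if T < 0 or n_loci < 0 or tau_flip <= 0:
        raise ValueError("T and n_loci must be non-negative; tau_flip must be positive")
    p_differ = 0.5 * (1.0 - math.exp(-4.0 * T / tau_flip))
    return n_loci * p_differ
```

      Under these assumptions the number of nonsynonymous differences contributed by such loci rises quickly at short times and then plateaus at n_loci / 2, while dS keeps growing, so their contribution to dN/dS falls with divergence.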

      Strengths:

      The quantitative arguments made against the conventional purifying selection model are highly compelling, especially the consideration of multiple aspects that are usually ignored, including -- crucially -- how Muller's ratchet arises and depends on the realistic and needed-to-fit parameters; the effects of bottlenecks in transmission and the possibility that purifying selection mainly occurs then; and complications of the model of a single deleterious s, to include a distribution of selective disadvantages. Generally, the authors' approach of focusing on the simplest models with as few as possible parameters (some roughly known), and then adding in various effects one-by-one, is outstanding and, in being used to analyze environmental microbial data, exceptional.

      The reversion model the authors propose and study is a simple general one and they again explore carefully various aspects of it -- including dynamics within and between hosts -- and the consequent qualitative and quantitative effects. Again, the quantitative analysis of almost all aspects is exemplary. Although it is hard to make a compelling guess of the number of loci that are subject to alternating selection on the needed time-scales (years to centuries) they make a reasonable argument for a lower bound in terms of the number of known invertible promoters (that can genetically switch gene expression on and off).

      We are very grateful for the reviewer’s kind words and careful reading.

      Weaknesses:

      The primary weakness of this paper is one that the authors are completely open about: the assumption that, collectively, any of possibly-many compensatory mutations that could phenotypically revert an earlier mutation, are less likely to arise and take over local populations than the exact specific reversion mutation. While detailed analysis of this is, reasonably enough, beyond the scope of the present paper, more discussion of this issue would add substantially to this work. Quantitatively, the problem is that even a modest number of compensatory mutations occurring as the environmental pressures change could lead to enough accumulation of non-synonymous mutations that they could cause dN/dS to stay large -- easily >1 -- to much larger dS than is observed. If, say, the appropriate locus is a gene, the number of combinations of mutations that are better in each environment would play a role in how large dN would saturate to in the steady state (1/2 of n_loci in the authors' model). It is possible that clonal interference between compensatory and reversion mutations would result in the mutations with the largest s -- e.g., as mentioned, reversion of a stop codon -- being much more likely to take over, and this could limit the typical number of differences between quite well-diverged strains. However, the reversion and subsequent re-reversion would have to both beat out other possible compensatory mutations -- naively less likely. I recommend that a few sentences in the Discussion be added on this important issue along with comments on the more general puzzle -- at least to this reader! -- as to why there appear to be so few adaptive genetic changes in core genomes on time scales of human lifetimes and civilization.

      We now directly consider compensatory mutations (page 14, SI text 3.2, and Supplementary Figure 12). We show that as long as true reversions are more likely than compensatory mutations overall, (adaptive) nonsynonymous mutations will still tend to revert towards their initial state and not contribute to asymptotic dN/dS, and show that true reversions are expected in a large swath of parameter space. Thank you for motivating this improvement!

      We note in the discussion that directional selection could be incorporated into the parameter alpha (assuming even more of the genome is deleterious) on page 16.

      An important feature of gut bacterial evolution that is now being intensely studied is only mentioned in passing at the end of this paper: horizontal transfer and recombination of core genetic material. As this tends to bring in many more mutations overall than occur in regions of a pair of genomes with asexual ancestry, the effects cannot be neglected. To what extent can this give rise to a similar dependence of dN/dS on dS as seen in the data? Of course, such a picture begs the question as to what sets the low dN/dS of segments that are recombined --- often from genetic distances comparable to the diameter of the species.

      We now discuss the effect of recombination on the purifying selection model on page 8 and in Supplementary Figure S6. In short, we now show that reasonable levels of recombination cannot rescue the purifying selection model from Muller’s ratchet when s is so low and the influx of new deleterious mutations is so high. We thank the reviewers for prompting this improvement.

    2. Reviewer #2 (Public Review):

      This manuscript asks how different forms of selection affect the patterns of genetic diversity in microbial populations. One popular metric used to infer signatures of selection is dN/dS, the ratio of nonsynonymous to synonymous distances between two genomes. Previous observations across many bacterial species have found dN/dS decreases with dS, which is a proxy for the divergence time. The most common interpretation of this pattern was proposed by Rocha et al. (2006), who suggested the excess in nonsynonymous mutations on short divergence times represent transient deleterious mutations that have not yet been purged by selection.

      In this study, the authors propose an alternative model based on the population structure of human gut bacteria, in which dN is dominated by selective sweeps of SNPs that revert previous mutations within local populations. The authors argue that contrary to standard population genetics models, which are based on the population dynamics of large eukaryotes, the large populations in the human gut mean that reversions may be quite common and may have a large impact on evolutionary dynamics. They show that such a model can fit the decrease of dN/dS in time at least as well as the purifying selection model.

      Strengths

      The main strength of the manuscript is to show that adaptive sweeps in gut microbial populations can lead to small dN/dS. While previous work has shown that using dN/dS to infer the strength of selection within a population is problematic (see Kryazhimskiy and Plotkin, 2008, cited in the paper) the particular mechanism proposed by the authors is new to my knowledge. In addition, despite the known caveats, dN/dS values are still routinely reported in studies of microbial evolution, and so their interpretation should be of considerable interest to the community.

      The authors provide compelling justification for the importance of adaptive reversions and make a good case that these need to be carefully considered by future studies of microbial evolution. The authors show that their model can fit the data as well as the standard model based on purifying selection and the parameters they infer appear to be plausible given known data. More generally, I found the discussion on the implications of traditional population genetics models in the context of human gut bacteria to be a valuable contribution of the paper.

      Weaknesses

      The authors argue that the purifying selection model would predict a gradual loss in fitness via Muller's ratchet. This is true if recombination is ignored, but this assumption is inconsistent with the data from Garud, et al. (2019) cited in the manuscript, who showed a significant linkage decrease in the bacteria also used in this study.

      I also found that the data analysis part of the paper added little new to what was previously known. Most of the data comes directly from the Garud et al. study and the analysis is very similar as well. Even if other appropriate data may not currently be available, I feel that more could be done to test specific predictions of the model with more careful analysis.

      Finally, I found the description of the underlying assumptions of the model and the theoretical results difficult to understand. I could not, for example, relate the fitting parameters nloci and Tadapt to the simulations after reading the main text and the supplement. In addition, it was not clear to me if simulations involved actual hosts or how the changes in selection coefficients for different sites was implemented. Note that these are not simply issues of exposition since the specific implementation of the model could conceivably lead to different results. For example, if the environmental change is due to the colonization of a different host, it would presumably affect the selection coefficients at many sites at once and lead to clonal interference. Related to this point, it was also not clear that the weak mutation strong selection assumption is consistent with the microscopic parameters of the model. The authors also mention that "superspreading" may somehow make a difference to the probability of maintaining the least loaded class in the purifying selection model, but what they mean by this was not adequately explained.

    3. eLife assessment

      This valuable study addresses the interpretation of patterns of synonymous and nonsynonymous diversity in microbial genomes. The authors present solid theoretical and computational evidence that adaptive mutations that revert the amino acids to an earlier state can significantly impact the observed ratios of synonymous and nonsynonymous mutations in human commensal bacteria. This paper will be of interest to microbiologists with a background in evolution.

    4. Reviewer #1 (Public Review):

      This study makes a substantial contribution to our understanding of the molecular evolutionary dynamics of microbial genomes by proposing a model that incorporates relatively frequent adaptive reversion mutations. In many ways, this makes sense from my own experience with evolutionary genomic data of microbes, where reversions are surprisingly familiar as evidence of the immense power of selection in large populations.

      One criticism is the reliance on one major data set of B. fragilis to test fits of these models, but this is relatively minor in my opinion and can be caveated by discussion of other relevant datasets for parallel investigation.

      Another point is that this problem isn't as new as the manuscript indicates, see for example https://journals.asm.org/doi/10.1128/aem.02002-20.

      Nonetheless, the paper succeeds by both developing theory and offering concrete parameters to illustrate the magnitudes of the problems that distinguish competing ideas, for example, the risk of mutational load posed in the absence of frequent back mutation.

    5. Reviewer #3 (Public Review):

      The diversity of bacterial species in the human gut microbiome is widely known, but the extensive diversity within each species is far less appreciated. Strains found in individuals on opposite sides of the globe can differ by as little as handfuls of mutations, while strains found in an individual's gut, or in the same household, might have a common ancestor tens of thousands of years ago. What are the evolutionary, ecological, and transmission dynamics that established and maintain this diversity?

      The time, T, since the common ancestor of two strains, can be directly inferred by comparing their core genomes and finding the fraction of synonymous (non-amino acid changing) sites at which they differ: dS. With the per-site per-generation mutation rate, μ, and the mean generation times roughly known, this directly yields T (albeit with substantial uncertainty of the generation time.) A traditional way to probe the extent to which selection plays a role is to study pairs of strains and compare the fraction of non-synonymous (amino acid or stop-codon changing) sites, dN, at which the strains differ with their dS. Small dN/dS, as found between distantly related strains, is attributed to purifying selection against deleterious mutations dominating over mutations that have driven adaptive evolution. Large dN/dS as found in laboratory evolution experiments, is caused by beneficial mutations that quickly arise in large bacterial populations, and, with substantial selective advantages, per generation, can rise to high abundance fast enough that very few synonymous mutations arise in the lineages that take over the population.

      A number of studies (including by Lieberman's group) have analyzed large numbers of strains of various dominant human gut species and studied how dN/dS varies. Although between closely related strains the variations are large -- often much larger than attributable to just statistical variations -- a systematic trend from dN/dS around unity or larger for close relatives to dN/dS ~ 0.1 for more distant relatives has been found in enough species that it is natural to conjecture a general explanation.

      The conventional explanation is that, for close relatives, the effects of selection over the time since they diverged has not yet purged weakly deleterious mutations that arose by chance -- roughly mutations with sT<1 -- while since the common ancestor of more distantly related strains, there is plenty of time for most of those that arose to have been purged.

      Torrillo and Lieberman have carried out an in-depth -- sophisticated and quantitative -- analysis of models of some of the evolutionary processes that shape the dependence of dN/dS on dS -- and hence on their divergence time, T. They first review the purifying selection model and show that -- even ignoring its inability to explain dN/dS > 1 for many closely related pairs -- the model has major problems explaining the crossover from dN/dS somewhat less than unity to much smaller values as dS goes through -- on a logarithmic scale -- the 10^-4 range. The first problem, already seen in the infinite-population-size deterministic model, is that a very large fraction of non-synonymous mutations would have to have deleterious s's in the 10^-5 per generation range to fit the data (and a small fraction effectively neutral). As the s's are naturally expected (at least in the absence of quantitative analysis to the contrary) to be spread out over a wide range on a logarithmic scale of s, this seems implausible. But the authors go further and analyze the effects of fluctuations that occur even in the very large populations: ~ >10^12 bacteria per species in one gut, and 10^10 human guts globally. They show that Muller's ratchet -- the gradual accumulation of weakly deleterious mutations that are not purged by selection - leads to a mutational meltdown with the parameters needed to fit the purifying selection model. In particular, with N_e the "effective population size" that roughly parametrizes the magnitude of stochastic birth-death and transition fluctuations, and U the total mutation rate to such deleterious mutations this occurs for U/s > log(sN_e) which they show would obtain with the fitted parameters.

      Torrillo and Lieberman promise an alternate model: that there are a modest number of "loci" at which conditionally beneficial mutations can occur that are beneficial in some individual guts (or other environmental conditions) at some times, but deleterious in other (or the same) gut at other times. With the ancestors of a pair of strains having passed through one too many individuals and transmissions, it is possible for a beneficial mutation to occur and rise in the population, only later to be reverted by the beneficial inverse mutation. With tens of loci at which this can occur, they show that this process could explain the drop of dN/dS from short times -- in which very few such mutations have occurred -- to very long times by which most have flipped back and forth so that a random pair of strains will have the same nucleotide at such sites with 50% probability. Their qualitative analysis of a minimally simple model of this process shows that the bacterial populations are plenty big enough for such specific mutations to occur many times in each individual's gut, and with modest beneficials, to take over. With a few of these conditionally beneficial mutations or reversions occurring during an individual's lifetime, they get a reasonably quantitative agreement with the dN/dS vs dS data with very few parameters. A key assumption of their model is that genetically exact reversion mutations are far more likely to take over a gut population -- and spread -- than compensatory mutations which have a similar phenotypic-reversion effect: a mutation that is reverted does not show up in dN, while one that is compensated by another shows up as a two-mutation difference after the environment has changed twice.

      Strengths:

      The quantitative arguments made against the conventional purifying selection model are highly compelling, especially the consideration of multiple aspects that are usually ignored, including -- crucially -- how Muller's ratchet arises and depends on the realistic and needed-to-fit parameters; the effects of bottlenecks in transmission and the possibility that purifying selection mainly occurs then; and complications of the model of a single deleterious s, to include a distribution of selective disadvantages. Generally, the authors' approach of focusing on the simplest models with as few as possible parameters (some roughly known), and then adding in various effects one-by-one, is outstanding and, in being used to analyze environmental microbial data, exceptional.

      The reversion model the authors propose and study is a simple general one and they again explore carefully various aspects of it -- including dynamics within and between hosts -- and the consequent qualitative and quantitative effects. Again, the quantitative analysis of almost all aspects is exemplary. Although it is hard to make a compelling guess of the number of loci that are subject to alternating selection on the needed time-scales (years to centuries) they make a reasonable argument for a lower bound in terms of the number of known invertible promoters (that can genetically switch gene expression on and off).

      Weaknesses:

      The primary weakness of this paper is one that the authors are completely open about: the assumption that, collectively, any of possibly-many compensatory mutations that could phenotypically revert an earlier mutation, are less likely to arise and take over local populations than the exact specific reversion mutation. While detailed analysis of this is, reasonably enough, beyond the scope of the present paper, more discussion of this issue would add substantially to this work. Quantitatively, the problem is that even a modest number of compensatory mutations occurring as the environmental pressures change could lead to enough accumulation of non-synonymous mutations that they could cause dN/dS to stay large -- easily >1 -- to much larger dS than is observed. If, say, the appropriate locus is a gene, the number of combinations of mutations that are better in each environment would play a role in how large dN would saturate to in the steady state (1/2 of n_loci in the authors' model). It is possible that clonal interference between compensatory and reversion mutations would result in the mutations with the largest s -- e.g., as mentioned, reversion of a stop codon -- being much more likely to take over, and this could limit the typical number of differences between quite well-diverged strains. However, the reversion and subsequent re-reversion would have to both beat out other possible compensatory mutations -- naively less likely. I recommend that a few sentences in the Discussion be added on this important issue along with comments on the more general puzzle -- at least to this reader! -- as to why there appear to be so few adaptive genetic changes in core genomes on time scales of human lifetimes and civilization.

      An important feature of gut bacterial evolution that is now being intensely studied is only mentioned in passing at the end of this paper: horizontal transfer and recombination of core genetic material. As this tends to bring in many more mutations overall than occur in regions of a pair of genomes with asexual ancestry, the effects cannot be neglected. To what extent can this give rise to a similar dependence of dN/dS on dS as seen in the data? Of course, such a picture begs the question as to what sets the low dN/dS of segments that are recombined --- often from genetic distances comparable to the diameter of the species.

    1. Author response:

      Reviewer #1 (Public Review):

      1) Naphthylamine (1NA), an industrial reagent used in the manufacturing of dyes and pesticides, is harmful to humans and the environment. In the current manuscript, the authors report the successful isolation of a Pseudomonas strain from a former naphthylamine manufacturing site that is capable of degrading 1NA. Using genetic and enzymatic analysis they identified the initial stages of 1NA degradation and the enzymes responsible for downstream processing of 1,2-dihydroxynaphthalene and salicylate. The authors determined the molecular structure of NpaA1, the first enzyme in the pathway, responsible for glutamylation of 1NA. NpaA1 has a broader substrate specificity compared to previously characterized enzymes involved in aromatic amine degradation. They carried out a structural comparison of NpaA1 with glutamine synthetase structures and AlphaFold models of similar enzymes, and put forth a hypothesis to explain the broad substrate specificity of NpaA1.

      The manuscript is well written and easy to understand. The authors carried out careful genetic analysis to identify the genes/enzymes responsible for degradation of 1NA to catechol. They characterized the first enzyme in the pathway, NpaA1, which is responsible for glutamylation of 1NA, and determined the molecular structures of apo-NpaA1, the NpaA1 - AMPPNP complex and the NpaA1 - ADP - Met-Sox-P complex using X-ray crystallography.

      The proposed mechanism of broad substrate specificity of NpaA1, however, is based on comparison of the 1NA-docked NpaA1 structure with St-GS (glutamine synthetase) and an AlphaFold2-predicted model of AtdA1 from an aniline-degrading strain of Acinetobacter sp. The lack of a molecular structure or mutational studies to back the proposed mechanism makes it difficult to agree with it.

      We appreciate your valuable comments. To further demonstrate that the structure of the aromatic amine binding tunnel and active pocket determines the broad substrate specificity of NpaA1, we have conducted additional experiments with several key residue mutants of the binding tunnel for naphthylamine and monocyclic aniline activities. The results provide a more detailed elucidation of the reasons for NpaA1's broad substrate specificity. Specific results and analyses are provided in the subsequent response.

      Reviewer #2 (Public Review):

      Microbial degradation of synthetic organic compounds is the basis of bioremediation. Biodegradation of 1NA has not been previously reported. The report describes a complete study of 1NA biodegradation by a new isolate Pseudomonas sp. strain JS3066. The study includes the enrichment and isolation of the 1NA-degrading bacterium Pseudomonas sp. strain JS3066, the identification of the genes and enzymes involved in 1NA degradation, and the detailed characterization of γ-glutamylorganoamide synthetase by using biochemical and structural analysis. In the discussion, the potential evolution of 1NA degradation pathway, the similarity and difference between γ-glutamylorganoamide synthetase and glutamine synthetase, and the significance were explained. The conclusions were well supported by the results presented.

      We deeply appreciate the reviewer’s comments on the manuscript. We have responded to the recommendations one by one in the later section.

    2. eLife assessment

      This important work identifies a P. aeruginosa strain and enzyme that can degrade 1-naphthylamine, a harmful industrial pollutant. Data resulting from in vivo and structural approaches are compelling, but additional mutagenesis would further test and establish the broad substrate specificity of NpaA1. With these additional data, this paper would be of high interest to biologists and enzymologists studying biodegradation of industrial pollutants.

    3. Reviewer #1 (Public Review):

      (1) Naphthylamine (1NA), an industrial reagent used in the manufacturing of dyes and pesticides, is harmful to humans and the environment. In the current manuscript, the authors report the successful isolation of a Pseudomonas strain from a former naphthylamine manufacturing site that is capable of degrading 1NA. Using genetic and enzymatic analysis they identified the initial stages of 1NA degradation and the enzymes responsible for downstream processing of 1,2-dihydroxynaphthalene and salicylate. The authors determined the molecular structure of NpaA1, the first enzyme in the pathway, responsible for glutamylation of 1NA. NpaA1 has a broader substrate specificity compared to previously characterized enzymes involved in aromatic amine degradation. They carried out a structural comparison of NpaA1 with glutamine synthetase structures and AlphaFold models of similar enzymes, and put forth a hypothesis to explain the broad substrate specificity of NpaA1.

      The manuscript is well written and easy to understand. The authors carried out careful genetic analysis to identify the genes/enzymes responsible for degradation of 1NA to catechol. They characterized the first enzyme in the pathway, NpaA1, which is responsible for glutamylation of 1NA, and determined the molecular structures of apo-NpaA1, the NpaA1 - AMPPNP complex and the NpaA1 - ADP - Met-Sox-P complex using X-ray crystallography.

      The proposed mechanism of broad substrate specificity of NpaA1, however, is based on comparison of the 1NA-docked NpaA1 structure with St-GS (glutamine synthetase) and an AlphaFold2-predicted model of AtdA1 from an aniline-degrading strain of Acinetobacter sp. The lack of a molecular structure or mutational studies to back the proposed mechanism makes it difficult to agree with it.

    4. Reviewer #2 (Public Review):

      Microbial degradation of synthetic organic compounds is the basis of bioremediation. Biodegradation of 1NA has not been previously reported. The report describes a complete study of 1NA biodegradation by a new isolate Pseudomonas sp. strain JS3066. The study includes the enrichment and isolation of the 1NA-degrading bacterium Pseudomonas sp. strain JS3066, the identification of the genes and enzymes involved in 1NA degradation, and the detailed characterization of γ-glutamylorganoamide synthetase by using biochemical and structural analysis. In the discussion, the potential evolution of 1NA degradation pathway, the similarity and difference between γ-glutamylorganoamide synthetase and glutamine synthetase, and the significance were explained. The conclusions were well supported by the results presented.

    1. Author response:

      Reviewer #1 (Public Review):

      “… it remains unclear how ninein reduction causes bone defects …”

      We have added several control experiments that permit us to conclude that osteoblast numbers remain unaltered in the ninein-knockout embryos, and that bone abnormalities in vivo are caused by fusion defects of osteoclast precursor cells, whereas the proliferation, viability, or the adhesion of these precursor cells remain unaffected. For details, please see our comments below.

      “Discussion includes several unfounded potential mechanisms that really need to be thoroughly analyzed to gain a mechanistic understanding of the bone defects…”

      The new data back up our claim of fusion defects as a cause for limited osteoclast function. We have re-written parts of the discussion, to take into account our new findings.

      “Data showing normal osteoblasts in ninein-null mice was qualitative and requires further in-depth analysis and quantification of osteoblast …”

      To address this point, quantification of osteoblast numbers in tibiae at E16.5 and E18.5 was performed in control and ninein-deleted mouse embryos. The data are presented in the new Figures 3G and J.

      “In ninein knock-out mice, reduced TRAP+ve multinuclear cells were observed (Figure 6A and 6B). However, the magnitude of difference (about 5% decrease in multinucleated cells) is not consistent with the skeletal deformities reported in Figures 2-4, potentially suggesting the contribution of additional mechanisms.”

      We agree that the difference appears to be small at first glance, but nevertheless it remains statistically significant (a more than three-fold difference). We would like to recall that these observations (Fig. 6A) were performed at E14.5, i.e. at a stage when no ossification has occurred yet. We are looking at the first fusion events of myeloid precursors, likely derived from the fetal liver, that colonize the area of the first bone to form, and small differences in the number of functional osteoclasts may account for different timing of ossification. We think that differences in osteoclast fusion also account for the premature appearance of ossification centers for other skeletal elements, at later time points during development.

      “The fusion assay in Figure 6C needs further clarification. How was the syncytia perimeter defined to measure cell surface? The x-axis suggests that there are syncytia that contain up to 160 nuclei at day 3. How were the nuclei differentially stained and quantified?”

      We now provide additional information on the experimental approach in the revised manuscript, on pages 16-17 (Materials and Methods). For information: high numbers of syncytial nuclei in cultures were also observed by other groups in the past (Tiedemann et al., 2017, Front Cell Dev Biol. 5:54). In addition, we performed new experiments and quantified the fusion of osteoclast precursors by staining for actin and nuclei (new Figure 7C). This allowed us to quantify several additional parameters related to cell fusion (as initially performed in Raynaud-Messina et al., 2018, PNAS, 115:E2556-E2565).

      “Some text needs clarification. … What is the definition of "large syncytia"? Is the fusion index increase by day 5 diminished in later days? A graph of the syncytia size/ nuclei number or fusion index in the above-mentioned days will be helpful.”

      Information on the definition of “large syncytia” is now provided on page 10 (1st paragraph). We added further experimental details on osteoclast size for days 3, 4, and 5 in the supplemental Figures 7A and B. Most importantly, we performed additional assays of the fusion index by quantifying syncytial versus non-syncytial nuclei in a semi-automated manner. The new data are presented in Figure 7C, and the methods are explained on page 17. Together with our new analysis of cell proliferation, cell viability, and cell adhesion (Figure 7C, D, suppl. Fig. 7C-G), we now provide solid evidence for a fusion defect at the origin of impaired formation of ninein del/del osteoclasts.

      “Assessment of resorption was qualitative in Figure 6E and since the fusion deficiencies are transient, quantification of a corresponding resorption activity is needed. This should be described in the Materials and Methods section.”

      Quantifications of the bone resorption activities are now provided in the new Figure 7E, and a reference for the methods is provided on page 16.

      “Further experiments are needed to show connections between reduced centrosome clustering and reduced osteoclast formation as there is no evidence to date that suggests centrosome clustering is required for cell fusion. Multi-color live imaging and dynamic analysis can be used to determine if the ninein-deficient cells show defective movement/migration/fusion dynamics.”

      We agree that it is an important question, and studying potential links between centrosomal microtubule organization and osteoclast fusion is an ongoing project of the team. However, we estimate that in order to obtain conclusive results this will require 1-2 additional years of research activity, and we intend to present this as a separate project in the future. At the current point of our investigation, we think that providing a solid link between ninein, osteoclast fusion, and controlled timing of ossification, as shown in this manuscript, represents valuable progress to understand previously published bone abnormalities in patients with ninein mutations.

      “Quantification of the % of multinucleated osteoclasts that contain clustered and dispersed centrosomes is needed.”

      New quantification experiments on centrosome clustering are now provided in Figure 8H. These quantifications demonstrate that the potential of centrosome clustering is almost completely lost in osteoclasts without ninein.

      Reviewer #2 (Public Review):

      “Based on the decrease in the number of osteoclasts (Fig 5E, G, and also per coverslip after 2 days in culture), the authors suggest that the loss of ninein impacts osteoclast proliferation. First, proliferation can be directly quantified using Ki67 staining or EdU incorporation. Second, other interpretations are also plausible and can also be experimentally tested. These include less adhesion and attachment of the mutants to the coverslips, but perhaps more relevant in vivo is cell death of the ninein mutant osteoclasts. It has been established that the loss of centrosome function activates p53-dependent cell death and osteoclasts might be a vulnerable cell population. Quantifying p53 immunoreactivity and/or cell death in osteoclasts might help clarify the phenotype of osteoclast reduction.”

      In response to the reviewers, we have performed a series of new experiments that include:

      1) A careful analysis of the fusion index, using a semi-automated approach, indicating significant differences in the fusion of precursor cells into osteoclasts (Fig. 7C).

      2) We have repeated the quantification of cell numbers prior to fusion and find variations between samples from different mice (also among mice of the same genotype), but we see on average comparable cell adhesion between samples from control mice and ninein-del/del mice. The data are provided in the supplemental Figure 7F. Moreover, we have quantified the expression of three main beta-integrins at the surface of control and ninein del/del osteoclast precursors (suppl. Fig. 7G), without detecting significant differences. Altogether, these data suggest that cell adhesion is comparable for the two genotypes.

      3) We have addressed the question of altered cell proliferation, by performing flow cytometry experiments and by quantifying the different cell cycle stages (Fig. 7D), and by quantifying Ki67 expression (suppl. Fig. 7C). We see no significant differences between samples from control and ninein-del/del mice.

      4) We have addressed the question of cell death, by performing Annexin V staining and flow cytometry (suppl. Fig. 7D), and by immunoblotting for cleaved caspase 3 and PARP (suppl. Fig. 7E). These experiments reveal no significant differences between the control and ninein del/del samples. Our data permit us to exclude cell death as a likely cause for the reduction of fused osteoclasts in the absence of ninein.

      Overall, the new experiments show that the defects in osteoclast formation from ninein-deleted samples are due to defects in cell fusion, but not in cell proliferation, cell adhesion or viability.

      Reviewer #3 (Public Review):

      “The authors put much emphasis on the centrosome in the Introduction section. However, it was not until Figure 7 that they showed abnormal centriole clustering in osteoclasts. The introduction should include more background on osteoclast and osteoblast balance during skeletal development.”

      To address this, we included more background on the role of osteoclasts and osteoblasts in the revised introduction (page 4).

    2. Reviewer #1 (Public Review):

      The impact of this paper is that it shows conclusively the bone defects caused by ninein depletion, albeit transient defects, which have been indirectly deduced in past studies. The paper is largely descriptive, including the cytoskeletal analysis of osteoclasts; thus it remains unclear how ninein reduction causes bone defects and why this defect is transient. The Discussion includes several unfounded potential mechanisms that really need to be thoroughly analyzed to gain a mechanistic understanding of the bone defects in ninein-null mice.

      Other points:

      Data showing normal osteoblasts in ninein-null mice was qualitative and requires further in-depth analysis and quantification of osteoblast and osteocyte presence and activity in ninein del/del mice to strengthen the study.

      In ninein knock-out mice, reduced TRAP+ve multinuclear cells were observed (Figure 6A and 6B). However, the magnitude of difference (about 5% decrease in multinucleated cells) is not consistent with the skeletal deformities reported in Figures 2-4, potentially suggesting the contribution of additional mechanisms.

      The fusion assay in Figure 6C needs further clarification. How was the syncytia perimeter defined to measure cell surface? The x-axis suggests that there are syncytia that contain up to 160 nuclei at day 3. How were the nuclei differentially stained and quantified?

      Some text needs clarification. For instance, "On days 3 and 4, we found only about half as many large syncytia in cultures from ninein-deleted mice, compared to controls, but on day 5 large syncytia lacking ninein exceeded 90% of control levels. Altogether, this suggests that fusion deficiencies are a transient phenomenon in in vitro-induced adult osteoclasts. On later days of culture, fusion efficiency started to diminish." What is the definition of "large syncytia"? Is the fusion index increase by day 5 diminished in later days? A graph of the syncytia size/ nuclei number or fusion index in the above-mentioned days will be helpful.

      Assessment of resorption was qualitative in Figure 6E and since the fusion deficiencies are transient, quantification of a corresponding resorption activity is needed. This should be described in the Materials and Methods section.

      Further experiments are needed to show connections between reduced centrosome clustering and reduced osteoclast formation as there is no evidence to date that suggests centrosome clustering is required for cell fusion. Multi-color live imaging and dynamic analysis can be used to determine if the ninein-deficient cells show defective movement/migration/fusion dynamics.

      Quantification of the % of multinucleated osteoclasts that contain clustered and dispersed centrosomes is needed.

    3. eLife assessment

      This valuable study offers new insight into the role of the centrosome protein ninein in skeletal development through an analysis of the skeletal phenotype of ninein-deficient mice. While there is solid evidence supporting the conclusion that the absence of ninein leads to transient skeletal abnormalities and a lasting reduction in osteoclastogenesis, the evidence to substantiate the claim that enhanced ossification is attributable to reduced osteoclast formation/activity is insufficient. This work will be of interest to scientists in the fields of bone biology and skeletal development.

    4. Reviewer #2 (Public Review):

      The paper by Gilbert et al. is well-written in a detailed format and the authors are candid in their data interpretation by acknowledging that the described ninein bone defects are mild, transient, and do not lead to major long-lasting defects in adulthood.

      The main strength of the study is presenting a novel link between a centrosomal protein and osteoclasts in the mouse. However, the majority of the work is dedicated to describing the premature ossification phenotype and less attention is paid to how a centrosomal protein affects osteoclast proliferation, survival, and/or differentiation into mature osteoclasts.

      Based on the decrease in the number of osteoclasts (Fig 5E, G, and also per coverslip after 2 days in culture), the authors suggest that the loss of ninein impacts osteoclast proliferation. First, proliferation can be directly quantified using Ki67 staining or EdU incorporation. Second, other interpretations are also plausible and can also be experimentally tested. These include less adhesion and attachment of the mutants to the coverslips, but perhaps more relevant in vivo is cell death of the ninein mutant osteoclasts. It has been established that the loss of centrosome function activates p53-dependent cell death and osteoclasts might be a vulnerable cell population. Quantifying p53 immunoreactivity and/or cell death in osteoclasts might help clarify the phenotype of osteoclast reduction.

    5. Reviewer #3 (Public Review):

      Ninein is a centrosome protein that has been implicated in microtubule anchorage and centrosome cohesion. Mutations in the human ninein gene have been linked to Seckel syndrome and a rare form of skeletal dysplasia. However, the role of ninein in skeletal development remains unknown. Here, we describe a ninein knockout mouse with advanced endochondral ossification during embryonic development. Although the long bones maintain a regular size, the absence of ninein delays the formation of the bone marrow cavity in the prenatal tibia. Likewise, intramembranous ossification in the skull is more developed, leading to a premature closure of the interfrontal suture. We demonstrate that ninein is strongly expressed in osteoclasts of control mice and that its absence reduces the fusion of precursor cells into syncytial osteoclasts. As a consequence, ninein-deficient osteoclasts have a reduced capacity to resorb bone. At the cellular level, the absence of ninein interferes with centrosomal microtubule organization, reduces centrosome cohesion, and provokes the loss of centrosome clustering in multinucleated mature osteoclasts. We propose that centrosomal ninein is important for osteoclast fusion, to enable a functional balance between bone-forming osteoblasts and bone-resorbing osteoclasts during skeletal development.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      Results showing reactivation for near and far items separately are now included in Fig. 5 and convincingly suggest a simultaneous reactivation. For me, the open question remaining (see public review) is the degree to which the methods used here to show clustered vs sequential reactivation are mutually exclusive, and whether the pre-selection of a time window of peak reactivation (based on all future items) biases the analyses towards clustered reactivation. The discussion would benefit from a brief treatment of these issues.

      We have added a brief discussion of the issues. However, we want to clarify a minor point of the public review: While our interpretation implies that replay and reactivation are probably mutually exclusive within a single retrieval event, it does not imply that strategies cannot vary within different retrieval events of the same participant. Nevertheless, we want to address this raised concern (that is, if we understand correctly, that replay events that are contained within the time window of the reactivation analysis could not be distinguished by the chosen methods) and have added it to the discussion.

      The corresponding sentence reads:

      “[…] Finally, we want to acknowledge that by selecting a time window for the clustered reactivation we cannot distinguish very fast replay events (<=30 ms) from clustered reactivation if they are contained exactly within the specific reactivation analysis time window.”

      Reviewer #2 (Recommendations For The Authors):

      Figure 5D shows the difference scores between near vs. distant items for learning and retrieval. Similar to Figure 5 from the first version of your paper, the difference score does not show whether reactivation of the near vs. distant items changes from learning to retrieval. You could show this change in a 2 (near vs. distant) x 2 (learning vs. retrieval) box plot (corresponding to Figure 5A).

      We have added the requested plot as supplement 9 and referred to it in the figure description. However, comparing absolute, raw probabilities between different blocks is tricky, as baseline probabilities vary over time (e.g., due to shifts in distance to the sensors). Differential reactivation might therefore be better suited, as it is a relative measure for comparing between blocks.

      At the end of the results section, you state: "On average, differential reactivation probability increased from pre to post resting state (Figure 5D).". I would suggest providing some statistical comparison and the corresponding values.

      We have calculated and added the corresponding p-values from a t-test, and we now report that the increase is only descriptive and not statistically significant.

    2. eLife assessment

      This magnetoencephalography study reports important new findings regarding the nature of memory reactivation during cued recall. It replicates previous work showing that such reactivation can be sequential or clustered, with sequential reactivation being more prevalent in low performers. It adds convincing evidence, even though based on limited amounts of data, that high memory performers tend to show simultaneous (i.e., clustered) reactivation, varying in strength with item distance in the learned graph structure. The study will be of interest to scientists studying memory replay.

    3. Reviewer #1 (Public Review):

      Summary:

      Previous work in humans and non-human animals suggests that during offline periods following learning, the brain replays newly acquired information in a sequential manner. The present study uses a MEG-based decoding approach to investigate the nature of replay/reactivation during a cued recall task directly following a learning session, where human participants are trained on a new sequence of 10 visual images embedded in a graph structure. During retrieval, participants are then cued with two items from the learned sequence, and neural evidence is obtained for the simultaneous or sequential reactivation of future sequence items. The authors find evidence for both sequential and clustered (i.e., simultaneous) reactivation. Replicating previous work, low-performing participants tend to show sequential, temporally segregated reactivation of future items, whereas high-performing participants show more clustered reactivation. Adding to previous work, the authors show that an image's reactivation strength varies depending on its proximity to the retrieval cue within the graph structure.

      Strengths:

      As the authors point out, work on memory reactivation has largely been limited to the retrieval of single associations. Given the sequential nature of our real-life experiences, there is clearly value in extending this work to structured, sequential information. State-of-the-art decoding approaches for MEG are used to characterize the strength and timing of item reactivation. The manuscript is very well written with helpful and informative figures in the main sections. The task includes an extensive localizer with 50 repetitions per image, allowing for stable training of the decoders and the inclusion of several sanity checks demonstrating that on-screen items can be decoded with high accuracy.

      Weaknesses:

      Of major concern, the experiment is not optimally designed for analysis of the retrieval task phase, where only 4 min of recording time and a single presentation of each cue item are available for the analyses of sequential and non-sequential reactivation. In their revision, the authors include data from the learning blocks in their analysis. These blocks follow the same trial structure as the retrieval task, and apart from adding more data points could also reveal a possible shift from sequential to clustered reactivation as learning of the graph structure progresses. The new analyses are not entirely conclusive, maybe given the variability in the number of learning blocks that participants require to reach the criterion. In principle, they suggest that reactivation strength increases from learning (pre-rest) to final retrieval (post-rest).

      On a more conceptual note, the main narrative of the manuscript implies that sequential and clustered reactivation are mutually exclusive, such that a single participant would show either one or the other type. With the analytic methods used here, however, it seems possible to observe both types of reactivation. For example, the observation that mean reactivation strength (across the entire trial, or in a given time window of interest) varies with graph distance does not exclude the possibility that this reactivation is also sequential. In fact, the approach of defining one peak time window of reactivation may bias towards simultaneous, graded reactivation. It would be helpful if the authors could clarify this conceptual point. A strong claim that the two types of reactivation are mutually exclusive would need to be substantiated by further evidence, for instance, a suitable metric contrasting "sequenceness" vs "clusteredness".

      On the same point, the non-sequential reactivation analyses use a time window of peak decodability that is determined based on the average reactivation of all future items, irrespective of graph distance. In a sequential forward cascade of reactivations, it could be assumed that the reactivation of near items would peak earlier than the reactivation of far items. In the revised manuscript, the authors now show the "raw" timecourses of item decodability at different graph distances, clearly demonstrating their peak reactivation times, which show convincingly that reactivation for near and far items occurs at very similar time points. The question that remains, therefore, is whether the method of pre-selecting a time window of interest described above could exert a bias towards finding clustered reactivation.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors investigate replay (defined as sequential reactivation) and clustered reactivation during retrieval of an abstract cognitive map. Replay and clustered reactivation were analysed based on MEG recordings combined with a decoding approach. While the authors report evidence for both replay and clustered reactivation during retrieval, replay was exclusively present in low performers. Further, the authors show that reactivation strength declined with increasing graph distance.

      Strengths:

      The paper raises interesting research questions, i.e., replay vs. clustered reactivation and how that supports retrieval of cognitive maps. The paper is well written, well structured and easy to follow. The methodological approach is convincing and definitely suited to address the proposed research questions.

      The paper is a great combination between replicating previous findings (Wimmer et al. 2020) with a new experimental approach but at the same time presenting novel evidence (reactivation strength declines as a function of graph distance).

      What I also want to positively highlight is their general transparency. For example, they pre-registered this study but with a focus on a different part of the data and outlined this explicitly in the paper.

      The paper has very interesting findings. However, there are some shortcomings, especially in the experimental design. These are briefly outlined below but are also discussed openly and in detail by the authors.

      Weaknesses:

      The individual findings are interesting. However, due to some shortcomings in the experimental design they cannot be profoundly related to each other. For example, the authors show that replay is present in low but not in high performers with the assumption that high performers tend to simultaneously reactivate items. But then, the authors do not investigate clustered reactivation (= simultaneous reactivation) as a function of performance due to a low number of retrieval trials and ceiling performance in most participants. As a consequence of the experimental design, some analyses are underpowered (very low number of trials, n = ~10, and for some analyses, very low number of participants, n = 14).

    1. Author response:

      We thank both the reviewers for their thorough reading of our manuscript and insightful suggestions. We thank the editors for their assessment of our article. We will submit a revised manuscript that addresses several comments and include a point-by-point response to the reviewers.

      (1) With respect to how our data compares with previously published datasets, we will provide a table comparing cell numbers. Study differences such as read depth, strain of animals used (including pigmented vs albino), method of cell isolation (including drug exposure), and number of cells profiled pose a significant impediment to integration with previously published datasets. We would like to highlight that ours is the first SEC single cell study that uses pigmented mouse eyes on C57BL/6J background. Integrating with the albino mouse data (Thomson et al. 2021) hindered pathway analyses possibly due to the variable dropout of genes across studies that was likely impacted by differences in method of cell isolation and increased representation of stress response genes in their dataset. We also attempted an integrated analysis with published mouse data (Van Zyl et al. 2020) but did not obtain additional meaningful information due to their low SEC numbers.

      (2) The reviewers commented that our integration of single cell and single nuc data should be done with caution: we agree and had given careful consideration to the integration process. We will demonstrate the contribution of different samples and datasets to show how our datasets have integrated.

      (3) To address the purity of bulk RNA seq, we will add more details for isolation of SECs for bulk seq. The markers to distinguish the three cell types were informed by immunofluorescence. Using these markers, we performed FACS using gates that were well separated. We have provided a heatmap with hierarchical clustering based on Euclidean distance of the EC subtypes (Figure 1B) analyzed by bulk RNA seq in addition to number of DE genes between subtypes.

      (4) To address the immunostaining of NPNT and CCL21A, since both our antibodies are derived from the same species (goat), a co-labeling wasn’t possible. To be prudent, we used adjacent sections, flat-mounts, and RNAscope and provided further evidence of the anterior/posterior “bias” in supplemental figures. Although we agree on its importance, work with human tissue will be a focus of future work.

      (5) Regarding the reviewer’s comments on substructure and that profiling may still not be comprehensive, we agree that further even more comprehensive studies are still needed. Profiling more cells will determine the robustness of the detected cell state difference and will help to resolve the cause of substructure within clusters as due to either lack of completely comprehensive profiling of cell types/states or more stochastic differences. We will add a comment to the discussion.

    2. Reviewer #2 (Public Review):

      Summary:

      This article has characterized the mouse Schlemm's canal expression profile using a comprehensive approach based on sorted SEC, LEC, and BEC total RNA-Seq, scRNA-Seq, and snRNA-Seq to enrich the selection of SECs. The study has successfully profiled genome-wide gene expression using sorted SECs, demonstrating that SECs have a closer similarity to LECs than BECs. The combined scRNA- and snRNA-Seq data with deep coverage of gene expression led to the successful identification of many novel biomarkers for inner wall SECs, outer wall SECs, collector channel ECs, and pericytes. In addition, the study also identified two novel states of inner wall SECs separated by new markers. The study provides significant novel information about the biology and expression profile of SECs in the inner and outer walls. It is of great significance to have this novel, convincing, and comprehensive study led by leading researchers published in this journal.

      Strengths:

      This is a comprehensive study using various data to support the expression characterization of mouse SECs. First, the study profiled genome-wide expression using sorted SECs, LECs, and BECs from the same tissue/organ to identify the similarities and differences among the three types of cells. Second, snRNA-Seq was applied to enrich the number of SECs from mouse ocular tissues significantly. Increased sampling of SECs and other cells led to more comprehensive coverage and characterization of cells, including pericytes. Third, the combined scRNA- and snRNA-Seq data analyses increase the power to further characterize the subtle differences within SECs, leading to identifying the expression markers of inner and outer wall SECs, collector channel ECs, and distal region cells. Fourth, the identified unique markers were validated for RNA and protein expression in mouse ocular tissues. Fifth, the study explored how the IOP- and glaucoma-associated genes are expressed in the scRNA- and snRNA-Seq data, providing potential connections of these GWAS genes with IOP and glaucoma. Sixth, the initial pathway and network analyses generated exciting hypotheses that could be tested in other independent studies.

      Weaknesses:

      A few minor weaknesses have been noted. First, since snRNA-Seq and scRNA-Seq generated different coverage of expressed genes in the cells, how did the combined analyses balance the unequal sequencing coverage and missing data points in the snRNA-Seq data? Second, the RNA/protein validation of selected SEC molecular markers was done using mouse anterior segment tissues. It would be more helpful to examine whether these molecular markers for SECs could work well in human SECs. Third, the effort to characterize the GWAS-identified IOP- and glaucoma-associated genes is exciting but yields limited new information. Additional work could be performed to prioritize these genes.

    3. eLife assessment

      This is an important study characterizing the unique expression of mouse Schlemm's canal endothelial cells (SECs), which function in the aqueous humor outflow pathway of the eye. The work convincingly identifies novel biomarkers for SECs and molecular markers for inner wall and outer wall SECs, followed by targeted RNA and protein expression validation in mouse eyes. Gene networks and pathways were analyzed for their potential contribution to glaucoma pathogenesis.

    4. Reviewer #1 (Public Review):

      Summary:

      Balasubramanian et al. characterized the cell types comprising mouse Schlemm's canal (SC) using bulk and single-cell RNA sequencing (scRNA-seq). The results identify expression patterns that delineate the SC inner and outer wall cells and two inner wall 'states'. Further analysis demonstrates expression patterns of glaucoma-associated genes and receptor-ligand pairs between SECs and neighboring trabecular meshwork.

      Strengths:

      While mouse SC has been profiled in previous scRNA-seq studies (van Zyl et al. 2020, Thomson et al. 2021), these data provide higher resolution of SC cell types, particularly endothelial cell (SEC) populations. SC is an important regulator of anterior chamber outflow and has important consequences for glaucoma.

      Weaknesses:

      (1) Since SC has previously been characterized in mouse, human, and other species by scRNA-seq in other studies, this study would benefit from more direct comparisons to published datasets. For example, Table 4 could be expanded to list the SC cell numbers profiled in each study. Expression patterns highlighted in this study could be independently verified by plotting in publicly available mouse SC datasets. Further, a comparison to human expression patterns would assess whether type-specific expression patterns are conserved. Alternatively, an integrated analysis could be performed. Indeed, the authors mention that an integrated analysis was attempted but the data is not shown. It is unclear if this was because of a lack of agreement between datasets or other reasons.

      (2) Figure 1 presents bulk RNA seq results comparing SEC, BEC, and LEC expression patterns. These populations were isolated using cell surface markers and enrichment by FACS. Since each EC population is derived from the same sample, the accuracy of this data hinges on the purity of enrichment. However, a reference is not given for this method and it is not clear how purity was validated. The authors later note that marker Emcn, which was used to identify BECs, is also expressed in SECs and LECs at lower levels. It should be demonstrated that these populations are clearly separated by flow cytometry.

      (3) Bulk RNA-seq analysis infers similarity from the number of DEGs between samples, however, this is not a robust indicator. A correlation analysis should be run to verify conclusions.

      (4) Figures 2-4 present three different datasets targeting the same tissue: 1) C57BL/6J scRNA-seq, 2) C57BL/6J snRNA-seq, 3) 129/sj scRNA-seq. Integrated analysis comparing datasets #1 to #2 and #3 is also presented. Integration methods are not described beyond 'normalization for cell numbers'. It is unclear if additional alignment methods were used. Integration across each of these datasets needs careful consideration, especially since different filtering methods were used (e.g. <20% mito in scRNA-seq and <5% in snRNA-seq). Improper integration could affect the ability to cluster or exaggerate differences between cell types and states. It would be useful to demonstrate the contribution of different samples and datasets to each cell type/state to verify that these are not driven by batch effects, mouse strain, or collection platform.

      (5) IW1 and IW2 are not well separated, and it is unclear if these represent truly different cell states. Figure 5b shows the staining of CCL21A and describes expression in the 'posterior portion' but in the image there are no DAPI+ nuclei in the anterior portion, suggesting the sampling in this section is different from Figure 5a. This would be improved by co-staining NPNT and CCL21A to demonstrate specificity.

      (6) The substructures observed within clusters in sc/snRNA-seq data suggest that overall profiling may still not be comprehensive. This should be noted in the discussion.
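
      The correlation analysis requested in point (3) above could be sketched as follows. This is only an illustration in Python: the function name `sample_correlation` is hypothetical, the expression values are synthetic toy data (not data from the study), and the choice of Pearson correlation on log-transformed counts is an assumption, one of several reasonable options.

```python
import numpy as np

def sample_correlation(counts):
    """Pearson correlation between samples (columns) of a genes x samples
    count matrix, computed on log1p-transformed values."""
    logged = np.log1p(np.asarray(counts, dtype=float))
    return np.corrcoef(logged, rowvar=False)

# Synthetic toy data (illustrative only): "SEC" and "LEC" share a
# gene-wise expression shift, while "BEC" carries an independent one.
rng = np.random.default_rng(1)
n_genes = 500
base = rng.gamma(shape=2.0, scale=50.0, size=n_genes)
shared = rng.lognormal(mean=0.0, sigma=0.5, size=n_genes)
distinct = rng.lognormal(mean=0.0, sigma=0.5, size=n_genes)

def noise():
    # Small sample-specific multiplicative noise.
    return rng.lognormal(mean=0.0, sigma=0.1, size=n_genes)

labels = ["SEC", "LEC", "BEC"]
counts = np.column_stack([
    base * shared * noise(),    # SEC
    base * shared * noise(),    # LEC
    base * distinct * noise(),  # BEC
])

corr = sample_correlation(counts)
for i, a in enumerate(labels):
    for j in range(i + 1, len(labels)):
        print(f"{a} vs {labels[j]}: r = {corr[i, j]:.3f}")
```

      Unlike a count of differentially expressed genes, such a matrix does not depend on a significance threshold or on replicate number, which is why it is a more direct measure of overall similarity between populations.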

    1. eLife assessment

      In this manuscript, the authors have identified rapamycin, a common pharmacological tool thought to bind only to the mTOR kinase, as an off-target modulator of the ion channel TRPM8, the main cold sensor in mammals. This is a valuable study that presents solid evidence for its claims. The NMR methods employed need to be better validated in order to become a tool for the community.

    2. Reviewer #1 (Public Review):

      Summary:

      In this valuable study, the authors found that the macrolide drug rapamycin, which is an important pharmacological tool in the clinic and the research lab, is less specific than previously thought. They provide solid functional evidence that rapamycin activates TRPM8 and develop an NMR method to measure the specific binding of a ligand to a membrane protein.

      Strengths:

      The authors use a variety of complementary experimental techniques in several different systems, and their results support the conclusions drawn.

      Weaknesses:

      Controls are not shown in all cases, and a lack of unity across the figures makes the flow of the paper disjointed. The proposed location of the rapamycin binding pocket within the membrane means that molecular docking approaches designed for soluble proteins alone do not provide solid evidence for a rapamycin binding pocket location in TRPM8, but the authors are appropriately careful in stating that the model is consistent with their functional experiments.

      Impact:

      This work provides still more evidence for the polymodality of TRP channels, reminding both TRP channel researchers and those who use rapamycin in other contexts that the adjective "specific" is only meaningful in the context of what else has been explicitly tested.

    3. Reviewer #2 (Public Review):

      Summary:

      Using Ca2+ imaging, Tóth and Bazeli et al. find that rapamycin activates heterologously expressed TRPM8 and dissociated sensory neurons in a TRPM8-dependent way. With electrophysiology and STTD-NMR, they confirmed that the activation is through direct interaction with TRPM8. Using mutants and computational modeling, the authors localized the binding site to the groove between S4 and S5, different from the binding pocket of cooling agents such as menthol. The hydroxyl group on carbon 40 within the cyclohexane ring in rapamycin is indispensable for activation, while other rapalogs in which it is replaced, such as everolimus, still bind but cannot activate TRPM8. Overall, the findings provide new insights into TRPM8 functions and may indicate previously unknown physiological effects or therapeutic mechanisms of rapamycin.

      Strengths:

      The authors spent extensive effort on demonstrating that the interaction between TRPM8 and rapamycin is direct. The evidence is solid. In probing the binding site and the structure-function relationship, the authors combined computational simulation and functional experiments. It is very impressive to see that "within" a rapamycin molecule, the portion shared with everolimus is for "binding", while the hydroxyl group in the cyclohexane ring is for activation. Such detailed dissection represents a successful trial in the computational biology-facilitated, functional experiment-validated study of TRP channel structure-activity relationships. The research draws the attention of scientists, including those outside the TRP channel field, to previously neglected effects of rapamycin, and therefore the manuscript deserves broad readership.

      Weaknesses:

      The significance of the research could be improved by showing or discussing whether a similar binding pocket is present in other TRP channels, and hence whether rapalogs might bind to or activate these TRP channels. Additionally, while the finding on TRPM8 is novel, it is worthwhile to perform more comprehensive pharmacological characterization, including single-channel recording and a few more mutant studies, to offer further insight into the mechanism by which rapamycin binding to the S4-S5 pocket drives channel opening. It is also necessary to know if rapalogs have independent or synergistic effects on top of other activators, including cooling agents and lower temperature, and their dependence on regulators such as PIP2.

      Additional discussion that might be helpful:

      The authors did confirm that rapamycin does not activate TRPV1, TRPA1 and TRPM3. But other TRP channels, particularly other structurally similar TRPM channels, should be discussed or tested. Alignment of the amino acid sequences or structures at the predicted binding pocket might predict some possible outcomes. In particular, rapamycin is known to activate TRPML1 in a PI(3,5)P2-dependent manner, which should be highlighted in comparison among TRP channels (PMID: 35131932, 31112550).

    4. Reviewer #3 (Public Review):

      Summary:

      Rapamycin is a macrolide of immunologic therapeutic importance, proposed as a ligand of mTOR. It is also employed in assays to probe protein-protein interactions. The authors serendipitously found that the drug rapamycin and some related compounds potently activate the cationic channel TRPM8, which is the main mediator of cold sensation in mammals. The authors show that rapamycin might bind to a novel binding site that is different from the binding site for menthol, the prototypical activator of TRPM8. These solid results are important to a wide audience since rapamycin is a widely used drug and is also employed in assays to probe protein-protein interactions, which could be affected by potential specific interactions of rapamycin with other membrane proteins, as illustrated herein.

      Strengths:

      The authors employ several experimental approaches to convincingly show that rapamycin directly activates the TRPM8 cation channel and not an accessory protein or the surrounding membrane. In general, the electrophysiological, mutational and fluorescence imaging experiments are adequately carried out and cautiously interpreted, presenting a clear picture of the direct interaction with TRPM8. In particular, the authors convincingly show that the interactions of rapamycin with TRPM8 are distinct from interactions of menthol with the same ion channel.

      Weaknesses:

      The main weakness of the manuscript is the NMR method employed to show that rapamycin binds to TRPM8. The authors developed and deployed a novel signal processing approach based on subtraction of several independent NMR spectra to show that rapamycin binds to the TRPM8 protein and not to the surrounding membrane or other proteins. While interesting and potentially useful, the method is not well developed (several positive controls are missing) and is not presented in a clear manner, such that the quality of data can be assessed and the reliability and pertinence of the subtraction procedure evaluated.

    1. Reviewer #2 (Public Review):

      Summary:

      This is an interesting and well-performed study that develops a new modeling approach (MoA-HMM) to simultaneously characterize reinforcement learning parameters of different RL agents, as well as latent behavioral states that differ in the relative contributions of those agents to the animal's choices. They performed this study in rats trained to perform the two-step task. While the major advance of the paper is developing and rigorously validating this novel technical approach, there are also a number of interesting conceptual advances. For instance, humans performing the two-step task are thought to exhibit a trade-off between model-free and model-based strategies. However, the MoA-HMM did not reveal such a trade-off in the rats, but instead suggested a trade-off between model-based exploratory vs. exploitative strategies. Additionally, the firing rates of neurons in the orbitofrontal cortex (OFC) reflected latent behavioral states estimated from the HMM, suggesting that (1) characterizing dynamic behavioral strategies might help elucidate neural dynamics supporting behavior, and (2) OFC might reflect the contributions of one or a subset of RL agents that are preferentially active or engaged in particular states identified by the HMM.

      Strengths:

      The claims were generally well-supported by the data. The model was validated rigorously and was used to generate and test novel predictions about behavior and neural activity in OFC. The approach is likely to be generally useful for characterizing dynamic behavioral strategies.

      Weaknesses:

      There were a lot of typos and some figures were mis-referenced in the text and figure legends.

    2. eLife assessment

      This important work by Venditto and colleagues developed a new modeling approach, called a mixture-of-agents hidden Markov model (MoA-HMM), in which choice behaviors are modeled as transitions between discrete states defined by different weighting of several reinforcement learning and decision strategies. The authors apply this approach to their previous data collected from rats performing the two-step task, and show that this method provides better fits to the data than previous methods, and predicts fluctuations in neural and other behavioral data. The reviewers found this study to be overall convincing, and the method is of general interest to the field.

    3. Reviewer #1 (Public Review):

      Summary:

      Motivated by the existence of different behavioral strategies (e.g. model-based vs. model-free), and potentially different neural circuits that underlie them, Venditto et al. introduce a new approach for inferring which strategies animals are using from data. In particular, they extend the mixture of agents (MoA) framework to accommodate the possibility that the weighting among different strategies might change over time. These temporal dynamics are introduced via a hidden Markov model (HMM), i.e. with discrete state transitions. These state transition probabilities and initial state probabilities are fit simultaneously along with the MoA parameters, which include decay/learning rate and mixture weightings, using the EM algorithm. The authors test their model on data from Miller et al., 2017, 2022, arguing that this formulation leads to (1) better fits and (2) improved interpretability over their original model, which did not include the HMM portion. Lastly, they claim that certain aspects of OFC firing are modulated by the internal state as identified by the MoA-HMM.

      Strengths:

      The paper is very well written and easy to follow, especially for one with a significant modeling component. Furthermore, the authors do an excellent job explaining and then disentangling many threads that are often knotted together in discussions of animal behavior and RL: model-free vs. model-based choice, outcome vs. choice-focused, exploration vs. exploitation, bias, perseveration. Each of these concepts is quantified by particular parameters of their models. Model recovery (Fig. 3) is mostly convincing and licenses their fits to animal behavior later (although see below). While the specific claims made about behavior and neural activity are not especially surprising (e.g. the animals begin a session, in which rare vs. common transitions are not yet known, in a more exploratory mode), the MoA-HMM framework seems broadly applicable to other tasks in the field and useful for the purpose of quantification here.

      Weaknesses:

      The authors sometimes seem to equivocate on to what extent they view their model as a neural (as opposed to merely behavioral) description. For example, they introduce their paper by citing work that views heterogeneity in strategy as the result of "relatively independent, separable circuits that are conceptualized as supporting distinct strategies, each potentially competing for control." The HMM, of course, also relates to internal states of the animal. Therefore, the reader might come away with the impression that the MoA-HMM is literally trying to model dynamic, competing controllers in the brain (e.g. basal ganglia vs. frontal cortex), as opposed to giving a descriptive account of their emergent behavior. If the former is really the intended interpretation, the authors should say more about how they think the weighting/arbitration mechanism between alternative strategies is implemented, and how it can be modulated over time. If not, they should make this clearer.

      Second, while the authors demonstrate that model recovery recapitulates the weight dynamics and action values (Fig. 3), the actual parameters that are recovered are less precise (Fig. 3 Supplement 1). The authors should comment on how this might affect their later inferences from behavioral data. Furthermore, it would be better to quantify using the R^2 score between simulated and recovered, rather than the Pearson correlation (r), which doesn't enforce unity slope and zero intercept (i.e. the line that is plotted), and so will tend to exaggerate the strength of parameter recovery.
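
      The difference between the two metrics can be made concrete with a small numerical sketch. The Python example below uses synthetic numbers only (not values from the manuscript), and the helper names `pearson_r` and `r2_identity` are hypothetical. It simulates a "recovered" parameter that is a compressed and shifted copy of the "simulated" one: Pearson's r stays close to 1, whereas R^2 measured against the identity line is clearly lower.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def r2_identity(true, recovered):
    """R^2 of `recovered` as a direct prediction of `true`,
    i.e. measured against the unity line (slope 1, intercept 0)."""
    true = np.asarray(true, dtype=float)
    recovered = np.asarray(recovered, dtype=float)
    ss_res = np.sum((true - recovered) ** 2)
    ss_tot = np.sum((true - true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic example: recovery is tightly correlated with the ground truth
# but biased (slope 0.5, intercept 0.3), so it does not lie on the unity line.
rng = np.random.default_rng(0)
simulated = rng.uniform(0.0, 1.0, 200)
recovered = 0.5 * simulated + 0.3 + rng.normal(0.0, 0.02, 200)

r = pearson_r(simulated, recovered)
r2 = r2_identity(simulated, recovered)
print(f"Pearson r = {r:.3f}, R^2 vs. identity = {r2:.3f}")
```

      Because r is invariant to any linear rescaling of the recovered values, it cannot detect this kind of systematic bias, whereas R^2 against the identity line penalizes it.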

      Finally, the authors are very aware of the difficulties associated with long-timescale (minutes) correlations with neural activity, including both satiety and electrode drift, so they do attempt to control for this using a third-order polynomial as a time regressor as well as interaction terms (Fig. 7 Supplement 1). However, on net there does not appear to be any significant difference between the permutation-corrected CPDs computed for states 2 and 3 across all neurons (Fig. 7D). This stands in contrast to the claim that "the modulation of the reward effect can also be seen between states 2 and 3 - state 2, on average, sees a higher modulation to reward that lasts significantly longer than modulation in state 3," which might be true for the neuron in Fig. 7C, but is never quantified. Thus, while I am convinced state modulation exists for model-based (MBr) outcome value (Fig. 7A-B), I'm not convinced that these more gradual shifts can be isolated by the MoA-HMM model, which is important to keep in mind for anyone looking to apply this model to their own data.
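
      For readers unfamiliar with the quantity discussed here, the following Python sketch shows one common way to compute a coefficient of partial determination (CPD) for a state regressor in the presence of third-order polynomial time regressors. It is an assumption-laden illustration, not the authors' pipeline: the data are synthetic, the names `sse` and `cpd` are hypothetical, and the permutation correction mentioned above is not implemented.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def cpd(X_full, y, cols):
    """CPD of the regressors in `cols`:
    (SSE_reduced - SSE_full) / SSE_reduced,
    where the reduced model omits those columns."""
    X_reduced = np.delete(X_full, cols, axis=1)
    sse_full = sse(X_full, y)
    sse_reduced = sse(X_reduced, y)
    return (sse_reduced - sse_full) / sse_reduced

# Synthetic "firing rate": slow drift over the session plus a state effect.
rng = np.random.default_rng(2)
n = 400
t = np.linspace(0.0, 1.0, n)                      # normalized time in session
state = ((np.arange(n) // 50) % 2).astype(float)  # toy alternating latent state
nuisance = rng.normal(size=n)                     # regressor with no true effect
y = 2.0 + 3.0 * t - 2.0 * t**2 + 1.5 * state + rng.normal(0.0, 0.5, n)

# Columns: intercept, t, t^2, t^3 (drift control), state, nuisance.
X_full = np.column_stack([np.ones(n), t, t**2, t**3, state, nuisance])

cpd_state = cpd(X_full, y, [4])
cpd_nuisance = cpd(X_full, y, [5])
print(f"CPD(state) = {cpd_state:.3f}, CPD(nuisance) = {cpd_nuisance:.3f}")
```

      In this toy setting the state regressor explains variance beyond the polynomial drift terms while the nuisance regressor does not. The closer a latent state's time course is to a smooth function of session time, the more of its variance the polynomial terms absorb, which is the difficulty raised above for gradual shifts between states.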

    1. Author response:

      Reviewer #1:

      The phenomenon of stress-inducible mutagenesis in bacterial evolution remains a topic of heated debate. Consequently, the emergence of genetically encoded resistance may stem from either microevolution or the dissemination of pre-existing variants from polyclonal infections under drug pressure. We believe that the Introduction presents both of these hypotheses in a balanced manner to elucidate the rationale behind our mutation accumulation investigations.

      While we acknowledge the well-known existence of phenotypic antibiotic resistance, it's worth noting that conclusions regarding mutation rates are often drawn from fluctuation assays without confirmation of genetic-level changes. This discrepancy persists despite fluctuation assays accounting for both phenotypic and genotypic alterations. Combining genome sequencing with fluctuation assays underscores the importance of making this distinction.

      Thank you for the suggestion regarding improving the figures; we will incorporate these changes accordingly in the revised version. Additionally, we will address the rationale for using sub-lethal doses of antibiotics and compare our results with the referenced papers.

      Reviewer #2:

      Thank you for acknowledging the values of the manuscript and for the insightful suggestions for improvement. We agree on the necessity to directly connect the mutation accumulation experiments with the tolerance assay, and we have already initiated additional experiments to integrate into a revised version.

      We also agree with and have been aware of the notion that cell death affects the calculation of the mutation rate. However, the error in the estimation of the generation time leads to an overestimation of the mutation rate, which, in our case, reinforces the conclusion that no discernible increase in mutation rate occurs in our mutation accumulation experiment. In the revised version, we aim to address i) the source of variation in cell death degree and ii) its influence on calculations.

      The SNPs identified from the lineages of each treatment are compiled in the "unique muts.xls" file within the Figshare document bundle we included with the manuscript. We regret not providing a detailed reference to this in the manuscript; instead, the Figshare files were merely mentioned under the Data Availability section (No. 6) without specifics. As advised, we will create a supplementary table containing this data.

      Reviewer #3:

      Thank you for appreciating the manuscript's merits and for the instructive suggestions (also articulated in the specific comments). We agree that we should show the data on reduced colony growth on agar plates to demonstrate that the drug concentrations used in the study are relevant. We will include this in the revised version, as well as changes in response to all specific comments.

      We acknowledge that the observed upregulation of DNA repair enzymes and the low mutation rates under drug pressure represent correlative data. Therefore, we opted against presenting the qPCR results as a mechanistic explanation. In the manuscript, we carefully stated: "The observed upregulation of the relevant DNA repair enzymes might account for the low mutation rate even under drug pressure." We did not establish a mechanistic link or emphasize the repair activation in the title, abstract, or discussion. We recognize the necessity for a new series of targeted experiments to provide mechanistic explanations. In this paper, our aim is to convincingly demonstrate that antibiotic pressure did not induce the occurrence of new adaptive mutations.

    2. eLife assessment

      This useful study reports on the impact of antibiotic pressure on the genomic stability of the mc2155 strain of Mycobacterium smegmatis, a model for Mycobacterium tuberculosis. The study concludes that exposure to antibiotics did not lead to the emergence of new adaptive mutations in laboratory settings, contradicting the prevailing theory of antibiotic resistance development through drug-induced microevolution. While the genomic analysis provided detailed insights into the stability of M. smegmatis following exposure to standard TB treatment antibiotics, the evidence presented for antibiotic pressure not contributing to the occurrence of new adaptive mutations is still incomplete.

    3. Reviewer #1 (Public Review):

      In this manuscript, Molnar, Suranyi and colleagues have probed the genomic stability of Mycobacterium smegmatis in response to several anti-tuberculosis drugs as monotherapy and in combination. Unlike the study by Nyinoh and McFadden http://dx.doi.org/10.1002/ddr.21497 (which should be cited), the authors use a sub-lethal dose of antibiotic. While this is motivated by sound technical considerations, the biological and therapeutic rationale could be further elaborated. The results the authors obtain are in line with papers examining the genomic mutation rate in vitro and from patient samples in Mycobacterium tuberculosis, in vitro in Mycobacterium smegmatis and in vitro in Mycobacterium tuberculosis (although the study by HL David (PMID: 4991927) is not cited). The results are confirmatory of previous studies. It is therefore puzzling why the authors propose the opposite hypothesis in the paper (i.e., antibiotic exposure should increase mutation rates) merely to tear it down later. This straw-man style is entirely unnecessary. The results on the nucleotide pools are interesting, but the statistically significant data are difficult to identify as presented, and therefore the new biological insights are unclear. Finally, the authors show that a fluctuation assay generates mutations with higher frequencies than the genetic stability assays, confirming the well-known effect of phenotypic antibiotic resistance.

    4. Reviewer #2 (Public Review):

      In this study, the authors assess whether selective pressure from drug chemotherapy influences the emergence of drug resistance through the acquisition of genetic mutations or phenotypic tolerance. I commend the authors on their approach of utilizing the mutation accumulation (MA) assay as a means to answer this and whole genome sequencing of clones from the assay convincingly demonstrates low mutation rates in Mycobacteria when exposed to sub-inhibitory concentrations of antibiotics. Also, quantitative PCR highlighted the upregulation of DNA repair genes in Mycobacteria following drug treatment, implying the preservation of genomic integrity via specific repair pathways.

      Even though the findings stem from M. smegmatis exposure to antibiotics under in vitro conditions, this is still relevant in the context of the development of drug resistance so I can see where the authors' train of thought was heading in exploring this. However, I think important experiments to perform to more fully support the conclusion that resistance is largely associated with phenotypic rather than genetic factors would have been to either sequence clones from the ciprofloxacin tolerance assay (to show absent or minimal genetic mutations) or to have tested the MIC of clones from the MA assay (to show an increase in MIC). There seems to be a disconnect in drawing these conclusions from experiments conducted under different conditions, or perhaps the authors can clarify why this was done. With regards to the sub-inhibitory drug concentration applied, there is significant variation in the viability as calculated by CFUs following the different treatments and there is evidence that cell death greatly affects the calculation of mutation rate (PMCID: PMC5966242). For instance, the COMBO treatment led to 6% viability whilst the INH treatment led to 80% cell viability. Are there any adjustments made to take this into account? It would also be useful to the reader to include a supplementary table of the SNPs detected from the lineages of each treatment - to determine if at any point rifampicin treatment led to mutations in rpoB, isoniazid to katG mutations, etc. Overall, while this study is tantalizingly suggestive of phenotypic tolerance playing a leading role in drug resistance (and perhaps genetic mutations a subordinate role) a more substantial link is needed to clarify this.

    5. Reviewer #3 (Public Review):

      Summary:

      This manuscript describes how antibiotics influence genetic stability and survival in Mycobacterium smegmatis. Prolonged treatment with first-line antibiotics did not significantly impact mutation rates. Instead, adaptation to these drugs appears to be mediated by upregulation of DNA repair enzymes. While this study offers robust data, findings remain correlative and fall short of providing mechanistic insights.

      Strengths:

      The strength of this study is the use of genome-wide approaches to address the specific question of whether or not mycobacteria induce mutagenic potential upon antibiotic exposure.

      Weaknesses:

      The authors suggest that the upregulation of DNA repair enzymes ensures a low mutation rate under drug pressure. However, this suggestion is based on correlative data, and there is no mechanistic validation of their speculations in this study.

      Furthermore, as detailed below, some of the statements made by the authors are not substantiated by the data presented in the manuscript.

      Finally, some clarifications are needed for the methodologies employed in this study. Most importantly, reduced colony growth should be demonstrated on agar plates to indicate that the drug concentrations calculated from liquid culture growth can be applied to agar surface growth. Without such validations, the lack of induced mutation could simply be due to the fact that the drug concentrations used in this study were insufficient.

    1. Author response:

      eLife assessment

      This paper presents a valuable optimization algorithm for determining the spatio-temporal organization of chromatin. The algorithm identifies the polymer model that best fits population averaged Hi-C data and makes predictions about the spatio-temporal organization of specific genomic loci such as the oncogenic Myc locus. While the algorithm will be of value to biologists and physicists working in the field of genome organization, the provided methodological details and evidence are incomplete to fully substantiate the conclusions. In particular, the following would be beneficial: analysis of single-cell data, the inclusion of loci beyond Myc, testing the dependence of results on the chosen parameters, providing more details on CTCF occupancy at loop anchors, and better substantiating the claim about predictions of single-cell heterogeneity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors of this study aim to use an optimization algorithm approach, based on the established Nelder-Mead method, to infer polymer models that best match input bulk Hi-C contact data. The procedure infers the best parameters of a generic polymer model that combines loop-extrusion (LE) dynamics and compartmentalization of chromatin types driven by weak biochemical affinities. Using this and DNA FISH, the authors investigate the chromatin structure of the MYC locus in leukemia cells, showing that loop extrusion alone cannot explain local pathogenic chromatin rearrangements. Finally, they study the locus single-cell heterogeneity and time dynamics.

      Strengths:

      • The optimization method provides a fast computational tool that speeds up the parameter search of complex chromatin polymer models and is a good technical advancement.

      • The method is not restricted to short genomic regions, as in principle it can be applied genome-wide to any input Hi-C dataset, and could be potentially useful for testing predictions on chromatin structure.

      Weaknesses:

      (1) The optimization is based on the iterative comparison of simulated and Hi-C contact matrices using the Spearman correlation. However, the inferred set of the best-fit simulation parameters could sensitively depend on such a specific metric choice, questioning the robustness of the output polymer models. How do results change by using different correlation coefficients?

      This is an important question. We have tested several metrics in the process of building the fitting procedure. We will showcase side-by-side comparisons of the fitting results obtained using these different metrics in an upcoming version of the preprint.
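
      The metric dependence raised in point (1) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy matrices below are hypothetical, and Pearson correlation is used only as one example of an alternative metric. It compares the off-diagonal entries of a "Hi-C" matrix with a "simulated" matrix that is a monotone (squared) transform of it, so the rank-based Spearman score is blind to the distortion while Pearson is not.

```python
import math


def upper_triangle(matrix, min_separation=1):
    """Flatten entries above the diagonal (the diagonal is uninformative)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + min_separation, n)]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy 4x4 contact matrices (illustrative values only, not real data).
hic = [
    [1.0, 0.5, 0.2, 0.1],
    [0.5, 1.0, 0.4, 0.15],
    [0.2, 0.4, 1.0, 0.6],
    [0.1, 0.15, 0.6, 1.0],
]
simulated = [[value ** 2 for value in row] for row in hic]

x, y = upper_triangle(hic), upper_triangle(simulated)
print("Spearman:", spearman(x, y))
print("Pearson: ", pearson(x, y))
```

      In this toy case the two metrics disagree about how well the simulation fits, which is the kind of sensitivity a side-by-side comparison of fitting metrics would expose.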

      (2) The best-fit contact threshold of 420nm seems a quite large value, considering that contact probabilities of pairs of loci at the mega-base scale are defined within 150nm (see, e.g., (Bintu et al. 2018) and (Takei et al. 2021)).

      This is a good point. Unfortunately, there is no established standard distance cutoff to map distances to Hi-C contact frequency data. Indeed, previous publications have used anywhere from 120 nm to 500 nm (see e.g. (Cardozo Gizzi et al. 2019), (Cattoni et al. 2017), (Mateo et al. 2019), (Hafner et al. 2022), (Murphy and Boettiger 2022), (Takei et al. 2021), (Fudenberg and Imakaev 2017), (Wang et al. 2016), (Su et al. 2020), (Chen et al. 2022), (Finn et al. 2019)). We will include a supplementary table in the upcoming revised preprint listing these values to demonstrate the lack of consensus. This large variation could reflect different chromatin compaction levels across distinct model systems, and different spatial resolutions in DNA FISH experiments performed by different labs. The variance in the threshold choice is also likely partially explained by Hi-C experimental details, e.g. the enzyme used for digestion, which biases the effective length scale of interactions detected (Akgol Oksuz et al. 2021). Among commonly used restriction enzymes, HindIII has a relatively low cutting frequency, which results in a lower sensitivity to short-range interactions; on the other hand, MboI has a higher cutting frequency, which results in a higher sensitivity to short-range interactions (Akgol Oksuz et al. 2021). Because the Hi-C data we used for the Myc locus from (Kloetgen et al. 2020) was generated using HindIII, we chose a distance cutoff close to the larger end of published values (420 nm).
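
      To make the role of the distance cutoff concrete, here is a minimal sketch of how a threshold turns an ensemble of pairwise distances into a contact frequency. It is an illustration under stated assumptions, not the authors' pipeline: the function name and the six distance values are hypothetical, and only the 150 nm and 420 nm cutoffs come from the discussion above.

```python
def contact_frequency(distances_nm, cutoff_nm):
    """Fraction of sampled configurations in which two loci are within the cutoff."""
    if not distances_nm:
        raise ValueError("need at least one distance sample")
    in_contact = sum(1 for distance in distances_nm if distance <= cutoff_nm)
    return in_contact / len(distances_nm)


# Hypothetical distances (nm) between one pair of loci across six configurations.
distances_nm = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]

for cutoff_nm in (150.0, 420.0):
    frequency = contact_frequency(distances_nm, cutoff_nm)
    print(f"cutoff {cutoff_nm:.0f} nm -> contact frequency {frequency:.3f}")
```

      The same ensemble yields very different contact frequencies at the two cutoffs, which is why the threshold has to be fitted or justified rather than assumed.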

      (3) In their model, the authors consider the presence of LE anchor sites at Hi-C TAD boundaries. Do they correspond to real, experimentally found CTCF sites located at genomic positions, or they are just assumed? A track of CTCF peaks of the considered chromatin loci would be needed.

      We apologize this was not clear. The LE anchor sites in the simulation model were chosen because they correspond to experimental CTCF sites and ChIP-seq peaks located at the corresponding genomic positions. Representative CTCF ChIP-seq tracks from (Kloetgen et al. 2020) will be added to figure 2 in the revised preprint version to emphasize this point.

      (4) In the model, each TAD is assigned a specific energy affinity value. Do the different domain types (i.e., different colors) have a mutually attractive energy? If so, what is its value and how is it determined? The simulated contact maps (e.g., Figure 2C) seem to allow attractions between different blocks, yet this is unclear.

      Sorry this was not explicit. The attraction energy between a pair of monomers in the simulation is determined using the geometric mean of the affinities of the two monomers. This applies to both monomers within the same domain and in different domains. This detail will be clarified in the upcoming revised preprint.
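
      The pairing rule described here can be sketched as follows. This is a minimal illustration rather than the simulation code itself: the function names, the per-monomer affinity values, and the assumption that affinities are non-negative numbers in arbitrary energy units are all hypothetical.

```python
import math


def pair_attraction(affinity_i, affinity_j):
    """Attraction energy of a monomer pair: geometric mean of the two affinities."""
    if affinity_i < 0 or affinity_j < 0:
        raise ValueError("affinities are assumed to be non-negative")
    return math.sqrt(affinity_i * affinity_j)


def attraction_matrix(monomer_affinities):
    """All pairwise attraction energies, within and across domains alike."""
    return [
        [pair_attraction(a, b) for b in monomer_affinities]
        for a in monomer_affinities
    ]


# Two monomers from a weakly attractive domain, two from a stronger one
# (hypothetical values in arbitrary energy units).
monomer_affinities = [0.1, 0.1, 0.4, 0.4]
energies = attraction_matrix(monomer_affinities)

print("within weak domain:  ", energies[0][1])
print("within strong domain:", energies[2][3])
print("across domains:      ", energies[0][2])
```

      Under this rule a cross-domain pair always gets an energy between the two within-domain values, so different blocks attract each other without any extra parameter.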

      (5) To substantiate the claim that the simulations can predict heterogeneity across single cells, the authors should perform additional analyses. For instance, they could plot the histograms (models vs. experiments) of the TAD2-TAD4 distance distributions and check whether the models can recapitulate the FISH-observed variance or standard deviation. They could also add other testable predictions, e.g., on gyration radius distributions, kurtosis, all-against-all comparison of single-molecule distance matrices, etc.

      We agree that heterogeneity prediction is a key advantage of the simulations. We do note that the histograms (models vs. experiments) of the TAD2-TAD4 distance distributions measured by FISH were plotted in Fig. 3C as empirical cumulative probability distributions (as is standard in the field), side by side with the simulation predictions. Simulations indeed recapitulate the variance observed by FISH. We also had emphasized this important point in the main text: “Importantly, not just the average distances, but the shape of the distance distribution across individual cells closely matches the predictions of the simulations in both cell types, further confirming that the simulations can predict heterogeneity across cells.”
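
      One way to quantify how closely two distance distributions match, beyond comparing means, is sketched below. This is an illustrative assumption, not the analysis performed in the manuscript: the function names and the distance samples are hypothetical, and the maximum gap between empirical cumulative distributions (the two-sample Kolmogorov-Smirnov statistic) is only one possible summary alongside the standard deviation.

```python
import statistics
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical cumulative distribution function of a sample."""
    ordered = sorted(sample)
    size = len(ordered)

    def cumulative(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / size

    return cumulative


def max_ecdf_gap(sample_a, sample_b):
    """Largest vertical gap between the two empirical CDFs (0 means identical)."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    support = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in support)


# Hypothetical locus-to-locus distances in nm (illustrative values only).
fish_nm = [210.0, 340.0, 280.0, 450.0, 390.0, 520.0]
simulated_nm = [230.0, 330.0, 300.0, 440.0, 410.0, 500.0]

print("FISH spread (sd):     ", statistics.pstdev(fish_nm))
print("simulated spread (sd):", statistics.pstdev(simulated_nm))
print("max ECDF gap:         ", max_ecdf_gap(fish_nm, simulated_nm))
```

      A small gap together with similar standard deviations would indicate that the shape of the distribution, and not only its mean, is reproduced.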

      (6) The authors state that loop extrusion is crucial for enhancer function only at large distances. How does that reconcile, e.g., with Mach et al. Nature Gen. (2022) where LE is found to constrain the dynamics of genomically close (150kb) chromatin loci?

      This is an interesting question. In (Mach et al. 2022), the authors tracked the physical distance between two fluorescent labels positioned next to either anchor of a ~150 kb engineered topological domain using live-cell imaging. They found that abrogation of the loop anchors by ablation of the CTCF binding motifs, or knock-down of the cohesin subunit Rad21 resulted in increased physical distance between the loci. HMM Modeling of the distance over time traces suggests that the increased distance resulted from rarer and shorter contacts between the anchors. While this might seem at odds with the results of Fig. 4L, we note a key difference between the loci. While (Mach et al. 2022) observed the dynamics of the distance separating two CTCF loop anchors, in our model only the MYC promoter is proximal to a loop anchor, while the position of the second locus is varied, but remains far from the other anchor. The deletion of the CTCF sites at both anchors in (Mach et al. 2022) indeed results in a lowered sensitivity of the physical distance to Rad21 knock-down, reminiscent of the results of Fig. 4L in our work. This result demonstrates that loop extrusion disruption disproportionately impacts distances between loci close to loop anchors, consistent with Hi-C results (Rao et al. 2017; Nora et al. 2017). We therefore believe that the models in our work and (Mach et al. 2022) are not at odds, but simply reflect that loop extrusion perturbations impact distances between loop anchors the most. Enhancer-Promoter loops are generally distinct from CTCF-mediated loops (Hsieh et al. 2020, 2022). While (Mach et al. 2022) represents a landmark study in our understanding of the dynamics of genomic folding by loop extrusion, we therefore believe that the locus we chose here - which matches the endogenous MYC architecture - may more accurately represent Enhancer-Promoter dynamics than a synthetic CTCF loop. 
      To better articulate the similarities between model predictions and differences between the two loci, we will simulate a locus matching that of (Mach et al. 2022) in the upcoming revised preprint, and test the sensitivity of contact frequency and duration to in silico cohesin knock-down. This will also serve to extend the generality of our conclusions to different categories of genomic architectures, and the text will be clarified accordingly.

      Reviewer #2 (Public Review):

      Summary:

      The authors, Fu et al., developed polymer models that combine loop extrusion with attractive interactions to best describe Hi-C population average data. They analyzed Hi-C data of the MYC locus as an example and developed an optimization strategy to extract the parameters that best fit this average Hi-C data.

      Strengths:

      The model has an intuitive nature and the authors masterfully fitted the model to predict relevant biology/Hi-C methodology parameters. This includes loop extrusion parameters, the need for self-interaction with specific energies, and the time and distance parameters expected for Hi-C capture.

      Weaknesses:

      (1) We are no longer in the age in which the community only has access to population average Hi-C. Why was only the population average Hi-C used in this study?

      Can single-cell data, i.e. single-cell Hi-C/Dip-C data or chromatin tracing data (see Tan et al. Science 2018 for Dip-C; Bintu et al. Science 2018 and Su et al. Cell 2020 for chromatin tracing), or even two-color DNA FISH data (used here only as validation), better constrain these models? At the very least the simulations themselves could be used to answer this essential question.

      I am expecting that the single-cell variance and overall distributions of distances between loci might better constrain the models, and the authors should at least comment on it.

      We agree that it is possible to recapitulate single-cell Hi-C or chromatin tracing data with simulations, and that these data modalities have a superior potential to constrain polymer models because they provide an ensemble of single allele structures rather than population-averaged contact frequencies. However, these data remain out of reach for most labs compared to Hi-C. Our goal with this work was to provide an approachable method that anyone interested could deploy on their locus of choice, and reasoned that Hi-C currently remains the data modality available to most. We envision this strategy will help reach labs beyond the small number of groups expert in single cell chromatin architecture, and thus hopefully broaden the impact of polymer simulations in the chromatin organization field.

      Nevertheless, we do agree that the comparison of single-cell chromatin architectures to simulations is a fertile ground for future studies. We will include a brief discussion of the potential of single-cell architectures in an upcoming version of the manuscript.

      (2) The authors claimed "Our parameter optimization can be adapted to build biophysical models of any locus of interest. Despite the model's simplicity, the best-fit simulations are sufficient to predict the contribution of loop extrusion and domain interactions, as well as single-cell variability from Hi-C data. Modeling dynamics enables testing mechanistic relationships between chromatin dynamics and transcription regulation. As more experimental results emerge to define simulation parameters, updates to the model should further increase its power." The focus on the Myc locus in this study is too narrow for this claim. I am expecting at least one more locus for testing the generality of this model.

      We note that we used two distinct loci in the study, the MYC locus in leukemia vs T cells (Figs. 2-3) and a representative locus in experiments comparing WT CTCF with a mutant that leads to loss of a subset of CTCF binding sites (Fig. 1L). To further demonstrate generality, we will add to the upcoming revised preprint a demonstration of the simulation fitting to other loci acquired in different cell types.

      Akgol Oksuz, Betul, Liyan Yang, Sameer Abraham, Sergey V. Venev, Nils Krietenstein, Krishna Mohan Parsi, Hakan Ozadam, et al. 2021. “Systematic Evaluation of Chromosome Conformation Capture Assays.” Nature Methods 18 (9): 1046–55.

      Bintu, Bogdan, Leslie J. Mateo, Jun-Han Su, Nicholas A. Sinnott-Armstrong, Mirae Parker, Seon Kinrot, Kei Yamaya, Alistair N. Boettiger, and Xiaowei Zhuang. 2018. “Super-Resolution Chromatin Tracing Reveals Domains and Cooperative Interactions in Single Cells.” Science 362 (6413). https://doi.org/10.1126/science.aau1783.

      Cardozo Gizzi, Andrés M., Diego I. Cattoni, Jean-Bernard Fiche, Sergio M. Espinola, Julian Gurgo, Olivier Messina, Christophe Houbron, et al. 2019. “Microscopy-Based Chromosome Conformation Capture Enables Simultaneous Visualization of Genome Organization and Transcription in Intact Organisms.” Molecular Cell 74 (1): 212–22.e5.

      Cattoni, Diego I., Andrés M. Cardozo Gizzi, Mariya Georgieva, Marco Di Stefano, Alessandro Valeri, Delphine Chamousset, Christophe Houbron, et al. 2017. “Single-Cell Absolute Contact Probability Detection Reveals Chromosomes Are Organized by Multiple Low-Frequency yet Specific Interactions.” Nature Communications 8 (1): 1753.

      Chen, Liang-Fu, Hannah Katherine Long, Minhee Park, Tomek Swigut, Alistair Nicol Boettiger, and Joanna Wysocka. 2022. “Structural Elements Facilitate Extreme Long-Range Gene Regulation at a Human Disease Locus.” bioRxiv. https://doi.org/10.1101/2022.10.20.513057.

      Finn, Elizabeth H., Gianluca Pegoraro, Hugo B. Brandão, Anne-Laure Valton, Marlies E. Oomen, Job Dekker, Leonid Mirny, and Tom Misteli. 2019. “Extensive Heterogeneity and Intrinsic Variation in Spatial Genome Organization.” Cell 176 (6): 1502–15.e10.

      Fudenberg, Geoffrey, and Maxim Imakaev. 2017. “FISH-Ing for Captured Contacts: Towards Reconciling FISH and 3C.” Nature Methods 14 (7): 673–78.

      Hafner, Antonina, Minhee Park, Scott E. Berger, Elphège P. Nora, and Alistair N. Boettiger. 2022. “Loop Stacking Organizes Genome Folding from TADs to Chromosomes.” bioRxiv. https://doi.org/10.1101/2022.07.13.499982.

      Hsieh, Tsung-Han S., Claudia Cattoglio, Elena Slobodyanyuk, Anders S. Hansen, Xavier Darzacq, and Robert Tjian. 2022. “Enhancer-Promoter Interactions and Transcription Are Largely Maintained upon Acute Loss of CTCF, Cohesin, WAPL or YY1.” Nature Genetics 54 (12): 1919–32.

      Hsieh, Tsung-Han S., Claudia Cattoglio, Elena Slobodyanyuk, Anders S. Hansen, Oliver J. Rando, Robert Tjian, and Xavier Darzacq. 2020. “Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin Folding.” Molecular Cell 78 (3): 539–53.e8.

      Kloetgen, Andreas, Palaniraja Thandapani, Panagiotis Ntziachristos, Yohana Ghebrechristos, Sofia Nomikou, Charalampos Lazaris, Xufeng Chen, et al. 2020. “Three-Dimensional Chromatin Landscapes in T Cell Acute Lymphoblastic Leukemia.” Nature Genetics 52 (4): 388–400.

      Mach, Pia, Pavel I. Kos, Yinxiu Zhan, Julie Cramard, Simon Gaudin, Jana Tünnermann, Edoardo Marchi, et al. 2022. “Cohesin and CTCF Control the Dynamics of Chromosome Folding.” Nature Genetics 54 (12): 1907–18.

      Mateo, Leslie J., Sedona E. Murphy, Antonina Hafner, Isaac S. Cinquini, Carly A. Walker, and Alistair N. Boettiger. 2019. “Visualizing DNA Folding and RNA in Embryos at Single-Cell Resolution.” Nature 568 (7750): 49–54.

      Murphy, Sedona, and Alistair Nicol Boettiger. 2022. “Polycomb Repression of Hox Genes Involves Spatial Feedback but Not Domain Compaction or Demixing.” bioRxiv. https://doi.org/10.1101/2022.10.14.512199.

      Nora, Elphège P., Anton Goloborodko, Anne-Laure Valton, Johan H. Gibcus, Alec Uebersohn, Nezar Abdennur, Job Dekker, Leonid A. Mirny, and Benoit G. Bruneau. 2017. “Targeted Degradation of CTCF Decouples Local Insulation of Chromosome Domains from Genomic Compartmentalization.” Cell 169 (5): 930–44.e22.

      Nuebler, Johannes, Geoffrey Fudenberg, Maxim Imakaev, Nezar Abdennur, and Leonid A. Mirny. 2018. “Chromatin Organization by an Interplay of Loop Extrusion and Compartmental Segregation.” Proceedings of the National Academy of Sciences of the United States of America 115 (29): E6697–6706.

      Rao, Suhas S. P., Su-Chen Huang, Brian Glenn St Hilaire, Jesse M. Engreitz, Elizabeth M. Perez, Kyong-Rim Kieffer-Kwon, Adrian L. Sanborn, et al. 2017. “Cohesin Loss Eliminates All Loop Domains.” Cell 171 (2): 305–20.e24.

      Su, Jun-Han, Pu Zheng, Seon S. Kinrot, Bogdan Bintu, and Xiaowei Zhuang. 2020. “Genome-Scale Imaging of the 3D Organization and Transcriptional Activity of Chromatin.” Cell 182 (6): 1641–59.e26.

      Takei, Yodai, Shiwei Zheng, Jina Yun, Sheel Shah, Nico Pierson, Jonathan White, Simone Schindler, Carsten H. Tischbirek, Guo-Cheng Yuan, and Long Cai. 2021. “Single-Cell Nuclear Architecture across Cell Types in the Mouse Brain.” Science 374 (6567): 586–94.

      Wang, Siyuan, Jun-Han Su, Brian J. Beliveau, Bogdan Bintu, Jeffrey R. Moffitt, Chao-Ting Wu, and Xiaowei Zhuang. 2016. “Spatial Organization of Chromatin Domains and Compartments in Single Chromosomes.” Science 353 (6299): 598–602.

    2. Reviewer #2 (Public Review):

      Summary:

      The authors, Fu et al., developed polymer models that combine loop extrusion with attractive interactions to best describe Hi-C population average data. They analyzed Hi-C data of the MYC locus as an example and developed an optimization strategy to extract the parameters that best fit this average Hi-C data.

      Strengths:

      The model has an intuitive nature and the authors masterfully fitted the model to predict relevant biology/Hi-C methodology parameters. This includes loop extrusion parameters, the need for self-interaction with specific energies, and the time and distance parameters expected for Hi-C capture.

      Weaknesses:

      (1) We are no longer in the age in which the community only has access to population average Hi-C. Why was only the population average Hi-C used in this study?

      Can single-cell data, i.e. single-cell Hi-C/Dip-C data or chromatin tracing data (see Tan et al. Science 2018 for Dip-C; Bintu et al. Science 2018 and Su et al. Cell 2020 for chromatin tracing), or even two-color DNA FISH data (used here only as validation), better constrain these models? At the very least the simulations themselves could be used to answer this essential question.

      I am expecting that the single-cell variance and overall distributions of distances between loci might better constrain the models, and the authors should at least comment on it.

      (2) The authors claimed "Our parameter optimization can be adapted to build biophysical models of any locus of interest. Despite the model's simplicity, the best-fit simulations are sufficient to predict the contribution of loop extrusion and domain interactions, as well as single-cell variability from Hi-C data. Modeling dynamics enables testing mechanistic relationships between chromatin dynamics and transcription regulation. As more experimental results emerge to define simulation parameters, updates to the model should further increase its power." The focus on the Myc locus in this study is too narrow for this claim. I am expecting at least one more locus for testing the generality of this model.

    3. eLife assessment

      This paper presents a valuable optimization algorithm for determining the spatio-temporal organization of chromatin. The algorithm identifies the polymer model that best fits population averaged Hi-C data and makes predictions about the spatio-temporal organization of specific genomic loci such as the oncogenic Myc locus. While the algorithm will be of value to biologists and physicists working in the field of genome organization, the provided methodological details and evidence are incomplete to fully substantiate the conclusions. In particular, the following would be beneficial: analysis of single-cell data, the inclusion of loci beyond Myc, testing the dependence of results on the chosen parameters, providing more details on CTCF occupancy at loop anchors, and better substantiating the claim about predictions of single-cell heterogeneity.

    4. Reviewer #1 (Public Review):

      Summary:

      The authors of this study aim to use an optimization algorithm approach, based on the established Nelder-Mead method, to infer polymer models that best match input bulk Hi-C contact data. The procedure infers the best parameters of a generic polymer model that combines loop-extrusion (LE) dynamics and compartmentalization of chromatin types driven by weak biochemical affinities. Using this and DNA FISH, the authors investigate the chromatin structure of the MYC locus in leukemia cells, showing that loop extrusion alone cannot explain local pathogenic chromatin rearrangements. Finally, they study the locus single-cell heterogeneity and time dynamics.

      Strengths:

      -The optimization method provides a fast computational tool that speeds up the parameter search of complex chromatin polymer models and is a good technical advancement.

      -The method is not restricted to short genomic regions, as in principle it can be applied genome-wide to any input Hi-C dataset, and could be potentially useful for testing predictions on chromatin structure.

      Weaknesses:

      (1) The optimization is based on the iterative comparison of simulated and Hi-C contact matrices using the Spearman correlation. However, the inferred set of the best-fit simulation parameters could sensitively depend on such a specific metric choice, questioning the robustness of the output polymer models. How do results change by using different correlation coefficients?

      (2) The best-fit contact threshold of 420nm seems a quite large value, considering that contact probabilities of pairs of loci at the mega-base scale are defined within 150nm (see, e.g., Bintu et al. Science (2018) and Takei et al. Science (2021)).

      (3) In their model, the authors consider the presence of LE anchor sites at Hi-C TAD boundaries. Do they correspond to real, experimentally found CTCF sites located at genomic positions, or they are just assumed? A track of CTCF peaks of the considered chromatin loci would be needed.

      (4) In the model, each TAD is assigned a specific energy affinity value. Do the different domain types (i.e., different colors) have a mutually attractive energy? If so, what is its value and how is it determined? The simulated contact maps (e.g., Figure 2C) seem to allow attractions between different blocks, yet this is unclear.

      (5) To substantiate the claim that the simulations can predict heterogeneity across single cells, the authors should perform additional analyses. For instance, they could plot the histograms (models vs. experiments) of the TAD2-TAD4 distance distributions and check whether the models can recapitulate the FISH-observed variance or standard deviation. They could also add other testable predictions, e.g., on gyration radius distributions, kurtosis, all-against-all comparison of single-molecule distance matrices, etc.

      (6) The authors state that loop extrusion is crucial for enhancer function only at large distances. How does that reconcile, e.g., with Mach et al. Nature Gen. (2022) where LE is found to constrain the dynamics of genomically close (150kb) chromatin loci?

    1. Reviewer #2 (Public Review):

      Summary:

      The paper presents PPI-hotspot, a method to predict PPI-hotspots. Overall, it could be useful, but serious concerns about the validation and benchmarking of the methodology make it difficult to predict its reliability.

      Strengths:

      Develops an extended benchmark of hot-spots.

      Weaknesses:

      (1) Novelty seems to be just in the extended training set. Features and approaches have been used before.

      (2) As far as I can tell the training and testing sets are the same. If I am correct, it is a fatal flaw.

      (3) Comparisons should state that SPOTONE is a sequence-only ML method that uses similar features but is trained on a smaller dataset. FTMap, I think, predicts binding sites; I don't understand how it can be compared with hot spots. Suggesting superiority by comparing with these methods is an overreach.

      (4) Training in the same dataset as SPOTONE, and then comparing results in targets without structure could be valuable.

      (5) The paper presents, as validation of the predictions, an experimental validation of hotspots in human eEF2. Several predictions were made but only one was confirmed; what was the overall success rate of this exercise?

    2. eLife assessment

      The manuscript presents a machine-learning method to predict protein hotspot residues. The validation is incomplete, and the comparison with other current methods such as FTMap misinterprets their results.

    3. Reviewer #1 (Public Review):

      Summary:

      The paper describes a program developed to identify PPI-hot spots using the free protein structure and compares it to FTMap and SPOTONE, two webservers that they consider as competitive approaches to the problem. On the positive side, I appreciate the effort in providing a new webserver that can be tested by the community but have two major concerns as follows.

      (1) The comparison to the FTMap program is wrong. The authors misinterpret the article they refer to, i.e., Zerbe et al. "Relationship between hot spot residues and ligand binding hot spots in protein-protein interfaces" J. Chem. Inf. Model. 52, 2236-2244, (2012). FTMap identifies hot spots that bind small molecular ligands. The Zerbe et al. article shows that such hot spots tend to interact with hot spot residues on the partner protein in a protein-protein complex (emphasis on "partner"). Thus, the hot spots identified by FTMap are not the hot spots defined by the authors. In fact, because the Zerbe paper considers the partner protein in a complex, the results cannot be compared to the results of Chen et al. This difference is missed by the authors, and hence the comparison to FTMap is invalid. I did not investigate the comparison to SPOTONE, and hence have no opinion.

      (2) Chen et al. use a number of usual features in a variety of simple machine-learning methods to identify hot spot residues. This approach has been used in the literature for more than a decade. Although the authors say that they were able to find only FTMap and SPOTONE as servers, there are dozens of papers that describe such a methodology. Some examples are given here: (Higa and Tozzi, 2009; Keskin, et al., 2005; Lise, et al., 2011; Tuncbag, et al., 2009; Xia, et al., 2010). There are certainly more papers. Thus, while I consider the web server as a potentially useful contribution, the paper does not provide a fundamentally novel approach.

      Higa, R.H. and Tozzi, C.L. Prediction of binding hot spot residues by using structural and evolutionary parameters. Genet Mol Biol 2009;32(3):626-633.

      Keskin, O., Ma, B.Y. and Nussinov, R. Hot regions in protein-protein interactions: The organization and contribution of structurally conserved hot spot residues. J Mol Biol 2005;345(5):1281-1294.

      Lise, S., et al. Predictions of Hot Spot Residues at Protein-Protein Interfaces Using Support Vector Machines. PLoS One 2011;6(2).

      Tuncbag, N., Gursoy, A. and Keskin, O. Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy. Bioinformatics 2009;25(12):1513-1520.

      Xia, J.F., et al. APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility. BMC Bioinformatics 2010;11:174.

      Strengths:

      A new web server was developed for detecting protein-protein interaction hot spots.

      Weaknesses:

      The comparison to FTMap results is wrong. The method is not novel.

    1. eLife assessment

      This study provides valuable insights and allows for hypothesis generation around diet-microbe-host interactions in alcohol use disorder. The strength of the evidence is convincing: the work is done in a rigorous manner and includes well-characterized and described human samples. Limitations include the cross-sectional study design, and the authors should clarify their experimental groups and definitions.

    2. Reviewer #1 (Public Review):

      Summary:

      This work by Leclercq and colleagues performed metabolomics on biospecimens collected from 96 patients diagnosed with several types of alcohol use disorders (AUD). The authors discovered strong alterations in circulating glycerophospholipids, bile acids, and some gut microbe-derived metabolites in AUD patients compared to controls. An exciting part of this work is that metabolomics was also performed in frontal cortex of post-mortem brains and cerebrospinal fluid of heavy alcohol users, and some of the same metabolites were seen to be altered in the central nervous system. This is an important study that will form the basis for hypothesis generation around diet-microbe-host interactions in alcohol use disorder. The work is done in a highly rigorous manner, and the rigorously collected human samples are a clear strength of this work. Overall, many new insights may be gained by this work, and it is poised to have a high impact on the field.

      Strengths:

      (1) The rigorously collected patient-derived samples.

      (2) There is high rigor in the metabolomics investigation.

      (3) Statistical analyses are well-described and strong.

      (4) An evident strength is the careful control of taking blood samples at the same time of the day to avoid alterations in meal- and circadian-related fluctuations in metabolites.

      Weaknesses:

      (1) Some validation in animal models of ethanol exposure compared to pair-fed controls would help strengthen causal relationships between metabolites and alterations in the CNS.

      (2) The classification of "heavy alcohol users" based on autopsy reports may not be that accurate.

      (3) Because most people with alcohol use disorder choose to drink rather than eat, there needs to be more discussion of how dietary intake (secondary to heavy drinking) most likely has a significant impact on the metabolome.

    3. Reviewer #2 (Public Review):

      The authors carried out the current studies with the justification that the biochemical mechanisms that lead to alcohol addiction are incompletely understood. The topic and question addressed here are impactful and indeed deserve further research. To this end, a metabolomics approach toward investigating the metabolic effects of alcohol use disorder and the effect of alcohol withdrawal in AUD subjects is valuable. However, it is primarily descriptive in nature, and these data alone do not meet the stated goal of investigating biochemical mechanisms of alcohol addiction. The current work's most significant limitation is the cross-sectional study design, though inadequate description and citation of the underlying methodological approaches also hampers interest.

      Most of the data are cross-sectional in the study design, i.e., alcohol use disorder vs controls. However, it is well established that there is a high degree of interpersonal variation in metabolism, and further, there is somewhat high intra-personal variation in metabolism over time. This means that the relatively small cohort of subjects is unlikely to reflect the broader condition of interest (AUD/withdrawal). The authors report a comparison of a later time-point after alcohol withdrawal (T2) vs. the AUD condition. However, without replicate time points from the control subjects it is difficult to assess how much of these changes are due to withdrawal vs the intra-personal variation described above. Overall, there is not enough experimental context to interpret these findings into a biological understanding. For example, while several metabolites are linked with AUD and associated with microbiome or host metabolism based on existing literature, it is unclear from the current study what function these changes have concerning AUD, if any. The authors also argue that alcohol withdrawal shifts the AUD plasma metabolic fingerprint towards healthy controls (line 153). However, this is hard to assess based on the plots provided, since the direction of change in the orange data subset reflects AUD T2 vs T1, whereas it is AUD T2 vs Control that would represent the claimed shift. The authors would better support this claim by showing this comparison as well as all experimental groups (including control subjects) in their multi-dimensional model (e.g., PCA). The authors attempt to extend the significance of their findings by assessing post-mortem brain tissues from AUD subjects. The finding that many of the metabolites changed in T2/T1 are also present in AUD brain tissues is interesting, but it does not strongly support the authors' claim that these metabolites are markers of AUD (line 173).

      Concerning the plasma cohort itself, it is unclear how the authors assessed compliance with alcohol withdrawal or whether the subjects' blood-alcohol levels were independently verified.

      The second area of concern is the need for more description of the analytical methodology, the lack of metabolite identification validation evidence, and related statistical questions. The authors cite reference #59 regarding the general methodology. However, this reference from their group is a tutorial/review/protocol-focused resource paper, and it needs to be clarified how specific critical steps were actually applied to the current plasma study samples given the range of descriptions provided in the citations. The authors report a variety of interesting metabolites, including their primary fragment intensities, which are appreciated (Supplementary Table 3), but no MS2 matching scores are provided for level 2 or 3 hits. Further, level 1 hits under their definition are validated by an in-house standard, but no supporting data are provided besides this categorization. Finally, a common risk in such descriptive studies is finding spurious associations, especially considering the many factors described in the current work. These include AUD, depression, anxiety, craving, withdrawal, etc. The authors describe the use of BH correction for multiple-hypothesis testing. However, this approach only accounts for the many possible metabolite association tests within each comparison (such as metabolites vs depression). It does not account for the multi-variate comparisons to the many behavior/clinical factors described above. The authors should employ one of several common strategies, such as linear mixed effects models, for these types of multi-variate assessments.
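      To make the multiple-testing concern concrete, the following is a minimal sketch of the Benjamini-Hochberg (BH) adjustment, applied either within one comparison or pooled across all clinical factors. It is an illustration only and is not the authors' pipeline, nor the mixed-effects modelling the reviewer recommends; the function names and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the tests sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pooled_bh(pvalues_by_factor):
    """Adjust across ALL (factor, metabolite) tests as one family.

    `pvalues_by_factor` maps a clinical factor name to its list of
    per-metabolite p-values (hypothetical input layout).
    """
    keys = [(f, j) for f, ps in pvalues_by_factor.items() for j in range(len(ps))]
    flat = [pvalues_by_factor[f][j] for f, j in keys]
    return dict(zip(keys, benjamini_hochberg(flat)))


# Hypothetical p-values, two metabolites tested against two factors.
example = {"depression": [0.01, 0.04], "anxiety": [0.03, 0.005]}
within = {f: benjamini_hochberg(ps) for f, ps in example.items()}
pooled = pooled_bh(example)
```

      In this toy example the within-factor adjustment treats each factor as a family of two tests, whereas the pooled version treats all four tests as one family, which is the distinction the reviewer draws. Pooling is only one simple option and does not model correlation between the clinical factors.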

    1. Reviewer #1 (Public Review):

      Summary:

      Arman Angaji and his team delved into the intricate world of tumor growth and evolution, utilizing a blend of computer simulations and real patient data from liver cancer.

      Strengths:

      Their analysis of how mutations and clones are distributed within tumors revealed an interesting finding: tumors don't just spread from their edges as previously believed. Instead, they expand both from within and from the edges simultaneously, suggesting a unique growth mode. This mode naturally indicates that external forces may play a role in cancer cell dispersion within the tumor. Moreover, their research hints at an intriguing phenomenon: the high death rate of progenitor cells and the extremely slow pace of growth in the initial phase of tumor expansion. Understanding this dynamic could significantly impact our comprehension of cancer development.

      Weaknesses:

      It's important to note, however, that this study relies on specific computer models, metrics derived from inferred clones, and a limited number of patient data. While the insights gained are promising, further investigation is essential to validate these findings. Nonetheless, this work opens up exciting avenues for comprehending the evolution of cancers.

    2. eLife assessment

      The paper uses published data and a proposed cell-based model to understand how growth and death mechanisms lead to the observed data. This work provides an important insight into the early stages of tumour development. From the work provided here, the results are solid, showing a thorough analysis. However, the work has not fully specified the model, which can lead to some questions around the model's suitability.

    3. Reviewer #2 (Public Review):

      Summary:

      The article uses a cell-based model to investigate how mutations and cells spread throughout a tumour. The paper uses published data and the proposed model to understand how growth and death mechanisms lead to the observed data. This work provides an insight into the early stages of tumour development. From the work provided here, the results are solid, showing a thorough analysis. However, the work has not fully specified the model, which can lead to some questions around the model's suitability. The article is well-written and presents a very suitable and rigorous analysis to describe the data. The authors did a particularly nice job of discussing and choosing their "metrics of interest", though this is not the main aim of this work.

      Strengths:

      Due to the particularly nice and tractable cell-based model, the authors are able to perform a thorough analysis to compare the published data to that simulated with their model. They then used their computational model to investigate different growth mechanisms of volume growth and surface growth. With this approach, the authors are able to compare the metric of interest (here, the direction angle of a new mutant clone, the dispersion of mutants throughout the tumour) to quantify how the different growth models compare to the observed data. The authors have also used inference methods to identify model parameters based on the data observed. The authors performed a rigorous analysis and have chosen the metrics in an appropriate manner to compare the different growth mechanisms.

      Weaknesses:

      The work contained within this article considers a single cell-based model. While ideally, this is sufficient, results from simulated multi-cellular systems can often be sensitive to the model choice. Performing this work with various other standard models would strengthen the results significantly. This is, however, not an easy task.

      Context:

      Improved mechanistic understanding of the early developmental stages of tumours will further assist in disease treatment and quantification. Understanding how readily and quickly a tumour is evolving is key to understanding how it will develop and progress. This work provides a solid example of how this can be achieved with data alongside simulated models.

    1. eLife assessment

      Leveraging state-of-the-art experimental and analytical approaches, this valuable study characterizes the recruitment and activation of large populations of human motor units during slow isometric contractions in two lower limb muscles. Evidence for many claims is solid, however, the main claim that this study reveals rate coding of entire motoneuron pools requires additional data in more dynamic conditions.

    2. Reviewer #1 (Public Review):

      Summary:

      This study explores the neural control of muscle by decomposing the firing activity of constituent motor units from grids of surface electromyography (EMG) in the Tibialis Anterior (TA) and Vastus Lateralis (VL) during isometric contractions. The study involves extensive samples of motor units across the broadest range of voluntary contraction intensities, up to 80% of MVC. The authors examine the rate coding of the population of motor units, which describes the instantaneous firing rate of each motor unit as a function of muscle force. This relationship is characterized by a natural logarithm function that delineates two distinct phases: an initial phase with a steep acceleration in firing rate, particularly pronounced in low-threshold motor units, and a subsequent modest linear increase in firing rate, more significant in high-threshold motor units.
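      As a rough illustration of what a logarithmic rate-coding relationship implies, the sketch below fits rate = a * ln(force) + b by ordinary least squares and evaluates the slope a / force at low and high force. The exact functional form and fitting procedure used by the authors are not given in this review, so the model, the function names, and the synthetic values are all assumptions.

```python
import math


def fit_log_rate(forces, rates):
    """Least-squares fit of rate = a * ln(force) + b; returns (a, b)."""
    xs = [math.log(f) for f in forces]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def gain(a, force):
    """Slope d(rate)/d(force) = a / force of the fitted curve."""
    return a / force


# Synthetic, noise-free values for illustration (not data from the study).
forces = [1, 2, 5, 10, 20, 40, 80]  # hypothetical force levels, arbitrary units
rates = [3.0 * math.log(f) + 8.0 for f in forces]
a, b = fit_log_rate(forces, rates)
```

      Because the slope of a logarithm falls off as 1 / force, a single fitted curve of this kind already produces a steep initial phase followed by a shallow one; differences between low- and high-threshold units would then appear as differences in the fitted parameters.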

      Strengths:

      The study makes a significant contribution to the field of neuromuscular physiology by providing a detailed analysis of motor unit behavior during muscle contractions in a few ways.

      (1) The significance lies in its comprehensive framework of motor unit activity during isometric contractions in a broad range of intensities, providing insights into the non-linear relationship between the firing rate and the muscle force. The extensive sample of motor units across the pool confirms the observation in animal studies in which the spinal motoneuron exhibits a discharge consisting of distinct phases in response to synaptic currents, under the influence of persistent inward currents. As such, it is now reasonable to state the human motor units across the pool are also under the control of gain modulation via some neuromodulatory effects in addition to synaptic inputs arising from ionotropic effects.

      (2) The firing scheme across the entire motoneuron pool revealed in this study reconciles the debated discrepancy in firing organization, i.e., whether it is 'onion skin' like or not (Heckman and Enoka 2012). The 'onion skin' model states that low-threshold motor units discharge at higher rates than high-threshold motor units; it has been held for a long time because firing behaviors were examined over only a partial range of contraction forces due to technical limitations. This reconciliation is crucial because it is fundamental to modelling how motor unit recruitment and rate coding are organized to generate a desired force, and so to advancing our understanding of motor control.

      (3) The extensive data collection with a novel blind source separation algorithm on the expanded number of channels of surface EMG signal provides a robust dataset that enhances the reliability and validity of findings, setting a new standard for empirical studies in the field.

      Collectively, this study fills several knowledge gaps in the field and advances our understanding of the mechanism underlying the isometric force generation.

      Weaknesses:

      Although the findings and claims based on them are mostly well aligned, some accounts of the methods and claims need to be clarified.

      (1) The authors examine the input-output function of a motor unit by constructing models, using force as an input and discharge rate as an output. It seems circular, or backwards, to use muscle force as an input variable, because muscle force is the result of motor unit discharges, not the cause that elicits them. More specifically, gross muscle force arises from non-linear interactions of synchronous and/or asynchronous discharges across a given motoneuron pool, which produce transient increases in, or maintenance of, twitch force. I acknowledge that it is extremely challenging experimentally to measure synaptic currents impinging upon spinal motoneurons in human subjects, and the authors assume that force can be used as a proxy for synaptic currents. However, it is necessary to explicitly state the caveats and the rationale for why force could be used as the input variable for modelling.

      (2) The authors examine the firing organizations in TA and VL in this study without stating an explicit purpose or rationale for choosing these muscles. This omission makes it hard for readers to interpret the data presented, particularly when comparing the results from the different muscles.

      (3) In the methods, the authors describe a manual curation process applied after the blind source separation algorithm. For readers to understand the whole decomposition process, and to ensure the rigor and robustness of the analyses, it is necessary to provide details on exactly what curation was performed and with what criteria.

      (4) In Figure 3, the early recruited units tend to become untraceable in the higher range of contraction. This is more pronounced in the VL. This limitation would obscure the full firing curve along the force axis, and therefore the limitation and its applicability to the different muscles need to be discussed.

      (5) Without a reference, it is unclear how commonly the notion "the long-held belief that rate coding is similar across motor units from the same pool" is held in the community. Different firing organizations have been modelled and discussed in the seminal paper by Fuglevand et al. (1993), and as far as I understand, the debate has not converged on a specific consensus. As such, a reference is required to support the claim that the notion is widely recognized.

      (6) The authors claim that the firing behavior as a function of force is well characterized by a natural logarithmic function, which consists of initial steep acceleration followed by a modest increase in firing rate. Arguably the gain modulation in firing rate could be attributed to a neuromodulatory effect on the spinal motoneuron, which has been suggested by a number of animal studies. However, the complexity of the interactions between ionotropic and neuromodulatory inputs to motoneurons may require further elucidation to fully understand the mechanisms of neural control; it is possible to consider the differential acceleration among different threshold motor units as a differential combinatory effect of ionotropic and neuromodulatory inputs, but it is not trivial to determine how differentially or systematically the inputs are organized. Likewise, the authors account for the difference in firing rate between TA and VL in terms of different amounts or balances of excitatory and inhibitory inputs to the motoneuron pool, but again this could be explained by other factors, such as a different extent of neuromodulatory effects. To determine the complexity of the interactions, further studies will be warranted.

      (7) The statement " ... the bandwidth of muscle force is < 10Hz during isometric contraction" is unclear from the manuscript alone, which makes the claim that follows it difficult to understand. The point appears very interesting and crucial for motor unit discharge and for force generation and maintenance, because it raises the question of why the discharge rate of most motor units is higher than 10 Hz despite the bandwidth being so limited, but it needs to be elaborated.

      (References)

      Heckman, C. J. & Enoka, R. M. Motor unit. Comprehensive Physiology 2, 2629-2682 (2012).

      Fuglevand, A. J., Winter, D. A. & Patla, A. E. Models of recruitment and rate coding organization in motor-unit pools. J Neurophysiol 70, 2470-2488 (1993).

    3. Reviewer #2 (Public Review):

      Summary:

      The motivation for this study is to provide a comprehensive assessment of motor unit firing rate responses of entire pools during isometric contractions. The authors have used new quantitative methods to extract more unique motor units across contractions than prior studies. This was achieved by recording muscle fibre action potentials from four high-density surface electromyogram (HDsEMG) arrays (Caillet et al., 2023), quantifying residual EMG comparing the recorded and data-based simulation (Figure 1A-B), and developing a metric to compare the spatial identification for each motor unit (Figure 1D-E). From identified motor units, the authors have provided a detailed characterization of recruitment and firing rate responses during slow voluntary isometric contractions in the vastus lateralis and tibialis anterior muscles up to 80% of maximum intensity. In the lower limb, it is interesting how lower threshold motor units have firing rate responses that saturate, whereas higher threshold units that presumably produce higher muscle contractile forces continue to increase their firing rate. In many ways, these results agree with the rate coding of motor units in the extensor digitorum communis muscle (Monster and Chan, 1977). The paper is detailed, and the analyses are well explained. However, there are several points that I think should be addressed to strengthen the paper.

      General comments:

      (1) The authors claim they have measured the complete rate coding profiles of motor units in the vastus lateralis and tibialis anterior muscles. However, this study quantified rate coding during slow and prolonged voluntary isometric contractions whereas the function of rate coding during movements (Grimby and Hannerz, 1977) or more complex isometric contractions (Cutsem and Duchateau, 2005; Marshall et al., 2022) remains unexplored. For example, supraspinal inputs may not scale the same way across low and higher threshold motor units, or between muscles (Devanne et al., 1997), making the response of firing rates to increasing isometric contraction force less clear. Conceptually, the authors focus on the literature on intrinsic motoneurone properties, but in vivo, other possibilities are that descending supraspinal drive, spinal network dynamics, and afferent inputs have different effects across motor unit sizes, muscles, and types of contractions. Also, the influence from local muscles that act as synergists (e.g., vastii muscles for the vastus lateralis, and peroneal muscles that evert the foot for the tibialis anterior) or antagonists (coactivation during higher contraction intensities would stiffen the joint) may provide differential forms of proprioceptive feedback across motor pools.

      (2) The evidence that the entire motor unit pool was recorded per muscle is not clear. There appears to be substantial residual EMG (Figure 1B), signal cancellation of smaller motor units (lines 172-176), some participants had fewer than 20 identified motor units, and contractions never went above 80% of MVC. Also, to my understanding, there remains no gold-standard in awake humans to estimate the total motor unit number in order to determine if the entire pool was decomposed. Furthermore, using four HDsEMG arrays also raises questions about how some channels were placed over non-target muscles, and if motor units were decomposed from surrounding synergists.

      (3) The authors claim (Abstract L51; Discussion L376) that a commonly held view in the field is that rate coding is similar across motor units from the same pool. Perhaps this is in reference to some studies that have carefully assessed lower threshold motor units during lower force ramp contractions (e.g., Fuglevand et al., 2015; Revill and Fuglevand, 2017). However, a more complete integration of the literature exploring motor unit firing rate responses during rapid isometric contractions, comparing different muscles and contraction intensities would be helpful. From Figure 3, the range of rate coding in the tibialis anterior (~7-40 Hz) is greater than the vastus lateralis (~5-22 Hz) muscle across contraction levels. In agreement with other studies, the range of rate coding within some muscles is different than others (Kirk et al., 2021) and during maximal intensity (Bellemare et al., 1983) or rapid contractions (Desmedt and Godaux, 1978). Likewise, within a motor pool, there is a diversity of firing rate responses across motor units of different sizes as a function of isometric force (Monster and Chan, 1977; Desmedt and Godaux, 1977; Kukula and Clamann, 1981; Del Vecchio et al., 2019; Marshall et al., 2022). A strength of this paper is how firing rate responses are quantified across a wide range of motor unit recruitment thresholds and between two muscles. I suggest improving clarity for the general reader, especially in the motivation for testing two lower limb muscles, and elaborating on some of the functional implications.

    4. Reviewer #3 (Public Review):

      Summary:

      This is an interesting manuscript that uses state-of-the-art experimental and simulation approaches to quantify motor unit discharge patterns in the human TA and VL. The non-linear profiles of motor unit discharge were calculated and found to have an initial acceleration phase followed by an attenuation phase. Lower threshold motor units had a larger gain of the initial acceleration whereas the higher threshold motor unit had a higher gain in the attenuation phase. These data represent a technical feat and are important for understanding how humans generate and control voluntary force.

      Strengths:

      The authors used rigorous, state-of-the-art analyses to decompose and validate their motor unit data during a wide range of voluntary efforts.

      The analyses are clearly presented, applied, and visualized.

      The supplemental data provides important transparency.

      Weaknesses:

      The number of participants and muscles tested are quite small - particularly given the constraints on yield. It is unclear if this will translate to other motor pools. The justification for TA and VL should be provided.

      While an impressive effort was made to identify and track motor units across a range of contractions, it appears that a substantial portion of muscle force was not identified. High-intensity contractions are challenging to decompose, and the authors are commended for their technical ability to record population motor unit discharge times with recruitment thresholds up to 75% of a participant's maximal voluntary contractions. However, previous groups have seen substantial recruitment of motor units above 80% and even 90% maximum activation in the soleus. Given the innervation ratios of higher threshold motor units, if recruitment continued to 100%, the top quartile would likely represent a substantial portion of the traditional fast-fatigable motor units. It would be highly interesting to understand the recruitment and rate coding of the highest threshold motor units. At a minimum, I would suggest using terms other than "entire range" or "full spectrum of recruitment thresholds".

      The quantification of hysteresis using torque appears to make self-evident the observation that lower threshold motor units demonstrate less hysteresis with respect to torque. If there is motor unit discharge there will be force. I believe this limitation goes beyond the floor effects discussed in the manuscript. Traditionally, individuals have used the discharge of a lower threshold unit as the measure on which to apply hysteresis analyses to infer ion channel function in human spinal motoneurons.

      The main findings are not entirely novel. See Monster and Chan 1977 and Kanosue et al 1979.

    1. eLife assessment

      This study reports single-cell RNA sequencing results of lung adenocarcinoma, comparing 4 treatment-naive and 5 post-neoadjuvant chemotherapy tumor samples. Of interest is the delineation of two macrophage subtypes: Anti-mac cells (CD45+CD11b+CD86+) and Pro-mac cells (CD45+CD11b+ARG+), with the proportion of Pro-mac/pro-tumorigenic cells significantly increasing in LUAD tissues after neoadjuvant chemotherapy. In terms of significance, the findings might be useful but only if robust statistical comparisons (currently missing) can be provided. As it stands, the level of supportive evidence is inadequate.

    2. Reviewer #1 (Public Review):

      Summary:

      This study reports single-cell RNA sequencing results of lung adenocarcinoma, comparing 4 treatment-naive and 5 post-neoadjuvant chemotherapy tumor samples. The authors claim that there is metabolic reprogramming in tumor cells as well as in stromal and immune cells after chemotherapy. The most significant findings concern the macrophages: there are more pro-tumorigenic cells, i.e. CD45+CD11b+ARG+ cells, after chemotherapy. In the treatment-naive samples, more anti-tumorigenic CD45+CD11b+CD86+ macrophages are found. The authors sorted each population and performed functional analyses.

      Strengths:

      Comparison of the treatment-naive and post-chemotherapy samples of lung adenocarcinoma.

      Weaknesses:

      (1) Lengthy descriptive clustering analysis, with indistinct direct comparisons between the treatment-naive and the post-chemotherapy samples.

      (2) No statistical analysis was performed for the comparison.

      (3) Difficult to match data to the text.

      (4) ARG1 is a cytosolic enzyme that can be detected by intracellular staining after fixation. It is unclear how the staining and sorting was performed to measure function of sorted cells.

    3. Reviewer #2 (Public Review):

      In this study, Huang et al. performed a scRNA-seq analysis of lung adenocarcinoma (LUAD) specimens from 9 human patients, including 5 who received neoadjuvant chemotherapy (NCT), and 4 without treatment (control). The new data were produced using 10x Genomics technology and comprise 83622 cells, of which 50055 and 33567 cells were derived from the NCT and control groups, respectively. Data were processed with the R package Seurat, and various downstream analyses were conducted, including CNV, GSVA, functional enrichment, cell-cell interaction, and pseudotime trajectory analyses. Additionally, the authors performed several experiments for in vitro and in vivo validation of their findings, such as immunohistochemistry, immunofluorescence, flow cytometry, and animal experiments.

      The study extensively discusses the heterogeneity of cell populations in LUAD, comparing the samples with and without chemotherapy. However, there are several shortcomings that diminish the quality of this paper:

      • The number of cells included in the dataset is limited, and the number of patients from different groups is low, which may reduce the attractiveness of the dataset for other researchers to reuse. Additionally, there is no metadata on patients' clinical characteristics, such as age, sex, history of smoking, etc., which would be valuable for future studies.

      • Several crucial details about the data analysis are missing: How many PCs were used for reduction? Which versions of Seurat/inferCNV/other packages were used? Why was monocle2 used and not monocle3 or other packages? Also, the authors use R version 3.6.1, and the current version is 4.3.2.

      • It seems that the authors may lack a fundamental understanding of scRNA-seq data processing and the functions of Seurat. For instance, they state, 'Next, we classified cell types through dimensional reduction and unsupervised clustering via the Seurat package.' However, dimensional reduction and unsupervised clustering are not methods for cell classification. Typically, cell types are classified using marker genes or other established methods. Similarly, the authors write "Therefore, to identify subclusters within each of these nine major cell types, we performed principal component analysis" (Line 127), but principal component analysis is a method for dimensionality reduction, not cell clustering. The authors did not mention the normalization or scaling of the data, which are crucial steps in scRNA-seq data preprocessing.

      • Numerous style and grammar mistakes are present in the main text. For instance, certain sections of the methods are written in the present tense, suggesting that parts of a protocol were copied without text editing. Furthermore, some sections of the introduction are written in the past tense when the present tense would be more suitable. Clusters are inconsistently referred to by numbers or cell types, leading to confusion. Additionally, the authors frequently use the term "evolution" when describing trajectory analysis, which may not be appropriate. Overall, significant revisions to the main text are required.

      • Some figures are not mentioned in order or are not referenced in the text at all, such as Figure 5l (where it is also unclear how the authors selected the root cells). Additionally, many figures have text that is too small to be read without zooming in. Overall, the quality of the figures is inconsistent and sometimes very poor.

      • At times, the authors' statements are incomplete (e.g., Lines 67-69, Line 177, Line 629, Lines 646-648 and 678).
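      The distinction drawn above between preprocessing, clustering, and cell-type annotation can be sketched as follows. This is a minimal, dependency-free illustration and not Seurat or the authors' workflow: the function names and toy counts are hypothetical, cluster labels are assumed to come from a separate unsupervised clustering step, and scaling and PCA are omitted for brevity.

```python
import math


def normalize_log(counts, scale=10_000):
    """Library-size normalize each cell (row), then log1p-transform."""
    out = []
    for cell in counts:
        total = sum(cell)
        out.append([math.log1p(c / total * scale) if total else 0.0 for c in cell])
    return out


def annotate_clusters(expr, genes, clusters, markers):
    """Label each cluster with the cell type whose marker genes have
    the highest mean normalized expression in that cluster."""
    col = {g: i for i, g in enumerate(genes)}
    labels = {}
    for cl in sorted(set(clusters)):
        cells = [row for row, c in zip(expr, clusters) if c == cl]

        def score(marker_genes):
            vals = [row[col[g]] for row in cells for g in marker_genes if g in col]
            return sum(vals) / len(vals) if vals else 0.0

        labels[cl] = max(markers, key=lambda ct: score(markers[ct]))
    return labels


# Toy data: 4 cells x 3 genes, with cluster labels assumed to be given.
genes = ["CD3E", "CD68", "EPCAM"]
counts = [[10, 0, 1], [8, 1, 0], [0, 12, 1], [1, 9, 0]]
clusters = [0, 0, 1, 1]
markers = {"T cell": ["CD3E"], "Macrophage": ["CD68"]}

expr = normalize_log(counts)
labels = annotate_clusters(expr, genes, clusters, markers)
```

      Here clustering only groups cells; it is the marker-gene step that attaches a cell-type name to each group, which is the separation of roles the reviewer asks the authors to make explicit.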

      The results section lacks clarity on several points:

      • The authors state that "myofibroblasts exclusively originated from the control group". However, pathways up-regulated in myofibroblasts (such as glycolysis) were enhanced after chemotherapy, as indicated by GSVA score. Similarly, why are some clusters of TAMs from the control group associated with pathways enriched in chemotherapy group?

      • Further explanation is necessary regarding the distinctions between malignant and non-malignant cells, as well as regarding the upregulation of metabolism-related pathways in fibroblasts from the NCT group. Additionally, clarification is needed regarding why certain TAMs from the control group are associated with pathways enriched in the chemotherapy group.

      • In the section titled 'Chemo-driven Pro-mac and Anti-mac Metabolic Reprogramming Exerted Diametrically Opposite Effects on Tumor Cells': The markers selected to characterize the anti- and pro-macrophages are commonly employed for describing M1 or M2 polarization. It is uncertain whether this new classification into anti- and pro-macrophages is necessary. Additionally, it should be noted that pro-macrophages are anti-inflammatory, while anti-macrophages are pro-inflammatory, which could lead to confusion. M2 macrophages are already recognized for their role in stimulating tumor relapse after chemotherapy.

      • The authors suggest that there is "reprogramming of CD8+ cytotoxic cells" following chemotherapy (Line 409). It remains unclear whether they imply the reprogramming of other CD8+ T cells into cytotoxic cells. While it is indicated that cytotoxic cells from the control group differ from those in the NCT group and that NCT cytotoxic T cells exhibit higher cytotoxicity, the authors did not assess the expression of NK and NK-like T cell markers (aside from NKG7), which may possess greater cytotoxic potential than CD8+ cytotoxic cells. This could also elucidate why cytotoxic cells from the NCT and control groups are positioned on separate branches in trajectory analysis. Overall, with 22.5k T cells in the dataset, only 3 subtypes were identified, suggesting a need for improved cell annotations by the authors.

    1. eLife assessment

      This valuable study presents a series of results aimed at uncovering the involvement of the endosomal sorting protein SNX4 in neurotransmitter release. While the evidence supporting the conclusions is solid, the molecular mechanisms remain unclear, and the study would significantly benefit from additional experiments to strengthen its findings. This paper will be of interest to cell biologists and neurobiologists.

    2. Reviewer #1 (Public Review):

      Summary:

      In the work "Endosomal sorting protein SNX4 limits synaptic vesicle docking and release", Josse Poppinga and collaborators addressed the synaptic function of Sorting Nexin 4 (SNX4). Employing a newly developed in vitro KO model, with live imaging experiments, electrophysiological recordings, and ultrastructural analysis, the authors evaluate modifications in synaptic morphology and function upon loss of SNX4. The data demonstrate increased neurotransmitter release and alteration in synapse ultrastructure with a higher number of docked vesicles and shorter AZ. The evaluation of the presynaptic function of SNX4 is of relevance and tackles an open and yet unresolved question in the field of presynaptic physiology.

      Strengths:

      The sequential characterization of the cellular model is nicely conducted and the different techniques employed are appropriate for the morpho-functional analysis of the synaptic phenotype and the derived conclusions on SNX4 function at the presynaptic site. The authors succeeded in presenting a novel in vitro model that resulted in chronic deletion of SNX4 in neurons. A convincing sequence of experimental techniques is applied to the model to unravel the role of SNX4, whose functions in neuronal cells and at synapses are largely unknown. The understanding of the role of endosomal sorting at the presynaptic site is relevant and of high interest in the field of synaptic physiology and in the pathophysiology of the many described synaptopathies that broadly result in loss of synaptic fidelity and quality control at release sites.

      Weaknesses:

      The flow of the data presentation is mostly descriptive with several consistent morphological and functional modifications upon SNX4 loss. The paper would benefit from a wider characterization that would allow us to address the physiological roles of SNX4 at the synaptic site and speculate on the underlying molecular mechanisms. In addition, due to the described role of SNX4 in autophagy and the high interest in the regulation of synaptic autophagy in the field of synaptic physiology, an initial evaluation of the autophagy phenotype in the neuronal SNX4KO model is important and should not be restricted to the discussion section.

    3. Reviewer #2 (Public Review):

      Summary:

      SNX4 is thought to mediate recycling from endosomes back to the plasma membrane in cells. In this study, the authors demonstrate the increases in the amounts of transmitter release and the number of docked vesicles by combining genetics, electrophysiology, and EM. They failed to find evidence for its role in synaptic vesicle cycling and endocytosis, which may be intuitively closer to the endosome function.

      Strengths:

      The electrophysiological data and EM data are, in principle, convincing, though there are several issues in the study.

      Weaknesses:

      It is unclear why the increase in the amounts of transmitter release and docked vesicles happened in the SNX4 KO mice. In other words, it is unclear how the endosomal sorting proteins in the end regulate or are connected to presynaptic, particularly the active zone function.

    4. Reviewer #3 (Public Review):

      Summary:

      The study aims to determine whether the endosomal protein SNX4 performs a role in neurotransmitter release and synaptic vesicle recycling. The authors exploited a newly generated conditional knockout mouse to allow them to interrogate the SNX4 function. A series of basic parameters were assessed, with an observed impact on neurotransmitter release and active zone morphology. The work is interesting, however as things currently stand, the work is descriptive with little mechanistic insight. There are a number of places where the data appear to be a little preliminary, and some of the conclusions require further validation.

      Strengths:

      The strengths of the work are the state-of-the-art methods to monitor presynaptic function.

      Weaknesses:

      The weaknesses are the fact that the work is largely descriptive, with no mechanistic insight into the role of SNX4. Further weaknesses are the absence of controls in some experiments and the design of specific experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank all of the reviewers for their helpful comments and the effort they made in reading and evaluating our manuscript. In response to them, we have made major changes to the text and figures and performed substantial new experiments. These new data and changes to the text and figures have substantially strengthened the manuscript. We believe that the manuscript is now very strong in both its impact and scope and we hope that reviewers will find it suitable for publication in eLife.

      A point-by-point response to the reviewers' specific comments is provided below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this report, Yu et al ascribe potential tumor suppressive functions to the non-core regions of RAG1/2 recombinases. Using a well-established BCR-ABL oncogene-driven system, the authors model the development of B cell acute lymphoblastic leukemia in mice and found that RAG mutants lacking non-core regions show accelerated leukemogenesis. They further report that the loss of non-core regions of RAG1/2 increases genomic instability, possibly caused by increased off-target recombination of aberrant RAG-induced breaks. The authors conclude that the non-core regions of RAG1 in particular not only increase the fidelity of VDJ recombination, but may also influence the recombination "range" of off-target joints, and that in the absence of the non-core regions, mutant RAG1/2 (termed cRAGs) catalyze high levels of off-target recombination leading to the development of aggressive leukemia.

      Strengths:

      The authors used a genetically defined oncogene-driven model to study the effect of RAG non-core regions on leukemogenesis. The animal studies were well performed and generally included a good number of mice. Therefore, the finding that cRAG expression led to the development of more aggressive BCR-ABL+ leukemia compared to fRAG is solid.

      Weaknesses:

      In general, I find the mechanistic explanation offered by the authors to explain how the non-core regions of RAG1/2 suppress leukemogenesis to be less convincing. My main concern is that cRAG1 and cRAG2 are overexpressed relative to fRAG1/2. This raises the possibility that the observed increased aggressiveness of cRAG tumors compared to fRAG tumors could be solely due to cRAG1/2 overexpression, rather than any intrinsic differences in the activity of cRAG1/2 vs fRAG1/2; and indeed, the authors allude to this possibility in Fig S8, where it was shown that elevated expression of RAG (i.e. fRAG) correlated with decreased survival in pediatric ALL. Although it doesn't mean the authors' assertions are incorrect, this potential caveat should nevertheless be discussed.

      We appreciate the valuable suggestions from the reviewer. BCR-ABL1+ B-ALL is characterized by halted early B-lineage differentiation. In BCR-ABL1+ B cells, RAG recombinases are highly expressed, leading to the inactivation of genes that encode essential transcription factors for B-lineage differentiation. This results in cells being trapped within the precursor compartment, thereby elevating RAG gene expression. Our interpretation of the data suggests that, in BCR-ABL1+ B-ALL mouse models, the high expression of both cRAG and fRAG and the deletion of the non-core regions influence the precision of RAG targeting within the genome. This causes more genomic damage in cRAG tumors than in fRAG tumors, consequently leading to the observed increased aggressiveness of cRAG tumors compared to fRAG tumors. We discussed the issues on Page 12, lines 295-307 in the revised manuscript.

      Some of the conclusions drawn were not supported by the data.

      (1) I'm not sure that the authors can conclude based on μHC expression that there is a loss of pre-BCR checkpoint in cRAG tumors. In fact, Fig. 2B showed that the differences are not statistically significant overall, and more importantly, μHC expression should be detectable in small pre-B cells (CD43-). This is also corroborated by the authors' analysis of VDJ rearrangements, showing that it has occurred at the H chain locus in cRAG cells.

      We appreciate the insightful comment from the reviewer. Upon reevaluation of the data presented in Fig. 2B, we identified and rectified certain errors. The revised analysis now shows that the differences in μHC expression are statistically significant. This significant expression of μHC in fRAG leukemic cells implies that these cells may progress further in differentiation, potentially acquiring an immune phenotype. These modifications have been incorporated into the manuscript on page 7, lines 153-156 in the revised manuscript.

      (2) The authors found a high degree of polyclonal VDJ rearrangements in fRAG tumor cells but a much more limited oligoclonal VDJ repertoire in cRAG tumors. They concluded that this explains why cRAG tumors are more aggressive because BCR-ABL induced leukemia requires secondary oncogenic hits, resulting in the outgrowth of a few dominant clones (Page 19, lines 381-398). I'm not sure this is necessarily a causal relationship since we don't know if the oligoclonality of cRAG tumors is due to selection based on oncogenic potential or if it may actually reflect a more restricted usage of different VDJ gene segments during rearrangement.

      Thank you for your insightful comments and questions regarding the relationship between the oligoclonality of V(D)J rearrangements and the aggressiveness of cRAG tumors. You raise an important point regarding whether the observed oligoclonality is a result of selective pressure favoring clones with specific oncogenic potential, or if it reflects inherent limitations in V(D)J segment usage during rearrangement in cRAG models. In our study, we observed a marked difference in the V(D)J rearrangement patterns between fRAG and cRAG tumor cells, with cRAG tumors exhibiting a more limited, oligoclonal repertoire. This observation led us to speculate that the aggressive nature of cRAG tumors might be linked to a selective advantage conferred by specific V(D)J rearrangements that cooperate with the BCR-ABL1 oncogene to drive leukemogenesis. However, we acknowledge that our current data do not definitively establish a causal relationship between oligoclonality and tumor aggressiveness. The restricted V(D)J repertoire in cRAG tumors could indeed be due to a more constrained rearrangement process, possibly influenced by the altered expression or function of RAG1/2 in the absence of non-core regions. This could limit the diversity of V(D)J rearrangements, leading to the emergence of a few dominant clones not necessarily because they have greater oncogenic potential, but because of a narrowed field of rearrangement possibilities.

      To address this question more thoroughly, future studies could examine the functional consequences of specific V(D)J rearrangements found in dominant cRAG tumor clones. This could include assessing the oncogenic potential of these rearrangements in isolation and in cooperation with BCR-ABL1, as well as exploring the mechanistic basis for the restricted V(D)J repertoire. Such studies would provide deeper insight into the interplay between RAG-mediated recombination, clonal selection, and leukemogenesis in BCR-ABL1+ B-ALL.

      We appreciate your feedback on this matter and agree that further investigation is required to unravel the precise relationship between V(D)J rearrangement diversity and leukemic progression in cRAG models. We have revised our discussion to reflect these considerations and to clarify the speculative nature of our conclusions regarding the link between oligoclonality and tumor aggressiveness. We added more discussion on this issue on Page 7, lines 166-170 in the revised manuscript.

      (3) What constitutes a cancer gene can be highly context- and tissue-dependent. Given that there is no additional information on how any putative cancer gene was disrupted (e.g., truncation of regulatory or coding regions), it is not possible to infer whether increased off-target cRAG activity really directly contributed to the increased aggressiveness of leukemia.

      We fully agree with the issues you raised. In Supplementary Table 3, we have presented data on off-target gene disruptions, specifically in introns, exons, downstream regions, promoters, 3' UTRs, and 5' UTRs. However, this dataset alone does not suffice to conclusively determine whether the increased off-target activity of cRAG directly influences the heightened aggressiveness of leukemia. To bridge this knowledge gap, our future research will extend to include both knockout and overexpression experiments targeting these off-target genes.

      (4) Fig. 6A, it seems that it is really the first four nucleotide (CACA) that determines fRAG binding and the first three (CAC) that determine cRAG binding, as opposed to five for fRAG and four for cRAG, as the author wrote (page 24, lines 493-497).

      We thank the reviewer for the insightful comment. In response, we have revised the text to accurately reflect the nucleotide sequences responsible for RAG binding and cleavage. Specifically, we now clarify that the first four nucleotides (CACA) are crucial for fRAG binding and cleavage, while the initial three nucleotides (CAC) are essential for cRAG binding and cleavage. These updates have been made on page 10, lines 242-245 of the revised manuscript.

      (5) Fig S3B, I don't really see why "significant variations in NHEJ" would necessarily equate "aberrant expression of DNA repair pathways in cRAG leukemic cells". This is purely speculative. Since it has been reported previously that alt-EJ/MMEJ can join off target RAG breaks, do the authors detect high levels of microhomology usage at break points in cRAG tumors?

      We appreciate the reviewer's comment. Currently, we have not observed microhomology usage at breakpoints in cRAG tumors. We plan to address this aspect in a future, more detailed study. Regarding the 'aberrant expression of DNA repair pathways in cRAG leukemic cells', we acknowledge that this is speculative. Therefore, we have carefully rephrased this to 'suggesting a potential aberrant expression of DNA repair pathways in cRAG leukemic cells.' This modification is reflected on page 12, lines 290-291 of the revised manuscript.

      (6) Fig. S7, CDKN2B inhibits CDK4/6 activation by cyclin D, but I don't think it has been shown to regulate CDK6 mRNA expression. The increase in CDK6 mRNA likely just reflects a more proliferative tumor but may have nothing to do with CDKN2B deletion in cRAG1 tumors.

      We fully concur with the reviewer's comment. We have deleted this inappropriate part from the text.

      Insufficient details in some figures. For instance, Fig. 1A, please include statistics in the plot showing a comparison of fRAG vs cRAG1, fRAG vs cRAG2, cRAG1 vs cRAG2. As of now, there's a single p-value (0.0425) stated in the main text and the legend but why is there only one p-value when fRAG is compared to cRAG1 or cRAG2? Similarly, the authors wrote "median survival days 11-26, 10-16, 11-21 days, P < 0.0023-0.0299, Fig. S2B." However, it is difficult for me to figure out what are the numbers referring to. For instance, is 11-26 referring to median survival of fRAG inoculated with three different concentrations of GFP+ leukemic cells or is 11-26 referring to median survival of fRAG, cRAG1, cRAG2 inoculated with 10^5 cells? It would be much clearer if the authors can provide the numbers for each pair-wise comparison, if not in the main text, then at least in the figure legend. In Fig. 5A-B, do the plots depict SVs in cRAG tumors or both cRAG and fRAG cells? Also in Fig. 5, why did 24 SVs give rise to 42 breakpoints, and not 48? Doesn't it take 2 breaks to accomplish rearrangement? In Fig. 6B-C, it is not clear how the recombination sizes were calculated. In the examples shown in Fig. 4, only cRAG1 tumors show intra-chromosomal joins (chr 12), while fRAG and cRAG2 tumors show exclusively inter-chromosomal joins.

      We appreciate the reviewer's feedback and have made the following revisions:

      (1) The text has been adjusted to rectify the previously mentioned error in the figure legends (page 1, lines 5-6).

      (2) We have clarified the intended message in the revised text (page 6, lines 129-130) and the figure legend (page 4-5, lines 107-113) for greater precision.

      (3) Figure 5A-B now presents an overview of all structural variants (SVs) identified in both cRAG and fRAG cells, offering a comprehensive comparison.

      (4) Among the analyzed SVs, 24 generated a total of 48 breakpoints, with 41 occurring within gene bodies and the remaining 7 in adjacent flanking sequences. This informs our exon-intron distribution profile analysis.

      (5) We have defined recombination sizes as ‘the DNA fragment size spanning the two breakpoints’ for clarity (page 10, lines 251-252).

      (6) All off-target recombinations identified in the genome-wide analyses of fRAG, cRAG1, and cRAG2 leukemic cells were determined to be intra-chromosomal joins, highlighting their specific nature within the genomic context.

      Insufficient details on certain reagents/methods. For instance, are the cRAG1/2 mice of the same genetic background as fRAG mice (C57BL/6 WT)? On Page 23, line 481, what is a cancer gene? How are they defined? In Fig. 3C, are the FACS plots gated on intact cells? Since apoptotic cells show high levels of gH2AX, I'm surprised that the fraction of gH2AX+ cells is so much lower in fRAG tumors compared to cRAG tumors. The in vitro VDJ assay shown in Fig 3B is not described in the Method section (although it is described in Fig S5b). Fig. 5A-B, do the plots depict SVs in cRAG tumors or both cRAG and fRAG cells?

      We are grateful for the reviewer's feedback and have incorporated their insights as follows:

      (1) We clarify that both cRAG1/2 and fRAG mice share the same genetic background, specifically the C57BL/6 WT strain, ensuring consistency across experimental models.

      (2) We define a 'cancer gene' as one harboring somatic mutations implicated in cancer. To support our analysis, we refer to the Catalogue Of Somatic Mutations In Cancer (COSMIC) at http://cancer.sanger.ac.uk/cosmic. COSMIC serves as the most extensive repository for understanding the role of somatic mutations in human cancers.

      (3) Upon thorough review of the raw data for γ-H2AX and the fluorescence-activated cell sorting (FACS) plots gated on intact cells, we propose that the observed discrepancies might stem from the limited sensitivity of the γ-H2AX flow cytometry detection method. This insight prompts our commitment to employing more efficient detection methodologies in forthcoming studies.

      (4) Detailed procedures for the in vitro V(D)J recombination assay have been included in the Methods section (page 15, lines 384-388) to enhance the manuscript's comprehensiveness and reproducibility.

      (5) The presented plots offer a comprehensive overview of structural variants (SVs) identified in both cRAG and fRAG cells, providing a holistic view of the genomic landscape across different models.

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript, the authors summarized and introduced the correlation between the non-core regions of RAG1 and RAG2 in BCR-ABL1+ acute B lymphoblastic leukemia and off-target recombination, which has certain innovative and clinical significance.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I would suggest that the authors tone down some of their conclusions, which are not necessarily supported by their own data. In addition, there are some minor mistakes in figure assembly/presentation. For instance, I believe that the axes labels in Fig. 1E were flipped. BrdU should be on the y-axis and 7-AAD on the x-axis. Fig. 3B, the y-axis contains a typo, it should be "CD90.1..." and not "D90.1...". In Fig. 5C, the numbers seem to be flipped, with 93% corresponding to cRAG1 and 100% to cRAG2 (compare with the description on page 23, lines 474-475). Fig. 5C, y-axis, "hybrid" is a typo. Page 3, line 59: The abbreviation of RSS has already been described earlier (p4, line 53).

      We thank the reviewer for these suggestions. We carefully checked the raw data and corrected these mistakes in the revised manuscript.

      Page 3, line 63: "signal" segment (commonly referred to as signal ends), not "signaling" segment.

      We have changed “signaling segment” to “signal ends” in the revised manuscript (page 3, lines 54-55).

      Page 3, lines 64-65: VDJ recombination promotes the development of both B and T cells, and aberrant recombination can cause both B and T cell lymphomas.

      The statement about the role of V(D)J recombination in B and T cell development and its link to lymphomagenesis is grounded in a substantial body of research. Theoretical frameworks and empirical studies delineate how aberrations in the recombination process can lead to genomic instability, potentially triggering oncogenic events. This connection is extensively documented in immunology and oncology literature, illustrating the critical balance between necessary genetic rearrangements for immune diversity and the risk of malignancy when these processes are dysregulated (Thomson, et al.,2020; Mendes, et al.,2014; Onozawa and Aplan,2012).

      Page 4, line 72: "recombinant dispensability" is not a commonly used phrase. Do the authors mean to say that the non-core regions of RAG1/2 are not strictly required for VDJ recombination?

      We thank the reviewers for their insightful suggestion. We have revised the sentence to read, 'Although the non-core regions of RAG1/2 are not essential for V(D)J recombination, the evolutionary conservation of these regions suggests their potential significance in vivo, possibly affecting RAG activity and expression in both quantitative and qualitative manners.' This revision appears on page 3, lines 61-62, in the revised manuscript.

      Fig. 4. It would have been nice to show at least one more cRAG1 tumor Circos plot.

      We appreciate the reviewer's comment and concur with the suggestion. In future sequencing experiments, we will consider including additional replicates. However, due to time and financial constraints, the current sequencing effort was limited to a maximum of three replicates.

      Reviewer #3 (Recommendations For The Authors):

      In the manuscript, the authors summarized and introduced the correlation between the non-core regions of RAG1 and RAG2 in BCR-ABL1+ acute B lymphoblastic leukemia and off-target recombination, which has certain innovative and clinical significance. The following issues need to be addressed by the authors.

      (1) Authors should check and review extensively for improvements to the use of English.

      We thank the reviewer for their comment. With assistance from a native English speaker, we have carefully revised the manuscript to enhance its readability.

      (2) Authors should revise the conclusion so that the above can be clearly reviewed and summarized.

      The conclusion has been partially revised in the revised manuscript.

      (3) The article should state that the experiment was independently repeated three times.

      The experiment was repeated under the same conditions three times, and this information has been described in the Statistics section on page 19, lines 473-475 in the revised manuscript.

      (4) The article will be more convincing if it uses references in the last 5 years.

      We are grateful to the reviewer for their guidance in enhancing our manuscript. We have incorporated additional references from the past five years in the revised version.

      (5) Additional experiments are suggested to elucidate the molecular mechanisms related to off-target recombination.

      We thank the reviewer for this suggestion. In future experiments, we plan to perform ChIP-seq analysis to investigate the relationship between chromatin accessibility and off-target effects, as well as to examine the impact of knocking out and overexpressing off-target genes on cancer development and progression.

      (6) It is suggested to further analyze the effect of the absence of non-core RAG region on the differentiation and development of peripheral B cells in mice by flow analysis and expression of B1 and B2.

      Thank you very much for highlighting this crucial issue. FACS analysis was performed, revealing that leukemia cells in peripheral B cells in mice did not express CD5. The data are presented as follows:

      Author response image 1.

      (7) Fig3A should have three biological replicates and the molecular weight should be labeled on the right side of the strip.

      Thank you for this suggestion. The experiment was independently repeated three times, and the molecular weights have been labeled on the right side of the bands in the revised version.

      References:

      Mendes RD, Sarmento LM, Canté-Barrett K, Zuurbier L, Buijs-Gladdines JG, Póvoa V, Smits WK, Abecasis M, Yunes JA, Sonneveld E, Horstmann MA, Pieters R, Barata JT, Meijerink JP. 2014. PTEN microdeletions in T-cell acute lymphoblastic leukemia are caused by illegitimate RAG-mediated recombination events. Blood 124:567-578. doi:10.1182/blood-2014-03-562751

      Onozawa M, Aplan PD. 2012. Illegitimate V(D)J recombination involving nonantigen receptor loci in lymphoid malignancy. Genes Chromosomes Cancer 51:525-535. doi:10.1002/gcc.21942

      Thomson DW, Shahrin NH, Wang P, Wadham C, Shanmuganathan N, Scott HS, Dinger ME, Hughes TP, Schreiber AW, Branford S. 2020. Aberrant RAG-mediated recombination contributes to multiple structural rearrangements in lymphoid blast crisis of chronic myeloid leukemia. Leukemia 34:2051-2063. doi:10.1038/s41375-020-0751-y

    2. eLife assessment

      Using a set of animal models, this valuable paper shows a tumor-suppressive function of the non-core regions of RAG1/2 recombinases. The conclusions are supported by solid evidence.

    3. Reviewer #1 (Public Review):

      Summary:

      In this report, Yu et al ascribe potential tumor suppressive functions to the non-core regions of RAG1/2 recombinases. Using a well-established BCR-ABL oncogene-driven system, the authors model the development of B cell acute lymphoblastic leukemia in mice and found that RAG mutants lacking non-core regions show accelerated leukemogenesis. They further report that the loss of non-core regions of RAG1/2 increases genomic instability, possibly caused by increased off-target recombination of aberrant RAG-induced breaks. The authors conclude that the non-core regions of RAG1 in particular not only increase the fidelity of VDJ recombination, but may also influence the recombination "range" of off-target joints, and that in the absence of the non-core regions, mutant RAG1/2 (termed cRAGs) catalyze high levels of off-target recombination leading to the development of aggressive leukemia.

      Strengths:

      The authors used a genetically defined oncogene-driven model to study the effect that RAG non-core regions have on leukemogenesis. The animal studies were well performed and generally included a good number of mice. Therefore, the finding that cRAG expression led to the development of more aggressive BCR-ABL+ leukemia compared to fRAG is solid. The authors also present some nice analyses that characterize the (genomic) nature of the aggressive leukemia that develops in the absence of RAG non-core regions.

      Weaknesses:

      The paper relies on cRAG1/2 overexpression, an experimental limitation that needs to be taken into consideration when extrapolating the physiological relevance of the findings.

    1. Reviewer #2 (Public Review):

      Summary:

      The paper entitled "Goal-directed motor actions drive acetylcholine dynamics in sensory cortex" aims to characterize the dynamics of cholinergic signaling in sensory cortex during perceptual behavior. The authors showed that acetylcholine release in S1 was linked to goal-directed motor actions rather than sensory input or reward delivery, a pattern also observed in the auditory cortex (A1). This release was specifically associated with whisking and licking and was potentiated by training. The results contribute to a better understanding of neuromodulator actions. That said, several aspects of the manuscript could benefit from improved writing, data presentation, and statistical analysis.

      Strengths:

      The evidence provided is clear to link ACh response to different task-related events. Implementing two different tasks to show generality is appreciated. Important control analysis is included.

      Weaknesses:

      The quantification of ACh signal differences across different trial types or between expert and early-training mice is lacking. Although statistical significance is occasionally mentioned, the indication of significance in figures seems rare. For example, in Figures 5A and E, it is difficult to tell when p is < 0.05. Based on the sentence "small, but significant increase on Hits over False Alarm trials (Figure 5A, S Figure 4A)" there is indeed a time point where the difference is significant, and more details should be added (when and the p-value).

      For Figure 5D, it seems like there is no significant difference between Hit and False Alarm trials; however, for the trials with 1 or 2 licks there appears to be a difference. Is it due to a lack of power? Moreover, in Figure 5H the first licks also seem to differ.

      Linear regression: the coefficient of determination (R²) is absent in Figures 4E, F, and 6B, H, making it hard to evaluate the goodness of fit.

      Similar comments apply to Figure 7: the lack of quantitative comparisons between the coefficients of first lick and other regressors, and between early and expert training, as well as the change in goodness of fit by removing a regressor.

      The writing of the introduction and discussion could be improved to enhance readability, and the manuscript could improve its discussion on orofacial movement and acetylcholine release by citing relevant studies demonstrating the association between neuronal activity and orofacial/body movements.

    2. eLife assessment

      This study provides important evidence that links acetylcholine responses in the sensory cortex to motor actions during perceptual tasks, rather than to rewards. The evidence for the association between acetylcholine responses and motor actions is solid, but does not demonstrate the causal link implied by the title and abstract. The manuscript would benefit from a more detailed description of results and methodologies. This study is of broad interest to the neuroscience field.

    3. Reviewer #1 (Public Review):

      Summary:

      This study aimed at gaining a better comprehension of the functional role of acetylcholine release within the sensory cortex. To this end, the authors measured the dynamics of cortical acetylcholine release using two-photon imaging of the GRAB-Ach3.0 fluorescent sensor, either in the mouse primary somatosensory cortex (S1), throughout the learning of a whisker-dependent object position discrimination task, or in the primary auditory cortex (A1) of mice engaged in a specific sound signal detection task.

      The illustrated results suggest that variations in acetylcholine release tend to be associated, in the primary sensory areas, with goal-directed actions (whisking in the case of the object position discrimination task, and more strongly with licking), rather than with sensory inputs or rewards. They also indicate that the variations in cholinergic signal specifically associated with licking increase with learning.

      Strengths:

      The impact of cholinergic inputs on cortical function has intrigued neuroscientists for many decades due to the complexity of its mode of action on the molecular and cellular points of view.

      Being able to image the dynamics of cortical cholinergic release in vivo on mice engaged in goal-directed tasks has moved this field into a really exciting phase, where it becomes possible to draw links between specific behavioral features and local variations of cholinergic release in given cortical areas.

      This study is therefore particularly timely and provides a set of precious and original data. Globally the experiments were rigorously designed, and the illustrated quantifications and analyses follow high standards. This work therefore constitutes a valuable contribution to this field of research and could be of interest to a large audience.

      Weaknesses:

      Although the manuscript reports very interesting links between behavior and cortical cholinergic release, the study remains correlative and lacks experiments that would causally link cholinergic cortical inputs with motor actions and, more globally, gauge their impact on learning and execution of the tasks. Since the nature of the link between goal-directed motor actions and acetylcholine dynamics is not really clarified here, the word "drive" in the title of the paper, which may have a causal connotation, should be replaced (especially since acetylcholine-related signal fluctuations often seem to precede motor actions).

      As high-speed videography of the C2 whisker was achieved during the object position discrimination task, it seems that the whisker curvature changes could have been quantified in addition to the whisker angle. This would allow appreciation of how acetylcholine-related signals vary according to both whisker-related motor output and sensory input, thereby providing clearer support for the assertion that acetylcholine levels are "related to motor actions rather than sensory inputs".

      The data set related to the auditory task is used here to support the claim that licks rather than rewards are linked to variations of fluorescence of the cholinergic sensor in sensory cortices. These data seem very interesting indeed but are shown here in a very incomplete manner (a figure illustrating the learning curves of the 6 recorded animals, and acetylcholine dynamics during the four types of trials would be very welcome). If the animals were placed on a treadmill and the locomotion measured, together with pupil size, during the task as in Gee et al., BioRxiv 2022, one could ask how these other motor activities are linked with acetylcholine dynamics in A1. By comparing the impact of goal-directed actions versus motor activities accompanying more global state transitions on acetylcholine dynamics, these data could provide a particularly valuable contribution to this study. They could in addition rule out potential confounding factors regarding the claim that cholinergic dynamics are here mainly linked to first licks.

      Coming back to the whisker-dependent object localization task, if cholinergic-related signals have been recorded during the "no whisker sessions", analyzing these data would be very useful in the scope of this study. Indeed, during these sessions, the animals were not naive, since they went through the learning of the task, but could not resolve it anymore; still, they most probably kept on licking upon the pole-in and/or pole-out cues. In these sessions, the licking is fully dissociated from tactile sensory inputs, and for this reason it would be particularly interesting to see how the fluorescence varies with first licks. In addition, plotting these sessions in Figure 6C would be informative. Indeed, if the increase of cholinergic signals with performance comes progressively due to changes in the internal state of the animal and/or plasticity mechanisms, first-lick-related cholinergic signal variations could remain high despite the decrease of performance in these sessions.

      Finally, because the functional role of cortical cholinergic release is a hot topic, a few recent studies addressing this question with slightly different approaches in the visual cortex would be worth mentioning, at least in the discussion, as well as a recent study focusing on motor learning, which revealed an apparent decrease of acetylcholine dynamics associated with goal-directed motor actions upon learning.

    1. Reviewer #2 (Public Review):

      Summary:

      While many studies have explored the impacts of pathogens on hosts, the effect of hosts on pathogens has received less attention. In this manuscript, Wang et al. utilize Drosophila melanogaster and an opportunistic pathogen, Serratia marcescens, to explore how the host impacts pathogenicity. Beginning with an observation that larval presence and density impacted microbial growth in fly vials (which they assess qualitatively as the amount of 'slick' and quantitatively as microbial load/CFUs), the authors focus on the impact of axenic/germ-free larvae on an opportunistic pathogen S. marcescens. Similar to their observations with general microbial load, they find that larvae reduce the presence of a pinkish slick of Sm, indicative of its secondary metabolite prodigiosin. The presence of larvae alters prodigiosin production, pathogen load, pathogen cellular morphology, and virulence, and this effect is through transcriptional and metabolic changes in the pathogen. Overall, they observe a loss of virulence factors/pathways and an increase in pathways contributing to growth. Given the important role the host plays in this lifestyle shift, the authors then examined host features that might influence these effects, focusing on the role of antimicrobial peptides (AMPs). The authors combine the use of synthetic AMPs and an AMP-deficient fly line and conclude much of the larval inhibitory effect is due to their production of AMPs.

      Strengths:

      This is a very interesting question and the use of Drosophila-Serratia marcescens is a great model to explore these interactions and effects.

      The authors have an interesting and compelling phenotype and are asking a unique question on the impact of the host on the pathogen. The use of microbial transcriptomics and metabolomics is a strength, especially in order to assess these impacts on the pathogen level and at single-cell level to capture heterogeneity.

      Weaknesses:

      Overall, the writing style in the manuscript makes it difficult to fully understand and appreciate the data and its interpretation.

      The data on the role of AMPs would benefit from strengthening. Some of the arguments in the text of that section are also counterintuitive. The authors show that AMP larvae have a reduced impact on Sm as compared to wt larvae, but it seems less mild of an effect than that observed with wt excreta (assuming the same as secreta in Figure 7; this should be corrected or harmonized). Higher doses of AMPs give a phenotype similar to wt larvae, but a lower dose (40 ng/ul) gives phenotypes more similar to controls. The authors argue that this data suggests AMPs are the factor responsible for much of the inhibition, but their data seems more to support that it is synergistic: you seem to still need larvae (or some not yet defined feature larvae make, although secreta/excreta was not sufficient) + AMPs to see similar effects as wt. Based on positioning and color scheme, I am guessing that AMP 40 ng/ul was used in Figures 7D-H, but I could not find this detail in the text, methods, or figure legend, and it should be indicated. This section does not seem to be well supported by the provided data, and this inconsistency greatly dampened this reviewer's enthusiasm for the paper.

    2. eLife assessment

      This valuable study examines the role of a host in conditions that shift pathogenicity of opportunistic microbes. The use of single-cell microbial transcriptomics and metabolomics to demonstrate the host's effects on pathogen dynamics is interesting and convincing. However, the connection to host antimicrobial peptides driving these effects is incomplete and would benefit from additional evidence and improved explanation in the text. This paper has the potential to be of broad interest to those working in host-microbe (microbiome and pathogen) interactions.

    3. Reviewer #1 (Public Review):

      Summary:

      In this work, Wang and colleagues used Drosophila-Serratia as a host-microbe model to investigate the impact of the host on gut bacteria. The authors showed that Drosophila larvae reduce S. marcescens abundance in the food likely due to a combination of mechanical force and secretion of antimicrobial peptides. S. marcescens exposed to Drosophila larvae lost virulence to flies and could promote larval growth similar to typical Drosophila gut commensals. These phenotypic changes were reflected in the transcriptome and metabolome of bacteria, suggesting that the host could drive the switch from pathogenicity to commensalism in bacteria. Further, the authors used single-cell bacterial RNA-seq to demonstrate the heterogeneity in gut bacterial populations.

      Strengths:

      This is a valuable work that addresses an important question of the effect of the host on its gut microbes. The authors could convincingly demonstrate that gut bacteria are strongly affected by the host with important consequences for both interacting partners. Moreover, the authors used state-of-the-art bacterial single-cell RNA-seq to reveal heterogeneity in host-associated commensal populations.

      Weaknesses:

      Some of the conclusions are not fully supported by the data.

      Specifically, in lines 142-143, the authors claim that larvae antagonize the pathogenicity of S. marcescens based on the survival data. I do not fully agree with this statement. An alternative possibility could be that, since there are fewer S. marcescens in larvae-processed food, flies receive a lower pathogen load and consequently survive. Can the authors rule this out?

      Also, the authors propose that Drosophila larvae induce a transition from pathogenicity to commensalism in S. marcescens and provide nice phenotypic and transcriptomic data supporting this claim. However, is it driven only by transcriptional changes? Considering high mutation rates in bacteria, it is possible that S. marcescens during growth in the presence of larvae acquired mutations causing all the observed phenotypic and transcriptional changes. To test this possibility, the authors could check how long S. marcescens maintains the traits it acquires during growth with Drosophila. If these traits persist after reculturing isolated bacteria, it is very likely they are caused by genome alterations, if not - likely it is a phenotypic switch driven by transcriptional changes.

    4. Reviewer #3 (Public Review):

      In this study, Wang and coworkers established a model of Drosophila-S. marcescens interactions and thoroughly examined host-microbe bidirectional interactions. They found that:

      (1) Drosophila larvae directly impact microbial aggregation and density;
      (2) Drosophila larvae affect microbial metabolism and cell wall morphology, as evidenced by reduced prodigiosin production and EPS production, respectively;
      (3) Drosophila larvae attenuate microbial virulence;
      (4) Drosophila larvae modulate the global transcription of microbes for adaptation to the host;
      (5) Microbial single-cell RNA sequencing (scRNA-seq) analysis revealed heterogeneity in microbial pathogenicity and growth;
      (6) AMPs are key factors controlling microbial virulence phenotypes.

      Taken together, they concluded that host immune factors such as AMPs are directly involved in the pathogen-to-commensal transition by altering microbial transcription.

      General comments:

      In general, this study is intriguing as it demonstrates that host immune effectors such as AMPs can serve as critical factors capable of modulating microbial transcription for host-microbe symbiosis. However, several important questions remain unanswered. One such question is: What is the mechanism by which AMPs modulate the pathogen-to-commensal transition? One hypothesis suggests that antimicrobial activity may influence microbial physiology, subsequently modulating transcription for the transition from pathogen to commensal. In this context, it is imperative to test various antibiotics with different modes of action (e.g., targeting the cell wall, transcription, or translation) at sub-lethal concentrations to determine whether sub-lethal doses of antimicrobial activity are sufficient to induce the pathogen-to-commensal transition.

    1. Author response:

      The authors express their gratitude to the reviewers for their insightful comments.

      Reviewer #1: We are uncertain about the reference to an overjudgement of the recovery of spermatogonial stem cells, as we did not draw any conclusions on this in the current study. Additionally, we have received feedback mentioning the multitude and diversity of datasets as both a strength and a weakness. However, we would appreciate clarification on which datasets may have been insufficiently reviewed and how our selection of highlights may have introduced bias to the interpretation and conclusion of the study. It is important to note that we did not select any patients/data; all patient data were incorporated into our results section. We acknowledge the need for clarification regarding our study population for the germ cell stainings. As stated in our Materials and Methods section, our current study population includes the cohort from our previous publication (Vereecke et al., 2020), supplemented by nine additional participants, totaling n=106 trans women. While Fig. 1C incorporates both previous and new data on germ cells, we understand the need to clarify this to avoid confusion. Additionally, we will include information on the Tanner stages of the trans women in our cohort (all G5), as well as details on the selection criteria for our controls and their Tanner stages. As briefly touched upon in the discussion, a marker such as delta-like homolog 1 would indeed be valuable to assess the presence of truly immature Leydig cells. Unfortunately, our attempts to optimize the immunofluorescence protocol for this marker were unsuccessful, resulting in a double staining instead of a triple staining for the Leydig cells. The suboptimal resolution of Fig. 1 will be improved.

      Reviewer #2 raises concerns regarding the suitability of rejuvenated testicular tissue for research purposes. However, we emphasize that this tissue source holds significant value. Although there is a wide availability of adult testicular tissue (coming from prostate cancer patients or vasectomy reversal patients), we are especially looking for alternatives for the scarce prepubertal/pubertal tissue for research on in vitro spermatogenesis. While we acknowledge that transgender tissue with severe hyalinization or without spermatogonia may not be suitable for such research, the abundance of transgender tissue without these issues emphasizes the value of this tissue source.

    2. eLife assessment

      This important study presents new knowledge of the spermatogonial stem cell (SSC) niche in trans women after gender-affirming hormone therapy (GAHT). While the evidence supporting the claims is convincing, weaknesses identified by both reviewers should be addressed. The work will be of interest to researchers and clinicians working in the field of sexual medicine and andrology.

    3. Reviewer #1 (Public Review):

      Summary:

      This is a nice paper taking a broad range of aspects and endpoints into account. The effect of GAHT in girls has been nicely worked out. Changes in Sertoli and peritubular cells appear valid, less strong evidence is provided for Leydig cell development. The recovery of SSCs appears an overjudgement and should be rephrased. The multitude and diversity of datasets appear a strength and a weakness as some datasets were not sufficiently critically reviewed and a selection of highlights provides a certain bias to the interpretation and conclusion of the study.

      The authors need to indicate that the subset of data on SSCs has been reported previously (Human Reprod 36: 5-15 (2021)) and is simply re-incorporated in the present paper as Fig. 1C. There are sufficient new results to publish the remaining datasets as a separate paper. Authors could refer to the SSC data with reference to the previous publication.

      Strengths:

      The patient cohort is impressive and is nicely characterized. Here, histological endpoints and endocrine profiles were analyzed appropriately for most endpoints. The paper is well-written and has many new findings.

      Weaknesses:

      The patients and controls are poorly separated in regard to pubertal status. Here additional endpoints (e.g. Tanner status) would have been helpful especially as the individual patient history is unknown. Pre- and peri-puberty is a very rough differentiation. The characterization and evaluation of Leydig cells is the weakest histological endpoint. Here, additional markers may be required. Fig. 1 suffers from suboptimal micrograph quality.

    4. Reviewer #2 (Public Review):

      Summary:

      The study is devoted to an in-depth investigation of the spermatogonial stem cell (SSC) niche in trans women after gender-affirming hormone therapy (GAHT). Both cellular structure and functionality of the niche were studied. The authors clearly demonstrated that all cellular components of the SSC niche were affected by hormone therapy. Interestingly, signs of "rejuvenation" within the niche were also observed, indicating a possible reversion to the immature condition.

      Strengths:

      The obtained findings are important for the better understanding of hormonal regulation of testis and SSC niche and provide some clues for using the biomaterials from these specific and even unique donors for biomedical research.

      Weaknesses:

      This study has some limitations. Many studies can't be done using the testis cells of trans women, since their cells differ significantly from adult male cells and, to a lesser extent, from prepubertal and pubertal cells. The authors themselves identify some of the limitations: this material is suitable only for studying prepubertal processes in the testis. However, the authors also report large variability in data due to different hormonal therapy regimens and, apparently, age. Accordingly, not all material obtained from trans women can be used for studies of prepubertal processes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank both Editors and reviewers for their valuable time, careful reading, and constructive comments. The comments have been highly valuable and useful for improving the quality of our study, as well as important in guiding the direction of our present and future research. In the revised manuscript, we have incorporated the necessary changes including additional experimental data as suggested; please find our detailed point-by-point response to the reviewer’s comments and the changes we have made in the manuscript as follows.

      Reviewer #1 (Public Review):

      In this work, the authors have explored how treating C. albicans fungal cells with EDTA affects their growth and virulence potential. They then explore the use of EDTA-treated yeast as a whole-cell vaccine in a mouse model of systemic infection. In general, the results of the paper are unsurprising. Treating yeast cells with EDTA affects their growth and the addition of metals rescues the phenotype. Because of the significant growth defects of the cells, they don't infect mice and you see reduced virulence. Injection with these cells effectively immunises the mice, in the same way that heat-killed yeast cells would. The data is fairly sound and mostly well-presented, and the paper is easy to follow. However, I feel the data is an incremental advance at best, and the immune analysis in the paper is very basic and descriptive.

      Strengths:

      Detailed analysis of EDTA-treated yeast cells

      Weaknesses:

      • Basic immune data with little advance in knowledge.

      • No comparison between their whole-cell vaccine and others tried in the field.

      • The data is largely unsurprising and not novel.

      Reply: Thank you so much for appreciating our effort to generate a whole cell anti-fungal vaccine by treating C. albicans cells with EDTA. Also, we appreciate your comment that the manuscript is sound and well-presented. However, we are afraid that the respected reviewer assumed the CAET cells to be dead cells, whereas they only divide more slowly than the untreated cells. In the revised manuscript, we have presented additional evidence to show that CAET are live cells (Supp. Fig. 2) and based on the new data, we expect a positive change in the reviewer’s opinion. Since CAET is a live strain, the data presented here is novel.

      Reviewer #2 (Public Review):

      Summary:

      Invasive fungal infections are very difficult to treat with limited drug options. With the increasing concern of drug resistance, developing an antifungal vaccine is a high priority. In this study, the authors studied the metal metabolism in Candida albicans by testing some chelators, including EDTA, to block the metal acquisition and metabolism by the fungus. Interestingly, they found EDTA-treated yeast cells grew poorly in vitro and were non-pathogenic in vivo in a murine model. Mice immunized by EDTA-treated Candida (CAET) were protected against challenge with wild-type Candida cells. RNA-Seq analysis to survey the gene expression profile in response to EDTA treatment in vitro revealed upregulation of genes in metal homeostasis and downregulation of ribosome biogenesis. They also revealed an induction of both pro- and anti-inflammatory cytokines involved in Th1, Th2 and Th17 host immune response in response to CAET immunization. Overall, this is an interesting study with translational potential.

      Strengths:

      The main strength of the report is that the authors identified a potential whole-cell live vaccine strain that can provide full protection against candidiasis. Abundant data both on in vitro phenotype, gene expression profile, and host immune response have been presented.

      Weaknesses:

      A weakness is that the immune mechanism of CAET-mediated host protection remains unclear. The immune data is somewhat confusing. The authors only checked cytokines and chemokines in blood. The immune response in infected tissues and antibody response may be investigated.

      Reply: Thank you very much for appreciating our work and finding our strain to be a live whole-cell anti-fungal vaccine strain with translational potential. Since the current study focused on the identification and detailed characterizations of a non-genetically modified live-attenuated strain and determination of its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. In a separate study, we are currently investigating both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find a vaccine solution for invasive candidiasis.

      Strengths:

      The testing of the antifungal activity of EDTA on Candida is not new as many other papers have examined this effect. The novelty here is the use of this EDTA-treated strain as a vaccine to protect against a secondary challenge with wild-type Candida.

      Weaknesses:

      However, data presented in Figure 5 and Figure 6 are not convincing and need further experimental controls and analysis as the authors do not show a time-dependent effect on the CFU of their vaccine formulation. The methodology used is also an issue. As it stands, the impact is minor.

      Reply: Thank you so much for appreciating our efforts to develop a novel vaccine against fungal infections. We are extremely sorry for the lack of clarity in our writing related to Figs. 5 and 6; we have now modified the text and hope that the respected reviewer will find these convincing.

      Recommendations for the authors:

      Although the reviewers recognize the importance of the manuscript, they would like to see: 1) comparisons between their whole-cell vaccine and others tried in the field, 2) an investigation of the immune response in infected tissues and antibody response, and 3) more controls in Figures 5 and 6, and a time-dependent effect on the colony-forming units of their vaccine formulation. Please, address the questions and submit a revised version together with a rebuttal letter addressing point-by-point raised by each reviewer.

      Reply: (1) We are afraid that a comparative study of live and heat-killed cell vaccines will mislead the information presented here. This is the only non-genetically modified antifungal vaccine candidate; therefore, a comparison with a dead strain at present is unwarranted. We have now added supporting data to confirm that the survivability of C. albicans cells was unaffected at 6 hr of EDTA treatment (CAET, Supp. Fig. S2). (2) Since the current study focused on the identification and a detailed characterization of a non-genetically modified live attenuated strain and its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. However, in a separate study, we are currently investigating both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice. (3) The results of Figs 5 and 6 were misinterpreted by the respected reviewer; please see the explanation below.

      Reviewer #1 (Recommendations For The Authors):

      Some specific comments/suggestions for the authors: (1) What was the viability of the yeast after EDTA treatment? Is the delayed growth response because many cells died and it takes a while for remaining viable cells to catch up? This is important to know because it may mean the dose given to mice is substantially different and that should be accounted for. Some PI staining of the cells after treatment would help.

      Reply: The growth curve assays (Fig. 1A and 1E) were initiated with OD600nm = 0.5 of each culture (~10^7 cells/mL) and the analyses suggested that the EDTA-treated C. albicans cells grew slower than the untreated cells. Fig. 1B and 1F further demonstrated that EDTA has minimal effect on the survival of the strain up to 8 hrs post-exposure. The proportional increase in cell number without and with metal chelators remained almost the same for this duration (0 – 8 hrs). Therefore, for subsequent analyses, 6 hr treatment was selected and such treated cells were considered as CAET, which were actively dividing live cells, albeit slower than untreated cells. As suggested and to strengthen our finding, a time-dependent SYTOX Green and Propidium iodide staining of C. albicans cells without and with EDTA treatment was carried out and analysed by flow cytometry and microscopy, respectively. Both analyses revealed that the percentage of dead cells up to 12 hrs without and with EDTA treatment remained the same. The new data has now been added in the revised version of the manuscript as Supplementary figure 2.

      Author response image 1.

      (2) In line with the above, what was the viability of the CAET cells after 3h in media? In the macrophage in vitro experiments, how do you know the reduced viability of the CAET cells is macrophage-specific? Did you run a control of CAET cells in media on their own to determine how CFU changed in macrophage-free conditions? Are the proliferation rates of untreated and CAET cells different? That would affect CFSE labelling and results. These experiments would work better with a GFP-expressing C. albicans strain, which is widely available. In the images in Figure 4c, it looks like there are more hyphae in CAET than untreated - was hyphal induction checked/measured? That's important to know because more hyphae usually means more clumping and this can affect CFU counts (giving the impression of less CFU when actually there is more). Because of all the issues above, I'm not fully convinced by the uptake/killing data.

      Reply: As explained in response 1, we used actively dividing WT and CAET cells, and equal numbers of these cells were CFSE labelled. As can be seen in Fig. 4A, the rate of phagocytosis was the same in 1 hr of pre-culture, but in the subsequent time points the double-positive cells were reduced in the case of CAET cells and that is due to fungal killing by macrophages. Fungal cells were released from the macrophages by warm water treatment and CFU was determined. Fig. 4B suggested that at 1 hr of co-culture, the CFU of both fungal cells (WT and CAET) were the same and the fungal clearance was observed at later time points. Thus, the reduced viability of CAET cells was macrophage-specific. EDTA has minimal effect on hyphal transition without and with the presence of serum and the new data has now been provided in the revised version (Supplementary Fig. 3).

      Author response image 2.

      (3) Pooled data should be shown for all animal experiments.

      Reply: Thank you for the suggestion; wherever it was meaningful, pooled data for the animal experiments have now been provided.

      (4) Immune cell counts/analysis in the kidney and bone marrow would be hugely helpful and more relevant to understanding immune responses following immunisation/infection. I think a more interesting analysis for the authors to consider would be to immunise with heat-killed yeast vs EDTA-treated yeast and see if there is a qualitative difference or better protection, i.e. is the EDTA-treated whole-cell vaccine superior to the heat-killed version? That is a better question to address. As it stands, the data in the paper is not surprising.

      Reply: The studies on cellular and molecular mechanisms underlying protective immunity in CAET-vaccinated mice are in progress in a separate study. This study mostly focused on the identification and detailed characterization of a non-genetically modified live-attenuated strain and its safety and efficacy as a potential vaccine candidate in a preclinical model. We are afraid that a comparison of a live cell (CAET) with a dead cell (heat-killed) will dilute the content of the manuscript and will not be meaningful. It is well accepted that the heat-killed C. albicans strain only provides partial short-lived protection to re-challenge (Refs-PMIDs: 12146759, and 9916097), thus, it does not warrant any comparison with CAET.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is a highly interesting study. I have the following specific comments for clarification.

      (1) In the introduction, the authors mentioned other anti-candida vaccines that are mostly effective against Candida infection by inducing neutralizing antibodies. However, in their CAET vaccine candidate, they only checked the cellular immunity in blood and found a balanced immune response (both pro- and anti-inflammatory responses are induced). How about the antibody production in these mice? It is a bit surprising that both untreated Candida infection and CAET Candida infection produced similar immune activation based on Figure 6, yet the CAET immunization provides protection. Some innate cell recruitment is higher in untreated Ca infection than the CAET infected mice (Figure 5F). The overall results on immune response characterization did not seem to explain why the CAET infection led to host protection while untreated Ca infection cannot. Characterizing infected tissue immune cell differentiation and cytokine production may offer some additional insights.

      Reply: We agree with you that in this manuscript we have not provided any mechanistic study on the protective immunity in CAET-vaccinated mice. This will be demonstrated in a subsequent study.

      (2) In Figure 5, some critical data seem to be missing in panels B and C. The CFU and histopathological images for CAET-treated mice challenged by Ca should also be shown there for comparison. Although they did show some data in Figure 5E and Figure S4, it is necessary to have that data in 5B and 5C from the same experiment. Figure S4 is a very busy figure and the images are quite small. It may be necessary to use arrows to point out what information authors want to emphasize.

      Reply: Fig. 5B and 5C showed the data for mice that succumbed to infection. Since the other mice (saline control groups, CAET infected, CAET vaccinated, and re-challenged groups) survived, they were not sacrificed; therefore, the CFU data were not collected. In addition, we wanted to see the longevity of these surviving mice, and after 1 year of observation they were handed over to the animal house for clearance as per the institutional guidelines. However, Figure 5E and Figure S4 (now Fig. S6) included all the mice groups, as they were sacrificed at various time points irrespective of humane end points. As suggested, Fig. S6 has now been modified and fungal cells are denoted by yellow arrows.

      (3) EDTA-treated yeast cells showed poor growth but also had thicker cell walls with high chitin, glucan, and mannan levels. What leads to its clearance in vivo remains unclear, as usually, cells with thick cell wall structures and low metabolism are more resistant to stress, e.g., dormant cells. Macrophages were shown to contribute to CAET killing in a phagocytosis assay (Figure 4). Checking cytokines produced by macrophages during co-incubation may offer some insights. In all, additional discussion on what caused in vivo clearance would be helpful.

      Reply: A mechanistic study on the protective immune responses to CAET will be presented separately. As suggested, the 3rd paragraph of the discussion section now contains additional information on the in vivo clearance of CAET cells.

      (4) Long paragraphs in the discussion section could be divided into a bigger number of shorter paragraphs.

      Reply: Thank you for the suggestion, it has now been modified in the revised version (7 short paragraphs). To make it more comprehensive, some of the content has been removed.

      Reviewer #3 (Recommendations For The Authors):

      (1) It is unclear how many cells were treated with 250 micromolar of EDTA for 6 hours before preparing the inoculum. It seems that only the OD was measured before adding EDTA. This is not a very rigorous and reproducible method.

      Reply: In this manuscript, we have repeatedly used the same protocol to generate CAET cells for various analyses. An OD600nm = 0.5 culture is equivalent to 10^7 C. albicans cells per mL, and this information has now been added in the revised manuscript.

      (2) Upon treatment with 250 micromolar of EDTA, cells were harvested and counted to prepare the inoculum (5x10e5) for injecting it in mice. However, it appears that CFU of the inoculum was not done. Based on data shown in Fig. 1B, 250 micromolar of EDTA does inhibit Candida cell replication. Thus, the authors may have counted dead cells and, thus, injected dead cells together with live cells for the CAET inoculum. Thus, mice receiving this inoculum may have been infected (and vaccinated) with a lower number of live Candida cells.

      Reply: Please see a similar response to reviewer #1. EDTA has minimal effect on the survival of C. albicans cells at 6 hr (also see supp. Fig. S2). We have already mentioned the CFU analysis of untreated and CAET cells in the methodology section related to inoculum preparation.

      (3) It is unclear if 6 hours of treatment with 250 micromolar of EDTA is enough to induce a block of Candida cell replication. In Figure 1B, the authors treated for 24h. The authors are encouraged to wash the cells after 6 hours of treatment and see if their cell division will recover upon removal of EDTA.

      Reply: Thank you for the suggestion. At 6 hr treatment, survivability of C. albicans cells was unaffected upon EDTA exposure. PI and SYTOX GREEN staining confirmed it (Supp. Fig. 2). Additionally, as suggested a rescue experiment was carried out by exogenous addition of divalent metals after 6 hr EDTA treatment and growth/CFU analyses were followed thereafter. A modified Fig. 1 A and B with new data has been provided.

      (4) The data shown in Figure 5A is extremely exciting. However, the number of mice in each group (n=6) is too low. Normally, 10 mice per group are used for virulence studies unless the authors provide a power analysis that 6 mice per group will be sufficient. Also, CFU data were only provided for Ca and saline-Ca groups (Fig. 5B) and not for the other groups. CFU data should be provided for all mice.

      Reply: Thank you for the suggestion and a statistical analysis of Fig. 5A was provided in the revised version. The rationale behind not including all mice groups in Fig. 5B is already explained in a response to reviewer #2.

      (5) It is unclear how the authors differentiate between CFU arising from CAET or from WT Candida.

      Reply: Since Fig. 5E demonstrated that no CAET cells were detected in the kidney beyond 10 days of inoculation, the fungal cells detected on the 3rd and 7th days in the re-challenged mice group (1CAET 2 Ca) were from the later-inoculated cells (brown colour).

      (6) Figure 5E: it is unclear if a 1 saline-2 saline (Figure legend) or if 1 saline-2 Ca (text) group was included. If the latter, where are the CFU? It is impossible that 1 saline-2 Ca mice have no CFU.

      Reply: Thank you so much for pointing this out. The legend has now been modified to include 1saline-2saline and 1CAET-2Ca.

      (7) It seems that CFU is significantly present in the kidney in the 1 CAET - 2 Ca group at day 7 but not at day 3. How is this possible? This is an extremely invasive model of infection, and the authors are challenging intravenously 500,000 live Candida cells. If by the 3rd day, the authors detect no CFU, then how is it possible that CFUs are arising on day 7?

      Reply: We do detect fungal cells on the 3rd day in the 1CAET 2 WT mice group (~2000 cells), albeit far fewer than on the 7th day (~11200 cells). A Log10 scale graph has now been provided for better representation.
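
      As a rough illustration of why a Log10 axis helps here, the sketch below transforms only the two approximate counts quoted in this reply. The variable names are illustrative assumptions and nothing here comes from the manuscript's own analysis.

```python
import math

# Approximate fungal counts quoted in the reply for the re-challenged group.
cfu_by_day = {3: 2000, 7: 11200}

# Log10 transform of each count, as would be plotted on a Log10 axis.
log_cfu = {day: math.log10(count) for day, count in cfu_by_day.items()}

# Fold change between day 7 and day 3 on the linear scale.
fold_change = cfu_by_day[7] / cfu_by_day[3]

for day, value in sorted(log_cfu.items()):
    print(f"day {day}: log10(count) = {value:.2f}")
print(f"fold change (day 7 / day 3) = {fold_change:.1f}")
```

      On a linear axis scaled to the day-7 value, the day-3 count can look close to zero; on a Log10 axis the two values differ by less than one unit, so both remain visible.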

      (8) Most importantly, if the authors are not detecting CFU at day 3, then earlier time points (e.g. day 2, day 1, or even 12 hours post-challenge) must be analyzed. The authors should show that CFU from the organs is decreasing in a time-dependent manner. Also, all CFU should be shown as Log10.

      Reply: please see the previous response.

      (9) Fig. 6: because it is unclear if the mice were challenged with the same inoculum of live Candida cells (untreated and treated with EDTA), the different cytokine profiles between the two groups could be simply due to the different inoculum sizes and not to the effect of EDTA on Ca.

      Reply: please see the previous response as given also for Reviewer 1.

    2. eLife assessment

      This study presents a useful strategy in which the authors devised a simple method to attenuate Candida albicans and deliver a live whole-cell vaccine in a mouse model of systemic candidiasis. The reviewers are not convinced about the completeness of the study: the strength of the evidence is incomplete and could be augmented with additional experiments to more fully characterize vaccine efficacy and host immune responses.

    3. Reviewer #2 (Public Review):

      Summary:

      Invasive fungal infections are very difficult to treat, with limited drug options. With increasing concern about drug resistance, developing an antifungal vaccine is a high priority. In this study, the authors studied metal metabolism in Candida albicans by testing several chelators, including EDTA, to block metal acquisition and metabolism by the fungus. Interestingly, they found that EDTA-treated yeast cells grew poorly in vitro and were non-pathogenic in vivo in a murine model. Mice immunized with EDTA-treated Candida (CAET) were protected against challenge with wild-type Candida cells. RNA-Seq analysis of the gene expression profile in response to EDTA treatment in vitro revealed upregulation of genes involved in metal homeostasis and downregulation of ribosome biogenesis. They also revealed an induction of both pro- and anti-inflammatory cytokines involved in Th1, Th2 and Th17 host immune responses following CAET immunization. Overall, this is an interesting study with translational potential.

      Strengths:

      The main strength of the report is that the authors identified a potential whole-cell live vaccine strain that can provide full protection against candidiasis. Abundant data on the in vitro phenotype, gene expression profile and host immune response have been presented.

      Weaknesses:

      A weakness is that the immune mechanism of CAET-mediated host protection remains unclear. The immune data are somewhat confusing. The authors only checked cytokines and chemokines in blood. The immune response in infected tissues and the antibody response could also be investigated.

      Another potential concern is that live wild-type Candida cells treated with EDTA may still have a chance to evolve and become infectious, considering that these treated cells still proliferate in vivo. Some of the gene regulation profiles may be transient and subject to reversal, adding to the safety concern.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find a vaccine solution for invasive candidiasis.

      Strengths:

      The testing of the antifungal activity of EDTA on Candida is not new, as many other papers have examined this effect. The novelty here lies in the use of such an EDTA-treated strain as a vaccine to protect against a secondary challenge with wild-type Candida.

      Weaknesses:

      However, data presented in Fig. 5 and in Fig. 6 are not convincing and need further experimental controls and analysis as the authors do not show a time-dependent effect on the CFU of their vaccine formulation. Specific points are below.

      Methodology used is also an issue. As it stands, the impact is minor, if any.

      Comments on revised version:

      The data provided in the revised paper are simply not satisfactory and do not give confidence that a rigorous design and methodologies were used to obtain the results illustrated in this paper.

    1. eLife assessment

      This valuable study assesses through simulations how several known features of local cortical circuits - interneuron subtypes, their specific targeting of dendritic compartments, and certain brain rhythms - together affect the integration of synaptic inputs by a pyramidal cell into a spiking output signal. Employing several carefully considered simulation setups they convincingly demonstrate that beta rhythms are best suited to modulate and control dendritic Ca-spikes while gamma rhythms affect their coupling to somatic spiking, or how basal inputs are directly integrated into somatic spikes. However, the baseline setup may be idealized for the generation of the events in question and it would be beneficial if the similarity to the in-vivo activity regime was demonstrated further. The results will be relevant for neuroscientists studying local circuits or developing more abstract theories at the systems level.

    2. Reviewer #1 (Public Review):

      In this study, the authors explore the implications of two types of rhythmic inhibition - "gamma" (30-80 Hz) and "beta" (13-30 Hz) - for synaptic integration. They study this in a multi-compartmental model of an L5 pyramidal neuron with Poisson excitation and rhythmic inhibition (16 Hz and 64 Hz), applied either to the perisomatic or apical tuft regions of the neuron. They find that 64 Hz inhibition applied to the cell body is effective in phasic modulation of AP generation, while 16 Hz inhibition applied to the apical tufts is effective in phasic modulation of dendritic spikes (in addition to APs). Switching the location of the two kinds of rhythmic inhibition reduces the overall excitability, but is not effective in phasic modulation of dendritic spikes and only weakly so for somatic APs.
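
      For readers unfamiliar with this kind of input, the following is a minimal sketch of one way rhythmic inhibitory spike times at 16 Hz or 64 Hz could be generated. It is not the authors' model: the sinusoidal rate modulation, the mean rate, the modulation depth, and all function names are assumptions made purely for illustration.

```python
import math
import random

def rhythmic_spike_train(freq_hz, mean_rate_hz, depth, duration_s, dt=0.0005, seed=0):
    """Spike times (s) from an inhomogeneous Poisson process whose rate is
    sinusoidally modulated at freq_hz (an assumed stand-in for rhythmic inhibition)."""
    rng = random.Random(seed)
    spikes = []
    n_steps = int(round(duration_s / dt))
    for i in range(n_steps):
        t = i * dt
        rate = mean_rate_hz * (1.0 + depth * math.sin(2.0 * math.pi * freq_hz * t))
        # Bernoulli approximation of a Poisson process; valid while rate * dt << 1.
        if rng.random() < rate * dt:
            spikes.append(t)
    return spikes

def vector_strength(spikes, freq_hz):
    """Phase locking of spikes to freq_hz: 0 means none, 1 means perfect."""
    if not spikes:
        return 0.0
    c = sum(math.cos(2.0 * math.pi * freq_hz * t) for t in spikes)
    s = sum(math.sin(2.0 * math.pi * freq_hz * t) for t in spikes)
    return math.hypot(c, s) / len(spikes)

# Hypothetical parameters: 20 Hz mean rate, full modulation depth, 20 s of activity.
beta = rhythmic_spike_train(16.0, 20.0, 1.0, 20.0, seed=0)
gamma = rhythmic_spike_train(64.0, 20.0, 1.0, 20.0, seed=1)

print("beta train locked to 16 Hz:", round(vector_strength(beta, 16.0), 2))
print("gamma train locked to 64 Hz:", round(vector_strength(gamma, 64.0), 2))
```

      Each train is phase-locked to its own frequency and not to the other, which is the property that lets a simulation ask whether the postsynaptic cell's dendritic spikes or APs follow the phase of the inhibition.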

      Strengths:

      The effect of the timescale of rhythmic inhibition on synaptic integration is an interesting question, since a) rhythmic spiking is most strongly evident in inhibitory populations, and b) rhythmic spiking is modulated by behavioral states and the sensory environment. The methods are clear and the data are well-presented. The study systematically explores the effect of two frequencies of rhythmic inhibition in a biophysically detailed model. The study considers not only idealized rhythmic inhibition but also the bursty kind that is observed in in-vivo conditions. Both distributed and clustered excitatory synaptic organization are simulated, which covers the two extremes of the spatial organization of excitatory inputs in-vivo.

      Weaknesses:

      SOM+ interneurons such as Martinotti cells target the apical tufts of pyramidals in the cortex. Since interneurons in general are strongly implicated in mediating rhythmic population activity over a range of timescales, it is quite appropriate to study the consequence of rhythmic inhibition provided by SOM+ interneurons for synaptic integration, including the phenomenon of dendritic spikes. However, using conclusions from a singular study (ref 22) to identify the beta band as the rhythm mediated by SOM+ is not very accurate. SOM+ interneurons have been implicated in regulating rhythms centered just below 30 Hz (refs 22, 21). It is a range that lies in the grey zone of the traditional definition of beta and gamma. However, it is significantly higher than the 16 Hz rhythms explored in this study. It thus remains unknown how a 25-30 Hz rhythmic inhibition (that has an experimentally suggested role for dendrite targeting SOM+ INs) in apical tufts regulates dendritic spikes.

      Distal dendritic inhibition has been previously shown to be more effective in controlling dendritic spikes. However, given the slow timescale of dendritic spikes, it can be hypothesized that high-frequency rhythmic inhibition would be ineffective in entraining the dendritic spikes either in distal or proximal location, as demonstrated by 4H and 5F, and vice versa. A computational study can take this further by exploring the robustness of this hypothesis. By sticking to a single-frequency definition of what constitutes Gamma (64 Hz) and Beta (16 Hz) inhibition, the current exploration does support the core hypothesis. However, given the temporal dynamics of dendritic spikes, it is valuable to learn, for example, the upper bound of "Beta" range (13-30Hz) inhibition that fails to phasically modulate them. In addition to the reason stated in the earlier paragraph, Alpha band activity (8-12 Hz), has been implicated (e.g. van Kerkoerle, 2014) in signaling of inter-areal feedback to the superficial layer in the cortex, potentially targeting apical tufts of pyramidals from multiple layers and resulting in alpha-range rhythmic inhibition. To make the findings significant, it might therefore be more pertinent to understand the consequences of ~10Hz rhythmic inhibition (in addition to the ~25-30 Hz Beta/Gamma) in the apical tufts for phasic modulation of dendritic spikes.

      The differential effect of Gamma and Beta range inhibition on basal and apical excitatory clusters is not convincing from the information provided. The basal cluster appears to overlap with perisomatic inhibitory synapses. The description in the methods does not have enough information to negate the visual perception (ln 979-81). With this understanding, it is not surprising that the correlation between excitation and APs is high (during the trough of gamma) for basal and not apical excitation. A more comparable scenario would be a more distal location of the basal excitatory cluster.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript illustrates how spatial targeting (perisomatic vs distal, apical, and basal dendritic) and timing of inhibition are crucial to distinct effects on neuronal integration and show that beta and gamma oscillations differentially engage dendritic spiking mechanisms.

      Strengths:

      The strength of this study lies in the integrative biophysical modelling of a layer 5 pyramidal neuron by bringing together in vitro and in vivo observations.

      Weaknesses:

      The weaknesses are probably in some of the parameterizations of inhibitory synaptic dynamics. A unitary peak conductance of 1nS is very high for inhibitory synapses. This high value could invariably skew some of the network-level predictions. The authors could obtain specific parameters from the Neocortical Collaboration Portal (https://bbp.epfl.ch/nmc-portal/microcircuit.html), which is an incredible resource for cortical neurons and synapses.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors consider several known aspects of PV and SOM interneurons and tie them together into a coherent single-cell model that demonstrates how the aspects interact. These aspects are:

      (1) While SOM interneurons target distal parts of pyramidal cell dendrites, PV interneurons target perisomatic regions.

      (2) SOM interneurons are associated with beta rhythms, PV interneurons with gamma rhythms.

      (3) Clustered excitation on dendrites can trigger various forms of dendritic spikes independent of somatic spikes.

      The main finding is that SOM and PV interneurons are not simply associated with beta and gamma frequencies respectively, but that their ability to modulate the activity of a pyramidal cell "works best" at their assigned frequencies. For example, distally targeting SOM interneurons are ideally placed to precisely modulate dendritic Ca-spikes when their firing is modulated at beta frequencies or timed relative to excitatory inputs. Outside those activity regimes, not only is modulation weakened, but overall firing reduced.

      Strengths:

      I think the greatest strength is the model itself. While the various individual findings were largely known or strongly expected, the model provides a coherent and quantitative picture of how they come together and interact.

      The paper also powerfully demonstrates that an established view of "subtractive" vs. "divisive" inhibition may be too soma-focused and provide an incomplete picture in cells with dendritic nonlinearities giving rise to a separate, non-somatic all-or-nothing mechanism (Ca-spike).

      Weaknesses:

      While the authors overall did an admirable job of simulating the neuron in an in-vivo-like activity regime, I think it still provides an idealized picture that is optimized for the generation of the types of events the authors were interested in. That is not a problem per se - studying a mechanism under idealized conditions is a great advantage of simulation techniques - but this should be more clearly characterized. Specifics on this are very detailed and will follow in the comments to authors.

      What disappointed me a bit was the lack of a concise summary of what we learned beyond the fact that beta and gamma act differently on dendritic integration. The individual paragraphs of the discussion often are 80% summary of existing theories and only a single vague statement about how the results in this study relate. I think a summarizing schematic or similar would help immensely.

      Orthogonal to that, there were some points where the authors could have offered more depth on specific features. For example, the authors summarized that their "results suggest that the timescales of these rhythms align with the specialized impacts of SOM and PV interneurons on neuronal integration". Here they could go deeper and try to explain why SOM impact is specialized at slower time scales. (I think their results provide enough for a speculative outlook.)

      Beyond that, the authors invite the community to reappraise the role of gamma and beta in coding. This idea seems to be hindered by the fact that I cannot find a mention of a release of the model used in this work. The base pyramidal cell model is of course available from the original study, but it would be helpful for follow-up work to release the complete setup, including excitatory and inhibitory synapses and their activation in the different simulation paradigms used, as well as the code related to that.

      Impact:

      Individually, most results were at least qualitatively known or at least expected. However, demonstrating that beta-modulation of dendritic events and gamma-modulation of soma spiking can work together, at the same time and in the same model can lead to highly valuable follow-up work. For example, by studying how top-down excitation onto apical compartments and bottom-up excitation on basal compartments interacts with the various rhythms; or what the impact of silencing of SOM neurons by VIP interneuron activation entails. But this requires - again - public release of the model and the code controlling the simulation setups.

      Beyond that, the authors clearly demonstrated that a single compartment, i.e., only a soma-focused view is too simple, at least when beta is considered. Conversely, the authors were able to describe the impact of most things related to the apical dendrite on somatic spiking as "going through" the Ca-spike mechanism. Therefore, the setup may serve as the basis of constraining simplified two-compartment models in the future.

    1. eLife assessment

      This valuable paper presents convincing evidence that changing the constraint of how long to stop at an intermediate target significantly influences the degree of coarticulation of two sequential reaching movements, as well as their response to mechanical perturbations. Using an optimal-control framework, the authors offer a normative explanation of how both co-articulated and separated sequential movement can be understood as an optimal solution to the task requirements.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, Kalidindi and Crevecoeur ask why sequential movements are sometimes coarticulated. To answer this question, first, they modified a standard optimal controller to perform consecutive reaches to two targets (T1 and T2). They investigated the optimal solution with and without a constraint on the endpoint's velocity in the via target (T1). They observed that the controller coarticulates the movements only when there is no constraint on the speed at the via-point. They characterized coarticulation in two ways: First, T2 affected the curvature of the first reach in unperturbed reaches. Second, T2 affected corrective movements in response to a mechanical perturbation of the first reach.

      Parallel to the modeling work, they ran the same experiment on human participants. The participants were instructed to either consider T1 as via point (go task) or to slow down in T1 and then continue to T2 (stop task). Mirroring the simulation results, they observed coarticulation only in the go task. Interestingly, in the go task, when the initial reach was occasionally perturbed, the long-latency feedback responses differed for different T2 targets, suggesting that the information about the final target was already present in the motor circuits that mediate the long-latency response. In summary, they conclude that coarticulation in sequential tasks depends on instruction, and when coarticulation happens, the corrections in earlier segments of movement reflect the entirety of the coarticulated sequence.

      Evaluation

      Among many strengths of this paper, most notably, the results and the experiment design are grounded in, and guided by the optimal control simulation. The methods and procedures are appropriate and standard. The results and methods are explained sufficiently and the paper is written clearly. The results on modulation of long-latency response based on future goals are interesting and of broad interest for future experiments on motor control in sequential movement. However, I find the authors' framing of these results, mostly in the introduction section, somewhat complicated.

      The current version of the introduction motivates the study by suggesting that "coarticulation and separation of sub-movement [in sequential movements] have been formulated as distinct hypotheses" and this apparent distinction, which led to contradictory results, can be resolved by Optimal Feedback Control (OFC) framework in which task-optimized control gains control coarticulation. This framing seems complicated for two main reasons. First, the authors use chunking and coarticulation interchangeably. However, as originally proposed by (Miller 1956), the chunking of the sequence items may fully occur at an abstract level like working memory, with no motoric coarticulation of sequence elements at the level of motor execution. In this scenario, sequence production will be faster due to the proactive preparation of sequence elements. This simple dissociation between chunking and coarticulation may already explain the apparent contradiction between the previous works mentioned in the introduction section. Second, the authors propose the OFC as a novel approach for studying neural correlates of sequence production. While I agree that OFC simulations can be highly insightful as a normative model for understanding the importance of sequence elements, it is unclear to me how OFCs can generate new hypotheses regarding the neural implementation of sequential movements. For instance, if the control gains are summarizing the instruction of the task and the relevance of future targets, it is unclear in which brain areas, or how these control gains are implemented. I believe the manuscript will benefit from making these points clearer in the introduction and the discussion sections.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors examine the question of whether discrete action sequences and coarticulated continuous sequential actions can be produced from the same controller, without having to derive separate control policies for each sequential movement. Using modeling and behavioral experiments, the authors demonstrate that this is indeed possible if the constraints of the policy are appropriately specified. These results are of interest to those interested in motor sequences, but it is unclear whether these findings can be interpreted to apply to the control of sequences more broadly (see weaknesses below).

      Strengths:

      The authors provide an interesting and novel extension of the stochastic optimal control model to demonstrate how different temporal constraints can lead to either individual or coarticulated movements. The authors use this model to make predictions about patterns of behavior (e.g., in response to perturbations), which they then demonstrate in human participants both by measuring movement kinematics as well as EMG. Together this work supports the authors' primary claims regarding how changes in task instructions (i.e., task constraints) can result in coarticulated or separated movement sequences and the extent to which the subsequent movement goal affects the planning and control of the previous movement.

      Weaknesses:

      I reviewed a prior version of this manuscript, and appreciate the authors addressing many of my previous comments. However, there are some concerns, particularly with regard to how the authors interpret their findings.

      (1) It would be helpful for the authors to discuss whether they think there is a fundamental distinction between a coarticulated sequence and a single movement passing through a via point (or equivalently, avoiding an obstacle). The notion of a coarticulated sequence brings with it the notion of sequential (sub)movements and temporal structure, whereas the latter can be treated as more of a constraint on the production of a single continuous movement. If I am interpreting the authors' findings correctly it seems they are suggesting that these are not truly different kinds of movements at the level of a control policy, but it would be helpful for the authors to clarify this claim.

      (2) The authors' model clearly shows that each subsequent target only influences the movement of one target back, but not earlier ones (page 7 lines 199-204). This stands in contrast to the paper they cite from Kashefi 2023, in which those authors clearly show that people account for at least 2 targets in the future when planning/executing the current movement. It would be useful to know whether this distinction arises because of a difference in experimental methodology, or because the model is not capturing something about human behavior.

      (3) In my prior review I raised a concern that the authors seem to be claiming that because they can use a single control policy for both coarticulated and separated movement sequences, there need not be any higher-level or explicit specification of whether the movements are sequential. While much of that language has been removed, it still appears in a few places (e.g., p. 13, lines 403-404). As previously noted, the authors' control policy can generate both types of movements as long as the proper constraints are provided to the model. However, these constraints must be specified somewhere (potentially explicitly, as the authors do by providing them as task instructions). Moreover, in typical sequence tasks, although some movements become coarticulated, people also tend to form chunks with distinct chunk boundaries, which presumably means that there is at least some specification of the sequential ordering of these chunks that must exist (otherwise the authors' model might suggest that people can coarticulate forever without needing to exhibit any chunk boundaries). Hence the authors should limit themselves to the narrow claim that a single control policy can lead to separated or coarticulated movements given an appropriate set of constraints, but acknowledge that their work cannot speak to where or how those constraints are specified in humans (i.e., that there could still be an explicit sequence representation guiding coarticulation).

    1. Author response:

      Reviewer #1

      […] it seems that the readout units are not operating in continuous time, and that interval discrimination relies in part on external information. Specifically, the readout units only look at the spike counts during the window delta_t_w.

      In the first version of the review, the reviewer implied that each readout unit only receives input during a small window around the interval it represents. However, this is not the case. The small window that is depicted in Fig. 16 is a sliding window that is used to compute the states (i.e., an estimate of the instantaneous firing rate) at each point in time. The fact that the readout units do operate in continuous time is apparent from Fig. 2A, which shows the activity of all output units as a function of time: there is gradually changing activity with a peak at the represented interval. If each unit only received input during a window of a couple of milliseconds, there would be a single peak of activity at the represented interval, and near-zero activity at any other time.

      This misunderstanding has been cleared up in the current version of the review (see last paragraph of review #1).
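
      The sliding-window state computation described in this response can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sliding_rate`, the 25 ms window, the step size, and the spike times are all hypothetical choices made for illustration. The point is only that a short window slid across the whole trial yields a rate estimate at every time point, so a downstream readout sees a continuous signal.

```python
from bisect import bisect_right

def sliding_rate(spike_times_ms, window_ms, step_ms, t_end_ms):
    """Estimate firing rate (Hz) in a trailing window (t - window, t] at each step.

    Hypothetical helper for illustration; window and step sizes are assumptions.
    """
    spikes = sorted(spike_times_ms)
    rates = []
    t = window_ms
    while t <= t_end_ms:
        # Number of spikes with t - window_ms < spike <= t.
        n = bisect_right(spikes, t) - bisect_right(spikes, t - window_ms)
        rates.append((t, 1000.0 * n / window_ms))
        t += step_ms
    return rates

# Illustrative (made-up) spike train, evaluated every 25 ms with a 25 ms window.
states = sliding_rate([5, 12, 40, 41, 43], window_ms=25, step_ms=25, t_end_ms=50)
print(states)
```

      With a smaller step than the window length, consecutive windows overlap and the state trace becomes smoother; the readout is then exposed to every time bin rather than to one preselected bin.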

      Stimulus onset occurs at 1500 ms in order to allow the network to stabilize. Ideally, this value should be randomized across trials to ensure performance generalizes across initial states.

      This is a valid point which we will address in the revision. However, we note that experimentation with different onset values did not change the dynamics of the network systematically in previous studies (i.e., Hass et al., 2022).

      Why does StDev saturate? Is that because subjective time saturates as well?

      Indeed, the two phenomena are closely related. In section “Deviations from the scalar property and the origin on Vierordt’s law”, we discuss that both are caused by the broadening of the tuning curves of the readout units (Fig 1A) as the longest time constants of the network are exceeded.

      In the discussion, it would be nice to explain that dopaminergic modulation of subjective timing is not as universally observed as the linear psychophysical law or the scalar property, and I believe somewhat controversial (e.g., Ward, ..., Balsam, 2009).

      We are thankful for this advice and will adapt the discussion accordingly in the revision. Still, we note that dopaminergic modulation of subjective timing is one of the more robust effects observed in several time perception experiments.

      Reviewer #2:

      (1) Lack of Empirical Data: […] The paper would benefit from quantitative and qualitative simulations of results from specific, large-sample studies to anchor the model's predictions in concrete empirical evidence.

      While it is correct that this study does not attempt to replicate a concrete empirical study, we note that we do compare the model's results with specific studies wherever possible. The comparison is done on the level of parameters of functional relationships: For the linear psychophysical law, we compare the slope and the indifference point of the model with those from experimental studies. For the scalar property, we compare the Weber fraction of the model to those computed from experiments. For dopaminergic modulation of subjective duration, no direct comparison with experimental data is possible, as the levels of modulation are estimated from in vitro experiments and cannot be directly compared with modulations in vivo. However, we discuss a range of qualitative observations in experiments that are reproduced (and explained) by the model.

      The above arguments notwithstanding, one can discuss whether the presentation of the experimental results and the comparison with the simulations is appropriate, and we do plan to extend this presentation in a revision.
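
      The parameter-level comparison described in this response (slope, indifference point, Weber fraction) can be made concrete with a small sketch. Everything here is an assumption for illustration: the numbers are synthetic and noise-free, not taken from the paper or from any experiment, the helper names are hypothetical, and the Weber fraction is taken as the standard deviation of the estimates divided by their mean, which is one common convention.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def indifference_point(slope, intercept):
    """Duration at which the estimate equals the true duration (requires slope != 1)."""
    return intercept / (1.0 - slope)

# Synthetic, noise-free illustration (NOT empirical data).
true_ms = [100.0, 200.0, 400.0, 600.0]
est_mean_ms = [0.8 * t + 60.0 for t in true_ms]   # linear psychophysical law
est_sd_ms = [0.1 * m for m in est_mean_ms]        # scalar property: SD grows with mean

slope, intercept = fit_line(true_ms, est_mean_ms)
ip_ms = indifference_point(slope, intercept)
weber = [sd / m for sd, m in zip(est_sd_ms, est_mean_ms)]
print(slope, ip_ms, weber)
```

      A slope below 1 together with a positive intercept means short intervals are overestimated and long ones underestimated, which is the pattern associated with Vierordt's law; a Weber fraction that stays constant across durations corresponds to the scalar property.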

      (2) Methodological Ambiguities: The training and testing procedures lack robust checks for generalization, leading to potential overfitting issues.

      It is correct that formal checks for generalization, such as cross-validation protocols, are missing, and we will include them in the revision. However, as we obtained a mechanistic understanding of how the model tells time, we are confident that our results are not due to overfitting.

      (3) Inadequate Visualization of Empirical Data: References to empirical data are vague and not directly visualized alongside model outputs. Future iterations should include empirical data, not general trends from psychophysics, in figures for a clear comparison.

      As mentioned above, the comparison between simulation and empirical data will be extended in a revision. However, we argue that the “general trends”, namely adherence of the model to the often-reported psychophysical regularities, are of greater importance compared to the replication of, e.g. one specific slope of the linear psychophysical law, which does vary a lot between experiments.

      (4) Limitations in Model Scope and Dynamics: […] Expanding the model limitations to consider isochronous pulse processing and the emergence of limit-cycle behaviors after prolonged stimulation would provide a more comprehensive understanding of the model's capabilities and limitations.

      The current research focuses on the estimation of a single duration rather than the processing of sequences of durations. Sequence processing is a vast field, and it has been argued that it comprises different mechanisms compared to duration estimation. Thus, we feel that including sequence processing would be beyond the scope of the already quite extensive paper. However, we will discuss a possible extension of the model to sequence processing in the revision.

      Additionally, the justification for using \(N_{Poisson}\) as a proxy for more connections is unclear and warrants a more direct approach.

      We considered different means to vary the noise input into the network, including changes in the number of connections. We ultimately chose to vary the firing rate of a fixed number of Poisson input neurons. As the sum of the firing rates of N independent Poisson neurons with the same f is simply N*f and the synaptic contributions from each spike also linearly add up, this is equivalent to adding more Poisson neurons and thus, more connections.
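
      The counting argument in this response can be checked numerically. The sketch below is not the authors' simulation: the neuron counts, rates, trial numbers, and function names are hypothetical, and it only covers the spike-count statistics (the additional assumption that synaptic contributions add linearly is not modeled). It compares 10 Poisson neurons at 20 Hz with 20 Poisson neurons at 10 Hz, which both have a summed rate of N*f = 200 Hz.

```python
import random
from statistics import mean, pvariance

def poisson_spike_count(rate_hz, duration_s, rng):
    """Count spikes of a homogeneous Poisson process using exponential inter-spike intervals."""
    t = rng.expovariate(rate_hz)
    count = 0
    while t < duration_s:
        count += 1
        t += rng.expovariate(rate_hz)
    return count

def pooled_counts(n_neurons, rate_hz, duration_s, n_trials, seed):
    """Summed spike count across independent Poisson neurons, one value per trial."""
    rng = random.Random(seed)
    return [
        sum(poisson_spike_count(rate_hz, duration_s, rng) for _ in range(n_neurons))
        for _ in range(n_trials)
    ]

# Same N * f = 200 Hz, reached with different N and f (illustrative values).
few_fast = pooled_counts(n_neurons=10, rate_hz=20.0, duration_s=1.0, n_trials=2000, seed=1)
many_slow = pooled_counts(n_neurons=20, rate_hz=10.0, duration_s=1.0, n_trials=2000, seed=2)

print(mean(few_fast), mean(many_slow))            # both close to 200
print(pvariance(few_fast), pvariance(many_slow))  # both close to 200 (Poisson: variance = mean)
```

      Because the superposition of independent Poisson processes is again a Poisson process with the summed rate, the two configurations are statistically indistinguishable at the level of pooled input spikes.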

      (5) Omissions and Redundancies: Certain omissions, such as the lack of a condition in Figure 7A or missing references to relevant models and reviews, detract from the paper's thoroughness.

      The reviewer refers to a condition where everything is ablated except NMDA. We will include such a condition in the revision. Regarding missing references, the reviewer requests including references that focus on sequence processing. While the focus of the current work is on estimating a single duration rather than a sequence of durations (see above), we will include a review on this topic as an outlook on this possible extension of the model.

      Moreover, some statements and terms like "internal clock" are used without a clear mechanistic definition within the model.

      We are thankful for this advice and will adapt the revision accordingly.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper addresses the important question of the neural mechanisms underlying interval discrimination. The authors develop a detailed and biologically plausible model based on a previously proposed theory of timing. The model proposes that the interval between two stimuli can be encoded in the state of the neuronal and synaptic properties, specifically those with time constants on the order of hundreds of milliseconds, such as short-term synaptic plasticity and GABAb currents. Based on biological parameters in the PFC the authors show that the model can account for interval discrimination for up to 750 ms. Furthermore, the model accounts for three well-established psychophysical properties of interval timing: the linear relation between objective and neural time, the scalar property/Weber's law, and dopaminergic modulation of timing (although this property is less robust). Of particular novelty is the demonstration of Weber's law, and an explanation of how many complex and nonlinear neuronal properties produce a linear relationship between the standard deviation of interval estimates and their mean.

      This is an interesting paper that addresses a significant gap in the field. However, I have one major concern. As I understood the methods (and I may have misunderstood) it seems that the readout units are not operating in continuous time, and that interval discrimination relies in part on external information. Specifically, the readout units only look at the spike counts during the window delta_t_w. Thus, discrimination between 100 and 200 ms looks only at the spikes at 120-145 and 220-245, respectively, meaning that the experimenters are providing interval information for the readout of the intervals being discriminated. If this is indeed the case the model is fairly limited in biological plausibility and significantly dampens my enthusiasm for the paper.

      Stimulus onset occurs at 1500 ms in order to allow the network to stabilize. Ideally, this value should be randomized across trials to ensure performance generalizes across initial states.

      Why does StDev saturate? Is that because subjective time saturates as well?

      The model captures the effect of D2 receptors observed in some timing studies, specifically that D2 receptor activation increases "clock" speed. In the discussion, it would be nice to explain that dopaminergic modulation of subjective timing is not as universally observed as the linear psychophysical law or the scalar property, and I believe somewhat controversial (e.g., Ward, ..., Balsam, 2009).

      (NB: Regarding my potential concern that the decoding was performed in discontinuous time, the authors have clarified that decoding was done in continuous time--i.e., each output unit was trained to respond to a given time bin of the target interval but exposed to all time bins of all intervals during testing. This confirms the robustness of their decoding procedure and model.)

    3. eLife assessment

      This useful paper explores a mathematical model of subsecond time perception, testing potential neural mechanisms behind the linear psychophysical law, Weber's law, and dopaminergic modulation of subjective durations. The model employed readout units to decode an interval. Nevertheless, the work is incomplete and presented as data-driven, but there is no analysis of empirical data.

    4. Reviewer #2 (Public Review):

      Summary:

      The paper explores a mathematical model of subsecond time perception, engaging with established theories such as the linear psychophysical law, Weber's law, and dopaminergic modulation of subjective durations. While it ambitiously attempts to confirm specific mechanisms of time perception and presents a comprehensive description of these mechanisms, the work is presented as data-driven but its empirical backing and model generalization capabilities are questionable. The title's implication of a robust empirical foundation is misleading, as the main figures do not reflect empirical data directly but rather model outputs aligned with general trends in psychophysical studies. This disjunction raises concerns about the model's applicability and the strength of the claims made regarding time perception mechanisms.

      Strengths:

      (1) The paper describes specific mechanisms of time perception, providing a theoretical examination of linear psychophysical law, Weber's law, and dopaminergic modulation. This aspect is valuable for readers seeking a theoretical understanding of temporal perception.

      (2) The authors describe a range of psychophysical studies and theories, attempting to position their model within the broader scientific discourse on time perception.

      Weaknesses:

      (1) Lack of Empirical Data: Two things needed to substantiate the model's predictions are notably absent: 1) quantification of error between model and empirical data, with interpretation of what this degree of error means, and 2) clear comparisons between model and empirical data in all figures and tables. The reliance on general trends rather than specific empirical studies undermines the strength and reliability of the model's claims. The paper would benefit from quantitative and qualitative simulations of results from specific, large-sample studies to anchor the model's predictions in concrete empirical evidence.

      (2) Methodological Ambiguities: The training and testing procedures lack robust checks for generalization, leading to potential overfitting issues. Clarifications are needed on whether and how the model reaches a steady state before stimulation and the implications of the chosen model time constants in the absence of stimulation. The overlap between training (50ms) and testing (25ms) steps and the implications for model generalization need validation with "traditional" parameter fitting protocols, such as formal model cross-validation across well-defined datasets and splits, as well as evaluations to understand and assess potential overfitting.

      (3) Inadequate Visualization of Empirical Data: References to empirical data are vague and not directly visualized alongside model outputs. Future iterations should include empirical data, not general trends from psychophysics, in figures for a clear comparison.

      (4) Limitations in Model Scope and Dynamics: The exploration of limitations is narrowly focused on interval length and noise. Expanding the model limitations to consider isochronous pulse processing and the emergence of limit-cycle behaviors after prolonged stimulation would provide a more comprehensive understanding of the model's capabilities and limitations. Additionally, the justification for using \(N_{Poisson}\) as a proxy for more connections is unclear and warrants a more direct approach. Adding more units to a truly data-driven model should be trivial.

      (5) Omissions and Redundancies: Certain omissions, such as the lack of a condition in Figure 7A or missing references to relevant models and reviews, detract from the paper's thoroughness. Moreover, some statements and terms like "internal clock" are used without a clear mechanistic definition within the model.

      Guidance for Readers

      Readers should approach this paper as a theoretical exploration into the mechanisms of subsecond-time perception. The model offers a detailed theoretical framework that engages with established laws and theories in time perception. However, it's crucial to note the model's reliance on general trends and its lack of direct empirical backing. The findings should be interpreted as a hypothesis-generating exercise rather than conclusive evidence.

    1. Reviewer #3 (Public Review):

      Summary:

      In this study the authors tested for alterations in selection intensity across ~13,000 protein coding genes along the gorilla lineage in order to test the hypothesis that the evolution of a polygynous social system resulted in relaxed selective constraint through a reduction in sperm competition. Of these genes, 578 exhibited signatures of relaxed purifying selection that were enriched for functions in male germ cells including meiosis and sperm biology. These genes were also more likely expressed in male germ cells and to contain deleterious mutations. Functional analysis of genes not previously implicated in male reproduction identified 41 new genes essential to male fertility in a Drosophila model. Moreover, genes under relaxed selective constraint in the gorilla lineage were more likely to contain loss of function variants in a cohort of infertile men. The authors conclude their results support the hypothesis that the emergence of a polygynous social system may have reduced the degree of selective pressures exerted through sperm competition.

      Strengths:

      (1) The identification of novel genes involved in spermatogenesis using signatures of relaxed selective constraint coupled to in vivo RNAi in Drosophila is very exciting and offers a proof of principle as to the power of evolutionarily-informed functional genomics, which has been largely underutilized.

      Weaknesses:

      (1) The analysis is restricted to protein-coding regions of genes that have single, orthologous sequences spanning 261 mammalian species, and as such is a non-random set of 13,310 genes that have higher evolutionary conservation. While this approach is necessary for the analyses being performed, it excludes non-coding regions, recently duplicated genes/gene families, and rapidly evolving genes, which are all likely subject to stronger selection as compared to evolutionarily conserved genes (and gene regions). Thus, the conclusion that relaxed selective constraint is pervasive likely misses a large number of the most strongly selected genes, which have repeatedly been shown to include sex- and reproduction-related genes. Would the results be similar if the set of orthologous genes were restricted to the primate lineage, as it may include more rapidly evolving genes?

      (2) The identification of genes showing relaxed selection along the gorilla lineage, which are overrepresented in male reproduction, supports the hypothesis that the emergence of polygyny resulted in relaxed sperm competition and is the driving force behind their observations. However, there is no control group to support that polygyny is the driving force. To more fully test this hypothesis the authors should consider contrasting their findings to observations for other species whereby polygyny did not evolve (or a gradation between). Ideally this could be integrated into RELAX-Scan comparisons, but even a semi-qualitative observation could be made for lineages more often having shared signatures of relaxed constraint across the 576 genes identified in gorilla.

      (3) Comparing infertile human males to a large number of presumably healthy males from a separate cohort can reveal genetic differences that stem from population structure and/or differences in study recruitment rather than from infertility, and care must be taken to avoid confounding in any association study before drawing conclusions. Population structure is likely to occur in human cohorts and is more likely to affect patterns of rare variation, even when controls are ascertained using similar enrollment criteria, geographic regions, racial/ethnic and national identities. In this study, the MERGE cohort, upon a quick search, appears to be largely recruited from Germany, whereas the control cohort, gnomAD, is a more cosmopolitan study including somewhat diverse ancestries. Thus, it is likely that the infertile and control cohorts have existing genetic differences unrelated to the phenotype.

    2. eLife assessment

      This important work reports that genome-wide patterns of relaxed purifying selection on genes involved in male fertility may represent a response to the reduced sperm competition in the gorillas' mating system. However, the evidence supporting the conclusion is incomplete and needs to be strengthened. This work will be of interest to researchers working on evolution and reproductive biology.

    3. Reviewer #1 (Public Review):

      This manuscript describes the pattern of relaxed selection observed at spermatogenesis genes in gorillas, presumably due to the low sperm competition associated with single-male polygyny. The analyses to detect patterns of selection are very thorough, as are the follow up analyses to characterize the function of these genes. Furthermore, the authors take the extra steps of in vivo determination of function with a Drosophila model.

      This is an excellent paper. It addresses the interesting phenomenon of relaxation of selection as a genomic signal of reproductive strategies using multiple computational approaches and follow-up analyses by pulling in data from GO, mouse knockouts, human infertility database, and even Drosophila RNAi experiments. I really appreciate the comprehensive and creative approach to analyze and explore the data. As far as I can tell, the analyses were performed soundly and statistics are appropriate. The Introduction and Discussion sections are thoughtful and well-written. I have no major criticisms of the manuscript.

      The main area that I would suggest for improvement is in the "Caveats and Limitations" section of the Discussion. Currently, the first paragraph of this section states the obvious that genetic manipulation of gorillas is not feasible. Beyond a reminder to the reader that this was a rationale for the Drosophila work, it isn't really adding much insight. The second paragraph is a brief discussion of the directionality of change. I think it comes across as overly simplistic, with a sort of "well, we can never know" feel. Obviously, there are plenty of researchers who do model change to infer direction and causation, and there are plenty of published papers attempting to do so with respect to mating systems in primates.

      I do not think the authors need to remove these paragraphs, but I do encourage them to turn the "Caveats and Limitations" section into something more meaningful by addressing limitations of the work that was actually done rather than limitations of hypothetical things that were not done. A few areas come to mind. First, the authors should discuss the effect of gene-tree vs species-tree inconsistencies in the analyses, which could affect the identification of gorilla-specific amino acid changes and/or the dN/dS estimates. Incomplete lineage sorting is very common in primates including the gorilla-chimp-human splits (Rivas-González et al. 2023). It would be nice to hear the authors' thoughts on how that might affect their analyses. Second, the dN/dS-based analyses assume the neutrality of synonymous substitutions. Of course, that assumption is not completely true; it might be true enough, and the authors should at least note it as a caveat. Third, and potentially related, is the consideration that these protein-coding genes may be functioning in other ways such as via antisense transcription. The genes under relaxed selection may be on their way to becoming pseudogenes and evolving as such at the sequence level, but many pseudogenes continue to be transcribed, sense or antisense, for a regulatory purpose. I don't think there is a way to incorporate this into the authors' analyses but it would be nice to see it acknowledged as a caveat or limitation.

    4. Reviewer #2 (Public Review):

      Summary:

      Bowman and colleagues have compiled a large comparative genomic dataset to examine the molecular evolution of genes in mammals, with the primary goal of identifying how changes in the gorilla mating system have shaped the evolution of spermatogenesis. They report several patterns pointing to a signal of relaxed purifying selection on genes involved in male fertility, a pattern that they interpret as a response to changes in the mating system of gorillas. Many previous studies have used comparisons among species of primates and other mammals to understand how changes in mating systems have shaped the evolution of reproductive traits and genes. These collective works have provided some of the best evidence that changes in the form and intensity of sexual selection have had a strong effect on the evolution of male reproduction. The current study builds on this rich history by exploring molecular evolution of over 13,310 genes across 261 mammals. This very large phylogenetic dataset affords considerable power to characterize patterns of molecular evolution along the gorilla lineage. This allows for some added power relative to a previous study that interrogated the same lineage-specific patterns (Scally et al. 2012). They report a subset of genes showing evidence for either positive directional selection (less than 1% of genes) or relaxed purifying selection (4% of genes) in gorillas. Relaxed purifying selection is more common than positive selection, and genes showing signatures of relaxed constraint are enriched for spermatogenesis functions using various tests based on functional annotation or gene expression and infertility associations in humans and mice. The authors also report new functional data - the only original data in this study - using a high throughput genetic screen showing that some of these genes are also expressed in spermatogenesis in flies, and when perturbed they affect male fertility.

      These results are interpreted as strong evidence that changes in mating system, specifically the loss of sperm competition, have shaped the evolution of male reproduction in gorillas. The authors argue that these discoveries illustrate, for the first time, the genome-wide effect of striking changes in mating behavior in gorillas on the genetic underpinnings of male reproduction and provide new candidates relevant to male fertility in humans. Support for these central conclusions is eroded by a lack of appropriate comparative contrasts needed to clarify the uniqueness of these patterns to gorillas and, critically, establish a direct phylogenetic association with mating system or correlated reproductive traits.

      Strengths:

      The presentation is engaging, clear, and easy to follow throughout. I enjoyed reading the overall narrative and I think that the authors did a good job of presenting the details of male reproductive biology in an informative and accessible manner. Given the general interest in gorilla evolution, and the clear relevance to humans, studies of this scope on male reproductive biology are likely to be of broad interest to both evolutionary and reproductive biologists.

      The reported signatures of molecular evolution in gorillas appear robust, well-executed, and supported by several lines of evidence that establish some links with male reproduction. The authors have presented a series of molecular evolution analyses that demonstrate both rigor and attention to analytical details and quality control. Although all the primary sequence data has been previously published by others, the compilation of a high-quality curated comparative dataset of this scale is impressive and inspires confidence in the underlying molecular results. Likewise, the incorporation of diverse other data from mice and humans helps shape the overall narrative. To my knowledge, this represents the most focused and detailed analysis of protein-coding evolution specific to gorillas to date (although parallel results from the landmark gorilla genome study - Scally et al. 2012 - are downplayed somewhat).

      Likewise, the inclusion of new functional data from Drosophila establishes a subset of genes showing recent changes in molecular evolution in gorillas that appear to be both deeply conserved in animals and related to male fertility.

      Weaknesses:

      This study lacks the necessary comparative framework needed to ascribe any of the reported patterns to changes in the reproductive system of gorillas, or to really understand the uniqueness of these patterns relative to other species. Although wording is careful at times, the authors repeatedly ascribe the patterns they are finding directly to the specific changes in mating system biology that have occurred in gorillas. The general framing and significance rests on the central finding that "these data provide compelling evidence that reduced sperm competition in gorillas is associated with relaxed purifying selection on genes related to male reproductive function (Abstract)". No such association between molecular evolution and variation in mating system or any correlated reproductive traits is ever directly tested, let alone established as a clear statistical correlation. The massive comparative dataset is used to localize patterns of molecular evolution to the gorilla lineage and then these patterns are interpreted in the context of changes in mating system, as an assumption of the study not a direct result. Although basic information of the reproductive system (or correlates thereof) likely exists for many of the 261 species included here, this information is never used to test for a relationship between changes in positive or purifying selection and reproduction.

      The lack of any such comparisons is especially curious given that there are many previous studies that have sought and established such connections for traits and/or genes in mammals (dozens now?), and especially great apes, before. This comparative approach is the gold standard for making claims linking mating system to molecular evolution and yet this is not pursued here. The authors are correct in that they provide a rigorous genome-wide analysis (but not at all for the first time, see Scally et al. 2012), but they skip this critical central step to rigorous inference in comparative genomics. This is essentially a broad comparative study, but the central conclusion (a direct link between mating system and molecular evolution) is speculative and not actually tested.

      Note that despite the framing here, there are of course several aspects of lineage specific biology that undoubtedly shape molecular evolution of male reproduction and fertility but could be unrelated to sperm competition per se. For example, shifts in operational sex ratios can have profound effects on effective population sizes and the efficacy of selection, which of course would be expected to change the intensity and direction of molecular evolution. Likewise, shifts in population size, structure, and diet all can affect molecular evolution and reproduction.

      In the absence of a broad phylogenetically independent contrast (which would be really interesting here), the authors need to at least establish that there is indeed something noteworthy about the specific findings they report relative to other systems that have a different mating system. Such comparisons would be readily available within the great apes, especially compared to chimpanzees and bonobos (Pan). Most of the patterns are presented in such a way as to suggest a clear connection between the result and the unique features of gorilla reproduction, but are these clearly outliers? Relaxed purifying selection is much more common than positive selection, is this result qualitatively or quantitatively unique to gorillas as implied (I would honestly be surprised if it was as this is a common outcome of these dN/dS-based tests)? Similar questions and the need for more context apply to the various enrichment tests. That genes involved in male reproduction evolve rapidly and that this reflects both relaxed constraint and positive selection is an exceptionally well-established pattern, as is enrichment for reproductive functions/expression of such genes in unbiased genome-wide screens (as cited by the authors, including in gorillas by Scally et al. 2012 who performed a very similar analysis albeit with some model advances used in the current study). Do chimpanzees or humans lack these specific signatures of relaxed constraint at reproductive genes or is it a much stronger enrichment in gorillas? Establishing these baseline comparisons would help a lot with interpretation of the core findings. A little bit of this is explored with the human comparisons but not in a parallel genome-wide manner that places the signatures in gorillas in context.

      I had similar questions related to the high-throughput Drosophila screen. This is a creative and novel component of the study. However, I am unclear on how to interpret the results or the conclusions drawn from them. It is very interesting that a subset of genes showing relaxed constraint are conserved to Drosophila and that perturbation of some of these causes fertility issues. However, the conclusion that these genes reflect novel candidates not implicated in sperm biology is a bit overstated. Here implicated means genes with an annotated sterility phenotype in humans, mice, flies, or gorillas - specific annotations which are pretty limited at least in the mammalian systems. The entire design was conditioned on analyzing genes that were reliably expressed during Drosophila spermatogenesis, and then focusing on those. But the comparative set for the enrichment test was a random set of genes. Shouldn't the background be a random set of testis-expressed genes? I would say that genes that are reliably expressed during spermatogenesis in both mammals and flies are implicated in sperm biology and genetic manipulation of such genes would be expected to produce fertility phenotypes at some appreciable rate. So the result here adds some interesting data but it does not seem unexpected or significant as framed.

    1. eLife assessment

      This valuable study explores the sequence characteristics and conservation of high-occupancy target loci, which are genomic regions bound by a multitude of transcription factors, at promoters and enhancers throughout the human genome. The computational analyses presented in this study are solid, although the evidence for some claims is inadequate. This study would be a helpful resource for researchers performing ChIP-seq based analyses of transcription factor binding.

    2. Reviewer #3 (Public Review):

      Summary:

      Hudaiberdiev and Ovcharenko investigate regions within the genome where a high abundance of DNA-associated proteins are located and identify DNA sequence features enriched in these regions, their conservation in evolution, and variation in disease. Using ChIP-seq binding profiles of over 1,000 proteins in three human cell lines (HepG2, K562, and H1) as a data source, they are able to identify nearly 44,000 high-occupancy target (HOT) loci that form at promoter and enhancer regions, thus suggesting these HOT loci regulate housekeeping and cell identity genes. Their primary investigative tool is HepG2 cells, but they employ K562 and H1 cells as tools to validate these assertions in other human cell types. Their analyses use RNA pol II signal, super-enhancer, regular-enhancer, and epigenetic marks to support the identification of these regions. The work is notable in that it identifies a set of proteins that are invariantly associated with high-occupancy enhancers and promoters and argues for the integration of these molecules at different genomic loci. The authors leverage these observations to argue that HOT loci are potential sites of transcriptional condensates, a claim that their data leave them well placed to support. This work would benefit from refinement and some additional work to support the claims.

      Comments:

      Condensates are thought to be scaffolded by one or more proteins or RNA molecules that are associated together to induce phase separation. The authors can readily provide from their analysis a check of whether HOT loci exist within different condensate compartments (or a marker for them). Generally, ChIP-seq signal from MED1 and Ronin (THAP11) would be anticipated to correspond with transcriptional condensates of different flavors; other coactivator proteins (e.g., BRD4) would be useful to include as well. Similarly, condensate scaffolding proteins of facultative and constitutive heterochromatin (HP1a and EZH2/1) would augment the authors' model by providing further evidence that HOT loci occur at transcriptional condensates and not heterochromatin condensates. Sites of splicing might be informative as well. Splicing condensates (or nuclear speckles) are scaffolded by SRRM/SON, which is probably not in their data set, but members of the serine/arginine-rich splicing factor family of proteins can serve as a proxy; SRSF2 is the best studied of this set. This would provide a significant improvement to their proposed model and would be expected, since the authors note that these proteins occur at the enhancers and promoter regions of highly expressed genes.

      It is curious that MAX is found to be highly enriched without its binding partner Myc. Is Myc's signal simply lower in abundance, or is it absent from HOT loci? How could it be possible that a pair of proteins that bind DNA as a heterodimer are found in HOT loci without invoking a condensate model to interpret the results?

      Numerous studies have linked the physical properties of transcription factor proteins to their role in the genome. The authors here provide a limited analysis of the proteins found at different HOT loci by employing GO terms. Is there evidence for specific types of structural motifs, disordered motifs, or related properties of these proteins present in specific loci?

      Condensates themselves possess different emergent properties, but these are a product of the proteins and RNAs that concentrate in them and not the result of any one specific function (condensates can have multiple functions!).

      Transcriptional condensates serve as functional bodies. The notion the authors present in their discussion is not held by practitioners of condensate science, in that condensates exist to perform biochemical functions and are dissolved in response to satisfying that need, not that they serve simply as reservoirs of active molecules. For example, transcriptional condensates form at enhancers or promoters that concentrate factors involved in the activation and expression of that gene and are subsequently dissolved in response to a regulatory signal (in transcription this can be the nascently synthesized RNA itself or other factors). The association reactions driving the formation of active biochemical machinery within condensates are materially changed, as are the kinetics of assembly. It is unnecessary and inaccurate to qualify transcriptional condensates as depots for transcriptional machinery.

      This work has the potential to advance the field by providing a detailed perspective on which proteins are located in which regions of the genome. Publication of this information alongside the manuscript would advance the field materially.

    3. Reviewer #1 (Public Review):

      Summary:

      This study explores the sequence characteristics and features of high-occupancy target (HOT) loci across the human genome. The computational analyses presented in this paper provide insight into the correlation of TF binding and regulatory networks at HOT loci that were regarded as lacking sequence specificity.

      By leveraging hundreds of ChIP-seq datasets from the ENCODE Project to delineate HOT loci in HepG2, K562, and H1-hESC cells, the investigators identified the regulatory significance of HOT loci and their participation in 3D chromatin interactions. Subsequent exploration focused on the interaction of DNA-associated proteins (DAPs) with HOT loci using computational models. The models established that the potential formation of HOT loci is likely embedded in their DNA sequences and is significantly influenced by GC content. Further inquiry exposed contrasting roles of HOT loci in housekeeping and tissue-specific functions spanning various cell types, with distinctions between embryonic and differentiated states, including instances of polymorphic variability. The authors conclude with a speculative model that HOT loci serve as anchors where phase-separated transcriptional condensates form. The findings presented here open avenues for future research, encouraging more exploration of the functional implications of HOT loci.

      Strengths:

      The concept of using computational models to define characteristics of HOT loci is refreshing and allows researchers to take a different approach to identifying potential targets. The major strength of the study lies in the very large number of datasets analyzed, with hundreds of ChIP-seq data sets for both HepG2 and K562 cells as part of the ENCODE project. Such quantitative power allowed the authors to delve deeply into HOT loci, which were previously thought to be artifacts.

      Weaknesses:

      While this study contributes to our knowledge of HOT loci, there are critical weaknesses that need to be addressed. There are questions about the validity of the assumptions made for certain analyses. The speculative nature of the proposed model involving transcriptional condensates needs either further validation or to be toned down. Furthermore, some apparent contradictions exist among the main conclusions, and these either need to be better explained or corrected. Lastly, several figure panels could be better explained or described in the figure legends.

    4. Reviewer #2 (Public Review):

      Summary:

      The paper 'Sequence characteristic and an accurate model of abundant hyperactive loci in human genome' by Hudaiberdiev and Ovcharenko offers comprehensive analyses and insights about the 'high-occupancy target' (HOT) loci in the human genome. These are considered genomic regions that overlap with transcription factor binding sites. The authors provided very comprehensive analyses of the TF composition characteristics of these HOT loci. They showed that these HOT loci tend to overlap with annotated promoters and enhancers, GC-rich regions, open chromatin signals, and highly conserved regions, and that these loci are also enriched with potentially causal variants for different traits.

      Strengths:

      Overall, the definition of HOT loci is clear, and the data on HOT regions across the genome can be a useful dataset for studies that use HepG2 or K562 as a model. I appreciate the authors' efforts in presenting many analyses and plots backing up each statement.

      Weaknesses:

      It is noteworthy that the HOT concept and the signature characteristics of HOT loci as highly functional regions of the genome are not presented for the first time here. Additionally, I find the main manuscript, though very comprehensive, long-winded; it could be put in a shorter, more digestible format without sacrificing scientific content.

      The introduction's mention of the blacklisted region can be rather misleading because when I read it, I was anticipating that we would be uncovering new regulatory regions within the blacklisted region. However, the paper does not seem to address afterward the question of whether the HOT regions overlap, if at all, with the ENCODE blacklisted regions. This plays into the central assessment that this manuscript is long-winded.

      The introduction also mentioned that HOT regions correspond to 'genomic regions that seemingly get bound by a large number of TFs with no apparent DNA sequence specificity' (this point of 'no sequence specificity' is reiterated in the discussion, lines 485-486). However, later on in the paper, the authors also showed that models such as convolutional neural networks that take in one-hot-encoded DNA sequence to predict HOT loci performed really well. This means that the sequence context, with potential motifs, can still play a role in forming the HOT loci. At the same time, lines 59-60 also cited studies that "detected putative drive motifs at the core segments of the HOT loci". The authors should edit the manuscript to clarify (or remove) contradictory statements.

    1. Reviewer #1 (Public Review):

      Summary:

      Using biophysical chromosome stretching, the authors measured the stiffness of chromosomes of mouse oocytes in meiosis I (MI) and meiosis II (MII). This study is a follow-up to previous studies in spermatocytes (and oocytes) by the authors (Biggs et al. Commun. Biol. 2020; Hornick et al. J. Assist. Rep. and Genet. 2015). They showed that MI chromosomes are much stiffer (~10-fold) than mitotic chromosomes of mouse embryonic fibroblast (MEF) cells. MII chromosomes are also stiffer than the mitotic chromosomes. The authors also found that oocyte aging increases the stiffness of the chromosomes. Surprisingly, the stiffness of meiotic chromosomes is independent of the meiotic chromosome components Rec8, Stag3, and Rad21L.

      Strengths:

      This provides new insight into a biophysical property of meiotic chromosomes, namely chromosome stiffness. The stiffness of chromosomes in meiotic prophase I is ~10-fold higher than that of mitotic chromosomes, and this is independent of meiotic cohesin. The increased stiffness during oocyte aging is a novel finding.

      Weaknesses:

      A major weakness of this paper is that it does not provide any molecular mechanism underlying the difference between MI and MII chromosomes (and/or prophase I and mitotic chromosomes).

    2. eLife assessment

      This valuable paper describes the stiffness of meiotic chromosomes in both oocytes and spermatocytes. The authors identify differences in stiffness between meiosis I and II chromosomes, as well as an age-dependent increase in stiffness in meiosis I (and meiosis II) chromosomes, results that are highly significant for the field of chromosome biology. The mechanisms underlying age-dependent changes in chromosome stiffness remain unclear, and the evidence to suggest that changes in stiffness are independent of cohesin, which is known to deteriorate with age, is incomplete.

    3. Reviewer #2 (Public Review):

      This paper reports investigations of chromosome stiffness in oocytes and spermatocytes. The paper shows that prophase I spermatocytes and MI/MII oocytes yield high Young's modulus values in the assay the authors applied. The authors claim that deficiency in each one of three meiosis-specific cohesins did not affect this result, and increased stiffness was seen in aged oocytes but not in oocytes treated with the DNA-damaging agent etoposide.

      The paper reports some interesting observations which are in line with a 2020 report by the same authors in which increased stiffness of spermatocyte chromosomes was already shown. In that sense, the current manuscript is an extension of that previous paper, and thus novelty is somewhat limited. The paper is also largely descriptive, as it neither proposes a mechanism nor reports factors that determine the chromosomal stiffness.

      There are several points that need to be considered.

      (1) Limitations of the study and the conclusions are not discussed in the "Discussion" section and that is a significant gap. Even more so as the authors rely on just one experimental system for all their data - there is no independent verification - and that in vitro system may be prone to artefacts.

      (2) It is somewhat unfortunate that they jump between oocytes and spermatocytes to address the cohesin question. Prophase I (pachytene) spermatocyte chromosomes are not directly comparable to MI or MII oocyte chromosomes. In fact, the authors report Young's modulus values of 3700 for MI oocytes and only 2700 for spermatocyte prophase chromosomes, illustrating this difference. Why not use oocyte-specific cohesin deficiencies?

      (3) It remains unclear whether the treatment of oocytes with the detergent Triton X-100 affects the spindle and thus the chromosomes isolated directly from the Triton-lysed oocytes. In fact, it is rather likely that the detergent affects chromatin-associated proteins and thus structural features of the chromosomes.

      (4) Why did the authors use mouse strains of different genetic backgrounds, CD-1, and C57BL/6? That makes comparison difficult. Breeding of heterozygous cohesin mutants will yield the ideal controls, i.e. littermates.

      (5) How did the authors capture chromosome axes from STAG3-deficient spermatocytes, which feature very few if any axes? How representative are those chromosomes that could be captured?

    4. Reviewer #3 (Public Review):

      Summary:

      Understanding the mechanical properties of chromosomes remains an important issue in cell biology. Measuring chromosome stiffness can provide valuable insights into chromosome organization and function. Using a sophisticated micromanipulation system, Liu et al. analyzed chromosome stiffness in MI and MII oocytes. The authors found that chromosomes in MI oocytes were ten-fold stiffer than mitotic ones. The stiffness of chromosomes in MI mouse oocytes was significantly higher than that in MII oocytes. Furthermore, the knockout of the meiosis-specific cohesin components (Rec8, Stag3, Rad21l) did not affect meiotic chromosome stiffness. Interestingly, the authors showed that chromosomes from old MI oocytes had higher stiffness than those from young MI oocytes. The authors claimed this effect was not due to the accumulated DNA damage during the aging process because induced DNA damage reduced chromosome stiffness in oocytes.

      Strengths:

      The technique used (isolating the chromosomes in meiosis and measuring their stiffness) is the authors' specialty. The results are intriguing and informative to the chromatin/chromosome and other related fields.

      Weaknesses:

      (1) How intact the measured chromosomes were is unclear.

      (2) Some control data needs to be included.

      (3) The paper was not well-written, particularly the Introduction section.

      (4) How intact were the measured chromosomes? Although the structural preservation of the chromosomes is essential for this kind of measurement, the meiotic chromosomes were isolated in PBS with Triton X-100 and measured at room temperature. It is known that chromosomes are very sensitive to cation concentrations and macromolecular crowding in the environment (PMID: 29358072, 22540018, 37986866). It would be better to discuss this point.

    1. eLife assessment

      This important study addresses the challenge of antimicrobial resistance by targeting plasmid proteins that interfere with plasmid transfer as a novel strategy to limit the spread of antibiotic resistance genes. While the evidence presented is solid, the work would benefit from a clear integration of the approaches used and more thorough analyses to fully assess the effectiveness of this strategy. This study will interest those working on plasmid transfer and antimicrobial resistance.

    2. Reviewer #1 (Public Review):

      The study by Prieto et al. addresses the increasingly serious problem of bacterial resistance to antimicrobial agents. This work has an important element of novelty in proposing a new approach to control the spread of antibiotic resistance by plasmids. Instead of targeting the resistance determinant, plasmid-borne proteins are used as antigens to be bound by specific nanobodies (Nbs). Once these antigens were bound, plasmid transfer was inhibited and Salmonella infection blocked. This in-depth study is quite detailed and complex, with many experiments (9 figures with multiple panels), rigorously carried out. Results fully support the authors' conclusions. Specifically, the authors investigated the role of two large molecular weight proteins (RSP and RSP2) encoded by the IncHI1-derivative plasmid R27 of Salmonella. These proteins have bacterial Ig-like (Big) domains and are expressed on the cell surface, creating the opportunity for them to serve as immunostimulatory antigens. Using a mouse infection model, the authors showed that RSP proteins can properly function as antigens in Salmonella strains harboring the IncHI1 plasmid. The authors clearly showed increased levels of specific IgG and IgA antibodies against these RSP proteins in different tissues of immunized animals. In addition, non-immunized mice exhibited Salmonella colonization in the spleen and much more severe disease than immunized ones.

      However, the strength of this work is the selection and production of nanobodies (Nbs) that specifically interact with the extracellular domain of RSP proteins. The procedure to obtain Nbs is lengthy and complicated and includes the immunization of dromedaries with purified RSP and the construction of a VHH (H-chain antibody variable region) library in E. coli. As RSP is expressed on the surface of E. coli, specific Nbs were able to agglutinate Salmonella strains harboring the R27 plasmid encoding the RSP proteins.

      The authors demonstrated that Nbs-RSP reduced the conjugation frequency of R27, thus limiting the diffusion of the ampicillin resistance harbored by the plasmid. This represents an innovative and promising strategy to fight antibiotic resistance, as it is not blocked by the mechanism that confers, in this specific case, the ampicillin resistance of R27; instead, it targets an antigen associated with IncHI-derivative plasmids. Thus, RSP vaccination could be effective not only against Salmonella but also against other enteric bacteria. A possible criticism could be that Nbs against RSP proteins reduce the severity of the disease but do not completely prevent infection by Salmonella.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript aims to tackle antimicrobial resistance through the development of vaccines. Specifically, the authors test the potential of the RSP protein as a vaccine candidate. The RSP protein contains bacterial Ig-like domains that are typically carried in IncHI1 plasmids like R27. The extracellular location of the RSP protein and its role in the conjugation process make it a good candidate for a vaccine. The authors then use Salmonella carrying an IncHI plasmid to test the efficacy of the RSP protein as a vaccine antigen in providing protection against infection by antibiotic-resistant bacteria carrying the IncHI plasmid. The authors found no differences in total IgG or IgA levels, nor in pro-inflammatory cytokines, between immunized and non-immunized mice. They did, however, find differences in specific IgG and IgA, attenuated disease symptoms, and restricted systemic infection.

      The manuscript also evaluates the potential use of nanobodies specifically targeting the RSP protein by expressing them in E. coli and evaluating their interference in the conjugation of IncHI plasmids. The authors found that E. coli strains expressing RSP-specific nanobodies bind to Salmonella cells carrying the R27 plasmid, thereby reducing the conjugation efficiency of Salmonella.

      Strengths:

      - The main strength of this manuscript is that it targets the mechanism of transmission of resistance genes carried by any bacterial species, making the approach broadly applicable.

      - The experimental setup is sound and with proper replication.

      Weaknesses:

      - The two main experiments, evaluating the potential of the RSP protein and the effects of nanobodies on conjugation, seem to be parts of two different and unrelated strategies.

      - The survival rates shown in Figure 1A and Figure 3A for Salmonella pHCM1 and non-immunized mice challenged with Salmonella, respectively, are substantially different. In the same figures, the survival curves for immunized mice challenged with Salmonella pHCM1 and for mice challenged with Salmonella pHCM1 with and without ampicillin are virtually the same. While this is not the only measure of the effect of immunization, the authors should address the inconsistencies in the resulting survival curves more thoroughly, as they can confound the effects found in other parameters, including total and specific IgG and IgA, and pro-inflammatory cytokines.

      - Overall the results are inconsistent and provide only partial evidence of the effectiveness of the RSP protein as a vaccine target.

      - The conjugation experiments use very long conjugation times, making it harder to assess whether the resulting transconjugants are the direct result of conjugation or just the growth of transconjugants obtained at earlier points in time. While this could be assessed from the obtained results, it is not a direct or precise measure.

      - While the potential outcomes of these experiments could be applied to any bacterial species carrying this type of plasmid, it is unclear why the authors use Salmonella strains to evaluate it. The introduction does a great job of explaining the importance of these plasmids but falls short in introducing their relevance in Salmonella.

    1. eLife assessment

      This valuable study reports on a series of artificial selection experiments for microbiomes that confer drought tolerance to rice plants. A major strength is the solid experimental design with multiple soils, which will likely guide others in designing their experiments, but the study also has shortcomings in that the rescuing effect is not benchmarked against healthy well-watered plants, the sterilized controls do not add much information, and the dispersal between inocula confounds the interpretation of the results. In addition, while the type of work presented here is a first step towards the eventual goal of plant microbiome engineering, that goal is still mainly an ambition. The abstract would benefit from this being made clear, and the presentation would overall benefit from more extensive consideration of recent developments in the field.

    2. Reviewer #1 (Public Review):

      Summary:

      The study claims to explore plant microbiome engineering using host-mediated selection as a strategy to enhance rice growth and drought tolerance.

      Strengths:

      The authors have derived and identified simplified microbiomes from wild microbial communities of rice fields, deserts, and serpentine seep soils by selecting microbiomes from plants with desired phenotypes across generations. Metagenome-assembled genomes revealed enriched functions, such as glycerol-3-phosphate and iron transport, known to mediate plant-microbe interactions during drought.

      Weaknesses:

      The findings demonstrate the efficacy of host-mediated microbiome selection, but the engineering part for enhancing rice performance under drought-stress conditions has not been provided. The proposed mechanisms rely on correlations rather than direct experimental proof.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, Styer et al. impose artificial selection on root-associated microbiomes to increase drought tolerance in rice plants using different soils as starting microbiomes. Using NDVI and biomass as a proxy for plant health, they find that iterative passaging of the microbiomes of the best-performing plants increased plant resilience to drought stress in a soil-dependent manner. The study makes use of numerous controls. The authors survey the microbiota of the plants across generations, using an array of interesting analyses to characterize their observations. Firstly, the authors find that the acquired microbiomes are divergent towards the beginning of the selection experiment, but nearly converge later, suggesting that the selected communities become more similar over time. One reason is that the diversity of the microbiomes severely decreases after only one or two generations of selection and that microbes from each inoculation source appear to easily disperse across the experiment, leading to microbiome homogeneity. The authors then present an analysis to correlate ASVs with the NDVI and biomass over the course of the experiment (using the rice soil selection lines) to develop hypotheses about which ASVs may impact plant traits.

      Strengths:

      The authors set out to refine the understanding of microbiome artificial selection, a topic of recent interest to the plant microbiome field. The authors use an established approach (Mueller et al.), expanding upon it by including multiple starting soil inocula to ask whether the strength of selection varies by input microbiome. This is an important and novel question. Using drought resilience, as measured by NDVI and plant biomass, as the trait to select upon was a wise choice for this type of study, given the relative ease and speed with which these measures can be assessed. The inclusion of several types of controls, multiple selection lines, and several starting soil inocula showed a thoughtful experimental design. The analyses were diverse, non-standard, and attempted to address microbiome dynamics on multiple fronts. I am not necessarily convinced by some of the conclusions (see below); however, I think this study examines an important and exciting topic in the area of plant microbiomes. I predict the findings of the experiments will inform a wide audience of researchers attempting similar studies and be helpful in their designs.

      Weaknesses:

      Although the controls were well designed, the dispersal of the microbiomes erased the utility of the sterile inoculated (SI) controls, at least from my reading of the manuscript. Perhaps the original intent of the SI plants was to contrast the selected microbiomes vs axenic plants to show that plant resilience to drought increased generation after generation. If the controls had worked properly under my presumed scenario, this would allow the authors to account for batch variation across the generations (due to slight differences in MS media prep, water quality, etc.). Instead, the SI lines acquired microbes from the experiment and never appeared to significantly deviate from the SL plants. The dispersal of the microbes amongst soils and selection lines also minimizes any conclusions that can be made about the different starting inocula and how prone to selection they may be.

    4. Reviewer #3 (Public Review):

      Summary:

      In this work, Styer et al. explore host selection as a means for recruiting microbes that may aid their host under stressful conditions, in this case under drought stress, as an alternative to target-SynCom design. They do so by subjecting rice plants to several generations of soil transplantation, and by using the most successful rice plants as donors for the next generation. By using several NGS approaches and very thorough bioinformatics analysis, the authors identify potential microbial taxa and the associated functions enriched in the conditions of interest.

      Strengths:

      In general, I think this approach was very much needed in the field as an alternative to SynComs, which are still not readily usable in croplands. This work sets the grounds for future similar approaches, using different stresses and different host plants.

      In this work, the experimental setup is well thought-through and well-replicated. In addition, an exhaustive set of preliminary experiments was performed before deciding on the final panel of soils to use and scoring methodology. The figures are clear and well-explained.

      Weaknesses:

      One of the more unexpected results is that sterile/non-inoculated calcined clay also tends to enrich similar microbes, and the authors did extensive work exploring possible sources and microbial dispersal within the growth chamber. In a future experiment, the work would benefit from including a truly sterile control (same growth chamber but completely isolated from possible contamination). In this regard, the reader may wonder whether these efforts (selection experiments) are necessary at all, since plants seem to get from their environment what they need to survive. This is discussed across the paper but not directly addressed, and I think the manuscript would benefit from a clear argument for or against this idea.

    1. eLife assessment

      This article is a valuable addition to the growing literature on the developmental patterning of insect wings. Using CRISPR mutagenesis and localization of mRNA, the authors present solid evidence that the transcription factor Mirror is necessary for specifying the morphological identity of the most posterior regions of butterfly wings. The manuscript would benefit from more careful use of terminology and appropriate citation of related Drosophila literature, and there are also some concerns about whether the phenotype represents transformation or loss which might be clarified through a closer look at ultrastructure. With a clearer presentation of terminology, this paper would be of general interest to developmental and evolutionary biologists.

    2. Reviewer #1 (Public Review):

      Summary:

      This short report shows that the transcription factor gene mirror is specifically expressed in the posterior region of the butterfly wing imaginal disk, and uses CRISPR mosaic knock-outs to show it is necessary to specify the morphological features (scales, veins, and surface) of this area.

      Strengths:

      The data and figures support the conclusions. The article is swiftly written and makes an interesting evolutionary comparison to the function of this gene in Drosophila. Based on the data presented, it can now be established that mirror likely has a similar selector function for posterior-wing identity in a plethora of insects.

      Weaknesses:

      This first version has minor terminological issues regarding the use of the terms "domains" and "compartment".

    3. Reviewer #2 (Public Review):

      This is a short and unpretentious paper. It is an interesting area and therefore, although much of this area of research was pioneered in flies, extending basic findings to butterflies would be worthwhile. Indeed, there is an intriguing observation but it is technically flawed and these flaws are serious.

      The authors show that mirror is expressed at the back of the wing in butterflies (as in flies). They present some evidence that it is required for the proper development of the back of the wing in butterflies (a region dubbed the vannus by the ancient guru Snodgrass). But there are problems with that evidence. First, concerning the method, using CRISPR they treat embryos and the expectation is that the mirror gene will be damaged in groups of cell lineages, giving a mosaic animal in which some lines of cells are normal for mirror and others are not. We do not know where the clones or patches of cells that are defective for mirror are because they are not marked. Also, we do not know what part of the wing is wild type and what part is mutant for mirror. When the mirror mutant cells colonise the back of the wing and that butterfly survives (many butterflies fail to develop), the back of the wing is altered in some selected butterflies. This raises a second problem: we do not know whether the rear of the wing is missing or transformed. From the images, the appearance of the back of the wing is clearly different from the wild type, but is that due to transformation or not? And then I believe we need to know specifically what the difference is between the rear of the wing and the main part. What we see is a silvery look at the back that is not present in the main part, is it the structure of the scales? We are not told. There are other problems. Mirror is only part of a group of genes in flies and in flies both iroquois and mirror are needed to make the back of the wing, the alula (Kehl et al). What is known about iro expression in butterflies?

      In flies, mirror regulates a late and local expression of dpp that seems to be responsible for making the alula. What happens in butterflies? Would a study of the expression of Dpp in wildtype and mirror compromised wings be useful?

      Thus, I find the paper to be disappointing for a general journal as it does little more than claim what was discovered in Drosophila is at least partly true in butterflies. Also, it fails to explain what the authors mean by "wing domains" and "domain specification". They are not alone: butterfly workers in general appear vague about these concepts, their vagueness allowing too much loose thinking.

      Since these matters are at the heart of the purpose and meaning of the work reported here, we readers need a paper containing more critical thought and information. I would like to have a better and more logical introduction and discussion.

      The authors do define what they mean by the vannus of the wing. In flies the definition of compartments is clear and abundantly demonstrated, with gene expression and requirement being limited precisely to sets of cells that display lineage boundaries. It is true that domains of gene expression in flies, for example of the iroquois complex, which includes mirror, can only be related to patterns with difficulty. Some recap of what is known plus the opinion of the authors on how they interpret papers on possible lineage domains in butterflies might also be useful, as the reader is no wiser about what the authors might mean at the end of it!

      The references are sometimes inappropriate. The discovery of the AP compartments should not be referred to Guillen et al 1995, but to Morata and Lawrence 1975. Proofreading is required.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript by Chatterjee et al. examines the role of the mirror locus in patterning butterfly wings. The authors examine the pattern of mirror expression in the common buckeye butterfly, Junonia coenia, and then employ CRISPR mutagenesis to generate mosaic butterflies carrying clones of mirror mutant cells. They find that mirror is expressed in a well-defined posterior sector of final-instar wing discs from both hindwings and forewings and that CRISPR-injected larvae display a loss of adult wing structures presumably derived from the mirror expressing region of hindwing primordium (the case for forewings is a bit less clear since the mirror domain is narrower than in the hindwing, but there also do seem to be some anomalies in posterior regions of forewings in adults derived from CRISPR injected larvae). The authors conclude that the wings of these butterflies have at least three different fundamental wing compartments, the mirror domain, a posterior domain defined by engrailed expression, and an anterior domain expressing neither mirror nor engrailed. They speculate that this most posterior compartment has been reduced to a rudiment in Drosophila and thus has not been adequately recognized as such a primary regional specialization.

      Critique:

      This is a very straightforward study and the experimental results presented support the key claims that mirror is expressed in a restricted posterior section of the wing primordium and that mosaic wings from CRISPR-injected larvae display loss of adult wing structures presumably derived from cells expressing mirror (or at least nearby). The major issue I have with this paper is the strong interpretation of these findings that lead the authors to conclude that mirror is acting as a high-level gene akin to engrailed in defining a separate extreme posterior wing compartment. To place this claim in context, it is important in my view to consider what is known about engrailed, for which there is ample evidence to support the claim that this gene does play a very ancestral and conserved function in defining posterior compartments of all body segments (including the wing) across arthropods.

      (1) Engrailed is expressed in a broad posterior domain with a sharp anterior border in all segments of virtually all arthropods examined (broad use of a very good pan-species anti-En antibody makes this case very strong).

      (2) In Drosophila, marked clones of wing cells (generated during larval stages) strictly obey a straight anterior-posterior border indicating that cells in these two domains do not normally intermix, thus, supporting the claim that a clear A/P lineage compartment exists.

      In my opinion, mirror does not seem to be in the same category of regulator as engrailed for the following reasons:

      (1) There is no evidence that I am aware of, either from the current experiments, or others that the mirror expression domain corresponds to a clonal lineage compartment. It is also unclear from the data shown in this study whether engrailed is co-expressed with mirror in the posterior-most cells of J. coenia wing discs. If so, it does not seem justified to infer that mirror acts as an independent determinant of the region of the wing where it is expressed.

      (2) Mirror is not only expressed in a posterior region of the wing in flies but also in the ventral region of the eye. In Drosophila, mirror mutants not only lack the alula (derived approximately from cells where mirror is expressed), but also lack tissue derived from the ventral region of the eye disc (although this ventral tissue loss phenotype may extend beyond the cells expressing mirror).

      In summary, it seems most reasonable to me to think of mirror as a transcription factor that provides important developmental information for a diverse set of cells in which it can be expressed (posterior wing cells and ventral eye cells) but not that it acts as a high-level regulator like engrailed.

      Recommendation:

      While the data provided in this succinct study are solid and interesting, it is not clear to me that these findings support the major claim that mirror defines an extreme posterior compartment akin to that specified by engrailed. Minimally, the authors should address the points outlined above in their discussion section and greatly tone down their conclusion regarding mirror being a conserved selector-like gene dedicated to establishing posterior-most fates of the wing. They also should cite and discuss the original study in Drosophila describing the mirror expression pattern in the embryo and eye and the corresponding eye phenotype of mirror mutants: McNeill et al., Genes & Dev. 1997. 11: 1073-1082; doi:10.1101/gad.11.8.1073.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach as mentioned by the editor. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Reviewer #1 (Public Review):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain cognition') as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      REVISED VERSION: while the authors have partially addressed my concerns, I do not feel they have addressed them all. I do not feel they have addressed the weight instability and concerns about the stacked regression models satisfactorily.

      Please see our responses to #3 below.

      I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. This suffers from the same problem the authors raise with brain age and would indeed disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain cognition. I have indicated the main considerations about these points in the recommendations section below.

      Thank you so much for raising this point. We now have the following statement in the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And this is the third goal of this present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
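
      The variance partitioning described in the excerpt above can be illustrated with a short sketch. It assumes ordinary least squares with an intercept and uses simulated data; the function names (`r_squared`, `commonality`) and the variable names are hypothetical and are not taken from the authors' code. The unique effect of a regressor is the drop in R2 when it is removed from the full model, and the common effects follow from the R2 values of all subsets of regressors.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit with an intercept (sum-of-squares definition)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def commonality(X, y, names):
    """Partition the full-model R^2 into unique and common effects.

    Returns (effects, total_r2); effects maps "a", "a & b", ... to the
    share of variance attributed to exactly that set of regressors.
    """
    k = X.shape[1]
    everyone = frozenset(range(k))
    # R^2 for every subset of regressors; the empty model explains nothing.
    r2 = {frozenset(): 0.0}
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            r2[frozenset(subset)] = r_squared(X[:, list(subset)], y)

    effects = {}
    for size in range(1, k + 1):
        for S in combinations(range(k), size):
            rest = everyone - frozenset(S)
            coef = 0.0
            # Inclusion-exclusion over subsets T of S.
            for t in range(len(S) + 1):
                for T in combinations(S, t):
                    coef -= (-1) ** t * r2[frozenset(T) | rest]
            effects[" & ".join(names[i] for i in S)] = coef
    return effects, r2[everyone]


# Simulated example (not real data): three correlated regressors.
rng = np.random.default_rng(0)
n = 500
chron_age = rng.normal(size=n)
brain_age = chron_age + 0.5 * rng.normal(size=n)
brain_cognition = -0.6 * chron_age + 0.8 * rng.normal(size=n)
fluid = -0.4 * chron_age + 0.5 * brain_cognition + rng.normal(size=n)

X = np.column_stack([chron_age, brain_age, brain_cognition])
effects, total = commonality(X, fluid, ["age", "brain_age", "brain_cognition"])
for label, value in effects.items():
    print(f"{label:40s} {value: .3f}")
print(f"{'total R2':40s} {total: .3f}")
```

      With three regressors there are seven components (three unique, three pairwise common, one shared by all three), and they sum to the R2 of the full model. Common effects can be negative when suppression is present.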

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address, which mostly relate to clarity and interpretation.

      Reviewer #1 Public Review #1

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain age models more generally.

      Thank you for your comments on this issue.

      We now discuss the broader considerations in detail:

      (1) the consistency between our findings on fluid cognition and other recent works on brain disorders,

      (2) the difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021)

      and

      (3) suggested solutions we and others made to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      From Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study into just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder.

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. I would request that the authors provide more information to enable the reader to better understand the stacked regression models used to ensure that these models are not overfit.

      Thank you for allowing us an opportunity to clarify our stacked model. We made additional clarification to make this clearer (see below). We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models.

      From Methods: “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We, then, re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
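
      The three performance measures named above can be written out in a few lines. This is a minimal sketch in plain Python with made-up numbers; the function names are hypothetical. It also shows why the sum-of-squares R2 is not the same quantity as the squared Pearson's r: predictions that are perfectly correlated with the observations but shifted by a constant keep r = 1 while R2 drops.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    mo = sum(observed) / len(observed)
    mp = sum(predicted) / len(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def r2_sum_of_squares(observed, predicted):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mo = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Toy example: every prediction is exactly one unit too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

print("Pearson r:", pearson_r(observed, predicted))
print("R2 (sum of squares):", r2_sum_of_squares(observed, predicted))
print("MAE:", mean_absolute_error(observed, predicted))
```

      In this toy case the residual sum of squares is 4 and the total sum of squares is 5, so R2 is 1 - 4/5 = 0.2 even though Pearson's r is 1. The sum-of-squares R2 can also be negative when predictions are worse than the mean of the observed values.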

      Note that some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
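
      The three-step nested cross-validation described in the Methods excerpt, with the same outer-fold training set reused for both tuning steps, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `ElasticNet`, `GridSearchCV` and `KFold`, a small hypothetical hyperparameter grid (`GRID`), simulated feature sets, and it builds only one stacked model that combines all feature sets (the analogue of “All”). The function names `tune` and `nested_cv_stack` are invented for this sketch.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

# Hypothetical grid; the source does not list the values that were searched.
GRID = {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]}


def tune(X, y, seed):
    """Grid-search an Elastic Net with 5 inner folds, scored by R2."""
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10_000), GRID, cv=inner, scoring="r2")
    return search.fit(X, y)  # refits the best model on all of X, y


def nested_cv_stack(feature_sets, y, n_outer=5, seed=0):
    """Return out-of-fold predictions of a stacked model over all feature sets."""
    y = np.asarray(y, dtype=float)
    stacked_pred = np.empty_like(y)
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    for train, test in outer.split(feature_sets[0]):
        # Step 1: tune one model per feature set on the outer-fold training set.
        base = [tune(X[train], y[train], seed + 1) for X in feature_sets]
        # Step 2: predict the same training set with each tuned model, then
        # tune the stacked model on those predictions using new inner folds.
        train_preds = np.column_stack(
            [m.predict(X[train]) for m, X in zip(base, feature_sets)]
        )
        stacker = tune(train_preds, y[train], seed + 2)
        # Step 3: apply the tuned models to the untouched outer-fold test set.
        test_preds = np.column_stack(
            [m.predict(X[test]) for m, X in zip(base, feature_sets)]
        )
        stacked_pred[test] = stacker.predict(test_preds)
    return stacked_pred


# Simulated example: three feature sets, each carrying part of the signal.
rng = np.random.default_rng(0)
n = 150
sets = [rng.normal(size=(n, 5)) for _ in range(3)]
target = sum(X[:, 0] for X in sets) + 0.3 * rng.normal(size=n)

pred = nested_cv_stack(sets, target)
r2 = 1.0 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)
print("out-of-sample R2 of the stacked model:", round(float(r2), 3))
```

      The outer-fold test indices never enter `tune`, so both levels of the stack are fitted without seeing the data used to score them.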

      Reviewer #1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      The focus of this article is on the predictions. Still, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 PCAs, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.
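
      The rank-stability check described above amounts to correlating coefficient vectors between every pair of outer folds. A minimal sketch, assuming the coefficients are stored as one row per outer fold and compared as signed values (the source does not say whether absolute values were used); it relies on `scipy.stats.spearmanr`, and the function name `rank_stability` and the simulated coefficients are hypothetical:

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr


def rank_stability(coefs):
    """Spearman's rho between the coefficient vectors of every pair of folds.

    coefs: array of shape (n_folds, n_features), one row per outer fold.
    Returns a list with n_folds * (n_folds - 1) / 2 correlations.
    """
    coefs = np.asarray(coefs, dtype=float)
    rhos = []
    for i, j in combinations(range(coefs.shape[0]), 2):
        rho, _ = spearmanr(coefs[i], coefs[j])
        rhos.append(float(rho))
    return rhos


# Simulated example: 5 outer folds, 19 features, coefficients that share a
# common pattern plus fold-specific noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=19)
coefs = shared + 0.5 * rng.normal(size=(5, 19))

rhos = rank_stability(coefs)
print("number of fold pairs:", len(rhos))
print("range of Spearman's rho:", round(min(rhos), 2), "to", round(max(rhos), 2))
```

      Five outer folds give 10 pairs, which matches the 10 values of Spearman’s ρ reported per prediction model.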

      Reviewer #1 Public Review #4

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods and bias correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.

      Thank you for the opportunity to provide more methodological details.

      First, for the task design, we included the following statements:

      From Methods:

      “HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009).

      First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button to all [Go] but not to two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.”

      Second, for MRI processing procedures, we included the following statements.

      From Methods: “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 19 sets of features.”

      “Sets of Features 1-10: Task fMRI contrast (Task Contrast) Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with a suffix, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linearly detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned from potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high pass cutoff at 200s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Gordon et al., 2016) for cortical surface regions and the Freesurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions, whose number was, in turn, the number of features for each Task Contrast set of features.”

      “Sets of Features 11-13: Task fMRI functional connectivity (Task FC) Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” as the task contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors on FSL, and regressed them in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a highpass at .008 Hz. For parcellation, we used the same atlases as Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations of each pair of 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied r-to-z transformation and principal component analysis (PCA) of 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied its definition to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.

      Set of Features 14: Resting-state functional MRI functional connectivity (Rest FC) Similar to Task FC, Rest FC reflects functional connectivity (FC) among the brain regions, except that Rest FC occurred during the resting (as opposed to task-performing) period. HCP-A collected Rest FC from four 6.42-min (488 frames) runs across two days, leading to 26-min long data (Harms et al., 2018). On each day, the study scanned two runs of Rest FC, starting with anterior-to-posterior (AP) and then with posterior-to-anterior (PA) phase encoding polarity. We used the “rfMRI_REST_Atlas_MSMAll_hp0_clean.dscalar.nii” file that was pre-processed and concatenated across the four runs. We applied the same computations (i.e., highpass filter, parcellation, Pearson’s correlations, r-to-z transformation and PCA) as for the Task FC.

      Sets of Features 15-18: Structural MRI (sMRI)

      sMRI reflects individual differences in brain anatomy. The HCP-A used an established pre-processing pipeline for sMRI (Glasser et al., 2013). We focused on four sets of features: cortical thickness, cortical surface area, subcortical volume and total brain volume. For cortical thickness and cortical surface area, we used Destrieux’s atlas (Destrieux et al., 2010; Fischl, 2012) from FreeSurfer’s “aparc.stats” file, resulting in 148 regions for each set of features. For subcortical volume, we used the aseg atlas (Fischl et al., 2002) from FreeSurfer’s “aseg.stats” file, resulting in 19 regions. For total brain volume, we had five FreeSurfer-based features: “FS_IntraCranial_Vol” or estimated intra-cranial volume, “FS_TotCort_GM_Vol” or total cortical grey matter volume, “FS_Tot_WM_Vol” or total cortical white matter volume, “FS_SubCort_GM_Vol” or total subcortical grey matter volume and “FS_BrainSegVol_eTIV_Ratio” or ratio of brain segmentation volume to estimated total intracranial volume.”
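
      The Task FC and Rest FC feature construction quoted above (pairwise Pearson’s correlations, r-to-z transformation, then PCA fitted on the training set only) can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' pipeline: the helper names `fc_features` and `reduce_fc`, the clipping constant and the synthetic time series are all hypothetical, and the toy example uses 20 regions and 5 components to stay small.

```python
import numpy as np
from sklearn.decomposition import PCA


def fc_features(timeseries):
    """Vectorise the unique region pairs of a Fisher z-transformed correlation matrix.

    timeseries: array of shape (n_timepoints, n_regions).
    """
    r = np.corrcoef(timeseries, rowvar=False)       # region-by-region Pearson's r
    iu = np.triu_indices_from(r, k=1)               # upper triangle, no diagonal
    r_pairs = np.clip(r[iu], -0.999999, 0.999999)   # assumption: keep arctanh finite
    return np.arctanh(r_pairs)                      # Fisher r-to-z


def reduce_fc(train_fc, test_fc, n_components=75):
    """Fit PCA on the training set only, then apply it to the test set."""
    pca = PCA(n_components=n_components)
    train_pcs = pca.fit_transform(train_fc)
    test_pcs = pca.transform(test_fc)               # no refitting: avoids leakage
    return train_pcs, test_pcs, pca


# 379 regions give 379 * 378 / 2 unique pairs, matching the count in the text.
n_regions = 379
print(n_regions * (n_regions - 1) // 2)  # 71631

# Toy example with synthetic data: 40 participants, 60 timepoints, 20 regions.
rng = np.random.default_rng(0)
subjects = [rng.normal(size=(60, 20)) for _ in range(40)]
fc = np.vstack([fc_features(ts) for ts in subjects])          # (40, 190)
train_pcs, test_pcs, pca = reduce_fc(fc[:30], fc[30:], n_components=5)
print(train_pcs.shape, test_pcs.shape)  # (30, 5) (10, 5)
```

      Fitting the PCA inside each training fold, as in `reduce_fc`, is what keeps information from the test participants out of the component definitions.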

      Third, for regression methods and bias correction methods used, we included the following statements:

      From Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regressions (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par with, if not better than, many non-linear and more-complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net simultaneously minimises the weighted sum of the features’ coefficients. The degree of penalty to the sum of the feature’s coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘l1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; l1 ratio=0) or absolute (known as ‘Lasso’; l1 ratio=1) coefficients is penalised (Zou & Hastie, 2005). With n denoting the number of observations, the objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

      $$\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \alpha \cdot \text{l1 ratio} \cdot \lVert \beta \rVert_1 + \frac{1}{2} \, \alpha \, (1 - \text{l1 ratio}) \, \lVert \beta \rVert_2^2$$

      where X is the features, y is the target, and β is the coefficient. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and l1 ratio using 25 numbers in linear space, ranging from 0 to 1.
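
      The hyperparameter grid described above can be written down directly, and a nested cross-validation around it can be sketched as follows. Only the grid itself (70 log-spaced α values from .1 to 100 and 25 linearly spaced l1 ratio values from 0 to 1) comes from the text; the function name `nested_cv_scores`, the number of inner folds, the default scoring, `max_iter` and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Grid as described in the text.
param_grid = {
    "alpha": np.logspace(-1, 2, 70),     # 70 values from .1 to 100, log-spaced
    "l1_ratio": np.linspace(0, 1, 25),   # 25 values from 0 (Ridge) to 1 (Lasso)
}


def nested_cv_scores(X, y, grid, n_outer=5, n_inner=5, seed=0):
    """Outer folds estimate performance; inner folds tune alpha and l1_ratio."""
    inner = KFold(n_splits=n_inner, shuffle=True, random_state=seed)
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10000), grid, cv=inner)
    return cross_val_score(search, X, y, cv=outer)  # R^2 per outer fold


# Synthetic data and a thinned grid keep this example quick to run.
X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)
small_grid = {
    "alpha": param_grid["alpha"][::14],        # 5 of the 70 alpha values
    "l1_ratio": param_grid["l1_ratio"][6::6],  # 0.25, 0.5, 0.75, 1.0
}
scores = nested_cv_scores(X, y, small_grid)
print(scores.shape)  # (5,)
```

      Because tuning happens separately within each outer fold, the selected α and l1 ratio can differ from fold to fold, which is the source of the fold-to-fold variation in coefficients discussed in the response.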

      To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still indicate that a brain feature with a higher magnitude weighs relatively more strongly in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross validation, different outer folds may have different degrees of ‘α’ and ‘l1 ratio’, making the final coefficients differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting them into five folds and visualised the coefficients on brain images using Brainspace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we, first, multiplied the absolute PCA scores (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) with Elastic Net coefficients and, then, summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
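
      The back-projection of Elastic Net coefficients from PCA space to ROI pairs, as described at the end of the quoted passage, can be sketched as follows. This is a minimal sketch under assumptions: the function name `roi_pair_importance`, the hyperparameter values and the synthetic data (60 participants, 190 ROI pairs, 5 components) are hypothetical and chosen only to keep the example small.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet


def roi_pair_importance(pca, coef):
    """Sum |component loading| x Elastic Net coefficient over components.

    pca.components_ has shape (n_components, n_roi_pairs); coef has one
    Elastic Net coefficient per component. Returns one value per ROI pair.
    """
    loadings = np.abs(pca.components_)
    return (loadings * coef[:, None]).sum(axis=0)


# Synthetic stand-in for FC data.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 190))
y = rng.normal(size=60)

pca = PCA(n_components=5).fit(X)
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(pca.transform(X), y)
importance = roi_pair_importance(pca, model.coef_)
print(importance.shape)  # (190,)
```

      With 75 components and 71,631 ROI pairs, the same function would return the 71,631 ROI-pair indices mentioned in the text.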

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15. https://doi.org/10.1016/j.neuroimage.2010.06.010

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2), 774–781. https://doi.org/10.1016/j.neuroimage.2012.01.021

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.-J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x


      The following is the authors’ response to the previous reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Public Reviews:

      Reviewer 1 (Public Review):

      In this paper, the authors evaluate the utility of brain-age-derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain-age-derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ("brain-cognition") as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      (1) I thank the authors for addressing many of my concerns with this revision. However, I do not feel they have addressed them all. In particular I think the authors could do more to address the concern I raised about the instability of the regression coefficients and about providing enough detail to determine that the stacked regression models do not overfit.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 and #2 (see below).

      (2) In considering my responses to the authors' revision, I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular, the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. To be fair, these conceptual problems are more widespread than this paper alone, so I do not believe the authors should be penalised for that. However, I would recommend making these concerns more explicit in the manuscript.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 2 (Public Review):

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration.

      The study employs suitable methods and data to address the research questions, and the methods and results sections are generally clear and easy to follow.

      I appreciate the authors' efforts in significantly improving the paper, including some considerable changes, from the original submission. While not all reviewer points were tackled, the majority of them were adequately addressed. These include additional analyses, more clarity in the methods and a much richer and nuanced discussion. While recognising the merits of the revised paper, I have a few additional comments.

      (1) Perhaps it would help the reader to note that it might be expected for brain-cognition to account for a significantly larger variance (11%) in fluid cognition, in contrast to brain-age. This stems from the fact that the authors specifically trained brain-cognition to predict fluid cognition, the very variable under consideration. In line with this, the authors later recommend that researchers considering the use of brain-age should evaluate its utility using a regression approach. The latter involves including a brain index (e.g. brain-cognition) previously trained to predict the regression's target variable (e.g. fluid cognition) alongside a brain-age index (e.g., corrected brain-age gap). If the target-trained brain index outperforms the brain-age metric, it suggests that relying solely on brain-age might not be the optimal choice. Although not necessarily the case, is it surprising for the target-trained brain index to demonstrate better performance than brain-age? This harks back to the broader point raised in the initial review: while brain-age may prove useful (though sometimes with modest effect sizes) across diverse outcomes as a generally applicable metric, a brain index tailored for predicting a specific outcome, such as brain-cognition in this case, might capture a considerably larger share of variance in that specific context but could lack broader applicability. The latter aspect needs to be empirically assessed.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (please see our responses to Reviewer 1 Recommendations For The Authors #3 below).

      Briefly, as in our 2nd revision, we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.
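
      The idea of a unique effect, i.e., what one predictor explains over and above the others, can be illustrated with a short sketch. This is not the authors' commonality analysis: it only computes the unique component as the drop in R² when one predictor is removed from a linear regression, and the function names `r2` and `unique_effect`, the simulated variables and their effect sizes are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def r2(X, y):
    """In-sample R^2 of an ordinary least squares fit."""
    return LinearRegression().fit(X, y).score(X, y)


def unique_effect(X, y, col):
    """R^2 of the full model minus R^2 of the model without column `col`."""
    reduced = np.delete(X, col, axis=1)
    return r2(X, y) - r2(reduced, y)


# Simulated data: Brain Age tracks age only, whereas Brain Cognition carries
# cognition-related signal that is not shared with age.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
brain_age = age + 0.5 * rng.normal(size=n)
brain_cog = -0.5 * age + rng.normal(size=n)
fluid = -0.6 * age + 0.4 * brain_cog + rng.normal(size=n)

X = np.column_stack([age, brain_age, brain_cog])
for name, col in [("age", 0), ("Brain Age", 1), ("Brain Cognition", 2)]:
    print(name, round(unique_effect(X, fluid, col), 3))
```

      In this simulation the unique effect of Brain Cognition is larger than that of Brain Age by construction, which mirrors the point that a target-trained index sets an upper limit on what Brain Age can capture.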

      (2) Furthermore, the discussion pertaining to training brain-age models on healthy populations for subsequent testing on individuals with neurological or psychological disorders seems somewhat one-sided within the broader debate. This one-sidedness might potentially confuse readers. It is worth noting that the choice to employ healthy participants in the training model is likely deliberate, serving as a norm against which atypical populations are compared. To provide a more comprehensive understanding, referencing Tim Hahn's counterargument to Bashyam's perspective could offer a more complete view (https://academic.oup.com/brain/article/144/3/e31/6214475?login=false).

      Thank you Reviewer 2 for bringing up this issue. We have now revised the paragraph in question and added nuances on the usage of Brain Age for normative vs. case-control studies. We also cited Tim Hahn’s article that explained the conceptual foundation of the use of Brain Age in case-control studies. Please see below. Additionally, we also made a statement about our study not being able to address issues about the case-control studies directly in the newly written conclusion (see Reviewer 3 Recommendations for the Authors #3).

      Discussion:

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      (3) Overall, this paper makes a significant contribution to the field of brain-age and related brain indices and their utility.

      Thank you for the encouragement.

      Reviewer 3 (Public Review):

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" This question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age.

      (1) Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain-age metrics.

      Thank you Reviewer 3 for the comment. We addressed them in our response to Reviewer 3 Recommendations For The Authors #1-3 (see below).

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) I do not feel the authors have fully addressed the concern I raised about the stacked regression models. Despite the new figure, it is still not entirely clear what the authors are using as the training set in the final step. To be clear, the problem occurs because of the parameters, not the hyperparameters (which the authors now state that they are optimising via nested grid search). In other words, given a regression model y = X*beta, if the X are taken to be predictions from a lower level regression model, then they contain information that is derived from both the training set and the test set for the model that this was trained on. If the split is the same (i.e. the predictions are derived on the same test set as is being used at the second level), then this can lead to overfitting. It is not clear to me whether the authors have done this or not. Please provide additional detail to clarify this point.

      Thank you for allowing us an opportunity to clarify our stacked model. We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models. We made additional clarification to make this clearer (see below). Let us explain what we did and provide the rationales below.

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We, then, re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
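      The three performance metrics named above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the toy numbers are invented here. It shows why the sum-of-squares definition of R2 matters: predictions that correlate perfectly with the observed values can still have a strongly negative R2 when they are systematically offset.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def r2_sum_of_squares(observed, predicted):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Toy values invented for illustration: predictions are shifted by +3.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [4.0, 5.0, 6.0, 7.0]

r = pearson_r(observed, predicted)            # perfect correlation
r2 = r2_sum_of_squares(observed, predicted)   # negative because of the offset
mae = mean_absolute_error(observed, predicted)
print(r, r2, mae)
```

      In this toy case the squared Pearson correlation would be 1, whereas the sum-of-squares R2 is far below zero, which is why the two should not be used interchangeably when reporting out-of-sample performance.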

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models.

      Note some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
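      The fold bookkeeping described in the three steps can be sketched as follows. This is only an illustration of which participants feed which step, under assumptions made here: no models are fitted, the helper names (`k_fold_indices`, `nested_cv_plan`) are hypothetical, and the sample size of 500 is a stand-in for "five outer folds of around 100 participants". The point is that the outer-fold test set is never part of the inner folds used in the first or second step.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle the given indices and split them into k roughly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_plan(n_participants, n_outer=5, n_inner=5, seed=0):
    """For each outer fold, list the index sets used in steps 1 to 3."""
    outer_folds = k_fold_indices(range(n_participants), n_outer, seed)
    plan = []
    for i, outer_test in enumerate(outer_folds):
        # Outer-fold training set: every outer fold except the held-out one.
        outer_train = [
            idx
            for j, fold in enumerate(outer_folds)
            if j != i
            for idx in fold
        ]
        plan.append({
            # Step 1: inner folds for tuning one model per set of features.
            "step1_inner": k_fold_indices(outer_train, n_inner, seed + 1 + i),
            # Step 2: the same training set, re-divided into new inner folds
            # for tuning the stacked models on the step-1 predicted values.
            "step2_inner": k_fold_indices(outer_train, n_inner, seed + 1001 + i),
            # Step 3: the tuned models are only applied to this test set.
            "outer_test": outer_test,
            "outer_train": outer_train,
        })
    return plan


plan = nested_cv_plan(n_participants=500)
for fold in plan:
    inner_used = {idx for f in fold["step1_inner"] + fold["step2_inner"] for idx in f}
    # No outer-fold test participant appears in any inner fold.
    assert inner_used.isdisjoint(fold["outer_test"])
print(len(plan), len(plan[0]["outer_test"]), len(plan[0]["outer_train"]))
```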

      (2) I also do not feel the authors have fully addressed the concern I raised about stability of the regression coefficients over splits of the data. I wanted to see the regression coefficients, not the predictions. The predictions can be stable when the coefficients are not.

      The focus of this article is on the predictions. Still, as pointed out by Reviewer 1, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net Coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρs for each prediction model. The numbers to the right of the plots indicate the mean of Spearman’s ρ for each prediction model.
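      The pairwise rank-stability computation can be sketched as below. This is a minimal standard-library illustration, not the authors' implementation: the coefficient values are invented, the function names are hypothetical, and ranking the signed coefficients (rather than, say, their absolute values) is an assumption. With five outer folds there are 10 unordered pairs of folds, hence 10 values of Spearman’s ρ per prediction model.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def pairwise_rank_stability(coefs_by_fold):
    """One rho for every unordered pair of outer folds."""
    return [spearman_rho(a, b) for a, b in combinations(coefs_by_fold, 2)]


# Invented coefficients for 4 features across 5 outer folds.
coefs_by_fold = [
    [0.5, 0.1, -0.2, 0.0],
    [0.4, 0.2, -0.1, 0.05],
    [0.6, 0.0, -0.3, 0.1],
    [0.1, 0.5, -0.2, 0.0],
    [0.5, 0.1, -0.2, 0.0],
]
rhos = pairwise_rank_stability(coefs_by_fold)
print(len(rhos), sum(rhos) / len(rhos))
```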

      (3) I also must say that I agree with Reviewer 3 about the limitations of the brain-age and brain-cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain-age model that is trained to predict age. This suffers from the same problem the authors raise with brain-age and I agree that this would probably disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain-age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain-cognition.

      Thank you so much for raising this point. Reviewer 2 (Public Review #1) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And this is the third goal of this present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”

      Reviewer #3 (Recommendations For The Authors):

      Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to: 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain age metrics.

      (1) I understand your point here. I think the distinction is that it is fine to build predictive models, but then there is no need to go through this intermediate step of "brain-cognition". Just say that brain features can predict cognition XX well, and brain-age (or some related metric) can predict cognition YY well. It creates a confusing framework for the reader that can lead them to believe that "brain-cognition" is not just a predicted value of fluid cognition from a model using brain features to predict cognition. While you clearly state that that is in fact what it is in the text, which is a huge improvement, I do not see what is added by going through brain-cognition instead of simply just obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa, depending on the question. Please do this analysis, and either compare and contrast it with going through "brain-cognition" in your paper, or switch to this analysis, as it more directly addresses the question of the incremental predictive utility of brain-age above and beyond brain features.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 2 (Public Review #1) made a similar observation. We have now made changes to the introduction and discussion to address this concern (see our responses to Reviewer 1 Recommendations For The Authors #3 above).

      Briefly, as in our 2nd revision, we made it explicitly clear that we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. And, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      We have thought about changing the name Brain Cognition into something along the lines of “predicted values of prediction models predicting fluid cognition based on brain MRI.” However, this made the manuscript hard to follow, especially with the commonality analyses. For instance, the sentence, “Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition” would become “Here, we tested predicted values of prediction models predicting fluid cognition based on brain MRI unique effects in multiple regression models with a Brain Age index, chronological age and predicted values of prediction models predicting fluid cognition based on brain MRI as regressors to explain fluid cognition.” We believe, given our additional explanation (see our responses to Reviewer 1 Recommendations For The Authors #3 above), readers should understand what Brain Cognition is, and that we did not intend to compare Brain Age and Brain Cognition directly.

      As for the suggested analysis, “obtaining a change in R2 where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa,” we have already done this in the form of commonality analysis (Nimon et al., 2008) (see Figure 7 below). That is, to obtain unique and common effects of the regressors, we need to look at all of the possible changes in R2 when all possible subsets of regressors were excluded or included, see equations 12 and 13 below.

      From Methods:

      “Similar to the above multiple regression model, we had chronological age, each Brain Age index and Brain Cognition as the regressors for fluid cognition:

      $$
      \text{Fluid Cognition}_i = \beta_0 + \beta_1\,\text{Chronological Age}_i + \beta_2\,\text{Brain Age Index}_{i,j} + \beta_3\,\text{Brain Cognition}_i + \varepsilon_i, \tag{12}
      $$

      Applying the commonality analysis here allowed us, first, to investigate the additive, unique effects of Brain Cognition, over and above chronological age and Brain Age indices. More importantly, the commonality analysis also enabled us to test the common, shared effects that Brain Cognition had with chronological age and Brain Age indices in explaining fluid cognition. We calculated the commonality analysis as follows (Nimon et al., 2017):

      $$
      \begin{aligned}
      \text{Unique Effect}_{\text{chronological age}} &= \Delta R^2_{\text{chronological age}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Age index}} &= \Delta R^2_{\text{Brain Age index}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Cognition}} \\
      \text{Unique Effect}_{\text{Brain Cognition}} &= \Delta R^2_{\text{Brain Cognition}} = R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index}} &= R^2_{\text{chronological age, Brain Cognition}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{chronological age}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\
      \text{Common Effect}_{\text{chronological age, Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age}} + R^2_{\text{Brain Age index}} + R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} - R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} + R^2_{\text{chronological age, Brain Age index, Brain Cognition}}
      \end{aligned}
      \tag{13}
      $$”
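      The three-regressor commonality decomposition can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function name `commonality_three`, the predictor labels and the R2 values are all invented here, and the R2 of each of the seven regressor subsets is taken as given rather than estimated. The sketch uses the standard decomposition, in which the full-model R2 enters the three-way common effect with a plus sign, so that the seven components add up to the full-model R2.

```python
def commonality_three(r2):
    """Unique and common effects for three regressors.

    r2 maps a frozenset of regressor names to the R2 of the model
    that contains exactly those regressors.
    """
    a, b, c = "age", "brain_age", "brain_cog"

    def R(*names):
        return r2[frozenset(names)]

    full = R(a, b, c)
    return {
        "unique_age": full - R(b, c),
        "unique_brain_age": full - R(a, c),
        "unique_brain_cog": full - R(a, b),
        "common_age_brain_age": R(a, c) + R(b, c) - R(c) - full,
        "common_age_brain_cog": R(a, b) + R(b, c) - R(b) - full,
        "common_brain_age_brain_cog": R(a, b) + R(a, c) - R(a) - full,
        "common_all": (R(a) + R(b) + R(c)
                       - R(a, b) - R(a, c) - R(b, c) + full),
    }


# Invented R2 values for illustration only; they are not results of the study.
r2_values = {
    frozenset(["age"]): 0.30,
    frozenset(["brain_age"]): 0.20,
    frozenset(["brain_cog"]): 0.35,
    frozenset(["age", "brain_age"]): 0.31,
    frozenset(["age", "brain_cog"]): 0.42,
    frozenset(["brain_age", "brain_cog"]): 0.38,
    frozenset(["age", "brain_age", "brain_cog"]): 0.43,
}

effects = commonality_three(r2_values)
total = sum(effects.values())  # equals the full-model R2 up to rounding
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

      Each unique effect is exactly the change in R2 obtained when the corresponding regressor is added last, which is the incremental-R2 comparison requested in the review.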

      (2) I agree that the solution is not to exclude age as a covariate, and that there is a big difference between inevitable and obvious. I simply think a further discussion of the inevitability of the results would be clarifying for the readers. There is a big opportunity in the brain-age literature to be as direct as possible about why you are finding what you are finding. People need to know not only what you found, but why you found what you found.

      Thank you. We agree that we need to make this point more explicit and direct. In the revised manuscript, we added statements in both the Introduction and Discussion (see below) about the tight relationship between Brain Age and chronological age by design, making the small unique effects of Brain Age inevitable.

      Introduction:

      “Accordingly, by design, Brain Age is closely tied to chronological age. Because chronological age usually has a strong relationship with fluid cognition, to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      Discussion:

      “First, Brain Age itself did not add much more information to help us capture fluid cognition than what we had already known from a person’s chronological age. This can clearly be seen from the small unique effects of Brain Age indices in the multiple regression models having Brain Age and chronological age as the regressors. While the unique effects of some Brain Age indices from certain age-prediction models were statistically significant, they were all relatively small. Without Brain Age indices, chronological age by itself already explained around 32% of the variation in fluid cognition. Including Brain Age indices only added around 1.6% at best. We believe the small unique effects of Brain Age were inevitable because, by design, Brain Age is closely tied to chronological age. Therefore, chronological age and Brain Age captured mostly a similar variation in fluid cognition.

      Investigating the simple regression models and the commonality analysis between each Brain Age index and chronological age provided additional insights….”

      (3) I believe it is very important to critically examine the use of brain-age and related metrics. As part of this process, I think we should be asking ourselves the following questions (among others): Why go through age prediction? Wouldn't the predictions of cognition (or another variable) using the same set of brain features always be as good or better? You still have not justified the use of brain-age. As I said before, if you are going to continue to recommend the use of brain-age, you need a very strong argument for why you are recommending this. What does it truly add? Otherwise, temper your statements to indicate possible better paths forward.

      Thank you Reviewer 3 for making an argument against the use of Brain Age. We largely agree with you. However, our work only focuses on one phenotype, fluid cognition, and on the normative situation (i.e., not having a case vs control group). As Reviewer 2 pointed out, Brain Age might still have utility in other cases, not studied here. Still, future studies that focus on other phenotypes may consider using our approach as a template to test the utility of Brain Age in other situations. We added the conclusion statement to reflect this.

      From Discussion:

      “Altogether, we examined the utility of Brain Age as a biomarker for fluid cognition. Here are the three conclusions. First, Brain Age failed to add substantially more information over and above chronological age. Second, a higher ability to predict chronological age did not correspond to a higher utility to capture fluid cognition. Third, Brain Age missed up to around one-third of the variation in fluid cognition that could have been explained by brain MRI. Yet, given our focus on fluid cognition, future empirical research is needed to test the utility of Brain Age on other phenotypes, especially when Brain Age is used for anomaly detection in case-control studies (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We hope that future studies may consider applying our approach (i.e., using the commonality analysis that includes predicted values from a model that directly predicts the phenotype of interest) to test the utility of Brain Age as a biomarker for other phenotypes.”

      References

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

    1. eLife assessment

      Using anchored phylogenomic analyses, this valuable study sheds new light on the evolutionary history of the plant diet of Belidae weevil beetles and their geographic distribution. Using convincing methodological approaches, the authors suggest a continuous association of certain belid lineages with Araucaria hosts, since the Mesozoic era. While the biogeographical analysis has weaknesses due to uncertainties in vicariance explanations, the study overall offers contributions to understanding the evolutionary dynamics of Belidae and provides novel insights into ancient community ecology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper provides useful information about how the ionome of Arabidopsis thaliana adapts to very high CO2-levels, backed up by solid evidence and carefully designed studies. However, the broader claims of the paper about climate change and food security - heavily emphasized in the abstract, introduction, and discussion - are inappropriate, as there is no direct link to the presented work.

      We sincerely thank you for the work you have done in reviewing our manuscript. We very much appreciate your overall positive assessment of the experimental work as a whole, its value and robustness.

      In this revised version, we took on board the majority of your suggestions and your comments. In particular, we understood your critical point about overstating our objectives, which might in turn seem uncorrelated with our results. We fully agree with the comments that have been made on this point. Consequently, we have made substantial modifications and corrections in order to clarify our objectives and their implications: exploring in depth the natural variation of the shoot ionome response to elevated CO2, and generating a valuable resource allowing a better understanding of the genetic and molecular mechanisms involved in the regulation of plant mineral nutrition by the elevation of atmospheric CO2.

      We also made modifications in response to the other suggestions, including a clarification of the functional experiments carried out around the function of TIP2;2 in response to elevated CO2. Figure 7 now comprises the comparison between both ambient and elevated CO2 conditions, which is much more informative than what appeared in the previous version.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study's abstract, introduction, and conclusions are not supported by the methods and results conducted. In fact, the results presented suggest that Arabidopsis could easily adapt to an extremely high CO2 environment.

      We understand the reviewer’s comment. Although our work is considered useful, robust and well designed, we agree with the reviewer's point. We have certainly overemphasized the significance of our work to address the issue of food security in response to rising atmospheric CO2, at the expense of the factual description of the results of our fundamental study of the mechanisms at the interface between CO2 and mineral nutrition. We have clarified this focus by modifying the text of the introduction, objectives and discussion. We hope that these modifications will enable readers to better appreciate the core of this work.

      Regarding the last part of the comment, our results do suggest that genetic variation could allow adaptation to rising atmospheric CO2, and our study does indeed aim to identify the extent and basis of this genetic variation.

      This study offers good evidence pointing to a genetic basis for Arabidopsis thaliana's response to elevated CO2 (eCO2) levels and its subsequent impact on the leaf ionome. The natural variation analyses in the study support the hypothesis that genetic factors, rather than local adaptation, guide the influence of eCO2 on the ionome of rosette leaves in Arabidopsis. However, the manuscript's claim regarding its role in "the development of biofortified crops adapted to a high-CO2 world" (line 23) is overstated, especially given the absence of any analysis of the influence of eCO2 on the seed ionome and the fact that Arabidopsis is a poor model of harvest index for any crop. The manuscript, in its current form, necessitates massive revisions, particularly in clarifying its broader implications and in providing more substantial evidence for some of its assertions.

      We thank the reviewer for this comment, and we would like to thank the reviewer for the positive appreciation for the identification of genetic basis for Arabidopsis thaliana's response to elevated CO2 and its subsequent impact on the leaf ionome. Nevertheless, it is true that the study of the leaf ionome is far from being able to lead to the development of biofortified plants. Some papers described that nutrient harvest index in Arabidopsis is a potential indicator of nutrient use efficiency (for instance, Masclaux-Daubresse and Chardon, Journal of Experimental Botany 2011 or Aranjuelo et al., Journal of Experimental Botany 2013). However, as we did not include any seed ionome data in the paper, we added clear mentions that our analyses were made on leaves (lines 56/57/250/319) and a comment in the discussion section to address this limitation (lines 325-328).

      Major Drawbacks and Questions:

      (1) Evidence for the Central Premise:

      The foundational premise of the study is the assertion that rising atmospheric CO2 levels result in a decline in plant mineral content. This phenomenon is primarily observed in C3 plants, with C4 plants seemingly less affected. The evidence provided on this topic is scant and, in some instances, contradicts the authors' own references. The potential reduction of certain minerals, especially in grains, can be debated. For instance, reduced nitrogen (N) and phosphorus (P) content in grains might not necessarily be detrimental for human and animal consumption. In fact, it could potentially mitigate issues like nitrogen emissions and phosphorus leaching. Labeling this as a "major threat to food security" (line 30) is exaggerated. While the case for microelements might be more compelling, the introduction fails to articulate this adequately. Furthermore, the introduction lacks any discussion on how eCO2 might influence nutrient allocation to grains, which would be crucial in substantiating the claim that eCO2 poses a threat to food security. A more comprehensive introduction that clearly delineates the adverse effects of eCO2 and its implications for food security would greatly enhance the manuscript.

      We partially agree with this comment. The decline in mineral status of C3 plants under conditions of elevated atmospheric CO2 has been widely described in the literature, and specifically documented for the cereal grains. While there are variations in this effect (depending on species, ecotype, cultivar), there is no debate about its acceptance. Here are just a few of the many works describing this effect, both on a global scale and at the level of the individual plant (Cotrufo MF (1998) Elevated CO2 reduces the nitrogen concentration of plant tissues. Global Change Biology 4: 43-54; Loladze I (2014) Hidden shift of the ionome of plants exposed to elevated CO(2) depletes minerals at the base of human nutrition. eLife 3: e02245; Myers SS (2014) Increasing CO2 threatens human nutrition. Nature 510: 139-142; Poorter H (1997) The effect of elevated CO2 on the chemical composition and construction costs of leaves of 27 C3 species. Plant, Cell & Environment 20: 472-482; Soares JC (2019) Preserving the nutritional quality of crop plants under a changing climate: importance and strategies. Plant and Soil 443: 1-26; Stitt M (1999) The interaction between elevated carbon dioxide and nitrogen nutrition: the physiological and molecular background. Plant, Cell & Environment 22: 583-621; Uddling J (2018) Crop quality under rising atmospheric CO2. Curr Opin Plant Biol 45: 262-267).

      In addition to this, the threat to food security posed by this alteration in plant mineral status has also been well described in the literature by several modeling approaches (Beach RH (2019) Combining the effects of increased atmospheric carbon dioxide on protein, iron, and zinc availability and projected climate change on global diets: a modelling study. Lancet Planet Health 3: e307-e317; Ebi KL (2019) Elevated atmospheric CO(2) concentrations and climate change will affect our food's quality and quantity. Lancet Planet Health 3: e283-e284; Medek DE (2017) Estimated Effects of Future Atmospheric CO2 Concentrations on Protein Intake and the Risk of Protein Deficiency by Country and Region. Environ Health Perspect 125: 087002; Smith MR (2018) Impact of anthropogenic CO2 emissions on global human nutrition. Nature Climate Change 8: 834-839; Weyant C (2018) Anticipated burden and mitigation of carbon-dioxide-induced nutritional deficiencies and related diseases: A simulation modeling study. PLoS Med 15: e1002586; Zhu C (2018) Carbon dioxide (CO2) levels this century will alter the protein, micronutrients, and vitamin content of rice grains with potential health consequences for the poorest rice-dependent countries. Sci Adv 4: eaaq1012). To reinforce this point, we have added a sentence and references (lines 30-33). Nevertheless, we understand the reviewer's comment on the nuance to be given to the intensity of this potential threat. We have therefore modified the text, replacing "major threat" by "significant threat" (lines 3 and 29).

      We would also like to answer the reviewer’s comment on the potential environmental benefit associated with reduced N and P content in grains (mitigation of N emissions and P leaching). If this reduced N and P content results from a lowered use efficiency of soil nutrients by plants, as suggested by several studies (Bloom 2010, Cassan 2023, Gojon 2023 and references therein), it may on the contrary favor N oxide emissions and P leaching from the soil.

      (2) Exaggerated Concerns:

      The paper begins with the concern that carbon fertilization will lead to carbon dilution in our foods. While we indeed face numerous genuine threats in the coming decades, this particular issue is manageable. The increase in CO2 alone offers many opportunities for boosting yield. However, the heightened heat and increased evapotranspiration will pose massive challenges in many environments.

      While there are indeed multiple threats that we are facing in the coming decades, we don't fully agree with this comment. At present, there's no evidence to say that the negative effect of CO2 on plant mineral content will be manageable. Furthermore, there is compelling evidence that altered mineral nutrition and mineral status of plants will be an important factor limiting the high CO2-induced increase in yield, as will be heat or increased evapotranspiration (see for instance Coskun et al (2016) Nutrient constraints on terrestrial carbon fixation: The role of Nitrogen. J. Plant Physiol. 203: 95-109; Jiang M (2020) Low phosphorus supply constrains plant responses to elevated CO2 : A meta-analysis. Glob Chang Biol 26: 5856-5873 ; Reich PB (2006) Nitrogen limitation constrains sustainability of ecosystem response to CO2. Nature 440: 922-925). Thus, although we do not negate the crucial importance of heat and water stress, we believe it is relevant to study the basic mechanisms responsible for the negative effect of CO2 on plant mineral composition.

      Figure 4 in fact suggests that 43% of the REGMAP panel (cluster 3) is already pre-adapted to very high CO2 levels. This suggests annual species could adapt very rapidly.

      We agree with the reviewer. However, this suggests that genetic variation exists in some ecotypes to support adaptation to elevated CO2. The purpose of this work is indeed to identify this genetic variation, in order to characterize the underlying mechanisms.

      (3) Assumptions on CO2 Levels:

      The assumption of 900ppm seems to be based on a very extreme climate change scenario. Most people believe we will overshoot the 1.5°C scenario, however, it seems plausible that 2.5 to 3°C scenarios are more likely. This would correspond to around 500ppm of CO2. https://www.nature.com/articles/s41597-022-01196-7/tables/4

      We agree with the reviewer that the CO2 concentration we used corresponds to a high value in the IPCC projections. That said, this value is currently considered very plausible: the following figure (from Smith and Myers (2018) Nature Climate Change) shows that current CO2 emissions align with the IPCC's most extreme model (RCP 8.5), which would result in a CO2 concentration of around 900 ppm in 2100. Furthermore, nothing in the 6th IPCC report allows us to exclude the 4°C scenario.

      Author response image 1.

      (4) Focus on Real Challenges:

      We have numerous real challenges, such as extreme heat and inconsistent rainfall, to address in the context of climate change. However, testing under extreme CO2 conditions and then asserting that carbon dilution will negatively impact nutrition is exaggerated.

      While we fully agree that several threats linked to climate change exist, and all deserve to be studied, we find it questionable to consider that the potential effect of high CO2 on the mineral nutrition of plants is not a real challenge. The mineral nutrition of plants is already a current major environmental challenge. This perspective seems to reflect the reviewer's personal opinion rather than an analysis of our work.

      In contrast, the FACE experiments are fundamental and are conducted at more realistic eCO2 levels. Understanding the interaction between a 20% increase in CO2 and new precipitation patterns is key for global carbon flux prediction.

      Again, we do not fully understand this comment, as the aim of our study was not to perform a global carbon flux prediction, but to unravel genes and mechanisms underlying the negative effect of elevated CO2 on the nutrient content of Arabidopsis rosettes. However, we agree with the reviewer that FACE facilities are useful for exploring the CO2 response in more natural environments, and we highlight the fact that the decrease in mineral status of C3 plants has been widely documented in FACE studies. FACE experiments do not, however, make it possible to conduct fully controlled experiments (temperature, rainfall, wind and light intensities are not controllable in FACE) of the kind needed to disentangle the mechanisms by which elevated CO2 regulates the signaling pathways associated with plant mineral composition. In the longer term, studying the mechanisms we have identified in a more global context of climate change could be highly relevant.

      As I look at the literature on commercial greenhouse tomato production, 1000ppm of eCO2 is common, but it also looks like the breeders and growers have already solved for flavor and nutrition under these conditions.

      Indeed, tomato is often cultivated in CO2-enriched greenhouses at 1000 ppm. According to the literature, this results in a 20-25% reduction in vitamin C or lycopene, and requires a significantly higher nitrogen and water intake to reach expected sugar levels (Doddrell H (2023) Horticulture Research). In addition, the negative effect of elevated CO2 on tomato nutrient content seems to have significant repercussions on nutrition-health properties (Boufeldja (2023), Molecules).

      Conclusion:

      While the study provides valuable insights into the genetic underpinnings of Arabidopsis thaliana's response to elevated CO2 levels, it requires an entirely revised writeup, especially in its abstract, broader claims and implications. The manuscript would benefit from a more thorough introduction, a clearer definition of its scope, and a clear focus on the limits of this study.

      We thank the reviewer for the comments made on our manuscript. In addition to the responses that we provide to these comments, we have modified the main text of the introduction, objectives and discussion to take these comments into consideration. We believe that this will significantly improve the manuscript.

      Reviewer #2 (Public Review):

      Strengths:

      The authors have conducted a large, well-designed experiment to test the response to eCO2. Overall, the experimental design is sound and appropriate for the questions about how a change in CO2 affects the ionome of Arabidopsis. Most of the conclusions in this area are well supported by the data that the authors present.

      We thank the reviewer for this positive appreciation.

      Weakness:

      While the authors have done good experiments, it is a big stretch from Arabidopsis grown in an arbitrary concentration of CO2 to relevance to human and animal nutrition in future climates. Arabidopsis is a great model plant, but its leaves are not generally eaten by humans or animals.

      We agree with the reviewer’s comment. We recognize that implying a direct contribution of our work to human nutrition in future climates is overstated, as Reviewer 1 also noted. This was not an intentional overstatement, as we have always been convinced that our work contributed to the understanding of the basic mechanisms involved in the negative regulation of plant mineral nutrition by high CO2. We have significantly modified the text to correct any misunderstanding of our work’s implication.

      The authors don't justify their choice of a CO2 concentration. Given the importance of the parameter for the experiment, the rationale for selecting 900 ppm as elevated CO2 compared to any other concentration should be addressed. And CO2 is just one of the variables that plants will have to contend with in future climates, other variables will also affect elemental concentrations.

      We agree with this comment. We added a justification of the high CO2 concentration used in this work in the Material and Methods section (lines 343-344). The rationale for this choice is also explained in our response to Reviewer 1’s point 3.

      Given these concerns, I think the emphasis on biofortification for future climates is unwarranted for this study.

      Again, we agree with this comment and we have significantly modified the text to correct any misunderstanding of our work’s implication.

      Additionally, I have trouble with these conclusions:

      -Abstract "Finally, we demonstrate that manipulating the function of one of these genes can mitigate the negative effect of elevated CO2 on the plant mineral composition."

      -Discussion "Consistent with these results, we show that manipulating TIP2;2 expressions with a knock-out mutant can modulate the Zn loss observed under high CO2."

      The authors have not included the data to support this conclusion as stated. They have shown that this mutant increases the Zn content of the leaves when compared to WT but have not demonstrated that this response is different than in ambient CO2. This is an important distinction: one way to ameliorate the reduction of nutrients due to eCO2 is to try to identify genes that are involved in the mechanism of eCO2-induced reduction. Another way is to increase the concentration of nutrients so that the eCO2-induced reduction is not as important (i.e. a 10% reduction in Zn due to eCO2 is not as important if you have increased the baseline Zn concentration by 20%). The authors identified tip2 as a target from the GWAS on difference, but their validation experiment only looks at eCO2.

      We thank the reviewer for this comment, and we agree with it. It is much more interesting, especially in the context of this paper, to analyze the function of a candidate gene not only in elevated CO2, but in both ambient and elevated CO2. Therefore, we added in Figure 7 data for the expression of TIP2;2 in contrasted haplotypes under ambient CO2, in comparison to those already presented under elevated CO2 (now Fig. 7C and 7D). This showed that TIP2;2 expression is lower in haplotype 0 also under ambient CO2. We also added in Figure 7 (Fig. 7E) the Zn level in WT and tip2;2-1 mutant under ambient CO2, in comparison to those already presented under elevated CO2. This showed that the tip2;2-1 mutant line did not present any decrease in Zn shoot content in response to elevated CO2, in contrast to what is observed for the WT.

      We have added comments on these new results in the Results and Discussion sections (lines 233-242 in the Results section, and lines 310-314 in the Discussion section).
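
      The reviewer's distinction between raising the baseline Zn concentration and removing the eCO2-induced loss can be made concrete with a little arithmetic. The sketch below is illustrative only: the 10% loss and the 20% baseline increase come from the reviewer's example, while the arbitrary units, the value of 100 and the function names are assumptions, not measurements from the study.

```python
def zn_under_eco2(ambient_zn, eco2_loss):
    """Zn content under elevated CO2, given a fractional eCO2-induced loss."""
    return ambient_zn * (1.0 - eco2_loss)


def relative_change(ambient_zn, eco2_zn):
    """Fractional change in Zn content from ambient to elevated CO2."""
    return (eco2_zn - ambient_zn) / ambient_zn


# Hypothetical lines, in arbitrary units (not data from the manuscript).
wild_type = {"ambient": 100.0, "eco2": zn_under_eco2(100.0, 0.10)}
# Scenario A: baseline raised by 20%, eCO2 still removes 10%.
higher_baseline = {"ambient": 120.0, "eco2": zn_under_eco2(120.0, 0.10)}
# Scenario B: same baseline as wild type, but no eCO2-induced loss.
insensitive = {"ambient": 100.0, "eco2": zn_under_eco2(100.0, 0.0)}

for name, line in [("wild type", wild_type),
                   ("higher baseline", higher_baseline),
                   ("eCO2-insensitive", insensitive)]:
    change = relative_change(line["ambient"], line["eco2"])
    print(f"{name}: ambient={line['ambient']:.0f}, "
          f"eCO2={line['eco2']:.0f}, change={change:+.0%}")
```

      Under elevated CO2 alone, both hypothetical scenarios sit above the wild type, so they cannot be told apart; only the ambient measurement shows whether the response to eCO2 itself has changed.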

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Reviewer Comments on the Article's Approach to Ionome Analysis

      (1) Omission of Phosphorus from the Ionome:

      It's surprising that phosphorus (P) was not measured in the ionome. After nitrogen (N), P is often the most limiting mineral for plant development and yield, making it a significant component of the ionome. Why did the authors omit this crucial element?

      We agree with the reviewer that P is an important mineral for plant growth. The absence of data related to P content is due to feasibility constraints rather than oversight. The MP-AES instrument we used to analyze the ionome (except N and C, that we obtained from an Elementar Analyzer) would have required an extra-step and an extra-analysis to obtain data for macronutrient such as P or K. In the context of this large-scale experiment, we faced the necessity to compromise and proceed without these data.

      (2) Relationship Between Leaf Ionome and Seed:

      The manuscript lacks evidence demonstrating the relationship between the leaf ionome and the seed. This connection is vital to establish the study's aims as outlined in lines 20-24. If the central argument is that eCO2 threatens food security, it's essential for the authors to either:

      • Provide evidence that eCO2 induces changes in the ionome profiles of seeds.

      • Show that changes in the rosette leaf ionome lead to alterations in seed ionome profiles.

      We agree with the reviewer. Although we know that the seed ionome composition of Arabidopsis model accessions such as Columbia is indeed negatively affected by eCO2, we do not provide the data that support some of the terms used in lines 20-24. The correspondence between leaf and seed ionome in natural populations under eCO2 is certainly a next question that we will address. Therefore, to align our stated objectives with our data, we have modified the sentence in lines 20-24. We also added a comment on this point in the discussion section (lines 324-328).

      (3) Analysis of Ionome in Rosette Leaves:

      Why did the authors choose to analyze the ionome specifically in rosette leaves? Is there a known correlation between the ionome profile in rosette leaves and seeds?

      See our answer to the above comment.

      (4) Experimental Design Comments:

      • The layout of the accession growouts, the methods of randomization, blocking, and controls/checks should be detailed.

      • Were BLUEs (Best Linear Unbiased Estimators) or BLUPs (Best Linear Unbiased Predictors) employed to account for experimental design conditions? If not, it's recommended that they be used.

      We thank the reviewer for this comment. A note on replicates has been added in the Method/Plant Material section. Concerning the BLUEs/BLUPs, although I am not familiar with their use, I do not think that these approaches are relevant in our experimental design. Indeed, we pooled 3 to 5 replicates for each accession to measure the ionome (as mentioned in the Method/Ionome analysis section – we realized this was perhaps not clear enough, and thus we reinforced this point in this section). Therefore, we do not have the variance data required to perform BLUEs/BLUPs.

      (5) Carbon Dilution Effect:

      The statement, "The first component of the PCA described a clear antagonistic trend between C content and the change of other mineral elements (Fig. 3B)..." suggests a well-understood carbon dilution effect. These results are anticipated and align with existing knowledge.

      We thank the reviewer for this comment. However, this sentence does not relate to the biomass dilution hypothesis referred to by the reviewer. Indeed, the composition of each mineral (C and others) is expressed as a percentage of biomass, not as an absolute value. Therefore, this reflects more a probable effect of the increase in carbon compounds (notably soluble sugars), which could influence mineral composition.

      (6) Heritability Estimates:

      The authors should report both the broad-sense heritability and an estimate of heritability based on a GRM or Kinship matrix.

      We thank the reviewer for this suggestion. We are skeptical of using a kinship matrix to estimate heritability in our study. Estimating narrow-sense heritability using a kinship matrix is conceptually based on the infinitesimal model of Fisher, thereby meaning that phenotypic variation is driven by hundreds to thousands of QTLs with small effects. If this is the case, GWAS conducted on several hundred (or even thousands) of genotypes will not be powerful enough to detect such QTLs. Accordingly, estimates of broad-sense heritability based on estimates of variance components can drastically differ from estimates of narrow-sense heritability based on the use of a kinship matrix, as illustrated in the study of Bergelson et al. (2019 Scientific Reports).

      (7) Application of the Breeder's Equation:

      It would be beneficial if the authors applied the breeder's equation to estimate the species' potential rate of response. Based on the allele frequency of the adapted cluster 3 (69 ecotypes or 43% frequency of Figure 3B), it seems plausible that the populations could adapt within 23 generations.

      We thank the reviewer for this suggestion. Indeed, it would be really interesting to test whether sub-populations could adapt in comparison with others, and over what period of time. It is nevertheless not possible to do so using the Breeder’s equation in our case, as this requires fitness data under conditions of ambient or elevated CO2 (i.e. production of seeds) to be applied, and we do not have these data at the level of the whole population.

      (8) Overall Quality:

      In general, the authors have executed a high-quality ionome mapping experiment. However, the abstract, introduction, and discussion should be entirely rewritten and reframed.

      We thank the reviewer for the positive evaluation of our experiment. As previously mentioned, we are for the most part in agreement with the comments made about the need to align our stated objectives with our experimental data and conclusions. To do so, we have rewritten part of the abstract, introduction and discussion. The details of these modifications are described in the responses made to each comment.

      Here's a line-by-line list of suggestions on writing:

      Line 30 would read better with a comma after thus (or by replacing thus with therefore and then a comma at the start of the sentence).

      Line 33 nevertheless would read better in between commas.

      Lines 45 - 48 sentence is too long, could probably divide it into two.

      Lines 90 - 94 are hard to interpret, recommend rephrasing for clarity.

      Line 130 - keep verbs in the past tense for consistency (ran instead of run).

      Line 194 - what do the authors mean by crossed? I'm inferring they looked at the intersection of DEGs with the list of genes identified by GWA mapping, probably should use a more concise word.

      There's a concurrent use of the adjective strong (Lines 80, 142, 144, 197, 245). I would advise using a more concise adjective or avoiding its use to let the reader form their own opinion on the data.

      Lines 174-176 the cited reference (No. 15) is incorrect. The study by Katz et al. (2022) does not provide information on the role of ZIF1 in zinc sequestration mechanisms under elevated CO2 conditions.

      We thank the reviewer for these detailed recommendations. We have corrected or rephrased the text according to these suggestions.

      Reviewer #2 (Recommendations For The Authors):

      Technical points:

      900 ppm as elevated CO2: Given the importance of the parameter for the experiment, the rationale for selecting 900 ppm as elevated CO2 compared to any other concentration should be addressed.

      We acknowledge the reviewer's point and have previously addressed related aspects earlier in our response. In line with this, we have included a justification for this particular parameter in the Method section.

      The authors do not mention what genotype was used for their root/shoot RNAseq experiment.

      We thank the reviewer for this comment, and indeed, this information was not mentioned. This is now done, in the Method section.

      Line 125: Spelling error "REGMPA".

      This has been corrected.

      Line 338: Removal of outlier observations - "Prior to GWAS and multivariate analyses such as PCA or clustering, mineral composition measures were pre-processed to remove technical outliers". The authors should mention the exact number of outliers that were removed and what the explicit criteria were for removal.

      The number of outliers removed from each dataset is now indicated in Supplemental Table 7 (this is cited in the Method section). The explicit criterion used for this analysis is mentioned in the corresponding Method section: “the values positioned more than 5 median absolute deviations away from the median were removed from the dataset”.
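
      The criterion quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes an unscaled median absolute deviation (the response does not say whether a consistency factor was applied), and the function name and sample values are hypothetical.

```python
from statistics import median


def remove_mad_outliers(values, n_mads=5.0):
    """Keep values lying within n_mads median absolute deviations of the median."""
    values = list(values)
    center = median(values)
    # Unscaled MAD: median of absolute deviations from the median (assumption).
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No spread around the median, so no value can be flagged.
        return values
    return [v for v in values if abs(v - center) <= n_mads * mad]


# Hypothetical mineral measurements with one technical outlier.
measurements = [52.0, 49.5, 51.2, 50.3, 48.9, 120.0]
filtered = remove_mad_outliers(measurements)
print(filtered)
```

      Because both the center and the spread are medians, a single extreme value barely shifts the threshold, which is the usual reason for preferring this rule to one based on the mean and standard deviation.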

      Line 379: "Lowly expressed genes with an average value across conditions under 25 reads were excluded from the analysis". Providing information about the number of the lowly expressed genes that were removed from the analysis can help with the interpretation of the likelihood of the candidates selected being correct.

      This is a standard procedure in RNAseq analysis. It avoids many false positives in the differential analysis of gene expression based on ratios (where a very small number in the denominator can lead to a very high variation in expression, of no real significance). For information, this step led to the removal of 11607 and 10121 genes for the shoot and root datasets.
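
      The reasoning above can be illustrated with a short sketch. The gene names, read counts, and function name are hypothetical; only the threshold of 25 reads averaged across conditions comes from the text.

```python
def filter_low_counts(counts, min_mean=25.0):
    """Keep genes whose mean read count across conditions is at least min_mean."""
    return {
        gene: reads
        for gene, reads in counts.items()
        if sum(reads) / len(reads) >= min_mean
    }

# Hypothetical read counts per gene: [condition 1, condition 2].
counts = {
    "geneA": [1, 4],      # tiny counts: a 4-fold "change" driven by 3 reads
    "geneB": [200, 260],  # well-covered gene: 1.3-fold change
    "geneC": [25, 25],    # exactly at the threshold, so it is kept
}
kept = filter_low_counts(counts)

# Ratios computed on the unfiltered data show why small denominators mislead.
fold_changes = {gene: reads[1] / reads[0] for gene, reads in counts.items()}
```

      In this toy example the lowly expressed gene shows the largest ratio even though it differs by only three reads, which is the kind of false positive the filter is meant to avoid.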

      Line 384: It's not clear how many biological replicates were used.

      This has been corrected.

      Additional comment: We have also become aware of a confusion concerning one of the candidate genes located close to GWA peaks: line 180 of the first version, we mentioned CAX1 (AT1G16380) for its role on nutrient deficiency response. There are actually two genes annotated as CAX1 in TAIR (both are cation exchangers), but the one involved in nutrient deficiency response is AT2G38170. We therefore removed the sentence mentioning AT1G16380/CAX1 as a potential candidate gene.

    2. eLife assessment

      This paper provides useful information about how the ionome of Arabidopsis thaliana adapts to very high CO2-levels, backed up by solid evidence and carefully designed studies. The work will be of interest to anyone studying natural genetic variation as well as the response of plants to altered CO2 levels in the atmosphere.

    3. Reviewer #1 (Public Review):

      This study offers good evidence pointing to a genetic basis for Arabidopsis thaliana's response to elevated CO2 (eCO2) levels and its subsequent impact on the leaf ionome. The natural variation analyses in the study support the hypothesis that genetic factors, rather than local adaptation, guide the influence of eCO2 on the ionome of rosette leaves in Arabidopsis.

      Comments on current version:

      I appreciate the revisions and the effort the authors have made.

      Most of the abstract now accurately reflects the results and methods. It would be nice to have a few more technical details in the abstract, such as:

      * What was the CO2 level?
      * Which gene was identified?

      I still have a problem with this sentence:

      "The elevation of atmospheric CO2 leads to a decline in plant mineral content, which might pose a significant threat to food security in the coming decades."

      The authors provide a wide range of published studies that support this statement. I fully agree that this is what the literature suggests. However, I think the literature has asked the wrong question.

      In general, these studies addressed the question: Given no time for adaptation, do plants grown under high CO2 have a different mineral composition? The answer is yes.

      But a more important question is: Can plants and food crops adapt in time? I believe the strength of this study is that it tests this, and it suggests that the answer is yes. I also think there are many unpublished results and greenhouse breeding successes that support the contention that most plants can adapt to the CO2.

      "The artificial elevation of atmospheric CO2 leads to a physiological response and decline in plant mineral content, which might pose a significant threat to food security in the coming decades if plants cannot adapt."

      It needs to be made clear throughout the paper when high CO2 levels lead to low mineral composition. These are all artificial manipulations without allowing the plants to adapt to the new environment.

      "The elevation of atmospheric CO2 concentration leads to a decline in the mineral composition of C3 plants (Gojon et al., 2023)." - this is well supported in artificial environments.

      Do wild plants have fewer minerals in their leaves today compared to plants in 1950? This would be great evidence and framing for this experiment.

      Crop plants having lower nitrogen and different mineral compositions over time is substantially a product of breeders initially increasing inputs and then, over the last decade, selecting for higher input efficiency.

      At the end of the introduction or the beginning of the results, please define why the CO2 level was chosen and its context as being at the high end of current predictions.

      "According to the literature, this results in a 20-25% reduction in vitamin C or lycopene and requires a significantly higher nitrogen and water intake to reach expected sugar levels (Doddrell H (2023), Horticulture Research). In addition, the negative effect of elevated CO2 on tomato nutrient content seems to have significant repercussions on nutrition-health properties (Boufeldja (2023), Molecules)."

      Thank you for sharing these reviews. These suggest to me that breeders favored the 80% yield bump over other traits. Either there was no breeding, or the breeding focused on other traits. It is important to mention that breeders should include mineral nutrition in their selection index while they maximize yield. Simpler breeding strategies can sometimes heavily favor one trait over others, but cattle breeders today regularly use selection indices that incorporate weights for two dozen traits.

      This study provides nice evidence that an annual weed species is likely to be able to adapt easily to high eCO2. Whether perennial species will be able to adapt in time is clearly a topic that needs to be investigated.

    4. Reviewer #2 (Public Review):

      The research uses a large collection of Arabidopsis thaliana accessions from various geographic scales to investigate the natural genetic variation underlying the response of ionome (elemental) composition to elevated CO2 (eCO2), a concern for future food security. While most accessions show a decrease in elemental accumulation, the authors demonstrate a wide variety of responses to eCO2 across the diversity of Arabidopsis, including lines that increase elemental content in eCO2. The demonstration of genetic diversity in eCO2 response is a significant contribution to our understanding of this important phenomenon.

      Comments on revised version:

      The authors made significant improvements in the manuscript from the original preprint, and the conclusions are now well supported by the evidence presented.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments and suggestions. We have prepared a revised manuscript with updated quantification of theta cycle skipping, new statistical comparisons of the difference between the two behavioral tasks, and general improvements to the text and figures.

      Reviewer #1 (Public Review):

      Summary

      The authors provide very compelling evidence that the lateral septum (LS) engages in theta cycle skipping.

      Strengths

      The data and analysis are highly compelling regarding the existence of cycle skipping.

      Weaknesses

      The manuscript falls short in describing the behavioral or physiological importance of the witnessed theta cycle skipping, and there is a lack of attention to detail with some of the findings and figures:

      More/any description is needed in the article text to explain the switching task and the behavioral paradigm generally. This should be moved from only being in methods as it is essential for understanding the study.

      Following this suggestion, we have expanded the description of the behavioral tasks in the Results section.

      An explanation is needed as to how a cell can be theta skipping if it is not theta rhythmic.

      A cell that is purely theta skipping (i.e., always fires on alternating theta cycles and never on adjacent theta cycles) will only have enhanced power at half theta frequency and not at theta frequency. Such a cell will therefore not be considered theta rhythmic in our analysis. Note, however, that there is a large overlap between theta rhythmic and theta skipping cell populations in our data (Figure 3 - figure supplement 2), indicating that most cells are not purely theta skipping.
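
      A time-domain illustration of this distinction is sketched below. It is not the authors' analysis: it uses idealized, jitter-free spike times, an assumed 8 Hz theta rhythm, and a hypothetical helper that counts spike pairs at a given lag, as one would read off an autocorrelogram.

```python
def lag_counts(spike_times, lag, tol):
    """Count spike pairs whose time difference lies within tol of lag (seconds)."""
    n = 0
    for i, t0 in enumerate(spike_times):
        for t1 in spike_times[i + 1:]:
            if abs((t1 - t0) - lag) <= tol:
                n += 1
    return n

theta_period = 0.125  # assumed 8 Hz theta, so one cycle lasts 125 ms
n_cycles = 40

# Idealized cells: one spike on every cycle vs. one spike on every other cycle.
rhythmic = [k * theta_period for k in range(n_cycles)]
skipping = [k * theta_period for k in range(0, n_cycles, 2)]

one_cycle_rhythmic = lag_counts(rhythmic, theta_period, 0.02)
two_cycle_rhythmic = lag_counts(rhythmic, 2 * theta_period, 0.02)
one_cycle_skipping = lag_counts(skipping, theta_period, 0.02)
two_cycle_skipping = lag_counts(skipping, 2 * theta_period, 0.02)
```

      The idealized skipping cell has no spike pairs separated by one theta period and many separated by two, whereas the rhythmic cell has pairs at both lags. Real cells fall between these extremes, consistent with the overlap between the two populations noted above.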

      The most interesting result, in my opinion, is the last paragraph of the entire results section, where there is more switching in the alternation task, but the reader is kind of left hanging as to how this relates to other findings. How does this relate to differences in decoding of relative arms (the correct or incorrect arm) during those theta cycles or to the animal's actual choice? Similarly, how does it relate to the animal's actual choice? Is this phenomenon actually behaviorally or physiologically meaningful at all? Does it contribute at all to any sort of planning or decision-making?

      We agree that the difference between the two behavioral tasks is very interesting. It may provide clues about the mechanisms that control the cycle-by-cycle expression of possible future paths and the potential impact of goal-directed planning and (recent) experience. In the revised manuscript, we have expanded the analysis of the differences in theta-cycle dynamics between the two behavioral tasks. First, we confirm the difference through a new quantification and statistical comparison. Second, we performed additional analyses to explore the idea that the alternation of non-local representations reflects the number of relevant paths available to the animal (Figure 11 – figure supplements 2 and 3), but this did not appear to be the case. However, these results provide a starting point for future studies to clarify the task dependence of the theta-cycle dynamics of spatial representations and to address the important question of behavioral/physiological relevance.

      The authors state that there is more cycle skipping in the alternation task than in the switching task, and that this switching occurs in the lead-up to the choice point. Then they say there is a higher peak at ~125 in the alternation task, which is consistent. However, in the final sentence, the authors note that "This result indicates that the representations of the goal arms alternate more strongly ahead of the choice point when animals performed a task in which either goal arm potentially leads to reward." Doesn't either arm potentially lead to a reward (but different amounts) in the switching task, not the alternation task? Yet switching is stronger in the alternation task, which is not consistent with and contradicts this last sentence.

      The reviewer is correct that both choices lead to (different amounts of) reward in the switching task. As written, the sentence that the reviewer refers to is indeed not accurate and we have rephrased it to: “This result indicates that the representations of the goal arms alternate more strongly ahead of the choice point when animals performed a task in which either goal arm potentially leads to a desirable high-value reward.”.

      Additionally, regarding the same sentence - "representations of the goal arms alternate more strongly ahead of the choice point when the animals performed a task in which either goal arm potentially leads to reward." - is this actually what is going on? Is there any reason at all to think this has anything to do with reward versus just a navigational choice?

      We appreciate the reviewer’s feedback and acknowledge that our statement needs clarification. At the choice point in the Y-maze there are two physical future paths available to the animal (disregarding the path that the animal took to reach the choice point) – we assume this is what the reviewer refers to as “a navigational choice”. One hypothesis could be that alternation of goal arm representations is present whenever there are multiple future paths available, irrespective of the animal’s (learned) preference to visit one or the other goal arm. However, the reduced alternation of goal arm representations in the switching task that we report, suggests that the animal’s recent history of goal arm visits and reward expectations likely do influence the theta-cycle representations ahead of the choice point. We have expanded our analysis to test if theta cycle dynamics differ for trials before and after a switch in reward contingency in the switching task, but there was no statistical difference in our data. We have rewritten and expanded this part of the results to make our point more clearly.

      Similarly, the authors mention several times that the LS links the HPC to 'reward' regions in the brain, and it has been found that the LS represents rewarded locations comparatively more than the hippocampus. How does this relate to their finding?

      Indeed, Wirtshafter and Wilson (2020) reported that lateral septum cells are more likely to have a place field close to a reward site than elsewhere in their double-sided T-maze. It is possible that this indicates a shift towards reward or value representations in the lateral septum. In our study we did not look at reward-biased cells and whether they are more or less likely to engage in theta cycle skipping. This could be a topic for future analyses. It should be noted that the study by Wirtshafter and Wilson (2020) reports that a reward bias was predominantly present for place fields in the direction of travel away from the reward site. These reward-proximate LS cells may thus contribute to theta-cycle skipping in the inbound direction, but it is not clear if these cells would be active during theta sweeps when approaching the choice point in the outbound direction.

      Reviewer #2 (Public Review)

      Summary

      Recent evidence indicates that cells of the navigation system representing different directions and whole spatial routes fire in a rhythmic alternation during 5-10 Hz (theta) network oscillation (Brandon et al., 2013, Kay et al., 2020). This phenomenon of theta cycle skipping was also reported in broader circuitry connecting the navigation system with the cognitive control regions (Jankowski et al., 2014, Tang et al., 2021). Yet nothing was known about the translation of these temporally separate representations to midbrain regions involved in reward processing as well as the hypothalamic regions, which integrate metabolic, visceral, and sensory signals with the descending signals from the forebrain to ensure adaptive control of innate behaviors (Carus-Cadavieco et al., 2017). The present work aimed to investigate theta cycle skipping and alternating representations of trajectories in the lateral septum, neurons of which receive inputs from a large number of CA1 and nearly all CA3 pyramidal cells (Risold and Swanson, 1995). While spatial firing has been reported in the lateral septum before (Leutgeb and Mizumori, 2002, Wirtshafter and Wilson, 2019), its dynamic aspects have remained elusive. The present study replicates the previous findings of theta-rhythmic neuronal activity in the lateral septum and reports a temporal alternation of spatial representations in this region, thus filling an important knowledge gap and significantly extending the understanding of the processing of spatial information in the brain. The lateral septum thus propagates the representations of alternative spatial behaviors to its efferent regions. The results can instruct further research of neural mechanisms supporting learning during goal-oriented navigation and decision-making in the behaviourally crucial circuits entailing the lateral septum.

      Strengths

      To this end, cutting-edge approaches for high-density monitoring of neuronal activity in freely behaving rodents and neural decoding were applied. Strengths of this work include comparisons of different anatomically and probably functionally distinct compartments of the lateral septum, innervated by different hippocampal domains and projecting to different parts of the hypothalamus; large neuronal datasets including many sessions with simultaneously recorded neurons; consequently, the rhythmic aspects of the spatial code could be directly revealed from the analysis of multiple spike trains, which were also used for decoding of spatial trajectories; and comparisons of the spatial coding between the two differently reinforced tasks.

      Weaknesses

      Although possible in principle with the present data across sessions, a longitudinal analysis of spatial coding during learning of the task was not performed. Without using perturbation techniques, the present approach could not identify the aspects of the spatial code actually influencing the generation of behaviors by downstream regions.

      Reviewer #3 (Public Review)

      Summary

      Bzymek and Kloosterman carried out a complex experiment to determine the temporal spike dynamics of cells in the dorsal and intermediate lateral septum during the performance of a Y-maze spatial task. In this descriptive study, the authors aim to determine if inputting spatial and temporal dynamics of hippocampal cells carry over to the lateral septum, thereby presenting the possibility that this information could then be conveyed to other interconnected subcortical circuits. The authors are successful in these aims, demonstrating that the phenomenon of theta cycle skipping is present in cells of the lateral septum. This finding is a significant contribution to the field as it indicates the phenomenon is present in neocortex, hippocampus, and the subcortical hub of the lateral septal circuit. In effect, this discovery closes the circuit loop on theta cycle skipping between the interconnected regions of the entorhinal cortex, hippocampus, and lateral septum. Moreover, the authors make 2 additional findings: 1) There are differences in the degree of theta modulation and theta cycle skipping as a function of depth, between the dorsal and intermediate lateral septum; and 2) The significant proportion of lateral septum cells that exhibit theta cycle skipping, predominantly do so during 'non-local' spatial processing.

      Strengths

      The major strength of the study lies in its design, with 2 behavioral tasks within the Y-maze and a battery of established analyses drawn from prior studies that have established spatial and temporal firing patterns of entorhinal and hippocampal cells during these tasks. Primary among these analyses, is the ability to decode the animal's position relative to locations of increased spatial cognitive demand, such as the choice point before the goal arms. The presence of theta cycle skipping cells in the lateral septum is robust and has significant implications for the ability to dissect the generation and transfer of spatial routes to goals within and between the neocortex and subcortical neural circuits.

      Weaknesses

      There are no major discernable weaknesses in the study, yet the scope and mechanism of the theta cycle phenomenon remain to be placed in the context of other phenomena indicative of spatial processing independent of the animal's current position. An example of this would be the ensemble-level 'scan ahead' activity of hippocampal place cells (Gupta et al., 2012; Johnson & Redish, 2007). Given the extensive analytical demands of the study, it is understandable that the authors chose to limit the analyses to the spatial and burst firing dynamics of the septal cells rather than the phasic firing of septal action potentials relative to local theta oscillations or CA1 theta oscillations. Yet, one would ideally be able to link, rather than parse the phenomena of temporal dynamics. For example, Tingley et al recently showed that there was significant phase coding of action potentials in lateral septum cells relative to spatial location (Tingley & Buzsaki, 2018). This begs the question as to whether the non-uniform distribution of septal cell activity within the Y-maze may have a phasic firing component, as well as a theta cycle skipping component. If so, these phenomena could represent another means of information transfer within the spatial circuit during cognitive demands. Alternatively, these phenomena could be part of the same process, ultimately representing the coherent input of information from one region to another. Future experiments will therefore have to sort out whether theta cycle skipping, is a feature of either rate or phase coding, or perhaps both, depending on circuit and cognitive demands.

      The authors have achieved their aims of describing the temporal dynamics of the lateral septum, at both the dorsal extreme and the intermediate region. All conclusions are warranted.

      Reviewer #1 (Recommendations For The Authors)

      The text states: "We found that 39.7% of cells in the LSD and 32.4% of cells in LSI had significantly higher CSI values than expected by chance on at least one of the trajectories." The text in the supplemental figure indicates a p-value of 0.05 was used to determine significance. However, four trajectory categories are being examined so a Bonferroni correction should be used (significance at p<0.0125).

      Indeed, a p-value correction for multiple tests should be performed when determining theta cycle skipping behavior for each of the four trajectories. We thank the reviewer for pointing out this oversight. We have implemented a Holm-Sidak p-value correction for the number of tested trajectories per cell (excluding trajectories with insufficient spikes). As a consequence, the number of cells with significant cycle-skipping activity decreased, but overall the results have not changed.
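
      For readers unfamiliar with the step-down Holm-Sidak procedure, a minimal pure-Python sketch is given below. The p-values are hypothetical and the function is an illustration of the standard procedure, not the authors' implementation.

```python
def holm_sidak(pvals):
    """Return Holm-Sidak adjusted p-values, in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is corrected for m tests, the next for m - 1, etc.
        adj = 1.0 - (1.0 - pvals[idx]) ** (m - rank)
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted

# Hypothetical p-values for one cell tested on four trajectories.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = holm_sidak(raw)
significant = [p < 0.05 for p in adjusted]
```

      In this example three of the four uncorrected p-values fall below 0.05, but only one remains significant after correction. Because the correction depends on the number of tests, a cell with fewer testable trajectories is corrected less severely.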

      Figure 4 is very confusing as raster plots are displayed for multiple animals but it is unclear which animal the LFP refers to? The bottom of the plot is also referenced twice in the figure caption.

      We apologize for the confusion. We have removed this figure in the revised manuscript, as it was not necessary to make the point about the spatial distribution of theta cycle skipping. Instead, we show examples of spatially-resolved cycle skipping in Figure 4 (formerly Figure 5 - supplementary figures 1 and 2) and we have added a plot with the spatially-resolved cycle skipping index for all analyzed cells in Figure 5A.

      Figure 6 has, I think, an incorrect caption or figure. Only A and B are marked in the figure but A-G are mentioned in the caption but do not appear to correspond to anything in the figure.

      Indeed, the caption was outdated. This has now been corrected.

      Figure 8 is also confusing for several reasons: how is the probability scale on the right related to multiple semi-separate (top and middle) figures? In the top and bottom figures, it is not clear what the right and left sides refer to. It is also unclear why a probability of 0.25 is used for position (seems potentially low). The caption also mentions Figure A but there are no lettered "sub" figures in Figure 8.

      The color bar on the right applies to both the top plot (directional decoding) and the middle plot (positional decoding). However, the maximum probability that is represented by black differs between the top and middle plots. We acknowledge that a shared color bar may lead to confusion and we have given each of the plots a separate color bar.

      As for the maximum probability of 0.25 for position: this was a typo in the legend. The correct maximum value is 0.5. In general, the posterior probability will be distributed over multiple (often neighboring) spatial bins, and the distribution of maximum probabilities will depend on the number of spatial bins, the level of spatial smoothing in the decoding algorithm, and the amount of decodable information in the data. It would be more appropriate to consider the integrated probability over a small section of the maze, rather than the peak probability that is assigned to a single 5 cm bin. Also, note that a posterior probability of 0.5 is many times higher than the probability associated with a uniform distribution, which is in our case.

      The left and right sides of the plots represent two different journeys that the animal ran. On the left an outbound journey is shown, and on the right an inbound journey. We have improved the figure and the description in the legend to make this clearer.

      The reviewer is correct that there are no panels in Figure 8 and we have corrected the legend.
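
      The earlier remark about integrating the posterior over a small section of the maze, rather than reading the peak of a single bin, can be sketched as follows. The posterior values, the number of bins, and the window width are hypothetical and chosen only for illustration.

```python
def integrated_peak_probability(posterior, half_width=1):
    """Return the peak bin index and the posterior mass within half_width bins of it."""
    peak = max(range(len(posterior)), key=lambda i: posterior[i])
    lo = max(0, peak - half_width)
    hi = min(len(posterior), peak + half_width + 1)
    return peak, sum(posterior[lo:hi])

# Hypothetical posterior over seven spatial bins (sums to 1).
posterior = [0.02, 0.03, 0.20, 0.45, 0.20, 0.05, 0.05]
peak_bin, local_mass = integrated_peak_probability(posterior)
```

      In this toy case the peak bin holds 0.45 of the probability, but the peak together with its two neighbors holds 0.85, which gives a fairer impression of how concentrated the decoded estimate is.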

      Some minor concerns

      The introduction states that "a few studies have reported place cell-like activity in the lateral septum (Tingley and Buzsaki, 2018; Wirtshafter and Wilson, 2020, 2019)." However, notably and controversially, the Tingley study is one of the few studies to find NO place cell activity in the lateral septum. This is sort of mentioned later but the citation in this location should be removed.

      The reviewer is correct, Tingley and Buzsaki reported a spatial phase code but no spatial rate code. We have removed the citation.

      Stronger position/direction coding in the dLS consistent with prior studies and they should be cited in text (not a novel finding).

      Thank you for pointing out this omission. Indeed, a stronger spatial coding in the dorsal lateral septum has been reported before, for example by Van der Veldt et al. (2021). We now cite this paper when discussing these findings.

      Why is the alternation task administered for 30m but the switching task for 45m?

      The reason is that rats received a larger reward in the switching task (in the high-reward goal arm) and took longer to complete trials on average. To obtain a more-or-less similar number of trials per session in both tasks, we extended the duration of switching task sessions to 45 minutes. We have added this explanation to the text.

      Regarding the percentage of spatially modulated cells in the discussion, it is also worth pointing out that bits/sec information is consistent with previous studies.

      Thank you for the suggestion. We now point out that the spatial information in our data is consistent with previous studies.

      Reviewer #2 (Recommendations For The Authors)

      While the results of the study are robust and timely, further details of behavioural training, additional quantitative comparisons, and improvements in the data presentation would make the study more comprehensible and complete.

      Major comments

      (1) I could not fully comprehend the behavioural protocols. They require a clearer explanation of both the specific rationale of the two tasks as well as a more detailed presentation of the protocols. Specifically:

      (1.1) In the alternation task, were the arms baited in a random succession? How many trials were applied per session? Fig 1D: how could animals reach high choice accuracy if the baiting was random?

      We used a continuous version of the alternation task, in which the animals were rewarded for left→home→right and right→home→left visit sequences. In addition, animals were always rewarded on inbound journeys. There was no random baiting of goal arms. Perhaps the confusion stems from our use of the word “trial” to refer to a completed lap (i.e., a pair of outbound/inbound journeys). On average, animals performed 54 such trials per 30-minute session in the alternation task. We have expanded the description of the behavioral tasks in the Results and further clarified these points in the Methods section.
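
      The reward rule for outbound journeys described above can be written compactly. In this sketch the arm labels and function name are hypothetical, and the first visit of a session is left unscored because the text does not say how it was treated.

```python
def score_alternation(goal_visits):
    """Score each outbound visit after the first: True if the arm differs from the previous one."""
    return [cur != prev for prev, cur in zip(goal_visits, goal_visits[1:])]

# Hypothetical sequence of goal arms visited on consecutive trials.
visits = ["L", "R", "L", "L", "R"]
correct = score_alternation(visits)
accuracy = sum(correct) / len(correct)
```

      Because reward depends only on the previously visited arm, an animal that has learned the rule can reach high choice accuracy without any random baiting.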

      (1.2) Were they rewarded for correct inbound trials? If there was no reward, why were they considered correct?

      Yes, rats received a reward at the home platform for correct inbound trials. We have now explicitly stated this in the text.

      (1.3) In the switch alternation protocol, for how many trials was one arm kept more rewarding than the other, and how many trials followed after the rewarding value switch?

      A switch was triggered when rats (of their own volition) visited the high-reward goal arm eight times in a row. Following a switch, the animals could complete as many trials as necessary until they visited the new high-reward goal arm in eight consecutive trials, which triggered another switch. As can be seen in Figure 1D, at the population level, animals needed ~13 trials to fully commit to the high-reward goal arm following a switch. We have further clarified the switching task protocol in the Results and Methods sections.
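
      The switching rule can be sketched as a small state machine. The arm labels, the starting high-reward arm, and the assumption that the streak counter restarts from zero after each switch are illustrative choices, not details taken from the manuscript.

```python
def run_switching(visits, start_high="L", n_required=8):
    """Track the high-reward arm; switch it after n_required consecutive visits to it.

    Returns the high-reward arm at the end of the session and the (0-based)
    trial indices at which a switch was triggered.
    """
    high = start_high
    streak = 0
    switch_trials = []
    for trial, arm in enumerate(visits):
        streak = streak + 1 if arm == high else 0
        if streak == n_required:
            high = "R" if high == "L" else "L"
            streak = 0  # assumption: counting restarts after a switch
            switch_trials.append(trial)
    return high, switch_trials

# Hypothetical session: eight left visits, some sampling, then eight right visits.
session = ["L"] * 8 + ["L", "R", "L"] + ["R"] * 8
final_high, switches = run_switching(session)
```

      The number of switches in a session is therefore determined by the animal's own behavior rather than by a fixed schedule.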

      (1.4) What does the phrase "the opposite arm (as 8 consecutive visits)" exactly mean? Sounds like 8 consecutive visits signalled that the arm was rewarded (as if it were not predefined in the protocol).

      The task is self-paced and the animals initially visit both goal arms, before developing a bias for the high-reward goal arm. A switch of reward size was triggered as soon as the animal visited the high-reward goal arm for eight consecutive trials. We have rewritten the description of the switching task protocol, including this sentence, which hopefully clarifies the procedure.

      (1.5) P. 15, 1st paragraph, Theta cycle skipping and alternation of spatial representations is more prominent in the alternation task. Why in the switching task, did rats visit the left and right arms approximately equally often if one was more rewarding than the other? How many switches were applied per recording session, and how many trials were there in total?

      Both the left and right goal arms were sampled more or less equally by the animals because both goal arms at various times were associated with a large reward following switches in reward values during sessions. The number of switches per session varied from 1 to 3. Sampling of both goal arms was also evident at the beginning of each session and following each reward value switch, before animals switched their behavior to the (new) highly rewarded goal arm. In Table 1, we have now listed the number of trials and the number of reward-value switches for all sessions.

      (1.6) Is the goal arm in figures the rewarded/highly rewarded arm only or are non-baited arms also considered here?

      Both left and right arms are considered goal arms and were included in the analyses, irrespective of the reward that was received (or not received).

      (2) The spatial navigation-centred behavioural study design and the interpretation of results highlight the importance of the dorsal hippocampal input to the LS. Yet, the recorded LSI cells are innervated by intermediate and ventral aspects of the hippocampus, and LS receives inputs from the amygdala and the prefrontal cortex, which together may bring about reward and reward-prediction-related aspects in the firing of LS cells during spatial navigation; these aspects are crucial for the adaptive behaviours regulated by the LS. Does success or failure to acquire reward in a trial modify spatial coding and cycle skipping of LSD vs. LSI cells in ensuing inbound and outbound trials?

      This is an excellent question and given the length of the current manuscript, we think that exploration of this question is best left for a future extension of our study.

      A related question: in Figure 10, it is interesting that cycle skipping is prominent in the goal arm for outbound switching trials and inbound trials of both tasks. Could it be analytically explained by task contingencies and behaviour (e.g. correct/incorrect trial, learning dynamics, running speed, or acceleration)?

      Our observation of cycle skipping at the single-cell level in the goal arms is somewhat surprising and, we agree with the reviewer, potentially interesting. However, it was not accompanied by alternation of representations at the population level. Given the current focus and length of the manuscript, we think further investigation of cycle skipping in the goal arm is better left for future analyses.

      (3) Regarding possible cellular and circuit mechanisms of cycle skipping and their relation to the alternating representations in the LS. Recent history of spiking influences the discharge probability; e.g. complex spike bursts in the hippocampus are associated with a post-burst delay of spiking. In LS, cycle skipping was characteristic for LS cells with high firing rates and was not uniformly present in all trajectories and arms. The authors propose that cycle skipping can be more pronounced in epochs of reduced firing, yet the opposite seems also possible - this phenomenon can be due to an intermittently increased drive onto some LS cells. Was there a systematic relationship between cycle skipping in a given cell and the concurrent firing rate or a recent discharge with short interspike intervals?

      In our discussion, we tried to explain the presence of theta cycle skipping in the goal arms at the single-cell level without corresponding alternation dynamics at the population level. We mentioned the possibility of a decrease in excitatory drive. As the reviewer suggests, an increase in excitatory drive combined with post-burst suppression or delay of spiking is an alternative explanation. We analyzed the spatial tuning of cells with theta cycle skipping and found that, on average, these cells have a higher firing rate in the goal arm than the stem of the maze in both outbound and inbound run directions (Figure 5 – figure supplement 1). In contrast, cells that do not display theta cycle skipping do not show increased firing in the goal arm. These results are more consistent with the reviewer’s suggested mechanism and we have updated the discussion accordingly.

      (4) Were the differences between the theta modulation (cycle skipping) of local vs. non-local representations (P.14, line 10-12, "In contrast...", Figure 9A) and between alternation vs. switching tasks (Figure 10 C,D) statistically significant?

      We have added quantification and statistical comparisons for the auto- and cross-correlations of the local/non-local representations. The results indeed show significantly stronger theta cycle skipping of the non-local representations as compared to the local representations (Figure 10 - figure supplement 1A), a stronger alternation of non-local representations in the outbound direction (Figure 10 - figure supplement 1B), and significant differences between the two tasks (Figure 11E,F).

      (5) Regarding the possibility of prospective coding in LS, is the accurate coding of run direction not consistent with prospective coding? Can the direction be decoded from the neural activity in the start arm? Are the cycling representations of the upcoming arms near the choice point equally likely or preferential for the then-selected arm?

      The coding of run direction (outbound or inbound) is distinct from the prospective/retrospective coding of the goal arm. As implemented, the directional decoding model does not differentiate between the two goal arms, and accurate decoding of direction with this model cannot inform us whether or not there is prospective (or retrospective) coding. To address the reviewer’s comments, we performed two additional analyses. First, we analyzed the directional (outbound/inbound) decoding performance as a function of location in the maze (Figure 6 - figure supplement 3E). The results show that directional decoding performance is high in both stem and goal arms. Second, we analyzed how well we can predict the trajectory type (i.e., to/from the left or right goal arm) as a function of location in the maze, and separately for outbound and inbound trajectories (Figure 6 - figure supplement 3C,D). The results show that on outbound journeys, decoding the future goal arm is close to chance when the animals are running along the stem. The decoding performance goes up around the choice point and reaches the highest level when animals are in the goal arm.

      (6) Figure 10 seems to show the same or similar data as Figures 5 (A,B) and 9 (C,D).

      Figure 10 (Figure 11 in the revised manuscript) re-analyzes the same data as presented in Figures 5 and 9, but separates the experimental sessions according to the behavioral task. We now explicitly state this.

      Minor comments

      (1) If cycle skipping in the periodicity of non-local representations was more prominent in alternation than in the switching task, one might expect them to be also prominent in early trials of the switching task, when the preference of a more rewarding arm is not yet established. Was this the case?

      The reviewer makes an interesting suggestion. Indeed, if theta cycle skipping and the alternation of non-local representations reflect that there are multiple paths that the animal is considering, one may predict that the theta skipping dynamics are similar between the two tasks in early trials (as the reviewer suggests). Similarly, one may predict that in the switching task, the alternation of non-local representations is weaker immediately before a reward contingency switch (when the animal has developed a bias towards the goal arm with a large reward) as compared to after the switch.

      We have now quantified the theta cycle dynamics of spatial representations in the early trials in each session of both tasks (Figure 11 - figure supplement 2) and in the trials before and after each switch in the switching task (Figure 11 - figure supplement 3).

      The results of the early trial analysis indicate stronger alternation of non-local representations in the alternation task than in the switching task (consistent with the whole session analysis), which is contrary to the prediction.

      The pre-/post-switch analysis did not reveal a significant difference between the trials before and after a reward contingency switch. If anything, there was a trend towards stronger theta cycle skipping/alternation in the trials before a switch, which would be opposite to the prediction.

      These results do not appear to support the idea that the alternation of non-local representations reflects the number of relevant paths available to the animal. We have updated the text to incorporate these new data and discuss the implications.

      (2) Summary: sounds like the encoding of spatial information and its readout in the efferent regions are equally well established.

      Thank you for pointing this out.

      (3) Summary: "motivation and reward processing centers such as the ventral tegmental area." How about also mentioning here the hypothalamus, which is a more prominent output of the lateral septum than the VTA?

      We have now also mentioned the hypothalamus.

      (4) "lateral septum may contribute to the hippocampal theta" - readers not familiar with details of the medial vs. lateral septum research may misinterpret the modest role of LS in theta compared to MS.

      We have added “in addition to the strong theta drive originating from the medial septum” to make clear that the lateral septum has a modest role in hippocampal theta generation.

      (5) "(Tingley and Buzsáki, 2018) found a lack of spatial rate coding in the lateral septum and instead reported a place coding by specific phases of the hippocampal theta rhythm (Rizzi-Wise and Wang, 2021) " needs rephrasing.

      Thank you, we have rephrased the sentence.

      (6) Figure 4 is a bit hard to generalize. The authors may additionally consider a sorted raster presentation of the dataset in this main figure.

      We have removed this figure in the revised manuscript, as it was not necessary to make the point about the location of theta cycle skipping. Instead, we show examples of spatially-resolved cycle skipping in Figure 4 (formerly Figure 5 - supplementary figures 1 and 2), and, following the reviewer’s suggestion, we have added a plot with the spatially-resolved cycle skipping index for all analyzed cells (Figure 5A).

      (7) It would help if legends of Figure 5 (and related supplementary figures) state in which of the two tasks the data was acquired, as it is done for Figure 10.

      Thank you for the suggestion. The legends of Figure 4A,B (formerly Figure 5 – supplemental figures 1 and 2) and Figure 5 now include in which behavioral task the data was acquired.

      (8) Page 10, "Spatial coding...", 1st Citing the initial report by Leutgeb and Mizumori would be appropriate here too.

      The reviewer is correct. We have added the citation.

      (9) The legend in Figure 6 (panels A-G) does not match the figure (only panels A,B). What is shown in Fig. 6B, the legend does not seem to fully match.

      Indeed, the legend was outdated. This has now been corrected.

      (10) 7 suppl., if extended to enable comparisons, could be a main figure. Presently, Figure 7C does not account for the confounding effect of population size and is therefore difficult to interpret without complex comparisons with the Supplementary Figure which is revealing per se.

      We thank the reviewer for their suggestion. We have changed Figure 7 such that it only shows the analysis of decoding performed with all LSD and LSI cells. Figure 7 – supplemental figure 1 has been transformed into main Figure 8, with the addition of a panel to show a statistical comparison between decoding performance in LSD and LSI with a fixed number of cells.
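
      A comparison "with a fixed number of cells" is usually set up by repeatedly drawing equal-sized random subsets from each population before decoding. The sketch below illustrates only that subsampling step. The response does not describe the authors' actual procedure, so the function name, its parameters and the example population sizes are hypothetical.

```python
import numpy as np


def matched_subsamples(n_cells_available, n_cells_fixed, n_repeats, seed=0):
    """Draw repeated random subsets of cell indices, all of the same size.

    Hypothetical helper: each subset could be passed to a decoder so that
    populations of different size are compared at an equal cell count.
    """
    if n_cells_fixed > n_cells_available:
        raise ValueError("cannot draw more cells than were recorded")
    rng = np.random.default_rng(seed)
    # Sampling without replacement: no cell appears twice within a subset.
    return [
        np.sort(rng.choice(n_cells_available, size=n_cells_fixed, replace=False))
        for _ in range(n_repeats)
    ]


# Illustrative population sizes only (not taken from the manuscript).
lsd_subsets = matched_subsamples(n_cells_available=60, n_cells_fixed=20, n_repeats=5)
lsi_subsets = matched_subsamples(n_cells_available=35, n_cells_fixed=20, n_repeats=5, seed=1)
```

      Decoding performance would then be averaged over the repeats for each region, so that any remaining difference is not simply a consequence of unequal population size.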

      (11) 14, line 10 there is no Figure 8A

      This has been corrected.

      (12) 15 paragraph 1, is the model discussed here the one from Kay et al.?

      From Kay et al. (2020) and also Wang et al. (2020). We have added the citations.

      (13) Figure 5 - Figure Supplement 1 presents a nice analysis that, in my view, can merit a main figure. I could not find the description of the colour code in CSI panels, does grey/red refer to non/significant points?

      Indeed, grey/red refers to non-significant points and significant points respectively. We have clarified the color code in the figure legend. Following the reviewer’s suggestion, we have made Figure 5 Supplement 1 and 2 a main figure (Figure 4).

      (14) Figure 5 - Figure Supplement 2. Half of the cells (255 and 549) seem not to be representative of the typically high CSI in the goal arm in left and right inbound trials combined (Figure 5 A). Were the changes in CSI in the right and left inbound trials similar enough to be combined in Fig 5A? Otherwise, considering left and right inbound runs separately and trying to explain where the differences come from would seem to make sense.

      Figure 5 – figure supplement 2 is now part of the new main Figure 4. Originally, the examples were from a single session and the same cells as shown in the old Figure 4. However, since the old Figure 4 has been removed, we have selected examples from different sessions and both left/right trajectories that are more representative of the overall distribution. We have further added a plot with the spatially-resolved cycle skipping for all analyzed cells in Figure 5A.

      (15) In the second paragraph of the Discussion, dorso-ventral topography of hippocampal projections to the LS (Risold and Swanson, Science, 90s) could be more explicitly stated here.

      Thank you for the suggestion. We have now explicitly mentioned the dorsal-ventral topography of hippocampal-lateral septum projections and cite Risold & Swanson (1997).

      (16) Discussion point: why do the differences in spatial information of cells in the ventral/intermediate vs. dorsal hippocampus not translate into similarly prominent differences in LSI vs. LSD?

      In our data, we do observe clear differences in spatial coding between LSD and LSI. Specifically, cell activity in the LSD is more directional, has higher goal arm selectivity, and higher spatial information (we have now added statistical comparisons to Figure 6 – figure supplement 1). As a result, spatial decoding performance is much better for LSD cell populations than LSI cell populations (see updated Figure 8, with statistical comparison of decoding performance). Spatial coding in the LS is not as strong as in the hippocampus, likely because of the convergence of hippocampal inputs, which may give the impression of a less prominent difference between the two subregions.

      (17) Discussion, last paragraph: citation of the few original anatomical and neurophysiological studies would be fitting here, in addition to the recent review article.

      Thank you for the suggestion. We have added selected citations of the original literature.

      (18) Methods, what was the reference electrode?

      We used an external reference electrode that was soldered to a skull screw, which was positioned above the cerebellum. We have added this to the Methods section.

      (19) Methods, Theta cycle skipping: bandwidth = Gaussian kernel parameter?

      The bandwidth is indeed a parameter of the Gaussian smoothing kernel and is equal to the standard deviation.
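
      To make the convention concrete, the sketch below builds a discrete Gaussian kernel whose standard deviation equals the stated bandwidth and applies it to a binned signal such as an autocorrelogram. It is an illustration under assumed values: the function names, the bin size, the 4-sigma truncation and the reflective edge padding are not taken from the manuscript.

```python
import numpy as np


def gaussian_kernel(bandwidth_s, bin_size_s, truncate=4.0):
    """Normalized discrete Gaussian; `bandwidth_s` is its standard deviation."""
    sigma_bins = bandwidth_s / bin_size_s
    half_width = int(np.ceil(truncate * sigma_bins))
    x = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (x / sigma_bins) ** 2)
    return kernel / kernel.sum()


def smooth(values, bandwidth_s, bin_size_s):
    """Smooth a binned signal; reflective padding keeps the output length."""
    kernel = gaussian_kernel(bandwidth_s, bin_size_s)
    padded = np.pad(np.asarray(values, dtype=float), len(kernel) // 2, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")


# Assumed example values: 2 ms bins and a 10 ms bandwidth.
kernel = gaussian_kernel(bandwidth_s=0.010, bin_size_s=0.002)
offsets_s = (np.arange(len(kernel)) - len(kernel) // 2) * 0.002
kernel_sd_s = np.sqrt(np.sum(kernel * offsets_s ** 2))  # close to 0.010 s
```

      Up to truncation and discretization, the standard deviation recovered from the kernel matches the requested bandwidth, which is the sense in which the two terms are interchangeable here.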

      Reviewer #3 (Recommendations For The Authors)

      Below I offer a short list of minor comments and suggestions that may benefit the manuscript.

      (A) I was not able to access the Open Science Framework Repository. Can this be rectified?

      Thank you for checking the OSF repository. The data and analysis code are now publicly available.

      (B) In the discussion the authors should attempt to flesh out whether they can place theta cycle skipping into context with left/right sweeps or scan ahead phenomena, as shown in the Redish lab.

      Thank you for the excellent suggestion. We have now added a discussion of the possible link between theta cycle skipping and the previously reported scan-ahead theta sweeps.

      (C) What is the mechanism of cycle skipping? This could be relevant to intrinsic vs network oscillator models. Reference should also be made to the Deshmukh model of interference between theta and delta (Deshmukh, Yoganarasimha, Voicu, & Knierim, 2010).

      We had discussed a potential mechanism in the discussion (2nd to last paragraph in the revised manuscript), which now includes a citation of a recent computational study (Chu et al., 2023). We have now also added a reference to the interference model in Deshmukh et al, 2010.

      (D) Little background was given for the motivation and expectation for potential differences between the comparison of the dorsal and intermediate lateral septum. I don't believe that this is the same as the dorsal/ventral axis of the hippocampus, but if there's a physiological justification, the authors need to make it.

      We have added a paragraph to the introduction to explain the anatomical and physiological differences across the lateral septum subregions that provide our rationale for comparing dorsal and intermediate lateral septum (we excluded the ventral lateral septum because the number of cells recorded in this region was too low).

      (E) It would help to label "outbound" and "inbound" on several of the figures. All axes need to be labeled, with appropriate units indicated.

      We have carefully checked the figures and added inbound/outbound labels and axes labels where appropriate.

      (F) In Figure 6, the legend doesn't match the figure.

      Indeed, the legend was outdated. This has now been corrected.

      (G) The firing rate was non-uniform across the Y-maze. Does this mean that the cells tended to fire more in specific positions of the maze? If so, how would this affect the result? Would increased theta cycle skipping at the choice point translate to a lower firing rate at the choice point? Perhaps less overdispersion of the firing rate (Fenton et al., 2010)?

      Individual cells indeed show a non-uniform firing rate across the maze. To address the reviewer’s comment and test if theta cycle skipping cells were active preferentially near the choice point or other locations, we computed the mean-corrected spatial tuning curves for cell-trajectory pairs with and without significant theta cycle skipping. This additional analysis indicates that, on average, the population of theta cycle skipping cells showed a higher firing rate in the goal arms than in the stem of the maze as compared to non-skipping cells for outbound and inbound directions (shown in Figure 5 - figure supplement 1).

      (H) As mentioned above, it could be helpful to look at phase preference. Was there an increased phase preference at the choice point? Would half-cycle firing correlate with an increased or decreased phase preference? Based on prior work, one would expect increased phase preference, at least in CA1, at the choice point (Schomburg et al., 2014). In contrast, other work might predict phasic preference according to spatial location (Tingley & Buzsaki, 2018). Including phase analyses is a suggestion, of course. The manuscript is already sufficiently novel and informative. Yet, the authors should state why phase was not analyzed and that these questions remain for follow-up analyses. If the authors did analyze this and found negative results, it should be included in this manuscript.

      We thank the reviewer for their suggestion. We have not yet analyzed the theta phase preference of lateral septum cells or other relations to the theta phase. We agree that this would be a valuable extension of our work, but prefer to leave it for future analyses.

      (I) One of the most important aspects of the manuscript is that there is now evidence of theta cycle skipping in the circuit loop between the EC, CA1, and LS. This now creates a foundation for circuit-based studies that could dissect the origin of route planning. Perhaps the authors should state this? In the same line of thinking, how would one determine whether theta cycle skipping is necessary for route planning as opposed to a byproduct of route planning? While this question is extremely complex, other studies have shown that spatial navigation and memory are still possible during the optogenetic manipulation of septal oscillations (Mouchati, Kloc, Holmes, White, & Barry, 2020; Quirk et al., 2021). However, pharmacological perturbation or lesioning of septal activity can have a more profound effect on spatial navigation (Bolding, Ferbinteanu, Fox, & Muller, 2019; Winson, 1978). As a descriptive study, I think it would be helpful to remind the readers of these basic concepts.

      We thank the reviewer for their comment and for pointing out possible future directions for linking theta cycle skipping to route planning. Experimental manipulations to directly test this link would be very challenging, but worthwhile to pursue. We now mention how circuit-based studies may help to test if theta cycle skipping in the broader subcortical-cortical network is necessary for route planning. Given that the discussion is already quite long, we decided to omit a more detailed discussion of the possible role of the medial septum (which is the focus of the papers cited by the reviewer).

      Very minor points

      (A) In the introduction, "one study" begins the sentence but there is a second reference.

      Thank you, we have rephrased the sentence.

      (B) Also in the introduction, it could be helpful to have an operational definition of theta cycle skipping (i.e., 'enhanced rhythmicity at half theta frequency').

      We followed the reviewer’s suggestion.
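
      As an illustration of "enhanced rhythmicity at half theta frequency", a cycle skipping index can be computed by comparing the autocorrelogram peak near one theta period with the peak near two theta periods. The sketch below uses one commonly used form of such an index. It is not necessarily the authors' exact definition, and the function name and lag windows are assumptions.

```python
import numpy as np


def cycle_skipping_index(lags_s, acg, first_window=(0.09, 0.2), second_window=(0.2, 0.4)):
    """Return (p2 - p1) / max(p1, p2), a value between -1 and 1.

    p1 is the autocorrelogram peak in the window around one theta period and
    p2 the peak in the window around two theta periods (window limits are
    assumed, not taken from the manuscript). Positive values indicate that
    firing recurs more strongly every second theta cycle.
    """
    lags_s = np.asarray(lags_s, dtype=float)
    acg = np.asarray(acg, dtype=float)
    in_first = (lags_s >= first_window[0]) & (lags_s < first_window[1])
    in_second = (lags_s >= second_window[0]) & (lags_s < second_window[1])
    p1 = acg[in_first].max()
    p2 = acg[in_second].max()
    denom = max(p1, p2)
    return 0.0 if denom == 0 else float((p2 - p1) / denom)


# Synthetic autocorrelograms (illustrative only).
lags = np.arange(0, 0.5, 0.005)
regular_theta = 1 + np.cos(2 * np.pi * 8 * lags) * np.exp(-lags / 0.3)  # 8 Hz rhythm
skipping = 1 + np.cos(2 * np.pi * 4 * lags)                              # 4 Hz rhythm

csi_regular = cycle_skipping_index(lags, regular_theta)   # negative
csi_skipping = cycle_skipping_index(lags, skipping)       # positive
```

      With these synthetic inputs, the damped 8 Hz autocorrelogram yields a negative index and the 4 Hz one a positive index, matching the intended reading of the measure.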

      (C) The authors should be more explicit in the introduction about their main question. They could state that theta cycle skipping exists in CA1, and then import some of the explanations mentioned in the discussion to the introduction (i.e., attractor states of multiple routes). The main question is then whether this phenomenon, and others from CA1, translate to the output in LS.

      We have edited the introduction to more clearly state the main question of our study, following the suggestion from the reviewer.

      (D) There are a few instances of extra closing parentheses.

      We checked the text but did not find instances of erroneous extra closing parentheses. There are instances of nested parentheses, which may have given the impression that closing parentheses were duplicated.

      (E) The first paragraph of the Discussion lacks sufficient references.

      We have now added references to the first paragraph of the discussion.

      (F) At the end of the 2nd paragraph in the Discussion, the comparison is missing. More than what? It's not until the next reference that one can assume that the authors are referring to a dorsal/ventral axis. However, the physiological motivation for this comparison is lacking. Why would one expect a dorsal/intermediate continuum for theta modulation as there is along the dorsal/ventral axis of the hippocampus?

      Thank you for spotting this omission. We have rewritten the paragraph to more clearly make the parallel between dorsal-ventral gradients in the lateral septum and hippocampus and how this relates to the topographical connections between the two structures.

    2. eLife assessment

      In this study, the authors present convincing evidence to demonstrate theta cycle skipping by individual neurons of the lateral septum, which they then relate to population coding of future trajectories encapsulated by theta cycles. This valuable finding furthers our understanding of how the septum conveys navigational information downstream.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors provide very compelling evidence that the lateral septum (LS) engages in theta cycle skipping.

      Strengths:

      The data and analysis are highly compelling regarding the existence of cycle skipping.

      Comments on the revised version:

      All previous recommendations were addressed in this revision.

    4. Reviewer #2 (Public Review):

      Summary

      Recent evidence indicates that cells of the navigation system representing different directions and whole spatial routes fire in a rhythmic alternation during 5-10 Hz (theta) network oscillation (Brandon et al., 2013, Kay et al., 2020). This phenomenon of theta cycle skipping was also reported in broader circuitry connecting the navigation system with the cognitive control regions (Jankowski et al., 2014, Tang et al., 2021). Yet nothing was known about the translation of these temporally separate representations to midbrain regions involved in reward processing as well as the hypothalamic regions, which integrate metabolic, visceral, and sensory signals with the descending signals from the forebrain to ensure adaptive control of innate behaviors (Carus-Cadavieco et al., 2017). The present work aimed to investigate theta cycle skipping and alternating representations of trajectories in the lateral septum, neurons of which receive inputs from a large number of CA1 and nearly all CA3 pyramidal cells (Risold and Swanson, 1995). While spatial firing has been reported in the lateral septum before (Leutgeb and Mizumori, 2002, Wirtshafter and Wilson, 2019), its dynamic aspects have remained elusive. The present study replicates the previous findings of theta-rhythmic neuronal activity in the lateral septum and reports a temporal alternation of spatial representations in this region, thus filling an important knowledge gap and significantly extending the understanding of the processing of spatial information in the brain. The lateral septum thus propagates the representations of alternative spatial behaviors to its efferent regions. The results can instruct further research of neural mechanisms supporting learning during goal-oriented navigation and decision-making in the behaviourally crucial circuits entailing the lateral septum.

      Strengths

      To this end, cutting-edge approaches for high-density monitoring of neuronal activity in freely behaving rodents and neural decoding were applied. Strengths of this work include comparisons of different anatomically and probably functionally distinct compartments of the lateral septum, innervated by different hippocampal domains and projecting to different parts of the hypothalamus; large neuronal datasets including many sessions with simultaneously recorded neurons; consequently, the rhythmic aspects of the spatial code could be directly revealed from the analysis of multiple spike trains, which were also used for decoding of spatial trajectories; and comparisons of the spatial coding between the two differently reinforced tasks.

      Weaknesses

      Without using perturbation techniques, the present approach could not identify the aspects of the spatial code actually influencing the generation of behaviors by downstream regions.

    1. eLife assessment

      This important work identifies a previously uncharacterized capacity for songbirds to recover vocal targets even without sensory experience. The evidence supporting this claim is convincing, with technically difficult and innovative experiments exploring goal-directed vocal plasticity in deafened birds. This work has broad relevance to the fields of vocal and motor learning.

    2. Reviewer #1 (Public Review):

      Summary:

      Zai et al test if songbirds can recover the capacity to sing auditory targets without singing experience or sensory feedback. Past work showed that after the pitch of targeted song syllables is driven outside of birds' preferred target range with external reinforcement, birds revert to baseline (i.e. restore their song to their target). Here the authors tested the extent to which this restoration occurs in muted or deafened birds. If these birds can restore, this would suggest an internal model that allows for sensory-to-motor mapping. If they cannot, this would suggest that learning relies entirely on feedback-dependent mechanisms, e.g. reinforcement learning (RL). The authors find that deafened birds exhibit moderate but significant restoration, consistent with the existence of a previously under-appreciated internal model in songbirds.

      Strengths:

      The experimental approach of studying vocal plasticity in deafened or muted birds is innovative, technically difficult and perfectly suited for the question of feedback-independent learning. The finding in Figure 4 that deafened birds exhibit subtle but significant plasticity toward restoration of their pre-deafening target is surprising and important for the songbird and vocal learning fields, in general.

      In this revision, the authors suitably addressed confusion about some statistical methods related to Fig. 4, where the main finding of vocal plasticity in deafened birds was presented.

      There remain minor issues in the presentation early in the results section and in Fig. 4 that should be straightforward to clarify in the revision.

    3. Reviewer #3 (Public Review):

      Summary:

      Zai et al. test whether birds can modify their vocal behavior in a manner consistent with planning. They point out that while some animals are known to be capable of volitional control of vocalizations, it has been unclear if animals are capable of planning vocalizations, that is, modifying vocalizations towards a desired target without the need to learn this modification by practising and comparing sensory feedback of practised behavior to the behavioral target. They study zebra finches that have been trained to shift the pitch of song syllables away from their baseline values. It is known that once this training ends, zebra finches have a drive to modify pitch so that it is restored back to its baseline value. They take advantage of this drive to ask whether birds can implement this targeted pitch modification in a manner that looks like planning, by comparing the time course and magnitude of pitch modification in separate groups of birds who have undergone different manipulations of sensory and motor capabilities. A key finding is that birds who are deafened immediately before the onset of this pitch restoration paradigm, but after they have been shifted away from baseline, are able to shift pitch partially back towards their baseline target. In other words, this targeted pitch shift occurs even when birds don't have access to auditory feedback, which argues that this shift is not due to reinforcement-learning-guided practice, but is instead planned based on the difference between an internal representation of the target (baseline pitch) and current behavior (pitch the bird was singing immediately before deafening).

      The authors present additional behavioral studies arguing that this pitch shift requires auditory experience of song in its state after it has been shifted away from baseline (birds deafened early on, before the initial pitch shift away from baseline, do not exhibit any shift back towards baseline), and that a full shift back to baseline requires auditory feedback. The authors synthesize these results to argue that different mechanisms operate for small shifts (planning, which does not need auditory feedback) and large shifts (through a mechanism that requires auditory feedback).

      The authors also make a distinction between two kinds of planning: covert (not requiring any motor practice) and overt (requiring motor practice but without access to auditory experience from which target mismatch could be computed). They argue that birds plan overtly, based on these deafening experiments as well as an analogous experiment involving temporary muting, which suggests that indeed motor practice is required for pitch shifts.

      Strengths:

      The primary finding (that partially restorative pitch shift occurs even after deafening) rests on strong behavioral evidence. It is less clear to what extent this shift requires practice, since their analysis of pitch after deafening takes the average over the first two hours of singing. If this shift is already evident in the first few renditions then this would be evidence for covert planning. Technical hurdles, such as limited sample sizes and unstable song after surgical deafening, make this difficult to test. (Similarly, the authors could test whether the first few renditions after recovery from muting already exhibit a shift back towards baseline.)

      This work will be a valuable addition to others studying birdsong learning and its neural mechanisms. It documents features of birdsong plasticity that are unexpected in standard models of birdsong learning based on reinforcement and are consistent with an additional, perhaps more cognitive, mechanism involving planning. As the authors point out, perhaps this framework offers a reinterpretation of the neural mechanisms underlying a prior finding of covert pitch learning in songbirds (Charlesworth et al., 2012).

      A strength of this work is the variety and detail in its behavioral studies, combined with sensory and motor manipulations, which on their own form a rich set of observations that are useful behavioral constraints on future studies.

      Weaknesses:

      The argument that pitch modification in deafened birds requires some experience hearing their song in its shifted state prior to deafening (Fig. 4) is solid but has an important caveat. Their argument rests on comparing two experimental conditions: one with and one without auditory experience of shifted pitch. However, these conditions also differ in the pitch training paradigm: the "with experience" condition was performed using white noise training, while the "without experience" condition used "lights off" training (Fig. 4A). It is possible that the differences in the ability of these two groups to restore pitch to baseline reflect the training paradigm, not whether subjects had auditory experience of the pitch shift. Ideally, a control study would use one of the training paradigms for both conditions, which would be "lights off" or electrical stimulation (McGregor et al. 2022), since WN training cannot be performed in deafened birds. In the Discussion, in response to this point, the authors point out that birds are known to recover their pitch shift if those shifts are driven using electrical stimulation as reinforcement (McGregor et al. 2022); however, it is arguably still relevant to know whether a similar recovery occurs for the "lights off" paradigm used here.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations For The Authors):

      In this revision the authors address some of the key concerns, including clarification of the balanced nature of the RL driven pitch changes and conducting analyses to control for the possible effects of singing quantity on their results. The paper is much improved but still has some sources of confusion, especially around Fig. 4, that should be fixed. The authors also start the paper with a statistically underpowered minor claim that seems unnecessary in the context of the major finding. I recommend that the authors restructure their results section to focus on the major points backed by sufficient n and stats.

      Major issues.

      (1) The results section begins very weak - a negative result based on n=2 birds and then a technical mistake of tube clogging re-spun as an opportunity to peek at intermittent song in the otherwise muted birds. The logic may be sound but these issues detract from the main experiment, result, analysis, and interpretation. I recommend re-writing this section to home in on, from the outset, the well-powered results. How much is really gained from the n=2 birds that were muted before ANY experience? These negative results may not provide enough data to make a claim. Nor is this claim necessary to motivate what was done in the next 6 birds. I recommend dropping the claim?

      We thank the reviewer for the recommendation. We moved the information to the Methods.

      (2) Fig. 4 is very important yet remains very confusing, as detailed below.

      Fig. 4a. Can the authors clarify if the cohort of WNd birds that give rise to the positive result in Fig 4 ever experienced the mismatch in the absence of ongoing DAF reinforcement pre-deafening? Neither Fig. 4a nor the text clearly specifies this. This is important because we know that there are day timescale delays in LMAN-dependent bias away from DAF and consolidation into the HVC-RA pathway (Andalman and Fee, 2009). Thus, if birds experienced mismatch pre-deafening in the absence of DAF, then an early learning phase in Area X could be set in place. Then deafening occurs, but these weight changes in X could result in LMAN bias that expresses only days later, independent of auditory feedback. Such a process would not require an internal model as the authors are arguing for here. It would simply arise from delays in implementing reinforcement-driven feedback. If the birds in Fig 4 always had DAF on before deafening, then this is not an issue. But if the birds had hours of singing with DAF off before deafening, and therefore had the opportunity to associate DA error signals with the targeted time in the song (e.g. pauses on the far-from-target renditions (Duffy et al, 2022)), then the return-to-baseline would be expected to be set in place independent of auditory feedback. Please clarify exactly if the pitch-contingent DAF was on or off in the WNd cohort in the hours before deafening. In Fig. 3b it looks like the answer is yes but I cannot find this clearly stated in the text.

      We did not provide DAF-free singing experience to the birds in Fig. 4 before deafening. Thus, according to the reviewer, the concern does not apply.

      Note that we disagree with the reviewer’s premise that there is ‘day timescale delay in LMAN-dependent bias away from DAF and consolidation into the HVC-RA pathway’. More recent data reveals immediate consolidation of the anterior forebrain bias without a night-time effect (Kollmorgen, Hahnloser, Mante 2020; Tachibana, Lee, Kai, Kojima 2022). Thus, the single bird in (Andalman and Fee 2009) seems to be somewhat of an outlier.

      Hearing birds can experience the mismatch regardless of whether they experience DAF-free singing (provided their song was sufficiently shifted): even the renditions followed by white noise can be assessed with regards to their pitch mismatch, so that DAF imposes no limitation on mismatch assessment.

      We disagree with their claim that no internal model would be needed in case consolidation was delayed in Area X. If indeed, Area X stores the needed change and it takes time to implement this change in LMAN, then we would interpret the change in Area X as the plan that birds would be able to implement without auditory feedback. Because pitch can either revert (after DAF stops) or shift further away (when DAF is still present), there is no rigid delay that is involved in recovering the target, but a flexible decision making of implementing the plan, which in our view amounts to using a model.

      Fig 4b. Early and Late colored dots in legend are both red; late should be yellow? Perhaps use colors that are more distinct - this may be an issue of my screen but the two colors are difficult to discern.

      We used colors yellow to red to distinguish different birds and not early and late. We modified the markers to improve visual clarity: Early is indicated with round markers and late with crosses.

      Fig 4b. R, E, and L phases are only plotted for 4c; not in 4b. But the figure legend says that R, E and L are on both panels.

      In Fig. 4b E and L are marked with markers because they are different for different birds. In Fig. 4c the phases are the same for all birds and thus we labeled them on top. We additionally marked R in Fig. 4b as in Fig. 4c.

      Fig 4e. Did the color code switch? In the rest of Fig 4, DLO is red and WND is blue. Then in 4e it swaps. Is this a typo in the caption? Or are the colors switched? Please fix this; it's very confusing.

      Thank you for pointing out the typo in the caption. We corrected it.

      The y axes in Fig 4d-e are both in std of pitch change - yet they have different ylim, which makes them difficult to compare by eye. Is there a reason for this? Can the authors make the ylim the same for Fig 4d-e?

      We added dashed lines to clarify the difference in ylim.

      Fig 4d-e is really the main positive finding of the paper. Can the authors show an example bird that showcases this positive result, plotted as in Fig 3b? This will help the audience clearly visualize the raw data that go into the d' analyses and get a more intuitive sense of the magnitude of the positive result.

      We added example birds to figure 4, one for WNd and one for dLO.

      Please define 'late' in Fig.4 legend.

      Done

      Minor

      Define NRP in the text with an example. Is an NRP of 100 where the bird was before the withdrawal of reinforcement?

      We added the sentence to the results:

      "We quantified recovery in terms of NRP to discount for differences in the amount of initial pitch shift, where NRP = 0% corresponds to complete recovery and NRP = 100% corresponds to pitch values before withdrawal of reinforcement (R) and thus no recovery."
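
      The two anchor points in the quoted sentence can be illustrated with a minimal sketch. The linear formula below is an assumption chosen only because it satisfies both anchors (0% at baseline, 100% at the pitch just before reinforcement was withdrawn); the response does not state the formula the authors actually used, and the function and parameter names are hypothetical.

```python
def nrp(pitch, baseline, pitch_at_r):
    """Percentage of the initial pitch shift that still remains.

    Hypothetical linear normalization, not taken from the manuscript:
      baseline   -- pitch before any reinforcement (the recovery target)
      pitch_at_r -- pitch just before reinforcement was withdrawn (R)
      pitch      -- pitch at the time point being evaluated
    """
    initial_shift = pitch_at_r - baseline
    if initial_shift == 0:
        raise ValueError("no initial pitch shift; NRP is undefined")
    # Dividing by the initial shift discounts for differences in shift size
    # and makes the measure independent of shift direction.
    return 100.0 * (pitch - baseline) / initial_shift


if __name__ == "__main__":
    # Illustrative pitch values in Hz, invented for this example.
    print(nrp(650.0, baseline=600.0, pitch_at_r=650.0))  # no recovery
    print(nrp(625.0, baseline=600.0, pitch_at_r=650.0))  # halfway back
    print(nrp(600.0, baseline=600.0, pitch_at_r=650.0))  # complete recovery
```

      Under this assumed definition, a bird shifted upward by 50 Hz and a bird shifted downward by 50 Hz that have each returned halfway to baseline both score 50%, which is the sense in which the measure discounts for the initial shift.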

      Reviewer #3 (Recommendations For The Authors):

      The use of "hierarchically lower" to refer to the flexible process is confusing to me, and possibly to many readers. Some people think of flexible, top-down processes as being _higher_ in a hierarchy. Regardless, it doesn't seem important, in this paper, to label the processes in a hierarchy, so perhaps avoid using that terminology.

      We reformulated the paragraph using ‘nested processes’ instead of hierarchical processes.

      In the statement "a seeming analogous task to re-pitching of zebra finch song, in humans, is to modify developmentally learned speech patterns", a few suggestions: it is not clear whether "re-pitching" refers to planning or feedback-dependent learning (I didn't see it introduced anywhere else). And if this means planning, then it is not clear why this would be analogous to "humans modifying developmentally learned speech patterns". As you mentioned, humans are more flexible at planning, so it seems re-pitching would _not_ be analogous (or is this referring to the less flexible modification of accents?).

      We changed the sentence to:

      "Thus, a seeming analogous task to feedback-dependent learning of zebra finch song, in humans, is to modify developmentally learned speech patterns."

    1. Author response:

      We would first like to thank the editor for considering our findings for publication in eLife. Furthermore, we thank the reviewers and editors for their encouraging reviews and for providing helpful and insightful comments.

      Reviewer #1 (Public Review):

      Summary:

      The pituitary gonadotropins, FSH and LH, are critical regulators of reproduction. In mammals, synthesis and secretion of FSH and LH by gonadotrope cells are controlled by the hypothalamic peptide, GnRH. As FSH and LH are made in the same cells in mammals, variation in the nature of GnRH secretion is thought to contribute to the differential regulation of the two hormones. In contrast, in fish, FSH and LH are produced in distinct gonadotrope populations and may be less (or differently) dependent on GnRH than in mammals. In the present manuscript, the authors endeavored to determine whether FSH may be independently controlled by a distinct peptide, cholecystokinin (CCK), in zebrafish.

      Strengths:

      The authors demonstrated that the CCK receptor is enriched in FSH-producing relative to LH-producing gonadotropes, and that genetic deletion of the receptor leads to dramatic decreases in gonadotropin production and gonadal development in zebrafish. Also, using innovative in vivo and ex vivo calcium imaging approaches, they show that LH- and FSH-producing gonadotropes preferentially respond to GnRH and CCK, respectively. Exogenous CCK also preferentially stimulated FSH secretion ex vivo and in vivo.

      Weaknesses:

      The concept that there may be a distinct FSH-releasing hormone (FSHRH) has been debated for decades. As the authors suggest that CCK is the long-sought FSHRH (at least in fish), they must provide data that convincingly leads to such a conclusion. In my estimation, they have not yet met this burden. In particular, they show that CCK is sufficient to activate FSH-producing cells, but have not yet demonstrated its necessity. Their one attempt to do so was using fish in which they inactivated the CCK receptor using CRISPR-Cas9. While this manipulation led to a reduction in FSH, LH was affected to a similar extent. As a result, they have not shown that CCK is a selective regulator of FSH.

      Our conclusion regarding the necessity of CCK signaling for FSH secretion is based on the following evidence:

      (1) CCK-like receptors are expressed in the pituitary gland predominantly on FSH cells.

      (2) Application of CCK to pituitaries elicits FSH cell activation and FSH release, and, to a lesser degree, activation of LH cells.

      (3) Mutating the CCK-like receptor causes a decrease in fsh and lh mRNA synthesis.

      (4) Mutating the CCK-like receptor gives rise to a phenotype which is identical to that caused by mutation of both lh and fsh genes in zebrafish.

      (5) Mutating the FSH-specific CCK receptor in a different species of fish (medaka) also causes a complete shutdown of FSH production and phenocopies a fsh-mutant phenotype (Uehara et al, BioRxiv, DOI: 10.1101/2023.05.26.542428).

      Taken together, we believe that these data strongly support the conclusion that CCK is necessary for FSH production and release from the fish pituitary. Admittedly, the overlapping effects of CCK on both FSH and LH cells in zebrafish (evident in both our calcium imaging experiments and the KO phenotype) complicate the interpretation of the phenotype. We speculate that the effect of CCK on LH cells in zebrafish can be caused either by paracrine signaling within the gland or by the effects of CCK on higher levels of the axis. In our revised manuscript we will make sure to highlight the overlapping effects of CCK on LH cells rather than portray it as a selective activator of FSH cells.

      Moreover, they do not yet demonstrate that the effects observed reflect the loss of the receptor's function in gonadotropes, as opposed to other cell types.

      Although there is evidence for the expression of CCK receptor in other tissues, we do show a direct decrease of FSH and LH expression in the gonadotrophs of the pituitary of the mutant fish; taken together with its significant expression in FSH cells, it is the most reasonable and straightforward explanation for the mutant phenotype. Unfortunately, unlike in mice, technologies for conditional knockout of genes in specific cell types are not yet available for our model and cell types. However, in the revised manuscript we will add a supplementary figure describing the distribution of this receptor in other tissues.

      It also is not clear whether the phenotypes of the fish reflect perturbations in pituitary development vs. a loss of CCK receptor function in the pituitary later in life. Ideally, the authors would attempt to block CCK signaling in adult fish that develop normally. For example, if CCK receptor antagonists are available, they could be used to treat fish and see whether and how this affects FSH vs. LH secretion.

      While the observed gonadal phenotype of the KO (sex inversion) should have a developmental origin since it requires a long time to manifest, the effect of the KO on FSH and LH cells is probably more acute.

      In the Discussion, the authors suggest that CCK, as a satiety factor, may provide a link between metabolism and reproduction. This is an interesting idea, but it is not supported by the data presented. That is, none of the results shown link metabolic state to CCK regulation of FSH and fertility. Absent such data, the lengthy Discussion of the link is speculative and not fully merited.

      In the revised manuscript, we will address this comment by either providing data to link CCK with metabolic status or toning down the Discussion of this topic.

      Also in the Discussion, the authors argue that "CCK directly controls FSH cells by innervating the pituitary gland and binding to specific receptors that are particularly abundant in FSH gonadotrophs." However, their imaging does not demonstrate innervation of FSH cells by CCK terminals (e.g., at the EM level).

      Innervation of the fish pituitary does not imply a synaptic-like connection between axon terminals and endocrine cells. In fact, such connections are extremely rare, and their functionality is unclear. Instead, the mode of regulation between hypothalamic terminals and endocrine cells in the fish pituitary is more similar to "volume transmission" in the CNS, i.e. peptides are released into the tissue and carried to their endocrine cell targets by the circulation or via diffusion.

      Moreover, they have not demonstrated the binding of CCK to these cells. Indeed, no CCK receptor protein data are shown.

      Our revised manuscript will include detailed experiments showing the activation of the receptor by its ligand. Unfortunately, no antibody is available against this fish-specific receptor (one of the caveats of working with fish models); therefore, we cannot present receptor protein data.

      The calcium responses of FSH cells to exogenous CCK certainly suggest the presence of functional CCK receptors therein; but, the nature of the preparations (with all pituitary cell types present) does not demonstrate that CCK is acting directly in these cells.

      We agree with the reviewer that there are some disadvantages in choosing to work with a whole-tissue preparation. However, we believe that the advantages of working in a more physiological context far outweigh the drawbacks as it reflects the natural dynamics more precisely. Since our transcriptome data, as well as our ISH staining, show that the CCK receptor is exclusively expressed on FSH cells, it is improbable that the observed calcium response is mediated via a different pituitary cell type.

      Indeed, the asynchrony in responses of individual FSH cells to CCK (Figure 4) suggests that not all cells may be activated in the same way. Contrast the response of LH cells to GnRH, where the onset of calcium signaling is similar across cells (Figure 3).

      The difference between the synchronization levels of LH and FSH cell activity stems from the gap-junction-mediated coupling between LH cells that does not exist between FSH cells (Golan et al 2016, DOI: 10.1038/srep23777). Therefore, the onset of calcium response in FSH cells is dependent on the irregular diffusion rate of the peptide within the preparation, whereas the tight homotypic coupling between LH cells generates a strong and synchronized calcium rise that propagates quickly throughout the entire population; we will make sure this is clear in the final revision.

      Finally, as the authors note in the Discussion, the data presented do not enable them to conclude that the endogenous CCK regulating FSH (assuming it does) is from the brain as opposed to other sources (e.g., the gut).

      We agree with the reviewer that, for now, we are unable to determine whether hypothalamic or peripheral CCK is the main driver of FSH cells. While the strong innervation of the gland by CCK-secreting hypothalamic neurons strengthens the notion of a hypothalamic-releasing hormone and also fits with the dogma of the neural control of the pituitary gland in fish (Ball, 1981; doi: 10.1016/0016-6480(81)90243-4), more experiments are required to resolve this question.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript builds on previous work suggesting that the CCK peptide is the releasing hormone for FSH in fishes, which is different than that observed in mammals where both LH and FSH release are under the control of GnRH. Based on data using calcium imaging as a readout for stimulation of the gonadotrophs, the researchers present data supporting the hypothesis that CCK stimulates FSH-containing cells in the pituitary. In contrast, LH-containing cells show a weak and variable response to CCK but are highly responsive to GnRH. Data are presented that support the role of CCK in the release of FSH. Researchers also state that functional overlap exists in the potency of GnRH to activate FSH cells, thus the two signalling pathways are not separate.

      The results are of interest to the field because for many years the assumption has been that fishes use the same signalling mechanism. These data present an intriguing variation where a hormone involved in satiation acts in the control of reproduction.

      Strengths:

      The strengths of the manuscript are that researchers have shed light on different pathways controlling reproduction in fishes.

      Weaknesses:

      Weaknesses are that it is not clear if multiple ligand/receptors are involved (more than one CCK and more than one receptor?). The imaging of the CCK terminals and CCK receptors needs to be reinforced.

      Reviewer consultation summary:

      • The data presented establish sufficiency, but not necessity of CCK in FSH regulation. The paper did not show that CCK endogenously regulates FSH in fish. This has not been established yet.

      This is a very important comment, also raised by reviewer 1. To avoid repetition, please see our detailed response to the comment above.

      • The paper presents the pharmacological effects of CCK on ex vivo preparations but does not establish the in vivo physiological function of the peptide. The current evidence for a novel physiological regulatory mechanism is incomplete and would require further physiological experiments. These could include the use of a CCK receptor antagonist in adult fish to see the effects on FSH and LH release, the generation of a CCK knockout, or cell-specific genetic manipulations.

      As detailed in the responses to the first reviewer, we cannot conduct conditional, cell-specific gene knockout in our model.

      • Zebrafish have two CCK ligands: ccka, cckb and also multiple receptors: cckar, cckbra and cckbrb. There is ambiguity about which CCK receptor and ligand are expressed and which gene was knocked out.

      In the revised manuscript, we will clarify which of the receptors are expressed and which receptor is targeted. We will also provide data showing the specificity of the receptors (both WT and mutant) to the ligands.

      • Blocking CCK action in fish (with receptor KO) affects FSH and LH. Therefore, the work did not demonstrate a selective role for CCK in FSH regulation in vivo and any claims to have discovered FSHRH need to be more conservative.

      We agree with the reviewer that the overlap in the effect of CCK measured in the calcium activation of cells and in the KO model does not allow us to conclude selectivity. In this context, it is crucial to highlight that CCK-R exhibits high expression on FSH cells but not on LH cells. Therefore, the effect of CCK on LH cells is likely paracrine rather than solely endocrine. We will tone down our claims of selectivity in the revised manuscript.

      • The labelling of the terminals with anti-CCK looks a lot like the background and the authors did not show a specificity control (e.g. anti-CCK antibody pre-absorbed with the peptide or anti-CCK in morphant/KO animals).

      We will update the colors of the image for better clarity. Also, the same antibody had been previously used to mark CCK-positive cells in the gut of the red drum fish (K.A. Webb, Jr. 2010; DOI: https://doi.org/10.1016/j.ygcen.2009.10.010), where a control (pre-absorbed with the peptide) experiment had been conducted.

    2. eLife assessment

      This study presents valuable findings on the potential role of a peptide typically associated with feeding in the control of a pituitary hormone, FSH, which is a critical regulator of reproductive physiology. The evidence supporting the main claims of the authors is thought-provoking but incomplete. In particular, the authors demonstrate that the peptide is sufficient to regulate FSH, but they have not established its necessity. The work will be of interest to reproductive biologists, especially those with an interest in the endocrine control of fertility.

    3. Reviewer #1 (Public Review):

      Summary:

      The pituitary gonadotropins, FSH and LH, are critical regulators of reproduction. In mammals, synthesis and secretion of FSH and LH by gonadotrope cells are controlled by the hypothalamic peptide, GnRH. As FSH and LH are made in the same cells in mammals, variation in the nature of GnRH secretion is thought to contribute to the differential regulation of the two hormones. In contrast, in fish, FSH and LH are produced in distinct gonadotrope populations and may be less (or differently) dependent on GnRH than in mammals. In the present manuscript, the authors endeavored to determine whether FSH may be independently controlled by a distinct peptide, cholecystokinin (CCK), in zebrafish.

      Strengths:

      The authors demonstrated that the CCK receptor is enriched in FSH-producing relative to LH-producing gonadotropes, and that genetic deletion of the receptor leads to dramatic decreases in gonadotropin production and gonadal development in zebrafish. Also, using innovative in vivo and ex vivo calcium imaging approaches, they show that LH- and FSH-producing gonadotropes preferentially respond to GnRH and CCK, respectively. Exogenous CCK also preferentially stimulated FSH secretion ex vivo and in vivo.

      Weaknesses:

      The concept that there may be a distinct FSH-releasing hormone (FSHRH) has been debated for decades. As the authors suggest that CCK is the long-sought FSHRH (at least in fish), they must provide data that convincingly leads to such a conclusion. In my estimation, they have not yet met this burden. In particular, they show that CCK is sufficient to activate FSH-producing cells, but have not yet demonstrated its necessity. Their one attempt to do so was using fish in which they inactivated the CCK receptor using CRISPR-Cas9. While this manipulation led to a reduction in FSH, LH was affected to a similar extent. As a result, they have not shown that CCK is a selective regulator of FSH. Moreover, they do not yet demonstrate that the effects observed reflect the loss of the receptor's function in gonadotropes, as opposed to other cell types. It also is not clear whether the phenotypes of the fish reflect perturbations in pituitary development vs. a loss of CCK receptor function in the pituitary later in life. Ideally, the authors would attempt to block CCK signaling in adult fish that develop normally. For example, if CCK receptor antagonists are available, they could be used to treat fish and see whether and how this affects FSH vs. LH secretion.

      In the Discussion, the authors suggest that CCK, as a satiety factor, may provide a link between metabolism and reproduction. This is an interesting idea, but it is not supported by the data presented. That is, none of the results shown link metabolic state to CCK regulation of FSH and fertility. Absent such data, the lengthy discussion of the link is speculative and not fully merited.

      Also in the Discussion, the authors argue that "CCK directly controls FSH cells by innervating the pituitary gland and binding to specific receptors that are particularly abundant in FSH gonadotrophs." However, their imaging does not demonstrate innervation of FSH cells by CCK terminals (e.g., at the EM level). Moreover, they have not demonstrated the binding of CCK to these cells. Indeed, no CCK receptor protein data are shown. The calcium responses of FSH cells to exogenous CCK certainly suggest the presence of functional CCK receptors therein; but, the nature of the preparations (with all pituitary cell types present) does not demonstrate that CCK is acting directly in these cells. Indeed, the asynchrony in responses of individual FSH cells to CCK (Figure 4) suggests that not all cells may be activated in the same way. Contrast the response of LH cells to GnRH, where the onset of calcium signaling is similar across cells (Figure 3). Finally, as the authors note in the Discussion, the data presented do not enable them to conclude that the endogenous CCK regulating FSH (assuming it does) is from the brain as opposed to other sources (e.g., the gut).

    4. Reviewer #2 (Public Review):

      Summary:

      This manuscript builds on previous work suggesting that the CCK peptide is the releasing hormone for FSH in fishes, which is different than that observed in mammals where both LH and FSH release are under the control of GnRH. Based on data using calcium imaging as a readout for stimulation of the gonadotrophs, the researchers present data supporting the hypothesis that CCK stimulates FSH-containing cells in the pituitary. In contrast, LH-containing cells show a weak and variable response to CCK but are highly responsive to GnRH. Data are presented that support the role of CCK in the release of FSH. Researchers also state that functional overlap exists in the potency of GnRH to activate FSH cells, thus the two signalling pathways are not separate.

      The results are of interest to the field because for many years the assumption has been that fishes use the same signalling mechanism. These data present an intriguing variation where a hormone involved in satiation acts in the control of reproduction.

      Strengths:

      The strengths of the manuscript are that researchers have shed light on different pathways controlling reproduction in fishes.

      Weaknesses:

      Weaknesses are that it is not clear if multiple ligand/receptors are involved (more than one CCK and more than one receptor?). The imaging of the CCK terminals and CCK receptors needs to be reinforced.

      Reviewer consultation summary:

      - The data presented establish sufficiency, but not necessity of CCK in FSH regulation. The paper did not show that CCK endogenously regulates FSH in fish. This has not been established yet.

      - The paper presents the pharmacological effects of CCK on ex vivo preparations but does not establish the in vivo physiological function of the peptide. The current evidence for a novel physiological regulatory mechanism is incomplete and would require further physiological experiments. These could include the use of a CCK receptor antagonist in adult fish to see the effects on FSH and LH release, the generation of a CCK knockout, or cell-specific genetic manipulations.

      - Zebrafish have two CCK ligands: ccka, cckb and also multiple receptors: cckar, cckbra and cckbrb. There is ambiguity about which CCK receptor and ligand are expressed and which gene was knocked out.

      - Blocking CCK action in fish (with receptor KO) affects FSH and LH. Therefore, the work did not demonstrate a selective role for CCK in FSH regulation in vivo and any claims to have discovered FSHRH need to be more conservative.

      - The labelling of the terminals with anti-CCK looks a lot like the background and the authors did not show a specificity control (e.g. anti-CCK antibody pre-absorbed with the peptide or anti-CCK in morphant/KO animals).

    1. eLife assessment

      This study provides evidence that the quality of research in female-dominated fields of research is systematically undervalued by the research community. The authors' findings are based on analyses of data from a research assessment exercise in New Zealand and data on funding success rates in Australia, Canada, the European Union and the United Kingdom. This work is an important contribution to the discourse on gender biases in academia, underlining the pervasive influence of gender on whole fields of research, as well as on individual researchers. The evidence supporting the conclusions is solid, but the work would benefit from further explorations into the nuances of specific fields of research.

    2. Reviewer #2 (Public Review):

      Summary:

      The authors used four datasets spanning 30 countries to examine funding success and research quality score for various disciplines. They examined whether funding or research quality score were influenced by majority gender of the discipline and whether these affected men, women, or both within each discipline. They found that disciplines dominated by women have lower funding success and research quality score than disciplines dominated by men. These findings are surprising because even the men in women-dominated fields experienced lower funding success and research quality score.

      Strengths:

      - The authors utilized a comprehensive dataset covering 30 countries to explore the influence of the majority gender in academic disciplines on funding success and research quality scores.

      - Findings suggest a systemic issue where disciplines with a higher proportion of women have lower evaluations and funding success for all researchers, regardless of gender.

      - The manuscript is notable for its large sample size and the diverse international scope, enhancing the generalizability of the results.

      - The work accounts for various factors including age, number of research outputs, and bibliometric measures, strengthening the validity of the findings.

      - The manuscript raises important questions about unconscious bias in research evaluation and funding decisions, as evidenced by lower scores in women-dominated fields even for researchers that are men.

      - The study provides a nuanced view of gender bias, showing that it is not limited to individuals but extends to entire disciplines, impacting the perception and funding and quality or worth of research.

      - This work underscores the need to explore motivations behind gender distribution across fields, hinting at deep-rooted societal and institutional barriers.

      - The authors have opened a discussion on potential solutions to counter bias, like adjusting funding paylines or anonymizing applications, or other practical solutions.

      - While pointing out limitations such as the absence of data from major research-producing countries, the manuscript paves the way for future studies to examine whether its findings are universally applicable.

      Weaknesses:

      - The study does not provide data on the gender of grant reviewers or stakeholders, which could be critical for understanding potential unconscious bias in funding decisions. These data are likely not available; however, this could be discussed. Are grant reviewers in fields dominated by women more likely to be women?

      - There could be more exploration into whether the research quality score is influenced by inherent biases towards disciplines themselves, rather than only being gender bias.

      - The manuscript should discuss how non-binary gender identities were addressed in the research. There is an opportunity to understand the impact on this group.

      - A significant limitation is absence of data from other major research-producing countries like China and the United States, raising questions about the generalizability of the findings. How comparable are the findings observed to these other countries?

      - The motivations and barriers that drive gender distribution in various fields could be expanded on. Are fields striving to reach gender parity through hiring or other mechanisms?

      - The authors could consider if the size of funding awards correlates with research scores, potentially overlooking a significant factor in the evaluation of research quality. Presumably there is less data on smaller 'pilot' funds and startup funds for disciplines where these are more common. Would funding success follow the same trend for these types of funds?

      - The language used in the manuscript at times may perpetuate bias, particularly when discussing "lower quality disciplines," which could influence the reader's perception of certain fields.

      - The manuscript does not clarify how many gender identities were represented in the datasets or how gender identity was determined, potentially conflating gender identity with biological sex.

    3. Reviewer #3 (Public Review):

      This study seeks to investigate one aspect of disparity in academia: how gender balance in a discipline is valued in terms of evaluated research quality score and funding success. This is important in understanding disparities within academia.

      This study uses publicly available data to investigate covariation between gender balance in an academic discipline and:

      i) Individual research quality scores of New Zealand academics as evaluated by one of 14 broader subject panels.

      ii) Funding success in Australia, Canada, Europe, and the UK.

      The study would benefit from further discussion of its limitations, and from the clarification of some technical points (as described in the recommendations for the authors).

    1. Reviewer #3 (Public Review):

      Summary:

      The goal of this paper is to characterize an anti-diuretic signaling system in insects using Drosophila melanogaster as a model. Specifically, the authors wished to characterize a role of ion transport peptide (ITP) and its isoforms in regulating diverse aspects of physiology and metabolism. The authors combined genetic and comparative genomic approaches with classical physiological techniques and biochemical assays to provide a comprehensive analysis of ITP and its role in regulating fluid balance and metabolic homeostasis in Drosophila. The authors further characterized a previously unrecognized role for Gyc76C as a receptor for ITPa, an amidated isoform of ITP, and in mediating the effects of ITPa on fluid balance and metabolism. The evidence presented in favor of this model is very strong as it combines multiple approaches and employs ideal controls. Taken together, these findings represent an important contribution to the field of insect neuropeptides and neurohormones and have strong relevance for other animals.

      Strengths:

      Many approaches are used to support their model. Experiments were well-controlled, used appropriate statistical analyses, and were interpreted properly and without exaggeration.

      Weaknesses:

      No major weaknesses were identified by this reviewer. More evidence to support their model would be gained by using a loss-of-function approach with ITPa, and by providing more direct evidence that Gyc76C is the receptor that mediates the effects of ITPa on fat metabolism. However, these weaknesses do not detract from the overall quality of the evidence presented in this manuscript, which is very strong.

    2. eLife assessment

      This important study provides a comprehensive analysis of ITP and its role as an anti-diuretic and metabolic hormone in Drosophila. The evidence supporting the conclusions is generally solid, combining genetic and comparative genomic approaches with classical physiological techniques and biochemical assays. However, the evidence of direct binding between ITPa and Gyc76C and their physiological functions is incomplete. This work represents a contribution to the field of neuropeptides and neurohormones in insects and other animals.

    3. Reviewer #1 (Public Review):

      Summary:

      In Drosophila melanogaster, ITP functions in feeding, drinking, metabolism, excretion, and circadian rhythm. In the current study, the authors characterized and compared the expression of all three ITP isoforms (ITPa, ITPL1, and ITPL2) in the CNS and peripheral tissues of Drosophila. An important finding is that they functionally characterized and identified Gyc76C as an ITPa receptor in Drosophila using both in vitro and in vivo approaches. In vitro, the authors nicely confirmed that the inhibitory function of recombinant Drosophila ITPa on MT secretion is Gyc76C-dependent (knockdown of Gyc76C specifically in two cell types abolished the anti-diuretic action of Drosophila ITPa on renal tubules). They also used a combination of multiple approaches to investigate the roles of ITPa and Gyc76C in modulating osmotic and metabolic homeostasis in vivo. They revealed that ITPa signaling to renal tubules and fat body modulates osmotic and metabolic homeostasis via Gyc76C.

      Furthermore, they tried to identify the upstream and downstream partners of ITP neurons in the nervous system by using connectomics and single-cell transcriptomic analysis. I found this interesting manuscript to be well-written and clearly described. The findings in this study are valuable for understanding how ITP signaling regulates systemic homeostasis. Both the anatomical and the single-cell transcriptome analyses here should be useful to many in the field.

      Strengths:

      - The question that this study tries to address (what is the receptor for ITPa in Drosophila) is important. The authors ruled out the Bombyx ITPa receptor orthologs as potential candidates. They identified a novel ITP receptor by using phylogenetic and anatomical analyses, and both in vitro and in vivo approaches.

      - The authors presented detailed anatomical data for both the ITP isoforms and Gyc76C (in the main and supplementary figures), which helps readers understand the expression patterns of the neurons studied in the manuscript.

      - They also performed connectomic and single-cell transcriptomic analyses to study the synaptic and peptidergic connectivity of ITP-expressing neurons. This provided more information for better understanding and further study of systemic homeostasis modulation.

      Weaknesses:

      In the discussion section, the authors raised the limitations of the current study, which I mostly agree with, such as the lack of verification of direct binding between ITPa and Gyc76C, even though they provided different data to support the conclusion that the ITPa-Gyc76C signaling pathway regulates systemic homeostasis in adult flies.

    4. Reviewer #2 (Public Review):

      Summary:

      The physiology and behaviour of animals are regulated by a huge variety of neuropeptide signalling systems. In this paper, the authors focus on the neuropeptide ion transport peptide (ITP), which was first identified and named on account of its effects on the locust hindgut (Audsley et al. 1992). Using Drosophila as an experimental model, the authors have mapped the expression of three different isoforms of ITP (Figures 1, S1, and S2), all of which are encoded by the same gene.

      The authors then investigated candidate receptors for isoforms of ITP. Firstly, Drosophila orthologs of G-protein coupled receptors (GPCRs) that have been reported to act as receptors for ITPa or ITPL in the insect Bombyx mori were investigated. Importantly, the authors report that ITPa does not act as a ligand for the GPCRs TkR99D and PK2-R1 (Figure S3). Therefore, the authors investigated other putative receptors for ITPs. Informed by a previously reported finding that ITP-type peptides cause an increase in cGMP levels in cells/tissues (Dircksen, 2009, Nagai et al., 2014), the authors investigated guanylyl cyclases as candidate receptors for ITPs. In particular, the authors suggest that Gyc76C may act as an ITP receptor in Drosophila.

      Evidence that Gyc76C may be involved in mediating effects of ITP in Bombyx was first reported by Nagai et al. (2014) and here the authors present further evidence, based on a proposed concordance in the phylogenetic distribution of ITP-type neuropeptides and Gyc76C (Figure 2). Having performed detailed mapping of the expression of Gyc76C in Drosophila (Figures 3, S4, S5, S6), the authors then investigated if Gyc76C knockdown affects the bioactivity of ITPa in Drosophila. The inhibitory effect of ITPa on leucokinin- and diuretic hormone-31-stimulated fluid secretion from Malpighian tubules was found to be abolished when expression of Gyc76C was knocked down in stellate cells and principal cells, respectively (Figure 4). However, as discussed below, this does not provide proof that Gyc76C directly mediates the effect of ITPa by acting as its receptor. The effect of Gyc76C knockdown on the action of ITPa could be an indirect consequence of an alteration in cGMP signalling.

      Having investigated the proposed mechanism of ITPa in Drosophila, the authors then investigated its physiological roles at a systemic level. In Figure 5 the authors present evidence that ITPa is released during desiccation and, accordingly, overexpression of ITPa increases survival when animals are subjected to desiccation. Furthermore, knockdown of Gyc76C in stellate or principal cells of Malpighian tubules decreases survival when animals are subjected to desiccation. However, whilst this is correlative, it does not prove that Gyc76C mediates the effects of ITPa. The authors investigated the effects of knockdown of Gyc76C in stellate or principal cells of Malpighian tubules on i) survival when animals are subjected to salt stress and ii) time taken to recover from chill coma. It is not clear, however, why animals over-expressing ITPa were not also tested for i) survival when subjected to salt stress and ii) time taken to recover from chill coma. In Figures 6 and S8, the authors show the effects of Gyc76C knockdown in the female fat body on metabolism, feeding-associated behaviours and locomotor activity, which are interesting. Furthermore, the relevance of the phenotypes observed to potential in vivo actions of ITPa is explored in Figure 7. The authors conclude that "increased ITPa signaling results in phenotypes that largely mirror those seen following Gyc76C knockdown in the fat body, providing further support that ITPa mediates its effects via Gyc76C." Use of the term "largely mirror" seems inappropriate here because there are opposing effects, e.g., decreased starvation resistance in Figure 6A versus increased starvation resistance in Figure 7A. Furthermore, as discussed above, the results of these experiments do not prove that the effects of ITPa are mediated by Gyc76C because the effects reported here could be correlative, rather than causative.

      Lastly, in Figures 8, S9, and S10 the authors analyse publicly available connectomic data and single-cell transcriptomic data to identify putative inputs and outputs of ITPa-expressing neurons. These data are a valuable addition to our knowledge of ITPa-expressing neurons, but they do not address the core hypothesis of this paper, namely that Gyc76C acts as an ITPa receptor.

      Strengths:

      (1) The main strengths of this paper are i) the detailed analysis of the expression and actions of ITP and the phenotypic consequences of over-expression of ITPa in Drosophila, and ii) the detailed analysis of the expression of Gyc76C and the phenotypic consequences of knockdown of Gyc76C expression in Drosophila.

      (2) Furthermore, the paper is generally well-written and the figures are of good quality.

      Weaknesses:

      (1) The main weakness of this paper is that the data obtained do not prove that Gyc76C acts as a receptor for ITPa. Therefore, the following statement in the abstract is premature: "Using a phylogenetic-driven approach and the ex vivo secretion assay, we identified and functionally characterized Gyc76C, a membrane guanylate cyclase, as an elusive Drosophila ITPa receptor." Further experimental studies are needed to determine if Gyc76C acts as a receptor for ITPa. In the section of the paper headed "Limitations of the study", the authors recognise this weakness. They state "While our phylogenetic analysis, anatomical mapping, and ex vivo and in vivo functional studies all indicate that Gyc76C functions as an ITPa receptor in Drosophila, we were unable to verify that ITPa directly binds to Gyc76C. This was largely due to the lack of a robust and sensitive reporter system to monitor mGC activation." It is not clear what the authors mean by "the lack of a robust and sensitive reporter system to monitor mGC activation". The discovery of mGCs as receptors for ANP in mammals was dependent on the use of assays that measure GC activity in cells (e.g. by measuring cGMP levels in cells). Furthermore, more recently cGMP reporters have been developed. The use of such assays is needed here to investigate directly whether Gyc76C acts as a receptor for ITPa. In summary, insufficient evidence has been obtained to conclude that Gyc76C acts as a receptor for ITPa. Therefore, I think there are two ways forward, either:

      (a) The authors obtain additional biochemical evidence that ITPa is a ligand for Gyc76C.

      or

      (b) The authors substantially revise the conclusions of the paper (in the title, abstract, and throughout the paper) to state that Gyc76C MAY act as a receptor for ITPa, but that additional experiments are needed to prove this.

      (2) The authors state in the abstract that a phylogenetic-driven approach led to their identification of Gyc76C as a candidate receptor for ITPa. However, there are weaknesses in this claim. Firstly, the hypothesis that Gyc76C may be involved in mediating effects of ITPa was first proposed ten years ago by Nagai et al. (2014), so this surely was the primary basis for investigating this protein. Nevertheless, investigating if there is correspondence in the phylogenetic distribution of ITP-type and Gyc76C-type genes/proteins is a valuable approach to addressing this issue. Unfortunately, the evidence presented is rather limited in scope. Essentially, the authors report that they only found ITP-type and Gyc76C-type genes/proteins in protostomes, but not in deuterostomes. What is needed is a more fine-grained analysis at the species level within the protostomes. Thus, are there protostome species in which both ITP-type and Gyc76C-type genes/proteins have been lost? Furthermore, are there any protostome species in which an ITP-type gene is present but a Gyc76C-type gene is absent, or vice versa? If there are such species, this would argue against Gyc76C being a receptor for ITPa. In this regard, it is noteworthy that in Figure 2A there are two ITP-type precursors in C. elegans, but there are no Gyc76C-type proteins shown in the tree in Figure 2B. Thus, what is needed is a more detailed analysis of protostomes to investigate if there really is correspondence in the phylogenetic distribution of Gyc76C-type and ITP-type genes at the species level.

      (3) The manuscript would benefit from a more comprehensive overview and discussion of published literature on Gyc76C in Drosophila, both as a basis for this study and for interpretation of the findings of this study.

    1. eLife assessment

      In this study, the authors developed a cell-based screening assay for the identification of small molecule inhibitors of nonsense-mediated decay (NMD). They used it to validate a novel small molecule SMG1 kinase inhibitor that inhibits NMD in cultured cells leading to the expression of neoantigens from NMD-targeted genes, and in vivo slows tumor growth of cells with a significant number of out-of-frame indel mutations. The conclusions are supported by convincing evidence, and the significance of this work consists in the development of a novel and very promising NMD inhibitor drug that acts as an inhibitor of the SMG1 NMD kinase and is suitable for use in animals. This is an important advance for the field, as previous NMD inhibitors were not specific, lacked efficacy, or were very toxic and hence not suitable for animal application.

    2. Reviewer #1 (Public Review):

      Summary:

      This work identified new NMD inhibitors and tested them for cancer treatment, based on the hypothesis that inhibiting NMD could lead to the production of cancer neoantigens from the stabilized mutant mRNAs, thereby enhancing the immune system's ability to recognize and kill cancer cells. Key points of the study include:

      • Development of an RNA-seq based method for NMD analysis using mixed isogenic cells that express WT or mutant transcripts of STAG2 and TP53 with engineered truncation mutations.

      • Application of this method in a drug screen, which identified several potential NMD inhibitors.

      • Demonstration that one of the identified compounds, LY3023414, inhibits NMD by targeting the SMG1 protein kinase in the NMD pathway in cultured cells and mouse xenografts.

      • Due to the in vivo toxicity observed for LY3023414, the authors developed 11 new SMG1 inhibitors (KVS0001-KVS0011) based on the structures of the known SMG1 inhibitor SMG1i-11 and the SMG1 protein itself.

      • Among these, KVS0001 stood out for its high potency, excellent bioavailability, and low toxicity in mice. Treatment with KVS0001 caused NMD inhibition and increased presentation of neoantigens on MHC-I molecules, resulting in the clearance of cancer cells in vitro by co-cultured T cells and cancer xenografts in mice by the immune system.

      These findings support the strategy of targeting the NMD pathway for cancer treatment and provide new research tools and potential lead compounds for further exploration.

      Strengths:

      The RNA-seq-based NMD analysis, using isogenic cell lines with specific NMD-inducing mutations, represents a novel approach for the high-throughput identification of potential NMD modulators or genetic regulators. The effectiveness of this method is exemplified by the identification of a new activity of AKT1/mTOR inhibitor LY3023414 in inhibiting NMD.

      The properties of KVS0001 described in the manuscript as a novel SMG1 inhibitor suggest its potential as a lead compound for further testing the NMD-targeting strategies in cancer treatment. Additionally, this compound may serve as a useful research tool.

      The results of the in vitro cell killing assay and in vivo xenograft experiments in both immuno-proficient and immune-deficient mice indicate that inhibiting NMD could be a viable therapeutic strategy for certain cancers.

      Weaknesses:

      The authors did not address the potential effects of NMD/SMG1 inhibitors on RNA splicing. Given that the transcripts of many RNA-binding proteins are natural targets of NMD, inhibiting NMD could significantly alter splicing patterns. This, in turn, might influence the outcomes of the RNA-seq-based method for NMD analysis and result interpretation.

      While the RNA-seq-based approach offers several advantages for analyzing NMD, the effects of NMD/SMG1 inhibitors observed through this method should be confirmed using established NMD reporters. This step is crucial to rule out the possibility that mutations in STAG2 or TP53 affect NMD in cells, as well as to address potential clonal variations between different engineered cell lines.

      The results from the SMG1/UPF1 knockdown and SMG1i-11 experiments presented in Figure 3 correlate with the effects seen for LY3023414, but they do not conclusively establish SMG1 as the direct target of LY3023414 in NMD inhibition. An epistatic analysis with LY3023414 and SMG1-knockdown is needed.

    3. Reviewer #2 (Public Review):

      Summary:

      Several publications during the past years provided evidence that NMD protects tumor cells from being recognized by the immune system by suppressing the display of neoantigens, and hence NMD inhibition is emerging as a promising anti-cancer approach. However, the lack of an efficacious and specific small-molecule NMD inhibitor with suitable pharmacological properties is currently a major bottleneck in the development of therapies that rely on NMD inhibition. In this manuscript, the authors describe their screen for identifying NMD inhibitors, which is based on isogenic cell lines that either express wild-type or NMD-sensitive transcript isoforms of p53 and STAG2. Using this setup, they screened a library of 2658 FDA-approved or late-phase clinical trial drugs and had 8 hits. Among them, they further characterized LY3023414, showing that it inhibits NMD in cultured cells and in a mouse xenograft model, where it, however, was very toxic. Because LY3023414 was originally developed as a PI3K inhibitor, the authors claim that it inhibits NMD by inhibiting SMG1. While this is most likely true, the authors do not provide experimental evidence for this claim. Instead, they use this statement to switch their attention to another previously developed SMG1 inhibitor (SMG1i-11), of which they design and test several derivatives. Of these derivatives, KVS0001 showed the best pharmacological behavior. It upregulated NMD-sensitive transcripts in cultured cells and the xenograft mouse model, and two predicted neoantigens could indeed be detected by mass spectrometry when the respective cells were treated with KVS0001. A bispecific antibody targeting T cells to a specific antigen-HLA complex led to increased IFN-gamma release and killing of cancer cells expressing this antigen-HLA complex when they were treated with KVS0001. Finally, the authors show that renal (RENCA) or lung cancer cells (LLC) were significantly inhibited in tumor growth in immunocompetent mice treated with KVS0001. Overall, this establishes KVS0001 as a novel and promising anti-cancer drug that, by inhibiting SMG1 (and thereby NMD), increases neoantigen production in the cancer cells and reveals them to the body's immune system as "foreign".

      Strengths:

      The novelty and significance of this work lie in the development of a novel and, judging from the presented data, very promising NMD-inhibiting drug that is suitable for applications in animals. This is an important advance for the field, as previous NMD inhibitors were not specific, lacked efficacy, or were very toxic and hence not suitable for animal application. There is still a long way to go, with many challenges ahead, towards an efficacious NMD inhibitor that is safe for use in humans, but KVS0001 appears to be a molecule that bears promise for follow-up studies. In addition, while the idea of inhibiting NMD to trigger neoantigen production in cancer cells and so reveal them to the immune system has been around for quite some time, this work provides ample and compelling support for the feasibility of this approach, at least for tumors with a high mutational burden.

      Main weaknesses:

      There is a disconnect between the screen and the KVS0001 compound that they describe and test in the second part of the manuscript, since KVS0001 is a derivative of the SMG1 inhibitors developed by Gopalsamy et al. in 2012 and not of the lead compound identified in the screen (LY3023414). Because of high toxicity in the mouse xenograft experiments, the authors did not follow up LY3023414 but instead switched to the published SMG1i-11 drug of Gopalsamy and colleagues, a molecule that is widely used among NMD researchers for NMD inhibition in cultured cells. Therefore, in my view, the description of the screen is obsolete, and the paper could just start with the optimization of the pharmacological properties of SMG1i-11 and the characterization of KVS0001. Even though the screen is based on an elegant setup and was executed successfully, it was ultimately a failure as it didn't reveal a useful lead compound that could be further optimized.

      Additional points:

      - Compared to SMG1i-11, KVS0001 seems less potent in inhibiting SMG1 (higher IC50). It would therefore be important to also compare the specificity of both drugs for SMG1 over other kinases at the applied concentrations (1 µM for SMG1i-11, 5 µM for KVS0001). The Kinativ Assay (Fig. S13) was performed with 100 nM KVS0001, which is 50-fold lower than the concentration used for functional assays and hence not really meaningful. In addition, more information on the pharmacokinetic properties and toxicology of KVS0001 would allow a better judgment of the potential of this molecule as a future therapeutic agent.

      - On many figures, the concentrations of the used drugs are missing. Please ensure that for every experiment that includes drugs, the drug concentration is indicated.

      - Do the authors have an explanation for why LY3023414 has a much stronger effect on the p53 than on the STAG2 nonsense allele (Figure 1B, S8), whereas emetine upregulates the STAG2 nonsense alleles more than the p53 nonsense allele (Figure S5)? I find this curious, but the authors do not comment on it.

      - While it is a strength of the study that the NMD inhibitors were validated on many different truncation mutations in different cell lines, it would help readers if a table or graphic illustration was included that gives an overview of all mutant alleles tested in this study (which gene, type of mutation, in which cell type). In the current version, this information is scattered throughout the manuscript.

      - Lines 194 and 302: That SMG1i-11 was highly insoluble in the hands of the authors is surprising. It is unclear why they used variant 11j, since variant 11e of this inhibitor is widely used among NMD researchers and readily dissolves in DMSO.

      - Line 296: The authors claim that they were able to show that LY3023414 inhibited the SMG1 kinase, which is not true. To show this, they would have to show, for example, that LY3023414 prevents SMG1-mediated UPF1 phosphorylation, as they did for KVS0001 and SMG1i-11 in Fig. 3F. Unless the authors provide this data, the statement should be deleted or modified.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      The authors have addressed my comments. As a final minor point, regarding comment 2, these condensates are likely viscoelastic rather than purely viscous. It is prudent to indicate that the data may refer to an apparent viscosity.

      We added the following text to the manuscript to highlight the viscoelastic nature of ELP condensates, and the relationship of reported values with the steady state viscosity. “It is worth noting that the reported values, although related, may not quantitatively represent the steady-state viscosity. This discrepancy arises from the slow relaxation timescale inherent in ELP condensates with viscoelastic properties.”

    2. eLife assessment

      This important study investigates the structural organization of a series of diblock elastin-like polypeptide condensates. The methodology is highly compelling, as it combines multiscale simulations and fluorescence lifetime imaging microscopy experiments. The results increase our understanding of model biomolecular condensates.

    3. Reviewer #1 (Public Review):

      This is an interesting, informative, and well-designed study that combines theoretical and experimental methodologies to tackle the phenomenon of higher-resolution structures/substructures in model biomolecular condensates.

      The authors have adequately addressed my previous concerns.

    4. Reviewer #2 (Public Review):

      Summary:

      Latham A.P. et al. apply simulations and FLIM to analyse several di-block elastin-like polypeptides and connect their sequence to the micro-structure of coacervates resulting from their phase separation.

      Strengths:

      Understanding the molecular grammar of phase-separating proteins and the connection with mesoscale properties of the coacervates is highly relevant. This work provides insights into micro-structures of coacervates resulting from di-block polypeptides.

      Weaknesses:

      The results apply to a very specific architecture (di-block polypeptides) with specific sequences.

    1. eLife assessment

      This work describes an easily implemented method for measuring solid food intake in Drosophila, which is necessary for studying the consumption of experimentally challenging diets, such as high-fat foods, as well as their nutritional impacts on the organism. It is a valuable technical contribution with solid evidence supporting the conclusions, filling a significant gap in the field.

    2. Reviewer #1 (Public Review):

      Summary:

      Thakare et al. propose a gravimetric method to evaluate feeding from solid food in Drosophila adults that can be used to evaluate the nutritional impact of high-fat food.

      Strengths:

      This method is new and fills a gap in the methods used in Drosophila research.

      Weaknesses:

      The data presented address a number of questions that are mainly interesting for people needing to reproduce such experiments. The work could be improved by being presented within a broader scope.

    3. Reviewer #2 (Public Review):

      Summary:

      Thakare et al. present the DIETS assay for quantifying food consumption in adult Drosophila. DIETS measures food intake by weighing fly food before and after feeding. Technically, this is a well-designed, executed, and analyzed study. The interpretations are generally conservative and justified by the results. Although the results aren't always consistent with other published studies, which might reflect some of the unique conditions of the DIETS assay, the technique can clearly distinguish between some expected differences in food intake. Although lifespan is shortened in the DIETS chamber, the method seems robust for various time scales up to a week. DIETS adds another useful and versatile tool for fly researchers interested in studying feeding behavior.
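      To make the gravimetric principle concrete, the weigh-before, weigh-after calculation can be sketched as follows. This is only an illustrative sketch, not the authors' protocol: the function name, the use of a fly-free evaporation control cup, and all numbers are assumptions introduced here.

```python
def intake_per_fly_mg(food_before_mg, food_after_mg,
                      control_before_mg, control_after_mg, n_flies):
    """Estimate mean food intake per fly (mg) from gravimetric readings.

    Assumption (hypothetical, not from the manuscript): a fly-free control
    cup kept under the same conditions gives the fractional mass lost to
    evaporation, and that fraction is applied to the experimental cup.
    """
    if n_flies <= 0:
        raise ValueError("n_flies must be positive")
    if control_before_mg <= 0:
        raise ValueError("control_before_mg must be positive")

    # Fraction of mass the control cup lost without any feeding.
    evap_fraction = (control_before_mg - control_after_mg) / control_before_mg

    # Mass the experimental cup would have had if no fly had fed on it.
    expected_after_mg = food_before_mg * (1.0 - evap_fraction)

    # Remaining loss is attributed to feeding; clamp weighing noise at zero.
    consumed_mg = max(expected_after_mg - food_after_mg, 0.0)
    return consumed_mg / n_flies


# Illustrative numbers only: 100 mg cup, 80 mg after feeding by 10 flies,
# control cup lost 5% of its mass to evaporation.
per_fly = intake_per_fly_mg(100.0, 80.0, 100.0, 95.0, 10)
```

      Because the result is a group average, it cannot capture differences in intake between individual flies within one chamber.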

      Strengths:

      The authors test various conditions, including food presentation, surface area, and humidity (by changing the food cup distance to an agar base) to demonstrate an optimized technique for quantifying consumption. Under these conditions, evaporation is generally limited to <10%.

      The authors use DIETS to validate diverse feeding paradigms, including the published effects of temperature, food dilution, and intermittent fasting on food intake.

      Weaknesses:

      The studies to optimize and test the DIETS assay are technically rigorous and well-designed. However, the results reveal some weaknesses or potential caveats of the assay. As highlighted below, how much nutrition flies are actually obtaining may be misestimated due to vapor diffusion and crowding/competition for food. This appears largely acceptable though, since the 'group' measurement can still clearly distinguish between expected feeding differences under different conditions, but it likely reduces accuracy, which may be important in some studies, and probably nullifies the effectiveness of using DIETS to restrict caloric intake.

      It is my understanding that flies suck out nutrients from the medium, leaving behind the agar/cornmeal matrix. This seems consistent with the images in Figure S2B, where the spheroidal medium in the food cup maintains its shape as it shrinks, but there don't seem to be any pits or holes from fly consumption. Given that flies in DIETS consume a significant portion of the available food, it seems that the food concentration at the medium surface may be changing throughout the experiment. This may also make it challenging to use other common fly food ingredients, such as cornmeal, much of which is indigestible.

      Similarly, vapor diffusion is expected between the agar bed and food cup (which the authors observed; in line 385), which may further affect assay accuracy, especially in comparisons between foods with different osmolarity.

      In DIETS, increased feeding is observed with increased flies per chamber, but this is not observed in other techniques, such as EX-Q (Wu et al. 2020). It is unclear whether sensitivity to adult density is a DIETS-specific feature, or if adult density instead directly affects food intake estimates using DIETS (e.g., by affecting chamber humidity).

      In another example, there is a ~300% difference in absolute feeding when the DIETS food cup is presented in different formats (Figure 3C). Again, it is unclear whether food presentation has an inherently greater effect in DIETS, or if the measurements themselves are highly sensitive to the environment.

      Although the control of total food mass given to the animals is a novel feature of the assay, the likely differences in nutrient intake between individuals (and shortened lifespan) in a DIETS chamber make this a challenging method to use to study caloric restriction. The shortened lifespan likely stems from the high adult density per vial, which has been explored in previous publications (e.g., Pearl in the 1920s; Mueller in the 1990s).

    1. eLife assessment

      This study reports some useful information on the mechanisms by which a high-fat diet induces arrhythmias in the model organism Drosophila. Specifically, the authors propose that adipokinetic hormone (Akh) secretion is increased with this diet, and through binding of Akh to its receptor on cardiac neurons, arrhythmia is induced. The presented data, however, incompletely support the conclusions, with a number of concerns identified, such as the need for editorial clarifications, issues with experimental design (including additional control experiments), and over or misinterpretation of some of the experimental data. Nonetheless, some of the data will be helpful to those who wish to extend the research to a more complex model system, such as the mouse.

    2. Reviewer #1 (Public Review):

      Summary:

      In the manuscript submission by Zhao et al. entitled, "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila" the authors assert that cardiac arrhythmias in Drosophila on a high-fat diet are due in part to adipokinetic hormone (Akh) signaling activation. High-fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on a high-fat diet. Elimination of one of two AkhR-expressing cardiac neurons results in arrhythmia similar to a high-fat diet.

      Strengths:

      The authors propose a novel mechanism for high-fat diet-induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

      Weaknesses:

      Major comments:

      (1) The authors state, "Arrhythmic pathology is rooted in the cardiac conduction system." This assertion is incorrect as a blanket statement on arrhythmias. Certain arrhythmias are attributable to the conduction system, such as bradycardic rhythms, heart block, sinus node reentry, inappropriate sinus tachycardia, AV nodal reentrant tachycardia, bundle branch reentry, fascicular ventricular tachycardia, or idiopathic ventricular fibrillation, to name a few. However, the etiological mechanisms of many atrial and ventricular arrhythmias, such as atrial fibrillation or substrate-based ventricular tachycardia, are not rooted in the conduction system. The introduction should be revised to reflect a clear focus on atrial fibrillation (AF). In addition, AF susceptibility is known to be modulated by autonomic tone, which is topically relevant to this manuscript.

      (2) The authors state that "HFD led to increased heartbeat and an irregular rhythm." In representative examples shown, HFD resulted in pauses, slower heart rate, and increased irregularity in rhythm but not consistently increased heart rate (Figures 1B, 3A, and 4C). Based on the cited work by Ocorr et al (https://doi.org/10.1073/pnas.0609278104), Drosophila heart rate is highly variable with periods of fast and slow rates, which the authors attributed to neuronal and hormonal inputs. Ocorr et al then describe the use of "semi-intact" flies to remove autonomic input to normalize heart rate. Were semi-intact flies used? If not, how was heart rate variability controlled? And how was heart rate "increase" quantified in high-fat diet compared to normal-fat diet? Lastly, how does one measure "arrhythmia" when there is so much heart rate variability in normal intact flies?

      (3) The authors state, "to test whether the HFD-induced increase in Akh in the APC affects APC neuron activity, we used CaLexA (https://doi.org/10.3109/01677063.2011.642910)." According to the reference, CaLexA is a tool to map active neurons and would not indicate, as the authors state, whether Akh affects APC neuron activity specifically. It is equally possible that APC neurons may be activated by HFD and produce more Akh. Please clarify this language.

      (4) Are the AkhR+ neurons parasympathetic or sympathetic? Please provide additional experimentation that characterizes these neurons. The AkhR+ neurons appear to be anti-arrhythmic. Please expand the discussion to include a working hypothesis of the overall findings on Akh, AkhR, and AkhR+ neurons.

      (5) The authors state, "Heart function is dependent on glucose as an energy source." However, the heart's main energy source is fatty acids with minimal use of glucose (doi: 10.1016/j.cbpa.2006.09.014). Glucose becomes more utilized by cardiomyocytes under heart failure conditions. Please amend/revise this statement.

    3. Reviewer #2 (Public Review):

      This manuscript explores mechanisms underlying heart contractility problems in metabolic disease using Drosophila as a model. They confirm, as others have demonstrated, that a high-fat diet (HFD) induces cardiac problems in flies. They showed that a high-fat diet increased Akh mRNA levels and calcium levels in the Akh-producing cells (APC), suggesting there is increased production and release of this hormone in a HFD context. When they knock down Akh production in the APCs using RNAi they see that cardiac contractility problems are abolished. They similarly show that levels of the Akh receptor (Akhr) are increased on a HFD and that loss of Akhr also rescues contractility problems on a HFD.

      One highlight of the paper was the identification of a pair of neurons that express a receptor for the metabolic hormone Akh, and showing initial data that these neurons innervate the cardiac muscle. They then overexpress cell death gene reaper (rpr) in all Akhr-positive cells with Akhr-GAL4 and see that cardiac contractility becomes abnormal.

      However, this paper contains several findings that have been reported elsewhere and it contains key flaws in both experimental design and data interpretation. There is some rationale for doing the experiments, and the data and images are of good quality. However, others have shown that HFD induces cardiac contractility problems (Birse 2010), that Akh mRNA levels are changed with HFD (Liao 2021), and that Akh modulates cardiac rhythms (Noyes 1995), so Figures 1-4 are largely a confirmation of what is already known. This limits the overall magnitude of the advances presented in these figures. Overall, the stated concerns limit the impact of the manuscript in advancing our understanding of heart contractility.

    4. Reviewer #3 (Public Review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila Akh are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions, while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      It is intriguing to see an increase in Akh mRNA levels in HFD-fed animals. This is a key result for linking HFD-induced arrhythmia to Akh. Thus, demonstrating that HFD also increases the Akh protein levels and Akh is secreted more should significantly strengthen the manuscript.

      The experiments employing an AkhR null allele nicely demonstrate its requirement for HFD-induced cardiac arrhythmia. Depletion of Akh in Akh-expressing cells recapitulates the consequence of AkhR knockout, supporting that both Akh and its receptor are required for HFD-induced cardiac arrhythmia. Given that RNAi is associated with off-target effects and some RNAi reagents do not work, testing multiple independent RNAi lines is the standard procedure. It is also important to show the on-target effect of the RNAi reagents used in the study.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave normal rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. The experiments presented in Figure 6 cannot justify the authors' conclusion. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutants could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs will allow for specific manipulation of ACNs, which is crucial for studying the specific role of ACNs in controlling cardiac rhythms.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

    1. Author response:

      We thank eLife and the reviewers for the thoughtful summary and valuable review of our manuscript. We largely agree with the summary and review and have provided our responses to the comments below. We believe BADGERS is a significant new tool for identifying associated risk factors for complex diseases, and the associations we observed in the analysis provide insights into the genetic basis of Alzheimer's disease.

      Reviewer #1 (Public Review):

      The major aim of the paper was a method for determining genetic associations between two traits using common variants tested in genome-wide association studies. The work includes a software implementation and application of their approach. The results of the application of their method generally agree with what others have seen using similar AD and UKB data.

      The paper has several distinct portions. The first is a method for testing genetic associations between two or more traits using genome-wide association tests statistics. The second is a python implementation of the method. The last portion is the results of their method using GWAS from AD and UK Biobank.

      We thank the reviewer for the conclusion and positive comments.

      Regarding the method, it seems like it has similarities to LDSC, and it is not clear how it differs from LDSC or other similar methods. The implementation of the method used Python 2.7 (or at least was reportedly tested using that version), which was retired in 2020. The implementation was committed between Wed Oct 3 15:21:49 2018 and Mon Jan 28 09:18:09 2019 using data that existed at the time, so it was a bit surprising that it used Python 2.7, since that version was initially going to be set for end-of-life in 2015. Anyway, trying to run the package resulted in unmet dependency errors, which I think are related to an internal package not getting installed. I would expect that published software could be installed using standard tooling for the language, and, ideally, software should have automated testing of key portions.

      We thank the reviewer for their comments. To clarify, the primary difference between our proposed method, BADGERS, and LDSC lies in their respective objectives and applications. LDSC is designed to estimate heritability and genetic correlations between traits by utilizing GWAS summary statistics, thereby aiding in the elucidation of the genetic architecture of complex traits and diseases. Conversely, BADGERS is specifically developed to explore causal relationships between risk factors, such as biomarkers, and diseases of interest. It employs genetic variants as variables to deduce causality, thereby addressing the challenges of confounding and reverse causation that are common in observational studies. Although BADGERS utilizes the LD reference panel derived from LDSC, the LD reference panel is used to obtain the predicted trait expression. The ultimate goal is to focus on linking biobank traits with Alzheimer’s disease and building causal relationships instead of identifying genetic architecture.

      Regarding the technical aspects mentioned, we acknowledge the concerns about the use of Python 2.7 and the issues encountered during the package installation. We are in the process of updating the software to ensure compatibility with current versions of Python and to enhance the installation process with standard tooling and automated testing for a more user-friendly experience. We have provided tests for each portion of the software so the user can test if the software is working properly.

      Regarding the main results, they find what has largely been shown by others using the same data or similar data, which adds prima facie validity to the work. The portions of the work dealing with AD subgroups, pathology, biomarkers, and cognitive traits are of interest. I was puzzled why the authors suggested surprise regarding parental history and high cholesterol not being associated with MCI or cognitive composite scores, since this would seem like the likely fallout of selection of the WRAP cohort. The discussion paragraph that started "What's more, environmental factors may play a big role in the identified associations." confused me. I think what the authors are referring to is how selection, especially in a biobank dataset, can induce correlations, which is not what I think of as an environmental effect.

      We thank the reviewer very much for their comment. We're glad that our findings align with existing research using similar data, increasing the validity of our work and the proposed BADGERS algorithm. Your point about the lack of association between parental history, high cholesterol, and mild cognitive impairment (MCI) or cognitive composite scores in the WRAP cohort is well-taken. We agree that the selection criteria of the WRAP cohort may influence these findings, as it consists of individuals with a specific risk profile for Alzheimer's disease. This selection could indeed mitigate the observed association between these factors and cognitive outcomes, which we initially found surprising.

      Regarding the environmental factors, we appreciate your clarification and understand the confusion. Our intention was to discuss the potential for selection bias and confounding factors in biobank datasets for the identified associations, which might not necessarily be direct environmental effects.

      Overall, the work has merit, but I am left without a clear impression of the improvement in the approach over similar methods. Likewise, the results are interesting, but similar findings are described with the data that was used in the study, which are over 5 years old at the time of this review.

      We thank the reviewer for their endorsement of the BADGERS framework. We believe that our method, BADGERS, improves on existing approaches by effectively linking genetic data with the detailed phenotypic information in biobanks and large disease GWAS. This enhances our ability to detect associations without needing individual-level data, offering clearer insights while reducing issues like reverse causality and confounding factors.

      Even though the IGAP dataset is over five years old, it remains one of the largest publicly available datasets for Alzheimer’s Disease. Likewise, the UK Biobank is one of the largest publicly available human traits datasets, which researchers continue to use. These datasets' continued utility demonstrates their value in the research community. Additionally, the versatility of the BADGERS framework makes it suitable for future research investigating the relationship between human traits and various diseases using different datasets.

      Reviewer #2 (Public Review):

      Summary:

      Yan, Hu, and colleagues introduce BADGERS, a new method for biobank-wide scanning to find associations between a phenotype of interest, and the genetic component of a battery of candidate phenotypes. Briefly, BADGERS capitalizes on publicly available weights of genetic variants for a myriad of traits to estimate polygenic risk scores for each trait, and then identify associations with the trait of interest. Of note, the method works using summary statistics for the trait of interest, which is especially beneficial for running in population-based cohorts that are not enriched for any particular phenotype (ie. with few actual cases of the phenotype of interest).

      Here, they apply BADGERS on Alzheimer's disease (AD) as the trait of interest, and a battery of circa 2,000 phenotypes with publicly available precalculated genome-wide summary statistics from the UK Biobank. They run it on two AD cohorts, to discover at least 14 significant associations between AD and traits. These include expected associations with dementia, cognition (educational attainment), and socioeconomic status-related phenotypes. Through multivariate modelling, they distinguish between (1) clearly independent components associated with AD, from (2) by-product associations that are inflated in the original bivariate analysis. Analyses stratified according to APOE inclusion show that this region does not seem to play a role in the association of some of the identified phenotypes. Of note, they observe overlap but significant differences in the associations identified with BADGERS and other Mendelian randomization (MR), hinting at BADGERS being more powerful than classical top variant-based MR approaches. They then extend BADGERS to other AD-related phenotypes, which serves to refine the hypotheses about the underlying mechanisms accounting for the genetic correlation patterns originally identified for AD. Finally, they run BADGERS on a pre-clinical cohort with mild cognitive impairment. They observe important differences in the association patterns, suggesting that this preclinical phenotype (at least in this cohort) has a different genetic architecture than general AD.

      We thank the reviewer a lot for the conclusion and positive comments.

      Strengths:

      BADGERS is an interesting new addition to a stream of attempts to "squeeze" biobank data beyond pure association studies for diagnosis. Increasingly available biobank cohorts do not usually focus on specific diseases. However, they tend to be data-rich, opening for deep explorations that can be useful to refine our knowledge of the latent factors that lead to diagnosis. Indeed, the possibility of running genetic correlation studies in specific sub-settings of interest (e.g. preclinical cohorts) is arguably the most interesting aspect of BADGERS. Classical methods like LDSC or two-sample MR capitalize on publicly available summary statistics from large cohorts, or having access to individual genotype data of large cohorts to ensure statistical power. Seemingly, BADGERS provides a balanced opportunity to dissect the correlation between traits of interest in settings with small sample size in which other methods do not work well.

      We thank the reviewer a lot for the conclusion and positive comments.

      Weaknesses:

      However, the increased statistical power is only hinted at, and, for instance, they do not explore if LDSC would have identified these associations. Although I suspect that is the case, this evidence is important to ensure that the abovementioned balance is right. Finally, as discussed by the authors, the reliance on polygenic risk scoring necessarily undermines the causality evidence gained through BADGERS. In this sense, BADGERS provides an alternative to strict instrumental-variable based analysis, which can be particularly useful to generate new mechanistic hypotheses.

      We thank the reviewer for the comments. We understand the importance of comparing BADGERS to other methods. The comparison with LDSC, while not directly relevant to BADGERS’ causal inference aims, is indeed an interesting aspect to consider for future studies. In this paper, we focused on comparing BADGERS with Mendelian Randomization (MR), which shares its causal inference objective.

      As a result, BADGERS identified a total of 48 traits that reached Bonferroni-corrected statistical significance. In contrast, MR-IVW only identified nine traits with Bonferroni-corrected statistical significance. Among these nine traits, seven were also identified by BADGERS. This demonstrates that BADGERS holds higher power in detecting causal relationships.
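
      The kind of comparison described above (counting Bonferroni-significant traits per method and their overlap) can be sketched as follows. This is only an illustrative sketch: the trait names, p-values, and the helper `bonferroni_significant` are hypothetical and are not taken from the study or from the BADGERS software.

```python
# Hypothetical p-values for illustration only; these are NOT data from the study.
def bonferroni_significant(pvalues, alpha=0.05):
    """Return the traits whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(pvalues)
    return {trait for trait, p in pvalues.items() if p < threshold}


# Two hypothetical methods tested on the same four traits.
method_a = {"trait_1": 1e-6, "trait_2": 2e-5, "trait_3": 0.004, "trait_4": 0.30}
method_b = {"trait_1": 3e-4, "trait_2": 0.02, "trait_3": 0.001, "trait_4": 0.50}

sig_a = bonferroni_significant(method_a)  # significant under method A
sig_b = bonferroni_significant(method_b)  # significant under method B
shared = sig_a & sig_b                    # significant under both methods

print(len(sig_a), len(sig_b), len(shared))
```

      With four tests the per-test threshold is 0.05 / 4 = 0.0125, so the same trait can pass under one method and fail under the other; the overlap set is what the "seven of nine" statement above summarizes.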

      Regarding the use of polygenic risk scoring, we agree that it holds challenges in directly inferring causality. While BADGERS offers an innovative way to explore genetic correlations and can help generate new hypotheses about disease mechanisms, it does not replace the causal inferences that can be drawn from instrumental-variable-based analyses. Instead, it should be viewed as a complementary tool that can illuminate potential genetic relationships and guide further causal investigations.

      In summary, after 15 years of focus on diagnosis that would require having individual access to large patient cohorts, BADGERS can become an excellent tool to dig into trait heterogeneity, especially if it turns out to be more powerful than other available methodologies.

      We thank the reviewer a lot for the conclusion and positive comments.

    1. Author response:

      We thank the reviewers and editors for their time and effort reviewing and improving this manuscript. We also thank them for their support.

      Following the guidelines received from eLife, we submit here the preliminary author response to the Public Review with our planned changes to the manuscript.

      Reviewer 1.

      Comment 1. Issue on cross-reactivities of MafB antibodies.

      We are confident that our description of MafB V1 interneurons is correct despite some cross-reactivity with one of the antibodies used. We test all antibodies we use, and unfortunately, we found an inverse relationship between sensitivity and specificity with the two MafB antibodies used in this study. We chose for quantification the one with highest sensitivity, despite the presence of some cross-reactivity in interneurons other than the dorsal and ventral (Renshaw) V1 populations we focus on. The dorsal and ventral (Renshaw) V1 populations we describe here are also reactive with the more specific antibody (although with lower sensitivity) and both are neatly labeled in a MafB-GFP reporter mouse as described in Figure 3. We will add an image to the supplement with MafB-GFP V1 Interneurons at P5 showing the immunoreactivity of both MafB antibodies as suggested by the reviewer. We agree with the reviewer that this will give further support to the characterization of these populations by either immunocytochemical or genetic means at P5.

      Unfortunately, we cannot show lack of immunoreactivity for MafB antibodies in MafB GFP/GFP knockout mice at P5 because MafB global KOs die at birth as a result of respiratory failure. This is due to removal of inhibitory interneurons in brainstem centers critical for respiration (Blanchi et al. 2003 MafB deficiency causes defective respiratory rhythmogenesis and fatal central apnea at birth. Nat Neurosci. 6(10):1091-100. doi: 10.1038/nn1129. PMID: 14513037). This is why we used tissues from late embryos for testing antibody specificity in KO spinal cords. We will make this clearer in the text.

      Comment 2. Overlap of V1 clades with lineage labeled Foxp2-V1s at P5.

      We collected the data requested by the reviewer for P5 Foxp2-V1 interneurons and this will be added to an updated version of this figure. In comparison to the results with the OTP mouse, we only found marginal overlap at P5 with Renshaw cells, Pou6f2, and Sp8 V1s in our genetic intersection to label Foxp2-V1s. We apologize for not showing the data. We will make this clearer.

      Reviewer 2.

      Comment 1. Paper VERY hard to read.

      We will make every effort to make the paper more readable by moving methodological discussions to supplementary materials. We strive to keep our methods as rigorous, clean, and replicable as possible, and that sometimes requires lengthy explanations of the details and reasoning behind our approaches. We will make sure this does not distract from the principal scientific messages we want to convey. We agree with the reviewer that these should be emphasized over methodological detail, and we will correct any mistakes in the text that lead to confusion. Thank you for pointing out this problem that we hope to correct in a new version.

      Why focus on Foxp2 V1s? We focus on the Foxp2 population for several reasons: 1) This is the largest population of V1s, and it is the one with a close spatial association to motoneurons, in particular limb motoneurons; 2) Given previous results (Benito-Gonzalez and Alvarez, 2012, cited in bibliography) it likely includes many reciprocal inhibitory interneurons; 3) We do not have the mice for studying the Pou6f2 (or Sp8) population, but similar studies are now being carried out in the Bikoff lab.

      Comment 2. Lack of functional studies.

      Functional studies are currently being carried out, both during development of limb function in postnatal mice as well as in adult animals. These studies required the creation of several new animal models and reagents. As with the present manuscript, we thoroughly characterize all animals and methods. This takes time and space. These studies are beyond the goals and length of the current manuscript, but we agree with the reviewer that these are the critical next experiments that need to be performed. We are now finalizing studies on the role of Foxp2-V1 interneurons in the postnatal development of limb coordination and validating approaches for silencing them in the adult while also optimizing behavioral assays and recordings. The data presented here on Foxp2-V1 interneuron heterogeneity and relations with limb motoneurons gives the necessary context for raising stronger hypotheses and aiding in the interpretation of future results in functional studies.

      Synapse counts.

      We respectfully disagree with the reviewer’s comments on our synapse density estimates. To fully explain the reasons and prevent any ambiguity, we need to focus on detailed methodological aspects. We apologize for the lengthy response. Two major issues were raised:

      (1) Focus on the cell body.

      The issue raised by the reviewer of potential synapses in distal dendrites from V1 subgroups not projecting proximally was already discussed in the text. The reason we focus on the cell body is that 1) it is not feasible to study the full dendritic arbor of so many different types of motoneurons and 2) it allows us to identify V1 subpopulations that likely exert stronger modulation of motoneuron firing by targeting the proximal somatodendritic membrane. The fact that synaptic organization on motoneurons is similar on cell bodies and proximal dendrites (first 100 µm) suggests that inputs from V1 clades other than Renshaw cells are likely further away, and therefore there is limited benefit in including analyses of proximal dendrites in these data. Additionally, dendrites would be difficult to consistently follow in ChAT-immunostained tissue. We are currently using novel viral approaches to obtain labeling of single motoneurons and their full dendritic trees for more in-depth dendritic analyses in the mouse. The classical method based on single-cell in vivo intracellular labeling using micropipettes is presently very low yield in the adult mouse. We are experienced with detailed single motoneuron dendritic arbor analyses in cat and rat motoneurons (Alvarez et al. 1997 Cell-type specific organization of glycine receptor clusters in the mammalian spinal cord. J Comp Neurol. 379(1):150-70; Alvarez et al., 1998 Distribution of 5-hydroxytryptamine-immunoreactive boutons on alpha-motoneurons in the lumbar spinal cord of adult cats. J Comp Neurol. 393(1):69-83; Rotterman et al., 2014. Normal distribution of VGLUT1 synapses on spinal motoneuron dendrites and their reorganization after nerve injury. J Neurosci. 34(10):3475-92. doi: 10.1523/JNEUROSCI.4768-13.2014). Based on this experience, we do not believe it is feasible to include similar analyses to compare all motor columns throughout 6 segments of the spinal cord in this study. We agree with the reviewer that these are important data sets that need to be collected and they are planned for future experiments. These analyses will address different questions than the ones posed and answered in our current manuscript.

      (2) Number of motoneurons analyzed.

      We disagree with the reviewer's assessment that our conclusions might be biased because of the numbers of motoneurons analyzed. We sampled a total of 295 motoneurons in 5 different mice (117 LMC/HMC, 99 MMC, and 79 PGC motoneurons), and we used stringent methods for synapse detection. Due to a technical error, Mouse 3 lacked data in upper lumbar and Th13, but all other mice included data in almost all motor columns and segments. We disagree with the characterization that these are small samples. For full transparency, all motoneurons analyzed are identified in Figure 6D. Each of the nearly 300 motoneuron cell bodies was carefully reconstructed through several optical planes to obtain an accurate estimate of synapse density.

      More automatic methods in current use in the literature sometimes analyze larger samples, but our methods are designed to avoid methodological biases inherent to these automatic methods. We do not use image thresholding to extract synaptic contacts because it lacks accuracy in identifying single synapses. Thus, estimates using this technique frequently refer to coverage, not synapse density. In addition, it is hard to keep threshold criteria consistent across multiple optical planes to analyze enough section thickness to estimate a motoneuron surface. This is because tissue light diffraction alters thresholding levels continuously across optical planes. Thus, many authors present data as linear densities across a perimeter (in a single plane), measuring many cells in one field in one plane. We avoid cell body linear densities (or coverage) because they bias counts towards larger synapses that have a higher probability of being present at any single confocal plane. Moreover, estimates along a surface reduce synapse sampling variability and better estimate synaptic coverage compared to estimates derived from analyzing single cross-sections. We also confirm each genetically labeled varicosity as a likely synapse by accumulation of VGAT. In this manner we restrict our counts to synaptic boutons and not axons or intervaricose regions. Previously, we used bassoon to show the accuracy of our methods (Wootz et al. 2013 Alterations in the motor neuron-Renshaw cell circuit in the Sod1(G93A) mouse model. J Comp Neurol. 521(7):1449-69. doi: 10.1002/cne.23266). That means that our densities are true synaptic densities, which are difficult to extract from automatic methods that estimate fluorescence coverage over larger samples of somatic profiles but fail to individualize synapses and frequently bias results. These bulk methods introduce significant confounds in data interpretation: Is higher coverage due to bigger synapses or more synapses? Do thresholded structures represent true synapses or also include axons? To what extent does sub- or over-thresholding in different planes affect identification of structures in contact with the motoneuron surface? We avoid all these problems. Not surprisingly, a nested ANOVA demonstrated consistent significant differences among motor columns and segments.

      In summary, while more automatic methods allow larger samples, they disregard true synaptic densities and are based on thresholding methods with high variability across motoneurons, optical planes and histological sections, and they therefore require much larger numbers of motoneurons to overcome their many biases and sources of error. This is not our case. Our sample size is large enough considering the accuracy of our methods and data quality. This is demonstrated by the consistency of statistical results across motor columns in different segments and mice.

      Comment 3. Possibility of anterograde transsynaptic labeling from primary afferents infected with rabies virus.

      This is a fair question that we did not clearly explain. The reviewer compares our results with those of Pimpinella et al., 2022. The methods used are different. To obtain anterograde tracing, these authors used Cre lines to achieve high levels of expression of TVA and RV glycoprotein in specific subtypes of sensory neurons including proprioceptors. Then EnvA-coated Rabies virus was injected directly inside the spinal cord for cell-type specificity. This method transsynaptically labeled, in the anterograde direction, interneurons receiving inputs from specific types of sensory afferents, but the method does not have the muscle specificity required in our analyses. In our case, we used intramuscular injections at P5 of AAV1-G for transcomplementation with Rabies virus delta G injected in the same muscles later, at P15. In previous studies in which we used the RV-delta G virus without AAV1-G, we analyzed motoneuron and primary afferent infection rates and found both to be considerably reduced with injection age. In our hands, there is almost no RV infection of primary afferents when Rabies virus is injected i.m. at P15, but there is some limited motoneuron infection remaining (which we used to our advantage in this paper to avoid primary afferent and developmental confounds).

      Unfortunately, these methodological studies are presently communicated only in abstract form (GomezPerez et al., 2015 and 2016; Program Nos. 242.08 and 366.06). Therefore, we will add to the supplementary information some images from sections serial to those illustrated in the paper, which will show a few “start” LG motoneurons that remained labeled at this survival time point and the lack of any dorsal horn primary afferent labeling. This is consistent with our as yet unpublished data, which are based on a larger number of animals and more extensive time courses.

      Comment 4. Temporal resolution of birth-dating.

      We agree with the reviewer, and that is the reason we explicitly discuss that temporal resolution is not perfect (we also add a few more caveats that affect temporal resolution beyond the reviewers’ comments). However, the method is good enough to differentiate temporal sequences of neurogenesis with close to 12-hour resolution, once enough animals are analyzed to compensate for methodological temporal overlaps. That is the reason for our Figure 1D.

      Reviewer 3

      Comment 1. Text is too long and main message buried in technical details.

      We agree and, as in our response to the first comment of Reviewer 2, we will revise the writing to make it more straightforward while moving some of the information on methods and technical discussion to supplementary materials. As demonstrated by Reviewer 2's comments, methodological discussions are still important to best interpret the data presented in this paper.

    1. eLife assessment

      This fundamental study for the first time defines genetically the role of the Clock gene in basal metazoa, using the cnidarian Nematostella vectensis. With convincing evidence, the study provides insight into the early evolution of circadian clocks. Clock in this species is necessary for daily rhythms under constant conditions, but not under a rhythmic light/dark cycle, suggesting that the major role of the circadian oscillator in this species could be a stabilizing function under non-rhythmic environmental conditions.

    2. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      In this revised manuscript Aguillon and collaborators convincingly demonstrate that CLK is required for free-running behavioral rhythms under constant conditions in the Cnidarian Nematostella. The results also convincingly show that CLK impacts rhythmic gene expression in this organism. This original work thus demonstrates that CLK was recruited very early during animal evolution in the circadian clock mechanism to optimize behavior and gene expression with the time-of-day. The manuscript could still benefit from some improvements so that it is more accessible for a wide readership.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Aguillon and collaborators have deeply revised, and in the process significantly improved, the presentation of their interesting results with the first Cnidarian circadian gene mutant. Results now very convincingly demonstrate that CLK is required for free-running behavioral rhythms under constant conditions. The results also now more convincingly show that CLK impacts rhythmic gene expression, although interpretation of the transcriptomics data is not straightforward. I think there are still improvements needed to make the manuscript more accessible. The authors need to keep in mind that a broad audience will read their report, not just chronobiologists. I have listed below several issues that I think should be addressed, and some editing suggestions.

      General comment to Editor and Reviewers:

      We are genuinely grateful to both reviewers and editors for all the feedback, which helped us make the best of our data and question our analysis to the point that we redefined our approach and ended up with a great article we are proud of. Only the authors' names are visible on the article and, considering how much the reviewing system helps to improve the research, this seems almost unfair. As such, we thank all of you and really appreciate the new eLife system. Bravo all.

      Abstract:

      (1) Line 40: It should read "transcript levels" instead of "transcription". There is no measurement of transcription rates in this manuscript, only mRNA levels.

      Modified accordingly.

      (2) Line 41: the authors mention "constant light". Does this refer to previous work? Their data in Figure 4 were in constant darkness, not in LL.

      Modified accordingly.

      (3) Line 46 and throughout the manuscript, the allelic nomenclature is not standard. 1-/- seems to indicate there are two different alleles. Since the allele might not be a null, I would suggest simply using 1/1, or perhaps delta/delta since the mutation results in a truncated CLK.

      NvClk1-/- became NvClkΔ/Δ, except in the .xls supplementary table, where the mutant kept the NvClk-/- nomenclature. It is not possible to replace only part of a word with a different font there, so generating the delta sign would have to be done one entry at a time.

      (4) The last sentence of the abstract needs to be rephrased, as it suggests that CLK evolved to maintain circadian rhythms under constant conditions. Constant conditions very rarely exist on Earth, and thus cannot be an evolutionary driving force. Different explanations have been proposed on why a self-sustained clock is the evolutionary solution to timekeeping, but the purpose of the clock and of clock genes is not to maintain oscillations in constant conditions. Actually, this sentence conflicts with the title.

      Modified to: the Clock gene has evolved in cnidarians to sustain 24-hour rhythmic physiology and behavior in absence of diel environmental conditions. From my current understanding, you are right: the purpose of clock genes is not to maintain oscillations in constant conditions (this is simply the result of the experiment), but to synchronize physiology to the day/night rhythm, and surely to sustain 24 h oscillations in case the environment challenges the perception of the diel cues. DD or LL is just an artificial experimental design to reveal the endogenous time-keeping pacemaker.

      Results:

      (1) Line 148 and elsewhere in the MS: I would not use the word "lower" or "higher" to qualify acrophases. I would suggest advanced/delayed or earlier/later.

      Modified accordingly.

      (2) Line 157-9: The introductory sentence does not clearly present the rationale for the 6/6 experiments.

      We modified the paragraph accordingly: The presence of a 24-hour rhythm of NvClkΔ/Δ polyps under LD conditions could be attributed to either a direct light-response or the partial functioning of the circadian clock due to the nature of the mutation….

      (3) At the end of the behavior section, or perhaps at the end of each paragraph in this section, it would be helpful to have a summary of the results and to explain their interpretation more clearly. The authors need to guide the readers, particularly non-chronobiologists, so that they can understand what the really neat data that were obtained mean. For example, what does it mean that the acrophase is different between mutant and wild-type, why are Clk mutants rhythmic under LD12/12 or 6/6, etc.

      We added a concluding sentence to help non-specialists understand each result.

      (4) Line 172 and elsewhere: "true rhythmic genes" sounds odd to me. Either they are, or they are not rhythmic.

      Modified to “rhythmic genes.”

      (5) Paragraph starting with line 184: I do not follow what is important about the number of genes per time cluster. What does it tell us, beyond the simple fact that fewer genes are rhythmic in the Clk mutants?

      We rewrote the result paragraph to make it clearer why we performed this clustering analysis. This clustering analysis became Extended Data Fig.2 with modification of the figures (see my comments in your review about Figure 3).

      (6) Line 197: The authors need to explain what they saw with circadian clock genes and their expression in Clk mutants. In some cases, amplitude increased in LD. This surprising observation deserves some explanation. "Complex regulatory effect" is too vague.

      We replaced the vague “complex regulatory effect” with a more thorough description of Figure 3a.

      (7) Line 198-203: Again, help the reader understand the significance of these observations.

      We rewrote the paragraph to help the reader to better understand the significance of these observations.

      Discussion:

      (1) Line 236-40. Careful with the use of -/-, which implies that an allele is a null. The first Clk mutants in mammals and flies, which the authors refer to, were actually dominant negatives.

      I went over the citations we used for this paragraph, and the first mutation in fly, dClkar, is a null, not a dominant negative. Flies are still rhythmic in the dark. Unless there is an older mutation? However, you are right that the first mutation identified in mouse was a dominant negative with loss of rhythmicity, while the gene deletion did not show any effect on behavior, suggesting compensation by a paralog. I removed two references which were not relevant to the discussion.

      (2) Line 265-268 are not very clear. Do the authors mean that the lack of overlap for non-circadian pacemaker genes is because of different experimental conditions? What would be those differences? It is reassuring that the Leach/Reitzel study and the present one share pacemaker genes as rhythmic, but it is also surprising that there is almost no overlap beyond these genes. How robust are those other rhythms compared to circadian clock genes?

      We revised this paragraph and raised major points regarding the rearing conditions of the polyps in the different labs and their potential genetic differences, which could explain these discrepancies.

      (3) Line 270. I am not sure "compensation" is the right word, since there is no overlap between the rhythmic genes in mutants under LD and wild-type under either LD or DD. Also, saying on line 273 that the transcriptional pattern is not fully reproduced is a rather striking understatement, given the absence of rhythmic gene overlap.

      We rewrote the paragraph accordingly. We replaced it with “alternative way to drive rhythmicity under LD condition”.

      (4) Line 279. The authors mention the possibility of false positives. Based on the FDR, are there more rhythmic genes than expected by chance?

      False positives are a risk to consider when multiple-testing correction is not performed. We added to the results paragraph the number of rhythmic genes identified with BH.Q or p.adj, which are the multiple-testing-corrected values for the algorithms we used (RAIN and JTK).
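
      To illustrate what a multiple-testing-corrected value is, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment in plain Python. It is an illustration only: the function name and the example p-values are hypothetical, and it is not the code used by JTK or RAIN, whose BH.Q and p.adj outputs are computed inside those packages.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (not data from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

      A gene would then be called rhythmic only if its adjusted value falls below the chosen false discovery rate, which bounds the expected fraction of false positives among the genes retained.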

      (5) Line 279-82. The reference to the Ray study is rather obscure. What is the point the authors are trying to make here?

      In the end, we removed this reference from the article and modified the paragraph of the discussion. Indeed, the discussion around the Ray study did not give an interesting direction for discussing our results and analysis approach.

      (6) Line 284: define BHQ and p.adj

      Defined and referenced.

      (7) The way Lines 283-288 are worded does not provide a good rationale for how transcriptional rhythms were analyzed. The idea to combine two different approaches (JTK and RAIN) to be selective with rhythmicity was great. The authors need however to justify these choices in a more convincing manner. The goal is to detect rhythmic genes in a reliable manner, irrespective of the number of rhythmic genes observed. Also, explaining the choice of methodology belongs to the result section.

      We explained our choice of methodology and moved it to the result section as suggested.

      (8) Line 292-3. There are known mechanisms that explain how transcriptional time clusters are generated. In particular, the use of interlocked feedback loop with antiphase peaks of transcriptions is well documented. Actually, it seems to me the clustering shown in Fig 4 might hint at such a mechanism.

      Indeed, you are right: the clustering shown in Fig 3 (former Fig 4) hints at such a mechanism.

      Figures:

      Figure 2: Define relative amplitude

      We added the definition of relative amplitude to the results, if this is what you asked for.

      Figure 3: Some of the cycles look odd (first row of graphs in panel C). Why would the first and last data point be so different in three of these graphs?

      We decided to modify this figure because we realized it was neither informative nor objective enough, as we had selected a few “representatives” from among multiple patterns. In the new figure we combined the cluster analysis with the behavior. Readers can now pick a cluster according to a specific behavioral activity level (or ZT/CT) and find the “genes of potential interest” in supp. Table 4. Generally speaking, however, this figure does not further explain the consequences of the mutation, so we moved it to Extended Data Fig. 2.

      Figure 4: define the color coding in the correlation panels (blue to red)

      These values from -1 to 1 are the Pearson correlation values. Now indicated on the figure with the color coding legend.
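
      For readers unfamiliar with the statistic behind the color scale, the following is a small self-contained sketch of how a Pearson correlation coefficient is computed. The function name and the example profiles are hypothetical and are not taken from the study's data or code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression profiles over four time points.
in_phase = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
anti_phase = pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

      Values near 1 (one end of the color scale) indicate profiles that rise and fall together, values near -1 indicate opposite profiles, and values near 0 indicate no linear relationship.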

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable contribution to cardiac arrhythmia research by demonstrating long noncoding RNA Dachshund homolog 1 (lncDACH1) tunes sodium channel functional expression and affects cardiac action potential conduction and rhythms. Whereas the evidence for functional impact of lncDACH1 expression on cardiac sodium currents and rhythms is convincing, biochemical experiments addressing the mechanism of changes in sodium channel expression and subcellular localization are incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors show that a long non-coding RNA, lncDACH1, inhibits sodium currents in cardiomyocytes by binding to and altering the localization of dystrophin. The authors use a number of methodologies to demonstrate that lncDACH1 binds to dystrophin and disrupts its localization to the membrane, which in turn downregulates NaV1.5 currents. Knockdown of lncDACH1 upregulates NaV1.5 currents. Furthermore, in heart failure, lncDACH1 is shown to be upregulated, which suggests that this mechanism may have pathophysiological relevance.

      Strengths:

      (1) This study presents a novel mechanism of Na channel regulation which may be pathophysiologically important.

      (2) The experiments are comprehensive and systematically evaluate the physiological importance of lncDACH1.

      Weaknesses:

      (1) What is indicated by the cytoplasmic level of NaV1.5, a transmembrane protein? The methods do not provide details regarding how this was determined. Do the authors mean NaV1.5 retained in various intracellular organelles?

      Thank you for the good suggestion. Our study showed that Nav1.5 was transferred to the cell membrane by the scaffold protein dystrophin in response to regulation by LncDACH1, but not all Nav1.5 in the cytoplasm was transferred to the cell membrane. Therefore, the cytoplasmic level of Nav1.5 represents the Nav1.5 protein that is not transferred to the cell membrane but stays in the cytoplasm and in various organelles within the cytoplasm when Nav1.5 is regulated by LncDACH1.

      (2) What is the negative control in Fig. 2b, Fig. 4b, Fig. 6e, Fig. 7c? The maximum current amplitude in these seem quite different. -40 pA/pF in some, -30 pA/pF in others and this value seems to be different than in CMs from WT mice (<-20 pA/pF). Is there an explanation for what causes this variability between experiments and/or increase with transfection of the negative control? This is important since the effect of lncDACH1 is less than 50% reduction and these could fall in the range depending on the amplitude of the negative control.

      Thank you for the insightful comment. The negative controls in Fig. 2b, Fig. 4b and Fig. 6e are primary cardiomyocytes transfected with empty plasmids. The negative control in Fig. 7c is cardiomyocytes of wild-type mice injected with control virus. When we prepare cells before the patch-clamp experiments, the transfection efficiency of the transfection reagent used in different batches of cells, as well as the different cell sizes, ultimately lead to differences among CMs.

      (3) NaV1.5 staining in Fig. 1E is difficult to visualize and to separate from lncDACH1. Is it possible to pseudocolor differently so that all three channels can be visualized/distinguished more robustly?

      Thank you for the good suggestion. We have re-added color to the original image to distinguish between the three channels.

      Author response image 1.

      (4) The authors use shRNA to knockdown lncDACH1 levels. It would be helpful to have a scrambled ShRNA control.

      Thank you for the insightful comment. The control group we used was actually the scrambled shRNA, but we labeled the control group as NC in the article, which may have caused the misunderstanding.

      (5) Is there any measurement of the baseline levels of LncDACH1 in wild-type mice? It seems quite low, and yet there is a substantial increase in NaV1.5 currents upon knocking down LncDACH1. By comparison, the level of LncDACH1 seems to be massively upregulated in TAC models. Have the authors measured NaV1.5 currents in these cells? Furthermore, does LncDACH1 knockdown evoke a larger increase in NaV1.5 currents?

      Thank you for the insightful comment.

      (1) The baseline protein levels of LncDACH1 in wild-type mice and LncDACH1-CKO mice have been verified in a previously published article (Figure 3) (Hypertension. 2019;74:00-00. DOI: 10.1161/HYPERTENSIONAHA.119.12998).

      Author response image 2.

      (2) We did not measure the Nav1.5 currents in cardiomyocytes of the TAC model mice in this article, but in another published paper we found that the Nav1.5 current in the TAC model mice was remarkably lower than that in wild-type mice (Figure 4) (Gene Ther. 2023 Feb;30(1-2):142-149. DOI: 10.1038/s41434-022-00348-z).

      Author response image 3.

      This is consistent with the results in this article: LncDACH1 levels are significantly upregulated in the TAC model, and in the LncDACH1-TG group the Nav1.5 current is significantly reduced after LncDACH1 upregulation (Figure 3).

      Author response image 4.

      (6) What do error bars denote in all bar graphs, and also in the current voltage relationships?

      Thank you for the good comment. All the error bars represent the mean ± SEM. They represent the fluctuation of all individuals of a set of data based on the average value of this set of data, that is, the dispersion of a set of data.
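
      As a concrete reference for the mean ± SEM convention, here is a minimal sketch using only the Python standard library. It assumes the usual definition, SEM = sample standard deviation / sqrt(n); the function name and the example current densities are hypothetical, not data from the study.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two measurements")
    mean = statistics.fmean(values)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical peak current densities (pA/pF) from three cells.
mean, sem = mean_sem([-30.0, -32.0, -34.0])
```

      Because the SEM shrinks as n grows, bars drawn this way describe the precision of the estimated mean, so the sample size behind each bar is needed to interpret them.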

      Reviewer #2 (Public Review):

      This manuscript by Xue et al. describes the effects of a long noncoding RNA, lncDACH1, on the localization of Nav channel expression, the magnitude of INa, and arrhythmia susceptibility in the mouse heart. Because lncDACH1 was previously reported to bind and disrupt membrane expression of dystrophin, which in turn is required for proper Nav1.5 localization, much of the findings are inferred through the lens of dystrophin alterations.

      The results report that cardiomyocyte-specific transgenic overexpression of lncDACH1 reduces INa in isolated cardiomyocytes; measurements in whole heart show a corresponding reduction in conduction velocity and enhanced susceptibility to arrhythmia. The effect on INa was confirmed in isolated WT mouse cardiomyocytes infected with a lncDACH1 adenoviral construct. Importantly, reducing lncDACH1 expression via either a cardiomyocyte-specific knockout or using shRNA had the opposite effect: INa was increased in isolated cells, as was conduction velocity in heart. Experiments were also conducted with a fragment of lncDACH1 identified by its conservation with other mammalian species. Overexpression of this fragment resulted in reduced INa and greater proarrhythmic behavior. Alteration of expression was confirmed by qPCR.

      The mechanism by which lncDACH1 exerts its effects on INa was explored by measuring protein levels from cell fractions and immunofluorescence localization in cells. In general, overexpression was reported to reduce Nav1.5 and dystrophin levels and knockout or knockdown increased them.

      Thank you for summarizing our work, and thank you very much for your appreciation of it.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report the first evidence of Nav1.5 regulation by a long noncoding RNA, LncRNA-DACH1, and suggest its implication in the reduction in sodium current observed in heart failure. Since no direct interaction is observed between Nav1.5 and the LncRNA, they propose that the regulation is via dystrophin and targeting of Nav1.5 to the plasma membrane.

      Strengths:

      (1) First evidence of Nav1.5 regulation by a long noncoding RNA.

      (2) Implication of LncRNA-DACH1 in heart failure and mechanisms of arrhythmias.

      (3) Demonstration of LncRNA-DACH1 binding to dystrophin.

      (4) Potential rescuing of dystrophin and Nav1.5 strategy.

      Thank you very much for your appreciation of our work.

      Weaknesses:

      (1) Main concern is that the authors do not provide evidence of how LncRNA-DACH1 regulates Nav1.5 protein level. The decrease in total Nav1.5 protein by about 50% seems to be the main consequence of the LncRNA on Nav1.5, but no mechanistic information is provided as to how this occurs.

      Thank you for the insightful comment.

      (1) The mechanism of the whole article is as described in the discussion at the end of the article: LncDACH1 binds to dystrophin and thus inhibits membrane trafficking of Nav1.5. Dystrophin is a well-characterized Nav1.5 partner protein. It indirectly interacts with Nav1.5 via syntrophin, which binds with the C-terminus of dystrophin and with the SIV motif on the C-terminus of Nav1.5 (Circ Res. 2006;99:407-414. doi: 10.1161/01.RES.0000237466.13252.5e) (Circulation. 2014;130:147-160. doi: 10.1161/CIRCULATIONAHA.113.007852).

      And we performed pulldown and RNA immunoprecipitation experiments to verify it (Figure 1).

      Author response image 5.

      (2) Then we found that overexpression of lncDACH1 increased the ubiquitination of Nav1.5, which explains the downregulation of total Nav1.5 protein (Online Supplementary Figure 12).

      Author response image 6.

      (3) Lastly, we found that lncDACH1 failed to pull down Nav1.5 and anti-Nav1.5 did not precipitate lncDACH1 (Supplementary Fig. 1).

      Author response image 7.

      These data indicated that lncDACH1 does not interact with Nav1.5 directly. It participates in the regulation of Nav1.5 by binding to dystrophin. Cytoplasmic Nav1.5 that fails to target the plasma membrane may be quickly recognized and then degraded by these ubiquitination enzymes.

      (2) The fact that the total Nav1.5 protein is reduced by 50%, which is similar to the reduction in membrane Nav1.5, questions the main conclusion of the authors implicating dystrophin in the reduced Nav1.5 targeting. The reduction in membrane Nav1.5 could simply be due to the reduction in total protein.

      Thank you for the insightful comment. We do not rule out the possibility that the reduction in membrane Nav1.5 may be due to the reduction in total protein, but we do not think this is the main mechanism. Our data indicate that the membrane and total protein levels of Nav1.5 were reduced by 50%. However, cytoplasmic Nav1.5 was increased in the hearts of lncDACH1-TG mice relative to WT controls, rather than reduced like the membrane and total protein (Figure 1).

      Author response image 8.

      Therefore, we think the main mechanism of the whole article is as described in the discussion at the end of the article: LncDACH1 binds to dystrophin and thus inhibits membrane trafficking of Nav1.5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Fig. 6E the error bars are only in one direction for cF-lncDACH1. It seems that this error overlaps for NC and cF-lncDACH1 at several voltages, yet it is marked as statistically significant. Also in Fig. 7C, what statistical test was used? Do the authors account for multiple comparisons?

      Thank you for the insightful comment.

      (1) We have recalculated the two sets of data and confirmed that the difference between NC and cF-lncDACH1 in Fig. 6E is indeed statistically significant. The overlap in the picture may be only visual.

      (2) The data in Fig. 7C are expressed as mean ± SEM. Statistical analysis was performed using unpaired Student’s t test or One-Way Analysis of Variance (ANOVA) followed by Tukey’s post-hoc analysis.

      (2) line 57, "The Western blot" remove "The"

      Sorry for the mistake. We have corrected it.

      (3) line 61, "The opposite data were collected" It is unclear what is meant by opposite.

      Sorry for the mistake. We have corrected it.

      (4) Lines 137-140. This sentence is complex, I would simplify as two sentences.

      Sorry for the mistake. We have corrected it.

      (5) Line 150, "We firstly validated" should be "we first validated"

      Sorry for the mistake. We have corrected it.

      (6) Line 181, "Consistently, the membrane" Is this statement meant to indicate that the experiments yielded a consistent results or that this statement is consistent with the previous one? In either case, this sentence should be reworded for clarification.

      Sorry for the mistake. We have corrected it.

      (7) Line 223, "In consistent, the ex vivo" I am not sure what In consistent means here.

      Thank you for the good suggestion. We mean that the ex vivo results are consistent with the in vivo results. We have corrected it to make it clearer.

      (8) Line 285. "a bunch of studies" could be rephrased as "multiple studies"

      Sorry for the mistake. We have corrected it.

      (9) Line 299 "produced no influence" Do you mean produced no change?

      Thank you for the good suggestion. As you put it, we mean that it produced no change.

      (10) Line 325 "is to interact with the molecules" no need for "the molecules"

      Sorry for the mistake. We have corrected it.

      (11) lines 332-335. This sentence is very confusing.

      Thank you for the insightful comment. We have corrected it.

      (12) Lines 341-342. It is unnecessary to claim primacy here.

      Thank you for the good suggestion. We have removed this sentence.

      (13) Line 373. "Sodium channel remodeling is commonly occured in" perhaps rephrase as occurs commonly

      Thank you for the insightful comment. We have corrected it.

      Reviewer #2 (Recommendations For The Authors):

      Critique

      (1) Aside from some issues with presentation noted below, these data provide convincing evidence of a link between lncDACH1 and Na channel function. The identification of a lncDACH1 segment conserved among mammalian species is compelling. The observation that lncDACH1 is increased in a heart failure model provides a plausible hypothesis for disease mechanism.

      Thank you very much for your appreciation of our work.

      (2) Has a causal link between dystrophin and Na channel surface expression been made, or is it an argument based on correlation? Is it possible to rule out a direct effect of lncDACH1 on Na channel expression? A bit more discussion of the limitations of the study would help here.

      Thank you for the insightful comment.

      (1) Dystrophin is a well-characterized Nav1.5 partner protein. It indirectly interacts with Nav1.5 via syntrophin, which binds with the C-terminus of dystrophin and with the SIV motif on the C-terminus of Nav1.5 (Circ Res. 2006;99:407-414. doi: 10.1161/01.RES.0000237466.13252.5e) (Circulation. 2014;130:147-160. doi: 10.1161/CIRCULATIONAHA.113.007852).

      Author response image 9.

      (2) We performed pulldown and RNA immunoprecipitation experiments. The data showed that lncDACH1 failed to pull down Nav1.5 and anti-Nav1.5 did not precipitate lncDACH1 (Online Supplementary Figure 11). These data indicated that lncDACH1 does not interact with Nav1.5 directly (Supplementary Fig. 1).

      Author response image 10.

      (3) What normalization procedures were used for qPCR quantification? I could not find these.

      Thank you for the good suggestion. The expression levels of mRNA were calculated using the comparative cycle threshold (Ct) method (2^−ΔΔCt). Each data point was then normalized to ACTIN as an internal control in each sample. The final results are expressed as fold changes by normalizing the data to the values from control subjects. We have added the normalization procedures in the methods section of the article.
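
      The arithmetic of the comparative Ct method can be sketched as follows, assuming the standard formulation in which ΔCt = Ct(target) − Ct(reference gene) and ΔΔCt = ΔCt(sample) − ΔCt(control). The function name and the Ct values are hypothetical and are given for illustration only.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the comparative Ct (2^-ddCt) method."""
    # Normalize the target gene to the internal reference (e.g. actin) in each group.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control group.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample than in the control, with an unchanged reference gene.
example = fold_change(24.0, 18.0, 26.0, 18.0)  # ddCt = -2, fold change = 4.0
```

      A ΔΔCt of −2 corresponds to a four-fold increase relative to control under the method's assumption of roughly 100% amplification efficiency for both target and reference.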

      (4) In general, I found the IF to be unconvincing - first, because the reported effects were not very apparent to me, but more importantly, because only exemplars were shown without quantification of a larger sample size.

      Thank you for the good suggestion. Accordingly, we quantified the immunostaining data. The data have been included in Supplementary Figure 2-16. The sample size is given in each caption.

      Author response image 11.

      Fluorescence intensity of lncDACH1, dystrophin and Nav1.5 in isolated cardiomyocytes of lncDACH1-TG mice. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=9 for dys. N=8 for Nav1.5. P<0.05 versus WT group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=9. P<0.05 versus WT group. e, Fluorescence in situ hybridization (FISH) images of LncDACH1. N=10. *P<0.05 versus WT group. P-values were determined by unpaired t test.

      Author response image 12.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocyte overexpressing lncDACH1. a,b, Membrane levels of dystrophin and Nav1.5. N=9. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=9 for dys. N=12 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 13.

      Fluorescence intensity of lncDACH1, dystrophin and Nav1.5 in isolated cardiomyocytes of lncDACH1-cKO mice. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=12 for dys. N=8 for Nav1.5. P<0.05 versus WT group. c,d, Distribution of cytoplasm levels of dystrophin and Nav1.5. N=12. P<0.05 versus WT group. e, Fluorescence in situ hybridization (FISH) images of LncDACH1 expression. N=8. *P<0.05 versus WT group. P-values were determined by unpaired t test.

      Author response image 14.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocytes after knockdown of lncDACH1. a,b, Distribution of membrane levels of dystrophin and Nav1.5. N=11 for dys. N=8 for Nav1.5. P<0.05 versus NC group. c,d, Distribution of cytoplasm levels of dystrophin and Nav1.5. N=12 for dys. N=9 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 15.

      Fluorescence intensity of dystrophin and Nav1.5 in isolated cardiomyocytes overexpressing cF-lncDACH1. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=9 for dys. N=7 for Nav1.5. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=6 for dys. N=7 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 16.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocytes overexpressing cF-lncDACH1. a,b, Membrane levels of dystrophin and Nav1.5. N=10 for dys. N=11 for Nav1.5. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=7 for dys. N=6 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 17.

      Fluorescence intensity of Nav1.5 in human iPS differentiated cardiomyocytes overexpressing cF-lncDACH1. a, Membrane levels of Nav1.5. N=8 for Nav1.5. P<0.05 versus NC group. b, Cytoplasm levels of Nav1.5. N=10 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      (5) More information on how the fractionation kit works would be helpful. How are membrane v. cytoplasm fractions identified?

      a. I presume the ER is part of the membrane fraction? When Nav1.5 is found in the cytoplasmic fraction, what subcompartment is it in - the proteasome?

      b. In the middle panel of A - is the dystrophin signal visible on the WB for WT? I assume the selected exemplar is the best of the blots and so this raises concerns. Much is riding on the confidence with which the fractions report "membrane" v "cytoplasm."

      Thank you for the insightful comment.

      (1). How the fractionation kit works:

      The kit uses centrifuge column technology to obtain plasma membrane structures with native activity and minimal cross-contamination from organelles, without the need for an ultracentrifuge, and the fractions can be used in a variety of downstream assays. Separation principle: cells or tissues are sensitized with Buffer A and passed through the centrifuge column at 16,000 × g, which shears the cell membrane and ruptures the cells. The four components (nucleus, cytoplasm, organelles, and plasma membrane) are then obtained sequentially by differential centrifugation and density centrifugation.

      Author response image 18.

      (2). How are membrane v. cytoplasm fractions identified:

      Membrane and cytosolic proteins were isolated with the kit. For the western blot experiments we used N-cadherin as the internal control for the membrane fraction and β-Actin as the internal control for the cytosolic fraction.

      Most importantly, when the PVDF membrane carrying cytosolic protein was incubated with the primary antibody against N-cadherin, or the PVDF membrane carrying membrane protein was incubated with the primary antibody against the cytosolic control β-Actin, no protein bands were detected in the scans.

      Author response image 19.

      (6) More detail in Results, figures, and figure legends will assist the reader.

      a. In Fig. 5, it would be helpful to label sinus rhythm vs. arrhythmia segments.

      Thank you for the good suggestion. We have marked the sinus rhythm and arrhythmia segments with arrows.

      Author response image 20.

      b. Please explain in the figure legend what the red bars in 5A are

      Thank you for the insightful comment. We have added the explanation to the figure legend. The red lines in the ECG traces indicate VT duration.

      c. In 5C, please explain what the durations pertain to.

      Thank you for the good suggestion. 720 ms to 760 ms refers to the duration of one action potential, with 720 ms being the peak of one action potential and 760 ms being the peak of another. The interval duration is not fixed; in this article, we used a 10 ms interval to count the phase singularities from consecutive phase maps, because a shorter interval yields a larger sample size and more convincing data.

      d. In the text, please define "breaking points" and explain what the physiological underpinning is. Define "phase singularity."

      Thank you for the insightful comment. Cardiac excitation can be viewed as an electrical wave, with a wavefront corresponding to the action potential upstroke (phase 0) and a waveback corresponding to rapid repolarization (phase 3). Under normal circumstances, cardiac conduction is composed of a sequence of well-ordered action potentials, and when a wave propagates through cardiac tissue the wavefront and waveback never touch. In the optical mapping results, different colors represent different phases. When arrhythmias occur, owing to factors such as reentry, the activation contour meets the refractory contour and the wave breaks up, initiating a new spiral reentry. On the optical mapping phase map, colors representing different phases (including depolarization and repolarization) converge to form a vortex, and the center of the vortex is defined as the phase singularity.
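
      To make the definition concrete, the sketch below locates phase singularities on a 2D phase map by computing the topological charge around each 2×2 pixel neighborhood: the wrapped phase differences around a closed loop sum to ±2π when the loop encloses a singularity and to 0 otherwise. This is a commonly used general approach and is shown only as an assumption-laden illustration; it is not necessarily the procedure used in the analysis software for this study. The function names and the synthetic spiral are hypothetical.

```python
import numpy as np


def wrap_to_pi(delta):
    """Wrap phase differences into the interval [-pi, pi)."""
    return (delta + np.pi) % (2 * np.pi) - np.pi


def topological_charge(phase):
    """Return the integer charge of every 2x2 plaquette in a phase map.

    phase: 2D array of phase values in radians.
    Output has shape (rows - 1, cols - 1); nonzero entries mark
    plaquettes that enclose a phase singularity (+1 or -1 chirality).
    """
    top_left = phase[:-1, :-1]
    top_right = phase[:-1, 1:]
    bottom_right = phase[1:, 1:]
    bottom_left = phase[1:, :-1]
    # Sum wrapped phase steps around the closed loop of four corners.
    circulation = (wrap_to_pi(top_right - top_left)
                   + wrap_to_pi(bottom_right - top_right)
                   + wrap_to_pi(bottom_left - bottom_right)
                   + wrap_to_pi(top_left - bottom_left))
    return np.rint(circulation / (2 * np.pi)).astype(int)


# Synthetic spiral-like phase map with one vortex centered between pixels.
rows, cols = np.mgrid[0:20, 0:20]
synthetic_phase = np.arctan2(rows - 9.5, cols - 9.5)
charge = topological_charge(synthetic_phase)
n_singularities = int(np.count_nonzero(charge))
print(n_singularities)
```

      Applying such a function to each phase map in a sequence (for example, one map every 10 ms) and summing the nonzero entries gives a singularity count per time window.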

      (7) In reflecting on why enhanced INa is not proarrhythmic, it is noted that the kinetics are not altered. I agree that is key, but perhaps the consequence could be better articulated. Because lncDACH1 does not alter Nav1.5 gating, the late Na current may not be enhanced to the same effect as observed with LQT gain-of-function Nav1.5 mutations, in which APD prolongation is attributed to gating defects that increase late Na current.

      Thank you for the good suggestion. Your explanation is excellent and important for this article. We have revised the Discussion section and added these explanations to it.

      Reviewer #3 (Recommendations For The Authors):

      (1) Experiments to specifically address the reduction in total Nav1.5 protein should be included.

      Thank you for the insightful comment. We examined the ubiquitination of Nav1.5. We found that overexpression of lncDACH1 increased the ubiquitination of Nav1.5, which explains the downregulation of total Nav1.5 protein (Online Supplementary Figure 12).

      Author response image 21.

      (2) Experiments to convincingly demonstrate that LncRNA-DACH1 regulates Nav1.5 targeting via dystrophin are missing. As it is, total reduction in Nav1.5 seems to be the explanation as to why there is a decrease in membrane Nav1.5.

      Thank you for the insightful comment. We performed pulldown and RNA immunoprecipitation experiments. The data showed that lncDACH1 can pull down dystrophin (Figure 1), but failed to pull down Nav1.5, and anti-Nav1.5 did not precipitate lncDACH1 (Supplementary Fig. 1). These data indicate that lncDACH1 does not interact with Nav1.5 directly; it participates in the regulation of Nav1.5 by binding to dystrophin.

      Author response image 22.